CN113850261A - Processing method, device and equipment for bill OCR and computer readable storage medium - Google Patents


Info

Publication number
CN113850261A
Authority
CN
China
Legal status
Pending
Application number
CN202111185315.4A
Other languages
Chinese (zh)
Inventor
岳征宇
姜良友
张文婷
Current Assignee
Beijing Whale Stork Technology Co ltd
Original Assignee
Beijing Whale Stork Technology Co ltd
Application filed by Beijing Whale Stork Technology Co ltd
Priority to CN202111185315.4A
Publication of CN113850261A
Current legal status: pending

Landscapes

  • Character Input (AREA)

Abstract

The application provides a processing method, apparatus, and device for bill OCR and a computer-readable storage medium. The processing method for bill OCR comprises: performing OCR recognition on a target bill to obtain an initial recognition result; extracting key information from the initial recognition result according to a preset rule; when the key information is incomplete, determining the missing information according to an associated bill of the target bill and/or a first database related to the type of the missing information; and supplementing the missing information in the initial recognition result to obtain a target recognition result. The method can effectively save labor, reduce labor costs, and improve the efficiency and accuracy of bill recognition.

Description

Processing method, device and equipment for bill OCR and computer readable storage medium
Technical Field
The present application relates to the field of bill recognition technologies, and in particular to a processing method for bill OCR, a processing apparatus for bill OCR, a processing device for bill OCR, and a computer-readable storage medium.
Background
In the related art, an insured person applying for a claim settlement usually needs to upload bills. Staff then read the bill contents manually and enter them into the system through a dedicated program. Manual identification of bill contents imposes a heavy workload on staff, is time-consuming, and has a high entry error rate.
Disclosure of Invention
An embodiment of the present application provides a processing method for bill OCR to solve the problems in the related art. The technical solution is as follows:
In a first aspect, an embodiment of the present application provides a processing method for bill OCR, including:
performing OCR recognition on a target bill to obtain an initial recognition result;
extracting key information from the initial recognition result according to a preset rule;
when the key information is incomplete, determining the missing information according to an associated bill of the target bill and/or a first database related to the type of the missing information; and
supplementing the missing information in the initial recognition result to obtain a target recognition result.
In one embodiment, determining the missing information according to the associated bill of the target bill and/or the first database related to the type of the missing information includes:
selecting the associated bill or the first database according to a preset priority to determine the missing information.
In one embodiment, determining the missing information according to the associated bill of the target bill and/or the first database related to the type of the missing information includes:
determining first preselected information and its confidence score according to the associated bill;
determining second preselected information and its confidence score according to the first database; and
determining the missing information from the first preselected information and the second preselected information according to the confidence scores.
In one embodiment, extracting key information from the initial recognition result according to a preset rule includes:
determining a candidate result from the initial recognition result; and
deleting prefix and suffix content from the candidate result to obtain the key information; and/or
splitting the candidate result into phrases to obtain the key information; and/or
filtering the candidate result according to a second database related to the candidate result to obtain the key information; and/or
when the candidate result includes a seal or a frame, performing the corresponding seal-removal or frame-removal operation to obtain the key information.
In one embodiment, performing OCR recognition on the target bill includes:
selecting a corresponding recognition model according to the category of the target bill; and
performing OCR recognition on the target bill using the corresponding recognition model.
In one embodiment, the processing method for bill OCR further includes:
calling the corresponding databases based on the items of information in the target recognition result to determine a plurality of pieces of index information.
In one embodiment, the processing method for bill OCR further includes:
performing feature verification on a plurality of preprocessed bills to screen out the target bill.
In a second aspect, an embodiment of the present application provides a processing apparatus for bill OCR, including:
a bill recognition module, configured to perform OCR recognition on the target bill to obtain an initial recognition result;
an information extraction module, configured to extract key information from the initial recognition result according to a preset rule;
a missing information determination module, configured to determine, when the key information is incomplete, the missing information according to the associated bill of the target bill and/or the first database related to the type of the missing information; and
a result determination module, configured to supplement the missing information in the initial recognition result to obtain the target recognition result.
In one embodiment, the missing information determination module is further configured to select the associated bill or the first database according to a preset priority to determine the missing information.
In one embodiment, the missing information determination module is further configured to:
determine first preselected information and its confidence score according to the associated bill;
determine second preselected information and its confidence score according to the first database; and
determine the missing information from the first preselected information and the second preselected information according to the confidence scores.
In one embodiment, the information extraction module is further configured to:
determine a candidate result from the initial recognition result; and
delete prefix and suffix content from the candidate result to obtain the key information; and/or
split the candidate result into phrases to obtain the key information; and/or
filter the candidate result according to a second database related to the candidate result to obtain the key information; and/or
when the candidate result includes a seal or a frame, perform the corresponding seal-removal or frame-removal operation to obtain the key information.
In one embodiment, the bill recognition module is further configured to:
select a corresponding recognition model according to the category of the target bill; and
perform OCR recognition on the target bill using the corresponding recognition model.
In one embodiment, the processing apparatus for bill OCR further includes:
an index information determination module, configured to call the corresponding databases based on the items of information in the target recognition result to determine a plurality of pieces of index information.
In one embodiment, the processing apparatus for bill OCR further includes:
a feature verification module, configured to perform feature verification on a plurality of preprocessed bills to screen out the target bill.
In a third aspect, an embodiment of the present application provides a processing device for bill OCR, including:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method in any one of the above aspects.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method in any one of the above aspects.
The advantages or beneficial effects of the above technical solution include at least the following: the solution can effectively save manpower, reduce labor costs, and improve bill recognition efficiency. Moreover, when the extracted key information is incomplete, post-processing automatically supplements the missing information, so no manual verification is needed after OCR (optical character recognition). This improves the accuracy of bill recognition, further reduces staff workload, and simplifies data entry.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present application will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
FIG. 1A is a flowchart of a processing method for bill OCR according to an embodiment of the present application;
FIG. 1B is a flowchart of determining missing information according to an embodiment of the present application;
FIG. 1C is a flowchart of OCR recognition of a target bill according to an embodiment of the present application;
FIG. 2 is a logical block diagram of a processing method for bill OCR according to an embodiment of the present application;
FIG. 3 is a block diagram of a processing apparatus for bill OCR according to an embodiment of the present application;
FIG. 4 is a block diagram of a processing device for bill OCR according to an embodiment of the present application.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will appreciate, the described embodiments may be modified in various different ways, without departing from the spirit or scope of the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
FIG. 1A shows a flowchart of a processing method for bill OCR according to an embodiment of the application. As shown in FIG. 1A, the processing method for bill OCR may include:
step S110: performing OCR recognition on the target bill to obtain an initial recognition result;
step S120: extracting key information from the initial recognition result according to a preset rule;
step S130: when the key information is incomplete, determining the missing information according to the associated bill of the target bill and/or the first database related to the type of the missing information;
step S140: supplementing the missing information in the initial recognition result to obtain a target recognition result.
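Steps S110 to S140 can be sketched as a small Python pipeline. This is an illustrative sketch, not the applicant's implementation: the OCR engine, the rule-based extractor, and the missing-information lookup are passed in as functions (all hypothetical), and the convention that unreadable characters are marked with "?" is an assumption modeled on the example in this description.

```python
from typing import Callable, Dict, Optional

MISSING = "?"  # assumed marker for characters the OCR engine could not read


def is_incomplete(value: str) -> bool:
    """A key field is treated as incomplete if it is empty or contains the marker."""
    return not value or MISSING in value


def process_bill(
    image: object,
    recognize: Callable[[object], str],                 # S110: OCR engine (hypothetical)
    extract: Callable[[str], Dict[str, str]],           # S120: preset extraction rules
    find_missing: Callable[[str, str], Optional[str]],  # S130: associated bills / first database
) -> Dict[str, str]:
    initial_result = recognize(image)
    key_info = extract(initial_result)
    target_result = dict(key_info)
    for field, value in key_info.items():
        if is_incomplete(value):
            candidate = find_missing(field, value)
            if candidate is not None:
                target_result[field] = candidate        # S140: supplement the gap
    return target_result


# Usage with stub components standing in for a real OCR engine and data sources.
if __name__ == "__main__":
    ocr_text = "name: Yue ?; sex: male; drug: halometa? cream"
    associated_bill = {"name": "Yue X", "drug": "halometasone cream"}
    result = process_bill(
        image=None,
        recognize=lambda _img: ocr_text,
        extract=lambda text: dict(part.split(": ", 1) for part in text.split("; ")),
        find_missing=lambda field, _partial: associated_bill.get(field),
    )
    print(result)  # {'name': 'Yue X', 'sex': 'male', 'drug': 'halometasone cream'}
```

Passing the stages in as functions keeps the control flow of S110 to S140 visible while leaving each stage replaceable.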
The associated bills of the target bill may be bills uploaded by the user at the same time as, or in the same batch as, the target bill; they may also be bills that carry the same identification information as the target bill, such as the same name or account number.
For example, when a medical insurance participant applies for a claim, an outpatient invoice, a prescription, and a medical record are usually uploaded. When OCR recognition is performed on the outpatient invoice, the outpatient invoice is the target bill. OCR recognition first captures all of the information on the outpatient invoice, so the initial recognition result may contain both the key information required for the claim and information irrelevant to it. The key information required for the claim is then extracted according to preset rules.
Suppose the extracted key information is incomplete, for example:
Hospital visited: ?ren Hospital; Name: Yue ?; Sex: male; Medical insurance type: urban employee; Date of visit: 2021-08-27; Major item names: Western medicine / Chinese patent medicine; Major item amounts: 86.06/68.61; Detailed item name: halometa? cream; Detailed item amount: 27.4600; Total amount: 164.67.
Here "?" marks the part of the key information that is missing. The sex, medical insurance type, date of visit, major item names, major item amounts, detailed item amount, and total amount are complete, whereas the hospital visited, the name, and the detailed item name are incomplete.
Because the detailed item name is drug information, it can be determined from a first database related to drug information, such as a drug library. In addition, because the hospital visited, the name, and the detailed item name may also appear on the associated bills of the outpatient invoice (namely, the prescription and the medical record mentioned above), the missing parts of these fields on the outpatient invoice can be determined from the corresponding fields on the prescription and the medical record. The missing information is then supplemented, giving the following target recognition result:
Hospital visited: Capital Medical University Affiliated Hospital; Name: Yue X; Sex: male; Medical insurance type: urban employee; Date of visit: 2021-08-27; Major item names: Western medicine / Chinese patent medicine; Major item amounts: 86.06/68.61; Detailed item name: halometasone cream; Detailed item amount: 27.4600; Total amount: 164.67.
With this processing method for bill OCR, the target bill is recognized by OCR, which, compared with manual identification, effectively saves manpower, reduces labor costs, and improves bill recognition efficiency. Moreover, when the extracted key information is incomplete, post-processing automatically supplements the missing information, so manual verification is not needed after OCR recognition. This improves the accuracy of bill recognition, further reduces staff workload, and simplifies data entry.
In one embodiment, step S130 includes: selecting the associated bill or the first database according to a preset priority, and determining the missing information, that is, the missing part of the key information. For convenience of description, key information that contains a missing part is hereinafter referred to as "incomplete information".
For example, selecting the associated bill may be assigned a first priority in advance, and selecting the first database a second priority. When key information is missing and the first priority is higher than the second, the associated bill of the target bill is selected first, and it is determined whether the associated bill contains a clear counterpart of the incomplete information in the target bill. If it does, the missing information is determined from that counterpart; if not, the first database is then selected and the missing information is determined from it.
For example, suppose the detailed item name on the outpatient invoice is incomplete and the first priority is higher than the second. If a clear detailed item name exists on the prescription and the medical record, the missing part of the invoice's detailed item name is determined from them; if the detailed item name cannot be obtained from the prescription or the medical record, it is determined from the drug library.
Thus, when the associated bill or the first database is selected according to a preset priority and the higher-priority source can determine the missing information, the other source need not be consulted, which effectively reduces the workload and improves efficiency.
In the above example, the first priority is higher than the second. Those skilled in the art will understand that the first priority may also be lower than the second; in that case, the first database is selected first when key information is missing. The principle is the same as when the first priority is higher, and the details are not repeated here.
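The priority-based lookup can be sketched as follows. This is a hedged illustration under several assumptions not stated in the text: unreadable characters are marked with "?", a candidate "fits" when each "?" can be filled by one or more characters (a simple wildcard match chosen here), and a first-database entry is accepted only when exactly one entry fits, to avoid guessing between look-alike drug names.

```python
import re
from typing import Dict, Optional, Sequence

MISSING = "?"  # assumed marker for unreadable characters


def matches(partial: str, candidate: str) -> bool:
    """True if `candidate` is consistent with `partial`, where each "?" stands
    for one or more unreadable characters."""
    if not partial:
        return True
    pattern = ".+".join(re.escape(piece) for piece in partial.split(MISSING))
    return re.fullmatch(pattern, candidate) is not None


def from_associated_bills(field: str, partial: str,
                          bills: Sequence[Dict[str, str]]) -> Optional[str]:
    """Return the first clear (marker-free) value of `field` that fits `partial`."""
    for bill in bills:
        value = bill.get(field, "")
        if value and MISSING not in value and matches(partial, value):
            return value
    return None


def from_first_database(partial: str, entries: Sequence[str]) -> Optional[str]:
    """Accept a database entry only if exactly one entry fits `partial`."""
    hits = [entry for entry in entries if matches(partial, entry)]
    return hits[0] if len(hits) == 1 else None


def determine_missing(field: str, partial: str,
                      bills: Sequence[Dict[str, str]], database: Sequence[str],
                      bills_first: bool = True) -> Optional[str]:
    lookups = [lambda: from_associated_bills(field, partial, bills),
               lambda: from_first_database(partial, database)]
    if not bills_first:
        lookups.reverse()  # the first database has the higher priority
    for lookup in lookups:
        value = lookup()
        if value is not None:
            return value   # higher-priority source succeeded; skip the other one
    return None
```

Because the lower-priority lookup only runs when the higher-priority one returns nothing, the sketch mirrors the workload saving described above.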
In one embodiment, as shown in FIG. 1B, step S130 includes:
step S131: determining first preselected information and its confidence score according to the associated bill;
step S132: determining second preselected information and its confidence score according to the first database;
step S133: determining the missing information from the first preselected information and the second preselected information according to the confidence scores.
In step S133, if the confidence score of the first preselected information is higher than that of the second preselected information, the missing information is determined from the first preselected information; if it is lower, the missing information is determined from the second preselected information. Selecting according to the confidence scores of the first and second preselected information improves the accuracy of bill recognition and enables automatic, accurate data entry.
In one example, in step S131, when the target bill has several associated bills and each of them contains a counterpart of the incomplete information, the first preselected information may also be determined by confidence score. For example, when the target bill is an outpatient invoice, the prescription and the medical record are both associated bills of the outpatient invoice. If both contain a counterpart of the incomplete information on the outpatient invoice, the confidence score of the counterpart in the prescription is compared with that of the counterpart in the medical record, and the one with the higher confidence score is used as the first preselected information. This allows the first preselected information to be determined accurately and further improves the accuracy of bill recognition.
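Steps S131 to S133, including the multi-bill case just described, reduce to "take the highest-confidence candidate" applied twice. A minimal sketch, assuming each candidate is a (value, confidence score) pair; how the scores are produced is not specified in the text. When the two scores are equal, which the text does not address, this sketch keeps the associated-bill candidate; that tie-break is an assumption.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, float]  # (value, confidence score), a hypothetical representation


def best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Highest-confidence candidate; on a tie, the earliest one wins."""
    return max(candidates, key=lambda c: c[1], default=None)


def determine_by_confidence(bill_candidates: List[Candidate],
                            database_candidates: List[Candidate]) -> Optional[str]:
    first = best(bill_candidates)       # S131: best counterpart across associated bills
    second = best(database_candidates)  # S132: best match in the first database
    chosen = best([c for c in (first, second) if c is not None])  # S133
    return chosen[0] if chosen is not None else None


# Usage: the prescription reads the name with higher confidence than the medical record.
print(determine_by_confidence([("Yue X", 0.92), ("Yue Y", 0.40)], [("Yue Z", 0.85)]))  # Yue X
```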
In one embodiment, step S120 includes:
step S121: determining a candidate result from the initial recognition result;
step S122: deleting prefix and suffix content from the candidate result to obtain the key information.
For example, when the candidate result is "Name: Yue X", the prefix "Name:" can be deleted to obtain the key information "Yue X". In this way, prefix content unrelated to the claim is deleted and only the required key information is retained.
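Prefix and suffix deletion (step S122) can be sketched with two regular expressions. This is an assumption-laden illustration: the label list is hypothetical and would in practice be configured per bill template, and the currency suffix is only stripped when it directly follows a digit, so a name that happens to end in "元" is left intact.

```python
import re

# Hypothetical field labels; a deployment would configure these per bill template.
FIELD_LABELS = ("Name", "Sex", "Total amount", "姓名", "性别", "合计")
LABEL_PREFIX = re.compile(r"^\s*(?:%s)\s*[:：]\s*" % "|".join(map(re.escape, FIELD_LABELS)))
# Strip a currency unit only when it follows a number, e.g. "164.67 yuan" or "164.67元".
UNIT_SUFFIX = re.compile(r"(?<=\d)\s*(?:yuan|元)\s*$")


def strip_affixes(candidate: str) -> str:
    """Delete a known field label (prefix) and a trailing currency unit (suffix)."""
    without_prefix = LABEL_PREFIX.sub("", candidate, count=1)
    return UNIT_SUFFIX.sub("", without_prefix).strip()


print(strip_affixes("Name: Yue X"))                # Yue X
print(strip_affixes("Total amount: 164.67 yuan"))  # 164.67
```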
In one embodiment, step S120 includes:
step S121: determining a candidate result from the initial recognition result;
step S123: splitting the candidate result into phrases to obtain the key information.
For example, several fields may be run together in a candidate result. When the candidate result is "amoxicillin 18 yuan", the fields can be separated to obtain the key information "drug name = amoxicillin, price = 18 yuan". Phrases that are run together in the candidate result are thus split, making the key information clearer.
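For the "item name followed by price" layout in the example, phrase splitting (step S123) can be sketched with one regular expression. The sketch assumes this single layout; the lazy name group lets names that contain digits, such as "vitamin B12", still split correctly. Other layouts would need their own rules, and the function returns None to signal that.

```python
import re
from typing import Dict, Optional

# Assumed layout: an item name followed by a price in yuan, e.g. "amoxicillin 18 yuan".
ITEM_PRICE = re.compile(r"^(?P<name>.+?)\s*(?P<price>\d+(?:\.\d+)?)\s*(?:yuan|元)$")


def split_item_price(candidate: str) -> Optional[Dict[str, str]]:
    """Split a run-together "name + price" phrase into separate fields."""
    match = ITEM_PRICE.match(candidate.strip())
    if match is None:
        return None  # not this layout; leave it to other rules
    return {"drug name": match.group("name"), "price": match.group("price") + " yuan"}


print(split_item_price("amoxicillin 18 yuan"))  # {'drug name': 'amoxicillin', 'price': '18 yuan'}
```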
In one embodiment, step S120 includes:
step S121: determining a candidate result from the initial recognition result;
step S124: filtering the candidate result according to a second database related to the candidate result to obtain the key information.
For example, when the content of the target bill is complex, the candidate result may contain content irrelevant to the claim. When the candidate result includes drug, disease, and hospital fields, the irrelevant content can be deleted according to the corresponding drug library, disease information library, and hospital library; when it includes fields such as service information, the irrelevant content can be deleted, or the corresponding data can be supplemented and verified, according to the service information library. This further ensures that the key information finally obtained is the information required for the claim, and filters out irrelevant information.
Of course, when the candidate result contains irrelevant information, that information may also be excluded by using features such as the bill header, the image proportions, and the position of the key information; the method is not limited to filtering the candidate result according to the second database.
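Filtering against a second database (step S124) can be sketched as a membership test. The miniature libraries below are hypothetical stand-ins for the drug, disease-information, and hospital libraries; real libraries would be loaded from business databases, and exact (case-insensitive) matching is a simplification, since noisy OCR output would usually call for fuzzy matching.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical miniature libraries standing in for the second databases.
SECOND_DATABASES: Dict[str, Set[str]] = {
    "drug": {"amoxicillin", "halometasone cream"},
    "disease": {"eczema", "acute bronchitis"},
    "hospital": {"example people's hospital"},
}


def filter_candidates(tokens: Iterable[str], library: str) -> List[str]:
    """Keep only tokens found in the given library (case-insensitive exact match)."""
    known = {entry.lower() for entry in SECOND_DATABASES[library]}
    return [token.strip() for token in tokens if token.strip().lower() in known]


print(filter_candidates(["amoxicillin", "Please keep this receipt"], "drug"))  # ['amoxicillin']
```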
In one embodiment, step S120 includes:
step S121: determining a candidate result from the initial recognition result;
step S125: when the candidate result includes a seal or a frame, performing the corresponding seal-removal or frame-removal operation to obtain the key information. For example, seals and frames can be removed by chroma (color) processing. The key information can thus be identified accurately even when it is covered by a seal or a frame, which effectively ensures recognition accuracy.
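One simple form of the chroma processing in step S125 is to whiten pixels whose red channel clearly dominates, on the assumption that seals are stamped in red ink and text is printed in black. The sketch below works on a plain nested list of RGB tuples so that it needs no imaging library; the margin of 60 is an assumed threshold, not a value from the text. A frame printed in another ink color could be handled the same way with a different channel test.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def remove_red_seal(image: List[List[Pixel]], margin: int = 60) -> List[List[Pixel]]:
    """Return a copy of `image` in which strongly red pixels are set to white.

    A pixel counts as seal ink when its red channel exceeds both green and blue
    by more than `margin`. Dark text overlapped by the seal is kept, because its
    red channel does not exceed the other two by that much.
    """
    white: Pixel = (255, 255, 255)
    return [
        [white if r - max(g, b) > margin else (r, g, b) for (r, g, b) in row]
        for row in image
    ]


# Black text, pure seal ink, text under the seal, and paper background.
print(remove_red_seal([[(0, 0, 0), (200, 40, 40)], [(60, 10, 10), (255, 255, 255)]]))
```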
Optionally, step S120 may further include:
step S126: when the candidate result includes information with distinctive features, obtaining the key information by regular-expression matching or similar means; and/or
step S127: when the candidate result contains an error, processing and correcting the candidate result to obtain the key information.
In step S126, key information with distinctive formats, such as the date, age, amount, sex, and invoice number, is obtained by regular-expression matching or similar means. In step S127, when poor bill quality caused by printing or photographing problems leads to errors in the candidate result, the errors can be recorded and classified; when one category of error occurs frequently, the candidate result can be processed and corrected by the corresponding processing logic. For example, when "due charge" appears in the charge field, it is changed to "Western medicine charge". This further improves accuracy.
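Steps S126 and S127 can be sketched together. The patterns below are illustrative only (real bills need per-template rules), and the error log with a minimum count of 3 is a hypothetical way of realizing "record and classify errors, then correct the frequent ones"; the "due charge" pair is taken from the example above.

```python
import re
from collections import Counter
from typing import Dict

# S126: illustrative patterns for fields with distinctive formats (English and Chinese labels).
PATTERNS = {
    "date": re.compile(r"(\d{4})[-/.年](\d{1,2})[-/.月](\d{1,2})"),
    "amount": re.compile(r"(?:total amount|合计)\s*[:：]?\s*(\d+\.\d{2})", re.I),
    "sex": re.compile(r"(?:sex|性别)\s*[:：]?\s*(male|female|男|女)", re.I),
    "invoice_no": re.compile(r"(?:invoice no\.?|发票号码)\s*[:：]?\s*(\d{8,20})", re.I),
}


def extract_regular_fields(text: str) -> Dict[str, str]:
    fields: Dict[str, str] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            continue
        if name == "date":
            year, month, day = match.groups()
            fields[name] = f"{year}-{int(month):02d}-{int(day):02d}"  # normalize to YYYY-MM-DD
        else:
            fields[name] = match.group(1)
    return fields


# S127: log observed (wrong, right) pairs and only auto-correct the frequent ones.
def record_error(log: Counter, wrong: str, right: str) -> None:
    log[(wrong, right)] += 1


def active_corrections(log: Counter, min_count: int = 3) -> Dict[str, str]:
    return {wrong: right for (wrong, right), n in log.items() if n >= min_count}


def apply_corrections(text: str, corrections: Dict[str, str]) -> str:
    for wrong, right in corrections.items():
        text = text.replace(wrong, right)
    return text
```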
In one embodiment, as shown in FIG. 1C, step S110 includes:
step S111: selecting a corresponding recognition model according to the category of the target bill;
step S112: performing OCR recognition on the target bill using the corresponding recognition model.
For example, a large number of feature values and key contents can be extracted in advance from different kinds of bills to build a separate recognition model for each kind, such as an identity card recognition model, a bank card recognition model, and medical bill recognition models for different regions.
In steps S111 and S112, the feature values of the target bill can be compared with the key contents of each model, and the target bill is then assigned to the recognition model with the highest similarity for OCR recognition.
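Assigning a bill to "the recognition model with the highest similarity" can be sketched as a nearest-prototype search. The text does not say how similarity is measured, so cosine similarity is an assumed choice here, and the feature vectors and per-model prototypes are hypothetical (for example, counts of characteristic keywords on each kind of bill).

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_recognition_model(features: Sequence[float],
                             prototypes: Dict[str, Sequence[float]]) -> str:
    """Return the name of the model whose prototype is most similar to `features`."""
    return max(prototypes, key=lambda name: cosine_similarity(features, prototypes[name]))


# Hypothetical prototypes: one feature dimension per kind of bill.
prototypes = {"id_card": [1.0, 0.0, 0.0], "bank_card": [0.0, 1.0, 0.0], "medical_invoice": [0.0, 0.0, 1.0]}
print(select_recognition_model([0.1, 0.2, 0.9], prototypes))  # medical_invoice
```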
For example, when a medical insurance participant applies for a claim, the outpatient invoice, the prescription, and the medical record that must be uploaded correspond to three different recognition models. If fewer than three recognition models are selected according to the categories of the uploaded bills, at least one of the outpatient invoice, the prescription, and the medical record is missing, and the user can be prompted to upload the missing bill. If three or more recognition models are selected, OCR recognition can be performed on the outpatient invoice, the prescription, and the medical record with their respective recognition models. Classifying the target bills by type in this way makes recognition of the bill contents more accurate and further improves recognition efficiency.
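The completeness check for a medical claim can be sketched with a set difference over bill categories. The category names are hypothetical. This sketch compares categories rather than counting selected models, a deliberate variation on the text: it avoids treating, say, three uploaded invoices as a complete set, and it also tells the user which bill to upload.

```python
from typing import FrozenSet, Iterable, Set

# Hypothetical category names for the three bills a medical claim needs.
REQUIRED_BILLS: FrozenSet[str] = frozenset({"outpatient_invoice", "prescription", "medical_record"})


def missing_bill_types(detected: Iterable[str],
                       required: FrozenSet[str] = REQUIRED_BILLS) -> Set[str]:
    """Bill categories the user still has to upload."""
    return set(required) - set(detected)


print(missing_bill_types(["outpatient_invoice", "prescription"]))  # {'medical_record'}
```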
In one embodiment, referring to FIG. 2, before step S110, the method may further include:
step S100: performing feature verification on a plurality of preprocessed bills to screen out the target bills. In this step, the bills uploaded by the user can be preprocessed; for example, images of irrelevant documents, such as X-ray images and unrelated slice (section) images, are handled separately, and the target bills that meet the requirements (for example, bills relevant to the claim) are screened out, which reduces the recognition workload in subsequent steps.
In one embodiment, the processing method for bill OCR further includes:
step S150: calling the corresponding databases based on the items of information in the target recognition result to determine a plurality of pieces of index information. In this step, the information in the target recognition result obtained in step S140, such as the date of visit, the hospital visited, the name, and the amount spent, is associated, computed, and fused with the corresponding databases to obtain index information such as the payout amount, the number of visits in the month, the amount spent in the month, the amount paid out in the month, the user's common symptoms, and the services that can be offered to the user. Real-time recommendations can then be provided to the user based on this index information, which effectively improves the user experience.
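Two of the monthly indices named above, the number of visits and the amount spent in each month, can be sketched as a simple aggregation over stored target recognition results. The field names "date" and "total" are assumptions, and payout amounts are not modeled because they depend on policy rules the text does not describe. Amounts are summed as `Decimal` so that monetary totals do not pick up binary floating-point rounding.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Mapping, Tuple


def monthly_indices(records: Iterable[Mapping[str, str]]) -> Dict[str, Tuple[int, Decimal]]:
    """Map "YYYY-MM" to (number of visits, total amount spent) for that month."""
    visits: Dict[str, int] = defaultdict(int)
    spent: Dict[str, Decimal] = defaultdict(Decimal)
    for record in records:
        month = record["date"][:7]  # "2021-08-27" -> "2021-08"
        visits[month] += 1
        spent[month] += Decimal(record["total"])
    return {month: (visits[month], spent[month]) for month in visits}


records = [{"date": "2021-08-27", "total": "164.67"}, {"date": "2021-08-30", "total": "50.00"}]
print(monthly_indices(records))  # {'2021-08': (2, Decimal('214.67'))}
```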
In one embodiment, referring to FIG. 2, the processing method for bill OCR may further include:
step S160: storing the items of information in the target recognition result and the plurality of pieces of index information to provide data support for subsequent applications;
step S170: using the stored data in related applications.
In the processing method for bill OCR provided by the embodiments of the present application, referring to FIG. 2, after the user submits bills, the bills are verified through preprocessing, irrelevant bills are removed, and the target bills are screened out. The target bills are then classified by recognition model, OCR recognition is performed on each with the corresponding model, and the key information is extracted. Corresponding post-processing is then applied to the recognition result, followed by data fusion, data storage, and data application.
Extraction, processing, and application of the data can thus be completed in a one-stop manner, achieving accurate and fast extraction, processing, and fusion of bill contents. The process can be automated, which saves labor costs and improves efficiency, and when the bills are diverse, complex in content, or of poor quality, the preprocessing and post-processing effectively improve the accuracy of the results. In addition, the circulation process does not leak any party's private data, the data can remain invisible to users, and the requirements of scenarios such as big-data joint analysis and joint machine-learning modeling can be met. Moreover, after fusion computation, the data can provide a solid foundation for bill-related industries and accelerate business processing.
FIG. 3 shows a block diagram of a processing apparatus 300 for bill OCR according to an embodiment of the second aspect of the present application. As shown in FIG. 3, the apparatus may include: a bill recognition module 310, configured to perform OCR recognition on the target bill to obtain an initial recognition result; an information extraction module 320, configured to extract key information from the initial recognition result according to a preset rule; a missing information determining module 330, configured to, when key information is missing, determine the missing information according to an associated bill of the target bill and/or a first database related to the type of the missing information; and a result determining module 340, configured to add the missing information to the initial recognition result to obtain the target recognition result.
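The chaining of modules 310 to 340 can be sketched as a small Python class. This is a minimal illustration under stated assumptions, not the applicant's implementation. The class name BillOcrProcessor is hypothetical. Key information is represented as a dict in which None marks a missing field, which is also an assumption. The three callables are injected by the caller, and the result determining module 340 is modeled as the loop that fills the empty fields.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Fields = Dict[str, Optional[str]]

@dataclass
class BillOcrProcessor:
    # Hypothetical callables standing in for modules 310, 320 and 330.
    recognize: Callable[[bytes], str]                     # bill recognition module
    extract: Callable[[str], Fields]                      # information extraction module
    find_missing: Callable[[str, Fields], Optional[str]]  # missing information module

    def process(self, bill_image: bytes) -> Fields:
        initial = self.recognize(bill_image)  # initial recognition result
        fields = self.extract(initial)        # key information, None = missing
        # Result determining module: add each missing field to the result.
        result = dict(fields)
        for name, value in fields.items():
            if value is None:
                result[name] = self.find_missing(name, result)
        return result
```

Injecting the modules as callables keeps each one replaceable, for example by swapping in a different recognition model per bill category.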
In one embodiment, the missing information determining module 330 is further configured to select the associated bill or the first database according to a preset priority to determine the missing information.
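Priority-based selection can be read as "query the sources in a fixed order and take the first hit". The sketch below assumes each source is a lookup function that returns None when it has no value. The function name fill_by_priority is hypothetical, and so is modeling the associated bill and the first database as plain dicts.

```python
from typing import Callable, Optional, Sequence

# A source maps a field name to a value, or None if it cannot supply one.
Source = Callable[[str], Optional[str]]

def fill_by_priority(field: str, sources: Sequence[Source]) -> Optional[str]:
    """Query sources in the preset priority order and return the first value found."""
    for source in sources:
        value = source(field)
        if value is not None:
            return value
    return None  # no source could supply the missing field
```

For example, `fill_by_priority("hospital", [associated_bill.get, first_database.get])` consults the associated bill first. Reversing the list gives the database priority.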
In one embodiment, the missing information determining module 330 is further configured to: determine first preselected information and its corresponding confidence according to the associated bill; determine second preselected information and its corresponding confidence according to the first database; and determine the missing information from the first preselected information and the second preselected information according to the confidences.
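Confidence-based selection can be sketched as picking the higher-confidence candidate of the two. The source does not say how the confidences are computed, whether a minimum threshold applies, or how ties are broken. The threshold parameter and the tie rule below (the associated bill wins, because Python's `max` returns the first maximal element) are assumptions.

```python
from typing import Optional, Tuple

# (preselected value, confidence); None means the source produced no candidate.
Candidate = Optional[Tuple[str, float]]

def choose_by_confidence(
    first: Candidate, second: Candidate, threshold: float = 0.0
) -> Optional[str]:
    """Return the preselected value with the highest confidence at or above threshold."""
    candidates = [c for c in (first, second) if c is not None and c[1] >= threshold]
    if not candidates:
        return None
    # max() keeps the first maximal element, so ties favour the associated bill.
    return max(candidates, key=lambda c: c[1])[0]
```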
In one embodiment, the information extraction module 320 is further configured to: determine a candidate result from the initial recognition result; delete prefix and suffix content from the candidate result to obtain the key information; and/or split the candidate result into phrases to obtain the key information; and/or filter the candidate result according to a second database related to the candidate result to obtain the key information; and/or, when the candidate result includes a seal or a frame, perform a corresponding seal-removal or frame-removal operation to obtain the key information.
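The three text-level rules (affix deletion, phrase splitting, and database filtering) can be sketched as small composable functions. The sketch assumes the following, none of which comes from the source: the prefix and suffix lists, the separator set, and the use of a plain set as the "second database". Seal and frame removal works on the image rather than on text, so it is not shown.

```python
import re
from typing import Iterable, List, Set

# Hypothetical labels printed around values on a bill.
PREFIXES = ("Hospital:", "Name:", "Amount:", "Items:")
SUFFIXES = ("(seal)", "yuan")

def strip_affixes(text: str) -> str:
    """Delete known prefix and suffix content around the value."""
    text = text.strip()
    for p in PREFIXES:
        if text.startswith(p):
            text = text[len(p):]
    for s in SUFFIXES:
        if text.endswith(s):
            text = text[: -len(s)]
    return text.strip()

def split_phrases(text: str) -> List[str]:
    """Split a candidate result into phrases on commas, semicolons or slashes."""
    return [p for p in re.split(r"\s*[,;/]\s*", text) if p]

def filter_by_database(phrases: Iterable[str], known: Set[str]) -> List[str]:
    """Keep only phrases that appear in the (hypothetical) second database."""
    return [p for p in phrases if p in known]

def extract_key_info(candidate: str, known: Set[str]) -> List[str]:
    # One possible combination of the "and/or" rules.
    return filter_by_database(split_phrases(strip_affixes(candidate)), known)
```

Because the rules are joined by "and/or", they can be applied individually or chained as in extract_key_info.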
In one embodiment, the bill recognition module 310 is further configured to: select a corresponding recognition model according to the category of the target bill; and perform OCR recognition on the target bill using the corresponding recognition model.
In one embodiment, the processing apparatus 300 for bill OCR further includes an index information determining module 550, configured to call the corresponding databases based on the items of information in the target recognition result to determine a plurality of pieces of index information.
In one embodiment, the processing apparatus 300 for bill OCR further includes a feature verification module, configured to perform feature verification on a plurality of preprocessed bills to screen out the target bill.
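Category-dependent model selection can be sketched as a registry that maps a bill category to a recognizer. The registry class, the category names, and the fallback to a default model for unregistered categories are all hypothetical. The source does not say what happens for an unknown category.

```python
from typing import Callable, Dict

# A recognizer takes image bytes and returns recognized text.
Recognizer = Callable[[bytes], str]

class ModelRegistry:
    def __init__(self, default: Recognizer) -> None:
        self._models: Dict[str, Recognizer] = {}
        self._default = default  # assumed fallback for unregistered categories

    def register(self, category: str, model: Recognizer) -> None:
        self._models[category] = model

    def recognize(self, category: str, image: bytes) -> str:
        """Select the model for the bill category and run OCR with it."""
        model = self._models.get(category, self._default)
        return model(image)
```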
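The source does not specify which features are verified. The sketch below therefore uses two illustrative checks on each preprocessed bill: required keywords in a coarse text pass, and a minimum image size. The PreprocessedBill fields, REQUIRED_KEYWORDS, and MIN_SIDE are hypothetical choices, not the applicant's criteria.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class PreprocessedBill:
    bill_id: str
    text: str    # hypothetical: coarse text from a lightweight first pass
    width: int   # image width in pixels
    height: int  # image height in pixels

REQUIRED_KEYWORDS = ("invoice",)  # hypothetical feature
MIN_SIDE = 600                    # hypothetical minimum image side in pixels

def screen_target_bills(bills: Iterable[PreprocessedBill]) -> List[PreprocessedBill]:
    """Keep bills that pass every feature check; drop irrelevant ones."""
    targets = []
    for b in bills:
        lowered = b.text.lower()
        has_keywords = all(k in lowered for k in REQUIRED_KEYWORDS)
        big_enough = min(b.width, b.height) >= MIN_SIDE
        if has_keywords and big_enough:
            targets.append(b)
    return targets
```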
For the functions of the modules of each apparatus in the embodiments of the present invention, refer to the corresponding description of the method above. The details are not repeated here.
FIG. 4 shows a block diagram of a processing apparatus for bill OCR according to an embodiment of the present invention. As shown in FIG. 4, the apparatus includes a memory 410 and a processor 420, and the memory 410 stores instructions executable on the processor 420. When executing the instructions, the processor 420 implements the processing method for bill OCR in the above embodiments. There may be one or more memories 410 and one or more processors 420. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are examples only and are not meant to limit the implementations of the present application described and/or claimed herein.
The electronic device may further include a communication interface 430 for communicating with external devices to exchange data. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as required. The processor 420 may process instructions executed within the electronic device, including instructions stored in or on the memory 410, to display graphical information of a GUI on an external input/output device (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories 410, if required. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 4, but this does not mean that there is only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 410, the processor 420, and the communication interface 430 are integrated on one chip, they may communicate with one another through internal interfaces.
It should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. Notably, the processor may be a processor that supports the Advanced RISC Machines (ARM) architecture.
Embodiments of the present application provide a computer-readable storage medium (such as the memory 410 described above) storing computer instructions, which when executed by a processor implement the methods provided in embodiments of the present application.
Optionally, the memory 410 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function. The data storage area may store data created through use of the electronic device, and the like. In addition, the memory 410 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 410 may optionally include memory located remotely from the processor 420, which may be connected to the electronic device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the various embodiments or examples described in this specification and the features of those different embodiments or examples.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means two or more, unless specifically limited otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. Moreover, the scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example, may be considered as a sequential list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. All or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module may also be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
While the present invention has been described with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. A processing method for bill OCR, characterized by comprising the following steps:
performing OCR recognition on the target bill to obtain an initial recognition result;
extracting key information from the initial recognition result according to a preset rule;
when key information is missing, determining the missing information according to an associated bill of the target bill and/or a first database related to the type of the missing information;
and supplementing the missing information in the initial recognition result to obtain a target recognition result.
2. The method of claim 1, wherein determining the missing information according to the associated bill of the target bill and/or the first database related to the type of the missing information comprises:
selecting the associated bill or the first database according to a preset priority to determine the missing information.
3. The method of claim 1, wherein determining the missing information according to the associated bill of the target bill and/or the first database related to the type of the missing information comprises:
determining first preselected information and its corresponding confidence according to the associated bill;
determining second preselected information and its corresponding confidence according to the first database; and
determining the missing information from the first preselected information and the second preselected information according to the confidences.
4. The method according to claim 1, wherein extracting key information from the initial recognition result according to a preset rule comprises:
determining a candidate result from the initial recognition result;
deleting prefix and suffix content from the candidate result to obtain the key information; and/or
splitting the candidate result into phrases to obtain the key information; and/or
filtering the candidate result according to a second database related to the candidate result to obtain the key information; and/or
when the candidate result comprises a seal or a frame, performing a corresponding seal-removal or frame-removal operation to obtain the key information.
5. The method of claim 1, wherein performing OCR recognition on the target bill comprises:
selecting a corresponding recognition model according to the category of the target bill; and
performing OCR recognition on the target bill using the corresponding recognition model.
6. The method according to any one of claims 1-5, further comprising:
calling corresponding databases based on the items of information in the target recognition result to determine a plurality of pieces of index information.
7. The method according to any one of claims 1-5, further comprising:
performing feature verification on a plurality of preprocessed bills to screen out the target bill.
8. A processing apparatus for bill OCR, comprising:
a bill recognition module, configured to perform OCR recognition on the target bill to obtain an initial recognition result;
an information extraction module, configured to extract key information from the initial recognition result according to a preset rule;
a missing information determining module, configured to, when key information is missing, determine the missing information according to an associated bill of the target bill and/or a first database related to the type of the missing information; and
a result determining module, configured to add the missing information to the initial recognition result to obtain a target recognition result.
9. The apparatus of claim 8, wherein the missing information determining module is further configured to select the associated bill or the first database according to a preset priority to determine the missing information.
10. The apparatus of claim 8, wherein the missing information determining module is further configured to:
determine first preselected information and its corresponding confidence according to the associated bill;
determine second preselected information and its corresponding confidence according to the first database; and
determine the missing information from the first preselected information and the second preselected information according to the confidences.
11. The apparatus of claim 8, wherein the information extraction module is further configured to:
determine a candidate result from the initial recognition result;
delete prefix and suffix content from the candidate result to obtain the key information; and/or
split the candidate result into phrases to obtain the key information; and/or
filter the candidate result according to a second database related to the candidate result to obtain the key information; and/or
when the candidate result comprises a seal or a frame, perform a corresponding seal-removal or frame-removal operation to obtain the key information.
12. The apparatus of claim 8, wherein the bill recognition module is further configured to:
select a corresponding recognition model according to the category of the target bill; and
perform OCR recognition on the target bill using the corresponding recognition model.
13. The apparatus of claim 8, further comprising:
an index information determining module, configured to call corresponding databases based on the items of information in the target recognition result to determine a plurality of pieces of index information.
14. The apparatus of claim 8, further comprising:
a feature verification module, configured to perform feature verification on a plurality of preprocessed bills to screen out the target bill.
15. A processing apparatus for bill OCR, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A computer readable storage medium having stored therein computer instructions which, when executed by a processor, implement the method of any one of claims 1-7.
CN202111185315.4A 2021-10-12 2021-10-12 Processing method, device and equipment for bill OCR and computer readable storage medium Pending CN113850261A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111185315.4A CN113850261A (en) 2021-10-12 2021-10-12 Processing method, device and equipment for bill OCR and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111185315.4A CN113850261A (en) 2021-10-12 2021-10-12 Processing method, device and equipment for bill OCR and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113850261A true CN113850261A (en) 2021-12-28

Family

ID=78978145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111185315.4A Pending CN113850261A (en) 2021-10-12 2021-10-12 Processing method, device and equipment for bill OCR and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113850261A (en)

Similar Documents

Publication Publication Date Title
US20210049708A1 (en) Tax document imaging and processing
US11328365B2 (en) Systems and methods for insurance fraud detection
WO2020134991A1 (en) Automatic input method for paper form, apparatus , and computer device and storage medium
CN101944154B (en) Medical image interpretation system
US9916626B2 (en) Presentation of image of source of tax data through tax preparation application
AU2020227069A1 (en) Aggregation source routing
US11972201B2 (en) Facilitating auto-completion of electronic forms with hierarchical entity data models
US20070100659A1 (en) Management of clinical data exceptions in clinical information systems
WO2023015935A1 (en) Method and apparatus for recommending physical examination item, device and medium
WO2021068629A1 (en) Electronic visa application method and apparatus
US20140215301A1 (en) Document template auto discovery
US20150178346A1 (en) Using biometric data to identify data consolidation issues
CN116612486A (en) Risk group identification method and system based on image identification and graph calculation
CN113850261A (en) Processing method, device and equipment for bill OCR and computer readable storage medium
CN105913071A (en) Information processing device, information processing system and information processing method
CN115050042A (en) Claims data entry method and device, computer equipment and storage medium
CN113793677A (en) Electronic medical record management method and device, storage medium and electronic equipment
US20060116961A1 (en) Method and apparatus for processing checks into an electronic funds transfer system
EP1455305B1 (en) Method for providing quantitative data and images for use in pathology analysis
US11475039B2 (en) Augmented reality database synchronization system
CN116825268A (en) Medical examination report processing method, device, medium and equipment
KR20170118408A (en) Transaction Information Managing System using Optical Character Reader System and Computerized Transaction Information Managing Method using It
RU2652946C1 (en) Method of recognition of payment documents
TWM593619U (en) Real-time claim system
CN117975474A (en) Picture classification method and computing device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination