CN111444792B - Bill identification method, electronic equipment, storage medium and device - Google Patents



Publication number
CN111444792B
Authority
CN
China
Prior art keywords
bill
identification
ticket
fields
image slice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010176514.8A
Other languages
Chinese (zh)
Other versions
CN111444792A (en)
Inventor
孟波川
黄煦
李建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Acer Beijing Information Technology Co ltd
Original Assignee
Acer Beijing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Acer Beijing Information Technology Co ltd
Priority to CN202010176514.8A
Publication of CN111444792A
Application granted
Publication of CN111444792B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Image Analysis (AREA)

Abstract

A bill identification method, electronic equipment, a storage medium and a device are disclosed. The method comprises the following steps: establishing a bill template library comprising a plurality of identification area positioning templates based on the bill face features of each bill; identifying the bill face features of a bill picture uploaded by a user and matching the corresponding identification area positioning template based on those features; cutting the bill picture based on the identification area positioning template to obtain a plurality of image slices of the bill picture; identifying the fields in each image slice through an OCR automatic identification algorithm, and screening out image slices with overlapped bill fields and blurred image slices that the OCR automatic identification algorithm cannot accurately identify; filtering the image slices with overlapped bill fields and extracting the corresponding field information; and establishing corresponding manual identification dispatch tasks for the blurred image slices and distributing them to a plurality of crowdsourcing personnel. The method improves both the recognition precision and the recognition efficiency of bill pictures.

Description

Bill identification method, electronic equipment, storage medium and device
Technical Field
The present invention relates to the field of data processing, and in particular, to a bill identifying method, an electronic device, a storage medium, and an apparatus.
Background
OCR (Optical Character Recognition) is an important research direction in the field of pattern recognition. In recent years, with the rapid iteration of mobile devices and the rapid development of the mobile internet, OCR has found wider application, moving from character recognition in scanned documents to recognition of text in pictures of natural scenes, such as the characters on identity cards, bank cards, house cards, bills and various network pictures.
Large enterprises, public institutions, hospital physical examination services, the insurance industry and the like have a large number of bills whose information must be collected, entered and electronically archived. The degree of digital management of bills is currently low, and the manual entry and manual filing that are used are labor-intensive, inefficient, costly and error-prone. Although machine learning methods based on OCR technology can currently be used to recognize bills, the recognition accuracy is not high, which causes errors in the bill information, so bills cannot be archived quickly and working efficiency cannot be improved.
Therefore, it is necessary to develop a bill identifying method to improve the identification accuracy and identification efficiency of bill data.
Disclosure of Invention
The invention provides a bill identification method, electronic equipment, a storage medium and a device. Identification area positioning templates are established for different bills to form a template library; a bill picture is sliced using its identification area positioning template and the slice images are identified; pictures containing overlapped parts are filtered before identification; and blurred fields are identified manually through crowdsourcing. Together these measures improve the identification precision and efficiency of bill pictures.
The bill identification method comprises the following steps:
collecting ticket face characteristics of a plurality of different tickets, and establishing a ticket template library comprising a plurality of identification area positioning templates based on the ticket face characteristics of each ticket;
identifying ticket face characteristics of ticket pictures uploaded by users, and matching corresponding identification area positioning templates from a template library based on the ticket face characteristics;
cutting the bill picture based on the identification area positioning template to obtain a plurality of image slices corresponding to a plurality of bill fields in different areas in the bill picture;
identifying the fields in each image slice through an OCR automatic identification algorithm, and screening out the image slices with overlapped bill fields and the blurred image slices which cannot be accurately identified by the OCR automatic identification algorithm;
filtering the image slices with overlapped bill fields and extracting corresponding field information;
establishing a corresponding manual identification dispatching task for the blurred image slice, and simultaneously distributing the manual identification dispatching task to a plurality of crowdsourcing personnel;
and receiving manual identification field information returned by manual identification, and respectively carrying out structural output on the manual identification field information and the field information identified by the OCR automatic identification algorithm.
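The final step, structured output of both the OCR-identified and the manually identified field information, can be illustrated with a minimal sketch. The JSON layout, the `source` label and the function name `structured_output` are assumptions made for illustration; the text does not specify an output format.

```python
import json


def structured_output(ocr_fields, manual_fields):
    """Emit one record per field, labeled with how it was identified.

    ocr_fields / manual_fields: dicts mapping a field attribute to its value.
    The record layout is a hypothetical choice, not taken from the patent.
    """
    records = [
        {"field": name, "value": value, "source": "ocr"}
        for name, value in ocr_fields.items()
    ]
    records += [
        {"field": name, "value": value, "source": "manual"}
        for name, value in manual_fields.items()
    ]
    # ensure_ascii=False keeps Chinese field values readable in the output.
    return json.dumps(records, ensure_ascii=False)


print(structured_output({"amount": "128.50"}, {"user_name": "Zhang San"}))
```

Keeping the two sources as separate records mirrors the wording "respectively carrying out structural output" and leaves any later merging to downstream consumers.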
Optionally, establishing a ticket template library including a plurality of identification area location templates based on the ticket face characteristics of each ticket includes:
acquiring necessary bill fields, bill categories and positions of unit information of each bill based on the bill surface characteristics of each bill, and establishing an identification area positioning template corresponding to each bill;
associating each identification area positioning template with the bill category and the unit information of the corresponding bill, and establishing a bill template library comprising a plurality of identification area positioning templates; wherein
the identification area positioning template comprises a plurality of frame selection identification areas corresponding to positions of a plurality of necessary bill fields in the bill picture, and each frame selection identification area corresponds to different field attributes.
Optionally, identifying the ticket face feature of the ticket picture uploaded by the user, and matching the corresponding identification area positioning template from the template library based on the ticket face feature comprises:
and identifying the bill category and the unit information of the bill picture uploaded by the user, and matching the corresponding identification area positioning template from the bill template library by an accurate matching method or a fuzzy matching method based on the identification result.
Optionally, obtaining a plurality of image slices corresponding to a plurality of bill fields in different areas of the bill picture further includes:
associating each image slice with the field attribute of the corresponding frame selection identification area;
filtering the image slice with overlapped bill fields and extracting corresponding field information comprises the following steps:
and filtering a plurality of overlapped bill fields in the image slice based on the field attribute associated with the image slice, identifying and extracting field information only for bill fields corresponding to the field attribute, and outputting structured data of the field information extracted by identification.
Optionally, establishing a corresponding manual identification dispatching task for the blurred image slice, and distributing the manual identification dispatching task to a plurality of crowdsourcing personnel simultaneously includes:
judging whether privacy information exists in the blurred image slice or not based on field attributes of the blurred image slice;
if privacy information exists, desensitizing the blurred image slice, and then establishing a manual identification dispatch task of the blurred image slice; if the privacy information does not exist, the manual identification dispatching task of the blurred image slice is directly established, and the manual identification dispatching task is simultaneously distributed to a plurality of crowdsourcing personnel.
Optionally, desensitizing the blurred image slice, and then establishing a manual identification dispatch task of the blurred image slice includes:
fragmenting the blurred image slice with the privacy information to obtain a plurality of fragment images of the blurred image slice, and binding each fragment image in the plurality of fragment images with a unique code according to a preset rule, wherein each fragment image comprises part of fields in the blurred image slice;
and respectively establishing a manual identification dispatch task of each fragment image aiming at the plurality of fragment images.
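The fragmentation step can be sketched as follows. This is only an illustration under stated assumptions: the slice is modeled as a list of pixel rows, fragments are vertical strips of equal width, the "preset rule" for unique codes is replaced by random UUIDs, and the `reassemble` helper (joining returned texts in fragment order) is a hypothetical addition that the text does not describe.

```python
import uuid


def fragment_slice(pixels, parts):
    """Split a slice (list of pixel rows) into vertical strips.

    Each strip holds only part of the field and is bound to a unique code,
    so no single crowdsourcing worker sees the whole private field.
    """
    width = len(pixels[0])
    step = -(-width // parts)  # ceiling division
    fragments = []
    for order, start in enumerate(range(0, width, step)):
        fragments.append({
            "code": uuid.uuid4().hex,   # assumed stand-in for the preset rule
            "order": order,             # position of the strip in the slice
            "pixels": [row[start:start + step] for row in pixels],
        })
    return fragments


def reassemble(texts_by_code, fragments):
    """Join manually identified texts back together in strip order."""
    ordered = sorted(fragments, key=lambda f: f["order"])
    return "".join(texts_by_code[f["code"]] for f in ordered)


strips = fragment_slice([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]], parts=3)
print([f["pixels"] for f in strips])
```

One manual identification dispatch task would then be created per fragment, keyed by its code.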
Optionally, after distributing the manual identification dispatching task to a plurality of crowdsourcing personnel simultaneously, the method further comprises:
receiving order acceptance information from the plurality of crowdsourcing personnel;
sending the blurred image slice or the fragment image to at least one crowdsourcing person who has accepted the order, who then manually identifies the blurred image slice or the fragment image;
and receiving manual identification field information returned by manual identification.
The invention also proposes an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the ticket identification method described above.
The present invention also proposes a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the ticket recognition method described above.
The invention also provides a bill identifying device, which comprises:
the template building module is used for collecting ticket face characteristics of a plurality of different tickets and building a ticket template library comprising a plurality of identification area positioning templates based on the ticket face characteristics of each ticket;
the template matching module is used for identifying the ticket face characteristics of the ticket pictures uploaded by the user and matching the corresponding identification area positioning templates from the template library based on the ticket face characteristics;
the cutting processing module is used for cutting the bill picture based on the identification area positioning template so as to obtain a plurality of image slices corresponding to a plurality of bill fields in different areas in the bill picture;
the OCR automatic recognition module is used for recognizing the fields in each image slice through an OCR automatic recognition algorithm and screening out the image slices with overlapped bill fields and the blurred image slices which cannot be accurately recognized by the OCR automatic recognition algorithm;
the filtering processing module is used for filtering the image slices with overlapped bill fields and extracting corresponding field information;
the dispatching module is used for establishing a corresponding manual identification dispatching task for the blurred image slice and distributing the manual identification dispatching task to a plurality of crowdsourcing personnel simultaneously;
and the data output module is used for receiving the manual identification field information returned by manual identification and respectively carrying out structural output on the manual identification field information and the field information identified by the OCR automatic identification algorithm.
The invention has the beneficial effects that:
By establishing identification area positioning templates for different bills and forming a template library, bill pictures can be sliced using the identification area positioning templates and the slice images identified, which improves the efficiency of automatic OCR identification; pictures containing overlapped parts are filtered before identification, which improves the accuracy of automatic OCR identification. Blurred fields are identified manually through crowdsourcing, and a credit system covering the feedback timeliness and accuracy of crowdsourcing personnel is also established. Complex tasks with tight deadlines and a large workload are decomposed, through specialized division of labor, into many simple tasks with a small workload, and crowdsourced manual assistance makes the fullest use of the human resources of society. This achieves efficient, quality-assured manually assisted bill identification, improves the precision and efficiency of manual identification, and thereby improves the overall accuracy and efficiency of bill identification, giving the method high value for market promotion.
The method and apparatus of the present invention have other features and advantages which will be apparent from or are set forth in detail in the accompanying drawings and the following detailed description, which are incorporated herein, and which together serve to explain certain principles of the present invention.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular descriptions of exemplary embodiments of the invention as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the invention.
Fig. 1 shows a flow chart of the steps of a ticket identification method according to the invention.
Detailed Description
The invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are illustrated in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 shows a flow chart of the steps of a ticket identification method according to the invention.
As shown in fig. 1, a bill identifying method according to the present invention includes:
collecting ticket face characteristics of a plurality of different tickets, and establishing a ticket template library comprising a plurality of identification area positioning templates based on the ticket face characteristics of each ticket;
identifying ticket face characteristics of the ticket pictures uploaded by the user, and matching corresponding identification area positioning templates from a template library based on the ticket face characteristics;
cutting the bill picture based on the identification area positioning template to obtain a plurality of image slices corresponding to a plurality of bill fields in different areas in the bill picture;
identifying fields in each image slice through an OCR automatic identification algorithm, and screening out image slices with overlapped bill fields and blurred image slices which cannot be accurately identified by the OCR automatic identification algorithm;
filtering the image slices with overlapped bill fields and extracting corresponding field information;
establishing a corresponding manual identification dispatching task for the blurred image slice, and simultaneously distributing the manual identification dispatching task to a plurality of crowdsourcing personnel;
and receiving manual identification field information returned by manual identification, and respectively carrying out structural output on the manual identification field information and the field information identified by the OCR automatic identification algorithm.
Specifically, by establishing identification area positioning templates for different bills and forming a template library, bill pictures can be sliced using the identification area positioning templates and the slice images identified, which improves the efficiency of automatic OCR identification; pictures containing overlapped parts are filtered before identification, which improves the accuracy of automatic OCR identification. Blurred fields are identified manually through crowdsourcing, and a credit system covering the feedback timeliness and accuracy of crowdsourcing personnel is also established. Complex tasks with tight deadlines and a large workload are decomposed, through specialized division of labor, into many simple tasks with a small workload, and crowdsourced manual assistance makes the fullest use of the human resources of society. This achieves efficient, quality-assured manually assisted bill identification, improves the precision and efficiency of manual identification, and thereby improves the overall accuracy and efficiency of bill identification.
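The screening step in the method above, which separates slices with overlapped fields and blurred slices from those that OCR handles directly, might be sketched as below. The text does not say how a slice is judged blurred or overlapped; the confidence threshold `min_confidence`, the `texts` list per slice and the function name `route_slices` are all assumptions made for illustration.

```python
def route_slices(ocr_results, min_confidence=0.8):
    """Sort OCR results into direct, overlapped and blurred groups.

    Each result is a dict with "field_attribute", "texts" (all strings OCR
    found in the slice) and "confidence" (assumed to lie between 0 and 1).
    """
    routed = {"direct": [], "overlapped": [], "blurred": []}
    for result in ocr_results:
        if result["confidence"] < min_confidence:
            # OCR cannot identify this slice reliably: send to manual dispatch.
            routed["blurred"].append(result)
        elif len(result["texts"]) > 1:
            # More than one field in the slice: needs attribute-based filtering.
            routed["overlapped"].append(result)
        else:
            routed["direct"].append(result)
    return routed


results = [
    {"field_attribute": "amount", "texts": ["128.50"], "confidence": 0.97},
    {"field_attribute": "date", "texts": ["2020-03-12", "Zhang San"], "confidence": 0.91},
    {"field_attribute": "user_name", "texts": ["?"], "confidence": 0.35},
]
for group, items in route_slices(results).items():
    print(group, [item["field_attribute"] for item in items])
```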
In one example, a bill identification service platform can be developed to provide a bill identification service. Bills of various types are actively collected from partners (hospitals, insurance companies and the like in various places) to gather bill feature data, and the distribution characteristics of the bill information of different units are determined. Identification area positioning templates corresponding to the different bills are then established, and each template is associated with the bill category of the corresponding bill and the information of the unit to which it belongs, so that a bill template library is built. When a user needs a bill identified, the user uploads the photographed or scanned bill picture as a bill image file in the specified format. The system automatically identifies the category and unit information of the bill, and searches the bill template library for the identification area positioning template matching the bill, based on the category and unit information associated with each template.
In another example, the category of the uploaded bill and the unit to which it belongs (such as a hospital or an insurance company) may be determined manually by an operator. If the bill is of a type not covered by the service, or no identification area positioning template has been established for the corresponding unit, bill return information may be sent. If the bills (invoices, etc.) of the unit have not been modeled, a modeling task is created, and business from the newly added unit can be processed after a period of time (for example, the next working day).
In one example, building a ticket template library comprising a plurality of identification area location templates based on ticket face characteristics of each ticket comprises:
acquiring necessary bill fields, bill types and positions of unit information of the bills based on the bill surface characteristics of the bills, and establishing an identification area positioning template corresponding to the bills;
associating each identification area positioning template with the bill category and the unit information of the corresponding bill, and establishing a bill template library comprising a plurality of identification area positioning templates; wherein
the identification area positioning template comprises a plurality of frame selection identification areas corresponding to the positions of a plurality of necessary bill fields in the bill picture, and each frame selection identification area corresponds to different field attributes.
Specifically, based on the bill face features in the collected bill pictures, the positions of the necessary bill fields, the bill category and the unit information of each bill are acquired, and an identification area positioning template corresponding to each bill is established. A bill template library comprising the identification area positioning templates for the various bills is then built, with each template associated with the bill category and the unit information of its bill. In the template matching stage, the field information in the bill category and unit information of the bill image is identified, and the corresponding identification area positioning template is retrieved from the bill template library by exact matching or fuzzy matching based on the identification result. During identification, the matched template is used to cut out the areas of the necessary bill fields, and the fields in the slice images are then identified, which improves identification accuracy and efficiency.
Associating each identification area positioning template with the bill category and the unit information of the corresponding bill specifically means associating the template with all fields of the bill category and unit information, and with the keywords in them. The bill category is the type or domain of the bill, such as a commercial insurance reimbursement bill, a medical insurance reimbursement bill, a hospital reimbursement bill or an insurance company reimbursement bill. The unit information is the full name of the enterprise, hospital or other unit printed on the bill, or the full unit name in the financial seal stamped on the bill. The keywords may include regional keywords (province, city, county, etc.) and keywords from the full unit name, such as XX hospital or XX insurance. Those skilled in the art may set these according to the specific bill category and full unit name, and the details are not repeated here. The necessary bill fields include the name of the hospital or insurance company, the user name, the amount, the drug name, the date, and the like.
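One possible in-memory shape for such a template library is sketched below. The class names `BoxRegion`, `LocatingTemplate` and `TemplateLibrary`, the pixel-box coordinates and the lookup key are hypothetical; the text only requires that each template hold several frame selection identification areas with field attributes and be associated with a bill category, unit information and keywords.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class BoxRegion:
    """One frame selection identification area and its field attribute."""
    field_attribute: str  # e.g. "date", "amount", "drug_name"
    left: int
    top: int
    right: int
    bottom: int


@dataclass
class LocatingTemplate:
    """Identification area positioning template for one kind of bill."""
    bill_category: str          # e.g. "medical insurance reimbursement bill"
    unit_name: str              # full name of the issuing unit
    keywords: Tuple[str, ...]   # regional and unit-name keywords
    regions: List[BoxRegion] = field(default_factory=list)


class TemplateLibrary:
    """Templates indexed by (bill category, full unit name)."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], LocatingTemplate] = {}

    def add(self, template: LocatingTemplate) -> None:
        self._by_key[(template.bill_category, template.unit_name)] = template

    def get(self, bill_category: str, unit_name: str) -> Optional[LocatingTemplate]:
        return self._by_key.get((bill_category, unit_name))


library = TemplateLibrary()
library.add(LocatingTemplate(
    "medical insurance reimbursement bill", "XX Hospital", ("XX", "hospital"),
    [BoxRegion("date", 10, 20, 110, 50), BoxRegion("amount", 10, 60, 110, 90)],
))
print(library.get("medical insurance reimbursement bill", "XX Hospital"))
```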
In one example, identifying ticket features of a ticket picture uploaded by a user and matching corresponding identification area locating templates from a template library based on the ticket features includes:
and identifying the bill category and the unit information of the bill picture uploaded by the user, and matching the corresponding identification area positioning template from the bill template library by an accurate matching method or a fuzzy matching method based on the identification result.
Specifically, in this step, if the bill picture uploaded by the user is skewed or distorted, it can be corrected with an existing image correction algorithm, so that the bill picture as a whole and its necessary bill fields correspond one-to-one with the frame selection identification areas in the corresponding identification area positioning template. The bill picture is then cut based on the frame selection identification areas in the template to obtain image slices in one-to-one correspondence with those areas, and each image slice is associated with the field attribute of its frame selection identification area, so that the fields in each image slice can be matched to the field attribute and field extraction becomes more accurate.
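The cutting step can be sketched in a few lines, assuming the corrected bill picture is available as a list of pixel rows and each frame selection identification area is a `(field_attribute, left, top, right, bottom)` tuple. Both representations and the name `cut_slices` are illustrative assumptions; a real system would more likely crop with an imaging library.

```python
def cut_slices(image, regions):
    """Cut one image slice per frame selection identification area.

    image: list of pixel rows (already deskewed).
    regions: iterable of (field_attribute, left, top, right, bottom),
             with right and bottom exclusive.
    Each slice keeps the field attribute of the area it came from.
    """
    slices = []
    for field_attribute, left, top, right, bottom in regions:
        pixels = [row[left:right] for row in image[top:bottom]]
        slices.append({"field_attribute": field_attribute, "pixels": pixels})
    return slices


# A toy 4x6 "image" whose pixel value encodes its row and column.
image = [[row * 10 + col for col in range(6)] for row in range(4)]
for item in cut_slices(image, [("date", 0, 0, 3, 2), ("amount", 3, 2, 6, 4)]):
    print(item)
```

Carrying the field attribute along with the pixels is what later allows overlapped fields to be filtered per slice.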
In one example, the process of the bill picture matching identification area positioning template is as follows:
(1) First, judge whether the fields of the bill category and unit information in the bill picture uploaded by the user are clear; if so, identify all fields of the bill category and unit information; if not, identify part of the fields of the bill category and unit information;
(2) If all fields were identified, search the bill template library directly with an existing exact character matching algorithm, comparing them against all fields of the bill category and unit information associated with each identification area positioning template, so as to match the corresponding template; otherwise, search the bill template library with an existing fuzzy character matching algorithm, comparing the identified keywords against the keywords of the bill category and unit information associated with each template, so as to match the corresponding template.
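The two-branch matching process above can be sketched as follows. The text only refers to "existing" exact and fuzzy matching algorithms, so this sketch swaps in Python's standard `difflib.SequenceMatcher` as the fuzzy matcher; the template ids, the sample entries in `TEMPLATE_KEYS` and the 0.6 similarity threshold are assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical library index: template id -> (bill category, full unit name).
TEMPLATE_KEYS = {
    "tpl-001": ("medical insurance reimbursement bill", "XX City First Hospital"),
    "tpl-002": ("business insurance reimbursement bill", "XX Insurance Company"),
}


def match_template(category, unit, fields_clear, threshold=0.6):
    """Return the id of the matching template, or None if nothing matches."""
    if fields_clear:
        # Branch 1: all fields were read, so require exact equality.
        for template_id, key in TEMPLATE_KEYS.items():
            if key == (category, unit):
                return template_id
        return None
    # Branch 2: only part of the text was read, so score by similarity.
    query = f"{category} {unit}"
    best_id, best_score = None, 0.0
    for template_id, (tpl_category, tpl_unit) in TEMPLATE_KEYS.items():
        score = SequenceMatcher(None, query, f"{tpl_category} {tpl_unit}").ratio()
        if score > best_score:
            best_id, best_score = template_id, score
    return best_id if best_score >= threshold else None


print(match_template("medical insurance", "First Hospital", fields_clear=False))
```

Returning None when no candidate clears the threshold corresponds to the case described earlier in which the bill is returned or a modeling task is created.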
Further, the collected mass of bill data can serve as training samples for a neural network used in OCR machine recognition, realizing OCR-based AI recognition. In this embodiment, in addition to the bill template library, a project classification library, an ICD coding library, a drug library, a diagnosis and treatment library, an operation coding library, a hospital library, a national medical insurance library, an insurance rule library and the like may also be established as supporting background databases, and final logic auditing and data correction of the OCR recognition results can be performed based on these databases.
In one example, obtaining a plurality of image slices corresponding to a plurality of bill fields in different areas of the bill picture further includes:
associating each image slice with a field attribute corresponding to the frame selection identification area;
filtering the image slice with overlapped bill fields and extracting the corresponding field information comprises the following steps:
and filtering a plurality of overlapped bill fields in the image slice based on the field attribute associated with the image slice, identifying and extracting field information only for bill fields corresponding to the field attribute, and outputting structured data for the identified and extracted field information.
Specifically, in the field identification and extraction process, whether overlapped bill fields exist in each image slice needs to be judged; if only a single bill field exists in the image slice, directly identifying and extracting field information from the bill field in the slice image, and outputting structured data from the field information extracted by the identification; if the image slice comprises a plurality of overlapped bill fields, filtering the bill fields based on field attributes associated with the image slice, identifying and extracting field information only for the bill fields corresponding to the field attributes, and outputting structured data of the field information extracted by identification.
In one example, if an image slice includes several overlapped bill fields, such as a date overlapping a signature on the bill face, the whole overlapped area formed by those fields is cut out as a single image slice and all field information in the slice is recognized. The field information to be extracted is then determined from the field attribute of the image slice: if the field attribute is date, the signature-related field information is filtered out and only the date-related field information is extracted and output; if the field attribute is signature, the date-related field information is filtered out and only the signature-related field information is extracted and output. The accuracy of field identification can thereby be effectively improved.
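A minimal sketch of this attribute-based filtering is given below. It assumes the OCR step has already returned every text fragment found in the overlapped slice; the date pattern, the two-way classifier and the function names are hypothetical simplifications for the date/signature case only.

```python
import re

# Assumed date layout for the illustration, e.g. 2020-03-13 or 2020/3/13.
DATE_PATTERN = re.compile(r"\d{4}[-/.]\d{1,2}[-/.]\d{1,2}")


def classify_fragment(text):
    """Guess which field attribute a recognized text fragment belongs to."""
    return "date" if DATE_PATTERN.fullmatch(text.strip()) else "signature"


def extract_field(recognized_fragments, field_attribute):
    """Keep only the fragments that match the slice's field attribute."""
    kept = [t for t in recognized_fragments if classify_fragment(t) == field_attribute]
    return {"field": field_attribute, "value": " ".join(kept)}
```

For a slice whose field attribute is date, `extract_field(["2020-03-13", "Zhang San"], "date")` discards the signature text and returns only the date as structured data.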
In one example, establishing a corresponding manual identification dispatch task for a fuzzy image slice and distributing the manual identification dispatch task to a plurality of crowdsourcing personnel simultaneously includes:
judging whether privacy information exists in the blurred image slice or not based on field attributes of the blurred image slice;
if privacy information exists, desensitizing the fuzzy image slice, and then establishing a manual identification dispatch task of the fuzzy image slice; if the privacy information does not exist, a manual identification dispatching task of the fuzzy image slice is directly established, and the manual identification dispatching task is simultaneously distributed to a plurality of crowdsourcing personnel.
Specifically, whether privacy information exists in each image slice can be judged from the attribute of the picture, and desensitization processing is carried out on the image slices in which privacy information exists. The blurred image slices that cannot be accurately identified by the OCR automatic identification algorithm are screened out and then dispatched for crowdsourced manual identification, so that the field information in the blurred image slices is acquired with manual assistance. This effectively improves the accuracy of bill image identification while saving labor cost and improving identification efficiency. A blurred image slice is one in which the fields are displayed unclearly, for example tilted, blurred or overlapped, so that the OCR automatic recognition algorithm cannot obtain an accurate recognition result or cannot recognize the image at all; manual recognition is therefore required to ensure the accuracy of bill recognition. In this scheme, the manual supplement is realized through crowdsourcing over the external network. The crowdsourcing personnel are mainly personnel with professional data-entry skills; cutting the bill pictures (cutting processing) lowers the entry skill required, which helps enlarge the pool of crowdsourcing personnel and makes the manual supplement efficient. Qualified crowdsourcing personnel can also be screened and classified by means of qualification certification (such as an online examination).
In one example, after the manual identification dispatch task is distributed to a plurality of crowdsourcing personnel simultaneously, the method further comprises:
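The routing decision described above can be illustrated with a short sketch. The set of sensitive field attributes, the task dictionary layout and the default fragment count are assumptions made for the example and are not specified in the text.

```python
# Hypothetical set of field attributes treated as privacy information.
SENSITIVE_ATTRIBUTES = {"id_number", "phone", "bank_account"}


def plan_tasks(slice_id, field_attribute, fragment_count=3):
    """Decide how a blurred slice is turned into manual identification tasks.

    Slices with a sensitive field attribute are desensitized (here: split into
    several fragment tasks); all other slices become a single task.
    """
    if field_attribute in SENSITIVE_ATTRIBUTES:
        return [
            {"task": f"{slice_id}/fragment-{i}", "desensitized": True}
            for i in range(fragment_count)
        ]
    return [{"task": slice_id, "desensitized": False}]
```

Because the decision depends only on the field attribute attached to the slice during cutting, no additional recognition pass is needed before dispatch.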
receiving order receiving information of a plurality of crowdsourcing personnel;
the blurred image slice or fragment image is sent to at least one crowdsourcing person who receives the bill, and the crowdsourcing person who receives the bill manually identifies the blurred image slice or fragment image;
and receiving manual identification field information returned by manual identification.
The desensitization of a blurred image slice containing privacy information is specifically implemented as follows:
fragmenting the blurred image slice with the privacy information to obtain a plurality of fragment images of the blurred image slice, and binding each fragment image in the plurality of fragment images with a unique code according to a preset rule, wherein each fragment image contains part of fields in the blurred image slice;
respectively establishing a manual identification dispatch task of each fragment image aiming at a plurality of fragment images;
distributing a plurality of manual identification dispatch tasks of a plurality of fragment images to a plurality of different crowdsourcing personnel simultaneously;
receiving order receiving information of a plurality of crowdsourcing personnel, transmitting each fragment image and its bound unique code to a crowdsourcing person who has received the order, the crowdsourcing person manually identifying the fields in the fragment image and returning the result of the corresponding manual identification field together with the bound unique code;
and receiving returned results of the plurality of manual identification fields, and after the plurality of fragment images of the blurred image slice return the results of the manual identification fields, splicing the results of the plurality of manual identification fields corresponding to the plurality of fragment images according to a preset rule based on unique codes bound by the results of each manual identification field so as to form complete field information corresponding to the blurred image slice.
Specifically, after the image slices with sensitive privacy information (such as an identity card number and the like) fields are subjected to fragmentation processing, a plurality of fragment images are formed, then the plurality of fragment images are respectively distributed to different crowdsourcing personnel as separate manual identification dispatching tasks, each fragment image is bound with a unique code during the fragmentation processing, a certain logic relationship exists among the unique codes of the plurality of fragment images formed by fragmentation of each image slice, after each fragment image is manually identified by the crowdsourcing personnel and a field result of manual identification is returned, splicing and restoring can be carried out based on the original logic relationship and the unique codes respectively bound by the plurality of fragment images, so that desensitization of the privacy information is realized and an output result of a complete manual identification field corresponding to one image slice can be obtained.
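The fragmentation and splicing steps can be sketched as below. The sketch models an image slice as a plain 2-D list of pixel values, cuts it into vertical strips, and uses a code of the form `<slice id>-<index>` as the "preset rule"; all of these choices, and the function names, are assumptions for illustration. A real system would probably use codes that do not reveal the fragment order to the worker.

```python
import uuid


def fragment_slice(pixels, parts):
    """Split a 2-D pixel grid (list of rows) into `parts` vertical strips.

    Each strip is bound to a unique code derived from a random slice id and
    the strip's position, which is the assumed "preset rule".
    """
    width = len(pixels[0])
    slice_id = uuid.uuid4().hex[:8]
    bounds = [round(i * width / parts) for i in range(parts + 1)]
    fragments = []
    for i in range(parts):
        strip = [row[bounds[i]:bounds[i + 1]] for row in pixels]
        fragments.append({"code": f"{slice_id}-{i:03d}", "pixels": strip})
    return slice_id, fragments


def splice_results(slice_id, results, parts):
    """Join per-fragment texts in code order.

    `results` maps unique code -> manually identified text. Returns None
    until every fragment of the slice has a result.
    """
    ordered = []
    for i in range(parts):
        code = f"{slice_id}-{i:03d}"
        if code not in results:
            return None
        ordered.append(results[code])
    return "".join(ordered)
```

Since each worker only sees one strip, no single worker can read the complete sensitive field, while the server can still restore it from the codes regardless of the order in which results arrive.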
In the specific implementation process, before dispatching, an artificial intelligence technology can be utilized to ensure information security and generate a simple and universal document description for each slice image at the same time, so that crowd-sourced personnel receiving the bill can quickly identify and fill in an identification result.
In one example, preferably, establishing the manually identified dispatch task for the blurred image slice further includes: setting a receipt time limit for manually identifying the dispatch task and a feedback time limit for completing the manually identifying the dispatch task; if no crowdsourcing personnel receive the order within the order receiving time limit or no crowdsourcing personnel complete the manual identification order sending task within the feedback time, carrying out secondary distribution, and sending the manual identification order sending task to the crowdsourcing personnel different from the last time.
Specifically, by setting the order receiving time limit and the result feedback time limit, the method is beneficial to realizing quick order dispatch and enabling a user to quickly feedback results, and when no person receives an order or feeds back the results, the order dispatch can be timely carried out again to other crowdsourcing personnel, so that timeliness of bill identification can be effectively improved.
In the specific implementation process, in order to improve the enthusiasm of crowdsourcing personnel, a certain material reward can be given to them, and by establishing an order-grabbing mechanism a manual identification dispatch task can be assigned only to the crowdsourcing person who receives the order first. A credit value can also be established for each crowdsourcing person based on that person's accumulated record of completing manual identification dispatch tasks, wherein the credit value is positively correlated with the accuracy of the manual identification field results and negatively correlated with the feedback time; that is, the more accurate the manually identified field information and the shorter the feedback time, the more credit value and rewards can be obtained. After the order receiving information of the plurality of crowdsourcing personnel is received, the credit value corresponding to each crowdsourcing person who received the order is acquired, and the blurred image slice is preferentially sent to the crowdsourcing person with the highest credit value. This effectively improves the enthusiasm of crowdsourcing personnel for manual identification and further improves the accuracy of bill identification. Dynamic credit alerts can also be established by means of AI analysis of human behavior, enhancing credit management for crowdsourcing personnel.
In one example, the identification task is assigned to an assignable crowdsourcing person A, selected in descending order of credit value, and the feedback time limit is set to 30 s. If A receives the identification task but does not complete it within 30 s, the task is allocated a second time, namely to Z, the assignable crowdsourcing person with the highest credit value other than A.
In one example, to further improve the accuracy of manual identification, the same blurred image slice or fragment image may be sent simultaneously to a plurality of crowdsourcing personnel who have received the order, each of whom performs manual identification and feeds back a manual identification field result. Based on the feedback results of the plurality of crowdsourcing personnel, it is judged whether the fed-back manual identification fields are identical. If so, the manual identification field is output as the final output field; if not, a manual identification dispatch task is re-established for the blurred image slice and dispatched again until all feedback results are identical. Alternatively, the manual identification field that accounts for the highest proportion of the feedback results is used as the final output field.
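One way to realize a credit value that rises with accuracy and falls with feedback time is sketched below. The linear form, the weights and the 30 s limit are assumptions chosen for illustration; the text only fixes the direction of the two correlations.

```python
def credit_delta(correct, feedback_seconds, time_limit=30.0,
                 accuracy_weight=10.0, speed_weight=5.0):
    """Credit change for one completed task.

    A correct result earns `accuracy_weight`, a wrong one loses it; a speed
    bonus shrinks linearly from `speed_weight` to 0 as the feedback time
    approaches the time limit.
    """
    accuracy_term = accuracy_weight if correct else -accuracy_weight
    speed_term = speed_weight * max(0.0, 1.0 - feedback_seconds / time_limit)
    return accuracy_term + speed_term


def update_credit(ledger, worker, correct, feedback_seconds):
    """Accumulate the credit change into a per-worker ledger."""
    ledger[worker] = ledger.get(worker, 0.0) + credit_delta(correct, feedback_seconds)
    return ledger[worker]
```

With these assumed weights, a correct answer returned halfway through the time limit adds 12.5 to the worker's credit value, and a correct but slow answer always scores above an incorrect fast one.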
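The secondary allocation in this example can be sketched as a simple loop. The `attempt` callback, which stands in for sending the task and waiting for a reply, the `max_rounds` cap and the function name are hypothetical; a real dispatcher would use timers or a task queue rather than a synchronous call.

```python
def dispatch_with_timeout(workers_by_credit, attempt, feedback_limit=30.0, max_rounds=3):
    """Assign a task to the highest-credit worker, reassigning on timeout.

    workers_by_credit: dict mapping worker -> credit value.
    attempt(worker): returns (text or None, elapsed_seconds) for that worker.
    Returns (worker, text) for the first in-time answer, else (None, None).
    """
    tried = set()
    for _ in range(max_rounds):
        candidates = [w for w in workers_by_credit if w not in tried]
        if not candidates:
            break
        # Highest credit value among workers who have not been tried yet.
        worker = max(candidates, key=workers_by_credit.get)
        tried.add(worker)
        text, elapsed = attempt(worker)
        if text is not None and elapsed <= feedback_limit:
            return worker, text
    return None, None
```

Excluding previously tried workers is what makes the second round go to a different person, as the example requires.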
For example, a manual identification dispatch task is established for a blurred image slice or fragment image, the order receiving time limit is set to 20 s, and the task is distributed simultaneously to three crowdsourcing personnel A, B and C; if none of A, B and C receives the order within 20 s, the task is dispatched again to crowdsourcing personnel D, E and F;
if A, B and C all successfully receive the order within 20 s (where an order-grabbing mechanism is used, the manual identification dispatch task is instead assigned directly to the crowdsourcing person who received the order first), the feedback time limit is set to 30 s and the blurred image slice or fragment image is sent to each of A, B and C. If at least one of A, B and C does not feed back a manual identification field result within 30 s, the order is reassigned to D, E and F, and the credit values of the crowdsourcing personnel who did feed back are updated according to their feedback time. If A, B and C all feed back the same identification result X, X is output directly as the manual identification result and the credit values of A, B and C are updated according to their feedback time; if the results differ, the order is reassigned to D, E and F until the three feedback results are identical. Another way is: if A, B and C feed back X, X and Y respectively, X may be used as the result of the manual identification field, and the credit values are updated according to the correctness and the feedback time of each crowdsourcing person.
Finally, the bill field information fed back by manual identification is output as structured data and integrated with the bill field information automatically identified by OCR.
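The two ways of reconciling the feedback from A, B and C can be sketched with a small helper. The mode names and the rule that a tie in the highest-proportion mode triggers re-dispatch are assumptions; the text does not say how ties are handled.

```python
from collections import Counter


def resolve(answers, mode="unanimous"):
    """Return the agreed field text, or None if the task should be re-dispatched.

    mode="unanimous": every answer must be identical.
    mode="plurality": the answer with the highest share wins; a tie for
    first place returns None (assumed behaviour).
    """
    if not answers:
        return None
    ranked = Counter(answers).most_common(2)
    text, votes = ranked[0]
    if mode == "unanimous":
        return text if votes == len(answers) else None
    if len(ranked) > 1 and ranked[1][1] == votes:
        return None
    return text
```

With feedback `["X", "X", "Y"]`, the unanimous mode asks for another dispatch round, while the plurality mode outputs X directly.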
The invention also proposes an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the ticket identification method described above.
The present invention also proposes a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the ticket recognition method described above.
The invention also provides a bill identifying device, which comprises:
the template building module is used for collecting ticket face characteristics of a plurality of different tickets and building a ticket template library comprising a plurality of identification area positioning templates based on the ticket face characteristics of each ticket;
the template matching module is used for identifying the ticket face characteristics of the ticket pictures uploaded by the user and matching the corresponding identification area positioning templates from the template library based on the ticket face characteristics;
the cutting processing module is used for cutting the bill picture based on the identification area positioning template so as to obtain a plurality of image slices corresponding to a plurality of bill fields in different areas in the bill picture;
the OCR automatic recognition module is used for recognizing the fields in each image slice through an OCR automatic recognition algorithm and screening out the image slices with overlapped bill fields and the blurred image slices which cannot be accurately recognized by the OCR automatic recognition algorithm;
the filtering processing module is used for filtering the image slices with overlapped bill fields and extracting corresponding field information;
the dispatch module is used for establishing a corresponding manual identification dispatch task for the fuzzy image slice and distributing the manual identification dispatch task to a plurality of crowdsourcing personnel simultaneously;
and the data output module is used for receiving the manual identification field information returned by manual identification and respectively carrying out structural output on the manual identification field information and the field information identified by the OCR automatic identification algorithm.
According to the embodiment of the invention, identification area positioning templates for different bills are established and formed into a template library, bill pictures can be sliced by means of the identification area positioning templates, and the slice images are identified, which improves the efficiency of automatic OCR identification; the pictures of overlapped parts are filtered and identified, which improves the accuracy of automatic OCR identification;
further, crowdsourced manual identification is carried out for blurred fields, and a credit system based on the feedback timeliness and accuracy of crowdsourcing personnel is established. Through specialized division of labor, complex tasks with tight deadlines and heavy workloads are decomposed into a plurality of simple, small tasks, and crowdsourced manual auxiliary identification makes the fullest use of social human resources. Efficient, quality-assured manually assisted bill identification is thereby realized, the precision and efficiency of manual identification are improved, and the overall accuracy and efficiency of bill picture identification are improved in turn, so the method has high market popularization value.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described.

Claims (10)

1. A ticket identification method, comprising:
collecting ticket face characteristics of a plurality of different tickets, and establishing a ticket template library comprising a plurality of identification area positioning templates based on the ticket face characteristics of each ticket;
identifying ticket face characteristics of ticket pictures uploaded by users, and matching corresponding identification area positioning templates from a template library based on the ticket face characteristics;
cutting the bill picture based on the identification area positioning template to obtain a plurality of image slices corresponding to a plurality of bill fields in different areas in the bill picture;
identifying the fields in each image slice through an OCR automatic identification algorithm, and screening out the image slices with overlapped bill fields and the blurred image slices which cannot be accurately identified by the OCR automatic identification algorithm;
filtering the image slices with overlapped bill fields and extracting corresponding field information;
establishing a corresponding manual identification dispatching task for the blurred image slice, and simultaneously distributing the manual identification dispatching task to a plurality of crowdsourcing personnel;
and receiving manual identification field information returned by manual identification, and respectively carrying out structural output on the manual identification field information and the field information identified by the OCR automatic identification algorithm.
2. The ticket identification method of claim 1 wherein creating a ticket template library comprising a plurality of identification zone location templates based on ticket face characteristics of each ticket comprises:
acquiring necessary bill fields, bill categories and positions of unit information of each bill based on the bill surface characteristics of each bill, and establishing an identification area positioning template corresponding to each bill;
associating each identification area positioning template with the bill category and the unit information of the corresponding bill, and establishing a bill template library comprising a plurality of identification area positioning templates; wherein
the identification area positioning template comprises a plurality of frame selection identification areas corresponding to positions of a plurality of necessary bill fields in the bill picture, and each frame selection identification area corresponds to different field attributes.
3. The ticket identification method of claim 2, wherein identifying ticket features of a ticket picture uploaded by a user and matching corresponding identification area locating templates from a template library based on the ticket features comprises:
and identifying the bill category and the unit information of the bill picture uploaded by the user, and matching the corresponding identification area positioning template from the bill template library by an accurate matching method or a fuzzy matching method based on the identification result.
4. The ticket identification method of claim 2 wherein obtaining a plurality of image slices corresponding to a plurality of different area ticket fields in the ticket picture further comprises:
associating each image slice with the field attribute of the corresponding frame selection identification area;
filtering the image slice with overlapped bill fields and extracting corresponding field information comprises the following steps:
and filtering a plurality of overlapped bill fields in the image slice based on the field attribute associated with the image slice, identifying and extracting field information only for bill fields corresponding to the field attribute, and outputting structured data of the field information extracted by identification.
5. The ticket identification method of claim 2, wherein establishing a corresponding manual identification dispatching task for the blurred image slice and distributing the manual identification dispatching task to a plurality of crowdsourcing personnel simultaneously comprises:
judging whether privacy information exists in the blurred image slice or not based on field attributes of the blurred image slice;
if privacy information exists, desensitizing the blurred image slice, and then establishing a manual identification dispatch task of the blurred image slice; if the privacy information does not exist, the manual identification dispatching task of the blurred image slice is directly established, and the manual identification dispatching task is simultaneously distributed to a plurality of crowdsourcing personnel.
6. The ticket identification method of claim 5 wherein desensitizing the blurred image slice, and then establishing a manual identification dispatch task for the blurred image slice comprises:
fragmenting the blurred image slice with the privacy information to obtain a plurality of fragment images of the blurred image slice, and binding each fragment image in the plurality of fragment images with a unique code according to a preset rule, wherein each fragment image comprises part of fields in the blurred image slice;
and respectively establishing a manual identification dispatch task of each fragment image aiming at the plurality of fragment images.
7. The ticket identification method of claim 6 wherein after distributing the manual identification dispatch task to a plurality of crowd-sourced personnel simultaneously further comprises:
receiving order receiving information of the plurality of crowdsourcing personnel;
the blurred image slice or the fragment image is sent to at least one crowdsourcing person who receives the order, and the crowdsourcing person who receives the order carries out manual identification on the blurred image slice or the fragment image;
and receiving manual identification field information returned by manual identification.
8. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the ticket identification method of any of claims 1 to 7.
9. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the ticket identification method of any of claims 1-7.
10. A bill identifying device, characterized by comprising:
the template building module is used for collecting ticket face characteristics of a plurality of different tickets and building a ticket template library comprising a plurality of identification area positioning templates based on the ticket face characteristics of each ticket;
the template matching module is used for identifying the ticket face characteristics of the ticket pictures uploaded by the user and matching the corresponding identification area positioning templates from the template library based on the ticket face characteristics;
the cutting processing module is used for cutting the bill picture based on the identification area positioning template so as to obtain a plurality of image slices corresponding to a plurality of bill fields in different areas in the bill picture;
the OCR automatic recognition module is used for recognizing the fields in each image slice through an OCR automatic recognition algorithm and screening out the image slices with overlapped bill fields and the blurred image slices which cannot be accurately recognized by the OCR automatic recognition algorithm;
the filtering processing module is used for filtering the image slices with overlapped bill fields and extracting corresponding field information;
the dispatching module is used for establishing a corresponding manual identification dispatching task for the fuzzy image slice and distributing the manual identification dispatching task to a plurality of crowdsourcing personnel simultaneously;
and the data output module is used for receiving the manual identification field information returned by manual identification and respectively carrying out structural output on the manual identification field information and the field information identified by the OCR automatic identification algorithm.
CN202010176514.8A 2020-03-13 2020-03-13 Bill identification method, electronic equipment, storage medium and device Active CN111444792B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010176514.8A CN111444792B (en) 2020-03-13 2020-03-13 Bill identification method, electronic equipment, storage medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010176514.8A CN111444792B (en) 2020-03-13 2020-03-13 Bill identification method, electronic equipment, storage medium and device

Publications (2)

Publication Number Publication Date
CN111444792A CN111444792A (en) 2020-07-24
CN111444792B true CN111444792B (en) 2023-05-09

Family

ID=71652306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010176514.8A Active CN111444792B (en) 2020-03-13 2020-03-13 Bill identification method, electronic equipment, storage medium and device

Country Status (1)

Country Link
CN (1) CN111444792B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931784B (en) * 2020-09-17 2021-01-01 深圳壹账通智能科技有限公司 Bill recognition method, system, computer device and computer-readable storage medium
CN112183594B (en) * 2020-09-17 2024-06-11 微民保险代理有限公司 Bill image processing method and device, storage medium and electronic equipment
CN112329757A (en) * 2020-10-20 2021-02-05 安诚迈科(北京)信息技术有限公司 Method, device and system for desensitizing acquisition of bill information
CN112989990B (en) * 2021-03-09 2023-08-04 平安科技(深圳)有限公司 Medical bill identification method, device, equipment and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012221183A (en) * 2011-04-08 2012-11-12 Fujitsu Marketing Ltd Receipt data recognition device and program therefor
CN107622255A (en) * 2017-10-12 2018-01-23 江苏鸿信系统集成有限公司 Bill images field localization method and system based on situation template and semantic template
CN107977665A (en) * 2017-12-15 2018-05-01 北京科摩仕捷科技有限公司 The recognition methods of key message and computing device in a kind of invoice
CN108229463A (en) * 2018-02-07 2018-06-29 众安信息技术服务有限公司 Character recognition method based on image
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
CN108664897A (en) * 2018-04-18 2018-10-16 平安科技(深圳)有限公司 Bank slip recognition method, apparatus and storage medium
CN108921166A (en) * 2018-06-22 2018-11-30 深源恒际科技有限公司 Medical bill class text detection recognition method and system based on deep neural network
CN109740548A (en) * 2019-01-08 2019-05-10 北京易道博识科技有限公司 A kind of reimbursement bill images dividing method and system
CN110263616A (en) * 2019-04-29 2019-09-20 五八有限公司 A kind of character recognition method, device, electronic equipment and storage medium
CN110751143A (en) * 2019-09-26 2020-02-04 中电万维信息技术有限责任公司 Electronic invoice information extraction method and electronic equipment
CN110866495A (en) * 2019-11-14 2020-03-06 杭州睿琪软件有限公司 Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium

Also Published As

Publication number Publication date
CN111444792A (en) 2020-07-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant