CN114241498A - Bill information to be input identification method, system and medium based on template library - Google Patents

Info

Publication number
CN114241498A
Authority
CN
China
Prior art keywords
bill
identification
image
module
template library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111397078.8A
Other languages
Chinese (zh)
Inventor
胡焱
索春宝
马伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Financial Information Technology Co Ltd
Original Assignee
Inspur Financial Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Financial Information Technology Co Ltd
Priority to CN202111397078.8A
Publication of CN114241498A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a system and a medium for identifying bill information to be input based on a template library. The method comprises the following steps: configuring a sample bill set and an image capture module; creating a bill image database based on the sample bill set and the image capture module; configuring a data processing module and obtaining a bill identification specification; constructing a bill identification template library based on the bill image database, the data processing module and the bill identification specification; and detecting a bill identification requirement and executing a bill information identification operation based on the data processing module, the bill identification template library and the bill identification requirement. The method can manage and identify multiple types of bills through the template library. Because of the way the template library is constructed, the method uses its own text box information identification logic, which differs from conventional image identification technology and further improves identification accuracy and identification efficiency.

Description

Bill information to be input identification method, system and medium based on template library
Technical Field
The invention relates to the technical field of bill information identification, and in particular to a method, a system and a medium for identifying bill information to be input based on a template library.
Background
Financial institutions currently use two bill information identification methods. The first is manual identification, which is complicated to carry out, prone to data entry errors, labor-intensive and inefficient. The second is image recognition by a machine followed by manual verification and correction; although it is more efficient than purely manual identification, the labor cost remains high and errors are easily introduced when the entered information is corrected. In summary, existing bill information identification methods have low identification efficiency and high identification cost, and their accuracy cannot be guaranteed.
Disclosure of Invention
The invention mainly solves the problems that the existing bill information identification method is low in identification efficiency, high in identification cost and incapable of ensuring accuracy.
In order to solve the above technical problems, the invention adopts the following technical scheme: a method for identifying bill information to be input based on a template library, comprising the following steps:
an initialization configuration step:
configuring a sample bill set and an image capture module; creating a bill image database based on the sample bill set and the image capture module;
a template library construction step:
configuring a data processing module and obtaining a bill identification specification; constructing a bill identification template library based on the bill image database, the data processing module and the bill identification specification;
an intelligent identification step:
detecting a bill identification requirement, and executing a bill information identification operation based on the data processing module, the bill identification template library and the bill identification requirement.
As an improved scheme, the sample bill set contains a plurality of entity bill samples;
an image processing program is configured in the image capture module;
the data processing module comprises: a data marking module, a coordinate calculation module and a data integration module;
the bill identification specification is configured with a bill identification keyword list and bill size information;
the bill identification requirements include: a first identification requirement and a second identification requirement; the first identification requirement is that a bill to be identified exists; the second identification requirement is that no bill to be identified exists.
As an improvement, the step of creating a bill image database based on the sample bill set and the image capture module further comprises:
calling the image capture module to capture a plurality of first bill images respectively corresponding to the plurality of entity bill samples; setting a first background color, a first typeface color, a first resolution and a first image format; configuring a first empty image container;
performing an image processing step based on the image capture module, the first background color, the first typeface color, the first resolution, the first image format, the first empty image container and the plurality of first bill images, to obtain the bill image database.
As an improvement, the image processing step includes:
controlling the image capture module to call the image processing program to perform image color conversion on each of the first bill images according to the first background color and the first typeface color, to obtain a plurality of second bill images;
controlling the image capture module to call the image processing program to perform image data conversion on each of the second bill images according to the first resolution and the first image format, to obtain a plurality of third bill images;
and importing the plurality of third bill images into the first empty image container to obtain the bill image database.
As an improvement, the step of constructing a bill identification template library based on the bill image database, the data processing module and the bill identification specification further comprises:
identifying the bill identification keyword list and the bill size information in the bill identification specification; setting a first mark shape and a first coordinate calculation unit; calling a statistical algorithm to count a first number of the third bill images; configuring an empty template container, and creating, according to the bill size information, a plurality of empty identification templates whose number equals the first number;
executing a bill template construction step on each of the third bill images based on the data processing module, the bill identification keyword list, the first mark shape, the first coordinate calculation unit and the plurality of empty identification templates, to obtain a plurality of bill templates; and placing the plurality of bill templates into the empty template container to obtain the bill identification template library.
As an improved scheme, a plurality of bill names and a plurality of identification keywords respectively corresponding to the bill names are configured in the bill identification keyword list;
the bill template construction step comprises:
identifying a first bill name of the third bill image; identifying, in the bill identification keyword list, a first identification keyword matching the first bill name; identifying, in the third bill image, a first bill text matching the first identification keyword;
calling the data marking module to generate, according to the first mark shape, a marked text box around the first bill text in the third bill image; calling the coordinate calculation module to construct a first text box coordinate graph on the third bill image based on the first coordinate calculation unit; calling the coordinate calculation module to calculate the text box coordinates of the marked text box relative to the first text box coordinate graph based on the first coordinate calculation unit;
setting an identification position and a position relation filling area in the empty identification template; first calling the data integration module to fill the first identification keyword into the identification position, and then calling the data integration module to fill the text box coordinates into the position relation filling area, to obtain the bill template.
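The template construction step above can be sketched in code. This is a minimal illustration under stated assumptions, not the patented implementation: the names `KEYWORD_LIST`, `BillTemplate`, `to_units` and `build_template` are hypothetical, the marked text boxes are assumed to be already available as pixel rectangles (locating the text that matches each keyword is outside the sketch), and the rounding rule in `to_units` is a design choice of this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)

# Hypothetical bill identification keyword list: bill name -> identification keywords.
KEYWORD_LIST: Dict[str, List[str]] = {
    "reimbursement invoice": [
        "bill name", "reimbursement party", "reimbursed party",
        "reimbursement amount", "reimbursement purpose",
    ],
}


@dataclass
class BillTemplate:
    # "Identification position": the keywords that identify the template.
    identification: List[str] = field(default_factory=list)
    # "Position relation filling area": keyword -> text box in coordinate units.
    positions: Dict[str, Box] = field(default_factory=dict)


def to_units(box_px: Box, unit_px: int) -> Box:
    """Express a pixel box on a grid whose cells are unit_px pixels wide.

    Left/top round down and right/bottom round up (assumption of this sketch),
    so the grid box always covers the original pixel box.
    """
    left, top, right, bottom = box_px
    return (left // unit_px, top // unit_px,
            -(-right // unit_px), -(-bottom // unit_px))


def build_template(bill_name: str, marked_boxes_px: Dict[str, Box],
                   unit_px: int) -> BillTemplate:
    """Fill an empty template: keywords first, then the text box coordinates."""
    keywords = KEYWORD_LIST[bill_name]
    template = BillTemplate()
    template.identification = list(keywords)
    template.positions = {
        keyword: to_units(marked_boxes_px[keyword], unit_px)
        for keyword in keywords if keyword in marked_boxes_px
    }
    return template
```

For example, `build_template("reimbursement invoice", {"reimbursement amount": (101, 42, 199, 58)}, 4)` stores the amount box as `(25, 10, 50, 15)` in 4-pixel coordinate units.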
As an improved solution, the step of executing the bill information identification operation based on the data processing module, the bill identification template library and the bill identification requirement further comprises:
when the bill identification requirement is the first identification requirement, acquiring a bill to be identified corresponding to the first identification requirement; calling the image capturing module to capture a bill image to be recognized of the bill to be recognized; and executing the bill information identification operation on the bill image to be identified based on the data marking module, the coordinate calculation module, the bill identification keyword list and the plurality of bill templates.
As an improvement, the bill information identification operation includes:
identifying a second bill name of the bill image to be identified; acquiring a second identification keyword corresponding to the second bill name based on the bill identification keyword list;
screening out, from the plurality of bill templates and based on the identification position, a first bill template matching the second identification keyword; identifying first text box coordinates in the position relation filling area of the first bill template;
calling the coordinate calculation module to construct a second text box coordinate graph on the bill image to be identified based on the first coordinate calculation unit; determining a region to be marked of the bill image to be identified based on the first text box coordinates and the second text box coordinate graph;
calling the data marking module to generate, according to the first mark shape, a to-be-input information text box on the region to be marked of the bill image to be identified; identifying the text data to be input in the to-be-input information text box; and packaging the second bill name and the text data to be input to obtain first bill information.
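The template screening part of this operation can be illustrated as a simple lookup. The sketch below is an assumption-laden example rather than the patent's actual logic: templates are modelled as plain dictionaries with an `"identification"` entry holding the keywords, and `screen_template` is a hypothetical name. Matching is done by exact equality of the keyword lists.

```python
from typing import Dict, List, Optional

Template = Dict[str, object]


def screen_template(bill_name: str,
                    keyword_list: Dict[str, List[str]],
                    templates: List[Template]) -> Optional[Template]:
    """Return the template whose identification position matches the
    keywords configured for bill_name, or None if there is no match."""
    keywords = keyword_list.get(bill_name)
    if keywords is None:
        return None  # bill name not present in the keyword list
    for template in templates:
        if template["identification"] == keywords:
            return template
    return None
```

Returning `None` for an unknown bill name leaves the caller free to decide how to handle a bill type that has no template yet.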
The invention also provides a system for identifying bill information to be input based on a template library, which comprises:
an initialization configuration module, a template library construction module and an intelligent identification module;
the initialization configuration module is used for configuring a sample bill set and an image capture module; the initialization configuration module creates a bill image database based on the sample bill set and the image capture module;
the template library construction module is used for configuring the data processing module and acquiring the bill identification specification; the template library construction module constructs a bill identification template library based on the bill image database, the data processing module and the bill identification specification;
the intelligent identification module is used for detecting bill identification requirements and executing the bill information identification operation based on the data processing module, the bill identification template library and the bill identification requirements.
The invention also provides a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when being executed by a processor, the computer program realizes the steps of the identification method for the information to be input of the bill based on the template library.
The invention has the beneficial effects that:
1. The method for identifying bill information to be input based on a template library can manage and identify multiple types of bills through the template library. Because of the way the template library is constructed, the method uses its own text box information identification logic, which differs from conventional image identification technology, further improves identification accuracy and identification efficiency, makes up for the defects of the prior art, and has high application value.
2. Through the cooperation of the initialization configuration module, the template library construction module and the intelligent identification module, the system for identifying bill information to be input based on a template library can likewise manage and identify multiple types of bills through the template library. It uses the same text box information identification logic, which differs from conventional image identification technology, further improves identification accuracy and identification efficiency, makes up for the defects of the prior art, and has high application value.
3. The computer readable storage medium directs the cooperation of the initialization configuration module, the template library construction module and the intelligent identification module, and thereby manages and identifies multiple types of bills through the template library. It uses the same text box information identification logic, which differs from conventional image identification technology, further improves identification accuracy and identification efficiency, makes up for the defects of the prior art, has high application value, and effectively improves the operability of the method for identifying bill information to be input based on a template library.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of the method for identifying bill information to be input based on a template library according to embodiment 1 of the present invention;
Fig. 2 is a detailed flowchart of the method for identifying bill information to be input based on a template library according to embodiment 1 of the present invention;
Fig. 3 is an architecture diagram of the system for identifying bill information to be input based on a template library according to embodiment 2 of the present invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention is clearly defined.
In the description of the present invention, it should be noted that the described embodiments of the present invention are a part of the embodiments of the present invention, and not all embodiments; all other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Example 1
This embodiment provides a method for identifying bill information to be input based on a template library. As shown in Fig. 1 and Fig. 2, the method includes the following steps:
S100, initialization configuration, specifically comprising:
S110, configuring a sample bill set and an image capture module; creating a bill image database based on the sample bill set and the image capture module;
specifically, the sample bill set contains a plurality of entity bill samples, which are paper bills of several different types; an image processing program is configured in the image capture module; image processing programs include, but are not limited to, color processing programs, resolution processing programs and resizing programs; the image capture module is a high-definition camera module;
specifically, the image capture module is called to capture a plurality of first bill images respectively corresponding to the plurality of entity bill samples; a first background color, a first typeface color, a first resolution and a first image format are set; a first empty image container is configured; in this embodiment, to facilitate construction of the template library and make the bill images easier to recognize, the first background color is white, the first typeface color is black, the first resolution is 1k and the first image format is jpg; an image processing step is then executed based on the image capture module, the first background color, the first typeface color, the first resolution, the first image format, the first empty image container and the plurality of first bill images, so as to improve the recognizability of the first bill images and obtain the bill image database;
specifically, the image processing step includes: controlling the image capture module to call the image processing program to perform image color conversion, that is, conversion of the background color and the typeface color, on each of the first bill images according to the first background color and the first typeface color, to obtain a plurality of second bill images; controlling the image capture module to call the image processing program to perform image data conversion, that is, conversion of the image pixel values and storage format according to the resolution, on each of the second bill images according to the first resolution and the first image format, to obtain a plurality of third bill images; importing the plurality of third bill images into the first empty image container to obtain the bill image database; step S100 establishes the data foundation for step S200 and lays the groundwork for intelligent bill identification.
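The two conversions in the image processing step can be sketched as follows. This is a simplified illustration under explicit assumptions: an image is modelled as a list of rows of grayscale values (0 is black, 255 is white), the threshold of 128 and the majority rule used to decide which color is the background are choices of this example, nearest-neighbor sampling stands in for whatever resampling the image processing program uses, and saving to a storage format such as jpg is not modelled. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixels, 0 = black, 255 = white


def to_black_on_white(image: Image, threshold: int = 128) -> Image:
    """Color conversion: map every pixel to pure black or pure white.

    If dark pixels are the majority, the image is assumed to be light text
    on a dark background and is inverted so the background becomes white.
    """
    binary = [[0 if p < threshold else 255 for p in row] for row in image]
    total = sum(len(row) for row in binary)
    dark = sum(row.count(0) for row in binary)
    if dark * 2 > total:
        binary = [[255 - p for p in row] for row in binary]
    return binary


def resize_nearest(image: Image, width: int, height: int) -> Image:
    """Data conversion: resample to width x height by nearest neighbor."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def normalize(image: Image, width: int, height: int) -> Image:
    """First bill image -> second (recolored) -> third (resampled)."""
    return resize_nearest(to_black_on_white(image), width, height)
```

Normalizing every sample to the same colors and pixel dimensions means that text box coordinates computed later refer to a common frame across all bill images of the same size specification.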
S200, template library construction, specifically comprising:
S210, configuring a data processing module and obtaining a bill identification specification; constructing a bill identification template library based on the bill image database, the data processing module and the bill identification specification;
specifically, the data processing module includes: a data marking module, a coordinate calculation module and a data integration module; the data marking module uses an image marking program; the coordinate calculation module uses a coordinate graph creation program; the data integration module uses a data sorting/classifying/editing program; the bill identification specification is configured with a bill identification keyword list and bill size information; the bill identification specification sets the basic format of the bill templates in the template library, for example the size and the background color of a bill template;
specifically, the bill identification keyword list and the bill size information in the bill identification specification are identified; a first mark shape and a first coordinate calculation unit are set, wherein in this embodiment the first mark shape is a hollow rectangle and the first coordinate calculation unit is a coordinate unit of 2 to 5 pixels; a statistical algorithm is called to count a first number of the third bill images; an empty template container is configured, and a plurality of empty identification templates whose number equals the first number are created according to the bill size information; the size of each empty identification template corresponds to the bill size information, and each empty identification template holds the template data of one third bill image; a bill template construction step is executed on each of the third bill images based on the data processing module, the bill identification keyword list, the first mark shape, the first coordinate calculation unit and the plurality of empty identification templates, to obtain a plurality of bill templates; the plurality of bill templates are placed into the empty template container to obtain the bill identification template library;
specifically, a plurality of bill names and a plurality of identification keywords respectively corresponding to the bill names are configured in the bill identification keyword list; the identification keywords are the items that need to be identified for the corresponding bill name; for example, if the bill name is a reimbursement invoice, the identification keywords are: bill name, reimbursement party, reimbursed party, reimbursement amount, reimbursement purpose and the like; the identification keywords are stored as abbreviated text to save storage space;
the bill template construction step performed on each third bill image includes the following: identifying a first bill name of the third bill image, the first bill name being a name that directly identifies the third bill image, such as a reimbursement invoice or a deposit receipt; identifying, in the bill identification keyword list, a first identification keyword matching the first bill name; identifying, in the third bill image, a first bill text matching the first identification keyword; calling the data marking module to generate, according to the first mark shape, a marked text box around the first bill text in the third bill image, that is, framing the first bill text; calling the coordinate calculation module to construct a first text box coordinate graph on the third bill image based on the first coordinate calculation unit; calling the coordinate calculation module to calculate the text box coordinates of the marked text box relative to the first text box coordinate graph based on the first coordinate calculation unit; once the text box coordinates are obtained, the position of the first bill text within the third bill image is known, and a bill to be identified can then be identified accurately from the first identification keyword and this position relation; setting an identification position and a position relation filling area in the empty identification template; first calling the data integration module to fill the first identification keyword into the identification position, and then calling the data integration module to fill the text box coordinates into the position relation filling area, to obtain the bill template; in this embodiment, the identification position is located at the head of the empty identification template, so that templates are easy to distinguish and screen; the text box coordinates are arranged in the position relation filling area according to the relative positions of the marked text boxes, so that the position relation filling area records not only each specific text box coordinate but also the position relation among the items to be identified within the overall layout of the bill;
S300, intelligent identification, specifically comprising:
S310, detecting bill identification requirements, and executing the bill information identification operation based on the data processing module, the bill identification template library and the bill identification requirements;
specifically, the bill identification requirements include: a first identification requirement and a second identification requirement; the first identification requirement is that a bill to be identified exists; the second identification requirement is that no bill to be identified exists;
specifically, when the bill identification requirement is the first identification requirement, a bill needs to be identified, so the bill to be identified corresponding to the first identification requirement is acquired; operation logic similar to the bill template construction step is then executed: first, the image capture module is called to capture a bill image to be identified of the bill to be identified; then, the bill information identification operation is executed on the bill image to be identified based on the data marking module, the coordinate calculation module, the bill identification keyword list and the plurality of bill templates;
specifically, the bill information identification operation includes: identifying a second bill name of the bill image to be identified; acquiring a second identification keyword corresponding to the second bill name based on the bill identification keyword list; once the keyword is obtained, the corresponding bill template needs to be matched, so a first bill template matching the second identification keyword is screened out from the plurality of bill templates based on the identification position; identifying first text box coordinates in the position relation filling area of the first bill template; calling the coordinate calculation module to construct a second text box coordinate graph on the bill image to be identified based on the first coordinate calculation unit; performing a calculation in the second text box coordinate graph according to the position relation among the first text box coordinates, to obtain the specific position of each piece of text information that needs to be framed in the bill image to be identified; further adjusting the position of each piece of text information according to the scaling relation between the size of the current bill image to be identified and the bill size information of the template, to finally obtain the region to be marked of the bill image to be identified; calling the data marking module to generate, according to the first mark shape, a to-be-input information text box on the region to be marked of the bill image to be identified; the text in the to-be-input information text box is the information of the bill that needs to be identified and entered, so the text data to be input in the to-be-input information text box is identified; packaging the second bill name and the text data to be input to obtain first bill information; the identification region determined from the position relation is accurate, so no manual check is needed; the method is suitable for many kinds of bills; because it is built on a template library, the templates in the library can be updated and iterated in real time, the applicability is strong, and the defects of the prior art are overcome.
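The adaptive adjustment described above, which maps template text box coordinates onto a bill image of a different size, can be sketched as below. This is only an illustration under assumptions: text boxes are `(left, top, right, bottom)` tuples stored in coordinate units, scaling is linear and independent per axis, and the names `region_to_mark` and `crop` are hypothetical. Reading the text inside the cropped region (for example with an OCR engine) is outside the sketch.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)
Size = Tuple[int, int]           # (width, height) in pixels


def region_to_mark(template_box: Box, unit_px: int,
                   template_size: Size, image_size: Size) -> Box:
    """Convert a template box in coordinate units into a pixel box on the
    bill image to be identified, scaling by the ratio of the two sizes."""
    scale_x = image_size[0] / template_size[0]
    scale_y = image_size[1] / template_size[1]
    left, top, right, bottom = template_box
    return (round(left * unit_px * scale_x), round(top * unit_px * scale_y),
            round(right * unit_px * scale_x), round(bottom * unit_px * scale_y))


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the region to be marked out of an image stored as rows of pixels."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]
```

For instance, a template box of `(25, 10, 50, 15)` in 4-pixel units on a 1000 x 500 template corresponds to the pixel region `(200, 80, 400, 120)` on a 2000 x 1000 image of the same bill type.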
Example 2
The present embodiment provides a template library-based bill to-be-entered information recognition system based on the same inventive concept as the template library-based bill to-be-entered information recognition method described in embodiment 1, and as shown in fig. 3, the template library-based bill to-be-entered information recognition system includes: the system comprises an initialization configuration module, a template library construction module and an intelligent identification module;
in the template library-based bill to be input information identification system, an initialization configuration module is used for configuring a sample bill set and an image capture module; the initialization configuration module creating a ticket image database based on the sample ticket set and the image capture module;
specifically, a plurality of entity bill samples are centrally configured in the sample bill; an image processing program is configured in the image capturing module;
specifically, the initialization configuration module calls the image capture module to capture a plurality of first bill images corresponding to a plurality of the entity bill samples respectively; the initialization configuration module sets a first background color, a first typeface color, a first resolution and a first image format; the initialization configuration module configures a first image empty container; an initialization configuration module performs an image processing step based on the image capture module, the first background color, the first typeface color, the first resolution, the first image format, the first image empty container, and a number of the first ticket images, resulting in the ticket image database;
specifically, the image processing step includes: the initialization configuration module controls the image capture module to call the image processing program to respectively perform image color conversion processing on the first bill images according to the first background color and the first typeface color to obtain a plurality of second bill images; the initialization configuration module controls the image capturing module to call the image processing program to respectively perform image data conversion processing on the second bill images according to the first resolution and the first image format to obtain third bill images; and the initialization configuration module leads a plurality of third bill images into the first image empty container to obtain the bill image database.
In the template library-based bill to-be-entered information identification system, the template library construction module is used to configure a data processing module and acquire a bill identification specification; the template library construction module constructs a bill identification template library based on the bill image database, the data processing module and the bill identification specification;
specifically, the data processing module includes: a data marking module, a coordinate calculation module and a data integration module; the bill identification specification is configured with a bill identification keyword list and bill size information;
specifically, the template library construction module identifies the bill identification keyword list and the bill size information in the bill identification specification; the template library construction module sets a first mark shape and a first coordinate calculation unit; the template library construction module calls a statistical algorithm to count a first number of the third bill images; the template library construction module configures a template empty container and creates, according to the bill size information, a plurality of identification empty templates whose number corresponds to the first number; the template library construction module performs a bill template construction step on each of the plurality of third bill images based on the data processing module, the bill identification keyword list, the first mark shape, the first coordinate calculation unit and the plurality of identification empty templates, obtaining a plurality of bill templates; and the template library construction module places the plurality of bill templates into the template empty container to obtain the bill identification template library.
Specifically, the bill identification keyword list is configured with a plurality of bill names and a plurality of identification keywords corresponding respectively to the bill names;
the bill template construction step comprises: the template library construction module identifies a first bill name of the third bill image; the template library construction module identifies, in the bill identification keyword list, a first identification keyword matching the first bill name; the template library construction module identifies, in the third bill image, a first bill text matching the first identification keyword; the template library construction module calls the data marking module to generate, according to the first mark shape, a marked text box for the first bill text in the third bill image; the template library construction module calls the coordinate calculation module to construct a first text box coordinate graph on the third bill image based on the first coordinate calculation unit; the template library construction module calls the coordinate calculation module to calculate, based on the first coordinate calculation unit, the text box coordinates of the marked text box relative to the first text box coordinate graph; the template library construction module sets an identification position and a position relation filling area in the identification empty template; the template library construction module first calls the data integration module to fill the first identification keyword into the identification position, and then calls the data integration module to fill the text box coordinates into the position relation filling area, obtaining the bill template.
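A minimal sketch of the bill template construction step, under several assumptions: each bill name maps to exactly one identification keyword, the first mark shape is an axis-aligned rectangle, the pixel position of each recognized text is already available (the `words_px` argument stands in for the output of some text locator, which the source does not specify), and the first coordinate calculation unit is a number of pixels per grid unit. The names `BillTemplate`, `to_grid` and `build_template` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass
class BillTemplate:
    keyword: str = ""          # the "identification position"
    box: Optional[Box] = None  # the "position relation filling area"


def to_grid(box_px: Box, unit: float) -> Box:
    """Express a pixel rectangle in multiples of the coordinate calculation
    unit, i.e. relative to a coordinate grid laid over the image."""
    left, top, right, bottom = box_px
    return (left / unit, top / unit, right / unit, bottom / unit)


def build_template(
    bill_name: str,
    keyword_list: Dict[str, str],   # bill name -> identification keyword
    words_px: Dict[str, Box],       # assumed locator output: text -> pixel box
    unit: float,
) -> BillTemplate:
    template = BillTemplate()                     # an "identification empty template"
    keyword = keyword_list[bill_name]             # first identification keyword
    box_px = words_px[keyword]                    # marked text box around the matching text
    template.keyword = keyword                    # fill the identification position first
    template.box = to_grid(box_px, unit)          # then fill the position relation area
    return template
```

Storing coordinates in grid units rather than raw pixels is one way to keep a template usable when a later image is processed with the same coordinate calculation unit.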
In the template library-based bill to-be-entered information identification system, the intelligent identification module is used to detect a bill identification requirement and to execute a bill information identification operation based on the data processing module, the bill identification template library and the bill identification requirement;
specifically, the bill identification requirement includes: a first identification requirement and a second identification requirement; the first identification requirement is that a bill to be identified exists; the second identification requirement is that no bill to be identified exists;
specifically, when the bill identification requirement is the first identification requirement, the intelligent identification module acquires the bill to be identified corresponding to the first identification requirement; the intelligent identification module calls the image capture module to capture a bill image to be identified of the bill to be identified; the intelligent identification module executes the bill information identification operation on the bill image to be identified based on the data marking module, the coordinate calculation module, the bill identification keyword list and the plurality of bill templates;
specifically, the bill information identification operation includes: the intelligent identification module identifies a second bill name of the bill image to be identified; the intelligent identification module acquires, based on the bill identification keyword list, a second identification keyword corresponding to the second bill name; the intelligent identification module screens out, from the plurality of bill templates and based on the identification position, a first bill template matching the second identification keyword; the intelligent identification module identifies first text box coordinates in the position relation filling area of the first bill template; the intelligent identification module calls the coordinate calculation module to construct a second text box coordinate graph on the bill image to be identified based on the first coordinate calculation unit; the intelligent identification module determines an area to be marked of the bill image to be identified based on the first text box coordinates and the second text box coordinate graph; the intelligent identification module calls the data marking module to generate, according to the first mark shape, a to-be-entered information text box on the area to be marked of the bill image to be identified; the intelligent identification module identifies the text data to be entered in the to-be-entered information text box; and the intelligent identification module packages the second bill name and the text data to be entered to obtain first bill information.
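The bill information identification operation can be sketched in the same spirit. The assumptions here are that templates are plain dictionaries with a `keyword` and a `box` in grid units, that the bill name has already been identified, and that text recognition inside a region is delegated to a caller-supplied function `read_text` (the source does not name an OCR engine). The names `to_pixels` and `identify_bill` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (left, top, right, bottom) in grid units
Region = Tuple[int, int, int, int]        # the same rectangle in pixels


def to_pixels(box_units: Box, unit: float) -> Region:
    """Map template coordinates back onto the image to be identified."""
    left, top, right, bottom = box_units
    return (round(left * unit), round(top * unit), round(right * unit), round(bottom * unit))


def identify_bill(
    bill_name: str,                         # second bill name, assumed already identified
    image: Any,
    keyword_list: Dict[str, str],           # bill name -> identification keyword
    templates: List[Dict[str, Any]],        # each: {"keyword": str, "box": Box}
    unit: float,                            # first coordinate calculation unit
    read_text: Callable[[Any, Region], str],  # assumed recognizer for one region
) -> Dict[str, str]:
    keyword = keyword_list[bill_name]                              # second identification keyword
    template = next(t for t in templates if t["keyword"] == keyword)  # first bill template
    region = to_pixels(template["box"], unit)                      # area to be marked
    text = read_text(image, region)                                # text data to be entered
    return {"bill_name": bill_name, "text": text}                  # first bill information
```

Because only the region given by the template is read, the recognizer never has to process the whole bill image; if no template carries the keyword, `next` raises `StopIteration`, which a caller would need to handle.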
Example 3
This embodiment provides a computer-readable storage medium, wherein:
the storage medium is used for storing computer software instructions for implementing the template library-based bill to-be-entered information identification method described in embodiment 1, and contains a program for executing that method; specifically, the executable program may be embedded in the template library-based bill to-be-entered information identification system described in embodiment 2, so that the system can implement the method described in embodiment 1 by executing the embedded executable program.
Furthermore, the computer-readable storage medium of this embodiment may be any combination of one or more readable storage media, where a readable storage medium includes an electronic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
With the template library-based method, system and medium for identifying bill information to be entered, the method handles and identifies multiple types of bills by means of the template library; in addition, following the construction logic of the template library, it adopts its own text-box-based information identification logic, which differs from conventional image recognition techniques and is intended to improve identification accuracy and efficiency; the system provides the technical support for carrying out the method, thereby addressing the shortcomings of the prior art.
The embodiment numbers used above are for description only and do not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above description is only an embodiment of the present invention and is not intended to limit its scope; all equivalent structures or equivalent process modifications made using the contents of this specification and drawings, whether applied directly or indirectly in other related technical fields, fall within the scope of protection of the present invention.

Claims (10)

1. A bill to-be-entered information identification method based on a template library is characterized by comprising the following steps:
an initialization configuration step:
configuring a sample bill set and an image capture module; creating a bill image database based on the sample bill set and the image capture module;
a template library construction step:
configuring a data processing module and acquiring a bill identification specification; constructing a bill identification template library based on the bill image database, the data processing module and the bill identification specification;
an intelligent identification step:
detecting a bill identification requirement, and executing a bill information identification operation based on the data processing module, the bill identification template library and the bill identification requirement.
2. The template library-based bill to-be-entered information identification method according to claim 1, characterized in that:
a plurality of entity bill samples are configured in the sample bill set;
an image processing program is configured in the image capture module;
the data processing module comprises: a data marking module, a coordinate calculation module and a data integration module;
the bill identification specification is configured with a bill identification keyword list and bill size information;
the bill identification requirements include: a first identification requirement and a second identification requirement; the first identification requirement is that a bill to be identified exists; the second identification requirement is that no bill to be identified exists.
3. The template library-based bill to-be-entered information identification method according to claim 2, characterized in that:
the step of creating a bill image database based on the sample bill set and the image capture module further comprises:
calling the image capture module to capture a plurality of first bill images corresponding respectively to the plurality of entity bill samples; setting a first background color, a first typeface color, a first resolution and a first image format; configuring a first image empty container;
performing an image processing step based on the image capture module, the first background color, the first typeface color, the first resolution, the first image format, the first image empty container and the plurality of first bill images, to obtain the bill image database.
4. The template library-based bill to-be-entered information identification method according to claim 3, characterized in that:
the image processing step includes:
controlling the image capture module to call the image processing program to perform image color conversion processing on each of the plurality of first bill images according to the first background color and the first typeface color, to obtain a plurality of second bill images;
controlling the image capture module to call the image processing program to perform image data conversion processing on each of the plurality of second bill images according to the first resolution and the first image format, to obtain a plurality of third bill images;
and importing the plurality of third bill images into the first image empty container to obtain the bill image database.
5. The template library-based bill to-be-entered information identification method according to claim 4, characterized in that:
the step of constructing a bill identification template library based on the bill image database, the data processing module and the bill identification specification further comprises:
identifying the bill identification keyword list and the bill size information in the bill identification specification; setting a first mark shape and a first coordinate calculation unit; calling a statistical algorithm to count the first number of the third bill images; configuring template empty containers, and creating a plurality of identification empty templates corresponding to the first number according to the bill size information;
respectively executing a bill template construction step on a plurality of third bill images based on the data processing module, the bill identification keyword list, the first mark shape, the first coordinate calculation unit and the plurality of identification empty templates to obtain a plurality of bill templates; and placing a plurality of bill templates into the template empty container to obtain the bill identification template library.
6. The template library-based bill to-be-entered information identification method according to claim 5, characterized in that:
a plurality of bill names and a plurality of identification keywords respectively corresponding to the bill names are configured in the bill identification keyword list;
the bill template construction step comprises:
identifying a first bill name of the third bill image; identifying, in the bill identification keyword list, a first identification keyword matching the first bill name; identifying, in the third bill image, a first bill text matching the first identification keyword;
calling the data marking module to generate, according to the first mark shape, a marked text box for the first bill text in the third bill image; calling the coordinate calculation module to construct a first text box coordinate graph on the third bill image based on the first coordinate calculation unit; calling the coordinate calculation module to calculate, based on the first coordinate calculation unit, the text box coordinates of the marked text box relative to the first text box coordinate graph;
setting an identification position and a position relation filling area in the identification empty template; first calling the data integration module to fill the first identification keyword into the identification position, and then calling the data integration module to fill the text box coordinates into the position relation filling area, to obtain the bill template.
7. The template library-based bill to-be-entered information identification method according to claim 6, characterized in that:
the step of executing a bill information identification operation based on the data processing module, the bill identification template library and the bill identification requirement further comprises:
when the bill identification requirement is the first identification requirement, acquiring the bill to be identified corresponding to the first identification requirement; calling the image capture module to capture a bill image to be identified of the bill to be identified; and executing the bill information identification operation on the bill image to be identified based on the data marking module, the coordinate calculation module, the bill identification keyword list and the plurality of bill templates.
8. The template library-based bill to-be-entered information identification method according to claim 7, characterized in that:
the bill information identifying operation includes:
identifying a second bill name of the bill image to be identified; acquiring a second identification keyword corresponding to the second bill name based on the bill identification keyword list;
screening out, from the plurality of bill templates and based on the identification position, a first bill template matching the second identification keyword; identifying first text box coordinates in the position relation filling area of the first bill template;
calling the coordinate calculation module to construct a second text box coordinate graph on the bill image to be identified based on the first coordinate calculation unit; determining an area to be marked of the bill image to be identified based on the first text box coordinates and the second text box coordinate graph;
calling the data marking module to generate, according to the first mark shape, a to-be-entered information text box on the area to be marked of the bill image to be identified; identifying the text data to be entered in the to-be-entered information text box; and packaging the second bill name and the text data to be entered to obtain first bill information.
9. A template library-based bill to-be-entered information identification system implementing the template library-based bill to-be-entered information identification method according to any one of claims 1 to 8, comprising: an initialization configuration module, a template library construction module and an intelligent identification module;
the initialization configuration module is used for configuring a sample bill set and an image capture module; the initialization configuration module creates a bill image database based on the sample bill set and the image capture module;
the template library construction module is used for configuring the data processing module and acquiring the bill identification specification; the template library construction module constructs a bill identification template library based on the bill image database, the data processing module and the bill identification specification;
the intelligent identification module is used for detecting bill identification requirements and executing bill information identification operation based on the data processing module, the bill identification template library and the bill identification requirements.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the template library-based bill to-be-entered information identification method according to any one of claims 1 to 8.
CN202111397078.8A 2021-11-23 2021-11-23 Bill information to be input identification method, system and medium based on template library Pending CN114241498A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111397078.8A CN114241498A (en) 2021-11-23 2021-11-23 Bill information to be input identification method, system and medium based on template library

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111397078.8A CN114241498A (en) 2021-11-23 2021-11-23 Bill information to be input identification method, system and medium based on template library

Publications (1)

Publication Number Publication Date
CN114241498A true CN114241498A (en) 2022-03-25

Family

ID=80750640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111397078.8A Pending CN114241498A (en) 2021-11-23 2021-11-23 Bill information to be input identification method, system and medium based on template library

Country Status (1)

Country Link
CN (1) CN114241498A (en)

Similar Documents

Publication Publication Date Title
CN110738602B (en) Image processing method and device, electronic equipment and readable storage medium
US8494257B2 (en) Music score deconstruction
CN103975342B (en) The system and method for capturing and handling for mobile image
US20180075298A1 (en) Method and system for webpage regression testing
CN109378052B (en) The preprocess method and system of image labeling
US11182604B1 (en) Computerized recognition and extraction of tables in digitized documents
US7926732B2 (en) OCR sheet-inputting device, OCR sheet, program for inputting an OCR sheet and program for drawing an OCR sheet form
US20050207635A1 (en) Method and apparatus for printing documents that include MICR characters
CN108597565B (en) Clinical queue data collaborative verification method based on OCR and named entity extraction technology
US11418658B2 (en) Image processing apparatus, image processing system, image processing method, and storage medium
CN113673500A (en) Certificate image recognition method and device, electronic equipment and storage medium
US10679091B2 (en) Image box filtering for optical character recognition
CN112749649A (en) Method and system for intelligently identifying and generating electronic contract
CN116092231A (en) Ticket identification method, ticket identification device, terminal equipment and storage medium
CN111881923A (en) Bill element extraction method based on feature matching
JP7035656B2 (en) Information processing equipment and programs
JP4983464B2 (en) Form image processing apparatus and form image processing program
EP1202213A2 (en) Document format identification apparatus and method
CN111860450A (en) Ticket recognition device and ticket information management system
US8229224B2 (en) Hardware management based on image recognition
CN107861931B (en) Template file processing method and device, computer equipment and storage medium
CN114241498A (en) Bill information to be input identification method, system and medium based on template library
CN114399623B (en) Universal answer identification method, system, storage medium and computing device
JP5878004B2 (en) Multiple document recognition system and multiple document recognition method
CN115631374A (en) Control operation method, control detection model training method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination