CN110334596B - Invoice picture summarizing method, electronic device and readable storage medium - Google Patents
- Publication number
- CN110334596B (application CN201910462355.5A)
- Authority
- CN
- China
- Prior art keywords
- invoice
- attribute
- preset
- queried
- picture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention relates to process optimization technology and provides an invoice picture summarizing method, an electronic device and a readable storage medium. The method comprises the following steps: identifying the invoice type of each invoice picture to be summarized by using a pre-trained model; determining preset invoice attribute position information in each invoice picture according to a mapping relation between preset invoice types and preset invoice attribute position information; performing OCR text recognition on the preset invoice attributes and corresponding attribute contents at the determined positions in each invoice picture to recognize the attribute content information in each invoice picture; receiving invoice attribute information to be queried input by a user, and matching it with the attribute content information in each invoice picture; and finding the invoice pictures whose attribute content information matches the invoice attribute information to be queried, and displaying the found invoice pictures. The invention allows the invoices a user needs to be quickly located and summarized among a plurality of invoice pictures, improving working efficiency.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an invoice picture summarizing method, an electronic device, and a readable storage medium.
Background
At present, when a user needs to find, among a plurality of invoice pictures, the invoices that meet a specific attribute in order to summarize and view them, the user can only page through the invoice pictures one by one. The invoices of interest cannot be quickly located and searched among the invoice pictures, so efficiency is low.
Disclosure of Invention
The invention aims to provide an invoice picture summarizing method, an electronic device and a readable storage medium, which aim to rapidly locate and summarize invoices required by a user in a plurality of invoice pictures.
To achieve the above object, the present invention provides an electronic device, including a memory and a processor, where the memory stores an invoice picture summarizing system that can run on the processor, and the invoice picture summarizing system when executed by the processor implements the following steps:
after receiving a plurality of invoice pictures to be summarized, identifying the invoice type of each invoice picture to be summarized by utilizing a pre-trained model;
determining preset invoice attribute position information in each invoice picture according to a mapping relation between a preset invoice type and preset invoice attribute position information; the preset invoice attribute position information comprises the positions of the preset invoice attributes and the corresponding attribute contents;
performing OCR text recognition on the preset invoice attributes and corresponding attribute contents at the determined positions in each invoice picture, to recognize the attribute content information corresponding to each preset invoice attribute in each invoice picture;
receiving invoice attribute information to be queried input by a user, and matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture;
and finding out an invoice picture corresponding to the attribute content information matched with the invoice attribute information to be queried, and displaying the found invoice picture.
Preferably, before the step of receiving invoice attribute information to be queried input by a user and matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture, the method further comprises:
displaying a preset information input interface to be queried, wherein the information input interface to be queried comprises an invoice attribute selection item to be queried and an invoice attribute content input item to be queried, so that a user can input invoice attribute information to be queried in the information input interface to be queried; the invoice attribute information to be queried comprises invoice attributes to be queried selected by a user in an invoice attribute selection item to be queried of the information input interface to be queried and invoice attribute contents to be queried input by the user in an invoice attribute content input item to be queried of the information input interface to be queried.
Preferably, after the step of performing OCR text recognition on the preset invoice attributes and corresponding attribute contents at the determined positions in each invoice picture and recognizing the attribute content information corresponding to each preset invoice attribute in each invoice picture, the method further includes:
establishing a query data table according to the attribute content information corresponding to the preset invoice attribute of each identified invoice picture; the query data table comprises invoice pictures, preset invoice attributes and mapping relations among attribute contents;
the step of receiving invoice attribute information to be queried input by a user and matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture comprises the following steps:
receiving invoice attribute information to be queried input by a user, and searching the established query data table according to the invoice attribute to be queried and the invoice attribute content to be queried contained in that information, so as to find the invoice pictures mapped to the invoice attribute to be queried and the invoice attribute content to be queried.
Preferably, the pre-trained model is a deep convolutional neural network model, and the training process of the pre-trained model is as follows:
A. preparing a preset number of image samples marked with the corresponding invoice type for each preset invoice type;
B. dividing the image samples corresponding to each preset invoice type into a training subset with a first proportion and a verification subset with a second proportion, mixing the image samples in each training subset to obtain a training set, and mixing the image samples in each verification subset to obtain a verification set;
C. training a model using the training set;
D. verifying the recognition accuracy of the trained model using the verification set; if the accuracy is greater than or equal to a preset accuracy, ending the training; if the accuracy is less than the preset accuracy, increasing the number of image samples corresponding to each preset invoice type and re-executing steps B, C and D.
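Step B can be illustrated with a minimal sketch that splits the labelled samples of each invoice type separately and then mixes the subsets. The function name, the sample representation as (image id, invoice type) pairs and the fixed random seed are assumptions made for illustration; the 0.8 default is only an example value for the first proportion.

```python
import random
from collections import defaultdict


def split_by_invoice_type(samples, train_ratio=0.8, seed=0):
    """Split (image_id, invoice_type) pairs into a training set and a verification set.

    Each invoice type is split on its own, so both sets keep every type
    in the same proportion. train_ratio is an illustrative assumption.
    """
    by_type = defaultdict(list)
    for image_id, invoice_type in samples:
        by_type[invoice_type].append((image_id, invoice_type))

    rng = random.Random(seed)
    training_set, verification_set = [], []
    for invoice_type in sorted(by_type):
        group = by_type[invoice_type]
        rng.shuffle(group)
        cut = int(len(group) * train_ratio)
        training_set.extend(group[:cut])       # training subset for this type
        verification_set.extend(group[cut:])   # verification subset for this type

    # Mix the per-type subsets so that types are interleaved.
    rng.shuffle(training_set)
    rng.shuffle(verification_set)
    return training_set, verification_set
```

Splitting per type before mixing keeps rare invoice types represented in both sets, which a single global shuffle would not guarantee.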
In addition, in order to achieve the above purpose, the present invention also provides an invoice picture summarizing method, which includes:
after receiving a plurality of invoice pictures to be summarized, identifying the invoice type of each invoice picture to be summarized by utilizing a pre-trained model;
determining preset invoice attribute position information in each invoice picture according to a mapping relation between a preset invoice type and preset invoice attribute position information; the preset invoice attribute position information comprises the positions of the preset invoice attributes and the corresponding attribute contents;
performing OCR text recognition on the preset invoice attributes and corresponding attribute contents at the determined positions in each invoice picture, to recognize the attribute content information corresponding to each preset invoice attribute in each invoice picture;
receiving invoice attribute information to be queried input by a user, and matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture;
and finding out an invoice picture corresponding to the attribute content information matched with the invoice attribute information to be queried, and displaying the found invoice picture.
Preferably, before the step of receiving invoice attribute information to be queried input by a user and matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture, the method further comprises:
displaying a preset information input interface to be queried, wherein the information input interface to be queried comprises an invoice attribute selection item to be queried and an invoice attribute content input item to be queried, so that a user can input invoice attribute information to be queried in the information input interface to be queried; the invoice attribute information to be queried comprises invoice attributes to be queried selected by a user in an invoice attribute selection item to be queried of the information input interface to be queried and invoice attribute contents to be queried input by the user in an invoice attribute content input item to be queried of the information input interface to be queried.
Preferably, after the step of performing OCR text recognition on the preset invoice attributes and corresponding attribute contents at the determined positions in each invoice picture and recognizing the attribute content information corresponding to each preset invoice attribute in each invoice picture, the method further includes:
establishing a query data table according to the attribute content information corresponding to the preset invoice attribute of each identified invoice picture; the query data table comprises invoice pictures, preset invoice attributes and mapping relations among attribute contents;
the step of receiving invoice attribute information to be queried input by a user and matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture comprises the following steps:
receiving invoice attribute information to be queried input by a user, and searching the established query data table according to the invoice attribute to be queried and the invoice attribute content to be queried contained in that information, so as to find the invoice pictures mapped to the invoice attribute to be queried and the invoice attribute content to be queried.
Preferably, the pre-trained model is a deep convolutional neural network model, and the training process of the pre-trained model is as follows:
A. preparing a preset number of image samples marked with the corresponding invoice type for each preset invoice type;
B. dividing the image samples corresponding to each preset invoice type into a training subset with a first proportion and a verification subset with a second proportion, mixing the image samples in each training subset to obtain a training set, and mixing the image samples in each verification subset to obtain a verification set;
C. training a model using the training set;
D. verifying the recognition accuracy of the trained model using the verification set; if the accuracy is greater than or equal to a preset accuracy, ending the training; if the accuracy is less than the preset accuracy, increasing the number of image samples corresponding to each preset invoice type and re-executing steps B, C and D.
Preferably, the preset invoice attribute comprises company name, company industry and account cover.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium storing an invoice picture summarization system executable by at least one processor to cause the at least one processor to perform the steps of an invoice picture summarization method as described above.
According to the invoice picture summarizing method, the electronic device and the readable storage medium, the invoice type of each invoice picture to be summarized is identified through a pre-trained model, the preset invoice attribute position in each invoice picture is determined according to the invoice type, and OCR text recognition is performed on the preset invoice attribute and the corresponding attribute content of the determined position in each invoice picture; receiving invoice attribute information to be queried input by a user, and matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture; and finding out an invoice picture matched with the invoice attribute information to be queried, and displaying the found invoice picture. The invoice pictures corresponding to the invoice attributes required to be inquired by the user can be automatically matched in the multiple invoice pictures to be summarized and displayed to the user, so that the user does not need to manually turn pages for searching each invoice picture, the invoice required by the user can be rapidly positioned and summarized in the multiple invoice pictures, and the working efficiency is improved.
Drawings
FIG. 1 is a schematic diagram of the operating environment of a preferred embodiment of invoice picture summary system 10 of the present invention;
FIG. 2 is a flowchart of an embodiment of an invoice picture summarizing method according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that descriptions such as "first" and "second" in this disclosure are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist and falls outside the scope of protection claimed by the present invention.
The invention provides an invoice picture summarizing system. Referring to FIG. 1, a schematic diagram of the operating environment of a preferred embodiment of an invoice picture summary system 10 according to the present invention is shown.
In this embodiment, the invoice picture summary system 10 is installed and operated in the electronic device 1. The electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13. Fig. 1 shows only an electronic device 1 with components 11-13, but it is understood that not all shown components are required to be implemented, and that more or fewer components may alternatively be implemented.
The memory 11 is at least one type of readable computer storage medium, and the memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk or a memory of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic apparatus 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic apparatus 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic apparatus 1. The memory 11 is used for storing application software and various data installed in the electronic device 1, such as program codes of the invoice picture summary system 10. The memory 11 may also be used for temporarily storing data that has been output or is to be output.
The processor 12 may in some embodiments be a central processing unit (Central Processing Unit, CPU), microprocessor or other data processing chip for executing program code or processing data stored in the memory 11, such as executing the invoice picture summary system 10, etc.
The display 13 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like in some embodiments. The display 13 is used for displaying information processed in the electronic device 1 and for displaying a visual user interface, such as an invoice type of each invoice picture, matched invoice pictures, etc. The components 11-13 of the electronic device 1 communicate with each other via a system bus.
Invoice picture summary system 10 includes at least one computer readable instruction stored in memory 11, which is executable by processor 12 to implement embodiments of the present application.
Wherein the invoice picture summary system 10, when executed by the processor 12, performs the following steps:
Step S1, after receiving a plurality of invoice pictures to be summarized, identifying the invoice type of each invoice picture to be summarized by using a pre-trained model;
Step S2, determining preset invoice attribute position information in each invoice picture according to a mapping relation between a preset invoice type and preset invoice attribute position information; the preset invoice attribute position information comprises the positions of the preset invoice attributes and the corresponding attribute contents;
Step S3, performing OCR text recognition on the preset invoice attributes and corresponding attribute contents at the determined positions in each invoice picture, and recognizing attribute content information corresponding to each preset invoice attribute in each invoice picture;
Step S4, receiving invoice attribute information to be queried input by a user, and matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture;
Step S5, finding out an invoice picture corresponding to the attribute content information matched with the invoice attribute information to be queried, and displaying the found invoice picture.
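The flow of steps S1 to S5 can be sketched as follows. This is only an illustrative skeleton under stated assumptions: the classifier and the OCR routine are passed in as hypothetical callables (`classify`, `recognize`), the layout mapping and the (left, top, right, bottom) box convention are invented for the example, and exact string equality stands in for whatever matching rule an implementation would use.

```python
def summarize_pictures(pictures, classify, layouts, recognize):
    """Steps S1-S3: return {picture_id: {attribute: recognized content}}.

    pictures:  {picture_id: image object}
    classify:  hypothetical callable, image -> invoice type            (S1)
    layouts:   {invoice type: {attribute: box}}, the preset mapping    (S2)
    recognize: hypothetical callable, (image, box) -> text             (S3)
    """
    recognized = {}
    for picture_id, image in pictures.items():
        invoice_type = classify(image)
        regions = layouts[invoice_type]
        recognized[picture_id] = {
            attribute: recognize(image, box) for attribute, box in regions.items()
        }
    return recognized


def find_matching_pictures(recognized, attribute, content):
    """Steps S4-S5: ids of pictures whose recognized content equals the query."""
    return [
        picture_id
        for picture_id, attributes in recognized.items()
        if attributes.get(attribute) == content
    ]
```

Keeping recognition (S1 to S3) separate from querying (S4 and S5) means the pictures are processed once and can then be queried repeatedly.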
In this embodiment, a plurality of invoice pictures to be summarized are first received. For example, an invoice summarizing request containing the invoice pictures to be summarized is received from a user (such as a document entry person) through a terminal such as a mobile phone, a tablet computer or a self-service terminal device; the request may be sent from a client pre-installed on the terminal or from a browser on the terminal. After the invoice pictures to be summarized are received, preset denoising is performed on them, for example Gaussian blur, to preliminarily remove noise and clutter interference.
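As a rough illustration of the Gaussian blur denoising mentioned above, the sketch below smooths a grayscale picture held as a list of pixel rows. The 3x3 integer kernel, the edge-replication border handling and the function name are assumptions for the example; a production system would more likely call an image-processing library.

```python
# 3x3 integer approximation of a Gaussian kernel; the weights sum to 16.
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def gaussian_blur_3x3(pixels):
    """Return a blurred copy of a grayscale image given as a list of rows of ints."""
    height, width = len(pixels), len(pixels[0])

    def at(row, col):
        # Replicate the nearest edge pixel for coordinates outside the image.
        row = min(max(row, 0), height - 1)
        col = min(max(col, 0), width - 1)
        return pixels[row][col]

    blurred = []
    for r in range(height):
        out_row = []
        for c in range(width):
            total = sum(
                KERNEL[dr + 1][dc + 1] * at(r + dr, c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            out_row.append(total // 16)
        blurred.append(out_row)
    return blurred
```

An isolated bright pixel is spread over its neighbours and reduced in intensity, which is the effect that suppresses speckle noise before classification and OCR.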
Further, after the plurality of invoice pictures to be summarized are received, an invoice picture that is not upright can be rotated. Specifically, the orientation of the invoice picture can be judged from its height-to-width ratio and the position of the seal in the picture, and the picture is rotated accordingly. For example, when the height-to-width ratio of the invoice picture is greater than 1, the height and width of the picture are transposed: if the seal is on the left side of the picture, the picture is rotated ninety degrees clockwise, and if the seal is on the right side, the picture is rotated ninety degrees anticlockwise. When the height-to-width ratio is smaller than 1, the height and width are not transposed; if the seal is on the lower side of the picture, the picture is rotated one hundred eighty degrees clockwise.
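The rotation rule above can be written as a small decision function. In this sketch the seal position is assumed to be supplied by a separate detector as one of the strings "left", "right", "top" or "bottom"; that input format and the function name are hypothetical. A ratio of exactly 1 is not covered by the description, so the sketch leaves such pictures unrotated.

```python
def rotation_clockwise_degrees(height, width, seal_side):
    """Return how many degrees to rotate the picture clockwise (0, 90, 180 or 270).

    seal_side is a hypothetical detector output: "left", "right", "top" or "bottom".
    """
    ratio = height / width
    if ratio > 1:
        # Height and width are transposed: the picture lies on its side.
        if seal_side == "left":
            return 90
        if seal_side == "right":
            return 270  # ninety degrees anticlockwise
    elif ratio < 1 and seal_side == "bottom":
        # Landscape as expected, but upside down.
        return 180
    return 0
```

For instance, a picture 1200 pixels high and 800 pixels wide with the seal on the right yields 270, that is, a ninety degree anticlockwise rotation.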
After the plurality of invoice pictures to be summarized are received, the invoice type of each invoice picture is identified by using a pre-trained model, for example a catering invoice, a traffic invoice, an accommodation invoice, an outpatient service bill, a hospitality bill and the like. Because the positions of the attributes and the corresponding attribute contents are fixed in all invoices of the same invoice type, once the invoice type of an invoice picture has been identified, the positions of the invoice attributes and corresponding attribute contents in that picture can be determined from the identified invoice type. The training process of the model is as follows: A. preparing a preset number (for example, 1000) of image samples marked with the corresponding invoice type for each preset invoice type (for example, the preset invoice types include an outpatient service bill, an inpatient service bill, an insurance charge receipt, a claim issuing bill and the like); B. dividing the image samples corresponding to each preset invoice type into a training subset with a first proportion (for example, 80%) and a verification subset with a second proportion (for example, 20%), mixing the image samples in each training subset to obtain a training set, and mixing the image samples in each verification subset to obtain a verification set; C. training the model using the training set; D. verifying the recognition accuracy of the trained model using the verification set; if the accuracy is greater than or equal to a preset accuracy, ending the training; if the accuracy is less than the preset accuracy, increasing the number of image samples corresponding to each preset invoice type and re-executing steps B, C and D.
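The accuracy-gated retraining described in steps A to D can be sketched as a loop. Everything model-specific is left to hypothetical callables (`load_split`, `train`, `evaluate`) supplied by the caller; the `increment` parameter and the `max_rounds` safety limit are assumptions added for the example and do not come from the description.

```python
def train_until_accurate(load_split, train, evaluate, preset_accuracy,
                         samples_per_type, increment, max_rounds=10):
    """Repeat steps B-D, adding samples per invoice type until accuracy is sufficient.

    load_split(n) -> (training_set, verification_set) built from n samples per type
    train(training_set) -> model
    evaluate(model, verification_set) -> accuracy in [0, 1]
    """
    for _ in range(max_rounds):
        training_set, verification_set = load_split(samples_per_type)  # steps A-B
        model = train(training_set)                                    # step C
        accuracy = evaluate(model, verification_set)                   # step D
        if accuracy >= preset_accuracy:
            return model, samples_per_type, accuracy
        samples_per_type += increment  # accuracy too low: enlarge the sample pool
    raise RuntimeError("preset accuracy not reached within max_rounds")
```

The round limit prevents an endless loop when adding samples stops improving accuracy, a case the description does not address.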
Each preset invoice attribute can be customized by the user. For example, the user can designate attributes that are important or frequently queried as the preset invoice attributes, or the preset invoice attributes can default to all attributes of the invoice, such as company name, company industry, taxpayer identification number, address, telephone, account-opening bank, account number and the like. For example, if the user needs the invoices to be displayed after being summarized by company section or account cover, the preset invoice attribute can be set in advance to the company section or the account cover. When the attribute content information corresponding to the preset invoice attributes of each invoice picture to be summarized is recognized, only the attribute content information of the company section or account cover of each invoice picture is recognized and other irrelevant attributes are not, which increases the invoice summarizing speed and lets the user quickly locate and query the invoices of interest.
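Restricting recognition to the user's preset attributes amounts to filtering the layout of an invoice type before OCR is run. A minimal sketch, assuming the layout is a hypothetical dictionary from attribute name to region box and that passing no selection means "all attributes":

```python
def regions_to_recognize(layout, preset_attributes=None):
    """Return only the regions that need OCR.

    layout:            {attribute: box} for one invoice type (hypothetical format)
    preset_attributes: attributes chosen by the user, or None for all attributes
    """
    if preset_attributes is None:
        return dict(layout)
    return {
        attribute: box
        for attribute, box in layout.items()
        if attribute in preset_attributes
    }
```

With seven attributes in a layout and one selected, OCR runs on one region per picture instead of seven, which is where the speed gain described above would come from.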
After the positions of each preset invoice attribute and the corresponding attribute content in the invoice picture are determined, OCR text recognition is performed on the attribute content corresponding to the preset invoice attributes at the determined positions. For example, the attribute content information can be recognized by using a predetermined character recognition model. The predetermined character recognition model may be an optical character recognition (OCR) engine, or a character recognition model obtained by learning and training in advance, such as a long short-term memory (LSTM) recurrent neural network model, which is not limited herein. A professional word stock can also be established in advance from words commonly found on invoices (such as the names and numbers of companies that may be involved), and the attribute content information at the determined positions in the invoice pictures is compared against and recognized according to the professional word stock, thereby saving system resources.
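One possible way to use such a professional word stock is to snap a raw OCR result to the closest known entry. The description does not say how the comparison is done, so the fuzzy matching below (Python's standard `difflib`), the function name and the 0.8 similarity cutoff are all assumptions for illustration.

```python
import difflib


def correct_with_lexicon(ocr_text, lexicon, cutoff=0.8):
    """Return the lexicon entry closest to ocr_text, or ocr_text if none is close enough.

    lexicon: list of known strings, e.g. company names expected on invoices
    cutoff:  minimum similarity in [0, 1]; an illustrative value
    """
    matches = difflib.get_close_matches(ocr_text, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_text
```

A result with one misread character, such as "Acme Tradinq Co., Ltd.", would be replaced by the lexicon entry "Acme Trading Co., Ltd.", while text unlike any entry is returned unchanged.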
To receive the invoice attribute information to be queried input by the user, an information input interface to be queried is provided. The interface comprises an invoice attribute selection item to be queried and an invoice attribute content input item to be queried. In the selection item, the user can select the invoice attribute to be queried, which is one of the preset invoice attributes. After selecting the invoice attribute to be queried, the user can enter the corresponding invoice attribute content to be queried in the content input item. For example, if the selected invoice attribute to be queried is the company name, the user enters the company name (either the full name or an abbreviation) in the content input item and sends a query instruction (for example, by clicking a query button in the interface), so that the invoice pictures matching the entered invoice attribute content can be quickly found among the plurality of invoice pictures.
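Because the user may enter either a full company name or an abbreviation, matching cannot rely on exact equality alone. The sketch below treats a query as a match when it occurs, ignoring case, anywhere in the recognized content. This substring rule is an assumption, as the description does not define the matching rule, and the function names are hypothetical.

```python
def content_matches(recognized_content, queried_content):
    """True if the non-empty query occurs in the recognized content, ignoring case."""
    query = queried_content.strip().casefold()
    if not query:
        return False  # an empty query should not match every picture
    return query in recognized_content.casefold()


def pictures_matching(recognized, queried_attribute, queried_content):
    """recognized: {picture_id: {attribute: content}}; returns matching picture ids."""
    return [
        picture_id
        for picture_id, attributes in recognized.items()
        if content_matches(attributes.get(queried_attribute, ""), queried_content)
    ]
```

Abbreviations that are not substrings of the full name (acronyms, for example) would need an alias table, which this sketch does not attempt.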
According to the method, the invoice type of each invoice picture to be summarized is identified through a pre-trained model, the preset invoice attribute position in each invoice picture is determined according to the invoice type, and OCR text recognition is carried out on the preset invoice attribute and the corresponding attribute content of the determined position in each invoice picture; receiving invoice attribute information to be queried input by a user, and matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture; and finding out an invoice picture matched with the invoice attribute information to be queried, and displaying the found invoice picture. The invoice pictures corresponding to the invoice attributes required to be inquired by the user can be automatically matched in the multiple invoice pictures to be summarized and displayed to the user, so that the user does not need to manually turn pages for searching each invoice picture, the invoice required by the user can be rapidly positioned and summarized in the multiple invoice pictures, and the working efficiency is improved.
In an alternative embodiment, the invoice picture summary system 10, when executed by the processor 12, further implements the steps of:
establishing a query data table according to the attribute content information corresponding to the preset invoice attributes of each identified invoice picture; the query data table comprises the mapping relations among invoice pictures, preset invoice attributes and attribute contents.
In this embodiment, a query data table is established according to the attribute content information corresponding to the preset invoice attributes of each identified invoice picture; the table records, for each invoice picture, each preset invoice attribute and the corresponding attribute content information. After the invoice attribute information to be queried is received from the user, the established query data table is searched with that information to find the matching invoice pictures. The matching invoice pictures are then displayed to the user, so that invoice pictures can be quickly located according to the user's needs.
FIG. 2 is a flow chart of an embodiment of the invoice picture summarizing method according to the present invention. As shown in FIG. 2, the invoice picture summarizing method includes the following steps:
Step S10, after receiving a plurality of invoice pictures to be summarized, identifying the invoice type of each invoice picture to be summarized by using a pre-trained model;
Step S20, determining preset invoice attribute position information in each invoice picture according to a mapping relation between a preset invoice type and preset invoice attribute position information; the preset invoice attribute position information comprises the positions of the preset invoice attributes and the corresponding attribute contents;
Step S30, performing OCR text recognition on preset invoice attributes and corresponding attribute contents of the determined positions in each invoice picture, and recognizing attribute content information corresponding to each preset invoice attribute in each invoice picture;
Step S40, receiving invoice attribute information to be queried input by a user, and matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture;
Step S50, finding out an invoice picture corresponding to the attribute content information matched with the invoice attribute information to be queried, and displaying the found invoice picture.
In this embodiment, a plurality of invoice pictures to be summarized are first received. For example, an invoice summarizing request containing the invoice pictures to be summarized is received from a user (such as a document entry clerk) through a terminal such as a mobile phone, a tablet computer or a self-service terminal device. The request may be sent from a client pre-installed on the terminal or from a browser on the terminal. After the invoice pictures to be summarized are received, preset denoising processing, such as Gaussian blur, is performed on them to preliminarily remove noise and clutter interference.
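The Gaussian blur mentioned above can be illustrated with a minimal sketch. It assumes a grayscale picture stored as a nested list of pixel values and uses a fixed 3x3 kernel with edge pixels replicated; the patent does not specify a kernel size, a border rule, or a library, so the function name `gaussian_blur_3x3` and all of these choices are illustrative assumptions only.

```python
# 3x3 approximation of a Gaussian kernel; the weights sum to 16.
# The kernel size and weights are assumptions, not taken from the patent.
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def gaussian_blur_3x3(image):
    """Blur a grayscale image given as a list of rows of numbers.

    Pixels outside the image are replaced by the nearest edge pixel.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so border pixels reuse edge values.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += KERNEL[dy + 1][dx + 1] * image[yy][xx]
            row.append(total / 16)
        blurred.append(row)
    return blurred
```

An isolated bright pixel (typical speckle noise) is spread over its neighbours and reduced in intensity, while uniform regions are left unchanged.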
Further, after the invoice pictures to be summarized are received, any invoice picture that is not upright can be rotated. Specifically, whether a picture has been rotated can be judged from its height-to-width ratio and from the position of the seal in the picture, and the picture is rotated accordingly. For example, when the height-to-width ratio of the invoice picture is greater than 1, the height and width of the picture are transposed: if the seal is on the left side of the picture, the picture is rotated ninety degrees clockwise, and if the seal is on the right side, the picture is rotated ninety degrees anticlockwise. When the height-to-width ratio is smaller than 1, the height and width are not transposed; in that case, if the seal is on the lower side of the picture, the picture is rotated one hundred eighty degrees.
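The rotation rule above can be written as a small decision function. This is a sketch under stated assumptions: the seal position is assumed to have been detected already and is passed in as a string (`"left"`, `"right"`, `"top"` or `"bottom"`), and the function name `rotation_degrees` is hypothetical. The source does not say what happens when the ratio is exactly 1, so the sketch leaves such pictures unrotated.

```python
def rotation_degrees(height: int, width: int, seal_side: str) -> int:
    """Return the clockwise rotation to apply, in degrees.

    A negative value means an anticlockwise rotation; 0 means no rotation.
    seal_side is an assumed, pre-computed input, not part of the patent text.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    ratio = height / width
    if ratio > 1:
        # Height and width are transposed: the picture lies on its side.
        if seal_side == "left":
            return 90
        if seal_side == "right":
            return -90
        return 0
    if ratio < 1 and seal_side == "bottom":
        # Orientation is landscape as expected, but the picture is upside down.
        return 180
    return 0
```

For instance, a 1200x800 (height x width) scan with the seal on the left yields 90, and an 800x1200 scan with the seal at the bottom yields 180.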
After the invoice pictures to be summarized are received, the invoice type of each invoice picture is identified by using a pre-trained model, for example a catering invoice, a traffic invoice, an accommodation invoice, an outpatient service bill, a hospitalization bill and the like. Within the same invoice type, the positions of the attributes and of the corresponding attribute contents are fixed, so once the invoice type of an invoice picture has been identified, the positions of the invoice attributes and of the corresponding attribute contents in that picture can be determined from the identified type. The training process of the model is as follows:
A. preparing, for each preset invoice type (for example, the preset invoice types comprise an outpatient service bill, an inpatient service bill, an insurance charge receipt, a claim issuing bill and the like), a preset number (for example, 1000) of image samples marked with the corresponding invoice type;
B. dividing the image samples corresponding to each preset invoice type into a training subset with a first proportion (for example, 80%) and a verification subset with a second proportion (for example, 20%), mixing the image samples in each training subset to obtain a training set, and mixing the image samples in each verification subset to obtain a verification set;
C. training the model by using the training set;
D. verifying the recognition accuracy of the trained model by using the verification set; if the accuracy is greater than or equal to the preset accuracy, the training ends; if the accuracy is less than the preset accuracy, the number of image samples corresponding to each preset invoice type is increased and steps B, C and D are executed again.
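Steps B and D above can be sketched as follows. The sketch only covers the per-type split and the accuracy check; the model itself is left out and represented by an arbitrary `predict` callable. The function names, the fixed random seed and the `(sample, invoice_type)` tuple layout are assumptions made for illustration.

```python
import random


def split_samples(samples_by_type, train_ratio=0.8, seed=0):
    """Step B: split each invoice type separately, then mix the subsets.

    samples_by_type maps an invoice type to a list of labelled samples.
    Returns (training_set, verification_set) as lists of (sample, type).
    """
    rng = random.Random(seed)
    training_set, verification_set = [], []
    for invoice_type, samples in samples_by_type.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)
        training_set += [(s, invoice_type) for s in shuffled[:cut]]
        verification_set += [(s, invoice_type) for s in shuffled[cut:]]
    # Mix the per-type subsets so that types are interleaved.
    rng.shuffle(training_set)
    rng.shuffle(verification_set)
    return training_set, verification_set


def accuracy(predict, verification_set):
    """Step D: fraction of verification samples classified correctly."""
    if not verification_set:
        return 0.0
    correct = sum(1 for sample, label in verification_set
                  if predict(sample) == label)
    return correct / len(verification_set)


def training_finished(predict, verification_set, preset_accuracy):
    """Step D stop rule: stop when accuracy reaches the preset accuracy."""
    return accuracy(predict, verification_set) >= preset_accuracy
```

Splitting each type separately before mixing keeps the 80/20 proportion within every invoice type, so a rare type cannot end up entirely in one subset. When `training_finished` returns `False`, the procedure in the text adds more samples per type and repeats steps B, C and D.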
Each preset invoice attribute can be customized by the user. For example, the user can designate attributes that are queried frequently or that are important as the preset invoice attributes; alternatively, the preset invoice attributes can default to all attributes of the invoice, such as company name, company industry, taxpayer identification number, address, telephone, account-opening bank, account number and the like. For example, if the user needs the invoices to be displayed after being summarized by company section or account cover, the preset invoice attribute can be set to the company section or the account cover. When the attribute content information is then identified for each invoice picture to be summarized, only the company section or account cover content of each picture is recognized and other irrelevant attributes are skipped. This speeds up invoice summarizing and lets the user quickly locate and query the invoices of interest.
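The mapping between invoice type and attribute positions, restricted to the user's preset attributes, might look like the sketch below. The table contents, the `(x, y, width, height)` box layout and the function name `regions_to_recognize` are hypothetical; the patent only states that such a mapping exists and that non-preset attributes are not recognized.

```python
# Hypothetical mapping: invoice type -> attribute -> (x, y, width, height).
# The coordinates are invented for illustration only.
ATTRIBUTE_POSITIONS = {
    "catering invoice": {
        "company name": (40, 30, 300, 24),
        "taxpayer identification number": (40, 60, 300, 24),
        "account number": (40, 90, 300, 24),
    },
}


def regions_to_recognize(invoice_type, preset_attributes=None):
    """Return the regions on which OCR should run for one invoice type.

    With preset_attributes=None, all attributes of the type are used
    (the default case in the text); otherwise only the listed ones.
    """
    positions = ATTRIBUTE_POSITIONS[invoice_type]
    if preset_attributes is None:
        return dict(positions)
    return {name: positions[name]
            for name in preset_attributes if name in positions}
```

Restricting the returned regions is what reduces the OCR workload: only the boxes for the selected attributes are passed on to recognition.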
After the positions of each preset invoice attribute and of the corresponding attribute content in the invoice picture are determined, OCR text recognition is performed on the attribute content at the determined positions. For example, the attribute content information corresponding to a preset invoice attribute at a determined position can be recognized by using a predetermined character recognition model. The predetermined character recognition model may be an optical character recognition (OCR) engine, or a character recognition model obtained by training in advance, such as a long short-term memory (LSTM) recurrent neural network model; no limitation is imposed here. Alternatively, a professional word stock can be established in advance from words that commonly appear on invoices (such as the names and numbers of the companies that may be involved), and the attribute content information at the determined positions is then recognized by comparison against the professional word stock, which saves system resources.
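One way to read the comparison against a professional word stock is as a nearest-match lookup over raw recognition output. The sketch below uses the Python standard library's `difflib.get_close_matches` for that purpose; this is a substituted technique chosen for illustration, not the comparison method of the patent, and the word stock entries and the 0.8 similarity cutoff are invented.

```python
import difflib

# Hypothetical word stock built from company names expected on invoices.
WORD_STOCK = ["Acme Trading Co", "Apex Travel Ltd", "Northwind Catering"]


def correct_with_word_stock(ocr_text, word_stock, cutoff=0.8):
    """Replace raw OCR text with the closest word-stock entry, if any.

    If no entry is at least `cutoff` similar, the raw text is kept.
    """
    matches = difflib.get_close_matches(ocr_text, word_stock,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_text
```

With this word stock, a misread such as `"Acme Tradng Co"` is mapped back to `"Acme Trading Co"`, while text with no similar entry is returned unchanged.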
To receive the invoice attribute information to be queried from the user, a query information input interface is provided. The interface comprises an invoice attribute selection item and an invoice attribute content input item. From the selection item, the user selects the invoice attribute to be searched on; this attribute is one of the preset invoice attributes. After selecting the attribute, the user enters the corresponding attribute content in the content input item. For example, if the selected attribute is the company name, the user enters the company name (either the full name or an abbreviation) and then sends a query instruction, for instance by clicking the query button in the interface. The invoice pictures matching the entered attribute content can then be quickly found among the plurality of invoice pictures.
In this embodiment, the invoice type of each invoice picture to be summarized is identified by a pre-trained model; the positions of the preset invoice attributes in each invoice picture are determined according to the invoice type; and OCR text recognition is performed on the preset invoice attributes and the corresponding attribute contents at the determined positions. Invoice attribute information to be queried is then received from the user and matched against the attribute content information corresponding to each preset invoice attribute in each invoice picture, and the matching invoice pictures are found and displayed. In this way, the invoice pictures corresponding to the attributes the user wants to query are automatically matched among the invoice pictures to be summarized and displayed to the user. The user does not need to page through the invoice pictures manually, the required invoices can be quickly located and summarized, and working efficiency is improved.
In an alternative embodiment, on the basis of the above embodiment, the method further comprises the following step:
establishing a query data table according to the attribute content information corresponding to the preset invoice attributes of each identified invoice picture; the query data table comprises the mapping relations among invoice pictures, preset invoice attributes and attribute contents.
In this embodiment, a query data table is established according to the attribute content information corresponding to the preset invoice attributes of each identified invoice picture; the table records, for each invoice picture, each preset invoice attribute and the corresponding attribute content information. After the invoice attribute information to be queried is received from the user, the established query data table is searched with that information to find the matching invoice pictures. The matching invoice pictures are then displayed to the user, so that invoice pictures can be quickly located according to the user's needs.
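A query data table of this kind can be sketched as a flat list of (invoice picture, attribute, content) rows. The sketch assumes that matching is a case-insensitive substring test, which lets a partial company name match a full name; the patent does not define the matching rule, and the function names and row layout are illustrative.

```python
def build_query_table(recognized):
    """Flatten {picture: {attribute: content}} into table rows.

    Each row is (picture, attribute, content), i.e. the mapping relation
    among invoice picture, preset invoice attribute and attribute content.
    """
    return [(picture, attribute, content)
            for picture, attributes in recognized.items()
            for attribute, content in attributes.items()]


def find_pictures(table, attribute, content):
    """Return the pictures whose `attribute` content contains `content`.

    Matching by case-insensitive substring is an assumption of this sketch.
    """
    needle = content.strip().lower()
    if not needle:
        return []
    return sorted({picture for picture, attr, value in table
                   if attr == attribute and needle in value.lower()})
```

Building the table once after recognition means that each later query is a simple scan of rows rather than a new pass over the invoice pictures.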
In addition, the present invention further provides a computer-readable storage medium storing an invoice picture summarizing system. The invoice picture summarizing system can be executed by at least one processor to cause the at least one processor to perform the steps of the invoice picture summarizing method in the above embodiments. The specific implementation of steps S10, S20, S30, etc. of the invoice picture summarizing method is described above and is not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by means of hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) and comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods according to the embodiments of the present invention.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings; this description does not limit the scope of the claims of the present invention. The embodiment numbers used above are for description only and do not indicate that one embodiment is better or worse than another. In addition, while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order than shown.
Those skilled in the art will appreciate that many modifications are possible in which the invention is practiced without departing from its scope or spirit, e.g., features of one embodiment can be used with another embodiment to yield yet a further embodiment. Any modification, equivalent replacement and improvement made within the technical idea of the present invention should be within the scope of the claims of the present invention.
Claims (8)
1. An electronic device comprising a memory, a processor, the memory having stored thereon an invoice picture summary system operable on the processor, the invoice picture summary system when executed by the processor performing the steps of:
after receiving a plurality of invoice pictures to be summarized, denoising and rotating adjustment processing are carried out on the plurality of invoice pictures, and the invoice type of each invoice picture after denoising processing is identified by using a pre-trained model;
Determining preset invoice attribute position information in each invoice picture according to a mapping relation between a preset invoice type and preset invoice attribute position information; the preset invoice attribute position information comprises the positions of the preset invoice attributes and the corresponding attribute contents;
performing OCR text recognition on the preset invoice attribute and the corresponding attribute content of the determined position in each invoice picture by using a preset determined character recognition model, or performing comparison recognition on the preset invoice attribute and the corresponding attribute content of the determined position in each invoice picture by using a preset determined professional word stock, recognizing attribute content information corresponding to each preset invoice attribute in each invoice picture, wherein the professional word stock is a word stock constructed according to common words of each invoice picture, and the character recognition model is a time recursion neural network model;
receiving invoice attribute information to be queried, which is input by a user in a preset information input interface to be queried, wherein the information input interface to be queried comprises invoice attribute selection items to be queried and invoice attribute content input items to be queried, and the invoice attribute information to be queried comprises invoice attributes to be queried selected by the user in the invoice attribute selection items to be queried and invoice attribute content to be queried, which is input by the user in the invoice attribute content input items to be queried;
Matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture;
and finding out an invoice picture corresponding to the attribute content information matched with the invoice attribute information to be queried, and displaying the found invoice picture.
2. The electronic device of claim 1, wherein after the step of performing OCR text recognition on the preset invoice attribute and the corresponding attribute content in each invoice picture in which the position is determined, the step of recognizing attribute content information corresponding to each preset invoice attribute in each invoice picture further comprises:
establishing a query data table according to the attribute content information corresponding to the preset invoice attribute of each identified invoice picture; the query data table comprises invoice pictures, preset invoice attributes and mapping relations among attribute contents;
the step of receiving invoice attribute information to be queried input by a user and matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture comprises the following steps:
receiving invoice attribute information to be queried input by a user, searching in an established query data table according to the invoice attribute to be queried and the invoice attribute content to be queried in the invoice attribute information to find out invoice pictures mapped with the invoice attribute to be queried and the invoice attribute content to be queried in the invoice attribute information to be queried.
3. The electronic device of claim 1, wherein the pre-trained model is a deep convolutional neural network model, the training process of the pre-trained model being as follows:
A. preparing a preset number of image samples marked with the corresponding invoice type for each preset invoice type;
B. dividing the image samples corresponding to each preset invoice type into a training subset with a first proportion and a verification subset with a second proportion, mixing the image samples in each training subset to obtain a training set, and mixing the image samples in each verification subset to obtain a verification set;
C. training a model using the training set;
D. verifying the recognition accuracy of the trained model by using the verification set; if the accuracy is greater than or equal to the preset accuracy, ending the training; or if the accuracy is less than the preset accuracy, increasing the number of image samples corresponding to each preset invoice type and re-executing steps B, C and D.
4. An invoice picture summarizing method, characterized by comprising the following steps:
after receiving a plurality of invoice pictures to be summarized, denoising and rotating adjustment processing are carried out on the plurality of invoice pictures, and the invoice type of each invoice picture after denoising processing is identified by using a pre-trained model;
Determining preset invoice attribute position information in each invoice picture according to a mapping relation between a preset invoice type and preset invoice attribute position information; the preset invoice attribute position information comprises the positions of the preset invoice attributes and the corresponding attribute contents;
performing OCR text recognition on the preset invoice attribute and the corresponding attribute content of the determined position in each invoice picture by using a preset determined character recognition model, or performing comparison recognition on the preset invoice attribute and the corresponding attribute content of the determined position in each invoice picture by using a preset determined professional word stock, recognizing attribute content information corresponding to each preset invoice attribute in each invoice picture, wherein the professional word stock is a word stock constructed according to common words of each invoice picture, and the character recognition model is a time recursion neural network model;
receiving invoice attribute information to be queried, which is input by a user in a preset information input interface to be queried, wherein the information input interface to be queried comprises invoice attribute selection items to be queried and invoice attribute content input items to be queried, and the invoice attribute information to be queried comprises invoice attributes to be queried selected by the user in the invoice attribute selection items to be queried and invoice attribute content to be queried, which is input by the user in the invoice attribute content input items to be queried;
Matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture;
and finding out an invoice picture corresponding to the attribute content information matched with the invoice attribute information to be queried, and displaying the found invoice picture.
5. The invoice picture summarization method as claimed in claim 4, wherein after the step of performing OCR text recognition on the preset invoice attributes and the corresponding attribute contents determined in each invoice picture, identifying attribute content information corresponding to each preset invoice attribute in each invoice picture, the method further comprises:
establishing a query data table according to the attribute content information corresponding to the preset invoice attribute of each identified invoice picture; the query data table comprises invoice pictures, preset invoice attributes and mapping relations among attribute contents;
the step of receiving invoice attribute information to be queried input by a user and matching the invoice attribute information to be queried with attribute content information corresponding to each preset invoice attribute in each invoice picture comprises the following steps:
receiving invoice attribute information to be queried input by a user, searching in an established query data table according to the invoice attribute to be queried and the invoice attribute content to be queried in the invoice attribute information to find out invoice pictures mapped with the invoice attribute to be queried and the invoice attribute content to be queried in the invoice attribute information to be queried.
6. The invoice picture summarization method as claimed in claim 4, wherein the pre-trained model is a deep convolutional neural network model, and the training process of the pre-trained model is as follows:
A. preparing a preset number of image samples marked with the corresponding invoice type for each preset invoice type;
B. dividing the image samples corresponding to each preset invoice type into a training subset with a first proportion and a verification subset with a second proportion, mixing the image samples in each training subset to obtain a training set, and mixing the image samples in each verification subset to obtain a verification set;
C. training a model using the training set;
D. verifying the recognition accuracy of the trained model by using the verification set; if the accuracy is greater than or equal to the preset accuracy, ending the training; or if the accuracy is less than the preset accuracy, increasing the number of image samples corresponding to each preset invoice type and re-executing steps B, C and D.
7. The invoice picture summarizing method as claimed in claim 4, wherein the preset invoice attributes include company name, company industry, and account cover.
8. A computer readable storage medium having stored thereon an invoice picture summarization system which when executed by a processor performs the steps of the invoice picture summarization method of any one of claims 4 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910462355.5A CN110334596B (en) | 2019-05-30 | 2019-05-30 | Invoice picture summarizing method, electronic device and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110334596A CN110334596A (en) | 2019-10-15 |
CN110334596B CN110334596B (en) | 2024-02-02 |
Family
ID=68140562
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910462355.5A Active CN110334596B (en) | 2019-05-30 | 2019-05-30 | Invoice picture summarizing method, electronic device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110334596B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112434689A (en) * | 2020-12-01 | 2021-03-02 | 天冕信息技术(深圳)有限公司 | Method, device and equipment for identifying information in picture and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766809A (en) * | 2017-10-09 | 2018-03-06 | 平安科技(深圳)有限公司 | Electronic installation, billing information recognition methods and computer-readable recording medium |
CN107798299A (en) * | 2017-10-09 | 2018-03-13 | 平安科技(深圳)有限公司 | Billing information recognition methods, electronic installation and readable storage medium storing program for executing |
CN109308476A (en) * | 2018-09-06 | 2019-02-05 | 邬国锐 | Billing information processing method, system and computer readable storage medium |
CN109359127A (en) * | 2018-09-07 | 2019-02-19 | 彩讯科技股份有限公司 | A kind of querying method of electronic invoice, device, equipment and storage medium |
CN109815949A (en) * | 2018-12-20 | 2019-05-28 | 航天信息股份有限公司 | Invoice publicity method and system neural network based |
Also Published As
Publication number | Publication date |
---|---|
CN110334596A (en) | 2019-10-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||