Summary of the invention
The purpose of the present invention is to provide an invoice information recognition scheme that overcomes, at least to some extent, one or more problems caused by the limitations and defects of the related art.
Other features and advantages of the present invention will become apparent from the following detailed description, or may be learned in part through practice of the invention.
According to a first aspect of the invention, an invoice information recognition method is provided, comprising the following steps:
Receiving an invoice image to be recognized;
Judging whether the invoice image contains a preset identification code;
If so, obtaining the invoice identification information contained in the preset identification code;
If not, obtaining the invoice identification information contained in a preset region of the invoice image;
Querying an invoice database according to the invoice identification information to obtain the full invoice information corresponding to the invoice image.
In some embodiments of the invention, based on the foregoing scheme, the step of judging whether the invoice image contains a preset identification code specifically comprises:
Performing image preprocessing on the invoice image to obtain a target color image;
Converting the target color image into a corresponding first grayscale image;
Extracting all rectangular regions contained in the first grayscale image;
Calculating the edge strength value of each rectangular region among all the rectangular regions;
When the edge strength value is greater than a preset threshold, determining that the invoice image contains the preset identification code, and otherwise determining that the invoice image does not contain the preset identification code.
In some embodiments of the invention, based on the foregoing scheme, the step of obtaining the invoice identification information contained in the preset region of the invoice image specifically comprises:
Cropping a first target rectangular region from the preset region;
Using a preset corner vertex of the first target rectangular region as a reference point, searching for and determining the black-pixel region contained in the first target rectangular region;
Marking and generating a second target rectangular region according to the start point and end point of the black-pixel region;
Extracting all black character regions contained in the second target rectangular region;
Performing character recognition on each black character region with a trained preset hybrid deep learning network to obtain the invoice identification information.
In some embodiments of the invention, based on the foregoing scheme, the step of performing character recognition on each black character region with the trained preset hybrid deep learning network to obtain the invoice identification information specifically comprises:
Converting each black character region into a corresponding second grayscale image;
Determining the gray values of the second grayscale image as a target feature vector;
Inputting the target feature vector into the preset hybrid deep learning network, and performing character recognition according to the correspondence between gray values and character classes to obtain the invoice identification information.
In some embodiments of the invention, based on the foregoing scheme, before the step of judging whether the invoice image contains the preset identification code, the method further comprises: detecting whether the image pixels of the received invoice image are within a preset pixel range.
According to a second aspect of the invention, an invoice information recognition device is provided, comprising:
A receiving module, configured to receive an invoice image to be recognized;
A judgment module, configured to judge whether the invoice image contains a preset identification code;
A first processing module, configured to obtain the invoice identification information contained in the preset identification code when the judgment module judges that the invoice image contains the preset identification code;
A second processing module, configured to obtain the invoice identification information contained in a preset region of the invoice image when the judgment module judges that the invoice image does not contain the preset identification code;
An acquisition module, configured to query an invoice database according to the invoice identification information to obtain the full invoice information corresponding to the invoice image.
In some embodiments of the invention, based on the foregoing scheme, the judgment module specifically comprises:
A processing submodule, configured to perform image preprocessing on the invoice image to obtain a target color image;
A conversion submodule, configured to convert the target color image into a corresponding first grayscale image;
A first extraction submodule, configured to extract all rectangular regions contained in the first grayscale image;
A calculation submodule, configured to calculate the edge strength value of each rectangular region among all the rectangular regions;
A first determination submodule, configured to determine that the invoice image contains the preset identification code when the edge strength value is greater than a preset threshold, and otherwise to determine that the invoice image does not contain the preset identification code.
In some embodiments of the invention, based on the foregoing scheme, the second processing module specifically comprises:
A cropping submodule, configured to crop a first target rectangular region from the preset region;
A second determination submodule, configured to search for and determine the black-pixel region contained in the first target rectangular region, using a preset corner vertex of the first target rectangular region as a reference point;
A generation submodule, configured to mark and generate a second target rectangular region according to the start point and end point of the black-pixel region;
A second extraction submodule, configured to extract all black character regions contained in the second target rectangular region;
A recognition submodule, configured to perform character recognition on each black character region with a trained preset hybrid deep learning network to obtain the invoice identification information.
In some embodiments of the invention, based on the foregoing scheme, the recognition submodule is specifically configured to:
Convert each black character region into a corresponding second grayscale image;
Determine the gray values of the second grayscale image as a target feature vector;
Input the target feature vector into the preset hybrid deep learning network, and perform character recognition according to the correspondence between gray values and character classes to obtain the invoice identification information.
In some embodiments of the invention, based on the foregoing scheme, the device further comprises: a detection module, configured to detect, before the judgment module judges whether the invoice image contains the preset identification code, whether the image pixels of the invoice image received by the receiving module are within a preset pixel range.
According to a third aspect of the invention, a computer device is provided, comprising:
A processor;
A memory for storing instructions executable by the processor, wherein the processor, when executing the executable instructions stored in the memory, implements the steps of the method according to any embodiment of the first aspect above.
According to a fourth aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any embodiment of the first aspect above.
In the technical solutions provided in some embodiments of the present invention, the recognition region of the invoice image is frame-selected, and targeted invoice information recognition is performed on the frame-selected recognition region. The required invoice information can therefore be recognized quickly without recognizing the entire face of the invoice image, which effectively improves the efficiency of invoice information recognition and thus the user experience.
Further, on the basis of recognizing the preset identification code of the invoice, recognition of the invoice face is optimized by retrieving the structured data of the invoice, which achieves the purpose of improving invoice information recognition efficiency.
Detailed description of the embodiments
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in a variety of forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present invention will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
In addition, the described features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to give a full understanding of the embodiments of the present invention. However, those skilled in the art will appreciate that the technical solutions of the present invention may be practiced without one or more of the specific details, or that other methods, components, devices, steps, and the like may be employed. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail, to avoid obscuring aspects of the present invention.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are merely exemplary illustrations. They need not include all contents and operations/steps, nor must the steps be performed in the order described. For example, some operations/steps may be decomposed, and some may be merged or partially merged, so the actual order of execution may change according to the actual situation.
Fig. 1 schematically shows a flowchart of an invoice information recognition method according to an embodiment of the invention.
With reference to Fig. 1, the invoice information recognition method according to an embodiment of the invention includes the following steps:
Step S10: receive an invoice image to be recognized.
It can be understood that the invoice image may be obtained by capturing an image of a paper invoice, or of a copy of an electronic invoice (for example, a PDF electronic invoice), with an image capture device such as a camera or a mobile phone.
Step S12: judge whether the invoice image contains a preset identification code; if so, perform step S14, otherwise perform step S16.
It can be understood that, after the invoice image is obtained, it is first detected whether the invoice image contains a preset identification code from which the invoice information can be obtained quickly. Specifically, the preset identification code is preferably a QR code.
According to an exemplary embodiment of the present invention, as shown in Fig. 2, step S12 may specifically include:
Step S120: perform image preprocessing on the invoice image to obtain a target color image.
It can be understood that the image preprocessing of the invoice image may specifically include normalizing the invoice image, rotating the image so that the invoice image is displayed upright, performing color enhancement on the image, and so on.
Step S122: convert the target color image into a corresponding first grayscale image.
Step S124: extract all rectangular regions contained in the first grayscale image.
Step S126: calculate the edge strength value of each rectangular region among all the rectangular regions.
Step S128: when the edge strength value is greater than a preset threshold, determine that the invoice image contains the preset identification code; otherwise determine that the invoice image does not contain the preset identification code.
Step S14: obtain the invoice identification information contained in the preset identification code. That is, when the invoice image contains a preset identification code, the invoice identification information is stored in the preset identification code.
For step S14 above, an exemplary embodiment illustrates one scheme for obtaining the invoice identification information contained in the invoice image: the information is obtained by code recognition technology.
Step S16: obtain the invoice identification information contained in the preset region of the invoice image. That is, when the invoice image does not contain a preset identification code, the invoice identification information is stored in the preset region of the invoice.
According to an exemplary embodiment of the present invention, as shown in Fig. 3, step S16 may specifically include:
Step S160: crop a first target rectangular region from the preset region.
Step S162: using a preset corner vertex of the first target rectangular region as a reference point, search for and determine the black-pixel region contained in the first target rectangular region.
Step S164: mark and generate a second target rectangular region according to the start point and end point of the black-pixel region.
Step S166: extract all black character regions contained in the second target rectangular region.
Step S168: perform character recognition on each black character region with a trained preset hybrid deep learning network to obtain the invoice identification information.
Further, according to an exemplary embodiment of the present invention, step S168 may specifically include: converting each black character region into a corresponding second grayscale image; determining the gray values of the second grayscale image as a target feature vector; inputting the target feature vector into the preset hybrid deep learning network, and performing character recognition according to the correspondence between gray values and character classes to obtain the invoice identification information.
For step S16 above, an exemplary embodiment illustrates another scheme for obtaining the invoice identification information contained in the invoice image: the information is obtained by image recognition technology, such as pattern recognition.
For steps S14 and S16 above, the present invention provides two exemplary embodiments that illustrate schemes for obtaining the invoice identification information contained in the invoice image, wherein the invoice identification information includes one or more of the invoice code, the invoice number, the date of issue, the amount excluding tax, the randomly generated confidential information, and the check code.
Step S18: query the invoice database according to the invoice identification information to obtain the full invoice information corresponding to the invoice image.
In this embodiment of the present invention, the recognition region of the invoice image is frame-selected, and targeted invoice information recognition is performed on the frame-selected recognition region. The required invoice information can be recognized quickly without recognizing the entire face of the invoice image, which effectively improves the efficiency of invoice information recognition and thus the user experience.
The full invoice information is specifically the structured data stored in the invoice database, and may include: the invoice type, code, number, date of issue, purchaser name, purchaser taxpayer identification number, seller name, seller taxpayer identification number, amount, tax amount, and total of price and tax.
Further, in some embodiments of the invention, based on the foregoing scheme, before step S12 is performed, the invoice information recognition method further includes: detecting whether the image pixels of the received invoice image are within a preset pixel range.
Further, in some embodiments of the invention, based on the foregoing scheme, once step S18 has been performed and the full invoice information has been obtained, that information can be displayed or printed.
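The control flow of steps S10 to S18 can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: the four callables (`has_preset_code`, `decode_preset_code`, `read_preset_region`, `query_database`) and the dictionary representation of the identification information are assumptions introduced only for this example.

```python
from typing import Any, Callable, Dict, Optional

Identification = Dict[str, str]


def recognize_invoice(
    image: Any,
    has_preset_code: Callable[[Any], bool],
    decode_preset_code: Callable[[Any], Identification],
    read_preset_region: Callable[[Any], Identification],
    query_database: Callable[[Identification], Optional[Dict[str, str]]],
) -> Optional[Dict[str, str]]:
    """Dispatch between the two recognition paths, then query the database.

    All four callables are hypothetical placeholders for the operations
    described in steps S12, S14, S16 and S18.
    """
    if has_preset_code(image):                      # step S12
        identification = decode_preset_code(image)  # step S14: code path
    else:
        identification = read_preset_region(image)  # step S16: region path
    return query_database(identification)           # step S18
```

Passing the operations in as callables keeps the sketch independent of any particular decoding or character recognition library.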
Fig. 4 schematically shows a flowchart of an invoice information recognition method according to another embodiment of the present invention.
Step S402: receive an invoice picture stream and preprocess it to obtain a target picture.
The invoice picture stream may be obtained by an image capture device such as a camera, a mobile phone, or an image code-scanning device.
Specifically, a color image I1 of the invoice is obtained from the invoice picture stream, and the color image I1 is normalized to form a color image I2. Image rotation may additionally be applied so that the color image I2 is displayed upright, and processing such as cropping may also be performed.
Further, in this embodiment, the color image I1 is preferably normalized to an image of size L × H using bilinear interpolation, where L and H are respectively the width and height of the scaled image in pixels. Their values can be set according to the practical application; for example, an original color image of size 4160 × 3120 may be scaled to 4000 × 3000.
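Bilinear interpolation computes each output pixel as a weighted average of the four nearest source pixels. The sketch below shows the idea on a single-channel image stored as nested lists; a color image would apply the same computation to each channel. The function name `resize_bilinear` and the corner-aligned coordinate mapping are assumptions made for this example, and a production system would more likely call an image library.

```python
from typing import List

Image = List[List[float]]


def resize_bilinear(src: Image, out_w: int, out_h: int) -> Image:
    """Scale a single-channel image to out_w x out_h by bilinear interpolation.

    Uses a corner-aligned mapping (the first and last pixels of the source
    map to the first and last pixels of the output), chosen for simplicity.
    """
    src_h, src_w = len(src), len(src[0])
    x_scale = (src_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    y_scale = (src_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    out: Image = []
    for j in range(out_h):
        y = j * y_scale
        y0 = int(y)
        y1 = min(y0 + 1, src_h - 1)   # clamp at the bottom edge
        fy = y - y0                   # vertical interpolation weight
        row = []
        for i in range(out_w):
            x = i * x_scale
            x0 = int(x)
            x1 = min(x0 + 1, src_w - 1)  # clamp at the right edge
            fx = x - x0                  # horizontal interpolation weight
            top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
            bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

For instance, `resize_bilinear(image, 4000, 3000)` would correspond to the 4160 × 3120 to 4000 × 3000 example in the text.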
Further, in this embodiment, image enhancement preprocessing may also be applied to the color image I2 using a guided filtering algorithm, forming a color image I3.
Step S404: judge whether the picture pixels of the target picture meet the condition; if so, perform step S406, otherwise end the flow.
Specifically, for example, when the picture pixels of the target picture are 1280 × 720, the picture is considered qualified. Of course, other pixel thresholds can also be set according to the actual situation.
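A pixel-range check of this kind can be sketched as a simple comparison against configurable bounds. The lower bound of 1280 × 720 follows the example in the text; the upper bound of 4160 × 3120 and the function name are assumptions chosen for illustration only.

```python
from typing import Tuple


def is_resolution_acceptable(
    width: int,
    height: int,
    min_size: Tuple[int, int] = (1280, 720),    # from the example in the text
    max_size: Tuple[int, int] = (4160, 3120),   # hypothetical upper bound
) -> bool:
    """Return True when both dimensions fall inside the preset pixel range."""
    return (min_size[0] <= width <= max_size[0]
            and min_size[1] <= height <= max_size[1])
```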
Step S406: recognize the upper-left region of the target picture.
Step S408: detect whether the upper-left region of the target picture contains a QR code; if so, perform step S410, otherwise perform step S412.
Specifically, based on the above scheme, the target picture can first be converted into a grayscale image, and all rectangular regions in the target picture can then be detected by polygon approximation. The Canny operator is then used to detect the edges within each rectangular region, the mean of the edge amplitudes within each rectangular region is calculated, and this mean is taken as the edge strength value of the corresponding rectangular region.
Further, if the edge strength value is greater than a preset strength value, the rectangular region is a QR code; otherwise the rectangular region is not a QR code. All rectangular regions are traversed in this way, and the preset strength value is set according to the practical application.
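The edge strength test can be sketched as follows. This illustration is a simplification under stated assumptions: it uses the Sobel gradient magnitude as a stand-in for the Canny edge amplitude named in the text, it takes the candidate rectangles as given instead of detecting them by polygon approximation, and all function names are hypothetical.

```python
from math import hypot
from typing import List, Tuple

Gray = List[List[int]]
Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def gradient_magnitude(gray: Gray, x: int, y: int) -> float:
    """Sobel gradient magnitude at (x, y), replicating pixels at the border."""
    h, w = len(gray), len(gray[0])

    def px(xx: int, yy: int) -> int:
        return gray[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    gx = (px(x + 1, y - 1) + 2 * px(x + 1, y) + px(x + 1, y + 1)) - (
        px(x - 1, y - 1) + 2 * px(x - 1, y) + px(x - 1, y + 1))
    gy = (px(x - 1, y + 1) + 2 * px(x, y + 1) + px(x + 1, y + 1)) - (
        px(x - 1, y - 1) + 2 * px(x, y - 1) + px(x + 1, y - 1))
    return hypot(gx, gy)


def edge_strength(gray: Gray, rect: Rect) -> float:
    """Mean gradient magnitude over all pixels inside the rectangle."""
    x0, y0, w, h = rect
    total = sum(gradient_magnitude(gray, x, y)
                for y in range(y0, y0 + h)
                for x in range(x0, x0 + w))
    return total / (w * h)


def find_code_regions(gray: Gray, rects: List[Rect], threshold: float) -> List[Rect]:
    """Keep the rectangles whose edge strength exceeds the preset value."""
    return [r for r in rects if edge_strength(gray, r) > threshold]
```

The intuition is that a QR code is densely packed with black-to-white transitions, so its mean edge response is much higher than that of a blank or lightly printed rectangle.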
Step S410: recognize the QR code and scan and decode it to obtain the corresponding QR code information contained in the target picture (that is, the invoice identification information), such as the invoice type, invoice number, invoice code, date of issue, amount excluding tax, check code, and randomly generated confidential information; then perform step S418.
Step S412: recognize the upper-right region of the target picture.
Step S414: correct the upper-right region of the target picture using a pattern recognition algorithm.
Specifically, a rectangular region R1 of size r1 × r2 is cropped from the upper right of the target picture. Using the upper-right corner vertex of this rectangular region as the reference point, the black-pixel region is searched for to the left or downward, the start point and end point of the black-pixel region are recorded, and a new rectangular region R2 is marked.
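One way to picture this search is as a scan that starts at the upper-right corner of R1, walks leftward along each row and downward row by row, and records the extent of the black pixels it meets. The sketch below treats the start and end points as the corners of the bounding box of all black pixels, which is an interpretation made for this example; the gray-level cut-off `black_below` and the function name are likewise assumptions.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]
Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def black_bounding_box(region: Gray, black_below: int = 128) -> Optional[Rect]:
    """Return the bounding box of black pixels in the region, or None.

    The scan starts from the upper-right corner: rows are visited top to
    bottom and, within each row, columns are visited right to left.
    """
    h, w = len(region), len(region[0])
    xs, ys = [], []
    for y in range(h):                    # downward from the top edge
        for x in range(w - 1, -1, -1):    # leftward from the right edge
            if region[y][x] < black_below:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```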
Further, each black character region in the rectangular region R2 is segmented out using the projection method and normalized. The sizes of r1 and r2 can be set according to practical application experience.
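The projection method can be sketched as follows: the black pixels of each column are counted, and every run of consecutive non-empty columns is treated as one character. The sketch assumes the region has already been binarized (1 for black, 0 for white) and that characters are separated by at least one empty column; the function name is hypothetical.

```python
from typing import List, Tuple


def segment_by_projection(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Split a binarized text line into character spans by vertical projection.

    Returns (start, end) column ranges, end exclusive, ordered left to right.
    """
    width = len(binary[0])
    # Vertical projection: number of black pixels in each column.
    column_sums = [sum(row[x] for row in binary) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(column_sums):
        if count > 0 and start is None:
            start = x                      # a character begins
        elif count == 0 and start is not None:
            spans.append((start, x))       # a character ends at a blank column
            start = None
    if start is not None:
        spans.append((start, width))       # character touching the right edge
    return spans
```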
Further, a trained hybrid deep learning network is used to perform character recognition on the black character regions.
The hybrid deep learning network is a deep learning network model that combines a deep belief network (DBN) with a deep Boltzmann machine (DBM). Its construction method includes the following steps: the DBN and the DBM are combined to build a 6-layer deep learning network; layer 1 is the input layer, whose input is a vector of fixed dimension (length); layer 6 is the prediction layer (or label layer), which uses a logistic output to produce the prediction result; layers 2 and 3 are composed of RBMs with fully connected undirected graph connections; layers 4 and 5 are composed of RBMs with directed graph connections; adjacent layers are connected by weight vectors.
The training method of the hybrid deep learning network includes the following steps: a sample library is established containing a total of n character samples (English letters and digits), each of size M × N, and the class of each character is labeled, where n > 10000; the larger the value of n, the better the network learns. Training uses "unsupervised pre-training + supervised fine-tuning": the deep network is first pre-trained layer by layer with different models to obtain better-performing weights, and the deep network is then fine-tuned by back-propagation using these weights as initial values.
Further, the constructed hybrid deep learning network is trained using the above sample library and training method.
Specifically, when character recognition is performed on a black character region, the region is first converted into a grayscale image, and its gray values are then input as a feature vector into the trained hybrid deep learning network, which recognizes the character according to the correspondence between gray values and character classes. The above steps are repeated to recognize each black character region, and the results are output in the reverse of the recognition order.
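The per-character step can be sketched as follows. The trained network is represented by a hypothetical `classify` callable, since the network itself is outside the scope of this example, and scaling the gray values to the range 0 to 1 is an assumption. One plausible reading of the reverse-order output is that regions are found from right to left, so reversing restores the reading order.

```python
from typing import Callable, List, Sequence

Gray = List[List[int]]


def to_feature_vector(gray_region: Gray) -> List[float]:
    """Flatten the gray values row by row and scale them to [0, 1]."""
    return [pixel / 255.0 for row in gray_region for pixel in row]


def recognize_regions(
    regions: Sequence[Gray],
    classify: Callable[[List[float]], str],
) -> str:
    """Classify each region in order, then output the characters reversed.

    `classify` stands in for the trained hybrid deep learning network.
    """
    recognized = [classify(to_feature_vector(region)) for region in regions]
    return "".join(reversed(recognized))
```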
Step S416: detect whether invoice identification information has been recognized in the upper-right region of the target picture, such as the invoice type, invoice code, invoice number, date of issue, purchaser name, purchaser taxpayer identification number, seller name, seller taxpayer identification number, amount, tax amount, and total of price and tax; if so, perform step S418, otherwise end the flow.
Step S418: connect to the invoice database and query the structured data according to the recognized invoice identification information.
The invoice identification information, which includes the invoice number, invoice code, date of issue, amount excluding tax, check code, and so on, is used to query the structured data of the full invoice face.
Step S420: detect whether data has been found in the invoice database; if so, perform step S422, otherwise perform step S424.
Step S422: return the obtained full invoice information, and display or print it. Further, the result can be connected to an enterprise information management system, or the structured data can be exported.
Further, operations such as entering the structured data and storing and managing the invoice image can also be performed.
Step S424: return the recognized invoice identification information.
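The branch in steps S418 to S424 can be sketched as a lookup with a fallback. The in-memory dictionary standing in for the invoice database, the key made of invoice code and invoice number, and the field names are assumptions made for this example; an actual system would query a database service.

```python
from typing import Dict, Tuple

Record = Dict[str, str]


def query_or_fallback(
    identification: Record,
    database: Dict[Tuple[str, str], Record],
) -> Dict[str, object]:
    """Return the full record when found, else the recognized identifiers.

    `database` is a hypothetical in-memory stand-in for the invoice database.
    """
    key = (identification["invoice_code"], identification["invoice_number"])
    record = database.get(key)                                  # steps S418/S420
    if record is not None:
        return {"status": "full", "data": record}               # step S422
    return {"status": "identification_only", "data": identification}  # step S424
```

Returning the identification information on a miss means the caller still receives whatever was recognized from the image, as step S424 describes.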
The above embodiment of the present invention integrates QR code recognition technology, image pattern recognition technology, a structured data query service, and a big data service. By organically combining QR code recognition, pattern recognition, and structured data query, the scheme of this embodiment can, when recognizing ordinary invoices, access picture stream data online over the Internet at any time, recognize the structured data of the full invoice face quickly and accurately, and return the structured data, providing the user with convenient and reliable invoice structured data.
Fig. 5 schematically shows a block diagram of an invoice information recognition device according to an embodiment of the invention.
With reference to Fig. 5, the invoice information recognition device 50 according to an embodiment of the invention includes: a receiving module 502, a judgment module 504, a first processing module 506, a second processing module 508, and an acquisition module 510.
The receiving module 502 is configured to receive an invoice image to be recognized. The judgment module 504 is configured to judge whether the invoice image contains a preset identification code. The first processing module 506 is configured to obtain the invoice identification information contained in the preset identification code when the judgment module 504 judges that the invoice image contains the preset identification code. The second processing module 508 is configured to obtain the invoice identification information contained in the preset region of the invoice image when the judgment module 504 judges that the invoice image does not contain the preset identification code. The acquisition module 510 is configured to query the invoice database according to the invoice identification information to obtain the full invoice information corresponding to the invoice image.
The recognition region of the invoice image is frame-selected, and targeted invoice information recognition is performed on the frame-selected recognition region. The required invoice information can be recognized quickly without recognizing the entire face of the invoice image, which effectively improves the efficiency of invoice information recognition and thus the user experience.
In some embodiments of the invention, based on the foregoing scheme, the judgment module 504 specifically includes: a processing submodule 5040, a conversion submodule 5042, a first extraction submodule 5044, a calculation submodule 5046, and a first determination submodule 5048, with reference to Fig. 6.
The processing submodule 5040 is configured to perform image preprocessing on the invoice image to obtain a target color image. The conversion submodule 5042 is configured to convert the target color image into a corresponding first grayscale image. The first extraction submodule 5044 is configured to extract all rectangular regions contained in the first grayscale image. The calculation submodule 5046 is configured to calculate the edge strength value of each rectangular region among all the rectangular regions. The first determination submodule 5048 is configured to determine that the invoice image contains the preset identification code when the edge strength value is greater than a preset threshold, and otherwise to determine that the invoice image does not contain the preset identification code.
In some embodiments of the invention, based on the foregoing scheme, the second processing module 508 specifically includes: a cropping submodule 5100, a second determination submodule 5102, a generation submodule 5104, a second extraction submodule 5106, and a recognition submodule 5108, with reference to Fig. 7.
The cropping submodule 5100 is configured to crop a first target rectangular region from the preset region. The second determination submodule 5102 is configured to search for and determine the black-pixel region contained in the first target rectangular region, using a preset corner vertex of the first target rectangular region as a reference point. The generation submodule 5104 is configured to mark and generate a second target rectangular region according to the start point and end point of the black-pixel region. The second extraction submodule 5106 is configured to extract all black character regions contained in the second target rectangular region. The recognition submodule 5108 is configured to perform character recognition on each black character region with the trained preset hybrid deep learning network to obtain the invoice identification information.
In some embodiments of the invention, based on the foregoing scheme, the recognition submodule 5108 is specifically configured to: convert each black character region into a corresponding second grayscale image; determine the gray values of the second grayscale image as a target feature vector; input the target feature vector into the preset hybrid deep learning network, and perform character recognition according to the correspondence between gray values and character classes to obtain the invoice identification information.
In some embodiments of the invention, based on the foregoing scheme, the invoice information recognition device further includes: a detection module 512, with reference to Fig. 5.
The detection module 512 is configured to detect, before the judgment module 504 judges whether the invoice image contains the preset identification code, whether the image pixels of the invoice image received by the receiving module 502 are within the preset pixel range.
Fig. 8 schematically shows a block diagram of a computer device according to an embodiment of the invention.
With reference to Fig. 8, the computer device 80 according to an embodiment of the invention includes a processor 802 and a memory 804. A computer program that can run on the processor 802 is stored in the memory 804, and the memory 804 and the processor 802 can be connected by a bus. When the processor 802 executes the computer program stored in the memory 804, it implements the steps of the invoice information recognition method described in the above embodiments.
The steps in the methods of the embodiments of the present invention can be reordered, merged, and deleted according to actual needs.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to the embodiments of the present invention, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of a single module or unit described above may be further divided among multiple modules or units.
From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described herein can be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solutions according to the embodiments of the present invention can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes instructions that cause a computing device (which may be a personal computer, a server, a touch terminal, a network device, etc.) to perform the method according to the embodiments of the present invention.
Other embodiments of the present invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed by the present invention. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the invention are indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.