CN109791643A - System and method for generating merged data for electronic documents - Google Patents


Info

Publication number
CN109791643A
Authority
CN
China
Prior art keywords
electronic document
data
expense
model
transaction
Legal status
Pending
Application number
CN201780058567.6A
Other languages
Chinese (zh)
Inventor
N. Guzman
I. Saft
Current Assignee
Vatbox Ltd
Original Assignee
Vatbox Ltd
Priority claimed from US15/361,934 (external priority; published as US20170154385A1)
Application filed by Vatbox Ltd
Publication of CN109791643A

Classifications

    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
                • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
                    • G06Q40/12: Accounting
                        • G06Q40/123: Tax preparation or submission
                • G06Q10/00: Administration; Management
                    • G06Q10/10: Office automation; Time management
                • G06Q20/00: Payment architectures, schemes or protocols
                    • G06Q20/08: Payment architectures
                        • G06Q20/14: Payment architectures specially adapted for billing systems
                    • G06Q20/38: Payment protocols; Details thereof
                        • G06Q20/40: Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
                            • G06Q20/405: Establishing or using transaction specific rules
                • G06Q30/00: Commerce
                    • G06Q30/06: Buying, selling or leasing transactions
                        • G06Q30/0601: Electronic shopping [e-shopping]
                            • G06Q30/0633: Lists, e.g. purchase orders, compilation or processing
                                • G06Q30/0635: Processing of requisition or of purchase orders

Abstract

A system and method for generating merged data based on electronic documents. The method includes: analyzing a first electronic document to determine at least one transaction parameter, the first electronic document indicating a transaction that includes at least one expense, wherein the first electronic document includes at least partially unstructured data; creating a model for the first electronic document, wherein the model is a structured dataset including the determined at least one transaction parameter; retrieving a second electronic document based on the model, wherein the second electronic document indicates proof of the transaction; determining at least one deductible expense of the at least one expense based on deduction rules and at least one of the model and the second electronic document; and generating merged metadata based on the determined at least one deductible expense.

Description

System and method for generating merged data for electronic documents
Cross reference to related applications
This application claims the benefit of U.S. Provisional Application No. 62/371,221, filed on August 5, 2016. This application is also a continuation-in-part of U.S. Patent Application No. 15/361,934, filed on November 28, 2016, which is now pending. The contents of the above-referenced applications are hereby incorporated by reference.
Technical field
The present disclosure relates generally to the verification of documents in data systems, and more specifically to validating requests based on the content of electronic documents.
Background
Customers can order services from merchants, such as transportation and accommodation, in real time over a network. Such orders can be received and processed immediately. Payment for an order, however, usually takes longer to complete, particularly when the transfer of funds must be guaranteed. Merchants therefore typically require customers to provide a payment guarantee in real time when the order is placed. For example, a customer may enter credit card information for payment, and the merchant may verify that information in real time before authorizing the sale. Verification typically includes determining whether the provided information is valid (i.e., whether the credit card number, expiration date, PIN, and/or customer name match known information).
Once such a guarantee is received, a purchase order is generated for the customer. The purchase order provides evidence of the order, such as the purchase price and the goods and/or services ordered. An invoice for the order is generated later. A purchase order generally indicates the products requested and their estimated or quoted prices, whereas an invoice generally indicates the products actually provided and their final prices. The purchase price shown on the invoice often differs from the price shown on the purchase order. For example, if a hotel guest initially books a three-night stay but ultimately stays a fourth night, the total on the purchase order will differ from the total on the subsequent invoice. Discrepancies between invoice totals and purchase order totals are difficult to track, particularly for large enterprises that receive many orders every day (e.g., a hotel chain managing hundreds or thousands of hotels in a given region). Such discrepancies may lead to errors in the enterprise's record keeping.
As enterprises increasingly rely on technology to manage operational data (such as invoice and purchase order data), systems that can properly manage and verify such data have become key to success. For large enterprises, the volume of data used each day is enormous, so reviewing and verifying it manually is impractical. Yet discrepancies between record-keeping documents can cause serious problems for an enterprise, such as failing to report income correctly to tax authorities.
Some existing solutions can automatically identify information in scanned documents (e.g., invoices and receipts) or in other unstructured electronic documents (e.g., unstructured text documents). These solutions, however, often struggle to accurately recognize the characters and other features of electronic documents. Moreover, degraded content in the input unstructured electronic documents typically leads to higher error rates. As a result, existing image recognition techniques are not fully accurate even under ideal conditions (i.e., when the images are very clear), and their accuracy often drops sharply when the input images are not clear enough. Furthermore, missing or incomplete data may cause errors during subsequent use of the data. Many existing solutions cannot identify missing data except in limited cases, for example, when a field in a structured dataset is left incomplete.
In addition, existing image recognition solutions may fail to accurately recognize some or all special characters (e.g., "!", "@", "#", "$", "%", "&", etc.). For example, some existing image recognition solutions may incorrectly recognize a dash in a scanned receipt as the digit "1". As another example, some existing image recognition solutions cannot recognize special characters such as the dollar sign ($) or the yen sign (¥).
Moreover, these solutions may face challenges when preparing the recognized information for subsequent use. Specifically, many such solutions either produce output in an unstructured format, or produce structured output only when the input electronic document is specifically formatted for recognition by the image recognition system. The resulting unstructured output usually cannot be processed efficiently. In particular, such unstructured output may contain duplicates and may include data that requires further processing before use.
Operating expenses are costs incurred in carrying on a trade or business. These expenses are often deductible. A deductible expense is an expense that may be subtracted from a company's income before taxes are levied. Common business deductions include, for example, general administrative expenses, business travel or entertainment expenses, vehicle expenses, and employee benefits. Some business expenses are "current" and must be deducted in the year they are paid, while others are "capitalized" and amortized or depreciated over time.
Some business expenditures, such as bribes, traffic tickets, non-uniform clothing, and unreasonably large expenses (e.g., a private jet for a small local business), are prohibited by law from being deducted. The rules and laws governing expense deductions vary by jurisdiction, which makes applying them correctly challenging, especially for large multinational companies that must determine which expenses are deductible. The problem is further complicated when expense reports and supporting documents (e.g., receipts and invoices) contain unstructured data, which may make processing inefficient or inaccurate. Determining deductible expenses is a serious problem: improperly submitted documents may incur legal penalties, while withholding documents for fear of such penalties may result in financial losses.
It would therefore be advantageous to provide a solution that overcomes the deficiencies of the prior art.
Summary of the invention
A summary of several example embodiments of the disclosure follows. This summary is provided for the convenience of the reader to provide a basic understanding of such embodiments and does not wholly define the breadth of the disclosure. This summary is not an extensive overview of all contemplated embodiments, and is intended neither to identify key or critical elements of all embodiments nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later. For convenience, the term "some embodiments" may be used herein to refer to a single embodiment or to multiple embodiments of the disclosure.
Certain embodiments disclosed herein include a method for generating merged data based on electronic documents. The method includes: analyzing a first electronic document to determine at least one transaction parameter, the first electronic document indicating a transaction that includes at least one expense, wherein the first electronic document includes at least partially unstructured data; creating a model for the first electronic document, wherein the model is a structured dataset including the determined at least one transaction parameter; retrieving a second electronic document based on the model, wherein the second electronic document indicates proof of the transaction; determining at least one deductible expense of the at least one expense based on deduction rules and at least one of the model and the second electronic document; and generating merged metadata based on the determined at least one deductible expense.
Certain embodiments disclosed herein also include a non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute a process for generating merged data based on electronic documents, the process comprising: analyzing a first electronic document to determine at least one transaction parameter, the first electronic document indicating a transaction that includes at least one expense, wherein the first electronic document includes at least partially unstructured data; creating a model for the first electronic document, wherein the model is a structured dataset including the determined at least one transaction parameter; retrieving a second electronic document based on the model, wherein the second electronic document indicates proof of the transaction; determining at least one deductible expense of the at least one expense based on deduction rules and at least one of the model and the second electronic document; and generating merged metadata based on the determined at least one deductible expense.
Certain embodiments disclosed herein also include a system for generating merged data based on electronic documents. The system includes: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: analyze a first electronic document to determine at least one transaction parameter, the first electronic document indicating a transaction that includes at least one expense, wherein the first electronic document includes at least partially unstructured data; create a model for the first electronic document, wherein the model is a structured dataset including the determined at least one transaction parameter; retrieve a second electronic document based on the model, wherein the second electronic document indicates proof of the transaction; determine at least one deductible expense of the at least one expense based on at least one deduction rule, the model, and the second electronic document; and generate merged metadata based on the determined at least one deductible expense.
Brief description of the drawings
The subject matter disclosed herein is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the disclosed embodiments will be apparent from the following detailed description taken in conjunction with the accompanying drawings.
Fig. 1 is a network diagram used to describe the various disclosed embodiments;
Fig. 2 is a schematic diagram of a verification system according to an embodiment;
Fig. 3 is a flowchart illustrating a method for merging electronic documents according to an embodiment; and
Fig. 4 is a flowchart illustrating a method for creating a dataset based on at least one electronic document according to an embodiment.
Detailed description
It is important to note that the embodiments disclosed herein are only examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed embodiments. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in the plural and vice versa with no loss of generality. In the drawings, like numerals refer to like parts throughout the several views.
The various disclosed embodiments include a method and system for merging electronic documents. In an embodiment, a dataset is created based on a first, expense report electronic document indicating information related to a transaction. A model of the transaction's attributes is created based on the dataset of the first electronic document.
Based on the created model, a second, proof electronic document providing evidence of the transaction is retrieved. The expense report electronic document and the proof electronic document are compared to determine whether the values of one or more transaction parameters indicated therein differ. When a discrepancy exists, its cause is determined. Deduction rules are retrieved based on one or more of the expense report model, the data of the proof electronic document, and features of the enterprise. Based on the rules, the expense report model, and the data of the proof electronic document, one or more deductible expenses are determined. Metadata indicating the determined deductible expenses is generated and sent to a business system.
In some embodiments, a merged expense report electronic document is generated based on deductible-expense metadata for multiple expense report electronic documents associated with, for example, different enterprises (such as different subsidiaries owned by the same parent company). The merged expense report electronic document may show the expenses of the different enterprises. Thus, the deductible-expense metadata provides merged data for reporting the combined expenses.
The disclosed embodiments allow electronic documents to be merged automatically with respect to the deductible expenses indicated therein. More specifically, the disclosed embodiments provide structured dataset models for electronic documents, which allows proof documents to be retrieved based on electronic expense reports that are unstructured, semi-structured, or otherwise lack a known structure. For example, embodiments of the disclosure may be used to efficiently analyze a scanned image of an expense report for a transaction, allowing more accurate identification of the portions of the expense report for which proof is requested and, consequently, more accurate identification of the appropriate documents to prove the transaction. The identified deductible expenses may be used to create a merged expense report indicating the necessary costs.
Fig. 1 shows an example network diagram 100 used to describe the various disclosed embodiments. In the example network diagram 100, a merging data generator 120, a business system 130, a database 140, and a plurality of network sources 150-1 through 150-N (hereinafter referred to individually as a network source 150 and collectively as network sources 150, merely for simplicity) are communicatively connected via a network 110. The network 110 may be, but is not limited to, a wireless, cellular, or wired network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), the Internet, the World Wide Web (WWW), similar networks, or any combination thereof.
The business system 130 is associated with an enterprise and may store data related to purchases made by the enterprise or by its representatives, as well as enterprise feature parameters indicating features of the enterprise such as, but not limited to, its country of location, revenue data, structure data, and the like. The enterprise may be, but is not limited to, a company whose employees may purchase goods and services (in particular, goods and services purchased abroad on which value-added tax must be paid). The business system 130 may be, but is not limited to, a server, a database, an enterprise resource planning system, a customer relationship management system, a personal computer (PC), a personal digital assistant (PDA), a mobile phone, a smartphone, a tablet computer, or any other system used to store the relevant data.
The purchase-related data may include, for example, electronic documents. Each electronic document stored by the business system 130 may indicate, for example, an expense report or evidence of a transaction (e.g., a receipt, an invoice, or a purchase confirmation). The data included in each electronic document may be structured, semi-structured, unstructured, or a combination thereof. Structured or semi-structured data may be in a format that the merging data generator 120 does not recognize, in which case it is treated as unstructured data.
The database 140 may store metadata generated by the merging data generator 120 for use in generating merged expense report electronic documents. The network sources 150 may store proof electronic documents, deduction rules, or both. A proof electronic document may serve as evidence supporting a request, for example, an invoice, a tax receipt, an order confirmation, and the like. Deduction rules define which expenses are deductible (e.g., based on type, amount, and the like) and may further define enterprise-related features such as, but not limited to, the company's country of location, revenue data, structure data (e.g., subsidies), and the like.
The network sources 150 may include, but are not limited to, merchant servers or devices, tax authority servers, accounting servers, databases associated with the enterprise, and the like. As a non-limiting example, the network source 150-1 may be a merchant server storing image files that show invoices for transactions made with the merchant associated with that server, and the network source 150-2 may be a tax authority server storing rules for deducting expenses incurred in a particular country.
In an embodiment, the merging data generator 120 is configured to create a model using transaction parameters identified, via machine vision, in a first expense report electronic document that indicates information related to a transaction including one or more expenses. In a further embodiment, the merging data generator 120 may be configured to retrieve the expense report electronic document from, for example, the business system 130. Based on the created model, the merging data generator 120 is configured to retrieve, from one of the network sources 150, a second, proof electronic document indicating information that proves the transaction.
In an embodiment, the merging data generator 120 is configured to create datasets for electronic documents that include data at least partially lacking a known structure (e.g., unstructured data, semi-structured data, or structured data having an unknown structure). To this end, the merging data generator 120 may be further configured to use optical character recognition (OCR) or other image processing to determine the data in the electronic documents. Accordingly, the merging data generator may include, or be communicatively connected to, a recognition processor (e.g., the recognition processor 235 shown in Fig. 2).
In an embodiment, the merging data generator 120 is configured to analyze the dataset created for the expense report electronic document to identify transaction parameters related to the transaction indicated therein. The transaction parameters indicate information about one or more expenses. The merging data generator 120 is configured to create a model based on the dataset. Each model is a structured dataset that includes the identified transaction parameters of the transaction.
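As a rough illustration of turning recognized text into a structured model, the sketch below parses OCR output with regular expressions into a small dataclass. The class `ExpenseModel`, its field names, and the regex patterns are hypothetical assumptions made for this example; the disclosure does not specify how individual fields are extracted.

```python
import re
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class ExpenseModel:
    """Hypothetical structured dataset ("model") of transaction parameters."""
    transaction_id: Optional[str]
    merchant: Optional[str]
    txn_date: Optional[date]
    total: Optional[Decimal]
    currency: Optional[str]


# Illustrative patterns only; real receipts and invoices vary widely in layout.
_PATTERNS = {
    "transaction_id": re.compile(
        r"(?:Invoice|Receipt|Txn)\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.I),
    "merchant": re.compile(r"^Merchant:\s*(.+)$", re.I | re.M),
    "txn_date": re.compile(r"(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(
        r"\bTotal\s*:?\s*([A-Z]{3})?\s*([0-9][0-9,]*(?:\.[0-9]{2})?)", re.I),
}


def build_model(ocr_text: str) -> ExpenseModel:
    """Extract transaction parameters from unstructured OCR text.

    Fields that cannot be found are left as None, so missing data stays
    visible instead of being silently dropped.
    """
    found = {name: pat.search(ocr_text) for name, pat in _PATTERNS.items()}
    tid, merchant, day, total = (
        found[k] for k in ("transaction_id", "merchant", "txn_date", "total"))
    return ExpenseModel(
        transaction_id=tid.group(1) if tid else None,
        merchant=merchant.group(1).strip() if merchant else None,
        txn_date=date.fromisoformat(day.group(1)) if day else None,
        # Thousands separators are stripped before converting to Decimal.
        total=Decimal(total.group(2).replace(",", "")) if total else None,
        currency=total.group(1).upper() if total and total.group(1) else None,
    )
```

Keeping unmatched fields as `None` is one way to make the missing-data problem described in the background explicit rather than hidden.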
In an embodiment, the merging data generator 120 is configured to retrieve a second, proof electronic document based on the expense report model. The retrieved proof electronic document matches the expense report electronic document, for example, with respect to a set of uniquely identifying transaction parameters present in each of the two documents. For example, the retrieved proof electronic document may have the same transaction identifier, or the same date and merchant identifier. If no matching second, proof electronic document can be retrieved, the merging data generator 120 may be configured to determine that the expenses indicated in the expense report electronic document are not deductible.
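The matching step can be sketched as follows, assuming candidate proof documents have already been reduced to the same minimal model as the expense report. The `DocModel` class and the `find_proof` function are hypothetical; the fallback from transaction ID to the (date, merchant) pair simply mirrors the two examples given in the text.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Iterable, Optional


@dataclass(frozen=True)
class DocModel:
    """Hypothetical minimal model shared by expense reports and proof documents."""
    transaction_id: Optional[str]
    merchant: Optional[str]
    txn_date: Optional[date]
    total: Decimal


def find_proof(report: DocModel,
               candidates: Iterable[DocModel]) -> Optional[DocModel]:
    """Return the first candidate proof document that matches the report.

    Documents are matched on transaction ID when both carry one; otherwise
    the (date, merchant) pair is compared instead.
    """
    for proof in candidates:
        if report.transaction_id and proof.transaction_id:
            if report.transaction_id == proof.transaction_id:
                return proof
        elif (report.txn_date is not None and report.merchant is not None
              and report.txn_date == proof.txn_date
              and report.merchant == proof.merchant):
            return proof
    return None
```

A `None` result corresponds to the case described above in which no matching proof document exists, so the reported expenses would be treated as not deductible.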
Using a structured model makes it possible to determine whether an expense is deductible more efficiently and accurately than, for example, using unstructured data. Specifically, each deduction rule need only be analyzed against the relevant portions of the expense report electronic document (e.g., the portions held in specific fields of the structured model). This reduces the number of times each rule is applied and reduces false positives caused by applying rules to data that is irrelevant to them. Furthermore, the uniquely identifying transaction parameters used to retrieve the corresponding proof electronic document can be extracted from specific fields of the created model, rather than requiring a comparison against all of the unstructured data of the expense report electronic document.
In an embodiment, based on a comparison between the model and the retrieved proof electronic document, the merging data generator 120 is configured to determine whether the value of one or more transaction parameters differs between the compared electronic documents. To this end, the comparison may include comparing transaction parameters in the created model with transaction parameters indicated in the proof electronic document. The transaction parameters compared to determine discrepancies may be expense-related parameters and, more specifically, parameters for which proof is required in order to successfully deduct the expense. For example, the compared transaction parameters may include the price of each expense (e.g., a purchased good or service). A discrepancy may be, for example, a numerical difference (e.g., in price, quantity, or both), a proportional difference, and the like.
In some embodiments, a proof model may further be created for the proof electronic document, and the proof model may be compared with the expense report model field by field for the transaction parameters being compared. For example, the data in the "price" field of each model may be compared. Comparing the data of structured models may further allow discrepancies to be determined more accurately and efficiently.
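A field-by-field comparison of two structured models might look like the sketch below, assuming each model has been reduced to a mapping from field name (such as "price") to a `Decimal` value. The function name `compare_fields` and the sign convention (proof value minus report value) are assumptions for illustration; the text only states that a discrepancy may be numerical or proportional.

```python
from decimal import Decimal
from typing import Dict, Mapping


def compare_fields(report: Mapping[str, Decimal],
                   proof: Mapping[str, Decimal]) -> Dict[str, dict]:
    """Compare the numeric fields present in both models.

    For each field whose values differ, return the numerical difference
    (proof minus report) and the ratio proof / report (None if the report
    value is zero). Fields present in only one model are ignored here.
    """
    diffs = {}
    for field in report.keys() & proof.keys():
        expected, actual = report[field], proof[field]
        if expected != actual:
            diffs[field] = {
                "difference": actual - expected,
                "ratio": (actual / expected) if expected else None,
            }
    return diffs
```

Because only named fields are compared, unrelated text on either document never enters the comparison.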
In an embodiment, when a discrepancy is determined, the merging data generator 120 may be further configured to determine its cause. The cause is determined based on one or more causality rules applied to the determined discrepancy and the compared transaction parameters. The causality rules may relate to, for example, discrepancies arising from additional or refunded purchases, currency exchange rate differences, taxes not collected at the time of purchase (e.g., value-added tax), incidental expenses, tips, and other potential causes indicated by the transaction parameter values.
For example, a price difference of +$100.51 may be associated with an additional night's hotel stay, while a price difference of -$100.51 may be associated with a refund for one night in a room costing $100.51 per night.
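One possible causality rule for the hotel example can be sketched as below. It assumes the nightly room rate is known and treats a difference that is an exact multiple of that rate as extra or refunded nights. The function `infer_cause` and its string labels are hypothetical, and the other causes listed above (exchange rates, uncollected VAT, tips) would each need their own rule.

```python
from decimal import Decimal
from typing import Optional


def infer_cause(difference: Decimal, nightly_rate: Decimal) -> Optional[str]:
    """Guess why a proof document differs from the expense report.

    Hypothetical rule: a difference that is a whole multiple of the nightly
    rate is read as additional nights (positive) or refunded nights
    (negative). Anything else is reported as "unexplained" so that other
    rules can examine it.
    """
    if difference == 0 or nightly_rate <= 0:
        return None
    nights, remainder = divmod(abs(difference), nightly_rate)
    if remainder == 0:
        kind = "additional_stay" if difference > 0 else "refund"
        return f"{kind}:{int(nights)}_night(s)"
    return "unexplained"
```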
Whether an expense is deductible may be determined based on the cause of the discrepancy. To this end, certain predetermined causes may be associated with non-deductible expenses. For example, a cause indicating a refund for one night's hotel stay may result in a determination not to deduct the full payment (including the refunded night). In a further embodiment, whether an expense is partially deductible is determined based on the cause of the discrepancy. For example, if the cause is a partial refund (e.g., the fee for one night of a three-night hotel stay was refunded), the portion of the cost that was not refunded may be determined to be deductible.
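The cause-dependent treatment described above can be sketched as a small function. The cause labels ("full_refund", "partial_refund"), the set `NON_DEDUCTIBLE_CAUSES`, and the pass-through behavior for any other cause are all assumptions for illustration, not rules stated in the disclosure.

```python
from decimal import Decimal

# Hypothetical set of discrepancy causes that make the affected charge
# non-deductible; a real system would derive these from configured rules.
NON_DEDUCTIBLE_CAUSES = {"full_refund"}


def deductible_amount(charged: Decimal, cause: str,
                      refunded: Decimal = Decimal("0")) -> Decimal:
    """Return how much of a charge remains deductible given a discrepancy cause.

    - causes in NON_DEDUCTIBLE_CAUSES: nothing is deductible;
    - "partial_refund": only the portion that was not refunded is deductible;
    - any other cause: the charge passes through unchanged (an assumption).
    """
    if cause in NON_DEDUCTIBLE_CAUSES:
        return Decimal("0")
    if cause == "partial_refund":
        if not Decimal("0") <= refunded <= charged:
            raise ValueError("refund must be between 0 and the amount charged")
        return charged - refunded
    return charged
```

For the three-night example, charging 301.53 and refunding one 100.51 night leaves 201.02 as the deductible portion.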
In an embodiment, the merging data generator 120 is configured to retrieve deduction rules from one or more network sources 150 based on the enterprise feature parameters, the expense report electronic document, the proof electronic document, or a combination thereof. For example, the deduction rules may be retrieved based on the enterprise's country of location (e.g., from a network source 150 of the tax authority of that country), the enterprise's structure, its most recent annual revenue, a combination thereof, and the like.
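Selecting rules by enterprise features can be sketched as a simple filter, assuming the rules have already been fetched into memory. The `DeductionRule` record, its fields, and the placeholder country codes "XX" and "YY" are hypothetical; the sample rules in any usage are illustrative only and are not real tax rules of any jurisdiction.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass(frozen=True)
class DeductionRule:
    """Hypothetical deduction rule record, as might be fetched from a network source."""
    country: str                         # jurisdiction the rule belongs to
    category: str                        # expense category the rule governs
    deductible: bool
    cap: Optional[Decimal] = None        # per-expense ceiling, if any
    min_revenue: Decimal = Decimal("0")  # applies only at or above this annual revenue


def select_rules(rules: List[DeductionRule], country: str,
                 annual_revenue: Decimal) -> List[DeductionRule]:
    """Keep only the rules relevant to an enterprise's country and revenue."""
    return [r for r in rules
            if r.country == country and annual_revenue >= r.min_revenue]
```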
In an embodiment, the merging data generator 120 is configured to apply the retrieved rules to the expense report model, the data of the proof electronic document, the enterprise features, or a combination thereof, to determine whether each expense indicated in the expense report electronic document is deductible. Additionally, a deductible amount may be determined for each expense. For example, the deductible amount may be determined as a portion of the total expense or as a partial amount of the expense (e.g., when the expense is determined to be partially deductible as described above).
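Applying rules to the relevant model fields can be sketched as follows. The `Rule` and `Expense` classes, the optional per-expense `cap`, and the choice to treat expenses with no matching rule as non-deductible are assumptions made for this example. Only the category and amount fields are consulted, which reflects the point above that rules need only be applied to the relevant parts of the structured model.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    category: str
    deductible: bool
    cap: Optional[Decimal] = None  # hypothetical per-expense ceiling


@dataclass(frozen=True)
class Expense:
    description: str
    category: str
    amount: Decimal


def deductible_amounts(expenses: List[Expense],
                       rules: List[Rule]) -> Dict[str, Decimal]:
    """Map each expense description to its deductible amount.

    Expenses without a matching rule are treated as non-deductible here
    (a conservative assumption). Descriptions are assumed to be unique.
    """
    by_category = {r.category: r for r in rules}
    result = {}
    for e in expenses:
        rule = by_category.get(e.category)
        if rule is None or not rule.deductible:
            result[e.description] = Decimal("0")
        elif rule.cap is not None:
            result[e.description] = min(e.amount, rule.cap)
        else:
            result[e.description] = e.amount
    return result
```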
In an embodiment, the merging data generator 120 is configured to generate metadata for the determined deductible expenses. The metadata may include, for example, the deductible amounts, an indication of which expenses of a transaction are deductible, the causes of discrepancies between the expense report and the proof document, combinations thereof, and the like. The merging data generator 120 may be further configured to generate a notification indicating the metadata.
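A minimal sketch of serializing such metadata as JSON is shown below. The key names and the `build_metadata` function are hypothetical; amounts are emitted as strings so that `Decimal` precision is not lost.

```python
import json
from decimal import Decimal
from typing import Dict, Optional


def build_metadata(report_id: str, deductible: Dict[str, Decimal],
                   discrepancy_cause: Optional[str] = None) -> str:
    """Serialize deductible-expense metadata for one expense report as JSON."""
    payload = {
        "report_id": report_id,
        # Which expenses of the transaction are (at least partly) deductible.
        "deductible_expenses": sorted(k for k, v in deductible.items() if v > 0),
        "deductible_amounts": {k: str(v) for k, v in deductible.items()},
        "total_deductible": str(sum(deductible.values(), Decimal("0"))),
        "discrepancy_cause": discrepancy_cause,
    }
    return json.dumps(payload, sort_keys=True)
```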
The generated metadata may be used as consolidated data for creating a consolidated expense report. To this end, in some embodiments, the consolidated data generator 120 may be configured to generate a consolidated expense report electronic document based on multiple sets of metadata. The sets of metadata may relate to different expense reports, and may further relate to expense reports from different enterprises. Thus, the consolidated expense report electronic document may consolidate expenses for purposes such as tax filing.
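Consolidation across reports and enterprises can be pictured as grouping metadata records and summing deductible amounts. This is a minimal sketch, assuming each metadata set is a list of per-expense records and that the consolidated report only needs per-enterprise totals; `ExpenseMetadata` and `consolidate` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ExpenseMetadata:
    enterprise_id: str
    report_id: str
    expense_id: str
    deductible_amount: float


def consolidate(metadata_sets: Iterable[Iterable[ExpenseMetadata]]) -> Dict[str, float]:
    """Sum deductible amounts per enterprise across many metadata sets.

    Each inner iterable is the metadata generated for one expense report.
    """
    totals: Dict[str, float] = defaultdict(float)
    for metadata_set in metadata_sets:
        for item in metadata_set:
            totals[item.enterprise_id] += item.deductible_amount
    return dict(totals)
```

A real consolidated report would likely keep per-expense detail as well; totals are shown here only to make the grouping step concrete.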
It should be noted that the embodiments described above with respect to a single enterprise system 130 in Fig. 1 are described merely for simplicity and do not limit the disclosed embodiments. Multiple enterprise systems may be equally utilized without departing from the scope of the disclosure.
Fig. 2 is an example schematic diagram of the consolidated data generator 120 according to an embodiment. The consolidated data generator 120 includes a processing circuitry 210 coupled to a memory 215, a storage 220, and a network interface 240. In an embodiment, the consolidated data generator 120 includes an optical character recognition (OCR) processor 230. In another embodiment, the components of the consolidated data generator 120 are communicatively connected via a bus 250.
The processing circuitry 210 may be realized as one or more hardware logic components and circuits. For example, and without limitation, illustrative types of hardware logic components that can be used include field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip systems (SOCs), general-purpose microprocessors, microcontrollers, digital signal processors (DSPs), and the like, or any other hardware logic components that can perform calculations or other manipulations of information.
The memory 215 may be volatile (e.g., RAM), non-volatile (e.g., ROM, flash memory), or a combination thereof. In one configuration, computer readable instructions to implement one or more embodiments disclosed herein may be stored in the storage 220.
In another embodiment, the memory 215 is configured to store software. Software shall be construed broadly to mean any type of instructions, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. Instructions may include code (e.g., in source code format, binary code format, executable code format, or any other suitable format of code). The instructions, when executed by one or more processors, cause the processing circuitry 210 to perform the various processes described herein. Specifically, the instructions, when executed, cause the processing circuitry 210 to generate consolidated data based on electronic documents, as discussed herein.
The storage 220 may be magnetic storage, optical storage, and the like, and may be realized, for example, as flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVDs), or any other medium that can be used to store the desired information.
The OCR processor 230 may include, but is not limited to, a feature and/or pattern recognition processor (RP) 235 configured to identify patterns, features, or both in unstructured data sets. Specifically, in an embodiment, the OCR processor 230 is configured to at least identify characters in the unstructured data. The identified characters may be utilized to create datasets for matching electronic documents, including datasets containing the data required for verifying a request.
The network interface 240 allows the consolidated data generator 120 to communicate with the enterprise system 130, the database 140, the network sources 150, or a combination thereof, for purposes such as collecting metadata, retrieving data, storing data, and the like.
It should be understood that the embodiments described herein are not limited to the specific architecture illustrated in Fig. 2, and other architectures may be equally used without departing from the scope of the disclosed embodiments.
Fig. 3 is an example flowchart 300 illustrating a method for generating consolidated data based on electronic documents according to an embodiment. In an embodiment, the method may be performed by a consolidated data generator (e.g., the consolidated data generator 120).
At S310, a dataset is created based on a first electronic document, namely an expense report electronic document including information related to a transaction. The transaction includes one or more expenses. The expense report electronic document may include, but is not limited to, unstructured data, semi-structured data, structured data having an unanticipated or unannounced structure, or a combination thereof. In an embodiment, S310 may include analyzing the expense report electronic document using optical character recognition (OCR) to determine data in the electronic document, identifying key fields in the data, identifying values in the data, or a combination thereof. Creating datasets based on electronic documents is described further below with respect to Fig. 4.
At S320, the expense report dataset is analyzed. In an embodiment, analyzing the expense report dataset may include, but is not limited to, determining transaction parameters such as, but not limited to, at least one entity identifier (e.g., a consumer enterprise identifier, a merchant enterprise identifier, or both), information related to the transaction (e.g., a date, a time, a price, a type of good or service sold, and the like), or both. In another embodiment, analyzing the expense report dataset further includes identifying the expenses of the transaction based on the expense report dataset.
At S330, a model is created based on the expense report dataset. The model may be, but is not limited to, a data structure including multiple fields. The fields may include the determined transaction parameters and may be predefined.
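A model with predefined fields maps naturally onto a record type. This is a minimal sketch, assuming the analyzed dataset is a flat dictionary and that fields not in the predefined set (such as a logo) are simply dropped; `TransactionModel`, its field names, and `create_model` are hypothetical.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class TransactionModel:
    """A model with predefined fields; every field is optional until filled."""
    transaction_id: Optional[str] = None
    customer_id: Optional[str] = None   # consumer enterprise identifier
    merchant_id: Optional[str] = None   # merchant enterprise identifier
    date: Optional[str] = None
    price: Optional[float] = None
    item_type: Optional[str] = None     # type of good or service sold


def create_model(dataset: Dict[str, Any]) -> TransactionModel:
    """Copy only the predefined fields from an analyzed dataset into the model."""
    known = {f.name for f in fields(TransactionModel)}
    return TransactionModel(**{k: v for k, v in dataset.items() if k in known})
```

Because every model shares the same field set, later steps (retrieval by transaction ID, field-by-field comparison) can address values by name instead of re-parsing the document.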
Creating models from electronic documents allows for faster processing due to the structured nature of the created models. For example, query and manipulation operations can be performed more efficiently on structured datasets than on datasets lacking such structure. Further, organizing information from electronic documents into structured datasets can substantially reduce the amount of storage required to save the information contained in the electronic documents. Electronic documents are often images, which require more storage space than datasets containing the same information. For example, datasets representing the data of 100,000 image electronic documents can be saved as data records in a text file, and such a text file is significantly smaller than the 100,000 images themselves.
At S340, a second electronic document, namely an evidencing electronic document, is retrieved based on the created model. The retrieved evidencing electronic document provides evidence of the transaction indicated in the expense report electronic document. In an embodiment, S340 includes searching at least one network source based on a set of transaction parameters in the model that uniquely identifies the transaction. As a non-limiting example, the transaction identification number "123456789" indicated in a "transaction ID" field of the first model may be used as a search query to find a second electronic document whose metadata includes the transaction identification number "123456789". In another embodiment, S340 includes selecting the at least one network source based on the first model. In some embodiments, if no evidencing electronic document is retrieved (i.e., if no evidencing electronic document matches the expense report electronic document), the expenses indicated in the expense report electronic document may be determined to be nondeductible, and execution terminates.
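The transaction-ID example above can be sketched as a metadata match over search results. This assumes the candidate documents have already been returned by a network-source search as dictionaries with a `metadata` mapping; `find_evidencing_document` is a hypothetical name, and a real search would likely use more than one identifying parameter.

```python
from typing import Any, Dict, Iterable, Optional


def find_evidencing_document(
    model: Dict[str, Any],
    candidates: Iterable[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Return the first candidate whose metadata carries the model's transaction ID.

    `candidates` stands in for documents returned by a network-source search.
    Returns None when nothing matches, in which case the caller may treat the
    reported expenses as nondeductible and stop.
    """
    tx_id = model.get("transaction_id")
    if tx_id is None:
        return None
    for doc in candidates:
        if doc.get("metadata", {}).get("transaction_id") == tx_id:
            return doc
    return None
```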
At optional S350, it is determined, based on the model and the retrieved evidencing electronic document, whether there is a discrepancy in one or more of the transaction parameters. In an embodiment, S350 includes comparing the transaction parameters in the created model with the corresponding data in the evidencing electronic document. In another embodiment, S350 may further include creating a model for the evidencing electronic document, and comparing data in one or more fields of the expense report electronic document model with data in the corresponding fields of the evidencing electronic document model. In some embodiments, if one or more pairs of compared transaction parameters differ, each expense associated with the differing transaction parameters may be determined to be nondeductible.
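The field-by-field comparison and the "differing parameter makes its expenses nondeductible" embodiment can be sketched as two small functions. This assumes both models are flat dictionaries and that the association between fields and expenses is supplied as a mapping; `find_discrepancies`, `flag_nondeductible`, and `field_to_expenses` are hypothetical names.

```python
from typing import Any, Dict, Iterable, Set, Tuple


def find_discrepancies(
    report_model: Dict[str, Any],
    evidence_model: Dict[str, Any],
    compared_fields: Iterable[str],
) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (report value, evidence value)} for every field that differs."""
    return {
        name: (report_model.get(name), evidence_model.get(name))
        for name in compared_fields
        if report_model.get(name) != evidence_model.get(name)
    }


def flag_nondeductible(
    discrepancies: Dict[str, Tuple[Any, Any]],
    field_to_expenses: Dict[str, Iterable[str]],
) -> Set[str]:
    """Collect the IDs of expenses associated with any differing field."""
    flagged: Set[str] = set()
    for name in discrepancies:
        flagged.update(field_to_expenses.get(name, ()))
    return flagged
```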
In some embodiments, S350 may further include determining the reason for the discrepancy. Based on the determined reason, one or more expenses indicated in the expense report electronic document may be determined to be nondeductible or only partially deductible. In some embodiments, when the discrepancy is an amount discrepancy (e.g., when the price of an expense indicated in the expense report differs from the price indicated in the transaction invoice), the higher of the two amounts may be determined to be nondeductible.
At S360, one or more deduction rules are retrieved from a network source. The deduction rules are retrieved based on enterprise parameters of an enterprise (e.g., the enterprise associated with the expense report electronic document), and may further be retrieved based on the transaction parameters of the transaction. Specifically, deduction rules may vary based on the country in which the enterprise is located, the structure of the enterprise (e.g., subsidiary or parent company), the operating income of the enterprise, and the like.
At S370, the retrieved deduction rules are applied to the transaction parameters indicated in the expense report electronic document, the evidencing electronic document, or both. In some embodiments, the deduction rules are not applied to transaction parameters of expenses that were determined to be nondeductible. The results of applying the deduction rules may include, but are not limited to, each deductible expense, the deductible amount of each deductible expense, or both.
At S380, metadata including the determined deductible expenses, deductible amounts, or both is generated. The metadata may be used together with metadata of other expense reports to generate a consolidated expense report, thereby consolidating the expense reports.
At optional S390, a notification may be generated. The notification may include the metadata, an indication of the deductible expenses, the determined reasons for discrepancies, or a combination thereof. In another embodiment, when one or more expenses are determined to be nondeductible, the notification may indicate the nondeductible expenses.
Fig. 4 is an example flowchart S310 illustrating a method for creating a dataset based on an electronic document according to an embodiment.
At S410, an electronic document is obtained. Obtaining the electronic document may include, but is not limited to, receiving the electronic document (e.g., receiving a scanned image) or retrieving the electronic document (e.g., from a consumer enterprise system, a merchant enterprise system, or a database).
At S420, the electronic document is analyzed. The analysis may include, but is not limited to, using optical character recognition (OCR) to determine the characters in the electronic document.
At S430, key fields and values in the electronic document are identified based on the analysis. The key fields may include, but are not limited to, the merchant's name and address, the date, the currency, the goods or services sold, a transaction identifier, an invoice number, and the like. An electronic document may include unnecessary details that are not considered key values; for example, a merchant's logo may not be needed and is therefore not a key value. In an embodiment, a list of key fields may be predefined, and pieces of data that match the key fields may be extracted. A cleaning process is then performed to ensure that the information is accurately presented. For example, if OCR causes data to be presented as "1211212005", the cleaning process may convert this data to 12/12/2005. As another example, a name presented as "Mo$den" will be changed to "Mosden". The cleaning process may be performed using external information resources such as dictionaries, calendars, and the like.
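The extraction and cleaning steps can be sketched with the standard library. This is a minimal illustration, assuming OCR output arrives as `key: value` lines, that dates follow MM/DD/YYYY with both slashes misread as the digit 1, and that a small in-memory list stands in for the external dictionary; `KEY_FIELDS`, `extract_key_fields`, `clean_date`, and `clean_name` are hypothetical. The date heuristic would misfire on a genuine 10-digit number matching the pattern, so a real system would also use field context.

```python
import difflib
import re
from typing import Dict, Iterable, Sequence

# Hypothetical predefined list of key fields.
KEY_FIELDS = ("merchant", "date", "total")


def extract_key_fields(ocr_lines: Iterable[str]) -> Dict[str, str]:
    """Keep only 'key: value' lines whose key is in the predefined list."""
    extracted: Dict[str, str] = {}
    for line in ocr_lines:
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in KEY_FIELDS:
            extracted[key] = value.strip()
    return extracted


def clean_date(raw: str) -> str:
    """Repair an MM/DD/YYYY date whose slashes OCR misread as the digit 1."""
    match = re.fullmatch(r"(\d{2})1(\d{2})1(\d{4})", raw)
    return "/".join(match.groups()) if match else raw


def clean_name(raw: str, dictionary: Sequence[str]) -> str:
    """Map a garbled name onto the closest entry of an external dictionary."""
    candidate = raw.replace("$", "s").lower()  # '$' is a common OCR confusion for 's'
    lookup = {name.lower(): name for name in dictionary}
    best = difflib.get_close_matches(candidate, list(lookup), n=1, cutoff=0.8)
    return lookup[best[0]] if best else raw
```

Applied to the examples above, `clean_date("1211212005")` returns `"12/12/2005"` and `clean_name("Mo$den", ["Mosden"])` returns `"Mosden"`, while a logo line is never extracted because its key is not in the predefined list.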
In another embodiment, the extracted pieces of data are checked for completeness. For example, if the merchant's name can be identified but its address is missing, the key field for the merchant address is incomplete. An attempt is then made to complete the missing key values. The attempt may include querying external systems and databases, correlating with previously analyzed invoices, or a combination thereof. Examples of external systems and databases include business directories, Universal Product Code (UPC) databases, parcel delivery and tracking systems, and the like. In an embodiment, S430 results in a complete set of the predefined key fields and their respective values.
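The completeness check and fill-in attempt can be sketched as follows, assuming a fixed set of required key fields and an in-memory mapping standing in for an external business directory keyed by merchant name; `REQUIRED_FIELDS`, `missing_fields`, and `complete_record` are hypothetical names, and querying a real directory is out of scope.

```python
from typing import Dict, List, Mapping

# Hypothetical set of key fields that must all be present.
REQUIRED_FIELDS = ("merchant_name", "merchant_address", "date", "currency")


def missing_fields(record: Mapping[str, str]) -> List[str]:
    """List required key fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def complete_record(record: Mapping[str, str],
                    directory: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Fill missing key values from an external directory keyed by merchant name.

    Returns a new dict; the input record is left unchanged. Fields the
    directory cannot supply remain missing.
    """
    completed = dict(record)
    entry = directory.get(completed.get("merchant_name", ""), {})
    for name in missing_fields(completed):
        if entry.get(name):
            completed[name] = entry[name]
    return completed
```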
At S440, a structured dataset is generated. The generated dataset includes the identified key fields and values.
It should be understood that any reference to an element herein using a designation such as "first," "second," and so forth does not generally limit the quantity or order of those elements. Rather, these designations are generally used herein as a convenient means of distinguishing between two or more elements or instances of an element. Thus, a reference to first and second elements does not mean that only two elements may be employed or that the first element must precede the second element in some manner. Also, unless stated otherwise, a set of elements comprises one or more elements.
As used herein, the phrase "at least one of" followed by a listing of items means that any of the listed items can be utilized individually, or any combination of two or more of the listed items can be utilized. For example, if a system is described as including "at least one of A, B, and C," the system can include A alone; B alone; C alone; A and B in combination; B and C in combination; A and C in combination; or A, B, and C in combination.
The various embodiments disclosed herein can be implemented as hardware, firmware, software, or any combination thereof. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit or computer readable medium consisting of parts, or of certain devices and/or a combination of devices. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units ("CPUs"), a memory, and input/output interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU, whether or not such a computer or processor is explicitly shown. In addition, various other peripheral units may be connected to the computer platform, such as an additional data storage unit and a printing unit. Furthermore, a non-transitory computer readable medium is any computer readable medium except for a transitory propagating signal.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the disclosed embodiments and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosed embodiments, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, such equivalents are intended to include both currently known equivalents and equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Claims (19)

1. A method for generating consolidated data based on electronic documents, comprising:
analyzing a first electronic document to determine at least one transaction parameter, wherein the first electronic document indicates a transaction including at least one expense, and wherein the first electronic document includes at least partially unstructured data;
creating a model for the first electronic document, wherein the model is a structured dataset including the determined at least one transaction parameter;
retrieving a second electronic document based on the model, wherein the second electronic document is evidence of the transaction;
determining at least one deductible expense of the at least one expense based on at least one deduction rule, the model, and the second electronic document; and
generating consolidated metadata based on the determined at least one deductible expense.
2. The method of claim 1, wherein determining the at least one transaction parameter further comprises:
identifying at least one key field and at least one value in the first electronic document;
creating a dataset based on the first electronic document, wherein the created dataset includes the at least one key field and the at least one value; and
analyzing the created dataset, wherein the at least one transaction parameter is determined based on the analysis.
3. The method of claim 2, wherein identifying the at least one key field and the at least one value further comprises:
analyzing the first electronic document to determine data in the first electronic document; and
extracting, based on a predetermined list of key fields, at least a portion of the determined data, wherein the at least a portion of the determined data matches at least one key field of the predetermined list of key fields.
4. The method of claim 3, wherein analyzing the first electronic document further comprises:
performing optical character recognition on the first electronic document.
5. The method of claim 1, further comprising:
determining, based on the second electronic document and the model created for the first electronic document, whether there is a discrepancy between the first electronic document and the second electronic document with respect to at least one of the at least one expense; and
for each determined discrepancy, determining whether the corresponding expense is nondeductible.
6. The method of claim 5, wherein the second electronic document includes at least partially unstructured data, the method further comprising:
creating a structured dataset model for the second electronic document; and
comparing data in at least one field of the first electronic document model with data in at least one corresponding field of the second electronic document model, wherein the discrepancy is determined based on the comparison.
7. The method of claim 5, further comprising:
for each expense of the at least one expense, when it is determined that there is a discrepancy with respect to the expense, determining a reason for the discrepancy, wherein whether each expense is nondeductible is determined based on the reason for the discrepancy with respect to that expense.
8. The method of claim 1, wherein the first electronic document is associated with an enterprise, the method further comprising:
retrieving the at least one deduction rule based on at least one enterprise parameter of the enterprise.
9. The method of claim 8, wherein the at least one enterprise parameter includes at least one of: the country in which the enterprise is located, the structure of the enterprise, and the operating income of the enterprise.
10. A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute a process, the process comprising:
analyzing a first electronic document to determine at least one transaction parameter, wherein the first electronic document indicates a transaction including at least one expense, and wherein the first electronic document includes at least partially unstructured data;
creating a model for the first electronic document, wherein the model is a structured dataset including the determined at least one transaction parameter;
retrieving a second electronic document based on the model, wherein the second electronic document is evidence of the transaction;
determining at least one deductible expense of the at least one expense based on at least one deduction rule, the model, and the second electronic document; and
generating consolidated metadata based on the determined at least one deductible expense.
11. A system for generating consolidated data based on electronic documents, comprising:
a processing circuitry; and
a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to:
analyze a first electronic document to determine at least one transaction parameter, wherein the first electronic document indicates a transaction including at least one expense, and wherein the first electronic document includes at least partially unstructured data;
create a model for the first electronic document, wherein the model is a structured dataset including the determined at least one transaction parameter;
retrieve a second electronic document based on the model, wherein the second electronic document is evidence of the transaction;
determine at least one deductible expense of the at least one expense based on at least one deduction rule, the model, and the second electronic document; and
generate consolidated metadata based on the determined at least one deductible expense.
12. The system of claim 11, wherein the system is further configured to:
identify at least one key field and at least one value in the first electronic document;
create a dataset based on the first electronic document, wherein the created dataset includes the at least one key field and the at least one value; and
analyze the created dataset, wherein the at least one transaction parameter is determined based on the analysis.
13. The system of claim 12, wherein the system is further configured to:
analyze the first electronic document to determine data in the first electronic document; and
extract, based on a predetermined list of key fields, at least a portion of the determined data, wherein the at least a portion of the determined data matches at least one key field of the predetermined list of key fields.
14. The system of claim 13, wherein the system is further configured to:
perform optical character recognition on the first electronic document.
15. The system of claim 11, wherein the system is further configured to:
determine, based on the second electronic document and the model created for the first electronic document, whether there is a discrepancy between the first electronic document and the second electronic document with respect to at least one of the at least one expense; and
for each determined discrepancy, determine whether the corresponding expense is nondeductible.
16. The system of claim 15, wherein the system is further configured to:
create a structured dataset model for the second electronic document; and
compare data in at least one field of the first electronic document model with data in at least one corresponding field of the second electronic document model, wherein the discrepancy is determined based on the comparison.
17. The system of claim 15, wherein the system is further configured to:
for each expense of the at least one expense, when it is determined that there is a discrepancy with respect to the expense, determine a reason for the discrepancy, wherein whether each expense is nondeductible is determined based on the reason for the discrepancy with respect to that expense.
18. The system of claim 11, wherein the system is further configured to:
retrieve the at least one deduction rule based on at least one enterprise parameter of an enterprise.
19. The system of claim 18, wherein the at least one enterprise parameter includes at least one of: the country in which the enterprise is located, the structure of the enterprise, and the operating income of the enterprise.
CN201780058567.6A 2016-08-05 2017-08-04 System and method for generating the merging data of electronic document Pending CN109791643A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201662371221P 2016-08-05 2016-08-05
US62/371,221 2016-08-05
US15/361,934 2016-11-28
US15/361,934 US20170154385A1 (en) 2015-11-29 2016-11-28 System and method for automatic validation
PCT/US2017/045554 WO2018027158A1 (en) 2016-08-05 2017-08-04 System and method for generating consolidated data for electronic documents

Publications (1)

Publication Number Publication Date
CN109791643A true CN109791643A (en) 2019-05-21

Family

ID=61073095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780058567.6A Pending CN109791643A (en) 2016-08-05 2017-08-04 System and method for generating the merging data of electronic document

Country Status (3)

Country Link
EP (1) EP3494531A4 (en)
CN (1) CN109791643A (en)
WO (1) WO2018027158A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161616A1 (en) * 2008-12-16 2010-06-24 Carol Mitchell Systems and methods for coupling structured content with unstructured content
US8774516B2 (en) * 2009-02-10 2014-07-08 Kofax, Inc. Systems, methods and computer program products for determining document validity
US8861861B2 (en) * 2011-05-10 2014-10-14 Expensify, Inc. System and method for processing receipts and other records of users
CN103843315B (en) * 2011-10-01 2018-08-17 甲骨文国际公司 Mobile expense solution architectural framework and method

Also Published As

Publication number Publication date
EP3494531A1 (en) 2019-06-12
WO2018027158A1 (en) 2018-02-08
EP3494531A4 (en) 2020-04-08


Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190521