CN109983489A - Searching for evidencing electronic documents based on unstructured data - Google Patents

Searching for evidencing electronic documents based on unstructured data

Info

Publication number
CN109983489A
Authority
CN
China
Prior art keywords
electronic document
data
transaction
template
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201780070059.XA
Other languages
Chinese (zh)
Inventor
N. Guzman
I. Saft
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vatbox Ltd
Original Assignee
Vatbox Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US15/361,934 (US20170154385A1)
Application filed by Vatbox Ltd
Publication of CN109983489A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/04 Payment circuits
    • G06Q20/047 Payment circuits using payment protocols involving electronic receipts
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/389 Keeping log of transactions for guaranteeing non-repudiation of a transaction
    • G06Q30/00 Commerce
    • G06Q30/04 Billing or invoicing
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/123 Tax preparation or submission

Abstract

A system and method for searching for evidencing electronic documents based on unstructured data. The method includes: analyzing a first electronic document to determine at least one transaction parameter of a transaction, wherein the first electronic document includes at least partially unstructured data; creating a template for the transaction, wherein the template is a structured dataset including the determined at least one transaction parameter; generating at least one query based on the created template; and querying at least one data source for a second electronic document using the at least one query.

Description

Searching for evidencing electronic documents based on unstructured data
Cross-reference to related applications
This application claims the benefit of U.S. Provisional Application No. 62/408,780, filed on October 16, 2016. This application is also a continuation-in-part of U.S. Patent Application No. 15/361,934, filed on November 28, 2016, now pending. The contents of the above-referenced applications are incorporated herein by reference in their entirety.
Technical field
The present disclosure relates generally to searching for electronic documents, and more specifically to searching based on unstructured data in electronic documents.
Background
Enterprise resource planning (ERP) is business management software typically used to collect, store, manage, and interpret data from various business activities, such as expenses incurred by employees of an enterprise. An ERP system generally collects data related to the business activities of the various departments in an enterprise. Data collected in this way may come from different data sources and may be in different formats. The ERP system provides an integrated view of the business activity data and can further generate expense reports, which may later be sent to the relevant tax authorities.
Particularly in large enterprises, employees engage in a large number of business activities. Such activities may result in a large number of business expenses to be reported to the tax authorities. Reporting such business expenses may result in tax deductions and refunds. To this end, employees typically provide receipts for the expenses incurred and are usually required to indicate the type of each expense. Based on that indication, the ERP system can generate a report, which is provided to the relevant tax authorities together with any received receipts.
In addition, as part of managing data related to business activities, an ERP system must associate and track the relationships among the managed datasets. For example, tax report information related to a receipt must be stored and associated with the receipt itself. Any error in the associations among datasets can lead to erroneous reporting, which in turn may cause loss of income due to unsuccessful claims for refunds and exemptions, as well as non-compliance with laws and regulations. Accurate data management is therefore critical for ERP systems.
Tracking such data poses additional challenges when part of the data is unstructured. For example, there are difficulties associated with tracking payment receipts stored as image files. Existing solutions to these challenges involve identifying the content of a file containing unstructured data based on a file extension name provided by a user. Such solutions are subject to human error (e.g., typos, incorrectly described file contents, etc.) and may not fully describe the content of the file. These drawbacks may further lead to inaccuracies in ERP systems.
The number of receipts obtained by employees in the course of business may be very large. This large number of receipts significantly increases the amount of data provided to ERP systems, making the data in such systems difficult to manage. Specifically, existing solutions face challenges in finding and maintaining correct associations within the managed data. These difficulties may result in errors and mismatches. When errors and mismatches are not caught in time, the result may be errors involving multiple pieces of evidence or other incorrect reporting. Manually verifying that reports match receipts is time-consuming and labor-intensive, and is itself prone to error. Further, such manual verification does not, by itself, correct problems in the managed data.
In addition, existing solutions for automatically verifying transactions face challenges when using electronic documents that include at least partially unstructured data. Specifically, such solutions may be able to recognize transaction data in scanned receipts and other unstructured data, but may be inefficient and inaccurate when utilizing the recognized transaction data.
It would therefore be advantageous to provide a solution that overcomes the deficiencies of the prior art.
Summary of the invention
A summary of several example embodiments of the disclosure follows. This summary is provided for the convenience of the reader to give a basic understanding of such embodiments and does not wholly define the breadth of the disclosure. This summary is not an extensive overview of all contemplated embodiments, and is intended neither to identify key or critical elements of all embodiments nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description presented later. For convenience, the term "some embodiments" may be used herein to refer to a single embodiment or to multiple embodiments of the disclosure.
Some embodiments disclosed herein include a method for searching for evidencing electronic documents based on unstructured data. The method comprises: analyzing a first electronic document to determine at least one transaction parameter of a transaction, wherein the first electronic document includes at least partially unstructured data; creating a template for the transaction, wherein the template is a structured dataset including the determined at least one transaction parameter; generating at least one query based on the created template; and querying at least one data source for a second electronic document using the at least one query.
Some embodiments disclosed herein also include a non-transitory computer-readable medium having stored thereon a program for causing a processing circuitry to execute a process, the process comprising: analyzing a first electronic document to determine at least one transaction parameter of a transaction, wherein the first electronic document includes at least partially unstructured data; creating a template for the transaction, wherein the template is a structured dataset including the determined at least one transaction parameter; generating at least one query based on the created template; and querying at least one data source for a second electronic document using the at least one query.
Some embodiments disclosed herein also include a system for searching for evidencing electronic documents based on unstructured data. The system comprises: a processing circuitry; and a memory containing instructions that, when executed by the processing circuitry, configure the system to: analyze a first electronic document to determine at least one transaction parameter of a transaction, wherein the first electronic document includes at least partially unstructured data; create a template for the transaction, wherein the template is a structured dataset including the determined at least one transaction parameter; generate at least one query based on the created template; and query at least one data source for a second electronic document using the at least one query.
Brief description of the drawings
The subject matter disclosed herein is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the disclosed embodiments will be apparent from the following detailed description taken in conjunction with the accompanying drawings.
Fig. 1 is a network diagram utilized to describe the various disclosed embodiments.
Fig. 2 is a flowchart illustrating a method for searching for evidencing electronic documents based on unstructured data according to an embodiment.
Fig. 3 is a flowchart illustrating a method for creating a template according to an embodiment.
Fig. 4 is a block diagram of a query generator according to an embodiment.
Detailed description
It is important to note that the embodiments disclosed herein are only examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed embodiments. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in the plural and vice versa with no loss of generality. In the drawings, like numerals refer to like parts throughout the several views.
The various disclosed embodiments include a system and method for searching for evidencing electronic documents based on unstructured data. A template is created for a first reporting electronic document. The reporting electronic document includes at least partially unstructured data indicating transaction parameters of a transaction. The template is created based on key fields and values identified in the reporting electronic document. A query is generated based on the created template. The query may be customized based on the type of the reporting electronic document, the data sources to be searched, and the like. Using the query, one or more data sources are searched for a matching evidencing electronic document. The search results may be validated based on the template. A declaration electronic document may be generated based on the template and the resulting evidencing electronic document.
In the ordinary course of business, declaration procedures are required in order to reclaim value-added tax (VAT) or other payments on purchases made by an enterprise. These procedures require preparing and filing certain documents, collecting evidence, and so on. The requirements vary among jurisdictions depending on the type of enterprise, the place of origin, the place of purchase, and the like. Today, enterprises often manage their data across multiple sources, which complicates the task of identifying the required evidence. In addition, such evidence may include sensitive data that should not be shared unless it is needed for the declaration. In other cases, certain data cannot be sent because of regulatory issues, for example privacy concerns.
The disclosed embodiments allow for searching for and retrieving appropriate evidence based on transactions indicated in unstructured documents such as images or text. More specifically, unstructured data in a reporting electronic document is analyzed to create a structured dataset template, which in turn can be used to generate a query that is based on the structure of the template and uniquely identifies the corresponding transaction, thereby allowing efficient and accurate searching of data sources for the appropriate evidencing electronic document. Further, the created template may be stored in place of the corresponding reporting electronic document for more efficient subsequent use, since structured data can be processed more efficiently than unstructured data, semi-structured data, or data lacking a known structure.
Fig. 1 shows an example network diagram 100 utilized to describe the various disclosed embodiments. The network diagram 100 includes a query generator 120, web sources 130-1 through 130-N (hereinafter referred to individually as a web source 130 and collectively as web sources 130, merely for simplicity), a database 140, and an enterprise system 150 communicatively connected via a network 110. The network 110 may be, but is not limited to, a wireless, cellular, or wired network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), the Internet, the World Wide Web (WWW), similar networks, and any combination thereof.
The enterprise system 150 is associated with an enterprise and may store data related to transactions made by the enterprise or by representatives of the enterprise. The enterprise may be, but is not limited to, a business whose employees may purchase goods and services on its behalf. The enterprise system 150 may be, but is not limited to, a server, a database, an enterprise resource planning system, a customer relationship management system, a user device, or any other system storing relevant data. The user device may be, but is not limited to, a personal computer, a laptop, a tablet computer, a smartphone, a wearable computing device, or any other device capable of capturing, storing, and sending unstructured datasets. As a non-limiting example, the enterprise system 150 may be a smartphone including a camera. The enterprise system 150 may be used by, for example, an employee of the organization associated with the enterprise system 150.
The database 140 may store at least reporting electronic documents. In an example embodiment, the database 140 may be operated by, or otherwise associated with, the enterprise associated with the enterprise system 150.
The web sources 130 store evidencing electronic documents such as, but not limited to, scanned copies of receipts, invoices, and the like. The web sources 130 may be queried, and different web sources 130 may accept queries in different formats. To this end, the evidencing electronic documents stored in the web sources 130 may include or be associated with metadata identifying the transactions evidenced by the respective evidencing electronic documents.
In an embodiment, the query generator 120 includes an optical recognition processor (e.g., the optical recognition processor 430, Fig. 4). The optical recognition processor is configured to identify at least characters in data, in particular in unstructured data. The query generator 120 is configured to receive a request from the enterprise system 150. The request may include, but is not limited to, a reporting electronic document, an identifier of the reporting electronic document, the location of the reporting electronic document in the database 140, and the like. The reporting electronic document is an at least partially unstructured electronic document including, but not limited to, unstructured data, semi-structured data, structured data lacking a known format (i.e., a format recognized by the query generator 120), or a combination thereof.
The reporting electronic document is typically, but is not limited to, an electronic document that is filled in manually by an employee (for example, by typing or otherwise entering information). In an example embodiment, the reporting electronic document may be an image showing an expense report, or a text file including the text of an expense report. The reporting electronic document indicates information related to one or more transactions.
The reporting electronic document may be uploaded to the database 140 by, for example, a user of the enterprise system 150. For example, a user of the enterprise system 150 may capture an image of an expense report using a camera (not shown) of the enterprise system 150 and store the image in the database 140 (e.g., via a server of the enterprise, not shown).
In an embodiment, the query generator 120 is configured to analyze the at least partially unstructured reporting electronic document. The analysis may include, but is not limited to, identifying elements shown in the at least partially unstructured electronic document via computer vision techniques, and creating a template of transaction attributes based on the identified elements. Such computer vision techniques may further include image recognition, pattern recognition, signal processing, character recognition, and the like.
Each created template is a structured dataset including the transaction parameters identified for a transaction. Specifically, the template includes one or more fields representing categories of transaction data, with each field containing the value of the appropriate transaction parameter. Creation of structured dataset templates is described further below.
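As a minimal sketch of what such a structured dataset template might look like, the following models a template as a set of named fields, each holding one transaction parameter. The class name, field names, and sample values are assumptions made for illustration; the disclosure does not prescribe a specific schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransactionTemplate:
    """Hypothetical structured dataset for one transaction.

    Each field is a category of transaction data; a value of None means
    the parameter could not be determined from the unstructured source.
    """
    date: Optional[str] = None
    price: Optional[float] = None
    currency: Optional[str] = None
    merchant: Optional[str] = None
    location: Optional[str] = None

    def populated_fields(self) -> dict:
        """Return only the fields for which a parameter was determined."""
        return {name: value for name, value in vars(self).items() if value is not None}


# Illustrative values, not taken from any real document.
template = TransactionTemplate(
    date="2016-11-28", price=120.0, currency="EUR", merchant="Hotel Example"
)
print(template.populated_fields())
```

Keeping undetermined parameters as empty fields, rather than omitting them, lets later steps see which categories of data were expected but not found.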
In an embodiment, based on the created templates, the query generator 120 is configured to generate a query for each transaction indicated in the at least partially unstructured reporting electronic document. Each query may be generated further based on the query formats accepted by the web sources 130, the type of evidencing electronic document required to evidence the reporting electronic document, or both.
In an embodiment, the query generator 120 may be configured to determine, based on the created template, a required type of evidencing electronic document for each transaction indicated in the reporting electronic document. The required type may be determined based on transaction parameters such as, but not limited to, the price, the type of goods or services purchased, the type of declaration for which the evidencing electronic document is needed (e.g., when the evidencing electronic document is used as evidence for a VAT reclaim), one or more evidencing rules of the country in which the transaction occurred, or a combination thereof. As a non-limiting example, a less detailed invoice may be sufficient for a transaction with a price of less than 250 euros, while a more detailed invoice may be required for other transactions. As another non-limiting example, a VAT invoice may be specifically required for a transaction in a first country, while any type of invoice may suffice for a transaction in a second country.
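The two non-limiting examples above can be sketched as a small rule function. This is only an illustration under stated assumptions: the function name, the returned labels, the placeholder country code, and the choice to check the country rule before the price rule are all hypothetical, and real evidencing rules vary by jurisdiction.

```python
# Illustrative values only; real evidencing rules vary by jurisdiction.
SIMPLIFIED_INVOICE_LIMIT_EUR = 250.0
VAT_INVOICE_COUNTRIES = {"COUNTRY_A"}  # hypothetical "first country"


def required_evidence_type(price_eur: float, country: str) -> str:
    """Pick a required type of evidencing document from two transaction parameters."""
    # Assumed precedence: a country-specific rule overrides the price rule.
    if country in VAT_INVOICE_COUNTRIES:
        return "vat_invoice"
    if price_eur < SIMPLIFIED_INVOICE_LIMIT_EUR:
        return "simplified_invoice"
    return "detailed_invoice"


print(required_evidence_type(100.0, "COUNTRY_B"))
print(required_evidence_type(300.0, "COUNTRY_B"))
print(required_evidence_type(100.0, "COUNTRY_A"))
```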
Each query may be based on values included in one or more fields of the corresponding template. The template fields on which the query is based may be predefined fields selected such that the transaction information uniquely identifies the transaction, so that an evidencing electronic document (e.g., a receipt) found using the query provides evidence of the transaction. As a non-limiting example, for a purchase activity resulting in an expense, the metadata may include the location at which the expense was incurred (e.g., as indicated in a "location" field), characteristics of the business at which the expense was incurred (e.g., the type of goods or products sold, as indicated in a "merchant information" field), the time at which the expense was incurred (e.g., as indicated in a "time" field), an amount (e.g., a monetary value or quantity indicated in a respective field), combinations thereof, and the like.
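A minimal sketch of generating a query from predefined identifying fields follows. The field names mirror the example fields mentioned above, while the function name, the dictionary representation of both template and query, and the sample values are assumptions made for illustration.

```python
# Predefined fields assumed to identify a transaction uniquely (illustrative).
IDENTIFYING_FIELDS = ("location", "merchant_information", "time", "amount")


def build_query(template: dict) -> dict:
    """Keep only the identifying fields that are present in the template."""
    return {name: template[name] for name in IDENTIFYING_FIELDS if name in template}


template = {
    "location": "Berlin",
    "merchant_information": "hotel",
    "time": "2016-11-28T14:05",
    "amount": "120.00",
    "notes": "handwritten remark",  # not identifying, so left out of the query
}
query = build_query(template)
print(query)
```

Restricting the query to a fixed set of fields keeps free-form content out of the search terms, which is one plausible way to make matching against stored metadata predictable.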
In an embodiment, the query generator 120 is configured to search for evidencing electronic documents using the generated queries. A resulting evidencing electronic document may be associated with metadata matching the query. The search may include querying one or more of the web sources 130 using the generated queries. In some embodiments, the search may include querying the database 140 for the resulting evidencing electronic documents, and querying the web sources 130 only for transactions for which no evidencing electronic document was found in the search of the database 140. Thus, in such embodiments, the web sources 130 are queried only for missing evidencing electronic documents.
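The database-first search with a fallback to web sources can be sketched as below. Data sources are modeled as plain callables that return a document identifier or None; that interface, the function names, and the stand-in sources are hypothetical and chosen only to keep the example self-contained.

```python
from typing import Callable, Iterable, Optional

# A data source is modeled as a callable: query in, document identifier or None out.
Source = Callable[[dict], Optional[str]]


def find_evidence(query: dict, database: Source, web_sources: Iterable[Source]) -> Optional[str]:
    """Query the enterprise database first; fall back to web sources only on a miss."""
    found = database(query)
    if found is not None:
        return found
    for source in web_sources:
        found = source(query)
        if found is not None:
            return found
    return None


# Hypothetical stand-ins for the database and one web source.
def enterprise_db(query: dict) -> Optional[str]:
    return None  # simulates a missing evidencing document


def web_source_a(query: dict) -> Optional[str]:
    return "receipt-123.pdf" if query.get("amount") == "120.00" else None


result = find_evidence({"amount": "120.00"}, enterprise_db, [web_source_a])
print(result)
```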
In an optional embodiment, the query generator 120 may be configured to clean the search results. The cleaning may include, but is not limited to, removing private data, irrelevant data, or both from the resulting evidencing electronic documents. The private data and irrelevant data may be determined based on one or more cleaning rules provided by the enterprise system 150. As a non-limiting example, private and irrelevant data may include personal information of a particular employee (e.g., personal credit card information, a social security number, etc.) that is not needed to provide evidence supporting a VAT reclaim. In another embodiment, the cleaning may include applying optical character recognition to the resulting electronic documents and identifying the private and irrelevant information based on the results of the optical character recognition.
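One simple way to picture rule-based cleaning is pattern-based redaction applied to text that has already been recognized. The sketch below assumes the cleaning rules are regular expressions; the rule names, the patterns, and the "[REDACTED]" marker are illustrative choices, not something the disclosure specifies.

```python
import re

# Hypothetical cleaning rules: each maps a label to a pattern for private data.
CLEANING_RULES = {
    "card_number": re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b"),
    "social_security_number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def clean(text: str) -> str:
    """Replace every match of every cleaning rule with a redaction marker."""
    for pattern in CLEANING_RULES.values():
        text = pattern.sub("[REDACTED]", text)
    return text


cleaned = clean("Paid by card 4111 1111 1111 1111, total 120.00 EUR")
print(cleaned)
```

Simple patterns like these will miss many real-world formats, so a production rule set would need to be considerably broader.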
Using structured templates to search for evidencing electronic documents is more efficient and accurate than, for example, using the unstructured data directly. Specifically, metadata generated based on a template can be generated with respect to specific fields, so that the metadata more efficiently and more accurately indicates the parameters that uniquely identify a transaction. The metadata can therefore be used to search accurately for matching evidencing electronic documents while reducing the processing power and time required to compare metadata.
The query generator 120 typically includes a processing circuitry (e.g., the processing circuitry 410, Fig. 4) coupled to a memory (e.g., the memory 415, Fig. 4). The processing circuitry may include or be a component of a processor (not shown) or an array of processors coupled to the memory. The memory contains instructions that can be executed by the processing circuitry. The instructions, when executed by the processing circuitry, configure the processing circuitry to perform the various functions described herein.
It should be understood that the embodiments disclosed herein are not limited to the specific architecture shown in Fig. 1, and that other architectures may equally be used without departing from the scope of the disclosed embodiments. Specifically, the query generator 120 may reside in a cloud computing platform, a data center, and the like. Moreover, in some embodiments, there may be a plurality of query generators operating as described above and configured to have one as a standby, to share the load among them, or to split the functions among them.
It should also be noted that some of the embodiments discussed with respect to Fig. 1 are described as interacting with only one enterprise system 150 merely for simplicity and without limitation on the disclosure. Data from additional enterprise systems may be used in the queries generated by the query generator 120 without departing from the scope of the disclosed embodiments. Further, the database 140 may equally be another data source, for example a server with access to one or more databases. In addition, multiple databases may be used without departing from the scope of the disclosure.
Fig. 2 is an example flowchart 200 illustrating a method for searching for evidencing electronic documents based on unstructured data according to an embodiment. In an embodiment, the method may be performed by a query generator (e.g., the query generator 120).
At S210, a first reporting electronic document is received or retrieved. The reporting electronic document includes at least partially unstructured data related to one or more transactions. The at least partially unstructured data includes, but is not limited to, unstructured data, semi-structured data, or structured data lacking a known format. The reporting electronic document may be received from, for example, an enterprise resource planning (ERP) system or a user device (e.g., the enterprise system 150, Fig. 1).
In some embodiments, a request to generate a declaration electronic document may be received, where the request includes the reporting electronic document or an identifier of the reporting electronic document. The request may also indicate the type of declaration (e.g., a reclaim of VAT or another tax, reimbursement of employee expenses, etc.). Accordingly, in some embodiments, S210 may include retrieving the reporting electronic document.
In an example embodiment, the reporting electronic document may be an image showing one or more expense reports related to, for example, business activities. As a non-limiting example, the image may be captured by a mobile device operated by an employee of the organization who photographs an expense report form.
At S220, a template is created for each transaction indicated in the reporting electronic document. In an embodiment, the reporting electronic document may be analyzed via an optical character recognition (OCR) processor. The analysis may further include using machine vision to identify elements in the at least partially unstructured data, cleaning or disambiguating the data, and generating structured data that includes the key fields and values identified in the at least partially unstructured data. As an example, for an image of a receipt, machine vision may be used to identify information related to the transaction recorded in the receipt, such as the price, location, date, buyer, seller, and the like.
At optional S230, a required type of evidencing electronic document for the corresponding transaction is determined based on one of the created templates. In an embodiment, S230 may also include identifying one or more data sources storing the required type of evidencing electronic document. The identified data sources may be queried to find a matching evidencing electronic document for the transaction.
At S240, a query is generated based on the created template. The query may be generated further based on the determined required type of evidencing electronic document. For example, the query may be generated based on identifying information typical of the required type of evidencing electronic document, the query formats accepted by the identified data sources, or both. The query may be generated based on the values in fields that uniquely identify the transaction. As a non-limiting example, for a template including the fields "date," "price," "quantity," and "item name," a query indicating the values in those fields may be generated.
In an embodiment, S240 may include generating more than one query. For example, more than one query may be used when searching data sources that require queries in different formats, in order to optimize the queries for particular sources, and the like. To this end, the generated queries may be further based on optimization rules for optimizing queries of one or more data sources.
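Generating more than one query from the same template fields might look like the sketch below, which renders one set of field values in two formats. The source names and the two formats (a URL query string and a JSON body) are assumptions chosen for illustration; the disclosure does not name specific query formats.

```python
import json
from urllib.parse import urlencode


def render_queries(fields: dict, source_formats: dict) -> dict:
    """Render the same field values once per data source, in that source's format."""
    rendered = {}
    for source, fmt in source_formats.items():
        if fmt == "url":
            rendered[source] = urlencode(fields)
        elif fmt == "json":
            rendered[source] = json.dumps(fields, sort_keys=True)
        else:
            raise ValueError(f"unsupported query format: {fmt}")
    return rendered


fields = {"date": "2016-11-28", "price": "120.00"}
# Hypothetical data sources and the format each is assumed to accept.
queries = render_queries(fields, {"web_source_a": "url", "web_source_b": "json"})
print(queries["web_source_a"])
print(queries["web_source_b"])
```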
In S250, inquiry generated is for the search proof electronic document in one or more data sources.Implement one In example, S250 includes that one or more web page sources are inquired using the inquiry of generation.In another embodiment, S250 may include The database of enterprise is inquired first to search the proof electronic document for transaction, and if only do not searched in the database To when proving electronic document, multiple web page sources are inquired.In embodiment, S250 can also include the electronics text that retrieval arrives Part.In another embodiment, S250 further includes in the electronics that for example storage has been searched in database (for example, database 140) File.
In some embodiments, S250 may include generating a notification indicating the search results. The notification may include the evidencing electronic document for the transaction.
At optional S260, the search results may be cleaned to remove personal information, irrelevant information, or both. The cleaning may be based on cleaning rules.
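As a non-limiting illustration of rule-based cleaning at S260, the following Python sketch redacts personal information and drops irrelevant lines. The specific rules (e-mail addresses, 13 to 16 digit card-like numbers, and two irrelevant phrases) are assumptions chosen for illustration; the disclosure does not enumerate the cleaning rules.

```python
import re

# Hypothetical cleaning rules: each pattern marks personal information.
PERSONAL_INFO_RULES = [
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),   # e-mail addresses
    re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),       # 13-16 digit card-like numbers
]

# Hypothetical markers of lines that carry no transaction information.
IRRELEVANT_MARKERS = ("thank you", "visit us")


def clean_result(text):
    """Drop irrelevant lines and redact personal information in the rest."""
    kept = []
    for line in text.splitlines():
        if any(marker in line.lower() for marker in IRRELEVANT_MARKERS):
            continue
        for pattern in PERSONAL_INFO_RULES:
            line = pattern.sub("[removed]", line)
        kept.append(line)
    return "\n".join(kept)
```

Holding the rules in lists keeps the cleaning step data-driven, so rules can be added or removed without changing the function.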
At optional S270, a reporting electronic document may be generated based on the created template and the resulting evidencing electronic document. In an example embodiment, the reporting electronic document may be a completed value-added tax (VAT) reclaim request form that includes the resulting evidencing electronic document.
At S280, it is checked whether evidencing electronic documents are needed for additional transactions and, if so, execution continues with S230; otherwise, execution terminates.
Fig. 3 is an example flowchart S220 illustrating a method for creating a template based on an electronic document including at least partially unstructured data according to an embodiment.
At S310, an electronic document is obtained. Obtaining the electronic document may include, but is not limited to, receiving the electronic document (e.g., receiving a scanned image) or retrieving the electronic document (e.g., retrieving it from a customer enterprise system, a merchant enterprise system, or a database).
At S320, the electronic document is analyzed to identify elements in the at least partially unstructured data. The analysis may include, but is not limited to, using optical character recognition (OCR) to determine characters in the electronic document.
The elements may include, but are not limited to, characters related to the transaction, character strings related to the transaction, or both. As a non-limiting example, the elements may include printed data appearing in a payment receipt related to a business activity. Such printed data may include, but is not limited to, a date, a time, a quantity, a seller name, a type of seller business, a value-added tax amount, a type of product purchased, the payer's legal registration number, and the like.
At S330, key fields and values in the electronic document are identified based on the analysis. The key fields may include, but are not limited to, the merchant's name and address, date, currency, goods or services sold, a transaction identifier, an invoice number, and the like. The electronic document may include unnecessary details that are not considered key values. As an example, the merchant's logo may not be needed and is therefore not a key value. In an embodiment, a list of key fields may be predefined, and data fragments matching the key fields may be extracted. Then, a cleaning process is performed to ensure that the information is presented accurately. For example, if the OCR results in a date being presented as "1211212005", the cleaning process converts this data to 12/12/2005. As another example, if a name is presented as "Mo$den", it is changed to "Mosden". The cleaning process may be performed using external information resources such as dictionaries, calendars, and the like.
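As a non-limiting illustration of the cleaning process applied to OCR output, the following Python sketch repairs the two examples given above. It assumes that the OCR misread the "/" separators of a day/month/year date as the digit "1", and it uses fuzzy matching against a hypothetical list of known merchant names in place of an external dictionary; neither heuristic is prescribed by the disclosure.

```python
import difflib
from datetime import datetime


def repair_date(raw):
    """Re-insert '/' separators if OCR read them as the digit '1'.

    Assumes a 10-character dd/mm/yyyy date; the calendar check rejects
    candidates that are not valid dates.
    """
    if len(raw) == 10 and raw.isdigit() and raw[2] == "1" and raw[5] == "1":
        candidate = f"{raw[:2]}/{raw[3:5]}/{raw[6:]}"
        try:
            datetime.strptime(candidate, "%d/%m/%Y")
            return candidate
        except ValueError:
            pass
    return raw


def repair_name(raw, known_names, cutoff=0.8):
    """Replace a garbled name with the closest known name, if close enough."""
    matches = difflib.get_close_matches(raw, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


# Hypothetical dictionary of merchant names used as the external resource.
KNOWN_MERCHANTS = ["Mosden", "Morgan"]

fixed_date = repair_date("1211212005")
fixed_name = repair_name("Mo$den", KNOWN_MERCHANTS)
```

Values that do not fit either heuristic are returned unchanged, so the cleaning step does not overwrite data it cannot confidently repair.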
In another embodiment, it is checked whether the extracted data fragments are complete. For example, if the merchant's name can be identified but its address is missing, then the key field for the merchant address is incomplete. An attempt is made to complete the missing key field values. The attempt may include querying external systems and databases, correlating with previously analyzed invoices, or a combination thereof. Examples of external systems and databases may include business directories, Universal Product Code (UPC) databases, parcel delivery and tracking systems, and the like. In an embodiment, S330 results in a complete set of the predefined key fields and their respective values.
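As a non-limiting illustration of the completeness check, the following Python sketch lists the key fields that are missing and tries to fill them from lookup functions. The field names, the lookup interface, and the directory contents are assumptions for illustration; an actual embodiment may query business directories, UPC databases, or previously analyzed invoices.

```python
# Hypothetical list of predefined key fields.
REQUIRED_KEY_FIELDS = ("merchant_name", "merchant_address", "date", "total")


def missing_key_fields(extracted):
    """Return the predefined key fields that have no value yet."""
    return [field for field in REQUIRED_KEY_FIELDS if not extracted.get(field)]


def complete_key_fields(extracted, lookups):
    """Try each lookup in turn for every missing key field.

    Each lookup is a hypothetical callable (field, known_values) -> value
    or None, standing in for an external system or database.
    """
    completed = dict(extracted)
    for field in missing_key_fields(extracted):
        for lookup in lookups:
            value = lookup(field, completed)
            if value:
                completed[field] = value
                break
    return completed


# Hypothetical business directory keyed by merchant name.
DIRECTORY = {"Mosden": "1 Example Street, Paris"}


def directory_lookup(field, known):
    """Resolve a merchant address from the merchant name, if listed."""
    if field == "merchant_address":
        return DIRECTORY.get(known.get("merchant_name"))
    return None
```

Fields that no lookup can resolve remain missing, which allows a later step to detect that the set of key fields is still incomplete.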
In another embodiment, S330 may also include disambiguating the unstructured data. The disambiguation may be based on, but is not limited to, a file name of the unstructured data set, a dictionary, an algorithm, a thesaurus, and the like. The disambiguation may result in more accurate identification of transactions. The disambiguation may also be based on, but is not limited to, the structure of the data (e.g., data in a field "destination" may be disambiguated based on names of locations), a dictionary, an algorithm, a thesaurus, and the like. In some embodiments, if the disambiguation is unsuccessful, a notification may be generated and sent to a user (e.g., a user of the enterprise system 150), prompting the user to provide further clarification.
As a non-limiting example, for an image in a file named "purchase receipt", the string "300.00" located on the same line as the string "total price" may be used to determine that the value to be included in a "purchase price" field is 300.00. As another example, the string "Drance" may be disambiguated based on a dictionary to generate metadata indicating that the location associated with the unstructured data set is France. As yet another example, in a field related to an expense type, the structured data of the field may be "Paris taxi" and the value of the field may be "60 Euros". Based on one or more rules for maximum taxi prices, it may be determined that "60 Euros" is too high for a single taxi fare, and that the field therefore corresponds to multiple taxi trips.
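As a non-limiting illustration of two of the examples above, the following Python sketch reads the amount printed on the same line as a label and applies a maximum-fare rule to a taxi expense. The amount pattern and the 50 Euro threshold are assumptions made for illustration; the disclosure does not specify rule values.

```python
import re

# Assumed pattern for an amount such as "300.00" or "60".
AMOUNT_PATTERN = re.compile(r"\d+(?:[.,]\d{2})?")

# Hypothetical rule value: maximum plausible fare for one taxi trip, in Euros.
MAX_SINGLE_TAXI_FARE_EUR = 50.0


def value_on_same_line(text, label):
    """Return the first amount found on the line that contains `label`."""
    for line in text.splitlines():
        if label.lower() in line.lower():
            match = AMOUNT_PATTERN.search(line)
            if match:
                return match.group()
    return None


def classify_taxi_expense(amount_eur):
    """Apply the maximum-fare rule to decide how the expense is interpreted."""
    if amount_eur > MAX_SINGLE_TAXI_FARE_EUR:
        return "multiple trips"
    return "single trip"


receipt_text = "Purchase receipt\nTotal price 300.00\nVAT 50.00"
purchase_price = value_on_same_line(receipt_text, "total price")
taxi_kind = classify_taxi_expense(60.0)
```

Both helpers are deliberately simple; a fuller embodiment might combine such rules with dictionaries or thesauri as described above.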
At S340, a structured data set is generated. The generated data set includes the identified fields and values.
Fig. 4 is a schematic block diagram of the query generator 120 according to an embodiment. The query generator 120 includes a processing circuit 410 coupled to a memory 415, a storage 420, and a network interface 440. In an embodiment, the query generator 120 may include an optical character recognition (OCR) processor 430. In another embodiment, the components of the query generator 120 may be communicatively connected via a bus 450.
The processing circuit 410 may be implemented as one or more hardware logic components and circuits. For example, and without limitation, illustrative types of hardware logic components that can be used include field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip systems (SOCs), general-purpose microprocessors, microcontrollers, digital signal processors (DSPs), and the like, or any other hardware logic components that can perform calculations or other manipulations of information.
The memory 415 may be volatile (e.g., RAM), non-volatile (e.g., ROM, flash memory), or a combination thereof. In one configuration, computer-readable instructions for implementing one or more embodiments described herein are stored in the storage 420.
In another embodiment, the memory 415 is configured to store software. Software shall be construed broadly to mean any type of instructions, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. Instructions may include code (e.g., in source code format, binary code format, executable code format, or any other suitable format of code). The instructions, when executed by the one or more processors, cause the processing circuit 410 to perform the various processes described herein. Specifically, the instructions, when executed, cause the processing circuit 410 to find evidencing electronic documents based on unstructured data, as discussed herein.
The storage 420 may be magnetic storage, optical storage, and the like, and may be implemented as, for example, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD), or any other medium that can be used to store the desired information.
The storage 420 may also store metadata generated based on analysis of unstructured data by the OCR processor 430. In another embodiment, the storage 420 may also store queries generated based on the metadata.
The OCR processor 430 may include, but is not limited to, a feature and/or pattern recognition processor (RP) 435 configured to identify patterns, features, or both, in unstructured data sets. Specifically, in an embodiment, the OCR processor 430 is configured to identify at least characters in the unstructured data. The identified characters may be used to create a data set including data required for verifying a request.
The network interface 440 allows the query generator 120 to communicate with the enterprise system 130, the database 140, the enterprise system 150, or a combination thereof, in order to, for example, receive electronic documents, send notifications, search for electronic documents, store data, and the like.
It should be understood that the embodiments described herein are not limited to the specific architecture shown in Fig. 4, and that other architectures may equally be used without departing from the scope of the disclosed embodiments.
It should be noted that the various embodiments described herein are discussed with respect to finding an evidencing electronic document matching a single transaction indicated in a reporting electronic document merely for simplicity purposes and without limitation on the disclosed embodiments. Evidencing electronic documents for multiple transactions indicated in a reporting electronic document may be found and reported serially or in parallel without departing from the scope of the disclosure. As a non-limiting example, the reporting electronic document may be an expense report indicating multiple transactions made by an employee.
It should also be noted that various disclosed embodiments are discussed with respect to using the found evidencing electronic documents for value-added tax reporting based on reporting electronic documents including unstructured data merely for example purposes and without limitation on the disclosure. Evidencing electronic documents may equally be used for other submissions such as, but not limited to, other types of reporting, tax preparation, and the like.
The various embodiments disclosed herein can be implemented as hardware, firmware, software, or any combination thereof. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit or computer-readable medium consisting of parts, or of certain devices and/or a combination of devices. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units ("CPUs"), a memory, and input/output interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU, whether or not such a computer or processor is explicitly shown. In addition, various other peripheral units may be connected to the computer platform, such as an additional data storage unit and a printing unit. Furthermore, a non-transitory computer-readable medium is any computer-readable medium except for a transitory propagating signal.
All examples and conditional language recited herein are intended for pedagogical purposes, to aid the reader in understanding the principles of the disclosed embodiments and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents and equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Claims (19)

1. A method for finding evidencing electronic documents based on unstructured data, comprising:
analyzing a first electronic document to determine at least one transaction parameter of a transaction, wherein the first electronic document includes at least partially unstructured data;
creating a template for the transaction, wherein the template is a structured data set including the determined at least one transaction parameter;
creating at least one query based on the created template; and
querying at least one data source for a second electronic document using the at least one query.
2. The method according to claim 1, wherein determining the at least one transaction parameter further comprises:
identifying at least one key field and at least one value in the first electronic document;
creating a data set based on the first electronic document, wherein the created data set includes the at least one key field and the at least one value; and
analyzing the created data set, wherein the at least one transaction parameter is determined based on the analysis.
3. The method according to claim 2, wherein identifying the at least one key field and the at least one value further comprises:
analyzing the first electronic document to determine data in the first electronic document; and
extracting, based on a list of predefined key fields, at least a portion of the determined data, wherein the at least a portion of the determined data matches at least one key field in the list of predefined key fields.
4. The method according to claim 3, wherein analyzing the first electronic document further comprises:
performing optical character recognition on the first electronic document.
5. The method according to claim 2, wherein the query is generated based on a value in each of at least one predefined key field of the identified at least one key field.
6. The method according to claim 1, further comprising:
cleaning the second electronic document, wherein the cleaning includes removing, based on at least one cleaning rule, at least one of personal information and irrelevant information.
7. The method according to claim 1, further comprising:
determining a required type of the second electronic document based on the created template, wherein the query is generated further based on the determined required type.
8. The method according to claim 7, further comprising:
identifying the at least one data source based on the determined required type.
9. The method according to claim 1, further comprising:
generating a third electronic document based on the created template and the second electronic document, wherein the third electronic document includes a request and the second electronic document.
10. A non-transitory computer-readable medium having stored thereon instructions for causing one or more processing units to execute a process for verifying unstructured enterprise resource planning data, the process comprising:
analyzing a first electronic document to determine at least one transaction parameter of a transaction, wherein the first electronic document includes at least partially unstructured data;
creating a template for the transaction, wherein the template is a structured data set including the determined at least one transaction parameter;
creating at least one query based on the created template; and
querying at least one data source for a second electronic document using the at least one query.
11. A system for finding evidencing electronic documents based on unstructured data, comprising:
a processing circuit; and
a memory, the memory containing instructions that, when executed by the processing circuit, configure the system to:
analyze a first electronic document to determine at least one transaction parameter of a transaction, wherein the first electronic document includes at least partially unstructured data;
create a template for the transaction, wherein the template is a structured data set including the determined at least one transaction parameter;
create at least one query based on the created template; and
query at least one data source for a second electronic document using the at least one query.
12. The system according to claim 11, wherein the system is further configured to:
identify at least one key field and at least one value in the first electronic document;
create a data set based on the first electronic document, wherein the created data set includes the at least one key field and the at least one value; and
analyze the created data set, wherein the at least one transaction parameter is determined based on the analysis.
13. The system according to claim 12, wherein the system is further configured to:
analyze the first electronic document to determine data in the first electronic document; and
extract, based on a list of predefined key fields, at least a portion of the determined data, wherein the at least a portion of the determined data matches at least one key field in the list of predefined key fields.
14. The system according to claim 13, wherein the system is further configured to:
perform optical character recognition on the first electronic document.
15. The system according to claim 12, wherein the query is generated based on a value in each of at least one predefined key field of the identified at least one key field.
16. The system according to claim 11, wherein the system is further configured to:
clean the second electronic document, wherein the cleaning includes removing, based on at least one cleaning rule, at least one of personal information and irrelevant information.
17. The system according to claim 11, wherein the system is further configured to:
determine a required type of the second electronic document based on the created template, wherein the query is generated further based on the determined required type.
18. The system according to claim 17, wherein the system is further configured to:
identify the at least one data source based on the determined required type.
19. The system according to claim 11, wherein the system is further configured to:
generate a third electronic document based on the created template and the second electronic document, wherein the third electronic document includes a request and the second electronic document.
CN201780070059.XA 2016-10-16 2017-10-13 Finding evidencing electronic documents based on unstructured data Pending CN109983489A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201662408780P 2016-10-16 2016-10-16
US62/408,780 2016-10-16
US15/361,934 US20170154385A1 (en) 2015-11-29 2016-11-28 System and method for automatic validation
US15/361,934 2016-11-28
PCT/US2017/056448 WO2018071737A1 (en) 2016-10-16 2017-10-13 Finding evidencing electronic documents based on unstructured data

Publications (1)

Publication Number Publication Date
CN109983489A true CN109983489A (en) 2019-07-05

Family

ID=61906440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780070059.XA Pending CN109983489A (en) 2016-10-16 2017-10-13 Finding evidencing electronic documents based on unstructured data

Country Status (3)

Country Link
EP (1) EP3526758A4 (en)
CN (1) CN109983489A (en)
WO (1) WO2018071737A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0415839A (en) * 1990-05-10 1992-01-21 Toshiba Corp Distributed data base control device
US20100161616A1 (en) * 2008-12-16 2010-06-24 Carol Mitchell Systems and methods for coupling structured content with unstructured content
US8774516B2 (en) * 2009-02-10 2014-07-08 Kofax, Inc. Systems, methods and computer program products for determining document validity
US10176239B2 (en) * 2012-04-24 2019-01-08 International Business Machines Corporation Automation-assisted curation of technical support information

Also Published As

Publication number Publication date
WO2018071737A1 (en) 2018-04-19
EP3526758A1 (en) 2019-08-21
EP3526758A4 (en) 2020-05-06


Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190705