CN109983489A - Searching for proof electronic documents based on unstructured data
- Publication number
- CN109983489A (Application No. CN201780070059.XA)
- Authority
- CN
- China
- Prior art keywords
- electronic document
- data
- transaction
- template
- inquiry
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/40—Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
- G06Q20/401—Transaction verification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/04—Payment circuits
- G06Q20/047—Payment circuits using payment protocols involving electronic receipts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/389—Keeping log of transactions for guaranteeing non-repudiation of a transaction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/04—Billing or invoicing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/12—Accounting
- G06Q40/123—Tax preparation or submission
Abstract
A system and method for searching for proof electronic documents based on unstructured data. The method includes analyzing a first electronic document to determine at least one transaction parameter of a transaction, wherein the first electronic document includes at least partially unstructured data; creating a template for the transaction, wherein the template is a structured data set including the determined at least one transaction parameter; generating at least one query based on the created template; and querying at least one data source for a second electronic document using the at least one query.
Description
Cross reference to related applications
This application claims the benefit of U.S. Provisional Application No. 62/408,780, filed on October 16, 2016. This application is also a continuation-in-part of U.S. Patent Application No. 15/361,934, filed on November 28, 2016, now pending. The contents of the above-referenced applications are hereby incorporated by reference in their entirety.
Technical field
The present disclosure relates generally to searching for electronic documents, and more specifically to searching based on unstructured data in electronic documents.
Background
Enterprise resource planning (ERP) is business management software typically used to collect, store, manage, and interpret data from various business activities, such as expenses incurred by employees of an enterprise. ERP systems generally collect data related to the business activities of various departments in the enterprise. The collected data may come from different data sources and may be in different formats. ERP systems provide an integrated view of this business activity data and can further generate expense reports that may later be sent to the relevant tax authorities.
Especially in large enterprises, employees engage in a high number of business activities. Such business activities may further result in a large number of business expenses to be reported to the tax authorities. Reporting such business expenses may result in tax deductions, exemptions, and refunds. To this end, employees typically provide receipts for the expenses incurred and are usually required to indicate the types of such expenses. Based on these indications, an ERP system may generate a report that is provided, together with any received receipts, to the relevant tax authorities.
Further, as part of managing data related to business activities, ERP systems must associate and track relations between the managed data sets. For example, information related to the tax reporting of a receipt must be saved and associated with the receipt itself. Any error in the associations between data sets can result in erroneous reporting, which in turn may cause loss of income due to unsuccessful refund and exemption claims, as well as failure to comply with laws and regulations. Thus, accurate data management is crucial for ERP systems.
Tracking such data presents additional challenges when part of the data is unstructured. For example, there are difficulties associated with tracking payment receipts stored as image files. Existing solutions to these challenges involve identifying the contents of files containing unstructured data based on file extension names provided by users. Such solutions are subject to human error (e.g., typos, mistaken file contents, etc.) and may not fully describe the contents of the files. These disadvantages may further lead to inaccuracies in ERP systems.
The number of receipts obtained by employees in the course of business may be tremendous. This high number of receipts significantly increases the amount of data provided to ERP systems, making the data in such ERP systems difficult to manage. Specifically, existing solutions face challenges in finding and maintaining correct associations within the managed data. These difficulties may result in errors and mismatches. When errors and mismatches are not caught in time, the result may be errors involving multiple proofs or otherwise incorrect reports. Manually verifying whether reports match receipts is time-consuming and labor-intensive, and is subject to error. Further, such manual verification does not, by itself, correct problems with the managed data.
In addition, existing solutions for automatically verifying transactions face challenges when utilizing electronic documents that contain at least partially unstructured data. Specifically, such solutions may be able to identify transaction data in scanned receipts and other unstructured data, but may be inefficient and inaccurate when utilizing the identified transaction data.
It would therefore be advantageous to provide a solution that overcomes the deficiencies of the prior art.
Summary of the invention
A summary of several example embodiments of the disclosure follows. This summary is provided for the convenience of the reader to provide a basic understanding of such embodiments and does not wholly define the breadth of the disclosure. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all embodiments nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later. For convenience, the term "some embodiments" may be used herein to refer to a single embodiment or multiple embodiments of the disclosure.
Some embodiments disclosed herein include a method for searching for proof electronic documents based on unstructured data. The method comprises: analyzing a first electronic document to determine at least one transaction parameter of a transaction, wherein the first electronic document includes at least partially unstructured data; creating a template for the transaction, wherein the template is a structured data set including the determined at least one transaction parameter; generating at least one query based on the created template; and querying at least one data source for a second electronic document using the at least one query.
Some embodiments disclosed herein also include a non-transitory computer-readable medium having stored thereon a program for execution by a processing circuitry, the program comprising: analyzing a first electronic document to determine at least one transaction parameter of a transaction, wherein the first electronic document includes at least partially unstructured data; creating a template for the transaction, wherein the template is a structured data set including the determined at least one transaction parameter; generating at least one query based on the created template; and querying at least one data source for a second electronic document using the at least one query.
Some embodiments disclosed herein also include a system for searching for proof electronic documents based on unstructured data. The system comprises: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: analyze a first electronic document to determine at least one transaction parameter of a transaction, wherein the first electronic document includes at least partially unstructured data; create a template for the transaction, wherein the template is a structured data set including the determined at least one transaction parameter; generate at least one query based on the created template; and query at least one data source for a second electronic document using the at least one query.
Brief description of the drawings
The subject matter disclosed herein is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the disclosed embodiments will be apparent from the following detailed description taken in conjunction with the accompanying drawings.
Fig. 1 is a network diagram utilized to describe the various disclosed embodiments.
Fig. 2 is a flowchart illustrating a method for searching for proof electronic documents based on unstructured data according to an embodiment.
Fig. 3 is a flowchart illustrating a method for creating a template according to an embodiment.
Fig. 4 is a block diagram of a query generator according to an embodiment.
Detailed description
It is important to note that the embodiments disclosed herein are only examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed embodiments. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in plural and vice versa with no loss of generality. In the drawings, like numerals refer to like parts throughout the several views.
The various disclosed embodiments include a method and system for searching for proof electronic documents based on unstructured data. A template is created for a first report electronic document. The report electronic document includes at least partially unstructured data indicating transaction parameters of a transaction. The template is created based on key fields and values identified in the report electronic document. A query is generated based on the created template. The query may be customized based on the type of the report electronic document, the data sources to be searched, and the like. Using the query, one or more data sources are searched for matching proof electronic documents. Results of the search may be validated based on the template. Based on the template and the resulting proof electronic documents, a declaration electronic document may be generated.
In the ordinary course of business, enterprises must follow application procedures in order to file declarations for value-added tax (VAT) or other payments related to purchases. These procedures require preparing and filing certain documents, collecting proofs, and the like. The requirements differ among jurisdictions depending on the type of enterprise, country of origin, place of purchase, and so on. Today, enterprises often manage their data in multiple sources, which complicates the task of identifying the required proofs. Further, such proofs may include sensitive data that should not be shared unless needed for the declaration. In other cases, certain data may not be sent due to regulatory issues such as privacy concerns.
The disclosed embodiments allow for searching for and retrieving appropriate proofs based on transactions indicated in unstructured documents such as images or text files. More specifically, unstructured data in a report electronic document is analyzed to create a structured data set template, which in turn may be used to generate a query that is based on the structure of the template and uniquely identifies the corresponding transaction. This allows data sources to be searched efficiently and accurately for appropriate proof electronic documents. In addition, for more efficient subsequent use, the created template may be stored in place of the corresponding report electronic document, since structured data can be processed more efficiently than unstructured data, semi-structured data, or data lacking a known structure.
Fig. 1 shows an example network diagram 100 utilized to describe the various disclosed embodiments. The network diagram 100 includes a query generator 120, web sources 130-1 through 130-N (hereinafter referred to individually as a web source 130 and collectively as web sources 130, merely for simplicity purposes), a database 140, and an enterprise system 150 communicatively connected via a network 110. The network 110 may be, but is not limited to, a wireless, cellular, or wired network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), the Internet, the World Wide Web (WWW), similar networks, and any combination thereof.
The enterprise system 150 is associated with an enterprise and may store data related to transactions made by the enterprise or by representatives of the enterprise. The enterprise may be, but is not limited to, a business whose employees may purchase goods and services on its behalf. The enterprise system 150 may be, but is not limited to, a server, a database, an enterprise resource planning system, a customer relationship management system, a user device, or any other system storing relevant data. The user device may be, but is not limited to, a personal computer, a laptop, a tablet computer, a smartphone, a wearable computing device, or any other device capable of capturing, storing, and sending unstructured data sets. As a non-limiting example, the enterprise system 150 may be a smartphone including a camera. The enterprise system 150 may be utilized by, for example, an employee of an organization associated with the enterprise system 150.
The database 140 may store at least report electronic documents. In an example embodiment, the database 140 may be operated by or otherwise associated with the enterprise associated with the enterprise system 150.
The web sources 130 store proof electronic documents such as, but not limited to, scans of receipts, invoices, and the like. The web sources 130 may be queried, and different web sources 130 may accept queries in different formats. To this end, the proof electronic documents stored in the web sources 130 may include or be associated with metadata identifying the transactions evidenced by the respective proof electronic documents.
In an embodiment, the query generator 120 includes an optical recognition processor (e.g., the optical recognition processor 430, Fig. 4). The optical recognition processor is configured to identify at least characters in data and, in particular, in unstructured data. The query generator 120 is configured to receive a request from the enterprise system 150. The request may include, but is not limited to, a report electronic document, an identifier of the report electronic document, a location of the report electronic document in the database 140, and so on. The report electronic document is an at least partially unstructured electronic document including, but not limited to, unstructured data, semi-structured data, structured data lacking a known format (i.e., a format recognized by the query generator 120), or a combination thereof.
The report electronic document is typically, but is not limited to, an electronic document that is filled in manually by, for example, an employee (e.g., by typing or otherwise entering information). In an example embodiment, the report electronic document may be an image showing an expense report or a text file including the text of an expense report. The report electronic document indicates information related to one or more transactions.
The report electronic document may be uploaded to the database 140 by, for example, a user of the enterprise system 150. For example, a user of the enterprise system 150 may capture an image of an expense report using a camera (not shown) of the enterprise system 150 and store the image in the database 140 (e.g., via a server of the enterprise, not shown).
In an embodiment, the query generator 120 is configured to analyze the at least partially unstructured report electronic document. The analysis may include, but is not limited to, recognizing elements shown in the at least partially unstructured electronic document via computer vision techniques and creating templates of transaction attributes based on the recognized elements. Such computer vision techniques may further include image recognition, pattern recognition, signal processing, character recognition, and the like.
Each created template is a structured data set including the transaction parameters identified for a transaction. Specifically, the template includes one or more fields representing categories of transaction data, with each field holding the value of the appropriate transaction parameter. Creation of structured data set templates is described further herein below.
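As a minimal sketch of what such a template might look like in practice, the following Python snippet models a transaction template as a small structured record. The class name, the field names (`merchant_name`, `date`, `currency`, `price`), and the sample values are illustrative assumptions and are not prescribed by the disclosure.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class TransactionTemplate:
    """Hypothetical structured data set for one transaction.

    Each attribute is a field representing a category of transaction
    data; a value of None means the parameter was not identified.
    """
    merchant_name: Optional[str] = None
    date: Optional[str] = None       # assumed ISO 8601 date string
    currency: Optional[str] = None   # assumed ISO 4217 currency code
    price: Optional[float] = None

    def populated_fields(self) -> dict:
        """Return only the fields for which a value was identified."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Illustrative values only.
template = TransactionTemplate(
    merchant_name="Hotel Example", date="2016-10-16",
    currency="EUR", price=180.0,
)
```

Keeping unidentified fields as `None` lets later steps decide which fields are usable for building a query.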
In an embodiment, based on the created templates, the query generator 120 is configured to generate a query for each transaction indicated in the at least partially unstructured report electronic document. Each query may be generated further based on query formats accepted by the web sources 130, the type of proof electronic document required to evidence the report electronic document, or both.
In an embodiment, the query generator 120 may be configured to determine, based on the created templates, a required type of proof electronic document for each transaction indicated in the report electronic document. The required type may be determined based on transaction parameters such as, but not limited to, price, type of goods or services purchased, type of declaration for which the proof electronic document is needed (e.g., when the proof electronic document is to be used as evidence for a VAT declaration), one or more proof rules of the country in which the transaction occurred, a combination thereof, and the like. As a non-limiting example, a less detailed invoice may be required for transactions with a price of less than 250 Euros, while a more detailed invoice may be required for other transactions. As another non-limiting example, a VAT invoice may be specifically required for transactions in a first country, while any type of invoice may be acceptable for transactions in a second country.
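The two examples above can be sketched as a small rule function. This is only an illustration under stated assumptions: the country code `"AA"`, the returned type names, and the order in which the rules are applied are hypothetical, and only the 250 Euro threshold comes from the text.

```python
# Hypothetical rule data; the country code and the proof type names
# are placeholders, not values defined by the disclosure.
COUNTRIES_REQUIRING_VAT_INVOICE = {"AA"}
SIMPLIFIED_INVOICE_LIMIT_EUR = 250.0  # threshold from the example above


def required_proof_type(price_eur: float, country: str) -> str:
    """Pick a required proof type from the price and the country."""
    # Assumed precedence: a country-level rule overrides the price rule.
    if country in COUNTRIES_REQUIRING_VAT_INVOICE:
        return "vat_invoice"
    if price_eur < SIMPLIFIED_INVOICE_LIMIT_EUR:
        return "simplified_invoice"
    return "detailed_invoice"
```

In a real deployment the rules would more plausibly be loaded from per-jurisdiction configuration than hard-coded.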
Each query may be based on values included in one or more fields of the corresponding template. The template fields on which the query is based may be predefined fields selected so as to represent transaction information that uniquely identifies the transaction, such that a proof electronic document (e.g., a receipt) found using the query provides evidence of the transaction. As a non-limiting example, for a purchase activity resulting in an expense, the metadata may include the location at which the expense was incurred (e.g., as indicated in a "location" field), characteristics of the business at which the expense was incurred, such as the type of goods or products sold (e.g., as indicated in a "merchant information" field), the time at which the expense was incurred (e.g., as indicated in a "time" field), amounts (e.g., monetary values or quantities indicated in respective fields), combinations thereof, and the like.
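A minimal sketch of this field selection, assuming the template is available as a plain dictionary, is shown below. The field names mirror the examples in the paragraph above, but the exact keys and the `build_query` helper are hypothetical.

```python
# Predefined fields assumed to identify a transaction uniquely.
# The key names are illustrative, loosely following the examples above.
IDENTIFYING_FIELDS = ("location", "merchant_information", "time", "amount")


def build_query(template: dict) -> dict:
    """Keep only the predefined identifying fields that have a value."""
    return {
        name: template[name]
        for name in IDENTIFYING_FIELDS
        if template.get(name) is not None
    }


# Illustrative template; "notes" is not an identifying field.
example_template = {
    "location": "Berlin",
    "merchant_information": "restaurant",
    "time": "2016-10-16T19:30",
    "amount": 42.5,
    "notes": "team dinner",
}
example_query = build_query(example_template)
```

Fields outside the predefined list, such as `notes` here, are dropped so that the query carries only identifying information.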
In an embodiment, the query generator 120 is configured to search for proof electronic documents using the generated queries. The resulting proof electronic documents may be associated with metadata matching the queries. The search may include querying one or more of the web sources 130 using the generated queries. In some embodiments, the search may include querying the database 140 for the resulting proof electronic documents, and querying the web sources 130 only for transactions whose proof electronic documents were not found in the search of the database 140. Thus, in such embodiments, the web sources 130 may be queried only for missing proof electronic documents.
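The database-first search order described above can be sketched as follows. The two search callables stand in for whatever interfaces the database and the web sources actually expose; their signatures, and the use of `None` for "not found", are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

# A search function takes a query and returns a document reference,
# or None when nothing matches (an assumed convention).
SearchFn = Callable[[dict], Optional[str]]


def find_proofs(
    queries: Dict[str, dict],
    search_database: SearchFn,
    search_web_sources: SearchFn,
) -> Dict[str, Optional[str]]:
    """Look up each transaction in the database, then fall back to the web."""
    results: Dict[str, Optional[str]] = {}
    for transaction_id, query in queries.items():
        document = search_database(query)
        if document is None:
            # Web sources are queried only for missing proof documents.
            document = search_web_sources(query)
        results[transaction_id] = document
    return results
```

A transaction whose proof is found nowhere maps to `None`, which a caller could use to trigger a notification.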
In an optional embodiment, the query generator 120 may be configured to clean the results of the search. The cleaning may include, but is not limited to, removing private data, irrelevant data, or both, from the resulting proof electronic documents. The private and irrelevant data may be determined based on one or more cleaning rules, which may be provided by the enterprise system 150. As a non-limiting example, private and irrelevant data may include personal information of a particular employee (e.g., personal credit card information, a social security number, etc.) that is not needed to provide evidence supporting a VAT declaration. In another embodiment, the cleaning may include performing optical character recognition on the resulting electronic documents and identifying private and irrelevant information based on the results of the optical character recognition.
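One simple way to picture rule-based cleaning of OCR output is pattern-based redaction, sketched below. The two regular expressions are rough illustrative assumptions (a 16-digit card number in groups of four, and a U.S.-style social security number); they are not the cleaning rules of the disclosure and would miss many real-world formats.

```python
import re

# Hypothetical cleaning rules: label -> pattern for private data.
CLEANING_RULES = {
    "credit_card": re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def clean_text(ocr_text: str, rules=CLEANING_RULES) -> str:
    """Replace every match of each cleaning rule with a placeholder."""
    for label, pattern in rules.items():
        ocr_text = pattern.sub(f"[{label} removed]", ocr_text)
    return ocr_text
```

Passing the rules as a parameter reflects the idea that the enterprise system may supply its own cleaning rules.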
Using structured templates to search for proof electronic documents is more efficient and accurate than, for example, using unstructured data directly. Specifically, metadata generated based on the templates may be generated with respect to specific fields, such that the metadata more efficiently and more accurately represents the parameters that uniquely identify the transaction. Thus, the metadata may be used to accurately search for matching proof electronic documents while reducing the processing power and time required to compare metadata.
The query generator 120 typically includes a processing circuitry (e.g., the processing circuitry 410, Fig. 4) coupled to a memory (e.g., the memory 415, Fig. 4). The processing circuitry may comprise or be a component of a processor (not shown) or an array of processors coupled to the memory. The memory contains instructions that can be executed by the processing circuitry. The instructions, when executed by the processing circuitry, configure the processing circuitry to perform the various functions described herein.
It should be understood that the embodiments disclosed herein are not limited to the specific architecture illustrated in Fig. 1, and that other architectures may be equally used without departing from the scope of the disclosed embodiments. Specifically, the query generator 120 may reside in a cloud computing platform, a datacenter, and the like. Moreover, in some embodiments, there may be a plurality of query generators operating as described hereinabove and configured to have one serve as a standby, to share the load between them, or to split the functions between them.
It should also be noted that some of the embodiments discussed with respect to Fig. 1 are described as interacting with only one enterprise system 150 merely for simplicity purposes and without limitation on the disclosure. Data from additional enterprise systems may be utilized for queries generated by the query generator 120 without departing from the scope of the disclosed embodiments. Further, the database 140 may equally be another data source such as, for example, a server having access to one or more databases. In addition, multiple databases may be utilized without departing from the scope of the disclosure.
Fig. 2 is an example flowchart 200 illustrating a method for searching for proof electronic documents based on unstructured data according to an embodiment. In an embodiment, the method may be performed by a query generator (e.g., the query generator 120).
At S210, a first report electronic document is received or retrieved. The report electronic document includes at least partially unstructured data related to one or more transactions. The at least partially unstructured data includes, but is not limited to, unstructured data, semi-structured data, or structured data lacking a known format. The electronic document may be received from, for example, an enterprise resource planning (ERP) system or a user device (e.g., the enterprise system 150, Fig. 1).
In some embodiments, a request to generate a declaration electronic document may be received, where the request includes the report electronic document or an identifier of the report electronic document. The request may further indicate a type of declaration (e.g., value-added tax or other tax, reimbursement of employee expenses, etc.). Accordingly, in some embodiments, S210 may include retrieving the report electronic document.
In an example embodiment, the report electronic document may be an image showing, for example, one or more expense reports related to business activities. As a non-limiting example, the image may be captured by a mobile device operated by an employee of an organization who takes a picture of an expense report form.
At S220, a template is created for each transaction indicated in the report electronic document. In an embodiment, the electronic document may be analyzed via an optical character recognition (OCR) processor. The analysis may further include using machine vision to identify elements in the at least partially unstructured data, cleaning or eliminating data, and generating structured data including key fields and values identified in the at least partially unstructured data. As an example, for an image of a receipt, machine vision may be used to identify information related to the transaction recorded in the receipt such as price, location, date, buyer, seller, and the like.
At optional S230, based on one of the created templates, a required type of proof electronic document is determined for the corresponding transaction. In an embodiment, S230 may further include identifying one or more data sources storing the required type of proof electronic document. The identified data sources may be queried to find a matching proof electronic document for the transaction.
At S240, a query is generated based on the created template. The query may be generated further based on the determined required type of proof electronic document. For example, the query may be generated based on identifying information typical of the required type of proof electronic document, based on query formats accepted by the identified data sources, or both. The query may be generated based on values in fields that uniquely identify the transaction. As a non-limiting example, for a template including the fields "date," "price," "quantity," and "item name," a query indicating the values in those fields may be generated.
In an embodiment, S240 may include generating more than one query. For example, more than one query may be used when searching data sources that require queries in different formats, when optimizing queries for particular data sources, and so on. To this end, the generated queries may be further based on optimization rules for optimizing queries for one or more data sources.
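To illustrate generating more than one query from the same template values, the sketch below renders one set of fields in two formats. The choice of a URL query string and a JSON body as the formats, and the `render_query` helper itself, are assumptions; the disclosure does not specify which formats the data sources accept.

```python
import json
from urllib.parse import urlencode


def render_query(fields: dict, query_format: str) -> str:
    """Render the same field values in the format a data source accepts."""
    if query_format == "url":
        # Sort the items so the output is deterministic.
        return urlencode(sorted(fields.items()))
    if query_format == "json":
        return json.dumps(fields, sort_keys=True)
    raise ValueError(f"unsupported query format: {query_format}")


# Illustrative values for the "date" and "price" fields.
fields = {"date": "2016-10-16", "price": "180.00"}
queries = [render_query(fields, fmt) for fmt in ("url", "json")]
```

Each data source would then receive the rendering that matches its accepted format.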
At S250, the generated query is used to search for proof electronic documents in one or more data sources. In an embodiment, S250 includes querying one or more web sources using the generated query. In another embodiment, S250 may include first querying a database of the enterprise for a proof electronic document for the transaction, and querying the web sources only if the proof electronic document is not found in the database. In an embodiment, S250 may further include retrieving the electronic documents found in the search. In another embodiment, S250 further includes storing the found electronic documents in, for example, a database (e.g., the database 140).
In some embodiments, S250 may include generating a notification indicating the results of the search. The notification may include the proof electronic document for the transaction.
At optional S260, the results of the search may be cleaned to remove private information, irrelevant information, or both. The cleaning may be based on cleaning rules.
At optional S270, a declaration electronic document may be generated based on the created template and the resulting proof electronic document. In an example embodiment, the declaration electronic document may be a completed VAT declaration request form including the resulting electronic document.
At S280, it is checked whether evidencing electronic documents are needed for additional transactions
and, if so, execution continues with S230; otherwise, execution terminates.
Fig. 3 is an example flowchart S220 illustrating a method for creating a template based on an
electronic document including at least partially unstructured data, according to an embodiment.
At S310, the electronic document is obtained. Obtaining the electronic document may include, but is
not limited to, receiving the electronic document (e.g., receiving a scanned image) or retrieving the
electronic document (e.g., from a consumer enterprise system, a merchant enterprise system, or a
database).
At S320, the electronic document is analyzed to identify elements in the at least partially
unstructured data. The analysis may include, but is not limited to, using optical character
recognition (OCR) to determine the characters in the electronic document.
The elements may include, but are not limited to, characters, character strings, or both, related to
a transaction. As a non-limiting example, the elements may include printed data appearing in an
expense receipt related to a business activity. Such printed data may include, but is not limited to,
a date, a time, a quantity, a seller name, a type of seller business, a value-added tax amount, a type
of product purchased, a payment method, registration numbers, and the like.
At S330, based on the analysis, key fields and values in the electronic document are identified. The
key fields may include, but are not limited to, the name and address of the merchant, the date, the
currency, the goods or services sold, a transaction identifier, an invoice number, and the like. The
electronic document may include unnecessary details that are not considered key values. As an example,
a logo of the merchant may not be needed and is therefore not a key value. In an embodiment, a list of
key fields may be predefined, and pieces of data matching the key fields may be extracted.
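As a non-limiting illustration, extracting the pieces of data that match a predefined list of key fields may be sketched as follows. The field names, the label patterns, and the sample OCR text are assumptions made for this sketch; real documents would need patterns suited to their layouts and languages.

```python
import re

# Hypothetical predefined key fields, each with a pattern locating its value.
KEY_FIELDS = {
    "date": re.compile(r"Date:\s*(\d{2}/\d{2}/\d{4})"),
    "invoice number": re.compile(r"Invoice\s+No\.?:\s*(\S+)"),
    "total": re.compile(r"Total:\s*(\d+\.\d{2})"),
}


def extract_key_fields(text):
    """Return the key fields found in the OCR text; unmatched details are ignored."""
    found = {}
    for name, pattern in KEY_FIELDS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found


# Illustrative OCR output; the greeting line is an unnecessary detail and is dropped.
ocr_text = "ACME Ltd\nInvoice No: A-1001\nDate: 12/12/2005\nTotal: 300.00\nThank you!"
fields = extract_key_fields(ocr_text)
```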
Then, a cleaning process is performed to ensure that the information is presented accurately. For
example, if the OCR results in data appearing as "1211212005", the cleaning process may convert this
data to 12/12/2005. As another example, if a name appears as "Mo$den", it will be changed to "Mosden".
The cleaning process may be performed using external information resources such as dictionaries,
calendars, and the like.
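As a non-limiting illustration, the two cleaning examples above may be sketched as follows. The sketch assumes that OCR misread the two "/" separators of a DD/MM/YYYY date as the digit "1", and that a small dictionary of known names is available; both assumptions, the look-alike table, and the helper names are hypothetical.

```python
from datetime import datetime

# Hypothetical dictionary of known merchant names and OCR look-alike characters.
KNOWN_NAMES = {"Mosden"}
LOOKALIKES = {"$": "s", "0": "o", "1": "l"}


def clean_date(raw):
    """Restore slashes misread as '1' if the result is a valid calendar date."""
    if len(raw) == 10 and raw.isdigit() and raw[2] == "1" and raw[5] == "1":
        candidate = "{}/{}/{}".format(raw[:2], raw[3:5], raw[6:])
        try:
            # The calendar acts as the external resource validating the guess.
            datetime.strptime(candidate, "%d/%m/%Y")
        except ValueError:
            return raw
        return candidate
    return raw


def clean_name(raw):
    """Replace look-alike characters, accepting the result only if the dictionary knows it."""
    candidate = "".join(LOOKALIKES.get(ch, ch) for ch in raw)
    return candidate if candidate in KNOWN_NAMES else raw
```

Both helpers return the input unchanged when the correction cannot be validated, so an uncertain guess never overwrites the original OCR output.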
In another embodiment, it is checked whether the extracted pieces of data are complete. For example,
if the name of the merchant can be identified but its address is missing, then the key field for the
merchant address is incomplete. An attempt is made to complete the missing key field values. The
attempt may include querying external systems and databases, correlating with previously analyzed
invoices, or a combination thereof. Examples of external systems and databases may include business
directories, Universal Product Code (UPC) databases, parcel delivery and tracking systems, and the
like. In an embodiment, S330 results in a complete set of the predefined key fields and their
respective values.
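As a non-limiting illustration, the completeness check and the attempt to fill in missing key field values may be sketched as follows. The required field list and the shape of the lookup callables (standing in for business directories, UPC databases, and similar resources) are assumptions made for this sketch.

```python
# Hypothetical predefined key fields that every data set should contain.
REQUIRED_FIELDS = ("merchant name", "merchant address", "date", "total")


def missing_fields(data):
    """Return the required key fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not data.get(name)]


def complete(data, lookups):
    """Try each lookup in turn for every missing field; leave the input untouched.

    Each lookup is a callable taking (field name, known data) and returning a
    value or None. Lookups here are stand-ins for external systems and databases.
    """
    completed = dict(data)
    for name in missing_fields(data):
        for lookup in lookups:
            value = lookup(name, completed)
            if value:
                completed[name] = value
                break
    return completed
```

Fields that no lookup can resolve remain missing, and a second call to `missing_fields` on the result reports them.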
In another embodiment, S330 may also include disambiguating the unstructured data. The disambiguation
may be based on, but is not limited to, a file name of the unstructured data set, dictionaries,
algorithms, thesauruses, and the like. The disambiguation may allow for more accurate identification
of the transaction. The disambiguation may also be based on, but is not limited to, the structure of
the data (e.g., data in a field "destination" may be disambiguated based on names of locations),
dictionaries, algorithms, thesauruses, and the like. In some embodiments, if the disambiguation is
unsuccessful, a notification may be generated and sent to a user (e.g., a user of the enterprise
system 150), prompting the user to provide further clarification.
As a non-limiting example, for an image in a file titled "Purchase Receipt", the character string
"300.00" located on the same line as the character string "Total Price" may be used to determine that
the value to be included in a "purchase price" field is 300.00. As another example, the character
string "Drance" may be disambiguated based on a dictionary to generate metadata indicating that the
location associated with the unstructured data set is France. As yet another example, in a field
related to the type of expense, the structured data of the field may be "Taxi in Paris", and the value
of the field may be "60 Euros". Based on one or more rules for maximum taxi prices, it may be
determined that "60 Euros" is too high for a single taxi fare and that the field therefore corresponds
to a multi-leg taxi trip.
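As a non-limiting illustration, the three disambiguation examples above may be sketched as follows. The country dictionary, the similarity cutoff, and the maximum single fare for Paris are hypothetical values chosen for the sketch; the disclosure does not specify them. Fuzzy matching via `difflib` is one possible dictionary-based technique, not necessarily the one contemplated.

```python
import difflib
import re

# Hypothetical dictionary of locations and a hypothetical maximum single taxi fare.
COUNTRIES = ["France", "Germany", "Spain", "Italy"]
MAX_SINGLE_FARE_EUR = {"Paris": 50.0}


def value_on_same_line(text, label):
    """Return the first number found on the line that contains the label."""
    for line in text.splitlines():
        if label.lower() in line.lower():
            match = re.search(r"\d+(?:\.\d+)?", line)
            if match:
                return match.group(0)
    return None


def resolve_country(raw):
    """Map a misrecognized string to the closest dictionary entry, if any is close enough."""
    matches = difflib.get_close_matches(raw, COUNTRIES, n=1, cutoff=0.8)
    return matches[0] if matches else None


def exceeds_single_fare(city, amount_eur):
    """Report whether the amount is above the maximum single fare rule for the city."""
    limit = MAX_SINGLE_FARE_EUR.get(city)
    return limit is not None and amount_eur > limit
```

When `resolve_country` returns `None`, the disambiguation has failed and a notification prompting the user for clarification would be the natural next step.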
At S340, a structured data set is generated. The generated data set includes the identified fields and
values.
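As a non-limiting illustration, the structured data set may be represented as a typed record built from the identified fields and values. The `TransactionTemplate` class and its field names are assumptions made for this sketch and are not defined in the disclosure.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class TransactionTemplate:
    """Hypothetical structured data set holding identified fields and values."""
    date: str
    price: str
    quantity: int
    item_name: str
    merchant_name: Optional[str] = None


def make_template(fields):
    """Build the structured data set from the extracted key fields."""
    return TransactionTemplate(
        date=fields["date"],
        price=fields["price"],
        quantity=int(fields["quantity"]),
        item_name=fields["item name"],
        merchant_name=fields.get("merchant name"),
    )
```

Calling `asdict` on the result yields a plain dictionary suitable for query generation or storage.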
Fig. 4 is a schematic block diagram of the query generator 120 according to an embodiment. The query
generator 120 includes a processing circuit 410 coupled to a memory 415, a storage 420, and a network
interface 440. In an embodiment, the query generator 120 may include an optical character recognition
(OCR) processor 430. In another embodiment, the components of the query generator 120 may be
communicatively connected via a bus 450.
The processing circuit 410 may be realized as one or more hardware logic components and circuits. For
example, and without limitation, illustrative types of hardware logic components that can be used
include field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs),
application-specific standard products (ASSPs), system-on-a-chip systems (SOCs), general-purpose
microprocessors, microcontrollers, digital signal processors (DSPs), and the like, or any other
hardware logic components that can perform calculations or other manipulations of information.
The memory 415 may be volatile (e.g., RAM), non-volatile (e.g., ROM, flash memory), or a combination
thereof. In one configuration, computer-readable instructions for implementing one or more embodiments
described herein may be stored in the storage 420.
In another embodiment, the memory 415 is configured to store software. Software shall be construed
broadly to mean any type of instructions, whether referred to as software, firmware, middleware,
microcode, hardware description language, or otherwise. Instructions may include code (e.g., in source
code format, binary code format, executable code format, or any other suitable format of code). The
instructions, when executed by the one or more processors, cause the processing circuit 410 to perform
the various processes described herein. Specifically, the instructions, when executed, cause the
processing circuit 410 to find evidencing electronic documents based on unstructured data, as
discussed herein.
The storage 420 may be magnetic storage, optical storage, and the like, and may be realized, for
example, as flash memory or other memory technology, CD-ROM, digital versatile disks (DVDs), or any
other medium that can be used to store the desired information.
The storage 420 may also store metadata generated based on analyses of unstructured data by the OCR
processor 430. In another embodiment, the storage 420 may also store queries generated based on the
metadata.
The OCR processor 430 may include, but is not limited to, a feature and/or pattern recognition
processor (RP) 435 configured to identify patterns, features, or both, in unstructured data sets.
Specifically, in an embodiment, the OCR processor 430 is configured to identify at least characters in
the unstructured data. The identified characters may be used to create a data set including the data
required for verifying a request.
The network interface 440 allows the query generator 120 to communicate with the enterprise system
130, the database 140, the enterprise system 150, or a combination thereof, in order to, for example,
receive electronic documents, send notifications, search for electronic documents, store data, and
the like.
It should be understood that the embodiments described herein are not limited to the specific
architecture illustrated in Fig. 4, and that other architectures may equally be used without departing
from the scope of the disclosed embodiments.
It should be noted that the various embodiments described herein are discussed with respect to
searching for evidencing electronic documents matching a single transaction indicated in a reporting
electronic document merely for simplicity purposes and without limitation on the disclosed
embodiments. Evidencing electronic documents for multiple transactions indicated in a reporting
electronic document may be found, in series or in parallel, without departing from the scope of the
disclosure. As a non-limiting example, the reporting electronic document may be an expense report
prepared by an employee that indicates multiple transactions.
It should also be noted that the various disclosed embodiments are discussed with respect to using
the evidencing electronic documents found for reporting electronic documents based on unstructured
data for value-added tax filings merely for example purposes and without limitation on the disclosure.
The evidencing electronic documents may equally be used for other submissions such as, but not limited
to, other types of filings, tax preparation, and the like.
The various embodiments disclosed herein can be implemented as hardware, firmware, software, or any
combination thereof. Moreover, the software is preferably implemented as an application program
tangibly embodied on a program storage unit or computer-readable medium consisting of parts, or of
certain devices and/or a combination of devices. The application program may be uploaded to, and
executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on
a computer platform having hardware such as one or more central processing units ("CPUs"), a memory,
and input/output interfaces. The computer platform may also include an operating system and
microinstruction code. The various processes and functions described herein may be either part of the
microinstruction code or part of the application program, or any combination thereof, which may be
executed by a CPU, whether or not such a computer or processor is explicitly shown. In addition,
various other peripheral units may be connected to the computer platform, such as an additional data
storage unit and a printing unit. Furthermore, a non-transitory computer-readable medium is any
computer-readable medium except for a transitory propagating signal.
All examples and conditional language recited herein are intended for pedagogical purposes, to aid
the reader in understanding the principles of the disclosed embodiments and the concepts contributed
by the inventor to furthering the art, and are to be construed as being without limitation to such
specifically recited examples and conditions. Moreover, all statements herein reciting principles,
aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to
encompass both structural and functional equivalents thereof. Additionally, it is intended that such
equivalents include both currently known equivalents and equivalents developed in the future, i.e.,
any elements developed that perform the same function, regardless of structure.
Claims (19)
1. A method for finding evidencing electronic documents based on unstructured data, comprising:
analyzing a first electronic document to determine at least one transaction parameter of a
transaction, wherein the first electronic document includes at least partially unstructured data;
creating a template for the transaction, wherein the template is a structured data set including the
determined at least one transaction parameter;
generating, based on the created template, at least one query; and
querying at least one data source for a second electronic document using the at least one query.
2. The method of claim 1, wherein determining the at least one transaction parameter further
comprises:
identifying, in the first electronic document, at least one key field and at least one value;
creating, based on the first electronic document, a data set, wherein the created data set includes
the at least one key field and the at least one value; and
analyzing the created data set, wherein the at least one transaction parameter is determined based on
the analysis.
3. The method of claim 2, wherein identifying the at least one key field and the at least one value
further comprises:
analyzing the first electronic document to determine data in the first electronic document; and
extracting, based on a list of predefined key fields, at least a portion of the determined data,
wherein the at least a portion of the determined data matches at least one key field in the list of
predefined key fields.
4. The method of claim 3, wherein analyzing the first electronic document further comprises:
performing optical character recognition on the first electronic document.
5. The method of claim 2, wherein the query is generated based on a value in each of at least one
predefined key field of the identified at least one key field.
6. The method of claim 1, further comprising:
cleaning the second electronic document, wherein the cleaning includes removing at least one of
personal information and irrelevant information based on at least one cleaning rule.
7. The method of claim 1, further comprising:
determining, based on the created template, a required type of the second electronic document,
wherein the query is generated further based on the determined required type.
8. The method of claim 7, further comprising:
identifying, based on the determined required type, the at least one data source.
9. The method of claim 1, further comprising:
generating, based on the created template and the second electronic document, a third electronic
document, wherein the third electronic document includes a request and the second electronic document.
10. A non-transitory computer-readable medium having stored thereon instructions for causing one or
more processing units to execute a process for verifying unstructured enterprise resource planning
data, the process comprising:
analyzing a first electronic document to determine at least one transaction parameter of a
transaction, wherein the first electronic document includes at least partially unstructured data;
creating a template for the transaction, wherein the template is a structured data set including the
determined at least one transaction parameter;
generating, based on the created template, at least one query; and
querying at least one data source for a second electronic document using the at least one query.
11. A system for finding evidencing electronic documents based on unstructured data, comprising:
a processing circuit; and
a memory, the memory containing instructions that, when executed by the processing circuit, configure
the system to:
analyze a first electronic document to determine at least one transaction parameter of a transaction,
wherein the first electronic document includes at least partially unstructured data;
create a template for the transaction, wherein the template is a structured data set including the
determined at least one transaction parameter;
generate, based on the created template, at least one query; and
query at least one data source for a second electronic document using the at least one query.
12. The system of claim 11, wherein the system is further configured to:
identify, in the first electronic document, at least one key field and at least one value;
create, based on the first electronic document, a data set, wherein the created data set includes the
at least one key field and the at least one value; and
analyze the created data set, wherein the at least one transaction parameter is determined based on
the analysis.
13. The system of claim 12, wherein the system is further configured to:
analyze the first electronic document to determine data in the first electronic document; and
extract, based on a list of predefined key fields, at least a portion of the determined data, wherein
the at least a portion of the determined data matches at least one key field in the list of
predefined key fields.
14. The system of claim 13, wherein the system is further configured to:
perform optical character recognition on the first electronic document.
15. The system of claim 12, wherein the query is generated based on a value in each of at least one
predefined key field of the identified at least one key field.
16. The system of claim 11, wherein the system is further configured to:
clean the second electronic document, wherein the cleaning includes removing at least one of personal
information and irrelevant information based on at least one cleaning rule.
17. The system of claim 11, wherein the system is further configured to:
determine, based on the created template, a required type of the second electronic document, wherein
the query is generated further based on the determined required type.
18. The system of claim 17, wherein the system is further configured to:
identify, based on the determined required type, the at least one data source.
19. The system of claim 11, wherein the system is further configured to:
generate, based on the created template and the second electronic document, a third electronic
document, wherein the third electronic document includes a request and the second electronic document.
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662408780P | 2016-10-16 | 2016-10-16 | |
US62/408,780 | 2016-10-16 | ||
US15/361,934 US20170154385A1 (en) | 2015-11-29 | 2016-11-28 | System and method for automatic validation |
US15/361,934 | 2016-11-28 | ||
PCT/US2017/056448 WO2018071737A1 (en) | 2016-10-16 | 2017-10-13 | Finding evidencing electronic documents based on unstructured data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109983489A true CN109983489A (en) | 2019-07-05 |
Family
ID=61906440
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201780070059.XA Pending CN109983489A (en) | 2016-10-16 | 2017-10-13 | Finding evidencing electronic documents based on unstructured data |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3526758A4 (en) |
CN (1) | CN109983489A (en) |
WO (1) | WO2018071737A1 (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0415839A (en) * | 1990-05-10 | 1992-01-21 | Toshiba Corp | Distributed data base control device |
US20100161616A1 (en) * | 2008-12-16 | 2010-06-24 | Carol Mitchell | Systems and methods for coupling structured content with unstructured content |
US8774516B2 (en) * | 2009-02-10 | 2014-07-08 | Kofax, Inc. | Systems, methods and computer program products for determining document validity |
US10176239B2 (en) * | 2012-04-24 | 2019-01-08 | International Business Machines Corporation | Automation-assisted curation of technical support information |
-
2017
- 2017-10-13 EP EP17861011.9A patent/EP3526758A4/en not_active Withdrawn
- 2017-10-13 WO PCT/US2017/056448 patent/WO2018071737A1/en active Application Filing
- 2017-10-13 CN CN201780070059.XA patent/CN109983489A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2018071737A1 (en) | 2018-04-19 |
EP3526758A1 (en) | 2019-08-21 |
EP3526758A4 (en) | 2020-05-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10614527B2 (en) | System and method for automatic generation of reports based on electronic documents | |
US11062132B2 (en) | System and method for identification of missing data elements in electronic documents | |
US11138372B2 (en) | System and method for reporting based on electronic documents | |
US20170323006A1 (en) | System and method for providing analytics in real-time based on unstructured electronic documents | |
US20180011846A1 (en) | System and method for matching transaction electronic documents to evidencing electronic documents | |
US20170193608A1 (en) | System and method for automatically generating reporting data based on electronic documents | |
EP3494495A1 (en) | System and method for completing electronic documents | |
EP3430540A1 (en) | System and method for automatically generating reporting data based on electronic documents | |
US20170169518A1 (en) | System and method for automatically tagging electronic documents | |
US20180046663A1 (en) | System and method for completing electronic documents | |
US10558880B2 (en) | System and method for finding evidencing electronic documents based on unstructured data | |
CN110023970A (en) | System and method for verifying non-structured Enterprise Resources Plan data | |
CN109983489A (en) | Electronic document is proved based on non-structured data search | |
US10387561B2 (en) | System and method for obtaining reissues of electronic documents lacking required data | |
US20180096435A1 (en) | System and method for verifying unstructured enterprise resource planning data | |
EP3494496A1 (en) | System and method for reporting based on electronic documents | |
US20170323106A1 (en) | System and method for encrypting data in electronic documents | |
CN108713198A (en) | Automatic checking request based on electronic document | |
WO2017201292A1 (en) | System and method for encrypting data in electronic documents | |
US20200118122A1 (en) | Techniques for completing missing and obscured transaction data items | |
CN109313765A (en) | The System and method for of automatic verifying transaction is carried out based on electronic document | |
WO2017142624A1 (en) | System and method for automatically tagging electronic documents | |
WO2018027133A1 (en) | Obtaining reissues of electronic documents lacking required data | |
EP3491554A1 (en) | Matching transaction electronic documents to evidencing electronic |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20190705 |