CN112017019A - Automatic reimbursement method and device based on PDF semantic extraction analysis, computer equipment and storage medium - Google Patents

Automatic reimbursement method and device based on PDF semantic extraction analysis, computer equipment and storage medium

Info

Publication number
CN112017019A
Authority
CN
China
Prior art keywords
reimbursement
information
comparison
data
invoice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010661725.0A
Other languages
Chinese (zh)
Inventor
侯健
郭近之
陈伯厚
范为军
洪瑞哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Cloud Computing Co Ltd
Original Assignee
Suning Cloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd
Priority to CN202010661725.0A
Publication of CN112017019A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/125 Finance or payroll
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/418 Document matching, e.g. of document images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The application relates to an automatic reimbursement method and device, computer equipment, and a storage medium based on PDF semantic analysis. The method comprises the following steps: acquiring electronic reimbursement document information; detecting and recognizing the text and tables of the document information with the automatic extraction and intelligent analysis module of the reimbursement platform, and converting the document information into line-level text information; matching the text and tables in the line-level text information against semantic models and template information to obtain the corresponding reimbursement information; and comparing the expenses in the reimbursement information against the semantic rule base to generate bookkeeping information, thereby realizing automatic reimbursement. The invention provides a method and device for ride-hailing reimbursement based on PDF semantic analysis and extraction, which saves the reimburser form-filling time, reduces the auditing workload of financial staff, and reduces or even eliminates false and repeated reimbursement.

Description

Automatic reimbursement method and device based on PDF semantic extraction analysis, computer equipment and storage medium
Technical Field
The application relates to the field of financial informatization, and in particular to a ride-hailing reimbursement method and device based on PDF semantic extraction analysis, computer equipment, and a storage medium.
Background
With the rapid development and popularization of electronic invoicing and ride-hailing, enterprise demand for ride-hailing in scenarios such as employee business trips, off-site work, and late overtime has become increasingly prominent. Compared with a traditional taxi, a ride-hailing service can provide an electronic travel itinerary that clearly shows the passenger's pick-up and drop-off locations and times, so that financial staff can accurately judge and record the compliance of an employee's claim at the reimbursement stage, which reduces the risk of false reimbursement.
At present, the travel itineraries and electronic invoices provided by ride-hailing companies are in PDF format, and each company's layout is different and changes frequently. After obtaining the travel itinerary and electronic invoice from the ride-hailing platform, the reimburser has to enter their information manually into the relevant expense reimbursement system, and after receiving the reimbursement application, the financial staff have to check the itineraries and electronic invoices manually one by one, which is tedious and laborious.
Therefore, it is necessary to introduce new technology to handle the large number of repetitive tasks in day-to-day financial work, such as extracting and filling in itinerary data, examining invoice data, and matching invoices with itineraries.
Disclosure of Invention
In view of the above, it is necessary to provide a ride-hailing reimbursement method and apparatus, a computer device, and a storage medium based on PDF semantic analysis and extraction.
A ride-hailing reimbursement method based on PDF semantic analysis and extraction comprises the following steps: acquiring reimbursement document information; intelligently parsing the PDF text of the document information and converting the document information into line-level text information; matching the text and tables in the line-level text information against semantic models and template information to obtain the corresponding reimbursement information; and comparing the reimbursement information against the semantic rule base to generate bookkeeping information, thereby realizing automatic reimbursement.
In one embodiment, the method further comprises collecting electronic invoice and itinerary data: receiving and storing the document information uploaded by the reimburser through the reimbursement platform. After the system receives an invoice request initiated by the reimburser, it generates a two-dimensional (QR) code corresponding to the expense type to be reimbursed, and the reimburser uploads the corresponding document file by scanning the code. The reimburser can also have the files collected by actively sending the electronic invoice and travel itinerary to a designated collection mailbox, or through a connected third-party card wallet (such as the WeChat card wallet).
In one embodiment, the method further comprises extracting and converting the electronic invoice and itinerary data: detecting and recognizing the text and tables of the document information with the automatic extraction and intelligent analysis module of the system, and converting the document information into line-level text information.
Further, the method comprises the following:
an improved Wu-Manber algorithm for Chinese PDF text parsing, based on the classical Wu-Manber multi-pattern matching algorithm;
a key information extraction mode using a Bloom filter;
a high-performance text matching algorithm combining double hashing with PDF text encoding rules;
an automatic recognition module generated by model construction and recognition training based on a two-layer convolutional neural network (CNN);
a preprocessing step of detecting the regions containing text in the document image uploaded by the reimburser and generating the corresponding text lines;
a whole-line recognition step of combining the results of character segmentation and single-character recognition into a new whole line;
a language model decoding step of completing language model recognition based on N-gram language model decoding, layout analysis, and rule-based post-processing;
a character recognition step of generating the recognized characters with an end-to-end machine learning system;
deep learning with a recurrent neural network sequence model learning algorithm based on a bidirectional long short-term memory (BiLSTM) network, which improves the character recognition rate.
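The multi-pattern matching item in the list above can be illustrated with a minimal sketch of the classical Wu-Manber algorithm. This is not the improved variant the application refers to, whose details are not given here; the function names, the block size of 2, and the sample text are assumptions made for illustration only.

```python
from collections import defaultdict


def build_tables(patterns, block=2):
    """Build the SHIFT and HASH tables of classical Wu-Manber.

    Only the first m characters of each pattern are indexed, where m is
    the length of the shortest pattern.
    """
    m = min(len(p) for p in patterns)
    if m < block:
        raise ValueError("every pattern must be at least `block` characters long")
    default_shift = m - block + 1
    shift = {}
    buckets = defaultdict(list)
    for p in patterns:
        for end in range(block, m + 1):
            blk = p[end - block:end]
            # Distance from this block to the end of the m-character prefix.
            shift[blk] = min(shift.get(blk, default_shift), m - end)
        buckets[p[m - block:m]].append(p)
    return m, default_shift, shift, buckets


def wu_manber_search(text, patterns, block=2):
    """Return (start_index, pattern) for every occurrence of any pattern."""
    m, default_shift, shift, buckets = build_tables(patterns, block)
    hits = []
    pos = m  # exclusive end index of the current m-character window
    while pos <= len(text):
        blk = text[pos - block:pos]
        step = shift.get(blk, default_shift)
        if step > 0:
            pos += step  # the block cannot end a match here, so skip ahead
            continue
        start = pos - m
        for candidate in buckets.get(blk, ()):
            if text.startswith(candidate, start):
                hits.append((start, candidate))
        pos += 1
    return hits


if __name__ == "__main__":
    # Hypothetical line of text extracted from an itinerary PDF.
    sample = "滴滴出行行程单 发票号码 12345678 发票金额 35.50"
    print(wu_manber_search(sample, ["发票号码", "发票金额", "行程单"]))
```

The shift table lets the scan skip several characters at a time whenever the last two characters of the window cannot end any pattern, which is why this family of algorithms suits searching for many field labels at once.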
In one specific embodiment, the method further comprises comparing the electronic invoice and itinerary data: comparing the detail lines of the reimbursement information with the expense comparison module of the system, and generating and submitting bookkeeping data.
The expense detail comparison module of the system comprises a comparison rule base in which all comparison rules are fixed and stored. When a reimburser submits a reimbursement application, the system automatically compares the reimbursement expense data and presents the comparison result, including any non-compliant data, in the front end for the reimburser to check.
The expense detail comparison module of the system also comprises a management function and provides a custom comparison rule model. Financial staff can store the comparison rules for financial reimbursement in the rule base by following the specified operations, and can configure the region, reimbursement category, reimbursement department, and reimburser grade to which each rule applies.
An automatic reimbursement device based on PDF semantic analysis comprises:
an information acquisition module, which can be installed in an app on the reimburser's mobile device to upload and store travel itinerary information, and which can also upload and collect information from the reimbursement platform;
an intelligent parsing module for detecting and recognizing the text and tables of the documents through the automatic recognition module of the reimbursement platform to generate the corresponding reimbursement information;
an expense comparison module for comparing the detail information through the detail comparison module of the reimbursement platform and submitting a reimbursement application;
a determining module for performing compliance verification on the reimbursement information and verifying its authenticity, in real time or after a delay, through an invoice verification interface;
a reimbursement generation module for automatically reimbursing the determined reimbursement information.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any of the above embodiments when executing the computer program.
A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, carrying out the steps of the method of any of the above embodiments.
With the above automatic reimbursement method, device, computer equipment, and storage medium based on PDF semantic analysis, the document information uploaded by the reimburser is received and stored through the reimbursement platform; the text and tables of the document information are detected and recognized by the system's automatic extraction and intelligent analysis module and converted into line-level text information; the line-level text information is matched against the semantic model and template information to generate the structured data required to initiate reimbursement; and the detail lines of the reimbursement information are compared by the system's expense comparison module, and bookkeeping data is generated and submitted. Automatically extracting and recognizing the itinerary and electronic invoice information thus spares the reimburser manual entry of trip and invoice information, shortens the time needed to file a claim, and reduces the risk of reimbursement failure caused by incorrectly filled data. It also improves the efficiency and accuracy with which financial staff check invoices and compare business data, and can effectively prevent false and repeated reimbursement.
Drawings
FIG. 1 is a diagram of an application environment of an automatic reimbursement method for PDF extraction semantic analysis in one embodiment;
FIG. 2 is an overall system architecture of an automatic reimbursement method for PDF extraction semantic analysis in one embodiment;
FIG. 3 is a flow diagram illustrating an exemplary embodiment of an automatic reimbursement method for PDF extraction semantic analysis;
FIG. 4 is a flow diagram of a PDF intelligent parsing engine in one embodiment;
FIG. 5 is an embodiment of a PDF intelligent parsing and comparison rule configuration interface;
FIG. 6 is a diagram of an apparatus of a PDF intelligent parsing engine in one embodiment;
FIG. 7 is a block diagram of an apparatus for PDF intelligent parsing engine in one embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The application provides an automatic reimbursement method based on PDF semantic extraction analysis, which is applied to the application environment shown in FIG. 1. As shown in FIG. 1, the automated reimbursement system 100 includes a storage device 102, a configuration device 104, and a reimbursement device 106. The storage device 102 stores the reimbursement process information, which comprises a plurality of reimbursement operation processes. The reimbursement process information may be uploaded by the user 302 through the terminal device 304 to the server 306, which stores it in the storage device 102, and may consist of natural-language reimbursement information entered by the user 302 on the terminal device 304. The configuration device 104 generates configuration information for the target reimbursement task. This configuration information may be generated by the configuration device 104 from configuration data uploaded by the user 202 through the terminal device 204; that is, it may be configured manually. Finally, the reimbursement device 106 obtains the configuration information required for reimbursement from the configuration device 104, obtains the PDF electronic invoices and itinerary information in the reimbursement process information from the storage device 102 according to the configuration information, parses the obtained reimbursement information with the intelligent parsing engine 400 according to the preset automatic extraction and intelligent analysis module to obtain the specific reimbursement bookkeeping data, and stores the data in the database 500, thereby realizing an automatic reimbursement process.
In one embodiment, as shown in fig. 2, an automatic reimbursement method for PDF extraction semantic analysis is provided. The method is described by taking its application to the reimbursement device 106 in fig. 1 as an example, and includes the following steps:
Step S102: collecting the electronic invoice and travel itinerary data, that is, receiving and storing the document information uploaded by the reimburser through the reimbursement platform.
In this embodiment, the step of collecting the electronic invoice and itinerary data further includes:
after receiving an invoice request initiated by a reimburser, the system generates a two-dimensional (QR) code corresponding to the expense type to be reimbursed, and the reimburser can upload the corresponding document file by scanning the code. Common ride-hailing reimbursement scenarios are employees taking a ride after late overtime, going out on business during working hours, and business trips. For late-overtime reimbursement, the system can automatically generate the corresponding QR code image the next day based on the employee's off-duty time of the previous day;
in addition, the reimburser can have the files collected by actively sending the electronic invoice and travel itinerary to a designated collection mailbox, or through a connected third-party card wallet (such as the WeChat card wallet);
furthermore, the reimburser can manually initiate the reimbursement process in the reimbursement system and, after selecting the service type to be reimbursed, manually upload the files to the system from a mobile phone or computer. If the reimbursement form is initiated manually by the reimburser, the system generates a corresponding QR code from the basic information of the form. The code mainly contains information such as the reimbursement service type, off-duty date and time, business trip start time, city of the trip, business trip type, the reimburser's organization information, remaining budget, and notes, so that the reimburser is spared data entry on the mobile phone and the reimbursement workload is reduced.
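As a minimal sketch of how the basic information of a reimbursement form could be packed into the text carried by a two-dimensional (QR) code, the example below serializes a few of the fields listed above. The class name, field names, and the JSON plus URL-safe Base64 encoding are assumptions for illustration; the source does not specify a payload format, and rendering the string as an actual QR image would need a separate library that is not shown.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class ReimbursementTicket:
    """Hypothetical subset of the basic information listed in the text."""
    service_type: str        # e.g. "late_overtime" or "business_trip"
    off_duty_at: str         # ISO 8601 date and time of leaving work
    trip_city: str
    organization: str
    budget_remaining: float


def encode_qr_payload(ticket: ReimbursementTicket) -> str:
    """Serialize the ticket into a compact ASCII string for a QR code."""
    raw = json.dumps(asdict(ticket), ensure_ascii=False, sort_keys=True)
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")


def decode_qr_payload(payload: str) -> ReimbursementTicket:
    """Recover the ticket on the mobile side after the code is scanned."""
    return ReimbursementTicket(**json.loads(base64.urlsafe_b64decode(payload)))


if __name__ == "__main__":
    ticket = ReimbursementTicket(
        service_type="late_overtime",
        off_duty_at="2020-07-09T21:40:00",
        trip_city="南京",
        organization="Finance Dept",
        budget_remaining=500.0,
    )
    payload = encode_qr_payload(ticket)
    print(payload)
    print(decode_qr_payload(payload) == ticket)
```

Because the scanned payload already carries the service type and dates, the upload page can prefill them, which is the data-entry saving the paragraph above describes.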
Step S104: detecting and recognizing the text and tables of the document information with the automatic extraction and intelligent analysis module of the system, and converting the document information into line-level text information.
In this embodiment, the extraction of the electronic invoice and itinerary data further includes:
an improved Wu-Manber algorithm for Chinese PDF text parsing, based on the classical Wu-Manber multi-pattern matching algorithm;
a key information extraction mode using a Bloom filter;
a high-performance text matching algorithm combining double hashing with PDF text encoding rules;
a preprocessing step: after the travel itinerary information is received, preprocessing is performed first to split the itinerary into two parts, ordinary text lines and tables;
detecting whether the ride-hailing company to which the itinerary file uploaded by the reimburser belongs, and the corresponding template information, already exist in the system;
if a table exists, converting the table into ordinary text lines through table model decoding, N-gram language model decoding, and rule-based layout analysis and post-processing;
if a resulting text line contains line breaks, performing the whole-line recognition step, in which the associated characters are merged into a whole line with an end-to-end machine learning method.
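The Bloom filter and double hashing items in the list above can be sketched as follows. This is a generic Bloom filter whose bit positions come from two base hashes (the Kirsch-Mitzenmacher double hashing scheme); how the application combines double hashing with PDF text encoding rules is not described here, so the class, the `label: value` line format, and the sample labels are assumptions.

```python
import hashlib


class BloomFilter:
    """Bloom filter whose k bit positions are derived from two base hashes."""

    def __init__(self, size_bits=1024, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so never zero
        # Double hashing: position_i = h1 + i * h2 (mod size).
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def extract_key_fields(lines, labels):
    """Keep only 'label: value' lines whose label is a known key field.

    The Bloom filter rejects most irrelevant labels cheaply; the exact set
    lookup afterwards removes the rare false positive.
    """
    bloom = BloomFilter()
    for label in labels:
        bloom.add(label)
    exact = set(labels)
    fields = {}
    for line in lines:
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in bloom and key in exact:
            fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    # Hypothetical text lines produced by the PDF parsing step.
    lines = ["发票号码: 12345678", "备注: 无", "发票金额: 35.50"]
    print(extract_key_fields(lines, ["发票号码", "发票金额"]))
```

A Bloom filter never reports a stored label as absent, so the only cost of using it as a pre-filter is an occasional extra exact lookup.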
Step S106: matching the line-level text information against the semantic model and template information to generate the structured data required to initiate reimbursement.
In this embodiment, the parsing of the electronic invoice and itinerary data further includes:
the automatic recognition module of the system is constructed on a two-layer convolutional neural network (CNN) model and generated through recognition training.
Deep learning is performed with a recurrent neural network sequence model learning algorithm based on a bidirectional long short-term memory (BiLSTM) network, which improves the character recognition rate.
Further, templates are classified by ride-hailing company and by period, which improves the efficiency and accuracy of structuring. Examples of the recognized elements are the company name, start time, pick-up location, drop-off location, and passenger, and the invoice code, invoice number, invoice amount, and invoice date in the electronic invoice. The recognized elements are processed into structured data and stored in a database for use by the system.
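As a simplified stand-in for the template matching described above, the sketch below turns line-level text into a structured record using per-template regular expressions. The template identifier, the English field labels, and the regular expressions are hypothetical; the semantic model and the real templates are not given in the source.

```python
import re
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class InvoiceRecord:
    invoice_code: str
    invoice_number: str
    amount: Decimal
    invoice_date: str


# One entry per (company, period) template; "demo_vendor_v1" is made up.
TEMPLATES = {
    "demo_vendor_v1": {
        "invoice_code": re.compile(r"Invoice code:\s*(\d+)"),
        "invoice_number": re.compile(r"Invoice number:\s*(\d+)"),
        "amount": re.compile(r"Invoice amount:\s*(\d+(?:\.\d{1,2})?)"),
        "invoice_date": re.compile(r"Invoice date:\s*(\d{4}-\d{2}-\d{2})"),
    },
}


def structure_invoice(lines, template_id):
    """Apply the chosen template's patterns to the text lines."""
    text = "\n".join(lines)
    values = {}
    for field, pattern in TEMPLATES[template_id].items():
        match = pattern.search(text)
        if match is None:
            raise ValueError(f"field {field!r} not found with template {template_id!r}")
        values[field] = match.group(1)
    # Decimal avoids binary floating-point rounding in monetary amounts.
    values["amount"] = Decimal(values["amount"])
    return InvoiceRecord(**values)


if __name__ == "__main__":
    sample_lines = [
        "Invoice code: 032001900111",
        "Invoice number: 12345678",
        "Invoice amount: 35.50",
        "Invoice date: 2020-07-09",
    ]
    print(structure_invoice(sample_lines, "demo_vendor_v1"))
```

Keeping one pattern set per company and period means a layout change only requires adding a new template entry, not changing the extraction code.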
In this embodiment, if the system administrator turns on the automatic travel expense reimbursement switch, then after the user finishes collecting the electronic itinerary and invoice and the extraction, parsing, and recognition steps above are complete, the system automatically queries the business travel system in real time to judge whether the current reimburser has a qualifying business trip. The judgment conditions mainly include whether the trip start time falls within the business trip period, whether the passenger is the current reimburser, whether the means of transport is a ride-hailing vehicle, and so on. If a qualifying business trip exists, the current invoice is automatically collected into the corresponding business trip.
If the system administrator turns on the automatic urban transportation fee reimbursement switch, then after the user finishes collecting the electronic itinerary and invoice, the system queries the urban transportation fee system to judge whether the current reimburser has a qualifying off-site work or late-overtime record. The judgment conditions mainly include whether the trip start time is within 30 minutes of the recorded going-out time, whether it is within 60 minutes of the off-duty time recorded by the attendance system, whether the passenger is the current reimburser, whether the means of transport is a ride-hailing vehicle, and so on. If a qualifying trip exists, the urban transportation fee reimbursement process is initiated automatically for the current reimburser, who is notified by short message, e-mail, or app push message.
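The late-overtime judgment conditions can be sketched as a single predicate. This is a minimal illustration that assumes the window is measured from the off-duty time recorded by the attendance system and uses 60 minutes as the default; the class, function, and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Trip:
    passenger_id: str
    vehicle_type: str      # "ride_hailing", "taxi", ...
    start_time: datetime


def qualifies_for_city_transport_claim(
    trip: Trip,
    employee_id: str,
    off_duty_time: datetime,
    max_gap: timedelta = timedelta(minutes=60),
) -> bool:
    """Check the passenger, the means of transport, and the time window."""
    gap = trip.start_time - off_duty_time
    return (
        trip.passenger_id == employee_id
        and trip.vehicle_type == "ride_hailing"
        and timedelta(0) <= gap <= max_gap
    )


if __name__ == "__main__":
    off_duty = datetime(2020, 7, 9, 21, 10)
    trip = Trip("E1001", "ride_hailing", datetime(2020, 7, 9, 21, 40))
    print(qualifies_for_city_transport_claim(trip, "E1001", off_duty))
```

Passing the window as a parameter keeps the threshold configurable, in line with the switch-driven, administrator-configured behaviour described above.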
Step S108: comparing the detail lines of the reimbursement information with the expense comparison module of the system, generating bookkeeping data, and realizing automatic reimbursement.
In this embodiment, the financial expense comparison step further includes:
the expense comparison module of the system comprises a comparison rule base in which all comparison rules are fixed and stored. When a reimburser submits a reimbursement application, the system automatically compares the reimbursement expense data and presents the comparison result, including any non-compliant data, in the front end for the reimburser to check. Examples of non-compliant data: when a travel expense reimburser submits an application, the trip start time stated in the itinerary does not fall within any business trip period, or the pick-up location deviates greatly from the departure location stated in the business trip; or, for urban transportation fees, the trip start time and the off-duty time recorded by the attendance system differ by more than 1 hour.
In this embodiment, the financial expense comparison step further includes:
the expense comparison module of the system comprises a management function and provides a custom comparison rule model. Financial auditors can store the comparison rules for financial reimbursement in the comparison rule base by following the specified operations, and can configure the region, reimbursement category, reimbursement department, and reimburser grade to which each rule applies. For example, financial auditors can adjust the earliest off-duty time that qualifies for reimbursement by region category, such as first-tier and second-tier cities, to account for the effect of different regional working hours on late-overtime ride reimbursement.
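A comparison rule base with configurable scope can be sketched as a list of rule objects that each carry a check and the region and category they apply to. The rule names, the dictionary keys of a claim, and the hour thresholds (21:00 and 20:00) are illustrative assumptions and are not taken from the source; department and grade scoping would follow the same pattern.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ComparisonRule:
    name: str
    is_compliant: Callable[[dict], bool]   # True when the claim passes
    region: Optional[str] = None           # None means "any region"
    category: Optional[str] = None         # None means "any category"

    def applies_to(self, claim: dict) -> bool:
        return (
            self.region in (None, claim["region"])
            and self.category in (None, claim["category"])
        )


def compare_claim(claim: dict, rule_base: list) -> list:
    """Return the names of all applicable rules that the claim violates."""
    return [
        rule.name
        for rule in rule_base
        if rule.applies_to(claim) and not rule.is_compliant(claim)
    ]


# Hypothetical rule base; the thresholds are made up for the example.
RULE_BASE = [
    ComparisonRule("tier1_earliest_off_duty", lambda c: c["off_duty_hour"] >= 21,
                   region="tier1", category="late_overtime"),
    ComparisonRule("tier2_earliest_off_duty", lambda c: c["off_duty_hour"] >= 20,
                   region="tier2", category="late_overtime"),
    ComparisonRule("ride_within_one_hour",
                   lambda c: 0 <= c["minutes_after_off_duty"] <= 60,
                   category="late_overtime"),
]


if __name__ == "__main__":
    claim = {"region": "tier1", "category": "late_overtime",
             "off_duty_hour": 20, "minutes_after_off_duty": 30}
    print(compare_claim(claim, RULE_BASE))
```

Storing the scope on the rule itself lets an auditor add or retune a regional rule as data, without touching the comparison loop.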
After the business data comparison is completed, and no earlier than T+1, the invoice checking module connects to the tax bureau's electronic ledger database through the invoice checking interface and sends the basic invoice information identified from the invoice content (invoice number, invoice code, invoice amount, and invoice issue time) to verify the authenticity of the invoice. If the invoice is verified as authentic, the invoice checking module then queries whether the historical invoice database already contains the same invoice information. If it does not, all the above checks have passed: the invoice is marked as successfully checked, and the invoice content information and the original PDF file are stored in the historical invoice database.
It should be noted that, although the steps are described in a specific order, they are not necessarily performed in that order; some of the steps may be performed concurrently or in a different order as long as the required functions are achieved.
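The two-stage invoice check (authenticity first, then duplicate detection against the historical invoice database) can be sketched as below. The tax bureau interface is represented by a caller-supplied function because its real API is not described in the source; the function names, the status strings, and the in-memory dictionary standing in for the historical database are assumptions.

```python
from typing import Callable, Dict, Tuple


def check_invoice(
    invoice: dict,
    verify_with_tax_bureau: Callable[[dict], bool],
    history: Dict[Tuple[str, str], dict],
) -> str:
    """Return "passed", "failed: not verified", or "failed: duplicate"."""
    if not verify_with_tax_bureau(invoice):
        return "failed: not verified"
    key = (invoice["code"], invoice["number"])
    if key in history:
        return "failed: duplicate"
    # Only invoices that pass both checks are archived for later lookups.
    history[key] = invoice
    return "passed"


if __name__ == "__main__":
    # Stand-in verifier: a real one would call the invoice checking interface.
    def fake_verifier(inv: dict) -> bool:
        return inv["number"] != "00000000"

    history: Dict[Tuple[str, str], dict] = {}
    invoice = {"code": "032001900111", "number": "12345678",
               "amount": "35.50", "issued_at": "2020-07-09"}
    print(check_invoice(invoice, fake_verifier, history))
    print(check_invoice(invoice, fake_verifier, history))
```

Running the duplicate check only after verification succeeds keeps unverified invoices out of the historical database, so a forged invoice cannot later block a genuine one with the same number.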
The present application further provides an automatic reimbursement device based on PDF semantic extraction analysis. As shown in fig. 8, the device includes an information acquisition module 10, an intelligent parsing module 20, a fee comparison module 30, a determining module 40, and a reimbursement generation module 50. The information acquisition module 10 receives and stores the document information uploaded by the reimburser through the reimbursement platform. The intelligent parsing module 20 detects and recognizes the text and tables of the document information through the automatic extraction and intelligent analysis module of the system and converts the document information into line-level text information. The fee comparison module 30 compares the detail lines of the reimbursement information through the fee comparison module of the system and generates bookkeeping data. The determining module 40 performs compliance auditing on the bookkeeping data to ensure that it meets the reimbursement requirements. The reimbursement generation module 50 completes the operation flow of the preceding modules and automatically generates reimbursement data from the audited bookkeeping data.
For specific limitations of the automatic reimbursement device based on PDF semantic extraction analysis, refer to the limitations on the automatic reimbursement method based on PDF semantic extraction analysis above, which are not repeated here. Each module in the device can be realized wholly or partly by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, the computer device may be a server or a system of a plurality of servers, and the internal structure of the server may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer equipment is used for being connected with an external terminal and receiving reimbursement data such as electronic invoices and travel itineraries input by the terminal. The computer program is executed by a processor to implement an automatic reimbursement method based on PDF semantic extraction analysis.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of part of the structure related to the disclosed solution and does not limit the computer devices to which the disclosed solution applies; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of the technical features that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application; their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. An automatic reimbursement method, apparatus, system, computer device and storage medium based on PDF semantic analysis, the method comprising:
acquiring electronic reimbursement information;
extracting and converting characters and table elements of the document information according to a preset text processing rule to obtain line character information;
performing data analysis on the line character information according to a preset semantic model to obtain structured data;
and comparing the expense information in the structured data in detail according to a preset expense comparison rule to obtain target accounting data and submitting the accounting.
2. The method of claim 1, wherein, prior to acquiring the electronic reimbursement information, the method further comprises:
generating a corresponding collection two-dimensional code according to the reimbursement-related basic information acquired by the system;
the collection two-dimensional code comprises the reimbursement service type, the reimbursement document number, basic information of the reimburser and the like;
acquiring reimbursement receipt information and travel itinerary information by scanning the two-dimensional code;
the reimbursement receipt information includes the invoicing main body, the invoice title, the amount and the like;
the travel itinerary information includes a departure time, an arrival time, a departure place, a destination, and the like.
3. The method of claim 1, wherein the extracting and parsing process of the line text information comprises:
extracting the electronic document information by using a bloom filter according to the electronic document information, and splitting the electronic document information into two parts, namely a common text line and a table;
matching a corresponding template information table in the system according to the invoicing main body in the reimbursement receipt information by combining double hash and PDF text coding rules;
if the template information form corresponding to the current invoicing main body exists, processing the form based on a form model decoding rule, and converting the form into a common text line through surface mixing analysis;
and if the text line has line feed, performing a whole line identification step, and combining the associated characters into a whole line by adopting an end-to-end machine learning method.
4. The method of claim 1, wherein the line text information data parsing comprises:
matching a semantic model with template information according to the line character information to generate structured data required by reimbursement initiation;
the data analysis module is constructed based on a two-layer convolutional neural network (CNN) model and is generated through recognition training;
according to the template information of different invoicing main bodies in different periods, the structuring efficiency and accuracy are improved through classification;
the identification elements include, for example, the company name, travel starting time, travel starting place, drop-off place, passenger, invoice code, invoice number, invoice amount and invoice date in the electronic invoice and the travel itinerary;
and carrying out structured data processing on the identification elements and storing the identification elements in a database.
5. The method of claim 1, wherein the cost comparison step comprises:
according to a semantic rule base in the expense comparison rule, fixing and storing the rules for the different formats used by travel itinerary invoicing main bodies, and, when reimbursement occurs, automatically comparing the travel itinerary detail data by the system to obtain the target accounting data.
6. The method of claim 5, wherein the detail comparison module further comprises a management function that provides a custom comparison rule model; financial auditing staff can store detailed comparison rules in the comparison rule base according to a specified operation, and the comparison rules can be configured by city, region, reimbursement category, reimbursement company, reimbursement department and reimburser level.
7. The method as claimed in claim 6, wherein the detail comparison module of the intelligent reimbursement system further comprises a preset semantic rule base for fixedly storing all the rules for the different formats used by online ride-hailing companies; when the reimburser submits a reimbursement application, the system automatically compares the detail data of the travel itinerary and presents the comparison result on the user front end for the reimburser and a financial reviewer to check, the comparison result including non-compliant data.
8. An automatic reimbursement method and device based on PDF semantic analysis, characterized in that the device comprises:
an information acquisition module, which can be installed in an app on the reimburser's mobile device to upload and store travel itinerary information;
an intelligent analysis module, used for detecting and identifying the characters and tables of said bills through an automatic identification module of said reimbursement platform to generate corresponding reimbursement information;
an expense comparison module, used for comparing the detail information through the detail comparison module of the reimbursement platform;
a determining module, used for performing compliance verification on the reimbursement information and verifying the authenticity of the reimbursement information, in real time or after a delay, through an invoice verification interface;
and the generation reimbursement module is used for realizing automatic reimbursement of the determined corresponding reimbursement information.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
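The configurable comparison rules described in claims 5 to 7 can be illustrated with the following sketch. It is an assumption-laden example rather than the claimed implementation: the names `ComparisonRule`, `find_rule` and `compare_itinerary`, the `max_amount` limit, the use of `None` as a wildcard, and the "most specific rule wins" policy are all hypothetical choices made for illustration. Only three of the configurable dimensions named in claim 6 (city, reimbursement category, reimburser level) are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ComparisonRule:
    # Scope fields; None means "applies to any value" (an assumption of this sketch).
    city: Optional[str] = None
    category: Optional[str] = None
    level: Optional[str] = None
    max_amount: float = 0.0  # hypothetical per-trip amount limit

    def _scope(self):
        return (("city", self.city), ("category", self.category), ("level", self.level))

    def matches(self, trip: dict) -> bool:
        # A rule applies when every configured scope field equals the trip's value.
        return all(want is None or trip.get(key) == want for key, want in self._scope())

    def specificity(self) -> int:
        # Number of scope fields that are actually configured.
        return sum(want is not None for _, want in self._scope())


def find_rule(rules: List[ComparisonRule], trip: dict) -> Optional[ComparisonRule]:
    # Pick the most specific applicable rule, or None if no rule applies.
    candidates = [r for r in rules if r.matches(trip)]
    return max(candidates, key=ComparisonRule.specificity, default=None)


def compare_itinerary(rules: List[ComparisonRule],
                      trips: List[dict]) -> List[Tuple[dict, str]]:
    """Return the non-compliant detail lines, each with a reason."""
    non_compliant = []
    for trip in trips:
        rule = find_rule(rules, trip)
        if rule is None:
            non_compliant.append((trip, "no applicable rule"))
        elif trip["amount"] > rule.max_amount:
            non_compliant.append(
                (trip, f"amount {trip['amount']} exceeds limit {rule.max_amount}"))
    return non_compliant
```

The returned list corresponds to the non-compliant data that claim 7 says is shown to the reimburser and the financial reviewer; an empty list would mean every itinerary line passed its rule.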
CN202010661725.0A 2020-07-10 2020-07-10 Automatic reimbursement method and device based on PDF semantic extraction analysis, computer equipment and storage medium Pending CN112017019A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010661725.0A CN112017019A (en) 2020-07-10 2020-07-10 Automatic reimbursement method and device based on PDF semantic extraction analysis, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010661725.0A CN112017019A (en) 2020-07-10 2020-07-10 Automatic reimbursement method and device based on PDF semantic extraction analysis, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112017019A (en) 2020-12-01

Family

ID=73498500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010661725.0A Pending CN112017019A (en) 2020-07-10 2020-07-10 Automatic reimbursement method and device based on PDF semantic extraction analysis, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112017019A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669133A (en) * 2020-12-28 2021-04-16 祝泽文 Intelligent cost control reimbursement method capable of automatically matching according to application scenes
CN113065940A (en) * 2021-04-27 2021-07-02 平安普惠企业管理有限公司 Invoice reimbursement method, device, equipment and storage medium based on artificial intelligence
CN113297850A (en) * 2021-05-17 2021-08-24 济南森维网络科技有限公司 Cross-department financial expense management method based on block chain technology
CN113627438A (en) * 2021-08-09 2021-11-09 三峡高科信息技术有限责任公司 Method and system for automatically reimbursing travel expenses based on bill recognition and configuration engine
CN114640645A (en) * 2022-05-18 2022-06-17 深圳高灯计算机科技有限公司 Reimbursement processing method and device for electronic mail, computer equipment and storage medium
CN114724158A (en) * 2022-04-21 2022-07-08 北京梦诚科技有限公司 Engineering quantity auditing method and system, electronic equipment and storage medium
CN116227487A (en) * 2023-01-10 2023-06-06 浙江法之道信息技术有限公司 Legal text risk point intelligent auditing system

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669133A (en) * 2020-12-28 2021-04-16 祝泽文 Intelligent cost control reimbursement method capable of automatically matching according to application scenes
CN113065940A (en) * 2021-04-27 2021-07-02 平安普惠企业管理有限公司 Invoice reimbursement method, device, equipment and storage medium based on artificial intelligence
CN113065940B (en) * 2021-04-27 2023-11-17 江苏环迅信息科技有限公司 Method, device, equipment and storage medium for reimbursement of invoice based on artificial intelligence
CN113297850A (en) * 2021-05-17 2021-08-24 济南森维网络科技有限公司 Cross-department financial expense management method based on block chain technology
CN113297850B (en) * 2021-05-17 2023-11-07 江苏环迅信息科技有限公司 Cross-department financial expenditure management method based on block chain technology
CN113627438A (en) * 2021-08-09 2021-11-09 三峡高科信息技术有限责任公司 Method and system for automatically reimbursing travel expenses based on bill recognition and configuration engine
CN114724158A (en) * 2022-04-21 2022-07-08 北京梦诚科技有限公司 Engineering quantity auditing method and system, electronic equipment and storage medium
CN114640645A (en) * 2022-05-18 2022-06-17 深圳高灯计算机科技有限公司 Reimbursement processing method and device for electronic mail, computer equipment and storage medium
CN116227487A (en) * 2023-01-10 2023-06-06 浙江法之道信息技术有限公司 Legal text risk point intelligent auditing system
CN116227487B (en) * 2023-01-10 2023-11-10 浙江法之道信息技术有限公司 Legal text risk point intelligent auditing system

Similar Documents

Publication Publication Date Title
CN112017019A (en) Automatic reimbursement method and device based on PDF semantic extraction analysis, computer equipment and storage medium
CN110544161A (en) financial expense auditing method and device based on automatic extraction of bill data
CN107944011B (en) Method, device, server and storage medium for processing group policy data
CN107918859B (en) Method and device for processing order task and providing travel service
CN114495085B (en) Reimbursement method for online identification and management of multi-platform invoice
CN111177129B (en) Method, device, equipment and storage medium for constructing label system
CN114202755A (en) Transaction background authenticity auditing method and system based on OCR (optical character recognition) and NLP (natural language processing) technologies
CN113011959A (en) Seven-expense intelligent auditing system and use method thereof
WO2020034346A1 (en) Taxi-hailing reimbursement method and apparatus, computer device, and storage medium
CN110782320A (en) Order processing method and device, order reimbursement system and storage medium
CN112785404A (en) Invoice issuing management system
CN112069893A (en) Bill processing method and device, electronic equipment and storage medium
CN111652699A (en) Data transmission method for tax receipt system
CN114240333A (en) Holographic application center system for electronic accounting archives
CN110008772B (en) Method and system for rapidly identifying and inputting invoice for tax administration
CN110717732A (en) Information authentication method and system
CN115907673A (en) Supply chain system
KR102562186B1 (en) System for providing rental property management based official letter sending service
CN114819896A (en) AI-based bill management system, method and storage medium
CN114358707A (en) Man-machine cooperative hybrid examination order decision method and system
US11049204B1 (en) Visual and text pattern matching
CN113837170A (en) Automatic auditing processing method, device and equipment for vehicle insurance claim settlement application
CN111782917A (en) Method and apparatus for visual analysis of financial penalty data
CN116862573B (en) Inter-city network vehicle-reduction short-term travel demand prediction method and system based on incremental training
CN117217676A (en) Intelligent management system and method for engineering construction project

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201201