CN110909226A - Financial document information processing method and device, electronic equipment and storage medium - Google Patents


Publication number
CN110909226A
Authority
CN
China
Prior art keywords
financial
structured data
data
document
subject
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911194180.0A
Other languages
Chinese (zh)
Other versions
CN110909226B (en)
Inventor
焦嘉烽
陈运文
张健
王璐
纪达麒
王亚楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Daerguan Information Technology (shanghai) Co Ltd
Original Assignee
Daerguan Information Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Daerguan Information Technology (shanghai) Co Ltd
Priority to CN201911194180.0A
Publication of CN110909226A
Application granted
Publication of CN110909226B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/125 Finance or payroll
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The embodiment of the invention discloses a financial document information processing method, a financial document information processing device, electronic equipment and a storage medium, wherein the method comprises the following steps: generating document structured data of the financial document to be audited through a document processing module; generating financial subject structured data based on the document structured data; inputting the document structured data into a character error correction model, and outputting an error correction result; inputting the document structured data into a manager information spot check module to generate a check result of the manager information; respectively inputting the financial subject structured data into a financial index formula calculation module, a financial subject change checking module and a financial statement extraction checking module; respectively generating a checking result of the financial index formula, a checking result of the financial subject change and a checking result of the financial subject data and the corresponding reference data; and displaying all the checking results and the error correction results. The technical scheme provided by the embodiment of the invention can improve the efficiency of auditing the financial documents.

Description

Financial document information processing method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of document processing, in particular to a financial document information processing method and device, electronic equipment and a storage medium.
Background
A financial document is a text containing a large amount of unstructured financial data, such as an annual report or a prospectus, and consists mainly of text paragraphs, tabular data, picture data, and the like. For example: "In 2015, 2016 and 2017, the issuer's operating income was 23.31, 23.04 and 24.90 yen respectively, and the income level remained basically stable, of which net interest income was 20.35, 21.44 and 22.20 yen respectively; net fee and commission income was 1.03 million yuan, 0.89 million yuan and 0.86 million yuan respectively; and investment gains were 1.83, 0.52 and 1.82 billion dollars respectively."
Financial documents are long, and most auditors of financial documents spend their time on purely manual, repetitive checks with little technical content. The workload is heavy, and because there is so much content, omissions occur easily and efficiency is low.
Disclosure of Invention
The embodiment of the invention provides a financial document information processing method which can improve the efficiency of auditing financial documents.
In a first aspect, an embodiment of the present invention provides a method for processing financial document information, including:
generating document structured data of the financial document to be audited through a document processing module;
preprocessing the document structured data through a model and extracting financial subjects, inputting an extraction result into a data normalization module, and generating financial subject structured data based on the preprocessed data and the normalization result;
inputting the document structured data into a character error correction model, outputting an error correction result and storing the error correction result;
inputting the document structured data into a manager information spot check module, checking the information of the manager, and generating a check result of the manager information;
inputting the structured data of the financial subjects into a financial index formula calculation module to generate a check result of a financial index formula;
inputting the financial subject structured data into a financial subject change checking module, checking data related to financial subject change in the financial subject structured data, and generating a checking result of financial subject change;
inputting the financial subject structured data into a financial statement extraction and verification module to generate a verification result of the financial subject data and the corresponding reference data;
and displaying all the checking results and the error correction results.
In a second aspect, an embodiment of the present invention further provides a financial document information processing apparatus, including:
the document structured data generation module is used for generating document structured data of the financial document to be audited through the document processing module;
the financial subject structured data generation module is used for preprocessing the document structured data through a model and extracting financial subjects, inputting an extraction result into the data normalization module, and generating financial subject structured data based on the preprocessed data and the normalization result;
the error correction module is used for inputting the document structured data into a character error correction model, outputting an error correction result and storing the error correction result;
the first checking module is used for inputting the document structured data into the manager information spot check checking module, checking the information of the manager and generating a checking result of the manager information;
the second checking module is used for inputting the structured data of the financial subjects into the financial index formula calculation module to generate a checking result of the financial index formula;
the third checking module is used for inputting the financial subject structured data into the financial subject change checking module, checking the data related to financial subject change in the financial subject structured data, and generating a checking result of financial subject change;
the fourth checking module is used for inputting the financial subject structured data into the financial statement extraction checking module to generate a checking result of the financial subject data and the corresponding reference data;
and the display module is used for displaying all the verification results and the error correction results.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a memory for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for processing the financial document information according to the embodiment of the invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement a method for processing financial document information according to an embodiment of the present invention.
According to the technical scheme provided by the embodiment of the invention, a financial document is converted into document structured data, preprocessing and financial subject extraction are carried out through models, the extraction result is normalized, and financial subject structured data is generated based on the normalization result. The document structured data is input into the character error correction model and the manager information spot check module to obtain the error correction result and the check result of the manager information, respectively. The financial subject structured data is input into the financial index formula calculation module, the financial subject change checking module and the financial statement extraction checking module to obtain, respectively, the check result of the financial index formula, the check result of the financial subject change, and the check result of the financial subject data against the corresponding reference data. All check results and error correction results are then displayed. This can improve the efficiency of auditing financial documents, save labor cost, and achieve high error correction accuracy.
Drawings
FIG. 1a is a flow chart of a method for processing financial document information according to an embodiment of the present invention;
FIG. 1b is a flow chart of converting a financial document into a document structured data according to an embodiment of the present invention;
FIG. 2a is a flow chart of a method for processing financial document information according to an embodiment of the present invention;
FIG. 2b is a schematic diagram of a financial subject knowledge graph provided by an embodiment of the invention;
FIG. 2c is a flow chart of construction of a knowledge graph of financial subjects according to an embodiment of the present invention;
FIG. 2d is a schematic structural diagram of a BiLSTM-CRF model according to an embodiment of the present invention;
FIG. 3 is a block diagram of an apparatus for processing financial document information according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Fig. 1a is a flowchart of a method for processing financial document information according to an embodiment of the present invention, where the method may be performed by a financial document information processing apparatus, where the apparatus may be implemented by hardware and/or software, the apparatus may be configured in an electronic device such as a computer, a server, and the like, and the method may be applied in a scenario of auditing, extracting, and annotating financial documents.
As shown in fig. 1a, the technical solution provided by the embodiment of the present invention includes:
S110: generate document structured data of the financial document to be audited through a document processing module.
In the embodiment of the invention, the financial documents to be audited may include documents in formats such as PDF and Word.
Specifically, related software modules (for example, Python, Word2HTML, PyDocX, or other software modules) may be called to convert the Word document into an HTML file, and an HTML parsing module such as Beautiful Soup is then used to parse the HTML file and generate the document structured data. A Word document is semi-structured data: its paragraphs, tables and pictures need to be extracted and then assembled into structured data.
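The HTML parsing step can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard-library `html.parser` as a stand-in for Beautiful Soup, and the block layout (`type`, `text`, `rows`, `src`) is a hypothetical structure, not the one defined in Table 1.

```python
from html.parser import HTMLParser


class DocStructureParser(HTMLParser):
    """Collect paragraphs, tables and pictures from HTML in document order."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # structured output (hypothetical layout)
        self._rows = None      # rows of the table currently being read
        self._row = None       # cells of the row currently being read
        self._text = []        # text fragments of the current <p>/<td>/<th>
        self._in_text = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("p", "td", "th"):
            self._in_text = True
            self._text = []
        elif tag == "img":
            self.blocks.append({"type": "picture", "src": dict(attrs).get("src")})

    def handle_data(self, data):
        if self._in_text:
            self._text.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._text).strip()
        if tag in ("td", "th") and self._row is not None:
            self._row.append(text)
            self._in_text = False
        elif tag == "p":
            self._in_text = False
            # paragraphs inside a table cell are stored with the cell instead
            if self._rows is None and text:
                self.blocks.append({"type": "paragraph", "text": text})
        elif tag == "tr" and self._rows is not None and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.blocks.append({"type": "table", "rows": self._rows})
            self._rows = None


def html_to_structured(html):
    """Parse an HTML string into a list of paragraph/table/picture blocks."""
    parser = DocStructureParser()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Calling `html_to_structured` on the HTML converted from a Word document yields one entry per paragraph, table and picture, which later steps can consume in order.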
Alternatively, a Word-to-PDF software module is called to convert the Word document into a PDF file, and a PDF parsing module is called to parse the information of each character in the PDF file, generating an intermediate result of the document structured data. Because a PDF file has no concept of paragraphs and is essentially a sequence of text lines, the intermediate result is further processed by merging paragraphs that span pages, merging tables split across pages (according to the title and position information of the table), and merging scattered clauses (single lines of text), and the document structured data is finally generated. A PDF file usually contains headers and footers, which interfere with parsing paragraphs and tables that span pages. A footer is usually a page number; when it is joined to the last text of the page content, the number is parsed incorrectly. This information therefore needs to be removed when generating the document structured data of the PDF. FIG. 1b shows the flow of parsing a financial document (Word or PDF) to be audited and converting it into document structured data.
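A minimal sketch of the header/footer removal and line merging described above, assuming the PDF parsing module has already produced, for each page, a list of text lines from top to bottom. The heuristics (a first line repeated on more than half of the pages is a header; a last line made up only of digits is a page number; a paragraph ends at sentence-final punctuation) are illustrative assumptions, not rules given in the source.

```python
import re
from collections import Counter

# a line consisting only of an integer, optionally written like "- 3 -"
PAGE_NUMBER = re.compile(r"^\s*(?:-\s*)?\d+(?:\s*-)?\s*$")
SENTENCE_END = ("。", ".", "；", ";", "！", "!", "？", "?")


def strip_headers_footers(pages):
    """Drop repeated first lines (headers) and page-number last lines (footers)."""
    first_lines = Counter(page[0] for page in pages if page)
    headers = {line for line, n in first_lines.items() if n > len(pages) / 2}
    cleaned = []
    for page in pages:
        lines = list(page)
        if lines and lines[0] in headers:
            lines = lines[1:]
        if lines and PAGE_NUMBER.match(lines[-1]):
            lines = lines[:-1]
        cleaned.append(lines)
    return cleaned


def merge_lines(pages):
    """Join scattered lines into paragraphs, including across page breaks."""
    paragraphs, current = [], ""
    for page in pages:
        for line in page:
            line = line.strip()
            # joined with a space for the English example; Chinese text
            # would normally be joined without one
            current = f"{current} {line}" if current else line
            if current.endswith(SENTENCE_END):
                paragraphs.append(current)
                current = ""
    if current:
        paragraphs.append(current)
    return paragraphs
```

Removing the footer before merging matters: otherwise a page number such as "1" would be appended to the last sentence of the page and could be read as part of a financial figure.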
The document structured data may be in the form shown in table 1 below, or other similar data structures may be used:
TABLE 1
[Table 1 appears as an image in the original publication (image BDA0002294289520000041).]
S120: preprocess the document structured data through models and extract financial subjects, input the extraction result into a data normalization module, and generate financial subject structured data based on the preprocessed data and the normalization result.
In an implementation manner of the embodiment of the present invention, optionally, this step may include:
identifying table-type pictures in the document structured data through a table optical character recognition (OCR) model, and recognizing the tables in the pictures to obtain recognized document structured data;
inputting the recognized document structured data into a table classification model to obtain the main body corresponding to each table and the category of the table;
inputting the document structured data into a paraphrase table extraction model to obtain the report period and the reference relations of the issuer, and generating parsed structured data;
inputting the document structured data into a paragraph classification model to obtain the main body corresponding to each paragraph;
inputting the document structured data into a table financial subject extraction model, and extracting the financial subjects of the tables and the corresponding information;
inputting the document structured data into a text financial subject extraction model, and extracting the financial subjects of the text and the corresponding information;
and performing a normalization operation on the extracted financial subjects and the corresponding information according to the constructed financial subject knowledge graph and the parsed structured data, and generating the final financial subject structured data based on the normalized extraction result, the main body and the table category.
The extraction result may take the form shown in Table 2.
TABLE 2
[Table 2 appears as images in the original publication (images BDA0002294289520000051 and BDA0002294289520000061).]
The management's analysis and opinions may specifically be: based on the company's financial statements for the reporting period (mainly the pro forma consolidated financial statements), the company's management analyzes, on a pro forma consolidated basis, the asset-liability structure, cash flow, solvency, profitability, future development goals and the sustainability of profitability. As part of the asset-liability structure analysis, Table 3 shows the company's consolidated profit schedule for the last three years.
TABLE 3
[Table 3 appears as images in the original publication (images BDA0002294289520000062 and BDA0002294289520000071).]
The table OCR model is a trained model. By inputting the document structured data into the model, table-type pictures in the document structured data can be identified and the tables in those pictures recognized. Tables in Word or PDF financial documents are often pictures; such tables need to be recognized by the table OCR model and their data parsed, so as to improve the auditing capability.
The recognized document structured data contains both the tables recognized from pictures and the non-picture tables. The recognized document structured data is input into the table classification model to obtain the main body corresponding to each table and the category of the table. The main body may be the issuer, a subsidiary, a financer, or another party. The table category may be a balance sheet, a profit statement, a cash flow statement, or another table.
The document structured data is input into the paraphrase table extraction model to obtain the report period and the reference relations of the issuer, yielding the parsed structured data. For example, "report period" or "the last three years and one period" in the document structured data may refer to "2013, 2014, 2015 and January to September 2016".
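Once the paraphrase table has been extracted, resolving a defined term to concrete periods is a lookup. The sketch below assumes the extraction output is a plain dictionary from term to a list of periods; the function name and the dictionary contents are hypothetical.

```python
def resolve_period_terms(text, paraphrase_table):
    """Return the periods referred to by the defined terms that occur in text."""
    periods = []
    remaining = text
    # try longer terms first so a longer term is not shadowed by a shorter
    # term that it contains
    for term in sorted(paraphrase_table, key=len, reverse=True):
        if term in remaining:
            for period in paraphrase_table[term]:
                if period not in periods:
                    periods.append(period)
            remaining = remaining.replace(term, " ")
    return periods


# hypothetical paraphrase table extracted from a document
PARAPHRASE_TABLE = {
    "last three years and one period": ["2013", "2014", "2015", "January-September 2016"],
    "last three years": ["2013", "2014", "2015"],
}
```

With this table, a sentence mentioning "the last three years and one period" resolves to the four periods, so that later checks can attach each extracted figure to a concrete year or period.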
The table financial subject extraction model may include a financial subject word segmentation dictionary. Financial subjects are the specialized terms that describe financial data in financial documents, such as "operating income", "operating expenditure" and "current assets", and the dictionary can be constructed with expert knowledge from the financial field. The document structured data can be segmented into financial subjects using this dictionary, so that the financial subjects of the tables and the corresponding information are extracted through the table financial subject extraction model. The information corresponding to a financial subject includes its numerical value, unit, year, and the like. Alternatively, the document structured data and the financial subject word segmentation dictionary can both be input into the table financial subject extraction model to extract the financial subjects of the tables and the corresponding information.
Part of the constructed financial subject word segmentation dictionary is shown in Table 4:
TABLE 4
[Table 4 appears as images in the original publication (images BDA0002294289520000072 and BDA0002294289520000081).]
The text financial subject extraction model may likewise include a financial subject word segmentation dictionary. The document structured data can be segmented into financial subjects using this dictionary, so that the financial subjects of the text and the corresponding information are extracted through the text financial subject extraction model. Alternatively, the document structured data and the financial subject word segmentation dictionary can both be input into the text financial subject extraction model to extract the financial subjects of the text and the corresponding information.
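To illustrate how a word segmentation dictionary can drive extraction, the sketch below matches dictionary entries in a sentence (longest entry first) and attaches the next value and unit that follow each match. It is a deliberately naive, rule-based stand-in: the source uses trained extraction models, and the dictionary entries, unit list and output fields here are assumptions.

```python
import re

# hypothetical dictionary entries; a real dictionary is built by domain experts
SUBJECT_DICT = ["operating income", "net interest income", "interest income"]

# a number followed by one of a few assumed unit phrases
VALUE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(hundred million yuan|ten thousand yuan|million yuan|yuan)"
)


def extract_subjects(sentence, year):
    """Return one record per dictionary subject found in the sentence."""
    lowered = sentence.lower()
    occupied = [False] * len(lowered)
    hits = []
    # longest entries first, so "net interest income" is not split into
    # "interest income"
    for subject in sorted(SUBJECT_DICT, key=len, reverse=True):
        start = lowered.find(subject)
        while start != -1:
            end = start + len(subject)
            if not any(occupied[start:end]):
                occupied[start:end] = [True] * (end - start)
                hits.append((start, end, subject))
            start = lowered.find(subject, end)
    records = []
    for start, end, subject in sorted(hits):
        match = VALUE.search(sentence, end)  # first value after the subject
        if match:
            records.append({
                "subject": subject,
                "year": year,
                "value": float(match.group(1)),
                "unit": match.group(2),
            })
    return records
```

Taking the first value after each subject breaks down on sentences that list several years for one subject; handling that case is exactly what the trained extraction model is for.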
The normalization operation performed on the extracted financial subjects and the corresponding information may specifically be: normalizing certain financial subjects in the parsed structured data according to the constructed financial subject knowledge graph, so as to generate the final financial subject structured data. For example, "2017 months 1-9" may be normalized to "2017 first three quarters", and "restricted monetary funds", "restricted use monetary funds" and similar expressions may all be normalized to "restricted monetary funds".
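The two normalizations in the example can be sketched with a synonym map and a period pattern. The alias map stands in for the financial subject knowledge graph, and the period labels and input format are assumptions chosen to match the example in the text.

```python
import re

# stand-in for knowledge-graph synonyms: alias -> canonical subject (hypothetical)
SUBJECT_ALIASES = {
    "restricted use monetary funds": "restricted monetary funds",
    "restricted monetary funds": "restricted monetary funds",
}

# (start month, end month) -> period label (assumed wording)
PERIOD_LABELS = {
    (1, 3): "first quarter",
    (1, 6): "half year",
    (1, 9): "first three quarters",
    (1, 12): "full year",
}

PERIOD = re.compile(r"^(\d{4}) months (\d{1,2})-(\d{1,2})$")


def normalize_subject(name):
    """Map a subject name to its canonical form; unknown names pass through."""
    key = " ".join(name.lower().split())
    return SUBJECT_ALIASES.get(key, key)


def normalize_period(text):
    """Turn '2017 months 1-9' into '2017 first three quarters' when possible."""
    text = text.strip()
    match = PERIOD.match(text)
    if not match:
        return text
    label = PERIOD_LABELS.get((int(match.group(2)), int(match.group(3))))
    return f"{match.group(1)} {label}" if label else text
```

After normalization, two mentions that use different wording for the same subject and period share one key, which is what the later consistency checks group on.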
S130: input the document structured data into a character error correction model, output an error correction result, and store the error correction result.
In the embodiment of the invention, the character error correction model is used to identify and output errors in the document structured data. Before this step, the method may further include training the character error correction model using correctly expressed paragraphs in the document structured data as the corpus.
S140: input the document structured data into a manager information spot check module, check the information of the managers, and generate a check result of the manager information.
In an embodiment of the present invention, the managers include directors, supervisors and senior management personnel, among others. Specifically, the document structured data is input into the manager information spot check module, and the information check results for the directors, supervisors and senior managers are generated. Taking age as an example, the method specifically comprises: identifying the cut-off year for calculating age, and identifying the name and age information in tables with a named entity recognition model; identifying the age information of the directors, supervisors and senior managers with a sequence labeling model, and generating structured data of that age information; and, based on the cut-off year, the normalized age and the birth year and month of each manager, comparing the identified age with the calculated age to generate the age information check result for the directors, supervisors and senior managers.
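The final comparison in the age check can be sketched as below, assuming the entity recognition and sequence labeling steps have already produced one record per manager. The record fields and the default cut-off month of December are assumptions.

```python
def expected_age(birth_year, birth_month, cutoff_year, cutoff_month=12):
    """Age in whole years reached by the cut-off year and month."""
    age = cutoff_year - birth_year
    if cutoff_month < birth_month:
        age -= 1  # birthday not yet reached in the cut-off year
    return age


def check_ages(managers, cutoff_year, cutoff_month=12):
    """Return one finding per manager whose stated age differs from the calculated age."""
    findings = []
    for manager in managers:
        calculated = expected_age(
            manager["birth_year"], manager["birth_month"], cutoff_year, cutoff_month
        )
        if calculated != manager["stated_age"]:
            findings.append({
                "name": manager["name"],
                "stated_age": manager["stated_age"],
                "calculated_age": calculated,
            })
    return findings
```

Only mismatches are returned, which matches the idea of surfacing inconsistent items to the auditor rather than every record.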
S150: input the financial subject structured data into a financial index formula calculation module to generate a check result of the financial index formula.
In an implementation manner of the embodiment of the present invention, optionally, this step may include: substituting the financial subject structured data into a pre-configured financial index formula separately for each main body and year, calculating the result, comparing the result for the target financial subject with the reference data of the target financial subject, and storing inconsistent financial subjects in the check result.
For example, the pre-configured financial index formula may be: "operating gross profit rate = (operating income - operating cost) / operating income". The operating income and operating cost whose main body is the issuer and whose year is 2018 are taken from the structured data of the balance sheet, the profit statement and the cash flow statement and substituted into the pre-configured formula to calculate the operating gross profit rate, which serves as the reference data. If these three statements are not available in the document, the data is obtained in other ways, or the financial index formula is not verified. The operating gross profit rate is then calculated with the same formula from the operating income and operating cost in the financial subject structured data whose main body is "issuer" and whose year is "2018", and compared with the reference data; if the two are inconsistent, the operating gross profit rate is stored in the check result.
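The gross profit rate check above can be sketched as follows. The function and field names are hypothetical, and the comparison tolerance is an assumption; the source only says the two values are compared.

```python
def gross_profit_rate(operating_income, operating_cost):
    """operating gross profit rate = (operating income - operating cost) / operating income"""
    return (operating_income - operating_cost) / operating_income


def check_gross_profit_rate(statement, extracted, tolerance=1e-4):
    """Compare the rate from the statements with the rate from the extracted data.

    Returns None when the formula cannot be verified (missing or zero inputs).
    """
    for source in (statement, extracted):
        income = source.get("operating_income")
        cost = source.get("operating_cost")
        if income is None or cost is None or income == 0:
            return None
    reference = gross_profit_rate(statement["operating_income"], statement["operating_cost"])
    calculated = gross_profit_rate(extracted["operating_income"], extracted["operating_cost"])
    return {
        "reference": reference,
        "calculated": calculated,
        "consistent": abs(reference - calculated) <= tolerance,
    }
```

Running this once per (main body, year) pair mirrors the per-body, per-year substitution described in the text; only results with `consistent` set to `False` would be stored in the check result.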
S160: input the financial subject structured data into a financial subject change checking module, check the data related to financial subject changes in the financial subject structured data, and generate a check result of the financial subject change.
In an implementation manner of the embodiment of the present invention, optionally, this step may include: inputting the financial subject structured data into the financial subject change checking module, and generating a data structure of financial subject increase and decrease information based on the descriptions related to financial subject changes in the financial subject structured data; and comparing the change value or change rate of the financial subject in the data structure with the reference data of the change value or change rate to generate a check result of the change value or change rate of the financial subject.
The reference data of the change value or change rate is obtained from the related subject data in the balance sheet, the profit statement and the cash flow statement. For example, suppose the description related to a financial subject change in the financial subject structured data is "the issuer's monetary funds increased in size year by year, and in 2015 increased by 100.00 ten thousand yuan over the previous year"; a data structure of financial subject increase and decrease information can be generated from this description. The issuer's monetary funds for 2015 and for 2014 are extracted from the balance sheet, the profit statement and the cash flow statement, numerically normalized to the largest unit, and subtracted to obtain the reference data of the change value. The reference data is then compared with the 100.00 ten thousand yuan in the data structure of the financial subject increase and decrease information. If the two are inconsistent, the description at that position of the financial subject structured data is wrong, and the comparison result is stored.
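A minimal sketch of the change value check, under stated assumptions: values are kept as decimal strings with a unit, all amounts are converted to yuan before subtraction (the source normalizes to the largest unit, which gives the same comparison result), and the record layout is hypothetical.

```python
from decimal import Decimal

# assumed unit phrases and their size in yuan
UNIT_FACTORS = {
    "yuan": Decimal(1),
    "ten thousand yuan": Decimal(10_000),
    "hundred million yuan": Decimal(100_000_000),
}


def to_yuan(value, unit):
    """Convert a decimal string in the given unit to an exact amount in yuan."""
    return Decimal(value) * UNIT_FACTORS[unit]


def check_change(description, statements):
    """Compare a described year-on-year change with the change in the statements.

    statements maps (main_body, subject, year) to a (value, unit) pair.
    Returns None when either year is missing from the statements.
    """
    key = (description["main_body"], description["subject"])
    year = description["year"]
    current = statements.get(key + (year,))
    previous = statements.get(key + (year - 1,))
    if current is None or previous is None:
        return None
    reference = to_yuan(*current) - to_yuan(*previous)
    stated = to_yuan(description["change"], description["unit"])
    return {
        "reference_change": reference,
        "stated_change": stated,
        "consistent": reference == stated,
    }
```

`Decimal` is used instead of `float` so that amounts such as 100.00 ten thousand yuan compare exactly after unit conversion.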
The data structure of the financial subject increase/decrease information is shown in table 5 (not limited to the following data structure):
TABLE 5
[Table 5 appears as an image in the original publication (image BDA0002294289520000101).]
S170: input the financial subject structured data into a financial statement extraction and verification module to generate a check result of the financial subject data and the corresponding reference data.
In an implementation manner of the embodiment of the present invention, optionally, this step may include: inputting the financial subject structured data into the financial statement extraction and verification module, and grouping the financial subject structured data by main body, year and financial subject; if a target group contains several financial data items, normalizing each of them; and comparing each normalized financial data item with the corresponding reference data, and taking the comparison result as the check data. The comparison also requires unit normalization, that is, the units of the financial data and of the corresponding reference data are unified.
The reference data corresponding to the financial data can be derived from the structured data of the balance sheet, the profit statement and the cash flow statement. Normalizing the financial data means unifying their units, for example by normalizing to the largest unit among all the financial data: if the largest unit is ten thousand yuan, each financial data item is converted into ten thousand yuan before the values are compared. If a normalized financial data item is inconsistent with the corresponding reference data, the descriptions of that financial subject for that year in the financial document are inconsistent, and a financial subject data consistency check result is generated.
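The grouping and unit normalization can be sketched as follows. The unit table, record fields and result layout are assumptions (the actual result structure is the one shown in Table 6); the comparison is exact, whereas a production check would probably need a rounding rule that the source does not specify.

```python
from collections import defaultdict
from decimal import Decimal

# assumed unit phrases and their size in yuan
UNIT_FACTORS = {
    "yuan": Decimal(1),
    "ten thousand yuan": Decimal(10_000),
    "hundred million yuan": Decimal(100_000_000),
}


def check_consistency(mentions, reference):
    """Compare every mention of a subject with the statement value.

    mentions: records with main_body, year, subject, value (string), unit.
    reference: maps (main_body, year, subject) to a (value, unit) pair.
    """
    groups = defaultdict(list)
    for mention in mentions:
        groups[(mention["main_body"], mention["year"], mention["subject"])].append(mention)

    results = []
    for key, items in groups.items():
        if key not in reference:
            continue  # no reference data, nothing to compare against
        ref_value, ref_unit = reference[key]
        # normalize everything to the largest unit seen in this group
        units = [m["unit"] for m in items] + [ref_unit]
        largest = max(units, key=UNIT_FACTORS.__getitem__)
        scale = UNIT_FACTORS[largest]
        ref_norm = Decimal(ref_value) * UNIT_FACTORS[ref_unit] / scale
        for m in items:
            norm = Decimal(m["value"]) * UNIT_FACTORS[m["unit"]] / scale
            results.append({
                "key": key,
                "unit": largest,
                "value": norm,
                "reference": ref_norm,
                "consistent": norm == ref_norm,
                "offset": m.get("offset"),
            })
    return results
```

Carrying an `offset` through to the result lets a front end locate the inconsistent figure in the original document.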
The data structure of the result of the consistency check of the financial subject data is shown in table 6 (not limited to the following data structure):
TABLE 6
Figure BDA0002294289520000111
S180: displaying all the verification results and the error correction results.
In this embodiment of the present invention, optionally, displaying all the verification results and the error correction results includes: displaying the verification results and the error correction results, and marking in the financial document the data that failed verification and the content of the error correction results. The verification results and the error correction results can be displayed through a Web front end. The offset or the occurrence count of each data item can be obtained from the structured data of the verification results; the Web front end can then locate the characters to be highlighted or annotated in the financial document by the offset, or locate the N-th occurrence of the characters in the financial document by the occurrence count N, and highlight or annotate them.
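Locating the text to highlight, either by offset or by occurrence count N, can be sketched as follows. The patent performs this in a Web front end; Python is used here only to show the lookup logic, and the record keys (`text`, `offset`, `occurrence`) are hypothetical.

```python
def find_nth(text, needle, n):
    """Return the start offset of the n-th (1-based) occurrence of needle, or -1."""
    if n < 1 or not needle:
        return -1
    pos = -1
    for _ in range(n):
        pos = text.find(needle, pos + 1)
        if pos == -1:
            return -1
    return pos


def highlight_span(text, item):
    """Return (start, end) of the characters to highlight, or None if not found.

    `item` is a hypothetical result record carrying the matched text and
    either an 'offset' or an 'occurrence' count.
    """
    if item.get("offset") is not None:
        start = item["offset"]
    else:
        start = find_nth(text, item["text"], item["occurrence"])
    end = start + len(item["text"])
    # Guard against stale offsets: the span must still contain the text.
    if start < 0 or text[start:end] != item["text"]:
        return None
    return (start, end)
```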
In the related art, PDF documents usually contain headers and footers, which interfere with parsing paragraphs and tables that span pages. A footer is usually a page number; if it is concatenated with the last characters of the page content, numbers are misread, so headers and footers need to be removed when the structured data of a PDF is generated. A Word document is semi-structured data, and the paragraphs, tables and pictures in it need to be extracted to form document structured data. The embodiment of the invention processes the financial documents through the document processing module to generate document structured data.
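One simple way to remove headers and page-number footers before assembling structured data is sketched below. This heuristic is an illustrative assumption, not the method of the document processing module: it treats a first line repeated across most pages as a header and a last line that is either a bare number or similarly repeated as a footer.

```python
import re
from collections import Counter

# Matches bare page numbers such as "12" or "- 12 -".
PAGE_NUMBER = re.compile(r"^\s*(?:-\s*)?\d+(?:\s*-)?\s*$")


def strip_headers_footers(pages, min_share=0.6):
    """pages: list of pages, each page a list of text lines (top to bottom)."""
    first_lines = Counter(page[0].strip() for page in pages if page)
    last_lines = Counter(page[-1].strip() for page in pages if page)
    threshold = max(2, int(len(pages) * min_share))

    cleaned = []
    for page in pages:
        lines = list(page)
        # Header: the same first line appears on most pages.
        if lines and first_lines[lines[0].strip()] >= threshold:
            lines = lines[1:]
        # Footer: a bare page number, or the same last line on most pages.
        if lines and (PAGE_NUMBER.match(lines[-1])
                      or last_lines[lines[-1].strip()] >= threshold):
            lines = lines[:-1]
        cleaned.append(lines)
    return cleaned
```

A page whose last content line is itself a bare number would be truncated by this rule, so a production parser would also need layout information such as the line's position on the page.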
In the related technology, picture-type tables in Word and PDF documents are recognized by a table OCR model and the data in the tables is parsed, which improves the auditing capability of the system.
The first step in processing the document structured data is information extraction, namely extracting the financial subject structured data from the document structured data. A piece of financial data is a quadruple comprising time, company name and financial subject information. In the related technology, elements are generally extracted through keyword matching and regular expressions. On the one hand, manually defined keywords and regular expressions cannot cover all element types; on the other hand, when one piece of document structured data describes the financial conditions of several companies, and one sentence contains several index words, several times and other elements, it is difficult to determine accurately how company names, times and financial subjects combine, which lowers both accuracy and recall. The plurality of models provided by the embodiment of the invention preprocess the document structured data and extract the financial subjects, thereby generating the financial subject structured data. Because extraction is based on the financial subject knowledge graph, a more comprehensive set of financial subjects can be extracted; and through the processing of the plurality of models, the main body corresponding to each paragraph and each table, as well as the correspondence between main bodies and financial subjects, can be obtained to form the financial subject structured data.
In the related technology, several problems arise with tables. First, table structures in financial documents are complex, with cells merged across rows, cells merged across columns, paged tables and tables continued across pages; complex tables such as borderless tables in PDF are difficult to process, table elements are easily mismatched, and it is difficult to find the financial subject that corresponds to a number. Here a paged table means that a continuation table is added on another page to express a table that cannot be fully expressed on one page, and a table continued across pages means a table that spans two pages in Word or PDF. Second, the table title is an important part of the table content: it may describe the main object of the table or qualify the object the table describes, so ignoring the information in the table title leaves the semantic understanding of the table incomplete. Finally, it must be determined whether a given table is the financial statement of a subsidiary or the financial statement of the issuer. According to the embodiment of the invention, the document structured data is input into the table classification model, so that the main body corresponding to each table and the category of the table can be identified, and tables are recognized well.
In a financial document, a financial index may have several aliases and corresponding variants. For example, an alias of "receivable turnover rate" is "turnover number of receivable funds", an alias of "EBITDA" is "earnings before interest, taxes, depreciation and amortization", and "other receivable accounts generated by the business" may be expressed as "the amount of other receivable accounts classified as business money". Mapping different aliases to the same index word is called normalization. In the related technology, the set of financial indexes is incomplete and the disambiguation mapping of financial indexes is missing, so financial data comparison lacks sufficient data support and its effect is not ideal. According to the embodiment of the invention, the extracted financial subjects are normalized through the financial subject knowledge graph, so that financial subjects with different expressions but the same meaning are unified, which strengthens the comparison of financial data.
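Alias normalization can be sketched with a small lookup built from a canonical-name-to-aliases mapping. The dictionary below is a toy stand-in for the financial subject knowledge graph; its structure and the helper names are assumptions, and only the example entries come from the text.

```python
# Toy stand-in for the knowledge graph: canonical subject -> known aliases.
KNOWLEDGE_GRAPH = {
    "receivable turnover rate": ["turnover number of receivable funds"],
    "EBITDA": ["earnings before interest, taxes, depreciation and amortization"],
}


def _key(name):
    """Case-insensitive, whitespace-insensitive lookup key."""
    return " ".join(name.lower().split())


def build_alias_index(graph):
    """Flatten the graph into {lookup key: canonical subject}."""
    index = {}
    for canonical, aliases in graph.items():
        index[_key(canonical)] = canonical
        for alias in aliases:
            index[_key(alias)] = canonical
    return index


ALIAS_INDEX = build_alias_index(KNOWLEDGE_GRAPH)


def normalize_subject(name, index=ALIAS_INDEX):
    """Map an extracted subject to its canonical form; unknown names pass through."""
    return index.get(_key(name), name)
```

Returning unknown names unchanged keeps unmatched subjects visible downstream instead of silently dropping them.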
Character error correction involves many kinds of errors, such as extra characters, missing characters and homophone errors, and it is difficult to guarantee high accuracy and recall when financial documents are checked. According to the embodiment of the invention, the document structured data is input into the character error correction model, so that an error correction result with higher accuracy is obtained.
According to the technical scheme provided by the embodiment of the invention, financial documents are converted into document structured data, preprocessing and financial subject extraction are carried out through models, the extraction result is normalized, and financial subject structured data is generated based on the normalization result. The document structured data is input into the character error correction model and into the manager information spot check verification module to obtain an error correction result and a verification result of the manager information. The financial subject structured data is input into the financial index formula calculation module, the financial subject change checking module and the financial statement extraction and verification module to obtain, respectively, a verification result of the financial index formulas, a verification result of financial subject changes, and a verification result of the financial subject data against the corresponding reference data. All the verification results and error correction results are displayed, which improves the auditing efficiency of financial documents, saves labor cost, and achieves high error correction accuracy.
Fig. 2a is a flowchart of a financial document information processing method according to an embodiment of the present invention, and as shown in fig. 2a, a technical solution according to an embodiment of the present invention includes:
S210: configuring a financial index formula.
The financial index formula can be configured in an Excel file, as shown in Table 7; the items used by the formulas in Table 7 are explained in detail in Table 8.
TABLE 7
Figure BDA0002294289520000131
TABLE 8
id name target_id
A1_L1 Gross rate of business A1
A1_R1 Income of business A1
A1_R2 Cost of business A1
The values of some financial subjects are calculated from the values of other financial subjects, for example "EBITDA = total profit + interest expense + depreciation of fixed assets + amortization of intangible assets", and financial index formulas change as policies change. In addition, formulas are involved in the calculation of table totals, for example "total liabilities and owner's equity = total liabilities + total owner's equity". In the embodiment of the present invention, financial index formulas can be customized by configuring them in an Excel file.
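How a configured formula might be checked against extracted values can be sketched in Python. The in-memory `FORMULAS` dictionary stands in for the Excel configuration, only additive formulas are handled, and the data layout and tolerance are assumptions made for illustration.

```python
# Stand-in for the Excel configuration: target subject -> subjects that are summed.
FORMULAS = {
    "EBITDA": [
        "total profit",
        "interest expense",
        "fixed asset depreciation",
        "intangible asset amortization",
    ],
}


def check_formulas(values, formulas=FORMULAS, tolerance=0.01):
    """Recompute each target per (entity, year) and report mismatches.

    values: {(entity, year): {subject: number}}, all in the same unit.
    """
    mismatches = []
    for (entity, year), subjects in values.items():
        for target, operands in formulas.items():
            # Skip formulas whose target or operands were not extracted.
            if target not in subjects or any(op not in subjects for op in operands):
                continue
            computed = sum(subjects[op] for op in operands)
            if abs(computed - subjects[target]) > tolerance:
                mismatches.append({
                    "entity": entity,
                    "year": year,
                    "subject": target,
                    "computed": computed,
                    "stated": subjects[target],
                })
    return mismatches
```

Keeping formulas as data rather than code mirrors the motivation in the text: when a policy changes a formula, only the configuration needs to be edited.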
S220: generating document structured data of the financial document to be audited through a document processing module.
S230: performing natural language processing on other document structured data to construct a financial subject knowledge graph.
In the embodiment of the invention, natural language processing is carried out through a large amount of document structured data, and a knowledge graph containing knowledge of financial subjects, financial subject aliases and the like is constructed, so that various expressions of the same financial subject in the text are unified into the same expression mode. The simplified mode of the financial subject knowledge graph can refer to the form shown in fig. 2b, and the construction process of the financial subject knowledge graph can refer to fig. 2 c.
S240: generating training corpora for the machine learning models from the document structured data by manual labeling and rule-based extraction, and constructing a plurality of machine learning models according to the corpora and expert knowledge.
In the embodiment of the invention, the paraphrase table extraction model is trained by taking each line of text in the two-dimensional array of the paraphrase table, together with the preceding-context information of the paraphrase table, as input, and the reference relationships in the paraphrase table as output. The paraphrase table can refer to Table 9. A reference relationship is, for example, that "report period, last three years and first period" refers to "2013, 2014, 2015 and January to September 2016". The paraphrase table extraction model can be a deep learning model such as BiLSTM-CRF or BERT. The structure of the BiLSTM-CRF model can be referred to in Fig. 2d.
TABLE 9
Figure BDA0002294289520000141
Figure BDA0002294289520000151
The table classification model is trained by taking the preceding-context information of each table in the document structured data corresponding to the financial documents in the training set, together with each line of text in the two-dimensional data of the table, as input, and the category of the table and the main body corresponding to the table as output. The preceding-context information of a table includes the titles (first-level, second-level, third-level titles, etc.) corresponding to the table, table name information, and the like. The categories of tables include balance sheet, profit statement, cash flow statement, and other tables. The table classification model can be a bidirectional long short-term memory (Bi-LSTM) model or a support vector machine (SVM) model.
The paragraph classification model is trained by taking the paragraphs of the document structured data corresponding to the financial documents in the training set, together with the preceding-context information of the paragraphs, as input, and the main bodies corresponding to the paragraphs as output. The preceding-context information of a paragraph includes the titles (first-level, second-level, third-level titles, etc.) corresponding to the paragraph. Because the financial data described in a paragraph may belong to a subsidiary or to another company, the main body to which the paragraph corresponds must be distinguished.
The text financial subject extraction model is trained by taking the financial subject description paragraphs in the document structured data as input and the information corresponding to the financial subjects as output. The information corresponding to a financial subject includes its numerical value, unit, year, and the like. The text financial subject extraction model can be a deep learning model such as bidirectional long short-term memory with a conditional random field (BiLSTM-CRF) or BERT.
The character error correction model is trained by taking correctly expressed paragraphs in the document structured data as corpora. The character error correction model may be a deep learning model such as BERT (Bidirectional Encoder Representations from Transformers).
The table OCR model is trained by taking the table-type pictures in the document structured data as corpora. The table OCR model may be a PSENet + CRNN deep learning model.
It should be noted that the corpora described in the embodiment of the present invention are all pre-labeled corpora.
S250: preprocessing the document structured data through a model and extracting the financial subjects, inputting the extraction result into a data normalization module, and generating the financial subject structured data based on the preprocessed data and the normalization result.
S260: inputting the document structured data into a character error correction model, outputting an error correction result and storing the error correction result.
S270: inputting the document structured data into a manager information spot check module, checking the information of the manager, and generating a verification result of the manager information.
S280: inputting the financial subject structured data into a financial index formula calculation module to generate a verification result of the financial index formula.
S290: inputting the financial subject structured data into a financial subject change checking module, checking data related to financial subject change in the financial subject structured data, and generating a verification result of financial subject change.
S291: inputting the financial subject structured data into a financial statement extraction and verification module to generate a verification result of the financial subject data against the corresponding reference data.
S292: displaying all the verification results and the error correction results.
S220, S250-S292 can be seen in detail in the description of the above embodiments.
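The data flow of steps S220 and S250 to S292 can be summarized in a short Python sketch. Every callable in `modules` is a hypothetical placeholder for the corresponding module or model; the sketch shows only which intermediate data each step consumes, not how any module works internally.

```python
def run_audit(financial_document, modules):
    """Run the checks in the order of the steps above.

    modules: dict of hypothetical callables, one per module named in the method.
    """
    # S220: financial document -> document structured data
    doc_data = modules["document_processing"](financial_document)
    # S250: document structured data -> financial subject structured data
    subject_data = modules["subject_extraction"](doc_data)

    return {
        # S260 and S270 consume the document structured data.
        "error_correction": modules["text_error_correction"](doc_data),
        "manager_info": modules["manager_info_check"](doc_data),
        # S280, S290 and S291 consume the financial subject structured data.
        "index_formula": modules["index_formula_check"](subject_data),
        "subject_change": modules["subject_change_check"](subject_data),
        "statement_consistency": modules["statement_check"](subject_data),
    }
```

The returned dictionary corresponds to the collection of verification and error correction results that S292 displays.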
Fig. 3 is a block diagram of a financial document information processing apparatus according to an embodiment of the present invention, and as shown in fig. 3, the apparatus according to the embodiment of the present invention includes: the system comprises a document structured data generation module 310, a financial subject structured data generation module 320, an error correction module 330, a first verification module 340, a second verification module 350, a third verification module 360, a fourth verification module 370 and a presentation module 380.
The document structured data generation module 310 is configured to generate document structured data for the financial document to be audited through the document processing module;
the financial subject structured data generation module 320 is configured to perform preprocessing and financial subject extraction on the document structured data through a model, input an extraction result into the data normalization module, and generate financial subject structured data based on the preprocessed data and the normalization result;
the error correction module 330 is configured to input the document structured data into a text error correction model, output an error correction result, and store the error correction result;
the first checking module 340 is configured to input the document structured data to the administrator information spot check checking module, check the information of the administrator, and generate a checking result of the administrator information;
the second checking module 350 is configured to input the structured data of the financial subjects to the financial index formula calculation module, and generate a checking result of the financial index formula;
the third checking module 360 is configured to input the financial subject structured data into a financial subject change checking module, check data related to financial subject change in the financial subject structured data, and generate a checking result of financial subject change;
the fourth checking module 370, configured to input the structured financial subject data into the financial statement extraction checking module, and generate a checking result of the financial subject data and the corresponding reference data;
and a display module 380, configured to display all the verification results and the error correction results.
Optionally, the financial subject structured data generating module 320 is configured to:
identifying a table type picture in the document structured data through a table optical character recognition OCR model, and identifying a table in the picture to obtain identified document structured data;
inputting the identified document structured data into a form classification model to obtain a main body corresponding to each form and the category of the form;
inputting the document structured data into a paraphrase table extraction model to obtain a report period and a reference relation of a publisher, and generating analyzed structured data;
inputting the document structured data into a paragraph classification model to obtain a main body corresponding to each paragraph;
inputting the document structured data into a form financial subject extraction model, and extracting the financial subjects and corresponding information of the form;
inputting the document structured data into a text financial subject extraction model, and extracting the text financial subjects and corresponding information;
and performing normalization operation on the extracted financial subjects and the corresponding information according to the constructed financial subject knowledge graph and the analyzed structured data, and generating final financial subject structured data based on the normalized extraction result, the main body and the form type.
Optionally, the apparatus further comprises a training module, configured to:
taking each line of text paragraphs in the two-dimensional array in the paraphrase table and the above information in the paraphrase table as input, taking the reference relation in the paraphrase table as output, and training the paraphrase table extraction model;
training a table classification model by taking table upper information in document structured data corresponding to financial documents in a training set and each line of text segment in two-dimensional data of the table as input and taking the category of the table and a main body corresponding to the table as output;
training a paragraph classification model by taking paragraphs of document structured data corresponding to financial documents in a training set and the above information of the paragraphs as input and taking main bodies corresponding to the paragraphs as output;
training a text financial subject extraction model by taking a financial subject description paragraph in document structured data corresponding to financial documents in a training set as input and information corresponding to financial subjects as output;
training a form financial subject extraction model by taking a financial subject description paragraph in document structured data corresponding to the financial documents in the training set as input and information corresponding to financial subjects as output;
training a character error correction model by using paragraphs which are correctly expressed in document structured data as linguistic data;
and training a table OCR model by taking the table pictures in the document structured data corresponding to the financial documents in the training set as corpora.
Optionally, the second check module 350 is configured to:
and substituting the financial subject structured data, separately for each main body and each year, into a pre-configured financial index formula, calculating the result, comparing the result with the reference data of the target financial subject to which the formula corresponds, and storing the inconsistent financial subjects in the verification result.
Optionally, the third checking module 360 is configured to:
inputting the financial subject structured data into a financial subject change checking module, and generating a data structure of financial subject increase and decrease information based on the description related to financial subject change in the financial subject structured data;
and comparing the change value/change rate of the financial subjects in the data structure with the reference data of the change value/change rate to generate a financial subject change value/change rate verification result.
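The comparison of a stated change value or change rate with recomputed figures can be sketched as follows. The function signature, the tolerances, and the convention that a rate is given as a fraction (0.25 for 25%) are assumptions made for illustration.

```python
def check_change(previous, current, stated_change=None, stated_rate=None,
                 value_tol=0.01, rate_tol=0.0005):
    """Recompute change value and change rate and compare with stated figures.

    Returns a list of mismatch records; an empty list means the description
    is consistent with the period values.
    """
    mismatches = []
    change = current - previous
    if stated_change is not None and abs(change - stated_change) > value_tol:
        mismatches.append(
            {"field": "change_value", "computed": change, "stated": stated_change})
    # The rate is undefined when the previous value is zero, so it is skipped.
    if stated_rate is not None and previous != 0:
        rate = change / abs(previous)
        if abs(rate - stated_rate) > rate_tol:
            mismatches.append(
                {"field": "change_rate", "computed": rate, "stated": stated_rate})
    return mismatches
```

For instance, a statement that a subject rose from 800 to 1000, "an increase of 200, or 20%", yields one mismatch on the change rate, since the recomputed rate is 25%.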
Optionally, the fourth checking module 370 is configured to:
inputting the financial subject structured data into a financial statement extraction and verification module, and classifying the financial subject structured data according to the same main body, the same year and the same financial subject;
if a plurality of financial data exist in the target category, normalizing each financial data;
and comparing each normalized financial data with corresponding reference data, and taking the comparison result as a verification result.
Optionally, the presentation module 380 is configured to:
and displaying the verification result and the error correction result, and identifying the data which are inconsistent in the verification result and the content in the error correction result in the financial document.
The device can execute the method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 4, the electronic device includes:
one or more processors 410, one processor 410 being illustrated in FIG. 4;
a memory 420;
the apparatus may further include: an input device 430 and an output device 440.
The processor 410, the memory 420, the input device 430 and the output device 440 of the apparatus may be connected by a bus or by other means; connection by a bus is taken as an example in Fig. 4.
The memory 420 serves as a non-transitory computer-readable storage medium, and may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to a financial document information processing method in an embodiment of the present invention (for example, the document structured data generation module 310, the financial subject structured data generation module 320, the error correction module 330, the first verification module 340, the second verification module 350, the third verification module 360, the fourth verification module 370, and the presentation module 380 shown in fig. 3). The processor 410 executes various functional applications and data processing of the computer device by running the software programs, instructions and modules stored in the memory 420, namely, a financial document information processing method of the above method embodiment is realized, that is:
generating document structured data of the financial document to be audited through a document processing module;
preprocessing the document structured data through a model and extracting financial subjects, inputting an extraction result into a data normalization module, and generating financial subject structured data based on the preprocessed data and the normalization result;
inputting the document structured data into a character error correction model, outputting an error correction result and storing the error correction result;
inputting the document structured data into a manager information spot check module, checking the information of the manager, and generating a check result of the manager information;
inputting the structured data of the financial subjects into a financial index formula calculation module to generate a check result of a financial index formula;
inputting the financial subject structured data into a financial subject change checking module, checking data related to financial subject change in the financial subject structured data, and generating a checking result of financial subject change;
inputting the financial subject structured data into a financial statement extraction and verification module to generate a verification result of the financial subject data and the corresponding reference data;
and displaying all the checking results and the error correction results.
The memory 420 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 420 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 420 may optionally include memory located remotely from processor 410, which may be connected to the terminal device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus. The output device 440 may include a display device such as a display screen.
The embodiment of the invention provides a computer readable storage medium, which stores a computer program, and the program is executed by a processor to realize a financial document information processing method provided by the embodiment of the invention:
generating document structured data of the financial document to be audited through a document processing module;
preprocessing the document structured data through a model and extracting financial subjects, inputting an extraction result into a data normalization module, and generating financial subject structured data based on the preprocessed data and the normalization result;
inputting the document structured data into a character error correction model, outputting an error correction result and storing the error correction result;
inputting the document structured data into a manager information spot check module, checking the information of the manager, and generating a check result of the manager information;
inputting the structured data of the financial subjects into a financial index formula calculation module to generate a check result of a financial index formula;
inputting the financial subject structured data into a financial subject change checking module, checking data related to financial subject change in the financial subject structured data, and generating a checking result of financial subject change;
inputting the financial subject structured data into a financial statement extraction and verification module to generate a verification result of the financial subject data and the corresponding reference data;
and displaying all the checking results and the error correction results.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A financial document information processing method is characterized by comprising the following steps:
generating document structured data of the financial document to be audited through a document processing module;
preprocessing the document structured data through a model and extracting financial subjects, inputting an extraction result into a data normalization module, and generating financial subject structured data based on the preprocessed data and the normalization result;
inputting the document structured data into a character error correction model, outputting an error correction result and storing the error correction result;
inputting the document structured data into a manager information spot check module, checking the information of the manager, and generating a check result of the manager information;
inputting the financial subject structured data into a financial index formula calculation module to generate a check result of a financial index formula;
inputting the financial subject structured data into a financial subject change checking module, checking data related to financial subject change in the financial subject structured data, and generating a checking result of financial subject change;
inputting the financial subject structured data into a financial statement extraction and verification module to generate a verification result of the financial subject data and the corresponding reference data;
and displaying all the checking results and the error correction results.
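The data flow of claim 1 can be sketched as follows. This is a minimal illustration only, not the patented implementation: the function name `audit_financial_document`, its parameters, and the idea of passing each module in as a plain callable are assumptions made for the example. The sketch only shows which checks consume the document structured data and which consume the financial subject structured data.

```python
from typing import Any, Callable, Dict

Check = Callable[[Any], Any]


def audit_financial_document(
    raw_document: Any,
    parse_document: Callable[[Any], Any],      # hypothetical document processing module
    extract_subjects: Callable[[Any], Any],    # hypothetical extraction + normalization step
    document_checks: Dict[str, Check],         # e.g. character error correction, manager info check
    subject_checks: Dict[str, Check],          # e.g. formula, change, and statement checks
) -> Dict[str, Any]:
    """Run every check and collect the results under the check's name."""
    # Step 1: document structured data of the document to be audited.
    doc_data = parse_document(raw_document)
    # Step 2: financial subject structured data derived from the document data.
    subject_data = extract_subjects(doc_data)

    results: Dict[str, Any] = {}
    # Checks that read the document structured data directly.
    for name, check in document_checks.items():
        results[name] = check(doc_data)
    # Checks that read the financial subject structured data.
    for name, check in subject_checks.items():
        results[name] = check(subject_data)
    # The caller is responsible for displaying the collected results.
    return results
```

Keeping the modules as injected callables is a design choice of this sketch; it makes the ordering of the claim's steps visible without assuming anything about how each model is built.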
2. The method of claim 1, wherein preprocessing the document structured data through a model and extracting financial subjects, and inputting extraction results into a data normalization module, and generating financial subject structured data based on the preprocessed data and the normalization results comprises:
identifying table-type pictures in the document structured data through a table optical character recognition (OCR) model, and recognizing the table in each picture to obtain recognized document structured data;
inputting the recognized document structured data into a table classification model to obtain the main body corresponding to each table and the category of the table;
inputting the document structured data into a paraphrase table extraction model to obtain the report period and the reference relations of the publisher, and generating analyzed structured data;
inputting the document structured data into a paragraph classification model to obtain the main body corresponding to each paragraph;
inputting the document structured data into a table financial subject extraction model, and extracting the financial subjects in tables and their corresponding information;
inputting the document structured data into a text financial subject extraction model, and extracting the financial subjects in text and their corresponding information;
and performing a normalization operation on the extracted financial subjects and the corresponding information according to a constructed financial subject knowledge graph and the analyzed structured data, and generating the final financial subject structured data based on the normalized extraction result, the main bodies and the table categories.
3. The method of claim 2, further comprising:
training the paraphrase table extraction model by taking each row of text in the two-dimensional array of the paraphrase table and the context preceding the paraphrase table as input, and taking the reference relations in the paraphrase table as output;
training the table classification model by taking, as input, the context preceding each table in the document structured data corresponding to financial documents in a training set and each row of text in the two-dimensional array of the table, and taking the category of the table and the main body corresponding to the table as output;
training the paragraph classification model by taking paragraphs of the document structured data corresponding to financial documents in the training set and the context preceding the paragraphs as input, and taking the main bodies corresponding to the paragraphs as output;
training the text financial subject extraction model by taking a financial subject description paragraph in the document structured data corresponding to financial documents in the training set as input and the information corresponding to financial subjects as output;
training the table financial subject extraction model by taking a financial subject description paragraph in the document structured data corresponding to financial documents in the training set as input and the information corresponding to financial subjects as output;
training the character error correction model by using correctly expressed paragraphs in the document structured data as a corpus;
and training the table OCR model by taking the table pictures in the document structured data corresponding to financial documents in the training set as a corpus.
4. The method of claim 1, wherein inputting the financial subject structured data into a financial index formula calculation module to generate a check result of a financial index formula comprises:
substituting the financial subject structured data, grouped by main body and year, into a pre-configured financial index formula and calculating a result; comparing the calculated result for the target financial subject of the formula with the reference data of that target financial subject; and storing the inconsistent financial subjects in the check result.
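The formula check of claim 4 can be illustrated with a short sketch. Everything named here is hypothetical: the record fields (`entity`, `year`, `subject`, `value`), the `check_index_formulas` function, the tolerance, and the example formula. The sketch also assumes that the "reference data" for a target subject is the value reported for that subject in the same document, which the claim does not spell out.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Values = Dict[str, float]
# target subject -> (subjects the formula needs, function computing the target)
Formulas = Dict[str, Tuple[Sequence[str], Callable[[Values], float]]]


def check_index_formulas(
    records: List[dict], formulas: Formulas, tolerance: float = 0.01
) -> List[dict]:
    """Recompute each target subject per (entity, year) and list mismatches."""
    # Group subject values by main body (entity) and year.
    grouped: Dict[Tuple[str, int], Values] = defaultdict(dict)
    for r in records:
        grouped[(r["entity"], r["year"])][r["subject"]] = r["value"]

    mismatches = []
    for (entity, year), values in sorted(grouped.items()):
        for target, (inputs, formula) in formulas.items():
            # Skip formulas whose inputs or reported target are missing.
            if target not in values or any(name not in values for name in inputs):
                continue
            calculated = formula(values)
            if abs(calculated - values[target]) > tolerance:
                mismatches.append({
                    "entity": entity,
                    "year": year,
                    "subject": target,
                    "calculated": calculated,
                    "reported": values[target],
                })
    return mismatches
```

For instance, with a hypothetical formula `gross_profit = revenue - cost_of_sales`, a year in which the document reports revenue 120, cost of sales 70 and gross profit 55 would be stored as a mismatch, since the recomputed value is 50.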
5. The method of claim 1, wherein inputting the structured financial subject data into a financial subject change verification module, verifying data relating to financial subject changes in the structured financial subject data, and generating a financial subject change verification result comprises:
inputting the financial subject structured data into a financial subject change checking module, and generating a data structure of financial subject increase and decrease information based on the description related to financial subject change in the financial subject structured data;
and comparing the change value/change rate of the financial subjects in the data structure with the reference data of the change value/change rate to generate a financial subject change value/change rate verification result.
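A minimal sketch of the change check in claim 5 follows. The `SubjectChange` data structure, its field names, and the tolerances are assumptions for illustration. The sketch further assumes that the reference change value and change rate are recomputed from the previous-period and current-period amounts, which is one plausible reading of "reference data" rather than something the claim states.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SubjectChange:
    """Hypothetical increase/decrease record built from a change description."""
    subject: str
    previous: float
    current: float
    stated_change: Optional[float] = None  # change value claimed in the text
    stated_rate: Optional[float] = None    # change rate as a fraction, 0.25 = 25%


def check_changes(
    changes: List[SubjectChange], value_tol: float = 0.01, rate_tol: float = 0.001
) -> List[Tuple[str, str, float, float]]:
    """Return (subject, kind, stated, recomputed) for every inconsistency."""
    findings = []
    for c in changes:
        actual_change = c.current - c.previous
        if c.stated_change is not None and abs(c.stated_change - actual_change) > value_tol:
            findings.append((c.subject, "change_value", c.stated_change, actual_change))
        # A rate is undefined when the previous-period amount is zero.
        if c.stated_rate is not None and c.previous != 0:
            actual_rate = actual_change / abs(c.previous)
            if abs(c.stated_rate - actual_rate) > rate_tol:
                findings.append((c.subject, "change_rate", c.stated_rate, actual_rate))
    return findings
```

Dividing by the absolute value of the previous amount keeps the sign of the rate aligned with the sign of the change when the previous amount is negative; that convention is a choice of this sketch.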
6. The method of claim 1, wherein inputting the financial subject structured data into a financial statement extraction verification module to generate a verification result of the financial subject data with corresponding reference data comprises:
inputting the financial subject structured data into the financial statement extraction verification module, and classifying the financial subject structured data so that data of the same main body, the same year and the same financial subject fall into one category;
if a plurality of financial data exist in the target category, normalizing each financial data;
and comparing each normalized financial data with corresponding reference data, and taking the comparison result as a verification result.
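The classification, normalization and comparison steps of claim 6 can be sketched as below. The unit table `UNIT_FACTORS`, the record fields, the `verify_statements` function, and the externally supplied `reference` mapping are all assumptions for illustration; the claim does not say what normalization involves, and unit conversion is used here only as a plausible example. For brevity the sketch normalizes every value, not only those in categories that contain several values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical conversion factors to a common unit (yuan).
UNIT_FACTORS = {
    "yuan": 1.0,
    "thousand yuan": 1_000.0,
    "ten thousand yuan": 10_000.0,
    "million yuan": 1_000_000.0,
}

Key = Tuple[str, int, str]  # (entity, year, financial subject)


def verify_statements(
    records: List[dict], reference: Dict[Key, float], tolerance: float = 0.5
) -> List[dict]:
    """Compare each normalized value with the reference value of its category."""
    # Classify by the same entity, the same year and the same financial subject.
    groups: Dict[Key, List[dict]] = defaultdict(list)
    for r in records:
        groups[(r["entity"], r["year"], r["subject"])].append(r)

    results = []
    for key, items in groups.items():
        if key not in reference:
            continue  # nothing to compare against
        for item in items:
            normalized = item["value"] * UNIT_FACTORS[item["unit"]]
            results.append({
                "key": key,
                "source": item["source"],
                "normalized": normalized,
                "consistent": abs(normalized - reference[key]) <= tolerance,
            })
    return results
```

With this layout, the same revenue figure quoted once in a table as 120 "ten thousand yuan" and once in a paragraph as 1,250 "thousand yuan" ends up in one category, and only the second is marked inconsistent against a reference of 1,200,000 yuan.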
7. The method according to claim 1, wherein the displaying all the verification results and the error correction results comprises:
displaying the verification results and the error correction results, and marking in the financial document the data found to be inconsistent in the verification results and the content flagged in the error correction results.
8. A financial document information processing apparatus, comprising:
the document structured data generation module is used for generating document structured data of the financial document to be audited through the document processing module;
the financial subject structured data generation module is used for preprocessing the document structured data through a model and extracting financial subjects, inputting an extraction result into the data normalization module, and generating financial subject structured data based on the preprocessed data and the normalization result;
the error correction module is used for inputting the document structured data into a character error correction model, outputting an error correction result and storing the error correction result;
the first checking module is used for inputting the document structured data into the manager information spot check module, checking the information of the manager and generating a checking result of the manager information;
the second checking module is used for inputting the structured data of the financial subjects into the financial index formula calculation module to generate a checking result of the financial index formula;
the third checking module is used for inputting the financial subject structured data into the financial subject change checking module, checking the data related to financial subject change in the financial subject structured data, and generating a checking result of financial subject change;
the fourth checking module is used for inputting the financial subject structured data into the financial statement extraction checking module to generate a checking result of the financial subject data and the corresponding reference data;
and the display module is used for displaying all the verification results and the error correction results.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the financial document information processing method according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing a financial document information processing method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911194180.0A CN110909226B (en) 2019-11-28 2019-11-28 Financial document information processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911194180.0A CN110909226B (en) 2019-11-28 2019-11-28 Financial document information processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110909226A CN110909226A (en) 2020-03-24
CN110909226B CN110909226B (en) 2023-06-06

Family

ID=69820391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911194180.0A Active CN110909226B (en) 2019-11-28 2019-11-28 Financial document information processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110909226B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111651552A (en) * 2020-06-08 2020-09-11 中国工商银行股份有限公司 Structured information determination method and device and electronic equipment
CN111797356A (en) * 2020-07-06 2020-10-20 上海冰鉴信息科技有限公司 Webpage table information extraction method and device
CN111914543A (en) * 2020-06-20 2020-11-10 中国建设银行股份有限公司 Report validity detection method and device, electronic equipment and readable storage medium
CN112015727A (en) * 2020-09-01 2020-12-01 民生科技有限责任公司 Automatic checking and correcting system and method for financial statement data and readable storage device
CN112200117A (en) * 2020-10-22 2021-01-08 长城计算机软件与系统有限公司 Form identification method and device
CN112380826A (en) * 2020-11-12 2021-02-19 中国农业银行股份有限公司佛山分行 Formatted electronic form generation method based on text file
CN112581699A (en) * 2020-12-23 2021-03-30 华言融信科技成都有限公司 Credit report self-service interpretation equipment
CN112990091A (en) * 2021-04-09 2021-06-18 数库(上海)科技有限公司 Research and report analysis method, device, equipment and storage medium based on target detection
CN112990110A (en) * 2021-04-20 2021-06-18 数库(上海)科技有限公司 Method for extracting key information from research report and related equipment
CN113094447A (en) * 2021-03-22 2021-07-09 北京三行科技有限公司 Structured information extraction method oriented to financial statement image
CN113094446A (en) * 2021-03-22 2021-07-09 北京三行科技有限公司 Subject information extraction method oriented to financial statement image
CN113159969A (en) * 2021-05-17 2021-07-23 广州故新智能科技有限责任公司 Financial long text rechecking system
CN113672739A (en) * 2021-07-28 2021-11-19 达而观智能(深圳)有限公司 Data extraction method for image format financial and newspaper document
CN114613516A (en) * 2020-12-29 2022-06-10 医渡云(北京)技术有限公司 Text standardization processing method and device, electronic equipment and computer medium
CN115099224A (en) * 2022-07-08 2022-09-23 江苏理工学院 Method and device for extracting Chinese PDF content by fusing BiLSTM+CRF and rule matching
CN115935042A (en) * 2023-01-19 2023-04-07 蔷薇大树科技有限公司 Intelligent pledge asset duplicate checking method and system based on fusion model
CN116503889A (en) * 2023-01-18 2023-07-28 苏州工业园区航星信息技术服务有限公司 File and electronic file processing method, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060288268A1 (en) * 2005-05-27 2006-12-21 Rage Frameworks, Inc. Method for extracting, interpreting and standardizing tabular data from unstructured documents
JP2010123149A (en) * 2010-03-12 2010-06-03 Ntt Data Corp Accounting information collection and analysis system, and method and program therefor
CN106649223A (en) * 2016-12-23 2017-05-10 北京文因互联科技有限公司 Financial report automatic generation method based on natural language processing
CN107943785A (en) * 2017-11-06 2018-04-20 广东广业开元科技有限公司 A kind of PDF document processing method and processing device based on big data
CN109117479A (en) * 2018-08-13 2019-01-01 数据地平线(广州)科技有限公司 A kind of financial document intelligent checking method, device and storage medium
CN109800761A (en) * 2019-01-25 2019-05-24 厦门商集网络科技有限责任公司 Method and terminal based on deep learning model creation paper document structural data
CN109857990A (en) * 2018-12-18 2019-06-07 重庆邮电大学 A kind of financial class notice information abstracting method based on file structure and deep learning
JP2019191665A (en) * 2018-04-18 2019-10-31 Tis株式会社 Financial statements reading device, financial statements reading method and program

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060288268A1 (en) * 2005-05-27 2006-12-21 Rage Frameworks, Inc. Method for extracting, interpreting and standardizing tabular data from unstructured documents
JP2010123149A (en) * 2010-03-12 2010-06-03 Ntt Data Corp Accounting information collection and analysis system, and method and program therefor
CN106649223A (en) * 2016-12-23 2017-05-10 北京文因互联科技有限公司 Financial report automatic generation method based on natural language processing
CN107943785A (en) * 2017-11-06 2018-04-20 广东广业开元科技有限公司 A kind of PDF document processing method and processing device based on big data
JP2019191665A (en) * 2018-04-18 2019-10-31 Tis株式会社 Financial statements reading device, financial statements reading method and program
CN109117479A (en) * 2018-08-13 2019-01-01 数据地平线(广州)科技有限公司 A kind of financial document intelligent checking method, device and storage medium
CN109857990A (en) * 2018-12-18 2019-06-07 重庆邮电大学 A kind of financial class notice information abstracting method based on file structure and deep learning
CN109800761A (en) * 2019-01-25 2019-05-24 厦门商集网络科技有限责任公司 Method and terminal based on deep learning model creation paper document structural data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王景香;: "小微企业财务预算Excel模型――"从财务指标到财务预算"的逆向思维" *
王雪;王伦津;吕科;: "基于XBRL的财务报表网络共享技术" *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111651552B (en) * 2020-06-08 2024-04-23 中国工商银行股份有限公司 Structured information determining method and device and electronic equipment
CN111651552A (en) * 2020-06-08 2020-09-11 中国工商银行股份有限公司 Structured information determination method and device and electronic equipment
CN111914543A (en) * 2020-06-20 2020-11-10 中国建设银行股份有限公司 Report validity detection method and device, electronic equipment and readable storage medium
CN111797356A (en) * 2020-07-06 2020-10-20 上海冰鉴信息科技有限公司 Webpage table information extraction method and device
CN111797356B (en) * 2020-07-06 2023-08-08 上海冰鉴信息科技有限公司 Webpage form information extraction method and device
CN112015727A (en) * 2020-09-01 2020-12-01 民生科技有限责任公司 Automatic checking and correcting system and method for financial statement data and readable storage device
CN112200117A (en) * 2020-10-22 2021-01-08 长城计算机软件与系统有限公司 Form identification method and device
CN112200117B (en) * 2020-10-22 2023-10-13 长城计算机软件与系统有限公司 Form identification method and device
CN112380826A (en) * 2020-11-12 2021-02-19 中国农业银行股份有限公司佛山分行 Formatted electronic form generation method based on text file
CN112380826B (en) * 2020-11-12 2024-03-22 中国农业银行股份有限公司佛山分行 Formatting electronic form generating method based on text file
CN112581699A (en) * 2020-12-23 2021-03-30 华言融信科技成都有限公司 Credit report self-service interpretation equipment
CN114613516A (en) * 2020-12-29 2022-06-10 医渡云(北京)技术有限公司 Text standardization processing method and device, electronic equipment and computer medium
CN113094446A (en) * 2021-03-22 2021-07-09 北京三行科技有限公司 Subject information extraction method oriented to financial statement image
CN113094447A (en) * 2021-03-22 2021-07-09 北京三行科技有限公司 Structured information extraction method oriented to financial statement image
CN112990091A (en) * 2021-04-09 2021-06-18 数库(上海)科技有限公司 Research and report analysis method, device, equipment and storage medium based on target detection
CN112990110A (en) * 2021-04-20 2021-06-18 数库(上海)科技有限公司 Method for extracting key information from research report and related equipment
CN113159969A (en) * 2021-05-17 2021-07-23 广州故新智能科技有限责任公司 Financial long text rechecking system
CN113672739A (en) * 2021-07-28 2021-11-19 达而观智能(深圳)有限公司 Data extraction method for image format financial and newspaper document
CN115099224A (en) * 2022-07-08 2022-09-23 江苏理工学院 Method and device for extracting Chinese PDF content by fusing BiLSTM+CRF and rule matching
CN116503889A (en) * 2023-01-18 2023-07-28 苏州工业园区航星信息技术服务有限公司 File and electronic file processing method, device, equipment and storage medium
CN116503889B (en) * 2023-01-18 2024-01-19 苏州工业园区航星信息技术服务有限公司 File and electronic file processing method, device, equipment and storage medium
CN115935042A (en) * 2023-01-19 2023-04-07 蔷薇大树科技有限公司 Intelligent pledge asset duplicate checking method and system based on fusion model
CN115935042B (en) * 2023-01-19 2023-09-26 蔷薇大树科技有限公司 Mortgage asset intelligent duplicate checking method and system based on fusion model

Also Published As

Publication number Publication date
CN110909226B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN110909226B (en) Financial document information processing method and device, electronic equipment and storage medium
US11520975B2 (en) Lean parsing: a natural language processing system and method for parsing domain-specific languages
CA3033862C (en) System and method for automatically understanding lines of compliance forms through natural language patterns
CA3033859C (en) Method and system for automatically extracting relevant tax terms from forms and instructions
US10733675B2 (en) Accuracy and speed of automatically processing records in an automated environment
US20170186099A1 (en) Systems and methods for identifying and explaining schema errors in the computerized preparation of a payroll tax form
US20190005029A1 (en) Systems and methods for natural language processing of structured documents
CN112035653A (en) Policy key information extraction method and device, storage medium and electronic equipment
US20230136368A1 (en) Text keyword extraction method, electronic device, and computer readable storage medium
CN108170715B (en) Text structuralization processing method
CN112231431B (en) Abnormal address identification method and device and computer readable storage medium
CN111651552B (en) Structured information determining method and device and electronic equipment
US20230028664A1 (en) System and method for automatically tagging documents
Owda et al. Financial discussion boards irregularities detection system (fdbs-ids) using information extraction
Duan et al. Increasing the utility of performance audit reports: Using textual analytics tools to improve government reporting
US7653871B2 (en) Mathematical decomposition of table-structured electronic documents
US20230113578A1 (en) Transaction and ownership information document extraction
CN113095078A (en) Associated asset determination method and device and electronic equipment
CA3076418C (en) Lean parsing: a natural language processing system and method for parsing domain-specific languages
CN116402056A (en) Document information processing method and device and electronic equipment
JP2021071866A (en) Semantic analysis device, semantic analysis method and program for trade transaction telegrams
KR20230139096A (en) Intellectual property data platform
CN117882081A (en) AI enhanced audit platform including techniques for automatically evaluating evidence of a checklist
CN115510196A (en) Knowledge graph construction method, question answering method, device and storage medium
CN116739602A (en) Suspicious electronic bill prediction method based on multi-model fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant