US20200302392A1 - Financial documents examination methods and systems - Google Patents


Info

Publication number
US20200302392A1
Authority
US
United States
Prior art keywords
tables
data
user
document
financial
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/729,645
Inventor
Naman Shah
Atul Shah
Anurag SAXENA
Jed GORE
Jitender Khatri
Vaibhav Negi
Rajdeep Singh Gill
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sentieo Inc
Original Assignee
Sentieo Inc
Application filed by Sentieo Inc
Priority to US15/729,645
Assigned to Sentieo, Inc. reassignment Sentieo, Inc. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GORE, JED, GILL, RAJDEEP SINGH, SHAH, ATUL, Negi, Vaibhav, KHATRI, JITENDER, SAXENA, ANURAG, SHAH, NAMAN
Publication of US20200302392A1
Priority to US17/837,526 (published as US20220300906A1)
Priority to US18/059,588 (published as US11829950B2)
Priority to US18/230,237 (published as US20230376900A1)
Current legal status: Abandoned

Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING (common to all classifications below)
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
        • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
            • G06Q40/12 Accounting
            • G06Q40/02 Banking, e.g. interest calculation or account maintenance
        • G06Q10/00 Administration; Management
            • G06Q10/10 Office automation; Time management
    • G06F ELECTRIC DIGITAL DATA PROCESSING
        • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
                • G06F16/31 Indexing; Data structures therefor; Storage structures
                    • G06F16/316 Indexing structures
            • G06F16/90 Details of database functions independent of the retrieved data types
                • G06F16/93 Document management systems
        • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
                • G06F40/205 Parsing
                • G06F40/237 Lexical tools
                    • G06F40/242 Dictionaries

Definitions

  • the present invention relates to financial software. More specifically, the invention relates to software for analyzing similar financial data from multiple documents over time thereby gaining insights into the financial data.
  • FIG. 1 is a high-level diagram of a time series feature in accordance with one embodiment
  • FIG. 2 shows a financial table with a column of marks along the left side for each row and a row of marks at the top of the table;
  • FIG. 3 is a screenshot showing how auditing modifies the table on the right pane to display a source table from an original document in accordance with one embodiment
  • FIG. 4 is a block diagram showing a high-level view of similar tables
  • FIG. 5 is a screenshot showing tables from five years of quarterly filings, presented in a split screen view after a user clicks Similar Tables in a document;
  • FIG. 6 is a screenshot of one feature of a similar tables tool in accordance with one embodiment
  • FIG. 7 is a screenshot of a stitched tables feature in accordance with one embodiment
  • FIG. 8 is a screenshot of a melted tables feature in accordance with one embodiment
  • FIG. 9 is a flow diagram of a process of pre-processing a document and creating dictionaries for tables in the document;
  • FIG. 10 is a flow diagram of a process of creating a time series of tables in accordance with one embodiment
  • FIG. 11 is a flow diagram showing options of what can be viewed through the platform user interface and exported into a spreadsheet in accordance with one embodiment
  • FIG. 12 is a block diagram of a system of the financial document intelligence platform in accordance with various embodiments.
  • FIG. 13 is a block diagram illustrating an example of a computer system capable of implementing various processes in the described embodiments
  • a financial document intelligence system receives a document containing unstructured data. Tables in the document are identified and extracted using a parsing engine. Each table is converted to a dictionary. The system then verifies that data in a table is financial data and, once verified, the dictionaries for valid financial data tables are stored. A series of stitched tables, also referred to as a time series, is created for a selected table using a row-based, “next best” matching algorithm. Tables that are similar to the selected table with respect to type and schema are identified and used to create a time series for the selected table. This time series allows users to easily see how certain financial data has changed over time.
  • Example embodiments of methods and systems for examining and analyzing financial and corporate documents are described. These examples and embodiments are provided solely to add context and aid in the understanding of the invention. Thus, it will be apparent to one skilled in the art of software and financial document processing that the present invention may be practiced without some or all of the specific details described herein. In other instances, well-known concepts have not been described in detail in order to avoid unnecessarily obscuring the present invention. Although these embodiments are described in sufficient detail to enable one skilled in the art to practice the invention, these examples, illustrations, and contexts are not limiting, and other embodiments may be used and changes may be made without departing from the spirit and scope of the invention.
  • Time series is a function that allows users to select line item rows from a document or HTML/XBRL table and automatically retrieve historical values of those line items from previous documents.
  • a high-level diagram of the time series feature is shown in FIG. 1 .
  • the user selects either a specific row from a table to be extracted or the entire table. In one embodiment, the selection of a row can also be done automatically by the software.
  • Logic behind the software finds that particular table in previous versions of the document and extracts those lines and builds what is referred to as a composite table.
  • the user interface for a time series table allows the user to perform an audit easily by enabling the user to go back to the table from which the highlighted figure originated.
  • the system can auto-update the composite table whenever a new table is created.
  • Referring to FIGS. 2 and 3, FIG. 2 first shows a table with a column of marks along the left side for each row and a row of marks at the top of the table.
  • line items from the original table become columns in the composite table.
  • Each value can be audited by clicking on the table row on the left side of the screen.
  • auditing modifies the table on the right pane to display the source table from the original document.
  • the user can also edit the resulting table on the left.
  • the entire table may be exported to a spreadsheet or saved in another suitable format. Additionally, the resulting table can be set to auto-update with new values when new versions of the document are made available.
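A minimal Python sketch of the composite table follows, assuming each matched historical table has already been parsed into a mapping of period label to {line item: value}. The function name `build_composite` and this data layout are hypothetical illustrations, not the platform's actual interface. As in the description, the selected line items become the columns and each period becomes a row; a line item missing in a period is left empty rather than dropped.

```python
from typing import Dict, List, Optional

# Hypothetical input layout: period label -> {line item: value} taken from the
# matched table in each document. Line items absent in a period become None.
ParsedTables = Dict[str, Dict[str, float]]


def build_composite(tables: ParsedTables, line_items: List[str]) -> List[List[Optional[object]]]:
    """Return a composite table: a header row, then one row per period.

    The selected line items become the columns of the composite table;
    periods are listed in the order they appear in `tables`.
    """
    rows: List[List[Optional[object]]] = [["Period", *line_items]]
    for period, values in tables.items():
        # .get() leaves a gap (None) when a line item is missing for that period
        rows.append([period, *(values.get(item) for item in line_items)])
    return rows


history = {
    "Q1 2017": {"Revenue": 1200.0, "Net income": 150.0},
    "Q4 2016": {"Revenue": 1100.0, "Net income": 140.0},
    "Q3 2016": {"Revenue": 1050.0},
}
for row in build_composite(history, ["Revenue", "Net income"]):
    print(row)
```

Keeping the gaps as empty cells mirrors the auditing workflow: a reviewer can see exactly which periods lack a value and fill them in by hand.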
  • the time series and table extraction features are implemented in the following manner. As a financial or other type of document is received in the system, it goes through various preprocessing steps. One of the key stages involves identification of tabular data within the documents. The tabular data is of particular significance because the tables provide a quick, structured summary of data mentioned in different places within the document.
  • each table is fed to a parsing engine which creates a text skeleton of the table and divides it into different parts such as headers, data headers, terms and values.
  • the tables in SEC filings are highly non-standardized; consequently, the system goes through a number of preprocessing steps to correctly map each term's value to its corresponding column.
  • the value of each term is identified and saved for each period.
  • This skeleton data of each table is then stored in a database from where it can be quickly fetched on demand.
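The skeleton step can be sketched as follows, assuming the table has already been reduced to a grid of cell strings whose first row holds the period (data) headers and whose first column holds the terms. The helper names and the number-cleaning rules (thousands separators, parentheses for negatives, dashes for empty cells) are illustrative assumptions, not a description of the actual parsing engine.

```python
import re
from typing import Dict, List, Optional

DASHES = {"", "-", "\u2013", "\u2014"}  # cells that mean "no value"


def parse_number(cell: str) -> Optional[float]:
    """Convert a financial cell such as '1,200', '(35)' or '$ 4.5' to a float."""
    text = cell.replace("$", "").replace(",", "").strip()
    if text in DASHES:
        return None
    negative = text.startswith("(") and text.endswith(")")
    text = text.strip("()").strip()
    if not re.fullmatch(r"-?\d+(\.\d+)?", text):
        return None  # not a numeric cell
    value = float(text)
    return -value if negative else value


def parse_skeleton(grid: List[List[str]]) -> Dict[str, Dict[str, Optional[float]]]:
    """Split a grid into data headers (first row) and terms (first column),
    and map each term to {period header: value}."""
    data_headers = [h.strip() for h in grid[0][1:]]
    skeleton: Dict[str, Dict[str, Optional[float]]] = {}
    for row in grid[1:]:
        term = row[0].strip()
        if not term:
            continue  # skip spacer rows with no term
        skeleton[term] = {
            period: parse_number(cell) for period, cell in zip(data_headers, row[1:])
        }
    return skeleton


grid = [
    ["", "Q1 2017", "Q4 2016"],
    ["Revenue", "1,200", "1,100"],
    ["Impairment", "(35)", "\u2014"],
]
print(parse_skeleton(grid))
```

The resulting term-to-period mapping is the kind of record that can be stored and fetched on demand for each period.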
  • the system may maintain these records for all the historical 10-Qs, 10-Ks, 8-K Earnings, XBRL documents and other SEC filings.
  • the identification logic of the present invention matches tables on the basis of the terms used in them. Because the order of tables often varies across documents, and differs particularly between 10-Ks and 10-Qs, matching on terms rather than on table position produces good matches.
  • the table extraction function identifies the most similar table found in each previous document, based on the term matching algorithm, and returns them.
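The term-matching idea can be illustrated with a simple set-similarity score. The source does not specify the scoring function, so this sketch assumes Jaccard similarity over normalized row terms; the metric and the names `term_similarity` and `most_similar_table` are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def normalize(term: str) -> str:
    """Lower-case and collapse whitespace, so 'Total  Revenue' equals 'total revenue'."""
    return " ".join(term.lower().split())


def term_similarity(a: List[str], b: List[str]) -> float:
    """Jaccard similarity of two tables' normalized term sets, from 0.0 to 1.0."""
    set_a = {normalize(t) for t in a}
    set_b = {normalize(t) for t in b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def most_similar_table(source_terms: List[str],
                       document_tables: Dict[str, List[str]]) -> Tuple[str, float]:
    """Return (table id, score) for the best-matching table in one previous document.

    Only the terms are compared, so the table's position in the filing is irrelevant.
    """
    scores = {tid: term_similarity(source_terms, terms) for tid, terms in document_tables.items()}
    best = max(scores, key=lambda tid: scores[tid])
    return best, scores[best]


source = ["Revenue", "Cost of revenue", "Gross profit"]
previous_10k = {
    "t1": ["Cash", "Receivables", "Total assets"],
    "t2": ["Revenue", "Cost of revenue", "Gross profit", "Other income"],
}
print(most_similar_table(source, previous_10k))  # ('t2', 0.75)
```

Running this once per previous document yields one best candidate table per filing.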
  • results of a time series extraction are returned, they are presented within a table for the user.
  • the user may click on each value, and an auditing pane on the right side of the screen loads and displays the corresponding table from which the value was pulled. Extracted values are color coded so they are easy to identify within large tables. This allows a user to quickly audit the entire table to ensure that the values the algorithm produced are correct. If multiple tables were returned for the document/value that the user is auditing, the additional tables are displayed below. The user can easily replace values by clicking in any portion of the table and typing the new value or by selecting other matched values from a dropdown menu in the auditing pane.
  • Some filings report cumulative year-to-date (YTD) summation values rather than quarterly values. Time series allows users to quickly transform these summation values to quarterly values for the entire table by checking specific boxes. If a summation transformation is required only for a single document/value, the user can simply click the YTD box to the left of that value. Once the user is satisfied with the output of the table, they have the option to export the entire table to a spreadsheet. They may also save the output of the time series extraction on the system or open it within a visualization engine, described below.
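The YTD-to-quarterly transformation amounts to differencing within a fiscal year. This sketch assumes the values for one fiscal year are ordered Q1 through Qn and are all cumulative; the function name is hypothetical. For example, YTD figures of 100, 250 and 420 convert to quarterly figures of 100, 150 and 170.

```python
from typing import List


def ytd_to_quarterly(ytd_values: List[float]) -> List[float]:
    """Convert cumulative year-to-date values (Q1..Qn of one fiscal year) to
    per-quarter values: Q1 is unchanged, and each later quarter is the
    difference from the previous cumulative total."""
    quarterly: List[float] = []
    previous = 0.0
    for cumulative in ytd_values:
        quarterly.append(cumulative - previous)
        previous = cumulative
    return quarterly


print(ytd_to_quarterly([100.0, 250.0, 420.0]))  # [100.0, 150.0, 170.0]
```

The same differencing recovers a fourth-quarter figure from an annual total and the nine-month YTD value.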
  • the time series function allows the user to select a number of terms from the source table.
  • the function identifies the top three similar tables in each previous document (on the basis of the term match algorithm) and then looks for the exact term the user requested. If the term is found, the numerical value corresponding to the latest date in the table is fetched.
  • the output of the time series function is a list of quarter-value pairs for the term across documents for the previous five years. The user has the option to load values from older documents if earlier data is needed.
  • if the term is not found for a quarter, the system first returns an empty value for the missing quarter. The system then goes back to find tables similar to the source table in the document corresponding to the missing quarter, and looks in these similar tables for terms that are similar to the term the user requested. If a term is matched with a high degree of confidence, the system finds the corresponding value for the latest date in the table and returns it with a warning that the actual term and value may differ.
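The fallback lookup can be sketched with Python's standard `difflib` module. Here string similarity stands in for "a high degree of confidence", with an arbitrary cutoff of 0.8; the source does not say how term similarity is measured, and the function name and return shape are hypothetical.

```python
import difflib
from typing import Dict, Optional, Tuple


def lookup_term(requested: str,
                table: Dict[str, float],
                cutoff: float = 0.8) -> Tuple[Optional[float], Optional[str]]:
    """Return (value, warning) for a requested term in one similar table.

    `table` maps each term to its value for the latest date in that table.
    An exact match returns no warning, a close match returns a warning, and
    no match returns (None, None), i.e. an empty value for the quarter.
    """
    if requested in table:
        return table[requested], None
    candidates = difflib.get_close_matches(requested, list(table), n=1, cutoff=cutoff)
    if candidates:
        matched = candidates[0]
        warning = f"matched '{matched}' instead of '{requested}'; term and value may differ"
        return table[matched], warning
    return None, None


print(lookup_term("Total net revenue", {"Total net revenues": 980.0, "Cost of sales": 400.0}))
```

Raising the cutoff trades fewer false matches for more empty quarters.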
  • the similar tables feature allows a user to click a button above a document table to load up the same table from previous filings.
  • a high-level view of similar tables is shown in FIG. 4 .
  • a user may click Similar Tables on the Income Statement table in a company's quarterly filing and the tool automatically fetches the Income Statement table from five years of quarterly filings and presents them to the user in a split screen view. This is shown in FIG. 5 .
  • the similar tables tool identifies these similar, historical tables by applying an algorithm to take the contents of the original table and statistically compare them to the contents of all tables in historical filings of the same type. This is shown in FIG. 6 .
  • the table with the highest statistical match is presented as the matched similar table.
  • the user may export the set of tables to a spreadsheet or perform advanced analysis/export through time series, stitched tables and melted tables, features that are described below.
  • Another feature of the invention is referred to as stitched tables.
  • This tool generalizes the concepts of time series and similar tables extraction to join entire tables processed at once instead of on a line-by-line basis. Line items that do not match are preserved in sequence rather than discarded. Duplicate line items are separately handled.
  • This method has the advantage of being computationally efficient for large volumes of tables. It also handles, in a user-friendly way, the constant changes in financial reporting that occur as business needs evolve over time, for example, due to reorganizations, acquisitions, and new or discontinued disclosures. An example is shown in FIG. 7.
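One way to stitch whole tables while keeping unmatched line items in sequence is an order-preserving merge keyed on line item. This sketch assumes each table is an ordered list of (line item, value) pairs for one period, and that a line item appearing twice in one table is kept distinct by its occurrence count; the `stitch` name and this duplicate rule are assumptions, not the documented method.

```python
from typing import Dict, List, Optional, Tuple

Table = List[Tuple[str, float]]  # ordered (line item, value) pairs for one period
Key = Tuple[str, int]            # (line item, occurrence number within its table)


def _keyed(table: Table) -> List[Tuple[Key, float]]:
    """Tag each line item with its occurrence count so duplicates stay distinct."""
    seen: Dict[str, int] = {}
    keyed: List[Tuple[Key, float]] = []
    for item, value in table:
        n = seen.get(item, 0)
        seen[item] = n + 1
        keyed.append(((item, n), value))
    return keyed


def stitch(tables: Dict[str, Table]) -> List[List[object]]:
    """Join whole tables (one per period) into rows of [line item, value per period]."""
    periods = list(tables)
    order: List[Key] = []
    values: Dict[Key, Dict[str, float]] = {}
    for period in periods:
        previous: Optional[Key] = None
        for key, value in _keyed(tables[period]):
            if key not in values:
                values[key] = {}
                # An unmatched item is inserted right after its predecessor in
                # this table, so it keeps its place in sequence (not discarded).
                position = order.index(previous) + 1 if previous is not None else 0
                order.insert(position, key)
            values[key][period] = value
            previous = key
    rows: List[List[object]] = [["Line item", *periods]]
    for key in order:
        # Periods in which a line item does not appear are left as None.
        rows.append([key[0], *(values[key].get(p) for p in periods)])
    return rows


tables = {
    "Q1 2017": [("Revenue", 100.0), ("Cost", 60.0), ("Gross profit", 40.0)],
    "Q1 2016": [("Revenue", 90.0), ("Restructuring", 5.0), ("Cost", 55.0), ("Gross profit", 35.0)],
}
for row in stitch(tables):
    print(row)
```

Here the discontinued "Restructuring" line stays between "Revenue" and "Cost" instead of vanishing. The `list.index` lookup makes this sketch quadratic in the number of line items; a version tuned for large volumes would track positions differently.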
  • a feature related to stitched tables may be referred to as melted tables. These tables generalize the concept of stitched tables to multi-dimensional tables in which time is not represented in a single column but by the entire table. Columns are reshaped into rows and stitched together with their counterparts across time. This has particular applications in a variety of modeling contexts, for example, from debt maturity schedules to property-level ownership breakdowns. An example is shown in FIG. 8.
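Reshaping columns into rows ("melting") is the standard wide-to-long transformation; pandas offers it as `DataFrame.melt`. The dependency-free sketch below assumes each period's table is given as {row label: {column: value}}; the function names are hypothetical. After melting, every (row label, column) cell gets its own series across periods, which is what lets a debt maturity schedule be tracked from one filing to the next.

```python
from typing import Dict, List, Tuple

Wide = Dict[str, Dict[str, float]]  # {row label: {column: value}} for one period


def melt(table: Wide, period: str) -> List[Tuple[str, str, str, float]]:
    """Turn a wide table into long records (period, row label, column, value)."""
    return [
        (period, row_label, column, value)
        for row_label, columns in table.items()
        for column, value in columns.items()
    ]


def stitch_melted(snapshots: Dict[str, Wide]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Group melted records from several periods by (row label, column),
    giving each cell of the multi-dimensional table its own time series."""
    series: Dict[Tuple[str, str], Dict[str, float]] = {}
    for period, table in snapshots.items():
        for _, row_label, column, value in melt(table, period):
            series.setdefault((row_label, column), {})[period] = value
    return series


maturities = {
    "FY2016": {"Term loan": {"2017": 50.0, "2018": 75.0}},
    "FY2017": {"Term loan": {"2018": 70.0, "2019": 80.0}},
}
print(stitch_melted(maturities))
```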
  • FIG. 9 is a flow diagram of a process of pre-processing a document and creating dictionaries for tables in the document.
  • the system receives a file of some type of document.
  • the file can come from one of a wide range of sources and may not necessarily be a financial document. For example, it can be a PDF, a PowerPoint document, user notes, an Excel spreadsheet, and so on.
  • the document is usually some type of financial document, such as a 10-K, 10-Q, annual report, or other conventional financial document for a public company, but it need not be.
  • the general goal is to extract financial data formatted as tables from a corpus of documents containing unstructured data.
  • the system is a computing system that executes software provided and managed by a third-party financial intelligence service provider.
  • the document is inputted, in most cases, by a client of the service provider.
  • the first operation by the system is converting it to a suitable format for further processing.
  • the format is HTML. In other embodiments, different formats can be utilized.
  • When the system identifies tables in the document and extracts them, it separates the tables from the rest of the non-table (non-tabular) data. This is done by a parsing engine in the system and, in one embodiment, may be implemented by searching for specific tags, such as “TABLE”. In other implementations, the parsing engine may search unstructured text for keywords associated with financial data. Some of the tables may not contain financial or numerical data; in other words, they may not be financial tables. For example, a table may contain only text data, such as names, locations, and product names. However, at step 904, in one embodiment, these tables are still extracted. The parsing engine is also able to identify and include footnotes.
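A tag-based extractor of the kind described can be sketched with the standard library's `html.parser`. This is an illustration under stated assumptions, not the actual parsing engine: it assumes HTML input (the format the document is converted to), does not handle nested tables, and the class name `TableExtractor` is hypothetical. `HTMLParser` lower-cases tag names, so `<TABLE>` and `<table>` are treated alike.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableExtractor(HTMLParser):
    """Collect each <table> as a grid of cell strings, and keep text outside
    tables separately as non-tabular data. Nested tables are not handled."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[List[List[str]]] = []
        self.non_table_text: List[str] = []
        self._in_table = False
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._in_table = True
            self.tables.append([])
        elif tag == "tr" and self._in_table:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join text fragments and collapse whitespace inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)
        elif not self._in_table and data.strip():
            self.non_table_text.append(data.strip())


html = ("<p>Intro</p><TABLE><tr><th></th><th>Q1 2017</th></tr>"
        "<tr><td>Revenue</td><td>1,200</td></tr></TABLE><p>Notes follow.</p>")
extractor = TableExtractor()
extractor.feed(html)
extractor.close()
print(extractor.tables)
print(extractor.non_table_text)
```

All tables are extracted at this stage, including text-only ones; filtering for financial content happens later.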
  • the parsing engine is also capable of identifying and processing multi-columnar tables, rendering complex latitudinal (wide) data structures into simplified longitudinal (long) data structures which may be more easily stored and manipulated programmatically.
  • the system converts each extracted table to what is referred to as a dictionary of table information.
  • the dictionary includes table data values, the number of columns, headers, the source document location, relationships between data and column headers (e.g., which column the data in a given row came from), and other data.
  • a sample of a dictionary is: {"docid": "123abc", "currency": "USD", "section": "Calculation of Net Leverage Ratio", "period": "Q1, 2017", "value": 18890, "field": "calculation of net leverage ratiototal debt", "alias": "net_leverage_ratio: calculation . . . subsection", "table": "Net Leverage Ratio", "tickler": "amt", "unit": null}.
  • Step 906 is done for each table extracted from the source document. Once all the tables have been converted to dictionaries, the system scans each table, more specifically the dictionary for each table, to determine at step 908 whether it contains valid financial data. For example, the system may look for null values or all-text data, two indicators that the table does not contain financial data, which is the only data relevant to embodiments of the present invention. In one embodiment, the system uses what is referred to as identification logic to spot valid financial tables. For example, it can look for commonly used financial terms, for instance in column headers, or look for actual numerical data. This is done for each dictionary created at step 906.
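The identification logic might look like the following heuristic. This is a sketch only: the dictionary keys (`headers`, `terms`, `values`), the keyword list, and the 50% numeric threshold are assumptions chosen for illustration, not the platform's actual rules.

```python
import re
from typing import Dict, List, Optional

# Illustrative vocabulary only; the actual identification logic is not specified.
FINANCIAL_TERMS = ("revenue", "net income", "total assets", "liabilities",
                   "debt", "cash", "ebitda", "earnings per share")
NUMBER = re.compile(r"-?\d+(\.\d+)?")


def is_number(cell: str) -> bool:
    """True for cells like '1,200', '(35)', '$4.5' or '12%'."""
    text = cell.replace("$", "").replace(",", "").replace("%", "").strip().strip("()")
    return NUMBER.fullmatch(text) is not None


def is_financial_table(table: Dict[str, List[Optional[str]]],
                       min_numeric_share: float = 0.5) -> bool:
    """Heuristically decide whether a table dictionary holds financial data.

    Tables whose cells are all null or all text are rejected. Otherwise the
    table is accepted if its labels mention a known financial term or if at
    least `min_numeric_share` of its non-empty cells are numeric.
    """
    cells = [c for c in table.get("values", []) if c]
    if not cells:
        return False  # only null or empty values
    numeric_share = sum(is_number(c) for c in cells) / len(cells)
    if numeric_share == 0:
        return False  # all text, e.g. names, locations or product names
    labels = " ".join(h for h in table.get("headers", []) + table.get("terms", []) if h).lower()
    has_term = any(term in labels for term in FINANCIAL_TERMS)
    return has_term or numeric_share >= min_numeric_share


print(is_financial_table({"headers": ["", "Q1 2017"], "terms": ["Revenue"], "values": ["1,200"]}))
```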
  • the system stores the dictionary for each valid financial data table.
  • the other dictionaries and tables are discarded.
  • the dictionaries and financial tables are written to a central database. From the database, the tables may eventually be displayed in the user interface of the system. For example, a valid financial data table from the source document can be displayed to the user. As described below in FIG. 10 , if there is a history of tables that are similar to the table selected by the user, a stitched time series of these similar tables with the selected table may be displayed to the user.
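A central store for the dictionaries can be sketched with SQLite from the Python standard library, used here only because it is self-contained; the source does not name a database. The table schema, the function names, and the example entity "ACME" are hypothetical. The `docid`, `table` and `period` keys follow the sample dictionary above, and the entity column reflects the entity-level grouping used when looking up similar historical tables.

```python
import json
import sqlite3
from typing import Any, Dict, List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store for table dictionaries, one JSON document per row."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS table_dicts ("
        " doc_id TEXT NOT NULL, entity TEXT NOT NULL,"
        " table_name TEXT, period TEXT, body TEXT NOT NULL)"
    )
    # Index by entity and table name, the keys used to find similar historical tables
    conn.execute("CREATE INDEX IF NOT EXISTS idx_entity_table ON table_dicts (entity, table_name)")
    return conn


def save_table(conn: sqlite3.Connection, entity: str, table: Dict[str, Any]) -> None:
    """Persist a valid financial table dictionary under its entity."""
    conn.execute(
        "INSERT INTO table_dicts (doc_id, entity, table_name, period, body) VALUES (?, ?, ?, ?, ?)",
        (table["docid"], entity, table.get("table"), table.get("period"), json.dumps(table)),
    )
    conn.commit()


def tables_for_entity(conn: sqlite3.Connection, entity: str, table_name: str) -> List[Dict[str, Any]]:
    """Fetch stored dictionaries for one entity and table name, oldest insert first."""
    rows = conn.execute(
        "SELECT body FROM table_dicts WHERE entity = ? AND table_name = ? ORDER BY rowid",
        (entity, table_name),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]


store = open_store()
save_table(store, "ACME", {"docid": "123abc", "table": "Net Leverage Ratio",
                           "period": "Q1, 2017", "value": 18890})
print(tables_for_entity(store, "ACME", "Net Leverage Ratio"))
```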
  • One version of the user interface of the system also simply displays the previous tables side by side next to the source table thereby enabling rapid, paginated review.
  • the first stage of the document pre-processing stage is complete after step 910 .
  • FIG. 10 is a flow diagram of a process of creating a time series of tables for a selected table in accordance with one embodiment. As noted, this may also be referred to as stitching a currently selected table with similar tables from documents previously submitted by the user.
  • the system begins by identifying a current table (i.e., a table selected by the user) for a current entity.
  • the dictionary for the selected table is retrieved from the database.
  • entity can refer to anything for which the service provider has data; it provides an umbrella context for the table. It can be characterized as the top of a schema for a corpus of documents, where all tables (and other data) are subordinate to the entity.
  • an entity could be a private company, a public company's stock ticker, an institution such as the Federal Reserve Bank, a government agency, and so on.
  • it can be anything for which table data has been collected and stored.
  • the system identifies tables from previous documents for that entity that are similar to the current table.
  • the system performs this operation by using data contained in the dictionary for the selected table.
  • identifying similar tables is performed by looking at table names (e.g., “Balance Sheet”) in previous documents, such as annual reports, for the current entity.
  • this may be done by performing a row-based, “next best” matching algorithm.
  • the “next best match” algorithm can be described as matching the list of rows for the currently selected table against the list of rows for all other tables in previous documents. The best match is the previous table whose number of matched rows is closest to the total number of rows in the currently selected table.
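A minimal sketch of the "next best match" rule as stated, assuming that two rows match when their labels are identical (the source does not define row matching more precisely). The function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def next_best_match(current_rows: List[str],
                    candidates: Dict[str, List[str]]) -> Tuple[str, int]:
    """Return (table id, matched row count) for the previous table whose number
    of matched rows is closest to the total rows of the current table."""
    current = set(current_rows)
    total = len(current_rows)
    best_id: Optional[str] = None
    best_gap = total + 1
    best_matched = 0
    for table_id, rows in candidates.items():
        matched = len(current & set(rows))
        gap = total - matched  # never negative: matched <= len(current) <= total
        if gap < best_gap:  # strict '<' keeps the earlier candidate on ties
            best_id, best_gap, best_matched = table_id, gap, matched
    if best_id is None:
        raise ValueError("no candidate tables to match against")
    return best_id, best_matched


current = ["Revenue", "Cost", "Gross profit"]
previous = {"a": ["Cash", "Revenue"], "b": ["Revenue", "Cost", "Gross profit", "Other"]}
print(next_best_match(current, previous))  # ('b', 3)
```

Because the matched count can never exceed the current table's row count, "closest to the total" reduces here to "most matched rows"; the gap form is kept to mirror the wording of the description.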
  • the system has identified and verified tables that are essentially the same as the current table but from older documents (e.g., from last month, last quarter, last fiscal year, etc.).
  • the non-matching rows are flagged and included in the tables; they are not discarded by the system. In one embodiment, the non-matching rows are moved to the bottom of the table and displayed in a different color from the matching rows. If the ratio of non-matching to matching rows exceeds a set threshold, or the share of non-matching rows exceeds a pre-determined percentage, the tables are flagged or marked for manual review, described in step 1010 below.
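The flagging step can be sketched as follows. The 30% limit is an arbitrary stand-in for the "pre-determined percentage", and the function name and return shape are hypothetical; `matched` is assumed to be the set of row labels found in the similar historical tables.

```python
from typing import List, Set, Tuple


def flag_rows(current_rows: List[str], matched: Set[str],
              max_unmatched_share: float = 0.3) -> Tuple[List[Tuple[str, bool]], bool]:
    """Keep matched rows in order, move unmatched rows to the bottom (flagged),
    and report whether the table needs manual review.

    Returns ([(row label, is_flagged), ...], needs_review).
    """
    kept = [(row, False) for row in current_rows if row in matched]
    flagged = [(row, True) for row in current_rows if row not in matched]
    unmatched_share = len(flagged) / len(current_rows) if current_rows else 0.0
    return kept + flagged, unmatched_share > max_unmatched_share


rows, needs_review = flag_rows(["Revenue", "Restructuring", "Cost"], {"Revenue", "Cost"})
print(rows)          # [('Revenue', False), ('Cost', False), ('Restructuring', True)]
print(needs_review)  # True: 1 of 3 rows unmatched exceeds the assumed 30% limit
```

The boolean flag on each row is what a user interface would use to color the unmatched rows differently.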
  • the operation performed by the system in step 1006 creates a multi-table, row-matched schema for the current table. As noted, this may also be described as a stitched time series of tables for the current table.
  • the row mapping, or stitched table, schema is stored in a database as the default stitching schema for that table. Subsequent user modifications may create a new schema associated with a specific user identifier. As mentioned, the user can modify the default schema by moving flagged non-matching rows back to their original place in the table, requesting that the system merge a flagged row with the non-flagged rows, moving flagged rows wherever they want them in the table (e.g., to the top), or discarding them.
  • the system-created, default stitched tables are stored in the dictionary of the current table.
  • Both user-defined and system schemas can be configured for alerts so that when a new table is released that matches the saved schema, the new data is automatically added to the stored stitched tables and the user is notified of the addition.
  • the service provider, via the platform, reviews and manually modifies the schema of heavily flagged tables, that is, tables in which more than a pre-determined percentage of rows are flagged.
  • a table can also be brought to the attention of the service provider by the user; the user may have a specific reason or may simply want the service provider to audit the table.
  • the tables that are flagged earlier as having too many non-matching rows are still stitched, but may be characterized as insufficiently stitched tables. As such, they are manually reviewed or audited by the service provider who has advanced tools and user interfaces for doing so. During the audit, corrections and updates are made and the insufficiently stitched tables are completed and made into an acceptable time series and stored with the current table in the database and can be displayed or exported, as described below.
  • the user has options with regard to what can be viewed through the platform's user interface and exported into a spreadsheet. These options are shown in FIG. 11 .
  • One option is that the user can elect to see only the current table on the screen. This is shown in box 1102 .
  • the user can then export the current table to an external document, such as a spreadsheet, at box 1108. This is the table that was selected by the user at the beginning of FIG. 10 (step 1002), without any stitched tables.
  • Another option is the user can click through to each cell of the time series and stitched tables and have the source table load up in a popup or a window so that the values of the time series can be audited.
  • a table stitching engine may determine this in many cases by using an algorithm that uses document type and table headers.
  • the system also provides filters that allow year over year values to be added within the table.
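A year-over-year filter might compute the percentage change against the same quarter one year earlier, as in this sketch. The "Q1 2017" label format and the function name are assumptions for illustration.

```python
from typing import Dict, Optional


def year_over_year(series: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Percent change versus the same quarter one year earlier.

    Labels are assumed to look like 'Q1 2017'. The result is None when the
    prior-year value is missing or zero.
    """
    changes: Dict[str, Optional[float]] = {}
    for label, value in series.items():
        quarter, year = label.split()
        prior = series.get(f"{quarter} {int(year) - 1}")
        changes[label] = None if not prior else (value - prior) * 100 / prior
    return changes


print(year_over_year({"Q1 2016": 100.0, "Q1 2017": 110.0}))
```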
  • the system receives selection input for displaying the default stitched tables as described in FIG. 10 .
  • the default current table may have flagged (unmatched) rows at the bottom of the table (or wherever the system designer decides the default location should be) and may be shown in a different color.
  • the default table is shown with its time series of similar tables.
  • the default current table and the stitched tables can be exported into an external spreadsheet, again as shown in step 1108 .
  • the user can modify the default table, for example, by moving the flagged rows to a different location in the table or deleting them. This modified table and the stitched tables can be selected by the user and the system will display the modified table at step 1106 .
  • These tables can also be exported into a spreadsheet and utilized by the user outside of the system.
  • the dictionary for the table whether the original (step 1102 ), default stitched ( 1104 ) or modified ( 1106 ), is used to display the tabulated data and export into an external document.
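Export can be sketched with Python's standard `csv` module, writing CSV as a stand-in for a spreadsheet; producing a native .xlsx file would need a third-party library such as openpyxl. The function name is hypothetical, and the table is assumed to arrive as a list of rows with the header row first.

```python
import csv
import io
from typing import Iterable, Sequence


def to_csv(rows: Iterable[Sequence[object]]) -> str:
    """Serialize a table (header row first) as CSV text that spreadsheet
    applications can open; None cells are written as empty cells."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


table = [["Period", "Revenue"], ["Q1 2017", 1200.0], ["Q3 2016", None]]
print(to_csv(table), end="")
```

Writing to a string keeps the sketch testable; an export engine would instead pass an open file object to `csv.writer`.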
  • FIG. 12 is a block diagram of a system of the financial document intelligence platform in accordance with various embodiments.
  • the platform may be implemented as software executing on a user (e.g., a customer of the financial document intelligence service provider) system or may be operated by the service provider and offered as a service to users. In either case, the platform has various components, modules, and databases most of which have been referenced above and whose functionality has been shown in the flow diagrams.
  • a corpus of documents 1202 is inputted into a system or platform 1204 .
  • the documents are input one by one and the flow diagrams above explain the steps taken when one document is inputted, such as at step 902 , and when one table is selected, as in step 1002 .
  • a customer is likely to have inputted numerous documents over a period of time.
  • the only way to obtain a stitched time series of tables is for a user to input numerous documents over time, that is, to have a history of similar documents in the system.
  • a document is received by a document conversion engine 1206 .
  • This module converts the document to HTML. The document may be of any type, need not be a financial document, and may contain no tables at all.
  • a table extraction engine 1208 identifies and pulls any financial tables from the document.
  • module 1208 is also responsible for converting each table to a dictionary. As described above, several pre-processing steps ensure that only the dictionaries of valid financial tables are stored.
  • Component 1210 is a database used by system 1204 to store all data needed for creating stitched tables. It stores, among other types of data, dictionaries of tables, user data, and table and row data.
  • a table stitching engine 1212 creates the stitched tables. It takes a current table for an entity and, if historical data is available for that table and entity, creates the stitched tables.
  • Engine 1212 performs many of the steps in FIG. 10, such as identifying and verifying historical tables that are similar in type and schema to the current table. This logic is referred to as identification logic above. It also performs the row-by-row matching between all the verified similar tables, flags non-matching rows, and creates the time series by executing the row-matching algorithm. Once the table stitching is complete, data for the current table is written to the table dictionary, which is stored in database 1210.
  • Engine 1212 is in communication with a timeseries user interface module 1214 where the user can display various versions of the current table as described in FIG. 11 .
  • the user can preview the current table, the default stitched tables, or the modified stitched tables.
  • An export engine module 1216 is used to export tables to a spreadsheet for the user if the export feature is selected.
  • FIG. 13 is an illustration of a data processing system 1300 depicted in accordance with some embodiments and as shown in FIG. 12 .
  • Data processing system 1300 may be used to implement one or more computers used in a controller or other components of various systems described above.
  • data processing system 1300 includes communications framework 1302 , which provides communications between processor unit 1304 , memory 1306 , persistent storage 1308 , communications unit 1310 , input/output (I/O) unit 1312 , and display 1314 .
  • communications framework 1302 may take the form of a bus system.
  • Processor unit 1304 serves to execute instructions for software that may be loaded into memory 1306 .
  • Processor unit 1304 may be a number of processors, a multi-processor core, or some other type of processor, depending on the particular implementation.
  • Memory 1306 and persistent storage 1308 are examples of storage devices 1316 .
  • a storage device is any piece of hardware that is capable of storing information, such as, for example, without limitation, data, program code in functional form, and/or other suitable information either on a temporary basis and/or a permanent basis.
  • Storage devices 1316 may also be referred to as computer readable storage devices in these illustrative examples.
  • Memory 1306, in these examples, may be, for example, a random access memory or any other suitable volatile or non-volatile storage device.
  • Persistent storage 1308 may take various forms, depending on the particular implementation. For example, persistent storage 1308 may contain one or more components or devices.
  • persistent storage 1308 may be a hard drive, a flash memory, a rewritable optical disk, a rewritable magnetic tape, or some combination of the above.
  • the media used by persistent storage 1308 also may be removable.
  • a removable hard drive may be used for persistent storage 1308 .
  • Communications unit 1310, in these illustrative examples, provides for communications with other data processing systems or devices.
  • communications unit 1310 is a network interface card.
  • Input/output unit 1312 allows for input and output of data with other devices that may be connected to data processing system 1300 .
  • input/output unit 1312 may provide a connection for user input through a keyboard, a mouse, and/or some other suitable input device. Further, input/output unit 1312 may send output to a printer.
  • Display 1314 provides a mechanism to display information to a user.
  • Instructions for the operating system, applications, and/or programs may be located in storage devices 1316 , which are in communication with processor unit 1304 through communications framework 1302 .
  • the processes of the different embodiments may be performed by processor unit 1304 using computer-implemented instructions, which may be located in a memory, such as memory 1306 .
  • program code computer usable program code
  • computer readable program code that may be read and executed by a processor in processor unit 1304 .
  • the program code in the different embodiments may be embodied on different physical or computer readable storage media, such as memory 1306 or persistent storage 1308 .
  • Program code 1318 is located in a functional form on computer readable media 1320 that is selectively removable and may be loaded onto or transmitted to data processing system 1300 for execution by processor unit 1304 .
  • Program code 1318 and computer readable media 1320 form computer program product 1322 in these illustrative examples.
  • computer readable media 1320 may be computer readable storage media 1324 or computer readable signal media 1326 .
  • computer readable storage media 1324 is a physical or tangible storage device used to store program code 1318 rather than a medium that propagates or transmits program code 1318 .
  • program code 1318 may be transmitted to data processing system 1300 using computer readable signal media 1326 .
  • Computer readable signal media 1326 may be, for example, a propagated data signal containing program code 1318 .
  • Computer readable signal media 1326 may be an electromagnetic signal, an optical signal, and/or any other suitable type of signal. These signals may be transmitted over communications channels, such as wireless communications channels, optical fiber cable, coaxial cable, a wire, and/or any other suitable type of communications channel.
  • the different components illustrated for data processing system 1300 are not meant to provide architectural limitations to the manner in which different embodiments may be implemented.
  • the different illustrative embodiments may be implemented in a data processing system including components in addition to and/or in place of those illustrated for data processing system 1300 .
  • Other components shown in FIG. 13 can be varied from the illustrative examples shown.
  • the different embodiments may be implemented using any hardware device or system capable of running program code 1318 .

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Technology Law (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

A user is able to extract financial data, particularly tables, from a document. The table is stored and the user can compare the data in this table with data from similar tables from previous documents. The user can see how financial data has changed historically by looking only at financial tables from the same type of document, for example, only balance sheet tables from annual reports for a specific public company, over many years, and see how the values have changed or whether any new categories or types of data have been added or deleted. From the time series of financial data, the user can gain real intelligence into an entity's financial health.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims priority to provisional application No. 62/405,828, filed Oct. 7, 2016, which is incorporated herein by reference for all purposes.
  • BACKGROUND OF THE INVENTION 1. Field of the Invention
  • The present invention relates to financial software. More specifically, the invention relates to software for analyzing similar financial data from multiple documents over time, thereby gaining insights into the financial data.
  • 2. Description of the Related Art
  • Current financial and corporate document examination software platforms lack efficient and intuitive features for their users and cannot process unstructured financial data into coherent structures. The user experience of many of these tools does not facilitate quick and in-depth analysis of financial and corporate data, particularly where such data are contained in free-form text or in data tables that are specific to an industry or a single company. Users are therefore prevented from gaining meaningful insights into what the numbers and statements contained in these financial and corporate documents mean. There is, effectively, an “intelligence-gathering” limit that is reached with current tools. What is needed is a platform for facilitating analysis of similar and sometimes unstructured financial data over a period of time so that changes and trends can be easily detected.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a high-level diagram of a time series feature in accordance with one embodiment;
  • FIG. 2 shows a financial table with a column of marks along the left side for each row and a row of marks at the top of the table;
  • FIG. 3 is a screenshot showing how auditing modifies the table on the right pane to display a source table from an original document in accordance with one embodiment;
  • FIG. 4 is a block diagram showing a high-level view of similar tables;
  • FIG. 5 is a screenshot of the result when a user clicks Similar Tables in a document, showing tables from five years of quarterly filings presented in a split screen view;
  • FIG. 6 is a screenshot of one feature of a similar tables tool in accordance with one embodiment;
  • FIG. 7 is a screenshot of a stitched tables feature in accordance with one embodiment;
  • FIG. 8 is a screenshot of a melted tables feature in accordance with one embodiment;
  • FIG. 9 is a flow diagram of a process of pre-processing a document and creating dictionaries for tables in the document;
  • FIG. 10 is a flow diagram of a process of creating a time series of tables in accordance with one embodiment;
  • FIG. 11 is a flow diagram showing options of what can be viewed through the platform user interface and exported into a spreadsheet in accordance with one embodiment;
  • FIG. 12 is a block diagram of a system of the financial document intelligence platform in accordance with various embodiments; and
  • FIG. 13 is a block diagram illustrating an example of a computer system capable of implementing various processes in the described embodiments.
  • SUMMARY OF THE EMBODIMENTS
  • In one aspect of the invention, a method of extracting financial data from a document and analyzing similar financial data from older documents to enhance understanding of the document is described. A financial document intelligence system receives a document containing unstructured data. Tables in the document are identified and extracted using a parsing engine. Each table is converted to a dictionary. The system then verifies that data in a table is financial data and, once verified, the dictionaries for valid financial data tables are stored. A series of stitched tables, also referred to as a time series, is created for a selected table using a row-based, “next best” matching algorithm. Tables that are similar to the selected table with respect to type and schema are identified and used to create a time series for the selected table. This time series allows users to easily see how certain financial data has changed over time.
  • DETAILED DESCRIPTION
  • Example embodiments of methods and systems for examining and analyzing financial and corporate documents are described. These examples and embodiments are provided solely to add context and aid in the understanding of the invention. Thus, it will be apparent to one skilled in the art of software and financial document processing that the present invention may be practiced without some or all of the specific details described herein. In other instances, well-known concepts have not been described in detail in order to avoid unnecessarily obscuring the present invention. Although these embodiments are described in sufficient detail to enable one skilled in the art to practice the invention, these examples, illustrations, and contexts are not limiting, and other embodiments may be used and changes may be made without departing from the spirit and scope of the invention.
  • One aspect of the present invention is the ability to perform what is referred to as time series and table extraction. Time series is a function that allows users to select line item rows from a document or HTML/XBRL table and automatically retrieve historical values of those line items from previous documents. A high-level diagram of the time series feature is shown in FIG. 1. The user selects either a specific row from a table to be extracted or the entire table. In one embodiment, the selection of a row can also be done automatically by the software. Logic in the software finds that particular table in previous versions of the document, extracts those lines, and builds what is referred to as a composite table.
  • The user interface for a time series table allows the user to perform an audit easily by enabling the user to go back to the table from which the highlighted figure originated. In one embodiment, the system can auto-update the composite table whenever a new table is created. FIG. 2 shows a table with a column of marks along the left side for each row and a row of marks at the top of the table. In one embodiment, line items from the original table become columns in the composite table. Each value can be audited by clicking on the table row on the left side of the screen. As shown in FIG. 3, auditing modifies the table on the right pane to display the source table from the original document. The user can also edit the resulting table on the left. The entire table may be exported to a spreadsheet or saved in another suitable format. Additionally, the resulting table can be set to auto-update with new values when new versions of the document are made available.
  • In one embodiment, the time series and table extraction features are implemented in the following manner. As a financial or other type of document is received in the system, it goes through various preprocessing steps. One of the key stages involves identification of tabular data within the document. The tabular data is of particular significance because the tables provide a quick structured summary of data mentioned in different places within the document.
  • In the next step, each table is fed to a parsing engine, which creates a text skeleton of the table and divides it into parts such as headers, data headers, terms, and values. Tables in SEC filings, for example, are highly non-standardized; consequently, the system goes through a number of preprocessing steps to correctly map each term value to its corresponding column. If the table contains data for different periods, the value of each term is identified and saved for each period.
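The parsing engine's internals are not specified. As a minimal sketch, assuming a table has already been reduced to rows of cell text whose first row holds the period headers, a skeleton that maps each term to its per-period values could be built as follows. The helper names `parse_value` and `build_skeleton` are hypothetical, and only a few common number formats (thousands separators, accounting parentheses, a leading dollar sign) are handled.

```python
import re
from typing import Optional

# Assumed layout: first row = period headers, later rows = term + one cell per period.
_NUMBER = re.compile(r"^\(?\$?\s*-?\d[\d,]*(\.\d+)?\)?$")


def parse_value(cell: str) -> Optional[float]:
    """Convert a cell such as '1,234', '(56)' or '$ 7.5' to a float; None if not numeric."""
    text = cell.strip()
    if not _NUMBER.match(text):
        return None  # blanks, dashes and text cells carry no value
    negative = text.startswith("(") and text.endswith(")")  # accounting-style negative
    number = float(re.sub(r"[^\d.\-]", "", text))
    return -number if negative else number


def build_skeleton(rows: list[list[str]]) -> dict:
    """Split a raw table into its period headers and a term -> {period: value} map."""
    header, *body = rows
    periods = [cell.strip() for cell in header[1:]]
    terms: dict[str, dict[str, float]] = {}
    for row in body:
        term = row[0].strip()
        if not term:
            continue
        values = {}
        for period, cell in zip(periods, row[1:]):
            value = parse_value(cell)
            if value is not None:
                values[period] = value
        terms[term] = values
    return {"periods": periods, "terms": terms}
```

Saving a value per period, as the text describes, is what later lets a term be followed across documents by date.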
  • This skeleton data of each table is then stored in a database from where it can be quickly fetched on demand. The system may maintain these records for all the historical 10-Qs, 10-Ks, 8-K Earnings, XBRL documents and other SEC filings.
  • Since the data reported in the filings remain similar from quarter to quarter, the corresponding tables can be identified and matched across documents. The identification logic of the present invention matches tables on the basis of the terms used in them. Because the order of tables often varies across documents, and differs particularly between 10-Ks and 10-Qs, matching on terms rather than on table position yields good matches.
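The term matching algorithm itself is not specified. One order-independent way to realize it, shown here as a hedged sketch, is to compare the sets of terms in two tables with Jaccard similarity (shared terms divided by all distinct terms); Jaccard is a swapped-in technique, and the function names are hypothetical.

```python
def normalize(term: str) -> str:
    """Lower-case and collapse whitespace so cosmetic differences do not block a match."""
    return " ".join(term.lower().split())


def term_similarity(terms_a: list[str], terms_b: list[str]) -> float:
    """Jaccard similarity of two tables' term sets; table and row order are ignored."""
    a = {normalize(t) for t in terms_a}
    b = {normalize(t) for t in terms_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_tables(source_terms: list[str], candidates: dict[str, list[str]], k: int = 1):
    """Return the k (table_name, score) pairs most similar to the source table."""
    scored = [(name, term_similarity(source_terms, terms)) for name, terms in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because the score depends only on which terms appear, a table that sits in a different position in a 10-K than in a 10-Q still scores highly against its counterpart.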
  • As the user opens a document in the platform, the user has an option to use the time series and table extraction functions on each identified table. The table extraction function identifies the most similar table found in each previous document, based on the term matching algorithm, and returns them.
  • Once the results of a time series extraction are returned, they are presented to the user within a table. When the user clicks on a value, an auditing pane on the right side of the screen loads and displays the corresponding table from which the value was pulled. Extracted values are color-coded so that they are easy to identify within large tables. This allows a user to quickly audit the entire table to ensure that the values produced by the algorithm are correct. If multiple tables were returned for the document/value that the user is auditing, the additional tables are displayed below. The user can easily replace values by clicking in any portion of the table and typing the new value or by selecting other matched values from a dropdown menu in the auditing pane.
  • Users are normally looking for quarterly values. However, values reported within filings are often for entire fiscal years or are cumulative year-to-date summations of quarters (3 months, 6 months, 9 months, etc.). Time series allows users to quickly transform these summation values to quarterly values for the entire table by checking specific boxes. If a summation transformation is required only for a single document/value, the user can simply click the YTD box to the left of that value. Once the user is satisfied with the output of the table, they have the option to export the entire table to a spreadsheet. They may also save the output of the time series extraction on the system or open it within a visualization engine, described below.
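The year-to-date transformation amounts to differencing consecutive cumulative figures: Q1 is the 3-month value, Q2 is the 6-month value minus the 3-month value, and so on, with Q4 taken as the full-year value minus the 9-month value. A minimal sketch, assuming one fiscal year's cumulative values keyed by months covered (the function name is hypothetical):

```python
def ytd_to_quarterly(ytd: dict[int, float]) -> dict[int, float]:
    """Convert cumulative year-to-date values into discrete quarterly values.

    `ytd` maps months covered (3, 6, 9, 12) to the reported cumulative amount
    for one fiscal year; the 12-month entry is the full-year figure.
    """
    quarterly: dict[int, float] = {}
    prev_months, prev_total = 0, 0.0
    for months in sorted(ytd):
        # Differencing is only valid when no intermediate period is missing.
        if months != prev_months + 3:
            raise ValueError(f"missing year-to-date period before {months} months")
        quarterly[months // 3] = ytd[months] - prev_total
        prev_months, prev_total = months, ytd[months]
    return quarterly
```

For example, cumulative values of 100, 250, 420, and 600 yield quarterly values of 100, 150, 170, and 180.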
  • The time series function allows the user to select a number of terms from the source table. The function identifies the top three similar tables in each previous document (on the basis of the term match algorithm) and then looks for the exact term the user has requested. If the term is found, the numerical value that corresponds to the latest date in the table is fetched. The output of the time series function is a list of quarter-value pairs for the term across documents for the previous five years. The user has an option to load values from older documents if earlier data is wanted.
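The lookup described above can be sketched as follows, under several assumptions not found in the text: each previous document is a dictionary shaped like `{"quarter": ..., "tables": {name: {term: {iso_date: value}}}}`, documents are ordered newest first, Jaccard similarity stands in for the term match algorithm, and `limit=20` approximates five years of quarterly filings. All names are hypothetical.

```python
from typing import Optional


def jaccard(a, b) -> float:
    """Share of distinct terms the two tables have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def time_series(term, source_terms, documents, top_n=3, limit=20):
    """Return (quarter, value) pairs for `term`, one per previous document.

    Only the `top_n` tables most similar to the source table are searched,
    and only an exact term match counts; otherwise the value is None.
    """
    series: list[tuple[str, Optional[float]]] = []
    for doc in documents[:limit]:
        ranked = sorted(doc["tables"].values(),
                        key=lambda table: jaccard(source_terms, table),
                        reverse=True)[:top_n]
        value = None
        for table in ranked:
            if table.get(term):
                latest = max(table[term])  # ISO dates sort chronologically
                value = table[term][latest]
                break
        series.append((doc["quarter"], value))
    return series
```

A None entry marks a quarter where the exact term was not found, which is the case the renamed-term fallback addresses.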
  • A term may not be used in exactly the same way as in the previous table. This could be because the company has changed the nomenclature for the reported term. For example, a company that started reporting “Total Members” of its service may later change it to “Total Membership.”
  • For such cases, the system first returns an empty value for the missing quarter. The system then goes back to find the tables similar to the source table in the document corresponding to the missing quarter, and searches these similar tables for terms that are similar to the term the user has requested. If a term is matched with a high degree of certainty, the system finds the corresponding term value for the latest date in the table and returns it with a warning that the actual term and value may be different.
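How term similarity is measured is not stated. One way to sketch the fallback is with Python's standard `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher` ratio and drops those below a cutoff. The ratio is a stand-in for the unspecified measure, and both the 0.8 cutoff and the table shape (`{term: {iso_date: value}}`) are assumptions.

```python
import difflib


def fallback_term_value(requested: str, similar_tables: list[dict], cutoff: float = 0.8):
    """Look for a renamed version of `requested` in tables similar to the source table.

    Returns (matched_term, value, warning), or None when nothing is close enough.
    """
    for table in similar_tables:
        by_lower = {term.lower(): term for term in table}
        close = difflib.get_close_matches(requested.lower(), list(by_lower), n=1, cutoff=cutoff)
        if close:
            term = by_lower[close[0]]
            if table[term]:
                latest = max(table[term])  # value for the latest date in the table
                warning = f"'{requested}' not found; returning value for similar term '{term}'"
                return term, table[term][latest], warning
    return None
```

With this measure, "Total Members" and "Total Membership" score about 0.9, well above the cutoff, whereas unrelated line items score far lower.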
  • The similar tables feature allows a user to click a button above a document table to load up the same table from previous filings. A high-level view of similar tables is shown in FIG. 4. For example, a user may click Similar Tables on the Income Statement table in a company's quarterly filing and the tool automatically fetches the Income Statement table from five years of quarterly filings and presents them to the user in a split screen view. This is shown in FIG. 5.
  • The similar tables tool identifies these similar, historical tables by applying an algorithm to take the contents of the original table and statistically compare them to the contents of all tables in historical filings of the same type. This is shown in FIG. 6. The table with the highest statistical match is presented as the matched similar table. Once the tool has presented all the similar tables to a user, the user may export the set of tables to a spreadsheet or perform advanced analysis/export through time series, stitched tables and melted tables, features that are described below.
  • Another feature of the invention is referred to as stitched tables. This tool generalizes the concepts of time series and similar tables extraction to join entire tables at once instead of on a line-by-line basis. Line items that do not match are preserved in sequence rather than discarded. Duplicate line items are handled separately. This method has the advantage of being computationally efficient for large volumes of tables. It also handles, in a user-friendly way, the constant changes in financial reporting as business needs evolve over time, for example, due to reorganizations, acquisitions, and new or discontinued disclosures. An example is shown in FIG. 7.
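A minimal sketch of whole-table stitching, assuming each period's table is an ordered list of `(label, value)` pairs. Duplicate labels are distinguished by an occurrence counter, and a label not seen before is inserted immediately after the row that preceded it in its own table, so it keeps its position rather than being dropped. The duplicate-handling scheme and function names are illustrative, not the patented method, and `list.index` keeps the sketch simple rather than fast.

```python
def keyed_rows(rows):
    """Attach an occurrence count so repeated labels (e.g. two 'Other' rows) stay distinct."""
    seen: dict[str, int] = {}
    keyed = []
    for label, value in rows:
        seen[label] = seen.get(label, 0) + 1
        keyed.append(((label, seen[label]), value))
    return keyed


def stitch(tables: dict[str, list[tuple[str, float]]]):
    """Join whole tables {period: [(label, value), ...]} into one row-aligned grid.

    Returns (order, grid): `order` lists row keys, grid[key][period] holds values.
    """
    order: list[tuple[str, int]] = []
    grid: dict[tuple[str, int], dict[str, float]] = {}
    for period, rows in tables.items():
        anchor = -1  # index in `order` of this table's previous row
        for key, value in keyed_rows(rows):
            if key in grid:
                anchor = order.index(key)
            else:
                anchor += 1
                order.insert(anchor, key)  # keep non-matching rows in sequence
                grid[key] = {}
            grid[key][period] = value
    return order, grid
```

In the test data, a "Legacy" line item that appears only in 2016 lands between "Revenue" and "Net income" instead of being discarded.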
  • A feature related to stitched tables may be referred to as melted tables. These tables generalize the concept of stitched tables to encompass multi-dimensional tables where time is not represented in a single column but rather by the entire table. Columns are reshaped into rows and stitched together with their counterparts across time. This has particular applications in a variety of modeling contexts, for example, from debt maturity schedules to property-level ownership breakdowns. An example is shown in FIG. 8.
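Melting reshapes each wide table into long records, one per (row, column) cell, so that a cell such as "Term loan maturing in 2018" can be matched to the same cell in other periods' tables. A minimal sketch, assuming each table is a header row followed by data rows; the names are hypothetical.

```python
def melt(period: str, table: list[list]) -> list[dict]:
    """Reshape one wide table (header row + data rows) into long per-cell records."""
    header, *body = table
    return [
        {"period": period, "row": row[0], "column": column, "value": value}
        for row in body
        for column, value in zip(header[1:], row[1:])
    ]


def melt_and_stitch(tables_by_period: dict[str, list[list]]) -> dict:
    """Melt every period's table and join each cell with its counterparts across periods.

    Returns {(row, column): {period: value}}, one time series per cell position.
    """
    series: dict = {}
    for period, table in tables_by_period.items():
        for record in melt(period, table):
            key = (record["row"], record["column"])
            series.setdefault(key, {})[period] = record["value"]
    return series
```

For a debt maturity schedule, each (instrument, maturity year) pair becomes its own series across fiscal years.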
  • Methods and systems for gathering intelligence and understanding financial documents are described in the figures below. FIG. 9 is a flow diagram of a process of pre-processing a document and creating dictionaries for tables in the document. At step 902 the system receives a file of some type of document. The file can come from one of a wide range of sources and may not necessarily be a financial document. For example, it can be a PDF, a PowerPoint document, user notes, an Excel spreadsheet, and so on. In typical cases, the document is some type of financial document such as a 10-K, 10-Q, an annual report, or some other type of conventional financial document for a public company, but may not be.
  • The general goal is to extract financial data formatted as tables from a corpus of documents containing unstructured data. The system is a computing system that executes software provided and managed by a third-party financial intelligence service provider. The document is inputted, in most cases, by a client of the service provider. After the file is entered, the first operation by the system is converting it to a suitable format for further processing. In one embodiment, the format is HTML. In other embodiments, different formats can be utilized.
  • At step 904 the system identifies tables in the document and extracts them, separating the tables from the rest of the non-table (or non-tabular) data. This is done by a parsing engine in the system and, in one embodiment, may be implemented by searching for specific tags, such as “TABLE”. In other implementations, the parsing engine may search unstructured text for keywords associated with financial data. Some of the tables may not contain financial or numerical data; in other words, they may not be financial tables. For example, a table may contain only text data, such as names, locations, product names, and so on. However, in one embodiment, these tables are still extracted at step 904. The parsing engine is also able to identify and include footnotes, which may be structurally part of the table or contiguous to it. The parsing engine is also capable of identifying and processing multi-columnar tables, rendering complex latitudinal (wide) data structures into simplified longitudinal (long) data structures that may be more easily stored and manipulated programmatically.
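The tag-based embodiment can be sketched with Python's standard `html.parser.HTMLParser`, which lower-cases tag names so that `TABLE` and `table` are treated alike. This is a minimal sketch: nested tables, `rowspan`/`colspan`, and footnote capture are deliberately not handled, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in an HTML document as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables
        self._rows = None  # rows of the table currently being read
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._rows.append([])
        elif tag in ("td", "th") and self._rows is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self._rows:
                # Join fragments and collapse whitespace inside the cell.
                self._rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append([row for row in self._rows if row])
            self._rows = None

    def handle_data(self, data):
        if self._cell is not None:  # text outside table cells is ignored
            self._cell.append(data)


def extract_tables(html: str):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Text outside the tables is discarded here; in the described system, it is kept separately as non-tabular data.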
  • At step 906, the system converts each extracted table to what is referred to as a dictionary of table information. In one embodiment, the dictionary includes table data values, number of columns, headers, source document location, relationships between data and column headers (e.g., which column the data in a given row came from), and other data. A sample dictionary is {"docid": "123abc", "currency": "USD", "section": "Calculation of Net Leverage Ratio", "period": "Q1, 2017", "value": 18890, "field": "calculation of net leverage ratiototal debt", "alias": "net_leverage_ratio: calculation . . .", "subsection": . . ., "table": "Net Leverage Ratio", "ticker": "amt", "unit": null}.
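A hedged sketch of how a parsed table might be flattened into dictionaries that carry the keys in the sample above, one per (term, period) value. The way `field` is composed (section and term joined with " / "), the `ticker` key holding the entity's ticker, and the function name are all assumptions made for illustration.

```python
def table_to_dictionaries(doc_id, ticker, table_name, section, terms,
                          currency="USD", unit=None):
    """Flatten {term: {period: value}} into one dictionary per reported value."""
    records = []
    for term, values in terms.items():
        for period, value in values.items():
            records.append({
                "docid": doc_id,
                "currency": currency,
                "section": section,
                "period": period,
                "value": value,
                "field": f"{section} / {term}".lower(),  # assumed naming scheme
                "table": table_name,
                "ticker": ticker,
                "unit": unit,
            })
    return records
```

Keeping the document id and period on every record is what later allows a value to be traced back to its source table during an audit.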
  • Step 906 is done for each table extracted from the source document. Once all the tables have been converted to dictionaries, the system scans or examines each table, more specifically the dictionary for each table, to determine at step 908 whether it contains valid financial data. For example, the system may look for null values or all-text data, two indicators that the table does not contain financial data, which is the only data relevant to embodiments of the present invention. In one embodiment, the system uses what is referred to as identification logic to spot valid financial tables. For example, it can look for commonly used financial terms, for instance as column headers, or look for actual numerical data. This is done for each dictionary created at step 906.
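A minimal heuristic version of this validity check is shown below. It rejects tables whose values are mostly nulls or text and then requires at least one common financial keyword among the field names. The keyword list and the 50% numeric threshold are illustrative assumptions, not values from the text.

```python
FINANCIAL_TERMS = {"total", "revenue", "assets", "liabilities", "income",
                   "cash", "debt", "equity", "expenses", "net"}  # illustrative only


def is_number(value) -> bool:
    """True for int/float values; bool is excluded even though it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_financial_table(records: list[dict], min_numeric_share: float = 0.5) -> bool:
    """Heuristic check that a table's dictionaries hold financial data."""
    if not records:
        return False
    numeric = sum(is_number(record.get("value")) for record in records)
    if numeric / len(records) < min_numeric_share:
        return False  # mostly nulls or text
    words = {word for record in records
             for word in str(record.get("field", "")).lower().split()}
    return bool(words & FINANCIAL_TERMS)
```

Tables that fail the check correspond to the dictionaries discarded at step 910.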
  • At step 910, the system stores the dictionary for each valid financial data table. The other dictionaries and tables are discarded. The dictionaries and financial tables are written to a central database. From the database, the tables may eventually be displayed in the user interface of the system. For example, a valid financial data table from the source document can be displayed to the user. As described below in FIG. 10, if there is a history of tables that are similar to the table selected by the user, a stitched time series of these similar tables with the selected table may be displayed to the user. One version of the user interface of the system also simply displays the previous tables side by side with the source table, thereby enabling rapid, paginated review. The document pre-processing stage is complete after step 910.
  • FIG. 10 is a flow diagram of a process of creating a time series of tables for a selected table in accordance with one embodiment. As noted, this may also be referred to as stitching a currently selected table with similar tables from documents previously submitted by the user. At step 1002 the system begins by identifying a current table (i.e., a table selected by the user) for a current entity. The dictionary for the selected table is retrieved from the database. The term “entity” can refer to anything for which the service provider has data; it provides an umbrella context for the table. It can be characterized as the top of a schema for a corpus of documents, where all tables (and other data) are subordinate to the entity. For example, an entity could be a private company, a public company's stock ticker, an institution such as the Federal Reserve Bank, a government agency, and so on. As noted, it can be anything for which table data has been collected and stored.
  • At step 1004 the system identifies tables from previous documents for that entity that are similar to the current table, using data contained in the dictionary for the selected table. In one embodiment, similar tables are identified by looking at table names (e.g., “Balance Sheet”) in previous documents, such as annual reports, for the current entity.
  • In one embodiment, this may be done by performing a row-based, “next best” matching algorithm, which matches the list of rows for the currently selected table against the list of rows for all other tables in previous documents. The best match is the previous table whose number of matched rows is closest to the total number of rows of the currently selected table. At this stage, the system has identified and verified tables that are essentially the same as the current table but from older documents (e.g., from last month, last quarter, last fiscal year, etc.).
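The "next best" rule as stated can be sketched directly: for each previous document, keep the table whose count of matched rows is closest to the current table's row count. The input shape (`{doc_id: {table_name: [row labels]}}`), exact-label matching, and skipping documents with no overlapping rows are assumptions.

```python
def next_best_match(current_rows: list[str], previous_tables: dict[str, dict[str, list[str]]]):
    """Pick, per previous document, the table whose matched-row count is closest
    to the current table's row count. Returns {doc_id: (table_name, matched_rows)}."""
    wanted = set(current_rows)
    best = {}
    for doc_id, tables in previous_tables.items():
        candidates = []
        for name, rows in tables.items():
            matched = len(wanted & set(rows))
            # Gap between the current row count and the matched count; smaller is better.
            candidates.append((abs(len(current_rows) - matched), name, matched))
        if candidates:
            _, name, matched = min(candidates)
            if matched:  # a table sharing no rows is not treated as a match
                best[doc_id] = (name, matched)
    return best
```

Because matched rows can never exceed the current table's row count, minimizing the gap is the same as choosing the table that shares the most rows.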
  • Not all rows in all the tables may match. In some cases, perhaps because of a merger between two companies or an internal accounting methodology update that changes field names, there may be more than a few non-matching row pairs. In any of these scenarios, at step 1006, the non-matching rows are flagged and included in the tables; they are not discarded by the system. In one embodiment, the non-matching rows are moved to the bottom of the table and displayed in a different color from the matching rows. If the ratio of non-matching to matching rows exceeds a certain value, or non-matching rows exceed a pre-determined percentage of all rows, the tables are flagged or marked for manual review, described in step 1010 below.
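A sketch of the flag-and-keep step in the embodiment that moves non-matching rows to the bottom. It returns every row with a flag and decides whether the table needs manual review. The 25% threshold stands in for the unspecified pre-determined percentage, and the function name is hypothetical.

```python
def flag_rows(current_rows: list[str], matched_rows: set[str], max_unmatched_share: float = 0.25):
    """Keep every row, move non-matching rows to the bottom, and decide on manual review.

    Returns (ordered_rows, review_needed), where ordered_rows is a list of
    (label, flagged) pairs.
    """
    matched = [(row, False) for row in current_rows if row in matched_rows]
    unmatched = [(row, True) for row in current_rows if row not in matched_rows]
    share = len(unmatched) / len(current_rows) if current_rows else 0.0
    return matched + unmatched, share > max_unmatched_share
```

Flagged rows stay in the output, so the user can later move, merge, or discard them as described for the default schema.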
  • The operation performed by the system in step 1006 creates a multi-table, row-matched schema for the current table. As noted, this may also be described as a stitched time series of tables for the current table. At step 1008 the row mapping, or stitched table, schema is stored in a database as the default stitching schema for that table. Subsequent user modifications may create a new schema associated with a specific user identifier. The user can modify the default schema by moving non-matching flagged rows back to their original place in the table, by requesting that the system merge the flagged rows with the non-flagged rows, by moving them wherever they want in the table (e.g., to the top), or by discarding them.
  • As noted, at step 1008 the system-created, default stitched tables are stored in the dictionary of the current table. Both user-defined and system schemas can be configured for alerts so that when a new table is released that matches the saved schema, the new data is automatically added to the stored stitched tables and the user is notified of the addition.
  • At step 1010 the service provider, via the platform, manually reviews and modifies the schema of heavily flagged tables, that is, tables that have over a pre-determined percentage of flagged rows. A table can also be brought to the attention of the service provider by the user; the user may have a specific reason or may simply want the service provider to audit the table. The tables flagged earlier as having too many non-matching rows are still stitched, but may be characterized as insufficiently stitched tables. As such, they are manually reviewed or audited by the service provider, who has advanced tools and user interfaces for doing so. During the audit, corrections and updates are made, and the insufficiently stitched tables are completed, made into an acceptable time series, and stored with the current table in the database, from which they can be displayed or exported, as described below.
  • Once the pre-processing and time series creation are complete, the user has options with regard to what can be viewed through the platform's user interface and exported into a spreadsheet. These options are shown in FIG. 11. One option is that the user can elect to see only the current table on the screen. This is shown in box 1102. The user can then export the current table to an external document, such as a spreadsheet, at box 1108. This is the table that was selected by the user at the beginning of FIG. 10 and is the table without any stitched tables. Another option is that the user can click through to each cell of the time series and stitched tables and have the source table load in a popup or a window so that the values of the time series can be audited. Another option is that the user can apply filters to the values of the time series and stitched tables to handle instances where a reported number is a year-to-date number and must be adjusted to match the quarter. A table stitching engine, described below, may determine this in many cases by using an algorithm that uses document type and table headers. The system also provides filters that allow year-over-year values to be added within the table.
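A year-over-year filter can be sketched as comparing each quarter with the same quarter one year earlier. The sketch assumes a series keyed by `(year, quarter)` and reports the change as a percentage, with None where the prior-year value is missing or zero. The key shape, the percentage form, and the function name are assumptions.

```python
def add_year_over_year(series: dict[tuple[int, int], float]) -> dict:
    """Map each (year, quarter) to (value, yoy_percent) against the same quarter a year earlier."""
    result = {}
    for (year, quarter), value in series.items():
        prior = series.get((year - 1, quarter))
        # Skip the comparison when there is no usable prior-year value.
        yoy = (value - prior) / prior * 100 if prior else None
        result[(year, quarter)] = (value, yoy)
    return result
```

For instance, a Q1 value of 125 against a prior-year Q1 of 100 yields a year-over-year change of 25%.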
  • At step 1104 the system receives selection input for displaying the default stitched tables as described in FIG. 10. As noted, the default current table may have flagged (unmatched) rows at the bottom of the table (or wherever the system designer decides the default location should be) and may be shown in a different color. The default table is shown with its time series of similar tables. The default current table and the stitched tables can be exported into an external spreadsheet, again as shown in step 1108. As described above, the user can modify the default table, for example, by moving the flagged rows to a different location in the table or deleting them. This modified table and the stitched tables can be selected by the user and the system will display the modified table at step 1106. These tables can also be exported into a spreadsheet and utilized by the user outside of the system. In one embodiment, the dictionary for the table, whether the original (step 1102), default stitched (1104) or modified (1106), is used to display the tabulated data and export into an external document.
  • FIG. 12 is a block diagram of a system of the financial document intelligence platform in accordance with various embodiments. The platform may be implemented as software executing on a user (e.g., a customer of the financial document intelligence service provider) system or may be operated by the service provider and offered as a service to users. In either case, the platform has various components, modules, and databases most of which have been referenced above and whose functionality has been shown in the flow diagrams.
  • A corpus of documents 1202 is inputted into a system or platform 1204. The documents are input one by one and the flow diagrams above explain the steps taken when one document is inputted, such as at step 902, and when one table is selected, as in step 1002. However, a customer is likely to have inputted numerous documents over a period of time. As described above, the only way to obtain a stitched time series of tables is for a user to input numerous documents over time, that is, to have a history of similar documents in the system.
  • In financial document intelligence system 1204, a document is received by a document conversion engine 1206. This module converts the document to HTML; the document may be of any type, may not even be a financial document, and may contain no tables. A table extraction engine 1208 identifies and pulls any financial tables from the document. In one embodiment, module 1208 is also responsible for converting each table to a dictionary. As described above, several pre-processing steps ensure that only the dictionaries of valid financial tables are stored.
  • Component 1210 is a database used by system 1204 to store all data needed for creating stitched tables. It stores, among other types of data, dictionaries of tables, user data, and table and row data. A table stitching engine 1212 creates the stitched tables: it takes a current table for an entity and, if historical data is available for that table and entity, creates the stitched tables. Engine 1212 performs many of the steps in FIG. 10, such as identifying and verifying historical tables that are similar in type and schema to the current table; this logic is referred to above as identification logic. Engine 1212 also performs row-by-row matching between all of the verified similar tables using the row-matching algorithm, flags non-matching rows, and creates the time series. Once table stitching is complete, the data for the current table is stored in its table dictionary, which is saved in database 1210.
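  • The row-matching algorithm itself is not specified in this section. The sketch below shows one simple way such stitching could work, assuming each table dictionary maps a row label to a mapping of period labels to values. The names `normalize`, `label_overlap`, and `stitch`, and the use of a 0.5 Jaccard overlap of row labels as a stand-in for the identification logic, are hypothetical. Rows match when their labels are equal after lowercasing and stripping punctuation. Current rows that match no row in any verified historical table are flagged, and the current filing's values take precedence where periods overlap (e.g., restated figures).

```python
import re
from typing import Dict, List, Optional, Tuple

Table = Dict[str, Dict[str, Optional[float]]]  # hypothetical: label -> {period -> value}


def normalize(label: str) -> str:
    """Lower-case the label and drop punctuation so 'REVENUE:' matches 'Revenue'."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", label.lower()).split())


def label_overlap(a: Table, b: Table) -> float:
    """Jaccard similarity of normalized row labels (hypothetical schema check)."""
    ka = {normalize(x) for x in a}
    kb = {normalize(x) for x in b}
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def stitch(current: Table, history: List[Table],
           min_overlap: float = 0.5) -> Tuple[Table, List[str]]:
    """Stitch `current` with the historical tables that pass the overlap check.

    `history` is ordered oldest first. Returns (time_series, flagged), where
    `flagged` lists current rows that matched no row in any verified table.
    """
    similar = [h for h in history if label_overlap(current, h) >= min_overlap]
    series: Table = {}
    flagged: List[str] = []
    for label, values in current.items():
        key = normalize(label)
        merged: Dict[str, Optional[float]] = {}
        matched = False
        for h in similar:  # later (newer) tables overwrite earlier periods
            for h_label, h_values in h.items():
                if normalize(h_label) == key:
                    merged.update(h_values)
                    matched = True
        merged.update(values)  # the current filing wins where periods overlap
        # Assumes period labels sort chronologically as strings (e.g. "2015").
        series[label] = dict(sorted(merged.items()))
        if not matched:
            flagged.append(label)
    return series, flagged
```

In this sketch a historical table that shares no row labels with the current table (for example, a balance sheet offered alongside an income statement) fails the overlap check and is ignored, while a row that appears only in the current filing is returned in `flagged` so that the display step can place it at the bottom of the default table.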
  • Engine 1212 is in communication with a timeseries user interface module 1214 where the user can display various versions of the current table as described in FIG. 11. The user can preview the current table, the default stitched tables, or the modified stitched tables. An export engine module 1216 is used to export tables to a spreadsheet for the user if the export feature is selected.
  • FIG. 13 is an illustration of a data processing system 1300 depicted in accordance with some embodiments and as shown in FIG. 12. Data processing system 1300 may be used to implement one or more computers used in a controller or other components of various systems described above. In some embodiments, data processing system 1300 includes communications framework 1302, which provides communications between processor unit 1304, memory 1306, persistent storage 1308, communications unit 1310, input/output (I/O) unit 1312, and display 1314. In this example, communications framework 1302 may take the form of a bus system.
  • Processor unit 1304 serves to execute instructions for software that may be loaded into memory 1306. Processor unit 1304 may be a number of processors, a multi-processor core, or some other type of processor, depending on the particular implementation.
  • Memory 1306 and persistent storage 1308 are examples of storage devices 1316. A storage device is any piece of hardware that is capable of storing information, such as, for example, without limitation, data, program code in functional form, and/or other suitable information either on a temporary basis and/or a permanent basis. Storage devices 1316 may also be referred to as computer readable storage devices in these illustrative examples. Memory 1306, in these examples, may be, for example, a random access memory or any other suitable volatile or non-volatile storage device. Persistent storage 1308 may take various forms, depending on the particular implementation. For example, persistent storage 1308 may contain one or more components or devices. For example, persistent storage 1308 may be a hard drive, a flash memory, a rewritable optical disk, a rewritable magnetic tape, or some combination of the above. The media used by persistent storage 1308 also may be removable. For example, a removable hard drive may be used for persistent storage 1308.
  • Communications unit 1310, in these illustrative examples, provides for communications with other data processing systems or devices. In these illustrative examples, communications unit 1310 is a network interface card.
  • Input/output unit 1312 allows for input and output of data with other devices that may be connected to data processing system 1300. For example, input/output unit 1312 may provide a connection for user input through a keyboard, a mouse, and/or some other suitable input device. Further, input/output unit 1312 may send output to a printer. Display 1314 provides a mechanism to display information to a user.
  • Instructions for the operating system, applications, and/or programs may be located in storage devices 1316, which are in communication with processor unit 1304 through communications framework 1302. The processes of the different embodiments may be performed by processor unit 1304 using computer-implemented instructions, which may be located in a memory, such as memory 1306.
  • These instructions are referred to as program code, computer usable program code, or computer readable program code that may be read and executed by a processor in processor unit 1304. The program code in the different embodiments may be embodied on different physical or computer readable storage media, such as memory 1306 or persistent storage 1308.
  • Program code 1318 is located in a functional form on computer readable media 1320 that is selectively removable and may be loaded onto or transmitted to data processing system 1300 for execution by processor unit 1304. Program code 1318 and computer readable media 1320 form computer program product 1322 in these illustrative examples. In one example, computer readable media 1320 may be computer readable storage media 1324 or computer readable signal media 1326.
  • In these illustrative examples, computer readable storage media 1324 is a physical or tangible storage device used to store program code 1318 rather than a medium that propagates or transmits program code 1318.
  • Alternatively, program code 1318 may be transmitted to data processing system 1300 using computer readable signal media 1326. Computer readable signal media 1326 may be, for example, a propagated data signal containing program code 1318. For example, computer readable signal media 1326 may be an electromagnetic signal, an optical signal, and/or any other suitable type of signal. These signals may be transmitted over communications channels, such as wireless communications channels, optical fiber cable, coaxial cable, a wire, and/or any other suitable type of communications channel.
  • The different components illustrated for data processing system 1300 are not meant to provide architectural limitations to the manner in which different embodiments may be implemented. The different illustrative embodiments may be implemented in a data processing system including components in addition to and/or in place of those illustrated for data processing system 1300. Other components shown in FIG. 13 can be varied from the illustrative examples shown. The different embodiments may be implemented using any hardware device or system capable of running program code 1318.
  • Therefore, it is to be understood that the present disclosure is not to be limited to the specific examples illustrated and that modifications and other examples are intended to be included within the scope of the appended claims. Moreover, although the foregoing description and the associated drawings describe examples of the present disclosure in the context of certain illustrative combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative implementations without departing from the scope of the appended claims. Accordingly, parenthetical reference numerals in the appended claims are presented for illustrative purposes only and are not intended to limit the scope of the claimed subject matter to the specific examples provided in the present disclosure.

Claims (1)

What is claimed is:
1. A method of extracting financial data from a document and analyzing similar financial data from older documents to enhance understanding of the document, the method comprising:
inputting a document containing unstructured data;
identifying tables and data structures in the document and extracting said tables using a parsing engine;
converting tables into dictionaries;
verifying that data in a table is financial data and storing dictionaries for valid financial data tables;
creating a time series for a selected table wherein a row-based matching algorithm is used to identify similar tables; and
storing a time series for a selected table wherein said time series is utilized to examine changes in financial data over time.
US15/729,645 2016-10-07 2017-10-10 Financial documents examination methods and systems Abandoned US20200302392A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US15/729,645 US20200302392A1 (en) 2016-10-07 2017-10-10 Financial documents examination methods and systems
US17/837,526 US20220300906A1 (en) 2016-10-07 2022-06-10 Financial documents examination methods and systems
US18/059,588 US11829950B2 (en) 2016-10-07 2022-11-29 Financial documents examination methods and systems
US18/230,237 US20230376900A1 (en) 2016-10-07 2023-08-04 Financial documents examination methods and systems

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662405828P 2016-10-07 2016-10-07
US15/729,645 US20200302392A1 (en) 2016-10-07 2017-10-10 Financial documents examination methods and systems

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/837,526 Continuation US20220300906A1 (en) 2016-10-07 2022-06-10 Financial documents examination methods and systems

Publications (1)

Publication Number Publication Date
US20200302392A1 (en) 2020-09-24

Family

ID=72515572

Family Applications (4)

Application Number Title Priority Date Filing Date
US15/729,645 Abandoned US20200302392A1 (en) 2016-10-07 2017-10-10 Financial documents examination methods and systems
US17/837,526 Pending US20220300906A1 (en) 2016-10-07 2022-06-10 Financial documents examination methods and systems
US18/059,588 Active US11829950B2 (en) 2016-10-07 2022-11-29 Financial documents examination methods and systems
US18/230,237 Pending US20230376900A1 (en) 2016-10-07 2023-08-04 Financial documents examination methods and systems

Family Applications After (3)

Application Number Title Priority Date Filing Date
US17/837,526 Pending US20220300906A1 (en) 2016-10-07 2022-06-10 Financial documents examination methods and systems
US18/059,588 Active US11829950B2 (en) 2016-10-07 2022-11-29 Financial documents examination methods and systems
US18/230,237 Pending US20230376900A1 (en) 2016-10-07 2023-08-04 Financial documents examination methods and systems

Country Status (1)

Country Link
US (4) US20200302392A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11068448B2 (en) * 2019-01-07 2021-07-20 Salesforce.Com, Inc. Archiving objects in a database environment

Family Cites Families (21)

Publication number Priority date Publication date Assignee Title
US6336094B1 (en) 1995-06-30 2002-01-01 Price Waterhouse World Firm Services Bv. Inc. Method for electronically recognizing and parsing information contained in a financial statement
US6026409A (en) * 1996-09-26 2000-02-15 Blumenthal; Joshua O. System and method for search and retrieval of digital information by making and scaled viewing
US5893131A (en) 1996-12-23 1999-04-06 Kornfeld; William Method and apparatus for parsing data
US6850950B1 (en) 1999-02-11 2005-02-01 Pitney Bowes Inc. Method facilitating data stream parsing for use with electronic commerce
US9262384B2 (en) * 1999-05-21 2016-02-16 E-Numerate Solutions, Inc. Markup language system, method, and computer program product
US6718336B1 (en) * 2000-09-29 2004-04-06 Battelle Memorial Institute Data import system for data analysis system
US6782400B2 (en) 2001-06-21 2004-08-24 International Business Machines Corporation Method and system for transferring data between server systems
US7117220B2 (en) * 2001-10-15 2006-10-03 Vanderdrift Richard William System and method for non-programmers to dynamically manage multiple sets of XML document data
US8225217B2 (en) * 2002-05-30 2012-07-17 Microsoft Corporation Method and system for displaying information on a user interface
US7653871B2 (en) 2003-03-27 2010-01-26 General Electric Company Mathematical decomposition of table-structured electronic documents
US7880909B2 (en) 2003-05-20 2011-02-01 Bukowski Mark A Extensible framework for parsing varying formats of print stream data
US7231593B1 (en) 2003-07-24 2007-06-12 Balenz Software, Inc. System and method for managing a spreadsheet
US7856388B1 (en) 2003-08-08 2010-12-21 University Of Kansas Financial reporting and auditing agent with net knowledge for extensible business reporting language
US8600845B2 (en) 2006-10-25 2013-12-03 American Express Travel Related Services Company, Inc. System and method for reconciling one or more financial transactions
US7590647B2 (en) * 2005-05-27 2009-09-15 Rage Frameworks, Inc Method for extracting, interpreting and standardizing tabular data from unstructured documents
US20070073708A1 (en) * 2005-09-28 2007-03-29 Smith Adam D Generation of topical subjects from alert search terms
US20120087537A1 (en) * 2010-10-12 2012-04-12 Lisong Liu System and methods for reading and managing business card information
US8381095B1 (en) * 2011-11-07 2013-02-19 International Business Machines Corporation Automated document revision markup and change control
US10095672B2 (en) 2012-06-18 2018-10-09 Novaworks, LLC Method and apparatus for synchronizing financial reporting data
US10140263B2 (en) * 2014-06-06 2018-11-27 Maud GAGNÉ-LANGEVIN System and method for generating task-embedded documents
US9965809B2 (en) 2016-07-25 2018-05-08 Xerox Corporation Method and system for extracting mathematical structures in tables

Cited By (2)

Publication number Priority date Publication date Assignee Title
US11068448B2 (en) * 2019-01-07 2021-07-20 Salesforce.Com, Inc. Archiving objects in a database environment
US11640378B2 (en) 2019-01-07 2023-05-02 salesforce.com,inc. Archiving objects in a database environment

Also Published As

Publication number Publication date
US20220300906A1 (en) 2022-09-22
US20230376900A1 (en) 2023-11-23
US20230087987A1 (en) 2023-03-23
US11829950B2 (en) 2023-11-28

Similar Documents

Publication Publication Date Title
US20210382887A1 (en) Systems and methods for grouping and enriching data items accessed from one or more databases for presentation in a user interface
US9031873B2 (en) Methods and apparatus for analysing and/or pre-processing financial accounting data
US8468442B2 (en) System and method for rendering data
US7849048B2 (en) System and method of making unstructured data available to structured data analysis tools
US9430801B2 (en) Methods systems and computer program products for generating financial statement complying with accounting standard
US20150347604A1 (en) System and method for information disclosure statement management and prior art cross-citation control
US20070288336A1 (en) Method and System For Advanced Financial Analysis
US20070050702A1 (en) System and method for rendering of financial data
US20050183002A1 (en) Data and metadata linking form mechanism and method
US8321469B2 (en) Systems and methods of profiling data for integration
US20110320399A1 (en) Etl builder
US20090259670A1 (en) Apparatus and Method for Conditioning Semi-Structured Text for use as a Structured Data Source
US10089343B2 (en) Automated analysis of data reports to determine data structure and to perform automated data processing
US20230376900A1 (en) Financial documents examination methods and systems
US10127292B2 (en) Knowledge catalysts
US8260772B2 (en) Apparatus and method for displaying documents relevant to the content of a website
US10474702B1 (en) Computer-implemented apparatus and method for providing information concerning a financial instrument
US20090199158A1 (en) Apparatus and method for building a component to display documents relevant to the content of a website
Kämpgen et al. Accepting the xbrl challenge with linked data for financial data integration
US10896227B2 (en) Data processing system, data processing method, and data structure
Brito et al. A hybrid AI tool to extract key performance indicators from financial reports for benchmarking
Hernandez et al. Unleashing the power of public data for financial risk measurement, regulation, and governance
Patil et al. A systematic study of data wrangling
CN114175021B (en) System and method for generating logical documents for a document evaluation system
US10776399B1 (en) Document classification prediction and content analytics using artificial intelligence

Legal Events

Date Code Title Description
AS Assignment

Owner name: SENTIEO, INC., DELAWARE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SHAH, NAMAN;SHAH, ATUL;SAXENA, ANURAG;AND OTHERS;SIGNING DATES FROM 20190305 TO 20190411;REEL/FRAME:050069/0270

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION