CN109062874B - Financial data acquisition method, terminal device and medium

Info

Publication number: CN109062874B
Application number: CN201810600697.4A
Authority: CN
Inventors: 苏晓明; 汪伟; 王晓伟; 徐冰; 肖京
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-06-12
Filing date: 2018-06-12
Publication date: 2022-03-04
Anticipated expiration: 2038-06-12
Also published as: CN109062874A; WO2019237540A1

Abstract

The invention belongs to the technical field of data processing and provides a financial data acquisition method, a terminal device and a medium. The method comprises: acquiring a pre-published text to be analyzed; converting the text format of the text to be analyzed from the pdf format into the document (doc) format by means of a preset text conversion tool; acquiring, based on the text to be analyzed in the doc format, a text code corresponding to the text to be analyzed, the text code comprising a plurality of types of page tags; searching for a table tag among the page tags, and locating a table in the text to be analyzed according to the text position to which the table tag belongs; extracting each field value associated with the table, together with table description information; and outputting the table description information and each field value to a pre-created text document, so that a business system can identify the text document and acquire the financial data associated with the text to be analyzed. The method reduces the difficulty of acquiring enterprise financial data and achieves multi-dimensional acquisition of financial data.

Description

Financial data acquisition method, terminal device and medium

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a financial data acquisition method, a terminal device and a computer-readable storage medium.

Background

Documents such as quarterly reports, annual reports and prospectuses are public documents issued by enterprises. These public documents contain a great deal of valuable financial data, for example an enterprise's accounts receivable, accounts payable, balance status, amount of profit or loss, and overall debt status. Once processed and analyzed, this financial data has great reference value. For example, in various applications it may be used to analyze the business condition of an enterprise independently, to determine the state of the industry chain associated with the enterprise, and so on.

However, because the layouts of public documents such as quarterly reports, annual reports and prospectuses are complex, no method has yet been disclosed for automatically extracting and analyzing the financial data in such public documents, and multi-dimensional acquisition of financial data therefore cannot be realized.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method for acquiring financial data, a terminal device, and a computer-readable storage medium, so as to solve the problem that multi-dimensional acquisition of financial data cannot be realized in the prior art.

A first aspect of an embodiment of the present invention provides a method for acquiring financial data, including:

acquiring a pre-published text to be analyzed, wherein the initial format of the text to be analyzed is the portable document format (pdf);

converting the text format of the text to be analyzed from the pdf format into the document (doc) format by means of a preset text conversion tool;

acquiring a text code corresponding to the text to be analyzed based on the text to be analyzed in the doc format, wherein the text code comprises a plurality of types of page tags;

searching for a table tag among the page tags, and locating a table in the text to be analyzed according to the text position to which the table tag belongs;

extracting each field value associated with the table, together with table description information;

and outputting the table description information and each field value to a pre-created text document, so that a business system identifies the text document and acquires the financial data associated with the text to be analyzed.

A second aspect of the embodiments of the present invention provides a terminal device, including a memory and a processor, where the memory stores a computer program operable on the processor, and the processor implements the steps of the method of the first aspect when executing the computer program.

A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the first aspect.

In the embodiment of the invention, public documents such as prospectuses, annual reports and quarterly reports are originally published in the pdf format. Converting the text format of such a document into the doc format makes it possible to read the text code corresponding to the text to be analyzed, so that the position area of each table can be determined from the table tags in the text code and the tables are located automatically. In these public documents, the data contained in tables is usually the financial data with the highest mining value. Therefore, once each table position has been located, the field values associated with the table and the table description information are extracted and output to a pre-created text document. Because text documents are highly compatible, other business systems can read and analyze them, which allows enterprise financial data to be analyzed quickly, avoids having to read the data from public documents with complex layouts, and reduces the difficulty of acquiring enterprise financial data. Because a business system can automatically identify, through the text documents, the financial data contained in various public documents, multi-dimensional acquisition of financial data is also achieved compared with the prior art.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of an implementation of a method for acquiring financial data according to an embodiment of the present invention;

Fig. 2 is a flowchart of a specific implementation of step S104 of the method for acquiring financial data according to an embodiment of the present invention;

Fig. 3 is a flowchart of a specific implementation of step S105 of the method for acquiring financial data according to an embodiment of the present invention;

Fig. 4 is a flowchart of another specific implementation of step S105 of the method for acquiring financial data according to an embodiment of the present invention;

Fig. 5 is a flowchart of an implementation of a method for acquiring financial data according to another embodiment of the present invention;

Fig. 6 is a block diagram showing the configuration of an apparatus for acquiring financial data according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Fig. 1 shows an implementation flow of the method for acquiring financial data according to the embodiment of the present invention, where the method flow includes steps S101 to S106. The specific realization principle of each step is as follows:

S101: Acquiring a pre-published text to be analyzed, wherein the initial format of the text to be analyzed is the portable document format (pdf).

In the embodiment of the invention, the text to be analyzed is a public document released by an enterprise, such as a quarterly report, an annual report or a prospectus. The text to be analyzed is downloaded periodically from the corresponding public website according to preset website information. Because enterprises output such public documents in the Portable Document Format (PDF) when creating them, the text to be analyzed downloaded from the public website is in the PDF format.

S102: Converting the text format of the text to be analyzed from the pdf format into the document (doc) format by means of a preset text conversion tool.

Each text to be analyzed in the pdf format is imported into the preset text conversion tool, and after a format conversion instruction issued by a user is detected, the text to be analyzed is output in the document (doc) format. The text conversion tool may be, for example, a Fuxin (Foxit) converter, a PDF converter, an agile converter, or the like.

S103: Acquiring a text code corresponding to the text to be analyzed based on the text to be analyzed in the doc format, wherein the text code comprises a plurality of types of page tags.

For the text to be analyzed in the doc format, the text code of the text to be analyzed is read. The text code includes various types of page tags, such as table tags and paragraph tags.

S104: Searching for a table tag among the page tags, and locating a table in the text to be analyzed according to the text position to which the table tag belongs.

In the embodiment of the invention, the text code corresponding to the text to be analyzed is traversed, and the page tags appearing in the text code are detected in sequence by means of a preset regular expression. Among the detected page tags, each table tag is located based on the tag character element corresponding to table tags.

If a table tag is located in the text to be analyzed, the text code adjacent to that table tag is determined to be the text code of one table in the text to be analyzed, so that the position of the table in the text to be analyzed can be determined from the text position of the table tag.
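The tag search described for S104 can be illustrated with a minimal sketch. It assumes the text code is XML-like markup in which tables open with `<w:tbl>` and paragraphs with `<w:p>`; these WordprocessingML-style tag names, the pattern `PAGE_TAG_RE` and the helper names are illustrative assumptions, since the source only states that table tags and paragraph tags are detected with a preset regular expression.

```python
import re

# Assumed tag names: <w:tbl> opens a table, <w:p> opens a paragraph.
# The source does not name the tags; these are placeholders for illustration.
PAGE_TAG_RE = re.compile(r"<w:(tbl|p)[ >]")

TAG_TYPES = {"tbl": "table", "p": "paragraph"}


def find_page_tags(text_code):
    """Return (tag_type, offset) for every opening page tag, in document order."""
    return [(TAG_TYPES[m.group(1)], m.start())
            for m in PAGE_TAG_RE.finditer(text_code)]


def table_positions(text_code):
    """Return the offsets in the text code at which a table tag was found."""
    return [pos for tag, pos in find_page_tags(text_code) if tag == "table"]


sample = "<w:p>Balance sheet</w:p><w:tbl><w:tr/></w:tbl><w:p>Notes</w:p>"
print(find_page_tags(sample))
print(table_positions(sample))
```

The offset of each table tag plays the role of the "text position to which the table tag belongs"; the markup that follows it is then treated as the text code of one table.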

As an embodiment of the present invention, Fig. 2 shows a specific implementation flow of step S104 of the method for acquiring financial data provided by the embodiment of the present invention, detailed as follows:

S1041: Traversing each coding block in the text code in sequence.

S1042: Judging whether the page tag type corresponding to each coding block is the table type.

S1043: If the page tag type corresponding to the coding block is the table type, setting the attribute value of the built-in flag bit to a logical true value, so as to mark the text position corresponding to the coding block as the start position of the table.

S1044: Returning to the operation of traversing each coding block in the text code in sequence until the page tag type corresponding to the coding block taken out is a non-table type and a non-null value, and marking the text position corresponding to that coding block as the end position of the table.

In the embodiment of the invention, the text code comprises a plurality of coding blocks (blocks), and each block has a corresponding page tag. Each block in the text code is read in turn through a preset Document Python plug-in, and the page tag type of each block is determined from its page tag. If the page tag corresponding to the block is a table tag, the page tag type of the block is determined to be the table type; if the page tag corresponding to the block is a paragraph tag, the page tag type of the block is determined to be the paragraph type.

In the embodiment of the present invention, if the page tag type of any block is detected to be the table type, the attribute value of the start_table flag bit of the text position to which the block belongs is set to the logical true value (true), so as to mark that text position as the start position of the currently detected table. The process then returns to step S1041 to search for the next block in the text code from the current text position, and the subsequent steps S1042 to S1044 are performed.

After the attribute value of the start_table flag bit of the text position has been set to the logical true value, if any subsequent block is detected to have a corresponding page tag and the page tag type of that block is a non-table type (for example, the paragraph type), the end_table flag bit of the text position to which that block belongs is set to the logical true value, so as to mark that text position as the end position of the currently detected table.

According to the flag bit information corresponding to each text position in the text to be analyzed, the range from a first text position whose start_table flag bit is true to the second text position at which the end_table flag bit is first true after the first text position is determined to be the text area corresponding to one table.

The embodiment of the invention is suitable for the scenario in which the text to be analyzed contains a table displayed across pages. For example, in the text to be analyzed in the pdf format, a tall table is displayed across pages; that is, the table is divided into at least two sub-tables, and each sub-table is displayed on one page of the text to be analyzed. Therefore, after the text format of the text to be analyzed has been converted into the doc format, in order to restore the same table from different blocks in the text code, when the page tag types of two consecutive blocks are both detected to be the table type, the text positions of both blocks are determined to belong to the position area of the table. If the page tag type of the next block is detected to be the paragraph type, the table ends there, so that a complete table in the text to be analyzed can be located and extracted based on the text positions to which these blocks belong.

In the embodiment of the invention, the attribute values of the built-in flag bits corresponding to the text positions are determined by detecting the page tag types of the coding blocks in the text to be analyzed, so that the start and end positions of each table in the text to be analyzed can be accurately identified from these attribute values. Tables displayed across pages are identified automatically, the extracted financial data is grouped under the same table, and the accuracy of table data extraction is improved.
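The flag-bit traversal of S1041 to S1044 can be sketched as follows. This is a simplified illustration, not the patent's implementation: each block is reduced to its page tag type (`"table"`, `"paragraph"`, or `None` for a null page tag), the function name `locate_tables` is hypothetical, and, as a design choice of this sketch, the `start_table` flag is set only on the first block of a run of table blocks so that a table split across pages is restored as one region.

```python
def locate_tables(block_types):
    """Locate table regions in a sequence of page tag types.

    block_types: list such as ["paragraph", "table", None, "table", ...].
    Returns (start, end) index pairs, where end is the index of the first
    non-table, non-null block after the start (exclusive bound).
    """
    n = len(block_types)
    start_table = [False] * n   # flag bit marking the start of a table
    end_table = [False] * n     # flag bit marking the end of a table
    in_table = False

    for i, tag in enumerate(block_types):
        if tag == "table":
            if not in_table:            # first block of a (possibly split) table
                start_table[i] = True
                in_table = True
        elif tag is not None and in_table:
            end_table[i] = True         # non-table, non-null block ends the table
            in_table = False

    # Pair each start flag with the first end flag that follows it.
    regions, start = [], None
    for i in range(n):
        if start_table[i]:
            start = i
        elif end_table[i] and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:               # table runs to the end of the document
        regions.append((start, n))
    return regions


print(locate_tables(["paragraph", "table", None, "table", "paragraph", "table"]))
```

Blocks with a null page tag neither start nor end a table here, which mirrors the "non-table type and a non-null value" condition in S1044.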

S105: Extracting each field value associated with the table, together with table description information.

After each table contained in the text to be analyzed has been located, the cell content of each block corresponding to the table is read through the Document Python plug-in and stored in a preset table_data array. The data contained in the table_data array are the field values associated with the table.
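As a rough sketch of collecting field values into a `table_data` array, the snippet below assumes each block is a plain dictionary with a `"type"` key and, for table blocks, a `"rows"` key holding lists of cell strings. This block structure and the function name `collect_field_values` are assumptions made for illustration; the source only states that cell contents are read through the Document Python plug-in.

```python
def collect_field_values(blocks, start, end):
    """Flatten the cell contents of the table blocks in blocks[start:end].

    Cells are read row by row, stripped of surrounding whitespace, and
    appended to table_data in reading order.
    """
    table_data = []
    for block in blocks[start:end]:
        if block["type"] != "table":
            continue                      # skip null or non-table blocks
        for row in block["rows"]:
            for cell in row:
                table_data.append(cell.strip())
    return table_data


blocks = [
    {"type": "paragraph", "text": "Accounts overview"},
    {"type": "table", "rows": [["Item", "Amount"], ["Accounts receivable", " 120 "]]},
    {"type": "table", "rows": [["Accounts payable", "80"]]},  # continuation after a page break
    {"type": "paragraph", "text": "Unit: RMB"},
]
print(collect_field_values(blocks, 1, 3))
```

Because both table blocks fall inside the same located region, the cells of a table split across pages end up in a single array.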

In the embodiment of the present invention, the table description information describes the main content of the table data and includes, but is not limited to, the title, name or other descriptive information of the table. For example, if the table data is the financial expenditure data of enterprise A for March, the table description information may be "financial expenditure data for March".

For example, according to the position area to which each table belongs, a number of characters immediately before or after that position area may be extracted and used as the table description information of the table.

As an embodiment of the present invention, Fig. 3 shows a specific implementation flow of step S105 of the method for acquiring financial data provided by the embodiment of the present invention, detailed as follows:

S10501: Creating a first-in first-out (FIFO) queue.

S10502: Traversing each coding block in the text code in sequence, and acquiring the page tag type corresponding to the currently traversed coding block.

S10503: If the page tag type corresponding to the coding block is the paragraph type, storing each character contained in the coding block into the FIFO queue in sequence, and reading the real-time queue length of the FIFO queue.

S10504: If the real-time queue length of the FIFO queue is greater than a preset threshold, removing a plurality of characters at the bottom of the FIFO queue, and returning to the operation of traversing each coding block in the text code in sequence and acquiring the page tag type corresponding to the currently traversed coding block.

S10505: If the page tag type corresponding to the coding block is the table type, splicing all characters in the FIFO queue, and outputting the splicing result as the table description information associated with the table.

For each located table, in order to extract the table description information of the table, a first-in first-out (FIFO) queue with a preset length is first created. The blocks preceding the text position to which the table belongs are determined, and their page tag types are read in sequence. If the page tag of a block is a non-null value and its page tag type is the paragraph type, the content of the block is pushed into the FIFO queue.

In the embodiment of the invention, before the content of the block is pushed into the FIFO queue, the real-time queue length of the FIFO queue is obtained from the number of characters it contains. If the real-time queue length is greater than the preset queue length value, the FIFO queue is full; the data that entered the FIFO queue first is therefore removed, and the content of the block currently read is pushed into the FIFO queue. The process then returns to step S10502, and pushing block content into the FIFO queue stops once the page tag type of the block read is the table type.

In the embodiment of the invention, after pushing block content into the FIFO queue has stopped, all characters contained in the FIFO queue are extracted, and the character string obtained by splicing them is output as the table description information associated with the table.

In the embodiment of the invention, pushing block content into the FIFO queue stops when a block whose page tag type is the table type is detected, which ensures that the characters stored in the FIFO queue are the text closest to the table position. Generally speaking, the text closest to the table position area best represents the main content of the table data (for example, header information at the top of the table). By splicing the characters in the FIFO queue and outputting the splicing result as the table description information associated with the table, the table description information is located automatically and is extracted more accurately.
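The FIFO procedure of S10501 to S10505 might be sketched as below, using `collections.deque` from the Python standard library as the queue. The representation of blocks as `(page_tag_type, text)` pairs, the function name `description_before_table` and the `max_chars` parameter (standing in for the preset threshold) are assumptions made for illustration.

```python
from collections import deque


def description_before_table(blocks, max_chars=25):
    """Return the characters closest to the first table block.

    blocks: iterable of (page_tag_type, text) pairs in document order.
    Characters of paragraph blocks are pushed into a FIFO queue; when the
    queue grows beyond max_chars, the oldest characters are removed. When a
    table block is reached, the queue content is spliced and returned.
    """
    fifo = deque()
    for tag, text in blocks:
        if tag == "paragraph":
            for ch in text:
                fifo.append(ch)
                if len(fifo) > max_chars:
                    fifo.popleft()        # drop the character that entered first
        elif tag == "table":
            return "".join(fifo)          # splice the characters nearest the table
    return None                           # no table block was encountered


doc = [
    ("paragraph", "Company overview. "),
    ("paragraph", "Table 3: Accounts payable"),
    ("table", ""),
]
print(description_before_table(doc, max_chars=25))
```

Constructing the queue as `deque(maxlen=max_chars)` would evict old characters automatically; the explicit length check is kept here only to mirror the threshold comparison in S10504.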

As an embodiment of the present invention, Fig. 4 shows another specific implementation flow of step S105 of the method for acquiring financial data provided by the embodiment of the present invention, detailed as follows:

S10506: If the page tag type corresponding to the coding block is the table type, acquiring a regular expression associated with a preset keyword.

S10507: Detecting each character string in the FIFO queue based on the regular expression.

S10508: If a character string matching the regular expression exists in the FIFO queue, outputting that character string as the table description information associated with the table.

S10509: If no character string matching the regular expression exists in the FIFO queue, calculating, for each character string in the FIFO queue, the tag distance value between the character string and the table tag in the coding block to which the character string belongs.

S10510: Outputting the character string with the smallest tag distance value as the table description information associated with the table.

In the embodiment of the present invention, extracting the table description information associated with the table based on the text preceding the table specifically includes the following.

When a block whose page tag type is the table type is reached, a regular expression associated with a preset keyword is obtained. A preset keyword is a word that is highly associated with table descriptive information such as table titles. For example, common table titles typically take the form "XXX table", so the regular expression corresponding to such table titles may be "[\S]*table$". Each character string stored in the FIFO queue is then checked against the obtained regular expression.

If a character string satisfying the regular expression is detected in the FIFO queue, the character string is extracted and output as the table description information associated with the table.

If no character string satisfying the regular expression is detected in the FIFO queue, this indicates that no descriptive information resembling a table title exists before the text position to which the table belongs. In this case, every N adjacent characters in the FIFO queue (N is a preset integer greater than 1) are taken as one character string, and the tag distance value is read from the style tag of the block to which the last character of the string belongs. The tag distance value represents the distance between the text position to which the character belongs and the bottom of the current page. After the tag distance values of all character strings in the FIFO queue have been obtained in this way, the character string with the smallest tag distance value is selected and output as the table description information associated with the table.

In the embodiment of the invention, because the character string with the smallest tag distance value is closest to the bottom of the page, and the block to which it belongs precedes the table, the text position of that character string can be determined to be the one closest to the start position of the table. Generally speaking, the text closest to the start position of the table describes the subject of the table data most clearly, so outputting this character string as the table description information associated with the table improves the accuracy of the table description information to some extent.
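The two-stage selection in S10506 to S10510 (keyword regular expression first, smallest tag distance as fallback) can be illustrated with the sketch below. The pattern `TITLE_RE`, the candidate representation as `(string, tag_distance)` pairs and the function name `pick_description` are assumptions for illustration; in particular, the way tag distances are obtained from style tags is not modelled, and the distances are simply supplied as numbers.

```python
import re

# Assumed title pattern: any string that ends with the keyword "table".
TITLE_RE = re.compile(r"\S*table$", re.IGNORECASE)


def pick_description(candidates):
    """Choose the table description from (string, tag_distance) pairs.

    1. Return the first string that matches the keyword regular expression.
    2. Otherwise return the string with the smallest tag distance value.
    Returns None when there are no candidates.
    """
    if not candidates:
        return None
    for text, _distance in candidates:
        if TITLE_RE.search(text.strip()):
            return text.strip()
    text, _distance = min(candidates, key=lambda item: item[1])
    return text.strip()


with_title = [("Unit: RMB", 40.0), ("Consolidated profit table", 55.0)]
without_title = [("Unit: RMB", 40.0), ("Notes to the accounts", 55.0)]
print(pick_description(with_title))
print(pick_description(without_title))
```

A keyword match takes priority even when another string has a smaller tag distance, which follows the order of S10508 and S10509.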

S106: Outputting the table description information and each field value to a pre-created text document, so that a business system identifies the text document and acquires the financial data associated with the text to be analyzed.

In the embodiment of the invention, after each field value in the table and the table description information associated with the table have been obtained, the table description information and the field values are output to the pre-created text document in the order in which they were obtained. The text format of the text document is the txt format.

Preferably, in the text document, a preset separator is inserted between any two adjacent field values.

Preferably, the table description information is output at the top of the text document, and a line break is inserted between the table description information and the field values.

In the embodiment of the invention, the text document is sent to each business system with which a connection has been established in advance. Because business systems of all versions are highly compatible with text documents in the txt format, the business systems can identify and process the text document to extract the financial data associated with the text to be analyzed.
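A minimal sketch of the output layout in S106 follows: the table description on the first line, a line break, then the field values joined by a separator. The separator `"|"`, the UTF-8 encoding and the function names are illustrative choices; the source only requires a preset separator and the txt format.

```python
def render_text_document(description, field_values, separator="|"):
    """Build the txt content: description on top, then separated field values."""
    return description + "\n" + separator.join(field_values)


def write_text_document(path, description, field_values, separator="|"):
    """Write the rendered content to a txt file at the given path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_text_document(description, field_values, separator))


content = render_text_document(
    "Financial expenditure data for March",
    ["Rent", "1000", "Wages", "5000"],
)
print(content)
```

A downstream business system could then recover the description from the first line and split the second line on the same separator.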

As another embodiment of the present invention, as shown in Fig. 5, after S106 the method further includes:

S107: Loading a report template, and importing each item of financial data into the corresponding table body according to the preset headers in the report template.

S108: Generating and displaying a financial data analysis report according to the import result.

In the embodiment of the invention, a report template generated in advance is loaded. The report template comprises various headers, each header corresponds to a table body, each header describes the field attribute of a field value in the table, and each table body records field values. For each preset header in the report template, the field values corresponding to the field attribute described by that header are screened out from the data of the text document generated in S106 and imported into the table body corresponding to that header of the report template.

Various statistical values are then calculated by preset calculation formulas from the field values imported into the report template for each field attribute, the statistical results are appended to the end of the report template, and the financial data analysis report is output and displayed.

In the embodiment of the invention, because the field values in the text document are imported into the report template generated in advance, the financial data analysis report finally displayed lists in detail the field values used in the data analysis. This allows a user to check conveniently whether the analysis of the financial data contains errors, which further improves the reliability and accuracy of the financial data analysis report.
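The template import of S107 and S108 could look roughly like the sketch below. It assumes that each record read from the text document has already been paired with its field attribute, and it uses the sum and mean as stand-ins for the "preset calculation formulas", which the source does not specify. The function name `fill_report` and the data layout are hypothetical.

```python
def fill_report(headers, records):
    """Import field values under matching headers and compute simple statistics.

    headers: list of field attributes defined by the report template.
    records: list of (field_attribute, value_string) pairs.
    Returns (body, stats): body maps each header to its imported values,
    stats maps each header to its sum and mean (mean is None if empty).
    """
    body = {header: [] for header in headers}
    for attribute, value in records:
        if attribute in body:                 # ignore attributes not in the template
            body[attribute].append(float(value))

    stats = {}
    for header, values in body.items():
        total = sum(values)
        mean = total / len(values) if values else None
        stats[header] = {"sum": total, "mean": mean}
    return body, stats


headers = ["Accounts receivable", "Accounts payable"]
records = [
    ("Accounts receivable", "120"),
    ("Accounts payable", "80"),
    ("Accounts receivable", "30"),
    ("Inventory", "5"),
]
body, stats = fill_report(headers, records)
print(body)
print(stats)
```

Keeping the imported values in `body` alongside the statistics reflects the point made above: the report lists the underlying field values so that a user can check the analysis.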

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

Fig. 6 shows a block diagram of the financial data acquisition apparatus according to the embodiment of the present invention, which corresponds to the financial data acquisition method described in the above embodiment, and only shows the relevant parts according to the embodiment of the present invention for convenience of description.

Referring to fig. 6, the apparatus includes:

A first obtaining unit 61, configured to obtain a pre-published text to be analyzed, where the initial format of the text to be analyzed is the portable document format (pdf).

A converting unit 62, configured to convert the text format of the text to be analyzed from the pdf format into the document (doc) format by means of a preset text conversion tool.

A second obtaining unit 63, configured to obtain, based on the text to be analyzed in the doc format, a text code corresponding to the text to be analyzed, wherein the text code comprises a plurality of types of page tags.

A searching unit 64, configured to search for a table tag among the page tags, and locate a table in the text to be analyzed according to the text position to which the table tag belongs.

An extracting unit 65, configured to extract each field value associated with the table, together with table description information.

An output unit 66, configured to output the table description information and each field value to a pre-created text document, so that a business system identifies and processes the text document and obtains the financial data associated with the text to be analyzed.

Optionally, the searching unit 64 includes:

A traversal subunit, configured to traverse each coding block in the text code in sequence.

A judging subunit, configured to judge whether the page tag type corresponding to each coding block is the table type.

A marking subunit, configured to set the attribute value of the built-in flag bit to a logical true value if the page tag type corresponding to the coding block is the table type, so as to mark the text position corresponding to the coding block as the start position of the table.

A return subunit, configured to return to the operation of traversing each coding block in the text code in sequence until the page tag type corresponding to the coding block taken out is a non-table type and a non-null value, and to mark the text position corresponding to that coding block as the end position of the table.

Optionally, the extracting unit 65 includes:

A creating subunit, configured to create a first-in first-out (FIFO) queue.

An acquiring subunit, configured to traverse each coding block in the text code in sequence and acquire the page tag type corresponding to the currently traversed coding block.

A storage subunit, configured to store each character contained in the coding block into the FIFO queue in sequence and read the real-time queue length of the FIFO queue if the page tag type corresponding to the coding block is the paragraph type.

A removing subunit, configured to, if the real-time queue length of the FIFO queue is greater than a preset threshold, remove a plurality of characters at the bottom of the FIFO queue and return to the operation of traversing each coding block in the text code in sequence and acquiring the page tag type corresponding to the currently traversed coding block.

A splicing subunit, configured to splice the characters in the FIFO queue if the page tag type corresponding to the coding block is the table type, and output the splicing result as the table description information associated with the table.

Optionally, the splicing subunit is specifically configured to:

if the page tag type corresponding to the coding block is the table type, acquire a regular expression associated with a preset keyword;

detect each character string in the FIFO queue based on the regular expression;

if a character string matching the regular expression exists in the FIFO queue, output that character string as the table description information associated with the table;

if no character string matching the regular expression exists in the FIFO queue, calculate, for each character string in the FIFO queue, the tag distance value between the character string and the table tag in the coding block to which the character string belongs;

and output the character string with the smallest tag distance value as the table description information associated with the table.

Optionally, the financial data acquisition apparatus further includes:

a loading unit, configured to load a report template and import each item of financial data into the corresponding table body according to the preset table headers in the report template;

and a generating unit, configured to generate and display a financial data analysis report according to the import result.
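As a rough illustration of the loading and generating units, the Python sketch below fills a template by header and renders a plain-text report. The template is modelled simply as an ordered list of header names, and the function names fill_template and render_report, as well as the sample values in the usage lines, are hypothetical; the format of the actual report template is not described in the text.

```python
def fill_template(headers, financial_data, missing=""):
    """Import each item of financial data under its preset header.

    headers        : ordered header names from the report template.
    financial_data : mapping of item name to extracted value.
    missing        : placeholder for headers with no extracted value.
    """
    return {h: financial_data.get(h, missing) for h in headers}


def render_report(body):
    """Render the filled table body as aligned plain-text lines."""
    width = max((len(h) for h in body), default=0)
    return "\n".join(f"{h.ljust(width)} | {v}" for h, v in body.items())


# Usage with made-up values:
body = fill_template(["Revenue", "Net profit"],
                     {"Net profit": "12.5", "Revenue": "100"})
print(render_report(body))
```

Items that have no matching header in the template are ignored, so the template alone determines the layout of the report.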

Fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 7, the terminal device 7 of this embodiment includes a processor 70 and a memory 71, and a computer program 72, such as a financial data acquisition program, operable on the processor 70 is stored in the memory 71. The processor 70, when executing the computer program 72, implements the steps of the above-described embodiments of the method for acquiring financial data, such as the steps 101 to 106 shown in fig. 1. Alternatively, the processor 70, when executing the computer program 72, implements the functions of the modules/units in the above-described device embodiments, such as the functions of the units 61 to 66 shown in fig. 6.

Illustratively, the computer program 72 may be partitioned into one or more modules/units that are stored in the memory 71 and executed by the processor 70 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 72 in the terminal device 7.

The terminal device 7 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, a processor 70 and a memory 71. Those skilled in the art will appreciate that fig. 7 is merely an example of the terminal device 7 and does not limit it; the terminal device may include more or fewer components than shown, combine some components, or use different components. For example, the terminal device may further include input/output devices, network access devices, buses, and the like.

The processor 70 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing the computer program and other programs and data required by the terminal device. The memory 71 may also be used to temporarily store data that has been output or is to be output.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A method for acquiring financial data, comprising:

creating a first-in first-out (FIFO) queue;

sequentially traversing each coding block in the text codes, and acquiring the page tag type corresponding to the currently traversed coding block;

if the page tag type corresponding to the coding block is a paragraph type, sequentially storing each character contained in the coding block into the FIFO queue, and reading the real-time queue length of the FIFO queue;

if the real-time queue length of the FIFO queue is larger than a preset threshold value, removing a plurality of characters at the bottom of the FIFO queue, returning to execute the operation of sequentially traversing each coding block in the text codes and acquiring the page tag type corresponding to the currently traversed coding block;

if the page tag type corresponding to the coding block is a table type, splicing all characters in the FIFO queue, and outputting a splicing result as table description information associated with the table;

2. The method for acquiring financial data according to claim 1, wherein said searching for a table tag in said page tags and locating a table existing in said text to be analyzed according to a text position to which said table tag belongs comprises:

sequentially traversing each coding block in the text codes;

for each coding block, judging whether the page tag type corresponding to the coding block is a table type;

if the page tag type corresponding to the coding block is a table type, setting the attribute value of the built-in flag bit as a logic true value so as to mark the text position corresponding to the coding block as the initial position of the table;

and returning to execute the operation of sequentially traversing each coding block in the text codes until the page tag type corresponding to the taken-out coding block is a non-table type and a non-null value, and marking the text position corresponding to the coding block as the end position of the table.

3. The method according to claim 2, wherein said splicing characters in said FIFO queue and outputting a result of splicing as table description information associated with said table if the page tag type corresponding to said coding block is a table type comprises:

4. The method for acquiring financial data according to claim 1, wherein after said outputting said table description information and each of said field values to a pre-created text document so that a business system performs recognition processing on said text document and acquires financial data associated with said text to be analyzed, the method further comprises:

loading a report template, and respectively importing each item of financial data into a corresponding table body according to a preset table header in the report template;

and generating and displaying a financial data analysis report according to the import result.

5. A terminal device comprising a memory and a processor, the memory having stored therein a computer program operable on the processor, wherein the processor when executing the computer program implements the steps of:

creating a first-in first-out (FIFO) queue;

6. The terminal device of claim 5, wherein the searching for the table tag in the page tags and locating the table existing in the text to be analyzed according to the text position to which the table tag belongs comprises:

sequentially traversing each coding block in the text codes;

7. The terminal device according to claim 5, wherein if the page tag type corresponding to the coding block is a table type, the splicing characters in the FIFO queue and outputting a splicing result as table description information associated with the table comprises:

8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.