CN113326676A - Deep learning model device for structuring financial text into form - Google Patents

Deep learning model device for structuring financial text into form

Info

Publication number
CN113326676A
Authority
CN
China
Prior art keywords
information
word
text
character
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110415793.3A
Other languages
Chinese (zh)
Inventor
周靖宇
景泳霖
袁阳平
邹鸿岳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Kuaique Information Technology Co ltd
Original Assignee
Shanghai Kuaique Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Kuaique Information Technology Co ltd
Priority to CN202110415793.3A
Publication of CN113326676A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/12 - Use of codes for handling textual entities
    • G06F 40/151 - Transformation
    • G06F 40/157 - Transformation using dictionaries or tables
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/12 - Use of codes for handling textual entities
    • G06F 40/126 - Character encoding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/166 - Editing, e.g. inserting or deleting
    • G06F 40/183 - Tabulation, i.e. one-dimensional positioning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

A deep learning model device for structuring financial text into a table comprises the following technical scheme: step one, preprocessing: cleaning the data, segmenting the text into words, forming characters and words, and labeling the table rows; step two, vectorizing the words; step three, a character encoding layer; step four, a connection layer joining the character encoding and the word encoding; step five, predicting the column information; step six, preprocessing for the prediction of row information; step seven, predicting the row information; step eight, setting the total loss function. With this model, unstructured text is converted directly into table data, reaching the commercialization standard in the financial data field; compared with the Pipeline form, accuracy is improved by 3-5 percentage points, and the Pipeline's error-propagation problem is reduced.

Description

Deep learning model device for structuring financial text into form
Technical Field
The invention relates to the technical field of information extraction and conversion, in particular to a deep learning model device for structuring financial texts into tables.
Background
In natural language processing, common tasks are to classify text or to extract information from it. A further problem is to identify and extract structured information, such as tables, from documents.
In fields such as finance there are deeper technical requirements, for example the problem of converting unstructured text directly into a table. As shown in fig. 1, the bidding data is the bid-changing message of a primary (first-level) bid, which means: "cancel the 2.64% position on the bond [02XX city investment MTN002]; at the same time, change the 100 million of funds at the 2.78 benchmark into 4 million at the 2.83 benchmark and 6 million at the 2.96 benchmark." The problem is abstracted as structuring unstructured text information into table data. In terms of semantic understanding, this is not merely simple text classification or intent recognition: all the elements need to be put into one-to-one correspondence with several intents to form standard table data. This is a difficult problem in the current text-processing field and raises a series of technical issues.
There is currently no ready-made, unified technology for the problem of organizing text into a table. The main approach is to split it into multiple sub-tasks and handle the problem in a pipeline fashion. First, a text classification model classifies the overall intent of the message (in the example above, judging whether the user's intent is to bid, to change a bid, or to withdraw a bid). Next, information extraction is performed: named entity recognition (NER) is used to extract the elements in the text (in the example above, elements such as the bond name, the benchmark and the amount). Finally, the elements are arranged by a series of rules (for example, according to their positions before and after the word "change") and combined into the form of a table.
First, the Pipeline style of processing has the relatively large defect of error propagation. Structuring text into a table requires three models: one for intent classification, a second for element extraction, and a third for structuring the elements into a table, where the intermediate step must also decide the (uncertain) number of table rows. With good existing model algorithms the accuracy of each step is about 95%, so after the three models are chained in a Pipeline the final accuracy is only about 80%-85%. To reach commercial quality or to improve the accuracy, a series of rules and fault-tolerant designs are needed for correction. Second, because the Pipeline splits the work into several sub-tasks, the text is encoded separately at the bottom layer of each task, which wastes technical resources and lowers the efficiency of structuring; the related parameters also cannot be shared, which would otherwise improve prediction accuracy. The intent-judgment sub-task is itself a multi-level classification problem: the first level judges whether the message is a bid, a bid change or a bid withdrawal, and the second level, for the "change" intent, must further judge the elements before and after the change. Existing classification models cannot solve this classification goal well, and for structuring into a table there is no related deep learning algorithm model: the structuring logic rules are largely combed out manually and the elements are reordered by those rules. A rule-engine-based scheme requires a large amount of labor, the completeness of the rules cannot be guaranteed, and many cases cannot be covered. Moreover, because people express themselves in diverse ways, the rules cannot cover everything and may interfere and conflict with each other, so situations where fixing one case breaks another easily occur. Finally, the development and maintenance costs are high: whenever a new rule is added to the rules combed out earlier, one must consider whether the rule is effective and how it affects the previous rules; development and maintenance costs are extremely high.
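As a rough check of this figure: if each of the three chained models is about 95% accurate and the stages fail independently, the end-to-end accuracy is approximately 0.95 × 0.95 × 0.95 ≈ 0.857, i.e. about 86%, which is of the same order as the 80%-85% quoted above.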
Disclosure of Invention
The invention aims to provide a deep learning model device for structuring financial text into a table so as to solve the problems in the background art. The invention adopts the idea of a joint model: several tasks, such as text information extraction, table-relation judgment and deciding whether pieces of information form a column, are fused into one multi-task model, several sub-tasks are solved by a single model, and standard table-structure information is finally formed. First, the original text is cleaned and the table information is arranged into the data form used for model training. Second, a multi-level model structure is adopted to encode the text so as to extract the elements in the text. The extracted elements are then combined according to the table columns to form several candidate rows of information, and the elements of each row are classified after a second encoding to judge whether they constitute valid row information, so that the table structuring of the text is finally achieved. The invention constructs a set of multi-level, complex models and achieves the goal of structuring text into a table within one model; the learning and training process of the algorithm model is shown in fig. 2. The invention provides a multi-task neural network that converts unstructured text directly into table data through one model, reaching the commercialization standard in the financial data field; compared with the Pipeline form, accuracy is improved by 3-5 percentage points, and the Pipeline's error-propagation problem is reduced.
In order to achieve this purpose, the invention comprises the following technical scheme: step one, preprocessing: cleaning the data, segmenting the text into words, forming characters and words, and labeling the table rows; step two, vectorizing the words; step three, a character encoding layer; step four, a connection layer joining the character encoding and the word encoding; step five, predicting the column information; step six, preprocessing for the prediction of row information; step seven, predicting the row information; step eight, setting the total loss function.
Step one is preprocessing: data cleaning, in which irregular data are cleaned and replaced, for example full-width/half-width conversion and removal of special symbols such as emoticons. A multi-dimensional word-segmentation method is then established to segment the text information: the first dimension uses obvious delimiters such as spaces, commas, semicolons and tab characters to divide the text into short clauses; the second dimension uses regular expressions to extract elements such as characters and numbers in the text and divide the short clauses into medium-granularity tokens of characters and numbers; the third dimension uses jieba word segmentation to cut the characters and numbers at a finer granularity. Tokens of three granularities are thus formed, denoted word_c, word_m and word_s. For the token information of the three granularities, since table information is two-dimensional N x M information, the scheme splits the two-dimensional information into sub-tasks of two dimensions: the information in any cell is divided into a prediction of its column position and a prediction of its row position. The column position is associated with the column-name information, i.e. a named entity recognition task, and each element is labeled with its "column name" information. The labeling of row information is treated as a "0/1" classification problem: information that completely matches a table row is labeled "1", and information that does not match a table row is labeled "0".
Step two fuses the position information of the tokens on the basis of the three granularities word_c, word_m and word_s. First, word2vec (one possible choice, not a limitation) is adopted to vectorize the tokens of the different scales and obtain the vector feature of each token, and the position structure information of the tokens is then merged in. The scheme performs structural encoding of each token's position: whether the text has only one line or several lines, the position information of each token in the rows and columns of the text is constructed and represented by a connection matrix, defined as A[i,j] = 1 when two tokens are in the same vertical position or adjacent left and right, and A[i,j] = 0 otherwise. Since there are tokens of three different granularities, there are three different connection matrices A_c[i,j], A_m[i,j] and A_s[i,j]. GCN is adopted to train the vectorization of the token information; because each text segment has tokens of three different granularities, the following GCN formula is adopted:
H^(t+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(t) W^(t) )

wherein Ã = A + I, A is the connection (adjacency) matrix and I is the identity matrix; D̃ is the degree matrix of Ã, used to normalize Ã; H^(t) and H^(t+1) respectively represent the encoding of each node in the graph at layer t and layer t+1; W^(t) is the parameter to be learned; and H^(0) = X is the initial input, i.e. the three sets of token vectors. Encoding the three sets of token vectors with this GCN feature-extraction formula yields the vector encodings of the tokens at the three granularities, denoted H_c, H_m and H_s respectively.
Step three encodes the character layer: a pre-trained ALBERT model is adopted, and a BiLSTM layer is spliced on top of the character layer; the result serves as the embedding matrix TE.
Step four: the character encoding forms an encoding matrix TE for each character, and the vectorized tokens of the three granularities form the word encodings. The GAT algorithm is adopted to fuse the word encodings with the character encodings: the tokens are spliced directly after the characters. Assuming the character length is N and the number of tokens is M, an (N+M) x (N+M) adjacency matrix K is constructed, with K[i,j] = 1 when a token contains the information of a character and K[i,j] = 0 otherwise. Based on the three different segmentations, three neighbourhood matrices K_c, K_m and K_s are constructed, and the GAT algorithm is used to splice the word and character encodings. The GAT operation is as follows: the input of the t-th layer is a point set F_t = {f_1, f_2, ..., f_N} together with an adjacency matrix G; a multi-head GAT is used, and the main calculation formulas are:
α_ij^k = softmax_j( LeakyReLU( a^T [ W^k f_i || W^k f_j ] ) )

f'_i = ||_{k=1..K} σ( Σ_{j ∈ v_i} α_ij^k W^k f_j )

wherein f_i ∈ R^F represents the input feature of node i; f'_j ∈ R^(F') represents the output feature of node j; || represents the splicing (concatenation) operation; σ represents a nonlinear activation function; v_i represents the vertices adjacent to i; α_ij^k represents the weight of the edge connecting node i and node j under attention head k; W^k ∈ R^(F'×F) represents the linear transformation matrix used to linearly transform the features; and the two halves of the attention vector a are the weight parameters of the feed-forward neural network. The adjacency matrix G is used to mask the positions of α^k that correspond to unconnected node pairs.
The outputs of layers t = 1, 2, ..., N are obtained in turn, and at the last GAT layer the attention heads are averaged rather than concatenated to compute the final result:

f'_i = σ( (1/K) Σ_{k=1..K} Σ_{j ∈ v_i} α_ij^k W^k f_j )

From this formula, three different word-character fusion vector matrices Q_c, Q_m and Q_s are obtained.
Combining the vector matrices obtained above, the three fusion matrices are fused a second time with the character vectors, and the aggregation formula is: Z = W_1·H + W_2·Q_c + W_3·Q_m + W_4·Q_s, wherein W_1, W_2, W_3 and W_4 are the parameter matrices to be trained, H is the character encoding matrix, and Z is the final vector matrix formed for the characters.
Step five labels the text as a sequence: similar to a named entity recognition task, the characters of the text are labeled in BIO form, and the column information is trained with a cross-entropy loss function, which is defined as NER_loss.
Step six extracts the character vectors based on the result of the column-information prediction. Considering the requirements of the downstream task, the character information determined to be an entity is extracted; since words in Chinese have different lengths, the character vectors contained in each word are aggregated by their mean to form the basic vector information for predicting row information:

v_word = (1/n) Σ_{i=1..n} c_i

where c_1, ..., c_n are the character vectors contained in the word. A word vector is thus obtained for each column. The information of the columns is then combined in an editable way to form row information; this is an editable process. For a general field, the information of each column can be freely combined to form all possible rows: assuming there are n columns and M_i entities are extracted from a piece of text for the i-th column, SUM = M_1 * M_2 * ... * M_n combinations of row information are formed. Supplementary information: for a specific private domain, rules of that domain may be added when forming the row-information combinations, so that the rows formed comply with the rule requirements of the field; this is a freely editable module.
Step seven first encodes each word vector of the randomly combined rows; the vector formed for each word is used as a node vector of the Graph network. The scheme then applies the GAT operation again to encode and learn the column information inside each freely combined row; the operation is the same as in step four and differs only in the adjacency matrix G. Vector information R is thus formed for each row. In the training process, because the rows are randomly combined, a randomly combined row is labeled 1 when it appears in the annotated row information and 0 otherwise, which keeps it consistent with the preprocessed row information. By comparing the prediction for each random combination with its 0/1 label, the row information is trained with a cross-entropy loss function, which is defined as structure_loss.
Step eight weights the loss functions of the column task and the row task to obtain the total loss function Loss = NER_loss + α · structure_loss, which serves as the total loss function of the model, where α is an adjustable hyper-parameter; the model is trained based on this loss function, and the result of the model is finally obtained.
The working principle of the invention is as follows: the idea of a joint model is adopted; several tasks, such as text information extraction, table-relation judgment and deciding whether pieces of information form a column, are fused into one multi-task model; several sub-tasks are solved by a single model; and standard table-structure information is finally formed.
After the above technical scheme is adopted, the invention has the following beneficial effects: with this model, unstructured text is converted directly into table data, reaching the commercialization standard in the financial data field; compared with the Pipeline form, accuracy is improved by 3-5 percentage points, and the Pipeline's error-propagation problem is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of directly converting unstructured text into a table in the traditional financial field;
FIG. 2 is a schematic diagram of the learning and training process of the algorithm model of the present invention;
FIG. 3 is a diagram of the deep learning model architecture for the text structured into tables based on graph attention in accordance with the present invention.
Detailed Description
Referring to figs. 1 to 3, the present invention comprises the following steps: step one, preprocessing: cleaning the data, segmenting the text into words, forming characters and words, and labeling the table rows; step two, vectorizing the words; step three, a character encoding layer; step four, a connection layer joining the character encoding and the word encoding; step five, predicting the column information; step six, preprocessing for the prediction of row information; step seven, predicting the row information; step eight, setting the total loss function.
Further, step one is preprocessing: data cleaning, in which irregular data are cleaned and replaced, for example full-width/half-width conversion and removal of special symbols such as emoticons. A multi-dimensional word-segmentation method is then established to segment the text information: the first dimension uses obvious delimiters such as spaces, commas, semicolons and tab characters to divide the text into short clauses; the second dimension uses regular expressions to extract elements such as characters and numbers in the text and divide the short clauses into medium-granularity tokens of characters and numbers; the third dimension uses jieba word segmentation to cut the characters and numbers at a finer granularity. Tokens of three granularities are thus formed, denoted word_c, word_m and word_s. For the token information of the three granularities, since table information is two-dimensional N x M information, the scheme splits the two-dimensional information into sub-tasks of two dimensions: the information in any cell is divided into a prediction of its column position and a prediction of its row position. The column position is associated with the column-name information, i.e. a named entity recognition task, and each element is labeled with its "column name" information. The labeling of row information is treated as a "0/1" classification problem: information that completely matches a table row is labeled "1", and information that does not match a table row is labeled "0".
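As a rough illustration of this multi-granularity segmentation, the sketch below assumes the jieba library is available; the delimiters, the regular expression and the function name are illustrative examples rather than the exact rules of the patent.

```python
# Illustrative sketch of the three-granularity segmentation (assumed details).
import re
import jieba  # third-party Chinese word-segmentation library

def segment_three_granularities(text: str):
    # word_c: coarse clauses split on obvious delimiters (space, comma, semicolon, tab).
    word_c = [s for s in re.split(r"[ \t,，;；]+", text) if s]
    # word_m: medium-granularity tokens (runs of Chinese characters, numbers, percentages).
    word_m = re.findall(r"[\u4e00-\u9fa5]+|\d+(?:\.\d+)?%?", text)
    # word_s: fine-granularity tokens from jieba applied to the medium-granularity pieces.
    word_s = [t for piece in word_m for t in jieba.lcut(piece) if t.strip()]
    return word_c, word_m, word_s
```

Each cell of the labeled table is then aligned with these tokens: entities receive a "column name" tag and every candidate row receives a 0/1 label, as described above.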
Further, step two fuses the position information of the tokens on the basis of the three granularities word_c, word_m and word_s. First, word2vec (one possible choice, not a limitation) is adopted to vectorize the tokens of the different scales and obtain the vector feature of each token, and the position structure information of the tokens is then merged in. The scheme performs structural encoding of each token's position: whether the text has only one line or several lines, the position information of each token in the rows and columns of the text is constructed and represented by a connection matrix, defined as A[i,j] = 1 when two tokens are in the same vertical position or adjacent left and right, and A[i,j] = 0 otherwise. Since there are tokens of three different granularities, there are three different connection matrices A_c[i,j], A_m[i,j] and A_s[i,j]. GCN is adopted to train the vectorization of the token information; because each text segment has tokens of three different granularities, the following GCN formula is adopted:
H^(t+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(t) W^(t) )

wherein Ã = A + I, A is the connection (adjacency) matrix and I is the identity matrix; D̃ is the degree matrix of Ã, used to normalize Ã; H^(t) and H^(t+1) respectively represent the encoding of each node in the graph at layer t and layer t+1; W^(t) is the parameter to be learned; and H^(0) = X is the initial input, i.e. the three sets of token vectors. Encoding the three sets of token vectors with this GCN feature-extraction formula yields the vector encodings of the tokens at the three granularities, denoted H_c, H_m and H_s respectively.
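A minimal PyTorch sketch of this GCN step is given below; it assumes the word2vec vectors X and the connection matrices A_c, A_m and A_s have already been built, and all class and variable names are illustrative rather than taken from the patent.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN layer: H_next = relu(D̃^-1/2 (A + I) D̃^-1/2 H W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # parameter to be learned

    def forward(self, H: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        A_tilde = A + torch.eye(A.size(0), device=A.device)        # Ã = A + I
        d_inv_sqrt = A_tilde.sum(dim=1).pow(-0.5)                  # D̃^-1/2 (diagonal)
        A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
        return torch.relu(A_hat @ self.W(H))

# One stack per granularity, e.g. H_c = GCNLayer(d, d)(X_c, A_c),
# and similarly with A_m and A_s to obtain H_m and H_s.
```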
Further, step three encodes the character layer: a pre-trained ALBERT model is adopted, and a BiLSTM layer is spliced on top of the character layer; the result serves as the embedding matrix TE.
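A minimal sketch of this character encoder follows, assuming a pre-trained Chinese ALBERT checkpoint loaded through the transformers library; the checkpoint name and hidden size are assumptions made for illustration only.

```python
import torch.nn as nn
from transformers import AutoModel

class CharEncoder(nn.Module):
    def __init__(self, name: str = "voidful/albert_chinese_base", hidden: int = 256):
        super().__init__()
        self.albert = AutoModel.from_pretrained(name)        # pre-trained ALBERT
        self.bilstm = nn.LSTM(self.albert.config.hidden_size, hidden,
                              batch_first=True, bidirectional=True)

    def forward(self, input_ids, attention_mask):
        chars = self.albert(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state
        TE, _ = self.bilstm(chars)   # embedding matrix TE of each character
        return TE
```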
Further, in step four the character encoding forms an encoding matrix TE for each character, and the vectorized tokens of the three granularities form the word encodings. The GAT algorithm is adopted to fuse the word encodings with the character encodings: the tokens are spliced directly after the characters. Assuming the character length is N and the number of tokens is M, an (N+M) x (N+M) adjacency matrix K is constructed, with K[i,j] = 1 when a token contains the information of a character and K[i,j] = 0 otherwise. Based on the three different segmentations, three neighbourhood matrices K_c, K_m and K_s are constructed, and the GAT algorithm is used to splice the word and character encodings. The GAT operation is as follows: the input of the t-th layer is a point set F_t = {f_1, f_2, ..., f_N} together with an adjacency matrix G; a multi-head GAT is used, and the main calculation formulas are:
α_ij^k = softmax_j( LeakyReLU( a^T [ W^k f_i || W^k f_j ] ) )

f'_i = ||_{k=1..K} σ( Σ_{j ∈ v_i} α_ij^k W^k f_j )

wherein f_i ∈ R^F represents the input feature of node i; f'_j ∈ R^(F') represents the output feature of node j; || represents the splicing (concatenation) operation; σ represents a nonlinear activation function; v_i represents the vertices adjacent to i; α_ij^k represents the weight of the edge connecting node i and node j under attention head k; W^k ∈ R^(F'×F) represents the linear transformation matrix used to linearly transform the features; and the two halves of the attention vector a are the weight parameters of the feed-forward neural network. The adjacency matrix G is used to mask the positions of α^k that correspond to unconnected node pairs.
The outputs of layers t = 1, 2, ..., N are obtained in turn, and at the last GAT layer the attention heads are averaged rather than concatenated to compute the final result:

f'_i = σ( (1/K) Σ_{k=1..K} Σ_{j ∈ v_i} α_ij^k W^k f_j )

From this formula, three different word-character fusion vector matrices Q_c, Q_m and Q_s are obtained.
Combining the vector matrices obtained above, the three fusion matrices are fused a second time with the character vectors, and the aggregation formula is: Z = W_1·H + W_2·Q_c + W_3·Q_m + W_4·Q_s, wherein W_1, W_2, W_3 and W_4 are the parameter matrices to be trained, H is the character encoding matrix, and Z is the final vector matrix formed for the characters.
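To make the fusion concrete, the sketch below uses a single-head GAT layer that masks attention with the adjacency matrix and then applies the aggregation Z = W1·H + W2·Q_c + W3·Q_m + W4·Q_s; the patent describes a multi-head variant, and slicing the character rows out of the fused graph is an assumption made here only for shape compatibility.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # linear transformation W^k
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # feed-forward attention weights

    def forward(self, F_in: torch.Tensor, G: torch.Tensor) -> torch.Tensor:
        h = self.W(F_in)                                   # (N+M, out_dim) node features
        n = h.size(0)
        pair = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                          h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pair).squeeze(-1))         # attention logits
        e = e.masked_fill(G == 0, -1e9)                    # mask positions not connected in G
        alpha = torch.softmax(e, dim=-1)
        return F.elu(alpha @ h)

class Fusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gat = GATLayer(dim, dim)
        self.W = nn.ModuleList([nn.Linear(dim, dim, bias=False) for _ in range(4)])

    def forward(self, H, nodes_c, K_c, nodes_m, K_m, nodes_s, K_s):
        # nodes_* concatenate the N character vectors with the M word vectors;
        # Q_* keep only the character rows of the fused graphs (assumed choice).
        N = H.size(0)
        Q_c = self.gat(nodes_c, K_c)[:N]
        Q_m = self.gat(nodes_m, K_m)[:N]
        Q_s = self.gat(nodes_s, K_s)[:N]
        return self.W[0](H) + self.W[1](Q_c) + self.W[2](Q_m) + self.W[3](Q_s)  # Z
```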
Furthermore, step five labels the text as a sequence: similar to a named entity recognition task, the characters of the text are labeled in BIO form, and the column information is trained with a cross-entropy loss function, which is defined as NER_loss.
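As an illustration of this column-prediction head, the sketch below classifies every fused character vector into BIO tags over an assumed set of column names and computes NER_loss with cross-entropy; the tag scheme and dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_COLUMNS = 3     # e.g. bond name, benchmark, amount (assumed column set)
CHAR_DIM = 512      # dimension of the fused character vectors Z (assumed)

bio_head = nn.Linear(CHAR_DIM, 2 * NUM_COLUMNS + 1)   # B-/I- per column plus O

def ner_loss(Z: torch.Tensor, bio_labels: torch.Tensor) -> torch.Tensor:
    # Z: (seq_len, CHAR_DIM) fused character matrix; bio_labels: (seq_len,) tag indices
    logits = bio_head(Z)
    return F.cross_entropy(logits, bio_labels)
```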
Further, step six extracts the character vectors based on the result of the column-information prediction. Considering the requirements of the downstream task, the character information determined to be an entity is extracted; since words in Chinese have different lengths, the character vectors contained in each word are aggregated by their mean to form the basic vector information for predicting row information:

v_word = (1/n) Σ_{i=1..n} c_i

where c_1, ..., c_n are the character vectors contained in the word. A word vector is thus obtained for each column. The information of the columns is then combined in an editable way to form row information; this is an editable process. For a general field, the information of each column can be freely combined to form all possible rows: assuming there are n columns and M_i entities are extracted from a piece of text for the i-th column, SUM = M_1 * M_2 * ... * M_n combinations of row information are formed. Supplementary information: for a specific private domain, rules of that domain may be added when forming the row-information combinations, so that the rows formed comply with the rule requirements of the field; this is a freely editable module.
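A sketch of this preprocessing step: each extracted entity's character vectors are mean-pooled into one word vector, and the candidate rows are enumerated as the Cartesian product of the per-column entity lists (M_1 * M_2 * ... * M_n combinations); the column names used are hypothetical.

```python
from itertools import product
import torch

def entity_vector(char_vectors: torch.Tensor) -> torch.Tensor:
    # char_vectors: (n_chars, dim) vectors of one extracted entity -> mean-pooled word vector
    return char_vectors.mean(dim=0)

def candidate_rows(columns: dict):
    # columns maps a column name to the list of entity vectors found for it,
    # e.g. {"bond": [...], "benchmark": [...], "amount": [...]} (names assumed).
    names = list(columns)
    for combo in product(*(columns[n] for n in names)):   # M_1 * ... * M_n candidates
        yield dict(zip(names, combo))
```

For a private domain, the free product above can be filtered or replaced by the domain rules mentioned in the text.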
Further, step seven first encodes each word vector of the randomly combined rows; the vector formed for each word is used as a node vector of the Graph network. The scheme then applies the GAT operation again to encode and learn the column information inside each freely combined row; the operation is the same as in step four and differs only in the adjacency matrix G. Vector information R is thus formed for each row. In the training process, because the rows are randomly combined, a randomly combined row is labeled 1 when it appears in the annotated row information and 0 otherwise, which keeps it consistent with the preprocessed row information. By comparing the prediction for each random combination with its 0/1 label, the row information is trained with a cross-entropy loss function, which is defined as structure_loss.
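For the row prediction, a simplified scorer is sketched below: the GAT-encoded column vectors of one candidate row are pooled into the row vector R and classified as a valid (1) or invalid (0) table row; the pooling choice and layer sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class RowScorer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, row_nodes: torch.Tensor) -> torch.Tensor:
        # row_nodes: (n_columns, dim) column vectors of one candidate row after GAT encoding
        R = row_nodes.mean(dim=0)        # row vector R
        return self.ff(R)                # logits for the 0/1 "valid row" label
```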
Further, step eight weights the loss functions of the column task and the row task to obtain the total loss function Loss = NER_loss + α · structure_loss, which serves as the total loss function of the model, where α is an adjustable hyper-parameter; the model is trained based on this loss function, and the result of the model is finally obtained.
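Finally, the joint objective can be written in a few lines; the value of α below is only an example, since the patent leaves it adjustable.

```python
import torch.nn.functional as F

ALPHA = 0.5   # adjustable hyper-parameter (example value, not from the patent)

def total_loss(ner_logits, ner_labels, row_logits, row_labels):
    ner_loss = F.cross_entropy(ner_logits, ner_labels)         # column task (step five)
    structure_loss = F.cross_entropy(row_logits, row_labels)   # row task (step seven)
    return ner_loss + ALPHA * structure_loss                   # Loss = NER_loss + α·structure_loss
```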
Furthermore, the invention adopts jieba word segmentation with an added feature dictionary for multi-granularity segmentation and word2vec for vectorization, but other word-vectorization and word-segmentation methods, as well as new techniques that appear in the future, may also be used. The present application tabulates financial data, but the technique is not limited to financial data and can be applied to any other task that needs to structure a piece of text into a table.
The working principle of the invention is as follows: the idea of a joint model is adopted; several tasks, such as text information extraction, table-relation judgment and deciding whether pieces of information form a column, are fused into one multi-task model; several sub-tasks are solved by a single model; and standard table-structure information is finally formed.
After the above technical scheme is adopted, the invention has the following beneficial effects: with this model, unstructured text is converted directly into table data, reaching the commercialization standard in the financial data field; compared with the Pipeline form, accuracy is improved by 3-5 percentage points, and the Pipeline's error-propagation problem is reduced. The above description only illustrates the technical solutions of the present invention and is not intended to limit them; other modifications or equivalent substitutions made by those skilled in the art to the technical solutions of the present invention, without departing from the spirit and scope of those solutions, should all be covered by the scope of the claims of the present invention.

Claims (8)

1. A deep learning model device for structuring financial text into a table, characterized by comprising the following steps: step one, preprocessing: cleaning the data, segmenting the text into words, forming characters and words, and labeling the table rows; step two, vectorizing the words; step three, a character encoding layer; step four, a connection layer joining the character encoding and the word encoding; step five, predicting the column information; step six, preprocessing for the prediction of row information; step seven, predicting the row information; step eight, setting the total loss function.
2. The deep learning model device for structuring financial text into a table according to claim 1, characterized in that: step one is preprocessing: data cleaning, in which irregular data are cleaned and replaced, for example full-width/half-width conversion and removal of special symbols such as emoticons; a multi-dimensional word-segmentation method is established to segment the text information, in which the first dimension uses obvious delimiters such as spaces, commas, semicolons and tab characters to divide the text into short clauses, the second dimension uses regular expressions to extract elements such as characters and numbers in the text and divide the short clauses into medium-granularity tokens of characters and numbers, and the third dimension uses jieba word segmentation to cut the characters and numbers at a finer granularity, thereby forming tokens of three granularities, denoted word_c, word_m and word_s; for the token information of the three granularities, since table information is two-dimensional N x M information, the scheme splits the two-dimensional information into sub-tasks of two dimensions: the information in any cell is divided into a prediction of its column position and a prediction of its row position, the column position is associated with the column-name information, i.e. a named entity recognition task, and each element is labeled with its "column name" information; the labeling of row information is treated as a "0/1" classification problem, where information that completely matches a table row is labeled "1" and information that does not match a table row is labeled "0".
3. The deep learning model device for structuring financial text into a table according to claim 1, characterized in that: step two fuses the position information of the tokens on the basis of the three granularities word_c, word_m and word_s; first, word2vec (one possible choice, not a limitation) is adopted to vectorize the tokens of the different scales and obtain the vector feature of each token, and the position structure information of the tokens is then merged in; the scheme performs structural encoding of each token's position: whether the text has only one line or several lines, the position information of each token in the rows and columns of the text is constructed and represented by a connection matrix, defined as A[i,j] = 1 when two tokens are in the same vertical position or adjacent left and right, and A[i,j] = 0 otherwise; since there are tokens of three different granularities, there are three different connection matrices A_c[i,j], A_m[i,j] and A_s[i,j]; GCN is adopted to train the vectorization of the token information, and because each text segment has tokens of three different granularities, the following GCN formula is adopted:
H^(t+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(t) W^(t) )

wherein Ã = A + I, A is the connection (adjacency) matrix and I is the identity matrix; D̃ is the degree matrix of Ã, used to normalize Ã; H^(t) and H^(t+1) respectively represent the encoding of each node in the graph at layer t and layer t+1; W^(t) is the parameter to be learned; and H^(0) = X is the initial input, i.e. the three sets of token vectors; encoding the three sets of token vectors with this GCN feature-extraction formula yields the vector encodings of the tokens at the three granularities, denoted H_c, H_m and H_s respectively.
4. The deep learning model device for structuring financial text into a table according to claim 1, characterized in that: step three encodes the character layer: a pre-trained ALBERT model is adopted, and a BiLSTM layer is spliced on top of the character layer, the result serving as the embedding matrix TE.
5. The deep learning model device for structuring financial text into a table according to claim 1, characterized in that: in step four the character encoding forms an encoding matrix TE for each character, and the vectorized tokens of the three granularities form the word encodings; the GAT algorithm is adopted to fuse the word encodings with the character encodings, the tokens being spliced directly after the characters; assuming the character length is N and the number of tokens is M, an (N+M) x (N+M) adjacency matrix K is constructed, with K[i,j] = 1 when a token contains the information of a character and K[i,j] = 0 otherwise; based on the three different segmentations, three neighbourhood matrices K_c, K_m and K_s are constructed, and the GAT algorithm is used to splice the word and character encodings; the GAT operation is as follows: the input of the t-th layer is a point set F_t = {f_1, f_2, ..., f_N} together with an adjacency matrix G; a multi-head GAT is used, and the main calculation formulas are:
α_ij^k = softmax_j( LeakyReLU( a^T [ W^k f_i || W^k f_j ] ) )

f'_i = ||_{k=1..K} σ( Σ_{j ∈ v_i} α_ij^k W^k f_j )

wherein f_i ∈ R^F represents the input feature of node i; f'_j ∈ R^(F') represents the output feature of node j; || represents the splicing (concatenation) operation; σ represents a nonlinear activation function; v_i represents the vertices adjacent to i; α_ij^k represents the weight of the edge connecting node i and node j under attention head k; W^k ∈ R^(F'×F) represents the linear transformation matrix used to linearly transform the features; and the two halves of the attention vector a are the weight parameters of the feed-forward neural network; the adjacency matrix G is used to mask the positions of α^k that correspond to unconnected node pairs.
The outputs of layers t = 1, 2, ..., N are obtained in turn, and at the last GAT layer the attention heads are averaged rather than concatenated to compute the final result:

f'_i = σ( (1/K) Σ_{k=1..K} Σ_{j ∈ v_i} α_ij^k W^k f_j )

From this formula, three different word-character fusion vector matrices Q_c, Q_m and Q_s are obtained.
Combining the vector matrices obtained above, the three fusion matrices are fused a second time with the character vectors, and the aggregation formula is:

Z = W_1·H + W_2·Q_c + W_3·Q_m + W_4·Q_s

wherein W_1, W_2, W_3 and W_4 are the parameter matrices to be trained, H is the character encoding matrix, and Z is the final vector matrix formed for the characters.
6. The deep learning model device for structuring financial text into a table according to claim 1, characterized in that: step five labels the text as a sequence: similar to a named entity recognition task, the characters of the text are labeled in BIO form, and the column information is trained with a cross-entropy loss function, which is defined as NER_loss.
7. The deep learning model device for structuring financial text into a table according to claim 1, characterized in that: step six extracts the character vectors based on the result of the column-information prediction; considering the requirements of the downstream task, the character information determined to be an entity is extracted, and since words in Chinese have different lengths, the character vectors contained in each word are aggregated by their mean to form the basic vector information for predicting row information:

v_word = (1/n) Σ_{i=1..n} c_i

where c_1, ..., c_n are the character vectors contained in the word; a word vector is thus obtained for each column; the information of the columns is then combined in an editable way to form row information; for a general field, the information of each column can be freely combined to form all possible rows: assuming there are n columns and M_i entities are extracted from a piece of text for the i-th column, SUM = M_1 * M_2 * ... * M_n combinations of row information are formed; supplementary information: for a specific private domain, rules of that domain may be added when forming the row-information combinations, so that the rows formed comply with the rule requirements of the field; this is a freely editable module.
8. The deep learning model device for structuring financial text into a table according to claim 1, characterized in that: step seven first encodes each word vector of the randomly combined rows, the vector formed for each word being used as a node vector of the Graph network; the scheme then applies the GAT operation again to encode and learn the column information inside each freely combined row, the operation being the same as in step four and differing only in the adjacency matrix G; vector information R is thus formed for each row; in the training process, because the rows are randomly combined, a randomly combined row is labeled 1 when it appears in the annotated row information and 0 otherwise, which keeps it consistent with the preprocessed row information; by comparing the prediction for each random combination with its 0/1 label, the row information is trained with a cross-entropy loss function, which is defined as structure_loss; step eight weights the loss functions of the column task and the row task to obtain the total loss function Loss = NER_loss + α · structure_loss as the total loss function of the model, where α is an adjustable hyper-parameter; the model is trained based on this loss function, and the result of the model is finally obtained.
CN202110415793.3A 2021-04-19 2021-04-19 Deep learning model device for structuring financial text into form Pending CN113326676A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110415793.3A CN113326676A (en) 2021-04-19 2021-04-19 Deep learning model device for structuring financial text into form

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110415793.3A CN113326676A (en) 2021-04-19 2021-04-19 Deep learning model device for structuring financial text into form

Publications (1)

Publication Number Publication Date
CN113326676A true CN113326676A (en) 2021-08-31

Family

ID=77414835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110415793.3A Pending CN113326676A (en) 2021-04-19 2021-04-19 Deep learning model device for structuring financial text into form

Country Status (1)

Country Link
CN (1) CN113326676A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761131A (en) * 2021-09-07 2021-12-07 上海快确信息科技有限公司 Deep learning model device for structuring text into form

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299262A (en) * 2018-10-09 2019-02-01 中山大学 A kind of text implication relation recognition methods for merging more granular informations
CN110334354A (en) * 2019-07-11 2019-10-15 清华大学深圳研究生院 A kind of Chinese Relation abstracting method
CN110472235A (en) * 2019-07-22 2019-11-19 北京航天云路有限公司 A kind of end-to-end entity relationship joint abstracting method towards Chinese text
US20200065374A1 (en) * 2018-08-23 2020-02-27 Shenzhen Keya Medical Technology Corporation Method and system for joint named entity recognition and relation extraction using convolutional neural network
US20200090034A1 (en) * 2018-09-18 2020-03-19 Salesforce.Com, Inc. Determining Intent from Unstructured Input to Update Heterogeneous Data Stores
CN111309915A (en) * 2020-03-03 2020-06-19 爱驰汽车有限公司 Method, system, device and storage medium for training natural language of joint learning
US20210011974A1 (en) * 2019-07-12 2021-01-14 Adp, Llc Named-entity recognition through sequence of classification using a deep learning neural network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200065374A1 (en) * 2018-08-23 2020-02-27 Shenzhen Keya Medical Technology Corporation Method and system for joint named entity recognition and relation extraction using convolutional neural network
US20200090034A1 (en) * 2018-09-18 2020-03-19 Salesforce.Com, Inc. Determining Intent from Unstructured Input to Update Heterogeneous Data Stores
CN109299262A (en) * 2018-10-09 2019-02-01 中山大学 A kind of text implication relation recognition methods for merging more granular informations
CN110334354A (en) * 2019-07-11 2019-10-15 清华大学深圳研究生院 A kind of Chinese Relation abstracting method
US20210011974A1 (en) * 2019-07-12 2021-01-14 Adp, Llc Named-entity recognition through sequence of classification using a deep learning neural network
CN110472235A (en) * 2019-07-22 2019-11-19 北京航天云路有限公司 A kind of end-to-end entity relationship joint abstracting method towards Chinese text
CN111309915A (en) * 2020-03-03 2020-06-19 爱驰汽车有限公司 Method, system, device and storage medium for training natural language of joint learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DHRUV GUPTA ET AL.: "Weaving Text into Tables", PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT (CIKM ’20), 23 October 2020 (2020-10-23), pages 3401 - 3404, XP059105897, DOI: 10.1145/3340531.3417442 *
PANKAJ GUPTA ET AL.: "Table Filling Multi-Task Recurrent Neural Network for Joint Entity and Relation Extraction", PROCEEDINGS OF COLING 2016, THE 26TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS: TECHNICAL PAPERS, 17 December 2016 (2016-12-17), pages 2537 - 2547, XP055737003 *
任鹏程 et al.: "Joint Entity-Relation Extraction with a Dependency-Constrained Graph Network" (依存约束的图网络实体关系联合抽取), Computer Systems & Applications (计算机系统应用), vol. 30, no. 3, 15 March 2021 (2021-03-15), pages 23 - 32 *
王晓霞 et al.: "Relation Extraction Model Based on Attention and Graph Convolutional Networks" (基于注意力与图卷积网络的关系抽取模型), Journal of Computer Applications (计算机应用), vol. 41, no. 2, 10 February 2021 (2021-02-10), pages 350 - 356 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761131A (en) * 2021-09-07 2021-12-07 上海快确信息科技有限公司 Deep learning model device for structuring text into form

Similar Documents

Publication Publication Date Title
Aguilar et al. A multi-task approach for named entity recognition in social media data
Saad et al. Twitter sentiment analysis based on ordinal regression
CN110020438B (en) Sequence identification based enterprise or organization Chinese name entity disambiguation method and device
CN111767725B (en) Data processing method and device based on emotion polarity analysis model
CN110222188A (en) A kind of the company's bulletin processing method and server-side of multi-task learning
CN112434535B (en) Element extraction method, device, equipment and storage medium based on multiple models
CN115688776B (en) Relation extraction method for Chinese financial text
CN114896388A (en) Hierarchical multi-label text classification method based on mixed attention
CN113255321A (en) Financial field chapter-level event extraction method based on article entity word dependency relationship
CN111859983A (en) Natural language labeling method based on artificial intelligence and related equipment
CN113515632A (en) Text classification method based on graph path knowledge extraction
CN116383399A (en) Event public opinion risk prediction method and system
CN111709225B (en) Event causal relationship discriminating method, device and computer readable storage medium
CN113869055A (en) Power grid project characteristic attribute identification method based on deep learning
CN115600605A (en) Method, system, equipment and storage medium for jointly extracting Chinese entity relationship
CN115292490A (en) Analysis algorithm for policy interpretation semantics
CN115098673A (en) Business document information extraction method based on variant attention and hierarchical structure
CN112948588B (en) Chinese text classification method for quick information editing
CN113869054A (en) Deep learning-based electric power field project feature identification method
CN112231476B (en) Improved graphic neural network scientific literature big data classification method
CN113326676A (en) Deep learning model device for structuring financial text into form
CN113761131A (en) Deep learning model device for structuring text into form
CN111666375A (en) Matching method of text similarity, electronic equipment and computer readable medium
CN111259106A (en) Relation extraction method combining neural network and feature calculation
CN116108127A (en) Document level event extraction method based on heterogeneous graph interaction and mask multi-head attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination