CN113158674A - Method for extracting key information of document in field of artificial intelligence

Info

Publication number
CN113158674A
Authority
CN
China
Prior art keywords
model, layer, text, input, subject
Prior art date
Legal status
Granted
Application number
CN202110353610.XA
Other languages
Chinese (zh)
Other versions
CN113158674B (en)
Inventor
曲晨帆
金连文
林上港
马骏
刘振鑫
谭濯
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110353610.XA priority Critical patent/CN113158674B/en
Publication of CN113158674A publication Critical patent/CN113158674A/en
Application granted granted Critical
Publication of CN113158674B publication Critical patent/CN113158674B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The invention discloses a method for extracting key information from documents in the field of artificial intelligence, comprising the following steps: S1, collecting document data in the artificial intelligence field and annotating it for key information extraction; S2, further pre-training the pre-trained RoBERTa model; S3, constructing an information extraction model; S4, initializing the parameters of the backbone network with the further pre-trained RoBERTa model; S5, training with the labeled data, applying random replacement and data augmentation to the labeled data during training, and computing the back-propagated error with a squared binary cross-entropy loss; S6, extracting information from unstructured text in the artificial intelligence field with the trained information extraction model to obtain result triples. By treating information extraction as a machine reading comprehension task and predicting the start and end positions of each piece of key information in the text, the method overcomes the sharp performance drop that sequence labeling models suffer on long-span knowledge text.

Description

Method for extracting key information of document in field of artificial intelligence
Technical Field
The invention belongs to the technical field of artificial intelligence natural language processing, and particularly relates to a method for extracting key information of a document in the field of artificial intelligence.
Background
Massive unstructured text documents in the field of artificial intelligence contain abundant knowledge. Structuring them would greatly enrich the ways in which people can acquire related knowledge and lower the difficulty of doing so. However, the traditional, largely manual structuring approach consumes considerable human resources and is inefficient, so it is not an optimal solution. In contrast, using machines for key information extraction and knowledge structuring is an efficient and economical approach.
At present, more and more key information extraction methods based on deep learning have been proposed, but certain shortcomings remain. Extraction methods based on sequence labeling suit texts with short spans, but struggle to produce complete results when subjects and objects span long stretches of text. The machine-reading-comprehension-based information extraction model HBT can alleviate this problem, but applying it directly is not effective. In addition, knowledge texts in natural science fields such as artificial intelligence contain many kinds of knowledge, and exhaustively enumerating all relationship types is unrealistic. Open information extraction can address this, but much of the existing research focuses on extraction within a single sentence, and most methods rely on syntactic analysis with rules predefined by human experts. In practice, the ways knowledge is expressed in the relevant texts vary widely and extraction must operate at the level of whole paragraphs, so defining rules with broad coverage and good extensibility is very difficult, while machine learning approaches face weak model generalization caused by the difficulty of the task and the scarcity of labeled data.
Disclosure of Invention
The main purpose of the invention is to overcome the shortcomings of the prior art and provide a method for extracting key information from documents in the field of artificial intelligence. By treating information extraction as a machine reading comprehension task and predicting the start and end positions of each piece of key information in the text, the method overcomes the sharp performance drop that sequence labeling models suffer on long-span knowledge text.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for extracting document key information in the field of artificial intelligence comprises the following steps:
s1, collecting document data in the artificial intelligence field, and then performing key information extraction data annotation by using the collected data;
s2, performing further pre-training on the pre-training model RoBERTA in unstructured texts in the field of artificial intelligence;
s3, constructing an information extraction model;
s4, initializing backbone network parameters of the information extraction model by using the RoBERTA model obtained by further pre-training;
s5, training by using the marked data, carrying out random replacement and data enhancement on the marked data in the training process, and calculating the error of back propagation by using the square cross entropy loss;
and S6, extracting information in the unstructured text in the field of artificial intelligence by using the trained information extraction model to obtain result triples, and integrating the result triples.
Further, the step S1 specifically includes:
S11, collecting unstructured text paragraphs from scientific publications, papers, and popular-science web content related to the field of artificial intelligence, limiting each text paragraph to at most 510 characters;
s12, defining the type of the key information triple to be extracted, specifically:
the general relationship definition method is adopted to define 5 triple types:
entity - description content, entity - proposer name, entity - inclusion content, entity - application content, and entity - alias name;
defining 4 triple types by adopting a pseudo relation definition method:
entity attribute-pseudo relationship 1-entity, entity attribute-pseudo relationship 2-description content, entity attribute-pseudo relationship 3-application content, and entity attribute-pseudo relationship 4-inclusion content;
s13, labeling the defined triplet type, specifically:
open the text to be annotated in the open-source text annotation tool brat; with the mouse cursor, select a span of characters in the text as the starting entity (subject) of a triple and click to choose the subject's entity category in the pop-up selection window; select the ending entity (object) of the triple and its category in the same way; finally, generate a relation link by selecting the subject of the triple with the mouse and dragging it onto the object, and choose the category of the relation link in the pop-up selection window to complete the annotation of that triple. Repeat these steps until all triples in all texts to be annotated have been labeled.
Further, the RoBERTa model specifically comprises three Embedding layers with feature dimension 756, twelve Transformer layers with feature dimension 756, and a fully-connected layer whose number of input channels is 756 and whose number of output channels is the total number of character types in all training text data;
the three Embedding layers are a Token Embedding layer, a Position Embedding layer and a Segment Embedding layer respectively;
each of the three Embedding layers maps the text data input to the model into a feature vector of shape (number of input text segments) × 512 × 756; the sum of the three output feature vectors, again of shape (number of input text segments) × 512 × 756, serves as the overall output of the Embedding layers and as the input of the twelve Transformer layers of the RoBERTa model. The twelve Transformer layers output a feature vector of shape (number of input text segments) × 512 × 756, which is fed to the fully-connected layer. The output of the fully-connected layer is the model's probability prediction, for each character of each word replaced by the preset mark in the input text segment, over each character in the dictionary, where the dictionary is the set of all characters in all input training text data.
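For concreteness, the backbone described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the patent's code: the class and parameter names are invented, and the 756 feature dimension simply follows the text.

```python
import torch
import torch.nn as nn

class RoBERTaBackbone(nn.Module):
    """Minimal sketch of the backbone described above; dimensions follow the text."""
    def __init__(self, vocab_size: int, hidden: int = 756, max_len: int = 512, n_layers: int = 12):
        super().__init__()
        # Three Embedding layers: Token, Position and Segment
        self.token_emb = nn.Embedding(vocab_size, hidden)
        self.pos_emb = nn.Embedding(max_len, hidden)
        self.seg_emb = nn.Embedding(2, hidden)
        # Twelve Transformer layers with feature dimension 756 (756 = 12 heads x 63)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Fully-connected layer: 756 inputs, one output per character type in the dictionary
        self.lm_head = nn.Linear(hidden, vocab_size)

    def forward(self, token_ids: torch.Tensor, seg_ids: torch.Tensor) -> torch.Tensor:
        # token_ids, seg_ids: (num_text_segments, 512)
        pos_ids = torch.arange(token_ids.size(1), device=token_ids.device)
        # The three embedding outputs are summed: (num_text_segments, 512, 756)
        x = self.token_emb(token_ids) + self.pos_emb(pos_ids) + self.seg_emb(seg_ids)
        h = self.encoder(x)       # (num_text_segments, 512, 756)
        return self.lm_head(h)    # per-position scores over the character dictionary
```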
Further, the step S2 is specifically:
for the pre-trained RoBERTa model, first segment the training text with the jieba word segmentation tool, and initialize the RoBERTa model parameters to be trained with the pre-trained RoBERTa parameters. Then, in each iteration, based on the jieba segmentation result, randomly replace some of the segmented words with a preset mark, feed the processed result into the RoBERTa model, and have the model predict the words that were replaced by the mark.
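A minimal sketch of this whole-word masking step, assuming jieba and a [MASK]-style preset mark; the function name and the masking probability are illustrative, not from the patent.

```python
import random
import jieba

MASK = "[MASK]"  # the preset mark used in the embodiment

def whole_word_mask(text: str, mask_prob: float = 0.15):
    """Randomly replace whole jieba-segmented words with the preset mark.
    Returns the masked character sequence and the original one as the target."""
    target = list(text)
    masked = []
    for word in jieba.cut(text):
        if random.random() < mask_prob:
            masked.extend([MASK] * len(word))   # mask every character of the word
        else:
            masked.extend(list(word))
    return masked, target  # model input, prediction target

inp, tgt = whole_word_mask("卷积神经网络广泛应用于图像识别")
```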
Further, the constructing of the information extraction model specifically includes:
based on the RoBERTa model, a subject prediction module is added after the 10th Transformer layer of the RoBERTa model, a feature fusion module is added after the subject prediction module, and a predicate-object prediction module is added after the feature fusion module;
the subject prediction module specifically comprises a fully-connected layer with 756 input channels and 2 output channels, and a ReLU layer, a Dropout layer and a Sigmoid activation layer connected to it;
the feature fusion module specifically comprises a fully-connected layer with 1512 input channels and 756 output channels, a ReLU layer and a Dropout layer connected to it, and the last two Transformer layers of RoBERTa;
the predicate-object prediction module specifically comprises a fully-connected layer with 756 input channels and 2 × (total number of predicate categories) output channels, and a ReLU layer, a Dropout layer and a Sigmoid activation layer connected to it, as sketched below.
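The two prediction heads might be sketched as follows. The ordering of the ReLU/Dropout/Sigmoid layers after the fully-connected layer is one reading of the text, the dropout rate is an assumption, and the total number of predicate categories is taken as 9 here (5 general plus 4 pseudo-relation triple types from S12).

```python
import torch.nn as nn

HIDDEN, N_PRED = 756, 9  # feature dim from the text; 9 predicate categories assumed

class SubjectHead(nn.Module):
    """FC(756 -> 2) + ReLU + Dropout + Sigmoid: start/end probability per character."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(HIDDEN, 2), nn.ReLU(),
                                 nn.Dropout(0.1), nn.Sigmoid())
    def forward(self, h):            # h: (batch, 512, 756)
        return self.net(h)           # (batch, 512, 2): subject start/end probabilities

class PredicateObjectHead(nn.Module):
    """FC(756 -> 2 * n_predicates) + ReLU + Dropout + Sigmoid."""
    def __init__(self, n_pred: int = N_PRED):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(HIDDEN, 2 * n_pred), nn.ReLU(),
                                 nn.Dropout(0.1), nn.Sigmoid())
    def forward(self, h):            # h: (batch, 512, 756) fused with subject features
        return self.net(h)           # (batch, 512, 2 * n_pred): per-predicate start/end
```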
Further, the input of the subject prediction module is the feature vector of shape (number of input text segments) × 512 × 756 output by the 10th Transformer layer of the information extraction model, and its output, for each of the 512 character positions of the original input, is the probability that the position is the start of a subject and the probability that it is the end of a subject;
the feature fusion module fuses the semantic features of a selected subject into the feature vector that the 10th Transformer layer of the RoBERTa model outputs for the input text segment, yielding a feature vector fused with subject features. Its inputs are the output of the 10th Transformer layer, of shape (number of input text segments) × 512 × 756, and the label values of the selected subject's start and end positions, each of shape (number of input text segments) × 1; the selected subject is obtained at each iteration by dynamically and randomly picking one subject from all labeled subjects of each sample in the training text data input to the model;
during training, the feature fusion module first selects, according to the input start and end positions of the subject, the vectors at the corresponding positions of the feature vector output by the 10th Transformer layer of RoBERTa, obtaining two vectors of shape (number of input text segments) × 756; it copies each of them 512 times to obtain two vectors of shape (number of input text segments) × 512 × 756, concatenates them along the feature dimension into one vector of shape (number of input text segments) × 512 × 1512, and feeds the result into the fully-connected layer of the feature fusion module to obtain an output vector of shape (number of input text segments) × 512 × 756; this output is added to the feature vector output by the 10th Transformer layer and passed through the two Transformer layers of the feature fusion module to produce the output of the feature fusion module (see the sketch after this paragraph);
the input of the predicate-object prediction module is the subject-fused feature vector output by the feature fusion module, and its output is, for the selected subject and each predicate category, the probabilities that each character position of the text segment input to the information extraction model is the start character or the end character of the corresponding object.
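The gather-copy-concatenate flow of the feature fusion module might look as follows in PyTorch. Shapes follow the text; passing in the last two RoBERTa Transformer layers as a module, and all names, are assumptions.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Fuses the selected subject's start/end vectors into the 10th-layer features."""
    def __init__(self, last_two_layers: nn.Module, hidden: int = 756):
        super().__init__()
        # Fully-connected layer with 1512 input and 756 output channels, plus ReLU/Dropout
        self.fc = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Dropout(0.1))
        self.last_two = last_two_layers  # the last two Transformer layers of RoBERTa

    def forward(self, h10, start_idx, end_idx):
        # h10: (b, 512, 756); start_idx, end_idx: (b,) subject positions
        b, L, d = h10.shape
        rows = torch.arange(b, device=h10.device)
        start_vec = h10[rows, start_idx]                     # (b, 756)
        end_vec = h10[rows, end_idx]                         # (b, 756)
        # Copy each vector 512 times, then concatenate on the feature dimension
        tiled = torch.cat([start_vec.unsqueeze(1).expand(b, L, d),
                           end_vec.unsqueeze(1).expand(b, L, d)], dim=-1)  # (b, 512, 1512)
        fused = self.fc(tiled) + h10                         # residual add, (b, 512, 756)
        return self.last_two(fused)                          # through the last two Transformers
```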
Further, the initializing the backbone network parameters in step S4 specifically includes:
initializing each Embedding layer of the information extraction model with the corresponding Embedding layer of the further pre-trained RoBERTa model, and initializing each Transformer layer of the information extraction model with the corresponding Transformer layer of the further pre-trained RoBERTa model;
the initial parameters of the fully-connected layers in the subject prediction module, the feature fusion module and the predicate-object prediction module are randomly sampled from a normal distribution with mean 0 and variance 2 / (number of input channels of the layer).
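A normal distribution with mean 0 and variance 2 / n_in is the standard He (Kaiming) normal initialization; a one-function sketch:

```python
import math
import torch.nn as nn

def init_fc(layer: nn.Linear) -> None:
    """Sample weights from N(0, 2 / fan_in), i.e. He normal initialization."""
    std = math.sqrt(2.0 / layer.in_features)   # variance 2 / n_in  ->  std = sqrt(2 / n_in)
    nn.init.normal_(layer.weight, mean=0.0, std=std)
    nn.init.zeros_(layer.bias)
```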
Further, the step S5 specifically includes:
S51, before the labeled data is input into the information extraction model, applying random replacement and data augmentation to it, to improve the generalization performance of the model and reduce overfitting;
S52, training with the squared binary cross-entropy loss, specifically:
the subject prediction probabilities and the predicate-object prediction probabilities output by the Sigmoid activation layers are each squared before the binary cross-entropy loss is computed;
the squared binary cross-entropy loss simultaneously yields the error Ls for the subject prediction and the error Lpo for the predicate-object prediction, and the final back-propagated error is:
Loss = k1 × Ls + k2 × Lpo
where k1 and k2 are weighting coefficients chosen according to the actual situation, as in the sketch below;
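A sketch of the squared binary cross-entropy described above: the Sigmoid outputs are squared before the standard binary cross-entropy, and the two errors are combined with the weights k1 and k2 (the defaults of 1 here follow the embodiment).

```python
import torch
import torch.nn.functional as F

def squared_bce(prob: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Square the predicted probabilities before binary cross-entropy.
    prob and target must have the same shape, with values in [0, 1]."""
    return F.binary_cross_entropy(prob ** 2, target)

def total_loss(subj_prob, subj_target, po_prob, po_target, k1=1.0, k2=1.0):
    Ls = squared_bce(subj_prob, subj_target)    # subject prediction error
    Lpo = squared_bce(po_prob, po_target)       # predicate-object prediction error
    return k1 * Ls + k2 * Lpo                   # Loss = k1 x Ls + k2 x Lpo
```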
S53, performing fine-tuning training, during which the learning rate starts at 1e-6, is gradually increased to 5e-5, and is finally decayed gradually.
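One way to realize the warm-up-then-decay schedule (1e-6 rising to 5e-5, then decaying); the linear shape and the step counts are assumptions, since the text only fixes the two learning-rate values.

```python
import torch
import torch.nn as nn

def lr_at(step: int, warmup: int = 1000, total: int = 20000,
          lr_start: float = 1e-6, lr_peak: float = 5e-5) -> float:
    """Linear warm-up from 1e-6 to 5e-5, then linear decay (step counts assumed)."""
    if step < warmup:
        return lr_start + (lr_peak - lr_start) * step / warmup
    return lr_peak * max(0.0, (total - step) / (total - warmup))

model = nn.Linear(756, 2)  # placeholder for the information extraction model
opt = torch.optim.AdamW(model.parameters(), lr=1.0)  # base lr 1.0: the lambda is absolute
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lr_at)
```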
Further, in step S51, the random replacement specifically includes:
in each iteration of the training process, before the data is input into the information extraction model, an entity is randomly replaced with another entity with a certain probability; likewise, an entity attribute is replaced with another entity attribute, an application content with another application content, an inclusion content with another inclusion content, and a proposer with another proposer, each with a certain probability;
the data enhancement specifically comprises:
in each iteration of the training process, before the data is input into the information extraction model, a word in the description content is randomly replaced, inserted, or deleted, as sketched below.
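These two augmentations might be sketched as follows; the data structures (span dicts, class pools, a word vocabulary for replacements) are assumptions, since the patent does not specify them.

```python
import random

def random_inclass_swap(spans, pools, p=0.1):
    """With probability p, swap each annotated span for another of the same class.
    `spans` is a list of {"cls": ..., "text": ...} dicts; `pools` maps each class
    (entity, entity attribute, application content, ...) to its annotated strings."""
    for span in spans:
        if span["cls"] in pools and random.random() < p:
            span["text"] = random.choice(pools[span["cls"]])
    return spans

def perturb_description(words, vocab, p=0.1):
    """Randomly replace, insert or delete one word of a description content."""
    if words and random.random() < p:
        i = random.randrange(len(words))
        op = random.choice(["replace", "insert", "delete"])
        if op == "replace":
            words[i] = random.choice(vocab)
        elif op == "insert":
            words.insert(i, random.choice(vocab))
        elif len(words) > 1:
            del words[i]
    return words
```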
Further, the step S6 specifically includes:
S61, first, the text is input into the information extraction model to obtain, for each position in the text sequence, the predicted probabilities of being a subject start point or end point;
all positions whose predicted start probability exceeds 0.5 are taken as subject start positions, and all positions whose predicted end probability exceeds 0.5 are taken as predicted subject end positions;
for each predicted subject start position, the nearest predicted subject end position at or after it in the text is found and paired with it; for each start-end pair, the text content spanning those positions is taken as a predicted subject (a sketch of this pairing follows);
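The threshold-and-pair decoding of S61 can be written compactly; a sketch assuming per-position probability lists:

```python
def pair_spans(start_prob, end_prob, thr=0.5):
    """Pair each start position (prob > 0.5) with the nearest end position
    (prob > 0.5) at or after it, as described above."""
    starts = [i for i, p in enumerate(start_prob) if p > thr]
    ends = [i for i, p in enumerate(end_prob) if p > thr]
    spans = []
    for s in starts:
        later = [e for e in ends if e >= s]
        if later:
            spans.append((s, min(later)))   # (start, end) character positions
    return spans

# e.g. pair_spans([.9, .1, .8, .1], [.1, .7, .1, .9]) -> [(0, 1), (2, 3)]
```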
S62, the start and end positions of the n paired subjects are combined into one batch, giving an n × 2 vector;
meanwhile, for each paired subject, the 10th-layer Transformer output feature vector of the corresponding text is extracted, giving an expanded 10th-layer Transformer output feature vector of shape n × 512 × 756;
according to each subject's start and end positions, the contents at the corresponding positions of the expanded 10th-layer Transformer output feature vector are taken out, giving an n × 756 start vector and an n × 756 end vector; each is copied 512 times, giving two n × 512 × 756 vectors, which are concatenated along the feature dimension into an n × 512 × 1512 feature vector; this is passed through the fully-connected layer of the feature fusion module to obtain an n × 512 × 756 feature vector, which is added to the expanded 10th-layer Transformer feature vector and passed through the two Transformer layers of the feature fusion module to obtain the subject-fused feature vector;
S63, the subject-fused feature vector obtained in step S62 is input into the predicate-object prediction module to obtain its prediction results;
for every predicate category, the positions whose start-point prediction probability exceeds 0.5 are taken as that category's start positions, and the positions whose end-point prediction probability exceeds 0.5 are taken as that category's end positions;
for each start position of each predicate category, the nearest end position of the same predicate category at or after it in the text is found and paired with it; for each such start-end pair, the text content at the corresponding positions is taken as the object result;
S64, for each extracted subject-predicate-object triple whose subject is an entity attribute: first find the triple in which that entity attribute is the subject and an entity is the object, i.e. entity attribute - pseudo relation 1 - entity; then find all triples of that entity attribute whose object is not an entity, i.e. entity attribute - pseudo relation 2 - description content, entity attribute - pseudo relation 3 - application content, and entity attribute - pseudo relation 4 - inclusion content; after removing the pseudo relations, merge them, using the shared entity attribute as the predicate, into new triples of the form entity - entity attribute - content, thereby realizing open information extraction, as sketched below;
if an extracted triple does not have an entity attribute as its subject, it is taken directly as a result.
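The pseudo-relation consolidation of S64 reduces to a dictionary lookup and a rewrite; a sketch with illustrative relation-name strings:

```python
def merge_pseudo(triples):
    """Rewrite pseudo-relation triples (entity attribute as subject) into
    entity - entity attribute - content triples; relation names are illustrative."""
    # entity attribute -> its owning entity, from the "pseudo relation 1" triples
    owner = {s: o for s, p, o in triples if p == "pseudo_relation_1"}
    merged = []
    for s, p, o in triples:
        if p == "pseudo_relation_1":
            continue                                  # consumed by the merge
        if p.startswith("pseudo_relation"):
            if s in owner:                            # attribute with a known owner entity
                merged.append((owner[s], s, o))       # entity - attribute - content
        else:
            merged.append((s, p, o))                  # ordinary triples pass through
    return merged
```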
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method treats information extraction as a machine reading comprehension task and predicts the start and end positions of each piece of key information in the text, overcoming the sharp performance drop that sequence labeling models suffer on long-span knowledge text. With this method, an unbounded number of relation types can be extracted, and closed and open information extraction are unified in the same framework, improving extraction accuracy.
2. The method exploits the strengths of the pre-trained model: it retains strong generalization even when labeled samples are scarce, and it can handle whole paragraphs and highly variable forms of knowledge expression.
3. The method improves on the HBT model by placing the subject prediction module and the feature fusion module at appropriate positions, so the model keeps an appropriate degree of feature sharing while improving performance, and at the same time counteracts the negative influence of span differences between subjects and objects, improving the overall performance of the information extraction model.
4. The method optimizes the model with the squared binary cross-entropy loss, which has an online hard-example-mining effect: it makes the model focus on choosing start and end points correctly, relieves the class imbalance caused by the large number of negative samples, and enlarges the classification margin of positive samples, further improving overall performance. The generalization ability of the model is improved by random in-class swapping of entities, entity attributes, application contents, inclusion contents, aliases and proposers, and by random edits to description contents.
Drawings
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is a flow chart of the information extraction model training steps of the present invention;
FIG. 3 is a diagram of a subject prediction module of the information extraction model of the present invention;
FIG. 4 is a diagram of a feature fusion module of the information extraction model of the present invention;
FIG. 5 is a diagram of the predicate-object prediction module of the information extraction model of the present invention;
FIG. 6 is a flow diagram of information extraction model inference of the present invention;
FIG. 7 is a diagram of the pseudo-relationship knowledge consolidation method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
As shown in FIG. 1, the invention provides a method for extracting document key information in the field of artificial intelligence, which comprises the following steps:
S1, collecting document data in the field of artificial intelligence, and then annotating the collected data for key information extraction, which specifically comprises:
S11, collecting unstructured text paragraphs from scientific publications, papers, and popular-science web content related to the field of artificial intelligence, limiting each text paragraph to at most 510 characters;
s12, defining the type of the key information triple to be extracted, specifically:
the general relationship definition method is adopted to define 5 triple types:
entity - description content, entity - proposer name, entity - inclusion content, entity - application content, and entity - alias name;
defining 4 triple types by adopting a pseudo relation definition method:
entity attribute-pseudo relationship 1-entity, entity attribute-pseudo relationship 2-description content, entity attribute-pseudo relationship 3-application content, and entity attribute-pseudo relationship 4-inclusion content.
S13, labeling the defined triplet type, specifically:
open the text to be annotated in the open-source text annotation tool brat; with the mouse cursor, select a span of characters in the text as the starting entity (subject) of a triple and click to choose the subject's entity category in the pop-up selection window; select the ending entity (object) of the triple and its category in the same way; finally, generate a relation link by selecting the subject of the triple with the mouse and dragging it onto the object, and choose the category of the relation link in the pop-up selection window to complete the annotation of that triple. Repeat these steps until all triples in all texts to be annotated have been labeled.
S2, further pre-training the pre-trained RoBERTa model on unstructured text in the field of artificial intelligence in a self-supervised manner, specifically:
for the pre-trained RoBERTa model, first segment the training text with the jieba word segmentation tool, and initialize the RoBERTa model parameters to be trained with the pre-trained RoBERTa parameters. Then, in each iteration, based on the jieba segmentation result, randomly replace some of the segmented words with a preset mark, feed the processed result into the RoBERTa model, and have the model predict the words that were replaced by the mark. In this embodiment, the preset mark used is [MASK];
the RoBERTa model specifically comprises three Embedding layers with feature dimension 756, twelve Transformer layers with feature dimension 756, and a fully-connected layer whose number of input channels is 756 and whose number of output channels is the total number of character types in all training text data;
the three Embedding layers are a Token Embedding layer, a Position Embedding layer and a Segment Embedding layer respectively;
each of the three Embedding layers maps the text data input to the model into a feature vector of shape (number of input text segments) × 512 × 756; the sum of the three output feature vectors, again of shape (number of input text segments) × 512 × 756, serves as the overall output of the Embedding layers and as the input of the twelve Transformer layers of the RoBERTa model. The twelve Transformer layers output a feature vector of shape (number of input text segments) × 512 × 756, which is fed to the fully-connected layer; the output of the fully-connected layer is the model's probability prediction, for each character of each word replaced by the preset mark in the input text segment, over each character in the dictionary, where the dictionary is the set of all characters in all input training text data. In this embodiment, the preset mark used is [MASK].
S3, constructing an information extraction model, specifically:
based on the RoBERTa model, a subject prediction module is added after the 10th Transformer layer of the RoBERTa model, a feature fusion module is added after the subject prediction module, and a predicate-object prediction module is added after the feature fusion module;
as shown in fig. 3, the subject prediction module specifically comprises a fully-connected layer with 756 input channels and 2 output channels, and a ReLU layer, a Dropout layer and a Sigmoid activation layer connected to it;
as shown in fig. 4, the feature fusion module specifically comprises a fully-connected layer with 1512 input channels and 756 output channels, a ReLU layer and a Dropout layer connected to it, and the last two Transformer layers of RoBERTa;
as shown in fig. 5, the predicate-object prediction module specifically comprises a fully-connected layer with 756 input channels and 2 × (total number of predicate categories) output channels, and a ReLU layer, a Dropout layer and a Sigmoid activation layer connected to it.
In this embodiment, the subject prediction module uses the feature vector output by the 10th Transformer layer to predict, in parallel, the start-point and end-point probabilities of all subjects in the input text paragraph. Its input is the feature vector of shape (number of input text segments) × 512 × 756 output by the 10th Transformer layer of the information extraction model, and its output, for each of the 512 character positions of the original input, is the probability that the position is the start of a subject and the probability that it is the end of a subject;
the feature fusion module fuses the semantic features of a selected subject into the feature vector that the 10th Transformer layer of the RoBERTa model outputs for the input text segment, yielding a feature vector fused with subject features. Its inputs are the output of the 10th Transformer layer, of shape (number of input text segments) × 512 × 756, and the label values of the selected subject's start and end positions, each of shape (number of input text segments) × 1; the selected subject is obtained at each iteration by dynamically and randomly picking one subject from all labeled subjects of each sample in a batch;
during training, as shown in fig. 2, the feature fusion module first selects, according to the input start and end positions of the subject, the vectors at the corresponding positions of the feature vector output by the 10th Transformer layer of RoBERTa, obtaining two vectors of shape (number of input text segments) × 756; it copies each of them 512 times to obtain two vectors of shape (number of input text segments) × 512 × 756, concatenates them along the feature dimension into one vector of shape (number of input text segments) × 512 × 1512, and feeds the result into the fully-connected layer of the feature fusion module to obtain an output vector of shape (number of input text segments) × 512 × 756; this output is added to the feature vector output by the 10th Transformer layer and passed through the two Transformer layers of the feature fusion module to produce the output of the feature fusion module;
the input of the predicate-object prediction module is the subject-fused feature vector output by the feature fusion module, and its output is, for the selected subject and each predicate category, the probabilities that each character position of the text segment input to the information extraction model is the start character or the end character of the corresponding object.
S4, initializing the backbone network parameters with the further pre-trained RoBERTa model, specifically:
initializing each Embedding layer of the information extraction model with the corresponding Embedding layer of the further pre-trained RoBERTa model, and initializing each Transformer layer of the information extraction model with the corresponding Transformer layer of the further pre-trained RoBERTa model;
the initial parameters of the fully-connected layers in the subject prediction module, the feature fusion module and the predicate-object prediction module are randomly sampled from a normal distribution with mean 0 and variance 2 / (number of input channels of the layer).
S5, training with the labeled data, applying random replacement and data augmentation to the labeled data during training to improve generalization and reduce overfitting, and computing the back-propagated error with the squared binary cross-entropy loss, specifically:
S51, before the labeled data is input into the information extraction model, applying random replacement and data augmentation to it, to improve the generalization performance of the model and reduce overfitting;
S52, training with the squared binary cross-entropy loss, specifically:
the subject prediction probabilities and the predicate-object prediction probabilities output by the Sigmoid activation layers are each squared before the binary cross-entropy loss is computed;
the squared binary cross-entropy loss simultaneously yields the error Ls for the subject prediction and the error Lpo for the predicate-object prediction, and the final back-propagated error is:
Loss = k1 × Ls + k2 × Lpo
where k1 and k2 are chosen according to the actual situation; in this embodiment, both k1 and k2 are 1;
S53, performing fine-tuning training, during which the learning rate starts at 1e-6, is gradually increased to 5e-5, and is finally decayed gradually.
In this embodiment, the random replacement specifically includes:
in each iteration of the training process, before the data is input into the information extraction model, an entity is randomly replaced with another entity with a certain probability; likewise, an entity attribute is replaced with another entity attribute, an application content with another application content, an inclusion content with another inclusion content, and a proposer with another proposer, each with a certain probability;
the data enhancement specifically comprises:
in each iteration of the training process, before the data is input into the model, a word in the description content is randomly replaced, inserted, or deleted.
S6, extracting information from unstructured text in the field of artificial intelligence with the trained information extraction model to obtain result triples, and integrating the result triples, as shown in FIG. 6, specifically:
S61, first, the text is input into the information extraction model to obtain, for each position in the text sequence, the predicted probabilities of being a subject start point or end point;
all positions whose predicted start probability exceeds 0.5 are taken as subject start positions, and all positions whose predicted end probability exceeds 0.5 are taken as predicted subject end positions;
for each predicted subject start position, the nearest predicted subject end position at or after it in the text is found and paired with it; for each start-end pair, the text content spanning those positions is taken as a predicted subject;
S62, the start and end positions of the n paired subjects are combined into one batch, giving an n × 2 vector;
meanwhile, for each paired subject, the 10th-layer Transformer output feature vector of the corresponding text is extracted, giving an expanded 10th-layer Transformer output feature vector of shape n × 512 × 756;
according to each subject's start and end positions, the contents at the corresponding positions of the expanded 10th-layer Transformer output feature vector are taken out, giving an n × 756 start vector and an n × 756 end vector; each is copied 512 times, giving two n × 512 × 756 vectors, which are concatenated along the feature dimension into an n × 512 × 1512 feature vector; this is passed through the fully-connected layer of the feature fusion module to obtain an n × 512 × 756 feature vector, which is added to the expanded 10th-layer Transformer feature vector and passed through the two Transformer layers of the feature fusion module to obtain the subject-fused feature vector;
S63, the subject-fused feature vector obtained in step S62 is input into the predicate-object prediction module to obtain its prediction results;
for every predicate category, the positions whose start-point prediction probability exceeds 0.5 are taken as that category's start positions, and the positions whose end-point prediction probability exceeds 0.5 are taken as that category's end positions;
for each start position of each predicate category, the nearest end position of the same predicate category at or after it in the text is found and paired with it; for each such start-end pair, the text content at the corresponding positions is taken as the object result;
S64, as shown in fig. 7, for each extracted subject-predicate-object triple whose subject is an entity attribute: first find the triple in which that entity attribute is the subject and an entity is the object, i.e. entity attribute - pseudo relation 1 - entity; then find all triples of that entity attribute whose object is not an entity, i.e. entity attribute - pseudo relation 2 - description content, entity attribute - pseudo relation 3 - application content, and entity attribute - pseudo relation 4 - inclusion content; after removing the pseudo relations, merge them, using the shared entity attribute as the predicate, into new triples of the form entity - entity attribute - content, achieving the purpose of open information extraction;
if an extracted triple does not have an entity attribute as its subject, it is taken directly as a result.
It should also be noted that, in this specification, terms such as "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for extracting key information from documents in the field of artificial intelligence, characterized by comprising the following steps:
S1, collecting document data in the artificial intelligence field, and then annotating the collected data for key information extraction;
S2, further pre-training the pre-trained RoBERTa model on unstructured texts in the field of artificial intelligence;
S3, constructing an information extraction model;
S4, initializing the backbone network parameters of the information extraction model with the further pre-trained RoBERTa model;
S5, training with the labeled data, applying random replacement and data augmentation to the labeled data during training, and computing the back-propagated error with a squared binary cross-entropy loss;
S6, extracting information from unstructured text in the artificial intelligence field with the trained information extraction model to obtain result triples, and integrating the result triples.
2. The method for extracting key information of documents in the field of artificial intelligence according to claim 1, wherein said step S1 specifically comprises:
S11, collecting unstructured text paragraphs from scientific publications, papers, and popular-science web content related to the field of artificial intelligence, limiting each text paragraph to at most 510 characters;
s12, defining the type of the key information triple to be extracted, specifically:
the general relationship definition method is adopted to define 5 triple types:
entity - description content, entity - proposer name, entity - inclusion content, entity - application content, and entity - alias name;
defining 4 triple types by adopting a pseudo relation definition method:
entity attribute-pseudo relationship 1-entity, entity attribute-pseudo relationship 2-description content, entity attribute-pseudo relationship 3-application content, and entity attribute-pseudo relationship 4-inclusion content;
s13, labeling the defined triplet type, specifically:
open the text to be annotated in the open-source text annotation tool brat; with the mouse cursor, select a span of characters in the text as the starting entity (subject) of a triple and click to choose the subject's entity category in the pop-up selection window; select the ending entity (object) of the triple and its category in the same way; finally, generate a relation link by selecting the subject of the triple with the mouse and dragging it onto the object, and choose the category of the relation link in the pop-up selection window to complete the annotation of that triple. Repeat these steps until all triples in all texts to be annotated have been labeled.
3. The method for extracting key information from documents in the field of artificial intelligence according to claim 1, wherein the RoBERTa model specifically comprises three Embedding layers with feature dimension 756, twelve Transformer layers with feature dimension 756, and a fully-connected layer whose number of input channels is 756 and whose number of output channels is the total number of character types in all training text data;
the three Embedding layers are a Token Embedding layer, a Position Embedding layer and a Segment Embedding layer respectively;
each of the three Embedding layers maps the text data input to the model into a feature vector of shape (number of input text segments) × 512 × 756; the sum of the three output feature vectors, again of shape (number of input text segments) × 512 × 756, serves as the overall output of the Embedding layers and as the input of the twelve Transformer layers of the RoBERTa model. The twelve Transformer layers output a feature vector of shape (number of input text segments) × 512 × 756, which is fed to the fully-connected layer. The output of the fully-connected layer is the model's probability prediction, for each character of each word replaced by the preset mark in the input text segment, over each character in the dictionary, where the dictionary is the set of all characters in all input training text data.
4. The method for extracting key information of documents in the field of artificial intelligence according to claim 1, wherein said step S2 specifically comprises:
for the pre-trained RoBERTa model, first segment the training text with the jieba word segmentation tool, and initialize the RoBERTa model parameters to be trained with the pre-trained RoBERTa parameters. Then, in each iteration, based on the jieba segmentation result, randomly replace some of the segmented words with a preset mark, feed the processed result into the RoBERTa model, and have the model predict the words that were replaced by the mark.
5. The method for extracting the key information of the document in the field of artificial intelligence according to claim 3, wherein the constructing of the information extraction model specifically comprises:
based on the RoBERTa model, a subject prediction module is added after the 10th Transformer layer of the RoBERTa model, a feature fusion module is added after the subject prediction module, and a predicate-object prediction module is added after the feature fusion module;
the subject prediction module specifically comprises a fully-connected layer with 756 input channels and 2 output channels, and a ReLU layer, a Dropout layer and a Sigmoid activation layer connected to it;
the feature fusion module specifically comprises a fully-connected layer with 1512 input channels and 756 output channels, a ReLU layer and a Dropout layer connected to it, and the last two Transformer layers of RoBERTa;
the predicate-object prediction module specifically comprises a fully-connected layer with 756 input channels and 2 × (total number of predicate categories) output channels, and a ReLU layer, a Dropout layer and a Sigmoid activation layer connected to it.
6. The method for extracting key information from documents in the field of artificial intelligence according to claim 5, wherein the input of the subject prediction module is the feature vector of shape (number of input text segments) × 512 × 756 output by the 10th Transformer layer of the information extraction model, and its output, for each of the 512 character positions of the original input, is the probability that the position is the start of a subject and the probability that it is the end of a subject;
the feature fusion module fuses the semantic features of a selected subject into the feature vector that the 10th Transformer layer of the RoBERTa model outputs for the input text segment, yielding a feature vector fused with subject features. Its inputs are the output of the 10th Transformer layer, of shape (number of input text segments) × 512 × 756, and the label values of the selected subject's start and end positions, each of shape (number of input text segments) × 1; the selected subject is obtained at each iteration by dynamically and randomly picking one subject from all labeled subjects of each sample in the training text data input to the model;
during training, the feature fusion module first selects, according to the input start and end positions of the subject, the vectors at the corresponding positions of the feature vector output by the 10th Transformer layer of RoBERTa, obtaining two vectors of shape (number of input text segments) × 756; it copies each of them 512 times to obtain two vectors of shape (number of input text segments) × 512 × 756, concatenates them along the feature dimension into one vector of shape (number of input text segments) × 512 × 1512, and feeds the result into the fully-connected layer of the feature fusion module to obtain an output vector of shape (number of input text segments) × 512 × 756; this output is added to the feature vector output by the 10th Transformer layer and passed through the two Transformer layers of the feature fusion module to produce the output of the feature fusion module;
the input of the predicate-object prediction module is the subject-fused feature vector output by the feature fusion module, and its output is, for the selected subject and each predicate category, the probabilities that each character position of the text segment input to the information extraction model is the start character or the end character of the corresponding object.
7. The method for extracting key information from documents in the field of artificial intelligence according to claim 5, wherein the initializing of backbone network parameters in step S4 specifically comprises:
initializing each Embedding layer of the information extraction model with the corresponding Embedding layer of the further pre-trained RoBERTa model, and initializing each Transformer layer of the information extraction model with the corresponding Transformer layer of the further pre-trained RoBERTa model;
the initial parameters of the fully-connected layers in the subject prediction module, the feature fusion module and the predicate-object prediction module are randomly sampled from a normal distribution with mean 0 and variance 2 / (number of input channels of the layer).
8. The method for extracting key information of documents in the field of artificial intelligence according to claim 6, wherein said step S5 specifically comprises:
S51, before the labeled data is input into the information extraction model, applying random replacement and data augmentation to it, to improve the generalization performance of the model and reduce overfitting;
S52, training with the squared binary cross-entropy loss, specifically:
the subject prediction probabilities and the predicate-object prediction probabilities output by the Sigmoid activation layers are each squared before the binary cross-entropy loss is computed;
the squared binary cross-entropy loss simultaneously yields the error Ls for the subject prediction and the error Lpo for the predicate-object prediction, and the final back-propagated error is:
Loss = k1 × Ls + k2 × Lpo
where k1 and k2 are chosen according to the actual situation;
S53, performing fine-tuning training, during which the learning rate starts at 1e-6, is gradually increased to 5e-5, and is finally decayed gradually.
9. The method for extracting key information of documents in the field of artificial intelligence according to claim 8, wherein in said step S51, the random replacement specifically comprises:
in each iteration of the training process, before the data is input into the information extraction model, an entity is randomly replaced with another entity with a certain probability; likewise, an entity attribute is replaced with another entity attribute, an application content with another application content, an inclusion content with another inclusion content, and a proposer with another proposer, each with a certain probability;
the data enhancement specifically comprises:
in each iteration of the training process, before the data is input into the information extraction model, a word in the description content is randomly replaced, inserted, or deleted.
10. The method for extracting key information from a document in the field of artificial intelligence according to claim 2 or 6, wherein the step S6 specifically includes:
S61, first, the text is input into the information extraction model to obtain, for each position in the text sequence, the predicted probabilities of being a subject start point or end point;
all positions whose predicted start probability exceeds 0.5 are taken as subject start positions, and all positions whose predicted end probability exceeds 0.5 are taken as predicted subject end positions;
for each predicted subject start position, the nearest predicted subject end position at or after it in the text is found and paired with it; for each start-end pair, the text content spanning those positions is taken as a predicted subject;
S62, the start and end positions of the n paired subjects are combined into one batch, giving an n × 2 vector;
meanwhile, for each paired subject, the 10th-layer Transformer output feature vector of the corresponding text is extracted, giving an expanded 10th-layer Transformer output feature vector of shape n × 512 × 756;
according to each subject's start and end positions, the contents at the corresponding positions of the expanded 10th-layer Transformer output feature vector are taken out, giving an n × 756 start vector and an n × 756 end vector; each is copied 512 times, giving two n × 512 × 756 vectors, which are concatenated along the feature dimension into an n × 512 × 1512 feature vector; this is passed through the fully-connected layer of the feature fusion module to obtain an n × 512 × 756 feature vector, which is added to the expanded 10th-layer Transformer feature vector and passed through the two Transformer layers of the feature fusion module to obtain the subject-fused feature vector;
S63, inputting the subject-fused feature vector obtained in step S62 into the predicate-object prediction module to obtain the prediction results;
taking all positions whose predicted category starting-point probability exceeds 0.5 as category starting positions, and all positions whose predicted category end-point probability exceeds 0.5 as category end positions;
for each predicted starting position of a predicate category, finding the nearest predicted end position of the same predicate category at or after it in the text and pairing the two; for each paired starting and end position of a predicate category, taking the content at the corresponding positions in the text as the object result;
S64, for each extracted subject-predicate-object triple: if an entity attribute serves as the subject, first finding the triple whose object is the entity corresponding to that entity attribute, namely entity attribute - pseudo-relation 1 - entity; then finding all triples of that entity attribute whose object is not an entity, namely entity attribute - pseudo-relation 2 - description content, entity attribute - pseudo-relation 3 - application content, and entity attribute - pseudo-relation 4 - contained content; then, after removing the pseudo-relations, combining the triples sharing the entity attribute into new triples with the entity attribute as the predicate, namely entity - entity attribute - content, finally realizing open information extraction;
and if an extracted triple does not take an entity attribute as the subject, taking the extracted triple directly as a result.
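For illustration, the threshold-and-pair span decoding used in steps S61 and S63 can be sketched as follows (variable names are assumptions, not from the patent):

    def decode_spans(start_prob, end_prob, threshold=0.5):
        # Positions whose start/end probability exceeds 0.5 are taken
        # as candidate boundaries; each start point is paired with the
        # nearest end point at or after it, as in steps S61 and S63.
        starts = [i for i, p in enumerate(start_prob) if p > threshold]
        ends = [i for i, p in enumerate(end_prob) if p > threshold]
        spans = []
        for s in starts:
            later = [e for e in ends if e >= s]
            if later:
                spans.append((s, min(later)))
        return spans

    # Each (s, e) pair maps back to text[s:e + 1] as a predicted span.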
CN202110353610.XA 2021-04-01 2021-04-01 Method for extracting key information of documents in artificial intelligence field Active CN113158674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110353610.XA CN113158674B (en) 2021-04-01 2021-04-01 Method for extracting key information of documents in artificial intelligence field

Publications (2)

Publication Number Publication Date
CN113158674A true CN113158674A (en) 2021-07-23
CN113158674B (en) 2023-07-25

Family

ID=76886346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110353610.XA Active CN113158674B (en) 2021-04-01 2021-04-01 Method for extracting key information of documents in artificial intelligence field

Country Status (1)

Country Link
CN (1) CN113158674B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200380211A1 (en) * 2019-05-31 2020-12-03 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus, computer device and readable medium for knowledge hierarchical extraction of a text
CN111444721A (en) * 2020-05-27 2020-07-24 南京大学 Chinese text key information extraction method based on pre-training language model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHONG Z. et al.: "DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images", IEEE International Conference on Acoustics, pages 1208-1212 *
ZHAO Wanpeng et al.: "Handwritten Digit Recognition Based on Adaboost", Computer Applications, vol. 25, no. 10, pages 576-589 *
JIN Lianwen et al.: "A Survey of Deep Learning Applications in Handwritten Chinese Character Recognition", Acta Automatica Sinica, vol. 42, no. 8, pages 1125-1141 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114297987A (en) * 2022-03-09 2022-04-08 杭州实在智能科技有限公司 Document information extraction method and system based on text classification and reading understanding
CN114297987B (en) * 2022-03-09 2022-07-19 杭州实在智能科技有限公司 Document information extraction method and system based on text classification and reading understanding
CN114861629A (en) * 2022-04-29 2022-08-05 电子科技大学 Automatic judgment method for text style
CN114861629B (en) * 2022-04-29 2023-04-04 电子科技大学 Automatic judgment method for text style
CN116720502A (en) * 2023-06-20 2023-09-08 中国航空综合技术研究所 Aviation document information extraction method based on machine reading understanding and template rules
CN116720502B (en) * 2023-06-20 2024-04-05 中国航空综合技术研究所 Aviation document information extraction method based on machine reading understanding and template rules

Also Published As

Publication number Publication date
CN113158674B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN109918671B (en) Electronic medical record entity relation extraction method based on convolution cyclic neural network
CN108090070B (en) Chinese entity attribute extraction method
CN113158674A (en) Method for extracting key information of document in field of artificial intelligence
CN110750635B (en) French recommendation method based on joint deep learning model
CN112541337B (en) Document template automatic generation method and system based on recurrent neural network language model
CN113168499A (en) Method for searching patent document
CN108073576A (en) Intelligent search method, searcher and search engine system
CN113196277A (en) System for retrieving natural language documents
CN113515632B (en) Text classification method based on graph path knowledge extraction
CN114419642A (en) Method, device and system for extracting key value pair information in document image
Rahman Understanding the logical and semantic structure of large documents
CN111651569B (en) Knowledge base question-answering method and system in electric power field
CN113868380A (en) Few-sample intention identification method and device
CN113312922A (en) Improved chapter-level triple information extraction method
CN113361252B (en) Text depression tendency detection system based on multi-modal features and emotion dictionary
CN110110137A (en) A kind of method, apparatus, electronic equipment and the storage medium of determining musical features
CN110020024B (en) Method, system and equipment for classifying link resources in scientific and technological literature
Wang Research on the art value and application of art creation based on the emotion analysis of art
CN114117069A (en) Semantic understanding method and system for intelligent knowledge graph question answering
Zhao et al. POS-ATAEPE-BiLSTM: an aspect-based sentiment analysis algorithm considering part-of-speech embedding
CN113869049A (en) Fact extraction method and device with legal attribute based on legal consultation problem
CN113516202A (en) Webpage accurate classification method for CBL feature extraction and denoising
Hicham et al. Enhancing Arabic E-Commerce Review Sentiment Analysis Using a hybrid Deep Learning Model and FastText word embedding
Phan et al. Sentence-level sentiment analysis using gcn on contextualized word representations
CN112948544B (en) Book retrieval method based on deep learning and quality influence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant