CN113158624A - Method and system for fine-tuning pre-training language model by fusing language information in event extraction - Google Patents

Method and system for fine-tuning pre-training language model by fusing language information in event extraction

Info

Publication number
CN113158624A
Authority
CN
China
Prior art keywords
language
word
information
vector
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110384170.4A
Other languages
Chinese (zh)
Other versions
CN113158624B (en)
Inventor
阚志刚
李东升
乔林波
陈易欣
赖志权
彭丽雯
韩毅
唐宇
高翊夫
冯琳慧
翟琪
戴蓓亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202110384170.4A priority Critical patent/CN113158624B/en
Publication of CN113158624A publication Critical patent/CN113158624A/en
Application granted granted Critical
Publication of CN113158624B publication Critical patent/CN113158624B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a system for fine-tuning a pre-training language model by fusing language information in event extraction. The method comprises the following steps: S1, acquiring the language information that the current event extraction task requires to be fused and encoding it to obtain a language feature code set; S2, obtaining the initial word vector to be input and initially fusing it with the language feature code set to obtain a fused word vector; S3, inputting the initial word vector and the fused word vector into pre-training language models respectively, wherein the initial word vector is input into a first pre-training language model to obtain a first group of word representations, and the fused word vector is input into a second pre-training language model to obtain a second group of word representations; and S4, performing a secondary fusion of the first group of word representations and the second group of word representations to obtain the final word representations and complete the fine-tuning. The method has the advantages of a simple implementation, sufficient information fusion, and the ability to retain the original model information.

Description

Method and system for fine-tuning pre-training language model by fusing language information in event extraction
Technical Field
The invention relates to the technical field of Chinese information extraction, in particular to a method and a system for fine tuning a pre-training language model by fusing language information in event extraction.
Background
Written characters are important carriers of communication, information recording and the expression of thought in human production and life, and are witnesses to and participants in the development of human society. The computer is an important tool for improving human productivity and living standards. Enabling machines to process human language faster and more accurately is a challenging task, and natural language processing techniques have emerged to address it. In the field of natural language processing, one of the most fundamental and important tasks is representing text in a form a machine can handle.
In recent years, many excellent word representation models, such as word2vec and GloVe, have been proposed, and natural language processing performance can be improved significantly on the basis of these models. However, these models can only use the same vector to represent a word in different semantic environments, and therefore have a certain limitation. Pre-training models address this limitation: a pre-training model is a network model trained in advance on a data set and then adjusted for different requirements so as to suit different tasks. A typical pre-training language model is the BERT (Bidirectional Encoder Representations from Transformers) model. As a dynamic word vector model, a pre-training language model can be adapted to different semantic environments after modification and adjustment, and has gradually attracted wide attention since it was proposed.
Fine-tuning a pre-trained model means further training it on a new data set for a specific task. Because pre-training models are fully trained on large-scale corpora, fine-tuning a pre-training language model can achieve excellent performance on most natural language processing tasks; for example, fine-tuning the BERT model can greatly improve performance on natural language processing tasks. In the field of information extraction, and especially in the event extraction task, the completeness of the extracted information directly affects the final performance of the model, so information should be extracted as completely as possible when fine-tuning a pre-training language model for information extraction.
In the prior art, pre-training language models for Chinese usually take the character as the minimum unit and use the index of a character in the vocabulary, the position of the character in the sentence, and the paragraph information of the sentence as the initial vector of the character, thereby ignoring the language features of the word or phrase to which the character belongs. Chinese also has its particularities: compared with English text, Chinese text is more difficult to process in practical applications. In Chinese, a sentence is composed of many characters, each character is relatively independent and contains rich semantic information, and the same sentence often yields different meanings under different word segmentation or sentence segmentation schemes, so the word representations obtained directly from a traditional pre-training language model struggle to fully express these different meanings. Therefore, if language information such as part of speech, named entity recognition and grammatical components can be added to the pre-training language model for Chinese event extraction, so that specific language features are used to fine-tune the pre-training model, the model performance can be effectively improved.
Given that the word representations obtained directly from a pre-training language model carry insufficient language information, practitioners have proposed adding pre-encoded language features directly to the word representations produced by the pre-training language model when building the downstream task network for event extraction, so as to partially compensate for the deficiency of Chinese pre-training language models and strengthen features such as part of speech, named entity recognition and grammatical components in the word representations. However, directly integrating the specified language information into the downstream task network in this way is too simple: it essentially only combines the output of the pre-training language model with the output of a language feature extraction system (such as a Chinese word segmentation system), and the resulting distributed word representations are not fully trained, so the features are still not rich enough and the Chinese language information is difficult to fuse adequately; in practice this provides only a limited performance improvement for downstream tasks. On the other hand, if the language information and the initial data are directly fused into a new word representation and input into the pre-training language model for training, the language information can indeed be trained sufficiently, but this changes the original representation of the characters in the pre-training language model, so the new word representations and the original representations no longer lie in the same vector space, the original language model information is lost, and the model performance remains limited.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the technical problems in the prior art, the invention provides a method and a system for fine-tuning a pre-training language model by fusing language information in event extraction, which have a simple implementation, sufficient feature fusion, the ability to retain the original model information, and a good fine-tuning effect.
In order to solve the technical problems, the technical scheme provided by the invention is as follows:
A method for fine-tuning a pre-training language model by fusing language information in event extraction comprises the following steps:
step S1, language information acquisition: acquiring the language information that the current event extraction task requires to be fused and encoding it to obtain a language feature code set;
step S2, information initial fusion: acquiring the initial word vector to be input, and initially fusing the acquired initial word vector with the language feature code set to obtain a fused word vector;
step S3, dual-model coding: inputting the initial word vector and the fused word vector into pre-training language models respectively for encoding, wherein the initial word vector is input into a first pre-training language model to obtain a first group of word representations, the fused word vector is input into a second pre-training language model to obtain a second group of word representations, and the first pre-training language model and the second pre-training language model are independent of each other;
step S4, information secondary fusion: performing a secondary fusion of the first group of word representations and the second group of word representations to obtain the final word representations, thereby completing the fine-tuning.
Further, the step of step S1 includes:
s101, pre-acquiring language information required to be fused of each type, counting and coding the acquired language information, and constructing a feature coding dictionary;
and S102, according to the language information that the current event extraction task requires to be fused, acquiring the corresponding language feature code set from the feature coding dictionary and outputting it.
Further, the step of step S2 includes:
step S201, obtaining the dimension d_model of the hidden layer in the pre-training language model;
step S202, in the initial coding stage of the pre-training language model, for each language feature code in the language feature code set, computing a d_model-dimensional vector to serve as the vector representation of that language feature code, i.e. projecting the language feature code into the target vector space, so as to obtain a language feature vector set;
and S203, fusing the language feature vector set into the initial word vector to obtain the fused word vector.
Further, in step S202, the specific calculation formula of the language feature vector is as follows:
[Formula reproduced as an image in the original publication]
or in step S202, the sine function and the cosine function are used to alternately generate the dimension information in the vector representation, and the specific calculation formula is as follows:
[Formula reproduced as an image in the original publication]
where i is a dimension index of the language feature vector, L_index is the language feature code to be represented, n is the length of the language feature code set, and E_i^{L_index} denotes the i-th dimension value of the vector representation of the language feature code L_index.
Further, in step S202, a polar coordinate equation is used to represent a value of each dimension of the language feature vector, and a specific calculation formula is as follows:
[Formula reproduced as an image in the original publication]
where i is a dimension index of the language feature vector, L_index is the language feature code to be represented, n is the length of the language feature code set, and E_i^{L_index} denotes the i-th dimension value of the vector representation of the language feature code L_index.
Further, the first pre-training language model and the second pre-training language model are the same pre-training language model, or the first pre-training language model and the second pre-training language model adopt different pre-training language models.
Further, when the secondary fusion is performed in step S4, the first group of word representations and the second group of word representations are concatenated to obtain the final word representations.
Further, in the event detection stage, first language information is fused into the pre-training language model according to steps S1-S4 to obtain first distributed word representations, which are classified to obtain the trigger words and event types; in the event element extraction stage, second language information is fused into the pre-training language model according to steps S1-S4 to obtain second distributed word representations, which are classified to obtain the event elements and element roles.
A system for fine-tuning a pre-training language model by fusing language information in event extraction comprises:
the language information acquisition module, which is used for acquiring the language information that the current event extraction task requires to be fused and encoding it to obtain a language feature code set;
the information initial fusion module, which is used for acquiring the initial word vector to be input and initially fusing the acquired initial word vector with the language feature code set to obtain a fused word vector;
the dual-model coding module, which is used for respectively inputting the initial word vector and the fused word vector into pre-training language models for encoding, wherein the initial word vector is input into a first pre-training language model to obtain a first group of word representations, the fused word vector is input into a second pre-training language model to obtain a second group of word representations, and the first pre-training language model and the second pre-training language model are independent of each other;
and the information secondary fusion module, which is used for performing a secondary fusion of the first group of word representations and the second group of word representations to obtain the final word representations and complete the fine-tuning.
A system for fine-tuning a pre-training language model by fusing language information in event extraction comprises a processor and a memory, wherein the memory is used for storing a computer program and the processor is used for executing the computer program so as to perform the above method.
Compared with the prior art, the invention has the advantages that:
1. The invention realizes the fine-tuning of a pre-training language model with fused language information by combining dual-model coding with two rounds of fusion. First, the language feature codes and the initial word vectors are initially fused, so that the specified language information is fully merged into the initial vector representation of the pre-training language model. Then the initial word vectors and the fused word vectors are encoded and represented by two independent pre-training language models, and the two groups of word representations output by the two independent models are fused a second time to obtain the final word representations. In this way the fused language information can be trained independently and sufficiently while the information of the original language model is preserved, so that the performance of the pre-training language model in Chinese event extraction can be effectively improved.
2. The method fuses the language feature codes into the initial vector representation of the pre-training language model, so that the deep neural network of the pre-training language model can exploit its structural characteristics to fuse the specified language information fully with the other information of the words.
3. During the initial fusion, the invention can further perform the vector representation in the form of a polar coordinate equation, which guarantees that each type of language feature code has a unique vector representation. Because two variables are used jointly to represent the polar angle, the representation forms of the language feature vectors are enriched, the vector representation space is used more fully, and language feature representations with distinct features are formed, which further ensures sufficient fusion between the specified language information and the other information.
4. During the initial fusion, the invention can further borrow the encoding scheme used for position information in the BERT model and combine sine and cosine functions to alternately generate the dimension values of the language feature vector representation. This controls the difference between the values of the dimensions of the language feature vector, avoids the problem that the language feature vectors are distributed too densely in the vector space under the linear representation method, and ensures the efficiency and effect of the language information fusion.
Drawings
Fig. 1 is a schematic flow chart illustrating an implementation process of the method for fine-tuning a pre-training language model by fusing language information in event extraction according to the embodiment.
Fig. 2 is a schematic flow chart illustrating an implementation process of an event detection task in an event extraction task in an embodiment of the present invention.
FIG. 3 is a flow chart illustrating an implementation of fine tuning a pre-training language model by fusing language information in an embodiment of the present invention.
FIG. 4 is a schematic diagram illustrating a concept of merging part-of-speech vectors into a pre-trained language model in an embodiment of the present invention.
Fig. 5 is a flowchart illustrating a method for fine-tuning a pre-training language model by fusing language information in event extraction according to this embodiment.
Detailed Description
The invention is further described below with reference to the drawings and specific preferred embodiments of the description, without thereby limiting the scope of protection of the invention.
As shown in fig. 1, the method for fine-tuning the pre-training language model by fusing language information in event extraction in this embodiment includes the following steps:
S1, language information acquisition: acquiring the language information that the current event extraction task requires to be fused and encoding it to obtain a language feature code set;
S2, information initial fusion: acquiring the initial word vector to be input, and initially fusing the acquired initial word vector with the language feature code set to obtain a fused word vector;
S3, dual-model coding: inputting the initial word vector and the fused word vector into pre-training language models respectively, wherein the initial word vector is input into a first pre-training language model and encoded to obtain a first group of word representations, and the fused word vector is input into a second pre-training language model to obtain a second group of word representations; the first pre-training language model and the second pre-training language model are independent of each other, i.e. no parameters are shared between them;
S4, information secondary fusion: performing a secondary fusion of the first group of word representations and the second group of word representations to obtain the final word representations and complete the fine-tuning.
Considering the characteristics of Chinese event extraction, this embodiment encodes the language information to be fused to form language feature codes, initially fuses the language feature codes with the initial word vectors so that the specified language information is fully merged into the initial vector representation of the pre-training language model, then encodes the initial word vectors and the fused word vectors with two independent pre-training language models, and fuses the two groups of output word representations a second time to realize the fine-tuning of the pre-training language model. In this way the specified language information can be trained independently and sufficiently while the information of the original language model is not lost, so that the performance of the pre-training language model in Chinese event extraction can be effectively improved.
The language information fused in this embodiment may be part of speech, named entity recognition, grammatical components or other types of language information. One type of information can be selected for fusion according to actual requirements, or two or more types can be fused to further improve model performance. When two or more types of language information are fused, they can be fused one after another in a loop, or different language information can be fused step by step at different stages of event extraction according to the requirements of each stage. For example, if part-of-speech information strongly influences event detection, it can be merged in the event detection stage according to the above steps, and a language model carrying part-of-speech information is obtained after training; in the event element extraction stage, syntactic component information is fused, and the event elements and element roles can be extracted after the model is trained.
The specific step of step S1 in this embodiment includes:
s101, pre-acquiring language information required to be fused of each type, counting and coding the acquired language information, and constructing a feature coding dictionary;
and S102, according to the language information that the current event extraction task requires to be fused, acquiring the corresponding language feature codes from the feature coding dictionary.
In a specific application embodiment, according to the characteristics of the specific task, the specified language information in the data set can be used directly and encoded to form the language feature code set. If the data set of the target task does not contain the required language information, it can be acquired with a natural language processing tool (such as NLTK, Stanford CoreNLP or LTP). Then, within the scope of the target data set, the features of the specified language information are counted and encoded to form a dictionary L_dict. Once the language information currently to be fused is determined, the corresponding language feature codes can be obtained directly by querying the dictionary L_dict, and the specified language information can then be fused into the initial word vectors of the pre-training language model. The specific way in which the language information is acquired and encoded can be configured according to actual requirements.
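For illustration only, the following minimal Python sketch shows one way such a feature coding dictionary (here called L_dict) might be built from a corpus whose language features have already been obtained with an external NLP tool; the function name, the example tags and the tiny corpus are hypothetical and do not appear in the original description.

```python
def build_feature_code_dict(tagged_corpus):
    """Enumerate every distinct language feature in the corpus and assign it an
    integer code starting from 1 (0 is reserved for special tokens such as
    [CLS] and [SEP], as described later in this embodiment)."""
    l_dict = {}
    for sentence in tagged_corpus:
        for _token, feature in sentence:
            if feature not in l_dict:
                l_dict[feature] = len(l_dict) + 1
    return l_dict

# Tiny hand-tagged corpus (hypothetical part-of-speech tags):
corpus = [[("他", "r"), ("访问", "v"), ("北京", "ns")],
          [("会议", "n"), ("召开", "v")]]
L_dict = build_feature_code_dict(corpus)
# L_dict == {"r": 1, "v": 2, "ns": 3, "n": 4}
```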
In this embodiment, the step S2 includes:
S201, obtaining the dimension d_model of the hidden layer in the pre-training language model;
S202, in the initial coding stage of the pre-training language model, for each language feature code in the language feature code set, computing a d_model-dimensional vector to serve as the vector representation of that language feature code, i.e. projecting the language feature code into the target vector space, so as to obtain a language feature vector set;
and S203, fusing the language feature vector set into the initial word vector to obtain a fused word vector.
Assume that the text input to the pre-training language model is {T_1, T_2, …, T_n}. The language feature code set {L_1, L_2, …, L_n} corresponding to {T_1, T_2, …, T_n} is obtained from the dictionary L_dict. In the initial coding stage of the pre-training language model, a d_model-dimensional vector is used to represent the language information of each token, i.e. the language feature vector EL_i corresponding to T_i is calculated, finally giving the language feature vector set {EL_1, EL_2, …, EL_n} corresponding to the input set {T_1, T_2, …, T_n}. The language feature vector set {EL_1, EL_2, …, EL_n} is then fused into the initial word vectors {EO_1, EO_2, …, EO_n} of the pre-training language model to obtain the fused word vector set {EF_1, EF_2, …, EF_n}.
When fusing language information into the initial word vectors of the pre-trained language model, the fusion effect depends on how the language feature codes are projected into the target vector space, i.e. how the vector representation is performed. In the step S202, the language feature vector can be calculated by using the following three vector representations according to actual requirements:
the first method comprises the following steps: simple linear representation
The specific calculation formula of the language feature vector in the method is as follows:
[Formula (1), reproduced as an image in the original publication]
where i is a dimension index of the language feature vector, L_index is the language feature code to be represented, n is the length of the language feature code set, and E_i^{L_index} denotes the i-th dimension value of the vector representation of the language feature code L_index.
Although this representation method is fast to compute, the resulting language feature vector has the same value in every dimension, so in the d_model-dimensional space all language feature vectors are distributed on a straight line. It is therefore suitable for applications where n is small; when the value of n is large, the representation vectors corresponding to different language features are relatively close to each other in the vector space, which may affect the expressive effect of the language feature vectors.
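Formula (1) is reproduced only as an image in the original publication; the sketch below therefore implements the linear representation only as the surrounding text describes it (every dimension takes the same value, determined jointly by the feature code L_index and the set length n), and the exact expression L_index / n is an assumption rather than the patent's formula.

```python
import numpy as np

def linear_feature_vector(l_index, n, d_model):
    # Every dimension takes the same value, so all language feature vectors lie
    # on one straight line in the d_model-dimensional space.
    # Assumed form: L_index / n (the original formula is only available as an image).
    return np.full(d_model, l_index / n)

# e.g. linear_feature_vector(3, 40, 768) -> array of 768 entries, all 0.075
```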
The second method: alternating generation with trigonometric functions
The method uses sine and cosine functions to alternately generate dimension information in language feature vector representation, and the specific calculation formula is as follows:
[Formula (2), reproduced as an image in the original publication]
where i is a dimension index of the language feature vector, L_index is the language feature code to be represented, and E_i^{L_index} denotes the i-th dimension value of the vector representation of the language feature code L_index.
This method borrows the encoding scheme used for position information in the BERT model and combines sine and cosine functions to alternately generate the dimension values of the language feature vector representation, so the vector representation can be obtained simply and quickly. Compared with the first, linear representation method, the trigonometric alternating generation method represents the odd and even dimensions of the language feature vector with sine and cosine functions respectively, and uses the quotient of i and d_model as the exponent in the angle denominator; this controls the difference between the values of the dimensions of the language feature vector and effectively avoids the problem of the linear representation method that the language feature vectors are distributed too densely in the vector space.
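Formula (2) is likewise only an image in the original publication. Because the text states that the scheme borrows the position-information encoding of the BERT model, the sketch below follows BERT's sine/cosine position encoding with the language feature code L_index substituted for the position index; the constant 10000 and the exact exponent are assumptions carried over from BERT, not values taken from the patent.

```python
import numpy as np

def trig_feature_vector(l_index, d_model):
    # Sine for even dimensions, cosine for odd dimensions; the dimension index
    # divided by d_model controls the angle denominator (BERT-style, assumed).
    vec = np.zeros(d_model)
    for i in range(0, d_model, 2):
        angle = l_index / (10000 ** (i / d_model))
        vec[i] = np.sin(angle)
        if i + 1 < d_model:
            vec[i + 1] = np.cos(angle)
    return vec
```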
The third method: representation with a polar coordinate equation
The method uses a polar coordinate equation to express the value of each dimension of the language feature vector, and the specific calculation formula is as follows:
[Formula (3), reproduced as an image in the original publication]
where i is a dimension index of the language feature vector, L_index is the language feature code to be represented, n is the length of the language feature code set, and E_i^{L_index} denotes the i-th dimension value of the vector representation of the language feature code L_index.
It can be seen from formula (3) that if the language feature code changes, the polar radius in the formula also changes, i.e. different language feature codes yield different vector representations. Using a polar coordinate equation to represent the value of each dimension of the language feature vector therefore ensures that each type of language feature code has a unique vector representation. At the same time, because the two variables i and L_index jointly determine the polar angle, the representation forms of the language feature vectors are enriched, the vector representation space is used more fully, and language feature representations with distinct features are formed, which further improves the fusion between the language information and the other information of the words; this method is particularly suitable when n is large.
Preferably, in order to balance computational efficiency and representational effect, an adaptive vector representation method that integrates the three methods can be adopted. The specific steps are as follows: judge the size of n; if n is smaller than a preset threshold, i.e. n is small, the language feature vectors are calculated with the first, linear representation; if n is larger than the preset threshold, i.e. n is large, the language feature vectors are calculated with the third, polar coordinate equation representation. In this way a suitable vector representation is selected adaptively according to the size of n, and both the efficiency and the effect of the information fusion process are taken into account.
Of course, other vector representation methods can be adopted according to actual requirements, and the representation can even be learned with a learning algorithm.
In step S203, the language feature vector set may be merged into the initial word vectors by element-wise addition; for example, each language feature vector is added element by element to the original token vector, segment vector and position vector of the BERT model. After the language feature information has been fused into the pre-training language model in this way, the deep neural network inside the pre-training language model can further fuse it fully with the other information of the words.
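A minimal sketch of this element-wise addition is given below for illustration; the helper embed_fn and the random placeholder embeddings are hypothetical, and in practice the initial word vectors would be the sums of the token, segment and position embeddings of the pre-training language model.

```python
import numpy as np

def fuse_language_information(initial_word_vectors, feature_codes, embed_fn):
    """Element-wise addition of language feature vectors into the initial word
    vectors {EO_1, ..., EO_n}, giving the fused word vectors {EF_1, ..., EF_n}.
    feature_codes holds the per-token codes looked up from the feature coding
    dictionary; embed_fn maps (code, d_model) to a d_model-dimensional vector."""
    n, d_model = initial_word_vectors.shape
    fused = initial_word_vectors.copy()
    for idx, code in enumerate(feature_codes):
        fused[idx] += embed_fn(code, d_model)
    return fused

# Illustrative usage with placeholder values:
d_model = 768
tokens = ["[CLS]", "他", "访", "问", "[SEP]"]
codes = [0, 1, 2, 2, 0]                        # 0 reserved for special tokens
eo = np.random.randn(len(tokens), d_model)     # stands in for token+segment+position sums
ef = fuse_language_information(eo, codes, lambda c, d: np.full(d, c / 10.0))
```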
The detailed steps of step S3 in this embodiment include:
S301, sending the initial word vector set {EO_1, EO_2, …, EO_n}, into which no language information has been fused, into the first pre-training language model to obtain the first group of word representations {F'_1, F'_2, …, F'_n}; and sending the fused word vectors {EF_1, EF_2, …, EF_n}, into which the language information has been fused, into the second pre-training language model to obtain the second group of word representations {F''_1, F''_2, …, F''_n}.
S302, fusing the two groups of word representations to form the word representation set {F_1, F_2, …, F_n} of the event detection stage.
In the above steps of this embodiment, two pre-training language models that are independent of each other and share no parameters are used for encoding, so that the word vector output of the pre-training language model into which the language information has been fused and the word vector output of the original pre-training language model are both obtained; fusing these two groups of outputs ensures that the language information can be trained sufficiently while loss of the original model information is avoided.
In this embodiment, the first pre-training language model and the second pre-training language model may adopt the same model, such as both BERT models, and of course, the first pre-training language model and the second pre-training language model may also adopt different pre-training language models, such as one of them adopts a BERT model and the other one adopts an ALBERT (a Lite BERT, a lightweight BERT) model, which may be configured specifically according to actual requirements.
In the secondary fusion performed in step S4, the first group of word representations and the second group of word representations are concatenated, that is, the output of the improved pre-training language model is fused with the output of the original pre-training language model by concatenation to obtain the final word representations. These word representations incorporate the specified language information, which improves the event extraction performance of the model, while the information of the original model is fully retained.
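For illustration, the following sketch shows one plausible way to realize the dual-model encoding and the concatenation-based secondary fusion with the HuggingFace transformers library; the checkpoint name, the zero placeholder language feature vectors and the use of inputs_embeds are assumptions of this sketch, gradients are disabled only to keep it short, and in actual fine-tuning both encoders and the downstream task network would be trained.

```python
import torch
from transformers import BertModel, BertTokenizer

# Two independent pre-training language models (no parameter sharing);
# the checkpoint name is only an example.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model_plain = BertModel.from_pretrained("bert-base-chinese")  # encodes the initial word vectors
model_fused = BertModel.from_pretrained("bert-base-chinese")  # encodes the fused word vectors

enc = tokenizer("某公司昨日发布了新产品", return_tensors="pt")
input_ids, attn = enc["input_ids"], enc["attention_mask"]

# Placeholder language feature vectors, shape (1, seq_len, d_model); in practice
# they come from the feature-code vector representation described above.
d_model = model_fused.config.hidden_size
lang_vectors = torch.zeros(1, input_ids.size(1), d_model)

with torch.no_grad():
    # First group of word representations: original input, first model.
    h_plain = model_plain(input_ids=input_ids, attention_mask=attn).last_hidden_state
    # Second group: word embeddings plus language feature vectors, second model
    # (position and segment embeddings are still added inside the model).
    word_emb = model_fused.embeddings.word_embeddings(input_ids)
    h_fused = model_fused(inputs_embeds=word_emb + lang_vectors,
                          attention_mask=attn).last_hidden_state

# Secondary fusion by concatenation: final word representations of size 2 * d_model.
final_repr = torch.cat([h_plain, h_fused], dim=-1)
```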
This embodiment realizes the event extraction task with a pipeline structure: event detection is performed on the text first, and event element extraction is then performed on the basis of the detection results, with different language information fused according to the characteristics of the two stages. The steps include: according to the characteristics of event trigger words in the event extraction task, obtaining the vector representations of the characters (or words) for the event detection stage according to steps S1-S4, designing a network for the event detection stage, training it on the specific data set, and updating the parameters of the two pre-training language models and of the event detection network, thereby obtaining the trigger words and event types; and, according to the characteristics of event elements in the event extraction task, obtaining the vector representations of the characters (or words) for the event element stage according to steps S1-S4, designing a network for the event element extraction stage, training it on the specific data set, and updating the parameters of the two pre-training language models and of the event element extraction network, finally obtaining the event elements and element roles.
Fig. 2 is a schematic diagram of the implementation flow of the event detection stage in the event extraction task; the flow of the event element extraction stage is similar. In a specific application embodiment, in the event detection stage, first language information (language information 1) is fused into the pre-training language model according to steps S1-S4 to obtain the distributed word representations for this stage, and a multi-class classification network classifies them to obtain the trigger words and event types; in the event element extraction stage, second language information (language information 2) is fused into the pre-training language model according to steps S1-S4 to obtain the distributed word representations for the event element extraction stage, and these are classified to obtain the event elements and element roles. The language information fused in each stage can be configured according to actual requirements; for example, part of speech can be used as the specified language information in the event detection stage, and grammatical component information as the specified language information in the event element extraction stage.
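The classification network itself is not detailed in the description; purely as an illustration, the sketch below shows one plausible multi-class head over the final word representations, assuming the concatenation-based secondary fusion above (input size 2 * d_model); the number of event types and all tensors are placeholders.

```python
import torch
import torch.nn as nn

class TriggerClassifier(nn.Module):
    """Per-token multi-class head; the 2 * d_model input size matches the
    concatenation of the two groups of word representations."""
    def __init__(self, d_model, num_event_types):
        super().__init__()
        self.classifier = nn.Linear(2 * d_model, num_event_types)

    def forward(self, final_word_repr):           # (batch, seq_len, 2 * d_model)
        return self.classifier(final_word_repr)   # (batch, seq_len, num_event_types)

head = TriggerClassifier(d_model=768, num_event_types=34)   # 34 is illustrative
logits = head(torch.randn(2, 16, 2 * 768))                  # placeholder representations
labels = torch.zeros(2 * 16, dtype=torch.long)              # placeholder labels
loss = nn.CrossEntropyLoss()(logits.view(-1, 34), labels)
```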
The present invention will be further described below by taking an example of fine tuning a pre-training language model by using part-of-speech information and grammar component information on an event extraction task in a specific application embodiment.
As shown in fig. 3 to 5, in this embodiment part-of-speech information is fused to fine-tune the pre-training language model in the event detection stage, and syntactic component information is fused to fine-tune the pre-training language model in the event element extraction stage, so as to realize event extraction. The detailed steps are as follows:
the method comprises the following steps: and acquiring part-of-speech information of the data in the target task data set according to the characteristics of an event detection stage in the event extraction task, and capturing the part-of-speech information of the corpus in the data set by using an external NLP tool under the condition that the data set of the event extraction task does not contain the part-of-speech information.
The steps of acquiring the part-of-speech information are as follows:
Step 1.1, using tools such as NLTK or LTP to tag the part of speech of all data in the data set and saving the results.
In this embodiment, the minimum granularity of the part-of-speech tagging is the word, and a word in Chinese often contains several characters; therefore, when the results are saved, all characters within the same word are given the same part of speech as that word.
Step 1.2, traversing the corpus in the data set, counting the part-of-speech types and encoding them to form the dictionary L_dict.
In this embodiment, each time the data set is traversed, the language features are obtained and the input text is segmented into words, and the dictionary L_dict is built after the language features have been aligned with the word segmentation results. For the feature coding, all part-of-speech types are simply numbered starting from 1.
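As an illustration of the word-to-character expansion described in step 1.1 above, a minimal sketch follows; the tags in the example are hypothetical.

```python
def expand_word_tags_to_characters(segmented_sentence):
    """segmented_sentence: list of (word, pos_tag) pairs from a word-level tagger.
    Every character in a word inherits that word's part of speech."""
    char_tags = []
    for word, tag in segmented_sentence:
        for ch in word:
            char_tags.append((ch, tag))
    return char_tags

# Example (tags are illustrative):
expand_word_tags_to_characters([("国防", "n"), ("科技", "n"), ("大学", "n"), ("发布", "v")])
# [('国', 'n'), ('防', 'n'), ('科', 'n'), ('技', 'n'), ('大', 'n'), ('学', 'n'), ('发', 'v'), ('布', 'v')]
```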
Step two: obtain the original word vectors of the pre-training language model, which contain the word encoding, paragraph encoding and position encoding information, and fuse the part-of-speech information into the original word vectors to improve them and form the fused word vectors EF. The specific steps of forming the fused word vectors EF are as follows:
Step 2.1, obtain the dimension d_model of the hidden layer of the pre-training language model.
Step 2.2, based on the part of speech of the input {T_1, T_2, …, T_n}, obtain the corresponding language feature numbers {L_1, L_2, …, L_n} from the dictionary L_dict.
In this embodiment, for special tokens of the BERT model such as [CLS] and [SEP], the corresponding part-of-speech number is set to 0.
Step 2.3, for each token, represent its part-of-speech information with a d_model-dimensional vector; the part-of-speech vector representation uses any one of the above formulas (1) to (3).
Step 2.4, merge the language feature vector set {EL_1, EL_2, …, EL_n} into the initial word vectors {EO_1, EO_2, …, EO_n} of the pre-training language model to obtain the set {EF_1, EF_2, …, EF_n}.
In a specific application embodiment, the fusion process is as shown in fig. 4: the part-of-speech vector is added element by element to the original token vector, segment vector and position vector of the BERT model.
Step three: fuse the word vector output of the pre-training language model into which the language information was fused in step two with the word vector output of the original pre-training language model.
Step 3.1, feed the original word vector set {EO_1, EO_2, …, EO_n}, into which no part-of-speech information has been merged, into the first BERT model M1 to obtain the first group of word representations {F'_1, F'_2, …, F'_n};
Step 3.2, feed the fused word vector set {EF_1, EF_2, …, EF_n}, into which the part-of-speech information has been fused, into the second BERT model M2 to obtain the second group of word representations {F''_1, F''_2, …, F''_n};
Step 3.3, fuse the two groups of word representations to form the final word representation set {F_1, F_2, …, F_n};
Step four: design a network for the event detection stage of the event extraction task, input the final word representations as the word features of the event detection network, and then train the whole model;
Step five: according to the characteristics of the event detection stage in the event extraction task, acquire the grammatical component information of the data in the target task data set;
Step six: fuse the grammatical component information into the initial word vectors of the pre-training language model, and obtain the final word representation set in the same way as in step two and step three;
Step seven: design a network for the event element extraction stage of the event extraction task, and then train the model to obtain the event elements and element roles.
This embodiment further provides a system for fine-tuning a pre-training language model by fusing language information in event extraction, which comprises:
the language information acquisition module, which is used for acquiring the language information that the current event extraction task requires to be fused and encoding it to obtain a language feature code set;
the information initial fusion module, which is used for acquiring the initial word vector to be input and initially fusing the acquired initial word vector with the language feature code set to obtain a fused word vector;
the dual-model coding module, which is used for respectively inputting the initial word vector and the fused word vector into pre-training language models for encoding, wherein the initial word vector is input into a first pre-training language model to obtain a first group of word representations, the fused word vector is input into a second pre-training language model to obtain a second group of word representations, and the first pre-training language model and the second pre-training language model are independent of each other;
and the information secondary fusion module, which is used for performing a secondary fusion of the first group of word representations and the second group of word representations to obtain the final word representations and complete the fine-tuning.
In this embodiment, the language information obtaining module includes:
the encoding unit is used for acquiring language information required to be fused of each type in advance, counting and encoding the acquired language information and constructing a feature encoding dictionary;
and the first acquisition unit is used for acquiring a corresponding language feature code set from the feature code dictionary according to the language information required to be fused by the current event extraction task.
In this embodiment, the information initial fusion module includes:
a second obtaining unit for obtaining the dimension d_model of the hidden layer in the pre-training language model;
a computing unit, configured to, in the initial encoding stage of the pre-training language model, compute a d_model-dimensional vector for each language feature code in the language feature code set to serve as its vector representation, i.e. to project the language feature codes into the target vector space and obtain the language feature vector set;
and the fusion unit is used for fusing the language feature vector set into the initial word vector to obtain the fused word vector.
The above system for fine-tuning a pre-training language model by fusing language information in event extraction corresponds one-to-one with the method for fine-tuning a pre-training language model by fusing language information in event extraction; the two have the same implementation principle and technical effect, which are not repeated here.
In another embodiment, the system for fine-tuning a pre-training language model by fusing language information in event extraction according to the present invention may also comprise a processor and a memory, wherein the memory is used for storing a computer program and the processor is used for executing the computer program so as to perform the above method for fine-tuning a pre-training language model by fusing language information in event extraction.
The foregoing is merely a description of the preferred embodiments of the invention and is not to be construed as limiting the invention in any way. Although the present invention has been described with reference to preferred embodiments, it is not limited thereto. Any simple modification, equivalent change or variation made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, falls within the protection scope of the technical solution of the present invention.

Claims (10)

1. A method for fine-tuning a pre-training language model by fusing language information in event extraction, characterized by comprising the following steps:
step S1, language information acquisition: acquiring the language information that the current event extraction task requires to be fused and encoding it to obtain a language feature code set;
step S2, information initial fusion: acquiring the initial word vector to be input, and initially fusing the acquired initial word vector with the language feature code set to obtain a fused word vector;
step S3, dual-model coding: inputting the initial word vector and the fused word vector into pre-training language models respectively for encoding, wherein the initial word vector is input into a first pre-training language model to obtain a first group of word representations, the fused word vector is input into a second pre-training language model to obtain a second group of word representations, and the first pre-training language model and the second pre-training language model are independent of each other;
step S4, information secondary fusion: performing a secondary fusion of the first group of word representations and the second group of word representations to obtain the final word representations, thereby completing the fine-tuning.
2. The method for fusing language information to fine tune a pre-trained language model in event extraction as claimed in claim 1, wherein said step S1 comprises:
s101, pre-acquiring language information required to be fused of each type, counting and coding the acquired language information, and constructing a feature coding dictionary;
and S102, according to the language information that the current event extraction task requires to be fused, acquiring the corresponding language feature code set from the feature code dictionary and outputting it.
3. The method for fusing language information to fine tune a pre-trained language model in event extraction as claimed in claim 1, wherein said step S2 comprises:
step S201, obtaining the dimension d_model of the hidden layer in the pre-training language model;
step S202, in the initial coding stage of the pre-training language model, for each language feature code in the language feature code set, computing a d_model-dimensional vector to serve as the vector representation of that language feature code, i.e. projecting the language feature code into the target vector space, so as to obtain a language feature vector set;
and S203, fusing the language feature vector set into the initial word vector to obtain the fused word vector.
4. The method for fine-tuning a pre-trained language model by fusing language information in event extraction according to claim 3, wherein in the step S202, the specific calculation formula of the language feature vector is as follows:
[Formula reproduced as an image in the original publication]
or in step S202, the sine function and the cosine function are used to alternately generate the dimension information in the vector representation, and the specific calculation formula is as follows:
[Formula reproduced as an image in the original publication]
where i is a dimension index of the language feature vector, L_index is the language feature code to be represented, n is the length of the language feature code set, and E_i^{L_index} denotes the i-th dimension value of the vector representation of the language feature code L_index.
5. The method for fine-tuning a pre-trained language model by fusing language information in event extraction according to claim 3, wherein in the step S202, a polar coordinate equation is used to represent the value of each dimension of the language feature vector, and the specific calculation formula is as follows:
[Formula reproduced as an image in the original publication]
where i is a dimension index of the language feature vector, L_index is the language feature code to be represented, n is the length of the language feature code set, and E_i^{L_index} denotes the i-th dimension value of the vector representation of the language feature code L_index.
6. The method for fine-tuning the pre-training language model by fusing the language information in the event extraction according to any one of claims 1 to 5, wherein: the first pre-training language model and the second pre-training language model are the same pre-training language model, or the first pre-training language model and the second pre-training language model adopt different pre-training language models.
7. The method for fine-tuning the pre-training language model by fusing the language information in the event extraction according to any one of claims 1 to 5, wherein: in step S4, when the secondary fusion is performed, the first group of word representations and the second group of word representations are concatenated to obtain the final word representations.
8. The method for fine-tuning the pre-training language model by fusing the language information in the event extraction according to any one of claims 1 to 5, wherein: in the event detection stage, first language information is fused into the pre-training language model according to the steps S1-S4 to obtain first distributed word representations, and the first distributed word representations are classified to obtain the trigger words and event types; in the event element extraction stage, second language information is fused into the pre-training language model according to the steps S1-S4 to obtain second distributed word representations, and the second distributed word representations are classified to obtain the event elements and element roles.
9. A system for fine-tuning a pre-training language model by fusing language information in event extraction, characterized by comprising:
the language information acquisition module, which is used for acquiring the language information that the current event extraction task requires to be fused and encoding it to obtain a language feature code set;
the information initial fusion module, which is used for acquiring the initial word vector to be input and initially fusing the acquired initial word vector with the language feature code set to obtain a fused word vector;
the dual-model coding module, which is used for respectively inputting the initial word vector and the fused word vector into pre-training language models for encoding, wherein the initial word vector is input into a first pre-training language model to obtain a first group of word representations, the fused word vector is input into a second pre-training language model to obtain a second group of word representations, and the first pre-training language model and the second pre-training language model are independent of each other;
and the information secondary fusion module, which is used for performing a secondary fusion of the first group of word representations and the second group of word representations to obtain the final word representations and complete the fine-tuning.
10. A system for fine-tuning a pre-training language model by fusing language information in event extraction, comprising a processor and a memory, wherein the memory is configured to store a computer program and the processor is configured to execute the computer program so as to perform the method according to any one of claims 1 to 8.
CN202110384170.4A 2021-04-09 2021-04-09 Method and system for fine tuning pre-training language model by fusing language information in event extraction Active CN113158624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110384170.4A CN113158624B (en) 2021-04-09 2021-04-09 Method and system for fine tuning pre-training language model by fusing language information in event extraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110384170.4A CN113158624B (en) 2021-04-09 2021-04-09 Method and system for fine tuning pre-training language model by fusing language information in event extraction

Publications (2)

Publication Number Publication Date
CN113158624A true CN113158624A (en) 2021-07-23
CN113158624B CN113158624B (en) 2023-12-08

Family

ID=76889798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110384170.4A Active CN113158624B (en) 2021-04-09 2021-04-09 Method and system for fine tuning pre-training language model by fusing language information in event extraction

Country Status (1)

Country Link
CN (1) CN113158624B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330474A (en) * 2021-10-20 2022-04-12 腾讯科技(深圳)有限公司 Data processing method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423284A (en) * 2017-06-14 2017-12-01 中国科学院自动化研究所 Merge the construction method and system of the sentence expression of Chinese language words internal structural information
KR20200032776A (en) * 2018-09-18 2020-03-27 주식회사 스트리스 System for information fusion among multiple sensor platforms
CN111104516A (en) * 2020-02-10 2020-05-05 支付宝(杭州)信息技术有限公司 Text classification method and device and electronic equipment
EP3767516A1 (en) * 2019-07-18 2021-01-20 Ricoh Company, Ltd. Named entity recognition method, apparatus, and computer-readable recording medium
CN112560502A (en) * 2020-12-28 2021-03-26 桂林电子科技大学 Semantic similarity matching method and device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423284A (en) * 2017-06-14 2017-12-01 中国科学院自动化研究所 Merge the construction method and system of the sentence expression of Chinese language words internal structural information
KR20200032776A (en) * 2018-09-18 2020-03-27 주식회사 스트리스 System for information fusion among multiple sensor platforms
EP3767516A1 (en) * 2019-07-18 2021-01-20 Ricoh Company, Ltd. Named entity recognition method, apparatus, and computer-readable recording medium
CN111104516A (en) * 2020-02-10 2020-05-05 支付宝(杭州)信息技术有限公司 Text classification method and device and electronic equipment
CN112560502A (en) * 2020-12-28 2021-03-26 桂林电子科技大学 Semantic similarity matching method and device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
熊文诗;卿粼波;吴晓红;陈真真;: "Side information fusion algorithm based on reliability evaluation in DMDVC", Journal of Graphics, no. 04 *
韩毅;张涵;李跃新;: "Chinese text sentiment classification method based on sentiment histogram features", Computer Engineering and Design, no. 07 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330474A (en) * 2021-10-20 2022-04-12 腾讯科技(深圳)有限公司 Data processing method and device, computer equipment and storage medium
CN114330474B (en) * 2021-10-20 2024-04-26 腾讯科技(深圳)有限公司 Data processing method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113158624B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
CN113792818B (en) Intention classification method and device, electronic equipment and computer readable storage medium
CN110288665B (en) Image description method based on convolutional neural network, computer-readable storage medium and electronic device
CN110309287B (en) Retrieval type chatting dialogue scoring method for modeling dialogue turn information
CN110163181B (en) Sign language identification method and device
Wang et al. An experimental study of LSTM encoder-decoder model for text simplification
CN111143561B (en) Intention recognition model training method and device and electronic equipment
CN111626062B (en) Text semantic coding method and system
CN109522403A (en) A kind of summary texts generation method based on fusion coding
CN111144507B (en) Emotion analysis model pre-training method and device and electronic equipment
CN113705313A (en) Text recognition method, device, equipment and medium
CN109977220A (en) A method of the reversed generation abstract based on critical sentence and keyword
CN112364132A (en) Similarity calculation model and system based on dependency syntax and method for building system
CN106980620A (en) A kind of method and device matched to Chinese character string
CN114722834A (en) Semantic recognition model training method, equipment and medium based on contrast learning
CN116681810B (en) Virtual object action generation method, device, computer equipment and storage medium
US20230094730A1 (en) Model training method and method for human-machine interaction
CN113705315A (en) Video processing method, device, equipment and storage medium
CN114970503A (en) Word pronunciation and font knowledge enhancement Chinese spelling correction method based on pre-training
CN111966811A (en) Intention recognition and slot filling method and device, readable storage medium and terminal equipment
CN109933773A (en) A kind of multiple semantic sentence analysis system and method
CN113158624B (en) Method and system for fine tuning pre-training language model by fusing language information in event extraction
CN114298031A (en) Text processing method, computer device and storage medium
Maslennikova ELMo Word Representations For News Protection.
Dongjie et al. Multimodal knowledge learning for named entity disambiguation
CN110197521B (en) Visual text embedding method based on semantic structure representation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant