CN113807095B - Training method, training device, training equipment and training storage medium for entity word extraction model - Google Patents


Info

Publication number
CN113807095B
CN113807095B (application CN202110236142.8A)
Authority
CN
China
Prior art keywords
training
character
longformer
entity word
submodel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110236142.8A
Other languages
Chinese (zh)
Other versions
CN113807095A (en)
Inventor
赵晨旭
郑宇宇
顾松庠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd filed Critical Jingdong Technology Holding Co Ltd
Priority to CN202110236142.8A priority Critical patent/CN113807095B/en
Publication of CN113807095A publication Critical patent/CN113807095A/en
Application granted granted Critical
Publication of CN113807095B publication Critical patent/CN113807095B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a training method, apparatus, device and storage medium for an entity word extraction model. The training method comprises: acquiring an entity word extraction model to be trained; using a masked first training corpus as the input corpus, performing first training on the Longformer sub-model in the entity word extraction model, so that the first-trained Longformer sub-model outputs semantic features; and using a second training corpus labelled with part-of-speech information of entity words as the input corpus, performing second training on the first-trained Longformer sub-model and the CRF sub-model in the entity word extraction model, so that the second-trained CRF sub-model outputs entity words according to the semantic features output by the second-trained Longformer sub-model. In this way, Longformer is combined with the CRF model to construct an entity word extraction model that extracts entity words from long text without first segmenting or splitting the text, so information loss can be avoided and the reliability of the entity word extraction result is improved.

Description

Training method, training device, training equipment and training storage medium for entity word extraction model
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a training method, apparatus, device, and storage medium for an entity word extraction model.
Background
Entity word recognition or extraction is an important basic task in the application fields of information retrieval, question and answer systems, syntactic analysis, machine translation and the like. Currently, for extracting entity words of a long text, the long text is segmented into a plurality of short text segments, and the entity words are extracted for each short text segment respectively. However, the method of extracting the entity words after the long text is segmented easily causes that the interaction between different short text segments cannot be performed, so that the condition of information loss is caused, and the reliability of the entity word extraction result is reduced.
Disclosure of Invention
The present application aims to solve at least one of the technical problems in the related art to some extent.
The application provides a training method, apparatus, device and storage medium for an entity word extraction model, in which Longformer is combined with a CRF model to construct an entity word extraction model that extracts entity words from long text without segmenting or splitting the text before recognition. This avoids information loss and improves the reliability of the entity word extraction result, thereby addressing the technical problem in the prior art that extracting entity words after segmenting a long text prevents interaction between the different short text segments, causing information loss and reducing the reliability of the extraction result.
An embodiment of a first aspect of the present application provides a training method for an entity word extraction model, including:
Acquiring an entity word extraction model to be trained, wherein the entity word extraction model comprises a long-text-encoding Longformer sub-model and a conditional random field (CRF) sub-model;
using a masked first training corpus as the input corpus, performing first training on the Longformer sub-model in the entity word extraction model, so that the first-trained Longformer sub-model outputs semantic features; and
using a second training corpus labelled with part-of-speech information of entity words as the input corpus, performing second training on the first-trained Longformer sub-model and the CRF sub-model in the entity word extraction model, so that the second-trained CRF sub-model outputs entity words according to the semantic features output by the second-trained Longformer sub-model.
According to the training method of the entity word extraction model, an entity word extraction model comprising a long-text-encoding Longformer sub-model and a conditional random field CRF sub-model is acquired; first training is performed on the Longformer sub-model with a masked first training corpus as the input corpus, so that the first-trained Longformer sub-model outputs semantic features; and second training is performed on the first-trained Longformer sub-model and the CRF sub-model with a second training corpus labelled with part-of-speech information of entity words, so that the second-trained CRF sub-model outputs entity words according to the semantic features output by the second-trained Longformer sub-model. Longformer is thus combined with the CRF model to construct an entity word extraction model that extracts entity words from long text without segmenting or splitting the text, so information loss can be avoided and the reliability of the entity word extraction result is improved.
An embodiment of a second aspect of the present application provides a training device for an entity word extraction model, including:
The acquisition module is configured to acquire an entity word extraction model to be trained, wherein the entity word extraction model comprises a long-text-encoding Longformer sub-model and a conditional random field (CRF) sub-model;
the first training module is configured to use a masked first training corpus as the input corpus and perform first training on the Longformer sub-model in the entity word extraction model, so that the first-trained Longformer sub-model outputs semantic features;
the second training module is configured to use a second training corpus labelled with part-of-speech information of entity words as the input corpus and perform second training on the first-trained Longformer sub-model and the CRF sub-model in the entity word extraction model, so that the second-trained CRF sub-model outputs entity words according to the semantic features output by the second-trained Longformer sub-model.
According to the training device for the entity word extraction model, Longformer is likewise combined with the CRF model to construct an entity word extraction model that extracts entity words from long text without segmenting or splitting the text, so information loss can be avoided and the reliability of the entity word extraction result is improved.
An embodiment of a third aspect of the present application proposes a computer device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the training method for the entity word extraction model according to the embodiment of the first aspect of the present application.
An embodiment of a fourth aspect of the present application proposes a non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements a training method for an entity word extraction model as proposed in the embodiment of the first aspect of the present application.
An embodiment of a fifth aspect of the present application proposes a computer program product, which when executed by a processor, performs a training method of an entity word extraction model as proposed by the embodiment of the first aspect of the present application.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a training method of an entity word extraction model according to an embodiment of the present application;
FIG. 2 is a flowchart of a training method of an entity word extraction model according to a second embodiment of the present application;
FIG. 3 is a flowchart of a training method of an entity word extraction model according to a third embodiment of the present application;
Fig. 4 is a flow chart of a training method of an entity word extraction model according to a fourth embodiment of the present application;
FIG. 5 is a schematic diagram of a training process of an entity word extraction model in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a training device for entity word extraction model according to a fifth embodiment of the present application;
FIG. 7 illustrates a block diagram of an exemplary computer device suitable for use in implementing embodiments of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
At present, named entity recognition (NER) can be performed based on a bidirectional long short-term memory (Bi-LSTM) + conditional random field (CRF) model, or based on a Bidirectional Encoder Representations from Transformers (BERT) + CRF model. NER identifies entities with specific meaning in text, mainly proper nouns such as person names, place names and organization names, as well as meaningful times, and is an important basic task in application fields such as information retrieval, question-answering systems, syntactic analysis and machine translation.
However, while such models recognize entities well in sentences or short texts, their results on long texts are poor: the long text must be segmented before entity words are extracted, which prevents interaction between the different short text segments, causes information loss, and reduces the reliability of the entity word extraction result.
Although interaction between different text segments can be enhanced by adding other mechanisms to the model, doing so makes the implementation more complex, and such mechanisms are often task-specific and not very general.
To address these problems, the embodiment of the application mainly provides a training method for an entity word extraction model. According to this training method, the long-text-encoding Longformer model is combined with the CRF model to construct an entity word extraction model that extracts entity words from long text, and entity word recognition is performed without segmenting or splitting the long text, so information loss can be avoided and the reliability of the entity word extraction result is improved.
The following describes a training method, a training device, training equipment and training storage media for an entity word extraction model according to an embodiment of the application with reference to the accompanying drawings.
Fig. 1 is a flowchart of a training method of an entity word extraction model according to an embodiment of the present application.
The embodiment of the application is illustrated with the training method configured in a training device for the entity word extraction model; the training device can be applied to any computer equipment, so that the computer equipment can perform the training function for the entity word extraction model.
The computer device may be a personal computer (Personal Computer, abbreviated as PC), a cloud device, a mobile device, etc., and the mobile device may be, for example, a mobile phone, a tablet computer, a personal digital assistant, a wearable device, a vehicle-mounted device, etc., which have various hardware devices including an operating system, a touch screen, and/or a display screen.
As shown in fig. 1, the training method of the entity word extraction model may include the following steps:
step 101, obtaining an entity word extraction model to be trained; the entity word extraction model comprises a long text coding Longformer submodel and a conditional random field CRF submodel.
In the embodiment of the application, an entity word refers to a thing or concept with specific meaning in the text; for example, entity words may include proper nouns such as person names, place names and organization names, as well as meaningful times. A meaningful time may be, for example, a holiday entity word such as the Dragon Boat Festival or the Mid-Autumn Festival.
In the embodiment of the application, the entity word extraction model can be built in advance according to the long text coding Longformer submodel and the CRF submodel, so that the pre-built entity word extraction model can be obtained in the application.
Step 102, using the masked first training corpus as the input corpus, performing first training on the Longformer sub-model in the entity word extraction model, so that the first-trained Longformer sub-model outputs semantic features.
In the embodiment of the application, the training corpus for the Longformer sub-model may be collected online or offline, may be text stored locally on the computer device, or may be obtained from an existing test dataset; the embodiment of the application is not limited in this respect. After the training corpus is obtained, masking may be performed on it to obtain the first training corpus, where masking refers to removing or occluding one or more characters in the training corpus.
For example, if the training corpus is "Xiao Ming eats moon cakes at the Mid-Autumn Festival" and one character, such as "Autumn", is masked out, the masked first training corpus "Xiao Ming eats moon cakes at the Mid-[MASK] Festival" is obtained.
In the embodiment of the application, the masked first training corpus may be used as the input corpus to train the Longformer sub-model, so that the first-trained Longformer sub-model can output the semantic features of each character in the input corpus; a semantic feature may be, for example, a semantic vector.
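The masking step described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the `[MASK]` token, the `mask_ratio` parameter and the function name are assumptions introduced here for clarity.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder token; an assumption, not specified in the patent

def mask_corpus(chars, mask_ratio=0.15, rng=None):
    """Randomly occlude a fraction of characters in the training corpus.

    Returns the masked character sequence plus the gold labels (the original
    character at each masked position, None elsewhere) that the first
    training stage would train the encoder to recover.
    """
    rng = rng or random.Random(0)
    n_mask = max(1, int(len(chars) * mask_ratio))
    positions = set(rng.sample(range(len(chars)), n_mask))
    masked = [MASK_TOKEN if i in positions else c for i, c in enumerate(chars)]
    labels = [chars[i] if i in positions else None for i in range(len(chars))]
    return masked, labels
```

Training the Longformer sub-model to predict the occluded characters forces it to encode contextual semantics for every position, which is exactly why its outputs can later serve as per-character semantic features.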
Step 103, using a second training corpus labelled with part-of-speech information of entity words as the input corpus, performing second training on the first-trained Longformer sub-model and the CRF sub-model in the entity word extraction model, so that the second-trained CRF sub-model outputs entity words according to the semantic features output by the second-trained Longformer sub-model.
In the embodiment of the application, the training corpus for the first-trained Longformer sub-model and the CRF sub-model may likewise be collected online or offline, be text stored locally on the computer device, or be obtained from an existing test dataset; the embodiment of the application is not limited in this respect.
In the embodiment of the application, after the training corpus is obtained, part-of-speech information labeling can be carried out on the entity words in the training corpus to obtain the second training corpus. The part-of-speech information may include, among other things, one or more of an entity word start (Begin) identifier, an entity word End (End) identifier, an entity word Intermediate (Intermediate) character identifier, and a non-entity word identifier.
The entity word starting identifier is used for indicating that the corresponding character belongs to the first character of the entity word; the entity word ending mark is used for indicating that the corresponding character belongs to the last character of the entity word; the entity word middle character identifier is used for indicating that the corresponding character belongs to the middle character of the entity word; and the non-entity word identifier is used for indicating that the corresponding character does not belong to the entity word.
As one example, an entity word start identifier such as B, an entity word end identifier such as E, an entity word intermediate character identifier such as I, and a non-entity word identifier such as O (i.e., other).
For example, suppose the training corpus is "Xiao Ming watched a match of the Chinese women's football team at the Bird's Nest". The entity words in the training corpus are "Xiao Ming", "Bird's Nest" and "Chinese women's football team". Labeling the part-of-speech information of these entity words yields the second training corpus, in which the label corresponding to each character of the original Chinese sentence is [B, E, O, B, E, O, O, O, B, I, I, E, O, O, O, O, O].
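The labeling scheme above can be sketched as a small helper. This is an illustrative sketch under the assumption that entities are located by simple substring search and are at least two characters long; the function name is hypothetical.

```python
def beio_labels(text, entities):
    """Assign one label per character: B = first character of an entity word,
    E = last character, I = interior character, O = not part of any entity.
    Assumes each entity occurs once and spans at least two characters."""
    labels = ["O"] * len(text)
    for ent in entities:
        start = text.find(ent)
        if start < 0:
            continue  # entity not present in this sentence
        end = start + len(ent) - 1
        labels[start] = "B"
        labels[end] = "E"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels
```

Applied to the Chinese sentence 小明在鸟巢观看了中国女足的一场比赛 with entities 小明, 鸟巢 and 中国女足, this reproduces the label sequence given in the text.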
In the embodiment of the application, after the entity words in the training corpus are labelled with part-of-speech information to obtain the second training corpus, this labelled corpus can be used as the input corpus to perform second training on the first-trained Longformer sub-model and the CRF sub-model in the entity word extraction model, so that the second-trained CRF sub-model outputs entity words according to the semantic features output by the second-trained Longformer sub-model.
Specifically, the second training corpus may be input into the first-trained Longformer sub-model to obtain the semantic features of each character in the second training corpus output by that sub-model. These semantic features are then input into the CRF sub-model to obtain the part-of-speech information of each character output by the CRF sub-model, where the part-of-speech information of each character indicates whether the corresponding character belongs to an entity word. The part-of-speech information output by the CRF sub-model and the part-of-speech information labelled in the second training corpus can then be used to perform the second training on the first-trained Longformer sub-model and the CRF sub-model, so that the second-trained CRF sub-model outputs entity words according to the semantic features output by the second-trained Longformer sub-model. The following embodiments describe this in detail, so it is not repeated here.
According to the training method of the entity word extraction model, an entity word extraction model comprising a long-text-encoding Longformer sub-model and a conditional random field CRF sub-model is acquired; first training is performed on the Longformer sub-model with a masked first training corpus as the input corpus, so that the first-trained Longformer sub-model outputs semantic features; and second training is performed on the first-trained Longformer sub-model and the CRF sub-model with a second training corpus labelled with part-of-speech information of entity words, so that the second-trained CRF sub-model outputs entity words according to the semantic features output by the second-trained Longformer sub-model. Longformer is thus combined with the CRF model to construct an entity word extraction model that extracts entity words from long text without segmenting or splitting the text, so information loss can be avoided and the reliability of the entity word extraction result is improved.
In order to clearly illustrate the above embodiment, another training method for an entity word extraction model is provided in this embodiment, and fig. 2 is a flow chart of a training method for an entity word extraction model provided in the second embodiment of the present application.
As shown in fig. 2, the training method of the entity word extraction model may include the following steps:
step 201, obtaining an entity word extraction model to be trained; the entity word extraction model comprises a long text coding Longformer submodel and a conditional random field CRF submodel.
Step 202, using the masked first training corpus as the input corpus, performing first training on the Longformer sub-model in the entity word extraction model, so that the first-trained Longformer sub-model outputs semantic features.
The execution of steps 201 to 202 may be referred to the execution of steps 101 to 102 in the above embodiments, and will not be described herein.
Step 203, inputting the second training corpus into the first-trained Longformer sub-model to obtain the semantic features of each character in the second training corpus.
In the embodiment of the present application, through the first training in step 202, the Longformer sub-model has learned the correspondence between an input corpus and the semantic features of each of its characters, so inputting the second training corpus into the first-trained Longformer sub-model yields the semantic features of each character in the second training corpus.
Step 204, inputting the semantic features of each character of the second training corpus into the CRF sub-model to obtain the part-of-speech information of each character output by the CRF sub-model, where the part-of-speech information indicates whether the corresponding character belongs to an entity word.
It should be noted that, the explanation of the part-of-speech information in step 103 in the foregoing embodiment is also applicable to this embodiment, and will not be repeated here.
In the embodiment of the application, the semantic features of each character in the second training corpus output by the first-trained Longformer sub-model can be input into the CRF sub-model, and the CRF sub-model outputs the part-of-speech information of each character.
Step 205, adjusting the model parameters of the CRF sub-model and the first-trained Longformer sub-model according to the difference between the part-of-speech information of each character output by the CRF sub-model and the part-of-speech information of each character labelled in the second training corpus.
In the embodiment of the application, the model parameters of the CRF sub-model and the first-trained Longformer sub-model can be adjusted so as to minimize the difference between the part-of-speech information of each character output by the CRF sub-model and that labelled in the second training corpus.
In one possible implementation of the embodiment of the application, the similarity between the part-of-speech information of each character output by the CRF sub-model and the part-of-speech information of each character labelled in the second training corpus may be computed with a similarity algorithm, and the difference between the two is determined from this similarity; similarity and difference are inversely related, i.e. the greater the similarity, the smaller the difference, and vice versa. The value of the loss function is then determined from this difference, with which it is positively correlated. The model parameters of the CRF sub-model and the first-trained Longformer sub-model can therefore be adjusted to minimize the value of the loss function, improving the prediction accuracy of the model. When the loss function is minimized, the second training process can end, and the second-trained Longformer sub-model and CRF sub-model can be used to identify entity words in text (i.e., the prediction stage), improving the accuracy and reliability of the prediction result.
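One concrete choice of such a loss, shown purely as an illustration (the patent does not fix a particular loss function), is the average negative log-likelihood of the labelled tags under the model's predicted tag distributions: the more the predictions disagree with the labels, the larger the value, matching the positive relation between difference and loss described above. The function name and the tag order are assumptions.

```python
import math

def tag_nll(pred_probs, gold_tags, tagset=("B", "I", "E", "O")):
    """Average negative log-likelihood of the gold tag sequence.

    pred_probs: one probability distribution over the tagset per character.
    gold_tags:  the labelled tag for each character.
    A perfect, confident prediction gives a loss near 0; disagreement or
    uncertainty increases the loss that the second training stage minimises.
    """
    loss = 0.0
    for probs, gold in zip(pred_probs, gold_tags):
        loss -= math.log(probs[tagset.index(gold)])
    return loss / len(gold_tags)
```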
In the prediction stage, a text (such as a long text) is input into the second-trained Longformer sub-model, which outputs the semantic features of each character in the text; these semantic features are then input into the second-trained CRF sub-model, which outputs the entity words according to the semantic features output by the second-trained Longformer sub-model.
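At prediction time a CRF layer is typically decoded with the Viterbi algorithm, which picks the globally best tag sequence from per-character emission scores (here standing in for the encoder's semantic features) and tag-to-tag transition scores. The following is a generic pure-Python sketch of that decoding step, not the patent's code; the score values in the usage below are made up.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions:   list of {tag: score} dicts, one per character.
    transitions: {(prev_tag, tag): score} dict from the CRF layer.
    """
    n = len(emissions)
    score = {t: emissions[0][t] for t in tags}   # best score ending in tag t
    back = []                                    # backpointers per position
    for i in range(1, n):
        new_score, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: score[p] + transitions[(p, t)])
            ptr[t] = prev
            new_score[t] = score[prev] + transitions[(prev, t)] + emissions[i][t]
        back.append(ptr)
        score = new_score
    best = max(tags, key=lambda t: score[t])
    path = [best]
    for ptr in reversed(back):                   # walk the backpointers
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

Because transition scores can heavily penalise illegal pairs (e.g. an O directly after a B), the decoded sequence respects the BIEO scheme even when individual character scores are ambiguous.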
According to the training method of the entity word extraction model, the second training corpus is input into the first-trained Longformer sub-model to obtain the semantic features of each character in the second training corpus; these semantic features are input into the CRF sub-model to obtain the part-of-speech information of each character output by the CRF sub-model; and the model parameters of the CRF sub-model and the first-trained Longformer sub-model are adjusted according to the difference between the part-of-speech information of each character output by the CRF sub-model and the part-of-speech information of each character labelled in the second training corpus. The accuracy and reliability of the entity word prediction result can thereby be improved.
The first training process of the Longformer sub-model in any of the above embodiments is described in detail below in conjunction with the third embodiment.
Fig. 3 is a flowchart of a training method of an entity word extraction model according to a third embodiment of the present application.
As shown in fig. 3, the training method of the entity word extraction model may include the following steps:
step 301, obtaining an entity word extraction model to be trained; the entity word extraction model comprises a long text coding Longformer submodel and a conditional random field CRF submodel.
Step 302, inputting each character of the masked first training corpus into the feature extraction layer of the Longformer submodel for feature extraction, and inputting the extracted features of each character into the attention layer of the Longformer submodel for local attention prediction, so as to obtain the local attention weight of each character.
The granularity of the character features may differ with the language of the input corpus: for example, when the language of the first training corpus is English, the character features may be word vectors of words, and when the language of the first training corpus is Chinese, the character features may be character vectors of individual Chinese characters.
It should be noted that the conventional Transformer-based model has an inherent disadvantage when processing long text, because it adopts a "fully-connected" attention mechanism in which each character attends to all other characters in the text sequence, so the complexity of the attention mechanism is as high as O(n²), where n is the length of the text sequence.
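To make the complexity claim concrete, the following sketch counts attended character pairs under fully-connected attention versus a sliding window (a hypothetical illustration, not code from the patent):

```python
def full_attention_pairs(n):
    """'Fully-connected' attention: every character attends to every
    character, so the number of attended pairs grows as O(n^2)."""
    return n * n

def local_attention_pairs(n, window):
    """Sliding-window attention: each character attends to at most
    window + 1 characters, so the pair count grows as O(n)."""
    half = window // 2
    return sum(min(n, i + half + 1) - max(0, i - half) for i in range(n))
```

For a 4096-character text with a 512-wide window, the windowed count is bounded by 4096 × 513 pairs rather than 4096², which is the O(n²) → O(n) reduction the text describes.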
Longformer, by contrast, improves the conventional attention mechanism of the Transformer: for each character, local attention weights are calculated only for the other characters located near that character within a set sliding window, which can reduce the complexity of the attention mechanism to O(n). For example, if the window length of the sliding window is set to 512, then for a given character local attention is paid to the 512 characters in its vicinity. That is, the local attention weight of each character in the present application is used to characterize the degree of association, or degree of attention, between the corresponding character and other characters at nearby positions in the input corpus, where the number of characters separating the corresponding character from those other characters is smaller than a threshold, and the threshold is determined according to the size of the sliding window.
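A minimal sketch of which positions fall inside the sliding window for each character (the centered-window indexing convention is an assumption for illustration):

```python
def local_attention_positions(seq_len, window):
    """For each character position i, list the positions inside the
    sliding window centered on i that i computes local attention for."""
    half = window // 2
    return [list(range(max(0, i - half), min(seq_len, i + half + 1)))
            for i in range(seq_len)]
```

Characters near the start or end of the sequence simply see a truncated window.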
In the embodiment of the present application, the feature extraction layer of the Longformer submodel may be, for example, a network with residual connections such as RoBERTa (an improved version of BERT), or may be another network or feature extractor; the present application is not limited in this respect.
In the embodiment of the application, each character of the masked first training corpus is input into the feature extraction layer of the Longformer submodel for feature extraction, the extracted features of each character are input into the Attention layer of the Longformer submodel for local attention prediction, and the Attention layer predicts the local attention weight of each character.
Step 303, inputting the local attention weight of each character into the output layer of the Longformer submodel to obtain the first semantic feature of each character in the first training corpus determined by the output layer according to the local attention weight.
In the embodiment of the application, the local Attention weight of each character predicted by the Attention (Attention) layer of the Longformer submodel can be input into the output layer of the Longformer submodel, and the output layer predicts the first semantic feature of each character in the first training corpus according to the local Attention weight of each character.
Step 304, predicting mask characters in the first training corpus according to the first semantic features of each character to obtain first characters.
In the embodiment of the application, the Longformer submodel can also predict the mask characters in the first training corpus according to the first semantic features of each character to obtain the first characters.
For example, suppose the training corpus is "Xiaoming eats moon cakes at the Mid-Autumn Festival", and the first training corpus obtained by masking it is "Xiaoming eats moon cakes at the Mid-[mask] Festival"; the Longformer submodel may then predict, according to the first semantic features of each character, that the mask character in the first training corpus is "Autumn".
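The masking step itself might be sketched as follows; the 15% mask probability and `[MASK]` token are conventions borrowed from BERT-style MLM pre-training, not values stated in the patent:

```python
import random

def mask_corpus(characters, mask_token="[MASK]", mask_prob=0.15, seed=42):
    """Replace a random fraction of characters with a mask token and
    remember the hidden originals (the 'actual mask characters')."""
    rng = random.Random(seed)
    masked, labels = [], {}
    for i, ch in enumerate(characters):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels[i] = ch  # the label the model must later recover
        else:
            masked.append(ch)
    return masked, labels
```

The returned `labels` dictionary holds exactly the "actual mask characters" against which the predicted first characters are compared during pre-training.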
Step 305, pre-training Longformer the submodel according to the difference between the first character and the actual mask character of the first training corpus.
It should be understood that the mask characters are predicted according to the semantic features (such as semantic vectors) output by the output layer of the Longformer submodel. If the prediction is correct, that is, the difference between the predicted mask character and the actual mask character is 0, the semantic features predicted by the Longformer submodel are correct; if the prediction is incorrect, the difference between the predicted mask character and the actual mask character is large, and the model parameters need to be adjusted to improve the accuracy of the model output.
Specifically, the first character is the mask character predicted by the Longformer submodel. To improve the accuracy of the output of the Longformer submodel, it may be pre-trained according to the difference between the first character and the actual mask character of the first training corpus, so as to minimize that difference.
In one possible implementation manner of the embodiment of the present application, the similarity between the first character and the actual mask character may be calculated based on a similarity calculation algorithm, and the difference between the first character and the actual mask character may be determined according to the similarity, where the similarity and the difference are in an inverse relationship. Then, the value of the loss function is determined according to the difference between the first character and the actual mask character, where the value of the loss function and the difference are in a positive relationship. Therefore, the model parameters of the Longformer submodel can be adjusted according to the value of the loss function, so that the value of the loss function is minimized and the prediction precision of the Longformer submodel is improved.
For example, suppose the training corpus is "Xiaoming eats moon cakes at the Mid-Autumn Festival" and the masked first training corpus is "Xiaoming eats moon cakes at the Mid-[mask] Festival". If the Longformer submodel predicts the first character to be "Yuan" according to the first semantic features of each character, while the actual mask character is "Autumn", the model parameters of the Longformer submodel can be adjusted according to the difference between the word vector corresponding to "Yuan" and the word vector corresponding to "Autumn", so as to minimize that difference.
Step 306, using a second training corpus labeled with the part-of-speech information of entity words as the input corpus, performing a second training on the pre-trained Longformer submodel and the CRF submodel in the entity word extraction model, so that the second-trained CRF submodel outputs the entity words according to the semantic features output by the second-trained Longformer submodel.
The execution of step 306 may refer to the execution of step 103 in the above embodiment, or refer to the execution of steps 203 to 205 in the above embodiment, which is not described herein.
According to the training method of the entity word extraction model of this embodiment, the mask characters are predicted according to the semantic features output by the output layer of the Longformer submodel, and the Longformer submodel is pre-trained according to the difference between the predicted mask characters and the actual mask characters, so that the accuracy of the output of the trained Longformer submodel can be improved.
In a possible implementation manner of the embodiment of the present application, in order to further improve accuracy and reliability of the entity word extracted by the entity word extraction model, the first training process may include not only the pre-training process in the foregoing embodiment, but also a fine-tuning (fine-tune) process. The fine tuning process described above will be described in detail with reference to the fourth embodiment.
Fig. 4 is a flowchart of a training method of an entity word extraction model according to a fourth embodiment of the present application.
As shown in fig. 4, the training method of the entity word extraction model may include the following steps:
Step 401, obtaining an entity word extraction model to be trained; the entity word extraction model comprises a long text coding Longformer submodel and a conditional random field CRF submodel.
Step 402, inputting the characters of the masked first training corpus into a feature extraction layer of the Longformer submodel for feature extraction, and inputting the extracted features of the characters into an attention layer of the Longformer submodel for local attention prediction, so as to obtain local attention weights of the characters.
Step 403, inputting the local attention weight of each character into the output layer of the Longformer submodel to obtain the first semantic feature of each character in the first training corpus determined by the output layer according to the local attention weight.
Step 404, predicting mask characters in the first training corpus according to the first semantic features of each character to obtain first characters.
Step 405, pre-training Longformer the submodel according to the difference between the first character and the actual mask character of the first training corpus.
The execution of steps 401 to 405 may be referred to the execution of the above embodiment, and will not be described herein.
Step 406, inputting the masked first training corpus into the feature extraction layer of the pre-trained Longformer submodel to perform feature extraction, and inputting the extracted features of each character into the attention layer of the pre-trained Longformer submodel to perform local attention prediction and global attention prediction, so as to obtain the local attention weight and the global attention weight of each character.
In the embodiment of the application, global attention is used to indicate that a given character may attend to all other characters and be attended to by them; that is, the global attention weight of each character in the present application is used to characterize the degree of association, or degree of attention, between the corresponding character and the characters at all other positions in the input corpus.
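Combining the two mechanisms, a boolean attention mask in which globally attended characters see everyone (and are seen by everyone) while the remaining characters attend only within the sliding window could be sketched as follows (a hypothetical illustration, not the patent's implementation):

```python
def attention_mask(seq_len, window, global_positions):
    """mask[i][j] is True if position i may attend to position j:
    either j lies inside i's sliding window (local attention), or
    position i or position j carries global attention."""
    half = window // 2
    g = set(global_positions)
    return [[abs(i - j) <= half or i in g or j in g
             for j in range(seq_len)]
            for i in range(seq_len)]
```

A globally attended position thus contributes a full row and a full column of True entries, while every other position contributes only a narrow band around the diagonal.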
In the embodiment of the application, in the fine tuning stage, the masked first training corpus is input into the feature extraction layer of the pre-trained Longformer submodel to perform feature extraction, the extracted features of each character are input into the attention layer of the pre-trained Longformer submodel to perform local attention prediction and global attention prediction, and the local attention weight and the global attention weight of each character are obtained by the attention layer prediction.
Step 407, inputting the local attention weight and the global attention weight of each character into the output layer of the pre-trained Longformer submodel to obtain a second semantic feature determined by the output layer according to the local attention weight and the global attention weight.
In the embodiment of the application, the local Attention weight and the global Attention weight of each character predicted by the Attention (Attention) layer of the Longformer submodel can be input into the output layer of the Longformer submodel, and the output layer predicts the second semantic features of each character in the first training corpus according to the local Attention weight and the global Attention weight of each character.
Step 408, predicting mask characters in the first training corpus according to the second semantic features to obtain second characters.
In the embodiment of the application, the Longformer submodel can predict the mask characters in the first training corpus according to the second semantic features of each character to obtain the second characters.
And 409, adjusting model parameters of the Longformer submodel according to the difference between the second character and the actual mask character.
It should be understood that the mask characters are predicted according to the semantic features (such as semantic vectors) output by the output layer of the Longformer submodel. If the prediction is correct, that is, the difference between the predicted mask character and the actual mask character is 0, the semantic features predicted by the Longformer submodel are correct; if the prediction is incorrect, the difference between the predicted mask character and the actual mask character is large, and the model parameters need to be adjusted to improve the accuracy of the model output.
Specifically, the second character is the mask character predicted by the Longformer submodel. To improve the accuracy of the output of the Longformer submodel, its model parameters may be adjusted according to the difference between the second character and the actual mask character of the first training corpus, so as to minimize that difference.
In one possible implementation manner of the embodiment of the present application, the similarity between the second character and the actual mask character may be calculated based on a similarity calculation algorithm, and the difference between the second character and the actual mask character may be determined according to the similarity, where the similarity and the difference are in an inverse relationship. Then, the value of the loss function is determined according to the difference between the second character and the actual mask character, where the value of the loss function and the difference are in a positive relationship. Therefore, the model parameters of the Longformer submodel can be adjusted according to the value of the loss function, so that the value of the loss function is minimized and the prediction precision of the Longformer submodel is improved.
Step 410, using the second training corpus labeled by the part-of-speech information of the entity word as the input corpus, and performing the second training on the Longformer submodel and the CRF submodel which are trained in the entity word extraction model, so that the CRF submodel which is trained in the second training outputs the entity word according to the semantic features output by the Longformer submodel which is trained in the second training.
The execution of step 410 may be referred to the execution of step 103 in the above embodiment, or the execution of steps 203 to 205 in the above embodiment, which is not described herein.
According to the training method of the entity word extraction model, a fine-tuning (fine-tune) process is further introduced on the basis of a pre-training process, so that accuracy of a Longformer sub-model prediction result can be further improved, and accuracy and reliability of entity words extracted by the entity word extraction model are improved.
It should be noted that the structure and parameters of a model determine the size and dimension of its inputs and outputs, and the dimensions of the input data and output data of different models may differ. Therefore, in one possible implementation manner of the embodiment of the present application, on the basis of any of the foregoing embodiments, in order to enable the CRF submodel, whose structure differs from that of the Longformer submodel, to extract entity words from the semantic features output by the Longformer submodel, the entity word extraction model may further include a fully connected layer (Fully Connected layer, abbreviated as FC) disposed between the Longformer submodel and the conditional random field CRF submodel, the fully connected layer performing dimension alignment between the output of the Longformer submodel and the input of the CRF submodel.
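A minimal sketch of the dimension alignment performed by the FC layer, projecting each character's semantic feature vector from the Longformer output dimension to the CRF input dimension (the weight and bias values used in any example are purely illustrative):

```python
def fully_connected(features, weights, bias):
    """Dimension alignment: map each character's d_in-dimensional
    semantic feature to the CRF submodel's d_out-dimensional input,
    out[j] = sum_k feature[k] * weights[k][j] + bias[j]."""
    d_out = len(bias)
    return [[sum(f * weights[k][j] for k, f in enumerate(feat)) + bias[j]
             for j in range(d_out)]
            for feat in features]
```

Here `weights` has shape d_in × d_out, so a sequence of d_in-dimensional features comes out as a sequence of d_out-dimensional vectors ready for the CRF submodel.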
As an example, the training process of the entity word extraction model may be as shown in fig. 5; it mainly includes building the entity word extraction model, pre-training and fine-tuning the Longformer submodel, and training the Longformer submodel + CRF submodel, after which the trained Longformer submodel + CRF submodel is used to perform entity word extraction.
The structure of the entity word extraction model may be Longformer + FC + CRF. The Longformer model is a pre-training model open-sourced by AllenAI, which uses the Transformer architecture to perform unsupervised pre-training on a large scale of unlabeled corpus. The Longformer model improves the traditional attention mechanism of the Transformer: for each character, local attention weights are calculated only for the nearby characters within a set sliding window, and a small number of global attention weights are calculated in combination with a specific task. This reduces the complexity of the attention mechanism from O(n²) to O(n); the model has strong generality, is easy to deploy, and can be used for various document-level tasks.
The FC layer can be implemented by a convolution operation: a fully connected layer whose preceding layer is also fully connected can be converted into a convolution with a 1x1 convolution kernel, while a fully connected layer whose preceding layer is a convolutional layer can be converted into a global convolution with an hxw convolution kernel, where h and w are respectively the height and width of the preceding layer's convolution result.
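The FC-as-convolution equivalence can be checked on a toy example: for a 1x1 spatial map, a fully connected layer and a 1x1 convolution with the same weights produce identical outputs (a hypothetical demonstration, not the patent's implementation):

```python
def fc_layer(x, w, b):
    """Fully connected layer on a flat feature vector."""
    return [sum(xk * w[k][j] for k, xk in enumerate(x)) + b[j]
            for j in range(len(b))]

def conv1x1(feature_map, w, b):
    """1x1 convolution over a (channels, height, width) feature map:
    at each spatial position, channels are mixed with the same weights."""
    c_in = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [[[sum(feature_map[k][y][x] * w[k][j] for k in range(c_in)) + b[j]
              for x in range(width)]
             for y in range(height)]
            for j in range(len(b))]
```

Applied position by position, the 1x1 convolution is exactly the fully connected projection, which is why the conversion described above preserves the layer's behavior.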
The CRF model, proposed by Lafferty et al. in 2001, is a discriminative probability model and a type of random field, commonly used for labeling or analyzing sequence data such as natural language text or biological sequences. The CRF model combines the characteristics of the maximum entropy model and the hidden Markov model; it is an undirected graph model, and in recent years has achieved good results in sequence labeling tasks such as word segmentation, part-of-speech labeling, and named entity recognition.
Pre-training the Longformer submodel: pre-training is performed with a masked language model (Masked Language Modeling, abbreviated as MLM) based on RoBERTa. During pre-training, each layer may use a fixed sliding window of size 512, with no global attention added for the time being. In addition, in order to extract entity words from long text, the position embeddings of the characters can be expanded to 4096 or more, so that in the use stage the Longformer submodel can extract semantic features for 4096 or more characters, the extracted semantic features fusing the semantics of the 4096 or more surrounding characters.
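One common way to expand a position-embedding table beyond the pretrained 512 positions is to tile the pretrained rows; the patent does not state its expansion method, so the sketch below is an assumed approach for illustration:

```python
def extend_position_embeddings(pos_emb, target_len):
    """Grow a pretrained position-embedding table (a list of vectors,
    e.g. 512 rows) to target_len rows by tiling the pretrained rows."""
    extended = []
    while len(extended) < target_len:
        extended.extend(pos_emb)
    return extended[:target_len]
```

Tiling reuses the learned local position patterns across the longer sequence, which in practice gives a much better starting point for fine-tuning than random initialization of the new positions.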
Fine-tuning the Longformer submodel: after pre-training is finished, the pre-trained Longformer submodel is further fine-tuned. During fine-tuning, global attention can be added according to the training task; that is, two sets of mapping matrices are provided in total, one for local self-attention and the other for global attention.
After the Longformer submodel training is completed, a service interface for a semantic vector output service is built: a text is input from the outside, and the service interface uses the Longformer submodel to output the semantic features of the input text.
Training the Longformer submodel + CRF submodel: part-of-speech information labeling is performed on the training corpus in the BIEO scheme (that is, the sequence labeling corpus in fig. 5). The semantic features of each character in the labeled training corpus output by the Longformer submodel are obtained through the built semantic vector output service, and the semantic features of each character are input into the CRF submodel to predict the part-of-speech information of each character. According to the differences between the part-of-speech information of each character output by the CRF submodel and the part-of-speech information of each character labeled in the training corpus, the model parameters of the CRF submodel and the first-trained Longformer submodel are adjusted so as to minimize the differences.
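A sketch of how entity words could be decoded from a BIEO tag sequence (B = first character of an entity word, I = middle character, E = last character, O = outside any entity word), using hypothetical characters and tags for illustration:

```python
def decode_entities(characters, tags):
    """Recover entity words from a BIEO tag sequence: B marks the first
    character of an entity word, I a middle character, E the last
    character, and O a character outside any entity word."""
    entities, current = [], []
    for ch, tag in zip(characters, tags):
        if tag == "B":
            current = [ch]
        elif tag == "I" and current:
            current.append(ch)
        elif tag == "E" and current:
            current.append(ch)
            entities.append("".join(current))
            current = []
        else:  # "O", or a malformed tag sequence: discard the fragment
            current = []
    return entities
```

In the prediction stage, the CRF submodel emits one such tag per character, so decoding the tag sequence yields the extracted entity words directly.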
After training is completed, the Longformer submodel and the CRF submodel are used to extract entity words from long text.
The inventors have tested and verified the entity word recognition model of the present application: it can expand the length of the input text from 512 characters to 8192 characters, strengthening interaction across the whole text. In addition, the inventors applied the entity word extraction model to an artificial intelligence (Artificial Intelligence, abbreviated as AI) product, resume analysis, and compared with prior-art models the accuracy of the entity word extraction result improved by five percentage points.
Corresponding to the training method of the entity word extraction model provided in the embodiments of fig. 1 to 4, the present disclosure further provides a training device of the entity word extraction model, and since the training device of the entity word extraction model provided in the embodiments of the present disclosure corresponds to the training method of the entity word extraction model provided in the embodiments of fig. 1 to 4, the implementation of the training method of the entity word extraction model is also applicable to the training device of the entity word extraction model provided in the embodiments of the present disclosure, which is not described in detail in the embodiments of the present disclosure.
Fig. 6 is a schematic structural diagram of a training device for entity word extraction model according to a fifth embodiment of the present application.
As shown in fig. 6, the training device 100 for entity word extraction model may include: an acquisition module 110, a first training module 120, and a second training module 130.
The acquiring module 110 is configured to acquire an entity word extraction model to be trained; the entity word extraction model comprises a long text coding Longformer submodel and a conditional random field CRF submodel.
The first training module 120 is configured to perform a first training on the Longformer submodel in the entity word extraction model by using the masked first training corpus as an input corpus, so that the first-trained Longformer submodel outputs semantic features.
The second training module 130 is configured to use the second training corpus labeled by the part-of-speech information of the entity word as an input corpus, perform a second training on the first trained Longformer submodel and the CRF submodel in the entity word extraction model, so that the CRF submodel after the second training outputs the entity word according to the semantic features output by the Longformer submodel after the second training.
Further, in one possible implementation manner of the embodiment of the present application, the second training module 130 is specifically configured to: input the second training corpus into the first-trained Longformer submodel to obtain the semantic features of each character in the second training corpus; input the semantic features of each character of the second training corpus into the CRF submodel to obtain the part-of-speech information of each character output by the CRF submodel, the part-of-speech information indicating whether the corresponding character belongs to an entity word; and adjust the model parameters of the CRF submodel and the first-trained Longformer submodel according to the differences between the part-of-speech information of each character output by the CRF submodel and the part-of-speech information of each character labeled in the second training corpus.
Further, in one possible implementation manner of the embodiment of the present application, the part-of-speech information includes one or more combinations of an entity word start identifier, an entity word end identifier, an entity word middle character identifier and a non-entity word identifier; the entity word starting identifier is used for indicating that the corresponding character belongs to the first character of the entity word; the entity word ending mark is used for indicating that the corresponding character belongs to the last character of the entity word; the entity word middle character identifier is used for indicating that the corresponding character belongs to the middle character of the entity word; and the non-entity word identifier is used for indicating that the corresponding character does not belong to the entity word.
Further, in one possible implementation manner of the embodiment of the present application, the first training module 120 is specifically configured to: inputting each character of the first training corpus subjected to mask into a feature extraction layer of a Longformer submodel for feature extraction, and inputting the extracted features of each character into an attention layer of a Longformer submodel for local attention prediction to obtain local attention weights of each character; inputting the local attention weight of each character into an output layer of the Longformer submodel to obtain first semantic features of each character in a first training corpus determined by the output layer according to the local attention weight; according to the first semantic features of each character, predicting mask characters in the first training corpus to obtain first characters; the Longformer submodel is pre-trained based on differences between the first character and the actual mask character of the first training corpus.
Further, in one possible implementation of the embodiment of the present application, the first training module 120 is further configured to: inputting the masked first training corpus into a feature extraction layer of the pre-trained Longformer submodel for feature extraction, and inputting the extracted features of each character into an attention layer of the pre-trained Longformer submodel for local attention prediction and global attention prediction to obtain local attention weights and global attention weights of each character; inputting the local attention weight and the global attention weight of each character into an output layer of the pre-trained Longformer submodel to obtain a second semantic feature determined by the output layer according to the local attention weight and the global attention weight; according to the second semantic features, predicting mask characters in the first training corpus to obtain second characters; and according to the difference between the second character and the actual mask character, performing model parameter adjustment on the Longformer submodel.
Further, in one possible implementation manner of the embodiment of the present application, the local attention weight of each character is used to represent the association degree of the corresponding character with other characters in the adjacent position in the input corpus, where the number of characters spaced between the corresponding character and the other characters in the adjacent position is smaller than a threshold value; wherein the threshold is determined according to the size of the set sliding window.
Further, in a possible implementation manner of the embodiment of the present application, the global attention weight of each character is used to characterize the association degree of the corresponding character with each character at other positions in the input corpus.
Further, in a possible implementation manner of the embodiment of the present application, the feature extraction layer of the Longformer submodel adopts a self-residual network.
Further, in a possible implementation manner of the embodiment of the present application, the entity word extraction model further includes a full connection layer disposed between the Longformer submodel and the conditional random field CRF submodel; and the full connection layer is used for carrying out dimension alignment on the output of the Longformer submodel and the input of the CRF submodel.
According to the training device for the entity word extraction model, the entity word extraction model to be trained is acquired, the entity word extraction model comprising a long text coding Longformer submodel and a conditional random field CRF submodel; the masked first training corpus is used as an input corpus to perform a first training on the Longformer submodel in the entity word extraction model, so that the first-trained Longformer submodel outputs semantic features; and a second training corpus labeled with the part-of-speech information of entity words is used as an input corpus to perform a second training on the first-trained Longformer submodel and the CRF submodel in the entity word extraction model, so that the second-trained CRF submodel outputs the entity words according to the semantic features output by the second-trained Longformer submodel. The Longformer and CRF models are thereby combined into an entity word extraction model for extracting entity words from long text, without the long text having to be truncated or segmented before entity word recognition, so information loss can be avoided and the reliability of the entity word extraction result improved.
In order to implement the above embodiments, the present application further proposes a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the training method of the entity word extraction model proposed in any one of the foregoing embodiments of the present application is implemented when the processor executes the program.
In order to implement the above embodiments, the present application further proposes a non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements a training method for an entity word extraction model as proposed in any of the foregoing embodiments of the present application.
To achieve the above embodiments, the present application further proposes a computer program product, which when executed by a processor, performs a training method of an entity word extraction model as proposed in any of the foregoing embodiments of the present application.
FIG. 7 illustrates a block diagram of an exemplary computer device suitable for use in implementing embodiments of the present application. The computer device 12 shown in fig. 7 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the application.
As shown in fig. 7, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard disk drive"). Although not shown in FIG. 7, a magnetic disk drive for reading from and writing to a removable nonvolatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable nonvolatile optical disk (e.g., a compact disc read-only memory (CD-ROM), a digital versatile disc read-only memory (DVD-ROM), or other optical media) may be provided. In such cases, each drive may be connected to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the application.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, the computer device 12 may communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, via the network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the methods mentioned in the foregoing embodiments.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. Alternative implementations are included within the scope of the preferred embodiments of the present application, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in the reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, for example, an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, the steps may be implemented using any one of, or a combination of, the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (21)

1. A training method for an entity word extraction model, characterized by comprising the following steps:
Acquiring an entity word extraction model to be trained; the entity word extraction model comprises a long text coding Longformer submodel and a conditional random field CRF submodel;
using the masked first training corpus as an input corpus, and performing first training on Longformer submodels in the entity word extraction model to enable the Longformer submodels subjected to the first training to output semantic features;
And using a second training corpus marked by part-of-speech information of the entity words as an input corpus, and performing second training on the first trained Longformer submodels and the CRF submodels in the entity word extraction model so that the CRF submodels subjected to the second training output the entity words according to semantic features output by the Longformer submodels subjected to the second training.
2. The training method according to claim 1, wherein the second training of the first trained Longformer submodel and the CRF submodel in the entity word extraction model using the second training corpus labeled with entity words as an input corpus comprises:
inputting the second training corpus into the Longformer submodels subjected to the first training to obtain semantic features of each character in the second training corpus;
inputting semantic features of each character of the second training corpus into the CRF sub-model to obtain part-of-speech information of each character output by the CRF sub-model; the part-of-speech information is used for indicating whether the corresponding character belongs to an entity word;
And adjusting model parameters of the CRF sub-model and the first trained Longformer sub-model according to the part-of-speech information of each character output by the CRF sub-model and the difference between the part-of-speech information of each character marked in the second training corpus.
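Claim 2's CRF submodel maps per-character semantic features to a best-scoring sequence of part-of-speech tags; the standard inference procedure for a linear-chain CRF is Viterbi decoding. A hedged pure-Python sketch (the tag set, emission scores, and transition scores below are toy values for illustration, not parameters from this document):

```python
def viterbi(emissions, transitions, tags):
    """Best-scoring tag sequence under a linear-chain CRF.
    emissions: list of {tag: score} dicts, one per character;
    transitions: {(prev_tag, cur_tag): score}."""
    # initialize with the first character's emission scores
    best = {t: (emissions[0][t], [t]) for t in tags}
    for em in emissions[1:]:
        new = {}
        for cur in tags:
            # pick the predecessor tag maximizing score + transition
            prev, (s, path) = max(
                ((p, best[p]) for p in tags),
                key=lambda kv: kv[1][0] + transitions[(kv[0], cur)],
            )
            new[cur] = (s + transitions[(prev, cur)] + em[cur], path + [cur])
        best = new
    return max(best.values(), key=lambda v: v[0])[1]

tags = ["B", "I", "O"]
trans = {(p, c): 0.0 for p in tags for c in tags}
trans[("O", "I")] = -10.0  # an entity middle cannot follow a non-entity tag
trans[("B", "I")] = 1.0    # reward continuing an entity
emissions = [
    {"B": 2.0, "I": 0.0, "O": 0.0},
    {"B": 0.0, "I": 1.5, "O": 1.0},
    {"B": 0.0, "I": 0.0, "O": 2.0},
]
assert viterbi(emissions, trans, tags) == ["B", "I", "O"]
```

During second training, the difference between the decoded tags and the annotated part-of-speech tags drives the parameter adjustment of both the CRF submodel and the first-trained Longformer submodel.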
3. Training method according to claim 1 or 2, characterized in that the part-of-speech information comprises one or more combinations of an entity word start identifier, an entity word end identifier, an entity word intermediate character identifier and a non-entity word identifier;
the entity word starting identifier is used for indicating that the corresponding character belongs to the first character of the entity word;
the entity word ending mark is used for indicating that the corresponding character belongs to the last character of the entity word;
The entity word middle character identifier is used for indicating that the corresponding character belongs to the middle character of the entity word;
and the non-entity word identifier is used for indicating that the corresponding character does not belong to the entity word.
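The four identifiers in claim 3 correspond to a per-character B/I/E/O-style tagging scheme; entity words are recovered by collecting the characters between a start tag and an end tag. A minimal sketch (the symbols B, I, E, O are assumed shorthand for the start, middle, end, and non-entity identifiers):

```python
def tags_to_entities(chars, tags):
    """Recover entity words from per-character tags.
    B = entity start, I = entity middle, E = entity end, O = non-entity."""
    entities, buf = [], []
    for ch, tag in zip(chars, tags):
        if tag == "B":
            buf = [ch]                     # open a new entity
        elif tag == "I" and buf:
            buf.append(ch)                 # extend the open entity
        elif tag == "E" and buf:
            buf.append(ch)
            entities.append("".join(buf))  # close and emit the entity word
            buf = []
        else:                              # O, or a malformed tag sequence
            buf = []
    return entities

chars = list("Longformer and CRF")
tags = ["B", "I", "I", "I", "I", "I", "I", "I", "I", "E",
        "O", "O", "O", "O", "O",
        "B", "I", "E"]
assert tags_to_entities(chars, tags) == ["Longformer", "CRF"]
```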
4. The training method of claim 1, wherein the first training of the Longformer sub-models in the entity word extraction model using the masked first training corpus as the input corpus comprises:
inputting each character of the masked first training corpus into a feature extraction layer of the Longformer submodel for feature extraction, and inputting the extracted features of each character into an attention layer of the Longformer submodel for local attention prediction to obtain local attention weights of each character;
inputting the local attention weight of each character into an output layer of the Longformer submodel to obtain a first semantic feature of each character in the first training corpus determined by the output layer according to the local attention weight;
predicting mask characters in the first training corpus according to the first semantic features of each character to obtain first characters;
and pre-training the Longformer submodel according to the difference between the first character and the actual mask character of the first training corpus.
5. The training method of claim 4, wherein the pre-training the Longformer sub-model based on the difference between the first character and the actual mask character of the first training corpus further comprises:
Inputting the masked first training corpus into a feature extraction layer of the pre-trained Longformer submodel for feature extraction, and inputting the extracted features of each character into an attention layer of the pre-trained Longformer submodel for local attention prediction and global attention prediction to obtain local attention weights and global attention weights of each character;
Inputting the local attention weight and the global attention weight of each character into an output layer of the pre-trained Longformer submodel to obtain a second semantic feature determined by the output layer according to the local attention weight and the global attention weight;
predicting mask characters in the first training corpus according to the second semantic features to obtain second characters;
And according to the difference between the second character and the actual mask character, performing model parameter adjustment on the Longformer submodel.
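Claims 4 to 7 describe sliding-window local attention (each character attends only to neighbours fewer than a window-derived threshold away) combined with global attention for selected positions. A minimal sketch of the attention mask such a scheme implies (the window size and the choice of global position are illustrative assumptions):

```python
def attention_mask(n, window, global_positions=()):
    """mask[i][j] is True when position i may attend to position j.
    Local: |i - j| <= window, i.e. the number of characters between the
    two positions is below a threshold set by the sliding-window size.
    Global: a global position attends to, and is attended by, all others."""
    g = set(global_positions)
    return [[abs(i - j) <= window or i in g or j in g
             for j in range(n)] for i in range(n)]

m = attention_mask(6, window=1, global_positions=[0])
assert m[3][2] and m[3][4]  # neighbours inside the window: visible
assert not m[3][5]          # two positions apart, neither global: blocked
assert m[3][0] and m[0][5]  # position 0 is global: sees and is seen by all
```

Because each non-global row has only O(window) admissible positions instead of O(n), this is what lets the Longformer submodel encode long texts without segmenting them first.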
6. The training method of claim 4 or 5, wherein,
The local attention weight of each character is used for representing the association degree of the corresponding character and other characters at the adjacent position in the input corpus, wherein the number of the characters at intervals between the corresponding character and the other characters at the adjacent position is smaller than a threshold value;
Wherein the threshold is determined according to the size of the set sliding window.
7. The training method of claim 5, wherein,
The global attention weight of each character is used for representing the association degree of the corresponding character and each character at other positions in the input corpus.
8. The training method of claim 4, wherein,
And the feature extraction layer of the Longformer submodel adopts a self-residual network.
9. The training method of claim 1, wherein the entity word extraction model further comprises a full connection layer disposed between the Longformer submodel and the conditional random field CRF submodel;
The full connection layer is used for carrying out dimension alignment on the output of the Longformer submodel and the input of the CRF submodel.
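The fully-connected layer of claim 9 projects the Longformer output feature dimension onto the dimension the CRF submodel expects (typically the tag-space size). A pure-Python sketch of such a dimension-alignment step (the 4-to-2 dimensions and the weight values are assumed for illustration):

```python
def linear(x, weight, bias):
    """Fully-connected layer: project feature vector x (length d_in)
    to d_out = len(bias) dimensions, aligning the Longformer submodel's
    output size with the CRF submodel's expected input size."""
    return [sum(xi * wij for xi, wij in zip(x, row)) + b
            for row, b in zip(weight, bias)]

feat = [1.0, 0.0, 2.0, -1.0]       # assumed 4-dim semantic feature
W = [[0.5, 0.0, 0.25, 0.0],        # one weight row per output dimension
     [0.0, 1.0, 0.0, 1.0]]
b = [0.1, -0.1]
out = linear(feat, W, b)           # 2 per-tag scores for the CRF
assert len(out) == 2
assert abs(out[0] - 1.1) < 1e-9    # 0.5*1 + 0.25*2 + 0.1
assert abs(out[1] + 1.1) < 1e-9    # 1.0*(-1) - 0.1
```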
10. A training device for an entity word extraction model, comprising:
The acquisition module is used for acquiring an entity word extraction model to be trained; the entity word extraction model comprises a long text coding Longformer submodel and a conditional random field CRF submodel;
The first training module is used for carrying out first training on Longformer submodels in the entity word extraction model by taking the masked first training corpus as an input corpus so as to enable the Longformer submodels subjected to the first training to output semantic features;
The second training module is used for taking second training corpus marked by part-of-speech information of the entity words as input corpus, carrying out second training on the first trained Longformer submodels and the CRF submodels in the entity word extraction model, so that the second trained CRF submodels output the entity words according to semantic features output by the second trained Longformer submodels.
11. Training device according to claim 10, characterized in that the second training module is specifically configured to:
inputting the second training corpus into the Longformer submodels subjected to the first training to obtain semantic features of each character in the second training corpus;
inputting semantic features of each character of the second training corpus into the CRF sub-model to obtain part-of-speech information of each character output by the CRF sub-model; the part-of-speech information is used for indicating whether the corresponding character belongs to an entity word;
And adjusting model parameters of the CRF sub-model and the first trained Longformer sub-model according to the part-of-speech information of each character output by the CRF sub-model and the difference between the part-of-speech information of each character marked in the second training corpus.
12. The training device of claim 10 or 11, wherein the part-of-speech information comprises one or more combinations of an entity word start identifier, an entity word end identifier, an entity word intermediate character identifier, and a non-entity word identifier;
the entity word starting identifier is used for indicating that the corresponding character belongs to the first character of the entity word;
the entity word ending mark is used for indicating that the corresponding character belongs to the last character of the entity word;
The entity word middle character identifier is used for indicating that the corresponding character belongs to the middle character of the entity word;
and the non-entity word identifier is used for indicating that the corresponding character does not belong to the entity word.
13. The training device of claim 10, wherein the first training module is specifically configured to:
inputting each character of the masked first training corpus into a feature extraction layer of the Longformer submodel for feature extraction, and inputting the extracted features of each character into an attention layer of the Longformer submodel for local attention prediction to obtain local attention weights of each character;
inputting the local attention weight of each character into an output layer of the Longformer submodel to obtain a first semantic feature of each character in the first training corpus determined by the output layer according to the local attention weight;
predicting mask characters in the first training corpus according to the first semantic features of each character to obtain first characters;
and pre-training the Longformer submodel according to the difference between the first character and the actual mask character of the first training corpus.
14. The training device of claim 13, wherein the first training module is further configured to:
Inputting the masked first training corpus into a feature extraction layer of the pre-trained Longformer submodel for feature extraction, and inputting the extracted features of each character into an attention layer of the pre-trained Longformer submodel for local attention prediction and global attention prediction to obtain local attention weights and global attention weights of each character;
Inputting the local attention weight and the global attention weight of each character into an output layer of the pre-trained Longformer submodel to obtain a second semantic feature determined by the output layer according to the local attention weight and the global attention weight;
predicting mask characters in the first training corpus according to the second semantic features to obtain second characters;
And according to the difference between the second character and the actual mask character, performing model parameter adjustment on the Longformer submodel.
15. Training device according to claim 13 or 14, characterized in that,
The local attention weight of each character is used for representing the association degree of the corresponding character and other characters at the adjacent position in the input corpus, wherein the number of the characters at intervals between the corresponding character and the other characters at the adjacent position is smaller than a threshold value;
Wherein the threshold is determined according to the size of the set sliding window.
16. The training device of claim 14, wherein,
The global attention weight of each character is used for representing the association degree of the corresponding character and each character at other positions in the input corpus.
17. The training device of claim 13, wherein,
And the feature extraction layer of the Longformer submodel adopts a self-residual network.
18. The training device of claim 10, wherein the entity word extraction model further comprises a full connection layer disposed between the Longformer submodel and a conditional random field CRF submodel;
The full connection layer is used for carrying out dimension alignment on the output of the Longformer submodel and the input of the CRF submodel.
19. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a training method for the entity word extraction model of any one of claims 1-9 when the program is executed.
20. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements a training method of an entity word extraction model according to any of claims 1-9.
21. A computer program product, characterized in that the training method of the entity word extraction model according to any of claims 1-9 is performed when instructions in the computer program product are executed by a processor.
CN202110236142.8A 2021-03-03 2021-03-03 Training method, training device, training equipment and training storage medium for entity word extraction model Active CN113807095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110236142.8A CN113807095B (en) 2021-03-03 2021-03-03 Training method, training device, training equipment and training storage medium for entity word extraction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110236142.8A CN113807095B (en) 2021-03-03 2021-03-03 Training method, training device, training equipment and training storage medium for entity word extraction model

Publications (2)

Publication Number Publication Date
CN113807095A CN113807095A (en) 2021-12-17
CN113807095B true CN113807095B (en) 2024-05-17

Family

ID=78892884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110236142.8A Active CN113807095B (en) 2021-03-03 2021-03-03 Training method, training device, training equipment and training storage medium for entity word extraction model

Country Status (1)

Country Link
CN (1) CN113807095B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271631A (en) * 2018-09-12 2019-01-25 广州多益网络股份有限公司 Segmenting method, device, equipment and storage medium
CN110598213A (en) * 2019-09-06 2019-12-20 腾讯科技(深圳)有限公司 Keyword extraction method, device, equipment and storage medium
WO2020193966A1 (en) * 2019-03-26 2020-10-01 Benevolentai Technology Limited Name entity recognition with deep learning
CN111832292A (en) * 2020-06-03 2020-10-27 北京百度网讯科技有限公司 Text recognition processing method and device, electronic equipment and storage medium
CN112002411A (en) * 2020-08-20 2020-11-27 杭州电子科技大学 Cardiovascular and cerebrovascular disease knowledge map question-answering method based on electronic medical record
EP3767516A1 (en) * 2019-07-18 2021-01-20 Ricoh Company, Ltd. Named entity recognition method, apparatus, and computer-readable recording medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10699077B2 (en) * 2017-01-13 2020-06-30 Oath Inc. Scalable multilingual named-entity recognition

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271631A (en) * 2018-09-12 2019-01-25 广州多益网络股份有限公司 Segmenting method, device, equipment and storage medium
WO2020193966A1 (en) * 2019-03-26 2020-10-01 Benevolentai Technology Limited Name entity recognition with deep learning
EP3767516A1 (en) * 2019-07-18 2021-01-20 Ricoh Company, Ltd. Named entity recognition method, apparatus, and computer-readable recording medium
CN110598213A (en) * 2019-09-06 2019-12-20 腾讯科技(深圳)有限公司 Keyword extraction method, device, equipment and storage medium
CN111832292A (en) * 2020-06-03 2020-10-27 北京百度网讯科技有限公司 Text recognition processing method and device, electronic equipment and storage medium
CN112002411A (en) * 2020-08-20 2020-11-27 杭州电子科技大学 Cardiovascular and cerebrovascular disease knowledge map question-answering method based on electronic medical record

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Longformer: The Long-Document Transformer; Iz Beltagy; Computation and Language; full text *
Chinese electronic medical record entity recognition integrating a BiLSTM-CRF network and dictionary resources; Li Gang; Pan Rongqing; Mao Jin; Cao Yujie; Modern Information (Issue 04); full text *

Also Published As

Publication number Publication date
CN113807095A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN108280061B (en) Text processing method and device based on ambiguous entity words
CN110196894B (en) Language model training method and language model prediction method
CN107729313B (en) Deep neural network-based polyphone pronunciation distinguishing method and device
CN109670029B (en) Method, apparatus, computer device and storage medium for determining answers to questions
CN107767870B (en) Punctuation mark adding method and device and computer equipment
CN110046350B (en) Grammar error recognition method, device, computer equipment and storage medium
CN109887497B (en) Modeling method, device and equipment for speech recognition
CN109344413B (en) Translation processing method, translation processing device, computer equipment and computer readable storage medium
CN110033760B (en) Modeling method, device and equipment for speech recognition
CN107330023B (en) Text content recommendation method and device based on attention points
CN106557563B (en) Query statement recommendation method and device based on artificial intelligence
CN110442878B (en) Translation method, training method and device of machine translation model and storage medium
US20150095017A1 (en) System and method for learning word embeddings using neural language models
CN111858859A (en) Automatic question-answering processing method, device, computer equipment and storage medium
CN109710759B (en) Text segmentation method and device, computer equipment and readable storage medium
CN107679032A (en) Voice changes error correction method and device
CN111508480B (en) Training method of audio recognition model, audio recognition method, device and equipment
US20230325611A1 (en) Video translation platform
CN112614559A (en) Medical record text processing method and device, computer equipment and storage medium
CN115312033A (en) Speech emotion recognition method, device, equipment and medium based on artificial intelligence
CN115457982A (en) Pre-training optimization method, device, equipment and medium of emotion prediction model
CN111738009B (en) Entity word label generation method, entity word label generation device, computer equipment and readable storage medium
CN111666405B (en) Method and device for identifying text implication relationship
CN112349294A (en) Voice processing method and device, computer readable medium and electronic equipment
CN113807095B (en) Training method, training device, training equipment and training storage medium for entity word extraction model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant