CN113761924A - Training method, device, equipment and storage medium of named entity model - Google Patents

Training method, device, equipment and storage medium of named entity model

Info

Publication number
CN113761924A
Authority
CN
China
Prior art keywords
model
task
data set
named entity
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110420593.7A
Other languages
Chinese (zh)
Inventor
张颖
孟凡东
陈钰枫
徐金安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110420593.7A priority Critical patent/CN113761924A/en
Publication of CN113761924A publication Critical patent/CN113761924A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 - Named entity recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a training method, apparatus, device and storage medium for a named entity recognition model. A first model is trained according to an unlabeled first named entity recognition data set in target data and an unlabeled second named entity recognition data set in source data. The model parameters of the trained first model are taken as the initial model parameters of a second model, and the second model is trained through a labeled first task data set, where the first task data set is a data set corresponding to a first task and the first task is a task other than the named entity recognition task. The model parameters of the trained second model are then taken as the initial model parameters of a third model, and the third model is fine-tuned through a labeled third named entity recognition data set in the source data to obtain the named entity recognition model. The method can migrate related knowledge from richer data, improves the named entity recognition effect, and is practical and extensible.

Description

Training method, device, equipment and storage medium of named entity model
Technical Field
The present application relates to the field of artificial intelligence natural language processing, and in particular, to a method, an apparatus, a device, and a storage medium for training a named entity recognition model.
Background
Artificial intelligence is a comprehensive discipline, and Natural Language Processing (NLP) is a major direction of artificial intelligence research. Named Entity Recognition (NER) is an important basic tool in application fields such as information extraction, question-answering systems, syntactic analysis and machine translation, and plays an important role in putting natural language processing technology into practical use.
In the related art, a neural network model is mainly trained on a large-scale, manually labeled data set, and the named entities in text data are identified through the neural network model.
This approach requires a large amount of annotated data; however, in many languages and domains in the real world, annotated data is usually scarce, which gives rise to the problem of low-resource or even zero-resource named entity recognition. For low-resource and even zero-resource languages or domains, how to train a named entity model that accurately identifies named entities is an urgent technical problem to be solved.
Disclosure of Invention
In order to solve the above technical problem, the application provides a training method, apparatus, device and storage medium for a named entity recognition model, which can use richer data sets for transfer learning, show a better recognition effect on multiple data sets in cross-language, cross-domain and other scenarios, further improve the named entity recognition effect through a target-oriented training strategy, and remain practical and extensible.
The embodiment of the application discloses the following technical scheme:
in a first aspect, an embodiment of the present application provides a method for training a named entity recognition model, where the method includes:
training a first model according to a first named entity recognition data set without labels in target data and a second named entity recognition data set without labels in source data;
acquiring a labeled first task data set, wherein the first task data set is a data set corresponding to a first task, and the first task is a task other than the named entity recognition task;
taking the model parameters of the first model obtained by training as the initial model parameters of a second model, and training the second model through the first task data set;
and taking the model parameters of the trained second model as initial model parameters of a third model, and fine-tuning the third model through a labeled third named entity recognition data set in the source data to obtain a named entity recognition model.
In a second aspect, an embodiment of the present application provides a training apparatus for a named entity recognition model, where the apparatus includes a first training unit, an obtaining unit, a second training unit, and a third training unit:
the first training unit is used for training the first model according to the unlabeled first named entity recognition data set in the target data and the unlabeled second named entity recognition data set in the source data;
the acquiring unit is used for acquiring a labeled first task data set, wherein the first task data set is a data set corresponding to a first task, and the first task is a task other than the named entity recognition task;
the second training unit is used for taking the model parameters of the trained first model as the initial model parameters of the second model and training the second model through the first task data set;
and the third training unit is used for taking the model parameters of the trained second model as the initial model parameters of a third model, and fine-tuning the third model through a labeled third named entity recognition data set in the source data to obtain a named entity recognition model.
In a third aspect, an embodiment of the present application provides a training device for a named entity recognition model, where the device includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of the first aspect according to instructions in the program code.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium for storing program code for executing the method of the first aspect.
According to the above technical solution, in order to train a more accurate and effective named entity recognition model in low-resource and even zero-resource scenarios and thus solve the problem of low-resource and zero-resource named entity recognition, a labeled third named entity recognition data set in the source data (i.e., a source domain or source language) is used when training the named entity recognition model. In order to reduce the difference between different domains (or languages) and realize a target-oriented training strategy, a first model can first be trained according to an unlabeled first named entity recognition data set in the target data and an unlabeled second named entity recognition data set in the source data, thereby shortening the distance between the source domain (language) and the target domain (language). A labeled first task data set is then acquired, where the first task data set is a data set corresponding to a first task and the first task is a task other than the named entity recognition task; the model parameters of the trained first model are used as the initial model parameters of a second model, and the second model is trained through the first task data set, so that transfer learning is performed in combination with the data sets of other tasks and related knowledge is migrated from richer data. Finally, the model parameters of the trained second model are used as the initial model parameters of a third model, and the third model is fine-tuned through the labeled third named entity recognition data set in the source data to obtain the named entity recognition model. The method uses the first named entity recognition data set, the second named entity recognition data set and the labeled data sets of other tasks together for transfer learning, so richer data sets can be used and a better recognition effect is shown on multiple data sets in cross-language, cross-domain and other scenarios; at the same time, the target-oriented training strategy further improves the named entity recognition effect, and the method is practical and extensible.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and for a person of ordinary skill in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is an architectural diagram illustrating knowledge migration from tasks, languages, and domains according to an embodiment of the present disclosure;
fig. 2 is a schematic system architecture diagram of a training method for a named entity recognition model according to an embodiment of the present disclosure;
fig. 3 is a flowchart of a training method for a named entity recognition model according to an embodiment of the present disclosure;
FIG. 4 is a diagram of a target-oriented migration learning framework provided by an embodiment of the present application;
FIG. 5 is a diagram illustrating a training method of a named entity recognition model according to the related art;
FIG. 6 is a flowchart of a training method for a named entity recognition model according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a training apparatus for a named entity recognition model according to an embodiment of the present disclosure;
fig. 8 is a structural diagram of a terminal device according to an embodiment of the present application;
fig. 9 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
The related art mainly trains a neural network model on a large-scale, manually labeled data set and identifies the named entities in text data through the neural network model. This method needs a large amount of labeled data; however, in many languages and fields in the real world, labeled data is generally scarce. For low-resource and even zero-resource languages or fields, how to train a named entity model that accurately identifies named entities is an urgent technical problem to be solved.
In order to solve this technical problem, a named entity recognition model is usually trained by means of transfer learning in low-resource or even zero-resource scenarios. The present application analyzes the key to the problem of low-resource and even zero-resource Named Entity Recognition (NER), namely the lack of target training data that simultaneously meets the following two conditions: 1) it comes from the target domain (or language); 2) it carries annotation information for the target task (the NER task). Although such target training data is difficult to acquire, data satisfying only one of the two conditions is easy to acquire. Therefore, the present application considers knowledge migration from the three aspects of task, language and domain. As shown in fig. 1, taking the case where the target training data to be acquired is in a target language such as Spanish and the target task is named entity recognition, knowledge can be migrated from these three aspects. In fig. 1, "sao paulo" is Spanish, and "< sao paulo, place name >, < Brasil, place name >" represents the label data of the named entity recognition task.
A task is what is to be done and may include, for example, machine reading comprehension (MRC) and named entity recognition; in fig. 1, "< place name >" represents annotation data of the named entity recognition task, and "what has the most to lose? The | WNJ has the most to lose" represents text data of the machine reading comprehension task. The languages may include, for example, Spanish, English and German; in fig. 1, "Como se contact public spoken en Brasil patent" is Spanish text data and "Brazil to use hovercraft for Amazon travel" is English text data. The domains may include, for example, medical, biological, news and Twitter; in fig. 1, "Santander, 23 may" is text data of the news domain, and "RT @ Gabrile _ Corno: Beach by Josh Adamski # media # administration # CGE http:// t.co/ParMW4CG 4X" is text data of the Twitter domain.
Specifically, the embodiment of the present application provides a training method for a named entity recognition model. The method uses an unlabeled first named entity recognition data set in target data (i.e., a target domain or target language), an unlabeled second named entity recognition data set in source data (i.e., a source domain or source language), and labeled data sets of other tasks together for transfer learning. It can therefore use richer data sets for transfer learning and shows a better recognition effect on multiple data sets in scenarios such as cross-language and cross-domain; meanwhile, a target-oriented training strategy further improves the named entity recognition effect, and the method is practical and extensible.
It should be noted that the method provided by the embodiment of the present application may be applied to the application fields of information extraction, question-answering system, syntax analysis, machine translation, and the like.
The method provided by the embodiments of the present application may relate to Artificial Intelligence (AI). Artificial intelligence is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive discipline and involves a wide range of fields, covering both hardware-level and software-level technologies. The basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing and machine learning/deep learning.
For example, Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Research in this field therefore involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs and the like.
For another example, Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and teaching learning. In this application, the named entity recognition model is trained through transfer learning.
For convenience of understanding, the method for training the named entity recognition model provided in the embodiment of the present application may be described with reference to fig. 2, and the method may be executed by a data processing device, where the data processing device may be a terminal or a server. The terminal can be a computer, a tablet computer and the like, and the server can be an independent server or a cluster server.
Fig. 2 is a system architecture diagram of a training method for a named entity recognition model according to an embodiment of the present disclosure. The system architecture, for example, in which the data processing apparatus is a server, includes a server 201.
Low-resource or zero-resource named entity recognition means recognizing the positions where named entities such as person names, place names and organization names appear, and their entity types, in the absence of named entity annotation data for the target domain (or language). The data corresponding to the target domain (or language) may constitute the target data, and the data corresponding to domains (or languages) other than the target domain (or language) may constitute the source data.
In low-resource and even zero-resource scenarios, the named entity recognition model is usually trained by means of transfer learning, which may be to apply knowledge learned in a certain domain (or language) or task to a different related domain (or language) or task. Therefore, in the present application, since the target domain (or language) lacks the annotation data, in order to train and obtain the named entity recognition model applied to the target domain (or language), the migration learning can be performed in combination with the data of other domains (or languages), i.e., the source domain (or language), and the data of other tasks.
Specifically, the server 201 may obtain a first named entity recognition data set without a label in the target data and a second named entity recognition data set without a label in the source data, and train the first model according to the first named entity recognition data set and the second named entity recognition data set, so as to shorten a distance between the source domain (or language) and the target domain (or language), and implement a target-oriented training strategy to a certain extent.
Then, the server 201 obtains a first task data set with labels, where the first task data set is a data set corresponding to a first task, and the first task is a task other than the named entity identifying task. And the model parameters of the first model obtained by training are used as the initial model parameters of the second model, and the second model is trained through the first task data set, so that the data sets of other tasks are combined for transfer learning, and the purpose of transferring relevant knowledge by using richer data is realized.
Then, the server 201 uses the trained model parameters of the second model as the initial model parameters of the third model, and fine-tunes the third model through a third named entity recognition data set with labels in the source data to obtain a named entity recognition model.
Next, the method for training the named entity model provided in the embodiment of the present application will be described in detail with a server as an execution subject.
Referring to fig. 3, fig. 3 shows a flow chart of a method of training a named entity recognition model, the method comprising:
s301, training the first model according to the first named entity recognition data set without labels in the target data and the second named entity recognition data set without labels in the source data.
In the embodiment of the application, knowledge migration from the three aspects of task, language and domain is considered. First, four feasible guiding principles for selecting data sets and migrating knowledge effectively are proposed, specifically: 1) available knowledge from the three perspectives of task, domain and language is essential; 2) for the migration of domain and language knowledge, consider narrowing the distance between the source domain (or language) and the target domain (or language); 3) for the migration of target task (NER) knowledge, consider narrowing the distance between the target task (NER) and other tasks (such as MRC), so that the annotation information of the other tasks can help the learning of the target task; 4) finally, fuse the knowledge obtained from the target domain (or language) and from the target task (NER), respectively, to further approach the target training data. Based on these four guiding principles, the embodiment of the present application designs a target-oriented transfer learning framework, shown in fig. 4, which includes two parts: 1) a knowledge migration module that migrates task, domain and language knowledge from different types of available data; 2) a fine-tuning training module that includes a complete data utilization method and various training strategies.
The fine tuning training module can be realized based on a neural network model, such as an AdaptaBERT model, and the whole model framework comprises three auxiliary training tasks: a first model, a second model, and a named entity recognition model.
Because the target domain (or language) lacks a labeled named entity recognition data set, knowledge can be migrated from other domains (or languages) through the knowledge migration module. Based on guiding principle 2) in the fine-tuning training module, in order to narrow the gap between the source domain (or language) and the target domain (or language), the server may obtain an unlabeled first named entity recognition data set in the target data (e.g., the data set identified by a) in fig. 4) and an unlabeled second named entity recognition data set in the source data (e.g., the data set identified by b) in fig. 4), and train the first model on the first named entity recognition data set and the second named entity recognition data set together, thereby reducing the distance between the two and ensuring that the named entity recognition model obtained by subsequent transfer learning is better suited to the target domain (or language). The first named entity recognition data set can be denoted Dt,no, and the second named entity recognition data set can be denoted Ds,no. The source domain (or language) may include a single domain (or language) or multiple domains (or languages), which is not limited in this embodiment.
For example, named entity recognition (the target task) may be required in Spanish (the target language), but Spanish named entity recognition annotation data is very scarce, so transfer learning can be performed through other languages to train a named entity recognition model that can be used for named entity recognition in Spanish. The other languages may be a single language, such as English, or multiple languages used simultaneously, such as English, French and German.
It should be noted that the first model may be a language model that can be trained on unlabeled data sets. In one possible implementation, the first model may be, for example, a Masked Language Model (MLM), denoted f(·; θmlm). The MLM helps the named entity recognition model learn context-dependent word representations by reconstructing randomly masked words. Based on guiding principle 2), the MLM performs model training on the mixed data sets identified by a) and b) in fig. 4.
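As an illustration of this step, the following is a minimal sketch of MLM training on the mixed unlabeled corpora, assuming a PyTorch / Hugging Face transformers environment and a multilingual BERT checkpoint; the sentence lists and hyperparameters are illustrative and are not taken from the patent.

# Sketch of step S301: masked language model (MLM) training on the mixed,
# unlabeled NER corpora Dt,no (target) and Ds,no (source).
# Assumes the Hugging Face transformers library; dataset contents are illustrative.
import torch
from torch.utils.data import DataLoader
from transformers import BertTokenizerFast, BertForMaskedLM, DataCollatorForLanguageModeling

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")

target_sentences = ["ejemplo de texto sin etiquetar del idioma objetivo"]    # Dt,no (illustrative)
source_sentences = ["an unlabeled sentence from the source language"]        # Ds,no (illustrative)
mixed = target_sentences + source_sentences

encodings = tokenizer(mixed, truncation=True, max_length=128)
examples = [{"input_ids": ids} for ids in encodings["input_ids"]]

# Randomly mask a fraction of the tokens; the model is trained to reconstruct them.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
loader = DataLoader(examples, batch_size=2, shuffle=True, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for batch in loader:
    loss = model(**batch).loss     # reconstruction loss on the masked words
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
# The trained encoder parameters are later copied into the second (MRC) model (S303).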
S302, a first task data set with labels is obtained.
Based on guiding principle 1) in the fine-tuning training module, in addition to using the NER data sets of the source domain (or language) and the target domain (or language), the labeled data of other tasks needs to be considered at the same time, so that richer data can be used for transfer learning. Therefore, in this embodiment, the server further needs to obtain a first task data set, where the first task data set is a data set corresponding to the first task, and the first task is a task other than the named entity recognition task.
The first task is a task similar to named entity recognition. For example, named entity recognition extracts entities from text data, and each entity is a segment of the text data; named entity recognition is therefore similar to a segment extraction task, so the first task may be a segment extraction task. Since some MRC tasks can implement segment extraction, the first task may be a segment-extraction MRC task; of course, the embodiment of the present application does not limit which kind of segment extraction task the first task specifically is.
It should be noted that the first task data set may be obtained in different manners. In a possible implementation manner, the first task data set may be determined according to a combination of one or more of: a labeled first data subset corresponding to the first task in the target data, a labeled second data subset corresponding to the first task in the source data, and a labeled second task data set. The second task data set is a data set corresponding to a second task, and the second task is a task other than the first task.
Based on guiding principle 3) in the fine-tuning training module, in order to narrow the distance between the target task (NER) and other tasks (such as MRC) and thus use the annotation information of the other tasks to help the learning of the target task, the second task may be a named entity recognition task, so that the distance between the two tasks is further shortened in the subsequent training process and a better transfer learning effect is achieved.
Taking the case where the first task is an MRC task as an example, the first task data set may be real MRC data obtained directly from the target data (i.e., the first data subset, which may be denoted Dt,m, for example the data set identified by c) in fig. 4), real MRC data obtained directly from the source data (i.e., the second data subset, which may be denoted Ds,m, for example the data set identified by d) in fig. 4), or it may be determined from the labeled second task data set in the source data. Since the second task data set is a data set corresponding to the second task, it needs to be converted into a data set corresponding to the first task. Of course, the first task data set may also be derived from various combinations of the above.
When the first task is the MRC task, the second task may be the NER task, i.e., the second task data set may be a labeled NER data set, which may be denoted Ds,n, for example the data set identified by f) in fig. 4. In addition, the second task may also be another task, such as an event extraction task, which is not limited in this embodiment.
In a possible implementation manner, if the first task data set is determined according to the labeled second task data set, this may be done by converting the text data in the second task data set into text data conforming to a target text data format, so as to obtain a third data subset. The target text data format is the text data format corresponding to the first task, and the converted third data subset can be denoted Ds,nm. Then, the first task data set is determined from the third data subset.
The first task data set may be determined from the third data subset by directly taking the third data subset as the first task data set; in another possible implementation, the third data subset may also be combined with the first data subset and/or the second data subset. For example, in fig. 4, the first task data set used to train the second model includes Dt,m, Ds,m, and the Ds,nm obtained by converting Ds,n.
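To make the conversion concrete, the following is a minimal sketch of turning a labeled (BIO-tagged) NER example from the second task data set into segment-extraction MRC examples, with one natural-language query per entity type; the query wording and label set are assumptions for illustration and are not specified by the patent.

# Sketch: convert one labeled NER example (from Ds,n) into the MRC text data
# format (Ds,nm): each entity type becomes a query, and the entities of that
# type become answer spans. Query wording and entity types are illustrative.
from typing import Dict, List

ENTITY_QUERIES = {
    "PER": "Which words in the text refer to a person name?",
    "LOC": "Which words in the text refer to a location name?",
}

def ner_to_mrc(tokens: List[str], bio_tags: List[str]) -> List[Dict]:
    """Produce one span-extraction MRC example per entity type."""
    examples = []
    for ent_type, query in ENTITY_QUERIES.items():
        spans, start = [], None
        for i, tag in enumerate(bio_tags + ["O"]):        # sentinel closes open spans
            if tag == f"B-{ent_type}":
                if start is not None:
                    spans.append((start, i - 1))
                start = i
            elif tag != f"I-{ent_type}" and start is not None:
                spans.append((start, i - 1))
                start = None
        examples.append({"context": tokens, "query": query, "answer_spans": spans})
    return examples

# Usage with an illustrative sentence: "sao paulo" is a location entity.
print(ner_to_mrc(["He", "lives", "in", "sao", "paulo"],
                 ["O", "O", "O", "B-LOC", "I-LOC"]))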
S303, taking the model parameters of the trained first model as initial model parameters of a second model, and training the second model through the first task data set.
In the migration learning, the knowledge learned in S301 may be migrated first, that is, the model parameters of the trained first model are used as the initial model parameters of the second model, so that the second model learns the knowledge of the source domain (or language), and the first task data set obtained in S302 is used to train the second model.
The second model corresponds to the first task; if the first task is a segment-extraction MRC task, the second model is a segment-extraction MRC model, denoted g(·; θmrc). The training processes of the first model and the second model can be regarded as pre-training of the language model.
The segment-extraction MRC model mainly has the following three advantages: 1) the MRC model can improve the segment-extraction capability of NER and help NER better capture the semantic information of different entity types; 2) the MRC model can be used to solve many tasks, including NER, so MRC can act as a bridge connecting NER with other tasks; 3) applying the MRC model to other tasks provides a unified framework for migrating different kinds of knowledge from those tasks.
If the second model is a segment-extraction MRC model, one possible implementation of training the second model with the first task data set is to determine the vector corresponding to each word in the text data included in the first task data set, input the vector corresponding to each word into the second model, and obtain, for each word, the predicted likelihood that it is the beginning of the target segment and the likelihood that it is the end of the target segment. A loss function is then constructed from these prediction results, the loss function is optimized, and the initial model parameters of the second model are adjusted accordingly to complete the training of the second model.
The words can be constituent units of the text data, the determining modes of the words in different languages are different, and if the language is Chinese, the words can be obtained by segmenting the text data; if the language is a foreign language other than Chinese, such as English, Spanish, etc., the word may be a word in the text data.
If the segment-extraction MRC model includes two linear classification layers and the word is taken as the unit, the specific training method is as follows. The context-dependent word representation, a vector h_t, is input into the two linear classification layers respectively, and the likelihood that each word is the start of the target segment and the likelihood that it is the end of the target segment are calculated as:

p_t^start = softmax(W_start · h_t)

p_t^end = softmax(W_end · h_t)

where p_t^start is the probability that the word corresponding to the vector h_t is predicted to be the start of the target segment, p_t^end is the probability that the word corresponding to h_t is predicted to be the end of the target segment, W_start and W_end are learnable parameters, and d1 denotes the dimension of the context-dependent word representation. Finally, model training is performed by optimizing the cross-entropy loss functions on p_t^start and p_t^end.
S304, taking the model parameters of the trained second model as initial model parameters of a third model, and fine-tuning the third model through a labeled third named entity recognition data set in the source data to obtain a named entity recognition model.
The third named entity recognition data set is a labeled NER data set, which may be denoted Ds,n, e.g., the data set identified by f) in fig. 4. When the first task is the MRC task and the second task is the NER task in S302, the third named entity recognition data set may also serve as the second task data set that is used to determine the first task data set for training the second model.
The named entity recognition model is the model used for prediction, obtained by further fine-tuning the vectors corresponding to the words. The implementation is as follows: first, the vector corresponding to each context-dependent word is input into the linear classification layer of the named entity recognition model, and training is performed by maximizing the probability of each word on its correct entity label. Specifically, taking the word as the unit, given an input word sequence comprising N words

X = {x_1, x_2, ..., x_N},

where x_i represents the i-th word, the sequence is first input into an encoder f_θ to obtain the vector corresponding to each word, yielding the vector sequence corresponding to all words

H = {h_1, h_2, ..., h_N},

where H = f_θ(X) and h_i represents the vector corresponding to the i-th word. The encoder f_θ is implemented based on a pre-trained language model, such as BERT, and θ represents its initial model parameters.

Then h_i is input into a linear classification layer followed by a softmax function to obtain the probability distribution of the vector at the current position over all entity labels, expressed as:

P(y_i | x_i) = softmax(W · h_i + b),

where y_i represents the one-hot (onehot) vector corresponding to the entity label, and {W, b} represent the trainable parameters in the named entity model, whose initial values, i.e., the initial model parameters, may be the model parameters of the second model. The loss function is the cross entropy between the true probability and the predicted probability of each word over all entity labels, and the training of the named entity recognition model is completed according to this loss function.
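A minimal sketch of this fine-tuning head is given below, assuming a BIO label set and treating the encoder output as given; the number of labels and tensor sizes are illustrative rather than taken from the patent.

# Sketch of the named entity recognition head used in S304: each word vector h_i
# from the encoder f_theta is passed through a linear layer {W, b} plus softmax to
# obtain P(y_i | x_i) over all entity labels; training minimises the cross entropy
# against the gold labels. Label count and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_LABELS = 9   # e.g. BIO tags for PER/LOC/ORG/MISC plus "O" (assumed label set)

class NerHead(nn.Module):
    def __init__(self, d1: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(d1, num_labels)   # trainable parameters {W, b}

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: [batch, N, d1] word vectors H = f_theta(X)
        return self.classifier(h)                      # logits over entity labels

head = NerHead(d1=768, num_labels=NUM_LABELS)
h = torch.randn(2, 16, 768)
gold = torch.randint(0, NUM_LABELS, (2, 16))           # gold entity-label ids
logits = head(h)
probs = torch.softmax(logits, dim=-1)                  # P(y_i | x_i)
loss = F.cross_entropy(logits.flatten(0, 1), gold.flatten())
loss.backward()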
The training steps shown in S301-S304 may correspond to those shown in FIG. 4.
After the named entity recognition model is trained, named entity recognition may be performed using the named entity recognition model, i.e., prediction may be performed using the named entity recognition model, as shown in fig. 4. In a possible implementation manner, text data to be recognized may be acquired, the entity label of each word in the text data to be recognized is determined through the named entity recognition model, and the entities in the text data to be recognized are then identified according to the entity labels.
The text data to be recognized may be the unlabeled NER data set identified by e) in fig. 4, denoted Dt,no. Of course, the text data to be recognized may also be unlabeled text data other than Dt,no, such as text data acquired in real time on which named entity recognition needs to be performed.
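For prediction, a minimal sketch could look like the following, assuming a Hugging Face-style encoder and the NER head above; the label map and function names are illustrative assumptions.

# Sketch of named entity recognition at prediction time: tokenize the text to be
# recognized, take the argmax entity label for each (sub)word, and return the
# (token, label) pairs from which entities can be read off. Names are illustrative.
import torch

ID2LABEL = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-LOC", 4: "I-LOC"}   # assumed label map

@torch.no_grad()
def recognize(encoder, ner_head, tokenizer, text: str):
    enc = tokenizer(text, return_tensors="pt")
    h = encoder(**enc).last_hidden_state             # context-dependent word vectors
    label_ids = ner_head(h).argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return [(tok, ID2LABEL.get(i, "O")) for tok, i in zip(tokens, label_ids)]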
According to the above technical solution, in order to train a more accurate and effective named entity recognition model in low-resource and even zero-resource scenarios and thus solve the problem of low-resource and zero-resource named entity recognition, a labeled third named entity recognition data set in the source data (i.e., a source domain or source language) is used when training the named entity recognition model. In order to reduce the difference between different domains (or languages) and realize a target-oriented training strategy, a first model can first be trained according to an unlabeled first named entity recognition data set in the target data and an unlabeled second named entity recognition data set in the source data, thereby shortening the distance between the source domain (language) and the target domain (language). A labeled first task data set is then acquired, where the first task data set is a data set corresponding to a first task and the first task is a task other than the named entity recognition task; the model parameters of the trained first model are used as the initial model parameters of a second model, and the second model is trained through the first task data set, so that transfer learning is performed in combination with the data sets of other tasks and related knowledge is migrated from richer data. Finally, the model parameters of the trained second model are used as the initial model parameters of a third model, and the third model is fine-tuned through the labeled third named entity recognition data set in the source data to obtain the named entity recognition model. The method uses the first named entity recognition data set, the second named entity recognition data set and the labeled data sets of other tasks together for transfer learning, so richer data sets can be used and a better recognition effect is shown on multiple data sets in cross-language, cross-domain and other scenarios; at the same time, the target-oriented training strategy further improves the named entity recognition effect, and the method is practical and extensible.
Compared with the related art (referring to fig. 5), the related art adopts a framework of domain tuning and task tuning to perform zero-resource named entity recognition across domains, which only considers unlabeled data in the target domain (or language) and the source domain (or language) and labeled data in the source domain (or language), but ignores the labeled data of other tasks. Specifically, the related art performs domain fine-tuning using the unlabeled data in the source domain and the target domain to obtain a mask result, and then migrates the model parameters obtained from domain fine-tuning to start task fine-tuning. In the task fine-tuning process, label prediction is performed using the labeled data in the source domain (or language) to obtain entity labels, so that the trained named entity recognition model can be used in the target domain. In contrast, the embodiment of the present application also considers the labeled data of other tasks, uses richer data sets for transfer learning, and takes into account the distance between the target task (NER) and other tasks (such as MRC), realizing a target-oriented training strategy, further improving the named entity recognition effect, and remaining practical and extensible.
The recognition effects of the method provided by the embodiment of the application and the method provided by the related technology on a plurality of data sets under two scenes of cross-language and cross-field can be seen in table 1:
TABLE 1
[Table 1 is provided as an image in the original publication; it lists the recognition results of the related-art method and the method of the present application on the data sets D1 to D6.]
Here D1, D2, D3, D4, D5 and D6 represent different data sets; D1, D2 and D3 are domain-related data sets, and D4, D5 and D6 are language-related data sets. Specifically, D1 is the WNUT16 named entity recognition data set, from the Twitter domain, containing ten predefined entity classes (such as gardens, etc.); D2 is the Twitter NER named entity recognition data set, from the Twitter domain, containing four predefined entity classes such as person name (PER), location name (LOC), organization name (GEO) and others (MISC); D3 is the SciTech named entity recognition data set, from the science and technology news domain, containing four predefined entity classes such as person name (PER), location name (LOC), organization name (GEO) and others (MISC); D4 is the CoNLL-2003 named entity recognition data set, in German, containing four predefined entity classes such as person name (PER), location name (LOC), organization name (GEO) and others (MISC); D5 is the CoNLL-2002 named entity recognition data set, in Spanish, containing four predefined entity classes such as person name (PER), location name (LOC), organization name (GEO) and others (MISC). As can be seen from Table 1, the method provided by the present application shows a higher recognition effect on each data set, for example 68.40 is greater than 62.8, which means that, compared with the related art, the method provided by the present application achieves a better named entity recognition effect on multiple data sets in cross-language, cross-domain and other scenarios.
It should be noted that there may be many ways to train the third model to obtain the named entity recognition model, for example, training it directly in the manner described in S304. Of course, a more elaborate method may also be considered to perform enhanced training on the named entity recognition model, so as to further improve its recognition effect.
Therefore, in a possible implementation manner, fine-tuning the third model through the labeled third named entity recognition data set in the source data to obtain the named entity recognition model may be carried out as follows. The third model is first pre-trained through the labeled third named entity recognition data set in the source data, at which point a preliminarily trained named entity recognition model is obtained. Then, label prediction is performed on the text data in the first named entity recognition data set by using the pre-trained third model to obtain a first named entity recognition data set with pseudo labels, as shown in the fourth step in fig. 4, that is, pseudo-labeled data (Pseudo Data) is constructed. The pseudo-labeled NER data set is then used to continue training the third model, i.e., the pre-trained third model is trained according to the first named entity recognition data set with pseudo labels, as shown in the fifth step in fig. 4, so as to obtain the final named entity recognition model.
It should be noted that, after the first named entity recognition data set with pseudo labels is obtained, the enhanced training of the pre-trained third model may continue. Since the initial model parameters of the third model are migrated from the second model, the effect of the second model directly affects the recognition effect of the finally trained named entity recognition model. Therefore, in a possible implementation manner, training the pre-trained third model according to the first named entity recognition data set with pseudo labels to obtain the named entity recognition model may be carried out by determining the first task data set according to the first named entity recognition data set with pseudo labels, and training the second model on the first task data set obtained at this time, as shown in fig. 4. Then, the pre-trained third model is trained according to the first named entity recognition data set with pseudo labels to obtain the named entity recognition model, as shown in fig. 4, where the initial model parameters of the pre-trained third model are obtained from the model parameters of the trained second model. In this way, the first named entity recognition data set with pseudo labels is used for iterative training between the pre-trained third model and the second model.
That is, after one round of iterative training of the third model is completed, for example the i-th round, the first named entity recognition data set with pseudo labels is updated by using the third model obtained from the i-th round, as shown in fig. 4. Then, the (i+1)-th round of iterative training is performed using the updated first named entity recognition data set with pseudo labels, as shown in the ninth step in fig. 4. After T rounds of iterative training, the final named entity recognition model is obtained and used for named entity recognition.
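The iterative strategy can be summarised with the following sketch, where every training step is passed in as a callable; the function names are placeholders for the steps described above, not an API defined by the patent.

# Sketch of the iterative pseudo-labelling loop: in each round the current third
# model pseudo-labels the unlabeled target NER data, the pseudo-labelled set is
# converted into first-task (MRC) data to retrain the second model, and a new
# third model initialised from the second model is trained on the pseudo labels.
from typing import Callable, Sequence

def iterative_training(
    third_model,                       # pre-trained on the labeled source NER data Ds,n
    predict: Callable,                 # label prediction with the current third model
    to_first_task: Callable,           # converts pseudo-labelled NER data to MRC format
    train_second: Callable,            # trains the second (MRC) model on first-task data
    train_third: Callable,             # trains a third model initialised from the second
    unlabeled_target: Sequence,        # the unlabeled target NER text Dt,no
    t_iters: int = 3,                  # number of rounds T (illustrative)
):
    for _ in range(t_iters):
        pseudo = [(x, predict(third_model, x)) for x in unlabeled_target]  # pseudo labels
        second_model = train_second(to_first_task(pseudo))                 # rebuild MRC data
        third_model = train_third(second_model, pseudo)                    # re-init and fine-tune
    return third_model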
In this embodiment, through the enhanced training of the named entity recognition model, the knowledge contained in the different data sets is learned more fully than with the pipeline model framework adopted in the related art, and the obtained named entity recognition model achieves a better recognition effect.
Next, a named entity recognition method is introduced in combination with a practical application scenario. The named entity recognition method is implemented based on a named entity recognition model, and the named entity recognition model is obtained by training according to the training method of the named entity recognition model provided in the embodiment of the present application. In order to perform named entity recognition, a named entity recognition model needs to be trained first. In some fields there is very little or even no labeled data, and in this case the named entity recognition model can be trained based on the target-oriented transfer learning method provided by the embodiment of the present application. The whole model framework comprises three auxiliary training tasks: a first model, a second model and a named entity recognition model; in this embodiment, the first model is a Masked Language Model (MLM), and the second model is a segment-extraction machine reading comprehension (MRC) model. Referring to fig. 6, the method includes:
s601, training the MLM according to the first named entity recognition data set without labels in the target data and the second named entity recognition data set without labels in the source data.
And S602, taking the model parameters of the MLM obtained by training as initial model parameters of the MRC model.
And S603, converting the NER data set with the label into an MRC data set with the label.
At this time, the NER dataset with label, i.e. the second task dataset with label, is converted into the MRC dataset with label, i.e. the third data subset.
S604, training the MRC model according to the first data subset with the labels corresponding to the first tasks in the target data, the second data subset with the labels corresponding to the first tasks in the source data and the MRC data set with the labels obtained through conversion.
And S605, taking the model parameters of the MRC model obtained by training as the initial model parameters of the third model.
S606, pre-training the third model through a third named entity recognition data set with labels in the source data.
S607, label prediction is carried out on the text data in the first named entity recognition data set by using the pre-trained third model, and the first named entity recognition data set with the pseudo label is obtained.
And S608, performing iterative training on the first named entity recognition data set with the pseudo labels between the MRC model and the third model.
The iterative training may be performed by updating the first named entity recognition data set with pseudo labels using the third model obtained from the i-th round of iterative training.
And S609, obtaining a final named entity recognition model after T times of iterative training.
And S610, utilizing the obtained named entity identification model to identify the named entity.
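Putting S601 to S610 together, the workflow can be sketched at a high level as follows; each stage is represented by a placeholder callable, so this is an orchestration outline under assumed interfaces rather than the patent's implementation.

# High-level sketch of the workflow S601-S610; all function names are placeholders.
def train_named_entity_model(
    d_t_no, d_s_no,            # unlabeled NER data in target / source data (S601)
    d_t_m, d_s_m, d_s_n,       # labeled MRC data and labeled source NER data
    train_mlm, ner_to_mrc, train_mrc, pretrain_ner, iterate,
):
    mlm = train_mlm(d_t_no + d_s_no)                          # S601: MLM on mixed data
    first_task_data = d_t_m + d_s_m + ner_to_mrc(d_s_n)       # S603: convert labeled NER to MRC
    mrc = train_mrc(init=mlm, data=first_task_data)           # S602 + S604: transfer and train
    ner = pretrain_ner(init=mrc, data=d_s_n)                  # S605 + S606: pre-train third model
    return iterate(ner, mrc, d_t_no)                          # S607-S609: pseudo-label iterations

# S610: the returned named entity recognition model is then used for recognition.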
According to the above technical solution, in order to train a more accurate and effective named entity recognition model in low-resource and even zero-resource scenarios and thus solve the problem of low-resource and zero-resource named entity recognition, a labeled third named entity recognition data set in the source data (i.e., a source domain or source language) is used when training the named entity recognition model. In order to reduce the difference between different domains (or languages) and realize a target-oriented training strategy, a first model can first be trained according to an unlabeled first named entity recognition data set in the target data and an unlabeled second named entity recognition data set in the source data, thereby shortening the distance between the source domain (language) and the target domain (language). A labeled first task data set is then acquired, where the first task data set is a data set corresponding to a first task and the first task is a task other than the named entity recognition task; the model parameters of the trained first model are used as the initial model parameters of a second model, and the second model is trained through the first task data set, so that transfer learning is performed in combination with the data sets of other tasks and related knowledge is migrated from richer data. Finally, the model parameters of the trained second model are used as the initial model parameters of a third model, and the third model is fine-tuned through the labeled third named entity recognition data set in the source data to obtain the named entity recognition model. The method uses the first named entity recognition data set, the second named entity recognition data set and the labeled data sets of other tasks together for transfer learning, so richer data sets can be used and a better recognition effect is shown on multiple data sets in cross-language, cross-domain and other scenarios; at the same time, the target-oriented training strategy further improves the named entity recognition effect, and the method is practical and extensible.
Based on the training method of the named entity recognition model provided in the embodiment corresponding to fig. 3, an embodiment of the present application further provides a training apparatus of the named entity recognition model, referring to fig. 7, an apparatus 700 includes a first training unit 701, an obtaining unit 702, a second training unit 703, and a third training unit 704:
the first training unit 701 is configured to train a first model according to a first named entity recognition data set that is not labeled in the target data and a second named entity recognition data set that is not labeled in the source data;
the obtaining unit 702 is configured to obtain a first task data set with a label, where the first task data set is a data set corresponding to a first task, and the first task is a task other than a named entity identification task;
the second training unit 703 is configured to use the model parameters of the first model obtained through training as initial model parameters of a second model, and train the second model through the first task data set;
the third training unit 704 is configured to use the model parameters of the second model obtained through training as initial model parameters of a third model, and fine-tune the third model through a third named entity recognition data set with labels in the source data to obtain a named entity recognition model.
In a possible implementation manner, the first task data set is determined according to a combination of one or more of a first data subset with a label corresponding to the first task in the target data, a second data subset with a label corresponding to the first task in the source data, and a second task data set with a label, where the second task data set is a data set corresponding to a second task, and the second task is a task other than the first task.
In a possible implementation manner, if the first task data set is determined according to the second task data set with labels, the obtaining unit 702 is configured to:
converting the text data in the second task data set into text data conforming to a target text data format to obtain a third data subset, wherein the target text data format is a text data format corresponding to the first task;
determining the first task data set from the third subset of data.
In one possible implementation, the first model is a mask language model and the second model is a segment-extraction machine reading understanding model.
In a possible implementation manner, the second training unit 703 is configured to determine a vector corresponding to each word in text data included in the first task data set;
inputting the vector corresponding to each word into the second model to obtain a prediction result that each word is the beginning of the target segment and the end of the target segment;
and adjusting initial model parameters of the second model by optimizing a loss function corresponding to the prediction result, and finishing the training of the second model.
In a possible implementation manner, the third training unit 704 is configured to pre-train the third model through a third named entity recognition data set with labels in the source data;
performing label prediction on the text data in the first named entity recognition data set by using a pre-trained third model to obtain a first named entity recognition data set with a pseudo label;
and training the pre-trained third model according to the first named entity recognition data set with the pseudo label to obtain the named entity recognition model.
In a possible implementation manner, the third training unit 704 is configured to determine the first task data set according to the first named entity recognition data set with the pseudo label;
training the second model according to the first task data set;
and training the pre-trained third model according to the first named entity recognition data set with the pseudo label to obtain the named entity recognition model, wherein the first named entity recognition data set with the pseudo label is subjected to iterative training between the pre-trained third model and the second model.
In a possible implementation manner, the third training unit 704 is configured to update the first named entity recognition data set with the pseudo label by using a third model obtained by the ith iterative training;
and perform the (i + 1)th iterative training by using the updated first named entity recognition data set with the pseudo label.
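One possible schedule for this iterative training is sketched below; the callables pseudo_label_fn, to_first_task_format, train_second_model, and train_third_model are placeholders for the procedures described above and are not defined by this application.

```python
def iterative_self_training(third_model, second_model, unlabeled_ner_data,
                            pseudo_label_fn, to_first_task_format,
                            train_second_model, train_third_model, rounds=3):
    """Alternate between refreshing pseudo labels and retraining both models."""
    for i in range(rounds):
        # (i)   the third model from the current iteration refreshes the pseudo labels
        pseudo_data = pseudo_label_fn(third_model, unlabeled_ner_data)
        # (ii)  the refreshed pseudo-labeled set rebuilds the first task data set,
        #       and the second model is trained on it again
        train_second_model(second_model, to_first_task_format(pseudo_data))
        # (iii) iteration i + 1 trains the third model on the updated pseudo labels
        train_third_model(third_model, pseudo_data)
    return third_model
```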
An embodiment of the present application further provides a training device for a named entity recognition model. The training device may be a data processing device, for example, a terminal device. The following description uses a smartphone as an example of the terminal device:
Fig. 8 is a block diagram of a partial structure of the smartphone serving as the terminal device according to an embodiment of the present application. Referring to fig. 8, the smartphone includes: a radio frequency (RF) circuit 810, a memory 820, an input unit 830, a display unit 840, a sensor 850, an audio circuit 860, a wireless fidelity (WiFi) module 870, a processor 880, and a power supply 890. The input unit 830 may include a touch panel 831 and other input devices 832, the display unit 840 may include a display panel 841, and the audio circuit 860 may include a speaker 861 and a microphone 862. Those skilled in the art will appreciate that the smartphone structure shown in fig. 8 is not limiting; the smartphone may include more or fewer components than shown, combine some components, or use a different arrangement of components.
The memory 820 may be configured to store software programs and modules, and the processor 880 executes various functional applications and data processing of the smartphone by running the software programs and modules stored in the memory 820. The memory 820 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required by at least one function (such as a sound playing function or an image playing function), and the like; and the data storage area may store data (such as audio data or a phone book) created according to use of the smartphone, and the like. In addition, the memory 820 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 880 is the control center of the smartphone, connects various parts of the entire smartphone by using various interfaces and lines, and performs various functions of the smartphone and processes data by running or executing the software programs and/or modules stored in the memory 820 and invoking the data stored in the memory 820, thereby monitoring the smartphone as a whole. Optionally, the processor 880 may include one or more processing units; preferably, the processor 880 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, applications, and the like, and the modem processor mainly handles wireless communication. It may be understood that the modem processor may alternatively not be integrated into the processor 880.
In this embodiment, the processor 880 in the terminal device may execute the following steps:
training a first model according to a first named entity recognition data set without labels in target data and a second named entity recognition data set without labels in source data;
acquiring a first task data set with labels, wherein the first task data set is a data set corresponding to a first task, and the first task is a task other than the named entity recognition task;
taking the model parameters of the first model obtained by training as the initial model parameters of a second model, and training the second model through the first task data set;
and taking the model parameters of the second model obtained by training as initial model parameters of a third model, and fine-tuning the third model through a third named entity recognition data set with labels in the source data to obtain a named entity recognition model.
The training device may also be a server, and an embodiment of the present application further provides a server. Referring to fig. 9, fig. 9 is a structural diagram of a server 900 provided in an embodiment of the present application. The server 900 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 922 (e.g., one or more processors), a memory 932, and one or more storage media 930 (e.g., one or more mass storage devices) storing an application 942 or data 944. The memory 932 and the storage medium 930 may be transient storage or persistent storage. The program stored on the storage medium 930 may include one or more modules (not shown), and each module may include a series of instruction operations on the server. Further, the central processing unit 922 may be configured to communicate with the storage medium 930 and execute, on the server 900, the series of instruction operations stored in the storage medium 930.
The server 900 may also include one or more power supplies 926, one or more wired or wireless network interfaces 950, one or more input-output interfaces 958, and/or one or more operating systems 941, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
In this embodiment, the central processor 922 in the server 900 may perform the following steps:
training a first model according to a first named entity recognition data set without labels in target data and a second named entity recognition data set without labels in source data;
acquiring a first task data set with labels, wherein the first task data set is a data set corresponding to a first task, and the first task is a task other than the named entity recognition task;
taking the model parameters of the first model obtained by training as the initial model parameters of a second model, and training the second model through the first task data set;
and taking the model parameters of the second model obtained by training as initial model parameters of a third model, and fine-tuning the third model through a third named entity recognition data set with labels in the source data to obtain a named entity recognition model.
According to an aspect of the present application, a computer-readable storage medium is provided, which is used for storing program codes, wherein the program codes are used for executing the training method of the named entity recognition model described in the foregoing embodiments.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations of the embodiment.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (15)

1. A method for training a named entity recognition model, the method comprising:
training a first model according to a first named entity recognition data set without labels in target data and a second named entity recognition data set without labels in source data;
acquiring a first task data set with labels, wherein the first task data set is a data set corresponding to a first task, and the first task is a task other than the named entity recognition task;
taking the model parameters of the first model obtained by training as the initial model parameters of a second model, and training the second model through the first task data set;
and taking the model parameters of the second model obtained by training as initial model parameters of a third model, and fine-tuning the third model through a third named entity recognition data set with labels in the source data to obtain a named entity recognition model.
2. The method of claim 1, wherein the first task data set is determined according to a combination of one or more of a first data subset with labels corresponding to the first task in the target data, a second data subset with labels corresponding to the first task in the source data, and a second task data set with labels, wherein the second task data set is a data set corresponding to a second task, and the second task is a task other than the first task.
3. The method of claim 2, wherein if the first task data set is determined according to the second task data set with labels, determining the first task data set comprises:
converting the text data in the second task data set into text data conforming to a target text data format to obtain a third data subset, wherein the target text data format is a text data format corresponding to the first task;
determining the first task data set from the third subset of data.
4. The method according to any one of claims 1 to 3, wherein the first model is a mask language model and the second model is a segment-extraction machine reading comprehension model.
5. The method of claim 4, wherein the training of the second model by the first set of task data comprises:
determining a vector corresponding to each word in the text data included in the first task data set;
inputting the vector corresponding to each word into the second model to obtain, for each word, a prediction result of whether the word is the beginning or the end of the target segment;
and adjusting the initial model parameters of the second model by optimizing a loss function corresponding to the prediction results, to complete the training of the second model.
6. The method according to any of claims 1-3, wherein said fine-tuning said third model by a third set of named entity recognition data with labels in said source data to obtain a named entity recognition model comprises:
pre-training the third model through a third named entity recognition dataset with labels in the source data;
performing label prediction on the text data in the first named entity recognition data set by using a pre-trained third model to obtain a first named entity recognition data set with a pseudo label;
and training the pre-trained third model according to the first named entity recognition data set with the pseudo label to obtain the named entity recognition model.
7. The method according to claim 6, wherein the training the pre-trained third model according to the first named entity recognition data set with the pseudo labels to obtain the named entity recognition model comprises:
determining the first task data set according to the first named entity recognition data set with the pseudo label;
training the second model according to the first task data set;
and training the pre-trained third model according to the first named entity recognition data set with the pseudo label to obtain the named entity recognition model, wherein iterative training is performed between the pre-trained third model and the second model by using the first named entity recognition data set with the pseudo label.
8. The method of claim 7, wherein the iterative training between the pre-trained third model and the second model by using the first named entity recognition data set with the pseudo label comprises:
updating the first named entity recognition data set with the pseudo label by using a third model obtained by the ith iterative training;
and performing the (i + 1)th iterative training by using the updated first named entity recognition data set with the pseudo label.
9. The method according to any one of claims 1-3, further comprising:
acquiring text data to be recognized;
determining an entity label of each word in the text data to be recognized through the named entity recognition model;
and identifying an entity in the text data to be recognized according to the entity label.
10. A training device for a named entity recognition model is characterized by comprising a first training unit, an acquisition unit, a second training unit and a third training unit:
the first training unit is used for training the first model according to a first named entity recognition data set without labels in the target data and a second named entity recognition data set without labels in the source data;
the acquiring unit is used for acquiring a first task data set with labels, wherein the first task data set is a data set corresponding to a first task, and the first task is a task other than the named entity recognition task;
the second training unit is used for taking the model parameters of the first model obtained by training as the initial model parameters of the second model and training the second model through the first task data set;
and the third training unit is used for taking the model parameters of the second model obtained by training as the initial model parameters of a third model, and fine-tuning the third model through a third named entity recognition data set with labels in the source data to obtain a named entity recognition model.
11. The apparatus of claim 10, wherein the first task data set is determined according to a combination of one or more of a first data subset with labels corresponding to the first task in the target data, a second data subset with labels corresponding to the first task in the source data, and a second task data set with labels, wherein the second task data set is a data set corresponding to a second task, and the second task is a task other than the first task.
12. The apparatus according to claim 11, wherein if the first task data set is determined from a second task data set with labels, the obtaining unit is configured to:
converting the text data in the second task data set into text data conforming to a target text data format to obtain a third data subset, wherein the target text data format is a text data format corresponding to the first task;
determining the first task data set from the third subset of data.
13. The apparatus of any one of claims 10-12, wherein the first model is a mask language model and the second model is a segment-extraction machine reading comprehension model.
14. A training apparatus for a named entity recognition model, the apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any of claims 1-9 according to instructions in the program code.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store a program code for performing the method of any of claims 1-9.
CN202110420593.7A 2021-04-19 2021-04-19 Training method, device, equipment and storage medium of named entity model Pending CN113761924A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110420593.7A CN113761924A (en) 2021-04-19 2021-04-19 Training method, device, equipment and storage medium of named entity model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110420593.7A CN113761924A (en) 2021-04-19 2021-04-19 Training method, device, equipment and storage medium of named entity model

Publications (1)

Publication Number Publication Date
CN113761924A true CN113761924A (en) 2021-12-07

Family

ID=78787028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110420593.7A Pending CN113761924A (en) 2021-04-19 2021-04-19 Training method, device, equipment and storage medium of named entity model

Country Status (1)

Country Link
CN (1) CN113761924A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114548109A (en) * 2022-04-24 2022-05-27 阿里巴巴达摩院(杭州)科技有限公司 Named entity recognition model training method and named entity recognition method
CN114548109B (en) * 2022-04-24 2022-09-23 阿里巴巴达摩院(杭州)科技有限公司 Named entity recognition model training method and named entity recognition method
CN115081453A (en) * 2022-08-23 2022-09-20 北京睿企信息科技有限公司 Named entity identification method and system
CN115081453B (en) * 2022-08-23 2022-11-04 北京睿企信息科技有限公司 Named entity identification method and system

Similar Documents

Publication Publication Date Title
CN108038103B (en) Method and device for segmenting text sequence and electronic equipment
CN111930992B (en) Neural network training method and device and electronic equipment
CN111177393B (en) Knowledge graph construction method and device, electronic equipment and storage medium
CN110750959B (en) Text information processing method, model training method and related device
CN113704388A (en) Training method and device for multi-task pre-training model, electronic equipment and medium
CN113762052A (en) Video cover extraction method, device, equipment and computer readable storage medium
CN108304376B (en) Text vector determination method and device, storage medium and electronic device
US11036996B2 (en) Method and apparatus for determining (raw) video materials for news
CN113761924A (en) Training method, device, equipment and storage medium of named entity model
CN113761153A (en) Question and answer processing method and device based on picture, readable medium and electronic equipment
CN113392687A (en) Video title generation method and device, computer equipment and storage medium
CN112463942A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN112825114A (en) Semantic recognition method and device, electronic equipment and storage medium
CN115391499A (en) Method for generating multitask generation model, question-answer pair generation method and related device
Zhou et al. Learning with annotation of various degrees
CN115954001A (en) Speech recognition method and model training method
CN113761220A (en) Information acquisition method, device, equipment and storage medium
Jiang et al. Large ai model-based semantic communications
CN117765132A (en) Image generation method, device, equipment and storage medium
CN115712739B (en) Dance motion generation method, computer device and storage medium
WO2023168818A1 (en) Method and apparatus for determining similarity between video and text, electronic device, and storage medium
US20230237344A1 (en) Method, electronic device, and computer program product for managing training data
CN110555207A (en) Sentence recognition method, sentence recognition device, machine equipment and computer-readable storage medium
CN117669512B (en) Answer generation method, device, equipment and storage medium
CN117891980B (en) Content searching method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination