CN118036577A - Sequence labeling method in natural language processing - Google Patents
- Publication number
- Publication number: CN118036577A (application CN202410431577.1A)
- Authority
- CN
- China
- Prior art keywords
- labeling
- sequence
- text
- word
- task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Machine Translation (AREA)
Abstract
The invention relates to the technical field of machine translation, in particular to a sequence labeling method in natural language processing, which comprises the following steps: receiving text data input and preprocessing it, including word segmentation, stop-word removal and normalization, to create a foundation for subsequent sequence annotation; performing part-of-speech tagging on the preprocessed text, automatically identifying the part of speech of each word by using a deep learning model; applying a sequence labeling model to identify and classify the entities in the text while labeling the attributes of the entities; combining a self-attention mechanism with position coding to process the word sequence in sequence labeling; performing deep syntactic analysis, marking the syntactic structure of each sentence in the text, including subject-predicate-object relationships, clause and phrase boundaries; introducing a cross-sequence labeling mechanism; and generating a deep annotation output of the text. The invention can not only identify local patterns in the text, such as relationships between words and phrases, but also grasp the global structure and semantic flow of the whole text.
Description
Technical Field
The invention relates to the technical field of machine translation, in particular to a sequence labeling method in natural language processing.
Background
In the field of machine translation, accurately understanding source-language text and converting it into the target language is a very challenging task. It requires not only direct translation of words and phrases, but also a deep understanding of the syntactic structure and semantic information of the language. Conventional machine translation systems, such as rule-based translation and statistical machine translation, tend to focus on local text segments while ignoring the global context and deep semantic relationships of the text, and are therefore unable to handle complex language structures and semantic expressions efficiently, such as long-distance dependencies and subtle context changes.
With the development of deep learning technology, although Neural Machine Translation (NMT) has made significant progress in dealing with these problems, there are still problems of insufficient understanding of long-distance context dependencies and insufficient processing of syntactic and semantic information. These problems directly affect translation quality, especially when dealing with complex sentence structures and text that contains rich semantics.
In addition, the conventional sequence labeling method generally processes different language features (such as parts of speech, entities, syntax structures and the like) independently in the preprocessing stage of machine translation, so that the information islanding problem is caused, namely, the lack of effective information exchange and utilization between different features, thereby limiting deep understanding and accurate translation of source text.
Therefore, there is a need for a machine translation method that can effectively incorporate sequence labeling methods in natural language processing techniques to achieve comprehensive understanding of deep semantics and structure of source text.
Disclosure of Invention
Based on the above object, the present invention provides a sequence labeling method in natural language processing.
A sequence labeling method in natural language processing comprises the following steps:
S1: receiving text data input, and preprocessing, including word segmentation, stop word removal and normalization processing, to create a foundation for subsequent sequence annotation;
S2: performing part-of-speech tagging on the preprocessed text, automatically identifying the part of speech of each word by using a deep learning model, and providing grammatical clues for entity identification in the text;
S3: applying a sequence labeling model to identify and classify the entities in the text, wherein the entities include names, places and organizations, and simultaneously labeling the attributes of the entities, such as time, quantity and position;
S4: combining a self-attention mechanism and position coding to process word sequences in sequence labeling, not only recognizing local modes, but also understanding global context, optimizing entity recognition and attribute labeling by considering global context relation and global text structure, and solving the problem of neglecting long-distance dependence in the traditional sequence labeling method;
S5: performing deep syntactic analysis based on the self-attention mechanism in S4, marking the syntactic structure of each sentence in the text, including subject-predicate-object relationships, clause and phrase boundaries, and providing structural information for semantic role labeling;
S6: introducing a cross sequence labeling mechanism, and performing cross verification and fusion on labels generated by different labeling tasks to solve the problem of information island caused by independent processing of each task in the traditional sequence labeling method, and transmitting and sharing information among different labeling tasks through cross verification;
S7: generating a deep annotation output of the text by combining the above results, wherein the output contains comprehensive information on parts of speech, entity categories and syntactic structure.
Further, the deep learning model in S2 adopts a recurrent neural network model RNN, and S2 specifically includes:
S21: inputting the preprocessed text into an RNN model designed to process sequence data, and processing the vocabulary sequence in the input text by its internal state (memory);
s22: for each vocabulary, the RNN model predicts its part of speech by considering the preceding vocabulary;
S23: in the RNN model training stage, training an RNN model by using a training data set with correct part of speech tagging, and learning a sequence mode of vocabulary and how to correctly tag the part of speech based on context by using the training data set;
S24: after training is completed, feeding the preprocessed text data into a trained RNN model for part-of-speech tagging, and outputting a part-of-speech sequence, wherein each word corresponds to a part-of-speech tag;
S25: the sequence processing capability of the RNN model is utilized to optimize the model for complex text structures and to improve the accuracy of part-of-speech tagging, including considering both preceding and following context by stacking layers or introducing bidirectional RNN structures.
Further, the sequence labeling model in S3 adopts the bidirectional encoder representations model BERT, and S3 specifically includes:
S31: inputting the preprocessed and part-of-speech tagged text into a BERT model, wherein the BERT model captures deep semantics and context relation of each word in the text by using a pre-trained contextualized word representation thereof;
S32: for each word in the text, the BERT model generates a high-dimensional vector representation that captures the contextual meaning of the word; for each word $x_i$ in the text sequence, the BERT model outputs the corresponding encoded vector $h_i$;
S33: on top of the BERT model, a sequence labeling layer is added to process the output vectors of BERT and assign an entity label to each word; the sequence labeling layer is a fully connected layer, and specifically, for each encoded vector $h_i$, the probability distribution over entity tags is calculated through the fully connected layer as $P_i = \mathrm{softmax}(W h_i + b)$, where $P_i$ is the probability distribution over entity tags for word $x_i$, $W$ is the weight matrix of the fully connected layer, $b$ is a bias term, and the softmax function converts the output into a probability distribution;
S34: in the training process, parameters of the BERT model and the sequence labeling layer are adjusted by minimizing a loss function of entity labeling, and the recognition and classification capacity of the model to the entity is optimized;
S35: after entity identification and classification are completed, attribute labeling is carried out on the identified entities by utilizing the deep semantic understanding capability of the BERT model, identifying the specific attributes of each entity.
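The fully connected tagging layer of S33 can be sketched as follows; this is a minimal NumPy illustration with assumed dimensions (5 words, hidden size 8, 4 entity tags), not the patent's implementation:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entity_tag_probs(h, W, b):
    """P_i = softmax(W h_i + b) for each encoded vector h_i.

    h: (seq_len, hidden)  encoded vectors from the encoder
    W: (num_tags, hidden) weight matrix of the fully connected layer
    b: (num_tags,)        bias term
    """
    return softmax(h @ W.T + b)

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))   # 5 words, hidden size 8 (assumed)
W = rng.normal(size=(4, 8))   # 4 entity tags, e.g. O / name / place / organization
b = np.zeros(4)
P = entity_tag_probs(h, W, b)
# each row of P is a probability distribution over entity tags for one word
```

During training (S34), the cross-entropy between each row of `P` and the true tag would be minimized.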
Further, the step S4 specifically includes:
S41: applying a position code to each word in the sequence to generate a position-dependent vector representation, ensuring that the position of the word in the text sequence can be identified, the position code being a fixed code based on sine and cosine functions;
S42: adding the position codes and word vectors of the words to obtain a comprehensive representation containing both vocabulary content and position information;
S43: in the self-attention mechanism, the attention score of each word in the sequence with respect to all other words is calculated to capture dependencies between different words; for each word $x_i$ in the sequence, its attention score $a_{ij}$ for word $x_j$ is calculated as $a_{ij} = \mathrm{softmax}_j(e_{ij})$, where $e_{ij}$ is obtained by taking the dot product of the encoded vectors of words $x_i$ and $x_j$ and represents the similarity between them;
S44: using the representation of each word in the attention score weighted sequence to obtain a weighted representation of each word in context;
S45: the weighted representation is used to identify local patterns in the text as well as global context, thereby taking into account both local and global information in the sequence annotation process.
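The dot-product attention of S43-S45 can be illustrated with a minimal NumPy sketch; the sequence length and vector dimension are arbitrary assumptions:

```python
import numpy as np

def self_attention(h):
    """For encoded vectors h (seq_len, dim):
    e_ij = h_i . h_j (dot-product similarity), a_ij = softmax_j(e_ij),
    and the context-weighted representation c_i = sum_j a_ij * h_j."""
    scores = h @ h.T                          # e_ij for all word pairs
    scores = scores - scores.max(axis=-1, keepdims=True)
    a = np.exp(scores)
    a = a / a.sum(axis=-1, keepdims=True)     # attention scores a_ij
    return a, a @ h                           # scores and weighted representations

rng = np.random.default_rng(1)
h = rng.normal(size=(6, 4))                   # 6 words, dimension 4 (assumed)
a, c = self_attention(h)
# a[i] is word i's attention distribution over all words (S43);
# c[i] is its attention-weighted contextual representation (S44)
```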
Further, the step S5 specifically includes:
S51: based on the attention scores $a_{ij}$ obtained in S43, a global syntactic dependency graph is constructed, wherein each node in the graph represents a word, the edges connecting nodes represent syntactic dependency relationships between words, and the weight of each edge is determined by the attention score;
S52: based on the syntactic dependency graph, a graph neural network processing algorithm is adopted to identify subject-predicate-object relationships, clause and phrase boundaries; the global dependency graph is converted into a series of syntactic structure labels, each label corresponding to one component or relationship in a sentence, and each sentence component and relationship is marked, yielding a detailed syntactic structure containing subject-predicate-object relationships, clause and phrase boundaries.
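The construction of the dependency graph in S51 can be sketched as follows; the threshold used to keep an edge is an illustrative assumption, as the patent only states that edge weights are determined by the attention scores:

```python
import numpy as np

def attention_to_graph(words, a, threshold=0.2):
    """Build a weighted edge list from an attention matrix a (n x n).

    Nodes are words; an edge (w_i, w_j, a_ij) is kept as a candidate
    syntactic dependency when a_ij exceeds `threshold` (an assumed cutoff).
    """
    edges = []
    n = len(words)
    for i in range(n):
        for j in range(n):
            if i != j and a[i, j] > threshold:
                edges.append((words[i], words[j], float(a[i, j])))
    return edges

words = ["the", "cat", "sat"]
a = np.array([[0.1, 0.6, 0.3],
              [0.5, 0.1, 0.4],
              [0.3, 0.6, 0.1]])   # toy attention scores, rows sum to 1
edges = attention_to_graph(words, a)
```

A graph neural network (as in S52) would then operate over this edge list to label sentence components.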
Further, the introducing cross sequence labeling mechanism in S6 specifically includes:
S61: after part-of-speech tagging, entity identification and syntactic analysis are completed, collecting tag data generated by each tagging task, wherein the tag data comprises text information obtained through analysis from different angles;
S62: designing a multi-task learning framework, wherein part of the network structure is shared to learn the general features in each labeling task, and a corresponding task network layer is reserved for each task to capture the special features of the task;
S63: in the multi-task learning process, information is transmitted through a sharing layer, so that information flow and interaction between different labeling tasks are allowed;
S64: by means of the cross-validation technology, information is cross-validated among different labeling tasks, labeling errors or contradictions are identified and corrected by comparing labeling results of the different tasks, and overall accuracy and consistency of labeling are improved;
S65: in the training process, a joint optimization strategy is adopted, the loss functions of all labeling tasks are optimized at the same time, and the mutual influence and constraint among different tasks are considered.
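One way the cross-validation between tasks (S64) could look in code is sketched below; the specific consistency rule (named-entity tokens are expected to be nouns) is a hypothetical example, not a rule stated in the method:

```python
def cross_validate_labels(pos_tags, entity_tags):
    """Flag indices where two labeling tasks contradict each other.

    Illustrative rule: a token carrying a named-entity tag (not "O")
    is expected to be a noun; other combinations are flagged for
    correction during the fusion step.
    """
    conflicts = []
    for i, (pos, ent) in enumerate(zip(pos_tags, entity_tags)):
        if ent != "O" and pos not in ("NOUN", "PROPN"):
            conflicts.append(i)
    return conflicts

pos_tags    = ["PROPN", "VERB", "NOUN", "ADP"]
entity_tags = ["PER",   "LOC",  "ORG",  "O"]
conflicts = cross_validate_labels(pos_tags, entity_tags)
# index 1 is flagged: a LOC entity tag on a VERB token is contradictory
```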
Further, the multi-task learning framework is provided with $K$ different sequence labeling tasks, each task $T_k$ corresponding to a specific labeling target; the shared network structure converts the input text $X$ into a shared feature representation $H$: $H = f_{\mathrm{shared}}(X)$, where $f_{\mathrm{shared}}$ is the transfer function of the shared layer;
The transfer function $f_{\mathrm{shared}}$ of the shared layer is defined as follows:
Input embedding: for a given input text sequence $X = (x_1, \dots, x_n)$, each word $x_i$ is first converted into an embedding vector $e_i$ in a high-dimensional space;
Position coding: to preserve the order information of the words in the sequence, a position code $p_i$ is added to each embedding vector $e_i$, generating a location-aware embedding $z_i = e_i + p_i$;
Transformer layer: the location-aware embeddings $Z = (z_1, \dots, z_n)$ are input to the Transformer layer, which computes the shared context-sensitive feature representation through self-attention and feed-forward networks: $H = \mathrm{Transformer}(Z)$.
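The fixed sine/cosine position coding used above can be sketched as follows, following the standard Transformer formulation (an assumption, since the patent does not give the exact formula):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Fixed position codes: PE[pos, 2i]   = sin(pos / 10000^(2i/d_model)),
                             PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angle = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angle)   # odd dimensions: cosine
    return pe

pe = positional_encoding(10, 8)   # 10 positions, model dimension 8 (assumed)
# the location-aware embedding is then z_i = e_i + pe[i]
```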
Further, for each labeling task $T_k$, a corresponding task network layer $f_k$ is provided to process the shared features $H$ and output the task-specific labeling result $Y_k$: $Y_k = f_k(H)$, where $f_k$ is the $k$-th task-specific transfer function;
The transfer function $f_k$ is defined as follows:
Task-specific feed-forward network: the context-sensitive features $H$ obtained from the shared layer are processed using one or more feed-forward network layers to capture task-specific patterns and relationships: $F_k = \mathrm{FFN}_k(H)$, where $\mathrm{FFN}_k$ represents the $k$-th task-specific feed-forward network;
Output processing: depending on the nature of the task, $F_k$ is converted into a probability distribution by a softmax layer for classification tasks.
Further, the joint optimization strategy comprises optimizing the loss functions $L_k$ of all labeling tasks jointly; the total loss $L_{\mathrm{total}}$ is a weighted sum of the individual task losses: $L_{\mathrm{total}} = \sum_{k=1}^{K} \lambda_k L_k(Y_k, \hat{Y}_k)$, where $\lambda_k$ is the weight of the $k$-th task, $\hat{Y}_k$ is the true annotation, and $L_k$ is the loss function measuring the difference between the predicted label $Y_k$ and the true annotation $\hat{Y}_k$.
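The weighted-sum total loss can be computed directly; the task names and weight values below are illustrative:

```python
import numpy as np

def total_loss(task_losses, weights):
    """L_total = sum_k lambda_k * L_k  (weighted sum of per-task losses)."""
    return float(np.dot(weights, task_losses))

# e.g. per-task losses for POS tagging, entity recognition, syntactic analysis
losses  = [0.8, 1.2, 0.5]
weights = [0.5, 0.3, 0.2]   # lambda_k, assumed values
L = total_loss(losses, weights)
# L = 0.5*0.8 + 0.3*1.2 + 0.2*0.5 = 0.86
```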
Further, the step S7 specifically includes:
Integrating the cross-validation results: utilizing results in a cross sequence labeling mechanism, wherein the output of each labeling task is optimized through a cross verification and fusion process, and consistent labeling information is provided for each word or phrase;
Constructing a comprehensive annotation frame: for each word or phrase in the text, aggregating the labeling results after optimizing each task into a comprehensive labeling set, wherein the comprehensive labeling set comprises information of part of speech, entity category and syntactic relation;
generating depth annotation output: and synthesizing the aggregated information to generate a depth annotation output for the whole text.
The invention has the beneficial effects that:
Through the self-attention mechanism, the invention can capture long-distance dependencies in the sequence, solving the long-distance dependency problem that may be neglected in conventional sequence labeling methods. The global perspective makes entity identification and attribute labeling more accurate and enables the understanding of contextual relationships that span long text passages. Applying the self-attention mechanism in deep syntactic analysis makes it possible to effectively mark subject-predicate-object relationships, clause and phrase boundaries in the text; this fine-grained syntactic understanding provides a solid foundation of structural information for semantic role labeling, so that semantic analysis is more accurate and thorough. The method can not only identify local patterns in the text, such as relationships between words and phrases, but also grasp the global structure and semantic flow of the whole text. This comprehensive understanding provides richer and more accurate linguistic information for high-level natural language processing tasks, thereby improving the performance and reliability of such applications.
Through the cross-sequence labeling mechanism, different labeling tasks such as part-of-speech tagging, entity identification, syntactic analysis and semantic role labeling can mutually verify and optimize one another. This mechanism effectively solves the information-island problem that may arise when each task is processed independently in conventional sequence labeling methods, and significantly improves labeling accuracy and consistency. Through a comprehensive multi-task learning framework and the application of cross-validation, text information obtained from different angles can supplement and correct one another, thereby reducing errors and enhancing the reliability of the results.
By integrating the annotation information of each level, the invention generates a deep annotation output, so that the understanding of the text is not limited to surface-level vocabulary or syntactic structure but extends to deeper semantics and relationships. This deeper understanding provides a rich and accurate information basis for subsequent advanced natural language processing tasks, thereby improving the performance and accuracy of such systems.
According to the invention, the multi-task learning process is optimized through the structures of the sharing layer and the task specific layer, different sequence labeling tasks are allowed to share the language characteristics of the bottom layer, and meanwhile, the independence and the specificity among the tasks are maintained, so that the learning efficiency is improved, the repeated calculation and the resource consumption are reduced, and the overall effect of multi-task learning is remarkably improved through the fine-granularity information fusion and mutual enhancement.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below are merely embodiments of the invention, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic flow chart of a sequence labeling method according to an embodiment of the invention;
fig. 2 is a schematic diagram of a cross sequence labeling mechanism according to an embodiment of the present invention.
Detailed Description
The present invention will be further described in detail with reference to specific embodiments in order to make the objects, technical solutions and advantages of the present invention more apparent.
It is to be noted that unless otherwise defined, technical or scientific terms used herein should be taken in a general sense as understood by one of ordinary skill in the art to which the present invention belongs. The terms "first," "second," and the like, as used herein, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
As shown in fig. 1-2, a sequence labeling method in natural language processing includes the following steps:
S1: receiving text data input, and preprocessing, including word segmentation, stop word removal and normalization processing, to create a foundation for subsequent sequence annotation;
S11: and carrying out word segmentation processing on the received text data, and segmenting the continuous text into independent vocabulary units by adopting a language specific word segmentation algorithm so as to facilitate subsequent part-of-speech tagging and entity recognition tasks.
S12: stop words are removed from the segmented text; a predefined stop-word list (covering, for example, prepositions, pronouns and auxiliary verbs) is used to remove words that have little influence on the subsequent sequence labeling task, thereby reducing noise and improving processing efficiency.
S13: normalization is performed on the remaining vocabulary, including unifying letter case, converting morphological variants (such as reducing different verb tenses to their base forms) and eliminating synonym differences, so as to reduce the diversity and complexity of the text and ensure the consistency and accuracy of the sequence labeling process.
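Steps S11-S13 can be sketched as follows for English text; the regex-based segmentation and the tiny stop-word list are illustrative stand-ins for the language-specific components the method describes:

```python
import re

STOPWORDS = {"the", "a", "an", "of", "to", "is"}   # assumed stop-word list

def preprocess(text):
    """S11: segment into word units; S12: remove stop words;
    S13: normalize case."""
    tokens = re.findall(r"[A-Za-z']+", text)       # naive word segmentation
    tokens = [t.lower() for t in tokens]           # case normalization
    return [t for t in tokens if t not in STOPWORDS]

tokens = preprocess("The cat sat to the left of a mat")
# → ['cat', 'sat', 'left', 'mat']
```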
S2: performing part-of-speech tagging on the preprocessed text, automatically identifying the part of speech of each word by using a deep learning model, and providing grammatical clues for entity identification in the text;
S3: applying a sequence labeling model to identify and classify the entities in the text, wherein the entities include names, places and organizations, and simultaneously labeling the attributes of the entities, such as time, quantity and position;
S4: combining a self-attention mechanism and position coding to process word sequences in sequence labeling, not only recognizing local modes, but also understanding global context, optimizing entity recognition and attribute labeling by considering global context relation and global text structure, and solving the problem of neglecting long-distance dependence in the traditional sequence labeling method;
S5: performing deep syntactic analysis based on the self-attention mechanism in S4, marking the syntactic structure of each sentence in the text, including subject-predicate-object relationships, clause and phrase boundaries, and providing structural information for semantic role labeling;
S6: introducing a cross sequence labeling mechanism, and performing cross verification and fusion on labels generated by different labeling tasks to solve the problem of information island caused by independent processing of each task in the traditional sequence labeling method, and transmitting and sharing information among different labeling tasks through cross verification, for example, assisting part-of-speech labeling and semantic role labeling by using a result of syntactic analysis, thereby improving the overall labeling precision and consistency;
S7: generating a deep annotation output of the text by combining the above results, wherein the output contains comprehensive information on parts of speech, entity categories and syntactic structure.
The deep learning model in S2 adopts a recurrent neural network model RNN, and S2 specifically includes the following steps:
S21: inputting the preprocessed text into an RNN model designed to process sequence data, and processing the vocabulary sequence in the input text by its internal state (memory);
S22: for each word, the RNN model predicts its part of speech by considering the preceding words; this sequence-dependent nature makes RNNs particularly suited to part-of-speech tagging tasks, since a word's part of speech typically depends on the context of neighboring words;
S23: in the RNN model training stage, training an RNN model by using a training data set with correct part of speech tagging, and learning a sequence mode of vocabulary and how to correctly tag the part of speech based on context by using the training data set;
S24: after training is completed, feeding the preprocessed text data into a trained RNN model for part-of-speech tagging, and outputting a part-of-speech sequence, wherein each word corresponds to a part-of-speech tag;
S25: the sequence processing power of the RNN model is utilized to optimize the model to process complex text structures and to improve the accuracy of part-of-speech tagging, including considering contextual information both before and after by adding levels or introducing bi-directional RNN structures.
When part-of-speech tagging is performed with a Recurrent Neural Network (RNN), the core computation involves the state update and output generation of the RNN. The following computations describe how the RNN processes sequence data for part-of-speech tagging:
State update: for each element in the sequence (here, each word in the text), the RNN calculates the current state from the current input and the previous state. The state update formula is $h_t = f(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$, where $h_t$ is the hidden state at the current time $t$; $x_t$ is the input at the current time, corresponding to a word vector; $h_{t-1}$ is the hidden state of the previous time step; $W_{xh}$ and $W_{hh}$ are the input-to-hidden and hidden-to-hidden weight matrices, respectively; $b_h$ is a bias term; and $f$ is a nonlinear activation function such as tanh or ReLU.
Output generation: the hidden state $h_t$ at each time step is used to calculate the output, i.e. the part-of-speech tag of the current word: $y_t = g(W_{hy} h_t + b_y)$, where $y_t$ is the output at time $t$, representing the probability distribution over part-of-speech tags; $W_{hy}$ is the hidden-to-output weight matrix; $h_t$ is the hidden state at the current time; $b_y$ is the bias term of the output layer; and $g$ is typically a softmax function, used to convert the output into a probability distribution that assigns a probability to each possible part-of-speech tag.
In the part-of-speech tagging process of the RNN, the model traverses each word in the text, gradually updates its state using the above computations, and generates a part-of-speech probability distribution for each word. During training, these probabilities are compared with the actual parts of speech, and the model parameters $W_{xh}$, $W_{hh}$, $W_{hy}$, $b_h$, $b_y$ are adjusted by the back-propagation algorithm to minimize the difference between the predicted and actual parts of speech, thereby improving the part-of-speech tagging capability of the model.
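The state-update and output equations above can be executed directly; the following NumPy sketch runs one randomly initialized, untrained RNN over a toy sentence (all dimensions are assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_pos_step(x_t, h_prev, Wxh, Whh, Why, bh, by):
    """One time step of the POS-tagging RNN:
    h_t = tanh(Wxh x_t + Whh h_{t-1} + bh);  y_t = softmax(Why h_t + by)."""
    h_t = np.tanh(Wxh @ x_t + Whh @ h_prev + bh)
    y_t = softmax(Why @ h_t + by)   # probability distribution over POS tags
    return h_t, y_t

rng = np.random.default_rng(2)
d_in, d_h, n_tags = 4, 3, 5          # word-vector, hidden, tag-set sizes (assumed)
Wxh = rng.normal(size=(d_h, d_in))
Whh = rng.normal(size=(d_h, d_h))
Why = rng.normal(size=(n_tags, d_h))
bh, by = np.zeros(d_h), np.zeros(n_tags)

h = np.zeros(d_h)                    # initial hidden state
for x_t in rng.normal(size=(6, d_in)):   # a 6-word sentence of word vectors
    h, y = rnn_pos_step(x_t, h, Wxh, Whh, Why, bh, by)
# y is the POS-tag distribution for the last word; training would adjust
# Wxh, Whh, Why, bh, by by back-propagation as described above
```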
The sequence labeling model in S3 adopts the bidirectional encoder representations model BERT. The BERT model is applied to perform entity recognition and classification on the preprocessed and part-of-speech-tagged text; the model learns and captures the contextual relationships of words from the text, so that different types of entities, such as names, places and organizations, can be accurately identified. In the entity recognition process, the model assigns an entity label to each word or phrase in the text; for example, person names are labeled as "names" and geographic locations as "places". This process takes advantage of the model's ability to understand and classify the different entities in the text. For identified entities, attributes such as date, time and quantity are annotated; this requires that the model not only recognize entities but also understand their specific properties or features and assign the corresponding attribute tags. When training the sequence labeling model, a large-scale corpus with detailed entity and attribute annotations is used to ensure that the model can accurately learn representations of different entities and their attributes. S3 specifically includes:
S31: inputting the preprocessed and part-of-speech tagged text into a BERT model, wherein the BERT model captures deep semantics and context relation of each word in the text by using a pre-trained contextualized word representation thereof;
S32: for each word in the text, the BERT model generates a high-dimensional vector representation that captures the contextual meaning of the word; for each word $x_i$ in the text sequence, the BERT model outputs the corresponding encoded vector $h_i$;
S33: based on the BERT model, a sequence labeling layer is added for processing the output vectors of BERT and assigning an entity label to each vocabulary item. The sequence labeling layer is a full connection layer; specifically, for each encoding vector h_i, the probability distribution of the entity tags is calculated through the full connection layer: P_i = softmax(W·h_i + b), where P_i is the probability distribution of the entity tag of word w_i, W is the weight matrix of the full connection layer, b is a bias term, and the softmax function is used to convert the output into a probability distribution;
S34: in the training process, the parameters of the BERT model and the sequence labeling layer are adjusted by minimizing the entity labeling loss function (cross-entropy loss), optimizing the model's ability to recognize and classify entities;
S35: and after the entity identification and classification are completed, the attribute marking is carried out on the identified entity by utilizing the deep semantic understanding capability of the BERT model, and the specific attribute of the entity is identified.
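The full-connection tagging layer of S33 reduces to P_i = softmax(W·h_i + b). A minimal sketch follows, assuming `h` is one encoder output vector and `W`, `b` are hypothetical, untrained layer parameters; a real system would obtain `h` from a pre-trained BERT encoder.

```python
import math

def tag_probabilities(h, W, b):
    # P_i = softmax(W . h_i + b): map an encoder output vector h_i to a
    # probability distribution over the entity tag set.
    logits = [sum(w * x for w, x in zip(row, h)) + bias
              for row, bias in zip(W, b)]
    m = max(logits)                       # stabilize the exponentials
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

Each row of `W` corresponds to one entity tag, so the output length equals the tag-set size and sums to one.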
S4 specifically comprises the following steps:
S41: applying a position code to each word in the sequence to generate a position-dependent vector representation, ensuring that the position of the word in the text sequence can be identified; the position code is a fixed code based on sine and cosine functions;
S42: adding the position codes and word vectors of the words to obtain a comprehensive representation containing both vocabulary content and position information;
S43: in the self-attention mechanism, the attention score of each word in the sequence with respect to all other words is calculated to capture the dependencies between different words. For each word w_i in the sequence, its attention score α_ij for word w_j is calculated as: α_ij = softmax_j(e_ij), where e_ij is obtained by the dot product of the encoding vectors of words w_i and w_j and represents the similarity between them;
S44: using the attention scores to weight the representations of the words in the sequence, a weighted representation of each word in context is obtained, so that the context information of the entire sequence is taken into account when processing each word;
S45: the weighted representation is used to identify local patterns in the text (e.g., usage of phrases or specific phrases) and global context (e.g., semantic streams in whole sentences or paragraphs) so that both local and global information is considered in the sequence labeling process.
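Steps S43-S44 can be sketched as dot-product attention over toy vectors. The 1/sqrt(d) scaling follows the standard Transformer convention and is an assumption here, since the text only specifies a dot product followed by softmax.

```python
import math

def attention_weights(queries, keys):
    # e_ij = (q_i . k_j) / sqrt(d); alpha_i = softmax over j of e_ij (S43).
    d = len(queries[0])
    weights = []
    for q in queries:
        scores = [sum(qa * kb for qa, kb in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights

def weighted_context(weights, values):
    # S44: each word's contextual representation is the attention-weighted
    # sum of all value vectors in the sequence.
    d = len(values[0])
    return [[sum(a * v[k] for a, v in zip(row, values)) for k in range(d)]
            for row in weights]
```

With identical query/key vectors the weights are uniform, illustrating that every position attends to the whole sequence, which is what lets the mechanism capture long-distance dependencies.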
S5 specifically comprises the following steps:
S51: based on the attention scores α_ij in S43, a global syntax dependency graph is constructed, wherein each node in the graph represents a word, the edges connecting the nodes represent the syntactic dependency relationships between words, and the weight of each edge is determined by the attention score;
S52: based on the syntax dependency graph, a graph neural network processing algorithm is adopted to identify subject-predicate-object relationships and clause and phrase boundaries; the global dependency graph is converted into a series of syntactic structure labels, each label corresponding to one component or relationship in a sentence, and each sentence component and relationship is marked, obtaining a detailed syntactic structure containing the subject-predicate-object relationships and the clause and phrase boundaries.
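One plausible reading of the graph construction in S51 is sketched below: each word's syntactic head is taken to be the word receiving its highest attention weight, and that weight becomes the edge weight. This argmax rule is an illustrative simplification, not the graph-neural-network labeling algorithm of S52.

```python
def dependency_edges(attn):
    # attn[i][j] is the attention weight of word i toward word j.
    # For each word i, choose as its head the word j != i with the highest
    # attention weight; the edge is (dependent, head, weight).
    edges = []
    for i, row in enumerate(attn):
        best_j, best_w = None, -1.0
        for j, w in enumerate(row):
            if j != i and w > best_w:
                best_j, best_w = j, w
        edges.append((i, best_j, best_w))
    return edges
```

The resulting edge list is the raw material a downstream model would classify into subject-predicate-object and boundary labels.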
The cross-sequence labeling mechanism introduced in S6 specifically includes:
S61: after part-of-speech tagging, entity identification and syntactic analysis are completed, collecting tag data generated by each tagging task, wherein the tag data comprises text information obtained through analysis from different angles;
S62: designing a multi-task learning framework, wherein part of the network structure is shared to learn the general features in each labeling task, and a corresponding task network layer is reserved for each task to capture the special features of the task;
S63: in the multi-task learning process, information is transmitted through a sharing layer, so that information flow and interaction between different labeling tasks are allowed, and knowledge learned from one task can be utilized by other tasks;
S64: by means of the cross-validation technique, information is cross-validated among the different labeling tasks; labeling errors or contradictions are identified and corrected by comparing the labeling results of different tasks, improving the overall accuracy and consistency of the labeling. Through the design of the shared layer and the corresponding task network layers, the different sequence labeling tasks can exchange and validate information in the shared layer, promoting knowledge transfer and integration among the tasks and enabling a more comprehensive understanding and processing of the text data;
S65: in the training process, a joint optimization strategy is adopted, the loss functions of all labeling tasks are optimized at the same time, and the mutual influence and constraint among different tasks are considered.
The multi-task learning framework is provided with T different sequence labeling tasks, each task t corresponding to a specific labeling target. For the shared network structure, the input text X is converted to the shared feature representation H: H = F_shared(X), where F_shared is the transfer function of the shared layer;
The transfer function F_shared of the shared layer is defined as follows:
Input embedding: for a given input text sequence X = (w_1, w_2, ..., w_n), each word w_i is first converted into an embedding vector x_i in a high-dimensional space through the embedding layer of a pre-trained word embedding model such as Word2Vec, GloVe or BERT;
Position coding: to preserve the order information of the words in the sequence, a position encoding p_i is added to each embedding vector x_i, generating the location-aware embedding x_i + p_i;
Transformer layer: the location-aware embeddings are input to the Transformer layer, which computes the shared context-sensitive feature representation through self-attention and feed-forward networks: H = Transformer(x_1 + p_1, ..., x_n + p_n).
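The fixed sine/cosine position code of the shared layer can be sketched as follows. The base constant 10000 follows the standard Transformer formulation and is an assumption, since the text does not give the constants.

```python
import math

def positional_encoding(pos, d_model):
    # PE[pos, 2k]   = sin(pos / 10000^(2k / d_model))
    # PE[pos, 2k+1] = cos(pos / 10000^(2k / d_model))
    pe = []
    for k in range(d_model):
        angle = pos / (10000 ** ((k - k % 2) / d_model))
        pe.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return pe

def add_position(embeddings):
    # Location-aware embedding: x_i + p_i for each position i.
    d = len(embeddings[0])
    return [[e + p for e, p in zip(vec, positional_encoding(i, d))]
            for i, vec in enumerate(embeddings)]
```

Because the code is a fixed function of position, no extra parameters are learned, yet every word's vector becomes position-dependent before entering the Transformer layer.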
For the corresponding task network layers, each labeling task t is provided with a network layer G_t to process the shared features H and output the task-specific labeling result Y_t: Y_t = G_t(H), where G_t is the t-th task-specific transfer function;
The transfer function G_t is defined as follows:
Task-specific feed-forward network: the context-sensitive features H obtained from the shared layer are processed using one or more feed-forward network layers to obtain features Z_t that capture task-specific patterns and relationships: Z_t = FFN_t(H), where FFN_t denotes the t-th task-specific feed-forward network;
Output processing: depending on the nature of the task, Z_t is converted into a probability distribution through a softmax layer for classification tasks;
In this way, the transfer function F_shared of the shared layer provides a generic context-sensitive feature representation, while the transfer function G_t of each task-specific layer performs its specific labeling task on those features; the shared layer allows information communication and fusion between the different tasks, enhancing the labeling performance and consistency of the whole system.
The joint optimization strategy optimizes the loss functions L_t of all labeling tasks jointly; the total loss L_total is a weighted sum of the individual task losses: L_total = Σ_t λ_t · L_t(Y_t, Y_t*), where λ_t is the weight of the t-th task, Y_t* is the true annotation, and L_t is the loss function measuring the difference between the predicted label Y_t and the true annotation Y_t*.
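The weighted-sum objective L_total = Σ_t λ_t · L_t can be sketched with per-token cross-entropy losses. The function names and the per-task token averaging are assumptions of this sketch.

```python
import math

def cross_entropy(probs, true_idx):
    # L = -log p(true label); clamp to avoid log(0) on degenerate inputs.
    return -math.log(max(probs[true_idx], 1e-12))

def joint_loss(task_outputs, task_targets, task_weights):
    # L_total = sum_t lambda_t * L_t, where each task loss L_t is the mean
    # cross-entropy over that task's tokens.
    total = 0.0
    for probs_seq, targets, lam in zip(task_outputs, task_targets, task_weights):
        task_l = sum(cross_entropy(p, t)
                     for p, t in zip(probs_seq, targets)) / len(targets)
        total += lam * task_l
    return total
```

Raising λ_t for one task (say, entity recognition) biases the shared layer toward features useful for that task, which is the lever the joint strategy uses to balance the mutual influence between tasks.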
S7 specifically comprises the following steps:
Integrating the cross-validation results: utilizing the results of the cross-sequence labeling mechanism, wherein the output of each labeling task (such as part of speech, entity category, syntactic structure) has been optimized through the cross-verification and fusion process, providing consistent labeling information for each word or phrase;
Constructing a comprehensive annotation frame: for each word or phrase in the text, the labeling results after optimizing each task are aggregated into a comprehensive labeling set, the comprehensive labeling set contains information of part of speech, entity category and syntactic relation, the aggregation considers mutual verification and information fusion in the process of labeling the cross sequences, and consistency and complementarity between labeling results of each dimension are ensured;
Generating depth annotation output: the aggregated information is synthesized to generate a deep annotation output for the whole text, the language attribute and structure of the text are described in detail by the output, and the deep annotation output comprises the part of speech, entity category, syntax role, semantic role and the like of each word or phrase.
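The aggregation in S7 amounts to aligning the per-task label sequences token by token into one comprehensive record. A minimal sketch with hypothetical field names:

```python
def merge_annotations(tokens, pos_tags, entity_tags, syntax_roles):
    # Zip the per-task label sequences into one comprehensive annotation set
    # per token, as in the depth annotation output of S7. All sequences must
    # be aligned to the same tokenization.
    assert len(tokens) == len(pos_tags) == len(entity_tags) == len(syntax_roles)
    return [{"token": tok, "pos": pos, "entity": ent, "syntax": syn}
            for tok, pos, ent, syn in zip(tokens, pos_tags, entity_tags, syntax_roles)]
```

In a full system the inputs would come from the cross-validated task outputs of S6, so conflicting labels would already have been reconciled before this merge.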
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the invention is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
The present invention is intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the present invention should be included in the scope of the present invention.
Claims (10)
1. A sequence labeling method in natural language processing, comprising the steps of:
S1: receiving text data input, and preprocessing, including word segmentation, stop word removal and normalization processing, to create a foundation for subsequent sequence annotation;
S2: performing part-of-speech tagging on the preprocessed text, and automatically identifying the part of speech of each word by using a deep learning model;
S3: identifying and classifying the entities in the text by applying a sequence labeling model, and labeling the attributes of the entities at the same time;
S4: combining a self-attention mechanism and position coding to process word sequences in sequence labeling, not only recognizing local patterns but also understanding the global context, optimizing entity recognition and attribute labeling by considering global contextual relations and the global text structure, and solving the problem of neglected long-distance dependencies in traditional sequence labeling methods;
S5: performing deep syntactic analysis based on the self-attention mechanism in S4, marking the syntactic structure of each sentence in the text, including subject-predicate-object relationships and clause and phrase boundaries, and providing structural information for semantic role labeling;
S6: introducing a cross sequence labeling mechanism, and performing cross verification and fusion on labels generated by different labeling tasks to solve the problem of information island caused by independent processing of each task in the traditional sequence labeling method, and transmitting and sharing information among different labeling tasks through cross verification;
S7: generating a depth labeling output of the text by combining the above results, wherein the output contains comprehensive information of the part of speech, the entity category and the syntactic structure.
2. The sequence labeling method in natural language processing according to claim 1, wherein the deep learning model in S2 adopts a recurrent neural network model RNN, and the S2 specifically includes:
S21: inputting the preprocessed text into an RNN model designed to process sequence data, and processing vocabulary sequences in the input text through the internal states thereof;
S22: for each vocabulary item, the RNN model predicts its part of speech by considering the preceding vocabulary;
S23: in the RNN model training stage, training an RNN model by using a training data set with correct part of speech tagging, and learning a sequence mode of vocabulary and how to correctly tag the part of speech based on context by using the training data set;
S24: after training is completed, feeding the preprocessed text data into a trained RNN model for part-of-speech tagging, and outputting a part-of-speech sequence, wherein each word corresponds to a part-of-speech tag;
S25: the sequence processing power of the RNN model is utilized to optimize the model to process complex text structures and improve the accuracy of part-of-speech tagging, including considering the contextual information both before and after a word by adding layers or introducing bidirectional RNN structures.
3. The sequence labeling method in natural language processing according to claim 2, wherein the sequence labeling model in S3 adopts the bidirectional encoder representations model BERT, and the S3 specifically includes:
S31: inputting the preprocessed and part-of-speech tagged text into a BERT model, wherein the BERT model captures deep semantics and context relation of each word in the text by using a pre-trained contextualized word representation thereof;
S32: for each word in the text, the BERT model generates a high-dimensional vector representation that captures the contextual meaning of the word; for each word w_i in the text sequence, the BERT model outputs the corresponding encoding vector h_i;
S33: based on the BERT model, a sequence labeling layer is added for processing the output vectors of BERT and assigning an entity label to each vocabulary item. The sequence labeling layer is a full connection layer; specifically, for each encoding vector h_i, the probability distribution of the entity tags is calculated through the full connection layer: P_i = softmax(W·h_i + b), where P_i is the probability distribution of the entity tag of word w_i, W is the weight matrix of the full connection layer, b is a bias term, and the softmax function is used to convert the output into a probability distribution;
S34: in the training process, parameters of the BERT model and the sequence labeling layer are adjusted by minimizing a loss function of entity labeling, and the recognition and classification capacity of the model to the entity is optimized;
S35: and after the entity identification and classification are completed, the attribute marking is carried out on the identified entity by utilizing the deep semantic understanding capability of the BERT model, and the specific attribute of the entity is identified.
4. The method for sequence labeling in natural language processing according to claim 1, wherein S4 specifically comprises:
S41: applying a position code to each word in the sequence to generate a position-dependent vector representation, ensuring that the position of the word in the text sequence can be identified; the position code is a fixed code based on sine and cosine functions;
S42: adding the position codes and word vectors of the words to obtain a comprehensive representation containing both vocabulary content and position information;
S43: in the self-attention mechanism, the attention score of each word in the sequence with respect to all other words is calculated to capture the dependencies between different words; for each word w_i in the sequence, its attention score α_ij for word w_j is calculated as: α_ij = softmax_j(e_ij), where e_ij is obtained by the dot product of the encoding vectors of words w_i and w_j and represents the similarity between them;
S44: using the representation of each word in the attention score weighted sequence to obtain a weighted representation of each word in context;
S45: the weighted representation is used to identify local patterns in the text as well as global context, thereby taking into account both local and global information in the sequence annotation process.
5. The method for sequence labeling in natural language processing according to claim 4, wherein S5 specifically comprises:
S51: based on the attention scores α_ij in S43, a global syntax dependency graph is constructed, wherein each node in the graph represents a word, the edges connecting the nodes represent the syntactic dependency relationships between words, and the weight of each edge is determined by the attention score;
S52: based on the syntax dependency graph, a graph neural network processing algorithm is adopted to identify subject-predicate-object relationships and clause and phrase boundaries; the global dependency graph is converted into a series of syntactic structure labels, each label corresponding to one component or relationship in a sentence, and each sentence component and relationship is marked, obtaining a detailed syntactic structure containing the subject-predicate-object relationships and the clause and phrase boundaries.
6. The sequence labeling method in natural language processing according to claim 1, wherein the cross-sequence labeling mechanism introduced in S6 specifically comprises:
S61: after part-of-speech tagging, entity identification and syntactic analysis are completed, collecting tag data generated by each tagging task, wherein the tag data comprises text information obtained through analysis from different angles;
S62: designing a multi-task learning framework, wherein part of the network structure is shared to learn the general features in each labeling task, and a corresponding task network layer is reserved for each task to capture the special features of the task;
S63: in the multi-task learning process, information is transmitted through a sharing layer, so that information flow and interaction between different labeling tasks are allowed;
S64: by means of the cross-validation technology, information is cross-validated among different labeling tasks, labeling errors or contradictions are identified and corrected by comparing labeling results of the different tasks, and overall accuracy and consistency of labeling are improved;
S65: in the training process, a joint optimization strategy is adopted, the loss functions of all labeling tasks are optimized at the same time, and the mutual influence and constraint among different tasks are considered.
7. The method for sequence annotation in natural language processing as claimed in claim 6, wherein the multi-task learning framework is provided with T different sequence labeling tasks; each task t corresponds to a specific labeling target, and for the shared network structure the input text X is converted to the shared feature representation H: H = F_shared(X), where F_shared is the transfer function of the shared layer;
The transfer function F_shared of the shared layer is defined as follows:
Input embedding: for a given input text sequence X = (w_1, w_2, ..., w_n), each word w_i is first converted into an embedding vector x_i in a high-dimensional space;
Position coding: to preserve the order information of the words in the sequence, a position encoding p_i is added to each embedding vector x_i, generating the location-aware embedding x_i + p_i;
Transformer layer: the location-aware embeddings are input to the Transformer layer, which computes the shared context-sensitive feature representation through self-attention and feed-forward networks: H = Transformer(x_1 + p_1, ..., x_n + p_n).
8. The method of claim 7, wherein for the corresponding task network layers, each labeling task t is provided with a network layer G_t to process the shared features H and output the task-specific labeling result Y_t: Y_t = G_t(H), where G_t is the t-th task-specific transfer function;
The transfer function G_t is defined as follows:
Task-specific feed-forward network: the context-sensitive features H obtained from the shared layer are processed using one or more feed-forward network layers to obtain features Z_t that capture task-specific patterns and relationships: Z_t = FFN_t(H), where FFN_t denotes the t-th task-specific feed-forward network;
Output processing: depending on the nature of the task, Z_t is converted into a probability distribution through a softmax layer for classification tasks.
9. The method for sequence labeling in natural language processing according to claim 8, wherein the joint optimization strategy comprises optimizing the loss functions L_t of all labeling tasks jointly; the total loss L_total is a weighted sum of the individual task losses: L_total = Σ_t λ_t · L_t(Y_t, Y_t*), where λ_t is the weight of the t-th task, Y_t* is the true annotation, and L_t is the loss function measuring the difference between the predicted label Y_t and the true annotation Y_t*.
10. The method for sequence labeling in natural language processing according to claim 9, wherein S7 specifically comprises:
Integrating the cross-validation results: utilizing results in a cross sequence labeling mechanism, wherein the output of each labeling task is optimized through a cross verification and fusion process, and consistent labeling information is provided for each word or phrase;
Constructing a comprehensive annotation frame: for each word or phrase in the text, aggregating the labeling results after optimizing each task into a comprehensive labeling set, wherein the comprehensive labeling set comprises information of part of speech, entity category and syntactic relation;
generating depth annotation output: and synthesizing the aggregated information to generate a depth annotation output for the whole text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410431577.1A CN118036577A (en) | 2024-04-11 | 2024-04-11 | Sequence labeling method in natural language processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118036577A true CN118036577A (en) | 2024-05-14 |
Family
ID=90989680
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018072563A1 (en) * | 2016-10-18 | 2018-04-26 | 中兴通讯股份有限公司 | Knowledge graph creation method, device, and system |
CN113255320A (en) * | 2021-05-13 | 2021-08-13 | 北京熙紫智数科技有限公司 | Entity relation extraction method and device based on syntax tree and graph attention machine mechanism |
WO2022078102A1 (en) * | 2020-10-14 | 2022-04-21 | 腾讯科技(深圳)有限公司 | Entity identification method and apparatus, device and storage medium |
CN114417872A (en) * | 2021-12-29 | 2022-04-29 | 航天科工网络信息发展有限公司 | Contract text named entity recognition method and system |
CN116822517A (en) * | 2023-08-29 | 2023-09-29 | 百舜信息技术有限公司 | Multi-language translation term identification method |
CN116992881A (en) * | 2023-07-14 | 2023-11-03 | 西华大学 | Method for extracting entity relationship based on context dependency perception graph convolution network |
Non-Patent Citations (2)
Title |
---|
YUAN Ni; LU Kezhi; YUAN Yuhu; SHU Zixin; YANG Kuo; ZHANG Runshun; LI Xiaodong; ZHOU Xuezhong: "Research on named entity extraction of symptom phenotypes from traditional Chinese medicine records based on deep representations", World Science and Technology - Modernization of Traditional Chinese Medicine, no. 03, 20 March 2018 (2018-03-20) *
ZHANG Le; TANG Liang; YI Mianzhu: "Research on Chinese term extraction in the military domain fusing multiple strategies", Modern Computer, no. 26, 15 September 2020 (2020-09-15) *
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |