CN114386395A - Sequence labeling method and device for multi-language text and electronic equipment - Google Patents

Sequence labeling method and device for multi-language text and electronic equipment

Info

Publication number
CN114386395A
CN114386395A (application number CN202011112593.2A)
Authority
CN
China
Prior art keywords
language
model
annotation
training
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011112593.2A
Other languages
Chinese (zh)
Inventor
王新宇
蒋勇
阮巴赫
王涛
黄非
黄忠强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202011112593.2A priority Critical patent/CN114386395A/en
Publication of CN114386395A publication Critical patent/CN114386395A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

One or more embodiments of the present specification provide a method, an apparatus, and an electronic device for sequence tagging of multilingual texts, including: obtaining training results of a plurality of single language models for respective language data sets; constructing a training sample set according to all language data sets and training results thereof; training a multi-language model by using the training sample set until the multi-language model converges; and performing sequence labeling on the text by using the converged multi-language model.

Description

Sequence labeling method and device for multi-language text and electronic equipment
Technical Field
One or more embodiments of the present disclosure relate to the technical field of computer applications, and in particular, to a method and an apparatus for sequence tagging of multilingual texts, and an electronic device.
Background
On e-commerce platforms serving international buyers, commodity description information usually involves the languages of multiple countries. When searching for goods, a buyer enters a sentence describing the actual need in a client provided by the platform, and the platform retrieves the most relevant goods based on a relevance algorithm. Sequence labeling is a key step in this relevance computation: the sentence entered by the buyer is labeled to extract key information, and matching is then performed on the extracted key information to obtain relevance scores between the goods and the actual need.
Existing sequence labeling modules usually adopt one sequence labeling model per language. However, such a model performs poorly on input in any language other than its own, which makes it difficult to meet business requirements.
Disclosure of Invention
The specification provides a sequence labeling method of a multilingual text, which comprises the following steps:
obtaining training results of a plurality of single language models for respective language data sets;
constructing a training sample set according to all language data sets and training results thereof;
training a multi-language model by using the training sample set until the multi-language model converges;
and performing sequence labeling on the text by using the converged multi-language model.
Optionally, before obtaining the training results of the plurality of single language models for the respective language data sets, the method further includes:
obtaining a data set for a first language, wherein data in the data set is a sentence with a sequence labeling result;
performing sequence annotation on the data set by using a single language model of a first language, and calculating annotation loss;
and updating the model parameters of the single language model of the first language according to the annotation loss until the single language model of the first language converges.
Optionally, the obtaining training results of the plurality of single language models for the corresponding language data sets includes:
and inputting a first sentence in the data set of the first language into the converged single language model of the first language to obtain a sequence labeling result of the first sentence.
Optionally, the sequence labeling result includes posterior probability distributions of labels corresponding to the respective words in the first sentence.
Optionally, determining whether the monolingual model of the first language converges is performed by:
and if the annotation loss is less than a preset threshold value, determining that the single language model of the first language is converged.
Optionally, the training a multi-language model using the training sample set until the multi-language model converges includes:
performing sequence annotation on the training sample set by using the multi-language model, and calculating a first annotation loss;
performing sequence annotation on the training result by using the multi-language model, and calculating a second annotation loss;
and weighting the first annotation loss and the second annotation loss based on a preset weight, and determining the multi-language model to be converged under the condition that the loss obtained by weighting is less than a preset threshold value.
Optionally, the monolingual model is a conditional random field.
Optionally, the multi-language model is a BERT model-based conditional random field;
the training of the multilingual model using the training sample set includes:
carrying out semantic representation calculation on the sentences in the training sample set by the BERT model, inputting semantic representation results of the sentences into the conditional random field, and carrying out sequence labeling on the sentences by the conditional random field based on the semantic representation results.
The present specification also proposes a device for sequential labeling of multilingual texts, said device comprising:
the first acquisition module is used for acquiring training results of a plurality of single language models for corresponding language data sets;
the construction module is used for constructing a training sample set according to all language data sets and training results thereof;
the training module is used for training the multi-language model by using the training sample set until the multi-language model converges;
and the labeling module is used for performing sequence labeling on the text by using the converged multi-language model.
Optionally, the apparatus further comprises:
a second obtaining module, configured to obtain a data set for a first language before obtaining training results of multiple single language models for corresponding language data sets, where data in the data set is a sentence with a sequence labeling result;
the calculation module is used for carrying out sequence annotation on the data set by using a single language model of a first language and calculating annotation loss;
and the updating module is used for updating the model parameters of the single language model of the first language according to the annotation loss until the single language model of the first language is converged.
Optionally, the first obtaining module is specifically configured to:
and inputting a first sentence in the data set of the first language into the converged single language model of the first language to obtain a sequence labeling result of the first sentence.
Optionally, the sequence labeling result includes posterior probability distributions of labels corresponding to the respective words in the first sentence.
Optionally, determining whether the monolingual model of the first language converges is performed by:
and if the annotation loss is less than a preset threshold value, determining that the single language model of the first language is converged.
Optionally, the training module is specifically configured to:
performing sequence annotation on the training sample set by using the multi-language model, and calculating a first annotation loss;
performing sequence annotation on the training result by using the multi-language model, and calculating a second annotation loss;
and weighting the first annotation loss and the second annotation loss based on a preset weight, and determining the multi-language model to be converged under the condition that the loss obtained by weighting is less than a preset threshold value.
Optionally, the monolingual model is a conditional random field.
Optionally, the multi-language model is a BERT model-based conditional random field;
the training module is specifically configured to:
carrying out semantic representation calculation on the sentences in the training sample set by the BERT model, inputting semantic representation results of the sentences into the conditional random field, and carrying out sequence labeling on the sentences by the conditional random field based on the semantic representation results.
This specification also proposes an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the steps of the above method by executing the executable instructions.
The present specification also contemplates a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the above-described method.
In the above technical solution, the training results of a plurality of single language models for their respective language data sets are obtained, a training sample set is constructed from all of the language data sets and their training results, and a multi-language model is trained on the training sample set until it converges; the converged multi-language model is then used to perform sequence labeling on text. In this way, when the multi-language model handles a sequence labeling task for multi-language text, it can reach an accuracy comparable to that of a single-language model handling text of its own language; that is, the accuracy of sequence labeling of multi-language text by the multi-language model is improved.
Drawings
FIG. 1 is a schematic diagram of a system for sequence annotation of multilingual text, according to an exemplary embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for sequence tagging of multilingual text according to an exemplary embodiment of the present disclosure;
FIG. 3 is a hardware block diagram of an electronic device with a sequential labeling apparatus for multiple language texts according to an exemplary embodiment of the present disclosure;
FIG. 4 is a block diagram of a device for labeling sequences of multiple language texts according to an exemplary embodiment of the present specification.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
The present specification aims to provide a technical solution for obtaining training results of a plurality of single language models for corresponding language data sets, then constructing a training sample set according to all language data sets and the training results thereof, and training a multilingual model by using the training sample set until the multilingual model converges, thereby performing sequence labeling on a text by using the converged multilingual model.
In specific implementation, in order to obtain a multi-language model for performing sequence labeling on multi-language texts, a plurality of single-language models can be set; wherein each single language model corresponds to a language. In this case, for each single language model, the single language model may be trained based on a dataset of a language corresponding to the single language model, and a training result of the single language model with respect to the dataset of the language may be obtained.
When it is determined that all of the single language models are trained and training results of the trained single language models are obtained, a training sample set may be constructed based on the multi-language sentences to which corresponding sequence labeling results are previously labeled and the training results of the trained single language models, and the set multi-language models may be trained based on the training sample set until it is determined that the multi-language models converge.
In the case that the multi-language model convergence is determined, the multi-language model can be considered to be trained completely, so that the subsequent sequence labeling task can be executed based on the trained multi-language model.
In the above technical solution, the training results of a plurality of single language models for their respective language data sets are obtained, a training sample set is constructed from all of the language data sets and their training results, and a multi-language model is trained on the training sample set until it converges; the converged multi-language model is then used to perform sequence labeling on text. In this way, when the multi-language model handles a sequence labeling task for multi-language text, it can reach an accuracy comparable to that of a single-language model handling text of its own language; that is, the accuracy of sequence labeling of multi-language text by the multi-language model is improved.
Referring to fig. 1, fig. 1 is a schematic diagram of a system for labeling sequences of multiple language texts according to an exemplary embodiment of the present disclosure. Referring to fig. 2 in conjunction with fig. 1, fig. 2 is a flowchart illustrating a method for labeling a sequence of a multilingual text according to an exemplary embodiment of the present disclosure. The sequence labeling method of the multi-language text can be applied to a server side for executing a sequence labeling task of the multi-language text; the server may be deployed on an electronic device, and the electronic device may specifically be a server or a computer, which is not limited in this specification. The sequence labeling method of the multi-language text can comprise the following steps:
step 201, obtaining training results of a plurality of single language models for corresponding language data sets;
step 202, constructing a training sample set according to all language data sets and training results thereof;
step 203, training a multi-language model by using the training sample set until the multi-language model is converged;
and step 204, performing sequence annotation on the text by using the converged multi-language model.
In the embodiment, in order to obtain a multilingual model for performing sequence labeling on multilingual texts, a plurality of single language models can be set; wherein each single language model corresponds to a language. In this case, for each single language model, the single language model may be trained based on a dataset of a language corresponding to the single language model, and a training result of the single language model with respect to the dataset of the language may be obtained.
In one embodiment shown, a preset number of sentences in multiple languages (e.g., Chinese, English, French, German, Russian, etc.) may be obtained as training samples; the preset number can be preset by a technician according to actual requirements.
It should be noted that, on one hand, any one of the sentences in the plurality of languages is pre-labeled with the corresponding sequence labeling result.
In practical applications, the sequence annotation task may be understood as follows: for a sentence, the words are ordered by position to form a text sequence, and that text sequence is then labeled. The labeling may assign each word its part of speech, or assign each word a category; which of these is used can be preset by a technician according to actual requirements, and this specification does not limit it. In addition, since a sentence can itself be regarded as a text sequence, the sequence labeling result obtained by performing sequence labeling on the sentence can also be output in sequence form; in this case, the sequence annotation result may be referred to as an annotation sequence.
Taking an english sentence as an example, assuming that the sentence is "Bob drank coffee at Starbucks", the words at the positions in the sentence are as shown in table 1 below:
Position 1    Position 2    Position 3    Position 4    Position 5
Bob           drank         coffee        at            Starbucks

TABLE 1
That is, all words in the sentence may be ordered in order from positions 1 to 5, resulting in a text sequence corresponding to the sentence: bob, drank, coffee, at, Starbucks.
After the sentence is subjected to sequence annotation, the obtained sequence annotation result can be shown in the following table 2:
Position 1    Position 2    Position 3    Position 4    Position 5
Bob           drank         coffee        at            Starbucks
Noun          Verb          Noun          Preposition   Noun

TABLE 2
That is, the sequence annotation result (i.e., annotation sequence) of the sentence may be: noun, verb, noun, preposition, noun.
On the other hand, for any one of the sentences in the plurality of languages, the language type to which the sentence belongs is the language type to which the text in the sentence belongs. For example, if the text in a sentence is a chinese text, the language type to which the sentence belongs is chinese; assuming that the text in a certain sentence is English text, the language type to which the sentence belongs is English; and so on.
In the case where the sentences in the plurality of languages are acquired, a plurality of data sets may be further created based on the sentences in the plurality of languages as a plurality of training sample sets.
It should be noted that, in the data sets, all sentences in the same data set belong to the same language type, and sentences in different data sets belong to different language types.
That is, for the above-described sentences of a plurality of languages, one data set may be created based on the sentences in which all the belonged languages are of the same kind. In this case, since there are sentences in a plurality of languages, there are correspondingly a plurality of created data sets, i.e., the number of created data sets is the same as the number of kinds of languages to which the sentences belong.
For example, assume that the number of sentences acquired is 100; further assume that there are 50 chinese sentences, 30 english sentences, and 20 french sentences. In this case, it is possible to create a data set 1 based on the 50 chinese sentences, a data set 2 based on the 30 english sentences, and a data set 3 based on the 20 french sentences; that is, the 50 chinese sentences are included in data set 1, the 30 english sentences are included in data set 2, and the 20 french sentences are included in data set 3. Since there are three types of languages to which the 100 sentences belong, there are correspondingly three data sets created.
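As an illustrative sketch only (the variable names and the toy data below are assumptions, not part of this specification), grouping the pre-labeled sentences by language into per-language data sets can be expressed as follows:

```python
# Minimal sketch: build one data set per language from pre-labeled sentences.
# Each sample is (language, text sequence, annotation sequence); sentences of the
# same language go into the same data set, so the number of data sets equals the
# number of languages represented in the samples.
from collections import defaultdict

labeled_sentences = [
    ("en", ["Bob", "drank", "coffee", "at", "Starbucks"],
           ["noun", "verb", "noun", "preposition", "noun"]),
    # ... Chinese, French, ... sentences with their annotation sequences
]

datasets = defaultdict(list)
for lang, tokens, tags in labeled_sentences:
    datasets[lang].append((tokens, tags))

# e.g. datasets["en"] now holds every English sentence together with its labeled sequence
```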
In the case where the plurality of data sets are created, the plurality of single language models may be trained based on the plurality of data sets until it is determined that the respective single language models converge.
That is, for any one of the plurality of single language models, a data set belonging to a language category corresponding to the single language model may be obtained, and the single language model may be trained based on the data set until it is determined that the single language model converges. For example, assuming that the language type corresponding to a single language model is chinese, the single language model may be trained based on a chinese dataset until it is determined that the single language model converges; assuming that the language type corresponding to a certain single language model is English, training the single language model based on an English data set until the single language model is determined to be converged; and so on.
Specifically, for a single language model, a data set belonging to a language category corresponding to the single language model may be input into the single language model, i.e., the data set is subjected to sequence labeling by using the single language model, so as to obtain a sequence labeling result of each sentence in the data set. Subsequently, a tagging loss may be calculated based on the sequence tagging result of each sentence in the data set obtained by the single language model and the sequence tagging result that each sentence in the data set is pre-tagged, and the model parameters of the single language model may be updated according to the tagging loss until it is determined that the single language model converges.
In practical applications, the single language model may be a Conditional Random Field (CRF).
To determine whether the single language model has converged, a loss of sequence labeling results for each sentence in the dataset from the single language model, relative to the sequence labeling results that were previously labeled for each sentence in the dataset, can be calculated as a labeling loss based on a common loss function of the conditional random field.
Subsequently, it may be determined whether the calculated loss of annotation is less than a preset threshold. If the calculated annotation loss is less than a preset threshold, convergence of the single language model may be determined. If the calculated annotation loss is greater than or equal to a preset threshold, the single language model is considered to be not converged, so that the model parameters of the single language model can be updated, the updated single language model is used for carrying out sequence annotation on the data set again, the sequence annotation result of each sentence in the data set obtained by the updated single language model is determined, and whether the annotation loss of the sequence annotation result which is relative to each sentence in the data set and is annotated in advance is smaller than the preset threshold or not is determined; and so on. The preset threshold may be preset by a technician, or may be a default value, which is not limited in this specification.
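A minimal sketch of this single-language training loop is given below. It assumes the pytorch-crf package for the CRF layer and uses a simple embedding plus linear layer as a stand-in encoder; the class and function names are illustrative, not part of this specification.

```python
# Minimal sketch: train one single language model (a CRF tagger) on its own data set,
# computing the annotation loss and updating parameters until the loss drops below a
# preset threshold (the convergence criterion described above).
import torch
from torch import nn
from torchcrf import CRF  # assumption: pytorch-crf package

class MonolingualTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.emissions = nn.Linear(emb_dim, num_tags)    # per-token label scores
        self.crf = CRF(num_tags, batch_first=True)

    def annotation_loss(self, token_ids, tag_ids, mask):
        scores = self.emissions(self.embed(token_ids))
        # pytorch-crf returns the log-likelihood; its negative is the annotation loss
        return -self.crf(scores, tag_ids, mask=mask, reduction='mean')

    def decode(self, token_ids, mask):
        scores = self.emissions(self.embed(token_ids))
        return self.crf.decode(scores, mask=mask)        # sequence labeling result

def train_until_converged(model, batches, threshold=0.05, max_epochs=50):
    """Update model parameters until the average annotation loss is below the threshold."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(max_epochs):
        total = 0.0
        for token_ids, tag_ids, mask in batches:
            optimizer.zero_grad()
            loss = model.annotation_loss(token_ids, tag_ids, mask)
            loss.backward()
            optimizer.step()
            total += loss.item()
        if total / len(batches) < threshold:             # convergence check
            break
    return model
```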
In the case where it is determined that the single language model converges with respect to any one of the plurality of single language models, it is considered that the training of the single language model is completed, and a training result of the trained single language model with respect to a data set belonging to a language type corresponding to the single language model can be acquired.
In practical applications, the training result may specifically include: the single language model predicts a posterior probability distribution of labels corresponding to each word in each sentence with respect to each sentence in the dataset. In addition, the training results may further include: the single language model predicts a sequence annotation result of each sentence with respect to each sentence in the data set.
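The sketch below illustrates one way to collect such a training result; it reuses the illustrative MonolingualTagger above and, as a simplifying assumption, approximates the per-token posterior with a softmax over emission scores (exact CRF marginals would require the forward-backward algorithm).

```python
# Minimal sketch: for a converged single language model, produce a posterior
# probability distribution over labels for every token of every sentence.
import torch
import torch.nn.functional as F

@torch.no_grad()
def soft_targets(model, token_ids):
    """Return a (batch, seq_len, num_tags) tensor of per-token label distributions."""
    scores = model.emissions(model.embed(token_ids))
    return F.softmax(scores, dim=-1)   # approximates q_t(y_i = j | x) per position i
```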
In this embodiment, when it is determined that all of the single language models are trained and training results of the trained single language models are obtained, a training sample set may be constructed based on sentences of the multiple languages (i.e., sentences used for training the multiple single language models) to which corresponding sequence labeling results are pre-labeled and training results of the trained single language models, and a preset multilingual model may be trained based on the training sample set until it is determined that the multilingual model converges.
For example, suppose that a chinese-based data set 1 trains a single language model 1 corresponding to chinese to obtain a trained single language model 1 and a corresponding training result 1; training a single language model 2 corresponding to English based on an English data set 2 to obtain a training result 2 corresponding to the trained single language model 2; the single language model 3 corresponding to the french is trained based on the french dataset 3, and a training result 3 corresponding to the trained single language model 3 is obtained. In this case, a chinese sentence in the data set 1, an english sentence in the data set 2, a french sentence in the data set 3, and the training results 1, 2, 3 may be constructed as a training sample set, and the preset multilingual model may be trained based on the training sample set until it is determined that the multilingual model converges.
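A minimal sketch of assembling such a training sample set is shown below; datasets and teacher_posteriors are illustrative names standing for the per-language data sets and the training results of the converged single language models.

```python
# Minimal sketch: combine every sentence of every language data set with its
# pre-labeled annotation sequence and with the training result (per-token label
# distributions) produced by the corresponding converged single language model.
def build_training_sample_set(datasets, teacher_posteriors):
    """datasets[lang]: list of (tokens, tags); teacher_posteriors[lang]: list of per-token distributions."""
    training_samples = []
    for lang, samples in datasets.items():
        for (tokens, tags), posterior in zip(samples, teacher_posteriors[lang]):
            training_samples.append({
                "language": lang,
                "tokens": tokens,
                "hard_tags": tags,               # pre-labeled annotation sequence
                "teacher_posterior": posterior,  # training result of the single language model
            })
    return training_samples
```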
In one embodiment shown, the multi-lingual model may be a conditional random field based on a BERT (Bidirectional Encoder Representations from Transformers) model.
In this case, the BERT model performs semantic representation calculation on each sentence in the training sample set in advance, the semantic representation result of each sentence output by the BERT model is input into the conditional random field, and the conditional random field performs sequence labeling on each sentence based on the semantic representation result of each sentence.
It should be noted that since the BERT model is applicable to a plurality of languages, the BERT model-based conditional random field is also applicable to a plurality of languages accordingly.
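A minimal sketch of such a BERT-based CRF is given below. It assumes the Hugging Face transformers library and the pytorch-crf package; the checkpoint name and class name are illustrative only.

```python
# Minimal sketch: a multilingual BERT encoder produces a semantic representation for
# each token, a linear layer maps it to emission scores, and a CRF layer performs the
# sequence labeling on top of those scores.
import torch
from torch import nn
from torchcrf import CRF                     # assumption: pytorch-crf package
from transformers import AutoModel           # assumption: Hugging Face transformers

class MultilingualBertCrf(nn.Module):
    def __init__(self, num_tags, pretrained="bert-base-multilingual-cased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(pretrained)
        self.emissions = nn.Linear(self.encoder.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def emission_scores(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return self.emissions(hidden)                    # (batch, seq_len, num_tags)

    def nll_loss(self, input_ids, attention_mask, tag_ids):
        # first annotation loss: negative log-likelihood against the hard labels
        scores = self.emission_scores(input_ids, attention_mask)
        return -self.crf(scores, tag_ids, mask=attention_mask.bool(), reduction='mean')

    def decode(self, input_ids, attention_mask):
        scores = self.emission_scores(input_ids, attention_mask)
        return self.crf.decode(scores, mask=attention_mask.bool())
```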
In one illustrated embodiment, to determine whether the multi-language model has been trained to convergence, on one hand, based on the commonly used loss function of the conditional random field, the loss of the sequence annotation results produced by the multi-language model for the sentences in the training sample set, relative to the sequence annotation results pre-annotated for those sentences, may be calculated (referred to as the first loss); on the other hand, the loss of the sequence annotation results produced by the multi-language model for those sentences, relative to the training results of the plurality of single language models, may be calculated (referred to as the second loss).
Specifically, the second loss may be calculated using the following equation:
L_pos = - Σ_i Σ_{j=1}^{|V|} q_t(y_i = j | x) · log q_s(y_i = j | x)

where L_pos denotes the second loss; q_t(y_i = j | x) denotes the probability that the single language model labels position i as j; q_s(y_i = j | x) denotes the probability that the multi-language model labels position i as j; and |V| denotes the size of the annotation set.
It should be noted that, for a single language model, the probability of the single language model being labeled as j at the position i can be obtained by performing data analysis on the posterior probability distribution of labels corresponding to words in each sentence predicted by the single language model with respect to each sentence in the data set.
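The sketch below computes this second loss; as an assumption, the multi-language model's distribution q_s is approximated by a softmax over its emission scores.

```python
# Minimal sketch: per-token cross-entropy between the teacher posteriors
# q_t(y_i = j | x) and the student distribution q_s(y_i = j | x), summed over the
# |V| labels and averaged over the non-padding positions.
import torch
import torch.nn.functional as F

def posterior_loss(student_emissions, teacher_posteriors, mask):
    """student_emissions, teacher_posteriors: (batch, seq_len, |V|); mask: (batch, seq_len)."""
    log_q_s = F.log_softmax(student_emissions, dim=-1)
    per_token = -(teacher_posteriors * log_q_s).sum(dim=-1)   # sum over the label set |V|
    per_token = per_token * mask.float()                       # ignore padding positions
    return per_token.sum() / mask.float().sum()
```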
After the first loss and the second loss are calculated, weighted addition may be performed on the first loss and the second loss based on a preset weight, and when the loss obtained by the weighted addition is smaller than a preset threshold, convergence of the multi-language model is determined; the weight may be preset by a technician, or may be a default value, which is not limited in this specification; the preset threshold may be preset by a technician, or may be a default value, which is not limited in this specification.
Specifically, the first loss and the second loss may be weighted and added using the following formula:
L_all = λ·L_pos + (1 - λ)·L_NLL

where L_all denotes the loss obtained by the weighted addition; L_pos denotes the second loss; L_NLL denotes the first loss; λ is the weight set for the second loss; and (1 - λ) is the weight set for the first loss. That is, the weights of the first loss and the second loss sum to 1.
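For illustration, the weighted combination and the convergence check can be sketched as follows (the values of λ and the threshold are assumptions, to be preset by a technician):

```python
# Minimal sketch: L_all = lambda * L_pos + (1 - lambda) * L_NLL, with the
# multi-language model considered converged once L_all is below a preset threshold.
def combined_loss(l_nll, l_pos, lam=0.5):
    return lam * l_pos + (1.0 - lam) * l_nll

def has_converged(l_all, threshold=0.05):
    return l_all < threshold
```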
In this embodiment, when it is determined that the multi-language model converges, it may be considered that the training of the multi-language model is completed, so that the subsequent sequence tagging task may be executed based on the trained multi-language model.
It should be further noted that the above sequence tagging method for multi-language text can be effectively applied to numerous industries such as e-commerce, telecommunication, government affairs, finance, education, entertainment, health, tourism, etc. For example, in these industries, it is often desirable to provide machine translation services to users; in the machine translation service, the sequence labeling method of the multi-language text can be adopted to perform sequence labeling on sentences of various languages input by a user, so that semantic analysis, segmented translation and the like can be performed according to a sequence labeling result, and a translation result closer to the actual semantic expressed by the user is output to the user.
In the above technical solution, the training results of a plurality of single language models for their respective language data sets are obtained, a training sample set is constructed from all of the language data sets and their training results, and a multi-language model is trained on the training sample set until it converges; the converged multi-language model is then used to perform sequence labeling on text. In this way, when the multi-language model handles a sequence labeling task for multi-language text, it can reach an accuracy comparable to that of a single-language model handling text of its own language; that is, the accuracy of sequence labeling of multi-language text by the multi-language model is improved.
Corresponding to the embodiment of the sequence labeling method of the multi-language text, the specification also provides an embodiment of a sequence labeling device of the multi-language text.
The embodiment of the device for marking the sequence of the multilingual texts can be applied to electronic equipment. The device embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. Taking a software implementation as an example, as a logical device, the device is formed by reading, by a processor of the electronic device where the device is located, a corresponding computer program instruction in the nonvolatile memory into the memory for operation. From a hardware aspect, as shown in fig. 3, the hardware structure diagram of the electronic device where the multi-language text sequence labeling apparatus is located in this specification is shown, except for the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 3, the electronic device where the apparatus is located in the embodiment may generally include other hardware according to the actual function labeled by the multi-language text sequence, which is not described again.
Referring to fig. 4, fig. 4 is a block diagram of a device for labeling sequences of multiple language texts according to an exemplary embodiment of the present disclosure. The sequence annotation device 40 for multilingual texts can be applied to the electronic device shown in fig. 3, and comprises:
a first obtaining module 401, configured to obtain training results of multiple single language models for corresponding language data sets;
a construction module 402, configured to construct a training sample set according to all language data sets and training results thereof;
a training module 403, configured to train a multi-language model using the training sample set until the multi-language model converges;
and the labeling module 404 is configured to perform sequence labeling on the text by using the converged multi-language model.
In this embodiment, the apparatus 40 further includes:
a second obtaining module 405, configured to obtain a data set for a first language before obtaining training results of multiple single language models for corresponding language data sets, where data in the data set is a sentence with a sequence tagging result;
a calculating module 406, configured to perform sequence labeling on the data set by using a monolingual model of a first language, and calculate a labeling loss;
and an updating module 407, configured to update the model parameters of the monolingual model of the first language according to the annotation loss until the monolingual model of the first language converges.
In this embodiment, the first obtaining module 401 is specifically configured to:
and inputting a first sentence in the data set of the first language into the converged single language model of the first language to obtain a sequence labeling result of the first sentence.
In this embodiment, the sequence labeling result includes a posterior probability distribution of the label corresponding to each word in the first sentence.
In this embodiment, whether the monolingual model of the first language converges is determined by:
and if the annotation loss is less than a preset threshold value, determining that the single language model of the first language is converged.
In this embodiment, the training module 403 is specifically configured to:
performing sequence annotation on the training sample set by using the multi-language model, and calculating a first annotation loss;
performing sequence annotation on the training result by using the multi-language model, and calculating a second annotation loss;
and weighting the first annotation loss and the second annotation loss based on a preset weight, and determining the multi-language model to be converged under the condition that the loss obtained by weighting is less than a preset threshold value.
In this embodiment, the monolingual model is a conditional random field.
In this embodiment, the multi-language model is a conditional random field based on a BERT model;
the training module 403 is specifically configured to:
carrying out semantic representation calculation on the sentences in the training sample set by the BERT model, inputting semantic representation results of the sentences into the conditional random field, and carrying out sequence labeling on the sentences by the conditional random field based on the semantic representation results.
The implementation process of the functions and actions of each module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments herein. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.

Claims (18)

1. A method of sequence annotation for multilingual text, the method comprising:
obtaining training results of a plurality of single language models for respective language data sets;
constructing a training sample set according to all language data sets and training results thereof;
training a multi-language model by using the training sample set until the multi-language model converges;
and performing sequence labeling on the text by using the converged multi-language model.
2. The method of claim 1, prior to obtaining training results for a plurality of single language models for respective language data sets, the method further comprising:
obtaining a data set for a first language, wherein data in the data set is a sentence with a sequence labeling result;
performing sequence annotation on the data set by using a single language model of a first language, and calculating annotation loss;
and updating the model parameters of the single language model of the first language according to the annotation loss until the single language model of the first language converges.
3. The method of claim 2, the obtaining training results for a plurality of single language models for respective language data sets, comprising:
and inputting a first sentence in the data set of the first language into the converged single language model of the first language to obtain a sequence labeling result of the first sentence.
4. The method of claim 3, wherein the sequence annotation result comprises a posterior probability distribution of the annotation corresponding to each word in the first sentence.
5. The method of claim 2, determining whether the monolingual model of the first language converges by:
and if the annotation loss is less than a preset threshold value, determining that the single language model of the first language is converged.
6. The method of claim 1, the training a multi-language model using the training sample set until the multi-language model converges, comprising:
performing sequence annotation on the training sample set by using the multi-language model, and calculating a first annotation loss;
performing sequence annotation on the training result by using the multi-language model, and calculating a second annotation loss;
and weighting the first annotation loss and the second annotation loss based on a preset weight, and determining the multi-language model to be converged under the condition that the loss obtained by weighting is less than a preset threshold value.
7. The method of any one of claims 1-6, the monolingual model being a conditional random field.
8. The method of any of claims 1-6, the multilingual model being a BERT model-based conditional random field;
the training of the multilingual model using the training sample set includes:
carrying out semantic representation calculation on the sentences in the training sample set by the BERT model, inputting semantic representation results of the sentences into the conditional random field, and carrying out sequence labeling on the sentences by the conditional random field based on the semantic representation results.
9. An apparatus for sequence annotation of multilingual text, said apparatus comprising:
the first acquisition module is used for acquiring training results of a plurality of single language models for corresponding language data sets;
the construction module is used for constructing a training sample set according to all language data sets and training results thereof;
the training module is used for training the multi-language model by using the training sample set until the multi-language model converges;
and the labeling module is used for performing sequence labeling on the text by using the converged multi-language model.
10. The apparatus of claim 9, further comprising:
a second obtaining module, configured to obtain a data set for a first language before obtaining training results of multiple single language models for corresponding language data sets, where data in the data set is a sentence with a sequence labeling result;
the calculation module is used for carrying out sequence annotation on the data set by using a single language model of a first language and calculating annotation loss;
and the updating module is used for updating the model parameters of the single language model of the first language according to the annotation loss until the single language model of the first language is converged.
11. The apparatus of claim 10, wherein the first obtaining module is specifically configured to:
and inputting a first sentence in the data set of the first language into the converged single language model of the first language to obtain a sequence labeling result of the first sentence.
12. The apparatus of claim 11, wherein the sequence annotation result comprises a posterior probability distribution of annotations corresponding to respective words in the first sentence.
13. The apparatus of claim 10, determining whether the monolingual model of the first language converges by:
and if the annotation loss is less than a preset threshold value, determining that the single language model of the first language is converged.
14. The apparatus of claim 9, wherein the training module is specifically configured to:
performing sequence annotation on the training sample set by using the multi-language model, and calculating a first annotation loss;
performing sequence annotation on the training result by using the multi-language model, and calculating a second annotation loss;
and weighting the first annotation loss and the second annotation loss based on a preset weight, and determining the multi-language model to be converged under the condition that the loss obtained by weighting is less than a preset threshold value.
15. The apparatus of any one of claims 9-14, the monolingual model being a conditional random field.
16. The apparatus of any of claims 9-14, the multilingual model being a BERT model-based conditional random field;
the training module is specifically configured to:
carrying out semantic representation calculation on the sentences in the training sample set by the BERT model, inputting semantic representation results of the sentences into the conditional random field, and carrying out sequence labeling on the sentences by the conditional random field based on the semantic representation results.
17. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1 to 8 by executing the executable instructions.
18. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method of any one of claims 1 to 8.
CN202011112593.2A 2020-10-16 2020-10-16 Sequence labeling method and device for multi-language text and electronic equipment Pending CN114386395A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011112593.2A CN114386395A (en) 2020-10-16 2020-10-16 Sequence labeling method and device for multi-language text and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011112593.2A CN114386395A (en) 2020-10-16 2020-10-16 Sequence labeling method and device for multi-language text and electronic equipment

Publications (1)

Publication Number Publication Date
CN114386395A true CN114386395A (en) 2022-04-22

Family

ID=81194255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011112593.2A Pending CN114386395A (en) 2020-10-16 2020-10-16 Sequence labeling method and device for multi-language text and electronic equipment

Country Status (1)

Country Link
CN (1) CN114386395A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110941945A (en) * 2019-12-02 2020-03-31 百度在线网络技术(北京)有限公司 Language model pre-training method and device
CN111695344A (en) * 2019-02-27 2020-09-22 阿里巴巴集团控股有限公司 Text labeling method and device
CN111738004A (en) * 2020-06-16 2020-10-02 中国科学院计算技术研究所 Training method of named entity recognition model and named entity recognition method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695344A (en) * 2019-02-27 2020-09-22 阿里巴巴集团控股有限公司 Text labeling method and device
CN110941945A (en) * 2019-12-02 2020-03-31 百度在线网络技术(北京)有限公司 Language model pre-training method and device
CN111738004A (en) * 2020-06-16 2020-10-02 中国科学院计算技术研究所 Training method of named entity recognition model and named entity recognition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XINYU WANG et al.: "Structure-level knowledge distillation for multilingual sequence labeling", arXiv, 4 May 2020 (2020-05-04), pages 1-14 *

Similar Documents

Publication Publication Date Title
US10417350B1 (en) Artificial intelligence system for automated adaptation of text-based classification models for multiple languages
JP6829559B2 (en) Named place name dictionary for documents for named entity extraction
US20190287142A1 (en) Method, apparatus for evaluating review, device and storage medium
CN105988990B (en) Chinese zero-reference resolution device and method, model training method and storage medium
US20070118351A1 (en) Apparatus, method and computer program product for translating speech input using example
JP6462970B1 (en) Classification device, classification method, generation method, classification program, and generation program
CN111444320A (en) Text retrieval method and device, computer equipment and storage medium
CN108475264B (en) Machine translation method and device
CN106778878B (en) Character relation classification method and device
US11651015B2 (en) Method and apparatus for presenting information
US20210004438A1 (en) Identifying entity attribute relations
CN111753082A (en) Text classification method and device based on comment data, equipment and medium
US20220391647A1 (en) Application-specific optical character recognition customization
Korpusik et al. Data collection and language understanding of food descriptions
CN114528413B (en) Knowledge graph updating method, system and readable storage medium supported by crowdsourced marking
CN110413996B (en) Method and device for constructing zero-index digestion corpus
CN110516175B (en) Method, device, equipment and medium for determining user label
CN109614494B (en) Text classification method and related device
CN114386395A (en) Sequence labeling method and device for multi-language text and electronic equipment
CN113919354A (en) Natural language enhancement processing method and device for text countermeasure
JP5342574B2 (en) Topic modeling apparatus, topic modeling method, and program
CN112579774A (en) Model training method, model training device and terminal equipment
JP5398638B2 (en) Symbol input support device, symbol input support method, and program
Saquete et al. Combining automatic acquisition of knowledge with machine learning approaches for multilingual temporal recognition and normalization
CN117951303B (en) Text information relevance analysis method and equipment based on generation type large model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination