CN111428500A - Named entity identification method and device


Info

Publication number
CN111428500A
Authority
CN
China
Prior art keywords
training
label
model
labels
probability
Prior art date
Legal status
Granted
Application number
CN201910018256.8A
Other languages
Chinese (zh)
Other versions
CN111428500B (en)
Inventor
丁瑞雪
谢朋峻
马春平
李林琳
司罗
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910018256.8A
Publication of CN111428500A
Application granted
Publication of CN111428500B
Legal status: Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The embodiment of the invention provides a named entity identification method and a device, wherein the named entity identification method comprises the following steps: obtaining labels of unit words in a text training sample output by a first model and a first probability corresponding to the labels, wherein the labels comprise at least one unknown label; replacing each unknown label with a plurality of preset labels of the unit word according to the unit word corresponding to each unknown label, and acquiring a second probability corresponding to each preset label; and performing second model training by using the text training sample according to the labels of the unit words and the first probabilities corresponding to the labels, the plurality of preset labels of the unit words corresponding to each unknown label and the second probabilities corresponding to the preset labels, so as to perform named entity recognition on the text through the first model and the second model. By the embodiment of the invention, the accuracy and efficiency of named entity identification are improved, and the cost of named entity identification is reduced.

Description

Named entity identification method and device
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a named entity identification method and device.
Background
With the rapid development of computer technology, upper-layer applications for handling various tasks have emerged one after another. For many upper-layer applications that process text information, named entity recognition of the text is their foundation.
Named Entity Recognition (NER) refers to the recognition of entities with specific meanings in text, mainly including names of people, places, organizations, times, proper nouns, and the like. At present, mainstream named entity recognition is based on supervised machine learning: a model learns from manually labeled training data, a named entity recognition model is eventually generated, and the model is then applied in actual scenarios to perform named entity recognition on text.
However, in the actual training process, manually labeled training data often has missing labels, or sometimes several sets of labeled data need to be fused while an entity may be labeled in one set but not in another; such abnormal data are called incompletely labeled data. The result of a model trained on incompletely labeled data is often much worse than that of a model trained on high-quality labeled data, and the resulting named entity recognition model cannot meet practical application requirements.
Therefore, incompletely labeled data are currently verified manually, so that the training data are corrected as a whole and the model training effect is improved. However, the manual approach is inefficient, which makes the training cost of the named entity recognition model high and increases the cost of performing named entity recognition on text.
Disclosure of Invention
In view of this, embodiments of the present invention provide a named entity recognition scheme, so as to solve the problem in the prior art that the cost for recognizing a named entity in a text is high due to low training efficiency and high training cost in training a named entity recognition model based on incomplete annotation data.
According to a first aspect of the embodiments of the present invention, there is provided a named entity identification method, including: obtaining labels of unit words in a text training sample output by a first model and a first probability corresponding to the labels, wherein the labels comprise at least one unknown label; replacing each unknown label with a plurality of preset labels of the unit word according to the unit word corresponding to each unknown label, and acquiring a second probability corresponding to each preset label; and performing second model training by using the text training sample according to the labels of the unit words and the first probabilities corresponding to the labels, the plurality of preset labels of the unit words corresponding to each unknown label and the second probabilities corresponding to the preset labels, so as to perform named entity recognition on the text through the first model and the second model.
According to a second aspect of the embodiments of the present invention, there is provided a named entity identifying apparatus, including: the first obtaining module is used for obtaining labels of unit words in a text training sample output by a first model and a first probability corresponding to the labels, wherein the labels comprise at least one unknown label; the second obtaining module is used for replacing each unknown label with a plurality of preset labels of the unit words according to the unit words corresponding to each unknown label and obtaining a second probability corresponding to each preset label; and the training module is used for performing second model training by using the text training sample according to the labels of the unit words and the first probabilities corresponding to the labels, the plurality of preset labels of the unit words corresponding to each unknown label and the second probabilities corresponding to the preset labels, so as to perform named entity recognition on the text through the first model and the second model.
According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus, including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the named entity identification method in the first aspect.
According to a fourth aspect of embodiments of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the named entity recognition method according to the first aspect.
According to the named entity recognition scheme provided by the embodiment of the invention, the unknown tags among the unit-word tags and the corresponding unit words are further processed on the basis of the tags of the text unit words output by the first model and the first probabilities corresponding to those tags: a plurality of preset labels and a second probability for each preset label are set for the unit word corresponding to each unknown tag, and second model training is performed in combination with the output of the first model, so that the real label of each such unit word in the text training sample to which it belongs can be determined. Therefore, a named entity recognition model with higher accuracy can be obtained through training of the second model, manual verification of the unit words corresponding to unknown labels is not needed, model training efficiency is improved, and model training cost is reduced. Furthermore, the cost of using the first model and the second model to perform named entity recognition on text is reduced, and the recognition accuracy and efficiency are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some of the embodiments described in the embodiments of the present invention, and a person skilled in the art can also obtain other drawings based on these drawings.
Fig. 1 is a flowchart illustrating steps of a named entity recognition method according to a first embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of a named entity recognition method according to a second embodiment of the present invention;
fig. 3 is a block diagram of a named entity recognition apparatus according to a third embodiment of the present invention;
fig. 4 is a block diagram illustrating a named entity recognition apparatus according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the scope of protection of the embodiments of the present invention.
The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.
Example one
Referring to fig. 1, a flowchart illustrating steps of a named entity identification method according to a first embodiment of the present invention is shown.
The named entity identification method of the embodiment comprises the following steps:
step S102: and obtaining the labels of the unit words in the text training samples output by the first model and the first probability corresponding to the labels.
Wherein the tag comprises at least one unknown tag. In the embodiment of the present invention, a unit word may be a single character, a word composed of a plurality of characters, or a combination of single characters and words, where the characters include, but are not limited to: characters, letters, numbers, symbols, etc. of various languages. Taking "the obama visits china" as an example, if unit words take the form of single characters, the sentence includes 7 unit words, each character being a unit word; if unit words take the form of words, the sentence may include 3 unit words, namely "obama", "visit", and "china"; if unit words take the form of a combination of single characters and words, the sentence may include, for example, 4 unit words, but is not limited thereto, and more unit words may be included, such as splitting "obama" into 3 single characters and then forming 5 unit words together with "visit" and "china", or splitting "china" into 2 single characters and then forming 4 unit words together with "obama" and "visit", and so on. In practical applications, those skilled in the art can adopt any appropriate form of unit word according to actual requirements, and the specific splitting manner can also be chosen appropriately by those skilled in the art according to actual requirements, which is not limited in the embodiments of the present invention.
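The following minimal Python sketch is an illustration only (the segmentations, names, and the helper below are assumptions, not part of the embodiment); it merely shows the three unit-word granularities described above.

    # A minimal sketch (not from the patent) showing that a "unit word" may be a single
    # character, a word, or a mixture of both; the segmentations are illustrative only.
    def to_unit_words(tokens):
        """Each element of `tokens` is treated as one unit word."""
        return list(tokens)

    # word granularity: 3 unit words, as in the example "obama" / "visit" / "china"
    words = to_unit_words(["obama", "visit", "china"])

    # character granularity: every character is its own unit word
    # (for the original-language sentence of the example this yields 7 unit words)
    chars = to_unit_words("obamavisitchina")

    # mixed granularity: e.g. split "china" into single characters and keep the rest whole
    mixed = to_unit_words(["obama", "visit"] + list("china"))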
At present, the common way to realize named entity recognition is to convert the named entity recognition task into a tag labeling task over a text sequence, i.e., to label each unit word of an input sentence. A tag is composed of "prefix + type": for example, the prefix B indicates that the unit word is the beginning of an entity, the prefix I indicates that the unit word is inside an entity, the prefix E indicates that the unit word is the end of an entity, and the prefix S indicates that the unit word is a single-word entity; the type part indicates the entity category, such as PER (person), LOC (location), or ORG (organization) in the examples below.
For example, continuing the above example, suppose the labels defined in the dataset include B-PER, I-PER, E-PER, B-LOC, I-LOC, E-LOC, and O, and unit words take the form of one character per unit word. Then, for the character "o" (the first character of "obama"), the first probability of the B-PER label is 1.5, the first probability of the corresponding I-PER label is 0.4, the first probability of the corresponding E-PER label is 0.2, the first probability of the corresponding O label is 0.1, the first probability of the corresponding B-LOC label is 0.08, the first probability of the corresponding I-LOC label is 0.01, and the first probability of the corresponding E-LOC label is 0.05.
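For illustration only, the example above can be rendered as a mapping from labels to first probabilities for one unit word; the structure and names below are assumptions about the first model's output format, not the patent's actual data structure.

    # Hypothetical rendering of the first model's output for one unit word: a mapping
    # from each label in the dataset to its first probability (here, an unnormalized
    # score), using the numbers quoted in the example above.
    first_probs_for_o_char = {            # the first character of "obama"
        "B-PER": 1.5, "I-PER": 0.4, "E-PER": 0.2, "O": 0.1,
        "B-LOC": 0.08, "I-LOC": 0.01, "E-LOC": 0.05,
    }

    best_label = max(first_probs_for_o_char, key=first_probs_for_o_char.get)
    print(best_label)                     # B-PER: most likely the beginning of a person name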
In the embodiment of the present invention, attention is mainly paid to the "other"-type tag, i.e., the "O" tag, among all the tags output by the first model. Since this tag cannot be assigned to the tag set of any entity type, the embodiment of the present invention refers to it as an unknown tag, and refers to the other tags as entity tags.
In practical applications, the first model may be any suitable model that converts the named entity recognition task into a tag labeling task over a text sequence, including but not limited to the Bi-LSTM (Bidirectional Long Short-Term Memory) network model.
Step S104: and replacing each unknown label with a plurality of preset labels of the unit word according to the unit word corresponding to each unknown label, and acquiring a second probability corresponding to each preset label.
For convenience of explanation, among the unit words output by the first model, if an unknown tag is associated with a certain unit word, that unit word is referred to as a non-entity word. It should be understood by those skilled in the art, however, that "non-entity word" does not mean that the unit word has only an unknown tag; other entity tags may also be associated with it, and the unit word is called a non-entity word only when the unknown tag associated with it is being processed. For example, when the character w0 corresponds to the three labels B-PER, B-ORG, and O, it is called a non-entity word when its O label is subsequently processed, so as to distinguish this case from the others.
For each non-entity word output by the first model, a plurality of different preset labels may be set. The preset labels may be the labels defined in the dataset used by the first model, may be a plurality of labels obtained by performing label statistics on all text training samples, or may be the labels with which each non-entity word has been annotated in the text training samples. It should be noted that the plurality of preset labels may still include the unknown label.
In the embodiment of the present invention, the second probability refers to the probability, computable by the second model, that a unit word in a text sentence takes a certain label; it may also be referred to as an edge probability. After performing label statistics on all text training samples as described above, for each unit word, the label types with which it has been annotated are counted, and the ratio of the number of times each label type was annotated to the total number of annotations of that unit word is calculated. For example, assuming that the "middle" character is labeled as B-LOC 1000 times, as B-PER 150 times, and as O 100 times, the second probability of the B-LOC label for the "middle" character is 1000/(1000+150+100) = 0.8, the second probability of the B-PER label is 150/(1000+150+100) = 0.12, and the second probability of the O label is 100/(1000+150+100) = 0.08. The second probability corresponding to each preset label of a non-entity word may be initialized in this way, or set to other appropriate preset values, and is subsequently updated during the training of the second model.
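A small sketch of this count-based computation of the second (edge) probabilities is given below; the function name and input format are assumptions for illustration.

    # Assumed helper (not from the patent): derive the second (edge) probability of each
    # label of a unit word from annotation counts over all text training samples.
    def edge_probabilities(label_counts):
        total = sum(label_counts.values())
        return {label: count / total for label, count in label_counts.items()}

    # The "middle" character example above:
    probs = edge_probabilities({"B-LOC": 1000, "B-PER": 150, "O": 100})
    print(probs)   # {'B-LOC': 0.8, 'B-PER': 0.12, 'O': 0.08}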
The plurality of preset labels corresponding to each non-entity word and the second probabilities corresponding to these preset labels are used as a basis for the subsequent training of the second model. The second probabilities can serve as weights to distinguish the multiple different label paths corresponding to the preset labels, so that the most effective label path is determined, and the accuracy and hit rate of named entity recognition performed by the trained named entity recognition model (comprising the first model and the second model) are improved.
Step S106: and performing second model training by using the text training sample according to the labels of the unit words and the first probabilities corresponding to the labels, the plurality of preset labels of the unit words corresponding to each unknown label and the second probabilities corresponding to the preset labels, so as to perform named entity recognition on the text through the first model and the second model.
On the basis of obtaining the first probabilities corresponding to the labels of each unit word output by the first model and the second probability corresponding to each of the plurality of preset labels of each non-entity word, the text training samples can continue to be used to train the second model. The second model includes, but is not limited to, a CRF (Conditional Random Field) model.
At present, traditional named entity recognition model training usually adopts an "input layer + encoding layer + CRF" structure. The input layer first maps each unit word in a sentence to a vector; the encoding layer encodes the context information of each unit word through a Bi-LSTM network model and outputs a vector as the encoding of that unit word; this encoding is used as the input of the CRF model. The CRF model calculates the ratio of the score of the path on which the correct labels of the unit words lie to the sum of the scores of all paths of the sentence, uses this ratio as the optimization target of the whole named entity recognition model, and updates the parameters of the CRF model with a gradient descent algorithm for each batch of input data so as to maximize this target.
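A minimal PyTorch-style sketch of such an input layer + encoding layer is shown below. It is an assumption for illustration (the patent does not prescribe an implementation): it maps each unit word to a vector, encodes context with a Bi-LSTM, and projects each position to per-label scores, which play the role of the first probabilities; the CRF / Partial CRF scoring itself is not included here.

    # Minimal sketch (assumed implementation, not the patent's code) of the input layer
    # and Bi-LSTM encoding layer that produce per-label scores for every unit word.
    import torch
    import torch.nn as nn

    class BiLSTMEncoder(nn.Module):
        def __init__(self, vocab_size, num_labels, emb_dim=100, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)      # input layer: unit word -> vector
            self.lstm = nn.LSTM(emb_dim, hidden_dim,
                                batch_first=True, bidirectional=True)  # context encoding
            self.proj = nn.Linear(2 * hidden_dim, num_labels)   # per-label emission scores

        def forward(self, token_ids):                           # token_ids: (batch, seq_len)
            encoded, _ = self.lstm(self.embed(token_ids))
            return self.proj(encoded)                           # (batch, seq_len, num_labels)

    scores = BiLSTMEncoder(vocab_size=5000, num_labels=7)(torch.randint(0, 5000, (1, 7)))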
In order to adapt to incompletely labeled data, the embodiment of the invention improves the training mode of the named entity recognition model. In the traditional task, the label of each unit word is determined; in the embodiment of the invention, only the annotated non-O labels are regarded as known labels (such as B-PER, I-LOC, and the like), i.e., entity labels, and all O labels are regarded as unknown labels. Whereas under the traditional task the CRF model calculates the ratio of the score of the path on which the correct labels lie to the sum of the scores of all paths, here a Partial CRF model calculates, for sentences containing unknown labels, the ratio of the sum of the scores of all label paths compatible with the known labels (with each unknown label replaced by its plurality of preset labels) to the sum of the scores of all paths.
The training of the second model, such as the Partial CRF model described above, is iterative. After each iteration, the relevant parameters (including the second probabilities) in the second model are updated according to the training result, and the second model then continues to be trained according to the first probabilities, the plurality of preset labels corresponding to each non-entity word, and the updated second probabilities corresponding to those preset labels, until a training termination condition is reached. The training termination condition may be that the value of the loss function (Loss Function) reaches a set threshold, that the accuracy no longer rises, that a set number of training rounds has been reached, and the like.
In one possible approach, this step can be implemented as follows: for the unit word (i.e., non-entity word) corresponding to each unknown label, determining a plurality of label paths corresponding to the plurality of preset labels of that unknown label; obtaining a path reference score of each label path according to the first probability or second probability corresponding to each unit word on that label path; obtaining a path score of each label path according to the second probability corresponding to the preset label on that label path and the path reference score of that label path; obtaining the sum of the path scores of the plurality of label paths according to the path score of each label path; and performing iterative training on the second model according to the sum of the path scores until a training termination condition is reached.
For example, take a sentence A comprising two characters w0 and w1, where the labels in the dataset are B-PER, I-PER, and O. Suppose the first probability of the B-PER label for w0 is 1.5, and the first probabilities of its I-PER and O labels are both 0; suppose w1 corresponds to a B-PER label with a first probability of 0.01, an I-PER label with a first probability of 0.8, and an O label with a first probability of 0.5. For the O label of w1, the preset labels are still set to the labels in the dataset, and for ease of distinction they are denoted B-PER', I-PER', and O'. Assume that B-PER', I-PER', and O' correspond to second probabilities of 0.1, 0.7, and 0.03, respectively.
When calculating the score of each label path, the CRF model needs to use a label transition matrix and label transition scores. Therefore, a simple example of a label transition matrix used in this embodiment is shown in Table 1 below:
TABLE 1

From \ To   START    B-PER    I-PER    B-ORG    S-ORG    O       END
START       0        0.8      0.007    0.7      0.0008   0.9     0.08
B-PER       0        0.6      0.9      0.2      0.0006   0.6     0.009
I-PER       -1       0.5      0.53     0.55     0.0003   0.85    0.008
B-ORG       0.9      0.5      0.0003   0.25     0.8      0.77    0.006
S-ORG       -0.9     0.45     0.007    0.7      0.65     0.76    0.2
O           0        0.65     0.0007   0.7      0.0008   0.9     0.08
END         0        0        0        0        0        0       0
In Table 1, in order to increase the robustness of the label transition matrix, a START tag and an END tag are additionally added, where the START tag represents the start of a sentence and the END tag represents the end of a sentence; their meaning and labeling can be found in existing label transition matrix examples and are not further detailed here. In Table 1, the value of each element represents the score of transitioning from the row label to the column label; for example, the value "0.9" in row 3, column 4 represents a score of 0.9 for transitioning from a "B-PER" tag to an "I-PER" tag. In addition, it should be understood by those skilled in the art that Table 1 only shows part of the labels and scores of a label transition matrix, and practical applications are not limited to the labels and values in Table 1. Moreover, the labels and values shown in Table 1 are set and updated in the same way as in existing methods, which is not described again here.
Based on the above settings: (1) for w1, when its label is "O", the corresponding label paths are B-PER->B-PER', B-PER->I-PER', and B-PER->O', where B-PER is the label of w0; (2) correspondingly, the path reference score of B-PER->B-PER' is 1.5+0.1+0.6 = 2.2, the path reference score of B-PER->I-PER' is 1.5+0.7+0.9 = 3.1, and the path reference score of B-PER->O' is 1.5+0.03+0.6 = 2.13; (3) correspondingly, the path score of B-PER->B-PER' is 2.2 × 0.1 = 0.22, the path score of B-PER->I-PER' is 3.1 × 0.7 = 2.17, and the path score of B-PER->O' is 2.13 × 0.03 = 0.0639; (4) when the label of w1 is "O", the sum of the path scores of these label paths is SUM1 = 0.22 + 2.17 + 0.0639 = 2.4539; (5) the sum of the scores of all paths between w0 and w1 is SUM2 = 2.4539 + (1.5+0.01+0.6) + (1.5+0.8+0.9) = 7.7639, and SUM1/SUM2 = 2.4539/7.7639 ≈ 0.3160. Further, the parameters in the second model, including the second probabilities, can be updated according to the difference between this ratio and a set threshold, after which the process returns to (2) and is repeated iteratively until the training termination condition is reached.
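The arithmetic in this example can be reproduced directly; the sketch below recomputes SUM1, SUM2, and the ratio from the quantities listed above (variable names are illustrative assumptions).

    # Reproduces the worked example above. w0 takes B-PER with first probability 1.5;
    # for w1's O label the preset labels B-PER', I-PER', O' have second probabilities
    # 0.1, 0.7, 0.03; the transition scores come from Table 1 (O' uses the O column).
    first_prob_w0_bper = 1.5
    second_prob = {"B-PER'": 0.1, "I-PER'": 0.7, "O'": 0.03}
    transition_from_bper = {"B-PER'": 0.6, "I-PER'": 0.9, "O'": 0.6}

    path_scores = {}
    for label, p2 in second_prob.items():
        reference = first_prob_w0_bper + p2 + transition_from_bper[label]   # (2) path reference score
        path_scores[label] = reference * p2                                 # (3) path score

    sum1 = sum(path_scores.values())                        # (4) 0.22 + 2.17 + 0.0639 = 2.4539
    sum2 = sum1 + (1.5 + 0.01 + 0.6) + (1.5 + 0.8 + 0.9)    # (5) all paths between w0 and w1: 7.7639
    print(round(sum1, 4), round(sum2, 4), round(sum1 / sum2, 3))   # 2.4539 7.7639 0.316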
After the above training process, the second model can determine, for each non-entity word whose label output by the first model is "O", the real label of that word in the sentence to which it belongs.
According to the embodiment of the invention, a plurality of labels, namely a plurality of preset labels, and a second probability for each preset label are preset for each non-entity word. On this basis, the second model can be trained for the unit word corresponding to each unknown label in combination with the output result of the first model, so as to determine the real label of that unit word in the text training sample, such as a sentence, to which it belongs. Therefore, a named entity recognition model with higher accuracy can be obtained through training of the second model, manual verification of the unit words corresponding to unknown labels is not needed, model training efficiency is improved, and model training cost is reduced. Furthermore, the cost of using the first model and the second model to perform named entity recognition on text is reduced, and the recognition accuracy and efficiency are improved.
The named entity recognition method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, mobile terminals (such as tablet computers, mobile phones and the like), PCs and the like.
Example two
Referring to fig. 2, a flowchart illustrating steps of a named entity recognition method according to a second embodiment of the present invention is shown.
In this embodiment, the named entity identification method according to the embodiment of the present invention is described by taking as an example the case in which the first model is a Bi-LSTM network model and the second model is a Partial CRF model.
The named entity identification method of the embodiment comprises the following steps:
step S202: and acquiring all text training samples, and equally dividing all the text training samples into a plurality of sample subsets of a set number.
By dividing all the text training samples into a plurality of sample subsets with set number of parts, the cross training of the first model and the second model can be realized. In the embodiments of the present invention, unless otherwise specified, the terms "a plurality of" and "a plurality" mean two or more.
The basic idea of cross training is to divide the raw data into groups: one part is used as a training set to train the model, and the other part is used as a test set to evaluate the model. Through cross training, the prediction performance of the model, in particular the performance of the trained model on new data, can be effectively evaluated, and as much effective information as possible can be obtained from limited data.
The specific value of the set number can be chosen by a person skilled in the art according to actual needs. Optionally, the set number of sample subsets is two, that is, the plurality of sample subsets consists of two sample subsets. Setting the number to two effectively achieves the effect of cross training while avoiding the situation in which too many subsets prolong the training time without improving the effect.
For example, suppose the text training samples include 500 sentences. The 1st to 250th sentences can be used as the training sample subset to train the first model and the second model, and the 251st to 500th sentences can be used as the test sample subset to test the trained models. Then the 251st to 500th sentences can be used as the training sample subset and the 1st to 250th sentences as the test sample subset, and the training and testing process is repeated.
Based on this, when the second model is subsequently trained, it may be cross-iteratively trained using the plurality of sample subsets, performing cross training the same number of times as the number of sample subsets: if the number of sample subsets is N, N rounds of cross training are performed, and in the i-th round (1 ≤ i ≤ N) the i-th sample subset is used as the test sample subset while the other sample subsets are used as the training sample subsets. In this way, efficient division of the sample data and efficient training of the model are achieved. It should be noted that cross training is an efficient model training method, but in practical applications those skilled in the art may also use other suitable training methods, such as training on all samples, which can likewise effectively improve the accuracy of label annotation.
In addition, because the training of the second model is an iterative training mode, the second model can be cross-trained the same number of times as the number of the sample subsets in each iterative training process of the second model. For example, assuming that the number of iterative training is 1000, and 2 sample subsets X and Y are used, in each iteration, 2 cross-trainings are performed, where X may be used as a training sample subset for the first time, and Y may be used as a training sample subset for the second time. The cross training was performed a total of 2000 times throughout the iterative training process.
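A short sketch of this nesting is given below (an assumption for illustration; the training and updating functions are placeholders): with 2 sample subsets and 1000 iterations, 2 × 1000 = 2000 cross trainings are performed in total, the two subsets swapping the training and test roles within each iteration.

    # Sketch of cross training nested inside iterative training; train_second_model and
    # update_second_probabilities are assumed placeholders, not the patent's functions.
    def train_with_nested_cross_training(subsets, num_iterations,
                                         train_second_model, update_second_probabilities):
        trainings = 0
        for k in range(num_iterations):                  # K-th iterative training, 1 <= K <= M
            for i, test_subset in enumerate(subsets):    # i-th cross training, 1 <= i <= N
                train_subsets = [s for j, s in enumerate(subsets) if j != i]
                model = train_second_model(train_subsets)
                update_second_probabilities(model, test_subset)
                trainings += 1
        return trainings

    total = train_with_nested_cross_training([["X"], ["Y"]], 1000,
                                             lambda train: None, lambda model, test: None)
    print(total)   # 2000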
Step S204: the first model is trained using a subset of training samples.
In order to simplify the training process and reduce the training cost, a cross-training mode is also used for the first model in this embodiment, and the training sample subsets and test sample subsets used by the first model are consistent with those used by the second model.
As mentioned above, the first model in this embodiment is a Bi-LSTM network model, which adopts an existing model structure and training method and is therefore not described in detail here. After completing training on the current training sample subset, the Bi-LSTM network model can output results for the training samples in that subset, including the labels of each unit word of each sentence in the training samples and the corresponding first probabilities.
Step S206: and acquiring a label of each unit word in a text training sample output by the first model and a first probability corresponding to the label.
Wherein the tag comprises at least one unknown tag.
The specific implementation of this step can refer to the related description in the first embodiment and is not repeated here. Each label of each unit word output by the Bi-LSTM network model and its corresponding first probability will be used as the training basis of the second model.
Step S208: and replacing each unknown label with a plurality of preset labels of the unit word according to the unit word corresponding to each unknown label, and acquiring a second probability corresponding to each preset label.
Before training the second model, such as the Partial CRF model in this embodiment, it is necessary to determine, for each unit word corresponding to an unknown label (i.e., each non-entity word), the plurality of corresponding preset labels and the second probability corresponding to each preset label.
Unlike the traditional approach in which the output of the Bi-LSTM network model is directly fed into a CRF model for subsequent training, in the embodiment of the invention a plurality of preset labels, and a second probability for each preset label, are preset for each unit word whose label output by the Bi-LSTM network model is O.
The initial values of the plurality of preset labels corresponding to each non-entity word and of the second probability corresponding to each preset label can be obtained in the following way: first, all text training samples used for model training are acquired; each unit word in all the text training samples and the label probability corresponding to each label with which it is annotated in all the text training samples are acquired; the plurality of preset labels corresponding to each unit word and the initial values of the second probabilities are generated according to the annotated labels and their corresponding label probabilities; then, the plurality of preset labels corresponding to each non-entity word and the initial value of the second probability corresponding to each preset label can be determined.
For example, suppose the text training samples comprise 500 sentences containing the "middle" character, and the "middle" character is labeled as B-LOC 1000 times, as B-PER 150 times, and as O 100 times. Then the initial value of the second probability of the B-LOC label for the "middle" character is 1000/(1000+150+100) = 0.8, the initial value of the second probability of the B-PER label is 150/(1000+150+100) = 0.12, and the initial value of the second probability of the O label is 100/(1000+150+100) = 0.08.
Step S210: and performing the K-th iterative training on the second model.
Where K is greater than or equal to 1 and less than or equal to M, and M is a preset number of iterative training times of the second model, which may be set by a person skilled in the art as appropriate according to an actual situation, and the embodiment of the present invention is not limited thereto.
Step S212: the ith cross-training was started.
The cross training is nested within each iterative training, where 1 ≤ i ≤ N and N is the number of sample subsets. In this embodiment, when the i-th cross training is performed, the i-th sample subset is used as the test sample subset and the other sample subsets are used as the training sample subsets, so as to cross-train the second model, i.e., the Partial CRF model in this embodiment.
Specifically, when the ith cross training is performed on the Partial CRF model, the second model may be trained according to each label of each unit word and the first probability of each label in the current training sample subset, a plurality of preset labels corresponding to each non-entity word, and the second probability corresponding to each preset label.
In the current cross training, as described above, for each non-entity word, a plurality of label paths corresponding to a plurality of preset labels of the current non-entity word may be determined; obtaining a path reference score of each label path according to the first probability or the second probability corresponding to each unit word in each label path in the plurality of label paths; obtaining a path score of each label path according to a second probability corresponding to a preset label in each label path and a path reference score of each label path; obtaining the sum of the path scores of the plurality of label paths according to the path score of each label path; then, a second model is trained based on the path score sums.
For the training process of the Partial CRF model within one cross training, reference can be made to the example in the first embodiment of sentence A, which includes the two characters w0 and w1 with the dataset labels B-PER, I-PER, and O; it is not described in detail again here.
It should be noted that, on the basis of obtaining the sum of the path scores of the multiple label paths and the sum of the path scores of all paths, the loss value of this round of training may be determined according to a loss function preset in the Partial CRF model, and the parameters (including but not limited to the second probabilities) in the Partial CRF model are then updated based on the loss value. That is, after each cross training, the second probabilities corresponding to the preset labels of the current non-entity words are updated according to the training result.
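The text specifies the optimization target only as the ratio described above; a common loss formulation consistent with it is sketched below as an assumption (using log-domain scores, so that, as noted later in this embodiment, the division becomes a subtraction).

    # Assumed loss sketch (not spelled out verbatim in the patent): negative log of the
    # ratio between the sum of scores of the label paths consistent with the known/preset
    # labels and the sum of scores of all paths.
    import math

    def partial_crf_loss(sum_consistent_paths, sum_all_paths):
        return -(math.log(sum_consistent_paths) - math.log(sum_all_paths))

    # Reusing SUM1 and SUM2 from the earlier worked example:
    print(round(partial_crf_loss(2.4539, 7.7639), 4))   # ≈ 1.1518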
As can be seen from the above process, performing cross training on the second model the same number of times as the number of copies of the plurality of sample subsets comprises: acquiring a training sample subset and a test sample subset for current cross training from the multiple sample subsets; training the second model using the subset of training samples; updating a second probability corresponding to a plurality of preset labels of each unit word in the test sample subset according to the training result; then, step S214 described below may be performed, returning to determining the training sample subset and the test sample subset for the current cross training from the multiple sample subsets, and continuing the execution until the same number of times of cross training as the number of copies of the multiple sample subsets is completed.
For example, a training sample subset X and a testing sample subset Y are obtained, and in the cross training, a Partial CRF model is trained by using X; and then, updating the second probability of the plurality of preset labels of each unit word in the Y according to the result of the cross training. In the next cross training, a Partial CRF model is trained by using Y; then, according to the result of the cross training, updating the second probability of the plurality of preset labels of each unit word in the X.
Therefore, through N times of cross training, the second probabilities corresponding to the preset labels of each unit word in the multi-sample subset can be updated.
Step S214: setting i = i + 1 and judging whether i is larger than N; if not, returning to step S212 to continue execution; if yes, executing step S216.
And after the ith cross training is finished, increasing i by 1, and if i is not more than N, performing the next round of cross training.
Step S216: performing a named entity recognition test on the test sample subset by using the second model after this round of cross training and the updated second probabilities of each unit word in the plurality of sample subsets, and obtaining and recording the test result.
By testing the effect of the second model with the test sample subset after the current round of cross training is completed, the effect of the current round of cross training can be known, so as to prepare for subsequently selecting the parameters corresponding to the model with the best effect.
Step S218: if K is not greater than M, returning to step S210 to continue execution; otherwise, step S220 is executed.
And if K is not larger than M, the second model does not complete all iterative training, and the next iterative training needs to be executed. Otherwise, the second model is indicated to complete the specified iterative training, and the subsequent processing can be carried out.
Step S220: and determining an optimal test result from the recorded test results, and determining the training parameters of the second model corresponding to the optimal test result as final parameters of the second model.
Wherein the training parameters include, but are not limited to: the second probability, the tag transition parameter in the tag transition matrix, and so on.
As can be seen from the above process, in this embodiment, for each unit word and its labels output by the Bi-LSTM network model, the training flow is as follows: (1) the second probability of each label of each unit word is initialized, all text training samples are equally divided into N (N ≥ 2) sample subsets, and the O label corresponding to each non-entity word is replaced with its plurality of preset labels; (2) the K-th (1 ≤ K ≤ M) iterative training is started; (3) N-fold cross training is started, wherein in the i-th (1 ≤ i ≤ N) training the sample subsets other than the i-th sample subset are used as the training sample subsets and the i-th sample subset is used as the test sample subset; (4) in the i-th training, the Bi-LSTM network model performs the corresponding character vector or word vector mapping on the input characters or words, each character or word encoded with context information is input into the Partial CRF model, the sum of the scores of the label paths consistent with the correct labels, weighted by the second probabilities, is calculated, its ratio to the sum of the scores of all paths is used as the optimization target (in the case of logarithmic probabilities, the division becomes a subtraction), and the parameters are updated by gradient descent; (5) after the i-th training, the trained model is used to update the second probabilities of the plurality of preset labels of each unit word in the i-th (test) sample subset; (6) steps (3) to (5) are repeated until the N rounds of cross training of this iteration are completed; (7) the second model after this round of cross training and the updated second probabilities are used to perform a named entity recognition test on the test sample subset, and the test result is recorded; (8) steps (2) to (7) are repeated until the M iterative trainings are completed; (9) the optimal test result is determined from the recorded results, and the training parameters of the Partial CRF model corresponding to the optimal result are taken as the final parameters of the second model.
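The nine steps above can be summarized as the following skeleton, given purely as an illustrative sketch: every function name is an assumed placeholder, and the internals of the Bi-LSTM network model and the Partial CRF model are omitted.

    # Skeleton mirroring the order of operations in steps (1)-(9); all callables are
    # assumed placeholders supplied by the caller.
    def train_ner_model(samples, N, M, init_second_probabilities, replace_o_with_preset_labels,
                        train_bilstm, train_partial_crf, predict_second_probabilities, evaluate):
        second_probs = init_second_probabilities(samples)                  # (1)
        samples = replace_o_with_preset_labels(samples)                    # (1)
        subsets = [samples[j::N] for j in range(N)]                        # (1) N >= 2 equal subsets
        results = []
        for k in range(M):                                                 # (2) K-th iteration
            for i in range(N):                                             # (3) N-fold cross training
                train_set = [s for j in range(N) if j != i for s in subsets[j]]
                test_set = subsets[i]
                bilstm = train_bilstm(train_set)                           # (4) first model encoding
                crf = train_partial_crf(bilstm, train_set, second_probs)   # (4) ratio target, log-domain
                second_probs = predict_second_probabilities(crf, test_set, second_probs)  # (5)
                # (6) the inner loop completes the N cross trainings of this iteration
            results.append((evaluate(crf, test_set), crf, second_probs))   # (7) record test result
            # (8) the outer loop repeats until the M iterative trainings are completed
        best = max(results, key=lambda r: r[0])                            # (9) best test result
        return best[1], best[2]                                            # final Partial CRF parameters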
It can be seen from the above process that the whole training process iterates M times. When the Partial CRF model calculates the sum of the label path scores, the score of each label path is the path reference score of that label path multiplied by the second probability of that label path, and this second probability is predicted on the corresponding sample subset by the Partial CRF model obtained in the previous iteration. For example, in one example, after the first iterative training is completed, a Partial CRF model is obtained which can calculate the second probabilities that a unit word in the training sample, such as "Farm", takes O, PER, and ORG, respectively. If the second probability is larger for O and smaller for PER and ORG, then during the second iterative training the weight of the label path in which "Farm" takes O is larger, and the weights of the label paths in which it takes PER or ORG are smaller.
Therefore, on the one hand, the Partial CRF model can be trained on incompletely labeled data; on the other hand, a Partial CRF model, instead of the traditional CRF model, can be used to model sentences containing unknown labels; furthermore, in the Partial CRF model a second probability is introduced as a weight to distinguish different paths; and multiple rounds of iterative training are used, so that global information becomes available locally.
After determining the relevant parameters of the first model and the relevant parameters of the second model, named entity recognition can be performed on the text by using the first model and the second model.
In summary, according to this embodiment, the unknown labels and the corresponding unit words are processed on the basis of the labels of the text unit words output by the first model and the first probabilities corresponding to those labels. In the embodiment of the present invention, a plurality of labels, that is, a plurality of preset labels, and a second probability for each preset label are preset for the unit word corresponding to each unknown label. On this basis, second model training can be performed for the unit word corresponding to each unknown label in combination with the output result of the first model, so as to determine the real label of that unit word in the text training sample, such as a sentence, to which it belongs. Therefore, a named entity recognition model with higher accuracy can be obtained through training of the second model, manual verification of the unit words corresponding to unknown labels is not needed, model training efficiency is improved, and model training cost is reduced. Furthermore, the cost of using the first model and the second model to perform named entity recognition on text is reduced, and the recognition accuracy and efficiency are improved.
The named entity recognition method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, mobile terminals (such as tablet computers, mobile phones and the like), PCs and the like.
EXAMPLE III
Referring to fig. 3, a block diagram of a named entity recognition apparatus according to a third embodiment of the present invention is shown.
The named entity recognition apparatus of the present embodiment includes: a first obtaining module 302, configured to obtain labels of unit words in a text training sample output by a first model and a first probability corresponding to the labels, where the labels include at least one unknown label; a second obtaining module 304, configured to replace, according to a unit word corresponding to each unknown tag, each unknown tag with a plurality of preset tags of the unit word, and obtain a second probability corresponding to each preset tag; the training module 306 is configured to perform second model training using the text training sample according to the labels of the unit words and the first probabilities corresponding to the labels, and the plurality of preset labels of the unit words corresponding to each unknown label and the second probabilities corresponding to each preset label, so as to perform named entity recognition on the text through the first model and the second model.
According to the named entity recognition device, on the basis of the first probability corresponding to the label of the text unit word and the label of each text unit word output by the first model, the unknown label and the corresponding unit word are processed. Therefore, the named entity recognition model with higher accuracy can be obtained through training the second model, manual verification of unit words corresponding to unknown labels is not needed, model training efficiency is improved, and model training cost is reduced. Furthermore, the cost of using the first model and the second model to identify the named entities of the text is reduced, and the identification accuracy and efficiency are improved.
The named entity recognition apparatus of this embodiment is used to implement the corresponding named entity recognition method in the foregoing method embodiments, and relevant portions may refer to the description in the foregoing method embodiments, which is not described herein again.
Example four
Referring to fig. 4, a block diagram of a named entity recognition apparatus according to a fourth embodiment of the present invention is shown.
The named entity recognition apparatus of the present embodiment includes: a first obtaining module 402, configured to obtain a label of each unit word in a text training sample output by a first model and a first probability corresponding to the label, where the label includes at least one unknown label; a second obtaining module 404, configured to replace, according to a unit word corresponding to each unknown tag, each unknown tag with a plurality of preset tags of the unit word, and obtain a second probability corresponding to each preset tag; a training module 406, configured to perform second model training using the text training sample according to the label of each unit word and the first probability corresponding to the label, and the multiple preset labels of the unit word corresponding to each unknown label and the second probability corresponding to each preset label, so as to perform named entity recognition on the text through the first model and the second model.
Optionally, the first model comprises a Bi-LSTM network model and the second model comprises a CRF model.
Optionally, the CRF model comprises a Partial CRF (Partial conditional random field) model.
Optionally, the training module 406 includes: the determining sub-module 4060 is configured to determine, for each unit word corresponding to each unknown tag, multiple tag paths corresponding to multiple preset tags of the unit word; the first calculating submodule 4062 is configured to obtain a path reference score of each label path according to the first probability or the second probability corresponding to each unit word in each label path in the plurality of label paths; the second calculating submodule 4064 is configured to obtain a path score of each label path according to a second probability corresponding to a preset label in each label path and the path reference score of each label path; a third calculating sub-module 4066, configured to obtain a sum of the path scores of the multiple label paths according to the path score of each label path; an iteration sub-module 4068, configured to perform iterative training on the second model according to the sum of the path scores until a training termination condition is reached.
Optionally, the named entity identifying device of this embodiment further includes: the dividing module 408 is configured to divide all the text training samples into multiple sample subsets of a set number of parts; the training module 406 is configured to perform cross iterative training on the second model by using the multiple sample subsets according to the label of each unit word and the first probability corresponding to the label, and the multiple preset labels of the unit word corresponding to each unknown label and the second probability corresponding to each preset label.
Optionally, the training module 406 is configured to perform cross training on the second model for the same number of times as the number of the multiple sample subsets in each iterative training process of the second model according to the label of each unit word and the first probability corresponding to the label, and the multiple preset labels of the unit word and the second probability corresponding to each unknown label.
Optionally, the number of the plurality of sample subsets is two.
Optionally, the named entity identifying device of this embodiment further includes: a generating module 410, configured to obtain all text training samples for network model training; acquiring each unit word in all the text training samples and the label probability corresponding to the labeled label of each unit word in all the text training samples; and generating a plurality of preset labels corresponding to each unit word and an initial value of the second probability according to the marked labels and the corresponding label probabilities.
Optionally, the training module 406 obtains a training sample subset and a testing sample subset for current cross training from the multiple sample subsets; training the second model using the subset of training samples; updating a second probability corresponding to a plurality of preset labels of each character in the test sample subset according to the training result; returning to the operation of determining the training sample subset and the testing sample subset for the current cross training from the multiple sample subsets, and continuing to execute until the cross training is completed for the same times as the multiple sample subsets.
Optionally, the training module 406 is further configured to perform a named entity recognition test on the test sample subset by using the second model after the cross training is completed and the updated second probability of each unit word in the multiple sample subset, obtain a test result, and record the test result.
The named entity recognition apparatus of this embodiment is used to implement the corresponding named entity recognition method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again. In addition, the specific implementation of the named entity recognition apparatus of this embodiment can refer to the description of the relevant parts in the foregoing method embodiments, and is not repeated herein.
EXAMPLE five
Referring to fig. 5, a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention is shown, and the specific embodiment of the present invention does not limit the specific implementation of the electronic device.
As shown in fig. 5, the electronic device may include: a processor (processor)502, a Communications Interface 504, a memory 506, and a communication bus 508.
Wherein:
the processor 502, communication interface 504, and memory 506 communicate with one another via a communication bus 508.
A communication interface 504 for communicating with other electronic devices such as a terminal device or a server.
The processor 502, configured to execute the program 510, may specifically perform relevant steps in the above named entity recognition method embodiment.
In particular, program 510 may include program code that includes computer operating instructions.
The processor 502 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement an embodiment of the invention. The electronic device comprises one or more processors, which can be the same type of processor, such as one or more CPUs; or may be different types of processors such as one or more CPUs and one or more ASICs.
And a memory 506 for storing a program 510. The memory 506 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 510 may specifically be used to cause the processor 502 to perform the following operations: acquiring a label of each unit word in a text training sample output by a first model and a first probability corresponding to the label, wherein the label comprises at least one unknown label; replacing each unknown label with a plurality of preset labels of the unit word according to the unit word corresponding to each unknown label, and acquiring a second probability corresponding to each preset label; and performing second model training by using the text training sample according to the label of each unit word, the first probability corresponding to the label, a plurality of preset labels of the unit word corresponding to each unknown label and the second probability corresponding to each preset label, so as to perform named entity recognition on the text through the first model and the second model.
In an alternative embodiment, the first model comprises a Bi-LSTM network model and the second model comprises a CRF model.
In an alternative embodiment, the CRF model comprises a Partial CRF model.
In an optional implementation manner, the program 510 is further configured to enable the processor 502, when performing second model training by using the text training sample according to the tag of each unit word and the first probability corresponding to the tag, and the multiple preset tags of the unit word corresponding to each unknown tag and the second probability corresponding to each preset tag, determine, for the unit word corresponding to each unknown tag, multiple tag paths corresponding to the multiple preset tags of the unit word; obtaining a path reference score of each label path according to the first probability or the second probability corresponding to each unit word in each label path in the plurality of label paths; obtaining a path score of each label path according to a second probability corresponding to a preset label in each label path and a path reference score of each label path; obtaining the sum of the path scores of the plurality of label paths according to the path score of each label path; and performing iterative training on the second model according to the path score sum until a training termination condition is reached.
In an optional implementation manner, the program 510 is further configured to enable the processor 502 to equally divide all of the text training samples into a set number of sample subsets before performing the second model training using the text training samples according to the label of each unit word and the first probability corresponding to the label, the plurality of preset labels of the unit word corresponding to each unknown label, and the second probability corresponding to each preset label; the program 510 is further configured to cause the processor 502 to perform cross-iterative training on the second model using the plurality of sample subsets when performing the second model training using the text training samples.
In an alternative embodiment, the program 510 is further configured to cause the processor 502, when performing the cross-iterative training on the second model using the plurality of sample subsets, to cross-train the second model, during each iterative training of the second model, the same number of times as the number of sample subsets.
In an alternative embodiment, the number of the plurality of sample subsets is two.
In an alternative embodiment, the program 510 is further configured to enable the processor 502 to obtain all of the text training samples used for model training before replacing each unknown label with the plurality of preset labels of the corresponding unit word; acquire each unit word in all the text training samples and the label probability corresponding to the annotated label of each unit word in all the text training samples; and generate the plurality of preset labels corresponding to each unit word and an initial value of the second probability according to the annotated labels and the corresponding label probabilities.
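A hedged sketch of how the preset labels and the initial second probabilities might be derived from the annotated corpus is shown below; the tiny corpus and the relative-frequency initialization are assumptions made for illustration only.

    from collections import Counter, defaultdict

    # Toy annotated corpus: each sample is a list of (unit word, annotated label) pairs.
    annotated_samples = [
        [("张", "B-PER"), ("三", "I-PER"), ("去", "O"), ("北", "B-LOC"), ("京", "I-LOC")],
        [("北", "B-LOC"), ("京", "I-LOC"), ("人", "O")],
        [("张", "B-PER"), ("北", "O")],
    ]

    # Count, for every unit word, the labels it was annotated with across all samples.
    label_counts = defaultdict(Counter)
    for sample in annotated_samples:
        for word, label in sample:
            label_counts[word][label] += 1

    # Preset labels of a unit word = the labels it has been annotated with; the initial
    # second probability of each preset label is its relative frequency for that word.
    preset_labels = {
        word: {label: count / sum(counts.values()) for label, count in counts.items()}
        for word, counts in label_counts.items()
    }
    print(preset_labels["北"])   # e.g. {'B-LOC': 0.666..., 'O': 0.333...}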
In an alternative embodiment, the program 510 is further configured to cause the processor 502, when cross-training the second model the same number of times as the number of sample subsets, to: obtain a training sample subset and a test sample subset for the current cross training from the plurality of sample subsets; train the second model using the training sample subset; update, according to the training result, the second probabilities corresponding to the plurality of preset labels of each unit word in the test sample subset; and return to the step of determining the training sample subset and the test sample subset for the current cross training from the plurality of sample subsets and continue execution until the cross training has been performed the same number of times as the number of sample subsets.
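The cross-training loop described above can be sketched as follows; train_second_model and update_second_probabilities are hypothetical placeholders standing in for the actual second-model training and probability-update code.

    def cross_train(sample_subsets, train_second_model, update_second_probabilities):
        """Cross-train the second model as many times as there are sample subsets.

        In every round one subset is held out as the test sample subset, the second
        model is trained on the remaining subsets, and the second probabilities of
        the preset labels of the unit words in the held-out subset are updated from
        the training result.
        """
        model = None
        for round_idx in range(len(sample_subsets)):
            test_subset = sample_subsets[round_idx]
            train_subsets = [s for i, s in enumerate(sample_subsets) if i != round_idx]
            model = train_second_model(train_subsets)
            update_second_probabilities(model, test_subset)
        return model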
In an alternative embodiment, the program 510 is further configured to enable the processor 502 to perform, after the cross training has been completed the same number of times as the number of sample subsets, a named entity recognition test on the test sample subset by using the cross-trained second model and the updated second probability of each unit word in the plurality of sample subsets, and to obtain and record a test result.
For specific implementation of each step in the program 510, reference may be made to corresponding steps and corresponding descriptions in units in the above embodiment of the named entity identifying method, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
According to the electronic device of this embodiment, the unknown labels and their corresponding unit words are processed on the basis of the label of each text unit word output by the first model and the first probability corresponding to that label. A named entity recognition model with higher accuracy can therefore be obtained by training the second model, manual verification of the unit words corresponding to unknown labels is not needed, model training efficiency is improved, and model training cost is reduced. Furthermore, the cost of using the first model and the second model to perform named entity recognition on text is reduced, and the recognition accuracy and efficiency are improved.
It should be noted that, in the embodiments of the present invention, the unit words are exemplified as single characters; however, those skilled in the art will understand that the named entity recognition scheme of the embodiments of the present invention can also be implemented, with reference to the corresponding embodiments, using unit words in other forms.
In addition, optionally, the various probabilities in the embodiments of the present invention, such as the first probability and the second probability, may all adopt a logarithmic probability form, so as to reduce the amount of computation and improve the speed and efficiency of named entity recognition. This is not limiting, however; other probability forms are equally applicable to the solution of the embodiments of the present invention.
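As a small illustration of why the logarithmic form reduces computation: products of many per-word probabilities become sums of log probabilities, avoiding repeated multiplication and numerical underflow. The numbers below are arbitrary.

    import math

    probs = [0.9, 0.8, 0.95, 0.7]               # arbitrary per-word probabilities

    product = 1.0
    for p in probs:                              # probability form: repeated multiplication
        product *= p

    log_sum = sum(math.log(p) for p in probs)    # logarithmic form: a single summation

    assert abs(math.log(product) - log_sum) < 1e-12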
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.
The above-described method according to an embodiment of the present invention may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium, so that the method described herein may be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the named entity recognition method described herein. Further, when a general-purpose computer accesses code for implementing the named entity recognition method illustrated herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the named entity recognition method illustrated herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The above embodiments are only for illustrating the embodiments of the present invention and not for limiting the embodiments of the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.

Claims (20)

1. A named entity recognition method, comprising:
obtaining labels of unit words in a text training sample output by a first model and a first probability corresponding to the labels, wherein the labels comprise at least one unknown label;
replacing each unknown label with a plurality of preset labels of the unit word according to the unit word corresponding to each unknown label, and acquiring a second probability corresponding to each preset label;
and performing second model training by using the text training sample according to the labels of the unit words and the first probabilities corresponding to the labels, the plurality of preset labels of the unit words corresponding to each unknown label and the second probabilities corresponding to the preset labels, so as to perform named entity recognition on the text through the first model and the second model.
2. The method of claim 1, wherein the first model comprises a bidirectional long short-term memory (Bi-LSTM) network model and the second model comprises a Conditional Random Field (CRF) model.
3. The method of claim 2, wherein the CRF model comprises a Partial conditional random field (Partial CRF) model.
4. The method of claim 3, wherein performing the second model training using the text training sample comprises:
determining a plurality of label paths corresponding to a plurality of preset labels of each unknown label aiming at the unit word corresponding to the unknown label;
obtaining a path reference score of each label path according to the first probability or the second probability corresponding to each unit word in each label path in the plurality of label paths;
obtaining a path score of each label path according to a second probability corresponding to a preset label in each label path and a path reference score of each label path;
obtaining the sum of the path scores of the plurality of label paths according to the path score of each label path;
and performing iterative training on the second model according to the path score sum until a training termination condition is reached.
5. The method of any of claims 1-4, further comprising: equally dividing all of the text training samples into a set number of sample subsets;
the performing of the second model training using the text training sample comprises: performing cross-iterative training on the second model using the plurality of sample subsets.
6. The method of claim 5, wherein,
in each iterative training of the second model, cross training is performed on the second model the same number of times as the number of the plurality of sample subsets.
7. The method of claim 6, wherein the number of the plurality of sample subsets is two.
8. The method of claim 6, further comprising:
acquiring all text training samples for model training;
acquiring each unit word in all the text training samples and the label probability corresponding to the labeled label of each unit word in all the text training samples;
and generating a plurality of preset labels corresponding to each unit word and an initial value of the second probability according to the marked labels and the corresponding label probabilities.
9. The method of claim 8, wherein the cross training comprises:
acquiring a training sample subset and a test sample subset for current cross training from the multiple sample subsets;
training the second model using the subset of training samples;
updating a second probability corresponding to a plurality of preset labels of each unit word in the test sample subset according to the training result;
and returning to the step of determining the training sample subset and the test sample subset for the current cross training from the plurality of sample subsets, and continuing execution until the cross training has been performed the same number of times as the number of the plurality of sample subsets.
10. The method of claim 9, further comprising:
and performing a named entity recognition test on the test sample subset by using the cross-trained second model and the updated second probability of each unit word in the plurality of sample subsets, and obtaining and recording a test result.
11. A named entity recognition apparatus comprising:
the first obtaining module is used for obtaining labels of unit words in a text training sample output by a first model and a first probability corresponding to the labels, wherein the labels comprise at least one unknown label;
the second obtaining module is used for replacing each unknown label with a plurality of preset labels of the unit words according to the unit words corresponding to each unknown label and obtaining a second probability corresponding to each preset label;
and the training module is used for performing second model training by using the text training sample according to the labels of the unit words and the first probabilities corresponding to the labels, the plurality of preset labels of the unit words corresponding to each unknown label and the second probabilities corresponding to the preset labels, so as to perform named entity recognition on the text through the first model and the second model.
12. The apparatus of claim 11, wherein the first model comprises a bidirectional long short-term memory (Bi-LSTM) network model and the second model comprises a Conditional Random Field (CRF) model.
13. The apparatus of claim 12, wherein the CRF model comprises a partial conditional random field (Partial CRF) model.
14. The apparatus of claim 13, wherein the training module comprises:
the determining submodule is used for determining a plurality of label paths corresponding to a plurality of preset labels of each unknown label aiming at the unit word corresponding to the unknown label;
the first calculation submodule is used for obtaining a path reference score of each label path according to the first probability or the second probability corresponding to each unit word in each label path in the plurality of label paths;
the second calculation submodule is used for obtaining the path score of each label path according to the second probability corresponding to the preset label in each label path and the path reference score of each label path;
the third calculation submodule is used for obtaining the sum of the path scores of the label paths according to the path score of each label path;
and the iteration submodule is used for carrying out iterative training on the second model according to the path score sum until a training termination condition is reached.
15. The apparatus of any one of claims 11-14,
the device further comprises: the dividing module is used for equally dividing all the text training samples into a plurality of sample subsets with set parts;
and the training module is used for performing cross iterative training on the second model by using the multiple sample subsets according to the labels of the unit words and the first probabilities corresponding to the labels, the multiple preset labels of the unit words corresponding to each unknown label and the second probabilities corresponding to the preset labels.
16. The apparatus of claim 15, wherein the training module is configured to, during each iterative training of the second model performed according to the label of each unit word and the first probability corresponding to the label, and the plurality of preset labels of the unit word corresponding to each unknown label and the second probability corresponding to each preset label, cross-train the second model the same number of times as the number of the plurality of sample subsets.
17. The apparatus of claim 16, wherein the number of the plurality of sample subsets is two.
18. The apparatus of claim 16, wherein the apparatus further comprises:
the generating module is used for acquiring all text training samples for model training; acquiring each unit word in all the text training samples and the label probability corresponding to the labeled label of each unit word in all the text training samples; and generating a plurality of preset labels corresponding to each unit word and an initial value of the second probability according to the marked labels and the corresponding label probabilities.
19. The apparatus of claim 18, wherein the training module is used for:
acquiring a training sample subset and a test sample subset for current cross training from the multiple sample subsets;
training the second model using the subset of training samples;
updating a second probability corresponding to a plurality of preset labels of each unit word in the test sample subset according to the training result;
returning to the operation of determining the training sample subset and the test sample subset for the current cross training from the plurality of sample subsets, and continuing execution until the cross training has been performed the same number of times as the number of the plurality of sample subsets.
20. The apparatus of claim 19, wherein,
and the training module is further used for performing a named entity recognition test on the test sample subset by using the cross-trained second model and the updated second probability of each unit word in the plurality of sample subsets, and obtaining and recording a test result.
CN201910018256.8A 2019-01-09 2019-01-09 Named entity identification method and device Active CN111428500B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910018256.8A CN111428500B (en) 2019-01-09 2019-01-09 Named entity identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910018256.8A CN111428500B (en) 2019-01-09 2019-01-09 Named entity identification method and device

Publications (2)

Publication Number Publication Date
CN111428500A true CN111428500A (en) 2020-07-17
CN111428500B CN111428500B (en) 2023-04-25

Family

ID=71546096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910018256.8A Active CN111428500B (en) 2019-01-09 2019-01-09 Named entity identification method and device

Country Status (1)

Country Link
CN (1) CN111428500B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089152A1 (en) * 2016-09-02 2018-03-29 Digital Genius Limited Message text labelling
CN107885721A (en) * 2017-10-12 2018-04-06 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM
CN107908614A (en) * 2017-10-12 2018-04-13 北京知道未来信息技术有限公司 A kind of name entity recognition method based on Bi LSTM
CN108121700A (en) * 2017-12-21 2018-06-05 北京奇艺世纪科技有限公司 A kind of keyword extracting method, device and electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089152A1 (en) * 2016-09-02 2018-03-29 Digital Genius Limited Message text labelling
CN107885721A (en) * 2017-10-12 2018-04-06 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM
CN107908614A (en) * 2017-10-12 2018-04-13 北京知道未来信息技术有限公司 A kind of name entity recognition method based on Bi LSTM
CN108121700A (en) * 2017-12-21 2018-06-05 北京奇艺世纪科技有限公司 A kind of keyword extracting method, device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KITIYA SURIYACHAY ET AL.: "Named Entity Recognition Modeling for the Thai Language from a Disjointedly Labeled Corpus" *
杨培; 杨志豪; 罗凌; 林鸿飞; 王健: "Chemical Drug Named Entity Recognition Based on an Attention Mechanism" (基于注意机制的化学药物命名实体识别) *

Also Published As

Publication number Publication date
CN111428500B (en) 2023-04-25

Similar Documents

Publication Publication Date Title
US11392838B2 (en) Method, equipment, computing device and computer-readable storage medium for knowledge extraction based on TextCNN
CN110263162B (en) Convolutional neural network, text classification method thereof and text classification device
US10796244B2 (en) Method and apparatus for labeling training samples
JP2019526142A (en) Search term error correction method and apparatus
CN110688853B (en) Sequence labeling method and device, computer equipment and storage medium
US20190279036A1 (en) End-to-end modelling method and system
CN110825827B (en) Entity relationship recognition model training method and device and entity relationship recognition method and device
CN110825857A (en) Multi-turn question and answer identification method and device, computer equipment and storage medium
US20200364216A1 (en) Method, apparatus and storage medium for updating model parameter
CN111598087A (en) Irregular character recognition method and device, computer equipment and storage medium
CN111783478A (en) Machine translation quality estimation method, device, equipment and storage medium
CN112906361A (en) Text data labeling method and device, electronic equipment and storage medium
CN110348012B (en) Method, device, storage medium and electronic device for determining target character
CN112395880B (en) Error correction method and device for structured triples, computer equipment and storage medium
CN110728359B (en) Method, device, equipment and storage medium for searching model structure
US11977975B2 (en) Learning method using machine learning to generate correct sentences, extraction method, and information processing apparatus
CN109885812B (en) Method and device for dynamically adding hotwords and readable storage medium
CN114580354B (en) Information coding method, device, equipment and storage medium based on synonym
CN111428500B (en) Named entity identification method and device
CN113191163B (en) Translation method, translation device, translation equipment and storage medium
CN115906797A (en) Text entity alignment method, device, equipment and medium
JP6261669B2 (en) Query calibration system and method
CN110222693B (en) Method and device for constructing character recognition model and recognizing characters
CN114638229A (en) Entity identification method, device, medium and equipment of record data
CN110188274B (en) Search error correction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant