CN113723089A - Word segmentation model training method, word segmentation method, data processing method and data processing device - Google Patents

Word segmentation model training method, word segmentation method, data processing method and data processing device Download PDF

Info

Publication number
CN113723089A
CN113723089A (application CN202010448100.6A)
Authority
CN
China
Prior art keywords
word segmentation
data
entity
training
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010448100.6A
Other languages
Chinese (zh)
Other versions
CN113723089B (en)
Inventor
王潇斌
徐光伟
龙定坤
马春平
丁瑞雪
谢朋峻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010448100.6A priority Critical patent/CN113723089B/en
Publication of CN113723089A publication Critical patent/CN113723089A/en
Application granted granted Critical
Publication of CN113723089B publication Critical patent/CN113723089B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a word segmentation model training method, a word segmentation method, a data processing method and a data processing device. The word segmentation model training method comprises the following steps: training with word segmentation labeling data to obtain a word segmentation model; acquiring entity tagging data, and adding word segmentation labels to the entity part and the non-entity part of the entity tagging data according to a preset rule; and training the word segmentation model with the entity tagging data to which the word segmentation labels have been added. The word segmentation model can thereby fit the word boundary rules in the entity tagging data, so that its segmentation boundaries are ultimately consistent with those of an entity tagging model trained on the same entity tagging data, which avoids word segmentation boundary conflicts when the two models are used together.

Description

Word segmentation model training method, word segmentation method, data processing method and data processing device
Technical Field
The invention relates to the technical field of text processing, in particular to a word segmentation model training method, a word segmentation method, a data processing method and a data processing device.
Background
The word segmentation model and the entity tagging model are generally sequence tagging models based on character granularity: the word segmentation model is built by training on a large amount of word segmentation data, and the entity tagging model is built by training on a large amount of entity tagging data. When the two are used together, their boundaries may conflict. For example, for the sentence "驻京办表示" ("the Beijing liaison office stated"), the word segmentation result is "驻京办 表示", while the entity annotation result is "驻京/LOC 办 表示" (where "/" is the entity annotation format: the "京" immediately before the "/" marks the end of an entity word and the "LOC" after the "/" is its label), so the entity boundary after "京" conflicts with the word segmentation boundary after "办".
Disclosure of Invention
In view of the above, the present invention has been made to provide a segmentation model training method, a segmentation method, and a data processing method and apparatus that overcome or at least partially solve the above problems.
In a first aspect, an embodiment of the present invention provides a method for training a segmentation model, including:
training by using word segmentation labeling data to obtain a word segmentation model;
acquiring entity tagging data, and adding word segmentation labels to an entity part and a non-entity part of the entity tagging data respectively according to a preset rule;
and training the word segmentation model by using the entity labeling data after adding the word segmentation labels.
In some optional embodiments, the training using the segmentation labeling data to obtain the segmentation model specifically includes:
using word segmentation labeling data, and training by adopting a conditional random field CRF loss function to obtain a word segmentation model;
correspondingly, the training of the word segmentation model by using the entity labeling data after adding the word segmentation label specifically comprises:
and training the word segmentation model by using the entity marking data after adding the word segmentation labels and adopting a CRF loss function.
In some optional embodiments, the training with the conditional random field CRF loss function to obtain the word segmentation model specifically includes:
selecting first label data in the word segmentation label data, and generating a determined label sequence and a possible label sequence combination corresponding to the first label data;
determining a first joint probability of the first annotation data and the determined tag sequence, and a second joint probability of the first annotation data and each possible tag sequence in a possible tag sequence combination;
according to the first joint probability and the second joint probability, training a first normative parameter in a first objective function constructed according to a CRF loss function by adopting a random gradient descent training method;
and stopping training if the descending amplitude of the value of the first objective function is lower than a preset first descending threshold value.
In some optional embodiments, the generating of the determined tag sequence and the possible tag sequence combination corresponding to the first annotation data specifically includes:
generating a determined BIES label sequence corresponding to the first labeling data according to the word segmentation condition of the first labeling data;
and determining the possible BIES label of each word according to the position of each word in the first annotation data, and generating the possible BIES label sequence combination corresponding to the first annotation data according to the possible BIES label of each word.
In some optional embodiments, the training of the segmentation model by using the entity labeling data after adding the segmentation labels and using a CRF loss function specifically includes:
selecting second labeling data in the entity labeling data after the word segmentation labels are added, and generating a determined label sequence combination and a possible label sequence combination corresponding to the second labeling data;
determining a third joint probability of the second labeling data and each determined label sequence in the determined label sequence combination respectively, and determining a fourth joint probability of the second labeling data and each possible label sequence in the possible label sequence combination respectively;
training a second normative parameter in a second objective function constructed according to the CRF loss function by adopting a random gradient descent training method according to the third joint probability and the fourth joint probability;
and if the descending amplitude of the value of the second objective function is lower than a preset second descending threshold value, stopping training.
In some optional embodiments, the generating of the determined tag sequence combination corresponding to the second annotation data specifically includes:
determining a determined BIES label of each character in the entity tagging participle in the second tagging data;
determining a possible BIES label of each word according to the position of each word of the non-entity part in the second labeling data relative to the adjacent entity labeling participle;
and generating a determined BIES label sequence combination corresponding to the second labeling data according to the determined BIES label and the possible BIES label.
In some optional embodiments, the determining, according to a position of each word of the non-entity part in the second annotation data relative to the adjacent entity annotation participle, a possible BIES label of each word specifically includes:
adding (S, E) labels to the characters of the first non-entity part on the left side of the entity tagging participle in the second tagging data;
and adding (S, B) labels to the characters of the first non-entity part on the right side of the entity tagging participle in the second tagging data.
In some optional embodiments, generating a possible tag sequence combination corresponding to the second annotation data specifically includes:
determining the possible BIES label of each word according to the position of each word in the second annotation data;
and generating a possible BIES label sequence combination corresponding to the second labeling data according to the possible BIES label of each word.
In a second aspect, an embodiment of the present invention provides a word segmentation method, including:
and performing word segmentation on the target text by using the word segmentation model trained by the word segmentation model training method to obtain a word segmentation result.
In a third aspect, an embodiment of the present invention provides a data processing method, including:
performing word segmentation on the target text by using the word segmentation model trained according to the word segmentation model training method to obtain a word segmentation text;
labeling the target text by using an entity labeling model to obtain an entity labeling text, wherein the entity labeling model is trained by using the entity labeling data in advance;
judging whether the boundary of the labeled participle is consistent with the corresponding participle boundary in the participle text or not aiming at each labeled participle in the entity labeled text;
if yes, marking the participles in the participle text according to the marking information of the marked participles.
In a fourth aspect, an embodiment of the present invention provides a training apparatus for a segmentation model, including:
the first training module is used for training by using word segmentation labeling data to obtain a word segmentation model;
the second training module is used for obtaining entity marking data and adding word segmentation labels to the entity part and the non-entity part of the entity marking data according to a preset rule; and training the word segmentation model by using the entity labeling data after adding the word segmentation labels.
In a fifth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which computer instructions are stored, which, when executed by a processor, implement the above-mentioned word segmentation model training method, or implement the above-mentioned word segmentation method, or implement the above-mentioned data processing method.
In a sixth aspect, an embodiment of the present invention provides a server, including a memory, a processor, and a computer program which is stored on the memory and can run on the processor, wherein when the processor executes the program, the above word segmentation method is realized, or the above data processing method is realized.
The technical solutions provided by the embodiments of the invention have at least the following beneficial effects:
(1) the word segmentation model training method provided by the embodiment of the invention obtains the word segmentation model by using word segmentation data training, and further trains and adjusts the word segmentation model by using the word segmentation condition of the entity tagging word in the entity tagging data, so that the word segmentation model can fit the entity tagging word boundary rule in the entity tagging data, the word segmentation boundaries of the word segmentation model and the entity tagging model are finally consistent, and the possibility of word segmentation boundary conflict caused by the simultaneous use of the word segmentation model and the entity tagging model is avoided.
(2) The word segmentation model training method provided by the embodiments of the invention further trains and adjusts the word segmentation model using the entity-annotated word boundaries in the entity labeling data, rather than ensuring boundary consistency at the corpus level by performing word segmentation annotation and entity annotation on the same corpus simultaneously. The workload at the corpus level is therefore reduced, and the training cost of the word segmentation model is reduced.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of a training method of a segmentation model according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a specific implementation of a training method for a segmentation model according to a second embodiment of the present invention;
fig. 3 is a flowchart illustrating a specific implementation of generating a tag sequence combination corresponding to second annotation data according to a second embodiment of the present invention;
FIG. 4 is a flowchart of a data processing method according to a third embodiment of the present invention;
fig. 5 is a schematic structural diagram of a word segmentation model training device in an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In order to solve the problem of boundary conflict existing in the prior art when performing word segmentation and entity labeling on a text at the same time, embodiments of the present invention provide a word segmentation model training method, a word segmentation method, a data processing method, and a device, so as to avoid word segmentation boundary conflict when performing word segmentation and entity labeling on a target text at the same time.
Example one
The embodiment of the invention provides a training method of a word segmentation model, the flow of which is shown in figure 1, and the method comprises the following steps:
step S11: and training by using word segmentation labeling data to obtain a word segmentation model.
In one embodiment, the segmentation model may be obtained by training with a Conditional Random Field (CRF) loss function using segmentation labeling data.
Specifically, first annotation data in the segmentation annotation data is selected, and a determined tag sequence and a possible tag sequence combination corresponding to the first annotation data are generated; a first joint probability of the first annotation data and the determined tag sequence is determined, as well as a second joint probability of the first annotation data and each possible tag sequence in the possible tag sequence combination; a first normative parameter in a first objective function constructed according to the CRF loss function is trained by a random gradient descent training method according to the first joint probability and the second joint probability; and when the descending amplitude of the value of the first objective function is lower than a preset first descending threshold, training is stopped.
Specifically, the tag sequence may be a BIES sequence, where B (begin) marks the first character of a word, I (inside) marks a middle character of a word, E (end) marks the last character of a word, and S (single) marks a character that forms a word on its own.
Specifically, the first tagging data in the segmentation tagging data may be the segmentation information of each sample in the segmentation tagging data; the determined BIES tag of each character in a sample can be derived from this segmentation information, yielding the determined BIES tag sequence of the sample, and the possible BIES tag sequence combination of the sample is generated from the possible BIES tags of each character in the sample.
Taking "我在北京市" ("I am in Beijing City") as an example, the determined BIES tag sequence generated from its word segmentation information is {(S), (S), (B), (I), (E)}. Since the segmentation information is deterministic, the determined BIES tag sequence is unique. If, instead, the word segmentation information is ignored, the character "我" is at the beginning, so its possible tag is only B or S; the characters "在", "北" and "京" are in middle positions, so each may take any of B, I, E and S; and the character "市" is at the end, so its possible tag is only E or S. The resulting possible BIES tag sequence combination is therefore {(B, S), (B, I, E, S), (B, I, E, S), (B, I, E, S), (E, S)}, which contains 256 (2 × 4 × 4 × 4 × 2) possible BIES tag sequences.
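To make this step concrete, the following minimal Python sketch (illustrative only: the function names, the word-list input format and the printed values are assumptions of this sketch, not details taken from the patent) builds the determined BIES tag sequence of a segmented sample and the position-constrained candidate tags from which the possible tag sequences are enumerated:

```python
from itertools import product

def determined_tags(segmented_sample):
    """Determined BIES tags for a sample given as a list of words, e.g. ["我", "在", "北京市"]."""
    tags = []
    for word in segmented_sample:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags

def possible_tags(sentence):
    """Position-constrained candidate tags for each character, ignoring the segmentation."""
    last = len(sentence) - 1
    candidates = []
    for i in range(len(sentence)):
        if i == 0 and i == last:
            candidates.append(("S",))                # single-character sample
        elif i == 0:
            candidates.append(("B", "S"))            # first character
        elif i == last:
            candidates.append(("E", "S"))            # last character
        else:
            candidates.append(("B", "I", "E", "S"))  # middle character
    return candidates

sample = ["我", "在", "北京市"]                       # segmentation of "我在北京市"
print(determined_tags(sample))                       # ['S', 'S', 'B', 'I', 'E']
cands = possible_tags("".join(sample))               # [('B','S'), ('B','I','E','S'), ..., ('E','S')]
print(len(list(product(*cands))))                    # 256 possible BIES tag sequences
```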
Step S12: and acquiring entity labeling data, and adding word segmentation labels to the entity part and the non-entity part of the entity labeling data respectively according to a preset rule.
For each sample in the entity labeling data, the determined label of each character of the entity part is derived from the word segmentation of the entity part; possible labels are determined for each character of the non-entity part; and the determined labels of the entity part together with the possible labels of the non-entity part form the determined tag sequence combination of the sample data. In addition, ignoring the word segmentation within the sample, the possible label of each character is determined directly, yielding the possible tag sequence combination of the sample data.
Specifically, the tag sequence may be a BIES sequence.
Step S13: and training the word segmentation model by using the entity labeling data after adding the word segmentation labels.
If the segmentation model was obtained in step S11 by training with the conditional random field CRF loss function, then the entity tagging data with the added word segmentation labels is used, together with the CRF loss function, to train and adjust the segmentation model. Optionally, the initial training and the training adjustment of the word segmentation model may also adopt other methods; the specific method is not limited in this embodiment, as long as both stages use the same method.
The word segmentation model training method provided by the embodiment of the invention obtains the word segmentation model by using word segmentation data training, and further trains and adjusts the word segmentation model by using the word segmentation condition of the entity tagging word in the entity tagging data, so that the word segmentation model can fit the entity tagging word boundary rule in the entity tagging data, the word segmentation boundaries of the word segmentation model and the entity tagging model are finally consistent, and the possibility of word segmentation boundary conflict caused when the word segmentation model and the entity tagging model are used simultaneously is avoided.
According to the word segmentation model training method provided by the embodiments of the invention, the word segmentation model is further trained and adjusted using the entity-annotated word boundaries in the entity tagging data, rather than ensuring boundary consistency at the corpus level by performing word segmentation annotation and entity annotation on the same corpus simultaneously. The workload at the corpus level is therefore reduced, and the training cost of the word segmentation model is reduced.
Example two
The second embodiment of the present invention provides a specific implementation of a word segmentation model training method, the flow of which is shown in fig. 2, and the method includes the following steps:
step S21: and selecting first label data in the word segmentation label data, and generating a determined label sequence and a possible label sequence combination corresponding to the first label data.
In one embodiment, the method may include: generating the determined BIES tag sequence according to the word segmentation of the first annotation data; and determining the possible BIES labels of each character according to the position of each character in the first annotation data, then generating the possible BIES tag sequence combination of the first annotation data from the possible BIES labels of each character.
That is, according to the word segmentation of the first annotation data, a unique BIES label can be determined for each character, giving the unique determined BIES label sequence corresponding to the first annotation data; ignoring the word segmentation, all possible BIES labels of each character are determined only from the position (beginning, end or middle) of the character in the first annotation data, and the possible BIES label sequence combination of the first annotation data is generated from all possible BIES labels of each character.
Step S22: a first joint probability of the first labeling data and the determined tag sequence is determined, and a second joint probability of the first labeling data and each possible tag sequence in the possible tag sequence combination is determined.
Step S23: and training a first specification parameter in a first objective function constructed according to the CRF loss function by adopting a random gradient descent training method according to the first joint probability and the second joint probability.
Specifically, the following function may be used as the first objective function, where w is the first specification parameter. In this function, x_i denotes the first annotation data of the i-th word segmentation sample in the segmentation annotation data; y_i denotes the determined tag sequence corresponding to the i-th piece of first annotation data; ŷ_i^(j) denotes the j-th possible tag sequence corresponding to the i-th piece of first annotation data; f(x_i, y_i) denotes the first joint probability of the i-th piece of first annotation data and the determined tag sequence y_i; and f(x_i, ŷ_i^(j)) denotes the second joint probability of the i-th piece of first annotation data and the j-th possible tag sequence ŷ_i^(j).
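The formula itself appears only as an image in the original publication. Assuming it is the standard CRF negative log-likelihood taken over the position-constrained candidate set described above (an inference from the surrounding definitions, not a verbatim copy of the filed formula), a consistent form is:

\[
L_1(w) = -\sum_i \log \frac{f(x_i, y_i)}{\sum_j f\big(x_i, \hat{y}_i^{(j)}\big)}
\]

where the joint probabilities f(·,·) are parameterized by w, the numerator scores the determined tag sequence, and the denominator normalizes over every possible tag sequence in the combination, so minimizing L_1(w) pushes probability mass toward the annotated segmentation.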
Take x_1 = "我在北京市" ("I am in Beijing City") and y_1 = {(S), (S), (B), (I), (E)} as an example. Then ŷ_1^(j) is any sequence in the possible tag sequence combination {(B, S), (B, I, E, S), (B, I, E, S), (B, I, E, S), (E, S)}: the character "我" is at the beginning, so its tag can only be B or S; the characters "在", "北" and "京" are in middle positions, so each tag may be any of B, I, E or S; and the character "市" is at the end, so its tag can only be E or S.
Step S24: and judging whether the descending amplitude of the value of the first objective function is lower than a preset first descending threshold value or not.
If the determination at step S24 is no, the process continues to step S23 until the determination at step S24 is yes, and step S25 is performed.
Since the drop in the value of the first objective function cannot be assessed after it has been computed only once, the determination at step S24 defaults to "no" after step S23 is performed for the first time, and step S23 continues.
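The S23-S25 loop can be sketched in Python as follows (a minimal illustration; the objective-and-gradient routine, the parameter layout and all names are placeholders rather than the patented implementation):

```python
def sgd_until_drop_below(objective_and_grad, w, data, lr=0.1, drop_threshold=1e-4):
    """Run gradient descent until the decrease of the objective value between
    successive evaluations is below drop_threshold (the descending threshold)."""
    prev_value = None
    while True:
        value, grad = objective_and_grad(w, data)       # step S23: evaluate objective and gradient
        if prev_value is not None and prev_value - value < drop_threshold:
            return w                                    # step S24 answers 'yes': stop (step S25 follows)
        prev_value = value                              # after the first evaluation, S24 defaults to 'no'
        w = [wi - lr * gi for wi, gi in zip(w, grad)]   # gradient descent update of the parameter w
```

The same loop, given the second objective function and the second descending threshold, also covers steps S27-S29 below.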
Step S25: and selecting second labeling data in the entity labeling data after the word segmentation labels are added, and generating a determined label sequence combination and a possible label sequence combination corresponding to the second labeling data.
Specifically, referring to fig. 3, the determined tag sequence combination corresponding to the second labeled data may be generated as follows:
step S31: and determining BIES labels of all characters in the entity tagging participles in the second tagging data.
Taking "我在大学(ORG)工作" ("I work at a university") as an example, the entity-annotated word is "大学" (university): the determined label of "大" is B and the determined label of "学" is E.
Step S32: and determining the possible BIES label of each word according to the position of each word of the non-entity part in the second labeling data relative to the adjacent entity labeling participle.
In one embodiment, an (S, E) tag is added to the character of the first non-entity part to the left of an entity-annotated word in the second annotation data, and an (S, B) tag is added to the character of the first non-entity part to the right of the entity-annotated word.
Taking "我在大学(ORG)工作" as an example, "在" is the character of the first non-entity part to the left of the entity-annotated word "大学", so an (S, E) tag is added; "工" is the character of the first non-entity part to the right of "大学", so an (S, B) tag is added.
Meanwhile, the character "我" is at the beginning, so its tag is (B, S); the character "作" is at the end, so its tag is (E, S).
Step S33: and generating a determined BIES label sequence combination corresponding to the second labeling data according to the determined BIES label and the possible BIES label.
Taking "我在大学(ORG)工作" as an example, the finally generated determined tag sequence combination is {(B, S), (S, E), (B), (E), (S, B), (E, S)}; a determined tag sequence is any single sequence drawn from this combination.
For a character in the entity annotation sample data that is neither at the beginning nor at the end and is not adjacent to an entity-annotated word, its candidate tags in this combination are (B, I, E, S).
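Steps S31-S33 can be illustrated with the following Python sketch (the character-string input and the (start, end) entity-span format are assumptions made for this example, not a format prescribed by the patent):

```python
def constrained_candidates(sentence, entity_spans):
    """Per-character tag candidates for an entity-annotated sample (steps S31-S33).
    `sentence` is the character string; `entity_spans` lists (start, end) index
    pairs (end exclusive) of the entity-annotated words."""
    n = len(sentence)
    cands = [None] * n
    starts = {s for s, _ in entity_spans}
    ends = {e for _, e in entity_spans}              # index just after each entity

    # Entity characters receive their determined BIES tags from the entity boundaries.
    for s, e in entity_spans:
        if e - s == 1:
            cands[s] = ("S",)
        else:
            cands[s] = ("B",)
            for i in range(s + 1, e - 1):
                cands[i] = ("I",)
            cands[e - 1] = ("E",)

    # Non-entity characters receive position-constrained possible tags.
    for i in range(n):
        if cands[i] is not None:
            continue
        if i + 1 in starts:
            cands[i] = ("S", "E")                    # first non-entity character left of an entity
        elif i in ends:
            cands[i] = ("S", "B")                    # first non-entity character right of an entity
        elif i == 0:
            cands[i] = ("B", "S")                    # sample-initial character
        elif i == n - 1:
            cands[i] = ("E", "S")                    # sample-final character
        else:
            cands[i] = ("B", "I", "E", "S")          # any other non-entity character
    return cands

# "我在大学工作" with "大学" annotated as an ORG entity at character indices (2, 4):
print(constrained_candidates("我在大学工作", [(2, 4)]))
# [('B', 'S'), ('S', 'E'), ('B',), ('E',), ('S', 'B'), ('E', 'S')]
```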
Correspondingly, the possible tag sequence combinations corresponding to the second labeled data can be generated as follows:
determining the possible BIES label of each word according to the position of each word in the second annotation data; and generating a possible BIES label sequence combination corresponding to the second labeling data according to the possible BIES label of each word.
Specifically, for the first word of the second annotation data, the possible label is B or S; for the last word of the second annotation data, the possible label is E or S; for the intermediate words in the second annotation data that are not beginning or end, their possible labels are (B, I, E, S).
Taking "我在大学(ORG)工作" as an example, the character "我" is at the beginning of the second annotation data, so its possible label is B or S; the character "作" is at the end of the second annotation data, so its possible label is E or S; and the labels of the four intermediate characters are each (B, I, E, S). The possible tag sequence combination for "我在大学(ORG)工作" is therefore {(B, S), (B, I, E, S), (B, I, E, S), (B, I, E, S), (B, I, E, S), (E, S)}.
Step S26: and determining a third joint probability of the second labeling data and each determined label sequence in the determined label sequence combination respectively, and determining a fourth joint probability of the second labeling data and each possible label sequence in the possible label sequence combination respectively.
Step S27: and training a second specification parameter in a second objective function constructed according to the CRF loss function by adopting a random gradient descent training method according to the third joint probability and the fourth joint probability.
Specifically, the following function may be used as the second objective function, where w′ is the second specification parameter. In this function, x′_i denotes the i-th piece of second annotation data in the entity annotation data; y′_i^(m) denotes the m-th determined tag sequence corresponding to the i-th piece of second annotation data; ŷ′_i^(n) denotes the n-th possible tag sequence corresponding to the i-th piece of second annotation data; f(x′_i, y′_i^(m)) denotes the third joint probability of the i-th piece of second annotation data and the m-th determined tag sequence; and f(x′_i, ŷ′_i^(n)) denotes the fourth joint probability of the i-th piece of second annotation data and the n-th possible tag sequence.
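As with the first objective, the formula itself appears only as an image in the original publication. Assuming it is the usual partially-annotated (marginalized) CRF loss, a form consistent with the third and fourth joint probabilities defined above would be:

\[
L_2(w') = -\sum_i \log \frac{\sum_m f\big(x'_i, y_i'^{(m)}\big)}{\sum_n f\big(x'_i, \hat{y}_i'^{(n)}\big)}
\]

where the joint probabilities f(·,·) are parameterized by w′, the numerator marginalizes over every tag sequence compatible with the entity-derived constraints, and the denominator normalizes over every position-constrained possible tag sequence. This is a reconstruction under that assumption, not the verbatim filed formula.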
Step S28: and judging whether the descending amplitude of the value of the second objective function is lower than a preset second descending threshold value or not.
If the determination at step S28 is no, the process continues to step S27 until the determination at step S28 is yes, and step S29 is performed.
Since the drop in the value of the second objective function cannot be assessed after it has been computed only once, the determination at step S28 defaults to "no" after step S27 is performed for the first time, and step S27 continues.
Specifically, the second drop threshold may be the same as or different from the first drop threshold in step S24.
Step S29: and stopping training the word segmentation model.
When the determination in step S28 is yes, the training of the adjustment of the segmentation model is stopped.
Based on the inventive concept of the present invention, an embodiment of the present invention further provides a word segmentation method, including performing word segmentation on a target text by using a word segmentation model trained according to the word segmentation model training method described above, so as to obtain a word segmentation result.
EXAMPLE III
An embodiment of the present invention provides a data processing method, a flow of which is shown in fig. 4, and the method includes the following steps:
step S41: and performing word segmentation on the target text by using a word segmentation model to obtain a word segmentation text.
Specifically, the word segmentation model is trained according to the word segmentation model training method described in the first embodiment or the second embodiment.
Step S42: and labeling the target text by using the entity labeling model to obtain an entity labeling text.
Specifically, the entity labeling model is trained in advance by using the entity labeling data described in the first embodiment.
Steps S41 and S42 need not be performed in a particular order: either may be executed first, or both may be executed simultaneously. The target text operated on in step S41 and in step S42 is the same target text; the acquired target text may be copied so that step S41 and step S42 each operate on a copy.
Step S43: And judging, for each labeled participle in the entity labeled text, whether the boundary of the labeled participle is consistent with the corresponding participle boundary in the participle text.
If yes, go to step S44.
Step S44: and marking the corresponding participles in the participle text according to the marking information of the marked participles.
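Steps S43-S44 amount to transferring an entity label onto the segmented text only when both of its boundaries coincide with word segmentation boundaries. A minimal Python sketch follows (the function name and the character-offset span format are assumptions of this illustration, not prescribed by the patent):

```python
def transfer_entity_labels(seg_words, entity_spans):
    """seg_words: segmented target text as a list of words;
    entity_spans: (start, end, label) character spans from the entity labeling model."""
    boundaries, pos = {0}, 0
    for word in seg_words:                            # character offsets of segmentation boundaries
        pos += len(word)
        boundaries.add(pos)

    kept = []
    for start, end, label in entity_spans:
        # Step S43: the labeled participle is transferred only if both of its
        # boundaries coincide with word segmentation boundaries.
        if start in boundaries and end in boundaries:
            kept.append((start, end, label))          # step S44: mark the corresponding participle(s)
    return kept

seg = ["我", "在", "大学", "工作"]                     # segmentation of "我在大学工作"
ents = [(2, 4, "ORG")]                                # entity labeling output: "大学" as ORG
print(transfer_entity_labels(seg, ents))              # [(2, 4, 'ORG')] - boundaries are consistent
```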
Based on the inventive concept of the present invention, an embodiment of the present invention further provides a training apparatus for a segmentation model, which has a structure as shown in fig. 5, and includes:
a first training module 51, configured to train to obtain a segmentation model by using segmentation labeling data;
the second training module 52 is configured to obtain entity tagging data, and add word segmentation labels to an entity part and a non-entity part of the entity tagging data according to a preset rule; and training the word segmentation model by using the entity labeling data after adding the word segmentation labels.
In some optional embodiments, the first training module 51 obtains a segmentation model by training using segmentation labeling data, and is specifically configured to:
using word segmentation labeling data, and training by adopting a conditional random field CRF loss function to obtain a word segmentation model; correspondingly, the second training module 52 trains the word segmentation model by using the entity tagging data after adding the word segmentation label, and is specifically configured to:
and training the word segmentation model by using the entity marking data after adding the word segmentation labels and adopting a CRF loss function.
In some optional embodiments, the first training module 51 obtains the word segmentation model by training using a conditional random field CRF loss function, and is specifically configured to:
selecting first label data in the word segmentation label data, and generating a determined label sequence and a possible label sequence combination corresponding to the first label data; determining a first joint probability of the first annotation data and the determined tag sequence, and a second joint probability of the first annotation data and each possible tag sequence in a possible tag sequence combination; according to the first joint probability and the second joint probability, training a first normative parameter in a first objective function constructed according to a CRF loss function by adopting a random gradient descent training method; and stopping training if the descending amplitude of the value of the first objective function is lower than a preset first descending threshold value.
In some optional embodiments, the first training module 51 generates a determined tag sequence and a possible tag sequence combination corresponding to the first annotation data, and is specifically configured to:
generating a determined BIES label sequence corresponding to the first labeling data according to the word segmentation condition of the first labeling data; and determining the possible BIES label of each word according to the position of each word in the first annotation data, and generating the possible BIES label sequence combination corresponding to the first annotation data according to the possible BIES label of each word.
In some optional embodiments, the second training module 52 trains the word segmentation model by using the entity labeling data after adding the word segmentation label and using a CRF loss function, specifically to:
selecting second labeling data in the entity labeling data after the word segmentation labels are added, and generating a determined label sequence combination and a possible label sequence combination corresponding to the second labeling data; determining a third joint probability of the second labeling data and each determined label sequence in the determined label sequence combination respectively, and determining a fourth joint probability of the second labeling data and each possible label sequence in the possible label sequence combination respectively; training a second normative parameter in a second objective function constructed according to the CRF loss function by adopting a random gradient descent training method according to the third joint probability and the fourth joint probability; and if the descending amplitude of the value of the second objective function is lower than a preset second descending threshold value, stopping training.
In some optional embodiments, the second training module 52 generates a determined tag sequence combination corresponding to the second annotation data, and is specifically configured to:
determining a determined BIES label of each character in the entity tagging participle in the second tagging data; determining a possible BIES label of each word according to the position of each word of the non-entity part in the second labeling data relative to the adjacent entity labeling participle; and generating a determined BIES label sequence combination corresponding to the second labeling data according to the determined BIES label and the possible BIES label.
In some optional embodiments, the second training module 52 determines, according to a position of each word of the non-entity part in the second annotation data relative to the adjacent entity annotation participle, a possible BIES label for each word, specifically for:
adding (S, E) labels to the characters of the first non-entity part on the left side of the entity tagging participle in the second tagging data; and adding (S, B) labels to the characters of the first non-entity part on the right side of the entity tagging participle in the second tagging data.
In some optional embodiments, the second training module 52 generates a possible tag sequence combination corresponding to the second annotation data, and is specifically configured to:
determining the possible BIES label of each word according to the position of each word in the second annotation data; and generating a possible BIES label sequence combination corresponding to the second labeling data according to the possible BIES label of each word.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Based on the inventive concept of the present invention, an embodiment of the present invention further provides a computer-readable storage medium, on which computer instructions are stored, and when the instructions are executed by a processor, the method for training a word segmentation model described above is implemented, or the method for word segmentation is implemented, or the method for data processing is implemented.
Based on the same inventive concept, an embodiment of the present invention further provides a server, including a memory, a processor, and a computer program which is stored on the memory and can run on the processor, wherein when the processor executes the program, the above word segmentation method is realized, or the above data processing method is realized.
Unless specifically stated otherwise, terms such as processing, computing, calculating, determining, displaying, or the like, may refer to an action and/or process of one or more processing or computing systems or similar devices that manipulates and transforms data represented as physical (e.g., electronic) quantities within the processing system's registers and memories into other data similarly represented as physical quantities within the processing system's memories, registers or other such information storage, transmission or display devices. Information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Of course, the processor and the storage medium may reside as discrete components in a user terminal.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification of the claims is intended to mean a "non-exclusive or". The terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

Claims (12)

1. A method for training a segmentation model is characterized by comprising the following steps:
training by using word segmentation labeling data to obtain a word segmentation model;
acquiring entity tagging data, and adding word segmentation labels to an entity part and a non-entity part of the entity tagging data respectively according to a preset rule;
and training the word segmentation model by using the entity labeling data after adding the word segmentation labels.
2. The method of claim 1, wherein the training using the segmentation annotation data to obtain the segmentation model specifically comprises:
using word segmentation labeling data, and training by adopting a conditional random field CRF loss function to obtain a word segmentation model;
correspondingly, the training of the word segmentation model by using the entity labeling data after adding the word segmentation label specifically comprises:
and training the word segmentation model by using the entity marking data after adding the word segmentation labels and adopting a CRF loss function.
3. The method of claim 2, wherein the training with the conditional random field CRF loss function to obtain the segmentation model comprises:
selecting first label data in the word segmentation label data, and generating a determined label sequence and a possible label sequence combination corresponding to the first label data;
determining a first joint probability of the first annotation data and the determined tag sequence, and a second joint probability of the first annotation data and each possible tag sequence in a possible tag sequence combination;
according to the first joint probability and the second joint probability, training a first normative parameter in a first objective function constructed according to a CRF loss function by adopting a random gradient descent training method;
and stopping training if the descending amplitude of the value of the first objective function is lower than a preset first descending threshold value.
4. The method of claim 3, wherein the generating of the determined tag sequence and possible tag sequence combinations corresponding to the first annotation data comprises:
generating a determined BIES label sequence corresponding to the first labeling data according to the word segmentation condition of the first labeling data;
and determining the possible BIES label of each word according to the position of each word in the first annotation data, and generating the possible BIES label sequence combination corresponding to the first annotation data according to the possible BIES label of each word.
5. The method of claim 2, wherein the training of the segmentation model using the entity labeling data after adding the segmentation labels using a CRF loss function comprises:
selecting second labeling data in the entity labeling data after the word segmentation labels are added, and generating a determined label sequence combination and a possible label sequence combination corresponding to the second labeling data;
determining a third joint probability of the second labeling data and each determined label sequence in the determined label sequence combination respectively, and determining a fourth joint probability of the second labeling data and each possible label sequence in the possible label sequence combination respectively;
training a second normative parameter in a second objective function constructed according to the CRF loss function by adopting a random gradient descent training method according to the third joint probability and the fourth joint probability;
and if the descending amplitude of the value of the second objective function is lower than a preset second descending threshold value, stopping training.
6. The method of claim 5, wherein the generating of the determined tag sequence combination corresponding to the second annotation data specifically comprises:
determining a determined BIES label of each character in the entity tagging participle in the second tagging data;
determining a possible BIES label of each word according to the position of each word of the non-entity part in the second labeling data relative to the adjacent entity labeling participle;
and generating a determined BIES label sequence combination corresponding to the second labeling data according to the determined BIES label and the possible BIES label.
7. The method of claim 6, wherein determining the possible BIES label for each word based on the position of each word relative to the adjacent entity labeled participles for the non-entity portion in the second label data comprises:
adding (S, E) labels to the characters of the first non-entity part on the left side of the entity tagging participle in the second tagging data;
and adding (S, B) labels to the characters of the first non-entity part on the right side of the entity tagging participle in the second tagging data.
8. The method of claim 5, wherein generating the possible tag sequence combinations corresponding to the second annotation data comprises:
determining the possible BIES label of each word according to the position of each word in the second annotation data;
and generating a possible BIES label sequence combination corresponding to the second labeling data according to the possible BIES label of each word.
9. A method of word segmentation, comprising:
performing word segmentation on the target text by using the word segmentation model trained according to the word segmentation model training method of any one of claims 1 to 8 to obtain a word segmentation result.
10. A data processing method, comprising:
performing word segmentation on a target text by using a word segmentation model trained according to the word segmentation model training method of any one of claims 1 to 8 to obtain a word segmentation text;
labeling the target text by using an entity labeling model to obtain an entity labeling text, wherein the entity labeling model is trained by using the entity labeling data of any one of claims 1 to 8 in advance;
judging whether the boundary of the labeled participle is consistent with the corresponding participle boundary in the participle text or not aiming at each labeled participle in the entity labeled text;
if yes, marking the participles in the participle text according to the marking information of the marked participles.
11. A word segmentation model training device, comprising:
the first training module is used for training by using word segmentation labeling data to obtain a word segmentation model;
the second training module is used for obtaining entity marking data and adding word segmentation labels to the entity part and the non-entity part of the entity marking data according to a preset rule; and training the word segmentation model by using the entity labeling data after adding the word segmentation labels.
12. A computer readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the method of training a segmentation model according to any one of claims 1 to 8, or implement the method according to claim 9 or 10.
CN202010448100.6A 2020-05-25 2020-05-25 Word segmentation model training method, word segmentation method and data processing method and device Active CN113723089B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010448100.6A CN113723089B (en) 2020-05-25 2020-05-25 Word segmentation model training method, word segmentation method and data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010448100.6A CN113723089B (en) 2020-05-25 2020-05-25 Word segmentation model training method, word segmentation method and data processing method and device

Publications (2)

Publication Number Publication Date
CN113723089A true CN113723089A (en) 2021-11-30
CN113723089B CN113723089B (en) 2023-12-26

Family

ID=78671712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010448100.6A Active CN113723089B (en) 2020-05-25 2020-05-25 Word segmentation model training method, word segmentation method and data processing method and device

Country Status (1)

Country Link
CN (1) CN113723089B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116719907A (en) * 2023-06-26 2023-09-08 阿波罗智联(北京)科技有限公司 Data processing method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016179987A1 (en) * 2015-05-12 2016-11-17 深圳市华傲数据技术有限公司 Chinese address parsing and annotation method
CN109271631A (en) * 2018-09-12 2019-01-25 广州多益网络股份有限公司 Segmenting method, device, equipment and storage medium
CN110223737A (en) * 2019-06-13 2019-09-10 电子科技大学 A kind of chemical composition of Chinese materia medica name entity recognition method and device
WO2019174423A1 (en) * 2018-03-16 2019-09-19 北京国双科技有限公司 Entity sentiment analysis method and related apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016179987A1 (en) * 2015-05-12 2016-11-17 深圳市华傲数据技术有限公司 Chinese address parsing and annotation method
WO2019174423A1 (en) * 2018-03-16 2019-09-19 北京国双科技有限公司 Entity sentiment analysis method and related apparatus
CN109271631A (en) * 2018-09-12 2019-01-25 广州多益网络股份有限公司 Segmenting method, device, equipment and storage medium
CN110223737A (en) * 2019-06-13 2019-09-10 电子科技大学 A kind of chemical composition of Chinese materia medica name entity recognition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邓丽萍; 罗智勇: "基于半监督CRF的跨领域中文分词" [Cross-domain Chinese Word Segmentation Based on Semi-supervised CRF], 中文信息学报 [Journal of Chinese Information Processing], no. 04 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116719907A (en) * 2023-06-26 2023-09-08 阿波罗智联(北京)科技有限公司 Data processing method, device, equipment and storage medium
CN116719907B (en) * 2023-06-26 2024-06-11 阿波罗智联(北京)科技有限公司 Data processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113723089B (en) 2023-12-26

Similar Documents

Publication Publication Date Title
US10372821B2 (en) Identification of reading order text segments with a probabilistic language model
CN110765996B (en) Text information processing method and device
CN110287480B (en) Named entity identification method, device, storage medium and terminal equipment
CN112016310A (en) Text error correction method, system, device and readable storage medium
CN111310447B (en) Grammar error correction method, grammar error correction device, electronic equipment and storage medium
CN107544726B (en) Speech recognition result error correction method and device based on artificial intelligence and storage medium
CN111695385B (en) Text recognition method, device and equipment
CN110222330B (en) Semantic recognition method and device, storage medium and computer equipment
CN110705302B (en) Named entity identification method, electronic equipment and computer storage medium
CN110795938A (en) Text sequence word segmentation method, device and storage medium
CN110457683B (en) Model optimization method and device, computer equipment and storage medium
CN111291566A (en) Event subject identification method and device and storage medium
US11763588B2 (en) Computing system for extraction of textual elements from a document
CN103268185A (en) Text display method and text display device for e-book reader
CN113657098B (en) Text error correction method, device, equipment and storage medium
US20160055146A1 (en) Document processing device, document processing method, program, and information storage medium
CN113704429A (en) Semi-supervised learning-based intention identification method, device, equipment and medium
CN111079432A (en) Text detection method and device, electronic equipment and storage medium
CN113723089A (en) Word segmentation model training method, word segmentation method, data processing method and data processing device
CN110705211A (en) Text key content marking method and device, computer equipment and storage medium
CN111753535A (en) Method and device for generating patent application text
CN110018827B (en) Method and device for automatically generating code, electronic equipment and readable storage medium
CN109166569B (en) Detection method and device for phoneme mislabeling
CN111145724B (en) Polyphone marking method and device and computer readable storage medium
CN115098722B (en) Text and image matching method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant