WO2021072852A1 - Sequence labeling method and system, and computer device - Google Patents

Sequence labeling method and system, and computer device

Info

Publication number
WO2021072852A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
word
target text
vector
labeling
Prior art date
Application number
PCT/CN2019/117403
Other languages
French (fr)
Chinese (zh)
Inventor
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021072852A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • the embodiments of the present application relate to the field of sequence labeling, and in particular, to a sequence labeling method, system, computer equipment, and non-volatile computer-readable storage medium.
  • among all natural language processing applications, named entity recognition is the most basic and most widely used. It identifies entities with specific meanings in text, including names of persons, places, and organizations, and proper nouns.
  • named entity recognition is an important foundational tool for downstream applications such as information extraction, question answering, syntactic analysis, machine translation, and semantic-web-oriented metadata annotation.
  • by applying named entity recognition, a natural language model can be constructed that understands, analyzes, and answers natural language much as humans do.
  • however, existing models often fail to take long-range contextual information into account, which limits recognition accuracy.
  • an embodiment of the present application provides a sequence labeling method, the steps of which include:
  • receiving a target text sequence, and converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
  • inputting the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and outputting, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and the first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags;
  • inputting the first labeling sequence into a fully connected layer, and outputting a second labeling sequence through the fully connected layer, wherein the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and the second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags;
  • taking the second labeling sequence as the input sequence of a conditional random field (CRF) model, so as to output a label sequence Y = (y1, y2, ..., ym) through the CRF model; and
  • generating a named entity sequence according to the label sequence, and outputting the named entity sequence.
  • an embodiment of the present application also provides a sequence labeling system, including:
  • a text receiving module, configured to receive a target text sequence and convert it into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
  • a first labeling module, configured to input the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and to output, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and the first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags;
  • a second labeling module, configured to input the first labeling sequence into a fully connected layer and to output a second labeling sequence through the fully connected layer, wherein the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and the second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags;
  • an output label module, configured to take the second labeling sequence as the input sequence of a conditional random field (CRF) model, so as to output a label sequence Y = (y1, y2, ..., ym) through the CRF model; and
  • an output entity module, configured to generate a named entity sequence according to the label sequence and to output the named entity sequence.
  • an embodiment of the present application further provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the computer-readable instructions, when executed by the processor, implement the following steps:
  • receiving a target text sequence, and converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
  • inputting the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and outputting, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and the first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags;
  • inputting the first labeling sequence into a fully connected layer, and outputting a second labeling sequence through the fully connected layer, wherein the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and the second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags;
  • taking the second labeling sequence as the input sequence of a conditional random field (CRF) model, so as to output a label sequence Y = (y1, y2, ..., ym) through the CRF model; and
  • generating a named entity sequence according to the label sequence, and outputting the named entity sequence.
  • the embodiments of the present application also provide a non-volatile computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, causing the at least one processor to execute the following steps:
  • receiving a target text sequence, and converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
  • inputting the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and outputting, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and the first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags;
  • inputting the first labeling sequence into a fully connected layer, and outputting a second labeling sequence through the fully connected layer, wherein the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and the second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags;
  • taking the second labeling sequence as the input sequence of a conditional random field (CRF) model, so as to output a label sequence Y = (y1, y2, ..., ym) through the CRF model; and
  • generating a named entity sequence according to the label sequence, and outputting the named entity sequence.
  • the sequence labeling method, system, computer device, and non-volatile computer-readable storage medium provided by the embodiments of the application offer an effective way to label text sequences. They solve the prior-art problem that models cannot take long-range contextual information into account, which limits recognition accuracy, so that named entities can be extracted by feeding the raw sentence directly into the model, with strong adaptability, wide applicability, and improved accuracy of sequence labeling for entity recognition.
  • FIG. 1 is a schematic flowchart of a sequence labeling method according to an embodiment of the application.
  • FIG. 2 is a schematic diagram of the program modules of the second embodiment of the sequence labeling system of this application.
  • FIG. 3 is a schematic diagram of the hardware structure of the third embodiment of the computer equipment of this application.
  • the computer device 2 will be used as an execution subject for exemplary description.
  • FIG. 1 shows a flowchart of the steps of a sequence labeling method according to an embodiment of the present application. It can be understood that the flowchart in this method embodiment is not used to limit the order of execution of the steps.
  • the following is an exemplary description with the computer device 2 as the execution subject. The details are as follows.
  • Step S100: Receive a target text sequence, and convert the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word.
  • the step S100 may further include:
  • Step S100a: Input the target text sequence into an embedding layer, which outputs a plurality of word vectors corresponding to the target text sequence; the plurality of word vectors includes at least one punctuation vector.
  • Step S100b: Input the plurality of word vectors into a segmentation layer, and divide them according to the at least one punctuation vector to obtain n word vector sets corresponding to n segmentation codes.
  • for example, the target text sequence [Curie was born in Poland, lives in the United States] is divided into sentence A [Curie was born in Poland] and sentence B [lives in the United States]; segmentation code A is added to the first half-sentence and segmentation code B to the second.
  • Step S100c: Perform an encoding operation on each segmentation code by position encoding, and determine the position information encoding of each segmentation code to obtain the position vector of each word in the target text sequence.
  • the position information encoding may be used to determine the position of each word in the target text sequence.
  • Step S100d: Generate the sentence vector of the target text sequence according to the word vector and the position vector of each word in the target text sequence.
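  • The following is a minimal Python sketch of step S100, assuming a toy character vocabulary, random character embeddings, and sinusoidal position encodings; the names (char_embed, position_encoding) and the mean-pooled sentence vector are illustrative assumptions, not taken from the patent.

        import numpy as np

        d = 8                                    # toy embedding size
        text = "居里生于波兰，居住在美国"          # target text sequence
        vocab = {ch: i for i, ch in enumerate(sorted(set(text)))}
        rng = np.random.default_rng(0)
        char_embed = rng.normal(size=(len(vocab), d))     # word (character) vectors

        # Step S100b: split on the punctuation into segments A and B (segmentation codes).
        chars, seg_ids, seg = [], [], 0
        for ch in text:
            if ch == "，":
                seg += 1
                continue
            chars.append(ch)
            seg_ids.append(seg)

        # Step S100c: sinusoidal position encoding, one common choice of position vector.
        def position_encoding(pos, d):
            enc = np.array([pos / 10000 ** (2 * (i // 2) / d) for i in range(d)])
            enc[0::2], enc[1::2] = np.sin(enc[0::2]), np.cos(enc[1::2])
            return enc

        word_vecs = [char_embed[vocab[ch]] for ch in chars]
        pos_vecs = [position_encoding(p, d) for p in range(len(chars))]

        # Step S100d: a simple sentence vector, here the mean of word + position vectors.
        sentence_vec = np.mean([w + p for w, p in zip(word_vecs, pos_vecs)], axis=0)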
  • Step S102: Input the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into the trained BERT model, which outputs the first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and the first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags.
  • the n first tags may be a set of position tags and semantic tags, or a set of position tags and part-of-speech tags.
  • BERT is an existing pre-trained model. Its full name is Bidirectional Encoder Representations from Transformers, i.e., a bidirectional Transformer encoder; the Transformer is an architecture that relies entirely on self-attention to compute representations of its input and output.
  • BERT aims to pre-train deep bidirectional representations by jointly conditioning on context in all layers. The pre-trained BERT can therefore be fine-tuned with an additional output layer to build state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
  • the BERT model can be obtained by capturing words through a masked language model (MLM) method and representing the sentence level through a "next sentence prediction" method. The masked language model randomly masks some of the input tokens, and the goal is to predict the original vocabulary id of each masked word based only on its context. Unlike left-to-right language-model pre-training, this training objective lets the representation fuse context from both the left and the right, so that a deep bidirectional Transformer can be pre-trained.
  • next sentence prediction means that when pre-training the language model, two sentences are selected in one of two ways: either two sentences that genuinely follow each other in the corpus, or a first sentence paired with a second sentence drawn at random from the corpus. In addition to the masked language model task, the model is asked to predict whether the second sentence really follows the first.
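  • As a concrete illustration, the following sketch builds MLM and next-sentence-prediction training examples in the two ways described above; the 50/50 sentence sampling follows the text, while the 15% mask rate is a common convention assumed here, not stated in the patent.

        import random

        def make_nsp_pair(sentences):
            """Return (A, B, is_next): B either truly follows A or is drawn at random."""
            i = random.randrange(len(sentences) - 1)
            a = sentences[i]
            if random.random() < 0.5:
                return a, sentences[i + 1], True       # genuinely consecutive sentences
            return a, random.choice(sentences), False  # randomly drawn second sentence

        def mask_tokens(tokens, mask_token="[MASK]", rate=0.15):
            """Randomly mask tokens; the model must predict the original ids."""
            masked, targets = [], []
            for t in tokens:
                if random.random() < rate:
                    masked.append(mask_token)
                    targets.append(t)        # prediction target for this position
                else:
                    masked.append(t)
                    targets.append(None)     # not predicted
            return masked, targets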
  • training the pre-trained BERT model may include: acquiring multiple training text sequences, using them as the training set of the BERT model, and feeding the training set into the pre-trained BERT model to train it and obtain a trained BERT model.
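  • One possible realization of this training step, assuming the Hugging Face transformers library and the bert-base-chinese checkpoint (neither is named in the patent; both are assumptions), is sketched below; the tag count n_tags is illustrative.

        import torch
        from transformers import BertTokenizerFast, BertForTokenClassification

        n_tags = 9   # e.g. BIOES tags for person/place entities plus O (assumed)
        tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
        model = BertForTokenClassification.from_pretrained(
            "bert-base-chinese", num_labels=n_tags)
        optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

        def train_step(text, tag_ids):
            # tag_ids must align with the tokenized sequence; positions such as
            # [CLS]/[SEP] are usually given the ignore index -100.
            enc = tokenizer(text, return_tensors="pt")
            labels = torch.tensor([tag_ids])
            loss = model(**enc, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            return loss.item()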
  • the step S102 may further include:
  • Step S102a: Perform feature extraction on the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word through the BERT model to obtain the first probability of each first tag for each word in the target text sequence.
  • Step S102b: Generate the first labeling sequence according to the first probability of each first tag of each word in the target text sequence.
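  • Continuing the fine-tuning sketch above, the first labeling sequence can then be read off at inference time as one softmax-normalized n-dimensional probability vector per character (a sketch, reusing the tokenizer and model defined earlier):

        import torch

        @torch.no_grad()
        def first_annotation_sequence(text):
            model.eval()
            enc = tokenizer(text, return_tensors="pt")
            logits = model(**enc).logits[0]           # (seq_len, n_tags)
            probs = torch.softmax(logits, dim=-1)     # first probabilities
            return probs[1:-1]                        # drop the [CLS] and [SEP] rows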
  • Step S104: Input the first labeling sequence into the fully connected layer, which outputs the second labeling sequence, where the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and the second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags.
  • the n second tags may be a set of position tags and semantic tags, or a set of position tags and part-of-speech tags.
  • the step S104 may further include:
  • Step S104a: Input the first labeling sequence into the neural network structure of the fully connected layer and perform additional feature extraction to obtain the second probability of each second tag for each word in the target text sequence.
  • for the i-th word of the target text sequence, the additional feature extraction is computed as B_i = wX_i + b, where X_i is the vector of first probabilities of the first tags of the i-th word in the first labeling sequence, and w and b are model learning parameters;
  • the neural network structure of the fully connected layer of this embodiment may be a multi-layer transformer structure.
  • the multi-layer transformer structure further includes an attention mechanism: after the first labeling sequence is processed by the attention mechanism, it is fed into the feedforward fully connected neural network structure for additional feature extraction, yielding the second probability of each second tag for each word in the target text sequence; that is, the second probabilities are obtained through the operation wx + b, where x is the sequence and w and b are model learning parameters.
  • Step S104b: Generate the second labeling sequence according to the second probability of each second tag of each word in the target text sequence.
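  • A minimal sketch of this fully connected layer, applying B_i = wX_i + b to each first n-dimensional vector; sharing one weight matrix across positions and re-normalizing with a softmax are implementation assumptions.

        import torch
        import torch.nn as nn

        n_tags = 9
        fc = nn.Linear(n_tags, n_tags)                # learnable w and b

        def second_annotation_sequence(first_seq):
            # first_seq: tensor of shape (seq_len, n_tags) of first probabilities X_i
            scores = fc(first_seq)                    # B_i = w X_i + b per position
            return torch.softmax(scores, dim=-1)      # second probabilities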
  • Step S106: Take the second labeling sequence as the input sequence of the conditional random field (CRF) model, so as to output a label sequence Y = (y1, y2, ..., ym) through the CRF model.
  • the step S106 may further include:
  • Step S106a: Input the second labeling sequence into the CRF model;
  • Step S106b: Perform Viterbi decoding on the second labeling sequence through the Viterbi algorithm to obtain the optimal solution path, where the optimal solution path is the label sequence with the highest probability over the entire target text sequence;
  • this step determines the output that the target text sequence should correspond to, according to the second probability of each second tag of each word in the target text sequence; it is implemented with the Viterbi algorithm, which does not output the highest-probability tag independently for each word, but rather the highest-probability label sequence for the entire target text sequence.
  • the Viterbi algorithm may rely on the following property: if the highest-probability path passes through a certain point of the lattice (fence network), then the sub-path from the start to that point must itself be the highest-probability path from the start to that point; when there are k states at time i, there are k shortest paths from the start to those k states, and the final shortest path must pass through one of them.
  • Step S106c: Generate the label sequence according to the optimal solution path.
  • the highest-probability label sequence for the entire target text sequence is computed with the Viterbi algorithm; when computing the shortest path to the (i+1)-th state, only the shortest paths from the start to the current k state values and the steps from those state values to the (i+1)-th state value need to be considered.
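  • A self-contained Viterbi decoder over per-word tag scores plus tag-to-tag transition scores is sketched below; in the pipeline above the transition scores would be learned by the CRF layer, and scores are assumed to be in log space so that they add.

        import numpy as np

        def viterbi(emissions, transitions):
            """emissions: (m, n) per-word tag scores; transitions: (n, n) tag-to-tag scores."""
            m, n = emissions.shape
            score = emissions[0].copy()          # best score of paths ending in each tag
            back = np.zeros((m, n), dtype=int)   # best previous tag at each step
            for t in range(1, m):
                # best path to tag j at time t goes through some tag i at time t-1
                total = score[:, None] + transitions + emissions[t][None, :]
                back[t] = total.argmax(axis=0)
                score = total.max(axis=0)
            path = [int(score.argmax())]         # backtrace from the best final tag
            for t in range(m - 1, 0, -1):
                path.append(int(back[t][path[-1]]))
            return path[::-1]                    # highest-probability tag sequence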
  • Step S108: Generate a named entity sequence according to the label sequence, and output the named entity sequence.
  • a named entity sequence can be generated from the label sequence; the named entity sequence is the labeling system's prediction for the target text sequence.
  • the named entities include place names, person names, etc. Sequence labeling adopts the BIOES scheme, where B marks the beginning of an entity, I the middle of an entity, O a non-entity, E the end of an entity, and S a single-word entity; each named entity label also corresponds to an entity category, refined into forms such as B-place name: the beginning of a place-name entity.
  • the sentence "Curie was born in Warsaw" as an example, this sentence will be split into a sequence of words.
  • the home character is marked as B name
  • the inner character is marked as E name
  • the new character is marked as O
  • the word Yu is marked as O
  • the wave character is marked as B place name
  • the blue character is marked as E- place name.
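  • A short sketch of step S108, turning a BIOES label sequence into named entities, using the example above; the label strings are illustrative.

        def decode_bioes(chars, labels):
            entities, start = [], None
            for i, (ch, lab) in enumerate(zip(chars, labels)):
                if lab.startswith("B-"):
                    start = i                                    # entity begins
                elif lab.startswith("E-") and start is not None:
                    entities.append(("".join(chars[start:i + 1]), lab[2:]))
                    start = None                                 # entity ends
                elif lab.startswith("S-"):
                    entities.append((ch, lab[2:]))               # single-word entity
            return entities

        chars = list("居里生于波兰")
        labels = ["B-person", "E-person", "O", "O", "B-place", "E-place"]
        print(decode_bioes(chars, labels))   # [('居里', 'person'), ('波兰', 'place')]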
  • Fig. 2 is a schematic diagram of program modules of the second embodiment of the sequence labeling system of this application.
  • the sequence labeling system 20 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to complete this application and implement the sequence labeling method described above.
  • the program modules referred to in the embodiments of the present application are series of computer-readable instruction segments capable of completing specific functions. The following description introduces the functions of each program module in this embodiment:
  • the text receiving module 200 is configured to receive a target text sequence and convert it into a corresponding sentence vector, a word vector of each word, and a position vector of each word.
  • the text receiving module 200 is further configured to: input the target text sequence into an embedding layer, which outputs a plurality of word vectors corresponding to the target text sequence, including at least one punctuation vector; input the plurality of word vectors into a segmentation layer, and divide them according to the at least one punctuation vector to obtain n word vector sets corresponding to n segmentation codes; perform an encoding operation on each segmentation code by position encoding, and determine the position information encoding of each segmentation code to obtain the position vector of each word in the target text sequence; and generate the sentence vector of the target text sequence according to the word vector and the position vector of each word in the target text sequence.
  • the first labeling module 202 is configured to input the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into the trained BERT model, and to output, through the BERT model, the first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and the first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags.
  • the first labeling module 202 is further configured to: perform feature extraction on the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word through the BERT model to obtain the first probability of each first tag for each word in the target text sequence; and generate the first labeling sequence according to those first probabilities.
  • the second labeling module 204 is configured to input the first labeling sequence to the fully connected layer, and output a second labeling sequence through the fully connected layer, wherein the second labeling sequence includes a plurality of second n-dimensional vectors, Each second n-dimensional vector corresponds to a word in the target text sequence, and the second n-dimensional vector represents the second probability of the corresponding word belonging to each of the n second tags.
  • the second labeling module 204 is further configured to: input the first labeling sequence into the neural network structure of the fully connected layer and perform additional feature extraction via B_i = wX_i + b, where X_i is the vector of first probabilities of the first tags of the i-th word and w and b are model learning parameters, to obtain the second probability of each second tag for each word in the target text sequence; and generate the second labeling sequence according to those second probabilities.
  • the output label module 206 is further configured to: input the second labeling sequence into the CRF model; perform Viterbi decoding on the second labeling sequence through the Viterbi algorithm to obtain the optimal solution path, wherein the optimal solution path is the label sequence with the highest probability over the entire target text sequence; and generate the label sequence according to the optimal solution path.
  • the output entity module 208 is configured to generate a named entity sequence according to the tag sequence, and output the named entity sequence.
  • the computer device 2 is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • the computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers).
  • the computer device 2 at least includes, but is not limited to, a memory 21, a processor 22, a network interface 23, and a sequence labeling system 20 that can communicate with each other through a system bus.
  • the memory 21 includes at least one type of non-volatile computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, etc.
  • the memory 21 may be an internal storage unit of the computer device 2, for example, a hard disk or a memory of the computer device 2.
  • the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card.
  • the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
  • the memory 21 is generally used to store an operating system and various application software installed in the computer device 2, for example, the program code of the sequence labeling system 20 in the second embodiment.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 22 is generally used to control the overall operation of the computer device 2.
  • the processor 22 is used to run the program code or process data stored in the memory 21, for example, to run the sequence labeling system 20 to implement the sequence labeling method of the first embodiment.
  • the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the computer device 2 and other electronic devices.
  • the network interface 23 is used to connect the computer device 2 to an external terminal through a network, and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal.
  • the network may be an intranet, the Internet, a Global System for Mobile communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • FIG. 3 only shows the computer device 2 with components 20-23, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
  • the sequence labeling system 20 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete this application.
  • FIG. 2 shows a schematic diagram of program modules for implementing the sequence labeling system 20 according to the second embodiment of the present application.
  • the sequence labeling system 20 can be divided into the text receiving module 200, the first labeling module 202, the second labeling module 204, the output label module 206, and the output entity module 208.
  • the program module referred to in this application refers to a series of computer-readable instruction segments that can complete specific functions. The specific functions of the program modules 200-208 have been described in detail in the second embodiment, and will not be repeated here.
  • This embodiment also provides a non-volatile computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, a server, an application store, etc., on which computer-readable instructions are stored; the corresponding functions are realized when the instructions are executed by a processor.
  • the non-volatile computer-readable storage medium of this embodiment is used to store the sequence labeling system 20; when executed by the processor, the computer-readable instructions implement the following steps:
  • receiving a target text sequence, and converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
  • inputting the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and outputting, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and the first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags;
  • inputting the first labeling sequence into a fully connected layer, and outputting a second labeling sequence through the fully connected layer, wherein the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and the second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags;
  • taking the second labeling sequence as the input sequence of a conditional random field (CRF) model, so as to output a label sequence Y = (y1, y2, ..., ym) through the CRF model; and
  • generating a named entity sequence according to the label sequence, and outputting the named entity sequence.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

A sequence labeling method, comprising: receiving a target text sequence, and converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word (S100); inputting the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and outputting, by means of the BERT model, a first labeling sequence corresponding to the target text sequence; inputting the first labeling sequence into a fully connected layer, and outputting a second labeling sequence by means of the fully connected layer; taking the second labeling sequence as an input sequence of a conditional random field (CRF) model so as to output a label sequence Y = (y1, y2, ..., ym) by means of the CRF model (S106); and generating a named entity sequence according to the label sequence, and outputting the named entity sequence (S108). The method solves the problem that existing models cannot take long-range contextual information into account, so that named entities can be extracted from a text by feeding it directly into the model, improving the accuracy of entity recognition.

Description

Sequence labeling method, system, and computer device
This application claims priority to the Chinese patent application No. 201910983279.2, filed on October 16, 2019 and entitled "Sequence labeling method, system and computer equipment", the entire content of which is incorporated herein by reference.
Technical field
The embodiments of the present application relate to the field of sequence labeling, and in particular to a sequence labeling method, system, computer device, and non-volatile computer-readable storage medium.
Background
Among all natural language processing applications, named entity recognition is the most basic and most widely used. It identifies entities with specific meanings in text, mainly including names of persons, places, and organizations, and proper nouns. Named entity recognition is an important foundational tool for downstream applications such as information extraction, question answering, syntactic analysis, machine translation, and semantic-web-oriented metadata annotation. By applying named entity recognition, a natural language model can be constructed that understands, analyzes, and answers natural language much as humans do. However, existing models often fail to take long-range contextual information into account, which limits recognition accuracy.
Therefore, how to overcome the inability of existing models to consider long-range contextual relationships, and thereby further improve the recognition accuracy of sequence labeling, has become one of the technical problems to be solved.
Summary of the invention
In view of this, it is necessary to provide a sequence labeling method, system, computer device, and non-volatile computer-readable storage medium to solve the technical problem that existing models cannot consider long-range contextual relationships, which limits the recognition accuracy of sequence labeling.
To achieve the above objective, an embodiment of the present application provides a sequence labeling method, the steps of which include:
receiving a target text sequence, and converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
inputting the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and outputting, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and the first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags;
inputting the first labeling sequence into a fully connected layer, and outputting a second labeling sequence through the fully connected layer, wherein the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and the second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags;
taking the second labeling sequence as the input sequence of a conditional random field (CRF) model, so as to output a label sequence Y = (y1, y2, ..., ym) through the CRF model; and
generating a named entity sequence according to the label sequence, and outputting the named entity sequence.
To achieve the above objective, an embodiment of the present application also provides a sequence labeling system, including:
a text receiving module, configured to receive a target text sequence and convert it into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
a first labeling module, configured to input the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and to output, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and the first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags;
a second labeling module, configured to input the first labeling sequence into a fully connected layer and to output a second labeling sequence through the fully connected layer, wherein the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and the second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags;
an output label module, configured to take the second labeling sequence as the input sequence of a conditional random field (CRF) model, so as to output a label sequence Y = (y1, y2, ..., ym) through the CRF model; and
an output entity module, configured to generate a named entity sequence according to the label sequence and to output the named entity sequence.
To achieve the above objective, an embodiment of the present application further provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the computer-readable instructions, when executed by the processor, implement the following steps:
receiving a target text sequence, and converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
inputting the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and outputting, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and the first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags;
inputting the first labeling sequence into a fully connected layer, and outputting a second labeling sequence through the fully connected layer, wherein the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and the second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags;
taking the second labeling sequence as the input sequence of a conditional random field (CRF) model, so as to output a label sequence Y = (y1, y2, ..., ym) through the CRF model; and
generating a named entity sequence according to the label sequence, and outputting the named entity sequence.
To achieve the above objective, the embodiments of the present application also provide a non-volatile computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, causing the at least one processor to execute the following steps:
receiving a target text sequence, and converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
inputting the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and outputting, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and the first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags;
inputting the first labeling sequence into a fully connected layer, and outputting a second labeling sequence through the fully connected layer, wherein the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and the second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags;
taking the second labeling sequence as the input sequence of a conditional random field (CRF) model, so as to output a label sequence Y = (y1, y2, ..., ym) through the CRF model; and
generating a named entity sequence according to the label sequence, and outputting the named entity sequence.
The sequence labeling method, system, computer device, and non-volatile computer-readable storage medium provided by the embodiments of the present application offer an effective way to label text sequences. They solve the prior-art problem that models cannot take long-range contextual information into account, which limits recognition accuracy, so that named entities can be extracted by feeding the raw sentence directly into the model, with strong adaptability, wide applicability, and improved accuracy of sequence labeling for entity recognition.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a sequence labeling method according to an embodiment of the present application.
FIG. 2 is a schematic diagram of the program modules of the second embodiment of the sequence labeling system of the present application.
FIG. 3 is a schematic diagram of the hardware structure of the third embodiment of the computer device of the present application.
Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application and are not used to limit it. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.
It should be noted that descriptions involving "first", "second", and the like in the present application are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with each other, but only on the basis that they can be realized by a person of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, it should be considered that the combination does not exist and is not within the protection scope claimed by the present application.
In the following embodiments, the computer device 2 is used as the execution subject for the exemplary description.
Embodiment 1
Referring to FIG. 1, a flowchart of the steps of a sequence labeling method according to an embodiment of the present application is shown. It can be understood that the flowchart in this method embodiment is not intended to limit the order in which the steps are executed. The following exemplary description takes the computer device 2 as the execution subject. The details are as follows.
Step S100: Receive a target text sequence, and convert the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word.
Specifically, step S100 may further include:
Step S100a: Input the target text sequence into an embedding layer, which outputs a plurality of word vectors corresponding to the target text sequence; the plurality of word vectors includes at least one punctuation vector.
For example, when the received target text sequence is [Curie was born in Poland, lives in the United States], each character and special symbol needs to be converted into a word embedding vector, because a neural network can only perform numerical calculations.
Step S100b: Input the plurality of word vectors into a segmentation layer, and divide them according to the at least one punctuation vector to obtain n word vector sets corresponding to n segmentation codes.
For example, the target text sequence [Curie was born in Poland, lives in the United States] is divided into sentence A [Curie was born in Poland] and sentence B [lives in the United States]; segmentation code A is added to the first half-sentence and segmentation code B to the second.
Step S100c: Perform an encoding operation on each segmentation code by position encoding, and determine the position information encoding of each segmentation code to obtain the position vector of each word in the target text sequence.
For example, the position information encoding may be used to determine the position of each word in the target text sequence.
Step S100d: Generate the sentence vector of the target text sequence according to the word vector and the position vector of each word in the target text sequence.
Step S102: Input the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into the trained BERT model, which outputs the first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and the first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags.
For example, the n first tags may be a set of position tags and semantic tags, or a set of position tags and part-of-speech tags.
BERT is an existing pre-trained model. Its full name is Bidirectional Encoder Representations from Transformers, i.e., a bidirectional Transformer encoder; the Transformer is an architecture that relies entirely on self-attention to compute representations of its input and output. BERT aims to pre-train deep bidirectional representations by jointly conditioning on context in all layers. The pre-trained BERT can therefore be fine-tuned with an additional output layer to build state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
The BERT model can be obtained by capturing words through a masked language model (MLM) method and representing the sentence level through a "next sentence prediction" method. The masked language model randomly masks some of the input tokens, and the goal is to predict the original vocabulary id of each masked word based only on its context; unlike left-to-right language-model pre-training, this training objective lets the representation fuse context from both the left and the right, so that a deep bidirectional Transformer can be pre-trained. Next sentence prediction means that when pre-training the language model, two sentences are selected in one of two ways: either two sentences that genuinely follow each other in the corpus, or a first sentence paired with a second sentence drawn at random from the corpus; in addition to the masked language model task, the model is asked to predict whether the second sentence really follows the first.
Training the pre-trained BERT model may include: acquiring multiple training text sequences, using them as the training set of the BERT model, and feeding the training set into the pre-trained BERT model to train it and obtain a trained BERT model.
Specifically, step S102 may further include:
Step S102a: Perform feature extraction on the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word through the BERT model to obtain the first probability of each first tag for each word in the target text sequence.
Step S102b: Generate the first labeling sequence according to the first probability of each first tag of each word in the target text sequence.
Step S104: Input the first labeling sequence into the fully connected layer, which outputs the second labeling sequence, wherein the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and the second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags.
For example, the n second tags may be a set of position tags and semantic tags, or a set of position tags and part-of-speech tags.
Specifically, step S104 may further include:
Step S104a: Input the first labeling sequence into the neural network structure of the fully connected layer and perform additional feature extraction to obtain the second probability of each second tag for each word in the target text sequence; for the i-th word of the target text sequence, the additional feature extraction is computed as B_i = wX_i + b, where X_i is the vector of first probabilities of the first tags of the i-th word in the first labeling sequence, and w and b are model learning parameters.
The neural network structure of the fully connected layer of this embodiment may be a multi-layer transformer structure, which further includes an attention mechanism: after the first labeling sequence is processed by the attention mechanism, it is fed into the feedforward fully connected neural network structure for additional feature extraction, yielding the second probability of each second tag for each word in the target text sequence; that is, the second probabilities are obtained through the operation wx + b, where x is the sequence and w and b are model learning parameters.
Step S104b: Generate the second labeling sequence according to the second probability of each second tag of each word in the target text sequence.
步骤S106,将所述第二标注序列作为条件随机场CRF模型的输入序列,以通过CRF模型输出标签序列Y=(y 1,y 2,...,y m)。 Step S106: Use the second label sequence as the input sequence of the conditional random field CRF model to output the label sequence Y=(y 1 , y 2 ,..., y m ) through the CRF model.
Specifically, step S106 may further include the following sub-steps.

Step S106a: Input the second labeling sequence into the CRF model.

Step S106b: Perform Viterbi solving on the second labeling sequence through the Viterbi algorithm to obtain the optimal solution path in the second labeling sequence, where the optimal solution path is the tag sequence with the highest probability over the entire target text sequence.

Exemplarily, this step determines the output to which the target text sequence should correspond according to the probability values of the second probabilities of the second tags of each word in the target text sequence. This is implemented here through the Viterbi algorithm: rather than outputting, for each word, the tag with the highest second probability, the Viterbi algorithm outputs the labeling sequence with the highest probability over the entire target text sequence.

Exemplarily, the Viterbi algorithm may rest on the following observations: if the path with the highest overall probability passes through a certain point of the lattice (fence) network, then the sub-path from the start point to that point must also be the most probable path from the start to that point; and if there are k states at moment i, there are k shortest paths from the start to these k states, and the final shortest path must pass through one of them.

Step S106c: Generate the tag sequence according to the optimal solution path.

Exemplarily, the labeling sequence with the highest probability over the entire target text sequence is computed through the Viterbi algorithm: when computing the shortest path to the (i+1)-th state, only the shortest paths from the start to the current k state values and the shortest paths from those state values to the (i+1)-th state value need to be considered.
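The described dynamic-programming property can be illustrated with the following sketch of a Viterbi decoder over per-word tag scores and a transition matrix; the shapes and score conventions are assumptions for the sketch rather than details fixed by this application:

```python
# Sketch of Viterbi decoding: emissions[i, t] scores tag t at position i,
# transitions[s, t] scores moving from tag s to tag t.
import numpy as np

def viterbi(emissions: np.ndarray, transitions: np.ndarray) -> list[int]:
    m, k = emissions.shape
    score = emissions[0].copy()            # best score of a path ending in each tag
    backptr = np.zeros((m, k), dtype=int)  # best predecessor tag at each step
    for i in range(1, m):
        # Extend only the k best paths kept so far (the lattice property):
        # cand[s, t] = best path through tag s at i-1, then tag t at i.
        cand = score[:, None] + transitions + emissions[i][None, :]
        backptr[i] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Trace back from the best final tag to recover the optimal path.
    path = [int(score.argmax())]
    for i in range(m - 1, 0, -1):
        path.append(int(backptr[i, path[-1]]))
    return path[::-1]  # the highest-probability tag sequence for the whole text
```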
Step S108: Generate a named entity sequence according to the tag sequence, and output the named entity sequence.

Exemplarily, a named entity sequence can be generated from the tag sequence; the named entity sequence is the prediction of the labeling system for the target text sequence. The named entities include place names, person names, and the like. Sequence labeling adopts the BIOES scheme, where B marks the beginning of an entity, I the middle of an entity, O a non-entity, E the end of an entity, and S a single-word entity; each named entity tag also corresponds to an entity category and can be refined into forms such as B-place-name, i.e., the beginning of a place-name entity. Taking place names and person names as an example, the sentence "居里生于华沙" ("Curie was born in Warsaw") is split into a sequence of characters, where 居 is tagged B-person-name, 里 is tagged E-person-name, 生 is tagged O, 于 is tagged O, 华 is tagged B-place-name, and 沙 is tagged E-place-name.
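For illustration, a short sketch of how a BIOES tag sequence could be converted into named entities follows; the concrete tag strings (B-PER, E-LOC, etc.) are assumed for the example and are not fixed by this application:

```python
# Sketch: turn a BIOES tag sequence into (entity text, category) pairs.
def extract_entities(chars, tags):
    entities, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            start = i                                   # entity begins
        elif tag.startswith("E-") and start is not None:
            entities.append(("".join(chars[start:i + 1]), tag[2:]))
            start = None                                # entity ends
        elif tag.startswith("S-"):
            entities.append((chars[i], tag[2:]))        # single-word entity
        elif tag == "O":
            start = None                                # non-entity resets
    return entities

chars = list("居里生于华沙")
tags = ["B-PER", "E-PER", "O", "O", "B-LOC", "E-LOC"]
print(extract_entities(chars, tags))  # [('居里', 'PER'), ('华沙', 'LOC')]
```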
Embodiment Two

FIG. 2 is a schematic diagram of the program modules of the sequence labeling system according to Embodiment Two of this application. The sequence labeling system 20 may include, or be divided into, one or more program modules, which are stored in a storage medium and executed by one or more processors to complete this application and implement the above sequence labeling method. A program module referred to in the embodiments of this application is a series of computer-readable instruction segments capable of completing a specific function. The following description specifically introduces the functions of each program module of this embodiment.

The receiving text module 200 is configured to receive a target text sequence and convert the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word.

Exemplarily, the receiving text module 200 is further configured to: input the target text sequence into an embedding layer, and output, through the embedding layer, a plurality of word vectors corresponding to the target text sequence, the plurality of word vectors including at least one punctuation vector; input the plurality of word vectors into a segmentation layer, and segment the plurality of word vectors according to the at least one punctuation vector to obtain n word vector sets, the n word vector sets corresponding to n segmentation codes; perform an encoding operation on each segmentation code through position encoding, and determine the position information encoding of each segmentation code to obtain the position vector of each word in the target text sequence; and generate the sentence vector of the target text sequence according to the word vector of each word in the target text sequence and the position vector of each word.
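A rough sketch of the punctuation-based segmentation described above follows; the punctuation set, the per-segment codes, and the position indices are assumptions, since the application does not fix these details:

```python
# Rough sketch: assign each character a segmentation code (segment index)
# and a position index; segments are closed by punctuation characters.
PUNCTUATION = set("，。！？；")  # assumed punctuation set

def segmentation_and_positions(text):
    seg_codes, positions = [], []
    seg = 0
    for pos, ch in enumerate(text):
        seg_codes.append(seg)   # which punctuation-delimited segment ch is in
        positions.append(pos)   # position of ch in the whole sequence
        if ch in PUNCTUATION:   # a punctuation vector closes the segment
            seg += 1
    return seg_codes, positions

codes, positions = segmentation_and_positions("居里生于华沙，她是物理学家。")
```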
The first labeling module 202 is configured to input the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into the trained BERT model, and to output, through the BERT model, a first labeling sequence corresponding to the target text sequence, where the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and each first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags.

Exemplarily, the first labeling module 202 is further configured to: perform feature extraction on the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word through the BERT model to obtain the first probability of each first tag of each word in the target text sequence; and generate the first labeling sequence according to the first probability of each first tag of each word in the target text sequence.

The second labeling module 204 is configured to input the first labeling sequence into the fully connected layer and to output a second labeling sequence through the fully connected layer, where the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and each second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags.

Exemplarily, the second labeling module 204 is further configured to: input the first labeling sequence into the neural network structure of the fully connected layer and perform additional feature extraction to obtain the second probability of each second tag of each word in the target text sequence, where the operation formula of the additional feature extraction for the i-th word in the target text sequence is B_i = wX_i + b, X_i is the first probability of each first tag of the i-th word in the first labeling sequence, and w and b are learning parameters of the BERT model; and generate the second labeling sequence according to the second probability of each second tag of each word in the target text sequence.

The output label module 206 is configured to use the second labeling sequence as the input sequence of a conditional random field (CRF) model to output the tag sequence Y = (y_1, y_2, ..., y_m) through the CRF model.

Exemplarily, the output label module 206 is further configured to: input the second labeling sequence into the CRF model; perform Viterbi solving on the second labeling sequence through the Viterbi algorithm to obtain the optimal solution path in the second labeling sequence, where the optimal solution path is the tag sequence with the highest probability over the entire target text sequence; and generate the tag sequence according to the optimal solution path.

The output entity module 208 is configured to generate a named entity sequence according to the tag sequence and to output the named entity sequence.
Embodiment Three

Refer to FIG. 3, which is a schematic diagram of the hardware architecture of the computer device according to Embodiment Three of this application. In this embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server, or a server cluster composed of multiple servers). As shown in the figure, the computer device 2 at least includes, but is not limited to, a memory 21, a processor 22, a network interface 23, and the sequence labeling system 20, which can be communicatively connected to one another through a system bus.

In this embodiment, the memory 21 includes at least one type of non-volatile computer-readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, for example, a hard disk or an internal memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device 2. Of course, the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device. In this embodiment, the memory 21 is generally used to store the operating system and various application software installed on the computer device 2, for example, the program code of the sequence labeling system 20 of Embodiment Two. In addition, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is used to run the program code or process the data stored in the memory 21, for example, to run the sequence labeling system 20, so as to implement the sequence labeling method of Embodiment One.

The network interface 23 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 2 and other electronic devices. For example, the network interface 23 is used to connect the computer device 2 to an external terminal through a network, and to establish a data transmission channel, a communication connection, and the like between the computer device 2 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.

It should be pointed out that FIG. 3 only shows the computer device 2 with components 20-23, but it should be understood that not all of the illustrated components are required to be implemented; more or fewer components may be implemented instead.

In this embodiment, the sequence labeling system 20 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete this application.

For example, FIG. 2 shows a schematic diagram of the program modules implementing the sequence labeling system 20 according to Embodiment Two of this application. In that embodiment, the sequence labeling system 20 may be divided into the receiving text module 200, the first labeling module 202, the second labeling module 204, the output label module 206, and the output entity module 208. A program module referred to in this application is a series of computer-readable instruction segments capable of completing a specific function. The specific functions of the program modules 200-208 have been described in detail in Embodiment Two and will not be repeated here.
Embodiment Four

This embodiment further provides a non-volatile computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disc, a server, an app store, or the like, on which computer-readable instructions are stored; the corresponding functions are implemented when the program is executed by a processor. The non-volatile computer-readable storage medium of this embodiment is used for the sequence labeling system 20 and, when executed by a processor, implements the following steps:

receiving a target text sequence, and converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word;

inputting the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into the trained BERT model, and outputting, through the BERT model, a first labeling sequence corresponding to the target text sequence, where the first labeling sequence includes a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and each first n-dimensional vector represents the first probability that the corresponding word belongs to each of n first tags;

inputting the first labeling sequence into the fully connected layer, and outputting a second labeling sequence through the fully connected layer, where the second labeling sequence includes a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and each second n-dimensional vector represents the second probability that the corresponding word belongs to each of n second tags;

using the second labeling sequence as the input sequence of a conditional random field (CRF) model to output the tag sequence Y = (y_1, y_2, ..., y_m) through the CRF model; and

generating a named entity sequence according to the tag sequence, and outputting the named entity sequence.

The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and certainly also by hardware, but in many cases the former is the better implementation.

The above are only preferred embodiments of this application and do not limit the patent scope of this application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, shall likewise fall within the patent protection scope of this application.

Claims (20)

  1. A sequence labeling method, the method comprising:
    receiving a target text sequence, and converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
    inputting the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and outputting, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence comprises a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and each first n-dimensional vector represents a first probability that the corresponding word belongs to each of n first tags;
    inputting the first labeling sequence into a fully connected layer, and outputting a second labeling sequence through the fully connected layer, wherein the second labeling sequence comprises a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and each second n-dimensional vector represents a second probability that the corresponding word belongs to each of n second tags;
    using the second labeling sequence as an input sequence of a conditional random field (CRF) model to output a tag sequence Y = (y_1, y_2, ..., y_m) through the CRF model; and
    generating a named entity sequence according to the tag sequence, and outputting the named entity sequence.
  2. The sequence labeling method of claim 1, wherein the step of converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word comprises:
    inputting the target text sequence into an embedding layer, and outputting, through the embedding layer, a plurality of word vectors corresponding to the target text sequence, the plurality of word vectors comprising at least one punctuation vector;
    inputting the plurality of word vectors into a segmentation layer, and segmenting the plurality of word vectors according to the at least one punctuation vector to obtain n word vector sets, the n word vector sets corresponding to n segmentation codes;
    performing an encoding operation on each segmentation code through position encoding, and determining the position information encoding of each segmentation code, so as to obtain the position vector of each word in the target text sequence; and
    generating the sentence vector of the target text sequence according to the word vector of each word in the target text sequence and the position vector of each word.
  3. The sequence labeling method of claim 2, wherein the step of outputting, through the BERT model, the first labeling sequence corresponding to the target text sequence comprises:
    performing feature extraction on the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word through the BERT model to obtain the first probability of each first tag of each word in the target text sequence; and
    generating the first labeling sequence according to the first probability of each first tag of each word in the target text sequence.
  4. The sequence labeling method of claim 3, wherein the step of inputting the first labeling sequence into the fully connected layer and outputting the second labeling sequence through the fully connected layer comprises:
    inputting the first labeling sequence into the neural network structure of the fully connected layer, and performing additional feature extraction to obtain the second probability of each second tag of each word in the target text sequence, wherein the operation formula of the additional feature extraction for the i-th word in the target text sequence is B_i = wX_i + b, where X_i is the first probability of each first tag of the i-th word in the first labeling sequence, and w and b are learning parameters of the BERT model; and
    generating the second labeling sequence according to the second probability of each second tag of each word in the target text sequence.
  5. The sequence labeling method of claim 1, wherein the step of using the second labeling sequence as the input sequence of the conditional random field (CRF) model to output the tag sequence Y = (y_1, y_2, ..., y_m) through the CRF model comprises:
    inputting the second labeling sequence into the CRF model;
    performing Viterbi solving on the second labeling sequence through the Viterbi algorithm to obtain an optimal solution path in the second labeling sequence, wherein the optimal solution path is the tag sequence with the highest probability over the entire target text sequence; and
    generating the tag sequence according to the optimal solution path.
  6. A sequence labeling system, comprising:
    a receiving text module, configured to receive a target text sequence and convert the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
    a first labeling module, configured to input the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and to output, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence comprises a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and each first n-dimensional vector represents a first probability that the corresponding word belongs to each of n first tags;
    a second labeling module, configured to input the first labeling sequence into a fully connected layer and to output a second labeling sequence through the fully connected layer, wherein the second labeling sequence comprises a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and each second n-dimensional vector represents a second probability that the corresponding word belongs to each of n second tags;
    an output label module, configured to use the second labeling sequence as an input sequence of a conditional random field (CRF) model to output a tag sequence Y = (y_1, y_2, ..., y_m) through the CRF model; and
    an output entity module, configured to generate a named entity sequence according to the tag sequence and to output the named entity sequence.
  7. The sequence labeling system of claim 6, wherein the receiving text module is further configured to:
    input the target text sequence into an embedding layer, and output, through the embedding layer, a plurality of word vectors corresponding to the target text sequence, the plurality of word vectors comprising at least one punctuation vector;
    input the plurality of word vectors into a segmentation layer, and segment the plurality of word vectors according to the at least one punctuation vector to obtain n word vector sets, the n word vector sets corresponding to n segmentation codes;
    perform an encoding operation on each segmentation code through position encoding, and determine the position information encoding of each segmentation code, so as to obtain the position vector of each word in the target text sequence; and
    generate the sentence vector of the target text sequence according to the word vector of each word in the target text sequence and the position vector of each word.
  8. The sequence labeling system of claim 7, wherein the first labeling module is further configured to:
    perform feature extraction on the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word through the BERT model to obtain the first probability of each first tag of each word in the target text sequence; and
    generate the first labeling sequence according to the first probability of each first tag of each word in the target text sequence.
  9. The sequence labeling system of claim 8, wherein the second labeling module is further configured to:
    input the first labeling sequence into the neural network structure of the fully connected layer, and perform additional feature extraction to obtain the second probability of each second tag of each word in the target text sequence, wherein the operation formula of the additional feature extraction for the i-th word in the target text sequence is B_i = wX_i + b, where X_i is the first probability of each first tag of the i-th word in the first labeling sequence, and w and b are learning parameters of the BERT model; and
    generate the second labeling sequence according to the second probability of each second tag of each word in the target text sequence.
  10. The sequence labeling system of claim 6, wherein the output label module is further configured to:
    input the second labeling sequence into the CRF model;
    perform Viterbi solving on the second labeling sequence through the Viterbi algorithm to obtain an optimal solution path in the second labeling sequence, wherein the optimal solution path is the tag sequence with the highest probability over the entire target text sequence; and
    generate the tag sequence according to the optimal solution path.
  11. A computer device, comprising a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement the following steps:
    receiving a target text sequence, and converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
    inputting the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and outputting, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence comprises a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and each first n-dimensional vector represents a first probability that the corresponding word belongs to each of n first tags;
    inputting the first labeling sequence into a fully connected layer, and outputting a second labeling sequence through the fully connected layer, wherein the second labeling sequence comprises a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and each second n-dimensional vector represents a second probability that the corresponding word belongs to each of n second tags;
    using the second labeling sequence as an input sequence of a conditional random field (CRF) model to output a tag sequence Y = (y_1, y_2, ..., y_m) through the CRF model; and
    generating a named entity sequence according to the tag sequence, and outputting the named entity sequence.
  12. The computer device of claim 11, wherein the step of converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word comprises:
    inputting the target text sequence into an embedding layer, and outputting, through the embedding layer, a plurality of word vectors corresponding to the target text sequence, the plurality of word vectors comprising at least one punctuation vector;
    inputting the plurality of word vectors into a segmentation layer, and segmenting the plurality of word vectors according to the at least one punctuation vector to obtain n word vector sets, the n word vector sets corresponding to n segmentation codes;
    performing an encoding operation on each segmentation code through position encoding, and determining the position information encoding of each segmentation code, so as to obtain the position vector of each word in the target text sequence; and
    generating the sentence vector of the target text sequence according to the word vector of each word in the target text sequence and the position vector of each word.
  13. The computer device of claim 12, wherein the step of outputting, through the BERT model, the first labeling sequence corresponding to the target text sequence comprises:
    performing feature extraction on the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word through the BERT model to obtain the first probability of each first tag of each word in the target text sequence; and
    generating the first labeling sequence according to the first probability of each first tag of each word in the target text sequence.
  14. The computer device of claim 13, wherein the step of inputting the first labeling sequence into the fully connected layer and outputting the second labeling sequence through the fully connected layer comprises:
    inputting the first labeling sequence into the neural network structure of the fully connected layer, and performing additional feature extraction to obtain the second probability of each second tag of each word in the target text sequence, wherein the operation formula of the additional feature extraction for the i-th word in the target text sequence is B_i = wX_i + b, where X_i is the first probability of each first tag of the i-th word in the first labeling sequence, and w and b are learning parameters of the BERT model; and
    generating the second labeling sequence according to the second probability of each second tag of each word in the target text sequence.
  15. The computer device of claim 11, wherein the step of using the second labeling sequence as the input sequence of the conditional random field (CRF) model to output the tag sequence Y = (y_1, y_2, ..., y_m) through the CRF model comprises:
    inputting the second labeling sequence into the CRF model;
    performing Viterbi solving on the second labeling sequence through the Viterbi algorithm to obtain an optimal solution path in the second labeling sequence, wherein the optimal solution path is the tag sequence with the highest probability over the entire target text sequence; and
    generating the tag sequence according to the optimal solution path.
  16. A non-volatile computer-readable storage medium, storing computer-readable instructions that are executable by at least one processor to cause the at least one processor to perform the following steps:
    receiving a target text sequence, and converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word;
    inputting the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word into a trained BERT model, and outputting, through the BERT model, a first labeling sequence corresponding to the target text sequence, wherein the first labeling sequence comprises a plurality of first n-dimensional vectors, each first n-dimensional vector corresponds to a word in the target text sequence, and each first n-dimensional vector represents a first probability that the corresponding word belongs to each of n first tags;
    inputting the first labeling sequence into a fully connected layer, and outputting a second labeling sequence through the fully connected layer, wherein the second labeling sequence comprises a plurality of second n-dimensional vectors, each second n-dimensional vector corresponds to a word in the target text sequence, and each second n-dimensional vector represents a second probability that the corresponding word belongs to each of n second tags;
    using the second labeling sequence as an input sequence of a conditional random field (CRF) model to output a tag sequence Y = (y_1, y_2, ..., y_m) through the CRF model; and
    generating a named entity sequence according to the tag sequence, and outputting the named entity sequence.
  17. The non-volatile computer-readable storage medium of claim 16, wherein the step of converting the target text sequence into a corresponding sentence vector, a word vector of each word, and a position vector of each word comprises:
    inputting the target text sequence into an embedding layer, and outputting, through the embedding layer, a plurality of word vectors corresponding to the target text sequence, the plurality of word vectors comprising at least one punctuation vector;
    inputting the plurality of word vectors into a segmentation layer, and segmenting the plurality of word vectors according to the at least one punctuation vector to obtain n word vector sets, the n word vector sets corresponding to n segmentation codes;
    performing an encoding operation on each segmentation code through position encoding, and determining the position information encoding of each segmentation code, so as to obtain the position vector of each word in the target text sequence; and
    generating the sentence vector of the target text sequence according to the word vector of each word in the target text sequence and the position vector of each word.
  18. The non-volatile computer-readable storage medium of claim 17, wherein the step of outputting, through the BERT model, the first labeling sequence corresponding to the target text sequence comprises:
    performing feature extraction on the sentence vector of the target text sequence, the word vector of each word, and the position vector of each word through the BERT model to obtain the first probability of each first tag of each word in the target text sequence; and
    generating the first labeling sequence according to the first probability of each first tag of each word in the target text sequence.
  19. The non-volatile computer-readable storage medium of claim 18, wherein the step of inputting the first labeling sequence into the fully connected layer and outputting the second labeling sequence through the fully connected layer comprises:
    inputting the first labeling sequence into the neural network structure of the fully connected layer, and performing additional feature extraction to obtain the second probability of each second tag of each word in the target text sequence, wherein the operation formula of the additional feature extraction for the i-th word in the target text sequence is B_i = wX_i + b, where X_i is the first probability of each first tag of the i-th word in the first labeling sequence, and w and b are learning parameters of the BERT model; and
    generating the second labeling sequence according to the second probability of each second tag of each word in the target text sequence.
  20. The non-volatile computer-readable storage medium of claim 16, wherein the step of using the second labeling sequence as the input sequence of the conditional random field (CRF) model to output the tag sequence Y = (y_1, y_2, ..., y_m) through the CRF model comprises:
    inputting the second labeling sequence into the CRF model;
    performing Viterbi solving on the second labeling sequence through the Viterbi algorithm to obtain an optimal solution path in the second labeling sequence, wherein the optimal solution path is the tag sequence with the highest probability over the entire target text sequence; and
    generating the tag sequence according to the optimal solution path.