CN112115714B - Deep learning sequence labeling method, device and computer readable storage medium - Google Patents

Deep learning sequence labeling method, device and computer readable storage medium

Info

Publication number
CN112115714B
CN112115714B
Authority
CN
China
Prior art keywords
text
processed
word
deep learning
labeling
Prior art date
Legal status
Active
Application number
CN202011024360.7A
Other languages
Chinese (zh)
Other versions
CN112115714A (en)
Inventor
孙思
Current Assignee
Shenzhen Ping An Smart Healthcare Technology Co ltd
Original Assignee
Shenzhen Ping An Smart Healthcare Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Ping An Smart Healthcare Technology Co ltd
Priority to CN202011024360.7A
Publication of CN112115714A
Application granted
Publication of CN112115714B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to artificial intelligence, and provides a deep learning sequence labeling method, a device and a computer readable storage medium, wherein the method comprises the following steps: preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer to obtain a word vector of each word in the text to be processed; processing the word vectors through a bi-lstm layer to obtain text features of the text to be processed; processing the text features through a softmax layer to obtain predicted labeling positions of the text features; and processing the predicted labeling positions of the text features through a loss layer to complete the sequence labeling of the text to be processed. The method and the device improve the accuracy of sequence labeling in deep learning.

Description

Deep learning sequence labeling method, device and computer readable storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a deep learning sequence labeling method and device based on a loss function, electronic equipment and a computer readable storage medium.
Background
In machine learning and deep learning applications, data imbalance is a very common problem, especially in natural language processing tasks. For example, in a sequence labeling task such as named entity recognition with BIEOS labeling, most of the data in a text is labeled O (that is, not a named entity), so the number of O labels far exceeds the number of the other categories (B, I, E and S). Under an ordinary loss, this obvious imbalance makes the model tend towards the negative class, yet in a labeling task correctly labeling the positive examples is what matters most, which is why evaluation is generally carried out with macro-F1, an evaluation function that attends to both the precision and the recall of every label. Too many negative examples also make it hard for the model to learn the difficult samples (the samples labeled as positive examples), and under the push of the loss function the model may even forget to learn them.
Many solutions have been adopted to address this imbalance problem, for example: on the data side, sampling the data, undersampling the majority class, oversampling the minority class, generating data with SMOTE, or back-translating the data; on the loss side, manually or automatically weighting the loss of minority-class samples in the loss function; and so on. These solve, to a certain extent, the difficulty the model has in learning the hard samples caused by data imbalance.
However, these approaches to the labeling imbalance problem disregard the problems caused by the characteristics of the loss function itself; weighting the loss function is only a surface-level remedy. Fundamentally, the loss function treats every sample equally: whether a label is positive or negative, an ordinary loss (such as cross entropy) keeps pushing the predicted probability towards 1 or 0. In practice, however, to label the classification of a given word it is only necessary for the probability of the positive class to be greater than or less than 0.5; pushing the sample to the extreme of 0 or 1 deserves no extra attention. This extreme push is responsible for the model failing on imbalanced data.
In order to solve the above problems, a new loss-function-based deep learning sequence labeling method, apparatus, electronic device and computer readable storage medium are needed.
Disclosure of Invention
The invention provides a deep learning sequence labeling method, a device, electronic equipment and a computer readable storage medium, and mainly aims to improve the accuracy of sequence labeling in deep learning.
In order to achieve the above object, the present invention provides a deep learning sequence labeling method, the method comprising:
preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer to obtain a word vector of each word in the text to be processed;
processing the word vector through a bi-lstm layer to acquire text characteristics of the text to be processed;
processing the text features through a softmax layer to obtain predicted labeling positions of the text features;
and processing the predicted marking position of the text feature through a loss layer to finish the sequence marking of the text to be processed.
Optionally, the preprocessing each word in the sentence of the text to be processed by using the initialized embedding layer, and obtaining the word vector of each word in the text to be processed, includes the following steps:
and mapping each word in the sentence of the text to be processed from a one-hot vector into a low-dimensional dense word vector by using an initialized embedding layer, to obtain the word vector of each word in the text to be processed.
Optionally, the processing the word vector through the bi-lstm layer, to obtain the text feature of the text to be processed, includes the following steps:
taking a word vector of each word of a sentence in the text to be processed as the input of each time step of the bidirectional lstm;
the hidden state vector output by the forward lstm and the hidden state vector output by the reverse lstm at each position are spliced according to the positions to obtain a complete hidden state vector;
and processing the complete hidden state vector to obtain the text characteristics of the text to be processed.
Optionally, the processing the text feature through the softmax layer, and obtaining the predicted labeling position of the text feature, includes the following steps:
carrying out softmax calculation on the text characteristics, and calculating the normalized probability that each word of a sentence in the text to be processed is predicted to be a certain labeling label;
and marking the calculated maximum probability as a prediction label, and finishing classification of the position to be marked in the text to be processed.
Optionally, the performing of the softmax calculation on the text features to calculate the normalized probability that each word of the sentence in the text to be processed is predicted to be a certain labeling label includes the following steps:
the softmax calculation formula is as follows:
softmax(g_i) = exp(g_i) / Σ_k exp(g_k)
wherein i represents a certain class of the K classes, g_i represents the value of that class, and K ∈ (0, n).
Optionally, the processing, by the loss layer, the predicted labeling position of the text feature, and completing the sequence labeling of the text to be processed, includes the following steps:
calculating a loss value between the predicted labeling position and the real label position according to a loss function;
repeatedly training the deep learning model according to the loss value until the deep learning model converges, and finishing the training of the deep learning model;
and processing the text to be processed through the trained deep learning model to obtain the sequence label of the text to be processed.
Optionally, the loss value between the predicted labeling position and the real label position is calculated according to a loss function, and the specific formula of the loss function is as follows:
loss = 1 - (2·(1 - p_i1)·p_i1·y_i1 + ε) / ((1 - p_i1)·p_i1 + y_i1 + ε)
wherein 1 - p_i1 represents an automatic scaling factor; p_i1 represents the predicted probability of label 1; y_i1 is the true probability of label 1; and ε is a smoothing term.
In order to solve the above problems, the present invention further provides a deep learning sequence labeling device, which includes:
the word vector acquisition module is used for preprocessing each word in a sentence of the text to be processed by using the initialized embedding layer to acquire a word vector of each word in the text to be processed;
the text feature acquisition module is used for processing the word vector through a bi-lstm layer to acquire text features of the text to be processed;
the prediction labeling position acquisition module is used for processing the text features through the softmax layer to acquire the prediction labeling positions of the text features;
and the sequence labeling completion module is used for processing the predicted labeling position of the text characteristic through a loss layer and completing the sequence labeling of the text to be processed.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the deep learning sequence tagging method described above.
In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one instruction that is executed by a processor in an electronic device to implement the deep learning sequence labeling method described above.
According to the embodiment of the invention, each word in a sentence of a text to be processed is preprocessed through an initialized embedding layer, and a word vector of each word in the text to be processed is obtained; the word vectors are processed through a bi-lstm layer to obtain text features of the text to be processed; the text features are processed through a softmax layer to obtain predicted labeling positions of the text features; and the predicted labeling positions of the text features are processed through a loss layer to complete the sequence labeling of the text to be processed. According to the invention, the text to be processed is sequence-labeled through a loss function based on the DSC coefficient principle, and this loss function makes the deep learning model focus more on difficult samples during training, so that the labeling accuracy is improved as a whole.
Drawings
FIG. 1 is a flowchart of a deep learning sequence labeling method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a deep learning sequence labeling apparatus according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an internal structure of an electronic device for implementing a deep learning sequence labeling method according to an embodiment of the present invention;
the achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a deep learning sequence labeling method. Referring to fig. 1, a flow chart of a method for labeling a deep learning sequence according to an embodiment of the invention is shown. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
In this embodiment, the deep learning sequence labeling method includes:
S1: preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer to obtain a word vector of each word in the text to be processed;
S2: processing the word vector through a bi-lstm layer to acquire text features of the text to be processed;
S3: processing the text features through a softmax layer to obtain predicted labeling positions of the text features;
S4: processing the predicted labeling position of the text features through a loss layer to finish the sequence labeling of the text to be processed.
The invention provides a self-adaptive loss function for the imbalanced tasks that are common in natural language processing sequence labeling, built on the basic artificial intelligence process of deep learning sequence labeling. The loss function is modified with reference to the DSC coefficient principle, so that it pays more attention to difficult samples when pushing the model during training; the labeling accuracy is thereby improved as a whole, and the F1 value can be improved to a certain extent.
In an embodiment of the invention, for example in the NER (Named Entity Recognition, also called proper-name recognition, a common task in natural language processing) task, features are extracted through a model structured as an embedding layer, a bi-lstm layer and a softmax layer, and the loss layer adopts a special loss designed for the imbalanced task; this loss design enables the deep learning model to train on and learn the imbalanced problem better, so that the labeling accuracy is improved.
In step S1, the preprocessing of each word in a sentence of a text to be processed by using an initialized embedding layer to obtain a word vector of each word in the text to be processed includes the following steps: mapping each word in the sentence of the text to be processed from a one-hot vector into a low-dimensional dense word vector (character embedding) by using the initialized embedding layer, to obtain the word vector of each word in the text to be processed. Dropout is set before entering the next layer to mitigate overfitting.
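By way of a non-limiting illustration, this embedding step could be sketched as follows, assuming PyTorch as the framework (the embodiment does not prescribe one); the vocabulary size, embedding dimension and dropout rate below are illustrative placeholders:

    import torch
    import torch.nn as nn

    # Illustrative hyper-parameters (not specified in the embodiment).
    VOCAB_SIZE = 5000   # number of distinct words/characters
    EMBED_DIM = 128     # dimension of the low-dimensional dense word vector
    DROPOUT = 0.5       # dropout set before the next layer to mitigate overfitting

    # nn.Embedding is equivalent to multiplying a one-hot vector by a trainable
    # matrix, i.e. it maps each word id to a dense word vector (character embedding).
    embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
    dropout = nn.Dropout(DROPOUT)

    # token_ids: (batch_size, seq_len) integer id of each word in the sentences.
    token_ids = torch.randint(0, VOCAB_SIZE, (2, 10))
    word_vectors = dropout(embedding(token_ids))   # (batch_size, seq_len, EMBED_DIM)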
In step S2, the processing the word vector through the bi-lstm layer, to obtain the text feature of the text to be processed, includes the following steps:
s21: taking a word vector of each word of a sentence in the text to be processed as the input of each time step of the bidirectional lstm;
s22: the hidden state vector output by the forward lstm and the hidden state vector output by the reverse lstm at each position are spliced according to the positions to obtain a complete hidden state vector;
s23: and processing the complete hidden state vector to obtain the text characteristics of the text to be processed.
After dropout, a linear layer is connected that maps the hidden state vector from m dimensions to k dimensions, where k is the number of labels in the labeling set; the automatically extracted sentence features, and hence the text features of the text to be processed, are thereby obtained.
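Continuing the sketch above, the bi-lstm step and the linear mapping from m dimensions to k dimensions could look as follows; the hidden size and label count are again illustrative assumptions:

    HIDDEN_DIM = 128    # hidden size m of each lstm direction (illustrative)
    NUM_LABELS = 5      # k, e.g. the size of a B/I/E/O/S labeling set

    # bidirectional=True runs a forward lstm and a reverse lstm; at every position
    # the two hidden state vectors are concatenated, giving 2 * HIDDEN_DIM features.
    bilstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True, bidirectional=True)
    hidden2tag = nn.Linear(2 * HIDDEN_DIM, NUM_LABELS)

    hidden_states, _ = bilstm(word_vectors)              # (batch, seq_len, 2 * HIDDEN_DIM)
    text_features = hidden2tag(dropout(hidden_states))   # (batch, seq_len, NUM_LABELS)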
In step S3, the processing the text feature through the softmax layer, to obtain a predicted labeling position of the text feature, includes the following steps:
s31: carrying out softmax calculation on the text characteristics, and calculating the normalized probability that each word of a sentence in the text to be processed is predicted to be a certain labeling label;
s32: and marking the calculated maximum probability as a prediction label, and finishing classification of the position to be marked in the text to be processed.
The normalized probability that a word is predicted as each labeling label is calculated (for example 1: 0.7 and 0: 0.3), and taking the label with the maximum probability as the predicted label is equivalent to classifying the position to be labeled.
The softmax calculation formula is as follows:
softmax(g_i) = exp(g_i) / Σ_k exp(g_k)
wherein i represents a certain class of the K classes, g_i represents the value of that class, and K ∈ (0, n).
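The same computation in code, continuing the sketch above (text_features plays the role of the scores g):

    import torch.nn.functional as F

    # Normalized probability that each word is predicted as each labeling label.
    probs = F.softmax(text_features, dim=-1)   # (batch, seq_len, NUM_LABELS)

    # Taking the label with the maximum probability classifies each position to be labeled.
    predicted_tags = probs.argmax(dim=-1)      # (batch, seq_len)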
In step S4, the processing, by the loss layer, the predicted labeling position of the text feature, and completing the sequence labeling of the text to be processed, includes the following steps:
S41: calculating a loss value between the predicted labeling position and the real label position according to a loss function;
S42: repeatedly training the deep learning model according to the loss value until the deep learning model converges, completing the training of the deep learning model;
S43: processing the text to be processed through the trained deep learning model to obtain the sequence labels of the text to be processed.
Specifically, the deep learning model structure is an embedding layer + bi-lstm layer + softmax layer, all of which are commonly used in NLP tasks; an example is as follows:
Suppose there is sample data D = {(X_1, Y_1), (X_2, Y_2), … (X_n, Y_n)},
wherein X_1 = (w_11, w_12, … w_1k) is a text containing k items of data; Y_1 = (y_11, y_12, … y_1k) is the sequence labeling label of the text, y_ij ∈ {0, 1}.
X′ = embedding(X_1, X_2, … X_batchsize)   (1)
X″ = bilstm(X′)   (2)
P_0~batchsize = softmax(X″)   (3)
wherein P_0~batchsize are the normalized probabilities that (X_1, X_2, … X_batchsize) are labeled 1, respectively,
and batchsize is the size of the training batch, a hyper-parameter.
To simplify the problem, the input of one sample is denoted as X_j, and the intermediate models (1), (2) and (3) are abstracted as a whole as M. The binary probability of the i-th word in X_j obtained from the model is p = [p_i0, p_i1], where p_i0 is the probability that the i-th word in the sentence X_j is predicted as 0 and p_i1 is the probability that it is predicted as 1. The corresponding true label is y = [y_i0, y_i1], where y_i0 is the probability that the i-th word in the sentence X_j is truly labeled 0 and y_i1 is the probability that it is labeled 1; y_i0 and y_i1 take the value 0 or 1, and p_i0 + p_i1 = 1.
It should be noted that the conventional cross entropy loss for the i-th word is:
CE = -(y_i0·log(p_i0) + y_i1·log(p_i1))
The weighted loss function is:
Weighted CE = -α_i·(y_i0·log(p_i0) + y_i1·log(p_i1))
wherein α_i ∈ [0, 1], and α_i comes either from a hyper-parameter or from the class statistics computed before training.
The deep learning model computes the loss through Weighted CE, the model parameters are adjusted after back propagation, and the model converges after multiple iterations.
Compared with the traditional cross entropy loss, the invention modifies the loss as follows:
DSC (the Sorensen-Dice coefficient) is used to measure the similarity of two sets. For example, given two sets A and B, their degree of similarity can be measured by the DSC formula:
DSC(A, B) = 2|A ∩ B| / (|A| + |B|)
In a natural language processing task, the similarity of the predicted labels and the real labels can likewise be measured with reference to DSC as follows:
DSC = 2TP / (2TP + FN + FP)
wherein FN (False Negative) means determined to be a negative sample but in fact a positive sample; FP (False Positive) means determined to be a positive sample but in fact a negative sample; TN (True Negative) means determined to be a negative sample and in fact a negative sample; TP (True Positive) means determined to be a positive sample and in fact a positive sample.
From the above it can be seen that, in a natural language processing task, DSC over the labels is essentially F1, so optimizing DSC is optimizing F1, and the loss for the i-th word can accordingly be improved as:
loss = 1 - 2·p_i1·y_i1 / (p_i1 + y_i1)
In order for the negative examples (labeled 0) to also contribute to the loss, the loss is smoothed as follows:
loss = 1 - (2·p_i1·y_i1 + ε) / (p_i1 + y_i1 + ε)
After the smoothing term ε is added, it has to be set manually for different data sets and can be influenced by the majority class, so that the whole model becomes dominated by the majority class; a self-adjusting term is therefore added, giving the formula:
loss = 1 - (2·(1 - p_i1)·p_i1·y_i1 + ε) / ((1 - p_i1)·p_i1 + y_i1 + ε)
wherein (1 - p_i1) acts as an automatic scaling factor that guides the loss: the part of p_i1 exceeding 0.5 brings no further gain to the loss, so the model is not forced to learn towards 0 or 1, which helps solve the imbalance problem.
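A minimal sketch of this self-adjusting DSC loss, written per token for the binary case described above (p1 is the predicted probability of label 1, y1 the true 0/1 label); the default value of the smoothing term ε is an assumption:

    def self_adjusting_dice_loss(p1: torch.Tensor,
                                 y1: torch.Tensor,
                                 eps: float = 1.0) -> torch.Tensor:
        """loss = 1 - (2*(1 - p1)*p1*y1 + eps) / ((1 - p1)*p1 + y1 + eps).

        p1: predicted probability of label 1, shape (N,)
        y1: true label in {0, 1} as a float tensor, shape (N,)
        (1 - p1) is the automatic scaling factor, eps the smoothing term.
        """
        scaled = (1.0 - p1) * p1                              # self-adjusting term
        dsc = (2.0 * scaled * y1 + eps) / (scaled + y1 + eps)
        return (1.0 - dsc).mean()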
In the embodiment of the invention, the traditional Weighted CE is replaced by the DSC loss; model parameters are adjusted after back propagation, and the model converges after multiple iterations.
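Continuing the earlier sketches, training with the DSC loss in place of Weighted CE could be arranged as follows; the optimizer, learning rate and iteration count are assumptions, and label index 1 is treated as the positive class of the binary case above purely for illustration:

    params = (list(embedding.parameters()) + list(bilstm.parameters())
              + list(hidden2tag.parameters()))
    optimizer = torch.optim.Adam(params, lr=1e-3)

    def forward(ids: torch.Tensor) -> torch.Tensor:
        x = dropout(embedding(ids))              # formula (1): embedding
        h, _ = bilstm(x)                         # formula (2): bilstm
        return F.softmax(hidden2tag(h), dim=-1)  # formula (3): softmax

    # true_tags: (batch, seq_len) 0/1 ground-truth labels for the binary case.
    true_tags = torch.randint(0, 2, token_ids.shape).float()

    for step in range(100):                      # iterate until the model converges
        probs = forward(token_ids)
        loss = self_adjusting_dice_loss(probs[..., 1].reshape(-1),
                                        true_tags.reshape(-1))
        optimizer.zero_grad()
        loss.backward()                          # back propagation
        optimizer.step()                         # adjust the model parameters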
The initialized embedding layer preprocesses each word in a sentence of a text to be processed to obtain a word vector of each word in the text to be processed; the word vectors are processed through a bi-lstm layer to obtain text features of the text to be processed; the text features are processed through a softmax layer to obtain predicted labeling positions of the text features; and the predicted labeling positions of the text features are processed through a loss layer to complete the sequence labeling of the text to be processed. According to the invention, the text to be processed is sequence-labeled through a loss function based on the DSC coefficient principle, and this loss function makes the deep learning model focus more on difficult samples during training, so that the labeling accuracy is improved as a whole.
Fig. 2 is a functional block diagram of the deep learning sequence labeling device according to the present invention.
The deep learning sequence labeling device 100 of the present invention may be installed in an electronic device. The deep learning sequence labeling device may include a word vector acquisition module 101, a text feature acquisition module 102, a prediction labeling position acquisition module 103, and a sequence labeling completion module 104 according to the implemented functions. The module of the invention, which may also be referred to as a unit, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
a word vector obtaining module 101, configured to preprocess each word in a sentence of a text to be processed by using an initialized embedding layer, and obtain a word vector of each word in the text to be processed;
a text feature obtaining module 102, configured to process the word vector through a bi-lstm layer, and obtain text features of the text to be processed;
the predicted labeling position obtaining module 103 is configured to process the text feature through a softmax layer, and obtain a predicted labeling position of the text feature;
and the sequence labeling completion module 104 is used for processing the predicted labeling position of the text feature through a loss layer to complete the sequence labeling of the text to be processed.
In the word vector obtaining module 101, the preprocessing of each word in a sentence of a text to be processed by using an initialized embedding layer to obtain a word vector of each word in the text to be processed includes the following steps: mapping each word in the sentence of the text to be processed from a one-hot vector into a low-dimensional dense word vector (character embedding) by using the initialized embedding layer, to obtain the word vector of each word in the text to be processed. Dropout is set before entering the next layer to mitigate overfitting.
In the text feature obtaining module 102, the processing the word vector through the bi-lstm layer, to obtain the text feature of the text to be processed, includes the following steps:
taking a word vector of each word of a sentence in the text to be processed as the input of each time step of the bidirectional lstm;
the hidden state vector output by the forward lstm and the hidden state vector output by the reverse lstm at each position are spliced according to the positions to obtain a complete hidden state vector;
and processing the complete hidden state vector to obtain the text characteristics of the text to be processed.
After dropout, a linear layer is connected that maps the hidden state vector from m dimensions to k dimensions, where k is the number of labels in the labeling set; the automatically extracted sentence features, and hence the text features of the text to be processed, are thereby obtained.
The predicted labeling position obtaining module 103 processes the text feature through a softmax layer to obtain a predicted labeling position of the text feature, and includes the following steps:
carrying out softmax calculation on the text characteristics, and calculating the normalized probability that each word of a sentence in the text to be processed is predicted to be a certain labeling label;
and marking the calculated maximum probability as a prediction label, and finishing classification of the position to be marked in the text to be processed.
The normalized probability that a word is predicted as each labeling label is calculated (for example 1: 0.7 and 0: 0.3), and taking the label with the maximum probability as the predicted label is equivalent to classifying the position to be labeled.
The softmax calculation formula is as follows:
softmax(g_i) = exp(g_i) / Σ_k exp(g_k)
wherein i represents a certain class of the K classes, g_i represents the value of that class, and K ∈ (0, n).
In the sequence labeling completion module 104, the processing, through the loss layer, the predicted labeling position of the text feature, and completing the sequence labeling of the text to be processed, includes the following steps:
S41: calculating a loss value between the predicted labeling position and the real label position according to a loss function;
S42: repeatedly training the deep learning model according to the loss value until the deep learning model converges, completing the training of the deep learning model;
S43: processing the text to be processed through the trained deep learning model to obtain the sequence labels of the text to be processed.
Specifically, the deep learning model structure is an embedding layer + bi-lstm layer + softmax layer, all of which are commonly used in NLP tasks; an example is as follows:
Suppose there is sample data D = {(X_1, Y_1), (X_2, Y_2), … (X_n, Y_n)},
wherein X_1 = (w_11, w_12, … w_1k) is a text containing k items of data; Y_1 = (y_11, y_12, … y_1k) is the sequence labeling label of the text, y_ij ∈ {0, 1}.
X′ = embedding(X_1, X_2, … X_batchsize)   (1)
X″ = bilstm(X′)   (2)
P_0~batchsize = softmax(X″)   (3)
wherein P_0~batchsize are the normalized probabilities that (X_1, X_2, … X_batchsize) are labeled 1, respectively, and batchsize is the size of the training batch, a hyper-parameter.
To simplify the problem, the input of one sample is denoted as X_j, and the intermediate models (1), (2) and (3) are abstracted as a whole as M. The binary probability of the i-th word in X_j obtained from the model is p = [p_i0, p_i1], where p_i0 is the probability that the i-th word in the sentence X_j is predicted as 0 and p_i1 is the probability that it is predicted as 1. The corresponding true label is y = [y_i0, y_i1], where y_i0 is the probability that the i-th word in the sentence X_j is truly labeled 0 and y_i1 is the probability that it is labeled 1; y_i0 and y_i1 take the value 0 or 1, and p_i0 + p_i1 = 1.
The invention modifies loss as follows:
DSC (the Sorensen-Dice coefficient) is used to measure the similarity of two sets. For example, given two sets A and B, their degree of similarity can be measured by the DSC formula:
DSC(A, B) = 2|A ∩ B| / (|A| + |B|)
In a natural language processing task, the similarity of the predicted labels and the real labels can likewise be measured with reference to DSC as follows:
DSC = 2TP / (2TP + FN + FP)
wherein FN (False Negative) means determined to be a negative sample but in fact a positive sample; FP (False Positive) means determined to be a positive sample but in fact a negative sample; TN (True Negative) means determined to be a negative sample and in fact a negative sample; TP (True Positive) means determined to be a positive sample and in fact a positive sample.
From the above it can be seen that, in a natural language processing task, DSC over the labels is essentially F1, so optimizing DSC is optimizing F1, and the loss for the i-th word can accordingly be improved as:
loss = 1 - 2·p_i1·y_i1 / (p_i1 + y_i1)
In order for the negative examples (labeled 0) to also contribute to the loss, the loss is smoothed as follows:
loss = 1 - (2·p_i1·y_i1 + ε) / (p_i1 + y_i1 + ε)
After the smoothing term ε is added, it has to be set manually for different data sets and can be influenced by the majority class, so that the whole model becomes dominated by the majority class; a self-adjusting term is therefore added, giving the formula:
loss = 1 - (2·(1 - p_i1)·p_i1·y_i1 + ε) / ((1 - p_i1)·p_i1 + y_i1 + ε)
wherein (1 - p_i1) acts as an automatic scaling factor that guides the loss: the part of p_i1 exceeding 0.5 brings no further gain to the loss, so the model is not forced to learn towards 0 or 1, which helps solve the imbalance problem.
In the embodiment of the invention, the traditional Weighted CE is replaced by the DSC loss; model parameters are adjusted after back propagation, and the model converges after multiple iterations.
The initialized embedding layer preprocesses each word in a sentence of a text to be processed to obtain a word vector of each word in the text to be processed; the word vectors are processed through a bi-lstm layer to obtain text features of the text to be processed; the text features are processed through a softmax layer to obtain predicted labeling positions of the text features; and the predicted labeling positions of the text features are processed through a loss layer to complete the sequence labeling of the text to be processed. According to the invention, the text to be processed is sequence-labeled through a loss function based on the DSC coefficient principle, and this loss function makes the deep learning model focus more on difficult samples during training, so that the labeling accuracy is improved as a whole.
Fig. 3 is a schematic structural diagram of an electronic device for implementing deep learning sequence labeling according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program stored in the memory 11 and executable on the processor 10, such as a deep learning sequence annotation program 12.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of data auditing programs, etc., but also for temporarily storing data that has been output or is to be output.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and executes various functions of the electronic device 1 and processes data by running or executing programs or modules (e.g., data auditing programs, etc.) stored in the memory 11, and calling data stored in the memory 11.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 3 shows only an electronic device with some of its components; it will be understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, which may comprise fewer or more components than shown, combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and are not limited to this configuration in the scope of the patent application.
The deep learning sequence annotation program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer, and obtaining a word vector of each word in the text to be processed;
processing the word vector through a bi-lstm layer to acquire text characteristics of the text to be processed;
processing the text features through a softmax layer to obtain predicted labeling positions of the text features;
and processing the predicted marking position of the text feature through a loss layer to finish the sequence marking of the text to be processed.
Specifically, the specific implementation method of the above instructions by the processor 10 may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
Further, the modules/units integrated in the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM).
In an embodiment of the present invention, a computer readable storage medium stores a computer program, which when executed by a processor, implements a deep learning sequence labeling method, and the specific method is as follows:
preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer, and obtaining a word vector of each word in the text to be processed;
processing the word vector through a bi-lstm layer to acquire text characteristics of the text to be processed;
processing the text features through a softmax layer to obtain predicted labeling positions of the text features;
and processing the predicted marking position of the text feature through a loss layer to finish the sequence marking of the text to be processed.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (6)

1. A method for labeling a deep learning sequence, the method comprising:
preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer, and obtaining a word vector of each word in the text to be processed;
processing the word vector through a bi-lstm layer to acquire text characteristics of the text to be processed;
processing the text features through a softmax layer to obtain predicted labeling positions of the text features, wherein the method comprises the following steps:
and carrying out softmax calculation on the text features, and calculating the normalized probability that each word of a sentence in the text to be processed is predicted to be a certain labeling label, wherein the softmax calculation formula is as follows:
softmax(g_i) = exp(g_i) / Σ_k exp(g_k)
wherein i represents a certain class of the K classes, g_i represents the value of that class, and K ∈ (0, n);
marking the calculated maximum probability as a prediction label, and completing classification of the position to be marked in the text to be processed;
processing the predicted marking position of the text feature through a loss layer to finish the sequence marking of the text to be processed, and comprising the following steps:
calculating the loss value between the predicted labeling position and the real label position according to a loss function, wherein the specific formula of the loss function is as follows:
loss = 1 - (2·(1 - p_i1)·p_i1·y_i1 + ε) / ((1 - p_i1)·p_i1 + y_i1 + ε)
wherein 1 - p_i1 represents an automatic scaling factor; p_i1 represents the predicted probability of label 1; y_i1 is the true probability of label 1; and ε is a smoothing term;
repeatedly training the deep learning model according to the loss value until the deep learning model converges, and finishing the training of the deep learning model;
and processing the text to be processed through the trained deep learning model to obtain the sequence label of the text to be processed.
2. The deep learning sequence labeling method as set forth in claim 1, wherein the preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer to obtain a word vector of each word in the text to be processed comprises the following steps:
and mapping each word in the sentence of the text to be processed from a one-hot vector into a low-dimensional dense word vector by using an initialized embedding layer to obtain the word vector of each word in the text to be processed.
3. The deep learning sequence labeling method according to claim 1, wherein the processing the word vector through the bi-lstm layer to obtain the text feature of the text to be processed comprises the following steps:
taking a word vector of each word of a sentence in the text to be processed as the input of each time step of the bidirectional lstm;
the hidden state vector output by the forward lstm and the hidden state vector output by the reverse lstm at each position are spliced according to the positions to obtain a complete hidden state vector;
and processing the complete hidden state vector to obtain the text characteristics of the text to be processed.
4. A deep learning sequence annotation device, the device comprising:
the word vector acquisition module is used for preprocessing each word in a sentence of the text to be processed by using the initialized embedding layer to acquire a word vector of each word in the text to be processed;
the text feature acquisition module is used for processing the word vector through a bi-lstm layer to acquire text features of the text to be processed;
the predicted labeling position obtaining module is used for processing the text features through a softmax layer to obtain the predicted labeling positions of the text features, and comprises the following steps:
and carrying out softmax calculation on the text features, and calculating the normalized probability that each word of a sentence in the text to be processed is predicted to be a certain labeling label, wherein the softmax calculation formula is as follows:
softmax(g_i) = exp(g_i) / Σ_k exp(g_k)
wherein i represents a certain class of the K classes, g_i represents the value of that class, and K ∈ (0, n);
marking the calculated maximum probability as a prediction label, and completing classification of the position to be marked in the text to be processed;
the sequence marking completion module is used for processing the predicted marking position of the text feature through a loss layer to complete the sequence marking of the text to be processed, and comprises the following steps:
calculating the loss value between the predicted labeling position and the real label position according to a loss function, wherein the specific formula of the loss function is as follows:
loss = 1 - (2·(1 - p_i1)·p_i1·y_i1 + ε) / ((1 - p_i1)·p_i1 + y_i1 + ε)
wherein 1 - p_i1 represents an automatic scaling factor; p_i1 represents the predicted probability of label 1; y_i1 is the true probability of label 1; and ε is a smoothing term;
repeatedly training the deep learning model according to the loss value until the deep learning model converges, and finishing the training of the deep learning model;
and processing the text to be processed through the trained deep learning model to obtain the sequence label of the text to be processed.
5. An electronic device, the electronic device comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the deep learning sequence labeling method of any of claims 1-3.
6. A computer-readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the deep learning sequence labeling method of any one of claims 1 to 3.
CN202011024360.7A 2020-09-25 2020-09-25 Deep learning sequence labeling method, device and computer readable storage medium Active CN112115714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011024360.7A CN112115714B (en) 2020-09-25 2020-09-25 Deep learning sequence labeling method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011024360.7A CN112115714B (en) 2020-09-25 2020-09-25 Deep learning sequence labeling method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112115714A CN112115714A (en) 2020-12-22
CN112115714B true CN112115714B (en) 2023-08-18

Family

ID=73797740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011024360.7A Active CN112115714B (en) 2020-09-25 2020-09-25 Deep learning sequence labeling method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112115714B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528658B (en) * 2020-12-24 2023-07-25 北京百度网讯科技有限公司 Hierarchical classification method, hierarchical classification device, electronic equipment and storage medium
CN113378925B (en) * 2021-06-10 2022-09-20 杭州芯声智能科技有限公司 Method and device for generating double attention training sequence and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106601228A (en) * 2016-12-09 2017-04-26 百度在线网络技术(北京)有限公司 Sample marking method and device based on artificial intelligence prosody prediction
CN109614614A (en) * 2018-12-03 2019-04-12 焦点科技股份有限公司 A kind of BILSTM-CRF name of product recognition methods based on from attention
CN110472235A (en) * 2019-07-22 2019-11-19 北京航天云路有限公司 A kind of end-to-end entity relationship joint abstracting method towards Chinese text
CN110705293A (en) * 2019-08-23 2020-01-17 中国科学院苏州生物医学工程技术研究所 Electronic medical record text named entity recognition method based on pre-training language model
CN111476024A (en) * 2020-02-29 2020-07-31 新华三大数据技术有限公司 Text word segmentation method and device and model training method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106601228A (en) * 2016-12-09 2017-04-26 百度在线网络技术(北京)有限公司 Sample marking method and device based on artificial intelligence prosody prediction
CN109614614A (en) * 2018-12-03 2019-04-12 焦点科技股份有限公司 A kind of BILSTM-CRF name of product recognition methods based on from attention
CN110472235A (en) * 2019-07-22 2019-11-19 北京航天云路有限公司 A kind of end-to-end entity relationship joint abstracting method towards Chinese text
CN110705293A (en) * 2019-08-23 2020-01-17 中国科学院苏州生物医学工程技术研究所 Electronic medical record text named entity recognition method based on pre-training language model
CN111476024A (en) * 2020-02-29 2020-07-31 新华三大数据技术有限公司 Text word segmentation method and device and model training method

Also Published As

Publication number Publication date
CN112115714A (en) 2020-12-22

Similar Documents

Publication Publication Date Title
CN109388807B (en) Method, device and storage medium for identifying named entities of electronic medical records
CN107943847B (en) Business connection extracting method, device and storage medium
CN110362823B (en) Training method and device for descriptive text generation model
CN112115714B (en) Deep learning sequence labeling method, device and computer readable storage medium
CN109918568B (en) Personalized learning method and device, electronic equipment and storage medium
CN112699686B (en) Semantic understanding method, device, equipment and medium based on task type dialogue system
WO2021208696A1 (en) User intention analysis method, apparatus, electronic device, and computer storage medium
CN113807973B (en) Text error correction method, apparatus, electronic device and computer readable storage medium
CN110377902B (en) Training method and device for descriptive text generation model
CN112069319A (en) Text extraction method and device, computer equipment and readable storage medium
CN111563158B (en) Text ranking method, ranking apparatus, server and computer-readable storage medium
WO2023123926A1 (en) Artificial intelligence task processing method and apparatus, electronic device, and readable storage medium
CN114298050A (en) Model training method, entity relation extraction method, device, medium and equipment
CN112000778A (en) Natural language processing method, device and system based on semantic recognition
CN115238115A (en) Image retrieval method, device and equipment based on Chinese data and storage medium
CN115221276A (en) Chinese image-text retrieval model training method, device, equipment and medium based on CLIP
CN114840684A (en) Map construction method, device and equipment based on medical entity and storage medium
CN113870846A (en) Speech recognition method, device and storage medium based on artificial intelligence
CN116226443B (en) Weak supervision video clip positioning method and system based on large-scale video corpus
CN109977400B (en) Verification processing method and device, computer storage medium and terminal
WO2023178979A1 (en) Question labeling method and apparatus, electronic device and storage medium
CN110737812A (en) search engine user satisfaction evaluation method integrating semi-supervised learning and active learning
CN115510188A (en) Text keyword association method, device, equipment and storage medium
WO2022141838A1 (en) Model confidence analysis method and apparatus, electronic device and computer storage medium
CN113342940A (en) Text matching analysis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221014

Address after: 518066 2601 (Unit 07), Qianhai Free Trade Building, No. 3048, Xinghai Avenue, Nanshan Street, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong, China

Applicant after: Shenzhen Ping An Smart Healthcare Technology Co.,Ltd.

Address before: 1-34 / F, Qianhai free trade building, 3048 Xinghai Avenue, Mawan, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong 518000

Applicant before: Ping An International Smart City Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant