CN112115714B - Deep learning sequence labeling method, device and computer readable storage medium - Google Patents

Deep learning sequence labeling method, device and computer readable storage medium

Info

Publication number
CN112115714B
CN112115714B
Authority
CN
China
Prior art keywords
text
processed
word
deep learning
labeling
Prior art date
Legal status
Active
Application number
CN202011024360.7A
Other languages
Chinese (zh)
Other versions
CN112115714A (en)
Inventor
孙思
Current Assignee
Shenzhen Ping An Smart Healthcare Technology Co ltd
Original Assignee
Shenzhen Ping An Smart Healthcare Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Ping An Smart Healthcare Technology Co ltd
Priority to CN202011024360.7A
Publication of CN112115714A
Application granted
Publication of CN112115714B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to artificial intelligence, and provides a deep learning sequence labeling method, a device and a computer readable storage medium, wherein the method comprises the following steps: preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer to obtain a word vector of each word in the text to be processed; processing the word vectors through a bi-lstm layer to obtain text features of the text to be processed; processing the text features through a softmax layer to obtain predicted labeling positions of the text features; and processing the predicted labeling positions of the text features through a loss layer to complete the sequence labeling of the text to be processed. The method and the device improve the accuracy of sequence labeling in deep learning.

Description

Deep learning sequence labeling method, device and computer readable storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a deep learning sequence labeling method and device based on a loss function, electronic equipment and a computer readable storage medium.
Background
In machine learning and deep learning applications, data imbalance is a very common problem, especially in natural language processing tasks. For example, in a sequence labeling task such as named entity recognition with BIEOS labeling, most of the data in a text is labeled O (that is, not a named entity), so the number of O labels far exceeds the number of the other categories (B, I, E and S). Under an ordinary loss, this obvious imbalance makes the model tend towards the negative class, yet in a labeling task correctly labeling the positive examples is what matters most, which is why evaluation is generally carried out with macro-F1, an evaluation function that attends to both the precision and the recall of every label. Too many negative examples also make it hard for the model to learn the difficult samples (the samples labeled as positive examples), and under the push of the loss function the model may even forget to learn them.
Many solutions have been adopted to address this imbalance problem, for example: on the data side, sampling the data, undersampling the majority class, oversampling the minority class, generating data with SMOTE, or back-translating the data; on the loss side, manually or automatically weighting the loss of minority-class samples in the loss function; and so on. These solve, to a certain extent, the difficulty the model has in learning the hard samples caused by data imbalance.
However, these approaches to the labeling imbalance problem disregard the problems caused by the characteristics of the loss function itself; weighting the loss function is only a surface-level remedy. Fundamentally, the loss function treats every sample equally: whether a label is positive or negative, an ordinary loss (such as cross entropy) keeps pushing the predicted probability towards 1 or 0. In practice, however, to label the classification of a given word it is only necessary for the probability of the positive class to be greater than or less than 0.5; pushing the sample to the extreme of 0 or 1 deserves no extra attention. This extreme push is responsible for the model failing on imbalanced data.
In order to solve the above problems, a new loss-function-based deep learning sequence labeling method, apparatus, electronic device and computer readable storage medium are needed.
Disclosure of Invention
The invention provides a deep learning sequence labeling method, a device, electronic equipment and a computer readable storage medium, and mainly aims to improve the accuracy of sequence labeling in deep learning.
In order to achieve the above object, the present invention provides a deep learning sequence labeling method, the method comprising:
preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer to obtain a word vector of each word in the text to be processed;
processing the word vector through a bi-lstm layer to acquire text characteristics of the text to be processed;
processing the text features through a softmax layer to obtain predicted labeling positions of the text features;
and processing the predicted marking position of the text feature through a loss layer to finish the sequence marking of the text to be processed.
Optionally, the preprocessing each word in the sentence of the text to be processed by using the initialized embedding layer, and obtaining the word vector of each word in the text to be processed, includes the following steps:
and mapping each word in the sentence of the text to be processed from a one-hot vector into a low-dimensional dense word vector by using an initialized embedding layer, to obtain the word vector of each word in the text to be processed.
Optionally, the processing the word vector through the bi-lstm layer, to obtain the text feature of the text to be processed, includes the following steps:
taking a word vector of each word of a sentence in the text to be processed as the input of each time step of the bidirectional lstm;
the hidden state vector output by the forward lstm and the hidden state vector output by the reverse lstm at each position are spliced according to the positions to obtain a complete hidden state vector;
and processing the complete hidden state vector to obtain the text characteristics of the text to be processed.
Optionally, the processing the text feature through the softmax layer, and obtaining the predicted labeling position of the text feature, includes the following steps:
carrying out softmax calculation on the text characteristics, and calculating the normalized probability that each word of a sentence in the text to be processed is predicted to be a certain labeling label;
and marking the calculated maximum probability as a prediction label, and finishing classification of the position to be marked in the text to be processed.
Optionally, the performing of the softmax calculation on the text features to calculate the normalized probability that each word of the sentence in the text to be processed is predicted to be a certain labeling label includes the following steps:
the softmax calculation formula is as follows:
softmax(g_i) = exp(g_i) / Σ_k exp(g_k)
wherein i represents a certain class of the K classes, g_i represents the value of that class, and K ∈ (0, n).
Optionally, the processing, by the loss layer, the predicted labeling position of the text feature, and completing the sequence labeling of the text to be processed, includes the following steps:
calculating a loss value between the predicted labeling position and the real label position according to a loss function;
repeatedly training the deep learning model according to the loss value until the deep learning model converges, and finishing the training of the deep learning model;
and processing the text to be processed through the trained deep learning model to obtain the sequence label of the text to be processed.
Optionally, the loss value between the predicted labeling position and the real label position is calculated according to a loss function, and the specific formula of the loss function is as follows:
loss = 1 - (2·(1 - p_i1)·p_i1·y_i1 + ε) / ((1 - p_i1)·p_i1 + y_i1 + ε)
wherein 1 - p_i1 represents an automatic scaling factor; p_i1 represents the predicted probability of label 1; y_i1 is the true probability of label 1; and ε is a smoothing term.
In order to solve the above problems, the present invention further provides a deep learning sequence labeling device, which includes:
the word vector acquisition module is used for preprocessing each word in a sentence of the text to be processed by using the initialized embedding layer to acquire a word vector of each word in the text to be processed;
the text feature acquisition module is used for processing the word vector through a bi-lstm layer to acquire text features of the text to be processed;
the prediction labeling position acquisition module is used for processing the text features through the softmax layer to acquire the prediction labeling positions of the text features;
and the sequence labeling completion module is used for processing the predicted labeling position of the text characteristic through a loss layer and completing the sequence labeling of the text to be processed.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the deep learning sequence tagging method described above.
In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one instruction that is executed by a processor in an electronic device to implement the deep learning sequence labeling method described above.
According to the embodiment of the invention, each word in a sentence of a text to be processed is preprocessed through an initialized embedding layer, and a word vector of each word in the text to be processed is obtained; the word vectors are processed through a bi-lstm layer to obtain text features of the text to be processed; the text features are processed through a softmax layer to obtain predicted labeling positions of the text features; and the predicted labeling positions of the text features are processed through a loss layer to complete the sequence labeling of the text to be processed. According to the invention, the text to be processed is sequence-labeled through a loss function based on the DSC coefficient principle, and this loss function makes the deep learning model focus more on difficult samples during training, so that the labeling accuracy is improved as a whole.
Drawings
FIG. 1 is a flowchart of a deep learning sequence labeling method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a deep learning sequence labeling apparatus according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an internal structure of an electronic device for implementing a deep learning sequence labeling method according to an embodiment of the present invention;
the achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a deep learning sequence labeling method. Referring to fig. 1, a flow chart of a method for labeling a deep learning sequence according to an embodiment of the invention is shown. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
In this embodiment, the deep learning sequence labeling method includes:
S1: preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer to obtain a word vector of each word in the text to be processed;
S2: processing the word vector through a bi-lstm layer to acquire text features of the text to be processed;
S3: processing the text features through a softmax layer to obtain predicted labeling positions of the text features;
S4: processing the predicted labeling position of the text features through a loss layer to finish the sequence labeling of the text to be processed.
The invention provides a self-adaptive loss function for the imbalanced tasks that are common in natural language processing sequence labeling, built on the basic artificial intelligence process of deep learning sequence labeling. The loss function is modified with reference to the DSC coefficient principle, so that it pays more attention to difficult samples when pushing the model during training; the labeling accuracy is thereby improved as a whole, and the F1 value can be improved to a certain extent.
In an embodiment of the invention, for example in the NER (Named Entity Recognition, also called proper-name recognition, a common task in natural language processing) task, features are extracted through a model structured as an embedding layer, a bi-lstm layer and a softmax layer, and the loss layer adopts a special loss designed for the imbalanced task; this loss design enables the deep learning model to train on and learn the imbalanced problem better, so that the labeling accuracy is improved.
In step S1, the preprocessing of each word in a sentence of a text to be processed by using an initialized embedding layer to obtain a word vector of each word in the text to be processed includes the following steps: mapping each word in the sentence of the text to be processed from a one-hot vector into a low-dimensional dense word vector (character embedding) by using the initialized embedding layer, to obtain the word vector of each word in the text to be processed. Dropout is set before entering the next layer to mitigate overfitting.
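By way of a non-limiting illustration, this embedding step could be sketched as follows, assuming PyTorch as the framework (the embodiment does not prescribe one); the vocabulary size, embedding dimension and dropout rate below are illustrative placeholders:

    import torch
    import torch.nn as nn

    # Illustrative hyper-parameters (not specified in the embodiment).
    VOCAB_SIZE = 5000   # number of distinct words/characters
    EMBED_DIM = 128     # dimension of the low-dimensional dense word vector
    DROPOUT = 0.5       # dropout set before the next layer to mitigate overfitting

    # nn.Embedding is equivalent to multiplying a one-hot vector by a trainable
    # matrix, i.e. it maps each word id to a dense word vector (character embedding).
    embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
    dropout = nn.Dropout(DROPOUT)

    # token_ids: (batch_size, seq_len) integer id of each word in the sentences.
    token_ids = torch.randint(0, VOCAB_SIZE, (2, 10))
    word_vectors = dropout(embedding(token_ids))   # (batch_size, seq_len, EMBED_DIM)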
In step S2, the processing the word vector through the bi-lstm layer, to obtain the text feature of the text to be processed, includes the following steps:
s21: taking a word vector of each word of a sentence in the text to be processed as the input of each time step of the bidirectional lstm;
s22: the hidden state vector output by the forward lstm and the hidden state vector output by the reverse lstm at each position are spliced according to the positions to obtain a complete hidden state vector;
s23: and processing the complete hidden state vector to obtain the text characteristics of the text to be processed.
After dropout, a linear layer is connected that maps the hidden state vector from m dimensions to k dimensions, where k is the number of labels in the labeling set; the automatically extracted sentence features, and hence the text features of the text to be processed, are thereby obtained.
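Continuing the sketch above, the bi-lstm step and the linear mapping from m dimensions to k dimensions could look as follows; the hidden size and label count are again illustrative assumptions:

    HIDDEN_DIM = 128    # hidden size m of each lstm direction (illustrative)
    NUM_LABELS = 5      # k, e.g. the size of a B/I/E/O/S labeling set

    # bidirectional=True runs a forward lstm and a reverse lstm; at every position
    # the two hidden state vectors are concatenated, giving 2 * HIDDEN_DIM features.
    bilstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True, bidirectional=True)
    hidden2tag = nn.Linear(2 * HIDDEN_DIM, NUM_LABELS)

    hidden_states, _ = bilstm(word_vectors)              # (batch, seq_len, 2 * HIDDEN_DIM)
    text_features = hidden2tag(dropout(hidden_states))   # (batch, seq_len, NUM_LABELS)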
In step S3, the processing the text feature through the softmax layer, to obtain a predicted labeling position of the text feature, includes the following steps:
s31: carrying out softmax calculation on the text characteristics, and calculating the normalized probability that each word of a sentence in the text to be processed is predicted to be a certain labeling label;
s32: and marking the calculated maximum probability as a prediction label, and finishing classification of the position to be marked in the text to be processed.
The normalized probability that a word is predicted as each labeling label is calculated (for example 1: 0.7 and 0: 0.3), and taking the label with the maximum probability as the predicted label is equivalent to classifying the position to be labeled.
The softmax calculation formula is as follows:
softmax(g_i) = exp(g_i) / Σ_k exp(g_k)
wherein i represents a certain class of the K classes, g_i represents the value of that class, and K ∈ (0, n).
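The same computation in code, continuing the sketch above (text_features plays the role of the scores g):

    import torch.nn.functional as F

    # Normalized probability that each word is predicted as each labeling label.
    probs = F.softmax(text_features, dim=-1)   # (batch, seq_len, NUM_LABELS)

    # Taking the label with the maximum probability classifies each position to be labeled.
    predicted_tags = probs.argmax(dim=-1)      # (batch, seq_len)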
In step S4, the processing, by the loss layer, the predicted labeling position of the text feature, and completing the sequence labeling of the text to be processed, includes the following steps:
S41: calculating a loss value between the predicted labeling position and the real label position according to a loss function;
S42: repeatedly training the deep learning model according to the loss value until the deep learning model converges, completing the training of the deep learning model;
S43: processing the text to be processed through the trained deep learning model to obtain the sequence labels of the text to be processed.
Specifically, the deep learning model structure is an embedding layer + bi-lstm layer + softmax layer, all of which are commonly used in NLP tasks; an example is as follows:
Suppose there is sample data D = {(X_1, Y_1), (X_2, Y_2), … (X_n, Y_n)},
wherein X_1 = (w_11, w_12, … w_1k) is a text containing k items of data; Y_1 = (y_11, y_12, … y_1k) is the sequence labeling label of the text, y_ij ∈ {0, 1}.
X′ = embedding(X_1, X_2, … X_batchsize)   (1)
X″ = bilstm(X′)   (2)
P_0~batchsize = softmax(X″)   (3)
wherein P_0~batchsize are the normalized probabilities that (X_1, X_2, … X_batchsize) are labeled 1, respectively,
and batchsize is the size of the training batch, a hyper-parameter.
To simplify the problem, the input of one sample is denoted as X_j, and the intermediate models (1), (2) and (3) are abstracted as a whole as M. The binary probability of the i-th word in X_j obtained from the model is p = [p_i0, p_i1], where p_i0 is the probability that the i-th word in the sentence X_j is predicted as 0 and p_i1 is the probability that it is predicted as 1. The corresponding true label is y = [y_i0, y_i1], where y_i0 is the probability that the i-th word in the sentence X_j is truly labeled 0 and y_i1 is the probability that it is labeled 1; y_i0 and y_i1 take the value 0 or 1, and p_i0 + p_i1 = 1.
It should be noted that the conventional cross entropy loss for the i-th word is:
CE = -(y_i0·log(p_i0) + y_i1·log(p_i1))
The weighted loss function is:
Weighted CE = -α_i·(y_i0·log(p_i0) + y_i1·log(p_i1))
wherein α_i ∈ [0, 1], and α_i comes either from a hyper-parameter or from the class statistics computed before training.
The deep learning model computes the loss through Weighted CE, the model parameters are adjusted after back propagation, and the model converges after multiple iterations.
Compared with the traditional cross entropy loss, the invention modifies the loss as follows:
DSC (the Sorensen-Dice coefficient) is used to measure the similarity of two sets. For example, given two sets A and B, their degree of similarity can be measured by the DSC formula:
DSC(A, B) = 2|A ∩ B| / (|A| + |B|)
In a natural language processing task, the similarity of the predicted labels and the real labels can likewise be measured with reference to DSC as follows:
DSC = 2TP / (2TP + FN + FP)
wherein FN (False Negative) means determined to be a negative sample but in fact a positive sample; FP (False Positive) means determined to be a positive sample but in fact a negative sample; TN (True Negative) means determined to be a negative sample and in fact a negative sample; TP (True Positive) means determined to be a positive sample and in fact a positive sample.
From the above it can be seen that, in a natural language processing task, DSC over the labels is essentially F1, so optimizing DSC is optimizing F1, and the loss for the i-th word can accordingly be improved as:
loss = 1 - 2·p_i1·y_i1 / (p_i1 + y_i1)
In order for the negative examples (labeled 0) to also contribute to the loss, the loss is smoothed as follows:
loss = 1 - (2·p_i1·y_i1 + ε) / (p_i1 + y_i1 + ε)
After the smoothing term ε is added, it has to be set manually for different data sets and can be influenced by the majority class, so that the whole model becomes dominated by the majority class; a self-adjusting term is therefore added, giving the formula:
loss = 1 - (2·(1 - p_i1)·p_i1·y_i1 + ε) / ((1 - p_i1)·p_i1 + y_i1 + ε)
wherein (1 - p_i1) acts as an automatic scaling factor that guides the loss: the part of p_i1 exceeding 0.5 brings no further gain to the loss, so the model is not forced to learn towards 0 or 1, which helps solve the imbalance problem.
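A minimal sketch of this self-adjusting DSC loss, written per token for the binary case described above (p1 is the predicted probability of label 1, y1 the true 0/1 label); the default value of the smoothing term ε is an assumption:

    def self_adjusting_dice_loss(p1: torch.Tensor,
                                 y1: torch.Tensor,
                                 eps: float = 1.0) -> torch.Tensor:
        """loss = 1 - (2*(1 - p1)*p1*y1 + eps) / ((1 - p1)*p1 + y1 + eps).

        p1: predicted probability of label 1, shape (N,)
        y1: true label in {0, 1} as a float tensor, shape (N,)
        (1 - p1) is the automatic scaling factor, eps the smoothing term.
        """
        scaled = (1.0 - p1) * p1                              # self-adjusting term
        dsc = (2.0 * scaled * y1 + eps) / (scaled + y1 + eps)
        return (1.0 - dsc).mean()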
In the embodiment of the invention, the traditional Weighted CE is replaced by the DSC loss; model parameters are adjusted after back propagation, and the model converges after multiple iterations.
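Continuing the earlier sketches, training with the DSC loss in place of Weighted CE could be arranged as follows; the optimizer, learning rate and iteration count are assumptions, and label index 1 is treated as the positive class of the binary case above purely for illustration:

    params = (list(embedding.parameters()) + list(bilstm.parameters())
              + list(hidden2tag.parameters()))
    optimizer = torch.optim.Adam(params, lr=1e-3)

    def forward(ids: torch.Tensor) -> torch.Tensor:
        x = dropout(embedding(ids))              # formula (1): embedding
        h, _ = bilstm(x)                         # formula (2): bilstm
        return F.softmax(hidden2tag(h), dim=-1)  # formula (3): softmax

    # true_tags: (batch, seq_len) 0/1 ground-truth labels for the binary case.
    true_tags = torch.randint(0, 2, token_ids.shape).float()

    for step in range(100):                      # iterate until the model converges
        probs = forward(token_ids)
        loss = self_adjusting_dice_loss(probs[..., 1].reshape(-1),
                                        true_tags.reshape(-1))
        optimizer.zero_grad()
        loss.backward()                          # back propagation
        optimizer.step()                         # adjust the model parameters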
The initialized embedding layer preprocesses each word in a sentence of a text to be processed to obtain a word vector of each word in the text to be processed; the word vectors are processed through a bi-lstm layer to obtain text features of the text to be processed; the text features are processed through a softmax layer to obtain predicted labeling positions of the text features; and the predicted labeling positions of the text features are processed through a loss layer to complete the sequence labeling of the text to be processed. According to the invention, the text to be processed is sequence-labeled through a loss function based on the DSC coefficient principle, and this loss function makes the deep learning model focus more on difficult samples during training, so that the labeling accuracy is improved as a whole.
Fig. 2 is a functional block diagram of the deep learning sequence labeling device according to the present invention.
The deep learning sequence labeling device 100 of the present invention may be installed in an electronic device. The deep learning sequence labeling device may include a word vector acquisition module 101, a text feature acquisition module 102, a prediction labeling position acquisition module 103, and a sequence labeling completion module 104 according to the implemented functions. The module of the invention, which may also be referred to as a unit, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
a word vector obtaining module 101, configured to preprocess each word in a sentence of a text to be processed by using an initialized embedding layer, and obtain a word vector of each word in the text to be processed;
a text feature obtaining module 102, configured to process the word vector through a bi-lstm layer, and obtain text features of the text to be processed;
the predicted labeling position obtaining module 103 is configured to process the text feature through a softmax layer, and obtain a predicted labeling position of the text feature;
and the sequence labeling completion module 104 is used for processing the predicted labeling position of the text feature through a loss layer to complete the sequence labeling of the text to be processed.
In the word vector obtaining module 101, the preprocessing of each word in a sentence of a text to be processed by using an initialized embedding layer to obtain a word vector of each word in the text to be processed includes the following steps: mapping each word in the sentence of the text to be processed from a one-hot vector into a low-dimensional dense word vector (character embedding) by using the initialized embedding layer, to obtain the word vector of each word in the text to be processed. Dropout is set before entering the next layer to mitigate overfitting.
In the text feature obtaining module 102, the processing the word vector through the bi-lstm layer, to obtain the text feature of the text to be processed, includes the following steps:
taking a word vector of each word of a sentence in the text to be processed as the input of each time step of the bidirectional lstm;
the hidden state vector output by the forward lstm and the hidden state vector output by the reverse lstm at each position are spliced according to the positions to obtain a complete hidden state vector;
and processing the complete hidden state vector to obtain the text characteristics of the text to be processed.
After dropout, a linear layer is connected that maps the hidden state vector from m dimensions to k dimensions, where k is the number of labels in the labeling set; the automatically extracted sentence features, and hence the text features of the text to be processed, are thereby obtained.
The predicted labeling position obtaining module 103 processes the text feature through a softmax layer to obtain a predicted labeling position of the text feature, and includes the following steps:
carrying out softmax calculation on the text characteristics, and calculating the normalized probability that each word of a sentence in the text to be processed is predicted to be a certain labeling label;
and marking the calculated maximum probability as a prediction label, and finishing classification of the position to be marked in the text to be processed.
The normalized probability that a word is predicted as each labeling label is calculated (for example 1: 0.7 and 0: 0.3), and taking the label with the maximum probability as the predicted label is equivalent to classifying the position to be labeled.
The softmax calculation formula is as follows:
softmax(g_i) = exp(g_i) / Σ_k exp(g_k)
wherein i represents a certain class of the K classes, g_i represents the value of that class, and K ∈ (0, n).
In the sequence labeling completion module 104, the processing, through the loss layer, the predicted labeling position of the text feature, and completing the sequence labeling of the text to be processed, includes the following steps:
S41: calculating a loss value between the predicted labeling position and the real label position according to a loss function;
S42: repeatedly training the deep learning model according to the loss value until the deep learning model converges, completing the training of the deep learning model;
S43: processing the text to be processed through the trained deep learning model to obtain the sequence labels of the text to be processed.
Specifically, the deep learning model structure is an embedding layer + bi-lstm layer + softmax layer, all of which are commonly used in NLP tasks; an example is as follows:
Suppose there is sample data D = {(X_1, Y_1), (X_2, Y_2), … (X_n, Y_n)},
wherein X_1 = (w_11, w_12, … w_1k) is a text containing k items of data; Y_1 = (y_11, y_12, … y_1k) is the sequence labeling label of the text, y_ij ∈ {0, 1}.
X′ = embedding(X_1, X_2, … X_batchsize)   (1)
X″ = bilstm(X′)   (2)
P_0~batchsize = softmax(X″)   (3)
wherein P_0~batchsize are the normalized probabilities that (X_1, X_2, … X_batchsize) are labeled 1, respectively, and batchsize is the size of the training batch, a hyper-parameter.
To simplify the problem, the input of one sample is denoted as X_j, and the intermediate models (1), (2) and (3) are abstracted as a whole as M. The binary probability of the i-th word in X_j obtained from the model is p = [p_i0, p_i1], where p_i0 is the probability that the i-th word in the sentence X_j is predicted as 0 and p_i1 is the probability that it is predicted as 1. The corresponding true label is y = [y_i0, y_i1], where y_i0 is the probability that the i-th word in the sentence X_j is truly labeled 0 and y_i1 is the probability that it is labeled 1; y_i0 and y_i1 take the value 0 or 1, and p_i0 + p_i1 = 1.
The invention modifies loss as follows:
DSC (the Sorensen-Dice coefficient) is used to measure the similarity of two sets. For example, given two sets A and B, their degree of similarity can be measured by the DSC formula:
DSC(A, B) = 2|A ∩ B| / (|A| + |B|)
In a natural language processing task, the similarity of the predicted labels and the real labels can likewise be measured with reference to DSC as follows:
DSC = 2TP / (2TP + FN + FP)
wherein FN (False Negative) means determined to be a negative sample but in fact a positive sample; FP (False Positive) means determined to be a positive sample but in fact a negative sample; TN (True Negative) means determined to be a negative sample and in fact a negative sample; TP (True Positive) means determined to be a positive sample and in fact a positive sample.
From the above it can be seen that, in a natural language processing task, DSC over the labels is essentially F1, so optimizing DSC is optimizing F1, and the loss for the i-th word can accordingly be improved as:
loss = 1 - 2·p_i1·y_i1 / (p_i1 + y_i1)
In order for the negative examples (labeled 0) to also contribute to the loss, the loss is smoothed as follows:
loss = 1 - (2·p_i1·y_i1 + ε) / (p_i1 + y_i1 + ε)
After the smoothing term ε is added, it has to be set manually for different data sets and can be influenced by the majority class, so that the whole model becomes dominated by the majority class; a self-adjusting term is therefore added, giving the formula:
loss = 1 - (2·(1 - p_i1)·p_i1·y_i1 + ε) / ((1 - p_i1)·p_i1 + y_i1 + ε)
wherein (1 - p_i1) acts as an automatic scaling factor that guides the loss: the part of p_i1 exceeding 0.5 brings no further gain to the loss, so the model is not forced to learn towards 0 or 1, which helps solve the imbalance problem.
In the embodiment of the invention, the traditional Weighted CE is replaced by the DSC loss; model parameters are adjusted after back propagation, and the model converges after multiple iterations.
The initialized embedding layer preprocesses each word in a sentence of a text to be processed to obtain a word vector of each word in the text to be processed; the word vectors are processed through a bi-lstm layer to obtain text features of the text to be processed; the text features are processed through a softmax layer to obtain predicted labeling positions of the text features; and the predicted labeling positions of the text features are processed through a loss layer to complete the sequence labeling of the text to be processed. According to the invention, the text to be processed is sequence-labeled through a loss function based on the DSC coefficient principle, and this loss function makes the deep learning model focus more on difficult samples during training, so that the labeling accuracy is improved as a whole.
Fig. 3 is a schematic structural diagram of an electronic device for implementing deep learning sequence labeling according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program stored in the memory 11 and executable on the processor 10, such as a deep learning sequence annotation program 12.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of data auditing programs, etc., but also for temporarily storing data that has been output or is to be output.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and executes various functions of the electronic device 1 and processes data by running or executing programs or modules (e.g., data auditing programs, etc.) stored in the memory 11, and calling data stored in the memory 11.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 3 shows only an electronic device with some of its components; it will be understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, which may comprise fewer or more components than shown, combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and are not limited to this configuration in the scope of the patent application.
The deep learning sequence annotation program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer, and obtaining a word vector of each word in the text to be processed;
processing the word vector through a bi-lstm layer to acquire text characteristics of the text to be processed;
processing the text features through a softmax layer to obtain predicted labeling positions of the text features;
and processing the predicted marking position of the text feature through a loss layer to finish the sequence marking of the text to be processed.
Specifically, the specific implementation method of the above instructions by the processor 10 may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
Further, the modules/units integrated in the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM).
In an embodiment of the present invention, a computer readable storage medium stores a computer program, which when executed by a processor, implements a deep learning sequence labeling method, and the specific method is as follows:
preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer, and obtaining a word vector of each word in the text to be processed;
processing the word vector through a bi-lstm layer to acquire text characteristics of the text to be processed;
processing the text features through a softmax layer to obtain predicted labeling positions of the text features;
and processing the predicted marking position of the text feature through a loss layer to finish the sequence marking of the text to be processed.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (6)

1. A method for labeling a deep learning sequence, the method comprising:
preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer, and obtaining a word vector of each word in the text to be processed;
processing the word vector through a bi-lstm layer to acquire text characteristics of the text to be processed;
processing the text features through a softmax layer to obtain predicted labeling positions of the text features, wherein the method comprises the following steps:
and carrying out softmax calculation on the text features, and calculating the normalized probability that each word of a sentence in the text to be processed is predicted to be a certain labeling label, wherein the softmax calculation formula is as follows:
softmax(g_i) = exp(g_i) / Σ_k exp(g_k)
wherein i represents a certain class of the K classes, g_i represents the value of that class, and K ∈ (0, n);
marking the calculated maximum probability as a prediction label, and completing classification of the position to be marked in the text to be processed;
processing the predicted marking position of the text feature through a loss layer to finish the sequence marking of the text to be processed, and comprising the following steps:
calculating the loss value between the predicted labeling position and the real label position according to a loss function, wherein the specific formula of the loss function is as follows:
loss = 1 - (2·(1 - p_i1)·p_i1·y_i1 + ε) / ((1 - p_i1)·p_i1 + y_i1 + ε)
wherein 1 - p_i1 represents an automatic scaling factor; p_i1 represents the predicted probability of label 1; y_i1 is the true probability of label 1; and ε is a smoothing term;
repeatedly training the deep learning model according to the loss value until the deep learning model converges, and finishing the training of the deep learning model;
and processing the text to be processed through the trained deep learning model to obtain the sequence label of the text to be processed.
2. The deep learning sequence labeling method as set forth in claim 1, wherein the preprocessing each word in a sentence of a text to be processed by using an initialized embedding layer to obtain a word vector of each word in the text to be processed comprises the following steps:
and mapping each word in the sentence of the text to be processed from a one-hot vector into a low-dimensional dense word vector by using an initialized embedding layer to obtain the word vector of each word in the text to be processed.
3. The deep learning sequence labeling method according to claim 1, wherein the processing the word vector through the bi-lstm layer to obtain the text feature of the text to be processed comprises the following steps:
taking a word vector of each word of a sentence in the text to be processed as the input of each time step of the bidirectional lstm;
the hidden state vector output by the forward lstm and the hidden state vector output by the reverse lstm at each position are spliced according to the positions to obtain a complete hidden state vector;
and processing the complete hidden state vector to obtain the text characteristics of the text to be processed.
4. A deep learning sequence annotation device, the device comprising:
the word vector acquisition module is used for preprocessing each word in a sentence of the text to be processed by using the initialized embedding layer to acquire a word vector of each word in the text to be processed;
the text feature acquisition module is used for processing the word vector through a bi-lstm layer to acquire text features of the text to be processed;
the predicted labeling position obtaining module is used for processing the text features through a softmax layer to obtain the predicted labeling positions of the text features, and comprises the following steps:
and carrying out softmax calculation on the text features, and calculating the normalized probability that each word of a sentence in the text to be processed is predicted to be a certain labeling label, wherein the softmax calculation formula is as follows:
softmax(g_i) = exp(g_i) / Σ_k exp(g_k)
wherein i represents a certain class of the K classes, g_i represents the value of that class, and K ∈ (0, n);
marking the calculated maximum probability as a prediction label, and completing classification of the position to be marked in the text to be processed;
the sequence marking completion module is used for processing the predicted marking position of the text feature through a loss layer to complete the sequence marking of the text to be processed, and comprises the following steps:
calculating the loss value between the predicted labeling position and the real label position according to a loss function, wherein the specific formula of the loss function is as follows:
loss = 1 - (2·(1 - p_i1)·p_i1·y_i1 + ε) / ((1 - p_i1)·p_i1 + y_i1 + ε)
wherein 1 - p_i1 represents an automatic scaling factor; p_i1 represents the predicted probability of label 1; y_i1 is the true probability of label 1; and ε is a smoothing term;
repeatedly training the deep learning model according to the loss value until the deep learning model converges, and finishing the training of the deep learning model;
and processing the text to be processed through the trained deep learning model to obtain the sequence label of the text to be processed.
5. An electronic device, the electronic device comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the deep learning sequence labeling method of any of claims 1-3.
6. A computer-readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the deep learning sequence labeling method of any one of claims 1 to 3.
CN202011024360.7A 2020-09-25 2020-09-25 Deep learning sequence labeling method, device and computer readable storage medium Active CN112115714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011024360.7A CN112115714B (en) 2020-09-25 2020-09-25 Deep learning sequence labeling method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011024360.7A CN112115714B (en) 2020-09-25 2020-09-25 Deep learning sequence labeling method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112115714A CN112115714A (en) 2020-12-22
CN112115714B true CN112115714B (en) 2023-08-18

Family

ID=73797740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011024360.7A Active CN112115714B (en) 2020-09-25 2020-09-25 Deep learning sequence labeling method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112115714B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528658B (en) * 2020-12-24 2023-07-25 北京百度网讯科技有限公司 Hierarchical classification method, hierarchical classification device, electronic equipment and storage medium
CN113378925B (en) * 2021-06-10 2022-09-20 杭州芯声智能科技有限公司 Method and device for generating double attention training sequence and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106601228A (en) * 2016-12-09 2017-04-26 百度在线网络技术(北京)有限公司 Sample marking method and device based on artificial intelligence prosody prediction
CN109614614A (en) * 2018-12-03 2019-04-12 焦点科技股份有限公司 A kind of BILSTM-CRF name of product recognition methods based on from attention
CN110472235A (en) * 2019-07-22 2019-11-19 北京航天云路有限公司 A kind of end-to-end entity relationship joint abstracting method towards Chinese text
CN110705293A (en) * 2019-08-23 2020-01-17 中国科学院苏州生物医学工程技术研究所 Electronic medical record text named entity recognition method based on pre-training language model
CN111476024A (en) * 2020-02-29 2020-07-31 新华三大数据技术有限公司 Text word segmentation method and device and model training method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106601228A (en) * 2016-12-09 2017-04-26 百度在线网络技术(北京)有限公司 Sample marking method and device based on artificial intelligence prosody prediction
CN109614614A (en) * 2018-12-03 2019-04-12 焦点科技股份有限公司 A kind of BILSTM-CRF name of product recognition methods based on from attention
CN110472235A (en) * 2019-07-22 2019-11-19 北京航天云路有限公司 A kind of end-to-end entity relationship joint abstracting method towards Chinese text
CN110705293A (en) * 2019-08-23 2020-01-17 中国科学院苏州生物医学工程技术研究所 Electronic medical record text named entity recognition method based on pre-training language model
CN111476024A (en) * 2020-02-29 2020-07-31 新华三大数据技术有限公司 Text word segmentation method and device and model training method

Also Published As

Publication number Publication date
CN112115714A (en) 2020-12-22

Similar Documents

Publication Publication Date Title
CN109388807B (en) Method, device and storage medium for identifying named entities of electronic medical records
CN107943847B (en) Business connection extracting method, device and storage medium
CN110362823B (en) Training method and device for descriptive text generation model
CN112115714B (en) Deep learning sequence labeling method, device and computer readable storage medium
CN109918568B (en) Personalized learning method and device, electronic equipment and storage medium
CN112699686B (en) Semantic understanding method, device, equipment and medium based on task type dialogue system
WO2021208696A1 (en) User intention analysis method, apparatus, electronic device, and computer storage medium
CN113807973B (en) Text error correction method, apparatus, electronic device and computer readable storage medium
CN110377902B (en) Training method and device for descriptive text generation model
CN112069319A (en) Text extraction method and device, computer equipment and readable storage medium
CN111563158B (en) Text ranking method, ranking apparatus, server and computer-readable storage medium
WO2023123926A1 (en) Artificial intelligence task processing method and apparatus, electronic device, and readable storage medium
CN114298050A (en) Model training method, entity relation extraction method, device, medium and equipment
CN112000778A (en) Natural language processing method, device and system based on semantic recognition
CN115238115A (en) Image retrieval method, device and equipment based on Chinese data and storage medium
CN115221276A (en) Chinese image-text retrieval model training method, device, equipment and medium based on CLIP
CN114840684A (en) Map construction method, device and equipment based on medical entity and storage medium
CN113870846A (en) Speech recognition method, device and storage medium based on artificial intelligence
CN116226443B (en) Weak supervision video clip positioning method and system based on large-scale video corpus
CN109977400B (en) Verification processing method and device, computer storage medium and terminal
WO2023178979A1 (en) Question labeling method and apparatus, electronic device and storage medium
CN110737812A (en) search engine user satisfaction evaluation method integrating semi-supervised learning and active learning
CN115510188A (en) Text keyword association method, device, equipment and storage medium
WO2022141838A1 (en) Model confidence analysis method and apparatus, electronic device and computer storage medium
CN113342940A (en) Text matching analysis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221014

Address after: 518066 2601 (Unit 07), Qianhai Free Trade Building, No. 3048, Xinghai Avenue, Nanshan Street, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong, China

Applicant after: Shenzhen Ping An Smart Healthcare Technology Co.,Ltd.

Address before: 1-34 / F, Qianhai free trade building, 3048 Xinghai Avenue, Mawan, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong 518000

Applicant before: Ping An International Smart City Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant