CN112966516A - Medical named entity identification method based on improved random average gradient descent - Google Patents
- Publication number
- CN112966516A (application number CN202110435549.3A)
- Authority
- CN
- China
- Prior art keywords
- gradient descent
- parameter
- rollback
- random
- weight
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to a medical named entity identification method based on improved random average gradient descent, which comprises the following steps: receiving a medical unstructured text to be labeled and performing data preprocessing to obtain labeled data; establishing an AWD-LSTM model according to the improved random average gradient descent, inputting the preprocessed data into the AWD-LSTM model for training, and obtaining a labeler; and carrying out named entity labeling on the medical unstructured text to be labeled by utilizing the trained labeler. The beneficial effects of the invention are as follows: by changing the value of the iteration round number, the invention indirectly influences the gradient value and rolls back the key parameter of the random average gradient descent optimization algorithm according to a certain rule, thereby changing the shrinkage rate of the algorithm, enabling it to jump out of local optima and reach a better value, without increasing the training time.
Description
Technical Field
The invention relates to the field of natural language processing and the technical field of deep learning, in particular to a medical named entity identification method based on improved random average gradient descent.
Background
Natural Language Processing (NLP) is a branch of artificial intelligence and linguistics, and one of the most difficult problems in artificial intelligence. Natural language processing refers to the processing by a computer of information such as the form, sound, and meaning of natural language, that is, operations for inputting, outputting, recognizing, analyzing, understanding, and generating words, sentences, and passages. It has a significant impact on human-computer interaction. The basic tasks of natural language processing include speech recognition, information retrieval, question-answering systems, machine translation, and the like; recurrent neural networks and naive Bayes are common models in natural language processing. With the development of deep learning technology in many fields, natural language processing has made great breakthroughs.
Named Entity Recognition (NER) is a basic task in the NLP field and an important building block for most NLP tasks such as question-answering systems, machine translation, and syntactic analysis. Previous approaches were primarily dictionary-based and rule-based. The dictionary-based method performs fuzzy search or exact matching on character strings, but as new entity names continuously emerge, it is limited by the quality and size of the dictionary; the rule-based method manually specifies rules and expands the rule set using features of entity names and common phrase collocations, but it consumes enormous human resources and time, the rules are generally effective only in a specific field, the cost of manual migration is high, and the portability of the rules is poor. Named entity recognition now mostly adopts machine learning methods, in which model training is continuously optimized and the trained model shows better performance in test evaluation. Currently, the most widely applied models include Hidden Markov Models (HMMs), Support Vector Machines (SVMs), Maximum Entropy Markov Models (MEMMs), Conditional Random Fields (CRFs), and the like. The conditional random field model can effectively handle the influence of adjacent labels on the predicted sequence, so it is widely applied to entity recognition with good results. At present, deep learning algorithms are generally adopted for the sequence labeling problem. Compared with traditional algorithms, deep learning eliminates the step of manually extracting features and can effectively extract discriminative features.
After Merity et al. proposed the AWD-LSTM model, many language models based on it achieved good results on named entity recognition. These models are trained first and then re-trained, a step called "fine-tuning". However, deep learning models optimize non-convex objectives and have many parameters, making the training process rather difficult. At present, a random average gradient descent optimization algorithm is adopted in model training: several samples are randomly selected from the whole sample set to form a batch, the average gradient of the batch is recorded, and the mean of the average gradients over all time steps is taken as the gradient estimate of the whole sample set for training. However, this method has the defect that the model parameters may fail to converge, so the optimal solution is easily missed and model training is incomplete.
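The averaging idea described above can be sketched in a few lines: plain stochastic gradient steps are taken as usual, while a running average of the iterates is maintained and returned as the final estimate. The toy least-squares loss and all names below are illustrative assumptions, not part of the patent.

```python
import numpy as np

# Minimal sketch of averaged SGD on a consistent toy least-squares
# problem whose exact solution is w = [2, 2, 2].
rng = np.random.default_rng(0)

def grad(w, x):
    # gradient of the per-sample loss 0.5*(w·x - y)^2 with target y = 2*sum(x)
    y = 2.0 * x.sum()
    return (w @ x - y) * x

w = np.zeros(3)       # SGD iterate
w_avg = np.zeros(3)   # running average of the iterates
eta = 0.05
for t in range(1, 2001):
    x = rng.normal(size=3)            # randomly selected sample
    w = w - eta * grad(w, x)          # plain SGD step
    mu = 1.0 / t                      # uniform averaging weight
    w_avg = w_avg + mu * (w - w_avg)  # running average of iterates

print(np.round(w_avg, 2))
```

The averaged iterate smooths out the noise of the individual SGD steps, which is the behavior the parameter-rollback method later perturbs on purpose.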
In recent years, in the biomedical field, the literature resources are increased thousands of times every year, the information is mostly stored in the form of unstructured text, and the biomedicine named entity recognition aims to convert the unstructured text into structured text and recognize and classify specific entity names such as genes, proteins, diseases and the like in the biomedicine text. At present, how to quickly and efficiently retrieve relevant information from huge data is a great challenge.
In the gradient descent optimization method based on a hybrid strategy disclosed in patent No. 2020109668396, the optimizer is first set to the Adam optimization algorithm and the gradient descent process of Adam is computed; when a conversion criterion is satisfied, Adam is switched to the SGDM optimization algorithm, the learning rate of SGDM after the switch is determined according to a scaling rule, and the gradient descent process of SGDM is computed until a convergence condition is reached. Mixing the two optimization algorithms can achieve a certain effect, but the method cannot achieve a good effect on retraining after restarting.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a random average gradient descent method based on a parameter rollback mechanism.
The medical named entity identification method based on improved random average gradient descent comprises the following steps:
step 1, receiving a medical unstructured text to be labeled, and performing data preprocessing to obtain labeled data; for the received medical unstructured text to be labeled, firstly, simple named entity labeling is carried out on a part of the medical unstructured text by using a rule-based and dictionary-based mode, then, high-quality training data are obtained by using a manual labeling method and a data enhancement technology, and linguistic data are provided for later training of a named entity recognition model;
step 2, establishing an AWD-LSTM model according to the improved random average gradient descent, inputting the preprocessed data into the AWD-LSTM model for training, and obtaining a marker;
step 2.1, initializing parameters of the medical named entity identification method based on improved random average gradient descent; the parameters are divided into hyper-parameters and ordinary parameters, and the hyper-parameters comprise: default learning rate lr, attenuation term λ, index α of the iterative learning rate update, time point t0 at which gradient averaging starts, weight attenuation term weight_decay, parameter rollback vector s, parameter rollback level bl, and default parameter rollback size bds; ordinary parameters include: iteration round number step, iterative learning rate η, weight update parameter μ, and rollback number count b;
step 2.2, dynamically reducing the weight magnitude through a weight attenuation operation to prevent model overfitting, wherein the weight attenuation update function is:

g_t = ∇f(w_t, x_t) + weight_decay · w_t

In the above formula, w_t represents the random gradient descent weight vector at time t, x_t represents the randomly selected sample at time t, ∇f(w_t, x_t) represents the gradient of the loss function, and weight_decay represents the weight attenuation term; using this L2 regularization term, a smaller weight vector w_t is obtained and the complexity of the network is reduced, so the problem of model overfitting is effectively avoided;
step 2.3, calculating the random gradient descent weight vector according to the random gradient descent update function:

w_{t+1} = w_t − η · ∇f(w_t, x_j)

In the above formula, w_t represents the random gradient descent weight vector at time t; w_{t+1} represents the random gradient descent weight vector at time t+1; η represents the iterative learning rate, x_j represents a randomly selected sample, and ∇f(w_t, x_j) represents the gradient of the loss function at time t;
step 2.4, converting the random gradient descent weight vector obtained in step 2.3 into a random average gradient descent weight vector through the random average gradient descent update function:

w̄_{t+1} = w̄_t + μ_t · (w_t − w̄_t)

In the above formula, w̄_t represents the random average gradient descent weight vector at time t, w̄_{t+1} represents the random average gradient descent weight vector at time t+1, μ_t represents the weight update parameter at time t, and w_t represents the random gradient descent weight vector at time t;
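A minimal numeric sketch of steps 2.2 to 2.4 on a one-dimensional quadratic loss may help fix the update order; the loss, the learning rate, and the uniform averaging schedule here are assumptions for illustration only.

```python
# Illustrative sketch of steps 2.2-2.4 on the toy loss f(w) = 0.5*(w - 3)^2.
weight_decay = 1.2e-6   # default weight attenuation term from step 2.1
eta = 0.1               # iterative learning rate (toy value)
mu = 1.0                # weight update parameter, initial value 1
w, w_bar = 0.0, 0.0     # SGD weight vector and averaged weight vector

def loss_grad(w):
    return w - 3.0      # gradient of 0.5*(w - 3)^2

for t in range(200):
    g = loss_grad(w) + weight_decay * w   # step 2.2: weight attenuation
    w = w - eta * g                       # step 2.3: SGD weight update
    w_bar = w_bar + mu * (w - w_bar)      # step 2.4: averaged weight
    mu = 1.0 / (t + 2)                    # uniform averaging schedule (assumed)

print(round(w, 3), round(w_bar, 3))  # both approach the minimizer 3.0
```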
step 2.5, judging whether the value of the key step length parameter meets the condition requirement of rollback or not, and judging whether to perform parameter rollback operation or not by setting two judgments;
step 2.6, when the two judgments of step 2.5 both meet the parameter rollback condition, so that the parameter rollback operation can be performed, performing the parameter rollback operation on the key step-length parameter of random average gradient descent: updating the value of the iteration round number step and the value of the rollback number count b; the iteration round number step is updated as:
step=max(step//bl,bds*b)
in the above formula, step is the number of iteration rounds, bl is the parameter rollback level, bds is the default parameter rollback size, and b is the rollback number count; the rollback number count is updated to:
b=b*2
in the above formula, b is the rollback number count;
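The two update rules above can be exercised directly; bl and bds take the default values stated later in the text (10 and 10000), while the starting values of step and b below are made up for the demo.

```python
bl, bds = 10, 10000   # parameter rollback level and default rollback size

def roll_back(step, b):
    # step 2.6: shrink the iteration round number, floored at bds*b,
    # then double the rollback number count
    step = max(step // bl, bds * b)
    b = b * 2
    return step, b

step, b = roll_back(250000, 1)
print(step, b)  # 250000//10 = 25000 beats 10000*1, so (25000, 2)
```

The floor bds*b grows with each rollback, which matches the stated goal of keeping later rollbacks from causing large gradient fluctuations.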
step 2.7, updating the iterative learning rate, the weight updating parameter and the sample parameter, and increasing the count of the iterative round step counter by 1;
step 2.8, repeating the steps 2.1 to 2.7 until the AWD-LSTM model meets a preset training termination criterion;
and 3, carrying out named entity labeling on the medical unstructured text to be labeled by using the trained labeler.
Preferably, step 2.5 specifically comprises the steps of:
step 2.5.1, first round judgment: judging whether the key step-length parameter is exactly divisible by the default interval value; if the iteration round number step is exactly divisible by the default interval value, carry out the second round judgment of step 2.5.2; if the iteration round number step is not exactly divisible by the default interval value, the iteration round number step and the rollback number count b are not updated;
step 2.5.2, second round judgment: judging whether the system random number is smaller than the quotient of the parameter rollback vector s and the rollback number count b; if the system random number is smaller than the quotient of the parameter rollback vector s and the rollback number count b, executing step 2.6; otherwise, the iteration step and the rollback number count b are not updated.
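The two-round judgment can be written as a small predicate; the interval 1000 and s = 0.02 are the defaults given elsewhere in this description, and passing the random draw in explicitly is a simplification of the seeded system random number.

```python
INTERVAL = 1000   # default divisibility value for the first round judgment
S = 0.02          # parameter rollback vector s (default)

def should_roll_back(step, b, rnd):
    # first round: is the iteration round number on the rollback interval?
    if step % INTERVAL != 0:
        return False
    # second round: is the system random number below s / b?
    return rnd < S / b

print(should_roll_back(3000, 1, 0.01))   # True: on interval and 0.01 < 0.02
print(should_roll_back(3001, 1, 0.01))   # False: not on the interval
print(should_roll_back(3000, 4, 0.01))   # False: 0.01 >= 0.02/4
```

Because b doubles after every rollback, the threshold s/b halves each time, making repeated rollbacks progressively less likely.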
Preferably, the specific operations of updating the iterative learning rate, the weight update parameter and the sample parameter in step 2.7 are as follows:
the iterative learning rate η is updated as:

η_{t+1} = lr / (1 + λ · lr · step)^α

In the above formula, η_{t+1} represents the iterative learning rate at time t+1, lr represents the default learning rate, λ represents the attenuation term, step is the iteration round number, and α represents the update index of η;
the weight update parameter is updated as:

μ_{t+1} = 1 / max(1, t − t0)

In the above formula, μ_{t+1} represents the weight update parameter at time t+1, t represents time t, and t0 represents the time point at which the random average gradient descent starts;
the sample parameters are updated as:

x_{t+1} = (1 − λ · η) · x_t − η · ∇f_t(x_t)

In the above formula, x_t represents the randomly selected sample at time t, x_{t+1} represents the randomly selected sample at time t+1, λ represents the attenuation term, η represents the iterative learning rate, and ∇f_t(x_t) represents the gradient of the loss function at time t.
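A sketch of the step-2.7 schedules, assuming the PyTorch-style ASGD update rules that the symbol definitions suggest (the exact closed forms are an assumption here, not quoted from the patent):

```python
def update_eta(lr, lam, step, alpha):
    # assumed polynomial-decay schedule for the iterative learning rate
    return lr / (1.0 + lam * lr * step) ** alpha

def update_mu(t, t0):
    # averaging weight: stays 1 until t passes t0, then decays as 1/(t - t0)
    return 1.0 / max(1, t - t0)

eta = update_eta(lr=25.0, lam=1e-4, step=10000, alpha=0.75)
mu_early = update_mu(t=500, t0=1000)            # before t0: mu stays 1
mu_late = update_mu(t=1_001_000, t0=1_000_000)  # after t0: mu shrinks
print(round(eta, 3), mu_early, mu_late)
```

Note how a rollback of `step` directly raises `eta` again, which is the mechanism the method uses to escape local optima.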
Preferably, at initialization in step 2.1: the default learning rate lr is set to 20 to 30, the attenuation term λ is set to 0, the index α of the iterative learning rate update is set to 0.75, the time point t0 at which gradient averaging starts is set to 0, the value of the weight attenuation term is set to 1.2e-6, the value of the parameter rollback vector s is set to 0.02, the parameter rollback level bl is set to 10, and the default parameter rollback size bds is set to 10000.
Preferably, the weight update parameter μ_t at time t in step 2.4 has an initial value of 1; the default value of the attenuation term λ in step 2.7 is 1e-4, and the default value of t0 is 1e6.
Preferably, a random number seed value is also set in step 2.1, and the value of the systematic random number in step 2.5.2 is determined by the random number seed value.
Preferably, the key step size parameter in step 2.5 and step 2.6 is the iteration round step.
Preferably, the default interval value in step 2.5.1, by which the iteration round number step must be exactly divisible, is set to 1000.
The invention has the beneficial effects that:
the invention indirectly influences the gradient value by changing the value of the iteration round number step, and rolls back the key parameter (iteration round number step) of the random average gradient descent optimization algorithm according to a certain rule, thereby achieving the effect of changing the shrinkage rate of the random average gradient descent optimization algorithm, achieving the purpose of jumping out of the local optimum of the random average gradient descent optimization algorithm, obtaining a better value, and not increasing the training time.
Drawings
FIG. 1 is a flow chart of a medical named entity identification method based on improved stochastic mean gradient descent;
FIG. 2 is a flow chart of a random average gradient descent method based on a parameter rollback mechanism;
FIG. 3 is a graph of the experimental results of the AWD-LSTM model using different parameter rollback vectors in the Penn Treebank dataset;
FIG. 4 is a graph of the experimental results of the MoS-AWD-LSTM model on the Penn Treebank dataset using different parameter rollback vectors.
Detailed Description
The present invention will be further described with reference to the following examples. The following examples are set forth merely to aid in the understanding of the invention. It should be noted that, for a person skilled in the art, several modifications can be made to the invention without departing from its principle, and these modifications and variations also fall within the protection scope of the claims of the present invention.
As an embodiment, based on the AWD-LSTM model proposed by Merity et al., the original random average gradient descent optimization algorithm is replaced by a random average gradient descent optimization algorithm based on a parameter rollback mechanism, eliminating the fine-tuning step required in the later training period of the original model. As shown in fig. 1, the method mainly comprises the following steps:
step 1, receiving a medical unstructured text to be labeled, and performing data preprocessing to obtain labeled data:
for the received medical unstructured text to be labeled, firstly, simple named entity labeling is carried out on a part of the medical unstructured text by using a rule-based and dictionary-based mode, then, a manual labeling method and a data enhancement technology are utilized to obtain enough high-quality training data, and linguistic data are provided for later training of a named entity recognition model.
Step 2, inputting the preprocessed data into an AWD-LSTM model for training according to the AWD-LSTM model established by the improved random average gradient descent to obtain a marker;
Compared with the existing random average gradient descent algorithm, the method imposes a certain regular constraint on the step-length parameter of the existing algorithm; by constraining the step-length parameter, it indirectly influences the shrinkage rate of the algorithm, finally enabling the algorithm to cross over local optima in the later period and find a better solution.
Referring to fig. 2, the specific steps of building the AWD-LSTM model by improving the random mean gradient descent include:
step 2.1, initializing the parameters of the random average gradient descent optimization algorithm based on the parameter rollback mechanism: the optimization algorithm provided by the invention has eight hyper-parameters which are respectively a default learning rate lr, an attenuation term lambda, an index alpha of eta update and a time point t of starting gradient average0Weight _ decay (penalized by L2), parameter rollback vector s, parameter rollback level bl and default parameter rollback size bds, and four common parameters, iteration round number step, iteration learning rate eta, weight updating parameter mu and rollback number count b. Through experiments, the default learning rate lr value is set to be 20 to 30, and the attenuation term lambda value is set to be e-4The exponent α value of η update is set to 0.75 and the mean time point t of the gradient is started0The value is set to 0 and the weight decay (L2 penalty) weight _ decay value is set to 1.2e-6When the value of the parameter rollback vector s is set to be 0.02, the value of the parameter rollback level bl is set to be 10, and the value of the default parameter rollback size bds is set to be 10000, good effects can be achieved on different models and data sets.
Step 2.2, weight attenuation operation is carried out, so that the weight can be dynamically reduced, overfitting of the model is prevented, and a weight attenuation updating function is as follows:
g_t = ∇f(w_t, x_t) + weight_decay · w_t

wherein w_t represents the random gradient descent weight vector at time t, x_t represents the randomly selected sample at time t, and weight_decay represents the weight decay. Using the L2 regularization term yields a smaller weight vector w_t and reduces the complexity of the network, thereby effectively avoiding the problem of model overfitting.
Step 2.3, calculating a random gradient descent weight vector, wherein a random gradient descent updating function is as follows:
w_{t+1} = w_t − η · ∇f(w_t, x_j)

wherein w_t represents the random gradient descent weight vector at time t, η represents the iterative learning rate, x_j represents a randomly selected sample, and ∇f(w_t, x_j) represents the gradient of the loss function at time t.
Step 2.4, the random gradient descent weight vector is replaced by a random average gradient descent weight vector, and the random average gradient descent updating function is as follows:
w̄_{t+1} = w̄_t + μ_t · (w_t − w̄_t)

wherein w̄_t represents the random average gradient descent weight vector at time t, and μ_t represents the weight update parameter at time t, with a default initial value of 1.
Step 2.5, judging whether the value of the key step length parameter meets the condition requirement of rollback, and simultaneously judging whether the parameter rollback operation can be carried out:
In the random average gradient descent optimization algorithm, the key step-length parameter influences the shrinkage rate of the algorithm. In the original random average gradient descent optimization algorithm, the step-length parameter value increases continuously, making the algorithm more and more stable; although this yields a more stable result in the later period, the excessive stability also traps the algorithm in a local optimum, and the gradient value becomes too small for the algorithm to reach a better result. Therefore, the improved method provided by the invention indirectly influences the gradient value by changing the value of the step-length parameter, thereby changing the shrinkage rate of the random average gradient descent optimization algorithm, jumping out of the local optimum, and obtaining a better value.
In the proposed medical named entity recognition method with improved random average gradient descent, whether to perform parameter rollback is decided by two judgments. The first judgment checks whether the step-length parameter in the improved random average gradient descent optimization algorithm is exactly divisible by the default value 1000, that is, a default rollback interval is set and the second judgment is made each time the interval is reached; the second judgment checks whether the system random number is smaller than the quotient of the parameter rollback vector s and the rollback number count b, where the system random number is determined by the random number seed value set at the initialization of model training.
Step 2.6, when the parameter rollback requirement is met, performing parameter rollback operation on the key step length parameter with the random average gradient decreasing:
When the two judgments in step 2.5 are both satisfied, the step parameter value in the improved random average gradient descent optimization algorithm is set to the maximum of the step parameter value integer-divided by the parameter rollback level bl and the product of the default parameter rollback size bds and the rollback number count b; this ensures that the gradient does not fluctuate greatly after multiple parameter rollback operations, preventing the gradient from becoming so large that the algorithm over-jumps the local optimum and the experimental result deteriorates. Meanwhile, the rollback number count b is doubled, so that as the number of training rounds changes, the gradient does not fluctuate greatly under the influence of the parameter rollback operation.
The step size parameter value update function is as follows:
step=max(step//bl,bds*b)
the parameter rollback level update function is as follows:
b=b*2
step 2.7, updating the iterative learning rate eta, the weight updating parameter mu and the sample parameter, and increasing the step size parameter counter by 1:
the iterative learning rate update function is as follows:
η_{t+1} = lr / (1 + λ · lr · step)^α

wherein lr represents the default learning rate, λ represents the attenuation term with default value 1e-4, and α represents the update index of η.
The weight update parameter update function is as follows:
μ_{t+1} = 1 / max(1, t − t0)

wherein t0 represents the time point at which the random average gradient descent weight vector computation starts, with default value 1e6.
The sample parameter update function is as follows:

x_{t+1} = (1 − λ · η) · x_t − η · ∇f_t(x_t)

wherein x_t represents the randomly selected sample at time t, λ represents the attenuation term with default value 1e-4, η represents the iterative learning rate, and ∇f_t(x_t) represents the gradient of the loss function at time t.
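Putting the pieces together, one full training loop of the parameter-rollback variant on a toy quadratic might look as follows; the loss, the scaled-down learning rate, and the exact update order are illustrative assumptions, not the patent's exact procedure.

```python
import random

# Defaults from the description (lambda, alpha, t0, weight_decay, s, bl,
# bds, interval); lr is scaled down from 20-30 to suit the toy loss.
lr, lam, alpha, t0 = 0.5, 1e-4, 0.75, 1_000_000
weight_decay, s_vec, bl, bds, interval = 1.2e-6, 0.02, 10, 10000, 1000
rng = random.Random(42)

w, w_bar, step, b = 0.0, 0.0, 1, 1
for _ in range(5000):
    g = (w - 3.0) + weight_decay * w               # grad of 0.5*(w-3)^2 + decay
    eta = lr / (1.0 + lam * lr * step) ** alpha    # iterative learning rate
    mu = 1.0 / max(1, step - t0)                   # weight update parameter
    w = w - eta * g                                # SGD step
    w_bar = w_bar + mu * (w - w_bar)               # averaged weight
    if step % interval == 0 and rng.random() < s_vec / b:
        step, b = max(step // bl, bds * b), b * 2  # parameter rollback
    step += 1

print(round(w, 2))  # converges near the minimizer 3.0
```

On this convex toy problem the rollback rarely fires and never hurts convergence; its intended benefit shows up on non-convex losses, where the temporarily larger effective learning rate lets training escape local optima.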
And 2.8, repeating the steps 2.1-2.7 until the model meets the preset training termination criterion.
And step 3: and carrying out named entity labeling on the received medical unstructured text to be labeled by utilizing the trained labeler.
The experimental results are as follows:
to demonstrate the effectiveness of this example, experiments were performed on the Penn TreeBank dataset for both the AWD-LSTM model and the MoS-AWD-LSTM model. The Penn TreeBank dataset has long been a common dataset for language model experiments, with the maximum number of words in the vocabulary limited to 10000.
In the training process, all experiments strictly follow the regularization and optimization techniques introduced in AWD-LSTM, including a series of optimization tricks such as stacking three layers of LSTM; because the PyTorch-0.2 version used by the two compared models is relatively old, the experiments were also reproduced in PyTorch-0.4. For fairness, only the original random average gradient descent optimization algorithm was replaced by the random average gradient descent method based on the parameter rollback mechanism, with the other parameters and architectures kept unchanged.
Table 1 below shows the perplexity results of the AWD-LSTM model and the MoS-AWD-LSTM model on the Penn Treebank dataset language modeling task; a smaller perplexity indicates a better language model, and the parameter column shows the number of model parameters. The results show that, compared with the AWD-LSTM model, the random average gradient descent method based on the parameter rollback mechanism provided by the invention improves the perplexity of the validation set and the test set by 1.23% and 1.03% respectively (PyTorch-0.2 version), and by 0.03% and 0.23% (PyTorch-0.4 version); compared with the MoS-AWD-LSTM model, the method provided by this embodiment improves them by 0.88% and 0.87% (PyTorch-0.2 version), and by 2.35% and 2.3% (PyTorch-0.4 version).
TABLE 1 perplexity results table of AWD-LSTM and MoS-AWD-LSTM models in Penn Treebank dataset language modeling task
Meanwhile, this embodiment verifies the influence of different parameter rollback vectors s on the experimental result. Figs. 3 and 4 show the perplexity of the validation set on the Penn Treebank dataset when different parameter rollback vectors s are used for the two experimental models; for clarity, figs. 3 and 4 only show the part reaching the lowest value. It can be found that after a parameter rollback operation occurs, there is a certain probability that the training result drops well below its previous level. In addition, this embodiment verifies that setting the value of the parameter rollback vector s to 0.02 performs best on the Penn Treebank dataset.
Claims (8)
1. A medical named entity identification method based on improved random average gradient descent, characterized by comprising the following steps:
step 1, receiving a medical unstructured text to be labeled and performing data preprocessing to obtain labeled data: for the received medical unstructured text to be labeled, first label named entities on a portion of the text in a rule-based and dictionary-based manner, thereby obtaining high-quality training data;
step 2, establishing an AWD-LSTM model according to the improved random average gradient descent, and inputting the preprocessed data into the AWD-LSTM model for training to obtain a labeler;
step 2.1, initializing the parameters of the medical named entity identification method based on improved random average gradient descent; the parameters are divided into hyper-parameters and common parameters, wherein the hyper-parameters comprise: the default learning rate lr, the attenuation term λ, the exponent α of the iterative learning rate update, the time point t0 at which gradient averaging starts, the weight attenuation term, the parameter rollback vector s, the parameter rollback level bl, and the default parameter rollback size bds; the common parameters comprise: the iteration round number step, the iterative learning rate η, the weight update parameter μ, and the rollback number count b;
step 2.2, reducing the weight magnitude through a weight attenuation operation, wherein the weight attenuation update function is as follows:
in the above formula, w_t represents the random gradient descent weight vector at time t, x^t represents a randomly selected sample at time t, and weight-decay represents the weight attenuation term; using the L2 regularization term yields a smaller weight vector w_t;
step 2.3, calculating a random gradient descent weight vector according to the random gradient descent update function, which is:

w_{t+1} = w_t − η·∇f_t(w_t; x_j)

in the above formula, w_t represents the random gradient descent weight vector at time t; w_{t+1} represents the random gradient descent weight vector at time t+1; η represents the iterative learning rate; x_j represents a randomly selected sample; and ∇f_t represents the gradient of the loss function at time t;
step 2.4, converting the random gradient descent weight vector obtained in step 2.3 into a random average gradient descent weight vector through the random average gradient descent update function, which is:

w̄_{t+1} = w̄_t + μ_t·(w_t − w̄_t)

in the above formula, w̄_t represents the random average gradient descent weight vector at time t, w̄_{t+1} represents the random average gradient descent weight vector at time t+1, μ_t represents the weight update parameter at time t, and w_t represents the random gradient descent weight vector at time t;
step 2.5, judging whether the value of the key step length parameter meets the rollback condition, wherein whether to perform the parameter rollback operation is determined by two rounds of judgment;
step 2.6, when both judgment results of step 2.5 meet the parameter rollback condition and the parameter rollback operation may therefore be performed, performing the parameter rollback operation on the key step length parameter of the decaying random average gradient descent: updating the value of the iteration round number step and the value of the rollback number count b, wherein the iteration round number step is updated as:
step = max(step // bl, bds * b)
in the above formula, step is the iteration round number, bl is the parameter rollback level, bds is the default parameter rollback size, and b is the rollback number count; the rollback number count is updated as:

b = b * 2

in the above formula, b is the rollback number count;
step 2.7, updating the iterative learning rate, the weight update parameter, and the sample parameters, and increasing the iteration round number step counter by 1;
step 2.8, repeating steps 2.2 to 2.7 until the AWD-LSTM model meets a preset training termination criterion;
and step 3, carrying out named entity labeling on the medical unstructured text to be labeled by using the trained labeler.
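The training procedure of steps 2.1 to 2.8 can be illustrated with a toy scalar version. This is a minimal sketch, not the claimed implementation: the function name, the gradient sequence, the learning-rate schedule, and the small divisor and bds values are illustrative assumptions.

```python
import random

def train_asgd_with_rollback(grads, lr=0.5, s=0.02, bl=10, bds=2, divisor=4, seed=0):
    """Toy end-to-end loop for steps 2.1-2.8 on a single scalar weight."""
    rng = random.Random(seed)            # step 2.1: seeded random number source (claim 6)
    w, w_avg, step, b = 1.0, 1.0, 1, 1   # step 2.1: initialize the common parameters
    for g in grads:
        eta = lr / (1.0 + step) ** 0.5   # step 2.7: decaying iterative learning rate (illustrative)
        w = w - eta * g                  # steps 2.2-2.3 (weight attenuation omitted here)
        w_avg = w_avg + (w - w_avg) / step  # step 2.4: running average of the iterates
        if step % divisor == 0 and rng.random() < s / b:  # step 2.5: two rounds of judgment
            step, b = max(step // bl, bds * b), b * 2     # step 2.6: parameter rollback
        step += 1                        # step 2.7: advance the iteration round counter
    return w_avg
```

Each pass applies a gradient step, folds the iterate into the running average, and occasionally rolls the iteration counter back, which lets the decayed learning rate jump back up to an earlier, larger value.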
2. The medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 1, wherein step 2.5 comprises the following steps:
step 2.5.1, first-round judgment: judging whether the key step length parameter is divisible by the default value; if the iteration round number step is divisible by the default value, performing the second-round judgment of step 2.5.2; if the iteration round number step is not divisible by the default value, neither the iteration round number step nor the rollback number count b is updated;
step 2.5.2, second-round judgment: judging whether the system random number is smaller than the quotient of the parameter rollback vector s and the rollback number count b; if so, executing step 2.6; otherwise, neither the iteration round number step nor the rollback number count b is updated.
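The two rounds of judgment in claim 2 can be sketched as follows. The class and method names are assumed; the divisor 1000 and s = 0.02 follow claims 8 and 4, and the seeded random stream follows claim 6.

```python
import random

class RollbackJudge:
    """Two-round rollback judgment of step 2.5 (sketch)."""

    def __init__(self, s=0.02, divisor=1000, seed=1):
        self.s = s                      # parameter rollback vector
        self.divisor = divisor          # default value for the divisibility test
        self.rng = random.Random(seed)  # system random number, seeded (claim 6)

    def should_rollback(self, step, b):
        if step % self.divisor != 0:    # first round: step must be divisible
            return False
        # second round: compare a fresh random number with the quotient s / b
        return self.rng.random() < self.s / b
```

Because b doubles after every rollback, the second-round probability s / b halves each time, so rollbacks become progressively rarer as training proceeds.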
3. The medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 1, wherein the specific operations of updating the iterative learning rate, the weight update parameters and the sample parameters in step 2.7 are as follows:
the iterative learning rate η is updated as:

η_{t+1} = lr / (1 + λ·lr·step)^α

in the above formula, η_{t+1} represents the iterative learning rate at time t+1, lr represents the default learning rate, λ represents the attenuation term, step is the iteration round number, and α represents the update exponent of η;
the weight update parameter is updated as:

μ_{t+1} = 1 / max(1, t − t0)

in the above formula, μ_{t+1} represents the weight update parameter at time t+1, t represents the current time, and t0 represents the time point at which the random average gradient descent averaging starts;
the sample parameters are updated as:

x^{t+1} = (1 − η·λ)·x^t − η·∇f_t

in the above formula, x^t represents a randomly selected sample at time t, x^{t+1} represents a randomly selected sample at time t+1, λ represents the attenuation term, η represents the iterative learning rate, and ∇f_t represents the gradient of the loss function at time t.
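The learning-rate and averaging-weight updates of step 2.7 can be sketched as follows, assuming the standard averaged-SGD schedule (as used, for example, in PyTorch's ASGD optimizer), whose parameters match the variables described in this claim; the patent's exact expressions are not reproduced here.

```python
def update_schedule(step, lr=20.0, lambd=1e-4, alpha=0.75, t0=1e6):
    """Schedule updates of step 2.7 (sketch under assumed formulas)."""
    eta = lr / (1.0 + lambd * lr * step) ** alpha  # iterative learning rate decay
    mu = 1.0 / max(1.0, step - t0)                 # averaging weight, active after t0
    return eta, mu
```

With these defaults the averaging weight μ stays at 1 until step exceeds t0, after which it shrinks as 1/(step − t0), matching the description that gradient averaging starts at time point t0.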
4. The medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 1, characterized in that, at initialization in step 2.1: the default learning rate lr is set to 20-30, the attenuation term λ is set to 0, the exponent α of the iterative learning rate update is set to 0.75, the time point t0 at which gradient averaging starts is set to 0, the weight attenuation term is set to 1.2e-6, the parameter rollback vector s is set to 0.02, the parameter rollback level bl is set to 10, and the default parameter rollback size bds is set to 10000.
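The initial values of claim 4 can be collected into a single configuration. This is a sketch: the dictionary keys are assumed names, and the divisor entry comes from claim 8 rather than claim 4.

```python
# Initial values gathered from claims 4, 5 and 8; key names are assumed.
DEFAULTS = {
    "lr": 20.0,            # default learning rate (claim 4 allows 20-30)
    "lambda": 0.0,         # attenuation term at initialization (claim 5 later defaults it to 1e-4)
    "alpha": 0.75,         # exponent of the iterative learning-rate update
    "t0": 0,               # time point at which gradient averaging starts (claim 5 default: 1e6)
    "weight_decay": 1.2e-6,
    "s": 0.02,             # parameter rollback vector
    "bl": 10,              # parameter rollback level
    "bds": 10000,          # default parameter rollback size
    "divisor": 1000,       # default value for the divisibility test in step 2.5.1 (claim 8)
}
```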
5. The medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 1, wherein: in step 2.4, the weight update parameter μ_t at time t is 1; in step 2.7, the attenuation term λ has a default value of 1e-4, and t0 has a default value of 1e6.
6. The medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 1, wherein: a random number seed value is also set in step 2.1, and the value of the system random number in step 2.5.2 is determined by the random number seed value.
7. The medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 1, wherein: the key step length parameter in step 2.5 and step 2.6 is the iteration round number step.
8. The medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 2 or 7, wherein: the default value used in the divisibility test on the iteration round number step in step 2.5.1 is set to 1000.
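Combining step 2.6 with claims 4, 7 and 8, the rollback operation on the key step length parameter can be sketched as follows (assumed function name; bl = 10 and bds = 10000 are the claim 4 defaults):

```python
def rollback_step(step, b, bl=10, bds=10000):
    """Parameter rollback operation of step 2.6 on the key step length
    parameter, i.e. the iteration round number step (per claim 7)."""
    step = max(step // bl, bds * b)  # shrink the iteration counter, floored at bds * b
    b = b * 2                        # double the rollback number count
    return step, b
```

For example, rolling back from step 500000 on the first rollback yields step 50000; on the second rollback the floor bds * b already dominates the integer division.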
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110435549.3A CN112966516A (en) | 2021-04-22 | 2021-04-22 | Medical named entity identification method based on improved random average gradient descent |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112966516A true CN112966516A (en) | 2021-06-15 |
Family
ID=76281014
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110435549.3A Pending CN112966516A (en) | 2021-04-22 | 2021-04-22 | Medical named entity identification method based on improved random average gradient descent |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112966516A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114023378A (en) * | 2022-01-05 | 2022-02-08 | 北京晶泰科技有限公司 | Method for generating protein structure constraint distribution and protein design method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110413986B (en) | Text clustering multi-document automatic summarization method and system for improving word vector model | |
CN111177374B (en) | Question-answer corpus emotion classification method and system based on active learning | |
CN109472024B (en) | Text classification method based on bidirectional circulation attention neural network | |
CN109635124B (en) | Remote supervision relation extraction method combined with background knowledge | |
CN109670039B (en) | Semi-supervised e-commerce comment emotion analysis method based on three-part graph and cluster analysis | |
CN109697289B (en) | Improved active learning method for named entity recognition | |
CN113505200B (en) | Sentence-level Chinese event detection method combined with document key information | |
CN111274790B (en) | Chapter-level event embedding method and device based on syntactic dependency graph | |
CN110298044B (en) | Entity relationship identification method | |
CN111046178B (en) | Text sequence generation method and system | |
CN110413768A (en) | A kind of title of article automatic generation method | |
CN112883722B (en) | Distributed text summarization method based on cloud data center | |
CN115965033B (en) | Method and device for generating text abstract based on sequence-level prefix prompt | |
CN112214989A (en) | Chinese sentence simplification method based on BERT | |
Li et al. | Adversarial discrete sequence generation without explicit neuralnetworks as discriminators | |
CN113221542A (en) | Chinese text automatic proofreading method based on multi-granularity fusion and Bert screening | |
CN113094502A (en) | Multi-granularity takeaway user comment sentiment analysis method | |
CN115630649A (en) | Medical Chinese named entity recognition method based on generative model | |
CN111626041A (en) | Music comment generation method based on deep learning | |
CN113420552B (en) | Biomedical multi-event extraction method based on reinforcement learning | |
CN114925687A (en) | Chinese composition scoring method and system based on dynamic word vector representation | |
CN112966516A (en) | Medical named entity identification method based on improved random average gradient descent | |
CN112884087A (en) | Biological enhancer and identification method for type thereof | |
CN117436522A (en) | Biological event relation extraction method and large-scale biological event relation knowledge base construction method of cancer subject | |
CN116757195A (en) | Implicit emotion recognition method based on prompt learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||