CN112966516A - Medical named entity identification method based on improved random average gradient descent - Google Patents


Info

Publication number
CN112966516A
CN112966516A
Authority
CN
China
Prior art keywords
gradient descent
parameter
rollback
random
weight
Prior art date
Legal status
Pending
Application number
CN202110435549.3A
Other languages
Chinese (zh)
Inventor
陈观林
程钊
杨武剑
翁文勇
李甜
Current Assignee
Hangzhou City University
Original Assignee
Hangzhou City University
Priority date
Filing date
Publication date
Application filed by Hangzhou City University filed Critical Hangzhou City University
Priority to CN202110435549.3A priority Critical patent/CN112966516A/en
Publication of CN112966516A publication Critical patent/CN112966516A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods


Abstract

The invention relates to a medical named entity identification method based on improved random average gradient descent, which comprises the following steps: receiving a medical unstructured text to be labeled, and performing data preprocessing to obtain labeled data; establishing an AWD-LSTM model according to the improved random average gradient descent, inputting the preprocessed data into the AWD-LSTM model for training, and obtaining a labeler; and carrying out named entity labeling on the medical unstructured text to be labeled by utilizing the trained labeler. The invention has the beneficial effects that: the invention indirectly influences the gradient value by changing the value of the iteration round number, and rolls back the key parameters of the random average gradient descent optimization algorithm according to a certain rule. This changes the shrinkage rate of the optimization algorithm, allowing it to jump out of local optima and reach a better value, without increasing the training time.

Description

Medical named entity identification method based on improved random average gradient descent
Technical Field
The invention relates to the field of natural language processing and the technical field of deep learning, and in particular to a medical named entity identification method based on improved random average gradient descent.
Background
Natural Language Processing (NLP) is a branch of artificial intelligence and linguistics, and one of the most difficult problems in artificial intelligence. Natural language processing refers to the processing by computer of information such as the form, sound, and meaning of natural language, that is, operations for inputting, outputting, recognizing, analyzing, understanding, and generating words, sentences, and texts. It has a significant impact on human-computer interaction. The basic tasks of natural language processing include speech recognition, information retrieval, question-answering systems, machine translation and the like; recurrent neural networks and naive Bayes are common models for natural language processing. With the progress of deep learning in many fields, natural language processing has made great breakthroughs.
Named Entity Recognition (NER) is a basic task in the field of NLP, and an important basic tool for most NLP tasks such as question-answering systems, machine translation, and syntactic analysis. Previous approaches were primarily dictionary-based and rule-based. The dictionary-based method performs fuzzy search or complete matching on character strings, but the quality and size of the dictionary are limited as new entity names continuously emerge. The rule-based method manually specifies rules and expands the rule set with features of entity names and common phrase collocations, but it consumes huge human resources and time; the rules are generally effective only in a specific field, the cost of manual migration is high, and rule portability is weak. For named entity recognition, machine learning methods are now mostly adopted: model training is continuously optimized, and the trained model shows better performance in test evaluation. Currently, the most widely applied models include Hidden Markov Models (HMMs), Support Vector Machines (SVMs), Maximum Entropy Markov Models (MEMMs), Conditional Random Fields (CRFs), and the like. The conditional random field model can effectively handle the influence of adjacent labels on the predicted sequence, so it is widely applied to entity recognition with good effect. At present, deep learning algorithms are generally adopted for sequence labeling. Compared with traditional algorithms, deep learning eliminates the step of manually extracting features and can effectively extract discriminative features.
After the foreign scholar Merity put forward the AWD-LSTM model, many language models based on it achieved good results on named entity recognition. These models are trained first and then re-trained, a step called "fine-tuning". However, the deep learning model targets non-convex functions and has many parameters, which makes the training process rather difficult. At present, a random average gradient descent optimization algorithm is adopted in model training: several samples are randomly selected from the whole sample set to form a batch, the average gradient of the batch is recorded, and the mean of these average gradients over all time steps is taken as the gradient estimate of the whole sample for training. However, this method has the defect that the model parameters may fail to converge, so the optimal solution is easily missed and model training is incomplete.
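As an illustration of the averaging idea just described (not the patent's improved algorithm), a minimal averaged-SGD loop might look like the following sketch; the toy quadratic objective, the noise model, and all names are our own assumptions:

```python
import numpy as np

def asgd(grad_fn, w0, lr=0.5, steps=200, t0=100):
    """Plain averaged SGD: run SGD and, from step t0 on, keep a running
    mean of the iterates as the returned estimate."""
    w = np.asarray(w0, dtype=float)
    avg, n_avg = np.zeros_like(w), 0
    for t in range(steps):
        w = w - lr * grad_fn(w)       # ordinary stochastic gradient step
        if t >= t0:                   # start averaging at time t0
            n_avg += 1
            avg += (w - avg) / n_avg  # incremental running mean
    return avg

# Noisy gradient of f(w) = 0.5 * ||w||^2 as a toy "stochastic" objective.
rng = np.random.default_rng(0)
grad = lambda w: w + 0.1 * rng.standard_normal(w.shape)
w_star = asgd(grad, [5.0, -3.0])
```

Averaging the late iterates damps the gradient noise, which is why the averaged estimate lands much closer to the optimum than any single noisy iterate.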
In recent years, in the biomedical field, literature resources have multiplied every year, and most of this information is stored in the form of unstructured text. Biomedical named entity recognition aims to convert unstructured text into structured text, and to recognize and classify specific entity names such as genes, proteins and diseases in biomedical text. At present, how to quickly and efficiently retrieve relevant information from such huge data is a great challenge.
In the gradient descent optimization method based on a hybrid strategy disclosed in patent No. 2020109668396, the optimizer is first set to the Adam optimization algorithm and the gradient descent process of Adam is computed; when a conversion criterion is satisfied, Adam is switched to the SGDM optimization algorithm, the learning rate of SGDM after the switch is determined according to a scaling rule, and the gradient descent process of SGDM is computed until a convergence condition is reached. Mixing the two optimization algorithms can achieve a certain effect, but the method does not perform well when training is restarted and continued.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a random average gradient descent method based on a parameter rollback mechanism.
The medical named entity identification method based on improved random average gradient descent comprises the following steps:
step 1, receiving a medical unstructured text to be labeled, and performing data preprocessing to obtain labeled data; for the received medical unstructured text to be labeled, firstly, simple named entity labeling is carried out on a part of the medical unstructured text by using a rule-based and dictionary-based mode, then, high-quality training data are obtained by using a manual labeling method and a data enhancement technology, and linguistic data are provided for later training of a named entity recognition model;
step 2, establishing an AWD-LSTM model according to the improved random average gradient descent, inputting the preprocessed data into the AWD-LSTM model for training, and obtaining a labeler;
step 2.1, initializing parameters of the medical named entity identification method based on improved random average gradient descent; the parameters are divided into hyper-parameters and ordinary parameters, and the hyper-parameters comprise: default learning rate lr, attenuation term λ, index α of the iterative learning rate update, time point t_0 at which gradient averaging starts, weight attenuation term, parameter rollback vector s, parameter rollback level bl, and default parameter rollback size bds; the ordinary parameters comprise: iteration round number step, iterative learning rate η, weight update parameter μ, and rollback number count b;
step 2.2, dynamically reducing the weights through a weight attenuation operation to prevent model overfitting, wherein the weight attenuation update function is:

$$\nabla f_t(w_t, x_t^j) \leftarrow \nabla f_t(w_t, x_t^j) + \text{weight\_decay} \cdot w_t$$

in the above formula, $w_t$ represents the random gradient descent weight vector at time t, $x_t^j$ represents a randomly selected sample at time t, and weight_decay represents the weight attenuation; using this L2 regularization term, a smaller weight vector $w_t$ is obtained and the complexity of the network is reduced, so that the problem of model overfitting is effectively avoided;
step 2.3, calculating a random gradient descent weight vector according to a random gradient descent update function, wherein the random gradient descent update function is as follows:
$$w_{t+1} = w_t - \eta \nabla f_t(w_t, x_t^j)$$

in the above formula, $w_t$ represents the random gradient descent weight vector at time t; $w_{t+1}$ represents the random gradient descent weight vector at time t+1; $\eta$ represents the iterative learning rate; $x_t^j$ represents a randomly selected sample; and $\nabla f_t(w_t, x_t^j)$ represents the gradient of the loss function at time t;
step 2.4, the random gradient descent weight vector obtained in the step 2.3 is converted into a random average gradient descent weight vector through a random average gradient descent updating function; the random mean gradient descent update function is:
$$\bar{w}_{t+1} = \bar{w}_t + \mu_t (w_t - \bar{w}_t)$$

in the above formula, $\bar{w}_t$ represents the random average gradient descent weight vector at time t, $\bar{w}_{t+1}$ represents the random average gradient descent weight vector at time t+1, $\mu_t$ represents the weight update parameter at time t, and $w_t$ represents the random gradient descent weight vector at time t;
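The updates in steps 2.2-2.4 can be sketched together as one inner iteration. This is a hedged reconstruction from the formulas above; the function name and the choice to average against the freshly updated weight (as PyTorch-style ASGD does) are our own assumptions:

```python
import numpy as np

def sag_inner_step(w, w_avg, grad, eta, mu, weight_decay=1.2e-6):
    """One inner iteration: L2 weight decay (step 2.2), stochastic
    gradient step (step 2.3), running average of iterates (step 2.4)."""
    grad = grad + weight_decay * w              # step 2.2: L2 penalty
    w_next = w - eta * grad                     # step 2.3: SGD update
    w_avg_next = w_avg + mu * (w_next - w_avg)  # step 2.4: averaging
    return w_next, w_avg_next

w = np.array([1.0, -2.0])
w_next, w_avg = sag_inner_step(w, w.copy(), grad=np.array([0.5, 0.5]),
                               eta=0.1, mu=1.0)
```

With μ = 1 (its stated initial value) the averaged vector simply tracks the current iterate; averaging only takes hold once μ shrinks in step 2.7.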
step 2.5, judging whether the value of the key step length parameter meets the condition requirement of rollback or not, and judging whether to perform parameter rollback operation or not by setting two judgments;
step 2.6, when the two judgment results of step 2.5 both meet the parameter rollback condition and the parameter rollback operation can be performed at the moment, performing the parameter rollback operation on the key step length parameter with the decreased random average gradient: updating the value of iteration round number step and the value of rollback number count b; the iteration round number step is updated as:
step=max(step//bl,bds*b)
in the above formula, step is the number of iteration rounds, bl is the parameter rollback level, bds is the default parameter rollback size, and b is the rollback number count; the rollback number count is updated to:
b=b*2
in the above formula, b is the rollback number count;
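The two update rules of step 2.6 transcribe directly into code, with the rollback level and default rollback size taking the stated defaults:

```python
def rollback(step, b, bl=10, bds=10000):
    """Parameter rollback of step 2.6: shrink the iteration round number
    but floor it at bds * b, then double the rollback number count so
    each later rollback perturbs the gradient less."""
    step = max(step // bl, bds * b)
    return step, b * 2

# e.g. a first rollback at iteration round 250000:
step, b = rollback(250000, b=1)
```

The `bds * b` floor is what keeps repeated rollbacks from shrinking `step` (and thus inflating the learning rate) without bound.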
step 2.7, updating the iterative learning rate, the weight updating parameter and the sample parameter, and increasing the count of the iterative round step counter by 1;
step 2.8, repeating the steps 2.1 to 2.7 until the AWD-LSTM model meets a preset training termination criterion;
and 3, carrying out named entity labeling on the medical unstructured text to be labeled by using the trained labeler.
Preferably, step 2.5 specifically comprises the steps of:
step 2.5.1, first round judgment: judging whether the key step length parameter is evenly divisible by the default value; if the iteration round number step is evenly divisible by the default value, carrying out the second round judgment in step 2.5.2; if the iteration round number step is not evenly divisible by the default value, the iteration round number step and the rollback number count b are not updated;
step 2.5.2, second round judgment: judging whether the system random number is smaller than the quotient of the parameter rollback vector s and the rollback number count b; if the system random number is smaller than the quotient of the parameter rollback vector s and the rollback number count b, executing step 2.6; otherwise, the iteration step and the rollback number count b are not updated.
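The two-round judgment can be sketched as a single predicate. The 1000-iteration interval follows the stated default, and a seeded generator stands in for the system random number; this is a sketch under those assumptions, not the patent's exact implementation:

```python
import random

def should_rollback(step, b, s=0.02, interval=1000, rng=random):
    """Two-round check of step 2.5: the first round fires only when the
    iteration round number is evenly divisible by the default interval;
    the second round fires with probability s / b."""
    if step == 0 or step % interval != 0:   # round 1: divisibility check
        return False
    return rng.random() < s / b             # round 2: random number vs s/b

rng = random.Random(7)  # seed plays the role of the random-number-seed hyperparameter
decisions = [should_rollback(t, b=1, rng=rng) for t in (999, 1000, 2000)]
```

Because b doubles after every rollback, the firing probability s/b halves each time, so rollbacks become progressively rarer as training proceeds.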
Preferably, the specific operations of updating the iterative learning rate, the weight update parameter and the sample parameter in step 2.7 are as follows:
the iterative learning rate η is updated as:
$$\eta_{t+1} = \frac{lr}{(1 + \lambda \cdot lr \cdot step)^{\alpha}}$$

in the above formula, $\eta_{t+1}$ represents the iterative learning rate at time t+1, lr represents the default learning rate, $\lambda$ represents the attenuation term, step is the iteration round number, and $\alpha$ represents the update index of $\eta$;
updating the weight update parameter to:
$$\mu_{t+1} = \frac{1}{\max(1,\; t+1-t_0)}$$

in the above formula, $\mu_{t+1}$ represents the weight update parameter at time t+1, t represents time t, and $t_0$ represents the time point at which the random average gradient descent starts;
the sample parameters are updated as:
$$x_{t+1}^j = (1 - \lambda\eta)\, x_t^j - \eta \nabla f_t(w_t, x_t^j)$$

in the above formula, $x_t^j$ represents a randomly selected sample at time t, $x_{t+1}^j$ represents a randomly selected sample at time t+1, $\lambda$ represents the attenuation term, $\eta$ represents the iterative learning rate, and $\nabla f_t(w_t, x_t^j)$ represents the gradient of the loss function at time t.
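Under the reading above (the schedule coincides with the one PyTorch's ASGD optimizer uses; treat the exact forms as an assumption), the step-2.7 updates can be sketched as:

```python
def update_schedule(lr, lam, alpha, step, t, t0):
    """Step 2.7 schedule: decayed iterative learning rate, and the
    averaging weight that starts shrinking once t passes t0."""
    eta = lr / (1.0 + lam * lr * step) ** alpha   # eta_{t+1}
    mu = 1.0 / max(1, t + 1 - t0)                 # mu_{t+1}
    return eta, mu

# With the stated defaults (lr=30, lam=1e-4, alpha=0.75, t0=0):
eta0, mu0 = update_schedule(30, 1e-4, 0.75, step=0, t=0, t0=0)
eta1, mu1 = update_schedule(30, 1e-4, 0.75, step=1000, t=1000, t0=0)
```

Rolling `step` back to a smaller value therefore raises η again, which is precisely how the rollback mechanism re-injects movement and lets the algorithm escape a local optimum.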
Preferably, at initialization in step 2.1: the default learning rate lr is set to 20-30, the attenuation term λ is set to 0, the index α of the iterative learning rate update is set to 0.75, the time point t_0 at which gradient averaging starts is set to 0, the weight attenuation term is set to 1.2e-6, the parameter rollback vector s is set to 0.02, the parameter rollback level bl is set to 10, and the default parameter rollback size bds is set to 10000.
Preferably, the initial value of the weight update parameter μ_t at time t in step 2.4 is 1; in step 2.7, the default value of the attenuation term λ is e-4, and the default value of t_0 is e6.
Preferably, a random number seed value is also set in step 2.1, and the value of the systematic random number in step 2.5.2 is determined by the random number seed value.
Preferably, the key step size parameter in step 2.5 and step 2.6 is the iteration round step.
Preferably, the default value by which the iteration round number step must be evenly divisible in step 2.5.1 is set to 1000.
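For reference, the stated defaults can be collected in one place. The dict itself is our own convenience, and we read the printed "e-4"/"e6" defaults as 1e-4 and 1e6 (they coincide with PyTorch's ASGD defaults), which is an assumption; note the description gives λ = 0 at initialization but e-4 as the step-2.7 default:

```python
# Hyperparameter defaults as stated in the description; names follow the text.
DEFAULTS = {
    "lr": 30,               # default learning rate (stated range: 20 to 30)
    "lambda_init": 0.0,     # attenuation term at initialization (step 2.1)
    "lambda_default": 1e-4, # attenuation term default in step 2.7 ("e-4")
    "alpha": 0.75,          # index of the iterative learning rate update
    "t0": 0,                # gradient-averaging start time at initialization
    "t0_default": 1e6,      # t0 default in step 2.7 ("e6")
    "weight_decay": 1.2e-6, # weight attenuation term (L2 penalty)
    "s": 0.02,              # parameter rollback vector
    "bl": 10,               # parameter rollback level
    "bds": 10000,           # default parameter rollback size
    "interval": 1000,       # divisor used in the first-round judgment
}
```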
The invention has the beneficial effects that:
the invention indirectly influences the gradient value by changing the value of the iteration round number step, and rolls back the key parameter (iteration round number step) of the random average gradient descent optimization algorithm according to a certain rule, thereby achieving the effect of changing the shrinkage rate of the random average gradient descent optimization algorithm, achieving the purpose of jumping out of the local optimum of the random average gradient descent optimization algorithm, obtaining a better value, and not increasing the training time.
Drawings
FIG. 1 is a flow chart of a medical named entity identification method based on improved stochastic mean gradient descent;
FIG. 2 is a flow chart of a random average gradient descent method based on a parameter rollback mechanism;
FIG. 3 is a graph of the experimental results of the AWD-LSTM model using different parameter rollback vectors in the Penn Treebank dataset;
FIG. 4 is a graph of the experimental results of the MoS-AWD-LSTM model using different parameter rollback vectors on the Penn Treebank dataset.
Detailed Description
The present invention will be further described with reference to the following examples. The following examples are set forth merely to aid in the understanding of the invention. It should be noted that, for a person skilled in the art, several modifications can be made to the invention without departing from the principle of the invention, and these modifications and modifications also fall within the protection scope of the claims of the present invention.
As an embodiment, starting from the AWD-LSTM model proposed by the foreign scholar Merity, the original random average gradient descent optimization algorithm is replaced with a random average gradient descent optimization algorithm based on a parameter rollback mechanism, and the fine-tuning step required in the later training period of the original model is eliminated. As shown in fig. 1, the method mainly comprises the following steps:
step 1, receiving a medical unstructured text to be labeled, and performing data preprocessing to obtain labeled data:
for the received medical unstructured text to be labeled, firstly, simple named entity labeling is carried out on a part of the medical unstructured text by using a rule-based and dictionary-based mode, then, a manual labeling method and a data enhancement technology are utilized to obtain enough high-quality training data, and linguistic data are provided for later training of a named entity recognition model.
Step 2, establishing an AWD-LSTM model according to the improved random average gradient descent, inputting the preprocessed data into the model for training, and obtaining a labeler;
compared with the existing random average gradient descent algorithm, the method carries out certain regular constraint on the step length parameter in the existing random average gradient descent algorithm, indirectly influences the shrinkage rate of the algorithm by constraining the step length parameter, finally achieves the capability of enabling the algorithm to cross over the local optimum in the later period, and can find a more optimal solution.
Referring to fig. 2, the specific steps of building the AWD-LSTM model by improving the random mean gradient descent include:
step 2.1, initializing the parameters of the random average gradient descent optimization algorithm based on the parameter rollback mechanism: the optimization algorithm provided by the invention has eight hyper-parameters which are respectively a default learning rate lr, an attenuation term lambda, an index alpha of eta update and a time point t of starting gradient average0Weight _ decay (penalized by L2), parameter rollback vector s, parameter rollback level bl and default parameter rollback size bds, and four common parameters, iteration round number step, iteration learning rate eta, weight updating parameter mu and rollback number count b. Through experiments, the default learning rate lr value is set to be 20 to 30, and the attenuation term lambda value is set to be e-4The exponent α value of η update is set to 0.75 and the mean time point t of the gradient is started0The value is set to 0 and the weight decay (L2 penalty) weight _ decay value is set to 1.2e-6When the value of the parameter rollback vector s is set to be 0.02, the value of the parameter rollback level bl is set to be 10, and the value of the default parameter rollback size bds is set to be 10000, good effects can be achieved on different models and data sets.
Step 2.2, weight attenuation operation is carried out, so that the weight can be dynamically reduced, overfitting of the model is prevented, and a weight attenuation updating function is as follows:
$$\nabla f_t(w_t, x_t^j) \leftarrow \nabla f_t(w_t, x_t^j) + \text{weight\_decay} \cdot w_t$$

wherein $w_t$ represents the random gradient descent weight vector at time t, $x_t^j$ represents a randomly selected sample at time t, and weight_decay represents the weight attenuation. Using this L2 regularization term, a smaller weight vector $w_t$ is obtained and the complexity of the network is reduced, so that the problem of model overfitting is effectively avoided.
Step 2.3, calculating a random gradient descent weight vector, wherein a random gradient descent updating function is as follows:
$$w_{t+1} = w_t - \eta \nabla f_t(w_t, x_t^j)$$

wherein $w_t$ represents the random gradient descent weight vector at time t, $\eta$ represents the iterative learning rate, $x_t^j$ represents a randomly selected sample, and $\nabla f_t(w_t, x_t^j)$ represents the gradient of the loss function at time t.
Step 2.4, the random gradient descent weight vector is replaced by a random average gradient descent weight vector, and the random average gradient descent updating function is as follows:
$$\bar{w}_{t+1} = \bar{w}_t + \mu_t (w_t - \bar{w}_t)$$

wherein $\bar{w}_t$ represents the random average gradient descent weight vector at time t, and $\mu_t$ represents the weight update parameter at time t, with a default initial value of 1.
Step 2.5, judging whether the value of the key step length parameter meets the condition requirement of rollback, and simultaneously judging whether the parameter rollback operation can be carried out:
in the random average gradient descent optimization algorithm, the key step size parameter influences the shrinkage rate of the random average gradient descent optimization algorithm. In the original random average gradient descent optimization algorithm, the step size parameter value is continuously increased to lead the algorithm to tend to be more and more stable, although the algorithm can obtain a more stable result in the later period, the algorithm is also trapped in local optimization due to the over-stability, and the algorithm cannot obtain a better result due to the over-small gradient value. Therefore, the improved method provided by the invention indirectly influences the gradient value by changing the value of the step length parameter, thereby achieving the effect of changing the shrinkage rate of the random average gradient descent optimization algorithm, then jumping out of local optimization and obtaining a better value.
In the proposed medical named entity recognition method with improved random average gradient descent, whether to perform parameter rollback is decided by two judgments. The first judgment checks whether the step size parameter in the improved random average gradient descent optimization algorithm is evenly divisible by the default value 1000, i.e., a default rollback interval is set, and the second judgment is carried out each time the interval is reached; the second judgment checks whether the system random number is smaller than the quotient of the parameter rollback vector s and the rollback number count b, wherein the system random number is determined by the random number seed value set at the initialization of model training.
Step 2.6, when the parameter rollback requirement is met, performing parameter rollback operation on the key step length parameter with the random average gradient decreasing:
when the two determinations in step 2.5 both meet the requirements, the value of the step parameter in the optimized random average gradient descent optimization algorithm is divided into the maximum value of the product of the parameter rollback level bl and the default parameter rollback size bds with the rollback number count b by the step parameter value, so that the gradient is ensured not to generate great fluctuation after multiple parameter rollback operations, and the phenomenon that the gradient is too large to jump out of the local optimum and the experimental result is poor is prevented. Meanwhile, the parameter rollback grade b is made to be twice of the value of the parameter rollback grade b, so that the gradient cannot greatly fluctuate under the influence of the parameter rollback operation along with the change of the number of training rounds.
The step size parameter value update function is as follows:
step=max(step//bl,bds*b)
the parameter rollback level update function is as follows:
b=b*2
step 2.7, updating the iterative learning rate eta, the weight updating parameter mu and the sample parameter, and increasing the step size parameter counter by 1:
the iterative learning rate update function is as follows:
$$\eta_{t+1} = \frac{lr}{(1 + \lambda \cdot lr \cdot step)^{\alpha}}$$

where lr represents the default learning rate, $\lambda$ represents the attenuation term (default e-4), and $\alpha$ represents the update index of $\eta$.
The weight update parameter update function is as follows:
$$\mu_{t+1} = \frac{1}{\max(1,\; t+1-t_0)}$$

where $t_0$ represents the time point at which the random average gradient descent weight vector starts to be computed, with a default value of e6.
The sample parameter update function is as follows:
$$x_{t+1}^j = (1 - \lambda\eta)\, x_t^j - \eta \nabla f_t(w_t, x_t^j)$$

where $x_t^j$ represents a randomly selected sample at time t, $\lambda$ represents the attenuation term (default e-4), $\eta$ represents the iterative learning rate, and $\nabla f_t(w_t, x_t^j)$ represents the gradient of the loss function at time t.
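Putting the pieces together, one full iteration of the improved optimizer described in steps 2.2-2.7 might be organized as below. This is a sketch assembled from the formulas above; the class name, state layout, and the PyTorch-style reading of the schedule are our assumptions, not the patent's code:

```python
import random
import numpy as np

class RollbackASGD:
    """Sketch of the improved random average gradient descent loop
    (steps 2.2-2.7) with the parameter rollback mechanism."""

    def __init__(self, w, lr=30.0, lam=1e-4, alpha=0.75, t0=0,
                 weight_decay=1.2e-6, s=0.02, bl=10, bds=10000,
                 interval=1000, seed=0):
        self.w = np.asarray(w, dtype=float)       # SGD iterate w_t
        self.w_avg = self.w.copy()                # averaged iterate
        self.lr, self.lam, self.alpha, self.t0 = lr, lam, alpha, t0
        self.weight_decay, self.s = weight_decay, s
        self.bl, self.bds, self.interval = bl, bds, interval
        self.rng = random.Random(seed)            # "system random number"
        self.step, self.b = 0, 1                  # round counter, rollback count
        self.eta, self.mu = lr, 1.0               # schedule state

    def update(self, grad):
        grad = grad + self.weight_decay * self.w                   # step 2.2
        self.w = self.w - self.eta * grad                          # step 2.3
        self.w_avg = self.w_avg + self.mu * (self.w - self.w_avg)  # step 2.4
        # steps 2.5/2.6: two-round rollback check on the round counter
        if (self.step > 0 and self.step % self.interval == 0
                and self.rng.random() < self.s / self.b):
            self.step = max(self.step // self.bl, self.bds * self.b)
            self.b *= 2
        # step 2.7: schedule updates, then advance the counter
        self.eta = self.lr / (1.0 + self.lam * self.lr * self.step) ** self.alpha
        self.mu = 1.0 / max(1, self.step + 1 - self.t0)
        self.step += 1

opt = RollbackASGD(w=[1.0, -2.0], lr=0.1)
opt.update(np.array([0.5, 0.5]))
```

In an actual training run, `update` would be called once per batch with the batch's averaged gradient, and `opt.w_avg` would serve as the final model weights.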
Step 2.8, repeating steps 2.1-2.7 until the model meets the preset training termination criterion.
Step 3: carrying out named entity labeling on the received medical unstructured text to be labeled by utilizing the trained labeler.
The experimental results are as follows:
to demonstrate the effectiveness of this example, experiments were performed on the Penn TreeBank dataset for both the AWD-LSTM model and the MoS-AWD-LSTM model. The Penn TreeBank dataset has long been a common dataset for language model experiments, with the maximum number of words in the vocabulary limited to 10000.
In the training process, all experiments strictly follow the regularization and optimization techniques introduced in AWD-LSTM, including a series of optimization tricks such as stacking three LSTM layers; because the PyTorch-0.2 version used by the two compared models is old, the experiments are reproduced in PyTorch-0.4. For fairness, only the original random average gradient descent optimization algorithm is replaced by the random average gradient descent method based on the parameter rollback mechanism, and the other parameters and architectures are kept unchanged.
Table 1 below shows the perplexity results of the AWD-LSTM model and the MoS-AWD-LSTM model on the Penn Treebank language modeling task; smaller perplexity indicates better language model performance, and the parameter column shows the number of model parameters. The results show that, compared with the AWD-LSTM model, the random average gradient descent method based on the parameter rollback mechanism improves the perplexity of the validation set and the test set by 1.23% and 1.03% (PyTorch-0.2), and 0.03% and 0.23% (PyTorch-0.4), respectively; compared with the MoS-AWD-LSTM model, the method of this embodiment improves them by 0.88% and 0.87% (PyTorch-0.2), and 2.35% and 2.3% (PyTorch-0.4).
TABLE 1 perplexity results table of AWD-LSTM and MoS-AWD-LSTM models in Penn Treebank dataset language modeling task
Meanwhile, this embodiment verifies the influence of different parameter rollback vectors s on the experimental results. Figures 3 and 4 show the validation-set perplexity on the Penn Treebank dataset when the two experimental models use different parameter rollback vectors s; for clarity, figures 3 and 4 only show the part reaching the lowest value. It can be seen that after a parameter rollback operation occurs, there is a certain probability that the result drops markedly below the previous progress. In addition, this embodiment verifies that setting the parameter rollback vector s to 0.02 performs best on the Penn Treebank dataset.

Claims (8)

1. A medical named entity identification method based on improved random average gradient descent is characterized by comprising the following steps:
step 1, receiving a medical unstructured text to be labeled, and performing data preprocessing to obtain labeled data; for the received medical unstructured text to be labeled, firstly performing simple named entity labeling on a part of the medical unstructured text in a rule-based and dictionary-based mode, and then obtaining high-quality training data;
step 2, establishing an AWD-LSTM model according to the improved random average gradient descent, inputting the preprocessed data into the AWD-LSTM model for training, and obtaining a labeler;
step 2.1, initializing parameters of the medical named entity identification method based on improved random average gradient descent; the parameters are divided into hyper-parameters and ordinary parameters, and the hyper-parameters comprise: default learning rate lr, attenuation term λ, index α of the iterative learning rate update, time point t_0 at which gradient averaging starts, weight attenuation term, parameter rollback vector s, parameter rollback level bl, and default parameter rollback size bds; the ordinary parameters comprise: iteration round number step, iterative learning rate η, weight update parameter μ, and rollback number count b;
step 2.2, reducing the weight magnitude through the weight attenuation operation, wherein the weight attenuation update function is:

$\nabla f(w_t;\, x_t^{(j)}) \leftarrow \nabla f(w_t;\, x_t^{(j)}) + \text{weight-decay} \cdot w_t$

in the above formula, $w_t$ represents the random gradient descent weight vector at time t, $x_t^{(j)}$ represents the randomly selected sample at time t, and weight-decay represents the weight attenuation term; applying this L2 regularization term yields a smaller weight vector $w_t$;
step 2.3, calculating the random gradient descent weight vector according to the random gradient descent update function, wherein the random gradient descent update function is:

$w_{t+1} = w_t - \eta\, \nabla f_t(w_t;\, x_j)$

in the above formula, $w_t$ represents the random gradient descent weight vector at time t; $w_{t+1}$ represents the random gradient descent weight vector at time t+1; η represents the iterative learning rate, $x_j$ represents a randomly selected sample, and $\nabla f_t$ represents the gradient of the loss function at time t;
step 2.4, converting the random gradient descent weight vector obtained in step 2.3 into a random average gradient descent weight vector through the random average gradient descent update function; the random average gradient descent update function is:

$\bar{w}_{t+1} = \bar{w}_t + \mu_t\,(w_t - \bar{w}_t)$

in the above formula, $\bar{w}_t$ represents the random average gradient descent weight vector at time t, $\bar{w}_{t+1}$ represents the random average gradient descent weight vector at time t+1, $\mu_t$ represents the weight update parameter at time t, and $w_t$ represents the random gradient descent weight vector at time t;
step 2.5, judging whether the value of the key step length parameter meets the rollback condition, using two rounds of judgment to decide whether to perform the parameter rollback operation;
step 2.6, when both judgments of step 2.5 satisfy the parameter rollback condition, performing the parameter rollback operation on the key step length parameter of the random average gradient descent: updating the value of the iteration round number step and the value of the rollback number count b; the iteration round number step is updated as:
step=max(step//bl,bds*b)
in the above formula, step is the number of iteration rounds, bl is the parameter rollback level, bds is the default parameter rollback size, and b is the rollback number count; the rollback number count is updated to:
b=b*2
in the above formula, b is the rollback number count;
step 2.7, updating the iterative learning rate, the weight update parameter and the sample parameter, and increasing the iteration round number step by 1;
step 2.8, repeating the steps 2.1 to 2.7 until the AWD-LSTM model meets a preset training termination criterion;
step 3, carrying out named entity labeling on the medical unstructured text to be labeled by using the trained labeler.
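The training loop of steps 2.2 to 2.7 can be sketched in a few lines of Python. This is a minimal illustration assuming standard averaged-SGD recurrences for the update functions of claim 1, not the patented implementation; the function and parameter names (`asgd_with_rollback`, `grad_fn`, `check_every`) are hypothetical, while lr, λ, α, t0, s, bl and bds follow the claimed hyper-parameters:

```python
import random

def asgd_with_rollback(grad_fn, w0, lr=25.0, lam=1e-4, alpha=0.75, t0=1e6,
                       s=0.02, bl=10, bds=10000, check_every=1000,
                       max_steps=50000):
    """Averaged SGD with probabilistic parameter rollback (illustrative)."""
    w = list(w0)        # random gradient descent weight vector w_t
    w_avg = list(w0)    # random average gradient descent weight vector
    step, b = 0, 1      # iteration round number and rollback number count
    while step < max_steps:
        eta = lr / (1.0 + lam * lr * step) ** alpha   # iterative learning rate
        mu = 1.0 / max(1, step - t0)                  # weight update parameter
        g = grad_fn(w, step)                          # gradient on a random sample
        # weight attenuation + gradient step (steps 2.2-2.3)
        w = [wi * (1.0 - lam * eta) - eta * gi for wi, gi in zip(w, g)]
        # averaging step (step 2.4)
        w_avg = [ai + mu * (wi - ai) for ai, wi in zip(w_avg, w)]
        step += 1                                     # step 2.7
        # two-round rollback judgment (steps 2.5-2.6)
        if step % check_every == 0 and random.random() < s / b:
            step = max(step // bl, bds * b)           # roll the counter back
            b *= 2
    return w_avg
```

Because the iterative learning rate depends on step, rolling the counter back raises η again, which is what lets training escape a premature averaging plateau; doubling b makes each subsequent rollback half as likely.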
2. The medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 1, wherein step 2.5 comprises the following steps:
step 2.5.1, first round of judgment: judging whether the key step length parameter is divisible by the default value; if the iteration round number step is divisible by the default value, performing the second round of judgment in step 2.5.2; if the iteration round number step is not divisible by the default value, the iteration round number step and the rollback number count b are not updated;
step 2.5.2, second round of judgment: judging whether the system random number is smaller than the quotient of the parameter rollback vector s and the rollback number count b; if the system random number is smaller than the quotient of the parameter rollback vector s and the rollback number count b, executing step 2.6; otherwise, the iteration round number step and the rollback number count b are not updated.
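The two-round judgment of claim 2 reduces to a divisibility check followed by a probability threshold that shrinks as the rollback number count b grows. A minimal sketch (function name hypothetical; the divisor of 1000 follows claim 8):

```python
import random

def should_rollback(step, b, s=0.02, divisor=1000, rng=random):
    """Two-round parameter rollback judgment (illustrative sketch)."""
    # first round: the iteration round number must be divisible by the default value
    if step % divisor != 0:
        return False
    # second round: the system random number must be smaller than s / b,
    # so each doubling of b halves the chance of another rollback
    return rng.random() < s / b
```

Passing the random source as `rng` is a testing convenience, not part of the claim; the claimed method draws the system random number from a seed set in step 2.1 (claim 6).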
3. The medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 1, wherein the specific operations of updating the iterative learning rate, the weight update parameters and the sample parameters in step 2.7 are as follows:
the iterative learning rate η is updated as:

$\eta_{t+1} = \dfrac{lr}{(1 + \lambda \cdot lr \cdot step)^{\alpha}}$

in the above formula, $\eta_{t+1}$ represents the iterative learning rate at time t+1, lr represents the default learning rate, λ represents the attenuation term, step is the iteration round number, and α represents the update index of η;

the weight update parameter is updated as:

$\mu_{t+1} = \dfrac{1}{\max(1,\; t - t_0)}$

in the above formula, $\mu_{t+1}$ represents the weight update parameter at time t+1, t represents time t, and $t_0$ represents the time at which the random average gradient descent starts;
the sample parameter is updated as:

$x_{t+1}^{(j)} = x_t^{(j)}\,(1 - \lambda\,\eta_t) - \eta_t\,\nabla f_t$

in the above formula, $x_t^{(j)}$ represents the randomly selected sample at time t, $x_{t+1}^{(j)}$ represents the randomly selected sample at time t+1, λ represents the attenuation term, η represents the iterative learning rate, and $\nabla f_t$ represents the gradient of the loss function at time t.
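The two closed-form updates of claim 3 are easy to check numerically. A small sketch under the defaults of claims 4 and 5 (α = 0.75, t0 = 1e6; the function names are hypothetical):

```python
def eta_update(lr, lam, step, alpha=0.75):
    # iterative learning rate of claim 3: eta = lr / (1 + lam * lr * step)^alpha
    return lr / (1.0 + lam * lr * step) ** alpha

def mu_update(t, t0=1_000_000):
    # weight update parameter of claim 3: mu = 1 / max(1, t - t0)
    return 1.0 / max(1, t - t0)
```

With the attenuation term λ set to 0 (claim 4) the learning rate stays at lr; with t0 = 1e6 the weight update parameter stays at 1 until averaging begins, consistent with claim 5.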
4. Medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 1, characterized in that at initialization in step 2.1: the default learning rate lr is set to 20-30, the attenuation term λ is set to 0, the index α of the iterative learning rate update is set to 0.75, the time point t0 at which gradient averaging starts is set to 0, the value of the weight attenuation term is set to 1.2e-6, the value of the parameter rollback vector s is set to 0.02, the parameter rollback level bl is set to 10, and the default parameter rollback size bds is set to 10000.
5. The medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 1, wherein: in step 2.4, the weight update parameter μt at time t is initially 1; in step 2.7, the attenuation term λ has a default value of 1e-4, and t0 has a default value of 1e6.
6. The medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 1, wherein: a random number seed value is also set in step 2.1, and the value of the system random number in step 2.5.2 is determined by the random number seed value.
7. The medical named entity recognition method based on improved stochastic mean gradient descent as claimed in claim 1, wherein: the key step length parameter in steps 2.5 and 2.6 is the iteration round number step.
8. The medical named entity recognition method based on improved stochastic mean gradient descent of claim 2 or 7, wherein: the default value against which the iteration round number step is tested for divisibility in step 2.5.1 is set to 1000.
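With the defaults of claims 4 and 7 (bl = 10, bds = 10000), the rollback arithmetic of step 2.6 can be illustrated directly; a small sketch (function name hypothetical):

```python
def rollback_step(step, b, bl=10, bds=10000):
    """Parameter rollback of step 2.6: shrink step, double the rollback count."""
    step = max(step // bl, bds * b)   # '//' is integer (floor) division
    return step, b * 2
```

The bds * b floor keeps late rollbacks from discarding too much progress: each time the rollback number count b doubles, the minimum point to which training can be set back doubles as well.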
CN202110435549.3A 2021-04-22 2021-04-22 Medical named entity identification method based on improved random average gradient descent Pending CN112966516A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110435549.3A CN112966516A (en) 2021-04-22 2021-04-22 Medical named entity identification method based on improved random average gradient descent

Publications (1)

Publication Number Publication Date
CN112966516A true CN112966516A (en) 2021-06-15

Family

ID=76281014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110435549.3A Pending CN112966516A (en) 2021-04-22 2021-04-22 Medical named entity identification method based on improved random average gradient descent

Country Status (1)

Country Link
CN (1) CN112966516A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114023378A (en) * 2022-01-05 2022-02-08 北京晶泰科技有限公司 Method for generating protein structure constraint distribution and protein design method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination