WO2021258914A1 - Training method and apparatus for a sequence labeling model - Google Patents

Training method and apparatus for a sequence labeling model

Info

Publication number
WO2021258914A1
WO2021258914A1 (application PCT/CN2021/094180, CN2021094180W)
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
model
sample training
sequence labeling
training
Prior art date
Application number
PCT/CN2021/094180
Other languages
English (en)
French (fr)
Inventor
周楠楠 (Zhou Nannan)
杨海军 (Yang Haijun)
徐倩 (Xu Qian)
Original Assignee
深圳前海微众银行股份有限公司 (Shenzhen Qianhai WeBank Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qianhai WeBank Co., Ltd. (深圳前海微众银行股份有限公司)
Publication of WO2021258914A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/103 - Formatting, i.e. changing of presentation of documents
    • G06F 40/117 - Tagging; Marking up; Designating a block; Setting of attributes
    • G06F 40/20 - Natural language analysis
    • G06F 40/205 - Parsing


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A training method and apparatus for a sequence labeling model, relating to the field of natural language processing, for effectively training a sequence labeling model when the amount of sample data is insufficient. The method comprises: training the sequence labeling model on a set of sample training sentences to obtain first loss information; after determining an adversarial perturbation factor according to the model parameters, training the model on the sample training sentence set with the adversarial perturbation factor added, to obtain second loss information; and adjusting the model parameters of the sequence labeling model based on target loss information computed from the first loss information and the second loss information, training iteratively until the convergence condition is determined to be satisfied. By adding the adversarial perturbation factor, the method obtains different loss information from a single sample training sentence, so the trained sequence labeling model generalizes better and is more accurate, while unnecessary noise interference is avoided and resource consumption is reduced.

Description

Training method and apparatus for a sequence labeling model
CROSS-REFERENCE TO RELATED APPLICATIONS
This disclosure claims priority to Chinese patent application No. 202010591966.2, entitled "Training method and apparatus for a sequence labeling model" and filed with the China National Intellectual Property Administration on June 24, 2020, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the field of natural language processing, and in particular to a training method and apparatus for a sequence labeling model.
BACKGROUND
Sequence labeling is an important and widely applied class of problems in natural language processing. Once a sequence labeling model has been trained on training samples, the trained model can be used to perform sequence labeling on input sentences. In many scenarios, however, the sample size available for training a sequence labeling model is insufficient.
In the prior art, to obtain a sufficient sample size, data augmentation is typically used to derive multiple pieces of sample data from a single one, and the sequence labeling model is then trained on the augmented data. However, training on samples generated by data augmentation introduces noise brought in by the augmentation itself, which severely degrades the accuracy of the sequence labeling model and hence the precision of the sequence labeling.
SUMMARY
Embodiments of the present disclosure provide a training method and apparatus for a sequence labeling model, to solve the prior-art problem that an effective sequence labeling model cannot be obtained because the sample data available during training is insufficient.
The specific technical solutions provided by the embodiments of the present disclosure are as follows:
In a first aspect, a training method for a sequence labeling model is provided, comprising:
obtaining a sequence labeling model to be trained and a set of sample training sentences;
training the sequence labeling model on the sample training sentence set to obtain first loss information;
determining an adversarial perturbation factor according to the model parameters of the sequence labeling model, and training the sequence labeling model on the sample training sentence set with the adversarial perturbation factor added, to obtain second loss information;
computing target loss information based on the first loss information and the second loss information, adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, and, when it is determined that a preset convergence condition is satisfied, outputting the trained sequence labeling model.
Optionally, determining the adversarial perturbation factor according to the model parameters of the sequence labeling model comprises:
obtaining the current model parameters of the sequence labeling model, computing the gradient of the sequence labeling model based on the model parameters, and computing the adversarial perturbation factor based on the gradient and a preset hyperparameter, wherein the hyperparameter is used to adjust the strength of the generated adversarial perturbation.
Optionally, computing the adversarial perturbation factor based on the gradient and the preset hyperparameter comprises:
obtaining the preset hyperparameter, and dividing the product of the obtained hyperparameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
Optionally, before adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, the method further comprises:
obtaining an original learning rate set for the pre-trained model in the sequence labeling model, the pre-trained model being used to generate a corresponding character vector set based on an input sample training sentence;
determining the learning rate corresponding to each layer according to the layer coefficient preset for each layer of the pre-trained model, in combination with the original learning rate, wherein the learning rate is used to characterize the adjustment magnitude of the model parameters corresponding to each layer;
wherein adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training comprises:
adjusting the model parameters of each layer of the sequence labeling model by error back-propagation, based on the determined learning rates of the layers of the sequence labeling model and the target loss information.
Optionally, before obtaining the sequence labeling model to be trained and the sample training sentence set, the method further comprises:
obtaining multiple sample training sentences, determining the sentence length of each sample training sentence, and performing, based on those sentence lengths, any one of the following operations on each of the multiple sample training sentences:
if the sentence length is below a preset fixed sentence length, padding the sample training sentence with a preset character to generate one sample training sentence; or,
if the sentence length exceeds the preset fixed sentence length, truncating the part of the sample training sentence exceeding the fixed sentence length to generate one sample training sentence; or,
if the sentence length equals the preset fixed sentence length, directly using the sample training sentence as one sample training sentence.
Optionally, training the sequence labeling model on the sample training sentence set to obtain the first loss information comprises:
inputting each sample training sentence of the sample training sentence set into the sequence labeling model, and performing the following operations for each sample training sentence input into the sequence labeling model:
determining the character vector corresponding to each character of the sample training sentence, and generating a corresponding first character vector set;
performing entity labeling on the sample training sentence based on the first character vector set, to obtain corresponding first predicted labeling information;
computing one piece of first loss information based on the labeling difference between the first predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
Optionally, training the sequence labeling model on the sample training sentence set with the adversarial perturbation factor added, to obtain the second loss information, comprises:
adding the adversarial perturbation factor to the first character vector set to obtain a second character vector set;
performing entity labeling on the sample training sentence based on the second character vector set, to obtain corresponding second predicted labeling information;
computing one piece of second loss information based on the labeling difference between the second predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
Optionally, determining that the preset convergence condition is satisfied comprises:
determining that, over N consecutive iterations, the difference between the prediction accuracy for the sample training sentences in each iteration and that in the previous iteration falls within a preset accuracy difference range, and thereupon determining that the preset convergence condition is reached; or,
determining that, over M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and that in the previous iteration falls within a preset loss difference range, and thereupon determining that the preset convergence condition is satisfied; or,
determining that the preset convergence condition is reached when the current number of iterations reaches a preset maximum number of iterations.
Optionally, after outputting the trained sequence labeling model, the method further comprises:
obtaining a sentence to be processed, and invoking the sequence labeling model to perform sequence labeling processing on the sentence to be processed, to obtain output predicted labeling information.
In a second aspect, a training apparatus for a sequence labeling model is provided, comprising:
an obtaining unit, configured to obtain a sequence labeling model to be trained and a set of sample training sentences;
a training unit, configured to train the sequence labeling model on the sample training sentence set to obtain first loss information;
a determining unit, configured to determine an adversarial perturbation factor according to the model parameters of the sequence labeling model, and train the sequence labeling model on the sample training sentence set with the adversarial perturbation factor added, to obtain second loss information;
an adjusting unit, configured to compute target loss information based on the first loss information and the second loss information, adjust the model parameters of the sequence labeling model based on the target loss information and perform iterative training, and, when it is determined that a preset convergence condition is satisfied, output the trained sequence labeling model.
Optionally, when determining the adversarial perturbation factor according to the model parameters of the sequence labeling model, the determining unit is configured to:
obtain the current model parameters of the sequence labeling model, compute the gradient of the sequence labeling model based on the model parameters, and compute the adversarial perturbation factor based on the gradient and a preset hyperparameter, wherein the hyperparameter is used to adjust the strength of the generated adversarial perturbation.
Optionally, when computing the adversarial perturbation factor based on the gradient and the preset hyperparameter, the determining unit is configured to:
obtain the preset hyperparameter, and divide the product of the obtained hyperparameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
Optionally, before the model parameters of the sequence labeling model are adjusted based on the target loss information and iterative training is performed, the adjusting unit is further configured to:
obtain the original learning rate set for the pre-trained model in the sequence labeling model, the pre-trained model being used to generate a corresponding character vector set based on an input sample training sentence;
determine the learning rate corresponding to each layer according to the layer coefficient preset for each layer of the pre-trained model, in combination with the original learning rate, wherein the learning rate is used to characterize the adjustment magnitude of the model parameters corresponding to each layer;
wherein adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training comprises:
adjusting the model parameters of each layer of the sequence labeling model by error back-propagation, based on the determined learning rates of the layers of the sequence labeling model and the target loss information.
Optionally, before the sequence labeling model to be trained and the sample training sentence set are obtained, the obtaining unit is further configured to:
obtain multiple sample training sentences, determine the sentence length of each sample training sentence, and perform, based on those sentence lengths, any one of the following operations on each of the multiple sample training sentences:
if the sentence length is below a preset fixed sentence length, pad the sample training sentence with a preset character to generate one sample training sentence; or,
if the sentence length exceeds the preset fixed sentence length, truncate the part of the sample training sentence exceeding the fixed sentence length to generate one sample training sentence; or,
if the sentence length equals the preset fixed sentence length, directly use the sample training sentence as one sample training sentence.
Optionally, when training the sequence labeling model on the sample training sentence set to obtain the first loss information, the training unit is configured to:
input each sample training sentence of the sample training sentence set into the sequence labeling model, and perform the following operations for each sample training sentence input into the sequence labeling model:
determine the character vector corresponding to each character of the sample training sentence, and generate a corresponding first character vector set;
perform entity labeling on the sample training sentence based on the first character vector set, to obtain corresponding first predicted labeling information;
compute one piece of first loss information based on the labeling difference between the first predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
Optionally, when training the sequence labeling model on the sample training sentence set with the adversarial perturbation factor added, to obtain the second loss information, the determining unit is configured to:
add the adversarial perturbation factor to the first character vector set to obtain a second character vector set;
perform entity labeling on the sample training sentence based on the second character vector set, to obtain corresponding second predicted labeling information;
compute one piece of second loss information based on the labeling difference between the second predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
Optionally, when determining that the preset convergence condition is satisfied, the adjusting unit is configured to:
determine that, over N consecutive iterations, the difference between the prediction accuracy for the sample training sentences in each iteration and that in the previous iteration falls within a preset accuracy difference range, and thereupon determine that the preset convergence condition is reached; or,
determine that, over M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and that in the previous iteration falls within a preset loss difference range, and thereupon determine that the preset convergence condition is satisfied; or,
determine that the preset convergence condition is reached when the current number of iterations reaches a preset maximum number of iterations.
Optionally, after the trained sequence labeling model is output, the adjusting unit is further configured to:
obtain a sentence to be processed, and invoke the sequence labeling model to perform sequence labeling processing on the sentence to be processed, to obtain output predicted labeling information.
In a third aspect, a training apparatus for a sequence labeling model is provided, comprising:
a memory, configured to store executable instructions;
a processor, configured to read and execute the executable instructions stored in the memory, to implement the training method for a sequence labeling model according to any one of the above.
In a fourth aspect, a storage medium is provided: when the instructions in the storage medium are executed by a processor, the processor is enabled to perform the training method for a sequence labeling model according to any one of the above.
The beneficial effects of the present disclosure are as follows:
In the embodiments of the present disclosure, a sequence labeling model to be trained and a set of sample training sentences are obtained; the sequence labeling model is then trained on the sample training sentence set to obtain first loss information; an adversarial perturbation factor is determined according to the model parameters of the sequence labeling model, and the sequence labeling model is trained on the sample training sentence set with the adversarial perturbation factor added, to obtain second loss information; target loss information is then computed based on the first loss information and the second loss information, the model parameters of the sequence labeling model are adjusted based on the target loss information with iterative training, and the trained sequence labeling model is output when it is determined that a preset convergence condition is satisfied.
In this way, by adding the adversarial perturbation factor to the sequence labeling model, different loss information can be obtained from a single sample training sentence, so the trained sequence labeling model generalizes better and is more accurate, while unnecessary noise interference is avoided and resource consumption is reduced. Moreover, no large volume of manually labeled training samples is required, which saves substantial labor cost and time and thus improves the training efficiency of the sequence labeling model.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present disclosure; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of training a sequence labeling model in an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of the logical structure of a training apparatus for a sequence labeling model in an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of the physical structure of a training apparatus for a sequence labeling model in an embodiment of the present disclosure.
DETAILED DESCRIPTION
To effectively train a sequence labeling model when sample data is insufficient, in the embodiments of the present disclosure, a sequence labeling model to be trained and a set of sample training sentences are obtained; the sequence labeling model is then trained on the sample training sentence set to obtain first loss information; an adversarial perturbation factor is determined according to the model parameters of the sequence labeling model, and the sequence labeling model is trained on the sample training sentence set with the adversarial perturbation factor added, to obtain second loss information; target loss information is then computed based on the first loss information and the second loss information, the model parameters of the sequence labeling model are adjusted based on the target loss information with iterative training, and the trained sequence labeling model is output when it is determined that a preset convergence condition is satisfied.
Preferred embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
Referring to Fig. 1, in the embodiments of the present disclosure, the training flow of the sequence labeling model is as follows:
S101: Obtain a sequence labeling model to be trained and a set of sample training sentences.
Specifically, sample data is first obtained, and the sentence length of each piece of sample data is determined.
For example, sample data 1 {小明今天去银行还款两千元, 'Xiao Ming went to the bank today to repay two thousand yuan'} is obtained and its sentence length is determined to be 12 characters; sample data 2 {小丽每天去学校学习, 'Xiao Li goes to school every day to study'} is read and its sentence length is determined to be 9 characters.
Further, after the sentence length of each piece of sample data is determined, each piece of sample data is processed into one sample training sentence according to its sentence length, until all the sample data has been processed. Each piece of sample data is handled in one of, but not limited to, the following cases:
Case 1: the sentence length of the sample data is below the preset fixed sentence length.
If the sentence length of a piece of sample data is below the preset fixed sentence length, the sample data is padded with a preset character to generate one sample training sentence.
For example, suppose the preset fixed sentence length is 128 characters, the preset character is "0", and the sentence length of sample data 1 is 12 characters. Since its sentence length does not reach 128 characters, sample data 1 is padded with the character "0" to generate sample training sentence 1.
Case 2: the sentence length of the sample data exceeds the preset fixed sentence length.
If the sentence length of a piece of sample data exceeds the preset fixed sentence length, the part of the sample data exceeding the fixed sentence length is truncated to generate one sample training sentence.
For example, suppose the preset fixed sentence length is 128 characters and there is a piece of sample data X whose sentence length is 130 characters. Its sentence length exceeds the preset fixed sentence length of 128 characters, so the part of sample data X exceeding 128 characters is truncated to generate sample training sentence X.
Case 3: the sentence length of the sample data equals the preset fixed sentence length.
If the sentence length of a piece of sample data equals the preset fixed sentence length, the sample data is directly used as one sample training sentence.
For example, suppose the fixed sentence length is 128 characters and the sentence length of sample data N is 128 characters; sample data N is then directly used as sample training sentence N.
It should be noted that, in the embodiments of the present disclosure, before the sentence length of a piece of sample data is determined, the sample data is processed with the preset sentence-start tag [CLS] and the preset sentence-end tag [SEP]; that is, [CLS] is placed at the start of the sentence of the sample data and [SEP] at its end.
Further, in the embodiments of the present disclosure, the training sample sentence set is obtained from the sample training sentences produced by this processing, as sketched below.
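To make the three cases concrete, the following is a minimal Python sketch of this preprocessing step. The fixed length of 128, the padding character "0", and the [CLS]/[SEP] tags come from the examples above; the function name, the character-level splitting, and the exact ordering of tagging versus padding are illustrative assumptions, not code from the disclosure.

```python
def preprocess(sentence: str, fixed_len: int = 128, pad_char: str = "0") -> list[str]:
    # The description attaches [CLS]/[SEP] before the sentence length is determined.
    chars = ["[CLS]"] + list(sentence) + ["[SEP]"]
    if len(chars) < fixed_len:        # case 1: pad with the preset character
        chars += [pad_char] * (fixed_len - len(chars))
    elif len(chars) > fixed_len:      # case 2: truncate the excess
        chars = chars[:fixed_len]
    return chars                      # case 3: exactly fixed_len, used as-is

# e.g. sample data 1 and 2 from the examples above
train_set = [preprocess(s) for s in ["小明今天去银行还款两千元", "小丽每天去学校学习"]]
```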
It should be noted that, in the embodiments of the present disclosure, the obtained sequence labeling model is built on a Bidirectional Encoder Representation from Transformers (BERT) + Bidirectional Long Short-Term Memory (BiLSTM) + Conditional Random Fields (CRF) model architecture, where the BERT model is the pre-trained model in the sequence labeling model.
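The disclosure names this architecture but gives no code for it; the PyTorch sketch below shows one conventional way to assemble a BERT + BiLSTM + CRF tagger consistent with the description (BERT's per-character output serves as the character vector set, and the perturbation hook anticipates S104 below). The checkpoint name bert-base-chinese, the LSTM hidden size, and the use of the third-party pytorch-crf package are assumptions for illustration.

```python
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # third-party pytorch-crf package (an assumption)

class BertBiLstmCrf(nn.Module):
    def __init__(self, num_tags: int, lstm_hidden: int = 256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")  # pre-trained model
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * lstm_hidden, num_tags)   # per-character tag scores
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None, perturbation=None):
        # BERT outputs the character vector set (one 768-dim vector per character).
        vecs = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        if perturbation is not None:       # S104: adversarial factor added to vectors
            vecs = vecs + perturbation
        emissions = self.emit(self.bilstm(vecs)[0])
        if tags is not None:               # training: negative CRF log-likelihood
            return -self.crf(emissions, tags, mask=attention_mask.bool())
        return self.crf.decode(emissions, mask=attention_mask.bool())
```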
S102: Train the sequence labeling model on the sample training sentence set to obtain the first loss information.
In the embodiments of the present disclosure, after the sample training sentence set is obtained, the sequence labeling model is trained on it to obtain the first loss information.
It should be noted that, in the embodiments of the present disclosure, when the sequence labeling model is trained, sample training sentences are read and processed in batches during iteration; that is, according to a preset batch size, a corresponding number of sample training sentences is read for model training each time.
For example, suppose the preset batch size is 32; then 32 sample training sentences are read for model training each time.
As another example, suppose the preset batch size is 64; then 64 sample training sentences are read for model training each time.
For ease of description, the training process is explained below taking a single sample training sentence input into the sequence labeling model as an example.
Specifically, each sample training sentence of the sample training sentence set is input into the sequence labeling model, and the following operations are performed for each sample training sentence input into the sequence labeling model:
S1: Determine the character vector corresponding to each character of the sample training sentence, and generate the corresponding first character vector set.
Specifically, after a sample training sentence is input into the sequence labeling model, the corresponding first character vector set is generated from the character vectors, output by the pre-trained model in the sequence labeling model, that correspond to the characters of the sample training sentence.
For example, after the sample training sentence {小明在读书, 'Xiao Ming is reading'} is input into the sequence labeling model, it is determined that '小' corresponds to character vector 1, '明' to character vector 2, '在' to character vector 3, '读' to character vector 4, and '书' to character vector 5; reading this sample training sentence yields a character vector set comprising character vectors 1-5, each of 768 dimensions.
S2: Perform entity labeling on the sample training sentence based on the character vector set, to obtain the corresponding first predicted labeling information.
For example, after the character vector set corresponding to a sample training sentence is determined, a prediction is obtained for each character vector. For the sample training sentence {小明今天去银行还款两千元}, the corresponding first predicted labeling information is 小(B-NAM) 明(E-NAM) 今(O) 天(O) 去(O) 银(O) 行(O) 还(O) 款(O) 两(O) 千(O) 元(O), indicating that, in the sample training sentence, '小' is the start of a person name, '明' is the end of a person name, and '今', '天', '去', '银', '行', '还', '款', '两', '千', '元' are labeled as Other.
S3: Compute one piece of first loss information based on the labeling difference between the first predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence; the first loss information is denoted Le.
Specifically, based on the obtained first predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence, the BiLSTM+CRF part of the sequence labeling model determines the labeling difference between the first predicted labeling information and the ground-truth labeling information, and accordingly computes the first loss corresponding to the current sequence labeling model.
For example, for the sample training sentence {小明今天去银行还款两千元}, the corresponding ground-truth labeling information is 小(B-NAM) 明(E-NAM) 今(B-TIM) 天(E-TIM) 去(O) 银(B-LOC) 行(E-LOC) 还(O) 款(O) 两(B-MON) 千(I-MON) 元(E-MON), where 'B-' marks the start of a labeled element, 'I-' its interior, and 'E-' its end. The first loss corresponding to the current sequence labeling model is then computed based on the difference between the ground-truth labeling information and the first predicted labeling information obtained by the sequence labeling model.
S103: Determine the adversarial perturbation factor according to the model parameters of the sequence labeling model.
In the embodiments of the present disclosure, the current model parameters of the sequence labeling model are obtained, the gradient of the sequence labeling model is computed based on the model parameters, and the adversarial perturbation factor is computed based on the gradient and a preset hyperparameter, wherein the hyperparameter is used to adjust the strength of the generated adversarial perturbation.
Specifically, the gradient g of the sequence labeling model is computed based on its current model parameters. Further, in the embodiments of the present disclosure, the preset hyperparameter is obtained, and the product of the obtained hyperparameter and the gradient is divided by the norm of the gradient to obtain the adversarial perturbation factor.
The formula for computing the adversarial perturbation factor r is as follows:
r = -ε·g / ||g||₂
where ε is the hyperparameter, which can be configured as needed to adjust the magnitude of the adversarial perturbation factor; the larger the adversarial perturbation factor, the stronger the adversarial perturbation that can be added.
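A minimal sketch of this computation follows. The sign and the quotient of hyperparameter-times-gradient by the gradient norm follow the formula above (the same form used in FGM-style adversarial training); taking the gradient with respect to the character vector set, to which the factor is later added in S104, is an interpretation stated here as an assumption.

```python
import torch

def perturbation_factor(char_vecs: torch.Tensor, first_loss: torch.Tensor,
                        epsilon: float = 1.0) -> torch.Tensor:
    """r = -eps * g / ||g||_2, with g the gradient of Le w.r.t. the character vectors."""
    (g,) = torch.autograd.grad(first_loss, char_vecs, retain_graph=True)
    return -epsilon * g / (g.norm(p=2) + 1e-12)   # small constant guards against /0
```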
S104: Train the sequence labeling model on the sample training sentence set with the adversarial perturbation factor added, to obtain the second loss information.
Specifically, continuing with the single sample training sentence of S102, the process of obtaining the second loss information is explained as follows.
The adversarial perturbation factor is added to the first character vector set obtained in S102 to obtain a second character vector set; the BiLSTM+CRF part of the sequence labeling model then performs entity labeling on the sample training sentence based on the second character vector set, to obtain the corresponding second predicted labeling information.
For example, after the adversarial perturbation factor is added to the first character vector set corresponding to the sample training sentence {小明今天去银行还款两千元}, each character vector in the first character vector set is perturbed, generating the second character vector set. The sequence labeling model then outputs second predicted labeling information such as 小(B-NAM) 明(E-NAM) 今(O) 天(O) 去(O) 银(B-NAM) 行(E-NAM) 还(O) 款(O) 两(O) 千(O) 元(O).
Further, one piece of second loss information is computed based on the labeling difference between the second predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
Specifically, based on the obtained second predicted labeling information and the ground-truth labeling information corresponding to the sample data, the labeling difference between them is determined and the second loss corresponding to the current sequence labeling model is computed accordingly; the second loss is denoted Lr.
For example, for the sample training sentence {小明今天去银行还款两千元}, the corresponding ground-truth labeling information is 小(B-NAM) 明(E-NAM) 今(B-TIM) 天(E-TIM) 去(O) 银(B-LOC) 行(E-LOC) 还(O) 款(O) 两(B-MON) 千(I-MON) 元(E-MON), while the second predicted labeling information obtained for the sample training sentence is 小(B-NAM) 明(E-NAM) 今(O) 天(O) 去(O) 银(B-NAM) 行(E-NAM) 还(O) 款(O) 两(O) 千(O) 元(O); the second loss corresponding to the current sequence labeling model is then obtained based on the difference between the second predicted labeling information and the ground-truth labeling information.
In this way, based on a single sample training sentence, different predicted labeling information is obtained from the sequence labeling model once the adversarial perturbation factor is added. With limited samples, this increases the number of effective sample training sentences without introducing noise; and training on sample training sentences with the adversarial perturbation factor added makes the sequence labeling model generalize better and be more accurate.
S105: Compute the target loss information based on the first loss information and the second loss information, adjust the model parameters of the sequence labeling model based on the target loss information, and perform iterative training.
Specifically, after obtaining Le, determined from the difference between the first predicted labeling information and the ground-truth labeling information, and Lr, determined from the difference between the second predicted labeling information and the ground-truth labeling information, the sum of Le and Lr is taken as the target loss information, which is denoted L.
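Composing S102 through S105, one training step might look like the sketch below. It is a hypothetical composition of the pieces sketched earlier, and additionally assumes the model's forward pass can return the first character vector set via a return_vecs flag; none of this is code from the disclosure.

```python
def train_step(model, optimizer, input_ids, attention_mask, tags, epsilon=1.0):
    # S102: first forward pass -> first loss Le (return_vecs flag is an assumption).
    le, char_vecs = model(input_ids, attention_mask, tags=tags, return_vecs=True)
    # S103: perturbation factor from the gradient of Le w.r.t. the character vectors.
    r = perturbation_factor(char_vecs, le, epsilon)
    # S104: second forward pass on the perturbed character vectors -> second loss Lr.
    second_loss = model(input_ids, attention_mask, tags=tags, perturbation=r.detach())
    # S105: target loss L = Le + Lr, then back-propagate and update.
    target_loss = le + second_loss
    optimizer.zero_grad()
    target_loss.backward()
    optimizer.step()
    return target_loss.item()
```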
It should be noted that, in the embodiments of the present disclosure, before the model parameters of the sequence labeling model are adjusted based on the target loss information and iterative training is performed, a learning rate needs to be configured for each layer of the pre-trained model in the sequence labeling model. Specifically, the original learning rate set for the pre-trained model in the sequence labeling model is obtained, the pre-trained model being used to generate a corresponding character vector set based on an input sample training sentence; then, according to the layer coefficient preset for each layer of the pre-trained model, in combination with the original learning rate, the learning rate corresponding to each layer is determined, where the learning rate is used to characterize the adjustment magnitude of the model parameters corresponding to each layer.
It should be noted that, for the pre-trained model in the sequence labeling model, the learning rate of each layer is different. Typically the upper layers of the pre-trained model carry more semantic-level information, the middle layers carry syntactic-level information, and the bottom layers carry phrase-level information. Hence, when configuring learning rates, the upper layers of the pre-trained model are generally given a higher learning rate, so that their parameters change more, while the bottom layers of the pre-trained model are generally given a lower learning rate, so that their parameters change less.
Specifically, the learning rate of each layer of the pre-trained model in the sequence labeling model is computed in either of the following two ways:
Way 1:
The learning rate of each layer of the pre-trained model in the sequence labeling model is computed by the following formula:
Li = Lr / Ci
where Li is the learning rate of the i-th layer, Lr is the original learning rate configured for the pre-trained model, and Ci is the layer coefficient of the i-th layer.
It should be noted that a smaller i denotes a higher layer. Taking a three-layer pre-trained model as an example, C1 is the layer coefficient of the upper layer of the pre-trained model, C2 of the middle layer, and C3 of the bottom layer; the values of Ci can be adjusted according to actual processing needs.
Way 2:
The learning rate of each layer of the pre-trained model in the sequence labeling model is computed by the following formula:
Li+1 = Li / C
where Li is the learning rate of the i-th layer, Li+1 is the learning rate of the (i+1)-th layer, and C is a fixed parameter.
For example, if the initial learning rate L1 of layer 1 is set to 0.025 and the fixed parameter C is set to 5, the learning rate of layer 2 is 0.005 and that of layer 3 is 0.001.
It should be noted that a smaller i denotes a higher layer. Taking a three-layer pre-trained model as an example, L1 is the learning rate of the upper layer of the pre-trained model, L2 of the middle layer, and L3 of the bottom layer.
In this way, considering the insufficient sample size, configuring different learning rates for the different layers of the pre-trained model when adjusting it (changing the upper-layer parameters more and the bottom-layer parameters less) makes the parameter adjustment of the pre-trained model more principled, yielding a more accurate sequence labeling model.
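As an illustration of Way 2, the sketch below derives the per-layer learning rates and attaches them to optimizer parameter groups; the three-way layer grouping and the choice of torch.optim.Adam are assumptions, not part of the disclosure.

```python
import torch

def layer_learning_rates(l1: float, c: float, num_layers: int) -> list[float]:
    """Way 2: L_{i+1} = L_i / C, with layer 1 the top layer."""
    rates = [l1]
    for _ in range(num_layers - 1):
        rates.append(rates[-1] / c)
    return rates

rates = layer_learning_rates(0.025, 5, 3)   # -> [0.025, 0.005, 0.001]
# Hypothetical grouping of the pre-trained model into upper/middle/bottom modules:
# groups = [upper_module, middle_module, bottom_module]
# optimizer = torch.optim.Adam(
#     [{"params": m.parameters(), "lr": lr} for m, lr in zip(groups, rates)])
```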
Further, the model parameters of each layer of the sequence labeling model are adjusted by error back-propagation, based on the determined learning rates of the layers of the sequence labeling model and the target loss information. The specific process may be: compute the partial derivative of the target loss information with respect to the model parameters of each layer, compute the product of each layer's learning rate and the computed partial derivative, and adjust the model parameters of each layer using the product corresponding to that layer; specifically, the adjusted model parameter of each layer is the difference between the pre-adjustment model parameter and the corresponding product.
For example, take the adjustment of a model parameter W1 of some layer of the sequence labeling model. First, based on the computed target loss information, the partial derivative of the target loss information with respect to W1 is computed; then the difference between the pre-adjustment value of W1 and the product of that partial derivative and the learning rate is computed, and the computed difference is taken as the updated W1. For instance, if the original W1 is 0.4, the learning rate is 0.01, and the computed partial derivative is 0.04728, the updated W1 is 0.4 - 0.01 × 0.04728 = 0.3995272.
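The update is plain gradient descent (parameter minus learning rate times partial derivative), so the worked example can be checked in one line:

```python
w1, lr_w1, grad_w1 = 0.4, 0.01, 0.04728
print(w1 - lr_w1 * grad_w1)   # -> 0.3995272 (up to floating-point rounding)
```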
The above describes how a single sample training sentence from the sample training sentence set is processed in the sequence labeling model. In the embodiments of the present disclosure, iterative training is adopted, based on each sample training sentence of the sample training sentence set. Specifically, during training of the sequence labeling model, one piece of target loss information is obtained based on a sample training sentence; after one adjustment of the sequence labeling model is completed, a new sample training sentence is obtained from the sample training sentence set and input into the adjusted sequence labeling model for another round of training, iterating in this way until the convergence condition is determined to be satisfied, which is not repeated here.
In this way, the adversarial perturbation factor is in effect added to the loss of the sequence labeling model, and adjusting the sequence labeling model based on target loss information that incorporates the adversarial perturbation factor makes the sequence labeling model generalize better and be more accurate.
S106: When it is determined that the preset convergence condition is satisfied, output the trained sequence labeling model.
Specifically, in the embodiments of the present disclosure, after the target loss information of the sequence labeling model is determined, satisfaction of the preset convergence condition may be judged in, but not limited to, the following ways:
Way 1: determine that, over N consecutive iterations, the difference between the prediction accuracy for the sample training sentences in each iteration and that in the previous iteration falls within a preset accuracy difference range; the preset convergence condition is then determined to be reached.
It should be noted that the prediction accuracy may specifically be measured as the correctness rate of the predicted labeling information against the ground-truth labeling information, i.e., the proportion of correctly labeled content in the predicted labeling information.
For example, suppose that after the sequence labeling model performs sequence labeling on a sample training sentence of 50 characters, 40 characters are determined to be labeled correctly; the accuracy of the sequence labeling model is then 80%.
In the embodiments of the present disclosure, the value of N can be set according to the actual application scenario.
For example, suppose N is 2 and the preset accuracy difference range is 1%-5%; prediction accuracy 1 on the sample training sentences in iteration 10 is 80%, prediction accuracy 2 in iteration 9 is 75%, and prediction accuracy 3 in iteration 8 is 70%. Clearly, the difference between prediction accuracy 1 (iteration 10) and prediction accuracy 2 (iteration 9) is 5%, and the difference between prediction accuracy 2 (iteration 9) and prediction accuracy 3 (iteration 8) is 5%. Since over 2 consecutive iterations the difference between each iteration's prediction accuracy and the previous iteration's prediction accuracy falls within the preset 1%-5%, the preset convergence condition is determined to be satisfied.
Way 2: determine that, over M consecutive iterations, the difference between the target loss information of the sequence labeling model in each iteration and that in the previous iteration falls within a preset loss difference range; the preset convergence condition is then determined to be reached.
It should be noted that, in the embodiments of the present disclosure, the value of M can be set according to the actual application scenario.
For example, suppose M is 5 and the preset loss difference range is 1%-2.5%; the target loss information of the sequence labeling model is 7.5% in iteration 30, 8.4% in iteration 29, 9.2% in iteration 28, 10.3% in iteration 27, 11.6% in iteration 26, and 13.0% in iteration 25. It is thus determined that, from iteration 25 to iteration 30, the differences in target loss information between adjacent iterations are 1.4%, 1.3%, 1.1%, 0.8%, 0.9%; with the difference in target loss information within the loss difference range for 5 consecutive times, the preset convergence condition is determined to be satisfied.
Way 3: determine that the preset convergence condition is satisfied when the current number of iterations reaches the preset maximum number of iterations.
For example, suppose the preset maximum number of iterations is 50; when the current number of iterations is determined to reach 50, the preset convergence condition is determined to be satisfied.
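These three stopping rules can be combined in a small helper; the sketch below is an assumed implementation, with the window sizes and ranges taken from the examples above rather than fixed by the disclosure.

```python
def converged(acc_hist, loss_hist, iteration, n=2, m=5,
              acc_range=(0.01, 0.05), loss_range=(0.01, 0.025), max_iter=50):
    """Way 1: N consecutive accuracy deltas in range; Way 2: M consecutive
    loss deltas in range; Way 3: iteration cap reached."""
    def deltas_in_range(hist, k, lo, hi):
        if len(hist) < k + 1:       # need k consecutive differences
            return False
        deltas = [abs(hist[-i] - hist[-i - 1]) for i in range(1, k + 1)]
        return all(lo <= d <= hi for d in deltas)
    return (deltas_in_range(acc_hist, n, *acc_range)
            or deltas_in_range(loss_hist, m, *loss_range)
            or iteration >= max_iter)
```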
Further, when any one of the above convergence conditions is determined to be satisfied, the sequence labeling model may be determined to have converged, and the trained sequence labeling model may be output.
Further, in practical applications, after a sentence to be processed is obtained, the sequence labeling model is invoked to perform sequence labeling processing on the sentence to be processed, yielding the output predicted labeling information.
Based on the same inventive concept, referring to Fig. 2, an embodiment of the present disclosure provides a training apparatus for a sequence labeling model, comprising at least an obtaining unit 201, a training unit 202, a determining unit 203 and an adjusting unit 204, wherein:
the obtaining unit 201 obtains a sequence labeling model to be trained and a set of sample training sentences;
the training unit 202 trains the sequence labeling model on the sample training sentence set to obtain first loss information;
the determining unit 203 determines an adversarial perturbation factor according to the model parameters of the sequence labeling model, and trains the sequence labeling model on the sample training sentence set with the adversarial perturbation factor added, to obtain second loss information;
the adjusting unit 204 computes target loss information based on the first loss information and the second loss information, adjusts the model parameters of the sequence labeling model based on the target loss information and performs iterative training, and outputs the trained sequence labeling model when it is determined that a preset convergence condition is satisfied.
Optionally, when determining the adversarial perturbation factor according to the model parameters of the sequence labeling model, the determining unit 203 is configured to:
obtain the current model parameters of the sequence labeling model, compute the gradient of the sequence labeling model based on the model parameters, and compute the adversarial perturbation factor based on the gradient and a preset hyperparameter, wherein the hyperparameter is used to adjust the strength of the generated adversarial perturbation.
Optionally, when computing the adversarial perturbation factor based on the gradient and the preset hyperparameter, the determining unit 203 is configured to:
obtain the preset hyperparameter, and divide the product of the obtained hyperparameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
Optionally, before the model parameters of the sequence labeling model are adjusted based on the target loss information and iterative training is performed, the adjusting unit 204 is further configured to:
obtain the original learning rate set for the pre-trained model in the sequence labeling model, the pre-trained model being used to generate a corresponding character vector set based on an input sample training sentence;
determine the learning rate corresponding to each layer according to the layer coefficient preset for each layer of the pre-trained model, in combination with the original learning rate, wherein the learning rate is used to characterize the adjustment magnitude of the model parameters corresponding to each layer;
and, when adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, the adjusting unit 204 is configured to:
adjust the model parameters of each layer of the sequence labeling model by error back-propagation, based on the determined learning rates of the layers of the sequence labeling model and the target loss information.
Optionally, before the sequence labeling model to be trained and the sample training sentence set are obtained, the obtaining unit 201 is further configured to:
obtain multiple sample training sentences, determine the sentence length of each sample training sentence, and perform, based on those sentence lengths, any one of the following operations on each of the multiple sample training sentences:
if the sentence length is below a preset fixed sentence length, pad the sample training sentence with a preset character to generate one sample training sentence; or,
if the sentence length exceeds the preset fixed sentence length, truncate the part of the sample training sentence exceeding the fixed sentence length to generate one sample training sentence; or,
if the sentence length equals the preset fixed sentence length, directly use the sample training sentence as one sample training sentence.
Optionally, when training the sequence labeling model on the sample training sentence set to obtain the first loss information, the training unit 202 is configured to:
input each sample training sentence of the sample training sentence set into the sequence labeling model, and perform the following operations for each sample training sentence input into the sequence labeling model:
determine the character vector corresponding to each character of the sample training sentence, and generate a corresponding first character vector set;
perform entity labeling on the sample training sentence based on the first character vector set, to obtain corresponding first predicted labeling information;
compute one piece of first loss information based on the labeling difference between the first predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
Optionally, when training the sequence labeling model on the sample training sentence set with the adversarial perturbation factor added, to obtain the second loss information, the determining unit 203 is configured to:
add the adversarial perturbation factor to the first character vector set to obtain a second character vector set;
perform entity labeling on the sample training sentence based on the second character vector set, to obtain corresponding second predicted labeling information;
compute one piece of second loss information based on the labeling difference between the second predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
Optionally, when determining that the preset convergence condition is satisfied, the adjusting unit 204 is configured to:
determine that, over N consecutive iterations, the difference between the prediction accuracy for the sample training sentences in each iteration and that in the previous iteration falls within a preset accuracy difference range, and thereupon determine that the preset convergence condition is reached; or,
determine that, over M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and that in the previous iteration falls within a preset loss difference range, and thereupon determine that the preset convergence condition is satisfied; or,
determine that the preset convergence condition is reached when the current number of iterations reaches a preset maximum number of iterations.
Optionally, after the trained sequence labeling model is output, the adjusting unit 204 is further configured to:
obtain a sentence to be processed, and invoke the sequence labeling model to perform sequence labeling processing on the sentence to be processed, to obtain output predicted labeling information.
Based on the same inventive concept, referring to Fig. 3, an embodiment of the present disclosure provides a training apparatus for a sequence labeling model, comprising at least:
a memory 301, configured to store executable instructions;
a processor 302, configured to read and execute the executable instructions stored in the memory, performing the following process:
obtaining a sequence labeling model to be trained and a set of sample training sentences;
training the sequence labeling model on the sample training sentence set to obtain first loss information;
determining an adversarial perturbation factor according to the model parameters of the sequence labeling model, and training the sequence labeling model on the sample training sentence set with the adversarial perturbation factor added, to obtain second loss information;
computing target loss information based on the first loss information and the second loss information, adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, and, when it is determined that a preset convergence condition is satisfied, outputting the trained sequence labeling model.
Optionally, when determining the adversarial perturbation factor according to the model parameters of the sequence labeling model, the processor 302 is configured to:
obtain the current model parameters of the sequence labeling model, compute the gradient of the sequence labeling model based on the model parameters, and compute the adversarial perturbation factor based on the gradient and a preset hyperparameter, wherein the hyperparameter is used to adjust the strength of the generated adversarial perturbation.
Optionally, when computing the adversarial perturbation factor based on the gradient and the preset hyperparameter, the processor 302 is configured to:
obtain the preset hyperparameter, and divide the product of the obtained hyperparameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
Optionally, before the model parameters of the sequence labeling model are adjusted based on the target loss information and iterative training is performed, the processor 302 is further configured to:
obtain the original learning rate set for the pre-trained model in the sequence labeling model, the pre-trained model being used to generate a corresponding character vector set based on an input sample training sentence;
determine the learning rate corresponding to each layer according to the layer coefficient preset for each layer of the pre-trained model, in combination with the original learning rate, wherein the learning rate is used to characterize the adjustment magnitude of the model parameters corresponding to each layer.
Optionally, before the sequence labeling model to be trained and the sample training sentence set are obtained, the processor 302 is further configured to:
obtain multiple sample training sentences, determine the sentence length of each sample training sentence, and perform, based on those sentence lengths, any one of the following operations on each of the multiple sample training sentences:
if the sentence length is below a preset fixed sentence length, pad the sample training sentence with a preset character to generate one sample training sentence; or,
if the sentence length exceeds the preset fixed sentence length, truncate the part of the sample training sentence exceeding the fixed sentence length to generate one sample training sentence; or,
if the sentence length equals the preset fixed sentence length, directly use the sample training sentence as one sample training sentence.
Optionally, when training the sequence labeling model on the sample training sentence set to obtain the first loss information, the processor 302 is configured to:
input each sample training sentence of the sample training sentence set into the sequence labeling model, and perform the following operations for each sample training sentence input into the sequence labeling model:
determine the character vector corresponding to each character of the sample training sentence, and generate a corresponding first character vector set;
perform entity labeling on the sample training sentence based on the first character vector set, to obtain corresponding first predicted labeling information;
compute one piece of first loss information based on the labeling difference between the first predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
Optionally, when training the sequence labeling model on the sample training sentence set with the adversarial perturbation factor added, to obtain the second loss information, the processor 302 is configured to:
add the adversarial perturbation factor to the first character vector set to obtain a second character vector set;
perform entity labeling on the sample training sentence based on the second character vector set, to obtain corresponding second predicted labeling information;
compute one piece of second loss information based on the labeling difference between the second predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
Optionally, when determining that the preset convergence condition is satisfied, the processor 302 is configured to:
determine that, over N consecutive iterations, the difference between the prediction accuracy for the sample training sentences in each iteration and that in the previous iteration falls within a preset accuracy difference range, and thereupon determine that the preset convergence condition is reached; or,
determine that, over M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and that in the previous iteration falls within a preset loss difference range, and thereupon determine that the preset convergence condition is satisfied; or,
determine that the preset convergence condition is reached when the current number of iterations reaches a preset maximum number of iterations.
Optionally, after the trained sequence labeling model is output, the processor 302 is further configured to:
obtain a sentence to be processed, and invoke the sequence labeling model to perform sequence labeling processing on the sentence to be processed, to obtain output predicted labeling information.
Based on the same inventive concept, an embodiment of the present disclosure provides a storage medium: when the instructions in the storage medium are executed by a processor, the processor is enabled to perform the training method for a sequence labeling model according to any one of the above embodiments.
In summary, in the embodiments of the present disclosure, a sequence labeling model to be trained and a set of sample training sentences are obtained; the sequence labeling model is then trained on the sample training sentence set to obtain first loss information; an adversarial perturbation factor is determined according to the model parameters of the sequence labeling model, and the sequence labeling model is trained on the sample training sentence set with the adversarial perturbation factor added, to obtain second loss information; target loss information is then computed based on the first loss information and the second loss information, the model parameters of the sequence labeling model are adjusted based on the target loss information with iterative training, and the trained sequence labeling model is output when it is determined that a preset convergence condition is satisfied. In this way, by adding the adversarial perturbation factor to the sequence labeling model, different loss information can be obtained from a single sample training sentence, so the trained sequence labeling model generalizes better and is more accurate, while unnecessary noise interference is avoided and resource consumption is reduced. Moreover, no large volume of manually labeled training samples is required, which saves substantial labor cost and time and thus improves the training efficiency of the sequence labeling model.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present disclosure. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, such that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Although preferred embodiments of the present disclosure have been described, those skilled in the art, once apprised of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present disclosure.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the present disclosure without departing from the spirit and scope of the embodiments of the present disclosure. If such modifications and variations fall within the scope of the claims of the present disclosure and their technical equivalents, the present disclosure is also intended to encompass them.

Claims (20)

  1. 一种序列标注模型的训练方法,其特征在于,包括:
    获取待训练的序列标注模型和样本训练语句集合;
    基于所述样本训练语句集合对所述序列标注模型进行训练,得到第一损失信息;
    根据所述序列标注模型的模型参数,确定对抗扰动因子,并基于加入所述对抗扰动因子的样本训练语句集合对所述序列标注模型进行训练,得到第二损失信息;
    基于所述第一损失信息和所述第二损失信息计算得到目标损失信息,并基于所述目标损失信息对所述序列标注模型的模型参数进行调整并进行迭代训练,以及确定满足预设的收敛条件时,输出训练后的所述序列标注模型。
  2. 如权利要求1所述的方法,其特征在于,所述根据所述序列标注模型的模型参数,确定对抗扰动因子,包括:
    获取所述序列标注模型当前的模型参数,并基于所述模型参数计算所述序列标注模型的梯度,并基于所述梯度以及预设的超参数计算对抗扰动因子,其中,所述超参数用于调整生成的对抗扰动的强弱。
  3. 如权利要求2所述的方法,其特征在于,所述基于所述梯度以及预设的超参数计算对抗扰动因子,包括:
    获取所述超参数,并将获得的所述超参数与所述梯度的乘积,与所述梯度的范数做商得到对抗扰动因子。
  4. 如权利要求1所述的方法,其特征在于,所述基于所述目标损失信息对所述序列标注模型的模型参数进行调整并进行迭代训练之前,进一步包括:
    获取为所述序列标注模型中的预训练模型设置的原始学习率,所述预训练模型用于基于输入的样本训练语句生成相应的字向量集合;
    根据对应所述预训练模型中各个层级预设的层系数,结合所述原始学习率,分别确定各个层级对应的学习率,其中,所述学习率用于表征对各个层级对应的模型参数的调整幅度;
    所述基于所述目标损失信息对所述序列标注模型的模型参数进行调整并进行迭代训练,包括:
    基于确定的所述序列标注模型中各个层级的学习率以及所述目标损失信息,采用误差反向传播的方式调整所述序列标注模型中各个层级的模型参数。
  5. 如权利要求1所述的方法,其特征在于,所述获取待训练的序列标注模型和样本训练语句集合之前,进一步包括:
    获取多个样本训练语句,并确定各个样本训练语句的语句长度,基于所述各个样本训练语句的语句长度,对所述多个样本训练语句中的各个样本训练语句执行以下任意一项操 作:
    若所述语句长度未达到预设的固定语句长度,则采用预设的字符对所述一个样本训练语句进行填补,生成一个样本训练语句;或者,
    若所述语句长度超过预设的固定语句长度时,则将所述一个样本训练语句中超过所述固定语句长度的部分进行截断,生成一个样本训练语句;或者,
    若所述语句长度达到预设的固定语句长度时,则直接将所述一个样本训练语句作为一个样本训练语句。
  6. 如权利要求1-5任一项所述的方法,其特征在于,基于所述样本训练语句集合对所述序列标注模型进行训练,得到第一损失信息,包括:
    将所述样本训练语句集合中的各个样本训练语句输入序列标注模型,针对各个输入所述序列标注模型中的样本训练语句,分别执行以下操作:
    确定一个样本训练语句中各个字符对应的字向量,生成相应的第一字向量集合;
    基于所述第一字向量集合对所述一个样本训练语句进行实体标注,得到对应的第一预测标注信息;
    基于所述第一预测标注信息与所述一个样本训练语句对应的真实标注信息之间的标注差异计算得到一个第一损失信息。
  7. 如权利要求6所述的方法,其特征在于,所述基于加入所述对抗扰动因子的样本训练语句集合对所述序列标注模型进行训练,得到第二损失信息,包括:
    将所述对抗扰动因子加入所述第一字向量集合,得到第二字向量集合;
    基于所述第二字向量集合对所述一个样本训练语句进行实体标注,得到对应的第二预测标注信息;
    基于所述第二预测标注信息与所述一个样本训练语句对应的真实标注信息之间的标注差异计算得到一个第二损失信息。
  8. 如权利要求1-5任一项所述的方法,其特征在于,所述确定满足预设的收敛条件,包括:
    确定连续N次迭代过程中,每一次迭代过程中对于样本训练语句的预测准确率与前一次迭代过程中对样本训练语句的预测准确率之间的差值,满足预设的准确率差值范围时,确定达到预设的收敛条件;或者,
    确定连续M次迭代过程中,每一次迭代过程中所述序列标注模型的目标损失与前一次迭代过程中所述序列标注模型的目标损失之间的差值,满足预设的损失差值范围时,确定满足预设的收敛条件;或者,
    确定当前迭代的次数达到预设的最大迭代次数时,确定达到预设的收敛条件。
  9. 如权利要求1-5任一项所述的方法,其特征在于,所述输出训练后的所述序列标注 模型后,进一步包括:
    获取待处理语句,调用所述序列标注模型对所述待处理语句进行序列标注处理,得到输出的预测标注信息。
  10. 一种序列标注模型的训练装置,其特征在于,包括:
    获取单元,获取待训练的序列标注模型和样本训练语句集合;
    训练单元,基于所述样本训练语句集合对所述序列标注模型进行训练,得到第一损失信息;
    确定单元,根据所述序列标注模型的模型参数,确定对抗扰动因子,并基于加入所述对抗扰动因子的样本训练语句集合对所述序列标注模型进行训练,得到第二损失信息;
    调整单元,基于所述第一损失信息和所述第二损失信息计算得到目标损失信息,并基于所述目标损失信息对所述序列标注模型的模型参数进行调整并进行迭代训练,以及确定满足预设的收敛条件时,输出训练后的所述序列标注模型。
  11. 如权利要求10所述的装置,其特征在于,所述根据所述序列标注模型的模型参数,确定对抗扰动因子时,所述确定单元用于:
    获取所述序列标注模型当前的模型参数,并基于所述模型参数计算所述序列标注模型的梯度,并基于所述梯度以及预设的超参数计算对抗扰动因子,其中,所述超参数用于调整生成的对抗扰动的强弱。
  12. 如权利要求11所述的装置,其特征在于,所述基于所述梯度以及预设的超参数计算对抗扰动因子时,所述确定单元用于:
    获取所述超参数,并将获得的所述超参数与所述梯度的乘积,与所述梯度的范数做商得到对抗扰动因子。
  13. 如权利要求10所述的装置,其特征在于,所述基于所述目标损失信息对所述序列标注模型的模型参数进行调整并进行迭代训练之前,所述调整单元进一步用于:
    获取为所述序列标注模型中的预训练模型设置的原始学习率,所述预训练模型用于基于输入的样本训练语句生成相应的字向量集合;
    根据对应所述预训练模型中各个层级预设的层系数,结合所述原始学习率,分别确定各个层级对应的学习率,其中,所述学习率用于表征对各个层级对应的模型参数的调整幅度;
    所述基于所述目标损失信息对所述序列标注模型的模型参数进行调整并进行迭代训练时,所述调整单元用于:
    基于确定的所述序列标注模型中各个层级的学习率以及所述目标损失信息,采用误差反向传播的方式调整所述序列标注模型中各个层级的模型参数。
  14. 如权利要求10所述的装置,其特征在于,所述获取待训练的序列标注模型和样 本训练语句集合之前,所述获取单元进一步用于:
    获取多个样本训练语句,并确定各个样本训练语句的语句长度,基于所述各个样本训练语句的语句长度,对所述多个样本训练语句中的各个样本训练语句执行以下任意一项操作:
    若所述语句长度未达到预设的固定语句长度,则采用预设的字符对所述一个样本训练语句进行填补,生成一个样本训练语句;或者,
    若所述语句长度超过预设的固定语句长度时,则将所述一个样本训练语句中超过所述固定语句长度的部分进行截断,生成一个样本训练语句;或者,
    若所述语句长度达到预设的固定语句长度时,则直接将所述一个样本训练语句作为一个样本训练语句。
  15. The device according to any one of claims 10-14, wherein when training the sequence labeling model based on the sample training sentence set to obtain the first loss information, the training unit is configured to:
    input each sample training sentence in the sample training sentence set into the sequence labeling model, and perform the following operations for each sample training sentence input into the sequence labeling model:
    determine the word vector corresponding to each character in one sample training sentence, and generate a corresponding first word vector set;
    perform entity labeling on the one sample training sentence based on the first word vector set to obtain corresponding first predicted labeling information;
    calculate one piece of first loss information based on the labeling difference between the first predicted labeling information and real labeling information corresponding to the one sample training sentence.
  16. The device according to claim 15, wherein when training the sequence labeling model based on the sample training sentence set to which the anti-disturbance factor has been added, to obtain the second loss information, the determining unit is configured to:
    add the anti-disturbance factor to the first word vector set to obtain a second word vector set;
    perform entity labeling on the one sample training sentence based on the second word vector set to obtain corresponding second predicted labeling information;
    calculate one piece of second loss information based on the labeling difference between the second predicted labeling information and the real labeling information corresponding to the one sample training sentence.
  17. The device according to any one of claims 10-14, wherein when determining that the preset convergence condition is met, the adjusting unit is configured to:
    determine that the preset convergence condition is reached when, in N consecutive iterations, the difference between the prediction accuracy for the sample training sentences in each iteration and that in the previous iteration falls within a preset accuracy difference range; or,
    determine that the preset convergence condition is met when, in M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and that in the previous iteration falls within a preset loss difference range; or,
    determine that the preset convergence condition is reached when the current number of iterations reaches a preset maximum number of iterations.
  18. The device according to any one of claims 10-14, wherein after the trained sequence labeling model is output, the adjusting unit is further configured to:
    acquire a sentence to be processed, and invoke the sequence labeling model to perform sequence labeling processing on the sentence to be processed to obtain output predicted labeling information.
  19. A training device for a sequence labeling model, characterized in that it comprises:
    a memory, configured to store executable instructions;
    a processor, configured to read and execute the executable instructions stored in the memory to implement the training method for a sequence labeling model according to any one of claims 1 to 9.
  20. A storage medium, characterized in that, when instructions in the storage medium are executed by a processor, the processor is enabled to perform the training method for a sequence labeling model according to any one of claims 1 to 9.
PCT/CN2021/094180 2020-06-24 2021-05-17 Training method and device for sequence labeling model WO2021258914A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010591966.2 2020-06-24
CN202010591966.2A CN111737952A (zh) Training method and device for sequence labeling model

Publications (1)

Publication Number Publication Date
WO2021258914A1 (zh)

Family

ID=72651128

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/094180 WO2021258914A1 (zh) 2020-06-24 2021-05-17 Training method and device for sequence labeling model

Country Status (2)

Country Link
CN (1) CN111737952A (zh)
WO (1) WO2021258914A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737952A (zh) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 Training method and device for sequence labeling model
CN112434213B (zh) * 2020-10-15 2023-09-29 中国科学院深圳先进技术研究院 Network model training method, information pushing method, and related devices
CN114648028A (zh) * 2020-12-21 2022-06-21 阿里巴巴集团控股有限公司 Labeling model training method and device, electronic device, and storage medium
CN113032560B (zh) * 2021-03-16 2023-10-27 北京达佳互联信息技术有限公司 Sentence classification model training method, sentence processing method, and device
CN115879524A (zh) * 2021-09-27 2023-03-31 华为技术有限公司 Model training method and related device
CN114020887B (zh) * 2021-10-29 2023-11-07 北京有竹居网络技术有限公司 Method, device, apparatus, and medium for determining a response sentence

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902705A (zh) * 2018-10-30 2019-06-18 华为技术有限公司 Adversarial perturbation generation method and device for an object detection model
CN110459282A (zh) * 2019-07-11 2019-11-15 新华三大数据技术有限公司 Sequence labeling model training method, electronic medical record processing method, and related devices
CN110781934A (zh) * 2019-10-15 2020-02-11 深圳市商汤科技有限公司 Supervised learning and label prediction methods and devices, electronic device, and storage medium
US20200134468A1 (en) * 2018-10-26 2020-04-30 Royal Bank Of Canada System and method for max-margin adversarial training
CN111178527A (zh) * 2019-12-31 2020-05-19 北京航空航天大学 Progressive adversarial training method and device
CN111737952A (zh) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 Training method and device for sequence labeling model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582793B (zh) * 2018-11-23 2023-05-23 深圳前海微众银行股份有限公司 Model training method, customer service system, data labeling system, and readable storage medium
CN110532377B (zh) * 2019-05-13 2021-09-14 南京大学 Semi-supervised text classification method based on adversarial training and an adversarial learning network
CN111091004B (zh) * 2019-12-18 2023-08-25 上海风秩科技有限公司 Training method and training device for a sentence entity labeling model, and electronic device
CN111191453A (zh) * 2019-12-25 2020-05-22 中国电子科技集团公司第十五研究所 Named entity recognition method based on adversarial training

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114896307A (zh) * 2022-06-30 2022-08-12 北京航空航天大学杭州创新研究院 Time-series data augmentation method and device, and electronic device
CN114896307B (zh) * 2022-06-30 2022-09-27 北京航空航天大学杭州创新研究院 Time-series data augmentation method and device, and electronic device
CN115881212A (zh) * 2022-10-26 2023-03-31 溪砾科技(深圳)有限公司 RNA-target-based small-molecule compound screening method and device
CN115938353A (zh) * 2022-11-24 2023-04-07 北京数美时代科技有限公司 Distributed sampling method and system for speech samples, storage medium, and electronic device
CN116503923A (zh) * 2023-02-16 2023-07-28 深圳市博安智控科技有限公司 Method and device for training a face recognition model
CN116503923B (zh) * 2023-02-16 2023-12-08 深圳市博安智控科技有限公司 Method and device for training a face recognition model
CN116388884A (zh) * 2023-06-05 2023-07-04 浙江大学 Anti-eavesdropping ultrasonic jamming sample design method, system, and device
CN116388884B (zh) * 2023-06-05 2023-10-20 浙江大学 Anti-eavesdropping ultrasonic jamming sample design method, system, and device

Also Published As

Publication number Publication date
CN111737952A (zh) 2020-10-02

Similar Documents

Publication Publication Date Title
WO2021258914A1 (zh) Training method and device for sequence labeling model
CN111125331B (zh) Semantic recognition method and device, electronic device, and computer-readable storage medium
US11120801B2 Generating dialogue responses utilizing an independent context-dependent additive recurrent neural network
CN108334891B (zh) Task-oriented intent classification method and device
US11783008B2 Machine-learning tool for generating segmentation and topic metadata for documents
EP3144860A2 Subject estimation system for estimating subject of dialog
JP2017076403A (ja) Human-inspired simple question answering (HiSQA) system and method
KR20180091850A (ko) Augmenting neural networks with external memory
CN112541060B (zh) End-to-end task-oriented dialogue learning framework and method based on adversarial training
CN114385178A (zh) Code generation method enhanced by abstract syntax tree structure information
CN111354333A (zh) Self-attention-based Chinese prosodic hierarchy prediction method and system
CN110807335A (zh) Machine-learning-based translation method, apparatus, device, and storage medium
WO2020108545A1 Sentence processing method, sentence decoding method, apparatus, storage medium, and device
CN113743099A (zh) Aspect term extraction system, method, medium, and terminal based on a self-attention mechanism
CN113128206A (zh) Question generation method based on word importance weighting
CN112270181A (zh) Sequence labeling method and system, computer-readable storage medium, and computer device
CN114118088A (zh) Document-level entity relation extraction method and device based on a hypergraph convolutional neural network
WO2023093909A1 Workflow node recommendation method and device
CN112183062A (zh) Spoken language understanding method based on alternating decoding, electronic device, and storage medium
CN109117471A (zh) Word relevance calculation method and terminal
CN109815253A (zh) Subject entity recognition method and device for query sentences
CN114626376A (zh) Text classification model training method and device, and text classification method
CN115809663A (zh) Exercise analysis method, device, equipment, and storage medium
CN113836903A (zh) Enterprise profile label extraction method and device based on context embedding and knowledge distillation
Wu et al. Analyzing the Application of Multimedia Technology Assisted English Grammar Teaching in Colleges

Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21828710; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.03.2023))
122 Ep: pct application non-entry in european phase (Ref document number: 21828710; Country of ref document: EP; Kind code of ref document: A1)