WO2022227217A1 - Training method, apparatus, device and readable storage medium for a text classification model - Google Patents

Training method, apparatus, device and readable storage medium for a text classification model

Info

Publication number
WO2022227217A1
WO2022227217A1 PCT/CN2021/097412 CN2021097412W WO2022227217A1 WO 2022227217 A1 WO2022227217 A1 WO 2022227217A1 CN 2021097412 W CN2021097412 W CN 2021097412W WO 2022227217 A1 WO2022227217 A1 WO 2022227217A1
Authority
WO
WIPO (PCT)
Prior art keywords: training, trained, classification model, classification, vector
Application number
PCT/CN2021/097412
Other languages
English (en)
French (fr)
Inventor
程华东
舒畅
陈又新
李剑锋
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2022227217A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Definitions

  • the present application relates to the technical field of predictive analysis of artificial intelligence, and in particular, to a training method for a BERT-based text classification model, a training device for a text classification model, computer equipment, and a computer-readable storage medium.
  • the text-level classification model refers to first determining the first-level category to which a text belongs and then determining the second-level category under that first-level category. For example, "television" belongs to the first-level category "Household Appliances" and then to the second-level category "Large Appliances" under "Household Appliances".
  • most conventional text-level classification models establish k+1 text classification models, including a first-level text classification model and k second-level text classification models, that is, a classification model is established for each first-level category.
  • the specific implementation process is as follows: first, the first-level classification model is used to determine the first-level category of the text, then the corresponding second-level classification model is selected according to its category, and the second-level classification model is used to classify the text again to determine its second-level category.
  • the inventor found that although the above approach can achieve text-level classification, the model structure is complex, the scalability is poor, and it is inefficient in use.
  • the present application provides a training method, device, computer equipment, and storage medium for a BERT-based text classification model, so as to improve the training efficiency of the model while improving its scalability and stability.
  • the present application provides a training method for a BERT-based text classification model, the method comprising:
  • the present application also provides a training device for a text classification model, the device comprising:
  • a model loading module for loading the classification model to be trained, and identifying the number of classification levels N included in the classification model to be trained, wherein the classification model to be trained is generated based on the BERT language model;
  • a sample processing module configured to receive input training samples, to process the training samples to obtain a representation vector corresponding to each training sample
  • a model training module configured to receive an input initial vector and perform N iterations of training on the classification model to be trained according to the representation vectors and the initial vector, so as to train the N classification levels of the classification model to be trained, wherein each iteration of training completes the training of one classification level, and the corresponding classification level converges when that iteration of training is completed;
  • the model storage module is configured to determine that the training of the classification model to be trained is completed when it is determined that the Nth classification level is converged after the Nth iteration training, and store the trained classification model.
  • the present application also provides a computer device, the computer device including a memory and a processor; the memory is used to store a computer program; and the processor is used to execute the computer program and, when executing it, to implement the above-mentioned training method for the BERT-based text classification model.
  • the present application also provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it causes the processor to implement the above-mentioned training method for the BERT-based text classification model.
  • the present application discloses a training method, device, computer equipment and storage medium for a BERT-based text classification model.
  • the training samples are processed based on the BERT language model, and the classification model is constructed according to actual needs.
  • the constructed classification model is iteratively trained several times according to the processed training samples, so that the classification model after the training can realize hierarchical classification.
  • during training, there is no interference between the sub-models; that is, if a later sub-model does not converge, the earlier modules do not need to be retrained, which improves the efficiency of model training.
  • by changing the model structure, the classification model can support more levels of classification, making it more scalable to meet more classification requirements.
  • FIG. 1 is a schematic flowchart of a training method of a BERT-based text classification model provided by an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a classification model provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a classification model provided by another embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a classification model provided by yet another embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a step of training a classification model to be trained according to an embodiment of the present application
  • FIG. 6 is a schematic flowchart of the steps of the first training provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a classification prediction step provided by an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of an apparatus for training a text classification model according to an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a training method of a BERT-based text classification model provided by an embodiment of the present application.
  • the training method includes steps S101 to S104.
  • Step S101 Load the classification model to be trained, and identify the number N of classification levels included in the classification model to be trained, wherein the classification model to be trained is generated based on the BERT language model.
  • when training a model, the model to be trained needs to be built and loaded in advance; therefore, the structure of the classification model is built first, the built classification model, i.e., the classification model to be trained, is then loaded, and the loaded classification model to be trained is trained.
  • after loading, the number N of classification levels included in the loaded classification model to be trained is identified, where the number of classification levels indicates how many levels of category information the model will predict when it is used for classification. For example, when the number of classification levels N is two, classifying a text to be classified outputs the first-level label and the second-level label of that text; when N is three, the first-level, second-level, and third-level labels of the text are output. Labels at different levels are correlated: the second-level label is a subordinate label of the first-level label, and the third-level label is a subordinate label of the second-level label.
  • a corresponding classification model needs to be constructed before the classification model to be trained is loaded. The structure of the classification model at this point is shown in FIG. 2 and is obtained based on the pre-trained BERT (Bidirectional Encoder Representation from Transformers) language model, while the specific structure of the model is determined according to the actual needs of the user. For example, if second-level labels need to be output, N equals 2, as shown in FIG. 3; if third-level labels need to be output, N equals 3, as shown in FIG. 4. Here, the embodiments of the present application are explained with the number of classification levels N being 2, i.e., the model structure of the constructed classification model is as shown in FIG. 3.
  • in practical use, the model outputs the label information of a text; exemplarily, it can output the first-level label and the second-level label corresponding to the text. Therefore, when the model is constructed, different levels of the classification model need to predict and output different labels of the text, so every level in the classification model needs to be trained when the model is built and trained.
  • based on the structural characteristics of the constructed classification model, prediction of third-level and even higher-level labels can be achieved simply by adding model structure. Specifically, on the basis of the model structure shown in FIG. 2, the attention layer of the previous level outputs the text representation vector corresponding to that level, which is input into a copy of the structure shown in box A at the next level. By optimizing the model structure in this way, the model gains better scalability: when the model needs to predict more levels of labels, it only needs to be expanded and the newly added structure trained, without improving and retraining the entire model, which improves the scalability of the model and the ease of training after expansion. A sketch of such a stacked multi-level architecture is given below.
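  • The following is a minimal PyTorch-style sketch of such a stacked architecture, assuming a HuggingFace BertModel encoder; the class and parameter names (LevelHead, HierarchicalClassifier, num_labels_per_level) are illustrative assumptions and do not come from the application.

```python
import torch
import torch.nn as nn
from transformers import BertModel


class LevelHead(nn.Module):
    """One classification level: an attention layer followed by a fully connected layer."""

    def __init__(self, hidden_dim, num_labels):
        super().__init__()
        self.w_k = nn.Linear(hidden_dim, hidden_dim, bias=False)  # W_K of this level
        self.w_v = nn.Linear(hidden_dim, hidden_dim, bias=False)  # W_V of this level
        self.fc = nn.Linear(hidden_dim, num_labels)               # fully connected layer

    def forward(self, query, token_reprs):
        # query: (batch, hidden); token_reprs: (batch, seq_len, hidden); padding is ignored for brevity
        keys = self.w_k(token_reprs)
        values = self.w_v(token_reprs)
        scores = torch.einsum("bd,bsd->bs", query, keys)       # q . K
        weights = torch.softmax(scores, dim=-1)                 # attention weights
        pooled = torch.einsum("bs,bsd->bd", weights, values)    # text representation vector of this level
        return pooled, self.fc(pooled)                          # pooled vector and label logits ("space vector")


class HierarchicalClassifier(nn.Module):
    """BERT encoder with N stacked level heads; each level's pooled vector is the next level's query."""

    def __init__(self, num_labels_per_level, bert_name="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.heads = nn.ModuleList(LevelHead(hidden, n) for n in num_labels_per_level)

    def forward(self, input_ids, attention_mask, initial_query):
        token_reprs = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        query, all_logits = initial_query, []
        for head in self.heads:                       # one head per classification level
            query, logits = head(query, token_reprs)  # the output query feeds the next level
            all_logits.append(logits)
        return all_logits
```

  • Under this reading, adding one more classification level only means appending one more LevelHead, which matches the expansion behaviour described above.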
  • Step S102 Receive input training samples, so as to process the training samples to obtain a representation vector corresponding to each training sample.
  • after the classification model to be trained has been loaded, it will be trained, so the input training samples are received and then processed to obtain the representation vector corresponding to each training sample, where a representation vector is a vector used to describe the features of a training sample.
  • the training samples consist of several pieces of text, i.e., each training sample is one piece of text. Feature extraction on the text is therefore performed by the BERT encoding layer provided in the model structure, yielding the representation vector corresponding to each training sample, where a training sample can be written as (x_1, x_2, ..., x_n) and the extracted representation vector is (z_1, z_2, ..., z_n), with each z_i being a d-dimensional vector.
  • exemplarily, when the input training samples are received, they are input into the BERT encoding layer of the loaded classification model to be trained, so that feature extraction and encoding are performed on the training samples by the pre-trained BERT encoding layer, yielding the representation vector corresponding to each training sample, as sketched below.
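  • As a hedged illustration of this encoding step, the sketch below uses the HuggingFace transformers tokenizer and BertModel to turn each training text into its per-token representation vectors (z_1, ..., z_n); the checkpoint name and the example texts are assumptions, since the application only states that a pre-trained BERT encoding layer is used.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed pre-trained checkpoint
encoder = BertModel.from_pretrained("bert-base-chinese")

samples = ["电视机屏幕出现花屏", "冰箱冷藏室不制冷"]               # hypothetical training texts
batch = tokenizer(samples, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**batch)

# (batch, seq_len, d): one d-dimensional representation vector z_i per token position
representation_vectors = outputs.last_hidden_state
print(representation_vectors.shape)   # e.g. torch.Size([2, 11, 768])
```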
  • Step S103: Receive an input initial vector, and perform N iterations of training on the classification model to be trained according to the representation vectors and the initial vector, so as to train each classification level of the classification model to be trained, wherein each iteration of training completes the training of one classification level, and the corresponding classification level converges when that iteration of training is completed.
  • after the received training samples have been processed into per-sample representation vectors, the classification model to be trained is trained according to the received initial vector and the representation vector of each training sample, so that the classification model can be used once training is completed.
  • in one embodiment, before model training, the input initial vector is also received, where the initial vector is a random vector, denoted q_1, and q_1 is a d-dimensional vector; the initial vector and the representation vectors corresponding to the samples are then used as the training data for one or more structures in the model. Since the constructed model structure contains N classification levels, all N classification levels need to be trained so that every level can predict labels accurately.
  • the N iterations of training train each classification level in the model in turn: training of the second classification level starts only when training of the first classification level is completed, training of the third classification level starts only when the second has been trained, and so on, until the Nth classification level is trained, which completes the training of the entire classification model to be trained.
  • FIG. 5 is a schematic flowchart of a step of training a classification model to be trained according to an embodiment of the present application.
  • specifically, training the classification model to be trained includes steps S501 to S503.
  • Step S501: Receive the input initial vector, perform the first training of the classification model to be trained according to the representation vectors and the initial vector, and determine whether the classification model to be trained has converged after the first training, so as to complete the training of the first classification level of the classification model to be trained when convergence is determined;
  • Step S502: When it is determined that the classification model to be trained has converged after the first training, input the initial vector and the representation vectors into the trained first classification level to obtain the first-level text representation vector corresponding to each training sample;
  • Step S503: Perform the second training on the classification model obtained from the first training according to the first-level text representation vectors and the training samples, so as to train the second classification level of the classification model to be trained, and so on, to perform N iterations of training on the classification model to be trained.
  • when training a classification model to be trained that has N classification levels, the input initial vector is received in addition to the processed training samples; the representation vectors obtained from the training samples and the received initial vector are then used to train the classification model once, and it is determined whether the classification model has converged after this first training. Determining whether the model has converged after the first training means determining whether the first classification level trained in the first iteration has converged; when convergence is determined, a classification model whose first classification level has been trained is obtained.
  • specifically, the first training of the classification model to be trained trains its first classification level: the training samples and the received initial vector are used as the input for training the first classification level, and it is then determined whether the trained first classification level has converged.
  • FIG. 6 provides a schematic flowchart of the steps of the first training for an embodiment of the present application, which specifically includes:
  • Step S601 inputting the initial vector and the characterization vector to the first attention layer, to obtain an intermediate characterization vector corresponding to each characterization vector as output;
  • Step S602 inputting the intermediate representation vector into the first fully connected layer connected with the first attention layer, and outputting the corresponding spatial vector;
  • Step S603 Input the space vector into a preset label probability distribution formula to obtain a label probability distribution curve, and read the label corresponding to the maximum probability value in the label probability distribution curve as a first-level predicted label.
  • in the first training, the first sub-model in the model is trained, where the first sub-model has the structure shown in box A in FIG. 2. The first training is implemented using the input initial vector and the training samples, and each training sample passes through the BERT encoding layer to obtain its corresponding representation vector. During the first training, the initial vector and the representation vectors are input into the first attention layer to output the intermediate representation vector corresponding to each representation vector; the intermediate representation vectors are then input into the first fully connected layer to output the space vector corresponding to each training sample; finally, the label probability distribution of the first-level label corresponding to each training sample is obtained from the space vectors, and the label with the highest probability in the distribution is the first-level label output at this point. Because the model is still in training, the output first-level label is not necessarily correct; when the training samples are obtained and input, each training sample carries its own labels, including a first-level label, a second-level label, and even a third-level label, annotated according to actual needs. Exemplarily, the label probability distribution corresponding to a space vector is obtained through a Softmax computation, which yields all possible labels and their probability values, as sketched below.
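  • As a sketch of how a space vector output by the first fully connected layer becomes a label probability distribution and a first-level predicted label, assuming the "preset label probability distribution formula" is the standard softmax (the application itself only names a Softmax computation); the label count and values are made up, and the cross-entropy at the end is just one common training loss, not necessarily the one used by the application:

```python
import torch
import torch.nn.functional as F

# Hypothetical space vectors for three training samples over five first-level labels,
# as produced by the first fully connected layer.
space_vectors = torch.tensor([[2.1, 0.3, -1.0, 0.5, 0.0],
                              [0.1, 1.7,  0.2, -0.4, 0.9],
                              [-0.5, 0.0, 0.3,  2.4, 1.1]])

label_probs = F.softmax(space_vectors, dim=-1)   # label probability distribution per sample
predicted = label_probs.argmax(dim=-1)           # label with the maximum probability = first-level predicted label

# During training the prediction is compared with each sample's annotated first-level label.
gold_first_level = torch.tensor([0, 1, 3])
loss = F.cross_entropy(space_vectors, gold_first_level)
print(predicted.tolist(), loss.item())
```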
  • in one embodiment, when it is determined that the classification model has converged after the first training, a second training is performed. During the second training, the first-level text representation vectors are obtained using the model after the first training, and the obtained first-level text representation vectors together with the previously obtained representation vectors are used as the input of the second training to train the classification model to be trained a second time. Likewise, convergence of the model is also judged during the second training. The entire training process is a continuous, iterative one in which the input data of each round differs to some extent, so that every level in the classification model is trained and a trained classification model is finally obtained.
  • during training of the classification model, the convergence of the model after each round of training must be judged, and the next round of training starts only when the model has converged in the current round. In practice, model training proceeds one classification level at a time: when the first classification level is being trained, the sub-models of the subsequent classification levels are not trained. When training the model shown in FIG. 2, the first training trains the first classification level, the second training trains the second classification level, and so on; the Nth training trains the Nth classification level in the model. Whether a round of training has converged is determined from the loss value of the loss function used during model training.
  • in one embodiment, determining whether the classification model to be trained has converged after the first training includes: obtaining the loss value of the classification model to be trained after the first training, so as to determine from the loss value whether the model has converged; if the loss value is less than or equal to a preset loss threshold, convergence is determined; if the loss value is greater than the preset loss threshold, the step of receiving the input initial vector, training the classification model to be trained once according to the representation vectors and the initial vector, and determining whether it has converged is performed again.
  • convergence is determined through the design of the neural-network loss, and the main idea is minimum risk training.
  • the basic idea of minimum risk training is to use a loss function Δ(y, y^(n)) to describe the degree of difference between the model prediction y and the reference y^(n), and to try to find a set of parameters that minimizes the expected value of the loss (i.e., the risk) on the training set; that is, whether the model has converged is determined from the expected value of the model's loss.
  • exemplarily, if the model input is x^(n), the reference is y^(n), and the prediction output of the model is y, the corresponding expected loss (risk) is R(θ) = Σ_n Σ_{y ∈ Y(x^(n))} P(y | x^(n); θ) · Δ(y, y^(n)), where the outer sum runs over the training samples and Y(x^(n)) denotes the set of all possible outputs corresponding to x^(n), also called the search space. A toy computation of this expected loss is sketched below.
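  • As a toy computation of this expected loss, assuming the standard minimum-risk-training form given above; the candidate probabilities and loss values are made up:

```python
import torch

# One training sample with a search space Y(x^(n)) of three candidate outputs.
# P holds the model probabilities P(y | x^(n); theta) over the candidates, and
# delta holds the loss of each candidate against the reference y^(n).
P = torch.tensor([0.7, 0.2, 0.1])
delta = torch.tensor([0.0, 1.0, 1.0])   # the first candidate equals the reference

risk = torch.sum(P * delta)             # expected loss (risk) contributed by this sample
print(risk.item())                      # 0.3; training seeks parameters that make this small
```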
  • when determining convergence, a loss threshold can be set: when the loss value obtained in the current round of training is less than or equal to the set threshold, convergence is determined; otherwise, non-convergence is determined.
  • when the trained model is determined not to have converged, it continues to be trained. For example, during the first training, i.e., when the first level is being trained, if the loss value obtained at some stage of training is greater than the set loss threshold, the first level continues to be trained; if the loss value obtained at some stage is not greater than the set loss threshold, training of this level is complete at this point. A minimal sketch of this level-by-level, threshold-based training loop is given below.
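  • A minimal sketch of this level-by-level, threshold-based training loop, assuming the HierarchicalClassifier sketched earlier and a user-supplied compute_loss for the level being trained; the threshold, optimizer settings, and the choice to keep the BERT encoder frozen are illustrative assumptions:

```python
import torch


def train_level(model, level, data_loader, compute_loss, loss_threshold=0.05, max_epochs=50):
    """Train only the sub-model of `level`; earlier levels (and the encoder) stay frozen."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.heads[level].parameters():
        p.requires_grad = True

    optimizer = torch.optim.Adam(model.heads[level].parameters(), lr=1e-4)
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for batch in data_loader:
            optimizer.zero_grad()
            loss = compute_loss(model, batch, level)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        epoch_loss /= max(len(data_loader), 1)
        if epoch_loss <= loss_threshold:   # loss not greater than the preset threshold: this level converged
            return True
    return False                           # not converged yet: keep training this level only


# Levels are trained strictly in order: level k + 1 starts only after level k has converged.
# for level in range(num_levels):
#     assert train_level(model, level, loader, compute_loss)
```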
  • exemplarily, after the classification model to be trained has been trained for the first time and it is determined that it has converged, the trained first level is used to obtain the first-level text representation vector corresponding to each sample of the training text, where the first-level text representation vector plays a role similar to that of the initially input initial vector.
  • in fact, based on the structure of the constructed model, the first-level text representation vectors are obtained by inputting the received initial vector and the training samples processed by the BERT layer into the attention layer of the first level, which yields the first-level text representation vector corresponding to each training sample; the first-level text representation vectors used to train the next level are those obtained when the first training has converged. Specifically, when the training of the first level is completed, the model parameters in the first level have been adjusted accordingly to meet the actual needs; the initial vector and the training samples that have passed through the BERT layer are then input into the attention layer of the first level to obtain the first-level text representation vector corresponding to each sample.
  • specifically, the first-level text representation vector is computed by the first attention layer from the initial vector and the representation vectors, where e is the base of the natural logarithm used in the softmax weighting, q_1 is a d-dimensional initial vector, z_i is a d-dimensional representation vector, and one training sample corresponds to one representation vector. W_V1 and W_K1 are the parameter matrices that the model needs to learn, i.e., the parameter matrices that need to be adjusted; during the first training, these parameters are continuously optimized so that the first-level label corresponding to the text can be predicted more accurately. Through the continuous adjustment of W_V1 and W_K1 during training, the first level of the model converges after training, at which point the corresponding W_V1 and W_K1 are obtained, so the first-level text representation vector corresponding to each training sample can be obtained by computation, as sketched below.
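  • The formula itself appears only as an image in the published application; the sketch below shows the standard single-query attention computation that is consistent with the quantities defined above (q_1, z_i, W_V1, W_K1, and a softmax whose exponentials use base e), and should be read as an assumed form rather than a verbatim reproduction. The same pattern with W_V2 and W_K2 gives the second level described next.

```python
import torch

d, n = 768, 4                                  # dimensionality of q_1 and z_i; number of representation vectors
q1 = torch.randn(d)                            # random d-dimensional initial vector
z = torch.randn(n, d)                          # representation vectors z_1..z_n from the BERT encoding layer
W_K1 = torch.randn(d, d, requires_grad=True)   # learnable parameter matrices of the first level
W_V1 = torch.randn(d, d, requires_grad=True)

keys = z @ W_K1.T                              # W_K1 · z_i
values = z @ W_V1.T                            # W_V1 · z_i
weights = torch.softmax(keys @ q1, dim=0)      # softmax_i( q_1 · (W_K1 · z_i) )
level1_text_repr = weights @ values            # first-level text representation vector (d-dimensional)
```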
  • after the training of the first level of the model is completed and the first-level text representation vectors output by the first level have been obtained, the next level is trained. Specifically, the first-level text representation vectors and the representation vectors obtained from the training samples are input into the second level of the model to train the second level. The second level is trained in the same way as the first level: its model parameters are adjusted through continuous training, where the parameters of the second level can be denoted W_V2 and W_K2, and when the second level converges, the adjustment of W_V2 and W_K2 is complete.
  • Step S104: When it is determined that the Nth classification level has converged after the Nth iteration of training, determine that the training of the classification model to be trained is completed, and store the trained classification model.
  • since the number of classification levels of the constructed and loaded classification model to be trained is N, it needs to be trained N times during the training process. When it is determined after the N rounds of training that the trained classification model has converged, the trained classification model is stored for subsequent use. Specifically, when it is determined that the Nth classification level has converged after the Nth iteration of training, it is determined that the classification model to be trained has finished training, and the trained classification model is recorded and stored.
  • it should be noted that when the trained classification model has not converged, learning and training are performed again, with the training data adjusted for each round. Unlike ordinary model training, however, each round of training is performed only after the previous round has converged, so when non-convergence is determined, only the level currently being trained continues to be trained.
  • exemplarily, when N is 3, if it is determined that the classification model has not converged after the third training, training continues only for the object of the third training; specifically, the third sub-model formed by the third attention layer and the third fully connected layer is trained, i.e., the sub-model part enclosed by box C in FIG. 4, while the sub-model parts enclosed by boxes A and B are not trained again.
  • constructing and training the model in this way improves the efficiency of model training, gives the model higher scalability when it is expanded, and allows training of the expanded model to be completed more quickly.
  • FIG. 7 is a schematic flowchart of a classification prediction step according to an embodiment of the present application.
  • this step includes steps S701 to S703.
  • Step S701 when receiving the classification instruction, load the stored trained classification model
  • Step S702 receiving the input text information to be processed, and obtaining the stored query vector
  • Step S703 Input the query vector and the to-be-processed text information into the trained classification model to output a classification result corresponding to the to-be-processed text information, wherein the classification result includes the to-be-processed text information N labels corresponding to text information.
  • after model training is completed, the trained classification model can be directly loaded and used. Therefore, when a classification instruction is received, the pre-trained and stored classification model is loaded first, and the loaded trained classification model is then used to perform classification prediction on text.
  • when the trained classification model has been loaded, the input text to be processed is received and a pre-stored query vector is obtained, where the query vector is randomly set; the received text to be processed and the obtained query vector are then input into the loaded classification model to output the classification result corresponding to the text to be processed.
  • specifically, when the classification model processes the text to be processed, it first receives the input text and performs feature extraction on it to obtain the corresponding representation vectors; it then receives the input query vector and inputs it, together with the representation vectors, into the first attention layer for first-level label prediction, to compute the first-level text classification representation vector corresponding to the text. The first-level text classification representation vector is then input into the first fully connected layer to obtain the first space vector, and finally a softmax computation is performed on the first space vector to obtain the probability distribution over first-level labels; the label corresponding to the maximum probability value is selected as the first-level label (specifically, the index of the maximum value is taken as the first-level label) and the obtained first-level label is output.
  • while the first-level label is being predicted and output, the second-level label and even the third-level label are also predicted and output; the number of label levels output depends on the number of classification levels of the model used. Taking a model with two classification levels as an example, while the first-level label is output, the first-level text classification representation vector is also input into the second attention layer for second-level label prediction, to compute the second-level text classification representation vector corresponding to the text to be processed; the second-level text classification representation vector is then input into the second fully connected layer to obtain the second space vector, a softmax computation is performed on the second space vector, the index of the maximum value is taken as the second-level label, and the obtained second-level label is output. A sketch of this two-level inference flow is given below.
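  • A sketch of this two-level inference flow, reusing the HierarchicalClassifier sketched earlier; the stored file names, tokenizer checkpoint, and label vocabularies are illustrative assumptions:

```python
import torch
from transformers import BertTokenizer

level1_labels = ["家用电器", "数码产品", "服饰"]                       # hypothetical first-level labels
level2_labels = ["大家电", "小家电", "手机", "电脑", "男装", "女装"]    # hypothetical second-level labels

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = torch.load("trained_hierarchical_classifier.pt")         # the stored trained classification model
model.eval()
query = torch.load("stored_query_vector.pt")                     # randomly set, stored query vector, shape (1, hidden)

text = "电视机屏幕出现花屏"
batch = tokenizer([text], return_tensors="pt")
with torch.no_grad():
    logits_per_level = model(batch["input_ids"], batch["attention_mask"], query)

first = level1_labels[logits_per_level[0].softmax(-1).argmax(-1).item()]
second = level2_labels[logits_per_level[1].softmax(-1).argmax(-1).item()]
print(first, second)   # e.g. 家用电器 大家电
```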
  • in the training method for the BERT-based text classification model described above, when the classification model is trained, the training samples are processed based on the BERT language model, the structure of the classification model is constructed according to actual needs, and the constructed classification model is then iteratively trained several times with the processed training samples, so that the trained classification model can perform hierarchical classification.
  • during training, there is no interference between the sub-models; that is, if a later sub-model does not converge, the earlier modules do not need to be retrained, which improves the efficiency of model training. Moreover, by changing the model structure, the classification model can support more levels of classification, making it more scalable to meet more classification requirements.
  • FIG. 8 is a schematic block diagram of an apparatus for training a text classification model according to an embodiment of the present application, and the apparatus is configured to execute the aforementioned training method for a text classification model based on BERT.
  • the training device 800 of the text classification model includes:
  • a model loading module 801 configured to load a classification model to be trained, and identify the number N of classification levels included in the classification model to be trained, wherein the classification model to be trained is generated based on the BERT language model;
  • a sample processing module 802 configured to receive input training samples, to process the training samples to obtain a representation vector corresponding to each training sample;
  • a model training module 803 configured to receive an input initial vector and perform N iterations of training on the classification model to be trained according to the representation vectors and the initial vector, so as to train the N classification levels of the classification model to be trained, wherein each iteration of training completes the training of one classification level, and the corresponding classification level converges when that iteration of training is completed;
  • the model storage module 804 is configured to, when it is determined that the Nth classification level is converged after the Nth iteration training, determine that the training of the classification model to be trained is completed, and store the trained classification model.
  • the classification model to be trained includes a BERT encoding layer.
  • the sample processing module 802 is further used for:
  • the training samples are input into the BERT encoding layer to encode each of the training samples to obtain a representation vector corresponding to each of the training samples.
  • further, in one embodiment, the model training module 803 is further configured to: receive the input initial vector, perform the first training of the classification model to be trained according to the representation vectors and the initial vector, and determine whether the classification model to be trained has converged after the first training, so as to complete the training of the first classification level of the classification model to be trained when convergence is determined; when it is determined that the classification model to be trained has converged after the first training, input the initial vector and the representation vectors into the trained first classification level to obtain the first-level text representation vector corresponding to each training sample; and perform the second training on the classification model obtained from the first training according to the first-level text representation vectors and the training samples, so as to train the second classification level of the classification model to be trained, and so on, to perform N iterations of training on the classification model to be trained.
  • further, in one embodiment, the classification model to be trained further includes N attention layers and N fully connected layers; the BERT encoding layer is connected to each of the N attention layers, each fully connected layer is connected to one attention layer, and the attention layer is located between the BERT encoding layer and the fully connected layer. The model training module 803 is specifically further configured such that receiving the input initial vector and performing the first training of the classification model to be trained according to the representation vectors and the initial vector includes: inputting the initial vector and the representation vectors into the first attention layer to output the intermediate representation vector corresponding to each representation vector; inputting the intermediate representation vectors into the first fully connected layer connected to the first attention layer to output the corresponding space vectors; and inputting the space vectors into a preset label probability distribution formula to obtain a label probability distribution curve, and reading the label corresponding to the maximum probability value in the curve as the first-level predicted label.
  • further, in one embodiment, the model training module 803 is further configured to: obtain the loss value of the classification model to be trained after the first training and obtain a preset loss threshold; compare the loss value with the loss threshold; determine convergence if the loss value is less than or equal to the loss threshold; and determine non-convergence if the loss value is greater than the loss threshold.
  • further, in one embodiment, the model training module 803 is further configured to: perform the first training on the classification model to be trained again according to the representation vectors and the initial vector, and determine again whether the classification model to be trained has converged after the first training.
  • further, in one embodiment, the training apparatus 800 for the text classification model further includes a model invocation module 805, where the model invocation module 805 is further configured to: load the stored trained classification model when a classification instruction is received; receive the input text to be processed and obtain the stored query vector; and input the query vector and the text to be processed into the trained classification model to output the classification result corresponding to the text to be processed, where the classification result includes the N labels corresponding to the text to be processed.
  • the above-mentioned apparatus can be implemented in the form of a computer program, and the computer program can be executed on a computer device as shown in FIG. 7 .
  • FIG. 9 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.
  • the computer device may be a server.
  • the computer device includes a processor, a memory, and a network interface connected through a system bus, wherein the memory may include a non-volatile storage medium and an internal memory.
  • the nonvolatile storage medium can store operating systems and computer programs.
  • the computer program includes program instructions, which, when executed, can cause the processor to execute any training method for a BERT-based text classification model.
  • the processor is used to provide computing and control capabilities to support the operation of the entire computer equipment.
  • the internal memory provides an environment for running the computer program in the non-volatile storage medium, and when the computer program is executed by the processor, it can cause the processor to execute any training method for a BERT-based text classification model.
  • the network interface is used for network communication, such as sending assigned tasks.
  • those skilled in the art can understand that the structure shown in FIG. 9 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • the processor may be a central processing unit (Central Processing Unit, CPU), and the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor or the like.
  • the processor is configured to run a computer program stored in the memory to implement the following steps:
  • load the classification model to be trained, and identify the number N of classification levels included in the classification model to be trained, where the classification model to be trained is generated based on the BERT language model; receive the input training samples, and process the training samples to obtain the representation vector corresponding to each training sample; receive an input initial vector, and perform N iterations of training on the classification model to be trained according to the representation vectors and the initial vector, so as to train each classification level of the classification model to be trained, wherein each iteration of training completes the training of one classification level, and the corresponding classification level converges when that iteration of training is completed; and when it is determined that the Nth classification level has converged after the Nth iteration of training, determine that the training of the classification model to be trained is completed, and store the trained classification model.
  • in one embodiment, the classification model to be trained includes a BERT encoding layer, and when processing the training samples to obtain the representation vector corresponding to each training sample, the processor is further configured to implement:
  • the training samples are input into the BERT encoding layer to encode each of the training samples to obtain a representation vector corresponding to each of the training samples.
  • in one embodiment, when implementing the receiving of the input initial vector and the N iterations of training of the classification model to be trained according to the representation vectors and the initial vector, the processor is further configured to: receive the input initial vector, perform the first training of the classification model to be trained according to the representation vectors and the initial vector, and determine whether the classification model to be trained has converged after the first training, so as to complete the training of the first classification level of the classification model to be trained when convergence is determined; when it is determined that the classification model to be trained has converged after the first training, input the initial vector and the representation vectors into the trained first classification level to obtain the first-level text representation vector corresponding to each training sample; and perform the second training on the classification model obtained from the first training according to the first-level text representation vectors and the training samples, so as to train the second classification level of the classification model to be trained, and so on, to perform N iterations of training on the classification model to be trained.
  • in one embodiment, the classification model to be trained further includes N attention layers and N fully connected layers; the BERT encoding layer is connected to each of the N attention layers, each fully connected layer is connected to one attention layer, and the attention layer is located between the BERT encoding layer and the fully connected layer. When implementing the receiving of the input initial vector and the first training of the classification model to be trained according to the representation vectors and the initial vector, the processor is further configured to: input the initial vector and the representation vectors into the first attention layer to output the intermediate representation vector corresponding to each representation vector; input the intermediate representation vectors into the first fully connected layer connected to the first attention layer to output the corresponding space vectors; and input the space vectors into a preset label probability distribution formula to obtain a label probability distribution curve, and read the label corresponding to the maximum probability value in the curve as the first-level predicted label.
  • in one embodiment, when determining whether the classification model to be trained has converged after the first training, the processor is further configured to: obtain the loss value of the classification model to be trained after the first training and obtain a preset loss threshold; compare the loss value with the loss threshold; determine convergence if the loss value is less than or equal to the loss threshold; and determine non-convergence if the loss value is greater than the loss threshold.
  • in one embodiment, after determining non-convergence when the loss value is greater than the loss threshold, the processor is further configured to: perform the first training on the classification model to be trained again according to the representation vectors and the initial vector, and determine again whether the classification model to be trained has converged after the first training.
  • in one embodiment, when executing the computer program, the processor is further configured to: load the stored trained classification model when a classification instruction is received; receive the input text to be processed and obtain the stored query vector; and input the query vector and the text to be processed into the trained classification model to output the classification result corresponding to the text to be processed, where the classification result includes the N labels corresponding to the text to be processed.
  • the embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the processor executes the program instructions to implement the present application Any one of the training methods of the BERT-based text classification model provided in the embodiment.
  • the computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as a hard disk or a memory of the computer device.
  • the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk equipped on the computer device, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), or the like.
  • the computer-readable storage medium may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required by at least one function, and the like, and the storage data area may store data created according to the use of the blockchain node, and the like.
  • the blockchain referred to in this application is essentially a decentralized database: a chain of data blocks associated with one another using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Abstract

A training method, apparatus, device, and readable storage medium for a text classification model, the method including: loading a classification model to be trained and identifying the number N of classification levels it contains; receiving input training samples and obtaining the representation vector of each training sample; receiving an input initial vector and performing N iterations of training on the classification model to be trained according to the representation vectors and the initial vector; and, when it is determined that the Nth classification level has converged after the Nth iteration of training, storing the trained classification model.

Description

文本分类模型的训练方法、装置、设备及可读存储介质
本申请要求于2021年04月28日提交中国专利局、申请号为202110470115.7发明名称为“文本分类模型的训练方法、装置、设备及可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能的预测分析技术领域,尤其涉及一种基于BERT的文本分类模型的训练方法、文本分类模型的训练装置、计算机设备及计算机可读存储介质。
背景技术
文本层级分类模型指的是先确定文本所属的一级类别,紧接着确定一级类别下所属的二级类别。例如,“电视机”属于类别“家用电器”,接着属于“家用电器”类别下的“大家电”。
目前,常规的文本层级分类模型大多数建立k+1个文本分类模型,包括一个一级文本分类模型和k个二级文本分类模型,即为每个一级类别建立一个分类模型。而具体的实现过程为:首先使用一级分类模型确定文本的一级类别,接着根据其类别选择对应的二级分类模型,使用二级分类模型对文本再次进行分类以确定其二级类别。而在进行分类处理时,发明人发现上述方式虽然可以实现文本层级的分类,但是模型结构复杂,扩展性不好,同时在使用时使用效率较低。
因此，现在亟需一种能够提高模型训练效率以及模型扩展性的基于BERT的文本分类模型的训练方法。
发明内容
本申请提供了一种基于BERT的文本分类模型的训练方法、装置、计算机设备及存储介质,以提高了模型的训练效率,同时提高了模型的扩展性和稳定性。
第一方面,本申请提供了一种基于BERT的文本分类模型的训练方法,所述方法包括:
加载待训练分类模型,并识别所述待训练分类模型所包含的分类层级数N,其中所述待训练分类模型基于BERT语言模型生成;
接收输入的训练样本,以对所述训练样本进行处理得到每一训练样本所对应的表征向量;
接收输入的初始向量,并根据所述表征向量以及所述初始向量对所述待训练分类模型进行N次迭代训练,以对所述待训练分类模型的每一分类层级进行训练,其中每次迭代训练完成一分类层级的训练,且每次迭代训练完成时对应的分类层级收敛;
当确定第N次迭代训练后第N分类层级收敛时,确定所述待训练分类模型训练完成,并存储训练好的分类模型。
第二方面,本申请还提供了一种文本分类模型的训练装置,所述装置包括:
模型加载模块,用于加载待训练分类模型,并识别所述待训练分类模型所包含的分类层级数N,其中所述待训练分类模型基于BERT语言模型生成;
样本处理模块,用于接收输入的训练样本,以对所述训练样本进行处理得到每一训练样本所对应的表征向量;
模型训练模块,用于接收输入的初始向量,并根据所述表征向量以及所述初始向量对所述待训练分类模型进行N次迭代训练,以对所述待训练分类模型的N个分类层级进行训练,其中每次迭代训练完成一分类层级的训练,且每次迭代训练完成时对应的分类层级收敛;
模型存储模块,用于当确定第N次迭代训练后第N分类层级收敛时,确定所述待训练分类模型训练完成,并存储训练好的分类模型。
第三方面,本申请还提供了一种计算机设备,所述计算机设备包括存储器和处理器;所述存储器用于存储计算机程序;所述处理器,用于执行所述计算机程序并在执行所述计算机程序时实现如上述的基于BERT的文本分类模型的训练方法。
第四方面,本申请还提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时使所述处理器实现如上述的基于BERT的文本分类模型的训练方法。
本申请公开了一种基于BERT的文本分类模型的训练方法、装置、计算机设备及存储介质,在对分类模型进行训练时,基于BERT语言模型对训练样本进行处理,同时根据实际的需求构建分类模型的结构,接着根据处理后的训练样本对所构建的分类模型进行若干次的迭代训练,以使得训练完成后的分类模型可以实现层级分类。而在训练过程中,各个子模型之间不存在相互的干扰,也就是后一个子模型在不收敛的情况下不需要对前面的模块再次进行训练,提高了模型训练的效率,同时可以通过对模型结构的改变使得分类模型实现对更多层级的分类,使得分类模型具有更好的扩展性,以适应更多的分类需求。
附图说明
为了更清楚地说明本申请实施例技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请一个实施例提供的一种基于BERT的文本分类模型的训练方法的流程示意图;
图2为本申请一个实施例提供的一种分类模型的结构示意图;
图3为本申请另一个实施例提供的一种分类模型的结构示意图;
图4为本申请又一个实施例提供的一种分类模型的结构示意图;
图5为本申请一个实施例提供的一种对待训练的分类模型进行训练的步骤的流程示意图;
图6为本申请一个实施例提供第一次训练的步骤的流程示意图;
图7为本申请一个实施例提供的一分类预测的步骤的流程示意图;
图8为本申请一个实施例提供的一种文本分类模型的训练装置的示意性框图;
图9为本申请一个实施例提供的计算机设备的结构示意性框图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
附图中所示的流程图仅是示例说明,不是必须包括所有的内容和操作/步骤,也不是必须按所描述的顺序执行。例如,有的操作/步骤还可以分解、组合或部分合并,因此实际执行的顺序有可能根据实际情况改变。
应当理解,在此本申请说明书中所使用的术语仅仅是出于描述特定实施例的目的而并不意在限制本申请。如在本申请说明书和所附权利要求书中所使用的那样,除非上下文清楚地指明其它情况,否则单数形式的“一”、“一个”及“该”意在包括复数形式。
还应当进一步理解，在本申请说明书和所附权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合，并且包括这些组合。
下面结合附图,对本申请的一些实施方式作详细说明。在不冲突的情况下,下述的实施例及实施例中的特征可以相互组合。
请参阅图1,图1为本申请一个实施例提供的一种基于BERT的文本分类模型的训练方法的流程示意图。
如图1所示,该训练方法包括步骤S101至步骤S104。
步骤S101、加载待训练分类模型,并识别所述待训练分类模型所包含的分类层级数N,其中所述待训练分类模型基于BERT语言模型生成。
在对模型进行训练时，需要预先构建和加载需要进行训练的模型，因此预先构建好分类模型的结构，然后加载所构建好的分类模型，也就是加载待训练的分类模型，进而对所加载的待训练的分类模型进行训练。
在一实施例中,在加载了待训练分类模型之后,将会识别所加载的待训练分类模型中所包含的分类层级数N,其中分类层级数表示在使用该模型进行分类预测时将会预测对应数量的类别信息,比如在分类层级数N为二时,在进行分类时,对一个待分类文本进行处理时,将会输出该待分类文本的一级标签和二级标签,再比如在N为三时,将会输出该待分类文本的一级标签、二级标签以及三级标签,同时,不同级别的标签之间存在有一定的关联性。
具体的关联性表现在:二级标签属于一级标签的下属标签,以及三级标签属于二级标签的下属标签。
在一实施例中,在加载待训练分类模型之前,需要构建对应的分类模型,此时的分类模型的结构如图2所示,且基于预训练的BERT(Bidirectional Encoder Representation from Transformers)语言模型而得到的,而具体的模型的结构根据使用者的实际需求所确定,比如需要进行二级标签的输出,此时N等于2,如图3所示,再比如需要进行三级标签的输出,那么此时N等于3,如图4。在此,以分类层级数N为2对本申请各实施例进行解释说明,也就是此时所构建的分类模型的模型结构如图3所示。
在实际应用中,在分类模型进行使用时,通过模型可以对文本的标签信息进行输出,示例性的,可以输出文本所对应的一级标签和二级标签,因此在对模型进行构建时,不同的层级需要实现对文本的不同标签进行预测和输出,故而在构建模型以及进行模型训练时,对分类模型中的每一层级都需要进行训练。
基于所构建的分类模型的结构特征,可以通过简单的模型结构的添加以实现三级标签以及更多层级标签的预测,具体地,在图2所示的模型结构的基础上,在前一层级的attention层输出该层级所对应的文本表征向量以输入到下一层级中与A所示的相同结构中。
在一实施例中,通过对模型结构的优化,使得模型具有更好的扩展性,在需要增加模型的多级标签的预测时,只需要进行模型的扩展即可实现,进而对所扩展的部分结构进行训练,而不需要对整个模型进行改进和训练,提高了模型的可扩展性和扩展后的训练简易性。
步骤S102、接收输入的训练样本,以对所述训练样本进行处理得到每一训练样本所对应的表征向量。
在完成对待训练的分类模型的加载之后,将会对所加载的分类模型进行训练,因此此时将会接收到所输入的训练样本,然后对训练样本进行处理,以得到训练样本中每一训练样本所对应的表征向量。其中,表征向量是用来对训练样本的特征进行描述的向量。
在一实施例中,训练样本有若干文本信息组成,也就是每一训练样本都是一个文本信息,因此在对文本信息进行特征提取时,通过模型结构中所设置的BERT编码层实现对样本的特征的提取,以得到每一训练样本所对应的表征向量,其中,训练样本可以表示为(x 1,x 2,…,x n),此时所提取得到的表征向量为(z 1,z 2,…,z n),z i是d维的向量。
示例性的,在接收到所输入的训练样本时,将训练样本输入到所加载的待训练的分类模型的BERT编码层中,以根据预训练好的BERT编码层对训练样本进行特征提取和编码,进而得到每一训练样本所对应的表征向量。
步骤S103、接收输入的初始向量,以根据所述表征向量以及所述初始向量对所述待训练的分类模型进行N次迭代训练,以对所述待训练分类模型的每一分类层级进行训练,其中每次迭代训练完成一分类层级的训练,且每次迭代训练完成时对应的分类层级收敛。
在对所接收到的训练样本进行处理得到每一训练样本的表征向量之后,将会根据所接收到的初始向量以及进行处理得到的每一训练样本的表征向量对待训练的分类模型进行训练, 以使得完成训练后的待训练的分类模型可以被使用。
在一实施例中,在进行模型训练之前还将接收所输入的初始向量,其中初始向量为一个随机向量,记为q 1,且q 1是d维的向量,然后将初始向量以及样本所对应的表征向量作为对模型中某一或者某些结构进行训练的训练数据。由于所构建的模型结构中的分类层级数为N,那么在进行训练时需要实现对N个分类层级都进行训练,以使得每一个层级都可以准确实现对标签进行预测。
在对模型进行了N次迭代训练时,是对模型中的每一个分类层级进行训练,同时只有在第一个分类层级训练完成时才会进入第二个分类层级的训练,而在完成第二个分类层级的训练时才会进入第三个分类层级的训练,以此类推以完成对第N个分类层级的训练,也就是完成对整个待训练的分类模型的训练。
在一实施例中,参照图5,图5为本申请一个实施例提供的一种对待训练的分类模型进行训练的步骤的流程示意图。
具体地,在对待训练的分类模型进行训练时包括步骤S501至步骤S503。
步骤S501、接收输入的初始向量,根据所述表征向量以及所述初始向量对待训练分类模型进行第一次训练,并确定第一次训练后的所述待训练分类模型是否收敛,以在确定收敛时完成对所述待训练分类模型的第一分类层级的训练;
步骤S502、当确定所述第一次训练后的待训练分类模型收敛时,将所述初始向量以及所述表征向量输入至完成训练的第一分类层级中,得到每一训练样本对应的一级文本表征向量;
步骤S503、根据所述一级文本表征向量以及训练样本对所述第一次训练所得到的待训练分类模型进行二次训练,以对所述待训练分类模型的第二分类层级进行训练,并以此类推以对所述待训练分类模型进行N次迭代训练。
在对具有N个分类层级的待训练的分类模型进行训练时,在完成对训练样本的处理时,还接收输入的初始向量,然后根据对训练样本进行处理所得到的表征向量以及所接收到的初始向量对待训练分类模型的进行一次训练,并确定一次训练后的待训练分类模型是否收敛,而在确定第一次训练后的待训练分类模型是否收敛时是在确定在第一次迭代训练之后进行训练的第一分类层级是否收敛。同时在确定收敛时此时将会得到第一分类层级训练完成的待训练分类模型。
具体地，在对待训练的分类模型进行一次训练时，是对待训练分类模型中的第一分类层级进行训练，而在对第一个分类层级进行训练时，将训练样本以及所接收到的初始向量作为第一个分类层级训练的输入，然后确定所训练的第一个分类层级是否收敛。
在一实施例中,在进行第一次训练时,训练过程和步骤如图6所示,图6为本申请一个实施例提供第一次训练的步骤的流程示意图,其中具体包括:
步骤S601、将所述初始向量以及所述表征向量输入至第一attention层,以输出得到每一表征向量对应的中间表征向量;
步骤S602、将所述中间表征向量输入至与所述第一attention层连接的第一全连接层,输出得到对应的空间向量;
步骤S603、将所述空间向量输入至预设的标签概率分布公式中得到标签概率分布曲线,并读取所述标签概率分布曲线中最大概率值所对应的标签作为一级预测标签。
而在进行第一次训练时,是对模型中的第一子模型进行训练,其中第一子模型如图2中A框中的结构。在进行第一次训练时,是利用所输入的初始向量以及训练样本来实现的,且每一训练样本都会经过BERT编码层得到对应的表征向量。在第一次训练时,将初始向量和表征向量输入到第一attention层中,以输出得到每一表征向量所对应的中间表征向量,然后将中间表征向量输入到第一全连接层中,以输出得到每一训练样本所对应的空间向量,最后根据所得到的空间向量得到每一训练样本所对应的一级标签的标签概率分布,而标签概率分布中标签概率最高的标签即为此时所述出的一级标签。但是由于模型此时处于训练过程中,因此所输出的一级标签并不一定是正确的,而在训练样本的获取和输入时,每一训练样本都 对应有各自的标签,包括一级标签、二级标签,甚至于三级标签等,具体根据实际的需求进行标记。
示例性的,在根据空间向量输出对应的标签概率分布时,是经过Softmax计算所得到的,通过计算可以得到所有可能的标签以及对应的概率值。
在一实施例中,在确定一次训练后的分类模型收敛时,将会进行二次训练,而在进行二次训练时,将会利用一次训练后的模型得到一级文本表征向量,然后利用所得到的一级文本表征向量和预先所得到表征向量作为二次训练的输入,以对待训练的分类模型进行第二次训练,同样的,在进行第二次训练时也会对二次训练的分类模型进行收敛性的判定,而整个的训练过程是一个不断的迭代训练的过程,每一次训练的输入数据存在一定的差异,以实现对分类模型中每一层级进行训练,进而得到训练好的分类模型。
在分类模型的训练过程中,需要对每一次训练后的模型的收敛性进行判断,且只有在当前训练后的模型收敛时,才会进行下一次的训练。在实际应用中,模型训练时是一个分类层级一个分类层级来进行的,在对第一分类层级进行训练时,并不会对后面的几个分类层级的小模型进行训练,在对如图2所示的模型进行训练时,第一次训练是对第一分类层级进行训练,第二次训练则是对第二分类层级进行训练,以此类推,第n次训练则是对模型中的第N分类层级进行训练。
而在进行模型训练的收敛性的判断时,是利用模型训练过程中的损失函数的损失值所确定的。
在一实施例中,确定一次训练后的所述待训练分类模型是否收敛包括:获取一次训练后所述待训练分类模型的损失值,以根据所述损失值确定所述一次训练后的所述待训练分类模型是否收敛;若所述损失值小于或者等于预设的损失阈值,则确定收敛;若所述损失值大于预设的损失阈值,则执行步骤:接收输入的初始向量,以根据所述表征向量以及所述初始向量对待训练分类模型进行一次训练,并确定一次训练后的所述待训练分类模型是否收敛。
具体地,在对待训练分类模型进行一次训练之后,将会根据实际的训练结果和状态确定该次训练是否成功,也就是完成一次训练之后的分类模型是否可以准确的完成对文本信息的第一标签的预测和对文本信息的一级划分。因此,在完成一次训练之后,确定一次训练后的待训练分类模型是否收敛,进而在确定收敛时进行进一步的训练,以得到最终所需要的训练好的分类模型。
在确定一次训练后的待训练分类模型是否收敛时,利用神经网络loss设计的方式来实现,主要的思想为最小风险训练。最小风险训练的基本思想是使用损失函数Δ(y,y (n))来描述模型预测y与y (n)之间的差异程度,并试图寻找一组参数使得模型在训练集上损失的期望值(即风险)最小即通过模型的损失的期望值来确定模型是否收敛。
示例性的,模型输入为x(n),标准为y(n),模型的预测输出为y,对应的损失的期望值(风险)为:
R(θ) = Σ_n Σ_{y∈Y(x^(n))} P(y|x^(n); θ)·Δ(y, y^(n))
其中,Y(x (n))表示x (n)对应的所有可能的输出集合,也称之为搜索空间。
而在确定是否收敛时,可以设定一个损失值阈值,在当前训练所得到的损失值小于或者等于所设定的损失值阈值时,确定收敛,反之则确定不收敛。同时,在确定训练后的模型不收敛时,将会对模型继续训练,比如在进行第一次训练,也就是对第一层级进行训练时,若训练中的某一阶段所得到的损失值大于所设定的损失阈值,则会继续对第一层级进行训练,若训练中的某一阶段所得到的损失值不大于所设定的损失阈值,则说明此时模型完成。
在整个的训练过程中,每一次训练时所使用的训练数据存在一定的差异,每一次训练时都会输入相同的训练文本,但是除了训练文本之外还会输入另一个数据,比如第一次训练时所输入的初始向量。而在对后面的若干层级进行训练时,同样会输入与初始向量相似的一个特征向量,而特征向量的得到将会根据实际的模型训练结果所得到。
示例性的,在对待训练的分类模型进行第一次训练,且确定完成第一次训练后的待训练的分类模型收敛,此时将会利用所完成第一次训练之后的第一层级得到训练文本做每一个样本所对应的一级文本表征向量,而一级文本表征向量与初始输入的初始向量的相似。
实际上，基于所构建的模型的结构，在得到一级文本表征向量时，通过将所接收到的初始向量和通过BERT层处理后的训练样本输入到第一层级中的attention层中，以得到每一个训练样本所对应的一级文本特征向量，而被用来对下一层级进行训练的一级文本表征向量是在第一次训练收敛时所得到的。具体地，在对第一层级的训练完成时，此时对第一层级中的模型参数进行了相应的调整以满足实际的需求，进而再将初始向量和经过BERT层后的训练样本输入到第一层级中的attention层中，以得到每一个样本所对应的一级文本表征向量。
具体地,在得到一级文本表征向量时,一级文本表征向量的计算过程可以如下所示:
一级文本表征向量 = Σ_{i=1}^{n} a_i·(W_V1·z_i)
其中，
a_i = exp(q_1·(W_K1·z_i)) / Σ_{j=1}^{n} exp(q_1·(W_K1·z_j))
e为自然对数的底数,q 1为一个d维的初始向量,z i是一个d维的表征向量,且一个训练样本对应一个表征向量。
且W V1和W K1是模型需要学习参数矩阵,也就是需要进行调整的参数矩阵。在进行一次训练时,通过不断的对该参数的优化,以能够更加准确的实现对文本信息所对应的第一标签的预测。
而在训练过程中,通过对参数W V1和W K1的不断调整,以使得模型的第一层级训练后收敛,而此时会得到对应的参数W V1和W K1,因此可以通过计算得到每一个训练样本所对应的一级文本表征向量。
在模型的第一层级训练完成,并且已经得到的第一层级所输出的一级文本表征向量之后,将会对下一层级进行训练,具体地,将一级文本表征向量和根据训练样本所得到的表征向量输入到模型的第二层级中,以实现对第二层级的训练。而对于第二层级的训练方式,与模型的第一层级的训练方式相同,通过不断的训练对第二层级中的模型参数进行调整。其中,第二层级中的参数可以用W V2和W K2来表示,而在第二层级收敛时第二层级的模型参数W V2和W K2调整完成。
步骤S104、当确定第N次迭代训练后的第N分类层级收敛时,确定所述待训练分类模型训练完成,并存储训练好的分类模型。
由于所构建和加载的待训练分类模型的分类层级数为N,因此在训练过程中需要经过N次训练。而在经过N次训练后确定训练后的分类模型收敛时,将会存储所训练好的分类模型,以供后续进行使用。具体地,在确定第N次迭代训练后的第N分类层级收敛时,确定待训练分类模型是训练完成的,此时将会记录和存储训练好的分类模型。
需要说明的是,训练后的分类模型未收敛时,将会再次进行学习和训练,而此时进行学习和训练的数据会根据每一次的学习和训练进行调整。但是与一般的模型训练不相同的是,每次训练的进行是在前一次训练收敛的情况下所进行的,因此在确定不收敛时仅仅是对当前所训练的层级进行训练。
示例性的,在N为3时,若此时确定第三次训练后的分类模型未收敛时,仅仅是对第三次训练的对象继续训练,具体地,此时是对第三attention层以及第三全连接层所构成的第三子模型进行训练,如图4中的C框所包含的子模型部分,而对于A框以及B框所包含的子模型部分不会进行再一次的训练。
通过该方式对模型进行构建和训练,提高模型训练的效率,同时在对模型进行扩展时,具有更高的扩展性,同时对于扩展后的模型进行训练时可以更加快速的完成模型训练。
进一步地,图7为本申请一个实施例提供的一分类预测的步骤的流程示意图。
具体地,该步骤包括步骤S701至步骤S703。
步骤S701、当接收到分类指令时,加载所存储的训练好的分类模型;
步骤S702、接收输入的待处理文本信息,以及获取所存储的查询向量;
步骤S703、将所述查询向量以及所述待处理文本信息输入至所述训练好的分类模型中,以输出所述待处理文本信息所对应的分类结果,其中所述分类结果包括所述待处理文本信息所对应的N个标签。
在完成模型训练之后,将可以直接加载并使用训练好的分类模型,因此,在接收到分类指令时,首先加载预先训练好并存储好的分类模型,然后利用所加载的训练好的分类模型实现对文本信息的分类预测。
而在加载好训练好的分类模型时,接收输入的待处理文本信息,同时获取预先所存储的查询向量,其中该查询向量随机设定,然后将所接收到的待处理文本信息以及所获取的初始向量输入到所加载的分类模型中,以输出得到待处理文本信息所对应的分类结果。
具体地,在分类模型对待处理文本信息进行处理时,首先接收输入待处理文本信息,并对待处理文本信息进行特征提取,以得到待处理文本信息所对应的表征向量,然后接收输入的查询向量,并将表征向量输入到进行一级标签预测的第一attention层中,以计算得到待处理文本信息所对应的一级文本分类表征向量,接着将一级文本分类表征向量输入至第一全连接层,以得到第一空间向量,最后根据第一空间向量进行softmax计算,以得到一级标签的概率分布,进而选择最大概率值所对应的标签为一级标签,具体地,选择最大值的index为一级标签,并将所得到的一级标签输出。
而在完成一级标签的预测和输出的同时,还会进行二级标签甚至三级标签的预测和输出,而具体输出的标签的级别的数量与所使用的模型的分类层级数有关,以分类层级数为二为例,在完成一级标签的输出的同时,还将一级文本分类表征向量输入到进行二级标签预测的第二attention层,以计算得到待处理文本信息所对应的二级文本分类表征向量,然后将二级文本分类表征向量输入至第二全连接层,以得到第二空间向量,接着根据第二空间向量进行softmax计算,以得到最大值的index为二级标签,并将所得到的二级标签输出。
在上述描述的基于BERT的文本分类模型的训练方法中,在对分类模型进行训练时,基于BERT语言模型对训练样本进行处理,同时根据实际的需求构建分类模型的结构,接着根据处理后的训练样本对所构建的分类模型进行若干次的迭代训练,以使得训练完成后的分类模型可以实现层级分类。而在训练过程中,各个子模型之间不存在相互的干扰,也就是后一个子模型在不收敛的情况下不需要对前面的模块再次进行训练,提高了模型训练的效率,同时可以通过对模型结构的改变使得分类模型实现对更多层级的分类,使得分类模型具有更好的扩展性,以适应更多的分类需求。
请参阅图8,图8为本申请一个实施例提供的一种文本分类模型的训练装置的示意性框图,该装置用于执行前述的基于BERT的文本分类模型的训练方法。
如图8所示,该文本分类模型的训练装置800包括:
模型加载模块801,用于加载待训练分类模型,并识别所述待训练分类模型所包含的分类层级数N,其中所述待训练分类模型基于BERT语言模型生成;
样本处理模块802,用于接收输入的训练样本,以对所述训练样本进行处理得到每一训练样本所对应的表征向量;
模型训练模块803，用于接收输入的初始向量，并根据所述表征向量以及所述初始向量对所述待训练分类模型进行N次迭代训练，以对所述待训练分类模型的N个分类层级进行训练，其中每次迭代训练完成一分类层级的训练，且每次迭代训练完成时对应的分类层级收敛；
模型存储模块804,用于当确定第N次迭代训练后第N分类层级收敛时,确定所述待训练分类模型训练完成,并存储训练好的分类模型。
Further, in one embodiment, the to-be-trained classification model includes a BERT encoding layer, and the sample processing module 802 is further specifically configured to:
input the training samples into the BERT encoding layer so as to encode each of the training samples and obtain the representation vector corresponding to each of the training samples.
Further, in one embodiment, the model training module 803 is further specifically configured to:
receive the input initial vector, perform a first round of training on the to-be-trained classification model according to the representation vectors and the initial vector, and determine whether the to-be-trained classification model has converged after the first round of training, so as to complete the training of the first classification level of the to-be-trained classification model when convergence is determined; when it is determined that the to-be-trained classification model has converged after the first round of training, input the initial vector and the representation vectors into the trained first classification level to obtain the first-level text representation vector corresponding to each training sample; and perform a second round of training on the to-be-trained classification model obtained from the first round of training according to the first-level text representation vectors and the training samples, so as to train the second classification level of the to-be-trained classification model, and so on, so as to perform the N rounds of iterative training on the to-be-trained classification model.
Further, in one embodiment, the to-be-trained classification model further includes N attention layers and N fully connected layers, the BERT encoding layer is connected to each of the N attention layers, each fully connected layer is connected to one attention layer, and the attention layer is located between the BERT encoding layer and the fully connected layer; when receiving the input initial vector and performing the first round of training on the to-be-trained classification model according to the representation vectors and the initial vector, the model training module 803 is further specifically configured to:
input the initial vector and the representation vectors into the first attention layer so as to output an intermediate representation vector corresponding to each representation vector; input the intermediate representation vectors into the first fully connected layer connected to the first attention layer so as to output the corresponding space vector; and input the space vector into a preset label probability distribution formula to obtain a label probability distribution curve, and read the label corresponding to the maximum probability value in the label probability distribution curve as the first-level predicted label.
Further, in one embodiment, the model training module 803 is further specifically configured to:
acquire the loss value of the to-be-trained classification model after the first round of training, and acquire a preset loss threshold; compare the loss value with the loss threshold; determine convergence if the loss value is less than or equal to the loss threshold; and determine non-convergence if the loss value is greater than the loss threshold.
Further, in one embodiment, the model training module 803 is further specifically configured to:
perform the first round of training on the to-be-trained classification model again according to the representation vectors and the initial vector, and determine again whether the to-be-trained classification model has converged after the first round of training.
Further, in one embodiment, the training apparatus 800 for a text classification model further includes a model invoking module 805, wherein the model invoking module 805 is further specifically configured to:
load the stored trained classification model when a classification instruction is received; receive input to-be-processed text information and acquire the stored query vector; and input the query vector and the to-be-processed text information into the trained classification model so as to output the classification result corresponding to the to-be-processed text information, wherein the classification result includes N labels corresponding to the to-be-processed text information.
It should be noted that those skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the apparatus and modules described above, and details are not repeated here.
The above apparatus may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 9.
Referring to FIG. 9, FIG. 9 is a schematic block diagram of the structure of a computer device provided by an embodiment of the present application. The computer device may be a server.
Referring to FIG. 9, the computer device includes a processor, a memory, and a network interface connected via a system bus, wherein the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions which, when executed, may cause the processor to perform any one of the BERT-based text classification model training methods.
The processor is used to provide computing and control capabilities and to support the operation of the entire computer device.
The internal memory provides an environment for running the computer program in the non-volatile storage medium; when the computer program is executed by the processor, it may cause the processor to perform any one of the BERT-based text classification model training methods.
The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art can understand that the structure shown in FIG. 9 is only a block diagram of part of the structure relevant to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
It should be understood that the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In one embodiment, the processor is configured to run a computer program stored in the memory so as to implement the following steps:
loading a to-be-trained classification model and identifying the number N of classification levels included in the to-be-trained classification model, wherein the to-be-trained classification model is generated based on the BERT language model; receiving input training samples, and processing the training samples to obtain a representation vector corresponding to each training sample; receiving an input initial vector, and performing N rounds of iterative training on the to-be-trained classification model according to the representation vectors and the initial vector, so as to train each classification level of the to-be-trained classification model, wherein each round of iterative training completes the training of one classification level, and the corresponding classification level converges when each round of iterative training is completed; and, when it is determined that the N-th classification level has converged after the N-th round of iterative training, determining that the training of the to-be-trained classification model is completed, and storing the trained classification model.
In one embodiment, the to-be-trained classification model includes a BERT encoding layer, and when implementing the processing of the training samples to obtain the representation vector corresponding to each training sample, the processor is further configured to implement:
inputting the training samples into the BERT encoding layer so as to encode each of the training samples and obtain the representation vector corresponding to each of the training samples.
In one embodiment, when implementing the receiving of the input initial vector and the performing of N rounds of iterative training on the to-be-trained classification model according to the representation vectors and the initial vector, the processor is further configured to implement:
receiving the input initial vector, performing a first round of training on the to-be-trained classification model according to the representation vectors and the initial vector, and determining whether the to-be-trained classification model has converged after the first round of training, so as to complete the training of the first classification level of the to-be-trained classification model when convergence is determined; when it is determined that the to-be-trained classification model has converged after the first round of training, inputting the initial vector and the representation vectors into the trained first classification level to obtain the first-level text representation vector corresponding to each training sample; and performing a second round of training on the to-be-trained classification model obtained from the first round of training according to the first-level text representation vectors and the training samples, so as to train the second classification level of the to-be-trained classification model, and so on, so as to perform the N rounds of iterative training on the to-be-trained classification model.
In one embodiment, the to-be-trained classification model further includes N attention layers and N fully connected layers, the BERT encoding layer is connected to each of the N attention layers, each fully connected layer is connected to one attention layer, and the attention layer is located between the BERT encoding layer and the fully connected layer; when implementing the receiving of the input initial vector and the performing of the first round of training on the to-be-trained classification model according to the representation vectors and the initial vector, the processor is further configured to implement:
inputting the initial vector and the representation vectors into the first attention layer so as to output an intermediate representation vector corresponding to each representation vector; inputting the intermediate representation vectors into the first fully connected layer connected to the first attention layer so as to output the corresponding space vector; and inputting the space vector into a preset label probability distribution formula to obtain a label probability distribution curve, and reading the label corresponding to the maximum probability value in the label probability distribution curve as the first-level predicted label.
In one embodiment, when implementing the determining of whether the to-be-trained classification model has converged after a round of training, the processor is further configured to implement:
acquiring the loss value of the to-be-trained classification model after the first round of training, and acquiring a preset loss threshold; comparing the loss value with the loss threshold; determining convergence if the loss value is less than or equal to the loss threshold; and determining non-convergence if the loss value is greater than the loss threshold.
In one embodiment, after implementing the determining of non-convergence if the loss value is greater than the loss threshold, the processor is further configured to implement:
performing the first round of training on the to-be-trained classification model again according to the representation vectors and the initial vector, and determining again whether the to-be-trained classification model has converged after the first round of training.
In one embodiment, when executing the computer program, the processor is further configured to implement:
loading the stored trained classification model when a classification instruction is received; receiving input to-be-processed text information and acquiring the stored query vector; and inputting the query vector and the to-be-processed text information into the trained classification model so as to output the classification result corresponding to the to-be-processed text information, wherein the classification result includes N labels corresponding to the to-be-processed text information.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program, the computer program including program instructions; the processor executes the program instructions to implement any one of the BERT-based text classification model training methods provided by the embodiments of the present application.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, for example a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device.
Further, the computer-readable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of blockchain nodes, and the like.
In addition, the blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association with one another using cryptographic methods; each data block contains information on a batch of network transactions and is used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and these modifications or substitutions shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

  1. A training method for a BERT-based text classification model, wherein the method comprises:
    loading a to-be-trained classification model, and identifying a number N of classification levels included in the to-be-trained classification model, wherein the to-be-trained classification model is generated based on a BERT language model;
    receiving input training samples, and processing the training samples to obtain a representation vector corresponding to each training sample;
    receiving an input initial vector, and performing N rounds of iterative training on the to-be-trained classification model according to the representation vectors and the initial vector, so as to train each classification level of the to-be-trained classification model, wherein each round of iterative training completes the training of one classification level, and the corresponding classification level converges when each round of iterative training is completed; and
    when it is determined that the N-th classification level has converged after the N-th round of iterative training, determining that the training of the to-be-trained classification model is completed, and storing the trained classification model.
  2. The training method according to claim 1, wherein the to-be-trained classification model comprises a BERT encoding layer, and the processing of the training samples to obtain the representation vector corresponding to each training sample comprises:
    inputting the training samples into the BERT encoding layer so as to encode each of the training samples and obtain the representation vector corresponding to each of the training samples.
  3. The training method according to claim 2, wherein the receiving of the input initial vector and the performing of N rounds of iterative training on the to-be-trained classification model according to the representation vectors and the initial vector comprise:
    receiving the input initial vector, performing a first round of training on the to-be-trained classification model according to the representation vectors and the initial vector, and determining whether the to-be-trained classification model has converged after the first round of training, so as to complete the training of a first classification level of the to-be-trained classification model when convergence is determined;
    when it is determined that the to-be-trained classification model has converged after the first round of training, inputting the initial vector and the representation vectors into the trained first classification level to obtain a first-level text representation vector corresponding to each training sample; and
    performing a second round of training on the to-be-trained classification model obtained from the first round of training according to the first-level text representation vectors and the training samples, so as to train a second classification level of the to-be-trained classification model, and so on, so as to perform the N rounds of iterative training on the to-be-trained classification model.
  4. The training method according to claim 3, wherein the to-be-trained classification model further comprises N attention layers and N fully connected layers, the BERT encoding layer is connected to each of the N attention layers, each fully connected layer is connected to one attention layer, and the attention layer is located between the BERT encoding layer and the fully connected layer;
    the receiving of the input initial vector and the performing of the first round of training on the to-be-trained classification model according to the representation vectors and the initial vector comprise:
    inputting the initial vector and the representation vectors into a first attention layer so as to output an intermediate representation vector corresponding to each representation vector;
    inputting the intermediate representation vectors into a first fully connected layer connected to the first attention layer so as to output a corresponding space vector; and
    inputting the space vector into a preset label probability distribution formula to obtain a label probability distribution curve, and reading the label corresponding to the maximum probability value in the label probability distribution curve as a first-level predicted label.
  5. The training method according to claim 3, wherein the determining of whether the to-be-trained classification model has converged after a round of training comprises:
    acquiring a loss value of the to-be-trained classification model after the first round of training, and acquiring a preset loss threshold;
    comparing the loss value with the loss threshold;
    determining convergence if the loss value is less than or equal to the loss threshold; and
    determining non-convergence if the loss value is greater than the loss threshold.
  6. The training method according to claim 5, wherein after the determining of non-convergence if the loss value is greater than the loss threshold, the method further comprises:
    performing the first round of training on the to-be-trained classification model again according to the representation vectors and the initial vector, and determining again whether the to-be-trained classification model has converged after the first round of training.
  7. The training method according to any one of claims 1 to 6, wherein after the storing of the trained classification model, the method further comprises:
    loading the stored trained classification model when a classification instruction is received;
    receiving input to-be-processed text information, and acquiring a stored query vector; and
    inputting the query vector and the to-be-processed text information into the trained classification model so as to output a classification result corresponding to the to-be-processed text information, wherein the classification result includes N labels corresponding to the to-be-processed text information.
  8. A training apparatus for a text classification model, wherein the apparatus comprises:
    a model loading module, configured to load a to-be-trained classification model and identify a number N of classification levels included in the to-be-trained classification model, wherein the to-be-trained classification model is generated based on a BERT language model;
    a sample processing module, configured to receive input training samples and process the training samples to obtain a representation vector corresponding to each training sample;
    a model training module, configured to receive an input initial vector and perform N rounds of iterative training on the to-be-trained classification model according to the representation vectors and the initial vector, so as to train the N classification levels of the to-be-trained classification model, wherein each round of iterative training completes the training of one classification level, and the corresponding classification level converges when each round of iterative training is completed; and
    a model storage module, configured to determine, when the N-th classification level is determined to have converged after the N-th round of iterative training, that the training of the to-be-trained classification model is completed, and store the trained classification model.
  9. A computer device, comprising a memory and a processor, wherein:
    the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    loading a to-be-trained classification model, and identifying a number N of classification levels included in the to-be-trained classification model, wherein the to-be-trained classification model is generated based on a BERT language model;
    receiving input training samples, and processing the training samples to obtain a representation vector corresponding to each training sample;
    receiving an input initial vector, and performing N rounds of iterative training on the to-be-trained classification model according to the representation vectors and the initial vector, so as to train each classification level of the to-be-trained classification model, wherein each round of iterative training completes the training of one classification level, and the corresponding classification level converges when each round of iterative training is completed; and
    when it is determined that the N-th classification level has converged after the N-th round of iterative training, determining that the training of the to-be-trained classification model is completed, and storing the trained classification model.
  10. The computer device according to claim 9, wherein the to-be-trained classification model comprises a BERT encoding layer, and the processing of the training samples to obtain the representation vector corresponding to each training sample comprises:
    inputting the training samples into the BERT encoding layer so as to encode each of the training samples and obtain the representation vector corresponding to each of the training samples.
  11. The computer device according to claim 10, wherein the receiving of the input initial vector and the performing of N rounds of iterative training on the to-be-trained classification model according to the representation vectors and the initial vector comprise:
    receiving the input initial vector, performing a first round of training on the to-be-trained classification model according to the representation vectors and the initial vector, and determining whether the to-be-trained classification model has converged after the first round of training, so as to complete the training of a first classification level of the to-be-trained classification model when convergence is determined;
    when it is determined that the to-be-trained classification model has converged after the first round of training, inputting the initial vector and the representation vectors into the trained first classification level to obtain a first-level text representation vector corresponding to each training sample; and
    performing a second round of training on the to-be-trained classification model obtained from the first round of training according to the first-level text representation vectors and the training samples, so as to train a second classification level of the to-be-trained classification model, and so on, so as to perform the N rounds of iterative training on the to-be-trained classification model.
  12. The computer device according to claim 11, wherein the to-be-trained classification model further comprises N attention layers and N fully connected layers, the BERT encoding layer is connected to each of the N attention layers, each fully connected layer is connected to one attention layer, and the attention layer is located between the BERT encoding layer and the fully connected layer;
    the receiving of the input initial vector and the performing of the first round of training on the to-be-trained classification model according to the representation vectors and the initial vector comprise:
    inputting the initial vector and the representation vectors into a first attention layer so as to output an intermediate representation vector corresponding to each representation vector;
    inputting the intermediate representation vectors into a first fully connected layer connected to the first attention layer so as to output a corresponding space vector; and
    inputting the space vector into a preset label probability distribution formula to obtain a label probability distribution curve, and reading the label corresponding to the maximum probability value in the label probability distribution curve as a first-level predicted label.
  13. The computer device according to claim 11, wherein the determining of whether the to-be-trained classification model has converged after a round of training comprises:
    acquiring a loss value of the to-be-trained classification model after the first round of training, and acquiring a preset loss threshold;
    comparing the loss value with the loss threshold;
    determining convergence if the loss value is less than or equal to the loss threshold; and
    determining non-convergence if the loss value is greater than the loss threshold.
  14. The computer device according to claim 13, wherein after the determining of non-convergence if the loss value is greater than the loss threshold, the steps further comprise:
    performing the first round of training on the to-be-trained classification model again according to the representation vectors and the initial vector, and determining again whether the to-be-trained classification model has converged after the first round of training.
  15. The computer device according to any one of claims 9 to 14, wherein after the storing of the trained classification model, the steps further comprise:
    loading the stored trained classification model when a classification instruction is received;
    receiving input to-be-processed text information, and acquiring a stored query vector; and
    inputting the query vector and the to-be-processed text information into the trained classification model so as to output a classification result corresponding to the to-be-processed text information, wherein the classification result includes N labels corresponding to the to-be-processed text information.
  16. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program comprising computer-readable instructions which, when executed by a processor, cause the processor to perform the following steps:
    loading a to-be-trained classification model, and identifying a number N of classification levels included in the to-be-trained classification model, wherein the to-be-trained classification model is generated based on a BERT language model;
    receiving input training samples, and processing the training samples to obtain a representation vector corresponding to each training sample;
    receiving an input initial vector, and performing N rounds of iterative training on the to-be-trained classification model according to the representation vectors and the initial vector, so as to train each classification level of the to-be-trained classification model, wherein each round of iterative training completes the training of one classification level, and the corresponding classification level converges when each round of iterative training is completed; and
    when it is determined that the N-th classification level has converged after the N-th round of iterative training, determining that the training of the to-be-trained classification model is completed, and storing the trained classification model.
  17. The computer-readable storage medium according to claim 16, wherein the to-be-trained classification model comprises a BERT encoding layer, and the processing of the training samples to obtain the representation vector corresponding to each training sample comprises:
    inputting the training samples into the BERT encoding layer so as to encode each of the training samples and obtain the representation vector corresponding to each of the training samples.
  18. The computer-readable storage medium according to claim 17, wherein the receiving of the input initial vector and the performing of N rounds of iterative training on the to-be-trained classification model according to the representation vectors and the initial vector comprise:
    receiving the input initial vector, performing a first round of training on the to-be-trained classification model according to the representation vectors and the initial vector, and determining whether the to-be-trained classification model has converged after the first round of training, so as to complete the training of a first classification level of the to-be-trained classification model when convergence is determined;
    when it is determined that the to-be-trained classification model has converged after the first round of training, inputting the initial vector and the representation vectors into the trained first classification level to obtain a first-level text representation vector corresponding to each training sample; and
    performing a second round of training on the to-be-trained classification model obtained from the first round of training according to the first-level text representation vectors and the training samples, so as to train a second classification level of the to-be-trained classification model, and so on, so as to perform the N rounds of iterative training on the to-be-trained classification model.
  19. The computer-readable storage medium according to claim 18, wherein the to-be-trained classification model further comprises N attention layers and N fully connected layers, the BERT encoding layer is connected to each of the N attention layers, each fully connected layer is connected to one attention layer, and the attention layer is located between the BERT encoding layer and the fully connected layer;
    the receiving of the input initial vector and the performing of the first round of training on the to-be-trained classification model according to the representation vectors and the initial vector comprise:
    inputting the initial vector and the representation vectors into a first attention layer so as to output an intermediate representation vector corresponding to each representation vector;
    inputting the intermediate representation vectors into a first fully connected layer connected to the first attention layer so as to output a corresponding space vector; and
    inputting the space vector into a preset label probability distribution formula to obtain a label probability distribution curve, and reading the label corresponding to the maximum probability value in the label probability distribution curve as a first-level predicted label.
  20. The computer-readable storage medium according to claim 18, wherein the determining of whether the to-be-trained classification model has converged after a round of training comprises:
    acquiring a loss value of the to-be-trained classification model after the first round of training, and acquiring a preset loss threshold;
    comparing the loss value with the loss threshold;
    determining convergence if the loss value is less than or equal to the loss threshold; and
    determining non-convergence if the loss value is greater than the loss threshold.
PCT/CN2021/097412 2021-04-28 2021-05-31 Training method, apparatus and device for text classification model, and readable storage medium WO2022227217A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110470115.7 2021-04-28
CN202110470115.7A CN113011529B (zh) 2021-04-28 2021-04-28 Training method, apparatus and device for text classification model, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2022227217A1 true WO2022227217A1 (zh) 2022-11-03

Family

ID=76380866

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097412 WO2022227217A1 (zh) 2021-04-28 2021-05-31 文本分类模型的训练方法、装置、设备及可读存储介质

Country Status (2)

Country Link
CN (1) CN113011529B (zh)
WO (1) WO2022227217A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255359A (zh) * 2021-07-15 2021-08-13 ZTE Corporation Model training method, text processing method and apparatus, electronic device, and medium
CN113284359B (zh) * 2021-07-22 2022-03-29 Tencent Technology (Shenzhen) Company Limited Parking space recommendation method, apparatus, device, and computer-readable storage medium
CN114416974A (zh) * 2021-12-17 2022-04-29 Beijing Baidu Netcom Science and Technology Co., Ltd. Model training method, apparatus, electronic device, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170308790A1 (en) * 2016-04-21 2017-10-26 International Business Machines Corporation Text classification by ranking with convolutional neural networks
CN109858558A (zh) * 2019-02-13 2019-06-07 Beijing Dajia Internet Information Technology Co., Ltd. Training method and apparatus for classification model, electronic device, and storage medium
CN110956018A (zh) * 2019-11-22 2020-04-03 Tencent Technology (Shenzhen) Company Limited Training method for text processing model, text processing method, apparatus, and storage medium
CN111309919A (zh) * 2020-03-23 2020-06-19 Zhizhe Sihai (Beijing) Technology Co., Ltd. System of text classification model and training method therefor
CN111488459A (zh) * 2020-04-15 2020-08-04 Focus Technology Co., Ltd. Keyword-based product classification method
CN111553399A (zh) * 2020-04-21 2020-08-18 PCI-Suntek Technology Co., Ltd. Feature model training method, apparatus, device, and storage medium
CN112131366A (zh) * 2020-09-23 2020-12-25 Tencent Technology (Shenzhen) Company Limited Method and apparatus for training text classification model and for text classification, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108615044A (zh) * 2016-12-12 2018-10-02 Tencent Technology (Shenzhen) Company Limited Classification model training method, data classification method, and apparatus


Also Published As

Publication number Publication date
CN113011529B (zh) 2024-05-07
CN113011529A (zh) 2021-06-22

Similar Documents

Publication Publication Date Title
WO2022227217A1 (zh) Training method, apparatus and device for text classification model, and readable storage medium
US11741361B2 (en) Machine learning-based network model building method and apparatus
US9058564B2 (en) Controlling quarantining and biasing in cataclysms for optimization simulations
US11556850B2 (en) Resource-aware automatic machine learning system
WO2021254114A1 (zh) Method and apparatus for constructing multi-task learning model, electronic device, and storage medium
CN113361680A (zh) Neural network architecture search method, apparatus, device, and medium
CN112287166B (zh) Movie recommendation method and system based on improved deep belief network
US20110173145A1 (en) Classification of a document according to a weighted search tree created by genetic algorithms
CN115456202B (zh) Method, apparatus, device, and medium for improving learning performance of worker machine
JP2022078310A (ja) Image classification model generation method and apparatus, electronic device, storage medium, computer program, roadside device, and cloud control platform
CN112634992A (zh) Molecular property prediction method, training method for the model thereof, and related apparatus and device
KR20230107558A (ko) Model training and data augmentation methods, apparatus, electronic device, and storage medium
CN109255389B (zh) Equipment evaluation method, apparatus, device, and readable storage medium
CN114399025A (zh) Graph neural network interpretation method, system, terminal, and storage medium
JP2020126468A (ja) Learning method, learning program, and learning device
Chen et al. Hierarchical multi‐label classification based on over‐sampling and hierarchy constraint for gene function prediction
CN115412401B (zh) Method and apparatus for training virtual network embedding model and for virtual network embedding
CN116976461A (zh) Federated learning method, apparatus, device, and medium
WO2022252694A1 (zh) Neural network optimization method and apparatus
WO2022252596A1 (zh) Method for constructing AI ensemble model, and inference method and apparatus for AI ensemble model
Wang et al. Parameters optimization of classifier and feature selection based on improved artificial bee colony algorithm
JP7388574B2 (ja) Model estimation device, model estimation method, and model estimation program
WO2023202363A1 (zh) Trust evaluation method, apparatus, and device
Pawar et al. Anytime Learning of Sum-Product and Sum-Product-Max Networks
US20240078424A1 (en) Neural network arrangement

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21938687

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE