CN116304693A - Method and device for accelerating training of text processing model - Google Patents

Method and device for accelerating training of text processing model Download PDF

Info

Publication number
CN116304693A
Authority
CN
China
Prior art keywords
training
text
batch
processing model
texts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310197464.5A
Other languages
Chinese (zh)
Inventor
蒋敏
暴宇健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Longzhi Digital Technology Service Co Ltd
Original Assignee
Beijing Longzhi Digital Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Longzhi Digital Technology Service Co Ltd filed Critical Beijing Longzhi Digital Technology Service Co Ltd
Priority to CN202310197464.5A priority Critical patent/CN116304693A/en
Publication of CN116304693A publication Critical patent/CN116304693A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure relates to the technical field of text processing, and provides a method and an apparatus for accelerating text processing model training. The method comprises the following steps: acquiring a text training data set, wherein the text training data set comprises a plurality of training texts; sorting the plurality of training texts based on the text length of each training text to obtain a sorting result; dividing the training texts in the sorting result into a plurality of batches based on a batch size, wherein each batch contains a plurality of training texts and the batch size indicates the number of training texts used in each batch of training; and training the text processing model in multiple batches using the plurality of batches of training texts. With this technical means, the disclosure solves the problem in the prior art that training a text processing model with dynamic text padding takes too long.

Description

Method and device for accelerating training of text processing model
Technical Field
The disclosure relates to the technical field of text processing, in particular to a method and a device for accelerating text processing model training.
Background
A common algorithm for training a text processing model is dynamic text padding combined with stochastic gradient descent. Dynamic text padding pads all texts in a batch to the length of the longest text in that batch and then feeds the padded texts into the text processing model. However, on training sets whose text lengths vary widely, many texts are padded to an unnecessary length, which makes training take too long. For example, if one text in a batch has length 512 while the other texts have lengths around 100, dynamic text padding requires all texts in the batch to be padded to length 512; the padding adds no useful information, yet every text in the batch becomes overly long, so training takes too long.
In the process of implementing the disclosed concept, the inventors found that the related art has at least the following technical problem: training a text processing model with dynamic text padding takes too long.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a method, an apparatus, an electronic device, and a computer-readable storage medium for accelerating training of a text processing model, so as to solve the problem in the prior art that training a text processing model with dynamic text padding takes too long.
In a first aspect of the embodiments of the present disclosure, a method for accelerating training of a text processing model is provided, including: acquiring a text training data set, wherein the text training data set comprises a plurality of training texts; sorting the plurality of training texts based on the text length of each training text to obtain a sorting result; dividing the training texts in the sorting result into a plurality of batches based on a batch size, wherein each batch contains a plurality of training texts and the batch size indicates the number of training texts used in each batch of training; and training the text processing model in multiple batches using the plurality of batches of training texts.
In a second aspect of the embodiments of the present disclosure, an apparatus for accelerating training of a text processing model is provided, including: an acquisition module configured to acquire a text training data set, wherein the text training data set comprises a plurality of training texts; a sorting module configured to sort the plurality of training texts based on the text length of each training text to obtain a sorting result; a dividing module configured to divide the training texts in the sorting result into a plurality of batches based on a batch size, wherein each batch contains a plurality of training texts and the batch size indicates the number of training texts used in each batch of training; and a training module configured to train the text processing model in multiple batches using the plurality of batches of training texts.
In a third aspect of the embodiments of the present disclosure, an electronic device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects: a text training data set is acquired, wherein the text training data set comprises a plurality of training texts; the plurality of training texts are sorted based on the text length of each training text to obtain a sorting result; the training texts in the sorting result are divided into a plurality of batches based on a batch size, wherein each batch contains a plurality of training texts and the batch size indicates the number of training texts used in each batch of training; and the text processing model is trained in multiple batches using the plurality of batches of training texts. This solves the problem in the prior art that training a text processing model with dynamic text padding takes too long, and thus shortens the time needed to train the text processing model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flowchart (I) of a method for accelerating text processing model training provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart (II) of a method for accelerating text processing model training provided by an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an apparatus for accelerating training of a text processing model according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
Fig. 1 is a schematic flowchart (I) of a method for accelerating training of a text processing model according to an embodiment of the present disclosure. The method of Fig. 1 may be performed by a computer or a server, or by software running on a computer or a server. As shown in Fig. 1, the method for accelerating training of a text processing model includes:
S101, acquiring a text training data set, wherein the text training data set comprises a plurality of training texts;
S102, sorting the plurality of training texts based on the text length of each training text to obtain a sorting result;
S103, dividing the training texts in the sorting result into a plurality of batches based on a batch size, wherein each batch contains a plurality of training texts and the batch size indicates the number of training texts used in each batch of training;
S104, training the text processing model in multiple batches using the plurality of batches of training texts.
Specifically, the plurality of training texts may be sorted in descending order of text length. For example, if the batch size is 10, every 10 adjacent training texts in the sorting result, taken from front to back, form one batch. After the batches are obtained, the text processing model is trained batch by batch.
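As an illustration of the sort-then-divide step just described, a minimal Python sketch follows (the function and variable names are illustrative and not taken from the disclosure):

```python
from typing import List

def make_length_sorted_batches(texts: List[str], batch_size: int) -> List[List[str]]:
    """Sort training texts by length (longest first) and split the sorted
    sequence into consecutive batches of `batch_size` texts each."""
    ordered = sorted(texts, key=len, reverse=True)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# With batch_size = 10, every 10 adjacent texts in the sorted result form one
# batch, so texts of similar length end up in the same batch.
batches = make_length_sorted_batches(
    ["a very long training sentence", "mid length text", "short"], batch_size=2
)
```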
The text processing model may be any common NLP model, i.e., a natural language processing model.
According to the technical solution provided by the embodiments of the present disclosure, a text training data set is acquired, wherein the text training data set comprises a plurality of training texts; the plurality of training texts are sorted based on the text length of each training text to obtain a sorting result; the training texts in the sorting result are divided into a plurality of batches based on a batch size, wherein each batch contains a plurality of training texts and the batch size indicates the number of training texts used in each batch of training; and the text processing model is trained in multiple batches using the plurality of batches of training texts. This solves the problem in the prior art that training a text processing model with dynamic text padding takes too long, and thus shortens the time needed to train the text processing model.
Training the text processing model in multiple batches using the plurality of batches of training texts includes: determining the maximum text length among the training texts in each batch; padding each training text in each batch according to the maximum text length corresponding to that batch, so that after padding the text length of every training text in the batch equals the maximum text length corresponding to that batch; and training the text processing model in multiple batches using the padded batches of training texts.
Specifically, in each batch of training of the text processing model, the training texts of the batch must be input into the text processing model, and the input dimensions must be the same each time the text processing model is fed, i.e., the text lengths of the training texts in the batch must be consistent; therefore each training text in the batch needs to be padded. During padding, the shorter texts are padded to the length of the longest text. For example, if the text lengths of 8 training texts in a batch are 1, 2, 3, 4, 512, 512, 512, 512 in order, the maximum text length of the batch is 512; the last four training texts already have length 512 and need no padding, and only the first four training texts are padded to length 512. The unit of text length may be a word.
For example, suppose a text training data set contains 8 training texts with text lengths [1, 2, 3, 4, 512, 512, 512, 512], and the batch size for this training is 2. The existing dynamic text padding method divides the training texts into batches randomly, for instance into 4 batches in which each short text is paired with a long text; since model training requires all data in one batch to have equal length, the padded text lengths of the 4 batches become [[512, 512], [512, 512], [512, 512], [512, 512]]. In the embodiments of the present disclosure, the training texts are first sorted and then divided according to the batch size, giving 4 batches whose padded text lengths are [[2, 2], [4, 4], [512, 512], [512, 512]]. Obviously, the method of the embodiments of the present disclosure avoids padding texts to an unnecessary length and thus avoids overly long training times.
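The padding behaviour in the example above can be sketched as follows; the pad token value of 0 and the helper name are illustrative assumptions:

```python
from typing import List

def pad_batch(token_id_seqs: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Pad every sequence in one batch to the length of the longest
    sequence in that batch, as described above."""
    max_len = max(len(seq) for seq in token_id_seqs)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in token_id_seqs]

# Lengths [1, 2, 3, 4, 512, 512, 512, 512] with batch_size = 2:
# after sorting and chunking, the per-batch padded lengths become
# [[2, 2], [4, 4], [512, 512], [512, 512]] instead of 512 in every batch.
```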
Training the text processing model in multiple batches using the plurality of batches of training texts includes cyclically performing the following steps: training the text processing model on the i-th batch using the training texts of the i-th batch; judging whether i equals the preset number of batches; exiting the loop when i equals the preset number of batches; and when i is less than the preset number of batches, training the text processing model on the (i+1)-th batch using the training texts of the (i+1)-th batch, wherein the initial value of i is 1.
In this way, the text processing model is trained batch by batch using the training texts of the corresponding batch until i equals the preset number of batches and the loop exits. The text processing model obtained after exiting the loop is a trained model and can be used for text processing, including text classification, named entity recognition, and question-answering tasks.
Training the text processing model on the i-th batch using the training texts of the i-th batch includes: inputting the training texts of the i-th batch into the text processing model and outputting a processing result; calculating a loss value between the labels corresponding to the i-th batch and the processing result using a loss function; obtaining the gradient of each model parameter of the text processing model through a back-propagation algorithm according to the loss value; and updating each model parameter of the text processing model by gradient descent based on the gradients of the model parameters.
The back propagation algorithm and the gradient descent method are common training methods, and are not described in detail herein.
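A minimal PyTorch-style sketch of the per-batch update described above follows; the framework, cross-entropy loss, and plain SGD optimizer are assumptions chosen for illustration, since the disclosure does not prescribe a specific implementation:

```python
import torch
import torch.nn as nn

def train_one_pass(model: nn.Module, batches, labels_per_batch,
                   lr: float = 1e-3, device: str = "cpu") -> None:
    """Iterate over the pre-built, padded batches and perform one update per
    batch: forward pass, loss, back-propagation, gradient-descent step."""
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()                      # loss between labels and outputs
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for input_ids, labels in zip(batches, labels_per_batch):
        input_ids, labels = input_ids.to(device), labels.to(device)
        logits = model(input_ids)                          # processing result of this batch
        loss = criterion(logits, labels)                   # loss value for this batch
        optimizer.zero_grad()
        loss.backward()                                    # back-propagation: parameter gradients
        optimizer.step()                                   # gradient descent: update parameters
```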
Before the training texts in the sorting result are divided into a plurality of batches based on the batch size, the method further comprises: acquiring operation information of the hardware device used to train the text processing model; and determining the optimal batch size according to the operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model, wherein the optimizer includes Adam, AdamW, AdaGrad, and RMSProp.
The operation information of the hardware device includes benchmark scores reflecting its computing capability, graphics card memory, running memory, and the like. The operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model are all related to the batch size.
It can be understood that the operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model each correspond to a value range of the batch size. In the embodiments of the present disclosure, the optimal batch size is determined from the intersection of these four value ranges, which improves the training speed.
For example, different model types of text processing models and different optimizer types each have commonly used batch-size ranges; the maximum batch size supported by the hardware device can be determined from its operation information; and if the number of training texts in the text training data set is large, the batch size should not be too small, or training efficiency would suffer.
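A hedged sketch of the "intersection of value ranges" idea follows; the concrete candidate ranges and the preference for the largest feasible size are assumptions made only for illustration:

```python
from typing import List, Tuple

def optimal_batch_size(ranges: List[Tuple[int, int]]) -> int:
    """Intersect candidate (low, high) batch-size ranges and return the
    largest batch size that every range allows."""
    low = max(lo for lo, _ in ranges)
    high = min(hi for _, hi in ranges)
    if low > high:
        raise ValueError("candidate ranges share no common batch size")
    return high  # a larger feasible batch generally shortens training

# Hypothetical ranges from hardware limits, dataset size, model type,
# and optimizer type, respectively:
best = optimal_batch_size([(8, 128), (16, 256), (8, 64), (16, 96)])  # -> 64
```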
After the operation information of the hardware device used to train the text processing model is acquired, the method further comprises: determining an interval range of the batch size according to the operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model; and generating a batch size that changes dynamically during the multi-batch training of the text processing model, wherein the batch size varies within the interval range, and each change is controlled by a reference precision index, a reference generalization index, and the current precision index and current generalization index of the text processing model in the current batch of training.
The operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model each correspond to a batch-size value range, and the interval range of the batch size is determined from the intersection of these four value ranges. A dynamically changing batch size is designed that varies within this interval range; whether the batch size changes between adjacent batches of training is controlled by the reference precision index, the reference generalization index, and the current precision index and current generalization index of the text processing model in the current batch of training.
The reference precision index and the reference generalization index are, respectively, the precision index and the generalization index that the trained text processing model should ideally reach. The current precision index and the current generalization index are, respectively, the precision index and the generalization index of the text processing model computed after the current batch of training. The current precision index and current generalization index computed after the next batch of training should move closer to the reference precision index and reference generalization index, respectively, while maintaining the best balance between precision and generalization.
Existing methods can be used to compute the precision index and the generalization index and are not repeated here. For example, the generalization index may be represented by layer rotation, i.e., the change in the cosine of the angle between the weight vector of each layer of the neural network and its initialization.
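One possible reading of the dynamically changing batch size is sketched below; the update rule, the step of 8, and the comparison logic are assumptions made only to illustrate how the four indices could control each change:

```python
def next_batch_size(current: int, low: int, high: int,
                    precision: float, precision_ref: float,
                    generalization: float, generalization_ref: float,
                    step: int = 8) -> int:
    """Adjust the batch size within [low, high] after a batch of training,
    based on how the current precision and generalization indices compare
    with their reference values."""
    if precision < precision_ref and generalization >= generalization_ref:
        current += step   # precision lags: larger batches give stabler gradients
    elif generalization < generalization_ref and precision >= precision_ref:
        current -= step   # generalization lags: smaller, noisier batches help
    return max(low, min(high, current))
```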
Before the training texts in the sorting result are divided into a plurality of batches based on the batch size, the method further comprises: acquiring operation information of the hardware device used to train the text processing model; and determining the following hyperparameters according to the operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model: the number of iterations, the learning rate, and the batch size, wherein the determined set of hyperparameters is an optimal combination.
The method for determining this set of hyperparameters is similar to the method for determining the batch size above, except that the number of iterations, the learning rate, and the batch size are considered jointly so that the three are optimal as a whole. For example, the number of iterations, learning rate, and batch size may be 50, 0.5, and 100, or 40, 0.7, and 125, or 100, 0.6, and 50, respectively; if it is predicted or calculated that the model converges fastest when the number of iterations, learning rate, and batch size are 50, 0.5, and 100, then 50, 0.5, and 100 is the optimal combination.
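A minimal sketch of selecting a jointly optimal combination follows; the candidate triples repeat the example above, and the convergence estimator is a stub assumption standing in for a predictor or a short trial run:

```python
from typing import Callable, List, Tuple

Hyperparams = Tuple[int, float, int]  # (iterations, learning_rate, batch_size)

def pick_hyperparameters(candidates: List[Hyperparams],
                         estimated_cost: Callable[[Hyperparams], float]) -> Hyperparams:
    """Return the candidate combination with the lowest predicted training cost."""
    return min(candidates, key=estimated_cost)

# Candidates from the example in the text; the cost table below is a dummy
# stand-in so the example runs, not a real convergence model.
candidates = [(50, 0.5, 100), (40, 0.7, 125), (100, 0.6, 50)]
stub_cost = {candidates[0]: 1.0, candidates[1]: 1.8, candidates[2]: 1.5}
best = pick_hyperparameters(candidates, stub_cost.get)  # -> (50, 0.5, 100)
```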
Fig. 2 is a schematic flowchart (II) of a method for accelerating training of a text processing model according to an embodiment of the present disclosure. As shown in Fig. 2, the method for accelerating training of a text processing model includes:
S201, acquiring a text training data set, wherein the text training data set comprises a plurality of training texts;
S202, sorting the plurality of training texts based on the text length of each training text to obtain a sorting result;
S203, dividing the training texts in the sorting result into a plurality of batches based on a batch size, wherein each batch contains a plurality of training texts and the batch size indicates the number of training texts used in each batch of training;
S204, determining the maximum text length among the training texts in each batch;
S205, padding each training text in each batch according to the maximum text length corresponding to that batch, so that after padding the text length of every training text in the batch equals the maximum text length corresponding to that batch;
S206, training the text processing model on the i-th batch using the training texts of the i-th batch;
S207, judging whether i equals the preset number of batches; exiting the loop when i equals the preset number of batches, and adding 1 to i and returning to S206 when i is less than the preset number of batches, wherein the initial value of i is 1.
According to the technical solution provided by the embodiments of the present disclosure, a text training data set is acquired, wherein the text training data set comprises a plurality of training texts; the plurality of training texts are sorted based on the text length of each training text to obtain a sorting result; the training texts in the sorting result are divided into a plurality of batches based on a batch size, wherein each batch contains a plurality of training texts and the batch size indicates the number of training texts used in each batch of training; and the text processing model is trained in multiple batches using the plurality of batches of training texts. This solves the problem in the prior art that training a text processing model with dynamic text padding takes too long, and thus shortens the time needed to train the text processing model.
All of the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present disclosure, and details are not repeated here.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 3 is a schematic diagram of an apparatus for accelerating text processing model training provided in an embodiment of the present disclosure.
As shown in fig. 3, the apparatus for accelerating training of a text processing model includes:
an acquisition module 301 configured to acquire a text training dataset, wherein the text training dataset comprises a plurality of training texts;
a sorting module 302 configured to sort the plurality of training texts based on the text length of each training text to obtain a sorting result;
a dividing module 303 configured to divide the training texts in the sorting result into a plurality of batches based on a batch size, wherein each batch contains a plurality of training texts and the batch size indicates the number of training texts used in each batch of training; and
a training module 304 configured to train the text processing model in multiple batches using the plurality of batches of training texts.
Specifically, the plurality of training texts may be sorted in descending order of text length. For example, if the batch size is 10, every 10 adjacent training texts in the sorting result, taken from front to back, form one batch. After the batches are obtained, the text processing model is trained batch by batch.
The text processing model may be any common NLP model, i.e., a natural language processing model.
According to the technical solution provided by the embodiments of the present disclosure, a text training data set is acquired, wherein the text training data set comprises a plurality of training texts; the plurality of training texts are sorted based on the text length of each training text to obtain a sorting result; the training texts in the sorting result are divided into a plurality of batches based on a batch size, wherein each batch contains a plurality of training texts and the batch size indicates the number of training texts used in each batch of training; and the text processing model is trained in multiple batches using the plurality of batches of training texts. This solves the problem in the prior art that training a text processing model with dynamic text padding takes too long, and thus shortens the time needed to train the text processing model.
Optionally, the training module 304 is further configured to determine the maximum text length among the training texts in each batch; pad each training text in each batch according to the maximum text length corresponding to that batch, so that after padding the text length of every training text in the batch equals the maximum text length corresponding to that batch; and train the text processing model in multiple batches using the padded batches of training texts.
Specifically, in each batch of training of the text processing model, the training texts of the batch must be input into the text processing model, and the input dimensions must be the same each time the text processing model is fed, i.e., the text lengths of the training texts in the batch must be consistent; therefore each training text in the batch needs to be padded. During padding, the shorter texts are padded to the length of the longest text. For example, if the text lengths of 8 training texts in a batch are 1, 2, 3, 4, 512, 512, 512, 512 in order, the maximum text length of the batch is 512; the last four training texts already have length 512 and need no padding, and only the first four training texts are padded to length 512. The unit of text length may be a word.
For example, suppose a text training data set contains 8 training texts with text lengths [1, 2, 3, 4, 512, 512, 512, 512], and the batch size for this training is 2. The existing dynamic text padding method divides the training texts into batches randomly, for instance into 4 batches in which each short text is paired with a long text; since model training requires all data in one batch to have equal length, the padded text lengths of the 4 batches become [[512, 512], [512, 512], [512, 512], [512, 512]]. In the embodiments of the present disclosure, the training texts are first sorted and then divided according to the batch size, giving 4 batches whose padded text lengths are [[2, 2], [4, 4], [512, 512], [512, 512]]. Obviously, the method of the embodiments of the present disclosure avoids padding texts to an unnecessary length and thus avoids overly long training times.
Optionally, the training module 304 is further configured to perform the multiple batches of training of the text processing model by cyclically performing the following steps: training the text processing model on the i-th batch using the training texts of the i-th batch; judging whether i equals the preset number of batches; exiting the loop when i equals the preset number of batches; and when i is less than the preset number of batches, training the text processing model on the (i+1)-th batch using the training texts of the (i+1)-th batch, wherein the initial value of i is 1.
In this way, the text processing model is trained batch by batch using the training texts of the corresponding batch until i equals the preset number of batches and the loop exits. The text processing model obtained after exiting the loop is a trained model and can be used for text processing, including text classification, named entity recognition, and question-answering tasks.
Optionally, the training module 304 is further configured to input the training texts of the i-th batch into the text processing model and output a processing result; calculate a loss value between the labels corresponding to the i-th batch and the processing result using a loss function; obtain the gradient of each model parameter of the text processing model through a back-propagation algorithm according to the loss value; and update each model parameter of the text processing model by gradient descent based on the gradients of the model parameters.
The back propagation algorithm and the gradient descent method are common training methods, and are not described in detail herein.
Optionally, the dividing module 303 is further configured to acquire operation information of the hardware device used to train the text processing model, and determine the optimal batch size according to the operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model, wherein the optimizer includes Adam, AdamW, AdaGrad, and RMSProp.
The operation information of the hardware device includes benchmark scores reflecting its computing capability, graphics card memory, running memory, and the like. The operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model are all related to the batch size.
It can be understood that the operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model each correspond to a value range of the batch size. In the embodiments of the present disclosure, the optimal batch size is determined from the intersection of these four value ranges, which improves the training speed.
For example, different model types of text processing models and different optimizer types each have commonly used batch-size ranges; the maximum batch size supported by the hardware device can be determined from its operation information; and if the number of training texts in the text training data set is large, the batch size should not be too small, or training efficiency would suffer.
Optionally, the dividing module 303 is further configured to determine an interval range of the batch size according to the operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model, and to generate a batch size that changes dynamically during the multi-batch training of the text processing model, wherein the batch size varies within the interval range, and each change is controlled by a reference precision index, a reference generalization index, and the current precision index and current generalization index of the text processing model in the current batch of training.
The operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model each correspond to a batch-size value range, and the interval range of the batch size is determined from the intersection of these four value ranges. A dynamically changing batch size is designed that varies within this interval range; whether the batch size changes between adjacent batches of training is controlled by the reference precision index, the reference generalization index, and the current precision index and current generalization index of the text processing model in the current batch of training.
The reference precision index and the reference generalization index are, respectively, the precision index and the generalization index that the trained text processing model should ideally reach. The current precision index and the current generalization index are, respectively, the precision index and the generalization index of the text processing model computed after the current batch of training. The current precision index and current generalization index computed after the next batch of training should move closer to the reference precision index and reference generalization index, respectively, while maintaining the best balance between precision and generalization.
Existing methods can be used to compute the precision index and the generalization index and are not repeated here. For example, the generalization index may be represented by layer rotation, i.e., the change in the cosine of the angle between the weight vector of each layer of the neural network and its initialization.
Optionally, the dividing module 303 is further configured to acquire operation information of the hardware device used to train the text processing model, and determine the following hyperparameters according to the operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model: the number of iterations, the learning rate, and the batch size, wherein the determined set of hyperparameters is an optimal combination.
The method for determining this set of hyperparameters is similar to the method for determining the batch size above, except that the number of iterations, the learning rate, and the batch size are considered jointly so that the three are optimal as a whole. For example, the number of iterations, learning rate, and batch size may be 50, 0.5, and 100, or 40, 0.7, and 125, or 100, 0.6, and 50, respectively; if it is predicted or calculated that the model converges fastest when the number of iterations, learning rate, and batch size are 50, 0.5, and 100, then 50, 0.5, and 100 is the optimal combination.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present disclosure.
Fig. 4 is a schematic diagram of an electronic device 4 provided by an embodiment of the present disclosure. As shown in fig. 4, the electronic apparatus 4 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps of the various method embodiments described above are implemented by processor 401 when executing computer program 403. Alternatively, the processor 401, when executing the computer program 403, performs the functions of the modules/units in the above-described apparatus embodiments.
The electronic device 4 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or the like. The electronic device 4 may include, but is not limited to, the processor 401 and the memory 402. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the electronic device 4 and does not limit the electronic device 4, which may include more or fewer components than shown, or different components.
The processor 401 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or a memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 4. Memory 402 may also include both internal storage units and external storage devices of electronic device 4. The memory 402 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program may implement the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are merely intended to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or replace some of their technical features with equivalents; such modifications and replacements do not depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure and are intended to be included within the scope of the present disclosure.

Claims (10)

1. A method for accelerating training of a text processing model, comprising:
acquiring a text training data set, wherein the text training data set comprises a plurality of training texts;
sorting the plurality of training texts based on the text length of each training text to obtain a sorting result;
dividing the training texts in the sorting result into a plurality of batches based on a batch size, wherein each batch contains a plurality of training texts and the batch size indicates the number of training texts used in each batch of training; and
training the text processing model in multiple batches using the plurality of batches of training texts.
2. The method of claim 1, wherein the training the text processing model in multiple batches using the plurality of batches of training texts comprises:
determining a maximum text length among the training texts in each batch;
padding each training text in each batch according to the maximum text length corresponding to that batch, so that after the padding the text length of each training text in the batch is the maximum text length corresponding to that batch; and
training the text processing model in multiple batches using the padded plurality of batches of training texts.
3. The method of claim 1, wherein the training the text processing model in multiple batches using the plurality of batches of training texts comprises:
performing the multi-batch training of the text processing model by cyclically performing the following steps:
training the text processing model on the i-th batch using the training texts of the i-th batch;
judging whether i equals a preset number of batches;
exiting the loop when i equals the preset number of batches; and
when i is less than the preset number of batches, training the text processing model on the (i+1)-th batch using the training texts of the (i+1)-th batch, wherein the initial value of i is 1.
4. The method according to claim 3, wherein the training the text processing model on the i-th batch using the training texts of the i-th batch comprises:
inputting the training texts of the i-th batch into the text processing model and outputting a processing result;
calculating a loss value between labels corresponding to the i-th batch and the processing result using a loss function;
obtaining gradients of the model parameters of the text processing model through a back-propagation algorithm according to the loss value; and
updating the model parameters of the text processing model by gradient descent based on the gradients of the model parameters.
5. The method of claim 1, wherein before the dividing the training texts in the sorting result into a plurality of batches based on the batch size, each batch containing a plurality of training texts, the method further comprises:
acquiring operation information of a hardware device used to train the text processing model; and
determining an optimal batch size according to the operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model,
wherein the optimizer comprises: Adam, AdamW, AdaGrad, and RMSProp.
6. The method of claim 5, wherein after the acquiring the operation information of the hardware device used to train the text processing model, the method further comprises:
determining an interval range of the batch size according to the operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model; and
generating a batch size that changes dynamically during the multi-batch training of the text processing model, wherein the batch size varies within the interval range, and each change is controlled by a reference precision index, a reference generalization index, and a current precision index and a current generalization index of the text processing model in the current batch of training.
7. The method of claim 1, wherein before the dividing the training texts in the sorting result into a plurality of batches based on the batch size, each batch containing a plurality of training texts, the method further comprises:
acquiring operation information of a hardware device used to train the text processing model; and
determining the following hyperparameters according to the operation information, the number of training texts in the text training data set, the model type of the text processing model, and the type of optimizer used in training the text processing model: a number of iterations, a learning rate, and the batch size,
wherein the determined set of hyperparameters is an optimal combination.
8. An apparatus for accelerating training of a text processing model, comprising:
an acquisition module configured to acquire a text training dataset, wherein the text training dataset comprises a plurality of training texts;
a sorting module configured to sort the plurality of training texts based on the text length of each training text to obtain a sorting result;
a dividing module configured to divide the training texts in the sorting result into a plurality of batches based on a batch size, wherein each batch contains a plurality of training texts and the batch size indicates the number of training texts used in each batch of training; and
a training module configured to train the text processing model in multiple batches using the plurality of batches of training texts.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.
CN202310197464.5A 2023-03-01 2023-03-01 Method and device for accelerating training of text processing model Pending CN116304693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310197464.5A CN116304693A (en) 2023-03-01 2023-03-01 Method and device for accelerating training of text processing model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310197464.5A CN116304693A (en) 2023-03-01 2023-03-01 Method and device for accelerating training of text processing model

Publications (1)

Publication Number Publication Date
CN116304693A true CN116304693A (en) 2023-06-23

Family

ID=86784589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310197464.5A Pending CN116304693A (en) 2023-03-01 2023-03-01 Method and device for accelerating training of text processing model

Country Status (1)

Country Link
CN (1) CN116304693A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117313892A (en) * 2023-09-26 2023-12-29 上海悦普网络科技有限公司 Training device and method for text processing model

Similar Documents

Publication Publication Date Title
US20230252327A1 (en) Neural architecture search for convolutional neural networks
CN108710613B (en) Text similarity obtaining method, terminal device and medium
US20200265315A1 (en) Neural architecture search
WO2018081563A9 (en) Neural architecture search
CN110287961A (en) Chinese word cutting method, electronic device and readable storage medium storing program for executing
CN109446430A (en) Method, apparatus, computer equipment and the readable storage medium storing program for executing of Products Show
WO2018201151A1 (en) Neural network optimizer search
CN108363788B (en) Post intelligent ranking method and device and computer readable storage medium
CN116362351B (en) Method and device for training pre-training language model by using noise disturbance
EP3961384A1 (en) Automatic derivation of software engineering artifact attributes from product or service development concepts
CN110263328B (en) Discipline capability type labeling method and device, storage medium and terminal equipment
CN107133190A (en) The training method and training system of a kind of machine learning system
CN111401940A (en) Feature prediction method, feature prediction device, electronic device, and storage medium
CN116304693A (en) Method and device for accelerating training of text processing model
CN114298329A (en) Model training method, device, equipment and storage medium
CN109753647A (en) The partitioning method and device of paragraph
CN110046344B (en) Method for adding separator and terminal equipment
CN116542328B (en) Knowledge distillation method and device for CTR prediction model
CN116595130B (en) Corpus expansion method and device under multiple tasks based on small language model
US10789510B2 (en) Dynamic minibatch sizes
CN116521527A (en) Test case recommendation method and device
CN112765936B (en) Training method and device for operation based on language model
CN116341640B (en) Text processing model training method and device
CN113297835B (en) Text similarity calculation method, device, equipment and storage medium
US11313694B2 (en) Method and apparatus for recommending travel way

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination