CN111695689B - Natural language processing method, device, equipment and readable storage medium - Google Patents

Natural language processing method, device, equipment and readable storage medium

Info

Publication number
CN111695689B
Authority
CN
China
Prior art keywords
model
natural language
language processing
target
model parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010542341.7A
Other languages
Chinese (zh)
Other versions
CN111695689A (en)
Inventor
赖志权
杨越童
李东升
蔡蕾
张立志
冉浙江
梅松竹
王庆林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202010542341.7A priority Critical patent/CN111695689B/en
Publication of CN111695689A publication Critical patent/CN111695689A/en
Application granted granted Critical
Publication of CN111695689B publication Critical patent/CN111695689B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a natural language processing method, which comprises the following steps: receiving natural language information to be processed; inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average; and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the language processing model. By applying the technical scheme provided by the embodiment of the invention, the accuracy and the processing efficiency of the language processing model for natural language processing are improved. The invention also discloses a natural language processing device, equipment and a storage medium, which have corresponding technical effects.

Description

Natural language processing method, device, equipment and readable storage medium
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a method, an apparatus, a device, and a computer readable storage medium for processing natural language.
Background
At present, deep learning methods have been widely used for solving natural language processing problems. As research into natural language processing deepens, the deep learning models adopted become increasingly complex, and the training data on which they depend becomes increasingly large. Moreover, in natural language processing applications such as machine translation and dialogue generation, as the language environment and the training data change, the model must frequently undergo incremental training to ensure the accuracy and effectiveness of natural language processing by the language processing model. Thus, when a single computing device is used, these deep learning algorithms take a long time to train the deep learning model to convergence, which cannot meet the demand of natural language processing applications for rapid model iteration. It is therefore desirable to train the language processing model on multiple computing devices in a distributed training manner, which can reduce the time spent on training.
One of the key problems of distributed training is to complete the synchronization of gradients or model parameters efficiently and to reduce the communication overhead, thereby improving the utilization of the computing devices and accelerating training. Most existing distributed training methods adopt a synchronous gradient-synchronization mode to keep the model parameters consistent on each node. In this mode, every training iteration requires each distributed node to complete the transfer and synchronization of gradients. This distributed training method is widely used for the distributed training of image classification models, and experiments have verified that it achieves good results.
In current research, trained deep neural networks are mainly used for solving computer vision problems, a typical representative being the deep neural networks used for image classification; such networks are characterized in that all of their gradient data can be represented by dense matrices. Few studies have attempted to train the language models used in natural language processing in a distributed manner; these models are characterized in that part of their gradient data is sparse and is represented by the sparse matrices commonly used in existing distributed deep learning frameworks. Transferring gradient data represented by dense matrices among the computing nodes of a cluster can be realized efficiently by existing collective communication libraries, whereas gradient data represented by sparse matrices is difficult to transmit efficiently within a cluster. Therefore, when a sparse model in a field such as natural language processing is trained in a distributed manner, the deep neural network model takes a long time to converge to the target accuracy, and in the natural language processing process the accuracy of natural language processing by the language processing model is low and the processing efficiency is low.
In summary, how to effectively solve the problems of low accuracy and low processing efficiency of the existing language processing model in performing the natural language processing is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a natural language processing method which improves the accuracy and the processing efficiency of natural language processing by a language processing model; another object of the present invention is to provide a natural language processing apparatus, device, and computer-readable storage medium.
In order to solve the technical problems, the invention provides the following technical scheme:
a natural language processing method, comprising:
receiving natural language information to be processed;
inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average;
and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model.
In one embodiment of the present invention, the process of obtaining the target natural language processing model by performing model training through the model aggregation algorithm of the model average includes:
preprocessing an original natural language processing model to obtain the update frequency of each model parameter in the original natural language processing model;
grouping the model parameters according to the update frequencies to obtain model parameter groups;
determining a synchronization interval of each model parameter set respectively;
and in the model iteration process, carrying out synchronous operation on each model parameter in a corresponding model parameter set of the original natural language processing model in each training node according to each synchronous interval to obtain the target natural language processing model.
In a specific embodiment of the present invention, preprocessing an original natural language processing model to obtain an update frequency of each model parameter in the original natural language processing model, including:
selecting a first preset number of data sets from the training data set;
sequentially inputting each data set into the original natural language processing model to perform the first preset number of model iteration operations on the original natural language processing model;
forward computing and back-propagating the original natural language processing model aiming at each data group to obtain a gradient value of each model parameter in the original natural language processing model;
carrying out statistical operation on non-zero values in the gradient values in each model iteration by using an indication function to obtain the corresponding gradient updating effective times of each model parameter in each model iteration;
and respectively calculating the proportion of the effective times of the gradient update to the preset times to obtain the update frequency corresponding to each model parameter.
In one embodiment of the present invention, grouping each of the model parameters according to each of the update frequencies to obtain each of the model parameter sets includes:
inputting each model parameter into a preset sortable container;
sorting the model parameters by using the sorting container according to the update frequency corresponding to the model parameters respectively to obtain a sorting result;
and dividing each model parameter into a second preset number of model parameter groups according to the sorting result.
In a specific embodiment of the present invention, determining the synchronization interval of each of the model parameter sets includes:
calculating an average update frequency of the update frequencies of the model parameters in each model parameter set for each model parameter set;
and calculating the synchronization interval of each model parameter group according to each average updating frequency.
In a specific embodiment of the present invention, in a model iteration process, performing a synchronization operation on each model parameter in a corresponding model parameter set of the original natural language processing model in each training node according to each synchronization interval, including:
selecting a target training node from the training nodes;
initializing each model parameter in the original natural language processing model in each target training node to obtain an initialization result;
broadcasting the initialization result to other training nodes except the target training node by utilizing the target training node;
initializing the iteration times of the model;
in the model iteration process, a remainder operation is carried out on the accumulated result of the iteration times of each model by utilizing each synchronization interval;
and carrying out synchronous operation on each model parameter in the target model parameter set corresponding to the synchronous interval with zero remainder.
In one embodiment of the present invention, performing a synchronization operation on each of the model parameters in a set of target model parameters corresponding to a synchronization interval with zero remainder includes:
respectively obtaining target model parameter sets in the training nodes;
respectively carrying out average value calculation on corresponding model parameters in each target model parameter group to obtain average model parameters;
setting each model parameter in the target model parameter set in each training node as each average model parameter.
A natural language processing apparatus, comprising:
the information receiving module is used for receiving natural language information to be processed;
the information input module is used for inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average;
and the information processing module is used for carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model.
A natural language processing device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the natural language processing method as described above when executing the computer program.
A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of a natural language processing method as described above.
By applying the method provided by the embodiment of the invention, the natural language information to be processed is received; inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average; and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model. The target natural language processing model is obtained through distributed training by using a model average model aggregation algorithm, the model average model aggregation algorithm is suitable for training the language processing model which is expressed by a sparse matrix commonly used in a distributed deep learning framework, the model training time is shortened greatly, and the accuracy and the processing efficiency of the natural language processing by the language processing model are improved.
Correspondingly, the embodiment of the invention also provides a natural language processing device, a device and a computer readable storage medium corresponding to the natural language processing method, which have the technical effects and are not repeated herein.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an implementation of a natural language processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another implementation of a natural language processing method according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating a natural language processing device according to an embodiment of the present invention;
fig. 4 is a block diagram of a natural language processing device according to an embodiment of the present invention.
Detailed Description
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Embodiment one:
referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of a natural language processing method according to an embodiment of the present invention, where the method may include the following steps:
s101: and receiving the natural language information to be processed.
When the natural language processing is needed, the natural language information to be processed is sent to the processing center. The processing center receives natural language information to be processed. The natural language information to be processed may be sentences to be translated, sentences of a dialog to be generated, and the like.
S102: inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average.
And carrying out distributed training on the original natural language processing model by using a model aggregation algorithm of model average in advance to obtain a target natural language processing model. After receiving the natural language information to be processed, the processing center inputs the natural language information to be processed into the target natural language processing model. When the natural language information to be processed is the sentence to be translated, carrying out distributed training on the original machine translation model by using a model aggregation algorithm of model average in advance to obtain a target machine translation model, and inputting the sentence to be translated into the target machine translation model; when the natural language information to be processed is a sentence of the dialogue to be generated, the original dialogue generation model is subjected to distributed training by using a model aggregation algorithm of model average in advance to obtain a target dialogue generation model, and the sentence of the dialogue to be generated is input into the target dialogue generation model.
S103: and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model.
After the natural language information to be processed is input into the target natural language processing model, the target natural language processing model is utilized to perform the corresponding natural language understanding or natural language generation operation on the natural language information to be processed. Continuing the above example, when the natural language information to be processed is a sentence to be translated, a translation operation is performed on the sentence to be translated by using the target machine translation model; when the natural language information to be processed is a sentence of the dialogue to be generated, a dialogue generation operation is performed on the sentence of the dialogue to be generated by using the target dialogue generation model. The model aggregation algorithm of model average is suitable for training language processing models that are represented by the sparse matrices commonly used in distributed deep learning frameworks, greatly shortens the model training time, and improves the accuracy and processing efficiency of natural language processing by the language processing model.
It should be noted that, the target natural language processing model is not limited to processing sentences to be translated and sentences of dialog to be generated, and can also be used for information extraction, text emotion analysis, personalized recommendation and the like, and the corresponding target natural language processing model is trained in advance according to different application scenes.
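As an illustration of S103, the following is a minimal sketch only, not code from the patent: the model objects, their generate method, and the task names are hypothetical stand-ins for whichever target natural language processing models were trained in advance with the model-average aggregation algorithm for each application scenario.

```python
def handle_request(info, task, target_models):
    """Route to-be-processed natural language information to the target natural
    language processing model that was trained in advance for this scenario."""
    if task not in target_models:
        raise ValueError(f"no target natural language processing model trained for task: {task}")
    model = target_models[task]          # e.g. "translation" or "dialogue"
    return model.generate(info)          # natural language understanding or generation result
```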
By applying the method provided by the embodiment of the invention, the natural language information to be processed is received; inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average; and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model. The target natural language processing model is obtained through distributed training by using a model average model aggregation algorithm, the model average model aggregation algorithm is suitable for training the language processing model which is expressed by a sparse matrix commonly used in a distributed deep learning framework, the model training time is shortened greatly, and the accuracy and the processing efficiency of the natural language processing by the language processing model are improved.
It should be noted that, based on the first embodiment, the embodiment of the present invention further provides a corresponding improvement scheme. The following embodiments relate to the same steps as those in the first embodiment or the steps corresponding to the first embodiment, and the corresponding beneficial effects can also be referred to each other, so that the following modified embodiments will not be repeated.
Embodiment two:
referring to fig. 2, fig. 2 is a flowchart of another implementation of a natural language processing method according to an embodiment of the present invention, where the method may include the following steps:
s201: and preprocessing the original natural language processing model to obtain the update frequency of each model parameter in the original natural language processing model.
In the model iteration process, an original natural language processing model is obtained, and the original natural language processing model is preprocessed to obtain the updating frequency of each model parameter in the original natural language processing model.
In one embodiment of the present invention, step S201 may include the steps of:
step one: selecting a first preset number of data sets from the training data set;
step two: sequentially inputting each data set into the original natural language processing model to perform a first preset number of model iteration operations on the original natural language processing model;
step three: forward computing and back-propagating the original natural language processing model aiming at each data group to obtain a gradient value of each model parameter in the original natural language processing model;
step four: and carrying out statistical operation on non-zero values in gradient values in each model iteration by using an indication function to obtain the corresponding gradient updating effective times of each model parameter in each model iteration.
Step five: and respectively calculating the proportion of the effective times of each gradient update to the preset times to obtain the update frequency corresponding to each model parameter.
For convenience of description, the above five steps are described in combination.
Select a first preset number of data groups from a pre-acquired training data set, for example m data groups, and sequentially input each data group into the original natural language processing model, each data group corresponding to one iteration of the original natural language processing model, thereby performing the first preset number of model iteration operations on the original natural language processing model. For each data group, forward computation and back-propagation are performed on the original natural language processing model to obtain the gradient value g_i^t of each model parameter in the original natural language processing model, where g_i^t denotes the gradient value of the i-th model parameter after the t-th iteration.

A statistical operation is performed, using an indicator function I, on the non-zero values among the gradient values in each model iteration, so as to obtain the number of effective gradient updates corresponding to each model parameter, i.e. whether the gradient update of each model parameter in each iteration is effective or not. The indicator function I is defined as follows:

I(g_i^t) = 1 if g_i^t ≠ 0, and I(g_i^t) = 0 otherwise.

When I(g_i^t) = 1, the corresponding gradient update is effective.

The proportion of the number of effective gradient updates to the preset number of iterations is then calculated for each model parameter to obtain the update frequency α_i corresponding to that model parameter:

α_i = ( Σ_{t=1}^{m} I(g_i^t) ) / m
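A minimal sketch of this preprocessing step follows; it is an assumption-laden example rather than the patent's implementation, and compute_gradients is a hypothetical helper that performs one forward computation and back-propagation and returns the per-parameter gradient arrays.

```python
import numpy as np

def estimate_update_frequencies(model, compute_gradients, warmup_batches):
    """Run m = len(warmup_batches) iterations without updating the model, count
    for every parameter how often its gradient is non-zero (the indicator
    function I), and return the update frequency alpha_i = n_i / m."""
    m = len(warmup_batches)
    valid_counts = None
    for batch in warmup_batches:
        grads = compute_gradients(model, batch)             # forward computation + back-propagation
        flat = np.concatenate([np.ravel(g) for g in grads])
        effective = (flat != 0).astype(np.int64)            # I(g_i^t): 1 when the gradient update is effective
        valid_counts = effective if valid_counts is None else valid_counts + effective
    return valid_counts / m                                 # update frequency alpha_i per parameter
```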
S202: and grouping the model parameters according to the updating frequencies to obtain the model parameter groups.
Because the language processing model is represented by a sparse matrix in the distributed deep learning framework, after the update frequency of each model parameter in the original natural language processing model is obtained, grouping operation is carried out on each model parameter according to each update frequency, so as to obtain each model parameter group.
In one embodiment of the present invention, step S202 may include the steps of:
step one: inputting each model parameter into a preset sortable container;
step two: sequencing the model parameters by utilizing a sequencing container according to the update frequency corresponding to the model parameters respectively to obtain a sequencing result;
step three: and dividing each model parameter into a second preset number of model parameter groups according to the sorting result.
For convenience of description, the above three steps are described in combination.
A sortable container is deployed in advance, and each model parameter is input into the preset sortable container. The sparsity degree of each model parameter, namely its update frequency α_i, is obtained, and the model parameters are sorted by the sortable container according to their corresponding update frequencies to obtain a sorting result. Each model parameter is then divided into a second preset number of model parameter groups according to the sorting result. The second preset number is denoted by a preset hyper-parameter q, and the i-th group of model parameters with similar sparsity degree is denoted by p_i, so the set of model parameter groups can be represented as:

P = {p_1, p_2, …, p_q}
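A minimal sketch of this grouping step, assuming the sortable container is simply a sorted Python list and that the sorted parameters are cut into q groups of roughly equal size (the equal-size split is an assumption; the patent only requires the division to follow the sorting result):

```python
def group_parameters_by_frequency(update_freqs, q):
    """Sort parameter indices by their update frequency and divide the sorted
    sequence into q groups, so that model parameters with similar sparsity
    (similar alpha_i) end up in the same group p_i."""
    order = sorted(range(len(update_freqs)), key=lambda i: update_freqs[i])
    group_size = -(-len(order) // q)                        # ceiling division
    return [order[j:j + group_size] for j in range(0, len(order), group_size)]
```

For example, group_parameters_by_frequency(alpha, q=4) returns four lists of parameter indices ordered from the sparsest group p_1 to the most frequently updated group p_4.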
s203: the synchronization interval for each set of model parameters is determined separately.
After each model parameter set is obtained, the synchronization interval of each model parameter set is determined separately.
In one embodiment of the present invention, step S203 may include the steps of:
step one: calculating an average update frequency of the update frequencies of the model parameters in the model parameter sets for each model parameter set;
step two: and calculating the synchronization interval of each model parameter group according to each average updating frequency.
For convenience of description, the above two steps are described in combination.
After the update frequency of each model parameter is obtained and the model parameters are grouped into model parameter sets, the average update frequency of the update frequencies of the model parameters in each model parameter set p_i is calculated:

ᾱ_{p_i} = ||α_{p_i}||_1 / |p_i|

where ||·||_1 denotes the 1-norm calculation, i.e. the sum of the update frequencies of the model parameters in the group, and |p_i| is the number of model parameters in the group. The synchronization interval of each model parameter set is then calculated according to its average update frequency.
When determining the synchronization interval of each model parameter set, the synchronization interval k_i of each model parameter set is calculated based on its average update frequency, for example as

k_i = λ / ᾱ_{p_i}, rounded up to an integer,

where k_i is the synchronization interval corresponding to the i-th model parameter set, so that model parameter sets with lower update frequencies (sparser parameters) are synchronized less often. K denotes the synchronization interval set formed by the synchronization intervals corresponding to the respective model parameter sets:

K = {k_1, k_2, …, k_q}

λ is a synchronization interval setting coefficient, with which the implementer of the invention can dynamically adjust the synchronization intervals of different sparse deep neural network models based on prior knowledge.
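A minimal sketch of this interval computation, under the assumption stated above that k_i is inversely proportional to the group's average update frequency and scaled by λ (clamping each interval to at least 1 is an added safeguard):

```python
import math

def synchronization_intervals(update_freqs, groups, lam):
    """For every model parameter group p_i, compute its average update frequency
    and a synchronization interval k_i = ceil(lam / avg), so that sparsely
    updated groups are synchronized less often; lam scales all intervals."""
    intervals = []
    for p in groups:
        avg = sum(update_freqs[i] for i in p) / len(p)      # average update frequency of group p_i
        intervals.append(max(1, math.ceil(lam / max(avg, 1e-12))))
    return intervals
```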
S204: and in the model iteration process, performing synchronous operation on each model parameter in the corresponding model parameter set of the original natural language processing model in each training node according to each synchronous interval to obtain the target natural language processing model.
And in the model iteration process, performing synchronous operation on each model parameter in the corresponding model parameter set of the original natural language processing model in each training node according to each synchronous interval to obtain the target natural language processing model.
In one embodiment of the present invention, step S204 may include the steps of:
step one: selecting a target training node from the training nodes;
step two: initializing each model parameter in the original natural language processing model in each target training node to obtain an initialization result;
step three: broadcasting an initialization result to other training nodes except the target training node by using the target training node;
step four: initializing the iteration times of the model;
step five: in the model iteration process, the remainder operation is carried out on the accumulated results of the iteration times of each model by utilizing each synchronous interval;
step six: and carrying out synchronous operation on each model parameter in the target model parameter set corresponding to the synchronous interval with zero remainder to obtain the target natural language processing model.
For convenience of description, the above six steps are described in combination.
And selecting target training nodes from the training nodes, and initializing model parameters in the original natural language processing model in each target training node to obtain an initialization result.
The target training node may be any training node in each training node, for example, each training node may be numbered in advance, the numbers are respectively 0 and 1 and … …, and the training node with the number 0 is selected as the target training node. Initializing each model parameter in the original natural language processing model in each target training node to obtain an initialization result. Broadcasting the initialization result to other training nodes except the target training node by using the target training node. Initializing the iteration times of the model, for example, initializing the iteration times to 0 before carrying out iterative training on an original natural language processing model, and respectively carrying out remainder taking operation on the accumulated result of the iteration times t of each model by utilizing each synchronous interval in the model iteration process, wherein the remainder taking operation can be calculated by the following formula:
t mod k_i
and carrying out synchronous operation on each model parameter in the target model parameter set corresponding to the synchronous interval with zero remainder to obtain the target natural language processing model.
In a specific embodiment of the present invention, the synchronization operation for each model parameter in the target model parameter set corresponding to the synchronization interval with zero remainder may include the following steps:
step one: respectively acquiring target model parameter sets in all training nodes;
step two: respectively carrying out average value calculation on corresponding model parameters in each target model parameter group to obtain each average model parameter;
step three: setting each model parameter in the target model parameter set in each training node as each average model parameter to obtain a target natural language processing model.
For convenience of description, the above three steps are described in combination.
Respectively obtaining target model parameter sets in all training nodes, respectively carrying out mean value calculation on corresponding model parameters in the target model parameter sets to obtain average model parameters, setting the model parameters in the target model parameter sets in all training nodes as the average model parameters, and obtaining the target natural language processing model.
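Putting S204 together, the following is a sketch of the grouped model-averaging loop using torch.distributed; the choice of PyTorch and its collective primitives is an assumption (the patent does not name a communication library), and model(**batch) returning the training loss is likewise an assumed interface.

```python
import torch
import torch.distributed as dist

def train_with_grouped_model_averaging(model, param_groups, intervals, data_loader, optimizer):
    """Rank 0 acts as the target training node: its initialized parameters are
    broadcast to all other nodes, then each parameter group p_i is averaged
    across nodes whenever the iteration counter t satisfies t mod k_i == 0."""
    params = list(model.parameters())
    for p in params:
        dist.broadcast(p.data, src=0)                       # broadcast the initialization result
    world_size = dist.get_world_size()
    t = 0                                                   # initialize the model iteration counter
    for batch in data_loader:
        optimizer.zero_grad()
        loss = model(**batch)                               # assumed: the model returns its training loss
        loss.backward()
        optimizer.step()
        t += 1
        for group, k in zip(param_groups, intervals):       # group: indices into `params`
            if t % k == 0:                                  # remainder of t with respect to k_i is zero
                for idx in group:
                    dist.all_reduce(params[idx].data, op=dist.ReduceOp.SUM)
                    params[idx].data.div_(world_size)       # model averaging: set to the mean across nodes
    return model
```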
Steps S201 to S203 complete the solution of the key parameters P and K, and they can be computed offline on a single machine using only one training node. Since the model is not trained in steps S201 to S203, the model parameters do not need to be modified; only the corresponding gradient values are calculated on different batches of training data, so the time consumption is much lower than that of ordinary single-machine training. For example, a single server with an Intel Xeon CPU and four NVIDIA RTX 2080Ti GPUs can be selected, and one GPU in the server can be used to complete steps S201 to S203 for the LM1b model, which takes little time. Empirically, the time consumed by steps S201 through S203 is negligible relative to the overall training time.
Compared with the existing distributed training modes for language processing models, when the method of the invention is used to perform distributed training of the language processing model on a single server and on two servers, the training time is greatly shortened.
S205: and receiving the natural language information to be processed.
S206: inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average.
S207: and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model.
Corresponding to the above method embodiment, the embodiment of the present invention further provides a natural language processing device, where the natural language processing device described below and the natural language processing method described above may be referred to correspondingly.
Referring to fig. 3, fig. 3 is a block diagram illustrating a natural language processing device according to an embodiment of the present invention, where the device may include:
an information receiving module 31 for receiving natural language information to be processed;
an information input module 32 for inputting the natural language information to be processed into the target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average;
the information processing module 33 is configured to perform corresponding natural language understanding or natural language generating operation on the to-be-processed natural language information by using the target natural language processing model.
The device provided by the embodiment of the invention is applied to receive the natural language information to be processed; inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average; and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the language processing model. The target natural language processing model is obtained through distributed training by using a model average model aggregation algorithm, the model average model aggregation algorithm is suitable for training the language processing model which is expressed by a sparse matrix commonly used in a distributed deep learning framework, the model training time is shortened greatly, and the accuracy and the processing efficiency of the natural language processing by the language processing model are improved.
In one embodiment of the present invention, the apparatus includes a model training module comprising:
the parameter updating frequency obtaining sub-module is used for preprocessing the original natural language processing model to obtain the updating frequency of each model parameter in the original natural language processing model;
the parameter set obtaining submodule is used for grouping the model parameters according to the updating frequencies to obtain the model parameter sets;
the synchronization interval determining submodule is used for respectively determining the synchronization interval of each model parameter set;
and the parameter synchronization sub-module is used for carrying out synchronous operation on each model parameter in the corresponding model parameter set of the original natural language processing model in each training node according to each synchronous interval in the model iteration process to obtain the target natural language processing model.
In a specific embodiment of the present invention, the parameter update frequency obtaining submodule includes:
a data set selecting unit for selecting a first preset number of data sets from the training data set;
the model iteration unit is used for sequentially inputting each data set into the original natural language processing model so as to perform a first preset number of model iteration operations on the original natural language processing model;
the gradient value obtaining unit is used for carrying out forward calculation and backward propagation on the original natural language processing model aiming at each data group to obtain a gradient value of each model parameter in the original natural language processing model;
the effective times obtaining unit is used for carrying out statistics operation on non-zero values in gradient values in each model iteration by using an indication function to obtain gradient updating effective times corresponding to each model parameter in each model iteration respectively;
the updating frequency obtaining unit is used for respectively calculating the proportion of the effective times of each gradient updating to the preset times to obtain the updating frequency corresponding to each model parameter.
In a specific embodiment of the present invention, the parameter set obtaining submodule includes:
a parameter input unit for inputting each model parameter into a preset sortable container;
the sequencing result obtaining unit is used for sequencing the model parameters by utilizing the sequencing container according to the update frequency corresponding to the model parameters respectively to obtain a sequencing result;
and the parameter set dividing unit is used for dividing each model parameter into a second preset number of model parameter sets according to the sorting result.
In one embodiment of the present invention, the synchronization interval determination submodule includes:
an average update frequency calculation unit configured to calculate, for each model parameter group, an average update frequency of update frequencies of model parameters in the model parameter group;
and the synchronization interval calculation unit is used for calculating the synchronization interval of each model parameter set according to each average update frequency.
In one embodiment of the present invention, the parameter synchronization sub-module includes:
the node selection unit is used for selecting a target training node from all the training nodes;
the parameter initializing unit is used for initializing each model parameter in the original natural language processing model in each target training node to obtain an initializing result;
the broadcasting unit is used for broadcasting the initialization result to other training nodes except the target training node by using the target training node;
the iteration number initializing unit is used for initializing the iteration number of the model;
the remainder taking unit is used for taking remainder of the accumulated results of the iteration times of each model by utilizing each synchronous interval in the model iteration process;
and the parameter synchronization unit is used for performing synchronization operation on each model parameter in the target model parameter set corresponding to the synchronization interval with zero remainder.
In a specific embodiment of the present invention, the parameter synchronization unit includes:
the parameter set acquisition subunit is used for respectively acquiring the target model parameter sets in the training nodes;
the parameter average value obtaining subunit is used for respectively carrying out average value calculation on corresponding model parameters in each target model parameter group to obtain each average model parameter;
and the parameter setting subunit is used for setting each model parameter in the target model parameter set in each training node as each average model parameter.
Corresponding to the above method embodiment, referring to fig. 4, fig. 4 is a schematic diagram of a natural language processing device provided by the present invention, where the device may include:
a memory 41 for storing a computer program;
the processor 42 is configured to execute the computer program stored in the memory 41, and implement the following steps:
receiving natural language information to be processed; inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average; and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model.
For the description of the apparatus provided by the present invention, please refer to the above method embodiment, and the description of the present invention is omitted herein.
Corresponding to the above method embodiments, the present invention also provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of:
receiving natural language information to be processed; inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average; and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model.
The computer readable storage medium may include: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
For the description of the computer-readable storage medium provided by the present invention, refer to the above method embodiments, and the disclosure is not repeated here.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. The apparatus, device and computer readable storage medium of the embodiments are described more simply because they correspond to the methods of the embodiments, and the description thereof will be given with reference to the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, but the description of the examples above is only for aiding in understanding the technical solution of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims (8)

1. A method of natural language processing, comprising:
receiving natural language information to be processed;
inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average;
performing corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model;
the process of obtaining the target natural language processing model through model training by the model aggregation algorithm of the model average comprises the following steps:
preprocessing an original natural language processing model to obtain the update frequency of each model parameter in the original natural language processing model;
grouping the model parameters according to the update frequencies to obtain model parameter groups;
determining a synchronization interval of each model parameter set respectively;
in the model iteration process, performing synchronous operation on each model parameter in a corresponding model parameter set of the original natural language processing model in each training node according to each synchronous interval to obtain the target natural language processing model;
and grouping the model parameters according to the update frequencies to obtain model parameter sets, wherein the grouping comprises the following steps:
inputting each model parameter into a preset sortable container;
sorting the model parameters by using the sorting container according to the update frequency corresponding to the model parameters respectively to obtain a sorting result;
and dividing each model parameter into a second preset number of model parameter groups according to the sorting result.
2. The method according to claim 1, wherein preprocessing an original natural language processing model to obtain an update frequency of each model parameter in the original natural language processing model, comprises:
selecting a first preset number of data sets from the training data set;
sequentially inputting each data set into the original natural language processing model to perform the first preset number of model iteration operations on the original natural language processing model;
forward computing and back-propagating the original natural language processing model aiming at each data group to obtain a gradient value of each model parameter in the original natural language processing model;
carrying out statistical operation on non-zero values in the gradient values in each model iteration by using an indication function to obtain the corresponding gradient updating effective times of each model parameter in each model iteration;
and respectively calculating the proportion of the effective times of the gradient update to the preset times to obtain the update frequency corresponding to each model parameter.
3. The natural language processing method of claim 1, wherein determining the synchronization interval of each of the model parameter sets, respectively, comprises:
calculating an average update frequency of the update frequencies of the model parameters in each model parameter set for each model parameter set;
and calculating the synchronization interval of each model parameter group according to each average updating frequency.
4. A natural language processing method according to any one of claims 1 to 3, wherein in a model iteration process, performing a synchronization operation on each of the model parameters in a corresponding model parameter set of the original natural language processing model in each training node according to each of the synchronization intervals, includes:
selecting a target training node from the training nodes;
initializing each model parameter in the original natural language processing model in each target training node to obtain an initialization result;
broadcasting the initialization result to other training nodes except the target training node by utilizing the target training node;
initializing the iteration times of the model;
in the model iteration process, a remainder operation is carried out on the accumulated result of the iteration times of each model by utilizing each synchronization interval;
and carrying out synchronous operation on each model parameter in the target model parameter set corresponding to the synchronous interval with zero remainder.
5. The method of claim 4, wherein synchronizing each of the model parameters in the set of target model parameters corresponding to a synchronization interval with zero remainder comprises:
respectively obtaining target model parameter sets in the training nodes;
respectively carrying out average value calculation on corresponding model parameters in each target model parameter group to obtain average model parameters;
setting each model parameter in the target model parameter set in each training node as each average model parameter.
6. A natural language processing apparatus, comprising:
the information receiving module is used for receiving natural language information to be processed;
the information input module is used for inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average;
the information processing module is used for carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model;
wherein the apparatus comprises a model training module comprising:
the parameter updating frequency obtaining sub-module is used for preprocessing the original natural language processing model to obtain the updating frequency of each model parameter in the original natural language processing model;
the parameter set obtaining submodule is used for grouping the model parameters according to the updating frequencies to obtain the model parameter sets;
the synchronization interval determining submodule is used for respectively determining the synchronization interval of each model parameter set;
the parameter synchronization sub-module is used for carrying out synchronous operation on each model parameter in a corresponding model parameter set of the original natural language processing model in each training node according to each synchronous interval in the model iteration process to obtain a target natural language processing model;
the process of grouping the model parameters by the parameter set obtaining submodule according to the update frequencies to obtain the model parameter sets includes: inputting each model parameter into a preset sortable container; sorting the model parameters by using the sorting container according to the update frequency corresponding to the model parameters respectively to obtain a sorting result; and dividing each model parameter into a second preset number of model parameter groups according to the sorting result.
7. A natural language processing device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the natural language processing method of any one of claims 1 to 5 when executing the computer program.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the natural language processing method according to any one of claims 1 to 5.
CN202010542341.7A 2020-06-15 2020-06-15 Natural language processing method, device, equipment and readable storage medium Active CN111695689B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010542341.7A CN111695689B (en) 2020-06-15 2020-06-15 Natural language processing method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010542341.7A CN111695689B (en) 2020-06-15 2020-06-15 Natural language processing method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111695689A CN111695689A (en) 2020-09-22
CN111695689B true CN111695689B (en) 2023-06-20

Family

ID=72480984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010542341.7A Active CN111695689B (en) 2020-06-15 2020-06-15 Natural language processing method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111695689B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112561078B (en) * 2020-12-18 2021-12-28 北京百度网讯科技有限公司 Distributed model training method and related device
CN112699686B (en) * 2021-01-05 2024-03-08 浙江诺诺网络科技有限公司 Semantic understanding method, device, equipment and medium based on task type dialogue system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014074698A2 (en) * 2012-11-12 2014-05-15 Nuance Communications, Inc. Distributed nlu/nlp

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035751B (en) * 2014-06-20 2016-10-12 深圳市腾讯计算机系统有限公司 Data parallel processing method based on multi-graphics processor and device
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
US9715498B2 (en) * 2015-08-31 2017-07-25 Microsoft Technology Licensing, Llc Distributed server system for language understanding
CN108280522B (en) * 2018-01-03 2021-08-20 北京大学 Plug-in distributed machine learning calculation framework and data processing method thereof
CN108549692B (en) * 2018-04-13 2021-05-11 重庆邮电大学 Method for classifying text emotion through sparse multiple logistic regression model under Spark framework
JP7135743B2 (en) * 2018-11-06 2022-09-13 日本電信電話株式会社 Distributed processing system and distributed processing method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014074698A2 (en) * 2012-11-12 2014-05-15 Nuance Communications, Inc. Distributed nlu/nlp

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Parallelization of Hierarchical Phrase-Based Machine Translation Based on Distributed Memory; Zhao Bo; Huang Shujian; Dai Xinyu; Yuan Chunfeng; Huang Yihua; Journal of Computer Research and Development (Issue 12); 2724-2732 *

Also Published As

Publication number Publication date
CN111695689A (en) 2020-09-22

Similar Documents

Publication Publication Date Title
CN110969250B (en) Neural network training method and device
CN111030861B (en) Edge calculation distributed model training method, terminal and network side equipment
CN109934331A (en) Device and method for executing artificial neural network forward operation
US20170193368A1 (en) Conditional parallel processing in fully-connected neural networks
CN111695689B (en) Natural language processing method, device, equipment and readable storage medium
Alawad et al. Stochastic-based deep convolutional networks with reconfigurable logic fabric
CN113705793B (en) Decision variable determination method and device, electronic equipment and medium
CN112508190A (en) Method, device and equipment for processing structured sparse parameters and storage medium
CN116701692B (en) Image generation method, device, equipment and medium
CN109460813B (en) Acceleration method, device and equipment for convolutional neural network calculation and storage medium
CN113159287A (en) Distributed deep learning method based on gradient sparsity
US20230087774A1 (en) Parameter optimization method, electronic device, and storage medium
CN116644804A (en) Distributed training system, neural network model training method, device and medium
CN106156142B (en) Text clustering processing method, server and system
CN113191486A (en) Graph data and parameter data mixed partitioning method based on parameter server architecture
CN114138231B (en) Method, circuit and SOC for executing matrix multiplication operation
CN111191036A (en) Short text topic clustering method, device, equipment and medium
CN115879547A (en) Open world knowledge graph complementing method and system based on LSTM and attention mechanism
CN115168326A (en) Hadoop big data platform distributed energy data cleaning method and system
US20220138554A1 (en) Systems and methods utilizing machine learning techniques for training neural networks to generate distributions
CN116797850A (en) Class increment image classification method based on knowledge distillation and consistency regularization
CN116384471A (en) Model pruning method, device, computer equipment, storage medium and program product
CN115909441A (en) Face recognition model establishing method, face recognition method and electronic equipment
DE102022120819A1 (en) QUANTIZED NEURAL NETWORK TRAINING AND INFERENCE
CN109636199B (en) Method and system for matching translator for to-be-translated manuscript

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant