CN111695689B

CN111695689B - A natural language processing method, device, equipment and readable storage medium

Info

Publication number: CN111695689B
Application number: CN202010542341.7A
Authority: CN
Inventors: 赖志权; 杨越童; 李东升; 蔡蕾; 张立志; 冉浙江; 梅松竹; 王庆林
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2020-06-15
Filing date: 2020-06-15
Publication date: 2023-06-20
Anticipated expiration: 2040-06-15
Also published as: CN111695689A

Abstract

The invention discloses a natural language processing method, which comprises the following steps: receiving natural language information to be processed; inputting the natural language information to be processed into a target natural language processing model; wherein, the target natural language processing model is averaged by the model It is obtained through distributed training of the model aggregation algorithm; the language processing model is used to perform corresponding natural language understanding or natural language generation operations on the natural language information to be processed. By applying the technical solutions provided by the embodiments of the present invention, the accuracy and processing efficiency of the natural language processing performed by the language processing model are improved. The invention also discloses a natural language processing device, equipment and storage medium, which have corresponding technical effects.

Description

A natural language processing method, device, equipment and readable storage medium

技术领域technical field

本发明涉及深度学习技术领域，特别是涉及一种自然语言处理方法、装置、设备及计算机可读存储介质。The present invention relates to the technical field of deep learning, in particular to a natural language processing method, device, equipment and computer-readable storage medium.

背景技术Background technique

目前，深度学习方法已经被广泛用于解决自然语言处理的问题，随着对自然语言处理问题的研究的深入，所采用的深度学习模型越来越复杂，深度学习模型所依赖的训练数据也越来越庞大，另一方面，机器翻译、对话生成等自然语言处理应用中，随着语言环境和训练数据的变化，需要频繁地对模型进行增量训练，以保证语言处理模型进行自然语言处理的准确性和有效性。因此，在使用一台计算设备的情形下，这些深度学习算法需要花费很长的时间才能完成使深度学习模型训练，满足不了自然语言处理应用的模型高速迭代的需求收敛。因此，需要在多台计算设备上，通过分布式训练的方式对语言处理模型进行训练，可以减少训练所花费的时间。At present, deep learning methods have been widely used to solve natural language processing problems. With the deepening of research on natural language processing problems, the deep learning models used are becoming more and more complex, and the training data that the deep learning models rely on are becoming more and more complicated. On the other hand, in natural language processing applications such as machine translation and dialogue generation, as the language environment and training data change, frequent incremental training of the model is required to ensure the performance of the language processing model in natural language processing. Accuracy and Validity. Therefore, in the case of using one computing device, these deep learning algorithms take a long time to complete the training of the deep learning model, which cannot meet the needs of high-speed iteration of the model for natural language processing applications. Therefore, it is necessary to train the language processing model through distributed training on multiple computing devices, which can reduce the time spent on training.

分布式训练的关键问题之一，是高效的完成梯度或模型参数的同步，减少通信开销，从而提高计算设备的使用率，加快训练进度。对于现有的分布式训练方法，大多采用的同步的梯度同步方式以保持各个节点上模型参数的一致性。在这种方式下，训练的每一轮迭代各个分布式节点都需要完成梯度的传递和同步。这种分布式训练方法已被广泛使用于图像分类模型的分布式训练，并通过实验验证取得了较好的效果。One of the key issues in distributed training is to efficiently complete the synchronization of gradients or model parameters, reduce communication overhead, thereby increasing the utilization rate of computing equipment and speeding up the training progress. For the existing distributed training methods, most of them adopt a synchronous gradient synchronization method to maintain the consistency of model parameters on each node. In this way, each distributed node needs to complete gradient transmission and synchronization in each iteration of training. This distributed training method has been widely used in the distributed training of image classification models, and has achieved good results through experimental verification.

目前的研究中，所训练的深度神经网络主要是用于解决计算机视觉问题，一个典型的代表是解决图像分类问题的深度神经网络，这一类网络的特点是所有的梯度数据均可以用稠密矩阵来表示。鲜有研究尝试对自然语言处理中的语言模型进行分布式训练，这类模型的特点是，一部分梯度数据是稀疏的，它们在现有的分布式深度学习框架中常用稀疏的矩阵进行表示。将稠密矩阵所表示的梯度数据在集群的计算节点之间传递，能够较高效的被现有的集合通信库所实现，而稀疏矩阵所表示的梯度数据，难以在集群中进行高效的数据传输。因此，对于自然语言处理等领域出现的稀疏模型进行分布式训练，存在深度神经网络模型收敛至目标进度所需时间长，在自然语言处理过程中，语言处理模型进行自然语言处理的准确性低，处理效率低。In the current research, the trained deep neural network is mainly used to solve computer vision problems. A typical representative is the deep neural network that solves image classification problems. The characteristic of this type of network is that all gradient data can be used with dense matrix To represent. Few studies have attempted distributed training of language models in natural language processing. The characteristic of this type of model is that part of the gradient data is sparse, and they are often represented by sparse matrices in existing distributed deep learning frameworks. Transferring the gradient data represented by the dense matrix between the computing nodes of the cluster can be more efficiently realized by the existing collective communication library, while the gradient data represented by the sparse matrix is difficult to perform efficient data transmission in the cluster. Therefore, for distributed training of sparse models that appear in fields such as natural language processing, it takes a long time for the deep neural network model to converge to the target progress. In the process of natural language processing, the accuracy of the language processing model for natural language processing is low. The processing efficiency is low.

综上所述，如何有效地解决现有的语言处理模型进行自然语言处理的准确性低，处理效率低等问题，是目前本领域技术人员急需解决的问题。To sum up, how to effectively solve the problems of low accuracy and low processing efficiency of the existing language processing models for natural language processing is an urgent problem for those skilled in the art.

发明内容Contents of the invention

本发明的目的是提供一种自然语言处理方法，该方法提高了语言处理模型进行自然语言处理的准确性和处理效率；本发明的另一目的是提供一种自然语言处理装置、设备及计算机可读存储介质。The purpose of the present invention is to provide a natural language processing method, which improves the accuracy and processing efficiency of the language processing model for natural language processing; another purpose of the present invention is to provide a natural language processing device, equipment and computer-based Read storage media.

为解决上述技术问题，本发明提供如下技术方案：In order to solve the above technical problems, the present invention provides the following technical solutions:

一种自然语言处理方法，包括：A natural language processing method comprising:

接收待处理自然语言信息；Receive natural language information to be processed;

将所述待处理自然语言信息输入到目标自然语言处理模型中；其中，所述目标自然语言处理模型为通过模型平均的模型聚合算法进行分布式训练得到；Inputting the natural language information to be processed into a target natural language processing model; wherein, the target natural language processing model is obtained by performing distributed training through a model averaging model aggregation algorithm;

利用所述目标自然语言处理模型对所述待处理自然语言信息进行相应的自然语言理解或自然语言生成操作。Using the target natural language processing model to perform corresponding natural language understanding or natural language generation operations on the to-be-processed natural language information.

在本发明的一种具体实施方式中，通过所述模型平均的模型聚合算法进行模型训练得到所述目标自然语言处理模型的过程包括：In a specific embodiment of the present invention, the process of obtaining the target natural language processing model by performing model training through the model averaging model aggregation algorithm includes:

对原自然语言处理模型进行预处理，得到所述原自然语言处理模型中每个模型参数的更新频率；Preprocessing the original natural language processing model to obtain the update frequency of each model parameter in the original natural language processing model;

根据各所述更新频率对各所述模型参数进行分组操作，得到各模型参数组；performing a grouping operation on each of the model parameters according to each of the update frequencies to obtain each of the model parameter groups;

分别确定各所述模型参数组的同步间隔；Respectively determine the synchronization interval of each of the model parameter groups;

在模型迭代过程中，根据各所述同步间隔对各训练节点内所述原自然语言处理模型的对应模型参数组中各所述模型参数进行同步操作，得到所述目标自然语言处理模型。During the model iteration process, each of the model parameters in the corresponding model parameter group of the original natural language processing model in each training node is synchronized according to each of the synchronization intervals to obtain the target natural language processing model.

在本发明的一种具体实施方式中，对原自然语言处理模型进行预处理，得到所述原自然语言处理模型中每个模型参数的更新频率，包括：In a specific embodiment of the present invention, the original natural language processing model is preprocessed to obtain the update frequency of each model parameter in the original natural language processing model, including:

从训练数据集中选取第一预设数量的数据组；selecting a first preset number of data sets from the training data set;

依次将各所述数据组输入到所述原自然语言处理模型，以对所述原自然语言处理模型进行所述第一预设数量的模型迭代操作；sequentially input each of the data groups into the original natural language processing model, so as to perform the first preset number of model iteration operations on the original natural language processing model;

针对每个数据组，对所述原自然语言处理模型进行前向计算和反向传播，得到所述原自然语言处理模型中每个模型参数的梯度值；For each data group, perform forward calculation and backpropagation on the original natural language processing model to obtain the gradient value of each model parameter in the original natural language processing model;

利用指示函数对各次模型迭代中各所述梯度值中的非零值进行统计操作，得到各所述模型参数分别在各次模型迭代中对应的梯度更新有效次数；Using an indicator function to perform statistical operations on the non-zero values in each of the gradient values in each model iteration, to obtain the effective number of gradient updates corresponding to each of the model parameters in each model iteration;

分别计算各所述梯度更新有效次数占所述预设次数的比例，得到各所述模型参数分别对应的更新频率。Calculating the ratio of each effective number of gradient updates to the preset number of times to obtain update frequencies corresponding to each of the model parameters.

在本发明的一种具体实施方式中，根据各所述更新频率对各所述模型参数进行分组操作，得到各模型参数组，包括：In a specific embodiment of the present invention, each model parameter is grouped according to each update frequency to obtain each model parameter group, including:

将各所述模型参数输入到预置可排序容器中；inputting each of said model parameters into a preset sortable container;

利用所述排序容器根据各所述模型参数分别对应的更新频率的大小对各所述模型参数进行排序操作，得到排序结果；Using the sorting container to perform a sorting operation on each of the model parameters according to the magnitude of the update frequency corresponding to each of the model parameters, to obtain a sorting result;

根据所述排序结果将各所述模型参数划分为第二预设数量的模型参数组。Divide each of the model parameters into a second preset number of model parameter groups according to the ranking result.

在本发明的一种具体实施方式中，分别确定各所述模型参数组的同步间隔，包括：In a specific implementation manner of the present invention, determining the synchronization intervals of each of the model parameter groups respectively includes:

针对每个模型参数组，计算所述模型参数组中各所述模型参数的更新频率的平均更新频率；For each model parameter group, calculate the average update frequency of the update frequencies of the model parameters in the model parameter group;

根据各所述平均更新频率计算各所述模型参数组分别的同步间隔。The respective synchronization intervals of each of the model parameter groups are calculated according to each of the average update frequencies.

在本发明的一种具体实施方式中，在模型迭代过程中，根据各所述同步间隔对各训练节点内所述原自然语言处理模型的对应模型参数组中各所述模型参数进行同步操作，包括：In a specific embodiment of the present invention, during the model iteration process, each of the model parameters in the corresponding model parameter group of the original natural language processing model in each training node is synchronized according to each of the synchronization intervals, include:

从各所述训练节点中选取目标训练节点；selecting a target training node from each of the training nodes;

对各所述目标训练节点内所述原自然语言处理模型中各模型参数进行初始化，得到初始化结果；Initializing each model parameter in the original natural language processing model in each of the target training nodes to obtain an initialization result;

利用所述目标训练节点将所述初始化结果广播至除所述目标训练节点之外的其他训练节点；using the target training node to broadcast the initialization result to other training nodes except the target training node;

对模型迭代次数进行初始化；Initialize the number of model iterations;

在模型迭代过程中，分别利用各所述同步间隔对每个模型迭代次数累加结果进行取余操作；During the model iteration process, each of the synchronization intervals is used to perform a remainder operation on the accumulation result of each model iteration number;

对余数为零的同步间隔对应的目标模型参数组内各所述模型参数进行同步操作。A synchronization operation is performed on each of the model parameters in the target model parameter group corresponding to the synchronization interval whose remainder is zero.

在本发明的一种具体实施方式中，对余数为零的同步间隔对应的目标模型参数组内各所述模型参数进行同步操作，包括：In a specific embodiment of the present invention, the synchronization operation is performed on each of the model parameters in the target model parameter group corresponding to the synchronization interval whose remainder is zero, including:

分别获取各所述训练节点中的目标模型参数组；Obtaining target model parameter groups in each of the training nodes respectively;

分别对各所述目标模型参数组中对应各所述模型参数进行均值计算，得到各所述平均模型参数；Carrying out mean calculations for each of the model parameters corresponding to each of the target model parameter groups to obtain each of the average model parameters;

将各所述训练节点中的目标模型参数组中的各模型参数设置为各所述平均模型参数。Each model parameter in the target model parameter group in each of the training nodes is set as each of the average model parameters.

一种自然语言处理装置，包括：A natural language processing device, comprising:

信息接收模块，用于接收待处理自然语言信息；An information receiving module, configured to receive natural language information to be processed;

信息输入模块，用于将所述待处理自然语言信息输入到目标自然语言处理模型中；其中，所述目标自然语言处理模型为通过模型平均的模型聚合算法进行分布式训练得到；An information input module, configured to input the natural language information to be processed into a target natural language processing model; wherein, the target natural language processing model is obtained by performing distributed training through a model averaging model aggregation algorithm;

信息处理模块，用于利用所述目标自然语言处理模型对所述待处理自然语言信息进行相应的自然语言理解或自然语言生成操作。An information processing module, configured to use the target natural language processing model to perform corresponding natural language understanding or natural language generation operations on the to-be-processed natural language information.

一种自然语言处理设备，包括：A natural language processing device, comprising:

存储器，用于存储计算机程序；memory for storing computer programs;

处理器，用于执行所述计算机程序时实现如前所述自然语言处理方法的步骤。A processor, configured to implement the steps of the aforementioned natural language processing method when executing the computer program.

一种计算机可读存储介质，所述计算机可读存储介质上存储有计算机程序，所述计算机程序被处理器执行时实现如前所述自然语言处理方法的步骤。A computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the aforementioned natural language processing method are realized.

应用本发明实施例所提供的方法，接收待处理自然语言信息；将待处理自然语言信息输入到目标自然语言处理模型中；其中，目标自然语言处理模型为通过模型平均的模型聚合算法进行分布式训练得到；利用目标自然语言处理模型对待处理自然语言信息进行相应的自然语言理解或自然语言生成操作。通过利用模型平均的模型聚合算法进行分布式训练得到目标自然语言处理模型，模型平均的模型聚合算法适用于对分布式深度学习框架中常用稀疏的矩阵进行表示的语言处理模型进行训练，较大地缩短了模型训练时长，提高了语言处理模型进行自然语言处理的准确性和处理效率。Apply the method provided by the embodiment of the present invention to receive the natural language information to be processed; input the natural language information to be processed into the target natural language processing model; wherein, the target natural language processing model is distributed through the model aggregation algorithm of model averaging Obtained through training; use the target natural language processing model to perform corresponding natural language understanding or natural language generation operations on the natural language information to be processed. The target natural language processing model is obtained by using the model averaging model aggregation algorithm for distributed training. The model averaging model aggregation algorithm is suitable for training language processing models represented by sparse matrices commonly used in distributed deep learning frameworks, greatly shortening The model training time is shortened, and the accuracy and processing efficiency of the language processing model for natural language processing are improved.

相应的，本发明实施例还提供了与上述自然语言处理方法相对应的自然语言处理装置、设备和计算机可读存储介质，具有上述技术效果，在此不再赘述。Correspondingly, the embodiments of the present invention also provide a natural language processing device, device, and computer-readable storage medium corresponding to the above-mentioned natural language processing method, which have the above-mentioned technical effects, and will not be repeated here.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only These are some embodiments of the present invention. Those skilled in the art can also obtain other drawings based on these drawings without creative work.

图1为本发明实施例中自然语言处理方法的一种实施流程图；Fig. 1 is a kind of implementation flowchart of the natural language processing method in the embodiment of the present invention;

图2为本发明实施例中自然语言处理方法的另一种实施流程图；Fig. 2 is another implementation flowchart of the natural language processing method in the embodiment of the present invention;

图3为本发明实施例中一种自然语言处理装置的结构框图；FIG. 3 is a structural block diagram of a natural language processing device in an embodiment of the present invention;

图4为本发明实施例中一种自然语言处理设备的结构框图。Fig. 4 is a structural block diagram of a natural language processing device in an embodiment of the present invention.

具体实施方式Detailed ways

为了使本技术领域的人员更好地理解本发明方案，下面结合附图和具体实施方式对本发明作进一步的详细说明。显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。In order to enable those skilled in the art to better understand the solution of the present invention, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments. Apparently, the described embodiments are only some of the embodiments of the present invention, but not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

实施例一：Embodiment one:

参见图1，图1为本发明实施例中自然语言处理方法的一种实施流程图，该方法可以包括以下步骤：Referring to FIG. 1, FIG. 1 is an implementation flowchart of a natural language processing method in an embodiment of the present invention, and the method may include the following steps:

S101：接收待处理自然语言信息。S101: Receive natural language information to be processed.

当需要进行自然语言处理时，向处理中心发送待处理自然语言信息。处理中心接收待处理自然语言信息。待处理自然语言信息可以为待翻译语句、待生成对话的语句等。When natural language processing is required, the natural language information to be processed is sent to the processing center. The processing center receives the natural language information to be processed. The natural language information to be processed may be a sentence to be translated, a sentence to be generated for a dialogue, and the like.

S102：将待处理自然语言信息输入到目标自然语言处理模型中；其中，目标自然语言处理模型为通过模型平均的模型聚合算法进行分布式训练得到。S102: Input the to-be-processed natural language information into a target natural language processing model; wherein, the target natural language processing model is obtained by performing distributed training through a model averaging model aggregation algorithm.

预先利用模型平均的模型聚合算法对原自然语言处理模型进行分布式训练，得到目标自然语言处理模型。处理中心在接收到待处理自然语言信息之后，将待处理自然语言信息输入到目标自然语言处理模型中。承接上述举例，当待处理自然语言信息为待翻译语句时，预先利用模型平均的模型聚合算法对原机器翻译模型进行分布式训练，得到目标机器翻译模型，将待翻译语句输入到目标机器翻译模型中；当待处理自然语言信息为待生成对话的语句时，预先利用模型平均的模型聚合算法对原对话生成模型进行分布式训练，得到目标对话生成模型，将待生成对话的语句输入到目标对话生成模型。In advance, the model aggregation algorithm of model averaging is used to perform distributed training on the original natural language processing model to obtain the target natural language processing model. After receiving the natural language information to be processed, the processing center inputs the natural language information to be processed into the target natural language processing model. Following the above example, when the natural language information to be processed is a sentence to be translated, the original machine translation model is distributedly trained using the model averaging model aggregation algorithm in advance to obtain the target machine translation model, and the sentence to be translated is input to the target machine translation model Middle; when the natural language information to be processed is the sentence of the dialogue to be generated, the original dialogue generation model is distributedly trained by the model aggregation algorithm of model averaging in advance to obtain the target dialogue generation model, and the sentence to be generated is input into the target dialogue Generate a model.

S103：利用目标自然语言处理模型对待处理自然语言信息进行相应的自然语言理解或自然语言生成操作。S103: Use the target natural language processing model to perform corresponding natural language understanding or natural language generation operations on the natural language information to be processed.

将待处理自然语言信息输入到目标自然语言处理模型之后，利用语言处理模型对待处理自然语言信息进行相应的自然语言理解或自然语言生成操作。承接上述举例，当待处理自然语言信息为待翻译语句时，利用目标机器翻译模型对待翻译语句进行翻译操作；当待处理自然语言信息为待生成对话的语句时，利用目标对话生成模型对待生成对话的语句进行对话生成操作。模型平均的模型聚合算法适用于对分布式深度学习框架中常用稀疏的矩阵进行表示的语言处理模型进行训练，较大地缩短了模型训练时长，提高了语言处理模型进行自然语言处理的准确性和处理效率。After inputting the natural language information to be processed into the target natural language processing model, the language processing model is used to perform corresponding natural language understanding or natural language generation operations on the natural language information to be processed. Following the above example, when the natural language information to be processed is a sentence to be translated, use the target machine translation model to translate the sentence to be translated; when the natural language information to be processed is a sentence to be generated, use the target dialog generation model to translate the sentence statement to perform dialog generation operations. The model aggregation algorithm of model averaging is suitable for training language processing models represented by sparse matrices commonly used in distributed deep learning frameworks. efficiency.

需要说明的是，目标自然语言处理模型不限定于对待翻译语句、待生成对话的语句进行处理，还可以用于信息提取、文本情感分析、个性化推荐等，根据应用场景的不同预先训练对应的目标自然语言处理模型。It should be noted that the target natural language processing model is not limited to processing sentences to be translated and dialogues to be generated, but can also be used for information extraction, text sentiment analysis, personalized recommendation, etc., and pre-train corresponding models according to different application scenarios Target natural language processing model.

需要说明的是，基于上述实施例一，本发明实施例还提供了相应的改进方案。在后续实施例中涉及与上述实施例一中相同步骤或相应步骤之间可相互参考，相应的有益效果也可相互参照，在下文的改进实施例中不再一一赘述。It should be noted that, based on the first embodiment above, the embodiment of the present invention also provides a corresponding improvement solution. In subsequent embodiments, the same steps as in the first embodiment above or corresponding steps may be referred to each other, and the corresponding beneficial effects may also be referred to each other, and will not be repeated in the improved embodiments below.

实施例二：Embodiment two:

参见图2，图2为本发明实施例中自然语言处理方法的另一种实施流程图，该方法可以包括以下步骤：Referring to Fig. 2, Fig. 2 is another implementation flowchart of the natural language processing method in the embodiment of the present invention, and the method may include the following steps:

S201：对原自然语言处理模型进行预处理，得到原自然语言处理模型中每个模型参数的更新频率。S201: Preprocessing the original natural language processing model to obtain the update frequency of each model parameter in the original natural language processing model.

在模型迭代过程中，获取原自然语言处理模型，对原自然语言处理模型进行预处理，得到原自然语言处理模型中每个模型参数的更新频率。In the model iteration process, the original natural language processing model is obtained, and the original natural language processing model is preprocessed to obtain the update frequency of each model parameter in the original natural language processing model.

在本发明的一种具体实施方式中，步骤S201可以包括以下步骤：In a specific implementation manner of the present invention, step S201 may include the following steps:

步骤一：从训练数据集中选取第一预设数量的数据组；Step 1: selecting a first preset number of data groups from the training data set;

步骤二：依次将各数据组输入到原自然语言处理模型，以对原自然语言处理模型进行第一预设数量的模型迭代操作；Step 2: Input each data group into the original natural language processing model in turn, so as to perform a first preset number of model iteration operations on the original natural language processing model;

步骤三：针对每个数据组，对原自然语言处理模型进行前向计算和反向传播，得到原自然语言处理模型中每个模型参数的梯度值；Step 3: For each data group, perform forward calculation and backpropagation on the original natural language processing model to obtain the gradient value of each model parameter in the original natural language processing model;

步骤四：利用指示函数对各次模型迭代中各梯度值中的非零值进行统计操作，得到各模型参数分别在各次模型迭代中对应的梯度更新有效次数。Step 4: Use the indicator function to perform statistical operations on the non-zero values in each gradient value in each model iteration, and obtain the effective number of gradient updates corresponding to each model parameter in each model iteration.

步骤五：分别计算各梯度更新有效次数占预设次数的比例，得到各模型参数分别对应的更新频率。Step 5: Calculate the ratio of the effective number of update times of each gradient to the preset number of times, and obtain the update frequency corresponding to each model parameter.

为方便描述，将上述五个步骤结合起来进行说明。For convenience of description, the above five steps are combined for description.

在预先获取到的训练数据集中选取第一预设数量的数据组，如选取m个数据组，依次将各数据组输入到原自然语言处理模型，每个数据组对应对原自然语言处理模型的一次迭代，从而对原自然语言处理模型进行第一预设数量的模型迭代操作。针对每个数据组，对原自然语言处理模型进行前向计算和反向传播，得到原自然语言处理模型中每个模型参数的梯度值

利用指示函数I对各次模型迭代中各梯度值/>

中的非零值进行统计操作，从而获取在各次模型迭代中各模型参数对应的梯度更新是否有效，得到各所述模型参数分别在各次模型迭代中对应的梯度更新有效次数。指示函数I的定义如下：Select the first preset number of data groups in the pre-acquired training data set, such as selecting m data groups, and input each data group into the original natural language processing model in turn, each data group corresponds to the original natural language processing model. One iteration, so that a first preset number of model iterations are performed on the original natural language processing model. For each data group, perform forward calculation and backpropagation on the original natural language processing model to obtain the gradient value of each model parameter in the original natural language processing model

Each gradient value in each model iteration is calculated by using the indicator function I

Statistical operations are performed on the non-zero values in , so as to obtain whether the gradient update corresponding to each model parameter is valid in each model iteration, and obtain the effective number of gradient updates corresponding to each model parameter in each model iteration. The indicator function I is defined as follows:

当

时，说明对应的梯度更新是有效的，/>

表示第t次迭代后第i个模型参数的梯度值。when

, indicating that the corresponding gradient update is valid, />

Indicates the gradient value of the i-th model parameter after the t-th iteration.

分别计算各梯度更新有效次数占预设次数的比例，得到各模型参数分别对应的更新频率。可以通过以下公式分别计算各梯度更新有效次数占预设次数的比例，得到各模型参数分别对应的更新频率α_i：Calculate the ratio of the effective number of update times of each gradient to the preset number of times, and obtain the update frequency corresponding to each model parameter. The ratio of the effective times of each gradient update to the preset times can be calculated by the following formula, and the update frequency α _i corresponding to each model parameter can be obtained:

S202：根据各更新频率对各模型参数进行分组操作，得到各模型参数组。S202: Group each model parameter according to each update frequency to obtain each model parameter group.

由于语言处理模型在分布式深度学习框架中常用稀疏的矩阵进行表示，因此在得到原自然语言处理模型中每个模型参数的更新频率之后，根据各更新频率对各模型参数进行分组操作，得到各模型参数组。Since the language processing model is often represented by a sparse matrix in the distributed deep learning framework, after obtaining the update frequency of each model parameter in the original natural language processing model, the model parameters are grouped according to each update frequency, and each model parameter is obtained. Model parameter group.

在本发明的一种具体实施方式中，步骤S202可以包括以下步骤：In a specific implementation manner of the present invention, step S202 may include the following steps:

步骤一：将各模型参数输入到预置可排序容器中；Step 1: Input each model parameter into a preset sortable container;

步骤二：利用排序容器根据各模型参数分别对应的更新频率的大小对各模型参数进行排序操作，得到排序结果；Step 2: use the sorting container to sort each model parameter according to the update frequency corresponding to each model parameter, and obtain the sorting result;

步骤三：根据排序结果将各模型参数划分为第二预设数量的模型参数组。Step 3: Divide each model parameter into a second preset number of model parameter groups according to the sorting result.

为方便描述，将上述三个步骤结合起来进行说明。For convenience of description, the above three steps are combined for description.

预先部署可排序容器，将各模型参数输入到预置可排序容器中，依据获得的每一个模型参数的稀疏程度，即α_i，利用排序容器根据各模型参数分别对应的更新频率的大小对各模型参数进行排序操作，得到排序结果。根据排序结果将各模型参数划分为第二预设数量的模型参数组。第二预设数量用预设的超参数q表示，稀疏程度比较相似的第i组模型参数用p_i表示，模型参数集合可以表示为：Deploy the sortable container in advance, input each model parameter into the preset sortable container, and use the sorting _container according to the update frequency corresponding to each model parameter to sort each The model parameters are sorted and the sorted results are obtained. Each model parameter is divided into a second preset number of model parameter groups according to the sorting result. The second preset number is represented by the preset hyperparameter q, and the i-th group of model parameters with similar sparsity is represented by p _i , and the model parameter set can be expressed as:

P＝{p₁,p₂,…,p_q}；P={p ₁ ,p ₂ ,...,p _q };

S203：分别确定各模型参数组的同步间隔。S203: Determine the synchronization interval of each model parameter group respectively.

在得到各模型参数组之后，分别确定各模型参数组的同步间隔。After each model parameter group is obtained, the synchronization interval of each model parameter group is determined respectively.

在本发明的一种具体实施方式中，步骤S203可以包括以下步骤：In a specific implementation manner of the present invention, step S203 may include the following steps:

步骤一：针对每个模型参数组，计算模型参数组中各模型参数的更新频率的平均更新频率；Step 1: For each model parameter group, calculate the average update frequency of the update frequencies of each model parameter in the model parameter group;

步骤二：根据各平均更新频率计算各模型参数组分别的同步间隔。Step 2: Calculate the respective synchronization intervals of each model parameter group according to each average update frequency.

为方便描述，将上述两个步骤结合起来进行说明。For convenience of description, the above two steps are combined for description.

在获取每个模型参数的更新频率，并对各模型参数进行分组，得到各模型参数组之后，针对每个模型参数组，计算模型参数组中各模型参数的更新频率的平均更新频率

根据各平均更新频率计算各模型参数组分别的同步间隔。具体可以根据以下公式计算平均更新频率/>

After obtaining the update frequency of each model parameter and grouping each model parameter to obtain each model parameter group, for each model parameter group, calculate the average update frequency of the update frequency of each model parameter in the model parameter group

The respective synchronization intervals for each model parameter set are calculated according to each average update frequency. Specifically, the average update frequency can be calculated according to the following formula/>

其中，||表示1范数计算。Among them, || represents 1-norm calculation.

在分别确定各模型参数组的同步间隔之后，根据各平均更新频率计算各模型参数组分别的同步间隔k_i，可以通过以下公式计算各模型参数组分别的同步间隔k_i：After the synchronization intervals of each model parameter group are respectively determined, the respective synchronization intervals k _i of each model parameter group are calculated according to each average update frequency, and the respective synchronization intervals k _i of each model parameter group can be calculated by the following formula:

其中，k_i表示第i组模型参数组对应的同步间隔，用K表示由各组模型参数组对应的同步间隔构成的同步间隔集合，K表示为：Among them, _ki represents the synchronization interval corresponding to the i-th model parameter group, and K represents the synchronization interval set composed of the synchronization intervals corresponding to each group of model parameter groups, and K is expressed as:

其中，λ是同步间隔设定系数，可以便于发明实施人员基于先验知识从而动态的对不同稀疏深度神经网络模型的同步间隔进行调整。Wherein, λ is a synchronization interval setting coefficient, which can facilitate the inventors to dynamically adjust the synchronization intervals of different sparse deep neural network models based on prior knowledge.

S204：在模型迭代过程中，根据各同步间隔对各训练节点内原自然语言处理模型的对应模型参数组中各模型参数进行同步操作，得到目标自然语言处理模型。S204: During the model iteration process, perform synchronous operations on each model parameter in the corresponding model parameter group of the original natural language processing model in each training node according to each synchronization interval, to obtain a target natural language processing model.

在模型迭代过程中，根据各同步间隔对各训练节点内原自然语言处理模型的对应模型参数组中各模型参数进行同步操作，得到目标自然语言处理模型。During the model iteration process, each model parameter in the corresponding model parameter group of the original natural language processing model in each training node is synchronized according to each synchronization interval to obtain the target natural language processing model.

在本发明的一种具体实施方式中，步骤S204可以包括以下步骤：In a specific implementation manner of the present invention, step S204 may include the following steps:

步骤一：从各训练节点中选取目标训练节点；Step 1: Select the target training node from each training node;

步骤二：对各目标训练节点内原自然语言处理模型中各模型参数进行初始化，得到初始化结果；Step 2: Initialize each model parameter in the original natural language processing model in each target training node, and obtain the initialization result;

步骤三：利用目标训练节点将初始化结果广播至除目标训练节点之外的其他训练节点；Step 3: Use the target training node to broadcast the initialization result to other training nodes except the target training node;

步骤四：对模型迭代次数进行初始化；Step 4: Initialize the number of model iterations;

步骤五：在模型迭代过程中，分别利用各同步间隔对每个模型迭代次数累加结果进行取余操作；Step 5: During the model iteration process, each synchronization interval is used to perform a remainder operation on the cumulative result of each model iteration number;

步骤六：对余数为零的同步间隔对应的目标模型参数组内各模型参数进行同步操作，得到目标自然语言处理模型。Step 6: Synchronize each model parameter in the target model parameter group corresponding to the synchronization interval whose remainder is zero, and obtain the target natural language processing model.

为方便描述，将上述六个步骤结合起来进行说明。For convenience of description, the above six steps are combined for description.

从各训练节点中选取目标训练节点，对各目标训练节点内原自然语言处理模型中各模型参数进行初始化，得到初始化结果。A target training node is selected from each training node, and each model parameter in the original natural language processing model in each target training node is initialized to obtain an initialization result.

目标训练节点可以为各训练节点中任意一个训练节点，如可以预先为各训练节点进行编号，编号分别为0，1……，选取编号为0的训练节点作为目标训练节点。对各目标训练节点内原自然语言处理模型中各模型参数进行初始化，得到初始化结果。利用目标训练节点将初始化结果广播至除目标训练节点之外的其他训练节点。对模型迭代次数进行初始化，如在对原自然语言处理模型进行迭代训练前，将迭代次数初始化为0，在模型迭代过程中，分别利用各同步间隔对每个模型迭代次数t累加结果进行取余操作，可以通过以下公式计算：The target training node can be any one of the training nodes. For example, each training node can be numbered in advance, and the numbers are 0, 1..., and the training node numbered 0 is selected as the target training node. Initialize each model parameter in the original natural language processing model in each target training node to obtain an initialization result. The target training node is used to broadcast the initialization result to other training nodes except the target training node. Initialize the number of model iterations. For example, before performing iterative training on the original natural language processing model, initialize the number of iterations to 0. During the model iteration process, use each synchronization interval to take the remainder of the accumulated results of each model iteration number t operation, can be calculated by the following formula:

t modk_i；t modk _i ;

对余数为零的同步间隔对应的目标模型参数组内各模型参数进行同步操作，得到目标自然语言处理模型。The synchronization operation is performed on each model parameter in the target model parameter group corresponding to the synchronization interval whose remainder is zero, to obtain the target natural language processing model.

在本发明的一种具体实施方式中，对余数为零的同步间隔对应的目标模型参数组内各模型参数进行同步操作，可以包括以下步骤：In a specific embodiment of the present invention, the synchronous operation of each model parameter in the target model parameter group corresponding to the synchronous interval whose remainder is zero may include the following steps:

步骤一：分别获取各训练节点中的目标模型参数组；Step 1: Obtain the target model parameter groups in each training node respectively;

步骤二：分别对各目标模型参数组中对应各模型参数进行均值计算，得到各平均模型参数；Step 2: Calculate the mean value of each model parameter corresponding to each target model parameter group to obtain each average model parameter;

步骤三：将各训练节点中的目标模型参数组中的各模型参数设置为各平均模型参数，得到目标自然语言处理模型。Step 3: Set each model parameter in the target model parameter group in each training node as each average model parameter to obtain the target natural language processing model.

分别获取各训练节点中的目标模型参数组，分别对各目标模型参数组中对应各模型参数进行均值计算，得到各平均模型参数，将各训练节点中的目标模型参数组中的各模型参数设置为各平均模型参数，得到目标自然语言处理模型。Obtain the target model parameter groups in each training node respectively, calculate the mean value of each model parameter corresponding to each target model parameter group, obtain each average model parameter, and set each model parameter in the target model parameter group in each training node is each average model parameter to obtain the target natural language processing model.

步骤S201至步骤S203完成了关键参数P和K的求解，步骤S201至步骤S203可以仅使用一个训练节点进行单机的离线计算得到。由于步骤S201至步骤S203不对模型进行训练，不需要对模型参数进行修改，仅仅需要根据不同的批训练数据计算相应梯度值，因此其耗时相对一般单机训练短很多。可以选用配有英特尔至强CPU与4块英伟达RTX 2080Ti GPU的单台服务器，使用服务器中的一块GPU完成针对LM1b模型的步骤S201至步骤S203，耗时短。经验上而言，步步骤S201至步骤S203的耗时相对整体训练耗时可以忽略不计。Steps S201 to S203 complete the solution of the key parameters P and K, and steps S201 to S203 can be obtained by using only one training node for offline calculation on a single machine. Since step S201 to step S203 do not train the model and do not need to modify the model parameters, only the corresponding gradient values need to be calculated according to different batches of training data, so the time-consuming is much shorter than that of general stand-alone training. A single server equipped with an Intel Xeon CPU and 4 Nvidia RTX 2080Ti GPUs can be selected, and one GPU in the server can be used to complete steps S201 to S203 for the LM1b model, which takes a short time. From experience, the time consumption of steps S201 to S203 is negligible relative to the overall training time.

相对现有的语言处理模型分布式训练方式，采用本发明的方法，在单台和两台上述服务器上对语言处理模型进行分布式训练，较大地缩短了训练时间。Compared with the existing distributed training mode of the language processing model, the method of the present invention is used to carry out the distributed training of the language processing model on a single server or two servers, which greatly shortens the training time.

S205：接收待处理自然语言信息。S205: Receive natural language information to be processed.

S206：将待处理自然语言信息输入到目标自然语言处理模型中；其中，目标自然语言处理模型为通过模型平均的模型聚合算法进行分布式训练得到。S206: Input the to-be-processed natural language information into the target natural language processing model; wherein, the target natural language processing model is obtained by performing distributed training through a model averaging model aggregation algorithm.

S207：利用目标自然语言处理模型对待处理自然语言信息进行相应的自然语言理解或自然语言生成操作。S207: Use the target natural language processing model to perform corresponding natural language understanding or natural language generation operations on the natural language information to be processed.

相应于上面的方法实施例，本发明实施例还提供了一种自然语言处理装置，下文描述的自然语言处理装置与上文描述的自然语言处理方法可相互对应参照。Corresponding to the above method embodiments, an embodiment of the present invention also provides a natural language processing device, and the natural language processing device described below and the natural language processing method described above can be referred to in correspondence.

参见图3，图3为本发明实施例中一种自然语言处理装置的结构框图，该装置可以包括：Referring to FIG. 3, FIG. 3 is a structural block diagram of a natural language processing device in an embodiment of the present invention, which may include:

信息接收模块31，用于接收待处理自然语言信息；An information receiving module 31, configured to receive natural language information to be processed;

信息输入模块32，用于将待处理自然语言信息输入到目标自然语言处理模型中；其中，目标自然语言处理模型为通过模型平均的模型聚合算法进行分布式训练得到；The information input module 32 is used to input the natural language information to be processed into the target natural language processing model; wherein, the target natural language processing model is obtained by performing distributed training through a model averaging model aggregation algorithm;

信息处理模块33，用于利用目标自然语言处理模型对待处理自然语言信息进行相应的自然语言理解或自然语言生成操作。The information processing module 33 is configured to use the target natural language processing model to perform corresponding natural language understanding or natural language generation operations on the natural language information to be processed.

应用本发明实施例所提供的装置，接收待处理自然语言信息；将待处理自然语言信息输入到目标自然语言处理模型中；其中，目标自然语言处理模型为通过模型平均的模型聚合算法进行分布式训练得到；利用语言处理模型对待处理自然语言信息进行相应的自然语言理解或自然语言生成操作。通过利用模型平均的模型聚合算法进行分布式训练得到目标自然语言处理模型，模型平均的模型聚合算法适用于对分布式深度学习框架中常用稀疏的矩阵进行表示的语言处理模型进行训练，较大地缩短了模型训练时长，提高了语言处理模型进行自然语言处理的准确性和处理效率。Apply the device provided by the embodiment of the present invention to receive the natural language information to be processed; input the natural language information to be processed into the target natural language processing model; wherein, the target natural language processing model is distributed through the model aggregation algorithm of model averaging Obtained through training; use the language processing model to perform corresponding natural language understanding or natural language generation operations on the natural language information to be processed. The target natural language processing model is obtained by using the model averaging model aggregation algorithm for distributed training. The model averaging model aggregation algorithm is suitable for training language processing models represented by sparse matrices commonly used in distributed deep learning frameworks, greatly shortening The model training time is shortened, and the accuracy and processing efficiency of the language processing model for natural language processing are improved.

在本发明的一种具体实施方式中，该装置包括模型训练模块，模型训练模块包括：In a specific embodiment of the present invention, the device includes a model training module, and the model training module includes:

参数更新频率获得子模块，用于对原自然语言处理模型进行预处理，得到原自然语言处理模型中每个模型参数的更新频率；The parameter update frequency acquisition sub-module is used to preprocess the original natural language processing model to obtain the update frequency of each model parameter in the original natural language processing model;

参数组获得子模块，用于根据各更新频率对各模型参数进行分组操作，得到各模型参数组；The parameter group acquisition sub-module is used to group each model parameter according to each update frequency to obtain each model parameter group;

同步间隔确定子模块，用于分别确定各模型参数组的同步间隔；The synchronization interval determination submodule is used to determine the synchronization interval of each model parameter group respectively;

参数同步子模块，用于在模型迭代过程中，根据各同步间隔对各训练节点内原自然语言处理模型的对应模型参数组中各模型参数进行同步操作，得到目标自然语言处理模型。The parameter synchronization sub-module is used for synchronizing each model parameter in the corresponding model parameter group of the original natural language processing model in each training node according to each synchronization interval during the model iteration process to obtain the target natural language processing model.

在本发明的一种具体实施方式中，参数更新频率获得子模块包括：In a specific embodiment of the present invention, the parameter update frequency obtaining submodule includes:

数据组选取单元，用于从训练数据集中选取第一预设数量的数据组；a data group selection unit, configured to select a first preset number of data groups from the training data set;

模型迭代单元，用于依次将各数据组输入到原自然语言处理模型，以对原自然语言处理模型进行第一预设数量的模型迭代操作；A model iteration unit, configured to sequentially input each data group into the original natural language processing model, so as to perform a first preset number of model iteration operations on the original natural language processing model;

梯度值获得单元，用于针对每个数据组，对原自然语言处理模型进行前向计算和反向传播，得到原自然语言处理模型中每个模型参数的梯度值；The gradient value obtaining unit is used to perform forward calculation and backpropagation on the original natural language processing model for each data group, and obtain the gradient value of each model parameter in the original natural language processing model;

有效次数获得单元，用于利用指示函数对各次模型迭代中各梯度值中的非零值进行统计操作，得到各模型参数分别在各次模型迭代中对应的梯度更新有效次数；The effective number of times obtaining unit is used to perform statistical operations on the non-zero values in each gradient value in each model iteration by using the indicator function, and obtain the corresponding gradient update effective times of each model parameter in each model iteration;

更新频率获得单元，用于分别计算各梯度更新有效次数占预设次数的比例，得到各模型参数分别对应的更新频率。The update frequency acquisition unit is used to calculate the ratio of the effective number of update times of each gradient to the preset number of times, and obtain the update frequency corresponding to each model parameter.

在本发明的一种具体实施方式中，参数组获得子模块包括：In a specific embodiment of the present invention, the parameter set obtaining submodule includes:

参数输入单元，用于将各模型参数输入到预置可排序容器中；A parameter input unit, configured to input each model parameter into a preset sortable container;

排序结果获得单元，用于利用排序容器根据各模型参数分别对应的更新频率的大小对各模型参数进行排序操作，得到排序结果；The sorting result obtaining unit is used to use the sorting container to sort each model parameter according to the size of the update frequency corresponding to each model parameter, and obtain the sorting result;

参数组划分单元，用于根据排序结果将各模型参数划分为第二预设数量的模型参数组。A parameter group dividing unit, configured to divide each model parameter into a second preset number of model parameter groups according to the sorting result.

在本发明的一种具体实施方式中，同步间隔确定子模块包括：In a specific embodiment of the present invention, the synchronization interval determination submodule includes:

平均更新频率计算单元，用于针对每个模型参数组，计算模型参数组中各模型参数的更新频率的平均更新频率；An average update frequency calculation unit, for each model parameter group, to calculate the average update frequency of the update frequencies of the model parameters in the model parameter group;

同步间隔计算单元，用于根据各平均更新频率计算各模型参数组分别的同步间隔。The synchronization interval calculation unit is configured to calculate the respective synchronization intervals of each model parameter group according to each average update frequency.

在本发明的一种具体实施方式中，参数同步子模块包括：In a specific embodiment of the present invention, the parameter synchronization submodule includes:

节点选取单元，用于从各训练节点中选取目标训练节点；A node selection unit is used to select a target training node from each training node;

参数初始化单元，用于对各目标训练节点内原自然语言处理模型中各模型参数进行初始化，得到初始化结果；The parameter initialization unit is used to initialize each model parameter in the original natural language processing model in each target training node, and obtain the initialization result;

广播单元，用于利用目标训练节点将初始化结果广播至除目标训练节点之外的其他训练节点；A broadcast unit, configured to use the target training node to broadcast the initialization result to other training nodes except the target training node;

迭代次数初始化单元，用于对模型迭代次数进行初始化；The number of iterations initialization unit is used to initialize the number of iterations of the model;

取余单元，用于在模型迭代过程中，分别利用各同步间隔对每个模型迭代次数累加结果进行取余操作；A remainder unit is used to perform a remainder operation on the accumulation result of each model iteration number by using each synchronization interval during the model iteration process;

参数同步单元，用于对余数为零的同步间隔对应的目标模型参数组内各模型参数进行同步操作。The parameter synchronization unit is configured to perform a synchronization operation on each model parameter in the target model parameter group corresponding to the synchronization interval whose remainder is zero.

在本发明的一种具体实施方式中，参数同步单元包括：In a specific implementation manner of the present invention, the parameter synchronization unit includes:

参数组获取子单元，用于分别获取各训练节点中的目标模型参数组；The parameter group obtaining subunit is used to respectively obtain the target model parameter group in each training node;

参数平均值获得子单元，用于分别对各目标模型参数组中对应各模型参数进行均值计算，得到各平均模型参数；The parameter average value obtaining subunit is used to calculate the average value of each corresponding model parameter in each target model parameter group to obtain each average model parameter;

参数设置子单元，用于将各训练节点中的目标模型参数组中的各模型参数设置为各平均模型参数。The parameter setting subunit is used to set each model parameter in the target model parameter group in each training node as each average model parameter.

相应于上面的方法实施例，参见图4，图4为本发明所提供的自然语言处理设备的示意图，该设备可以包括：Corresponding to the above method embodiment, see FIG. 4, which is a schematic diagram of a natural language processing device provided by the present invention, which may include:

存储器41，用于存储计算机程序；Memory 41, used to store computer programs;

处理器42，用于执行上述存储器41存储的计算机程序时可实现如下步骤：When the processor 42 is used to execute the computer program stored in the memory 41, the following steps can be realized:

接收待处理自然语言信息；将待处理自然语言信息输入到目标自然语言处理模型中；其中，目标自然语言处理模型为通过模型平均的模型聚合算法进行分布式训练得到；利用目标自然语言处理模型对待处理自然语言信息进行相应的自然语言理解或自然语言生成操作。Receive the natural language information to be processed; input the natural language information to be processed into the target natural language processing model; wherein, the target natural language processing model is obtained through distributed training through the model averaging model aggregation algorithm; use the target natural language processing model to treat Process natural language information to perform corresponding natural language understanding or natural language generation operations.

对于本发明提供的设备的介绍请参照上述方法实施例，本发明在此不做赘述。For the introduction of the device provided by the present invention, please refer to the above method embodiment, and the present invention will not repeat it here.

相应于上面的方法实施例，本发明还提供一种计算机可读存储介质，计算机可读存储介质上存储有计算机程序，计算机程序被处理器执行时可实现如下步骤：Corresponding to the above method embodiments, the present invention also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the following steps can be implemented:

该计算机可读存储介质可以包括：U盘、移动硬盘、只读存储器(Read-OnlyMemory，ROM)、随机存取存储器(Random Access Memory，RAM)、磁碟或者光盘等各种可以存储程序代码的介质。The computer-readable storage medium may include: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk, etc., which can store program codes. medium.

对于本发明提供的计算机可读存储介质的介绍请参照上述方法实施例，本发明在此不做赘述。For the introduction of the computer-readable storage medium provided by the present invention, please refer to the foregoing method embodiments, and the present invention will not repeat them here.

本说明书中各个实施例采用递进的方式描述，每个实施例重点说明的都是与其它实施例的不同之处，各个实施例之间相同或相似部分互相参见即可。对于实施例公开的装置、设备及计算机可读存储介质而言，由于其与实施例公开的方法相对应，所以描述的比较简单，相关之处参见方法部分说明即可。Each embodiment in this specification is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same or similar parts of each embodiment can be referred to each other. As for the device, equipment and computer-readable storage medium disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and for relevant details, please refer to the description of the method part.

本文中应用了具体个例对本发明的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本发明的技术方案及其核心思想。应当指出，对于本技术领域的普通技术人员来说，在不脱离本发明原理的前提下，还可以对本发明进行若干改进和修饰，这些改进和修饰也落入本发明权利要求的保护范围内。In this paper, specific examples are used to illustrate the principles and implementation methods of the present invention, and the descriptions of the above embodiments are only used to help understand the technical solutions and core ideas of the present invention. It should be pointed out that for those skilled in the art, without departing from the principle of the present invention, some improvements and modifications can be made to the present invention, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims

1. A natural language processing method, characterized in that, comprising:

Receive natural language information to be processed;

Inputting the natural language information to be processed into a target natural language processing model; wherein, the target natural language processing model is obtained by performing distributed training through a model averaging model aggregation algorithm;

Using the target natural language processing model to perform corresponding natural language understanding or natural language generation operations on the to-be-processed natural language information;

Wherein, the process of obtaining the target natural language processing model by performing model training through the model averaging model aggregation algorithm includes:

Preprocessing the original natural language processing model to obtain the update frequency of each model parameter in the original natural language processing model;

performing a grouping operation on each of the model parameters according to each of the update frequencies to obtain each of the model parameter groups;

Respectively determine the synchronization interval of each of the model parameter groups;

During the model iteration process, perform synchronous operations on each of the model parameters in the corresponding model parameter groups of the original natural language processing model in each training node according to each of the synchronization intervals, to obtain the target natural language processing model;

Wherein, each of the model parameters is grouped according to each of the update frequencies to obtain each of the model parameter groups, including:

inputting each of said model parameters into a preset sortable container;

Using the sorting container to perform a sorting operation on each of the model parameters according to the magnitude of the update frequency corresponding to each of the model parameters, to obtain a sorting result;

Divide each of the model parameters into a second preset number of model parameter groups according to the ranking result.

2. The natural language processing method according to claim 1, wherein the original natural language processing model is preprocessed to obtain the update frequency of each model parameter in the original natural language processing model, including:

selecting a first preset number of data sets from the training data set;

sequentially input each of the data groups into the original natural language processing model, so as to perform the first preset number of model iteration operations on the original natural language processing model;

For each data group, perform forward calculation and backpropagation on the original natural language processing model to obtain the gradient value of each model parameter in the original natural language processing model;

Using an indicator function to perform statistical operations on the non-zero values in each of the gradient values in each model iteration, to obtain the effective number of gradient updates corresponding to each of the model parameters in each model iteration;

Calculating the ratio of each effective number of gradient updates to the preset number of times to obtain update frequencies corresponding to each of the model parameters.

3. the natural language processing method according to claim 1, is characterized in that, determines the synchronous interval of each described model parameter group respectively, comprises:

For each model parameter group, calculate the average update frequency of the update frequencies of the model parameters in the model parameter group;

The respective synchronization intervals of each of the model parameter groups are calculated according to each of the average update frequencies.

4. according to the natural language processing method described in any one of claim 1 to 3, it is characterized in that, in model iterative process, according to each described synchronization interval to the corresponding model of the original natural language processing model in each training node The model parameters in the parameter group are synchronously operated, including:

selecting a target training node from each of the training nodes;

Initializing each model parameter in the original natural language processing model in each of the target training nodes to obtain an initialization result;

using the target training node to broadcast the initialization result to other training nodes except the target training node;

Initialize the number of model iterations;

During the model iteration process, each of the synchronization intervals is used to perform a remainder operation on the accumulation result of each model iteration number;

A synchronization operation is performed on each of the model parameters in the target model parameter group corresponding to the synchronization interval whose remainder is zero.

5. the natural language processing method according to claim 4, is characterized in that, carry out synchronous operation to each described model parameter in the target model parameter group corresponding to the synchronous interval that remainder is zero, comprising:

Obtaining target model parameter groups in each of the training nodes respectively;

Carrying out mean calculations for each of the model parameters corresponding to each of the target model parameter groups to obtain each of the average model parameters;

Each model parameter in the target model parameter group in each of the training nodes is set as each of the average model parameters.

6. A natural language processing device, characterized in that, comprising:

An information receiving module, configured to receive natural language information to be processed;

An information input module, configured to input the natural language information to be processed into a target natural language processing model; wherein, the target natural language processing model is obtained by performing distributed training through a model averaging model aggregation algorithm;

An information processing module, configured to use the target natural language processing model to perform corresponding natural language understanding or natural language generation operations on the to-be-processed natural language information;

Wherein, the device includes a model training module, and the model training module includes:

The parameter update frequency acquisition sub-module is used to preprocess the original natural language processing model to obtain the update frequency of each model parameter in the original natural language processing model;

The parameter group acquisition sub-module is used to group each model parameter according to each update frequency to obtain each model parameter group;

The synchronization interval determination submodule is used to determine the synchronization interval of each model parameter group respectively;

The parameter synchronization sub-module is used for synchronizing each model parameter in the corresponding model parameter group of the original natural language processing model in each training node according to each synchronization interval during the model iteration process to obtain the target natural language processing model;

Wherein, the parameter group obtaining submodule performs grouping operation on each of the model parameters according to each of the update frequencies to obtain each model parameter group, including: inputting each of the model parameters into a preset sortable container; Using the sorting container to perform a sorting operation on each of the model parameters according to the update frequency corresponding to each of the model parameters to obtain a sorting result; divide each of the model parameters into a second preset number according to the sorting result set of model parameters.

7. A natural language processing device, comprising:

memory for storing computer programs;

A processor configured to implement the steps of the natural language processing method according to any one of claims 1 to 5 when executing the computer program.

8. A computer-readable storage medium, characterized in that, a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the natural language described in any one of claims 1 to 5 is realized. The steps of the processing method.