CN117034926A

CN117034926A - Model training method and device for multi-field text classification model

Info

Publication number: CN117034926A
Application number: CN202311038789.5A
Authority: CN
Inventors: 杨仁杰; 王智君; 魏一雄; 王聪; 曹靖楠
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2023-08-16
Filing date: 2023-08-16
Publication date: 2023-11-10

Abstract

This specification discloses a method and a device for model training of a multi-domain text classification model. During training, the text classification model can learn the latent associations among the classification domains, so that it can give more accurate classification results. Because the text classification model comprises a plurality of sub-classification layers, a single trained model can classify text data in each classification domain, which greatly reduces the training cost.

Description

A method and device for model training of multi-domain text classification models

Technical field

This specification relates to the fields of computer technology and text processing, and in particular to a method and a device for model training of a multi-domain text classification model.

Background

With the continuous development of computer technology, artificial intelligence has been widely applied in various business scenarios. The business data involved in a business scenario is input into a pre-trained model, and the business is executed based on the result output by the model, which greatly improves business execution efficiency in these scenarios.

For example, in a text classification scenario, text data usually needs to be input into a pre-trained text classification model, and the result output by the model can serve purposes such as information recommendation and security early warning.

However, at present a text classification model is trained only with sample text data from a single classification scenario, so the trained model can usually give a classification result for the input text data in only one classification scenario. To obtain classification results in multiple classification scenarios, a separate text classification model has to be built and trained for each scenario, which greatly increases the training cost. Moreover, a piece of text data may contain content belonging to several classification domains, yet text classification models usually do not attend to the latent connections between these domains, so the accuracy of the classification results they give is also relatively low.

Summary of the invention

This specification provides a method and a device for model training of a multi-domain text classification model, so as to partially solve the above problems in the prior art.

This specification adopts the following technical solutions:

This specification provides a method for model training of a multi-domain text classification model, including:

obtaining sample text data;

inputting the sample text data into an encoding layer of a text classification model to be trained, so that the encoding layer encodes the sample text data to obtain a first encoding vector corresponding to the sample text data;

inputting the first encoding vector into each expert network layer included in the text classification model, so as to determine, for each expert network layer, a second encoding vector output by that expert network layer for the first encoding vector;

inputting the second encoding vector output by each expert network layer into each sub-classification layer included in the text classification model, so that each sub-classification layer processes the second encoding vectors that the expert network layers output and that are input into it, to obtain a classification result in the classification domain corresponding to that sub-classification layer as the classification result corresponding to that sub-classification layer;

training the text classification model with the optimization objective of minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification domain corresponding to that sub-classification layer.

Optionally, for each sub-classification layer, the text classification model further includes a weight allocation layer for that sub-classification layer;

inputting the second encoding vector output by each expert network layer into each sub-classification layer included in the text classification model, so that each sub-classification layer processes the second encoding vectors that the expert network layers output and that are input into it, to obtain a classification result in the classification domain corresponding to that sub-classification layer as the classification result corresponding to that sub-classification layer, specifically includes:

inputting the second encoding vector output by each expert network layer into each sub-classification layer included in the text classification model, so that, for each sub-classification layer, the weight allocation layer set for that sub-classification layer weights the second encoding vector output by each expert network layer to obtain a weighted encoding vector of each expert network layer, and the weighted encoding vector of each expert network layer is input into that sub-classification layer, which outputs the classification result in its corresponding classification domain as the classification result corresponding to that sub-classification layer.
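The weighting performed by a weight allocation layer can be sketched as follows. This is only an illustration under assumptions that the specification does not state at this point: the weights are taken to be a softmax over scores computed from the first encoding vector, and the sub-classification layer is taken to consume the element-wise sum of the weighted vectors. The names `weight_allocation` and `gate_matrix` are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weight_allocation(first_vec, expert_outputs, gate_matrix):
    """Scale each expert's second encoding vector by a per-expert weight.

    gate_matrix (hypothetical) has one row per expert; the dot product of a
    row with first_vec is that expert's score for this sub-classification layer.
    """
    scores = [sum(w * x for w, x in zip(row, first_vec)) for row in gate_matrix]
    weights = softmax(scores)
    weighted = [[w * v for v in vec] for w, vec in zip(weights, expert_outputs)]
    return weights, weighted

first_vec = [1.0, 0.0]
expert_outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # three experts
gate_matrix = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]       # all-zero gate gives uniform weights
weights, weighted = weight_allocation(first_vec, expert_outputs, gate_matrix)

# Assumption: the weighted vectors are summed before entering the sub-classification layer.
fused = [sum(col) for col in zip(*weighted)]
```

Because each sub-classification layer would have its own `gate_matrix`, different classification domains can rely on the same experts to different degrees.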

Optionally, obtaining sample text data specifically includes:

obtaining initial sample text data;

identifying invalid words contained in the initial sample text data, and removing the invalid words from the initial sample text data to obtain transitional sample text data;

truncating the transitional sample text data to a specified text length to obtain the sample text data to be input into the text classification model.

Optionally, training the text classification model with the optimization objective of minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification domain corresponding to that sub-classification layer specifically includes:

for each round of training, determining a subset of the network layers of the text classification model, and fixing the network parameters of that subset of network layers during the round;

adjusting the network parameters of the network layers of the text classification model other than that subset, with the optimization objective of minimizing the deviation between the classification result corresponding to each sub-classification layer in the round and the actual classification result of the sample text data in the classification domain corresponding to that sub-classification layer, so as to perform the round of training of the text classification model.
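A minimal sketch of one such round is given below. It assumes plain gradient descent and a random choice of the fixed layers; the specification does not say how the subset is selected, and the layer names, gradients, and learning rate are made up for illustration.

```python
import random

def pick_frozen(layer_names, k, rng):
    """Choose k layers to hold fixed this round (random choice is an assumption)."""
    return set(rng.sample(sorted(layer_names), k))

def train_round(params, grads, frozen, lr=0.1):
    """Apply one gradient-descent step to every layer not listed in `frozen`."""
    return {
        name: list(value) if name in frozen
        else [v - lr * g for v, g in zip(value, grads[name])]
        for name, value in params.items()
    }

# Toy parameters and gradients for three hypothetical layers.
params = {"encoder": [1.0, 1.0], "expert_1": [1.0], "head_A": [1.0]}
grads = {"encoder": [1.0, 1.0], "expert_1": [1.0], "head_A": [2.0]}

# Fix the encoder for this round; only the other layers are adjusted.
updated = train_round(params, grads, frozen={"encoder"})

# In a later round a different subset could be fixed.
next_frozen = pick_frozen(params, 1, random.Random(0))
```

Fixing part of the network in each round reduces the number of parameters updated per step, which is one plausible reading of why the option is offered.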

This specification provides a device for model training of a multi-domain text classification model, including:

an acquisition module, configured to obtain sample text data;

a first encoding module, configured to input the sample text data into an encoding layer of a text classification model to be trained, so that the encoding layer encodes the sample text data to obtain a first encoding vector corresponding to the sample text data;

a second encoding module, configured to input the first encoding vector into each expert network layer included in the text classification model, so as to determine, for each expert network layer, a second encoding vector output by that expert network layer for the first encoding vector;

an output module, configured to input the second encoding vector output by each expert network layer into each sub-classification layer included in the text classification model, so that each sub-classification layer processes the second encoding vectors that the expert network layers output and that are input into it, to obtain a classification result in the classification domain corresponding to that sub-classification layer as the classification result corresponding to that sub-classification layer;

a training module, configured to train the text classification model with the optimization objective of minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification domain corresponding to that sub-classification layer.

The output module is specifically configured to input the second encoding vector output by each expert network layer into each sub-classification layer included in the text classification model, so that, for each sub-classification layer, the weight allocation layer set for that sub-classification layer weights the second encoding vector output by each expert network layer to obtain a weighted encoding vector of each expert network layer, and the weighted encoding vector of each expert network layer is input into that sub-classification layer, which outputs the classification result in its corresponding classification domain as the classification result corresponding to that sub-classification layer.

Optionally, the acquisition module is specifically configured to: obtain initial sample text data; identify invalid words contained in the initial sample text data and remove the invalid words from the initial sample text data to obtain transitional sample text data; and truncate the transitional sample text data to a specified text length to obtain the sample text data to be input into the text classification model.

Optionally, the training module is specifically configured to: for each round of training, determine a subset of the network layers of the text classification model and fix the network parameters of that subset during the round; and adjust the network parameters of the network layers of the text classification model other than that subset, with the optimization objective of minimizing the deviation between the classification result corresponding to each sub-classification layer in the round and the actual classification result of the sample text data in the classification domain corresponding to that sub-classification layer, so as to perform the round of training of the text classification model.

This specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for model training of a multi-domain text classification model.

This specification provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the above method for model training of a multi-domain text classification model.

At least one of the above technical solutions adopted in this specification can achieve the following beneficial effects:

In the method for model training of a multi-domain text classification model provided in this specification, the text classification model includes an encoding layer, a plurality of expert network layers, and a plurality of sub-classification layers, each sub-classification layer giving the classification result in one classification domain. After the sample text data is obtained, it can be input into the encoding layer of the text classification model to obtain the first encoding vector corresponding to the sample text data. The first encoding vector can then be input into each expert network layer included in the text classification model, and each expert network layer gives its own second encoding vector. Further, the second encoding vector output by each expert network layer is input into each sub-classification layer included in the text classification model, so that each sub-classification layer processes the second encoding vectors that the expert network layers output and that are input into it, to obtain the classification result in the classification domain corresponding to that sub-classification layer as the classification result corresponding to that sub-classification layer. Finally, the text classification model can be trained with the optimization objective of minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification domain corresponding to that sub-classification layer.

It can be seen from the above method that the expert network layers first give their respective second encoding vectors, the second encoding vector given by each expert network layer is then input into every sub-classification layer, and the text classification model is trained with the goal of minimizing the deviation between the classification result given by each sub-classification layer and the classification label corresponding to that sub-classification layer. As a result, during training the text classification model can learn the latent associations among the classification domains and can therefore give more accurate classification results. In addition, because the text classification model contains a plurality of sub-classification layers, the trained model can classify text data in each classification domain, which greatly reduces the training cost.

Description of the drawings

The drawings described here are provided for a further understanding of this specification and constitute a part of it. The illustrative embodiments of this specification and their descriptions are used to explain this specification and do not constitute an improper limitation of it. In the drawings:

Figure 1 is a schematic flowchart of a method for model training of a multi-domain text classification model provided by an embodiment of this specification;

Figure 2 is a schematic network diagram of a text classification model provided by an embodiment of this specification;

Figure 3 is a schematic network diagram of a text classification model provided by an embodiment of this specification;

Figure 4 is a schematic structural diagram of a device for model training of a multi-domain text classification model provided by an embodiment of this specification;

Figure 5 is a schematic structural diagram of an electronic device provided by an embodiment of this specification.

Detailed description

To make the purposes, technical solutions, and advantages of this specification clearer, the technical solutions of this specification are described clearly and completely below with reference to specific embodiments of this specification and the corresponding drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this specification without creative effort fall within the scope of protection of this specification.

The technical solutions provided by the embodiments of this specification are described in detail below with reference to the drawings.

Figure 1 is a schematic flowchart of a method for model training of a multi-domain text classification model provided by an embodiment of this specification, which specifically includes the following steps:

S101: Obtain sample text data.

In the embodiments of this specification, the model training method may be executed by a server or by a terminal device such as a laptop or desktop computer. For ease of description, the method is explained below with a terminal device as the executing entity.

In order to train a model that can classify text in multiple classification domains, in this specification the terminal device needs to obtain sample text data, which may be taken from a pre-built training sample set. The obtained sample text data may target a variety of practical application scenarios. For example, in the field of information recommendation, a user's text data can be used to determine the user's preference classification results in business scenarios such as travel, food, and entertainment. A preference classification result indicates the user's degree of preference for the business objects included in one business scenario. For the food scenario, for instance, the preference classification result can indicate which cuisine the user is more interested in. In subsequent processing, information can then be recommended to the user in each business scenario according to the determined preference classification results. In this scenario, where the user's specific preferences are explicitly known, the obtained user text data can serve as the sample text data required for training.

As another example, in the field of information early warning, a user can submit a question text describing a problem encountered, and the terminal device can classify the question text with the text classification model to determine its classification result under each of several classification criteria. The classification result of the question text under one classification criterion indicates which question category under that criterion the question text belongs to. In subsequent processing, the question text can be forwarded, according to the determined classification results, to the staff member responsible for handling the specified kind of problem, who then answers the user's question. In this scenario, where the question category of the question text is explicitly known, the obtained question text can serve as the sample text data required for training.

It should be noted that many practical scenarios involve text classification. The two application fields above are given only as examples, and other fields are not enumerated here.

Further, the sample text data obtained by the terminal device may contain grammatical errors or useless words. To ensure the training efficiency and training effect of the model, the obtained sample text data can be cleaned, and the cleaned sample text data is used as training samples to train the text classification model.

Specifically, in this specification, the terminal device can obtain initial sample text data, identify the invalid words contained in it, and remove the identified invalid words from the initial sample text data to obtain transitional sample text data.

The terminal device may identify and remove the invalid words contained in the initial sample text data by means such as regular-expression processing and stop-word processing.

Further, because the text classification model to be trained usually prescribes the length of its input data, the terminal device needs to truncate the transitional sample text data to the specified text length, thereby obtaining the sample text data to be input into the text classification model.
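The cleaning and truncation steps above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the stop-word list, the regular expressions, the whitespace tokenization, and the length limit of 8 tokens are all assumptions chosen for the example.

```python
import re

# Hypothetical stop-word list and length limit; the specification gives neither.
STOP_WORDS = {"the", "a", "an", "of", "um", "uh"}
MAX_TOKENS = 8

def clean_and_truncate(raw_text, stop_words=STOP_WORDS, max_tokens=MAX_TOKENS):
    """Remove invalid words, then cut the text to a fixed number of tokens."""
    # Regular-expression pass: drop URLs, then every character that is
    # neither a word character nor whitespace.
    text = re.sub(r"https?://\S+", " ", raw_text)
    text = re.sub(r"[^\w\s]", " ", text)
    # Stop-word pass over lower-cased, whitespace-separated tokens.
    tokens = [t for t in text.lower().split() if t not in stop_words]
    # Truncate the transitional sample to the specified length.
    return tokens[:max_tokens]

sample = clean_and_truncate(
    "Um, the pasta at https://example.com was great!!! "
    "Would order again, and again, and again."
)
```

For Chinese text, a word segmenter would replace the whitespace split, and the specified length would typically follow the input limit of the encoding layer.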

S102: Input the sample text data into the encoding layer of the text classification model to be trained, so that the encoding layer encodes the sample text data to obtain the first encoding vector corresponding to the sample text data.

S103: Input the first encoding vector into each expert network layer included in the text classification model, so as to determine, for each expert network layer, the second encoding vector output by that expert network layer for the first encoding vector.

S104: Input the second encoding vector output by each expert network layer into each sub-classification layer included in the text classification model, so that each sub-classification layer processes the second encoding vectors that the expert network layers output and that are input into it, to obtain the classification result in the classification domain corresponding to that sub-classification layer as the classification result corresponding to that sub-classification layer.

In this specification, the text classification model contains one encoding layer, whose main function is to encode the input sample text data into encoding vectors that the other network layers of the model can process. In addition, the text classification model contains a plurality of sub-classification layers, each regarded as corresponding to one classification domain. The text classification model provided in this specification can therefore classify text data in multiple classification domains. Compared with the prior art, only one text classification model needs to be trained to classify text in multiple domains, which effectively reduces the training and maintenance costs of the model.

Further, the text classification model in this specification also has several expert network layers. Each expert network layer can be understood as giving its own classification suggestion for the text data input into the text classification model. The classification suggestions given by the expert network layers are all gathered in each sub-classification layer, and each sub-classification layer, taking into account the classification characteristics of its own classification domain and referring to the suggestions of the expert network layers, outputs the corresponding classification result.

The so-called classification suggestions given by the expert network layers are in fact the result of further processing the encoding vector output by the encoding layer. The data obtained from this processing implicitly carries the classification knowledge that the expert network layers have learned during model training.

Based on this, in this specification, after the terminal device inputs the obtained sample text data into the encoding layer of the text classification model to be trained, the encoding layer encodes the sample text data to obtain the first encoding vector corresponding to the sample text data.

The encoding layer then inputs the first encoding vector into each expert network layer included in the text classification model. Each expert network layer, drawing on the knowledge it has learned during training (knowledge that is realized by adjusting the network parameters of the expert network layer during training), further processes the first encoding vector to determine the second encoding vector that it outputs for the first encoding vector.

Each expert network layer then inputs its second encoding vector into every sub-classification layer, that is, each sub-classification layer receives the second encoding vectors output by all expert network layers. Each sub-classification layer processes the second encoding vectors input into it to obtain the classification result in its corresponding classification domain, which serves as the classification result corresponding to that sub-classification layer.

It can be seen from the above that each sub-classification layer gives its own classification result based on the second encoding vectors output by all expert network layers. Compared with a structure having a single expert network layer, this model structure helps make the classification result output by each sub-classification layer as accurate as possible.
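The data flow of steps S102 to S104 can be sketched in a few lines of plain Python. The sketch rests on several assumptions that the specification does not fix: the encoding layer is a mean of token embeddings, each expert network layer is a single linear map followed by ReLU, each sub-classification layer concatenates the expert outputs before a linear map and softmax, and all weights, domain names, and class counts are made up.

```python
import math

def linear(matrix, vec):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(token_ids, model):
    # Encoding layer: the mean of the token embeddings is the first encoding vector.
    embedded = [model["embedding"][t] for t in token_ids]
    first_vec = [sum(col) / len(embedded) for col in zip(*embedded)]
    # Every expert network layer maps the same first vector to its own second vector.
    second_vecs = [[max(0.0, v) for v in linear(w, first_vec)] for w in model["experts"]]
    # Every sub-classification layer receives the second vectors of all experts.
    joined = [v for vec in second_vecs for v in vec]
    return {domain: softmax(linear(w, joined)) for domain, w in model["heads"].items()}

# Toy model: 3-token vocabulary, 2 experts, 2 classification domains.
model = {
    "embedding": [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]],
    "experts": [[[1.0, 0.0], [0.0, 1.0]], [[0.5, -0.5], [-0.5, 0.5]]],
    "heads": {
        "food": [[0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.0, 0.0], [-0.1, 0.1, -0.1, 0.1]],
        "travel": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    },
}
probs = forward([0, 2, 1], model)
```

Here `probs` holds one probability distribution per classification domain, and the number of experts (2) is independent of the number of sub-classification layers (2 in this toy case, but it could differ).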

S105：以最小化每个子分类层对应的分类结果与所述样本文本数据在每个子分类层对应的分类领域下所对应的实际分类结果之间的偏差为优化目标，对所述文本分类模型进行训练。S105: With the optimization goal of minimizing the deviation between the classification results corresponding to each sub-classification layer and the actual classification results corresponding to the sample text data in the classification field corresponding to each sub-classification layer, perform the optimization on the text classification model train.

After each sub-classification layer outputs its classification result, the terminal device can determine a loss value for each sub-classification layer from the deviation between that layer's classification result and the actual classification result of the sample text data in the corresponding classification field, and then sum these loss values to obtain the total loss. Finally, the terminal device can train the text classification model by minimizing this total loss, as shown in Figure 2.

Figure 2 is a schematic network diagram of a text classification model provided by an embodiment of this specification.

The text classification model shown in Figure 2 has one encoding layer, three expert network layers, and three sub-classification layers, with the three sub-classification layers corresponding to three different classification fields. Note that there is no strict correspondence between the number of expert network layers and the number of sub-classification layers.

After the terminal device obtains the first encoding vector of the sample text data through the encoding layer, the encoding layer feeds this vector into each of the three expert network layers. The three expert network layers then feed their respective second encoding vectors into the three sub-classification layers. For example, the second encoding vectors output by expert network layer 1, expert network layer 2, and expert network layer 3 all converge on sub-classification layer A.

Each sub-classification layer gives a classification result in the classification field to which it belongs. The terminal device then determines the loss value for that sub-classification layer from this classification result and the predetermined actual classification result of the sample text data in that classification field (that is, the classification label in that field). For example, loss value a can be determined from the classification result output by sub-classification layer A.

Finally, the terminal device sums loss value a, loss value b, and loss value c to obtain the total loss, and trains the text classification model with minimizing this total loss as the optimization goal.
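The summed loss can be illustrated with a short sketch. The text does not name a particular loss function, so cross-entropy is used here purely as an assumption, and the probabilities, labels, and function names are hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class (assumed loss)."""
    return -math.log(probs[label])

def total_loss(results, labels):
    """One loss value per sub-classification layer, then their sum."""
    per_head = {name: cross_entropy(results[name], labels[name]) for name in results}
    return per_head, sum(per_head.values())

# Hypothetical outputs of sub-classification layers A, B and C for one sample.
results = {"A": [0.5, 0.5], "B": [0.25, 0.25, 0.5], "C": [1.0, 0.0]}
# Hypothetical classification labels of that sample in each classification field.
labels = {"A": 0, "B": 2, "C": 0}

per_head, total = total_loss(results, labels)
```

Because a single scalar is minimized, the gradient reaching the shared encoding layer and the expert network layers combines the contributions of every classification field, which is how the shared layers come to reflect all fields at once.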

Two points follow from the above. First, the input of any sub-classification layer comes from the outputs of multiple expert network layers, which helps ensure the accuracy of that layer's output. Second, because the loss values of all sub-classification layers are summed and the text classification model is trained on the summed loss, the model can learn the potential associations between the classification fields during training. As a result, neither the encoding layer, the expert network layers, nor the sub-classification layers produce their outputs from the standpoint of a single classification field. In other words, every network layer in the text classification model takes into account both the characteristics of each classification field and the potential connections between the fields, and so gives more accurate and reasonable results.

Further, to ensure the training effect so that the trained text classification model gives more accurate and reasonable classification results, the text classification model in this specification may also be provided with multiple weight distribution layers. The number of weight distribution layers may match the number of sub-classification layers; that is, the model is provided with as many weight distribution layers as sub-classification layers.

The main role of a weight distribution layer is to give the proportion (that is, the weight) with which its sub-classification layer should refer to the output of each expert network layer in the classification field to which that sub-classification layer belongs, so that the sub-classification layer can give a classification result better suited to its own classification field.

Specifically, once the second encoding vectors output by the expert network layers are obtained, they can be input to each sub-classification layer included in the text classification model. For each sub-classification layer, the weight distribution layer set for that sub-classification layer weights the second encoding vector output by each expert network layer to obtain a weighted encoding vector for each expert network layer. These weighted encoding vectors are input into the sub-classification layer, which outputs the classification result in its corresponding classification field as the classification result corresponding to that sub-classification layer, as shown in Figure 3.

Figure 3 is a schematic network diagram of a text classification model provided by an embodiment of this specification.

As Figure 3 shows, the text classification model has a weight distribution layer A for sub-classification layer A. The first encoding vector output by the encoding layer can be input into weight distribution layer A, which determines from it the weight to be given, in the classification field corresponding to sub-classification layer A, to the second encoding vector output by each expert network layer. These weights are then applied to the corresponding second encoding vectors to obtain a weighted encoding vector for each expert network layer, from which sub-classification layer A gives its classification result.

Note that Figure 3 shows only the weight distribution layer corresponding to one sub-classification layer. The text classification model also has a weight distribution layer corresponding to sub-classification layer B and one corresponding to sub-classification layer C; they are simply not shown in Figure 3. Their function is the same as that of weight distribution layer A, so it is not described again here.
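The weighting step for one sub-classification layer might look like the sketch below. It assumes that the weight distribution layer is a single linear map from the first encoding vector to one score per expert network layer, normalized with softmax; neither detail is stated in the text, and the parameter values and names (`gate_A`, `gate_weights`, `apply_weights`) are hypothetical.

```python
import math

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def gate_weights(first_vec, gate_params):
    """Weight distribution layer: one score per expert from the first
    encoding vector, normalized so the weights sum to 1 (assumed)."""
    scores = [sum(w * x for w, x in zip(row, first_vec)) for row in gate_params]
    return softmax(scores)

def apply_weights(second_vecs, weights):
    """Scale each expert's second encoding vector by its weight."""
    return [[wt * a for a in vec] for vec, wt in zip(second_vecs, weights)]

first_vec = [1.0, 0.0]                      # stand-in first encoding vector
gate_A = [[0.0, 0.3],                       # hypothetical parameters of
          [math.log(2.0), -0.1],            # weight distribution layer A,
          [0.0, 0.5]]                       # one row per expert network layer
second_vecs = [[4.0, 8.0], [2.0, 2.0], [0.0, 4.0]]  # outputs of experts 1 to 3

weights_A = gate_weights(first_vec, gate_A)          # about [0.25, 0.5, 0.25]
weighted_A = apply_weights(second_vecs, weights_A)   # input to sub-classification layer A
```

Sub-classification layers B and C would each use their own parameter matrix in place of `gate_A`, so the same three expert outputs can be mixed differently for each classification field.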

Further, for the weight distribution layer in any classification field, the ability to output the weights with which to refer to each expert network layer's second encoding vector in that field is acquired by training the text classification model. That is, the training described above also continually adjusts the network parameters of the weight distribution layers, so that the weight distribution layers in the trained model give accurate and reasonable weights and thereby support the accuracy of the classification results output by each sub-classification layer.

Note also that, because the text classification model contains sub-classification layers for multiple classification fields, conventional training may struggle to make the network parameters converge quickly. For this reason, in this specification the network parameters of some network layers can be fixed while the network parameters of the remaining network layers are adjusted, to improve the training efficiency of the text classification model.

Specifically, the terminal device normally performs the model training task through multiple rounds of iterative training. For each round of training, some network layers can first be determined from the text classification model, and the network parameters of those layers are fixed for that round.

The terminal device can then adjust the network parameters of the network layers other than those fixed layers, with the optimization goal of minimizing the deviation between the classification result corresponding to each sub-classification layer in that round and the actual classification result of the sample text data in the classification field corresponding to that sub-classification layer, thereby carrying out that round of training of the text classification model.

Note that the terminal device can fix the network parameters of more network layers in the early training rounds and, as the number of rounds increases, gradually reduce the number of network layers whose parameters are fixed in each round.
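One possible way to organize such a schedule is sketched below. The text does not say which layers are fixed or how quickly the number shrinks, so the linear decay, the starting fraction, the choice of freezing layers in list order, and all layer names are assumptions made only for illustration.

```python
def frozen_layers(layer_names, round_idx, total_rounds, start_frac=0.5):
    """Return the set of layers whose parameters stay fixed in this round.

    The count starts at start_frac of all layers and decays linearly to
    zero by the last round (an assumed schedule).
    """
    remaining = 1.0 - round_idx / max(total_rounds - 1, 1)
    count = int(len(layer_names) * start_frac * remaining)
    return set(layer_names[:count])

# Hypothetical layer names for a model with three experts and three fields.
layers = ["encoder", "expert_1", "expert_2", "expert_3",
          "gate_A", "gate_B", "gate_C", "head_A", "head_B", "head_C"]

TOTAL_ROUNDS = 5
counts = []
for round_idx in range(TOTAL_ROUNDS):
    frozen = frozen_layers(layers, round_idx, TOTAL_ROUNDS)
    trainable = [name for name in layers if name not in frozen]
    counts.append(len(frozen))
    # In a real training loop, only the parameters of `trainable`
    # would be updated while minimizing the total loss for this round.
```

With these illustrative settings, five layers are fixed in the first round and none in the last, so the final rounds adjust the whole model.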

As the above method shows, multiple expert network layers first give their respective second encoding vectors, these vectors are then input into each sub-classification layer, and the text classification model is trained with the goal of minimizing the deviation between the classification result given by each sub-classification layer and that layer's classification label. During training, the text classification model can therefore learn the potential associations among the classification fields, which enables it to give more accurate classification results. Moreover, because the model contains multiple sub-classification layers, once it is trained it can classify text data in every classification field, which greatly reduces training cost.

The above is the method for model training of a multi-domain text classification model provided by one or more embodiments of this specification. Based on the same idea, this specification also provides a corresponding device for model training of a multi-domain text classification model, as shown in Figure 4.

Figure 4 is a schematic structural diagram of a device for model training of a multi-domain text classification model provided by an embodiment of this specification. The device specifically includes:

an acquisition module 401, configured to acquire sample text data;

a first encoding module 402, configured to input the sample text data into the encoding layer of the text classification model to be trained, so as to encode the sample text data through the encoding layer and obtain the first encoding vector corresponding to the sample text data;

a second encoding module 403, configured to input the first encoding vector into each expert network layer included in the text classification model, so as to determine, for each expert network layer, the second encoding vector output by that expert network layer for the first encoding vector;

an output module 404, configured to input the second encoding vector output by each expert network layer into each sub-classification layer included in the text classification model, so that each sub-classification layer processes the second encoding vectors input to it from the expert network layers to obtain the classification result in the classification field corresponding to that sub-classification layer, as the classification result corresponding to that sub-classification layer;

a training module 405, configured to train the text classification model with the optimization goal of minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification field corresponding to that sub-classification layer.

The output module 404 is specifically configured to input the second encoding vector output by each expert network layer into each sub-classification layer included in the text classification model; for each sub-classification layer, to weight the second encoding vector output by each expert network layer through the weight distribution layer set for that sub-classification layer, obtaining a weighted encoding vector for each expert network layer; and to input the weighted encoding vectors into the sub-classification layer, so that the sub-classification layer outputs the classification result in its corresponding classification field as the classification result corresponding to that sub-classification layer.

Optionally, the acquisition module 401 is specifically configured to acquire initial sample text data; identify the invalid words contained in the initial sample text data and remove them from the initial sample text data to obtain transitional sample text data; and truncate the transitional sample text data to a specified text length to obtain the sample text data to be input into the text classification model.

Optionally, the training module 405 is specifically configured to, for each round of training, determine some network layers from the text classification model and fix the network parameters of those network layers for that round; and, with the optimization goal of minimizing the deviation between the classification result corresponding to each sub-classification layer in that round and the actual classification result of the sample text data in the classification field corresponding to that sub-classification layer, adjust the network parameters of the network layers in the text classification model other than those fixed network layers, so as to carry out that round of training of the text classification model.
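The preprocessing attributed to the acquisition module 401 above (removing invalid words, then truncating to a specified length) can be sketched as follows. Whitespace tokenization, measuring length in tokens, the lowercase comparison, and the example invalid-word list are all assumptions; the text does not define how invalid words are identified.

```python
def preprocess(initial_text, invalid_words, max_len):
    """Drop invalid words, then keep at most max_len tokens."""
    tokens = initial_text.split()  # assumed whitespace tokenization
    # Transitional sample text data: the initial text without invalid words.
    transitional = [t for t in tokens if t.lower() not in invalid_words]
    # Sample text data: the transitional text truncated to the specified length.
    return transitional[:max_len]

INVALID_WORDS = {"the", "a", "of", "um"}  # hypothetical invalid-word list
sample = preprocess("The price of a stock um rose sharply today", INVALID_WORDS, 4)
```

Removing invalid words before truncation means the length budget is spent on informative tokens rather than filler.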

This specification also provides a computer-readable storage medium storing a computer program that can be used to execute the method for model training of a multi-domain text classification model provided in Figure 1 above.

This specification also provides the schematic structural diagram of an electronic device shown in Figure 5. As shown in Figure 5, at the hardware level the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into memory and runs it to implement the method for model training of a multi-domain text classification model provided in Figure 1 above.

Of course, in addition to software implementations, this specification does not exclude other implementations, such as logic devices or a combination of software and hardware. That is, the execution subject of the following processing flow is not limited to individual logical units and may also be hardware or logic devices.

In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with hardware entity modules. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program the device themselves to "integrate" a digital system onto a single PLD, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development. The source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can readily be obtained simply by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.

A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by that (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means it includes for implementing various functions can be regarded as structures within the hardware component. The means for implementing various functions can even be regarded both as software modules implementing a method and as structures within a hardware component.

The systems, devices, modules, or units described in the above embodiments may be implemented by computer chips or entities, or by products having certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above device is described in terms of separate units divided by function. Of course, when implementing this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.

Those skilled in the art will understand that embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.

This specification is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

Memory may include forms of computer-readable media such as non-permanent memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

This specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. This specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for the relevant details, refer to the corresponding parts of the method embodiment.

The above are merely embodiments of this specification and are not intended to limit it. Those skilled in the art may make various modifications and changes to this specification. Any modification, equivalent replacement, or improvement made within the spirit and principles of this specification shall fall within the scope of the claims of this specification.

Claims

1. A model training method for a multi-domain text classification model, characterized by comprising:

acquiring sample text data;

inputting the sample text data into an encoding layer of a text classification model to be trained, so that the encoding layer encodes the sample text data to obtain a first encoding vector corresponding to the sample text data;

inputting the first encoding vector into each expert network layer included in the text classification model, so as to determine, for each expert network layer, a second encoding vector output by that expert network layer for the first encoding vector;

inputting the second encoding vectors output by the expert network layers into each sub-classification layer included in the text classification model, so that each sub-classification layer processes the second encoding vectors output by the expert network layers that are input to it, to obtain a classification result in the classification field corresponding to that sub-classification layer as the classification result corresponding to that sub-classification layer; and

training the text classification model with the optimization objective of minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification field corresponding to that sub-classification layer.
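The data flow recited in claim 1 can be illustrated with a minimal sketch in plain Python. It is not the implementation disclosed in this specification: the tiny dense layers, the `tanh` activations, the concatenation of expert outputs before each sub-classification layer, the cross-entropy deviation, and the field names `topic` and `sentiment` are all assumptions chosen only to keep the example self-contained.

```python
import math
import random

def linear(x, w, b):
    """Affine map; w holds one row of weights per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for a dense layer (illustrative only)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def forward(model, features):
    # Encoding layer: sample features -> first encoding vector.
    first = [math.tanh(v) for v in linear(features, *model["encoder"])]
    # Every expert network layer maps the first vector to a second vector.
    seconds = [[math.tanh(v) for v in linear(first, *expert)]
               for expert in model["experts"]]
    # Assumption: each sub-classification layer sees all expert outputs,
    # concatenated into one vector.
    joined = [v for second in seconds for v in second]
    return {field: softmax(linear(joined, *head))
            for field, head in model["heads"].items()}

def multi_field_loss(probs, labels):
    """Sum of per-field cross-entropy terms (assumed measure of deviation)."""
    return sum(-math.log(probs[field][labels[field]]) for field in labels)

rng = random.Random(0)
N_FEAT, N_HID, N_EXPERTS = 6, 4, 3
model = {
    "encoder": init_layer(rng, N_FEAT, N_HID),
    "experts": [init_layer(rng, N_HID, N_HID) for _ in range(N_EXPERTS)],
    "heads": {  # one sub-classification layer per hypothetical field
        "topic": init_layer(rng, N_HID * N_EXPERTS, 3),
        "sentiment": init_layer(rng, N_HID * N_EXPERTS, 2),
    },
}
features = [rng.random() for _ in range(N_FEAT)]  # stand-in for encoded text
probs = forward(model, features)
loss = multi_field_loss(probs, {"topic": 1, "sentiment": 0})
```

Training would then adjust the parameters to reduce `loss`, so that a single model is fitted to all classification fields at once.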

2. The method according to claim 1, characterized in that, for each sub-classification layer, the text classification model further includes a weight allocation layer set for that sub-classification layer;

inputting the second encoding vectors output by the expert network layers into each sub-classification layer included in the text classification model, so that each sub-classification layer processes the second encoding vectors output by the expert network layers that are input to it, to obtain a classification result in the classification field corresponding to that sub-classification layer as the classification result corresponding to that sub-classification layer, specifically comprises:

for each sub-classification layer, weighting the second encoding vectors output by the expert network layers through the weight allocation layer set for that sub-classification layer to obtain a weighted encoding vector for each expert network layer, and inputting the weighted encoding vectors of the expert network layers into that sub-classification layer, so that the sub-classification layer outputs a classification result in its corresponding classification field as the classification result corresponding to that sub-classification layer.
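As a hedged illustration of the weight allocation layer in claim 2, the sketch below gives each sub-classification layer its own gate that assigns one weight per expert network layer. Computing the weights as a softmax over a linear function of the first encoding vector, and summing the weighted vectors before the sub-classification layer, are assumptions in the style of a multi-gate mixture of experts; the claim itself only requires that the expert outputs be weighted per sub-classification layer. The gate matrices and field names are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def gate_weights(first_vec, gate_w):
    """One score per expert (one row of gate_w each), normalized to sum to 1."""
    scores = [sum(w * x for w, x in zip(row, first_vec)) for row in gate_w]
    return softmax(scores)

def weight_experts(expert_outputs, weights):
    """Scale each expert's second encoding vector by that expert's weight."""
    return [[w * v for v in vec] for w, vec in zip(weights, expert_outputs)]

def combine(weighted):
    """Assumption: the head consumes the element-wise sum of weighted vectors."""
    return [sum(column) for column in zip(*weighted)]

first_vec = [0.2, -0.1]                                   # first encoding vector
expert_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # three experts
gates = {  # hypothetical gate parameters, one matrix per classification field
    "topic": [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    "sentiment": [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
}

head_inputs = {}
for field, gate_w in gates.items():
    weights = gate_weights(first_vec, gate_w)
    head_inputs[field] = combine(weight_experts(expert_outputs, weights))
```

Because each field has its own gate, two sub-classification layers can draw on the same experts in different proportions. Here the all-zero `sentiment` gate weights the three experts equally, while the `topic` gate favors the first expert.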

3. The method according to claim 1, characterized in that acquiring the sample text data specifically comprises:

acquiring initial sample text data;

identifying invalid words contained in the initial sample text data, and removing the invalid words from the initial sample text data to obtain transition sample text data; and

truncating the transition sample text data to a specified text length to obtain the sample text data to be input into the text classification model.
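The preprocessing in claim 3 can be sketched as follows. The invalid-word list, the whitespace tokenization, and the function name `prepare_sample` are hypothetical; Chinese text in particular would need a word segmenter in place of `str.split`, and the claim does not say whether the specified length is counted in words or characters.

```python
# Hypothetical invalid-word list; a real system would load its own.
INVALID_WORDS = {"the", "a", "of", "um"}

def prepare_sample(initial_text, invalid_words=INVALID_WORDS, max_tokens=8):
    """Remove invalid words, then truncate to the specified length."""
    tokens = initial_text.split()  # assumption: whitespace tokenization
    # Transition sample text data: the initial text without invalid words.
    transition = [t for t in tokens if t.lower() not in invalid_words]
    # Sample text data: the transition text cut to max_tokens tokens.
    return transition[:max_tokens]

sample = prepare_sample(
    "The model of the text um classifies a document", max_tokens=3
)
# sample == ["model", "text", "classifies"]
```

Removing invalid words before truncating means the length limit is spent only on words that can carry classification signal.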

4. The method according to claim 1, characterized in that training the text classification model with the optimization objective of minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification field corresponding to that sub-classification layer specifically comprises:

for each round of training, determining a subset of network layers from the text classification model, and fixing the network parameters of that subset of network layers during the round; and

with the optimization objective of minimizing the deviation between the classification result corresponding to each sub-classification layer in the round and the actual classification result of the sample text data in the classification field corresponding to that sub-classification layer, adjusting the network parameters of the network layers of the text classification model other than that subset, so as to perform the round of training of the text classification model.
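The per-round freezing in claim 4 can be illustrated with a minimal sketch. The alternating schedule in `frozen_for_round`, the plain gradient-descent update, the layer names, and the placeholder gradients are all assumptions; the claim does not state how the fixed subset is chosen in each round.

```python
def sgd_round(params, grads, frozen, lr=0.1):
    """Return updated parameters, leaving every layer in `frozen` unchanged."""
    return {
        name: list(values) if name in frozen
        else [v - lr * g for v, g in zip(values, grads[name])]
        for name, values in params.items()
    }

def frozen_for_round(round_idx):
    # Hypothetical schedule: even rounds fix the encoding layer,
    # odd rounds fix the expert network layers.
    return {"encoder"} if round_idx % 2 == 0 else {"experts"}

params = {"encoder": [1.0, 1.0], "experts": [1.0, 1.0], "heads": [1.0, 1.0]}
grads = {name: [1.0, 1.0] for name in params}  # placeholder gradients

history = []
for round_idx in range(2):
    frozen = frozen_for_round(round_idx)
    params = sgd_round(params, grads, frozen)
    history.append(params)
```

In round 0 only `experts` and `heads` move, and in round 1 only `encoder` and `heads` move, so each round adjusts just the layers outside the fixed subset.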

5. A model training device for a multi-domain text classification model, characterized by comprising:

an acquisition module, configured to acquire sample text data;

a first encoding module, configured to input the sample text data into an encoding layer of a text classification model to be trained, so that the encoding layer encodes the sample text data to obtain a first encoding vector corresponding to the sample text data;

a second encoding module, configured to input the first encoding vector into each expert network layer included in the text classification model, so as to determine, for each expert network layer, a second encoding vector output by that expert network layer for the first encoding vector;

an output module, configured to input the second encoding vectors output by the expert network layers into each sub-classification layer included in the text classification model, so that each sub-classification layer processes the second encoding vectors output by the expert network layers that are input to it, to obtain a classification result in the classification field corresponding to that sub-classification layer as the classification result corresponding to that sub-classification layer; and

a training module, configured to train the text classification model with the optimization objective of minimizing the deviation between the classification result corresponding to each sub-classification layer and the actual classification result of the sample text data in the classification field corresponding to that sub-classification layer.

6. The device according to claim 5, characterized in that, for each sub-classification layer, the text classification model further includes a weight allocation layer set for that sub-classification layer;

the output module is specifically configured to input the second encoding vectors output by the expert network layers into each sub-classification layer included in the text classification model, so that, for each sub-classification layer, the second encoding vectors output by the expert network layers are weighted through the weight allocation layer set for that sub-classification layer to obtain a weighted encoding vector for each expert network layer, the weighted encoding vectors of the expert network layers are input into that sub-classification layer, and the sub-classification layer outputs a classification result in its corresponding classification field as the classification result corresponding to that sub-classification layer.

7. The device according to claim 5, characterized in that the acquisition module is specifically configured to: acquire initial sample text data; identify invalid words contained in the initial sample text data, and remove the invalid words from the initial sample text data to obtain transition sample text data; and truncate the transition sample text data to a specified text length to obtain the sample text data to be input into the text classification model.

8. The device according to claim 5, characterized in that the training module is specifically configured to: for each round of training, determine a subset of network layers from the text classification model, and fix the network parameters of that subset of network layers during the round; and, with the optimization objective of minimizing the deviation between the classification result corresponding to each sub-classification layer in the round and the actual classification result of the sample text data in the classification field corresponding to that sub-classification layer, adjust the network parameters of the network layers of the text classification model other than that subset, so as to perform the round of training of the text classification model.

9. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 4.

10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 4.