CN116757208A - Data processing method, device and equipment - Google Patents

Data processing method, device and equipment

Info

Publication number
CN116757208A
Authority
CN
China
Prior art keywords
model
text data
data sample
training
loss function
Prior art date
Legal status
Pending
Application number
CN202310466681.XA
Other languages
Chinese (zh)
Inventor
马志远
张蝶
周书恒
都金涛
周欣欣
杨淑娟
祝慧佳
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310466681.XA
Publication of CN116757208A


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of this specification provide data processing methods, devices, and equipment. One method comprises: acquiring a text data sample for training a first model, together with the entity type labels corresponding to the words contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample, and those entity type labels, obtaining a preliminarily trained first model when the first model meets a preset convergence condition; determining the prediction entropy of the preliminarily trained first model based on the probability distributions over different predicted entity types, corresponding to the words contained in the text data sample, obtained when the preliminarily trained first model performs entity recognition on the text data sample; updating the parameters in the first loss function based on that prediction entropy; and performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, obtaining the trained first model.

Description

Data processing method, device and equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data processing method, apparatus, and device.
Background
Named entity recognition plays a very important role in natural language processing. In application scenarios such as question-answering systems and knowledge base construction, an entity recognition model can be built to recognize the entities contained in natural language text, so that question-answering strategies can be determined, or knowledge bases constructed, from the recognized entities.
The labels for the sample data used to train an entity recognition model can be produced by manual labeling. However, because manually produced labels are often inaccurate, the entity recognition accuracy of a model trained on such sample data is correspondingly poor; a scheme that can improve the accuracy of named entity recognition is therefore needed.
Disclosure of Invention
The embodiment of the specification aims to provide a scheme capable of improving the accuracy of named entity recognition.
To achieve the above object, the embodiments of the present specification are implemented as follows:
In a first aspect, an embodiment of the present disclosure provides a data processing method, including: acquiring a text data sample for training a first model and the entity type labels corresponding to the words contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, and obtaining a preliminarily trained first model when the first model meets a preset convergence condition; determining the prediction entropy of the preliminarily trained first model based on the probability distributions over different predicted entity types, corresponding to the words contained in the text data sample, obtained when the preliminarily trained first model performs entity recognition on the text data sample; and updating the parameters in the first loss function based on the prediction entropy of the preliminarily trained first model to obtain an updated first loss function, and performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain the trained first model.
In a second aspect, an embodiment of the present disclosure provides a data processing method, including: when it is detected that a target user triggers execution of a target service, determining target text data to be recognized based on acquired target information, where the target information includes information required by the target user to trigger execution of the target service and/or interaction information generated by the target user in triggering execution of the target service; inputting the target text data into a trained first model to obtain the predicted entity type corresponding to the target text data; and determining, among candidate dialogue scripts and based on the predicted entity type corresponding to the target text data, a target dialogue script matching the target service triggered by the target user, and outputting the target dialogue script; where the training process of the first model includes: acquiring a text data sample for training the first model and the entity type labels corresponding to the words contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, and obtaining a preliminarily trained first model when the first model meets a preset convergence condition; determining the prediction entropy of the preliminarily trained first model based on the probability distributions over different predicted entity types, corresponding to the words contained in the text data sample, obtained when the preliminarily trained first model performs entity recognition on the text data sample; and updating the parameters in the first loss function based on the prediction entropy of the preliminarily trained first model to obtain an updated first loss function, and performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain the trained first model.
In a third aspect, an embodiment of the present specification provides a data processing apparatus, the apparatus comprising: a data acquisition module, configured to acquire a text data sample for training a first model and the entity type labels corresponding to the words contained in the text data sample; a first training module, configured to perform iterative training on the first model based on a first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, and to obtain a preliminarily trained first model when the first model meets a preset convergence condition; a first determining module, configured to determine the prediction entropy of the preliminarily trained first model based on the probability distributions over different predicted entity types, corresponding to the words contained in the text data sample, obtained when the preliminarily trained first model performs entity recognition on the text data sample; and a second training module, configured to update the parameters in the first loss function based on the prediction entropy of the preliminarily trained first model to obtain an updated first loss function, and to perform iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, obtaining the trained first model.
In a fourth aspect, an embodiment of the present specification provides a data processing apparatus, the apparatus comprising: an information acquisition module, configured to determine, when it is detected that a target user triggers execution of a target service, target text data to be recognized based on acquired target information, where the target information includes information required by the target user to trigger execution of the target service and/or interaction information generated by the target user in triggering execution of the target service; a type determining module, configured to input the target text data into a trained first model to obtain the predicted entity type corresponding to the target text data; and a script output module, configured to determine, among candidate dialogue scripts and based on the predicted entity type corresponding to the target text data, a target dialogue script matching the target service triggered by the target user, and to output the target dialogue script; where the training process of the first model includes: acquiring a text data sample for training the first model and the entity type labels corresponding to the words contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, and obtaining a preliminarily trained first model when the first model meets a preset convergence condition; determining the prediction entropy of the preliminarily trained first model based on the probability distributions over different predicted entity types, corresponding to the words contained in the text data sample, obtained when the preliminarily trained first model performs entity recognition on the text data sample; and updating the parameters in the first loss function based on the prediction entropy of the preliminarily trained first model to obtain an updated first loss function, and performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain the trained first model.
In a fifth aspect, an embodiment of the present specification provides a data processing device, the data processing device comprising: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: acquire a text data sample for training a first model and the entity type labels corresponding to the words contained in the text data sample; perform iterative training on the first model based on a first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, and obtain a preliminarily trained first model when the first model meets a preset convergence condition; determine the prediction entropy of the preliminarily trained first model based on the probability distributions over different predicted entity types, corresponding to the words contained in the text data sample, obtained when the preliminarily trained first model performs entity recognition on the text data sample; and update the parameters in the first loss function based on the prediction entropy of the preliminarily trained first model to obtain an updated first loss function, and perform iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain the trained first model.
In a sixth aspect, an embodiment of the present specification provides a data processing device, comprising: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: when it is detected that a target user triggers execution of a target service, determine target text data to be recognized based on acquired target information, where the target information includes information required by the target user to trigger execution of the target service and/or interaction information generated by the target user in triggering execution of the target service; input the target text data into a trained first model to obtain the predicted entity type corresponding to the target text data; and determine, among candidate dialogue scripts and based on the predicted entity type corresponding to the target text data, a target dialogue script matching the target service triggered by the target user, and output the target dialogue script; where the training process of the first model includes: acquiring a text data sample for training the first model and the entity type labels corresponding to the words contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, and obtaining a preliminarily trained first model when the first model meets a preset convergence condition; determining the prediction entropy of the preliminarily trained first model based on the probability distributions over different predicted entity types, corresponding to the words contained in the text data sample, obtained when the preliminarily trained first model performs entity recognition on the text data sample; and updating the parameters in the first loss function based on the prediction entropy of the preliminarily trained first model to obtain an updated first loss function, and performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain the trained first model.
In a seventh aspect, an embodiment of the present disclosure provides a storage medium for storing computer-executable instructions that, when executed, implement the following: acquiring a text data sample for training a first model and the entity type labels corresponding to the words contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, and obtaining a preliminarily trained first model when the first model meets a preset convergence condition; determining the prediction entropy of the preliminarily trained first model based on the probability distributions over different predicted entity types, corresponding to the words contained in the text data sample, obtained when the preliminarily trained first model performs entity recognition on the text data sample; and updating the parameters in the first loss function based on the prediction entropy of the preliminarily trained first model to obtain an updated first loss function, and performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain the trained first model.
In an eighth aspect, the present description provides a storage medium for storing computer-executable instructions that, when executed, implement the following: when it is detected that a target user triggers execution of a target service, determining target text data to be recognized based on acquired target information, where the target information includes information required by the target user to trigger execution of the target service and/or interaction information generated by the target user in triggering execution of the target service; inputting the target text data into a trained first model to obtain the predicted entity type corresponding to the target text data; and determining, among candidate dialogue scripts and based on the predicted entity type corresponding to the target text data, a target dialogue script matching the target service triggered by the target user, and outputting the target dialogue script; where the training process of the first model includes: acquiring a text data sample for training the first model and the entity type labels corresponding to the words contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, and obtaining a preliminarily trained first model when the first model meets a preset convergence condition; determining the prediction entropy of the preliminarily trained first model based on the probability distributions over different predicted entity types, corresponding to the words contained in the text data sample, obtained when the preliminarily trained first model performs entity recognition on the text data sample; and updating the parameters in the first loss function based on the prediction entropy of the preliminarily trained first model to obtain an updated first loss function, and performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain the trained first model.
Drawings
To more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments described in the present specification; for a person skilled in the art, other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a data processing system of the present specification;
FIG. 2A is a flow chart of an embodiment of a data processing method of the present disclosure;
FIG. 2B is a schematic diagram illustrating a data processing method according to the present disclosure;
FIG. 3 is a schematic structural view of a first model of the present disclosure;
FIG. 4 is a schematic illustration of a first model process of the present disclosure;
FIG. 5 is a schematic illustration of a training process of a first model of the present disclosure;
FIG. 6 is a schematic diagram illustrating a processing procedure of another data processing method according to the present disclosure;
FIG. 7 is a schematic diagram illustrating a processing procedure of another data processing method according to the present disclosure;
FIG. 8 is a schematic illustration of a training process of yet another first model of the present disclosure;
FIG. 9 is a schematic diagram of a training process of yet another first model of the present disclosure;
FIG. 10 is a schematic illustration of a training process of yet another first model of the present disclosure;
FIG. 11 is a schematic illustration of a training process of yet another first model of the present disclosure;
FIG. 12A is a flowchart of yet another embodiment of a data processing method of the present disclosure;
FIG. 12B is a schematic diagram illustrating a processing procedure of another data processing method according to the present disclosure;
FIG. 13 is a schematic diagram of target information according to the present disclosure;
FIG. 14 is a schematic view of another embodiment of a data processing apparatus according to the present disclosure;
FIG. 15 is a schematic diagram of another embodiment of a data processing apparatus according to the present disclosure;
fig. 16 is a schematic diagram of a structure of a data processing apparatus of the present specification.
Detailed Description
The embodiment of the specification provides a data processing method, a device and equipment.
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The technical solutions of this specification can be applied to a data processing system. As shown in fig. 1, the data processing system may include terminal devices and servers, where a server may be an independent server or a server cluster composed of multiple servers, and a terminal device may be a personal computer or a mobile terminal device such as a mobile phone or a tablet computer.
The data processing system may include n terminal devices and m servers, where n and m are positive integers greater than or equal to 1. The terminal devices may be used to collect data samples for different application scenarios: for a question-answering system, a terminal device may collect feedback information from user-facing dialogues as data samples; for a knowledge base construction scenario of a preset service, a terminal device may collect the service data corresponding to the preset service (such as the data required to execute the preset service) as data samples.
The terminal devices can send the collected data samples to any server in the data processing system; the server can preprocess the received data samples and store the preprocessed data samples as text data samples. The preprocessing operations may include text conversion preprocessing (e.g., converting audio data into text data), text format conversion processing (e.g., converting English text into Chinese text), and the like.
In addition, the terminal device can also send the collected data samples to the corresponding service end based on the application scene corresponding to the data samples. For example, assuming that the server 1 and the server 2 are used for processing a question-answer service and the server 3 and the server 4 are used for processing a knowledge base construction service in the data processing system, the terminal device may send the collected data samples in the question-answer scenario to the server 1 and the server 2, and send the collected data samples in the knowledge base construction scenario to the server 3 and the server 4.
In this way, upon receiving a training instruction for the first model, a server may train the first model based on the stored text data samples.
In addition, there may be a central server (e.g., server 1) in the data processing system. The central server is configured to train the first model to be trained, based on the text data samples sent by the other servers (e.g., server 2 and server 3), when a model training period is reached, and to return the model parameters of the trained first model to the corresponding servers once the trained first model is obtained. In this way, the other servers in the data processing system can provide business services to users without interruption, while the central server updates and upgrades the first model according to the model training period.
Noise may exist in the text data samples acquired by the server; that is, the reliability of the entity type labels corresponding to the words contained in the acquired text data samples cannot be guaranteed. To improve the accuracy of model training, and thereby the accuracy of named entity recognition, the confidence the first model has in its predictions can be judged through the prediction entropy of the preliminarily trained first model, determined during training from the probability distributions over different predicted entity types for the words contained in the text data samples. The parameters of the first loss function are then dynamically adjusted (that is, the parameters in the first loss function are updated) so that the first loss function gradually transitions from a mean-absolute-error-like loss to a cross entropy loss. This provides better robustness to the high concentration of noise early in model training, and allows the network to be trained better once the noise has gradually been screened out later in training, improving the named entity recognition performance of the first model.
The data processing method in the following embodiments can be implemented based on the above-described data processing system configuration.
Example 1
As shown in fig. 2A and fig. 2B, the embodiment of the present disclosure provides a data processing method. The execution subject of the method may be a server, where the server may be an independent server or a server cluster composed of multiple servers. The method specifically comprises the following steps:
In S202, a text data sample for training the first model is acquired, together with the entity type labels corresponding to the words contained in the text data sample.
The entity type label corresponding to a word contained in the text data sample can be used to identify the type of that entity, and such labels can be determined by manual labeling; for example, entity type labels may include person name, place name, organization name, proper noun, and the like. The first model may be a model for recognizing entity types, constructed based on a preset neural network algorithm.
In implementation, named entity recognition plays a very important role in natural language processing; for example, in application scenarios such as question-answering systems and knowledge base construction, an entity recognition model can be built to recognize the entities contained in natural language text, so that question-answering strategies can be determined, or knowledge bases constructed, from the recognized entities. The labels for the sample data used to train an entity recognition model can be produced by manual labeling; however, because manually produced labels are often inaccurate, the entity recognition accuracy of a model trained on such sample data is correspondingly poor, and a scheme that can improve the accuracy of named entity recognition is needed. To this end, the embodiments of this specification provide a technical solution that can solve the above problems; see the following for details.
Take as an example a first model used, in a question-answering scenario, to determine the corresponding dialogue script based on feedback information input by a user. The server can obtain the user feedback information collected by the terminal devices during the model training period, determine from it the text data sample for training the first model, and determine the entity type labels corresponding to the words contained in the text data sample.
As shown in fig. 3, the first model may include an embedding layer, a semantic extraction layer, and a type recognition layer. The embedding layer performs vector extraction to obtain embedded vectors; the semantic extraction layer performs semantic extraction on the embedded vectors to obtain semantic vectors; finally, the type recognition layer performs category recognition on the semantic vectors to obtain predicted entity types. The semantic extraction layer can be viewed as identifying the entities, while the type recognition layer classifies the identified entities, i.e., determines the predicted entity type corresponding to each identified entity.
The embedding layer and the semantic extraction layer may be built based on a pretrained language representation model (Bidirectional Encoder Representations from Transformers, BERT).
The server can input the text data sample into the first model: the embedding layer of the first model performs vector extraction on the words contained in the text data sample to obtain the embedding vector (embedding) corresponding to each word; the semantic extraction layer then performs semantic extraction on these embedding vectors to obtain the semantic vector (token representation) corresponding to each word; finally, the type recognition layer performs type recognition on the semantic vectors to obtain the predicted entity type corresponding to each word contained in the text data sample.
For example, as shown in fig. 4, the server may input a text data sample (i.e., "ABCD") into the first model to obtain the predicted entity type corresponding to each word it contains: the predicted entity type corresponding to A may be entity type 1, that corresponding to B entity type 2, that corresponding to C entity type 2, and that corresponding to D entity type 3.
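As an illustration of this layered structure, the following is a minimal sketch of the first model in PyTorch. It is not the patented implementation: a plain TransformerEncoder stands in for the BERT-based embedding and semantic extraction layers, and the class and parameter names (FirstModel, vocab_size, num_entity_types) are assumptions made for the example.

```python
import torch
import torch.nn as nn

class FirstModel(nn.Module):
    """Embedding layer -> semantic extraction layer -> type recognition layer.
    In practice a pretrained BERT encoder would provide the first two layers."""
    def __init__(self, vocab_size: int, num_entity_types: int, dim: int = 256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)        # embedding layer
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # semantic extraction layer
        self.classifier = nn.Linear(dim, num_entity_types)    # type recognition layer

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        emb = self.embedding(token_ids)    # embedded vector per word
        sem = self.encoder(emb)            # semantic vector per word
        return self.classifier(sem)        # (batch, seq_len, num_entity_types) logits
```

Taking the argmax over the last dimension then yields a predicted entity type for each word, as in the "ABCD" example above.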
In S204, iterative training is performed on the first model based on the first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, and the preliminarily trained first model is obtained when the first model meets a preset convergence condition.
In implementation, the text data sample may be input into the first model, and entity recognition performed on it by the first model to obtain an entity recognition result for the text data sample; the server may then determine the first loss value corresponding to the first model based on the entity recognition result, the entity type labels corresponding to the words contained in the text data sample, and the first loss function.
The server may determine, based on the first loss value, whether the first model meets the preset convergence condition. If it determines that the first model does not, the server may continue to iteratively train the first model based on the first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, until the first model meets the preset convergence condition, thereby obtaining the preliminarily trained first model.
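A condensed sketch of this first training stage might look as follows, assuming the FirstModel sketch above, batches of (token_ids, labels) pairs, and a maximum iteration count as the preset convergence condition (one of the options discussed below); all names are illustrative.

```python
import torch

def train_first_stage(model, batches, loss_fn, max_iters=1000, lr=1e-4):
    """Iteratively train the model on the text data sample until the preset
    convergence condition (here a maximum iteration count) is met."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    it = 0
    while it < max_iters:                      # preset convergence condition
        for token_ids, labels in batches:      # labels: entity type tags per word
            logits = model(token_ids)          # (batch, seq_len, num_types)
            loss = loss_fn(logits.flatten(0, 1), labels.flatten())
            opt.zero_grad()
            loss.backward()
            opt.step()
            it += 1
            if it >= max_iters:
                break
    return model                               # the preliminarily trained model
```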
The method just described for determining whether the first model meets the preset convergence condition is only one optional, realizable approach; in practical application scenarios there may be multiple different determination methods, which can be selected according to the actual application scenario and are not specifically limited in the embodiments of this specification.
In S206, the prediction entropy of the preliminarily trained first model is determined based on the probability distributions over different predicted entity types, corresponding to the words contained in the text data sample, obtained when the preliminarily trained first model performs entity recognition on the text data sample.
In implementation, the text data samples may be input into the preliminarily trained first model in batches (for example, by random sampling without replacement), and entity recognition performed on each batch by the preliminarily trained first model, obtaining, for each word contained in each batch, a probability distribution over the different predicted entity types. The mean of the entropies of these per-word probability distributions is then determined as the prediction entropy of the preliminarily trained first model.
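Under that description, the prediction entropy might be computed as below: per-word probability distributions are obtained batch by batch, and the mean of their entropies is returned. This is a sketch under the stated assumptions, not the patent's exact procedure.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def prediction_entropy(model, batches) -> float:
    """Mean entropy, over all words in the batched text data samples, of the
    probability distribution across the different predicted entity types."""
    entropies = []
    for token_ids, _ in batches:                     # e.g. sampled without replacement
        probs = F.softmax(model(token_ids), dim=-1)  # (batch, seq_len, num_types)
        ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        entropies.append(ent.flatten())
    return torch.cat(entropies).mean().item()        # prediction entropy E
```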
In S208, the parameters in the first loss function are updated based on the prediction entropy of the preliminarily trained first model to obtain an updated first loss function, and the preliminarily trained first model is iteratively trained based on the updated first loss function until the first model converges, obtaining the trained first model.
In implementation, take the first loss function to be a generalized cross entropy (Generalized Cross Entropy Loss, GCE) function as an example. The GCE function is a compromise between the cross entropy loss function and the mean absolute error loss function: it combines the advantage of the cross entropy loss, which is well suited to training a neural network to convergence, with the advantage of the mean absolute error loss, which is insensitive to noise. The loss value corresponding to the first loss function can be calculated by the following formula:

L_q = (1/n) · Σ_{i=1..n} (1 − y_i^q) / q

where n is the number of input data of the first model, y_i is the probability corresponding to the entity recognition result obtained by performing entity recognition on the i-th input datum of the first model, and q is the parameter of the first loss function.
As the parameter q tends to 1, the GCE function tends to the mean absolute error loss; as q tends to 0, it tends to the cross entropy loss. Under the training framework of the first model, the first model screens out trusted data during the iterative process, so the training data set changes, i.e., the noise in the training process of the first model is gradually reduced. When the noise is heavier, the parameter q of the GCE function should be closer to 1 to obtain better noise robustness; when the noise is lighter, q should be closer to 0 so that the neural network of the first model can learn and converge better. The parameter q of the GCE function can therefore be dynamically adjusted as the training of the first model proceeds.
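A sketch of the GCE loss in this form follows (the standard formulation from the noisy-label literature, matching the formula above); the tensor shapes and the default q are assumptions made for illustration.

```python
import torch

def gce_loss(logits: torch.Tensor, targets: torch.Tensor, q: float = 0.7) -> torch.Tensor:
    """Generalized cross entropy: L_q = mean((1 - y_i^q) / q), where y_i is the
    probability assigned to the labeled entity type of sample i.
    As q -> 0 this recovers cross entropy; at q = 1 it is a mean-absolute-error-like loss."""
    probs = torch.softmax(logits, dim=-1)                  # (n, num_types)
    y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # y_i for each sample
    return ((1.0 - y.clamp_min(1e-12).pow(q)) / q).mean()
```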
The overall trend of the prediction entropy during training of the first model is to first decrease and then stabilize. The prediction entropy indirectly reflects the prediction confidence of the first model: the smaller the prediction entropy, the more certain the first model is about its entity recognition results. The prediction entropy can therefore be used to dynamically adjust the parameter of the first loss function.
In the training process of the first model, the first model can be trained in a first stage using the text data sample and the first loss function; when the first model meets the preset convergence condition, the preliminarily trained first model is obtained, and the parameters in the first loss function are updated using the prediction entropy determined for the preliminarily trained first model. In the second training stage, iterative training of the preliminarily trained first model continues based on the updated first loss function until the first model converges, yielding the trained first model.
The above takes two-stage training of the first model as an example; in practical application scenarios, the first model can also be trained in three or more stages, with the prediction entropy of the first model determined in the above manner after each stage ends. The parameters of the first loss function are updated with the newly determined prediction entropy to obtain an updated first loss function, and the next stage of model training continues with the updated first loss function, until the first model converges and the trained first model is obtained.
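Putting the stages together, a sketch of this staged schedule could look like the following, reusing train_first_stage, gce_loss, and prediction_entropy from the earlier sketches. The update_q callable stands for whichever update rule is chosen (the second embodiment below gives a concrete one); starting q at 1 reflects the early-training preference for noise robustness.

```python
def train_in_stages(model, batches, update_q, num_stages=2):
    """Stage-wise training: train to convergence with the current GCE parameter
    q, measure the prediction entropy, update q, and continue training."""
    q, e0 = 1.0, None                    # q near 1: robust to heavy early noise
    for _ in range(num_stages):
        model = train_first_stage(model, batches,
                                  lambda lg, tg: gce_loss(lg, tg, q=q))
        e = prediction_entropy(model, batches)
        e0 = e if e0 is None else e0     # record the initial prediction entropy
        q = update_q(e, e0)              # loss drifts from MAE-like toward CE
    return model
```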
In addition, there may be various methods for determining whether the first model converges, for example, whether the first model converges may be determined by whether the preset iteration number is reached, whether the accuracy of entity identification by the first model is greater than a preset accuracy threshold, whether the loss value of the first model is less than a preset loss value threshold, etc., and different determining methods may be selected according to different practical application scenarios, which is not limited in this embodiment of the present disclosure.
Taking the model structure of the first model as shown in fig. 3 as an example, in the process of training the first model, when the first model meets the preset convergence condition, the text data sample can be screened through the entity recognition result.
If the entity type label corresponding to a word contained in the text data sample does not match the predicted entity type, the predicted entity type for that word can be considered unreliable; the text data sample can thus be divided into trusted data and noise data according to whether each entity type label matches the corresponding predicted entity type. As shown in fig. 5, during subsequent training of the first model, no loss value is calculated for the noise data: the parameters in the first loss function are updated using only the trusted data, and the preliminarily trained first model is iteratively trained based on the screened trusted data and the updated first loss function until the first model converges, obtaining the trained first model. Noise data with untrusted labels therefore does not affect subsequent training.
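The screening step can be sketched as a per-word mask, under the assumption that a word is trusted exactly when its entity type label matches the model's predicted entity type; the function name is illustrative.

```python
import torch

@torch.no_grad()
def split_trusted(model, token_ids, labels):
    """Boolean mask over words: True where the entity type label matches the
    predicted entity type (trusted data), False where it does not (noise)."""
    preds = model(token_ids).argmax(dim=-1)   # predicted entity type per word
    return preds.eq(labels)                   # (batch, seq_len) trusted mask
```

During subsequent training, the loss is then computed only over trusted positions, e.g. gce_loss(logits[mask], labels[mask], q=q), so that the label noise contributes nothing to the gradient.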
In addition, if a model filters its own training data, noise accumulation is likely: if the first model has already fitted some noise during earlier training, it will not reject that noise data during subsequent data screening, and such errors will be reinforced and accumulate in subsequent training of the first model. Therefore, after the first loss function is updated, a text data sample different from the original text data sample can be obtained, and the preliminarily trained first model iteratively trained on the newly obtained text data sample with the updated first loss function until the first model converges, obtaining the trained first model. Because the noise the first model learns in each training stage then differs, noise and errors do not accumulate and reinforce each other, giving the first model a stronger generalization effect.
The embodiments of this specification provide a data processing method: acquire a text data sample for training a first model and the entity type labels corresponding to the words contained in the text data sample; iteratively train the first model based on a first loss function, the text data sample, and those entity type labels, obtaining a preliminarily trained first model when the first model meets a preset convergence condition; determine the prediction entropy of the preliminarily trained first model based on the probability distributions over different predicted entity types for the words contained in the text data sample, obtained when the preliminarily trained first model performs entity recognition on the sample; and update the parameters in the first loss function based on that prediction entropy to obtain an updated first loss function, then iteratively train the preliminarily trained first model with the updated first loss function until it converges, obtaining the trained first model. In this way, the confidence of the entity recognition results can be judged through the prediction entropy, and the parameter of the first loss function updated accordingly, so that the first loss function gradually transitions from a mean-absolute-error-like loss toward a cross entropy loss. This gives better robustness to the high concentration of noise early in model training, while later in training, after the noise has gradually been screened out, the updated first loss function lets the neural network of the first model train better. That is, across the stages of learning with noisy labels, the first model is more robust in stages where the noise is strong, and trains and converges better in stages where the noise is weak. Moreover, because the parameter of the first loss function is adjusted flexibly (i.e., updated) through the model's own prediction entropy, there is no need to tune the parameter separately on different data sets, which improves model training efficiency as well as the entity recognition accuracy of the first model.
Example 2
The embodiment of the specification provides a data processing method. The execution subject of the method may be a server, where the server may be an independent server or a server cluster composed of multiple servers. The method specifically comprises the following steps:
In S202, a text data sample for training a first model is acquired.
In S602, the words contained in the text data sample are matched against a preset database, and the entity type labels corresponding to the words in the preset database that match the words contained in the text data sample are determined as the entity type labels corresponding to the words contained in the text data sample.
In implementation, the entity type labels corresponding to the words contained in the text data sample can be determined by distant supervision. For example, a distant supervision method can search existing databases such as knowledge bases and dictionaries for words matching the unlabeled data, and determine the entity type labels corresponding to the unlabeled data based on the matching results; in this way a large amount of labeled data can be obtained without relying on manual labeling.
However, entity type labels determined by matching against a preset database such as a fixed dictionary or knowledge base usually contain a lot of noise. The noise may arise because the preset database does not cover all entities, or because a word itself belongs to different entity types that a fixed matching method cannot distinguish.
After the entity type labels corresponding to the words contained in the text data sample are obtained, the first model is iteratively trained on the text data sample, and the preliminarily trained first model is obtained when the first model meets the preset convergence condition. There are various methods for determining whether the first model meets the convergence condition: for example, as shown in fig. 6, after S602, S604 may be performed to determine whether the first model meets the preset convergence condition by checking whether the number of iterations of the first model reaches a preset iteration count; or, as shown in fig. 7, after S602, S606 may be performed to determine whether the first model meets the preset convergence condition through the entity recognition accuracy of the first model.
In S604, iterative training is performed on the first model based on the first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample; the first model is determined to satisfy the preset convergence condition when its number of iterations reaches the preset iteration count, and the preliminarily trained first model is obtained when the first model satisfies the preset convergence condition.
The preset iteration count may be configured based on the model structure, the application scenario, and the like of the first model; for example, the preset iteration count may be 100, 1000, and so on.
In S606, iterative training is performed on the first model based on the first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample; when the entity recognition accuracy of the first model after the current iteration is lower than that after the previous iteration, the first model is determined to meet the preset convergence condition, and the preliminarily trained first model is obtained when the first model meets the preset convergence condition.
In implementation, at the end of each iteration the server may determine, on a preset validation set, the entity recognition accuracy of the first model; when the entity recognition accuracy after the current iteration is lower than that after the previous iteration, the first model may be determined to meet the preset convergence condition.
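This convergence test amounts to early stopping on the validation accuracy; a minimal sketch, assuming a running history of per-iteration accuracies:

```python
def converged_by_accuracy(history: list) -> bool:
    """S606-style convergence: stop once entity recognition accuracy on the
    preset validation set drops relative to the previous iteration."""
    return len(history) >= 2 and history[-1] < history[-2]
```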
In S206, the prediction entropy of the preliminarily trained first model is determined based on the probability distributions over different predicted entity types, corresponding to the words contained in the text data sample, obtained when the preliminarily trained first model performs entity recognition on the text data sample.
In S608, iterative training is performed on the second model based on the second loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, and a preliminarily trained second model is obtained when the second model meets a preset convergence condition.
The model structures of the second model and the first model may be the same, and the first loss function and the second loss function may be generalized cross entropy loss functions.
In implementation, the method for determining whether the second model meets the preset convergence condition can refer to the corresponding method for the first model, i.e., the determination methods of S604 or S606, and is not repeated here.
In S610, entity recognition is performed on the text data sample by the preliminarily trained second model, obtaining the first predicted entity type corresponding to each word contained in the text data sample.
In implementation, the server may input the text data sample into the preliminarily trained second model to obtain the first predicted entity type corresponding to each word contained in the text data sample.
In S612, the words included in the text data sample are divided into a first sample and a second sample including noise based on the entity type tag corresponding to the words included in the text data sample and the first predicted entity type.
In an implementation, words in the text data sample where the entity type tag matches the first predicted entity type may be determined as a first sample, and words in the text data sample where the entity type tag does not match the first predicted entity type may be determined as a second sample containing noise.
In S614, the parameters in the first loss function are updated based on the prediction entropy of the preliminarily trained first model, the initial prediction entropy, and the number of entity type labels, obtaining an updated first loss function.
In practice, the prediction entropy of the preliminarily trained first model, the initial prediction entropy, and the number of entity type labels may be substituted into the formula

q = 1 + ln(E/E₀)/N

to obtain the parameter of the first loss function, where q is the parameter of the first loss function, E is the prediction entropy of the preliminarily trained first model, E₀ is the initial prediction entropy, and N is the number of entity type labels. The parameter so obtained then replaces the existing parameter in the first loss function, yielding the updated first loss function.
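A direct transcription of this update rule, with a clamp added as an assumption so that q stays in (0, 1], where the GCE loss is well defined:

```python
import math

def update_q(entropy: float, initial_entropy: float, num_labels: int) -> float:
    """q = 1 + ln(E / E0) / N. Since E <= E0 in the typical decreasing-entropy
    regime, ln(E / E0) <= 0 and q moves from 1 toward 0 as training proceeds."""
    q = 1.0 + math.log(entropy / initial_entropy) / num_labels
    return min(1.0, max(1e-3, q))   # clamp: an assumption, not in the formula
```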
In S616, the initially trained first model is iteratively trained based on the first sample and the updated first loss function until the first model converges, resulting in a trained first model.
In implementation, taking as an example the case where the first model and the second model have the same model structure, during model training of the models (i.e., the first model and the second model), as shown in fig. 2, the training data can be screened through the entity recognition results once the models meet the preset convergence condition.
If the entity type label corresponding to a word contained in the text data sample does not match the predicted entity type, the predicted entity type for that word can be considered unreliable; the text data sample can thus be divided into trusted data and noise data according to whether each entity type label matches the corresponding predicted entity type.
If a model filters its own training data, noise accumulation is likely: if the first model has already fitted some noise during earlier training, it will not reject that noise data during subsequent data screening, which leads to such errors being reinforced and accumulated in subsequent training of the first model.
Therefore, the text data sample can be screened through the preliminarily trained second model, and the preliminarily trained first model trained on the trusted data (i.e., the first sample) screened out by the preliminarily trained second model. Because the noise learned by the first model and the noise learned by the second model differ, the trusted data screened by the second model can be used to continue training the first model, avoiding the noise accumulation problem of training and filtering within the same model and improving the generalization effect of the first model.
In addition, when the training of the first model includes a plurality of stages and a plurality of second models are used, as shown in fig. 8, the first model may be trained in the first stage using the text data sample to obtain the preliminarily trained first model, and the parameters of the first loss function updated based on the prediction entropy determined over the text data sample, obtaining an updated first loss function.
The server screens the text data sample with the preliminarily trained second model 1 to obtain first sample 1 and continues training the first model on first sample 1 with the updated first loss function; after the second-stage training ends, the parameters of the first loss function can be updated again to obtain a newly updated first loss function.
The server may then screen the text data sample with the preliminarily trained second model 2 to obtain first sample 2 and continue training the first model on first sample 2 with the updated first loss function; after the third-stage training ends, the parameters of the first loss function are updated again, and so on until the first model converges, yielding the trained first model. In this way, the noise learned by the first model differs from that learned by each second model, so the first model is trained in different stages on trusted data screened out by different second models, which avoids the noise accumulation of single-model training and improves the generalization of the first model.
The embodiment of the specification provides a data processing method. A text data sample for training a first model and the entity type labels corresponding to the words contained in the sample are acquired; the first model is iteratively trained based on a first loss function, the text data sample, and those entity type labels, and a preliminarily trained first model is obtained once the first model meets a preset convergence condition. The prediction entropy of the preliminarily trained first model is then determined from the probability distributions over the different predicted entity types obtained by performing entity recognition on the text data sample with that model; the parameters in the first loss function are updated based on this prediction entropy, and the preliminarily trained first model is iteratively trained with the updated first loss function until it converges, yielding the trained first model. In this way, the confidence of the first model in its entity recognition results can be judged through the prediction entropy, and the parameter of the first loss function updated accordingly, so that the first loss function gradually transitions from a mean absolute error loss toward a cross entropy loss: the model is more robust to highly concentrated noise early in training, and once the noise has gradually been screened out later in training, the neural network of the first model can be trained better through the updated first loss function. That is, across the stages of noisy learning, the first model is more robust in stages with stronger noise and trains and converges better in stages with weaker noise. In addition, because the parameter of the first loss function is flexibly adjusted (i.e., updated) through the model's prediction entropy, there is no need to tune the parameter separately for different data sets, which improves model training efficiency and the entity recognition accuracy of the first model.
Example III
The embodiment of the specification provides a data processing method. The execution subject of the method may be a server, which may be an independent server or a server cluster formed by a plurality of servers. The method specifically comprises the following steps:
in S202, a text data sample for training a first model is acquired.
In S602, the words contained in the text data sample are matched against a preset database, and the entity type tags of the database entries matched by those words are determined as the entity type tags corresponding to the words contained in the text data sample.
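A minimal sketch of this dictionary-style matching is given below; the database entries, tag names, and the "O" convention for non-entity words are illustrative assumptions.

    PRESET_DB = {"Hangzhou": "LOC", "Alipay": "ORG"}  # hypothetical entries

    def assign_entity_tags(words):
        # Each word inherits the entity type tag of the matching database
        # entry; unmatched words receive the non-entity tag "O".
        return [(word, PRESET_DB.get(word, "O")) for word in words]

    # assign_entity_tags(["Alipay", "is", "in", "Hangzhou"])
    # -> [('Alipay', 'ORG'), ('is', 'O'), ('in', 'O'), ('Hangzhou', 'LOC')]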
In S204, based on the first loss function, the text data sample, and the entity type label corresponding to the word included in the text data sample, performing iterative training on the first model, and obtaining a preliminarily trained first model when the first model meets a preset convergence condition.
In implementation, the specific processing procedure of S204 may refer to the content related to S604 or S606 in the second embodiment, which is not described herein.
In S206, based on the probability distribution of the words included in the text data sample corresponding to different predicted entity types obtained by performing entity recognition processing on the text data sample by the first model of the preliminary training, the prediction entropy corresponding to the first model of the preliminary training is determined.
In S608, based on the second loss function, the text data sample, and the entity type label corresponding to the word included in the text data sample, performing iterative training on the second model, and obtaining a preliminarily trained second model when the second model meets a preset convergence condition.
In S610, entity recognition processing is performed on the text data sample through the primarily trained second model, so as to obtain a first predicted entity type corresponding to the word included in the text data sample.
In S612, the words included in the text data sample are divided into a first sample and a second sample including noise based on the entity type tag corresponding to the words included in the text data sample and the first predicted entity type.
In S614, the parameters in the first loss function are updated based on the prediction entropy of the preliminarily trained first model, the initial prediction entropy, and the number of entity type labels, so as to obtain an updated first loss function.
In S616, the initially trained first model is iteratively trained based on the first sample and the updated first loss function until the first model converges, resulting in a trained first model.
In implementation, as shown in fig. 1, there may be a central server (e.g., server 1) in the data processing system, configured to train the entity recognition model to be trained based on text data samples sent by the other servers (e.g., server 2 and server 3) when the model training period is reached, and to return the model parameters of the trained entity recognition model to the corresponding servers.
In order to avoid the problem of noise accumulation in the model training process, the central server may store the entity identification model corresponding to each server, and the model structures of the entity identification models corresponding to each server are the same, and when the model training period is reached, the central server may train the entity identification model corresponding to each server based on text data samples sent by other servers. Specifically, the central server may use the entity recognition model corresponding to the server 2 as a first model, use the entity recognition model corresponding to the server 3 as a second model, and use the text data samples sent by the server 2 and the server 3 as text data samples for training the first model and the second model.
The central server can train the first model based on the training process described above to obtain a trained first model, and meanwhile can continue training the preliminarily trained second model based on the preliminarily trained first model and the text data sample to obtain a trained second model. That is, as shown in fig. 9, after S616, the server may continue to execute S902 to S908.
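Schematically, one training period of this workflow could be organized as below; collect_samples, send_parameters, and cross_train are hypothetical helpers supplied by the caller, since the embodiment does not fix the transport mechanism.

    def run_training_period(models, servers, collect_samples, send_parameters, cross_train):
        # Pool the text data samples sent by all servers.
        samples = [s for srv in servers for s in collect_samples(srv)]
        # Cross-train the two models (see the cross_train sketch above).
        cross_train(models["server2"], models["server3"], samples)
        # Return each trained model's parameters to its server.
        for name, model in models.items():
            send_parameters(name, model)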
In S902, entity recognition processing is performed on the text data sample through the first model that is primarily trained, so as to obtain a second predicted entity type corresponding to the word included in the text data sample.
In S904, words included in the text data sample are divided into a third sample and a fourth sample including noise based on the entity type tag corresponding to the words included in the text data sample and the second predicted entity type.
In practice, S904 may be implemented in a variety of ways; an optional implementation is provided in steps one and two below:
Step one: perform entity recognition processing on the text data sample through the preliminarily trained second model to obtain the probability values of the second predicted entity types corresponding to the words contained in the text data sample.

Step two: divide the words contained in the text data sample into a third sample and a fourth sample containing noise based on the entity type tags corresponding to those words, the probability values of the second predicted entity types corresponding to those words, and preset probability thresholds.
In implementation, assume for example that the entity type labels corresponding to the words contained in the text data sample are 0-1 labels and that the preset probability thresholds are 0.2 and 0.7. If the probability value of the second predicted entity type corresponding to a word is less than 0.5 and not more than 0.2, the second predicted entity type corresponding to that word can be determined as label 0; if the probability value is not less than 0.5 and not less than 0.7, it can be determined as label 1; probability values falling between the two thresholds can be treated as unreliable. In this way, the preset probability thresholds allow the text data samples to be screened accurately, improving the reliability of the third sample.
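For 0-1 labels, the screen described above can be sketched as follows; the threshold values follow the example, while the data layout is an assumption.

    LOW_THRESHOLD, HIGH_THRESHOLD = 0.2, 0.7  # preset probability thresholds

    def screen_by_threshold(words, tags, probs):
        # Keep only confident predictions that also agree with the entity type
        # tag (third sample); everything else is treated as noise (fourth sample).
        third_sample, fourth_sample = [], []
        for word, tag, p in zip(words, tags, probs):
            if p <= LOW_THRESHOLD:
                pred = 0
            elif p >= HIGH_THRESHOLD:
                pred = 1
            else:
                fourth_sample.append((word, tag))  # not confident enough
                continue
            (third_sample if pred == tag else fourth_sample).append((word, tag))
        return third_sample, fourth_sample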
In S906, based on the probability distribution of the words included in the text data sample corresponding to different predicted entity types obtained by performing entity recognition processing on the text data sample by the primarily trained second model, a prediction entropy corresponding to the primarily trained second model is determined.
In S908, based on the prediction entropy corresponding to the initially trained second model, updating the parameters in the second loss function to obtain an updated second loss function, and performing iterative training on the initially trained second model based on the third sample and the updated second loss function until the second model converges to obtain a trained second model.
In implementation, the method for determining the prediction entropy corresponding to the preliminarily trained second model and the training process of the second model may refer to the method for determining the prediction entropy corresponding to the preliminarily trained first model and the training process of the first model, which are not repeated here.
In this way, when the server obtains the trained first model and the trained second model, it can return the model parameters of each trained model to the corresponding server; for example, the model parameters of the trained first model can be sent to server 2 and those of the trained second model to server 3. Each server may then update its local model based on the received model parameters and perform entity recognition with the updated model.
In addition, when there are a plurality of second models, the server can train the first model and the plurality of second models in a mode in which the models screen trusted data for one another, obtaining the trained first model and the trained second models.
For example, as shown in fig. 10, the server may store a first model and n second models and train these models in a mode in which they screen trusted data for one another. Specifically, when the models meet the preset convergence condition, the server may screen the text data sample with the preliminarily trained first model to obtain a third sample; it may further screen the text data sample with the preliminarily trained second model 1 to obtain first sample 1 and continue training the preliminarily trained second model 2 on first sample 1; and so on, with the preliminarily trained first model in turn trained on data screened by the preliminarily trained second model n, until the first model and the n second models converge, yielding the trained first model and the n trained second models.
In addition, the server may perform the integration processing on the first model and the plurality of second models, that is, as shown in fig. 11, after S908, the execution of S1102 may be further continued.
In S1102, a model integration process is performed on the trained first model and the trained second model, to obtain a target model for performing entity recognition processing on text data.
In implementation, the server may perform entity recognition processing based on the target model obtained by the integration processing. In addition, the server may send the model parameters of the target model to the other servers in the data processing system, so that those servers can update their locally stored target models based on the received parameters. This saves the data processing resources of the other servers and improves data processing efficiency.
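The embodiment does not fix how the integration in S1102 is performed; one common choice, sketched here as an assumption, is to average the per-word probability distributions of the trained models:

    import torch

    def ensemble_predict(models, inputs):
        # Average the softmax distributions of the trained first model and the
        # trained second models, then take the most probable entity type.
        with torch.no_grad():
            probs = torch.stack([torch.softmax(m(inputs), dim=-1) for m in models])
        return probs.mean(dim=0).argmax(dim=-1)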
The embodiment of the specification provides a data processing method. A text data sample for training a first model and the entity type labels corresponding to the words contained in the sample are acquired; the first model is iteratively trained based on a first loss function, the text data sample, and those entity type labels, and a preliminarily trained first model is obtained once the first model meets a preset convergence condition. The prediction entropy of the preliminarily trained first model is then determined from the probability distributions over the different predicted entity types obtained by performing entity recognition on the text data sample with that model; the parameters in the first loss function are updated based on this prediction entropy, and the preliminarily trained first model is iteratively trained with the updated first loss function until it converges, yielding the trained first model. In this way, the confidence of the first model in its entity recognition results can be judged through the prediction entropy, and the parameter of the first loss function updated accordingly, so that the first loss function gradually transitions from a mean absolute error loss toward a cross entropy loss: the model is more robust to highly concentrated noise early in training, and once the noise has gradually been screened out later in training, the neural network of the first model can be trained better through the updated first loss function. That is, across the stages of noisy learning, the first model is more robust in stages with stronger noise and trains and converges better in stages with weaker noise. In addition, because the parameter of the first loss function is flexibly adjusted (i.e., updated) through the model's prediction entropy, there is no need to tune the parameter separately for different data sets, which improves model training efficiency and the entity recognition accuracy of the first model.
Example IV
As shown in fig. 12A and fig. 12B, the embodiment of the present disclosure provides a data processing method. The execution subject of the method may be a server or a terminal device, where the server may be an independent server or a server cluster formed by a plurality of servers, and the terminal device may be a personal computer, or a mobile terminal device such as a mobile phone or a tablet computer. The method specifically comprises the following steps:
in S1202, in the case where it is detected that the target user triggers execution of the target service, target text data to be recognized is determined based on the acquired target information.
The target information comprises information required for the target user to trigger execution of the target service and/or interaction information of the target user for triggering execution of the target service. The target service may be any service related to user privacy, property security, and the like; for example, it may be a resource transfer service, a privacy information update service (such as modifying a login password or adding new user information), etc. Assuming the target service is a resource transfer service, the target information may include the authentication information required for the target user to trigger execution of the resource transfer service, and/or interaction information such as the target user's feedback information for prompt operations like "Do you know the resource transfer object on the internet, and how did you come to know them?".
In an implementation, taking a target service as an example of a resource transfer service in a resource management application installed in a terminal device, a target user may trigger starting the resource management application, and trigger executing the resource transfer service in the resource management application. The terminal device may acquire information (such as authentication information of the target user) required for triggering the execution of the resource transfer service by the target user, and take the information as target information.
In addition, when detecting that the target user triggers execution of the target service, the terminal device can output preset prompt information and receive the feedback information input by the target user for that prompt information; the terminal device can then determine the preset prompt information and the feedback information input for it as the target information.
For example, as shown in fig. 13, when the terminal device detects that the target user triggers execution of the resource transfer service, a prompt page carrying preset prompt information (i.e., prompt information Q1 and prompt information Q2) may be displayed, and the feedback information input by the target user for the preset prompt information on the prompt page may be received. The terminal device may then determine the prompt information Q1, the prompt information Q2, the feedback information A1, and the feedback information A2 as the target information.
The terminal device can send the collected target information to the server, and the server can determine target text data to be identified based on the obtained target information. Because the target information collected by the terminal equipment may include audio data, picture data, webpage data, video data and the like, the server side can perform text conversion processing on the target information to obtain target text data.
In S1204, the target text data is input into the trained first model to obtain a predicted entity type corresponding to the target text data.
Wherein the training process of the first model comprises: acquiring a text data sample for training the first model, and the entity type labels corresponding to the words contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample, and those entity type labels, and obtaining a preliminarily trained first model when the first model meets a preset convergence condition; determining the prediction entropy corresponding to the preliminarily trained first model based on the probability distributions of the different predicted entity types corresponding to the words contained in the text data sample, obtained by performing entity recognition processing on the text data sample with the preliminarily trained first model; and updating the parameters in the first loss function based on that prediction entropy to obtain an updated first loss function, and performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, obtaining the trained first model.
In implementation, the server may train the first model based on the training process in the first embodiment, the second embodiment or the third embodiment to obtain a trained first model, and input the target text data into the trained first model to obtain the predicted entity type corresponding to the word included in the target text data.
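A minimal inference sketch for S1204 follows; model and tokenizer are hypothetical callables standing in for the trained first model and its preprocessing.

    def predict_entity_types(model, tokenizer, target_text):
        token_ids = tokenizer(target_text)  # convert the target text data to model input
        logits = model(token_ids)           # per-word scores over the entity types
        return logits.argmax(dim=-1)        # predicted entity type for each word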
In addition, the server may further receive model parameters of the first trained model obtained after the central server trains the first model based on the training process in the first embodiment, the second embodiment or the third embodiment, and update the local first model based on the model parameters.
Alternatively, the server may determine the target model obtained by the integration processing in the third embodiment as the trained first model, and determine the predicted entity type corresponding to the target text data based on the target model.
In S1206, based on the predicted entity type corresponding to the target text data, a target conversation matching the target service triggered by the target user is determined from the candidate conversations, and the target conversation is output.
The candidate conversations can be used to acquire the feedback information of the target user for the target service during interaction with the target user, and the feedback information may be any text information, voice information, or the like.
In implementation, the server may determine, based on the predicted entity types corresponding to the words contained in the target text data, a target conversation matching the target service triggered by the target user from the candidate conversations, and output the target conversation.
For example, assuming that the predicted entity types corresponding to the words contained in the target text data include a person name type and an organization name type, the server may acquire the words corresponding to the person name type and to the organization name type, and determine the corresponding target conversation from the candidate conversations based on the acquired words.
The above-mentioned method of determining the target conversation is only one optional, realizable implementation; in actual application scenarios there may be a variety of different determination methods, which may vary with the scenario, and the embodiment of the present disclosure does not specifically limit this.
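As one illustration of such a determination method (the type names, scripts, and keying scheme below are hypothetical, not part of the embodiment):

    CANDIDATE_CONVERSATIONS = {
        "PER": "How did you come to know {word}?",
        "ORG": "Have you verified that {word} is a legitimate organization?",
    }

    def pick_target_conversations(entities):
        # entities: list of (word, predicted_entity_type) pairs from the first model
        return [CANDIDATE_CONVERSATIONS[t].format(word=w)
                for w, t in entities if t in CANDIDATE_CONVERSATIONS]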
The embodiment of the specification provides a data processing method. When it is detected that a target user triggers execution of a target service, target text data to be recognized is determined based on acquired target information, where the target information comprises information required for the target user to trigger execution of the target service and/or interaction information of the target user for triggering execution of the target service. The target text data is input into the trained first model to obtain the predicted entity type corresponding to the target text data; based on that predicted entity type, a target conversation matching the target service triggered by the target user is determined from the candidate conversations and output. The first model is trained as described above: after preliminary training on the text data sample and the entity type labels, the prediction entropy of the preliminarily trained first model is determined from the probability distributions over the different predicted entity types, the parameters in the first loss function are updated based on that prediction entropy, and the preliminarily trained first model is iteratively trained with the updated first loss function until convergence. In this way, during training the confidence of the first model in its entity recognition results can be judged through the prediction entropy, so that the first loss function gradually transitions from a mean absolute error loss toward a cross entropy loss: the model is more robust to highly concentrated noise early in training, and once the noise has gradually been screened out later in training, the neural network of the first model can be trained better through the updated first loss function. That is, across the stages of noisy learning the first model is more robust in stages with stronger noise and trains and converges better in stages with weaker noise. In addition, because the parameter of the first loss function is flexibly adjusted (i.e., updated) through the model's prediction entropy, there is no need to tune it separately for different data sets; this improves model training efficiency and entity recognition accuracy, and in turn the accuracy of recognizing the target text data and of determining the target conversation.
Example V
Based on the same concept as the data processing method provided above, the embodiment of the present disclosure further provides a data processing apparatus, as shown in fig. 14.
The data processing apparatus includes: a data acquisition module 1401, a first training module 1402, a first determination module 1403, and a second training module 1404, wherein:
a data obtaining module 1401, configured to obtain a text data sample for training a first model, and an entity type tag corresponding to a word included in the text data sample;
a first training module 1402, configured to perform iterative training on the first model based on a first loss function, the text data sample, and an entity type tag corresponding to a word included in the text data sample, and obtain a first model for preliminary training if the first model meets a preset convergence condition;
a first determining module 1403, configured to determine a prediction entropy corresponding to the first model of the preliminary training, based on probability distributions of different predicted entity types corresponding to words included in the text data sample obtained by performing entity recognition processing on the text data sample by using the first model of the preliminary training;
The second training module 1404 is configured to update the parameters in the first loss function based on the prediction entropy corresponding to the first model of the preliminary training, obtain an updated first loss function, and perform iterative training on the first model of the preliminary training based on the updated first loss function until the first model converges, so as to obtain a trained first model.
In an embodiment of the present disclosure, the apparatus further includes:
the third training module is used for carrying out iterative training on a second model based on a second loss function, the text data sample and entity type labels corresponding to words contained in the text data sample, and obtaining a preliminarily trained second model under the condition that the second model meets the preset convergence condition, wherein the second model has the same model structure as the first model;
the second determining module is used for carrying out entity identification processing on the text data sample through the primarily trained second model to obtain a first predicted entity type corresponding to words contained in the text data sample;
the first dividing module is used for dividing words contained in the text data sample into a first sample and a second sample containing noise based on entity type labels corresponding to the words contained in the text data sample and a first predicted entity type;
The second training module is used for:
carrying out iterative training on the preliminarily trained first model based on the first sample and the updated first loss function until the first model converges, to obtain the trained first model.
In an embodiment of the present disclosure, the apparatus further includes:
the third determining module is used for carrying out entity identification processing on the text data sample through the first model of the preliminary training to obtain a second predicted entity type corresponding to words contained in the text data sample;
the second dividing module is used for dividing words contained in the text data sample into a third sample and a fourth sample containing noise based on entity type labels corresponding to the words contained in the text data sample and a second predicted entity type;
a fourth determining module, configured to determine, based on a probability distribution of different predicted entity types corresponding to words included in the text data sample obtained by performing entity recognition processing on the text data sample by using the initially trained second model, a prediction entropy corresponding to the initially trained second model;
and the fourth training module is used for updating the parameters in the second loss function based on the prediction entropy corresponding to the primarily trained second model to obtain an updated second loss function, and carrying out iterative training on the primarily trained second model based on the third sample and the updated second loss function until the second model converges to obtain a trained second model.
In this embodiment of the present disclosure, the second dividing module is configured to:
performing entity recognition processing on the text data sample through the primarily trained second model to obtain a probability value of a second predicted entity type corresponding to words contained in the text data sample;
dividing words contained in the text data sample into the third sample and the fourth sample containing noise based on entity type labels corresponding to words contained in the text data sample, probability values of a second predicted entity type corresponding to words contained in the text data sample and a preset probability threshold value.
In an embodiment of the present disclosure, the apparatus further includes:
the model integration module is used for carrying out model integration processing on the trained first model and the trained second model to obtain a target model for carrying out entity recognition processing on text data.
In the embodiment of the present disclosure, the second training module 1404 is configured to:
updating parameters in the first loss function based on the prediction entropy of the preliminarily trained first model, the initial prediction entropy, and the number of the entity type labels, to obtain the updated first loss function.
In an embodiment of the present disclosure, the apparatus further includes:
the first judging module is used for determining that the first model meets the preset convergence condition under the condition that the iteration number of the first model reaches the preset iteration number.
In an embodiment of the present disclosure, the apparatus further includes:
the second judging module is used for determining that the first model meets the preset convergence condition when the entity recognition accuracy of the first model after the current iteration ends is smaller than the entity recognition accuracy of the first model after the previous iteration ended.
In the embodiment of the present disclosure, the data acquisition module 1401 is configured to:
carrying out matching processing on words contained in the text data sample based on a preset database, and determining the entity type labels corresponding to the words in the preset database that match the words contained in the text data sample as the entity type labels corresponding to the words contained in the text data sample.
In an embodiment of the present disclosure, the first loss function and the second loss function are generalized cross entropy loss functions.
The embodiment of the specification provides a data processing apparatus. The apparatus acquires a text data sample for training a first model and the entity type labels corresponding to the words contained in the sample; iteratively trains the first model based on a first loss function, the text data sample, and those labels, obtaining a preliminarily trained first model once the first model meets a preset convergence condition; determines the prediction entropy of the preliminarily trained first model from the probability distributions over the different predicted entity types obtained by performing entity recognition on the text data sample with that model; updates the parameters in the first loss function based on this prediction entropy to obtain an updated first loss function; and iteratively trains the preliminarily trained first model with the updated first loss function until the first model converges, yielding the trained first model. In this way, the confidence of the first model in its entity recognition results can be judged through the prediction entropy, and the parameter of the first loss function updated accordingly, so that the first loss function gradually transitions from a mean absolute error loss toward a cross entropy loss: the model is more robust to highly concentrated noise early in training, and once the noise has gradually been screened out later in training, the neural network of the first model can be trained better through the updated first loss function. That is, across the stages of noisy learning, the first model is more robust in stages with stronger noise and trains and converges better in stages with weaker noise. In addition, because the parameter of the first loss function is flexibly adjusted (i.e., updated) through the model's prediction entropy, there is no need to tune the parameter separately for different data sets, which improves model training efficiency and the entity recognition accuracy of the first model.
Example VI
Based on the same concept, the embodiment of the present disclosure further provides a data processing apparatus, as shown in fig. 15.
The data processing apparatus includes: an information acquisition module 1501, a type determination module 1502, and a conversation output module 1503, wherein:
an information acquisition module 1501, configured to determine target text data to be identified based on acquired target information in a case where it is detected that a target user triggers execution of a target service, where the target information includes information required by the target user to trigger execution of the target service, and/or interaction information of the target user for triggering execution of the target service;
a type determining module 1502, configured to input the target text data into the trained first model, and obtain a predicted entity type corresponding to the target text data;
a conversation output module 1503, configured to determine, based on the predicted entity type corresponding to the target text data, a target conversation matching the target service triggered by the target user from the candidate conversations, and output the target conversation;
wherein the training process of the first model comprises: acquiring a text data sample for training a first model, and an entity type label corresponding to a word contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample and entity type labels corresponding to words contained in the text data sample, and obtaining a preliminarily trained first model under the condition that the first model meets a preset convergence condition; determining a prediction entropy corresponding to the first model of the preliminary training based on probability distribution of different prediction entity types corresponding to words contained in the text data sample obtained by performing entity identification processing on the text data sample by the first model of the preliminary training; and updating parameters in the first loss function based on the prediction entropy corresponding to the initially trained first model to obtain an updated first loss function, and performing iterative training on the initially trained first model based on the updated first loss function until the first model converges to obtain a trained first model.
The embodiment of the specification provides a data processing apparatus. When it is detected that a target user triggers execution of a target service, the apparatus determines target text data to be recognized based on acquired target information, where the target information comprises information required for the target user to trigger execution of the target service and/or interaction information of the target user for triggering execution of the target service; inputs the target text data into the trained first model to obtain the predicted entity type corresponding to the target text data; and, based on that predicted entity type, determines a target conversation matching the target service triggered by the target user from the candidate conversations and outputs it. The first model is trained as described above: the parameters in the first loss function are updated through the prediction entropy of the preliminarily trained first model, and the preliminarily trained first model is iteratively trained with the updated first loss function until convergence. In this way, during training the confidence of the first model in its entity recognition results can be judged through the prediction entropy, so that the first loss function gradually transitions from a mean absolute error loss toward a cross entropy loss: the model is more robust to highly concentrated noise early in training, and once the noise has gradually been screened out later in training, the neural network of the first model can be trained better through the updated first loss function. That is, across the stages of noisy learning the first model is more robust in stages with stronger noise and trains and converges better in stages with weaker noise. In addition, because the parameter of the first loss function is flexibly adjusted (i.e., updated) through the model's prediction entropy, there is no need to tune it separately for different data sets; this improves model training efficiency and entity recognition accuracy, and in turn the accuracy of recognizing the target text data and of determining the target conversation.
Example VII
Based on the same idea, the embodiment of the present disclosure further provides a data processing apparatus, as shown in fig. 16.
The data processing apparatus may vary considerably in configuration or performance, and may include one or more processors 1601 and a memory 1602, where the memory 1602 may store one or more application programs or data. The memory 1602 may be a transient storage or a persistent storage. An application program stored in the memory 1602 may include one or more modules (not shown), and each module may include a series of computer-executable instructions for the data processing apparatus. Still further, the processor 1601 may be configured to communicate with the memory 1602 and execute, on the data processing apparatus, the series of computer-executable instructions in the memory 1602. The data processing apparatus may also include one or more power supplies 1603, one or more wired or wireless network interfaces 1604, one or more input/output interfaces 1605, and one or more keyboards 1606.
In particular, in this embodiment, the data processing apparatus includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the data processing apparatus, and the one or more programs configured to be executed by the one or more processors comprise instructions for:
Acquiring a text data sample for training a first model, and an entity type label corresponding to a word contained in the text data sample;
performing iterative training on the first model based on a first loss function, the text data sample and entity type labels corresponding to words contained in the text data sample, and obtaining a preliminarily trained first model under the condition that the first model meets a preset convergence condition;
determining a prediction entropy corresponding to the first model of the preliminary training based on probability distribution of different prediction entity types corresponding to words contained in the text data sample obtained by performing entity identification processing on the text data sample by the first model of the preliminary training;
and updating parameters in the first loss function based on the prediction entropy corresponding to the initially trained first model to obtain an updated first loss function, and performing iterative training on the initially trained first model based on the updated first loss function until the first model converges to obtain a trained first model.
In addition, the one or more programs configured to be executed by the one or more processors also include computer-executable instructions for:
Under the condition that the target user is detected to trigger the execution of the target service, determining target text data to be identified based on the acquired target information, wherein the target information comprises information required by the target user to trigger the execution of the target service and/or interaction information of the target user for triggering the execution of the target service;
inputting the target text data into the trained first model to obtain a predicted entity type corresponding to the target text data;
determining a target conversation in the candidate conversation and triggered by the target user to execute the target service matching based on the predicted entity type corresponding to the target text data, and outputting the target conversation;
wherein the training process of the first model comprises: acquiring a text data sample for training a first model, and an entity type label corresponding to a word contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample and entity type labels corresponding to words contained in the text data sample, and obtaining a preliminarily trained first model under the condition that the first model meets a preset convergence condition; determining a prediction entropy corresponding to the first model of the preliminary training based on probability distribution of different prediction entity types corresponding to words contained in the text data sample obtained by performing entity identification processing on the text data sample by the first model of the preliminary training; and updating parameters in the first loss function based on the prediction entropy corresponding to the initially trained first model to obtain an updated first loss function, and performing iterative training on the initially trained first model based on the updated first loss function until the first model converges to obtain a trained first model.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for data processing apparatus embodiments, the description is relatively simple, as it is substantially similar to method embodiments, with reference to the description of method embodiments in part.
The embodiment of the specification provides a data processing device. The confidence of the first model in its entity recognition results can be judged through the prediction entropy, and the parameter of the first loss function updated accordingly, so that the first loss function gradually transitions from a mean absolute error loss toward a cross entropy loss: the model is more robust to highly concentrated noise early in training, and once the noise has gradually been screened out later in training, the neural network of the first model can be trained better through the updated first loss function. That is, across the stages of noisy learning, the first model is more robust in stages with stronger noise and trains and converges better in stages with weaker noise. In addition, because the parameter of the first loss function is flexibly adjusted (i.e., updated) through the model's prediction entropy, there is no need to tune it separately for different data sets, which improves model training efficiency and the named entity recognition accuracy of the first model.
Example VIII
The embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements each process of the above data processing method embodiments and achieves the same technical effects; to avoid repetition, details are not repeated here. The computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The embodiment of the specification provides a computer-readable storage medium by which the confidence of the first model in its entity recognition results can be judged through the prediction entropy, and the parameter of the first loss function updated accordingly, so that the first loss function gradually transitions from a mean absolute error loss toward a cross entropy loss: the model is more robust to highly concentrated noise early in training, and once the noise has gradually been screened out later in training, the neural network of the first model can be trained better through the updated first loss function. That is, across the stages of noisy learning, the first model is more robust in stages with stronger noise and trains and converges better in stages with weaker noise. In addition, because the parameter of the first loss function is flexibly adjusted (i.e., updated) through the model's prediction entropy, there is no need to tune it separately for different data sets, which improves model training efficiency and the entity recognition accuracy of the first model.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, as technology has developed, many improvements to method flows can now be regarded as direct improvements to hardware circuit structures: designers almost always obtain a corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a single PLD, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can readily be obtained merely by slightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller implements the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included within it for performing the various functions may also be regarded as structures within the hardware component. Or even the means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing one or more embodiments of the present description.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present description are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or nonvolatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, Phase-change Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a(n) …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present description may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding description of the method embodiments.
The foregoing is merely exemplary of the present description and is not intended to limit it. Various modifications and alterations to this specification will be apparent to those skilled in the art. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present description shall be included within the scope of its claims.

Claims (15)

1. A data processing method, comprising:
acquiring a text data sample for training a first model and entity type labels corresponding to the words contained in the text data sample;
performing iterative training on the first model based on a first loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, and obtaining a preliminarily trained first model when the first model meets a preset convergence condition;
determining a prediction entropy corresponding to the preliminarily trained first model based on probability distributions over different predicted entity types for the words contained in the text data sample, the probability distributions being obtained by performing entity recognition processing on the text data sample with the preliminarily trained first model; and
updating parameters in the first loss function based on the prediction entropy corresponding to the preliminarily trained first model to obtain an updated first loss function, and performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain a trained first model.
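To make the procedure of claim 1 concrete, below is a minimal PyTorch-style sketch of the outer loop: train until a convergence condition is met, measure the average per-token prediction entropy, update the loss-function parameter from that entropy, and resume training. The fixed-epoch stand-in for the convergence test, the entropy-to-parameter mapping, and all names (train_first_model, mean_prediction_entropy, num_types) are illustrative assumptions, not the patent's definitive procedure; loss_fn can be, for example, the generalized cross-entropy sketched under claim 10 below.

```python
import math
import torch
import torch.nn.functional as F

def mean_prediction_entropy(model, batches):
    """Average entropy of the per-token probability distributions
    over entity types (a hypothetical helper)."""
    total, count = 0.0, 0
    with torch.no_grad():
        for tokens, _ in batches:
            probs = F.softmax(model(tokens), dim=-1)             # (n, n_types)
            h = -(probs * probs.clamp_min(1e-12).log()).sum(-1)  # (n,)
            total += h.sum().item()
            count += h.numel()
    return total / max(count, 1)

def train_first_model(model, batches, loss_fn, num_types,
                      q=0.7, rounds=2, epochs=5, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(rounds):
        for _ in range(epochs):               # stand-in for the convergence test
            for tokens, labels in batches:
                opt.zero_grad()
                loss_fn(model(tokens), labels, q).backward()
                opt.step()
        h = mean_prediction_entropy(model, batches)
        q = min(1.0, h / math.log(num_types))  # assumed entropy-to-q mapping
    return model
```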
2. The method of claim 1, further comprising:
performing iterative training on a second model based on a second loss function, the text data sample, and the entity type labels corresponding to the words contained in the text data sample, and obtaining a preliminarily trained second model when the second model meets the preset convergence condition, wherein the second model has the same model structure as the first model;
performing entity recognition processing on the text data sample through the preliminarily trained second model to obtain a first predicted entity type corresponding to each word contained in the text data sample; and
dividing the words contained in the text data sample into a first sample and a second sample containing noise based on the entity type labels and the first predicted entity types corresponding to the words contained in the text data sample;
wherein the performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain the trained first model, comprises:
performing iterative training on the preliminarily trained first model based on the first sample and the updated first loss function until the first model converges, to obtain the trained first model.
3. The method of claim 2, further comprising:
performing entity recognition processing on the text data sample through the preliminarily trained first model to obtain a second predicted entity type corresponding to each word contained in the text data sample;
dividing the words contained in the text data sample into a third sample and a fourth sample containing noise based on the entity type labels and the second predicted entity types corresponding to the words contained in the text data sample;
determining a prediction entropy corresponding to the preliminarily trained second model based on probability distributions over different predicted entity types for the words contained in the text data sample, the probability distributions being obtained by performing entity recognition processing on the text data sample with the preliminarily trained second model; and
updating parameters in the second loss function based on the prediction entropy corresponding to the preliminarily trained second model to obtain an updated second loss function, and performing iterative training on the preliminarily trained second model based on the third sample and the updated second loss function until the second model converges, to obtain a trained second model.
4. The method of claim 3, wherein the dividing the words contained in the text data sample into a third sample and a fourth sample containing noise based on the entity type labels and the second predicted entity types corresponding to the words contained in the text data sample comprises:
performing entity recognition processing on the text data sample through the preliminarily trained first model to obtain probability values of the second predicted entity types corresponding to the words contained in the text data sample; and
dividing the words contained in the text data sample into the third sample and the fourth sample containing noise based on the entity type labels corresponding to the words contained in the text data sample, the probability values of the second predicted entity types, and a preset probability threshold.
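The threshold-based refinement of claim 4 can be sketched as follows. The specific rule, under which a token counts as noisy only when the model disagrees with its label and assigns that disagreeing prediction a probability above a threshold tau, is an assumption for illustration.

```python
import torch

def split_with_threshold(tokens, labels, model, tau=0.7):
    """Mark a token as noisy only when the model disagrees with its label
    *confidently*, i.e. with a probability at or above tau."""
    with torch.no_grad():
        probs = torch.softmax(model(tokens), dim=-1)
        conf, preds = probs.max(dim=-1)
    noisy = (preds != labels) & (conf >= tau)
    return (tokens[~noisy], labels[~noisy]), (tokens[noisy], labels[noisy])
```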
5. The method of claim 3, further comprising:
performing model integration processing on the trained first model and the trained second model to obtain a target model for performing entity recognition processing on text data.
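One common form of the model integration mentioned in claim 5 is probability averaging; whether the patent intends averaging, voting, or another scheme is not specified, so this sketch assumes simple averaging of the two models' per-token distributions.

```python
import torch

class EnsembledTagger(torch.nn.Module):
    """Average the two trained models' per-token entity-type probabilities."""
    def __init__(self, model_a, model_b):
        super().__init__()
        self.model_a, self.model_b = model_a, model_b

    def forward(self, tokens):
        pa = torch.softmax(self.model_a(tokens), dim=-1)
        pb = torch.softmax(self.model_b(tokens), dim=-1)
        return (pa + pb) / 2   # ensemble distribution over entity types
```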
6. The method of claim 1, wherein the updating the parameters in the first loss function based on the prediction entropy corresponding to the preliminarily trained first model to obtain an updated first loss function comprises:
updating the parameters in the first loss function based on the prediction entropy corresponding to the preliminarily trained first model, an initial prediction entropy, and the number of entity type labels, to obtain the updated first loss function.
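Claim 6 leaves the exact update rule open. One plausible normalization, shown below, interpolates the loss parameter q toward 1 (more noise-robust) as the prediction entropy rises from its initial value toward the maximum possible entropy log(num_labels); the formula itself is an assumption for illustration.

```python
import math

def update_q(h_pred, h_init, num_labels, q_init=0.7):
    """Assumed rule: scale q between q_init and 1 by how far the prediction
    entropy has risen from h_init toward the maximum entropy log(num_labels)."""
    h_max = math.log(num_labels)
    ratio = (h_pred - h_init) / max(h_max - h_init, 1e-8)
    return q_init + (1.0 - q_init) * min(max(ratio, 0.0), 1.0)
```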
7. The method of claim 1, before the obtaining a preliminarily trained first model when the first model meets a preset convergence condition, further comprising:
determining that the first model meets the preset convergence condition when the number of iterations of the first model reaches a preset number of iterations.
8. The method of claim 1, before the obtaining a preliminarily trained first model when the first model meets a preset convergence condition, further comprising:
determining that the first model meets the preset convergence condition when the entity recognition accuracy of the first model after the current iteration is lower than the entity recognition accuracy of the first model after the previous iteration.
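Claims 7 and 8 give two convergence tests, an iteration budget and an accuracy drop, which combine naturally into one early-stopping check; this helper is an illustrative sketch, not the patent's prescribed implementation.

```python
def converged(iteration, max_iters, acc_history):
    """True once the iteration budget is exhausted (claim 7) or validation
    accuracy falls below the previous round's value (claim 8)."""
    if iteration >= max_iters:
        return True
    return len(acc_history) >= 2 and acc_history[-1] < acc_history[-2]
```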
9. The method of claim 1, wherein the acquiring the entity type labels corresponding to the words contained in the text data sample comprises:
performing matching processing on the words contained in the text data sample based on a preset database, and determining the entity type labels of the words in the preset database that match the words contained in the text data sample as the entity type labels corresponding to the words contained in the text data sample.
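The database matching of claim 9 amounts to distant supervision with a lexicon: each word inherits the entity type of its database match, with unmatched words left untyped. A toy sketch with hypothetical lexicon entries:

```python
# Hypothetical lexicon entries; a production database would be far larger.
LEXICON = {"alipay": "ORG", "hangzhou": "LOC", "friday": "DATE"}

def distant_labels(words, lexicon=LEXICON, default="O"):
    """Tag each word with the entity type of its lexicon match, else 'O'."""
    return [lexicon.get(w.lower(), default) for w in words]

# distant_labels(["Alipay", "opened", "in", "Hangzhou"])
# -> ["ORG", "O", "O", "LOC"]
```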
10. The method of claim 2, wherein the first loss function and the second loss function are generalized cross-entropy loss functions.
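For reference, the generalized cross-entropy named in claim 10 has the standard closed form L_q(p_y) = (1 - p_y^q) / q (Zhang & Sabuncu, 2018); how the patent parameterizes it beyond q is not stated here. A minimal implementation:

```python
import torch

def gce_loss(logits, labels, q=0.7):
    """Generalized cross-entropy: approaches ordinary cross-entropy as
    q -> 0 and an MAE-like, noise-robust loss at q = 1."""
    p = torch.softmax(logits, dim=-1)
    p_y = p.gather(-1, labels.unsqueeze(-1)).squeeze(-1).clamp_min(1e-8)
    return ((1.0 - p_y ** q) / q).mean()
```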
11. A data processing method, comprising:
when it is detected that a target user triggers execution of a target service, determining target text data to be recognized based on acquired target information, wherein the target information comprises information required for the target user to trigger execution of the target service and/or interaction information of the target user for triggering execution of the target service;
inputting the target text data into a trained first model to obtain a predicted entity type corresponding to the target text data; and
determining, from candidate conversations, a target conversation matching the target service triggered by the target user based on the predicted entity type corresponding to the target text data, and outputting the target conversation;
wherein the training process of the first model comprises: acquiring a text data sample for training the first model and entity type labels corresponding to the words contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample, and the entity type labels, and obtaining a preliminarily trained first model when the first model meets a preset convergence condition; determining a prediction entropy corresponding to the preliminarily trained first model based on probability distributions over different predicted entity types for the words contained in the text data sample, obtained by performing entity recognition processing on the text data sample with the preliminarily trained first model; and updating parameters in the first loss function based on the prediction entropy to obtain an updated first loss function, and performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain the trained first model.
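End to end, the serving flow of claim 11 might look like the following sketch; the event fields, the predict() convenience method, and keying the candidate conversations by predicted entity type are all assumptions for illustration.

```python
def respond_to_trigger(event, tagger, candidate_conversations):
    """Build the text to recognize from the trigger information, predict its
    entity type, and return the matching conversation (or a fallback)."""
    text = " ".join(filter(None, [event.get("service_info"),
                                  event.get("interaction_info")]))
    entity_type = tagger.predict(text)   # assumed predict() convenience method
    return candidate_conversations.get(entity_type,
                                       candidate_conversations["default"])
```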
12. A data processing apparatus, comprising:
a data acquisition module configured to acquire a text data sample for training a first model and entity type labels corresponding to the words contained in the text data sample;
a first training module configured to perform iterative training on the first model based on a first loss function, the text data sample, and the entity type labels, and to obtain a preliminarily trained first model when the first model meets a preset convergence condition;
a first determining module configured to determine a prediction entropy corresponding to the preliminarily trained first model based on probability distributions over different predicted entity types for the words contained in the text data sample, obtained by performing entity recognition processing on the text data sample with the preliminarily trained first model; and
a second training module configured to update parameters in the first loss function based on the prediction entropy to obtain an updated first loss function, and to perform iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain a trained first model.
13. A data processing apparatus, comprising:
an information acquisition module configured to, when it is detected that a target user triggers execution of a target service, determine target text data to be recognized based on acquired target information, wherein the target information comprises information required for the target user to trigger execution of the target service and/or interaction information of the target user for triggering execution of the target service;
a type determining module configured to input the target text data into a trained first model to obtain a predicted entity type corresponding to the target text data; and
a conversation output module configured to determine, from candidate conversations, a target conversation matching the target service triggered by the target user based on the predicted entity type corresponding to the target text data, and to output the target conversation;
wherein the training process of the first model comprises: acquiring a text data sample for training the first model and entity type labels corresponding to the words contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample, and the entity type labels, and obtaining a preliminarily trained first model when the first model meets a preset convergence condition; determining a prediction entropy corresponding to the preliminarily trained first model based on probability distributions over different predicted entity types for the words contained in the text data sample, obtained by performing entity recognition processing on the text data sample with the preliminarily trained first model; and updating parameters in the first loss function based on the prediction entropy to obtain an updated first loss function, and performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain the trained first model.
14. A data processing device, comprising:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
acquire a text data sample for training a first model and entity type labels corresponding to the words contained in the text data sample;
perform iterative training on the first model based on a first loss function, the text data sample, and the entity type labels, and obtain a preliminarily trained first model when the first model meets a preset convergence condition;
determine a prediction entropy corresponding to the preliminarily trained first model based on probability distributions over different predicted entity types for the words contained in the text data sample, obtained by performing entity recognition processing on the text data sample with the preliminarily trained first model; and
update parameters in the first loss function based on the prediction entropy to obtain an updated first loss function, and perform iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain a trained first model.
15. A data processing device, comprising:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
when it is detected that a target user triggers execution of a target service, determine target text data to be recognized based on acquired target information, wherein the target information comprises information required for the target user to trigger execution of the target service and/or interaction information of the target user for triggering execution of the target service;
input the target text data into a trained first model to obtain a predicted entity type corresponding to the target text data; and
determine, from candidate conversations, a target conversation matching the target service triggered by the target user based on the predicted entity type corresponding to the target text data, and output the target conversation;
wherein the training process of the first model comprises: acquiring a text data sample for training the first model and entity type labels corresponding to the words contained in the text data sample; performing iterative training on the first model based on a first loss function, the text data sample, and the entity type labels, and obtaining a preliminarily trained first model when the first model meets a preset convergence condition; determining a prediction entropy corresponding to the preliminarily trained first model based on probability distributions over different predicted entity types for the words contained in the text data sample, obtained by performing entity recognition processing on the text data sample with the preliminarily trained first model; and updating parameters in the first loss function based on the prediction entropy to obtain an updated first loss function, and performing iterative training on the preliminarily trained first model based on the updated first loss function until the first model converges, to obtain the trained first model.
CN202310466681.XA 2023-04-26 2023-04-26 Data processing method, device and equipment Pending CN116757208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310466681.XA CN116757208A (en) 2023-04-26 2023-04-26 Data processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310466681.XA CN116757208A (en) 2023-04-26 2023-04-26 Data processing method, device and equipment

Publications (1)

Publication Number Publication Date
CN116757208A 2023-09-15

Family

ID=87952153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310466681.XA Pending CN116757208A (en) 2023-04-26 2023-04-26 Data processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN116757208A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117392694A (en) * 2023-12-07 2024-01-12 支付宝(杭州)信息技术有限公司 Data processing method, device and equipment
CN117392694B (en) * 2023-12-07 2024-04-19 支付宝(杭州)信息技术有限公司 Data processing method, device and equipment

Similar Documents

Publication Publication Date Title
CN107437416B (en) Consultation service processing method and device based on voice recognition
US11735184B2 (en) Translation and speech recognition method, apparatus, and device
CN110910903A (en) Speech emotion recognition method, device, equipment and computer readable storage medium
CN114596845A (en) Training method of voice recognition model, voice recognition method and device
CN116757208A (en) Data processing method, device and equipment
CN112597301A (en) Voice intention recognition method and device
CN111508472A (en) Language switching method and device and storage medium
CN117392694B (en) Data processing method, device and equipment
CN112735407B (en) Dialogue processing method and device
Liu et al. Personalized Natural Language Understanding.
CN116186231A (en) Method and device for generating reply text, storage medium and electronic equipment
CN115620706B (en) Model training method, device, equipment and storage medium
CN116863484A (en) Character recognition method, device, storage medium and electronic equipment
CN116741155A (en) Speech recognition method, training method, device and equipment of speech recognition model
JP7348447B2 (en) Speaker diarization correction method and system utilizing text-based speaker change detection
CN111353035B (en) Man-machine conversation method and device, readable storage medium and electronic equipment
CN116522939A (en) Data processing method, device and equipment
CN111400443A (en) Information processing method, device and storage medium
CN113077793A (en) Voice recognition method, device, equipment and storage medium
CN114077650A (en) Training method and device of spoken language understanding model
CN117079646B (en) Training method, device, equipment and storage medium of voice recognition model
CN116501852B (en) Controllable dialogue model training method and device, storage medium and electronic equipment
CN114817469B (en) Text enhancement method, training method and training device for text enhancement model
CN115859975B (en) Data processing method, device and equipment
CN115423485B (en) Data processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination