CN115269844B - Model processing method, device, electronic equipment and storage medium - Google Patents

Model processing method, device, electronic equipment and storage medium

Info

Publication number
CN115269844B
CN115269844B (application CN202210917417.9A)
Authority
CN
China
Prior art keywords
entity
model
sample set
training
extended
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210917417.9A
Other languages
Chinese (zh)
Other versions
CN115269844A (en)
Inventor
周青宇
李映辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210917417.9A
Publication of CN115269844A
Application granted
Publication of CN115269844B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application discloses a model processing method and apparatus, an electronic device and a storage medium. The embodiment of the application can acquire a previous model, a training sample set and an expanded sample set; train the previous model with the training sample set to obtain a current model; input the entities in the expanded sample set into the current model to obtain prediction results corresponding to the entities; obtain prediction confidences corresponding to the entities based on those prediction results; classify the entities according to their prediction confidences to obtain the types of the entities; obtain the contrast loss of the current model according to entity pairs; when the contrast loss is not greater than a preset threshold, return to the step of acquiring a previous model, a training sample set and an expanded sample set, wherein the previous model comprises an initial neural network model; and when the contrast loss is greater than the preset threshold, end training. Therefore, the scheme can improve the accuracy of the model to be used when it is used.

Description

Model processing method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for processing a model, an electronic device, and a storage medium.
Background
With the rapid development of computer technology, natural language processing technology is gradually maturing. Information extraction is an important natural language processing technique; it aims to extract specified types of factual information, such as entities, relations and events, from natural language text and to output it in a structured data format.
However, when existing pre-trained language models extract information from natural language text according to a given entity, the accuracy of the extracted results is not high.
Disclosure of Invention
The embodiments of the application provide a model processing method and apparatus, an electronic device and a storage medium, which can improve the accuracy of the model to be used when it is used.
The embodiment of the application provides a processing method of a model, which comprises the following steps:
acquiring a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
training the previous model by adopting the training sample set to obtain a current model;
inputting the entity in the extended sample set to the current model to obtain a prediction result corresponding to the entity;
obtaining a prediction confidence corresponding to the entity based on a prediction result corresponding to the entity;
Classifying the entities according to the prediction confidence corresponding to the entities to obtain the types of the entities;
obtaining the contrast loss of the current model according to an entity pair, wherein the entity pair comprises a positive entity and a negative entity, the positive entity is an entity of the positive type, and the negative entity is an entity of the negative type;
when the contrast loss is not greater than a preset threshold value, returning to and executing the steps to obtain a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
and ending training when the contrast loss is larger than a preset threshold value.
The embodiment of the application also provides a processing device of the model, which comprises:
the acquisition unit is used for acquiring a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
the adoption unit is used for training the previous model with the training sample set to obtain a current model;
the first obtaining unit is used for inputting the entities in the extended sample set into the current model to obtain the prediction results corresponding to the entities;
the second obtaining unit is used for obtaining the prediction confidence corresponding to the entity based on the prediction result corresponding to the entity;
the classification unit is used for classifying the entity according to the prediction confidence corresponding to the entity to obtain the type of the entity;
a third obtaining unit, configured to obtain the contrast loss of the current model according to an entity pair, where the entity pair comprises a positive entity and a negative entity, the positive entity is an entity of the positive type, and the negative entity is an entity of the negative type;
the return unit is used for returning and executing the steps to acquire a previous model, a training sample set and an expansion sample set when the contrast loss is not greater than a preset threshold value, wherein the previous model comprises an initial neural network model;
and the training ending unit is used for ending training when the contrast loss is larger than a preset threshold value.
The embodiment of the application also provides an electronic device, which comprises a memory and a processor, wherein the memory stores a plurality of instructions; the processor loads the instructions from the memory to perform the steps in any of the model processing methods provided in the embodiments of the present application.
The embodiments of the present application also provide a computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform steps in a method for processing any of the models provided in the embodiments of the present application.
The embodiment of the application can acquire a previous model, a training sample set and an expanded sample set, wherein the previous model comprises an initial neural network model; train the previous model with the training sample set to obtain a current model; input the entities in the expanded sample set into the current model to obtain prediction results corresponding to the entities; obtain prediction confidences corresponding to the entities based on those prediction results; classify the entities according to their prediction confidences to obtain the types of the entities; obtain the contrast loss of the current model according to entity pairs, wherein each entity pair comprises a positive entity and a negative entity, the positive entity being an entity of the positive type and the negative entity an entity of the negative type; when the contrast loss is not greater than a preset threshold, return to the step of acquiring a previous model, a training sample set and an expanded sample set; and when the contrast loss is greater than the preset threshold, end training.
In this method, an initial neural network model can be trained on the training sample set to obtain a trained current model; the expanded sample set is then input into the current model to obtain prediction results, the types of the entities in the expanded sample set can be obtained according to those prediction results, the contrast loss can be obtained according to the entity types, and the degree to which training of the current model is complete can be determined according to the contrast loss. When training of the current model is not complete, the current model can be used as the previous model of the next iteration and iteratively trained with the expanded sample set; when training of the current model is complete, the current model is the initial neural network model that has finished training. Because the current model trained on the training sample set is used to classify the entities in the expanded sample set, its prediction effect improves as training continues, the classification of the entities in the expanded sample set becomes better, and the entity embedding vectors of the positive entities and the negative entities become more clearly separated, so that the loss calculated from the positive and negative entities better guides the training of the initial neural network. Meanwhile, because the training sample set is used when training the previous model, the training effect is guaranteed; this avoids the problem that, when new entities are introduced during training, a newly added entity that does not belong to the category or that carries strong category-irrelevant semantic information causes entities added to the set later to deviate further from the category and form a vicious circle, and thus the accuracy of the model to be used when it is used after training is finished is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1a is a schematic view of a scenario of a method for processing a model provided in an embodiment of the present application;
FIG. 1b is a schematic flow chart of a method for processing a model according to an embodiment of the present application;
fig. 2a is a schematic diagram of a processing method of a model provided in an embodiment of the present application applied in a server scenario;
FIG. 2b is a flow chart of another model processing method according to an embodiment of the present disclosure;
FIG. 2c is a flow chart of another model processing method according to an embodiment of the present disclosure;
FIG. 2d is a flow chart of another model processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic view of a first configuration of a processing device of a model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal provided in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The embodiment of the application provides a method and device for processing a model, electronic equipment and a storage medium.
The processing device of the model may be integrated in an electronic device, and the electronic device may be a terminal, a server, or other devices. The terminal can be a mobile phone, a tablet computer, an intelligent Bluetooth device, a notebook computer, a personal computer (Personal Computer, PC) or the like; the server may be a single server or a server cluster composed of a plurality of servers.
In some embodiments, the processing apparatus of the model may also be integrated in a plurality of electronic devices, for example, the processing apparatus of the model may be integrated in a plurality of servers, and the processing method of the model of the present application is implemented by the plurality of servers.
In some embodiments, the server may also be implemented in the form of a terminal.
For example, referring to fig. 1a, the processing apparatus of the model may be integrated in a server, and the server may acquire a previous model comprising an initial neural network model, a training sample set and an expanded sample set; train the previous model with the training sample set to obtain a current model; input the entities in the expanded sample set into the current model to obtain prediction results corresponding to the entities; obtain prediction confidences corresponding to the entities based on those prediction results; classify the entities according to their prediction confidences to obtain the types of the entities; obtain the contrast loss of the current model according to entity pairs, where each entity pair comprises a positive entity and a negative entity, the positive entity being an entity of the positive type and the negative entity an entity of the negative type; when the contrast loss is not greater than a preset threshold, return to the step of acquiring a previous model, a training sample set and an expanded sample set; and when the contrast loss is greater than the preset threshold, end training.
The following will describe in detail. The numbers of the following examples are not intended to limit the preferred order of the examples.
Artificial intelligence (AI) is a technology that uses digital computers to simulate the way humans perceive the environment, acquire knowledge and use that knowledge, enabling machines to perform functions similar to human perception, reasoning and decision-making. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent transportation and other directions.
Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Research in this field involves natural language, i.e., the language people use daily, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graph techniques and the like.
Machine learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how a computer simulates or implements human learning behaviour to acquire new knowledge or skills, and how it reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
In this embodiment, a method for processing a model based on machine learning is provided, as shown in fig. 1b, a specific flow of the method for processing a model may be as follows:
110. A previous model, a training sample set and an expanded sample set are obtained, the previous model comprising an initial neural network model.
The previous model may be the initial neural network model during the first iteration of training, or it may be the model obtained in the previous iteration. For example, in some embodiments, the previous model is the initial neural network model during the first iteration of training, while during the second iteration the previous model is the model obtained by training that initial neural network model.
An iteration is one pass of the repeated feedback process used when training the previous model; each repetition of the process is referred to as an "iteration".
The previous model may be a language model, an audio model or an image model. In some embodiments, the language model may be a natural language processing model for open information extraction, where the open information extraction may be entity set expansion (Entity Set Expansion): according to the obtained seed entities, entities of the same type as the seed entities are extracted from natural language text or the web, and the entity information is expanded with them. When information extraction is performed, the natural language processing model is required to automatically determine the category information of the obtained seed entities, or to perform category word expansion according to the category of the seed entities. The language model may also be a natural language processing model for other fields.
For example, in some embodiments, the previous model may be BERT (Bidirectional Encoder Representations from Transformers, a bidirectional encoding language model), but it may also be a neural network model such as a CNN (Convolutional Neural Network), a DNN (Deep Neural Network) or an LSTM (Long Short-Term Memory) neural network, which is not limited here.
When the previous model is BERT, BERT adopts an encoder-decoder architecture, where the encoder is a multi-layer bidirectional Transformer network conforming to the BERT configuration, and the decoder is a classification head consisting of two fully connected layers and a softmax (logistic regression) layer. In use, the input character sequence is first split into individual words by a tokenizer. The word embedding and position embedding of each word are then added to obtain its input embedding vector. The sequence of input embedding vectors is passed through 12 layers of bidirectional Transformers, where the dimension of each Transformer's hidden embedding vector is H=768 and the number of self-attention heads is A=12. Finally, the hidden embedding vector corresponding to the masked entity is input into the classification head, which decodes it and outputs the predictive probability distribution.
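For illustration only, the following is a minimal PyTorch-style sketch of such a classification head with two fully connected layers and a softmax, assuming the hidden size H=768 stated above; the entity vocabulary size, layer naming and the tanh activation are assumptions of this sketch rather than the embodiment's implementation.

```python
import torch
import torch.nn as nn

class EntityClassificationHead(nn.Module):
    """Decodes the final hidden vector at the masked-entity position into a
    probability distribution over the entity vocabulary (two fully connected
    layers followed by softmax, as described above)."""

    def __init__(self, hidden_size: int = 768, entity_vocab_size: int = 30000):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, entity_vocab_size)

    def forward(self, masked_hidden: torch.Tensor) -> torch.Tensor:
        # masked_hidden: (batch, hidden_size), the hidden vector at the [MASK] position
        x = torch.tanh(self.fc1(masked_hidden))
        return torch.softmax(self.fc2(x), dim=-1)  # predictive probability distribution
```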
The training sample set may be a sample text for training the previous model, where the training sample set may be the same sample text or may be different sample texts when training is performed in each iteration.
In some embodiments, the training sample set may include an entity in an entity vocabulary, and a sample sentence containing the entity, where the sample sentence may be a text sentence.
The expanded sample set may include the training sample set, and may also include other sample sets obtained through expansion.
The training sample set and the expanded sample set may include entities, where an entity is a distinguishable thing that exists independently in reality, such as a person name, a place name or a commodity name.
The training sample set and the expanded sample set may be entity sets; an entity set is a set of entities of the same type and the same attribute. For example, an entity set may be a set of person names, and entity sets of different types and attributes may be a set of person names and a set of place names, respectively.
In this embodiment of the present application, the method for obtaining the extended sample set may further include:
acquiring the expanded sample set of the previous iteration, wherein that expanded sample set comprises the training sample set;
and performing entity expansion processing on it to obtain the expanded sample set of the current iteration.
The expanded sample set that is acquired may thus be the expanded sample set of the previous iteration of training, and it may include the training sample set together with a set obtained by entity expansion processing based on the training sample set.
The entity expansion processing may be entity set expansion (Entity Set Expansion): according to the obtained seed entities, entities of the same type as the seed entities are extracted from natural language text or the web, and these entities are added as expansion information. In some embodiments, performing entity expansion processing on the expanded sample set can be understood as obtaining entities of the same entity type as the entities in the entity set and taking those entities as part of the expanded sample set.
In this embodiment of the present application, the method for performing entity extension processing on the extended sample set to obtain an extended sample set may include:
selecting an extended entity in the extended sample set to obtain a selected entity;
performing entity expansion processing on the selected entity to obtain a new expansion entity;
and adding the new expansion entity to the expanded sample set to obtain the expanded sample set.
The selected entity is a higher-quality entity selected from the expanded sample set; the higher-quality entities may include the entities expanded in earlier iterations of training and, at the start of the first iteration, the entities in the training sample set. In some embodiments, the quality of an entity may be determined according to its true confidence and its prediction confidence.
In this embodiment of the present application, the method for selecting the extended entity in the extended sample set to obtain the selected entity may include:
inputting the extended entity in the extended sample set to the current model to obtain an extended prediction result corresponding to the extended entity;
obtaining the extended prediction confidence corresponding to the extended entity based on the extended prediction result corresponding to the extended entity;
and selecting the selected entity with the expanded prediction confidence coefficient larger than a preset expansion threshold from the expanded sample set according to the expanded prediction confidence coefficient.
Wherein the extended predictive confidence may be a predictive confidence of the extended entity.
The selected entity may be obtained according to the extended prediction confidence by comparing it with a confidence threshold; when the extended prediction confidence is greater than the confidence threshold, the extended entity is considered to be of higher quality and can be used for expansion.
In this embodiment of the present application, according to the extended prediction confidence, the method for selecting, from the extended sample set, a selected entity whose extended prediction confidence is greater than a preset extension threshold may include:
Acquiring the true confidence coefficient of more than one extended entity in the extended sample set;
carrying out weighted averaging processing on the true confidence coefficient of more than one extended entity to obtain a preset extension threshold value of the true confidence coefficient;
and selecting the extended entity according to the preset extension threshold to obtain the selected entity with the extended prediction confidence coefficient larger than the preset extension threshold.
The selected entities may be obtained by comparing the predicted confidence of each extended entity with the true confidence of one extended entity in the extended sample set, or with the average true confidence of two or more entities, and selecting the entities whose predicted confidence is greater. By selecting the entities whose predicted confidence is greater than the true confidence, the entities generated in earlier iterations of expansion and the original entities of the first iteration can be chosen for the expansion set, which improves the quality of the expanded set.
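A minimal sketch of this selection rule is given below, assuming the preset expansion threshold is the (optionally weighted) average of the true confidences of the extended entities; all function and argument names here are illustrative.

```python
from typing import Dict, List, Optional

def select_expansion_entities(pred_conf: Dict[str, float],
                              true_conf: Dict[str, float],
                              weights: Optional[Dict[str, float]] = None) -> List[str]:
    """Return the extended entities whose predicted confidence exceeds the
    threshold obtained by weighted-averaging the true confidences."""
    if weights is None:
        weights = {e: 1.0 for e in true_conf}
    total = sum(weights[e] for e in true_conf)
    threshold = sum(true_conf[e] * weights[e] for e in true_conf) / total
    return [e for e, c in pred_conf.items() if c > threshold]
```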
120. And training the previous model by adopting the training sample set to obtain a current model.
The training processing may be a process of adjusting the parameters of the previous model to obtain the current model. In some embodiments, part or all of the samples in a preset training sample set are input into the previous model, the output results are compared with the labels corresponding to the entities in the training sample set, and the parameters are adjusted according to the comparison results.
The current model may be the model obtained after the previous model has been trained. For example, in some embodiments, the current model obtained in the previous iteration serves as the previous model of the current iteration; after the current iteration ends, the current model of the current iteration serves as the previous model of the next iteration for training, so that the model is continuously updated in each iteration as training proceeds.
The training sample set includes a text sentence, where the text sentence includes an entity and a label corresponding to the entity, and in this embodiment of the present application, the method for training the previous model by using the training sample set to obtain a current model may include:
acquiring a preset training sample set, wherein the training sample set comprises text sentences, and the text sentences comprise entities and labels corresponding to the entities;
masking the entity to obtain a masked text sentence;
and training the previous model by adopting the training sample set to obtain a current model.
The masking processing of the entity may be masking the entity in the text sentence, so that the semantic category of the entity and the extended entities are obtained from the context during training.
For example, in some embodiments, the entities in the text sentences may be replaced with mask words, resulting in sample sentences containing mask words.
To alleviate the negative influence caused by the unbalanced occurrence frequency of the entities while avoiding the effects of over-sampling or under-sampling, each sample sentence may contain only one entity, and the number of sample sentences corresponding to each entity is set to the total number of sample sentences in the training sample set divided by the total number of entities. Where a sample sentence includes multiple entities, the sample sentence may be divided into multiple sub-sample sentences each containing only one entity. A label-smoothed cross entropy loss can then be selected so that entities with semantic information similar to the training or predicted entity are not unduly suppressed in the output predicted entity probability distribution.
For example, in some embodiments, when each entity corresponds to 10 text sentences and there are 10 entities in the training sample set, the training sample set contains 100 sample sentences.
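The masking and splitting described above can be sketched as follows; the mask token string and the plain string replacement are simplifying assumptions of this sketch (a real implementation would use the model's tokenizer).

```python
from typing import List, Tuple

MASK = "[MASK]"

def build_masked_samples(sentence: str, entities: List[str]) -> List[Tuple[str, str]]:
    """Split a sentence that mentions several entities into single-entity
    samples, each with exactly one entity replaced by the mask word."""
    samples = []
    for entity in entities:
        if entity in sentence:
            samples.append((sentence.replace(entity, MASK, 1), entity))
    return samples

# e.g. build_masked_samples("Shenzhen borders Hong Kong", ["Shenzhen", "Hong Kong"])
# -> [("[MASK] borders Hong Kong", "Shenzhen"),
#     ("Shenzhen borders [MASK]", "Hong Kong")]
```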
130. And inputting the entity in the extended sample set to the current model to obtain a prediction result corresponding to the entity.
The prediction result may be the entity expansion information obtained by inputting the entity into the current model and performing the natural language processing of open information extraction.
For example, in some embodiments, when the entity set is a set of place-name entities, the entity may be the name of a city "XXX"; after the entity is input into the current model, the obtained prediction result may be that "XXX" is a municipality, or that "YYY" is a province, or some other kind of name, such as a person name or a trademark name.
140. And obtaining the prediction confidence corresponding to the entity based on the prediction result corresponding to the entity.
Confidence here takes its statistical meaning: the confidence interval of a probability sample is an interval estimate of some population parameter of the sample. The confidence interval reveals the extent to which the true value of this parameter falls around the measurement with a certain probability, i.e., it gives the degree of credibility of the measured value of the parameter, and this probability is called the confidence level. For example, if a candidate's support rate is 55% in an election and the confidence interval at confidence level 0.95 is (50%, 60%), then the true support rate falls between fifty and sixty percent with ninety-five percent probability, so the probability that the true support rate is below fifty percent is less than 2.5 percent (assuming the distribution is symmetrical). The confidence level can thus be understood as how probable it is that the estimated value lies within a certain allowable error range of the population parameter.
The prediction confidence can be represented according to the predicted probability distribution of the output result: the larger the probability, the higher the prediction confidence and the better the entity meets the preset semantic recognition requirement. The predicted probability distribution can therefore be represented by the prediction confidence, and the entities can be sorted according to the magnitude of their prediction confidences to obtain the entity sequence set.
150. And classifying the entities according to the prediction confidence corresponding to the entities to obtain the types of the entities.
The types of the entities include positive entities and negative entities. A positive entity may be an entity whose output result belongs to the semantic category of the entity, i.e., the prediction confidence of a positive entity is relatively high; the positive entity set is the set of positive entities, and it may contain more than one positive entity. A negative entity may be an entity whose output result does not belong to the semantic category of the entity, i.e., the prediction confidence of a negative entity is relatively low; the negative entity set is the set of negative entities, and it may contain more than one negative entity.
Comparing the prediction confidence with a preset prediction confidence can characterize the degree of association between the actual semantic category of the entity and the predicted semantic category.
For example, in some embodiments, when a result obtained by performing semantic extraction on an entity by the current model is a place name and a semantic class corresponding to the entity is a person name, the output result does not belong to the semantic class, and the prediction confidence is low; when the result obtained by semantic extraction of the entity in the current model is a place name and the semantic category corresponding to the entity is also a place name, the output result belongs to the semantic category and the prediction confidence is high.
When the prediction confidence is lower than the preset prediction confidence, the result of the entity can be considered to be not an entity of the semantic category of the entity, wherein the preset prediction confidence can be adjusted manually according to the needs of a user.
In some embodiments, the output may include a plurality of results; for example, some of the output results belong to the semantic category corresponding to the entity and some do not. When semantic classification is performed, whether the entity is a positive entity or a negative entity can therefore be determined according to the proportion of the results that belong to the semantic category corresponding to the entity among all the results. This proportion may serve as the preset prediction confidence, and its value may be set manually.
In this embodiment of the present application, the method for classifying the entity according to the prediction confidence corresponding to the entity to obtain the type of the entity may include:
according to the prediction confidence, entity sequencing is carried out on the extended sample set to obtain a sequenced entity sequence set;
classifying the entity sequence set to obtain the type of the entity, wherein the type of the entity comprises a positive type and a negative type, the positive type represents that the sequence position of the entity in the entity sequence set is higher than a preset sequence position, and the negative type represents that the sequence position of the entity in the entity sequence set is not higher than the preset sequence position.
The entity sequence set may refer to the set in which the entities are ordered according to the magnitude of their prediction confidences. In some embodiments, the higher the ranking position, the greater the prediction confidence.
The preset sequence position may be a manually set hyperparameter; through the preset sequence position, the entities that meet the confidence requirement can be separated from those that do not.
For example, in some embodiments, the entity sequence set may contain, in order, entity e1, entity e2, entity e3 and entity e4, where the positions 1, 2, 3, 4 characterize the ordering by predicted probability distribution in the entity sequence set. When the preset sequence position is set to 2.5, the positive entity set includes entity e1 and entity e2, and the negative entity set includes entity e3 and entity e4.
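A short sketch of this ranking-and-splitting step, using the example values above (entities e1 to e4 and a preset sequence position of 2.5); the function name and the example confidence values are illustrative assumptions.

```python
from typing import Dict, List, Tuple

def split_by_rank(confidence: Dict[str, float],
                  preset_position: float = 2.5) -> Tuple[List[str], List[str]]:
    """Sort entities by prediction confidence (highest first); entities ranked
    above the preset sequence position are positive, the rest negative."""
    ranked = sorted(confidence, key=confidence.get, reverse=True)
    positives = [e for rank, e in enumerate(ranked, start=1) if rank < preset_position]
    negatives = [e for rank, e in enumerate(ranked, start=1) if rank >= preset_position]
    return positives, negatives

# split_by_rank({"e1": 0.9, "e2": 0.8, "e3": 0.4, "e4": 0.1})
# -> (["e1", "e2"], ["e3", "e4"])
```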
160. And obtaining the comparison loss of the current model according to an entity pair, wherein the entity pair comprises a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity.
The contrast loss function may refer to the loss function adopted in contrastive learning. Contrastive learning is a self-supervised learning method used to pull positive samples closer together and to push negative samples further apart.
The computer device may calculate a contrast loss corresponding to each entity pair based on the semantic representation corresponding to the positive entity in each positive entity set and the semantic representation corresponding to the negative entity in each negative entity set, and finally construct a contrast loss function according to the contrast losses corresponding to all entity pairs.
The contrast loss may characterize the distance between a positive entity and a negative entity, and it may be used to characterize whether the current model has converged.
In this embodiment of the present application, the method for obtaining the contrast loss of the current model according to the entity pair may include:
constructing a contrast loss function model of the current model;
determining the positive entity and the negative entity according to the entity pair;
and inputting the positive entity and the negative entity into the contrast loss function model to obtain the contrast loss.
The contrast loss function loss_cl is defined in terms of the following quantities: pos_i denotes a positive entity, neg_i denotes a negative entity, and z is the semantic representation of an entity; N indicates that all data are divided into batches and processed in sequence, with one contrast loss computed per batch and the parameters updated before the next batch; J(i) denotes the union of the pairs (x_i, x_j) contained in the positive entity set and the negative entity set; τ+ is the prior class probability, a manually set hyperparameter; β is a hyperparameter controlling the degree of concentration on hard negative entities; and t is a temperature factor.
In the embodiment of the application, when there are at least two entity pairs, the method for obtaining the contrast loss of the current model according to the entity pairs may include the following steps:
obtaining the contrast loss of each entity pair;
and weighting and averaging the contrast loss of each entity pair to obtain the contrast loss of the current model.
That is, the contrast loss of each sample pair is calculated through the contrast loss function loss_cl, the contrast losses of all sample pairs are summed, and the sum is divided by the number of sample pairs to obtain the contrast loss of the current model.
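The patent's exact formula for loss_cl (with the prior class probability τ+ and the hard-negative weight β) is given in its figures and is not reproduced here; the sketch below only illustrates averaging temperature-scaled, InfoNCE-style per-pair terms, and the use of the positive-set centroid as the anchor is purely an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

def contrast_loss(pos_emb: torch.Tensor, neg_emb: torch.Tensor,
                  t: float = 0.07) -> torch.Tensor:
    """pos_emb: (P, d) embeddings of positive entities; neg_emb: (N, d)
    embeddings of negative entities. One per-pair term is formed for every
    (positive, negative) entity pair and the terms are averaged."""
    anchor = F.normalize(pos_emb.mean(dim=0, keepdim=True), dim=-1)          # (1, d)
    pos_sim = (F.normalize(pos_emb, dim=-1) @ anchor.T).squeeze(-1) / t      # (P,)
    neg_sim = (F.normalize(neg_emb, dim=-1) @ anchor.T).squeeze(-1) / t      # (N,)
    per_pair = -torch.log(
        pos_sim.exp().unsqueeze(1)
        / (pos_sim.exp().unsqueeze(1) + neg_sim.exp().unsqueeze(0))
    )                                                                        # (P, N)
    return per_pair.mean()
```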
170. And when the contrast loss is not greater than a preset threshold value, returning to and executing the step to acquire a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model.
After the contrast loss is obtained, a contrast loss that is not greater than the preset threshold indicates that the current model has not converged and that its training is not yet complete, so the procedure may return to step 110 and execute it again for another iteration.
When the step is executed again, the current model trained in the present iteration serves as the previous model, and it is trained with the training sample set and the expanded sample set to obtain the current model of the next iteration.
180. And ending training when the contrast loss is larger than a preset threshold value.
After the contrast loss is obtained, a contrast loss that is greater than the preset threshold indicates that the current model has converged and that its training is complete.
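Putting steps 110-180 together, a control-flow sketch of the outer iteration is given below. It follows the stopping rule stated above (training ends once the contrast loss exceeds the preset threshold); the four callbacks stand in for the training, confidence, classification and contrast-loss operations described earlier and are assumptions of this sketch.

```python
def iterate_training(initial_model, training_set, expanded_set, threshold,
                     train_fn, confidence_fn, classify_fn, loss_fn):
    """Outer iteration over steps 110-180."""
    previous_model = initial_model
    while True:
        current_model = train_fn(previous_model, training_set)      # step 120
        confidence = confidence_fn(current_model, expanded_set)     # steps 130-140
        positives, negatives = classify_fn(confidence)               # step 150
        loss = loss_fn(current_model, positives, negatives)          # step 160
        if loss > threshold:                                         # step 180: converged
            return current_model
        previous_model = current_model                               # step 170: next iteration
```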
In the embodiment of the application, when the number of the previous models is N, N is an integer greater than or equal to 2; the method for carrying out initial training treatment on the previous model to obtain the current model of the current iteration can comprise the following steps:
performing initial training treatment on N previous models to obtain N models to be selected;
and screening and integrating the N models to be selected to obtain an integrated model, wherein the integrated model is the current model.
The initial training process may be a training process performed on the previous model, respectively.
For example, in some embodiments, an initial training process may be performed on N previous models in a parallel processing manner, to obtain N candidate models.
The screening and integrating process may be a process of screening and integrating N candidate models according to the performance expectation of the candidate models.
The integrated model may be the new model obtained by performing integration processing on the screened models, and the output of the integrated model may be the average of the outputs of the screened candidate models.
In this embodiment of the present application, performing screening and integration processing on the N candidate models to obtain an integrated model may include: inputting the training sample set into each candidate model to obtain its predicted output result, wherein the training sample set comprises text sentences and each text sentence comprises an entity and a label corresponding to the entity; acquiring the cross entropy loss function corresponding to the sample labels; calculating the cross entropy loss of each candidate model according to the cross entropy loss function; screening n screening models from the N candidate models according to the cross entropy loss, wherein n is a positive integer less than or equal to N; and performing integration processing on the n screening models to obtain the integrated model.
In this embodiment of the present application, the method for performing integrated processing on n screening models to obtain an integrated model may include:
Acquiring input parameter sets of n screening models;
obtaining an output average value according to the input parameter set, wherein the output average value is the average value of the output results of the n screening models;
and constructing an integrated model according to the input parameter set and the output average value.
The input parameters may be variables input to the screening model, and the input parameter sets of the n screening models may be sets of input parameters respectively input to the n screening models.
The output average value is obtained by inputting each input parameter in the parameter set into n screening models.
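A minimal sketch of this screening and integration is given below, assuming each candidate model maps a batch of inputs to a probability distribution over the entity vocabulary; the loss used for screening and the averaging of outputs follow the description above, while everything else (names, clamping) is illustrative.

```python
import torch
import torch.nn.functional as F

def build_integrated_model(candidates, sample_batch, labels, n_keep: int):
    """Keep the n_keep candidate models with the lowest cross entropy on the
    training samples and return an ensemble whose output is the average of
    the screened models' outputs."""
    def cross_entropy(model):
        probs = model(sample_batch)                                   # (batch, vocab)
        return F.nll_loss(torch.log(probs.clamp_min(1e-12)), labels).item()

    kept = sorted(candidates, key=cross_entropy)[:n_keep]

    def integrated(batch):
        # output of the integrated model = mean of the screened models' outputs
        return torch.stack([m(batch) for m in kept]).mean(dim=0)

    return integrated
```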
From the above, the previous model and the entity set are acquired, the previous model being the language model obtained in the previous iteration; a preset training sample set is input into the previous model for training to obtain the current model of the current iteration, where the previous iteration is the iteration immediately before the current one; the entities in the entity set are input into the current model to obtain a first prediction confidence, which characterizes the predicted probability distribution obtained by comparing the output result produced for an entity with the label corresponding to that entity; a positive entity set and a negative entity set are obtained according to the first prediction confidence, the positive entity set containing the entities of the entity set whose first prediction confidence is greater than the preset prediction confidence and the negative entity set containing the entities whose first prediction confidence is smaller than the preset prediction confidence; and the first contrast loss of the current model is obtained according to the positive entity set and the negative entity set, the first contrast loss being obtained by training the current model with a contrast loss function constructed from the positive entity set and the negative entity set. In this method, an initial neural network model can be trained on the training sample set to obtain a trained current model; the expanded sample set is then input into the current model to obtain prediction results, the types of the entities in the expanded sample set can be obtained according to those prediction results, the contrast loss can be obtained according to the entity types, and the degree to which training of the current model is complete can be determined according to the contrast loss. When training of the current model is not complete, the current model can be used as the previous model of the next iteration and iteratively trained with the expanded sample set; when training of the current model is complete, the current model is the initial neural network model that has finished training. Because the current model trained on the training sample set is used to classify the entities in the expanded sample set, its prediction effect improves as training continues, the classification of the entities in the expanded sample set becomes better, and the entity embedding vectors of the positive entities and the negative entities become more clearly separated, so that the loss calculated from the positive and negative entities better guides the training of the initial neural network. Meanwhile, because the training sample set is used when training the previous model, the training effect is guaranteed; this avoids the problem that, when new entities are introduced during training, a newly added entity that does not belong to the category or that carries strong category-irrelevant semantic information causes entities added to the set later to deviate further from the category and form a vicious circle, and thus the accuracy of the model to be used when it is used after training is finished is improved.
The method described in the above embodiments will be described in further detail below.
In this embodiment, a server will be taken as an example, and a method of the embodiment of the present application will be described in detail.
As shown in fig. 2a, a specific flow of a model processing method is as follows:
201. training a plurality of models by adopting a preset entity set to obtain a plurality of trained models.
The preset entity set comprises each entity in an entity word list and each sample sentence containing the entity, wherein the entity in the sample sentence is replaced by a mask word, so that the entity is shielded.
To alleviate the negative effects caused by unbalanced entity frequencies without resorting to over-sampling or under-sampling, and to introduce a certain sample randomness that facilitates the subsequent ensemble learning, each sample sentence in the preset entity set corresponds to one entity, and the number of sample sentences per entity is controlled to be the total number of sample sentences divided by the number of entities. Using a label-smoothed cross entropy loss ensures that entities whose semantic information is similar to that of the entity are not excessively suppressed in the output predicted entity probability distribution. In the cross entropy loss function loss_pred, one parameter is the size of the mini-batch at training time, v_e is the size of the entity vocabulary, η is the smoothing coefficient (the larger η is, the stronger the label smoothing), and y_i is the index of the entity corresponding to sample i.
The cross entropy loss of each trained model can also be calculated through the cross entropy loss function, and the screening of the initial model is carried out according to the cross entropy loss.
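A label-smoothed cross entropy in the spirit of loss_pred can be sketched as follows; the exact formula of the patent is given in its figures, so the uniform spreading of the smoothing mass over the remaining entities is an assumption of this sketch.

```python
import torch

def label_smoothed_ce(probs: torch.Tensor, targets: torch.Tensor,
                      eta: float = 0.1) -> torch.Tensor:
    """probs: (batch, entity_vocab) predicted probabilities; targets: (batch,)
    entity indices y_i; eta: smoothing coefficient. Each target keeps
    probability 1 - eta and the remaining eta is spread over the other
    entities, so semantically similar entities are not overly suppressed."""
    vocab = probs.size(-1)
    smooth = torch.full_like(probs, eta / (vocab - 1))
    smooth.scatter_(1, targets.unsqueeze(1), 1.0 - eta)
    return -(smooth * torch.log(probs.clamp_min(1e-12))).sum(dim=-1).mean()
```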
202. And screening the multiple trained models to obtain k initial models meeting the effect requirements.
The screening processing refers to screening according to how well the multiple trained models perform on the entity set expansion task. During screening, the trained models can be evaluated with an evaluation model function computed over all seed entities (the entities in the entity set) of class cls, where S_e is the set of all samples of entity e, r(e) is the probabilistic representation of entity e, and kl_div may be the KL (Kullback-Leibler) divergence.
After the trained models are evaluated with the evaluation model function, they may be scored with a scoring model, and the initial models meeting the requirements are obtained from the scores.
203. And carrying out integrated processing on the k initial models to obtain an integrated model.
After k initial models are screened out, parameter sets of all the initial models are obtained, the parameter sets are input into all the initial models to obtain k model outputs, and finally, the k outputs are averaged to obtain the output of the integrated model.
In the formula of the integrated model, Θtop is the parameter set of the screened models, and the output of the integrated model is the average of the k model outputs.
204. Obtaining predicted expected distribution of each entity in an entity set and average actual expected distribution of all the entities;
205. and screening an extended entity set with predicted expected distribution larger than average real expected distribution in the entity set by adopting a window algorithm.
The window algorithm may rank all the entities in the extended set and the entity set according to the predicted expected distributions and the actual expected distributions of all the entities to obtain a ranked entity set; the extended entity set then consists of the top-ranked entities in the ranked entity set.
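A much-simplified sketch of this screening step is given below; it keeps only the threshold comparison and the ordering, and the actual window algorithm of the embodiment may involve more than this.

```python
from typing import Dict, List

def screen_expansion_candidates(predicted: Dict[str, float],
                                average_actual: float) -> List[str]:
    """Keep the entities whose predicted expected distribution exceeds the
    average actual expected distribution, ordered best-first."""
    kept = [e for e, p in predicted.items() if p > average_actual]
    return sorted(kept, key=predicted.get, reverse=True)
```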
206. And carrying out entity set expansion processing on the expanded entity set to obtain an expanded set.
The number of newly expanded entities may be preset, and expansion stops after the preset number is reached.
207. And merging the extended entity set with the first entity set to obtain a second entity set.
As shown in the processing diagram of fig. 2c, the entity set is the "current entity set" in fig. 2c. The "current entity set" is predicted by the "entity-level pre-training language model" obtained by the training in step 203 to produce the confidence represented by the "entity set probabilistic representation"; the "current entity set" is ordered according to this confidence to obtain a candidate entity queue; higher-scoring entities are then selected from the candidate entity queue through the "candidate entity score", entity expansion processing is performed according to these entities to obtain target entities, and the target entities are stored in the "current entity set"; expansion stops once the number of expanded entities meets the preset number requirement.
208. And re-ordering the entities in the second entity set to obtain a re-ordered entity set.
The reordering of the entities in the second entity set may be scored according to a ranking formula in which i is the order in which entity e_i was added to the extension set and rank(e_i) is the ranking of entity e_i in the ranked entity set.
As shown in the flowchart of another model processing method in fig. 2d, the second entity set may be the "extended entity set"; the second entities in the "extended entity set" are scored, and an "extended entity ordered queue" is obtained by ordering the second entities according to their scores, so as to determine the positive entity set and the negative entity set.
209. Determining a positive entity set and a negative entity set in the second entity set according to a preset ranking threshold;
210. and constructing a contrast loss function according to the positive entity set and the negative entity set.
The constructed contrast loss function loss_cl is the same as described above: pos_i denotes a positive entity, neg_i denotes a negative entity, and z is the semantic representation of an entity; N indicates that all data are divided into batches and processed in sequence, with one contrast loss computed per batch and the parameters updated before the next batch; J(i) denotes the union of the pairs (x_i, x_j) contained in the positive entity set and the negative entity set; τ+ is the prior class probability, a manually set hyperparameter; β is a hyperparameter controlling the degree of concentration on hard negative entities; and t is a temperature factor.
211. Training the integrated model according to the contrast loss function to obtain the contrast loss of the integrated model;
212. returning to and executing step 201 when the contrast loss of the integrated model characterizes that the current model is not converged;
213. and when the contrast loss of the integrated model represents the convergence of the current model, the integrated model is taken as the model to be used.
The model to be used in the embodiment of the application can be used in the Class-Guided Entity Selection module of CGExpan. The candidate entity score of the CGExpan Class-Guided Entity Selection module is defined such that i is the index of the candidate entity in the entity vocabulary, one term characterizes the association of the candidate entity with the class guide name, and the other term represents the embedding similarity between the candidate entity and the current entity set.
In the embedding-similarity term, E_s is a set of entities randomly selected from the current entity set, V_e is the average BERT context word embedding of entity e, and cos(·) refers to cosine similarity.
In the embodiment of the application, one of these terms can be replaced by the score produced by the model to be used, so that the model to be used is fused with the Class-Guided Entity Selection module of CGExpan; the framework obtained after fusion follows from this substitution.
From the above, when training the model in this application, positive entities and negative entities can be separated more accurately and the boundary used for the separation is better determined, which improves the training effect of the model and therefore the accuracy of the model to be used when it is used.
In order to better implement the method, the embodiment of the application also provides a processing device of the model, and the processing device of the model can be integrated in electronic equipment, and the electronic equipment can be a terminal, a server and other equipment. The terminal can be a mobile phone, a tablet personal computer, an intelligent Bluetooth device, a notebook computer, a personal computer and other devices; the server may be a single server or a server cluster composed of a plurality of servers.
For example, in this embodiment, the method of the embodiment of the present application will be described in detail by taking the case where the processing device of the model is specifically integrated in a terminal as an example.
As shown in fig. 2b, the flow of another model processing method is as follows:
Masked entity prediction task learning: the pre-training model is trained on a preset entity set so as to adjust the parameters of the pre-training model.
Model selection and model integration: the pre-training models trained through the masked entity prediction task learning are selected and integrated according to their effect in use, so as to obtain an integrated model.
Entity set expansion framework: the set to be expanded is ranked, expanded, and re-ranked, so as to obtain an expanded set after expansion and ranking.
The entities in the expanded set are classified to obtain positive entities and negative entities, and the contrast loss is obtained from the positive entities and the negative entities. When the contrast loss characterizes that the integrated model has not converged, iterative training of the masked entity prediction task learning is performed on the integrated model again from the initial model parameters until the contrast loss characterizes that the integrated model has converged; model selection and model integration can then be performed on the integrated model to obtain an entity-level pre-training model, as sketched below.
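A minimal Python sketch of this overall flow follows; the four stages are passed in as callables because the patent does not prescribe concrete interfaces, and every name in the sketch is hypothetical rather than taken from the patent.

# Hypothetical sketch of the flow described above; none of these names are from the patent.
from typing import Callable, Iterable, List, Sequence, Tuple

def train_entity_level_model(
    initial_models: Sequence[object],
    training_samples: Iterable[object],
    entity_set: Iterable[str],
    pretrain: Callable[[object, Iterable[object]], object],
    integrate: Callable[[Sequence[object], Iterable[object]], object],
    expand: Callable[[object, Iterable[str]], List[str]],
    classify: Callable[[object, List[str]], Tuple[List[str], List[str]]],
    contrast_loss: Callable[[object, List[str], List[str]], float],
    converged: Callable[[float], bool],
    max_rounds: int = 10,
) -> object:
    """Iterate masked-entity pre-training, model integration, entity set expansion,
    and contrastive evaluation until the contrast loss indicates convergence."""
    integrated = None
    for _ in range(max_rounds):
        # (1) Masked entity prediction task learning on each previous model.
        candidates = [pretrain(model, training_samples) for model in initial_models]
        # (2) Model selection and model integration.
        integrated = integrate(candidates, training_samples)
        # (3) Entity set expansion framework: rank, expand, re-rank.
        expanded = expand(integrated, entity_set)
        # (4) Classify expanded entities and compute the contrast loss.
        positives, negatives = classify(integrated, expanded)
        if converged(contrast_loss(integrated, positives, negatives)):
            break
        # Not converged: the next round restarts pre-training from the initial parameters.
    return integrated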
For example, as shown in fig. 3, the processing apparatus of the model may include an acquisition unit 301, an adoption unit 302, a first obtaining unit 303, a second obtaining unit 304, a classification unit 305, a third obtaining unit 306, a return unit 307, and an end training unit 308, as follows:
(one) an acquisition unit 301:
The acquisition unit 301 may acquire a previous model, a training sample set, and an extended sample set, wherein the previous model includes an initial neural network model.
In some embodiments, the obtaining unit 301 is further specifically configured to:
acquiring a sample set to be expanded, wherein the sample set to be expanded comprises the training sample set;
and performing entity expansion processing on the sample set to be expanded to obtain the extended sample set.
In some embodiments, the obtaining unit 301 is further specifically configured to:
selecting an extended entity from the sample set to be expanded to obtain a selected entity;
performing entity expansion processing on the selected entity;
and adding the selected entity to the sample set to be expanded to obtain the extended sample set.
In some embodiments, the obtaining unit 301 is further specifically configured to:
inputting the extended entity in the sample set to be expanded to the current model to obtain an extended prediction result corresponding to the extended entity;
obtaining an extended prediction confidence corresponding to the extended entity based on the extended prediction result corresponding to the extended entity;
and selecting, from the sample set to be expanded according to the extended prediction confidence, the selected entity whose extended prediction confidence is greater than a preset expansion threshold.
In some embodiments, the obtaining unit 301 is further specifically configured to:
acquiring the true confidence of more than one extended entity in the sample set to be expanded;
performing weighted averaging processing on the true confidences of the more than one extended entity to obtain the preset expansion threshold;
and selecting extended entities according to the preset expansion threshold to obtain the selected entities whose extended prediction confidence is greater than the preset expansion threshold.
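For illustration, the threshold computation and the subsequent selection described in this embodiment might look like the following sketch; the uniform weighting in the weighted average and all names are assumptions, not details from the patent.

# Hypothetical sketch: the preset expansion threshold is a weighted average of the true
# confidences of known extended entities; candidates whose extended prediction confidence
# exceeds that threshold are selected. Uniform weights are assumed for simplicity.
from typing import Dict, List, Optional, Sequence

def expansion_threshold(true_confidences: Sequence[float],
                        weights: Optional[Sequence[float]] = None) -> float:
    weights = list(weights) if weights is not None else [1.0] * len(true_confidences)
    return sum(c * w for c, w in zip(true_confidences, weights)) / sum(weights)

def select_entities(predicted_confidences: Dict[str, float], threshold: float) -> List[str]:
    return [entity for entity, conf in predicted_confidences.items() if conf > threshold]

# Example usage with made-up numbers:
threshold = expansion_threshold([0.9, 0.8, 0.95])                            # about 0.883
selected = select_entities({"entity_a": 0.91, "entity_b": 0.60}, threshold)  # ["entity_a"]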
In some embodiments, there are N previous models, where N is an integer greater than or equal to 2, and the obtaining unit 301 is further specifically configured to:
performing initial training treatment on N previous models to obtain N models to be selected;
and screening and integrating the N models to be selected to obtain an integrated model, wherein the integrated model is used as a current model.
In some embodiments, the obtaining unit 301 is further specifically configured to:
inputting the training sample set into the model to be selected to obtain a model to be selected prediction output result, wherein the training sample set comprises a text sentence, and the text sentence comprises an entity and a label corresponding to the entity;
Acquiring a cross entropy loss function corresponding to the label;
calculating the cross entropy loss of each model to be selected according to the cross entropy loss function;
screening N screening models from the N models to be selected according to the cross entropy loss, wherein N is a positive integer less than or equal to N;
and carrying out integrated processing on the n screening models to obtain an integrated model.
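A minimal sketch of this screening step, assuming each candidate model can report the probability it assigns to the true label of each training sample (the interfaces and names are hypothetical):

# Hypothetical sketch: compute the cross-entropy loss of each candidate model on the
# training sample set and keep the n models with the lowest loss.
import math
from typing import Callable, List, Sequence

def cross_entropy(true_label_probs: Sequence[float], eps: float = 1e-12) -> float:
    # true_label_probs[i] is the probability the model assigns to the true label of sample i.
    return -sum(math.log(max(p, eps)) for p in true_label_probs) / len(true_label_probs)

def screen_models(models: Sequence[object],
                  true_label_probs_of: Callable[[object], Sequence[float]],
                  n: int) -> List[object]:
    losses = [cross_entropy(true_label_probs_of(model)) for model in models]
    ranked = sorted(range(len(models)), key=lambda idx: losses[idx])  # lower loss is better
    return [models[idx] for idx in ranked[:n]]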
In some embodiments, the obtaining unit 301 is further specifically configured to:
acquiring input parameter sets of n screening models;
obtaining an output average value according to the input parameter set, wherein the output average value is the average value of the output results of the n screening models;
and constructing an integrated model according to the input parameter set and the output average value.
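The integration step, in which the integrated model averages the outputs of the n screening models for the same input parameter set, might be sketched as follows; the names are hypothetical and element-wise averaging of the output vectors is an assumption.

# Hypothetical sketch: an integrated model that returns the element-wise average of the
# screening models' output vectors for a shared input.
from typing import Callable, List, Sequence

def make_integrated_model(models: Sequence[Callable[[object], Sequence[float]]]
                          ) -> Callable[[object], List[float]]:
    def integrated(inputs: object) -> List[float]:
        outputs = [model(inputs) for model in models]
        return [sum(values) / len(outputs) for values in zip(*outputs)]
    return integrated

# Example with two toy "models":
ensemble = make_integrated_model([lambda x: [0.2, 0.8], lambda x: [0.4, 0.6]])
print(ensemble("any input"))   # approximately [0.3, 0.7]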
(II) Adoption unit 302:
the adoption unit 302 may train the previous model using the training sample set to obtain the current model.
In some embodiments, the training sample set includes a text sentence, where the text sentence includes an entity and a tag corresponding to the entity, and the employing unit 302 is specifically configured to:
masking the entity to obtain a masked text sentence;
and training the previous model by adopting the training sample set to obtain a current model.
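As a simple illustration of the masking step, each occurrence of the entity in the text sentence can be replaced with a mask token before pre-training; the [MASK] token and the whitespace tokenization below are assumptions rather than the patent's tokenizer.

# Hypothetical sketch: mask every occurrence of the entity in a text sentence so that
# the model has to predict the masked entity during pre-training.
from typing import List

def mask_entity(text_sentence: str, entity: str, mask_token: str = "[MASK]") -> str:
    tokens: List[str] = text_sentence.split()
    masked = [mask_token if token == entity else token for token in tokens]
    return " ".join(masked)

# Example usage:
print(mask_entity("Shenzhen is a city in China", "Shenzhen"))
# -> "[MASK] is a city in China"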
(III) first obtaining unit 303:
the first obtaining unit 303 may input the entity in the extended sample set to the current model, to obtain a prediction result corresponding to the entity.
(fourth) a second obtaining unit 304:
the second obtaining unit 304 may obtain the prediction confidence corresponding to the entity based on the prediction result corresponding to the entity.
(fifth) a classification unit 305:
the classifying unit 305 may classify the entity according to the prediction confidence corresponding to the entity, to obtain the type of the entity.
In some embodiments, the second obtaining unit 304 is specifically further configured to:
according to the prediction confidence, entity sequencing is carried out on the extended sample set to obtain a sequenced entity sequence set;
classifying the entity sequence set to obtain the type of the entity, wherein the type of the entity comprises a positive type and a negative type, the positive type represents that the sequence position of the entity in the entity sequence set is higher than a preset sequence position, and the negative type represents that the sequence position of the entity in the entity sequence set is not higher than the preset sequence position.
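The ranking-based classification described above can be sketched as follows; the preset sequence position acts as the rank cut-off, and all names are hypothetical.

# Hypothetical sketch: rank entities by prediction confidence, then label entities at or
# above the preset sequence position as positive and the remaining entities as negative.
from typing import Dict, List, Tuple

def classify_by_rank(confidences: Dict[str, float],
                     preset_position: int) -> Tuple[List[str], List[str]]:
    ranked = sorted(confidences, key=confidences.get, reverse=True)
    return ranked[:preset_position], ranked[preset_position:]

# Example usage:
positives, negatives = classify_by_rank({"a": 0.9, "b": 0.4, "c": 0.7}, preset_position=2)
# positives == ["a", "c"], negatives == ["b"]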
(sixth) a third obtaining unit 306:
The third obtaining unit 306 may obtain the contrast loss of the current model according to an entity pair, where the entity pair includes a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity.
In some embodiments, the classification unit 305 is specifically further configured to:
constructing a contrast loss function model of the current model; determining the positive entity and the negative entity according to the entity pair;
and inputting the positive entity and the negative entity into the contrast loss function model to obtain the contrast loss.
In some embodiments, the classification unit 305 is specifically further configured to:
obtaining the contrast loss of each entity pair;
and weighting and averaging the contrast loss of each entity pair to obtain the contrast loss of the current model.
(seventh) a return unit 307:
the returning unit 307 may return and execute the step of obtaining a previous model, a training sample set, and an extended sample set when the contrast loss is not greater than a preset threshold, where the previous model includes an initial neural network model.
(eighth) end training unit 308:
the end training unit 308 may end training when the contrast loss is greater than a preset threshold.
In the implementation, each unit may be implemented as an independent entity, or may be implemented as the same entity or several entities in any combination, and the implementation of each unit may be referred to the foregoing method embodiment, which is not described herein again.
As can be seen from the above, the processing device of the model of this embodiment includes an acquisition unit for acquiring a previous model, a training sample set, and an extended sample set, wherein the previous model includes an initial neural network model; an adoption unit for training the previous model by adopting the training sample set to obtain a current model; a first obtaining unit for inputting the entity in the extended sample set to the current model to obtain a prediction result corresponding to the entity; a second obtaining unit for obtaining the prediction confidence corresponding to the entity based on the prediction result corresponding to the entity; a classification unit for classifying the entity according to the prediction confidence corresponding to the entity to obtain the type of the entity; a third obtaining unit for obtaining the contrast loss of the current model according to an entity pair, wherein the entity pair includes a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity; a return unit for returning to and executing the step of acquiring a previous model, a training sample set, and an extended sample set when the contrast loss is not greater than a preset threshold, wherein the previous model includes an initial neural network model; and a training ending unit for ending training when the contrast loss is greater than the preset threshold. Therefore, the precision of the model to be used is improved.
The embodiment of the application also provides electronic equipment which can be a terminal, a server and other equipment. The terminal can be a mobile phone, a tablet computer, an intelligent Bluetooth device, a notebook computer, a personal computer and the like; the server may be a single server, a server cluster composed of a plurality of servers, or the like.
In some embodiments, the processing apparatus of the model may also be integrated in a plurality of electronic devices, for example, the processing apparatus of the model may be integrated in a plurality of servers, and the processing method of the model of the present application is implemented by the plurality of servers.
In this embodiment, a detailed description will be given by taking the case where the electronic device is a terminal as an example. Fig. 4 shows a schematic structural diagram of the terminal according to the embodiment of the present application, specifically:
the terminal may include one or more processing cores 'processors 401, one or more computer-readable storage media's memory 402, power supply 403, input module 404, and communication module 405, among other components. It will be appreciated by those skilled in the art that the terminal structure shown in fig. 4 is not limiting of the terminal and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
The processor 401 is the control center of the terminal, connects various parts of the entire terminal using various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the terminal as a whole. In some embodiments, the processor 401 may include one or more processing cores; in some embodiments, the processor 401 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, with a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the terminal, etc. In addition, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The terminal also includes a power supply 403 for powering the various components, and in some embodiments, the power supply 403 may be logically connected to the processor 401 by a power management system so as to perform functions such as managing charging, discharging, and power consumption by the power management system. The power supply 403 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The terminal may also include an input module 404, which input module 404 may be used to receive entered numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
The terminal may also include a communication module 405, and in some embodiments the communication module 405 may include a wireless module, through which the terminal may wirelessly transmit over short distances, thereby providing wireless broadband internet access to the user. For example, the communication module 405 may be used to assist a user in e-mail, browsing web pages, accessing streaming media, and so forth.
Although not shown, the terminal may further include a display unit or the like, which is not described herein. In this embodiment, the processor 401 in the terminal loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:
Acquiring a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
training the previous model by adopting the training sample set to obtain a current model;
inputting the entity in the extended sample set to the current model to obtain a prediction result corresponding to the entity;
obtaining a prediction confidence corresponding to the entity based on a prediction result corresponding to the entity;
classifying the entities according to the prediction confidence corresponding to the entities to obtain the types of the entities;
obtaining the comparison loss of the current model according to an entity pair, wherein the entity pair comprises a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity;
when the contrast loss is not greater than a preset threshold value, returning to and executing the steps to obtain a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
and ending training when the contrast loss is larger than a preset threshold value.
In some embodiments, a computer program product is also proposed, comprising a computer program or instructions which, when executed by a processor, implement the steps in a method of processing any of the models described above.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform steps in a method of processing any of the models provided by embodiments of the present application. For example, the instructions may perform the steps of:
acquiring a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
training the previous model by adopting the training sample set to obtain a current model;
inputting the entity in the extended sample set to the current model to obtain a prediction result corresponding to the entity;
obtaining a prediction confidence corresponding to the entity based on a prediction result corresponding to the entity;
Classifying the entities according to the prediction confidence corresponding to the entities to obtain the types of the entities;
obtaining the comparison loss of the current model according to an entity pair, wherein the entity pair comprises a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity;
when the contrast loss is not greater than a preset threshold value, returning to and executing the steps to obtain a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
and ending training when the contrast loss is larger than a preset threshold value.
Wherein the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), a magnetic disk or an optical disc, and the like.
The instructions stored in the storage medium may perform steps in any of the methods for processing models provided in the embodiments of the present application, so that the beneficial effects that can be achieved by any of the methods for processing models provided in the embodiments of the present application may be achieved, which are detailed in the previous embodiments and are not described herein.
The foregoing describes in detail a method, apparatus, electronic device, and storage medium for processing a model provided in the embodiments of the present application, and specific examples are applied to illustrate principles and implementations of the present application, where the foregoing examples are only used to help understand the method and core idea of the present application; meanwhile, those skilled in the art will have variations in the specific embodiments and application scope in light of the ideas of the present application, and the present description should not be construed as limiting the present application in view of the above.

Claims (13)

1. A method of processing a model, comprising:
acquiring a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model, the training sample set comprises a text sentence, and the text sentence comprises an entity and a label corresponding to the entity;
training the previous model by adopting the training sample set to obtain a current model;
inputting the entity in the extended sample set to the current model to obtain a prediction result corresponding to the entity;
obtaining a prediction confidence corresponding to the entity based on a prediction result corresponding to the entity;
classifying the entities according to the prediction confidence corresponding to the entities to obtain the types of the entities;
obtaining the comparison loss of the current model according to an entity pair, wherein the entity pair comprises a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity;
when the contrast loss is not greater than a preset threshold value, returning to and executing the steps to obtain a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
Ending training when the contrast loss is greater than a preset threshold;
obtaining the extended sample set comprises:
acquiring a sample set to be expanded, wherein the sample set to be expanded comprises the training sample set;
inputting an extended entity in the sample set to be expanded to the current model to obtain an extended prediction result corresponding to the extended entity;
obtaining an extended prediction confidence corresponding to the extended entity based on the extended prediction result corresponding to the extended entity;
selecting, from the sample set to be expanded according to the extended prediction confidence, a selected entity whose extended prediction confidence is greater than a preset expansion threshold;
and adding the selected entity to the sample set to be expanded to obtain the extended sample set.
2. The method of claim 1, wherein training the previous model using the training sample set to obtain a current model comprises:
masking the entity to obtain a masked text sentence;
and training the previous model by adopting the training sample set to obtain a current model.
3. The method of claim 1, wherein selecting, from the sample set to be expanded according to the extended prediction confidence, the selected entity whose extended prediction confidence is greater than the preset expansion threshold comprises:
acquiring the true confidence of more than one extended entity in the sample set to be expanded;
performing weighted averaging processing on the true confidences of the more than one extended entity to obtain the preset expansion threshold;
and selecting extended entities according to the preset expansion threshold to obtain the selected entity whose extended prediction confidence is greater than the preset expansion threshold.
4. The method of claim 1, wherein classifying the entity according to the prediction confidence corresponding to the entity to obtain the type of the entity comprises:
according to the prediction confidence, entity sequencing is carried out on the extended sample set to obtain a sequenced entity sequence set;
classifying the entity sequence set to obtain the type of the entity, wherein the type of the entity comprises a positive type and a negative type, the positive type represents that the sequence position of the entity in the entity sequence set is higher than a preset sequence position, and the negative type represents that the sequence position of the entity in the entity sequence set is not higher than the preset sequence position.
5. The method of claim 1, wherein deriving the contrast loss for the current model from the pair of entities comprises:
Constructing a contrast loss function model of the current model; determining the positive entity and the negative entity according to the entity pair;
and inputting the positive entity and the negative entity into the contrast loss function model to obtain the contrast loss.
6. The method of claim 5, wherein there are at least two of said pairs of entities;
and obtaining the contrast loss of the current model according to the entity pairs, wherein the method comprises the following steps:
obtaining the contrast loss of each entity pair;
and weighting and averaging the contrast loss of each entity pair to obtain the contrast loss of the current model.
7. The method of claim 1, wherein there are N previous models, N being an integer greater than or equal to 2;
training the previous model by using the training sample set to obtain a current model, wherein the training comprises the following steps:
performing initial training treatment on N previous models to obtain N models to be selected;
and screening and integrating the N models to be selected to obtain an integrated model, wherein the integrated model is used as the current model.
8. The method of claim 7, wherein the screening and integrating the N models to be selected to obtain an integrated model comprises:
Inputting the training sample set into the model to be selected to obtain a model to be selected prediction output result, wherein the training sample set comprises a text sentence, and the text sentence comprises an entity and a label corresponding to the entity;
acquiring a cross entropy loss function corresponding to the label;
calculating the cross entropy loss of each model to be selected according to the cross entropy loss function;
screening N screening models from the N models to be selected according to the cross entropy loss, wherein N is a positive integer less than or equal to N;
and carrying out integrated processing on the n screening models to obtain an integrated model.
9. The method of claim 8, wherein the integrating the n screening models to obtain an integrated model comprises:
acquiring n input parameter sets of the screening models;
obtaining an output average value according to an input parameter set, wherein the output average value is an average value of n screening model output results;
and constructing an integrated model according to the input parameter set and the output average value.
10. A model processing apparatus, comprising:
the acquisition unit is used for acquiring a previous model, a training sample set and an expansion sample set, wherein the previous model comprises an initial neural network model;
The adoption unit is used for training the previous model by adopting the training sample set to obtain a current model;
the first obtaining unit is used for inputting the entity in the extended sample set to the current model to obtain a prediction result corresponding to the entity;
the second obtaining unit is used for obtaining the prediction confidence corresponding to the entity based on the prediction result corresponding to the entity;
the classification unit is used for classifying the entity according to the prediction confidence corresponding to the entity to obtain the type of the entity;
a third obtaining unit, configured to obtain a comparison loss of the current model according to an entity pair, where the entity pair includes a positive entity and a negative entity, the positive entity is a positive type entity, and the negative entity is a negative type entity;
the return unit is used for returning and executing the steps to acquire a previous model, a training sample set and an expansion sample set when the contrast loss is not greater than a preset threshold value, wherein the previous model comprises an initial neural network model;
the training ending unit is used for ending training when the contrast loss is larger than a preset threshold value;
the acquisition unit is further configured to:
acquiring a sample set to be expanded, wherein the sample set to be expanded comprises the training sample set;
inputting an extended entity in the sample set to be expanded to the current model to obtain an extended prediction result corresponding to the extended entity;
obtaining an extended prediction confidence corresponding to the extended entity based on the extended prediction result corresponding to the extended entity;
selecting, from the sample set to be expanded according to the extended prediction confidence, a selected entity whose extended prediction confidence is greater than a preset expansion threshold;
and adding the selected entity to the sample set to be expanded to obtain the extended sample set.
11. An electronic device comprising a processor and a memory, the memory storing a plurality of instructions; the processor loads instructions from the memory to perform the steps in the method of processing a model according to any one of claims 1 to 9.
12. A computer-readable storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor for executing the steps in the method of processing a model according to any one of claims 1-9.
13. A computer program product comprising a computer program or instructions which, when executed by a processor, carries out the steps in the method of processing a model according to any one of claims 1 to 9.
CN202210917417.9A 2022-08-01 2022-08-01 Model processing method, device, electronic equipment and storage medium Active CN115269844B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210917417.9A CN115269844B (en) 2022-08-01 2022-08-01 Model processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210917417.9A CN115269844B (en) 2022-08-01 2022-08-01 Model processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115269844A CN115269844A (en) 2022-11-01
CN115269844B true CN115269844B (en) 2024-03-29

Family

ID=83748096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210917417.9A Active CN115269844B (en) 2022-08-01 2022-08-01 Model processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115269844B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11934957B2 (en) * 2020-08-27 2024-03-19 GM Global Technology Operations LLC Methods, systems, and apparatuses for user-understandable explainable learning models

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193962A (en) * 2017-05-24 2017-09-22 百度在线网络技术(北京)有限公司 A kind of intelligent figure method and device of internet promotion message
CN110162770A (en) * 2018-10-22 2019-08-23 腾讯科技(深圳)有限公司 A kind of word extended method, device, equipment and medium
CN111950269A (en) * 2020-08-21 2020-11-17 清华大学 Text statement processing method and device, computer equipment and storage medium
WO2022037256A1 (en) * 2020-08-21 2022-02-24 腾讯科技(深圳)有限公司 Text sentence processing method and device, computer device and storage medium
CN112307182A (en) * 2020-10-29 2021-02-02 上海交通大学 Question-answering system-based pseudo-correlation feedback extended query method
CN112966712A (en) * 2021-02-01 2021-06-15 北京三快在线科技有限公司 Language model training method and device, electronic equipment and computer readable medium
CN113191152A (en) * 2021-06-30 2021-07-30 杭州费尔斯通科技有限公司 Entity identification method and system based on entity extension
CN113779267A (en) * 2021-09-13 2021-12-10 中国人民解放军国防科技大学 On-satellite intelligent task decision method based on intention
CN114155598A (en) * 2021-11-12 2022-03-08 浙江大华技术股份有限公司 Training method and device of image processing model and electronic equipment
CN114091594A (en) * 2021-11-15 2022-02-25 北京市商汤科技开发有限公司 Model training method and device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Marcel Lüthi; Thomas Gerig; Christoph Jud; Thomas Vetter. Gaussian Process Morphable Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, pp. 1860-1873. *
Entity Alignment for Encyclopedia Knowledge Bases Based on Semi-supervised Co-training; Zhang Weili; Huang Tinglei; Liang Xiao; Computer and Modernization (12); pp. 92-97 *
Research on Iris Recognition Based on Dual-channel Network Feature Learning; Tong Jiaming; China Masters' Theses Full-text Database; pp. I138-432 *

Also Published As

Publication number Publication date
CN115269844A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
Badjatiya et al. Attention-based neural text segmentation
WO2019153737A1 (en) Comment assessing method, device, equipment and storage medium
CN109840287A (en) A kind of cross-module state information retrieval method neural network based and device
CN109614471B (en) Open type problem automatic generation method based on generation type countermeasure network
CN112069302B (en) Training method of conversation intention recognition model, conversation intention recognition method and device
CN111444709A (en) Text classification method, device, storage medium and equipment
CN112287089B (en) Classification model training and automatic question-answering method and device for automatic question-answering system
US11645479B1 (en) Method for AI language self-improvement agent using language modeling and tree search techniques
CN111259647A (en) Question and answer text matching method, device, medium and electronic equipment based on artificial intelligence
US20230350651A1 (en) Creating user interface using machine learning
CN111382573A (en) Method, apparatus, device and storage medium for answer quality assessment
WO2023137911A1 (en) Intention classification method and apparatus based on small-sample corpus, and computer device
CN116186250A (en) Multi-mode learning level mining method, system and medium under small sample condition
Wen Intelligent English translation mobile platform and recognition system based on support vector machine
CN113934835B (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
CN113591988B (en) Knowledge cognitive structure analysis method, system, computer equipment, medium and terminal
CN114358257A (en) Neural network pruning method and device, readable medium and electronic equipment
CN117094291B (en) Automatic news generation system based on intelligent writing
CN116610795B (en) Text retrieval method and device
CN117391497A (en) News manuscript quality subjective and objective scoring consistency evaluation method and system
Hou et al. A corpus-free state2seq user simulator for task-oriented dialogue
CN116757195A (en) Implicit emotion recognition method based on prompt learning
CN115269844B (en) Model processing method, device, electronic equipment and storage medium
CN116910190A (en) Method, device and equipment for acquiring multi-task perception model and readable storage medium
CN115114974A (en) Model distillation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40075369

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant