CN116150621A - Training method, device and equipment for text model - Google Patents

Training method, device and equipment for text model

Info

Publication number
CN116150621A
Authority
CN
China
Prior art keywords
training
text
model
sample set
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310172601.XA
Other languages
Chinese (zh)
Inventor
蔡志伟
杜新凯
吕超
纪诚
姚雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sunshine Insurance Group Co Ltd
Original Assignee
Sunshine Insurance Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sunshine Insurance Group Co Ltd filed Critical Sunshine Insurance Group Co Ltd
Priority to CN202310172601.XA priority Critical patent/CN116150621A/en
Publication of CN116150621A publication Critical patent/CN116150621A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application provides a training method, device and equipment for a text model, wherein the method comprises the following steps: acquiring a training sample set; substituting the training sample set into a pre-trained dictionary-enhanced language feature model to obtain a plurality of feature vectors; fusing the plurality of feature vectors to obtain a target vector; and calculating a loss function according to the target vector until training is completed, so as to obtain a text model. In this method, the training sample set is substituted into the pre-trained dictionary-enhanced language feature model: on the one hand, the dictionary-enhanced language feature model is introduced, and on the other hand, its expressive capacity is enhanced through entry prediction training and contrastive learning training, so that the accuracy of the obtained text model in text processing can be improved.

Description

Training method, device and equipment for text model
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a training method, device and equipment for a text model.
Background
With the development of artificial intelligence technology, natural language processing has found more and more application scenarios, for example in semantic similarity, text classification, question answering systems, sentiment analysis, machine translation and the like.
In the prior art, text processing is usually performed directly with a single model: for example, a vectorized representation of the text is obtained through means such as word-frequency statistics and syntactic analysis, and similarity is then calculated to retrieve the text with the highest similarity.
However, the above method only considers features such as word frequency and suffers from overly sparse semantics, so the accuracy of text processing is not high.
Disclosure of Invention
In view of the above defects in the prior art, the present application aims to provide a training method, device and equipment for a text model, so as to solve the problem of low text processing accuracy in the prior art.
In order to achieve the above purpose, the technical solution adopted in the embodiment of the present application is as follows:
in a first aspect, an embodiment of the present application provides a training method for a text model, where the method includes:
obtaining a training sample set, the training sample set comprising: labeled text samples;
substituting the training sample set into a pre-trained dictionary-enhanced language feature model to obtain a plurality of feature vectors; the dictionary-enhanced language feature model is obtained through entry prediction training and contrastive learning training, wherein the entry prediction training trains the model to predict masked entries in example sentences, and the contrastive learning training trains the model on the relationships among entry meanings;
fusing a plurality of the feature vectors to obtain a target vector;
and calculating a loss function according to the target vector until training is completed, so as to obtain a text model.
In an alternative embodiment, the method further comprises:
extracting an original text from a preset dictionary, wherein the original text comprises: the entry, the paraphrasing and example sentences corresponding to each entry;
generating a vocabulary entry prediction sample set according to the original text and the vocabulary entry prediction rule;
and generating a comparison learning sample set according to the original text and the comparison learning rule.
In an optional embodiment, after generating the vocabulary entry prediction sample set according to the original text and the vocabulary entry prediction rule, the method further includes:
substituting the vocabulary entry prediction sample set into an initial dictionary enhanced language feature model for training to obtain a predicted vocabulary entry;
and calculating and acquiring a first loss parameter according to the first loss function and the predicted entry.
In an optional implementation manner, after generating the comparison learning sample set according to the original text and the comparison learning rule, the method further includes:
substituting the comparison learning sample set into the initial dictionary enhanced language feature model for training, and calculating and obtaining a second loss parameter according to a second loss function.
In an alternative embodiment, the method further comprises:
and reversely optimizing parameters of the initial dictionary enhancement language feature model according to the first loss parameters and the second loss parameters to obtain the pre-trained dictionary enhancement language feature model.
In an optional embodiment, before substituting the training sample set into the pre-trained dictionary enhancement language feature model, the method further includes:
after removing stop words and/or invalid words from a text in the training sample set, retrieving target words of the text that appear in a preset entry set as enhancement words;
combining the text in the training sample set with the paraphrases of the corresponding enhancement words to obtain enhanced text;
and updating the training sample set by replacing the original samples with the enhanced text.
In an optional implementation manner, the fusing the plurality of feature vectors to obtain the target vector includes:
fusing a plurality of feature vectors by adopting an attention mechanism to obtain a target vector, wherein the target vector comprises: a sentence representation vector.
In an alternative embodiment, the obtaining the loss function according to the target vector until training is completed to obtain a text model includes:
calculating a third loss parameter according to the target vector and a third loss function;
and optimizing parameters in an initial text model according to the third loss parameters, and acquiring the text model after training, wherein the text processing capacity of the text model corresponds to the text relationship in the training sample set.
In a second aspect, another embodiment of the present application provides a training device for a text model, where the device includes:
a first acquisition module for acquiring a training sample set, the training sample set comprising: labeled text samples;
the second acquisition module is used for substituting the training sample set into a pre-trained dictionary-enhanced language feature model to acquire a plurality of feature vectors; the dictionary-enhanced language feature model is obtained through entry prediction training and contrastive learning training, wherein the entry prediction training trains the model to predict masked entries in example sentences, and the contrastive learning training trains the model on the relationships among entry meanings;
the fusion module is used for fusing the plurality of feature vectors to obtain a target vector;
and the training module is used for acquiring a loss function according to the target vector until training is completed so as to acquire a text model.
In an optional implementation manner, the second obtaining module is further configured to extract, from a preset dictionary, an original text, where the original text includes: the entry, the paraphrasing and example sentences corresponding to each entry;
generating a vocabulary entry prediction sample set according to the original text and the vocabulary entry prediction rule;
and generating a comparison learning sample set according to the original text and the comparison learning rule.
In an optional implementation manner, the second obtaining module is further configured to substitute the vocabulary entry prediction sample set into an initial dictionary enhancement language feature model for training to obtain a predicted vocabulary entry;
and calculating and acquiring a first loss parameter according to the first loss function and the predicted entry.
In an optional implementation manner, the second obtaining module is further configured to substitute the comparison learning sample set into an initial dictionary enhancement language feature model for training, and calculate and obtain a second loss parameter according to a second loss function.
In an optional implementation manner, the second obtaining module is further configured to reversely optimize parameters of the initial dictionary enhancement language feature model according to the first loss parameter and the second loss parameter, and obtain the dictionary enhancement language feature model after pre-training is completed.
In an optional implementation manner, the second obtaining module is further configured to, after removing stop words and/or invalid words from a text in the training sample set, retrieve target words of the text that appear in a preset entry set as enhancement words;
combine the text in the training sample set with the paraphrases of the corresponding enhancement words to obtain enhanced text;
and update the training sample set by replacing the original samples with the enhanced text.
In an optional implementation manner, the fusion module is further configured to fuse a plurality of the feature vectors with an attention mechanism and obtain a target vector, where the target vector comprises: a sentence representation vector.
In an alternative embodiment, the training module is configured to calculate a third loss parameter according to the target vector and a third loss function;
and optimizing parameters in an initial text model according to the third loss parameters, and acquiring the text model after training, wherein the text processing capacity of the text model corresponds to the text relationship in the training sample set.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method of training a text model as claimed in any of the first aspects.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the training method for a text model according to any of the first aspects.
The beneficial effects of this application are:
the application provides a training method, device, equipment and storage medium of a text model, wherein the method comprises the following steps: acquiring a training sample set; substituting the training sample set into a dictionary enhancement language feature model which is finished by pre-training to obtain a plurality of feature vectors; fusing a plurality of feature vectors to obtain a target vector; and acquiring a loss function according to the target vector until training is completed, so as to acquire a text model. According to the method, the training sample set is substituted into the pre-trained dictionary enhancement language feature model, the dictionary enhancement language feature model is introduced on one hand, and the expression capacity of the model is enhanced through entry prediction training and contrast learning training on the other hand, so that the accuracy of the acquired text model on text processing can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a training method of a text model according to an embodiment of the present application;
FIG. 2 is a flowchart of another training method for a text model according to an embodiment of the present application;
FIG. 3 is a flowchart of another training method for a text model according to an embodiment of the present application;
FIG. 4 is a flowchart of another training method for a text model according to an embodiment of the present application;
FIG. 5 is a flowchart of another training method for a text model according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a text model call provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of a training device for a text model according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the accompanying drawings in the present application are only for the purpose of illustration and description, and are not intended to limit the protection scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this application, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to the flow diagrams and one or more operations may be removed from the flow diagrams as directed by those skilled in the art.
In addition, the described embodiments are only some, but not all, of the embodiments of the present application. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that the term "comprising" will be used in the embodiments of the present application to indicate the presence of the features stated hereinafter, but not to exclude the addition of other features.
With the development of artificial intelligence, the technology of natural language processing is applied to various text processing fields, such as obtaining text similarity, classifying text, screening text, and the like, which are not limited herein. For different text processing, the text model with corresponding capability can be adopted to operate, for example, the text model trained by different text samples can obtain different processing results.
However, current model training is often too simplistic and the resulting models are not accurate enough in the results they produce; in the present application, model training is therefore performed in a dictionary-enhanced manner to improve the accuracy of the model.
The method in the embodiments of the present application may be performed by a computer, a processor, a server, or another device having the capability of model training, model processing, or the like, which is not limited herein.
Fig. 1 is a flow chart of a training method of a text model according to an embodiment of the present application.
As shown in fig. 1, the method includes:
s101, acquiring a training sample set.
Optionally, a training sample set is obtained by the device, where the training sample set may include labeled text samples. In the embodiments of the present application, the text samples may be collected and labeled according to the function the trained text model actually needs to provide; for example, if a text model with a classification function is to be trained, the text samples in the training sample set are labeled with the corresponding classes, which is not limited herein. It should be noted that the number of training samples in the training sample set is not limited herein and may be set flexibly according to the actual application scenario.
S102, substituting the training sample set into the dictionary enhanced language feature model after pre-training to obtain a plurality of feature vectors.
Optionally, a pre-trained dictionary-enhanced language feature model (Dictionary enhanced Bert, abbreviated as DeBert) is pre-stored in the device, wherein the dictionary-enhanced language feature model is obtained through entry prediction training and contrastive learning training: the entry prediction training trains the model to predict masked entries in example sentences, and the contrastive learning training trains the model on the relationships among entry meanings.
For example, text in the training sample set may be input to the pre-trained dictionary-enhanced language feature model in the form of batches of sentence pairs, where a batch of sentence pairs may consist only of positive samples and need not contain explicit negative samples.
Taking the training of a text similarity model as an example (that is, the trained text model can perform text similarity analysis), in this embodiment of the present application a positive sample pair may have the format <a, a⁺>, where sentence a⁺ is the positive example of sentence a, that is, a⁺ is similar to a. For a given sample, the other sample pairs within the same batch serve as its negative samples; for example, if a batch contains the three pairs <a, a⁺>, <b, b⁺> and <c, c⁺>, then for sentence a, the sentences b, b⁺, c and c⁺ are all negative examples of a.
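As a minimal illustration of this batch layout (the example sentences and the helper function below are illustrative assumptions, not taken from the original), every sentence of every other pair in the batch can be enumerated as a negative for a given sample:

```python
# A batch of positive sentence pairs <a, a+>; no explicit negatives are stored.
batch = [
    ("How do I reset my password?", "How can I change my password?"),   # <a, a+>
    ("What is the claim deadline?", "When must a claim be filed?"),     # <b, b+>
    ("Is my policy still active?", "Does my policy remain in force?"),  # <c, c+>
]

def in_batch_negatives(batch, index):
    """For the pair at `index`, every sentence of every other pair acts as a negative."""
    negatives = []
    for j, (s, s_pos) in enumerate(batch):
        if j != index:
            negatives.extend([s, s_pos])
    return negatives

print(in_batch_negatives(batch, 0))  # sentences b, b+, c, c+ are negatives of sentence a
```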
S103, fusing the plurality of feature vectors to obtain a target vector.
Alternatively, a vector fusion algorithm may be used to fuse multiple feature vectors to obtain a target vector, where the specific vector fusion algorithm is not limited herein.
S104, acquiring a loss function according to the target vector until training is completed, so as to acquire a text model.
Optionally, training the initial text model according to the target vector, calculating a loss function in the training process, and continuously adjusting the initial text model in training according to the loss function value until the loss function value converges to obtain the text model.
Alternatively, the processing capability of the text model mentioned in the embodiments of the present application depends on the training sample set substituted during training, for example, the text relationship in the training sample set is labeled with similarity between texts, then the trained text model may be used for predicting text similarity, and for example, the text relationship in the training sample set is labeled with classification of texts, then the trained text model may be used for text classification, which is not limited herein.
According to the method, the training sample set is substituted into the pre-trained dictionary-enhanced language feature model: on the one hand, the dictionary-enhanced language feature model is introduced, and on the other hand, the expressive capacity of the model is enhanced through entry prediction training and contrastive learning training, so that the accuracy of the obtained text model in text processing can be improved.
Fig. 2 is a flowchart of another training method for a text model according to an embodiment of the present application, in which pre-training of the dictionary-enhanced language feature model is completed before the training sample set is substituted into the pre-trained model. It should be noted that the dictionary-enhanced language feature model may be, for example, a model obtained by introducing dictionary data from a preset dictionary into the pre-training process of a Bert (Bidirectional Encoder Representations from Transformers) model and combining this with contrastive learning training, where the preset dictionary may be an existing dictionary database such as the Oxford dictionary, which is not limited herein.
As shown in fig. 2, the method includes:
s201, extracting an original text from a preset dictionary.
The original text may include an entry, a paraphrase corresponding to the entry, and an example sentence corresponding to the entry.
Optionally, the preset dictionary records text data entered in advance, which may be entered by the user or taken from an existing dictionary database such as the Oxford dictionary. The preset dictionary generally contains entries, paraphrases, example sentences and other data. The preset dictionary is a computer-readable file, from which a given entry, the paraphrase corresponding to that entry and the example sentences corresponding to that entry can be extracted.
For example, assume that the preset dictionary includes the contents as shown in table 1:
TABLE 1
Entry | Paraphrase | Example sentence
modest | not very large, expensive, important, etc. | He charged a relatively modest fee.
modest | not talking much about your own abilities or possessions | She's very modest about her success.
humble | showing you do not think that you are as important as other people | Be humble enough to learn from your mistakes.
arrogant | behaving in a proud, unpleasant way, showing little thought for other people | He was a rude, arrogant young man.
Taking Table 1 as an example, and taking the extracted entry "modest" as an example, the extracted original text includes the entry (modest), the paraphrase corresponding to the entry (not very large, expensive, important, etc.) and an example sentence corresponding to the entry (He charged a relatively modest fee).
It should be noted that, one term generally includes a plurality of definitions, each of which generally corresponds to a plurality of example sentences, and in the example, only randomly selected definitions and example sentences are used. In the embodiment of the application, for the term with multiple meanings, a plurality of definitions and corresponding example sentences can be split correspondingly and regarded as a plurality of samples.
S202, generating a vocabulary entry prediction sample set according to the original text and the vocabulary entry prediction rule.
The term prediction sample set may include an input text sample and a term prediction value.
For example, according to the entry prediction rule, the entry in an example sentence of the original text is replaced by a [MASK] token, turning it into an example sentence containing [MASK]; this example sentence and the entry's paraphrase are then spliced into an input text sample of the form "[CLS] paraphrase [SEP] example sentence with [MASK] [SEP]". Optionally, the input text sample is denoted X and the entry predicted value is denoted Y.
Illustratively, taking the term "model" as an example, the term prediction sample set generated in the embodiment of the present application may be as shown in table 2:
TABLE 2
Input text sample X | Entry predicted value Y
[CLS] not very large, expensive, important, etc. [SEP] He charged a relatively [MASK] fee [SEP] | modest
Where [ CLS ] represents a start symbol of the input text sample X, [ SEP ] represents an end symbol of each sentence, [ MASK ] represents a mark replacing a corresponding entry position of an example sentence in the original text.
It can be understood that the term prediction rules mentioned in the embodiments of the present application may be flexibly set according to actual application scenarios, which are not limited herein.
In this embodiment, original text is extracted from a preset dictionary, and an entry prediction sample set is then generated according to the original text and the entry prediction rule. Because the database in the preset dictionary is semantically rich, this can enhance the expressive capability of the dictionary-enhanced language feature model.
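A small sketch of how such an entry prediction sample could be assembled (the function and field names are illustrative assumptions, not from the original):

```python
def build_entry_prediction_sample(entry, paraphrase, example_sentence):
    """Replace the entry in its example sentence with [MASK] and splice it with the paraphrase."""
    masked_sentence = example_sentence.replace(entry, "[MASK]")
    x = f"[CLS] {paraphrase} [SEP] {masked_sentence} [SEP]"   # input text sample X
    y = entry                                                 # entry predicted value Y
    return x, y

x, y = build_entry_prediction_sample(
    entry="modest",
    paraphrase="not very large, expensive, important, etc.",
    example_sentence="He charged a relatively modest fee",
)
print(x)  # [CLS] not very large, expensive, important, etc. [SEP] He charged a relatively [MASK] fee [SEP]
print(y)  # modest
```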
S203, a comparison learning sample set is generated according to the original text and the comparison learning rule.
Wherein the contrast learning sample set may include a positive sample pair and a negative sample pair.
In a specific implementation, taking the above preset dictionary as an example, an entry's paraphrase and example sentence are spliced into an input sentence of the form "[CLS] paraphrase [SEP] example sentence [SEP]". The spliced result of an entry and the spliced result of its synonym may be combined into a positive sample pair, while the spliced result of the entry and the spliced result of its antonym may be combined into a negative sample pair. For example, for the entry "modest", whose synonym is "humble" and whose antonym is "arrogant", the corresponding input text samples obtained from the original text according to the contrastive learning rule are x₁, x₂ and x₃ respectively:
x₁: [CLS] not talking much about your own abilities or possessions [SEP] She's very modest about her success [SEP]
x₂: [CLS] showing you do not think that you are as important as other people [SEP] Be humble enough to learn from your mistakes. [SEP]
x₃: [CLS] behaving in a proud, unpleasant way, showing little thought for other people [SEP] He was a rude, arrogant young man. [SEP]
Then x₁ and x₂ are combined into the positive sample pair <x₁, x₂>, and x₁ and x₃ are combined into the negative sample pair <x₁, x₃>.
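A sketch of how these pairs could be constructed from dictionary records (the record layout and the splice helper are illustrative assumptions):

```python
def splice(paraphrase, example_sentence):
    """Form "[CLS] paraphrase [SEP] example sentence [SEP]"."""
    return f"[CLS] {paraphrase} [SEP] {example_sentence} [SEP]"

# Illustrative dictionary records: entry -> (paraphrase, example sentence)
records = {
    "modest":   ("not talking much about your own abilities or possessions",
                 "She's very modest about her success"),
    "humble":   ("showing you do not think that you are as important as other people",
                 "Be humble enough to learn from your mistakes."),
    "arrogant": ("behaving in a proud, unpleasant way, showing little thought for other people",
                 "He was a rude, arrogant young man."),
}

x1 = splice(*records["modest"])    # original entry
x2 = splice(*records["humble"])    # synonym  -> positive pair <x1, x2>
x3 = splice(*records["arrogant"])  # antonym  -> negative pair <x1, x3>
positive_pair, negative_pair = (x1, x2), (x1, x3)
```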
It can be understood that the comparison learning rule is mentioned in the embodiment of the present application, and may be flexibly set according to the actual application scenario, which is not limited herein.
In this embodiment, the contrastive learning sample set is generated by extracting original text from the preset dictionary and then applying the contrastive learning rule to it. The contrastive learning rule further helps to improve the accuracy of the obtained text model in text processing.
After the entry prediction sample set and the contrastive learning sample set are established, entry prediction training and contrastive learning training can be carried out jointly; the two sample sets may be established in either order.
Fig. 3 is a flowchart of another training method for a text model according to an embodiment of the present application, as shown in fig. 3, in step 202, after generating a vocabulary entry prediction sample set according to an original text and a vocabulary entry prediction rule, the training method further includes:
s301, substituting the vocabulary entry prediction sample set into the initial dictionary enhancement language feature model for training, and obtaining the predicted vocabulary entry.
Optionally, the predicted entry is obtained by having the model predict the [MASK] token in the example sentence of the input text sample back to the original entry, combining the paraphrase and the example sentence of the original text, so that the model learns a representation of the entry. Specifically, the input text samples X in the entry prediction sample set are substituted into the initial dictionary-enhanced language feature model for training, and the predicted entries are thereby obtained.
Illustratively, for the above example of the entry "modest", the input text sample X: "[CLS] not very large, expensive, important, etc. [SEP] He charged a relatively [MASK] fee [SEP]" is substituted into the initial dictionary-enhanced language feature model for training, and the predicted entry can be obtained through training.
S302, calculating and obtaining a first loss parameter according to the first loss function and the predicted entry.
Optionally, according to the predicted entry obtained through training, the cross-entropy loss between the predicted entry and the corresponding original entry is calculated using the first loss function to obtain the first loss parameter L₁.
Following the illustration above, the cross-entropy loss is calculated between the obtained predicted entry and the entry predicted value Y that originally corresponds to the input text sample X, yielding the first loss parameter L₁.
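As a rough sketch of this step (using a standard BERT masked-language-model head from the transformers library as a stand-in for the initial dictionary-enhanced model; the model name and the assumption that "modest" is a single wordpiece are illustrative):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Input text sample X: paraphrase [SEP] example sentence with the entry masked out.
x = "not very large, expensive, important, etc. [SEP] He charged a relatively [MASK] fee"
inputs = tokenizer(x, return_tensors="pt")

# Labels: predict the original entry only at the [MASK] position, ignore (-100) elsewhere.
labels = torch.full_like(inputs["input_ids"], -100)
mask_positions = inputs["input_ids"] == tokenizer.mask_token_id
labels[mask_positions] = tokenizer.convert_tokens_to_ids("modest")  # assumes a single wordpiece

outputs = model(**inputs, labels=labels)   # cross-entropy between prediction and original entry
loss_1 = outputs.loss                      # first loss parameter L1
loss_1.backward()
```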
Optionally, in step S203, after generating the contrastive learning sample set according to the original text and the contrastive learning rule, the method further includes: substituting the contrastive learning sample set into the initial dictionary-enhanced language feature model for training, and calculating a second loss parameter according to a second loss function.
Optionally, contrastive learning uses a contrastive loss function to pull positive examples closer together and push negative examples further apart. In this embodiment, the "paraphrase + example sentence" of the original entry and the "paraphrase + example sentence" of its synonym may constitute a positive sample pair, while the "paraphrase + example sentence" of its antonym may constitute a negative sample pair.
Optionally, the positive sample pairs and negative sample pairs in the contrastive learning sample set are respectively substituted into the initial dictionary-enhanced language feature model for training; for example, when a sample pair is input into the model, the [CLS] output vectors of the two texts are obtained and used as their respective sentence-vector representations, and the degree of similarity between the two vectors can be measured through a loss function.
Optionally, the loss function value is calculated from the sample similarities obtained in training to give the second loss parameter L₂. The specific calculation can be expressed by a formula of the following form:

L_2 = -\sum_{w \in D} \log \frac{\exp\big(\mathrm{sim}(h_w, h_w^{+})/\tau\big)}{\exp\big(\mathrm{sim}(h_w, h_w^{+})/\tau\big) + \exp\big(\mathrm{sim}(h_w, h_w^{-})/\tau\big)}

where h_w, h_w^{+} and h_w^{-} are the [CLS] output vectors of the original entry, its synonym and its antonym respectively, D is the entry set, sim(·,·) is cosine similarity, and τ is the temperature coefficient (a hyperparameter).
In this embodiment, by learning to distinguish the relationships among entry meanings through contrastive learning, the sentence representation of the text to be processed can be improved.
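A minimal PyTorch sketch of a loss with this shape (the temperature value and the batched layout over the entry set D are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def second_loss(h_w, h_pos, h_neg, tau=0.05):
    """Contrastive loss over the [CLS] vectors of each entry (h_w), its synonym (h_pos)
    and its antonym (h_neg), averaged over the entry set D (the batch dimension)."""
    sim_pos = F.cosine_similarity(h_w, h_pos, dim=-1) / tau   # pull the synonym closer
    sim_neg = F.cosine_similarity(h_w, h_neg, dim=-1) / tau   # push the antonym away
    logits = torch.stack([sim_pos, sim_neg], dim=-1)
    labels = torch.zeros(h_w.size(0), dtype=torch.long)       # the positive sits at index 0
    return F.cross_entropy(logits, labels)                    # -log softmax form of L2

# toy check: |D| = 4 entries, 768-dimensional [CLS] vectors
L2 = second_loss(torch.randn(4, 768), torch.randn(4, 768), torch.randn(4, 768))
```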
Optionally, the parameters of the initial dictionary-enhanced language feature model are optimized through back propagation according to the first loss parameter L₁ and the second loss parameter L₂, so as to obtain the pre-trained dictionary-enhanced language feature model.
Optionally, the first loss parameter and the second loss parameter are weighted and summed to obtain the overall loss; the specific calculation can be expressed by the following formula:

L = λ₁L₁ + λ₂L₂

where λ₁ and λ₂ are hyperparameters. In this embodiment both hyperparameter values are set to 0.5, meaning that the entry prediction task and the contrastive learning task are equally important, but the present invention is not limited thereto and the values can be adjusted flexibly according to training requirements.
After the loss is calculated, the parameters of the initial dictionary-enhanced language feature model are optimized through back propagation, and the model is trained iteratively over multiple rounds. After training is completed, the pre-trained dictionary-enhanced language feature model is obtained.
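A schematic pre-training loop under these settings (the model, data and the two loss terms below are placeholders; only the weighting and back-propagation pattern is the point):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)                        # placeholder for the initial DeBert model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
lambda_1, lambda_2 = 0.5, 0.5                  # equal task weights, as in this embodiment

for step in range(100):                        # multiple rounds of iterative training
    x = torch.randn(16, 8)                     # placeholder batch
    out = model(x)
    loss_1 = out.pow(2).mean()                 # placeholder for the entry prediction loss L1
    loss_2 = (out - x).pow(2).mean()           # placeholder for the contrastive loss L2
    loss = lambda_1 * loss_1 + lambda_2 * loss_2   # L = λ1·L1 + λ2·L2
    optimizer.zero_grad()
    loss.backward()                            # back-propagate to optimise the model parameters
    optimizer.step()
```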
Fig. 4 is a flow chart of another training method for a text model according to an embodiment of the present application, as shown in fig. 4, before substituting the training sample set into the pre-trained dictionary enhanced language feature model, the method further includes:
s401, after the stop words and/or the invalid words are removed from the texts in the training sample set, searching and obtaining target words in a preset vocabulary entry set where the texts are located, and taking the target words as enhancement words.
Optionally, the enhancement word may perform a replacement operation on an original term in the original text, where the preset term set includes the original term, a paraphrasing word, and an anticonying word.
Alternatively, the rejected stop words and/or invalid words may be words having no specific meaning such as an exclamation word, a preposition word, or the like.
S402, combining the text in the training sample set with the corresponding enhanced word paraphrasing to obtain the enhanced text.
In this embodiment of the application, the paraphrase of an enhancement word is combined with the text in the training sample set in the form "[CLS] paraphrase [SEP] original sentence [SEP]" to obtain the enhanced text; when a word has multiple meanings, the text is combined with each of the different paraphrases separately, producing several enhanced texts.
For example, for a sentence text S, suppose the retrieved enhancement words are w₁, w₂ and w₃, where w₁ and w₃ have the definitions def1 and def3 respectively, and w₂ has two definitions def2_1 and def2_2. The resulting enhanced texts are then:
S1: [CLS] def1 [SEP] original sentence S [SEP]
S2: [CLS] def2_1 [SEP] original sentence S [SEP]
S3: [CLS] def2_2 [SEP] original sentence S [SEP]
S4: [CLS] def3 [SEP] original sentence S [SEP]
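A small sketch of this expansion (function and variable names are illustrative; def1, def2_1, def2_2 and def3 stand for the retrieved definitions as in the example above):

```python
def build_enhanced_texts(sentence, enhancement_words, definitions):
    """Combine the sentence with every definition of every retrieved enhancement word.
    `definitions` maps an enhancement word to the list of definitions of its senses."""
    enhanced = []
    for word in enhancement_words:
        for definition in definitions.get(word, []):
            enhanced.append(f"[CLS] {definition} [SEP] {sentence} [SEP]")
    return enhanced

# Data matching the example above: w2 has two senses, w1 and w3 one each.
defs = {"w1": ["def1"], "w2": ["def2_1", "def2_2"], "w3": ["def3"]}
print(build_enhanced_texts("original sentence S", ["w1", "w2", "w3"], defs))
# -> the four enhanced texts S1..S4
```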
S403, replacing and updating the training sample set by the enhanced text.
Optionally, when updating, the enhanced text replaces the sample at the corresponding position in the training sample set, so as to form an enhanced training sample set.
In this embodiment, after updating, the enhanced training sample set is substituted into the dictionary enhanced language feature model after pre-training to perform subsequent training, so that the accuracy of text processing can be improved.
Correspondingly, substituting the training sample set into the pre-trained dictionary enhanced language feature model to obtain a plurality of feature vectors, wherein the method comprises the following steps:
substituting the enhanced training sample set into the dictionary enhanced language feature model which is pre-trained to obtain a plurality of feature vectors.
For "one sentence text S", the 4 enhanced texts S1, S2, S3, S4 are input into the pre-trained dictionary enhanced language feature model, respectively, to obtain 4 feature vectors v1, v2, v3, v4.
As an optional embodiment, in step 103, fusing the plurality of feature vectors to obtain the target vector includes:
optionally, the attention mechanism is used to fuse a plurality of feature vectors, and the target vector is obtained, where the target vector includes: the sentence characterizes the vector. Taking the sentence text S as an example, the specific calculation process of obtaining the sentence vector v may refer to the following formula:
u i =tanh(W att v i +b att )
Figure BDA0004099797320000131
Figure BDA0004099797320000132
wherein w is att 、b att And u att Parameters that need to be learned for the attention mechanism, u i 、α i Intermediate meters each of oIn the course of the calculation process,
Figure BDA0004099797320000133
for the calculated intermediate vector u i Transposed of alpha i V is i Is defined by the weights of the sentence vectors v i Is obtained by a weighted sum of (a) and (b).
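A minimal PyTorch module implementing attention pooling of this form (the dimension and initialisation are illustrative assumptions):

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse the feature vectors v_i of the enhanced texts into one sentence vector v."""
    def __init__(self, dim):
        super().__init__()
        self.W_att = nn.Linear(dim, dim)             # realises W_att v_i + b_att
        self.u_att = nn.Parameter(torch.randn(dim))  # context vector u_att

    def forward(self, v):                            # v: (num_enhanced_texts, dim)
        u = torch.tanh(self.W_att(v))                # u_i = tanh(W_att v_i + b_att)
        alpha = torch.softmax(u @ self.u_att, dim=0) # α_i from softmax over u_iᵀ u_att
        return (alpha.unsqueeze(-1) * v).sum(dim=0)  # v = Σ_i α_i v_i

fusion = AttentionFusion(dim=768)
sentence_vector = fusion(torch.randn(4, 768))        # e.g. the 4 feature vectors v1..v4
```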
On the basis of the above embodiment, the text model is obtained by further training with the target vector.
Fig. 5 is a flow chart of another training method for a text model according to an embodiment of the present application, as shown in fig. 5, the obtaining a loss function according to the target vector until training is completed, so as to obtain the text model, including:
s501, calculating a third loss parameter L according to the target vector and the third loss function 3
Taking the sentence text S as an example, calculating InfoNCE loss, i.e. a third loss parameter L, in a batch (batch) sentence according to sentence vectors of each sentence obtained by the operation 3 . For example, for the ith sample pair<a,a + >,o i The sentence vector being a is a set of sentence vectors,
Figure BDA0004099797320000134
for its positive example a + The third penalty function is defined as follows:
Figure BDA0004099797320000135
where N is the number of samples in a batch (batch) sentence, o j And
Figure BDA0004099797320000136
the value of the penalty of one batch (batch) sentence, i.e. the third penalty parameter L, is determined by the j-th sample sentence vector and its positive example sentence vector in the batch (batch) sentence 3 The method comprises the following steps:
Figure BDA0004099797320000141
s502, optimizing parameters in the initial text model according to the third loss parameters, and acquiring a trained text model, wherein the text processing capacity of the text model corresponds to the text relation in the training sample set.
Optionally, after the third loss parameter is obtained by calculation, the parameter in the initial text model is optimized by back propagation, so that a trained text model is obtained.
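A PyTorch sketch of an in-batch InfoNCE loss of this form (the temperature value and vector dimension are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def info_nce(o, o_pos, tau=0.05):
    """In-batch InfoNCE: o[i], o_pos[i] are the sentence vectors of the pair <a_i, a_i+>;
    the positives of all other samples in the batch act as negatives for sample i."""
    o = F.normalize(o, dim=-1)                 # cosine similarity via normalised dot product
    o_pos = F.normalize(o_pos, dim=-1)
    sim = o @ o_pos.T / tau                    # sim(o_i, o_j+) for every i, j
    labels = torch.arange(o.size(0))           # the matching positive sits on the diagonal
    return F.cross_entropy(sim, labels)        # mean over the batch of the per-sample losses

o = torch.randn(8, 768, requires_grad=True)    # sentence vectors of a_1 .. a_N
o_pos = torch.randn(8, 768)                    # sentence vectors of a_1+ .. a_N+
L3 = info_nce(o, o_pos)
L3.backward()                                  # back-propagate to optimise the initial text model
```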
In an alternative embodiment, after the training of the text model is completed, the text may be processed by calling the text model, and the following examples will further describe the method for calling the text model.
Fig. 6 is a schematic flow chart of calling a text model according to an embodiment of the present application, as shown in fig. 6. The method specifically comprises the following steps:
s601, acquiring sentence text to be processed.
In this embodiment, the sentence text to be processed is determined by the specific function of the text model. Taking text similarity as an example, the sentence text to be processed may be the sentence texts whose similarity is to be obtained.
Wherein the sentence text is input in the form of sentence pairs.
S602, retrieving enhancement words in a preset vocabulary entry set of the sentence text to be processed and constructing an enhancement sentence text.
The specific way of constructing the enhanced sentence text is the same as that in the foregoing model training process, and will not be described here again.
S603, substituting the enhanced sentence text into a dictionary enhanced language feature model to obtain a plurality of feature vectors.
S604, fusing a plurality of feature vectors by adopting an attention mechanism, and outputting sentence vectors.
Alternatively, the sentence vector is output by the trained text model using an attention mechanism.
S605, processing the sentence vector to obtain a text processing result.
Optionally, taking text similarity as an example, the text model may compute cosine similarity to represent the similarity of the sentence texts to be processed; that is, the text processing result is the similarity of the sentence texts.
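A sketch of this final comparison step (the stand-in encoder below is only illustrative; in practice it would be the trained text model producing the fused sentence vectors):

```python
import torch
import torch.nn.functional as F

def similarity(encode, sentence_a, sentence_b):
    """Encode both sentences of the pair into sentence vectors and compare them."""
    v_a, v_b = encode(sentence_a), encode(sentence_b)
    return F.cosine_similarity(v_a, v_b, dim=-1).item()

# Toy stand-in for the trained text model: it should return the fused sentence vector.
toy_encode = lambda s: torch.randn(768)

score = similarity(toy_encode, "Is my policy still active?", "Does my policy remain in force?")
print(score)   # cosine similarity returned as the text processing result
```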
Fig. 7 is a schematic structural diagram of a training device for text models provided in an embodiment of the present application, and as shown in fig. 7, the device may include:
a first obtaining module 701, configured to obtain a training sample set, where the training sample set includes: labeled text samples;
a second obtaining module 702, configured to substitute the training sample set into a pre-trained dictionary-enhanced language feature model to obtain a plurality of feature vectors; the dictionary-enhanced language feature model is obtained through entry prediction training and contrastive learning training, wherein the entry prediction training trains the model to predict masked entries in example sentences, and the contrastive learning training trains the model on the relationships among entry meanings;
a fusion module 703, configured to fuse a plurality of feature vectors to obtain a target vector;
training module 704, configured to obtain a loss function according to the target vector until training is completed, so as to obtain a text model.
In an alternative embodiment, the second obtaining module 702 is further configured to extract, from a preset dictionary, an original text, where the original text includes: the vocabulary entries, the paraphrases and the example sentences corresponding to each vocabulary entry;
generating a vocabulary entry prediction sample set according to the original text and the vocabulary entry prediction rule;
and generating a comparison learning sample set according to the original text and the comparison learning rule.
In an alternative embodiment, the second obtaining module 702 is further configured to substitute the vocabulary entry prediction sample set into the initial dictionary enhancement language feature model for training to obtain the predicted vocabulary entry;
and calculating and acquiring a first loss parameter according to the first loss function and the predicted entry.
In an alternative embodiment, the second obtaining module 702 is further configured to substitute the comparison learning sample set into the initial dictionary enhancement language feature model for training, and calculate and obtain the second loss parameter according to the second loss function.
In an alternative embodiment, the second obtaining module 702 is further configured to reversely optimize the parameters of the initial dictionary enhancement language feature model according to the first loss parameter and the second loss parameter, and obtain the pre-trained dictionary enhancement language feature model.
In an optional implementation manner, the second obtaining module 702 is further configured to, after removing stop words and/or invalid words from a text in the training sample set, retrieve target words of the text that appear in a preset entry set as enhancement words;
combine the text in the training sample set with the paraphrases of the corresponding enhancement words to obtain enhanced text;
and update the training sample set by replacing the original samples with the enhanced text.
In an alternative embodiment, the fusion module 703 is further configured to fuse a plurality of feature vectors with an attention mechanism to obtain a target vector, where the target vector includes: a sentence representation vector.
In an alternative embodiment, training module 704 is configured to calculate a third loss parameter based on the target vector and the third loss function;
and optimizing parameters in the initial text model according to the third loss parameters, and acquiring a trained text model, wherein the text processing capacity of the text model corresponds to the text relationship in the training sample set.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application, as shown in fig. 8, the electronic device may include: the training system comprises a processor 801, a memory 802 and a bus, wherein the memory 802 stores machine-readable instructions executable by the processor 801, and when the electronic device is running, the machine-readable instructions are executed, the processor 801 and the memory 802 communicate through the bus, and the processor 801 is used for executing the steps of the training method of the text model in the embodiment.
The memory 802, the processor 801 and the bus are electrically connected to one another, directly or indirectly, to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The electronic device includes at least one software functional module that may be stored in the memory 802 in the form of software or firmware or solidified in the operating system (OS) of the electronic device. The processor 801 is configured to execute executable modules stored in the memory 802, such as the software functional modules and computer programs comprised by the training method of a text model.
The Memory 802 may be, but is not limited to, a random access Memory (Random Access Memory, RAM), a Read Only Memory (ROM), a programmable Read Only Memory (Programmable Read-Only Memory, PROM), an erasable Read Only Memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable Read Only Memory (Electric Erasable Programmable Read-Only Memory, EEPROM), etc.
Optionally, an embodiment of the present application further provides a computer readable storage medium, where a computer program is stored, where the computer program, when executed by a processor, performs the steps of the training method of the text model in any of the foregoing embodiments. The specific implementation manner and the technical effect are similar, and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units implemented in the form of software functional units described above may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (english: processor) to perform part of the steps of the methods of the embodiments of the invention. And the aforementioned storage medium includes: u disk, mobile hard disk, read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), magnetic disk or optical disk, etc.
The foregoing is merely illustrative of embodiments of the present invention, and the present invention is not limited thereto, and any changes or substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and the present invention is intended to be covered by the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A method for training a text model, comprising:
obtaining a training sample set, the training sample set comprising: labeled text samples;
substituting the training sample set into a pre-trained dictionary-enhanced language feature model to obtain a plurality of feature vectors; the dictionary-enhanced language feature model is obtained through entry prediction training and contrastive learning training, wherein the entry prediction training trains the model to predict masked entries in example sentences, and the contrastive learning training trains the model on the relationships among entry meanings;
fusing a plurality of the feature vectors to obtain a target vector;
and acquiring a loss function according to the target vector until training is completed, so as to acquire a text model.
2. The method according to claim 1, wherein the method further comprises:
extracting an original text from a preset dictionary, wherein the original text comprises: the entry, the paraphrasing and example sentences corresponding to each entry;
generating a vocabulary entry prediction sample set according to the original text and the vocabulary entry prediction rule;
and generating a comparison learning sample set according to the original text and the comparison learning rule.
3. The method of claim 2, wherein after generating a set of vocabulary prediction samples according to the original text and vocabulary prediction rules, further comprising:
substituting the vocabulary entry prediction sample set into an initial dictionary enhanced language feature model for training to obtain a predicted vocabulary entry;
and calculating and acquiring a first loss parameter according to the first loss function and the predicted entry.
4. The method of claim 3, wherein after generating a comparison learning sample set according to the original text and a comparison learning rule, further comprising:
substituting the comparison learning sample set into the initial dictionary enhanced language feature model for training, and calculating and obtaining a second loss parameter according to a second loss function.
5. The method according to claim 4, wherein the method further comprises:
and reversely optimizing parameters of the initial dictionary enhancement language feature model according to the first loss parameters and the second loss parameters to obtain the pre-trained dictionary enhancement language feature model.
6. The method of claim 1, wherein substituting the training sample set into the pre-trained lexicon enhanced language feature model is preceded by:
after eliminating stop words and/or invalid words from the texts in the training sample set, searching and obtaining target words in a preset vocabulary entry set where the texts are located as enhancement words;
combining the text in the training sample set with the corresponding enhanced word paraphrasing to obtain enhanced text;
and updating the training sample set by using the enhanced text replacement.
7. The method of claim 6, wherein fusing the plurality of feature vectors to obtain a target vector comprises:
fusing a plurality of feature vectors by adopting an attention mechanism to obtain a target vector, wherein the target vector comprises: a sentence representation vector.
8. The method of claim 7, wherein said obtaining a loss function from said target vector until training is completed to obtain a text model comprises:
calculating a third loss parameter according to the target vector and a third loss function;
and optimizing parameters in an initial text model according to the third loss parameters, and acquiring the text model after training, wherein the text processing capacity of the text model corresponds to the text relationship in the training sample set.
9. A training device for a text model, comprising:
a first acquisition module for acquiring a training sample set, the training sample set comprising: labeled text samples;
the second acquisition module is used for substituting the training sample set into a pre-trained dictionary-enhanced language feature model to acquire a plurality of feature vectors; the dictionary-enhanced language feature model is obtained through entry prediction training and contrastive learning training, wherein the entry prediction training trains the model to predict masked entries in example sentences, and the contrastive learning training trains the model on the relationships among entry meanings;
the fusion module is used for fusing the plurality of feature vectors to obtain a target vector;
and the training module is used for acquiring a loss function according to the target vector until training is completed so as to acquire a text model.
10. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the training method of the text model according to any of claims 1-7.
CN202310172601.XA 2023-02-18 2023-02-18 Training method, device and equipment for text model Pending CN116150621A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310172601.XA CN116150621A (en) 2023-02-18 2023-02-18 Training method, device and equipment for text model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310172601.XA CN116150621A (en) 2023-02-18 2023-02-18 Training method, device and equipment for text model

Publications (1)

Publication Number Publication Date
CN116150621A true CN116150621A (en) 2023-05-23

Family

ID=86359850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310172601.XA Pending CN116150621A (en) 2023-02-18 2023-02-18 Training method, device and equipment for text model

Country Status (1)

Country Link
CN (1) CN116150621A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116595385A (en) * 2023-07-18 2023-08-15 深圳须弥云图空间科技有限公司 Composition generation model training method and device
CN116595385B (en) * 2023-07-18 2023-10-03 深圳须弥云图空间科技有限公司 Composition generation model training method and device
CN117251555A (en) * 2023-11-17 2023-12-19 深圳须弥云图空间科技有限公司 Language generation model training method and device
CN117251555B (en) * 2023-11-17 2024-04-16 深圳须弥云图空间科技有限公司 Language generation model training method and device

Similar Documents

Publication Publication Date Title
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
US20190392066A1 (en) Semantic Analysis-Based Query Result Retrieval for Natural Language Procedural Queries
CN110737758A (en) Method and apparatus for generating a model
CN116150621A (en) Training method, device and equipment for text model
CN114896373B (en) Image-text mutual inspection model training method and device, image-text mutual inspection method and equipment
CN114580382A (en) Text error correction method and device
JP7335300B2 (en) Knowledge pre-trained model training method, apparatus and electronic equipment
CN113672708A (en) Language model training method, question and answer pair generation method, device and equipment
CN114118022A (en) Text representation method and device, electronic equipment and storage medium
CN116882494B (en) Method and device for establishing non-supervision knowledge graph oriented to professional text
CN113705207A (en) Grammar error recognition method and device
CN112633007A (en) Semantic understanding model construction method and device and semantic understanding method and device
CN113392629B (en) Human-term pronoun resolution method based on pre-training model
CN115545030A (en) Entity extraction model training method, entity relation extraction method and device
CN114638239A (en) Machine translation method and system based on knowledge base
Das et al. A visual attention-based model for bengali image captioning
CN111090720B (en) Hot word adding method and device
CN111401070B (en) Word meaning similarity determining method and device, electronic equipment and storage medium
Rizkallah et al. ArSphere: Arabic word vectors embedded in a polar sphere
CN114330290A (en) Language model training method and device
Sathyanarayanan et al. Kannada named entity recognition and classification using bidirectional long short-term memory networks
CN115114433B (en) Language model training method, device, equipment and storage medium
CN115688800A (en) Semantic representation method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination