CN117875266B - Training method and device for text coding model, electronic equipment and storage medium


Info

Publication number
CN117875266B
CN117875266B
Authority
CN
China
Prior art keywords
text
coding
network
mask
target
Prior art date
Legal status
Active
Application number
CN202410269435.XA
Other languages
Chinese (zh)
Other versions
CN117875266A
Inventor
陈春全
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202410269435.XA
Publication of CN117875266A
Application granted
Publication of CN117875266B
Legal status: Active


Abstract

The application relates to the technical field of data processing and can be applied to scenarios such as cloud technology, artificial intelligence, intelligent traffic, and assisted driving. It discloses a training method and device for a text coding model, an electronic device, and a storage medium. The training method comprises: performing multiple rounds of iterative training on a constructed coding guidance network and a pre-trained target coding network, and constructing a text coding model based on the trained target coding network. In each round of iterative training, the coding guidance network is assisted by the text coding vector that the target coding network produces for a sample text to restore the masked content in that sample text, and parameters are adjusted according to the difference between the mask prediction result and the corresponding sample mask result. In this way, the target coding network is driven to learn, during training, to extract high-quality text coding vectors for sample texts, thereby training the text representation capability of the target coding network.

Description

Training method and device for text coding model, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a training method and apparatus for a text coding model, an electronic device, and a storage medium.
Background
In the prior art, when selecting similar texts, coding vectors are usually obtained for the text contents by means of a text coding model, and the text contents with a similarity relationship are then determined by calculating the similarity between the coding vectors.
At present, a text coding model is generally trained on masked sample texts, so that the model learns to predict the word position content at each mask position and thereby produces coding vectors for the text content.
However, because this existing training approach focuses on restoring the word position content of individual word positions, the trained text coding model does not acquire good text representation capability: the extracted coding vectors cannot represent the semantic information of the text content as a whole, and comparison of similar texts cannot be realized effectively based on the extracted coding vectors.
Disclosure of Invention
The embodiments of the application provide a training method for a text coding model, which is used to improve the training effect of the text coding model and to guide the text coding model to learn to represent the semantics of text content as a whole.
In a first aspect, a training method for a text coding model is provided, including:
Performing multiple rounds of iterative training on a constructed coding guidance network and a pre-trained target coding network by using preset sample texts, and constructing a text coding model based on the trained target coding network, wherein the following operations are performed in each round of iterative training:
Reading a sample text, and performing random mask processing on the word position contents in the sample text according to a preset target mask proportion to obtain a mask text together with the associated mask positions and sample mask results;
Obtaining a text coding vector for the sample text by using the target coding network, and using the coding guidance network, under the guidance of the text coding vector, to determine for each mask position in the mask text a mask prediction result among the candidate words covered by a preset word list;
Adjusting network parameters of the target coding network and the coding guidance network based on the result difference between the mask prediction result and the sample mask result.
In a second aspect, a training device for a text coding model is provided, including:
a first training unit, configured to perform multiple rounds of iterative training on a constructed coding guidance network and a pre-trained target coding network by using preset sample texts, and to construct a text coding model based on the trained target coding network, wherein the following operations are performed in each round of iterative training:
Reading a sample text, and performing random mask processing on the word position contents in the sample text according to a preset target mask proportion to obtain a mask text together with the associated mask positions and sample mask results;
Obtaining a text coding vector for the sample text by using the target coding network, and using the coding guidance network, under the guidance of the text coding vector, to determine for each mask position in the mask text a mask prediction result among the candidate words covered by a preset word list;
Adjusting network parameters of the target coding network and the coding guidance network based on the result difference between the mask prediction result and the sample mask result.
Optionally, the device further includes a second training unit, where the target coding network is obtained by training by the second training unit in the following manner:
Performing multiple rounds of iterative training on a constructed initial coding network and initial prediction network by using training texts, and taking the trained initial coding network as the target coding network, wherein the following operations are performed in each round of iterative training:
Reading a training text, and performing random mask processing on the word position contents in the training text according to a preset initial mask proportion to obtain an initial mask text together with the associated mask positions and sample mask contents, the initial mask proportion being lower than the target mask proportion;
Using the initial coding network to output a coding vector for the word position content of each word position in the initial mask text, using the initial prediction network to determine mask prediction content among the candidate words covered by a preset word list based on the coding vector at each mask position, and adjusting network parameters of the initial coding network and the initial prediction network based on the content difference between the mask prediction content and the sample mask content.
Optionally, the sample texts are obtained by the first training unit in the following manner:
collecting initial texts from the text data of various service types;
Unifying the character coding forms of the initial texts and deleting non-text contents from the initial texts to obtain processed candidate texts;
Deleting repeated texts from the candidate texts, and deleting abnormal texts containing preset offending keywords, to obtain the sample texts.
Optionally, when the repeated text is deleted, the first training unit is configured to:
computing a text hash value for each candidate text by using a preset hash function;
Determining the repeated texts among the candidate texts according to the text hash values, and deleting the repeated texts from the candidate texts.
Optionally, when obtaining a text coding vector for the sample text by using the target coding network, the first training unit is configured to:
add a head word position content and a tail word position content to the sample text, and construct an initial coding result for the word position content of each word position in the processed sample text;
Use the target coding network to obtain, for the initial coding result of each word position, the coding vector generated under the influence of the initial coding results of the other word positions by means of a multi-head self-attention mechanism and nonlinear transformation;
Determine the coding vector corresponding to the head word position content as the text coding vector of the sample text.
Optionally, when constructing the initial coding results for the word position contents of the word positions in the processed sample text, the first training unit is configured to:
perform word segmentation and coding processing on the processed sample text to obtain a content coding result for the word position content of each word position, and obtain a position coding result for each word position;
Superpose the content coding result and the position coding result of the same word position to obtain the initial coding result of each word position.
Optionally, when using the coding guidance network, under the guidance of the text coding vector, to determine for each mask position in the mask text a mask prediction result among the candidate words covered by a preset word list, the first training unit is configured to:
Construct an initial coding result for the word position content of each word position in the mask text after the head word position content and the tail word position content are added, and replace the initial coding result corresponding to the head word position content of the mask text with the text coding vector;
Use the coding guidance network to obtain, for the initial coding result of each word position in the mask text, the coding vector generated under the influence of the initial coding results of the other word positions by means of a multi-head self-attention mechanism and nonlinear transformation, and determine the corresponding mask prediction result among the candidate words covered by the preset word list based on the coding vector at each mask position.
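A minimal sketch of this substitution step, for illustration only (the guidance network is assumed to be a callable returning per-position prediction probabilities over the word list; all names are hypothetical):

```python
import torch

def guide_predict(guidance_net, mask_initial_codings, text_vec, mask_positions):
    # mask_initial_codings: (batch, seq_len, d_model) initial coding results built
    # from the mask text (head and tail word position contents already added).
    x = mask_initial_codings.clone()
    x[:, 0, :] = text_vec                 # replace the head-position result with the text coding vector
    probs = guidance_net(x)               # (batch, seq_len, vocab_size) prediction probabilities
    return probs[:, mask_positions, :]    # mask prediction results at the mask positions
```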
Optionally, after the text coding model is obtained by training, the device further includes a computing unit, where the computing unit is configured to:
Acquiring a target question text input by a target object, and acquiring pre-stored candidate question texts;
Outputting target coding vectors corresponding to the target problem texts and outputting candidate coding vectors corresponding to the candidate problem texts by adopting the text coding model;
And calculating the vector similarity between the target coding vector and each candidate coding vector, determining a reference question text similar to the target question text in the candidate question texts based on the vector similarity, and feeding back reply content associated with the reference question text to the target object.
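A minimal sketch of this matching step (the `encode` callable, standing in for the trained text coding model, and the answer lookup are illustrative assumptions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Vector similarity between two coding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_question(target_text, candidate_texts, candidate_answers, encode):
    target_vec = encode(target_text)                       # target coding vector
    sims = [cosine_similarity(target_vec, encode(t)) for t in candidate_texts]
    best = int(np.argmax(sims))                            # most similar reference question
    return candidate_texts[best], candidate_answers[best]  # reply content to feed back
```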
Optionally, after the text coding model is obtained by training, the device further includes a computing unit, where the computing unit is configured to:
acquiring target texts browsed by target objects, and acquiring candidate texts to be recommended;
outputting target coding vectors corresponding to the target texts and outputting candidate coding vectors corresponding to the candidate texts by adopting the text coding model;
And calculating the vector similarity between the target coding vector and each candidate coding vector, and determining a text to be recommended similar to the target text in each candidate text based on the vector similarity.
In a third aspect, an electronic device is presented comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above method when executing the computer program.
In a fourth aspect, a computer readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, implements the above method.
In a fifth aspect, a computer program product is proposed, comprising a computer program which, when executed by a processor, implements the above method.
The application has the following beneficial effects:
The application provides a training method and device for a text coding model, an electronic device, and a storage medium. Specifically, in the process of training the text coding model, multiple rounds of iterative training are performed on a coding guidance network and a pre-trained target coding network, and the text coding model is finally constructed based on the trained target coding network; that is, the coding guidance network and the target coding network are trained synchronously, with the coding guidance network guiding the learning of the target coding network.
In a single round of iterative training, random mask processing is performed on the word position contents in a sample text according to a preset target mask proportion to obtain a mask text together with the associated mask positions and sample mask results; the target coding network is used to obtain a text coding vector for the sample text; the coding guidance network, under the guidance of the text coding vector, determines for each mask position in the mask text a mask prediction result among the candidate words covered by a preset word list; and the network parameters of the coding guidance network and the target coding network are then adjusted based on the difference between the mask prediction result and the sample mask result at the same mask position. In this way, the iterative training forces the text coding vector produced by the target coding network to capture the word order information in the sample text and to characterize the semantic information of the sample text as a whole, which drives the target coding network to learn to extract high-quality text coding vectors for sample texts, realizes the training of the text representation capability of the target coding network, and ensures the training effect for the target coding network.
Drawings
Fig. 1 is a schematic diagram of a possible application scenario in an embodiment of the present application;
FIG. 2 is a schematic diagram of a network to be trained involved in training a text encoding model in an embodiment of the present application;
FIG. 3A is a schematic diagram of a network constructed for training a target coding network in an embodiment of the present application;
FIG. 3B is a schematic diagram of a training process for an initial coding network and an initial prediction network according to an embodiment of the present application;
FIG. 3C is a schematic diagram of a process of obtaining mask text corresponding to a training text according to an embodiment of the present application;
FIG. 3D is a schematic diagram of a training process according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a training process for a target coding network and a code guidance network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a process of two-network co-training in an embodiment of the present application;
FIG. 6 is a schematic diagram of a process for calculating similarity between problem texts according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a process for implementing text similarity determination in an embodiment of the present application;
FIG. 8 is a schematic diagram of a training device for text encoding models according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a hardware composition structure of an electronic device to which the embodiment of the present application is applied;
Fig. 10 is a schematic diagram of a hardware composition structure of another electronic device to which the embodiment of the present application is applied.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, based on the embodiments described in the present document, which can be obtained by a person skilled in the art without any creative effort, are within the scope of protection of the technical solutions of the present application.
The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be capable of operation in sequences other than those illustrated or otherwise described.
In the present embodiment, the term "module" or "unit" refers to a computer program or a part of a computer program having a predetermined function and working together with other relevant parts to achieve a predetermined object, and may be implemented in whole or in part by using software, hardware (such as a processing circuit or a memory), or a combination thereof. Also, a processor (or multiple processors or memories) may be used to implement one or more modules or units. Furthermore, each module or unit may be part of an overall module or unit that incorporates the functionality of the module or unit.
Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
Transformer: a deep learning model structure based on an attention mechanism.
Multi-head self-attention mechanism: a neural network structure mainly used to process the associations and dependencies within a text sequence. In the embodiments of the application, the multi-head self-attention layer performs feature coding on the initial features (i.e., the initial coding results) of the word position contents: it computes the dependencies between the initial features of different word positions by mapping them into several sets of query vectors, key vectors, and value vectors, calculates the similarities between the query and key vectors to obtain a weight matrix, and performs a weighted summation over the value vectors according to the weight matrix to obtain the coding vector for the initial feature at each word position.
Cross entropy loss (cross entropy loss): a loss function.
Word frequency-inverse document frequency (term frequency-inverse document frequency, TF-IDF) algorithm: comprises a term frequency (TF) algorithm and an inverse document frequency (IDF) algorithm, where the TF algorithm counts how frequently a word occurs in one document, and the IDF algorithm counts how many documents of a document set the word occurs in. The word frequency-inverse document frequency of a word is obtained by multiplying the TF result calculated for the word by its IDF result.
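Written out (a standard TF-IDF formulation; the text above only specifies the multiplication):

$$\mathrm{tf\text{-}idf}(t,d)=\mathrm{tf}(t,d)\cdot\mathrm{idf}(t),\qquad \mathrm{idf}(t)=\log\frac{N}{\lvert\{d\in D: t\in d\}\rvert}$$

where $\mathrm{tf}(t,d)$ is the frequency of term $t$ in document $d$, $D$ is the document set, and $N=\lvert D\rvert$.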
Artificial Intelligence (AI): a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, pre-training model technology, operation/interaction systems, mechatronics, and the like. A pre-training model, also called a large model or a foundation model, can be widely applied to downstream tasks in all major directions of artificial intelligence after fine-tuning. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
The following briefly describes the design concept of the embodiment of the present application:
In conceiving how to accomplish the task of text similarity calculation, the applicant first considered representing texts as TF-IDF vectors based on a vector space model and then calculating the cosine similarity between the TF-IDF vectors.
However, although this method can highlight important words and capture semantic information to some extent, it cannot capture word order information, so the information represented in the vector is not comprehensive.
The applicant then considered representing each text as a word frequency vector by means of a bag-of-words model and calculating the cosine similarity between the two word frequency vectors.
However, although this approach is simple and easy to implement, it ignores the order of words in the text, which may cause important semantic information to be lost; since it likewise cannot capture word order information, the information represented in the vector is one-sided.
After that, the applicant conceived of determining the similarity between two texts by calculating the edit distance between them, that is, the minimum number of editing operations (insertions, deletions, substitutions) required to convert one text into the other, and judging similarity based on that minimum number.
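A minimal dynamic-programming sketch of this edit-distance (Levenshtein) calculation, for illustration only:

```python
def edit_distance(s: str, t: str) -> int:
    # Minimum number of editing operations (insertion, deletion,
    # substitution) required to convert s into t.
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                          # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j                          # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[m][n]
```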
However, although word order information can be captured to a certain degree, this edit-distance-based approach has low sensitivity to word similarity: it mainly focuses on how well the words in the two texts match while ignoring the semantic similarity between the texts, so the similarity comparison stays at the surface level of the characters and cannot proceed from the perspective of deep semantic information.
Based on this, the applicant concluded that the statistical approaches above can only analyze texts at the word level and cannot extract deeper features; the extraction of text vectors should instead be achieved by training neural network models.
In this case, the applicant considered pre-training a word vector representation model (e.g., Word2Vec, GloVe) to obtain a text coding model that learns to represent each word in a text as a word vector during training, with the average of the word vectors then used as the vector representation of the whole text.
However, although a text coding model trained in this way can capture semantic and grammatical relations between words, it is insensitive to synonyms and polysemous words and only focuses on the vector representations of individual words; moreover, a text vector obtained by averaging word vectors cannot capture the semantic information of the whole text well.
Furthermore, the applicant considered pre-training a language model (e.g., BERT) to obtain a text coding model that learns to encode a text into a vector representation, with cosine similarities then calculated between the coding vectors.
However, a text coding model trained in this way generally takes restoring the word position content at mask positions as its training target; the model focuses on restoring individual, independent words and cannot represent the semantic information of the text content as a whole.
Therefore, under the existing training modes of text coding models, training focuses on restoring the words at individual word positions, so the text coding model obtained by training does not have good text representation capability: the extracted coding vectors cannot represent the semantic information of the text content as a whole, and comparison of similar texts cannot be effectively realized according to the extracted coding vectors.
In view of this, the application provides a training method and device for a text coding model, an electronic device, and a storage medium. Specifically, in the process of training the text coding model, multiple rounds of iterative training are performed on a coding guidance network and a pre-trained target coding network, and the text coding model is finally constructed based on the trained target coding network; that is, the coding guidance network and the target coding network are trained synchronously, with the coding guidance network guiding the learning of the target coding network.
In a single round of iterative training, random mask processing is performed on the word position contents in a sample text according to a preset target mask proportion to obtain a mask text together with the associated mask positions and sample mask results; the target coding network is used to obtain a text coding vector for the sample text; the coding guidance network, under the guidance of the text coding vector, determines for each mask position in the mask text a mask prediction result among the candidate words covered by a preset word list; and the network parameters of the coding guidance network and the target coding network are then adjusted based on the difference between the mask prediction result and the sample mask result at the same mask position. In this way, the iterative training forces the text coding vector produced by the target coding network to capture the word order information in the sample text and to characterize the semantic information of the sample text as a whole, which drives the target coding network to learn to extract high-quality text coding vectors for sample texts, realizes the training of the text representation capability of the target coding network, and ensures the training effect for the target coding network.
Furthermore, after the text coding model is constructed based on the trained target coding network, the coding vectors extracted by the text coding model can represent the semantics of the text content as a whole.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and not for limitation of the present application, and that the embodiments of the present application and the features of the embodiments may be combined with each other without conflict.
Fig. 1 is a schematic diagram of a possible application scenario in an embodiment of the present application. The application scenario diagram includes a client device 110, and a processing device 120.
In a possible embodiment of the application, the processing device 120 obtains the target text content sent by a related object on the client device 110 and determines, in a specific service scenario, the range of text contents against which similarity needs to be determined for the target text content. It then uses the trained text coding model to obtain a target coding vector for the target text content, and uses the same model to obtain a candidate coding vector for each candidate text content within that range. Finally, it calculates the vector similarity between each candidate coding vector and the target coding vector, and determines, according to the vector similarities, the text contents among the candidates that meet the similarity requirement with respect to the target text content.
It should be noted that, the target text content sent by the related object on the client device 110 may be initiated by any one of an applet application, a client application, and a web application, which is not particularly limited by the present application.
Client devices 110 include, but are not limited to, cell phones, tablet computers, notebooks, electronic book readers, intelligent voice interaction devices, intelligent appliances, vehicle terminals, aircraft, and the like.
The processing device 120 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network, content delivery networks), basic cloud computing services such as big data and artificial intelligent platforms, and the like; in a possible implementation, the processing device may be a terminal device with processing capabilities as desired, such as a tablet computer, a notebook, etc.
The embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent transportation, auxiliary driving and the like.
In the embodiment of the present application, communication between the client device 110 and the processing device 120 may be performed through a wired network or a wireless network. In the following description, the training process of the text encoding model and the processing process performed in accordance with the text encoding model will be described only from the viewpoint of the processing device 120.
The following schematically describes possible application scenarios involving the trained text coding model:
Application scenario 1: matching the answer most relevant to a user question in an intelligent customer service scenario.
In application scenario 1, the processing device obtains the target question text sent by the user on the client device, and determines the candidate question texts associated with reply contents in a maintained knowledge base; it then uses the trained text coding model to obtain a target coding vector for the target question text and a candidate coding vector for each candidate question text.
Further, according to the vector similarity between each candidate coding vector and the target coding vector, the reference question text most similar to the target question text is determined among the candidate question texts, and the reply content associated with that reference question text is fed back to the user's client device as the reply to the target question text, so that the user obtains the reply content most relevant to the target question text.
Application scenario 2: recommending, in a text recommendation scenario, other texts similar to the text a user is browsing.
In application scenario 2, the processing device obtains the target text the user is interested in and the candidate texts to be recommended; it then uses the trained text coding model to obtain a target coding vector for the target text and a candidate coding vector for each candidate text.
Further, according to the vector similarity between each candidate coding vector and the target coding vector, the texts to be recommended that are similar to the target text are determined among the candidate texts, and the determined texts are pushed to the user's client device.
Application scenario 3: determining, in a content classification scenario, the classification result for a text.
In application scenario 3, the processing device maintains corresponding reference texts for different content classifications; after obtaining the text to be classified sent by the user on the client device, it uses the trained text coding model to obtain a target coding vector for the text to be classified and a candidate coding vector for each reference text.
The reference text most similar to the text to be classified is then determined according to the vector similarity between each candidate coding vector and the target coding vector, and the content classification of that most similar reference text is taken as the classification result of the text to be classified.
In addition, it should be understood that in the specific embodiment of the present application, the training of the text coding model and the calculation of the text similarity are involved, and when the embodiments described in the present application are applied to specific products or technologies, the collection, use and processing of the relevant data need to comply with relevant laws and regulations and standards of relevant countries and regions.
The training process of the text coding model is described below from the point of view of the processing device with reference to the accompanying drawings:
It should be noted that, in the embodiments of the application, the constructed initial coding network, initial prediction network, and coding guidance network may each be built on the network structure of any one of a Transformer encoder, a Long Short-Term Memory (LSTM) model, or a Recurrent Neural Network (RNN) model. It should be understood that the initial coding network includes N coding blocks and the coding part of the coding guidance network includes H coding blocks, where N is greater than H and may be a set multiple of H; the value of the set multiple is chosen according to actual processing requirements, and the coding blocks in the initial coding network and the coding guidance network may have the same structure. In the following description, the initial coding network, the initial prediction network, and the coding guidance network are built on the network structure of a Transformer encoder only as an example.
Referring to fig. 2, which is a schematic diagram of the network to be trained in training the text coding model according to an embodiment of the application: the network to be trained includes the target coding network, which contains 24 coding blocks, and the coding guidance network, which contains 2 coding blocks, a linear layer, and a softmax layer.
Continuing with fig. 2, each coding block includes two sub-layers: a multi-head self-attention (Multi-head Self-Attention) layer and a feed-forward neural network (Feed-Forward Neural Network) layer. Each sub-layer is followed by residual connection (Residual Connection) and layer normalization (Layer Normalization) processing. The multi-head self-attention layer calculates the degree of association between each word position content and the other word position contents in the input sequence, thereby capturing long-distance dependencies within a sentence while attending to information at different word positions by means of the multi-head mechanism. The feed-forward neural network layer generally comprises two linear transformations (fully connected layers) and an activation function, and further processes the coding result of each word position calculated by the multi-head self-attention mechanism to extract richer semantic information and better capture the global dependencies of the input sequence.
In addition, as illustrated in fig. 2, the superposition marks denote residual connections, meaning that the input is added directly to the output after processing, which effectively alleviates gradient vanishing; layer normalization is likewise used to alleviate the gradient vanishing problem; the linear layer performs dimension conversion on the input coding vector; and the softmax layer scales the numbers in the coding vector into the probability range of 0-1, outputting, for each mask position of the mask text, a prediction probability for every candidate word covered by the preset word list.
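A minimal PyTorch sketch of this layout, for illustration only: the hidden size and head count are assumed hyperparameters; only the block counts (24 and 2) and the linear + softmax head come from the description above.

```python
import torch
import torch.nn as nn

def make_coding_blocks(num_blocks: int, d_model: int = 1024, n_heads: int = 16) -> nn.TransformerEncoder:
    # One coding block = multi-head self-attention + feed-forward sub-layers,
    # each followed by a residual connection and layer normalization.
    block = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                       dim_feedforward=4 * d_model, batch_first=True)
    return nn.TransformerEncoder(block, num_layers=num_blocks)

class CodingGuidanceNetwork(nn.Module):
    # 2 coding blocks, then a linear layer for dimension conversion; softmax
    # yields a prediction probability for every candidate word in the word list.
    def __init__(self, vocab_size: int, d_model: int = 1024):
        super().__init__()
        self.blocks = make_coding_blocks(2, d_model)
        self.linear = nn.Linear(d_model, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.linear(self.blocks(x)), dim=-1)

target_coding_network = make_coding_blocks(24)   # 24 coding blocks
```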
It should be understood that, in the embodiment of the present application, before the processing device trains the text coding model, the processing device needs to train to obtain the target coding network, and the process of training to obtain the target coding network is described below with reference to the accompanying drawings:
Referring to fig. 3A, which is a schematic diagram of the network constructed for training the target coding network according to an embodiment of the application: the constructed network includes the initial coding network, obtained by connecting 24 coding blocks in series, and the initial prediction network, which includes a linear layer and a softmax layer; the functions of the different layers are the same as described above with respect to fig. 2 and are not explained again here.
After constructing the network to be trained for the initial coding network, the processing device uses the training texts to perform multiple rounds of iterative training on the constructed initial coding network and initial prediction network, and takes the trained initial coding network as the target coding network. The convergence condition of the multi-round iterative training may be: the total number of training rounds reaches a first set value, or the number of consecutive rounds in which the calculated model loss value stays below a second set value reaches a preset third set value; the values of the first, second, and third set values are chosen according to actual processing requirements, which the application does not specifically limit.
It should be appreciated that the multi-head self-attention mechanism in the initial coding network may specifically be a bi-directional attention mechanism, which, when processing text data, simultaneously considers the context of each word position content, including the word position contents before and after it. The bi-directional attention mechanism enables the initial coding network to better capture semantic information from the context when interpreting text. In the initial coding network, the coding vector of each word position content is calculated over the entire input text sequence, which enables the network to better understand the meaning of each word position content in context.
Taking a round of iterative training performed in the process of training to obtain the target coding network as an example, the related training process is described below:
In the embodiments of the application, the training texts used by the processing device in training the initial coding network and the initial prediction network may be the same as, or different from, the sample texts used in training the text coding model; the application does not specifically limit this. It should be understood that the processing used to obtain the training texts from collected text data follows the same logic as the processing used to obtain the sample texts, which is described in detail in the later discussion of training the text coding model.
Referring to fig. 3B, a schematic diagram of a training process performed on an initial coding network and an initial prediction network according to an embodiment of the present application is shown, and a training process performed in the training process is described below with reference to fig. 3B:
Step 301: the processing equipment reads the training text, and performs random masking processing on each word position content in the training text according to a preset initial masking proportion to obtain an initial masking text associated with a masking position and sample masking content.
Specifically, the processing device reads the training texts used in one round of training, where the number of training texts used per round, i.e., the value of the batch size (batchsize), is set according to actual training requirements; the application illustrates the processing involved in training only with a batchsize of 1 as an example.
Moreover, when the processing device performs random mask processing on each word position content in the training text, the initial mask proportion is lower than the target mask proportion which is used for generating the mask text in the process of training to obtain the text coding model.
For example, the initial mask proportion may be 15% to 30%, and the target mask proportion may be 50% to 60%.
In the embodiments of the application, in the process of obtaining the initial mask text from a training text, special symbols, namely a head word position content and a tail word position content, may be added at the beginning and the end of the training text respectively; the processed training text is then segmented into words and indexed, where indexing means looking up, for the word position content of each word position, the corresponding word in the preset word list. Noise is then added to the processed training text: a small portion of the word position contents are replaced by the special character mask while the other word position contents remain unchanged, thereby masking the corresponding word position contents.
For example, referring to fig. 3C, which is a schematic diagram of the process of obtaining the mask text for one training text in an embodiment of the application: suppose the training text read is 'Changchun is the capital of Jilin.'. After the head word position content and the tail word position content are added, 'cls Changchun is the capital of Jilin . sep' is obtained; word segmentation then yields the word position contents of the processed training text as [ 'cls', 'Changchun', 'is', 'the', 'capital', 'of', 'Jilin', '.', 'sep' ]. Further, after the random mask processing, the word position contents of the obtained initial mask text are [ 'cls', 'Changchun', 'is', 'the', 'capital', 'of', 'mask', '.', 'sep' ].
In this way, when the value of the initial mask proportion is small, most of the information of the training text is preserved in the initial mask text obtained by the processing, providing a richer prediction basis for the subsequent mask content prediction.
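A minimal Python sketch of the random mask processing under the assumptions above (word segmentation already done, head and tail word position contents already added; names are illustrative):

```python
import random

def random_mask(tokens: list, mask_proportion: float, mask_token: str = "mask"):
    # Randomly replaces mask_proportion of the maskable word positions with the
    # special mask character; the 'cls' head and 'sep' tail positions are kept.
    maskable = [i for i, t in enumerate(tokens) if t not in ("cls", "sep")]
    n_mask = max(1, round(len(maskable) * mask_proportion))
    mask_positions = sorted(random.sample(maskable, n_mask))
    masked = list(tokens)
    sample_mask_contents = {}
    for i in mask_positions:
        sample_mask_contents[i] = masked[i]   # word position content covered by the mask
        masked[i] = mask_token
    return masked, mask_positions, sample_mask_contents

tokens = ["cls", "Changchun", "is", "the", "capital", "of", "Jilin", ".", "sep"]
mask_text, positions, contents = random_mask(tokens, mask_proportion=0.2)
```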
Step 302: the processing device adopts an initial coding network, outputs each coding vector corresponding to the word position content of each word position in the initial mask text, adopts an initial prediction network, determines mask prediction content in each candidate word covered by a preset word list based on the coding vector corresponding to the mask position, and adjusts network parameters of the initial coding network and the initial prediction network based on content difference between the mask prediction content and sample mask content.
Specifically, before the processing device uses the initial coding network to output a coding vector for the word position content of each word position in the initial mask text, it must first arrange the initial mask text into an input form that the initial coding network can accept, i.e., construct a corresponding initial coding result for the word position content of each word position in the initial mask text, where the initial coding result of a word position is determined based on the position information of that word position and its word position content.
It should be noted that, when determining the initial coding result of a word position (say, word position X), the position coding corresponding to word position X and the content coding corresponding to the word position content at X are determined, and the superposition of the position coding and the content coding is taken as the initial coding result of word position X. When determining the content coding, the index result of the word position content in the preset word list is first determined to digitize the word position content, and a coding scheme such as one-hot is then used to vectorize the number and obtain the content coding. When determining the position coding of word position X, any feasible position coding scheme of the Transformer model may be used, which the application does not specifically limit.
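A minimal PyTorch sketch of this superposition, assuming learned content and position embeddings (the text above equally allows one-hot content codings and Transformer-style position codings):

```python
import torch
import torch.nn as nn

class InitialCoding(nn.Module):
    def __init__(self, vocab_size: int, max_len: int, d_model: int = 1024):
        super().__init__()
        self.content = nn.Embedding(vocab_size, d_model)   # content coding per word-list index
        self.position = nn.Embedding(max_len, d_model)     # position coding per word position

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) index results in the preset word list.
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        # Superpose the content coding and the position coding of each word position.
        return self.content(token_ids) + self.position(positions)
```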
In the specific processing with the initial coding network and the initial prediction network, the processing device uses the initial coding network to generate the coding vector of each word position by calculating the dependencies between its initial coding result and the initial coding results of the other word positions in the initial mask text. The processing performed by the multi-head self-attention mechanism and feed-forward neural network layers of the initial coding network may follow the same logic as a Transformer encoder's use of multi-head self-attention, so the principle and calculation process of the multi-head self-attention processing are not elaborated here.
The processing device then inputs the coding vectors output by the initial coding network at the mask positions into the initial prediction network, and obtains, for each mask position, a prediction probability distribution over the preset word list. The prediction probability distribution indicates, for every candidate word covered by the preset word list, the probability that the candidate word is the masked content at that mask position; in other words, the mask prediction content at a mask position can be determined based on its prediction probability distribution. Here, a mask position indicates a word position to which a mask has been added, and the sample mask content at a mask position is the word position content covered by the mask.
Further, in the process of adjusting the network parameters of the initial coding network and the initial prediction network, a cross entropy loss value is calculated according to the content difference between the sample mask content and the corresponding mask prediction content at each mask position, and the network parameters are adjusted according to the cross entropy loss value, where the formula adopted in calculating the cross entropy loss value is:

$$\mathcal{L} = -\sum_{x \in m(\tilde{x})} \log P\left(x \mid \tilde{x}\right)$$

wherein $\mathcal{L}$ represents the calculated cross entropy loss value, $m(\tilde{x})$ represents the set of sample mask contents at the mask positions in the mask text, $\tilde{x}$ represents the initial mask text derived from the training text, and $P(x \mid \tilde{x})$ represents the predicted probability that the mask content at the corresponding mask position is $x$.
For example, referring to fig. 3D, which is a schematic diagram of the processing performed during training in an embodiment of the application: after a corresponding initial coding result is constructed for each word position content, the initial coding results are input into the initial coding network; after processing by the initial coding network and the initial prediction network, a prediction probability distribution over the candidate words covered by the preset word list is output for each mask position; the cross entropy loss value is then calculated according to the prediction probability distribution and the sample mask content at each mask position, and the network parameters of the initial coding network and the initial prediction network are adjusted according to the calculated cross entropy loss value.
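Sketched as a single training step under the same assumptions (the prediction head stands in for the linear layer of the initial prediction network; the softmax is folded into the cross entropy call):

```python
import torch
import torch.nn.functional as F

def pretrain_step(coding_net, prediction_head, initial_codings, mask_positions, target_ids, optimizer):
    # initial_codings: (batch, seq_len, d_model) initial coding results;
    # target_ids: word-list indices of the sample mask contents, shape (batch, n_mask).
    coding_vectors = coding_net(initial_codings)        # one coding vector per word position
    masked_vecs = coding_vectors[:, mask_positions, :]  # coding vectors at the mask positions
    logits = prediction_head(masked_vecs)               # (batch, n_mask, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), target_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```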
Similarly, the processing device performs multiple rounds of iterative training in the manner illustrated in steps 301-302 until the preset convergence condition is satisfied, and determines the trained initial coding network as the target coding network.
In this way, by training the initial coding network and the initial prediction network on a large number of unlabeled sample texts, the trained target coding network can learn basic grammar, syntax, semantic knowledge, and language structure; moreover, by training on text data from various service fields, it can learn rich language expressions and background knowledge of different fields, so that the target coding network has strong universality and generalization capability and can well understand the meaning of each word position content in context.
Further, because the target coding network has only learned to restore the sample mask contents at the mask positions, it focuses on representing individual words rather than the entire input text, and its representation of the whole text is relatively weak. Based on this, the processing device continues to train the target coding network with the help of the coding guidance network, and builds the text coding model from the trained target coding network.
After the processing device constructs the network to be trained illustrated in fig. 2, it may use the sample texts to perform multiple rounds of iterative training on the constructed coding guidance network and the pre-trained target coding network, and construct the text coding model based on the target coding network after the last round of training. The convergence condition of the multi-round iterative training may be: the total number of training rounds reaches a fourth set value, or the number of consecutive rounds in which the calculated model loss value stays below a fifth set value reaches a preset sixth set value; the values of the fourth, fifth, and sixth set values are chosen according to actual processing requirements, which the application does not specifically limit.
In the embodiments of the application, the sample texts acquired by the processing device may be obtained in the following manner: collecting initial texts from the text data of various service types; unifying the character coding forms of the initial texts and deleting non-text contents from the initial texts to obtain processed candidate texts; and deleting repeated texts from the candidate texts and deleting abnormal texts containing preset offending keywords to obtain the sample texts.
Specifically, in the process of collecting text data, the processing device may collect a large amount of unlabeled text data to obtain the initial texts, and may screen the collected data by field so that the initial texts cover content from every field. The text data may come from news articles, popular science materials, web text, and so on, covering as many service types as possible, such as news, novels, articles, conversations, chats, comments, and reviews. In the specific collection of general text data, published text data sets may be used directly for model training, and web crawlers may also be used to capture text data from various websites, such as news websites, forums, popular science publishing websites, and blogs.
After the processing device completes the collection of the initial texts, it can unify their character coding forms and then preprocess them. The preprocessing steps include: cleaning each initial text by removing special characters and tags and keeping plain text; unifying the coding mode of the text data; and deleting repeated texts and abnormal texts containing offending keywords.
For example, in each text, the processing device deletes non-text special characters and tags such as HTML tags, JavaScript code and special symbols; then unifies each text into the UTF-8 coding mode; further, according to preset offending keywords, abnormal texts containing those keywords are identified and deleted. Texts containing excessive errors or meaningless content can also be identified and deleted, and repeated texts can be removed with the help of regular matching rules.
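A minimal preprocessing sketch in Python under the assumptions that tag stripping is regex-based and that the offending keywords are supplied by the caller; all helper names are illustrative, not from the patent:

```python
import re

def clean_initial_text(raw: str) -> str:
    """Data cleaning: strip HTML tags, script blocks and special symbols, keep plain text."""
    text = re.sub(r"<script.*?</script>", " ", raw, flags=re.S | re.I)  # drop JavaScript blocks
    text = re.sub(r"<[^>]+>", " ", text)                                # drop remaining HTML tags
    text = re.sub(r"[^\w\s\u4e00-\u9fff.,!?;:'()\-]", " ", text)        # drop special symbols
    return re.sub(r"\s+", " ", text).strip()

def unify_to_utf8(raw_bytes: bytes, source_encoding: str) -> str:
    """Unify the character coding form: decode from the source encoding into UTF-8 text."""
    return raw_bytes.decode(source_encoding, errors="ignore")

def is_abnormal(text: str, offending_keywords) -> bool:
    """Identify abnormal texts that contain any preset offending keyword."""
    return any(keyword in text for keyword in offending_keywords)
```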
In this way, performing data cleaning, unified coding and filtering on each initial text guarantees the generation quality of the sample texts, avoids garbled content in subsequent processing, and ensures both the content richness and the overall quality of each sample text.
In the embodiment of the application, when deleting repeated texts from the candidate texts obtained by processing each initial text, the repeated texts must first be determined among the candidate texts and then deleted. A repeated text may refer to a text whose content is completely identical to another, or to a text whose content is highly similar to another.
Based on this, in a possible manner of determining the repeated text, the following operations may be performed for each candidate text: and determining index results of the word position contents of each word position in the candidate text in a preset word list, and obtaining index character strings corresponding to the candidate text according to the index results corresponding to the word position contents. Then, determining repeated texts in candidate texts with the same index character string; further, the repeated text is deleted from each candidate text.
It should be noted that, in the embodiment of the present application, it is assumed by default that the word range of each sample text is covered by the candidate words in the preset word list.
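The index-string deduplication can be sketched as follows, assuming the preset word list is given as a token-to-index dict and that a simple whitespace word segmentation suffices (both assumptions for illustration):

```python
def dedup_by_index_string(candidate_texts, vocab):
    """Keep the first candidate text for each distinct index character string."""
    seen, kept = set(), []
    for text in candidate_texts:
        # index result of each word position content in the preset word list;
        # per the assumption above, every token is covered by the word list
        index_string = ",".join(str(vocab[token]) for token in text.split())
        if index_string not in seen:        # identical index strings mark repeated texts
            seen.add(index_string)
            kept.append(text)
    return kept
```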
In other possible implementations, when deleting the repeated text for each candidate text obtained by processing each initial text, the processing device may calculate text hash values for each candidate text respectively by using a preset hash function; and determining repeated texts in the candidate texts according to the hash values of the texts, and deleting the repeated texts from the candidate texts. The hash function adopted by the application is not particularly limited.
Specifically, when text deduplication is performed by means of a hash deduplication algorithm (i.e., a hash algorithm), each candidate text is run through the hash algorithm to obtain a unique hash value, which is stored in a hash table; when the hash value of a newly obtained candidate text already exists in the hash table, that candidate is regarded as a repeated text and is deleted.
For example, for candidate texts 1-5, after each candidate text is sequentially acquired according to the sequence, the text hash value of the candidate text is calculated and stored in the hash table, if the corresponding hash value is determined to be already stored in the hash table when the text hash value corresponding to the candidate text 3 is calculated, it can be determined that the candidate text 3 is repeated with the previously acquired candidate text, and then the candidate text 3 is directly deleted.
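A hash-deduplication sketch using Python's standard hashlib; the choice of SHA-256 is an assumption, since the application does not fix a hash function:

```python
import hashlib

def dedup_by_hash(candidate_texts):
    """Delete repeated texts by storing text hash values in a hash table."""
    hash_table, kept = set(), []
    for text in candidate_texts:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in hash_table:            # value already stored: repeated text, delete it
            continue
        hash_table.add(digest)
        kept.append(text)
    return kept
```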
In this way, by computing text hash values, texts in the candidate set that repeat earlier ones can be identified as repeated texts; deleting them from the candidate texts avoids the adverse effects of redundant samples in subsequent training and improves the quality of the training samples. The constructed sample texts thus cover multiple fields, have the characteristic of diversity, and provide a high-quality training basis for the model training process.
Taking one round of the training process used to obtain the text coding model as an example, the related processing is described below:
Referring to fig. 4, a schematic diagram of the training process performed on the target coding network and the coding guidance network according to an embodiment of the present application; the processing performed in one round of iterative training is described below with reference to fig. 4:
Step 401: the processing equipment reads the sample text, and performs random mask processing on each word position content in the sample text according to a preset target mask proportion to obtain mask text associated with mask positions and sample mask results.
Specifically, the processing device reads the sample texts used in one round of training, where the number of sample texts used per round, i.e., the batchsize value, is set according to actual training requirements; the application takes a batchsize of 1 as an example to schematically illustrate the processing involved in training.
In the process of performing random mask processing on a sample text to obtain a mask text associated with mask positions and a sample mask result, the processing device may first add head word position content and tail word position content to the sample text, then perform word segmentation on the processed sample text, and finally perform random mask processing on each word position content according to the preset target mask proportion. The head word position content marks the beginning of the text, and the tail word position content marks its end.
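The random mask processing might look like the following sketch, where `[CLS]`, `[SEP]` and `[MASK]` are assumed markers for the head word position content, the tail word position content and the mask positions:

```python
import random

def random_mask(tokens, target_mask_ratio, mask_token="[MASK]"):
    """Randomly mask word position contents at the target mask proportion.

    Returns the mask text and the sample mask result: a dict mapping each
    mask position to the original word position content found there.
    """
    tokens = ["[CLS]"] + list(tokens) + ["[SEP]"]    # head / tail word position contents
    masked, sample_mask_result = list(tokens), {}
    positions = list(range(1, len(tokens) - 1))      # never mask [CLS] or [SEP]
    random.shuffle(positions)
    for pos in positions[: int(target_mask_ratio * len(positions))]:
        sample_mask_result[pos] = masked[pos]
        masked[pos] = mask_token
    return masked, sample_mask_result
```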
Step 402: the processing equipment adopts a target coding network to obtain text coding vectors corresponding to sample texts, and adopts a coding guide network to correspond to mask positions in mask texts under the guidance of the text coding vectors, and determines mask prediction results in candidate words covered by a preset word list.
When executing step 402, in the process of obtaining the text coding vector corresponding to the sample text with the target coding network, the processing device may add head word position content and tail word position content to the sample text and construct an initial coding result corresponding to the word position content of each word position in the processed sample text. Then, using the target coding network, with the help of the multi-head self-attention mechanism and nonlinear transformation processing, a coding vector is generated for each word position's initial coding result under the influence of the initial coding results of the other word positions. Finally, the coding vector corresponding to the head word position content is determined as the text coding vector of the sample text.
Specifically, in order to process with the target coding network, the processing device first needs to arrange the sample text into a content form that the target coding network accepts as input. Based on this, after adding the head word position content and the tail word position content to the sample text, the processing device performs word segmentation and coding on the processed sample text, obtaining a content coding result for each word position in the sample text and a position coding result for each word position; then, the content coding result and position coding result of the same word position are superposed to obtain the initial coding result for that word position.
When determining the initial coding result for a word position (say, word position Y), the position code corresponding to word position Y and the content code corresponding to the word position content at Y are determined, and their superposition is taken as the initial coding result for word position Y. When determining the content code for the word position content at Y, the index result of that content in the preset word list is determined first, digitizing the word position content; a coding mode such as one-hot coding then vectorizes the number to obtain the content code. When determining the position code for word position Y, any feasible position coding manner of the Transformer model may be used, which the present application does not particularly limit.
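A sketch of constructing the initial coding results, assuming an embedding-matrix lookup for content codes (equivalent to one-hot coding times a matrix) and Transformer-style sinusoidal position codes; the dimensions are illustrative:

```python
import numpy as np

def sinusoidal_position_codes(seq_len, dim):
    """Transformer-style position coding result for each word position."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def initial_coding_results(token_indices, embedding_matrix):
    """Superpose content codes and position codes for each word position."""
    content_codes = embedding_matrix[token_indices]   # content coding results
    position_codes = sinusoidal_position_codes(len(token_indices),
                                               embedding_matrix.shape[1])
    return content_codes + position_codes             # initial coding results
```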
In this way, the sample text can be processed into a data form which can be processed by the target coding network, and the corresponding relation between the word position and the word position content can be simultaneously represented in each initial coding result constructed corresponding to each word position.
Then, the processing device uses the target coding network to generate a corresponding coding vector for each word position by computing, via the multi-head self-attention mechanism and nonlinear transformation processing inside the network, the dependency between that word position's initial coding result and the initial coding results of the other word positions in the sample text. Further, after a coding vector has been generated for each word position content, the coding vector corresponding to the head word position content (i.e., cls) may be determined as the text coding vector of the sample text; this vector can be understood as capturing the semantic information of the sample text as a whole, and is specifically output by the last coding block in the target coding network.
The reason why the encoding vector corresponding to the head word position content is selected as the text encoding vector of the sample text is that the head word position content is a symbol without obvious semantic information compared with other words in the sample text, so that the semantic information of each word position content in the sample text can be fused more 'fairly', and the whole semantic of the sample text can be better represented.
In this way, by means of the target coding network obtained by pre-training, text coding vectors representing the whole semantics can be obtained corresponding to sample texts not including mask contents.
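In a PyTorch-flavoured sketch (the `encoder` argument stands in for the target coding network and is an assumption), extracting the text coding vector reduces to taking the output at the head word position:

```python
import torch

def text_encoding_vector(encoder, initial_codes):
    """Take the coding vector at the head word position (cls) as the text coding vector.

    encoder       -- stands in for the target coding network, e.g. a torch.nn.TransformerEncoder
    initial_codes -- tensor of shape (seq_len, batch, dim), one initial coding result per word position
    """
    outputs = encoder(initial_codes)   # each position attends to every other position
    return outputs[0]                  # position 0 carries the head word position content (cls)
```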
Further, the processing device adopts the coding guidance network, under the guidance of the text coding vector, to determine a mask prediction result among the candidate words covered by the preset word list for each mask position in the mask text. In this process, it first constructs initial coding results for the word position contents of each word position in the mask text (with head word position content and tail word position content added), and replaces the initial coding result corresponding to the head word position content of the mask text with the text coding vector. It then uses the coding guidance network, via the multi-head self-attention mechanism and nonlinear transformation processing, to obtain for each word position content in the mask text a coding vector generated under the influence of the initial codes of the other word positions, and determines the corresponding mask prediction result among the candidate words of the preset word list based on the coding vectors at the mask positions.
Specifically, when processing with the coding guidance network, the processing device arranges the mask text into a form the coding guidance network can process, in a manner similar to the construction of initial coding results for each word position content described above, constructing an initial coding result for the word position content of each word position in the mask text. Then, by replacing the initial coding result corresponding to the head word position content in the mask text with the text coding vector output by the target coding network, an influence relationship between the target coding network and the coding guidance network is established.
Furthermore, by adopting a coding guidance network and by means of a multi-head self-attention mechanism and nonlinear transformation processing, the initial coding result of each word site content in the mask text can be coded to obtain a coding vector under the influence of the initial coding results of other word site contents; and determining a corresponding mask prediction result in each candidate word covered by the preset word list based on the code vector of the mask position, wherein the mask prediction result specifically can refer to probability distribution in each candidate word.
For example, assume that the coding blocks in the target coding network and the coding guidance network are Transformer layers; the size of the preset word list is 30,000, i.e., the preset word list contains 30,000 words; and the input sequence has 10 word positions, with the contents of the 3rd and 7th word positions masked off. Then the output of the last Transformer layer in the coding guidance network is 10 word vectors (the vector dimension may be 768 or 1024), and the vector corresponding to the first word position content (cls) serves as the sentence vector. Next, the vectors corresponding to the masked word position contents (the 3rd and 7th vectors) are each mapped to 30,000-dimensional vectors by a linear layer in the coding guidance network, and each is processed by a softmax layer in the coding guidance network to obtain a probability distribution over the preset word list, i.e., a prediction probability distribution. Thus the 3rd word position content corresponds to one probability distribution and the 7th word position content to another.
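A sketch matching this example: a shallow guidance network whose head-position initial code is replaced by the sentence vector, followed by a linear layer and softmax over an assumed 30,000-word list; all sizes are illustrative:

```python
import torch.nn as nn

class CodeGuidanceHead(nn.Module):
    """Shallow coding guidance network with a word-list prediction head."""

    def __init__(self, dim=768, vocab_size=30000, num_layers=1, nhead=12):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=nhead)
        self.blocks = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.to_vocab = nn.Linear(dim, vocab_size)     # linear layer onto the word list

    def forward(self, initial_codes, sentence_vector, mask_positions):
        # replace the initial coding result at the head word position with the
        # text coding vector output by the target coding network
        initial_codes = initial_codes.clone()
        initial_codes[0] = sentence_vector
        hidden = self.blocks(initial_codes)            # (seq_len, batch, dim)
        logits = self.to_vocab(hidden[mask_positions]) # vectors at the mask positions only
        return logits.softmax(dim=-1)                  # prediction probability distributions
```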
In this way, by means of the coding guidance network, under the influence of the text coding vector output by the target coding network, the mask prediction result at each mask position can be predicted from the mask text obtained by masking the sample text; prediction thus combines the semantics of the whole text with the joint effect of the other word position contents, realizing effective prediction of the masked content at the mask positions.
Step 403: the processing device adjusts network parameters of the target encoding network and the encoding pilot network based on a result difference between the mask prediction result and the sample mask result.
Specifically, in the process of executing step 403, after obtaining the mask prediction result output by the coding guidance network, the processing device adjusts the network parameters of the target coding network and the coding guidance network based on the difference between the mask prediction result and the sample mask result.
Specifically, in the process of adjusting the network parameters of the target coding network and the coding guidance network, a cross entropy loss value is calculated according to the result difference between the sample mask result at each mask position and the corresponding mask prediction result, and the network parameters are adjusted according to the cross entropy loss value, wherein the formula adopted in calculating the cross entropy loss value is as follows:

$$\mathcal{L} = -\sum_{x_i \in \mathcal{M}} \log p\left(x_i \mid X_{\text{mask}}, h_X\right)$$

wherein $\mathcal{L}$ is the calculated cross entropy loss value; $\mathcal{M}$ represents the sample mask result set at the mask positions in the mask text; $X_{\text{mask}}$ represents the mask text; $h_X$ is the text coding vector output by the target coding network for the sample text; conditioning on $h_X$ indicates that the initial coding result corresponding to the head word position content in the mask text has been replaced by $h_X$; and $p\left(x_i \mid X_{\text{mask}}, h_X\right)$ represents the predicted probability that the mask content at the corresponding mask position is $x_i$.
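As an illustrative sketch (not the patent's reference implementation), the loss above can be computed from the prediction probability distributions at the mask positions and the word-list indices of the sample mask results; the tensor shapes are assumptions:

```python
import torch

def masked_cross_entropy(pred_probs, target_indices):
    """Cross entropy over the mask positions only.

    pred_probs     -- (num_masked, vocab_size) prediction probability distributions
    target_indices -- (num_masked,) word-list indices of the sample mask results
    """
    picked = pred_probs.gather(1, target_indices.unsqueeze(1)).squeeze(1)
    return -picked.clamp_min(1e-12).log().mean()   # mean negative log-likelihood
```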
For example, referring to FIG. 5, which illustrates the processing procedure of the two networks in co-training according to an embodiment of the present application: as can be seen from FIG. 5, for the sample text to which the head word position content cls and the tail word position content sep have been added, after the corresponding initial coding results are respectively constructed, the initial coding results are input into the target coding network to obtain the coding vectors corresponding to the word position contents output by the target coding network.

Continuing with FIG. 5, the coding vector corresponding to cls is determined to be the text coding vector $h_X$; word positions 1, 3 and 5 are masked to obtain the mask text. Then, initial coding results are respectively constructed for each word position content in the mask text, and $h_X$ replaces the initial coding result corresponding to cls in the mask text; each initial coding result of the mask text is input into the coding guidance network to obtain the prediction probability distributions over the candidate words of the preset word list, predicted by the coding guidance network for word positions 1, 3 and 5 respectively. A cross entropy loss value is calculated according to the prediction probability distribution at each mask position and the sample mask content at each mask position; further, the network parameters of the target coding network and the coding guidance network are adjusted once according to the calculated cross entropy loss value.
Similarly, the processing device may perform multiple rounds of iterative training according to the training methods illustrated in steps 401-403 until a preset convergence condition is satisfied, and construct a text coding model based on the trained target coding network.
In summary, in a feasible implementation of the present application, during the process of training the target coding network with the aid of the coding guidance network, the coding guidance network contains fewer network layers than the target coding network, has a simpler network structure, and has weaker modeling capability. Moreover, when constructing the mask text input to the coding guidance network, masking is performed according to the target mask proportion, which is higher than the initial mask proportion used when pre-training the target coding network. Under these conditions, it is very difficult for the coding guidance network to learn to reconstruct and recover the mask content, so it must attend to the text coding vector output by the target coding network during learning; in turn, the target coding network must capture enough semantic information of the sample text and output a sufficiently informative text coding vector to assist the coding guidance network in recovering and reconstructing the mask content. In this way, continued training improves the text representation capability of the target coding network, enabling it to capture the semantic information of the whole input text and output high-quality dense sentence vectors (i.e., text coding vectors).
Further, after training to obtain the text coding model, the processing device may implement text similarity comparison in various service scenarios by means of the text coding model.
In performing the text similarity calculation, the processing device may use the text coding model to obtain, for the sentences to be compared (say, texts X1 and X2), text coding vectors (i.e., dense sentence vectors) that express the semantic information of each text; then, the cosine similarity between the two text coding vectors is taken as the vector similarity between X1 and X2, whose value ranges from 0 to 1, with higher values representing greater similarity and lower values less.
The formula for calculating the vector similarity is as follows:

$$\text{score} = \frac{h_1 \cdot h_2}{\|h_1\| \, \|h_2\|}$$

wherein score represents the calculated vector similarity; $h_1$ is the text coding vector obtained for text X1 with the text coding model, and $\|h_1\|$ represents the modulus of $h_1$; $h_2$ is the text coding vector obtained for text X2 with the text coding model, and $\|h_2\|$ represents the modulus of $h_2$.
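A direct implementation of the similarity formula (NumPy, for illustration):

```python
import numpy as np

def vector_similarity(h1, h2):
    """Cosine similarity between two text coding vectors h1 and h2."""
    return float(np.dot(h1, h2) / (np.linalg.norm(h1) * np.linalg.norm(h2)))
```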
In some feasible application scenarios, the processing device may acquire a target question text input by a target object, and acquire pre-stored candidate question texts; then, a text coding model is adopted, a target coding vector is output corresponding to the target problem text, and each candidate coding vector is output corresponding to each candidate problem text; then, a vector similarity between the target encoding vector and each candidate encoding vector is calculated, and based on the respective vector similarities, a reference question text similar to the target question text is determined in the respective candidate question texts, and reply contents associated with the reference question text are fed back to the target object.
It should be noted that, in a feasible implementation, after calculating the vector similarity between the target coding vector and each candidate coding vector, the processing device may select, as the reference question text similar to the target question text, a candidate question text whose vector similarity to the target question text meets a preset condition, where the preset condition may be any one of the following: the vector similarity is the highest; the vector similarity reaches a set threshold; or the candidate belongs to the top Q candidates with the highest vector similarity. The set threshold and the value of Q are chosen according to actual processing requirements. Further, the reply content associated with the reference question text is fed back to the target object.
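A retrieval sketch over precomputed coding vectors illustrating these preset conditions; `q` and `threshold` are the illustrative top-Q cut-off and set threshold:

```python
import numpy as np

def select_reference_texts(target_vec, candidate_vecs, q=3, threshold=None):
    """Rank candidate question texts by vector similarity to the target question text.

    Implements the preset conditions from the text: keep the top Q candidates by
    vector similarity, optionally requiring the similarity to reach a set threshold.
    """
    sims = np.array([
        float(np.dot(target_vec, c) / (np.linalg.norm(target_vec) * np.linalg.norm(c)))
        for c in candidate_vecs
    ])
    top_q = np.argsort(-sims)[:q]                           # top Q by vector similarity
    if threshold is not None:
        top_q = [i for i in top_q if sims[i] >= threshold]  # apply the set threshold
    return list(top_q), sims
```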
For example, referring to fig. 6, which illustrates the process of calculating the similarity between question texts according to an embodiment of the present application: after the target question text input by the target object is obtained, the pre-stored candidate question texts are obtained, and the target question text and the candidate question texts are arranged into a form the text coding model can process. Taking the calculation of the vector similarity between the target question text and one candidate question text (candidate question text 1) as an example, as shown in fig. 6, the text coding model outputs coding vector 1 corresponding to the target question text and coding vector 2 corresponding to candidate question text 1; then, by calculating the vector similarity between coding vector 1 and coding vector 2, the degree of similarity between the target question text and candidate question text 1 is determined.
In this way, when a question posed by a target object is replied, by calculating the vector similarity between the target question text sent by the target object and each candidate question text associated with the reply content, the reference question text similar to the target question text can be determined in each candidate question text, so that the reply content matched with the target question text can be determined according to the reply content associated with the reference question text, and the quick reply to the user question can be realized.
In other feasible application scenarios, the processing device may acquire target text browsed by the target object, and acquire candidate texts to be recommended; then adopting a text coding model, outputting target coding vectors corresponding to the target texts, and outputting candidate coding vectors corresponding to the candidate texts; further, a vector similarity between the target encoding vector and each candidate encoding vector is calculated, and a text to be recommended that is similar to the target text is determined among the candidate texts based on the respective vector similarities.
Specifically, in a text content recommendation scenario, the processing device may calculate the vector similarity between each candidate text to be recommended and the target text browsed by the target object, and determine among the candidate texts the texts that can be recommended to the target object; when determining, according to the vector similarities, which candidate texts are similar to the target text, the same preset conditions used for determining the reference question text may be applied.
Therefore, the text to be recommended similar to the target text of interest of the target object can be determined in the text content recommendation scene, so that the text content of possible interest of the target object can be determined aiming at the target object, and efficient recommendation of the text content is realized.
Referring to fig. 7, which is a schematic diagram of a process for implementing text similarity determination in an embodiment of the present application, a process performed to implement text similarity determination will be described below with reference to fig. 7:
As can be seen from the illustration of fig. 7, the present application involves a two-stage training process, the first training stage for training to obtain a target coding network, the second training stage retraining the target coding network by means of a code guidance network, and constructing a text coding model based on the retrained target coding network. Then, extraction of text encoding vectors is performed for the text contents by means of the text encoding model, and similarity between the text contents is determined by calculating similarity relationships between the text encoding vectors.
In the second training stage, the processing device introduces the coding guidance network to promote the text representation capability of the target coding network: the target coding network encodes the input text into a low-dimensional dense sentence vector, while the coding guidance network adopts a shallow network structure and reconstructs and recovers the mask content from the mask text based on the dense sentence vector output by the target coding network. Specifically, the target coding network may be a deep Transformer and the coding guidance network may be built from shallow Transformer layers; for example, the constructed initial coding network is formed by stacking many (12 or 24) identical Transformer layers, while the constructed coding guidance network is built from only a few (1 or 2) identical Transformer layers. The network parameters of the coding guidance network are randomly initialized before training begins.
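The asymmetric depths described here might be configured as in the following sketch, where the layer counts and hidden sizes are the illustrative values mentioned above:

```python
import torch.nn as nn

def build_networks(dim=768, nhead=12, deep_layers=12, shallow_layers=1):
    """Deep target coding network paired with a shallow coding guidance network."""
    make_layer = lambda: nn.TransformerEncoderLayer(d_model=dim, nhead=nhead)
    target_coding_network = nn.TransformerEncoder(make_layer(), num_layers=deep_layers)
    # shallow network; its parameters are randomly initialized before training begins
    coding_guidance_network = nn.TransformerEncoder(make_layer(), num_layers=shallow_layers)
    return target_coding_network, coding_guidance_network
```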
Continuing with the second training stage: when obtaining the text coding vector output by the target coding network, after word segmentation and indexing of the input sample text, no masking noise is added; the initially coded sample text is used directly as the input of the target coding network, and the low-dimensional dense sentence vector corresponding to the sample text output by the target coding network is obtained, recorded as $h_X = f_{\theta}(X)$, where $f_{\theta}$ represents the target coding network, $X$ represents the processable input form of the sample text, and $h_X$ is the dense sentence vector corresponding to the sample text, i.e., the text coding vector of the sample text.
For example, the vector output by the last Transformer layer at the start character "cls" is taken as the dense sentence vector that captures the semantic information of the input text.
Continuing with the second training stage: masking noise is added to the input sample text to obtain the mask text, where a relatively large masking proportion (e.g., 50%-60%) is typically used to increase the difficulty for the coding guidance network of recovering and reconstructing the mask content. The coding guidance network may also employ a bi-directional attention mechanism, combining the text coding vector output by the target coding network with the mask text to form the input of the coding guidance network, so that it recovers and reconstructs the mask content based on the dense sentence vector output by the target coding network. The input form of the coding guidance network can be written as:
$$\hat{X} = g_{\phi}\big(h_X,\; e(x_1) + p_1,\ \ldots,\ e(x_n) + p_n\big)$$

wherein $\hat{X}$ represents the recovered mask contents; $g_{\phi}$ represents the coding guidance network; $e(x_i)$ represents the content code of the i-th word $x_i$ of the mask text; and $p_i$ represents the position coding result of the i-th word $x_i$.
Based on this, during the second-stage processing, the coding guidance network reconstructs and recovers the disturbed text, predicting the mask content for each mask position; the difference between the predicted and real word position contents is measured with the cross entropy loss function, and the network parameters are updated by minimizing this loss.
In this way, in the process of training the text coding model as a whole, massive general text data from different fields, requiring no labeling, is first collected, and large-scale, high-quality sample texts are constructed after data cleaning, filtering and text deduplication.

Then, with these sample texts and the masked language model as the training target, the disturbed mask content is reconstructed and recovered, so that the target coding network obtained by training learns the basic grammar, syntax and semantic knowledge of natural language, acquires rich language representations, gains strong universality and generalization capability, and provides a solid basic representation for subsequent training.

Next, the coding guidance network is introduced to enhance the text representation capability of the target coding network. Compared with the target coding network, the coding guidance network has fewer network layers, a simpler structure and weaker modeling capability, and the mask proportion adopted in this training stage is higher; reconstructing and recovering the disturbed text is therefore very difficult for the coding guidance network, so to recover the mask content it must attend to the sentence vector output by the target coding network, and the target coding network in turn must capture enough semantic information of the input text and output sufficiently informative sentence vectors to give the coding guidance network a processing basis. On this basis, the coding guidance network introduced by the present application improves the text representation capability of the target coding network, enabling it to capture the semantic information of the whole input text and output high-quality dense sentence vectors, which improves the effect of calculating and measuring the semantic similarity between two texts.
Based on the same inventive concept, referring to fig. 8, which is a schematic diagram of a logic structure of a training device for a text coding model according to an embodiment of the present application, the training device 800 for a text coding model includes a first training unit 801, where,
The first training unit 801 performs multiple rounds of iterative training on the constructed coding guidance network and the pre-trained target coding network using preset sample texts, and constructs a text coding model based on the trained target coding network, wherein in the process of one round of iterative training, the following operations are performed:
Reading a sample text, and performing random mask processing on each word position content in the sample text according to a preset target mask proportion to obtain a mask text associated with a mask position and a sample mask result;
A target coding network is adopted to correspond to the sample text to obtain a text coding vector, a coding guide network is adopted to correspond to a mask position in a mask text under the guide of the text coding vector, and a mask prediction result is determined in each candidate word covered by a preset word list;
based on the result difference between the mask prediction result and the sample mask result, network parameters of the target coding network and the coding pilot network are adjusted.
Optionally, the apparatus further includes a second training unit 802, where the target coding network is obtained by training by the second training unit 802 in the following manner:
performing multiple rounds of iterative training on the constructed initial coding network and the initial prediction network by adopting each training text to obtain a target coding network which is obtained by training the corresponding initial coding network, wherein in the iterative training process of one round, the following operations are executed:
Reading a training text, and carrying out random mask processing on each word position content in the training text according to a preset initial mask proportion to obtain an initial mask text associated with mask positions and sample mask contents, wherein the initial mask proportion is lower than a target mask proportion;
And outputting each code vector by adopting an initial code network corresponding to the word position content of each word position in the initial mask text, determining mask predicted content in each candidate word covered by a preset word list based on the code vector corresponding to the mask position by adopting an initial prediction network, and adjusting network parameters of the initial code network and the initial prediction network based on the content difference between the mask predicted content and the sample mask content.
Optionally, each sample text is obtained by the first training unit 801 in the following manner:
collecting and obtaining each initial text in the text data of each service type;
Unifying character coding forms of all initial texts, deleting non-text contents in all initial texts, and obtaining all processed candidate texts;
And deleting repeated texts from each candidate text, and deleting abnormal texts containing preset illegal keywords to obtain each sample text.
Optionally, when deleting the repeated text, the first training unit 801 is configured to:
For each candidate text, a preset hash function is adopted, and text hash values are calculated respectively;
and determining repeated texts in the candidate texts according to the hash values of the texts, and deleting the repeated texts from the candidate texts.
Optionally, when a target coding network is adopted and a text coding vector is obtained corresponding to the sample text, the first training unit 801 is configured to:
Adding head word position content and tail word position content to the sample text, and constructing each initial coding result according to the word position content of each word position in the processed sample text;
Adopting a target coding network, and obtaining a coding vector generated by coding under the influence of the initial coding results of other word positions by means of a multi-head self-attention mechanism and nonlinear transformation processing and corresponding to the initial coding result of each word position;
And determining the coding vector corresponding to the head word position content as a text coding vector of the sample text.
Optionally, when each initial encoding result is constructed according to the lexeme content of each lexeme in the processed sample text, the first training unit 801 is configured to:
performing word segmentation and coding processing on the processed sample text, obtaining each content coding result corresponding to each word position in the sample text, and obtaining each position coding result corresponding to each word position;
And superposing the content coding result and the word position coding result corresponding to the same word position to obtain initial coding results corresponding to each word position.
Optionally, when adopting the coding guidance network, under the guidance of the text coding vector, to determine a mask prediction result corresponding to the mask positions in the mask text, the first training unit 801 is configured to:
Constructing each initial coding result according to the mask text added with the head word position content and the tail word position content and the word position content corresponding to each word position in the mask text, and replacing the initial coding result corresponding to the head word position content of the mask text by adopting a text coding vector;
And adopting a coding guidance network, obtaining coding vectors generated by coding under the influence of the initial coding contents of other word positions aiming at the initial coding result of each word position content in the mask text by means of a multi-head self-attention mechanism and nonlinear transformation processing, and determining corresponding mask prediction results in each candidate word covered by a preset word table based on the coding vectors corresponding to the mask positions.
Optionally, after the text coding model is constructed from the trained target coding network, the apparatus further includes a calculating unit 803, where the calculating unit 803 is configured to:
Acquiring a target question text input by a target object, and acquiring pre-stored candidate question texts;
Outputting target coding vectors corresponding to the target problem texts and outputting candidate coding vectors corresponding to the candidate problem texts by adopting a text coding model;
And calculating the vector similarity between the target coding vector and each candidate coding vector, determining a reference question text similar to the target question text in each candidate question text based on the vector similarity, and feeding back the reply content associated with the reference question text to the target object.
Optionally, after the text coding model is constructed from the trained target coding network, the apparatus further includes a calculating unit 803, where the calculating unit 803 is configured to:
acquiring target texts browsed by target objects, and acquiring candidate texts to be recommended;
Outputting target coding vectors corresponding to the target texts and outputting candidate coding vectors corresponding to the candidate texts by adopting a text coding model;
and calculating the vector similarity between the target coding vector and each candidate coding vector, and determining the text to be recommended which is similar to the target text in each candidate text based on the vector similarity.
For convenience of description, the above parts are described as being functionally divided into modules (or units) respectively. Of course, the functions of each module (or unit) may be implemented in the same piece or pieces of software or hardware when implementing the present application.
Having described the training method and apparatus of the text encoding model of an exemplary embodiment of the present application, next, an electronic device according to another exemplary embodiment of the present application is described.
Those skilled in the art will appreciate that the various aspects of the application may be implemented as a system, method, or program product. Accordingly, aspects of the application may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.) or an embodiment combining hardware and software aspects may be referred to herein as a "circuit," module "or" system.
The embodiment of the application also provides electronic equipment based on the same conception as the embodiment of the method. Referring to fig. 9, a schematic diagram of a hardware component of an electronic device to which an embodiment of the present application is applied, in one embodiment, the electronic device may be the processing device 120 shown in fig. 1. In this embodiment, the electronic device may be configured as shown in fig. 9, including a memory 901, a communication module 903, and one or more processors 902.
A memory 901 for storing a computer program executed by the processor 902. The memory 901 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, a program required for running an instant communication function, and the like; the storage data area can store various instant messaging information, operation instruction sets and the like.
The memory 901 may be a volatile memory, such as a random-access memory (RAM); the memory 901 may also be a non-volatile memory, such as a read-only memory, a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or the memory 901 may be any other medium that can be used to carry or store a desired computer program in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 901 may also be a combination of the above memories.
The processor 902 may include one or more central processing units (central processing unit, CPUs) or digital processing units, or the like. And a processor 902 for implementing the training method of the text encoding model when calling the computer program stored in the memory 901.
The communication module 903 is used to communicate with the client device and the server.
The specific connection medium between the memory 901, the communication module 903 and the processor 902 is not limited in the embodiment of the present application. In fig. 9, the memory 901 and the processor 902 are connected by a bus 904, which is depicted with a bold line; the connections between other components are merely illustrative and not limiting. The bus 904 may be divided into an address bus, a data bus, a control bus, and so on. For ease of description, only one thick line is depicted in fig. 9, but this does not mean that there is only one bus or only one type of bus.
The memory 901 stores a computer storage medium, in which computer executable instructions are stored for implementing the training method of the text coding model according to the embodiment of the present application. The processor 902 is configured to perform the training method of the text coding model described above, as shown in fig. 4.
In another embodiment, the electronic device may be another electronic device, and referring to fig. 10, a schematic diagram of a hardware composition of another electronic device to which the embodiment of the present application is applied, where the electronic device may specifically be the client device 110 shown in fig. 1. In this embodiment, the structure of the electronic device may include, as shown in fig. 10: communication component 1010, memory 1020, display unit 1030, camera 1040, sensor 1050, audio circuit 1060, bluetooth module 1070, processor 1080 and the like.
The communication component 1010 is used for communicating with a server. In some embodiments, a wireless fidelity (WiFi) module may be included; the WiFi module belongs to short-range wireless transmission technology, and the electronic device may help the user send and receive information through the WiFi module.
Memory 1020 may be used to store software programs and data. Processor 1080 performs various functions and data processing for client device 110 by executing software programs or data stored in memory 1020. The memory 1020 may store an operating system and various application programs, and may also store a computer program that requests a text similarity calculation.
The display unit 1030 may be used to display information entered by or provided to the user, as well as a graphical user interface (GUI) of the various menus of the client device 110. In particular, the display unit 1030 may include a display screen 1032 disposed on the front side of the client device 110. The display unit 1030 may be used to display pages for the text similarity calculation operation in the embodiment of the present application.
The display unit 1030 may also be used to receive input numeric or character information and generate signal inputs related to user settings and function control of the client device 110. In particular, the display unit 1030 may include a touch screen 1031 disposed on the front of the client device 110 and may collect touch operations thereon or thereabout by a user.
The touch screen 1031 may be covered on the display screen 1032, or the touch screen 1031 may be integrated with the display screen 1032 to implement the input and output functions of the client device 110, and after integration, the touch screen may be simply referred to as a touch screen. The display unit 1030 may display an application program and corresponding operation steps in the present application.
The camera 1040 may be used to capture still images, and the user may comment on images captured by the camera 1040 through the application. The lens projects an optical image of the object onto a photosensitive element, which may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then passed to the processor 1080 to be converted into a digital image signal.
The client device may also include at least one sensor 1050, such as an acceleration sensor 1051, a distance sensor 1052, a fingerprint sensor 1053, and a temperature sensor 1054. The client device may also be configured with other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, light sensors, motion sensors, and the like.
Audio circuitry 1060, speakers 1061, microphone 1062 may provide an audio interface between a user and the client device 110. Audio circuit 1060 may transmit the received electrical signal after conversion of the audio data to speaker 1061 for conversion by speaker 1061 into an audio signal output. On the other hand, microphone 1062 converts the collected sound signals into electrical signals, which are received by audio circuitry 1060 and converted into audio data, which are output to communications component 1010 for transmission to, for example, another client device 110, or to memory 1020 for further processing.
The bluetooth module 1070 is used for exchanging information with other bluetooth devices having a bluetooth module through a bluetooth protocol.
Processor 1080 is a control center of the client device and connects the various parts of the overall terminal using various interfaces and lines, performs various functions of the client device and processes data by running or executing software programs stored in memory 1020 and invoking data stored in memory 1020. In some embodiments, processor 1080 may include at least one processing unit; processor 1080 may also integrate the application processor and the baseband processor. Processor 1080 of the present application may run an operating system, an application, a user interface display, and a touch response, as well as a training method for a text encoding model of an embodiment of the present application. In addition, a processor 1080 is coupled to the display unit 1030.
In some possible embodiments, aspects of the training method of a text encoding model provided by the present application may also be implemented in the form of a program product comprising a computer program for causing an electronic device to perform the steps of the training method of a text encoding model according to the various exemplary embodiments of the present application described herein above when the program product is run on the electronic device, e.g. the electronic device may perform the steps as shown in fig. 3B.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may take the form of a portable compact disc read only memory (CD-ROM) and comprise a computer program and may be run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave in which a readable computer program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
A computer program embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer programs for performing the operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer program may execute entirely on the consumer electronic device, partly on the consumer electronic device, as a stand-alone software package, partly on the consumer electronic device and partly on a remote electronic device or entirely on the remote electronic device or server. In the case of remote electronic devices, the remote electronic device may be connected to the consumer electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external electronic device (e.g., connected through the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this is not required or suggested that these operations must be performed in this particular order or that all of the illustrated operations must be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having a computer-usable computer program embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program commands may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the commands executed by the processor of the computer or other programmable data processing apparatus produce means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (15)

1. A method for training a text encoding model, comprising:
Performing multiple rounds of iterative training on the constructed coding guidance network and the pre-trained target coding network using preset sample texts, and constructing a text coding model based on the trained target coding network, wherein in the process of one round of iterative training, the following operations are performed:
Reading a sample text, and performing random masking processing on each word position content in the sample text according to a preset target masking proportion to obtain a masking text associated with a masking position and a sample masking result;
obtaining a text coding vector corresponding to the sample text by adopting the target coding network, and determining a mask prediction result in each candidate word covered by a preset word list by adopting the coding guide network corresponding to the mask position in the mask text under the guide of the text coding vector, wherein the coding guide network is constructed based on a network structure of a neural network model; the total number of the coding blocks in the target coding network is larger than the total number of the coding blocks in the coding guide network;
based on a result difference between the mask prediction result and the sample mask result, adjusting network parameters of the target encoding network and the encoding pilot network;
wherein the determining, for the mask position in the mask text and under the guidance of the text coding vector, a mask prediction result among the candidate words covered by the preset word list by adopting the coding guidance network comprises: constructing each initial coding result corresponding to the word position content of each word position in the mask text to which head word position content and tail word position content have been added, and adopting the text coding vector to replace the initial coding result corresponding to the head word position content of the mask text; and adopting the coding guidance network to obtain, for the initial coding result of each word position content in the mask text, a coding vector generated by coding under the influence of the initial coding results of the other word positions by means of a multi-head self-attention mechanism and nonlinear transformation processing, and determining a corresponding mask prediction result among the candidate words covered by the preset word list based on the coding vector corresponding to the mask position.
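To make the training round recited in claim 1 easier to follow, the following is a minimal PyTorch-style sketch of one such step. All identifiers (target_encoder, guide_network and its embed lookup), the mask id 103, and the 0.5 target mask proportion are illustrative assumptions rather than details fixed by the claim.

```python
# Minimal sketch of one claim-1 training round; all module names and the
# numeric choices (mask id 103, 0.5 mask proportion) are illustrative only.
import torch
import torch.nn.functional as F

def guided_training_step(target_encoder, guide_network, token_ids,
                         vocab_size, target_mask_ratio=0.5, mask_id=103):
    # Random mask processing at the (high) target mask proportion.
    mask = torch.rand(token_ids.shape, device=token_ids.device) < target_mask_ratio
    masked_ids = token_ids.masked_fill(mask, mask_id)
    labels = token_ids.masked_fill(~mask, -100)    # loss only at mask positions

    # The target coding network encodes the unmasked sample text; the
    # head-position vector serves as the text coding vector.
    text_vec = target_encoder(token_ids)[:, 0, :]              # [B, H]

    # Initial coding results for the mask text, with the head-position
    # encoding replaced by the text coding vector (the "guidance").
    init = guide_network.embed(masked_ids)                     # [B, L, H]
    init = torch.cat([text_vec.unsqueeze(1), init[:, 1:, :]], dim=1)

    # The shallower guidance network (fewer coding blocks than the target
    # network) predicts each masked word over the preset word list.
    logits = guide_network(init)                               # [B, L, V]

    # The result difference updates the parameters of BOTH networks.
    return F.cross_entropy(logits.reshape(-1, vocab_size),
                           labels.reshape(-1), ignore_index=-100)
```

Because the guidance network has fewer coding blocks, it cannot recover the masked content on its own; minimizing the same loss through both networks therefore pressures the target coding network to compress the sample text into a high-quality text coding vector.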
2. The method of claim 1, wherein the target coding network is trained by:
Performing multiple rounds of iterative training on the constructed initial coding network and initial prediction network by adopting training texts, to obtain the target coding network from the trained initial coding network, wherein during one round of iterative training, the following operations are executed:
reading a training text, and carrying out random mask processing on each word position content in the training text according to a preset initial mask proportion, to obtain an initial mask text associated with mask positions and sample mask content, wherein the initial mask proportion is lower than the target mask proportion;
and adopting the initial coding network to output a coding vector corresponding to the word position content of each word position in the initial mask text, adopting the initial prediction network to determine mask predicted content among the candidate words covered by a preset word list based on the coding vector corresponding to the mask position, and adjusting network parameters of the initial coding network and the initial prediction network based on the content difference between the mask predicted content and the sample mask content.
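Claim 2 describes conventional masked-language-model pre-training; the only claim-specific constraint is that the initial mask proportion is lower than the later target mask proportion. A sketch under the same illustrative assumptions as above:

```python
# Sketch of the claim-2 pre-training round: ordinary masked-language-model
# training, at an initial mask proportion lower than the later target one.
import torch
import torch.nn.functional as F

def mlm_pretraining_step(initial_encoder, initial_predictor, token_ids,
                         vocab_size, initial_mask_ratio=0.15, mask_id=103):
    mask = torch.rand(token_ids.shape, device=token_ids.device) < initial_mask_ratio
    masked_ids = token_ids.masked_fill(mask, mask_id)
    labels = token_ids.masked_fill(~mask, -100)

    hidden = initial_encoder(masked_ids)      # coding vector per word position
    logits = initial_predictor(hidden)        # scores over the preset word list
    return F.cross_entropy(logits.reshape(-1, vocab_size),
                           labels.reshape(-1), ignore_index=-100)
```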
3. The method of claim 1, wherein each sample text is obtained by:
collecting initial texts from the text data of each service type;
unifying the character encoding formats of the initial texts and deleting non-text content from the initial texts, to obtain processed candidate texts;
and deleting repeated texts from the candidate texts and deleting abnormal texts containing preset illegal keywords, to obtain the sample texts.
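A minimal sketch of this corpus construction pipeline. The NFKC normalization, the HTML-stripping regex, and the exact-match duplicate check are illustrative choices; the claim does not prescribe specific mechanisms.

```python
# Illustrative claim-3 pipeline: unify character encoding, strip non-text
# content, then remove duplicates and texts containing banned keywords.
import re
import unicodedata

def build_sample_texts(raw_texts, banned_keywords):
    candidates = []
    for text in raw_texts:
        text = unicodedata.normalize("NFKC", text)    # unify encoding format
        text = re.sub(r"<[^>]+>", " ", text)          # drop HTML-like content
        text = re.sub(r"\s+", " ", text).strip()
        if text:
            candidates.append(text)

    seen, samples = set(), []
    for text in candidates:
        if text in seen:                              # repeated text
            continue
        if any(word in text for word in banned_keywords):  # abnormal text
            continue
        seen.add(text)
        samples.append(text)
    return samples
```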
4. The method of claim 3, wherein the deleting repeated texts comprises:
calculating, for each candidate text, a text hash value by adopting a preset hash function;
and determining repeated texts among the candidate texts according to the text hash values, and deleting the repeated texts from the candidate texts.
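Claim 4 refines the duplicate check of claim 3 by comparing hash values instead of full texts. A sketch, with MD5 standing in for the unspecified preset hash function:

```python
# Claim-4 de-duplication by hash comparison; MD5 stands in for the
# unspecified "preset hash function".
import hashlib

def deduplicate(candidate_texts):
    seen_hashes, kept = set(), []
    for text in candidate_texts:
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()  # text hash value
        if digest not in seen_hashes:       # identical hash => repeated text
            seen_hashes.add(digest)
            kept.append(text)
    return kept
```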
5. The method of claim 1, wherein the obtaining a text coding vector corresponding to the sample text by adopting the target coding network comprises:
adding head word position content and tail word position content to the sample text, and constructing each initial coding result corresponding to the word position content of each word position in the processed sample text;
adopting the target coding network to obtain, for the initial coding result of each word position, a coding vector generated by coding under the influence of the initial coding results of the other word positions by means of a multi-head self-attention mechanism and nonlinear transformation processing;
and determining the coding vector corresponding to the head word position content as the text coding vector of the sample text.
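In other words, the text coding vector is read off the head position, in the style of a [CLS] token. A sketch, in which the tokenize interface and the ids 101/102 (the [CLS]/[SEP] ids of BERT-style word lists) are assumptions:

```python
# Claim-5 sketch: add head/tail word position contents, encode, and read the
# text coding vector off the head position.
import torch

def text_coding_vector(target_encoder, tokenize, text, cls_id=101, sep_id=102):
    ids = [cls_id] + tokenize(text) + [sep_id]    # tokenize: text -> id list
    hidden = target_encoder(torch.tensor([ids]))  # [1, L, H]
    return hidden[:, 0, :]                        # head-position coding vector
```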
6. The method of claim 5, wherein the constructing each initial coding result corresponding to the word position content of each word position in the processed sample text comprises:
performing word segmentation and encoding on the processed sample text to obtain a content coding result corresponding to each word position in the sample text, and obtaining a position coding result corresponding to each word position;
and superposing the content coding result and the position coding result corresponding to the same word position, to obtain the initial coding result corresponding to each word position.
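The superposition is an element-wise sum of a token embedding and a position embedding, as in standard Transformer encoders. A sketch; the learned (rather than sinusoidal) position embedding is an illustrative choice:

```python
# Claim-6 sketch: the initial coding result of a word position is the sum of
# a content (token) embedding and a learned position embedding.
import torch
import torch.nn as nn

class InitialEncoding(nn.Module):
    def __init__(self, vocab_size, max_len, hidden_size):
        super().__init__()
        self.content = nn.Embedding(vocab_size, hidden_size)   # content coding
        self.position = nn.Embedding(max_len, hidden_size)     # position coding

    def forward(self, token_ids):                  # token_ids: [batch, seq]
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        # Superpose the two results at each word position.
        return self.content(token_ids) + self.position(positions)
```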
7. The method according to any one of claims 1-6, further comprising, after the text coding model is trained based on the target coding network:
acquiring a target question text input by a target object, and acquiring pre-stored candidate question texts;
adopting the text coding model to output a target coding vector corresponding to the target question text and candidate coding vectors corresponding to the candidate question texts;
and calculating the vector similarity between the target coding vector and each candidate coding vector, determining, among the candidate question texts and based on the vector similarity, a reference question text similar to the target question text, and feeding back reply content associated with the reference question text to the target object.
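The retrieval step of claims 7 and 8 reduces to a nearest-neighbour search over coding vectors. A sketch using cosine similarity, which is one common choice of vector similarity; the claims do not fix the measure:

```python
# Retrieval step shared by claims 7 and 8: cosine similarity between the
# target coding vector and each candidate, then the top-scoring candidates.
import torch
import torch.nn.functional as F

def most_similar(target_vec, candidate_vecs, top_k=3):
    # target_vec: [hidden]; candidate_vecs: [num_candidates, hidden]
    sims = F.cosine_similarity(target_vec.unsqueeze(0), candidate_vecs, dim=-1)
    scores, indices = sims.topk(min(top_k, candidate_vecs.size(0)))
    return indices.tolist(), scores.tolist()
```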
8. The method according to any one of claims 1-6, further comprising, after the text coding model is trained based on the target coding network:
acquiring a target text browsed by a target object, and acquiring candidate texts to be recommended;
adopting the text coding model to output a target coding vector corresponding to the target text and candidate coding vectors corresponding to the candidate texts;
and calculating the vector similarity between the target coding vector and each candidate coding vector, and determining, among the candidate texts and based on the vector similarity, a text to be recommended that is similar to the target text.
9. A training device for a text coding model, comprising:
a first training unit, configured to perform multiple rounds of iterative training on the constructed coding guidance network and a target coding network obtained by pre-training by adopting preset sample texts, and to construct a text coding model based on the trained target coding network, wherein during one round of iterative training, the following operations are executed:
Reading a sample text, and performing random mask processing on each word position content in the sample text according to a preset target mask proportion, to obtain a mask text associated with mask positions and a sample mask result;
obtaining a text coding vector corresponding to the sample text by adopting the target coding network, and determining, for the mask position in the mask text and under the guidance of the text coding vector, a mask prediction result among the candidate words covered by a preset word list by adopting the coding guidance network, wherein the coding guidance network is constructed based on the network structure of a neural network model, and the total number of coding blocks in the target coding network is greater than the total number of coding blocks in the coding guidance network;
based on a result difference between the mask prediction result and the sample mask result, adjusting network parameters of the target coding network and the coding guidance network;
wherein the determining, for the mask position in the mask text and under the guidance of the text coding vector, a mask prediction result among the candidate words covered by the preset word list by adopting the coding guidance network comprises: constructing each initial coding result corresponding to the word position content of each word position in the mask text to which head word position content and tail word position content have been added, and adopting the text coding vector to replace the initial coding result corresponding to the head word position content of the mask text; and adopting the coding guidance network to obtain, for the initial coding result of each word position content in the mask text, a coding vector generated by coding under the influence of the initial coding results of the other word positions by means of a multi-head self-attention mechanism and nonlinear transformation processing, and determining a corresponding mask prediction result among the candidate words covered by the preset word list based on the coding vector corresponding to the mask position.
10. The apparatus of claim 9, further comprising a second training unit, wherein the target coding network is trained by the second training unit by:
Performing multiple rounds of iterative training on the constructed initial coding network and initial prediction network by adopting training texts, to obtain the target coding network from the trained initial coding network, wherein during one round of iterative training, the following operations are executed:
reading a training text, and carrying out random mask processing on each word position content in the training text according to a preset initial mask proportion, to obtain an initial mask text associated with mask positions and sample mask content, wherein the initial mask proportion is lower than the target mask proportion;
and adopting the initial coding network to output a coding vector corresponding to the word position content of each word position in the initial mask text, adopting the initial prediction network to determine mask predicted content among the candidate words covered by a preset word list based on the coding vector corresponding to the mask position, and adjusting network parameters of the initial coding network and the initial prediction network based on the content difference between the mask predicted content and the sample mask content.
11. The apparatus according to claim 9 or 10, wherein, after the text coding model is trained based on the target coding network, the apparatus further comprises a calculating unit configured to:
acquire a target question text input by a target object, and acquire pre-stored candidate question texts;
adopt the text coding model to output a target coding vector corresponding to the target question text and candidate coding vectors corresponding to the candidate question texts;
and calculate the vector similarity between the target coding vector and each candidate coding vector, determine, among the candidate question texts and based on the vector similarity, a reference question text similar to the target question text, and feed back reply content associated with the reference question text to the target object.
12. The apparatus according to claim 9 or 10, wherein, after the text coding model is trained based on the target coding network, the apparatus further comprises a calculating unit configured to:
acquire a target text browsed by a target object, and acquire candidate texts to be recommended;
adopt the text coding model to output a target coding vector corresponding to the target text and candidate coding vectors corresponding to the candidate texts;
and calculate the vector similarity between the target coding vector and each candidate coding vector, and determine, among the candidate texts and based on the vector similarity, a text to be recommended that is similar to the target text.
13. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1-8.
14. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-8.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202410269435.XA 2024-03-11 Training method and device for text coding model, electronic equipment and storage medium Active CN117875266B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410269435.XA CN117875266B (en) 2024-03-11 Training method and device for text coding model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117875266A CN117875266A (en) 2024-04-12
CN117875266B true CN117875266B (en) 2024-06-28

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110196894A (en) * 2019-05-30 2019-09-03 北京百度网讯科技有限公司 The training method and prediction technique of language model
CN113591908A (en) * 2021-06-23 2021-11-02 北京百度网讯科技有限公司 Event designation matching method, model training method and device and electronic equipment


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant