CN117436443A - Model construction method, text generation method, device, equipment and medium - Google Patents


Info

Publication number
CN117436443A
CN117436443A (application CN202311754111.7A)
Authority
CN
China
Prior art keywords
sparse
sample
text generation
text
topic
Prior art date
Legal status
Granted
Application number
CN202311754111.7A
Other languages
Chinese (zh)
Other versions
CN117436443B (en)
Inventor
刘陆阳
张闯
林群阳
王敏
Current Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd filed Critical Suzhou Metabrain Intelligent Technology Co Ltd
Priority to CN202311754111.7A priority Critical patent/CN117436443B/en
Publication of CN117436443A publication Critical patent/CN117436443A/en
Application granted granted Critical
Publication of CN117436443B publication Critical patent/CN117436443B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

An embodiment of the invention provides a model construction method, a text generation method, an apparatus, a device, and a medium, relating to the technical field of text generation. The method comprises the following steps: the text generation model to be trained comprises a sparse topic coding network, a topic feature adaptation network, and a text generation network; the acquired sample subject word distribution matrix is processed by the sparse topic coding network to obtain a sample sparse topic context vector; the sample sparse topic context vector is converted by the topic feature adaptation network to obtain a sample context vector; the sample context vector and the sample text are processed by the text generation network to obtain the output of the text generation network; and the network parameters of the three networks are updated according to the calculated loss function values until the loss function converges, yielding a trained text generation model. In this way, collaborative training of the topic mining module and the text generation module is realized within the text generation model, and the interpretability of the text generation model is improved.

Description

Model construction method, text generation method, device, equipment and medium
Technical Field
The present invention relates to the field of text generation technologies, and in particular, to a model building method, a text generation method, a device, equipment, and a medium.
Background
Thanks to the rapid development of artificial intelligence and deep learning in recent years, probabilistic generative models based on the Variational Autoencoder (VAE) and the Generative Adversarial Network (GAN) are becoming an important foundation for research on Text Generation technology. Current mainstream text generation methods mainly preset a parameterized hidden-variable distribution that is sampled to obtain a hidden-state random variable controlling text generation. This can greatly improve text generation performance, but the random variable that controls text generation in these methods is often not interpretable.
Based on this, using knowledge to implement controllable text generation is an effective way to solve the above problem. Current controllable text generation methods basically adopt a similar two-step approach: 1. acquire data topic information by using an existing topic model; 2. integrate the topic information into the text generation framework as additional input. In effect, this creates a split between "topic mining" and "text generation": a topic mining model must first be trained, and then used as a parameter-fixed module that outputs topic-word related information to the text generation model in order to enhance the consistency of the generated text with the topic. Thus, current topic-controlled text generation methods not only make collaborative training of topic mining and text generation difficult, but also suffer from the problem that the text generation effect and the interpretability of the text are diluted when the data volume and vocabulary are large.
Therefore, how to improve the interpretability of the text generation model on the basis of the collaborative training topic mining module and the text generation module is a technical problem to be solved in the invention.
Disclosure of Invention
The embodiment of the invention provides a model construction method, a text generation method, a device, equipment and a medium, which are used for realizing the cooperative training of a theme mining module and a text generation module in a text generation model and improving the interpretability of the text generation model.
The first aspect of the embodiment of the invention provides a model construction method, which is applied to a text generation system with controllable sparse theme, wherein the text generation system with controllable sparse theme at least comprises: the system comprises a sample subject word acquisition unit, a sample sparse subject conversion unit, a sample text generation unit and a model training unit; the method comprises the following steps:
acquiring a sample subject term distribution matrix through the sample subject term acquisition unit;
the sample sparse topic obtaining unit is used for processing the sample topic word distribution matrix based on a sparse topic coding network in a text generation model to be trained to obtain a sample sparse topic context vector;
Converting the sample sparse topic context vector by the sample sparse topic conversion unit based on a topic feature adaptation network in a text generation model to be trained to obtain a sample context vector;
processing the sample context vector and the sample text in the sample text data set based on a text generation network in a text generation model to be trained by the sample text generation unit to obtain the output of the text generation network;
and calculating a loss function value based on the output of the text generation network through the model training unit, and updating network parameters of the sparse topic coding network, the topic feature adaptation network and the text generation network according to the loss function value until the loss function converges to obtain a trained text generation model.
The second aspect of the embodiment of the invention provides a text generation method, which is applied to a text generation system with controllable sparse theme, wherein the text generation system with controllable sparse theme at least comprises: the system comprises a target subject word acquisition unit, a target sparse subject conversion unit and a target text generation unit; the method comprises the following steps:
Acquiring a target subject term distribution matrix through the target subject term acquisition unit;
processing the target subject word distribution matrix based on a sparse subject coding network in a text generation model by the target sparse subject acquisition unit to obtain a target sparse subject context vector;
converting the target sparse topic context vector based on a topic feature adaptation network in a text generation model by the target sparse topic conversion unit to obtain a target context vector;
processing the target context vector and the text to be processed based on a text generation network in a text generation model by the target text generation unit to obtain a target predicted text;
the text generation model is a trained text generation model obtained by the model construction method in the first aspect.
A third aspect of the embodiments of the present invention provides a model building device, which is applied to a text generation system with controllable sparse theme, where the text generation system with controllable sparse theme at least includes: the system comprises a sample subject word acquisition unit, a sample sparse subject conversion unit, a sample text generation unit and a model training unit; the device comprises:
The first acquisition module is used for acquiring a sample subject term distribution matrix through the sample subject term acquisition unit;
the first processing module is used for processing the sample subject word distribution matrix through the sample sparse subject acquisition unit based on a sparse subject coding network in a text generation model to be trained to obtain a sample sparse subject context vector;
the first conversion module is used for converting the sample sparse topic context vector through the sample sparse topic conversion unit based on a topic feature adaptation network in a text generation model to be trained to obtain a sample context vector;
the text output module is used for processing the sample context vector and the sample text in the sample text data set based on a text generation network in a text generation model to be trained through the sample text generation unit to obtain the output of the text generation network;
and the model training module is used for calculating a loss function value based on the output of the text generation network through the model training unit, updating network parameters of the sparse topic coding network, the topic feature adaptation network and the text generation network according to the loss function value until the loss function converges, and obtaining a trained text generation model.
A fourth aspect of the present invention provides a text generating device, which is applied to a text generating system with controllable sparse theme, where the text generating system with controllable sparse theme at least includes: the system comprises a target subject word acquisition unit, a target sparse subject conversion unit and a target text generation unit; the device comprises:
the second acquisition module is used for acquiring a target subject term distribution matrix through the target subject term acquisition unit;
the second processing module is used for processing the target subject word distribution matrix based on a sparse subject coding network in a text generation model through the target sparse subject acquisition unit to obtain a target sparse subject context vector;
the second conversion module is used for converting the target sparse topic context vector based on a topic feature adaptation network in a text generation model through the target sparse topic conversion unit to obtain a target context vector;
the text prediction module is used for processing the target context vector and the text to be processed based on a text generation network in a text generation model through the target text generation unit to obtain a target predicted text;
The text generation model is a trained text generation model obtained by the model construction method in the first aspect.
A fifth aspect of the embodiments of the present invention provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program implementing steps of the model building method according to the first aspect of the embodiments of the present invention or implementing steps of the text generating method according to the second aspect of the embodiments of the present invention when executed by the processor.
A sixth aspect of the embodiments of the present invention provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the model building method according to the first aspect of the embodiments of the present invention, or implements the steps of the text generating method according to the second aspect of the embodiments of the present invention.
In the method for constructing the model provided by the embodiment of the invention, a text generation model to be trained comprises a sparse topic coding network, a topic feature adaptation network and a text generation network, and in the model training process, a sample topic word distribution matrix is firstly obtained; secondly, processing the sample subject word distribution matrix through a sparse subject coding network to obtain a sample sparse subject context vector; then, converting the sample sparse topic context vector through a topic feature adaptation network to obtain a sample context vector; then, processing the sample context vector and the sample text in the sample text data set through a text generation network to obtain the output of the text generation network; and finally, calculating a loss function value based on the output of the text generation network, and updating network parameters of the three networks according to the loss function value until the loss function converges to obtain a trained text generation model. According to the model construction method provided by the embodiment of the invention, the sparse topic context representation is obtained through the sparse topic coding network, the output of the sparse topic coding network is matched with the text generation network through the topic feature adaptation network, and finally the text generation network is trained on a large-scale corpus, so that the text generation of sparse topic guidance is finally realized, the cooperative training of the sparse topic mining module (namely the sparse topic coding network) and the text generation module (namely the text generation network) is realized, and the interpretability of the text generation model is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart illustrating steps of a model building method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a sparse topic-controllable text generation model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a sparse topic encoding network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a theme adaptation network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a text generation network according to an embodiment of the present invention;
FIG. 6 is a flow chart illustrating a text generation model training phase in a conventional mode in accordance with an embodiment of the present invention;
FIG. 7 is a flow chart illustrating a text generation model training phase in an external topic mode in accordance with an embodiment of the present invention;
FIG. 8 is a flow chart of a text generation method according to an embodiment of the present invention;
FIG. 9 is a flow chart illustrating a text generation model reasoning phase in a conventional mode in accordance with an embodiment of the present invention;
FIG. 10 is a flow chart illustrating a text generation model reasoning phase in an external topic model in accordance with an embodiment of the present invention;
FIG. 11 is a block diagram showing a construction of a model construction apparatus according to an embodiment of the present invention;
fig. 12 is a block diagram showing a structure of a text generating apparatus according to an embodiment of the present invention;
fig. 13 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Currently, using knowledge to explain and improve the hidden variables that control text generation, and thereby improve their interpretability, is a major hot topic in controllable text generation technology. In the field of knowledge-controllable text generation, the meaning of "knowledge" is quite rich; as described above, guiding text generation with a "topic" as the "knowledge" has become an important line of thought for text generation. For example, representative methods among the related topic-controlled text generation techniques are SVAE, TGVAE, BLT-NMT, etc. What they have in common is that an LSTM (Long Short-Term Memory network) is used as the base of the text generation model, and an NTM (Neural Topic Model) or CNN (Convolutional Neural Network) feature extractor is then attached to extract topic features and fuse them into the text generation process.
A topic model is a model that mines document topics and topic-word distributions from a document set in an unsupervised manner. Common uses include: 1. after training on the document set is completed, using the document topic vector of an input document as a document vector to compress and classify the data; 2. after training on the document set is completed, deriving the subject words corresponding to the topic-word distribution tensor β to serve as the topics covered by the data set.
The combination of a VAE and a recurrent neural network can cause the vanishing gradient problem and make the model difficult to train, so the topic-controllable text generation methods in the related art have to adopt a model combination with a separated structure, which makes collaborative training of topic mining and text generation difficult. In addition, the topic-controllable text generation methods in the related art mainly use dense topic representations, whose text generation effect and interpretability are diluted when the data volume and vocabulary are large.
Therefore, in order to at least partially solve one or more of the above problems and other potential problems, an embodiment of the present invention proposes a model construction method for collaboratively training topic mining and text generation. The text generation model to be trained includes three parts: a sparse topic coding network, a topic feature adaptation network, and a text generation network. During model training, a sparse topic context representation is obtained through the sparse topic coding network, the output sparse topic context representation is then adapted to the text generation network through the proposed topic feature adaptation network, and finally the text generation model is trained on a large-scale corpus, ultimately realizing sparse-topic-guided text generation. This not only makes collaborative training of the sparse topic mining module and the text generation module possible, but also improves the interpretability of the text generation model.
Referring to fig. 1, fig. 1 is a flowchart illustrating the steps of a model construction method according to an embodiment of the present invention, where the method is applied to a sparse topic-controllable text generation system. The sparse topic-controllable text generation system of this embodiment is at least used for training a sparse topic-controllable text generation model, and the text generation model is used to predict, from the text to be processed, text related to given subject words. The sparse topic-controllable text generation system of this embodiment is deployed on a terminal or a server; the terminal (which may also be referred to as a device) may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a wearable electronic device (e.g., a smart watch), a vehicle-mounted terminal, a smart home appliance (e.g., a smart television), an AR/VR device, and the like. The server may be an independent physical server, a server cluster or distributed system (such as a distributed cloud storage system) formed by a plurality of physical servers, or a cloud server providing cloud computing and cloud storage services.
The sparse topic-controllable text generation system of the embodiment may at least include: the system comprises a sample subject word acquisition unit, a sample sparse subject conversion unit, a sample text generation unit and a model training unit.
As shown in fig. 1, the model construction method of the present embodiment may include the steps of:
step S11: and acquiring a sample subject term distribution matrix through the sample subject term acquisition unit.
In the training process of the text generation model to be trained, the sample subject word distribution matrix can be acquired through the sample subject word acquisition unit, so as to provide the core around which text is generated during training: the text generation model can organize words, sentences, and paragraphs around the subject words in the sample subject word distribution matrix to generate text. The sample subject word distribution matrix of this embodiment is a subject word distribution matrix used for model training, and it is a multinomial subject word distribution matrix.
Step S12: and processing the sample subject word distribution matrix by the sample sparse subject acquisition unit based on a sparse subject coding network in a text generation model to be trained to obtain a sample sparse subject context vector.
In this embodiment, the text generation model to be trained includes at least three parts: the sparse topic coding network, the topic feature adaptation network, and the text generation network. The subject word distribution is typically a distribution defined over the vocabulary, and the vocabulary is longer when the data set is larger, whereas most topic mining processes in the related art rely on statistics-based NTM or CNN feature extractors and lack a topic context representation. Therefore, in this embodiment, the obtained sample subject word distribution matrix may be processed by the sample sparse topic obtaining unit based on the sparse topic coding network to obtain the context representation of the sample sparse topic.
The sparse topics in this embodiment are the m highest-probability words of each topic, i.e., the topic keywords, and can further be represented as a sparse topic word distribution using a sparse tensor. A sample sparse topic is a set of sparse topic keywords used for model training, and the sparse topic context representation means that each word in a sparse topic has an association relationship with the preceding and following topic words. Based on this, in this embodiment, after the sample subject word distribution matrix is obtained, it can be processed through the sparse topic coding network to obtain a sample sparse topic context vector. The sample sparse topic context vector is a sparse topic word vector with a context representation, used for model training.
Step S13: and converting the sample sparse topic context vector by the sample sparse topic conversion unit based on a topic feature adaptation network in a text generation model to be trained to obtain a sample context vector.
In this embodiment, the sample sparse topic context vector output by the sparse topic coding network is not adapted to the input of the text generation network, and cannot be directly input to the text generation network to perform text generation model training, so this embodiment provides a topic feature adaptation network between the sparse topic coding network and the text generation network, where the topic feature adaptation network is mainly used to transform the sparse topic context vector, so that the output of the topic feature adaptation network is adapted to the text generation network.
Therefore, after obtaining the sample sparse topic context vector, the present embodiment may input the sample sparse topic context vector to the topic feature adaptation network through the sample sparse topic conversion unit, and convert the sample sparse topic context vector through the topic feature adaptation network, so as to obtain an output of the topic feature adaptation network: sample context vector. The sample context vector in this embodiment is a context vector in which the sample sparse subject context vector is converted.
Step S14: and processing the sample context vector and the sample text in the sample text data set based on a text generation network in a text generation model to be trained by the sample text generation unit to obtain the output of the text generation network.
In this embodiment, for model training of a text generation model to be trained, a sample text data set for training is prepared, the sample text data set including a plurality of sample texts. After the sample context vector is obtained, the sample context vector and the sample text can be input into a text generation network through a sample text generation unit, so that the sample context vector and the sample text are processed through the text generation network, and output of the text generation network is obtained. The text generation network of the present embodiment may be an autoregressive text generation network based on a transformer framework.
Step S15: and calculating a loss function value based on the output of the text generation network through the model training unit, and updating network parameters of the sparse topic coding network, the topic feature adaptation network and the text generation network according to the loss function value until the loss function converges to obtain a trained text generation model.
In this embodiment, after obtaining the output of the text generation network, the model training unit may calculate a loss function value according to the output of the text generation network based on a preset model loss function, and then update parameters of the network parameters of the sparse topic coding network, the network parameters of the topic feature adaptation network, and the network parameters of the text generation network based on the loss function value until the loss function converges, so as to complete model training of the text generation model, and obtain a trained text generation model. The text generation model trained by the embodiment is a text generation model with controllable sparse subjects.
According to the method, the text generation model to be trained is divided into three network parts: a sparse topic context vector is obtained through the sparse topic coding network, the sparse topic context vector is then converted by the topic feature adaptation network into a context vector adapted to the text generation network, and finally the text generation network is trained on the context vector and a large-scale sample text data set, with the network parameters of the three networks updated according to the loss function values until a text generation model capable of sparse topic guidance is obtained. Moreover, the vanishing gradient problem does not occur when text generation is performed with the Transformer framework.
In an alternative implementation, as shown in fig. 2, fig. 2 is a schematic structural diagram of a sparse topic-controllable text generation model according to an embodiment of the present invention. In this embodiment, the sparse topic-controllable text generation (TGTG, Topic-Guided Text Generation) model mainly includes three parts: a sparse topic coding network (e.g., a sparse topic context coding network), a topic feature adaptation network, and a text generation network (e.g., an autoregressive text generation network). In the model training process, the acquired sample subject word distribution matrix (e.g., a topic keyword sequence) is input into the sparse topic coding network for processing to obtain a sample sparse topic context vector; the sample sparse topic context vector representation is then converted by the topic feature adaptation network to obtain a context vector adapted to the text generation network; finally, the context vector and the sample text are input into the text generation network to train the text generation model, ultimately obtaining a sparse topic-controllable text generation model for predicting and generating topic-controllable text.
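To make the three-part structure of fig. 2 concrete, the following is a minimal PyTorch sketch of how the three networks might be wired together for one forward pass; the class and parameter names are illustrative assumptions rather than the patent's reference implementation, and the internals of the three sub-networks are left abstract.

```python
import torch
import torch.nn as nn

class TGTGModel(nn.Module):
    """Hypothetical skeleton of a sparse-topic-controllable text generation model:
    sparse topic coding network -> topic feature adaptation network -> text generation network."""

    def __init__(self, sparse_topic_encoder: nn.Module,
                 topic_adapter: nn.Module,
                 text_generator: nn.Module):
        super().__init__()
        self.sparse_topic_encoder = sparse_topic_encoder  # produces the sparse topic context vector wec
        self.topic_adapter = topic_adapter                # maps wec to a context vector adapted to the decoder
        self.text_generator = text_generator              # autoregressive Transformer-style decoder

    def forward(self, topic_word_distribution: torch.Tensor, sample_text_ids: torch.Tensor):
        # Step S12: sample topic word distribution matrix -> sample sparse topic context vector
        wec = self.sparse_topic_encoder(topic_word_distribution)
        # Step S13: sparse topic context vector -> sample context vector adapted to the text generation network
        context = self.topic_adapter(wec)
        # Step S14: sample context vector + sample text -> output (next-word distributions)
        return self.text_generator(sample_text_ids, context)
```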
In combination with the above embodiment, in an implementation manner, the embodiment of the invention further provides a model building method. In this method, the step S12 may specifically include step S21 and step S22:
Step S21: and sampling a sparse subject word sequence in the sample subject word distribution matrix based on the sparse subject coding network through the sample sparse subject acquisition unit.
In this embodiment, after the subject word distribution matrix is obtained, the sample sparse subject obtaining unit may sample the sample subject word distribution matrix based on the sparse subject coding network, so as to sample a sparse subject word sequence in the sample subject word distribution matrix.
Step S22: and inputting the sparse subject word sequence to a pre-training transducer network in the sparse subject coding network to obtain the sample sparse subject context vector.
A pre-training language model is introduced into the sparse topic coding network of the embodiment and is used for processing the sparse topic word sequence to obtain the context representation of the sparse topic. Specifically, the sparse subject word sequence can be processed into short texts through a pre-training language model, and then the short texts are formed into the ebedding features for subsequent processing. The pre-training language model of the present embodiment may be a pre-training transducer network, for example, a T5 model, and so on.
In this embodiment, after the sparse subject word sequence is obtained by sampling, the sparse subject word sequence may be input to a pre-training transducer network in the sparse subject encoding network, so as to obtain a context vector of the sample sparse subject.
In this embodiment, an external pre-trained language model is introduced into the sparse topic coding network, and the sampled sparse topic words can be processed based on word vectors or knowledge of the pre-trained language model to obtain context representations of the sparse topic words, so as to promote association between subsequent text generation and controllable sparse topic keywords, and further promote interpretability of the text generation model.
In combination with the above embodiment, in an implementation manner, the embodiment of the invention further provides a model building method. In this method, the "sampling the sparse subject word sequence in the sample subject word distribution matrix" in the above step S21 may specifically include steps S31 to S33:
step S31: and sampling the sample subject term distribution matrix to obtain a sample subject feature matrix.
In this embodiment, the obtained sample subject word distribution matrix may be β, computed from a pre-trained word vector matrix ρ (of size V × 100) and topic vectors α (of size K × 100), where K is the number of topics, V is the vocabulary length, and 100 is the word vector dimension. The sample subject word distribution matrix β is sampled through the sparse topic coding network, for example by Gumbel-Softmax sampling, to obtain the sample topic feature matrix β̂.
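As an illustration of this sampling step, the short sketch below uses PyTorch's built-in Gumbel-Softmax to turn a K × V topic matrix into a sample topic feature matrix; the dimensions, the temperature, and the use of random logits as a stand-in for the real matrix β are assumptions made only for demonstration.

```python
import torch
import torch.nn.functional as F

K, V = 50, 10000                  # number of topics, vocabulary length (illustrative values)
beta = torch.randn(K, V)          # stand-in logits for the sample topic word distribution matrix

# Gumbel-Softmax sampling over the vocabulary dimension of each topic row,
# yielding the sample topic feature matrix beta_hat of shape (K, V).
beta_hat = F.gumbel_softmax(beta, tau=1.0, hard=False, dim=-1)
```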
Step S32: and inputting the sample theme feature matrix to a salient theme STL layer, and processing based on non-zero element number network parameters to obtain a sparse theme matrix.
In this embodiment, after the sample topic feature matrix β̂ is obtained, it can be input to the salient topic STL layer and processed based on the non-zero-element-number network parameter tpk to obtain the sparse topic matrix β̃. The sparse topic matrix β̃ is the set of sparse keywords of each topic and their probabilities. The non-zero-element-number network parameter (i.e., the tpk network parameter) determines how elements are selected: the sample topic feature matrix β̂ is a K × V matrix that is to be made sparse, each row of the matrix is one topic, and each row keeps only tpk words; for example, given V = 10000, tpk may be set to 20. Thus, after passing through the salient topic STL layer, the sparse topic matrix β̃ is obtained: in this sparse matrix each row has only tpk non-zero elements and all other elements are 0, i.e., β̃ is a sparse matrix whose elements are mostly 0, normalized by rows.
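A minimal sketch of such a salient topic (STL) operation is shown below, assuming it keeps the tpk largest entries of each topic row, zeroes the rest, and re-normalizes each row; the exact selection rule used in the patent may differ.

```python
import torch

def salient_topic_layer(beta_hat: torch.Tensor, tpk: int) -> torch.Tensor:
    """Keep only the tpk largest entries of each topic row, zero the rest, and row-normalize."""
    values, indices = beta_hat.topk(tpk, dim=-1)         # top-tpk probabilities per topic row
    sparse = torch.zeros_like(beta_hat)
    sparse.scatter_(-1, indices, values)                 # place the kept values back at their positions
    return sparse / sparse.sum(dim=-1, keepdim=True)     # each row sums to 1 again

# e.g. with V = 10000 and tpk = 20, each of the K rows keeps 20 non-zero entries.
```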
Step S33: performing feature textualization on the sparse topic matrix to obtain sparse topic word information, wherein the sparse topic word information at least comprises: the sparse subject word sequence.
In this embodiment, after the sparse topic matrix β̃ is obtained, it can be feature-textualized: each row of β̃ is a topic, i.e., the distribution of that topic over the vocabulary, so the sparse topic matrix β̃ is converted to obtain the sparse subject word information. The sparse subject word information includes at least the sparse subject word sequence, which is the set of topic keywords.
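Under the same assumptions, the feature textualization step could look like the following sketch: each sparse topic row is converted into its keyword list plus the matching weight vector. The vocabulary lookup table id_to_word is a hypothetical helper, not something defined in the patent.

```python
import torch

def textualize_topics(sparse_topics: torch.Tensor, id_to_word: dict, tpk: int):
    """Turn each sparse topic row into its keyword list plus the matching weight vector."""
    weights, word_ids = sparse_topics.topk(tpk, dim=-1)   # the non-zero entries of each row
    keywords = [[id_to_word[int(i)] for i in row] for row in word_ids]
    return keywords, weights                               # sparse subject word sequences and weight vectors
```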
In combination with the above embodiment, in an implementation manner, the embodiment of the invention further provides a model building method. In the method, the sparse subject term information further comprises: the weight vector corresponding to the sparse subject word sequence; the step S22 may specifically include step S41 and step S42:
step S41: and after the sparse subject word sequence is digitized, inputting the sparse subject word sequence into an encoder of the pre-training transducer network to obtain a context output vector.
In this embodiment, the sparse subject word information includes a weight vector corresponding to the sparse subject word sequence in addition to the sparse subject word sequence. After the sparse subject word sequence is obtained, the sparse subject word sequence is subjected to numerical processing, and a numerical processing result is obtained. And inputting the numeric processing result into an encoder of the pre-training transducer network to obtain a context output vector output by the encoder of the transducer network.
For example, the sparse subject word sequence may be input as a feature into a token layer of a pre-trained transducer network, for example into a token layer of a pre-trained T5 model, resulting in an output of the token layer, and then the output of the token layer is input into an encoder of the T5 model, such as into a context output vector of the T5-base-encoder resulting in an output
Step S42: and obtaining the sample sparse subject context vector based on the context output vector and the weight vector corresponding to the sparse subject word sequence.
In this embodiment, after the context output vector is obtained, the sample sparse topic context vector may be obtained from the context output vector and the weight vector corresponding to the sparse subject word sequence. For example, the output of the sparse topic coding network, i.e., the sample sparse topic context vector wec, may be obtained by matrix-multiplying the weight vector corresponding to the sparse subject word sequence with the context output vector, as in the following formula (1):

wec = weight · h_enc    (1)

where weight is the weight vector corresponding to the sparse subject word sequence and h_enc is the context output vector.
In this embodiment, a sparse topic coding network with discrete distribution sampling is provided based on a Transformer model and the salient topic STL layer, so that the sample subject word distribution matrix is processed through the sparse topic coding network to obtain a sparse topic context representation fused with the knowledge of the pre-trained language model, thereby further improving the text generation effect and the interpretability of the text generation model.
In an alternative implementation, as shown in fig. 3, fig. 3 is a schematic structural diagram of a sparse topic coding network according to an embodiment of the present invention. In this embodiment, after the sample subject word distribution matrix is obtained, Gumbel-Softmax sampling is performed on it through the sampling layer of the sparse topic coding network to obtain the sample topic feature matrix β̂; the sample topic feature matrix is then input into the salient topic STL layer of the sparse topic coding network and processed based on the non-zero-element-number network parameter tpk to obtain the sparse topic matrix β̃; the sparse topic matrix β̃ is then feature-textualized to obtain the sparse subject word sequence keyword (i.e., the topic keyword sequence) and the corresponding weight vector weight; the sparse subject word sequence keyword is then numericalized and input into the pre-trained Transformer network introduced into the sparse topic coding network (such as a pre-trained T5 model encoder) to obtain the context output vector h_enc; finally, matrix multiplication of the context output vector h_enc with the corresponding weight vector weight is performed to obtain the output of the whole sparse topic coding network: the sample sparse topic context vector wec.
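Putting the pieces of fig. 3 together, the sketch below shows one plausible way to encode a sparse topic keyword sequence with a pre-trained T5 encoder from the Hugging Face transformers library and to weight the resulting context output as in formula (1); the model name, the way keyword weights are spread over sub-word tokens, and the tensor shapes are assumptions made for illustration.

```python
import torch
from transformers import T5TokenizerFast, T5EncoderModel

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
encoder = T5EncoderModel.from_pretrained("t5-base")

def encode_sparse_topic(keywords, weights):
    """keywords: list of topic keywords for one topic; weights: 1-D tensor of matching keyword weights."""
    # Treat the keyword sequence as a short text and encode it with the pre-trained encoder.
    inputs = tokenizer(" ".join(keywords), return_tensors="pt")
    h_enc = encoder(**inputs).last_hidden_state.squeeze(0)            # (seq_len, hidden)
    # Formula (1): combine the context output with the keyword weights.
    # Spreading keyword weights over sub-word tokens by pad/truncate is a simplification.
    w = torch.nn.functional.pad(weights, (0, max(0, h_enc.size(0) - weights.numel())))[: h_enc.size(0)]
    wec = w @ h_enc                                                    # sparse topic context vector
    return wec
```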
In combination with the above embodiment, in an implementation manner, the embodiment of the invention further provides a model building method. In this method, the step S13 may specifically include steps S51 to S53:
step S51: and inputting the context vector of the sample sparse topic into a self-attention layer of the topic feature adaptation network through the sample sparse topic conversion unit to obtain a context vector weighting result.
In this embodiment, after obtaining the output sample sparse topic context vector of the sparse topic encoding network, the sample sparse topic context vector may be input to the self-attention layer of the topic feature adaptation network through the sample sparse topic conversion unit, and self-attention calculation is performed, so as to obtain a context vector weighting result after weighting the sample sparse topic context vector.
Step S52: and inputting the context vector weighted result to a mean pooling layer to obtain a pooling result.
In this embodiment, after obtaining the context vector weighted result, the context vector weighted result may be input to a mean pooling layer of the theme feature adapter network, and attention mean pooling processing is performed, so as to obtain a processed pooling result.
Step S53: and carrying out linear interpolation calculation on the pooling result and random noise to obtain the sample context vector.
In this embodiment, after the pooling result is obtained, random noise may be added, and the pooling result and the random noise are subjected to corresponding linear interpolation calculation, so as to obtain the output of the whole theme feature adaptive network: and a sample context vector, namely a sample sparse topic context feature vector matched with the text generation network.
In this embodiment, the decoder in the text generation network includes a cross attention layer, and the output of the sparse topic coding network cannot directly perform cross attention calculation, so this embodiment proposes a topic feature adaptation network based on a self-attention mechanism and a method of manually adding random noise disturbance, which is used to perform corresponding transformation on a sample sparse topic context vector, map the sparse topic context feature into a feature space of the autoregressive text generation network, so that the output of the topic feature adaptation network can adapt to the text generation network, and input the sample context vector into the cross attention layer in the text generation network to perform correlation calculation, and finally generate the predicted text.
In combination with any of the above embodiments, in an implementation manner, the embodiment of the present invention further provides a model building method. In this method, the step S51 may specifically include step S61:
step S61: and inputting the sample sparse topic context vector into a K, V matrix of the self-attention layer, inputting network parameters of the topic feature adaptation network into a Q matrix of the self-attention layer to perform self-attention calculation, and obtaining the context vector weighting result.
In this embodiment, the sample sparse topic context vector wec may be input into the K and V matrices of the self-attention layer, and the trainable network parameters of the topic feature adaptation network may be input into the Q matrix of the self-attention layer; the self-attention calculation is then performed to obtain the context vector weighting result H.

By way of example, the self-attention calculation may be performed by the following formula (2):

H = softmax(QKᵀ / √h) V    (2)

where softmax(QKᵀ/√h) is the self-attention weight, K is the K matrix, V is the V matrix, Q is the Q matrix, Kᵀ is the transpose of the K matrix, and h is the vector dimension.
In combination with any of the above embodiments, in an implementation manner, the embodiment of the present invention further provides a model building method. In this method, the step S53 may specifically include step S71 and step S72:
Step S71: and performing activation function calculation after the linear layer on the pooling result to obtain calculation weight for linear interpolation.
In this embodiment, the context vector weighting result is input to the mean pooling layer to obtain the pooling result; specifically, the pooling result c is obtained through the following formula (3):

c = MeanPool(H)    (3)

where H is the context vector weighting result.
After the pooling result c is obtained, an activation function following a linear layer can be applied to the pooling result c to obtain the calculation weight for linear interpolation. By way of example, the calculation weight g may be obtained by the following formula (4):

g = Sigmoid(Linear(c))    (4)

where Sigmoid is the activation function, Linear is a fully connected linear layer, and c is the pooling result.
Step S72: and performing linear difference calculation based on the calculation weight, the random noise and the pooling result to obtain the sample context vector.
In this embodiment, after obtaining the calculation weight for performing the linear difference calculation, the linear difference calculation may be performed based on the calculation weight, the added random noise, and the pooling result, to obtain the output of the whole theme feature adaptive network: sample context vector.
By way of example, the sample context vector s may be obtained by the following formula (5):

s = g · c + (1 − g) · ε    (5)

where s is the sample context vector, g is the calculation weight, c is the pooling result, and ε is the random noise.
In an alternative implementation, as shown in fig. 4, fig. 4 is a schematic structural diagram of a topic feature adaptation network according to an embodiment of the present invention. In this embodiment, after the sample sparse topic context vector wec output by the sparse topic coding network is obtained, the sample sparse topic context vector wec may be input into the K and V matrices of the self-attention layer of the topic feature adaptation network, and the trainable network parameters of the topic feature adaptation network are input into the Q matrix of the self-attention layer to perform the self-attention calculation; the self-attention calculation result, i.e., the context vector weighting result, is then input into the attention mean pooling layer for processing to obtain the pooling result; the calculation weight is then determined based on the pooling result; finally, random noise is added and linear interpolation of the pooling result and the calculation weight is performed to obtain the output of the whole topic feature adaptation network: the sample context vector s.
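The whole adaptation step of fig. 4 can be sketched as follows. This is an assumed PyTorch rendering of formulas (2)-(5): a trainable query matrix attends over the sparse topic context vectors, the weighted result is mean-pooled, a Sigmoid gate is computed from a linear layer, and the pooled vector is interpolated with Gaussian noise. The dimensions and the choice of Gaussian noise are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class TopicFeatureAdapter(nn.Module):
    """Assumed sketch of the topic feature adaptation network (formulas (2)-(5))."""

    def __init__(self, hidden: int, num_queries: int = 8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(num_queries, hidden))  # trainable Q (network parameters)
        self.gate = nn.Linear(hidden, hidden)                        # linear layer feeding the Sigmoid gate

    def forward(self, wec: torch.Tensor) -> torch.Tensor:
        # wec: (seq_len, hidden) sparse topic context vectors, used as K and V of the self-attention layer.
        h = wec.size(-1)
        attn = torch.softmax(self.query @ wec.T / math.sqrt(h), dim=-1)  # formula (2): softmax(QK^T / sqrt(h))
        H = attn @ wec                                                   # context vector weighting result
        c = H.mean(dim=0)                                                # formula (3): mean pooling
        g = torch.sigmoid(self.gate(c))                                  # formula (4): interpolation weight
        noise = torch.randn_like(c)                                      # random noise
        return g * c + (1.0 - g) * noise                                 # formula (5): sample context vector
```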
In combination with any of the above embodiments, in an implementation manner, the embodiment of the present invention further provides a model building method. In this method, the text generation network includes a plurality of sequentially connected text generation modules, and the step S14 may specifically include steps S81 to S83:
Step S81: and carrying out text vectorization on the sample text through the sample text generating unit to obtain a text vector.
In this embodiment, for the text generation network, the text vector may be obtained by performing text vectorization on the input sample text by the sample text generation unit. For example, a Tokenizer layer may be used to digitize a sample text (such as a text sequence), and the digitized result is input to an embedding layer, where all sequences fused with position information of the sample text (such as the text sequence) are obtained by calculation, so as to implement text vectorization and obtain a text vector.
Step S82: and inputting the text vector and the sample context vector to a first text generation module until the output of a last text generation module is obtained.
Since the text generation network of the embodiment includes a plurality of sequentially connected text generation modules, the text vector and the sample context vector output by the theme feature adaptation network may be input to the first text generation module, and the plurality of sequentially connected text generation modules sequentially perform the correlation processing until the output of the last text generation module is obtained.
Step S83: and mapping the output of the last text generation module into a vocabulary space through a linear layer with an activation function to obtain the output of the text generation network.
In this embodiment, after the output of the last text generation module is obtained, the output of the last text generation module may be mapped into the vocabulary space through a linear layer with an activation function, so as to obtain the output of the entire text generation network. In particular, the output of the last text generation module may be mapped into the vocabulary space by a linear layer with Softmax activation functions.
For example, if the text generation network includes M sequentially connected text generation modules, then after the output Output_M of the last text generation module is obtained, the output ŷ of the text generation network can be obtained by the following formula (6):

ŷ = Softmax(Linear(Output_M))    (6)

where Softmax is the activation function, Linear is a fully connected linear layer, and Output_M is the output of the last text generation module.
In combination with any of the above embodiments, in an implementation manner, the embodiment of the present invention further provides a model building method. In the method, the processing steps of each text generation module may include steps S91 and S96:
Step S91: and processing the first input of each text generation module through a mask multi-head attention layer to obtain a first output.
In this embodiment, the processing steps of each text generation module are consistent, and each text generation module has 2 inputs: a first input and a second input, wherein the second input is: sample context vector for theme feature adaptation network output. In each text generation module, first, the first input of each text generation module is processed through a Masked multi-head attention layer (Masked multi-head self-attention), resulting in a first output of the Masked multi-head attention layer.
Step S92: and carrying out residual connection on the first output and the first input to obtain a second output.
In this embodiment, after the first output of the masking multi-head attention layer is obtained, the first output of the masking multi-head attention layer is connected with the first input, which is the input of the masking multi-head attention layer, by residual connection, so as to obtain the second output.
Step S93: and carrying out layer normalization operation on the second output to obtain a text intermediate result.
In this embodiment, after the second output is obtained, a layer normalization (Layer Norm) operation is performed on the second output, and a text intermediate result is obtained.
Step S94: and performing cross attention calculation on the text intermediate result and the sample context vector to obtain a third output.
In this embodiment, after obtaining the text intermediate result, the text intermediate result and the second input (the sample context vector) of the text generation module are subjected to cross attention calculation of multiple attention, and a third output a is obtained.
It should be noted that the sample context vector used by each text generation module is the same.
Step S95: and carrying out residual connection on the third output and the text intermediate result to obtain a fourth output.
In this embodiment, after the third output a is obtained, the third output is connected with the normalized output of the previous layer, that is, the text intermediate result, to obtain the fourth output.
Step S96: and carrying out layer normalization operation on the fourth output to obtain the output of each text generation module.
In this embodiment, the layer normalization operation is performed on the obtained fourth output, so as to obtain the output of each text generation module. By way of example, the Output of each text generation module may be obtained by the following equation (7):
Output = Layer_Norm(A + Z)    (7)

where A is the third output obtained by the cross-attention calculation, Z is the layer-normalized output of the previous layer, i.e., the text intermediate result, and Layer_Norm is the layer normalization operation.
In this embodiment, the text generation network is formed by stacking a plurality of sequentially arranged text generation modules. In each text generation module, the masked multi-head self-attention layer is not followed by another self-attention layer; instead, the self-attention layer that would follow it is replaced by a cross-attention calculation between the sample context vector output by the topic feature adaptation network and the output vector of the preceding layer normalization layer, so that text generation is tightly coupled with the output of the topic feature adaptation network and a sparse topic-controllable text generation effect is realized.
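One text generation module of this kind could be sketched as below: masked multi-head self-attention, a residual connection and layer normalization, cross attention against the sample context vector from the adaptation network, another residual connection, and a final layer normalization (steps S91-S96). The use of nn.MultiheadAttention and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TextGenerationModule(nn.Module):
    """Assumed sketch of one text generation module (steps S91-S96)."""

    def __init__(self, hidden: int, num_heads: int = 8):
        super().__init__()
        self.masked_attn = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(hidden)
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden) first input; context: (batch, ctx_len, hidden) sample context vector.
        seq_len = x.size(1)
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        a1, _ = self.masked_attn(x, x, x, attn_mask=causal)   # S91: masked multi-head attention
        z = self.norm1(a1 + x)                                # S92-S93: residual + layer norm -> intermediate result
        a2, _ = self.cross_attn(z, context, context)          # S94: cross attention (Q from text, K/V from context)
        return self.norm2(a2 + z)                             # S95-S96: residual + layer norm -> module output
```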
In combination with any of the above embodiments, in an implementation manner, the embodiment of the present invention further provides a model building method. In the method, in the case where the text generation module is the first text generation module, a first input of the text generation module: is a text vector; in the case where the text generation module is any text generation module other than the first text generation module, the first input of the text generation module is: the output of the last text generation module.
That is, since the output of the text generation module is already in vector form, the text vectorization is no longer required by the other text generation modules than the first one, but the output of the last one is directly used as the first input of the next one for the subsequent text generation processing.
In combination with any of the above embodiments, in an implementation manner, the embodiment of the present invention further provides a model building method. In this method, the step S94 may specifically include step S101:
step S101: and inputting the sample context vector into a K, V matrix of a multi-head attention layer, and inputting the text intermediate result into a Q matrix of the multi-head attention layer to perform cross attention calculation so as to obtain the third output.
In this embodiment, the sample context vector may be input into the K and V matrices of the multi-head attention layer, and the text intermediate result input into the Q matrix of the multi-head attention layer, to perform the cross-attention calculation and obtain the third output A.

By way of example, the cross-attention calculation may be performed by the following formula (8):

A = softmax(QKᵀ / √h) V    (8)

where K is the K matrix, V is the V matrix, Q is the Q matrix, and A is the third output.
In an alternative implementation, as shown in fig. 5, fig. 5 is a schematic structural diagram of a text generation network according to an embodiment of the present invention. In this embodiment, after the output of the topic feature adaptation network, the sample context vector s, is obtained, the input text is fed into the text generation network, which includes M sequentially connected text generation modules; the input text is the input sample text. Text vectorization is performed on the input text to obtain a text vector. In the first text generation module, the text vector serves as the first input and passes through the masked multi-head self-attention layer to obtain the first output; the first output is residual-connected with the text vector to obtain the second output; the second output is then layer-normalized to obtain the text intermediate result; the text intermediate result serves as the Q matrix and the sample context vector s as the K and V matrices for the cross-attention calculation, giving the third output; the third output is residual-connected with the text intermediate result to obtain the fourth output; finally, the fourth output is layer-normalized to obtain Output1 of the first text generation module. Output1 is then used as the first input of the second text generation module, and the subsequent text generation modules process in the same way until the output Output_M of the last text generation module is obtained. The output Output_M of the last text generation module is mapped into the vocabulary space by a linear layer with Softmax activation to obtain ŷ; text is then generated according to the selected text output decoding mode, and the loss function is calculated.
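Building on the TextGenerationModule sketched above, the full text generation network of fig. 5 might be assembled as follows: text vectorization with token and position embeddings, M sequentially connected modules that all reuse the same sample context vector, and a final linear layer with Softmax mapping into the vocabulary space (formula (6)). Embedding sizes, the maximum length, and the position-embedding scheme are assumptions.

```python
import torch
import torch.nn as nn

class TextGenerationNetwork(nn.Module):
    """Assumed sketch of the full text generation network: embedding + M stacked text
    generation modules + a linear layer mapping into the vocabulary space (formula (6))."""

    def __init__(self, vocab_size: int, hidden: int, num_modules: int, max_len: int = 512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)
        self.pos_emb = nn.Embedding(max_len, hidden)                       # position information of the text sequence
        self.modules_stack = nn.ModuleList(
            TextGenerationModule(hidden) for _ in range(num_modules))      # M sequentially connected modules
        self.to_vocab = nn.Linear(hidden, vocab_size)

    def forward(self, token_ids: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len); context: (batch, ctx_len, hidden) sample context vector.
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token_emb(token_ids) + self.pos_emb(positions)            # text vectorization (S81)
        for module in self.modules_stack:                                  # S82: output of module m feeds module m+1
            x = module(x, context)                                         # every module reuses the same context
        return torch.softmax(self.to_vocab(x), dim=-1)                     # S83 / formula (6): vocabulary distribution
```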
In combination with any of the above embodiments, in an implementation manner, the embodiment of the present invention further provides a model building method. In this method, the step S15 may specifically include step S111 and step S112:
step S111: and calculating a corresponding negative log likelihood sum based on the output of the text generation network through the model training unit.
In this embodiment, the loss function for performing model training is a negative log-likelihood sum function, and after the output of the text generation network is obtained, the corresponding negative log-likelihood sum may be calculated by the model training unit based on the output of the text generation network.
Specifically, given a sample text data set C, each element of which is an input text, i.e. a sample text, the text generation model training process is autoregressive self-supervised training; that is, on the premise that the topic information is given, the words at the current position and before are input and the next word is predicted. Therefore, the loss function of this embodiment is the negative log-likelihood sum of the output text given the input text. Further, in an alternative embodiment, the loss function of this embodiment may be a regularized negative log-likelihood sum function, for example the following formula (9):
$L = -\sum_{j}\log p\left(x_j \mid x_{<j}\right) + \lambda\, R$  formula (9)

wherein $\lambda$ is the regularization coefficient, i.e. a model hyper-parameter; $R$ is the regularization term, used to constrain the L1 norm distance between sparse topics, the objective being to constrain the row vectors $b_i$ and $b_j$ of the sparse topic matrix so as to minimize repetitive topics; and $p\left(x_j \mid x_{<j}\right)$ is the probability of predicting the next sample $x_j$ after the sample texts up to $x_{j-1}$ have been input.
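As a hedged illustration, the loss of formula (9) might be computed along the following lines; the exact form of the topic regularization term is an assumption here (a penalty that discourages small L1 distances between sparse topic row vectors), as is the use of PyTorch and the tensor shapes:

import torch
import torch.nn.functional as F

def regularized_nll_loss(logits, targets, sparse_topic_matrix, lam=0.01):
    """Negative log-likelihood sum plus a topic-diversity regularizer (illustrative).

    logits: (seq_len, vocab_size) outputs of the text generation network
    targets: (seq_len,) indices of the next words x_j
    sparse_topic_matrix: (n_topics, vocab_size), rows are sparse topic vectors
    """
    # negative log-likelihood sum over the sequence
    nll = F.cross_entropy(logits, targets, reduction="sum")

    # assumed regularizer: penalize topic rows that are close in L1 distance
    t = sparse_topic_matrix
    l1_dist = torch.cdist(t, t, p=1)                 # pairwise L1 distances
    off_diag = ~torch.eye(t.size(0), dtype=torch.bool)
    reg = torch.exp(-l1_dist[off_diag]).sum()        # large when topics repeat each other

    return nll + lam * reg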
Since the text generation model in this embodiment is autoregressive, the text generation model needs to decode sequentially: from the current position, the previously input sample text and the weighted sparse topic context vector (i.e. the sample context vector), it predicts and outputs the word y, where $y_{<j}$ denotes the set of words predicted by the model for the text preceding the current position j.
Specifically, the sample text data set includes a plurality of sample text sequences, and each sample text sequence includes a plurality of sample texts. For example, an input sample text sequence has a corresponding target output text, where <sos> is the special flag bit marking the beginning of the text and <eos> is the special flag bit marking the end of the text.
When predicting the text at the j=1 position, x = {<sos>} is input into the text generation network as the sample text, while the sample context vector output by the sparse topic coding network and the topic feature adaptation network (denoted here as $\widehat{wec}$) is input into the autoregressive text generation network, and formula (10) is then calculated:

$p\left(y_1 \mid \langle sos\rangle, \widehat{wec}\right) = \mathrm{Model}\left(\{\langle sos\rangle\}, \widehat{wec}\right)$  formula (10)

Similarly, formula (11) is calculated when predicting the j=2 position:

$p\left(y_2 \mid \langle sos\rangle, y_1, \widehat{wec}\right) = \mathrm{Model}\left(\{\langle sos\rangle, y_1\}, \widehat{wec}\right)$  formula (11)

Formula (12) is calculated when predicting the j=3 position:

$p\left(y_3 \mid \langle sos\rangle, y_1, y_2, \widehat{wec}\right) = \mathrm{Model}\left(\{\langle sos\rangle, y_1, y_2\}, \widehat{wec}\right)$  formula (12)

Formula (13) is calculated when predicting the j=4 position:

$p\left(y_4 \mid \langle sos\rangle, y_1, y_2, y_3, \widehat{wec}\right) = \mathrm{Model}\left(\{\langle sos\rangle, y_1, y_2, y_3\}, \widehat{wec}\right)$  formula (13)
The Model in the above formulas (10)-(13) is the text generation network. That is, after the output of the text generation network is obtained, for example the predicted probability $p\left(y_j \mid y_{<j}, \widehat{wec}\right)$, it is substituted into the loss function of formula (9) to obtain the loss function value corresponding to the currently input sample text, i.e. the negative log-likelihood sum for each input sample text.
Step S112: and updating the network parameters based on the negative log likelihood sum until the average loss variation amplitude on the sample data set is smaller than a preset threshold value, determining that the loss function converges, and fixing the network parameters to obtain the trained text generation model.
In this embodiment, after the negative log-likelihood sum of each input text is obtained through the above autoregressive self-supervised training process, the network parameters of the three networks (i.e. the sparse topic coding network, the topic feature adaptation network and the text generation network) can be optimized and updated by the AdamW optimization method until the average loss variation amplitude on the sample text data set C is smaller than a preset threshold value; the loss function is then determined to have converged, the network parameters of the three networks are fixed, model training is stopped, and the trained text generation model is obtained.
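A hedged sketch of this update loop is given below; the data loader, the model wrapper that bundles the three networks, and the convergence window are assumptions made only for the example:

import torch

def train(model, dataloader, loss_fn, threshold=1e-4, lr=1e-4, max_epochs=50):
    """Update all three networks jointly with AdamW until the average loss stabilizes."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    prev_avg = None
    for epoch in range(max_epochs):
        total, count = 0.0, 0
        for sample_text, targets, topic_matrix in dataloader:
            optimizer.zero_grad()
            logits = model(sample_text)      # sparse topic coding -> topic feature
                                             # adaptation -> text generation network
            loss = loss_fn(logits, targets, topic_matrix)
            loss.backward()
            optimizer.step()
            total += loss.item()
            count += 1
        avg = total / count
        # stop when the average loss variation amplitude falls below the preset threshold
        if prev_avg is not None and abs(prev_avg - avg) < threshold:
            break
        prev_avg = avg
    for p in model.parameters():
        p.requires_grad_(False)              # fix the network parameters
    return model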
In an alternative embodiment, during model training, the plurality of sample texts in each sample text sequence correspond to the same sample context vector. The sparse topic sequence is resampled only once, when the sample text sequence changes; that is, different sample text sequences correspond to different sample context vectors. In other words, this embodiment performs co-training in units of text sequences.
In combination with any of the above embodiments, in an implementation manner, the embodiment of the present invention further provides a model building method. In this method, the step S11 may specifically include step S121 or step S122:
Step S121: and acquiring a built-in subject word distribution matrix in the sparse subject coding network through the sample subject word acquisition unit.
In this embodiment, two ways of acquiring the topic word distribution matrix may be provided in the model training stage. In the first way, a topic word distribution matrix is built into the sparse topic coding network; the built-in topic word distribution matrix in the sparse topic coding network may be acquired through the sample topic word acquisition unit and processed, so as to train the whole text generation model and obtain the trained text generation model. In this case, the model mode of the trained text generation model is the conventional mode.
Step S122: and acquiring a specific sample subject term distribution matrix input from the outside through the sample subject term acquisition unit.
The other acquisition mode in this embodiment is to acquire a specific sample subject word distribution matrix input from the outside through a sample subject word acquisition unit, that is, initialize external input data to obtain and fix the specific sample subject word distribution matrix, and perform calculation processing on the specific sample subject word distribution matrix, so as to perform training of the whole text generation model, and obtain a trained text generation model. In this case, the model mode of the trained text generation model is an external subject mode. The text generation model in the external topic mode may control the content of the text generation based on externally specified sparse topic keywords.
In this embodiment, the sparse topic controllable text generation model may be configured in a conventional text generation mode or in an external topic mode. The method may be combined with keyword knowledge graphs oriented to different vertical fields to construct vertical-field chat robots, or be used as a tool called by a general-purpose chat robot, and therefore has broad application prospects in building large model tools and vertical-field customer service chat robots.
In an alternative embodiment, after the training of the text generation model converges, the text can be generated according to a selected autoregressive decoding algorithm, wherein the theme of the generated text is the theme distribution learned by the model in the previous training stage, and the autoregressive decoding algorithm can be selected from common text generation decoding algorithms such as Beam search, top-k decoding, top-p decoding and the like.
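For illustration, the sketch below shows one of the decoding options mentioned above (Top-k sampling) applied to the model's next-word distribution; the model call signature, token ids and helper names are assumptions for the example, not part of this embodiment:

import torch

def top_k_decode(model, context_vector, sos_id, eos_id, k=10, max_len=50):
    """Autoregressive generation with Top-k sampling (illustrative sketch)."""
    tokens = [sos_id]
    for _ in range(max_len):
        logits = model(torch.tensor([tokens]), context_vector)[0, -1]  # next-word logits
        top_vals, top_idx = torch.topk(logits, k)                      # keep the k best words
        probs = torch.softmax(top_vals, dim=-1)
        next_id = top_idx[torch.multinomial(probs, 1)].item()          # sample among them
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens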
In an alternative implementation, as shown in fig. 6, fig. 6 is a flow chart illustrating the text generation model training phase in the conventional mode according to an embodiment of the present invention. In this embodiment, the network parameters of each network of the text generation model are initialized first; then the built-in topic parameter β of the sparse topic coding network is calculated, and the output of the sparse topic coding network, i.e. the sample sparse topic context vector wec, is calculated; wec is converted by the topic feature adaptation network to obtain the sample context vector. Then, for the input text x, the text intermediate result Q is calculated by the autoregressive text generation network, and cross attention calculation is performed based on the text intermediate result Q and the sample context vector to obtain the third output A; the output of the final text generation network is obtained based on the third output A; the negative log-likelihood of the text sequence and the loss function L are calculated based on the output of the final text generation network; the network parameters of the three networks are updated based on the loss function value, and whether the loss function has converged is judged. If the loss function has not converged, the next text is input and training of the whole text generation model continues; once the loss function converges, the network parameters are fixed, training of the text generation model is completed, and the training process ends.
In an alternative implementation, as shown in fig. 7, fig. 7 is a flow chart illustrating the text generation model training phase in the external topic mode according to an embodiment of the present invention. In this embodiment, the network parameters of each network of the text generation model are initialized first; then the domain-specific topic word distribution β is initialized, and the output of the sparse topic coding network, i.e. the sample sparse topic context vector wec, is calculated based on the domain-specific topic word distribution β; wec is converted by the topic feature adaptation network to obtain the sample context vector. Then, for the input text x, the text intermediate result Q is calculated by the autoregressive text generation network, and cross attention calculation is performed based on the text intermediate result Q and the sample context vector to obtain the third output A; the output of the final text generation network is obtained based on the third output A; the negative log-likelihood of the text sequence and the loss function L are calculated based on the output of the final text generation network; the network parameters of the three networks are updated based on the loss function value, and whether the loss function has converged is judged. If the loss function has not converged, the next text is input and training of the whole text generation model continues; once the loss function converges, the network parameters are fixed, training of the text generation model is completed, and the training process ends.
Based on the same inventive concept, an embodiment of the invention provides a text generation method, which is applied to a sparse topic controllable text generation system. The sparse topic controllable text generation system of this embodiment is further configured to perform related text prediction for a text to be processed, given the topic words, based on the trained sparse topic controllable text generation model. The sparse topic controllable text generation system of this embodiment may at least include: a target topic word acquisition unit, a target sparse topic conversion unit and a target text generation unit. Referring to fig. 8, fig. 8 is a flowchart illustrating a text generation method according to an embodiment of the present invention. As shown in fig. 8, the text generation method of this embodiment may include steps S131 to S134:
Step S131: and acquiring a target subject term distribution matrix through the target subject term acquisition unit.
In this embodiment, the target topic word distribution matrix may first be acquired by the target topic word acquisition unit, so as to provide the core around which text is generated: the text generation model can then organize words, sentences and paragraphs around the topic words in the target topic word distribution matrix to generate text. The target topic word distribution matrix of this embodiment is the topic word distribution matrix used for inference with the text generation model, and is a multinomial topic word distribution matrix.
The text generation model of the present embodiment is a trained text generation model obtained by the model construction method of any one of the above embodiments. The text generation model includes at least three parts: sparse topic coding network, topic feature adaptation network and text generation network.
Step S132: and processing the target subject word distribution matrix based on a sparse subject coding network in a text generation model by the target sparse subject acquisition unit to obtain a target sparse subject context vector.
In this embodiment, after the target topic word distribution matrix is acquired, the target topic word distribution matrix may be processed by the target sparse topic acquisition unit based on the sparse topic coding network in the text generation model, so as to obtain the target sparse topic context vector. The target sparse topic context vector is the sparse topic word vector with a context representation used in the model inference application.
Step S133: and converting the target sparse topic context vector based on a topic feature adaptation network in a text generation model by the target sparse topic conversion unit to obtain a target context vector.
In this embodiment, the target sparse topic context vector output by the sparse topic coding network is not adapted to the input of the text generation network and cannot be directly input to the text generation network for text generation. Therefore, in the text generation model of this embodiment, a topic feature adaptation network is connected between the sparse topic coding network and the text generation network; the topic feature adaptation network is mainly used to transform the target sparse topic context vector so that its output is adapted to the text generation network.
Therefore, in this embodiment, after the target sparse topic context vector is obtained, the target sparse topic context vector is input to the topic feature adaptation network through the target sparse topic conversion unit, and the target sparse topic context vector is converted through the topic feature adaptation network, so as to obtain the output of the topic feature adaptation network: a target context vector. The target context vector in this embodiment is a context vector in which the target sparse topic context vector is converted.
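Drawing on the sub-steps described for the topic feature adaptation network elsewhere in this document (self-attention weighting with the network's own parameters as the Q matrix, mean pooling, and linear interpolation with random noise), a hedged sketch of this conversion is given below; the dimensions, the learned-query formulation, the noise distribution and the class name are assumptions introduced only for the example:

import torch
import torch.nn as nn

class TopicFeatureAdapter(nn.Module):
    """Adapts the sparse topic context vector to the text generation network (sketch)."""
    def __init__(self, d_model=512, n_heads=8, n_queries=4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, n_queries, d_model))  # network parameters as Q
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, sparse_topic_context):                 # (batch, n_topics, d_model)
        q = self.query.expand(sparse_topic_context.size(0), -1, -1)
        # sparse topic context vector supplies K and V of the self-attention layer
        weighted, _ = self.attn(q, sparse_topic_context, sparse_topic_context)
        pooled = weighted.mean(dim=1)                         # mean pooling layer
        alpha = torch.sigmoid(self.gate(pooled))              # weight for linear interpolation
        noise = torch.randn_like(pooled)                      # random noise term
        return alpha * pooled + (1 - alpha) * noise           # linear interpolation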
Step S134: and processing the target context vector and the text to be processed based on a text generation network in a text generation model by the target text generation unit to obtain a target predicted text.
In this embodiment, for a text to be processed that needs to be subjected to text prediction, the obtained target context vector and the text to be processed may be input to a text generation network of a text generation model through a target text generation unit, so that the target context vector and the text to be processed are processed through the text generation network to obtain an output of the text generation network, and the output of the text generation network is decoded according to a selected decoding algorithm to obtain an output of the entire text generation model: target predicted text.
In this embodiment, the text to be processed and the obtained target context vector are processed through the pre-trained text generation model, so that the target prediction text obtained by reasoning has better interpretability.
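To make the inference flow of steps S131 to S134 concrete, the following hedged sketch chains the three networks; the attribute names on the model object and the decode helper are assumptions introduced only for this example:

def generate(topic_word_matrix, text_to_process, model, decode):
    """Inference sketch: topic words -> sparse topic context -> adapted context -> text."""
    # step S132: sparse topic coding network
    sparse_topic_context = model.sparse_topic_encoder(topic_word_matrix)
    # step S133: topic feature adaptation network
    target_context = model.topic_feature_adapter(sparse_topic_context)
    # step S134: text generation network plus the selected decoding algorithm
    outputs = model.text_generator(text_to_process, target_context)
    return decode(outputs)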
In combination with the above embodiment, in an implementation manner, the embodiment of the present invention further provides a text generating method. In this method, the step S131 may specifically include step S141 and step S142:
Step S141: and determining a model mode of the text generation model through the target subject term acquisition unit.
In this embodiment, the trained text generation models correspond to respective model modes, where the model modes include: a regular mode or an external theme mode. Therefore, in the application of the inference of the text generation model, for the acquisition of the target subject matter distribution matrix, the model mode of the text generation model may be determined by the target subject matter acquisition unit first.
Step S142: and determining the target subject term distribution matrix based on the model mode.
In this embodiment, the target subject term distribution matrix may be determined based on the determined model mode of the text generation model.
In combination with any of the above embodiments, in an implementation manner, the embodiment of the present invention further provides a text generating method. In this method, the step S142 may specifically include step S151 and step S152:
step S151: and under the condition that the model mode is a conventional mode, determining the target subject word distribution matrix as a built-in subject word distribution matrix in the text generation model.
In this embodiment, when it is determined that the model mode of the text generation model is the conventional mode, it may be determined that the target topic word distribution matrix is the built-in topic word distribution matrix in the text generation model, so that when the model is applied, the built-in topic word distribution matrix in the sparse topic coding network of the text generation model may be used directly for calculation.
Step S152: and under the condition that the model mode is an external theme mode, determining that the target subject word distribution matrix is a specific subject word distribution matrix input externally.
In this embodiment, when the model mode of the text generation model is determined to be the external topic mode, it may be determined that the target topic word distribution matrix is an externally input specific topic word distribution matrix. For example, if the specific topic word distribution matrix only includes topic words related to "consumer electronics", "chip", "personal terminal device" and "computer", and "apple" is given as the initially input text to be processed of the autoregressive text generation model, the output will mostly be text related to electronic devices such as "apple phone". Therefore, when the model is applied, the externally input specific topic word distribution matrix may be loaded, so that the text generation model generates text in the specified topic space context.
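As a hedged illustration of the external topic mode, the snippet below builds a small domain-specific topic word distribution matrix from keyword lists and normalizes each row into a multinomial distribution; the keyword lists, vocabulary handling and uniform weighting are assumptions for the example only:

import numpy as np

def build_external_topic_matrix(topic_keywords, vocab):
    """Build a (n_topics, vocab_size) multinomial topic word distribution matrix."""
    word_to_id = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(topic_keywords), len(vocab)))
    for t, keywords in enumerate(topic_keywords):
        for w in keywords:
            if w in word_to_id:
                matrix[t, word_to_id[w]] = 1.0
        if matrix[t].sum() > 0:
            matrix[t] /= matrix[t].sum()        # each row sums to 1
    return matrix

# usage: topics restricted to consumer electronics, as in the example above
beta = build_external_topic_matrix(
    [["consumer electronics", "chip", "personal terminal device", "computer"]],
    vocab=["consumer electronics", "chip", "personal terminal device", "computer", "apple"],
)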
In an alternative implementation, as shown in fig. 9, fig. 9 is a flow chart illustrating the text generation model inference phase in the conventional mode according to an embodiment of the present invention. In this embodiment, the network parameters of each network of the text generation model are loaded first; then the built-in topic parameter β of the sparse topic coding network is calculated, and the output of the sparse topic coding network, i.e. the target sparse topic context vector wec, is calculated; wec is converted by the topic feature adaptation network to obtain the target context vector. Then, for the input text x, the text intermediate result Q is calculated by the autoregressive text generation network, and cross attention calculation is performed based on the text intermediate result Q and the target context vector to obtain the third output A; the output of the final text generation network is obtained based on the third output A, and the target predicted text is generated according to the selected decoding algorithm, so that the text sequence y is output.
In an alternative implementation, as shown in fig. 10, fig. 10 is a flow chart illustrating the text generation model inference phase in the external topic mode according to an embodiment of the present invention. In this embodiment, the network parameters of each network of the text generation model are loaded first; then the domain-specific topic word distribution β is initialized, and the output of the sparse topic coding network, i.e. the target sparse topic context vector wec, is calculated based on it; wec is converted by the topic feature adaptation network to obtain the target context vector. Then, for the input text x, the text intermediate result Q is calculated by the autoregressive text generation network, and cross attention calculation is performed based on the text intermediate result Q and the target context vector to obtain the third output A; the output of the final text generation network is obtained based on the third output A, and the target predicted text is generated according to the selected decoding algorithm, so that the text sequence y is output.
In combination with any of the above embodiments, in an implementation manner, the embodiment of the present invention further provides a text generating method. In this method, the above sparse topic controllable text generation system may further include: a parameter deriving unit and a keyword determining unit. And, in addition to the above steps, step S161 and step S162 may be included:
step S161: and the network parameters of the sparse topic coding network are derived through the parameter derivation unit.
In this embodiment, in the reasoning application process of the text generation model, the network parameters of the sparse topic coding network in the text generation model may be derived by the parameter derivation unit.
Step S162: and determining sparse topic keywords corresponding to the target predicted text based on the network parameters through the keyword determination unit.
In this embodiment, the sparse topic keywords corresponding to the target predicted text output by the text generation model may be determined by the keyword determination unit based on the derived network parameters of the sparse topic coding network in the text generation model.
In this embodiment, for a given output text, the parameters of the sparse topic coding network may be derived to obtain the sparse topic keywords serving as the context of the generated text.
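A hedged sketch of such keyword derivation is shown below: it takes the exported topic word distribution parameter and reads off the highest-weighted words per topic; the parameter name beta and the top-n cut-off are assumptions made for the example:

import numpy as np

def sparse_topic_keywords(beta, vocab, top_n=5):
    """Derive the top keywords of each sparse topic from the exported parameters.

    beta: (n_topics, vocab_size) topic word distribution exported from the
          sparse topic coding network
    vocab: list of words aligned with beta's columns
    """
    keywords = []
    for row in beta:
        top_ids = np.argsort(row)[::-1][:top_n]       # highest-weighted words first
        keywords.append([vocab[i] for i in top_ids if row[i] > 0])
    return keywords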
In combination with the above embodiments, in an alternative implementation manner, this embodiment proposes a method for co-training a topic mining model and a text generation model based on an autoregressive Transformer framework; that is, an integrated co-training scheme of a sparse topic mining model and an autoregressive text generation model is provided, which makes co-training of the sparse topic mining module and the autoregressive text generation module possible. In this embodiment, a topic context representation fused with the knowledge of a pre-trained language model is obtained by the autoregressive Transformer-based sparse topic feature extraction module (i.e. the sparse topic coding network); its output is then adapted to the autoregressive text generation model through the proposed topic feature adaptation network; and finally, sparse-topic-guided autoregressive text generation is realized by training on a large-scale corpus, thereby improving the interpretability of the autoregressive text generation model.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Based on the same inventive concept, an embodiment of the present invention provides a model building apparatus 1100, where the model building apparatus 1100 is applied to a sparse topic controllable text generation system. The sparse topic controllable text generation system at least comprises: the system comprises a sample subject word acquisition unit, a sample sparse subject conversion unit, a sample text generation unit and a model training unit. Referring to fig. 11, fig. 11 is a block diagram showing a construction of a model construction apparatus according to an embodiment of the present invention. The model constructing apparatus 1100 includes:
a first obtaining module 1101, configured to obtain a sample subject term distribution matrix through the sample subject term obtaining unit;
the first processing module 1102 is configured to process, by using the sample sparse topic acquisition unit, the sample topic word distribution matrix based on a sparse topic coding network in a text generation model to be trained, so as to obtain a sample sparse topic context vector;
the first conversion module 1103 is configured to convert, by using the sample sparse topic conversion unit, the sample sparse topic context vector based on a topic feature adaptation network in a text generation model to be trained, to obtain a sample context vector;
A text output module 1104, configured to process, by the sample text generating unit, the sample context vector and the sample text in the sample text dataset based on a text generating network in a text generating model to be trained, to obtain an output of the text generating network;
the model training module 1105 is configured to calculate, by using the model training unit, a loss function value based on an output of the text generation network, and update network parameters of the sparse topic coding network, the topic feature adaptation network, and the text generation network according to the loss function value until the loss function converges, to obtain a trained text generation model.
Optionally, the first processing module 1102 includes:
the sampling submodule is used for sampling a sparse subject word sequence in the sample subject word distribution matrix based on the sparse subject encoding network through the sample sparse subject acquisition unit;
and the first processing submodule is used for inputting the sparse subject word sequence into a pre-training transducer network in the sparse subject coding network to obtain the sample sparse subject context vector.
Optionally, the sampling submodule includes:
The first sampling submodule is used for sampling the sample subject term distribution matrix to obtain a sample subject feature matrix;
the second processing submodule is used for inputting the sample topic feature matrix into the salient topic STL layer, and processing the sample topic feature matrix based on the non-zero element number network parameter to obtain a sparse topic matrix;
the textualization sub-module is used for textualizing the sparse subject matrix to obtain sparse subject word information, and the sparse subject word information at least comprises: the sparse subject word sequence.
Optionally, the sparse subject term information further includes: the weight vector corresponding to the sparse subject word sequence;
the first processing sub-module includes:
the first input sub-module is used for inputting the sparse subject word sequence into an encoder of the pre-training transducer network after the sparse subject word sequence is digitized to obtain a context output vector;
and the feature processing sub-module is used for obtaining the sample sparse subject context vector based on the context output vector and the weight vector corresponding to the sparse subject word sequence.
Optionally, the first conversion module 1103 includes:
the self-attention sub-module is used for inputting the sparse topic context vector to a self-attention layer of the topic feature adaptation network through the sample sparse topic conversion unit to obtain a context vector weighting result;
Chi Huazi module, configured to input the context vector weighted result to a mean pooling layer to obtain a pooling result;
and the linear interpolation submodule is used for carrying out linear interpolation calculation on the pooling result and random noise to obtain the sample context vector.
Optionally, the self-attention sub-module includes:
and the self-attention weighting sub-module is used for inputting the sparse topic context vector into a K, V matrix of the self-attention layer, inputting network parameters of the topic feature adaptation network into a Q matrix of the self-attention layer for self-attention calculation, and obtaining the context vector weighting result.
Optionally, the linear interpolation submodule includes:
the first calculation sub-module is used for performing activation function calculation after the linear layer on the pooling result to obtain calculation weight for linear interpolation;
and the second computing sub-module is used for carrying out linear interpolation calculation based on the computing weight, the random noise and the pooling result to obtain the sample context vector.
Optionally, the text generation network includes a plurality of sequentially connected text generation modules, and the text output module 1104 includes:
The vectorization submodule is used for carrying out text vectorization on the sample text through the sample text generating unit to obtain a text vector;
the second input sub-module is used for inputting the text vector and the sample context vector to the first text generation module until the output of the last text generation module is obtained;
and the mapping sub-module is used for mapping the output of the last text generation module into a word list space through a linear layer with an activation function to obtain the output of the text generation network.
Optionally, the apparatus 1100 further includes: a text processing module for executing the processing steps of each text generation module, the text processing module comprising:
the third processing submodule is used for processing the first input of each text generating module through a mask multi-head attention layer to obtain a first output;
a fourth processing submodule, configured to perform residual connection on the first output and the first input, so as to obtain a second output;
a fifth processing sub-module, configured to perform a layer normalization operation on the second output to obtain a text intermediate result;
A sixth processing sub-module, configured to perform cross attention computation on the text intermediate result and the sample context vector, to obtain a third output;
a seventh processing sub-module, configured to perform residual connection on the third output and the text intermediate result, to obtain a fourth output;
and the eighth processing sub-module is used for carrying out layer normalization operation on the fourth output to obtain the output of each text generation module.
Optionally, in the case that the text generation module is the first text generation module, the first input is the text vector;
in the case where the text generation module is any text generation module other than the first text generation module, the first input is an output of a last text generation module.
Optionally, the sixth processing sub-module includes:
and the cross attention sub-module is used for inputting the sample context vector into a K, V matrix of the multi-head attention layer, inputting the text intermediate result into a Q matrix of the multi-head attention layer for cross attention calculation, and obtaining the third output.
Optionally, the loss function is a negative log likelihood sum function, and the model training module 1105 includes:
A third calculation sub-module, configured to calculate, by the model training unit, a corresponding negative log-likelihood sum based on an output of the text generation network;
and the model training sub-module is used for updating the network parameters based on the negative log likelihood and until the average loss change amplitude on the sample data set is smaller than a preset threshold value, determining that the loss function converges, and fixing the network parameters to obtain the trained text generation model.
Optionally, the first obtaining module 1101 includes:
the first subject term acquisition sub-module is used for acquiring a built-in subject term distribution matrix in the sparse subject coding network through the sample subject term acquisition unit;
and the second subject term acquisition sub-module is used for acquiring a specific sample subject term distribution matrix input from the outside through the sample subject term acquisition unit.
Based on the same inventive concept, an embodiment of the present invention provides a text generating apparatus 1200, where the text generating apparatus 1200 is applied to a text generating system with controllable sparse topics. The sparse topic controllable text generation system at least comprises: the system comprises a target subject word acquisition unit, a target sparse subject conversion unit and a target text generation unit. Referring to fig. 12, fig. 12 is a block diagram showing a structure of a text generating apparatus according to an embodiment of the present invention. As shown in fig. 12, the apparatus 1200 includes:
A second obtaining module 1201, configured to obtain a target subject term distribution matrix through the target subject term obtaining unit;
the second processing module 1202 is configured to process, by using the target sparse topic acquisition unit, the target topic word distribution matrix based on a sparse topic coding network in a text generation model, to obtain a target sparse topic context vector;
the second conversion module 1203 is configured to convert, by using the target sparse topic conversion unit, the target sparse topic context vector based on a topic feature adaptation network in a text generation model, to obtain a target context vector;
the text prediction module 1204 is configured to process, by the target text generating unit, the target context vector and the text to be processed based on a text generating network in a text generating model, to obtain a target predicted text;
the text generation model is a trained text generation model obtained by the model construction method according to any embodiment.
Optionally, the second obtaining module 1201 includes:
the mode determining module is used for determining a model mode of the text generation model through the target subject term acquiring unit;
And the subject term determination submodule is used for determining the target subject term distribution matrix based on the model mode.
Optionally, the subject term determination submodule includes:
the conventional mode determining submodule is used for determining that the target subject word distribution matrix is a built-in subject word distribution matrix in the text generation model under the condition that the model mode is a conventional mode;
and the external mode determining submodule is used for determining that the target subject word distribution matrix is a specific subject word distribution matrix input externally under the condition that the model mode is an external subject mode.
Optionally, the sparse topic controllable text generation system further includes: a parameter deriving unit and a keyword determining unit; the apparatus 1200 further comprises:
the parameter export module is used for exporting network parameters of the sparse topic coding network through the parameter export unit;
and the sparse subject term determining module is used for determining the sparse subject term corresponding to the target predicted text based on the network parameter through the keyword determining unit.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device 1300, as shown in fig. 13. Fig. 13 is a schematic diagram of an electronic device according to an embodiment of the present invention. The electronic device comprises a processor 1301, a memory 1302 and a computer program stored on the memory 1302 and executable on the processor 1301, which computer program, when executed by the processor, implements the steps of the model building method according to any of the above embodiments of the invention or the steps of the text generating method according to any of the above embodiments of the invention.
Based on the same inventive concept, another embodiment of the present invention provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps in the model building method according to any one of the above embodiments of the present invention, or implements the steps in the text generating method according to any one of the above embodiments of the present invention.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are to be protected by the present invention.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (21)

1. The model construction method is characterized by being applied to a text generation system with controllable sparse topics, wherein the text generation system with controllable sparse topics at least comprises: the system comprises a sample subject word acquisition unit, a sample sparse subject conversion unit, a sample text generation unit and a model training unit; the method comprises the following steps:
acquiring a sample subject term distribution matrix through the sample subject term acquisition unit;
processing, by the sample sparse topic obtaining unit, the sample topic word distribution matrix based on a sparse topic coding network in a text generation model to be trained to obtain a sample sparse topic context vector;
converting the sample sparse topic context vector by the sample sparse topic conversion unit based on a topic feature adaptation network in a text generation model to be trained to obtain a sample context vector;
processing the sample context vector and the sample text in the sample text data set based on a text generation network in a text generation model to be trained by the sample text generation unit to obtain the output of the text generation network;
And calculating a loss function value based on the output of the text generation network through the model training unit, and updating network parameters of the sparse topic coding network, the topic feature adaptation network and the text generation network according to the loss function value until the loss function converges to obtain a trained text generation model.
2. The method for constructing a model according to claim 1, wherein the processing, by the sample sparse topic obtaining unit, the sample topic word distribution matrix based on the sparse topic coding network in the text generation model to be trained to obtain the sample sparse topic context vector comprises:
sampling a sparse subject word sequence in the sample subject word distribution matrix based on the sparse subject coding network through the sample sparse subject acquisition unit;
and inputting the sparse subject word sequence to a pre-training transducer network in the sparse subject coding network to obtain the sample sparse subject context vector.
3. The method of claim 2, wherein sampling the sparse subject word sequence in the sample subject word distribution matrix comprises:
Sampling the sample subject term distribution matrix to obtain a sample subject feature matrix;
inputting the sample theme feature matrix to a salient theme STL layer, and processing based on non-zero element number network parameters to obtain a sparse theme matrix;
performing feature textualization on the sparse topic matrix to obtain sparse topic word information, wherein the sparse topic word information at least comprises: the sparse subject word sequence.
4. The model construction method according to claim 3, wherein the sparse subject term information further includes: the weight vector corresponding to the sparse subject word sequence;
the step of inputting the sparse subject word sequence to a pre-training transducer network in the sparse subject coding network to obtain the sample sparse subject context vector comprises the following steps:
after the sparse subject word sequence is digitized, inputting the sparse subject word sequence into an encoder of the pre-training transducer network to obtain a context output vector;
and obtaining the sample sparse subject context vector based on the context output vector and the weight vector corresponding to the sparse subject word sequence.
5. The method according to claim 1, wherein the converting, by the sample sparse topic conversion unit, the sample sparse topic context vector based on a topic feature adaptation network in a text generation model to be trained to obtain a sample context vector includes:
Inputting the context vector of the sample sparse topic into a self-attention layer of the topic feature adaptation network through the sample sparse topic conversion unit to obtain a context vector weighting result;
inputting the context vector weighted result to a mean pooling layer to obtain a pooling result;
and carrying out linear interpolation calculation on the pooling result and random noise to obtain the sample context vector.
6. The method of claim 5, wherein inputting the sample sparse topic context vector to the self-attention layer of the topic feature adaptation network yields a context vector weighted result, comprising:
and inputting the sample sparse topic context vector into a K, V matrix of the self-attention layer, inputting network parameters of the topic feature adaptation network into a Q matrix of the self-attention layer to perform self-attention calculation, and obtaining the context vector weighting result.
7. The method of model construction according to claim 5, wherein the performing linear interpolation calculation on the pooled result and random noise to obtain the sample context vector includes:
Performing activation function calculation after the linear layer on the pooling result to obtain calculation weight for linear interpolation;
and performing linear interpolation calculation based on the calculation weight, the random noise and the pooling result to obtain the sample context vector.
8. The model construction method according to claim 1, wherein the text generation network includes a plurality of sequentially connected text generation modules, and the processing, by the sample text generation unit, the sample context vector and the sample text in the sample text dataset based on the text generation network in the text generation model to be trained, to obtain an output of the text generation network includes:
performing text vectorization on the sample text through the sample text generating unit to obtain a text vector;
inputting the text vector and the sample context vector to a first text generation module until the output of a last text generation module is obtained;
and mapping the output of the last text generation module into a vocabulary space through a linear layer with an activation function to obtain the output of the text generation network.
9. The model building method according to claim 8, wherein the processing steps of each of the text generation modules are as follows:
processing the first input of each text generation module through a mask multi-head attention layer to obtain a first output;
residual connection is carried out on the first output and the first input, so that a second output is obtained;
performing layer normalization operation on the second output to obtain a text intermediate result;
performing cross attention calculation on the text intermediate result and the sample context vector to obtain a third output;
residual connection is carried out on the third output and the text intermediate result, so that a fourth output is obtained;
and carrying out layer normalization operation on the fourth output to obtain the output of each text generation module.
10. The model construction method according to claim 9, wherein the first input is the text vector in the case where the text generation module is the first text generation module;
in the case where the text generation module is any text generation module other than the first text generation module, the first input is an output of a last text generation module.
11. The method of model construction according to claim 9, wherein the performing cross-attention computation on the text intermediate result and the sample context vector to obtain a third output comprises:
and inputting the sample context vector into a K, V matrix of a multi-head attention layer, and inputting the text intermediate result into a Q matrix of the multi-head attention layer to perform cross attention calculation so as to obtain the third output.
12. The model construction method according to claim 1, wherein the loss function is a negative log likelihood sum function, the calculating, by the model training unit, a loss function value based on an output of the text generation network, updating network parameters of the sparse topic coding network, the topic feature adaptation network, and the text generation network according to the loss function value until the loss function converges, and obtaining a trained text generation model includes:
calculating, by the model training unit, a corresponding negative log-likelihood sum based on an output of the text generation network;
and updating the network parameters based on the negative log likelihood sum until the average loss variation amplitude on the sample data set is smaller than a preset threshold value, determining that the loss function converges, and fixing the network parameters to obtain the trained text generation model.
13. The method according to any one of claims 1 to 12, wherein the obtaining, by the sample subject term obtaining unit, a sample subject term distribution matrix includes:
acquiring a built-in subject word distribution matrix in the sparse subject coding network through the sample subject word acquisition unit; or,
and acquiring a specific sample subject term distribution matrix input from the outside through the sample subject term acquisition unit.
14. A text generation method, characterized by being applied to a text generation system with controllable sparse topics, wherein the text generation system with controllable sparse topics at least comprises: the system comprises a target subject word acquisition unit, a target sparse subject conversion unit and a target text generation unit; the method comprises the following steps:
acquiring a target subject term distribution matrix through the target subject term acquisition unit;
processing the target subject word distribution matrix based on a sparse subject coding network in a text generation model by the target sparse subject acquisition unit to obtain a target sparse subject context vector;
converting the target sparse topic context vector based on a topic feature adaptation network in a text generation model by the target sparse topic conversion unit to obtain a target context vector;
Processing the target context vector and the text to be processed based on a text generation network in a text generation model by the target text generation unit to obtain a target predicted text;
the text generation model is a trained text generation model obtained by the model construction method according to any one of claims 1 to 13.
15. The text generation method according to claim 14, wherein the obtaining, by the target subject term obtaining unit, a target subject term distribution matrix includes:
determining a model mode of the text generation model through the target subject term acquisition unit;
and determining the target subject term distribution matrix based on the model mode.
16. The text generation method of claim 15, wherein the determining the target subject term distribution matrix based on the model pattern comprises:
under the condition that the model mode is a conventional mode, determining that the target subject word distribution matrix is a built-in subject word distribution matrix in the text generation model;
and under the condition that the model mode is an external theme mode, determining that the target subject word distribution matrix is a specific subject word distribution matrix input externally.
17. A method of generating text according to any of claims 14 to 16, wherein the sparse subject controllable text generating system further comprises: a parameter deriving unit and a keyword determining unit; the method further comprises the steps of:
the network parameters of the sparse topic coding network are derived through the parameter deriving unit;
and determining sparse topic keywords corresponding to the target predicted text based on the network parameters through the keyword determination unit.
18. A model building device, characterized in that it is applied to a text generation system with controllable sparse topic, the text generation system with controllable sparse topic at least comprises: the system comprises a sample subject word acquisition unit, a sample sparse subject conversion unit, a sample text generation unit and a model training unit; the device comprises:
the first acquisition module is used for acquiring a sample subject term distribution matrix through the sample subject term acquisition unit;
the first processing module is used for processing the sample subject word distribution matrix through the sample sparse subject acquisition unit based on a sparse subject coding network in a text generation model to be trained to obtain a sample sparse subject context vector;
The first conversion module is used for converting the sample sparse topic context vector through the sample sparse topic conversion unit based on a topic feature adaptation network in a text generation model to be trained to obtain a sample context vector;
the text output module is used for processing the sample context vector and the sample text in the sample text data set based on a text generation network in a text generation model to be trained through the sample text generation unit to obtain the output of the text generation network;
and the model training module is used for calculating a loss function value based on the output of the text generation network through the model training unit, updating network parameters of the sparse topic coding network, the topic feature adaptation network and the text generation network according to the loss function value until the loss function converges, and obtaining a trained text generation model.
19. A text generation device, characterized by being applied to a text generation system with controllable sparse topics, wherein the text generation system with controllable sparse topics at least comprises: the system comprises a target subject word acquisition unit, a target sparse subject conversion unit and a target text generation unit; the device comprises:
The second acquisition module is used for acquiring a target subject term distribution matrix through the target subject term acquisition unit;
the second processing module is used for processing the target subject word distribution matrix based on a sparse subject coding network in a text generation model through the target sparse subject acquisition unit to obtain a target sparse subject context vector;
the second conversion module is used for converting the target sparse topic context vector based on a topic feature adaptation network in a text generation model through the target sparse topic conversion unit to obtain a target context vector;
the text prediction module is used for processing the target context vector and the text to be processed based on a text generation network in a text generation model through the target text generation unit to obtain a target predicted text;
the text generation model is a trained text generation model obtained by the model construction method according to any one of claims 1 to 13.
20. An electronic device, comprising: a processor, a memory and a computer program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the model building method according to any one of claims 1 to 13 or the text generating method according to any one of claims 14 to 17.
21. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the model building method according to any one of claims 1 to 13, or the steps of the text generating method according to any one of claims 14 to 17.
CN202311754111.7A 2023-12-19 2023-12-19 Model construction method, text generation method, device, equipment and medium Active CN117436443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311754111.7A CN117436443B (en) 2023-12-19 2023-12-19 Model construction method, text generation method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311754111.7A CN117436443B (en) 2023-12-19 2023-12-19 Model construction method, text generation method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN117436443A true CN117436443A (en) 2024-01-23
CN117436443B CN117436443B (en) 2024-03-15

Family

ID=89553795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311754111.7A Active CN117436443B (en) 2023-12-19 2023-12-19 Model construction method, text generation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117436443B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220147770A1 (en) * 2020-11-06 2022-05-12 Adobe Inc. Machine-learning tool for generating segmentation and topic metadata for documents
CN116152267A (en) * 2023-04-24 2023-05-23 中国民用航空飞行学院 Point cloud instance segmentation method based on contrast language image pre-training technology
CN116543768A (en) * 2023-05-31 2023-08-04 平安科技(深圳)有限公司 Model training method, voice recognition method and device, equipment and storage medium
CN116932686A (en) * 2023-09-19 2023-10-24 苏州元脑智能科技有限公司 Theme mining method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117436443B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
CN109284506B (en) User comment emotion analysis system and method based on attention convolution neural network
CN108875807B (en) Image description method based on multiple attention and multiple scales
WO2020174826A1 (en) Answer generating device, answer learning device, answer generating method, and answer generating program
CN110326002B (en) Sequence processing using online attention
JP2021528796A (en) Neural network acceleration / embedded compression system and method using active sparsification
CN109344242B (en) Dialogue question-answering method, device, equipment and storage medium
CN111930894B (en) Long text matching method and device, storage medium and electronic equipment
CN109710953B (en) Translation method and device, computing equipment, storage medium and chip
CN113268609A (en) Dialog content recommendation method, device, equipment and medium based on knowledge graph
CN114676234A (en) Model training method and related equipment
CN112115687A (en) Problem generation method combining triples and entity types in knowledge base
CN112347756A (en) Reasoning reading understanding method and system based on serialized evidence extraction
CN109308316B (en) Adaptive dialog generation system based on topic clustering
CN114360502A (en) Processing method of voice recognition model, voice recognition method and device
CN110298046B (en) Translation model training method, text translation method and related device
CN113887836B (en) Descriptive event prediction method integrating event environment information
CN113177113B (en) Task type dialogue model pre-training method, device, equipment and storage medium
CN117236323B (en) Information processing method and system based on big data
CN116932686B (en) Theme mining method and device, electronic equipment and storage medium
CN117436443B (en) Model construction method, text generation method, device, equipment and medium
US20240037335A1 (en) Methods, systems, and media for bi-modal generation of natural languages and neural architectures
CN111797220A (en) Dialog generation method and device, computer equipment and storage medium
CN114550159A (en) Image subtitle generating method, device and equipment and readable storage medium
CN112466282B (en) Speech recognition system and method oriented to aerospace professional field
CN114254108A (en) Method, system and medium for generating Chinese text countermeasure sample

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant