CN115358242A - Method and device for generating user description text based on text generation network

Method and device for generating user description text based on text generation network

Info

Publication number
CN115358242A
Authority
CN
China
Prior art keywords
encoder
vector
user
feature
decoder
Prior art date
Legal status
Pending
Application number
CN202210949828.6A
Other languages
Chinese (zh)
Inventor
李怀松
张天翼
黄涛
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210949828.6A
Publication of CN115358242A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The embodiment of the specification provides a method and a device for generating a user description text based on a text generation network, wherein the method comprises the following steps: inputting each feature of a target user into a first encoder, acquiring each initial user feature vector corresponding to each feature through the first encoder, and encoding based on a self-attention mechanism to obtain an encoding state vector; inputting the coding state vector into a retrieval model, retrieving K sentences from an artificial knowledge base through the retrieval model, determining word coding vectors corresponding to all words contained in the K sentences, determining all attention coefficients according to output feedback vectors of a decoder and the word coding vectors, and performing weighted summation on all the word coding vectors according to all the attention coefficients to obtain semantic representation vectors; and inputting the coding state vector and the semantic representation vector into a decoder, generating a user description text of a target user through the decoder, and taking the hidden state of the decoder as an output feedback vector. The quality of the obtained text can be improved.

Description

Method and device for generating user description text based on text generation network
This application is a divisional application of the invention patent application with application number 202110189520.1, entitled "Method and device for generating user description text based on text generation network", filed on February 19, 2021.
Technical Field
One or more embodiments of the present specification relate to the field of computers, and more particularly, to a method and apparatus for generating user description text based on a text generation network.
Background
Since the user characteristics of a user are associated with the user's category, users can be classified based on their user characteristics. The user characteristics may be data such as the user's age, education, and income, and the user categories may include a plurality of predetermined categories, for example, whether there is a repayment risk or a risk of illegal funds transfer. Generally, presenting only the user characteristics and the resulting user category is not convincing, so after the user characteristics are obtained, a user description text needs to be generated based on them. The user description text comprises several sentences and embodies the association between the user characteristics and the user category. The user description text is required to be a normative passage that is logically compact, sufficiently reasoned, concise, and easy to understand.
In the prior art, user description texts are often generated by a text generation network, that is, in a machine learning manner. Because only a small number of training samples are available for training the text generation network, the quality of texts obtained in this manner is poor.
Accordingly, improved solutions are desired that improve the quality of the resulting text.
Disclosure of Invention
One or more embodiments of the present specification describe a method and an apparatus for generating a user description text based on a text generation network, which can improve the quality of the obtained text.
In a first aspect, a method for generating a user description text based on a text generation network is provided, the text generation network comprising a first encoder, a retrieval model and a decoder, the method comprising:
inputting various characteristics of a target user into a first encoder, acquiring initial user characteristic vectors corresponding to the various characteristics of the target user through the first encoder, and encoding the initial user characteristic vectors based on a self-attention mechanism to obtain encoding state vectors;
inputting the coding state vector into a retrieval model, retrieving K sentences from an artificial knowledge base through the retrieval model, determining word coding vectors corresponding to all words contained in the K sentences, determining attention coefficients corresponding to all words according to output feedback vectors of a decoder and the word coding vectors, and performing weighted summation on all word coding vectors according to the attention coefficients to obtain semantic representation vectors corresponding to the K sentences;
and inputting the coding state vector and the semantic representation vector into a decoder, generating a user description text of the target user through the decoder, and taking the hidden state of the decoder as the output feedback vector.
In a possible implementation manner, the first encoder includes a time-series-based unidirectional encoding structure, and the inputting each feature of the target user into the first encoder and obtaining, by the first encoder, each initial user feature vector corresponding to each feature of the target user includes:
and sequentially taking each feature of the target user as the input of each moment of the first encoder, and taking the output of each moment of the first encoder as the feature vector of each initial user.
In a possible implementation manner, the first encoder includes a time-series-based bidirectional encoding structure, the inputting each feature of the target user into the first encoder, and obtaining, by the first encoder, each initial user feature vector corresponding to each feature of the target user respectively includes:
inputting each feature of the target user into the first encoder in sequence according to a first sequence, and obtaining a first feature vector of each feature according to the output of each moment of the first encoder;
inputting each feature of the target user into the first encoder in sequence according to the sequence opposite to the first sequence, and obtaining a second feature vector of each feature according to the output of each moment of the first encoder;
and combining the first feature vector and the second feature vector corresponding to the same item of feature to serve as an initial user feature vector of the item of feature.
In a possible implementation, the encoding each initial user feature vector based on a self-attention mechanism includes:
and determining weights corresponding to the initial user characteristic vectors respectively, and performing weighted summation on the initial user characteristic vectors according to the weights to obtain the coding state vector.
In a possible implementation, the search model includes a second encoder, and the determining a word-encoding vector corresponding to each word included in the K statements includes:
acquiring word embedding vectors corresponding to all words contained in the K sentences;
and inputting each word embedding vector into a second encoder, and determining a word encoding vector corresponding to each word contained in the K sentences through the second encoder based on an attention mechanism.
In a possible implementation, the decoder includes a time-series-based decoding structure. The decoder takes the encoding state vector as an initial state, takes the decoder output at the previous moment and the semantic representation vector output by the retrieval model at the previous moment as the input at the current moment, and determines the output and hidden state at the current moment; the hidden state at the current moment is fed back to the retrieval model as the output feedback vector at the current moment, and the outputs at the respective moments correspond to the respective words in the user description text.
In one possible embodiment, the method further comprises:
and performing model training on at least one of the first encoder, the retrieval model and the decoder by using a first type of samples and a second type of samples, wherein the first type of samples have various characteristics of sample users and classification labels in two classifications corresponding to the sample users, the second type of samples have various characteristics of the sample users and sample description texts corresponding to the sample users, and the number of the first type of samples is greater than that of the second type of samples.
Further, the model training comprises:
performing model pre-training on the first encoder by using the first type samples;
continuing model training of at least one of the first encoder, the search model and the decoder after pre-training with the second type of samples.
Further, the model training comprises:
and mixing the first type samples and the second type samples together, randomly disturbing the sequence, and performing model training on at least one of the first encoder, the retrieval model and the decoder in batches according to the disturbed sequence.
Further, the model training utilizes a preset total loss function to determine the total prediction loss of the first type samples and the second type samples in the same batch, and the parameters of at least one of the first encoder, the retrieval model and the decoder are adjusted according to the total prediction loss; the total loss function is determined by a first loss function and a second loss function, wherein the function value of the first loss function is determined according to the probability of the first type of sample classification, and the function value of the second loss function is determined according to the probability of the second type of sample output by the decoder corresponding to each word in the sample description text.
In a second aspect, there is provided an apparatus for generating user description text based on a text generation network comprising a first encoder, a retrieval model and a decoder, the apparatus comprising:
the encoding unit is used for inputting various features of a target user into the first encoder, acquiring initial user feature vectors corresponding to the various features of the target user through the first encoder, and encoding the initial user feature vectors based on a self-attention mechanism to obtain encoding state vectors;
a retrieval unit, configured to input the coding state vector obtained by the coding unit into a retrieval model, retrieve K statements from an artificial knowledge base through the retrieval model, determine a word coding vector corresponding to each word included in the K statements, determine each attention coefficient corresponding to each word according to an output feedback vector of the decoder and the word coding vector, and perform weighted summation on each word coding vector according to each attention coefficient to obtain a semantic representation vector corresponding to the K statements;
and the decoding unit is used for inputting the coding state vector obtained by the coding unit and the semantic representation vector obtained by the retrieval unit into a decoder, generating a user description text of the target user through the decoder, and taking the hidden state of the decoder as the output feedback vector.
In a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
In a fourth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
According to the method and the device provided by the embodiments of this specification, each feature of a target user is input into a first encoder, initial user feature vectors corresponding to the respective features are obtained through the first encoder, and the initial user feature vectors are encoded based on a self-attention mechanism to obtain an encoding state vector. The encoding state vector is then input into a retrieval model, K sentences are retrieved from an artificial knowledge base through the retrieval model, word encoding vectors corresponding to the words contained in the K sentences are determined, attention coefficients corresponding to the words are determined according to the output feedback vector of the decoder and the word encoding vectors, and the word encoding vectors are weighted and summed according to the attention coefficients to obtain the semantic representation vector corresponding to the K sentences. Finally, the encoding state vector and the semantic representation vector are input into a decoder, the user description text of the target user is generated through the decoder, and the hidden state of the decoder is taken as the output feedback vector. As can be seen from the above, the embodiments of this specification utilize not only the initial user feature vectors corresponding to the features of the target user but also the semantic representation vector corresponding to the retrieved K sentences. Since the K sentences are derived from an artificial knowledge base, the manual experience most relevant to the target user can be effectively exploited, problems such as repeated words and wrong words can be well resolved, the applicability is strong, and the text quality is good. Moreover, the retrieval model and the decoder interact with each other: the semantic representation vector output by the retrieval model serves as an input of the decoder and affects the user description text generated by the decoder, while the hidden state of the decoder serves as the output feedback vector and forms part of the input of the attention mechanism of the retrieval model, affecting the semantic representation vector output by the retrieval model. The quality of the obtained text can thus be further improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in the present specification;
FIG. 2 illustrates a flow diagram of a method for generating user description text based on a text generation network, according to one embodiment;
FIG. 3 shows a schematic block diagram of a first encoder according to one embodiment;
FIG. 4 illustrates a structural diagram of a retrieval model according to one embodiment;
FIG. 5 shows a block diagram of a decoder according to an embodiment;
FIG. 6 illustrates a training sample shuffle order diagram in accordance with one embodiment;
FIG. 7 shows a schematic block diagram of an apparatus for generating user description text based on a text generation network according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Fig. 1 is a schematic view of an implementation scenario of an embodiment disclosed in this specification. The implementation scenario relates to generating a user description text based on a text generation network. The text generation network, as a neural network model, can be obtained in a machine learning manner, and its input is the set of features of a target user. It will be appreciated that a user may be classified based on the user's characteristics. The user characteristics may be data such as the user's age, education, and income, and the user categories may include a plurality of predetermined categories, for example, whether there is a repayment risk or a risk of illegal funds transfer. The user description text comprises several sentences and embodies the association between the user characteristics and the user category. The user description text is required to be a normative passage that is logically compact, sufficiently reasoned, concise, and easy to understand.
Referring to fig. 1, the table lists the feature names and corresponding feature values of the features of user A; the target user here is user A. For the feature name "age" the feature value is 50; for the feature name "education" the feature value is senior high school; ...; for the feature name "annual income" the feature value is 30,000 yuan. According to the feature names and corresponding feature values of the features of user A, the generated user description text is "User A is older, has a lower education level, ... and has a lower annual income, and therefore carries a repayment risk." In the embodiments of this specification, the user features may include, but are not limited to, the user attribute features listed above, such as age, education, and annual income, and may further include historical behavior features of the user in a specific application, for example, the historical borrowing amount and whether payments have been overdue. The specific content and manner of generation of the user description text are generally not fixed. In the embodiments of this specification, the user description text is generated by combining expert experience with machine learning; that is, the machine automatically generates the user description text with the help of expert (i.e., manual) experience, which is highly efficient. The embodiments of this specification utilize not only the initial user feature vectors corresponding to the features of the target user but also the semantic representation vector corresponding to the retrieved K sentences. Since the K sentences are derived from an artificial knowledge base, the manual experience most relevant to the target user can be effectively exploited, problems such as repeated words and wrong words can be well resolved, the applicability is strong, and the text quality is good. In addition, the output feedback vector produced during decoding influences the semantic representation vector output by the retrieval model, so the quality of the obtained text can be further improved.
Fig. 2 shows a flowchart of a method for generating user description text based on a text generation network comprising a first encoder, a search model and a decoder, which may be based on the implementation scenario shown in fig. 1, according to an embodiment. As shown in fig. 2, the method for generating a user description text based on a text-generating network in this embodiment includes the following steps: step 21, inputting each feature of a target user into a first encoder, obtaining each initial user feature vector corresponding to each feature of the target user through the first encoder, and encoding each initial user feature vector based on a self-attention mechanism to obtain an encoding state vector; step 22, inputting the coding state vector into a retrieval model, retrieving K sentences from an artificial knowledge base through the retrieval model, determining word coding vectors corresponding to the words contained in the K sentences, determining attention coefficients corresponding to the words according to the output feedback vector of the decoder and the word coding vectors, and performing weighted summation on the word coding vectors according to the attention coefficients to obtain semantic representation vectors corresponding to the K sentences; and step 23, inputting the coding state vector and the semantic representation vector into a decoder, generating a user description text of the target user through the decoder, and taking the hidden state of the decoder as the output feedback vector. Specific execution modes of the above steps are described below.
Firstly, in step 21, each feature of a target user is input into a first encoder, initial user feature vectors corresponding to the respective features of the target user are obtained through the first encoder, and the initial user feature vectors are encoded based on a self-attention mechanism to obtain an encoding state vector. It is understood that the first encoder may be based on various model structures, such as a Transformer, a long short-term memory (LSTM) network, or a gated recurrent unit (GRU).
In the embodiments of the present specification, the types of the features include: numeric or textual.
For example, the age of user A is 50; age is a numeric-type feature whose feature name is "age" and whose corresponding feature value is 50. The habitual residences of user A are Beijing and Shanghai; this is a text-type feature whose feature name is "habitual residence" and whose corresponding feature values are Beijing and Shanghai.
It will be appreciated that the type of feature is also the type of its corresponding feature value.
In the embodiment of the present specification, for a feature whose type is a numerical type, a feature name of the feature and an original feature value corresponding to the feature name may be input to a first encoder; for the feature with text type, the corresponding original feature value can be firstly subjected to word segmentation processing to obtain a plurality of word segmentation results, and then the feature name of the feature and the plurality of word segmentation results corresponding to the feature are input into the first encoder.
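As an illustration of this pre-processing step, the following is a minimal sketch; the names (Feature, tokenize, preprocess_features) are assumptions introduced here for clarity and do not appear in the patent, and the tokenizer is a stand-in for a real word-segmentation step.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Feature:
    name: str                 # e.g. "age" or "habitual residence"
    value: Union[float, str]  # numeric value or raw text

def tokenize(text: str) -> List[str]:
    # Stand-in for a real (e.g. Chinese) word-segmentation step.
    return text.split()

def preprocess_features(features: List[Feature]) -> List[List[str]]:
    """Turn each feature into the token sequence fed to the first encoder."""
    inputs = []
    for f in features:
        if isinstance(f.value, (int, float)):
            # Numeric-type feature: feature name plus the original feature value.
            inputs.append([f.name, str(f.value)])
        else:
            # Text-type feature: feature name plus its word-segmentation results.
            inputs.append([f.name] + tokenize(f.value))
    return inputs

print(preprocess_features([Feature("age", 50), Feature("habitual residence", "Beijing Shanghai")]))
# [['age', '50'], ['habitual residence', 'Beijing', 'Shanghai']]
```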
In one example, the first encoder includes a time-series-based unidirectional encoding structure, the inputting the features of the target user into the first encoder, and obtaining, by the first encoder, initial user feature vectors corresponding to the features of the target user, respectively, includes:
and sequentially taking the various features of the target user as the input of each moment of the first encoder, and taking the output of each moment of the first encoder as the feature vector of each initial user.
In another example, the first encoder includes a time-series-based bidirectional encoding structure, the inputting the features of the target user into the first encoder, and obtaining, by the first encoder, initial user feature vectors corresponding to the features of the target user, respectively, includes:
inputting each feature of the target user into the first encoder in sequence according to a first sequence, and obtaining a first feature vector of each feature according to the output of each moment of the first encoder;
inputting each feature of the target user into the first encoder in sequence according to the sequence opposite to the first sequence, and obtaining a second feature vector of each feature according to the output of each moment of the first encoder;
and combining the first feature vector and the second feature vector corresponding to the same item of feature to serve as an initial user feature vector of the item of feature.
In one example, the encoding the initial user feature vectors based on a self-attention mechanism includes:
and determining weights corresponding to the initial user characteristic vectors respectively, and performing weighted summation on the initial user characteristic vectors according to the weights to obtain the coding state vector.
Fig. 3 shows a schematic structural diagram of a first encoder according to an embodiment. Referring to fig. 3, the first encoder includes a time-series-based bidirectional encoding structure and a self-attention layer. The features of the target user, namely feature 1, feature 2, feature 3, ..., feature n, are input into the bidirectional encoding structure, and the initial user feature vectors h1, h2, h3, ..., hn corresponding to the respective features are obtained through the bidirectional encoding structure; after the initial user feature vectors are input into the self-attention layer, the encoding state vector is obtained, denoted by X.
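A minimal PyTorch sketch of this encoder structure follows: a bidirectional recurrent layer produces the initial user feature vectors h1..hn, and a self-attention layer pools them by weighted summation into the encoding state vector X. All module names and dimensions are illustrative assumptions, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class FirstEncoder(nn.Module):
    def __init__(self, emb_dim=64, hidden=64):
        super().__init__()
        # Time-series-based bidirectional encoding structure (a GRU here).
        self.birnn = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.score = nn.Linear(2 * hidden, 1)    # self-attention scoring

    def forward(self, feats):                    # feats: (batch, n_features, emb_dim)
        h, _ = self.birnn(feats)                 # h1..hn: forward and backward vectors combined
        w = torch.softmax(self.score(h), dim=1)  # one weight per initial user feature vector
        X = (w * h).sum(dim=1)                   # weighted summation -> encoding state vector X
        return X, h

enc = FirstEncoder()
X, h = enc(torch.randn(2, 5, 64))                # 2 users, 5 features each
print(X.shape)                                   # torch.Size([2, 128])
```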
Then, in step 22, the coding state vector is input into a search model, K sentences are searched from an artificial knowledge base through the search model, word coding vectors corresponding to words included in the K sentences are determined, attention coefficients corresponding to the words are determined according to the output feedback vector of the decoder and the word coding vectors, and the word coding vectors are subjected to weighted summation according to the attention coefficients to obtain semantic representation vectors corresponding to the K sentences. It can be understood that the sentences in the artificial knowledge base embody artificial experiences, the artificial experiences related to the target user can be obtained through a retrieval mode, and the semantic representation vectors output by the retrieval model are influenced through the output feedback vectors of the decoder, so that the quality of the obtained text can be further improved.
In one example, the searching model comprises a second encoder, and the determining word encoding vectors corresponding to the words contained in the K sentences comprises:
acquiring word embedding vectors corresponding to words contained in the K sentences;
and inputting each word embedding vector into a second encoder, and determining a word encoding vector corresponding to each word contained in the K sentences through the second encoder based on an attention mechanism.
FIG. 4 illustrates a structural schematic of a retrieval model according to one embodiment. Referring to FIG. 4, the retrieval model includes a retrieval network, a second encoder, and a self-attention layer. The encoding state vector X is input into the retrieval network, and K sentences are retrieved from the artificial knowledge base through the retrieval network. Here N is the total number of sentences contained in the artificial knowledge base; N is usually a large number, for example hundreds or thousands, while K is a predetermined number, for example 2, 3, or 5. The word embedding vector w_i of each word contained in the K sentences is input into the second encoder, and the word encoding vector s_i of each word is determined by the second encoder; the word encoding vectors s_i of the words and the output feedback vector F of the decoder are input into the self-attention layer, and the semantic representation vector H corresponding to the K sentences is obtained through the self-attention layer.
In embodiments of the present description, the second encoder may likewise be based on various model structures, such as a Transformer, LSTM, or GRU model structure.
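The following is a minimal PyTorch sketch of the retrieval model of FIG. 4, assuming cosine-similarity retrieval over pre-encoded knowledge-base sentence vectors and dot-product attention; these concrete choices, and all names, are illustrative assumptions rather than details fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as tf

class RetrievalModel(nn.Module):
    def __init__(self, dim, kb_vecs, kb_word_embs, K=3):
        super().__init__()
        self.kb_vecs = kb_vecs            # (N, dim): one vector per knowledge-base sentence
        self.kb_word_embs = kb_word_embs  # list of (len_j, dim) word embeddings per sentence
        self.second_encoder = nn.GRU(dim, dim, batch_first=True)
        self.K = K

    def forward(self, X, F_vec):
        # Retrieve the K sentences most relevant to the encoding state vector X.
        sims = tf.cosine_similarity(self.kb_vecs, X.unsqueeze(0), dim=-1)
        topk = sims.topk(self.K).indices.tolist()
        words = torch.cat([self.kb_word_embs[j] for j in topk], dim=0)  # words w_i of the K sentences
        s, _ = self.second_encoder(words.unsqueeze(0))  # word encoding vectors s_i
        s = s.squeeze(0)
        # Attention coefficients from the decoder's output feedback vector F.
        attn = torch.softmax(s @ F_vec, dim=0)          # one coefficient per word
        H = (attn.unsqueeze(-1) * s).sum(dim=0)         # weighted summation -> semantic vector H
        return H
```

At generation time the decoder would call this module once per step, passing its current hidden state as F_vec, so that the semantic representation vector H changes from moment to moment.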
Finally, in step 23, the encoding state vector and the semantic representation vector are input into a decoder, a user description text of the target user is generated by the decoder, and the hidden state of the decoder is used as the output feedback vector. It can be understood that the search model obtains the semantic representation vector as an input of the decoder, and the hidden state of the decoder serves as an output feedback vector, which acts on the search model to influence the semantic representation vector obtained by the search model, and through interaction between the search model and the decoder, the text quality of the generated user description text can be improved.
In one example, the decoder includes a time-series-based decoding structure. The decoder takes the encoding state vector as an initial state, takes the decoder output at the previous moment and the semantic representation vector output by the retrieval model at the previous moment as the input at the current moment, and determines the output and hidden state at the current moment; the hidden state at the current moment is fed back to the retrieval model as the output feedback vector at the current moment, and the outputs at the respective moments correspond to the respective words in the user description text.
Fig. 5 shows a schematic block diagram of a decoder according to an embodiment. Referring to fig. 5, the decoder includes a time-series-based decoding structure. The decoder takes the encoding state vector X as the initial state, takes the decoder output y(t-1) at the previous moment and the semantic representation vector H(t-1) output by the retrieval model at the previous moment as the input at the current moment, and determines the output y(t) and the hidden state h(t) at the current moment; the hidden state h(t) at the current moment is fed back to the retrieval model as the output feedback vector F(t) at the current moment, and the outputs at the respective moments correspond to the respective words in the user description text. It will be appreciated that, since each moment has a different hidden state, the output feedback vector at each moment is different, and accordingly the semantic representation vector at each moment is different.
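A minimal PyTorch sketch of this decoding loop follows. At each step the recurrent cell consumes the previous output y(t-1) together with the previous semantic representation vector H(t-1), and its hidden state h(t) is fed back to the retrieval model as F(t). Greedy word selection, the GRU cell, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, dim, vocab):
        super().__init__()
        self.cell = nn.GRUCell(2 * dim, dim)  # input: [embedding of y(t-1); H(t-1)]
        self.emb = nn.Embedding(vocab, dim)
        self.out = nn.Linear(dim, vocab)

    def generate(self, X, retrieval, max_len=30, bos=1):
        h = X                                 # encoding state vector X as the initial state
        y = torch.tensor([bos])
        words = []
        H = retrieval(X, h)                   # initial semantic representation vector
        for _ in range(max_len):
            inp = torch.cat([self.emb(y).squeeze(0), H], dim=-1)
            h = self.cell(inp.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
            y = self.out(h).argmax(dim=-1, keepdim=True)  # next word (greedy choice)
            words.append(int(y))
            H = retrieval(X, h)               # h(t) acts as the output feedback vector F(t)
        return words                          # word ids forming the user description text
```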
In one example, the method further comprises:
and performing model training on at least one of the first encoder, the retrieval model and the decoder by using a first type of samples and a second type of samples, wherein the first type of samples have various characteristics of sample users and classification labels in two classifications corresponding to the sample users, the second type of samples have various characteristics of the sample users and sample description texts corresponding to the sample users, and the number of the first type of samples is greater than that of the second type of samples.
It is understood that the embodiments of this specification can classify the target user based on the encoding state vector obtained by the first encoder.
Table 1 shows the sample composition of the first type of samples.

Table 1

            Feature 1    Feature 2    Feature 3    Classification label
Sample 1    ...          ...          ...          Black
Sample 2    ...          ...          ...          White
Sample 3    ...          ...          ...          Black
As can be seen from Table 1, samples 1, 2, and 3 belong to the first type of samples: a first-type sample has the features of a sample user and the classification label, out of two classes, corresponding to the sample user, but does not have a sample description text corresponding to the sample user. In different domains, the classification label may have different meanings; for example, in the anti-illegal-funds-transfer domain, a black label represents that the sample user carries a risk of illegal funds transfer, and a white label represents that the sample user does not.
Table 2 shows the sample composition of the second type of samples.

Table 2

(Table 2 appears as an image in the original publication; it lists samples 21, 22, and 23 together with their user features and the corresponding sample description texts.)
As can be seen from Table 2, samples 21, 22, and 23 belong to the second type of samples: a second-type sample has the features of a sample user and the sample description text corresponding to the sample user, but does not have the classification label, out of the two classes, corresponding to the sample user.
In the embodiments of this specification, since the sample description text is usually written manually, comprises several sentences, and is not easily obtained, the number of second-type samples is small. Classification labels are easily obtained, so the number of first-type samples is large; the first-type samples outnumber the second-type samples by orders of magnitude. The classification label of a first-type sample is associated with the sample's features, and the sample description text of a second-type sample is likewise associated with the sample's features; the underlying logic of the two is consistent, and so are the convergence directions during model training. Therefore, the first-type samples can be used to help train the text generation network, which effectively alleviates the problems of insufficient learning and poor generalization caused by the small number of second-type samples.
In one example, the model training includes:
performing model pre-training on the first encoder by using the first type samples;
continuing model training on at least one of the first encoder, the retrieval model and the decoder after pre-training by using the second type samples.
In another example, the model training includes:
and mixing the first type samples and the second type samples together, randomly scrambling the sequence, and performing model training on at least one of the first encoder, the retrieval model and the decoder in batches according to the scrambled sequence.
FIG. 6 illustrates the shuffled order of training samples according to one embodiment. Referring to fig. 6, samples 1 to 5 are first-type samples and samples 6 to 10 are second-type samples. Originally the first-type samples and the second-type samples are each ordered separately; after the order is scrambled, the two types are mixed and interspersed, so a batch of training samples drawn in the scrambled order contains both types. For example, with a batch size of 5 and the order sample 8, sample 2, sample 6, sample 4, sample 5, the batch contains the first-type samples 2, 4, and 5 and the second-type samples 8 and 6.
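A minimal sketch of this mixed-batch ordering follows; the sample contents are placeholders, and only the pooling, shuffling, and batching logic is illustrated.

```python
import random

# First-type samples: user features plus a classification label.
first_type = [("sample %d" % i, "label") for i in range(1, 6)]
# Second-type samples: user features plus a sample description text.
second_type = [("sample %d" % i, "description text") for i in range(6, 11)]

mixed = first_type + second_type
random.shuffle(mixed)            # randomly scramble the order

batch_size = 5
batches = [mixed[i:i + batch_size] for i in range(0, len(mixed), batch_size)]
for batch in batches:
    # Each batch can now contain both sample types; one training step is
    # performed per batch using the total loss described below.
    print([name for name, _ in batch])
```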
Further, the model training utilizes a preset total loss function to determine the total prediction loss of the first type samples and the second type samples in the same batch, and the parameters of at least one of the first encoder, the retrieval model and the decoder are adjusted according to the total prediction loss; the total loss function is determined by a first loss function and a second loss function, wherein the function value of the first loss function is determined according to the probability of the first type of sample classification, and the function value of the second loss function is determined according to the probability of the second type of sample output by the decoder corresponding to each word in the sample description text.
For example, the total prediction loss is denoted by loss, the function value of the first loss function is denoted by l_c, and the function value of the second loss function is denoted by l_g; then loss = l_g + l_c.
The first loss function may be a cross-entropy loss function, which may be represented by the following formula:

l_c = −(1/n) · Σ_{i=1..n} [ y_i · log(p_i) + (1 − y_i) · log(1 − p_i) ]

where n represents the number of first-type samples and i the sample index; y_i takes the value 1 when the i-th sample is classified as class one and 0 when it is classified as class two; p_i represents the probability that the i-th sample is classified as class one.
The second loss function may be a cross-entropy loss function, which may be represented by the following formula:

l_g = −(1/n) · Σ_{i=1..n} [ y_i · log(p_wi) + (1 − y_i) · log(1 − p_wi) ]

where n represents the number of words output by the decoder and i the word index; y_i takes the value 1 when the i-th word output by the decoder belongs to the sample description text and 0 otherwise; p_wi represents the probability output by the decoder for the i-th word.
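The following is a minimal PyTorch sketch of the total prediction loss over one mixed batch, assuming both loss terms take the standard binary cross-entropy form given above; the function name and tensor layout are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def total_loss(p_class, y_class, p_words, y_words):
    # p_class / y_class: class-one probabilities and 0/1 labels of the
    # first-type samples in the batch.
    # p_words / y_words: decoder word probabilities and 0/1 labels marking
    # whether each output word belongs to the sample description text.
    l_c = F.binary_cross_entropy(p_class, y_class)  # first loss function
    l_g = F.binary_cross_entropy(p_words, y_words)  # second loss function
    return l_g + l_c                                # loss = l_g + l_c

loss = total_loss(torch.tensor([0.9, 0.2]), torch.tensor([1.0, 0.0]),
                  torch.tensor([0.8, 0.7, 0.1]), torch.tensor([1.0, 1.0, 0.0]))
print(float(loss))  # backpropagating this adjusts encoder/retrieval-model/decoder parameters
```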
According to the method provided by the embodiments of this specification, each feature of a target user is input into a first encoder, initial user feature vectors corresponding to the respective features are obtained through the first encoder, and the initial user feature vectors are encoded based on a self-attention mechanism to obtain an encoding state vector. The encoding state vector is then input into a retrieval model, K sentences are retrieved from an artificial knowledge base through the retrieval model, word encoding vectors corresponding to the words contained in the K sentences are determined, attention coefficients corresponding to the words are determined according to the output feedback vector of the decoder and the word encoding vectors, and the word encoding vectors are weighted and summed according to the attention coefficients to obtain the semantic representation vector corresponding to the K sentences. Finally, the encoding state vector and the semantic representation vector are input into a decoder, the user description text of the target user is generated through the decoder, and the hidden state of the decoder is taken as the output feedback vector. As can be seen from the above, the embodiments of this specification utilize not only the initial user feature vectors corresponding to the features of the target user but also the semantic representation vector corresponding to the retrieved K sentences. Since the K sentences are derived from an artificial knowledge base, the manual experience most relevant to the target user can be effectively exploited, problems such as repeated words and wrong words can be well resolved, the applicability is strong, and the text quality is good. Moreover, the retrieval model and the decoder interact with each other: the semantic representation vector output by the retrieval model serves as an input of the decoder and affects the user description text generated by the decoder, while the hidden state of the decoder serves as the output feedback vector and forms part of the input of the attention mechanism of the retrieval model, affecting the semantic representation vector output by the retrieval model. The quality of the obtained text can thus be further improved.
According to an embodiment of another aspect, there is also provided an apparatus for generating a user description text based on a text generation network, where the text generation network includes a first encoder, a search model and a decoder, and the apparatus is configured to perform the method for generating a user description text based on a text generation network provided in the embodiments of this specification. FIG. 7 shows a schematic block diagram of an apparatus for generating user description text based on a text generation network according to one embodiment. As shown in fig. 7, the apparatus 700 includes:
the encoding unit 71 is configured to input each feature of the target user into the first encoder, obtain each initial user feature vector corresponding to each feature of the target user through the first encoder, and encode each initial user feature vector based on a self-attention mechanism to obtain an encoded state vector;
a retrieving unit 72, configured to input the coding state vector obtained by the encoding unit 71 into a retrieval model, retrieve K sentences from an artificial knowledge base through the retrieval model, determine word coding vectors corresponding to words included in the K sentences, determine attention coefficients corresponding to the words according to output feedback vectors of the decoder and the word coding vectors, and perform weighted summation on the word coding vectors according to the attention coefficients to obtain semantic representation vectors corresponding to the K sentences;
and a decoding unit 73, configured to input the coding state vector obtained by the coding unit 71 and the semantic representation vector obtained by the retrieving unit 72 into a decoder, generate, by the decoder, a user description text of the target user, where a hidden state of the decoder is used as the output feedback vector.
Optionally, as an embodiment, the first encoder includes a time-series-based unidirectional coding structure, and the coding unit 71 is specifically configured to sequentially use each feature of the target user as an input of each time of the first encoder, and use an output of each time of the first encoder as each initial user feature vector.
Optionally, as an embodiment, the first encoder includes a time-series based bidirectional encoding structure, and the encoding unit 71 includes:
the first coding subunit is used for sequentially inputting each feature of the target user into the first coder according to a first sequence and obtaining a first feature vector of each feature according to the output of each moment of the first coder;
a second encoding subunit, configured to sequentially input each feature of the target user into the first encoder according to an order opposite to the first order, and obtain a second feature vector of each feature according to an output of each time of the first encoder;
and the combination subunit is used for combining the first feature vector obtained by the first coding subunit and the second feature vector obtained by the second coding subunit, which correspond to the same item of feature, as the initial user feature vector of the item of feature.
Optionally, as an embodiment, the encoding unit 71 is specifically configured to determine weights corresponding to the initial user feature vectors, and perform weighted summation on the initial user feature vectors according to the weights to obtain the encoding state vector.
Optionally, as an embodiment, the retrieving model includes a second encoder, and the retrieving unit 72 includes:
an obtaining subunit, configured to obtain word embedding vectors corresponding to words included in the K statements;
and the coding subunit is used for inputting the word embedding vectors obtained by the obtaining subunit into a second coder, and determining the word coding vectors corresponding to the words contained in the K sentences through the second coder based on an attention mechanism.
Optionally, as an embodiment, the decoder includes a time-series-based decoding structure. The decoder takes the encoding state vector as an initial state, takes the decoder output at the previous moment and the semantic representation vector output by the retrieval model at the previous moment as the input at the current moment, and determines the output and hidden state at the current moment; the hidden state at the current moment is fed back to the retrieval model as the output feedback vector at the current moment, and the outputs at the respective moments correspond to the respective words in the user description text.
Optionally, as an embodiment, the apparatus further includes:
and the training unit is used for carrying out model training on at least one of the first encoder, the retrieval model and the decoder by utilizing a first type sample and a second type sample, wherein the first type sample has various characteristics of a sample user and classification labels in two classifications corresponding to the sample user, the second type sample has various characteristics of the sample user and a sample description text corresponding to the sample user, and the number of the first type samples is greater than that of the second type samples.
Further, the training unit comprises:
a first training subunit, configured to perform model pre-training on the first encoder by using the first type of samples;
and the second training subunit is configured to continue model training, by using the second type of sample, on at least one of the first encoder, the search model, and the decoder, which is obtained by the first training subunit and is pre-trained.
Further, the training unit is specifically configured to mix the first type samples and the second type samples together, randomly shuffle the order, and perform model training on at least one of the first encoder, the search model, and the decoder in batches according to the shuffled order.
Further, the model training utilizes a preset total loss function to determine the total prediction loss of the first type samples and the second type samples in the same batch, and the parameters of at least one of the first encoder, the retrieval model and the decoder are adjusted according to the total prediction loss; the total loss function is determined by a first loss function and a second loss function, wherein the function value of the first loss function is determined according to the probability of the first type of sample classification, and the function value of the second loss function is determined according to the probability of the second type of sample output by the decoder corresponding to each word in the sample description text.
With the device provided by the embodiments of this specification, the encoding unit 71 first inputs each feature of the target user into the first encoder, obtains the initial user feature vectors corresponding to the respective features through the first encoder, and encodes the initial user feature vectors based on a self-attention mechanism to obtain an encoding state vector. The retrieval unit 72 then inputs the encoding state vector into a retrieval model, retrieves K sentences from an artificial knowledge base through the retrieval model, determines the word encoding vectors corresponding to the words contained in the K sentences, determines the attention coefficients corresponding to the words according to the output feedback vector of the decoder and the word encoding vectors, and performs weighted summation on the word encoding vectors according to the attention coefficients to obtain the semantic representation vector corresponding to the K sentences. Finally, the decoding unit 73 inputs the encoding state vector and the semantic representation vector into a decoder and generates the user description text of the target user through the decoder, with the hidden state of the decoder taken as the output feedback vector. As can be seen from the above, the embodiments of this specification utilize not only the initial user feature vectors corresponding to the features of the target user but also the semantic representation vector corresponding to the retrieved K sentences. Since the K sentences are derived from an artificial knowledge base, the manual experience most relevant to the target user can be effectively exploited, problems such as repeated words and wrong words can be well resolved, the applicability is strong, and the text quality is good. Moreover, the retrieval model and the decoder interact with each other: the semantic representation vector output by the retrieval model serves as an input of the decoder and affects the user description text generated by the decoder, while the hidden state of the decoder serves as the output feedback vector and forms part of the input of the attention mechanism of the retrieval model, affecting the semantic representation vector output by the retrieval model. The quality of the obtained text can thus be further improved.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of another aspect, there is also provided a computing device including a memory and a processor, the memory having stored therein executable code, and the processor implementing the method described in conjunction with fig. 2 when executing the executable code.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of generating user description text based on a text generation network, the text generation network comprising a first encoder, a retrieval model and a decoder, the method comprising:
inputting various features of a target user into a first encoder, acquiring initial user feature vectors respectively corresponding to the various features of the target user through the first encoder, and encoding the initial user feature vectors based on a self-attention mechanism to obtain encoding state vectors;
inputting the coding state vector into a retrieval model, retrieving K sentences from an artificial knowledge base through the retrieval model, determining word coding vectors corresponding to all words contained in the K sentences, determining attention coefficients corresponding to all words according to output feedback vectors of a decoder and the word coding vectors, and performing weighted summation on all word coding vectors according to the attention coefficients to obtain semantic representation vectors corresponding to the K sentences;
and inputting the coding state vector and the semantic representation vector into a decoder, wherein the decoder comprises a decoding structure based on time sequence, the decoder takes the coding state vector as an initial state, takes the decoder output at the previous moment and the semantic representation vector output at the previous moment of the retrieval model as the input at the current moment, determines the output and hidden state at the current moment, the hidden state at the current moment is fed back to the retrieval model as the output feedback vector at the current moment, and the output at each moment respectively corresponds to each word to form the user description text of the target user.
2. The method of claim 1, wherein the text-generating network is trained by:
performing model training on the first encoder by using a first type of sample, wherein the first type of sample has various characteristics of a sample user and classification labels in two types of classifications corresponding to the sample user, and classifying the sample user based on a coding state vector obtained by the first encoder in the model training; and performing model training on at least one of the first encoder, the retrieval model and the decoder by using a second type of samples, wherein the second type of samples have various characteristics of sample users and sample description texts corresponding to the sample users, and the number of the first type of samples is greater than that of the second type of samples.
3. The method of claim 2, wherein the model training comprises:
and mixing the first type samples and the second type samples together, randomly scrambling the sequence, dividing the samples into batches according to the scrambled sequence, determining the total prediction loss of the first type samples and the second type samples in the same batch by using a preset total loss function, and adjusting the parameter of at least one of the first encoder, the retrieval model and the decoder according to the total prediction loss.
4. The method of claim 3, wherein the total loss function is determined by a first loss function and a second loss function, the value of the first loss function being determined according to the classification probabilities predicted for the first-type samples, and the value of the second loss function being determined according to the probabilities, output by the decoder for the second-type samples, of the words in the sample description text.
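
The sketch below illustrates one plausible reading of claims 3 and 4: a shuffled batch mixes both sample types, the first-type samples contribute a classification cross-entropy term and the second-type samples a per-word generation term. Summing the two terms is an assumption of the sketch, since claim 4 does not fix how the two loss functions are combined.

    # Hedged sketch of the mixed-batch total loss of claims 3 and 4.
    import torch
    import torch.nn.functional as F

    def total_loss(cls_logits, cls_labels, word_logits, word_targets):
        # First loss: classification probabilities of the first-type samples.
        l1 = F.cross_entropy(cls_logits, cls_labels)
        # Second loss: per-word probabilities output by the decoder for the
        # second-type samples; word_logits is (batch, seq_len, vocab).
        l2 = F.cross_entropy(word_logits.transpose(1, 2), word_targets)
        return l1 + l2                             # one possible combination

    loss = total_loss(torch.randn(8, 2, requires_grad=True),
                      torch.randint(0, 2, (8,)),
                      torch.randn(4, 12, 1000, requires_grad=True),
                      torch.randint(0, 1000, (4, 12)))
    loss.backward()                                # drives the parameter adjustment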
5. The method of claim 1, wherein the first encoder comprises a time-sequence-based unidirectional coding structure, and wherein inputting the features of the target user into the first encoder and obtaining, through the first encoder, the initial user feature vectors respectively corresponding to the features comprises:
sequentially taking the features of the target user as the inputs of the first encoder at successive moments, and taking the output of the first encoder at each moment as the initial user feature vector of the corresponding feature.
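
A minimal sketch of claim 5's unidirectional, time-sequence-based encoding, assuming a GRU as the recurrent structure and arbitrary sizes:

    # Hedged sketch: each feature enters at one moment; each moment's output is
    # taken as that feature's initial user feature vector (claim 5).
    import torch
    import torch.nn as nn

    enc = nn.GRU(input_size=16, hidden_size=64, batch_first=True)
    features = torch.randn(1, 5, 16)          # five user features, one per moment
    outputs, _ = enc(features)                # outputs[:, i] is the initial user
                                              # feature vector of the i-th feature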
6. The method of claim 1, wherein the first encoder comprises a time-sequence-based bidirectional coding structure, and wherein inputting the features of the target user into the first encoder and obtaining, through the first encoder, the initial user feature vectors respectively corresponding to the features comprises:
inputting the features of the target user into the first encoder sequentially in a first order, and obtaining a first feature vector of each feature from the output of the first encoder at each moment;
inputting the features of the target user into the first encoder sequentially in the order opposite to the first order, and obtaining a second feature vector of each feature from the output of the first encoder at each moment;
and combining the first feature vector and the second feature vector corresponding to the same feature to obtain the initial user feature vector of that feature.
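
A corresponding sketch of claim 6, using PyTorch's bidirectional flag to perform the forward pass, the reverse pass and the per-feature combination in one call; treating concatenation as the claimed "combining" is an assumption of the sketch:

    # Hedged sketch: bidirectional encoding with concatenated outputs (claim 6).
    import torch
    import torch.nn as nn

    enc = nn.GRU(16, 64, batch_first=True, bidirectional=True)
    features = torch.randn(1, 5, 16)
    outputs, _ = enc(features)                # (1, 5, 128): for each feature, the
                                              # first and second feature vectors
                                              # concatenated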
7. The method of claim 1, wherein encoding the initial user feature vectors based on a self-attention mechanism comprises:
determining a weight corresponding to each initial user feature vector, and performing weighted summation of the initial user feature vectors according to the weights to obtain the coding state vector.
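
A sketch of claim 7's weighted summation, scoring each initial user feature vector against an assumed learned query vector; the claim itself only requires that per-vector weights be determined and a weighted sum be taken:

    # Hedged sketch: self-attention pooling into the coding state vector (claim 7).
    import torch
    import torch.nn.functional as F

    vectors = torch.randn(5, 64)              # initial user feature vectors
    query = torch.randn(64, requires_grad=True)    # assumed learned query
    weights = F.softmax(vectors @ query, dim=0)    # one weight per vector
    coding_state = (weights.unsqueeze(1) * vectors).sum(dim=0)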
8. An apparatus for generating user description text based on a text generation network, the text generation network comprising a first encoder, a retrieval model and a decoder, the apparatus comprising:
an encoding unit, configured to input features of a target user into the first encoder, obtain, through the first encoder, initial user feature vectors respectively corresponding to the features, and encode the initial user feature vectors based on a self-attention mechanism to obtain a coding state vector;
a retrieval unit, configured to input the coding state vector obtained by the encoding unit into the retrieval model, retrieve K sentences from an artificial knowledge base through the retrieval model, determine the word coding vectors corresponding to the words contained in the K sentences, determine an attention coefficient for each word according to an output feedback vector of the decoder and the word coding vectors, and perform weighted summation of the word coding vectors according to the attention coefficients to obtain a semantic representation vector corresponding to the K sentences;
and a decoding unit, configured to input the coding state vector and the semantic representation vector into the decoder, wherein the decoder comprises a time-sequence-based decoding structure; the decoder takes the coding state vector as its initial state and, at each current moment, takes the decoder output at the previous moment and the semantic representation vector output by the retrieval model at the previous moment as input, and determines the output and hidden state at the current moment; the hidden state at the current moment is fed back to the retrieval model as the output feedback vector at the current moment, and the words corresponding to the outputs at the respective moments form the user description text of the target user.
9. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-7.
10. A computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of any of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210949828.6A CN115358242A (en) 2021-02-19 2021-02-19 Method and device for generating user description text based on text generation network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210949828.6A CN115358242A (en) 2021-02-19 2021-02-19 Method and device for generating user description text based on text generation network
CN202110189520.1A CN112949315B (en) 2021-02-19 2021-02-19 Method and device for generating user description text based on text generation network

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202110189520.1A Division CN112949315B (en) 2021-02-19 2021-02-19 Method and device for generating user description text based on text generation network

Publications (1)

Publication Number Publication Date
CN115358242A true CN115358242A (en) 2022-11-18

Family

ID=76244358

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110189520.1A Active CN112949315B (en) 2021-02-19 2021-02-19 Method and device for generating user description text based on text generation network
CN202210949828.6A Pending CN115358242A (en) 2021-02-19 2021-02-19 Method and device for generating user description text based on text generation network


Country Status (1)

Country Link
CN (2) CN112949315B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263348A * 2019-03-06 2019-09-20 Tencent Technology (Shenzhen) Co., Ltd. Interpretation method, device, computer equipment and storage medium
CN111597819B * 2020-05-08 2021-01-26 Hohai University Dam defect image description text generation method based on keywords
CN112214652B * 2020-10-19 2023-09-29 Alipay (Hangzhou) Information Technology Co., Ltd. Message generation method, device and equipment

Also Published As

Publication number Publication date
CN112949315B (en) 2022-07-22
CN112949315A (en) 2021-06-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination