CN114911895A - Text generation method, device and storage medium


Info

Publication number
CN114911895A
Authority
CN
China
Prior art keywords
binary tree
semantic vector
text
text generation
tree structure
Prior art date
Legal status: Pending
Application number
CN202110174537.XA
Other languages
Chinese (zh)
Inventor
兰国兴
许娟婷
隋志成
周力
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202110174537.XA
Publication of CN114911895A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a text generation method, a text generation apparatus and a storage medium, where the method includes: performing feature extraction on data to be processed to obtain a feature vector; performing feature conversion on the feature vector to obtain a semantic vector, where the semantic vector corresponds to the semantics of the data to be processed; and processing the semantic vector according to a binary tree structure to obtain a first target text corresponding to the data to be processed, where in the binary tree structure each leaf node corresponds to one candidate word of the first target text, and candidate words with closer semantics correspond to leaf nodes that are closer together in the binary tree structure. According to the embodiments of the application, prediction accuracy can be improved, the prediction process can be accelerated, and user experience can be improved.

Description

Text generation method, device and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a text generation method and apparatus, and a storage medium.
Background
Artificial intelligence (AI) refers to theories, methods, techniques and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision making.
Natural language processing is an important direction in the field of artificial intelligence; it studies how to enable a computer to understand the meaning of natural language text and to express given intentions, ideas and the like through natural language text.
Disclosure of Invention
In view of this, a text generation method, apparatus and storage medium are provided.
In a first aspect, an embodiment of the present application provides a text generation method, where the method includes: performing feature extraction on data to be processed to obtain a feature vector; performing feature conversion on the feature vector to obtain a semantic vector, where the semantic vector corresponds to the semantics of the data to be processed; and processing the semantic vector according to a binary tree structure to obtain a first target text corresponding to the data to be processed, where in the binary tree structure each leaf node corresponds to a candidate word of the first target text, and candidate words with closer semantics correspond to leaf nodes that are closer together in the binary tree structure.
According to this embodiment of the application, feature extraction is performed on the data to be processed to obtain a feature vector; the feature vector is converted into a semantic vector that corresponds to the semantics of the data to be processed; and the semantic vector is processed according to a binary tree structure to obtain the first target text corresponding to the data to be processed. This makes it possible to predict from the data to be processed and obtain the required target text. Because each leaf node of the binary tree structure corresponds to one candidate word of the first target text, and candidate words with closer semantics correspond to leaf nodes that are closer together in the tree, prediction accuracy can be improved, the prediction process can be accelerated, and user experience can be improved.
According to the first aspect, in a first possible implementation manner of the text generation method, each leaf node further corresponds to a semantic vector average value, the semantic vector average value is obtained by averaging one or more semantic vectors, the one or more semantic vectors are associated with candidate words corresponding to the leaf node, and the binary tree structure is obtained by performing binary clustering on the semantic vector average values of the candidate words.
According to this embodiment of the application, a more accurate prediction result can be obtained when predicting from the data to be processed, and using the distribution of the clustered semantic vectors as the arrangement of the leaf nodes prevents a loss of precision.
In a second possible implementation manner of the text generation method according to the first aspect, the method is applied to a first text generation model, and the method further includes: training a second text generation model by using a training sample to obtain the trained second text generation model, wherein an output layer in the second text generation model does not comprise the binary tree structure; predicting the training sample by using the trained second text generation model to obtain a semantic vector and a second target text corresponding to the training sample; establishing a mapping relation between each target word in the second target text and the corresponding semantic vector average value, wherein the semantic vector average value corresponding to any target word represents the average value of one or more semantic vectors of the training sample of the target word obtained through prediction; performing binary clustering based on the semantic vector average value in the mapping relation, and establishing the binary tree structure; and training the trained second text generation model based on the binary tree structure to obtain the first text generation model.
According to this embodiment of the application, the second text generation model is trained with a training sample to obtain the trained second text generation model; the trained second text generation model is used to predict the training sample to obtain the semantic vector and the second target text corresponding to the training sample; a mapping relation between each target word in the second target text and the corresponding semantic vector average value is established; binary clustering is performed based on the semantic vector average values in the mapping relation to establish the binary tree structure; and the trained second text generation model is trained based on the binary tree structure to obtain the first text generation model. This realizes the process of establishing the first text generation model, and at the same time, by establishing the binary tree structure, a first text generation model with both high prediction accuracy and high prediction speed can be obtained.
According to the second possible implementation manner of the first aspect, in a third possible implementation manner of the text generation method, the training the trained second text generation model based on the binary tree structure to obtain the first text generation model includes: replacing the output layer of the trained second text generation model with the output layer containing the binary tree structure; and training the replaced second text generation model by using the training sample to obtain the first text generation model.
According to this embodiment of the application, a first text generation model with higher prediction accuracy can be obtained by training on the basis of the second text generation model that contains the binary tree structure.
According to the second possible implementation manner of the first aspect, in a fourth possible implementation manner of the text generation method, training the trained second text generation model based on the binary tree structure to obtain the first text generation model includes: training the output layer containing the binary tree structure by using the semantic vector corresponding to the training sample to obtain the trained output layer containing the binary tree structure; and replacing the output layer of the trained second text generation model with the trained output layer containing the binary tree structure to obtain the first text generation model.
According to this embodiment of the application, training the output layer containing the binary tree structure with the semantic vectors corresponding to the training samples accelerates the training process of the model and saves training resources.
According to the first aspect, in a fifth possible implementation manner of the text generation method, the processing the semantic vector according to a binary tree structure to obtain a first target text corresponding to the data to be processed includes: for any semantic vector, taking the root node of the binary tree as the initial current node; judging whether the current node is a leaf node, and when the current node is a leaf node, determining that the candidate word corresponding to the current node is the first target text corresponding to the semantic vector; otherwise, calculating the probability value that the semantic vector belongs to the left child node of the current node according to the semantic vector and the semantic vector average value corresponding to the left child node of the current node; and when the probability value is greater than a threshold, taking the left child node of the current node as the new current node, otherwise taking the right child node of the current node as the new current node, and then returning to the step of judging whether the current node is a leaf node.
According to the embodiment of the application, the semantic vector is subjected to binary prediction, so that the probability value of only one node needs to be calculated for each layer of the binary tree, the time delay of model prediction can be reduced, the time complexity is reduced, the calculated amount in model prediction is reduced, the calculation resources occupied by the model are reduced, and the user experience is improved.
In a second aspect, an embodiment of the present application provides a text generation apparatus, including: a feature extraction module, configured to perform feature extraction on data to be processed to obtain a feature vector; a feature conversion module, configured to perform feature conversion on the feature vector to obtain a semantic vector, where the semantic vector corresponds to the semantics of the data to be processed; and a processing module, configured to process the semantic vector according to a binary tree structure to obtain a first target text corresponding to the data to be processed, where in the binary tree structure each leaf node corresponds to one candidate word of the first target text, and candidate words with closer semantics correspond to leaf nodes that are closer together in the binary tree structure.
In a first possible implementation manner of the text generating apparatus according to the second aspect, each leaf node further corresponds to a semantic vector average value, the semantic vector average value is obtained by averaging one or more semantic vectors, the one or more semantic vectors are associated with candidate words corresponding to the leaf node, and the binary tree structure is obtained by performing binary clustering on the semantic vector average values of the candidate words.
In a second possible implementation manner of the text generation apparatus according to the second aspect, the apparatus is applied to a first text generation model, and the apparatus further includes: a first training module, configured to train a second text generation model with a training sample to obtain the trained second text generation model, where the output layer in the second text generation model does not include the binary tree structure; a prediction module, configured to predict the training sample by using the trained second text generation model to obtain the semantic vector and the second target text corresponding to the training sample; an establishing module, configured to establish a mapping relation between each target word in the second target text and the corresponding semantic vector average value, where the semantic vector average value corresponding to any target word represents the average value of one or more semantic vectors of the training samples from which the target word is predicted; a binary clustering module, configured to perform binary clustering based on the semantic vector average values in the mapping relation and establish the binary tree structure; and a second training module, configured to train the trained second text generation model based on the binary tree structure to obtain the first text generation model.
In a third possible implementation manner of the text generating apparatus according to the second possible implementation manner of the second aspect, the second training module includes: a first replacing module, configured to replace the output layer of the trained second text generation model with the output layer including the binary tree structure; and the first training submodule is used for training the replaced second text generation model by using the training sample to obtain the first text generation model.
In a fourth possible implementation manner of the text generating apparatus according to the second possible implementation manner of the second aspect, the second training module includes: the second training submodule is used for training the output layer containing the binary tree structure by utilizing the semantic vector corresponding to the training sample to obtain the trained output layer containing the binary tree structure; and the second replacing module is used for replacing the output layer of the trained second text generation model by the trained output layer containing the binary tree structure to obtain the first text generation model.
In a fifth possible implementation manner of the text generation apparatus according to the second aspect, the processing module includes: a first determining module, configured to, for any semantic vector, take the root node of the binary tree as the initial current node; a second determining module, configured to determine, when the current node is a leaf node, that the candidate word corresponding to the current node is the first target text corresponding to the semantic vector; a calculation module, configured to, when the current node is not a leaf node, calculate the probability value that the semantic vector belongs to the left child node of the current node according to the semantic vector and the semantic vector average value corresponding to the left child node of the current node; and a third determining module, configured to, when the probability value is greater than a threshold, take the left child node of the current node as the new current node, otherwise take the right child node of the current node as the new current node, and then return to the step of judging whether the current node is a leaf node.
In a third aspect, an embodiment of the present application provides a text generation apparatus, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the text generation method of the first aspect or one or more of the many possible implementation manners of the first aspect when executing the instructions.
In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the text generation method of the first aspect or one or more of the many possible implementations of the first aspect.
In a fifth aspect, an embodiment of the present application provides a terminal device, where the terminal device may perform the text generation method of the first aspect or one or more of multiple possible implementation manners of the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product, which includes computer-readable code or a non-transitory computer-readable storage medium carrying computer-readable code, and when the computer-readable code runs in an electronic device, a processor in the electronic device executes a text generation method of one or more of the foregoing first aspect or multiple possible implementations of the first aspect.
These and other aspects of the present application will be more readily apparent from the following description of the embodiment(s).
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the application and, together with the description, serve to explain the principles of the application.
FIG. 1 shows a schematic diagram of an artificial intelligence main framework.
Fig. 2 shows a block diagram of a text generation apparatus according to an embodiment of the present application.
Fig. 3 shows a flowchart of establishing an output layer model in a text generating apparatus according to an embodiment of the present application.
Fig. 4 shows a structure diagram of the softmax output layer.
FIG. 5 is a diagram illustrating the influence of leaf nodes on classification results according to an embodiment of the present application.
Fig. 6a shows a flow chart of prediction with a text generation apparatus according to an embodiment of the present application.
FIG. 6b shows a flow diagram for prediction with a text generation apparatus according to an embodiment of the present application.
Fig. 6c shows a flowchart of prediction using a text generation apparatus according to an embodiment of the present application.
Fig. 7 shows a schematic diagram of the hierarchical softmax output layer of the binary tree structure.
FIG. 8 shows a flow diagram of a text generation method according to an embodiment of the present application.
FIG. 9 shows a flow diagram of a text generation method according to an embodiment of the present application.
FIG. 10 shows a flow diagram of a text generation method according to an embodiment of the application.
FIG. 11 shows a flow diagram of a text generation method according to an embodiment of the application.
FIG. 12 shows a flow diagram of a text generation method according to an embodiment of the present application.
Fig. 13 shows a configuration diagram of a text generation apparatus according to an embodiment of the present application.
Fig. 14 shows a configuration diagram of a text generation apparatus according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments, features and aspects of the present application will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present application.
In one implementation of the prior art, in a text generation task the task model generally takes a neural network model as its main component and a softmax layer as the output layer; for each piece of input data, the word in a word list corresponding to the input can be obtained through neural network inference, where the word list is the set of all possible words in the output text. When the word list is large, the amount of calculation in this process is large: for example, when the size of the word list is |V|, the time complexity is O(|V|), the prediction delay of the whole model is large, and the user experience is affected. In another implementation of the prior art, a softmax output layer with a Huffman tree structure is established according to word frequency. In this case, although the amount of calculation during model training can be reduced, the entire Huffman tree still needs to be traversed and the probabilities of all possible words calculated during prediction with the model (because precision drops greatly if only a single greedy top-down calculation is performed without traversing the entire Huffman tree) to obtain the final output; the calculation time is long, and the time complexity is even higher than that of the previous implementation, being O(|V| log_2 |V|).
In order to solve the technical problem, the text generation method provided by the application can generate a text based on a binary tree structure to obtain a target text, so that prediction delay is reduced, and high prediction accuracy is maintained.
The overall workflow of the artificial intelligence system is described first. Fig. 1 shows a schematic diagram of an artificial intelligence main framework, which is explained below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition onwards, for example the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output; in this process, the data undergoes a "data - information - knowledge - wisdom" refinement process. The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementations) to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligent system, realizes communication with the outside world, and realizes support through a foundation platform. Communicating with the outside through a sensor; the computing power is provided by an intelligent chip, which includes but is not limited to hardware acceleration chips such as a Central Processing Unit (CPU), an embedded neural Network Processor (NPU), a Graphics Processing Unit (GPU), an Application Specific Integrated Circuit (ASIC), and a Field Programmable Gate Array (FPGA); the basic platform comprises distributed computing framework, network and other related platform guarantees and supports, and can comprise cloud storage and computing, interconnection and intercommunication networks and the like. For example, sensors and external communications acquire data that is provided to smart chips in a distributed computing system provided by the underlying platform for computation.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference means a process of simulating an intelligent human inference mode in a computer or an intelligent system, using formalized information to think about and solve a problem by a machine according to an inference control strategy, and a typical function is searching and matching.
The decision-making refers to a process of making a decision after reasoning intelligent information, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, etc.
(5) Intelligent product and industrial application
The intelligent product and industry application refers to the product and application of an artificial intelligence system in various fields, and is the encapsulation of an artificial intelligence integral solution, the intelligent information decision is commercialized, and the landing application is realized, and the application field mainly comprises: intelligent terminal, intelligent manufacturing, intelligent transportation, intelligent home, intelligent medical treatment, intelligent security, autopilot, safe city etc..
The embodiments of the application can be applied to various fields of artificial intelligence, and in particular to scenarios in natural language processing where a text needs to be generated. Text generation refers to inputting a piece of data (which can be text, a picture, a video, a sound, and the like) to generate a piece of text, as in machine translation, text summarization, poem and sentence composition, image captioning (talking about a picture) and the like. For example, in machine translation, 'I like swimming' and the corresponding context information are input, and the corresponding translation can be output; in image captioning, a picture can be input, and a sentence describing the content of the picture can be output, and so on. It should be understood that these examples are only for convenience of understanding the application scenarios of the embodiments of the present application, and are not exhaustive.
Fig. 2 shows a configuration diagram of a text generation apparatus according to an embodiment of the present application. As shown in FIG. 2, the text generation apparatus may include a feature extraction module, a feature conversion module and an output layer. The feature extraction module and the feature conversion module can be implemented based on the related art, and the embodiment of the present application does not limit the specific implementation. The input data may include text, sound, pictures, video and other data; for example, the text may include an original text and context information of the original text, and the output data may include a target text. The feature extraction module can be used for extracting features of the input data to obtain an original feature vector X_ori; the feature conversion module can be used for performing feature conversion on the original feature vector X_ori through a network structure to obtain a processed semantic vector X; the output layer can be used for obtaining the output data from the processed semantic vector X.
The network structure for performing feature conversion may include a fully connected network, a recurrent neural network (RNN), a long short-term memory (LSTM) network, a convolutional neural network (CNN), an attention mechanism, or the like, and the output layer may be a hierarchical softmax output layer established based on a classification (softmax) network. The semantic vector X may be a feature vector used to represent the semantics of the input data.
For example, in the case that the input data is text, the original text may be 'I like swimming', and the input may further include context information, which may be the preceding sentence and the following sentence of the original text. After the original text and the context information are input, feature extraction can be performed on 'I', 'like' and 'swimming' respectively, so that original feature vectors (which may be two-dimensional word vectors) X_ori^1, X_ori^2 and X_ori^3 corresponding to the three words are obtained. The three original feature vectors can then be subjected to feature conversion respectively to obtain three corresponding semantic vectors X_1, X_2 and X_3, which are input into the output layer. Through the output layer, the target text 'I' can be obtained for the semantic vector X_1, the target text 'like' can be obtained for the semantic vector X_2, and the target text 'swimming' can be obtained for the semantic vector X_3; finally, the complete target text 'I like swimming' can be output.
Fig. 3 shows a flowchart for establishing an output layer model in a text generating apparatus according to an embodiment of the present application. As shown in fig. 3, the process of building the hierarchical softmax output layer model may include:
and S301, training an original model of which the output layer is a softmax output layer.
The training data may be used for training, where the training data may include samples marked in a training set, the samples need to include original texts themselves, and may also include context information of the original texts, and the marks of the samples may represent target texts corresponding to the samples. Wherein the softmax output layer may be an output layer that does not contain a binary tree structure.
Fig. 4 shows a structure diagram of the softmax output layer. As shown in FIG. 4, the input data of the softmax output layer may be a semantic vector X, which may be obtained by performing feature conversion on the original feature vector X_ori through a network structure such as a fully connected network, where x_1, x_2, ..., x_n are the components of the semantic vector X, n may represent the number of components of the semantic vector X, and z_1, z_2, ..., z_|V| may respectively correspond to the candidate words word_1, word_2, ..., word_|V| in the word list V, where

z_i = W_i,: · X = Σ_j W_ij · x_j

W may represent the scaling factors, W_i,: may represent the vector of scaling factors from the components of X to z_i, W_ij denotes the scaling factor from x_j to z_i, z_i may correspond to the i-th candidate word in the word list, and x_j represents the j-th component of the semantic vector X. The output word is

word_output = argmax_i softmax(z_i), with softmax(z_i) = exp(z_i) / Σ_k exp(z_k)

where softmax(z_i) is the probability that candidate word word_i (i = 1, ..., |V|) in the word list is the target text, and argmax indicates the word in the word list corresponding to the maximum probability value, which is also the target text output by the softmax output layer.
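As a simple illustration of the formulas above (not part of the patent text), the plain softmax output layer can be sketched with NumPy as follows; the names X, W and z follow the description, and the dimensions are hypothetical.

# Illustrative NumPy sketch of the plain softmax output layer described above.
import numpy as np

def softmax_output(X, W):
    # X: semantic vector of shape (n,); W: scaling-factor matrix of shape (|V|, n)
    z = W @ X                      # z_i = sum_j W_ij * x_j
    p = np.exp(z - z.max())
    p = p / p.sum()                # probability of each candidate word in the word list
    return int(np.argmax(p)), p    # index of the output word and all probabilities

Every prediction multiplies X with all |V| rows of W, which is why the time complexity of this output layer is O(|V|).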
And S302, predicting samples in the training set by using the trained original model to obtain a mapping relation S between output data of the softmax output layer and an average value of the corresponding semantic vectors.
The semantic vector X and the corresponding output word (which may also be referred to as a target text) corresponding to each sample in the training set may be recorded.
The output data of the softmax output layer can be a certain word in the word list V, which can also be called a target text. A certain word in V can be denoted word_i, i.e. the i-th word in the word list V, where i can take the values 1, 2, ..., |V|, and |V| can represent the size of the word list. According to the recorded semantic vectors X and the corresponding output words, an output word word_i may have n_i corresponding semantic vectors X, each corresponding to a sample in the training set, where n_i can represent the number of semantic vectors X corresponding to the training-set samples whose output is the i-th word; that is, these n_i semantic vectors X all yield the word word_i as the result obtained through the output layer. The mapping relation S for word_i can be represented by (word_i, X̄_i), where X̄_i can represent the average value of the n_i semantic vectors corresponding to word_i (also referred to as the average semantic vector).
For example, if the number of samples in the training set is A and these samples correspond to a total of B words, then S contains a total of B mapping relationships, one for each of the B words.
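For illustration (not part of the patent text), the mapping relation S of step S302 can be built as sketched below, assuming that 'records' is a list of (semantic vector X, output word) pairs collected while running the trained original model over the training set; names are hypothetical.

# Illustrative sketch: build S = {word_i: average semantic vector of word_i}.
import numpy as np
from collections import defaultdict

def build_mapping(records):
    grouped = defaultdict(list)
    for x, word in records:
        grouped[word].append(np.asarray(x, dtype=float))
    # average the n_i semantic vectors recorded for each output word
    return {word: np.mean(vectors, axis=0) for word, vectors in grouped.items()}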
Step S303, taking the mapping relation S as the root node, performing binary clustering on the average semantic vectors X̄_i corresponding to the words in S to obtain a binary tree structure.

The average semantic vectors X̄_i corresponding to the words in S can first be clustered. After clustering, the average semantic vectors X̄_i in the mapping relation are divided into two classes, so that two corresponding sub mapping-relation sets S_1 and S_2 are obtained; these serve as the two child nodes of the root node and can respectively be called the left child and the right child. After the left child S_1 and the right child S_2 are obtained, S_1 and S_2 can each be taken as a parent node, and binary clustering of the average semantic vectors can be performed on each of them again. The above binary clustering process can be repeated until each child node corresponds to only one mapping relation (word_i, X̄_i), i.e. a one-to-one correspondence between a word and an average semantic vector.

The average semantic vector X̄_i is the average value of the semantic vectors that are predicted to yield a word, and this average value can reflect the semantics of the word. Because the binary tree is constructed by clustering the average semantic vectors X̄_i corresponding to the words in S, in the finally established binary tree the leaf nodes of words with closer semantics are closer together, and the established binary tree can be used to predict the word corresponding to an input semantic vector.
The binary clustering method in the embodiment of the present application is not limited, and for example, a K-Means clustering method may be used, and a two-norm is used as a distance measurement mode in the process of clustering by using the K-Means clustering method.
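A minimal sketch of the recursive binary clustering in step S303 is given below for illustration only; it assumes the mapping S produced by the previous sketch and uses scikit-learn's KMeans with K=2 (two-norm distance), with hypothetical node and field names.

# Illustrative sketch: recursively 2-means-cluster the average semantic vectors
# until every leaf node holds exactly one word (step S303).
import numpy as np
from sklearn.cluster import KMeans

def build_tree(mapping):
    words = list(mapping.keys())
    if len(words) == 1:
        # leaf node: one candidate word and its average semantic vector
        return {"word": words[0], "mean": mapping[words[0]]}
    vectors = np.stack([mapping[w] for w in words])
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(vectors)
    left = {w: mapping[w] for w, lab in zip(words, labels) if lab == 0}
    right = {w: mapping[w] for w, lab in zip(words, labels) if lab == 1}
    if not left or not right:
        # degenerate split (e.g. identical vectors): force a non-empty partition
        left = {words[0]: mapping[words[0]]}
        right = {w: mapping[w] for w in words[1:]}
    return {"left": build_tree(left), "right": build_tree(right)}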
And step S304, establishing a hierarchical softmax output layer model of a binary tree structure according to the obtained binary tree.
Each leaf node in the hierarchical softmax output layer model of the binary tree structure may correspond to a word in the word list V (and correspond to an average semantic vector according to a mapping relationship), and words with closer semantics are closer to each other in the leaf node.
Fig. 5 is a schematic diagram illustrating the influence of leaf nodes on the classification result according to an embodiment of the present application. As shown in FIG. 5, X may represent an input semantic vector, and C_1, C_2, C_3 and C_4 may respectively represent four classes, each class corresponding to an average semantic vector, i.e. to a word, where class C_1 and class C_3, class C_1 and class C_2, class C_4 and class C_2, and class C_4 and class C_3 respectively have similar semantics. As shown in FIG. 5(a), with the binary tree coding scheme, after the first binary clustering, class C_1 and class C_3 are grouped into the same class and class C_4 and class C_2 are grouped into the same class; after the second binary clustering, the leaf nodes correspond in order to class C_1, class C_3, class C_2 and class C_4. As shown in FIG. 5(b), after two bifurcations (equivalent to two binary clusterings), the more similar classes are closer together in the leaf nodes, which is reflected on the decision plane, so that the final class can be decided more accurately.
By the encoding mode, the established hierarchical softmax output layer model can have a better classification effect, the final category can be more accurately decided, and more accurate target texts can be output.
And S305, training the obtained hierarchical softmax output layer model.
One training method is to use the obtained hierarchical softmax output layer model as the output layer of the text generation apparatus and re-train the text generation apparatus end to end with the training-set samples to obtain a trained model. Another training method is to use the semantic vectors X corresponding to the training-set samples obtained in step S302 as input, train only the hierarchical softmax output layer model to obtain a trained model, and then splice the trained output layer model with the other modules of the text generation apparatus to form the final model.
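For the second training option (training only the hierarchical softmax output layer on the recorded semantic vectors), a minimal sketch is given below; it is not part of the patent text and assumes that each internal node of the binary tree has an identifier, that paths[word] lists the (node id, branch-left) decisions from the root to that word's leaf, and that 'samples' are the (semantic vector X, target word) pairs recorded in step S302.

# Illustrative sketch: per-node binary logistic training of the hierarchical
# softmax output layer along the root-to-leaf path of each target word.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_output_layer(paths, samples, dim, epochs=5, lr=0.1):
    theta = {}                                    # node id -> parameter vector
    for _ in range(epochs):
        for X, word in samples:
            for node_id, go_left in paths[word]:
                t = theta.setdefault(node_id, np.zeros(dim))
                p = sigmoid(X @ t)                # probability of branching left at this node
                grad = (p - (1.0 if go_left else 0.0)) * X
                theta[node_id] = t - lr * grad    # SGD step on the binary logistic loss
    return theta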
Hereinafter, a description is given for a scenario in which the input data is text; those skilled in the art will understand that the application is not limited to this scenario. Fig. 6a, 6b and 6c show flowcharts of prediction with a text generation apparatus according to an embodiment of the present application. As shown in fig. 6a, the prediction flow may include:
in step S601, an original text and context information of the original text are input.
For example, the original text may be 'I like swimming', and the context information may be the contents of the preceding and following sentences of 'I like swimming' in the article.
For the original text and the context information of the original text, see for example the original input data in fig. 6b, or 'I like swimming' and the 'context information' in fig. 6c.
Step S602, extracting features from the original text and the context information of the original text to obtain an original feature vector X_ori.
For example, a word vector model (Word2Vec) can be used to map the semantics of a word to a vector space, resulting in the corresponding original feature vector X_ori (see, e.g., X_ori in FIGS. 6b and 6c); for example, three different original feature vectors may be generated corresponding to 'I', 'like' and 'swimming' respectively for the input original text 'I like swimming'.
Step S603, performing feature conversion on the original feature vector X_ori to obtain a processed semantic vector X.
Neural network structures such as a fully connected network, RNN, LSTM, CNN or Attention can be used to perform the feature conversion processing on X_ori.
This step can be seen, for example, in the fully connected network, RNN, LSTM, CNN and Attention blocks in fig. 6b and 6c.
Step S604, the semantic vector X is input, as input data, into the hierarchical softmax output layer with a binary tree structure, and binary prediction is performed.
Fig. 7 shows a schematic diagram of a hierarchical softmax output layer with a binary tree structure. As shown in fig. 7, after the semantic vector X is input, binary prediction may be performed from top to bottom, starting from the root node of the binary tree. For example, a greedy strategy may be adopted: starting from the root node, the value of the sigmoid function corresponding to the node is calculated; this value (in the range 0-1) may represent the probability that the input semantic vector X belongs to the category corresponding to the node, and the larger the value, the more likely it is that the input semantic vector X belongs to that category. When the function value is greater than 0.5, the left child of the node may be selected as the next node to evaluate; otherwise, the right child is selected as the next node, and so on until a leaf node is reached, where the candidate word in the word list corresponding to that leaf node is the target text. That is, only one sigmoid function value needs to be calculated for each layer of the binary tree, and the time complexity of the prediction process of the hierarchical softmax output layer is O(log_2 |V|).
The sigmoid function corresponding to a node can be calculated as

σ(X · θ_p^q) = 1 / (1 + exp(-X · θ_p^q))

as shown at each node in fig. 7, where X may represent the input semantic vector, p may represent the layer number in the binary tree, q may represent the q-th node in the p-th layer of the binary tree, and θ_p^q may represent the parameter of that node obtained when training the hierarchical softmax output layer model (see step S305 for an example of training); for example, θ_n^1 can represent the parameter, obtained when the output layer is trained, of the first node in the n-th layer of the binary tree, and θ_n^2 can represent the parameter of the second node in the n-th layer of the binary tree. The calculation result of the formula may represent the probability that the semantic vector X belongs to the category of the left child node of the node. For example, for a semantic vector X corresponding to the text 'like', in the hierarchical softmax output layer of the binary tree structure shown in fig. 7, starting from the root node, the probability that the semantic vector X belongs to the category of the left child node of the current node is calculated by using the above formula. When the calculated probability value is greater than 0.5, it indicates that the semantic vector X more likely belongs to the category of the left child node, and the left child is selected as the next node; when the calculated probability value is less than 0.5, it indicates that the semantic vector X more likely belongs to the category of the right child node, and the right child is selected as the next node; this continues until a leaf node is reached. If the candidate word corresponding to that leaf node is 'like', the target text corresponding to the semantic vector X is 'like'.
The hierarchical softmax output layer of the binary tree structure can be seen, for example, in the modified softmax layer in fig. 6b and fig. 6c.
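A minimal sketch of the greedy binary prediction in step S604 is given below for illustration; it assumes a tree of nested dictionaries as in the earlier clustering sketch, with each internal node additionally holding its trained parameter vector in node['theta'] (for example obtained by a training procedure such as the one sketched after step S305); it is not part of the patent text.

# Illustrative sketch: greedy top-down prediction over the binary tree,
# evaluating one sigmoid per layer, i.e. O(log2 |V|) per output word.
import numpy as np

def predict_word(X, tree):
    node = tree                                   # start at the root node
    while "word" not in node:                     # descend until a leaf node is reached
        p_left = 1.0 / (1.0 + np.exp(-X @ node["theta"]))  # prob. X belongs to the left child
        node = node["left"] if p_left > 0.5 else node["right"]
    return node["word"]                           # the candidate word at the leaf is the target text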
Step S605, the target text is output.
For example, corresponding to the input original text 'I like swimming' shown in fig. 6c, the three target texts 'I', 'like' and 'swimming' shown in fig. 6c may be output respectively.
The three target texts 'I', 'like' and 'swimming' can also be denoted word_i, word_j and word_k as shown in FIG. 6b.
FIG. 8 shows a flow diagram of a text generation method according to an embodiment of the present application. The method can be realized on an online server, an offline server or a terminal device.
As shown in fig. 8, the method includes:
step S801, performing feature extraction on data to be processed to obtain a feature vector;
step S802, performing feature conversion on the feature vector to obtain a semantic vector, wherein the semantic vector corresponds to the semantics of the data to be processed;
step S803, processing the semantic vector according to a binary tree structure to obtain a first target text corresponding to the data to be processed, where in the binary tree structure, each leaf node corresponds to a candidate word of the first target text, and the closer the semantics of the candidate word are, the closer the distance between the leaf nodes corresponding to the candidate word in the binary tree structure is.
According to this embodiment of the application, feature extraction is performed on the data to be processed to obtain a feature vector; the feature vector is converted into a semantic vector that corresponds to the semantics of the data to be processed; and the semantic vector is processed according to a binary tree structure to obtain the first target text corresponding to the data to be processed. This makes it possible to predict from the data to be processed and obtain the required target text. Because each leaf node of the binary tree structure corresponds to one candidate word of the first target text, and candidate words with closer semantics correspond to leaf nodes that are closer together in the tree, prediction accuracy can be improved, the prediction process can be accelerated, and user experience can be improved.
The data to be processed may represent any type of data that can be used to generate a text, i.e. data whose content can be expressed by the generated text, and may include text, sound, pictures, video and other data. The object of feature extraction may be the data to be processed (e.g. an original text) itself, and may further include context information of the data to be processed; for example, the data to be processed may itself be a text such as 'I like listening', and the context information may be the preceding sentence and the following sentence of 'I like listening'. The first target text may represent the text that text generation for the data to be processed is intended to produce. For example, when the data to be processed 'I like swimming' needs to be translated, the first target text may include the translated words corresponding to 'I', 'like' and 'swimming', and these output words may be combined into the final translation result; when a text description needs to be generated for a photograph of a seaside landscape, the first target text may include the picture description text 'seaside'; and when speech recognition needs to be performed and a text description generated for a sound, the first target text may include the speech content in the sound, such as 'today is Monday'. The purpose of text generation is not limited in the present application, and may be text translation, text description and the like.
When the data to be processed is a text, the feature extraction may include performing word segmentation on the data to be processed and converting it into two-dimensional word vectors. For example, for 'I like swimming', after word segmentation, the three words 'I', 'like' and 'swimming' may be obtained and converted into word vectors respectively for further processing; the feature vector may be the word vector, and the feature conversion may include processing the feature vector through a network structure. In the case that the data to be processed is a text, step S801 may perform feature extraction on the original text and the context information of the original text, as illustrated in step S602 in fig. 6a, to obtain the original feature vector X_ori; step S802 may perform feature conversion on the original feature vector X_ori, as illustrated in step S603 in fig. 6a, to obtain the processed semantic vector X.
When the data to be processed is sound, the feature extraction may include extracting sound features, and when the data to be processed is a picture or a video, the feature extraction may include extracting image features. Accordingly, the sound feature or the image feature may be converted into a semantic vector through feature conversion.
The feature extraction and feature transformation may be implemented based on the related art, which is not limited in this application.
The leaf nodes may represent nodes in the last layer of the binary tree, and the candidate words may represent possible first target texts corresponding to the data to be processed. The set of candidate words may be referred to as a vocabulary. The distance of a leaf node in the binary tree structure may be the path length from one leaf node to another leaf node.
In a possible implementation manner, each leaf node further corresponds to a semantic vector average value, the semantic vector average value is obtained by averaging one or more semantic vectors, the one or more semantic vectors are associated with a candidate word corresponding to the leaf node, and the binary tree structure is obtained by performing binary clustering on the semantic vector average value of each candidate word.
For example, each leaf node may correspond to a candidate word that may be associated with one or more semantic vectors, e.g., where the predicted result of the one or more semantic vectors is the candidate word, and the candidate word corresponds to the average of the semantic vectors. Those skilled in the art will appreciate that the manner of association between one or more semantic vectors and candidate words is not so limited.
According to this embodiment of the application, a more accurate prediction result can be obtained when predicting from the data to be processed, and using the distribution of the clustered semantic vectors as the arrangement of the leaf nodes prevents a loss of precision.
The clustering method is not limited in the present application, and may be, for example, a K-Means clustering method, and the distance measurement method in the clustering process is also not limited, for example, may be a two-norm.
FIG. 9 shows a flow diagram of a text generation method according to an embodiment of the present application. The method is applied to the first text generation model, and as shown in fig. 9, the method further includes:
step S901, training a second text generation model by using a training sample to obtain the trained second text generation model, wherein an output layer in the second text generation model does not comprise the binary tree structure;
step S902, predicting a training sample by using the trained second text generation model to obtain a semantic vector and a second target text corresponding to the training sample;
step S903, establishing a mapping relation between each target word in the second target text and the corresponding semantic vector average value, wherein the semantic vector average value corresponding to any target word represents the average value of one or more semantic vectors of the training sample of the target word obtained through prediction;
step S904, performing binary clustering based on the semantic vector average value in the mapping relation, and establishing the binary tree structure;
step S905, training the trained second text generation model based on the binary tree structure to obtain the first text generation model.
According to this embodiment of the application, the second text generation model is trained with training samples to obtain the trained second text generation model; the trained second text generation model is used to predict the training samples to obtain the semantic vectors and the second target text corresponding to the training samples; a mapping relation between each target word in the second target text and the corresponding semantic vector average value is established; binary clustering is performed based on the semantic vector average values in the mapping relation to establish the binary tree structure; and the trained second text generation model is trained based on the binary tree structure to obtain the first text generation model. This realizes the process of establishing the first text generation model, and at the same time, by establishing the binary tree structure, a first text generation model with both high prediction accuracy and high prediction speed can be obtained.
The training sample may include training data and labels thereof, the training data may be data of the same type as the data to be processed, that is, the training data may represent any type of data that can be used to generate text, for example, data whose content may be represented by the generated text may include text, sound, picture, video, and other data.
The training data used for training the second text generation model may include the training data itself (e.g. an original training text), and may also include context information of the training data. The label of the training data may be determined by the purpose of the text generation of the model: for example, for a translation scenario, the training data is a text and is labeled with the translated text; for an image-captioning scenario, the training data is a picture and is labeled with a text representing the content of the picture; for a speech recognition scenario, the training data is a sound and is labeled with a text representing the speech content in the sound; and so on. The output layer in the second text generation model may include a softmax output layer; for example, the second text generation model may include the original model whose output layer is the softmax output layer shown in fig. 3, where the structure of the softmax output layer may be as shown in fig. 4. Corresponding to one training sample, there may be a plurality of target words; corresponding to the same target word, there may also be a plurality of semantic vectors, and when there is only one semantic vector corresponding to a certain target word, the average value of the semantic vectors may be that semantic vector itself. In step S902 and step S903, as illustrated in step S302 in fig. 3, the mapping relation S between the output data of the softmax output layer and the average value of the corresponding semantic vectors may be obtained, where the semantic vector X is obtained through the module for feature conversion before the output layer. Step S904 may result in the binary tree structure as illustrated in step S303 in fig. 3.
FIG. 10 shows a flow diagram of a text generation method according to an embodiment of the application. As shown in fig. 10, training the trained second text generation model based on the binary tree structure to obtain the first text generation model includes:
step S1001, replacing the output layer of the trained second text generation model with the output layer containing the binary tree structure;
step S1002, training the replaced second text generation model by using the training sample to obtain the first text generation model.
According to the embodiment of the application, the first text generation model with higher prediction precision can be obtained by training on the basis of obtaining the second text generation model containing the binary tree structure.
The output of the output layer containing the binary tree structure is determined by the arrangement of the nodes in the binary tree structure, and each output of this layer corresponds to a leaf node in the binary tree. The output layer containing the binary tree structure may, for example, be the hierarchical softmax output layer with the binary tree structure described above, with reference to fig. 7.
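As an illustrative sketch only, the probability such a tree-structured output layer assigns to one candidate word can be computed as the product of branch probabilities along the root-to-leaf path. The sketch assumes the node layout produced by the clustering sketch above and, following the probability calculation described later in this text, a left-branch probability given by the sigmoid of the dot product between the semantic vector and the left child's mean vector; this parameterisation is an assumption, not a detail fixed by the embodiment.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def word_probability(tree, semantic_vector, target_word):
    """Probability the tree-structured output layer assigns to `target_word`:
    the product of branch probabilities along the root-to-leaf path.
    Internal nodes are dicts with 'vector', 'left', 'right'; leaves carry 'word'
    (the layout of the clustering sketch above)."""
    def contains(node, word):
        if "word" in node:
            return node["word"] == word
        return contains(node["left"], word) or contains(node["right"], word)

    node, prob = tree, 1.0
    while "word" not in node:
        p_left = sigmoid(float(np.dot(semantic_vector, node["left"]["vector"])))
        if contains(node["left"], target_word):
            prob, node = prob * p_left, node["left"]
        else:
            prob, node = prob * (1.0 - p_left), node["right"]
    return prob if node["word"] == target_word else 0.0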
FIG. 11 shows a flow diagram of a text generation method according to an embodiment of the present application. As shown in fig. 11, training the trained second text generation model based on the binary tree structure to obtain the first text generation model includes:
step S1101, training an output layer including the binary tree structure by using the semantic vector corresponding to the training sample, to obtain a trained output layer including the binary tree structure;
step S1102, replacing the output layer of the trained second text generation model with the trained output layer including the binary tree structure, to obtain the first text generation model.
According to the embodiment of the application, training only the output layer containing the binary tree structure with the semantic vectors corresponding to the training samples accelerates the training process of the model and saves training resources.
The semantic vector corresponding to the training sample can be determined when the training sample is predicted by using the trained second text generation model.
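A heavily simplified sketch of this output-layer-only training step is given below. The (semantic vector, root-to-leaf path) sample format, the logistic update rule, and the choice to treat each child's mean vector as the trainable parameter are all assumptions made for illustration; the embodiment does not fix these details.

```python
import numpy as np

def train_tree_output_layer(tree, samples, lr=0.1, epochs=5):
    """Fit only the binary-tree output layer, keeping the rest of the model fixed.

    `samples` is assumed to be a list of (semantic_vector, path) pairs, where
    `path` is the root-to-leaf branch sequence ('L'/'R') of the target word and
    the semantic vectors were cached while predicting the training samples.
    Each branch decision is trained with a logistic loss on the left-branch
    probability, updating the left child's vector (a hierarchical-softmax-style update).
    """
    for _ in range(epochs):
        for x, path in samples:
            node = tree
            for step in path:
                left_vec = node["left"]["vector"]
                p_left = 1.0 / (1.0 + np.exp(-float(np.dot(x, left_vec))))
                target = 1.0 if step == "L" else 0.0
                # gradient of the branch log-likelihood w.r.t. the left child's vector
                node["left"]["vector"] = left_vec + lr * (target - p_left) * x
                node = node["left"] if step == "L" else node["right"]
    return tree
```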
FIG. 12 shows a flow diagram of a text generation method according to an embodiment of the present application. As shown in fig. 12, processing the semantic vector according to a binary tree structure to obtain a first target text corresponding to the data to be processed includes:
step S1201, regarding any semantic vector, taking the root node of the binary tree as an initial current node;
step S1202, judging whether the current node is a leaf node,
step S1203, under the condition that the current node is a leaf node, determining that a candidate word corresponding to the current node is a first target text corresponding to the semantic vector; if not, then,
step S1204, calculating a probability value of the left child node of the current node to which the semantic vector belongs according to the semantic vector and a semantic vector average value corresponding to the left child node of the current node;
and step S1205, under the condition that the probability value is greater than the threshold value, taking the left child node of the current node as the new current node, otherwise taking the right child node of the current node as the new current node, and repeating from the step of judging whether the current node is a leaf node.
According to the embodiment of the application, binary prediction is performed on the semantic vector, so that only the probability value of one node needs to be calculated for each layer of the binary tree. This reduces the time delay and time complexity of model prediction, reduces the amount of calculation and the computing resources occupied by the model during prediction, and improves the user experience.
The term "left child node" only indicates that the same clustering direction is used consistently: when the probability value is calculated at each layer, the left child node represents the node in that same direction that is taken as the new node for calculation. The probability value may be calculated, for example, with the sigmoid function described above, sigmoid(x) = 1/(1 + e^(-x)). The method for calculating the probability value is not limited, and neither is the threshold corresponding to the probability value, which may be, for example, 0.5. The probability value may represent the probability that the semantic vector belongs to the semantic category corresponding to the left child node of the current node, and the value range of the threshold may be 0 to 1.
For step S1205, when the probability value is not greater than the threshold value, the right child node of the current node is taken as the new current node, and the process again repeats from the step of judging whether the current node is a leaf node.
In steps S1201 to S1204, as illustrated in step S604 in fig. 6a, the semantic vector X may be fed as input data into the hierarchical softmax output layer in order to perform the binary prediction.
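Steps S1201 to S1205 can be illustrated with the following sketch. It again assumes the dict-based node layout from the clustering sketch above and the sigmoid-based probability calculation just described; the variable names and tree representation are illustrative assumptions rather than the exact implementation of the embodiment.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_word(tree, semantic_vector, threshold=0.5):
    """Greedy traversal of steps S1201-S1205: start at the root; at each
    internal node compute the probability that the semantic vector belongs to
    the left child (sigmoid of its dot product with the left child's mean
    vector); go left if it exceeds the threshold, otherwise go right,
    until a leaf node is reached."""
    node = tree                                   # S1201: root as the initial current node
    while "word" not in node:                     # S1202: is the current node a leaf?
        p_left = sigmoid(float(np.dot(semantic_vector, node["left"]["vector"])))  # S1204
        node = node["left"] if p_left > threshold else node["right"]              # S1205
    return node["word"]                           # S1203: candidate word at the leaf
```

Because only one probability is evaluated per tree level, the number of sigmoid evaluations per predicted word is roughly the tree depth rather than the vocabulary size.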
For exemplary descriptions of the text generation method and apparatus according to the embodiments of the present application, reference may also be made to fig. 1 to 7, which are not repeated here.
Fig. 13 shows a configuration diagram of a text generation apparatus according to an embodiment of the present application. As shown in fig. 13, the apparatus includes:
the feature extraction module 1301 is configured to perform feature extraction on data to be processed to obtain a feature vector;
a feature conversion module 1302, configured to perform feature conversion on the feature vector to obtain a semantic vector, where the semantic vector corresponds to the semantics of the data to be processed;
and a processing module 1303, configured to process the semantic vector according to a binary tree structure to obtain a first target text corresponding to the data to be processed, where in the binary tree structure, each leaf node corresponds to one candidate word of the first target text, and the closer the semantics of the candidate words are, the closer the distance between the leaf nodes corresponding to the candidate words in the binary tree structure is.
According to the embodiment of the application, feature extraction is performed on the data to be processed to obtain a feature vector, feature conversion is performed on the feature vector to obtain a semantic vector corresponding to the semantics of the data to be processed, and the semantic vector is processed according to a binary tree structure to obtain the first target text corresponding to the data to be processed, so that prediction of the data to be processed can be realized and the required target text obtained. Because each leaf node in the binary tree structure corresponds to one candidate word of the first target text, and the closer the semantics of candidate words are, the closer the corresponding leaf nodes are in the binary tree structure, the prediction precision can be improved, the prediction process can be accelerated, and the user experience improved.
In a possible implementation manner, each leaf node further corresponds to a semantic vector average value, the semantic vector average value is obtained by averaging one or more semantic vectors, the one or more semantic vectors are associated with a candidate word corresponding to the leaf node, and the binary tree structure is obtained by performing binary clustering on the semantic vector average value of each candidate word.
According to the embodiment of the application, more accurate prediction results can be obtained when the data to be processed are predicted, and the reduction of precision can be avoided by using the distribution of the clustered semantic vectors as the arrangement mode of the leaf nodes.
In one possible implementation, the apparatus is used for generating a model for a first text, and the apparatus further includes: the first training module is used for training a second text generation model by using a training sample to obtain the trained second text generation model, and an output layer in the second text generation model does not comprise the binary tree structure; the prediction module is used for predicting the training sample by using the trained second text generation model to obtain a semantic vector and a second target text corresponding to the training sample; the establishing module is used for establishing a mapping relation between each target word in the second target text and the corresponding semantic vector average value, wherein the semantic vector average value corresponding to any target word represents the average value of one or more semantic vectors of the training sample of the target word obtained through prediction; the binary clustering module is used for carrying out binary clustering based on the semantic vector average value in the mapping relation and establishing the binary tree structure; and the second training module is used for training the trained second text generation model based on the binary tree structure to obtain the first text generation model.
According to the embodiment of the application, the second text generation model is trained with the training samples to obtain the trained second text generation model, and the training samples are then predicted with the trained second text generation model to obtain the semantic vectors and the second target text corresponding to the training samples. A mapping relation is established between each target word in the second target text and the corresponding semantic vector average value, binary clustering is performed based on the semantic vector average values in the mapping relation to establish the binary tree structure, and the trained second text generation model is trained based on the binary tree structure to obtain the first text generation model. This realizes the process of building the first text generation model, and at the same time, establishing the binary tree structure yields a first text generation model with higher prediction accuracy and speed.
In one possible implementation, the second training module includes: a first replacing module, configured to replace the output layer of the trained second text generation model with the output layer including the binary tree structure; and the first training submodule is used for training the replaced second text generation model by using the training sample to obtain the first text generation model.
According to the embodiment of the application, the first text generation model with higher prediction precision can be obtained by training on the basis of obtaining the second text generation model containing the binary tree structure.
In one possible implementation, the second training module includes: the second training submodule is used for training the output layer containing the binary tree structure by utilizing the semantic vector corresponding to the training sample to obtain the trained output layer containing the binary tree structure; and the second replacing module is used for replacing the output layer of the trained second text generation model by the trained output layer containing the binary tree structure to obtain the first text generation model.
According to the embodiment of the application, training only the output layer containing the binary tree structure with the semantic vectors corresponding to the training samples accelerates the training process of the model and saves training resources.
In one possible implementation manner, the processing module includes: a first determining module, configured to, for any semantic vector, take the root node of the binary tree as the initial current node; a second determining module, configured to, under the condition that the current node is a leaf node, determine that the candidate word corresponding to the current node is the first target text corresponding to the semantic vector; a calculation module, configured to, otherwise, calculate the probability value of the left child node of the current node to which the semantic vector belongs according to the semantic vector and the semantic vector average value corresponding to the left child node of the current node; and a third determining module, configured to, when the probability value is greater than the threshold value, take the left child node of the current node as the new current node, otherwise take the right child node of the current node as the new current node, and repeat from the step of judging whether the current node is a leaf node.
According to the embodiment of the application, binary prediction is performed on the semantic vector, so that only the probability value of one node needs to be calculated for each layer of the binary tree. This reduces the time delay and time complexity of model prediction, reduces the amount of calculation and the computing resources occupied by the model during prediction, and improves the user experience.
Fig. 14 shows a configuration diagram of a text generation apparatus according to an embodiment of the present application. As shown in fig. 14, the apparatus 40 includes at least one processor 1801, at least one memory 1802, and at least one communication interface 1803. In addition, the device may also include common components such as an antenna, which will not be described in detail herein.
The processor 1801 may be a general purpose Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control the execution of programs according to the above schemes.
Communication interface 1803 may be adapted to communicate with other devices or a communication network, such as an ethernet network, a Radio Access Network (RAN), a core network, a Wireless Local Area Network (WLAN), etc.
The Memory 1802 may be, but is not limited to, a Read-Only Memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor via a bus. The memory may also be integrated with the processor.
The memory 1802 is used for storing application program codes for implementing the above schemes and is controlled by the processor 1801. The processor 1801 is configured to execute application code stored in the memory 1802.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
It should be noted that for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
An embodiment of the present application provides a text generation apparatus, including: a processor and a memory for storing processor-executable instructions; wherein the processor is configured to implement the above method when executing the instructions.
Embodiments of the present application provide a non-transitory computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
Embodiments of the present application provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, the processor in the electronic device performs the above method.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable Read-Only Memory (EPROM or flash Memory), a Static Random Access Memory (SRAM), a portable Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disc (DVD), a Memory stick, a floppy disk, a mechanical coding device, a punch card or an in-groove protrusion structure, for example, having instructions stored thereon, and any suitable combination of the foregoing.
The computer readable program instructions or code described herein may be downloaded from a computer readable storage medium to a respective computing/processing device, or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present application may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of Network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, the electronic circuitry can execute computer-readable program instructions to implement aspects of the present application by utilizing state information of the computer-readable program instructions to personalize custom electronic circuitry, such as Programmable Logic circuits, Field-Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs).
Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It is also noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by hardware (e.g., an electronic circuit or an ASIC (Application Specific integrated circuit)) for performing the corresponding functions or acts, or combinations of hardware and software, such as firmware.
While the invention has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Having described embodiments of the present application, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method of text generation, the method comprising:
extracting the features of the data to be processed to obtain a feature vector;
performing feature conversion on the feature vector to obtain a semantic vector, wherein the semantic vector corresponds to the semantics of the data to be processed;
processing the semantic vector according to a binary tree structure to obtain a first target text corresponding to the data to be processed,
in the binary tree structure, each leaf node corresponds to a candidate word of the first target text, and the closer the semantics of the candidate words are, the closer the leaf nodes corresponding to the candidate words are in the binary tree structure.
2. The method of claim 1, wherein each leaf node further corresponds to a semantic vector average, the semantic vector average is obtained by averaging one or more semantic vectors associated with the candidate word corresponding to the leaf node, and the binary tree structure is obtained by performing binary clustering on the semantic vector average of each candidate word.
3. The text generation method of claim 1, wherein the method is used for a first text generation model, and wherein the method further comprises:
training a second text generation model by using a training sample to obtain the trained second text generation model, wherein an output layer in the second text generation model does not comprise the binary tree structure;
predicting the training sample by using the trained second text generation model to obtain a semantic vector and a second target text corresponding to the training sample;
establishing a mapping relation between each target word in the second target text and the corresponding semantic vector average value, wherein the semantic vector average value corresponding to any target word represents the average value of one or more semantic vectors of the training sample of the target word obtained through prediction;
performing binary clustering based on the semantic vector average value in the mapping relation to establish the binary tree structure;
and training the trained second text generation model based on the binary tree structure to obtain the first text generation model.
4. The text generation method according to claim 3, wherein training the trained second text generation model based on the binary tree structure to obtain the first text generation model includes:
replacing the output layer of the trained second text generation model with the output layer containing the binary tree structure;
and training the replaced second text generation model by using the training sample to obtain the first text generation model.
5. The text generation method according to claim 3, wherein training the trained second text generation model based on the binary tree structure to obtain the first text generation model includes:
training the output layer containing the binary tree structure by using the semantic vector corresponding to the training sample to obtain the trained output layer containing the binary tree structure;
and replacing the output layer of the trained second text generation model with the trained output layer containing the binary tree structure to obtain the first text generation model.
6. The method according to claim 1, wherein processing the semantic vector according to a binary tree structure to obtain a first target text corresponding to the data to be processed includes:
for any semantic vector, taking a root node of the binary tree as an initial current node;
judging whether the current node is a leaf node or not,
under the condition that the current node is a leaf node, determining that a candidate word corresponding to the current node is a first target text corresponding to the semantic vector; if not,
calculating the probability value of the left child node of the current node of the semantic vector according to the semantic vector and the semantic vector average value corresponding to the left child node of the current node;
and under the condition that the probability value is larger than the threshold value, taking the left child node of the current node as the new current node, otherwise taking the right child node of the current node as the new current node, and repeating from the step of judging whether the current node is a leaf node.
7. An apparatus for generating text, the apparatus comprising:
the characteristic extraction module is used for extracting the characteristics of the data to be processed to obtain a characteristic vector;
the feature conversion module is used for performing feature conversion on the feature vector to obtain a semantic vector, and the semantic vector corresponds to the semantics of the data to be processed;
a processing module, configured to process the semantic vector according to a binary tree structure to obtain a first target text corresponding to the data to be processed,
in the binary tree structure, each leaf node corresponds to one candidate word of the first target text, and the closer the semantics of the candidate words are, the closer the distance between the leaf nodes corresponding to the candidate words in the binary tree structure is.
8. A text generation apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the method of any of claims 1-6 when executing the instructions.
9. A non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the method of any one of claims 1-6.
10. A computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in an electronic device, a processor in the electronic device performs the method of any of claims 1-6.
CN202110174537.XA 2021-02-08 2021-02-08 Text generation method, device and storage medium Pending CN114911895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110174537.XA CN114911895A (en) 2021-02-08 2021-02-08 Text generation method, device and storage medium


Publications (1)

Publication Number Publication Date
CN114911895A (en) 2022-08-16

Family

ID=82762068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110174537.XA Pending CN114911895A (en) 2021-02-08 2021-02-08 Text generation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114911895A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657051A (en) * 2018-11-30 2019-04-19 平安科技(深圳)有限公司 Text snippet generation method, device, computer equipment and storage medium
CN110162770A (en) * 2018-10-22 2019-08-23 腾讯科技(深圳)有限公司 A kind of word extended method, device, equipment and medium
CN111460798A (en) * 2020-03-02 2020-07-28 平安科技(深圳)有限公司 Method and device for pushing similar meaning words, electronic equipment and medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination