WO2023185896A1 - Text generation method and apparatus, and computer device and storage medium

Info

Publication number
WO2023185896A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
node
nodes
target
model
Application number
PCT/CN2023/084560
Other languages
French (fr)
Chinese (zh)
Inventor
黄斐 (HUANG Fei)
周浩 (ZHOU Hao)
黄民烈 (HUANG Minlie)
李航 (LI Hang)
Original Assignee
北京有竹居网络技术有限公司 (Beijing Youzhuju Network Technology Co., Ltd.)
清华大学 (Tsinghua University)
Application filed by 北京有竹居网络技术有限公司 (Beijing Youzhuju Network Technology Co., Ltd.) and 清华大学 (Tsinghua University)
Publication of WO2023185896A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/29: Graphical models, e.g. Bayesian networks
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Definitions

  • The embodiments of the present disclosure relate to the technical field of natural language processing, and for example to a text generation method, apparatus, computer device, and storage medium.
  • Text generation technology is an important technology in the field of natural language processing. It uses given information together with a text generation model to generate text sequences that meet a specific goal. The text generation model is trained on sample data from the relevant application scenario (generative reading comprehension, human-computer dialogue, intelligent writing, machine translation, etc.), so that text generation can be achieved in different application scenarios.
  • Output delay refers to the time required from the moment the model receives its input to the moment it has fully generated the text output. When text is generated word by word, this output delay is linearly related to the sentence length of the generated text. If words are instead generated in parallel to reduce the delay, new problems are introduced: the produced text may contain consecutive repeated words, or the context may be incoherent.
  • Embodiments of the present disclosure provide a text generation method, apparatus, computer device, and storage medium, which reduce contextual incoherence and consecutive word repetition in the generated text and improve the quality of the generated text.
  • Embodiments of the present disclosure provide a text generation method, which includes: inputting the acquired original text into a trained text encoding model to obtain text feature information; and generating target text corresponding to the original text based on the text feature information in combination with a trained text decoding model;
  • wherein the text decoding model includes a text prediction layer, the node information of a set number of nodes included in the text prediction layer is determined by the text feature information, and the target words contained in the target text and the combination order of the target words are determined by the node information of the nodes and the topological structure between the nodes.
  • Embodiments of the present disclosure also provide a text generation apparatus, which includes:
  • an encoding execution module configured to input the acquired original text into a trained text encoding model to obtain text feature information;
  • a decoding execution module configured to generate target text corresponding to the original text based on the text feature information in combination with a trained text decoding model;
  • wherein the text decoding model includes a text prediction layer, the node information of a set number of nodes included in the text prediction layer is determined by the text feature information, and the target words contained in the target text and the combination order of the target words are determined by the node information of the nodes and the topological structure between the nodes.
  • Embodiments of the present disclosure also provide an electronic device, which includes:
  • one or more processors;
  • a storage device configured to store one or more programs;
  • wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the text generation method provided by any embodiment of the present disclosure.
  • Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the text generation method provided by any embodiment of the present disclosure is implemented.
  • Figure 1 is a schematic flowchart of a text generation method provided by an embodiment of the present disclosure;
  • Figure 1a shows an application example of a text generation model of the related art in a machine translation scenario;
  • Figure 1b shows a structural diagram of the text decoding model used in the text generation method provided by this embodiment;
  • Figure 1c shows an application example of the text generation model involved in this embodiment in a machine translation scenario;
  • Figure 2 is a schematic flowchart of a text generation method provided by an embodiment of the present disclosure;
  • Figure 2a shows a schematic diagram of part of the network structure of the text decoding model used in the text generation method provided by this embodiment;
  • Figure 2b shows an example diagram of calculating a node transition matrix in the text generation method provided by this embodiment;
  • Figure 2c shows an example diagram of the fully connected structure in the text prediction layer involved in the text generation method provided by this embodiment;
  • Figure 3 is a schematic structural diagram of a text generation apparatus provided by an embodiment of the present disclosure;
  • Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • The term "include" and its variations are open-ended, i.e., "including but not limited to."
  • The term "based on" means "based at least in part on."
  • The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; and the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
  • Figure 1 is a schematic flowchart of a text generation method provided by an embodiment of the present disclosure. This embodiment is applicable to text generation situations. The method can be executed by a text generation apparatus, which can be implemented by software and/or hardware and can be configured in a terminal and/or a server to implement the text generation method in the embodiments of the present disclosure.
  • Figure 1a shows an application example of a text generation model of the related art in a machine translation scenario. The input text may be the Chinese sentence "I went to the cinema", and the purpose of the text generation model 11 of the related art is to generate the English text of that Chinese sentence. During training, the English output samples used may include multiple variants, such as "I went to the movie theater" and "I just went to the cinema". After training is completed, when actually performing English machine translation of "I went to the cinema", the model may mix words from the above output samples and output erroneous predicted text such as "I went to the theater".
  • This embodiment provides a text generation method that improves upon the text generation model of the related art by adding a text prediction layer. Through the nodes included in the added text prediction layer, high-quality generated text can be obtained.
  • The text generation method provided by this embodiment may include the following steps.
  • The text generation method provided by this embodiment is not limited to a particular application scenario. If text generation is required in a certain application scenario, training samples can be collected in that application scenario to train the text generation model.
  • Structurally, the text generation model can include two parts: a text encoding model and a text decoding model.
  • The original text is equivalent to the input text before text generation, and the content of the original text may differ across application scenarios.
  • For example, in the machine translation scenario, if Chinese-to-English translation is performed, the original text can be the Chinese text to be translated; if English-to-Chinese translation is performed, the original text can be the English text to be translated.
  • The text encoding model can be used to encode the original text to obtain the text feature information of the original text; the model structure of the text encoding model can directly reuse that of the text generation model in the related art.
  • The text encoding model can be trained on the sample data provided in different application scenarios, so that the output text feature information meets the text generation needs of the application scenario. For example, in the machine translation scenario, the output text feature information is mainly used to subsequently obtain the translated text corresponding to the original text.
  • The text feature information characterizes the feature information of the multiple words in the input original text and can be represented by a text feature matrix. The number of text feature vectors included in the text feature matrix is the same as the number of words contained in the original text. The text feature information can then be used as input data to the text decoding model, as sketched below.
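  • As a minimal sketch of this encoding step (hypothetical names; PyTorch is used here only for illustration, since the disclosure does not prescribe a specific framework), a standard Transformer encoder maps the original text to one feature vector per word:

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Sketch of a text encoding model: original text in, text feature matrix out."""
    def __init__(self, vocab_size: int, d_model: int = 512, n_layers: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: [batch, num_words] -> feature matrix [batch, num_words, d_model];
        # positional encodings are omitted for brevity.
        return self.encoder(self.embed(token_ids))

encoder = TextEncoder(vocab_size=32000)
features = encoder(torch.randint(0, 32000, (1, 5)))  # a 5-word original text
print(features.shape)  # torch.Size([1, 5, 512]): one feature vector per word
```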
  • Next, generate target text corresponding to the original text based on the text feature information in combination with the trained text decoding model. The text decoding model includes a text prediction layer, the node information of a set number of nodes included in the text prediction layer is determined by the text feature information, and the target words contained in the target text and the combination order of the target words are determined by the node information of the set number of nodes and the topological structure between the nodes.
  • The text decoding model used in this step includes a text prediction layer, and the text prediction layer includes a certain number of nodes. Through the node information of the multiple nodes and the topological structure between the nodes, the target text of the original text can be effectively determined. It can be understood that the text decoding model in this embodiment is likewise trained on sample data provided in different application scenarios, so that the output target text meets the text generation requirements of the application scenario.
  • The text prediction layer contains a set number of nodes, all of which can be used to construct the graph required for text generation, and the node information of each node can be determined from the text feature information.
  • The specific value of the set number is greater than the number of words contained in the original text. It can serve as the graph size required for graph construction in the text prediction layer, or as the maximum possible length of the text to be generated; that is, the number of words contained in the text to be generated will not be greater than the set number.
  • The node information of the set number of nodes contained in the text prediction layer can be determined from the text feature information. For example, the text feature information can be combined with certain parameter information through full connection processing, and the relevant feature information of each word in the original text is finally mapped onto a node as that node's node information.
  • As for the generation logic of the target text, it needs to consider both the node information of the nodes in the text prediction layer and the topological structure between the nodes.
  • Analysis shows that the target text is likewise composed of individual words, and the words in the target text should have some relationship with the words in the original text. Through the text encoding model, text feature information representing the multiple words in the original text can be obtained. In the text decoding model of this embodiment, the text feature information can be converted, through basic decoding processing, into the node information of the multiple nodes included in the text prediction layer, which is equivalent to establishing an association between the multiple words in the original text and the multiple nodes in the text prediction layer.
  • The text decoding model provided in this embodiment can establish a correspondence between the multiple nodes and the words in the dictionary through the node information of the multiple nodes in the text prediction layer, so that each node corresponds to a best-matching word.
  • The text decoding model provided in this embodiment can also connect the multiple nodes in the text prediction layer according to certain connection conditions to form a topological structure between the nodes. Based on this topological structure, the connection relationships among the multiple nodes can be clearly determined. According to the learning parameters trained in the text prediction layer, combined with the topological structure between the nodes, the transition probability from one node to another connected node can be determined. Finally, based on the word corresponding to each node and the transition probabilities from each node to its connected nodes, target nodes can be selected from the multiple nodes.
  • The target words required to generate the target text are determined accordingly when the target nodes are selected. In addition, the combination order of the multiple target words in the generated target text can be determined by the connection relationships between the nodes represented by the topological structure between the nodes.
  • The text decoding model may include: a position information input layer, a basic decoding sub-model, and a text prediction layer;
  • the position information input layer includes a set number of node position parameters, and the set number is used to determine the number of nodes included in the text prediction layer;
  • the node information of the set number of nodes included in the text prediction layer is determined from the node position parameters and the text feature information, in combination with the basic decoding sub-model.
  • In addition to the text prediction layer, the text decoding model also includes a position information input layer and a basic decoding sub-model. In terms of structural connection, the output information of the position information input layer is passed to the basic decoding sub-model, and the information output by the basic decoding sub-model is passed to the multiple nodes in the text prediction layer.
  • The position information input layer can be understood as an information input layer that specifies the graph size required for generating the directed acyclic graph in the text prediction layer. The graph size specified by the position information input layer is actually the number of nodes required to construct the graph.
  • For example, the value of the graph size can be set to a multiple of the number of words contained in the original text. It can be seen that this number determines the number of nodes included in the text prediction layer; that is, the set number representing the number of nodes in the text prediction layer is determined by the graph size preset in the position information input layer. When the graph size is set to n, it is equivalent to determining that the number of nodes contained in the text prediction layer is n.
  • The node position parameters are used to represent the nodes and can be understood as the position parameters assigned to the nodes required for constructing the graph. Each node position parameter indicates the existence of a corresponding node in the text prediction layer. At the same time, the node position parameters are also among the learning parameters obtained through training in the text decoding model: through training iterations, the node position parameters are adjusted accordingly until stable parameter information is obtained at the end of training.
  • The text feature information output by the text encoding model and the node position parameters can serve as the input of the basic decoding sub-model in the text decoding model, and the basic decoding sub-model can output vector information whose count equals the number of nodes in the text prediction layer, to be used as the node information of the corresponding nodes.
  • The basic decoding sub-model can include a network structure with a self-attention mechanism and a cross-attention mechanism, which is equivalent to reusing the text decoding model in the text generation model of the related art.
  • Figure 1b shows a structural diagram of the text decoding model used in the text generation method provided by this embodiment. The text decoding model 12 includes an input layer with two different input branches. One input branch is the position information input layer 121 for inputting the graph size and node position information; the position information input layer 121 includes n determined node position parameters g. The other input branch is used to input the text feature information output by the text encoding model. The text decoding model 12 also includes a basic decoding sub-model 122 and a text prediction layer 123. The basic decoding sub-model 122 may include an m-layer network structure composed of a self-attention mechanism and a cross-attention mechanism; the text prediction layer 123 includes n nodes, the same number as the node position parameters. Finally, the output layer 124 of the text decoding model 12 outputs the target text of the original text. A minimal structural sketch follows.
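  • In the following sketch (hypothetical names, not the patented implementation), the n node position parameters g are learnable embeddings, the basic decoding sub-model is an m-layer Transformer decoder whose self-attention runs over the node positions and whose cross-attention runs over the text feature information, and its n output vectors become the node information of the n nodes in the text prediction layer:

```python
import torch
import torch.nn as nn

class TextDecoderBackbone(nn.Module):
    """Position information input layer + basic decoding sub-model (sketch)."""
    def __init__(self, d_model: int = 512, m_layers: int = 6, max_nodes: int = 1024):
        super().__init__()
        # node position parameters g_1..g_n: learning parameters of the model
        self.node_pos = nn.Parameter(torch.randn(max_nodes, d_model) * 0.02)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=m_layers)

    def forward(self, text_features: torch.Tensor, graph_size: int) -> torch.Tensor:
        # graph_size n can be set to a multiple of the source length (see above)
        batch = text_features.size(0)
        g = self.node_pos[:graph_size].unsqueeze(0).expand(batch, -1, -1)
        # self-attention over node positions, cross-attention over text features
        return self.decoder(tgt=g, memory=text_features)  # [batch, n, d_model]

backbone = TextDecoderBackbone()
node_info = backbone(torch.randn(1, 5, 512), graph_size=5 * 8)  # e.g. n = 8 x words
print(node_info.shape)  # torch.Size([1, 40, 512]): node information of 40 nodes
```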
  • The text generation method realizes parallel determination of the node information of the multiple nodes in the added text prediction layer and parallel determination of the multiple target words in the generated text, reducing the text generation delay. At the same time, through the node information of the multiple nodes in the added text prediction layer, a one-to-one correspondence between the multiple words in the generated text and the matching nodes can be achieved, thereby better avoiding the occurrence of consecutively repeated words in the generated text. In addition, through the topological structure between the multiple nodes, the combination order of the multiple words in the generated text can be constrained, thereby ensuring the coherence of the context in the generated text, improving the generation quality of the generated text and ensuring text accuracy.
  • In an embodiment, the method further includes: performing learning parameter training on the constructed text decoding model based on a set loss function generation strategy, to obtain the trained text decoding model;
  • wherein the learning parameters include: the node position parameters involved in the position information input layer included in the text decoding model, the basic model parameters involved in the included basic decoding sub-model, and the node-related parameters involved in each node of the included text prediction layer.
  • This embodiment improves the network structure of the text decoding model, for example by adding a text prediction layer and using a value greater than the number of words contained in the text as the number of nodes, so that each node can correspond to one word in the output text.
  • Correspondingly, sample data improvements and loss function improvements are made in the training phase.
  • For the sample data improvement, this embodiment can use single sample data, that is, one input text corresponds to only one output text, forming one piece of sample data. For the loss function improvement, a loss function generation strategy is given that mainly considers the nodes in the text prediction layer added to the text decoding model. For example, this strategy can first consider the possible paths between the nodes and the generation probability of producing the output text through each constituted path, and then combine the generation probabilities of the multiple paths to generate the loss function.
  • Based on the determined loss function value, the learning parameters in the constructed text decoding model can be adjusted through backpropagation, and finally a text decoding model with higher accuracy can be obtained.
  • The learning parameters in the text decoding model can include the node position parameters in the position information input layer, the weight parameters involved in the basic decoding sub-model, and the node-related parameters set for the multiple nodes in the text prediction layer. The set node-related parameters can be used to determine the predicted nodes related to the generated text and to match the nodes to predicted words in the dictionary.
  • In an embodiment, performing learning parameter training on the constructed text decoding model to obtain the trained text decoding model includes the following steps a0 to e0:
  • a0. Obtain at least one set of sample data, where a set of sample data includes an original sample text and a corresponding single target sample text.
  • That is, a set of sample data in this embodiment may include one original sample text and one target sample text.
  • b0. For the current iteration, encode the original sample text in a set of sample data using the text encoding model and input the result into the current text decoding model.
  • The current iteration can be understood as either the first iteration or a training iteration to be executed in the iteration loop; the training logic executed in each iteration is the same. The current text decoding model can be understood as the text decoding model to be trained under the current iteration. The original sample text can be input into the trained text encoding model for encoding processing, and the encoded result is then input into the current text decoding model.
  • c0. For each text prediction path formed in the text prediction layer, determine the probability value that the predicted text generated based on that text prediction path is the target sample text.
  • The original sample text can be processed through the network structure included in the current text decoding model and the current parameter values of the learning parameters in that network structure. Based on the multiple nodes of the text prediction layer in the current text decoding model, multiple text prediction paths can be formed, and one predicted text can be generated through each text prediction path. The probability value that the predicted text is the target sample text can be determined as the probability value corresponding to generating the target sample text based on that text prediction path. This step is equivalent to part of the execution logic in the loss function generation strategy, and the determined probability values are used to determine the value of the loss function used in the current iteration.
  • In an embodiment, the text prediction paths are formed based on a set node combination algorithm in the text prediction layer. Illustratively, during the execution of this step, all paths formed by the connections between the multiple nodes can be used as text prediction paths. However, if all paths are directly selected as text prediction paths, more computing resources will be occupied during path calculation. For model training, this embodiment therefore considers using a dynamic programming algorithm over all path calculations to avoid repeated operations of the same logic, thereby saving computing resources and reducing training time. Alternatively, this embodiment can also consider using a certain algorithm to select only a part of the paths formed by the node connections as the text prediction paths.
  • d0. Substitute the determined probability values into a preset loss function generation formula to determine the current loss function value under the current iteration, and adjust the learning parameters in the current text decoding model through backpropagation based on the current loss function value.
  • In an embodiment, the loss function generation formula is expressed as taking the logarithm of the sum of the multiple probability values and negating the result of the logarithmic operation, that is, loss = -log(p1 + p2 + ... + pK), where pk is the probability value corresponding to the k-th text prediction path.
  • e0. Use the next iteration as the new current iteration and return to step b0 to continue execution until the iteration end condition is met, thereby obtaining the trained text decoding model.
  • In an embodiment, the iteration end condition may be that the current loss function value determined in the iteration logic falls within a set threshold range, or that the number of iterations reaches a set threshold.
  • Through the model training logic given in this embodiment, the inconsistent-label problem that occurs in training samples during the model training stage can be better avoided, so that each node in the text decoding model can correspond to a word appearing in the text to be generated. A sketch of the path-summed loss follows.
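  • The following is a minimal sketch of such a loss (hypothetical helper, written in log space for numerical stability; it assumes the convention that every text prediction path starts at the first node and ends at the last node). The dynamic program sums the probability of generating the target sample text over all paths without enumerating them:

```python
import torch

def dag_nll_loss(log_trans: torch.Tensor, log_emit: torch.Tensor,
                 target_ids: torch.Tensor) -> torch.Tensor:
    """log_trans: [n, n] log node transition matrix (upper-triangular DAG);
    log_emit: [n, vocab] log node-to-word matching probabilities;
    target_ids: [L] word ids of the target sample text.
    Returns -log(sum over all text prediction paths of the path probability)."""
    n, L = log_trans.size(0), target_ids.size(0)
    # f[u] = log prob of generating the first i target words on a path ending at node u
    f = torch.full((n,), float('-inf'))
    f[0] = log_emit[0, target_ids[0]]          # paths start at the first node
    for i in range(1, L):
        # advance one directed edge, then emit the i-th target word
        f = torch.logsumexp(f.unsqueeze(1) + log_trans, dim=0) + log_emit[:, target_ids[i]]
    return -f[n - 1]                           # paths must end at the last node
```

  • Because each step reuses the previous prefix probabilities f, repeated sub-computations over individual paths are avoided, which is the saving the dynamic programming algorithm described above is meant to provide.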
  • Figure 1c shows an application example of the text generation model involved in this embodiment in a machine translation scenario. The input text can likewise be the Chinese sentence "I went to the cinema", and the text generation model 13 used in this embodiment (which includes a text prediction layer) can generate the English text of that Chinese sentence. During training, the English text samples used may be only "I went to the movie theater" or only "I just went to the cinema". The multiple words in the English text sample each correspond to a processing node in the text generation model 13 (in Figure 1c, a processing node can be represented by the predicted word presented in the text). Equivalently, the text generation model 13 used in this embodiment can determine a best-matching word for each processing node and, according to the connection relationships between the processing nodes, determine from the multiple connection paths formed by the processing nodes a combination path of words that best fits the contextual relationship.
  • Figure 2 shows a schematic flowchart of a text generation method provided by an embodiment of the present disclosure. This embodiment is a refinement of the above embodiment. In this embodiment, generating target text corresponding to the original text based on the text feature information in combination with the trained text decoding model includes: inputting the text feature information and the node position parameters in the position information input layer into the basic decoding sub-model; obtaining the set number of initial text prediction vectors output by the basic decoding sub-model and using them as the node information of the set number of nodes in the text prediction layer; and constructing a directed acyclic graph based on the nodes, determining the topological structure between the nodes, and determining the target text of the original text in combination with the node information.
  • The text generation method provided by this embodiment includes the following steps:
  • S201. Input the acquired original text into the trained text encoding model to obtain text feature information.
  • In this embodiment, the text feature information may be a feature matrix containing the feature vectors corresponding to the multiple words in the original text.
  • The text decoding model includes a position information input layer. The position information input layer includes the position information (node position parameters) that represents, in the text decoding model, the nodes of the graph to be constructed, and it characterizes the size of the graph to be constructed (mainly through the number of node position parameters it includes).
  • S202. Input the text feature information and the node position parameters in the position information input layer into the basic decoding sub-model.
  • The text feature information and the node position parameters can be used as input information to the basic decoding sub-model in the text decoding model.
  • Figure 2a shows a schematic diagram of part of the network structure of the text decoding model used in the text generation method provided by this embodiment. It shows the position information input layer in the text decoding model and the basic decoding sub-model 20, where the position information input layer contains 9 (the graph size) node position parameters 21. The multiple node position parameters 21 and the text feature information 22 output by the text encoding model can be input into the basic decoding sub-model 20.
  • S203. Obtain the set number of initial text prediction vectors output by the basic decoding sub-model, and use them as the node information of the set number of nodes in the text prediction layer.
  • This step can obtain the processing information output by the basic decoding sub-model, and the processing information can include the set number of initial text prediction vectors, where the set number is the same as the number of node position parameters in the position information input layer. This step can also associate each initial text prediction vector obtained above with a node in the text prediction layer, as the node information of that node.
  • Figure 2a also shows the node set in the text prediction layer, which likewise contains 9 nodes 23. The multiple initial text prediction vectors output by the basic decoding sub-model 20 can correspond one-to-one with the nodes 23, serving as the node information of the multiple nodes 23.
  • S204. Construct a directed acyclic graph based on the nodes, determine the topological structure between the nodes, and determine the target text of the original text in combination with the node information.
  • The above steps are equivalent to assigning node information to the nodes in the text prediction layer, so that the nodes in the text prediction layer are associated with the actual original text. This step is equivalent to taking the text prediction layer as the execution subject, which mainly performs the subsequent text generation processing based on the node information of the multiple nodes, thereby generating the target text of the original text.
  • To generate the target text, this step needs to establish associations between the multiple nodes, and these associations can be achieved by constructing a graph. Considering that the text to be generated is directed and acyclic, this step can build a directed acyclic graph based on the multiple nodes.
  • In an embodiment, the execution logic of generating the target text corresponding to the original text based on the node information in this step can be described as: 1) establishing directed connections between the multiple nodes to form a directed acyclic graph, and determining, for each pair of connected nodes, the transition probability from the source node to the destination node, where the source node is the outgoing node of the directed connection between the two nodes and the destination node is the incoming node; 2) determining the predicted word corresponding to each node; 3) selecting the target words based on the transition probabilities between the nodes and the predicted words corresponding to the nodes, and finally combining the target words to form the target text according to the obtained combination order between the target words.
  • This embodiment provides an implementation of constructing a directed acyclic graph based on the multiple nodes, determining the topological structure between the nodes, and determining the target text of the original text in combination with the node information. The implementation includes the following steps a1 to c1:
  • a1. Construct a directed acyclic graph based on the node labels of the nodes in the text prediction layer, and obtain the topological structure between the nodes.
  • The directed acyclic graph is used to determine the connection relationships between the nodes. Taking into account the directedness of the constructed graph, this embodiment makes directed connections according to the node labels of the nodes. For example, assuming there are 9 nodes, following the node labels from small to large, node v1 establishes directed connections with nodes v2 to v9 respectively, node v2 can only establish directed connections with v3 to v9, and so on; the last node, v9, has no outgoing directed connections. After the directed acyclic graph is determined, the topological structure between the nodes is determined accordingly.
  • The topological structure between the nodes includes the connection relationships between each node and the other nodes. Based on these connection relationships, it can be known which nodes each node is connected to, and all existing connections are directed connections.
  • b1. Determine the node transition matrix corresponding to the text prediction layer based on the topological structure between the nodes and the node information of the nodes.
  • The row and column dimensions of the node transition matrix are both equal to the number of nodes included in the text prediction layer, and, considering the directedness of the node connections, the node transition matrix can be an upper triangular matrix. A valid element value in the node transition matrix indicates the existence of a directed connection between the node corresponding to its row and the node corresponding to its column, and is primarily the transition probability of the two nodes determined through the corresponding calculation logic.
  • In an embodiment, one implementation logic for the transition probability can be described as follows: for two connected nodes, obtain the node information of the two nodes, where the node information can be represented by feature vectors; the feature vectors representing the node information of the two nodes can then be multiplied together, and the resulting product, after normalization, can be used as the transition probability of the two nodes.
  • Another implementation logic can be described as follows: first obtain the node-related parameters set for the nodes in the text prediction layer, such as the first learning parameter and the second learning parameter, which are mainly used to determine the transition probability; these node-related parameters exist in the text decoding model and have fixed parameter values after the training of the text decoding model is completed. Then, for two connected nodes, the transition probability is determined based on the product of the node information and the node-related parameters.
  • For example, when node vi and node vj are connected, the calculation of the transition probability from node vi to node vj can be described as: determine the product of the initial text prediction vector (node information) of node vi and the first learning parameter (recorded as the first product); determine the product of the initial text prediction vector (node information) of node vj and the second learning parameter (recorded as the second product); normalize the product of the first product and the second product, and the normalized result can be regarded as the transition probability from node vi to node vj. Finally, the node transition matrix of the text prediction layer can be formed based on the transition probabilities.
  • In an embodiment, determining the node transition matrix corresponding to the text prediction layer based on the topological structure between the nodes and the node information of the nodes includes: for each node, determining the adjacent nodes that have directed connections with the node from the topological structure between the nodes; then, for each node, according to the node information of the node and of the corresponding adjacent node, the first learning parameter, and the second learning parameter, determining the transition probability from the node to the adjacent node in combination with the probability transition formula, where the first learning parameter and the second learning parameter are both node-related parameters corresponding to the node.
  • The probability transition formula can be expressed as:

    p(vi -> vj) = softmax( (Vi · W1)(Vj · W2)^T / sqrt(d) )

  • where softmax represents normalization; Vi and Vj respectively represent the node information vectors of node vi and node vj; W1 represents the first learning parameter related to the node; W2 represents the second learning parameter related to the node; d is a size determined in the construction phase of the text prediction layer; and p(vi -> vj) represents the transition probability from node vi to node vj.
  • In this way, the transition probability from each node to its adjacent nodes can be calculated, and the node transition matrix can be formed based on the transition probabilities. A sketch of this computation follows.
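  • In the following sketch (hypothetical names), the scores are masked so that only directed connections from lower to higher node labels are valid, and each row is normalized with softmax, matching the probability transition formula above:

```python
import torch
import torch.nn.functional as F

def node_transition_matrix(node_info: torch.Tensor,
                           W1: torch.Tensor, W2: torch.Tensor) -> torch.Tensor:
    """node_info: [n, d] node information vectors V; W1, W2: [d, d] first and
    second learning parameters. Returns the [n, n] node transition matrix E."""
    n, d = node_info.shape
    q = node_info @ W1                          # Vi W1 (first product)
    k = node_info @ W2                          # Vj W2 (second product)
    scores = (q @ k.T) / d ** 0.5               # scaled product of the two
    # only directed connections from lower to higher node labels exist
    mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(~mask, float('-inf'))
    E = F.softmax(scores, dim=-1)               # row i: p(vi -> vj)
    # the last node has no outgoing connections; clear its undefined row
    return torch.nan_to_num(E)

E = node_transition_matrix(torch.randn(9, 64), torch.randn(64, 64), torch.randn(64, 64))
print(E[0].sum())  # ~1.0: each row of transition probabilities sums to 1
```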
  • Figure 2b shows an example diagram of calculating a node transition matrix in the text generation method provided by this embodiment. In the example, the transition probabilities are calculated for the nodes included in the text prediction layer, and table E in Figure 2b represents the calculated node transition matrix. Figure 2b shows some of the node connections and the transition probabilities corresponding to those connections: for example, the transition probability from v1 to v2 is 0.3, and the transition probability from v1 to v3 is 0.7. It can be seen that, in the node transition matrix E, the transition probabilities in each row sum to 1.
  • c1. Determine the target text of the original text based on the node information of the nodes and the node transition matrix.
  • In this embodiment, after the node transition matrix is determined, the weights of the edges formed by the connections in the directed acyclic graph are determined accordingly. This embodiment can select a prediction path through a prediction-path selection strategy. Illustratively, one implementation of selecting the prediction path can be described as follows: along the node connections, with the outgoing node fixed, select the incoming node with the highest transition probability from that outgoing node, and use the edge between the two nodes as one edge of the prediction path; then repeat the above logic with the selected node as the new outgoing node, until all edges in the prediction path have been selected, thereby determining the target nodes that make up the prediction path.
  • This step can also determine the matching probability between each node and the words included in the dictionary based on the node information and the fully connected layer existing in the text prediction layer. The dictionary can be pre-created word list information; it contains the various words required for text generation, and each word can be represented in the form of a vector.
  • In the fully connected layer, the nodes of the preceding layer can be the nodes of the directed acyclic graph in this embodiment, and the nodes of the following layer can be the words in the dictionary. The full connection processing can be to calculate the matching probability from each node in the directed acyclic graph to each word node in the dictionary, and the calculation can be realized through full connection based on the node information of the node and the word vector of the word node, as sketched below.
  • In this way, the target word corresponding to each node in the prediction path can be determined, and the target text is finally formed by combining the multiple target words. It should be noted that, in this embodiment, the order of execution of determining the prediction path and determining the matching probabilities is not fixed; it is also possible to determine the prediction path after determining the matching probabilities, as long as the target text can be generated.
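  • A sketch of this matching computation follows (hypothetical names; the word table may be the trained third learning parameter or the word embeddings shared with the text encoding model, as discussed further below):

```python
import torch
import torch.nn.functional as F

def node_word_matching(node_info: torch.Tensor, word_table: torch.Tensor) -> torch.Tensor:
    """node_info: [n, d] node information; word_table: [vocab, d] word vectors.
    Returns [n, vocab]: one matching distribution over the dictionary per node."""
    logits = node_info @ word_table.T   # vector product of node info and word vectors
    return F.softmax(logits, dim=-1)    # normalize into matching probabilities

match_probs = node_word_matching(torch.randn(9, 64), torch.randn(32000, 64))
print(match_probs.shape, float(match_probs[0].sum()))  # (9, 32000), ~1.0
```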
  • This embodiment can also provide a detailed description of the above step c1 of "determining the target text of the original text based on the node information of the nodes and the node transition matrix". Based on the node transition matrix corresponding to the text prediction layer and the node information of the nodes, step c1 can be implemented by executing the logic of steps c11 to c13 provided in this embodiment.
  • c11. Determine the matching probabilities from the nodes to the words in the preset vocabulary through the fully connected layer in the text prediction layer.
  • In an embodiment, the text prediction layer also includes a fully connected structure. The fully connected structure can take the node information of the nodes in the directed acyclic graph as input information, and the following layer in the fully connected structure can be regarded as the word nodes formed by the words in the dictionary; the nodes in the directed acyclic graph and the word nodes in the dictionary can be connected through connecting lines. The connection weight of each connecting line in the fully connected structure can be the third learning parameter determined, through training of the text decoding model, for the connection between the corresponding node and the corresponding word.
  • Figure 2c shows an example diagram of the fully connected structure in the text prediction layer involved in the text generation method provided by this embodiment, namely a fully connected structure 24 that determines the predicted words associated with the nodes. Figure 2c also includes the result output layer, on which only the predicted words matching the nodes in the directed acyclic graph are displayed. For example, the word matching node v1 is "I", the word matching node v2 is "just", the word matching node v3 is "went", and so on.
  • The execution logic of this step can be described as follows: each node is connected to multiple words in the dictionary, and both the nodes and the words can represent their corresponding information through vectors. Therefore, for the matching probability of nodes and words: if the connection weights in the fully connected structure are re-determined in the text decoding model training stage, the third learning parameters obtained from training can be taken first, and the vector product of the corresponding node information and the third learning parameters can be determined; if the text decoding model training stage no longer adjusts the connection weights but directly shares the word features used by the text encoding model, the vector product of the corresponding node information and word information can be determined directly. Afterwards, the vector products of the node with respect to all words can be determined, and after normalization they can be used as the matching probabilities from the node to the words.
  • That is, the fully connected layer is built within the text prediction layer, contains a fully connected structure for matching probability processing, and can perform full connection processing on each node.
  • c12. Determine the predicted nodes and the corresponding target words according to the node transition matrix and the matching probabilities from the nodes to the words.
  • In an embodiment, a predicted node can be regarded as a key node, selected from among the nodes of the text prediction layer, on which the generation of the target text depends. Based on the matching probability corresponding to a predicted node, the predicted word matching that node can be determined, and the predicted word can be regarded as a target word contained in the target text.
  • In an embodiment, the predicted nodes can be obtained by first determining the prediction path based on the node transition matrix in the text prediction layer, and then determining the target word of each predicted node based on the matching probabilities from the nodes to the words. It is also possible to determine the predicted nodes and the target words jointly based on the node transition matrix and the matching probabilities from the nodes to the words, and to determine the prediction path from the predicted nodes, the prediction path being used to combine the target words to form the target text. It is also possible to first determine the predicted words corresponding to the nodes based on the matching probabilities from the nodes to the words, then determine the prediction path in the directed acyclic graph through a search algorithm, and finally select the target words required for text generation.
  • c13. Combine the target words according to the connection directions between the corresponding nodes to form the target text of the original text.
  • In an embodiment, the target words determined above are combined according to the connection directions between their corresponding nodes in the text prediction layer. The multiple target words can determine only one combination order, and the final target text can be obtained according to this combination order. The target text is equivalent to the result of performing text generation processing on the original text.
  • This embodiment also provides a refinement of the above step c12: for determining the predicted nodes and the corresponding target words according to the node transition matrix and the matching probabilities from the nodes to the words, one implementation can be described as follows:
  • First, at least one predicted node is determined according to the maximum transition probability corresponding to each node in the node transition matrix.
  • In an embodiment, the determination of predicted nodes by maximum transition probability first starts from the node corresponding to the starting node label; this node can be used as the first predicted node, and it is connected to its corresponding adjacent nodes. From the transition probabilities of the predicted node to its adjacent nodes, the maximum transition probability can be determined, and the adjacent node corresponding to the maximum transition probability is regarded as a new predicted node. After that, the maximum-transition-probability determination is performed again on the new predicted node to determine the next predicted node. Through the above logic, predicted nodes are determined in a loop until the last node is reached, and the last node is also used as the final predicted node. From this step, at least one predicted node can be obtained (in one case, the starting node is also the ending node).
  • Then, for each predicted node, the maximum matching probability is determined from the matching probabilities between the predicted node and the words, and the word corresponding to the maximum matching probability is determined as the target word.
  • In an embodiment, for each predicted node, the maximum matching probability can be determined from its matching probabilities, and the predicted word corresponding to the maximum matching probability is then obtained; this predicted word is equivalent to the target word corresponding to the predicted node. It can be seen that the order in which the predicted nodes were determined establishes a combination path for combining the target words, and this combination path can be used for the final target text generation. A sketch of this strategy follows.
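  • A sketch of this first strategy (hypothetical names; E and match_probs are the node transition matrix and node-to-word matching probabilities computed above):

```python
import numpy as np

def greedy_decode(E: np.ndarray, match_probs: np.ndarray, id2word: list) -> list:
    """Follow the maximum transition probability from the starting node to the
    last node, then read off each predicted node's best-matching word."""
    n = E.shape[0]
    path = [0]                                   # node with the starting label
    while path[-1] != n - 1:                     # loop until the last node
        path.append(int(E[path[-1]].argmax()))   # adjacent node with max transition prob
    # each predicted node contributes its maximum-matching-probability word
    return [id2word[int(match_probs[u].argmax())] for u in path]
```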
  • For determining the predicted nodes and target words, this embodiment also provides another implementation. It should be noted that, differently from the above implementation logic, this implementation simultaneously considers the influence of the transition probability in the node transition matrix and of the matching probability between node and word on the predicted node: it can multiply the transition probability by the matching probability and determine the predicted node based on the product.
  • In an embodiment, starting from the current node, which can be recorded as the first predicted node, the product of the transition probability to each adjacent node and the corresponding matching probability is computed; from the maximum product value, the corresponding matching probability and transition probability are known. The word corresponding to that matching probability is recorded as a target word, and the corresponding adjacent node is recorded as the next predicted node. This execution logic also performs loop processing in the order of the directed connections of the nodes, from which the predicted nodes and target words that meet the conditions can be determined. The above process of determining the predicted nodes is equivalent to determining the combination order for combining the target words. A sketch of this joint strategy follows.
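  • A sketch of this second, joint strategy (hypothetical names; it interprets the product as scoring each adjacent node by its transition probability times that node's best word matching probability):

```python
import numpy as np

def joint_greedy_decode(E: np.ndarray, match_probs: np.ndarray, id2word: list) -> list:
    """Step to the adjacent node maximizing transition prob x matching prob."""
    n = E.shape[0]
    best_word = match_probs.argmax(axis=1)       # best-matching word id per node
    best_p = match_probs.max(axis=1)             # its matching probability
    u = 0                                        # first predicted node
    words = [id2word[int(best_word[u])]]
    while u != n - 1:
        u = int((E[u] * best_p).argmax())        # max product over adjacent nodes
        words.append(id2word[int(best_word[u])])
    return words
```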
  • For determining the predicted nodes and target words, this embodiment also provides yet another implementation. Differently from the above two implementations, this implementation mainly considers the situation where different nodes may correspond to the same word, and is equivalent to proposing a target word determination method for this situation.
  • In an embodiment, the corresponding predicted words are first determined for the nodes in the text prediction layer, where the determination of predicted words is likewise implemented using the maximum-matching-probability logic. The purpose of this step is mainly to determine, for the nodes in the text prediction layer, candidate text generation paths based on the node label order, and to determine the transition probability of the edge between two nodes in a candidate text generation path based on the node transition matrix; then, through a path search algorithm combined with the predicted words, candidate prediction paths in which different nodes represent the same predicted word are merged and determined from the candidate text generation paths; and the prediction path with the highest weight is obtained from the candidate prediction paths. A sketch of this merged path search follows.
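  • A sketch of this third strategy (hypothetical names; one way to realize the described merging is a small beam search in which candidate paths that reach the same node with the same word sequence have their weights added):

```python
import numpy as np

def merged_path_search(E: np.ndarray, match_probs: np.ndarray,
                       id2word: list, beam: int = 8) -> list:
    """Different nodes may represent the same predicted word; merge such
    candidate paths and return the word sequence with the highest weight."""
    n = E.shape[0]
    word = match_probs.argmax(axis=1)            # predicted word id per node
    # state: (word id sequence, end node) -> accumulated path weight
    states = {((int(word[0]),), 0): float(match_probs[0, word[0]])}
    finished = {}                                # word sequence -> total weight
    while states:
        merged = {}
        for (seq, u), w in states.items():
            if u == n - 1:                       # path reached the last node
                finished[seq] = finished.get(seq, 0.0) + w
                continue
            for v in range(u + 1, n):            # directed connections only
                key = (seq + (int(word[v]),), v)
                step = w * E[u, v] * match_probs[v, word[v]]
                merged[key] = merged.get(key, 0.0) + step  # merge equal sequences
        # keep only the `beam` highest-weight candidate prediction paths
        states = dict(sorted(merged.items(), key=lambda kv: -kv[1])[:beam])
    best_seq = max(finished.items(), key=lambda kv: kv[1])[0]
    return [id2word[i] for i in best_seq]
```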
  • Among the three implementations above, the first has the fastest execution speed, but the quality of the generated text is relatively low; the second is in a moderate state in terms of both execution speed and text generation quality; the third has a relatively slow execution speed, but the quality of the generated text is relatively high. This embodiment can adopt, but is not limited to, the above methods, and the target text can be generated by choosing an appropriate predicted-node and target-word implementation according to the actual situation.
  • This embodiment provides a text generation method that refines the implementation process by which the text decoding model generates the target text. By adding a text prediction layer, graph nodes in the form of a directed acyclic graph are used to effectively determine the target words and predicted nodes, which ensures contextual coherence and avoids the continuous occurrence of repeated words in the generated text. Compared with the related art, this improves the quality of the generated text and ensures text accuracy.
  • Figure 3 is a schematic structural diagram of a text generation apparatus provided by an embodiment of the present disclosure. This embodiment is applicable to text generation situations. The apparatus can be implemented by software and/or hardware and can be configured in a terminal and/or a server to implement the text generation method in the embodiments of the present disclosure.
  • The apparatus may include: an encoding execution module 31 and a decoding execution module 32.
  • The encoding execution module 31 is configured to input the acquired original text into the trained text encoding model to obtain text feature information.
  • The decoding execution module 32 is configured to generate target text corresponding to the original text based on the text feature information in combination with the trained text decoding model.
  • The text decoding model includes a text prediction layer, the node information of a set number of nodes included in the text prediction layer is determined by the text feature information, and the target words contained in the target text and the combination order of the target words are determined by the node information of the nodes and the topological structure between the nodes.
  • This embodiment provides a text generation apparatus that realizes parallel determination of the node information of the nodes in the added text prediction layer and parallel determination of the target words in the generated text, reducing the text generation delay. At the same time, through the node information of the nodes in the added text prediction layer, a one-to-one correspondence between the words in the generated text and the matching nodes can be achieved, thereby better avoiding the occurrence of consecutively repeated words in the generated text. In addition, the topological structure between the nodes can constrain the combination order of the words in the generated text, thereby ensuring the coherence of the context in the generated text, improving the generation quality of the generated text and ensuring text accuracy.
  • In an embodiment, the text decoding model includes: a position information input layer, a basic decoding sub-model, and a text prediction layer;
  • the position information input layer includes a set number of node position parameters, and the set number is used to determine the number of nodes included in the text prediction layer;
  • the node information of the set number of nodes included in the text prediction layer is determined through the node position parameters and the text feature information, in combination with the basic decoding sub-model.
  • In an embodiment, the decoding execution module 32 includes:
  • an information input unit configured to input the text feature information and the node position parameters in the position information input layer into the basic decoding sub-model;
  • an initial vector output unit configured to obtain the set number of initial text prediction vectors output by the basic decoding sub-model, and use the set number of initial text prediction vectors as the node information of the set number of nodes in the text prediction layer;
  • a text generation unit configured to construct a directed acyclic graph based on the nodes, determine the topological structure between the nodes, and determine the target text of the original text in combination with the node information.
  • In an embodiment, the text generation unit includes:
  • a first execution unit configured to construct a directed acyclic graph based on the node labels of the nodes in the text prediction layer, and obtain the topological structure between the nodes;
  • a second execution unit configured to determine the node transition matrix corresponding to the text prediction layer based on the topological structure between the nodes and the node information of the nodes;
  • a third execution unit configured to determine the target text of the original text based on the node information of the nodes and the node transition matrix.
  • In an embodiment, the second execution unit is configured to:
  • for each node, determine the adjacent nodes having directed connections with the node from the topological structure between the nodes;
  • for each node, according to the node information of the node and of the corresponding adjacent node, the first learning parameter, and the second learning parameter, determine the transition probability from the node to the adjacent node in combination with the probability transition formula;
  • form the node transition matrix corresponding to the text prediction layer based on the transition probabilities.
  • In an embodiment, the third execution unit is configured to:
  • determine the matching probabilities from the nodes to the words in the preset vocabulary through the fully connected layer in the text prediction layer;
  • determine the predicted nodes and the corresponding target words according to the node transition matrix and the matching probabilities from the nodes to the words;
  • combine the target words according to the connection directions between the corresponding nodes to form the target text of the original text.
  • In an embodiment, the third execution unit performs the step of determining the predicted nodes and the corresponding target words based on the node transition matrix and the matching probabilities from the nodes to the words, which may be: determining at least one predicted node according to the maximum transition probability corresponding to each node in the node transition matrix; and determining, for each predicted node, the maximum matching probability from the matching probabilities between the predicted node and the words, with the word corresponding to the maximum matching probability determined as the target word.
  • In an embodiment, the third execution unit performs the step of determining the predicted nodes and the corresponding target words based on the node transition matrix and the matching probabilities from the nodes to the words, which may also be: taking the current node as the first predicted node, and, for the adjacent nodes of the current node, determining the maximum product of the transition probability and the matching probability, recording the corresponding word as a target word and the corresponding adjacent node as a new predicted node; the predicted node is then used as the new current node, and the selection operation on the adjacent nodes corresponding to the current node is re-executed until the loop end condition is reached.
  • In an embodiment, the third execution unit performs the step of determining the predicted nodes and the corresponding target words based on the node transition matrix and the matching probabilities from the nodes to the words, which may also be: determining the predicted words corresponding to the nodes based on the matching probabilities, determining candidate prediction paths through a path search algorithm in which different nodes representing the same predicted word are merged, and obtaining the prediction path with the highest weight to determine the predicted nodes and the target words.
  • In an embodiment, the apparatus may further include: a model training module configured to perform learning parameter training on the constructed text decoding model based on the set loss function generation strategy, to obtain the trained text decoding model;
  • wherein the learning parameters include: the node position parameters involved in the position information input layer included in the text decoding model, the basic model parameters involved in the included basic decoding sub-model, and the node-related parameters involved in the nodes of the included text prediction layer.
  • In an embodiment, the model training module can be configured to:
  • obtain at least one set of sample data, where a set of sample data includes an original sample text and a corresponding single target sample text;
  • for the current iteration, encode the original sample text in a set of sample data using the text encoding model and then input the result into the current text decoding model;
  • for each text prediction path formed in the text prediction layer, determine the probability value that the predicted text generated based on that text prediction path is the target sample text;
  • substitute the determined probability values into the preset loss function generation formula to determine the current loss function value, and, based on the current loss function value, adjust the learning parameters in the current text decoding model through backpropagation to obtain the text decoding model for the next iteration;
  • take the next iteration as the new current iteration and continue the learning parameter training until the iteration end condition is met, thereby obtaining the trained text decoding model.
  • the loss function generation formula is expressed as: taking the logarithm of the sum of the probability values, and taking the negative of the logarithm operation result.
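In symbols (notation ours, not the patent's), with $p_k$ denoting the probability of generating the target sample text through the $k$-th text prediction path, the described formula reads:

$$\mathcal{L} = -\log \sum_{k} p_k$$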
  • the above-mentioned device can execute the method provided by any embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • Terminal devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablets), PMPs (Portable Multimedia Players), and vehicle-mounted terminals (e.g., car navigation terminals), as well as fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG. 4 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 40 may include a processing device (e.g., central processing unit, graphics processor, etc.) 41, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 42 or a program loaded from a storage device 48 into a random access memory (RAM) 43.
  • In the RAM 43, various programs and data required for the operation of the electronic device 40 are also stored.
  • the processing device 41, ROM 42 and RAM 43 are connected to each other via a bus 45.
  • An input/output (I/O) interface 44 is also connected to the bus 45.
  • the following devices may be connected to the I/O interface 44: input devices 46 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 47 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.; storage devices 48 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 49.
  • the communication device 49 may allow the electronic device 40 to communicate wirelessly or wiredly with other devices to exchange data.
  • While FIG. 4 illustrates the electronic device 40 with various means, it should be understood that it is not required to implement or provide all of the illustrated means; more or fewer means may alternatively be implemented or provided.
  • Embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication device 49, or from the storage device 48, or from the ROM 42.
  • When the computer program is executed by the processing device 41, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the electronic device provided by the embodiments of the present disclosure belongs to the same inventive concept as the text generation method provided by the above embodiments; technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
  • Embodiments of the present disclosure provide a computer storage medium on which a computer program is stored.
  • when the program is executed by a processor, the text generation method provided in the above embodiments is implemented.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and server can communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium.
  • examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs;
  • when the above-mentioned one or more programs are executed by the electronic device, the electronic device is caused to execute the text generation method provided by the above embodiments.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each box in the flowchart or block diagram may represent a module, program segment, or part of code that contains one or more executable instructions for implementing specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures; for example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure can be implemented in software or hardware.
  • the name of the unit does not constitute a limitation on the unit itself under certain circumstances.
  • the first acquisition unit can also be described as "the unit that acquires at least two Internet Protocol addresses.”
  • For example, and without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • Example 1 provides a text generation method, the method including: inputting the acquired original text into a trained text encoding model to obtain text feature information; and generating, based on the text feature information and in combination with a trained text decoding model, the target text corresponding to the original text; wherein the text decoding model includes a text prediction layer, the text prediction layer includes a set number of nodes, the node information of the nodes is determined by the text feature information, and the target words contained in the target text and the combination order of the target words are determined by the node information of the nodes and the topological structure between nodes.
  • Example 2 provides the text generation method, wherein the text decoding model includes: a position information input layer, a basic decoding sub-model, and a text prediction layer; the position information input layer includes a set number of node position parameters, the set number is used to determine the number of nodes included in the text prediction layer, and the node information of the set number of nodes included in the text prediction layer is determined through the node position parameters and the text feature information, in combination with the basic decoding sub-model.
  • Example 3 provides the text generation method, wherein generating the target text corresponding to the original text based on the text feature information and the trained text decoding model includes: inputting the text feature information and the node position parameters in the position information input layer into the basic decoding sub-model; obtaining the set number of initial text prediction vectors output by the basic decoding sub-model, and using the set number of initial text prediction vectors as the node information of the set number of nodes in the text prediction layer; and constructing a directed acyclic graph based on the nodes, determining the topological structure between nodes, and determining the target text of the original text in combination with the node information.
  • Example 4 provides the text generation method, wherein constructing the directed acyclic graph based on the nodes, determining the topological structure between nodes, and determining the target text of the original text in combination with the node information includes: constructing a directed acyclic graph based on the node labels of the nodes in the text prediction layer to obtain the topological structure between nodes; determining the node transition matrix corresponding to the text prediction layer based on the topological structure between nodes and the node information of the nodes; and determining the target text of the original text based on the node information of the nodes and the node transition matrix.
  • Example 5 provides the text generation method, wherein determining the node transition matrix corresponding to the text prediction layer based on the topological structure between nodes and the node information of the nodes includes: for each node, determining the adjacent nodes with directed connections to the node from the topological structure between nodes; determining the transition probability from the node to the adjacent nodes according to the node information of the node and the adjacent nodes; and forming the node transition matrix corresponding to the text prediction layer based on the transition probabilities.
  • Example 6 provides the text generation method, wherein determining the target text of the original text based on the node information of the nodes and the node transition matrix includes: determining, according to the node information of the nodes, the matching probability of each node to the words in a preset vocabulary through the fully connected layer in the text prediction layer; determining the predicted nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities; and combining the target words to form the target text of the original text.
  • Example 7 provides the text generation method, wherein determining the predicted nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities includes: determining at least one predicted node according to the maximum transition probability corresponding to the node in the node transition matrix; and, for each predicted node, determining the maximum matching probability from the matching probabilities of the predicted node to the words, and determining the word corresponding to the maximum matching probability as the target word.
  • Example 8 provides the text generation method, wherein determining the predicted nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities includes: taking the node corresponding to the starting node label as the current node; obtaining the current transition probabilities from the current node to its adjacent nodes from the node transition matrix; determining the product values of the current transition probabilities and the corresponding node-to-word matching probabilities; selecting the maximum product value from the product values, taking the adjacent node and word associated with the maximum product value as the predicted node and the target word respectively, and adding the predicted node and target word association to a cache table; and taking the predicted node as the new current node and re-executing the selection operation on the adjacent nodes of the current node until a loop end condition is reached.
  • Example 9 provides the text generation method, wherein determining the predicted nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities includes: determining the corresponding maximum matching probability based on the node-to-word matching probabilities, and determining the word corresponding to the maximum matching probability as the predicted word of the corresponding node; determining the prediction path with the highest weight based on a preset path search algorithm, in combination with the node transition matrix and the predicted words of the nodes; and determining the predicted words corresponding to the predicted nodes in the prediction path as the corresponding target words.
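A possible sketch of this path-search variant follows, reusing the NumPy conventions of the earlier sketches; a simple beam search stands in for the preset path search algorithm (the embodiments do not fix a particular one), and it is assumed that at least one path reaches the end node.

```python
import numpy as np

def path_search_decode(trans, word_probs, vocab, beam=4, start=0, end=None):
    """Each node first commits to its best-matching word; a beam search then
    keeps the `beam` highest-weight partial paths through the DAG."""
    end = len(trans) - 1 if end is None else end
    best_word = word_probs.argmax(axis=1)             # predicted word per node
    word_score = word_probs.max(axis=1)               # its matching probability
    beams, finished = [(0.0, [start])], []
    while beams:
        candidates = []
        for weight, path in beams:
            i = path[-1]
            if i == end:
                finished.append((weight, path))       # path reached the end node
                continue
            for j in np.flatnonzero(trans[i]):        # adjacent nodes only
                w = weight + np.log(trans[i, j]) + np.log(word_score[j])
                candidates.append((w, path + [int(j)]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam]
    weight, path = max(finished, key=lambda c: c[0])  # highest-weight prediction path
    return [vocab[int(best_word[n])] for n in path[1:]]
```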
  • Example 10 provides the text generation method, which further includes: performing, based on a set loss function generation strategy, learning parameter training on the constructed text decoding model to obtain the trained text decoding model; wherein the learning parameters include: the node position parameters involved in the position information input layer included in the text decoding model, the basic model parameters involved in the included basic decoding sub-model, and the node-related parameters involved in the nodes of the included text prediction layer.
  • Example 11 provides the text generation method, wherein performing learning parameter training on the constructed text decoding model based on the set loss function generation strategy to obtain the trained text decoding model includes: obtaining at least one set of sample data, where a set of sample data includes an original sample text and a corresponding single target sample text; under the current iteration, encoding the original sample text in a set of sample data using the text encoding model and inputting it to the current text decoding model; determining, based on the current text decoding model, the probability values corresponding to generating the target sample text from the original sample text through the text prediction paths, where the text prediction paths are formed based on the nodes in the text prediction layer in combination with a set algorithm; determining the current loss function value based on the probability values combined with the loss function generation formula, and adjusting the learning parameters in the current text decoding model through backpropagation based on the current loss function value, to obtain the text decoding model for the next iteration; and taking the next iteration as the new current iteration and continuing the learning parameter training until the iteration end condition is met, obtaining the trained text decoding model.
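A hypothetical skeleton of this training procedure (PyTorch-style; `encoder`, `decoder`, `path_probabilities`, and the dataset format are placeholders we introduce for illustration, not the modules disclosed by the embodiments):

```python
import torch

def train_decoder(encoder, decoder, optimizer, sample_pairs, max_iters=100_000):
    """sample_pairs yields (original sample text, single target sample text)
    pairs; encoder is the trained text encoding model."""
    for it, (src, tgt) in enumerate(sample_pairs):
        feats = encoder(src)                      # text feature information
        node_info, trans = decoder(feats)         # node states + transition matrix
        p = decoder.path_probabilities(node_info, trans, tgt)  # one value per path
        loss = -torch.log(p.sum())                # loss function generation formula
        optimizer.zero_grad()
        loss.backward()                           # backpropagation adjusts the
        optimizer.step()                          # learning parameters
        if it + 1 >= max_iters:                   # one possible iteration-end condition
            break
    return decoder
```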
  • Example 12 provides the text generation method, wherein the loss function generation formula is expressed as: taking the logarithm of the sum of the probability values, and taking the negative of the logarithm operation result.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed in the embodiments of the present disclosure are a text generation method and apparatus, and a computer device and a storage medium. The method comprises: inputting acquired original text into a trained text coding model, so as to obtain text feature information (S101); and on the basis of the text feature information and in combination with a trained text decoding model, generating target text corresponding to the original text (S102), wherein the text decoding model comprises a text prediction layer, node information of a set number of nodes included in the text prediction layer is determined by means of the text feature information, and target words comprised in the target text and a combination sequence of the target words are determined by means of the node information of the nodes and a topological structure between the nodes.

Description

A text generation method, device, computer equipment and storage medium
This application claims priority to the Chinese patent application No. 202210346397.4, filed with the China Patent Office on March 31, 2022, the entire content of which is incorporated herein by reference.
Technical Field
The embodiments of the present disclosure relate to the technical field of natural language processing, for example, to a text generation method, device, computer equipment and storage medium.
Background
Text generation technology is an important technology in the field of natural language processing. Text generation technology can use established information and text generation models to generate text sequences that meet specific goals. The text generation model used is trained based on sample data in different application scenarios (generative reading comprehension, human-computer dialogue, intelligent writing, machine translation, etc.), so that text generation in different application scenarios can be achieved.
Currently, a problem with the text generation models used in text generation implementations is that there will be a high output delay during the text generation process (output delay refers to the time delay required from the model receiving input to the model fully generating the text output), and this output delay is linearly related to the sentence length of the generated text. Alternatively, when solving the output delay problem, new problems are introduced; for example, the produced text may contain consecutive repeated words, or the context may be incoherent.
Summary
Embodiments of the present disclosure provide a text generation method, device, computer equipment, and storage medium, which reduce the contextual incoherence and continuous repetition of words in the generated text, and improve the quality of the generated text.
In a first aspect, embodiments of the present disclosure provide a text generation method, which includes:
inputting the acquired original text into a trained text encoding model to obtain text feature information;
generating, based on the text feature information and in combination with a trained text decoding model, a target text corresponding to the original text;
wherein the text decoding model includes a text prediction layer, the node information of a set number of nodes included in the text prediction layer is determined by the text feature information, and the target words contained in the target text and the combination order of the target words are determined by the node information of the nodes and the topological structure between nodes.
In a second aspect, embodiments of the present disclosure also provide a text generation apparatus, which includes:
an encoding execution module configured to input the acquired original text into a trained text encoding model to obtain text feature information;
a decoding execution module configured to generate, based on the text feature information and in combination with a trained text decoding model, a target text corresponding to the original text;
wherein the text decoding model includes a text prediction layer, the node information of a set number of nodes included in the text prediction layer is determined by the text feature information, and the target words contained in the target text and the combination order of the target words are determined by the node information of the nodes and the topological structure between nodes.
In a third aspect, embodiments of the present disclosure also provide an electronic device, which includes:
one or more processors;
a storage device configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the text generation method provided by any embodiment of the present disclosure.
In a fourth aspect, embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the text generation method provided by any embodiment of the present disclosure is implemented.
Brief Description of the Drawings
Figure 1 is a schematic flowchart of a text generation method provided by an embodiment of the present disclosure;
Figure 1a shows the application effect of a text generation model in the related art in a machine translation scenario;
Figure 1b shows the structure of the text decoding model used in the text generation method provided by this embodiment;
Figure 1c shows the application effect of the text generation model involved in this embodiment in a machine translation scenario;
Figure 2 is a schematic flowchart of a text generation method provided by an embodiment of the present disclosure;
Figure 2a is a schematic diagram of part of the network structure of the text decoding model used in the text generation method provided by this embodiment;
Figure 2b shows an example of calculating the node transition matrix in the text generation method provided by this embodiment;
Figure 2c shows an example of the fully connected structure within the text prediction layer involved in the text generation method provided by this embodiment;
Figure 3 is a schematic structural diagram of a text generation apparatus provided by an embodiment of the present disclosure;
Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
It should be understood that the multiple steps described in the method implementations of the present disclosure may be executed in different orders and/or in parallel. Furthermore, method implementations may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this regard.
As used herein, the term "include" and its variations are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in this disclosure are only used to distinguish different devices, modules, or units, and are not used to limit the order of, or the interdependence between, the functions performed by these devices, modules, or units. It should be noted that the modifiers "one" and "multiple" mentioned in this disclosure are illustrative and not restrictive; those skilled in the art will understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are for illustrative purposes only and are not used to limit the scope of these messages or information.
Figure 1 is a schematic flowchart of a text generation method provided by an embodiment of the present disclosure. This embodiment is applicable to text generation scenarios. The method can be executed by a text generation apparatus, which can be implemented in software and/or hardware and can be configured in a terminal and/or server to implement the text generation method in the embodiments of the present disclosure.
It should be noted that text generation models in the related art are usually trained on sample data consisting of one input text and multiple output texts. After a conventional text generation model is trained in this form, in practical applications the generated target text may mix predicted words from different outputs, mainly because it cannot be distinguished which of the output texts used in the training phase a predicted word comes from, so the possible outputs of the predicted words contained in multiple output texts are mixed together, and the generation quality of the text cannot be guaranteed.
As an example, Figure 1a shows the application effect of a text generation model in the related art in a machine translation scenario. As shown in Figure 1a, the input text may be the Chinese sentence "我去电影院了" ("I went to the movie theater"). In the machine translation application scenario, the purpose of the text generation model 11 in the related art is to generate the English text of the above Chinese sentence. When training the text generation model 11 in the related art, multiple English output samples may be used, such as "I went to the movie theater" and "I just went to the cinema". After training is completed, when actually performing English machine translation of "我去电影院了", the words in the above output samples may be mixed in the output, producing the erroneous predicted text "I went went the the theater".
This embodiment provides a text generation method that improves on the text generation model in the related art by adding a text prediction layer; through the nodes included in the added text prediction layer, high-quality generated text can be obtained.
For example, as shown in Figure 1, the text generation method provided by this embodiment may include the following steps:
S101. Input the acquired original text into a trained text encoding model to obtain text feature information.
It should be noted that the text generation method provided by this embodiment is not limited to a particular application scenario; if text generation is required in a certain application scenario, training samples can be collected in that scenario to train the text generation model. Structurally, the text generation model can include two parts: a text encoding model and a text decoding model.
In this embodiment, the original text is equivalent to the input text before text generation, and the content of the original text may differ between application scenarios. For example, in a machine translation scenario, for Chinese-to-English translation the original text can be the Chinese text to be translated; for English-to-Chinese translation it can be the English text to be translated.
In this embodiment, the text encoding model can be used to encode the original text to obtain its text feature information. The model structure of the text encoding model can directly reuse the text encoding model of text generation models in the related art, and it can be trained on the sample data provided in different application scenarios, so that the output text feature information can meet the text generation requirements of each scenario. For example, in a machine translation application scenario, the output text feature information is mainly used to subsequently obtain the translated text corresponding to the original text.
In this embodiment, the text feature information is used to characterize the feature information of the multiple words in the originally input original text. The text feature information can be represented by a text feature matrix; generally, the number of text feature vectors included in the text feature matrix is the same as the number of words contained in the original text.
S102. Based on the text feature information and in combination with a trained text decoding model, generate the target text corresponding to the original text.
In this embodiment, after the text feature information output by the text encoding model is obtained through the above step, the text feature information can be input to the text decoding model as input data. In this embodiment, the text decoding model includes a text prediction layer, the node information of a set number of nodes included in the text prediction layer is determined by the text feature information, and the target words contained in the target text and the combination order of the target words are determined by the node information of the set number of nodes and the topological structure between nodes.
For example, compared with the text decoding model of text generation models in the related art, the text decoding model used in this step includes a text prediction layer, and the text prediction layer contains a certain number of nodes; through the node information of the multiple nodes and the topological structure between nodes, the target text of the original text can be determined effectively. It can be understood that the text decoding model in this embodiment is likewise trained on the sample data provided in different application scenarios, so that the output target text can meet the text generation requirements of each scenario.
Following the above description, the text prediction layer contains a set number of nodes; all the nodes can be used to construct the graph required for text generation, and the node information of each node can be determined from the text feature information. In this embodiment, the specific value of the set number is greater than the number of words contained in the original text; it can serve as the graph size required for graph construction in the text prediction layer, and also as the maximum possible predicted length of the text to be generated, i.e., the number of words contained in the text to be generated will not exceed the set number. The node information of the set number of nodes contained in the text prediction layer can be determined from the text feature information; for example, the text feature information can be combined with certain parameter information for fully connected processing, and finally the relevant feature information of the multiple words in the original text is mapped onto the nodes as their node information.
In this embodiment, the generation logic of the target text needs to consider the node information of the nodes in the text prediction layer as well as the topological structure between nodes. Analysis shows that the target text is likewise composed of individual words, and the words in the target text should be related in some way to the words in the original text. Through the above text encoding model of this embodiment, text feature information characterizing the multiple words in the original text can be obtained; afterwards, through the text decoding model of this embodiment, the text feature information can be converted, via basic decoding processing, into the node information of the multiple nodes included in the text prediction layer, which is equivalent to establishing an association between the multiple words in the original text and the multiple nodes in the text prediction layer.
For example, through the node information of the multiple nodes in the text prediction layer, the text decoding model provided by this embodiment can establish a correspondence between the nodes and the words in a dictionary, so that each node corresponds to one best-matching word.
In addition, the text decoding model provided by this embodiment can also connect the multiple nodes in the text prediction layer according to certain connection conditions to form the topological structure between nodes. Based on the formed topological structure, the connection relationships among the multiple nodes are clear. According to the trained learning parameters in the text prediction layer, combined with the topological structure between nodes, the transition probability from one node to another connected node can be determined. Finally, based on the word corresponding to each node and the transition probabilities from each node to the other connected nodes, target nodes can be selected from the multiple nodes; since nodes correspond one-to-one to words, selecting the target nodes also determines the target words required to generate the target text. Moreover, the combination order of the multiple target words in the generated target text can be determined by the inter-node connection relationships represented by the topological structure. Through the above logic, a target text that avoids consecutive repeated words and has a clear contextual relationship can be determined for the original text.
On the basis of this embodiment, the text decoding model is optimized; the text decoding model may include: a position information input layer, a basic decoding sub-model, and a text prediction layer;
wherein the position information input layer includes a set number of node position parameters, the set number determines the number of nodes included in the text prediction layer, and the node information of the set number of nodes included in the text prediction layer is determined through the node position parameters and the text feature information, in combination with the basic decoding sub-model.
In the above embodiment, in addition to the text prediction layer, the text decoding model also includes a position information input layer and a basic decoding sub-model; in terms of structural connection, the information output by the position information input layer is passed to the basic decoding sub-model, and the information output by the basic decoding sub-model is passed to the multiple nodes in the text prediction layer.
In this embodiment, the position information input layer can be understood as the information input layer that, in the text generation implementation, predicts the graph size required for the directed acyclic graph to be generated in the text prediction layer. The graph size predicted in this position information input layer is actually the number of nodes required to construct the graph, and its value can be set to a multiple of the number of words contained in the original text. It can be understood that this number of nodes determines the number of nodes included in the text prediction layer, i.e., the set number characterizing the number of nodes in the text prediction layer is effectively preset in this position information input layer; setting the graph size to n is equivalent to determining that the text prediction layer contains n nodes.
Following the above description, in the position information input layer, in addition to presetting the number of nodes contained in the text prediction layer, the position information of each node also needs to be preset. In this embodiment, node position parameters are used to characterize the position information of the nodes; a node position parameter can be understood as a position parameter assigned to a node required to construct the graph, and each node position parameter indicates that a corresponding node exists in the text prediction layer. At the same time, the node position parameters are also among the learning parameters obtained by training the text decoding model; through training iterations, the node position parameters can be adjusted accordingly until stable parameter information is obtained at the end of training.
In the specific implementation of text generation, the text feature information input from the text encoding model and the node position parameters can each be used as input to the basic decoding sub-model in the text decoding model, and the basic decoding sub-model can output as many vectors as there are nodes in the text prediction layer, each serving as the node information of the corresponding node. The basic decoding sub-model can include a self-attention network structure and a cross-attention network structure, which is equivalent to reusing the text decoding model of text generation models in the related art.
As an example, Figure 1b shows the structure of the text decoding model used in the text generation method provided by this embodiment. As shown in Figure 1b, the text decoding model 12 includes an input layer with two different input branches: one branch is the position information input layer 121 for inputting the graph size and node position information, which includes n determined node position parameters g; the other branch is used to input the text feature information output by the text encoding model. The text decoding model 12 also includes a basic decoding sub-model 122 and a text prediction layer 123; the basic decoding sub-model 122 can include an m-layer network structure composed of self-attention and cross-attention mechanisms, and the text prediction layer 123 includes n nodes, the same number as the node position parameters. Finally, the output layer 124 of the text decoding model 12 outputs the target text of the original text.
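A structural sketch of Figure 1b follows (assumptions ours: standard PyTorch modules stand in for the m-layer self-attention/cross-attention stack, and the hyperparameters are illustrative). The n learnable node position parameters g are fed as queries to the basic decoding sub-model, and its n output vectors become the node information of the text prediction layer.

```python
import torch
import torch.nn as nn

class TextDecoder(nn.Module):
    def __init__(self, n_nodes, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        # position information input layer: one learnable parameter g per node
        self.node_pos = nn.Parameter(torch.randn(n_nodes, d_model))
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        # basic decoding sub-model: m layers of self- plus cross-attention
        self.base = nn.TransformerDecoder(layer, n_layers)

    def forward(self, text_features):
        # text_features: (batch, src_len, d_model) from the text encoding model
        batch = text_features.size(0)
        queries = self.node_pos.unsqueeze(0).expand(batch, -1, -1)
        # each of the n outputs is the node information of one graph node
        return self.base(queries, memory=text_features)
```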
The text generation method provided by the embodiments of the present invention realizes the parallel determination of the node information of the multiple nodes in the added text prediction layer and the parallel determination of the multiple target words in the generated text, reducing the text generation delay. At the same time, through the node information of the multiple nodes in the added text prediction layer, a one-to-one correspondence between the words in the generated text and the matched nodes can be achieved, which better avoids the appearance of consecutive repeated words in the generated text. In addition, the inter-node topological structure of the multiple nodes can constrain the combination order of the words in the generated text, thereby ensuring the contextual coherence of the generated text, which improves the generation quality and ensures text accuracy.
In one embodiment, the method further includes:
performing, based on a set loss function generation strategy, learning parameter training on the constructed text decoding model to obtain the trained text decoding model;
wherein the learning parameters include: the node position parameters involved in the position information input layer included in the text decoding model, the basic model parameters involved in the included basic decoding sub-model, and the node-related parameters involved in each node of the included text prediction layer.
For the text generation model in the related art as shown in Figure 1a, another issue in the training phase is that the sample data participating in training contains one input text and multiple output texts, so label inconsistency exists in the training phase. For example, for the same input text there are multiple possible different output texts; in the model training phase, when a learning parameter at the same position is learned, its corresponding predicted word may come from different output texts, which makes training difficult.
Based on this, this embodiment on the one hand improves the network structure of the text decoding model, e.g., by adding the text prediction layer and using a number of nodes larger than the number of words contained in the text, so that each node can correspond to one word in the output text; on the other hand, it improves the sample data and the loss function in the training phase.
For the sample data improvement, this embodiment can use single-sample data, i.e., one input text corresponds to only one output text, forming one piece of sample data. For the loss function improvement, a loss function generation strategy is given, which mainly considers the nodes in the text prediction layer added to the text decoding model. For example, the strategy can first consider the paths that can be formed between nodes, consider the generation probability of producing the output text through each formed path, and then combine the generation probabilities of multiple paths to generate the loss function.
In one embodiment, through the determined loss function and sample data in the improved set form, the learning parameters in the created text decoding model can be adjusted through backpropagation, finally obtaining a text decoding model with high accuracy.
It can be understood that training the text decoding model is equivalent to adjusting the learning parameters included in the model. The learning parameters in the text decoding model can include the node position parameters in the position information input layer; they can also include the weight parameters involved in the basic decoding sub-model; and they can also include the node-related parameters set for the multiple nodes in the text prediction layer, which can be used to determine the predicted nodes related to the generated text and to match nodes to predicted words in the dictionary.
In one embodiment, performing learning parameter training on the constructed text decoding model based on the set loss function generation strategy to obtain the trained text decoding model includes the following steps.
a0. Obtain at least one set of sample data, where a set of sample data includes an original sample text and a corresponding single target sample text.
In this embodiment, multiple sets of sample data can be obtained so that different sample data are input in each training iteration. Compared with sample data in the related art, a set of sample data in this embodiment includes one original sample text and one target sample text.
b0. Under the current iteration, encode the original sample text in a set of sample data using the text encoding model and input it to the current text decoding model.
In this embodiment, the current iteration may be the first iteration or a subsequent training iteration in the iteration loop; the training logic executed in each iteration is the same. The current text decoding model can be understood as the text decoding model to be trained under the current iteration. In this step, the original sample text can first be input into the trained text encoding model for encoding processing and then input to the current text decoding model.
c0. Based on the current text decoding model, determine the probability values corresponding to generating the target sample text from the original sample text through multiple text prediction paths.
In this embodiment, the original sample text can be processed through the network structure included in the current text decoding model and the current values of the learning parameters in that network structure. The multiple nodes in the text prediction layer of the current text decoding model can form multiple text prediction paths, and a predicted text can be generated through each text prediction path. In this step, the probability that the predicted text is the target sample text can be determined as the probability value corresponding to generating the target sample text based on that text prediction path. This step corresponds to one of the execution logics of the loss function generation strategy, and the determined probability values are used to determine the loss function value for the current iteration.
The text prediction paths are formed based on the nodes in the text prediction layer in combination with a set algorithm. For example, in the execution of this step, all the paths formed by the connections between the multiple nodes can each be used as text prediction paths. If all paths are directly selected as text prediction paths, the path computation will occupy considerable computing resources during model training; this embodiment therefore considers using a dynamic programming algorithm in the computation over all paths to avoid repeated computation of the same logic, thereby saving computing resources and reducing training time. At the same time, this embodiment can also consider using a certain algorithm to select a subset of the paths formed by the node connections as the text prediction paths.
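The dynamic-programming idea can be sketched as follows: instead of enumerating paths, a forward table f[k][i] accumulates the total probability of emitting the first k+1 target words with the (k+1)-th word produced at node i. The assumption that paths begin at the start node and end at the end node is ours, made for illustration.

```python
import numpy as np

def target_probability(trans, word_probs, target_ids, start=0, end=None):
    """Sum, over all text prediction paths, of the probability of generating
    the target word sequence, computed without enumerating the paths.

    trans: (n, n) node transition matrix; word_probs: (n, V) node-to-word
    matching probabilities; target_ids: target sample text as word indices.
    """
    n = len(trans)
    end = n - 1 if end is None else end
    f = np.zeros((len(target_ids), n))
    f[0, start] = word_probs[start, target_ids[0]]
    for k in range(1, len(target_ids)):
        # reuse f[k-1] for every node instead of re-walking each path
        f[k] = (f[k - 1] @ trans) * word_probs[:, target_ids[k]]
    return f[-1, end]
```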
d0. Determine the current loss function value based on the probability values in combination with the loss function generation formula, and adjust the learning parameters in the current text decoding model through backpropagation based on the current loss function value, obtaining the text decoding model for the next iteration.
In this embodiment, the probability values determined above can be substituted into the preset loss function generation formula to obtain the current loss function value of the current iteration. The loss function generation formula is expressed as taking the logarithm of the sum of the multiple probability values and negating the logarithm, i.e., Loss = -log(P_1 + P_2 + ... + P_K), where P_k denotes the probability value of generating the target sample text along the k-th text prediction path.
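Concretely, the sum over all text prediction paths in this formula can be evaluated with the dynamic programming mentioned in step c0. The following Python sketch (illustrative names and shapes, not code from the patent) shows one such computation, assuming the node transition matrix E and the node-to-word matching probabilities P_word introduced later in this disclosure are available:

```python
import numpy as np

def dag_nll_loss(E, P_word, target_ids):
    # E: (L, L) upper-triangular node transition matrix
    # P_word: (L, V) node-to-word matching probabilities
    # target_ids: the m word indices of the target sample text
    L, m = E.shape[0], len(target_ids)
    f = np.zeros((L, m))                      # f[j, t]: mass of paths ending at node j
    f[0, 0] = P_word[0, target_ids[0]]        # every path starts at the first node
    for t in range(1, m):
        for j in range(1, L):
            # sum over all predecessors k < j that have emitted t words so far;
            # shared path prefixes are computed once, not once per path
            f[j, t] = P_word[j, target_ids[t]] * np.dot(E[:j, j], f[:j, t - 1])
    return -np.log(f[L - 1, m - 1] + 1e-12)   # paths must end at the last node
```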
e0. Take the next iteration as the new current iteration and return to step b0, continuing until the iteration end condition is met, thereby obtaining the trained text decoding model.
In one embodiment, the iteration end condition may be that the current loss function value determined in the iteration logic falls within a set threshold range, or that the number of iterations reaches a set threshold.
Through the model training logic given in this embodiment, label inconsistency in the training samples during the model training stage can be better avoided, so that each node in the text decoding model can correspond one-to-one with a word appearing in the text to be generated.
Illustratively, Figure 1c shows the application effect of the text generation model of this embodiment in a machine translation scenario. As shown in Figure 1c, the input text can again be the Chinese sentence "我去电影院了" ("I went to the movie theater"). In the machine translation application scenario, the text generation model 13 used in this embodiment (which contains the text prediction layer) can generate the English text of this Chinese sentence. When training the text generation model 13, the English text sample used may be only "I went to the movie theater" or only "I just went to the cinema". After training, each of the multiple words of the English text sample corresponds to a processing node in the text generation model 13 (in Figure 1c, a processing node can be represented by the word it presents in the predicted text). In effect, the text generation model 13 can determine the best-matching word for each processing node and, according to the connection relationships between the processing nodes, determine from the multiple connection paths formed by the processing nodes the one combined path that best fits the contextual relationships.
When the text generation model trained in this embodiment is actually used for English machine translation of "我去电影院了", only the words corresponding to the processing nodes on the combined path need to be selected and combined, thereby forming the outputtable target text; for example, based on one of the determined combined paths, the corresponding output text can be expressed as "I went to the movie theater". Compared with the erroneous text "I went went the the theater" output in Figure 1a, the text output in this embodiment avoids consecutive word repetition and ensures contextual coherence.
Figure 2 is a schematic flow chart of a text generation method provided by an embodiment of the present disclosure. This embodiment is a refinement of the above embodiment. In this embodiment, generating the target text corresponding to the original text based on the text feature information in combination with the trained text decoding model includes: inputting the text feature information and the node position parameters in the position information input layer into the basic decoding sub-model; obtaining the set number of initial text prediction vectors output by the basic decoding sub-model, and taking the set number of initial text prediction vectors respectively as the node information of the set number of nodes in the text prediction layer; and constructing a directed acyclic graph based on the nodes, determining the topological structure between the nodes, and determining the target text of the original text in combination with the node information.
As shown in Figure 2, the text generation method provided by this embodiment includes the following steps:
S201. Input the acquired original text into the trained text encoding model to obtain text feature information.
Illustratively, the text feature information may be a feature matrix containing the feature vectors corresponding to the multiple words of the original text.
S202. Input the text feature information and the node position parameters in the position information input layer into the basic decoding sub-model.
In this embodiment, the text decoding model contains a position information input layer. The position information input layer contains the position information (node position parameters) that represents, within the text decoding model, the nodes of the graph to be constructed, as well as the graph size of the graph to be constructed (characterized mainly by the number of node position parameters contained).
In this step, both the text feature information and the node position parameters can be taken as input information and fed into the basic decoding sub-model of the text decoding model.
Figure 2a is a schematic diagram of part of the network structure of the text decoding model used in the text generation method provided by this embodiment. As shown in Figure 2a, the position information input layer of the text decoding model and the basic decoding sub-model 20 are shown, where the position information input layer contains 9 node position parameters 21 (the graph size). The multiple node position parameters 21, together with the text feature information 22 output by the text encoding model, can be input into the basic decoding sub-model 20.
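As a rough sketch of this input arrangement, the following Python code assumes the basic decoding sub-model is a standard Transformer decoder (the disclosure does not mandate a particular architecture; all sizes and names here are illustrative):

```python
import torch
import torch.nn as nn

graph_size, d_model, src_len = 9, 512, 6           # 9 node position parameters
node_pos = nn.Embedding(graph_size, d_model)       # learnable node position parameters
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=6)

text_features = torch.randn(1, src_len, d_model)   # stand-in for the text encoding model output
queries = node_pos(torch.arange(graph_size)).unsqueeze(0)      # (1, 9, d_model)
initial_vectors = decoder(tgt=queries, memory=text_features)   # (1, 9, d_model)
# each of the 9 output vectors becomes the node information of one node (step S203)
```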
S203. Obtain the set number of initial text prediction vectors output by the basic decoding sub-model, and take the set number of initial text prediction vectors respectively as the node information of the set number of nodes in the text prediction layer.
In this step, the processing information output by the basic decoding sub-model can be obtained; the processing information can include the set number of initial text prediction vectors, where the set number equals the number of node position parameters in the position information input layer. This step can further associate each initial text prediction vector obtained above with a node of the text prediction layer, as the node information of that node.
Continuing with Figure 2a, it can be seen that Figure 2a also shows the node set of the text prediction layer, which likewise contains 9 nodes 23; the multiple initial text prediction vectors output by the basic decoding sub-model 20 can correspond one-to-one with the nodes 23, serving as the node information of the multiple nodes 23.
S204. Construct a directed acyclic graph based on the nodes, determine the topological structure between the nodes, and determine the target text of the original text in combination with the node information.
The above steps amount to assigning node information to the nodes of the text prediction layer, so that the nodes of the text prediction layer become associated with the actual original text.
This step takes the text prediction layer as the executing subject: it mainly performs the subsequent text generation processing based on the node information of the multiple nodes, thereby generating the target text of the original text.
The execution logic of this step can be analyzed as follows. After the multiple nodes are assigned node information, they are still individual nodes, with no associations yet existing between them. Considering that contextual associations exist between the multiple words of the text to be generated, and that those words are linked to the nodes of the text prediction layer, this step needs to establish associations between the multiple nodes; associations between multiple nodes can be realized by constructing a graph, and since the text to be generated is directed and acyclic, this step can construct a directed acyclic graph based on the multiple nodes.
Following this analysis, contextual associations must exist between the words of the text to be generated. Once it is determined that a node can represent a word, the contextual associations to be determined between words can be converted into associations between nodes, and the associations between nodes can be reflected by the weights of the edges formed when nodes are connected in the directed acyclic graph. This embodiment considers using the transition probability from one node to another to represent the weight of the edge formed by the two nodes. After the transition probabilities between nodes are determined, the higher the transition probability between two nodes, the stronger the association between them can be considered to be.
Based on the above analysis, the execution logic of generating the target text corresponding to the original text from the node information in this step can be described as: 1) establish directed connections between the multiple nodes to form a directed acyclic graph, and determine, for each pair of connected nodes, the transition probability from the source node to the target node, where the source node is the outgoing node of the directed connection between the two nodes and the target node is the incoming node; 2) determine the predicted word corresponding to each node; 3) select the target words according to the transition probabilities between nodes and the predicted words corresponding to the nodes, obtain the combination order of the target words, and finally combine the target words to form the target text.
For example, this embodiment provides one implementation of constructing a directed acyclic graph based on the multiple nodes, determining the topological structure between the nodes, and determining the target text of the original text in combination with the node information; the implementation includes the following steps a1 to c1.
a1. Construct a directed acyclic graph according to the node labels of the nodes in the text prediction layer, and obtain the topological structure between the nodes.
Illustratively, the construction of the directed acyclic graph is used to determine the connection relationships between nodes. To ensure that the constructed graph is directed, this embodiment makes directed connections based on the node labels of the nodes. For example, assuming there are 9 nodes ordered by node label from small to large, node v1 establishes directed connections to each of nodes v2 to v9, node v2 can only establish directed connections to v3 to v9, and so on; the last node, v9, makes no further directed connections. Once the directed acyclic graph is determined, the topological structure between the nodes is effectively also determined.
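In code form, this label-ordered topology is simply a strictly upper triangular adjacency matrix; a minimal sketch for the 9-node example (illustrative, not code from the patent):

```python
import numpy as np

n = 9                                                  # nodes v1 .. v9
adjacency = np.triu(np.ones((n, n), dtype=bool), k=1)  # directed edges to larger labels only
# v1 reaches v2..v9, v2 reaches v3..v9, and v9 has no outgoing connections
assert adjacency[0].sum() == 8 and adjacency[8].sum() == 0
```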
b1. Determine the node transition matrix corresponding to the text prediction layer according to the topological structure between the nodes and the node information of the nodes.
In this embodiment, the topological structure between nodes contains the connection relationships between each node and the other nodes; from these connection relationships it can be known which nodes each node is connected to, and the existing connections are directed. In this embodiment, the numbers of rows and columns of the node transition matrix both equal the number of nodes contained in the text prediction layer, and, given the directedness of the node connections, the node transition matrix can be an upper triangular matrix. A valid element value in the node transition matrix indicates that a directed connection exists between the node corresponding to its row and the node corresponding to its column, and it is essentially the transition probability between the two nodes determined through the corresponding calculation logic.
In this embodiment, one implementation logic for determining the transition probability between nodes can be described as follows: for two connected nodes, obtain the node information of the two nodes, where the node information can be represented by feature vectors; then multiply the feature vectors representing the node information of the two nodes, and the resulting product vector, after normalization, can serve as the transition probability between the two nodes.
Another implementation logic for determining the transition probability between nodes can be described as follows: first obtain the node-related parameters set for the nodes in the text prediction layer, such as a first learning parameter and a second learning parameter, which are mainly used for determining the transition probabilities; the node-related parameters of the nodes exist in the text decoding model and take fixed parameter values once the training of the text decoding model is completed. Then, for two connected nodes, the transition probability can be determined from the product vectors obtained by multiplying the node information with the node-related parameters.
For this implementation of determining the transition probability between two nodes from the node information combined with the node-related parameters, the following exemplary description is given. Taking node vi and node vj as an example, with node vi connected to node vj, the calculation of the transition probability from node vi to node vj can be described as: determine the product of the initial text prediction vector (node information) of node vi and the first learning parameter (denoted the first product); determine the product of the initial text prediction vector (node information) of node vj and the second learning parameter (denoted the second product); and normalize the product of the first product and the second product, where the normalized result can be regarded as the transition probability from node vi to node vj.
Based on the above description, it can be seen that after the transition probabilities between connected nodes are determined, the node transition matrix of the text prediction layer can be formed from these transition probabilities.
For example, in this embodiment, determining the node transition matrix corresponding to the text prediction layer according to the topological structure between the nodes and the node information of the nodes may include:
b11. For each node, determine from the topological structure between nodes the adjacent nodes to which the node has directed connections.
Through the directed acyclic graph constructed above, once the topological structure between nodes has been obtained, the other nodes having directed connections with a given node can easily be determined; these nodes can be regarded as the adjacent nodes of that node.
b12. Determine the transition probability from the node to each adjacent node according to the node information of the node and of the adjacent nodes.
Illustratively, in one implementation, the calculation of the transition probability p(vi->vj) from node vi to node vj can be described as: p(vi->vj) = softmax(Vi·Vj^T / √d), where softmax denotes normalization, d denotes the scale of the text prediction layer (d is determined in the construction stage), and Vi and Vj denote the node information vectors of node vi and node vj, respectively.
In another exemplary implementation, the implementation logic can be summarized as: for each node, determine the transition probability from the node to each adjacent node according to the node information of the node and of the corresponding adjacent node, the first learning parameter, and the second learning parameter, in combination with a probability transition formula, where the first learning parameter and the second learning parameter are both node-related parameters corresponding to the nodes. With reference to the above description, the probability transition formula can be expressed as:
p(vi->vj) = softmax((Vi·W1)(Vj·W2)^T / √d)
where, as above, softmax denotes normalization, d denotes the scale of the text prediction layer (d is determined in the construction stage), and Vi and Vj denote the node information vectors of node vi and node vj, respectively; in addition, W1 denotes the node-related first learning parameter, W2 denotes the node-related second learning parameter, and p(vi->vj) denotes the transition probability from node vi to node vj.
b13. Form the node transition matrix corresponding to the text prediction layer based on the transition probabilities.
It can be seen that through the above steps b12 and b13, the transition probabilities between each node and its adjacent nodes can be calculated, and the node transition matrix can be formed based on those transition probabilities.
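Assuming the probability transition formula above is used, forming the node transition matrix of steps b12 and b13 can be sketched as follows (V is the matrix of node information vectors; W1 and W2 are the first and second learning parameters; the names are illustrative):

```python
import numpy as np

def node_transition_matrix(V, W1, W2):
    # V: (n, d) node information vectors; W1, W2: (d, d) learning parameters
    n, d = V.shape
    scores = (V @ W1) @ (V @ W2).T / np.sqrt(d)   # (Vi W1)(Vj W2)^T / sqrt(d)
    E = np.zeros((n, n))
    for i in range(n - 1):                        # the last node has no successors
        row = np.exp(scores[i, i + 1:] - scores[i, i + 1:].max())
        E[i, i + 1:] = row / row.sum()            # softmax over adjacent nodes only
    return E                # upper triangular; non-final rows sum to 1 (cf. Figure 2b)
```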
Illustratively, Figure 2b shows one example of calculating the node transition matrix in the text generation method provided by this embodiment. As shown in Figure 2b, transition probabilities are calculated for the nodes included in the text prediction layer, and E in Figure 2b represents the calculated node transition matrix. It should be noted that Figure 2b shows some of the node connections and the transition probabilities corresponding to those connections, e.g., a transition probability of 0.3 from v1 to v2, a transition probability of 0.7 from v1 to v3, and so on. It can be seen that in the node transition matrix E, the transition probabilities of each row sum to 1.
c1. Determine the target text of the original text according to the node information of the nodes and the node transition matrix.
In this embodiment, once the node transition matrix has been determined, the weights of the edges formed by the connections in the directed acyclic graph are effectively determined, and this embodiment can select a prediction path through a prediction path selection strategy. Illustratively, one implementation of selecting the prediction path can be described as follows: following the direction of the node connections, with the outgoing node fixed, select the incoming node having the highest transition probability from the outgoing node, and take the edge between the two nodes as one edge of the prediction path; then repeat the above logic with a newly selected outgoing node, finally selecting all the edges of the prediction path and thereby also determining the target nodes that constitute the prediction path.
As shown in Figure 2b, through the above logic it can be determined that the prediction path is v1->v3->v4->v5->v6->v9, and the target nodes included are A = {v1, v3, v4, v5, v6, v9}.
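A sketch of this selection strategy, reusing the transition matrix E from the sketches above (illustrative code; applied to the matrix of Figure 2b it would return the node indices of the path v1->v3->v4->v5->v6->v9):

```python
def greedy_prediction_path(E):
    # from the first node, repeatedly follow the outgoing edge with the
    # highest transition probability until the last node is reached
    n = E.shape[0]
    path = [0]
    while path[-1] != n - 1:
        current = path[-1]
        path.append(current + 1 + int(E[current, current + 1:].argmax()))
    return path
```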
At the same time, based on the node information and the fully connected layer present in the text prediction layer, this step can determine the probability information between each node and the words included in the dictionary, where the dictionary can be pre-created vocabulary information that contains the various words needed for text generation, each word representable in vector form. In the fully connected structure of the text prediction layer, the preceding layer of the full connection can consist of the nodes of the directed acyclic graph of this embodiment, and the following layer can consist of the words of the dictionary. The fully connected processing can consist of calculating the matching probability from each node of the directed acyclic graph to each word node of the dictionary; the calculation can be realized through the full connection based on the node information of the node and the word vector of the word node.
After the prediction path and the node-to-word matching probabilities have been determined as above, the target word corresponding to each node of the prediction path can be determined, and the target text is finally formed by combining the multiple target words. It should be noted that this embodiment does not fix the execution order of determining the prediction path and determining the matching probabilities; the prediction path may equally be determined after the matching probabilities, as long as the generation of the target text can be completed.
On the basis of the above embodiment, this embodiment can further elaborate the above step c1, "determining the target text of the original text according to the node information of the nodes and the node transition matrix".
Illustratively, after the node transition matrix corresponding to the text prediction layer and the node information of the nodes have been obtained, this can be realized through the execution logic of steps c11 to c13 provided in this embodiment.
It should be noted that, in addition to the nodes needed for constructing the directed acyclic graph, the text prediction layer also includes a fully connected structure. The fully connected structure can take the node information of the nodes of the directed acyclic graph as input information; the next layer of the fully connected structure can be regarded as the word nodes formed by the words of the dictionary, and in the fully connected structure the nodes of the directed acyclic graph can be connected to the word nodes of the dictionary through connecting lines. The connection weight of each connecting line in the fully connected structure can be a third learning parameter determined, after the text decoding model is trained, for the connection between the respective node and word.
Figure 2c shows an example of the fully connected structure within the text prediction layer involved in the text generation method provided by this embodiment. As shown in Figure 2c, above the nodes presented in the directed acyclic graph there is a fully connected structure 24 that determines the predicted words associated with the nodes. It should be noted that Figure 2c also includes a result output layer, on which only the predicted words matching the nodes of the directed acyclic graph are displayed; for example, the word matching node v1 is "I", the word matching node v2 is "just", the word matching node v3 is "went", and so on.
c11. Determine, according to the node information of the nodes and through the fully connected layer in the text prediction layer, the matching probabilities from the nodes to the words of a preset vocabulary.
For the concrete implementation of this step, the execution logic can be described as each node being connected to the multiple words of the dictionary, where in this step both the nodes and the words can represent their respective information through vectors. Accordingly, for the matching probability between a node and a word: if the connection weights of the fully connected structure are determined anew in the training stage of the text decoding model, the third learning parameter obtained from training can first be acquired, and the vector product of the corresponding third learning parameter with the corresponding node information and word information then determined; if the training stage of the text decoding model does not re-determine the connection weights but instead directly shares the word features used by the text encoding model, the vector product of the corresponding node information and word information can be determined directly. Afterwards, the vector products of the node with respect to all the words can be determined and, after normalization, serve as the matching probabilities from the node to the words.
The fully connected layer is built within the text prediction layer and contains the fully connected structure used for the matching probability processing; in the fully connected structure, fully connected processing can be performed with respect to each node.
c12. Determine the prediction nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities.
In this embodiment, the prediction nodes can be regarded as the key nodes, selected from the nodes of the text prediction layer, on which the generation of the target text depends. Based on the matching probabilities corresponding to a prediction node, the predicted word matching that prediction node can be determined, and the predicted word can be regarded as a target word contained in the target text.
In this embodiment, the prediction nodes may be obtained by determining a prediction path in the text prediction layer based on the node transition matrix, with the target word of each prediction node then determined through the node-to-word matching probabilities; alternatively, the prediction nodes and target words may be determined jointly based on the node transition matrix and the node-to-word matching probabilities, with a prediction path determined from the prediction nodes and used to combine the target words into the target text; alternatively, the predicted word corresponding to each node may first be determined based on the node-to-word matching probabilities, a prediction path then determined in the directed acyclic graph through a search algorithm, and the target words needed for text generation finally selected.
c13. Combine the target words to form the target text of the original text.
In this step, the target words determined above are combined according to the connection directions between the corresponding nodes in the text prediction layer, where the multiple target words determine exactly one combination order, and the final target text can ultimately be obtained according to that combination order. This target text is the result of the text generation processing performed on the original text.
This embodiment provides a refinement of the above step c12. Illustratively, for determining the prediction nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities, this embodiment provides an implementation that can be described as follows:
Determine at least one prediction node according to the maximum transition probability corresponding to each node in the node transition matrix.
There is an order to determining the maximum transition probabilities of the nodes. Starting from the node corresponding to the initial node label, that node can be taken as the first prediction node. Among the transition probabilities corresponding to the connections between this prediction node and its adjacent nodes, the maximum transition probability of the prediction node can be determined, and the adjacent node corresponding to that maximum transition probability can be regarded as a new prediction node; the maximum transition probability determination can then be performed again on the new prediction node, thereby determining another new prediction node. Through this logic, prediction nodes can be determined in a loop until the last node is reached, and the last node can also serve as the final prediction node. This step thus obtains at least one prediction node (the case of exactly one being when the initial node is also the final node).
For each prediction node, determine the maximum matching probability among the matching probabilities from the prediction node to the words, and determine the word corresponding to that maximum matching probability as a target word.
For each prediction node determined above, once the matching probabilities from the prediction node to the words are known, the maximum matching probability can likewise be determined from among them, and the predicted word corresponding to that maximum matching probability obtained; this predicted word is the target word corresponding to the prediction node. It can be seen that, through the determination order of the prediction nodes, a combination path for combining the target words can be determined, and that combination path can be used for the final target text generation.
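Putting the two sub-steps together, this first implementation can be sketched as follows (building on greedy_prediction_path and the matrices from the earlier sketches; id_to_word is a hypothetical index-to-word mapping for the preset vocabulary):

```python
def decode_greedy(E, P_word, id_to_word):
    # 1) pick prediction nodes by maximum transition probability,
    # 2) pick for each prediction node the word with maximum matching probability
    path = greedy_prediction_path(E)
    return " ".join(id_to_word[int(P_word[i].argmax())] for i in path)
```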
Illustratively, for the refinement of the above step c12, this embodiment also provides another implementation. It should be noted that, in contrast with the implementation logic above, the implementation logic of this approach lies in considering simultaneously the influence on the prediction nodes of both the transition probabilities in the node transition matrix and the node-to-word matching probabilities: the transition probabilities can be multiplied by the matching probabilities, and the prediction nodes determined based on the products.
The steps of this implementation can be described as:
1) Take the node corresponding to the initial node label as the current node, where the current node can be recorded as the first prediction node.
2) Obtain from the node transition matrix the current transition probabilities from the current node to its adjacent nodes.
3) Determine the product values of the current transition probabilities with the matching probabilities corresponding to the current node and the words.
4) Select the maximum product value from among the product values, take the adjacent node and the word associated with the maximum product value as a prediction node and a target word respectively, and add the prediction node and the target word, in association, to a cache table.
Here, the matching probability and the transition probability corresponding to the maximum product value are known; taking the current node as reference, the word to which that matching probability corresponds can be identified and recorded as a target word, and the adjacent node to which that transition probability corresponds can be recorded as another prediction node.
5) Take the prediction node as the new current node and re-execute the selection operation for the current adjacent nodes corresponding to the current node, until the loop end condition is reached.
It can be seen that this execution logic likewise performs loop processing in the directed connection order of the nodes, whereby the prediction nodes and target words that satisfy the conditions can be determined.
Likewise, the above process of determining the prediction nodes effectively also determines the combination order adopted when combining the target words.
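A sketch of this second implementation, under one plausible reading of the product rule (each adjacent node is scored by its transition probability multiplied by its best word-matching probability; the cache list stands in for the cache table):

```python
def decode_joint_greedy(E, P_word, id_to_word):
    n = E.shape[0]
    current = 0
    cache = [(0, int(P_word[0].argmax()))]                # first prediction node and its word
    while current != n - 1:
        best_words = P_word[current + 1:].max(axis=1)     # best matching probability per adjacent node
        products = E[current, current + 1:] * best_words  # transition probability * matching probability
        current = current + 1 + int(products.argmax())    # adjacent node with the maximum product
        cache.append((current, int(P_word[current].argmax())))
    return " ".join(id_to_word[w] for _, w in cache)
```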
Illustratively, for the refinement of the above step c12, this embodiment further provides yet another implementation. In contrast with the two implementations above, this implementation mainly considers the situation in which different nodes may correspond to the same word; on the basis of that situation, this embodiment effectively proposes a corresponding way of determining the target words.
The steps of this implementation can be described as:
1) Based on the node-to-word matching probabilities, determine the corresponding maximum matching probability, and determine the word corresponding to the maximum matching probability as the predicted word of the corresponding node.
This step first determines the corresponding predicted word for each node of the text prediction layer, where the determination of the predicted words is likewise realized through the maximum-matching-probability logic.
2) Determine the prediction path with the highest weight according to a preset path search algorithm, in combination with the node transition matrix and the predicted words of the nodes.
The purpose of this step is mainly to determine candidate text generation paths over the nodes of the text prediction layer based on the node label order, and to determine, based on the node transition matrix, the transition probabilities of the edges between pairs of nodes on the candidate text generation paths; then, through the path search algorithm combined with the predicted words, candidate prediction paths in which different nodes represent the same predicted word are determined from the candidate text generation paths, and the prediction path with the highest weight is obtained from among the candidate prediction paths.
3) Determine the predicted words corresponding to the prediction nodes of that prediction path as the corresponding target words.
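The disclosure leaves the concrete path search algorithm open; one possible sketch is a Viterbi-style search that first assigns every node its best-matching predicted word and then finds the highest-weight path from the first to the last node, dropping edges whose endpoints would repeat the same predicted word (a simplification of merging such nodes):

```python
def decode_path_search(E, P_word, id_to_word):
    n = E.shape[0]
    words = P_word.argmax(axis=1)                 # step 1): predicted word per node
    best = [(-1.0, -1)] * n                       # (best path weight, predecessor) per node
    best[0] = (float(P_word[0, words[0]]), -1)
    for j in range(1, n):
        for i in range(j):                        # only edges toward larger labels exist
            if best[i][0] < 0 or words[i] == words[j]:
                continue                          # unreachable, or would repeat the same word
            w = best[i][0] * E[i, j] * P_word[j, words[j]]
            if w > best[j][0]:
                best[j] = (w, i)
    path, i = [], n - 1                           # backtrack from the last node
    while i != -1:
        path.append(i)
        i = best[i][1]
    return " ".join(id_to_word[int(words[i])] for i in reversed(path))
```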
Of the three implementations given above for determining the prediction nodes and target words, the first executes fastest, but the generation quality of the resulting text is relatively low; the second is intermediate in both execution speed and text generation quality; and the third executes relatively slowly, but the generation quality of the resulting text is relatively high. This embodiment can adopt the above approaches but is not limited to them; in an application scenario, a suitable implementation of the prediction nodes and target words can be chosen according to the actual situation to generate the target text.
The text generation method provided by this embodiment refines the implementation process by which the text decoding model generates the target text. Through the added text prediction layer, graph nodes in the form of a directed acyclic graph are used to effectively determine the target words and the prediction nodes, which ensures contextual relevance and avoids the consecutive occurrence of repeated words in the generated text. Compared with the related art, this improves the generation quality of the generated text and ensures text accuracy.
Figure 3 is a schematic structural diagram of a text generation apparatus provided by an embodiment of the present disclosure. This embodiment is applicable to text generation scenarios. The apparatus can be implemented by software and/or hardware and can be configured in a terminal and/or a server to implement the text generation method of the embodiments of the present disclosure. The apparatus may include: an encoding execution module 31 and a decoding execution module 32.
The encoding execution module 31 is configured to input the acquired original text into the trained text encoding model to obtain text feature information.
The decoding execution module 32 is configured to generate the target text corresponding to the original text based on the text feature information in combination with the trained text decoding model.
The text decoding model includes a text prediction layer; the node information of the set number of nodes included in the text prediction layer is determined through the text feature information, and the target words contained in the target text, as well as the combination order of the target words, are determined through the node information of the nodes and the topological structure between the nodes.
The text generation apparatus provided by this embodiment realizes the parallel determination of the node information of the nodes in the added text prediction layer and the parallel determination of the target words in the generated text, reducing the text generation delay. At the same time, through the node information of the nodes in the added text prediction layer, a one-to-one correspondence between the words of the generated text and their matched nodes can be realized, better avoiding the occurrence of consecutive repeated words in the generated text. In addition, the topological structure between the nodes can constrain the combination order of the words in the generated text, thereby ensuring the contextual relevance of the generated text, which improves the generation quality of the generated text and ensures text accuracy.
In one embodiment, the text decoding model includes: a position information input layer, a basic decoding sub-model, and a text prediction layer.
The position information input layer includes a set number of node position parameters, where the set number is used to decide the number of nodes contained in the text prediction layer.
The node information of the set number of nodes included in the text prediction layer is determined through the node position parameters and the text feature information, in combination with the basic decoding sub-model.
In one embodiment, the decoding execution module 32 includes:
an information input unit, configured to input the text feature information and the node position parameters in the position information input layer into the basic decoding sub-model;
an initial vector output unit, configured to obtain the set number of initial text prediction vectors output by the basic decoding sub-model and take the set number of initial text prediction vectors respectively as the node information of the set number of nodes in the text prediction layer; and
a text generation unit, configured to construct a directed acyclic graph based on the nodes, determine the topological structure between the nodes, and determine the target text of the original text in combination with the node information.
In one embodiment, the text generation unit includes:
a first execution unit, configured to construct a directed acyclic graph according to the node labels of the nodes in the text prediction layer and obtain the topological structure between the nodes;
a second execution unit, configured to determine the node transition matrix corresponding to the text prediction layer according to the topological structure between the nodes and the node information of the nodes; and
a third execution unit, configured to determine the target text of the original text according to the node information of the nodes and the node transition matrix.
In one embodiment, the second execution unit is configured to:
for each node, determine from the topological structure between nodes the adjacent nodes to which the node has directed connections;
determine the transition probability from the node to each adjacent node according to the node information of the node and of the adjacent nodes; and
form the node transition matrix corresponding to the text prediction layer based on the transition probabilities.
In one embodiment, the third execution unit is configured to:
determine, according to the node information of the nodes and through the fully connected layer in the text prediction layer, the matching probabilities from the nodes to the words of the preset vocabulary;
determine the prediction nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities; and
combine, based on the target words, to form the target text of the original text.
In one embodiment, the step in which the third execution unit determines the prediction nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities may be:
determine at least one prediction node according to the maximum transition probability corresponding to each node in the node transition matrix; and
for each prediction node, determine the maximum matching probability among the matching probabilities from the prediction node to the words, and determine the word corresponding to that maximum matching probability as a target word.
In one embodiment, the step in which the third execution unit determines the prediction nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities may also be:
take the node corresponding to the initial node label as the current node;
obtain from the node transition matrix the current transition probabilities from the current node to its adjacent nodes;
determine the product values of the current transition probabilities with the matching probabilities corresponding to the current node and the words;
select the maximum product value from among the product values, take the adjacent node and word associated with the maximum product value as a prediction node and a target word respectively, and add the prediction node and target word, in association, to a cache table; and
take the prediction node as the new current node and re-execute the selection operation for the current adjacent nodes corresponding to the current node, until the loop end condition is reached.
In one embodiment, the step in which the third execution unit determines the prediction nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities may also be:
based on the node-to-word matching probabilities, determine the corresponding maximum matching probability, and determine the word corresponding to the maximum matching probability as the predicted word of the corresponding node;
determine the prediction path with the highest weight according to a preset path search algorithm, in combination with the node transition matrix and the predicted words of the nodes; and
determine the predicted words corresponding to the prediction nodes of the prediction path as the corresponding target words.
In one embodiment, the apparatus may further include: a model training module, configured to perform learning parameter training on the constructed text decoding model based on a set loss function generation strategy to obtain the trained text decoding model;
wherein the learning parameters include: the node position parameters involved in the position information input layer included in the text decoding model, the basic model parameters involved in the included basic decoding sub-model, and the node-related parameters of the nodes in the included text prediction layer.
In one embodiment, the model training module can be configured to:
obtain at least one set of sample data, a set of sample data including one original sample text and the corresponding single target sample text;
in the current iteration, encode the original sample text of a set of sample data with the text encoding model and input the result into the current text decoding model;
based on the current text decoding model, determine the probability values corresponding to generating the target sample text from the original sample text through the text prediction paths, where the text prediction paths are formed from the nodes in the text prediction layer in combination with a set algorithm;
determine the current loss function value based on the probability values in combination with the loss function generation formula, and adjust the learning parameters in the current text decoding model through backpropagation based on the current loss function value, obtaining the text decoding model for the next iteration; and
take the next iteration as the new current iteration and continue the learning parameter training until the iteration end condition is met, obtaining the trained text decoding model.
For example, the loss function generation formula is expressed as: take the logarithm of the sum of the probability values and negate the result of the logarithm.
The above apparatus can execute the method provided by any embodiment of the present disclosure and has the functional modules and beneficial effects corresponding to executing that method.
It is worth noting that the multiple units and modules included in the above apparatus are divided only according to functional logic but are not limited to this division, as long as the corresponding functions can be realized; in addition, the specific names of the multiple functional units are only for the convenience of distinguishing them from one another and are not used to limit the protection scope of the embodiments of the present disclosure.
Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. Referring now to Figure 4, a schematic structural diagram of an electronic device 40 (for example, the terminal device or server in Figure 4) suitable for implementing embodiments of the present disclosure is shown. Terminal devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Figure 4 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Figure 4, the electronic device 40 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 41, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 42 or a program loaded from a storage apparatus 48 into a random access memory (RAM) 43. The RAM 43 also stores various programs and data required for the operation of the electronic device 40. The processing apparatus 41, the ROM 42, and the RAM 43 are connected to one another via a bus 45. An input/output (I/O) interface 44 is also connected to the bus 45.
Generally, the following apparatuses can be connected to the I/O interface 44: input apparatuses 46 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; output apparatuses 47 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; storage apparatuses 48 including, for example, a magnetic tape, a hard disk, and the like; and a communication apparatus 49. The communication apparatus 49 can allow the electronic device 40 to communicate wirelessly or by wire with other devices to exchange data. Although Figure 4 shows the electronic device 40 with various apparatuses, it should be understood that it is not required to implement or possess all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
According to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 49, or installed from the storage device 48, or installed from the ROM 42. When the computer program is executed by the processing device 41, the above-described functions defined in the method of the embodiments of the present disclosure are performed.
The names of the messages or information exchanged between multiple devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
The electronic device provided by the embodiments of the present disclosure and the text generation method provided by the above embodiments belong to the same inventive concept. Technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
Embodiments of the present disclosure provide a computer storage medium on which a computer program is stored. When the program is executed by a processor, the text generation method provided by the above embodiments is implemented.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable signal medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: a wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication (for example, a communication network) in any form or medium. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internet (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or may exist independently without being assembled into the electronic device.
The above computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to: input the acquired original text into a trained text encoding model to obtain text feature information; and generate, based on the text feature information in combination with a trained text decoding model, target text corresponding to the original text.
Computer program code for executing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. The above programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and also include conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, connected via the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of the systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not constitute a limitation on the unit itself in some cases. For example, the first acquisition unit may also be described as "a unit that acquires at least two Internet Protocol addresses".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, [Example 1] provides a text generation method, the method including: inputting the acquired original text into a trained text encoding model to obtain text feature information; and generating, based on the text feature information in combination with a trained text decoding model, target text corresponding to the original text; wherein the text decoding model includes a text prediction layer, the node information of a set number of nodes included in the text prediction layer is determined by the text feature information, and the target words contained in the target text and the combination order of the target words are determined by the node information of the nodes and the topological structure between the nodes.
According to one or more embodiments of the present disclosure, [Example 2] provides a text generation method, wherein the text decoding model includes: a position information input layer, a basic decoding sub-model, and a text prediction layer; the position information input layer includes a set number of node position parameters, and the set number is used to determine the number of nodes included in the text prediction layer; and the node information of the set number of nodes included in the text prediction layer is determined by the node position parameters and the text feature information in combination with the basic decoding sub-model.
According to one or more embodiments of the present disclosure, [Example 3] provides a text generation method, wherein generating the target text corresponding to the original text based on the text feature information in combination with the trained text decoding model includes: inputting the text feature information and the node position parameters in the position information input layer into the basic decoding sub-model; obtaining the set number of initial text prediction vectors output by the basic decoding sub-model, and using the set number of initial text prediction vectors respectively as the node information of the set number of nodes in the text prediction layer; and constructing a directed acyclic graph based on the nodes, determining the topological structure between the nodes, and determining the target text of the original text in combination with the node information.
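The following minimal sketch illustrates this decoding pipeline with PyTorch. All module choices, sizes, and names (the encoder/decoder layers, num_nodes, hidden) are illustrative assumptions rather than identifiers fixed by the disclosure.

```python
# Minimal sketch of the Example 3 decoding pipeline (illustrative only).
import torch
import torch.nn as nn

hidden, num_nodes = 512, 32   # num_nodes is the "set number" of nodes

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True), num_layers=2)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=hidden, nhead=8, batch_first=True), num_layers=2)
node_pos = nn.Parameter(torch.randn(1, num_nodes, hidden))  # node position parameters

src = torch.randn(1, 10, hidden)      # stand-in for the embedded original text
text_features = encoder(src)          # text feature information

# The node position parameters act as decoder queries over the text features;
# the outputs are the initial text prediction vectors, used as node information.
node_info = decoder(node_pos, text_features)

# Directed acyclic graph over the nodes: an edge i -> j exists only for i < j,
# so every path moves forward through the node sequence (the topology).
adjacency = torch.triu(torch.ones(num_nodes, num_nodes, dtype=torch.bool), diagonal=1)
```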
According to one or more embodiments of the present disclosure, [Example 4] provides a text generation method, wherein constructing the directed acyclic graph based on the nodes, determining the topological structure between the nodes, and determining the target text of the original text in combination with the node information includes: constructing a directed acyclic graph according to the node labels of the nodes in the text prediction layer to obtain the topological structure between the nodes; determining a node transition matrix corresponding to the text prediction layer according to the topological structure between the nodes and the node information of the nodes; and determining the target text of the original text according to the node information of the nodes and the node transition matrix.
According to one or more embodiments of the present disclosure, [Example 5] provides a text generation method, wherein determining the node transition matrix corresponding to the text prediction layer according to the topological structure between the nodes and the node information of the nodes includes: for each node, determining the adjacent nodes to which the node is directionally connected from the topological structure between the nodes; determining the transition probability from the node to the adjacent nodes according to the node information of the node and the adjacent nodes; and forming the node transition matrix corresponding to the text prediction layer based on the transition probabilities.
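One plausible realization of these transition probabilities, sketched below, scores node pairs with a scaled dot product of their node information and normalizes over each node's outgoing DAG edges. The disclosure does not fix this exact parameterization; it is an assumption for illustration.

```python
# Hypothetical sketch: node transition matrix from node information.
import torch

def node_transition_matrix(node_info: torch.Tensor) -> torch.Tensor:
    """node_info: (num_nodes, hidden). Returns a (num_nodes, num_nodes) matrix
    whose entry (i, j) is the transition probability from node i to node j."""
    num_nodes, hidden = node_info.shape
    scores = node_info @ node_info.T / hidden ** 0.5
    # Keep only edges i -> j with i < j, matching the directed acyclic graph.
    mask = torch.triu(torch.ones(num_nodes, num_nodes, dtype=torch.bool), diagonal=1)
    probs = torch.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)
    return torch.nan_to_num(probs)  # final node has no successors -> zero row

transition = node_transition_matrix(torch.randn(8, 16))
```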
According to one or more embodiments of the present disclosure, [Example 6] provides a text generation method, wherein determining the target text of the original text according to the node information of the nodes and the node transition matrix includes: determining, according to the node information of the nodes, the matching probability of each node to the words in a preset vocabulary through a fully connected layer in the text prediction layer; determining the predicted nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities; and combining the target words to form the target text of the original text.
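A short sketch of the node-to-word matching step, under assumed sizes and names:

```python
# Sketch: a fully connected layer projects each node's information onto a
# preset vocabulary; a softmax yields per-node matching probabilities.
import torch
import torch.nn as nn

hidden, vocab_size = 16, 100
to_vocab = nn.Linear(hidden, vocab_size)   # the fully connected layer

node_info = torch.randn(8, hidden)         # node information of 8 nodes
match_probs = torch.softmax(to_vocab(node_info), dim=-1)
# match_probs[i, w]: probability that node i matches word w in the vocabulary.
```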
According to one or more embodiments of the present disclosure, [Example 7] provides a text generation method, wherein determining the predicted nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities includes: determining at least one predicted node according to the maximum transition probability corresponding to each node in the node transition matrix; and, for each predicted node, determining the maximum matching probability from the matching probabilities of the predicted node to the words, and determining the word corresponding to the maximum matching probability as the target word.
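A greedy sketch of this procedure, reusing the transition matrix and matching probabilities from the sketches above:

```python
# Greedy sketch: follow the maximum transition probability to the next
# predicted node, then emit that node's best-matching word.
import torch

def greedy_decode(transition: torch.Tensor, match_probs: torch.Tensor) -> list[int]:
    num_nodes = transition.shape[0]
    node, target_words = 0, []
    while node < num_nodes - 1:
        node = int(transition[node].argmax())                 # predicted node
        target_words.append(int(match_probs[node].argmax()))  # its target word
    return target_words
```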
According to one or more embodiments of the present disclosure, [Example 8] provides a text generation method, wherein determining the predicted nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities includes: taking the node corresponding to the starting node label as the current node; obtaining the current transition probabilities from the current node to its adjacent nodes from the node transition matrix; determining the product values of the current transition probabilities and the corresponding node-to-word matching probabilities; selecting the maximum product value from the product values, taking the adjacent node and the word associated with the maximum product value as the predicted node and the target word respectively, and adding the predicted node and the target word in association to a cache table; and taking the predicted node as the new current node and re-executing the selection operation for the current adjacent nodes corresponding to the current node until a loop end condition is reached.
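The claim language here is terse, so the sketch below follows one common reading (lookahead-style DAG decoding): jointly pick the adjacent node j and word w that maximize the product of the transition probability to j and j's word matching probability, caching the (node, word) pairs along the way. Treat this as an interpretation, not the only one the wording permits.

```python
# Hypothetical sketch of the cached stepping procedure.
import torch

def lookahead_decode(transition: torch.Tensor,
                     match_probs: torch.Tensor) -> list[tuple[int, int]]:
    num_nodes = transition.shape[0]
    cache: list[tuple[int, int]] = []     # the "cache table" of (node, word)
    current = 0                           # node with the starting node label
    while current < num_nodes - 1:        # loop end condition: final node reached
        # Products of transition probabilities and word matching probabilities.
        products = transition[current].unsqueeze(-1) * match_probs  # (nodes, vocab)
        flat = int(products.argmax())
        nxt, word = divmod(flat, match_probs.shape[-1])
        cache.append((nxt, word))         # predicted node and its target word
        current = nxt
    return cache
```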
According to one or more embodiments of the present disclosure, [Example 9] provides a text generation method, wherein determining the predicted nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities includes: determining the corresponding maximum matching probability based on the node-to-word matching probabilities, and determining the word corresponding to the maximum matching probability as the predicted word of the corresponding node; determining the prediction path with the highest weight according to a preset path search algorithm in combination with the node transition matrix and the predicted words of the nodes; and determining the predicted words corresponding to the predicted nodes in the prediction path as the corresponding target words.
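One concrete choice of preset path search is a Viterbi-style dynamic program over the DAG, sketched below; the disclosure leaves the algorithm open, so this is an assumption.

```python
# Sketch: highest-weight path through the DAG, scoring each step by the
# log transition probability plus the log of the node's best word probability.
import torch

def best_path(transition: torch.Tensor, match_probs: torch.Tensor) -> list[int]:
    num_nodes = transition.shape[0]
    word_score, pred_word = match_probs.max(dim=-1)   # each node's predicted word
    score = torch.full((num_nodes,), float("-inf"))
    score[0], back = 0.0, [-1] * num_nodes
    for i in range(num_nodes):                        # nodes in topological order
        for j in range(i + 1, num_nodes):
            s = score[i] + torch.log(transition[i, j] + 1e-9) + torch.log(word_score[j])
            if s > score[j]:
                score[j], back[j] = s, i
    path, node = [], num_nodes - 1                    # backtrack from the last node
    while node != -1:
        path.append(node)
        node = back[node]
    path.reverse()
    return [int(pred_word[n]) for n in path]          # target words along the path
```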
According to one or more embodiments of the present disclosure, [Example 10] provides a text generation method, further including: performing learning parameter training on the constructed text decoding model based on a set loss function generation strategy to obtain the trained text decoding model; wherein the learning parameters include: the node position parameters involved in the position information input layer included in the text decoding model, the basic model parameters involved in the included basic decoding sub-model, and the node-related parameters involved in the nodes of the included text prediction layer.
According to one or more embodiments of the present disclosure, [Example 11] provides a text generation method, wherein performing learning parameter training on the constructed text decoding model based on the set loss function generation strategy to obtain the trained text decoding model includes: obtaining at least one group of sample data, one group of sample data including one original sample text and a corresponding single target sample text; in the current iteration, encoding the original sample text in a group of sample data using the text encoding model and inputting the result into the current text decoding model; determining, based on the current text decoding model, the probability value corresponding to generating the target sample text from the original sample text through a text prediction path, wherein the text prediction path is formed based on the nodes in the text prediction layer in combination with a set algorithm; determining the current loss function value based on the probability value in combination with a loss function generation formula, and adjusting the learning parameters in the current text decoding model through backpropagation based on the current loss function value to obtain the text decoding model for the next iteration; and taking the next iteration as the new current iteration and continuing the learning parameter training until an iteration end condition is met, obtaining the trained text decoding model.
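Below is a sketch of one training iteration under these steps. For brevity the path probability is computed by brute-force enumeration of forward paths, where a practical implementation would use dynamic programming; all names are illustrative, and the transition matrix and matching probabilities are assumed to be produced by the current decoding model so that gradients reach its learning parameters.

```python
# Illustrative sketch of one training iteration (not the disclosure's code).
import itertools
import torch

def path_probability(transition, match_probs, target):
    """Sum of probabilities over all text prediction paths emitting `target`."""
    num_nodes, m = transition.shape[0], len(target)
    total = torch.zeros(())
    # Paths start at node 0 and visit m strictly increasing nodes (assumption).
    for rest in itertools.combinations(range(1, num_nodes), m - 1):
        path = (0,) + rest
        p = match_probs[0, target[0]]
        for (i, j), w in zip(zip(path, path[1:]), target[1:]):
            p = p * transition[i, j] * match_probs[j, w]
        total = total + p
    return total

def train_step(transition, match_probs, target, optimizer):
    # transition and match_probs are assumed to come from the current text
    # decoding model, so backpropagation reaches its learning parameters.
    prob = path_probability(transition, match_probs, target)
    loss = -torch.log(prob + 1e-9)   # the Example 12 loss function formula
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```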
According to one or more embodiments of the present disclosure, [Example 12] provides a text generation method, wherein the loss function generation formula is expressed as: taking the logarithm of the sum of the probability values, and negating the result of the logarithm operation.
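Writing $X$ for the original sample text, $Y=(y_1,\dots,y_m)$ for the target sample text, and $A=(a_1,\dots,a_m)$ for a text prediction path through the graph (notation assumed here for illustration, not fixed by the disclosure), this corresponds to the negative log-likelihood

$$\mathcal{L} = -\log \sum_{A} P(Y, A \mid X), \qquad P(Y, A \mid X) = \prod_{k=1}^{m-1} P(a_{k+1} \mid a_k, X)\,\prod_{k=1}^{m} P(y_k \mid a_k, X).$$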
Furthermore, although various operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.

Claims (15)

  1. A text generation method, comprising:
    inputting the acquired original text into a trained text encoding model to obtain text feature information;
    generating, based on the text feature information in combination with a trained text decoding model, target text corresponding to the original text;
    wherein the text decoding model comprises a text prediction layer, node information of a set number of nodes included in the text prediction layer is determined by the text feature information, and the target words contained in the target text and the combination order of the target words are determined by the node information of the nodes and the topological structure between the nodes.
  2. The method according to claim 1, wherein the text decoding model comprises: a position information input layer, a basic decoding sub-model, and a text prediction layer;
    the position information input layer comprises a set number of node position parameters, and the set number is used to determine the number of nodes included in the text prediction layer;
    the node information of the set number of nodes included in the text prediction layer is determined by the node position parameters and the text feature information in combination with the basic decoding sub-model.
  3. The method according to claim 2, wherein generating the target text corresponding to the original text based on the text feature information in combination with the trained text decoding model comprises:
    inputting the text feature information and the node position parameters in the position information input layer into the basic decoding sub-model;
    obtaining the set number of initial text prediction vectors output by the basic decoding sub-model, and using the set number of initial text prediction vectors respectively as the node information of the set number of nodes in the text prediction layer;
    constructing a directed acyclic graph based on the nodes, determining the topological structure between the nodes, and determining the target text of the original text in combination with the node information of the nodes.
  4. The method according to claim 3, wherein constructing the directed acyclic graph based on the nodes, determining the topological structure between the nodes, and determining the target text of the original text in combination with the node information of the nodes comprises:
    constructing a directed acyclic graph according to the node labels of the nodes in the text prediction layer to obtain the topological structure between the nodes;
    determining a node transition matrix corresponding to the text prediction layer according to the topological structure between the nodes and the node information of the nodes;
    determining the target text of the original text according to the node information of the nodes and the node transition matrix.
  5. The method according to claim 4, wherein determining the node transition matrix corresponding to the text prediction layer according to the topological structure between the nodes and the node information of the nodes comprises:
    for each node, determining the adjacent nodes to which the node is directionally connected from the topological structure between the nodes;
    determining the transition probability from each node to its adjacent nodes according to the node information of the node and the adjacent nodes;
    forming the node transition matrix corresponding to the text prediction layer based on the transition probabilities.
  6. The method according to claim 4, wherein determining the target text of the original text according to the node information of the nodes and the node transition matrix comprises:
    determining, according to the node information of the nodes, the matching probability of each node to the words in a preset vocabulary through a fully connected layer in the text prediction layer;
    determining predicted nodes and corresponding target words according to the node transition matrix and the node-to-word matching probabilities;
    combining the target words to form the target text of the original text.
  7. The method according to claim 6, wherein determining the predicted nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities comprises:
    determining at least one predicted node according to the maximum transition probability corresponding to each node in the node transition matrix;
    for each predicted node, determining the maximum matching probability from the matching probabilities of the predicted node to the words, and determining the word corresponding to the maximum matching probability as the target word.
  8. The method according to claim 6, wherein determining the predicted nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities comprises:
    taking the node corresponding to the starting node label as the current node;
    obtaining the current transition probabilities from the current node to its adjacent nodes from the node transition matrix;
    determining the product values of the current transition probabilities and the corresponding node-to-word matching probabilities;
    selecting the maximum product value from the product values, taking the adjacent node and the word associated with the maximum product value as the predicted node and the target word respectively, and adding the predicted node and the target word in association to a cache table;
    taking the predicted node as the new current node, and re-executing the selection operation for the current adjacent nodes corresponding to the current node until a loop end condition is reached.
  9. The method according to claim 6, wherein determining the predicted nodes and the corresponding target words according to the node transition matrix and the node-to-word matching probabilities comprises:
    determining the corresponding maximum matching probability based on the node-to-word matching probabilities, and determining the word corresponding to the maximum matching probability as the predicted word of the corresponding node;
    determining the prediction path with the highest weight according to a preset path search algorithm in combination with the node transition matrix and the predicted words of the nodes;
    determining the predicted words corresponding to the predicted nodes in the prediction path as the corresponding target words.
  10. The method according to any one of claims 1-9, further comprising:
    performing learning parameter training on the constructed text decoding model based on a set loss function generation strategy to obtain the trained text decoding model;
    wherein the learning parameters comprise: the node position parameters involved in the position information input layer included in the text decoding model, the basic model parameters involved in the included basic decoding sub-model, and the node-related parameters involved in the nodes of the included text prediction layer.
  11. The method according to claim 10, wherein performing learning parameter training on the constructed text decoding model based on the set loss function generation strategy to obtain the trained text decoding model comprises:
    obtaining at least one group of sample data, one group of sample data comprising one original sample text and a corresponding single target sample text;
    in the current iteration, encoding the original sample text in a group of sample data using the text encoding model and inputting the result into the current text decoding model;
    determining, based on the current text decoding model, the probability value corresponding to generating the target sample text from the original sample text through a text prediction path, wherein the text prediction path is formed based on the nodes in the text prediction layer in combination with a set algorithm;
    determining the current loss function value based on the probability value in combination with a loss function generation formula, and adjusting the learning parameters in the current text decoding model through backpropagation based on the current loss function value to obtain the text decoding model for the next iteration;
    taking the next iteration as the new current iteration, and continuing the learning parameter training until an iteration end condition is met, obtaining the trained text decoding model.
  12. The method according to claim 11, wherein the loss function generation formula is expressed as:
    taking the logarithm of the sum of the probability values, and negating the result of the logarithm operation.
  13. A text generation apparatus, comprising:
    an encoding execution module, configured to input the acquired original text into a trained text encoding model to obtain text feature information;
    a decoding execution module, configured to generate, based on the text feature information in combination with a trained text decoding model, target text corresponding to the original text;
    wherein the text decoding model comprises a text prediction layer, node information of a set number of nodes included in the text prediction layer is determined by the text feature information, and the target words contained in the target text and the combination order of the target words are determined by the node information of the nodes and the topological structure between the nodes.
  14. An electronic device, comprising:
    one or more processors;
    a storage device, configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the text generation method according to any one of claims 1-12.
  15. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the text generation method according to any one of claims 1-12 is implemented.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210346397.4A CN114818746A (en) 2022-03-31 2022-03-31 Text generation method and device, computer equipment and storage medium
CN202210346397.4 2022-03-31

Publications (1)

Publication Number Publication Date
WO2023185896A1 (en)

Family

ID=82532962

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084560 WO2023185896A1 (en) 2022-03-31 2023-03-29 Text generation method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN114818746A (en)
WO (1) WO2023185896A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114818746A (en) * 2022-03-31 2022-07-29 北京有竹居网络技术有限公司 Text generation method and device, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021026760A (en) * 2019-07-31 2021-02-22 株式会社Nttドコモ Machine translation apparatus and method
CN113420569A (en) * 2021-06-22 2021-09-21 康键信息技术(深圳)有限公司 Code translation method, device, equipment and storage medium
CN113535939A (en) * 2020-04-17 2021-10-22 阿里巴巴集团控股有限公司 Text processing method and device, electronic equipment and computer readable storage medium
CN113761845A (en) * 2021-01-28 2021-12-07 北京沃东天骏信息技术有限公司 Text generation method and device, storage medium and electronic equipment
CN113947060A (en) * 2021-10-19 2022-01-18 北京有竹居网络技术有限公司 Text conversion method, device, medium and electronic equipment
CN114818746A (en) * 2022-03-31 2022-07-29 北京有竹居网络技术有限公司 Text generation method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN114818746A (en) 2022-07-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 23778261
    Country of ref document: EP
    Kind code of ref document: A1