WO2021143022A1 - Method and device for text generation - Google Patents

Method and device for text generation

Info

Publication number
WO2021143022A1
WO2021143022A1 PCT/CN2020/093450 CN2020093450W WO2021143022A1 WO 2021143022 A1 WO2021143022 A1 WO 2021143022A1 CN 2020093450 W CN2020093450 W CN 2020093450W WO 2021143022 A1 WO2021143022 A1 WO 2021143022A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
initial
hidden
text data
hidden space
Prior art date
Application number
PCT/CN2020/093450
Other languages
English (en)
French (fr)
Inventor
陈瑞清
许开河
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021143022A1 publication Critical patent/WO2021143022A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Definitions

  • This application relates to the field of data processing technology, in particular to a method and device for text generation.
  • Generated text is text produced in a structuring process, and the form in which that structure is expressed is the surface (phenomenon) text.
  • The quality of generated text is usually judged by its readability and controllability.
  • Readability means that the generated text consists of sentences that conform to the norms of human natural language and whose meaning is clear.
  • Controllability means that the desired target sentence can be generated according to parameters set in advance, and that the semantics of the sentence can be changed by adjusting those parameters.
  • Existing text generation schemes are usually divided into rule-based text generation and neural network-based text generation.
  • The rule-based approach usually specifies rules manually, using synonym substitution, part-of-speech analysis and other methods. Text generated this way is fairly controllable, but its readability is poor and its extensibility weak, and it requires a great deal of manual feature engineering.
  • Neural-network-based methods are mainly divided into text generation with GANs and with VAEs. Because text is composed of discrete characters, it is not differentiable; with a GAN the common approach is to use reinforcement learning to realize back-propagation, but the variance is then relatively large, which hurts the results, whereas the other approach, based on the VAE, is considered friendlier for text generation.
  • VAE (Variational Auto-Encoder) and GAN (Generative Adversarial Networks) are both generative models.
  • A so-called generative model is a model that can generate samples.
  • The data points of a training set can be regarded as samples drawn from some random distribution; for example, each MNIST handwritten-digit image can be regarded as a sample of a random distribution p(x). If a similar random model can be obtained, samples can be generated without restriction; however, the random distribution p(x) must be learned from, or approximated on, the training set. The basic idea of approximating a random distribution is to map a known, controllable random distribution q(z) onto the target random distribution p(x).
  • Variational autoencoder is a typical generative model in the field of deep learning, which belongs to the Encoder-Decoder model structure.
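  • For reference, a variational autoencoder of this kind is trained by maximizing the standard evidence lower bound. The application does not state the formula; the following is the textbook VAE objective, included only as background:

$$\log p(x) \;\ge\; \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big] \;-\; D_{\mathrm{KL}}\big(q(z|x)\,\|\,p(z)\big)$$

where $q(z|x)=\mathcal{N}\big(\mu(x),\sigma^{2}(x)\big)$ is the distribution learned by the encoder, $p(z)=\mathcal{N}(0,I)$ is the prior over the hidden space, the reconstruction term corresponds to the decoder loss, and the KL term is the one whose coefficient is annealed later in this application.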
  • In the prior art, a text corpus is acquired according to the text application scenario to build a text corpus set, aligned corpora are then obtained from the text corpus, and the aligned corpora are used as the training corpus of a seq2seq model.
  • The aligned corpora are text corpora that express the same content but carry different emotions. The training corpus is fed into the seq2seq model to train it for emotional-style conversion; target text is then obtained according to the application scenario and fed into the trained seq2seq model to obtain the converted corpus of the corresponding emotional style.
  • The seq2seq model has the Encoder-Decoder model structure.
  • However, such training corpus data is limited and cannot be used to train a seq2seq model of general applicability.
  • As a result, when the emotional style is converted, the converted corpus cannot accurately reflect the solution required by the application scenario; that is, the target text generated by the existing model is inaccurate and differs considerably from the emotional style actually corresponding to the application scenario.
  • the present application provides a method and device for text generation, the main purpose of which is to solve the problem of inaccurate target text generated according to an existing model in the prior art.
  • According to one aspect, a method for text generation is provided, including: acquiring initial text data; calculating, according to a preset BERT language model, the hidden space parameters of a variational autoencoder of the initial text data; taking the initial text data, the hidden space parameters and initial control conditions as input data, taking the control sentences corresponding to the initial text data under the initial control conditions as output data, and using a back-propagation-through-time algorithm to adjust the weights of a long short-term memory (LSTM) network decoder so as to train the LSTM decoder; and taking a sentence to be tested and a target control condition as input data of the LSTM decoder to generate the target sentence of the sentence to be tested.
  • According to another aspect, a text generation device is provided, including:
  • an obtaining module, used to obtain initial text data;
  • a calculation module, used to calculate, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data;
  • a training module, configured to take the initial text data, the hidden space parameters and initial control conditions as input data, take the control sentences corresponding to the initial text data under the initial control conditions as output data, and use a back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder;
  • a generating module, used to take the sentence to be tested and the target control condition as input data of the LSTM decoder to generate the target sentence of the sentence to be tested.
  • According to yet another aspect, a computer storage medium is provided; at least one computer-readable instruction is stored in the computer storage medium, and the computer-readable instruction causes a processor to perform the operations corresponding to the above text generation method.
  • a computer device including: a processor, a memory, a communication interface, and a communication bus.
  • the processor, the memory, and the communication interface complete mutual communication through the communication bus.
  • the memory is used to store at least one computer-readable instruction, and the computer-readable instruction causes the processor to perform operations corresponding to the above-mentioned text generation method.
  • The present application provides a method and device for text generation.
  • First, initial text data is obtained; then, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data are calculated; next, with the initial text data, the hidden space parameters and initial control conditions as input data and the control sentences corresponding to the initial text data under the initial control conditions as output data, a back-propagation-through-time algorithm is used to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder; finally, the sentence to be tested and the target control condition are used as input data of the LSTM decoder to generate the target sentence of the sentence to be tested.
  • A preset BERT language model is used to capture the grammatical and semantic features of the sentences in the initial text data so as to obtain rich sentence representation information; the sentence representation information is passed through a variational autoencoder to obtain the hidden space parameters, and the target sentence of the sentence to be tested is generated by way of control conditions.
  • The target sentence has a good text representation and is controllable, and can accurately express, through the control conditions, application scenarios such as a required emotional style, similar semantics or similar sentence patterns.
  • Figure 1 shows a flowchart of a method for text generation provided by an embodiment of the present application;
  • Figure 2 shows a flowchart of another method for text generation provided by an embodiment of the present application;
  • Figure 3 shows a block diagram of a device for text generation provided by an embodiment of the present application;
  • Figure 4 shows a block diagram of another device for text generation provided by an embodiment of the present application;
  • Figure 5 shows a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the purpose of this application is to use the intention as a control condition to generate sentences with a given intention, which can be used to expand the data of the intelligent question answering knowledge base.
  • the embodiment of the present application provides a method for generating text. As shown in FIG. 1, the method includes:
  • The variational autoencoder is an unsupervised-learning neural network model. It takes the raw data as both input and output and contains a hidden layer with fewer units than the input and output layers. Using the seq2seq structure, it encodes text sentences into the hidden space; after encoding, the text sentences can be recovered by the decoder. From the initial text data to the hidden layer, the number of neurons in the neural network model decreases; this process is the encoding process.
  • The purpose of the hidden layer is to extract the main components of the initial text data, and the hidden space parameters are the characteristic parameters of the initial text data.
  • Before the hidden space parameters are calculated, the initial text data is mapped to the hidden space through the variational autoencoder by means of the preset BERT language model.
  • In the mapping process, the pooled_output of the preset BERT language model is connected to two fully connected layers, which learn the parameter mean and the standard deviation of the hidden space respectively. A relatively low learning rate, for example 5e-5, can be set for the mapping process.
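  • The sketch below illustrates this mapping step. It is not code from the application: PyTorch and the Hugging Face transformers package are assumptions, and the hidden-space size is an arbitrary illustrative value; only the structure (pooled_output feeding two fully connected layers that learn the mean and the log-variance, trained with a low learning rate such as 5e-5) follows the description above.

```python
import torch
import torch.nn as nn
from transformers import BertModel

HIDDEN_DIM = 32  # hidden-space dimension H (illustrative value, not from the application)

class BertVAEEncoder(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", hidden_dim=HIDDEN_DIM):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        size = self.bert.config.hidden_size            # 768 for BERT-base
        self.fc_mean = nn.Linear(size, hidden_dim)     # learns the parameter mean
        self.fc_logvar = nn.Linear(size, hidden_dim)   # learns log-variance (std = exp(0.5 * logvar))

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.pooler_output                     # the pooled_output mentioned in the text
        return self.fc_mean(pooled), self.fc_logvar(pooled)

# The encoder side would use the low learning rate suggested above, e.g.:
# optimizer = torch.optim.Adam(encoder.parameters(), lr=5e-5)
```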
  • A back-propagation-through-time algorithm adjusts the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder.
  • The generation model is an LSTM decoder, which was developed to deal with natural language processing problems.
  • To use the LSTM decoder later, it must be trained with the initial text data, the hidden space parameters, the initial control conditions, and the actual sentences generated from the initial text data under the initial control conditions.
  • In the encoding part, because the hidden layer has fewer units than the input, the data is compressed; in the decoding part, the number of output neurons is larger than in the hidden layer, and the compressed hidden representations are combined with one another to reproduce the original output.
  • In the process of training the LSTM decoder, to minimize the training error, the back-propagation-through-time algorithm is used to adjust the training weights according to the error, so that feeding the initial text data, the hidden space parameters and the initial control conditions through the LSTM decoder generates the control sentences corresponding to the initial text data under the initial control conditions.
  • A control condition controls the semantics and style of the generated text by setting the category information of the labeled text together with the features after variation.
  • A control condition is an intent expressed digitally that a computer can recognize.
  • An intent is the purpose of use in an actual application scenario, such as handling business, consulting about business or filing a complaint.
  • To train the LSTM decoder, initial control conditions and the actual sentences corresponding to those initial control conditions can be set manually for the initial text data, so that the LSTM decoder can generate text with higher controllability.
  • When the target sentence of the sentence to be tested is generated, the hidden space parameters are not set, which reduces the restrictions on the target sentence, so that the target sentence better meets the target requirements of the sentence to be tested and the control condition.
  • the target control condition can be the purpose of use in actual application scenarios, such as handling business.
  • the target control condition refers to the digitally expressed intention that the computer can recognize.
  • The present application provides a method for text generation. First, initial text data is obtained; then, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data are calculated; next, the initial text data, the hidden space parameters and the initial control conditions are taken as input data, and the control sentences corresponding to the initial text data under the initial control conditions are taken as output data.
  • A back-propagation-through-time algorithm is used to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder.
  • Finally, the target sentence of the sentence to be tested is generated by taking the sentence to be tested and the target control condition as input data of the LSTM decoder.
  • A preset BERT language model is used to capture the grammatical and semantic features of the sentences in the initial text data so as to obtain rich sentence representation information; the sentence representation information is passed through a variational autoencoder to obtain the hidden space parameters, and the target sentence of the sentence to be tested is generated by way of control conditions.
  • The target sentence has a good text representation and is controllable, and can accurately express, through the control conditions, application scenarios such as a required emotional style, similar semantics or similar sentence patterns.
  • the embodiment of the present application provides another text generation method. As shown in FIG. 2, the method includes:
  • The initial text data may be randomly obtained text data, text data related to an application scenario, or text data entered by a user in a specific application scenario.
  • The source of the initial text data is not limited in the embodiments of the present application.
  • the initial text data may be sentences in the intelligent question answering knowledge base data.
  • the number of sentences in the initial text data is not limited, and each sentence may include Chinese characters, English letters, pinyin symbols, or Arabic numerals.
  • According to a preset BERT language model, the initial text data is mapped to a hidden space through the variational autoencoder, and the hidden space parameters of the hidden space are obtained.
  • The variational autoencoder is an unsupervised-learning neural network model. It takes the raw data as both input and output, contains a hidden layer with fewer units than the input and output layers, and uses the seq2seq structure to encode text sentences into the hidden space; after encoding, the text sentences can be recovered by the decoder.
  • Obtaining the hidden space parameters specifically includes: taking the initial text data as the input of the preset BERT language model and obtaining the sentence vector of each sentence in the initial text data, where the sentence vector includes word vectors and position vectors;
  • taking the sentence vectors as the learning parameters of the variational autoencoder and mapping the initial text data to the hidden space, where the hidden space is a normally distributed space; and looking up the hidden space parameters of the hidden space, where the hidden space parameters include the parameter mean and the standard deviation of the initial text data.
  • The variational autoencoder adopts a neural network structure, so it must be trained before it is used; its training process is not repeated in the embodiments of this application. A relatively low learning rate, for example 5e-5, can be set for the mapping process.
  • Reconstructing the hidden space parameters essentially means, on the basis of the variational autoencoder, adding "Gaussian noise" to the output hidden space parameters, so that robustness to noise is increased during decoding.
  • Gaussian resampling is performed in the hidden space, and the hidden space parameters are obtained anew.
  • The new hidden space parameters are the input data for the subsequent training of the LSTM decoder.
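  • A minimal sketch of this Gaussian resampling step, under the common reparameterization-trick reading (the application gives no formula; the function and variable names are illustrative):

```python
import torch

def resample(mean: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Draw z ~ N(mean, sigma^2) as z = mean + sigma * eps with eps ~ N(0, I)."""
    std = torch.exp(0.5 * logvar)   # standard deviation from the learned log-variance
    eps = torch.randn_like(std)     # the "Gaussian noise" added to the output parameters
    return mean + eps * std         # reconstructed hidden-space parameters
```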
  • The spatial dimension refers to the kinds of data that characterize the hidden space.
  • Exemplarily, if two kinds of data, the mean and the standard deviation, are used to represent the hidden space, the number of spatial dimensions is 2.
  • According to the spatial dimension, the hidden space parameters and the initial control conditions are spliced to generate the hidden initial input of the LSTM decoder.
  • In order to train the LSTM decoder better, part of the training data is specially processed before training: the hidden space parameters and the initial control conditions are spliced to generate the hidden initial input.
  • Generating the hidden initial input specifically includes: mapping the hidden space parameters to a hidden tensor; converting the initial control condition into an intent tensor whose dimension is the same as that of the hidden tensor; and splicing the hidden tensor and the intent tensor to generate the hidden initial input of the LSTM decoder.
  • Exemplarily, suppose the spatial dimension of the hidden space is H, the number of sentences in the initial text data is M, and the number of intents in the initial control conditions is N.
  • A tensor of size [N, H] is then randomly defined, in which each intent corresponds to an H-dimensional tensor; the tensor of the intent and the H-dimensional tensor of the reconstructed hidden space parameters are spliced to obtain the hidden initial input of the LSTM decoder, as sketched below.
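  • A possible implementation of this splicing step, assuming the [N, H] intent tensor is held in an embedding table and that each training sample carries an integer intent id (both are assumptions; the application does not fix a data layout):

```python
import torch
import torch.nn as nn

H, N = 32, 5                                    # latent dimension and number of intents (example values)
intent_table = nn.Embedding(N, H)               # randomly initialised [N, H] intent tensor

def hidden_initial_input(z: torch.Tensor, intent_ids: torch.Tensor) -> torch.Tensor:
    """z: [batch, H] resampled latent vectors; intent_ids: [batch] integer intents."""
    intent_vec = intent_table(intent_ids)       # [batch, H] row of the intent tensor
    return torch.cat([z, intent_vec], dim=-1)   # [batch, 2H] hidden initial input of the decoder
```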
  • In the process of training the LSTM decoder, to minimize the training error, the back-propagation-through-time algorithm is used to adjust the training weights according to the error, so that feeding the initial text data, the hidden space parameters and the initial control conditions through the LSTM decoder generates the control sentences corresponding to the initial text data under the initial control conditions.
  • To ensure that the encoding and decoding results lie in the same space, the word vector table used in the LSTM decoder is the same as the word vector table used in the variational autoencoder.
  • To keep the encoding and decoding processes in step, a relatively large learning rate is set for the decoding process so that the encoding part changes as little as possible; this corresponds to the lower learning rate of 5e-5 set for the encoding process.
  • The learning rate for training the LSTM decoder can be 0.01.
  • In addition, the KL error coefficient is tied to the global step of the training process: as the number of global steps increases, the KL error coefficient gradually increases until it reaches 1 and then increases no further. This prevents the KL divergence from decreasing too quickly because of a small KL error coefficient, which would leave the LSTM decoder decoupled from the hidden space and uncontrolled.
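  • One way to realize the KL-coefficient schedule and the associated training loss described above. The warm-up length, the closed-form KL for a diagonal Gaussian against a standard normal prior, and the function names are assumptions; the application only states that the coefficient grows with the global step until it reaches 1.

```python
import torch

def kl_weight(global_step: int, warmup_steps: int = 10000) -> float:
    # KL error coefficient: increases with the global step, capped at 1.
    return min(1.0, global_step / warmup_steps)

def vae_loss(recon_loss: torch.Tensor, mean: torch.Tensor,
             logvar: torch.Tensor, global_step: int) -> torch.Tensor:
    # KL( N(mean, sigma^2) || N(0, I) ) in closed form, averaged over the batch.
    kl = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp(), dim=-1).mean()
    return recon_loss + kl_weight(global_step) * kl

# Two parameter groups could reflect the asymmetric learning rates suggested in
# the text: roughly 5e-5 for the BERT encoder side and 0.01 for the LSTM decoder side.
```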
  • When the target sentence of the sentence to be tested is generated, the hidden space parameters are not set, which reduces the restrictions on the target sentence, so that the target sentence better meets the target requirements of the sentence to be tested and the control condition.
  • the target control condition can be the purpose of use in actual application scenarios, such as handling business.
  • the target control condition refers to the digitally expressed intention that the computer can recognize.
  • When the target control condition is the control condition of the sentence to be tested, the sentence to be tested and the target control condition are used as the input data of the LSTM decoder to generate sentences similar to the sentence to be tested; that is, the target control condition is the control condition of the sentence to be tested.
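  • A hedged sketch of this generation step. Because the hidden space parameter is not fixed at this stage, the latent vector is drawn from the standard normal prior here; the decoder interface (init_state/step), the special tokens and greedy decoding are assumptions made only for illustration:

```python
import torch

@torch.no_grad()
def generate(decoder, intent_table, target_intent_id, bos_id, eos_id,
             max_len=50, H=32):
    z = torch.randn(1, H)                                    # latent left unconstrained
    intent_vec = intent_table(torch.tensor([target_intent_id]))
    state = decoder.init_state(torch.cat([z, intent_vec], dim=-1))  # hidden initial input
    token = torch.tensor([[bos_id]])
    output = []
    for _ in range(max_len):
        logits, state = decoder.step(token, state)           # one LSTM decoding step
        token = logits.argmax(dim=-1)                        # greedy choice of next character
        if token.item() == eos_id:
            break
        output.append(token.item())
    return output                                            # token ids of the target sentence
```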
  • The present application provides a method for text generation. First, initial text data is obtained; then, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data are calculated; next, the initial text data, the hidden space parameters and the initial control conditions are taken as input data, and the control sentences corresponding to the initial text data under the initial control conditions are taken as output data.
  • A back-propagation-through-time algorithm is used to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder.
  • The target sentence of the sentence to be tested is then generated by taking the sentence to be tested and the target control condition as input data of the LSTM decoder.
  • A preset BERT language model is used to capture the grammatical and semantic features of the sentences in the initial text data so as to obtain rich sentence representation information; the sentence representation information is passed through a variational autoencoder to obtain the hidden space parameters, and the target sentence of the sentence to be tested is generated by way of control conditions.
  • The target sentence has a good text representation and is controllable, and can accurately express, through the control conditions, application scenarios such as a required emotional style, similar semantics or similar sentence patterns.
  • an embodiment of the present application provides a device for generating text.
  • the device includes:
  • the obtaining module 31 is used to obtain initial text data
  • the calculation module 32 is configured to calculate the hidden space parameters of the variational autoencoder of the initial text data according to the preset BERT language model;
  • the training module 33 is configured to take the initial text data, the hidden space parameters and initial control conditions as input data, take the control sentences corresponding to the initial text data under the initial control conditions as output data, and use a back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder;
  • the generating module 34 is configured to use the sentence to be tested and the target control condition as input data of the LSTM decoder to generate the target sentence of the sentence to be tested.
  • The present application provides a device for text generation. First, initial text data is obtained; then, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data are calculated; next, the initial text data, the hidden space parameters and the initial control conditions are taken as input data, and the control sentences corresponding to the initial text data under the initial control conditions are taken as output data.
  • A back-propagation-through-time algorithm is used to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder.
  • Finally, the target sentence of the sentence to be tested is generated.
  • A preset BERT language model is used to capture the grammatical and semantic features of the sentences in the initial text data so as to obtain rich sentence representation information; the sentence representation information is passed through a variational autoencoder to obtain the hidden space parameters, and the target sentence of the sentence to be tested is generated by way of control conditions.
  • The target sentence has a good text representation and is controllable, and can accurately express, through the control conditions, application scenarios such as a required emotional style, similar semantics or similar sentence patterns.
  • an embodiment of the present application provides another text generation device.
  • the device includes:
  • the obtaining module 41 is used to obtain initial text data
  • the calculation module 42 is configured to calculate the hidden space parameters of the variational autoencoder of the initial text data according to the preset BERT language model;
  • the training module 43 is configured to take the initial text data, the hidden space parameters and initial control conditions as input data, take the control sentences corresponding to the initial text data under the initial control conditions as output data, and use a back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder;
  • the generating module 44 is configured to use the sentence to be tested and the target control condition as input data of the LSTM decoder to generate the target sentence of the sentence to be tested.
  • calculation module 42 includes:
  • the obtaining unit 421 is configured to map the initial text data to a hidden space through the variational autoencoder according to a preset BERT language model, and obtain hidden space parameters of the hidden space;
  • the reconstruction unit 422 is configured to perform Gaussian resampling in the hidden space to reconstruct the hidden space parameters.
  • the acquiring unit 421 includes:
  • the obtaining subunit 4211 is configured to use the initial text data as the input of the preset BERT language model to obtain the sentence vector of each sentence in the initial text data, and the sentence vector includes a word vector and a position vector;
  • the mapping subunit 4212 is configured to use the sentence vector as the learning parameter of the variational autoencoder to map the initial text data to a hidden space, where the hidden space is a normal distribution space;
  • the searching subunit 4213 is configured to search for hidden space parameters of the hidden space, where the hidden space parameters include the parameter mean and standard deviation of the initial text data.
  • training module 43 includes:
  • the obtaining unit 431 is configured to obtain the spatial dimension of the hidden space
  • the splicing unit 432 is configured to splice the hidden space parameters and the initial control conditions according to the spatial dimensions to generate the hidden initial input of the LSTM decoder;
  • the training unit 433 is configured to take the hidden initial input and the initial text data as input data, take the control sentences corresponding to the initial text data under the initial control conditions as output data, and use a back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder.
  • the splicing unit 432 includes:
  • the mapping subunit 4321 is used to map the hidden space parameters to the hidden tensor
  • a conversion subunit 4322, configured to convert the initial control condition into an intent tensor, where the intent dimension of the intent tensor is the same as the dimension of the hidden tensor;
  • the splicing subunit 4323 is used to splice the hidden tensor and the intent tensor to generate the hidden initial input of the LSTM decoder.
  • the generating module 44 is used to: take the sentence to be tested and the target control condition as the input data of the LSTM decoder to generate similar sentences of the sentence to be tested, where the target control condition is the control condition of the sentence to be tested.
  • Further, the variational autoencoder and the LSTM decoder use the same word vector table.
  • The present application provides a text generation device. First, initial text data is obtained; then, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data are calculated; next, the initial text data, the hidden space parameters and the initial control conditions are taken as input data, and the control sentences corresponding to the initial text data under the initial control conditions are taken as output data.
  • A back-propagation-through-time algorithm is used to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder.
  • The sentence to be tested and the target control condition are then used as the input data of the LSTM decoder to generate the target sentence of the sentence to be tested.
  • A preset BERT language model is used to capture the grammatical and semantic features of the sentences in the initial text data so as to obtain rich sentence representation information; the sentence representation information is passed through a variational autoencoder to obtain the hidden space parameters, and the target sentence of the sentence to be tested is generated by way of control conditions.
  • The target sentence has a good text representation and is controllable, and can accurately express, through the control conditions, application scenarios such as a required emotional style, similar semantics or similar sentence patterns.
  • a computer storage medium stores at least one computer-readable instruction, and the computer-readable instruction can execute the text generation method in any of the foregoing method embodiments.
  • Computer storage media include but are not limited to NandFlash, NorFlash, non-volatile memory (ROM, Flash memory), registers, cache, and memory.
  • FIG. 5 shows a schematic structural diagram of a computer device according to an embodiment of the present application, and the specific embodiment of the present application does not limit the specific implementation of the computer device.
  • the computer device may include a processor 502, a communication interface 504, a memory 506, and a communication bus 508.
  • the processor 502, the communication interface 504, and the memory 506 communicate with each other through the communication bus 508.
  • the communication interface 504 is used to communicate with network elements of other devices, such as clients or other servers.
  • the processor 502 is configured to execute computer-readable instructions 510, and specifically can execute relevant steps in the above-mentioned text generation method embodiment.
  • the computer readable instructions 510 may include program code, and the program code includes computer operation instructions.
  • the processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • the one or more processors included in the computer device may be the same type of processor, such as one or more CPUs, or different types of processors, such as one or more CPUs and one or more ASICs.
  • the memory 506 is used to store computer readable instructions 510.
  • the memory 506 may include a high-speed RAM memory, and may also include a non-volatile memory (non-volatile memory), for example, at least one disk memory.
  • The computer-readable instructions 510 may specifically be used to cause the processor 502 to perform the following operations: obtain initial text data; calculate, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data; train the long short-term memory (LSTM) network decoder with the initial text data, the hidden space parameters and the initial control conditions as training data; and take the sentence to be tested and the target control condition as input data of the LSTM decoder to generate the target sentence of the sentence to be tested.
  • The modules or steps of this application described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices.
  • They can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases the steps can be executed in an order different from that described here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

A method and device for text generation, relating to the field of data processing technology, and solving the prior-art problem that the target text generated from an existing model is inaccurate. The method mainly comprises: acquiring initial text data (101); calculating, according to a preset BERT language model, the hidden space parameters of a variational autoencoder of the initial text data (102); taking the initial text data, the hidden space parameters and initial control conditions as input data, taking the control sentences corresponding to the initial text data under the initial control conditions as output data, and using a back-propagation-through-time algorithm to adjust the weights of a long short-term memory (LSTM) network decoder so as to train the LSTM decoder (103); and taking a sentence to be tested and a target control condition as input data of the LSTM decoder to generate a target sentence of the sentence to be tested (104). The method is mainly applied in the process of similar-text expansion.

Description

Method and device for text generation
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on January 14, 2020, with application number 202010038172.3 and the invention title "Method and device for text generation", the entire contents of which are incorporated into this application by reference.
Technical Field
This application relates to the field of data processing technology, and in particular to a method and device for text generation.
Background
Generated text is text produced in a structuring process, and the form in which that structure is expressed is the surface (phenomenon) text. The quality of generated text is usually judged by its readability and controllability. Readability means that the generated text consists of sentences that conform to the norms of human natural language and whose meaning is clear; controllability means that the desired target sentence can be generated according to parameters set in advance, and that the semantics of the sentence can be changed by adjusting those parameters.
Existing text generation schemes are usually divided into rule-based text generation and neural-network-based text generation. The rule-based approach usually specifies rules manually, using synonym substitution, part-of-speech analysis and other methods; text generated this way is fairly controllable, but its readability is poor and its extensibility weak, and it requires a great deal of manual feature engineering. Neural-network-based methods are mainly divided into text generation with GANs and with VAEs. Because text is composed of discrete characters, it is not differentiable; with a GAN the common approach is to use reinforcement learning to realize back-propagation, but the variance is then relatively large, which hurts the results, whereas the other approach, based on the VAE, is considered friendlier for text generation.
VAE (Variational Auto-Encoder) and GAN (Generative Adversarial Networks) are both generative models. A generative model is a model that can generate samples. The data points of a training set are regarded as samples drawn from some random distribution; for example, each MNIST handwritten-digit image can be regarded as a sample of the random distribution p(x). If a similar random model can be obtained, samples can be generated without restriction; however, the random distribution p(x) must be learned from, or approximated on, the training set. The basic idea of approximating a random distribution is to map a known, controllable random distribution q(z) onto the target random distribution p(x). The variational autoencoder is a typical generative model in the field of deep learning and belongs to the Encoder-Decoder model structure.
In the prior art, a text corpus is acquired according to the text application scenario to build a text corpus set; aligned corpora are then obtained from the text corpus and used as the training corpus of a seq2seq model, the aligned corpora being text corpora that express the same content but carry different emotions; the training corpus is fed into the seq2seq model to train it for emotional-style conversion; finally, target text is obtained according to the application scenario and fed into the trained seq2seq model to obtain the converted corpus of the corresponding emotional style. The seq2seq model has the Encoder-Decoder model structure.
However, the inventors found that in the prior art the text corpus set obtained directly from the application scenario is used as the training corpus; in practice such training corpus data is limited, and a seq2seq model of general applicability cannot be trained from it. As a result, when the emotional style is converted, the converted corpus cannot accurately reflect the solution required by the application scenario; that is, the target text generated by the existing model is inaccurate and differs considerably from the emotional style actually corresponding to the application scenario.
Summary of the Invention
In view of this, the present application provides a method and device for text generation, the main purpose of which is to solve the prior-art problem that the target text generated from an existing model is inaccurate.
According to one aspect of the present application, a method for text generation is provided, comprising:
acquiring initial text data;
calculating, according to a preset BERT language model, the hidden space parameters of a variational autoencoder of the initial text data;
taking the initial text data, the hidden space parameters and initial control conditions as input data, taking the control sentences corresponding to the initial text data under the initial control conditions as output data, and using a back-propagation-through-time algorithm to adjust the weights of a long short-term memory (LSTM) network decoder so as to train the LSTM decoder;
taking a sentence to be tested and a target control condition as input data of the LSTM decoder to generate a target sentence of the sentence to be tested.
According to another aspect of the present application, a device for text generation is provided, comprising:
an obtaining module, used to obtain initial text data;
a calculation module, used to calculate, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data;
a training module, used to take the initial text data, the hidden space parameters and initial control conditions as input data, take the control sentences corresponding to the initial text data under the initial control conditions as output data, and use a back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder;
a generating module, used to take a sentence to be tested and a target control condition as input data of the LSTM decoder to generate a target sentence of the sentence to be tested.
According to yet another aspect of the present application, a computer storage medium is provided, in which at least one computer-readable instruction is stored, the computer-readable instruction causing a processor to perform the operations corresponding to the above text generation method.
According to still another aspect of the present application, a computer device is provided, including: a processor, a memory, a communication interface and a communication bus, the processor, the memory and the communication interface communicating with one another via the communication bus;
the memory is used to store at least one computer-readable instruction, the computer-readable instruction causing the processor to perform the operations corresponding to the above text generation method.
By means of the above technical solutions, the technical solutions provided by the embodiments of the present application have at least the following advantages:
The present application provides a method and device for text generation. First, initial text data is obtained; then, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data are calculated; next, with the initial text data, the hidden space parameters and the initial control conditions as input data and the control sentences corresponding to the initial text data under the initial control conditions as output data, a back-propagation-through-time algorithm is used to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder; finally, the sentence to be tested and the target control condition are taken as input data of the LSTM decoder to generate the target sentence of the sentence to be tested. The embodiments of the present application use a preset BERT language model to capture the grammatical and semantic features of the sentences in the initial text data so as to obtain rich sentence representation information, pass the sentence representation information through a variational autoencoder to obtain the hidden space parameters, and generate the target sentence of the sentence to be tested by way of control conditions. The target sentence has a good text representation and is controllable, and can accurately express, through the control conditions, application scenarios such as a required emotional style, similar semantics or similar sentence patterns.
The above description is merely an overview of the technical solutions of the present application. In order to understand the technical means of the present application more clearly so that they can be implemented in accordance with the contents of the specification, and in order to make the above and other objects, features and advantages of the present application more apparent and comprehensible, specific embodiments of the present application are set forth below.
Overview of the Invention
Technical Problem
Solution to the Problem
Advantageous Effects of the Invention
Brief Description of the Drawings
Description of the Drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be regarded as limiting the present application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Figure 1 shows a flowchart of a method for text generation provided by an embodiment of the present application;
Figure 2 shows a flowchart of another method for text generation provided by an embodiment of the present application;
Figure 3 shows a block diagram of a device for text generation provided by an embodiment of the present application;
Figure 4 shows a block diagram of another device for text generation provided by an embodiment of the present application;
Figure 5 shows a schematic structural diagram of a computer device provided by an embodiment of the present application.
Detailed Description of the Embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
The purpose of the present application is to take an intent as a control condition and generate sentences with the given intent, which can be used to expand the data of an intelligent question-answering knowledge base. An embodiment of the present application provides a method for text generation. As shown in Figure 1, the method includes:
101. Obtain initial text data.
102. Calculate, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data.
The variational autoencoder is an unsupervised-learning neural network model. It takes the raw data as both input and output and contains a hidden layer with fewer units than the input and output layers. Using the seq2seq structure, it encodes text sentences into the hidden space; after encoding, the text sentences can be recovered by the decoder. From the initial text data to the hidden layer, the number of neurons in the neural network model decreases; this process is the encoding process. The purpose of the hidden layer is to extract the main components of the initial text data, and the hidden space parameters are the characteristic parameters of the initial text data.
Before the hidden space parameters of the initial text data are calculated, the initial text data is mapped to the hidden space through the variational autoencoder by means of the preset BERT language model. In the mapping process, the pooled_output of the preset BERT language model is connected to two fully connected layers, which learn the parameter mean and the standard deviation of the hidden space respectively. A relatively low learning rate, for example 5e-5, can be set for the mapping process.
103. With the initial text data, the hidden space parameters and the initial control conditions as input data, and with the control sentences corresponding to the initial text data under the initial control conditions as output data, use a back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder.
From the hidden layer to the output layer the number of neurons increases; this part is called the decoder, i.e., the generation model. In the embodiments of the present application the generation model is an LSTM decoder, which was developed to deal with natural language processing problems. To use the LSTM decoder later, it must be trained with the initial text data, the hidden space parameters, the initial control conditions, and the actual sentences generated from the initial text data under the initial control conditions. In the encoding part, because the hidden layer has fewer units than the input, the data is compressed; in the decoding part, the number of output neurons is larger than in the hidden layer, and the compressed hidden representations are combined with one another to reproduce the original output. In the process of training the LSTM decoder, to minimize the training error, the back-propagation-through-time algorithm is used to adjust the training weights according to the error, so that feeding the initial text data, the hidden space parameters and the initial control conditions through the LSTM decoder can generate the control sentences corresponding to the initial text data under the initial control conditions.
A control condition controls the semantics and style of the generated text by setting the category information of the labeled text together with the features after variation; a control condition is an intent expressed digitally that a computer can recognize. An intent is the purpose of use in an actual application scenario, such as handling business, consulting about business, or filing a complaint. To train the LSTM decoder, initial control conditions and the actual sentences corresponding to those initial control conditions can be set manually for the initial text data, so that the LSTM decoder can generate text with higher controllability.
104. Take the sentence to be tested and the target control condition as input data of the LSTM decoder to generate the target sentence of the sentence to be tested.
In the process of generating the target sentence of the sentence to be tested, the hidden space parameters are not set, which reduces the restrictions on the target sentence, so that the target sentence better meets the target requirements of the sentence to be tested and the control condition. The target control condition may be the purpose of use in an actual application scenario, such as handling business. The target control condition is an intent expressed digitally that a computer can recognize.
Through the LSTM decoder, sentences such as "12|月|份|推|荐|好|友|的|30|元|为|什|么|一|直|没|到|账", "181|天|的|定|期|什|么|时|间|发|售" and "155|##64|##93|##15|##91|注|册|的|推|荐|人|手|机|尾|号|是|250|##1|吗" can be generated. These examples show that the scheme can produce fairly fluent sentences, and that, through training on the control conditions, sentences of the corresponding style can be generated during decoding according to the target control condition.
The present application provides a method for text generation. First, initial text data is obtained; then, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data are calculated; next, with the initial text data, the hidden space parameters and the initial control conditions as input data and the control sentences corresponding to the initial text data under the initial control conditions as output data, a back-propagation-through-time algorithm is used to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder; finally, the sentence to be tested and the target control condition are taken as input data of the LSTM decoder to generate the target sentence of the sentence to be tested. The embodiments of the present application use a preset BERT language model to capture the grammatical and semantic features of the sentences in the initial text data so as to obtain rich sentence representation information, pass the sentence representation information through a variational autoencoder to obtain the hidden space parameters, and generate the target sentence of the sentence to be tested by way of control conditions. The target sentence has a good text representation and is controllable, and can accurately express, through the control conditions, application scenarios such as a required emotional style, similar semantics or similar sentence patterns.
An embodiment of the present application provides another method for text generation. As shown in Figure 2, the method includes:
201. Obtain initial text data.
The initial text data may be randomly obtained text data, text data related to an application scenario, or text data entered by a user in a specific application scenario; the source of the initial text data is not limited in the embodiments of the present application. Exemplarily, the initial text data may be sentences from intelligent question-answering knowledge-base data. The number of sentences in the initial text data is not limited, and each sentence may include Chinese characters, English letters, pinyin symbols or Arabic numerals.
202. According to a preset BERT language model, map the initial text data to the hidden space through the variational autoencoder, and obtain the hidden space parameters of the hidden space.
The variational autoencoder is an unsupervised-learning neural network model. It takes the raw data as both input and output, contains a hidden layer with fewer units than the input and output layers, and uses the seq2seq structure to encode text sentences into the hidden space; after encoding, the text sentences can be recovered by the decoder. Obtaining the hidden space parameters specifically includes: taking the initial text data as the input of the preset BERT language model and obtaining the sentence vector of each sentence in the initial text data, the sentence vector including word vectors and position vectors; taking the sentence vectors as the learning parameters of the variational autoencoder and mapping the initial text data to the hidden space, the hidden space being a normally distributed space; and looking up the hidden space parameters of the hidden space, the hidden space parameters including the parameter mean and the standard deviation of the initial text data. When the sentence vector of each sentence in the initial text data is obtained, the word vector table of the preset BERT language model is used.
The variational autoencoder adopts a neural network structure, so it must be trained before it is used; its training process is not repeated in the embodiments of the present application. A relatively low learning rate, for example 5e-5, can be set for the mapping process.
203. Perform Gaussian resampling in the hidden space to reconstruct the hidden space parameters.
Reconstructing the hidden space parameters essentially means, on the basis of the variational autoencoder, adding "Gaussian noise" to the output hidden space parameters so that robustness to noise is increased during decoding. Gaussian resampling is performed in the hidden space and the hidden space parameters are obtained anew; the new hidden space parameters are the input data for the subsequent training of the LSTM decoder.
204. Obtain the spatial dimension of the hidden space.
The spatial dimension refers to the kinds of data that characterize the hidden space; exemplarily, if two kinds of data, the mean and the standard deviation, are used to represent the hidden space, the number of spatial dimensions is 2.
205. According to the spatial dimension, splice the hidden space parameters and the initial control conditions to generate the hidden initial input of the LSTM decoder.
In order to train the LSTM decoder better, part of the training data is specially processed before training: the hidden space parameters and the initial control conditions are spliced to generate the hidden initial input. Generating the hidden initial input specifically includes: mapping the hidden space parameters to a hidden tensor; converting the initial control condition into an intent tensor whose intent dimension is the same as the dimension of the hidden tensor; and splicing the hidden tensor and the intent tensor to generate the hidden initial input of the LSTM decoder.
Exemplarily, if the spatial dimension of the hidden space is H, the number of sentences in the initial text data is M and the number of intents in the initial control conditions is N, a tensor of size [N, H] is randomly defined, in which each intent corresponds to an H-dimensional tensor; the tensor of the intent is spliced with the H-dimensional tensor of the reconstructed hidden space parameters to obtain the hidden initial input of the LSTM decoder.
206. With the hidden initial input and the initial text data as input data, and with the control sentences corresponding to the initial text data under the initial control conditions as output data, use a back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder.
In the process of training the LSTM decoder, to minimize the training error, the back-propagation-through-time algorithm is used to adjust the training weights according to the error, so that feeding the initial text data, the hidden space parameters and the initial control conditions through the LSTM decoder can generate the control sentences corresponding to the initial text data under the initial control conditions.
To ensure that the encoding and decoding results lie in the same space, the word vector table used in the LSTM decoder is the same as the word vector table used in the variational autoencoder. To keep the encoding and decoding processes in step, a relatively large learning rate is set for the decoding process, so that the encoding part changes as little as possible; this corresponds to the lower learning rate of 5e-5 set for the encoding process. The learning rate in the process of training the LSTM decoder can be 0.01.
In addition, the KL error coefficient is tied to the global step of the training process: as the number of global steps increases, the KL error coefficient gradually increases until it reaches 1 and then increases no further, so as to prevent the KL divergence from decreasing too quickly because of a small KL error coefficient, which would leave the LSTM decoder decoupled from the hidden space and uncontrolled.
207. Take the sentence to be tested and the target control condition as input data of the LSTM decoder to generate the target sentence of the sentence to be tested.
In the process of generating the target sentence of the sentence to be tested, the hidden space parameters are not set, which reduces the restrictions on the target sentence, so that the target sentence better meets the target requirements of the sentence to be tested and the control condition. The target control condition may be the purpose of use in an actual application scenario, such as handling business. The target control condition is an intent expressed digitally that a computer can recognize.
When the target control condition is the control condition of the sentence to be tested, the sentence to be tested and the target control condition are taken as input data of the LSTM decoder to generate similar sentences of the sentence to be tested, the target control condition being the control condition of the sentence to be tested.
The present application provides a method for text generation. First, initial text data is obtained; then, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data are calculated; next, with the initial text data, the hidden space parameters and the initial control conditions as input data and the control sentences corresponding to the initial text data under the initial control conditions as output data, a back-propagation-through-time algorithm is used to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder; finally, the sentence to be tested and the target control condition are taken as input data of the LSTM decoder to generate the target sentence of the sentence to be tested. The embodiments of the present application use a preset BERT language model to capture the grammatical and semantic features of the sentences in the initial text data so as to obtain rich sentence representation information, pass the sentence representation information through a variational autoencoder to obtain the hidden space parameters, and generate the target sentence of the sentence to be tested by way of control conditions. The target sentence has a good text representation and is controllable, and can accurately express, through the control conditions, application scenarios such as a required emotional style, similar semantics or similar sentence patterns.
Further, as an implementation of the method shown in Figure 1 above, an embodiment of the present application provides a device for text generation. As shown in Figure 3, the device includes:
an obtaining module 31, configured to obtain initial text data;
a calculation module 32, configured to calculate, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data;
a training module 33, configured to take the initial text data, the hidden space parameters and initial control conditions as input data, take the control sentences corresponding to the initial text data under the initial control conditions as output data, and use a back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder;
a generating module 34, configured to take the sentence to be tested and the target control condition as input data of the LSTM decoder to generate the target sentence of the sentence to be tested.
The present application provides a device for text generation. First, initial text data is obtained; then, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data are calculated; next, with the initial text data, the hidden space parameters and the initial control conditions as input data and the control sentences corresponding to the initial text data under the initial control conditions as output data, a back-propagation-through-time algorithm is used to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder; finally, the sentence to be tested and the target control condition are taken as input data of the LSTM decoder to generate the target sentence of the sentence to be tested. The embodiments of the present application use a preset BERT language model to capture the grammatical and semantic features of the sentences in the initial text data so as to obtain rich sentence representation information, pass the sentence representation information through a variational autoencoder to obtain the hidden space parameters, and generate the target sentence of the sentence to be tested by way of control conditions. The target sentence has a good text representation and is controllable, and can accurately express, through the control conditions, application scenarios such as a required emotional style, similar semantics or similar sentence patterns.
Further, as an implementation of the method shown in Figure 2 above, an embodiment of the present application provides another device for text generation. As shown in Figure 4, the device includes:
an obtaining module 41, configured to obtain initial text data;
a calculation module 42, configured to calculate, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data;
a training module 43, configured to take the initial text data, the hidden space parameters and initial control conditions as input data, take the control sentences corresponding to the initial text data under the initial control conditions as output data, and use a back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder;
a generating module 44, configured to take the sentence to be tested and the target control condition as input data of the LSTM decoder to generate the target sentence of the sentence to be tested.
Further, the calculation module 42 includes:
an obtaining unit 421, configured to map, according to a preset BERT language model, the initial text data to the hidden space through the variational autoencoder and obtain the hidden space parameters of the hidden space;
a reconstruction unit 422, configured to perform Gaussian resampling in the hidden space to reconstruct the hidden space parameters.
Further, the obtaining unit 421 includes:
an obtaining subunit 4211, configured to take the initial text data as the input of the preset BERT language model and obtain the sentence vector of each sentence in the initial text data, the sentence vector including word vectors and position vectors;
a mapping subunit 4212, configured to take the sentence vectors as the learning parameters of the variational autoencoder and map the initial text data to the hidden space, the hidden space being a normally distributed space;
a searching subunit 4213, configured to look up the hidden space parameters of the hidden space, the hidden space parameters including the parameter mean and the standard deviation of the initial text data.
Further, the training module 43 includes:
an obtaining unit 431, configured to obtain the spatial dimension of the hidden space;
a splicing unit 432, configured to splice, according to the spatial dimension, the hidden space parameters and the initial control conditions to generate the hidden initial input of the LSTM decoder;
a training unit 433, configured to take the hidden initial input and the initial text data as input data, take the control sentences corresponding to the initial text data under the initial control conditions as output data, and use a back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder.
Further, the splicing unit 432 includes:
a mapping subunit 4321, configured to map the hidden space parameters to a hidden tensor;
a conversion subunit 4322, configured to convert the initial control condition into an intent tensor, the intent dimension of the intent tensor being the same as the dimension of the hidden tensor;
a splicing subunit 4323, configured to splice the hidden tensor and the intent tensor to generate the hidden initial input of the LSTM decoder.
Further, the generating module 44 is configured to:
take the sentence to be tested and the target control condition as input data of the LSTM decoder to generate similar sentences of the sentence to be tested, the target control condition being the control condition of the sentence to be tested.
Further, the variational autoencoder and the LSTM decoder use the same word vector table.
The present application provides a device for text generation. First, initial text data is obtained; then, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data are calculated; next, with the initial text data, the hidden space parameters and the initial control conditions as input data and the control sentences corresponding to the initial text data under the initial control conditions as output data, a back-propagation-through-time algorithm is used to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder; finally, the sentence to be tested and the target control condition are taken as input data of the LSTM decoder to generate the target sentence of the sentence to be tested. The embodiments of the present application use a preset BERT language model to capture the grammatical and semantic features of the sentences in the initial text data so as to obtain rich sentence representation information, pass the sentence representation information through a variational autoencoder to obtain the hidden space parameters, and generate the target sentence of the sentence to be tested by way of control conditions. The target sentence has a good text representation and is controllable, and can accurately express, through the control conditions, application scenarios such as a required emotional style, similar semantics or similar sentence patterns.
According to an embodiment of the present application, a computer storage medium is provided; the computer storage medium stores at least one computer-readable instruction, and the computer-readable instruction can execute the text generation method in any of the above method embodiments. Computer storage media include, but are not limited to, NAND flash, NOR flash, non-volatile memory (ROM, flash memory), registers, cache and memory.
Figure 5 shows a schematic structural diagram of a computer device provided according to an embodiment of the present application; the specific embodiments of the present application do not limit the specific implementation of the computer device.
As shown in Figure 5, the computer device may include: a processor 502, a communication interface 504, a memory 506 and a communication bus 508.
The processor 502, the communication interface 504 and the memory 506 communicate with one another via the communication bus 508.
The communication interface 504 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 502 is used to execute the computer-readable instructions 510, and specifically can execute the relevant steps in the above embodiments of the text generation method.
Specifically, the computer-readable instructions 510 may include program code, and the program code includes computer operation instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The one or more processors included in the computer device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 506 is used to store the computer-readable instructions 510. The memory 506 may include a high-speed RAM memory and may also include a non-volatile memory, for example at least one disk memory.
The computer-readable instructions 510 may specifically be used to cause the processor 502 to perform the following operations:
obtain initial text data;
calculate, according to a preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data;
train the long short-term memory (LSTM) network decoder with the initial text data, the hidden space parameters and the initial control conditions as training data;
take the sentence to be tested and the target control condition as input data of the LSTM decoder to generate the target sentence of the sentence to be tested.
Obviously, those skilled in the art should understand that the modules or steps of the present application described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices; optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases the steps shown or described can be executed in an order different from that described here, or they can be made into individual integrated-circuit modules, or multiple modules or steps among them can be made into a single integrated-circuit module. In this way, the present application is not limited to any specific combination of hardware and software.
The above descriptions are only preferred embodiments of the present application and are not intended to limit the present application; for those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the protection scope of the present application.

Claims (20)

  1. A method for text generation, comprising:
    acquiring initial text data;
    calculating, according to a preset BERT language model, hidden space parameters of a variational autoencoder of the initial text data;
    taking the initial text data, the hidden space parameters and initial control conditions as input data, taking control sentences corresponding to the initial text data under the initial control conditions as output data, and using a back-propagation-through-time algorithm to adjust weights of a long short-term memory (LSTM) network decoder so as to train the LSTM decoder;
    taking a sentence to be tested and a target control condition as input data of the LSTM decoder to generate a target sentence of the sentence to be tested.
  2. The method according to claim 1, wherein calculating, according to the preset BERT language model, the hidden space parameters of the variational autoencoder of the initial text data comprises:
    mapping, according to the preset BERT language model, the initial text data to a hidden space through the variational autoencoder, and obtaining hidden space parameters of the hidden space;
    performing Gaussian resampling in the hidden space to reconstruct the hidden space parameters.
  3. The method according to claim 2, wherein mapping, according to the preset BERT language model, the initial text data to the hidden space through the variational autoencoder and obtaining the hidden space parameters of the hidden space comprises:
    taking the initial text data as input of the preset BERT language model, and obtaining a sentence vector of each sentence in the initial text data, the sentence vector comprising word vectors and position vectors;
    taking the sentence vectors as learning parameters of the variational autoencoder, and mapping the initial text data to the hidden space, the hidden space being a normally distributed space;
    looking up the hidden space parameters of the hidden space, the hidden space parameters comprising a parameter mean and a standard deviation of the initial text data.
  4. The method according to claim 2, wherein taking the initial text data, the hidden space parameters and the initial control conditions as input data, taking the control sentences corresponding to the initial text data under the initial control conditions as output data, and using the back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder comprises:
    obtaining a spatial dimension of the hidden space;
    splicing, according to the spatial dimension, the hidden space parameters and the initial control conditions to generate a hidden initial input of the LSTM decoder;
    taking the hidden initial input and the initial text data as input data, taking the control sentences corresponding to the initial text data under the initial control conditions as output data, and using the back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder.
  5. The method according to claim 4, wherein splicing, according to the spatial dimension, the hidden space parameters and the initial control conditions to generate the hidden initial input of the LSTM decoder comprises:
    mapping the hidden space parameters to a hidden tensor;
    converting the initial control condition into an intent tensor, an intent dimension of the intent tensor being the same as the dimension of the hidden tensor;
    splicing the hidden tensor and the intent tensor to generate the hidden initial input of the LSTM decoder.
  6. The method according to claim 1, wherein taking the sentence to be tested and the target control condition as input data of the LSTM decoder to generate the target sentence of the sentence to be tested comprises:
    taking the sentence to be tested and the target control condition as input data of the LSTM decoder to generate similar sentences of the sentence to be tested, the target control condition being the control condition of the sentence to be tested.
  7. The method according to claim 1, wherein the variational autoencoder and the LSTM decoder use the same word vector table.
  8. A device for text generation, comprising:
    an obtaining module, configured to obtain initial text data;
    a calculation module, configured to calculate, according to a preset BERT language model, hidden space parameters of a variational autoencoder of the initial text data;
    a training module, configured to take the initial text data, the hidden space parameters and initial control conditions as input data, take control sentences corresponding to the initial text data under the initial control conditions as output data, and use a back-propagation-through-time algorithm to adjust weights of a long short-term memory (LSTM) network decoder so as to train the LSTM decoder;
    a generating module, configured to take a sentence to be tested and a target control condition as input data of the LSTM decoder to generate a target sentence of the sentence to be tested.
  9. A computer storage medium, wherein at least one computer-readable instruction is stored in the computer storage medium, the computer-readable instruction causing a processor to perform the following operations:
    acquiring initial text data;
    calculating, according to a preset BERT language model, hidden space parameters of a variational autoencoder of the initial text data;
    taking the initial text data, the hidden space parameters and initial control conditions as input data, taking control sentences corresponding to the initial text data under the initial control conditions as output data, and using a back-propagation-through-time algorithm to adjust weights of a long short-term memory (LSTM) network decoder so as to train the LSTM decoder;
    taking a sentence to be tested and a target control condition as input data of the LSTM decoder to generate a target sentence of the sentence to be tested.
  10. The computer storage medium according to claim 9, wherein the computer-readable instruction further causes the processor to perform the following operations:
    mapping, according to the preset BERT language model, the initial text data to a hidden space through the variational autoencoder, and obtaining hidden space parameters of the hidden space;
    performing Gaussian resampling in the hidden space to reconstruct the hidden space parameters.
  11. The computer storage medium according to claim 10, wherein the computer-readable instruction further causes the processor to perform the following operations:
    taking the initial text data as input of the preset BERT language model, and obtaining a sentence vector of each sentence in the initial text data, the sentence vector comprising word vectors and position vectors;
    taking the sentence vectors as learning parameters of the variational autoencoder, and mapping the initial text data to the hidden space, the hidden space being a normally distributed space;
    looking up the hidden space parameters of the hidden space, the hidden space parameters comprising a parameter mean and a standard deviation of the initial text data.
  12. The computer storage medium according to claim 10, wherein the computer-readable instruction further causes the processor to perform the following operations:
    obtaining a spatial dimension of the hidden space;
    splicing, according to the spatial dimension, the hidden space parameters and the initial control conditions to generate a hidden initial input of the LSTM decoder;
    taking the hidden initial input and the initial text data as input data, taking the control sentences corresponding to the initial text data under the initial control conditions as output data, and using the back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder.
  13. The computer storage medium according to claim 12, wherein the computer-readable instruction further causes the processor to perform the following operations:
    mapping the hidden space parameters to a hidden tensor;
    converting the initial control condition into an intent tensor, an intent dimension of the intent tensor being the same as the dimension of the hidden tensor;
    splicing the hidden tensor and the intent tensor to generate the hidden initial input of the LSTM decoder.
  14. The computer storage medium according to claim 9, wherein the computer-readable instruction further causes the processor to perform the following operation:
    taking the sentence to be tested and the target control condition as input data of the LSTM decoder to generate similar sentences of the sentence to be tested, the target control condition being the control condition of the sentence to be tested.
  15. A computer device, comprising: a processor, a memory, a communication interface and a communication bus, the processor, the memory and the communication interface communicating with one another via the communication bus;
    the memory being used to store at least one computer-readable instruction, the computer-readable instruction causing the processor to perform the following operations:
    acquiring initial text data;
    calculating, according to a preset BERT language model, hidden space parameters of a variational autoencoder of the initial text data;
    taking the initial text data, the hidden space parameters and initial control conditions as input data, taking control sentences corresponding to the initial text data under the initial control conditions as output data, and using a back-propagation-through-time algorithm to adjust weights of a long short-term memory (LSTM) network decoder so as to train the LSTM decoder;
    taking a sentence to be tested and a target control condition as input data of the LSTM decoder to generate a target sentence of the sentence to be tested.
  16. The computer device according to claim 15, wherein the computer-readable instruction further causes the processor to perform the following operations:
    mapping, according to the preset BERT language model, the initial text data to a hidden space through the variational autoencoder, and obtaining hidden space parameters of the hidden space;
    performing Gaussian resampling in the hidden space to reconstruct the hidden space parameters.
  17. The computer device according to claim 16, wherein the computer-readable instruction further causes the processor to perform the following operations:
    taking the initial text data as input of the preset BERT language model, and obtaining a sentence vector of each sentence in the initial text data, the sentence vector comprising word vectors and position vectors;
    taking the sentence vectors as learning parameters of the variational autoencoder, and mapping the initial text data to the hidden space, the hidden space being a normally distributed space;
    looking up the hidden space parameters of the hidden space, the hidden space parameters comprising a parameter mean and a standard deviation of the initial text data.
  18. The computer device according to claim 16, wherein the computer-readable instruction further causes the processor to perform the following operations:
    obtaining a spatial dimension of the hidden space;
    splicing, according to the spatial dimension, the hidden space parameters and the initial control conditions to generate a hidden initial input of the LSTM decoder;
    taking the hidden initial input and the initial text data as input data, taking the control sentences corresponding to the initial text data under the initial control conditions as output data, and using the back-propagation-through-time algorithm to adjust the weights of the long short-term memory (LSTM) network decoder so as to train the LSTM decoder.
  19. The computer device according to claim 18, wherein the computer-readable instruction further causes the processor to perform the following operations:
    mapping the hidden space parameters to a hidden tensor;
    converting the initial control condition into an intent tensor, an intent dimension of the intent tensor being the same as the dimension of the hidden tensor;
    splicing the hidden tensor and the intent tensor to generate the hidden initial input of the LSTM decoder.
  20. The computer device according to claim 15, wherein the computer-readable instruction further causes the processor to perform the following operation:
    taking the sentence to be tested and the target control condition as input data of the LSTM decoder to generate similar sentences of the sentence to be tested, the target control condition being the control condition of the sentence to be tested.
PCT/CN2020/093450 2020-01-14 2020-05-29 Method and device for text generation WO2021143022A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010038172.3A CN111241789A (zh) 2020-01-14 2020-01-14 Method and device for text generation
CN202010038172.3 2020-01-14

Publications (1)

Publication Number Publication Date
WO2021143022A1 true WO2021143022A1 (zh) 2021-07-22

Family

ID=70874506

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093450 WO2021143022A1 (zh) 2020-01-14 2020-05-29 Method and device for text generation

Country Status (2)

Country Link
CN (1) CN111241789A (zh)
WO (1) WO2021143022A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569016A (zh) * 2021-09-27 2021-10-29 北京语言大学 一种基于Bert模型的专业术语提取方法及装置
CN113704480A (zh) * 2021-11-01 2021-11-26 成都我行我数科技有限公司 一种智能最小库存量单位匹配方法
CN116432663A (zh) * 2023-06-12 2023-07-14 山东山大鸥玛软件股份有限公司 基于要素简图的可控多样性专业文本生成方法及系统
CN116597049A (zh) * 2023-07-17 2023-08-15 北京奇虎科技有限公司 文本生成方法、装置、设备及存储介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287641B (zh) * 2020-12-25 2021-03-09 上海旻浦科技有限公司 一种同义句生成方法、系统、终端及存储介质
CN113420129B (zh) * 2021-05-08 2022-11-18 天津大学 一种基于大型通用预训练模型控制对话生成的方法
CN113656573B (zh) * 2021-08-27 2024-02-06 北京大数医达科技有限公司 文本信息生成方法、装置、终端设备
CN115811630B (zh) * 2023-02-09 2023-05-02 成都航空职业技术学院 一种基于人工智能的教育信息化方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271629A (zh) * 2018-09-07 2019-01-25 中山大学 基于强化学习的生成式文本摘要方法
CN109885673A (zh) * 2019-02-13 2019-06-14 北京航空航天大学 一种基于预训练语言模型的自动文本摘要方法
CN110188331A (zh) * 2019-06-03 2019-08-30 腾讯科技(深圳)有限公司 模型训练方法、对话系统评价方法、装置、设备及存储介质
CN110210032A (zh) * 2019-05-31 2019-09-06 北京神州泰岳软件股份有限公司 文本处理方法及装置
US20190318040A1 (en) * 2018-04-16 2019-10-17 International Business Machines Corporation Generating cross-domain data using variational mapping between embedding spaces

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108959256B (zh) * 2018-06-29 2023-04-07 北京百度网讯科技有限公司 短文本的生成方法、装置、存储介质和终端设备
CN108984524A (zh) * 2018-07-05 2018-12-11 北京理工大学 一种基于变分神经网络主题模型的标题生成方法
CN109582952B (zh) * 2018-10-31 2022-09-02 腾讯科技(深圳)有限公司 诗歌生成方法、装置、计算机设备和介质
CN110427490B (zh) * 2019-07-03 2021-11-09 华中科技大学 一种基于自注意力机制的情感对话生成方法与装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190318040A1 (en) * 2018-04-16 2019-10-17 International Business Machines Corporation Generating cross-domain data using variational mapping between embedding spaces
CN109271629A (zh) * 2018-09-07 2019-01-25 中山大学 基于强化学习的生成式文本摘要方法
CN109885673A (zh) * 2019-02-13 2019-06-14 北京航空航天大学 一种基于预训练语言模型的自动文本摘要方法
CN110210032A (zh) * 2019-05-31 2019-09-06 北京神州泰岳软件股份有限公司 文本处理方法及装置
CN110188331A (zh) * 2019-06-03 2019-08-30 腾讯科技(深圳)有限公司 模型训练方法、对话系统评价方法、装置、设备及存储介质

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569016A (zh) * 2021-09-27 2021-10-29 北京语言大学 一种基于Bert模型的专业术语提取方法及装置
CN113704480A (zh) * 2021-11-01 2021-11-26 成都我行我数科技有限公司 一种智能最小库存量单位匹配方法
CN113704480B (zh) * 2021-11-01 2022-01-25 成都我行我数科技有限公司 一种智能最小库存量单位匹配方法
CN116432663A (zh) * 2023-06-12 2023-07-14 山东山大鸥玛软件股份有限公司 基于要素简图的可控多样性专业文本生成方法及系统
CN116432663B (zh) * 2023-06-12 2023-10-13 山东山大鸥玛软件股份有限公司 基于要素简图的可控多样性专业文本生成方法及系统
CN116597049A (zh) * 2023-07-17 2023-08-15 北京奇虎科技有限公司 文本生成方法、装置、设备及存储介质
CN116597049B (zh) * 2023-07-17 2023-10-31 北京奇虎科技有限公司 文本生成方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN111241789A (zh) 2020-06-05

Similar Documents

Publication Publication Date Title
WO2021143022A1 (zh) 一种文本生成的方法及装置
KR102048030B1 (ko) 자동화 어시스턴트와의 단대단 다국어 통신 촉진
US11354521B2 (en) Facilitating communications with automated assistants in multiple languages
WO2020143137A1 (zh) 基于受限文本空间的多步自注意力跨媒体检索方法及系统
WO2022188734A1 (zh) 一种语音合成方法、装置以及可读存储介质
US11699074B2 (en) Training sequence generation neural networks using quality scores
WO2020050893A1 (en) Natural language question answering
WO2020124674A1 (zh) 向量化译员的翻译个性特征的方法及装置
WO2021082427A1 (zh) 韵律控制的诗词生成方法、装置、设备及存储介质
CN112417092B (zh) 基于深度学习的智能化文本自动生成系统及其实现方法
WO2024099144A1 (zh) 下游任务模型生成及任务执行的方法和设备
JP2022006173A (ja) 知識事前訓練モデルの訓練方法、装置及び電子機器
WO2023045184A1 (zh) 一种文本类别识别方法、装置、计算机设备及介质
CN114064865A (zh) 在远程交互中检测词汇技能级别并校正未对准
WO2024040831A1 (zh) 自然语言处理方法及装置、电子设备和存储介质
Shi et al. Language chatbot–the design and implementation of english language transfer learning agent apps
CN115906815A (zh) 一种用于修改一种或多种类型错误句子的纠错方法及装置
WO2019218809A1 (zh) 一种篇章级文本翻译方法及装置
CN113326367B (zh) 基于端到端文本生成的任务型对话方法和系统
Mathur et al. A scaled‐down neural conversational model for chatbots
US20230317058A1 (en) Spoken language processing method and apparatus, and storage medium
WO2023245523A1 (zh) 用于生成训练数据的方法以及装置
CN113822044A (zh) 语法纠错数据生成方法、装置、计算机设备及存储介质
Nie et al. Graph neural net-based user simulator
CN116562275B (zh) 一种结合实体属性图的自动文本摘要方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20913991

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20913991

Country of ref document: EP

Kind code of ref document: A1