WO2020151175A1 - Text generation method and device, computer device and storage medium - Google Patents

Text generation method and device, computer device and storage medium

Info

Publication number
WO2020151175A1
WO2020151175A1 (PCT/CN2019/092519, CN2019092519W)
Authority
WO
WIPO (PCT)
Prior art keywords
word vector
text
attention matrix
neural network
preset
Prior art date
Application number
PCT/CN2019/092519
Other languages
English (en)
Chinese (zh)
Inventor
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020151175A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G06F40/56 Natural language generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/44 Statistical methods, e.g. probability models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the technical field of text generation, and in particular to a text generation method, device, computer equipment, and computer-readable storage medium.
  • Text generation refers to the generation of new character text given language model parameters and text fragments.
  • Traditional text generation models are based on the recurrent neural network.
  • A recurrent neural network (Recurrent Neural Network, RNN) is a recursive neural network that takes sequence (Sequence) data as input, recurses along the evolution direction of the sequence, and connects all of its nodes (recurrent units) in a chain. Because a recurrent neural network generates text by recursing over the sequence, the training efficiency of such a text generation model is low.
  • the embodiments of the present application provide a text generation method, device, computer equipment, and computer-readable storage medium, which can solve the problem of low training efficiency during text generation model training in the traditional technology.
  • an embodiment of the present application provides a text generation method, the method including: obtaining an initial text for text generation and a preset prediction vocabulary; performing word embedding on the initial text and the preset prediction vocabulary respectively, so as to convert the initial text into a first word vector and convert the preset prediction vocabulary into a second word vector; passing the first word vector and the second word vector through corresponding convolutional neural networks to obtain a first attention matrix of the first word vector and a second attention matrix of the second word vector, respectively; multiplying the first attention matrix and the second attention matrix to obtain a third attention matrix; and normalizing the third attention matrix and matching it with the preset prediction vocabulary to generate predicted text.
  • an embodiment of the present application also provides a text generation device, the device including: an acquisition unit configured to acquire an initial text and a preset prediction vocabulary for text generation; a conversion unit configured to perform word embedding on the initial text and the preset prediction vocabulary respectively, so as to convert the initial text into a first word vector and the preset prediction vocabulary into a second word vector; a convolution unit configured to pass the first word vector and the second word vector through corresponding convolutional neural networks to obtain a first attention matrix of the first word vector and a second attention matrix of the second word vector, respectively; an obtaining unit configured to multiply the first attention matrix and the second attention matrix to obtain a third attention matrix; and a matching unit configured to normalize the third attention matrix and match it with the preset prediction vocabulary to generate predicted text.
  • an embodiment of the present application also provides a computer device, which includes a memory and a processor, the memory stores a computer program, and the processor implements the text generation method when the computer program is executed.
  • the embodiments of the present application also provide a computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor executes the text generation method.
  • FIG. 1 is a schematic diagram of an application scenario of a text generation method provided by an embodiment of the application
  • FIG. 2 is a schematic flowchart of a text generation method provided by an embodiment of the application.
  • FIG. 3 is a schematic diagram of word vectors in a text generation method provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of a corresponding model in the text generation method provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of a sub-process in the text generation method provided by an embodiment of the application.
  • FIG. 6 is a schematic block diagram of a text generation apparatus provided by an embodiment of the application.
  • FIG. 7 is another schematic block diagram of a text generation apparatus provided by an embodiment of the application.
  • FIG. 8 is a schematic block diagram of a computer device provided by an embodiment of the application.
  • FIG. 1 is a schematic diagram of an application scenario of a text generation method provided by an embodiment of the application.
  • the application scenarios include:
  • the terminal may be an electronic device such as a notebook computer, a tablet computer, or a desktop computer.
  • the terminal application environment shown in FIG. 1 may also be replaced with computer equipment such as a server.
  • where the application environment in FIG. 1 is a server, the server may be a server cluster or a cloud server.
  • the server cluster can also adopt a distributed system, and the servers of the distributed system can include a master server and a slave server, so that the master server uses the obtained initial text to perform the steps of the text generation method, and the slave server can be used to store a large amount of generated data.
  • The working process of each subject in FIG. 1 is as follows: the terminal obtains the initial text and the preset prediction vocabulary for text generation; performs word embedding on the initial text and the preset prediction vocabulary to convert the initial text into a first word vector and the preset prediction vocabulary into a second word vector; passes the first word vector and the second word vector through corresponding convolutional neural networks to obtain a first attention matrix of the first word vector and a second attention matrix of the second word vector, respectively; multiplies the first attention matrix and the second attention matrix to obtain a third attention matrix; and normalizes the third attention matrix and matches it with the preset prediction vocabulary to generate predicted text.
  • FIG. 1 only shows a desktop computer as a terminal.
  • the type of terminal is not limited to that shown in FIG. 1.
  • the terminal may also be an electronic device such as a mobile phone, a notebook computer, or a tablet computer.
  • the application scenarios of the above text generation method are only used to illustrate the technical solutions of this application, and are not used to limit the technical solutions of this application.
  • Fig. 2 is a schematic flowchart of a text generation method provided by an embodiment of the application.
  • the text generation method is applied to the terminal in FIG. 1 to complete all or part of the functions of the text generation method.
  • Please refer to FIG. 2 to FIG. 4: FIG. 2 is a schematic flowchart of a text generation method provided by an embodiment of the present application, FIG. 3 is a schematic diagram of a word vector in the text generation method, and FIG. 4 is a schematic diagram of a corresponding model in the text generation method. As shown in FIG. 2, the method includes the following steps S210-S250:
  • the initial text refers to the text input by the user through the input device.
  • the content input by the user next is predicted based on the content input by the user to generate recommended text content matched with the input initial text, thereby improving the text input efficiency of the user.
  • the preset prediction vocabulary refers to a preset vocabulary selection range for generating prediction text.
  • the preset prediction vocabulary can be updated according to the content input by the user, and the preset prediction vocabulary is updated by recording and storing the user's common language to improve the accuracy of prediction, thereby improving the efficiency of text generation.
  • the embodiment of the present application builds a text generation model based on a multi-scale parallel convolutional neural network: the convolutional neural network is used to analyze the input initial text to obtain the text information of the initial text, and to analyze the correlation between the preset prediction vocabulary and the initial text.
  • Specifically, the word vector of the initial text is convolved and then normalized to obtain the first attention matrix, and the word vector of the preset prediction vocabulary is convolved and then normalized to obtain the second attention matrix.
  • The first attention matrix and the second attention matrix are multiplied to obtain the word vector of the predicted text, and then the word vector of the predicted text is normalized and matched with the preset prediction vocabulary to generate the predicted text.
  • the terminal obtains the initial text for text generation, for example, the text input by the user through the input device.
  • Based on the input text, the text generation model predicts, through the convolutional neural network, the predicted text associated with the input text from the preset prediction vocabulary; the predicted text can be a predicted word, a predicted sentence, or a paragraph, and the output is the text generation result.
  • S220 Perform word embedding on the initial text and the preset predicted vocabulary respectively to convert the initial text into a first word vector and convert the preset predicted vocabulary into a second word vector.
  • Word embedding (in English, Word Embedding) is a general term for methods of mapping words to real-valued vectors; words with similar meanings obtain similar representations.
  • The structural layer where the word embedding is performed is called the word embedding layer, or embedding layer for short (in English, Embedding layer).
  • Word embedding is a class of techniques in which a single word is represented as a real-valued vector in a predefined vector space, so that each word is mapped to a vector.
  • FIG. 3 is a schematic diagram of a word vector in a text generation method provided by an embodiment of the application.
  • the terminal converts the initial text and the preset prediction vocabulary into corresponding word vectors through the word embedding layer in the text generation model, that is, encodes the input natural language into word vectors.
  • the initial text is transformed into a first word vector
  • the preset predicted vocabulary is transformed into a second word vector, in preparation for text generation.
  • Using word vectors in this way can make the processing more than 100 times faster. When pre-trained word vectors are used, they can be handled in a static way or a non-static way.
  • The static method means that the parameters of the word vectors are no longer adjusted while the text generation model is trained.
  • The non-static method adjusts the parameters of the word vectors during training of the text generation model, so the result of the non-static method is generally better than that of the static method.
  • a trained preset word vector dictionary can be used to embed the initial text to convert the initial text into a word vector.
  • the word vector can be a Word2Vec pre-trained word vector, that is, each vocabulary has a corresponding vector representation, which can express vocabulary information in the form of data, and the word vector dimension can be 300.
  • Word2vec (in English, Word to vector) is a software tool for training word vectors; it is used to generate word vector models.
  • the automatic training of word vectors can be implemented through the Gensim library in Python.
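  • As a hedged illustration only (the toy corpus, the token lists, and the helper function below are assumptions, not the implementation of this application), the word-embedding step described above could be sketched with the Gensim library as follows:

```python
# Illustrative sketch: train 300-dimensional word vectors with Gensim and embed
# an initial text and a preset prediction vocabulary as matrices of word vectors.
from gensim.models import Word2Vec
import numpy as np

# Assumed toy corpus; in practice a large text collection would be used.
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["a", "journey", "of", "a", "thousand", "miles"]]

# vector_size=300 matches the 300-dimensional word vectors mentioned above
# (Gensim 4.x parameter name; earlier versions use size=300).
model = Word2Vec(sentences=corpus, vector_size=300, window=5, min_count=1)

def embed(tokens, w2v):
    """Map a token sequence to a (len(tokens), 300) matrix of word vectors."""
    return np.stack([w2v.wv[t] for t in tokens if t in w2v.wv])

first_word_vector = embed(["the", "cat", "sat"], model)         # from the initial text
second_word_vector = embed(["mat", "journey", "miles"], model)  # from the prediction vocabulary
print(first_word_vector.shape, second_word_vector.shape)        # (3, 300) (3, 300)
```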
  • A convolutional neural network (in English, Convolutional Neural Network, CNN) is a type of feedforward neural network (Feedforward Neural Network) that contains convolution or related computations and has a deep structure; it is one of the representative algorithms of deep learning (Deep Learning). Because convolutional neural networks can perform shift-invariant classification (in English, Shift-Invariant Classification), they are also called "shift-invariant artificial neural networks" (in English, Shift-Invariant Artificial Neural Networks, SIANN).
  • Attention is also known as the attention mechanism, attention model, or attention structure (in English, Attention Model).
  • the attention model in natural language processing draws on the concept of human attention.
  • Visual attention is a brain signal processing mechanism unique to human vision: human vision quickly scans the global image to obtain the target area that needs to be focused on, commonly referred to as the focus of attention, then devotes more attention resources to this area to obtain more detailed information about the target of interest while suppressing other useless information. Human visual attention greatly improves the efficiency and accuracy of visual information processing.
  • The attention in the embodiments of this application is essentially similar to human selective visual attention: its core goal is to select, from a large amount of information, the information that is more critical to the current task.
  • w and b respectively represent the parameters of the linear relationship between x and y, and w and b can be adjusted during the training process.
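  • Written out (a standard form, not a quotation from this application, since x and y are not further specified here), such a linear relationship is simply:

```latex
y = w x + b
```

  • where w is the weight and b the bias, both of which are adjusted during training as stated above.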
  • the attention matrix refers to the matrix after the weights are allocated in the matrix.
  • The function of the convolutional layer is to extract features from the input data. It contains multiple convolution kernels; each element of a convolution kernel corresponds to a weight coefficient and a bias, and convolution is performed in matrix form, so the attention matrix is generated after convolution and weight allocation.
  • After the terminal receives the initial text input by the user, the initial text is word-embedded through the word embedding layer to obtain the first word vector. The terminal then convolves the first word vector using the first convolutional layer: the first word vector is convolved by a first convolutional neural network and normalized by the Softmax function to obtain the first word vector probability of the first word vector; at the same time, the first word vector is convolved by the first convolutional neural network in the first convolutional layer to obtain a convolved first word vector, and the first word vector probability and the convolved first word vector are multiplied to obtain the first attention matrix.
  • After receiving the preset prediction vocabulary, the terminal performs word embedding on the preset prediction vocabulary through the word embedding layer to obtain the second word vector. The terminal then convolves the second word vector using the second convolutional layer: the second word vector is convolved by a second convolutional neural network and normalized by the Softmax function to obtain the second word vector probability of the second word vector; at the same time, the second word vector is convolved by the second convolutional neural network in the second convolutional layer to obtain a convolved second word vector, and the second word vector probability and the convolved second word vector are multiplied to obtain the second attention matrix.
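  • A minimal NumPy sketch of this construction is given below (the shapes, kernels and the normalization axis are assumptions made for the example, not the patented implementation): one convolution branch is Softmax-normalized into probabilities and then multiplied element-wise with a second convolution branch of the same word vectors.

```python
# Sketch: build an attention matrix from a word-vector matrix by multiplying a
# softmax-normalized convolution branch with a plain convolution branch.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv1d(word_vectors, kernel):
    """'Same'-padded 1-D convolution over the sequence axis (odd kernel length assumed)."""
    seq_len, dim = word_vectors.shape
    k = len(kernel)
    padded = np.pad(word_vectors, ((k // 2, k // 2), (0, 0)))
    out = np.zeros_like(word_vectors)
    for i in range(seq_len):
        out[i] = np.tensordot(kernel, padded[i:i + k], axes=(0, 0))
    return out

def attention_matrix(word_vectors, kernel_a, kernel_b):
    probs = softmax(conv1d(word_vectors, kernel_a))  # convolved, then Softmax-normalized
    feats = conv1d(word_vectors, kernel_b)           # plain convolution branch
    return probs * feats                             # element-wise product

first_word_vector = np.random.randn(5, 8)            # toy "initial text" word vectors
first_attention = attention_matrix(first_word_vector, np.ones(3) / 3, np.ones(1))
print(first_attention.shape)                         # (5, 8)
```

  • The second attention matrix would be obtained in the same way from the second word vector.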
  • Matrix multiplication (in English, Matrix multiplication) here refers to the general matrix product.
  • the first attention matrix and the second attention matrix are multiplied to obtain a third attention matrix.
  • normalization is a way to simplify calculations, that is, a dimensional expression is transformed into a dimensionless expression and becomes a scalar.
  • the English name is Normalization.
  • the normalization method has two forms, one is to change the number to a decimal between (0, 1), and the other is to change the dimensional expression to a dimensionless expression. It is mainly proposed for the convenience of data processing. It is more convenient and faster to map the data to the range of 0 to 1.
  • Commonly used normalization functions include the Softmax function.
  • The Softmax function, or normalized exponential function, is an extension of the logistic function; it "compresses" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector σ(z), such that each element lies in the range (0, 1) and all elements sum to 1.
  • The Softmax function is in fact the gradient log-normalization of a finite discrete probability distribution.
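  • In standard notation (a textbook definition, not a quotation from this application), the Softmax function described above can be written as:

```latex
\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K,
```

  • so that each element σ(z)_j lies in (0, 1) and all K elements sum to 1.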
  • When the third attention matrix is normalized, the weight of each vector in the third attention matrix, that is, the probability of each vector, is obtained.
  • Specifically, the third attention matrix is normalized by the Softmax function to obtain the third word vector; the third word vector is then matched with the vectors of the preset prediction vocabulary, thereby converting each vector in the third attention matrix into a natural language word of the preset prediction vocabulary to generate the predicted text.
  • the output of the generated predictive text is processed by the attention mechanism in the text generation model. The output is the generated text content.
  • The text generation model constructs the output content through the convolutional layer, the first attention layer, and the second attention layer, where the second attention layer is a fully connected network structure whose output function is Softmax, used to limit the range of attention, so that the data adjusted by the attention weights is fed into the convolutional layer to obtain the prediction object, and the resulting word vector is matched with a dictionary composed of the preset prediction vocabulary to determine the predicted text to be output.
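  • The combination of the two attention matrices and the vocabulary matching described above could be sketched as follows; the shapes, the transpose in the matrix product, and the toy vocabulary are assumptions made for the example.

```python
# Sketch: general matrix product of the two attention matrices, Softmax
# normalization of the result, and matching back to the preset prediction vocabulary.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

preset_vocabulary = ["cat", "dog", "mat", "journey", "step"]          # assumed toy vocabulary

first_attention = np.random.randn(4, 8)                               # (initial-text length, features)
second_attention = np.random.randn(len(preset_vocabulary), 8)         # (vocabulary size, features)

third_attention = first_attention @ second_attention.T                # general matrix product, (4, 5)
third_word_vector = softmax(third_attention, axis=-1)                 # per-position vocabulary probabilities

next_word_probs = third_word_vector[-1]                               # prediction for the next position
predicted_text = preset_vocabulary[int(np.argmax(next_word_probs))]   # match back to the vocabulary
print(predicted_text, float(next_word_probs.max()))
```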
  • the text generation model must be trained first.
  • The loss function of the text generation model is cross-entropy, and the training method is ADAM with a learning rate of 0.001, where ADAM (in English, Adaptive Moment Estimation) is adaptive moment estimation.
  • The learning rate (in English, Learning Rate) is used to control the learning progress of the model.
  • the training of the neural network is implemented through the Tensorflow library in Python.
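  • A minimal TensorFlow/Keras sketch of this training configuration is given below; only the optimizer (ADAM with learning rate 0.001) and the cross-entropy loss follow the description above, while the layer sizes, the vocabulary size and the commented training call are placeholders.

```python
# Illustrative training setup only; the architecture is a simplified stand-in
# for the text generation model described in this application.
import tensorflow as tf

VOCAB_SIZE = 1000   # assumed size of the preset prediction vocabulary
EMBED_DIM = 300     # word vector dimension mentioned above

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.Conv1D(128, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # ADAM, learning rate 0.001
    loss="sparse_categorical_crossentropy",                   # cross-entropy loss
    metrics=["accuracy"],
)

# model.fit(input_token_ids, next_word_ids, epochs=10)        # placeholder training call
```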
  • the trained text generation model can be used for user input word prediction.
  • In the embodiment of the present application, the initial text and the preset prediction vocabulary for text generation are obtained; word embedding is performed on the initial text and the preset prediction vocabulary respectively, so as to convert the initial text into a first word vector and the preset prediction vocabulary into a second word vector; the first word vector and the second word vector are passed through corresponding convolutional neural networks to obtain a first attention matrix of the first word vector and a second attention matrix of the second word vector, respectively; the first attention matrix and the second attention matrix are multiplied to obtain a third attention matrix; and the third attention matrix is normalized and matched with the preset prediction vocabulary to generate the predicted text. In this way, according to the input text, the information is refined through the convolutional neural network, and predicted text with strong association attributes is generated within the preset prediction vocabulary.
  • The text generation model established in the embodiment of the present application exploits the parallel computing characteristics of the multi-scale convolutional neural network and therefore has higher training efficiency.
  • Further, the step of respectively passing the first word vector and the second word vector through corresponding convolutional neural networks to obtain the first attention matrix of the first word vector and the second attention matrix of the second word vector includes:
  • S510 Perform convolution on the first word vector by a first convolutional neural network, and obtain the first word vector probability of the first word vector after normalization;
  • S540 Convolve the second word vector through a second convolutional neural network and obtain a second word vector probability of the second word vector after normalization;
  • the terminal needs to first establish a first convolutional neural network and a second convolutional neural network to capture the information of the word vector through the convolutional neural network to obtain the word vector relationship between the word vectors.
  • the text generation model needs to perform the next input prediction based on the content that the user has input. Since input prediction may depend on one or more words that have been input, the text generation model sets up a multi-dimensional convolution kernel to capture local information of the input text.
  • The text generation model includes two parallel convolutional layers, a first convolutional layer and a second convolutional layer, that is, the convolutional layer to which the first convolutional neural network belongs and the convolutional layer to which the second convolutional neural network belongs.
  • Each of the first convolutional layer and the second convolutional layer includes two parallel sub-convolutional layers: in each convolutional layer, the output of one sub-convolutional layer is mapped by the Softmax function and then multiplied with the output of the other sub-convolutional layer, so each convolutional layer achieves information extraction by establishing two sub-convolution kernels and performing a dot multiplication. Dot multiplication is also called the inner product or scalar product of vectors.
  • The text generation model has preset prediction words established in advance. For example, 1000 words may be established as optional prediction words, and the preset prediction words are converted into second word vectors by word vector conversion in the embedding layer.
  • The text generation model uses the word embedding layer to convert the text into word vectors, which enter the first convolutional layer to which the first convolutional neural network belongs. The first convolutional neural network convolves the first word vector, which is then normalized by the Softmax function to obtain the first word vector probability of the first word vector. At the same time, the first word vector is convolved by the first convolutional neural network to obtain a convolved first word vector, and the first word vector probability and the convolved first word vector are multiplied to obtain the first attention matrix.
  • The kernel height in the first convolutional layer may take two values, so the layer may include two types of convolutional neural networks, of dimension 1 and dimension 3, and each type of convolutional neural network has 128 channels.
  • The first convolutional neural network before the Softmax normalization in the first convolutional layer and the first convolutional neural network that produces the convolved first word vector in the first convolutional layer may be the same or different.
  • If they are the same, the convolutional neural network before the Softmax normalization and the convolutional neural network that produces the convolved first word vector in the first convolutional layer may both be 1-dimensional convolutional neural networks or both be 3-dimensional convolutional neural networks.
  • If they are different, for example, the convolutional neural network before the Softmax normalization in the first convolutional layer may be a 3-dimensional convolutional neural network while the convolutional neural network that produces the convolved first word vector in the first convolutional layer is a 1-dimensional convolutional neural network.
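  • A hedged Keras sketch of this multi-scale, parallel layout is shown below: two Conv1D branches with kernel sizes 1 and 3 and 128 channels each are applied to the same word-vector input, and one branch is Softmax-normalized and multiplied with the other, as in the gating described earlier; combining exactly these two branches in this way is an assumption made for the example.

```python
# Sketch of the two kernel heights (1 and 3), each with 128 channels, applied in
# parallel to the same word-vector input.
import tensorflow as tf

word_vectors = tf.keras.Input(shape=(None, 300))            # (sequence length, embedding dim)

branch_k1 = tf.keras.layers.Conv1D(128, kernel_size=1, padding="same")(word_vectors)
branch_k3 = tf.keras.layers.Conv1D(128, kernel_size=3, padding="same")(word_vectors)

gate = tf.keras.layers.Softmax(axis=-1)(branch_k3)          # Softmax-mapped sub-convolution
attention = tf.keras.layers.Multiply()([gate, branch_k1])   # multiplied with the other branch

multi_scale = tf.keras.Model(word_vectors, attention)
multi_scale.summary()
```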
  • When the terminal receives the preset prediction vocabulary, it embeds the preset prediction vocabulary through the word embedding layer to obtain the second word vector, and then convolves the second word vector with the second convolutional layer: the second word vector is convolved by the second convolutional neural network and normalized by the Softmax function to obtain the second word vector probability of the second word vector; at the same time, the second word vector is convolved by the second convolutional neural network in the second convolutional layer to obtain a convolved second word vector, and the second word vector probability is multiplied by the convolved second word vector to obtain the second attention matrix.
  • The goal of the embodiments of this application is to predict input words. Since the initial text input to the text generation model is of variable length, the text generation model produces two matrices from the variable-length training text and the optional preset prediction vocabulary, namely the first attention matrix and the second attention matrix, and multiplies these two matrices to obtain the third attention matrix. The third attention matrix is then mapped through the Softmax function, the vectors of the third attention matrix are ranked in descending order of probability, and words with higher probability are output as predicted words to generate the predicted text, which increases the accuracy of the text output and improves the efficiency of user input.
  • the step of performing convolution on the first word vector by a first convolutional neural network and normalizing to obtain the first word vector probability of the first word vector includes:
  • the step of performing convolution on the second word vector by a second convolutional neural network and normalization to obtain the second word vector probability of the second word vector includes:
  • the second word vector is convolved by a second convolutional neural network, and the short-term information and long-term information of the second word vector are captured after normalization to obtain the second word vector probability.
  • the text generation model sets up a multi-dimensional convolution kernel to capture the local information of the input text
  • the local information refers to the information of the word vector
  • The information of the word vectors refers to the association information between the vocabulary sequences contained in the input text; it can also be understood as the sequence information of the input text, used to describe the contextual relationships of the input text and how words combine to form a specific meaning.
  • the probability of the above-mentioned "cat” and “dog” collocation is higher than the probability of " ⁇ ” and "love” collocation.
  • Fixed idioms in Chinese reflect the corresponding contextual relationships and sequence information; for example, when "a journey of a thousand miles" appears, it is usually accompanied by text such as "begins with a single step".
  • the word vector information includes short-term information and long-term information.
  • Short-term information refers to sequence information spanning fewer than a preset number of words, and can also be called short-term sequence information; long-term information refers to sequence information spanning the preset number of words or more, and can be called long-term sequence information.
  • For example, short-term information is information in a text embodied in word groups of one or two words, and long-term information is sequence information in a text embodied in word groups of three words, four words, or more.
  • The first word vector is convolved by the first convolutional neural network and, after normalization, the short-term information and long-term information of the first word vector are captured to obtain the first word vector probability; likewise, the second convolutional neural network convolves the second word vector and, after normalization, captures the short-term information and long-term information of the second word vector to obtain the second word vector probability.
  • the short-term information and the long-term information are calculated through the convolutional neural network.
  • the method further includes:
  • the step of matching the third word vector with a preset predicted vocabulary to generate predicted text includes:
  • the preset number of third word vectors that have been filtered out are matched with a preset prediction vocabulary to generate a preset number of prediction texts.
  • A preset number of third word vectors are selected in order of the third word vector probability from high to low, and the selected third word vectors are then matched with the preset prediction vocabulary to generate the preset number of predicted texts. For example, if it is set in advance to generate 5 predicted words, the 5 third word vectors with the highest probabilities are selected and matched with the preset prediction vocabulary to generate a predicted text of 5 words, and the generated text composed of the 5 predicted words is output. In this way it is not necessary to match and output all possible predicted texts, which reduces the amount of data processing and improves the efficiency of text prediction.
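  • A minimal sketch of this screening step (the vocabulary and the probabilities are toy values, not data from this application) is:

```python
# Select the preset number (here 5) of highest-probability entries and map them
# back to the preset prediction vocabulary, from high to low probability.
import numpy as np

preset_vocabulary = np.array(["cat", "dog", "mat", "sat", "step", "mile", "book"])
third_word_vector_probs = np.array([0.05, 0.30, 0.10, 0.02, 0.25, 0.20, 0.08])

PRESET_NUMBER = 5
top_indices = np.argsort(third_word_vector_probs)[::-1][:PRESET_NUMBER]
predicted_texts = preset_vocabulary[top_indices]

for word, prob in zip(predicted_texts, third_word_vector_probs[top_indices]):
    print(f"{word}: {prob:.2f}")   # displayed in descending order of probability
```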
  • the method further includes:
  • The generated predicted text is displayed in a horizontal or vertical row, ordered from high to low by the probability of the corresponding third word vector. For example, if it is set in advance to generate 5 predicted words, the 5 third word vectors with the highest probabilities are selected and matched with the preset prediction vocabulary to generate a predicted text of 5 words, and the generated 5 predicted words are displayed in a horizontal or vertical row arranged according to the probability of the corresponding third word vector.
  • FIG. 6 is a schematic block diagram of a text generating apparatus provided by an embodiment of the application.
  • an embodiment of the present application also provides a text generation device.
  • the text generation device includes a unit for executing the above-mentioned text generation method, and the device can be configured in a computer device such as a terminal or a server.
  • the text generation device 600 includes an acquisition unit 601, a conversion unit 602, a convolution unit 603, an obtaining unit 604 and a matching unit 605.
  • the obtaining unit 601 is configured to obtain the initial text and preset prediction vocabulary for text generation
  • a conversion unit 602 configured to perform word embedding on the initial text and the preset predicted vocabulary respectively to convert the initial text into a first word vector and convert the preset predicted vocabulary into a second word vector;
  • the convolution unit 603 is configured to pass the first word vector and the second word vector through corresponding convolutional neural networks to obtain the first attention matrix of the first word vector and the second attention matrix of the second word vector, respectively;
  • the obtaining unit 604 is configured to multiply the first attention matrix and the second attention matrix to obtain a third attention matrix
  • the matching unit 605 is configured to normalize the third attention matrix and match the preset predicted vocabulary to generate predicted text.
  • FIG. 7 is another schematic block diagram of the text generating apparatus provided by an embodiment of the application.
  • the convolution unit 603 includes:
  • the first convolution subunit 6031 is configured to convolve the first word vector through a first convolutional neural network and obtain the first word vector probability of the first word vector after normalization;
  • the second convolution subunit 6032 is configured to convolve the first word vector through the first convolutional neural network to obtain a convolved first word vector;
  • the first multiplication subunit 6033 is configured to multiply the first word vector probability and the convolved first word vector to obtain a first attention matrix;
  • the third convolution subunit 6034 is configured to convolve the second word vector through a second convolutional neural network and obtain the second word vector probability of the second word vector after normalization;
  • a fourth convolution subunit 6035 configured to convolve the second word vector through the second convolutional neural network to obtain a convolved second word vector
  • the second multiplication subunit 6036 is configured to multiply the second word vector probability and the convolved second word vector to obtain a second attention matrix.
  • the first convolution subunit 6031 is configured to convolve the first word vector through a first convolutional neural network, and after normalization, capture the short-term information of the first word vector Information and long-term information to get the probability of the first word vector;
  • the third convolution subunit 6034 is configured to convolve the second word vector through a second convolutional neural network, and after normalization, capture the short-term information and long-term information of the second word vector to obtain the second Word vector probability.
  • the matching unit 605 includes:
  • a normalization subunit 6051 configured to normalize the third attention matrix to obtain a third word vector
  • the matching subunit 6053 is configured to match the third word vector with a preset prediction vocabulary to generate a prediction text.
  • the matching unit 605 further includes:
  • the screening subunit 6052 is configured to screen out a preset number of third word vectors according to the probability of the third word vector from high to low;
  • the matching subunit 6053 is configured to match the preset number of third word vectors that have been screened out with preset predicted words to generate a preset number of predicted texts.
  • the normalization subunit 6051 is used to normalize the third attention matrix by a Softmax function to obtain a third word vector.
  • the text generating device 600 further includes:
  • the display unit 606 is configured to display the preset number of predictive texts in a preset manner.
  • The division and connection of the units in the text generation device are only used for illustration.
  • In other embodiments, the text generation device may be divided into different units as needed, or the units in the text generation device may be connected in a different order or manner to complete all or part of the functions of the text generation device described above.
  • the above-mentioned text generating apparatus can be implemented in the form of a computer program, and the computer program can be run on the computer device as shown in FIG. 8.
  • FIG. 8 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the computer device 800 may be a computer device such as a desktop computer or a server, or may be a component or component in other devices.
  • the computer device 800 includes a processor 802, a memory, and a network interface 805 connected through a system bus 801, where the memory may include a nonvolatile storage medium 803 and an internal memory 804.
  • the non-volatile storage medium 803 can store an operating system 8031 and a computer program 8032.
  • the processor 802 can execute one of the above-mentioned text generation methods.
  • the processor 802 is used to provide calculation and control capabilities to support the operation of the entire computer device 800.
  • the internal memory 804 provides an environment for the operation of the computer program 8032 in the non-volatile storage medium 803.
  • the processor 802 can execute one of the foregoing text generation methods.
  • the network interface 805 is used for network communication with other devices.
  • the structure shown in FIG. 8 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 800 to which the solution of the present application is applied.
  • the specific computer device 800 may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.
  • the computer device may only include a memory and a processor. In such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 8 and will not be repeated here.
  • the processor 802 is configured to run a computer program 8032 stored in the memory to implement the text generation method in the embodiment of the present application.
  • the processor 802 may be a central processing unit (Central Processing Unit, CPU), and the processor 802 may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSPs), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor executes the steps of the text generation method described in the above embodiments.
  • the computer-readable storage medium may be a USB flash drive, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk or an optical disk, and other computer-readable storage media that can store computer programs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

According to embodiments, the present invention relates to a text generation method and apparatus, a computer device, and a computer-readable storage medium, relating to the technical field of text generation. In the embodiments of the present invention, when text generation is carried out, an initial text for text generation and a preset prediction vocabulary are acquired; word embedding is performed on the initial text and the preset prediction vocabulary respectively to convert the initial text into a first word vector and the preset prediction vocabulary into a second word vector; a first attention matrix of the first word vector and a second attention matrix of the second word vector are obtained from the first word vector and the second word vector respectively through a corresponding convolutional neural network; the first attention matrix is multiplied by the second attention matrix to produce a third attention matrix; and the third attention matrix is normalized and then matched with the preset prediction vocabulary to generate a predicted text.
PCT/CN2019/092519 2019-01-23 2019-06-24 Text generation method and device, computer device and storage medium WO2020151175A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910064116.4A CN109918630B (zh) 2019-01-23 2019-01-23 Text generation method, apparatus, computer device and storage medium
CN201910064116.4 2019-01-23

Publications (1)

Publication Number Publication Date
WO2020151175A1 true WO2020151175A1 (fr) 2020-07-30

Family

ID=66960501

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/092519 WO2020151175A1 (fr) 2019-01-23 2019-06-24 Text generation method and device, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN109918630B (fr)
WO (1) WO2020151175A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183057A (zh) * 2020-09-16 2021-01-05 北京思源智通科技有限责任公司 Article generation method, device, intelligent equipment and storage medium
CN112561474A (zh) * 2020-12-14 2021-03-26 华南理工大学 Intelligent personality trait evaluation method based on multi-source data fusion

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918630B (zh) * 2019-01-23 2023-08-04 平安科技(深圳)有限公司 Text generation method, apparatus, computer device and storage medium
CN110427456A (zh) * 2019-06-26 2019-11-08 平安科技(深圳)有限公司 Word association method and device
CN110442767B (zh) * 2019-07-31 2023-08-18 腾讯科技(深圳)有限公司 Method and device for determining content interaction platform tags, and readable storage medium
CN111061867B (zh) * 2019-10-29 2022-10-25 平安科技(深圳)有限公司 Quality-aware text generation method, device, storage medium and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180293499A1 (en) * 2017-04-11 2018-10-11 Sap Se Unsupervised neural attention model for aspect extraction
CN108829719A (zh) * 2018-05-07 2018-11-16 中国科学院合肥物质科学研究院 Non-factoid question answering answer selection method and system
CN108845990A (zh) * 2018-06-12 2018-11-20 北京慧闻科技发展有限公司 Answer selection method, device and electronic equipment based on a bidirectional attention mechanism
CN109918630A (zh) * 2019-01-23 2019-06-21 平安科技(深圳)有限公司 Text generation method, apparatus, computer device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9317482B2 (en) * 2012-10-14 2016-04-19 Microsoft Technology Licensing, Llc Universal FPGA/ASIC matrix-vector multiplication architecture
CN108664632B (zh) * 2018-05-15 2021-09-21 华南理工大学 Text sentiment classification algorithm based on a convolutional neural network and attention mechanism
CN109034378B (zh) * 2018-09-04 2023-03-31 腾讯科技(深圳)有限公司 Network representation generation method and device for a neural network, storage medium, and equipment
CN109241536B (zh) * 2018-09-21 2020-11-06 浙江大学 Sentence ordering method based on a deep learning self-attention mechanism

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180293499A1 (en) * 2017-04-11 2018-10-11 Sap Se Unsupervised neural attention model for aspect extraction
CN108829719A (zh) * 2018-05-07 2018-11-16 中国科学院合肥物质科学研究院 Non-factoid question answering answer selection method and system
CN108845990A (zh) * 2018-06-12 2018-11-20 北京慧闻科技发展有限公司 Answer selection method, device and electronic equipment based on a bidirectional attention mechanism
CN109918630A (zh) * 2019-01-23 2019-06-21 平安科技(深圳)有限公司 Text generation method, apparatus, computer device and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183057A (zh) * 2020-09-16 2021-01-05 北京思源智通科技有限责任公司 Article generation method, device, intelligent equipment and storage medium
CN112561474A (zh) * 2020-12-14 2021-03-26 华南理工大学 Intelligent personality trait evaluation method based on multi-source data fusion
CN112561474B (zh) * 2020-12-14 2024-04-30 华南理工大学 Intelligent personality trait evaluation method based on multi-source data fusion

Also Published As

Publication number Publication date
CN109918630A (zh) 2019-06-21
CN109918630B (zh) 2023-08-04

Similar Documents

Publication Publication Date Title
WO2020151175A1 (fr) Text generation method and device, computer device and storage medium
WO2020140403A1 (fr) Text classification method and apparatus, computer device and storage medium
WO2022007823A1 (fr) Text data processing method and device
CN107836000B (zh) Improved artificial neural network method and electronic device for language modeling and prediction
CN109214386B (zh) Method and apparatus for generating an image recognition model
CN109299344B (zh) Ranking model generation method, search result ranking method, apparatus and device
CN111368993B (zh) Data processing method and related device
US10755048B2 (en) Artificial intelligence based method and apparatus for segmenting sentence
EP4209965A1 (fr) Data processing method and related device
US20180174037A1 (en) Suggesting resources using context hashing
WO2020140632A1 (fr) Hidden feature extraction method and apparatus, computer device and storage medium
US20210390370A1 (en) Data processing method and apparatus, storage medium and electronic device
WO2022001724A1 (fr) Data processing method and device
WO2021034941A1 (fr) Multi-modal retrieval and clustering using deep CCA and active pairwise queries
CN110968725B (zh) Image content description information generation method, electronic device and storage medium
CN112529149B (zh) Data processing method and related apparatus
US20240046067A1 (en) Data processing method and related device
JP6743942B2 (ja) Vocabulary table selection method, apparatus, and computer-readable storage medium
US20220383119A1 (en) Granular neural network architecture search over low-level primitives
CN113011532A (zh) Classification model training method and apparatus, computing device and storage medium
WO2020143303A1 (fr) Deep learning model training method and device, computer apparatus and storage medium
US20240152770A1 (en) Neural network search method and related device
WO2024114659A1 (fr) Summary generation method and related device
WO2021253938A1 (fr) Neural network training method and apparatus, and video recognition method and apparatus
WO2021012691A1 (fr) Image retrieval method and device

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.11.2021).

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19911911

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 19911911

Country of ref document: EP

Kind code of ref document: A1