CN111428014A - Non-autoregressive conversational speech generation method and model based on maximum mutual information - Google Patents

Non-autoregressive conversational speech generation method and model based on maximum mutual information

Info

Publication number
CN111428014A
CN111428014A (application CN202010185621.7A)
Authority
CN
China
Prior art keywords
sentence
probability
reply
autoregressive
mutual information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010185621.7A
Other languages
Chinese (zh)
Inventor
韩庆宏 (Han Qinghong)
李纪为 (Li Jiwei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiangnong Huiyu Technology Co ltd
Original Assignee
Beijing Xiangnong Huiyu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiangnong Huiyu Technology Co ltd filed Critical Beijing Xiangnong Huiyu Technology Co ltd
Priority to CN202010185621.7A priority Critical patent/CN111428014A/en
Publication of CN111428014A publication Critical patent/CN111428014A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a non-autoregressive dialog generation method and model based on maximum mutual information, belonging to the technical field of machine dialog. The non-autoregressive dialog generation method based on maximum mutual information comprises the following steps: encoding an input first previous sentence through a forward encoder to obtain a first feature vector of the first previous sentence; decoding the first feature vector through a forward decoder to obtain reply sentences of the previous sentence, and calculating a first probability of each reply sentence; encoding each reply sentence through a backward encoder to obtain a second feature vector; decoding the second feature vector through a backward decoder to obtain a second previous sentence of the reply sentence, and calculating a second probability that the first previous sentence appears as the second previous sentence; and calculating the sum of the first probability and the second probability, and selecting the reply sentence for which the sum is maximum. By combining a non-autoregressive method with the maximum mutual information criterion, the invention achieves a balance between efficiency and effect in the dialog generation process.

Description

Non-autoregressive conversational speech generation method and model based on maximum mutual information
Technical Field
The invention relates to the technical field of machine dialog, and in particular to a non-autoregressive dialog generation method and model based on maximum mutual information.
Background
In the prior art, most dialog generation has used an "autoregressive" generation mode: in the dialog generation process, dialog content is generated word by word, with the current word generated based on all previously generated words, until a sentence is formed. For example, to generate the sentence "I like cats", an autoregressive model first generates "I", then generates "like" based on "I", and finally generates "cats" based on "I like". Expressed in terms of probability:

$p(\text{I like cats}) = p(\text{I}) \cdot p(\text{like} \mid \text{I}) \cdot p(\text{cats} \mid \text{I, like})$

A clear disadvantage of this method is that generation is particularly slow when the sentences to be generated are long, since only one word can be generated at a time. This drawback is especially pronounced in dialog generation.
The non-autoregressive generation method instead generates several, or even all, words at once. For example, when generating the sentence "I like cats", the three words can be generated in a single step, represented by the probability

$p(\text{I like cats}) = p(\text{I}) \cdot p(\text{like}) \cdot p(\text{cats})$

where the generation of each word does not depend on the other words, so the model can emit the whole sentence in one pass. The non-autoregressive generation method can thus greatly improve generation efficiency, but it has the disadvantage that the generated words are uncorrelated, so the generated sentences can be very poor and fail to meet the accuracy requirements of dialog generation; for example, "I I I" or "like like like" may be generated instead of the correct sentence "I like cats".
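As a minimal sketch of the two factorizations (the `log_prob` interface below is an illustrative assumption, not something specified by the invention):

```python
def autoregressive_log_prob(model, words):
    """log p(w_1) + log p(w_2 | w_1) + ...: one word at a time,
    each conditioned on all previously generated words."""
    total = 0.0
    for t, word in enumerate(words):
        total += model.log_prob(word, prefix=words[:t])
    return total

def non_autoregressive_log_prob(model, words):
    """log p(w_1) + log p(w_2) + ...: every word scored independently,
    so the whole sentence can be emitted in one parallel step."""
    return sum(model.log_prob(word, prefix=[]) for word in words)
```

The speed gain of the second form comes precisely from dropping the prefix dependency, which is also what lets it produce incoherent outputs such as "like like like".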
Disclosure of Invention
The invention mainly solves the technical problem of providing a non-autoregressive dialog generation method and model based on maximum mutual information, which accelerate dialog generation, improve generation efficiency, strengthen the correlation between the preceding text and the reply during dialog generation, and improve the accuracy of dialog generation.
In order to achieve the above object, the first technical solution adopted by the present invention is to provide a non-autoregressive dialog generation method based on maximum mutual information, comprising the following steps: encoding an input first previous sentence through a first encoder to obtain a first feature vector of the first previous sentence; decoding the first feature vector through a first decoder to obtain reply sentences of the previous sentence, and calculating a first probability of each reply sentence; encoding each reply sentence through a second encoder to obtain a second feature vector; decoding the second feature vector through a second decoder to obtain a second previous sentence of the reply sentence, and calculating a second probability that the first previous sentence appears as the second previous sentence; and calculating the sum of the first probability and the second probability, and selecting the reply sentence for which the sum is maximum.
In order to achieve the above object, the second technical solution adopted by the present invention is to provide a non-autoregressive dialog generation model based on maximum mutual information, comprising: a reply sentence generation module, which generates reply sentences from an input first previous sentence through first encoding and first decoding; a previous sentence generation module, which generates a second previous sentence from each reply sentence through second encoding and second decoding; a probability operation module, which calculates the first probability of each reply sentence generated from the first previous sentence, calculates the second probability that the second previous sentence generated from the reply sentence is the first previous sentence, and computes the sum of the two; and a reply sentence extraction module, which compares the sums and selects the reply sentence corresponding to the maximum sum.
The invention has the following beneficial effects: in application, a non-autoregressive dialog generation mode is used, which improves dialog generation efficiency, while maximum mutual information is used to capture the correlation in dialog generation, improving generation quality and achieving a balance between efficiency and effect.
Drawings
FIG. 1 is a flow chart diagram of a non-autoregressive dialog generation method based on maximum mutual information according to the present invention;
FIG. 2 is a structural diagram of the non-autoregressive dialog generating model based on maximum mutual information according to the present invention.
Detailed Description
The following detailed description of preferred embodiments of the present invention, taken in conjunction with the accompanying drawings, is intended to make the advantages and features of the invention easier for those skilled in the art to understand, and to define the scope of protection of the invention more clearly.
It is noted that the terms "first" and "second" in the claims and the description of the present application are used to distinguish between similar elements and do not necessarily describe a particular sequential or chronological order.
In one embodiment of the present invention, as shown in FIG. 1, the non-autoregressive dialog generation method based on maximum mutual information according to the present invention includes the following steps:
Step S101, generating a reply sentence.
In one embodiment of the present invention, the forward encoder encodes the preceding dialog into a corresponding feature vector, and the forward decoder decodes that feature vector to generate a reply sentence. The reply sentence is generated in a non-autoregressive mode, i.e. all words constituting the reply sentence are generated simultaneously in one step, so the reply can be produced quickly even when the dialog is long, improving generation efficiency. For comparison, when generating the sentence "I like cats", an autoregressive model first generates "I", then generates "like" based on "I", and finally generates "cats" based on "I like". Expressed in terms of probability:

$p(\text{I like cats}) = p(\text{I}) \cdot p(\text{like} \mid \text{I}) \cdot p(\text{cats} \mid \text{I, like})$

A clear disadvantage of this approach is that generation is particularly slow when the sentences to be generated are long. In the non-autoregressive mode, the three words can be generated at once, i.e.

$p(\text{I like cats}) = p(\text{I}) \cdot p(\text{like}) \cdot p(\text{cats})$

where each word is generated independently of the others, so the model can emit them all in a single pass. The non-autoregressive generation mode therefore greatly improves generation efficiency.
In one embodiment of the present invention, when a previous sentence X is input, a number of candidate reply sentences Y are generated by the forward encoder, the forward decoder and the non-autoregressive generation mode. According to the probability $p(Y \mid X)$ of each generated reply, K sentences are sampled, denoted

$Y_1, Y_2, \ldots, Y_K$

with probabilities expressed as

$p(Y_1 \mid X), p(Y_2 \mid X), \ldots, p(Y_K \mid X)$

In summary, when a previous sentence X is input, the probability of obtaining a reply sentence is expressed as $p(Y_i \mid X)$, where the subscript i denotes the i-th reply sentence.
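A sketch of this forward step, assuming a hypothetical `forward_model` object whose one-shot `sample` method encodes X and decodes an entire reply non-autoregressively, returning the reply together with its log-probability (this interface is illustrative, not specified by the patent):

```python
def sample_reply_candidates(forward_model, previous_sentence_x, k=10):
    """Draw K candidate replies Y_1..Y_K and keep log p(Y_i | X) for each."""
    candidates = []
    for _ in range(k):
        # one-shot, non-autoregressive draw of a full reply
        reply_y, log_p_y_given_x = forward_model.sample(previous_sentence_x)
        candidates.append((reply_y, log_p_y_given_x))
    return candidates
```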
Step S102, generating the previous sentence.
In one embodiment of the present invention, the generated reply sentence is encoded by the backward encoder into a corresponding feature vector, and the backward decoder decodes that feature vector to generate a previous sentence. The probability that the newly generated previous sentence is the original one is then calculated.
In one embodiment of the invention, when the reply sentence $Y_1$ is input, several previous sentences

$X_1, X_2, X_3$

are generated by the backward encoder, the backward decoder and the non-autoregressive generation mode. The probability that each newly generated sentence $X_1, X_2, X_3$ is the original sentence X is calculated, expressed as $p(X \mid Y_1)$. Similarly, when the reply sentence $Y_2$ is input, several previous sentences are generated in the same way, and the probability that each of them is the original sentence X is expressed as $p(X \mid Y_2)$. By analogy, when the i-th reply sentence $Y_i$ is input, the probability that the generated previous sentence is the original one is expressed as $p(X \mid Y_i)$.
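The backward step can be sketched the same way, assuming a hypothetical `backward_model` with a `log_prob(x, y)` method that scores the original previous sentence X given a candidate reply (again an illustrative interface):

```python
def backward_scores(backward_model, previous_sentence_x, candidates):
    """For each candidate reply Y_i, compute log p(X | Y_i): how well the
    backward encoder-decoder recovers the original previous sentence X."""
    return [
        backward_model.log_prob(previous_sentence_x, reply_y)
        for reply_y, _ in candidates
    ]
```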
Step S103, extracting the reply sentence.
In an embodiment of the present invention, the probabilities from step S101 (generating the reply sentence) and step S102 (generating the previous sentence) are summed, and the reply sentence with the largest sum is the one closest to the preceding dialog. Concretely, the probability $p(X \mid Y_i)$ and the probability $p(Y_i \mid X)$ are logarithmized and summed:

$\log p(Y_i \mid X) + \log p(X \mid Y_i)$

The reply sentence $Y_i$ that maximizes this sum, i.e. the sentence that maximizes the mutual information, is selected as the final generated reply.
When a non-autoregressive sentence generation mode is used alone, the sentences are less accurate; for example, when the sentence "I like cats" is to be generated, "I I I" or "like like like" may be generated instead. By applying the maximum mutual information criterion in the dialog generation process, the problem of poor sentence reliability in non-autoregressive generation is mitigated: the dialog generation process remains fast while accuracy is preserved, achieving a balance between efficiency and effect in dialog generation.
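Putting the three steps together, the selection rule of step S103 can be sketched as follows; both model objects are the hypothetical stand-ins assumed above, and the log-sum follows claims 3 and 9:

```python
def select_reply_mmi(forward_model, backward_model, previous_sentence_x, k=10):
    """Return the candidate reply maximizing log p(Y_i | X) + log p(X | Y_i)."""
    best_reply, best_score = None, float("-inf")
    for _ in range(k):
        # forward: draw one candidate reply with its log p(Y_i | X)
        reply_y, log_p_y_given_x = forward_model.sample(previous_sentence_x)
        # backward: score log p(X | Y_i) for that candidate
        log_p_x_given_y = backward_model.log_prob(previous_sentence_x, reply_y)
        # maximum mutual information criterion: keep the largest log-sum
        score = log_p_y_given_x + log_p_x_given_y
        if score > best_score:
            best_reply, best_score = reply_y, score
    return best_reply
```

The backward term penalizes generic or incoherent candidates such as "like like like", since such replies make it hard to recover the original previous sentence X.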
In one embodiment of the present invention, as shown in FIG. 2, the structure of the non-autoregressive dialog generation model based on maximum mutual information according to the present invention comprises a reply sentence generation module, which generates reply sentences from an input first previous sentence through the first encoding and first decoding. In one embodiment, when a previous sentence X is input, several reply sentences Y are generated by the forward encoder, the forward decoder and the non-autoregressive generation mode, denoted

$Y_1, Y_2, \ldots, Y_K$

and the probability of generating each sentence is expressed as $p(Y_i \mid X)$.
In this embodiment, the non-autoregressive dialog generation model based on maximum mutual information according to the invention further comprises a previous sentence generation module, which generates a second previous sentence from each reply sentence through the second encoding and second decoding. In one embodiment, when the reply sentence $Y_1$ is input, several previous sentences

$X_1, X_2, X_3$

are generated by the backward encoder, the backward decoder and the non-autoregressive generation mode, and the probability that each newly generated sentence $X_1, X_2, X_3$ is the original sentence X is calculated, expressed as $p(X \mid Y_1)$. Similarly, when the reply sentence $Y_2$ is input, several previous sentences are generated in the same way, and the probability that each of them is the original sentence X is expressed as $p(X \mid Y_2)$. By analogy, when the i-th reply sentence $Y_i$ is input, the probability that the generated previous sentence is the original one is expressed as $p(X \mid Y_i)$.
In this embodiment, as shown in FIG. 2, the non-autoregressive dialog generation model based on maximum mutual information further comprises a reply sentence extraction module, which sums the generation probability of each reply sentence and the probability that the newly generated previous sentence is the original one, and selects the reply sentence corresponding to the maximum sum. In one embodiment, the probability $p(X \mid Y_i)$ and the probability $p(Y_i \mid X)$ are logarithmized and summed:

$\log p(Y_i \mid X) + \log p(X \mid Y_i)$

The reply sentence $Y_i$ that maximizes this sum, i.e. the sentence that maximizes the mutual information, is selected as the final generated reply.
The invention uses a non-autoregressive dialog generation mode to generate all words of each reply sentence at once, which greatly increases generation speed and improves the efficiency of dialog generation. At the same time, it applies the maximum mutual information criterion to fully model the correlation between the preceding text and the reply, capturing the relevance of dialog generation, improving its quality, and achieving a balance between the efficiency and the effect of dialog generation.

Claims (9)

1. A non-autoregressive dialog generation method based on maximum mutual information is characterized by comprising the following steps:
coding an input first previous sentence through a first encoder to obtain a first feature vector of the first previous sentence;
decoding the first feature vector through a first decoder to obtain reply sentences of the first previous sentence, and calculating a first probability of each reply sentence;
encoding each reply sentence through a second encoder to obtain a second feature vector;
decoding the second feature vector through a second decoder to obtain a second previous sentence of the reply sentence, and calculating a second probability that the first previous sentence appears as the second previous sentence;
and calculating the sum of the first probability and the second probability, and selecting the reply sentence corresponding to the maximum sum.
2. The maximum mutual information based non-autoregressive dialog generation method of claim 1, wherein, during generation of the reply sentence by the first decoder, all words in the reply sentence are generated simultaneously in one step.
3. The maximum mutual information based non-autoregressive dialog generation method of claim 1, wherein the first probability and the second probability are each logarithmized when computing the sum of the first probability and the second probability.
4. The maximum mutual information based non-autoregressive dialog generation method of claim 1, wherein the first encoder is a forward encoder, the first decoder is a forward decoder, the second encoder is a backward encoder, and the second decoder is a backward decoder.
5. A non-autoregressive dialog generation model based on maximum mutual information, comprising:
a reply sentence generation module, which generates reply sentences from an input first previous sentence through first encoding and first decoding, and calculates a first probability of each reply sentence generated from the first previous sentence;
a previous sentence generation module, which generates a second previous sentence from each reply sentence through second encoding and second decoding, and calculates a second probability that the second previous sentence generated from the reply sentence is the first previous sentence; and
a reply sentence extraction module, which sums the first probability and the second probability and selects the reply sentence corresponding to the maximum sum.
6. The maximum mutual information based non-autoregressive dialog generation model of claim 5, wherein the reply sentence generation module uses a forward encoder for said first encoding and a forward decoder for said first decoding.
7. The maximum mutual information based non-autoregressive dialog generation model of claim 5, wherein the reply sentence generation module generates the reply sentence using a non-autoregressive method, with all words in the reply sentence generated simultaneously in one step.
8. The maximum mutual information based non-autoregressive dialog generation model of claim 5, wherein the previous sentence generation module performs the second encoding using a backward encoder and the second decoding using a backward decoder.
9. The maximum mutual information based non-autoregressive dialog generation model of claim 5, wherein, in the reply sentence extraction module, the first probability and the second probability are each logarithmized and then summed.
CN202010185621.7A 2020-03-17 2020-03-17 Non-autoregressive conversational speech generation method and model based on maximum mutual information Pending CN111428014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010185621.7A CN111428014A (en) 2020-03-17 2020-03-17 Non-autoregressive conversational speech generation method and model based on maximum mutual information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010185621.7A CN111428014A (en) 2020-03-17 2020-03-17 Non-autoregressive conversational speech generation method and model based on maximum mutual information

Publications (1)

Publication Number Publication Date
CN111428014A (en) 2020-07-17

Family

ID=71548016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010185621.7A Pending CN111428014A (en) 2020-03-17 2020-03-17 Non-autoregressive conversational speech generation method and model based on maximum mutual information

Country Status (1)

Country Link
CN (1) CN111428014A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012094075A (en) * 2010-10-28 2012-05-17 Toshiba Corp Interaction device
CN106710596A (en) * 2016-12-15 2017-05-24 腾讯科技(上海)有限公司 Answer statement determination method and device
US20180285348A1 (en) * 2016-07-19 2018-10-04 Tencent Technology (Shenzhen) Company Limited Dialog generation method, apparatus, and device, and storage medium
CN109635093A (en) * 2018-12-17 2019-04-16 北京百度网讯科技有限公司 Method and apparatus for generating revert statement
CN109710915A (en) * 2017-10-26 2019-05-03 华为技术有限公司 Repeat sentence generation method and device
US20190198014A1 (en) * 2017-12-21 2019-06-27 Ricoh Company, Ltd. Method and apparatus for ranking responses of dialog model, and non-transitory computer-readable recording medium
CN110222155A (en) * 2019-06-13 2019-09-10 北京百度网讯科技有限公司 Dialogue generation method, device and the terminal of knowledge-chosen strategy
CN110852116A (en) * 2019-11-07 2020-02-28 腾讯科技(深圳)有限公司 Non-autoregressive neural machine translation method, device, computer equipment and medium
CN110851574A (en) * 2018-07-27 2020-02-28 北京京东尚科信息技术有限公司 Statement processing method, device and system

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012094075A (en) * 2010-10-28 2012-05-17 Toshiba Corp Interaction device
US20180285348A1 (en) * 2016-07-19 2018-10-04 Tencent Technology (Shenzhen) Company Limited Dialog generation method, apparatus, and device, and storage medium
CN106710596A (en) * 2016-12-15 2017-05-24 腾讯科技(上海)有限公司 Answer statement determination method and device
US20190220513A1 (en) * 2016-12-15 2019-07-18 Tencent Technology (Shenzhen) Company Limited Method and apparatus for determining a reply statement
CN109710915A (en) * 2017-10-26 2019-05-03 华为技术有限公司 Repeat sentence generation method and device
US20190198014A1 (en) * 2017-12-21 2019-06-27 Ricoh Company, Ltd. Method and apparatus for ranking responses of dialog model, and non-transitory computer-readable recording medium
CN110851574A (en) * 2018-07-27 2020-02-28 北京京东尚科信息技术有限公司 Statement processing method, device and system
CN109635093A (en) * 2018-12-17 2019-04-16 北京百度网讯科技有限公司 Method and apparatus for generating revert statement
CN110222155A (en) * 2019-06-13 2019-09-10 北京百度网讯科技有限公司 Dialogue generation method, device and the terminal of knowledge-chosen strategy
CN110852116A (en) * 2019-11-07 2020-02-28 腾讯科技(深圳)有限公司 Non-autoregressive neural machine translation method, device, computer equipment and medium

Similar Documents

Publication Publication Date Title
KR102648306B1 (en) Speech recognition error correction method, related devices, and readable storage medium
CN110299131B (en) Voice synthesis method and device capable of controlling prosodic emotion and storage medium
US11908451B2 (en) Text-based virtual object animation generation method, apparatus, storage medium, and terminal
CN111477216B (en) Training method and system for voice and meaning understanding model of conversation robot
CN108170686B (en) Text translation method and device
CN108710704B (en) Method and device for determining conversation state, electronic equipment and storage medium
CN110875035A (en) Novel multi-task combined speech recognition training framework and method
JP2023545988A (en) Transformer transducer: One model that combines streaming and non-streaming speech recognition
JP2023547847A (en) Cascading encoder for simplified streaming and non-streaming ASR
CN112348073A (en) Polyphone recognition method and device, electronic equipment and storage medium
CN112735377B (en) Speech synthesis method, device, terminal equipment and storage medium
CN117099157A (en) Multitasking learning for end-to-end automatic speech recognition confidence and erasure estimation
CN111667828B (en) Speech recognition method and apparatus, electronic device, and storage medium
CN111428014A (en) Non-autoregressive conversational speech generation method and model based on maximum mutual information
CN117079637A (en) Mongolian emotion voice synthesis method based on condition generation countermeasure network
CN114783405B (en) Speech synthesis method, device, electronic equipment and storage medium
CN113257221B (en) Voice model training method based on front-end design and voice synthesis method
CN115346520A (en) Method, apparatus, electronic device and medium for speech recognition
CN114881010A (en) Chinese grammar error correction method based on Transformer and multitask learning
CN114974218A (en) Voice conversion model training method and device and voice conversion method and device
CN113077785A (en) End-to-end multi-language continuous voice stream voice content identification method and system
CN112395832B (en) Text quantitative analysis and generation method and system based on sequence-to-sequence
CN112562686B (en) Zero-sample voice conversion corpus preprocessing method using neural network
JP7490804B2 System and method for streaming end-to-end speech recognition with asynchronous decoders
KR102637025B1 (en) Multilingual rescoring models for automatic speech recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination