CN111428014A - Non-autoregressive conversational speech generation method and model based on maximum mutual information - Google Patents
Non-autoregressive conversational speech generation method and model based on maximum mutual information
- Publication number
- CN111428014A (application CN202010185621.7A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- probability
- reply
- autoregressive
- mutual information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 27
- 239000013598 vector Substances 0.000 claims abstract description 16
- 238000000605 extraction Methods 0.000 claims 1
- 230000000694 effects Effects 0.000 abstract description 4
- 238000010586 diagram Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Computational Linguistics (AREA)
- Human Computer Interaction (AREA)
- Operations Research (AREA)
- Probability & Statistics with Applications (AREA)
- Artificial Intelligence (AREA)
- Algebra (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a non-autoregressive dialogue generation method and model based on maximum mutual information, and belongs to the technical field of machine dialogue. The non-autoregressive dialogue generation method based on maximum mutual information comprises the following steps: encoding an input first previous sentence through a forward encoder to obtain a first feature vector of the first previous sentence; decoding the first feature vector through a forward decoder to obtain reply sentences of the first previous sentence, and calculating a first probability for each reply sentence; encoding each reply sentence through a backward encoder to obtain a second feature vector; decoding the second feature vector through a backward decoder to obtain a second previous sentence of the reply sentence, and calculating a second probability that the second previous sentence is the first previous sentence; and calculating the sum of the first probability and the second probability, and selecting the reply sentence for which the sum is maximum. The invention achieves a balance of efficiency and effect in the dialogue generation process by combining a non-autoregressive method with the maximum mutual information criterion.
Description
Technical Field
The invention relates to the technical field of machine conversation, in particular to a non-autoregressive conversation generation method and a model based on maximum mutual information.
Background
In the prior art, most previous dialogue generation uses an "autoregressive" generation mode. So-called autoregressive generation means that, in the dialogue generation process, dialogue content is generated word by word, with the current word generated based on all previously generated words, so as to form a sentence. For example, to generate the sentence "I like cats", the autoregressive mode proceeds as follows: first generate "I", then generate "like" based on "I", and finally generate "cats" based on "I like". Expressed in terms of probability: P(I) · P(like | I) · P(cats | I like). A clear disadvantage of this method is that model generation is particularly slow when the sentences to be generated are long, since only one word can be generated at a time. This drawback is particularly pronounced in dialogue generation.
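The word-by-word factorization described above can be sketched in a few lines. This is a toy illustration only: `cond_prob` and its probability table are hypothetical stand-ins for a trained language model, not part of the patent.

```python
# Toy sketch of autoregressive scoring: the probability of "I like cats"
# factorizes as P(I) * P(like | I) * P(cats | I like).
import math

def cond_prob(word, prefix):
    # Hypothetical conditional probabilities, for illustration only.
    table = {
        ("I", ()): 0.5,
        ("like", ("I",)): 0.4,
        ("cats", ("I", "like")): 0.3,
    }
    return table.get((word, tuple(prefix)), 1e-9)

def autoregressive_log_prob(sentence):
    """Score a sentence word by word; each word conditions on all earlier words."""
    logp, prefix = 0.0, []
    for word in sentence:
        logp += math.log(cond_prob(word, prefix))
        prefix.append(word)  # generation must wait for every previous word
    return logp

print(autoregressive_log_prob(["I", "like", "cats"]))
```

Because each word waits on the full prefix, generation time grows linearly with sentence length, which is the drawback the patent targets.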
The non-autoregressive generation mode generates a plurality of words, or even all words, at one time. For example, when generating the sentence "I like cats", the three words can be generated at one time, represented by the probability P(I) · P(like) · P(cats), where the generation of each word does not depend on the other words; all words can therefore be generated by the model at once. Obviously, the non-autoregressive generation mode can greatly improve generation efficiency, but it has the disadvantage that the generated words have no correlation with each other, so the generated sentences are of very poor quality and cannot meet the accuracy requirement of dialogue generation; for example, "I I I" or "like like like" may be generated instead of the correct sentence "I like cats".
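The independent, all-at-once generation can be sketched as follows. The per-position distributions here are toy values chosen for illustration; in a real model they would be predicted jointly from the dialogue context.

```python
# Sketch of non-autoregressive generation: every position is filled in
# parallel from an independent per-position distribution, so no word waits
# on any earlier word.
import random

random.seed(0)

position_dists = [  # one independent distribution per output position (toy values)
    {"I": 0.6, "like": 0.3, "cats": 0.1},
    {"I": 0.2, "like": 0.6, "cats": 0.2},
    {"I": 0.1, "like": 0.2, "cats": 0.7},
]

def non_autoregressive_sample():
    """All words are drawn at once; P(Y) = P(y1) * P(y2) * P(y3)."""
    return [
        random.choices(list(d), weights=list(d.values()))[0]
        for d in position_dists
    ]

sample = non_autoregressive_sample()
print(sample)  # independence can yield degenerate outputs such as ["I", "I", "I"]
```

The independence that makes this fast is exactly what allows incoherent outputs like "I I I", motivating the mutual-information reranking below.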
Disclosure of Invention
The invention mainly solves the technical problem of providing a non-autoregressive dialogue generation method and model based on maximum mutual information, which can accelerate dialogue generation, improve generation efficiency, strengthen the correlation between the context and the reply during dialogue generation, and improve the accuracy of dialogue generation.
In order to achieve the above object, the first technical solution adopted by the present invention is to provide a non-autoregressive dialogue generation method based on maximum mutual information, comprising the following steps: encoding an input first previous sentence through a first encoder to obtain a first feature vector of the first previous sentence; decoding the first feature vector through a first decoder to obtain reply sentences of the first previous sentence, and calculating a first probability for each reply sentence; encoding each reply sentence through a second encoder to obtain a second feature vector; decoding the second feature vector through a second decoder to obtain a second previous sentence of the reply sentence, and calculating a second probability that the second previous sentence is the first previous sentence; and calculating the sum of the first probability and the second probability, and selecting the reply sentence for which the sum is maximum.
In order to achieve the above object, the second technical solution adopted by the present invention is to provide a non-autoregressive dialogue generation model based on maximum mutual information, comprising: a reply sentence generation section that generates reply sentences from an input first previous sentence through first encoding and first decoding; a previous sentence generation section that generates a second previous sentence from each reply sentence through second encoding and second decoding; a probability operation section that calculates a first probability of each reply sentence generated from the first previous sentence, calculates a second probability that the second previous sentence generated from the reply sentence is the first previous sentence, and computes the sum of the probabilities; and a reply sentence extraction section that compares the probability sums and selects the reply sentence corresponding to the maximum sum.
The invention has the following beneficial effects: when the method is applied, the non-autoregressive dialogue generation mode improves dialogue generation efficiency, while maximum mutual information preserves the correlation in dialogue generation, improving generation quality and achieving a balance of efficiency and effect.
Drawings
FIG. 1 is a flow chart diagram of a non-autoregressive dialog generation method based on maximum mutual information according to the present invention;
FIG. 2 is a structural diagram of the non-autoregressive dialog generating model based on maximum mutual information according to the present invention.
Detailed Description
The following detailed description of the preferred embodiments of the present invention, taken in conjunction with the accompanying drawings, will make the advantages and features of the invention easier for those skilled in the art to understand, and will thus clearly define the scope of the invention.
It is noted that the terms first and second in the claims and the description of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
In one embodiment of the present invention, as shown in fig. 1, the schematic flow chart of the non-autoregressive dialogue generation method based on maximum mutual information according to the present invention includes the following steps:
step S101, generating a reply statement.
In one embodiment of the present invention, the forward encoder encodes the dialogue context into corresponding feature vectors, and the forward decoder decodes those feature vectors to generate a reply sentence. The reply sentence is generated in a non-autoregressive mode, that is, all words forming the reply sentence are generated simultaneously at one time, so that the reply sentence can be generated quickly even when the dialogue is long, improving the generation efficiency of the reply sentence. For example, when generating the sentence "I like cats", the autoregressive mode proceeds as follows: first generate "I", then generate "like" based on "I", and finally generate "cats" based on "I like". Expressed in terms of probability: P(I) · P(like | I) · P(cats | I like). A clear disadvantage of this approach is that model generation is particularly slow when the sentences to be generated are long. In the non-autoregressive generation mode, the three words can be generated at one time, namely P(I) · P(like) · P(cats), where each word is generated independently of the others, so that all words can be generated by the model at once. Obviously, the non-autoregressive generation mode can greatly improve generation efficiency.
In one embodiment of the present invention, when a previous sentence X is input, a plurality of reply sentences Y are generated by the forward encoder, the forward decoder and the non-autoregressive generation mode. According to the probability P(Y | X) of each generated reply sentence, K sentences are sampled from them, denoted Y_1, Y_2, ..., Y_K, with probabilities P(Y_1 | X), P(Y_2 | X), ..., P(Y_K | X). In summary, when a previous sentence X is input, the probability of obtaining the ith reply sentence Y_i is expressed as P(Y_i | X).
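Step S101 can be sketched as follows. This is a minimal illustration under stated assumptions: `forward_model` is a hypothetical stand-in for the forward encoder-decoder, returning toy per-position distributions for a fixed reply length, from which K candidates are sampled along with their non-autoregressive log-probabilities log P(Y_i | X).

```python
# Sketch of step S101: sample K candidate replies Y_1..Y_K and record
# their forward log-probabilities log P(Y_i | X).
import math
import random

random.seed(0)

def forward_model(context):
    # Hypothetical per-position distributions P(y_t | X); toy values,
    # not the patent's trained model.
    return [
        {"I": 0.6, "like": 0.3, "cats": 0.1},
        {"like": 0.7, "I": 0.2, "cats": 0.1},
        {"cats": 0.5, "like": 0.3, "I": 0.2},
    ]

def sample_candidates(context, k=4):
    """Sample K reply candidates and compute log P(Y_i | X) for each."""
    dists = forward_model(context)
    candidates = []
    for _ in range(k):
        reply, logp = [], 0.0
        for d in dists:  # positions are independent: all words in one shot
            w = random.choices(list(d), weights=list(d.values()))[0]
            reply.append(w)
            logp += math.log(d[w])
        candidates.append((reply, logp))
    return candidates

for reply, logp in sample_candidates("do you like cats", k=4):
    print(reply, round(logp, 3))
```

Each candidate's forward score is kept for the reranking step S103.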
Step S102, generating the above sentence.
In one embodiment of the present invention, each generated reply sentence is encoded into a corresponding feature vector by the backward encoder, and the feature vector of the reply sentence is decoded by the backward decoder to generate a previous sentence. The probability that the newly generated previous sentence is the original previous sentence is then calculated.
In one embodiment of the invention, when the reply sentence Y_1 is input, a plurality of previous sentences are generated by the backward encoder, the backward decoder and the non-autoregressive generation mode, and the probability that a newly generated previous sentence is the original previous sentence X is calculated, expressed as P(X | Y_1). Similarly, when the reply sentence Y_2 is input, a plurality of previous sentences are generated in the same way, and the probability that a newly generated previous sentence is the original previous sentence X is expressed as P(X | Y_2). By analogy, when the ith reply sentence Y_i is input, the probability that the generated previous sentence is the original one is expressed as P(X | Y_i).
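Step S102 can be sketched likewise. `backward_model` here is a hypothetical stand-in for the backward encoder-decoder that scores log P(X | Y_i); word overlap with toy smoothing is used purely for illustration, not as the patent's actual scorer.

```python
# Sketch of step S102: score how likely the original context X is given a
# candidate reply, i.e. log P(X | Y_i).
import math

def backward_model(context_words, reply):
    """Toy log P(X | Y): higher when the reply shares content with the context."""
    overlap = len(set(context_words) & set(reply))
    # Hypothetical add-one smoothing so the logarithm is always finite.
    return math.log((overlap + 1) / (len(context_words) + 1))

print(backward_model(["do", "you", "like", "cats"], ["I", "like", "cats"]))
print(backward_model(["do", "you", "like", "cats"], ["I", "I", "I"]))
```

A degenerate reply such as "I I I" shares little with the context, so its backward score is low, which is the property step S103 exploits.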
Step S103, a reply sentence extracting step.
In an embodiment of the present invention, the probabilities obtained in step S101, the reply sentence generation step, and in step S102, the previous sentence generation step, are summed, and the reply sentence with the largest sum is the reply sentence closest to the dialogue context. In one embodiment of the invention, the probability P(Y_i | X) and the probability P(X | Y_i) are combined by the summation log P(Y_i | X) + log P(X | Y_i), and the reply sentence that maximizes this sum, i.e. the sentence that maximizes the mutual information, is selected as the finally generated reply sentence.
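The selection in step S103 reduces to an argmax over the candidates. The scores below are illustrative numbers, not model outputs; they show how the backward term vetoes fluent-looking but context-free candidates.

```python
# Sketch of step S103: for each candidate Y_i, sum the forward and backward
# log-probabilities and keep the argmax, which is the maximum-mutual-
# information choice up to terms that do not depend on Y.
candidates = {
    ("I", "like", "cats"):    (-2.1, -0.7),  # (log P(Y|X), log P(X|Y))
    ("I", "I", "I"):          (-1.8, -4.0),  # strong forward score, weak backward
    ("like", "like", "like"): (-2.0, -4.5),
}

def mmi_select(cands):
    """Pick the reply maximizing log P(Y|X) + log P(X|Y)."""
    return max(cands, key=lambda y: sum(cands[y]))

best = mmi_select(candidates)
print(best)  # prints ('I', 'like', 'cats'): the backward term penalizes repeats
```

Note that "I I I" has the best forward score alone; only the combined sum recovers the coherent reply, which is the balance of efficiency and effect the patent claims.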
When a non-autoregressive sentence generation mode is used alone, the sentences are less accurate; for example, when the sentence "I like cats" is to be generated, "I I I" or "like like like" may be generated instead. By using the maximum mutual information criterion in the dialogue generation process, the problem of poor sentence reliability in non-autoregressive generation is alleviated, so that the dialogue generation process is both fast and accurate, achieving a balance between efficiency and effect.
In one embodiment of the present invention, as shown in FIG. 2, the structure of the non-autoregressive dialogue generation model based on maximum mutual information comprises a reply sentence generation module, which generates reply sentences from an input first previous sentence through first encoding and first decoding. In one embodiment of the invention, when a previous sentence X is input, a plurality of reply sentences Y, denoted Y_1, Y_2, ..., Y_K, are generated by the forward encoder, the forward decoder and the non-autoregressive generation mode, and the probability of generating the ith reply sentence is expressed as P(Y_i | X).
In this embodiment, the non-autoregressive dialogue generation model based on maximum mutual information of the invention comprises a previous sentence generation module, which generates a second previous sentence from each reply sentence through second encoding and second decoding. In one embodiment of the invention, when the reply sentence Y_1 is input, a plurality of previous sentences are generated by the backward encoder, the backward decoder and the non-autoregressive generation mode, and the probability that a newly generated previous sentence is the original previous sentence X is calculated, expressed as P(X | Y_1). Similarly, when the reply sentence Y_2 is input, the corresponding probability is expressed as P(X | Y_2). By analogy, when the ith reply sentence Y_i is input, the probability that the generated previous sentence is the original one is expressed as P(X | Y_i).
In this specific embodiment, as shown in fig. 2, the non-autoregressive dialogue generation model based on maximum mutual information further includes a reply sentence extraction module, which sums the generation probability of each reply sentence and the probability that the newly generated previous sentence is the original previous sentence, and selects the reply sentence corresponding to the maximum sum. In one embodiment of the invention, the probability P(Y_i | X) and the probability P(X | Y_i) are combined by the summation log P(Y_i | X) + log P(X | Y_i), and the reply sentence that maximizes this sum, i.e. the sentence that maximizes the mutual information, is selected as the finally generated reply sentence.
The invention uses a non-autoregressive dialogue generation mode to generate all words of each reply sentence at one time, greatly accelerating dialogue generation and improving its efficiency, while applying the maximum mutual information criterion to fully model the relevance of the context and the reply, thereby preserving the correlation in dialogue generation, improving generation quality and achieving a balance between the efficiency and effect of dialogue generation.
Claims (9)
1. A non-autoregressive dialog generation method based on maximum mutual information is characterized by comprising the following steps:
encoding an input first previous sentence through a first encoder to obtain a first feature vector of the first previous sentence;
decoding the first feature vector through a first decoder to obtain reply sentences of the first previous sentence, and calculating a first probability of each reply sentence;
encoding each reply sentence through a second encoder to obtain a second feature vector;
decoding the second feature vector through a second decoder to obtain a second previous sentence of the reply sentence, and calculating a second probability that the second previous sentence is the first previous sentence;
and calculating the sum of the first probability and the second probability, and selecting the reply sentence corresponding to the maximum sum.
2. The maximum mutual information based non-autoregressive dialog generation method of claim 1, wherein each word in the reply sentence is generated simultaneously at one time during the generation of the reply sentence using the first decoder.
3. The maximum mutual information based non-autoregressive dialog generation method of claim 1, wherein the first probability and the second probability are logarithmized separately when solving for a sum of the first probability and the second probability.
4. The maximum mutual information based non-autoregressive dialog generation method of claim 1, wherein the first encoder is a forward encoder, the first decoder is a forward decoder, the second encoder is a backward encoder, and the second decoder is a backward decoder.
5. A non-autoregressive dialog generation model based on maximum mutual information, comprising:
the reply sentence generation module is used for generating a reply sentence by an input first upper sentence through first coding and first decoding and calculating a first probability of each reply sentence generated by the first upper sentence;
the previous sentence generating module is used for generating a second previous sentence by the reply sentence through second coding and second decoding and calculating a second probability that the second previous sentence generated by the reply sentence is the first previous sentence; and
and the reply sentence extraction module, which sums the first probability and the second probability and selects the reply sentence corresponding to the maximum probability sum.
6. The maximum mutual information based non-autoregressive dialogue generation model of claim 5, wherein the reply sentence generation module uses a forward encoder for said first encoding and a forward decoder for said first decoding.
7. The maximum mutual information based non-autoregressive dialogue generation model of claim 5, wherein the reply sentence is generated in the reply sentence generation module using a non-autoregressive mode, and each word in the reply sentence is generated simultaneously at one time.
8. The maximum mutual information based non-autoregressive dialogue generation model of claim 5, wherein the previous sentence generation module performs the second encoding using a backward encoder and the second decoding using a backward decoder.
9. The maximum mutual information based non-autoregressive dialogue generation model of claim 5, wherein the first probability and the second probability are respectively logarithmized and then summed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010185621.7A CN111428014A (en) | 2020-03-17 | 2020-03-17 | Non-autoregressive conversational speech generation method and model based on maximum mutual information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010185621.7A CN111428014A (en) | 2020-03-17 | 2020-03-17 | Non-autoregressive conversational speech generation method and model based on maximum mutual information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111428014A true CN111428014A (en) | 2020-07-17 |
Family
ID=71548016
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010185621.7A Pending CN111428014A (en) | 2020-03-17 | 2020-03-17 | Non-autoregressive conversational speech generation method and model based on maximum mutual information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111428014A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012094075A (en) * | 2010-10-28 | 2012-05-17 | Toshiba Corp | Interaction device |
CN106710596A (en) * | 2016-12-15 | 2017-05-24 | 腾讯科技(上海)有限公司 | Answer statement determination method and device |
US20180285348A1 (en) * | 2016-07-19 | 2018-10-04 | Tencent Technology (Shenzhen) Company Limited | Dialog generation method, apparatus, and device, and storage medium |
CN109635093A (en) * | 2018-12-17 | 2019-04-16 | 北京百度网讯科技有限公司 | Method and apparatus for generating revert statement |
CN109710915A (en) * | 2017-10-26 | 2019-05-03 | 华为技术有限公司 | Repeat sentence generation method and device |
US20190198014A1 (en) * | 2017-12-21 | 2019-06-27 | Ricoh Company, Ltd. | Method and apparatus for ranking responses of dialog model, and non-transitory computer-readable recording medium |
CN110222155A (en) * | 2019-06-13 | 2019-09-10 | 北京百度网讯科技有限公司 | Dialogue generation method, device and the terminal of knowledge-chosen strategy |
CN110851574A (en) * | 2018-07-27 | 2020-02-28 | 北京京东尚科信息技术有限公司 | Statement processing method, device and system |
CN110852116A (en) * | 2019-11-07 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Non-autoregressive neural machine translation method, device, computer equipment and medium |
- 2020-03-17: Application CN202010185621.7A filed (publication CN111428014A), status Pending
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012094075A (en) * | 2010-10-28 | 2012-05-17 | Toshiba Corp | Interaction device |
US20180285348A1 (en) * | 2016-07-19 | 2018-10-04 | Tencent Technology (Shenzhen) Company Limited | Dialog generation method, apparatus, and device, and storage medium |
CN106710596A (en) * | 2016-12-15 | 2017-05-24 | 腾讯科技(上海)有限公司 | Answer statement determination method and device |
US20190220513A1 (en) * | 2016-12-15 | 2019-07-18 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for determining a reply statement |
CN109710915A (en) * | 2017-10-26 | 2019-05-03 | 华为技术有限公司 | Repeat sentence generation method and device |
US20190198014A1 (en) * | 2017-12-21 | 2019-06-27 | Ricoh Company, Ltd. | Method and apparatus for ranking responses of dialog model, and non-transitory computer-readable recording medium |
CN110851574A (en) * | 2018-07-27 | 2020-02-28 | 北京京东尚科信息技术有限公司 | Statement processing method, device and system |
CN109635093A (en) * | 2018-12-17 | 2019-04-16 | 北京百度网讯科技有限公司 | Method and apparatus for generating revert statement |
CN110222155A (en) * | 2019-06-13 | 2019-09-10 | 北京百度网讯科技有限公司 | Dialogue generation method, device and the terminal of knowledge-chosen strategy |
CN110852116A (en) * | 2019-11-07 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Non-autoregressive neural machine translation method, device, computer equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102648306B1 (en) | Speech recognition error correction method, related devices, and readable storage medium | |
JP7490804B2 (en) | System and method for streaming end-to-end speech recognition with asynchronous decoders - Patents.com | |
CN111477216B (en) | Training method and system for voice and meaning understanding model of conversation robot | |
US11908451B2 (en) | Text-based virtual object animation generation method, apparatus, storage medium, and terminal | |
CN108170686B (en) | Text translation method and device | |
CN108710704B (en) | Method and device for determining conversation state, electronic equipment and storage medium | |
JP2023545988A (en) | Transformer transducer: One model that combines streaming and non-streaming speech recognition | |
CN110875035A (en) | Novel multi-task combined speech recognition training framework and method | |
CN111191468B (en) | Term replacement method and device | |
CN117099157A (en) | Multitasking learning for end-to-end automatic speech recognition confidence and erasure estimation | |
JP2023547847A (en) | Cascading encoder for simplified streaming and non-streaming ASR | |
CN109933773A (en) | A kind of multiple semantic sentence analysis system and method | |
CN114881010A (en) | Chinese grammar error correction method based on Transformer and multitask learning | |
CN112735377B (en) | Speech synthesis method, device, terminal equipment and storage medium | |
US20240153484A1 (en) | Massive multilingual speech-text joint semi-supervised learning for text-to-speech | |
CN111667828B (en) | Speech recognition method and apparatus, electronic device, and storage medium | |
CN113257221A (en) | Voice model training method based on front-end design and voice synthesis method | |
CN111428014A (en) | Non-autoregressive conversational speech generation method and model based on maximum mutual information | |
CN117079637A (en) | Mongolian emotion voice synthesis method based on condition generation countermeasure network | |
KR102637025B1 (en) | Multilingual rescoring models for automatic speech recognition | |
CN116312539A (en) | Chinese dialogue round correction method and system based on large model | |
CN114783405B (en) | Speech synthesis method, device, electronic equipment and storage medium | |
CN110825869A (en) | Text abstract generating method of variation generation decoder based on copying mechanism | |
CN115346520A (en) | Method, apparatus, electronic device and medium for speech recognition | |
CN113077785A (en) | End-to-end multi-language continuous voice stream voice content identification method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200717 |
RJ01 | Rejection of invention patent application after publication |