SG11201903130WA - Sequence to sequence transformations for speech synthesis via recurrent neural networks
Info
- Publication number
- SG11201903130WA
- Authority
- SG
- Singapore
- Prior art keywords
- gatway
- newton
- unit
- sequence
- center
- Prior art date
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L13/00—Speech synthesis; Text to speech systems
        - G10L13/02—Methods for producing synthetic speech; Speech synthesisers
          - G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
            - G10L13/047—Architecture of speech synthesisers
        - G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
          - G10L13/10—Prosody rules derived from text; Stress or intonation
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
            - G10L15/1822—Parsing for meaning understanding
        - G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
          - G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Telephonic Communication Services (AREA)
Abstract
International application published under the Patent Cooperation Treaty (PCT)
(19) World Intellectual Property Organization (WIPO), International Bureau
- (10) International Publication Number: WO 2018/081163 A1
- (43) International Publication Date: 03 May 2018 (03.05.2018)
- (51) International Patent Classification: G10L 25/00 (2013.01)
- (21) International Application Number: PCT/US2017/058138
- (22) International Filing Date: 24 October 2017 (24.10.2017)
- (25) Filing Language: English
- (26) Publication Language: English
- (30) Priority Data: 62/412,165, 24 October 2016 (24.10.2016), US; 15/792,236, 24 October 2017 (24.10.2017), US
- (71) Applicant: SEMANTIC MACHINES, INC. [US/US]; One Gatway Center, 300 Washington Street, Unit 302, Newton, MA 02458 (US)
- (72) Inventors: HALL, David Leo Wright; KLEIN, David; ROTH, Daniel; GILLICK, Lawrence; MAAS, Andrew; WEGMANN, Steven; each at One Gatway Center, 300 Washington Street, Unit 302, Newton, MA 02458 (US)
- (74) Agent: BACHMANN, Steve; Bachmann Law Group, 19925 Stevens Creek Blvd, Ste 100, Cupertino, CA 95014 (US)
- (81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JO, JP, KE, KG, KH, KN, KP, KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW
- (84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG)
- Published: with international search report (Art. 21(3))
- (54) Title: SEQUENCE TO SEQUENCE TRANSFORMATIONS FOR SPEECH SYNTHESIS VIA RECURRENT NEURAL NETWORKS
- (57) Abstract: A system eliminates alignment processing and performs TTS functionality using a new neural architecture. The neural architecture includes an encoder and a decoder. The encoder receives an input and encodes it into vectors. The encoder applies a sequence of transformations to the input and generates a vector representing the entire sentence. The decoder takes the encoding and outputs an audio file, which can include compressed audio frames.
[Figure 3: Text to Speech (TTS) system 300. Block labels: Text ("I'm gonna need about $3.50"), Pre-Processing, Pronunciation, Text Normalization, Context(s), Text Encoder, Pronunciation Encoder, Normalized Text Encoder, Encoding, Attention, Hidden State, Synthesis Decoder, Audio Codec, Stop, Frames, Output Encoding.]
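The encoder and decoder arrangement summarized in the abstract can be illustrated with a small sketch. This is not the patented implementation: the toy embedding, the fixed recurrence weights, the dot-product attention, the hidden size `DIM`, and the stop rule are all assumptions chosen only so the example runs with the Python standard library. It shows an encoder turning input text into vectors plus one sentence-level vector, and a decoder that attends over those vectors and emits frames until a stop condition fires.

```python
import math

DIM = 4  # hypothetical hidden size, chosen only for illustration


def char_vector(ch):
    # Toy embedding (assumption): deterministic features of the character code.
    code = ord(ch)
    return [math.sin(code * (i + 1) * 0.1) for i in range(DIM)]


def encode(text):
    """Apply a simple recurrent transformation over the input characters.

    Returns the per-character hidden states and the final state, which
    stands in for the vector representing the entire sentence.
    """
    state = [0.0] * DIM
    states = []
    for ch in text:
        x = char_vector(ch)
        state = [math.tanh(0.5 * s + x_i) for s, x_i in zip(state, x)]
        states.append(state)
    return states, state


def attend(query, states):
    """Dot-product attention: softmax-weighted average of encoder states."""
    scores = [sum(q * s for q, s in zip(query, st)) for st in states]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(sc - peak) for sc in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    context = [sum(w * st[i] for w, st in zip(weights, states))
               for i in range(DIM)]
    return context, weights


def decode(states, sentence_vector, max_frames=20):
    """Emit toy 'frames' one at a time until a stop condition fires."""
    if not states:
        return []
    hidden = list(sentence_vector)
    frames = []
    for step in range(max_frames):
        context, _ = attend(hidden, states)
        hidden = [math.tanh(0.5 * h + c) for h, c in zip(hidden, context)]
        frames.append(hidden)
        # Hypothetical stop rule (assumption): one frame per input character.
        # A trained decoder would predict the stop signal instead.
        if step + 1 >= len(states):
            break
    return frames


states, sentence = encode("hi")
frames = decode(states, sentence)
print(len(states), len(frames))
```

In a trained system the recurrence weights, attention parameters, and stop prediction would be learned, and the emitted frames would be passed to an audio codec rather than returned as raw vectors.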
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662412165P | 2016-10-24 | 2016-10-24 | |
US15/792,236 US20180114522A1 (en) | 2016-10-24 | 2017-10-24 | Sequence to sequence transformations for speech synthesis via recurrent neural networks |
PCT/US2017/058138 WO2018081163A1 (en) | 2016-10-24 | 2017-10-24 | Sequence to sequence transformations for speech synthesis via recurrent neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
SG11201903130WA (en) | 2019-05-30 |
Family
ID=61969829
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG11201903130WA (en) | 2016-10-24 | 2017-10-24 | Sequence to sequence transformations for speech synthesis via recurrent neural networks |
Country Status (6)
Country | Link |
---|---|
US (1) | US20180114522A1 (en) |
AU (1) | AU2017347995A1 (en) |
BR (1) | BR112019006979A2 (en) |
CA (1) | CA3037090A1 (en) |
SG (1) | SG11201903130WA (en) |
WO (1) | WO2018081163A1 (en) |
Families Citing this family (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180061408A1 (en) * | 2016-08-24 | 2018-03-01 | Semantic Machines, Inc. | Using paraphrase in accepting utterances in an automated assistant |
WO2018085760A1 (en) | 2016-11-04 | 2018-05-11 | Semantic Machines, Inc. | Data collection for a new conversational dialogue system |
WO2018148441A1 (en) | 2017-02-08 | 2018-08-16 | Semantic Machines, Inc. | Natural language content generator |
US10762892B2 (en) | 2017-02-23 | 2020-09-01 | Semantic Machines, Inc. | Rapid deployment of dialogue system |
US10586530B2 (en) | 2017-02-23 | 2020-03-10 | Semantic Machines, Inc. | Expandable dialogue system |
US11069340B2 (en) | 2017-02-23 | 2021-07-20 | Microsoft Technology Licensing, Llc | Flexible and expandable dialogue system |
US10733380B2 (en) * | 2017-05-15 | 2020-08-04 | Thomson Reuters Enterprise Center Gmbh | Neural paraphrase generator |
CN107293296B (en) * | 2017-06-28 | 2020-11-20 | 百度在线网络技术(北京)有限公司 | Voice recognition result correction method, device, equipment and storage medium |
US11132499B2 (en) | 2017-08-28 | 2021-09-28 | Microsoft Technology Licensing, Llc | Robust expandable dialogue system |
US10510358B1 (en) * | 2017-09-29 | 2019-12-17 | Amazon Technologies, Inc. | Resolution enhancement of speech signals for speech synthesis |
US20210056958A1 (en) * | 2017-12-29 | 2021-02-25 | Fluent.Ai Inc. | System and method for tone recognition in spoken languages |
US11042712B2 (en) * | 2018-06-05 | 2021-06-22 | Koninklijke Philips N.V. | Simplifying and/or paraphrasing complex textual content by jointly learning semantic alignment and simplicity |
US11381715B2 (en) | 2018-07-16 | 2022-07-05 | Massachusetts Institute Of Technology | Computer method and apparatus making screens safe for those with photosensitivity |
CN110288978B (en) * | 2018-10-25 | 2022-08-30 | 腾讯科技(深圳)有限公司 | Speech recognition model training method and device |
TWI698857B (en) | 2018-11-21 | 2020-07-11 | 財團法人工業技術研究院 | Speech recognition system and method thereof, and computer program product |
CN109616093B (en) * | 2018-12-05 | 2024-02-27 | 平安科技(深圳)有限公司 | End-to-end speech synthesis method, device, equipment and storage medium |
US11508359B2 (en) * | 2019-09-11 | 2022-11-22 | Oracle International Corporation | Using backpropagation to train a dialog system |
CN112489618A (en) * | 2019-09-12 | 2021-03-12 | 微软技术许可有限责任公司 | Neural text-to-speech synthesis using multi-level contextual features |
CN111754973B (en) * | 2019-09-23 | 2023-09-01 | 北京京东尚科信息技术有限公司 | Speech synthesis method and device and storage medium |
US11373633B2 (en) * | 2019-09-27 | 2022-06-28 | Amazon Technologies, Inc. | Text-to-speech processing using input voice characteristic data |
KR20210042707A (en) * | 2019-10-10 | 2021-04-20 | 삼성전자주식회사 | Method and apparatus for processing speech |
WO2021107189A1 (en) * | 2019-11-28 | 2021-06-03 | 주식회사 엘솔루 | Electronic device for speech-to-text, and data processing method thereof |
CN111247581B (en) * | 2019-12-23 | 2023-10-10 | 深圳市优必选科技股份有限公司 | Multi-language text voice synthesizing method, device, equipment and storage medium |
NL2025235B1 (en) * | 2020-03-30 | 2021-10-22 | Microsoft Technology Licensing Llc | Updating constraints for computerized assistant actions |
US20220101829A1 (en) * | 2020-09-29 | 2022-03-31 | Harman International Industries, Incorporated | Neural network speech recognition system |
US11461681B2 (en) | 2020-10-14 | 2022-10-04 | Openstream Inc. | System and method for multi-modality soft-agent for query population and information mining |
CN112687259B (en) * | 2021-03-11 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Speech synthesis method, device and readable storage medium |
US11600282B2 (en) * | 2021-07-02 | 2023-03-07 | Google Llc | Compressing audio waveforms using neural networks and vector quantizers |
CN115083386B (en) * | 2022-06-10 | 2024-09-06 | 思必驰科技股份有限公司 | Audio synthesis method, electronic device and storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7403890B2 (en) * | 2002-05-13 | 2008-07-22 | Roushar Joseph C | Multi-dimensional method and apparatus for automated language interpretation |
WO2011026247A1 (en) * | 2009-09-04 | 2011-03-10 | Svox Ag | Speech enhancement techniques on the power spectrum |
US9672811B2 (en) * | 2012-11-29 | 2017-06-06 | Sony Interactive Entertainment Inc. | Combining auditory attention cues with phoneme posterior scores for phone/vowel/syllable boundary detection |
US10127901B2 (en) * | 2014-06-13 | 2018-11-13 | Microsoft Technology Licensing, Llc | Hyper-structure recurrent neural networks for text-to-speech |
US9799327B1 (en) * | 2016-02-26 | 2017-10-24 | Google Inc. | Speech recognition with attention-based recurrent neural networks |
US10896669B2 (en) * | 2017-05-19 | 2021-01-19 | Baidu Usa Llc | Systems and methods for multi-speaker neural text-to-speech |
US10614826B2 (en) * | 2017-05-24 | 2020-04-07 | Modulate, Inc. | System and method for voice-to-voice conversion |
2017
- 2017-10-24 US US15/792,236 patent/US20180114522A1/en not_active Abandoned
- 2017-10-24 WO PCT/US2017/058138 patent/WO2018081163A1/en unknown
- 2017-10-24 CA CA3037090A patent/CA3037090A1/en not_active Abandoned
- 2017-10-24 SG SG11201903130WA patent/SG11201903130WA/en unknown
- 2017-10-24 AU AU2017347995A patent/AU2017347995A1/en not_active Abandoned
- 2017-10-24 BR BR112019006979A patent/BR112019006979A2/en not_active Application Discontinuation
Also Published As
Publication number | Publication date |
---|---|
CA3037090A1 (en) | 2018-05-03 |
BR112019006979A2 (en) | 2019-06-25 |
US20180114522A1 (en) | 2018-04-26 |
AU2017347995A1 (en) | 2019-03-28 |
WO2018081163A1 (en) | 2018-05-03 |
AU2017347995A8 (en) | 2019-08-29 |
WO2018081163A8 (en) | 2019-05-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
SG11201903130WA (en) | Sequence to sequence transformations for speech synthesis via recurrent neural networks | |
SG11201810933QA (en) | Anti-c5 antibodies and uses thereof | |
SG11201903857UA (en) | Antibodies to pd-1 and uses thereof | |
SG11201900201YA (en) | Methods for quantitating individual antibodies from a mixture | |
SG11201903304YA (en) | IL15/IL15Ra HETERODIMERIC FC-FUSION PROTEINS | |
SG11201903830TA (en) | Blockade of cd7 expression and chimeric antigen receptors for immunotherapy of t-cell malignancies | |
SG11201811283PA (en) | System and method for determining safety score of driver | |
SG11201907208XA (en) | Radiolabeled anti-lag3 antibodies for immuno-pet imaging | |
SG11201805470VA (en) | Code and container of system for preparing a beverage or foodstuff | |
SG11201903771XA (en) | Binding molecules specific for asct2 and uses thereof | |
SG11201811604UA (en) | System and method for real-time transcription of an audio signal into texts | |
SG11201906961UA (en) | Polypeptide variants and uses thereof | |
SG11201807325UA (en) | Optimizing range of aircraft docking system | |
SG11201908390UA (en) | Non-harmonic speech detection and bandwidth extension in a multi-source environment | |
SG11201908238SA (en) | Anti-c5 antibodies and uses thereof | |
SG11201908678XA (en) | Methods and compositions for reduction of immunogenicity | |
SG11201900442PA (en) | Sorting of t lymphocytes in a microfluidic device | |
SG11201909561RA (en) | Octree-based convolutional neural network | |
SG11201908056QA (en) | Anti-par2 antibodies and uses thereof | |
SG11201905510XA (en) | Compositions and methods for reducing bioburden in chromatography | |
SG11201810529VA (en) | Propeller | |
SG11201908744PA (en) | Anti-c5a antibodies and uses thereof | |
SG11201910131YA (en) | Compact reverse flow centrifuge system | |
SG11201906313SA (en) | A polypeptide linker for preparing multispecific antibodies | |
SG11201909348QA (en) | Stereo parameters for stereo decoding |