SG11201903130WA - Sequence to sequence transformations for speech synthesis via recurrent neural networks - Google Patents

Sequence to sequence transformations for speech synthesis via recurrent neural networks

Info

Publication number
SG11201903130WA
Authority
SG
Singapore
Prior art keywords
gatway
newton
unit
sequence
center
Prior art date
Application number
SG11201903130WA
Inventor
David Leo Wright Hall
Daniel Klein
Daniel Lawrence Roth
Laurence Steven Gillick
Andrew Lee Maas
Steven Andrew Wegmann
Original Assignee
Semantic Machines Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-10-24
Filing date
2017-10-24
Publication date
2019-05-30
Application filed by Semantic Machines Inc
Publication of SG11201903130WA

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/047 Architecture of speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; Stress or intonation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Telephonic Communication Services (AREA)

Abstract

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(19) World Intellectual Property Organization, International Bureau
(10) International Publication Number: WO 2018/081163 A1
(43) International Publication Date: 03 May 2018 (03.05.2018)
(51) International Patent Classification: G10L 25/00 (2013.01)
(21) International Application Number: PCT/US2017/058138
(22) International Filing Date: 24 October 2017 (24.10.2017)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data: 62/412,165, 24 October 2016 (24.10.2016), US; 15/792,236, 24 October 2017 (24.10.2017), US
(71) Applicant: SEMANTIC MACHINES, INC. [US/US]; One Gatway Center, 300 Washington Street, Unit 302, Newton, MA 02458 (US).
(72) Inventors: HALL, David Leo Wright; KLEIN, David; ROTH, Daniel; GILLICK, Lawrence; MAAS, Andrew; WEGMANN, Steven; each at One Gatway Center, 300 Washington Street, Unit 302, Newton, MA 02458 (US).
(74) Agent: BACHMANN, Steve; Bachmann Law Group, 19925 Stevens Creek Blvd, Ste 100, Cupertino, CA 95014 (US).
(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JO, JP, KE, KG, KH, KN, KP, KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG).
Published: with international search report (Art. 21(3))
(54) Title: SEQUENCE TO SEQUENCE TRANSFORMATIONS FOR SPEECH SYNTHESIS VIA RECURRENT NEURAL NETWORKS
(57) Abstract: A system eliminates alignment processing and performs TTS functionality using a new neural architecture. The neural architecture includes an encoder and a decoder. The encoder receives an input and encodes it into vectors. The encoder applies a sequence of transformations to the input and generates a vector representing the entire sentence. The decoder takes the encoding and outputs an audio file, which can include compressed audio frames.
[Figure 3: Text to Speech (TTS) system 300. Labeled elements: Text ("I'm gonna need about $3.50"), Pre-Processing (Pronunciation, Text Normalization), Context(s), Encoding (Text Encoder, Pronunciation Encoder, Normalized Text Encoder), Attention, Hidden State, Synthesis Decoder, Audio Codec, Stop, Frames, Output Encoding.]
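The encoder and decoder flow described in the abstract can be illustrated with a toy sketch. This is not the patented implementation: the 8-bit character encoding, the tanh recurrent cell, the dot-product attention, the random untrained weights, the frame size, and the 0.9 stop threshold are all assumptions chosen only to make the data flow visible (text in, per-step hidden states and one sentence vector out of the encoder, then frames out of the decoder until a stop signal or a frame limit).

```python
import math
import random


def make_matrix(rows, cols, rng):
    # Random untrained weights (assumption; a real system would learn these).
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in, w_h, x, h):
    # One recurrent transformation: h' = tanh(W_in x + W_h h).
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_h, h))]


def encode(text, dim, rng):
    """Return one hidden state per character plus the final sentence vector."""
    w_in, w_h = make_matrix(dim, 8, rng), make_matrix(dim, dim, rng)
    h, states = [0.0] * dim, []
    for ch in text:
        # Hypothetical input features: the low 8 bits of the character code.
        x = [float((ord(ch) >> bit) & 1) for bit in range(8)]
        h = rnn_step(w_in, w_h, x, h)
        states.append(h)
    return states, h


def attend(states, query):
    """Dot-product attention: softmax weights over the encoder states."""
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * st[i] for w, st in zip(weights, states))
               for i in range(len(query))]
    return context, weights


def decode(states, sentence_vec, frame_size, max_frames, rng):
    """Emit frames until the stop probability exceeds 0.9 or max_frames."""
    dim = len(sentence_vec)
    w_ctx, w_h = make_matrix(dim, dim, rng), make_matrix(dim, dim, rng)
    w_frame, w_stop = make_matrix(frame_size, dim, rng), make_matrix(1, dim, rng)
    h, frames = sentence_vec, []
    for _ in range(max_frames):
        context, _ = attend(states, h)
        h = rnn_step(w_ctx, w_h, context, h)
        frames.append(matvec(w_frame, h))  # stand-in for a compressed audio frame
        stop_prob = 1.0 / (1.0 + math.exp(-matvec(w_stop, h)[0]))
        if stop_prob > 0.9:
            break
    return frames


rng = random.Random(0)
states, vec = encode("I'm gonna need about $3.50", dim=16, rng=rng)
frames = decode(states, vec, frame_size=4, max_frames=20, rng=rng)
print(len(states), len(frames), len(frames[0]))
```

Because the weights are random, the frames carry no audio meaning. The sketch only shows that no explicit alignment between characters and frames is supplied: the attention weights are computed at each decoder step from the current hidden state.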
SG11201903130WA 2016-10-24 2017-10-24 Sequence to sequence transformations for speech synthesis via recurrent neural networks SG11201903130WA (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201662412165P 2016-10-24 2016-10-24
US15/792,236 US20180114522A1 (en) 2016-10-24 2017-10-24 Sequence to sequence transformations for speech synthesis via recurrent neural networks
PCT/US2017/058138 WO2018081163A1 (en) 2016-10-24 2017-10-24 Sequence to sequence transformations for speech synthesis via recurrent neural networks

Publications (1)

Publication Number Publication Date
SG11201903130WA (en) 2019-05-30

Family

ID=61969829

Family Applications (1)

Application Number Publication Priority Date Filing Date Title
SG11201903130WA SG11201903130WA (en) 2016-10-24 2017-10-24 Sequence to sequence transformations for speech synthesis via recurrent neural networks

Country Status (6)

Country Link
US (1) US20180114522A1 (en)
AU (1) AU2017347995A1 (en)
BR (1) BR112019006979A2 (en)
CA (1) CA3037090A1 (en)
SG (1) SG11201903130WA (en)
WO (1) WO2018081163A1 (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180061408A1 (en) * 2016-08-24 2018-03-01 Semantic Machines, Inc. Using paraphrase in accepting utterances in an automated assistant
WO2018085760A1 (en) 2016-11-04 2018-05-11 Semantic Machines, Inc. Data collection for a new conversational dialogue system
WO2018148441A1 (en) 2017-02-08 2018-08-16 Semantic Machines, Inc. Natural language content generator
US10762892B2 (en) 2017-02-23 2020-09-01 Semantic Machines, Inc. Rapid deployment of dialogue system
US10586530B2 (en) 2017-02-23 2020-03-10 Semantic Machines, Inc. Expandable dialogue system
US11069340B2 (en) 2017-02-23 2021-07-20 Microsoft Technology Licensing, Llc Flexible and expandable dialogue system
US10733380B2 (en) * 2017-05-15 2020-08-04 Thomson Reuters Enterprise Center Gmbh Neural paraphrase generator
CN107293296B (en) * 2017-06-28 2020-11-20 百度在线网络技术(北京)有限公司 Voice recognition result correction method, device, equipment and storage medium
US11132499B2 (en) 2017-08-28 2021-09-28 Microsoft Technology Licensing, Llc Robust expandable dialogue system
US10510358B1 (en) * 2017-09-29 2019-12-17 Amazon Technologies, Inc. Resolution enhancement of speech signals for speech synthesis
US20210056958A1 (en) * 2017-12-29 2021-02-25 Fluent.Ai Inc. System and method for tone recognition in spoken languages
US11042712B2 (en) * 2018-06-05 2021-06-22 Koninklijke Philips N.V. Simplifying and/or paraphrasing complex textual content by jointly learning semantic alignment and simplicity
US11381715B2 (en) 2018-07-16 2022-07-05 Massachusetts Institute Of Technology Computer method and apparatus making screens safe for those with photosensitivity
CN110288978B (en) * 2018-10-25 2022-08-30 腾讯科技(深圳)有限公司 Speech recognition model training method and device
TWI698857B (en) 2018-11-21 2020-07-11 財團法人工業技術研究院 Speech recognition system and method thereof, and computer program product
CN109616093B (en) * 2018-12-05 2024-02-27 平安科技(深圳)有限公司 End-to-end speech synthesis method, device, equipment and storage medium
US11508359B2 (en) * 2019-09-11 2022-11-22 Oracle International Corporation Using backpropagation to train a dialog system
CN112489618A (en) * 2019-09-12 2021-03-12 微软技术许可有限责任公司 Neural text-to-speech synthesis using multi-level contextual features
CN111754973B (en) * 2019-09-23 2023-09-01 北京京东尚科信息技术有限公司 Speech synthesis method and device and storage medium
US11373633B2 (en) * 2019-09-27 2022-06-28 Amazon Technologies, Inc. Text-to-speech processing using input voice characteristic data
KR20210042707A (en) * 2019-10-10 2021-04-20 삼성전자주식회사 Method and apparatus for processing speech
WO2021107189A1 (en) * 2019-11-28 2021-06-03 주식회사 엘솔루 Electronic device for speech-to-text, and data processing method thereof
CN111247581B (en) * 2019-12-23 2023-10-10 深圳市优必选科技股份有限公司 Multi-language text voice synthesizing method, device, equipment and storage medium
NL2025235B1 (en) * 2020-03-30 2021-10-22 Microsoft Technology Licensing Llc Updating constraints for computerized assistant actions
US20220101829A1 (en) * 2020-09-29 2022-03-31 Harman International Industries, Incorporated Neural network speech recognition system
US11461681B2 (en) 2020-10-14 2022-10-04 Openstream Inc. System and method for multi-modality soft-agent for query population and information mining
CN112687259B (en) * 2021-03-11 2021-06-18 腾讯科技(深圳)有限公司 Speech synthesis method, device and readable storage medium
US11600282B2 (en) * 2021-07-02 2023-03-07 Google Llc Compressing audio waveforms using neural networks and vector quantizers
CN115083386B (en) * 2022-06-10 2024-09-06 思必驰科技股份有限公司 Audio synthesis method, electronic device and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7403890B2 (en) * 2002-05-13 2008-07-22 Roushar Joseph C Multi-dimensional method and apparatus for automated language interpretation
WO2011026247A1 (en) * 2009-09-04 2011-03-10 Svox Ag Speech enhancement techniques on the power spectrum
US9672811B2 (en) * 2012-11-29 2017-06-06 Sony Interactive Entertainment Inc. Combining auditory attention cues with phoneme posterior scores for phone/vowel/syllable boundary detection
US10127901B2 (en) * 2014-06-13 2018-11-13 Microsoft Technology Licensing, Llc Hyper-structure recurrent neural networks for text-to-speech
US9799327B1 (en) * 2016-02-26 2017-10-24 Google Inc. Speech recognition with attention-based recurrent neural networks
US10896669B2 (en) * 2017-05-19 2021-01-19 Baidu Usa Llc Systems and methods for multi-speaker neural text-to-speech
US10614826B2 (en) * 2017-05-24 2020-04-07 Modulate, Inc. System and method for voice-to-voice conversion

Also Published As

Publication number Publication date
CA3037090A1 (en) 2018-05-03
BR112019006979A2 (en) 2019-06-25
US20180114522A1 (en) 2018-04-26
AU2017347995A1 (en) 2019-03-28
WO2018081163A1 (en) 2018-05-03
AU2017347995A8 (en) 2019-08-29
WO2018081163A8 (en) 2019-05-09

Similar Documents

Publication Publication Date Title
SG11201903130WA (en) Sequence to sequence transformations for speech synthesis via recurrent neural networks
SG11201810933QA (en) Anti-c5 antibodies and uses thereof
SG11201903857UA (en) Antibodies to pd-1 and uses thereof
SG11201900201YA (en) Methods for quantitating individual antibodies from a mixture
SG11201903304YA (en) IL15/IL15Ra HETERODIMERIC FC-FUSION PROTEINS
SG11201903830TA (en) Blockade of cd7 expression and chimeric antigen receptors for immunotherapy of t-cell malignancies
SG11201811283PA (en) System and method for determining safety score of driver
SG11201907208XA (en) Radiolabeled anti-lag3 antibodies for immuno-pet imaging
SG11201805470VA (en) Code and container of system for preparing a beverage or foodstuff
SG11201903771XA (en) Binding molecules specific for asct2 and uses thereof
SG11201811604UA (en) System and method for real-time transcription of an audio signal into texts
SG11201906961UA (en) Polypeptide variants and uses thereof
SG11201807325UA (en) Optimizing range of aircraft docking system
SG11201908390UA (en) Non-harmonic speech detection and bandwidth extension in a multi-source environment
SG11201908238SA (en) Anti-c5 antibodies and uses thereof
SG11201908678XA (en) Methods and compositions for reduction of immunogenicity
SG11201900442PA (en) Sorting of t lymphocytes in a microfluidic device
SG11201909561RA (en) Octree-based convolutional neural network
SG11201908056QA (en) Anti-par2 antibodies and uses thereof
SG11201905510XA (en) Compositions and methods for reducing bioburden in chromatography
SG11201810529VA (en) Propeller
SG11201908744PA (en) Anti-c5a antibodies and uses thereof
SG11201910131YA (en) Compact reverse flow centrifuge system
SG11201906313SA (en) A polypeptide linker for preparing multispecific antibodies
SG11201909348QA (en) Stereo parameters for stereo decoding