SG11201903130WA - Sequence to sequence transformations for speech synthesis via recurrent neural networks

Info

Publication number: SG11201903130WA
Application number: SG11201903130WA
Authority: SG
Inventors: David Leo Wright Hall; Daniel Klein; Daniel Lawrence Roth; Laurence Steven Gillick; Andrew Lee Maas; Steven Andrew Wegmann
Original assignee: Semantic Machines Inc
Priority date: 2016-10-24
Filing date: 2017-10-24
Publication date: 2019-05-30
Also published as: CA3037090A1; BR112019006979A2; US20180114522A1; AU2017347995A1; WO2018081163A1; AU2017347995A8; WO2018081163A8

Abstract

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(19) World Intellectual Property Organization, International Bureau
(10) International Publication Number: WO 2018/081163 A1
(43) International Publication Date: 03 May 2018 (03.05.2018)
(51) International Patent Classification: G10L 25/00 (2013.01)
(21) International Application Number: PCT/US2017/058138
(22) International Filing Date: 24 October 2017 (24.10.2017)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data: 62/412,165, 24 October 2016 (24.10.2016), US; 15/792,236, 24 October 2017 (24.10.2017), US
(71) Applicant: SEMANTIC MACHINES, INC. [US/US]; One Gatway Center, 300 Washington Street, Unit 302, Newton, MA 02458 (US).
(72) Inventors: HALL, David Leo Wright; KLEIN, David; ROTH, Daniel; GILLICK, Lawrence; MAAS, Andrew; WEGMANN, Steven; each at One Gatway Center, 300 Washington Street, Unit 302, Newton, MA 02458 (US).
(74) Agent: BACHMANN, Steve; Bachmann Law Group, 19925 Stevens Creek Blvd, Ste 100, Cupertino, CA 95014 (US).
(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JO, JP, KE, KG, KH, KN, KP, KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG).

(54) Title: SEQUENCE TO SEQUENCE TRANSFORMATIONS FOR SPEECH SYNTHESIS VIA RECURRENT NEURAL NETWORKS

(57) Abstract: A system eliminates alignment processing and performs TTS functionality using a new neural architecture. The neural architecture includes an encoder and a decoder. The encoder receives an input and encodes it into vectors. The encoder applies a sequence of transformations to the input and generates a vector representing the entire sentence. The decoder takes the encoding and outputs an audio file, which can include compressed audio frames.

[FIGURE 3: Text to Speech (TTS) 300. Labeled blocks: Text ("I'm gonna need about $3.50"); Pre-Processing (Pronunciation, Text Normalization); Context(s); Text Encoder; Pronunciation Encoder; Normalized Text Encoder; Encoding; Attention; Hidden State; Synthesis Decoder; Audio Codec; Frames; Stop; Output.]

Published: with international search report (Art. 21(3))
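
The encoder and decoder flow described in the abstract can be illustrated with a small toy sketch. It is not the patented implementation: the simple tanh recurrent cell, the dot-product attention, the stop threshold, the random weights, and every name and size (`HIDDEN`, `FRAME`, `MAX_FRAMES`, `encode`, `attend`, `decode`) are assumptions introduced here for illustration. The sketch only shows the data flow: text goes in, the encoder produces per-step vectors plus one vector for the whole sentence, and the decoder emits a variable number of frames without any explicit alignment step.

```python
import math
import random

# All sizes, weights, and the cell type are illustrative assumptions,
# not values taken from the patent.
HIDDEN = 8       # hypothetical hidden-state width
FRAME = 4        # hypothetical size of one output frame
MAX_FRAMES = 20  # safety cap so decoding always terminates

rng = random.Random(0)  # fixed seed keeps the toy weights reproducible

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def embed(ch):
    # Deterministic toy character embedding (hypothetical).
    return [math.sin(ord(ch) * (i + 1)) for i in range(HIDDEN)]

W_in, W_rec = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
W_ctx, W_dec = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
W_frame = rand_matrix(FRAME, HIDDEN)
w_stop = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]

def encode(text):
    """Run a simple recurrent cell over the characters of a non-empty text.

    Returns every per-step state (used by attention) and the final state,
    which stands in for the vector representing the entire sentence.
    """
    if not text:
        raise ValueError("text must be non-empty")
    h = [0.0] * HIDDEN
    states = []
    for ch in text:
        h = tanh_vec(add(matvec(W_in, embed(ch)), matvec(W_rec, h)))
        states.append(h)
    return states, h

def attend(states, query):
    """Dot-product attention: softmax-weighted average of encoder states."""
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * state[i] for w, state in zip(weights, states)) / total
            for i in range(HIDDEN)]

def decode(states, sentence_vec):
    """Emit frames one at a time until the stop unit fires or the cap is hit."""
    h = sentence_vec
    frames = []
    for _ in range(MAX_FRAMES):
        context = attend(states, h)
        h = tanh_vec(add(matvec(W_ctx, context), matvec(W_dec, h)))
        frames.append(matvec(W_frame, h))
        stop_logit = sum(w * x for w, x in zip(w_stop, h))
        if 1.0 / (1.0 + math.exp(-stop_logit)) > 0.9:  # assumed threshold
            break
    return frames

text = "I'm gonna need about $3.50"
states, sentence_vec = encode(text)
frames = decode(states, sentence_vec)
print(len(states), len(sentence_vec), len(frames), len(frames[0]))
```

In this sketch the frames are plain float vectors. A real system would pass them to an audio codec to produce the output audio, a step that is omitted here.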