WO2021034613A1 - Development of voice and other interaction applications - Google Patents

Development of voice and other interaction applications

Info

Publication number
WO2021034613A1
Authority
WO
WIPO (PCT)
Prior art keywords
interaction
utterance
markup language
general
developer
Prior art date
Application number
PCT/US2020/046201
Other languages
English (en)
Inventor
Jeffrey K. Mcmahon
Robert T. Naughton
Nicholas G. Laidlaw
Alexander M. Dunn
Jason Green
Original Assignee
Voicify, LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/544,527 (US10614800B1)
Priority claimed from US16/544,375 (US11508365B2)
Priority claimed from US16/544,508 (US10762890B1)
Application filed by Voicify, LLC
Priority to CA3151910A (CA3151910A1)
Priority to CN202080071550.6A (CN114945979A)
Priority to EP20853981.7A (EP4018436A4)
Publication of WO2021034613A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9038 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/226 Validation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management

Definitions

  • An intent represents a function that is bound to one or more utterances.
  • An utterance may contain one or more slots to represent dynamic values (for example, a time of day).
  • An intent is indicated by interaction of an end user with an interaction assistant (e.g., an Amazon Echo Dot).
  • Information about the interaction is delivered by the assistant platform to the endpoint for additional processing.
  • An endpoint is essentially an application having a collection of functions or methods that map to the intents defined within the interaction model.
  • the endpoint’s functions may contain references to items of content or literal content (we sometimes refer to the “items of content” and “literal content” simply as “content”) that becomes part of the responses sent back to the assistant platform.
  • the development platform is its use of a “content-first” (or content-centric) development approach.
  • the content-first development approach gives priority to the aspects of the app development and deployment process that involve development of content and management of relationships between end-user requests and responses.
  • the following hard-coded interaction model can support only two user requests: Welcome and Weather.
  • Using the entered content and questions and information contained in the template, the development platform has enough information to automatically process and generate a response to essentially any type of request an end user might pose and handle variations of utterances that don’t require exact matching. For example, end-user requests that use the general utterance pattern “how do I {Query}?” will map to a single intent within the development platform’s general interaction model.
  • the development platform uses the value of {Query} to search for a content match that will provide a suitable answer to both the general “how do I” part of the request and the specific {Query} part of the request. Because {Query} can have a wide range of specific values representing a variety of implicit intents, the use of the general utterance pattern supports a wide range of requests.
  • the interaction platform may determine (for example, through automated inspection of repeated developer updates) that particular intents are worth updating for all interaction models for all interaction applications. In these cases, administrative updates can be made automatically (or with human assistance) across all interaction models to add, remove, or edit one or more intents.
  • the validation process will return an error stating the given unit, property, and element that does not allow it. Check that the node’s immediate children are among the child types allowed for the node. If there are any child nodes that are not in the allowed child types, the validation process will return an error with the name of the child type that is not allowed for the specific node type.
  • the development platform has divided the original tree into elements that are fully valid on the left segment, and what would be invalid on the right segment.
  • the segmentation process can then either proceed with just the left branch or it could alter the right branch to remove the <voice> element, resulting in the two trees (segments, branches) shown in figure 7.
  • the segmenting process can also be applied separately to allow for using the separated trees to run custom logic. For example, some text-to-speech services support the <audio> element while others don’t. So when trying to generate audio files from SSML that has <audio> elements, the segmentation engine can segment the trees separately, then generate the output speech audio files and keep the audio files separate but in order.
  • the development platform can process them individually for text-to-speech, resulting in three .mp3 files that can be played back to back as one full representation of the entire input.
  • the visual tool presents a small vertical value indicator 140 next to the icon to show where the current value 142 is on the scale.
  • the user of the SSML visual tool can also cause the pointer to hover over the icon or the scale indicator to view a tooltip 144 explaining the details of the element including the name, value, and others. The user can then click the tooltip to open the
  • the development platform leverages generalized, abstract intents and open-ended slot types that provide greater flexibility for utterance matching. This greater flexibility enables other features including that new content can be added without requiring an update to the general interaction model, and therefore without requiring re-deployment or recertification.
  • the ability to create interaction applications without coding enables a broad non-technical user base to create voice, chat, and other interaction applications.
  • the development platform also allows users to manage content without managing business logic, whereas content, business logic, and intents are tightly coupled in custom or flow-based tools.
  • the development platform also uses a more traditional content form style of managing content which does not require a large canvas of intersecting items.
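
The child-type check described in the validation excerpt above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `ALLOWED_CHILDREN` table and the function names are hypothetical, and the table is deliberately much smaller than what real SSML permits.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table (an assumption for this sketch): the child
# element types each node type is allowed to contain.
ALLOWED_CHILDREN = {
    "speak": {"p", "s", "voice", "prosody", "audio", "break"},
    "voice": {"p", "s", "prosody", "break"},
    "prosody": {"s", "break"},
    "p": {"s", "break"},
    "s": {"break"},
    "audio": set(),
    "break": set(),
}


def validate(node):
    """Recursively collect an error for every child type a node does not allow."""
    errors = []
    allowed = ALLOWED_CHILDREN.get(node.tag, set())
    for child in node:
        if child.tag not in allowed:
            errors.append(f"<{child.tag}> is not allowed inside <{node.tag}>")
        errors.extend(validate(child))
    return errors


def validate_ssml(ssml):
    """Parse an SSML string and validate the whole tree from the root."""
    return validate(ET.fromstring(ssml))
```

Calling `validate_ssml` on a document that nests `<audio>` inside `<voice>` returns one error naming the disallowed child type and its parent, while a document that follows the table returns an empty list.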
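
The segmentation excerpts above describe splitting SSML so that `<audio>` elements are handled separately while order is preserved. A rough sketch of that idea is shown below, assuming the input has only element children directly under `<speak>` (no loose text); the function name `segment_on_audio` is hypothetical and not taken from the source.

```python
import xml.etree.ElementTree as ET


def segment_on_audio(ssml):
    """Split the top-level children of <speak> into ordered segments.

    Each <audio> element becomes its own segment; runs of other elements
    between audio elements are grouped together. Every segment is returned
    as a standalone <speak> document string.
    """
    root = ET.fromstring(ssml)
    segments = []
    current = []

    def flush():
        # Emit the pending run of non-audio elements, if any.
        if current:
            seg = ET.Element("speak")
            seg.extend(current)
            segments.append(ET.tostring(seg, encoding="unicode"))
            current.clear()

    for child in root:
        if child.tag == "audio":
            flush()
            seg = ET.Element("speak")
            seg.append(child)
            segments.append(ET.tostring(seg, encoding="unicode"))
        else:
            current.append(child)
    flush()
    return segments
```

For an input with a sentence, an `<audio>` element, and another sentence, this yields three segments in order. Each could then be sent to a text-to-speech service or fetched as an audio file independently and the results played back to back.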

Abstract

Among other things, a developer of an interaction application for an enterprise can create items of content to be provided to an assistant platform for use in responses to requests of end users. The developer can deploy the interaction application using defined items of content and an available general interaction model that includes intents and sample utterances having slots. The developer can deploy the interaction application without needing to formulate any of the intents, sample utterances, or slots of the general interaction model.
PCT/US2020/046201 2019-08-19 2020-08-13 Development of voice and other interaction applications WO2021034613A1 (fr)
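
As an illustration of how a general interaction model with slotted sample utterances might let a developer supply only content, here is a minimal sketch. Everything in it is an assumption made for the example: the intent name `HowToIntent`, the regex-based matching, the word-overlap search, and the sample content items are hypothetical and do not come from the patent text.

```python
import re

# Hypothetical general interaction model: one generic intent whose sample
# utterance "how do I {Query}?" carries an open-ended Query slot.
GENERAL_MODEL = [
    ("HowToIntent", re.compile(r"^how do i (?P<Query>.+?)\??$", re.IGNORECASE)),
]

# Content items entered by the developer, keyed by the question they answer.
CONTENT = {
    "HowToIntent": {
        "reset my password": "Open Settings and choose Reset Password.",
        "book a room": "Use the Rooms tab in the scheduling app.",
    },
}

FALLBACK = "Sorry, I don't have an answer for that."


def word_overlap(a, b):
    """Count the distinct words two phrases share (a naive relevance score)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def respond(utterance):
    """Map an utterance to a general intent, then search content by slot value."""
    text = utterance.strip()
    for intent, pattern in GENERAL_MODEL:
        match = pattern.match(text)
        if not match:
            continue
        query = match.group("Query")
        items = CONTENT.get(intent, {})
        best = max(items, key=lambda key: word_overlap(key, query), default=None)
        if best is not None and word_overlap(best, query) > 0:
            return items[best]
    return FALLBACK
```

In this sketch, adding a new entry to `CONTENT` extends what the application can answer without touching `GENERAL_MODEL`, which mirrors the point that new content does not require the developer to formulate intents, sample utterances, or slots.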

Priority Applications (3)

Application Number Priority Date Filing Date Title
CA3151910A CA3151910A1 (fr) 2019-08-19 2020-08-13 Development of voice and other interaction applications
CN202080071550.6A CN114945979A (zh) 2019-08-19 2020-08-13 Development of voice and other interaction applications
EP20853981.7A EP4018436A4 (fr) 2019-08-19 2020-08-13 Development of voice and other interaction applications

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US16/544,527 US10614800B1 (en) 2019-08-19 2019-08-19 Development of voice and other interaction applications
US16/544,375 US11508365B2 (en) 2019-08-19 2019-08-19 Development of voice and other interaction applications
US16/544,375 2019-08-19
US16/544,508 US10762890B1 (en) 2019-08-19 2019-08-19 Development of voice and other interaction applications
US16/544,527 2019-08-19
US16/544,508 2019-08-19
US16/816,535 2020-03-12
US16/816,535 US11538466B2 (en) 2019-08-19 2020-03-12 Development of voice and other interaction applications

Publications (1)

Publication Number Publication Date
WO2021034613A1 (fr) 2021-02-25

Family

ID=74660576

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/046201 WO2021034613A1 (fr) 2019-08-19 2020-08-13 Development of voice and other interaction applications

Country Status (4)

Country Link
EP (1) EP4018436A4 (fr)
CN (1) CN114945979A (fr)
CA (1) CA3151910A1 (fr)
WO (1) WO2021034613A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT202100012548A1 (it) 2021-05-14 2022-11-14 Hitbytes Srl Method for creating multi-platform voice applications

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
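JP2022155107A (ja) * 2021-03-30 2022-10-13 本田技研工業株式会社 Information processing device, information processing method, mobile body control device, mobile body control method, and program
CN115064166B (zh) * 2022-08-17 2022-12-13 广州小鹏汽车科技有限公司 Vehicle voice interaction method, server, and storage medium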

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150100943A1 (en) * 2013-10-09 2015-04-09 Viv Labs, Inc. Dynamically evolving cognitive architecture system based on contributions from third-party developers
US20170212884A1 (en) * 2016-01-23 2017-07-27 Microsoft Technology Licensing, Llc Tool for Facilitating the Development of New Language Understanding Scenarios
US20180366114A1 (en) * 2017-06-16 2018-12-20 Amazon Technologies, Inc. Exporting dialog-driven applications to digital communication platforms

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020010715A1 (en) * 2001-07-26 2002-01-24 Garry Chinn System and method for browsing using a limited display device
US20040194016A1 (en) * 2003-03-28 2004-09-30 International Business Machines Corporation Dynamic data migration for structured markup language schema changes
US10235999B1 (en) * 2018-06-05 2019-03-19 Voicify, LLC Voice application platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4018436A4 *

Also Published As

Publication number Publication date
CA3151910A1 (fr) 2021-02-25
EP4018436A4 (fr) 2022-10-12
EP4018436A1 (fr) 2022-06-29
CN114945979A (zh) 2022-08-26

Similar Documents

Publication Publication Date Title
US11538466B2 (en) Development of voice and other interaction applications
US11508365B2 (en) Development of voice and other interaction applications
EP3545427B1 (fr) Service for developing dialog-driven applications
US11749256B2 (en) Development of voice and other interaction applications
US8117023B2 (en) Language understanding apparatus, language understanding method, and computer program
JP3964134B2 (ja) Method for creating a language grammar
US9081550B2 (en) Adding speech capabilities to existing computer applications with complex graphical user interfaces
JP4237915B2 (ja) Computer-implemented method for enabling a user to set the pronunciation of a character string
US7630892B2 (en) Method and apparatus for transducer-based text normalization and inverse text normalization
WO2021034613A1 (fr) Development of voice and other interaction applications
US8447610B2 (en) Method and apparatus for generating synthetic speech with contrastive stress
US11776533B2 (en) Building a natural language understanding application using a received electronic record containing programming code including an interpret-block, an interpret-statement, a pattern expression and an action statement
US20150106101A1 (en) Method and apparatus for providing speech output for speech-enabled applications
JP2005537532A (ja) Integrated development tool for building a natural language understanding application
WO2002033542A2 (fr) Software development methods and systems
US8914291B2 (en) Method and apparatus for generating synthetic speech with contrastive stress
CA2671722A1 (fr) Methods and systems for providing grammar services
US20100191519A1 (en) Tool and framework for creating consistent normalization maps and grammars
Gruenstein et al. Scalable and portable web-based multimodal dialogue interaction with geographical databases
US11604929B2 (en) Guided text generation for task-oriented dialogue
US20140257816A1 (en) Speech synthesis dictionary modification device, speech synthesis dictionary modification method, and computer program product
Di Fabbrizio et al. AT&T help desk.
Wigmore Speech-based creation and editing of mathematical content
Albin Typologizing native language influence on intonation in a second language: Three transfer phenomena in Japanese EFL learners
TW201537372A (zh) Action design device and action design program product

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20853981

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3151910

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020853981

Country of ref document: EP

Effective date: 20220321