JP7313455B2 - Conversational speech agent - Google Patents

Info

Publication number
JP7313455B2
Authority
JP
Japan
Prior art keywords
response
audio
user model
model
caller
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2021544347A
Other languages
English (en)
Japanese (ja)
Other versions
JP2022523504A5
JP2022523504A (ja)
Inventor
Scodary, Anthony
Barron, Alex
Cohen, David
Macmillan, Evan
Original Assignee
Gridspace Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gridspace Inc.
Publication of JP2022523504A
Publication of JP2022523504A5
Application granted
Publication of JP7313455B2
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/091 Active learning
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027 Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/50 Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M3/527 Centralised call answering arrangements not requiring operator intervention
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2201/00 Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/39 Electronic components, circuits, software, systems or apparatus used in telephone systems using speech synthesis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • Child & Adolescent Psychology (AREA)
  • Machine Translation (AREA)
JP2021544347A 2019-01-29 2020-01-10 Conversational speech agent Active JP7313455B2 (ja)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US15/929,095 US10770059B2 (en) 2019-01-29 2019-01-29 Conversational speech agent
US15/929,095 2019-01-29
PCT/US2020/013160 WO2020159693A1 (en) 2019-01-29 2020-01-10 Conversational speech agent

Publications (3)

Publication Number Publication Date
JP2022523504A (ja) 2022-04-25
JP2022523504A5 2022-12-21
JP7313455B2 (ja) 2023-07-24

Family

ID=71732630

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2021544347A Active JP7313455B2 (ja) 2019-01-29 2020-01-10 Conversational speech agent

Country Status (4)

Country Link
US (1) US10770059B2
EP (1) EP3918508A4
JP (1) JP7313455B2
WO (1) WO2020159693A1

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10600408B1 (en) * 2018-03-23 2020-03-24 Amazon Technologies, Inc. Content output management based on speech quality
US11854538B1 (en) * 2019-02-15 2023-12-26 Amazon Technologies, Inc. Sentiment detection in audio data
JP7338039B2 (ja) * 2019-08-07 2023-09-04 LivePerson, Inc. Systems and methods for transferring messaging to automation
US11393471B1 (en) * 2020-03-30 2022-07-19 Amazon Technologies, Inc. Multi-device output management based on speech characteristics
WO2021205946A1 (ja) * 2020-04-06 2021-10-14 Sony Group Corporation Information processing device and information processing method
US11875797B2 (en) * 2020-07-23 2024-01-16 Pozotron Inc. Systems and methods for scripted audio production
US20220319505A1 (en) * 2021-02-12 2022-10-06 Ashwarya Poddar System and method for rapid improvement of virtual speech agent's natural language understanding
US11545141B1 (en) * 2021-04-16 2023-01-03 ConverzAI Inc. Omni-channel orchestrated conversation system and virtual conversation agent for realtime contextual and orchestrated omni-channel conversation with a human and an omni-channel orchestrated conversation process for conducting realtime contextual and fluid conversation with the human by the virtual conversation agent
US12272355B2 (en) * 2021-04-20 2025-04-08 Converzai, Inc. System and method for providing a virtual speech agent for simulated conversations and conversational feedback
US20230026945A1 (en) * 2021-07-21 2023-01-26 Wellspoken, Inc. Virtual Conversational Agent
US11856139B2 (en) * 2021-09-24 2023-12-26 International Business Machines Corporation Method and apparatus for dynamic tone bank and personalized response in 5G telecom network
CN114189587A (zh) * 2021-11-10 2022-03-15 Alibaba (China) Co., Ltd. Call method, device, storage medium and computer program product
US12315495B2 (en) 2021-12-17 2025-05-27 Snap Inc. Speech to entity
US11936812B2 (en) * 2021-12-22 2024-03-19 Kore.Ai, Inc. Systems and methods for handling customer conversations at a contact center
US12361934B2 (en) 2022-07-14 2025-07-15 Snap Inc. Boosting words in automated speech recognition
EP4471694A1 (en) * 2023-06-01 2024-12-04 Airbus S.A.S. Method for assisting a worker in a production line, data processing apparatus and computer program
JP7808661B2 (ja) * 2023-09-20 2026-01-29 SoftBank Group Corp. System
US20250118298A1 (en) * 2023-10-09 2025-04-10 Hishab Singapore Private Limited System and method for optimizing a user interaction session within an interactive voice response system
EP4599429A1 (en) * 2023-12-28 2025-08-13 Google LLC Dynamic adaptation of speech synthesis by an automated assistant during automated telephone call(s)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005062240A (ja) 2003-08-13 2005-03-10 Fujitsu Ltd Voice response system

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6018711A (en) * 1998-04-21 2000-01-25 Nortel Networks Corporation Communication system user interface with animated representation of time remaining for input to recognizer
US6771746B2 (en) 2002-05-16 2004-08-03 Rockwell Electronic Commerce Technologies, Llc Method and apparatus for agent optimization using speech synthesis and recognition
US20040162724A1 (en) * 2003-02-11 2004-08-19 Jeffrey Hill Management of conversations
US20080086690A1 (en) * 2006-09-21 2008-04-10 Ashish Verma Method and System for Hybrid Call Handling
US9575963B2 (en) * 2012-04-20 2017-02-21 Maluuba Inc. Conversational agent
US20180012595A1 (en) * 2016-07-07 2018-01-11 Intelligently Interactive, Inc. Simple affirmative response operating system
US9812151B1 (en) * 2016-11-18 2017-11-07 IPsoft Incorporated Generating communicative behaviors for anthropomorphic virtual agents based on user's affect
KR20190121758A (ko) * 2017-02-27 2019-10-28 Sony Corporation Information processing device, information processing method, and program
US9865260B1 (en) * 2017-05-03 2018-01-09 Google Llc Proactive incorporation of unsolicited content into human-to-computer dialogs
KR20190004495A (ko) * 2017-07-04 2019-01-14 Samsung SDS Co., Ltd. Method, apparatus and system for processing tasks using a chatbot
US10504514B2 (en) * 2017-09-29 2019-12-10 Visteon Global Technologies, Inc. Human machine interface system and method for improving user experience based on history of voice activity
US10424302B2 (en) * 2017-10-12 2019-09-24 Google Llc Turn-based reinforcement learning for dialog management
CN107943896A (zh) * 2017-11-16 2018-04-20 Baidu Online Network Technology (Beijing) Co., Ltd. Information processing method and apparatus
US10475451B1 (en) * 2017-12-06 2019-11-12 Amazon Technologies, Inc. Universal and user-specific command processing
JP7044415B2 (ja) * 2017-12-31 2022-03-30 Midea Group Co., Ltd. Method and system for controlling a home assistant device
CN108600911B (zh) * 2018-03-30 2021-05-18 Lenovo (Beijing) Co., Ltd. Output method and electronic device

Also Published As

Publication number Publication date
US20200243062A1 (en) 2020-07-30
EP3918508A1 (en) 2021-12-08
US10770059B2 (en) 2020-09-08
WO2020159693A1 (en) 2020-08-06
EP3918508A4 (en) 2022-11-09
JP2022523504A (ja) 2022-04-25

Similar Documents

Publication Publication Date Title
JP7313455B2 (ja) Conversational speech agent
US12602198B2 (en) Search and knowledge base question answering for a voice user interface
US20240153489A1 (en) Data driven dialog management
US11645547B2 (en) Human-machine interactive method and device based on artificial intelligence
Li et al. Learning fine-grained cross modality excitement for speech emotion recognition
Casale et al. Speech emotion classification using machine learning algorithms
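WO2020135194A1 (zh) Voice interaction method based on emotion engine technology, smart terminal and storage medium
CN111145721A (zh) Personalized prompt generation method, apparatus and device
KR102870187B1 (ko) Apparatus with a convolutional neural network for obtaining multiple intent words and method thereof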
Sarthak et al. Spoken language identification using convnets
US11914635B2 (en) Performing image search based on user input using neural networks
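CN114005446B (zh) Sentiment analysis method, related device and readable storage medium
CN114373443B (zh) Speech synthesis method and apparatus, computing device, storage medium and program product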
WO2024263220A1 (en) Generating model output using a knowledge graph
CN118197306A (zh) Voice dialogue method, system, electronic device and storage medium
Pieraccini AI assistants
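CN118260711A (zh) Multimodal emotion recognition method and apparatus
CN118861258A (zh) Corpus-based voice question answering method, apparatus, electronic device and storage medium
CN117290562A (zh) Intelligent outbound call method, apparatus, device and storage medium
KR102159988B1 (ko) Method and system for generating a voice montage
CA3144042A1 (fr) Method and device for obtaining a response to an oral question posed to a human-machine interface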
Bonaccorsi Speech-Text Cross-Modal Learning through Self-Attention Mechanisms
KR102796556B1 (ko) Method for detecting important utterances using natural language processing and apparatus therefor
Patil Deep learning based natural language processing for end to end speech translation
Kumari et al. The speech emotion recognition in multi-languages using an ensemble deep learning-based technique

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20221213

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20221213

A871 Explanation of circumstances concerning accelerated examination

Free format text: JAPANESE INTERMEDIATE CODE: A871

Effective date: 20221213

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20230117

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20230412

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20230613

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20230711

R150 Certificate of patent or registration of utility model

Ref document number: 7313455

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150