JP7313455B2 - Conversational speech agent - Google Patents
Conversational speech agent (発話エージェント)
- Publication number
- JP7313455B2 (application JP2021544347A)
- Authority
- JP
- Japan
- Prior art keywords
- response
- audio
- user model
- model
- caller
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/091—Active learning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/487—Arrangements for providing information services, e.g. recorded voice services or time announcements
- H04M3/493—Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/527—Centralised call answering arrangements not requiring operator intervention
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/39—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech synthesis
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Theoretical Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Signal Processing (AREA)
- Psychiatry (AREA)
- Hospice & Palliative Care (AREA)
- Child & Adolescent Psychology (AREA)
- Machine Translation (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/929,095 US10770059B2 (en) | 2019-01-29 | 2019-01-29 | Conversational speech agent |
| US15/929,095 | 2019-01-29 | | |
| PCT/US2020/013160 WO2020159693A1 (en) | 2019-01-29 | 2020-01-10 | Conversational speech agent |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| JP2022523504A (ja) | 2022-04-25 |
| JP2022523504A5 | 2022-12-21 |
| JP7313455B2 (ja) | 2023-07-24 |
Family
ID=71732630
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2021544347A (Active; published as JP7313455B2, ja) | 2019-01-29 | 2020-01-10 | Conversational speech agent |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US10770059B2 |
| EP (1) | EP3918508A4 |
| JP (1) | JP7313455B2 |
| WO (1) | WO2020159693A1 |
Families Citing this family (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10600408B1 (en) * | 2018-03-23 | 2020-03-24 | Amazon Technologies, Inc. | Content output management based on speech quality |
| US11854538B1 (en) * | 2019-02-15 | 2023-12-26 | Amazon Technologies, Inc. | Sentiment detection in audio data |
| JP7338039B2 (ja) * | 2019-08-07 | 2023-09-04 | LivePerson, Inc. | Systems and methods for transferring messaging to automation |
| US11393471B1 (en) * | 2020-03-30 | 2022-07-19 | Amazon Technologies, Inc. | Multi-device output management based on speech characteristics |
| WO2021205946A1 (ja) * | 2020-04-06 | 2021-10-14 | Sony Group Corporation | Information processing device and information processing method |
| US11875797B2 (en) * | 2020-07-23 | 2024-01-16 | Pozotron Inc. | Systems and methods for scripted audio production |
| US20220319505A1 (en) * | 2021-02-12 | 2022-10-06 | Ashwarya Poddar | System and method for rapid improvement of virtual speech agent's natural language understanding |
| US11545141B1 (en) * | 2021-04-16 | 2023-01-03 | ConverzAI Inc. | Omni-channel orchestrated conversation system and virtual conversation agent for realtime contextual and orchestrated omni-channel conversation with a human and an omni-channel orchestrated conversation process for conducting realtime contextual and fluid conversation with the human by the virtual conversation agent |
| US12272355B2 (en) * | 2021-04-20 | 2025-04-08 | Converzai, Inc. | System and method for providing a virtual speech agent for simulated conversations and conversational feedback |
| US20230026945A1 (en) * | 2021-07-21 | 2023-01-26 | Wellspoken, Inc. | Virtual Conversational Agent |
| US11856139B2 (en) * | 2021-09-24 | 2023-12-26 | International Business Machines Corporation | Method and apparatus for dynamic tone bank and personalized response in 5G telecom network |
| CN114189587A (zh) * | 2021-11-10 | 2022-03-15 | Alibaba (China) Co., Ltd. | Call method, device, storage medium, and computer program product |
| US12315495B2 (en) | 2021-12-17 | 2025-05-27 | Snap Inc. | Speech to entity |
| US11936812B2 (en) * | 2021-12-22 | 2024-03-19 | Kore.Ai, Inc. | Systems and methods for handling customer conversations at a contact center |
| US12361934B2 (en) | 2022-07-14 | 2025-07-15 | Snap Inc. | Boosting words in automated speech recognition |
| EP4471694A1 (en) * | 2023-06-01 | 2024-12-04 | Airbus S.A.S. | Method for assisting a worker in a production line, data processing apparatus and computer program |
| JP7808661B2 (ja) * | 2023-09-20 | 2026-01-29 | SoftBank Group Corp. | System |
| US20250118298A1 (en) * | 2023-10-09 | 2025-04-10 | Hishab Singapore Private Limited | System and method for optimizing a user interaction session within an interactive voice response system |
| EP4599429A1 (en) * | 2023-12-28 | 2025-08-13 | Google LLC | Dynamic adaptation of speech synthesis by an automated assistant during automated telephone call(s) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2005062240A (ja) | 2003-08-13 | 2005-03-10 | Fujitsu Ltd | Voice response system |
Family Cites Families (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6018711A (en) * | 1998-04-21 | 2000-01-25 | Nortel Networks Corporation | Communication system user interface with animated representation of time remaining for input to recognizer |
| US6771746B2 (en) | 2002-05-16 | 2004-08-03 | Rockwell Electronic Commerce Technologies, Llc | Method and apparatus for agent optimization using speech synthesis and recognition |
| US20040162724A1 (en) * | 2003-02-11 | 2004-08-19 | Jeffrey Hill | Management of conversations |
| US20080086690A1 (en) * | 2006-09-21 | 2008-04-10 | Ashish Verma | Method and System for Hybrid Call Handling |
| US9575963B2 (en) * | 2012-04-20 | 2017-02-21 | Maluuba Inc. | Conversational agent |
| US20180012595A1 (en) * | 2016-07-07 | 2018-01-11 | Intelligently Interactive, Inc. | Simple affirmative response operating system |
| US9812151B1 (en) * | 2016-11-18 | 2017-11-07 | IPsoft Incorporated | Generating communicative behaviors for anthropomorphic virtual agents based on user's affect |
| KR20190121758A (ko) * | 2017-02-27 | 2019-10-28 | Sony Corporation | Information processing device, information processing method, and program |
| US9865260B1 (en) * | 2017-05-03 | 2018-01-09 | Google Llc | Proactive incorporation of unsolicited content into human-to-computer dialogs |
| KR20190004495A (ko) * | 2017-07-04 | 2019-01-14 | Samsung SDS Co., Ltd. | Method, apparatus, and system for task processing using a chatbot |
| US10504514B2 (en) * | 2017-09-29 | 2019-12-10 | Visteon Global Technologies, Inc. | Human machine interface system and method for improving user experience based on history of voice activity |
| US10424302B2 (en) * | 2017-10-12 | 2019-09-24 | Google Llc | Turn-based reinforcement learning for dialog management |
| CN107943896A (zh) * | 2017-11-16 | 2018-04-20 | Baidu Online Network Technology (Beijing) Co., Ltd. | Information processing method and apparatus |
| US10475451B1 (en) * | 2017-12-06 | 2019-11-12 | Amazon Technologies, Inc. | Universal and user-specific command processing |
| JP7044415B2 (ja) * | 2017-12-31 | 2022-03-30 | Midea Group Co., Ltd. | Method and system for controlling a home assistant device |
| CN108600911B (zh) * | 2018-03-30 | 2021-05-18 | Lenovo (Beijing) Co., Ltd. | Output method and electronic device |
2019
- 2019-01-29: US application US15/929,095, patent US10770059B2 (en), status: Active
2020
- 2020-01-10: JP application JP2021544347A, patent JP7313455B2 (ja), status: Active
- 2020-01-10: WO application PCT/US2020/013160, publication WO2020159693A1 (en), status: Ceased
- 2020-01-10: EP application EP20748597.0A, publication EP3918508A4 (en), status: Pending
Also Published As
| Publication number | Publication date |
|---|---|
| US20200243062A1 (en) | 2020-07-30 |
| EP3918508A1 (en) | 2021-12-08 |
| US10770059B2 (en) | 2020-09-08 |
| WO2020159693A1 (en) | 2020-08-06 |
| EP3918508A4 (en) | 2022-11-09 |
| JP2022523504A (ja) | 2022-04-25 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7313455B2 (ja) | | Conversational speech agent |
| US12602198B2 (en) | | Search and knowledge base question answering for a voice user interface |
| US20240153489A1 (en) | | Data driven dialog management |
| US11645547B2 (en) | | Human-machine interactive method and device based on artificial intelligence |
| Li et al. | | Learning fine-grained cross modality excitement for speech emotion recognition |
| Casale et al. | | Speech emotion classification using machine learning algorithms |
| WO2020135194A1 (zh) | | Voice interaction method based on emotion engine technology, intelligent terminal, and storage medium |
| CN111145721A (zh) | | Personalized prompt generation method, apparatus, and device |
| KR102870187B1 (ko) | | Apparatus with a convolutional neural network for obtaining multiple intents, and method thereof |
| Sarthak et al. | | Spoken language identification using convnets |
| US11914635B2 (en) | | Performing image search based on user input using neural networks |
| CN114005446B (zh) | | Sentiment analysis method, related device, and readable storage medium |
| CN114373443B (zh) | | Speech synthesis method and apparatus, computing device, storage medium, and program product |
| WO2024263220A1 (en) | | Generating model output using a knowledge graph |
| CN118197306A (zh) | | Voice dialogue method, system, electronic device, and storage medium |
| Pieraccini | | AI assistants |
| CN118260711A (zh) | | Multimodal emotion recognition method and apparatus |
| CN118861258A (zh) | | Corpus-based voice question answering method, apparatus, electronic device, and storage medium |
| CN117290562A (zh) | | Intelligent outbound call method, apparatus, device, and storage medium |
| KR102159988B1 (ko) | | Method and system for generating a voice montage |
| CA3144042A1 (fr) | | Method and device for obtaining a response to an oral question posed to a human-machine interface |
| Bonaccorsi | | Speech-Text Cross-Modal Learning through Self-Attention Mechanisms |
| KR102796556B1 (ko) | | Method for detecting important utterances using natural language processing, and apparatus therefor |
| Patil | | Deep learning based natural language processing for end to end speech translation |
| Kumari et al. | | The speech emotion recognition in multi-languages using an ensemble deep learning-based technique |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 2022-12-13 |
| | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 2022-12-13 |
| | A871 | Explanation of circumstances concerning accelerated examination | Free format text: JAPANESE INTERMEDIATE CODE: A871; Effective date: 2022-12-13 |
| | A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 2023-01-17 |
| | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 2023-04-12 |
| | TRDD | Decision of grant or rejection written | |
| | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 2023-06-13 |
| | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 2023-07-11 |
| | R150 | Certificate of patent or registration of utility model | Ref document number: 7313455; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |