WO2021106080A1 - Dialogue apparatus, method and program - Google Patents
- Publication number
- WO2021106080A1 (application PCT/JP2019/046184)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- utterance
- response
- status
- unit
- content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/086—Detection of language
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/55—Rule-based translation
- G06F40/56—Natural language generation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Definitions
- the present invention relates to a technique for generating a more natural response utterance in a voice dialogue using synthetic voice.
- Conventionally, the speech of the conversation partner is voice-recognized and converted into text, the language is understood, and, while the state of the dialogue is managed, a response sentence is generated and voice-synthesized to make a response (see, for example, Patent Document 2).
- Because the voice uttered in response depends only on the text information generated by the response generation unit, even if the response is appropriate as text, there may be a gap between the situation of the actual dialogue partner's voice and the situation of the voice of the response utterance.
- An object of the present invention is to provide a dialogue device, a method and a program for realizing a more natural dialogue.
- The dialogue device includes: a voice recognition unit that performs voice recognition on an input utterance and generates a text corresponding to the utterance, a voice waveform corresponding to the utterance, and information on the length of the utterance sound; an utterance status extraction unit that extracts the utterance status using the text, the voice waveform corresponding to the utterance, and the information on the length of the utterance sound; a response status determination unit that determines the response status according to the utterance status; a response sentence generation unit that generates a response sentence using the content of the response; and a voice synthesis unit that synthesizes a voice that corresponds to the response sentence and takes the response status into consideration.
- a more natural dialogue can be realized.
- FIG. 1 is a diagram showing an example of the functional configuration of the dialogue device.
- FIG. 2 is a diagram showing an example of a processing procedure of the dialogue method.
- FIG. 3 is a diagram for explaining an example of processing of the response status determination unit 5.
- FIG. 4 is a diagram for explaining another example of the processing of the response status determination unit 5.
- FIG. 5 is a diagram showing an example of a functional configuration of a computer.
- The dialogue device includes, for example, a voice recognition unit 1, a language understanding unit 2, a dialogue management unit 3, an utterance status extraction unit 4, a response status determination unit 5, a response sentence generation unit 6, and a speech synthesis unit 7.
- The dialogue method is realized, for example, by each component of the dialogue device performing the processing of steps S1 to S7 shown in FIG. 2 as described below.
- the voice recognition unit 1 performs voice recognition on the input utterance and generates a text corresponding to the utterance, a voice waveform corresponding to the utterance, and information on the length of the utterance sound (step S1).
- The text corresponding to the utterance is sometimes called the "utterance sentence".
- the text corresponding to the generated utterance is output to the language understanding unit 2 and the utterance status extraction unit 4.
- the voice waveform corresponding to the utterance and the information regarding the length of the utterance sound are output to the utterance status extraction unit 4.
- the information regarding the length of the sound of the utterance may be the length of the utterance itself or the length of each phoneme constituting the utterance.
- An example of an utterance input to the voice recognition unit 1 is "What is the weather tomorrow?".
- The language understanding unit 2 grasps the content of the utterance by using the text corresponding to the utterance (step S2).
- the grasped content is output to the dialogue management unit 3.
- the content of the utterance is, for example, information about a so-called dialogue act.
- The dialogue act has, for example, at least information on the act type and the attribute (see, for example, Reference 1).
- the dialogue management unit 3 uses the content of the utterance to determine the content of the response corresponding to the utterance (step S3).
- the content of the determined response is output to the response sentence generation unit 6.
- the content of the response is, for example, information about the dialogue type.
- Examples of dialogue types of responses are answers, answers (lie), questions, greetings, apologies, and confirmations.
- The dialogue management unit 3 determines the content of the response by, for example, the method described in Reference 1. That is, the dialogue management unit 3 updates its internal state based on the input content of the utterance, and determines the dialogue type, which is the content of the response, based on the updated internal state. At that time, the dialogue management unit 3 may determine the content of the response by using an external API.
- The utterance status extraction unit 4 receives as input the text corresponding to the utterance, the voice waveform corresponding to the utterance, and the information regarding the length of the utterance sound generated by the voice recognition unit 1.
- the utterance status extraction unit 4 extracts the utterance status using the text corresponding to the utterance, the voice waveform corresponding to the utterance, and the information regarding the length of the utterance sound (step S4).
- the extracted utterance status is output to the response status determination unit 5.
- The utterance status is information about the situation of the utterance and includes at least, for example, the speaking speed of the utterance and the emotion of the person who made the utterance.
- the utterance situation may include the tone of the person who made the utterance.
- The speaking speed of an utterance is information on how fast the utterance is spoken, for example the number of characters or the number of phonemes uttered per unit time.
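The definition of speaking speed above can be sketched in a few lines. This is a minimal illustration, not the method of the patent: the function names, the "fast" category, and the threshold values are hypothetical, and the input is assumed to be either a total utterance length or a list of per-phoneme lengths in seconds.

```python
def speaking_rate(units: int, duration_seconds: float) -> float:
    """Return the number of units (characters or phonemes) per second."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return units / duration_seconds


def rate_from_phoneme_durations(phoneme_durations: list[float]) -> float:
    """Phonemes per second, given the length in seconds of each phoneme."""
    return speaking_rate(len(phoneme_durations), sum(phoneme_durations))


def rate_category(rate: float, slow_below: float = 4.0, fast_above: float = 8.0) -> str:
    """Map a rate to a coarse label. The thresholds are hypothetical."""
    if rate < slow_below:
        return "slow"
    if rate > fast_above:
        return "fast"
    return "normal"
```

A categorical label such as "slow" or "normal" is convenient when the speaking speed is later used as a key in a rule table.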
- Examples of emotions of the person who spoke are normal, joy, sadness, anger, calm, excitement, calm, depressed, anxious, apologetic, bright, dark, etc.
- The utterance situation extraction unit 4 determines the emotion of the person who made the utterance by classifying it into one of normal, joy, sadness, anger, calm, excitement, calmness, depression, anxiety, apologetic, bright, dark, and the like.
- the utterance situation extraction unit 4 may determine the emotions of the person who made the utterance by classifying them into one of normal, joy, sadness, and anger.
- the utterance situation extraction unit 4 may determine the emotion of the person who made the utterance by classifying it into any of calm, excited, calm, depressed, anxious, and apologetic.
- the utterance situation extraction unit 4 may determine the emotions of the person who made the utterance by classifying them into bright and dark.
- the utterance situation extraction unit 4 can determine the emotion of the person who made the utterance, for example, by the method described in Reference 2.
- the emotion of the person who made the utterance is determined based on, for example, the text corresponding to the utterance and the voice waveform corresponding to the utterance.
- the utterance status extraction unit 4 can determine the tone of the person who made the utterance, for example, by the method described in Reference 3.
- the tone of the person who made the utterance is determined based on, for example, the text corresponding to the utterance and the voice waveform corresponding to the utterance.
- The response status determination unit 5 receives as input the utterance status extracted by the utterance status extraction unit 4.
- the response status determination unit 5 determines the response status according to the utterance status (step S5).
- the determined response status is output to the voice synthesis unit 7.
- the response status determination unit 5 can determine the response status according to the input utterance status, for example, based on a predetermined rule.
- a predetermined rule is the conversion table shown in FIG.
- the conversion table in FIG. 3 shows only the response status corresponding to each of the three utterance situations.
- In the conversion table actually used by the response status determination unit 5, the response status corresponding to every utterance status is assumed to be defined.
- Alternatively, the response status determination unit 5 may determine the response status using the conversion table for the specific utterance statuses listed in the conversion table and, for other utterance statuses, may determine the utterance status itself as the response status output by the response status determination unit 5.
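A rule-based determination of this kind can be sketched as a dictionary lookup. The sketch below is only an illustration: the `Status` type, the function name, and every table entry are hypothetical (the actual contents of the conversion table are not reproduced here), and it assumes one possible reading of the fallback, namely that an utterance status absent from the table is passed through unchanged as the response status.

```python
from typing import NamedTuple


class Status(NamedTuple):
    speed: str
    emotion: str
    tone: str


# Hypothetical conversion table: utterance status -> response status.
CONVERSION_TABLE: dict[Status, Status] = {
    Status("slow", "anxiety", "polite"): Status("slow", "calm", "polite"),
    Status("normal", "joy", "spoken"): Status("normal", "joy", "spoken"),
    Status("fast", "anger", "spoken"): Status("normal", "apologetic", "polite"),
}


def determine_response_status(utterance: Status) -> Status:
    """Look up the response status; fall back to the utterance status itself."""
    return CONVERSION_TABLE.get(utterance, utterance)
```

A table that covers every combination of values needs no fallback, at the cost of one entry per combination.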
- the response status determination unit 5 may determine the response status by using a non-linear transformation using a neural network or the like.
- The dimension of the input layer of the neural network is the sum of the number of speaking speed types, the number of emotion types, and the number of tone types of the utterance.
- The dimension of the output layer of the neural network is the sum of the number of speaking speed types, the number of emotion types, and the number of tone types of the response.
- the number of intermediate layers (hidden layers) of the neural network is arbitrary.
- the number of dimensions of each intermediate layer (hidden layer) is also arbitrary.
- By adjusting the parameters of the neural network so that the output value produced by the neural network for a given input approaches the output of the corresponding response, a model that has learned the pattern of conversion between the input utterance status and the response status is generated.
- The parameters are adjusted so that the output nodes corresponding to a normal response speaking speed, a normal response emotion, and a polite response tone output 1 and the other output nodes output 0.
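The input and output layout described for the neural network can be illustrated with a small encoder and decoder. This is a sketch under assumptions: the label lists below are hypothetical examples (the "fast" speed in particular is not taken from the text), the network itself is omitted, and the target vector for training would be the encoded response status.

```python
# Hypothetical label sets; the vector dimension is the sum of their sizes.
SPEEDS = ["slow", "normal", "fast"]
EMOTIONS = ["normal", "joy", "sadness", "anger"]
TONES = ["polite", "spoken"]


def encode_status(speed: str, emotion: str, tone: str) -> list[float]:
    """Concatenate three one-hot groups into one vector of 0s and 1s."""
    vec = [0.0] * (len(SPEEDS) + len(EMOTIONS) + len(TONES))
    vec[SPEEDS.index(speed)] = 1.0
    vec[len(SPEEDS) + EMOTIONS.index(emotion)] = 1.0
    vec[len(SPEEDS) + len(EMOTIONS) + TONES.index(tone)] = 1.0
    return vec


def decode_status(outputs: list[float]) -> tuple[str, str, str]:
    """Pick the highest-scoring label within each group of output nodes."""
    def pick(labels: list[str], offset: int) -> str:
        group = outputs[offset:offset + len(labels)]
        return labels[max(range(len(group)), key=group.__getitem__)]

    return (
        pick(SPEEDS, 0),
        pick(EMOTIONS, len(SPEEDS)),
        pick(TONES, len(SPEEDS) + len(EMOTIONS)),
    )
```

With these label sets the vector has 3 + 4 + 2 = 9 dimensions, and a response with normal speed, normal emotion, and polite tone corresponds to a target in which exactly three nodes are 1.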
- the response sentence generation unit 6 generates a response sentence using the contents of the response (step S6).
- the generated response sentence is output to the voice synthesis unit 7.
- The response sentence generated by the response sentence generation unit 6 and the response status determined by the response status determination unit 5 are input to the speech synthesis unit 7.
- the voice synthesis unit 7 synthesizes the voice corresponding to the response sentence and in consideration of the response status (step S7).
- the synthesized voice is output from the dialogue device.
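The data flow of steps S1 to S7 can be summarized as a single function that wires the seven units together. This is a hedged sketch rather than the device itself: each unit is passed in as a plain callable, the `Recognition` type and all parameter names are hypothetical, and the stub units at the bottom return placeholder values only to show which output feeds which input.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Recognition:
    text: str                # text corresponding to the utterance
    waveform: list[float]    # voice waveform corresponding to the utterance
    duration_seconds: float  # information on the length of the utterance sound


def run_dialogue_turn(
    audio: list[float],
    recognize: Callable[[list[float]], Recognition],
    understand: Callable[[str], Any],
    manage: Callable[[Any], Any],
    extract_status: Callable[[str, list[float], float], Any],
    decide_status: Callable[[Any], Any],
    generate_sentence: Callable[[Any], str],
    synthesize: Callable[[str, Any], Any],
) -> Any:
    rec = recognize(audio)                              # S1
    utterance_content = understand(rec.text)            # S2
    response_content = manage(utterance_content)        # S3
    utterance_status = extract_status(
        rec.text, rec.waveform, rec.duration_seconds)   # S4
    response_status = decide_status(utterance_status)   # S5
    sentence = generate_sentence(response_content)      # S6
    return synthesize(sentence, response_status)        # S7


# Placeholder stubs standing in for the real units.
result = run_dialogue_turn(
    audio=[0.0, 0.1, -0.1],
    recognize=lambda a: Recognition("What is the weather tomorrow?", a, 2.0),
    understand=lambda text: {"act_type": "question", "attribute": "weather"},
    manage=lambda content: {"dialogue_type": "answer"},
    extract_status=lambda text, wav, dur: {"speed": "normal", "emotion": "normal"},
    decide_status=lambda status: status,
    generate_sentence=lambda content: "It will be sunny tomorrow.",
    synthesize=lambda sentence, status: (sentence, status),
)
```

The point of the sketch is the branching: the recognized text feeds both the language understanding path (S2, S3, S6) and the status path (S4, S5), and the two paths meet again only at synthesis (S7).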
- the response status determined by the response status determination unit 5 may include the tone of the response.
- the response sentence generation unit 6 may generate a response sentence in consideration of the tone of the response included in the response status determined by the response status determination unit 5.
- The response status determination unit 5 may determine the response status further according to at least one of the text corresponding to the utterance, the content of the utterance, the content of the response, and the information obtained until the dialogue management unit 3 determines the content of the response.
- the information obtained until the dialogue management unit 3 determines the content of the response is, for example, internal information in the dialogue management unit 3.
- An example of the conversion table, which is a predetermined rule used by the response status determination unit 5 to determine the response status based on the utterance dialogue type, which is the content of the utterance, and the response dialogue type, which is the content of the response, is shown in FIG.
- speaking speed: slow; emotion of the person who made the utterance: anxiety; tone of the person who made the utterance: polite; dialogue type of the utterance: question
- speaking speed: slow; emotion of the person who made the utterance: anxiety; tone of the person who made the utterance: polite; dialogue type of the utterance: question
- speaking speed: normal; emotion of the person who made the utterance: joy; tone of the person who made the utterance: spoken; dialogue type of the utterance: spoken; dialogue type of the utterance: greeting
- speaking speed: normal; emotion of the person who made the utterance: bright; tone: spoken; dialogue type of the utterance: spoken; dialogue type of the utterance: question
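When the dialogue types are also taken into account, the lookup key simply grows by two fields. The sketch below is illustrative only: the function name and all rule entries are hypothetical and are not taken from the figure, and an unmatched key is assumed to fall back to the utterance status itself.

```python
# Key: (speed, emotion, tone, utterance dialogue type, response dialogue type).
# Value: (response speed, response emotion, response tone). All entries hypothetical.
RULES: dict[tuple[str, str, str, str, str], tuple[str, str, str]] = {
    ("slow", "anxiety", "polite", "question", "answer"): ("slow", "calm", "polite"),
    ("slow", "anxiety", "polite", "question", "apology"): ("slow", "apologetic", "polite"),
    ("normal", "joy", "spoken", "greeting", "greeting"): ("normal", "joy", "spoken"),
}


def determine_with_dialogue_types(
    speed: str,
    emotion: str,
    tone: str,
    utterance_type: str,
    response_type: str,
) -> tuple[str, str, str]:
    """Return the response status for the full key, else echo the utterance status."""
    key = (speed, emotion, tone, utterance_type, response_type)
    return RULES.get(key, (speed, emotion, tone))
```

Including the response dialogue type in the key lets the same utterance status lead to different response statuses, for example a calm answer versus an apologetic apology.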
- Data may be exchanged between the constituent parts of the dialogue device either directly or via a storage unit (not shown).
- the program that describes this processing content can be recorded on a computer-readable recording medium.
- The computer-readable recording medium may be, for example, a magnetic recording device, an optical disk, a magneto-optical recording medium, a semiconductor memory, or the like.
- the distribution of this program is carried out, for example, by selling, transferring, renting, etc., portable recording media such as DVDs and CD-ROMs on which the program is recorded. Further, the program may be stored in the storage device of the server computer, and the program may be distributed by transferring the program from the server computer to another computer via a network.
- A computer that executes such a program first stores, for example, the program recorded on a portable recording medium or the program transferred from a server computer in its own storage device. When executing the processing, the computer reads the program stored in its own storage device and executes the processing according to the read program. As another execution form, the computer may read the program directly from the portable recording medium and execute the processing according to the program, or, each time the program is transferred from the server computer to this computer, the computer may sequentially execute the processing according to the received program. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to this computer.
- the program in this embodiment includes information to be used for processing by a computer and equivalent to the program (data that is not a direct command to the computer but has a property of defining the processing of the computer, etc.).
- the present device is configured by executing a predetermined program on the computer, but at least a part of these processing contents may be realized by hardware.
- 1 Speech recognition unit
- 2 Language understanding unit
- 3 Dialogue management unit
- 4 Utterance status extraction unit
- 5 Response status determination unit
- 6 Response sentence generation unit
- 7 Speech synthesis unit
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2019/046184 WO2021106080A1 (ja) | 2019-11-26 | 2019-11-26 | Dialogue apparatus, method and program |
| JP2021560806A JPWO2021106080A1 | 2019-11-26 | 2019-11-26 | |
| US17/779,528 US20230005467A1 (en) | 2019-11-26 | 2019-11-26 | Dialogue apparatus, method and program |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2019/046184 WO2021106080A1 (ja) | 2019-11-26 | 2019-11-26 | Dialogue apparatus, method and program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021106080A1 (ja) | 2021-06-03 |
Family
ID=76129403
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2019/046184 (Ceased) WO2021106080A1 (ja) | 2019-11-26 | 2019-11-26 | Dialogue apparatus, method and program |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20230005467A1 |
| JP (1) | JPWO2021106080A1 |
| WO (1) | WO2021106080A1 |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2023038957A (ja) * | 2021-09-08 | 2023-03-20 | Hitachi, Ltd. | Speech synthesis system and method for synthesizing speech |
| US20240242718A1 (en) * | 2021-05-24 | 2024-07-18 | Nippon Telegraph And Telephone Corporation | Dialogue apparatus, dialogue method, and program |
| US20250328727A1 (en) * | 2024-04-19 | 2025-10-23 | Augmented Reality Concepts, Inc. | Dialogue state tracking logic control layers |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
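| WO2021186525A1 (ja) * | 2020-03-17 | 2021-09-23 | Nippon Telegraph and Telephone Corporation | Utterance generation device, utterance generation method, and program |
| CN116682463A (zh) * | 2023-05-30 | 2023-09-01 | Guangdong University of Technology | Multimodal emotion recognition method and system |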
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2001272991A (ja) * | 2000-03-24 | 2001-10-05 | Sanyo Electric Co Ltd | Voice dialogue method and voice dialogue device |
| JP2004090109A (ja) * | 2002-08-29 | 2004-03-25 | Sony Corp | Robot apparatus and dialogue method for robot apparatus |
| JP2012128440A (ja) * | 2012-02-06 | 2012-07-05 | Denso Corp | Voice dialogue device |
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| DE60215296T2 (de) * | 2002-03-15 | 2007-04-05 | Sony France S.A. | Method and apparatus for speech synthesis, program, recording medium, method and apparatus for generating constraint information, and robot apparatus |
| KR100590553B1 (ko) * | 2004-05-21 | 2006-06-19 | Samsung Electronics Co., Ltd. | Method and apparatus for generating dialogue prosody structure, and speech synthesis system employing the same |
| US8036899B2 (en) * | 2006-10-20 | 2011-10-11 | Tal Sobol-Shikler | Speech affect editing systems |
| JPWO2010001512A1 (ja) * | 2008-07-03 | 2011-12-15 | Panasonic Corporation | Impression degree extraction device and impression degree extraction method |
| US10453479B2 (en) * | 2011-09-23 | 2019-10-22 | Lessac Technologies, Inc. | Methods for aligning expressive speech utterances with text and systems therefor |
| US9183830B2 (en) * | 2013-11-01 | 2015-11-10 | Google Inc. | Method and system for non-parametric voice conversion |
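| KR102222122B1 (ko) * | 2014-01-21 | 2021-03-03 | LG Electronics Inc. | Emotional speech synthesis apparatus, method of operating the emotional speech synthesis apparatus, and mobile terminal including the same |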
| US9824681B2 (en) * | 2014-09-11 | 2017-11-21 | Microsoft Technology Licensing, Llc | Text-to-speech with emotional content |
| JP2018049132A (ja) * | 2016-09-21 | 2018-03-29 | Toyota Motor Corporation | Voice dialogue system and voice dialogue method |
| US10902841B2 (en) * | 2019-02-15 | 2021-01-26 | International Business Machines Corporation | Personalized custom synthetic speech |
| KR20190098928A (ko) * | 2019-08-05 | 2019-08-23 | LG Electronics Inc. | Speech recognition method and apparatus |
-
2019
- 2019-11-26 JP JP2021560806A patent/JPWO2021106080A1/ja active Pending
- 2019-11-26 US US17/779,528 patent/US20230005467A1/en not_active Abandoned
- 2019-11-26 WO PCT/JP2019/046184 patent/WO2021106080A1/ja not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| US20230005467A1 (en) | 2023-01-05 |
| JPWO2021106080A1 | 2021-06-03 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| WO2021106080A1 (ja) | Dialogue apparatus, method and program | |
| EP3582119B1 (en) | Spoken language understanding system and method using recurrent neural networks | |
| McTear et al. | Conversational interfaces: Past and present | |
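| JP5286062B2 (ja) | Dialogue device, dialogue method, dialogue program, and recording medium | |
| CN107818798A (zh) | Customer service quality evaluation method, apparatus, device, and storage medium | |
| JP2022531994A (ja) | Generation and operation of artificial intelligence-based conversation systems | |
| JPH0981632A (ja) | Information disclosure device | |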
| US12243517B1 (en) | Utterance endpointing in task-oriented conversational systems | |
| Leite et al. | Semi-situated learning of verbal and nonverbal content for repeated human-robot interaction | |
| Wilks et al. | Some background on dialogue management and conversational speech for dialogue systems | |
| CN114220425B (zh) | Chatbot system and dialogue method based on speech recognition and the Rasa framework | |
| Panda et al. | An efficient model for text-to-speech synthesis in Indian languages | |
| JP2024129098A (ja) | Program and information processing method | |
| Young et al. | Evaluation of statistical POMDP-based dialogue systems in noisy environments | |
| US11449726B1 (en) | Tailored artificial intelligence | |
| Nishimura et al. | A spoken dialog system for chat-like conversations considering response timing | |
| de Bayser et al. | Ravel: a mas orchestration platform for human-chatbots conversations | |
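| JP7581651B2 (ja) | Conversation control method, device, and program | |
| KR102695585B1 (ko) | Conversational artificial intelligence system and artificial intelligence server capable of emotional communication | |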
| Patel et al. | Google duplex-a big leap in the evolution of artificial intelligence | |
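| JP2002358304A (ja) | Conversation control system | |
| CN119204225A (zh) | Interaction method, interaction system, computing device, and readable storage medium | |
| JP2017194510A (ja) | Acoustic model learning device, speech synthesis device, methods thereof, and program | |
| JP2025164125A (ja) | Control device, robot system, control method, and control program | |
| KR20250074195A (ko) | Conversational artificial intelligence server providing a voice data platform |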
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19954280; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2021560806; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 19954280; Country of ref document: EP; Kind code of ref document: A1 |