WO2016175354A1 - Artificial intelligence conversation device and method - Google Patents

Artificial intelligence conversation device and method

Info

Publication number
WO2016175354A1
WO2016175354A1 · PCT/KR2015/004347
Authority
WO
WIPO (PCT)
Prior art keywords
response
voice
user
conversation
question
Prior art date
Application number
PCT/KR2015/004347
Other languages
French (fr)
Korean (ko)
Inventor
이영근
김승곤
임완섭
임성환
김우현
이영호
김두호
Original Assignee
AKA Intelligence Co., Ltd. (주식회사 아카인텔리전스)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AKA Intelligence Co., Ltd. (주식회사 아카인텔리전스)
Priority to PCT/KR2015/004347 priority Critical patent/WO2016175354A1/en
Publication of WO2016175354A1 publication Critical patent/WO2016175354A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Abstract

The present invention relates to an artificial intelligence conversation device and method for supporting a conversation between a human and a robot. An artificial intelligence conversation device according to an aspect of the present invention comprises: an input answer analysis unit which analyzes an answer input from a user; a response control unit which selects at least one response scenario among preset scenarios according to the result of the analysis and transmits an output command for a response to the user's answer and a follow-up question; and an output unit which outputs silence or a conversation start voice and outputs a response voice and a question voice according to the output command from the response control unit.

Description

Artificial Intelligence Conversation Apparatus and Method

The present invention relates to an artificial intelligence dialogue apparatus and method for supporting a dialogue between a person and a robot.

Chat supports a conversation with another party over a network using a computer or portable terminal, and it is widely used in the form of messenger chat windows.

However, a chat between people cannot take place when no conversation partner is available, and this limitation led to the development of chat robots.

As the need for natural-language communication between humans and computers (robots) has grown in the field of intelligent agents, various chat robot technologies have been proposed.

A conversation engine according to the related art provides a preset answer corresponding to the text input by the user, so the subject of the conversation shifts abruptly with each user input.

Although a natural-language dialogue engine between humans and robots is arguably the most important factor in minimizing the sense of unnaturalness in talking to a robot and enabling natural dialogue, the conventional technology is a passive dialogue engine that simply answers based on the user's input. It not only feels unnatural to the user but also requires the user to drive the conversation, so the flow of the conversation and the user's interest in it drop sharply.

The present invention has been proposed to solve the above-described problems. By proceeding through the dialogue in the order of sending a question, receiving an answer, responding to the answer, and sending the next question, and by inducing the user toward the next exchange within the current subject, an object of the present invention is to provide an artificial intelligence conversation device and method that support a natural conversation with a user without straying from the topic.

According to an aspect of an exemplary embodiment, an artificial intelligence conversation device includes an input response analysis unit that analyzes an input user response; a response control unit that selects at least one response scenario among preset scenarios according to the analysis result and transmits an output command for a response to the user response and a follow-up question; and an output unit that outputs silence or a conversation start voice and outputs a response voice and a question voice according to the output command of the response control unit.

The artificial intelligence conversation apparatus and method according to the present invention actively advance a conversation in the order of question transmission, user response reception, and response to the user response based on a preset scenario. Rather than merely providing a predetermined answer to each user input, this leads to an active conversation, minimizing the user's sense of unnaturalness when conversing with the conversation engine and increasing interest in the conversation.

By classifying the answers received from users by type and organizing the components belonging to each answer, the reliability of user-answer analysis is improved, and by providing a response that matches the user's answer, the conversation can proceed flexibly to its next turn.

The effects of the present invention are not limited to those mentioned above, and other effects that are not mentioned will be clearly understood by those skilled in the art from the following description.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an artificial intelligence conversation apparatus according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating an artificial intelligence conversation method according to an embodiment of the present invention.

DETAILED DESCRIPTION

The above and other objects, advantages, and features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described below in detail in conjunction with the accompanying drawings.

However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms. The following embodiments are provided merely so that those skilled in the art to which the present invention pertains can easily understand its configuration and effects; the scope of the present invention is defined by the claims.

Meanwhile, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. In this specification, the singular also includes the plural unless the context clearly indicates otherwise. As used herein, "comprises" and/or "comprising" specifies the presence of the mentioned components, steps, operations, and/or elements, but does not preclude the presence or addition of one or more other components, steps, operations, and/or elements.

FIG. 1 is a block diagram illustrating an artificial intelligence conversation apparatus according to an embodiment of the present invention.

The artificial intelligence conversation apparatus according to an embodiment of the present invention includes an input unit 100 that receives a voice from the user's utterance; a Speech-To-Text (STT) unit 200 that converts the voice received by the input unit 100 into text; an input response analysis unit 300 that receives the STT conversion result and analyzes the user response; a response control unit 400 that selects at least one response scenario among preset scenarios according to the analysis result and transmits an output command for a response to the user response and a follow-up question; and an output unit 500 that outputs silence or a conversation start voice and outputs a response voice and a question voice according to the output command of the response control unit 400.
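
The five-unit pipeline described above (input unit 100 → STT unit 200 → input response analysis unit 300 → response control unit 400 → output unit 500) can be sketched as follows. This is an illustrative sketch only: the class names, the keyword-based analysis, and the scenario table are assumptions, and speech capture and recognition are stubbed with plain strings.

```python
class SttUnit:
    """STT unit 200: converts captured audio into a text string (stubbed)."""
    def convert(self, audio):
        return audio  # stub: assume `audio` is already the recognized text

class InputAnswerAnalyzer:
    """Input response analysis unit 300: classifies the user's answer."""
    def analyze(self, text):
        positives = {"yes", "sure", "i did"}
        return "positive" if text.lower() in positives else "other"

class ResponseController:
    """Response control unit 400: picks a scenario, issues an output command."""
    def __init__(self, scenarios):
        self.scenarios = scenarios
    def command(self, analysis):
        response, question = self.scenarios.get(analysis, self.scenarios["other"])
        return {"response": response, "question": question}

class OutputUnit:
    """Output unit 500: renders the response voice and the next question."""
    def emit(self, cmd):
        return f'{cmd["response"]} {cmd["question"]}'

def converse_once(audio, stt, analyzer, controller, output):
    """One turn: STT conversion -> analysis -> scenario command -> output."""
    text = stt.convert(audio)
    analysis = analyzer.analyze(text)
    return output.emit(controller.command(analysis))

scenarios = {
    "positive": ("Glad to hear it.", "What went well today?"),
    "other": ("I see.", "Tell me more?"),
}
reply = converse_once("yes", SttUnit(), InputAnswerAnalyzer(),
                      ResponseController(scenarios), OutputUnit())
```

Note that a response and a follow-up question are always emitted together, which is what keeps the device, rather than the user, driving the conversation.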

The input unit 100 according to an embodiment of the present invention receives the user's voice through the microphone of the artificial intelligence conversation device.

The artificial intelligence conversation apparatus according to an embodiment of the present invention performs, in order, the output of a question to the user, the input of an answer from the user, the output of a response to that answer, and the output of the next question following the response.

The question according to an embodiment of the present invention is provided at the beginning of a conversation with a user and is expressed as a conversation start voice.

Here, either the conversation start voice is the first question provided through the output unit, or, when silence is output instead, the user's first voice input starts the conversation, which then proceeds in the order of response voice output and question voice output.

That is, rather than only providing a preset answer matching a query input from the user, the apparatus asks the user a question based on a preset scenario, analyzes the user's response to it, and outputs a response and the next question, thereby supporting a natural dialogue between the user and the AI within a single conversation subject.

The output unit 500 according to an embodiment of the present invention outputs a conversation start voice, i.e., a question voice for starting a conversation, based on the application execution environment information before the conversation begins.

Here, in the first embodiment, a conversation start voice, which is a question voice, is output, and the conversation proceeds in the order of the user's answer, response voice output, and question voice output. In the second embodiment, silence is output, the user's voice input becomes the starting point of the conversation, and the conversation proceeds in the order of response voice output and question voice output.

In this case, the application execution environment information may be at least one of a structured scenario database, the user's personal information, the user's behavior pattern, a record of previous conversations, and surrounding environment information. For example, if the record of previous conversations concerns a project at the user's company, the apparatus outputs the question "How did the project go today?" when the application runs.

Also, if the application execution environment information indicates "weekend" and the weather information is "sunny," the output unit 500 outputs, as a conversation-starting question unrelated to the company, a question such as "It's the weekend; isn't the weather nice?"

That is, the output unit 500 according to an embodiment of the present invention does not merely provide a predetermined answer based on the user's input; it presents the user with a question on an appropriate topic when the application is executed, starting the conversation naturally and providing a customized conversation.
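
The selection of a conversation-start question from the application execution environment information, as in the weekday-project and sunny-weekend examples above, might be sketched like this. The dictionary keys and question strings are illustrative assumptions, not the patent's actual data model; returning `None` stands for the silence-output embodiment.

```python
def opening_question(env):
    """Pick a conversation-start question from execution environment info.

    `env` keys ("previous_topic", "day_type", "weather") are illustrative.
    Returns None when no suitable question is found, i.e. output silence
    and wait for the user's first utterance to start the conversation.
    """
    if env.get("previous_topic") == "company project":
        return "How did the project go today?"
    if env.get("day_type") == "weekend" and env.get("weather") == "sunny":
        return "It's the weekend and the weather is nice, isn't it?"
    return None

q1 = opening_question({"previous_topic": "company project"})
q2 = opening_question({"day_type": "weekend", "weather": "sunny"})
q3 = opening_question({})
```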

In addition, the response control unit 400 according to an embodiment of the present invention can not only command the selection and output of a response and question for the user response from a pre-stored list, but also generate and output a new response and question for the user response.

The input unit 100 according to an embodiment of the present invention receives the user's voice input in response to the conversation start voice, or after the silence output, and the STT unit 200 provides the result of converting the user's voice into a character string to the input response analysis unit 300.

The input response analysis unit 300 analyzes whether the user answer converted into a character string corresponds to one of the predetermined answer types.

The structured scenario database according to an embodiment of the present invention stores and manages answers by type: selective answers, general answers, repeat-request answers, and irrelevant answers.

A selective answer is a type in which the classification of the user's answer to the question is clearly defined by the choice made; examples include positive/negative and spring/summer/fall/winter.

A general answer, unlike a selective answer, is a type with ambiguity and many possible choices for a question, such as an answer to the question "What kind of exercise do you like?"

A repeat-request answer is one in which the user asks for the question to be output again. In this case, the output unit 500 re-outputs the question that was just output.

An irrelevant answer is one unrelated to the question, for example an answer having nothing to do with the question "What weather do you like?" In this case, the artificial intelligence conversation device according to an embodiment of the present invention may modify the scenario based on the user's answer and extract and provide responses and questions in sequence, or it may ask the user again a question in the category corresponding to the original question.

The input response analysis unit 300 determines which of the predetermined categories the input user answer belongs to and outputs the result. For example, when the answer corresponds to the selective type, sentence analysis determines whether the input user response is affirmative or negative with respect to the question.
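
The four answer types above (selective, general, repeat-request, irrelevant) could be distinguished roughly as follows. The keyword lists and the substring-based topic check are simplifying assumptions standing in for the patent's sentence analysis; they are not its actual rules.

```python
# Illustrative keyword sets (assumptions, not the patent's classification rules).
POSITIVE = {"yes", "yeah", "sure"}
NEGATIVE = {"no", "nope", "not really"}
REPEAT = {"what?", "pardon?", "say that again"}

def classify_answer(question_topic, answer):
    """Return (answer_type, detail) for a user answer to a question.

    answer_type is one of "selective", "general", "repeat", "irrelevant";
    detail is "positive"/"negative" for selective answers, else None.
    """
    a = answer.lower().strip()
    if a in REPEAT:
        return ("repeat", None)            # output unit re-outputs the question
    if a in POSITIVE:
        return ("selective", "positive")
    if a in NEGATIVE:
        return ("selective", "negative")
    if question_topic and question_topic in a:
        return ("general", None)           # open-ended but on-topic answer
    return ("irrelevant", None)            # unrelated: re-ask or switch scenario

r1 = classify_answer("exercise", "yes")
r2 = classify_answer("exercise", "I like exercise, mostly swimming")
r3 = classify_answer("exercise", "pardon?")
r4 = classify_answer("exercise", "I have a wedding this weekend")
```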

The response control unit 400 according to an embodiment of the present invention selects at least one response scenario from among the scenarios stored in the structured scenario database according to the analysis result, and generates an output command for a response and a question based on the response scenario matching the subject to which the user answer applies.

By extracting and providing responses and questions based on scenarios so as to lead into the next exchange according to the user's answer, the user can hold a natural conversation with the AI conversation device that corresponds to his or her answers, without a sense of unnaturalness.

When new voice data is received from the user while the output unit 500 is outputting the response voice and the question voice, the response control unit 400 transmits a pause command signal to the output unit 500 and then re-extracts the response and question according to the analysis result of the input response analysis unit 300 for the new voice data.

That is, the response scenario selected by the response control unit 400 according to an embodiment of the present invention is determined and modified in real time according to the user's response or comment, and the response scenario is appropriately corrected based on the structured scenario database.

For example, suppose the first question was "How was work today?" and, while the user was talking about what happened at the company, the user answered, "But I have to go to a wedding this weekend." If it is determined that the user intends to change the topic, the response control unit 400 changes the scenario (e.g., it asks a first question on the new topic, such as "Is your friend getting married? Where is the ceremony?", and continues the conversation into the specific event of attending the wedding).
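
The barge-in behaviour above, pausing output when new voice data arrives and re-extracting a response and question on a possibly new topic, might look like this in outline. The class name and the keyword-based topic-change detection are stand-in assumptions for whatever analysis the patent contemplates.

```python
class BargeInController:
    """Sketch of response control unit 400's barge-in handling (assumed names)."""
    def __init__(self):
        self.output_paused = False
        self.current_topic = "company"

    def on_new_voice(self, text):
        """Handle voice data arriving mid-output: pause, maybe switch topic."""
        self.output_paused = True       # pause command signal to output unit 500
        if "wedding" in text.lower():   # crude topic-change detection (assumption)
            self.current_topic = "wedding"
            return "Is your friend getting married? Where is the ceremony?"
        return "Go on, I'm listening."  # stay in the current scenario

ctrl = BargeInController()
reply = ctrl.on_new_voice("But I have to go to a wedding this weekend")
```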

The output unit 500 according to an embodiment of the present invention outputs text corresponding to the response voice and the question voice on the screen. Accordingly, even in a noisy environment where the user cannot properly hear the voice from the output unit 500, the user can recognize the response and the question from the text output on the screen and continue the conversation by uttering an answer.

FIG. 2 is a flowchart illustrating an artificial intelligence conversation method according to an embodiment of the present invention.

An AI conversation method according to an embodiment of the present invention includes outputting a conversation start voice, which may be silence; receiving a user response (S200); analyzing the user response and selecting at least one response scenario from among predetermined scenarios according to the result; extracting a response and a question based on the response scenario; and outputting a response voice and a question voice according to the extracted response and question (S400).

Step S100 according to an embodiment of the present invention outputs a question voice corresponding to a conversation start question, or silence, to begin the conversation. The conversation start question is extracted and output according to the application execution environment information, which is at least one of a structured scenario database, the user's personal information, the user's behavior pattern, and the record of previous conversations.

That is, based on environment information such as the user's personal information, the date and time, and the history of previous conversations, a first question corresponding to a conversation subject likely to interest the user is extracted and output, so that the apparatus actively starts the conversation.

Alternatively, when silence is output in step S100, the dialogue proceeds from the user's spoken answer input in step S200, in the order of response and question.

Step S200 according to an embodiment of the present invention converts the user's spoken answer into text and provides the sentence for analysis of the user's answer.

Step S300 according to an embodiment of the present invention performs analysis by determining which of the answer types the user answer corresponds to. According to the present invention, the dialogue proceeds in the order of first question, user's answer, response to the user's answer, and question following the response (or, when silence is output: user's voice input, response to the user's voice, and question following the response). Determining which of the preset answer types (e.g., selective answer, general answer, repeat-request answer, irrelevant answer) the user's answer falls into serves as the basis for this flow.

Step S400 according to an embodiment of the present invention outputs the text corresponding to the response voice and the question voice on the screen, providing the user with the response text and question text visually as well as audibly, thereby supporting more accurate recognition by the user.

Step S600 according to an embodiment of the present invention determines whether the conversation should end based on termination criteria: when the user is confirmed to have said goodbye, when the user does not answer for a predetermined time or longer, or when there is no reply from the user to the output unit's voice calling the user. When a termination criterion is met, the conversation is ended; otherwise, steps S200 through S500 are repeated.
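
The overall flow of FIG. 2 (steps S100 through S600) can be sketched as a loop with termination criteria. Here a missing answer within the timeout is modelled as `None`, and the greeting, fallback responses, and termination rules are simplifying assumptions.

```python
def run_conversation(answers, max_silence=2):
    """Run the S100-S600 loop over a scripted list of user answers.

    `answers` is an iterable of user utterances; None models "no answer
    within the timeout". Returns the transcript of device outputs.
    """
    transcript = ["Hello! How was your day?"]  # S100: conversation start voice
    silent = 0
    for ans in answers:                        # S200: receive user answer
        if ans is None:                        # no answer within the timeout
            silent += 1
            if silent >= max_silence:
                break                          # S600: termination criterion met
            transcript.append("Are you still there?")  # call out to the user
            continue
        silent = 0
        if "goodbye" in ans.lower():           # S600: user says goodbye
            transcript.append("Goodbye! Talk to you soon.")
            break
        # S300/S400: analyze, then output response voice + next question voice
        transcript.append("I see. What else happened?")
    return transcript

t = run_conversation(["It was fine", "goodbye"])
```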

The embodiments of the present invention have been described above. Those skilled in the art will appreciate that the present invention can be implemented in modified forms without departing from its essential features. Therefore, the disclosed embodiments should be considered in a descriptive sense only and not for purposes of limitation. The scope of the present invention is indicated by the claims rather than by the foregoing description, and all differences within that scope should be construed as being included in the present invention.

Claims (12)

  1. An artificial intelligence conversation device comprising:
    an input response analysis unit for analyzing an input user response;
    a response control unit for selecting at least one response scenario among preset scenarios according to an analysis result and transmitting an output command for a response to the user response and a question; and
    an output unit for outputting silence or a conversation start voice and outputting a response voice and a question voice according to the output command of the response control unit.
  2. The artificial intelligence conversation device of claim 1, wherein the output unit outputs a conversation start voice, which is a question voice for starting a conversation, extracted based on application execution environment information.
  3. The artificial intelligence conversation device of claim 2, wherein the output unit extracts the conversation start voice according to the application execution environment information, which is at least one of a structured scenario database, the user's personal information, the user's behavior pattern, and a record of previous conversations.
  4. The artificial intelligence conversation device of claim 1, wherein the input response analysis unit receives a result of converting into a character string the user's voice input in response to the conversation start voice output, or the user's voice input after the silence output, and performs an analysis by determining whether the user response corresponds to one of preset response types.
  5. The artificial intelligence conversation device of claim 4, wherein the response control unit selects at least one response scenario from among scenarios stored in a structured scenario database according to the analysis result, and transmits an output command for a response and a question based on the response scenario according to the subject to which the user answer corresponds.
  6. The artificial intelligence conversation device of claim 1, wherein the response control unit transmits a pause command signal to the output unit when new voice data is received from the user during output of the response voice and the question voice by the output unit, and re-extracts a response and a question according to the analysis result of the input response analysis unit for the new voice data.
  7. The artificial intelligence conversation device of claim 1, wherein the output unit outputs text corresponding to the response voice and the question voice on a screen.
  8. An artificial intelligence conversation method comprising:
    (a) outputting silence or a conversation start voice;
    (b) receiving a user response following the silence or conversation start voice output;
    (c) analyzing the user response, selecting at least one response scenario among preset scenarios based on the result, and extracting a response and a question based on the response scenario; and
    (d) outputting a response voice and a question voice according to the extracted response and question.
  9. The artificial intelligence conversation method of claim 8, wherein step (a) outputs a conversation start voice according to application execution environment information, which is at least one of a structured scenario database, the user's personal information, the user's behavior pattern, and a record of previous conversations.
  10. The artificial intelligence conversation method of claim 8, wherein step (b) converts the user response input by voice into text.
  11. The artificial intelligence conversation method of claim 8, wherein step (c) performs an analysis by determining which of the answer types the user answer corresponds to.
  12. The artificial intelligence conversation method of claim 8, wherein step (d) outputs text corresponding to the response voice and the question voice on a screen.
PCT/KR2015/004347 2015-04-29 2015-04-29 Artificial intelligence conversation device and method WO2016175354A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/KR2015/004347 WO2016175354A1 (en) 2015-04-29 2015-04-29 Artificial intelligence conversation device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/KR2015/004347 WO2016175354A1 (en) 2015-04-29 2015-04-29 Artificial intelligence conversation device and method

Publications (1)

Publication Number Publication Date
WO2016175354A1 true WO2016175354A1 (en) 2016-11-03

Family

ID=57199748

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2015/004347 WO2016175354A1 (en) 2015-04-29 2015-04-29 Artificial intelligence conversation device and method

Country Status (1)

Country Link
WO (1) WO2016175354A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20020010226A (en) * 2000-07-28 2002-02-04 정명수 Internet Anything Response System
WO2014010879A1 (en) * 2012-07-09 2014-01-16 엘지전자 주식회사 Speech recognition apparatus and method
WO2014088377A1 (en) * 2012-12-07 2014-06-12 삼성전자 주식회사 Voice recognition device and method of controlling same
US20140222436A1 (en) * 2013-02-07 2014-08-07 Apple Inc. Voice trigger for a digital assistant


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KOREA CREATIVE CONTENT AGENCY: "This Month's Issue, Trend and Prospect of Speech Recognition Technology", CULTURE TECHNOLOGY(CT) IN-DEPTH STUDY, November 2011 (2011-11-01), Retrieved from the Internet <URL:https://www.kocca.kr/knowledge/publication/ct/ksFiles/afieldfile/2011/12/07/87NEmyIcVWMc.pdf> *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10417344B2 (en) 2014-05-30 2019-09-17 Apple Inc. Exemplar-based natural language processing
US10390213B2 (en) 2014-09-30 2019-08-20 Apple Inc. Social reminders
US10438595B2 (en) 2014-09-30 2019-10-08 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10529332B2 (en) 2015-03-08 2020-01-07 Apple Inc. Virtual assistant activation
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
WO2019000326A1 (en) * 2017-06-29 2019-01-03 Microsoft Technology Licensing, Llc Generating responses in automated chatting
US10496705B1 (en) 2018-06-03 2019-12-03 Apple Inc. Accelerated task performance
US10504518B1 (en) 2018-06-03 2019-12-10 Apple Inc. Accelerated task performance

Similar Documents

Publication Publication Date Title
RU2352979C2 (en) Synchronous comprehension of semantic objects for highly active interface
US8719015B2 (en) Dialogue system and method for responding to multimodal input using calculated situation adaptability
US9715875B2 (en) Reducing the need for manual start/end-pointing and trigger phrases
RU2349969C2 (en) Synchronous understanding of semantic objects realised by means of tags of speech application
US10079013B2 (en) Sharing intents to provide virtual assistance in a multi-person dialog
US10096316B2 (en) Sharing intents to provide virtual assistance in a multi-person dialog
US20050192730A1 (en) Driver safety manager
JP2005321817A (en) Method and apparatus for obtaining combining information from speech signals for adaptive interaction in teaching and testing
JP5405672B2 (en) Foreign language learning apparatus and dialogue system
WO2013085320A1 (en) Method for providing foreign language acquirement and studying service based on context recognition using smart device
US8265933B2 (en) Speech recognition system for providing voice recognition services using a conversational language model
KR101860281B1 (en) Systems and methods for haptic augmentation of voice-to-text conversion
US20030130847A1 (en) Method of training a computer system via human voice input
WO2012115324A1 (en) Conversation management method, and device for executing same
EP1494499A2 (en) Ideal transfer of call handling from automated systems to human operators
US10068575B2 (en) Information notification supporting device, information notification supporting method, and computer program product
US8371857B2 (en) System, method and device for language education through a voice portal
US20020128840A1 (en) Artificial language
DE69935909T2 (en) Device for processing information
US9070369B2 (en) Real time generation of audio content summaries
US9495350B2 (en) System and method for determining expertise through speech analytics
JP2006171719A (en) Interactive information system
CN102439661A (en) Service oriented speech recognition for in-vehicle automated interaction
WO2005122145A1 (en) Speech recognition dialog management
KR100906136B1 (en) Information processing robot

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15890795

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14/02/2018)

122 Ep: pct application non-entry in european phase

Ref document number: 15890795

Country of ref document: EP

Kind code of ref document: A1