WO2020196977A1 - User persona-based interactive agent device and method

Info

Publication number: WO2020196977A1
Application number: PCT/KR2019/004267
Authority: WO
Inventors: 이수영; 김태훈; 김태호; 신영훈; 최신국; 박성진
Original assignee: 한국과학기술원
Priority date: 2019-03-26
Filing date: 2019-04-10
Publication date: 2020-10-01
Also published as: KR20200113775A; KR102199928B1

Abstract

A user persona-based interactive agent device and method according to various embodiments can be configured so as to acquire a user's persona, set a persona corresponding to the acquired persona, and execute an emotive interaction mode with the user on the basis of the set persona.

Description

Interactive agent device and method considering user persona

Various embodiments relate to an apparatus and method for an interactive agent in consideration of a user persona.

Today, interactive agent devices naturally communicate with users. At this time, the interactive agent device provides various information in response to a user's query. This interactive agent device has characteristics that guarantee anonymity and confidentiality. However, the interactive agent device as described above only performs a knowledge-based conversation. For this reason, there is a demand to emotionally communicate with the interactive agent device.

Various embodiments provide an interactive agent apparatus and method capable of emotionally communicating with a user.

Various embodiments provide an interactive agent apparatus and method capable of emotionally communicating with a user by setting a persona corresponding to the user's persona.

An operation method of an interactive agent device according to various embodiments may include an operation of identifying a persona of a user, an operation of setting a persona corresponding to the identified persona, and an operation of executing an emotional conversation mode with the user based on the set persona.

The interactive agent apparatus according to various embodiments may include an input module and a processor connected to the input module and configured to perform an emotional conversation mode with a user. According to various embodiments, the processor may be configured to identify a persona of the user, set a persona corresponding to the identified persona, and execute an emotional conversation mode with the user based on the set persona.

According to various embodiments, the interactive agent device may execute an emotional conversation mode with a user based on a persona corresponding to a persona of the user among various personas. That is, the interactive agent device can select a persona suitable for the user from various personas. At this time, since the interactive agent device can recognize the user's emotional state from the user's utterance data, it is possible to execute an emotional conversation mode with the user. In addition, the interactive agent device may execute the conversation mode while inducing the user's emotional state in a positive direction. That is, the interactive agent device can induce the user's emotional state in a positive direction by executing the personalized emotional conversation mode for the user.

FIG. 1 is a diagram illustrating an interactive agent device according to various embodiments.

FIG. 2 is a diagram illustrating a method of operating an interactive agent device according to various embodiments.

FIG. 3 is a diagram illustrating an operation of executing the emotional conversation mode of FIG. 2.

Hereinafter, various embodiments of the present document will be described with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating an emotional interactive agent device 100 according to various embodiments.

Referring to FIG. 1, the interactive agent device 100 according to various embodiments may include at least one of an input module 110, an output module 120, a memory 130, or a processor 140.

The input module 110 may receive commands or data to be used by components of the interactive agent device 100 from outside the interactive agent device 100. The input module 110 may include at least one of an input device configured to directly input commands or data to the interactive agent device 100, or a communication device configured to receive commands or data by communicating with an external electronic device by wire or wirelessly. For example, the input device may include at least one of a microphone, a mouse, a keyboard, or a camera. For example, the communication device may include at least one of a wired communication device or a wireless communication device, and the wireless communication device may include at least one of a short-range communication device or a long-distance communication device. According to various embodiments, the input module 110 may receive utterance data of a user.

The output module 120 may provide information to the outside of the interactive agent device 100. The output module 120 may include at least one of an audio output device configured to audibly output information, a display device configured to visually output information, or a communication device configured to transmit information by wired or wireless communication with an external electronic device. For example, the communication device may include at least one of a wired communication device or a wireless communication device, and the wireless communication device may include at least one of a short-range communication device or a long-distance communication device.

The memory 130 may store data used by components of the interactive agent device 100. The data may include input data or output data for a program or a command related thereto. For example, the memory 130 may include at least one of a volatile memory or a nonvolatile memory. According to various embodiments, the memory 130 may store a program for executing an interactive mode with a user, and may store various personas related to the interactive mode.

The processor 140 may execute a program in the memory 130 to control components of the interactive agent device 100 and perform data processing or calculation. According to various embodiments, the processor 140 may execute a conversation mode with a user using an artificial neural network structure. For example, the processor 140 may set any one of the personas and execute an emotional conversation mode with the user based on the set persona. To this end, the processor 140 may identify the user's persona and select one corresponding to the user's persona from the personas. The processor 140 may determine the persona of the user based on the user's speech data input through the input module 110. For example, the emotional conversation mode may include a conversation mode for psychological counseling of a user. The processor 140 may execute an emotional conversation mode with the user based on a persona for psychological counseling of the user. The processor 140 may track a change in the user's emotional state based on the user's speech data input through the input module 110 while executing the emotional conversation mode with the user. The processor 140 may evaluate the influence of the user's emotional state change while executing the conversation mode with the user. Through this, the processor 140 may proceed in a conversation mode with the user while inducing the user's emotional state in a positive direction.

The interactive agent device 100 according to various embodiments may include an input module 110 and a processor 140 connected to the input module 110 and configured to perform an emotional conversation mode with a user.

According to various embodiments, the processor 140 may be configured to recognize a persona of a user, set a persona corresponding to the identified persona, and execute an emotional conversation mode with the user based on the set persona.

According to various embodiments, the processor 140 may be configured to analyze the user's utterance data to determine at least one of the user's utterance intention, the user's emotional state, or characteristic information related to the user, and to identify the persona based on at least one of the utterance intention, the emotional state, or the characteristic information.

According to various embodiments, the processor 140 may be configured to track a change in the user's emotional state based on the user's speech data while executing the emotional conversation mode.

According to various embodiments, the processor 140 may be configured to evaluate an influence on a change in an emotional state while executing the emotional conversation mode.

According to various embodiments, the processor 140 may be configured to obtain a speech context from speech data of a user and output response data corresponding to the speech context while executing the emotional conversation mode.

According to various embodiments, the processor 140 may be configured to output response data based on the speech context and influence.

According to various embodiments, the emotional conversation mode may include a conversation mode for psychological counseling.

According to various embodiments, the characteristic information may include at least one of the user's age or gender.

FIG. 2 is a diagram illustrating a method of operating an interactive agent device 100 according to various embodiments.

Referring to FIG. 2, the interactive agent device 100 may detect input data in operation 210. The processor 140 may detect user input data through the input module 110. According to an embodiment, the processor 140 may directly detect input data through an input device. According to another embodiment, the processor 140 may detect input data received from an external electronic device through a communication device. For example, the input data may include user's speech data.

The interactive agent device 100 may determine the persona of the user in operation 220. The processor 140 may determine the user's persona based on the user's input data. The processor 140 may analyze the input data and extract at least one feature point related to at least one of a user's voice, video, or text. Through this, the processor 140 may check at least one of a user's intention, a user's emotional state, or characteristic information related to the user based on the feature point of the input data. For example, the processor 140 may check the user's intention by performing sentence classification of text. For example, the characteristic information may include at least one of the user's age or gender. Through this, the processor 140 may determine the persona of the user based on at least one of the user's intention, the user's emotional state, or characteristic information related to the user.
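The persona determination of operation 220 can be pictured with a minimal sketch. The source only says that intention, emotional state, and characteristic information (age, gender) may feed the user persona and that sentence classification may be used; the keyword tables, the `UserPersona` type, and the function names below are hypothetical stand-ins for whatever trained classifiers an actual implementation would use.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword tables. The source does not say how intention or
# emotion is classified; a real system would likely use trained models.
INTENT_KEYWORDS = {
    "seek_advice": ("what should i", "how can i", "help"),
    "vent": ("i feel", "i am so", "i'm so"),
}
EMOTION_KEYWORDS = {
    "sad": ("sad", "depressed", "lonely"),
    "angry": ("angry", "furious", "annoyed"),
    "happy": ("happy", "glad", "relieved"),
}


@dataclass(frozen=True)
class UserPersona:
    """Assumed container for the three persona ingredients named in the text."""
    intent: str
    emotion: str
    age: Optional[int] = None
    gender: Optional[str] = None


def _classify(text, table, default):
    """Return the first label whose keyword occurs in the text, else default."""
    lowered = text.lower()
    for label, keywords in table.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return default


def identify_persona(utterance, age=None, gender=None):
    """Combine intention, emotional state, and characteristic information."""
    return UserPersona(
        intent=_classify(utterance, INTENT_KEYWORDS, "chat"),
        emotion=_classify(utterance, EMOTION_KEYWORDS, "neutral"),
        age=age,
        gender=gender,
    )


persona = identify_persona("I feel so lonely these days", age=24)
```

Here the example utterance yields the intent label `vent` and the emotion label `sad`; both labels are illustrative names, not categories taken from the source.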

According to an embodiment, the interactive agent device 100 may detect user input data while executing an arbitrary chat mode with the user. In this case, the interactive agent device 100 may also be executing an emotional conversation mode with a user based on a preset persona among various personas. The interactive agent device 100 may use the sentence embedding technique while executing the emotional conversation mode with the user. For example, the sentence embedding technique may include a self-attentive sentence embedding technique. In addition, the interactive agent device 100 may recognize a persona of a user through dialog embedding based on input data. Here, if there is a persona label predetermined in relation to the input data, the interactive agent device 100 may assign a weight to the persona.
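The source names a self-attentive sentence embedding technique without giving a formula. One widely used formulation computes attention weights A = softmax(W_s2 · tanh(W_s1 · Hᵀ)) over the token states H and pools them as M = A · H. The sketch below follows that formulation in plain Python; the matrix names, the toy dimensions, and the fixed weight values are assumptions for illustration only, since real weights would be learned and H would come from an encoder.

```python
import math


def matmul(a, b):
    """Multiply an m x k matrix by a k x n matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attentive_embedding(H, W_s1, W_s2):
    """H: n x d token states, W_s1: d_a x d, W_s2: r x d_a.

    Returns (A, M): A is the r x n attention matrix, M the r x d embedding.
    """
    hidden = [[math.tanh(v) for v in row] for row in matmul(W_s1, transpose(H))]
    scores = matmul(W_s2, hidden)          # r x n, one row per attention hop
    A = [softmax(row) for row in scores]   # each row sums to 1 over the tokens
    M = matmul(A, H)                       # weighted sums of token states
    return A, M


# Toy inputs (assumed values): 3 tokens, d = 2, d_a = 3, r = 2 attention hops.
H = [[0.1, 0.3], [0.7, -0.2], [0.0, 0.5]]
W_s1 = [[0.5, -0.1], [0.2, 0.4], [-0.3, 0.6]]
W_s2 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
A, M = self_attentive_embedding(H, W_s1, W_s2)
```

Each row of `A` is a separate attention distribution over the tokens, so the resulting `M` can capture more than one aspect of an utterance.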

The interactive agent device 100 may set a persona corresponding to the persona of the user in operation 230. In this case, the processor 140 may select one of various personas corresponding to the persona of the user and set it. That is, the processor 140 may select and set a persona suitable for a user from various personas. According to an embodiment, the processor 140 may select a persona for psychological counseling.
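Selecting an agent persona that corresponds to the user persona (operation 230) could be sketched as a simple matching score over stored personas. The persona names, their attribute sets, and the scoring rule are all assumptions; the source states only that one of several stored personas is selected to suit the user.

```python
# Hypothetical stored personas; memory 130 is said to hold "various personas",
# but the source does not describe their contents.
AGENT_PERSONAS = {
    "empathetic_counselor": {"emotions": {"sad", "anxious"}, "intents": {"vent"}},
    "practical_advisor": {"emotions": {"neutral", "angry"}, "intents": {"seek_advice"}},
    "friendly_companion": {"emotions": {"happy", "neutral"}, "intents": {"chat"}},
}


def select_agent_persona(user_emotion, user_intent, personas=AGENT_PERSONAS):
    """Pick the stored persona matching the most user attributes.

    Ties are resolved in favor of the persona listed first.
    """
    def score(profile):
        return (user_emotion in profile["emotions"]) + (user_intent in profile["intents"])

    return max(personas, key=lambda name: score(personas[name]))


chosen = select_agent_persona("sad", "vent")
```

A learned similarity between user and agent persona embeddings would be a natural replacement for the hand-written score.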

The interactive agent device 100 may execute an emotional conversation mode with the user in operation 240 based on the set persona. To this end, the processor 140 may provide a graphical user interface (GUI) for the emotional conversation mode with the user. At this time, the memory 130 may store conversation data collected from actual counseling specialists. Through this, the processor 140 may execute the emotional conversation mode with the user by using the conversation data. According to an embodiment, the emotional conversation mode may include a conversation mode for psychological counseling. Here, the processor 140 may output response data based on the conversation data through the output module 120 in response to input data input through the input module 110. The input data may include the user's speech data. For example, the processor 140 may output response data through at least one of an audio output device, a display device, or a communication device. In addition, the processor 140 may hierarchically associate the input data and the response data, store them in the memory 130 as a conversation record, and use this record to perform the emotional conversation mode. Through this, the processor 140 may proceed with the emotional conversation mode based on context understanding. In this case, the processor 140 may perform the emotional conversation mode with the user while inducing the user's emotional state in a positive direction.
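One possible reading of "hierarchically associating" input data and response data is a record that groups turns under a session, with each turn pairing a user input with the agent's response. The sketch below follows that reading; the `Turn` and `ConversationRecord` types and the `recent_context` helper are hypothetical and not described in the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    """One exchange: what the user said and what the agent answered."""
    user_input: str
    agent_response: str


@dataclass
class ConversationRecord:
    """Session-level record holding the persona in use and the ordered turns."""
    persona: str
    turns: List[Turn] = field(default_factory=list)

    def add_turn(self, user_input, agent_response):
        self.turns.append(Turn(user_input, agent_response))

    def recent_context(self, n=2):
        """Return the last n turns as text, for context-aware response selection."""
        lines = []
        for turn in self.turns[-n:]:
            lines.append("user: " + turn.user_input)
            lines.append("agent: " + turn.agent_response)
        return "\n".join(lines)


record = ConversationRecord(persona="empathetic_counselor")
record.add_turn("I could not sleep again.", "That sounds tiring. How long has it lasted?")
record.add_turn("About two weeks.", "Did anything change two weeks ago?")
context = record.recent_context(n=1)
```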

Referring to FIG. 3, the interactive agent device 100 may detect speech data in operation 310. While executing the emotional conversation mode with the user, the processor 140 may detect speech data input through the input module 110. According to an embodiment, the processor 140 may detect speech data directly input through a microphone. According to another embodiment, the processor 140 may detect speech data received from an external electronic device through a communication device.

The interactive agent device 100 may analyze the speech data in operation 320. The processor 140 may analyze the speech data and extract at least one feature point related to at least one of a user's voice or text. The processor 140 may acquire a speech context from speech data. For example, the processor 140 may acquire a speech context based on text of speech data or a feature point related to the text. For example, the processor 140 may obtain a speech context by performing sentence classification of text data. In addition, the processor 140 may recognize the user's emotional state from the speech data. For example, the processor 140 may recognize the user's emotional state based on a feature point related to at least one of the user's voice or text.

The interactive agent device 100 may output response data in response to the speech data in operation 330. The processor 140 may output response data corresponding to the speech context. At this time, the memory 130 may store conversation data collected from actual counseling specialists. Through this, the processor 140 may determine response data corresponding to the speech context from the conversation data. Here, the processor 140 may determine response data based on a previous conversation record stored in the memory 130. In addition, the processor 140 may output response data through the output module 120.
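Determining response data from stored conversation data (operation 330) can be illustrated with a small retrieval sketch. The source does not specify the matching method; token-overlap (Jaccard) similarity is swapped in here as a simple stand-in, and the example pairs are invented placeholders, not actual counseling data.

```python
def tokenize(text):
    """Lowercase whitespace tokenization into a set of tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder (context, response) pairs standing in for conversation data
# collected from counseling specialists.
COUNSELING_DATA = [
    ("i cannot sleep at night", "That sounds exhausting. When did the trouble sleeping start?"),
    ("i feel lonely", "Feeling lonely is hard. Who do you usually talk to?"),
    ("i had a fight with my friend", "That must have been upsetting. What happened?"),
]


def retrieve_response(utterance, data=COUNSELING_DATA):
    """Return the response whose stored context best overlaps the utterance."""
    query = tokenize(utterance)
    best = max(data, key=lambda pair: jaccard(query, tokenize(pair[0])))
    return best[1]


reply = retrieve_response("I feel so lonely today")
```

A production system would more plausibly compare sentence embeddings of the speech context rather than raw token sets.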

After outputting the response data in operation 330, the interactive agent device 100 may detect the speech data in operation 340. While executing the emotional conversation mode with the user, the processor 140 may detect speech data input through the input module 110. According to an embodiment, the processor 140 may detect speech data directly input through a microphone. According to another embodiment, the processor 140 may detect speech data received from an external electronic device through a communication device.

The interactive agent device 100 may analyze speech data in operation 350. The processor 140 may analyze the speech data and extract at least one feature point related to at least one of a user's voice or text. The processor 140 may acquire a speech context from speech data. For example, the processor 140 may acquire a speech context based on text of speech data or a feature point related to the text. For example, the processor 140 may obtain a speech context by performing sentence classification of text data. In addition, the processor 140 may recognize the user's emotional state from the speech data. For example, the processor 140 may recognize the user's emotional state based on a feature point related to at least one of the user's voice or text. Through this, the interactive agent device 100 may track changes in the user's emotional state. The processor 140 may check a change from a previously recognized emotional state to a currently recognized emotional state. At this time, the processor 140 may check whether or not the user's emotional state change is proceeding in a positive direction.
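Tracking the change from the previously recognized emotional state to the current one can be sketched by mapping each recognized emotion to a numeric valence and taking the difference between turns. The valence table and the `EmotionTracker` class are assumptions; the source does not say how emotional states are represented.

```python
# Assumed valence scores: negative values are unpleasant, positive pleasant.
VALENCE = {"sad": -1.0, "angry": -0.8, "neutral": 0.0, "happy": 1.0}


class EmotionTracker:
    """Keeps the valence history and reports the change between turns."""

    def __init__(self):
        self.history = []

    def update(self, emotion):
        """Record an emotion; return the valence change, or None on the first turn."""
        current = VALENCE[emotion]
        previous = self.history[-1] if self.history else None
        self.history.append(current)
        return None if previous is None else current - previous


tracker = EmotionTracker()
first = tracker.update("sad")        # no previous state to compare against
delta = tracker.update("neutral")    # change from sad to neutral
improving = delta > 0                # True when the change is in a positive direction
```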

The interactive agent device 100 may evaluate the influence of the user's emotional state change in operation 360. According to an embodiment, the processor 140 may evaluate the influence on psychological counseling. In this case, the processor 140 may determine that the change in the user's emotional state is based on the response data output in operation 330, and may evaluate the influence of the user's emotional state change as the influence according to the response data. Through this, the processor 140 may evaluate the suitability of the persona based on the influence. In this case, the processor 140 may evaluate the suitability of the response data output in operation 330.
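The influence evaluation of operation 360 attributes the observed emotional change to the response output just before it. A minimal sketch, assuming the change is expressed as a numeric valence difference and that suitability is the running mean of those differences per response or persona identifier (both assumptions, as are the class and method names):

```python
from collections import defaultdict


class InfluenceEvaluator:
    """Accumulates emotional changes credited to each response or persona key."""

    def __init__(self):
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def record(self, key, valence_delta):
        """Credit the observed emotional change to the given key."""
        self._totals[key] += valence_delta
        self._counts[key] += 1

    def suitability(self, key):
        """Mean credited change; 0.0 for keys with no observations."""
        count = self._counts.get(key, 0)
        return self._totals[key] / count if count else 0.0


evaluator = InfluenceEvaluator()
evaluator.record("empathetic_counselor", 1.0)
evaluator.record("empathetic_counselor", -0.5)
score = evaluator.suitability("empathetic_counselor")
```

A persistently negative score for a persona would be one signal that a different persona or response style suits the user better.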

If the emotional conversation mode is not terminated in operation 370, the interactive agent device 100 may return to operation 330. In this case, the processor 140 may output response data in response to the speech data in operation 330. The processor 140 may output response data corresponding to the speech context of the speech data. In this case, the processor 140 may determine response data corresponding to the speech context from the conversation data. The processor 140 may determine the response data so as to induce a change in the user's emotional state in a positive direction. Here, the processor 140 may determine the response data in consideration of the suitability of the previously output response data. In addition, the processor 140 may output the response data through the output module 120. Thereafter, the interactive agent device 100 may repeatedly perform operations 330 to 370 until the emotional conversation mode ends in operation 370. According to an embodiment, the processor 140 may perform a conversation mode for psychological counseling by following a psychological counseling process in this order: determining the depressive region and severity, searching for a stimulus that triggers depression, seeking and building coping power, and using therapeutic techniques and providing information or advice.
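The four-step psychological counseling process named above can be represented as an ordered set of stages that the dialogue advances through. This is a minimal sketch; the stage names and the rule that the final stage simply repeats are assumptions, and the source does not say when the agent moves from one stage to the next.

```python
from enum import Enum


class CounselingStage(Enum):
    """The four counseling steps, in the order given in the description."""
    ASSESS_DEPRESSION = 1   # determine the depressive region and severity
    FIND_TRIGGER = 2        # search for a stimulus that triggers depression
    BUILD_COPING = 3        # seek and build coping power
    APPLY_TECHNIQUES = 4    # use therapeutic techniques, provide information or advice


def next_stage(stage):
    """Advance to the following stage; stay at the last stage once reached."""
    members = list(CounselingStage)
    index = members.index(stage)
    return members[min(index + 1, len(members) - 1)]


stage = CounselingStage.ASSESS_DEPRESSION
stage = next_stage(stage)
```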

Meanwhile, when an event for terminating the emotional conversation mode is detected in operation 370, the interactive agent device 100 may terminate the emotional conversation mode. For example, when a request for terminating the emotional conversation mode is received through the input module 110, the processor 140 may terminate the emotional conversation mode.
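Taken together, operations 310 to 370 form a loop: detect speech data, analyze it, respond, evaluate the emotional change, and repeat until a termination event. The sketch below shows one way to arrange that loop with the analysis and response steps passed in as functions. The function name, the callback signatures, and the use of an end phrase as the termination event are assumptions, and the step order is simplified relative to FIG. 3.

```python
def run_emotional_conversation(utterances, analyze, choose_response, end_phrase="bye"):
    """Run the conversation loop over an iterable of user utterances.

    analyze(text) -> (context, valence)
    choose_response(context, influence) -> response text
    Returns a list of (utterance, response, influence) tuples.
    """
    transcript = []
    previous_valence = None
    influence = None
    for text in utterances:                      # operations 310/340: detect speech data
        if text.strip().lower() == end_phrase:   # operation 370: termination event
            break
        context, valence = analyze(text)         # operations 320/350: analyze speech data
        if previous_valence is not None:
            influence = valence - previous_valence   # operation 360: evaluate influence
        response = choose_response(context, influence)  # operation 330: output response
        transcript.append((text, response, influence))
        previous_valence = valence
    return transcript


# Toy callbacks (assumed behavior) to exercise the loop.
def toy_analyze(text):
    return text, (-1.0 if "sad" in text else 0.5)


def toy_respond(context, influence):
    if influence is not None and influence > 0:
        return "I'm glad to hear that."
    return "Tell me more."


log = run_emotional_conversation(
    ["I feel sad", "a bit better now", "bye", "never processed"],
    toy_analyze,
    toy_respond,
)
```

Passing the analysis and response steps as callbacks keeps the loop independent of any particular emotion recognizer or response model.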

The operation method of the interactive agent device 100 according to various embodiments may include an operation of identifying a persona of a user, an operation of setting a persona corresponding to the identified persona, and an operation of executing an emotional conversation mode with the user based on the set persona.

According to various embodiments, the operation of identifying a persona may include an operation of analyzing the user's speech data to determine at least one of the user's speech intention, the user's emotional state, or characteristic information related to the user, and an operation of identifying the persona based on at least one of the speech intention, the emotional state, or the characteristic information.

According to various embodiments, the operation of executing the emotional conversation mode may include an operation of tracking a change in the user's emotional state based on the user's speech data.

According to various embodiments, the operation of executing the emotional conversation mode may further include an operation of evaluating an influence on a change in an emotional state.

According to various embodiments, the operation of executing the emotional conversation mode may include an operation of obtaining a speech context from speech data and an operation of outputting response data corresponding to the speech context.

According to various embodiments, the operation of outputting response data may include an operation of outputting response data based on a speech context and an influence.

According to various embodiments, the interactive agent device 100 may execute an emotional conversation mode with a user based on a persona corresponding to a persona of the user among various personas. That is, the interactive agent device 100 may select a persona suitable for a user from various personas. At this time, since the interactive agent device 100 can recognize the user's emotional state from the user's speech data, it is possible to execute an emotional conversation mode with the user. In addition, the interactive agent device 100 may execute the conversation mode while inducing the user's emotional state in a positive direction. That is, the interactive agent device 100 may induce a user's emotional state in a positive direction by executing an emotional conversation mode personalized to the user.

The various embodiments of this document and the terms used therein are not intended to limit the technology described in this document to a specific embodiment, and should be understood to include various modifications, equivalents, and/or substitutes of the corresponding embodiment. In connection with the description of the drawings, similar reference numerals may be used for similar elements. Singular expressions may include plural expressions unless the context clearly indicates otherwise. In this document, expressions such as "A or B", "at least one of A and/or B", "A, B or C" or "at least one of A, B and/or C" may include all possible combinations of the items listed together. Expressions such as "1st", "2nd", "first" or "second" can modify the corresponding elements regardless of their order or importance; they are only used to distinguish one element from another and do not limit the elements. When it is mentioned that a certain (e.g., first) component is “(functionally or communicatively) coupled” or “connected” to another (e.g., second) component, the certain component may be directly connected to the other component, or may be connected through yet another component (e.g., a third component).

The term "module" used in this document includes a unit composed of hardware, software, or firmware, and may be used interchangeably with terms such as, for example, logic, logic blocks, parts, or circuits. A module may be an integrally configured component or a minimum unit or a part of one or more functions. For example, the module may be configured as an application-specific integrated circuit (ASIC).

Various embodiments of the present document may be implemented as software including one or more instructions stored in a storage medium (e.g., memory 130) readable by a machine (e.g., interactive agent device 100). For example, the processor of the device (e.g., the processor 140) may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the device to be operated to perform at least one function according to the at least one instruction called. The one or more instructions may include code generated by a compiler or code that can be executed by an interpreter. A storage medium that can be read by a device may be provided in the form of a non-transitory storage medium. Here, "non-transitory" only means that the storage medium is a tangible device and does not contain a signal (e.g., an electromagnetic wave); the term does not distinguish between the case where data is stored semi-permanently in the storage medium and the case where it is stored temporarily.

According to various embodiments, each component (e.g., a module or program) of the described components may include a single entity or a plurality of entities. According to various embodiments, one or more components or operations among the above-described components may be omitted, or one or more other components or operations may be added. Alternatively or additionally, a plurality of components (e.g., modules or programs) may be integrated into one component. In this case, the integrated component may perform one or more functions of each of the plurality of components in the same or a similar manner as they were performed by the corresponding component among the plurality of components prior to integration. According to various embodiments, operations performed by a module, program, or other component may be executed sequentially, in parallel, repeatedly, or heuristically; one or more of the operations may be executed in a different order or omitted; or one or more other operations may be added.

Claims

1. A method of operating an interactive agent device, the method comprising:

identifying a persona of a user;

setting a persona corresponding to the identified persona; and

executing an emotional conversation mode with the user based on the set persona.

2. The method of claim 1, wherein the identifying of the persona comprises:

analyzing the user's speech data to determine at least one of the user's speech intention, the user's emotional state, or characteristic information related to the user; and

identifying the persona based on at least one of the speech intention, the emotional state, or the characteristic information.

3. The method of claim 1, wherein the executing of the emotional conversation mode comprises:

tracking a change in the user's emotional state based on the user's speech data.

4. The method of claim 3, wherein the executing of the emotional conversation mode further comprises:

evaluating an influence on the change in the emotional state.

5. The method of claim 4, wherein the executing of the emotional conversation mode comprises:

obtaining a speech context from the speech data; and

outputting response data based on at least one of the speech context or the influence.

6. The method of claim 5,

wherein the emotional conversation mode includes a conversation mode for psychological counseling.

7. The method of claim 2,

wherein the characteristic information includes at least one of the user's age or gender.

8. An interactive agent apparatus comprising:

an input module; and

a processor connected to the input module and configured to perform an emotional conversation mode with a user,

wherein the processor is configured to:

identify a persona of the user,

set a persona corresponding to the identified persona, and

execute the emotional conversation mode with the user based on the set persona.

9. The apparatus of claim 8, wherein the processor is configured to:

analyze the user's speech data to determine at least one of the user's speech intention, the user's emotional state, or characteristic information related to the user, and

identify the persona based on at least one of the speech intention, the emotional state, or the characteristic information.

10. The apparatus of claim 8, wherein the processor is configured to:

track a change in an emotional state of the user based on the user's speech data while executing the emotional conversation mode.

11. The apparatus of claim 10, wherein the processor is configured to:

evaluate an influence on the change in the emotional state while executing the emotional conversation mode.

12. The apparatus of claim 11, wherein the processor is configured to, while executing the emotional conversation mode:

obtain a speech context from the speech data, and

output response data based on at least one of the speech context or the influence.

13. The apparatus of claim 12,

wherein the emotional conversation mode includes a conversation mode for psychological counseling.

14. The apparatus of claim 9,

wherein the characteristic information includes at least one of the user's age or gender.