WO2019142976A1

WO2019142976A1 - Display control method, computer-readable recording medium, and computer device for displaying conversation response candidate for user speech input

Info

Publication number: WO2019142976A1
Application number: PCT/KR2018/005937
Authority: WO
Inventors: 설재호; 임보훈; 손보경; 장세영
Original assignee: 주식회사 머니브레인
Priority date: 2018-01-16
Filing date: 2018-05-25
Publication date: 2019-07-25
Also published as: KR101891495B1

Abstract

Provided is a method, performed by a computer, for intervening in a call between a first user terminal and a second user terminal, which are remotely placed from each other, so as to control a predetermined display associated with the second user terminal. The call includes transmission/reception of sound information between the first user terminal and the second user terminal. The method comprises the steps of: during the call, allowing the display to show a textual expression of user intent determined by natural language-processing of a user speech input, which is made by a first user on the first user terminal, according to a predetermined knowledge base model; and allowing the display to show textual expressions of one or more conversation response candidates, respectively, which can be provided as conversation responses to the user speech input, the textual expressions having been determined on the basis of the user intent.

Description

A display control method for displaying a dialog response candidate for a user utterance input, a computer readable recording medium, and a computer apparatus

The present disclosure relates to the analysis of user utterances by a conversation understanding AI system, and more particularly to a method of visualizing and presenting the results of such analysis.

Description of the Related Art

In recent years, with the development of artificial intelligence, especially natural language understanding, it has become possible to move away from machine operation based on the conventional machine-centered command input/output method. The development and use of conversation understanding AI systems (e.g., chatbots), which allow users to operate a machine and obtain a desired service from it interactively, using natural language in a more human-friendly form such as voice and/or text, is increasing. Accordingly, conversation understanding AI systems are being adopted in various fields including (but not limited to) customer consultation centers and online shopping malls, so that each user can obtain the desired service more conveniently and quickly through natural language conversation with the AI system.

A conversation understanding AI system may be used as a conversation partner for a user, but it may also be used to intervene in a conversation between human users to help the conversation proceed smoothly.

According to one aspect of the present disclosure, there is provided a computer-implemented method of intervening in a call between a first user terminal and a second user terminal located remotely from each other, the call comprising transmission and reception of voice information between the first and second user terminals, to control a predetermined display associated with the second user terminal. The method of the present disclosure comprises, during the call: causing the display to show a textual representation of a user intent determined by natural language processing, according to a predetermined knowledge base model, of the user utterance input entered by the first user on the first user terminal; and causing the display to show a textual representation of each of one or more conversation response candidates that may be provided as a conversation response to the user utterance input, determined based on the user intent.

According to one embodiment of the present disclosure, the method may further comprise, during the call, causing the display to show emotion information of the first user obtained by analyzing the user utterance input, the emotion information indicating the emotional state of the first user.

According to one embodiment of the present disclosure, the emotion information includes value information assigned to each of a plurality of emotion types based on an analysis of the user utterance input, and the plurality of emotion types may include at least one of emotion, happiness, joy, comfort, anxiety, anger, sadness, surprise, frustration, emptiness, hate, and restraint.

According to one embodiment of the present disclosure, the predetermined display associated with the second user terminal comprises one of a display for a second user performing the call on the second user terminal and a manager display located remotely from the second user terminal.

According to one embodiment of the present disclosure, the method may further comprise, during the call, causing the display to show profile information of the first user estimated by analyzing the user utterance input, the profile information including at least one of the gender, age, and language of the first user.

According to one embodiment of the present disclosure, the method may further comprise, during the call, causing the display to show voice acoustic information obtained by analyzing the user utterance input, the voice acoustic information including at least one of volume, pitch, and speed information.

According to one embodiment of the present disclosure, the step of displaying each textual representation of the one or more conversation response candidates may include displaying each textual representation together with its corresponding probabilistic confidence.

According to one embodiment of the present disclosure, the method may further comprise, during a call, causing the display to display a textual representation of the user utterance input.

According to one embodiment of the present disclosure, the method may further comprise, during a call, causing the display to display a probabilistic indicator of whether or not an interactive response to the user utterance input is to be provided.

According to another aspect of the present disclosure, there is provided a computer-readable recording medium having stored thereon one or more instructions that, when executed by a computer, cause the computer to perform any one of the methods described above.

According to another aspect of the present disclosure, there is provided a computer apparatus configured to intervene in a call between a first user terminal and a second user terminal located remotely from each other, the call comprising transmission and reception of voice information between the first and second user terminals, to control a predetermined display associated with the second user terminal. The computer apparatus of the present disclosure includes: a receiving module configured to receive a user utterance input from the first user terminal; a conversation understanding module configured to analyze the received user utterance input, the conversation understanding module comprising a predetermined knowledge base model and being configured to process the received user utterance input by natural language processing according to the predetermined knowledge base model to determine a user intent, and to determine one or more conversation response candidates that match the determined user intent; and a communication module configured to transmit information on the determined user intent and the one or more conversation response candidates to the display.

When the method and apparatus according to the present disclosure are used for telephone consultation between the consulting staff of a customer consultation center and a customer, they can suggest to the consulting staff how to respond in each situation. Therefore, not only can the consultation be carried out, but the emotional labor fatigue of the consulting staff can also be mitigated. The method and apparatus of the present disclosure can also be applied to wired and wireless voice calls between people in general, so that a party to the conversation can read the feelings of the conversation partner and respond with appropriate care.

FIG. 1 is a diagram schematically illustrating a system environment in which a conversation understanding AI system may be implemented, in accordance with one embodiment of the present disclosure.

FIG. 2 is a functional block diagram that schematically illustrates the functional configuration of the conversation understanding service server 104 of FIG. 1, according to one embodiment of the present disclosure.

FIG. 3 is a functional block diagram that schematically illustrates the functional configuration of the conversation understanding unit 204 of FIG. 2, according to one embodiment of the present disclosure.

FIG. 4 is a functional block diagram that schematically illustrates the functional configuration of the responding user terminal 108 of FIG. 1, in accordance with one embodiment of the present disclosure.

FIG. 5 is a diagram illustrating an example of a screen configuration that may be presented on the display of the responding user terminal 108 of FIG. 1, in accordance with one embodiment of the present disclosure.

FIG. 6 is a diagram illustrating an example of a screen configuration that may be presented on the display of the responding user terminal 108 of FIG. 1, in accordance with another embodiment of the present disclosure.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. Hereinafter, when it is judged that there is a possibility that the gist of the present disclosure may be unnecessarily blurred, a detailed description of known functions and configurations will be omitted. It is also to be understood that the following description is only an example of the present disclosure, and the present disclosure is not limited thereto.

The terminology used in this disclosure is used only to describe specific embodiments and is not intended to limit the present disclosure. For example, an element expressed in the singular should be understood as including the plural unless the context clearly dictates otherwise. The term "and/or" as used in this disclosure encompasses any and all possible combinations of one or more of the listed items. The terms "comprises" or "having" and the like, as used in the present disclosure, specify the presence of the stated features, numbers, steps, operations, elements, parts, or combinations thereof, and are not intended to exclude the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

In the embodiments of the present disclosure, a "module" or "sub-module" means a functional part that performs at least one function or operation, and may be implemented in hardware or software, or a combination of hardware and software. Also, a plurality of "modules" or "sub-modules" may be integrated into at least one software module and implemented by at least one processor, except for a "module" or "sub-module" that needs to be implemented in specific hardware.

In the embodiments of the present disclosure, the "conversation understanding AI system" refers to any information processing system that receives and interprets natural language input (e.g., a command, statement, request, or question from a user in natural language) to determine the intent of the user and provides the necessary actions based on the determined user intent, and is not limited to any particular form.

In addition, all terms used in the present disclosure, including technical or scientific terms, unless otherwise defined, have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It should be understood that commonly used predefined terms are to be interpreted as having a meaning consistent with the contextual meaning of the related art and are not to be interpreted excessively or extensively unless explicitly defined otherwise in this disclosure.

Figure 1 is a schematic diagram of a system environment 100 in which a conversation understanding AI system may be implemented, in accordance with one embodiment of the present disclosure. The system environment 100 includes a plurality of requesting user terminals 102a-102n, a conversation understanding service server 104, a communication network 106 and a plurality of responding user terminals 108a-108m.

According to one embodiment of the present disclosure, each of the plurality of requesting user terminals 102a-102n may be any user device having wired or wireless telephone capability. Each of the requesting user terminals 102a-102n may be any of a variety of wired or wireless telephony terminals, including, for example, a wired or wireless telephone, a smartphone, a tablet PC, a smart speaker, a desktop, a laptop, a PDA, a digital TV, a set-top box, and the like, but is not limited thereto. In accordance with one embodiment of the present disclosure, each of the plurality of requesting user terminals 102a-102n may communicate according to any of a variety of wired or wireless communication protocols, such as PSTN, VoIP, GSM, CDMA, TDMA, OFDM, Enhanced Data GSM Environment (EDGE), LAN, or WAN. According to one embodiment of the present disclosure, each of the plurality of requesting user terminals 102a-102n may contact the conversation understanding service server 104 to request the desired service.

According to one embodiment of the present disclosure, each of the requesting user terminals 102a-102n may receive voice input from a user on the terminal, as well as various other types of user input such as text, and can transmit the received user input signal to the conversation understanding service server 104 according to a predetermined communication method. According to one embodiment of the present disclosure, each of the requesting user terminals 102a-102n can receive from the conversation understanding service server 104, according to a predetermined communication method, not only a voice response signal but also various other types of response signals, such as a response in tactile form.

According to one embodiment of the present disclosure, the conversation understanding service server 104 may communicate with the requesting user terminals 102a-102n in accordance with any wired or wireless communication scheme. In accordance with one embodiment of the present disclosure, the conversation understanding service server 104 may receive a telephone call from one of the requesting user terminals 102a-102n (including various types of calls that carry voice information and may also carry other types of information such as video and text) and establish a call (communication session) between that requesting user terminal and one of the responding user terminals 108a-108m described below. In accordance with one embodiment of the present disclosure, the conversation understanding service server 104 may receive user utterance input (various types of input that include speech and may also include other types of information such as text) from the requesting user terminals 102a-102n via the established call.

According to one embodiment of the present disclosure, the conversation understanding service server 104 may process the received user utterance input based on previously prepared knowledge base models to determine the intent of the user. According to one embodiment of the present disclosure, the conversation understanding service server 104 may determine an indicator (e.g., a probabilistic indicator) of whether it is the turn to provide the user with an answer, in consideration of the determined user intent and the context. According to one embodiment of the present disclosure, the conversation understanding service server 104 may also analyze the received user utterance input and the like and generate analysis results regarding the user, e.g., a user profile, voice acoustic features, and/or emotion characteristics. According to one embodiment of the present disclosure, the conversation understanding service server 104 may generate one or more conversation response candidates (suggestions) that match, for example, the user intent. According to one embodiment of the present disclosure, the conversation understanding service server 104 may generate the one or more conversation response candidates matching the user intent while also taking into account, for example, the analysis results regarding the above user profile, voice acoustic features, and/or emotion characteristics.

In accordance with one embodiment of the present disclosure, the conversation understanding service server 104 may transmit, to the corresponding one of the responding user terminals 108a-108m, the user utterance input received via the call together with the results of analyzing and processing that input, namely the user intent (and context information), an indicator of whether an answer is to be provided, one or more conversation response candidates matching the user intent, and other analysis result information such as user profile features, voice acoustic features, and emotion characteristics. According to one embodiment of the present disclosure, the conversation understanding service server 104 may be any of various customer center servers in fields such as finance, medical care, law, and shopping, but the present disclosure is not limited thereto.
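The information bundle that the server sends to a responding user terminal can be pictured as a single structured message. The following is a minimal sketch only: the class names, field names, and the choice of JSON are assumptions made for illustration, since the disclosure does not specify a message format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ResponseCandidate:
    text: str          # textual representation of a candidate response
    confidence: float  # probabilistic confidence, assumed to lie in [0, 1]


@dataclass
class AnalysisPayload:
    # All field names below are hypothetical, chosen to mirror the items
    # listed in the paragraph above.
    utterance_text: str        # text of the user utterance
    intent: str                # determined user intent
    answer_probability: float  # indicator of whether an answer should be given
    candidates: list = field(default_factory=list)
    profile: dict = field(default_factory=dict)    # e.g. gender, age range, language
    acoustics: dict = field(default_factory=dict)  # e.g. volume, pitch, speed
    emotions: dict = field(default_factory=dict)   # one value per emotion type


def to_message(payload: AnalysisPayload) -> str:
    """Serialize the payload to JSON for transmission to a responding terminal."""
    return json.dumps(asdict(payload), ensure_ascii=False)
```

A responding terminal would decode such a message and route each field to the matching region of its screen.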

In the figure, the conversation understanding service server 104 is shown as an independent server that itself has the conversation understanding function (i.e., the function of obtaining and providing the user's intent and various other analysis results by analyzing and processing the user's speech as described above), but the present disclosure is not limited thereto. It should be noted that, according to another embodiment of the present disclosure, the conversation understanding service server 104 may obtain the necessary conversation understanding function through communication with a separate external conversation understanding server, instead of having the conversation understanding function internally.

According to one embodiment of the present disclosure, the communication network 106 may include any wired or wireless communication network, e.g., a TCP/IP communication network. According to one embodiment of the present disclosure, the communication network 106 may include, for example, a Wi-Fi network, a LAN, a WAN, the Internet, and the like, and the present disclosure is not limited thereto. In accordance with one embodiment of the present disclosure, the communication network 106 may be implemented using any of a variety of wired or wireless communication protocols such as, for example, Ethernet, GSM, EDGE, CDMA, TDMA, OFDM, Bluetooth, VoIP, Wi-MAX, Wibro, and the like.

According to one embodiment of the present disclosure, each of the responding user terminals 108a-108m may receive, from the conversation understanding service server 104 via the communication network 106, the user utterance input received from the requesting user terminals 102a-102n, together with the user intent (and context) information, an indicator of whether an answer is to be provided, one or more conversation response candidates matching the user intent, and other analysis results such as user profile features, voice acoustic features, and emotion characteristics. According to one embodiment of the present disclosure, the responding user terminals 108a-108m may provide the received user utterance input, the various analysis results, and the like to the responding user (e.g., consulting staff) on the terminal via a voice output unit and a screen output unit (display). In accordance with one embodiment of the present disclosure, the responding user terminals 108a-108m may also receive voice (and other types of) input from the responding user on the terminal and transmit it via the communication network 106 to the conversation understanding service server 104.

FIG. 2 is a functional block diagram that schematically illustrates the functional configuration of the conversation understanding service server 104 of FIG. 1, according to one embodiment of the present disclosure. As shown, the conversation understanding service server 104 includes a telephone call establishing/relaying unit 202, a conversation understanding unit 204, and a communication unit 206.

According to one embodiment of the present disclosure, the telephone call establishing/relaying unit 202 establishes a telephone call between one of the requesting user terminals 102a-102n and one of the responding user terminals 108a-108m, and relays communication (i.e., transmission and reception of voice, etc.) through the established call. According to one embodiment of the present disclosure, the telephone call establishing/relaying unit 202 receives the user's utterance voice input or the like arriving from the requesting user terminal 102a-102n according to a predetermined communication protocol, and delivers to the corresponding requesting user terminal 102a-102n the voice input from the responding user transmitted from one of the responding user terminals 108a-108m.

According to one embodiment of the present disclosure, the conversation understanding unit 204 may receive the user speech input received from the requesting user terminal 102a-102n via the telephone call establishing/relaying unit 202 and convert the received speech input into text. According to one embodiment of the present disclosure, the conversation understanding unit 204 may also determine a user intent corresponding to the received user speech input based on previously prepared knowledge base models. According to one embodiment of the present disclosure, the conversation understanding service server 104 may also analyze the received user speech input to obtain information about the user, such as user profile information, voice acoustic feature information, and emotion information. In accordance with one embodiment of the present disclosure, the conversation understanding unit 204 may also determine one or more conversation response candidates matching the user intent, based on the previously prepared knowledge base models, the analyzed information about the user, and the like.

According to one embodiment of the present disclosure, the communication unit 206 allows the conversation understanding service server 104 to communicate with each of the responding user terminals 108a-108m through the communication network 106 of FIG. 1. According to one embodiment of the present disclosure, the communication unit 206 can transmit, to one of the responding user terminals 108a-108m according to a predetermined protocol, the user utterance input signal or the like received from the requesting user terminals 102a-102n via the telephone call establishing/relaying unit 202, together with the user intent, the one or more conversation response candidates matching the user intent, and the various analysis results generated by the conversation understanding unit 204.

FIG. 3 is a functional block diagram that schematically illustrates the functional configuration of the conversation understanding unit 204 of FIG. 2, according to one embodiment of the present disclosure. The conversation understanding unit 204 includes a Speech-To-Text (STT) module 302, an acoustic feature analysis module 304, a Natural Language Understanding (NLU) module 306, a conversation understanding knowledge base 308, a user profile analysis module 310, a dialogue management module 312, an emotion analysis module 314, and a dialogue generation module 316.

According to one embodiment of the present disclosure, the STT module 302 may receive the speech input of the call-requesting user, received via the telephone call establishing/relaying unit 202 of FIG. 2, and convert it into text data based on pattern matching and the like. According to one embodiment of the present disclosure, the STT module 302 may extract features from the speech input of the call-requesting user to generate a feature vector sequence. According to one embodiment of the present disclosure, the STT module 302 may generate a text recognition result, for example a sequence of words, based on a dynamic time warping (DTW) method or on various statistical models such as a hidden Markov model (HMM), a Gaussian mixture model (GMM), a deep neural network model, and the like.
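Of the recognition approaches named above, dynamic time warping is the simplest to illustrate. The sketch below is not taken from the disclosure: it assumes isolated-word recognition against stored per-word template sequences, and the function names `dtw_distance` and `recognize` are hypothetical.

```python
def dtw_distance(a, b):
    """Return the DTW alignment cost between two sequences of feature vectors."""
    def dist(x, y):
        # Euclidean distance between two feature vectors of equal length.
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(features, templates):
    """Pick the template word whose feature sequence is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

Because DTW allows frames to be stretched in time, an utterance spoken more slowly than its template can still align with zero cost, which is why the method suits speech of varying speed.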

According to one embodiment of the present disclosure, the acoustic feature analysis module 304, like the STT module 302, can receive the user utterance input received via the telephone call establishing/relaying unit 202 of FIG. 2. According to one embodiment of the present disclosure, the acoustic feature analysis module 304 may measure and/or extract acoustic feature information of speech from the received user speech input. According to one embodiment of the present disclosure, the acoustic feature analysis module 304 may measure and/or extract, for example, the volume, pitch, speed, and other acoustic information of the user speech input.
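The disclosure does not say how volume, pitch, and speed are measured. As one possible illustration, the sketch below assumes volume is taken as root-mean-square amplitude, pitch is estimated from the strongest autocorrelation lag, and speed is expressed as words per second; all three function names and these measurement choices are assumptions.

```python
import math


def rms_volume(samples):
    """Root-mean-square amplitude of the samples, used here as a volume measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) from the strongest autocorrelation lag."""
    lo = int(sample_rate / fmax)  # shortest period considered
    hi = min(int(sample_rate / fmin), len(samples) - 1)  # longest period considered
    best_lag, best_score = lo, float("-inf")
    for lag in range(lo, hi + 1):
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def speaking_rate(word_count, duration_seconds):
    """Speaking speed in words per second."""
    return word_count / duration_seconds
```

The word count for `speaking_rate` would come from the STT result, so in practice this measure depends on the recognition step as well as on the audio.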

According to one embodiment of the present disclosure, the NLU module 306 may receive text input from the STT module 302. In accordance with one embodiment of the present disclosure, the text input received at the NLU module 306 may be, for example, a text recognition result, e.g., a sequence of words, generated by the STT module 302 from the user utterance input received via the telephone call establishing/relaying unit 202 of FIG. 2. According to one embodiment of the present disclosure, the NLU module 306 may map the received text input to one or more user intents based on the conversation understanding knowledge base 308 described below. Here, a user intent may be associated with a series of operation(s) that the conversation understanding service server 104 can understand and perform in accordance with that user intent.

According to one embodiment of the present disclosure, the conversation understanding knowledge base 308 may include, for example, a predefined ontology model. According to one embodiment of the present disclosure, an ontology model can be represented, for example, as a hierarchical structure of nodes, where each node is either an "intent" node corresponding to a user intent or an "attribute" node linked to an "intent" node (an "attribute" node being linked either directly to an "intent" node or to another "attribute" node of an "intent" node). According to one embodiment of the present disclosure, an "intent" node and the "attribute" nodes directly or indirectly linked to that "intent" node can constitute one domain, and the ontology can comprise a set of such domains. In accordance with one embodiment of the present disclosure, the conversation understanding knowledge base 308 may be configured to include, for example, domains corresponding to all of the intents that the conversation understanding service server 104 can understand and for which it can perform corresponding actions. It should be noted that, according to one embodiment of the present disclosure, the ontology model can be dynamically changed by addition or deletion of nodes, or modification of relations between nodes.

According to one embodiment of the present disclosure, the intent nodes and attribute nodes of each domain in the ontology model may be associated with words and/or phrases related to the corresponding user intent or attributes, respectively. According to one embodiment of the present disclosure, the conversation understanding knowledge base 308 may implement the ontology model, consisting of the hierarchy of nodes and the set of words and/or phrases associated with each node, in the form of, for example, a lexical dictionary, and the NLU module 306 may determine the user intent based on the ontology model implemented in lexical dictionary form. For example, according to one embodiment of the present disclosure, the NLU module 306, upon receipt of a text input or a sequence of words, can determine which nodes of which domain in the ontology model each word in the sequence is associated with and, based on such a determination, can determine the corresponding domain, i.e., the user intent.
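The lookup described above, from words to nodes to a domain, can be sketched as follows. The two example domains, their vocabularies, and the scoring weights are hypothetical; the disclosure states only that words are associated with nodes and that the domain is chosen on that basis.

```python
from collections import Counter

# Hypothetical ontology: each domain has one intent node and several attribute
# nodes, and every node carries the words associated with it.
ONTOLOGY = {
    "cancel_order": {
        "intent": {"cancel", "refund"},
        "attributes": {"item": {"order", "purchase"},
                       "time": {"today", "yesterday"}},
    },
    "check_delivery": {
        "intent": {"delivery", "shipping", "arrive"},
        "attributes": {"item": {"order", "package"},
                       "time": {"today", "tomorrow"}},
    },
}


def build_lexicon(ontology):
    """Flatten the ontology into a word -> [(domain, node)] lexical dictionary."""
    lexicon = {}
    for domain, nodes in ontology.items():
        for word in nodes["intent"]:
            lexicon.setdefault(word, []).append((domain, "intent"))
        for attr, words in nodes["attributes"].items():
            for word in words:
                lexicon.setdefault(word, []).append((domain, attr))
    return lexicon


def determine_intent(words, lexicon):
    """Score each domain by its matched words and return the best domain, or None.

    Intent-node matches count double; this weighting is an assumption.
    """
    scores = Counter()
    for word in words:
        for domain, node in lexicon.get(word.lower(), []):
            scores[domain] += 2 if node == "intent" else 1
    if not scores:
        return None
    return scores.most_common(1)[0][0]
```

A word such as "order" belongs to both example domains, so the decision rests on the combined evidence from all words in the utterance rather than on any single match.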

According to one embodiment of the present disclosure, the user profile analysis module 310 may analyze the speech input of the call-requesting user received via the telephone call establishing/relaying unit 202 to estimate information about the user profile. According to one embodiment of the present disclosure, the user profile analysis module 310 may estimate information about the user profile by comprehensively analyzing the speech input of the call-requesting user received via the telephone call establishing/relaying unit 202, the acoustic feature information (e.g., volume, pitch, speed, and other acoustic information of the user's speech input) obtained by the above-described acoustic feature analysis module 304, and/or the one or more intents obtained by the NLU module 306. According to one embodiment of the present disclosure, the user profile analysis module 310 may estimate or obtain, for example, the user's gender, age range, language of use, and the like, and provide the obtained information.

According to one embodiment of the present disclosure, the dialogue management module 312 may generate a series of operation flows corresponding to the user intent determined by the NLU module 306, in accordance with a predetermined dialogue management knowledge base model. In accordance with one embodiment of the present disclosure, the dialogue management module 312 may determine, based on the predetermined dialogue management knowledge base model, what action, e.g., what conversation response, should be performed in response to the intent received from the NLU module 306, and generate a corresponding detailed operation flow.

According to one embodiment of the present disclosure, the emotion analysis module 314 may analyze the speech input of the call-requesting user received via the telephone call establishing/relaying unit 202 to estimate information about the current emotion of the user. According to one embodiment of the present disclosure, the emotion analysis module 314 may estimate the information about the current emotion of the user by comprehensively analyzing the speech input of the call-requesting user received via the telephone call establishing/relaying unit 202, the acoustic feature information (e.g., volume, pitch, speed, and other acoustic information) obtained by the acoustic feature analysis module 304, the one or more intents obtained by the NLU module 306, and/or the user profile information (e.g., the user's gender, age range, language, etc.) obtained by the user profile analysis module 310. According to one embodiment of the present disclosure, the emotion analysis module 314 may classify human emotion into a predetermined plurality of types (e.g., heat, happiness, joy, comfort, anxiety, anger, sadness, frustration, emptiness, hate, restraint, etc.), assign a value to each emotion type according to a comprehensive analysis of the above information, and indicate the current emotional state of the user through the assigned values.
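The per-type value assignment can be pictured as turning raw per-emotion scores into a normalized set of values. In the sketch below, the subset of emotion types, the use of a softmax for normalization, and the function names are all assumptions; the raw scores stand in for whatever upstream analysis produces them.

```python
import math

# A subset of the emotion types listed above, used only for illustration.
EMOTION_TYPES = ["happiness", "joy", "comfort", "anxiety",
                 "anger", "sadness", "frustration"]


def emotion_values(raw_scores):
    """Normalize raw per-emotion scores into values that sum to 1 (softmax).

    Emotion types missing from raw_scores are treated as having a score of 0.
    """
    scores = [raw_scores.get(name, 0.0) for name in EMOTION_TYPES]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {name: e / total for name, e in zip(EMOTION_TYPES, exps)}


def dominant_emotion(values):
    """Return the emotion type with the highest assigned value."""
    return max(values, key=values.get)
```

Keeping a value for every type, rather than a single label, lets the display show mixed states such as moderate anger alongside some sadness.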

In accordance with one embodiment of the present disclosure, the dialogue generation module 316 may generate one or more suitable candidates for the conversation response to be provided to the user, based on the conversation flow generated by the dialogue management module 312. According to one embodiment of the present disclosure, the dialogue generation module 316 may generate one or more conversation response candidates deemed appropriate in consideration of the value assigned to each emotion type by the emotion analysis module 314 (i.e., the current emotional state information of the user). According to one embodiment of the present disclosure, the dialogue generation module 316 may generate the one or more suitable conversation response candidates in consideration of not only the information on the emotional state of the user but also the other processing and analysis results for the user speech input described above, for example, the acoustic feature information (e.g., volume, pitch, speed, and other acoustic information of the user speech input) obtained by the acoustic feature analysis module 304, the one or more intents obtained by the NLU module 306, and/or the user profile information (e.g., the user's gender, age range, language, etc.) obtained by the user profile analysis module 310.
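One way to picture emotion-aware candidate generation is to re-rank templated responses according to the user's emotional state. The sketch below is illustrative only: the templates, tone labels, set of negative emotions, and the 0.3 boost are invented, and the disclosure does not describe a specific ranking rule.

```python
# Hypothetical response templates per intent: (text, base score, tone).
TEMPLATES = {
    "cancel_order": [
        ("I can cancel that order for you right away.", 0.6, "neutral"),
        ("I am sorry for the trouble. Let me cancel that order immediately.",
         0.5, "apologetic"),
        ("Could you tell me the order number?", 0.4, "neutral"),
    ],
}

# Emotion types treated as negative for this sketch (an assumption).
NEGATIVE = {"anger", "frustration", "anxiety", "sadness"}


def generate_candidates(intent, emotions, top_k=3):
    """Return up to top_k (text, confidence) pairs, best first.

    Apologetic templates are boosted in proportion to the total value
    assigned to negative emotion types.
    """
    negativity = sum(v for k, v in emotions.items() if k in NEGATIVE)
    scored = []
    for text, base, tone in TEMPLATES.get(intent, []):
        boost = 0.3 * negativity if tone == "apologetic" else 0.0
        scored.append((text, base + boost))
    total = sum(score for _, score in scored)
    ranked = sorted(scored, key=lambda c: c[1], reverse=True)[:top_k]
    # Normalize so the confidences of all templates for the intent sum to 1.
    return [(text, score / total) for text, score in ranked]
```

The normalized scores play the role of the probabilistic confidence shown next to each candidate on the responding user's display.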

FIG. 4 is a functional block diagram schematically illustrating the functional configuration of the responding user terminal 108 of FIG. 1, in accordance with one embodiment of the present disclosure. The responding user terminal 108 includes a communication unit 402, a responding user input receiving unit 404, an information visualization/screen output unit 406, and an audio output unit 408.

According to one embodiment of the present disclosure, the communication unit 402 enables the responding user terminal 108 to communicate with the conversation understanding service server 104 via the communication network 106. According to one embodiment of the present disclosure, the communication unit 402 may cause a signal obtained by the responding user input receiving unit 404 to be transmitted to the conversation understanding service server 104 via the communication network 106 in accordance with a predetermined protocol. According to one embodiment of the present disclosure, the communication unit 402 may receive various signals from the conversation understanding service server 104 via the communication network 106, such as a user voice input signal, a user intent, one or more dialog response candidates matching the user intent, and various analysis results, and perform appropriate processing on them according to a predetermined protocol.
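The disclosure leaves the "predetermined protocol" unspecified. As a minimal sketch, assuming a JSON message encoded as UTF-8 and hypothetical field names, the analysis results sent from the server to the responding user terminal could be packaged and unpacked as follows.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class AnalysisMessage:
    """Hypothetical payload; every field name here is an assumption."""
    utterance_text: str
    intent_text: str
    response_candidates: List[Dict[str, object]] = field(default_factory=list)
    emotion_values: Dict[str, float] = field(default_factory=dict)


def encode(message: AnalysisMessage) -> bytes:
    """Serialize the message to UTF-8 JSON for transmission."""
    return json.dumps(asdict(message), ensure_ascii=False).encode("utf-8")


def decode(data: bytes) -> AnalysisMessage:
    """Rebuild the message on the receiving terminal."""
    return AnalysisMessage(**json.loads(data.decode("utf-8")))


message = AnalysisMessage(
    utterance_text="Sarah posted photos of the newly built hotel restaurant.",
    intent_text="Make a reservation. I want to go there.",
    response_candidates=[{"text": "I already made a reservation.",
                          "reliability": 0.9}],  # reliability value is made up
    emotion_values={"neutral": 0.8, "joy": 0.2},  # values are made up
)
received = decode(encode(message))
```

JSON is used here only because it is easy to inspect; any framing that both sides agree on would serve the same purpose.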

According to one embodiment of the present disclosure, the responding user input receiving unit 404 may receive a natural-language input in the form of voice from a responding user (e.g., a consulting employee) on the responding user terminal 108. According to one embodiment of the present disclosure, the responding user input receiving unit 404 includes, for example, a microphone and an audio circuit, and can acquire a user voice input signal through the microphone and convert the acquired signal into audio data. According to one embodiment of the present disclosure, the natural-language input from the responding user may correspond to one of the dialog response candidates, i.e., a candidate selected by the responding user on the responding user terminal 108 from among the one or more dialog response candidates matching the user intent that were received from the conversation understanding service server 104 via the communication network 106 and the communication unit 402. According to one embodiment of the present disclosure, the responding user input receiving unit 404 may also include various types of input devices, such as pointing devices (e.g., a mouse, joystick, or trackball), a keyboard, a touch panel, and a touch screen, and can acquire text input and/or touch input signals entered by the responding user via these input devices.

According to one embodiment of the present disclosure, the information visualization/screen output unit 406 may convert various signals received from the conversation understanding service server 104 via the communication network 106, for example, the user intent and the various analysis results (e.g., the information obtained by the user acoustic feature analysis module 304 of FIG. 3, the information obtained by the user profile analysis module 310 of FIG. 3, and the user emotion information determined by the emotion analysis module 314 of FIG. 3), into visual information and display that visual information. According to one embodiment of the present disclosure, the information visualization/screen output unit 406 includes various display devices, such as a touch screen based on LCD, LED, OLED, or QLED technology, and can present to the responding user, through these display devices, a visual presentation of the received information, such as text, symbols, video, images, hyperlinks, animations, and various notices.

According to one embodiment of the present disclosure, the audio output unit 408 receives the user voice input signal from the user terminal 102, transmitted through the communication network 106 and the communication unit 402, and reproduces and outputs it on the responding user terminal 108. According to one embodiment of the present disclosure, the audio output unit 408 includes, for example, a speaker or a headset, and can provide the above-described user voice input signal to the responding user via the speaker or headset.

In the embodiments of the present disclosure described above with reference to FIGS. 1 to 4, the conversation understanding AI system has been described mainly as being used for a customer consultation center or the like, but the present disclosure is not limited thereto. According to another embodiment of the present disclosure, the conversation understanding AI system can be applied, beyond the customer consultation center, to various types of user-to-user voice calls made via the conversation understanding service server. Likewise, in the embodiments described above with reference to FIGS. 1 to 4, the visual information for the information analyzed by the conversation understanding service server 104 has been described as being presented through the screen output unit of the responding user terminal 108, but the present disclosure is not limited thereto. According to another embodiment of the present disclosure, the visual information for the information analyzed by the conversation understanding service server 104 may be presented on an administrator terminal (not shown) that manages a plurality of responding user terminals 108.

In the embodiments of the present disclosure described above with reference to FIGS. 1 to 4, the relationship between the conversation understanding service server 104 and the responding user terminal 108 has been described as being implemented on a so-called "thin client-server model", in which the responding user terminal 108 provides only user input/output functions and all other conversation understanding functions are delegated to the conversation understanding service server 104, but the present disclosure is not so limited. According to another embodiment of the present disclosure, the above-described functions may be distributed between the conversation understanding service server 104 and the responding user terminal 108, or all functions may be implemented as a stand-alone application. Further, when the functions such as the above-described conversation understanding function are distributed between the conversation understanding service server 104 and the responding user terminal 108, the distribution of each function between the two may be implemented differently from embodiment to embodiment. In addition, although in the embodiments of the present disclosure described above with reference to FIGS. 1 to 4 a particular module has been described as performing certain operations for convenience, the present disclosure is not so limited. According to another embodiment of the present disclosure, the operations described above as performed by any particular module can instead be performed by separate, distinct modules.

FIG. 5 is a diagram illustrating an example of a screen configuration that may be presented on the screen output unit of the responding user terminal 108, in accordance with one embodiment of the present disclosure. The illustrated example relates, for example, to a call between a user who called a shopping mall customer center and a consultant.

In the upper left box 502 of the screen, the user's utterance sentence is displayed after being converted into text. As shown, each word of the user utterance sentence converted into text (i.e., "hi, do you have a moto drill?") is displayed together with its part of speech, but the present disclosure is not limited thereto. In the left middle box 504 of the screen in FIG. 5, the user intent obtained by processing the above user utterance sentence is displayed as a sentence expression (i.e., the corresponding intent is displayed in the form of, for example, "Do I really have to buy this?"). According to an embodiment of the present disclosure, when the sentence expression of the intent of the user utterance sentence is displayed on the screen, it may be displayed in a manner that the user on the terminal can easily recognize (e.g., in a visually striking color, highlighted, or in boldface). In the lower left box 506 of the screen, a turn-taking item indicating the probability that it is time to respond is displayed as 1, indicating that the user has finished the utterance and that a response should now be provided.

In the upper middle box 508 of the screen in FIG. 5, profile information on the uttering user's gender, age, and language is displayed. In the upper right box 510 of the screen, the user's current emotional state, obtained by analyzing the utterance sentence, is displayed as a probability for each emotion type. As shown, the neutral state is the most dominant in the uttering user's current emotional state. In the right middle box 512 of the screen in FIG. 5, analysis results such as the volume, pitch, and speed of the user utterance sentence are displayed. In the lower middle box 514 of the screen in FIG. 5, a single candidate response that can be provided to the user, obtained by synthesizing the user's utterance sentence and the various other analysis results described above, is displayed. According to one embodiment of the present disclosure, when a candidate response is displayed on the screen, it may be displayed in a manner that the user on the terminal can easily recognize (e.g., in various ways, including visually striking colors, highlights, or boldface formatting).

FIG. 6 is a diagram illustrating an example of a screen configuration that may be presented on the screen output unit of the responding user terminal 108, in accordance with another embodiment of the present disclosure. The illustrated example may be, for example, a telephone conversation between a couple.

In the upper left box 602 of the screen, the user's utterance sentence is displayed after being converted into text. As shown, each word of the user utterance sentence converted into text (i.e., "Sarah posted photos of the newly built hotel restaurant.") is displayed together with its part of speech, but the present disclosure is not limited thereto. In the left middle box 604 of the screen in FIG. 6, the user intent obtained by processing the above user utterance sentence is displayed as a sentence expression (i.e., the corresponding intent is displayed in the form of, for example, "Make a reservation. I want to go there."). In the lower left box 606 of the screen, a turn-taking item indicating the probability that it is time to respond is displayed as 0.7.

In the upper middle box 608 of the screen in FIG. 6, profile information on the uttering user's sex, age, and language is displayed. In the upper right box 610 of the screen, the user's current emotional state, obtained by analyzing the utterance sentence, is displayed as a probability for each emotion type. As shown, the neutral state is the most dominant in the uttering user's current emotional state. In the right middle box 612 of the screen in FIG. 6, analysis results such as the volume, pitch, and speed of the user utterance sentence are displayed. In the lower middle box 614 of the screen in FIG. 6, a single candidate response obtained by synthesizing the user's utterance sentence and the various other analysis results described above (i.e., "I already made a reservation for our one-year anniversary") is displayed together with a probabilistic indication of its reliability.
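The screen of FIG. 6 can be approximated by a short text-rendering sketch. The box numbers, the utterance, the intent, the candidate sentence, and the turn-taking value of 0.7 come from the description above; the function name, the emotion probabilities, and the reliability value of 0.9 are hypothetical, since the description does not state them.

```python
from typing import Dict, List, Tuple


def render_screen(utterance: str,
                  intent: str,
                  turn_taking: float,
                  emotion_values: Dict[str, float],
                  candidates: List[Tuple[str, float]]) -> List[str]:
    """Return one text line per screen box, keyed by the box number of FIG. 6."""
    lines = [
        f"[602] Utterance: {utterance}",
        f"[604] Intent: {intent}",
        f"[606] Turn-taking: {turn_taking:.1f}",
    ]
    # Show only the most dominant emotion type and its probability.
    dominant = max(emotion_values, key=emotion_values.get)
    lines.append(f"[610] Emotion: {dominant} ({emotion_values[dominant]:.2f})")
    # Each candidate is shown with its probabilistic reliability.
    for text, reliability in candidates:
        lines.append(f"[614] Candidate: {text} ({reliability:.2f})")
    return lines


screen = render_screen(
    utterance="Sarah posted photos of the newly built hotel restaurant.",
    intent="Make a reservation. I want to go there.",
    turn_taking=0.7,
    emotion_values={"neutral": 0.8, "joy": 0.2},  # made-up probabilities
    candidates=[("I already made a reservation for our one-year anniversary",
                 0.9)],                           # made-up reliability
)
```

A real terminal would draw these items in separate boxes and could emphasize the intent and candidate lines with color or boldface, as the description suggests.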

As will be appreciated by those skilled in the art, the present invention is not limited to the examples described in this disclosure, but may be variously modified, rearranged and replaced within the scope of the present disclosure. It should be understood that the various techniques described herein may be implemented in hardware or software, or a combination of hardware and software.

A computer program according to an embodiment of the present disclosure may be stored in a storage medium readable by a computer processor or the like, for example, a non-volatile memory such as an EPROM, EEPROM, or flash memory device, a magnetic disk such as an internal hard disk or a removable disk, or a CD-ROM disk. The program code may also be implemented in assembly language or machine language. All modifications and variations that fall within the true spirit and scope of this disclosure are intended to be embraced by the following claims.

Claims

1. A display control method, performed by a computer, for intervening in a call between a first user terminal and a second user terminal located remotely from each other, so as to control a predetermined display associated with the second user terminal, the call including transmission and reception of voice information between the first and second user terminals, the method comprising,

during the call:

causing the display to show a textual representation of a user intent determined by natural language processing, according to a predetermined knowledge base model, of a user utterance input entered by a first user on the first user terminal; and

causing the display to show each textual representation of one or more dialog response candidates that may be provided as a dialog response to the user utterance input, the candidates being determined based on the user intent.
2. The method of claim 1,

further comprising, during the call, causing the display to show information indicating an emotional state of the first user, based on emotion information of the first user obtained by analysis of the user utterance input.
3. The method of claim 2,

wherein the emotion information includes value information assigned to each of a plurality of emotion types based on an analysis of the user utterance input, and wherein the plurality of emotion types includes at least one of emotion, happiness, joy, comfort, anxiety, anger, surprise, frustration, emptiness, hate, and restraint.
4. The method of claim 1,

wherein the predetermined display associated with the second user terminal comprises one of a display for a second user performing the call on the second user terminal and an administrator display located remotely from the second user terminal.
5. The method of claim 1,

further comprising, during the call, displaying profile information of the first user estimated by analysis of the user utterance input, wherein the profile information includes at least one of the sex, age, and language of the first user.
6. The method of claim 1,

further comprising, during the call, displaying voice acoustic information obtained by analysis of the user utterance input, wherein the voice acoustic information includes at least one of volume, pitch, and speed information of the user utterance input.
7. The method of claim 1,

wherein causing the display to show each textual representation of the one or more dialog response candidates comprises displaying each textual representation together with its corresponding probabilistic reliability.
8. The method of claim 1,

further comprising, during the call, displaying a textual representation of the user utterance input.
9. The method of claim 1,

further comprising, during the call, displaying a probabilistic indicator of whether a dialog response to the user utterance input should be provided.
10. A computer-readable recording medium storing one or more instructions,

wherein the one or more instructions, when executed by a computer device, cause the computer device to perform the method of any one of claims 1 to 9.
11. A computer device for intervening in a call between a first user terminal and a second user terminal located remotely from each other, so as to control a predetermined display associated with the second user terminal, the call including transmission and reception of voice information between the first and second user terminals, the computer device comprising:

a receiving module configured to receive a user utterance input from the first user terminal;

a dialogue understanding module configured to analyze the received user utterance input, wherein the dialogue understanding module comprises a predetermined knowledge base model and is configured to process the received user utterance input by natural language processing according to the predetermined knowledge base model, determine a user intent matching the user utterance input, and determine one or more dialog response candidates matching the determined user intent; and

a communication module configured to transmit the determined user intent and information on the one or more dialog response candidates to the display.