WO2014020835A1

WO2014020835A1 - Agent control system, method, and program

Info

Publication number: WO2014020835A1
Application number: PCT/JP2013/004243
Authority: WO
Inventors: 康行三井
Original assignee: 日本電気株式会社
Priority date: 2012-07-31
Filing date: 2013-07-09
Publication date: 2014-02-06

Abstract

A plurality of interaction processing means (81) carries out processing for interacting with a user by generating response information in response to input information from the user. The plurality of interaction processing means (81) includes at least one interaction processing means with interaction processing performance different from the other interaction processing means. A processing means determination means (82) determines one interaction processing means from the plurality of interaction processing means according to the situation in which interaction processing is carried out. An agent setting means (83) sets an agent corresponding to the one interaction processing means that has been determined. An agent control means (84) notifies the user of the response information via the agent that has been set.

Description

Agent control system, method and program

The present invention relates to an agent control system, an agent control method, and an agent control program for controlling an agent that interacts with a user.

Devices such as keyboards, mice, and touch panels have become widespread as input devices for operating personal computers, mobile phones, and the like. On the other hand, methods of operating a device through a voice dialogue user interface (hereinafter referred to as a voice dialogue UI), which uses a microphone and a speaker instead of the input devices described above, have been studied.

In particular, in recent years, mobile phones such as smartphones are rapidly spreading. In a mobile terminal typified by such a mobile phone, it is difficult to equip a device such as a keyboard because of the problem of portability of the device. For this reason, attention has been focused on device operations using the voice dialogue UI.

In the voice interaction UI, the user's voice input from the microphone is mainly recognized by the automatic voice recognition process. Then, after recognizing the voice as a command, a response is generated according to the scenario. By displaying the generated response on a screen of a monitor or the like, or outputting the synthesized voice from a speaker, it becomes possible to operate the device while performing a dialogue.

In recent years, with the improvement of the communication speed of mobile terminals, processing such as speech recognition and speech synthesis has come to be performed by a server. With this configuration (distributed speech recognition), the number of devices and services that offer improved speech recognition accuracy and synthesized speech quality is increasing.

Furthermore, Patent Document 1 describes a voice recognition system in which a simple voice recognition device is mounted not only on a server but also on a terminal. The speech recognition system described in Patent Document 1 performs speech recognition processing at a terminal when a speech recognition result cannot be obtained from a server in consideration of a communication state.

On the other hand, as disclosed in Patent Document 1, for example, a technique that provides an agent function is also known as a way to make device operation using a voice dialogue UI more enjoyable and friendly. The agent function creates an anthropomorphic character (agent) on the screen and changes its actions and facial expressions according to the content of the response and the input from the user, thereby producing a simulated interaction with the agent.

By using such a function, the user can operate the device while feeling as if communicating with a human through the agent.

Furthermore, Patent Document 2 describes a method of performing more human communication in combination with an agent and a voice dialogue UI.

Patent Document 3 describes an information search method using an agent. In the information search method described in Patent Document 3, a plurality of agents, each with its own information search condition set, are presented to the user, and when the user selects an agent based on the desired information search condition, the information retrieval desired by the user is performed.

Patent Document 1: Japanese Patent No. 4554285
Patent Document 2: Japanese Patent No. 3016350
Patent Document 3: JP 2004-118856 A

As disclosed in Patent Document 1, when the line between the server and the mobile terminal is disconnected or the line state is poor, it is desirable that voice dialogue processing, including voice recognition and voice synthesis, be executed on the mobile terminal side.

In this case, voice dialogue processing is performed by the server when the line is connected and by the mobile terminal when the line is not connected. However, the processing capability of a mobile terminal is generally inferior to that of a server. Therefore, the accuracy of voice dialogue processing performed on a mobile terminal is generally lower than that of processing performed on a server.

Therefore, a keyword (command) that was correctly recognized when processed by the server may no longer be correctly recognized after processing is switched to the terminal because the line has been disconnected. If the user is not aware of the switch, the user perceives that a keyword that had been recognized normally is suddenly no longer recognized. This increases dissatisfaction with the voice dialogue UI.

In the information search method described in Patent Document 3, an agent is selected based on the information search conditions desired by the user, so a user-friendly UI can be provided. However, as described above, when information search processing performed on the server is switched to terminal processing, the information search capability is generally lower. In such a case, the user performs the information search without knowing which processing is being performed, so there is a problem that dissatisfaction with the information search processing increases even if a user-friendly UI is provided.

Accordingly, an object of the present invention is to provide an agent control system, an agent control method, and an agent control program that can provide user-friendly dialogue processing and also allow the user to recognize the current processing status at a glance.

The agent control system according to the present invention includes: a plurality of dialogue processing means for performing dialogue processing with a user by generating response information for input information from the user; processing means determining means for determining one dialogue processing means from the plurality of dialogue processing means; agent setting means for setting an agent corresponding to the determined dialogue processing means; and agent control means for notifying the user of the response information via the set agent. The plurality of dialogue processing means include at least one dialogue processing means whose dialogue processing performance differs from that of the other dialogue processing means, and the processing means determining means determines the one dialogue processing means from the plurality of dialogue processing means according to the situation in which the dialogue processing is performed.

The agent control method according to the present invention includes: determining one dialogue processing means from a plurality of dialogue processing means that perform dialogue processing with a user by generating response information for input information from the user; setting an agent corresponding to the determined dialogue processing means; and notifying the user of the response information via the set agent. The one dialogue processing means is determined, according to the situation in which the dialogue processing is performed, from a plurality of dialogue processing means that include at least one dialogue processing means whose dialogue processing performance differs from that of the other dialogue processing means.

The agent control program according to the present invention causes a computer to execute: processing means determination processing for determining one dialogue processing means from a plurality of dialogue processing means that perform dialogue processing with a user by generating response information for input information from the user; agent setting processing for setting an agent corresponding to the determined dialogue processing means; and agent control processing for notifying the user of the response information via the set agent. In the processing means determination processing, the one dialogue processing means is determined, according to the situation in which the dialogue processing is performed, from a plurality of dialogue processing means that include at least one dialogue processing means whose dialogue processing performance differs from that of the other dialogue processing means.

According to the present invention, it is possible to provide a user-friendly dialogue process and make the user recognize the current processing status at a glance.

FIG. 1 is a block diagram showing a configuration example of the first embodiment of the agent control system according to the present invention.
FIG. 2 is a flowchart showing an example of the operation of the first embodiment.
FIG. 3 is a block diagram showing a configuration example of the second embodiment of the agent control system according to the present invention.
FIG. 4 is a flowchart showing an example of the operation of the second embodiment.
FIG. 5 is a block diagram showing a configuration example for performing response information generation processing.
FIG. 6 is a block diagram showing a configuration example of the third embodiment of the agent control system according to the present invention.
FIG. 7 is a block diagram showing a configuration example of the first example of the agent control system according to the present invention.
FIG. 8 is a block diagram showing a configuration example of the third example of the agent control system according to the present invention.
FIG. 9 is a block diagram showing an outline of the agent control system according to the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. Note that, in each embodiment, the same constituent elements are denoted by the same reference numerals, and description thereof will be omitted as appropriate.

Embodiment 1.
FIG. 1 is a block diagram showing a configuration example of a first embodiment of an agent control system according to the present invention. Referring to FIG. 1, the agent control system of the present embodiment includes a processing means determination unit 101, an agent setting unit 102, a dialogue processing unit 104A, a dialogue processing unit 104B, and an agent control unit 105.

The dialogue processing unit 104A and the dialogue processing unit 104B perform dialogue processing with the user by generating response information with respect to input information from the user. The input information from the user is information used by the user to interact with the system, such as text, speech, speech synthesis results, and the like. The interactive process is a process for generating a response to the input content, and response information is generated as the response.

Further, the response information is information relating to a response from the system in a voice dialogue between the agent control system according to the present invention (hereinafter sometimes simply referred to as a system) and a user of the system. The response information is also expressed by text, speech, speech synthesis result, etc., like the input information.

The dialogue processing unit 104A and the dialogue processing unit 104B differ in processing method, dictionaries, databases, and the like. Therefore, the processing performance differs between the dialogue processing unit 104A and the dialogue processing unit 104B. The performance of dialogue processing refers to processing speed, processing complexity, accuracy of the processing results, and the like. Note that the difference in processing performance between the dialogue processing unit 104A and the dialogue processing unit 104B is not limited to a difference in basic processing performance. It also covers the case where the dialogue processing unit 104A and the dialogue processing unit 104B have the same basic performance but differ in processing performance in a specific usage form.

A difference in processing performance in a specific usage form means that a function related to dialogue processing has usage forms, scenes, or domains that it is good at and others that it is not; for example, it shows high performance in a usage form it is good at. Here, speech recognition, which is one function of the dialogue processing, is described as a specific example.

In speech recognition, a larger database, such as a larger dictionary, generally yields better performance. However, if a large dictionary is used to achieve high performance in all domains, average performance improves, but performance may degrade in places because many similar utterances are registered. To avoid this, when the usage form is known, many words related to that usage form may be registered and, conversely, unrelated words may be deleted to improve performance in that usage form. In such a case, even if the basic performance related to the dialogue processing is equivalent, the processing performance in a specific usage form may differ.

Suppose, for example, that dialogue processing is performed using a dictionary in which many words related to the weather are registered. In this case, high speech recognition performance is exhibited when the usage scene is "weather forecast". On the other hand, in other usage scenes such as "news" and "congestion information", only average performance can be exhibited.
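The effect of a dictionary tuned to one usage scene can be illustrated with a toy sketch. The word lists and the functions `recognize` and `coverage` below are hypothetical and only stand in for a real speech recognizer, which the description does not specify. The sketch merely shows that a dictionary weighted toward weather vocabulary covers weather utterances well while missing words from other scenes.

```python
# Hypothetical word lists; a real recognizer would rely on acoustic and
# language models rather than simple set membership.
WEATHER_DICTIONARY = {"weather", "forecast", "rain", "sunny", "typhoon",
                      "temperature", "tomorrow"}
GENERAL_DICTIONARY = {"weather", "news", "traffic", "tomorrow", "music", "call"}


def recognize(words, dictionary):
    """Return only the words the dictionary knows (a stand-in for recognition)."""
    return [word for word in words if word in dictionary]


def coverage(words, dictionary):
    """Fraction of the input words that the dictionary recognizes."""
    if not words:
        return 0.0
    return len(recognize(words, dictionary)) / len(words)


if __name__ == "__main__":
    weather_utterance = ["rain", "typhoon", "tomorrow"]
    print(coverage(weather_utterance, WEATHER_DICTIONARY))
    print(coverage(weather_utterance, GENERAL_DICTIONARY))
```

In this toy setting the weather dictionary recognizes every word of the weather utterance, while the general dictionary recognizes only part of it.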

Also, the difference in processing performance is not limited to the dictionary size. Performance differences other than dictionary size include, for example, being strong at recognizing sentence utterances or being strong at recognizing single-word utterances.

The processing means determination unit 101 determines one dialogue processing unit from a plurality of dialogue processing units. Specifically, the processing means determination unit 101 determines one dialogue processing unit from a plurality of dialogue processing units according to the situation where the dialogue processing is being performed. The situation in which the dialogue processing is performed includes, for example, a system load situation and a network load situation, but is not limited to these situations.

The agent setting unit 102 sets an agent corresponding to the determined one dialog processing unit. The agent corresponding to the dialogue processing unit is determined in advance.

The agent control unit 105 notifies the response information to the user via the set agent. For example, the agent control unit 105 may change the agent according to the content of the response information, or may edit the content of the response information according to the agent. A specific control method will be described later.

The processing means determination unit 101, the agent setting unit 102, and the agent control unit 105 are realized by a CPU of a computer that operates according to a program (agent control program). For example, the program may be stored in a storage unit (not shown) of the terminal 100, and the CPU may read the program and operate as the processing means determination unit 101, the agent setting unit 102, and the agent control unit 105 according to the program.

Further, the dialogue processing unit 104A and the dialogue processing unit 104B may also be realized by a CPU of a computer that operates according to a program. For example, the program may be stored in a storage unit (not shown) of the terminal 100, and the CPU may read the program and operate as the dialogue processing unit 104A and the dialogue processing unit 104B according to the program.

Further, each of the processing means determination unit 101, the agent setting unit 102, the dialogue processing unit 104A, the dialogue processing unit 104B, and the agent control unit 105 may be realized by dedicated hardware. Further, the dialogue processing unit 104A and the dialogue processing unit 104B may be included in the same device as other processing units, or may be included in different devices.

Next, the operation of this embodiment will be described. FIG. 2 is a flowchart showing an example of the operation of the present embodiment.

The processing means determination unit 101 determines a dialogue processing unit to be used (step S101). For example, the processing means determination unit 101 may determine the dialogue processing unit according to the current system load status. Specifically, the processing means determination unit 101 may select a low-load (that is, low-performance) dialogue processing unit when the load on the entire system is high, and a high-load (that is, high-performance) dialogue processing unit when the load on the entire system is low. Note that the user may designate in advance the dialogue processing unit to be used. However, it is more preferable to determine the dialogue processing unit according to the situation in which the dialogue processing is performed.
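The load-based selection in step S101 can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the threshold value, the normalized load figure, the assignment of "high" performance to unit 104A and "low" to unit 104B, and the names `DialogueUnit` and `determine_unit` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogueUnit:
    name: str
    performance: str  # "high" or "low" (assumed labels)


# Assumption: 104A is the high-performance unit and 104B the low-performance one.
UNIT_A = DialogueUnit("104A", "high")
UNIT_B = DialogueUnit("104B", "low")

# Assumed cut-off on a normalized system load in the range 0.0 to 1.0.
LOAD_THRESHOLD = 0.7


def determine_unit(system_load, user_choice=None):
    """Step S101: pick the dialogue processing unit to use.

    A unit designated in advance by the user takes priority; otherwise the
    low-load unit is chosen when the overall system load is high.
    """
    if user_choice is not None:
        return user_choice
    if system_load >= LOAD_THRESHOLD:
        return UNIT_B  # system is busy: use the low-load, low-performance unit
    return UNIT_A      # system is idle: use the high-load, high-performance unit


if __name__ == "__main__":
    print(determine_unit(0.9).name)
    print(determine_unit(0.2).name)
```

Other situations mentioned in the description, such as the network load, could be added as further conditions in the same function.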

On the other hand, text is input as input information from the user (step S102).

The agent setting unit 102 determines an agent to be used in accordance with the result determined by the processing means determination unit 101 (step S103). The form and operation of the agent to be used are defined in advance, and the agent setting unit 102 stores images, operation information, and the like. The agent setting unit 102 may store parameters for controlling the appearance and operation of the agent instead of the image and the operation information.

For the agent, a character is used that clearly indicates to the user whether the current processing is performed by the dialogue processing unit 104A or the dialogue processing unit 104B. Specifically, a character is used that has features clearly showing the level of performance and the usage scenes in which the dialogue processing unit performs best.

Features that clearly show the level of performance and the usage scenes in which the unit performs best include, for example, age, sex, or occupation. Another such feature is appearance, such as body shape, facial expression, or clothing, that makes the character recognizable as a human, an animal, or a machine. Yet another feature is the character's behavior, which may suggest that it is fast (or slow), energetic (or tired), or clever (or absent-minded).

The agent setting unit 102 may set an anthropomorphic agent having a feature that reminds the user of the age corresponding to the determined dialog processing performance of the one dialog processing means. For example, when a low-performance dialogue process is performed, an infant character representing a young age may be set as an agent. By setting an anthropomorphic agent in this way, it is possible to provide a user with a friendly interaction process.
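Because the agent corresponding to each dialogue processing unit is determined in advance, the agent setting step can be sketched as a simple lookup. The table contents, the attribute names, and the function `set_agent` below are hypothetical; the description only states that features such as age may be chosen to suggest the processing performance.

```python
# Hypothetical predefined agents, keyed by the performance level of the
# selected dialogue processing unit.
AGENTS = {
    "high": {"character": "adult", "motion": "quick"},
    "low": {"character": "infant", "motion": "slow"},
}


def set_agent(performance):
    """Step S103: return the predefined agent for the given performance level."""
    try:
        return AGENTS[performance]
    except KeyError:
        raise ValueError(f"no agent defined for performance level {performance!r}")


if __name__ == "__main__":
    print(set_agent("low")["character"])
```

Storing parameters instead of images, as the description allows, would only change the values held in the table.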

When the processing means determination unit 101 selects the dialogue processing unit 104A (“A” in step S301), the dialogue processing unit 104A performs processing for generating information (response information) related to the dialogue response to the input text (step S104A). The input text is subjected to language analysis processing, and keywords and the like are extracted. Next, the dialogue processing unit 104A generates response information corresponding to the keywords.

Similarly, when the processing means determination unit 101 selects the dialogue processing unit 104B (“B” in step S301), the dialogue processing unit 104B performs processing for generating response information (step S104B).

Based on the response information generated by the dialogue processing unit 104A or the dialogue processing unit 104B, the agent control unit 105 displays the response of the agent selected by the agent setting unit 102 using a display device such as a display (step S105). In addition, when control information for devices and software included in the terminal 100 is included in the response information, the agent control unit 105 also performs this control at the same time.
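The overall flow of steps S101 to S105 can be sketched end to end as below. Everything concrete in this sketch is an assumption made for illustration: the keyword sets, the canned responses, the load threshold, and the function names. Whitespace splitting stands in for the language analysis processing, which the description does not detail.

```python
# Hypothetical canned responses keyed by extracted keyword.
RESPONSES = {
    "weather": "Here is the weather forecast.",
    "news": "Here are the news headlines.",
}

# Assumption: unit 104A knows more keywords than unit 104B, and each unit has
# a predefined agent.
UNITS = {
    "104A": {"keywords": {"weather", "news"}, "agent": "adult"},
    "104B": {"keywords": {"weather"}, "agent": "infant"},
}


def extract_keywords(text, known_keywords):
    """Toy language analysis: keep the words that the unit knows."""
    return [word for word in text.lower().split() if word in known_keywords]


def run_dialogue(text, system_load, threshold=0.7):
    """Run one dialogue turn and return what the agent would display."""
    unit_id = "104B" if system_load >= threshold else "104A"  # step S101
    agent = UNITS[unit_id]["agent"]                           # step S103
    keywords = extract_keywords(text, UNITS[unit_id]["keywords"])  # S104A / S104B
    if keywords:
        reply = RESPONSES[keywords[0]]
    else:
        reply = "Sorry, I did not understand."
    return f"[{agent}] {reply}"                               # step S105


if __name__ == "__main__":
    print(run_dialogue("show me the news", system_load=0.1))
    print(run_dialogue("show me the news", system_load=0.9))
```

With the same input, the low-performance unit fails to find a keyword, and the infant agent makes the reason for the weaker response visible to the user.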

In the present embodiment, the case where two types of dialogue processing units (the dialogue processing unit 104A and the dialogue processing unit 104B) are provided is illustrated, but the number of dialogue processing units is not limited to two and may be three or more. In addition, the dialogue processing units may be provided in the same device or may be distributed across a plurality of devices.

As described above, according to the present embodiment, the processing means determination unit 101 determines one dialogue processing unit from a plurality of dialogue processing units (for example, the dialogue processing unit 104A and the dialogue processing unit 104B) according to the situation in which the dialogue processing is performed (for example, the communication line connection status, the communication radio wave status, or the overall system load status). Note that the plurality of dialogue processing units include at least one dialogue processing unit whose dialogue processing performance differs from that of the other dialogue processing units. Then, the agent setting unit 102 sets an agent corresponding to the determined dialogue processing unit, and the agent control unit 105 notifies the user of the response information via the set agent.

With the configuration as described above, it is possible to provide a user-friendly dialogue process and make the user recognize the current processing status at a glance.

That is, according to the present embodiment, the appearance and operation of the agent are changed according to the difference in dialogue processing performance, so it is possible to provide an agent that is familiar to the user and lets the user understand the current processing at a glance. In other words, since the level of dialogue processing performance of the voice dialogue UI is clearly indicated by changing the display of the agent, a comfortable voice dialogue UI can be provided to the user.

Embodiment 2.
FIG. 3 is a block diagram showing a configuration example of the second embodiment of the agent control system according to the present invention. Components similar to those of the first embodiment are denoted by the same reference numerals as in FIG. 1, and their description is omitted.

Referring to FIG. 3, the agent control system of this embodiment includes a terminal 100 and a server 200. The terminal 100 and the server 200 are connected to each other via a communication network.

The terminal 100 includes a processing means determination unit 101, an agent setting unit 102, a voice input unit 103, a dialogue processing unit (terminal) 104, and an agent control unit 105. Note that the content of the dialog processing unit (terminal) 104 is the same as the content of the dialog processing unit 104A or the dialog processing unit 104B in the first embodiment.

The voice input unit 103 inputs a voice signal. The voice input unit 103 is realized by an acoustic input device such as a microphone, for example. The voice input unit 103 may also be realized by an interface that receives a voice signal input from another device.

The server 200 includes a dialogue processing unit (server) 204. Note that the content of the dialog processing unit (server) 204 is the same as the content of the dialog processing unit 104A or the dialog processing unit 104B in the first embodiment.

Thus, in the present embodiment, one of the two dialogue processing units is provided in the terminal 100, and the other dialogue processing unit is provided in the server 200. In the present embodiment, the case where there are two dialogue processing units is illustrated, but the number of dialogue processing units may be three or more. At this time, at least one dialogue processing unit among the plurality of dialogue processing units may be provided in another device (for example, the server 200) connected via the communication network.

Next, the operation of this embodiment will be described. FIG. 4 is a flowchart showing an example of the operation of this embodiment.

The processing means determination unit 101 determines whether the server 200 or the terminal 100 performs the interactive process (Step S101a). For example, the processing means determination unit 101 may determine a target to be interactively processed according to connection / disconnection of a communication line, the strength of communication radio waves, and the load status of the entire system of the terminal 100 and the server 200. Further, the processing means determination unit 101 may determine a target for performing the interactive processing by combining one or more of these conditions.
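The decision in step S101a can be sketched as a combination of the conditions listed above. The field names, the normalized value ranges, the thresholds, and the function `determine_target` are assumptions made for this illustration; the description leaves the concrete criteria open.

```python
from dataclasses import dataclass


@dataclass
class LinkStatus:
    connected: bool          # whether the communication line is connected
    signal_strength: float   # assumed normalized to 0.0 .. 1.0
    server_load: float       # assumed normalized to 0.0 .. 1.0


def determine_target(status, min_signal=0.3, max_server_load=0.9):
    """Step S101a: decide whether the terminal or the server handles the dialogue."""
    if not status.connected:
        return "terminal"   # line disconnected: only local processing is possible
    if status.signal_strength < min_signal:
        return "terminal"   # radio waves too weak for reliable server access
    if status.server_load > max_server_load:
        return "terminal"   # server overloaded: fall back to local processing
    return "server"


if __name__ == "__main__":
    print(determine_target(LinkStatus(True, 0.8, 0.2)))
    print(determine_target(LinkStatus(False, 0.8, 0.2)))
```

The load of the terminal 100 itself could be taken into account in the same way by adding a further field and condition.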

The agent setting unit 102 determines an agent to be used in accordance with the result determined by the processing means determination unit 101 (step S103). The form and operation of the agent used are defined in advance, and the agent for terminal processing and the agent for server processing are stored as separate agents.

As the agent, a character is used that clearly indicates to the user whether the current process is being performed by the terminal 100 or the server 200. For example, in the case of server processing, an adult character may be an agent, and in the case of terminal processing, a child character may be an agent.

The voice input unit 103 receives a voice signal using an acoustic input device such as a microphone, and inputs the voice signal to the dialogue processing unit (server) 204 of the server 200 or the dialogue processing unit (terminal) 104 of the terminal 100 according to the result determined by the processing means determination unit 101 (step S102a).

Specifically, when interactive processing is performed by the server 200 (“server” in step S302), the voice input unit 103 transmits a voice signal to the server 200 via a network represented by the Internet.

On the other hand, when the dialogue processing is performed on the terminal 100 (“terminal” in step S302), the dialogue processing unit (terminal) 104 performs processing for generating information (response information) related to the dialogue response to the input voice signal (response information generation processing) (step S104).

Hereinafter, a specific example of the response information generation process will be described with reference to FIG. FIG. 5 is a block diagram illustrating a configuration example for performing response information generation processing. In the example illustrated in FIG. 5, the response information generation process is realized by a voice recognition unit 1041, a response generation unit 1042, and a voice synthesis unit 1043. The speech recognition unit 1041, the response generation unit 1042, and the speech synthesis unit 1043 are included in the dialogue processing unit 104, for example.

The voice recognition unit 1041 performs voice recognition processing on the voice signal input by the voice input unit 103. Specifically, the voice recognition unit 1041 uses, for example, a voice recognition database 1044 (hereinafter referred to as a voice recognition DB 1044) to analyze the input voice linguistically and acoustically and perform voice recognition processing. As a speech recognition method, for example, a method using a statistical probability model such as an HMM (Hidden Markov Model) can be considered.

The response generation unit 1042 generates response information based on the result of the voice recognition processing by the voice recognition unit 1041. The response information includes, for example, display information such as the agent's action, appearance, and facial expression, text information related to the voice to be uttered by the agent, text information displayed on the screen, operation information of devices and software, and the like.

The voice synthesis unit 1043 uses, for example, a voice synthesis database 1045 (hereinafter referred to as a voice synthesis DB 1045) to generate synthesized speech based on the text information to be uttered by the agent among the response information generated by the response generation unit 1042. The text information may include non-linguistic information such as emotions and intentions. In this case, the voice synthesis unit 1043 may generate synthesized speech that reflects the emotions and intentions. The voice synthesis unit 1043 may include the generated synthesized speech in the response information or may handle it as separate information.
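The three-stage structure of FIG. 5 (recognition, response generation, synthesis) can be sketched as a small pipeline. The recognition and synthesis functions below are deliberately trivial stand-ins that look values up in toy dictionaries; they are not an HMM recognizer or a real synthesizer. The `ResponseInfo` fields, the command string `music.play`, and all function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseInfo:
    speech_text: str        # text the agent should utter
    display_text: str       # text shown on the screen
    expression: str         # agent facial expression
    device_commands: list = field(default_factory=list)
    synthesized_speech: bytes = b""


def recognize_speech(audio, recognition_db):
    """Stand-in for unit 1041: look up the audio bytes in a toy database."""
    return recognition_db.get(audio, "")


def generate_response(recognized_text):
    """Stand-in for unit 1042: build response information from recognized text."""
    if "music" in recognized_text:
        return ResponseInfo("Playing music.", "Now playing", "smile", ["music.play"])
    return ResponseInfo("Could you say that again?", "Not recognized", "puzzled")


def synthesize_speech(text, synthesis_db):
    """Stand-in for unit 1043: concatenate per-word waveforms from a toy database."""
    return b"".join(synthesis_db.get(word, b"") for word in text.split())


def generate_response_info(audio, recognition_db, synthesis_db):
    """Chain recognition, response generation, and synthesis."""
    text = recognize_speech(audio, recognition_db)
    info = generate_response(text)
    info.synthesized_speech = synthesize_speech(info.speech_text, synthesis_db)
    return info
```

Here the synthesized speech is stored inside the response information; as the description notes, it could equally be returned as separate information.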

When server processing is selected in the switching determination in step S302, the dialogue processing unit (server) 204 performs processing (response information generation processing) for generating response information for the voice signal input in step S102a (step S204). The dialogue processing unit (server) 204 transmits the response information to the terminal 100 via a network represented by the Internet.

The response information generation process performed by the dialogue processing unit (server) 204 may be the same as or different from that of the dialogue processing unit (terminal) 104. Specifically, the methods for speech recognition, response generation, and speech synthesis used by the dialogue processing unit (server) 204 may be different. For example, the server 200 may use, for speech recognition and speech synthesis processing, a large-scale database that cannot be installed in the terminal 100 because of the terminal's limited processing power.

The agent control unit 105 notifies the user of the synthesized voice via the set agent. Specifically, based on the response information generated by the dialogue processing unit (terminal) 104 or the dialogue processing unit (server) 204, the agent control unit 105 displays the image and outputs the voice of the agent selected by the agent setting unit 102, using a display device such as a display and a sound output device such as a speaker (step S105a).

If the response information includes device and software control information included in the terminal 100, the agent control unit 105 also performs these controls simultaneously.
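The behavior of the agent control unit 105 in step S105a, including the execution of control information, can be sketched as follows. The dictionary keys, the output strings, and the handler table are hypothetical; real display, audio output, and device control are replaced here by strings collected in a list so that the flow is easy to follow.

```python
def control_agent(agent_name, response_info, device_handlers):
    """Step S105a: present the response via the agent and run any control commands.

    Returns a list describing what was output, in order.
    """
    outputs = [
        f"display:{agent_name}:{response_info['display_text']}",
        f"speak:{agent_name}:{response_info['speech_text']}",
    ]
    # Execute control information for devices and software, if any is present.
    for command in response_info.get("device_commands", []):
        handler = device_handlers.get(command)
        if handler is not None:
            outputs.append(handler())
    return outputs


if __name__ == "__main__":
    info = {
        "display_text": "Now playing",
        "speech_text": "Playing music.",
        "device_commands": ["music.play"],
    }
    handlers = {"music.play": lambda: "executed:music.play"}
    for line in control_agent("adult", info, handlers):
        print(line)
```

Commands without a registered handler are skipped in this sketch; a real system might instead report them to the user through the agent.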

As described above, according to the present embodiment, the appearance and operation of the agent are changed depending on whether the terminal 100 or the server 200 performs the voice dialogue processing. Therefore, it is possible to provide an agent that is familiar to the user and that lets the user understand the current processing at a glance.

Embodiment 3.
FIG. 6 is a block diagram showing a configuration example of the third embodiment of the agent control system according to the present invention. Components similar to those of the first embodiment are denoted by the same reference numerals as in FIG. 1, and their description is omitted.

Referring to FIG. 6, the agent control system of this embodiment includes a terminal 100 and a server 200. That is, as shown in FIG. 6, the configuration of the agent control system of this embodiment is the same as that of the second embodiment. However, the operations of the agent setting unit 102, the dialogue processing unit (terminal) 104, and the dialogue processing unit (server) 204 are different.

The agent setting unit 102 determines an agent to be used according to the result determined by the processing means determination unit 101. When the dialog processing is performed at the terminal 100, the agent setting unit 102 transmits agent setting information to the dialog processing unit (terminal) 104. On the other hand, when the server 200 performs dialogue processing, the agent setting unit 102 transmits agent setting information to the dialogue processing unit (server) 204 via a network represented by the Internet.

Agent setting information is information set according to the attributes of the agent. For example, the agent setting information specifies the personality of the character, the wording generated as response information, and its tone. However, the contents of the agent setting information are not limited to these.

The dialogue processing unit (terminal) 104 or the dialogue processing unit (server) 204 uses agent setting information in addition to the method described in the second embodiment. Specifically, the dialogue processing unit (terminal) 104 and the dialogue processing unit (server) 204 change the response information to be generated according to the operation and appearance specific to each agent.

As described above, according to the present embodiment, control specific to each agent becomes possible. In addition to the effect of the second embodiment, the differences between agents can therefore be made clearer, so that the user can more easily understand the current processing.

Hereinafter, the present invention will be described with reference to specific examples, but the scope of the present invention is not limited to the contents described below.

FIG. 7 is a block diagram showing a configuration example of the first embodiment of the agent control system according to the present invention. The agent control system according to this embodiment includes a terminal 100 and a server 200.

Specifically, the terminal 100 is realized by a mobile terminal having a wireless network connection function that uses a wireless LAN (Local Area Network) or a mobile phone network such as a 3G (3rd Generation) line. Specific examples of the mobile terminal include a mobile phone such as a smartphone, and a tablet terminal.

The server 200 is connected to the Internet through a telephone line or the like, and is connected to the terminal 100 through an Internet service provider or the like. The terminal 100 is assumed to have the call function of a telephone as well as a music playback function, a television viewing function, and a recording function.

The terminal 100 and the server 200 have substantially the same configuration as that of the third embodiment. The terminal 100 includes a speech recognition unit 1041, a speech recognition database 1044, a speech synthesis unit 1043, and a speech synthesis database 1045. Note that, because of limitations on the processing capacity and storage capacity of the terminal 100, the speech recognition unit 1041, the speech recognition database 1044, the speech synthesis unit 1043, and the speech synthesis database 1045 have a small calculation scale or database scale, or low performance.

On the other hand, the server 200 includes a speech recognition unit 2041, a speech recognition database 2044, a speech synthesis unit 2043, and a speech synthesis database 2045. The limitations on processing capacity and storage capacity are far looser for the server 200 than for the terminal 100. Therefore, the speech recognition unit 2041, the speech recognition database 2044, the speech synthesis unit 2043, and the speech synthesis database 2045 have a larger calculation scale or database scale, or higher performance, than their counterparts in the terminal 100.

Usually, the terminal 100 and the server 200 are connected by a communication line. In this embodiment, it is assumed that the server 200 performs processing related to generation of response information for voice conversation. First, the flow of processing in this case will be described below.

In this case, since the communication line is in a connected state, the communication status determination unit 106 transmits information that the line is “connected” to the processing means determination unit 101. The processing means determination unit 101 stores a rule that “if connected, the process by the server is performed, and if not connected, the process by the terminal is performed”. Therefore, here, the processing means determination unit 101 determines to perform processing on the server. Information that “processing is performed by the server” is transmitted to the agent setting unit 102, and an agent to be used is determined.
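The stored rule can be illustrated with a minimal Python sketch. The names `Processor`, `determine_processor`, and `select_agent`, as well as the agent labels, are hypothetical and are introduced only for illustration; the specification does not define a concrete interface.

```python
from enum import Enum


class Processor(Enum):
    SERVER = "server"
    TERMINAL = "terminal"


def determine_processor(line_connected: bool) -> Processor:
    """Apply the rule: server processing if connected, terminal processing otherwise."""
    return Processor.SERVER if line_connected else Processor.TERMINAL


def select_agent(processor: Processor) -> str:
    """Map the chosen processor to an agent label (labels are placeholders)."""
    agents = {
        Processor.SERVER: "agent_for_server",
        Processor.TERMINAL: "agent_for_terminal",
    }
    return agents[processor]


if __name__ == "__main__":
    chosen = determine_processor(line_connected=True)
    print(chosen.value, select_agent(chosen))
```

Keeping the rule in one small function would make it straightforward to replace the connected/not-connected test with richer conditions later.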

As the agent, it is desirable to set a character that makes it clear whether the server 200 or the terminal 100 is performing the processing. Here, a kangaroo parent and child are assumed as agents. When processing is performed by the server 200, the parent kangaroo (with the child in its pouch) carries out the dialogue as the agent, and when processing is performed by the terminal 100, the child kangaroo (with the parent not displayed) carries out the dialogue as the agent.

When such agents are set, the parent agent responds when the processing is high-performance (that is, when processing is performed on the server), and the child agent responds when the processing is low-performance (that is, when processing is performed on the terminal). Therefore, the user can intuitively and clearly understand which processing is being performed.

Furthermore, when the child agent is set, the child agent may respond in a halting, childlike manner. In this case, the user can also be expected to avoid requesting a highly accurate result or inputting a difficult keyword.

Hereinafter, the parent kangaroo will be referred to as Agent A, and the child kangaroo will be referred to as Agent B. The display is switched between Agent A and Agent B by preparing, for each agent, an image and parameters for controlling its operation, and by replacing both the image and the parameters.
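The switching of image and parameters might look like the following Python sketch. The class names, the image file names, and the `scale` parameter are assumptions made for illustration and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class AgentProfile:
    name: str
    image: str                                          # hypothetical image file name
    motion_params: dict = field(default_factory=dict)   # hypothetical motion parameters


# Illustrative profiles for the parent (Agent A) and child (Agent B) kangaroo.
AGENT_A = AgentProfile("Agent A", "kangaroo_parent.png", {"scale": 1.0})
AGENT_B = AgentProfile("Agent B", "kangaroo_child.png", {"scale": 0.6})


class AgentDisplay:
    """Holds the profile currently shown on the monitor."""

    def __init__(self, profile: AgentProfile):
        self.profile = profile

    def switch_to(self, profile: AgentProfile) -> None:
        # Image and parameters are replaced together so they never get out of step.
        self.profile = profile


if __name__ == "__main__":
    display = AgentDisplay(AGENT_A)
    display.switch_to(AGENT_B)
    print(display.profile.name, display.profile.image)
```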

Therefore, here, the agent A is displayed on the monitor 110 provided in the terminal 100, and the terminal 100 is in a waiting state for dialogue, that is, in a state waiting for a user's voice input.

The user inputs sound to the agent A displayed on the terminal 100 using an acoustic input device such as the microphone 108. The voice input unit 103 receives the voice signal and transmits the voice signal to the voice recognition unit 2041 of the server 200 through the network.

For example, it is assumed that the user makes the utterance “Record the drama in which Taro Yamada appears tomorrow” (referred to as utterance U1). The voice signal of the utterance is transmitted to the voice recognition unit 2041 of the server 200 via the voice input unit 103, and voice recognition processing is performed.

The speech recognition unit 2041 is connected to the speech recognition database 2044, and can perform speech recognition processing with higher accuracy than the speech recognition unit 1041 of the terminal 100. The speech recognition unit 2041 converts the utterance U1 into the text information “Record the drama in which Taro Yamada appears tomorrow”, and transmits it to the response generation unit 2042.

The response generation unit 2042 extracts four keywords, “Tomorrow”, “Taro Yamada”, “Drama”, and “Recording”, from the text information of the speech recognition result and generates response information. Here, the program table stored in the server 200 or the terminal 100 is searched for a drama airing the next day in which the actor “Taro Yamada” appears, and it is assumed that a drama titled “Spring Wind”, starting at 21:00, is found. In this case, the response generation unit 2042 generates, for example, text information T1, “The drama ‘Spring Wind’ will be reserved from 21:00”, and image information P1 showing the agent operating the television.
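As a rough illustration of the program table search, the following Python sketch assumes that the keywords have already been extracted into a dictionary. The `Program` record, the dictionary keys, and the response wording are hypothetical; the specification does not describe the data layout of the program table.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Program:
    title: str
    genre: str
    cast: List[str]
    day: date
    start: str  # start time as "HH:MM"


def find_program(table: List[Program], actor: str, genre: str, day: date) -> Optional[Program]:
    """Return the first program on the given day matching the genre and the actor."""
    for program in table:
        if program.day == day and program.genre == genre and actor in program.cast:
            return program
    return None


def build_response(keywords: dict, table: List[Program], today: date) -> str:
    """Turn extracted keywords into response text (wording is illustrative)."""
    day = today + timedelta(days=1) if keywords.get("when") == "tomorrow" else today
    hit = find_program(table, keywords["actor"], keywords["genre"], day)
    if hit is None:
        return "No matching program was found."
    return f'The drama "{hit.title}" will be reserved from {hit.start}.'


if __name__ == "__main__":
    table = [Program("Spring Wind", "drama", ["Taro Yamada"], date(2013, 7, 10), "21:00")]
    keywords = {"when": "tomorrow", "actor": "Taro Yamada", "genre": "drama", "action": "record"}
    print(build_response(keywords, table, today=date(2013, 7, 9)))
```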

When the response generation unit 2042 transmits the text information T1 to the speech synthesis unit 2043, the speech synthesis unit 2043 generates a synthesized speech V1 based on the text information T1. The speech synthesis unit 2043 is connected to the speech synthesis database 2045, and can perform speech synthesis with higher quality than the speech synthesis unit 1043 of the terminal 100.

When the speech synthesis unit 2043 transmits the synthesized voice V1 and the image information P1 to the agent control unit 105 of the terminal 100 via the network, the agent is displayed on the terminal 100 side. As a result, the terminal 100 uses a display device such as the monitor 110 to show Agent A operating the television, and uses a sound output device such as the speaker 109 to respond, “The drama ‘Spring Wind’ will be reserved from 21:00.”

Subsequently, a situation is assumed in which the communication line between the terminal 100 and the server 200 is not connected and the terminal 100 and the server 200 cannot communicate. In this case, the communication status determination unit 106 transmits information that the line is “not connected” to the processing means determination unit 101. The processing means determination unit 101 determines that processing is to be performed at the terminal, and the agent setting unit 102 sets Agent B as the agent to be used.

Processing when voice dialogue processing is performed on the terminal 100 will be described. In this case, the terminal 100 stands by with the agent B displayed on the screen.

Here, it is assumed that the user makes the utterance U1 (“Record the drama in which Taro Yamada appears tomorrow”) as before. In this case, the voice signal of the utterance U1 is transmitted via the voice input unit 103 to the voice recognition unit 1041 of the terminal 100, in the same way that it is transmitted to the server in the case of server processing.

The speech recognition unit 1041 is connected to the speech recognition database 1044. As described above, the speech recognition database 1044 is smaller than the speech recognition database 2044 of the server 200.

In this case, the accuracy of the speech recognition processing is lower than that of the server processing, and the recognition rate for proper nouns such as personal names may be reduced. Likewise, the speech synthesis unit 1043 is connected to the speech synthesis database 1045, which is also smaller than the speech synthesis database 2045 of the server 200. As a result, the quality of the synthesized speech is lower than in the server processing.

Therefore, displaying Agent B clearly indicates that advanced processing is not possible. In addition, by generating response information that prompts the user for utterances that do not require advanced processing, the user is guided away from demanding advanced processing from the terminal 100. Specifically, in the recording example above, one conceivable method is to restrict free-form utterances after receiving the utterance U1 by responding, for example, “Please specify time and channel”.
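One way to picture this guidance is a prompt function that asks only for constrained items when the terminal is processing. The sketch below is an assumption-laden illustration: the function `next_prompt`, the slot names `time` and `channel`, and the prompt wording other than “Please specify time and channel” are hypothetical.

```python
def next_prompt(processor: str, slots: dict) -> str:
    """Return the agent's next utterance.

    `processor` is "server" or "terminal"; `slots` holds the items already obtained.
    """
    if processor == "server":
        # Free-form input is acceptable when the server processes the dialogue.
        return "What would you like to do?"
    # On the terminal, ask only for the constrained items that are still missing.
    missing = [name for name in ("time", "channel") if name not in slots]
    if missing:
        return "Please specify " + " and ".join(missing) + "."
    return f"Recording channel {slots['channel']} from {slots['time']}."


if __name__ == "__main__":
    print(next_prompt("terminal", {}))
    print(next_prompt("terminal", {"time": "21:00", "channel": "4"}))
```

Restricting the expected answers to a time and a channel keeps the recognition vocabulary small, which suits the smaller speech recognition database on the terminal.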

When the line changes from the disconnected state back to the connected state, Agent A may be displayed on the monitor 110 of the terminal 100 in place of Agent B. By doing so, the user can grasp that the advanced processing by the server 200 can be used again.

In this embodiment, the voice signal input by the voice input unit 103 is transmitted to the voice recognition unit 1041 or the voice recognition unit 2041 as it is. Note that the voice input unit 103 may include means for analyzing the acoustic feature amount. Then, the voice input unit 103 may transmit only the acoustic feature amount of the voice signal to the voice recognition unit 1041 or the voice recognition unit 2041. In this case, there is a possibility that an effect of reducing the communication amount can be obtained.

Alternatively, the voice input unit 103 may transmit only the acoustic feature amount when transmitting to the voice recognition unit 2041 of the server 200, and may transmit the voice signal as it is when transmitting to the voice recognition unit 1041 of the terminal 100.

In this embodiment, both speech recognition and speech synthesis are performed by either the terminal 100 or the server 200. Alternatively, the terminal 100 may perform the speech recognition processing regardless of whether the line is connected and, when the line is connected, transmit the result to the server 200 so that the server 200 executes the speech synthesis processing. Conversely, of course, the terminal 100 may execute only the speech synthesis processing. Doing so can improve the processing speed.

In this embodiment, parent and child character agents are used to indicate to the user whether server processing or terminal processing is being performed. As the method for changing the agent, two types of images, parent and child, and parameters for controlling their operation were prepared, and the images and parameters were swapped. Alternatively, a method of expressing the two types of character agents by preparing a single image and controlling parameters that change its body shape is also conceivable.

In this embodiment, two speech synthesis databases are prepared, one for the terminal 100 and one for the server 200, but only the terminal 100 may include the speech synthesis unit 1043. In this case, the change of character may be expressed by changing parameters for speech synthesis (e.g., voice pitch, loudness, and speech speed).

Subsequently, a second embodiment of the present invention will be described. The agent control system of this embodiment has the same configuration as that of the first embodiment. However, in this embodiment, the operations of the communication status determination unit 106, the processing means determination unit 101, and the agent setting unit 102 are different from those in the first embodiment. In the present embodiment, an operation for changing an agent character or changing an agent's appearance and operation according to a communication congestion state will be described.

The communication status determination unit 106 determines the communication line connection / disconnection status as in the first embodiment. Furthermore, the communication status determination unit 106 of this embodiment also determines the status of the communication line at the time of connection. As an index indicating the state of the communication line, radio wave intensity at the wireless terminal, the degree of congestion of the communication line, and the like can be considered.

The communication status determination unit 106 transmits information on the communication status to the processing means determination unit 101. Here, it is assumed that the radio wave intensity is 50% (that is, a situation in which the communication speed is only about half of that at the strongest radio wave intensity), and that the communication status determination unit 106 transmits “radio wave intensity: 50%” as the information on the communication status.

The processing means determination unit 101 determines a dialogue processing unit based on the information “radio wave intensity: 50%”. Here, the agent expresses an action and a state that remind the user that, although the server can perform high-performance processing, the radio wave intensity is weak and it takes a long time to return a response in the dialogue.

Specifically, the terminal 100 produces an effect as if the character has moved away, such as by displaying the agent A of the parent kangaroo described in the first embodiment in a small size. In addition, when the radio wave intensity becomes strong, the character agent is displayed again in a large size so that the user can recognize that the response speed has been recovered.

Also, when the communication line is congested, a method may be considered in which the user is reminded that the line is congested by displaying many characters such as animals other than kangaroos on the screen. The agent setting unit 102 may set an agent having a feature that reminds the user of the status of the connected communication line.

In this embodiment, the agent expresses actions and appearances that remind the user that the signal strength is weak. Alternatively, when the radio wave intensity falls below a certain threshold value or the degree of congestion exceeds a certain threshold value, processing by the server 200 may be switched to processing by the terminal 100. In that case, as shown in the first embodiment, a character that reminds the user that terminal processing is being executed may be used as the agent.
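The threshold-based behaviour can be sketched in Python as follows. This sketch assumes that a weak or congested line favours processing on the terminal 100, in line with the terminal-processing character being shown at that time. The threshold values, the 0.0 to 1.0 scales, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineStatus:
    connected: bool
    signal_strength: float  # 0.0 (no signal) to 1.0 (strongest); assumed scale
    congestion: float       # 0.0 (idle) to 1.0 (fully congested); assumed scale


MIN_SIGNAL = 0.3       # hypothetical threshold for radio wave intensity
MAX_CONGESTION = 0.8   # hypothetical threshold for line congestion


def choose_processor(status: LineStatus) -> str:
    """Fall back to the terminal when the line is down, weak, or congested."""
    if not status.connected:
        return "terminal"
    if status.signal_strength < MIN_SIGNAL or status.congestion > MAX_CONGESTION:
        return "terminal"
    return "server"


def agent_scale(status: LineStatus) -> float:
    """Draw the parent agent smaller as the signal weakens, so it appears farther away."""
    return max(0.2, min(1.0, status.signal_strength))


if __name__ == "__main__":
    status = LineStatus(connected=True, signal_strength=0.5, congestion=0.1)
    print(choose_processor(status), agent_scale(status))
```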

Subsequently, a third embodiment of the present invention will be described. FIG. 8 is a block diagram showing a configuration example of the third embodiment of the agent control system according to the present invention. In the agent control system of the present embodiment, the server 200 includes a server load status determination unit 107 in addition to the configuration of the first embodiment. In the present embodiment, an operation for changing an agent character or changing an agent's appearance and operation according to a server load situation will be described.

The server load status determination unit 107 determines the load status of the server 200, and transmits server load information indicating the load status to the processing means determination unit 101. Here, it is assumed that interactive processing requests from a large number of terminals are transmitted at the same time, and the server load status determination unit 107 transmits information “server load factor: 80%”.

The processing means determination unit 101 determines the dialogue processing unit based on the information “server load factor: 80%”. Here, the agent expresses an action and a state that reminds the user that the server load is high and the server can perform high-performance processing, but takes a long time to return a response to the dialogue.

Specifically, the terminal 100 displays a large number of kangaroo characters colored differently from Agent A described in the first embodiment, thereby conveying a crowded state. When the server load factor decreases, the terminal 100 again displays Agent A alone, so that the user can recognize that the response speed has recovered. In this way, the agent setting unit 102 sets an agent having a feature that reminds the user of the load status of the server 200.
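A simple mapping from the server load factor to the number of additional characters drawn could look like the Python sketch below. The threshold of 70%, the maximum of five extra characters, and the function name are assumptions chosen only to illustrate the idea.

```python
def crowd_size(load_percent: int, busy_threshold: int = 70, max_extra: int = 5) -> int:
    """Return how many extra kangaroo characters to draw for a server load factor.

    Below the threshold only Agent A is shown (0 extras); above it, the crowd
    grows linearly with the load up to `max_extra`.
    """
    if load_percent < busy_threshold:
        return 0
    load = min(load_percent, 100)
    return 1 + (load - busy_threshold) * (max_extra - 1) // (100 - busy_threshold)


if __name__ == "__main__":
    for load in (50, 80, 100):
        print(load, crowd_size(load))
```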

Next, the outline of the present invention will be described. FIG. 9 is a block diagram showing an outline of the agent control system according to the present invention. The agent control system according to the present invention includes: a plurality of dialogue processing means 81 (for example, the dialogue processing unit 104A and the dialogue processing unit 104B) that perform dialogue processing with a user by generating response information for input information (for example, text, speech, speech synthesis result, etc.) from the user; processing means determining means 82 (for example, the processing means determination unit 101) that determines one dialogue processing means from the plurality of dialogue processing means 81; agent setting means 83 (for example, the agent setting unit 102) that sets an agent according to the determined one dialogue processing means; and agent control means 84 (for example, the agent control unit 105) that notifies the user of the response information via the set agent.

The plurality of dialogue processing means 81 includes at least one dialogue processing means having different dialogue processing performance from other dialogue processing means.

The processing means determining means 82 determines one dialog processing means from a plurality of dialog processing means according to the situation (for example, system load, network load, etc.) in which the dialog processing is being performed.

With such a configuration, it is possible to provide a user-friendly dialogue process and make the user recognize the current processing status at a glance.
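The relationship among the four means can be summarized in a short Python sketch. The class `AgentControlSystem`, its constructor arguments, and the string keys are hypothetical; the sketch only mirrors the order of determination, agent setting, dialogue processing, and notification described above.

```python
from typing import Callable, Dict, Tuple

DialogueProcessor = Callable[[str], str]


class AgentControlSystem:
    def __init__(
        self,
        processors: Dict[str, DialogueProcessor],  # dialogue processing means 81
        agents: Dict[str, str],                     # agent assigned to each processor
        determine: Callable[[dict], str],           # processing means determining means 82
    ):
        self.processors = processors
        self.agents = agents
        self.determine = determine

    def respond(self, user_input: str, situation: dict) -> Tuple[str, str]:
        key = self.determine(situation)               # pick one processor for the situation
        agent = self.agents[key]                      # agent setting means 83
        response = self.processors[key](user_input)   # generate response information
        return agent, response                        # agent control means 84 would present this


if __name__ == "__main__":
    system = AgentControlSystem(
        processors={"server": lambda s: "full: " + s, "terminal": lambda s: "simple: " + s},
        agents={"server": "Agent A", "terminal": "Agent B"},
        determine=lambda situation: "server" if situation.get("connected") else "terminal",
    )
    print(system.respond("hello", {"connected": False}))
```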

Further, the agent control system may include voice input means (for example, the voice input unit 103) for inputting a voice signal. In addition, the dialogue processing means 81 may include voice recognition means (for example, the voice recognition unit 1041) that performs voice recognition processing on the voice signal input by the voice input means, response generation means (for example, the response generation unit 1042) that generates response information including text information to be uttered by the agent based on the result of the voice recognition processing, and speech synthesis means (for example, the speech synthesis unit 1043) that generates synthesized speech based on the text information included in the response information. Then, the agent control means 84 may notify the user of the synthesized voice via the set agent. With such a configuration, a situation close to an actual dialogue can be created.

Moreover, at least one of the plurality of dialogue processing means 81 may be provided in another device (for example, the server 200) connected via the communication network. With such a configuration, interactive processing can be realized in an environment where restrictions on processing capacity and storage capacity are relaxed.

Also, the agent setting means 83 may set an agent having a feature that reminds the user of the load status of other devices. With such a configuration, the current processing status can be clearly recognized by the user.

The plurality of dialogue processing means 81 may include at least one dialogue processing means having dialogue processing performance different from that of the other dialogue processing means in a specific usage form (for example, a usage pattern, a scene, or an area).

Further, the agent setting means 83 may set an anthropomorphic agent having a feature that reminds the user of the age according to the dialogue processing performance of the decided one dialogue processing means. With such a configuration, for example, when low-performance processing is performed, it can be expected to prevent the user from requesting a highly accurate result or inputting a difficult keyword.

Also, the agent setting means 83 may set an agent having a feature that reminds the user of the status of the connected communication line. With such a configuration, the current processing status can be clearly recognized by the user.

Although the present invention has been described above with reference to the embodiments and examples, the present invention is not limited to those embodiments and examples. For example, regarding the types and connection methods of the devices of the terminal 100 and the server 200, the configuration and details of the present invention can be changed in various ways that can be understood by those skilled in the art within the scope of the present invention.

This application claims priority based on Japanese Patent Application No. 2012-169985 filed on July 31, 2012, the entire disclosure of which is incorporated herein.

The present invention can be suitably applied to an agent control system that performs voice conversation using a mobile terminal. The present invention is preferably applied to, for example, an agent control system that performs device operation and information search using voice interaction.

DESCRIPTION OF SYMBOLS
100 Terminal
101 Processing means determination unit
102 Agent setting unit
103 Voice input unit
104, 104A, 104B, 204 Dialogue processing unit
105 Agent control unit
106 Communication status determination unit
107 Server load status determination unit
108 Microphone
109 Speaker
110 Monitor
200 Server
1041 Speech recognition unit
1042 Response generation unit
1043 Speech synthesis unit
1044, 2044 Speech recognition database
1045, 2045 Speech synthesis database

Claims

1. An agent control system comprising:
a plurality of dialogue processing means for performing dialogue processing with a user by generating response information for input information from the user;
processing means determining means for determining one dialogue processing means from the plurality of dialogue processing means;
agent setting means for setting an agent according to the determined one dialogue processing means; and
agent control means for notifying the user of the response information via the set agent,
wherein the plurality of dialogue processing means includes at least one dialogue processing means having dialogue processing performance different from that of the other dialogue processing means, and
the processing means determining means determines one dialogue processing means from the plurality of dialogue processing means according to a situation in which the dialogue processing is performed.
2. The agent control system according to claim 1, further comprising voice input means for inputting a voice signal,
wherein the dialogue processing means includes:
voice recognition means for performing voice recognition processing on the voice signal input by the voice input means;
response generating means for generating response information including text information to be uttered by the agent based on the result of the voice recognition processing; and
speech synthesis means for generating synthesized speech based on the text information included in the response information, and
the agent control means notifies the user of the synthesized voice via the set agent.
3. The agent control system according to claim 1 or 2, wherein at least one of the plurality of dialogue processing means is provided in another device connected via a communication network.
4. The agent control system according to claim 3, wherein the agent setting means sets an agent having a feature that reminds the user of a load status of the other device.
5. The agent control system wherein the plurality of dialogue processing means includes at least one dialogue processing means having dialogue processing performance different from that of the other dialogue processing means in a specific usage form.
6. The agent control system according to item 1, wherein the agent setting means sets an anthropomorphized agent having a feature that reminds the user of an age according to the dialogue processing performance of the determined one dialogue processing means.
7. The agent control system according to any one of claims 1 to 6, wherein the agent setting means sets an agent having a feature that reminds the user of the status of the connected communication line.
8. An agent control method comprising:
determining one dialogue processing means from a plurality of dialogue processing means that perform dialogue processing with a user by generating response information for input information from the user;
setting an agent according to the determined one dialogue processing means; and
notifying the user of the response information via the set agent,
wherein, when the one dialogue processing means is determined, the one dialogue processing means is determined, according to a situation in which the dialogue processing is performed, from the plurality of dialogue processing means including at least one dialogue processing means having dialogue processing performance different from that of the other dialogue processing means.
9. The agent control method according to claim 8, comprising:
performing voice recognition processing on an input voice signal;
generating, based on the result of the voice recognition processing, response information including text information to be uttered by the agent;
generating synthesized speech based on the text information included in the response information; and
notifying the user of the synthesized voice via the set agent.
10. An agent control program for causing a computer to execute:
processing means determination processing for determining one dialogue processing means from a plurality of dialogue processing means that perform dialogue processing with a user by generating response information for input information from the user;
agent setting processing for setting an agent according to the determined one dialogue processing means; and
agent control processing for notifying the user of the response information via the set agent,
wherein, in the processing means determination processing, the computer is caused to determine, according to a situation in which the dialogue processing is performed, one dialogue processing means from the plurality of dialogue processing means including at least one dialogue processing means having dialogue processing performance different from that of the other dialogue processing means.