WO2014181524A1

WO2014181524A1 - Conversation processing system and program

Info

Publication number: WO2014181524A1
Application number: PCT/JP2014/002348
Authority: WO
Inventors: 孫　正義; 筒井　多圭志; 康介朝長; 賢之鎌谷; 輝稲葉
Original assignee: ソフトバンクモバイル株式会社; ソフネック株式会社
Priority date: 2013-05-09
Filing date: 2014-04-25
Publication date: 2014-11-13
Also published as: JP2014219594A

Abstract

A conversation system that can perform flexible conversations in accordance with a user state is desired, and to this end, provided is a conversation processing system that is equipped with: a voice acquisition unit that acquires the voice of a user; an emotion recognition unit that recognizes the emotions of the user on the basis of the voice acquired by the voice acquisition unit; a first conversation algorithm memory unit that stores first conversation algorithms correlating with each type of a plurality of emotions; a conversation algorithm selection unit that makes a selection, in correlation with the emotions recognized by the emotion recognition unit, from among the first conversation algorithms being stored in the first conversation algorithm memory unit; and a conversation performance unit that performs conversations with the user, in accordance with the first conversation algorithms selected by the conversation algorithm selection unit.

Description

Conversation processing system and program

The present invention relates to a conversation processing system and a program.

Conventionally, a conversation system for voice conversation with a user has been known. (For example, refer to Patent Document 1).
[Prior art documents]
[Patent Literature]
[Patent Document 1] Japanese Patent Application Laid-Open No. 2011-253389

A conversation system that can execute a flexible conversation according to the user's condition is desired.

In the first aspect of the present invention, a voice acquisition unit that acquires a user's voice, an emotion recognition unit that recognizes a user's emotion based on the voice acquired by the voice acquisition unit, and each of a plurality of types of emotions A first conversation algorithm storage unit that stores a first conversation algorithm (for example, emotion conversation algorithm) in association with the first conversation stored in the first conversation algorithm storage unit in association with the emotion recognized by the emotion recognition unit. A conversation processing system is provided that includes a conversation algorithm selection unit that selects an algorithm and a conversation execution unit that executes a conversation with a user according to a first conversation algorithm selected by the conversation algorithm selection unit.

The conversation processing system includes a second conversation algorithm storage unit that stores an execution condition in association with each of a plurality of second conversation algorithms (for example, a conditional conversation algorithm), and a plurality of second conversation algorithm storage units that are stored. An execution condition determining unit that determines that any one of the execution conditions is satisfied, and the conversation algorithm selecting unit is associated with the execution condition determined to be satisfied by the execution condition determining unit. The second conversation algorithm stored in the second conversation algorithm storage unit may be selected, and the conversation execution unit may execute the conversation with the user according to the second conversation algorithm selected by the conversation algorithm selection unit.

In the conversation processing system, the voice acquisition unit may acquire the user's voice while the conversation execution unit is executing a conversation with the user according to the second conversation algorithm. The conversation according to the second conversation algorithm selected by may be interrupted, and the conversation according to the first conversation algorithm selected by the conversation algorithm selection unit may be started. In the conversation processing system, the second conversation algorithm storage unit may further store a priority in association with each of the plurality of second conversation algorithms, and the emotion recognition unit may select the second conversation selected by the conversation algorithm selection unit. While the conversation execution unit is executing a conversation with the user according to the algorithm, the user's emotion may be recognized based on the voice acquired by the voice acquisition unit, and the conversation processing system may recognize the user's emotion recognized by the emotion recognition unit. You may further provide the 1st priority change part which changes the priority memorize | stored in association with the 2nd conversation algorithm which the conversation algorithm selection part selected based on emotion.

The conversation processing system selects one category from a plurality of categories based on an utterance content storage unit that stores a plurality of utterance contents in association with each of a plurality of categories and a voice acquired by the voice acquisition unit And a conversation execution unit that executes a conversation with the user based on a plurality of utterance contents stored in association with one category selected by the category selection unit. The voice acquisition unit may acquire the voice of the user while the conversation execution unit executes a conversation with the user based on a plurality of utterance contents stored in association with one category. Interrupts the conversation with the user based on a plurality of utterance contents stored in association with one category, and starts a conversation according to the first conversation algorithm selected by the conversation algorithm selection unit. There. In the above conversation processing system, the utterance content storage unit may further store a priority in association with each of the plurality of categories, and the emotion recognition unit may use the plurality of utterance contents stored in association with one category. While the conversation execution unit is executing a conversation with the user, the user's emotion may be recognized based on the voice acquired by the voice acquisition unit, and the conversation processing system recognizes the user's emotion recognized by the emotion recognition unit. A second priority changing unit that changes the priority stored in association with one category may be further provided.

In the above conversation processing system, the category selection unit determines the priority when switching the conversation with the user executed by the conversation execution unit according to a plurality of utterance contents stored in association with the selected category. On the basis of this, a category other than one category may be selected, and the conversation execution unit may associate a plurality of categories stored in the utterance content storage unit in association with a category other than the one category selected by the category selection unit. Depending on the utterance content, a conversation with the user may be started. In the conversation processing system, the plurality of categories may have a hierarchical structure, and the category selection unit is executed by the conversation execution unit according to the plurality of utterance contents stored in association with the selected one category. When switching a conversation with a user, a category adjacent to one category may be selected, and the conversation execution unit associates the category with the category adjacent to the one category selected by the category selection unit. A conversation with the user may be started based on a plurality of utterance contents stored in the storage unit.

In the conversation processing system, after the category selection unit selects one category, the speech acquired by the voice acquisition unit is associated with the one category in the selected state as utterance content and registered in the utterance content storage unit. A content registration unit may be further provided. The conversation processing system includes: a voice output control unit that outputs a first utterance content included in a plurality of utterance contents stored in association with one category selected by the category selection unit; and a first utterance content And a conversation data generation unit that generates conversation data including the first response content of the user 10 with respect to the first utterance content. The conversation execution unit may include the first utterance content and the first utterance content. After the conversation data including the response content is generated, when the voice acquisition unit acquires the voice that matches the first utterance content, the first response content may be output by voice, and the voice acquisition unit The second response content corresponding to the first response content output by the voice may be acquired, and the conversation data generation unit generates the conversation data associated in the order of the first utterance content, the first response content, and the second response content. It's okay.

In the above conversation processing system, the second conversation algorithm stored in the second conversation algorithm storage unit of the first information terminal possessed by the first user is a standard in which the similarity with the profile of the first user is determined in advance. An algorithm sharing processing unit for copying to a second conversation algorithm storage unit of a second information terminal possessed by a second user having a profile exceeding the threshold may be further provided.

In the second aspect of the present invention, there is provided a program for causing a computer to function as the conversation processing system.

Note that the above summary of the invention does not enumerate all the necessary features of the present invention. In addition, a sub-combination of these feature groups can also be an invention.

An example of the communication environment of the information terminal 100 is shown schematically. An example of the conversation process by the information terminal 100 is shown schematically. The other example of the conversation process by the information terminal 100 is shown schematically. The function structure of the information terminal 100 is shown schematically. An example of the operation | movement flow by the information terminal 100 is shown roughly. An example of the operation | movement flow by the information terminal 100 is shown roughly. An example of the conversation algorithm corresponding to pleasure is shown roughly. An example of the conversation algorithm corresponding to anger is shown roughly. An example of the conversation algorithm corresponding to sadness is shown roughly. An example of the hierarchical structure of the category 44 is shown schematically.

Hereinafter, the present invention will be described through embodiments of the invention. However, the following embodiments do not limit the invention according to the claims. In addition, not all the combinations of features described in the embodiments are essential for the solving means of the invention.

FIG. 1 schematically shows an example of a communication environment of the information terminal 100. The information terminal 100 has a voice input function and a voice output function, and executes a conversation with the user 10. The information terminal 100 is a mobile phone such as a smartphone, for example. The information terminal 100 may execute a conversation with the user 10 alone. Further, the information terminal 100 may execute a conversation with the user 10 in cooperation with the server 200 that can communicate via the communication network 20.

The information terminal 100 may be an example of a conversation processing system. Further, the conversation processing system may be configured by the information terminal 100 and the server 200. The information terminal 100 may be any device that has a voice input function and a voice output function. For example, the information terminal 100 may be a tablet terminal, a PC, a home appliance, a car, a car navigation system, a robot, a stuffed toy, or the like. It may be.

FIG. 2 schematically shows an example of conversation processing by the information terminal 100. The information terminal 100 may execute a conversation with the user 10 in accordance with the conversation algorithm stored in the conversation algorithm DB 30. The information terminal 100 may hold the conversation algorithm DB 30. Moreover, the server 200 may hold | maintain conversation algorithm DB30. The conversation algorithm DB 30 may be held by devices other than the information terminal 100 and the server 200.

The information terminal 100 may execute the conversation algorithm corresponding to the execution condition when the execution condition registered in the execution condition table 32 is satisfied. For example, the information terminal 100 executes the “program notification application” when there is a television program of a genre that the user 10 likes in the near future. For example, the information terminal 100 receives a TV program data providing service and acquires the profile data of the user 10 to determine that there is a TV program of a genre that the user 10 likes soon.

The “program notification application” is a conversation for notifying the user 10 that a TV program of a genre that the user 10 likes is broadcast based on the TV program data, or answering an inquiry from the user 10 regarding the program content. It may be an algorithm. For example, the “program notification application” may give a voice notification that there is a music program from 20:00 today, and may respond with “silver bomb will come out” in response to an inquiry from the user 10 about who will appear.

The “commuting assistance application” may be a conversation algorithm for notifying the user 10 of events related to commuting and answering inquiries from the user 10 regarding commuting. For example, the information terminal 100 executes the “commuting assistance application” when the operation information data received from the server that provides the train operation information indicates the delay of the commuter train. The “event notification application” may be a conversation algorithm that notifies the user 10 that the event registered in the calendar is about to be held, or answers an inquiry from the user 10 regarding the contents of the event. The information terminal 100 executes the “event notification application” on the day before the event registered in the calendar, for example.

The “umbrella alert app” may be a conversation algorithm that makes a voice notification that prompts the user 10 to hold an umbrella or answers an inquiry from the user 10 regarding the weather according to the probability of precipitation. For example, the information terminal 100 executes the “umbrella alert app” when the weather forecast data received from a server that provides weather forecast information occupies a precipitation probability of 30% or more. The “blood group fortune-telling application” may be a conversation algorithm for voice notification of the content of fortune-telling for the blood type designated by the user 10. For example, the information terminal 100 executes the “blood type fortune-telling app” when receiving an instruction to start the “blood type fortune-telling app”. Note that these conversation algorithms are examples, and the conversation algorithm DB 30 may store other conversation algorithms.

The execution condition table 32 may store priorities in association with each conversation algorithm. The information terminal 100 may select a conversation algorithm based on the priority. For example, when there are a plurality of conversation algorithms that satisfy the execution condition, the information terminal 100 may execute a conversation algorithm with a high priority or may execute a conversation algorithm in descending order of priority. Further, the information terminal 100 may select a conversation algorithm having a higher priority than the conversation algorithm currently being executed when the topic is switched halfway.

FIG. 3 schematically shows another example of conversation processing by the information terminal 100. The information terminal 100 may execute a conversation with the user 10 based on the plurality of utterance contents 46 stored in the conversation DB 40. Further, the information terminal 100 may execute a conversation with the user 10 based on the plurality of utterance contents 48 stored in the Q & ADB 42.

The conversation terminal 40 and the Q & ADB 42 may be held by the information terminal 100. The conversation DB 40 and the Q & ADB 42 may be held by the server 200. The conversation DB 40 and the Q & ADB 42 may be held by devices other than the information terminal 100 and the server 200.

The conversation DB 40 stores a plurality of utterance contents 46 in association with each of the plurality of categories 44. The information terminal 100 may select one category from a plurality of categories based on the voice of the user 10. For example, when the information terminal 100 determines that the voice of the user 10 matches any of the plurality of utterance contents 46, the information terminal 100 puts the category 44 including the utterance contents 46 into a selected state. For example, if the information terminal 100 determines that the keyword included in the voice of the user 10 matches the name of the category 44, the information terminal 100 may select the category 44.

In the example shown in FIG. 3, since the voice of the user 10 “Silver explosion is interesting” matches the utterance content 46 associated with “Entertainment: Musician: Silver bomb (silver bomber)”, the information terminal 100 Select “Entertainment: Musician: Silver Bomber”. Further, the information terminal 100 determines that the voice of the user 10 does not match any of the plurality of utterance contents 46 associated with “Entertainment: Musician: Silver Bomber”. If the keyword is included, “Entertainment: Musician: Silver Bomber” may be selected.

The information terminal 100 may output one of a plurality of utterance contents 46 associated with the selected category 44 as a response to the inquiry from the user 10. For example, the information terminal 100 may respond with “It is an air band” in response to the user 10's question “It ’s fun! Thereby, a highly relevant response can be made to the voice of the user 10.

The conversation DB 40 may store the priority 45 in association with each of the plurality of categories 44. The information terminal 100 may select another category 44 based on the priority 45 when switching the selected category 44 to another category 44. In addition, for example, when the information terminal 100 selects the category 44 without being based on the voice of the user 10, such as when the information terminal 100 speaks to the user 10, the information terminal 100 selects the category 44 based on the priority 45. It's okay.

The Q & ADB 42 stores a plurality of utterance contents 48 in association with each of the plurality of categories 44. Each of the plurality of utterance contents 48 may be associated with a question. When the information terminal 100 determines that the acquired voice of the user 10 matches one of the plurality of questions associated with the selected category 44, the information terminal 100 may output the utterance content 48 corresponding to the question as a voice. . In the example shown in FIG. 3, if “Entertainment: Musician: Silver Bomber” is selected and the voice of the user 10 is “Who is the vocalist?”, The information terminal 100 displays “Kirishu”. "It's right".

As described above, the information terminal 100 executes a conversation with the user 10 using the conversation algorithm DB 30, the conversation DB 40, the Q & ADB 42, and the like. The information terminal 100 according to the present embodiment uses an emotion recognition result for the voice of the user 10 in order to realize a more flexible conversation.

FIG. 4 schematically shows a functional configuration of the information terminal 100. Here, conversation algorithm DB 30, conversation DB 40, Q & ADB 42, execution condition determination unit 112, condition data acquisition unit 116, conversation algorithm selection unit 118, conversation execution unit 120, voice output control unit 122, voice acquisition unit 124, category selection unit 126 The information terminal 100 including the emotion recognition unit 128, the priority change unit 130, the utterance content registration unit 132, the conversation data generation unit 134, and the algorithm sharing processing unit 136 will be described.

The conversation algorithm DB 30 stores a plurality of conversation algorithms. The conversation algorithm DB 30 stores an emotion conversation algorithm 36 associated with each of a plurality of emotion types. The emotion conversation algorithm 36 may be an example of a first conversation algorithm. The conversation algorithm DB 30 may be an example of a first conversation algorithm storage unit.

Further, as described in FIG. 2, the conversation algorithm DB 30 stores a plurality of conditional conversation algorithms 34 each associated with an execution condition and an execution condition table 32. The conditional conversation algorithm 34 may be an example of a second conversation algorithm. The conversation algorithm DB 30 may be an example of a second conversation algorithm storage unit.

The conversation DB 40 stores a plurality of utterance contents 46 in association with each of the plurality of categories 44. The Q & ADB 42 stores a plurality of combinations of the question and the utterance content 48 in association with each of the plurality of categories 44. The conversation DB 40 and the Q & ADB 42 may be an example of an utterance content storage unit.

The execution condition determination unit 112 determines that any of a plurality of execution conditions registered in the execution condition table 32 is satisfied. The execution condition determination unit 112 refers to the condition data acquired by the condition data acquisition unit 116 that acquires the condition data related to the execution condition and the execution condition table 32, so that one of the plurality of execution conditions is satisfied. You may judge that.

For example, the condition data acquisition unit 116 acquires the genre of the television program that the user 10 likes from the profile data of the user 10. The condition data acquisition unit 116 acquires television program data via the communication network 20. The execution condition determination unit 112 may determine that the execution condition of the “program notification application” is satisfied from the TV program genre that the user 10 likes, the TV program data, and the execution condition table 32.

The conversation algorithm selection unit 118 selects the conditional conversation algorithm 34 stored in the conversation algorithm DB 30 in association with the execution condition determined to be satisfied by the execution condition determination unit 112.

The conversation execution unit 120 executes a conversation with the user 10 in accordance with the conditional conversation algorithm 34 selected by the conversation algorithm selection unit 118. The conversation execution unit 120 executes a conversation with the user 10 based on the utterance contents 46 and the utterance contents 48 stored in the conversation DB 40 and the Q & ADB 42.

For example, the conversation execution unit 120 applies a speech synthesis technique to the utterance contents included in the conditional conversation algorithm 34, the utterance contents 46 and the utterance contents 48 stored in the conversation DB 40 and the Q & ADB 42, thereby providing a voice output control unit. The voice is output to 122. In addition, when voice data is stored in the conversation algorithm DB 30, the conversation DB 40, and the Q & ADB 42, the conversation execution unit 120 may cause the voice output control unit 122 to output the voice data. The conversation execution unit 120 may recognize the voice of the user 10 by applying voice recognition technology to the voice of the user 10 acquired by the voice acquisition unit 124. As described above, the conversation execution unit 120 may execute the conversation with the user 10 by controlling the voice output control unit 122 and the voice acquisition unit 124.

The voice output control unit 122 outputs voice according to the control of the conversation execution unit 120. The voice output control unit 122 may output the voice data and voice synthesis data specified by the conversation execution unit 120 as voices using a speaker or the like.

The voice acquisition unit 124 acquires the voice of the user 10 using a microphone or the like. The voice acquisition unit 124 may transmit the acquired voice to the conversation execution unit 120. The voice acquisition unit 124 may transmit the acquired voice to the category selection unit 126.

The category selection unit 126 sets one category 44 among the plurality of categories 44 based on the voice acquired by the voice acquisition unit 124. For example, when the category selection unit 126 determines that the voice acquired by the voice acquisition unit 124 matches any of the plurality of utterance contents 46 stored in the conversation DB 40, the category selection unit 126 selects the category 44 including the utterance contents 46 as a selected state. To do. For example, the category selection unit 126 determines that they match when the degree of coincidence between the voice recognition result of the voice acquired by the voice acquisition unit 124 and the utterance content 46 is higher than a predetermined threshold.

In addition, for example, when the category selection unit 126 determines that the keyword included in the voice acquired by the voice acquisition unit 124 matches the name of the category 44, the category selection unit 126 sets the category 44 in a selected state. For example, the category selection unit 126 determines that a match is found when the degree of matching between the keyword included in the voice acquired by the voice acquisition unit 124 and the name of the category 44 is higher than a predetermined threshold.

The conversation execution unit 120 may execute a conversation with the user 10 using the plurality of utterance contents 46 and utterance contents 48 stored in the conversation DB 40 and the Q & ADB 42 in association with the category 44 selected by the category selection unit 126.

The voice acquisition unit 124 may transmit the acquired voice to the emotion recognition unit 128. The emotion recognition unit 128 recognizes the emotion of the user 10 based on the voice of the user 10 acquired by the voice acquisition unit 124.

The emotion recognition unit 128 may recognize the emotion of the user 10 using existing voice emotion recognition technology. The emotion recognition unit 128 may recognize the emotion of the user 10 based on the prosody, sound quality, and phoneme of the user 10 voice. The emotion recognition unit 128 may recognize the emotion of the user 10 based on the voice recognition result of the user 10 voice. The emotion recognition unit 128 may recognize the emotion of the user 10 by combining these. For example, emotions such as joy, anger, and sadness may be recognized by the emotion recognition unit 128. When the emotion recognition unit 128 cannot recognize any emotion from the voice of the user 10, the emotion recognition unit 128 may output result data indicating that fact.

The emotion recognition unit 128 may transmit the emotion recognition result to the conversation algorithm selection unit 118. The conversation algorithm selection unit 118 selects the emotion conversation algorithm 36 stored in the conversation algorithm DB 30 in association with the emotion recognized by the emotion recognition unit 128. For example, when the emotion recognition unit 128 recognizes pleasure, the conversation algorithm selection unit 118 selects the emotion conversation algorithm 36 stored in association with pleasure. The conversation algorithm selection unit 118 may transmit the selected emotion conversation algorithm 36 to the conversation execution unit 120.

The conversation execution unit 120 may execute a conversation with the user 10 according to the emotion conversation algorithm 36 selected by the conversation algorithm selection unit 118. For example, when the conversation algorithm selection unit 118 selects the emotion conversation algorithm 36 corresponding to pleasure, the conversation execution unit 120 executes a conversation with the user 10 according to the emotion conversation algorithm 36 corresponding to pleasure. Thereby, the conversation execution part 120 can perform the conversation suitable for the emotion of the user 10 recognized during the conversation with the user 10.

The emotion recognition unit 128 may further transmit the emotion recognition result to the priority changing unit 130. The priority changing unit 130 may change the priority associated with the conditional conversation algorithm 34 based on the emotion recognition result. For example, when the emotion recognition unit 128 recognizes a positive emotion while executing one conditional conversation algorithm 34, the priority changing unit 130 improves the priority of the one conditional conversation algorithm 34. Further, the priority changing unit 130 may lower the priority of the one conditional conversation algorithm 34 when the emotion recognition unit 128 recognizes a negative emotion while executing the one conditional conversation algorithm 34.

For example, the priority changing unit 130 improves the priority of the “program notification app” when the emotion recognition unit 128 recognizes a positive emotion while executing the “program notification app”. For example, when the emotion recognition unit 128 recognizes a negative emotion while executing the “umbrella alert app”, the priority change unit 130 decreases the priority of the “umbrella alert app”. Thereby, according to a user's emotion, the priority of the conditional conversation algorithm 34 can be changed appropriately.

Also, the priority changing unit 130 may change the priority 45 associated with the category 44 based on the emotion recognition result. For example, when the emotion recognition unit 128 recognizes a positive emotion in a state where one category 44 is selected, the priority changing unit 130 improves the priority of the one category 44. Moreover, the priority change part 130 reduces the priority of the said one category 44, when the emotion recognition part 128 recognizes a negative emotion in the state from which the one category 44 was selected. Thereby, the priority of the category 44 in which the user 10 has a positive emotion can be improved, and the priority of the category 44 in which the user 10 has a negative emotion can be reduced.

The utterance content registration unit 132 additionally registers the voice acquired by the voice acquisition unit 124 as the utterance content 46 corresponding to the one category 44 after the category selection unit 126 selects one category. For example, when “Entertainment: Musician: Silver Explosion” is selected and the user's 10 voice “It was formed in 2004” is acquired, the utterance content registration unit 132 displays “Entertainment: Musician: Silver”. As an utterance content 46 corresponding to “explosion”, “you were formed in 2004” is additionally registered. Thereby, conversation DB40 can be enriched.

The conversation data generation unit 134 generates conversation data indicating a conversation flow. The conversation data indicating the conversation flow includes continuous utterance contents and response contents. For example, the conversation data may be a conversation flow such as “Member B seems to be handsome”, “Hey, is that so”, and “Muscle is also amazing”.

The conversation data generation unit 134 first performs voice output control of the first utterance content 46 (eg, “Member B seems to be handsome”) from the plurality of utterance contents 46 stored in association with the selected category 44. The unit 122 outputs a sound. Next, the conversation execution unit 120 acquires the first response content (eg, “Hey, that's right”) by the user 10 with respect to the first utterance content 46. Then, the conversation data generation unit 134 generates conversation data including the first utterance content 46 and the first response content.

After generating the conversation data, the conversation execution unit 120 outputs the first response content as a voice when the voice matching the first utterance content 46 is acquired from the user 10. As a result, a human-like natural response can be made. Furthermore, the conversation execution unit 120 acquires the second response content of the user 10 with respect to the first response content output by voice (for example, “muscles are also amazing”). Then, the conversation data generation unit 134 generates conversation data by ordering the first utterance content 46, the first response content, and the second response content. Thereby, the conversation data which can continue receiving and answering like a human being can be generated.

In addition, although the example which produces | generates conversation data with respect to the one user 10 was given and demonstrated here, you may produce | generate conversation data with respect to the two users 10. FIG. Further, conversation data may be generated by the two information terminals 100. For example, the conversation data generation unit 134 generates conversation data including the first utterance content 46 and the first response content, and then transmits the conversation data to another information terminal 100. Then, in another information terminal 100, when the voice that matches the first utterance content 46 is acquired, the first response content is output as voice, the second response content is acquired, and the first utterance content 46, first The response contents and the second response contents are ordered to generate conversation data.

The algorithm sharing processing unit 136 copies the conditional conversation algorithm 34 stored in the conversation algorithm DB 30 to the conversation algorithm DB 30 included in another information terminal 100. For example, the algorithm sharing processing unit 136 adds the conditional conversation algorithm 34 to the conversation algorithm DB 30 of the information terminal 100 possessed by another user who has a profile whose similarity with the profile of the user 10 exceeds a predetermined standard. make a copy. For example, when the address, age, and gender match, the algorithm sharing processing unit 136 may determine that the profile similarity exceeds a standard.

This makes it possible to provide a useful conditional conversation algorithm 34 for users having similar profiles. The algorithm sharing processing unit 136 may copy, for example, a trial version of the conditional conversation algorithm 34 to the conversation algorithm DB 30 of the other information terminal 100. Thereby, the user 10 possessing the other information terminal 100 can try the conditional conversation algorithm 34 and can be used as a reference for purchasing the conditional conversation algorithm 34.

The server 200 may have the same functional configuration as that shown in FIG. In this case, the audio output control unit 122 may control the information terminal 100 so that the information terminal 100 outputs the audio. Further, the voice acquisition unit 124 may receive the voice of the user 10 from the information terminal 100.

Further, when the server 200 includes the conversation data generation unit 134, the server 200 first selects one information terminal 100 at random from the plurality of information terminals 100, and outputs the first utterance content 46 by voice. Next, the information terminal 100 is made to acquire the first response content, and conversation data including the first utterance content 46 and the first response content is generated. Next, the conversation of a plurality of information terminals 100 is monitored, and the information terminal 100 that has acquired the same voice as the first utterance content 46 is specified. Then, the identified information terminal 100 is caused to output the first response content by voice and acquire the second response content. As a result, the server 200 can generate conversation data that can continue to be accepted and answered like humans.

Further, when the server 200 includes the algorithm sharing processing unit 136, the server 200 first arbitrarily selects one information terminal 100. Then, the server 200 identifies the information terminal 100 possessed by another user having a profile whose similarity with the profile of the user 10 of the selected information terminal 100 exceeds a predetermined standard, and performs copy processing Execute.

FIG. 5 schematically shows an example of an operation flow by the information terminal 100. The operation flow illustrated in FIG. 5 may be started when an execution instruction for conversation processing according to the present embodiment is received.

In step 502 (step may be abbreviated as S), the execution condition determination unit 112 determines whether any of a plurality of execution conditions registered in the execution condition table 32 is satisfied. to decide. If it is determined in S502 that the condition is satisfied, the process proceeds to S504.

In S504, the conversation algorithm selection unit 118 selects the conditional conversation algorithm 34 stored in the conversation algorithm DB 30 in association with the execution condition determined to be satisfied by the execution condition determination unit 112. In S506, according to the selected conditional conversation algorithm 34, the conversation execution unit 120 acquires the utterance content. The conversation execution unit 120 may acquire the utterance content registered in association with the selected conditional conversation algorithm 34 from the conversation algorithm DB 30. The conversation algorithm DB 30 may hold an utterance content that is first voice-output when the conditional conversation algorithm 34 is selected. The conversation algorithm DB 30 may hold utterance contents indicating a response to the voice of the user 10. The response to the voice of the user 10 is registered in advance by, for example, the administrator of the conversation algorithm DB 30. The administrator of the conversation algorithm DB 30 may register the voice of the user 10 assumed in advance and the utterance content indicating the response in the conversation algorithm DB 30.

In S508, the conversation execution unit 120 causes the voice output control unit 122 to output the utterance content acquired in S506. In S510, the conversation execution unit 120 determines whether or not a conversation end instruction from the user 10 has been received. If an end instruction has not been received, the process proceeds to S512. In S512, the voice acquisition unit 124 acquires the voice uttered by the user 10 with respect to the voice output in S508.

In S514, the emotion recognition unit 128 recognizes the emotion of the user 10 based on the voice of the user 10 acquired by the voice acquisition unit 124. In S516, conversation execution unit 120 determines whether or not the emotion recognized in S514 matches a predetermined emotion. The predetermined emotion is, for example, an emotion associated with the emotion conversation algorithm 36. The predetermined emotion may be an emotion designated in advance among a plurality of emotions.

If it is determined in S516 that the emotion matches the predetermined emotion, the process proceeds to S518. If it is determined that the emotion does not match, the process returns to S508. In S518, the priority changing unit 130 changes the priority of the conditional conversation algorithm 34 selected in S504 based on the emotion recognized in 514.

In S520, the conversation execution unit 120 interrupts the conversation according to the conditional conversation algorithm 34, and the conversation with the user 10 is performed in accordance with the emotion conversation algorithm 36 stored in the conversation algorithm DB 30 in association with the emotion recognized in S514. Execute. In S522, when it is determined that the original conversation is continued as a result of the conversation execution unit 120 executing the emotion conversation algorithm 36 in S520, the process returns to S508. If it is not determined in S522 to continue the original conversation, the process ends.

In S524, the conversation execution unit 120 determines whether the utterance content can be acquired. The conversation execution unit 120 may determine whether or not the utterance content corresponding to the voice acquired in S512 can be acquired from the conversation algorithm DB 30. For example, the conversation algorithm selected in S504 is “blood group fortune-telling app”, and in S508, “What is your blood type?” Is output as a voice, whereas a pre-registered user voice, for example “ When the voice “A type” is acquired in S512, the conversation execution unit 120 determines that the utterance content has been acquired. On the other hand, when the user voice that is not registered in advance, for example, the voice of “What is your blood type?” Is acquired in S512, the conversation execution unit 120 cannot acquire the utterance content. judge. If it is determined in S524 that the utterance content has been acquired, the process proceeds to S508, and the conversation execution unit 120 causes the audio output control unit 122 to output the acquired utterance content as audio. If it is determined in S524 that the utterance content could not be acquired, the process proceeds to the flow of FIG.

FIG. 6 schematically shows an example of an operation flow by the information terminal 100. The operation flow illustrated in FIG. 6 may be started when it is determined in S524 of FIG. 5 that the utterance content has not been acquired. Further, the operation flow illustrated in FIG. 6 may be started at an arbitrary timing at which the voice from the user 10 is acquired, for example.

In S602, the category selection unit 126 selects one category 44 among the plurality of categories 44 based on the voice of the user 10. In S 604, the conversation execution unit 120 acquires one utterance content 46 from the plurality of utterance contents 46 associated with the selected category 44 stored in the conversation DB 40.

In S606, the conversation execution unit 120 causes the audio output control unit 122 to output the utterance content 46 acquired in S604. In step S 608, the conversation execution unit 120 determines whether a conversation end instruction from the user 10 has been received. If an end instruction has not been received, the process proceeds to S610.

In S610, the voice acquisition unit 124 acquires the user's voice with respect to the voice output in S606. In S612, the emotion recognition unit 128 recognizes the emotion of the user 10 based on the voice acquired in S610. In S614, conversation execution unit 120 determines whether or not the emotion recognized in S612 matches a predetermined emotion. The predetermined emotion is, for example, an emotion associated with the emotion conversation algorithm 36. The predetermined emotion may be an emotion designated in advance among a plurality of emotions.

If it is determined in S614 that the emotion matches the predetermined emotion, the process proceeds to S616. If it is determined that the emotion does not match, the process proceeds to S622. In S616, the priority changing unit 130 changes the priority of the category 44 selected in S602 based on the emotion recognized in S612.

In S618, the conversation execution unit 120 interrupts the conversation based on the plurality of utterance contents 46 associated with the selected category 44, and the emotion conversation stored in the conversation algorithm DB 30 in association with the emotion recognized in S612. A conversation with the user 10 is executed according to the algorithm 36. In S620, if it is determined that the original conversation is continued as a result of the conversation execution unit 120 executing the emotion conversation algorithm 36 in S618, the process proceeds to S622. If it is not determined in S620 to continue the original conversation, the process ends.

In S622, the conversation execution unit 120 determines whether the utterance content 46 can be acquired. The conversation execution unit 120 may determine whether or not the utterance content 46 can be acquired from the conversation DB 40 based on the selected category 44. For example, the conversation execution unit 120 determines that the utterance content 46 has been acquired when one utterance content 46 to be output by voice is obtained from the plurality of utterance contents 46 associated with the category 44 in the selected state. If it cannot be acquired, it is determined that the utterance content 46 cannot be acquired. For example, in one conversation process, the conversation execution unit 120 outputs all of the plurality of utterance contents 46 associated with the selected category 44, and when there is no unoutput utterance contents 46, the utterance contents 46 are displayed. It is determined that it could not be acquired.

If it is determined in S622 that the utterance content 46 has been acquired, the process returns to S606, and the conversation execution unit 120 causes the voice output control unit 122 to output the utterance content 46 acquired in S622. If it is determined in S622 that the utterance content 46 has not been acquired, the process proceeds to S624.

In S624, the execution condition determination unit 112 determines whether any of a plurality of execution conditions registered in the execution condition table 32 is satisfied. If it is determined that any one of the execution conditions is satisfied, the process proceeds to S504 in FIG. As described above, the information terminal 100 according to the present embodiment may advance the conversation with the user 10 by appropriately switching between the conversation process using the conversation algorithm DB 30 and the conversation process using the conversation DB 40 and the Q & ADB 42.

If it is determined in S624 that none of the plurality of execution conditions is satisfied, the process proceeds to S626. In S626, the conversation execution unit 120 executes error processing. For example, the conversation execution unit 120 notifies the user 10 that the conversation process is to be terminated, and then terminates the conversation process. Note that the conversation execution unit 120 may continue the conversation process by notifying the user 10 to urge other utterances and acquiring a new voice from the user 10. In this case, for example, after notifying the user 10 of another utterance in S626, the process may return to S602 or S610.

FIG. 7 schematically shows an example of the emotion conversation algorithm 36 corresponding to pleasure. In step S 702, the conversation execution unit 120 causes the audio output control unit 122 to output thanks for appreciation that the user 10 feels joy and to inquire whether or not to continue the conversation. For example, the audio output control unit 122 outputs “Thank you. Do you want to continue?”

In S704, the voice acquisition unit 124 acquires the voice uttered by the user 10 with respect to the voice output in S702. Then, the conversation execution unit 120 determines whether or not the user 10 wishes to continue the conversation based on the voice of the user 10 acquired by the voice acquisition unit 124. When the conversation execution unit 120 determines that it is desired, the process proceeds to S706, and when it is determined that it is not desired, the process proceeds to S710.

In S706, the conversation execution unit 120 causes the voice output control unit 122 to output a voice expressing joy that the user 10 has selected to continue the conversation. For example, the audio output control unit 122 outputs “I am happy”.

In S708, the conversation execution unit 120 determines to continue the original conversation. Then return. That is, if it is the operation | movement flow of FIG. 5, it will progress to S522, and if it is the operation | movement flow of FIG. 6, it will progress to S620.

In S710, the conversation execution unit 120 causes the audio output control unit 122 to output the content that the user 10 does not wish to continue the conversation and the content that indicates that the conversation is to be ended. For example, the voice output control unit 122 outputs a voice saying “Yes, today.

Thus, according to the emotion conversation algorithm 36 corresponding to pleasure, the conversation in which the user 10 feels pleasure can be continued. Further, when the user 10 is satisfied, the conversation can be appropriately terminated without continuing the conversation persistently.

FIG. 8 schematically shows an example of a conversation algorithm corresponding to anger. In step S 802, the conversation execution unit 120 causes the audio output control unit 122 to output a sound corresponding to the user 10 feeling angry. For example, the audio output control unit 122 outputs an audio message “Acha, have you done it again?”.

In S804, the voice acquisition unit 124 acquires the voice uttered by the user 10 with respect to the voice output in S802. Then, the conversation execution unit 120 determines whether or not the user 10 is angry based on the voice of the user 10 acquired by the voice acquisition unit 124. When the conversation execution unit 120 determines that it feels anger, the process proceeds to S806, and when it determines that it does not feel anger, the process proceeds to S810.

In S806, the conversation execution unit 120 causes the voice output control unit 122 to output a voice indicating that voice output is not performed for a while. For example, the voice output control unit 122 outputs a voice saying “Please keep quiet for a while”. In S808, the conversation execution unit 120 shifts to a standby state for waiting for conversation.

In S810, the conversation execution unit 120 causes the voice output control unit 122 to output a voice in response to an erroneous emotion recognition result. For example, the voice output control unit 122 outputs a voice “That?”. Thus, according to the emotion conversation algorithm 36 corresponding to anger, when the user 10 feels anger, a conversation can be stopped and it can transfer to a standby state.

FIG. 9 schematically shows an example of a conversation algorithm corresponding to sadness. In step S 902, the conversation execution unit 120 causes the audio output control unit 122 to output a sound for confirming whether the user 10 feels sadness. For example, the voice output control unit 122 outputs a voice saying “That's bad glue”.

In S904, the voice acquisition unit 124 acquires the voice uttered by the user 10 with respect to the voice output in S902. Then, the conversation execution unit 120 determines whether the user 10 feels sadness based on the voice of the user 10 acquired by the voice acquisition unit 124. When the conversation execution unit 120 determines that it is felt, the process proceeds to S906, and when it is determined that it is not felt, the process proceeds to S910.

In S906, the conversation execution unit 120 causes the audio output control unit 122 to output a voice that notifies the user 10 that the topic is to be switched. The audio output control unit 122 outputs, for example, “let's talk about another topic”.

In S908, the conversation execution unit 120 switches conversations. For example, when one category 44 is in a selected state, the conversation execution unit 120 puts another category 44 into a selected state. When a plurality of categories 44 have a hierarchical structure, the conversation execution unit 120 may select a category 44 adjacent to one category 44 that has been selected.

Further, the conversation execution unit 120 may select a category 44 having a higher priority than the one category 44 that has been selected. Further, the conversation execution unit 120 may select a category 44 selected at random from the plurality of categories 44.

In S910, the conversation execution unit 120 determines to continue the original conversation. Then return. That is, if it is the operation | movement flow of FIG. 5, it will progress to S522, and if it is the operation | movement flow of FIG. 6, it will progress to S620. Thus, according to the emotion conversation algorithm 36 corresponding to sadness, the conversation can be switched in order to make the user 10 have a more enjoyable emotion.

FIG. 10 schematically shows an example of the hierarchical structure of the category 44. A priority 54 is assigned to each of the plurality of category names 52. For example, when the conversation is switched while “entertainment: musician: group A” is selected, the category selection unit 126 may select the adjacent “entertainment: musician: group B”. In this case, the conversation execution unit 120 may cause the audio output control unit 122 to output a sound including the upper category name 52. For example, the voice output control unit 122 outputs a voice saying “Speaking of musicians, it is group B”. In this way, by switching to the adjacent category 44, it is possible to execute a conversation related to a category that is highly related to the currently selected category.

Further, the category selection unit 126 may select a category 44 other than the currently selected “entertainment: musician: group A” based on the priority. For example, the category selection unit 126 may select “TV: mail order program” having a higher priority than the currently selected category 44. Thereby, it is possible to switch to a conversation with higher priority for the user 10.

Further, the category selection unit 126 may select a category 44 other than the currently selected “entertainment: musician: group A” based on the profile of the user 10. For example, when “marathon” is registered as a hobby in the profile of the user 10, the category selection unit 126 may select “sports general: athletics: marathon”. Thereby, it can switch to the conversation suitable for the user's 10 profile.

In the above description, each unit of the information terminal 100 may be realized by hardware or may be realized by software. Further, it may be realized by a combination of hardware and software. For example, a computer may function as a part of the information terminal 100 by executing a program on the information terminal 100. The program may be stored in a computer-readable medium, or may be stored in a storage device connected to a network. In an information processing apparatus having a general configuration including a data processing device having a CPU, ROM, RAM, communication interface, etc., an input device, an output device, and a storage device, the operation of each part of the information terminal 100 is defined. The information terminal 100 may be realized by starting software or a program.

A program that is installed in a computer and causes the computer to function as a part of the information terminal 100 according to the present embodiment includes a module that defines the operation of each unit of the information terminal 100. These programs or modules work on the CPU or the like to cause the computer to function as each unit of the information terminal 100. Information processing described in these programs functions as a specific means in which software and the various hardware resources described above cooperate with each other by being read by a computer. A specific measurement device according to the purpose of use can be constructed by realizing calculation or processing of information according to the purpose of use of the computer in the present embodiment by these specific means.

The server 200 includes a data unit having a CPU, a ROM, a RAM, a communication interface, an input unit such as a keyboard, a touch panel, and a microphone, an output unit such as a display and a speaker, and a storage unit such as a memory and an HDD. The information processing apparatus having a general configuration provided may be realized by activating software or a program that defines the operation of each unit of the server 200. The server 200 may be a virtual server or a cloud system.

As mentioned above, although this invention was demonstrated using embodiment, the technical scope of this invention is not limited to the range as described in the said embodiment. It will be apparent to those skilled in the art that various modifications or improvements can be added to the above-described embodiment. It is apparent from the scope of the claims that the embodiments added with such changes or improvements can be included in the technical scope of the present invention.

The execution order of each process such as operations, procedures, steps, and stages in the apparatus, system, program, and method shown in the claims, the description, and the drawings is particularly “before” or “prior”. It should be noted that it can be realized in any order unless the output of the previous process is used in the subsequent process. Regarding the operation flow in the claims, the description, and the drawings, even if it is described using “first”, “next”, etc. for the sake of convenience, it means that it is essential to carry out in this order. is not.

10 users, 20 communication networks, 30 conversation algorithm DB, 32 execution condition table, 34 conditional conversation algorithm, 36 emotion conversation algorithm, 40 conversation DB, 42 Q & ADB, 44 category, 45 priority, 46 utterance content, 48 utterance content, 52 Category name, 54 priority, 100 information terminal, 112 execution condition determination unit, 116 condition data acquisition unit, 118 conversation algorithm selection unit, 120 conversation execution unit, 122 voice output control unit, 124 voice acquisition unit, 126 category selection unit, 128 emotion recognition unit, 130 priority change unit, 132 utterance content registration unit, 134 conversation data generation unit, 136 algorithm sharing processing unit, 200 server

Claims

An audio acquisition unit for acquiring the user's audio;
An emotion recognition unit that recognizes the user's emotion based on the voice acquired by the voice acquisition unit;
A first conversation algorithm storage unit that stores a first conversation algorithm in association with each of a plurality of emotion types;
A conversation algorithm selection unit that selects a first conversation algorithm stored in the first conversation algorithm storage unit in association with the emotion recognized by the emotion recognition unit;
A conversation processing system comprising: a conversation execution unit that executes a conversation with the user according to a first conversation algorithm selected by the conversation algorithm selection unit.
A second conversation algorithm storage unit that stores execution conditions in association with each of the plurality of second conversation algorithms;
An execution condition determination unit that determines that any one of the plurality of execution conditions stored in the second conversation algorithm storage unit is satisfied;
The conversation algorithm selection unit selects a second conversation algorithm stored in the second conversation algorithm storage unit in association with the execution condition determined to be satisfied by the execution condition determination unit;
The conversation processing system according to claim 1, wherein the conversation execution unit executes a conversation with a user according to a second conversation algorithm selected by the conversation algorithm selection unit.
The voice acquisition unit acquires the voice of the user while the conversation execution unit is executing a conversation with the user according to a second conversation algorithm,
The conversation execution unit interrupts the conversation according to the second conversation algorithm selected by the conversation algorithm selection unit, and starts the conversation according to the first conversation algorithm selected by the conversation algorithm selection unit. The conversation processing system described.
The second conversation algorithm storage unit further stores a priority in association with each of the plurality of second conversation algorithms;
The emotion recognizing unit is based on the voice acquired by the voice acquisition unit while the conversation execution unit is executing a conversation with the user according to the second conversation algorithm selected by the conversation algorithm selection unit. Recognize emotions,
The conversation processing system includes:
A first priority changing unit that changes the priority stored in association with the second conversation algorithm selected by the conversation algorithm selection unit based on the emotion of the user recognized by the emotion recognition unit. The conversation processing system according to claim 2 or 3.
An utterance content storage unit that stores a plurality of utterance contents in association with each of a plurality of categories
A category selection unit that selects one of the plurality of categories based on the voice acquired by the voice acquisition unit;
The conversation execution unit executes a conversation with the user according to a plurality of utterance contents stored in association with the one category selected by the category selection unit,
The voice acquisition unit acquires the voice of the user while the conversation execution unit is executing a conversation with the user based on a plurality of utterance contents stored in association with the one category,
The conversation execution unit interrupts a conversation with the user based on a plurality of utterance contents stored in association with the one category, and starts a conversation according to the first conversation algorithm selected by the conversation algorithm selection unit. The conversation processing system according to any one of claims 1 to 4.
The utterance content storage unit further stores a priority in association with each of the plurality of categories,
The emotion recognition unit is based on the voice acquired by the voice acquisition unit while the conversation execution unit is executing a conversation with the user based on a plurality of utterance contents stored in association with the one category. Recognizing the user's emotions,
The conversation processing system includes:
The conversation according to claim 5, further comprising a second priority changing unit that changes the priority stored in association with the one category based on the emotion of the user recognized by the emotion recognition unit. Processing system.
The category selection unit is based on the priority when switching the conversation with the user being executed by the conversation execution unit according to a plurality of utterance contents stored in association with the one category selected. To select a category other than the one category,
The conversation execution unit starts a conversation with the user based on a plurality of utterance contents stored in the utterance content storage unit in association with a category other than the one category selected by the category selection unit. The conversation processing system according to claim 6.
The plurality of categories have a hierarchical structure;
The category selection unit switches to the one category when switching conversations with the user being executed by the conversation execution unit according to a plurality of utterance contents stored in association with the one category selected. Select the adjacent category,
The conversation execution unit starts a conversation with the user based on a plurality of utterance contents stored in the utterance content storage unit in association with a category adjacent to the one category selected by the category selection unit. The conversation processing system according to claim 6.
After the category selection unit selects the one category, the voice acquired by the voice acquisition unit is associated with the one category in the selection state as the utterance content and is registered in the utterance content storage unit. The conversation processing system according to any one of claims 5 to 8, further comprising an utterance content registration unit.
A voice output control unit that outputs a first utterance content included in a plurality of utterance contents stored in association with the one category selected by the category selection unit;
A conversation data generating unit that generates conversation data including the first utterance content and the first response content of the user 10 with respect to the first utterance content;
The conversation execution unit acquires the voice that matches the first utterance content after the conversation data generation unit generates the conversation data including the first utterance content and the first response content. If so, the first response content is output as voice,
The voice acquisition unit acquires a second response content with respect to the first response content output by the conversation execution unit as a voice,
The conversation according to any one of claims 5 to 9, wherein the conversation data generation unit generates conversation data associated in the order of the first utterance content, the first response content, and the second response content. Processing system.
The similarity with the profile of the first user that exceeds the second conversation algorithm stored in the second conversation algorithm storage unit of the first information terminal possessed by the first user exceeds a predetermined standard. The conversation processing system according to claim 2, further comprising an algorithm sharing processing unit that copies to the second conversation algorithm storage unit of a second information terminal possessed by a second user having a profile.
A program for causing a computer to function as the conversation processing system according to any one of claims 1 to 11.