JP4015424B2 - Voice robot system - Google Patents

Voice robot system

Info

Publication number
JP4015424B2
Authority
JP
Japan
Prior art keywords
unit
emotion
voice
user
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2002002499A
Other languages
Japanese (ja)
Other versions
JP2003202892A (en)
Inventor
淳 富士本
和生 岡田
Original Assignee
アルゼ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by アルゼ株式会社
Priority to JP2002002499A
Publication of JP2003202892A
Application granted
Publication of JP4015424B2
Application status: Active
Anticipated expiration

Description

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a voice robot system.
[0002]
[Prior art]
Conventionally, there are voice robots that react to voices uttered by users. Such a voice robot can function as a kind of advertising doll if it is installed in a place where many people come and go, such as a pachinko game hall. Also, if a person living alone installs a voice robot at home and speaks to it, the robot moves in response to the utterance, so the person can enjoy watching it perform a predetermined action and ease the loneliness of living alone.
[0003]
[Problems to be solved by the invention]
However, such a voice robot performs only a single action (for example, moving only its neck) in response to the voice uttered by the user, and the user finds the robot somewhat unsatisfying. There are also voice robots that perform a series of operations in response to the voice uttered by the user. However, such a robot merely executes a plurality of predetermined motion patterns in sequence (for example, moving the neck, then the arm, then the waist) while the user is speaking, and this too does not satisfy the user.
[0004]
Therefore, the present invention has been made in view of the above points, and provides a voice robot system capable of inferring the user's emotion level based on the content spoken by the user, determining a complex motion pattern for moving the robot in accordance with the inferred emotion level, and moving the robot using the determined motion pattern.
[0005]
[Means for Solving the Problems]
The invention according to the present application has been made to solve the above-described problem, and is a voice robot system in which a movable part of a robot moves in response to voice uttered by a user, comprising: acquisition means for acquiring the user's voice; voice recognition means for identifying a character string corresponding to the voice based on the voice acquired by the acquisition means; storage means for storing a search table that lists predetermined keyword groups; an inference engine that infers the degree of emotion the user has based on the character string specified by the voice recognition means, creates question content to be put to the user, accumulates the question content in the storage means, and uses a fuzzy function when the semantic content of a decoded word is judged to be ambiguous; voice output means for outputting the question content created by the inference engine as voice; and movable means for moving the movable part of the robot in accordance with the emotion level inferred by the inference engine. The inference engine comprises: keyword extraction means for collating the character string recognized by the voice recognition means with the search table and extracting a predetermined keyword from the character string; emotion recognition means for recognizing the emotion level for the keyword based on the character string including the keyword extracted by the keyword extraction means; an AI inference unit that determines question content related to the predetermined keyword and the emotion level based on the predetermined keyword extracted by the keyword extraction means and the emotion level recognized by the emotion recognition means, and accumulates the question content in the storage means; and an operation determination unit that determines a movable pattern for moving the movable part of the robot according to the emotion level recognized by the emotion recognition means.
[0006]
According to the invention of the present application, the robot recognizes, based on a keyword included in the character string uttered by the user, the emotion level indicating the degree of emotion the user has with respect to that keyword, and moves the movable part according to the recognized emotion level. The movable part can therefore be moved not by a single motion pattern but by a complex motion pattern that corresponds to the emotion level.
[0007]
Further, since the robot operates with a complicated pattern, it becomes difficult for the user to predict the next operation of the robot, and the user can enjoy watching the robot without getting tired.
[0008]
The emotion level may be determined based on an emphasized word included in the character string. As a result, the robot can determine the user's emotion level from the emphasized word included in the character string, and can therefore accurately grasp the emotion level the user has for the predetermined keyword and perform an action that matches that emotion level (for example, if the user is in an excited state, the robot moves vigorously).
[0009]
Furthermore, the invention according to the present application accumulates a plurality of conversation tables that associate each keyword with a conversation phrase, collates the extracted keyword with the conversation tables, detects the conversation phrase that matches the keyword, and outputs a voice corresponding to the detected conversation phrase.
[0010]
According to the invention of the present application, since the robot outputs conversation content related to a keyword included in the character string uttered by the user, the user can enjoy the feeling of having a conversation with the robot.
[0011]
DETAILED DESCRIPTION OF THE INVENTION
[Basic configuration of voice robot system]
A voice robot system according to the present invention will be described with reference to the drawings. FIG. 1 is a schematic configuration diagram of the voice robot system according to the present embodiment. As shown in the figure, the voice robot system includes a robot 1 having a determination unit 100 and a movable unit 200. The robot 1 may have the outer shape of a character used in advertisements, video games, and the like.
[0012]
The determination unit 100 infers the user's emotion level from the content uttered by the user and, based on the inferred emotion level, generates commands for moving each unit in the movable unit 200 (in this embodiment, the arm movable unit 202, the knee movable unit 203, the neck movable unit 204, and the waist movable unit 205; hereinafter these are simply referred to as "units"). In this embodiment, the determination unit 100 includes an input unit 101, a voice recognition unit 102, a voice recognition dictionary storage unit 103, an inference engine 104, an emotion information database 105, and an output unit 106.
[0013]
The input unit 101 is an acquisition unit that acquires a user's voice. Specifically, the input unit 101 acquires the user's voice and outputs the acquired voice to the voice recognition unit 102 as a voice signal. The voice recognition unit 102 is a voice recognition unit that identifies a character string corresponding to the voice based on the voice acquired by the input unit 101.
[0014]
Specifically, the voice recognition unit 102, to which the voice signal is input, analyzes the input voice signal, identifies the character string corresponding to the analyzed signal by collating it with the dictionary stored in the voice recognition dictionary storage unit 103, and outputs the identified character string to the inference engine 104 as a character string signal.
[0015]
The voice recognition dictionary storage unit 103 stores a dictionary corresponding to standard voice signals. The output unit 106 outputs voice or the like based on a command from the inference engine 104.
[0016]
The voice recognition unit 102 is also a character recognition unit that specifies a character string input by a user through an operation unit (for example, a keyboard). Further, the voice recognition dictionary storage unit 103 stores a dictionary corresponding to a character string input by the user through the operation unit.
[0017]
As a result, the inference engine 104 can infer the user's emotion level not only from speech acquired through the input unit 101 but also from characters input through the operation unit, and can generate commands for moving each part of the movable unit 200 based on the inferred emotion level.
[0018]
The emotion information database 105 is a storage unit that stores a search table listing predetermined keyword groups. Examples of such keywords include keywords related to sports (such as soccer, basketball, table tennis, tennis, and badminton), keywords related to reading (such as mystery novels and non-fiction), and keywords related to current events (such as politics and economics).
[0019]
The inference engine 104 asks the user a predetermined question through the output unit 106, infers the emotion level the user has based on the character string specified by the voice recognition unit 102, and generates a command for moving the movable unit 200 based on the inferred emotion level.
[0020]
In this embodiment, the inference engine 104 includes a context dictionary for deciphering context, a similarity dictionary for examining language similarity, a dictionary for word and phrase analysis, and a dictionary for morphological analysis of words (part of speech, inflection, classification). Based on these dictionaries, the inference engine 104 can decipher the semantic content of the words spoken by the user and infer the user's emotion level from the deciphered semantic content.
[0021]
In other words, having deciphered the semantic content of the words, the inference engine 104 can infer the emotions held by the user and create a sentence suitable for asking the user a question, based on the deciphered semantic content, its coherence, changes of topic, the language that forms the user's emotions, and statistics of the conversation so far.
[0022]
The inference engine 104 also incorporates artificial intelligence and a neural network. It learns the language (words, sentences, and so on) exchanged with the user through the neural network, and can create content for asking the user questions based on the learned language.
[0023]
Further, when the inference engine 104 determines that the semantic content of a decoded word is an ambiguous expression, it can create question content corresponding to the ambiguous expression using a fuzzy function. The above-described functions of the inference engine 104 are executed mainly by an AI inference unit 104e and an operation determination unit 104f described later.
[0024]
Specifically, the inference engine 104, to which the character string signal is input from the voice recognition unit 102, sorts out the user's emotion and the keywords included in the character string based on the elements constituting the character string corresponding to the input signal. In this embodiment, the "user's emotion" and the "keywords included in the character string" together constitute the emotion information.
[0025]
Here, the "user's emotion (type of emotion)" includes, for example, "like / dislike" and "good / bad", as shown in FIG. 3. The "user's emotion" also includes the degree of emotion (emotion level) the user has; as shown in FIG. 5, for example, very interested / interested / not interested / not at all interested.
[0026]
In the present embodiment, this "degree of emotion" is expressed as P1 (P: positive, plus element) [for example, very interested] when the user has a strong positive emotion, P2 [for example, interested] when the user simply has a positive emotion, N1 (N: negative, minus element) [for example, not interested] when the user simply has a negative emotion, and N2 [for example, not at all interested] when the user has a strong negative emotion. The "degree of emotion" is not limited to the above example.
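The four-level scale above can be pictured, purely for illustration, as a small data type. The following Python sketch is not part of the patent; the name EmotionLevel is hypothetical and its members simply mirror the P1/P2/N1/N2 labels described in this paragraph.

```python
from enum import Enum

class EmotionLevel(Enum):
    """Four-level emotion scale (degree of emotion) used in the embodiment."""
    P1 = "strong positive"   # e.g. very interested
    P2 = "positive"          # e.g. interested
    N1 = "negative"          # e.g. not interested
    N2 = "strong negative"   # e.g. not at all interested

# Example: the "very boring" response discussed later corresponds to N2.
print(EmotionLevel.N2, EmotionLevel.N2.value)
```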
[0027]
The inference engine 104, which classifies information related to the user's emotion from the elements constituting the character string, infers the emotion level held by the user based on the user's emotion information.
[0028]
For example, when the inference engine 104 asks the user through the output unit 106, "Is it fun to play soccer?", and the user responds, "Playing soccer is very boring," the inference engine 104 determines from "soccer" (keyword) and "very boring" (emotion level N2) that the user has a negative emotion level toward soccer. That is, the inference engine 104 determines the emotion level the user has for the keyword.
[0029]
The inference engine 104 generates various commands for moving the movable unit 200 based on the determined emotion level. The determination of the emotion level and the generation of the commands are described in detail under the AI inference unit 104e and the operation determination unit 104f below. In this way, the inference engine 104 infers the user's emotion level from the character string specified from the voice acquired through the input unit 101 and can generate various commands for moving the movable unit 200 according to the inferred emotion level.
[0030]
In the present embodiment, the inference engine 104 includes a phrase recognition unit 104a, a classification unit 104b, an emphasized word detection unit 104c, an emotion determination unit 104d, an AI inference unit 104e, and an operation determination unit 104f, as shown in FIG. 2.
[0031]
The phrase recognition unit 104a analyzes the sentence and recognizes the meaning space of the words grasped from the sentence based on the analyzed sentence. Here, sentence analysis means analyzing sentence form elements such as parts of speech, inflection forms, classifications, and connection relations. The meaning space of words is grasped from context, sentence similarity, and sentence learning patterns.
[0032]
Furthermore, the phrase recognition unit 104a recognizes the breaks between sentences by means of the above recognition. Specifically, having recognized the semantic space of the words grasped from the sentence, the phrase recognition unit 104a, to which the character string signal is input from the voice recognition unit 102, recognizes the breaks between the sentences corresponding to the character string signal based on the input signal.
[0033]
In the present embodiment, since there is a certain time interval between sentences, the boundary between sentences is determined based on that time interval. For example, when the text corresponding to the character string signal is "It is hot today ... Let's eat ice cream," the phrase recognition unit 104a recognizes the time interval in the utterance as a sentence break and divides the text into the sentences "It is hot today" and "Let's eat ice cream."
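As an illustration only, the sentence-splitting step could look like the following Python sketch. The patent detects breaks from time intervals in the audio; here a hypothetical recognizer front end is assumed to supply the pause length after each recognized word, and split_sentences and pause_threshold are invented names.

```python
from typing import List, Tuple

# Each recognized word is paired with the silence (in seconds) that followed it.
# In the embodiment the break is a time interval in the audio; here that interval
# is assumed to be supplied by a hypothetical recognizer front end.
RecognizedWord = Tuple[str, float]

def split_sentences(words: List[RecognizedWord], pause_threshold: float = 0.7) -> List[str]:
    """Group recognized words into sentences at pauses longer than the threshold."""
    sentences, current = [], []
    for text, pause_after in words:
        current.append(text)
        if pause_after >= pause_threshold:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

words = [("It", 0.05), ("is", 0.05), ("hot", 0.05), ("today", 1.2),
         ("Let's", 0.05), ("eat", 0.05), ("ice", 0.05), ("cream", 0.0)]
print(split_sentences(words))  # ['It is hot today', "Let's eat ice cream"]
```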
[0034]
Having recognized the breaks between sentences, the phrase recognition unit 104a divides the text into individual sentences and outputs each sentence as a style signal to the classification unit 104b, the emphasized word detection unit 104c, and the emotion determination unit 104d.
[0035]
The classification unit 104b determines the type of the user's emotion from the character string. Specifically, the classification unit 104b, to which the style signal is input from the phrase recognition unit 104a, classifies the type of emotion included in the character string based on the "emotion classification table" shown in FIG. 3.
[0036]
As shown in FIG. 3, this emotion type is designated "plus element P" when the user has a positive feeling and "minus element N" when the user has a negative feeling. The classification unit 104b classifies what kind of emotion is included in each sentence based on the "emotion classification table" and outputs the classification result to the emotion determination unit 104d as a classification signal.
[0037]
The emphasized word detection unit 104c extracts elements characterizing the strength of emotion from the character string. Specifically, the emphasized word detection unit 104c, to which the style signal is input from the phrase recognition unit 104a, detects whether an emphasized word is present among the elements constituting the sentence corresponding to the input style signal.
[0038]
In this embodiment, emphasized words can be detected according to, for example, the "emphasized word table" shown in FIG. 4. As shown in the figure, this table includes adverbs and interjections such as "suge," "cho," "uhyo," "wow," "hie," "super," "very," and "quite." The emphasized word detection unit 104c detects an emphasized word in each sentence based on the "emphasized word table" and outputs the detected emphasized word to the emotion determination unit 104d as an emphasized word detection signal.
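For illustration, both the classification unit 104b and the emphasized word detection unit 104c can be pictured as simple table look-ups. The word lists below are placeholders standing in for the emotion classification table (FIG. 3) and the emphasized word table (FIG. 4), and the function names are hypothetical; this is a sketch, not the patent's implementation.

```python
from typing import Optional

# Placeholder word lists standing in for the emotion classification table (FIG. 3)
# and the emphasized word table (FIG. 4); the real tables are not reproduced here.
PLUS_WORDS = {"fun", "great", "interesting", "like"}
MINUS_WORDS = {"boring", "bad", "dislike", "tired"}
EMPHASIS_WORDS = {"very", "super", "quite", "really", "wow"}

def classify_emotion(sentence: str) -> Optional[str]:
    """Classification unit 104b: return 'P' or 'N' when a listed word appears."""
    tokens = sentence.lower().split()
    if any(t in PLUS_WORDS for t in tokens):
        return "P"
    if any(t in MINUS_WORDS for t in tokens):
        return "N"
    return None

def detect_emphasis(sentence: str) -> Optional[str]:
    """Emphasized word detection unit 104c: return the first emphasis word found."""
    for token in sentence.lower().split():
        if token in EMPHASIS_WORDS:
            return token
    return None

sentence = "Playing soccer is very boring"
print(classify_emotion(sentence), detect_emphasis(sentence))  # N very
```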
[0039]
The emotion determination unit 104d is a keyword extraction means that collates the character string recognized by the voice recognition unit 102 with the search table stored in the emotion information database 105 and extracts a predetermined keyword from the character string. The emotion determination unit 104d is also an emotion recognition means that recognizes the emotion level indicating the degree of emotion the user has for the keyword, based on the character string including the extracted keyword.
[0040]
Specifically, the emotion determination unit 104d, to which the style signal is input from the phrase recognition unit 104a, collates the character string corresponding to the input style signal with the search table and extracts from the character string keywords that match keywords included in the search table.
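A minimal sketch of this collation step follows. The grouping of the search table mirrors the examples given in paragraph [0018], but the concrete table and the function extract_keywords are illustrative assumptions, not the patent's implementation.

```python
from typing import List

# Search table of predetermined keyword groups, following the examples in
# paragraph [0018]; the concrete grouping is an illustrative assumption.
SEARCH_TABLE = {
    "sports": ["soccer", "basketball", "table tennis", "tennis", "badminton"],
    "reading": ["mystery novel", "non-fiction"],
    "current events": ["politics", "economics"],
}

def extract_keywords(text: str) -> List[str]:
    """Return every keyword in the search table that appears in the text."""
    lowered = text.lower()
    return [kw for group in SEARCH_TABLE.values() for kw in group if kw in lowered]

print(extract_keywords("Playing soccer is very boring"))  # ['soccer']
```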
[0041]
In addition, the emotion determination unit 104d, to which the classification signal and the emphasized word detection signal are input from the classification unit 104b and the emphasized word detection unit 104c, determines the degree of emotion (emotion level) the user has for the extracted keyword based on the input classification signal and emphasized word detection signal.
[0042]
In this embodiment, the degree of emotion can be determined according to, for example, the "emotion level table" shown in FIG. 5. As shown in the figure, this "emotion level table" contains determination elements (the user's emotions) and the corresponding "degree of emotion (emotion level)".
[0043]
The determination element means a phrase that influences the user's emotion (for example, "interested / not interested" as shown in FIG. 5). This determination element has the same meaning as the "user's emotion" described above.
[0044]
As shown in FIG. 5, when the determination element is "interested / not interested," the "degree of emotion" is classified as, for example, very interested (P1), interested (P2), not interested (N1), and not at all interested (N2). The emotion level is not limited to these four classifications.
[0045]
The emotion determination unit 104d refers to the "emotion level table" based on the emotion category corresponding to the classification signal, the emphasized word corresponding to the emphasized word detection signal, and the predetermined keyword, determines the degree of emotion of the user grasped from the sentence, and outputs the determination result (P1, P2, N1, or N2) to the AI inference unit 104e as an emotion determination signal.
[0046]
For example, if the character string corresponding to the style signal is "Playing soccer is very boring," the classification unit 104b detects the word "boring," the emphasized word detection unit 104c detects the word "very," and the emotion determination unit 104d detects the keyword "soccer."
[0047]
The emotion determination unit 104d refers to the table of FIG. 5 based on the "boring" detected by the classification unit 104b and the "very" detected by the emphasized word detection unit 104c, and determines that the degree of emotion (emotion level) the user has for "soccer" (keyword) is N2. Having determined the emotion level, the emotion determination unit 104d outputs it to the AI inference unit 104e as an emotion level signal.
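Purely as an illustration, the combination of the classification signal, the emphasized word, and the keyword into one of P1, P2, N1, or N2 could be sketched as follows. The names determine_emotion_level and emotion_for_keyword are hypothetical, and the mapping simply follows the "very boring" → N2 example above.

```python
from typing import Optional, Tuple

def determine_emotion_level(category: Optional[str],
                            emphasized: bool) -> Optional[str]:
    """Combine the classification signal ('P'/'N') and the presence of an
    emphasized word into one of P1, P2, N1, N2, as in the emotion level table."""
    if category == "P":
        return "P1" if emphasized else "P2"
    if category == "N":
        return "N2" if emphasized else "N1"
    return None

def emotion_for_keyword(keyword: str, category: Optional[str],
                        emphasized: bool) -> Tuple[str, Optional[str]]:
    """Pair the extracted keyword with the determined emotion level."""
    return keyword, determine_emotion_level(category, emphasized)

# "Playing soccer is very boring": category N, emphasis present, keyword "soccer"
print(emotion_for_keyword("soccer", "N", True))  # ('soccer', 'N2')
```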
[0048]
The AI inference unit 104e asks the user various questions through the output unit 106. Specifically, the AI inference unit 104e, to which the emotion level signal is input from the emotion determination unit 104d, asks a question related to the keyword and its emotion level, based on the keyword corresponding to the input emotion level signal and the emotion level the user has for that keyword.
[0049]
For example, if the keyword corresponding to the input emotion level signal and the user's emotion level for that keyword are, following the above example, soccer (keyword) and emotion level N2 (very boring), the AI inference unit 104e infers (judges) that the user is not interested in soccer and asks a question about something other than soccer.
[0050]
The AI inference unit 104e, to which the emotion level signal is input from the emotion determination unit 104d, asks a question related to the input emotion level signal and outputs the input emotion level signal to the action determination unit 104f.
[0051]
Note that the question content produced by the AI inference unit 104e may be stored in advance in the emotion information database 105 as a plurality of conversation tables that associate each keyword with a conversation phrase. Further, the AI inference unit 104e may include a phrase detection unit that collates the extracted keyword with the conversation tables and detects the conversation phrase associated with the keyword, and the output unit 106 may output a voice corresponding to the conversation phrase based on the conversation phrase detected by the AI inference unit 104e.
[0052]
As a result, when the emotion level signal is input from the emotion determination unit 104d, the AI inference unit 104e collates the keyword included in the input signal with the conversation table, detects the conversation phrase associated with that keyword, and the output unit 106 outputs a predetermined voice (question content) based on the detected conversation phrase. In this way, the AI inference unit 104e can ask the user a question about a certain matter through the output unit 106.
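A minimal sketch of the conversation-table look-up follows. The rule of switching to another topic for a negative emotion level follows the description above, but the table contents, the phrases, and the function choose_question are illustrative assumptions rather than the patent's actual data.

```python
import random
from typing import Dict

# Hypothetical conversation table associating each keyword with a question phrase.
CONVERSATION_TABLE: Dict[str, str] = {
    "soccer": "Which soccer team do you like to watch?",
    "tennis": "Do you play tennis on weekends?",
    "politics": "Did you follow today's news?",
}

def choose_question(keyword: str, emotion_level: str) -> str:
    """Pick a question phrase for the keyword; for a negative emotion level
    (N1/N2), switch to a different topic as described in the embodiment."""
    if emotion_level in ("N1", "N2"):
        other_topics = [k for k in CONVERSATION_TABLE if k != keyword]
        keyword = random.choice(other_topics)
    return CONVERSATION_TABLE.get(keyword, "What have you been up to lately?")

print(choose_question("soccer", "N2"))  # a question about a topic other than soccer
```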
[0053]
Further, since the AI inference unit 104e can ask the user predetermined questions through the output unit 106 based on the content uttered by the user, the user can feel as if he or she were talking to a human.
[0054]
Note that the AI inference unit 104e can determine the content of the question to be asked according to the user's emotion level. For example, when the user's emotion level is high, the question content can be determined as content that calms the user down; when the user's emotion level is low, it can be determined as content that supports or cheers up the user. These question contents can be accumulated in the emotion information database 105.
[0055]
As a result, when the user's emotion level is low (for example, when the user feels depressed), the robot 1 can utter content that encourages the user, and the user is encouraged by what the robot says and can spend daily life a little more pleasantly.
[0056]
The action determination unit 104f determines a movable pattern for moving the movable part of the robot 1 according to the emotion level determined by the emotion determination unit 104d. Specifically, the action determination unit 104f, to which the emotion level signal is input from the AI inference unit 104e, generates various commands for moving each unit in the movable unit 200 according to the emotion level included in the input emotion level signal.
[0057]
For example, when the emotion level included in the input emotion level signal is "P1", the motion determination unit 104f outputs to the movable unit 200 a movable command signal for moving the arm movable unit 202, the knee movable unit 203, the neck movable unit 204, and the waist movable unit 205, using the table shown in FIG. 6.
[0058]
Further, when the emotion level included in the input emotion level signal is "P2", the motion determination unit 104f outputs to the movable unit 200, for example, a movable command signal for moving the arm movable unit 202, the knee movable unit 203, and the waist movable unit 205, according to the table shown in FIG. 6. When the emotion level included in the input emotion level signal is "N1", the motion determination unit 104f outputs to the movable unit 200, for example, a movable command signal for moving the knee movable unit 203 and the neck movable unit 204, using the table shown in FIG. 6.
[0059]
Furthermore, when the emotion level included in the input emotion level signal is "N2", the motion determination unit 104f outputs to the movable unit 200, for example, a movable command signal for moving only the neck movable unit 204, using the table shown in FIG. 6. The movable patterns determined by the operation determination unit 104f are not limited to the above four patterns and can be realized by various combinations.
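For illustration, the movable patterns described for FIG. 6 can be pictured as a table from emotion level to the units to be driven. The sketch below assumes only the four example patterns given in the text; the names MOTION_TABLE and movable_command are hypothetical.

```python
from typing import Dict, Tuple

# Movable-pattern table following the example patterns given for FIG. 6:
# which movable units are driven for each emotion level.
MOTION_TABLE: Dict[str, Tuple[str, ...]] = {
    "P1": ("arm", "knee", "neck", "waist"),
    "P2": ("arm", "knee", "waist"),
    "N1": ("knee", "neck"),
    "N2": ("neck",),
}

def movable_command(emotion_level: str) -> Tuple[str, ...]:
    """Return the movable units to drive for the given emotion level."""
    return MOTION_TABLE.get(emotion_level, ())

for level in ("P1", "P2", "N1", "N2"):
    print(level, "->", movable_command(level))
```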
[0060]
Thus, since the action determination unit 104f generates commands for moving each unit in the movable unit 200 according to the emotion level included in the emotion level signal, when the emotion level is high (for example, an excited state), it can generate a command that moves all of the units in the movable unit 200 in order to express the high degree of the user's emotion.
[0061]
In addition, the motion determination unit 104f not only generates a command for performing a fixed motion in response to the voice uttered by the user, but also generates various commands that operate the movable parts according to the emotion level held by the user. The robot 1 can therefore perform various actions according to the user's emotions, and the user can enjoy watching the robot 1's behavior change every time his or her own emotions change.
[0062]
The movable unit 200 moves each part of the robot 1 according to commands from the determination unit 100. In the present embodiment, the movable unit 200 includes a motion control unit 201, an arm movable unit 202, a knee movable unit 203, a neck movable unit 204, and a waist movable unit 205.
[0063]
The arm movable unit 202 moves the arm of the robot 1, the knee movable unit 203 moves the knees, the neck movable unit 204 moves the neck, and the waist movable unit 205 moves the waist. The motion control unit 201 moves each unit based on the movable command signal from the operation determination unit 104f.
[0064]
In the present invention, the determination unit 100 may be arranged in a housing separate from the robot 1, and a communication unit may be provided between the determination unit 100 in that housing and the movable unit 200 in the robot 1. As a result, the robot 1 does not need to contain the determination unit 100; the weight of the robot 1 is reduced accordingly, and the robot 1 can move each part smoothly while maintaining a stable center of gravity.
[0065]
In the present invention, the bottom portion (such as the feet) of the robot 1 may be fixed. As a result, even if each part of the robot 1 moves vigorously, the robot 1 can be prevented from falling over.
[0066]
[Voice Robot Operation Method Using Voice Robot System]
The voice robot operation method by the voice robot system having the above-described configuration can be implemented by the following procedure. FIG. 7 is a flowchart showing the procedure of the voice robot operation method according to the present embodiment. First, the input unit 101 performs a step of acquiring voice uttered by the user (S101). Specifically, the input unit 101 acquires voice uttered by the user, and outputs the acquired voice to the voice recognition unit 102 as a voice signal.
[0067]
Next, the voice recognition unit 102 performs the step of specifying a character string corresponding to the voice based on the voice acquired by the input unit 101 (S102). Specifically, the voice recognition unit 102, to which the voice signal is input, analyzes the input voice signal, identifies the character string corresponding to the analyzed signal by collating it with the dictionary stored in the voice recognition dictionary storage unit 103, and outputs the identified character string to the inference engine 104 as a character string signal.
[0068]
Next, the inference engine 104 infers the emotion level the user has for the keyword based on the keyword included in the character string specified by the voice recognition unit 102, and performs the step of moving the movable unit 200 of the robot according to the inferred emotion level (S103). The processing performed here is described with reference to FIG. 8.
[0069]
In the inference engine 104, first, as shown in FIG. 8, the phrase recognition unit 104a analyzes the sentence and performs the step of recognizing the semantic space of the words grasped from the sentence based on the analysis (S200). Next, the phrase recognition unit 104a performs the step of recognizing the breaks between sentences by means of the above recognition (S201). Specifically, having grasped the semantic space of the words, the phrase recognition unit 104a recognizes the breaks between the sentences corresponding to the character string signal based on the input signal.
[0070]
In the present embodiment, since there is a certain time interval between sentences, the boundary between sentences is determined based on that time interval. For example, when the text corresponding to the character string signal is "It is hot today ... Let's eat ice cream," the phrase recognition unit 104a recognizes the time interval in the utterance as a sentence break and divides the text into the sentences "It is hot today" and "Let's eat ice cream."
[0071]
Then, having recognized the breaks between sentences, the phrase recognition unit 104a divides the text into individual sentences and outputs each sentence as a style signal to the classification unit 104b, the emphasized word detection unit 104c, and the emotion determination unit 104d.
[0072]
Next, the classification unit 104b performs the step of discriminating the type of the user's emotion from the character string (S202). Specifically, the classification unit 104b, to which the style signal is input from the phrase recognition unit 104a, classifies the type of emotion held by the user based on the "emotion classification table" shown in FIG. 3.
[0073]
The classification unit 104b classifies what emotions are included in one sentence based on the above “emotion classification table”, and outputs the classified result to the emotion determination unit 104d as a classification signal. .
[0074]
Next, the emphasized word detection unit 104c performs the step of extracting elements characterizing the strength of emotion from the character string (S203). Specifically, the emphasized word detection unit 104c, to which the style signal is input from the phrase recognition unit 104a, detects whether an emphasized word is present among the elements constituting the sentence corresponding to the input style signal.
[0075]
In this embodiment, emphasized words can be detected according to, for example, the "emphasized word table" shown in FIG. 4. As shown in the figure, this table includes adverbs and interjections such as "suge," "cho," "uhyo," "wow," "hie," "super," "very," and "quite." The emphasized word detection unit 104c detects an emphasized word in each sentence based on the "emphasized word table" and outputs the detected emphasized word to the emotion determination unit 104d as an emphasized word detection signal.
[0076]
Next, the emotion determination unit 104d performs the step of determining the degree of emotion the user has for the keyword included in the character string (S204). Specifically, the emotion determination unit 104d, to which the style signal is input from the phrase recognition unit 104a, collates the character string corresponding to the input style signal with the search table and extracts from the character string keywords that match keywords included in the search table.
[0077]
Thereafter, the emotion determination unit 104d, to which the classification signal and the emphasized word detection signal are input from the classification unit 104b and the emphasized word detection unit 104c, determines the degree of emotion (emotion level) the user has for the extracted keyword based on the input classification signal and emphasized word detection signal.
[0078]
The emotion determination unit 104d refers to the "emotion level table" based on the emotion category corresponding to the classification signal, the emphasized word corresponding to the emphasized word detection signal, and the predetermined keyword, determines the degree of emotion of the user grasped from the sentence, and outputs the determination result (P1, P2, N1, or N2) to the AI inference unit 104e as an emotion determination signal.
[0079]
For example, if the character string corresponding to the style signal is "Playing soccer is very boring," the classification unit 104b detects the word "boring," the emphasized word detection unit 104c detects the word "very," and the emotion determination unit 104d detects the keyword "soccer."
[0080]
The emotion determination unit 104d refers to the table of FIG. 5 based on the "boring" detected by the classification unit 104b and the "very" detected by the emphasized word detection unit 104c, and determines that the emotion level for "soccer" (keyword) is N2. Having determined the emotion level, the emotion determination unit 104d outputs the determined emotion level to the AI inference unit 104e as an emotion level signal.
[0081]
Next, the AI inference unit 104e performs the step of asking the user various questions (S205). Specifically, the AI inference unit 104e, to which the emotion level signal is input from the emotion determination unit 104d, asks a question related to the keyword and its emotion level, based on the keyword corresponding to the input emotion level signal and the user's emotion level for that keyword, and at the same time stores the question in the emotion information database 105.
[0082]
For example, if the keyword corresponding to the input emotion level signal and the user's emotion level for that keyword are soccer (keyword) and emotion level N2 (very boring), the AI inference unit 104e infers (judges) that the user is not interested in soccer and asks a question about something other than soccer.
[0083]
The AI inference unit 104e, to which the emotion level signal is input from the emotion determination unit 104d, asks a question associated with the input emotion level signal and outputs the input emotion level signal to the action determination unit 104f.
[0084]
Next, the operation determination unit 104f performs the step of moving each unit in the movable unit 200 in accordance with the emotion level determined by the emotion determination unit 104d (S206). Specifically, the action determination unit 104f, to which the emotion level signal is input from the AI inference unit 104e, generates a movable command signal for moving each unit in the movable unit 200 according to the emotion level included in the input signal, and outputs the generated movable command signal to the movable unit 200.
[0085]
For example, when the emotion level included in the input emotion level signal is "P1", the motion determination unit 104f outputs to the movable unit 200 a movable command signal for moving the arm movable unit 202, the knee movable unit 203, the neck movable unit 204, and the waist movable unit 205, using the table shown in FIG. 6.
[0086]
After that, the motion control unit 201, to which the movable command signal is input from the motion determination unit 104f, moves each unit (the arm movable unit 202, the knee movable unit 203, the neck movable unit 204, and the waist movable unit 205) based on the input command signal.
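To tie the steps together, the following self-contained sketch strings a recognized character string through keyword extraction, emotion level determination, and motion pattern look-up (roughly steps S202 to S206). All tables are drastically simplified placeholders and every name is hypothetical; this is an illustration of the flow, not the patent's implementation.

```python
from typing import Optional, Tuple

SEARCH_TABLE = ("soccer", "tennis", "politics")            # predetermined keywords
PLUS_WORDS = {"fun", "great"}                               # plus elements (FIG. 3 stand-in)
MINUS_WORDS = {"boring", "bad"}                             # minus elements (FIG. 3 stand-in)
EMPHASIS_WORDS = {"very", "super"}                          # emphasized words (FIG. 4 stand-in)
MOTION_TABLE = {"P1": ("arm", "knee", "neck", "waist"),
                "P2": ("arm", "knee", "waist"),
                "N1": ("knee", "neck"),
                "N2": ("neck",)}                            # FIG. 6 stand-in

def infer(text: str) -> Tuple[Optional[str], Optional[str], Tuple[str, ...]]:
    """From a recognized character string, extract a keyword, determine the
    emotion level, and look up the movable pattern (roughly S202 to S206)."""
    tokens = text.lower().split()
    keyword = next((kw for kw in SEARCH_TABLE if kw in tokens), None)
    emphasized = any(t in EMPHASIS_WORDS for t in tokens)
    if any(t in PLUS_WORDS for t in tokens):
        level: Optional[str] = "P1" if emphasized else "P2"
    elif any(t in MINUS_WORDS for t in tokens):
        level = "N2" if emphasized else "N1"
    else:
        level = None
    return keyword, level, MOTION_TABLE.get(level, ())

keyword, level, pattern = infer("Playing soccer is very boring")
print(keyword, level, pattern)  # soccer N2 ('neck',)
if level in ("N1", "N2"):
    print("Ask the user about a topic other than", keyword)
```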
[0087]
[Actions and effects of voice robot system and voice robot operation method]
According to the invention of the present embodiment as described above, the motion determination unit 104f generates commands for moving each unit in the movable unit 200 according to the emotion level included in the emotion level signal. When the emotion level included in the signal is high (an excited state or the like), it can generate a command that moves all of the units in the movable unit 200 in order to express the high degree of the user's emotion.
[0088]
In addition, the motion determination unit 104f not only generates commands for performing a fixed motion in response to the voice spoken by the user, but also generates various commands that operate the movable parts according to the emotion level held by the user. The robot 1 can therefore perform various actions in accordance with the user's emotions according to the commands generated by the motion determination unit 104f, and the user can enjoy watching the behavior of the robot 1 change whenever his or her emotions change.
[0089]
Furthermore, since the emotion determination unit 104d can determine the user's emotion level based on an emphasized word included in the character string, it can accurately grasp the emotion level the user has for the predetermined keyword. As a result, the action determination unit 104f can cause the movable unit 200 to perform an action that matches the user's emotion level (for example, if the user is in an excited state, the robot moves vigorously).
[0090]
Finally, since the robot 1 operates in complex patterns according to the various commands from the operation determination unit 104f, it becomes difficult for the user to predict the robot 1's next operation, and the user can enjoy watching the robot 1 without getting bored.
[0091]
[Effect of the Invention]
As described above, according to the present invention, the user's emotion level is inferred based on the content uttered by the user, a complex motion pattern for moving the robot is determined according to the inferred emotion level, and the robot can be moved using the determined motion pattern.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of a voice robot system according to an embodiment.
FIG. 2 is a block diagram showing an internal structure of the inference engine in the present embodiment.
FIG. 3 is a diagram showing the contents of an emotion classification table stored in a classification section in the present embodiment.
FIG. 4 is a diagram showing the contents of an emphasized word table stored in an emphasized word detection unit in the present embodiment.
FIG. 5 is a diagram showing the contents of an emotion level table stored in an emotion determination unit in the present embodiment.
FIG. 6 is a diagram showing the contents of an operation table for determining the operation of the robot by the operation determining unit in the present embodiment.
FIG. 7 is a flowchart showing a procedure of the voice robot operation method according to the present embodiment.
FIG. 8 is a flowchart showing a procedure processed in the inference engine in the present embodiment.
[Explanation of symbols]
1 ... Robot, 100 ... Determination unit, 101 ... Input unit, 102 ... Voice recognition unit, 103 ... Voice recognition dictionary storage unit, 104 ... Inference engine, 104a ... Phrase recognition unit, 104b ... Classification unit, 104c ... Emphasized word detection unit, 104d ... Emotion determination unit, 104e ... AI inference unit, 104f ... Operation determination unit, 105 ... Emotion information database, 106 ... Output unit

Claims (5)

  1. A voice robot system in which a movable part of a robot moves in response to voice emitted from a user,
    Acquisition means for acquiring the user's voice;
    Voice recognition means for identifying a character string corresponding to the voice based on the voice acquired by the acquisition means;
    Storage means for storing a search table listing predetermined keyword groups;
    An inference engine that infers the degree of emotion the user has based on the character string specified by the voice recognition means, creates question content to be put to the user, accumulates the question content in the storage means, and uses a fuzzy function when the semantic content of a decoded word is judged to be an ambiguous expression;
    Voice output means for outputting the question content created by the inference engine as voice;
    Movable means for moving a movable part of the robot in accordance with the emotion level inferred by the inference engine,
    The inference engine is
    Keyword extraction means for comparing the character string recognized by the voice recognition means with the search table and extracting a predetermined keyword from the character string;
    Emotion recognition means for recognizing the emotion level for the keyword based on the character string including the keyword extracted by the keyword extraction means;
    Based on the predetermined keyword extracted by the keyword extraction unit and the emotion level recognized by the emotion recognition unit, the question content related to the predetermined keyword and the emotion level is determined, and the storage unit AI inference unit for storing the question content in
    A voice robot system comprising: an operation determining unit that determines a movable pattern for moving a movable part of the robot according to the emotion level recognized by the emotion recognition unit.
  2. The voice robot system according to claim 1,
    A voice robot system characterized in that the movable means moves the movable part based on recognition information including the emotion level recognized by the emotion recognition means and the keyword extracted by the keyword extraction means.
  3. The voice robot system according to claim 1 or 2,
    The storage means stores in advance a plurality of conversation tables that associate each keyword with each conversation phrase,
    The inference engine includes a phrase detection unit that compares the keyword extracted by the keyword extraction unit with the conversation table and detects a phrase of the conversation associated with the keyword,
    A voice robot system, wherein the voice output means outputs a voice corresponding to the conversation phrase based on the conversation phrase detected by the phrase detection unit.
  4. A voice robot system according to any one of claims 1 to 3,
    The voice robot system, wherein the emotion level is determined based on an emphasis word composed of an adverb or exclamation included in the character string.
  5. A voice robot system according to any one of claims 1 to 4,
    A voice robot system, wherein the inference engine includes a context dictionary for deciphering context, a similarity dictionary for examining language similarity, a dictionary for word and phrase analysis, and a dictionary for morphological analysis of words, deciphers the semantic content of the words spoken by the user based on these dictionaries, and infers the emotion level held by the user from the deciphered semantic content.
JP2002002499A 2002-01-09 2002-01-09 Voice robot system Active JP4015424B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2002002499A JP4015424B2 (en) 2002-01-09 2002-01-09 Voice robot system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2002002499A JP4015424B2 (en) 2002-01-09 2002-01-09 Voice robot system

Publications (2)

Publication Number Publication Date
JP2003202892A JP2003202892A (en) 2003-07-18
JP4015424B2 true JP4015424B2 (en) 2007-11-28

Family

ID=27642338

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2002002499A Active JP4015424B2 (en) 2002-01-09 2002-01-09 Voice robot system

Country Status (1)

Country Link
JP (1) JP4015424B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101048619B1 (en) * 2008-12-19 2011-07-13 재단법인대구경북과학기술원 Fuzzy Controller and Control Method for Obstacle Avoidance with Table-based Fuzzy Single Positive and Double Negative Rules
CN102671381A (en) * 2011-03-08 2012-09-19 德信互动科技(北京)有限公司 Acoustic control-based game implementation device and method
CN102698434A (en) * 2011-03-28 2012-10-03 德信互动科技(北京)有限公司 Device and method for implementing game based on conversation
EP3373301A1 (en) 2017-03-08 2018-09-12 Panasonic Intellectual Property Management Co., Ltd. Apparatus, robot, method and recording medium having program recorded thereon

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100877476B1 (en) 2007-06-26 2009-01-07 주식회사 케이티 Intelligent robot service apparatus and method on PSTN
JP5172049B2 (en) 2011-06-14 2013-03-27 パナソニック株式会社 Robot apparatus, robot control method, and robot control program
JP6199927B2 (en) * 2015-06-17 2017-09-20 Cocoro Sb株式会社 Control system, system and program

Also Published As

Publication number Publication date
JP2003202892A (en) 2003-07-18


Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20041014

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20060627

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20060822

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20070327

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20070515

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20070731

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20070820

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20070911

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20070913

R150 Certificate of patent or registration of utility model

Ref document number: 4015424

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100921

Year of fee payment: 3

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313532

S533 Written request for registration of change of name

Free format text: JAPANESE INTERMEDIATE CODE: R313533

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110921

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120921

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130921

Year of fee payment: 6
