US5802488A - Interactive speech recognition with varying responses for time of day and environmental conditions - Google Patents
- Publication number
- US5802488A (application number US08/609,336)
- Authority
- US
- United States
- Prior art keywords
- speech
- data
- recognition
- response content
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G10L15/22 — Speech recognition: procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L13/00 — Speech synthesis; text-to-speech systems
- G10L15/00 — Speech recognition
- G10L15/26 — Speech recognition: speech-to-text systems
- A63H2200/00 — Computerized interactive toys, e.g. dolls
- G10L2015/226 — Procedures used during a speech recognition process using non-speech characteristics
Definitions
- FIG. 1A is a block diagram showing the overall configuration of the stuffed toy dog of Working example 1 of the invention;
- FIG. 1B is a more detailed diagram of the configuration in FIG. 1A;
- FIG. 2A is a block diagram showing the overall configuration of Working example 2 of the invention;
- FIG. 2B is a more detailed diagram of the configuration in FIG. 2A;
- FIG. 3A is a block diagram showing the overall configuration of Working example 3 of the invention;
- FIG. 3B is a more detailed diagram of the configuration in FIG. 3A;
- FIG. 4 is a block diagram showing the overall configuration of Working example 4 of the invention.
- FIG. 5 is a block diagram showing the overall configuration of Working example 5 of the invention.
- FIG. 6 is a block diagram showing the overall configuration of Working example 6 of the invention.
- FIG. 1A is a block diagram that illustrates Working example 1 of the invention. The embodiment will be briefly explained first, and the individual functions will be explained in detail later. Note that Working example 1 uses time of day as the variable data that affects the content of the interaction.
- As shown in FIG. 1A, microphone 1 of stuffed toy dog 30 receives speech from outside.
- Speech analysis unit 2 analyzes the speech input from microphone 1 and generates a speech data pattern that matches the feature values of the input speech.
- Clock 3 is a timing means for outputting timing data such as the time at which the speech is input, and the time at which this speech input is recognized by the speech recognition unit described below.
- Coefficient setting unit 4 receives the time data from clock 3 and generates weighting coefficients that change over time, in correspondence to the content of each recognition target speech.
- Speech recognition unit 5 receives the speech data pattern of the input speech from speech analysis unit 2 and, at the same time, obtains from coefficient setting unit 4 the weighting coefficient in effect at that time for each registered recognition target speech.
- As will be described below in connection with FIG. 1B, speech recognition unit 5 computes the final recognition data by multiplying the recognition data corresponding to each recognition target speech by its corresponding weighting coefficient, recognizes the input speech based on the computed final recognition data, and outputs the final recognition result of the recognized speech.
- Speech synthesis unit 6 outputs the speech synthesis data that corresponds to the final recognition result produced by speech recognition unit 5 with the weighting coefficients taken into consideration.
- Drive control unit 7 drives motion mechanism 10, which moves the mouth, etc. of stuffed toy 30, according to the drive conditions that are predetermined in correspondence to the recognition data recognized by speech recognition unit 5.
- Speaker 8 outputs the content of the speech synthesized by speech synthesis unit 6 to the outside.
- Power supply unit 9 drives all of the above units.
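As a rough illustration of how these units interact, the following sketch traces one interaction cycle in software. It is not code from the patent, which describes hardware units; all function names here are hypothetical placeholders for the numbered units of FIG. 1A.

```python
# Hypothetical sketch of one interaction cycle through the units of FIG. 1A.
# Each argument stands in for one of the numbered units; none of these names
# appear in the patent itself.

def interaction_cycle(audio_in, clock, analysis_unit, coefficient_unit,
                      recognition_unit, synthesis_unit, drive_control, speaker):
    pattern = analysis_unit(audio_in)              # speech analysis unit 2
    weights = coefficient_unit(clock())            # clock 3 feeding coefficient setting unit 4
    phrase = recognition_unit(pattern, weights)    # speech recognition unit 5
    response_audio = synthesis_unit(phrase)        # speech synthesis unit 6
    drive_control(response_audio)                  # drive control unit 7 -> motion mechanism 10
    speaker(response_audio)                        # speaker 8
    return phrase
```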
- Speech recognition unit 5 in this example uses, as its recognition means, a neural network that handles non-specific speakers, as shown in FIG. 1B.
- However, the recognition means is not limited to a method that handles non-specific speakers; other known methods, such as a method that handles a specific speaker, DP matching, and HMM, can also be used.
- Specifically, motor 11 rotates based on the drive signal (which matches the length of the output signal from speech synthesis unit 6) output by drive control unit 7.
- When cam 12 rotates in conjunction with motor 11, protrusion-shaped rib 13 provided on cam 12 moves in a circular trace in conjunction with the rotation of cam 12.
- Crank 15, which uses axis 14 as a fulcrum, is engaged with rib 13 and moves lower jaw 16 of the stuffed toy dog up and down synchronously with the rotation of cam 12.
- In operation, the speech input from microphone 1 is analyzed by speech analysis unit 2, and a speech data pattern matching the feature values of the input speech is created.
- This speech data pattern is input into the input unit of the neural network provided in speech recognition unit 5, and is recognized as explained below.
- greeting phrases such as "Good morning,” “I'm leaving,” “Good day,” “I'm home,” and “Good night” are used here for explanation.
- a phrase “Good morning” issued by a non-specific speaker is input into microphone 1.
- the characteristics of this speaker's "Good morning” are analyzed by speech analysis unit 2 and are input into speech recognition unit 5 as a speech data pattern.
- The time at which the phrase "Good morning" from microphone 1 is detected as sound pressure, or alternatively the time at which the phrase "Good morning" is recognized by the neural network of speech recognition unit 5, is supplied from clock 3 to coefficient setting unit 4.
- In this case, the time referenced by coefficient setting unit 4 is the time at which the speech is recognized by speech recognition unit 5.
- The speech data pattern of "Good morning" input into the neural network of speech recognition unit 5 in this way is output from the output unit of the neural network as recognition data having a graded value rather than a binary value, as shown in FIG. 1B. In this example, the value is a floating-point number between 0 and 10.
- the neural network of speech recognition unit 5 outputs a recognition data value of 8.0 for "Good morning,” 1.0 for “I'm leaving,” 2.0 for “Good day,” 1.0 for “I'm home,” and 4.0 for “Good night.”
- the fact that the recognition data from the neural network for the speaker's “Good morning” is a high value of 8.0 is understandable.
- The reason why the recognition data value for "Good night" is relatively high compared to those for "I'm leaving," "Good day," and "I'm home" is presumed to be that the speech pattern data of "Good morning" and "Good night" from a non-specific speaker, as analyzed by speech analysis unit 2, are somewhat similar to each other. Therefore, although the probability is nearly nonexistent that the speaker's "Good morning" will be recognized as "I'm leaving," "Good day," or "I'm home," the probability is high that the speaker's "Good morning" will be recognized as "Good night."
- To address this, speech recognition unit 5 fetches the weighting coefficient pre-assigned to each recognition target speech by referencing coefficient setting unit 4, and multiplies the recognition data by this coefficient. Because different greeting phrases are used depending on the time of day, weighting coefficients are assigned to the various greeting phrases based on the time of day. For example, if the current time is 7:00 am, 1.0 is used as the weighting coefficient for "Good morning," 0.9 for "I'm leaving," 0.7 for "Good day," 0.6 for "I'm home," and 0.5 for "Good night." These relationships among recognition target speeches, times of day, and coefficients are stored in coefficient setting unit 4 in advance.
- The final recognition data for "Good morning" will be 8.0 (i.e., 8.0 × 1.0), since the recognition data for "Good morning" output by the neural network is 8.0 and the coefficient for "Good morning" at 7:00 am is 1.0.
- the final recognition data for "I'm leaving" will be 0.9 (i.e., 1.0 × 0.9)
- the final recognition data for "Good day" will be 1.4 (i.e., 2.0 × 0.7)
- the final recognition data for "I'm home" will be 0.6 (i.e., 1.0 × 0.6)
- the final recognition data for "Good night" will be 2.0 (i.e., 4.0 × 0.5).
- speech recognition unit 5 creates final recognition data by taking time-dependent weighting coefficients into consideration.
- the final recognition data for "Good morning” is four times larger than that for "Good night.”
- speech recognition unit 5 can accurately recognize the phrase “Good morning” when it is issued by the speaker. Note that the number of phrases that can be recognized can be set to any value.
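To make the arithmetic above concrete, the following sketch reproduces the weighting step with the example numbers given in this working example. It is illustrative only; the function and variable names are not from the patent.

```python
# Illustrative sketch of the weighting step in Working example 1 (hypothetical code).

def recognize_weighted(raw_scores, coefficients):
    """Multiply each neural-network recognition value by its time-dependent
    weighting coefficient and return the phrase with the largest product."""
    final = {phrase: raw_scores[phrase] * coefficients[phrase] for phrase in raw_scores}
    return max(final, key=final.get), final

# Recognition data output by the neural network for a spoken "Good morning".
raw_scores = {"Good morning": 8.0, "I'm leaving": 1.0, "Good day": 2.0,
              "I'm home": 1.0, "Good night": 4.0}

# Weighting coefficients in effect at 7:00 am (from coefficient setting unit 4).
coeffs_7am = {"Good morning": 1.0, "I'm leaving": 0.9, "Good day": 0.7,
              "I'm home": 0.6, "Good night": 0.5}

best, final = recognize_weighted(raw_scores, coeffs_7am)
print(best, final["Good morning"], final["Good night"])   # Good morning 8.0 2.0
```

With the 8:00 pm coefficients given below (0.5, 0.6, 0.7, 0.9, and 1.0), the same computation produces a tie of 4.0 between "Good morning" and "Good night", which is the ambiguous case handled next.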
- Speech synthesis unit 6 converts the final recognition result from speech recognition unit 5 into predetermined speech synthesis data and outputs the synthesized speech from speaker 8.
- For example, "Good morning" is output from speaker 8 in response to the final recognition result of the phrase "Good morning" in this case. That is, when the child playing with the stuffed toy says "Good morning" to the toy, the toy responds with "Good morning." Because the child says "Good morning" at 7:00 am, the phrase issued and the time of day match each other; as a result, "Good morning" is correctly recognized and an appropriate response is returned.
- At the same time, drive control unit 7 drives the individual action mechanisms according to the drive conditions pre-determined for the final recognition result.
- In this example, the mouth of stuffed toy dog 30 is moved synchronously with the output signal ("Good morning" in this case) from speech synthesis unit 6.
- The driven mechanism is not limited to the mouth; any other motion, such as shaking the head or tail, is also possible.
- Next, suppose the speaker says "Good morning" at 8:00 pm. The final recognition data for "Good morning" will be 4.0 (i.e., 8.0 × 0.5), since the recognition data for "Good morning" output by the neural network is 8.0 and the weighting coefficient for "Good morning" at 8:00 pm is 0.5.
- the final recognition data for "I'm leaving" will be 0.6 (i.e., 1.0 × 0.6)
- the final recognition data for "Good day" will be 1.4 (i.e., 2.0 × 0.7)
- the final recognition data for "I'm home" will be 0.9 (i.e., 1.0 × 0.9)
- the final recognition data for "Good night" will be 4.0 (i.e., 4.0 × 1.0).
- speech recognition unit 5 creates final recognition data by taking weighting coefficients into consideration. Since the final recognition data for both "Good morning” and “Good night” are 4.0, the two phrases cannot be differentiated. In other words, when the speaker says “Good morning” at 8:00 pm, it is not possible to determine whether the phrase is "Good morning” or "Good night.”
- In this case, speech synthesis unit 6 converts the final recognition data to predetermined ambiguous speech synthesis data and outputs it; for example, a phrase such as "Something is funny here" is output.
- At the same time, drive control unit 7 drives the individual action mechanisms according to the drive conditions pre-determined for the final recognition data, and the mouth of the stuffed toy dog is moved synchronously with the output signal ("Something is funny here" in this case) from speech synthesis unit 6.
- Next, suppose the speaker says "Good night" at 8:00 pm. The final recognition data for "Good morning" will be 2.0 (i.e., 4.0 × 0.5), since the recognition data for "Good morning" output by the neural network is 4.0 and the weighting coefficient for "Good morning" at 8:00 pm is 0.5.
- the final recognition data for "I'm leaving" will be 0.9 (i.e., 1.0 × 0.9)
- the final recognition data for "Good day" will be 1.4 (i.e., 2.0 × 0.7)
- the final recognition data for "I'm home" will be 0.6 (i.e., 1.0 × 0.6)
- the final recognition data for "Good night" will be 8.0 (i.e., 8.0 × 1.0).
- speech recognition unit 5 creates final recognition data by taking weighting coefficients into consideration.
- the final recognition data for "Good night” is four times larger than that for "Good morning.”
- speech recognition unit 5 can accurately recognize the phrase “Good night” when it is issued by the speaker.
- Speech synthesis unit 6 converts the final recognition data from speech recognition unit 5 to predetermined speech synthesis data and outputs the synthesized speech from speaker 8. For example, "Good night" will be output from speaker 8 in response to the final recognition data of the phrase "Good night" in this case.
- Although the time of day is used as the variable data for setting weighting coefficients in Working example 1, it is also possible to set weighting coefficients based on other data such as temperature, weather, and date.
- For example, if temperature is used as the variable data, temperature data is detected from a temperature sensor that measures the air temperature, and weighting coefficients are assigned to the recognition data for weather-related greeting phrases (e.g., "It's hot, isn't it?" or "It's cold, isn't it?") as well as to the other registered recognition data. In this way, the recognition rate for the various greeting phrases can be increased even further.
- Working example 2 is shown in FIG. 2A. FIG. 2A differs from FIG. 1A in that coefficient storage unit 21 is provided for storing the weighting coefficients for recognizable phrases that are set by coefficient setting unit 4 according to time data. Since all other elements are identical to those in FIG. 1A, like symbols are used to represent like parts. The processing between coefficient storage unit 21 and coefficient setting unit 4 will be explained later.
- In FIG. 2A, the speech that is input from microphone 1 is analyzed by speech analysis unit 2, and a speech data pattern matching the feature values of the input speech is created.
- This speech data pattern is input into the input unit of the neural network provided in speech recognition unit 5, as shown in FIG. 2B, and is recognized as explained below.
- greeting phrases such as "Good morning," "I'm leaving," "Good day," "I'm home," and "Good night" are used here for explanation.
- a phrase “Good morning” issued by a non-specific speaker is input into microphone 1.
- the characteristics of this speaker's "Good morning” are analyzed by speech analysis unit 2 and are input into speech recognition unit 5 as a speech data pattern.
- The time at which the phrase "Good morning" from microphone 1 is detected as sound pressure, or alternatively the time at which the phrase "Good morning" is recognized by the neural network of speech recognition unit 5, is supplied from clock 3 to coefficient setting unit 4.
- In this case, the time referenced by coefficient setting unit 4 is the time at which the speech is recognized by speech recognition unit 5.
- The speech data pattern of "Good morning" input into the neural network of speech recognition unit 5 in this way is output from the output unit of the neural network as recognition data having a graded value rather than a binary value. In this example, the value is a floating-point number between 0 and 10.
- the neural network of speech recognition unit 5 outputs a recognition data value of 8.0 for "Good morning," 1.0 for "I'm leaving," 2.0 for "Good day," 1.0 for "I'm home," and 4.0 for "Good night."
- the fact that the recognition data from the neural network for the speaker's "Good morning” is a high value of 8.0 is understandable.
- the reason why the recognition data value for "Good night" is relatively high compared to those for "I'm leaving," "Good day," and "I'm home" is presumed to be because the speech pattern data of "Good morning" and "Good night" of a non-specific speaker, analyzed by speech analysis unit 2, are somewhat similar to each other.
- Speech recognition unit 5 fetches the weighting coefficient assigned to a recognition target speech according to time data by referencing coefficient setting unit 4.
- coefficient storage unit 21 is connected to coefficient setting unit 4, and the content (weighting coefficients) stored in coefficient storage unit 21 is referenced by coefficient setting unit 4.
- Coefficient storage unit 21 includes past time data storage unit 42, which stores the time data constituting past statistical data, and coefficient table creation unit 44, which creates coefficient tables based on the statistical data from past time data storage unit 42.
- Based on the coefficient tables created by coefficient table creation unit 44, coefficient setting unit 4 outputs a large weighting coefficient to be multiplied into the recognition data of a phrase if the phrase occurs at the time of day at which it was most frequently recognized, and outputs a smaller weighting coefficient as the time of occurrence deviates from that time of day.
- In other words, the largest weighting coefficient is assigned to the recognition data when the phrase occurs at the time of day with the highest usage frequency, and smaller weighting coefficients are assigned as the phrase occurs farther from that time of day.
- The final recognition data for "Good morning" will be 8.0 (i.e., 8.0 × 1.0), since the recognition data for "Good morning" output by the neural network is 8.0 and the coefficient for "Good morning" fetched from coefficient storage unit 21 at 7:00 am is 1.0.
- the final recognition data will be 0.9 for "I'm leaving," 1.4 for "Good day," 0.6 for "I'm home," and 2.0 for "Good night."
- Coefficient table creation unit 44 of coefficient storage unit 21 stores the largest weighting coefficient for a phrase at the time of day with the highest usage frequency, based on the time data recorded when that phrase was recognized in the past, and stores smaller weighting coefficients for times of day farther from that peak.
- For example, the coefficient to be applied to the recognition data of "Good morning" is set largest when the time data indicates 7:00 am, and smaller as the time data deviates farther from 7:00 am; that is, the coefficient is set at 1.0 for 7:00 am, 0.9 for 8:00 am, and 0.8 for 9:00 am, for example.
- The time data used for setting the coefficients is created statistically from several past time data instances rather than from a single instance. Note that the coefficients in the initial setting are set to standard values for predetermined times of day; that is, in the initial state, the weighting coefficient for "Good morning" at 7:00 am is set to 1.0.
- The coefficient of the most recently recognized "Good morning" is input into coefficient storage unit 21 as new coefficient data along with the time data, and coefficient storage unit 21 updates the coefficient for the phrase based on this data and past data as needed.
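One possible software reading of this usage-history mechanism is sketched below. The decay shape, the 0.5 floor, and the class and method names are assumptions introduced only to illustrate the idea of a peak coefficient at the most frequent time of day and smaller coefficients farther away; the patent leaves the exact update rule open.

```python
# Hypothetical sketch of coefficient storage unit 21 / coefficient setting unit 4.
from collections import Counter

class CoefficientTable:
    """Keeps past recognition hours per phrase and derives time-of-day weights."""

    def __init__(self, decay_per_hour=0.1, floor=0.5):
        self.history = {}                 # phrase -> Counter of hours (past time data, unit 42)
        self.decay_per_hour = decay_per_hour
        self.floor = floor

    def record(self, phrase, hour):
        """Store the hour at which `phrase` was correctly recognized (update step)."""
        self.history.setdefault(phrase, Counter())[hour] += 1

    def coefficient(self, phrase, hour, default_peak=None):
        """Largest weight at the phrase's most frequent hour, smaller farther away."""
        counts = self.history.get(phrase)
        peak_hour = counts.most_common(1)[0][0] if counts else default_peak
        if peak_hour is None:
            return 1.0                                   # no statistics yet: neutral weight
        distance = min(abs(hour - peak_hour), 24 - abs(hour - peak_hour))  # wrap at 24 h
        return max(self.floor, 1.0 - self.decay_per_hour * distance)

table = CoefficientTable()
for h in (7, 7, 7, 8):                    # "Good morning" recognized mostly around 7:00 am
    table.record("Good morning", h)
print(table.coefficient("Good morning", 7))   # 1.0 at the peak hour
print(table.coefficient("Good morning", 9))   # 0.8, two hours away from the peak
```

With decay_per_hour set to 0.1, this reproduces the 1.0, 0.9, and 0.8 progression for 7:00, 8:00, and 9:00 am given above.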
- The final recognition data for "Good morning" will thus be 8.0 (i.e., 8.0 × 1.0), since the recognition data for "Good morning" output by the neural network is 8.0 and the coefficient for "Good morning" fetched from coefficient storage unit 21 at 7:00 am is 1.0. Since this final recognition data is at least four times larger than those of the other phrases, the phrase "Good morning" is correctly recognized by speech recognition unit 5.
- Speech synthesis unit 6 converts the final recognition result from speech recognition unit 5 to predetermined speech synthesis data, and a preset phrase such as "Good morning” or "You're up early today” is returned through speaker 8 embedded in the body of the stuffed toy dog, as a response to the speaker's "Good morning.”
- If, on the other hand, the recognition result does not match the time of day, speech synthesis unit 6 is programmed to issue a corresponding phrase as in Working example 1, and a response such as "Something is funny here" is returned.
- Working example 3 is shown in FIG. 3A. Note that stuffed toy dog 30, motion mechanism 10 for moving the mouth of the stuffed toy, etc. shown in FIG. 1 are omitted from FIG. 3A.
- Microphone 1 receives speech from outside. Speech analysis unit 2 analyzes the speech input from microphone 1 and generates a speech data pattern that matches the feature values of the input speech.
- Clock 3 outputs timing data.
- Speech recognition unit 5 outputs the recognition data for the input speech based on the speech data pattern output by speech analysis unit 2.
- Speech synthesis unit 6 outputs the speech synthesis data that corresponds to the final recognition data from speech recognition unit 5.
- Drive control unit 7 drives motion mechanism 10 (see FIG. 1A) which moves the mouth, etc.
- Speaker 8 outputs the content of the speech synthesized by speech synthesis unit 6 to the outside.
- Power supply unit 9 drives all of the above units.
- Response content level generation unit 31, response content level storage unit 32, and response content creation unit 33 are also included in this embodiment.
- Speech recognition unit 5 in this example also uses, as its recognition means, a neural network that handles non-specific speakers.
- However, the recognition means is not limited to a method that handles non-specific speakers; other known methods, such as a method that handles a specific speaker, DP matching, and HMM, can also be used.
- Response content level generation unit 31 generates response level values for increasing the level of the response content as time passes or as the number of recognitions by speech recognition unit 5 increases. As shown in FIG. 3B, response content level generation unit 31 includes level determination table 52 and level determination unit 54. Level determination table 52 contains time ranges for the various levels, e.g., level 1 applies up to 24 hours, and so on. Level determination unit 54 determines the level value according to the time elapsed. Response content level storage unit 32 stores the relationship between the response level values generated by response content level generation unit 31 and time.
- More specifically, response content level storage unit 32 stores the elapsed time relative to the level values listed in level determination table 52. For example, in a case where 50 hours have passed from a certain starting point, storage unit 32 stores the information that 2 hours have passed beyond level 2. In this way, storage unit 32 stores elapsed-time information in a form that corresponds to the data contained in level determination table 52.
- response content creation unit 33 includes response content table 56 and response content determination unit 58.
- response content determination unit 58 references response content level generation unit 31 and determines response content that corresponds to the response content level value.
- In operation, response content level generation unit 31 fetches the response content level that corresponds to the time data from response content level storage unit 32. For example, response content level 1 is fetched if the current time is within the first 24 hours after the switch was turned on for the first time, and level 2 is fetched if the current time is between the 24th and 48th hours.
- Response content creation unit 33 then creates response data whose content corresponds to the fetched response content level, based on the recognition data from speech recognition unit 5. For example, "Bow-wow" is returned for the recognition data "Good morning" when the response content level (hereafter simply referred to as "level") is 1, a broken "G-o-o-d mor-ning" for level 2, "Good morning" for level 3, and "Good morning. It's a nice day, isn't it?" for a higher level n. In this way, both the response content and the level are increased as time passes.
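A compact way to picture this level mechanism is sketched below. It is an interpretation of Working example 3 rather than the patent's implementation; the 24-hour level boundaries follow the example in the text, while the table contents, the level cap, and the names are illustrative assumptions.

```python
# Hypothetical sketch of response content level generation (units 31/32) and
# response content creation (unit 33) for Working example 3.

LEVEL_HOURS = 24          # each level spans 24 hours in the example given in the text
MAX_LEVEL = 4             # highest level used in this sketch

RESPONSES = {             # (recognized phrase, level) -> response content
    ("Good morning", 1): "Bow-wow",
    ("Good morning", 2): "G-o-o-d mor-ning",
    ("Good morning", 3): "Good morning",
    ("Good morning", 4): "Good morning. It's a nice day, isn't it?",
}

def response_level(hours_since_first_power_on: float) -> int:
    """Level 1 for the first 24 hours, level 2 for 24-48 hours, and so on."""
    return min(MAX_LEVEL, int(hours_since_first_power_on // LEVEL_HOURS) + 1)

def create_response(recognized_phrase: str, hours_elapsed: float) -> str:
    level = response_level(hours_elapsed)
    return RESPONSES.get((recognized_phrase, level), "Bow-wow")   # fallback for unknown pairs

print(create_response("Good morning", 5))     # day 1  -> "Bow-wow"
print(create_response("Good morning", 30))    # day 2  -> "G-o-o-d mor-ning"
print(create_response("Good morning", 100))   # later  -> full greeting response
```

A reset switch of the kind mentioned later in this example would simply zero the elapsed-time counter passed to response_level.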
- the response data created by response content creation unit 33 is synthesized into a speech by speech synthesis unit 6 and is output from speaker 8.
- a phrase "Good morning” issued by a non-specific speaker is input into microphone 1.
- the characteristics of this speaker's "Good morning” are analyzed by speech analysis unit 2 and are input into speech recognition unit 5 as a speech data pattern.
- The speech data pattern of "Good morning" input into the neural network of speech recognition unit 5 in this way is output from the output unit of the neural network as recognition data having a graded value rather than a binary value. If the recognition data for the phrase "Good morning" is higher than the recognition data for the other phrases, speech recognition unit 5 correctly recognizes the speaker's "Good morning" as "Good morning."
- The final recognition result for the phrase "Good morning" thus identified is input into response content creation unit 33.
- Response content determination unit 58 in response content creation unit 33 determines the response content for the final recognition result, based on the final recognition result and the response level value of response content level generation unit 31.
- the response level value from response content level generation unit 31 is used for gradually increasing the level of response content in response to the phrase issued by the speaker; and in this case, the level is increased as time passes based on the time data of clock 3.
- Working example 3 is characterized in that it provides an illusion that the stuffed toy is growing up like a living creature as time passes.
- the stuffed toy can only respond with "Bow-wow” to "Good morning” on the first day after being purchased because the response level is only 1.
- On the second day, it can respond with "G-o-o-d mor-ning" to "Good morning" because the response level is 2.
- As more time passes, the stuffed toy can respond with "It's a nice day, isn't it?" to "Good morning" because the level has become higher.
- Although the unit of time for increasing the response content by one level was set at 1 day (24 hours) in the above explanation, the unit is not limited to 1 day, and a longer or shorter time span can be used for increasing the level. Note that the level increase can be reset if a reset switch for resetting the level is provided; for example, the level can be reset back to the initial value when level 3 has been reached.
- the stuffed toy dog can be made to appear to be changing the content of its response as it grows.
- the toy can then be made to act like a living creature by making it respond differently as time passes even when the same phrase "Good morning” is recognized.
- the toy is not boring because it responds with different phrases even when the speaker says the same thing.
- Working example 3 is also useful for training the speaker to find the best way to speak to the toy in order to obtain a high recognition rate while the toy's response content level value is still low. That is, when the speaker does not pronounce "Good morning" correctly, the "Good morning" will not be easily recognized, often resulting in a low recognition rate. However, if the toy responds with "Bow-wow" to "Good morning," this means that the "Good morning" was correctly recognized. Therefore, if the speaker practices speaking in a recognizable manner early on, the speaker can learn how to speak so that the speech can be recognized. Consequently, the speaker's phrases will be recognized at high rates even as the response content level value gradually increases, resulting in smooth interactions.
- Working example 4 will now be described with reference to FIG. 4, in which temperature sensor 34 is provided as the detection unit. The speech that is input from microphone 1 is analyzed by speech analysis unit 2, and a speech data pattern matching the feature values of the input speech is created.
- This speech data pattern is input into the input unit of the neural network provided in speech recognition unit 5, and is recognized as a speech.
- a phrase "Good morning” issued by a non-specific speaker is input into microphone 1.
- the characteristics of this speaker's "Good morning” are analyzed by speech analysis unit 2 and are input into speech recognition unit 5 as a speech data pattern.
- The speech data pattern of "Good morning" input into the neural network of speech recognition unit 5 in this way is output from the output unit of the neural network as recognition data having a graded value rather than a binary value. If the recognition data for the phrase "Good morning" is higher than the recognition data for the other phrases, speech recognition unit 5 correctly recognizes the speaker's "Good morning" as "Good morning."
- response content creation unit 33 determines the response content for the final recognition result, based on the final recognition result and the temperature data from temperature sensor 34.
- the data content of the response to the recognition data that is output by speech recognition unit 5 can be created according to the current temperature. For example, suppose that the speaker's "Good morning” is correctly recognized by speech recognition unit 5 as “Good morning.” Response content creation unit 33 then creates response data “Good morning. It's a bit cold, isn't it?” in reply to the recognition data "Good morning” if the current temperature is low. On the other hand, response data "Good morning. It's a bit hot, isn't it?" is created in reply to the same recognition data "Good morning” if the current temperature is higher. The response data created by response content creation unit 33 is input into speech synthesis unit 6 and drive control unit 7.
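The temperature-dependent branching can be pictured as in the sketch below. The 15 and 28 degree Celsius thresholds and the fallback reply are assumptions for illustration; the patent only states that a low temperature yields the "cold" reply and a higher temperature the "hot" reply.

```python
# Hypothetical sketch of response content creation unit 33 using temperature data
# (Working example 4). The 15/28 degree thresholds are illustrative only.

def temperature_response(recognized_phrase: str, temperature_c: float) -> str:
    if recognized_phrase != "Good morning":
        return recognized_phrase                      # echo other recognized phrases
    if temperature_c <= 15.0:
        return "Good morning. It's a bit cold, isn't it?"
    if temperature_c >= 28.0:
        return "Good morning. It's a bit hot, isn't it?"
    return "Good morning."

print(temperature_response("Good morning", 8.0))    # cold-weather reply
print(temperature_response("Good morning", 31.0))   # hot-weather reply
```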
- the speech data input into synthesis unit 6 is converted into speech synthesis data, and is output by speaker 8 embedded in the body of the stuffed toy dog.
- the recognition data input into drive control unit 7 drives motion mechanism 10 (see FIG. 1) according to the corresponding pre-determined drive condition and moves the mouth of the stuffed toy while the response is being issued.
- the stuffed toy dog can be made to behave as if it sensed a change in the temperature in its environment and responded accordingly.
- the toy can then be made to act like a living creature by making it respond differently as the surrounding temperature changes even when the same phrase "Good morning" is recognized.
- the toy is not boring because it responds with different phrases even when the speaker says the same thing.
- Working example 5 is shown in FIG. 5. Note that stuffed toy dog 30, motion mechanism 10 for moving the mouth of the stuffed toy, etc. shown in FIG. 1 are omitted from FIG. 5.
- In this example, air pressure is detected as one of the variable data that affect the interaction, and the change in air pressure (indicating improving or worsening weather) is used for changing the content of the response from response content creation unit 33 described in Working example 3 above.
- Air pressure sensor 35 is provided in FIG. 5, and like symbols are used to represent like parts as in FIG. 3.
- Response content creation unit 33 receives the recognition data from speech recognition unit 5, and determines the response content for the stuffed toy based on the recognition data and the air pressure data from air pressure sensor 35. The specific processing details will be explained later.
- The speech that is input from microphone 1 is analyzed by speech analysis unit 2, and a speech data pattern matching the feature values of the input speech is created.
- This speech data pattern is input into the input unit of the neural network provided in speech recognition unit 5, and is recognized as a speech.
- a phrase "Good morning” issued by a non-specific speaker is input into microphone 1.
- the characteristics of this speaker's "Good morning” are analyzed by speech analysis unit 2 and are input into speech recognition unit 5 as a speech data pattern.
- The speech data pattern of "Good morning" input into the neural network of speech recognition unit 5 in this way is output from the output unit of the neural network as recognition data having a graded value rather than a binary value. If the recognition data for the phrase "Good morning" is higher than the recognition data for the other phrases, speech recognition unit 5 correctly recognizes the speaker's "Good morning" as "Good morning."
- response content creation unit 33 determines the response content for the input recognition data, based on the input recognition data and the air pressure data from air pressure sensor 35.
- the data content of the response to the recognition data that is output by speech recognition unit 5 can be created according to the current air pressure. For example, suppose that the speaker's "Good morning” is correctly recognized by speech recognition unit 5 as “Good morning.” Response content creation unit 33 then creates response data “Good morning. The weather is going to get worse today.” in reply to the recognition data "Good morning” if the air pressure has fallen. On the other hand, response data "Good morning. The weather is going to get better today.” is created in reply to the recognition data "Good morning” if the air pressure has risen.
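Because the reply depends on whether the pressure has fallen or risen, some notion of a previous reading is needed. The sketch below is one plausible reading that compares against the last stored measurement; the sampling interval, the initial value, and the class name are assumptions, since the patent does not specify them.

```python
# Hypothetical sketch of air-pressure-based response creation (Working example 5).

class PressureResponder:
    def __init__(self, initial_hpa: float):
        self.last_hpa = initial_hpa                 # previous reading from sensor 35

    def respond(self, recognized_phrase: str, current_hpa: float) -> str:
        falling = current_hpa < self.last_hpa       # falling pressure -> worsening weather
        self.last_hpa = current_hpa
        if recognized_phrase != "Good morning":
            return recognized_phrase
        if falling:
            return "Good morning. The weather is going to get worse today."
        return "Good morning. The weather is going to get better today."

responder = PressureResponder(initial_hpa=1013.0)
print(responder.respond("Good morning", 1005.0))   # pressure fell -> worse weather
print(responder.respond("Good morning", 1012.0))   # pressure rose -> better weather
```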
- the response data created by response content creation unit 33 is input into speech synthesis unit 6 and drive control unit 7.
- the speech data input into synthesis unit 6 is converted into speech synthesis data, and is output by speaker 8 embedded in the body of the stuffed toy dog.
- the recognition data input into drive control unit 7 drives motion mechanism 10 (see FIG. 1) according to the corresponding pre-determined drive condition and moves the mouth of the stuffed toy while the response is being issued.
- the stuffed toy dog can be made to behave as if it sensed a change in the air pressure in its environment and responded accordingly.
- the toy can then be made to act like a living creature by making it respond differently as the air pressure changes even when the same phrase "Good morning" is recognized.
- the toy is not boring because it responds with different phrases even when the speaker says the same thing.
- Working example 6 is shown in FIG. 6. Note that stuffed toy dog 30, motion mechanism 10 for moving the mouth of the stuffed toy, etc. shown in FIG. 1 are omitted from FIG. 6.
- In this example, calendar data is detected as one of the variable data that affect the interaction, and the change in calendar data (change in date) is used for changing the content of the response.
- the embodiment in FIG. 6 is different from those in FIGS. 4 and 5 in that calendar unit 36 is provided in place of temperature sensor 34 or air pressure sensor 35, and like symbols are used to represent like parts as in FIGS. 4 or 5.
- calendar unit 36 updates the calendar by referencing the time data from the clock (not shown in the figure).
- Response content creation unit 33 in Working example 6 receives speech recognition data from speech recognition unit 5, and determines the response content for the stuffed toy based on the recognition data and the calendar data from calendar unit 36. The specific processing details will be explained later.
- The speech that is input from microphone 1 is analyzed by speech analysis unit 2, and a speech data pattern matching the feature values of the input speech is created.
- This speech data pattern is input into the input unit of the neural network provided in speech recognition unit 5, and is recognized as a speech.
- a phrase "Good morning” issued by a non-specific speaker is input into microphone 1.
- the characteristics of this speaker's "Good morning” are analyzed by speech analysis unit 2 and are input into speech recognition unit 5 as a speech data pattern.
- The speech data pattern of "Good morning" input into the neural network of speech recognition unit 5 in this way is output from the output unit of the neural network as recognition data having a graded value rather than a binary value. If the recognition data for the phrase "Good morning" is higher than the recognition data for the other phrases, speech recognition unit 5 correctly recognizes the speaker's "Good morning" as "Good morning."
- response content creation unit 33 determines the response content for the input recognition data, based on the input recognition data and the calendar data (date information which can also include year data) from calendar unit 36.
- the data content of the response to the recognition data that is output by speech recognition unit 5 can be created according to the current date. For example, suppose that the speaker's "Good morning” is correctly recognized by speech recognition unit 5 as “Good morning.” Response content creation unit 33 then creates response data "Good morning. Please take me to cherry blossom viewing.” in reply to the recognition data "Good morning” if the calendar data shows April 1. On the other hand, response data "Good morning. Christmas is coming soon.” is created in reply to the same recognition data "Good morning” if the calendar data shows Dec. 23. Naturally, it is possible to create a response that is different from the previous year if the year data is available.
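A date-keyed response table, as sketched below, captures this behavior. The two dates follow the examples in the text; everything else (the names, the fallback reply, and the use of Python's datetime.date) is an illustrative assumption.

```python
# Hypothetical sketch of calendar-based response creation (Working example 6).
from datetime import date

SEASONAL_REPLIES = {
    (4, 1):   "Good morning. Please take me to cherry blossom viewing.",
    (12, 23): "Good morning. Christmas is coming soon.",
}

def calendar_response(recognized_phrase: str, today: date) -> str:
    if recognized_phrase != "Good morning":
        return recognized_phrase
    return SEASONAL_REPLIES.get((today.month, today.day), "Good morning.")

print(calendar_response("Good morning", date(1996, 4, 1)))    # cherry-blossom reply
print(calendar_response("Good morning", date(1996, 12, 23)))  # Christmas reply
```

If the year is also stored, the table could be keyed on it as well so that, as the text notes, the reply can differ from the previous year's.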
- the response data created by response content creation unit 33 is input into speech synthesis unit 6 and drive control unit 7.
- the speech data input into synthesis unit 6 is converted into speech synthesis data, and is output by speaker 8 embedded in the body of the stuffed toy dog.
- the recognition data input into drive control unit 7 drives motion mechanism 10 (see FIG. 1) according to the corresponding pre-determined drive condition and moves the mouth of the stuffed toy while the response is being issued.
- the stuffed toy dog can be made to behave as if it sensed a change in the date and responded accordingly.
- the toy can then be made to act like a living creature by making it respond differently as the date changes even when the same phrase "Good morning" is recognized.
- the toy is not boring because it responds with different phrases even when the speaker says the same thing.
- speech recognition unit 5 can obtain the final recognition data using weighting coefficients that take into consideration the appropriateness of the content of the speaker's phrase relative to a variable data such as time of day as in Working example 1 or 2, or can obtain the final recognition data using some other method. For example, if the final recognition data is obtained as in Working example 1 or 2 and the response content for this final recognition data is processed as explained in Working examples 3 through 6, the speaker's phrases can be successfully recognized at high rates, and the response to the speaker's phrase can match the prevailing condition much better.
- If Working example 2 is combined with Working example 3, and the temperature sensor, the air pressure sensor, and the calendar unit explained in Working examples 4 through 6 are added, accurate speech recognition can be performed that takes into consideration the appropriateness of the content of the speaker's phrase relative to the time of day, and it is possible to enjoy changes in the level of the response content from the stuffed toy as time passes.
- In addition, interactions that take into account information such as temperature, weather, and date become possible, and thus an extremely sophisticated interactive speech recognition device can be realized.
Abstract
The invention improves recognition rates by providing an interactive speech recognition device that performs recognition by taking situational and environmental changes into consideration, thus enabling interactions that correspond to situational and environmental changes. The invention comprises a speech analysis unit that creates a speech data pattern corresponding to the input speech; a timing circuit for generating time data, for example, as variable data; a coefficient setting unit receiving the time data from the timing circuit and generating weighting coefficients that change over time, in correspondence to the content of each recognition target speech; a speech recognition unit that receives the speech data pattern of the input speech from the speech analysis unit, and that at the same time obtains a weighting coefficient in effect for a pre-registered recognition target speech at the time from the coefficient setting unit, that computes final recognition data by multiplying the recognition data corresponding to each recognition target speech by its corresponding weighting coefficient, and that recognizes the input speech based on the computed final recognition result; a speech synthesis unit for outputting speech synthesis data based on the recognition data that takes the weighting coefficient into consideration; and a drive control unit for transmitting the output from the speech synthesis unit to the outside.
Description
1. Field of the Invention
This invention relates generally to an interactive speech recognition device that recognizes speech and produces sounds or actions in response to the recognition result.
2. Description of the Related Art
One example of this kind of interactive speech recognition device is a speech recognition toy. For example, in the speech recognition toy disclosed in Japanese patent application Laid-Open No. 6-142342, multiple instructions that will be used as speech instructions are pre-registered as recognition-target phrases. The speech signal issued by a child who is using the toy is compared to the speech signals that have been registered. When there is a match, the electrical signal pre-specified for the speech instruction is output and causes the toy to perform a specified action.
However, in conventional toys of this type, such as stuffed toy animals that issue phrases or perform specified actions based on the speech recognition result, the recognition result is often different from the actual word or phrase issued by the speaker; and even when the recognition result is correct, the toys usually cannot respond or return phrases that accommodate changes in the prevailing condition or environment.
Nowadays, sophisticated actions are required even of toys. For example, a child will quickly tire of a stuffed toy animal if it responds with "Good morning" when the child says "Good morning" to it regardless of the time of day. Furthermore, because this type of interactive speech recognition technology has the potential to be applied to game machines for older children, or even to consumer appliances and instruments, the development of more advanced technologies has been desired.
Therefore, an object of the invention is to provide an interactive speech recognition device that possesses a function for detecting changes in circumstance or environment, e.g., time of day, that can respond to the speech issued by the user by taking into account the change in circumstance or environment, and that enables more sophisticated interactions.
The interactive speech recognition device of the invention recognizes input speech by analyzing and comparing it to pre-registered speech patterns and responds to the recognized speech. The speech recognition device comprises a speech analysis unit for analyzing an input speech and creating a speech data pattern that matches characteristics of the input speech; a detection unit for detecting variable data that affects the interaction content between the speech recognition device and a speaker; a coefficient setting unit responsive to the variable for generating a plurality of weighting coefficients each pre-assigned to a pre-registered recognition target speech, based on the variable data; a speech recognition unit for computing a final recognition result in response to the speech data pattern, the speech recognition unit including means for storing a plurality of pre-registered recognition target speeches and for outputting, in response to the speech data pattern, a plurality of recognition data values each for a corresponding pre-registered recognition target speech, means for computing final recognition data by multiplying each recognition data value by a corresponding pre-assigned weighting coefficient for a corresponding pre-registered recognition target speech, and means for recognizing the input speech by comparing the final recognition data for all of the pre-registered recognition target speeches and for outputting a final recognition result; and a speech synthesis unit for converting the final recognition result to corresponding synthesized speech data for producing an appropriate response to the input speech.
The variable data detection unit is, for example, a timing circuit for detecting time data, and the coefficient setting unit generates a weighting coefficient that corresponds to the time data of a day for each of the pre-registered recognition target speeches. In this case, the coefficient setting unit can be configured to output a preset largest weighting coefficient for the recognized data if it occurs at a peak time when it was correctly recognized most frequently in the past, and a smaller weighting coefficient as the time deviates from this peak time.
Another embodiment of the interactive speech recognition device of the invention recognizes input speech by analyzing and comparing it to pre-registered speech patterns and responds to the recognized speech. The speech recognition device comprises a speech analysis unit for analyzing an input speech and creating a speech data pattern that matches the characteristics of the input speech; a speech recognition unit for generating recognition data that corresponds to the input speech based on the speech data pattern output by the speech analysis unit; a timing circuit for generating time data; a response content level storage unit for storing information relating to passage of time relative to a response content level; a response content level generation unit for storing time ranges for a plurality of response content levels, the response content level generation unit being responsive to the time data from the timing circuit, the recognition data from the speech recognition unit, and the information from the response content level storage unit, for generating a response content level value according to passage of time; a response content creation unit, responsive to the recognition data from the speech recognition unit and the response content level value from the response content level generation unit, for determining response content data appropriate for the response content level value generated by the response content level generation unit; and a speech synthesis unit for converting the response content data from the response content creation unit to corresponding speech synthesis data for producing an appropriate response to the input speech.
Still another embodiment of the interactive speech recognition device of the invention recognizes input speech by analyzing and comparing it to pre-registered speech patterns and responds to the recognized speech. The speech recognition device comprises a speech analysis unit for analyzing an input speech and creating a speech data pattern that matches characteristics of the input speech; a speech recognition unit for generating the recognition data that corresponds to the input speech, based on the speech data pattern from the speech analysis unit; a detection unit for detecting variable data that affects the interaction content between the speech recognition device and a speaker; a response content creation unit, responsive to the variable data from the detection unit and the recognition data from the speech recognition unit, for outputting response content data, based on the recognition data by taking the variable data into consideration; and a speech synthesis unit for converting the response content data to corresponding speech synthesis data for producing an appropriate response to the input speech.
The detection unit may be a temperature sensor that measures an environmental temperature and outputs temperature data, and the response content creation unit outputs the response content data by taking the temperature data into consideration.
Alternatively, the detection unit may be an air pressure sensor that measures an environmental air pressure and outputs air pressure data, and the response content creation unit outputs the response content data by taking the air pressure data into consideration.
Alternatively, the detection unit may be a calendar detection unit that detects calendar data and outputs the calendar data, and the response content creation unit outputs the response content data by taking the calendar data into consideration.
According to the invention, in operation a weighting coefficient is assigned to the recognition data of each of the pre-registered recognition target speeches, based on changes in the variable data (e.g., time of day, temperature, weather, and date) that affect the content of the interaction. If time of day is used as the variable data, for example, a weighting coefficient can be assigned to the recognition data of each recognition target speech according to the time of day, and recognition that uses these weighting coefficients takes into consideration whether or not the phrase (in particular, a greeting phrase) issued by the speaker is appropriate for the time of day. Therefore, even if the speech analysis result shows that multiple recognition target speeches possess a similar speech pattern, the weighting coefficients increase the differences among the numerical values of the recognition data that are ultimately output, thus improving the recognition rate. The same is also true for the other types of variable data mentioned above. For example, if weighting coefficients that correspond to the current temperature are set up, whether or not the greeting phrase issued by the speaker is appropriate for the current temperature can be determined, and here again the weighting coefficients increase the differences among the final recognition data values, thus improving the recognition rate.
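Expressed compactly (a notational sketch only; the symbols below are not the patent's own notation), if s_i denotes the recognition data value output for the i-th registered recognition target speech and w_i(v) denotes the weighting coefficient assigned to that speech for the current variable data v (time of day, temperature, and so on), the final recognition data and the recognized speech are

    f_i = s_i \cdot w_i(v), \qquad \hat{i} = \operatorname{arg\,max}_i f_i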
Furthermore, when time of day is used as the variable data, the relationship between phrases and times of day that matches actual usage can be obtained by detecting the time of day at which a particular phrase is used most often and assigning a large weighting coefficient to this peak time, and smaller weighting coefficients to times of day that deviate farther from this peak time.
Additionally, the content of the response to the speaker's phrase can be changed as time passes by generating a response content level value that rises over time, and by determining and issuing the response content that matches that level, based on the recognition data from the speech recognition unit.
Furthermore, by using data from instruments such as a temperature sensor or air pressure sensor, or variable data such as calendar data, and creating the response content based on these data, the response content can be varied widely, enabling more meaningful interactions.
Other objects and attainments together with a fuller understanding of the invention will become apparent and appreciated by referring to the following description and claims taken in conjunction with the accompanying drawings.
In the drawings, wherein like reference symbols refer to like parts:
FIG. 1A is a block diagram showing the overall configuration of the stuffed toy dog of Working example 1 of the invention; FIG. 1B is a more detailed diagram of the configuration in FIG. 1A;
FIG. 2A is a block diagram showing the overall configuration of Working example 2 of the invention; FIG. 2B is a more detailed diagram of the configuration in FIG. 2A;
FIG. 3A is a block diagram showing the overall configuration of Working example 3 of the invention; FIG. 3B is a more detailed diagram of the configuration in FIG. 3A;
FIG. 4 is a block diagram showing the overall configuration of Working example 4 of the invention;
FIG. 5 is a block diagram showing the overall configuration of Working example 5 of the invention; and
FIG. 6 is a block diagram showing the overall configuration of Working example 6 of the invention.
The invention is explained in detail below using working examples. Note that the invention has been applied to a toy in these working examples, and more particularly to a stuffed toy dog intended for small children.
Working example 1
In Working example 1, weighting coefficients are set up for the recognition data of pre-registered recognition target speeches according to the value of the variable data (e.g., time of day, temperature, weather, and date) that affects the interaction content, in order to improve the recognition rate when a greeting phrase is input. FIG. 1A is a block diagram that illustrates Working example 1 of the invention. The embodiment will be briefly explained first, and the individual functions will be explained in detail later. Note that Working example 1 uses time of day as the variable data that affects the content of the interaction.
In FIG. 1A, in the stuffed toy dog 30, microphone 1 inputs speech from outside. Speech analysis unit 2 analyzes the speech input from microphone 1 and generates a speech data pattern that matches the characteristics of the input speech. Clock 3 is a timing means for outputting time data, such as the time at which the speech is input or the time at which this speech input is recognized by the speech recognition unit described below. Coefficient setting unit 4 receives the time data from clock 3 and generates weighting coefficients that change over time, in correspondence to the content of each recognition target speech. Speech recognition unit 5 receives the speech data pattern of the input speech from speech analysis unit 2 and at the same time obtains from coefficient setting unit 4 the weighting coefficient in effect at that time for each registered recognition target speech. As will be described below in connection with FIG. 1B, speech recognition unit 5 computes the final recognition data by multiplying the recognition data corresponding to each recognition target speech by its corresponding weighting coefficient, recognizes the input speech based on the computed final recognition data, and outputs the final recognition result of the recognized speech. Speech synthesis unit 6 outputs the speech synthesis data that corresponds to the final recognition result obtained by speech recognition unit 5 by taking the weighting coefficients into consideration. Drive control unit 7 drives motion mechanism 10, which moves the mouth, etc. of stuffed toy 30 according to drive conditions that are predetermined in correspondence to the recognition data recognized by speech recognition unit 5. Speaker 8 outputs the content of the speech synthesized by speech synthesis unit 6 to the outside. Power supply unit 9 supplies power to all of the above units.
In motion mechanism 10, motor 11 rotates based on the drive signal (which matches the length of the output signal from speech synthesis unit 6) output by drive control unit 7. When cam 12 rotates in conjunction with motor 11, protrusion-shaped rib 13 provided on cam 12 moves in a circular trace. Crank 15, which uses axis 14 as a fulcrum, is clipped onto rib 13 and moves lower jaw 16 of the stuffed toy dog up and down synchronously with the rotation of cam 12.
In this embodiment, the speech input from microphone 1 is analyzed by speech analysis unit 2, and a speech data pattern matching the characteristics of the input speech is created. This speech data pattern is input into the input unit of the neural network provided in speech recognition unit 5, and is recognized as explained below.
With reference to FIGS. 1A and 1B, the explanation below is based on an example in which several greeting words or phrases are recognized. For example, greeting phrases such as "Good morning," "I'm leaving," "Good day," "I'm home," and "Good night" are used here for explanation. Suppose that a phrase "Good morning" issued by a non-specific speaker is input into microphone 1. The characteristics of this speaker's "Good morning" are analyzed by speech analysis unit 2 and are input into speech recognition unit 5 as a speech data pattern.
At the same time, clock 3 supplies coefficient setting unit 4 with time data: either the time at which the phrase "Good morning" input from microphone 1 is detected as sound pressure, or the time at which the phrase "Good morning" is recognized by the neural network of speech recognition unit 5. Note that in this case the time referenced by coefficient setting unit 4 is the time at which the speech is recognized by speech recognition unit 5.
The speech data pattern of "Good morning" that is input into the neural network of speech recognition unit 5 in this way is output from the output unit of the neural network as recognition data having a continuous value rather than binary data, as shown in FIG. 1B. Here, an example in which this value is a floating-point number between 0 and 10 is used for explanation.
As shown in FIG. 1B, when the speaker says "Good morning" to stuffed toy 30, the neural network of speech recognition unit 5 outputs a recognition data value of 8.0 for "Good morning," 1.0 for "I'm leaving," 2.0 for "Good day," 1.0 for "I'm home," and 4.0 for "Good night." The fact that the recognition data from the neural network for the speaker's "Good morning" is a high value of 8.0 is understandable. The reason why the recognition data value for "Good night" is relatively high compared to those for "I'm leaving," "Good day," and "I'm home" is presumed to be because the speech pattern data of "Good morning" and "Good night" of a non-specific speaker, analyzed by speech analysis unit 2, are somewhat similar to each other. Therefore, although the probability is nearly nonexistent that the speaker's "Good morning" will be recognized as "I'm leaving," "Good day," or "I'm home," the probability is high that the speaker's "Good morning" will be recognized as "Good night."
During this process, speech recognition unit 5 fetches the weighting coefficient pre-assigned to each recognition target speech by referencing coefficient setting unit 4, and multiplies the recognition data by this coefficient. Because different greeting phrases are used depending on the time of day, weighting coefficients are assigned to the various greeting phrases based on the time of day. For example, if the current time is 7:00 am, 1.0 will be used as the weighting coefficient for "Good morning," 0.9 for "I'm leaving," 0.7 for "Good day," 0.6 for "I'm home," and 0.5 for "Good night." These relationships among recognition target speeches, times of day, and coefficients are stored in coefficient setting unit 4 in advance.
When weighting coefficients are used in this way, the final recognition data of "Good morning" will be 8.0 (i.e., 8.0×1.0) since the recognition data for "Good morning" output by the neural network is 8.0 and the coefficient for "Good morning" at 7:00 am is 1.0. Likewise, the final recognition data for "I'm leaving" will be 0.9 (i.e., 1.0×0.9), the final recognition data for "Good day" will be 1.4 (i.e., 2.0×0.7), the final recognition data for "I'm home" will be 0.6 (i.e., 1.0×0.6), and the final recognition data for "Good night" will be 2.0 (i.e., 4.0×0.5). In this way, speech recognition unit 5 creates final recognition data by taking time-dependent weighting coefficients into consideration.
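The computation just described can be summarized in a short sketch (illustrative only: the score values and coefficients are the example values quoted above, while the dictionaries and function names are hypothetical and not taken from the patent):

    # Recognition data output by the neural network for the spoken "Good morning"
    # (example values from the text).
    recognition_data = {
        "Good morning": 8.0,
        "I'm leaving": 1.0,
        "Good day": 2.0,
        "I'm home": 1.0,
        "Good night": 4.0,
    }

    # Weighting coefficients in effect at 7:00 am (example values from the text).
    coefficients_7am = {
        "Good morning": 1.0,
        "I'm leaving": 0.9,
        "Good day": 0.7,
        "I'm home": 0.6,
        "Good night": 0.5,
    }

    def final_recognition(scores, weights):
        """Multiply each recognition data value by its weighting coefficient and
        pick the phrase whose final recognition data is largest."""
        final = {phrase: scores[phrase] * weights[phrase] for phrase in scores}
        return max(final, key=final.get), final

    best, final = final_recognition(recognition_data, coefficients_7am)
    # final: 8.0, 0.9, 1.4, 0.6, 2.0 respectively; best == "Good morning"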
When the final recognition data are determined by taking time-dependent weighting coefficients into consideration in this way, the final recognition data for "Good morning" is four times larger than that for "Good night." As a result, speech recognition unit 5 can accurately recognize the phrase "Good morning" when it is issued by the speaker. Note that the number of phrases that can be recognized can be set to any value.
As shown in FIG. 1B, the final recognition result of the phrase "Good morning" determined in this way is input into speech synthesis unit 6 and drive control unit 7. Speech synthesis unit 6 converts the final recognition result from speech recognition unit 5 to pre-determined speech synthesis data, and the synthesized speech is output from speaker 8. For example, "Good morning" will be output from speaker 8 in response to the final recognition result of the phrase "Good morning" in this case. That is, when the child playing with the stuffed toy says "Good morning" to the toy, the toy responds with "Good morning." Because the child says "Good morning" at 7:00 am, the phrase and the time of day match each other, so "Good morning" is correctly recognized and an appropriate response is returned.
At the same time, drive control unit 7 drives individual action mechanisms according to the drive conditions pre-determined for the final recognition result. Here, the mouth of stuffed toy dog 30 is moved synchronously with the output signal ("Good morning" in this case) from speech synthesis unit 6. Naturally, in addition to moving the mouth of the stuffed toy, it is possible to move any other units, such as shaking the head or tail, for example.
Next, a case in which the current time is 8:00 pm is explained. In this case, 0.5 is set as the weighting coefficient for "Good morning," 0.6 for "I'm leaving," 0.7 for "Good day," 0.9 for "I'm home," and 1.0 for "Good night."
When weighting coefficients are used in this way, the final recognition data of "Good morning" will be 4.0 (i.e., 8.0×0.5) since the recognition data for "Good morning" output by the neural network is 8.0 and the weighting coefficient for "Good morning" at 8:00 pm is 0.5. Likewise, the final recognition data for "I'm leaving" will be 0.6 (i.e., 1.0×0.6), the final recognition data for "Good day" will be 1.4 (i.e., 2.0×0.7), the final recognition data for "I'm home" will be 0.9 (i.e., 1.0×0.9), and the final recognition data for "Good night" will be 4.0 (i.e., 4.0×1.0).
In this way, speech recognition unit 5 creates final recognition data by taking weighting coefficients into consideration. Since the final recognition data for both "Good morning" and "Good night" are 4.0, the two phrases cannot be differentiated. In other words, when the speaker says "Good morning" at 8:00 pm, it is not possible to determine whether the phrase is "Good morning" or "Good night."
This final recognition result is supplied to speech synthesis unit 6 and drive control unit 7, both of which act accordingly. That is, speech synthesis unit 6 converts the final recognition data to pre-determined speech synthesis data expressing this ambiguity and outputs it. For example, "Something is funny here!" is output from speaker 8, indicating that "Good morning" is not appropriate for use at night time.
At the same time, drive control unit 7 drives the individual action mechanisms according to the drive conditions pre-determined for the final recognition data. Here, the mouth of the stuffed toy dog is moved synchronously with the output signal ("Something is funny here!" in this case) from speech synthesis unit 6. Naturally, in addition to moving the mouth of the stuffed toy, it is possible to move any other units, as in the case above.
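A minimal sketch of how the ambiguous case above might be detected before handing the result to speech synthesis unit 6 (the margin value and function name are assumptions; the patent states only that equal final recognition data cannot be differentiated):

    AMBIGUITY_MARGIN = 0.0   # assumed: treat exactly equal top scores as undecidable

    def recognize_or_flag_ambiguous(final_data):
        """Return the best phrase, or None when the top two final recognition data
        cannot be differentiated (e.g., "Good morning" vs. "Good night" at 8:00 pm)."""
        ranked = sorted(final_data.items(), key=lambda item: item[1], reverse=True)
        (best_phrase, best_score), (_, runner_up) = ranked[0], ranked[1]
        if best_score - runner_up <= AMBIGUITY_MARGIN:
            return None   # speech synthesis unit 6 then outputs "Something is funny here!"
        return best_phrase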
Next, a case in which the speaker says "Good night" when the current time is 8:00 pm is explained. In this case, it is assumed that the neural network of speech recognition unit 5 outputs a recognition data value of 4.0 for "Good morning," 1.0 for "I'm leaving," 2.0 for "Good day," 1.0 for "I'm home," and 8.0 for "Good night." When the current time is 8:00 pm, 0.5 will be used as the weighting coefficient for "Good morning," 0.6 for "I'm leaving," 0.7 for "Good day," 0.9 for "I'm home," and 1.0 for "Good night."
When weighting coefficients are used in this way, the final recognition data of "Good morning" will be 2.0 (i.e., 4.0×0.5) since the recognition data for "Good morning" output by the neural network is 4.0 and the weighting coefficient for "Good morning" at 8:00 pm is 0.5. Likewise, the final recognition data for "I'm leaving" will be 0.6 (i.e., 1.0×0.6), the final recognition data for "Good day" will be 1.4 (i.e., 2.0×0.7), the final recognition data for "I'm home" will be 0.9 (i.e., 1.0×0.9), and the final recognition data for "Good night" will be 8.0 (i.e., 8.0×1.0). In this way, speech recognition unit 5 creates final recognition data by taking weighting coefficients into consideration.
When the final recognition data are determined by taking time-related information into consideration in this way, the final recognition data for "Good night" is four times larger than that for "Good morning." As a result, speech recognition unit 5 can accurately recognize the phrase "Good night" when it is issued by the speaker.
The final recognition data of the phrase "Good night" determined in this way is input into speech synthesis unit 6 and drive control unit 7. Speech synthesis unit 6 converts the final recognition data from speech recognition unit 5 to predetermined speech synthesis data, and the synthesized speech is output from speaker 8. For example, "Good night" will be output from speaker 8 in response to the final recognition data of the phrase "Good night" in this case.
Although the response from stuffed toy 30 is "Good morning" or "Good night" in response to the speaker's "Good morning" or "Good night", respectively, in the above explanation, it is possible to set many kinds of phrases as the response. For example, "You're up early today" can be used in response to "Good morning."
Furthermore, although the time of day is used as the variable data for setting weighting coefficients in Working example 1, it is also possible to set weighting coefficients based on other data such as temperature, weather, and date. For example, if temperature is used as the variable data, temperature data is detected from a temperature sensor that measures the air temperature, and weighting coefficients are assigned to the recognition data for weather-related greeting phrases (e.g., "It's hot, isn't it?" or "It's cold, isn't it?") that are input and to other registered recognition data. In this way, the difference in the values of the two recognition data is magnified by their weighting coefficients even if a speech data pattern that is similar to the input speech exists, thus increasing the recognition rate. Furthermore, if a combination of variable data such as time of day, temperature, weather, and date, is used and weighting coefficients are assigned to these variable data, the recognition rate for various greeting phrases can be increased even further.
Working example 2
Next, Working example 2 of the invention will be explained with reference to FIGS. 2A and 2B. Note that stuffed toy dog 30, motion mechanism 10 for moving the mouth of the stuffed toy, etc. are omitted from FIG. 2A. FIG. 2A is different from FIG. 1A in that coefficient storage unit 21 is provided for storing the weighting coefficients for recognizable phrases that are set by coefficient setting unit 4 according to time data. Since all other elements are identical to those in FIG. 1A, like symbols are used to represent like parts. The processing between coefficient storage unit 21 and coefficient setting unit 4 will be explained later.
In FIG. 2A, the speech that is input from microphone 1 is analyzed by speech analysis unit 2, and a speech data pattern matching the characteristics of the input speech is created. This speech data pattern is input into the input unit of the neural network provided in speech recognition unit 5, as shown in FIG. 2B, and is recognized as explained below.
With reference to FIGS. 2A and 2B, the explanation below is based on an example in which several greeting words or phrases are recognized. For example, greeting phrases such as "Good morning," "I'm leaving," "Good day," "I'm home," and "Good night" are used here for explanation. Suppose that a phrase "Good morning" issued by a non-specific speaker is input into microphone 1. The characteristics of this speaker's "Good morning" are analyzed by speech analysis unit 2 and are input into speech recognition unit 5 as a speech data pattern.
At the same time, clock 3 supplies coefficient setting unit 4 with time data: either the time at which the phrase "Good morning" input from microphone 1 is detected as sound pressure, or the time at which the phrase "Good morning" is recognized by the neural network of speech recognition unit 5. Note that in this case the time referenced by coefficient setting unit 4 is the time at which the speech is recognized by speech recognition unit 5.
The speech data pattern of "Good morning" that is input into the neural network of speech recognition unit 5 in this way is output from the output unit of the neural network as recognition data having a continuous value rather than binary data. Here, an example in which this value is a floating-point number between 0 and 10 is used for explanation.
As shown in FIG. 2B, when the speaker says "Good morning" to stuffed toy 30, the neural network of speech recognition unit 5 outputs a recognition data value of 8.0 for "Good morning," 1.0 for "I'm leaving," 2.0 for "Good day," 1.0 for "I'm home," and 4.0 for "Good night." The fact that the recognition data from the neural network for the speaker's "Good morning" is a high value of 8.0 is understandable. The reason why the recognition data value for "Good night" is relatively high compared to those for "I'm leaving," "Good day," and "I'm home" is presumed to be because the speech pattern data of "Good morning" and "Good night" of a non-specific speaker, analyzed by speech analysis unit 2, are somewhat similar to each other. Therefore, although the probability is nearly nonexistent that the speaker's "Good morning" will be recognized as "I'm leaving," "Good day," or "I'm home," the probability is high that the speaker's "Good morning" will be recognized as "Good night." Up to this point, Working example 2 is nearly identical to Working example 1.
For example, as shown in FIG. 2B, assume that the current time is 7:00 am, that 1.0 is used as the initial weighting coefficient for "Good morning," 0.9 for "I'm leaving," 0.7 for "Good day," 0.6 for "I'm home," and 0.5 for "Good night," and that these coefficients are stored in coefficient storage unit 21. The final recognition data of "Good morning" will then be 8.0 (i.e., 8.0×1.0), since the recognition data for "Good morning" output by the neural network is 8.0 and the coefficient for "Good morning" fetched from coefficient storage unit 21 at 7:00 am is 1.0. Likewise, the final recognition data will be 0.9 for "I'm leaving," 1.4 for "Good day," 0.6 for "I'm home," and 2.0 for "Good night." These final recognition data are initially created by speech recognition unit 5.
Even when recognition is performed by taking into consideration the weighting coefficient based on the time of day, there is some range of time over which a certain phrase will be correctly recognized. For example, the phrase "Good morning" may be correctly recognized at 7:00 am, 7:30 am, or 8:00 am. Taking this factor into consideration, coefficient table creation unit 44 of coefficient storage unit 21 stores the largest weighting coefficient for a phrase at the time of day with the highest usage frequency, based on the past time data recorded when that phrase was recognized, and stores smaller weighting coefficients for the phrase as the time deviates from that time of day.
For example, if the phrase "Good morning" was most frequently recognized at 7:00 am according to past statistics, the coefficient to be applied to the recognition data of "Good morning" is set largest when the time data indicates 7:00 am, and smaller as the time data deviates farther from 7:00 am. That is, the coefficient is set at 1.0 for 7:00 am, 0.9 for 8:00 am, and 0.8 for 9:00 am, for example. The time data used for setting coefficients is derived statistically from several past time data rather than from a single instance. Note that the coefficients in the initial setting are set to standard values for pre-determined times of day; that is, in the initial state, the weighting coefficient for "Good morning" at 7:00 am is set to 1.0.
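One way to realize this peak-centered assignment is sketched below (the linear decay rate, the floor value, and the function name are assumptions; the patent specifies only that the coefficient is largest at the statistically determined peak time and decreases as the time deviates from it):

    def coefficient_for(peak_hour, current_hour, peak_value=1.0,
                        decay_per_hour=0.1, floor=0.5):
        """Largest coefficient at the peak hour determined from past statistics,
        smaller as the current time deviates from that hour (assumed values)."""
        diff = abs(current_hour - peak_hour)
        diff = min(diff, 24 - diff)          # distance on a 24-hour clock
        return max(floor, peak_value - decay_per_hour * diff)

    # With a peak of 7:00 am for "Good morning": 1.0 at 7:00, 0.9 at 8:00, 0.8 at 9:00,
    # matching the example coefficients in the text.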
The coefficient of the "Good morning" that is most recently recognized is input into coefficient storage unit 21 as new coefficient data along with the time data, and coefficient storage unit 21 updates the coefficient for the phrase based on this data and past data as needed.
By making the coefficient for a phrase largest at the time of day when it is used most frequently, when the phrase "Good morning" is issued at around 7:00 am, the final recognition data of "Good morning" will be 8.0 (i.e., 8.0×1.0), since the recognition data for "Good morning" output by the neural network is 8.0 and the coefficient for "Good morning" fetched from coefficient storage unit 21 at 7:00 am is 1.0. Since this final recognition data is at least four times larger than those of the other phrases, the phrase "Good morning" is correctly recognized by speech recognition unit 5.
The final recognition result of the phrase "Good morning" determined in this way is input into speech synthesis unit 6 and drive control unit 7. Speech synthesis unit 6 converts the final recognition result from speech recognition unit 5 to predetermined speech synthesis data, and a preset phrase such as "Good morning" or "You're up early today" is returned through speaker 8 embedded in the body of the stuffed toy dog, as a response to the speaker's "Good morning."
On the other hand, if "Good morning" is issued at around 12 noon, the coefficient for "Good morning" becomes small, making the final recognition data for "Good morning" small, and "Good morning" will not be recognized. In such a case, speech synthesis unit 6 is programmed to issue a corresponding phrase as in Working example 1, and a response such as "Something is funny here!" is issued by stuffed toy 30.
Working example 3
Next, Working example 3 of the invention will be explained with reference to FIGS. 3A and 3B. Note that stuffed toy dog 30, motion mechanism 10 for moving the mouth of the stuffed toy, etc. shown in FIG. 1A are omitted from FIG. 3A. In Working example 3, microphone 1 inputs speech from outside. Speech analysis unit 2 analyzes the speech input from microphone 1 and generates a speech data pattern that matches the characteristics of the input speech. Clock 3 outputs time data. Speech recognition unit 5 outputs the recognition data for the input speech based on the speech data pattern output by speech analysis unit 2. Speech synthesis unit 6 outputs the speech synthesis data that corresponds to the response content determined from the recognition data of speech recognition unit 5. Drive control unit 7 drives motion mechanism 10 (see FIG. 1A), which moves the mouth, etc. of stuffed toy 30 according to drive conditions that are predetermined in correspondence to the recognition data recognized by speech recognition unit 5. Speaker 8 outputs the content of the speech synthesized by speech synthesis unit 6 to the outside. Power supply unit 9 supplies power to all of the above units. Response content level generation unit 31, response content level storage unit 32, and response content creation unit 33 are also included in this embodiment.
Response content level generation unit 31 generates response level values for increasing the level of response content as time passes or as the number of recognitions by speech recognition unit 5 increases. As shown in FIG. 3B, response content level generation unit 31 includes level determination table 52 and level determination unit 54. Level determination table 52 contains time ranges for the various levels, e.g., level 1 applies up to 24 hours, etc. Level determination unit 54 determines the level value according to the time elapsed. Response content level storage unit 32 stores the relationship between the response level values generated by response content level generation unit 31 and time. That is, the relationship between the passage of time and the level value is stored, e.g., level 1 when the activation switch is turned on for the first time after the stuffed toy is purchased, level 2 after 24 hours pass, and level 3 after 24 more hours pass. In other words, response content level storage unit 32 stores information on the time elapsed relative to a level value listed in level determination table 52. For example, storage unit 32 stores the information that 2 hours have passed beyond level 2 in a case where 50 hours have passed from a certain point in time. Thus, storage unit 32 stores information on the time elapsed so that the data contained in level determination table 52 corresponds to this information.
As shown in FIG. 3B, response content creation unit 33 includes response content table 56 and response content determination unit 58. When the final recognition data is received from speech recognition unit 5, response content determination unit 58 references response content level generation unit 31 and determines response content that corresponds to the response content level value. During this process, response content level generation unit 31 fetches the response content level that corresponds to the time data from response content level storage unit 32. For example, response content level 1 is fetched if the current time is within the first 24 hours after the switch was turned on for the first time, and level 2 is fetched if the current time is between the 24th and 48th hours.
Response content creation unit 33 then creates response data whose content corresponds to the fetched response content level, based on the recognition data from speech recognition unit 5. For example, "Bow-wow" is returned for the recognition data "Good morning" when the response content level (hereafter simply referred to as "level") is 1, a broken "G-o-o-d mor-ning" for level 2, "Good morning" for level 3, and "Good morning. It's a nice day, isn't it?" for a higher level n. In this way, both the response content and the level are increased as time passes. The response data created by response content creation unit 33 is synthesized into speech by speech synthesis unit 6 and is output from speaker 8.
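A condensed sketch of how level determination table 52 and response content table 56 might fit together (the 24-hour thresholds and the sample responses are the ones given above; the data structures and function names are hypothetical):

    LEVEL_THRESHOLDS_HOURS = [24, 48, 72]   # level 1 up to 24 h, level 2 up to 48 h, ...

    RESPONSES_TO_GOOD_MORNING = {           # response content per level, from the example
        1: "Bow-wow",
        2: "G-o-o-d mor-ning",
        3: "Good morning",
        4: "Good morning. It's a nice day, isn't it?",   # the "higher level n" case
    }

    def response_level(hours_since_first_switch_on):
        """Level 1 within the first 24 hours, then one level higher per threshold passed."""
        level = 1
        for threshold in LEVEL_THRESHOLDS_HOURS:
            if hours_since_first_switch_on >= threshold:
                level += 1
        return level

    def respond_to_good_morning(hours_elapsed):
        level = min(response_level(hours_elapsed), max(RESPONSES_TO_GOOD_MORNING))
        return RESPONSES_TO_GOOD_MORNING[level]

    # respond_to_good_morning(50) -> "Good morning" (level 3, since 50 h >= 48 h)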
Suppose that a phrase "Good morning" issued by a non-specific speaker is input into microphone 1. The characteristics of this speaker's "Good morning" are analyzed by speech analysis unit 2 and are input into speech recognition unit 5 as a speech data pattern.
The speech data pattern of "Good morning" that is input into the neural network of speech recognition unit 5 in this way is output from the output unit of the neural network as recognition data having a continuous value rather than binary data. If the recognition data for the phrase "Good morning" is higher than the recognition data for the other phrases, speech recognition unit 5 correctly recognizes the speaker's "Good morning" as "Good morning."
The final recognition result for the phrase "Good morning" thus identified is input into response content creation unit 33. Response content determination unit 58 in response content creation unit 33 then determines the response content for the final recognition result, based on the final recognition result and the response level value of response content level generation unit 31.
As explained above, the response level value from response content level generation unit 31 is used for gradually increasing the level of response content in response to the phrase issued by the speaker; and in this case, the level is increased as time passes based on the time data of clock 3. However, it is also possible to change the level value based on the number or types of phrases recognized, instead of the passage of time. Alternatively, it is possible to change the level value based on the combination of the passage of time and the number or types of phrases recognized.
Working example 3 is characterized in that it provides the illusion that the stuffed toy is growing up like a living creature as time passes. In other words, the stuffed toy can only respond with "Bow-wow" to "Good morning" on the first day after being purchased because the response level is only 1. On the second day, however, it can respond with "G-o-o-d mor-ning" to "Good morning" because the response level is 2. Furthermore, after several days, the stuffed toy can respond with "It's a nice day, isn't it?" to "Good morning" because of a still higher level.
Although the unit of time for increasing the response content level by one was set at 1 day (24 hours) in the above explanation, the unit is not limited to 1 day, and it is possible to use a longer or shorter time span for increasing the level. Note that it is also possible to reset the level if a reset switch for resetting the level is provided. For example, the level can be reset back to its initial value when level 3 has been reached.
Although the above explanation was provided for the response to the phrase "Good morning," it is not limited to "Good morning" and is naturally applicable to upgrading of responses to other phrases such as "Good night" and "I'm leaving." Take "Good night" for example. The content of the response from the stuffed toy in reply to "Good night" can be changed from "Unn-unn" (puppy cry) in level 1, to "G-o-o-d nigh-t" in level 2.
By increasing the level of response content in this way, the stuffed toy dog can be made to appear to be changing the content of its response as it grows. The toy can then be made to act like a living creature by making it respond differently as time passes even when the same phrase "Good morning" is recognized. Furthermore, the toy is not boring because it responds with different phrases even when the speaker says the same thing.
Working example 3 is also useful for training the speaker, while the toy's response content level value is still low, to find the best way to speak to the toy in order to obtain a high recognition rate. That is, when the speaker does not pronounce "Good morning" correctly, the "Good morning" will not be easily recognized, often resulting in a low recognition rate. However, if the toy responds with "Bow-wow" to "Good morning," this means that the "Good morning" was correctly recognized. Therefore, by practicing early on, the speaker can learn to speak in a manner that is easily recognized. Consequently, the speaker's phrases will be recognized at high rates even as the response content level value gradually increases, resulting in smooth interactions.
Working example 4
Next, Working example 4 of the invention will be explained with reference to FIG. 4. Note that stuffed toy dog 30, motion mechanism 10 for moving the mouth of the stuffed toy, etc. shown in FIG. 1A are omitted from FIG. 4. In Working example 4, temperature is detected as one of the variable data that affect the interaction, and the change in temperature is used for changing the content of the response from response content creation unit 33 shown in Working example 3 above. Temperature sensor 34 is provided in FIG. 4, and like symbols are used to represent like parts as in FIG. 3A. When it receives the recognition data from speech recognition unit 5, response content creation unit 33 determines the response content for stuffed toy 30 based on the recognition data and the temperature data from temperature sensor 34. The specific processing details will be explained later.
In FIG. 4, the speech that is input from microphone 1 is analyzed by speech analysis unit 2, and a speech data pattern matching the characteristics of the input speech is created. This speech data pattern is input into the input unit of the neural network provided in speech recognition unit 5, and is recognized as speech.
Suppose that a phrase "Good morning" issued by a non-specific speaker is input into microphone 1. The characteristics of this speaker's "Good morning" are analyzed by speech analysis unit 2 and are input into speech recognition unit 5 as a speech data pattern.
The speech data pattern of "Good morning" that is input into the neural network of speech recognition unit 5 in this way is output from the output unit of the neural network as recognition data having a continuous value rather than binary data. If the recognition data for the phrase "Good morning" is higher than the recognition data for the other phrases, speech recognition unit 5 correctly recognizes the speaker's "Good morning" as "Good morning."
The final recognition result for the phrase "Good morning" thus recognized is input into response content creation unit 33. Response content creation unit 33 then determines the response content for the final recognition result, based on the final recognition result and the temperature data from temperature sensor 34.
Therefore, the content of the response to the recognition data that is output by speech recognition unit 5 can be created according to the current temperature. For example, suppose that the speaker's "Good morning" is correctly recognized by speech recognition unit 5 as "Good morning." Response content creation unit 33 then creates the response data "Good morning. It's a bit cold, isn't it?" in reply to the recognition data "Good morning" if the current temperature is low. On the other hand, the response data "Good morning. It's a bit hot, isn't it?" is created in reply to the same recognition data "Good morning" if the current temperature is high. The response data created by response content creation unit 33 is input into speech synthesis unit 6 and drive control unit 7. The speech data input into speech synthesis unit 6 is converted into speech synthesis data and is output by speaker 8 embedded in the body of the stuffed toy dog. The recognition data input into drive control unit 7 drives motion mechanism 10 (see FIG. 1A) according to the corresponding pre-determined drive condition and moves the mouth of the stuffed toy while the response is being issued.
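As a sketch, the selection performed by response content creation unit 33 in this example amounts to a simple branch on the sensor reading (the numeric thresholds and the neutral fallback phrase are assumptions, since the patent does not specify them); the air-pressure and calendar variants of Working examples 5 and 6 follow the same pattern with a different sensor value:

    COLD_LIMIT_C = 10.0   # assumed threshold; the patent gives no numeric limits
    HOT_LIMIT_C = 28.0    # assumed threshold

    def response_for_good_morning(temperature_c):
        """Choose response content for the recognized "Good morning" based on the
        temperature data from temperature sensor 34."""
        if temperature_c <= COLD_LIMIT_C:
            return "Good morning. It's a bit cold, isn't it?"
        if temperature_c >= HOT_LIMIT_C:
            return "Good morning. It's a bit hot, isn't it?"
        return "Good morning."   # assumed neutral response between the two limits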
In this way, the stuffed toy dog can be made to behave as if it sensed a change in the temperature in its environment and responded accordingly. The toy can then be made to act like a living creature by making it respond differently as the surrounding temperature changes even when the same phrase "Good morning" is recognized. Furthermore, the toy is not boring because it responds with different phrases even when the speaker says the same thing.
Working example 5
Next, Working example 5 of the invention will be explained with reference to FIG. 5. Note that stuffed toy dog 30, motion mechanism 10 for moving the mouth of the stuffed toy, etc. shown in FIG. 1A are omitted from FIG. 5. In Working example 5, air pressure is detected as one of the variable data that affect the interaction, and the change in air pressure (good or bad weather) is used for changing the content of the response from response content creation unit 33 shown in Working example 3 above. Air pressure sensor 35 is provided in FIG. 5, and like symbols are used to represent like parts as in FIG. 3A. Response content creation unit 33 receives the recognition data from speech recognition unit 5 and determines the response content for the stuffed toy based on the recognition data and the air pressure data from air pressure sensor 35. The specific processing details will be explained later.
In FIG. 5, the speech that is input from microphone 1 is analyzed by speech analysis unit 2, and a speech data pattern matching the characteristics of the input speech is created. This speech data pattern is input into the input unit of the neural network provided in speech recognition unit 5, and is recognized as speech.
Suppose that a phrase "Good morning" issued by a non-specific speaker is input into microphone 1. The characteristics of this speaker's "Good morning" are analyzed by speech analysis unit 2 and are input into speech recognition unit 5 as a speech data pattern.
The speech data pattern of "Good morning" that is input into the neural network of speech recognition unit 5 in this way is output from the output unit of the neural network as recognition data having a continuous value rather than binary data. If the recognition data for the phrase "Good morning" is higher than the recognition data for the other phrases, speech recognition unit 5 correctly recognizes the speaker's "Good morning" as "Good morning."
The recognition data for the phrase "Good morning" thus recognized is input into response content creation unit 33. Response content creation unit 33 then determines the response content for the input recognition data, based on the input recognition data and the air pressure data from air pressure sensor 35.
Therefore, the content of the response to the recognition data that is output by speech recognition unit 5 can be created according to the current air pressure. For example, suppose that the speaker's "Good morning" is correctly recognized by speech recognition unit 5 as "Good morning." Response content creation unit 33 then creates the response data "Good morning. The weather is going to get worse today." in reply to the recognition data "Good morning" if the air pressure has fallen. On the other hand, the response data "Good morning. The weather is going to get better today." is created in reply to the recognition data "Good morning" if the air pressure has risen. The response data created by response content creation unit 33 is input into speech synthesis unit 6 and drive control unit 7. The speech data input into speech synthesis unit 6 is converted into speech synthesis data and is output by speaker 8 embedded in the body of the stuffed toy dog. The recognition data input into drive control unit 7 drives motion mechanism 10 (see FIG. 1A) according to the corresponding pre-determined drive condition and moves the mouth of the stuffed toy while the response is being issued.
In this way, the stuffed toy dog can be made to behave as if it sensed a change in the air pressure in its environment and responded accordingly. The toy can then be made to act like a living creature by making it respond differently as the air pressure changes even when the same phrase "Good morning" is recognized. Furthermore, the toy is not boring because it responds with different phrases even when the speaker says the same thing.
Working example 6
Next, Working example 6 of the invention will be explained with reference to FIG. 6. Note that stuffed toy dog 30, motion mechanism 10 for moving the mouth of the stuffed toy, etc. shown in FIG. 1A are omitted from FIG. 6. In Working example 6, calendar data is detected as one of the variable data that affect the interaction, and the change in calendar data (change in date) is used for changing the content of the response. The embodiment in FIG. 6 differs from those in FIGS. 4 and 5 in that calendar unit 36 is provided in place of temperature sensor 34 or air pressure sensor 35, and like symbols are used to represent like parts as in FIG. 4 or FIG. 5. Note that calendar unit 36 updates the calendar by referencing the time data from the clock (not shown in the figure). Response content creation unit 33 in Working example 6 receives speech recognition data from speech recognition unit 5 and determines the response content for the stuffed toy based on the recognition data and the calendar data from calendar unit 36. The specific processing details will be explained later.
In FIG. 6, the speech that is input from microphone 1 is analyzed by speech analysis unit 2, and a speech data pattern matching the characteristics of the input speech is created. This speech data pattern is input into the input unit of the neural network provided in speech recognition unit 5, and is recognized as speech.
Suppose that a phrase "Good morning" issued by a non-specific speaker is input into microphone 1. The characteristics of this speaker's "Good morning" are analyzed by speech analysis unit 2 and are input into speech recognition unit 5 as a speech data pattern.
The speech data pattern of "Good morning" that is input into the neural network of speech recognition unit 5 in this way is output from the output unit of the neural network as recognition data having a continuous value rather than binary data. If the recognition data for the phrase "Good morning" is higher than the recognition data for the other phrases, speech recognition unit 5 correctly recognizes the speaker's "Good morning" as "Good morning."
The recognition data for the phrase "Good morning" thus recognized is input into response content creation unit 33. Response content creation unit 33 then determines the response content for the input recognition data, based on the input recognition data and the calendar data (date information which can also include year data) from calendar unit 36.
Therefore, the data content of the response to the recognition data that is output by speech recognition unit 5 can be created according to the current date. For example, suppose that the speaker's "Good morning" is correctly recognized by speech recognition unit 5 as "Good morning." Response content creation unit 33 then creates response data "Good morning. Please take me to cherry blossom viewing." in reply to the recognition data "Good morning" if the calendar data shows April 1. On the other hand, response data "Good morning. Christmas is coming soon." is created in reply to the same recognition data "Good morning" if the calendar data shows Dec. 23. Naturally, it is possible to create a response that is different from the previous year if the year data is available.
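The date-dependent selection can be sketched in the same spirit (the table structure and function name are hypothetical; the two entries are the examples given above, and further dates or year-dependent entries could be added in the same way):

    import datetime

    SEASONAL_RESPONSES = {                  # (month, day) -> response content
        (4, 1): "Good morning. Please take me to cherry blossom viewing.",
        (12, 23): "Good morning. Christmas is coming soon.",
    }

    def response_for_date(today: datetime.date):
        """Choose response content for the recognized "Good morning" based on the
        calendar data from calendar unit 36."""
        return SEASONAL_RESPONSES.get((today.month, today.day), "Good morning.")

    # response_for_date(datetime.date(2024, 12, 23))
    #   -> "Good morning. Christmas is coming soon."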
The response data created by response content creation unit 33 is input into speech synthesis unit 6 and drive control unit 7. The speech data input into speech synthesis unit 6 is converted into speech synthesis data and is output by speaker 8 embedded in the body of the stuffed toy dog. The recognition data input into drive control unit 7 drives motion mechanism 10 (see FIG. 1A) according to the corresponding pre-determined drive condition and moves the mouth of the stuffed toy while the response is being issued.
In this way, the stuffed toy dog can be made to behave as if it sensed a change in the date and responded accordingly. The toy can then be made to act like a living creature by making it respond differently as the date changes even when the same phrase "Good morning" is recognized. Furthermore, the toy is not boring because it responds with different phrases even when the speaker says the same thing.
Although several working examples have been used for explaining the invention, the invention can be widely applied to electronic instruments that are used daily, such as personal digital assistants and interactive games, in addition to toys. Furthermore, in the third and subsequent working examples, speech recognition unit 5 can obtain the final recognition data using weighting coefficients that take into consideration the appropriateness of the content of the speaker's phrase relative to a variable data such as time of day as in Working example 1 or 2, or can obtain the final recognition data using some other method. For example, if the final recognition data is obtained as in Working example 1 or 2 and the response content for this final recognition data is processed as explained in Working examples 3 through 6, the speaker's phrases can be successfully recognized at high rates, and the response to the speaker's phrase can match the prevailing condition much better. Additionally, by using all of the response content processes explained in Working examples 3 through 6 or in some combinations, the response can match the prevailing condition much better. For example, if Working example 2 is combined with Working example 3, and the temperature sensor, the air pressure sensor, and the calendar unit explained in Working examples 4 through 6 are added, accurate speech recognition can be performed that takes into consideration appropriateness of the content of the speaker's phrase relative to time of day, and it is possible to enjoy changes in the level of the response content from the stuffed toy as time passes. Furthermore, interactions that take into account information such as temperature, weather, and date become possible, and thus an extremely sophisticated interactive speech recognition device can be realized.
While the invention has been described in conjunction with several specific embodiments, it is evident to those skilled in the art that many further alternatives, modifications and variations will be apparent in light of the foregoing description. Thus, the invention described herein is intended to embrace all such alternatives, modifications, applications and variations as may fall within the spirit and scope of the appended claims.
Claims (12)
1. An interactive speech recognition device, comprising:
speech analysis means for analyzing an input speech and creating a speech data pattern that matches characteristics of the input speech;
detection means for detecting variable non-speech data that changes speech flowing from the speech recognition device;
coefficient setting means, responsive to the variable non-speech data, for generating a plurality of weighting coefficients each pre-assigned to a pre-registered recognition target speech, based on the variable non-speech data;
speech recognition means for computing a final recognition result in response to the speech data pattern, said speech recognition means including:
means for storing a plurality of pre-registered recognition target speeches and for outputting, in response to the speech data pattern, a plurality of recognition data values each for a corresponding pre-registered recognition target speech,
means for computing final recognition data by multiplying each recognition data value by a corresponding one of said pre-assigned weighting coefficients for a corresponding pre-registered recognition target speech, and
means for recognizing the input speech by comparing the final recognition data for all of the pre-registered recognition target speeches and for outputting a final recognition result; and
speech synthesis means for converting the final recognition result to corresponding speech synthesis data for producing an appropriate response to the input speech.
2. The interactive speech recognition device of claim 1, wherein said detection means includes timing means for providing time data, and each of the weighting coefficients generated by said coefficient setting means corresponds to the time data of a day for a corresponding pre-registered recognition target speech.
3. The interactive speech recognition device of claim 2, further comprising coefficient storage means, responsive to the time data from said timing means, for storing past time data relating to past statistic data and for creating weighting coefficients based on the past time data relating to the past statistic data, wherein said coefficient setting means, responsive to said timing means and said coefficient storage means, generates a preset largest value of a weighting coefficient for a pre-selected, pre-registered recognition target speech if the input speech occurs at a peak time at which it was correctly recognized most frequently in the past, and generates a smaller value of the weighting coefficient as time deviates from this peak time.
4. An interactive speech recognition device, comprising:
speech analysis means for analyzing an input speech and creating a speech data pattern that matches characteristics of the input speech;
speech recognition means for generating recognition data that corresponds to the input speech based on the speech data pattern created by said speech analysis means;
timing means for generating time data;
response content level storage means for storing information relating to passage of time relative to a response content level;
response content level generation means for storing time ranges for a plurality of response content levels, said response content level generation means being responsive to the time data from said timing means, the recognition data from said speech recognition means, and the information from said response content level storage means, for generating a response content level value according to passage of time;
response content creation means, responsive to the recognition data from said speech recognition means and the response content level value from said response content level generation means, for determining response content data appropriate for the response content level value generated by said response content level generation means; and
speech synthesis means for converting the response content data from said response content creation means to corresponding speech synthesis data for producing an appropriate response to the input speech.
5. An interactive speech recognition device, comprising:
speech analysis means for analyzing an input speech and creating a speech data pattern that matches characteristics of the input speech;
speech recognition means for generating recognition data that corresponds to the input speech, based on the speech data pattern from said speech analysis means;
detection means for detecting variable non-speech data that changes speech flowing from the speech recognition device;
response content creation means, responsive to the variable non-speech data from said detection means and the recognition data from said speech recognition means, for outputting response content data, based on the recognition data by taking the variable non-speech data into consideration; and
speech synthesis means for converting the response content data to corresponding speech synthesis data for producing an appropriate response to the input speech.
6. The interactive speech recognition device of claim 5, wherein said detection means includes a temperature sensor that measures an environmental temperature and outputs temperature data, and said response content creation means outputs the response content data by taking the temperature data into consideration.
7. The interactive speech recognition device of claim 5, wherein said detection means includes an air pressure sensor that measures an environmental air pressure and outputs air pressure data, and said response content creation means outputs the response content data by taking the air pressure data into consideration.
8. The interactive speech recognition device of claim 5, wherein said detection means includes calendar detection means for detecting calendar data and outputting the calendar data, and said response content creation means outputs the response content data by taking the calendar data into consideration.
9. The interactive speech recognition device of claim 5, wherein said detection means includes timing means for providing time data, and response content data generated by said response content creation means corresponds to the time data of a day for a corresponding pre-registered recognition target speech.
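Claims 5 through 9 vary the response using detected non-speech data such as temperature, air pressure, calendar data, and time. The sketch below is one hedged reading of that idea; the sensor fields, thresholds, and phrases are assumptions, not values from the specification.

```python
from dataclasses import dataclass

@dataclass
class NonSpeechData:
    # Assumed outputs of the detection means; field names are illustrative.
    temperature_c: float   # temperature sensor
    pressure_hpa: float    # air pressure sensor
    month: int             # calendar detection means
    day: int
    hour: int              # timing means

def create_response(recognized: str, data: NonSpeechData) -> str:
    """Start from a base response for the recognition result, then adjust it
    with the detected non-speech data, loosely following claims 5-9."""
    if recognized != "good morning":
        return "Pardon?"
    parts = ["Good morning!"]
    if data.temperature_c <= 5.0:
        parts.append("It's cold today, isn't it?")
    if data.pressure_hpa < 1000.0:
        parts.append("The weather may turn bad later.")
    if (data.month, data.day) == (1, 1):
        parts.append("Happy New Year!")
    if data.hour >= 11:
        parts.append("Although it's nearly noon.")
    return " ".join(parts)

print(create_response("good morning", NonSpeechData(3.0, 995.0, 1, 1, 8)))
```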
10. An interactive speech recognition device, comprising:
speech analysis means for analyzing an input speech and creating a speech data pattern that matches characteristics of the input speech;
detection means for detecting time data that changes speech flowing from the speech recognition device;
coefficient setting means, responsive to the detected time data, for generating a plurality of weighting coefficients each pre-assigned to a pre-registered recognition target speech, based on the time data;
speech recognition means for computing a final recognition result in response to the speech data pattern, said speech recognition means including:
means for storing a plurality of pre-registered recognition target speeches and for outputting, in response to the speech data pattern, a plurality of recognition data values each for a corresponding pre-registered recognition target speech,
means for computing final recognition data by multiplying each recognition data value by a corresponding one of said pre-assigned weighting coefficients for a corresponding pre-registered recognition target speech, and
means for recognizing the input speech by comparing the final recognition data for all of the pre-registered recognition target speeches and for outputting a final recognition result; and
speech synthesis means for converting the final recognition result to corresponding speech synthesis data for producing an appropriate response to the input speech.
11. An interactive speech recognition device, comprising:
speech analysis means for analyzing an input speech and creating a speech data pattern that matches characteristics of the input speech;
speech recognition means for generating recognition data that corresponds to the input speech, based on the speech data pattern from said speech analysis means;
detection means for detecting variable non-speech data that changes speech flowing from the speech recognition device;
response content creation means, responsive to the variable non-speech data from said detection means and the recognition data from said speech recognition means, for outputting response content data, based on the recognition data by taking the variable data into consideration; and
an operating mechanism and a drive control unit responsive to the response content data for controlling the operating mechanism.
12. The interactive speech recognition device of claim 11 further comprising speech synthesis means for converting said response content data to speech synthesis data simultaneous with said drive control unit controlling said operating mechanism in response to said response content data.
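Claims 11 and 12 add an operating mechanism whose drive control runs while the response is spoken. The sketch below stands in for both units with print stubs and a thread, only to show simultaneous actuation and synthesis; the function names and timing are assumed.

```python
import threading
import time

def synthesize_speech(text: str) -> None:
    # Stand-in for the speech synthesis means (e.g., a text-to-speech call).
    print(f"[speaker] {text}")

def drive_mechanism(text: str) -> None:
    # Stand-in for the drive control unit actuating an operating mechanism
    # (e.g., moving a toy's mouth) for roughly the utterance duration.
    time.sleep(0.01 * len(text))
    print("[motor] movement finished")

def respond(text: str) -> None:
    """Start the mechanism drive and the speech output together, mirroring the
    'simultaneous' actuation and synthesis recited in claim 12."""
    motor = threading.Thread(target=drive_mechanism, args=(text,))
    motor.start()
    synthesize_speech(text)
    motor.join()

respond("Good morning!")
```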
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP4200595 | 1995-03-01 | | |
JP7-042005 | 1995-03-01 | | |
JP32935295A JP3254994B2 (en) | 1995-03-01 | 1995-12-18 | Speech recognition dialogue apparatus and speech recognition dialogue processing method |
JP7-329352 | 1995-12-18 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
US5802488A (en) | 1998-09-01 |
Family
ID=26381654
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US08/609,336 Expired - Lifetime US5802488A (en) | 1995-03-01 | 1996-02-29 | Interactive speech recognition with varying responses for time of day and environmental conditions |
Country Status (8)
Country | Link |
---|---|
US (1) | US5802488A (en) |
EP (1) | EP0730261B1 (en) |
JP (1) | JP3254994B2 (en) |
KR (1) | KR100282022B1 (en) |
CN (2) | CN1132148C (en) |
DE (1) | DE69618488T2 (en) |
HK (1) | HK1014604A1 (en) |
TW (1) | TW340938B (en) |
Cited By (67)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010021907A1 (en) * | 1999-12-28 | 2001-09-13 | Masato Shimakawa | Speech synthesizing apparatus, speech synthesizing method, and recording medium |
FR2811238A1 (en) * | 2000-07-04 | 2002-01-11 | Tomy Co Ltd | Interactive dog/robot game having stimulus detector and drive elements with command element providing interactive response following action point sequence. |
US20020019678A1 (en) * | 2000-08-07 | 2002-02-14 | Takashi Mizokawa | Pseudo-emotion sound expression system |
WO2002028603A1 (en) * | 2000-10-05 | 2002-04-11 | Sony Corporation | Robot apparatus and its control method |
US6585556B2 (en) * | 2000-05-13 | 2003-07-01 | Alexander V Smirnov | Talking toy |
US6587547B1 (en) | 1999-09-13 | 2003-07-01 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, with real-time drilling via telephone |
US6594630B1 (en) | 1999-11-19 | 2003-07-15 | Voice Signal Technologies, Inc. | Voice-activated control for electrical device |
US20030163320A1 (en) * | 2001-03-09 | 2003-08-28 | Nobuhide Yamazaki | Voice synthesis device |
US20030187659A1 (en) * | 2002-03-15 | 2003-10-02 | Samsung Electronics Co., Ltd. | Method and apparatus for controlling devices connected to home network |
US6631351B1 (en) | 1999-09-14 | 2003-10-07 | Aidentity Matrix | Smart toys |
US6705919B2 (en) * | 2002-01-08 | 2004-03-16 | Mattel, Inc. | Electronic amusement device with long duration timer |
US20040077272A1 (en) * | 1998-12-04 | 2004-04-22 | Jurmain Richard N. | Infant simulator |
US6772121B1 (en) * | 1999-03-05 | 2004-08-03 | Namco, Ltd. | Virtual pet device and control program recording medium therefor |
US20040152394A1 (en) * | 2002-09-27 | 2004-08-05 | Marine Jon C. | Animated multi-persona toy |
US6829334B1 (en) | 1999-09-13 | 2004-12-07 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, with telephone-based service utilization and control |
US6836537B1 (en) | 1999-09-13 | 2004-12-28 | Microstrategy Incorporated | System and method for real-time, personalized, dynamic, interactive voice services for information related to existing travel schedule |
US6850603B1 (en) | 1999-09-13 | 2005-02-01 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized dynamic and interactive voice services |
US6885734B1 (en) | 1999-09-13 | 2005-04-26 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive inbound and outbound voice services, with real-time interactive voice database queries |
US6940953B1 (en) | 1999-09-13 | 2005-09-06 | Microstrategy, Inc. | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services including module for generating and formatting voice services |
US6956497B1 (en) * | 1997-10-09 | 2005-10-18 | Vulcan Patents Llc | Method and apparatus for sending presence messages |
US6964012B1 (en) | 1999-09-13 | 2005-11-08 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, including deployment through personalized broadcasts |
US20060020467A1 (en) * | 1999-11-19 | 2006-01-26 | Nippon Telegraph & Telephone Corporation | Acoustic signal transmission method and acoustic signal transmission apparatus |
US6991511B2 (en) | 2000-02-28 | 2006-01-31 | Mattel Inc. | Expression-varying device |
US20060036433A1 (en) * | 2004-08-10 | 2006-02-16 | International Business Machines Corporation | Method and system of dynamically changing a sentence structure of a message |
US20060106611A1 (en) * | 2004-11-12 | 2006-05-18 | Sophia Krasikov | Devices and methods providing automated assistance for verbal communication |
US7065490B1 (en) * | 1999-11-30 | 2006-06-20 | Sony Corporation | Voice processing method based on the emotion and instinct states of a robot |
US7203642B2 (en) | 2000-10-11 | 2007-04-10 | Sony Corporation | Robot control apparatus and method with echo back prosody |
US20070128979A1 (en) * | 2005-12-07 | 2007-06-07 | J. Shackelford Associates Llc. | Interactive Hi-Tech doll |
US7313524B1 (en) * | 1999-11-30 | 2007-12-25 | Sony Corporation | Voice recognition based on a growth state of a robot |
US20090063155A1 (en) * | 2007-08-31 | 2009-03-05 | Hon Hai Precision Industry Co., Ltd. | Robot apparatus with vocal interactive function and method therefor |
US7545359B1 (en) | 1995-08-03 | 2009-06-09 | Vulcan Patents Llc | Computerized interactor systems and methods for providing same |
US20090157199A1 (en) * | 1995-05-30 | 2009-06-18 | Brown David W | Motion Control Systems |
US20100023163A1 (en) * | 2008-06-27 | 2010-01-28 | Kidd Cory D | Apparatus and Method for Assisting in Achieving Desired Behavior Patterns |
US20100191373A1 (en) * | 2009-01-23 | 2010-07-29 | Samsung Electronics Co., Ltd. | Robot |
US20110071652A1 (en) * | 2001-02-09 | 2011-03-24 | Roy-G-Biv Corporation | Event Management Systems and Methods for Motion Control Systems |
US7953112B2 (en) | 1997-10-09 | 2011-05-31 | Interval Licensing Llc | Variable bandwidth communication systems and methods |
US20110178801A1 (en) * | 2001-02-28 | 2011-07-21 | Telecom Italia S.P.A. | System and method for access to multimedia structures |
US20110213613A1 (en) * | 2006-04-03 | 2011-09-01 | Google Inc., a CA corporation | Automatic Language Model Update |
US20110301957A1 (en) * | 1997-10-07 | 2011-12-08 | Roy-G-Biv Corporation | System and/or Method for Audibly Prompting a Patient with a Motion Device |
US8130918B1 (en) | 1999-09-13 | 2012-03-06 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, with closed loop transaction processing |
US8321411B2 (en) | 1999-03-23 | 2012-11-27 | Microstrategy, Incorporated | System and method for management of an automatic OLAP report broadcast system |
US8607138B2 (en) | 1999-05-28 | 2013-12-10 | Microstrategy, Incorporated | System and method for OLAP report generation with spreadsheet report within the network user interface |
US8762133B2 (en) | 2012-08-30 | 2014-06-24 | Arria Data2Text Limited | Method and apparatus for alert validation |
US8762134B2 (en) | 2012-08-30 | 2014-06-24 | Arria Data2Text Limited | Method and apparatus for situational analysis text generation |
US20150324351A1 (en) * | 2012-11-16 | 2015-11-12 | Arria Data2Text Limited | Method and apparatus for expressing time in an output text |
US9208213B2 (en) | 1999-05-28 | 2015-12-08 | Microstrategy, Incorporated | System and method for network user interface OLAP report formatting |
US9244894B1 (en) | 2013-09-16 | 2016-01-26 | Arria Data2Text Limited | Method and apparatus for interactive reports |
US9336193B2 (en) | 2012-08-30 | 2016-05-10 | Arria Data2Text Limited | Method and apparatus for updating a previously generated text |
US9355093B2 (en) | 2012-08-30 | 2016-05-31 | Arria Data2Text Limited | Method and apparatus for referring expression generation |
US9396181B1 (en) | 2013-09-16 | 2016-07-19 | Arria Data2Text Limited | Method, apparatus, and computer program product for user-directed reporting |
US9405448B2 (en) | 2012-08-30 | 2016-08-02 | Arria Data2Text Limited | Method and apparatus for annotating a graphical output |
US9443515B1 (en) | 2012-09-05 | 2016-09-13 | Paul G. Boyce | Personality designer system for a detachably attachable remote audio object |
US9520142B2 (en) | 2014-05-16 | 2016-12-13 | Alphonso Inc. | Efficient apparatus and method for audio signature generation using recognition history |
US9600471B2 (en) | 2012-11-02 | 2017-03-21 | Arria Data2Text Limited | Method and apparatus for aggregating with information generalization |
US9946711B2 (en) | 2013-08-29 | 2018-04-17 | Arria Data2Text Limited | Text generation from correlated alerts |
US9990360B2 (en) | 2012-12-27 | 2018-06-05 | Arria Data2Text Limited | Method and apparatus for motion description |
US10115202B2 (en) | 2012-12-27 | 2018-10-30 | Arria Data2Text Limited | Method and apparatus for motion detection |
CN108769090A (en) * | 2018-03-23 | 2018-11-06 | 山东英才学院 | A kind of intelligence control system based on toy for children |
US10445432B1 (en) | 2016-08-31 | 2019-10-15 | Arria Data2Text Limited | Method and apparatus for lightweight multilingual natural language realizer |
US10467347B1 (en) | 2016-10-31 | 2019-11-05 | Arria Data2Text Limited | Method and apparatus for natural language document orchestrator |
US10565308B2 (en) | 2012-08-30 | 2020-02-18 | Arria Data2Text Limited | Method and apparatus for configurable microplanning |
US10664558B2 (en) | 2014-04-18 | 2020-05-26 | Arria Data2Text Limited | Method and apparatus for document planning |
US10776561B2 (en) | 2013-01-15 | 2020-09-15 | Arria Data2Text Limited | Method and apparatus for generating a linguistic representation of raw input data |
US11176214B2 (en) | 2012-11-16 | 2021-11-16 | Arria Data2Text Limited | Method and apparatus for spatial descriptions in an output text |
US11443747B2 (en) * | 2019-09-18 | 2022-09-13 | Lg Electronics Inc. | Artificial intelligence apparatus and method for recognizing speech of user in consideration of word usage frequency |
US11520474B2 (en) * | 2015-05-15 | 2022-12-06 | Spotify Ab | Playback of media streams in dependence of a time of a day |
CN116352727A (en) * | 2023-06-01 | 2023-06-30 | 安徽淘云科技股份有限公司 | Control method of bionic robot and related equipment |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE19635754A1 (en) * | 1996-09-03 | 1998-03-05 | Siemens Ag | Speech processing system and method for speech processing |
US7283964B1 (en) | 1999-05-21 | 2007-10-16 | Winbond Electronics Corporation | Method and apparatus for voice controlled devices with improved phrase storage, use, conversion, transfer, and recognition |
US20030093281A1 (en) * | 1999-05-21 | 2003-05-15 | Michael Geilhufe | Method and apparatus for machine to machine communication using speech |
EP1063636A3 (en) * | 1999-05-21 | 2001-11-14 | Winbond Electronics Corporation | Method and apparatus for standard voice user interface and voice controlled devices |
US6584439B1 (en) | 1999-05-21 | 2003-06-24 | Winbond Electronics Corporation | Method and apparatus for controlling voice controlled devices |
US20020193989A1 (en) * | 1999-05-21 | 2002-12-19 | Michael Geilhufe | Method and apparatus for identifying voice controlled devices |
JP3212578B2 (en) * | 1999-06-30 | 2001-09-25 | インタロボット株式会社 | Physical voice reaction toy |
JP4032273B2 (en) * | 1999-12-28 | 2008-01-16 | ソニー株式会社 | Synchronization control apparatus and method, and recording medium |
JP2001277166A (en) * | 2000-03-31 | 2001-10-09 | Sony Corp | Robot and behaivoir determining method therefor |
JP2001340659A (en) * | 2000-06-05 | 2001-12-11 | Interrobot Inc | Various communication motion-forming method for pseudo personality |
JP2002028378A (en) * | 2000-07-13 | 2002-01-29 | Tomy Co Ltd | Conversing toy and method for generating reaction pattern |
AUPR141200A0 (en) * | 2000-11-13 | 2000-12-07 | Symons, Ian Robert | Directional microphone |
JP4687936B2 (en) * | 2001-03-22 | 2011-05-25 | ソニー株式会社 | Audio output device, audio output method, program, and recording medium |
KR100434065B1 (en) * | 2001-12-18 | 2004-06-04 | 엘지전자 주식회사 | Voice recognition method of robot |
EP1487259B1 (en) * | 2002-03-22 | 2005-12-28 | C.R.F. Società Consortile per Azioni | A vocal connection system between humans and animals |
WO2004036939A1 (en) * | 2002-10-18 | 2004-04-29 | Institute Of Acoustics Chinese Academy Of Sciences | Portable digital mobile communication apparatus, method for controlling speech and system |
ITTO20020933A1 (en) | 2002-10-25 | 2004-04-26 | Fiat Ricerche | VOICE CONNECTION SYSTEM BETWEEN MAN AND ANIMALS. |
WO2005038776A1 (en) * | 2003-10-17 | 2005-04-28 | Intelligent Toys Ltd | Voice controlled toy |
GB0604624D0 (en) * | 2006-03-06 | 2006-04-19 | Ellis Anthony M | Toy |
JP4305672B2 (en) * | 2006-11-21 | 2009-07-29 | ソニー株式会社 | Personal identification device, personal identification method, identification dictionary data update method, and identification dictionary data update program |
US20080147411A1 (en) * | 2006-12-19 | 2008-06-19 | International Business Machines Corporation | Adaptation of a speech processing system from external input that is not directly related to sounds in an operational acoustic environment |
CN101075435B (en) * | 2007-04-19 | 2011-05-18 | 深圳先进技术研究院 | Intelligent chatting system and its realizing method |
JP2009151314A (en) * | 2008-12-25 | 2009-07-09 | Sony Corp | Information processing device and information processing method |
JP5464078B2 (en) * | 2010-06-30 | 2014-04-09 | 株式会社デンソー | Voice recognition terminal |
JP6166889B2 (en) * | 2012-11-15 | 2017-07-19 | 株式会社Nttドコモ | Dialog support apparatus, dialog system, dialog support method and program |
JP2015087649A (en) * | 2013-10-31 | 2015-05-07 | シャープ株式会社 | Utterance control device, method, utterance system, program, and utterance device |
US10049666B2 (en) * | 2016-01-06 | 2018-08-14 | Google Llc | Voice recognition system |
CN109841216B (en) * | 2018-12-26 | 2020-12-15 | 珠海格力电器股份有限公司 | Voice data processing method and device and intelligent terminal |
- 1995
  - 1995-12-18 JP JP32935295A patent/JP3254994B2/en not_active Expired - Lifetime
  - 1995-12-21 TW TW084113714A patent/TW340938B/en not_active IP Right Cessation
- 1996
  - 1996-02-22 KR KR1019960004559A patent/KR100282022B1/en not_active IP Right Cessation
  - 1996-02-29 DE DE69618488T patent/DE69618488T2/en not_active Expired - Lifetime
  - 1996-02-29 EP EP96301394A patent/EP0730261B1/en not_active Expired - Lifetime
  - 1996-02-29 US US08/609,336 patent/US5802488A/en not_active Expired - Lifetime
  - 1996-02-29 CN CN96104209A patent/CN1132148C/en not_active Expired - Lifetime
  - 1996-02-29 CN CNB031311911A patent/CN1229773C/en not_active Expired - Lifetime
- 1998
  - 1998-12-28 HK HK98115936A patent/HK1014604A1/en not_active IP Right Cessation
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4459674A (en) * | 1981-02-20 | 1984-07-10 | Canon Kabushiki Kaisha | Voice input/output apparatus |
WO1987006487A1 (en) * | 1986-05-02 | 1987-11-05 | Vladimir Sirota | Toy |
US5029214A (en) * | 1986-08-11 | 1991-07-02 | Hollander James F | Electronic speech control apparatus and methods |
US5410635A (en) * | 1987-11-25 | 1995-04-25 | Nec Corporation | Connected word recognition system including neural networks arranged along a signal time axis |
US4923428A (en) * | 1988-05-05 | 1990-05-08 | Cal R & D, Inc. | Interactive talking toy |
US5255342A (en) * | 1988-12-20 | 1993-10-19 | Kabushiki Kaisha Toshiba | Pattern recognition system and method using neural network |
US5404422A (en) * | 1989-12-28 | 1995-04-04 | Sharp Kabushiki Kaisha | Speech recognition system with neural network |
US5375173A (en) * | 1991-08-08 | 1994-12-20 | Fujitsu Limited | Speaker adapted speech recognition system |
WO1993006575A1 (en) * | 1991-09-24 | 1993-04-01 | Sedlmayr Steven R | Night light |
US5481644A (en) * | 1992-08-06 | 1996-01-02 | Seiko Epson Corporation | Neural network speech recognition apparatus recognizing the frequency of successively input identical speech data sequences |
JPH06142342A (en) * | 1992-10-14 | 1994-05-24 | Sanyo Electric Co Ltd | Voice recognizable toy |
US5655057A (en) * | 1993-12-27 | 1997-08-05 | Nec Corporation | Speech recognition apparatus |
US5596679A (en) * | 1994-10-26 | 1997-01-21 | Motorola, Inc. | Method and system for identifying spoken sounds in continuous speech by comparing classifier outputs |
US5638486A (en) * | 1994-10-26 | 1997-06-10 | Motorola, Inc. | Method and system for continuous speech recognition using voting techniques |
Cited By (142)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090157199A1 (en) * | 1995-05-30 | 2009-06-18 | Brown David W | Motion Control Systems |
US8154511B2 (en) | 1995-08-03 | 2012-04-10 | Vintell Applications Ny, Llc | Computerized interactor systems and methods for providing same |
US7545359B1 (en) | 1995-08-03 | 2009-06-09 | Vulcan Patents Llc | Computerized interactor systems and methods for providing same |
US20110301957A1 (en) * | 1997-10-07 | 2011-12-08 | Roy-G-Biv Corporation | System and/or Method for Audibly Prompting a Patient with a Motion Device |
US6956497B1 (en) * | 1997-10-09 | 2005-10-18 | Vulcan Patents Llc | Method and apparatus for sending presence messages |
US8509137B2 (en) | 1997-10-09 | 2013-08-13 | Interval Licensing Llc | Method and apparatus for sending presence messages |
US8416806B2 (en) | 1997-10-09 | 2013-04-09 | Interval Licensing Llc | Variable bandwidth communication systems and methods |
US20110228039A1 (en) * | 1997-10-09 | 2011-09-22 | Debby Hindus | Variable bandwidth communication systems and methods |
US7953112B2 (en) | 1997-10-09 | 2011-05-31 | Interval Licensing Llc | Variable bandwidth communication systems and methods |
US8414346B2 (en) * | 1998-12-04 | 2013-04-09 | Realityworks, Inc. | Infant simulator |
US20040077272A1 (en) * | 1998-12-04 | 2004-04-22 | Jurmain Richard N. | Infant simulator |
US6772121B1 (en) * | 1999-03-05 | 2004-08-03 | Namco, Ltd. | Virtual pet device and control program recording medium therefor |
US8321411B2 (en) | 1999-03-23 | 2012-11-27 | Microstrategy, Incorporated | System and method for management of an automatic OLAP report broadcast system |
US9477740B1 (en) | 1999-03-23 | 2016-10-25 | Microstrategy, Incorporated | System and method for management of an automatic OLAP report broadcast system |
US8607138B2 (en) | 1999-05-28 | 2013-12-10 | Microstrategy, Incorporated | System and method for OLAP report generation with spreadsheet report within the network user interface |
US9208213B2 (en) | 1999-05-28 | 2015-12-08 | Microstrategy, Incorporated | System and method for network user interface OLAP report formatting |
US10592705B2 (en) | 1999-05-28 | 2020-03-17 | Microstrategy, Incorporated | System and method for network user interface report formatting |
US6768788B1 (en) | 1999-09-13 | 2004-07-27 | Microstrategy, Incorporated | System and method for real-time, personalized, dynamic, interactive voice services for property-related information |
US6873693B1 (en) | 1999-09-13 | 2005-03-29 | Microstrategy, Incorporated | System and method for real-time, personalized, dynamic, interactive voice services for entertainment-related information |
US8995628B2 (en) | 1999-09-13 | 2015-03-31 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services with closed loop transaction processing |
US8051369B2 (en) | 1999-09-13 | 2011-11-01 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, including deployment through personalized broadcasts |
US6788768B1 (en) | 1999-09-13 | 2004-09-07 | Microstrategy, Incorporated | System and method for real-time, personalized, dynamic, interactive voice services for book-related information |
US6798867B1 (en) | 1999-09-13 | 2004-09-28 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, with real-time database queries |
US6829334B1 (en) | 1999-09-13 | 2004-12-07 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, with telephone-based service utilization and control |
US6836537B1 (en) | 1999-09-13 | 2004-12-28 | Microstrategy Incorporated | System and method for real-time, personalized, dynamic, interactive voice services for information related to existing travel schedule |
US6850603B1 (en) | 1999-09-13 | 2005-02-01 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized dynamic and interactive voice services |
US6765997B1 (en) | 1999-09-13 | 2004-07-20 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, with the direct delivery of voice services to networked voice messaging systems |
US6885734B1 (en) | 1999-09-13 | 2005-04-26 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive inbound and outbound voice services, with real-time interactive voice database queries |
US6940953B1 (en) | 1999-09-13 | 2005-09-06 | Microstrategy, Inc. | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services including module for generating and formatting voice services |
US6658093B1 (en) | 1999-09-13 | 2003-12-02 | Microstrategy, Incorporated | System and method for real-time, personalized, dynamic, interactive voice services for travel availability information |
US6587547B1 (en) | 1999-09-13 | 2003-07-01 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, with real-time drilling via telephone |
US6964012B1 (en) | 1999-09-13 | 2005-11-08 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, including deployment through personalized broadcasts |
US7881443B2 (en) | 1999-09-13 | 2011-02-01 | Microstrategy, Incorporated | System and method for real-time, personalized, dynamic, interactive voice services for travel availability information |
US8094788B1 (en) | 1999-09-13 | 2012-01-10 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services with customized message depending on recipient |
US6606596B1 (en) | 1999-09-13 | 2003-08-12 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, including deployment through digital sound files |
US8130918B1 (en) | 1999-09-13 | 2012-03-06 | Microstrategy, Incorporated | System and method for the creation and automatic deployment of personalized, dynamic and interactive voice services, with closed loop transaction processing |
US6631351B1 (en) | 1999-09-14 | 2003-10-07 | Aidentity Matrix | Smart toys |
US7949519B2 (en) | 1999-11-19 | 2011-05-24 | Nippon Telegraph And Telephone Corporation | Information communication apparatus, transmission apparatus and receiving apparatus |
US20090157406A1 (en) * | 1999-11-19 | 2009-06-18 | Satoshi Iwaki | Acoustic Signal Transmission Method And Acoustic Signal Transmission Apparatus |
US6594630B1 (en) | 1999-11-19 | 2003-07-15 | Voice Signal Technologies, Inc. | Voice-activated control for electrical device |
US20110176683A1 (en) * | 1999-11-19 | 2011-07-21 | Nippon Telegraph And Telephone Corporation | Information Communication Apparatus, Transmission Apparatus And Receiving Apparatus |
US20060020467A1 (en) * | 1999-11-19 | 2006-01-26 | Nippon Telegraph & Telephone Corporation | Acoustic signal transmission method and acoustic signal transmission apparatus |
US7657435B2 (en) | 1999-11-19 | 2010-02-02 | Nippon Telegraph | Acoustic signal transmission method and apparatus with insertion signal |
US20060153390A1 (en) * | 1999-11-19 | 2006-07-13 | Nippon Telegraph & Telephone Corporation | Acoustic signal transmission method and acoustic signal transmission apparatus |
US8635072B2 (en) | 1999-11-19 | 2014-01-21 | Nippon Telegraph And Telephone Corporation | Information communication using majority logic for machine control signals extracted from audible sound signals |
US7065490B1 (en) * | 1999-11-30 | 2006-06-20 | Sony Corporation | Voice processing method based on the emotion and instinct states of a robot |
US7313524B1 (en) * | 1999-11-30 | 2007-12-25 | Sony Corporation | Voice recognition based on a growth state of a robot |
US20010021907A1 (en) * | 1999-12-28 | 2001-09-13 | Masato Shimakawa | Speech synthesizing apparatus, speech synthesizing method, and recording medium |
US7379871B2 (en) * | 1999-12-28 | 2008-05-27 | Sony Corporation | Speech synthesizing apparatus, speech synthesizing method, and recording medium using a plurality of substitute dictionaries corresponding to pre-programmed personality information |
US6991511B2 (en) | 2000-02-28 | 2006-01-31 | Mattel Inc. | Expression-varying device |
US6585556B2 (en) * | 2000-05-13 | 2003-07-01 | Alexander V Smirnov | Talking toy |
FR2811238A1 (en) * | 2000-07-04 | 2002-01-11 | Tomy Co Ltd | Interactive dog/robot game having stimulus detector and drive elements with command element providing interactive response following action point sequence. |
US20020016128A1 (en) * | 2000-07-04 | 2002-02-07 | Tomy Company, Ltd. | Interactive toy, reaction behavior pattern generating device, and reaction behavior pattern generating method |
US6682390B2 (en) * | 2000-07-04 | 2004-01-27 | Tomy Company, Ltd. | Interactive toy, reaction behavior pattern generating device, and reaction behavior pattern generating method |
US20020019678A1 (en) * | 2000-08-07 | 2002-02-14 | Takashi Mizokawa | Pseudo-emotion sound expression system |
WO2002028603A1 (en) * | 2000-10-05 | 2002-04-11 | Sony Corporation | Robot apparatus and its control method |
US6711467B2 (en) | 2000-10-05 | 2004-03-23 | Sony Corporation | Robot apparatus and its control method |
US7203642B2 (en) | 2000-10-11 | 2007-04-10 | Sony Corporation | Robot control apparatus and method with echo back prosody |
US20110071652A1 (en) * | 2001-02-09 | 2011-03-24 | Roy-G-Biv Corporation | Event Management Systems and Methods for Motion Control Systems |
US20110178801A1 (en) * | 2001-02-28 | 2011-07-21 | Telecom Italia S.P.A. | System and method for access to multimedia structures |
US8155970B2 (en) * | 2001-02-28 | 2012-04-10 | Telecom Italia S.P.A. | System and method for access to multimedia structures |
US20030163320A1 (en) * | 2001-03-09 | 2003-08-28 | Nobuhide Yamazaki | Voice synthesis device |
US6705919B2 (en) * | 2002-01-08 | 2004-03-16 | Mattel, Inc. | Electronic amusement device with long duration timer |
US7957974B2 (en) * | 2002-03-15 | 2011-06-07 | Samsung Electronics Co., Ltd. | Method and apparatus for controlling devices connected to home network |
US20030187659A1 (en) * | 2002-03-15 | 2003-10-02 | Samsung Electronics Co., Ltd. | Method and apparatus for controlling devices connected to home network |
US20040152394A1 (en) * | 2002-09-27 | 2004-08-05 | Marine Jon C. | Animated multi-persona toy |
US7118443B2 (en) | 2002-09-27 | 2006-10-10 | Mattel, Inc. | Animated multi-persona toy |
US20050233675A1 (en) * | 2002-09-27 | 2005-10-20 | Mattel, Inc. | Animated multi-persona toy |
US8380484B2 (en) | 2004-08-10 | 2013-02-19 | International Business Machines Corporation | Method and system of dynamically changing a sentence structure of a message |
US20060036433A1 (en) * | 2004-08-10 | 2006-02-16 | International Business Machines Corporation | Method and system of dynamically changing a sentence structure of a message |
US7818179B2 (en) * | 2004-11-12 | 2010-10-19 | International Business Machines Corporation | Devices and methods providing automated assistance for verbal communication |
US20060106611A1 (en) * | 2004-11-12 | 2006-05-18 | Sophia Krasikov | Devices and methods providing automated assistance for verbal communication |
US20070128979A1 (en) * | 2005-12-07 | 2007-06-07 | J. Shackelford Associates Llc. | Interactive Hi-Tech doll |
US8423359B2 (en) * | 2006-04-03 | 2013-04-16 | Google Inc. | Automatic language model update |
US8447600B2 (en) | 2006-04-03 | 2013-05-21 | Google Inc. | Automatic language model update |
US10410627B2 (en) | 2006-04-03 | 2019-09-10 | Google Llc | Automatic language model update |
US20110213613A1 (en) * | 2006-04-03 | 2011-09-01 | Google Inc., a CA corporation | Automatic Language Model Update |
US9159316B2 (en) | 2006-04-03 | 2015-10-13 | Google Inc. | Automatic language model update |
US9953636B2 (en) | 2006-04-03 | 2018-04-24 | Google Llc | Automatic language model update |
US20090063155A1 (en) * | 2007-08-31 | 2009-03-05 | Hon Hai Precision Industry Co., Ltd. | Robot apparatus with vocal interactive function and method therefor |
US20100023163A1 (en) * | 2008-06-27 | 2010-01-28 | Kidd Cory D | Apparatus and Method for Assisting in Achieving Desired Behavior Patterns |
US8565922B2 (en) * | 2008-06-27 | 2013-10-22 | Intuitive Automata Inc. | Apparatus and method for assisting in achieving desired behavior patterns |
US8554367B2 (en) * | 2009-01-23 | 2013-10-08 | Samsung Electronics Co., Ltd. | Robot |
US20100191373A1 (en) * | 2009-01-23 | 2010-07-29 | Samsung Electronics Co., Ltd. | Robot |
US10839580B2 (en) | 2012-08-30 | 2020-11-17 | Arria Data2Text Limited | Method and apparatus for annotating a graphical output |
US9640045B2 (en) | 2012-08-30 | 2017-05-02 | Arria Data2Text Limited | Method and apparatus for alert validation |
US9355093B2 (en) | 2012-08-30 | 2016-05-31 | Arria Data2Text Limited | Method and apparatus for referring expression generation |
US10282878B2 (en) | 2012-08-30 | 2019-05-07 | Arria Data2Text Limited | Method and apparatus for annotating a graphical output |
US9405448B2 (en) | 2012-08-30 | 2016-08-02 | Arria Data2Text Limited | Method and apparatus for annotating a graphical output |
US10026274B2 (en) | 2012-08-30 | 2018-07-17 | Arria Data2Text Limited | Method and apparatus for alert validation |
US9323743B2 (en) | 2012-08-30 | 2016-04-26 | Arria Data2Text Limited | Method and apparatus for situational analysis text generation |
US10963628B2 (en) | 2012-08-30 | 2021-03-30 | Arria Data2Text Limited | Method and apparatus for updating a previously generated text |
US9336193B2 (en) | 2012-08-30 | 2016-05-10 | Arria Data2Text Limited | Method and apparatus for updating a previously generated text |
US8762133B2 (en) | 2012-08-30 | 2014-06-24 | Arria Data2Text Limited | Method and apparatus for alert validation |
US10467333B2 (en) | 2012-08-30 | 2019-11-05 | Arria Data2Text Limited | Method and apparatus for updating a previously generated text |
US10769380B2 (en) | 2012-08-30 | 2020-09-08 | Arria Data2Text Limited | Method and apparatus for situational analysis text generation |
US10504338B2 (en) | 2012-08-30 | 2019-12-10 | Arria Data2Text Limited | Method and apparatus for alert validation |
US8762134B2 (en) | 2012-08-30 | 2014-06-24 | Arria Data2Text Limited | Method and apparatus for situational analysis text generation |
US10565308B2 (en) | 2012-08-30 | 2020-02-18 | Arria Data2Text Limited | Method and apparatus for configurable microplanning |
US9443515B1 (en) | 2012-09-05 | 2016-09-13 | Paul G. Boyce | Personality designer system for a detachably attachable remote audio object |
US9600471B2 (en) | 2012-11-02 | 2017-03-21 | Arria Data2Text Limited | Method and apparatus for aggregating with information generalization |
US10216728B2 (en) | 2012-11-02 | 2019-02-26 | Arria Data2Text Limited | Method and apparatus for aggregating with information generalization |
US9904676B2 (en) * | 2012-11-16 | 2018-02-27 | Arria Data2Text Limited | Method and apparatus for expressing time in an output text |
US10853584B2 (en) * | 2012-11-16 | 2020-12-01 | Arria Data2Text Limited | Method and apparatus for expressing time in an output text |
US11580308B2 (en) | 2012-11-16 | 2023-02-14 | Arria Data2Text Limited | Method and apparatus for expressing time in an output text |
US10311145B2 (en) * | 2012-11-16 | 2019-06-04 | Arria Data2Text Limited | Method and apparatus for expressing time in an output text |
US20200081985A1 (en) * | 2012-11-16 | 2020-03-12 | Arria Data2Text Limited | Method And Apparatus For Expressing Time In An Output Text |
US11176214B2 (en) | 2012-11-16 | 2021-11-16 | Arria Data2Text Limited | Method and apparatus for spatial descriptions in an output text |
US20150324351A1 (en) * | 2012-11-16 | 2015-11-12 | Arria Data2Text Limited | Method and apparatus for expressing time in an output text |
US10860810B2 (en) | 2012-12-27 | 2020-12-08 | Arria Data2Text Limited | Method and apparatus for motion description |
US10115202B2 (en) | 2012-12-27 | 2018-10-30 | Arria Data2Text Limited | Method and apparatus for motion detection |
US9990360B2 (en) | 2012-12-27 | 2018-06-05 | Arria Data2Text Limited | Method and apparatus for motion description |
US10803599B2 (en) | 2012-12-27 | 2020-10-13 | Arria Data2Text Limited | Method and apparatus for motion detection |
US10776561B2 (en) | 2013-01-15 | 2020-09-15 | Arria Data2Text Limited | Method and apparatus for generating a linguistic representation of raw input data |
US10671815B2 (en) | 2013-08-29 | 2020-06-02 | Arria Data2Text Limited | Text generation from correlated alerts |
US9946711B2 (en) | 2013-08-29 | 2018-04-17 | Arria Data2Text Limited | Text generation from correlated alerts |
US10255252B2 (en) | 2013-09-16 | 2019-04-09 | Arria Data2Text Limited | Method and apparatus for interactive reports |
US10860812B2 (en) | 2013-09-16 | 2020-12-08 | Arria Data2Text Limited | Method, apparatus, and computer program product for user-directed reporting |
US9244894B1 (en) | 2013-09-16 | 2016-01-26 | Arria Data2Text Limited | Method and apparatus for interactive reports |
US9396181B1 (en) | 2013-09-16 | 2016-07-19 | Arria Data2Text Limited | Method, apparatus, and computer program product for user-directed reporting |
US10282422B2 (en) | 2013-09-16 | 2019-05-07 | Arria Data2Text Limited | Method, apparatus, and computer program product for user-directed reporting |
US11144709B2 (en) * | 2013-09-16 | 2021-10-12 | Arria Data2Text Limited | Method and apparatus for interactive reports |
US10664558B2 (en) | 2014-04-18 | 2020-05-26 | Arria Data2Text Limited | Method and apparatus for document planning |
US9583121B2 (en) | 2014-05-16 | 2017-02-28 | Alphonso Inc. | Apparatus and method for determining co-location of services |
US10575126B2 (en) | 2014-05-16 | 2020-02-25 | Alphonso Inc. | Apparatus and method for determining audio and/or visual time shift |
US9698924B2 (en) * | 2014-05-16 | 2017-07-04 | Alphonso Inc. | Efficient apparatus and method for audio signature generation using recognition history |
US9641980B2 (en) | 2014-05-16 | 2017-05-02 | Alphonso Inc. | Apparatus and method for determining co-location of services using a device that generates an audio signal |
US9590755B2 (en) | 2014-05-16 | 2017-03-07 | Alphonso Inc. | Efficient apparatus and method for audio signature generation using audio threshold |
US9584236B2 (en) | 2014-05-16 | 2017-02-28 | Alphonso Inc. | Efficient apparatus and method for audio signature generation using motion |
US9942711B2 (en) | 2014-05-16 | 2018-04-10 | Alphonso Inc. | Apparatus and method for determining co-location of services using a device that generates an audio signal |
US9520142B2 (en) | 2014-05-16 | 2016-12-13 | Alphonso Inc. | Efficient apparatus and method for audio signature generation using recognition history |
US10278017B2 (en) | 2014-05-16 | 2019-04-30 | Alphonso, Inc | Efficient apparatus and method for audio signature generation using recognition history |
US11520474B2 (en) * | 2015-05-15 | 2022-12-06 | Spotify Ab | Playback of media streams in dependence of a time of a day |
US10853586B2 (en) | 2016-08-31 | 2020-12-01 | Arria Data2Text Limited | Method and apparatus for lightweight multilingual natural language realizer |
US10445432B1 (en) | 2016-08-31 | 2019-10-15 | Arria Data2Text Limited | Method and apparatus for lightweight multilingual natural language realizer |
US10963650B2 (en) | 2016-10-31 | 2021-03-30 | Arria Data2Text Limited | Method and apparatus for natural language document orchestrator |
US10467347B1 (en) | 2016-10-31 | 2019-11-05 | Arria Data2Text Limited | Method and apparatus for natural language document orchestrator |
US11727222B2 (en) | 2016-10-31 | 2023-08-15 | Arria Data2Text Limited | Method and apparatus for natural language document orchestrator |
CN108769090A (en) * | 2018-03-23 | 2018-11-06 | 山东英才学院 | A kind of intelligence control system based on toy for children |
US11443747B2 (en) * | 2019-09-18 | 2022-09-13 | Lg Electronics Inc. | Artificial intelligence apparatus and method for recognizing speech of user in consideration of word usage frequency |
CN116352727A (en) * | 2023-06-01 | 2023-06-30 | 安徽淘云科技股份有限公司 | Control method of bionic robot and related equipment |
CN116352727B (en) * | 2023-06-01 | 2023-10-24 | 安徽淘云科技股份有限公司 | Control method of bionic robot and related equipment |
Also Published As
Publication number | Publication date |
---|---|
EP0730261A2 (en) | 1996-09-04 |
CN1229773C (en) | 2005-11-30 |
JPH08297498A (en) | 1996-11-12 |
KR960035426A (en) | 1996-10-24 |
CN1142647A (en) | 1997-02-12 |
HK1014604A1 (en) | 1999-09-30 |
DE69618488T2 (en) | 2002-08-01 |
EP0730261A3 (en) | 1997-08-06 |
EP0730261B1 (en) | 2002-01-16 |
CN1516112A (en) | 2004-07-28 |
CN1132148C (en) | 2003-12-24 |
DE69618488D1 (en) | 2002-02-21 |
KR100282022B1 (en) | 2001-02-15 |
TW340938B (en) | 1998-09-21 |
JP3254994B2 (en) | 2002-02-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5802488A (en) | Interactive speech recognition with varying responses for time of day and environmental conditions | |
US11443739B1 (en) | Connected accessory for a voice-controlled device | |
US12033611B2 (en) | Generating expressive speech audio from text data | |
Moore | PRESENCE: A human-inspired architecture for speech-based human-machine interaction | |
Somervuo et al. | Parametric representations of bird sounds for automatic species recognition | |
US7228275B1 (en) | Speech recognition system having multiple speech recognizers | |
CN111128118B (en) | Speech synthesis method, related device and readable storage medium | |
US20050154594A1 (en) | Method and apparatus of simulating and stimulating human speech and teaching humans how to talk | |
US4696042A (en) | Syllable boundary recognition from phonological linguistic unit string data | |
JP2004513444A (en) | User interface / entertainment device that simulates personal interactions and augments external databases with relevant data | |
CN110853616A (en) | Speech synthesis method, system and storage medium based on neural network | |
US20020019678A1 (en) | Pseudo-emotion sound expression system | |
CN114173188B (en) | Video generation method, electronic device, storage medium and digital person server | |
US7313524B1 (en) | Voice recognition based on a growth state of a robot | |
KR20220165666A (en) | Method and system for generating synthesis voice using style tag represented by natural language | |
Liu et al. | Multistage deep transfer learning for EmIoT-Enabled Human–Computer interaction | |
EP1266374A4 (en) | Automatically retraining a speech recognition system | |
US20220199068A1 (en) | Speech Synthesis Apparatus and Method Thereof | |
CN117373431A (en) | Audio synthesis method, training method, device, equipment and storage medium | |
KR20030037774A (en) | Object growth control system and method | |
US8574020B2 (en) | Animated interactive figure and system | |
WO2017082717A2 (en) | Method and system for text to speech synthesis | |
Kos et al. | A speech-based distributed architecture platform for an intelligent ambience | |
JP3179370B2 (en) | Talking parrot utterance device | |
US11335321B2 (en) | Building a text-to-speech system from a small amount of speech data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: SEIKO EPSON CORPORATION, JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:EDATSUNE, ISAO;REEL/FRAME:008008/0289 Effective date: 19960402
| STCF | Information on status: patent grant | Free format text: PATENTED CASE
| FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
| CC | Certificate of correction |
| FPAY | Fee payment | Year of fee payment: 4
| FPAY | Fee payment | Year of fee payment: 8
| FPAY | Fee payment | Year of fee payment: 12