WO2005038776A1 - Voice controlled toy - Google Patents

Voice controlled toy Download PDF

Info

Publication number
WO2005038776A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
toy
speech
product
utterance
Prior art date
Application number
PCT/GB2004/004395
Other languages
French (fr)
Inventor
David Woodfield
David Neil Laurence Levy
Andrew Keatley
Original Assignee
Intelligent Toys Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB0324375A external-priority patent/GB0324375D0/en
Priority claimed from GB0324376A external-priority patent/GB0324376D0/en
Priority claimed from GB0324372A external-priority patent/GB0324372D0/en
Priority claimed from GB0324373A external-priority patent/GB0324373D0/en
Application filed by Intelligent Toys Ltd filed Critical Intelligent Toys Ltd
Publication of WO2005038776A1 publication Critical patent/WO2005038776A1/en

Links

Classifications

    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63HTOYS, e.g. TOPS, DOLLS, HOOPS OR BUILDING BLOCKS
    • A63H3/00Dolls
    • A63H3/28Arrangements of sound-producing means in dolls; Means in dolls for producing sounds
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Definitions

  • the invention relates to a number of improvements to toys and the like. Several different inventions are described. Known conversational toys are described in the following US patents: 6110000, 4857030, 6631351, 6089942, 5213510, 6554679, 5752880, 6160986 and 6658388.
  • A chatterbot produces conversations that lack a continuous logical thread but are nevertheless quite fun.
  • chatterbots can be found on the Internet. They typically comprise a keyboard processing system for comparing input text with known templates and for generating an output utterance via a speech synthesizer depending on the result of the comparison.
  • the chatterbot software outputs a response, often as text on a monitor, that appears to be related to the input. For example, if the user inputs: "I love you" the chatterbot might respond "Tell me why you love me."
  • the chatterbot understands nothing - it somehow matches the input (or part of the input) to one of a number of stored utterances and then outputs a response that is somehow related to the matched input utterance.
  • Many chatterbot programs do this matching by looking for patterns of words. Normally, the user interacts with the chatterbot via a keyboard. It would be possible, using a very accurate speech recognizer, for the user's input utterance to be spoken. But speech recognition is not yet sufficiently advanced for this to be done completely reliably.
  • a speech generating assembly comprises an input device, such as a speech recogniser; a processing system connected to the input device to receive an input utterance; and a speech output device, such as a synthesizer, for outputting an utterance under control of the processing system operating in accordance with a predetermined algorithm which responds to an input utterance to generate an output utterance to which a single word or short phrase reply is expected via the input device, and, in response to that reply, to generate an input utterance for use by the predetermined algorithm.
  • a predetermined algorithm which responds to an input utterance to generate an output utterance to which a single word or short phrase reply is expected via the input device, and, in response to that reply, to generate an input utterance for use by the predetermined algorithm.
  • Figure 1 is a schematic block diagram
  • Figure 2 is a flow diagram.
  • the primary components of the toy, as far as this invention is concerned, are shown in Figure 1 and comprise a CPU or processing system 11 coupled to a speech synthesizer 13 which in turn is connected to a loudspeaker 14.
  • the speech synthesizer 13 could be a device that outputs text-to-speech or one that stores and outputs recorded speech.
  • the CPU 11 is also connected to a speech recognizer 16 which may be of conventional form which receives input from a microphone 17.
  • the CPU 11 is also coupled to a ROM 12, RAM 18 and EEPROM 19 and to a RF transceiver 15 although the transceiver 15 is not necessary for implementing this invention.
  • the ROM 12 stores program code for operating the CPU 11.
  • the EEPROM 19 stores any data that the toy acquires during its use and which is needed to enhance the subsequent use of the toy, e.g. remembering a child's name so it can be used in conversation.
  • a typical operation will now be described with reference to the flow diagram of Figure 2. Initially, the CPU 11 selects a question from the ROM 12 and controls the speech synthesizer 13 to output the question via the loudspeaker 14 (step 20). This question is selected to have a yes or no answer and could be:
  • step 24 while if the user answers "no", that answer is converted by the CPU 11 in conjunction with the program and data in the ROM 12 into:
  • the predetermined algorithm stored in the ROM 12 which corresponds to a conventional Chatterbot algorithm, performs its normal matching process by comparing the input utterance with templates selected from the ROM 12 and then selects an appropriate new output utterance (step 26) and the process returns to step 20.
  • a template could be of the form: "I <verb> <noun>" where the slots can be filled by an appropriate verb and an appropriate noun, either or both of which are taken from the system's previous utterance (which might have been: "Do you <verb> <noun>?", in which case, if the user responds "yes", the single word response is converted into the sentence "I <verb> <noun>"). So out of a single word or short phrase answer a whole sentence has been created that can be used as the next input utterance for the chatterbot.
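The template-expansion step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name and the handling of "no" replies are assumptions of the sketch.

```python
import re

def expand_reply(previous_utterance, reply):
    """Convert a single-word reply into a full sentence usable as the
    next chatterbot input, per the template scheme described above.

    Assumes the system's previous utterance matched "Do you <verb> <noun>?"
    (the "no" branch here is an illustrative assumption)."""
    m = re.match(r"Do you (\w+) (\w+)\?", previous_utterance)
    if m is None:
        return reply  # no template matched; pass the reply through unchanged
    verb, noun = m.groups()
    if reply.lower() == "yes":
        return f"I {verb} {noun}"
    if reply.lower() == "no":
        return f"I don't {verb} {noun}"
    return reply
```

For example, the exchange "Do you like spaghetti?" / "yes" yields the sentence "I like spaghetti", which can then be fed back into the chatterbot's normal matching process.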
  • the system could take the form of a network.
  • a network is a collection of linked nodes, each node representing an utterance, and the links from a node to another node being determined by the user's response to that utterance.
  • the utterance "Do you like spaghetti?" could be a node, with links to two other nodes, one link chosen if the user responds "yes" and the other link chosen if the user responds "no".
  • each of the possible answers by the user would lead the system to one or more possible next utterances.
  • the system has six possible next utterances corresponding to each possible answer by the user, thus providing variety of conversation by ensuring that one particular answer to a certain question is not always followed by the same utterance from the toy.
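The node-and-link structure described above can be sketched as a small dictionary; the node names and utterances here are illustrative assumptions, and picking one of several next nodes at random provides the variety of conversation mentioned.

```python
import random

# Each node is an utterance; links map a recognized reply to a list of
# possible next nodes, from which one is picked at random for variety.
network = {
    "q_spaghetti": {
        "utterance": "Do you like spaghetti?",
        "links": {"yes": ["r_me_too"], "no": ["r_shame"]},
    },
    "r_me_too": {"utterance": "I like spaghetti too.", "links": {}},
    "r_shame": {"utterance": "What a shame!", "links": {}},
}

def next_node(current, reply):
    """Follow the link chosen by the user's reply, or return None if the
    conversation path ends at this node."""
    choices = network[current]["links"].get(reply.lower(), [])
    return random.choice(choices) if choices else None
```

In the full system each answer would map to several candidate nodes (six, in the example above), so the same answer to the same question is not always followed by the same utterance.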
  • By knowing what the word is, the toy knows the word category and can therefore start a chatterbot conversation with a sensible utterance. It will be understood that although typically the conversation will be between the speech generating assembly and a human, the conversation could also be between two speech generating assemblies such as two toy dolls.
  • SECOND INVENTION US-A-4857030 describes a system of conversing dolls in which a number of dolls can take part in a multi-way conversation.
  • this prior art does not solve the problem of how to have more than two toys taking an active part in a conversation.
  • Their approach is to have just two toys as “speaker” and “responder” while “the third, fourth or more dolls which are present during a sequence will be listeners and provide occasional responses”.
  • a speech generating assembly comprises a speech output device, such as a synthesizer; a processor for implementing scripted speech; and an information signal receiving and transmitting system for receiving information signals from, and transmitting information signals to, other speech generating assemblies, the processor being adapted to act as a conversation leader for a period or for a number of utterances determined in accordance with a predetermined algorithm and then to generate a transfer information signal to transfer conversation leadership to another speech generating assembly.
  • a speech output device such as a synthesizer
  • a processor for implementing scripted speech
  • an information signal receiving and transmitting system for receiving information signals from, and transmitting information signals to, other speech generating assemblies
  • the processor being adapted to act as a conversation leader for a period or for a number of utterances determined in accordance with a predetermined algorithm and then to generate a transfer information signal to transfer conversation leadership to another speech generating assembly.
  • Conversation leadership means that the conversation leader controls the other speech generating assemblies in a group by issuing suitable control information signals to the assemblies, these signals usually being addressed to specific assemblies. Under this control, the speech synthesizers or other speech output devices in the group are controlled by the conversation leader via the respective processors to output utterances according to the script .
  • the conversation leader decides to whom or what it should be addressing its next utterance. This decision is based on a probabilistic process such as the one mentioned below.
  • a script represents how a conversation might develop between a toy and a child. This might consist of phrases spoken by the toy followed by replies made by the child (in the simplest case) . There will be a single point at which the conversation will begin but there may be a large number of ways in which the conversation will end, depending on the responses of the child or the "mood" of the toy. More formally, a script is a means of
  • A node typically represents something said by the toy or an alteration in the toy's mood (for example, the toy becomes less happy). It always contains something that occurs at a particular point in the conversation. Nodes are the most basic building blocks of a conversation. An arc joins two nodes together to produce a conversation path between them. For example, imagine that we have two nodes. In one the toy says "What colour do you like best - is it red?" and in the other the toy says "I like red too."
  • a correct conversation path would consist of the first node followed by the child answering "yes", followed by the second node.
  • An arc consisting of the word "yes” produces the required path of conversation.
  • a node describes a single event that can occur at a particular point in the conversation, whereas there may be many arcs coming from a node representing the many different paths that could occur at that point .
  • Scripts are normally employed by a toy having a conversation with a human.
  • Figure 3 is a block diagram of a group of toys
  • Figures 4 and 5 are flow diagrams.
  • Figure 3 illustrates a group of three toys Tm, Tn, Tx together with one user 30.
  • each of the toys has an internal structure similar to that shown in Figure 1 and so this will not be described again.
  • the ROM 12 stores software for carrying out scripted conversations as will be described further below. Assuming that one of the toys has been selected as the conversation leader, for example toy T m , it will then undertake a scripted conversation with the other toys in the vicinity and/or the user 30. This will occur in a conventional manner with the toy issuing an output utterance (step 32) , receiving a reply (step 34) and, if appropriate, outputting a further utterance.
  • In order to pass conversation leadership to another toy, the conversation leader maintains a count, in this example, of the number of output utterances which it makes, so that after each step 32 that count is incremented (step 36). If the count is not greater than a threshold T (step 38), the conversation leader retains leadership. However, if the count exceeds the threshold T, then the conversation leader passes leadership on to another toy. In order to achieve this control, as well as outputting audible speech utterances, the CPU 11 controls the RF transceiver 15 to generate a corresponding RF signal which is received by the transceivers 15 in other toys.
  • RF signals representing speech output by other toys are generated by corresponding transceivers 15 and detected by the RF transceiver 15 in the conversation leader.
  • the toys "speak" using text-to-speech or pre-recorded speech and "hear" what the user says using speech recognition; it is only the RF data that enables each toy to know what has been said by the other toys, to receive commands such as which toy should be the next to speak, and to obtain knowledge of the presence of other toys.
  • a particular advantage of text-to-speech is that it allows for the possibility of the user (or others) upgrading the product by adding new scripts, since new scripts might well call for the use of vocabulary that is not employed in the original scripts and therefore, if pre-stored speech was used, the toy would not be able to say the "new" words.
  • the exact method of transmission and reception of the RF data could be primitive, for example that used in radio controlled toy cars and similar products.
  • Such RF systems typically transmit one of a small repertoire of commands, such as data corresponding to "backwards", "forwards", "left" or "right". It may be desirable for the toys to transmit more data than can conveniently be handled by a very low cost RF or infrared transmitter/receiver system 15.
  • the toys may employ a Micrel MICRF102 transmitter IC and a Micrel MICRF001 receiver IC.
  • the transmitter and receiver ICs could be integrated into a single transceiver IC such as the Xemics XE1201.
  • the transmission frequency is chosen so as to permit the use of unlicensed frequency bands, e.g. 433 MHz.
  • Tm is the conversation leader and outputs an utterance via speech synthesizer 13 addressed to Tx (step 40, Figure 5), which will be accompanied by an equivalently addressed RF information signal from the transceiver 15 in Tm.
  • the processor 11 of Tx will determine that the RF signal is addressed to it and use the RF signal to determine its scripted response. This is output via its speech synthesizer 13 and a corresponding RF signal by its transceiver 15 (step 42). On receiving the response RF signal, Tm increments a counter for Tx by one (step 44).
  • when a toy (Tm) decides to hand over leadership (step 46) to another toy (Tn) (as determined using a probability function or otherwise as described above), it does so immediately after a response to a question.
  • Tm chooses to whom it should pass the leadership by counting, as explained above, for each of the other toys in the group, how many toy utterances have been made in total since that toy last spoke, and receiving the counts (step 48). So for each toy other than Tm there will be a count - these counts, which are stored in the RAM 18, serve as weights for selecting the next toy to speak, so the toy that has not spoken for the longest will have the greatest chance of being chosen to be the next to speak. Then, if the toy that is chosen (step 50) to be next to speak has a count of 8 or some other pre-set threshold, the handing-over toy (Tm) says:
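The weighted selection described above can be sketched as follows; the function name is an assumption, and ties are resolved by the random draw.

```python
import random

def choose_next_leader(counts):
    """Pick the next conversation leader, weighting each candidate toy by
    how many utterances have been made since it last spoke, so the
    longest-silent toy is the most likely choice.

    counts: dict mapping toy id -> utterances since that toy last spoke
            (the counts the leader stores in RAM)."""
    toys = list(counts)
    weights = [counts[t] for t in toys]
    if sum(weights) == 0:
        return random.choice(toys)  # no toy has been waiting; pick uniformly
    return random.choices(toys, weights=weights, k=1)[0]
```

A toy with a count of zero (one that has just spoken) is never selected while any other toy has a positive count, which matches the intent of the weighting scheme.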
  • RF signals are exchanged between the toys, in which the conversation leader sends (step 52) an RF signal addressed to Tn instructing Tn to become the next conversation leader.
  • humans 30 can be brought into the conversation from time to time. In one example, there is a 90% probability that a new conversation leader will address its next question to the same human to whom the previous question was addressed.
  • the other toy(s) in the room will ignore what the human says, picking up only on the utterance of the other toy that was indicated by the RF message. If there are two or more humans in the room and a human other than the one spoken to responds to the toy, the toy will still assume that the responder is the human it spoke to. In the 10% case that a new conversation leader does not address its next question to the same human to whom the previous question was addressed, the new conversation leader will choose between addressing a different human (if one is known to be in the room) or addressing another toy.
  • the probabilities of making each of these choices are based on giving each human a fair crack of the whip in terms of being a person addressed by a toy, and on giving each toy a fair crack of the whip in terms of being addressed by a toy.
  • This probabilistic process operated by the conversation leader(s), and the actual probabilities employed in this process, are such as to cause, over a long period of time, each toy to be the conversation leader for approximately the same number of utterances and each human to be the addressee of approximately the same number of utterances.
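The addressee-selection rule described above (90% probability of re-addressing the same human, otherwise a different human if present, else another toy) can be sketched as follows; the function name, argument names and the uniform fallback choices are assumptions of this sketch.

```python
import random

def choose_addressee(last_human, humans, toys, p_same=0.9):
    """Choose whom the new conversation leader addresses next.

    With probability p_same the leader re-addresses the human who
    received the previous question; otherwise it picks a different
    human if one is known to be in the room, else another toy."""
    if last_human is not None and random.random() < p_same:
        return last_human
    others = [h for h in humans if h != last_human]
    if others:
        return random.choice(others)
    return random.choice(toys)
```

Over many turns, tuning p_same and the fallback probabilities is what gives each human and each toy roughly equal chances of being addressed.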
  • Each toy Tm, Tn, Tx in the group is programmed to assume that if a toy leaves the group then its human has also left the group and should not be addressed again during the current conversation unless it re-enters the group.
  • each packet of data carries a unique identification number for the toy transmitting the data and the corresponding numbers for the toy or toys destined to receive the data. If a toy receives a data packet addressed to it, it will return a "data received" message.
  • the transmitting toy will not receive a "data received" message from that destination toy and will retransmit the data one or more times. After a predetermined limit has been reached on the number of transmissions, for example 10, the transmitting toy assumes that the particular destination toy has been taken out of the room and/or is otherwise no longer involved in the activities of the transmitting toy and has quit the conversation.
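The acknowledge-and-retransmit scheme above can be sketched as follows. The function and parameter names are assumptions; `transmit` stands in for whatever routine drives the RF transceiver 15 and reports whether a "data received" message came back.

```python
def send_with_retries(packet, transmit, max_attempts=10):
    """Transmit a packet until a 'data received' acknowledgement arrives,
    up to max_attempts times (10 in the example above).

    Returns True on acknowledgement; False means the destination toy is
    assumed to have left the room or quit the conversation.
    `transmit` is a caller-supplied function that sends the packet and
    returns True when the destination acknowledged it."""
    for _ in range(max_attempts):
        if transmit(packet):
            return True
    return False
```

A toy that repeatedly fails to acknowledge is thereafter dropped from the group, which is also how the leader learns that a toy (and its human) has left.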
  • Each speech generating assembly or toy in a group will be individually identified using a unique identifier, each toy maintaining a record of other toys and humans it has met and information it has learned about them during its conversations, which data is stored in the EEPROM 19, and which allows the toys to demonstrate some intelligence.
  • the allocation of unique identifiers can be achieved as follows. In the factory a toy typically goes through various test procedures. These could include pressing buttons (keys). The time delay between successive keystrokes is used to set the unique identifier. For example, the number of microseconds (modulo 255) between the first and second keystrokes is the first byte of a 4-byte identifier. The number of microseconds between the second and third keystrokes provides the second byte, and so on. (Of course it does not have to be successive keystrokes.) The resulting 4-byte number is stored in the EEPROM 19 so it is retained when power is removed, e.g. when the batteries are changed. All or part of this 4-byte number can be used in various ways.
  • each toy could have a different name.
  • the name-generation algorithm requires a random number generator and so the 4-byte number is used as the seed. This allows us to have more than 4 billion different names.
  • each toy can have a different set of likes and dislikes, and different personality characteristics, all determined from the 4-byte "unique" identifier.
  • the identifiers cannot be guaranteed to be absolutely unique - it is possible that two toys will have the same 4-byte number.
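The keystroke-timing scheme described above can be sketched as follows; the function name is an assumption, and the modulo-255 reduction follows the text as written.

```python
def derive_identifier(timestamps_us):
    """Derive a 4-byte identifier from factory keystroke timings: each
    byte is the microsecond gap between successive keystrokes, taken
    modulo 255 as described above.

    timestamps_us: at least five keystroke times in microseconds."""
    gaps = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    return bytes(g % 255 for g in gaps[:4])
```

Because human keystroke timing is effectively unpredictable at microsecond resolution, the resulting bytes are close to random, which is why two toys are very unlikely (though not guaranteed) to collide.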
  • An important aspect of the invention is that it is preferable that in all of the conversations, those involving only one toy and one human and those involving more than one toy and/or more than one human, what the toy says is, in general, not random, but depends on various factors including its mood and what it has experienced in the past .
  • One example of the relevance of past experience is the way that the toy can remember (by storing data in a memory "store") things such as what virtual food it ate for breakfast, when was the last time that it virtually ate bananas, when was the last time the user played tic-tac-toe with it and what was the result.
  • Each toy's database of information about itself is preset in the factory (after the 4-byte number is created) with certain information about that particular toy.
  • the 4-byte number or part of it is used as the seed for a random number generator which in turn provides a list of, for example, the foods and drinks that that toy "likes" and those it doesn't like.
  • This information is stored in the EEPROM 19 so it is retained when the batteries are changed.
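Seeding a random number generator with the 4-byte identifier, as described above, makes each toy's preferences both individual and reproducible. The food list and scoring function below are illustrative assumptions; the -3..+3 scale matches the like/dislike scale described later in this document.

```python
import random

FOODS = ["spaghetti", "chicken", "hamburgers", "bananas", "ice cream"]

def generate_preferences(identifier):
    """Generate this toy's food preferences deterministically by seeding
    a random number generator with its 4-byte identifier, so the same
    toy always 'likes' and 'dislikes' the same things."""
    rng = random.Random(int.from_bytes(identifier, "big"))
    return {food: rng.randint(-3, 3) for food in FOODS}
```

Because the generator is seeded rather than free-running, the preferences never need to be regenerated consistently; storing them in the EEPROM 19 simply avoids recomputing them.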
  • the toy asks questions such as: "What are you going to give me for lunch? Spaghetti, chicken or hamburgers?"
  • information is stored in the database so that the toy can use this information in subsequent conversations, for example: "Oh no. Not again. You've given me hamburgers three days running." Each toy normally knows the name of its owner.
  • each of the inventions can be implemented independently of the other or in one or more combinations.
  • one type of conversation switches back and forth between scripted conversation and chatterbot conversation.
  • the conversation may start off being scripted.
  • chatterbot mode normally using one of the nouns in its most recent utterance as a key word in the first of its chatterbot utterances. This creates a measure of continuity.
  • chatterbot mode the toy might say something like:
  • One use of this method is that the software could choose to be terse (selecting utterances of shorter lengths: "The cat sat on the mat") or loquacious (selecting one of the longest utterances: "The little tabby kitten was sitting on the mat"), depending on the toy's "mood" at that time, or for some other reason.
  • the assembly further comprises a store known as its Personal Data Base (PDB) for storing information such as "likes and dislikes" and details of past experiences such as what virtual food it ate for breakfast, when was the last time it virtually ate bananas, when was the last time the user played tic-tac-toe with it and what was the result etc.
  • PDB Personal Data Base
  • This information can also be used to determine a mood of the toy which can be used either alone or with the other information to determine the next output utterance. This will now be described in more detail below.
  • a toy's personality is exhibited through its behaviour which, in turn, is governed by its moods which, in their turn, are elicited by its emotions. Each emotion is associated with one or more moods .
  • Each mood is responsible for one or more behaviours - a behaviour can be an action of some sort or the execution of a script including varying what is said in the script according to the toy's mood and/or its past experiences.
  • the three factors in a personality make-up are called, in our model, Pleasure (P), Arousal (A) and Dominance (D).
  • Pleasure assumes its usual meaning - if something happens which the toy likes then the value of P rises. This could occur when the toy eats his favourite food (simultaneously increasing its arousal level) or when he plays a game that he enjoys or meets a friend.
  • a toy with a volatile temperament may become very unhappy indeed as a result of one unpleasant experience (such as being given its least favourite food for dinner) and may then suddenly change to a euphoric mood when it is taken to see its favourite animal at the zoo.
  • a phlegmatic toy may require 20 pleasant events to change its mood from neutral to euphoric and then
  • a toy's temperament is governed by its "step size" when modifying the values of P, A and D.
  • 8% of toys have a step size of 1% for a particular parameter; 8% have a step size of 2%; 8% a step size of 3%; 8% each have step sizes of 4%, 5%, 6%, 7%, 8%, 9%, 10%, 20% and 30%; and 4% have a step size of 100%.
  • the toy has its basic values of P, A and D.
  • the values of P, A and D undergo temporary changes and it is these changed values which determine, at any moment in time, the toy's current mood.
  • the temporary values of P, A and D decay, moving towards their basic values.
  • when the toy powers down, the values continue to decay but at a slower rate.
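The Pleasure/Arousal/Dominance model described above can be sketched as a small class. The class name, the resting values of 50, and a step size expressed in absolute units are assumptions of this sketch; the step size would in practice come from the temperament table above, and the strength argument uses the -3..+3 like/dislike scale described below.

```python
class Mood:
    """Minimal sketch of the P/A/D model: each parameter has a basic
    (resting) value and a current value; the toy's temperament fixes the
    step size, and current values decay back toward the basic values."""

    def __init__(self, basic_p=50.0, basic_a=50.0, basic_d=50.0, step=5.0):
        self.basic = {"P": basic_p, "A": basic_a, "D": basic_d}
        self.current = dict(self.basic)
        self.step = step  # per-temperament step size

    def event(self, param, strength):
        # strength is the -3..+3 like/dislike score: the parameter moves
        # by that many steps (negative strengths reduce it)
        self.current[param] += strength * self.step

    def decay(self, steps=1):
        # move each current value toward its basic value, without overshoot
        for k in self.current:
            diff = self.basic[k] - self.current[k]
            move = min(abs(diff), steps * self.step)
            self.current[k] += move if diff > 0 else -move
```

The current (not the basic) values of P, A and D are what determine the toy's mood at any moment; a slower decay while powered down could be modelled simply by calling `decay` less often.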
  • the toy has certain likes and dislikes which are stored in its Personal Data Base (PDB) .
  • +1 means “I like it”
  • +2 means “I like it a lot”
  • +3 means “I'm crazy about it”.
  • -1, -2 and -3 mean, respectively, "I dislike it”, “I dislike it a lot” and “I hate it”.
  • 0 means "I'm neutral about it”.
  • the value of P increases by 1, 2 or 3 steps as appropriate, while if a "dislike” occurs the value of P is reduced by 1, 2 or 3 steps.
  • Besides the likes and dislikes stored in the PDB, there are other things which act as likes and dislikes. For example, if the child does something the toy has asked him to do by, for example, playing a game with the user, the toy likes the fact that its request has been acceded to and P increases by 1 step. (But not if there is already an increase in P due to the requested action being on the "like" list in the PDB.) Certain things arouse the toy.
  • the toy's arousal level is reduced by one step. (This reduction is faster than when the toy is powered down because when the toy is active it expects quick responses from the child whereas when it is powered down it expects no response . )
  • at any time when the toy is switched on it is either "awake” or "asleep". If it is awake it acts as though it is awake, carrying on conversations, playing games, etc. From time to time the toy will simulate tiredness.
  • When it is simulating sleep the toy can be programmed sometimes to make snoring sounds. It can also simulate dreaming in its sleep - when it dreams it talks in its sleep, sometimes even carrying on a conversation while asleep (though in a higher pitched or lower pitched voice in order to differentiate the awake and asleep conditions). Certain things affect the toy's Dominance level.
  • the decay rate for each of the parameters P, A and D depends on the toy's temperament.
  • An example of a decay rate is one step size for every minute after the first minute that the toy is asleep or not being used while powered up.
  • The effect of this decay is that when the toy simulates waking up, for example by starting a normal conversation, having been asleep, it can comment that it feels better or not so angry any more, according to how much change of mood there has been since it fell asleep. This is measured by computing how near the toy's mood is to certain undesirable states using the sum of squares of the differences: (P-P')² + (A-A')² + (D-D')² between the current values of P, A and D and the values corresponding to the mood in question. This nearness measure is recomputed when the toy wakes up and the toy then picks the undesirable mood corresponding to the maximum change of distance (sum of squares).
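The nearness measure and the pick-the-most-changed-mood step above can be sketched directly; the function names and the representation of moods as (P, A, D) triples are assumptions of this sketch.

```python
def mood_distance(current, target):
    """Squared distance between the current (P, A, D) values and those of
    a given mood: (P-P')^2 + (A-A')^2 + (D-D')^2, as in the text."""
    return sum((c - t) ** 2 for c, t in zip(current, target))

def most_changed_mood(before, after, moods):
    """On waking, pick the undesirable mood whose distance changed most
    between falling asleep (`before`) and waking (`after`).
    moods: dict mapping mood name -> (P, A, D) triple for that mood."""
    return max(moods, key=lambda name: abs(
        mood_distance(after, moods[name]) - mood_distance(before, moods[name])))
```

The selected mood is the one the toy comments on when it wakes ("I'm not so angry any more"), since that is where the decay produced the largest shift.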
  • Appendix 1 provides more examples of multi-way conversations .
  • a speech generating assembly comprises a speech output device, such as a speech synthesizer; a processor for controlling the speech output device; an information signal receiving and transmitting system for receiving information signals from, and transmitting information signals to, other speech generating assemblies, the processor being adapted to exchange information signals with one or more other speech generating assemblies when it wishes to output an utterance, the processor controlling the speech output device to output an utterance when the information signal exchange indicates that the or each other assembly is not outputting an utterance.
  • the assembly is particularly applicable for use in toys.
  • the information signal receiving and transmitting system in the second and third inventions transmits and receives signals using RF, infrared or ultrasound.
  • RF or other medium
  • the RF (or other medium) communication between assemblies should not be hampered by "collisions" between data packets being transmitted by more than one assembly. This is achieved using a system that is logically similar to the "Ethernet".
  • the Ethernet is a standard originated by the IEEE for physically connecting computers into a local area network and a communications protocol that allows those computers to share data. This standard is described in specification documents IEEE 802.1E (March 1991) and IEEE 802.1B (August 1992), as well as (for example) "Ethernet: The Definitive Guide" by Charles E. Spurgeon.
  • the Ethernet standard dictates the communications protocol, i.e. how connected computers send data.
  • Computers linked by Ethernet send data along the wire in small chunks called "packets".
  • One may think of a packet as a suitcase, tagged to travel to a different city, moving along a conveyor belt in an airport.
  • each packet carries a destination address and the sending computer's "home" address.
  • the Ethernet interface uses a protocol called Carrier Sense Multiple Access With Collision Detection to send packets of data.
  • the computer first detects a lull in activity - in the suitcase analogy this might be a gap during which there is no suitcase on the conveyor belt.
  • When such a lull is detected, the computer (or in our case the transmitting toy) transmits a data packet. Every time a packet reaches its destination toy (and there may be more than one destination toy for which the packet is intended) the sending toy is sent a confirmation data packet by the destination toy, while the sending toy waits for a gap to open that allows it to transmit another packet of data.
  • the amount of data transmitted in a message may well be such that only one packet of data is required to get the entire message from the sending toy to the destination toy(s).
  • computers and other devices "along the way" read the destination address(es) for the data.
  • the sending computer attempts to read the data it has sent in order to ensure that no collision has occurred. If the data it receives is not identical to the data it sent, the sending computer then transmits data that can be recognized by the other computers in the configuration as garbage. This alerts the other sending computer(s) to the fact that two or more computers are attempting to send data simultaneously. When this has been accomplished those computers that had been sending data simultaneously immediately stop their transmissions and each waits for a random period of time before re-transmitting their data. By making this delay random the chances of a second successive collision are significantly decreased. This approach helps prevent network gridlock. In the case of speech generating assemblies according to the third invention, such as toys, the sending toy attempts to receive the data it has transmitted in order to ensure that no collision has occurred.
  • the sending toy then transmits garbage data that can be recognized as such by the other toys in the vicinity. This alerts the other sending toy(s) to the fact that two or more toys are attempting to transmit data simultaneously.
  • those toys that had been sending data simultaneously, which may include the toy sending garbage data, immediately stop their transmissions and each waits for a random period of time, probably of the order of tens of milliseconds, before re-transmitting its data and speaking the corresponding utterances. By making this delay random, the chances of a second successive collision are significantly decreased. This process continues until all of the data from the sending toy has been successfully received by all of the intended recipients, or until the upper limit is reached on the number of times such data is sent, at which point the sending toy assumes that an intended recipient is no longer in the room and switched on.
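The collision-recovery procedure above can be sketched as follows. This is a minimal illustration only: `send`, `channel_clear` and `echo_matches` are hypothetical stand-ins for the toy's RF primitives, and the retry limit and back-off range are assumptions, not values from the patent.

```python
import random
import time

MAX_RETRIES = 5  # assumed upper limit on re-sends before giving up

def transmit_with_backoff(packet, send, channel_clear, echo_matches):
    """Send a packet, retrying after a random back-off (tens of
    milliseconds) whenever a collision is detected, as described above."""
    for _ in range(MAX_RETRIES):
        while not channel_clear():          # wait for a lull in activity
            time.sleep(0.001)
        send(packet)
        if echo_matches(packet):            # received data matches sent data
            return True                     # no collision occurred
        # Collision: a random delay makes a second successive collision
        # between the same senders unlikely.
        time.sleep(random.uniform(0.010, 0.090))
    # Upper limit reached: assume the recipient is absent or switched off.
    return False
```

The random back-off is the same idea used by Ethernet-style media access: colliding senders desynchronise themselves by each choosing an independent delay.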
  • This invention relates to a method of transferring data, and more particularly but not exclusively to such a method which is suitable for transferring data from a data source to a plaything such as a toy or educational aid. It is well known to transfer data from a data source to a product such as a computer using a telecommunications link, with the data source and product communicating via two way signal transfer. To achieve this, it is necessary to use equipment such as a modem or the like at each end of the telecommunications link, to modulate a signal for transmission at either end and to de-modulate the signal at the receiving end, for use. Higher speed data transfer means have been developed, such as so-called ISDN connections. Such data transmission apparatus is expensive and generally not simple to use, particularly by children.
  • It is desirable for toys to be modified to provide additional interest for a child. For example, it is desirable for a toy which is able to perform a plurality of simple tasks, e.g. to utter several phrases, to be modified to utter alternative phrases. Alternatively it is desirable to be able to modify a toy which is programmed with a plurality of questions and answers by changing the questions and answers with which the toy is programmed. Toys are known in which modification is achieved, for example, by inserting alternative instruction cards, electronic memory cartridges or the like, but the modification which can be made is restricted by the availability of such alternative instruction cards or cartridges.
  • a method of transferring data from a data source to a product which is capable of using the data to perform a task comprising establishing a link between the data source and the product, sending a signal including the data along the link from the data source to the product, and when the product indicates to an operator that the data has been received, the operator manually terminating the link.
  • the product has means to indicate to an operator when the data from the data source has successfully been received so that the operator may then manually terminate the link.
  • Such means may include a simple visual indicator and/or an audible indicator.
  • the product may include a transducer so that the product may receive and subsequently use data transmitted by the telephone apparatus as an audible signal or induced signal, and the method may include placing a telephone apparatus in the vicinity of the product whereby the signal sent by the data source is received by the transducer.
  • the product typically includes a memory for the data, and the data received by the product is stored in the memory for subsequent use in performing a task so that it is unnecessary for the data to be used live as it is transferred.
  • the data received by the product may replace data already stored in memory so that the product will thereafter only be capable of performing tasks according to the new data received, or the data may additionally be stored in memory along with existing data so that the product may perform additional tasks on receipt of the data.
  • the data to be transferred may be sent in discrete packets, and the method may thus include manually signalling the data source using a telephone apparatus, when one data packet has been received by the product, whereby the data source then sends another data packet.
  • a signal may be sent by the operator to the data source using a touch tone key of a telephone handset.
  • the transmission of data from the data source along the telecommunications link may be controlled, at the end of the link where the product is located, solely by a telephone apparatus, by means of which a telecommunications link is established, maintained and terminated.
  • the product may have an interpreter to interpret the data received, and the data may include one or more specific commands. In a simple arrangement the product may have a memory, preferably a read-only memory, in which there is contained information necessary to enable the product to perform a plurality of tasks, and the data may include further information relevant to at least one of the tasks, whereby when the product has received the data, the product is enabled to perform at least one of the plurality of tasks.
  • the product may be a toy which is capable of performing a plurality of tasks, a set of instructions for the performance of each task being stored in the memory thereof, and the data transferred from the data source activating at least one of the sets of instructions so that the toy is enabled to perform at least one of the tasks, or the data transferred from the data source may activate a plurality of sets of instructions, and the toy is enabled to perform a plurality of the tasks, e.g. in an order specified in the transferred data.
  • an apparatus for performing the method of the first aspect of the invention including a product which is capable of using the data to perform a task, a data source, and a link between the data source and the product, the product having signal receiving means for receiving a signal from the data source and no means for sending a signal along the link to the data source, and the link including apparatus by means of which, when the data is received by the product, the link may manually be terminated.
  • a product 110 which in this example is a child's soft toy, a telephone apparatus 111 including a telephone handset 112 and base unit 114, which is connected to a conventional telecommunications network 115. Also connected to the network 115 is a source of data 116.
  • the toy 110 has a memory 118 in which is contained information, which, when a child operates a control 119, may be used by the toy 110 to utter phrases.
  • the toy 110 is enabled to use only a small portion of the stored information, so that only selected of the phrases for which information is stored in the memory 118, may be uttered.
  • a telecommunications link may be made with the data source 116 using the telephone apparatus 111 and the network 115. Even a young child may use the telephone apparatus 111 to call the data source 116.
  • the data source 116 may commence sending a signal which contains a data packet to the telephone apparatus 111. The child may place the handset 112 in the vicinity of the toy 110 so that the audible signal transmitted by the handset 112 may be received by a sound transducer 120 of the toy 110, or a signal may be induced otherwise in an appropriate transducer of the toy 110.
  • the data source 116 is arranged repeatedly to send the signal, there being no connection established between the product 110 and the data source 116 which would enable the product to signal the data source 116.
  • the signal may be sent by the data source 116 until the telecommunications link is broken, e.g. by the child replacing the handset 112 on the base unit 114.
  • the signal may be sent for a predetermined time only, and the operator would have to press a selected touch tone key 122 if it is desired for the data to be resent, if the data has not successfully been received by the toy 110.
  • the toy 110 may indicate this to the child by issuing an audible tone, uttering an appropriate phrase or the like, or by some visual indicating means.
  • the child may terminate the telecommunications link e.g. by replacing the telephone handset 112 on its base 114.
  • the toy 110 may be arranged to indicate to an operator that a complete packet of data has not been received, by issuing an alternative indication.
  • the data packet received by the toy 110 may include an instruction to enable the toy 110 to utter alternative or additional phrases e.g. when the control 119 is operated or at any other time, utilising the information already stored in the memory 118.
  • the toy 110 may be modified to perform additional or alternative tasks, e.g. phrase utterings, upon receipt of the data, and the data transfer is controlled entirely by the operation of the telephone apparatus 111.
  • further data packets may be transferred from the data source 116 to the product 110 in the same way, and when a last data packet has been received by the product 110, the product 110 may indicate this to an operator, e.g. by issuing a special tone or other audible or visual signal, so that the operator may then terminate the telecommunications link.
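The toy-side receive loop described in the preceding passages can be sketched as follows. `decode_packet` and `indicate` are illustrative stand-ins (the patent leaves the audio encoding unspecified), and the packet fields `data` and `last` are assumptions for this sketch.

```python
def receive_packets(frames, decode_packet, indicate):
    """Consume incoming audio frames: give an indication after each
    packet, an alternative indication for an incomplete packet, and a
    special indication after the last packet so the operator knows to
    terminate the telecommunications link manually."""
    data = []
    for frame in frames:
        packet = decode_packet(frame)
        if packet is None:
            indicate("error")     # incomplete packet: alternative indication
            continue
        data.append(packet["data"])
        if packet.get("last"):
            indicate("done")      # special tone: operator may hang up
            break
        indicate("ok")            # audible/visual: packet received
    return data
```

Note the one-way character of the design: the toy never signals the data source electronically; all flow control happens through the operator reacting to the indications.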
  • the transferred data may indicate the order in which additional/alternative tasks are to be performed. Thus each task may be part of an overall task, and by varying the order in which tasks are to be performed, a large number of alternative overall tasks may be accomplished.
  • the transferred data may include information required by the product to perform a task, such as commands in an appropriate computer language.
  • the product 110 may require an interpreter to interpret the transferred data to enable the product to use the data to perform tasks.
  • the tasks which the product 110 may be enabled to perform by the transferred data may include not only the uttering of phrases, but may include movements, e.g. of limbs or a head of the toy 110, or the tasks may be the visual and/or audible presentation of information, which may be in question and answer form.
  • the product 110 may alternatively be an effector means for use in an industrial or consumer environment, the performance of the effector means being changed by the received data to suit the effector for a particular application.
  • an operator may arrange for particular data to be made available to third parties from the data source 116.
  • a child satisfied with the modified performance of his toy 110 may, using a series of touch tones on the telephone apparatus 111, arrange for the same data transferred to his toy, to be made available to a friend who establishes a telecommunications link with the data source 116 and the friend's own toy, rather than the friend having different data transferred to him from the data source 116.
  • the data source 116, using a telephone apparatus 111, could be arranged to transfer data to each of a group of two or more products 110 at different locations to enable a particular shared game to be played in the different locations, using a product 110 in each of the different locations.
  • although the invention has been described in relation to a product 110 comprising a toy, the invention may be applied for transferring data from a data source 116 to many alternative kinds of product, to modify the performance of the product to enable the product to perform additional, alternative or modified tasks.
  • any other non-connecting coupler may be provided, such as a telephone induction coupler of the kind used for hearing aids, so that a signal may be induced in a suitable transducer of the product 110, other than a sound transducer 120, when the telephone apparatus 111, or at least a handset thereof, is placed in the vicinity of the toy 110 or other product.
  • instead of a telephone apparatus 111 which is physically connected to a telecommunications network 115, a mobile telephone could of course be used, by which we include radio telephones, which are connected to a telecommunications network over the ether.
  • the data source 116 may be a central databank to which each of a plurality of telecommunication connections may be made. If desired, the data source 116 may be a second toy or other product, so that data can be transferred between two toys by the method of the invention.
  • the toy 110 or other product sending data would require a suitable output transducer so that data may be sent on the telecommunications network 115, and the receiving toy or other product would require an input transducer to receive data.
  • the two may electronically "handshake" so that manual intervention of the data transfer process may not be required.
  • This invention is particularly suitable to enable new scripts (as previously described) or other data to be downloaded to a toy or other interactive device.
  • FIFTH INVENTION In the field of toys and other animated products, it is common to provide a synthesized face with moving lips and possibly blinking eyes in conjunction with uttered speech. A problem with lips and eyes that move under motor control is that the motors create noise. Another problem is that it is difficult to achieve good synchronization between the toy's moving lips and the speech being uttered, because the motor controlled lip movements tend to be slower than is required.
  • an apparatus for generating an image of a face or one or more parts thereof comprising an image generator having a plurality of individually actuable elements that can be controlled to generate the desired image; a controller for controlling the elements of the image generator in synchronism with speech generated by a speech synthesizer or with prerecorded speech; and an optical system for focussing a representation of the image generated by the image generator onto a face surface.
  • the elements typically define different mouth shapes, such as the form of lips, but alternatively or in addition may define different eye shapes, nose shapes, ear shapes, cheek shapes, eyebrow shapes or shapes of other parts of a face.
  • the elements of the image generator are luminous.
  • each element may be a light emitting diode or alternatively, it may be an incandescent light source.
  • the elements of the image generator may be liquid crystal elements and the image generator may further comprise a light source, such as a light emitting diode. In this case, each liquid crystal element will typically form part of a liquid crystal display.
  • the elements of the image generator may be arranged to form a dot matrix in order that any desired shape can be generated.
  • the image generator comprises liquid crystal elements then these may be mounted on a transmissive substrate and the light source may be disposed so that light emitted by it shines through the substrate and any deactivated liquid crystal elements.
  • the liquid crystal elements may be mounted on a reflective substrate and the light source may be disposed so that light emitted by it that passes through any deactivated liquid crystal elements is reflected by the substrate back through those elements.
  • the system shown in Figure 7 comprises a light source 211 such as a bright LED located behind a liquid crystal display (LCD) array 212 with a lens 216 between them.
  • the array 212 is preferably a high contrast transmissive LCD.
  • in front of the LCD array 212 is a lens system 213 which refocuses the image generated by the LCD array 212 so that it appears in much enlarged form on the surface of a screen 214.
  • the screen 214 is of a type which is normally opaque but which allows the projected image to be seen through it. The distance between the lens system 213 and the LCD array 212 can be adjusted to change the image size as necessary.
  • the construction of the part of the LCD array 212 defining a mouth is shown in more detail in Figure 8.
  • the LCD has a set of segments (labelled 1-9 and A-E) defining various mouth or lip configurations.
  • Each of these segments is individually controllable by a control processor 215.
  • Figure 9 shows a neutral mouth shape, exhibited by activating segments 3 to C.
  • Figure 10 shows an open mouth with teeth visible, by activating segments 2, 4, 6, 9, A, B, D.
  • Figure 11 illustrates the lips in conjunction with a tongue, by activating segments 2, 5, 8, 9, A, B, C.
  • Figure 12 is a modification of the Figure 5 display, using segments 2, 5, 9, A, B, D.
  • Figure 13 illustrates a wide open mouth shape formed by activating segments 1 and E only.
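The figure-by-figure segment groupings above can be restated as a small table; a controller then only needs to toggle the segments that differ between two shapes. The shape names below are descriptive labels added for this sketch, and "3 to C" is read as segments 3-9 plus A, B, C, following the labelling of Figure 8.

```python
# Segment sets for each projected mouth shape, per Figures 9-13.
MOUTH_SHAPES = {
    "neutral":   set("3456789ABC"),   # Figure 9: segments 3 to C
    "teeth":     set("2469ABD"),      # Figure 10: open mouth, teeth visible
    "tongue":    set("2589ABC"),      # Figure 11: lips with tongue
    "modified":  set("259ABD"),       # Figure 12
    "wide_open": set("1E"),           # Figure 13: segments 1 and E only
}

def segments_to_switch(current, target):
    """Segments the control processor 215 must toggle when changing from
    one mouth shape to another (symmetric difference of the two sets)."""
    return MOUTH_SHAPES[current] ^ MOUTH_SHAPES[target]
```

Because only the differing segments are switched, transitions between shapes can be very fast, which helps with the lip-sync problem that motorised lips suffer from.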
  • the sixth invention relates to a music composition system. Examples of known systems are described in US-A-4664010 and US-A-4926737.
  • a music composition system comprises pitch estimation apparatus for monitoring a tonal sequence to estimate the pitch and duration of each note in the sequence; and a music composition module for generating a musical composition based on the estimated note pitches and durations.
  • the sixth invention provides a very convenient way of controlling a music composition module by singing, whistling, humming or playing a tonal sequence on which the music composition module bases its composition. This should be contrasted with known systems in which the tonal sequence must be manually entered.
  • a sung, whistled or hummed tune for the purpose of serving as the theme for a composition.
  • the user enters a tune or melody by humming or singing.
  • Software for pitch estimation also known as pitch detection, pitch tracking and pitch determination
  • the composition module then creates a piece of music based on the theme input by the user. The length of the piece of composed music may be specified to within any desired limits. Further compositions may be created based on the same theme.
  • the key input parameters are stored in case they are required to reproduce the composition at a later date - these parameters include the original theme (after "smoothing" by the input module), the duration, the "style" (see below) and the seed for the random number generator.
  • Figure 14 is a block diagram of the apparatus; and Figure 15 is a flow diagram illustrating the process.
  • the apparatus shown in Figure 14 comprises a microphone 300 coupled to a pitch estimation module 310 for analysing an input tonal sequence so as to identify the pitch of each note and its duration. This module 310 also ensures that a resultant tonal sequence is generated having a minimum length of, for example, eight notes.
  • This tonal sequence is then fed to a composition module 320 which generates a new musical composition from the input tonal sequence, which is then output on an output device such as a loudspeaker 330 and/or stored in a memory (not shown).
  • the tune is input (400) by the user singing, whistling, playing or humming a sequence of notes, any or all of which could be out of tune.
  • the input module 310 functions in real time, estimating (410) the pitch of the user's notes and their duration. Performing the analysis on the input tune in real time avoids the need for excessive amounts of RAM, as the input data for a note is discarded as soon as the pitch and duration of that note have been determined. Numerous pitch estimation techniques and algorithms have been devised over the years.
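One classic family of such techniques is autocorrelation: the lag at which the signal best correlates with a shifted copy of itself gives the period. The sketch below uses the standard library only; the 80-1000 Hz search range and the sample-list representation are assumptions for illustration, and real pitch trackers are considerably more robust.

```python
import math

def estimate_pitch(samples, rate, fmin=80.0, fmax=1000.0):
    """Return the frequency (Hz) whose lag maximises the autocorrelation
    of `samples`, searching lags corresponding to fmin..fmax."""
    best_lag, best_score = 0, 0.0
    for lag in range(int(rate / fmax), int(rate / fmin) + 1):
        # Correlation of the signal with itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate / best_lag if best_lag else None
```

Processing a short buffer per note and then discarding it matches the low-RAM, real-time behaviour described above.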
  • the notes can have eight different durations - the four basic durations are: whole note (semibreve), half note (minim), quarter note (crotchet) and eighth note (quaver), and for each of the four there is also a note whose duration is 50% greater (the so-called "dotted" notes).
  • the starting point for the composition process is a tune (possibly just one note) made up of notes (pitches on a true musical scale) and rests, all quantized to the same duration scale.
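The quantization to this eight-value duration scale can be sketched as follows. Expressing durations in beats with a whole note as four beats is an assumption made for the illustration, not something the patent specifies.

```python
# Eight allowed durations in beats (whole note = 4 beats): the four basic
# note values and their dotted forms, which are 50% longer.
DURATIONS = [0.5, 0.75, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_duration(beats):
    """Snap an estimated note duration to the nearest allowed value."""
    return min(DURATIONS, key=lambda d: abs(d - beats))
```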
  • the module 310 determines how many notes have been input (step 420). If this is fewer than 8, then the sequence must be extended.
  • the input tune or theme on which the program's compositions will be based should ideally consist of at least 8 notes. If the user hums, whistles or sings fewer than 8 notes then the tune can be extended (step 430) to 8 notes or longer. This can be achieved in various ways, such as the following.
  • n1+1 means the note one octave higher than the given note n1, n1-1 means the note one octave lower than the given note n1, ... etc.
  • For a theme of a single note n1:
    n1 n1+1 n1+2 n1+1 n1-1 n1-2 n1-1 n1 (8 notes)
    n1 n1-1 n1-2 n1-1 n1+1 n1+2 n1+1 n1 (8 notes)
    n1 n1 n1+1 n1+1 n1 n1 n1-1 n1-1 n1 n1 (10 notes)
    n1 n1 n1-1 n1-1 n1 n1+1 n1+1 n1 n1 (10 notes)
    n1 n1 n1+1 n1 n1 n1-1 n1 n1 (8 notes)
    n1 n1 n1-1 n1 n1+1 n1 n1 n1 (8 notes)
    n1 n1 n1-1 n1 n1+1 n1 n1 n1 (8 notes)
    n1 n1 n1-1 n1 n1+1 n1 n1 (8 notes)
    n1 n1-1 n1
  • For a theme of two notes n1, n2:
    n1 n2 n1+1 n2+1 n1 n2 n1-1 n2-1 n1 n2 (10 notes)
    n1 n2 n1+1 n2+1 n1+2 n1+1 n2+1 n1 n2 (10 notes)
    n1 n2 n1-1 n2-1 n1-2 n2-2 n1-1 n2-1 n1 n2 (10 notes)
    n1 n2 n1-1 n2-1 n1 n2 n1+1 n2+1 n1 n2 (10 notes)
    n1 n1+1 n2 n2+1 n1 n2 (10 notes)
    n1 n1+1 n2 n2+1 n1 n1 n2 (10 notes)
    n1 n1+1 n2 n2+1 n1 n1-1 n2 n2-1 n1 n2 (10 notes)
    n1 n2 n1-1 n2-1 n1+1 n
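The first single-note pattern above can be sketched in code as follows, reading +1/-1 as octave shifts of 12 semitones applied to MIDI note numbers. That representation, and the choice of which pattern to apply, are illustrative assumptions.

```python
def extend_theme(notes, minimum=8):
    """If fewer than `minimum` notes were input, append the pattern
    n1, n1+1, n1+2, n1+1, n1-1, n1-2, n1-1, n1 built on the first note,
    with +1 meaning one octave (12 semitones) higher."""
    if len(notes) >= minimum:
        return list(notes)
    n1 = notes[0]
    octaves = [0, 1, 2, 1, -1, -2, -1, 0]
    return list(notes) + [n1 + 12 * o for o in octaves]
```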
  • the composition module is a software module and employs one or more algorithms for creating variations on the user's theme (440). Ideally the composition should return to the user's (extended) theme tune at the end of the piece.
  • the general principle of our composition algorithm is that the variations move away from the original theme and then back again. This away-back process might happen more than once during the composed piece, for example the variations could move away in an upward direction (towards higher octaves) , and back, and then away in a downward direction (lower octaves) , and back.
  • the composition is then output (step 450) .
  • the principal method employed for composition uses Markov chaining. This allows music to be composed in a particular "style". The style is codified in Markov chain transfer tables created in the following way.
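A minimal illustration of composing from such a transfer table follows. The table here is a toy example, not one of the patent's actual style tables; storing the seed, as described earlier, is what makes a composition reproducible at a later date.

```python
import random

# Hypothetical transfer table ("style"): each note maps to weighted
# candidate successors, e.g. from note 60 go to 62 three times as often
# as to 64.
STYLE = {
    60: [(62, 3), (64, 1)],
    62: [(60, 2), (64, 2)],
    64: [(62, 3), (60, 1)],
}

def compose(start, length, style=STYLE, seed=0):
    """Generate `length` notes by a weighted random walk over the
    transfer table; the stored seed makes the piece reproducible."""
    rng = random.Random(seed)
    piece = [start]
    while len(piece) < length:
        successors, weights = zip(*style[piece[-1]])
        piece.append(rng.choices(successors, weights=weights)[0])
    return piece
```

Building such a table from a corpus of music in a given style is what "codifies" the style; composition is then just a guided walk through it.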
  • Figure 1 is a schematic block diagram of an example of apparatus for use in any of the first to third inventions;
  • Figure 2 is a flow diagram illustrating operation of an example of the first invention;
  • Figure 3 is a block diagram illustrating a group of toys;
  • Figures 4 and 5 are flow diagrams illustrating different examples of the second invention;
  • Figure 6 is a schematic illustration of an embodiment of apparatus according to the fourth invention;
  • Figure 7 is a schematic side view of the optical arrangement of an example of the fifth invention;
  • Figure 8 is an enlarged plan view of the part of the LCD array of Figure 7 defining a mouth;
  • Figures 9-13 illustrate different projected images;
  • Figure 14 is a block diagram of an embodiment of apparatus according to the sixth invention; and
  • Figure 15 is a flow diagram illustrating an example of a process according to the sixth invention.
  • T2 responds following receipt of an appropriate RF signal addressed to it from T1:
  • T2 checks if its owner is present
  • T1 knows that David is present because it has been talking to him. And T2 knows that David is present because T1 tells him so via RF. But the toys need to know whether T2's owner is present, so T2 asks the question and passes the answer to T1.
  • T2: "Is <name of T2's owner> here?"
  • T2: "That's good." T1: "Yes, I'm real pleased too." else T2: "That's a real pity." T1: "Yes, that's a great shame."
  • If a toy (T2) knows that it is leaving the group, e.g. it decides to move away from the group, leave the room (if feasible), become inactive (go to sleep), ... etc., it should first make a suitable comment:
  • If a toy is uttering something from a script, it should interrupt itself after its next statement or question (i.e. if it is making a statement prior to asking a question then it should stop before the question; if it is asking a question then it should stop after uttering the question). Then, depending on whether the interruption came at the end of a statement or at the end of a question, it should say:
  • If the toy (T1) is not uttering something from a conversation script when another toy enters the group, then either it has just finished a conversation script or action and is ready to start another script or action, or it is in the process of doing something else, such as singing or playing a game. If it has just finished a conversation script and has not started something else (there will be a short pause [1-2 seconds] between the two) then it simply goes through the above introduction process. But if it is doing something then it should interrupt itself just before it is next due to speak:
  • chatterbot conversations can be treated in a similar way, so multi-way conversations are not restricted to completely scripted conversations but can be part scripted and part chatterbot.
  • chatterbot conversations (and "mixed mode" conversations) can be multi-way, and the description below of how conversation leadership changes can apply equally to chatterbot and mixed mode conversations. This concludes the description of what happens when a toy enters a group.
  • the invention [a] Allows the "conversation leader" (the toy currently leading the conversation) to change from one toy to another, ensuring that all toys get a fair crack of the whip. [b] Brings each of the humans into the conversation from time to time. [c] Demonstrates the intelligence of the toys by showing that they understand what the others in the group are saying and they know and remember things about the other toys.
  • Tm chooses to whom it should pass the leadership by counting, for each of the other toys in the group, how many toy utterances have been made in total since that toy last spoke. So for each toy other than Tm there will be a count - these counts serve as weights for randomly selecting the next toy to speak so the toy that has not spoken for the longest will have the greatest chance of being chosen to be the next to speak. Then, if the toy that is chosen to be next to speak has a count of 8 or more, the handing-over toy (Tm) says:
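The weighted hand-over rule above can be sketched as follows. The counts are assumed non-negative; a candidate with a zero count (i.e. one that spoke most recently) is never selected, since its weight is zero.

```python
import random

def choose_next_speaker(counts, rng=random):
    """counts maps each candidate toy (excluding the current leader Tm)
    to the number of toy utterances made since it last spoke. The counts
    serve directly as weights, so the toy that has been silent longest
    has the greatest chance of being chosen next."""
    toys = list(counts)
    return rng.choices(toys, weights=[counts[t] for t in toys])[0]
```

The selection is random rather than strictly greatest-count-first, which keeps the conversation from cycling through the toys in a fixed, predictable order.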
  • the type of data stored in the PDB includes whether the toy or a person likes a particular food. Demonstrating the intelligence and memory of the toys: each toy keeps track of other toys it has met. (A list of their unique identifiers is retained in the EEPROM.) Every now and then a toy (Ta) should ask something of another toy (Tb). If Ta and Tb have not met on a previous occasion then the question should be prefaced by:

Abstract

A speech generating assembly such as a toy comprises an input device, such as a speech recogniser (16); a processing system (11) connected to the input device (16) to receive an input utterance; and a speech synthesizer (13) for outputting an utterance under control of the processing system (11). The processing system (11) operates in accordance with a predetermined algorithm which responds to an input utterance to generate an output utterance to which a single word or short phrase reply is expected via the input device (16), and, in response to that reply, to generate an input utterance for use by the predetermined algorithm.

Description

VOICE CONTROLLED TOY
The invention relates to a number of improvements to toys and the like. Several different inventions are described. Known conversational toys are described in the following US patents: 6110000, 4857030, 6631351, 6089942, 5213510, 6554679, 5752880, 6160986 and 6658388.
FIRST INVENTION A well known novelty device is the "Chatterbot" which produces conversations that lack a continuity of logical thread but are nevertheless quite fun. Many examples of chatterbots can be found on the Internet. They typically comprise a keyboard and a processing system for comparing input text with known templates and for generating an output utterance via a speech synthesizer depending on the result of the comparison. Thus, the user inputs some text (his utterance) - the chatterbot software outputs a response, often as text on a monitor, that appears to be related to the input. For example, if the user inputs: "I love you" the chatterbot might respond "Tell me why you love me." The chatterbot understands nothing - it somehow matches the input (or part of the input) to one of a number of stored utterances and then outputs a response that is somehow related to the matched input utterance. Many chatterbot programs do this matching by looking for patterns of words. Normally, the user interacts with the chatterbot via a keyboard. It would be possible, using a very accurate speech recognizer, for the user's input utterance to be spoken. But speech recognition is not yet sufficiently advanced for this to be done completely reliably. In accordance with a first invention, a speech generating assembly comprises an input device, such as a speech recogniser; a processing system connected to the input device to receive an input utterance; and a speech output device, such as a synthesizer, for outputting an utterance under control of the processing system operating in accordance with a predetermined algorithm which responds to an input utterance to generate an output utterance to which a single word or short phrase reply is expected via the input device, and, in response to that reply, to generate an input utterance for use by the predetermined algorithm.
We have realised that it is possible to enable input speech to be handled in this context by limiting the input which is to be expected. Typically, this will be limited to very short answers, typically single word answers such as "yes" and "no", in response to questions forming part of the output utterances. An alternative to speech recognition for the input method of the user's answer would be a small number of keys or buttons that can be pressed to indicate one of a small number of responses. For example, there could be a "yes" button and a "no" button if all the output questions required "yes" or "no" answers. Or the system could ask a question and request the user to press one button for one answer, another button for another answer, and so on. The invention is particularly suited to novelty devices or toys such as dolls but in certain circumstances could be implemented in other devices such as diagnostic tools, stock control systems and the like. An example of the first invention based on the use of a chatterbot located in a toy will now be described with reference to Figure 1, which is a schematic block diagram, and Figure 2, which is a flow diagram. The primary components of the toy as far as this invention is concerned are shown in Figure 1 and comprise a CPU or processing system 11 coupled to a speech synthesizer 13 which in turn is connected to a loudspeaker 14. The speech synthesizer 13 could be a device that outputs text-to-speech or one that stores and outputs recorded speech. The CPU 11 is also connected to a speech recognizer 16, which may be of conventional form, which receives input from a microphone 17. The CPU 11 is also coupled to a ROM 12, RAM 18 and EEPROM 19 and to an RF transceiver 15, although the transceiver 15 is not necessary for implementing this invention. The ROM 12 stores program code for operating the CPU 11.
The EEPROM 19 stores any data that the toy acquires during its use and which is needed to enhance the subsequent use of the toy, e.g. remembering a child's name so it can be used in conversation. A typical operation will now be described with reference to the flow diagram of Figure 2. Initially, the CPU 11 selects a question from the ROM 12 and controls the speech synthesizer 13 to output the question via the loudspeaker 14 (step 20). This question is selected to have a yes or no answer and could be:
"Do you like eating spaghetti?"
If the user answers "yes", this is received by the microphone 17, converted to suitable data by the speech recognizer 16 and fed to the CPU 11 (step 22). The CPU 11 in conjunction with the program and data in the ROM 12 then converts the answer "yes" to an input utterance such as:
"I like eating spaghetti "
(step 24) while if the user answers "no", that answer is converted by the CPU 11 in conjunction with the program and data in the ROM 12 into:
"J do not like eating spaghetti "
These input utterances are in electronic form and not audible. In response to this input utterance, the predetermined algorithm stored in the ROM 12, which corresponds to a conventional chatterbot algorithm, performs its normal matching process by comparing the input utterance with templates selected from the ROM 12 and then selects an appropriate new output utterance (step 26), and the process returns to step 20. A template could be of the form: "I <verb> <noun>", where the slots can be filled by an appropriate verb and an appropriate noun, either or both of which are taken from the system's previous utterance (which might have been: "Do you <verb> <noun>?", in which case, if the user responds "yes", the single-word response is converted into the sentence "I <verb> <noun>"). So out of a single word or short phrase answer a whole sentence has been created that can be used as the next input utterance for the chatterbot.

As an alternative to matching a whole sentence (the input utterance) against a database of stored utterances, the system could take the form of a network. A network is a collection of linked nodes, each node representing an utterance, and the links from a node to another node being determined by the user's response to that utterance. E.g. the utterance "Do you like spaghetti?" could be a node, with links to two other nodes, one link chosen if the user responds "yes" and the other link chosen if the user responds "no". In such a network each of the possible answers by the user would lead the system to one or more possible next utterances. Typically, the system has six possible next utterances corresponding to each possible answer by the user, thus providing variety of conversation by ensuring that one particular answer to a certain question is not always followed by the same utterance from the toy.
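As a minimal sketch of the template conversion described above (the patent does not specify an implementation language; Python is used here purely for illustration, and the function name is an assumption), a one-word answer can be expanded into a full input utterance using the verb and noun taken from the toy's previous question:

```python
def expand_answer(verb, noun, answer):
    """Expand a "yes"/"no" reply to "Do you <verb> <noun>?" into a sentence.

    The verb and noun are assumed to have been extracted from the
    system's previous utterance, as described in the text.
    """
    if answer == "yes":
        return f"I {verb} {noun}"          # e.g. "I like eating spaghetti"
    if answer == "no":
        return f"I do not {verb} {noun}"   # e.g. "I do not like eating spaghetti"
    return answer  # longer replies pass through as the input utterance

# Example: the toy asked "Do you like eating spaghetti?" and the child said "yes"
utterance = expand_answer("like eating", "spaghetti", "yes")
```

The expanded sentence then serves as the next input utterance for the chatterbot's normal template-matching process.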
For example, if the user responds "no" to the question "do you like spaghetti", the system might choose its next utterance from the following list: "Do you prefer macaroni?", "Is that because your mother forced you to eat it when you were very young?", "Have you ever seen a cat eating spaghetti?", ... etc. Advantages of employing such a network are the lack of need for linguistic knowledge about the user's responses and a corresponding reduction in the amount of memory space needed by the system. It would be desirable if the toy could say to the user
"What shall we talk about today?" and use its speech recognizer to understand the answer. But speech recognition technology is not sufficiently advanced to cope. The task could be made much easier in accordance with the first invention by having the toy say: "What shall we talk about today - cats, dogs or rabbits?" Then the speech recognition software could pick out the correct word, but the choice of topics would be limited and we would pretty soon run out of scripts and have to recycle them. The system can allow the user a very wide rein in his choice of topic. The toy can ask the user to "think of something" and to write down the word. Then the toy plays a game commonly known as "Hangman" with the user to determine the word. (This might start: "Is the letter E in your word? " , ... and so on.) Once the toy knows the word it uses that word in the first few utterances of a chatterbot conversation. All nouns can be divided into a fixed number of categories for example the 25 categories (known as "unique beginners") described in "WordNet (Language, Speech and Communication)", edited by Christine Fellbaum, MIT Press, such that it is possible to devise sentences appropriate for words in any category without knowing what the word is. For example, one category might be "foods", in which case a valid sentence would be: "Do you like eating <word> ? "
By knowing what the word is the toy knows the word category and therefore can start a chatterbot conversation with a sensible utterance. It will be understood that although typically the conversation will be between the speech generating assembly and a human, the conversation could also be between two speech generating assemblies such as two toy dolls.
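A sketch of the category-based opening utterance described above; the category table and sentence frames here are invented for illustration (the text envisages a fixed set of categories such as WordNet's 25 "unique beginners"):

```python
# Illustrative only: a tiny noun-to-category lookup and per-category
# sentence frames. A real system would cover all nouns and categories.
CATEGORY_OF = {"spaghetti": "food", "rabbit": "animal"}
OPENERS = {
    "food": "Do you like eating {word}?",
    "animal": "Have you ever stroked a {word}?",
}

def opening_utterance(word):
    """Pick a category-appropriate opener once the word is known
    (e.g. after the Hangman game has revealed it)."""
    category = CATEGORY_OF.get(word, "thing")
    frame = OPENERS.get(category, "Tell me about {word}.")
    return frame.format(word=word)
```

Because the frame depends only on the category, sensible openers can be prepared in advance without knowing which word the child will choose.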
SECOND INVENTION

US-A-4857030 describes a system of conversing dolls in which a number of dolls can take part in a multi-way conversation. However, this prior art does not solve the problem of how to have more than two toys taking an active part in a conversation. Their approach is to have just two toys as "speaker" and "responder" while "the third, fourth or more dolls which are present during a sequence will be listeners and provide occasional responses".

In accordance with a second invention, a speech generating assembly comprises a speech output device, such as a synthesizer; a processor for implementing scripted speech; and an information signal receiving and transmitting system for receiving information signals from, and transmitting information signals to, other speech generating assemblies, the processor being adapted to act as a conversation leader for a period or for a number of utterances determined in accordance with a predetermined algorithm and then to generate a transfer information signal to transfer conversation leadership to another speech generating assembly.

We have developed a new type of assembly which utilizes a known conversational software technique called "scripts". There must be some commonality between the scripts in each assembly, and it is only those scripts that exist in both/all assemblies in the conversation that can be used for that conversation. But there could be additional scripts in one or more of the assemblies that are not common to all of them.

"Conversation leadership" means that the conversation leader controls the other speech generating assemblies in a group by issuing suitable control information signals to the assemblies, these signals usually being addressed to specific assemblies. Under this control, the speech synthesizers or other speech output devices in the group are controlled by the conversation leader via the respective processors to output utterances according to the script.
The conversation leader decides to whom or what it should be addressing its next utterance. This decision is based on a probabilistic process such as the one mentioned below. The probabilistic process is such that every active assembly, typically a toy, is given a fair crack of the whip in terms of how often it says something, and every human in the room is similarly treated fairly in terms of how often he/she is addressed by a toy.

As well as control information signals and the uttered speech itself, speech information signals will typically also be transmitted defining the speech generated by the assembly, after which another assembly can respond according to the script. Typically, a script represents how a conversation might develop between a toy and a child. This might consist of phrases spoken by the toy followed by replies made by the child (in the simplest case). There will be a single point at which the conversation will begin but there may be a large number of ways in which the conversation will end, depending on the responses of the child or the "mood" of the toy. More formally, a script is a means of
"programming" the toy's conversations. The most important building blocks of a script are referred to here as "nodes" and "arcs" which are now described briefly. A node typically represents something said by the toy or an alteration in the toy's mood. (For example the toy becomes less happy) . It always contains something that occurs at a particular point in the conversation. Nodes are the most basic building blocks of a conversation. An arc joins two nodes together to produce a conversation path between them. For example, imagine that we have two nodes. In one the toy says "What colour do you like best - is i t red?" and in the other the toy says ""I like red too . " . A correct conversation path would consist of the first node followed by the child answering "yes", followed by the second node. An arc consisting of the word "yes" produces the required path of conversation. Of course, there may be more than one arc joining "What colour do you like best - is i t red? " to one or two other nodes - one arc for "no" and perhaps one arc for when the child does not reply. This makes an arc fundamentally different from a node. A node describes a single event that can occur at a particular point in the conversation, whereas there may be many arcs coming from a node representing the many different paths that could occur at that point . Scripts are normally employed by a toy having a conversation with a human. In this second invention, however, we have developed this idea to enable toys or other speech generating assemblies to converse with one another by introducing the concept of a "conversation leader" which takes control of the scripted conversation but then relinquishes that control. 
Conversation leadership could be passed after the conversation leader has uttered a predetermined number of utterances or after a certain period of time or some other defined circumstances such as when a new assembly is brought into the conversation by being physically brought into the vicinity of other assemblies. Typically, the period is determined in accordance with a probability function such that as the number of successive utterances for which a particular toy was the conversation leader increases, there is an increased probability that conversation leadership will be passed to another toy. For example, after a particular toy has been conversation leader for one utterance there might be a 10% probability of it passing leadership to another toy prior to the next utterance; after it has been conversation leader for two successive toy utterances the probability of it passing leadership prior to the next toy utterance will be higher, say 40%; after it has been conversation leader for three successive toy utterances that probability rises further, say to 70%; after four successive toy utterances
(or whatever maximum has been programmed) the change of leadership is mandatory.

One or more sensors may be provided which enable the toy not always to stick exactly to the conversation plan. If it is more urgent to say something else, it may do so. For example, if it is suddenly turned upside-down the processor may stop whatever it was controlling the speech generating assembly to say and cause it to utter a suitable exclamation. Other examples include a change in temperature or a change in position relative to a user.

An example of the second invention will now be described with reference to Figure 1, Figure 3 which is a block diagram of a group of toys, and Figures 4 and 5 which are flow diagrams. Figure 3 illustrates a group of three toys Tm, Tn, Tx together with one user 30. Each of the toys has an internal structure similar to that shown in Figure 1 and so this will not be described again. However, instead of the chatterbot software described above in connection with the first invention, the ROM 12 stores software for carrying out scripted conversations as will be described further below.

Assuming that one of the toys has been selected as the conversation leader, for example toy Tm, it will then undertake a scripted conversation with the other toys in the vicinity and/or the user 30. This will occur in a conventional manner with the toy issuing an output utterance (step 32), receiving a reply (step 34) and, if appropriate, outputting a further utterance. In order to pass conversation leadership to another toy, the conversation leader maintains a count, in this example, of the number of output utterances which it makes, so that after each step 32 that count is incremented (step 36). If the count is not greater than a threshold T (step 38), the conversation leader retains leadership. However, if the count exceeds the threshold T, then the conversation leader passes leadership on to another toy.
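The escalating handover probability described above (in the example given: 10% after one utterance as leader, 40% after two, 70% after three, and a mandatory handover after the programmed maximum of four) can be sketched as follows; the function name is an assumption for illustration:

```python
import random

# Probabilities from the example in the text: the longer a toy has been
# conversation leader, the more likely it is to hand leadership over.
HANDOVER_PROBABILITY = {1: 0.10, 2: 0.40, 3: 0.70}
MAX_UTTERANCES = 4  # "or whatever maximum has been programmed"

def should_hand_over(utterances_as_leader):
    """Decide, before the next utterance, whether to pass leadership."""
    if utterances_as_leader >= MAX_UTTERANCES:
        return True  # change of leadership is mandatory
    p = HANDOVER_PROBABILITY.get(utterances_as_leader, 0.0)
    return random.random() < p
```

The count-and-threshold variant of Figure 4 (steps 36 and 38) is the degenerate case in which the probability is 0 below the threshold and 1 above it.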
In order to achieve this control, as well as outputting audible speech utterances, the CPU 11 controls the RF transceiver 15 to generate a corresponding RF signal which is received by the transceivers 15 in other toys. Similarly, RF signals representing speech output by other toys are generated by corresponding transceivers 15 and detected by the RF transceiver 15 in the conversation leader. Thus, although the toys "speak" using text-to-speech or pre-recorded speech and "hear" what the user says using speech recognition, it is only the RF data that enables each toy to know what has been said by the other toys, to receive commands such as which toy should be the next to speak, and to obtain knowledge of the presence of other toys.

A particular advantage of text-to-speech is that it allows for the possibility of the user (or others) upgrading the product by adding new scripts, since new scripts might well call for the use of vocabulary that is not employed in the original scripts and therefore, if pre-stored speech was used, the toy would not be able to say the "new" words.

The exact method of transmission and reception of the RF data could be primitive, for example that used in radio controlled toy cars and similar products. Such RF systems typically transmit one of a small repertoire of commands, such as data corresponding to "backwards", "forwards", "left" or "right". It may be desirable for the toys to transmit more data than can conveniently be handled by a very low cost RF transmitter/receiver system 15 or infrared. Therefore, instead of a very low cost RF transmitter/receiver system that requires no integrated circuits, the toys may employ a Micrel MICRF102 transmitter IC and a Micrel MICRF001 receiver IC. Alternatively the transmitter and receiver ICs could be integrated into a single transceiver IC such as the Xemics XE1201. The transmission frequency is chosen so as to permit the use of unlicensed frequency bands, e.g. 433 MHz.
In another example, Tm is the conversation leader and outputs an utterance via speech synthesizer 13 addressed to Tx (step 40, Figure 5), which will be accompanied by an equivalently addressed RF information signal from the transceiver 15 in Tm. The processor 11 of Tx will determine that the RF signal is addressed to it and use the RF signal to determine its scripted response. This is output via its speech synthesizer 13 and a corresponding RF signal by its transceiver 15 (step 42). On receiving the response RF signal, Tm increments a counter for Tx by one (step 44).

When toy Tm decides to hand over leadership (step 46) to another toy Tn (as determined using a probability function or otherwise as described above), it does so immediately after a response to a question. Tm chooses to whom it should pass the leadership by counting, as explained above, for each of the other toys in the group, how many toy utterances have been made in total since that toy last spoke, and receiving the counts (step 48). So for each toy other than Tm there will be a count - these counts, which are stored in the RAM 18, serve as weights for selecting the next toy to speak, so the toy that has not spoken for the longest will have the greatest chance of being chosen to be the next to speak. Then, if the toy that is chosen (step 50) to be next to speak has a count of 8 or some other pre-set threshold, the handing-over toy Tm says:
"What do you think about that Tn ? You haven ' t said anything for a while . "
Then Tn says: "Very interesting"
and takes its turn as conversation leader. If the handover takes place while a script is in progress, the new conversation leader continues with that script. Otherwise the new conversation leader acts in accordance with preprogrammed commands to determine its behaviour. Of course, this is how it appears to an observer. In fact, suitable RF signals are being exchanged between the toys, in which the conversation leader sends (step 52) an RF signal addressed to Tn instructing Tn to become the next conversation leader.

In some cases, humans 30 can be brought into the conversation from time to time. In one example, there is a 90% probability that a new conversation leader will address its next question to the same human to whom the previous question was addressed. If it does so, there is a 50% probability on each subsequent utterance that it addresses the same human for the next one, subject to a maximum of 5 successive questions to the same human. When the toy asks a question of a human it assumes that the answer will come from that human. Certainly the answer will not come from another toy, because no toy will speak until it is told to by the "conversation leader". In addition, when a toy speaks it sends an RF signal to all the other (switched on) toys in the room to indicate that it is saying something and what it is saying. So there is no way a toy can believe that something is said to it by another toy when the speaker was a human. If a human interjects and speaks while a toy is speaking, the other toy(s) in the room will ignore what the human says, picking up only on the utterance of the other toy that was indicated by the RF message. If there are two or more humans in the room and a human other than the one spoken to responds to the toy, the toy will still assume that the responder is the human it spoke to.
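The count-weighted choice of the next conversation leader can be sketched as follows; the counts are the per-toy "utterances since that toy last spoke" values described above, used here directly as selection weights (the function name is an assumption):

```python
import random

def choose_next_leader(silence_counts):
    """silence_counts: {toy_id: toy utterances since that toy last spoke}.

    The counts serve as weights, so the toy that has been silent
    longest has the greatest (but not certain) chance of being chosen.
    """
    toys = list(silence_counts)
    weights = [silence_counts[t] for t in toys]
    if sum(weights) == 0:
        return random.choice(toys)  # everyone just spoke: pick uniformly
    return random.choices(toys, weights=weights, k=1)[0]
```

For example, a toy with a count of 8 is chosen over a toy that has just spoken (count 0) every time, while intermediate counts produce proportionate chances.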
In the 10% of cases where a new conversation leader does not address its next question to the same human to whom the previous question was addressed, the new conversation leader will choose between addressing a different human (if one is known to be in the room) or addressing another toy. The probabilities of making each of these choices are based on giving each human a fair crack of the whip in terms of being a person addressed by a toy, and on giving each toy a fair crack of the whip in terms of being addressed by a toy. This probabilistic process operated by the conversation leader(s) and the actual probabilities employed in this process are such as to cause, over a long period of time, each toy to be the conversation leader for approximately the same number of utterances and each human to be the addressee of approximately the same number of utterances. Each toy Tm, Tn, Tx in the group is programmed to assume that if a toy leaves the group then its human has also left the group and should not be addressed again during the current conversation unless it re-enters the group.

In an example of the invention when applied to toys, each packet of data carries a unique identification number for the toy transmitting the data and the corresponding numbers for the toy or toys destined to receive the data. If a toy receives a data packet addressed to it, it will return a "data received" message. If one or more of the toys destined to receive a data packet fail to do so for whatever reason, the transmitting toy will not receive a "data received" message from that destination toy and will retransmit the data one or more times. After a predetermined limit has been reached on the number of transmissions, for example 10, the transmitting toy assumes that the particular destination toy has been taken out of the room and/or is otherwise no longer involved in the activities of the transmitting toy and has quit the conversation.
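The acknowledge-and-retransmit scheme just described can be sketched as below; the `transmit` callback standing in for the RF hardware, and the `departed` set recording toys presumed to have left, are assumptions for illustration:

```python
MAX_TRANSMISSIONS = 10  # retry limit from the example in the text

def send_packet(packet, transmit, departed):
    """Send one addressed packet, retrying until acknowledged.

    `transmit(packet)` is a stand-in for the RF hardware: it returns
    True if a "data received" message came back from the destination.
    After MAX_TRANSMISSIONS failures, the destination toy is assumed
    to have been taken out of the room and to have quit the conversation.
    """
    for _ in range(MAX_TRANSMISSIONS):
        if transmit(packet):
            return True
    departed.add(packet["dest"])
    return False
```

Packets also carry the sender's unique identification number, so the receiving toy knows to whom to address the "data received" reply.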
Each speech generating assembly or toy in a group will be individually identified using a unique identifier, each toy maintaining a record of other toys and humans it has met and information it has learned about them during its conversations, which data is stored in the EEPROM 19 and allows the toys to demonstrate some intelligence.

The allocation of unique identifiers can be achieved as follows. In the factory a toy typically goes through various test procedures. These could include pressing buttons (keys). The time delay between successive keystrokes is used to set the unique identifier. For example, the number of microseconds (modulo 255) between the first and second keystrokes is the first byte of a 4-byte identifier. The number of microseconds between the second and third keystrokes provides the second byte, and so on. (Of course it does not have to be successive keystrokes.) The resulting 4-byte number is stored in the EEPROM 19 so it is retained when power is removed, e.g. when the batteries are changed.

All or part of this 4-byte number can be used in various ways, for example in an algorithm for generating names, so that each toy could have a different name. The algorithm requires a random number generator and so the 4-byte number is used as the seed. This allows us to have more than 4 billion different names. Similarly, each toy can have a different set of likes and dislikes, and different personality characteristics, all determined from the 4-byte "unique" identifier. Of course, the identifiers cannot be guaranteed to be absolutely unique - it is possible that two toys will have the same 4-byte number.
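The keystroke-timing identifier and its use as a random seed for name generation might be sketched as follows; the syllable list and function names are invented for illustration (the text only specifies the modulo-255 byte derivation and the seed idea):

```python
import random

def make_identifier(gaps_us):
    """Derive a 4-byte identifier from four inter-keystroke delays in
    microseconds, each taken modulo 255 as in the example in the text."""
    return bytes(g % 255 for g in gaps_us[:4])

def toy_name(identifier, syllables=("ba", "lo", "mi", "ta", "zu")):
    """Generate a name deterministically from the identifier by using
    it as the seed of a random number generator."""
    rng = random.Random(int.from_bytes(identifier, "big"))
    return "".join(rng.choice(syllables) for _ in range(3)).capitalize()
```

Because the name is a deterministic function of the seed, the same toy always generates the same name, while different identifiers spread over more than 4 billion possible seeds.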
It is also possible that different seeds will occasionally produce the same name, or the same set of personality characteristics. But it will happen so rarely that to all intents and purposes each toy can be guaranteed to be different from every other toy. If any toy asks a question of any human user and the answer provides data that should be stored in that user's PDB ("Person Database") in the EEPROM 19 (which could be flash ROM), e.g. whether the user likes a particular food, then the toy owned by that user is the one that stores it - the other toys do not need to store this data.

An important aspect of the invention is that it is preferable that in all of the conversations, those involving only one toy and one human and those involving more than one toy and/or more than one human, what the toy says is, in general, not random, but depends on various factors including its mood and what it has experienced in the past. One example of the relevance of past experience is the way that the toy can remember (by storing data in a memory "store") things such as what virtual food it ate for breakfast, when was the last time that it virtually ate bananas, when was the last time the user played tic-tac-toe with it and what was the result.

Each toy's database of information about itself is preset in the factory (after the 4-byte number is created) with certain information about that particular toy. The 4-byte number, or part of it, is used as the seed for a random number generator which in turn provides a list of, for example, the foods and drinks that that toy "likes" and those it doesn't like. This information is stored in the EEPROM 19 so it is retained when the batteries are changed. During the course of some scripted conversations the toy asks questions such as: "What are you going to give me for lunch? Spaghetti, chicken or hamburgers?"
When the user's answer is known, that information is stored in the database so that the toy can use it in subsequent conversations, for example: "Oh no. Not again. You've given me hamburgers three days running."

Each toy normally knows the name of its owner. This can be given to the toy in the shop when it is purchased, or by an adult before giving the toy to a child, or by the owner of the toy. The owner's name is learned by playing the "hangman" game in which the toy guesses the letters in the owner's name until the whole name is revealed.

It will be appreciated that each of the inventions can be implemented independently of the other or in one or more combinations. For example, one type of conversation switches back and forth between scripted conversation and chatterbot conversation. The conversation may start off being scripted. After a few exchanges between user and toy the software switches to chatterbot mode, normally using one of the nouns in its most recent utterance as a key word in the first of its chatterbot utterances. This creates a measure of continuity. After a few exchanges in chatterbot mode the toy might say something like:
"But I 'm getting away from the point .
And then the toy returns to scripted mode, where the script continues from where it left off.

We have also developed some variations on scripted conversations. One way of achieving variation is to have points in the conversation script where there are different options as to which arc to choose next. This means that at the same point in a conversation a toy might say one thing on one occasion and something completely different on another occasion. More variety is created within a script utterance by having the toy say essentially the same thing in many different ways. Here is a simple example. Perm one choice from the first line (there are 5 choices, including the null choice) with one from the second line, one from the third line, etc. Note that here a blank space after a "/" symbol means saying nothing from this particular section.
The {little/big/fat/thin/ } {tabby/brown/black/ginger/ } {cat/kitten} {sat/was sitting/was standing} on {the mat/the rug/the carpet}.
This sentence can be uttered in 5×5×2×3×3 = 450 different ways in all. One use of this method is that the software could choose to be terse (selecting utterances of shorter lengths: "The cat sat on the mat") or loquacious (selecting one of the longest utterances: "The little tabby kitten was sitting on the mat"), depending on the toy's "mood" at that time, or for some other reason.

In preferred aspects of the first and second inventions, the assembly further comprises a store known as its Personal Data Base (PDB) for storing information such as "likes and dislikes" and details of past experiences, such as what virtual food it ate for breakfast, when was the last time it virtually ate bananas, when was the last time the user played tic-tac-toe with it and what was the result, etc. This information can also be used to determine a mood of the toy, which can be used either alone or with the other information to determine the next output utterance. This will now be described in more detail below.

A toy's personality is exhibited through its behaviour which, in turn, is governed by its moods which, in their turn, are elicited by its emotions. Each emotion is associated with one or more moods. Each mood is responsible for one or more behaviours - a behaviour can be an action of some sort or the execution of a script, including varying what is said in the script according to the toy's mood and/or its past experiences. The three factors in a personality make-up are called, in our model, Pleasure (P), Arousal (A) and Dominance (D). For our purposes Pleasure assumes its usual meaning - if something happens which the toy likes then the value of P rises. This could occur when the toy eats his favourite food (simultaneously increasing its arousal level) or when he plays a game that he enjoys or meets a friend. Arousal measures how easily the toy is mentally aroused by complex, changing and/or unexpected events. When a toy is asleep or tired it has a low level of arousal.
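The 450-way variation template given earlier ("The {little/big/fat/thin/ } {tabby/brown/black/ginger/ } {cat/kitten} ...") can be sketched with a Cartesian product; an empty string stands for the null choice of saying nothing from that section:

```python
from itertools import product

# The sections of the example template; "" is the null choice.
SECTIONS = [
    ["little", "big", "fat", "thin", ""],
    ["tabby", "brown", "black", "ginger", ""],
    ["cat", "kitten"],
    ["sat", "was sitting", "was standing"],
    ["on the mat", "on the rug", "on the carpet"],
]

def all_variants():
    """Yield every way of uttering the sentence (5 x 5 x 2 x 3 x 3 = 450)."""
    for choice in product(*SECTIONS):
        words = ["The"] + [w for w in choice if w]  # drop null choices
        yield " ".join(words) + "."

variants = list(all_variants())
```

A terse toy would pick one of the shortest variants, a loquacious one of the longest, according to its mood.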
Similarly when it is in a darkened, quiet room. When it is given something stimulating to do, such as play a game or sing a song, or when it is in a noisy and well lit environment, its level of arousal increases. Dominance measures a feeling of control and influence over others, relative to feelings of being controlled by others. A toy's dominance will rise, for example, when it wins a game against the human user or against another toy, or when it persuades the child to agree to something after the child first tried to refuse it.

We measure each of the three parameters P, A and D on a scale of -100 to +100. For a toy's basic personality, which never changes, the value of P is a randomly chosen integer in the range -10 ≤ P ≤ +80, the value of A is an integer randomly chosen in the range -50 ≤ A ≤ +80 and the value of D is an integer randomly chosen in the range -50 ≤ D ≤ +80. This gives 91×131×131 possible combinations, i.e. approximately 1.56 million different basic personalities.

How quickly or slowly each of these values (P, A and D) changes according to events is governed by the toy's temperament. A toy with a volatile temperament may become very unhappy indeed as a result of one unpleasant experience (such as being given its least favourite food for dinner) and may then suddenly change to a euphoric mood when it is taken to see its favourite animal at the zoo. In contrast, a phlegmatic toy may require 20 pleasant events to change its mood from neutral to euphoric and then
40 unpleasant ones to change it from euphoric to desolate. All of the above parameters are determined from the toy's unique 4-byte identifier.

A toy's temperament is governed by its "step size" when modifying the values of P, A and D. Step size is a percentage of the distance, from where on the scale of -100 to +100 the parameter is currently valued, to the extreme end in the direction of the change being made. For example, if P is currently -10 and the step size for Pleasure is 20%, then a single positive step is 20% × 110 = 22 (the number 110 being the difference between -10 and +100). Some occurrences will require two or even three steps of change - in these cases the change for the first step is calculated, then the change for the second step after the first has been made, then (if necessary) the change for the third step after the second has been made. A completely manic toy has a Pleasure step size of 100%, so just a single step is necessary to make it euphoric or desolate. The step sizes should be generated separately for the P, A and D of each toy. An example is:
8% of toys have a step size of 1% for a particular parameter; 8% of toys have a step size of 2%; 8% a step size of 3%; 8% each have step sizes of 4%, 5%, 6%, 7%, 8%, 9%, 10%, 20% and 30%; and 4% have a step size of 100%.

At the start of its life the toy has its basic values of P, A and D. During use the values of P, A and D undergo temporary changes and it is these changed values which determine, at any moment in time, the toy's current mood. When the toy is powered on but not being used, the temporary values of P, A and D decay, moving towards their basic values. When the toy powers down the values continue to decay but at a slower rate. After a certain amount of inactivity the toy's mood subsides and returns to that corresponding to the parameter values of its basic personality.

The toy has certain likes and dislikes which are stored in its Personal Data Base (PDB). Associated with each "like" and "dislike" there is a measure of how much it is liked or disliked: +1 means "I like it"; +2 means "I like it a lot"; +3 means "I'm crazy about it". Conversely -1, -2 and -3 mean, respectively, "I dislike it", "I dislike it a lot" and "I hate it". 0 means "I'm neutral about it". If one of the toy's "likes" occurs, the value of P increases by 1, 2 or 3 steps as appropriate, while if a "dislike" occurs the value of P is reduced by 1, 2 or 3 steps.

In addition to the likes and dislikes stored in the PDB there are other things which act as likes and dislikes. For example, if the child does something the toy has asked him to do, for example playing a game with the user, the toy likes the fact that its request has been acceded to and P increases by 1 step. (But not if there is already an increase in P due to the requested action being on the "like" list in the PDB.) Certain things arouse the toy. These include, as defined by the appropriate scripts, playing games, singing, creating a poem, etc. - almost anything which is intellectually stimulating, including making conversation.
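The step-size rule, in which each step moves a parameter a fixed percentage of its remaining distance to the relevant extreme (+100 or -100) and multi-step changes recompute after each step, can be sketched as (function and parameter names are assumptions):

```python
def apply_steps(value, step_pct, direction, n_steps):
    """Move a P, A or D value by n_steps of the toy's step size.

    direction is +1 (toward +100) or -1 (toward -100). Each step is
    step_pct percent of the remaining distance to that extreme,
    recomputed after the previous step, as described in the text.
    """
    for _ in range(n_steps):
        extreme = 100 * direction
        value += step_pct / 100.0 * abs(extreme - value) * direction
    return value
```

With the example from the text: P at -10 and a Pleasure step size of 20% gives a single positive step of 20% × 110 = 22, moving P to 12; a manic toy with a 100% step size reaches the extreme in one step.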
If more than one human is involved in the conversation the level of arousal increases further, one step for each human, as it does for every other toy that joins the conversation. As indicated above, when the toy is asleep (as in "I feel like taking a nap") its arousal decreases by three steps - being asleep is not the same as when it is powered down and the arousal level decays. Similarly, when the toy is tired from too much activity its arousal level decreases. Note that if the child takes a long time to respond to the toy (anything in excess of 10 seconds) the toy's arousal level is reduced by one step. (This reduction is faster than when the toy is powered down, because when the toy is active it expects quick responses from the child whereas when it is powered down it expects no response.)

In this aspect, at any time when the toy is switched on it is either "awake" or "asleep". If it is awake it acts as though it is awake, carrying on conversations, playing games, etc. From time to time the toy will simulate tiredness. This happens according to a probabilistic process but is linked to the time of day (it is more likely to simulate tiredness late in the day than it is in the morning) and to how long it has been active (i.e. how long since it last "woke up") and possibly to other factors such as boredom (a mood). When it simulates tiredness it may say something such as: "I feel sleepy right now. Good night." or something else that is appropriate, and then it will simulate falling asleep.

To determine the time of day, there is a timer, most usually on the CPU chip 11, which (possibly together with components external to the CPU, such as a watch crystal running at approximately 32 kHz) enables the toy to know what time it is. This data (the current time and date) is re-set whenever the batteries are changed. When it is simulating sleep the toy can be programmed sometimes to make snoring sounds.
It can also simulate dreaming in its sleep - when it dreams it talks in its sleep, sometimes even carrying on a conversation while asleep (though in a higher pitched or lower pitched voice in order to differentiate the awake and asleep conditions). Certain things affect the toy's Dominance level.
Winning a game increases Dominance because winning is a dominating result. Getting its own way increases a toy's dominance - the harder it had to try to get its own way, the more its dominance increases when the child gives in.

The decay rate for each of the parameters P, A and D depends on the toy's temperament. An example of a decay rate is √(step size)% for every minute after the first minute that the toy is asleep or not being used while powered up. Thus, if a toy has a step size for P of 5% then its decay rate for P while it is powered up will be √5 % ≈ 2.24% per minute. When the toy is powered down the decay rate for each parameter is halved.

One advantageous use of this decay effect is that when the toy simulates waking up, for example by starting a normal conversation, having been asleep, it can comment that it feels better or not so angry any more, according to how much change of mood there has been since it fell asleep. This is measured by computing how near the toy's mood is to certain undesirable states using the sum of squares of the differences:

(P-P')² + (A-A')² + (D-D')²

between the current values of P, A and D and the values corresponding to the mood in question. This nearness measure is recomputed when the toy wakes up, and the toy then picks the undesirable mood corresponding to the maximum change of distance (sum of squares). If this change of distance is more than 1,200 then it is worth mentioning, otherwise not. If the change of mood is not worthy of mention then the toy can simply yawn and comment that it had a nice nap.

For each mood we specify the values of P, A and D which correspond to the point in P-A-D space on which that mood is centred. In some cases there is more than one such point, and hence more than one triple of values. We also include a list of the emotions that can be elicited by that mood. Each emotion is a point in 3-space, the dimensions being P, A and D. The current mood is also a point in the same 3-space.
We therefore determine the current mood by computing the sum of squares distance between the current values of P, A and D and those of each of the emotions - the current mood is the one corresponding to the lowest sum-of-squares distance. Examples of mood definitions are:
Angry        P=-51   A=59    D=25
Bored        P=-65   A=-62   D=-33
Curious      P=22    A=62    D=-1
Dignified    P=55    A=22    D=61
Elated       P=50    A=42    D=23
Hungry       P=-44   A=14    D=-21
Inhibited    P=-54   A=-4    D=-41
Loved        P=87    A=54    D=-18
Puzzled      P=-41   A=48    D=-33
Sleepy       P=20    A=-70   D=-44
Unconcerned  P=-13   A=-41   D=8
Violent      P=-50   A=62    D=38
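The nearest-mood rule and the decay-rate formula described above can be sketched together as follows. The mood centres are taken from the table; the function names are our own.

```python
import math

# Mood centres in P-A-D space, from the table above.
MOODS = {
    "Angry": (-51, 59, 25),       "Bored": (-65, -62, -33),
    "Curious": (22, 62, -1),      "Dignified": (55, 22, 61),
    "Elated": (50, 42, 23),       "Hungry": (-44, 14, -21),
    "Inhibited": (-54, -4, -41),  "Loved": (87, 54, -18),
    "Puzzled": (-41, 48, -33),    "Sleepy": (20, -70, -44),
    "Unconcerned": (-13, -41, 8), "Violent": (-50, 62, 38),
}

def dist2(p, a, d, centre):
    """Sum-of-squares distance (P-P')^2 + (A-A')^2 + (D-D')^2."""
    cp, ca, cd = centre
    return (p - cp) ** 2 + (a - ca) ** 2 + (d - cd) ** 2

def current_mood(p, a, d):
    """The mood whose centre is nearest the current (P, A, D) point."""
    return min(MOODS, key=lambda name: dist2(p, a, d, MOODS[name]))

def decay_rate(step_size, powered_down=False):
    """sqrt(step size) percentage points per minute, halved when powered down."""
    rate = math.sqrt(step_size)
    return rate / 2 if powered_down else rate

assert current_mood(-51, 59, 25) == "Angry"
assert current_mood(80, 50, -20) == "Loved"
assert abs(decay_rate(5) - 2.24) < 0.01   # the 2.24%-per-minute example above
```

The same `dist2` measure serves both for classifying the current mood and for the wake-up "change of distance" check against the 1,200 threshold.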
Appendix 1 provides more examples of multi-way conversations.
THIRD INVENTION Another reason why the system of US 4857030 limited conversation to principal participants was that they wanted to avoid confusion or "collisions" between the spoken words of the doll. In accordance with a third invention, a speech generating assembly comprises a speech output device, such as a speech synthesizer; a processor for controlling the speech output device; and an information signal receiving and transmitting system for receiving information signals from, and transmitting information signals to, other speech generating assemblies, the processor being adapted to exchange information signals with one or more other speech generating assemblies when it wishes to output an utterance, the processor controlling the speech output device to output an utterance when the information signal exchange indicates that the or each other assembly is not outputting an utterance. The assembly is particularly applicable for use in toys. Preferably, the information signal receiving and transmitting system in the second and third inventions transmits and receives signals using RF, infrared or ultrasound. It is an important aspect of the invention that the RF (or other medium) communication between assemblies should not be hampered by "collisions" between data packets being transmitted by more than one assembly. This is achieved using a system that is logically similar to the Ethernet. The Ethernet is a standard originated by the IEEE for physically connecting computers into a local area network, together with a communications protocol that allows those computers to share data. This standard is described in specification documents IEEE 802.1E (March 1991) and IEEE 802.1B (August 1992), as well as (for example) "Ethernet: The Definitive Guide" by Charles E. Spurgeon, published by O'Reilly and Associates. The Ethernet standard dictates the communications protocol, i.e. how connected computers send data.
Computers linked by Ethernet send data along the wire in small chunks called "packets". One may think of a packet as a suitcase, tagged to travel to a different city, moving along a conveyor belt in an airport. In addition to the data itself, each packet carries a destination address and the sending computer's "home" address. The Ethernet interface uses a protocol called Carrier Sense Multiple Access with Collision Detection (CSMA/CD) to send packets of data. The computer first detects a lull in activity - in the suitcase analogy this might be a gap during which there is no suitcase on the conveyor belt. When such a lull is detected the computer (or in our case the transmitting toy) transmits a data packet. Every time a packet reaches its destination toy (and there may be more than one destination toy for which the packet is intended) the sending toy is sent a confirmation data packet by the destination toy, while the sending toy waits for a gap to open that allows it to transmit another packet of data. In practice the amount of data transmitted in a package may well be such that only one packet of data is required to get the entire message from the sending toy to the destination toy(s). In an Ethernet configuration computers and other devices "along the way" read the destination address(es) for the data. In the case of assemblies such as toys according to the invention there will normally be no connection between toys by means of a wire, so every toy that is active and close enough to the sending toy to be able to do so will receive every data packet, but only those toys to which a data packet is addressed will "capture" that data - all other toys receiving that data will ignore it. As explained above, a problem would arise if two or more toys transmitted data at the same time. In the luggage analogy this problem would correspond to two suitcases being placed in the same gap on the conveyor belt at the same time, resulting in a "collision".
A data collision would normally result in the loss of both (or all) data packets, for the moment, and the requirement to retransmit all the data. In an Ethernet configuration the sending computer attempts to read the data it has sent in order to ensure that no collision has occurred. If the data it receives is not identical to the data it sent, the sending computer then transmits data that can be recognized by the other computers in the configuration as garbage. This alerts the other sending computer(s) to the fact that two or more computers are attempting to send data simultaneously. When this has been accomplished, those computers that had been sending data simultaneously immediately stop their transmissions and each waits for a random period of time before re-transmitting their data. By making this delay random the chances of a second successive collision are significantly decreased. This approach helps prevent network gridlock. In the case of speech generating assemblies according to the third invention, such as toys, the sending toy attempts to receive the data it has transmitted in order to ensure that no collision has occurred. If the data it receives is not identical to the data it sent, the sending toy then transmits garbage data that can be recognized as such by the other toys in the vicinity. This alerts the other sending toy(s) to the fact that two or more toys are attempting to transmit data simultaneously. When this has been accomplished, those toys that had been sending data simultaneously, which may include the toy sending garbage data, immediately stop their transmissions and each waits for a random period of time, probably of the order of tens of milliseconds, before re-transmitting their data and speaking the corresponding utterances. By making this delay random the chances of a second successive collision are significantly decreased.
This process continues until all of the data from the sending toy has been successfully received by all of the intended recipients or until the upper limit is reached on the number of times such data is sent before the sending toy assumes that an intended recipient is no longer in the room and switched on.
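The collision-handling loop described above can be sketched as follows. The hardware callbacks (`send`, `receive_own`, `jam`) stand in for the RF, infrared or ultrasound transceiver, and the retry limit `MAX_ATTEMPTS` is an assumed value; the patent leaves the upper limit open.

```python
import random
import time

MAX_ATTEMPTS = 8   # assumed upper limit before giving up on a recipient

def transmit(packet, send, receive_own, jam):
    """Sketch of the collision-handling loop: send, read back what was sent,
    jam and back off randomly on a mismatch, and retry up to MAX_ATTEMPTS."""
    for _ in range(MAX_ATTEMPTS):
        send(packet)
        if receive_own() == packet:            # read back matches: no collision
            return True
        jam()                                  # transmit recognisable garbage
        time.sleep(random.uniform(0.01, 0.05)) # random back-off, tens of ms
    return False   # assume the recipient is no longer in the room and switched on
```

A toy would call `transmit` once per data packet; a `False` result corresponds to the give-up condition at the end of the paragraph above.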
FOURTH INVENTION This invention relates to a method of transferring data, and more particularly but not exclusively to such a method which is suitable for transferring data from a data source to a plaything such as a toy or educational aid. It is well known to transfer data from a data source to a product such as a computer using a telecommunications link, with the data source and product communicating via two-way signal transfer. To achieve this, it is necessary to use equipment such as a modem or the like at each end of the telecommunications link, to modulate a signal for transmission at either end and to de-modulate the signal at the receiving end, for use. Higher speed data transfer means have been developed, such as so-called ISDN connections. Such data transmission apparatus are expensive and generally not simple to use, particularly by children. It is well known that children tire of their toys and playthings. Rather than replacing toys, it is desirable for toys to be modified to provide additional interest for a child. For example, it is desirable for a toy which is able to perform a plurality of simple tasks, e.g. to utter several phrases, to be modified to utter alternative phrases. Alternatively it is desirable to be able to modify a toy which is programmed with a plurality of questions and answers by changing the questions and answers with which the toy is programmed. Toys are known in which modification is achieved, for example, by inserting alternative instruction cards, electronic memory cartridges or the like, but the modification which can be made is restricted by the availability of such alternative instruction cards or cartridges. Whereas it would be possible to provide a toy which is inherently capable of performing a greater number of tasks, this increases the initial expense of the toy, and there can be play value in a child being able to modify a toy. To modify a toy using conventional data transfer technology, e.g.
by downloading data over a telecommunications link, would be prohibitively expensive and, due to the nature of the necessary equipment, too complicated for at least a young child to perform. According to a first aspect of the fourth invention, we provide a method of transferring data from a data source to a product which is capable of using the data to perform a task, the method comprising establishing a link between the data source and the product, sending a signal including the data along the link from the data source to the product, and, when the product indicates to an operator that the data has been received, the operator manually terminating the link. Thus, there is no need for the product and the data source to "handshake", i.e. to have two-way communication. Accordingly the need for a modem at the end of the communications link where the product is located, which is capable of modulating a signal for transmission on the link, is avoided. This is because the method relies on an operator terminating the link when the data has been received by the product. A signal needs only to be sent along the link from the data source to the product, there being no need for the product to send any signal along the link to the data source. Thus a very simple method of data transfer is provided, which may readily be performed even by a young child using a telephone or computer apparatus. Even though the volume and nature of the data transferred may be restricted by virtue of the one-way signal and the nature of the apparatus, a sufficiently large packet of data may be transferred to modify at least a simple toy or plaything. Preferably, to assist an operator in knowing when the data has been received by the product, the product has means to indicate to an operator when the data from the data source has successfully been received, so that the operator may then manually terminate the link.
Such means may include a simple visual indicator and/or an audible indicator. To enable the method to be performed using a simple telephone apparatus such as a telephone handset, the product may include a transducer so that the product may receive and subsequently use data transmitted by the telephone apparatus as an audible signal or induced signal, and the method may include placing a telephone apparatus in the vicinity of the product whereby the signal sent by the data source is received by the transducer. Typically the product includes a memory for the data, and the data received by the product is stored in the memory for subsequent use in performing a task, so that it is unnecessary for the data to be used live as it is transferred. The data received by the product may replace data already stored in memory, so that the product will thereafter only be capable of performing tasks according to the new data received, or the data may additionally be stored in memory along with existing data, so that the product may perform additional tasks on receipt of the data. To enable the product to receive multiple packets of data without the need for two-way data communication between the data source and the product, the data to be transferred may be sent in discrete packets, and the method may thus include manually signalling the data source using a telephone apparatus when one data packet has been received by the product, whereby the data source then sends another data packet. For example, a signal may be sent by the operator to the data source using a touch tone key of a telephone handset. More generally, the transmission of data from the data source along the telecommunications link may be controlled, at the end of the link where the product is located, solely by a telephone apparatus, by means of which a telecommunications link is established, maintained and terminated.
Although the product may have an interpreter to interpret the data received, and the data may include one or more specific commands, in a simple arrangement the product may have a memory, preferably a read-only memory, in which there is contained information necessary to enable the product to perform a plurality of tasks, and the data may include further information relevant to at least one of the tasks, whereby when the product has received the data e.g. in a random access memory or an EEPROM, the product is enabled to perform at least one of the plurality of tasks. In one example the product may be a toy which is capable of performing a plurality of tasks, a set of instructions for the performance of each task being stored in the memory thereof, and the data transferred from the data source activating at least one of the sets of instructions so that the toy is enabled to perform at least one of the tasks, or the data transferred from the data source may activate a plurality of sets of instructions, and the toy is enabled to perform a plurality of the tasks, e.g. in an order specified in the transferred data. According to a second aspect of the fourth invention, we provide an apparatus for performing the method of the first aspect of the invention including a product which is capable of using the data to perform a task, a data source, and a link between the data source and the product, the product having signal receiving means for receiving a signal from the data source and no means for sending a signal along the link to the data source, and the link including apparatus by means of which, when the data is received by the product, the link may manually be terminated. The invention will now be described with reference to Figure 6 which is a schematic illustration of an apparatus for use in performing the method of the invention. 
Referring to the drawing, there is shown a product 110 which in this example is a child's soft toy, a telephone apparatus 111 including a telephone handset 112 and base unit 114, which is connected to a conventional telecommunications network 115. Also connected to the network 115 is a source of data 116. In this example the toy 110 has a memory 118 in which is contained information which, when a child operates a control 119, may be used by the toy 110 to utter phrases. However the toy 110 is enabled to use only a small portion of the stored information, so that only selected ones of the phrases for which information is stored in the memory 118 may be uttered. When it is desired to change the selection of phrases, a telecommunications link may be made with the data source 116 using the telephone apparatus 111 and the network 115. Even a young child may use the telephone apparatus 111 to call the data source 116. When the link is established, the data source 116 may commence sending a signal which contains a data packet to the telephone apparatus 111. The child may place the handset 112 in the vicinity of the toy 110 so that the audible signal transmitted by the handset 112 may be received by a sound transducer 120 of the toy 110, or a signal may be induced otherwise in an appropriate transducer of the toy 110. The data source 116 is arranged repeatedly to send the signal, there being no connection established between the product 110 and the data source 116 which would enable the product to signal the data source 116. The signal may be sent by the data source 116 until the telecommunications link is broken, e.g. by the child replacing the handset 112 on the base unit 114. Alternatively, the signal may be sent for a predetermined time only, and the operator would have to press a selected touch tone key 122 if it is desired for the data to be re-sent, if the data has not successfully been received by the toy 110.
When the toy 110 has received a complete packet of data, the toy 110 may indicate this to the child by issuing an audible tone, uttering an appropriate phrase or the like, or by some visual indicating means. In response, the child may terminate the telecommunications link, e.g. by replacing the telephone handset 112 on its base 114. The toy 110 may be arranged to indicate to an operator that a complete packet of data has not been received, by issuing an alternative indication. The data packet received by the toy 110 may include an instruction to enable the toy 110 to utter alternative or additional phrases, e.g. when the control 119 is operated or at any other time, utilising the information already stored in the memory 118. Thus the toy 110 may be modified to perform additional or alternative tasks, e.g. phrase utterings, upon receipt of the data, and the data transfer is controlled entirely by the operation of the telephone apparatus 111. For a more complex toy 110 or other product, it may be desired to transfer multiple packets of data from the data source 116 to the product 110 to modify the tasks which the product 110 is able to carry out. This may be achieved by the data source 116 repeatedly sending a first data packet until the product 110 indicates that the packet has been received. Then the operator may signal the data source 116 subsequently repeatedly to send an alternative packet of data, such signalling being achieved solely using the telephone 111, e.g. by pressing a selected touch tone key 122 thereof, to send a selected signal, being a tone, to the data source 116.
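The operator-driven multi-packet transfer just described can be walked through in a short sketch. The source repeats the current packet until the toy indicates receipt, a touch-tone key press requests the next packet, and the operator hangs up at the end; all function names here are our own stand-ins for those manual and acoustic steps.

```python
# Illustrative walk-through of the operator-driven multi-packet transfer.
# toy_receive, indicate, press_next_key and hang_up stand in for the toy's
# transducer, its indication means, and the operator's telephone actions.

def transfer(packets, toy_receive, indicate, press_next_key, hang_up):
    for i, packet in enumerate(packets):
        while not toy_receive(packet):   # source repeats until received intact
            pass
        indicate(i)                      # toy beeps or utters a phrase
        if i < len(packets) - 1:
            press_next_key()             # operator requests the next packet
    hang_up()                            # operator manually terminates the link
```

Note that the only "upstream" signals are the operator's key presses and the final hang-up; the toy itself never transmits anything to the data source.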
When the alternative packet of data has been received by the product 110, further data packets may be transferred from the data source 116 to the product 110 in the same way, and when a last data packet has been received by the product 110, the product 110 may indicate this to an operator, e.g. by issuing a special tone or other audible or visual signal, so that the operator may then terminate the telecommunications link. The transferred data may indicate the order in which additional/alternative tasks are to be performed. Thus each task may be part of an overall task, and by varying the order in which tasks are to be performed, a large number of alternative overall tasks may be accomplished. Instead of the toy 110 or other product having information already programmed therein which is activated by the transferred data, it will be appreciated that the transferred data may include information required by the product to perform a task, such as commands in an appropriate computer language. Thus the product 110 may require an interpreter to interpret the transferred data to enable the product to use the data to perform tasks. Depending on the nature of the product 110, the tasks which the product 110 may be enabled to perform by the transferred data may include not only the uttering of phrases, but also movements, e.g. of limbs or a head of the toy 110, or the tasks may be the visual and/or audible presentation of information, which may be in question and answer form. The product 110 may alternatively be an effector means for use in an industrial or consumer environment, the performance of the effector means being changed by the received data to suit the effector for a particular application. Particularly but not exclusively where the product 110 is a toy, by manipulating the telephone apparatus 111, an operator may arrange for particular data to be made available to third parties from the data source 116.
For example, a child satisfied with the modified performance of his toy 110 may, using a series of touch tones on the telephone apparatus 111, arrange for the same data transferred to his toy to be made available to a friend who establishes a telecommunications link between the data source 116 and the friend's own toy, rather than the friend having different data transferred to him from the data source 116. The data source 116, using a telephone apparatus 111, could be arranged to transfer data to each of a group of two or more products 110 at different locations, to enable a particular shared game to be played in the different locations, using a product 110 in each of the different locations. Although the invention has been described in relation to a product 110 comprising a toy, the invention may be applied for transferring data from a data source 116 to many alternative kinds of product, to modify the performance of the product to enable the product to perform additional, alternative or modified tasks. If desired, instead of the product 110 receiving an audible signal via the telephone apparatus 111, any other non-connecting coupler may be provided, such as a telephone induction coupler of the kind used for hearing aids, so that a signal may be induced in a suitable transducer of the product 110, other than a sound transducer 120, when the telephone apparatus 111, or at least a handset thereof, is placed in the vicinity of the toy 110 or other product. Although the invention has been described with reference to a telephone apparatus 111 which is physically connected to a telecommunications network 115, a mobile telephone could of course be used, by which we include radio telephones, which are connected to a telecommunications network over the ether.
Although the data source 116 may be a central databank to which each of a plurality of telecommunication connections may be made, if desired the data source 116 may be a second toy or other product, so that data can be transferred between two toys by the method of the invention. In that event, the toy 110 or other product sending data would require a suitable output transducer so that data may be sent on the telecommunications network 115, and the receiving product would require a suitable input transducer to receive the data. In this latter context, if appropriate, the two products may electronically "handshake", so that manual intervention in the data transfer process may not be required. This invention is particularly suitable for enabling new scripts (as previously described) or other data to be downloaded to a toy or other interactive device. In this context, when a user wishes to download a new script, he places the toy so that its microphone is near to his PC's loudspeaker if the download is from a PC, or adjacent a telephone apparatus. The toy is set to "receive script" mode, whereupon it says something like:
"Please click on DOWNLOAD"
The user then clicks on DOWNLOAD and a series of tones is heard from the loudspeaker. These tones are decoded by software in the toy. If the script data has been received intact the toy utters an appropriate message such as:
"Operation complete. Thank you. Please click on OK" If the script data has not been received intact the toy utters a message asking the user to click on certain button icons on the screen - there will be one button for each block of data, and the user is asked to click on those corresponding to the block(s) of data that did not arrive intact. The process thus does not require a modem or acoustic couplers. In another example, data could be downloaded using tones output by the loudspeaker(s) of a PC. In this case, a user can indicate an error by using the keyboard, mouse or other input device on the PC.
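One way such a one-way tone download could work is sketched below. The patent does not specify the encoding, so this assumes a simple scheme of our own: each byte is sent as a pair of tones (one per nibble), and a trailing checksum byte lets the toy decide whether to report "Operation complete" or ask for a block to be re-sent.

```python
# Hypothetical tone encoding for the one-way download: one frequency per
# nibble plus a checksum byte. The frequencies and checksum rule are
# illustrative assumptions, not taken from the patent.

NIBBLE_FREQS = [600 + 100 * n for n in range(16)]   # 600 Hz .. 2100 Hz

def encode(data: bytes):
    """Turn data (plus a checksum byte) into a list of tone frequencies."""
    tones = []
    for byte in data + bytes([sum(data) % 256]):
        tones.append(NIBBLE_FREQS[byte >> 4])
        tones.append(NIBBLE_FREQS[byte & 0x0F])
    return tones

def decode(tones):
    """Map each tone to the nearest nibble frequency and verify the checksum.
    Returns the payload, or None if the toy should ask for a re-send."""
    nibbles = [min(range(16), key=lambda n: abs(NIBBLE_FREQS[n] - f))
               for f in tones]
    data = bytes((hi << 4) | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))
    body, checksum = data[:-1], data[-1]
    return body if sum(body) % 256 == checksum else None

assert decode(encode(b"new script")) == b"new script"
```

A `None` result corresponds to the toy asking the user to click the button for the block that did not arrive intact.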
FIFTH INVENTION In the field of toys and other animated products, it is common to provide a synthesized face with moving lips and possibly blinking eyes in conjunction with uttered speech. A problem with lips and eyes that move under motor control is that they create noise. Another problem is that it is difficult to achieve good synchronization between the toy's moving lips and the speech being uttered, because the motor controlled lip movements tend to be slower than is required. In accordance with the fifth invention, there is provided an apparatus for generating an image of a face or one or more parts thereof, the apparatus comprising an image generator having a plurality of individually actuable elements that can be controlled to generate the desired image; a controller for controlling the elements of the image generator in synchronism with speech generated by a speech synthesizer or with prerecorded speech; and an optical system for focussing a representation of the image generated by the image generator onto a face surface. The elements typically define different mouth shapes such as the form of lips, but alternatively or in addition may define different eye shapes, nose shapes, ear shapes, cheek shapes, eyebrow shapes or shapes of other parts of a face. A particular advantage of this invention is that the facial expressions can be much more accurately matched with uttered speech, often on a phoneme by phoneme basis. In one embodiment of the fifth invention, the elements of the image generator are luminous. For example, each element may be a light emitting diode or, alternatively, it may be an incandescent light source. In another embodiment, the elements of the image generator may be liquid crystal elements and the image generator may further comprise a light source, such as a light emitting diode. In this case, each liquid crystal element will typically form part of a liquid crystal display.
The elements of the image generator may be arranged to form a dot matrix in order that any desired shape can be generated. In the event that the image generator comprises liquid crystal elements then these may be mounted on a transmissive substrate and the light source may be disposed so that light emitted by it shines through the substrate and any deactivated liquid crystal elements. Alternatively, the liquid crystal elements may be mounted on a reflective substrate and the light source may be disposed so that light emitted by it that passes through any deactivated liquid crystal elements is reflected by the substrate back through those elements. An example of a synthesized face projection system according to the fifth invention will now be described with reference to the accompanying drawings, in which: Figure 7 is a schematic side view of the optical arrangement; Figure 8 is an enlarged plan view of the part of the LCD array of Figure 7 defining a mouth; and Figures 9-13 illustrate different projected images. The system shown in Figure 7 comprises a light source 211, such as a bright LED, located behind a liquid crystal display (LCD) array 212 with a lens 216 between them. The array 212 is preferably a high contrast transmissive LCD. On the other side of the LCD array 212 is provided a lens system 213 which refocuses the image generated by the
LCD array 212 so that it appears in much enlarged form on the surface of a screen 214. The screen 214 is of a type which is normally opaque but which allows the projected image to be seen through it. The distance between the lens
213 and the LCD array 212 can be adjusted to change the image size as necessary. The construction of the part of the LCD array 212 defining a mouth is shown in more detail in Figure 8. As can be seen, the LCD has a set of segments (labelled 1-9 and A-E) defining various mouth or lip configurations.
Each of these segments is individually controllable by a control processor 215. In Figure 9, a neutral mouth shape is exhibited by activating segments 3 to C. Figure 10 shows an open mouth with teeth visible by activating segments 2, 4, 6, 9, A, B and D. Figure 11 illustrates the lips in conjunction with a tongue by activating segments 2, 5, 8, 9, A, B and C. Figure 12 is a modification of the Figure 5 display, using segments 2, 5, 9, A, B and D. Finally, Figure 13 illustrates a wide open mouth shape formed by activating segments 1 and E only.
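The segment activation described above can be sketched as a lookup from mouth shape to segment set, with the controller selecting a shape per phoneme in synchronism with the synthesized speech. The segment sets follow Figures 9-13; `drive_segments` and the shape names are our own.

```python
# Sketch of driving the individually controllable LCD segments of Figure 8.
# Segment labels 1-9 and A-E follow Figure 8; the mapping below follows the
# shape descriptions for Figures 9-13. Function names are our own.

MOUTH_SHAPES = {
    "neutral":    set("3456789ABC"),  # Figure 9: segments 3 to C
    "open_teeth": set("2469ABD"),     # Figure 10
    "tongue":     set("2589ABC"),     # Figure 11
    "wide_open":  set("1E"),          # Figure 13
}

SEGMENTS = "123456789ABCDE"

def drive_segments(shape, set_segment):
    """Activate exactly the segments belonging to the requested mouth shape."""
    for seg in SEGMENTS:
        set_segment(seg, seg in MOUTH_SHAPES[shape])

state = {}
drive_segments("wide_open", lambda seg, on: state.__setitem__(seg, on))
assert [s for s in SEGMENTS if state[s]] == ["1", "E"]
```

Because the segments switch electronically rather than mechanically, shape changes can keep pace with speech on a phoneme-by-phoneme basis, which is the advantage noted above over motor-driven lips.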
SIXTH INVENTION The sixth invention relates to a music composition system. Examples of known systems are described in US-A-4664010 and US-A-4926737. In accordance with the sixth invention, a music composition system comprises pitch estimation apparatus for monitoring a tonal sequence to estimate the pitch and duration of each note in the sequence; and a music composition module for generating a musical composition based on the estimated note pitches and durations. The sixth invention provides a very convenient way of controlling a music composition module by singing, whistling, humming or playing a tonal sequence on which the music composition module bases its composition. This should be contrasted with known systems in which the tonal sequence must be manually entered. It also makes possible the capture and use of a sung, whistled or hummed tune for the purpose of serving as the theme for a composition. In a typical example, the user enters a tune or melody by humming or singing. Software for pitch estimation (also known as pitch detection, pitch tracking and pitch determination) estimates the pitch of each note and its duration. If the number of notes is insufficient to provide the composition module with a useful starting point, the note sequence is extended. The composition module then creates a piece of music based on the theme input by the user. The length of the piece of composed music may be specified to within any desired limits. Further compositions may be created based on the same theme. The key input parameters are stored in case they are required to reproduce the composition at a later date - these parameters include the original theme (after "smoothing" by the input module), the duration, the "style" (see below) and the seed for the random number generator.
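The point about storing the key input parameters (theme, duration, style and RNG seed) is that the composition becomes reproducible. A minimal sketch, with `compose` as a toy stand-in for the real composition module:

```python
import random

# Sketch of reproducing a composition from stored parameters. compose() is a
# deliberately trivial stand-in for the real composition module; the point
# is that the same seed yields the same composition later.

def compose(theme, duration, style, seed):
    rng = random.Random(seed)            # same seed => same random choices
    notes = list(theme)
    while len(notes) < duration:
        notes.append(rng.choice(theme))  # placeholder for real composition
    return notes

params = {"theme": [60, 62, 64], "duration": 8, "style": "waltz", "seed": 42}
assert compose(**params) == compose(**params)   # reproducible at a later date
```

Storing only these parameters, rather than the finished piece, keeps the memory footprint small while still allowing the composition to be regenerated on demand.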
An example of a music composition system according to the sixth invention will now be described with reference to the accompanying drawings, in which: Figure 14 is a block diagram of the apparatus; and Figure 15 is a flow diagram illustrating the process. The apparatus shown in Figure 14 comprises a microphone 300 coupled to a pitch estimation module 310 for analysing an input tonal sequence so as to identify the pitch of each note and its duration. This module 310 also ensures that a resultant tonal sequence is generated having a minimum length of, for example, eight notes. This tonal sequence is then fed to a composition module 320 which generates a new musical composition from the input tonal sequence, which is then output on an output device such as a loudspeaker 330 and/or stored in a memory (not shown). The operation of the apparatus will now be described in more detail.
INPUT MODULE The tune is input (400) by the user singing, whistling, playing or humming a sequence of notes, any or all of which could be out of tune. Ideally the input module 310 functions in real time, estimating (410) the pitch of the user's notes and their duration. Performing the analysis on the input tune in real time avoids the need for excessive amounts of RAM, as the input data for a note is discarded as soon as the pitch and duration of that note have been determined. Numerous pitch estimation techniques and algorithms have been devised over the years. Various techniques are described in: Gold & Rabiner (1969): Parallel Processing Techniques for Estimating Pitch Periods of Speech in the Time Domain; Carey, Parris & Tattersall (1997): Pitch Estimation of Singing for re-Synthesis and Musical Transcription; McNab, Smith, Bainbridge and Witten (1997): The New Zealand Digital Library MELody inDEX; Talkin (1995): A Robust Algorithm for Pitch Tracking (RAPT) in Kleijn & Paliwal (Eds.) "Speech Coding and Synthesis", Elsevier. The publication by McNab et al. includes a description of adaptive tuning, which is often necessary because users will often hum or sing notes that are (at least slightly) out of tune. When all of the user's input notes have been adaptively tuned, the next stage is to apply an adaptive technique to the note durations. In our preferred embodiment, the notes can have eight different durations: the four basic durations are whole note (semibreve), half note (minim), quarter note (crotchet) and eighth note (quaver), and for each of the four there is also a note whose duration is 50% greater (the so-called "dotted" notes).
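The specification defers to the cited literature rather than prescribing a particular pitch estimation algorithm. As an illustration only, a minimal time-domain autocorrelation estimator of the kind surveyed in those references might look like the sketch below; the function name, the 0.3 voicing threshold and the 80-1000 Hz search range are assumptions, not taken from the source.

```python
import math

def estimate_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of one audio frame by
    autocorrelation. Returns the pitch in Hz, or None if the frame
    appears unvoiced (assumed thresholds, for illustration only)."""
    lo = int(sample_rate / fmax)          # shortest candidate period in samples
    hi = int(sample_rate / fmin)          # longest candidate period in samples
    energy = sum(s * s for s in frame)
    if energy == 0:
        return None
    best_lag, best_score = None, 0.0
    for lag in range(lo, min(hi, len(frame) - 1)):
        # correlation of the frame with a lagged copy of itself
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None or best_score < 0.3 * energy:  # crude voicing test
        return None
    return sample_rate / best_lag
```

Running this note by note, and discarding each frame once its pitch and duration are known, is what keeps the RAM requirement small, as the passage above notes.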
An Adaptive Technique for Note Duration Without having any reference duration for the notes (there are no time signatures or any associated data) it is not possible to use the same type of adaptive technique on the durations as is employed on the pitches. The following method has the advantage of being fairly simple to compute.
1. Assume that the whole note has a duration of 0.2 secs. Calculate the other seven note durations accordingly. For each of the user's actual note durations compute the difference in duration between the note itself and the closest of the eight "standard" durations, then divide this difference by the duration of the closest standard and square the result. Do this for all the user's notes and sum the squares. (For speed of arithmetic calculations, it is possible to scale everything so that integer arithmetic can be used.)
2. Perform the same calculation for whole note durations of 0.21, 0.22, ..., 1 second.
3. Choose the whole note duration for which the sum of squares of the differences is a minimum.
4. If the duration of any user note is less than half a quaver assume it is an error and change the pitch of the note to that of the preceding or following note, whichever is the closest in pitch to the errant note.
5. Adjust the duration of each rest to whichever duration (out of the eight) represents the smallest fractional error (i.e. difference/standard).
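Steps 1 to 3, together with the final snapping of each note to the chosen grid, can be sketched as follows. This is a non-authoritative reading of the method above: the 0.01-second search step comes from step 2, while the function and variable names are illustrative assumptions.

```python
def quantize_durations(durations, lo=0.2, hi=1.0, step=0.01):
    """Find the whole-note duration whose eight-value grid best fits
    the observed note durations (least sum of squared fractional
    errors), then snap each note to that grid."""
    def grid(whole):
        base = [whole, whole / 2, whole / 4, whole / 8]   # semibreve..quaver
        return sorted(base + [1.5 * b for b in base])     # plus the dotted forms

    best_whole, best_cost = None, float("inf")
    w = lo
    while w <= hi + 1e-9:                 # step 2: try 0.2, 0.21, ..., 1.0
        g = grid(w)
        cost = 0.0
        for d in durations:
            nearest = min(g, key=lambda s: abs(d - s))
            cost += ((d - nearest) / nearest) ** 2        # step 1: squared fractional error
        if cost < best_cost:              # step 3: keep the minimum
            best_whole, best_cost = w, cost
        w += step
    g = grid(best_whole)
    return best_whole, [min(g, key=lambda s: abs(d - s)) for d in durations]
```

Rests (step 5) can be quantized with the same nearest-standard rule; step 4's handling of very short notes is omitted here for brevity.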
The starting point for the composition process is a tune (possibly just one note) made up of notes (pitches on a true musical scale) and rests, all quantized to the same duration scale.
Extending the User's Tune The module 310 then determines how many notes have been input (step 420). If this is less than 8, then the sequence must be extended. The input tune or theme on which the program's compositions will be based should ideally consist of at least 8 notes. If the user hums, whistles or sings fewer than 8 notes then the tune can be extended (step 430) to 8 notes or longer. This can be achieved in various ways, such as the following. (Here "+1" means one octave higher, so n1+1 means the note one octave higher than the given note n1; similarly "-1" means one octave lower, and n1-1 means the note one octave lower than the given note n1, etc.)
If the user's tune is just 1 note (n1), choose at random from the following:
n1 n1+1 n1+2 n1+1 n1-1 n1-2 n1-1 n1 (8 notes)
n1 n1-1 n1-2 n1-1 n1+1 n1+2 n1+1 n1 (8 notes)
n1 n1 n1+1 n1+1 n1 n1 n1-1 n1-1 n1 n1 (10 notes)
n1 n1 n1-1 n1-1 n1 n1 n1+1 n1+1 n1 n1 (10 notes)
n1 n1 n1+1 n1 n1 n1-1 n1 n1 (8 notes)
n1 n1 n1-1 n1 n1 n1+1 n1 n1 (8 notes)
n1 n1-1 n1-2 n1-2 n1-1 n1+1 n1+2 n1+2 n1+1 n1 (10 notes)
n1 n1+1 n1+2 n1+2 n1+1 n1-1 n1-2 n1-2 n1-1 n1 (10 notes)
If the user's tune is 2 notes (n1, n2), choose at random from the following:
n1 n2 n1+1 n2+1 n1 n2 n1-1 n2-1 n1 n2 (10 notes)
n1 n2 n1+1 n2+1 n1+2 n2+2 n1+1 n2+1 n1 n2 (10 notes)
n1 n2 n1-1 n2-1 n1-2 n2-2 n1-1 n2-1 n1 n2 (10 notes)
n1 n2 n1-1 n2-1 n1 n2 n1+1 n2+1 n1 n2 (10 notes)
n1 n1+1 n2 n2+1 n1 n1-1 n2 n2-1 n1 n2 (10 notes)
n1 n2 n1-1 n2-1 n1+1 n2+1 n1 n2 (8 notes)
n1 n2 n1+1 n2+1 n1-1 n2-1 n1 n2 (8 notes)
n1 n2 n1 n2 n1 n2 n1 n2 (8 notes)
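The extension patterns listed above can be sketched as follows, assuming notes are held as MIDI note numbers so that one octave is 12 semitones. Only a subset of the single-note patterns is encoded, longer tunes are handled by a generic octave-echo in the spirit of the two-note patterns, and the function name is an illustrative assumption.

```python
import random

def extend_tune(notes, target=8):
    """Extend a short input tune to at least `target` notes using
    octave-shifted patterns like those listed above (+1 octave = +12
    semitones). `notes` are MIDI note numbers."""
    if len(notes) >= target:
        return list(notes)
    if len(notes) == 1:
        n1 = notes[0]
        patterns = [                                  # octave shifts relative to n1
            [0, +1, +2, +1, -1, -2, -1, 0],           # 8-note form
            [0, -1, -2, -1, +1, +2, +1, 0],           # 8-note form
            [0, 0, +1, +1, 0, 0, -1, -1, 0, 0],       # 10-note form
            [0, 0, -1, -1, 0, 0, +1, +1, 0, 0],       # 10-note form
        ]
        return [n1 + 12 * s for s in random.choice(patterns)]
    # Two or more notes: echo the tune an octave up, then an octave
    # down, then return to the original pitch, until long enough.
    out = list(notes)
    for shift in (+12, -12, 0):
        if len(out) >= target:
            break
        out.extend(n + shift for n in notes)
    return out
```

For a two-note input this reproduces the pattern n1 n2 n1+1 n2+1 n1-1 n2-1 n1 n2 from the list above.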
And so on. THE COMPOSITION MODULE The composition module is a software module and employs one or more algorithms for creating variations on the user's theme (440). Ideally the composition should return to the user's (extended) theme tune at the end of the piece. The general principle of our composition algorithm is that the variations move away from the original theme and then back again. This away-back process might happen more than once during the composed piece; for example the variations could move away in an upward direction (towards higher octaves), and back, and then away in a downward direction (lower octaves), and back. The composition is then output (step 450). The principal method employed for composition uses Markov chaining. This allows music to be composed in a particular "style". The style is codified in Markov chain transfer tables created in the following way.
Creating Markov Chain Tables for Note Pitches In order to emulate the style of a given composer it is first necessary to acquire several pieces of music by that composer, for example songs. The pitches and durations of the notes in several compositions form the basis of the Markov tables. Such data can be found, for example, in electronic form, in MIDI files. The data required from each composition is the pitch of each note in the principal "voice", the duration of each note and the duration of each rest. Let us first consider just the pitches of the first notes in each verse. Using the normal 12-note scale (A A# B C C# D D# E F F# G G# [then back to A]) we count how often a verse of the song starts with each of the 12 notes, normalize the frequencies and list the 12 notes of the scale each with their normalized frequency. We refer to this list as the "start table". Then, for each of the notes in the scale, count how often (over the whole collection of songs) that note is followed by each of the 12 notes in the scale. Once again we then normalize the frequencies and list, for each of the 12 notes, the 12 notes which can follow them together with their normalized frequencies. Thus for "A" we will have a table with the 12 notes: A A# B C ..., each with its normalized frequency. Then we perform the same process on note pairs. For each pair of notes we count how often that pair is followed by each of the 12 notes in the scale, normalizing the frequencies. Thus for the note pair "A A" we will have a table with the 12 notes: A A# B C ..., each with its normalized frequency. Then perform the same process on note triples. This process can be extended to sequences of 4 or more notes if desired. At the end of this process we have, in effect, a table with 12x12x12x12 entries, each entry being a 4-note sequence together with its frequency of occurrence. In practice, the table is bigger than 12x12x12x12 because we must allow for the null note in a few places, e.g. at the end of the composition. If we did not have to take the user's theme into account we could now compose a piece of music in the style of the given composer as follows. Choose a note from the start table in accordance with the frequencies therein. (Use middle-C as the reference point.)
Then use the 12x12x12x12 table to choose the second note of the composition in accordance with how often the first note (already chosen) is followed by each of the possible second notes; if the note chosen by this method is, for example, a D, then the software chooses the D nearest to the first note so as to avoid large jumps between successive notes. Then choose the third note by referring again to the 12x12x12x12 table, looking to see how often the first two notes of the composition (already chosen) are followed by each of the 12 notes. And so on. When the duration of the composition approaches the specified limit the algorithm should work towards bringing about the end of the piece. There are numerous ways in which this can be achieved, for example by steering the choice of notes towards a 4-note sequence ending with the null note. It should be appreciated that any known music composition module could be used in the present invention. Examples of methods of writing computer programs to compose music are found in the literature, e.g. "The Algorithmic Composer" by David Cope, A-R Editions, 2000, and "Experiments in Musical Intelligence" by David Cope, A-R Editions, 1996.
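The table-building and note-selection process described above can be sketched with a first-order Markov chain. The source uses contexts of up to 4 notes (the 12x12x12x12 table); a single-note context is shown here for brevity, and all identifiers are illustrative assumptions.

```python
import random
from collections import defaultdict

def build_tables(melodies):
    """Build the "start table" and a first-order transition table
    from a collection of melodies (lists of pitch-class names)."""
    start = defaultdict(int)
    trans = defaultdict(lambda: defaultdict(int))
    for m in melodies:
        start[m[0]] += 1                  # count which note opens each melody
        for a, b in zip(m, m[1:]):
            trans[a][b] += 1              # count note-to-note transitions
    return start, trans

def weighted_choice(counts, rng):
    """Pick a key from `counts` with probability proportional to its count
    (equivalent to normalizing the frequencies as described above)."""
    total = sum(counts.values())
    r = rng.random() * total
    for note, c in counts.items():
        r -= c
        if r <= 0:
            return note
    return next(iter(counts))

def compose(start, trans, length, seed=0):
    """Compose `length` notes in the learned style. Storing the seed
    lets the same composition be reproduced later, as the passage on
    key input parameters explains."""
    rng = random.Random(seed)
    tune = [weighted_choice(start, rng)]
    while len(tune) < length:
        follows = trans.get(tune[-1])
        if not follows:                   # dead end: fall back to the start table
            follows = start
        tune.append(weighted_choice(follows, rng))
    return tune
```

Extending the context to pairs, triples and quadruples of notes is a matter of keying `trans` on tuples of the last few notes instead of a single note.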
BRIEF DESCRIPTION OF THE DRAWINGS Figure 1 is a schematic block diagram of an example of apparatus for use in any of the first to third inventions; Figure 2 is a flow diagram illustrating operation of an example of the first invention; Figure 3 is a block diagram illustrating a group of toys; Figures 4 and 5 are flow diagrams illustrating different examples of the second invention; Figure 6 is a schematic illustration of an embodiment of apparatus according to the fourth invention; Figure 7 is a schematic side view of the optical arrangement of an example of the fifth invention; Figure 8 is an enlarged plan view of the part of the LCD array of Figure 7 defining a mouth; Figures 9-13 illustrate different projected images; Figure 14 is a block diagram of an embodiment of apparatus according to the sixth invention; and, Figure 15 is a flow diagram illustrating an example of a process according to the sixth invention. APPENDIX 1
MULTI-WAY CONVERSATIONS This Appendix is a specification by example of how multi-way scripted conversations can work. From the users' perspective there will be a lot of conversational interactivity between all the toys and all the users in the group. It will be appreciated that RF signals are also transmitted representing the speech utterances addressed to particular toys as discussed earlier. The users will not be aware of these. When a toy enters a group The first greetings
Let us first assume that a single user (David) is interacting with a single toy (T1) when another toy (T2) enters the room. Each toy immediately knows that the other is present because of the RF transmissions between them. As soon as there is a "convenient moment" T1, acting as conversation leader, says:
"Oh look, David, here is my friend T2. Hi there T2."
To which T2 responds following receipt of an appropriate RF signal addressed to it from T1:
"Hello T1. Hi there David."
[NOTE: If there is more than one toy and/or more than one user already in the room then T2's response is varied accordingly, for example: "Hello there T1. Hi T3. It's good to see you T4, and you John, and Bill, and Fred ..." and then each of the other toys responds accordingly.] T2 checks if its owner is present
T1 knows that David is present because it has been talking to him. And T2 knows that David is present because T1 tells him so via RF. But the toys need to know whether
T2's owner is present, so T2 asks the question and passes the answer to T1.
T2: "Is <name of T2's owner> here?"
if YES then T2: "That's good" T1: "Yes, I'm real pleased too." else T2: "That's a real pity." T1: "Yes, that's a great shame."
[NOTE: If there is already more than one other toy in the room when T2 does this check, these utterances are echoed by each of the toys.]
When a toy leaves a group
If a toy (T2) knows that it is leaving the group, e.g. it decides to move away from the group, leave the room (if feasible), become inactive (to go to sleep), etc., it should first make a suitable comment:
T2: "I'm going now. Goodbye David. Bye T1, bye everyone."
And then all the other toys in turn say: "Goodbye T2"
But if a toy does not know it is leaving a group, e.g. if its battery goes flat or it is switched off, then all the other toys simply stop talking to it and to its owner.
This avoids any problems relating to whether or not a toy is still in the room after it has left the conversation.
When is "a convenient moment"?
[This question refers to the moment when a toy (T2) has joined a group and T1 says something relating to T2.]
If a toy is uttering something from a script it should interrupt itself after its next statement or question (i.e. if it is making a statement prior to asking a question then it should stop before the question; if it is asking a question then it should stop after uttering the question). Then, depending on whether the interruption came at the end of a statement or at the end of a question, it should say:
If at the end of a statement then: "Hey, wait a minute."
And it has now arrived at "a convenient moment".
Else it is at the end of a question: "Hey. Don't answer that <owner's name>."
And it has now arrived at "a convenient moment".
Once the introduction utterances are completed the toy that interrupted itself continues with either (whichever is appropriate):
If interrupted after a statement: "What was I saying <owner's name>? Oh yes."
and then it repeats that statement and continues with the script.
Else it interrupted itself after a question but before the answer: "What was I saying <owner's name>? Oh yes. I asked you this."
And then it repeats that question and awaits the answer.
If the toy (T1) is not uttering something from a conversation script when another toy enters the group then either it has just finished a conversation script or action and is ready to start another script or action, or it is in the process of doing something else, such as singing or playing a game. If it has just finished a conversation script and has not started something else (there will be a short pause [1-2 seconds] between the two) then it simply goes through the above introduction process. But if it is doing something then it should interrupt itself just before it is next due to speak:
T1: "Hi there T2. I'm playing a game with my owner David. I'll talk to you in a moment."
Then it continues with its present activity (singing, playing a game) until it is completed, and then it should say:
T1: "Hello again T2. Let me introduce you."
Then it continues with the "Oh look David ..." introduction, but omitting the words "Oh look David". Chatterbot conversations can be treated in a similar way, so multi-way conversations are not restricted to fully scripted conversations but can be part scripted and part chatterbot. Similarly chatterbot conversations (and "mixed mode" conversations) can be multi-way, and the description below of how conversation leadership changes applies equally to chatterbot and mixed mode conversations. This concludes the description of what happens when a toy enters a group.
Once a toy is in a group Let us assume that we have multiple toys (T1, T2, T3, ...) and multiple users (David, Bill, Fred, ...)
The invention: [a] Allows the "conversation leader" (the toy currently leading the conversation) to change from one toy to another, ensuring that all toys get a fair crack of the whip. [b] Brings each of the humans into the conversation from time to time. [c] Demonstrates the intelligence of the toys by showing that they understand what the others in the group are saying and they know and remember things about the other toys .
Taking over as conversation leader After its first utterance as conversation leader there is a 90% probability that the toy will remain leader. The figure of 90% is chosen so as to create some continuity at the start of the leadership period but is a matter of taste. If the same toy does remain conversation leader then there is a 50% probability on each subsequent utterance that it remains leader for the next one, subject to a maximum of 4 successive utterances by the same toy. (Again, these percentages and the number 4 are a matter of taste.) When a toy (Tm) hands over leadership to another toy (Tn), it does so immediately after a response to a question. It (Tm) chooses to whom it should pass the leadership by counting, for each of the other toys in the group, how many toy utterances have been made in total since that toy last spoke. So for each toy other than Tm there will be a count; these counts serve as weights for randomly selecting the next toy to speak, so the toy that has not spoken for the longest will have the greatest chance of being chosen to be the next to speak. Then, if the toy that is chosen to be next to speak has a count of 8 or more, the handing-over toy (Tm) says:
"What do you think about that Tn? You haven't said anything for a while."
Then Tn says:
"Very interesting."
and takes its turn as conversation leader. If the handover takes place while a script is in progress the new conversation leader continues with that script. Otherwise the new conversation leader acts in accordance with preprogrammed commands to determine its behaviour.
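The weighted selection described above might be sketched as follows, assuming each toy holds a table mapping the other toys' identifiers to the number of toy utterances since each last spoke; the function name and the handling of all-zero counts are illustrative assumptions.

```python
import random

def choose_next_leader(counts, rng=random):
    """Pick the next conversation leader. `counts` maps each other
    toy's id to the number of toy utterances since that toy last
    spoke; these counts serve as selection weights, so the toy that
    has been silent longest is most likely to be chosen."""
    toys = list(counts)
    total = sum(counts.values())
    if total == 0:
        return rng.choice(toys)           # no history yet: pick uniformly
    r = rng.random() * total
    for toy in toys:
        r -= counts[toy]
        if r <= 0:
            return toy
    return toys[-1]
```

The handing-over toy would additionally check whether the chosen toy's count is 8 or more before uttering the "You haven't said anything for a while" line.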
Bringing each of the humans into the conversation from time to time There is a 90% probability that a new conversation leader will address its next question to the same owner to whom the previous question was addressed. If it does so there is a 50% probability on each subsequent utterance that it addresses the same owner for the next one, subject to a maximum of 5 successive questions to the same owner. Assume that if a toy leaves the group then its owner has also left the group. If any toy asks a question of any owner and the answer provides data that should be stored in that owner's PDB within the toy, then the toy stores it (PDB = Personal Data Base, a database of personal information relating to the toy, its owner and other people with whom it has been in conversation). The type of data stored in the PDB includes whether the toy or a person likes a particular food. Demonstrating the intelligence and memory of the toys Each toy keeps track of other toys it has met. (A list of their unique identifiers is retained in the EEPROM.) Every now and then a toy (Ta) should ask something of another toy (Tb). If Ta and Tb have not met on a previous occasion then the question should be prefaced by:
"Someone told me that you <FACT>. Is that right?"
where "FACT" is something stored in Tb's PDB. (The fact could be chosen at random.) Possible facts to discuss are:
Likes - "... you like porridge."
Dislikes - "... you don't like prawns."
Games - "... you won a game of Nim against <owner's name> <on Thursday/yesterday/this morning/ >"
Etc.
If the answer is consistent with the stored fact then Ta says:
"That's what I thought."
But if the answer differs from the stored fact then Ta says:
"My memory must be playing tricks on me."

Claims

1. A speech generating assembly comprising an input device, such as a speech recogniser; a processing system connected to the input device to receive an input utterance; and a speech output device, such as a speech synthesizer, for outputting an utterance under control of the processing system operating in accordance with a predetermined algorithm which responds to an input utterance to generate an output utterance to which a single word or short phrase reply is expected via the input device, and, in response to that reply, to generate an input utterance for use by the predetermined algorithm.
2. An assembly according to claim 1, wherein the input device comprises a speech recognizer.
3. An assembly according to claim 1 or claim 2, wherein the processing system responds to a single word input reply.
4. An assembly according to claim 3, wherein the single word is either "yes" or "no" .
5. An assembly according to any of the preceding claims, wherein the predetermined algorithm is adapted to match one or more words of the input utterance generated by the processing system with one of a plurality of utterances or partial utterances stored in a memory and to cause an output utterance to be generated selected in accordance with the result of the matching process .
6. A speech generating assembly comprising a speech output device, such as a speech synthesizer; a processor for implementing scripted speech; and an information signal receiving and transmitting system for receiving information signals from, and transmitting information signals to, other speech generating assemblies, the processor being adapted to act as a "conversation leader" for a period or for a number of utterances determined in accordance with a predetermined algorithm and then to generate a transfer information signal to transfer conversation leadership to another speech generating assembly.
7. An assembly according to claim 6, wherein the processor is responsive to an input signal at least temporarily to depart from the script and utter an output.
8. An assembly according to claim 7, wherein the input signal is generated in response to a change in the physical attitude of the assembly, or to the physical circumstances of the assembly for example its temperature or its physical proximity to a user or a change in position of a user or to other changes that can be determined by electronic sensing devices .
9. An assembly according to any of claims 6 to 8 , wherein the processor relinquishes conversation leadership after a predetermined number of utterances or after a predetermined period of time.
10. An assembly according to claim 9, wherein the predetermined period or the predetermined number of utterances is determined in accordance with a probability function such that there is an increased probability that conversation leadership will be transferred with an increase in the number of utterances made by the conversation leader.
11. An assembly according to any of claims 6 to 10, wherein the information signal receiving and transmitting system is adapted to transmit signals defining the speech generated by the assembly so another assembly can respond according to the script .
12. A speech generating assembly comprising a speech output device, such as a speech synthesizer; a processor for controlling the speech output device; an information signal receiving and transmitting system for receiving information signals from and transmitting information signals to other speech generating assemblies, the processor being adapted to exchange information signals with one or more other speech generating assemblies when it wishes to output an utterance, the processor controlling the speech output device to output an utterance when the information signal exchange indicates that the or each other assembly is not outputting an utterance.
13. An assembly according to claim 12, wherein the transmitted information signal contains the address of at least one destination speech generating assembly, and wherein the processor is adapted to monitor for a response from that destination speech generating assembly that it received the transmitted information signal and is not outputting an utterance, the processor thereafter controlling the speech synthesizer to output the utterance.
14. An assembly according to claim 13, wherein if it determines that another assembly is outputting an utterance, the processor is adapted to transmit control signals to cause all speech generating assemblies to cease outputting utterances for a, preferably random, period of time.
15. An assembly according to any of claims 6 to 14, wherein the information signal receiving and transmitting system is adapted to transmit and receive signals using RF, infrared or ultrasound.
16. An assembly according to any of the preceding claims, wherein the processing system or processor is adapted to calculate a mood parameter and to utilize the mood parameter in the choice of output utterance.
17. A toy in which an assembly according to any of the preceding claims is located.
18. A group of toys according to claim 17, when dependent on at least claim 6 or claim 12, each toy being programmed with the same set of scripts.
19. A method of transferring data from a data source to a product which is capable of using the data to perform a task, the method comprising establishing a link between the data source and the product, sending a signal including the data along the link from the data source to the product, and when the product indicates to an operator that the data has been received, the operator manually terminating the link.
20. A method according to claim 19, characterized in that a signal is sent along the link from the data source to the product only, without the product sending any signal along the link to the data source.
21. A method according to claim 19 or claim 20, characterized in that the product has means to indicate to an operator when the data from the data source has successfully been received so that the operator may then manually terminate the link.
22. A method according to any one of claims 19 to 21, characterized in that the product includes a transducer and the data signal is transmitted by a telephone or other apparatus as an audible or induced signal, the method including placing the apparatus in the vicinity of the product whereby the signal sent by the data source is received by the transducer.
23. A method according to any of claims 19 to 22, characterized in that the product includes a memory for the data, and the data received by the product is stored in the memory for subsequent use in performing a task.
24. A method according to claim 23, characterized in that the data received by the product replaces data already stored in memory, or is additionally stored in memory along with existing data.
25. A method according to any of claims 19 to 24, characterized in that the data is sent in discrete packets, and the method includes manually signalling the data source when one data packet has been received by the product, whereby the data source then sends another data packet .
26. A method according to any of claims 19 to 25, characterized in that the transmission of data from the data source along the link is controlled, at the end of the link where the product is located, solely by a telephone or other receiving apparatus.
27. A method according to any of claims 19 to 26, characterized in that the product has a memory in which there is contained information necessary to enable the product to perform a plurality of tasks and the data includes further information relevant to at least one of the tasks, whereby when the product has received the data, the product is enabled to perform the at least one of the plurality of tasks.
28. A method according to claim 27, characterized in that the product is a toy which is capable of performing a plurality of tasks, a set of instructions for the performance of each task being stored in the memory thereof and the data transferred from the data source activating at least one of the sets of instructions so that the toy is enabled to perform at least one of the tasks.
29. A method according to claim 28, characterized in that the data transferred from the data source activates a plurality of sets of instructions whereby the toy is enabled to perform a plurality of tasks.
30. A method according to any of claims 19 to 29, wherein the link is a telecommunications link.
31. An apparatus for performing the method of any of claims 19 to 30, including a product which is capable of using the data to perform a task, a data source, and a link between the data source and the product, the product having signal receiving means for receiving a signal from the data source and no means for sending a signal along the link to the data source, and the link including apparatus by means of which, when the data is received by the product, the link may manually be terminated.
32. Apparatus according to claim 31, wherein the link is a telecommunications link, the apparatus comprising telephone apparatus.
33. Apparatus for generating an image of a face or one or more parts thereof, the apparatus comprising an image generator having a plurality of individually actuable elements that can be controlled to generate the desired image; a controller for controlling the elements of the image generator in synchronism with speech generated by a speech synthesizer or with prerecorded speech; and an optical system for focussing a representation of the image generated by the image generator onto a face surface .
34. Apparatus according to claim 33, wherein the elements of the image generator are luminous .
35. Apparatus according to claim 34, wherein each luminous element is a light emitting diode (LED) .
36. Apparatus according to claim 34, wherein each luminous element is an incandescent light source.
37. Apparatus according to claim 33, wherein the elements of the image generator are liquid crystal elements and the image generator further comprises a light source.
38. Apparatus according to claim 37, wherein each liquid crystal element forms part of a liquid crystal display (LCD) .
39. Apparatus according to any of claims 33 to 38, wherein the elements of the image generator are arranged as a dot matrix.
40. Apparatus according to any of claims 37 to 39, wherein the liquid crystal elements are mounted on a transmissive substrate and the light source is disposed so that light emitted by it shines through the substrate and any deactivated liquid crystal elements.
41. Apparatus according to any of claims 37 to 39, wherein the liquid crystal elements are mounted on a reflective substrate and the light source is disposed so that light emitted by it that passes through any deactivated liquid crystal elements is reflected by the substrate back through those elements.
42. A music composition system comprising pitch estimation apparatus for monitoring a tonal sequence to estimate the pitch and duration of each note in the sequence; and a music composition module for generating a musical composition based on the estimated note pitches and durations.
43. A system according to claim 42, further comprising a microphone coupled to an analogue-to-digital converter for generating a digital representation of the tonal sequence.
44. A system according to claim 42 or claim 43, wherein the pitch estimation apparatus is adapted to operate in real time to estimate the pitch and duration of the input tonal sequence .
45. A system according to any of claims 42 to 44, wherein the pitch estimation apparatus is further adapted to allocate one of a plurality of predetermined durations to each note in the tonal sequence .
46. A system according to any of claims 42 to 45, wherein the pitch estimation apparatus is adapted to generate additional notes in the tonal sequence.
47. A system according to claim 46, wherein the pitch estimation apparatus is adapted to generate additional notes if the tonal sequence is less than 8 notes.
48. A system according to claim 46 or claim 47, wherein the additional notes are selected from a number of predetermined sequences of notes each commencing with the input tonal sequence.
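The behaviour recited in claims 45 to 48 (snapping each note to one of a set of predetermined durations, and completing an input of fewer than 8 notes from a stored sequence that commences with it) can be sketched as follows. This is an illustrative sketch only, not the patented implementation; the duration set, the stored sequences, and all function names are hypothetical.

```python
# Hypothetical set of predetermined note durations, in beats (claim 45).
QUANTIZED_DURATIONS = [0.25, 0.5, 1.0, 2.0]

def quantize_duration(measured: float) -> float:
    """Allocate the nearest predetermined duration to a measured one."""
    return min(QUANTIZED_DURATIONS, key=lambda d: abs(d - measured))

# Hypothetical library of predetermined sequences (claim 48), as
# MIDI-style pitch numbers; each may serve as a continuation of any
# input that matches its opening notes.
STORED_SEQUENCES = [
    [60, 62, 64, 65, 67, 69, 71, 72],  # ascending C major scale
    [60, 64, 67, 72, 67, 64, 60, 64],  # arpeggio figure
]

def extend_sequence(pitches: list[int], target_len: int = 8) -> list[int]:
    """If the input is shorter than target_len notes (claims 46-47),
    complete it from the first stored sequence commencing with it."""
    if len(pitches) >= target_len:
        return pitches
    for seq in STORED_SEQUENCES:
        if seq[:len(pitches)] == pitches:
            return seq[:target_len]
    return pitches  # no matching predetermined continuation
```

A measured duration of, say, 0.4 beats would be allocated the predetermined value 0.5, and a three-note input matching the start of a stored sequence would be extended to the full 8 notes of that sequence.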
PCT/GB2004/004395 2003-10-17 2004-10-15 Voice controlled toy WO2005038776A1 (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
GB0324376.3 2003-10-17
GB0324372.2 2003-10-17
GB0324375.5 2003-10-17
GB0324375A GB0324375D0 (en) 2003-10-17 2003-10-17 Synthesized face projection system
GB0324373.0 2003-10-17
GB0324376A GB0324376D0 (en) 2003-10-17 2003-10-17 Music compositions system
GB0324372A GB0324372D0 (en) 2003-10-17 2003-10-17 Speech generating assemblies
GB0324373A GB0324373D0 (en) 2003-10-17 2003-10-17 A method of transferring data

Publications (1)

Publication Number Publication Date
WO2005038776A1 true WO2005038776A1 (en) 2005-04-28

Family

ID=34468417

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2004/004395 WO2005038776A1 (en) 2003-10-17 2004-10-15 Voice controlled toy

Country Status (2)

Country Link
TW (1) TW200523005A (en)
WO (1) WO2005038776A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI412393B (en) * 2010-03-26 2013-10-21 Compal Communications Inc Robot
CN103949072B (en) * 2014-04-16 2016-03-30 上海元趣信息技术有限公司 Intelligent toy is mutual, transmission method and intelligent toy

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0730261A2 (en) * 1995-03-01 1996-09-04 Seiko Epson Corporation An interactive speech recognition device
US6381574B1 (en) * 1998-03-18 2002-04-30 Siemens Aktiengesellschaft Device for reproducing information or executing functions


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KELLNER A ET AL: "PADIS - An automatic telephone switchboard and directory information system", SPEECH COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 23, no. 1-2, October 1997 (1997-10-01), pages 95 - 111, XP004117212, ISSN: 0167-6393 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007007228A2 (en) 2005-07-11 2007-01-18 Philips Intellectual Property & Standards Gmbh Method for communication and communication device
WO2007007228A3 (en) * 2005-07-11 2007-05-03 Philips Intellectual Property Method for communication and communication device
US9213940B2 (en) 2006-06-29 2015-12-15 International Business Machines Corporation Cyberpersonalities in artificial reality
US9369410B2 (en) 2009-01-08 2016-06-14 International Business Machines Corporation Chatbots
US9794199B2 (en) 2009-01-08 2017-10-17 International Business Machines Corporation Chatbots
EP2444948A1 (en) * 2010-10-04 2012-04-25 Franziska Recht Toy for teaching a language

Also Published As

Publication number Publication date
TW200523005A (en) 2005-07-16

Similar Documents

Publication Publication Date Title
US9039482B2 (en) Interactive toy apparatus and method of using same
US6110000A (en) Doll set with unidirectional infrared communication for simulating conversation
US11504856B2 (en) System and method for selective animatronic peripheral response for human machine dialogue
US10967508B2 (en) System and method for dynamic robot configuration for enhanced digital experiences
Osada et al. The scenario and design process of childcare robot, PaPeRo
WO2001050342A1 (en) Multiplicity interactive toy system in computer network
Ritschel et al. Personalized synthesis of intentional and emotional non-verbal sounds for social robots
JP2000187435A (en) Information processing device, portable apparatus, electronic pet device, recording medium with information processing procedure recorded thereon, and information processing method
KR20020071917A (en) User interface/entertainment device that simulates personal interaction and charges external database with relevant data
JP2003205483A (en) Robot system and control method for robot device
KR20020067592A (en) User interface/entertainment device that simulates personal interaction and responds to user&#39;s mental state and/or personality
KR20020067591A (en) Self-updating user interface/entertainment device that simulates personal interaction
WO2004104736A2 (en) Figurines having interactive communication
KR20020067590A (en) Environment-responsive user interface/entertainment device that simulates personal interaction
US10994421B2 (en) System and method for dynamic robot profile configurations based on user interactions
JP2006061632A (en) Emotion data supplying apparatus, psychology analyzer, and method for psychological analysis of telephone user
US20190202061A1 (en) System and method for detecting physical proximity between devices
US20220241977A1 (en) System and method for dynamic program configuration
JP2002169590A (en) System and method for simulated conversation and information storage medium
WO1999032203A1 (en) A standalone interactive toy
WO2005038776A1 (en) Voice controlled toy
JP3958253B2 (en) Dialog system
JP2006109966A (en) Sound game machine and cellular phone
US20020137013A1 (en) Self-contained, voice activated, interactive, verbal articulate toy figure for teaching a child a chosen second language
JPH10328421A (en) Automatically responding toy

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase