SOUND RESPONSIVE TOY
The present invention relates to an electronic sound-responsive toy capable of distinguishing between several types of input sounds and generating a variety of speech-like sounds in response thereto which relate in a perceptible way to the input sound.
Sound-generating toys have been available for many years, some based on mechanical sound generators and some, of more recent design, employing electronic means to produce the sounds. Generation of complex sounds such as human speech or animal noises by electronic means is well known. However, the prior art apparatus used for this purpose is too costly and complex for implementation in toys.
Sound-responsive toys are also commonly known, most of them accepting specific input sounds as "commands", after receipt of which a specific action is taken. An example of this is the model car which, upon sensing a loud click or handclap, starts to move forward, stopping, reversing or turning upon receipt of further clicks or claps. The apparatus of the present invention comprises a toy which combines the functionalities of the sound-generating and sound-responsive toys mentioned above, resulting in a "conversational" mode of stimulus-response interaction with the user. Furthermore, novel use has been made in the apparatus of a technique known in music synthesis as "Piecewise-Linear" control (see Bernstein and Cooper, "The Piecewise-Linear Technique of Electronic Music Synthesis", Journal of the Audio Engineering Society, July/August 1976, Vol. 24, No. 6, pp. 446-454) to reduce both the quantity of data and the circuit complexity required for faithful reproduction of speech and speech-like sounds, thereby making possible the production of low-cost electronic toys capable of generating high-quality vocal sounds.
Traditional control schemes for speech synthesizer parameters employ digital-to-analog converters or their equivalent, which are able to specify a parameter at any desired point within its range. There is a one-to-one correspondence between the digital codes used to specify a parameter and the resultant parameter value. This type of control mechanism is referred to as "Piecewise Constant". Using this approach it is necessary to specify explicitly each frequency desired at the instant in time at which it is desired. This scheme requires a great deal of data to produce smooth transitions from one value to another, since a great many intermediate points must be specified during the period of transition (refer to Fig. 3). The "Piecewise Linear" process presented here circumvents this problem by specifying not the absolute formant frequencies desired, but rather the rate at which the frequencies are to change and the direction of their motion. The amount of data required is thus reduced, since new data is required only when the slope of a parameter changes, not when its value changes (refer to Fig. 4). Of note is the lack of steps in the transition from one value to another despite the low data rate; the transition is smooth, resulting in a natural-sounding synthesis.
SUMMARY OF THE INVENTION
The invention relates to a sound responsive toy which produces vocal sounds in response to input sounds in a "conversational" manner. The apparatus receives sound signals by means of a loudspeaker and amplifier. Upon sensing a sound stimulus which meets certain frequency and amplitude requirements, the apparatus, by means of logic circuitry, analyzes the received sound with regard to its spectral content and its duration. The apparatus then waits for cessation of the sound, whereupon a response appropriate to the type of stimulus is chosen according to predetermined rules and generated by means of conditioning circuitry controlled by the logic circuitry and specialized controlling circuits. The resulting signals are amplified and converted to sound energy via the same loudspeaker which is used to convert input sounds to electrical signals. In the specific embodiment the apparatus is contained in a plush toy, e.g. a parrot. Squeezing the toy energizes the apparatus, which responds with a "wolf whistle". Conversational speech causes the apparatus to respond in parrot-like squawks when the speaker pauses. A sharp clap of the hands also causes a "wolf whistle" response. A whistle causes the apparatus to play one of three songs depending upon the number of whistles previously received during the energized period. An automatic shut-off feature is also included which removes power if a sound stimulus is not received within a predetermined period.
A BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a block diagram of the complete system.
Fig. 2 is a flow diagram showing the operational cycle of the device.
Fig. 3 depicts the operation of a piecewise-constant parameter control mechanism of the prior art.
Fig. 4 depicts the operation of the piecewise-linear parameter control mechanism of the invention.
Fig. 5 is a detailed schematic diagram of one of the piecewise-linear controllers of the invention.
Fig. 6 is a schematic representation of the piecewise-linear pitch generator.
Fig. 7 is a detailed schematic diagram of the output driver circuit of the invention.
Fig. 8 is a schematic diagram of a formant filter used in the apparatus of the invention.
DETAILED DESCRIPTION OF INVENTION
Referring to Fig. 1, the device is first actuated by closure of a specially adapted squeeze switch 17 which is the subject of co-pending application serial No filed and assigned to the assignee of this application. The closing of switch 17 causes power from the battery 19 to be supplied to the circuit through power control circuit 18. After power is applied, the logic unit 5 causes control circuit 18 via line 20 to hold power on, so that power to the circuit is maintained after release of switch 17. This operation is referred to hereinafter as the Wake Up Phase 42. The logic unit 5 also controls the other phases of operation of the invention shown diagrammatically in Fig. 2. In accordance with one phase of operation the logic unit 5 controls the Response Generation Phase 43, which generates an audio response via Piecewise Linear controllers 8, 9, formant filters 13, 14, mixer 15, output driver 16 and speaker 1. In accordance with a second phase the logic unit 5 controls the Acquisition Phase 40, which tests the audio input received via speaker 1 and preamplifier 2. Referring now to Fig. 2, actuation of switch 17 causes the apparatus to go from the Sleep Phase 82 into the power-holding Wake Up Phase 42 described hereinabove. Upon completion of that phase the Response Generation Phase 43 is entered and a predetermined initial audio response is generated. In the specific embodiment described herein a sound characteristic of an animal is generated, e.g. a squawking parrot sound. The specifics of the unique response generation circuit of the apparatus are described more fully hereinbelow. After completing the initial response the logic unit 5 activates the Acquisition Phase 40.
Upon entering the Acquisition Phase 40 the unit is ready and waiting for input stimuli. Referring again to Fig. 1, sounds impinging on the loudspeaker 1 are amplified by preamplifier 2. As is discussed hereinbelow, the preamplifier gain may be set to one of two values at any time by the logic unit 5 via line 4. The logic unit is thus able to compensate for low or high levels of ambient noise. The output of the preamplifier consists of a rectangular pulse waveform on line 3 whenever the input signal from Speaker 1 exceeds a specific amplitude. To enhance the noise-rejection ability of the system, the logic unit 5 responds only when the number of input transitions (zero crossings) exceeds a predetermined threshold during a specific predetermined period of time. Thus the apparatus only responds to signals whose average frequency exceeds a certain value.
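The validity test described above can be illustrated with a minimal sketch. It is not the program of the appendices: the preamplifier is modeled as a simple comparator, and the names `count_transitions`, `is_valid_stimulus`, `threshold` and `min_transitions` are hypothetical, as are all numeric values.

```python
def count_transitions(samples, threshold):
    """Count transitions of a comparator output over one window of samples.

    The preamplifier is modeled (as an assumption) as a comparator whose
    output is high whenever the input exceeds `threshold`.
    """
    if not samples:
        return 0
    transitions = 0
    previous = samples[0] > threshold
    for sample in samples[1:]:
        current = sample > threshold
        if current != previous:
            transitions += 1
        previous = current
    return transitions


def is_valid_stimulus(samples, threshold, min_transitions):
    """Accept the window only if enough transitions occurred within it,
    i.e. only if the average input frequency is high enough."""
    return count_transitions(samples, threshold) > min_transitions
```

Because the window length is fixed, requiring a minimum transition count is equivalent to requiring a minimum average frequency, which is why slow or isolated disturbances are rejected.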
Referring again to Fig. 2, an entry period 46 of the Acquisition Phase 40 is shown. During this period the input signal is tested for existence of the predetermined minimum frequency at 46a. If this frequency is not present, a waiting period 46b is entered. If the minimum frequency is not detected during this period, the logic unit 5 increases the sensitivity of preamplifier 2, thereby adapting to a low ambient noise environment. After increasing the sensitivity another waiting period 46c is initiated. If no stimulus is received after approximately four minutes in the specific embodiment, the logic unit 5 will remove power to the entire apparatus via the power control circuit 18 and the Sleep Phase 82 will be entered, thus conserving power if the device is inadvertently left on. The Sleep Phase may also be entered at any time by a second closure of the switch 17.
Once the requirements for input signal validity have been met, the logic unit 5 begins simultaneously to both time the input signal's duration and count its
zero-crossings 49. Zero-crossings are counted for a predetermined amount of time; then, while continuing to time the input's duration, the logic unit enters a period of waiting for the input signal to stay below the minimum average frequency requirement for a predetermined amount of time. When the wait period condition is satisfied, the Response Selection Phase 41 of the operation cycle begins. If the input frequency remains above the minimum average frequency requirement for longer than a preset period of time, the logic unit will reduce the sensitivity of the preamplifier, thereby adapting to high ambient noise environments. The number of input transitions counted and the duration of the input stimulus are also used by the logic unit 5 in the selection of the response to be produced.
The Response Selection Phase 41 consists of selection of the response to be produced and the number of times the response is to be repeated. This selection is based on the duration of the stimulus and the number of transitions which were counted during the Acquisition Phase 40. In the specific embodiment stimuli are categorized as one of three types: a handclap, speech, or a whistle. Parameters of the input preamplifier and acquisition logic have been selected such that stimuli will be sorted correctly into these three categories. For a given category of input stimulus the system chooses among a range of possible responses. These responses may be selected at random or correlated to input frequency and duration, or varying proportions of weight may be given to any of these criteria.
In the specific embodiment, the control unit 5 is a COP 422 microprocessor manufactured by the National Semiconductor Corporation, which executes the stored program listed in Appendices I & II. In accordance with that program a clap input produces a wolf whistle; a whistle input produces one of three singing responses cycled in the order of Bali Hai, Yellow Bird and Sailor's Hornpipe; and speech produces responsive squawks. The program steps for accomplishing this selection appear in Appendix I at pages 7 and 12.
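The selection behavior just described may be sketched as follows. This is an illustration only and not the listing of Appendix I: the thresholds `CLAP_MAX_DURATION_MS` and `WHISTLE_MIN_TRANSITIONS`, the decision rule in `classify`, and the class name `ResponseSelector` are all assumptions, since the actual values and rules are not stated in the text.

```python
SONGS = ["Bali Hai", "Yellow Bird", "Sailor's Hornpipe"]

# Hypothetical thresholds; the actual values are not given in the text.
CLAP_MAX_DURATION_MS = 100
WHISTLE_MIN_TRANSITIONS = 50


def classify(duration_ms, transitions):
    """Sort a stimulus into one of the three categories (assumed rule):
    very short sounds are claps, sounds with many transitions in the
    counting window are whistles, and everything else is speech."""
    if duration_ms <= CLAP_MAX_DURATION_MS:
        return "clap"
    if transitions >= WHISTLE_MIN_TRANSITIONS:
        return "whistle"
    return "speech"


class ResponseSelector:
    """Chooses a response and cycles the songs in the stated order."""

    def __init__(self):
        self.whistles_received = 0

    def select(self, duration_ms, transitions):
        kind = classify(duration_ms, transitions)
        if kind == "clap":
            return "wolf whistle"
        if kind == "whistle":
            song = SONGS[self.whistles_received % len(SONGS)]
            self.whistles_received += 1
            return song
        return "squawk"
```

Keeping the whistle count inside the selector mirrors the statement that the song played depends on the number of whistles previously received during the energized period.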
In accordance with a unique feature of the invention Piecewise-Linear controllers are used to greatly reduce the amount of data required to faithfully reproduce the selected audio responses.
Referring to Fig. 3, there is shown a time-varying parameter 70 of a process such as speech synthesis. For purposes of this discussion the function from starting point 83 to ending point 84 will be referred to as a "sound sequence". A prior art piecewise-constant representation of this function might look like the stepped function 71. On the time scale shown, 40 steps (data elements) are required to model the function at a 25 s sample rate, which would normally be regarded as a low sample rate producing poor fidelity. Referring now to Fig. 4, the same function is modeled using a piecewise-linear controller of the invention. As can be seen, a significant data reduction is achieved. Inflection points 73 - 76 are connected by segments 78 - 81, producing a smooth function 72 which requires as data only the initial value and the slopes at the four inflection points. By precisely controlling the segment slopes and durations, the inflection point ordinates are readily reached.
After the response and the number of repetitions of that response have been chosen in the Response Selection Phase 41, the logic unit 5 enters the Response Generation Phase 43 and initializes several parameters of the sound sequence to be generated. The initial pitch, the pitch slopes for the first segment, the formant frequencies, and the formant slopes for the first segment are set to their starting values. A "spontaneous" quality is imparted to the responses by randomizing the initial pitch and formant positions so that more variation is perceived in the vocal qualities of the generated sounds. This feature gives the responses to speech input in particular a more conversational quality. Also initialized is the duration of the first segment, at the end of which the desired inflection point is reached and the slope and duration parameters are updated for the next segment.
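The data reduction can be made concrete with a short sketch. It assumes a discrete time base with one update per tick; the function name `render_piecewise_linear` and the numbers in the usage note are hypothetical and are not taken from the figures.

```python
def render_piecewise_linear(initial, segments):
    """Reconstruct a parameter track from an initial value and a list of
    (slope_per_tick, duration_in_ticks) segments.

    New data is consumed only at the start of each segment; within a
    segment the value simply ramps at the programmed slope.
    """
    values = [initial]
    value = initial
    for slope, duration in segments:
        for _ in range(duration):
            value += slope
            values.append(value)
    return values
```

For example, `render_piecewise_linear(100, [(2, 5), (0, 3), (-1, 10)])` yields 19 output values from only 7 stored numbers (the initial value plus three slope and duration pairs), whereas a piecewise-constant scheme would have to store every one of the 19 values.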
The piecewise-linear segments which comprise the parameter control functions are executed by the controllers described below. Upon completion of the execution of the final segment of the sound sequence, control is transferred from the sound-generating routine to the higher-level "executive" control routine of logic unit 5 (Appendix I, page 15). In the specific embodiment, the executive level routine determines the number of repetitions of the sound sequence that remain to be executed. Further levels of control could be implemented by those skilled in the art whereby a "macro sequence" consisting of several sound sequences is executed as a single sequence. Applied to speech generation, this amounts to word construction from phonemes. Adding another level permits phrase or sentence generation from words.
Once all repetitions and/or lower control levels have been completed and the desired response has been generated by the Response Generation Phase 43 the system returns to the Acquisition Phase 40.
The piecewise-linear filter control circuit of the specific embodiment is illustrated in Fig. 5. This circuit is used for P/L CTRLLRS 8 and 9 of Fig. 1. An operational amplifier 35 is configured as an integrator which performs continuous integration of any current flowing through node 38. A fixed current flow into or out
of this node will result in a linearly increasing or decreasing voltage at the circuit's output 36. The output increase or decrease is constrained to within the output voltage range of the operational amplifier 35. If no current flows through node 38 the integrator's output voltage will remain constant. If the voltage at node 38 is maintained at that of reference node 37 while the integrator is functioning within its linear range (i.e. not at either of its output voltage limits), a predictable current flow through node 38, and hence a predictable rate of change of the output voltage at 36, is produced by connecting a resistor of known value between node 38 and a known reference voltage. The reference voltage is applied at 22, and binary-weighted resistors 26 - 28 sum binary-weighted currents into node 38. The direction of current flow is determined by the polarity of the voltage at 22 with respect to the voltage at 37, which is made to be at the midpoint of the range of the voltage swing at 22. Hence if 22 is regarded as the most significant bit of a digital word, it will be seen that it acts as a sign bit, determining the direction of current flow through node 38 and the direction of the output voltage slope. The remaining less-significant bits of the digital control word comprise lines 23 - 25 and are used to control electronic switches 29 - 31, which enable or disable current flow through binary-weighted resistors 26 - 28.
The result is a 4-bit sign-magnitude integrating digital-to-analog converter, the output voltage rate of change (as opposed to instantaneous value) of which is determined by the digital control word at inputs 22 - 25. The output voltage at 36 is in turn used to control the resonant frequency of formant filters 13, 14. Switch 33 is used to discharge the integrating capacitor 34 under control of the logic unit 5 via line 39, thereby pre-setting the output voltage to a known reference value (i.e. reference voltage 37) at the commencement of each sound generating sequence. A resistor 32 limits the discharge rate of the capacitor. Its value and the amount of time allowed for capacitor discharge are chosen such that the capacitor is not completely discharged, so that the initial integrator voltage (and subsequent voltage outputs during production of the sound sequence) are in part dependent on the output voltage of the integrator prior to the discharge interval. This results in varying vocal qualities being produced each time a sound sequence is generated, even though the formant control data remain constant. This effect is further enhanced by randomly varying the discharge interval itself within a range of values. In the specific embodiment this function is performed by the logic unit 5 under control of the program of Appendices I and II.
The formant filters 13, 14 have been implemented as shown in Fig. 8. An operational amplifier 85 is configured as a multiple-feedback resonant filter, the center frequency of which (and other parameters) is determined by the values of resistors 89 and 95, capacitors 90 and 91, and the AC impedance of semiconductor diode 93. By varying the voltage at control node 94 and thereby varying the current through resistor 92 and diode 93, the AC impedance of diode 93 is controlled. Diode 93 in turn controls the center frequency of the filter. The input signal to be filtered (the excitation pitch of the specific embodiment) is applied at 96 through resistor 95. A DC reference voltage is applied at 87 for DC biasing of the amplifier 85. The filtered output signal appears at output node 86. In Fig. 1 the outputs of two identical formant filters 13, 14 with center frequencies one octave apart are input to mixer 15.
In like manner to that employed in control of the formant filters, the pitch of the excitation waveform is specified in piecewise-linear terms, with the accompanying data reduction benefits. The pitch generation is performed by the logic unit 5 under program control. It should be noted that although the specific embodiment of the formant filters employs analog filters controlled by digital-to-analog converters, the data reduction benefits of the piecewise-linear control method would still be realized were the filters to be implemented in some other way, e.g. digital filters whose resonant frequencies could be controlled directly by a digital word.
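As an illustration of the slope-programmed controller of Fig. 5, the following sketch models a 4-bit sign-magnitude control word driving an ideal, clamped integrator with a partial reset. It is a behavioral model under stated assumptions, not a circuit simulation: the voltage range, the step size `volts_per_unit`, the polarity convention (sign bit set means rising output) and the names `decode_slope` and `IntegratingController` are all hypothetical.

```python
def decode_slope(word):
    """Decode a 4-bit sign-magnitude word into a signed slope.

    Bit 3 plays the role of the sign line 22 and bits 2..0 the role of
    magnitude lines 23 - 25. Which polarity raises the output is an
    assumption made for this sketch.
    """
    if not 0 <= word <= 0b1111:
        raise ValueError("control word must fit in 4 bits")
    magnitude = word & 0b0111
    direction = 1 if word & 0b1000 else -1
    return direction * magnitude


class IntegratingController:
    """Idealized integrator whose output ramps at the programmed slope."""

    def __init__(self, v_min=0.0, v_max=1.0, v_ref=0.5, volts_per_unit=0.01):
        self.v_min = v_min
        self.v_max = v_max
        self.v_ref = v_ref
        self.volts_per_unit = volts_per_unit
        self.v = v_ref

    def step(self, word):
        """Advance one time step; the output is clamped to the amplifier's
        output range, and a zero magnitude holds the output constant."""
        self.v += decode_slope(word) * self.volts_per_unit
        self.v = max(self.v_min, min(self.v_max, self.v))
        return self.v

    def partial_discharge(self, fraction):
        """Move the output part of the way back toward the reference,
        leaving a residue that depends on the previous output."""
        self.v = self.v_ref + (self.v - self.v_ref) * (1.0 - fraction)
        return self.v
```

Calling `partial_discharge` with a fraction below 1.0 leaves the starting voltage of the next sequence dependent on where the previous one ended, which is the source of the run-to-run variation described above.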
Fig. 6 schematically depicts a piecewise-linear pitch generator which is functionally equivalent to the pitch generator of the invention. The excitation pitch is produced by a controllable oscillator 50. This device emits a rectangular pulse train at its output 51, the period of which is linearly proportional to the value presented to a control input at 52. The control input signal is taken from a time-delay element 56. At the beginning of each sound sequence the control input is initialized to give the desired initial period of the first segment of the sound sequence to be executed. When released from this initial condition, the period of the audio output will continuously increase, decrease, or remain constant, dependent on the value of the slope control 54 which is applied to one input of a multiplier 55. The output of multiplier 55 is the product of the values present at its two inputs 53 and 54, namely the period control signal and the pitch slope control signal. The output pitch will remain constant if the slope has a value of exactly one. Greater slope values will result in continuously decreasing pitch as the period grows larger, and slope values less than one yield increasing pitch. It can be seen that the time delay element 56 also controls the rate of change of the output pitch by determining the rate at which the period control value is updated by the multiplier product. Use of the multiplier allows the pitch slope parameter to be expressed in units of octaves per second, i.e. the frequency changes exponentially as a function of time. This is a very useful characteristic in all audio work, be it music or speech-oriented.
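The feedback loop of Fig. 6 can be sketched in a few lines. This is a behavioral illustration only; the function names `pitch_periods` and `octaves_per_second` and the parameter `update_interval_s` (standing in for the delay of element 56) are hypothetical.

```python
import math


def pitch_periods(initial_period, slope, updates):
    """Return the sequence of oscillator periods produced when the period
    is repeatedly replaced by (period * slope), as in the multiplier and
    delay loop of Fig. 6."""
    periods = [initial_period]
    period = initial_period
    for _ in range(updates):
        period = period * slope  # multiplier output fed back via the delay
        periods.append(period)
    return periods


def octaves_per_second(slope, update_interval_s):
    """Rate of pitch change implied by a slope value: each update scales
    the frequency by 1/slope, i.e. by -log2(slope) octaves."""
    return -math.log2(slope) / update_interval_s
```

A slope of exactly one leaves the period unchanged, a slope of 2.0 drops the pitch one octave per update, and a slope of 0.5 raises it one octave per update, so a single stored number programs an exponential glide of any length.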
The piecewise-linear pitch generator affords the same data-reduction benefits as its formant-controller counterpart. A single data element, the pitch slope, is sufficient to program a continuous (for all intents and purposes) glissando of pitches for an indefinite amount of time. This may be contrasted with the piecewise-constant approach, which requires an ever-increasing amount of data in proportion to the length of sound to be produced. The piecewise-linear tone generator of the specific embodiment is implemented in the control program of Appendices I and II, but could be implemented in a variety of ways known to those skilled in the art, including discrete digital logic elements or analog circuits (voltage-controlled oscillators, analog multipliers, etc.), while retaining its essential functions.
The specific embodiment of the invention uses for both its audio input and output a single dynamic loudspeaker 1. To make this possible the circuit 16 used to drive the loudspeaker during the Response Generation Phase 43 must have low output impedance in order to drive the speaker to sufficient sound output levels yet must not present such a low impedance to the speaker during the Acquisition Phase 40 that input sensitivity is degraded. A solution to this problem is illustrated in Fig. 7. An inverting operational amplifier 59 drives a pair of complementary emitter-followers 61, 62. Negative DC feedback is supplied to the op-amp by a resistor-capacitor network 64 - 66 where any AC feedback signal is filtered out by capacitor 66. If no signal is applied to the input 68, the feedback network allows the circuit to rest in an equilibrium state with the op-amp output 60 voltage equal
to its input voltage 58. Due to resistor 63 the same voltage will appear at the common emitters of the transistors 69. Since there is no voltage difference between the base and emitter of either transistor, they both present a high impedance to the loudspeaker. Note that there is no AC feedback path around the op-amp, and it therefore has a very high voltage gain for any input signal. As soon as one is applied through input capacitor 57, the op-amp output voltage at 60 will slew rapidly either above or below the quiescent voltage depending on the slope of the input signal. When the voltage difference between the base and emitter of either transistor exceeds approximately 0.7 volts, that transistor will begin to conduct current and function as a normal emitter follower, augmenting the drive capability of the op-amp. In this state there will exist a low impedance path for AC feedback through the conducting transistor and the resistor at 63, and the op-amp will operate in a closed-loop fashion as long as the output voltage swing at the op-amp output 60 exceeds the forward voltage drops necessary to turn on the transistors. Thus this circuit can be seen to exhibit a high output impedance when no excitation signal is being generated (Acquisition Phase) and a low output impedance for driving the loudspeaker in the Response Generation Phase, when the mixed formant signals are applied to the circuit's input at 68.
The appendices are listings of the program stored in the memory of the microprocessor of the specific embodiment. Briefly the listing of Appendix I is of program statements performing the following functions.
Page 1 Assigns symbol names to RAM (data) locations.
Pages 2-3 Assign values to symbolic names used in the program.
Page 4 Initializes system and jumps to sound generating routine.
Page 5 Waits for average input frequency to exceed threshold for valid input.
Page 6 Measures input frequency, times duration of input stimulus, and waits for input frequency to remain below threshold for time specified by variable EXITTIME (pg. 3).
Pages 7, 12 Choose sound to be produced.
Page 15 Makes control decisions after completion of a sound sequence.
Pages 17-19 Comprise a timekeeping routine which is called from various parts of the program. It uses the microprocessor's built-in timer to increment several independent counters. When a counter overflows, a corresponding flag bit is set which indicates to the calling routine that the time interval has elapsed.
Page 21 Generates the audio tone by executing the LEI instructions at lines 734 and 737 at a rate determined by the contents of the DELAY register. The PIPELINE routine called in lines 722 and 735 performs the multiplication operation necessary for generating the sloping pitch segments described in the theory of operation. The multiplication, if performed in its entirety in one subroutine call, would take too much time to be performed concurrently with the tone generation; therefore it is "pipelined", i.e. a portion of the operation is performed each time the routine is called, with the product being made available before it is required for the updating of the DELAY register.
Pages 23-31 Comprise the pipelined multiply routine described above. The PIPE12 section on pg. 30 also monitors flags returned by the timekeeping routine and in turn sets flags which control exit from the sound loop. PIPE13 and PIPE14 on pg. 31 are used to look up data from the sound data area at the beginning of each piecewise-linear parameter control segment. The pitch slope is looked up and stored in its RAM location, and the formant frequency controllers' slopes are looked up and output via the "L" I/O Port. By including these functions in the pipelined routine, they may be performed while the sound loop is executing without disturbing its timing, so that no audible "clicks" or inconsistencies are produced at parameter inflection points when new data is retrieved.
Note that the "error" on pg. 23 is not in fact a true error but a use of an idiosyncrasy of the microprocessor to gain more memory space for the pipelined routine. The JID instruction in line 772 will access memory locations in the next block of program space. The same technique is used in line 1115 with an LQID instruction.
The MACROS file of Appendix II lists several macro subroutines used by the program of Appendix I.