EP1183677B1 - Voice-controlled electronic musical instrument - Google Patents

Voice-controlled electronic musical instrument

Info

Publication number
EP1183677B1
Authority
EP
European Patent Office
Prior art keywords
instrument
pitch
frequency
sound
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP00936067A
Other languages
English (en)
French (fr)
Other versions
EP1183677A1 (de)
Inventor
John W. Jameson
Mark B. Ring
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JOHNMARK LLC
Original Assignee
JOHNMARK LLC
Jameson John W
Ring Mark B
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JOHNMARK LLC, Jameson John W, Ring Mark B
Publication of EP1183677A1
Application granted
Publication of EP1183677B1
Anticipated expiration
Legal status: Expired - Lifetime (current)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H3/00: Instruments in which the tones are generated by electromechanical means
    • G10H3/12: Instruments in which the tones are generated by electromechanical means using mechanical resonant generators, e.g. strings or percussive instruments, the tones of which are picked up by electromechanical transducers, the electrical signals being further manipulated or amplified and subsequently converted to sound by a loudspeaker or equivalent instrument
    • G10H3/125: Extracting or recognising the pitch or fundamental frequency of the picked up signal
    • G10H5/00: Instruments in which the tones are generated by means of electronic generators
    • G10H5/005: Voice controlled instruments
    • G10H2240/00: Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/011: Files or data streams containing coded musical information, e.g. for transmission
    • G10H2240/046: File format, i.e. specific or non-standard musical file format used in or adapted for electrophonic musical instruments, e.g. in wavetables
    • G10H2240/056: MIDI or other note-oriented file format
    • G10H2240/171: Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
    • G10H2240/175: Transmission of music data for jam sessions or musical collaboration through a network, e.g. for composition, ensemble playing or repeating; Compensation of network or internet delays therefor
    • G10H2240/201: Physical layer or hardware aspects of transmission to or from an electrophonic musical instrument, e.g. voltage levels, bit streams, code words or symbols over a physical link connecting network nodes or instruments
    • G10H2240/211: Wireless transmission, e.g. of music parameters or control data by radio, infrared or ultrasound
    • G10H2240/281: Protocol or standard connector for transmission of analog or digital data to or from an electrophonic musical instrument
    • G10H2240/295: Packet switched network, e.g. token ring
    • G10H2240/305: Internet or TCP/IP protocol use for any electrophonic musical instrument data or musical parameter transmission purposes

Definitions

  • the invention relates to musical instruments. More particularly, the invention relates to a voice-controlled electronic musical instrument.
  • the memory is capable of containing discrete notes of the chromatic scale and responding to discrete input notes of the same pitch.
  • the system is analogous to a keyboard instrument where the player has only discrete notes to choose from and actuates one by depressing that particular key.
  • Other musical instruments give a player a choice of pitches between whole and half tone increments.
  • a violin can produce a pitch which is variable depending upon where the string is fretted, and a slide trombone can produce a pitch falling between whole and half tone increments. Both of these instruments produce an unbroken frequency spectrum of pitch.
  • the difficulty in employing either the Ishikawa or the Tsunoo devices for useful purposes is that most untrained musicians do not know which scales are appropriate for different songs and applications.
  • the device may even detract from the unimproved voice-controlled music synthesizer, due to the frustration of a user not being able to reach certain notes he desires to play.
  • the concept of "music-minus-one” is the use of a predefined usually prerecorded musical background to supply contextual music around which a musician/user sings or plays an instrument, usually the lead part. This concept allows the user to make fuller sounding music, by playing a key part, but having the other parts played by other musicians. Benefits to such an experience include greater entertainment value, practice value and an outlet for creative expression.
  • Hoff performs pitch correction only in the context of preprogrammed accompaniments, using the scale note suggested by the accompaniment nearest to the detected pitch. Hoff does not provide pitch correction in the absence of accompaniment, for example, the capability for the user to choose the scale to be used for the pitch correction or the capability to assign the currently detected pitch to the tonic of that scale.
  • a major drawback of most presently known systems that allow voice control of a musical instrument is that they require bulky enclosures and are presented in unfamiliar form factors, i.e. as imposing pieces of technical equipment. Thus, a user is unable to connect with such instruments in a natural way. Rather than playing a musical instrument, such devices give one the impression of operating a piece of machinery which, in most cases, is similar to operating a computer. This fact alone largely explains the lack of commercial success and consumer acceptance these devices have found.
  • the invention provides a voice-controlled musical instrument in a form factor that most nearly represents the actual instrument that the electronic instrument is to represent. Such form factor contributes to the ease of use of such instrument by providing a user with a simple method of operation.
  • the invention also provides a computationally efficient pitch-detection technique for a voice-controlled electronic musical instrument.
  • the device described in this document is an electronic, voice-controlled musical instrument. It is in essence an electronic kazoo. The player hums into the mouthpiece, and the device imitates the sound of a musical instrument whose pitch and volume change in response to the player's voice.
  • the device is compact, self contained, and operated by the user with a simple set of controls.
  • the invention overcomes many of the barriers to acceptance of such electronic instruments as were taught in the prior art. That is, the device is simple to operate and to hold while playing. Because the device is self contained, lightweight, and fully integrated, there are no exposed wires or connections to make between various components of a system which would detract from both the enjoyment of the device and the sense that the device is an electronic surrogate for the actual instrument that it physically represents. Because the device is provided in a dedicated form, e.g. as a horn, the user is drawn into the musical experience rather than distracted by the use of a microphone. Thus, voice operation of the device most nearly implies playing the actual instrument the device represents and creates the impression that the user is actually playing an instrument. Further, by taking the counterintuitive measure of severely restricting the user's ability to alter operation of the device, the user interface is significantly simplified.
  • Because the device uses a unique pitch-detection scheme that is both computationally efficient and well suited to an integrated device, such as the voice-controlled electronic musical instrument herein disclosed, it is possible to provide a compact, self-contained device and, significantly, one that offers a high degree of musicality, thereby further enhancing the impression that the user is actually playing a musical instrument.
  • the instrument can in principle be any music-producing sound source: a trumpet, trombone, saxophone, oboe, bassoon, clarinet, flute, piano, electric guitar, voice, whistle, even a chorus of voices, i.e. virtually any source of sound.
  • In its simplest configuration, the instrument resembles a kind of horn, and is for convenience called the HumHorn throughout this document. However, the shape and appearance of the instrument can be fashioned by the manufacturer to match the sound of any traditional instrument, if desired; or its shape can be completely novel. The functional requirements of the HumHorn's physical design are only:
  • The signal analysis consists of three modules: the frequency-detection module, the loudness-tracking module, and the note-attack module.
  • the frequency-detection module identifies the frequency of the player's voice. It does this by analyzing the incoming sound wave and finding patterns of recurring shapes. This method is a highly computationally efficient and novel combination of auto-correlation and zero-crossing- or peak-based pitch detection. The chosen instrument is synthesized at the pitch determined by the FDM or at an offset from that pitch as desired by the player.
  • the loudness-tracking component measures the loudness of the player's voice, and this information is used then to set the volume of the synthesized sound.
  • the note-attack module detects abrupt changes in the loudness of the player's voice. This component helps decide when the synthesized instrument should begin a new note.
  • the player is given the impression of playing the actual instrument and controlling it intimately with the fine nuances of his voice.
  • the HumHorn imitates the experience of playing and performing an actual musical instrument, including the visual, tactile, and auditory qualities of the experience, and including the finely nuanced auditory control of an instrument that previously only musicians trained in the art of a musical instrument could have, and also including all the personal, psychological, and social benefits that accompany the act of performing an actual musical instrument, whether solo or together with other performers, whether in front of an audience or alone.
  • Various approaches to the process of pitch detection itself are known. As discussed above, Russ discloses that the traditional general classifications for pitch detection are a) zero-crossing, b) auto-correlation, and c) spectral interpretation.
  • the present approach is much more computationally efficient because the wave shapes are compared (correlated) only over time spans bounded by distinguishing wave characteristics, such as peaks or zero crossings, rather than over spans bounded by arbitrary sample points. In the latter case, a much greater number of correlation calculations is required.
  • the present approach simply takes advantage of the fact that waves can be segmented by distinguishing characteristics such as peaks or zero crossings.
  • In terms of Russ' classifications, the present approach is a novel combination of (a) and (b), providing the accuracy of auto-correlation with the computational efficiency of the zero-crossing methods.
  • the present approach accounts for pitch change over time by stretching or shrinking the compared waves to the same length before correlation is performed.
  • the loudness-tracking component measures the loudness of the player's voice, and this information is used then to set the volume of the synthesized sound.
  • the note-attack module detects abrupt changes in the loudness of the player's voice. This component helps decide when the synthesized instrument should begin a new note.
  • the HumHorn is a hand-held music synthesizer whose output is controlled by the human voice.
  • Figure 1 diagrams the functionality of the HumHorn.
  • the player 10 sings or hums into the mouthpiece 14 of the instrument 12.
  • the HumHorn produces at its output 13 the sound of a musical instrument, closely following in both pitch and volume the nuances of the player's voice.
  • the player can choose which instrument the HumHorn should imitate, and is given the impression of playing the chosen instrument merely by singing.
  • the form factor of the device is a musical instrument and that all components of the device are contained within the instrument itself.
  • the user is most nearly given the impression of playing an actual instrument and not of operating a computer or other electronic device. It is thought that this fact alone is sufficiently significant to overcome the technophobia that intimidates many individuals when they are confronted with new technologies.
  • a psychological barrier is overcome that allows the device to be used by a broader group of persons.
  • a physical barrier is overcome, allowing physically disabled individuals the ability to play a musical instrument.
  • the user and audience are given the impression that an actual instrument is actually being played. This means that the sounds produced by the device match the instrument it resembles, as is expected by the user.
  • the HumHorn itself can resemble any known or novel instrument.
  • One possible configuration is shown in Figure 2.
  • the mouthpiece 5 leads directly to the microphone 9.
  • the loudspeaker resides in a double-cone section 3 from which a channel leads through the central housing 11 to a bell section 7 where the sound is transmitted.
  • the housing imparts an acoustic quality to the sound produced.
  • the electronics and batteries are contained in the central housing, which also holds several finger-actuated controls: both push buttons 1b and selection switches 1a. These controls allow the player to alter synthesizer parameters, such as instrument selection, volume, or octave.
  • the logical structure of the HumHorn is diagrammed in Figure 3.
  • the microphone 30 sends an analog signal to an analog-to-digital converter (ADC) 31, which samples the signal at a fixed frequency, preferably 22,050 Hz.
  • the ADC converts one sample at a time and sends it to a band-pass filter 32 (which smoothes the signal by removing frequencies that are too high or too low).
  • Each filtered sample is then sent to the signal-analysis module (SAM) 33 where it is analyzed within the context of the preceding samples. After analyzing the sample, the SAM passes the following information to the synthesizer 38:
  • the synthesizer also receives input from the finger-actuated controls 37. These control values can modify a variety of synthesizer parameters, including (but not limited to):
  • An output sample is then produced by the synthesizer according to all information passed in, and this output sample is fed to a digital-to-analog converter (DAC) 34.
  • the DAC produces an analog output signal from a stream of digital output samples that it receives. This signal is sent to an amplifier 35 before being transmitted by the loudspeaker 36.
  • the discussion below first describes the filter.
  • the discussion then describes the core software component, the SAM, which consists of three sub-modules: the frequency-detection module (FDM), the play and attack decision module (PADM), and the loudness-tracking module (LTM). Their outputs drive the sound synthesizer module (SSM).
  • the filter takes the raw input signal directly from the ADC and digitally filters it, one sample at a time.
  • the digital filter is allowed to look at previous samples, but it cannot see future values.
  • the filter smoothes the raw data, removing jagged peaks, which are usually not related to the player's intended pitch.
  • a simple third-order band-pass filter is used.
  • the filter has a low cutoff of 200 Hz and a high cutoff of 300 Hz.
  • a preferred filter is described in W. Press, B. Flannery, S. Teukolsky, W. Vetterling, Numerical Recipes in C , pp. 456-460, Cambridge University Press (1988).
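As an illustration of this filtering step, the C++ sketch below implements a band-pass stage with the cutoffs quoted above (200 Hz and 300 Hz) at the preferred 22,050 Hz sample rate. It is a simple first-order cascade for clarity, not the third-order design from Numerical Recipes that the text prefers; the class name and constants are inventions of the sketch.

    #include <cmath>

    // First-order band-pass: a one-pole low-pass tracks the content below the
    // low cutoff, subtracting it high-passes the signal; a second one-pole
    // stage then low-passes at the high cutoff.
    class SimpleBandPass {
    public:
        SimpleBandPass(double sample_rate, double f_low, double f_high)
        {
            const double PI = 3.14159265358979323846;
            pole_hp_ = std::exp(-2.0 * PI * f_low  / sample_rate);
            pole_lp_ = std::exp(-2.0 * PI * f_high / sample_rate);
        }

        // Called once per sample; uses previous samples only, no future values.
        double process(double x)
        {
            slow_ = (1.0 - pole_hp_) * x + pole_hp_ * slow_;   // below f_low
            double hp = x - slow_;                             // high-passed
            out_ = (1.0 - pole_lp_) * hp + pole_lp_ * out_;    // below f_high
            return out_;
        }

    private:
        double pole_hp_ = 0.0, pole_lp_ = 0.0;
        double slow_ = 0.0, out_ = 0.0;
    };

    // Usage: SimpleBandPass filter(22050.0, 200.0, 300.0);
    //        double y = filter.process(raw_sample);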
  • The Signal-Analysis Module (SAM)
  • the signal-analysis module takes the current sample as input 40 and produces as output the four pieces of information described above: note on/off 41, frequency 42, loudness 43, and attack 44.
  • the relationship between SAM's three sub-modules is diagrammed in Figure 4.
  • the input sample is available to all three sub-modules.
  • the FDM 45 calculates both the frequency of the input signal as well as a measure of this calculation's reliability.
  • the former is sent on to the SSM 38 (Fig. 3), while the latter is used by the PADM 46.
  • the PADM also makes use of the loudness value computed by the LTM 47.
  • The Frequency-Detection Module (FDM)
  • the frequency-detection module analyzes the input signal to discover the fundamental frequency. It does this by looking for patterns in the shapes of the incoming waves.
  • the fundamental wavelength is the largest repeated shape.
  • Figure 5 displays a wave resembling one that a human voice might produce after band-pass filtering.
  • the horizontal axis represents time; points on the right occur after points on the left.
  • the vertical axis represents signal voltage. Points above the center horizontal line have a positive voltage. Points below this line have a negative voltage.
  • the ADC converts these voltages to digital sample values. With the preferred 8-bit ADC, the sample values fall within the range ±128 (a 16-bit ADC generates values in the range ±32768). The greater the average magnitude of a wave's samples, the louder it is.
  • the peaks are given labels, 1 - 17, representing the order in which they occur.
  • the term peak is used to refer to both high (odd numbered) as well as low (even numbered) peaks.
  • the time at which a peak occurs is written t_p, where p is the number of the peak, e.g. the time at which peak 1 occurred is written t_1, etc.
  • the wave stretches from t_1 to t_17 and consists of a fundamental wave repeated four times: t_1 to t_5, t_5 to t_9, t_9 to t_13, and t_13 to t_17.
  • the duration or length of this fundamental wave, e.g. t_5 - t_1, is the fundamental wavelength.
  • the FDM finds this fundamental wavelength by finding the longest indivisible, repeated wave shape - the fundamental wave shape. It is indivisible if it is not itself composed entirely of a repeated wave shape. For example, in Figure 5 the wave shape from t_13 to t_17 matches that from t_9 to t_13 and is the fundamental wave shape. Although the segment from t_9 to t_17 matches the segment from t_1 to t_9, it is not the fundamental wave shape because it is divisible into the two smaller matching segments.
  • This technique - identifying the fundamental frequency by finding the fundamental wave shape - works for the HumHorn because the input signal is the human voice, and certain properties of this input signal are known in advance.
  • the voice can only produce a certain range of frequencies and wavelengths. Therefore, waves longer or shorter than this range can be ignored, which keeps the processing effort within reasonable bounds.
  • the human voice can be effectively band-pass filtered, and the resulting waveform is smooth and well behaved (see below).
  • a well-behaved wave is one where the fundamental wave spans only a small number of peaks - typically not more than four or five. This also helps limit the search effort.
  • the FDM finds the fundamental wave shape by comparing recent segments of the input wave in search of the largest repeated shape.
  • the efficiency of the FDM's shape-matching method is due to one fundamental insight: because the fundamental wave shape is always bounded by peaks, the search for matching wave shapes can be greatly economized by comparing only wave segments bounded by peaks. For this reason, frequency calculations are only performed when a new peak is detected. Because the fundamental wave usually has no more than four or five peaks, the number of comparisons is generally not more than about 25 (as will be seen shortly), and the average is much less than this.
  • the box numbered 61 in Figure 6 tests whether the current sample represents a peak.
  • the test measures the first and second derivative of the wave at the current point. There are three possible outcomes: STRONG, WEAK, and NONE. Pseudo code for this test is shown in Figure 7.
  • Lines 1 through 3 define t to be the current time, sample(t) to be the value of the input sample at the current time step, and slope(t) to measure the slope at the current time step.
  • the curvature is set equal to the magnitude of the second derivative at the sample point (line 7). The vertical bars "| |" represent absolute value.
  • the test in line 10 only serves to reduce the number of peaks used for frequency estimation (strong peaks), and hence to reduce the overall computational burden. This particular method for culling the peaks is rather arbitrary. The test of line 10 could be removed altogether to increase the rate of frequency estimation at the expense of added computation.
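A C++ sketch of the peak test of Figure 7 might read as follows. The slope and curvature logic follows the description above; the STRONG_CURVATURE threshold is an assumed stand-in for the figure's tuned constant.

    #include <cmath>

    enum PeakType { NONE, WEAK, STRONG };

    const double STRONG_CURVATURE = 6.0;  // assumed tuning value

    // sample[] holds the filtered input; slope(t) = sample[t] - sample[t - 1].
    PeakType classify_peak(const double* sample, long t)
    {
        double slope_now  = sample[t]     - sample[t - 1];
        double slope_prev = sample[t - 1] - sample[t - 2];

        // A peak occurs where the first derivative changes sign.
        if (slope_prev == 0.0 || (slope_prev > 0.0) == (slope_now > 0.0))
            return NONE;

        // Curvature: magnitude of the second derivative at the sample point.
        double curvature = std::fabs(slope_now - slope_prev);

        // Culling test (line 10): only sharp peaks are STRONG and are used
        // for frequency estimation; gentler ones are merely WEAK.
        return (curvature >= STRONG_CURVATURE) ? STRONG : WEAK;
    }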
  • the box numbered 62 in Figure 6 (iterate over individual wave-segment pairs) enumerates all pairs of recent wave segments and sends them to the box numbered 63 to be compared.
  • the peak at t_17 has just been detected in box 61. It is now used as the endpoint for the second of two segments to be compared.
  • the first segment, wave1, begins on a peak temporarily labeled start, and ends on a later peak temporarily labeled split.
  • the second segment, wave2, begins on the split peak and ends on the peak just detected in box 61, called current.
  • Initially, split is the penultimate peak and start is the immediately preceding peak. Then an iterative process begins, whereby the start and split labels are moved backwards in time from one peak to the next. Each time a label is moved, the new segments wave1 and wave2 are compared. This continues until all likely segments are compared. As was already stated, only wavelengths within a certain range need to be considered. Segments are first tested to see whether they are likely matches before they are sent to box 63 for comparison. The start and split peaks must also be strong peaks.
  • Wave-segment pairs compared, where wave1 = wave(start, split) and wave2 = wave(split, current):

        wave1          wave2            wave1          wave2
        wave(15,16)    wave(16,17)      wave(13,14)    wave(14,17)
        wave(14,16)    wave(16,17)      wave(12,14)    wave(14,17)
        ...            ...              ...            ...
        wave(8,16)     wave(16,17)      wave(6,14)     wave(14,17)
        wave(14,15)    wave(15,17)      wave(8,9)      wave(9,17)
        wave(13,15)    wave(15,17)      wave(7,9)      wave(9,17)
        ...            ...              ...            ...
        wave(7,15)     wave(15,17)      wave(1,9)      wave(9,17)
  • the comparison procedure is described in detail in the following paragraph, but briefly stated, it stretches the waves to the same length and then subtracts one from the other. The difference resulting from this subtraction is used to judge their similarity: the closer the difference is to zero, the more similar the two waves are. If the two waves are similar enough (lines 14 and 15) then they are considered to have matched. The criterion for whether or not they match depends upon whether a note is currently playing. If a note is not playing, then a stricter standard is used, which ensures that playing begins at the correct frequency. Once a note has started playing and the approximate frequency has been established, a more relaxed standard is applied.
  • When two segments match, the fundamental wavelength they represent (the average of their individual lengths) is calculated (line 16). If this wavelength is approximately twice that of the best wavelength matched so far (line 17), then the search has gone too far, and wave1 and wave2 each consist of two complete fundamental wave shapes. In this case, processing stops and the new frequency is returned (line 18). Otherwise, the difference between the segments is compared against previous difference values (line 19). If it is the lowest so far, it is kept (lines 20 and 21), and the match flag is set to TRUE.
  • Although the preferred frequency-detection method described here relies on the identification of peaks, it could just as well rely on the identification of any other distinguishing characteristic, such as, for example, zero crossings.
  • the analogue of a strong peak in box 61 is a zero crossing with a large positive or negative slope.
  • Box 63, which compares two segments, takes two wave segments, stretches or shrinks the second one so that it is the same length as the first one, and adds up their sample differences. Instead of summing the differences over every sample in both waves, only a small number of evenly distributed samples (called checkpoints) are chosen, which speeds processing. The distance between each checkpoint is approximately N_SAMPLES_PER_CHECKPOINT. Pseudo-code for box 63 is shown in Figure 9. The two wave segments are called wave1 and wave2. Line 2 calculates the number of checkpoints, based on the length of wave1. The floor symbol "⌊ ⌋" means to round down to the nearest integer. The value of wavelength_ratio represents the length of wave2 compared to the length of wave1 (line 3).
  • t1 is the time of the i-th checkpoint in wave1.
  • t2 is the time of the i-th checkpoint in wave2 - which is based on t1, but expanded or contracted to correspond to the same location in wave2.
  • Lines 9 and 10 find the sample values for wave1 and wave2 at the i th checkpoint.
  • the difference between the two waves is updated with the magnitude of their difference at this checkpoint.
  • In lines 12 - 14, the average magnitude of the two samples is calculated, and the maximum of this value is kept for the entire wave.
  • the sum of the wave's differences is normalized for both length and height so that the effect of the procedure is the same for both high and low frequencies, and both loud and soft signals.
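The comparison of box 63 might be sketched in C++ as below, following the checkpoint description above. The value of N_SAMPLES_PER_CHECKPOINT and the exact checkpoint placement are assumptions; a result near zero means the segments match closely.

    #include <algorithm>
    #include <cmath>

    const int N_SAMPLES_PER_CHECKPOINT = 8;  // assumed spacing

    // wave1 spans [start1, end1], wave2 spans [start2, end2] in sample buffer s.
    double compare_segments(const double* s, long start1, long end1,
                            long start2, long end2)
    {
        long len1 = end1 - start1;
        long len2 = end2 - start2;
        long n_checkpoints = len1 / N_SAMPLES_PER_CHECKPOINT;  // floor
        if (n_checkpoints < 1) n_checkpoints = 1;
        double ratio = (double)len2 / (double)len1;  // stretch factor for wave2

        double diff = 0.0, max_height = 0.0;
        for (long i = 1; i <= n_checkpoints; ++i) {
            long t1 = start1 + i * len1 / (n_checkpoints + 1);
            // Same relative position in wave2, expanded or contracted by ratio.
            long t2 = start2 + (long)std::lround((t1 - start1) * ratio);
            diff += std::fabs(s[t1] - s[t2]);
            // Track the largest average magnitude, for height normalization.
            max_height = std::max(max_height,
                                  0.5 * (std::fabs(s[t1]) + std::fabs(s[t2])));
        }
        if (max_height == 0.0) return 0.0;  // silence: treat as identical
        // Normalize for length and height, so the measure behaves the same for
        // high and low frequencies, loud and soft signals.
        return diff / (n_checkpoints * max_height);
    }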
  • The Frequency Calculation and Correction Routine (FCCR)
  • the frequency calculation and correction routine (box 64 in Figure 6) guards against implausible jumps: when the newly detected frequency differs from the previous one by more than the voice could manage, it assumes the most recently detected frequency is false and replaces it with the previously detected frequency. (It is especially for this reason that frequency detection is more stringent when a note is being established than after the note has already begun; see Figure 8, lines 14 and 15.) By accepting only small frequency changes in its input, the output of the HumHorn appears to change pitch smoothly and continuously.
  • the pseudo code for the FCCR is shown in Figure 10.
  • Line 2 calculates the time elapsed since the last wave match.
  • Line 3 calculates what the frequency will be if the best wavelength is accepted, according to Equation (1).
  • Lines 4 - 7 calculate the percent difference in frequency between the last accepted frequency and the newly suggested one. The numerator is the larger of the two, and 1 is subtracted from the quotient for normalization. If no match was found in box 62, the frequency is left unchanged (line 9). Otherwise, its time is recorded (line 9) to be used again in a later iteration at line 2. If the change in frequency was within the speed that the human voice can achieve, then the frequency is changed to the new value, otherwise it is left unchanged.
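A C++ sketch of the FCCR follows. It assumes Equation (1) computes frequency as the sample rate divided by the fundamental wavelength in samples, and it phrases the plausibility test as an octave-rate limit (using the one-octave-per-40-ms figure quoted later in the text) rather than the figure's percent-difference form; the constants are assumptions.

    #include <cmath>

    const double SAMPLE_RATE = 22050.0;
    const double MAX_OCTAVES_PER_SEC = 25.0;  // ~1 octave per 40 ms

    // Returns the corrected frequency; last_match_time is updated on a match.
    double correct_frequency(bool match_found, double best_wavelength,
                             double last_frequency, double now,
                             double* last_match_time)
    {
        if (!match_found)
            return last_frequency;  // no match: frequency left unchanged

        double elapsed = now - *last_match_time;
        *last_match_time = now;     // recorded for the next iteration

        // Equation (1), assumed form: frequency = sample rate / wavelength.
        double new_frequency = SAMPLE_RATE / best_wavelength;

        // Could the voice really have moved this far in this much time?
        double octave_jump = std::fabs(std::log2(new_frequency / last_frequency));
        if (octave_jump <= MAX_OCTAVES_PER_SEC * elapsed)
            return new_frequency;   // plausible: accept the new value

        return last_frequency;      // implausible: assume false, keep the old
    }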
  • Figure 11 shows two filtered waveforms recorded from the same voice. These waveforms are particularly interesting because they are so highly ambiguous and are a challenge for the frequency detector.
  • the upper wave has a wavelength of just under eight milliseconds, but it could easily be interpreted as having a wavelength of twice this much due to shape replication.
  • For the lower wave the opposite is true. It has a wavelength of just over seven milliseconds, but it could easily be interpreted as having half this wavelength.
  • For the FDM to recognize both wavelengths correctly, the parameters must be carefully tuned. The full set of parameter values is discussed below.
  • Another method for correcting the frequency entails altering box 62 in Figure 6.
  • box 62 could return the match that was closest to the previous wavelength. It is possible that this wavelength, though not the best match, was the real fundamental wavelength.
  • Frequency correction is an important part of the frequency-detection process, and there are a multitude of different techniques for accomplishing it.
  • the general frequency-detection algorithm has reduced the number of candidate frequencies from the entire spectrum that the human voice can produce, down to a small number of alternatives, typically two or three.
  • the ambiguity that may be difficult for the general-purpose frequency-detection algorithm to distinguish may be simpler for a special-purpose algorithm.
  • Two methods in particular that can be used for these occasional frequency ambiguities are: (a) predictive filtering and other parametric frequency-estimation techniques, and (b) context-sensitive probabilistic methods.
  • If the singer has, in the last few notes, been singing upwards in ascending semitones, then, given two ambiguous alternatives, one a semitone higher than the last pitch and the other one octave plus one semitone higher, the probability is greater that the former rather than the latter was intended by the singer.
  • a priori information about the human voice and the wave patterns that it can generate, or is likely to generate can be used for making the final decision as to which frequency has been detected.
  • Box 66, estimate frequency reliability, is an important routine, but it is better described below, where the context for its use becomes clearer.
  • the entire frequency-detection procedure is not very computationally intensive, consisting mostly of summations over a small subset of recent time steps. Yet, this approach is very effective for finding the correct fundamental frequency. Most other methods for frequency detection rely on much more filtering of the signal. Spectral methods require several orders of magnitude more computation. The FDM is also much faster reacting than spectral methods, requiring far fewer samples before the fundamental wavelength is detected. The FDM method is related to standard auto-correlation, but it is less computationally intensive. Whereas auto-correlation methods use a dot product to measure wave-shape similarity, which involves a large number of multiplications, the FDM uses the sum of differences. The FDM also saves considerable computational effort by comparing only wave segments bounded by wave-shape features - such as peaks or zero crossings.
  • The Play and Attack Decision Module (PADM)
  • the third and fourth of these cases involve performance and synthesizer options. These cases are handled by the SSM (the Sound Synthesizer Module) and are described in detail below. The first two cases require the detection of an intended attack on the part of the player. These cases are handled by the Play and Attack Decision Module (PADM).
  • the player wants the HumHorn to produce an attack at specific times.
  • the player automatically uses his tongue and lips, pronouncing consonants, usually 'd', 't', 'l', 'p', 'b', and/or 'm', to separate one note from the next.
  • One generally sings syllables such as 'dum, ba dum, badumpadumpadum', or 'doodle oo, doodle oo, doodle oo doo doo,' to make notes distinct.
  • the PADM can detect these signals and respond to them by issuing attacks to the SSM.
  • the routine for calculating the frequency reliability is described below. First, though, the rest of the PADM is described. A diagram of the relationships between the PADM and its related routines is given in Figure 13. Besides frequency reliability, the PADM 46 also requires information about the current loudness of the player's voice. The loudness is calculated by the LTM 47, the Loudness-Tracking Module, which is described below. The PADM also requires information about recent changes in loudness, in particular, whether the loudness has suddenly increased abruptly in the recent past. This information comes from the Recent Loudness Surge routine 130, which is described below.
  • Pseudo code for the PADM is given in Figure 14.
  • the PADM issues an attack in two different cases, corresponding to case (1) and case (2) above.
  • Lines 3 - 5 correspond to case (1)
  • lines 7 - 10 correspond to case (2).
  • an attack is issued if: (line 3) there has recently been at least one frequency match (the frequency reliability is GOOD or STABLE); (line 4) there has been an abrupt increase in loudness; and (line 5) sufficient time has passed since the last attack.
  • In the second case, an attack is issued if: the frequency reliability has been steady for some time (line 7), the signal is loud enough (line 8), no note is currently playing (line 9), and sufficient time has passed since the note was turned off (line 10).
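The two attack cases might be coded as follows in C++; every threshold constant here is an illustrative assumption, not one of the patent's tuned values.

    enum Reliability { BAD, UNSURE, GOOD, STABLE };

    struct PadmState {
        double last_attack_time   = -1e9;
        double last_note_off_time = -1e9;
        bool   note_playing       = false;
    };

    const double MIN_ATTACK_GAP = 0.05;  // seconds (assumed)
    const double MIN_NOTE_GAP   = 0.02;  // seconds (assumed)
    const double MIN_LOUDNESS   = 4.0;   // loudness units (assumed)

    bool should_attack(const PadmState& st, Reliability rel, double loudness,
                       bool recent_loudness_surge, double now)
    {
        // Case (1): re-articulation - a loudness surge with a usable frequency
        // estimate, not too soon after the previous attack (lines 3 - 5).
        if ((rel == GOOD || rel == STABLE) && recent_loudness_surge &&
            now - st.last_attack_time > MIN_ATTACK_GAP)
            return true;

        // Case (2): a fresh note - steady frequency, loud enough signal, no
        // note sounding, and enough time since note-off (lines 7 - 10).
        if (rel == STABLE && loudness > MIN_LOUDNESS && !st.note_playing &&
            now - st.last_note_off_time > MIN_NOTE_GAP)
            return true;

        return false;
    }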
  • a weak or lost signal is a normal part of detection. It happens most often when the player stops singing a note, or is separating two notes by pronouncing a consonant, which adds noise to the signal.
  • Another bad sign, i.e. another indication that the frequency has been lost, is when a strong peak is found but the FDM can find no shape match for it.
  • the reliability of the signal can be rated according to good signs and bad signs.
  • the good signs are strong, matching peaks.
  • the bad signs are non-matching peaks and strings of too many weak peaks in a row.
  • the frequency is STABLE if there have been at least three good signs in a row with no bad signs. It is BAD if there have been at least five bad signs in a row with no good signs. If neither BAD nor STABLE, but the current peak is a strong, matching peak, then the frequency reliability is GOOD. If none of these cases apply, then the reliability is UNSURE.
  • Figure 15 shows the pseudo code for estimating frequency reliability.
  • Lines 2 - 15 count the good signs and bad signs.
  • Lines 16 - 23 classify them into a reliability estimate. If the current peak is weak (line 2), then the number of consecutive weak peaks is incremented (line 3). If the number of consecutive weak peaks is too large (a bad sign), then the bad-sign counter should be incremented (line 5) and the good-sign counter should be reset to zero (line 6). Also, the counting of consecutive weak peaks should begin again (line 7).
  • If the peak is not weak, it must be strong (line 8). If no match was found (bad sign) then, again, the bad-sign counter is incremented (line 9), the good-sign counter is reset (line 10), and the weak-peaks counter is reset (line 11). However, if a match was found (good sign), then the good-sign counter is incremented (line 13), the bad-sign counter reset (line 14), and again, the weak-peaks counter is reset (line 15).
  • Then the classification begins. If, leading up to the current peak, there have been five or more bad signs in a row, then the frequency reliability is BAD (line 17). If there have been three or more good signs, then the reliability is STABLE (line 19). If neither BAD nor STABLE, but the current peak is a strong, matching peak, then the reliability is GOOD (line 21). If none of these cases apply, then the reliability is UNSURE (line 23).
  • the numbers of good signs (3) and bad signs (5) are clearly arbitrary and a matter of tuning.
  • the criteria for good signs and bad signs could also in principle be enhanced to include other indicators.
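In C++, the Figure 15 logic might be sketched like this. The thresholds of three good signs and five bad signs come from the text; the length of a "too long" run of weak peaks is an assumption.

    enum FreqReliability { F_BAD, F_UNSURE, F_GOOD, F_STABLE };

    struct SignCounters { int good = 0, bad = 0, weak_run = 0; };

    const int MAX_WEAK_RUN = 6;  // "too many weak peaks in a row" (assumed)

    FreqReliability estimate_reliability(SignCounters& c, bool peak_is_strong,
                                         bool match_found)
    {
        if (!peak_is_strong) {
            // Weak peak: becomes a bad sign only once too many arrive in a row.
            if (++c.weak_run > MAX_WEAK_RUN) { ++c.bad; c.good = 0; c.weak_run = 0; }
        } else if (!match_found) {
            // Strong peak with no shape match: a bad sign.
            ++c.bad; c.good = 0; c.weak_run = 0;
        } else {
            // Strong, matching peak: a good sign.
            ++c.good; c.bad = 0; c.weak_run = 0;
        }

        if (c.bad  >= 5) return F_BAD;     // five or more bad signs in a row
        if (c.good >= 3) return F_STABLE;  // three or more good signs in a row
        if (peak_is_strong && match_found) return F_GOOD;
        return F_UNSURE;
    }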
  • the final component required by the PADM is the routine for detecting a recent loudness surge.
  • This routine returns TRUE if there has recently been a loudness surge (a sudden increase in the loudness of the player's voice) that has not already been used as the basis for an attack.
  • a loudness surge is considered to have occurred whenever the current loudness is substantially greater than any previous loudness in the recent past, i.e. whenever L(t) exceeds the minimum loudness over the last WINDOW_SIZE seconds by some fixed factor.
  • the "min" function in this surge test is computationally expensive in comparison with the rest of the HumHorn's functions.
  • One method that can speed this process is to split up the loudness values into bins, each of which represents a range of values. When a new loudness value arrives, the bin that it corresponds to is incremented. WINDOW_SIZE seconds later, when the value should leave the window, the bin is decremented. The minimum value in the window lies within the range of the lowest, non-zero bin. Alternatively, the bins could point to a list of the actual values represented by that bin. In fact, the values could be stored in any standard, ordered data structure whose store time is O(log n). Alternatively, a subset of past loudness values can be used for this comparison, e.g . those that correspond with peak detection.
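The binning idea can be sketched in C++ as follows; the bin count, the loudness range, and the bin-midpoint approximation of the minimum are assumptions of the sketch.

    #include <deque>
    #include <vector>

    class WindowedMin {
    public:
        WindowedMin(int n_bins, double max_loudness, int window_samples)
            : bins_(n_bins, 0), max_(max_loudness), window_(window_samples) {}

        void add(double loudness)
        {
            int b = bin_of(loudness);
            ++bins_[b];                    // value enters its bin
            history_.push_back(b);
            // WINDOW_SIZE later the value leaves the window: decrement its bin.
            if ((int)history_.size() > window_) {
                --bins_[history_.front()];
                history_.pop_front();
            }
        }

        // The minimum lies within the range of the lowest non-zero bin.
        double approx_min() const
        {
            for (int b = 0; b < (int)bins_.size(); ++b)
                if (bins_[b] > 0)
                    return (b + 0.5) * max_ / bins_.size();  // bin midpoint
            return 0.0;
        }

    private:
        int bin_of(double v) const
        {
            int b = (int)(v / max_ * bins_.size());
            if (b < 0) b = 0;
            if (b >= (int)bins_.size()) b = (int)bins_.size() - 1;
            return b;
        }

        std::vector<int> bins_;
        std::deque<int> history_;
        double max_;
        int window_;
    };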
  • The Loudness-Tracking Module (LTM)
  • the HumHorn's immediate and continuous response to the moment by moment changes of loudness in the player's voice provides a subtle, nuanced control that no keyboard instrument can match.
  • control of volume is completely intuitive and natural; it does not first have to be translated from the mind to the fingers. It is effortless and automatic. Responsive loudness tracking is also very important for the PADM to detect rapid as well as subtle note attacks.
  • L(t) = (1 - K) * L(t - 1) + K * |sample(t)|, where K is a constant between 0 and 1.
  • L(t) is simply a trace or low-pass filter of the sample value magnitude. This method is sufficient for tracking slow changes in loudness. It is not sufficient, however, for tracking quick changes. To detect a rapid succession of note attacks, it is necessary to track quick changes in loudness.
  • a straightforward way to get more responsive loudness tracking is simply to look at all the sample values in the recent past, i.e. over a window of the most recent M steps, and take the largest magnitude:

        L(t) = max{ |sample(t - Δ)| : 0 ≤ Δ < M }

  • this windowed maximum is potentially expensive to implement computationally, though some optimizations can be devised.
  • A more sophisticated method of loudness tracking accomplishes the same thing but is far less computationally intensive. It too finds the maximum value in the preceding time window (of size M), but it only updates its evaluation of the loudness when a strong or weak peak arrives (instead of at every time step). As a further optimization, the only sample values used for the evaluation are those at recognized peaks. This peak-based approach fits in nicely with the peak-based FDM. The C++ code supplied in Figure 28 implements this method.
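Figure 28 is not reproduced here, but a C++ sketch in the same spirit, re-evaluating loudness only when a peak arrives and using only the magnitudes at recognized peaks within the last M samples, might look like this:

    #include <algorithm>
    #include <cmath>
    #include <deque>

    struct PeakRecord { long time; double magnitude; };

    class PeakLoudness {
    public:
        // Called only when a strong or weak peak is recognized at time t.
        double on_peak(long t, double sample_value, long M)
        {
            peaks_.push_back({t, std::fabs(sample_value)});
            // Discard peaks that have fallen out of the window of size M.
            while (!peaks_.empty() && peaks_.front().time <= t - M)
                peaks_.pop_front();
            // Loudness L(t) = largest peak magnitude still inside the window.
            double loudest = 0.0;
            for (const PeakRecord& p : peaks_)
                loudest = std::max(loudest, p.magnitude);
            return loudest;
        }

    private:
        std::deque<PeakRecord> peaks_;
    };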
  • frequency information can be used to adjust M, the size of the sample window, dynamically.
  • the fundamental frequency is high, shorter windows can be used; when it is low, longer windows can be used.
  • M is optimally one time step smaller than the fundamental wavelength. This is because the full wavelength might also contain the maximum peak from the previous fundamental wave.
  • If the loudness is only updated whenever the FDM finds a match for a peak, then M should include everything after the peak that matched, i.e. everything after the split peak of Figure 8 for the best matching wave1 and wave2.
  • the longest expected wavelength could be used for M. This avoids over-responsiveness, where rapid changes in loudness are detected that do not actually exist in the signal, which could cause too frequent note attacks by the PADM.
  • more responsiveness can be obtained through a variety of heuristics that seek to estimate the wavelength from partially or completely unreliable frequency information. If no match has been found for several peaks in a row, then the proper size for M starts to become uncertain. In this case, a good heuristic is to start with M at approximately 0.9 times the most recently detected fundamental wavelength. As the frequency becomes more unreliable, M can be increased as a function, g( ⁇ t), of the time elapsed since the last GOOD frequency reliability measurement.
  • M(t) = min[ w_max, 0.9 * W(t_lm) * g(t - t_lm) ], where w_max is the longest expected wavelength, t is the current time, t_lm is the last time that the FDM detected a match, and W(x) is the wavelength at time x.
  • the loudness-tracking module should ideally be tuned according to how sensitive the player wishes the attack detection to be.
  • g(Δt) should depend on the vocal abilities of the individual player. There is a limit to how fast a person can change the pitch of his voice. It is thought that most people cannot change pitch at a rate greater than about one octave in 40 milliseconds. If the goal is to minimize false attacks, then g(Δt) can compute the worst-case value, as though the frequency were actually to drop at this maximal rate beginning the instant of the FDM's last match. However, to increase enjoyment for a large market of users it is preferable to change M more slowly than this maximal rate.
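A small C++ sketch of this window-size rule, assuming a g() that simply grows linearly with the time elapsed since the last match:

    #include <algorithm>

    // M(t) = min[ w_max, 0.9 * W(t_lm) * g(t - t_lm) ], with a linear g()
    // (the growth rate is an assumption; the text only bounds it by how
    // fast a voice can drop in pitch).
    double window_size(double w_max, double last_wavelength,
                       double now, double t_last_match, double growth_per_sec)
    {
        double g = 1.0 + growth_per_sec * (now - t_last_match);
        return std::min(w_max, 0.9 * last_wavelength * g);
    }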
  • Figure 12 shows a typical profile of loudness values obtained using Equations (7) and (8).
  • the loudness profile 122 is overlaid on the corresponding sequence of sample-value magnitudes 123. Note that the loudness is only updated at the peaks, as described in the method above.
  • the sound synthesizer receives the following inputs from the SAM: note on/off, frequency, loudness, and attack. From the finger-actuated control (FAC) system, it receives parameters from the user that specify instrument, octave/offset, discrete versus continuous, and musical mode preferences, as well as perhaps other controls not mentioned here. These inputs and their relationships are now described in detail.
  • the output of the SSM is a stream of output samples that are sent to the DAC for conversion to the output signal.
  • the internal structure of the SSM 38 is displayed in Figure 16.
  • the SSM consists of two primary components: the Message Processor (MP) 160, and the Sound Generator (SG) 161.
  • the pitch-conversion and volume-conversion boxes are minor functions that are described below.
  • the MP takes the information generated by the SAM and FAC and produces messages, which it sends to the SG.
  • the most striking part of the SSM is the asynchronous relationship between the Message Processor and the Sound Generator.
  • the MP receives signals from the SAM at regular intervals, preferably 8,000 Hz, 11,025 Hz, or 22,050 Hz, and the SG produces sound samples at regular intervals, preferably at the same rate.
  • messages are not sent from the MP to the SG at regular intervals. Instead, messages are sent only when the output from the SG needs to be altered.
  • the SG generates sounds from an instrument, one note at a time. It can by itself and without further assistance produce an output signal, i.e. a series of output samples, that imitates the requested instrument playing the requested note at the requested volume. Once the note begins to play, it continues to play until it is turned off.
  • the MP sends messages to the SG that tell it to begin or end a note. While a note is playing, the MP can send messages to alter the note in pitch and in volume. The MP can also send messages that tell the SG which instrument to imitate.
  • the pitch-conversion function 162 takes the frequency generated by the SAM and transforms it into a pitch for the MP. Though pitch and frequency are often used to denote the same thing, there is a subtle difference. Frequency occurs naturally; pitch is man-made. Frequency describes sound as a physical phenomenon (cycles per second). Pitch is psychophysical, describing sound the way we perceive it. If two frequencies are an octave apart, they have a fixed ratio, i.e. 2. In contrast, pitch is the position of an auditory frequency on a linear musical scale, such as a musical staff or a piano keyboard, where two pitches an octave apart are separated by a constant number of steps, e.g. 12.
  • pitch is understood to be a continuous value that can fall anywhere on a linear musical scale.
  • a note also falls on a linear musical scale, but has a discrete, integer value.
  • P = 12 log2(F), where P is the resulting pitch and F is the frequency as given by the FDM.
  • the volume-conversion function 163 takes the loudness value from the SAM and converts it to a volume-control value for the MP.
  • the MP receives information from the SAM and from the FAC. From the SAM it receives four values: note on/off, attack, pitch, and volume, the latter two being converted frequency and loudness, as just described.
  • the information from the SAM arrives synchronously: 4 values at every cycle.
  • the FAC sends player preference values, such as instrument and octave settings.
  • the FAC information arrives asynchronously, whenever the user wishes to change one or more parameters. For example, the player might press a button to change the instrument being imitated by the SSM, or to cause the SSM to play at an offset of one or more octaves from the pitch being sung.
  • the MP stores the most recent settings as internal variables and applies them appropriately whenever messages are sent to the SG.
  • Instrument-change requests from the FAC require no substantial processing by the MP and can be handled upon arrival. They are simply formed into messages and passed directly to the SG.
  • the variable octave holds a value between -3 and +3. If not zero, this variable indicates that the HumHorn should generate a pitch this many octaves below or above the hummed pitch. Only octave offsets are discussed here, but in principle, any offset from the sung pitch could be specified by the user, for example a major third (4 semitones) or a perfect fifth (7 semitones). These non-octave offsets can be used to produce pleasing and interesting parallel melodies to accompany the hummed pitches.
  • the HumHorn provides the option of having the instrument play the note nearest to the user's pitch. Then, even if the player's voice wavers a bit, the pitch of the instrument remains steady. Thus, if continuous is FALSE, then the pitch played should be rounded up or down to the nearest note in the musical scale or mode selected by the player, as is described below.
  • the variables "mode” and “new_tonic” are also described below.
  • Pseudo code for the Message Processor is shown in Figure 17.
  • the pitch is modified to reflect the pitch scale of the SG as well as the current octave variable.
  • the SG is assumed to have a linear pitch scale discretized into half-step intervals, which correspond to the traditional notes on a keyboard. This is the system used by the MIDI protocol.
  • the starting note in the scale is arbitrary and depends on the SG.
  • the value synthesizer_offset is the difference between a pitch on the mathematically derived pitch scale, P = 12 log2(F), and the corresponding pitch for the SG. This is a constant offset for all pitches.
  • For example, in MIDI the frequency 440 Hz corresponds to the 69th note on the keyboard. In this case, the synthesizer offset is 12 log2(440) - 69, or about 36.38 (just over three octaves).
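This arithmetic can be checked with a few lines of C++ (the 440 Hz reference and note 69 follow the MIDI example above; the variable names are invented):

    #include <cmath>
    #include <cstdio>

    int main()
    {
        double pitch_440 = 12.0 * std::log2(440.0);   // about 105.38
        double synthesizer_offset = pitch_440 - 69.0; // about 36.38

        // Converting a detected frequency to a synthesizer note number:
        double f = 523.25;  // C5, for example
        double synth_pitch = 12.0 * std::log2(f) - synthesizer_offset;
        std::printf("%.2f\n", synth_pitch);           // prints 72.00 (MIDI C5)
        return 0;
    }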
  • a musical mode is a subset of the twelve semitones in an octave. Examples are: major, minor, blues, chromatic, and many more esoteric modes, such as dorian, phrygian, whole-tone, and pentatonic.
  • the chromatic mode consists of every semitone in the octave, numbered 0 - 11.
  • the major mode consists of the following semitones: ⁇ 0, 2, 4, 5, 7, 9, 11 ⁇ .
  • the first note in the mode (note zero) is called the tonic, and all semitones in the mode are an offset from the tonic.
  • the mode variable allows the user to select which mode to use.
  • the player can dynamically assign the tonic to whatever pitch he/she is currently singing. If pitch following is continuous, then the chromatic mode is used (lines 3 and 4), so the nearest semitone is looked up.
  • the nearest_mode_note routine is described below.
  • Lines 6 - 13 decide whether there are reasons to issue an attack despite the fact that there is no attack signal from the SAM.
  • the two cases at lines 8 - 9 and lines 11 - 12 correspond to cases 3 and 4, respectively, discussed above.
  • pitch tracking is continuous (line 8) and the pitch has moved beyond the range where the synthesizer can produce smooth pitch changes based on the attack pitch (line 9).
  • the attack pitch is current_note at line 7, which was set on a previous attack (line 16).
  • the range that the pitch has exceeded is MAX_BEND_RANGE in line 9.
  • pitch tracking is discrete, and the pitch is much closer to another note in the mode than to the attack note (line 12).
  • the attack note for this case is again current_note.
  • MAX_PITCH_ERROR a value between 0.5 and 1.0, determines how much closer the pitch has to be to the other note.
  • a value of 0.5 indicates that the pitch should be rounded to the nearest note.
  • a value greater than 0.5 acts as a kind of hysteresis, keeping the note from changing when the player's voice is a bit unsteady.
  • Lines 14 - 33 send the SG the appropriate message for the current situation, if any. If an attack has been issued for any of the reasons given above, and therefore for any of the four cases described above, then a message is sent to play the new note at the new volume (lines 14 - 22). Whether pitch following is discrete or continuous, the SG receives a message to play nearest_note, an integer note value. If pitch following is continuous, then the SG also receives a message to bend the pitch up or down by a certain amount so as to match the input frequency. Lines 15 and 16 store the note and volume for future reference. If the SG is currently playing a note, then line 18 sends a message to the SG to stop. Line 19 issues the message to play the new note at the new volume. If pitch following is continuous (line 20), the new note is adjusted to match the pitch of the player's voice (line 21). The time of the attack is recorded (line 22).
  • the adjust_pitch routine depends again on the SG. For the MIDI protocol, it is possible to adjust the pitch via pitch bend, as well as to adjust the maximum allowable pitch bend range (MAX_BEND_RANGE). The adjust_pitch routine does both, if required.
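For concreteness, the note and bend messages could be formed as raw MIDI bytes, roughly as sketched below; send_to_sg() is a hypothetical transport, and the registered-parameter exchange that sets MAX_BEND_RANGE itself is omitted.

    #include <cstdint>

    void send_to_sg(uint8_t status, uint8_t data1, uint8_t data2); // assumed

    void note_on(uint8_t note, uint8_t velocity) { send_to_sg(0x90, note, velocity); }
    void note_off(uint8_t note)                  { send_to_sg(0x80, note, 0); }

    // Pitch bend: a 14-bit value where 0x2000 means "no bend"; the number of
    // semitones the full range spans is MAX_BEND_RANGE.
    void bend_pitch(double semitones, double max_bend_range)
    {
        int value = 0x2000 + (int)(semitones / max_bend_range * 0x2000);
        if (value < 0)      value = 0;
        if (value > 0x3FFF) value = 0x3FFF;
        send_to_sg(0xE0, value & 0x7F, (value >> 7) & 0x7F);  // LSB, then MSB
    }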
  • the function that returns the nearest mode note is shown as pseudo code in Figure 18.
  • Four modes are defined at the beginning, though there could be many others. Each mode is defined in terms of the semitones that make it up, starting from the tonic, at position zero, and ending one octave above the tonic, at position 12.
  • the second note is two semitones above the tonic. The next is two more semitones above that, i.e. four above the tonic. The next is one more semitone up.
  • the tonic itself is an integer between 0 and 11 and is a note in the lowest octave of the linear pitch scale.
  • the twelfth semitone above the tonic is one octave above the tonic, but it has the same place in the mode as the tonic and is also considered to be the tonic. In fact, all modes are octave blind, i.e. they are offsets from the nearest tonic below. Thus, if the pitch is 38.3 and the tonic is 2, then the nearest tonic below 38.3 is 38 (2 + 12 + 12 + 12).
  • if the new_tonic variable is set, the integer, i.e. semitone, closest to the given pitch is stored as the tonic, reduced to the first octave of the scale so that it has a value between 0 and 11 (line 7).
  • the variable "offset" is the difference between the pitch and the nearest tonic below it (line 8).
  • mode_note, an integer, is the number in the specified mode that is closest to offset (a real). The difference between them (line 10), when added to the original pitch, gives the closest mode note (line 11).
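  • The same lookup, as a compact Python sketch (the patent's own pseudo code is in Figure 18; the mode table here is abbreviated and the names are illustrative):

        import math

        MODES = {
            "chromatic": list(range(13)),            # every semitone, 0..12
            "major":     [0, 2, 4, 5, 7, 9, 11, 12],
        }

        def nearest_mode_note(pitch, mode, tonic):
            # Modes are octave-blind: measure the offset from the nearest
            # tonic at or below the given pitch (38 for pitch 38.3, tonic 2).
            tonic_below = tonic + 12 * math.floor((pitch - tonic) / 12)
            offset = pitch - tonic_below
            mode_note = min(MODES[mode], key=lambda n: abs(n - offset))
            # Shifting the pitch by the difference yields the closest mode note.
            return pitch + (mode_note - offset)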
  • the Sound Generator can be implemented with a standard MIDI (Musical Instrument Digital Interface) module or with a custom-designed synthesizer. Because the demands on this module are far less than the capabilities of most MIDI systems, it may be preferable to design and build a custom synthesizer module so as to save chip space. On the other hand, the capabilities of off-the-shelf MIDI chips are generally sufficient for our purposes, and in fact the messaging methodology of the Message Processor was designed to conform to MIDI standards. A MIDI processing unit could therefore meet our specifications with little or no modification.
  • MIDI: Musical Instrument Digital Interface
  • the Humhorn consists of the following hardware components, each either custom built or off the shelf:
  • the ADC, DAC, or both might already be resident on the chip.
  • the filtering mechanism of the SAM could be replaced by a filtering microphone or other mechanism that performs the necessary band-pass filtering mechanically or with analog circuitry.
  • Of the finger-actuated controls, it is desirable to have at least two different kinds: those that switch and stay in position, and those that return upon release.
  • FACs are used for pitch tracking. It is best to have a switch that can be set to continuous or discrete pitch-tracking mode and that stays there once set. It is also desirable to have a button that temporarily changes to the opposite mode.
  • When the player is in continuous mode and wants to quickly nail down a pitch, or to sing a quick scale in a musical mode, he can press the button and then immediately release it when done.
  • While in discrete mode, the player can quickly slide to another pitch, including one outside the current musical mode, by temporarily pressing the button, then immediately pop back into key by releasing it.
  • Buttons are also desirable for quickly changing between instruments and octaves, allowing the player to be a one-man band.
  • the housing of the instrument may itself have a variety of purposes and functions.
  • the housing may come in two sections: An inner container and an outer shell.
  • the inner container holds the electronics and batteries in a simple, convenient, easy-to-handle, self-contained unit. Its purpose is to hold the heavy and high-priced items in a compact and modular form.
  • the contribution of the outer shell is its styling.
  • the outer shell can be manufactured to resemble any traditional or novel instrument form, for both its visual and/or its acoustic properties.
  • the shell may contain the microphone and/or speaker(s) as well.
  • the inner and outer housings can be manufactured such that they are easily separated.
  • When the inner container and the outer shell are properly fitted together, the shell provides information to the inner container by way of a physical key on the inner surface of the outer shell that fits into a corresponding slot on the outside of the inner container. Together with other possible information, the key would provide a description of the expected instrument sound that the SG should produce. Thus, by pulling the inner container from a shell in the form of one instrument and inserting it into another shell in the form of a different instrument, the sound produced by the SG would change from that of the former instrument to that of the latter. A multitude of different shells could be manufactured so that the player can get not just the auditory impression, but also the tactile and visual impression of playing a specific musical instrument.
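  • A minimal sketch of how such a shell key might select the SG's instrument voice (the key codes and patch names here are purely hypothetical):

        # Hypothetical mapping from the shell's physical key code to an SG patch.
        SHELL_PATCHES = {
            0b0001: "trumpet",
            0b0010: "saxophone",
            0b0011: "flute",
        }

        def on_shell_attached(key_code, sound_generator):
            # Reconfigure the Sound Generator to the instrument the shell represents.
            sound_generator.set_instrument(SHELL_PATCHES.get(key_code, "default"))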
  • Instrument mouthpieces are unsanitary and make one reluctant to share one's instrument.
  • a funnel-shaped receptacle on the microphone end of the instrument psychologically and mechanically discourages holding one's lips against it.
  • Representative parameter values (by figure):

        Parameter                  Figure   Value
        N_SAMPLES_PER_CHECKPOINT   Fig. 9   11
        MAX_VOICE_SPEED            Fig. 10  0.2
        MIN_ATTACK_GAP             Fig. 14  0.1 seconds
        MIN_ATTACK_LOUDNESS        Fig. 14  10 (based on 8-bit sample values)
        MIN_LOUDNESS               Fig. 14  8 (based on 8-bit sample values)
        MIN_TIME_OFF               Fig. 14  0.05 seconds
        MAX_CONSECUTIVE_WEAK       Fig. 15  5
        MAX_PITCH_ERROR            Fig. 17  0.75 semitones
        MAX_BEND_RANGE             Fig. 17  2 semitones
        SG_REFRACTORY_PERIOD       Fig. 17  0.02 seconds
  • the FDM described above has a delay of less than 30 milliseconds (about 1/30th of a second) from the time the new pitch is begun by the singer until it is finally detected by the FDM.
  • the lowest note sung is C two octaves below middle C, having a frequency of 65 Hz (an exceptionally low note), in which case one cycle takes 15 milliseconds and two cycles take 30 milliseconds.
  • Because the SSM generates a new instrument attack only after the FDM detects the pitch, this attack may be slightly noticeable and possibly jarring, emphasizing this delay. It is possible to reduce the impression of delay in the following way. For each instrument there is a non-voiced attack sound.
  • As soon as an attack is detected, the SSM begins playing the unvoiced attack sound. Then, starting when the FDM detects the pitch, this unvoiced sound is gradually blended into the sound of the instrument attack at the detected pitch. This would require specialized MIDI programming if standard MIDI were used.
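  • A sketch of such a blend as a simple linear crossfade (the sample buffers and fade length are assumptions; the text above only specifies that the unvoiced sound blends into the pitched attack):

        def blend_attack(unvoiced, voiced, fade_samples):
            # Linear crossfade from the unvoiced attack into the pitched attack,
            # beginning at the sample where the FDM first reports a pitch.
            out = []
            for i in range(fade_samples):
                a = i / fade_samples
                out.append((1 - a) * unvoiced[i] + a * voiced[i])
            return out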
  • the instrument can sometimes sound whiny during continuous pitch tracking due to the minor pitch variations of the singer's voice, which the sound of the HumHorn may actually accentuate. It is possible to mitigate this whininess significantly by smoothing the resulting pitch profile played by the instrument. That is, the intent of the pitch-smoothing function is to allow the flexibility of continuous pitch tracking, while mitigating the whininess exhibited by some instruments with some people's voices.
  • One way of smoothing pitch is to pass the pitch profile generated by the FDM through a low-pass filter.
  • a better method is obtained by using principles from control systems theory. Think of the pitch played by the instrument as tracking the pitch profile generated by the FDM. We can add mass to the pitch of the instrument in the way that this tracking occurs.
  • E = P_FDM - P_inst
  • d^2P_inst/dt^2 = k1 * E + k2 * int_time(E) - k3 * dP_inst/dt
  • P_FDM is the pitch indicated by the FDM
  • P_inst is the pitch to be played by the instrument
  • E is the pitch-tracking error between the instrument and the output of the FDM
  • int_time() stands for integration over time
  • k1, k2, and k3 are constants.
  • the derivative term (the third term) stabilizes P_inst because it has a damping effect; it is used to damp out oscillations in the control.
  • the integral term improves the accuracy of the tracking. By changing the values of the constants, we can obtain various levels of smoothing, tracking accuracy, and response times. In fact, there are probably better control laws than this for this purpose, such as lead-lag control, but the main idea is captured by this PID control law.
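  • A discrete-time sketch of this control law using simple Euler integration (the time step and gain values are illustrative, not taken from the patent):

        def track_pitch(p_fdm_samples, dt=0.001, k1=400.0, k2=50.0, k3=40.0):
            # Integrate d^2P_inst/dt^2 = k1*E + k2*int_time(E) - k3*dP_inst/dt.
            p_inst = p_fdm_samples[0]
            velocity = 0.0
            integral = 0.0
            smoothed = []
            for p_fdm in p_fdm_samples:
                error = p_fdm - p_inst
                integral += error * dt
                accel = k1 * error + k2 * integral - k3 * velocity
                velocity += accel * dt
                p_inst += velocity * dt
                smoothed.append(p_inst)
            return smoothed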
  • the following ideas are related to HumBandTM technology, in particular, the use of the HumBandTM in relation to the Internet, for example, as an Internet appliance.
  • the HumBandTM voice-analysis process extracts a small amount of important information from the voice stream and uses it to play the desired instruments. It is estimated that an uncompressed bandwidth of no more than 300 bytes/second is necessary to capture all nuances, but this can be greatly compressed to an estimated 500 bits/second on average with no loss, perhaps less. A three-minute song would therefore consume approximately 11 Kbytes for one voice. Multiple voices require proportionately more. This is quite a low number and suggests that HumBandTM email, downloads, and other forms of HumBandTM communication can be performed with minor overhead.
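  • A quick arithmetic check of that estimate:

        bits_per_second = 500
        song_seconds = 3 * 60
        kbytes = bits_per_second * song_seconds / 8 / 1024
        print(round(kbytes, 1))  # about 11.0 Kbytes for one voice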
  • Audience member: as a member of the audience, one may comment upon and discuss a performance in real time, during the performance. There may be special symbols or auditory icons that have particular meanings and that can be sent to the performer, for example applause, bravos, catcalls, laughter, cheers, and whistling, which the performer hears. In addition, each audience member may participate in a round of voting to express his subjective opinion as to the quality of the performance.
  • the performer is attracted to the session because of his innate hidden desire to perform live in front of an audience. This is exciting and fun, and due to the anonymity of the Internet as well as the vocal disguise provided by the HumBandTM, it is also less intimidating than an on-stage performance. Imagine performing for a crowd of tens or hundreds (or even thousands!) in the secluded comfort of your home.
  • the HumBandTM instrument is connected via an interface directly to the Internet so that the performance can be transmitted live via the HumJam.com web site.
  • the performer receives live feedback from the audience members, and at the end of the performance may receive a rating by said members.
  • Voting is performed for three purposes:
  • the method above is a suggestion of a single kind of interactive scenario that could appeal to the competitive and performance-interested nature of many people. Further, one kind of prize would be the opportunity to perform for a very large audience, such as the combined audiences of all groups, or specially advertised events which feature the winning performers (of any ranking).
  • Each performer and audience member is able to participate via his Internet-capable HumBandTM. All performers send information via their HumBandTM. All audience members listen to these performances via their HumBandsTM, PCs, PC headphones, HumBandTM headphones, or other HumBand-codec-enabled appliances.
  • the performer plays along with an accompaniment provided by the HumJam.com HumServerTM.
  • the server sends the accompaniment information via the HumBand codec to the performer.
  • the accompaniment is played on any enabled appliance.
  • the HumBand codec is very similar to MIDI, but perhaps optimized for voice-control.
  • the performer plays in synch with this accompaniment and his signal is sent via this same codec back to the server.
  • the server only then broadcasts the performance to the audience.
  • the performer and accompaniment are in perfect synchrony; there is no latency issue. This is because the server receives the performer's signal and is able to combine it with the accompaniment, properly timed such that the resulting signal reproduces the performance as heard by the performer. Thus, though there is a slight delay, the performance is nonetheless broadcast live and at full fidelity.
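  • A sketch of that server-side alignment (the event format and the delay measurement are assumptions; the text above only requires that the merged stream match what the performer heard):

        def merge_performance(accompaniment, performance, upstream_delay):
            # Each stream is a list of (timestamp_seconds, event) pairs.
            # Shift the performer's timestamps back by the measured upstream
            # delay so they line up with the accompaniment as the performer
            # actually heard it, then broadcast the merged stream.
            shifted = [(t - upstream_delay, e) for (t, e) in performance]
            return sorted(accompaniment + shifted, key=lambda pair: pair[0])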
  • Audience members send their comments and votes to the server, which tallies, organizes, and distributes them.
  • a central server - a conductor - can send a steady pulse, e.g. a metronome tick, to all parties, timed such that each party receives the signal simultaneously.
  • Each performer can then time his performance to match this pulse and expect a slight delay from the other performers, gradually (or perhaps quickly) learning to ignore and adapt to this slight delay.
  • the pulse is effectively the accompaniment.
  • Software at the performer's site can account for this delay upon the conclusion of the piece, and can replay for each performer the sound of the whole performance without the delays.
  • Performances can be downloaded from the site, different pieces played by various interesting and/or famous performers. Because the information downloaded is much more precise, containing far greater subtlety and nuance, than typical MIDI performances, one can expect greater realism and appeal.
  • the accompaniment sections of many, many different pieces could be made available such that one could download them (low bandwidth) into one's HumBandTM and then play along with them.
  • HumBandTM software, "HumletsTM", could be downloaded from the site into the HumBandTM and could modify control in a variety of useful ways, such as:
  • music instruction could be human (paid), or software (free).
  • Software could be used online or offline and would, for example, help the learner improve pitch control. This could be done by playing a selection for the learner, or alternatively allowing the learner to read a score, and waiting for the learner to play what he has heard or read.
  • the software could show both the correct pitch and the learner's pitch as two simultaneous real-time graphs. The learner could then see visually when his pitch is too high or low.
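  • For such a display, the deviation between the two pitches can be computed in cents (standard conversion; the function name is illustrative):

        import math

        def cents_error(learner_hz, target_hz):
            # Positive = sharp, negative = flat; 100 cents = one semitone.
            return 1200 * math.log2(learner_hz / target_hz)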
  • the HumJam.com site could sponsor HumBandTM games for online use or to download.
  • One example is a "Simon"-like game, where the player must mimic a sequence of notes. When the player repeats the notes correctly, the sequence is extended by another note. Besides simply singing the notes in order, the player might also have to play them with different instrument sounds which change between notes, or in changing octaves.
  • the HumBandTM could be built with an I/R port to allow wireless networking between instruments present in the same room. Possible uses are:
  • FIG. 33 shows a feed-forward multilayer perceptron (FFMLP) neural network whose inputs are the sample values at equally spaced intervals along the waves.
  • the first wave shape (between t_start and t_split) is input to input layer 334, and samples from the wave between t_split and t_current are inputs to input layer 336. Connections from these input layers are then fed forward to the hidden layer 332, and connections from this layer are fed forward to the single output node 338.
  • the desired output of the latter, i.e. of the network, is the probability that the shape of the first wave matches the shape of the second wave.
  • the variable "difference" (see Figure 9) is defined to be inversely related to this probability in some manner.
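  • A minimal NumPy sketch of such a matcher (the layer sizes, the resampling step, and the 1 - p conversion are assumptions; the trained weights are taken as given, and the two input layers are simply concatenated here):

        import numpy as np

        def resample(segment, n):
            # Sample the wave segment at n equally spaced points.
            idx = np.linspace(0, len(segment) - 1, n)
            return np.interp(idx, np.arange(len(segment)), segment)

        def match_probability(first_wave, second_wave, w1, b1, w2, b2, n=16):
            # Feed-forward pass: two resampled segments in, one hidden layer,
            # sigmoid output = probability that the two wave shapes match.
            x = np.concatenate([resample(first_wave, n), resample(second_wave, n)])
            hidden = np.tanh(w1 @ x + b1)
            return 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))

        def difference(p_match):
            # One possible inverse relation to the match probability.
            return 1.0 - p_match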
  • a network or other adaptive algorithm can be built to replace both box 62 and box 63.
  • This algorithm takes as its input a specific number of the most recent samples and is trained to produce at its output an estimate of the split point within those values.
  • the advantage of the FDM is that a much smaller set of waves is tested for matching compared to standard autocorrelation approaches. The FDM therefore remains efficient even when a more complex and adaptive shape-comparison module replaces box 63.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)

Claims (21)

  1. A hand-held, self-contained, voice-controlled electronic musical instrument comprising a mouthpiece (5) at which a user's voice enters, a signal analysis module (SAM, 33) that receives an input signal (40) from the mouthpiece (5), one or more user controls (FAC, 1a, 1b, 37), one or more sound reproduction devices (34, 35, 36) coupled to the signal analysis module (SAM, 33), and a housing (11), wherein the mouthpiece (5), the one or more user controls (FAC, 1a, 1b, 37), and the one or more sound reproduction devices (34, 35, 36) are contained entirely within the confines of the housing (11), the instrument further comprising:
    a sound synthesis module (SSM, 38) coupled to the signal analysis module (SAM, 33) and contained entirely within the confines of the housing (11); and
    wherein the signal analysis module (SAM, 33) identifies the frequency of the input signal (40) and provides a frequency output signal (42) indicative thereof, and determines the loudness of the input signal (40) and provides a loudness output signal (43) indicative thereof;
    wherein the sound synthesis module (SSM, 38) receives both the frequency output signal (40) and the loudness output signal (43) from the signal analysis module (SAM, 33), the sound synthesis module (SSM, 38) transforming (162) the frequency output signal (40) into a pitch signal and converting (163) the loudness output signal (43) into a volume control value, in order to produce an output sample signal that substantially captures both the pitch and the volume of the user's voice;
    wherein the one or more sound reproduction devices (34, 35, 36) receive the output sample signal and produce an output sound (13) that substantially captures both the pitch and the volume of the user's voice and can imitate the sound of a traditional instrument;
    wherein the pitch and the volume of the instrument substantially follow the pitch and the volume of the user's voice.
  2. The instrument according to claim 1, wherein the signal analysis module (SAM, 33) comprises:
    a pitch detection technique for identifying the frequency of the input signal (40) for the voice-controlled electronic musical instrument.
  3. The instrument according to claim 2, wherein the pitch detection technique comprises the steps of:
    determining recent time steps;
    summing differences over a small subset of recent time steps in order to find a correct fundamental frequency;
    determining wave segments; and
    comparing only wave segments that are bounded by waveform features such as peaks or zero crossings.
  4. The instrument according to claim 1, wherein the housing (11) is formed in a shape that represents a musical instrument.
  5. The instrument according to claim 1, wherein the signal analysis module (SAM, 33) further comprises:
    a frequency detection module (FDM, 45) for receiving the input signal (40), identifying the frequency of the input signal, and providing the frequency output signal (42) indicative thereof to the sound synthesis module (SSM, 38);
    a loudness tracking module (LTM, 47) for receiving the input signal (40), determining the loudness of the input signal (40), and providing the loudness output signal (43) indicative thereof to the sound synthesis module (SSM, 38); and
    a pitch attack detection module (PADM, 46) for receiving the input signal (40), determining a tone on/off value and an attack value, and providing a tone on/off output signal (41) and an attack output signal (44) indicative thereof to a sound synthesizer (SSM, 38).
  6. The instrument according to claim 1, wherein at least one of the one or more user controls (FAC, 37, 1a, 1b) comprises either a control that switches and remains in position upon actuation by a user, or a control that returns upon release by a user.
  7. The instrument according to claim 1, wherein the sound synthesis module (SSM, 38) is controllably operative to provide (162) either continuous pitch tracking or discrete pitch tracking, the pitch signal corresponding to the exact frequency of the frequency output signal (40) during continuous pitch tracking, and the pitch signal corresponding to the note lying closest to the frequency output signal (40) during discrete pitch tracking; and
    wherein at least one of the one or more user controls (FAC, 37, 1a, 1b) comprises a switch that can selectably set either continuous pitch tracking or discrete pitch tracking, the switch remaining at the selected pitch tracking once it has been set.
  8. The instrument according to claim 1, wherein the sound synthesis module (SSM, 38) is controllably operative to provide (162) either continuous pitch tracking or discrete pitch tracking, the pitch signal corresponding to the exact frequency of the frequency output signal (40) during continuous pitch tracking, and the pitch signal corresponding to the note lying closest to the frequency output signal (40) during discrete pitch tracking; and
    wherein at least one of the one or more user controls (FAC, 1a, 1b, 37) comprises a button (FAC, 1a, 1b, 37) that temporarily changes the instrument from continuous pitch tracking to discrete pitch tracking, wherein, when the user operates the instrument in continuous pitch tracking and wants to select a pitch or sing a scale in a musical mode, the user can press the button (FAC, 1a, 1b, 37) and release it immediately when done.
  9. The instrument according to claim 1, wherein the sound synthesis module (SSM, 38) is controllably operative to provide (162) either continuous pitch tracking or discrete pitch tracking, the pitch signal corresponding to the exact frequency of the frequency output signal (40) during continuous pitch tracking, and the pitch signal corresponding to the note lying closest to the frequency output signal (40) during discrete pitch tracking; and
    wherein at least one of the one or more user controls (FAC, 1a, 1b, 37) comprises a button by means of which a user, while the instrument is set for discrete pitch tracking, can temporarily slide to another pitch, including a pitch lying outside a current musical mode, by temporarily pressing the button.
  10. The instrument according to claim 1, wherein at least one of the one or more user controls (FAC, 1a, 1b, 37) comprises a button for changing between octaves.
  11. The instrument according to claim 1, wherein the housing (11) comprises:
    an inner container and an outer shell;
    wherein the inner container holds the signal analysis module (SAM, 33); and
    wherein the outer shell is manufactured to resemble a traditional musical instrument.
  12. The instrument according to claim 11, wherein the outer shell contains the mouthpiece (5) and/or the one or more sound reproduction devices (34, 35, 36).
  13. The instrument according to claim 1, further comprising:
    a power source contained entirely within the confines of the housing (11).
  14. The instrument according to claim 11, wherein the inner container and the outer shell are formed such that they fit together, the outer shell comprising an instrument description and a communication path that provides information to the inner container comprising a description of an actual instrument sound produced by a traditional instrument that the outer shell resembles;
    wherein pulling the inner container out of the outer shell, the outer shell being produced in the form of one instrument, and inserting it into another outer shell provided in the form of a different instrument configures the inner container to provide a sound produced by an instrument resembled by the outer shell with which the inner container is currently paired.
  15. The instrument according to claim 14, wherein a physical key on an inner surface of the outer shell fits into a corresponding slot on an outside of the inner container in order to configure the instrument to reproduce a sound associated with an actual instrument having the form represented by the outer shell.
  16. The instrument according to claim 14, further comprising:
    a plurality of different outer shells manufactured such that the user receives not only the auditory impression but also a tactile and visual impression of playing a specific musical instrument.
  17. The instrument according to claim 1, wherein the housing (11) is provided in the form of any of a trumpet, a trombone, a saxophone, an oboe, a bassoon, a clarinet, a flute, a piano, an electric guitar, or a whistle.
  18. The instrument according to claim 1, wherein the signal analysis module (SAM, 33) implements a combination of autocorrelation and zero-crossing or peak-based pitch detection.
  19. The instrument according to claim 6, wherein the one or more user controls (FAC, 37, 1a, 1b) further comprise any of the following:
    a musical-mode selection button and a tonic-setting button for a musical mode, wherein setting the tonic comprises selecting whether a current pitch is a first note in the musical mode.
  20. The instrument according to claim 6, wherein the one or more user controls (FAC, 37, 1a, 1b) further comprise:
    at least one instrument selection button (FAC, 37, 1a, 1b) for effecting a permanent or temporary instrument change, optionally by permanently or temporarily assigning an instrument to a button (FAC, 37, 1a, 1b), wherein pressing the button (FAC, 37, 1a, 1b) changes a sound produced by the instrument to a sound of another instrument assigned to the button (FAC, 37, 1a, 1b) until the button (FAC, 37, 1a, 1b) is released or switched.
  21. The instrument according to claim 13, wherein the housing (11) comprises:
    an inner container and an outer shell;
    wherein the power source is contained in the outer shell;
    wherein the inner container holds the signal analysis module (SAM, 33); and
    wherein the outer shell is manufactured to resemble a traditional musical instrument.
EP00936067A 1999-05-20 2000-05-19 Durch sprache gesteuertes elektronisches musikinstrument Expired - Lifetime EP1183677B1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13501499P 1999-05-20 1999-05-20
US135014P 1999-05-20
PCT/US2000/013721 WO2000072303A1 (en) 1999-05-20 2000-05-19 Voice-controlled electronic musical instrument

Publications (2)

Publication Number Publication Date
EP1183677A1 EP1183677A1 (de) 2002-03-06
EP1183677B1 true EP1183677B1 (de) 2005-08-31

Family

ID=22466107

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00936067A Expired - Lifetime EP1183677B1 (de) 1999-05-20 2000-05-19 Durch sprache gesteuertes elektronisches musikinstrument

Country Status (6)

Country Link
EP (1) EP1183677B1 (de)
JP (1) JP2003500700A (de)
AT (1) ATE303645T1 (de)
AU (1) AU5143400A (de)
DE (1) DE60022343T2 (de)
WO (1) WO2000072303A1 (de)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9024168B2 (en) 2013-03-05 2015-05-05 Todd A. Peterson Electronic musical instrument

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6653546B2 (en) * 2001-10-03 2003-11-25 Alto Research, Llc Voice-controlled electronic musical instrument
GB2392544A (en) * 2002-08-29 2004-03-03 Morgan Computing Ltd Device for creating note data
JP4448378B2 (ja) 2003-07-30 2010-04-07 ヤマハ株式会社 電子管楽器
JP2005049439A (ja) 2003-07-30 2005-02-24 Yamaha Corp 電子楽器
DE102013014443A1 (de) * 2013-09-02 2015-03-05 Michael Kraft Vorrichtung zum Erzeugen eines elektroakustischen Schaltwandlersignals mittels der Stimme eines Benutzers

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1393542A (en) * 1972-02-24 1975-05-07 Pitt D B Voice actuated instrument
US4342244A (en) * 1977-11-21 1982-08-03 Perkins William R Musical apparatus
DE3009864A1 (de) * 1980-03-12 1981-09-24 Günter Dipl.-Ing. Dr. 2282 List Wagner Didaktisches elektronisches musikinstrument
US4633748A (en) * 1983-02-27 1987-01-06 Casio Computer Co., Ltd. Electronic musical instrument
US4757737A (en) * 1986-03-27 1988-07-19 Ugo Conti Whistle synthesizer

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9024168B2 (en) 2013-03-05 2015-05-05 Todd A. Peterson Electronic musical instrument

Also Published As

Publication number Publication date
EP1183677A1 (de) 2002-03-06
AU5143400A (en) 2000-12-12
ATE303645T1 (de) 2005-09-15
DE60022343T2 (de) 2006-06-22
WO2000072303A1 (en) 2000-11-30
JP2003500700A (ja) 2003-01-07
DE60022343D1 (de) 2005-10-06

Similar Documents

Publication Publication Date Title
US6737572B1 (en) Voice controlled electronic musical instrument
US6653546B2 (en) Voice-controlled electronic musical instrument
US6967275B2 (en) Song-matching system and method
Dittmar et al. Music information retrieval meets music education
CN112382257B (zh) 一种音频处理方法、装置、设备及介质
US20040244566A1 (en) Method and apparatus for producing acoustical guitar sounds using an electric guitar
JP7424359B2 (ja) 情報処理装置、歌唱音声の出力方法、及びプログラム
CN107146598B (zh) 一种多音色混合的智能演奏系统和方法
Paulus Signal processing methods for drum transcription and music structure analysis
Hsu Strategies for managing timbre and interaction in automatic improvisation systems
EP1183677B1 (de) Durch sprache gesteuertes elektronisches musikinstrument
JP4808641B2 (ja) 似顔絵出力装置およびカラオケ装置
US5430244A (en) Dynamic correction of musical instrument input data stream
JP4038836B2 (ja) カラオケ装置
Janer Singing-driven interfaces for sound synthesizers
Al-Ghawanmeh Automatic accompaniment to Arab vocal improvisation “Mawwāl”
Franklin PnP maxtools: Autonomous parameter control in MaxMSP utilizing MIR algorithms
Mehrabi et al. Vocal imitation for query by vocalisation
Risset et al. Sculpting sounds with computers: music, science, technology
JP6582517B2 (ja) 制御装置およびプログラム
CN103943098A (zh) 多米索交响乐器
Oliver The Singing Tree: a novel interactive musical experience
Murray-Rust Virtualatin-agent based percussive accompaniment
Rego Rhythmically-Controlled Automata Applied to Musical Improvisation
Jensenius How do we recognize a song in one second?: the importance of salience and sound in music perception

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20011110

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

17Q First examination report despatched

Effective date: 20040629

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT;WARNING: LAPSES OF ITALIAN PATENTS WITH EFFECTIVE DATE BEFORE 2007 MAY HAVE OCCURRED AT ANY TIME BEFORE 2007. THE CORRECT EFFECTIVE DATE MAY BE DIFFERENT FROM THE ONE RECORDED.

Effective date: 20050831

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050831

Ref country code: CH

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050831

Ref country code: LI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050831

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050831

Ref country code: BE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050831

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050831

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REF Corresponds to:

Ref document number: 60022343

Country of ref document: DE

Date of ref document: 20051006

Kind code of ref document: P

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20051130

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20051130

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20051130

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20051212

RAP2 Party data changed (patent owner data changed or rights of a patent transferred)

Owner name: JOHNMARK LLC

RIN2 Information on inventor provided after grant (corrected)

Inventor name: JAMESON, JOHN W.

Inventor name: RING, MARK B.

NLT2 Nl: modifications (of names), taken from the european patent bulletin

Owner name: JOHNMARK LLC

Effective date: 20051221

REG Reference to a national code

Ref country code: GB

Ref legal event code: 732E

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060222

NLV1 Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act
REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20060406

Year of fee payment: 7

ET Fr: translation filed
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20060519

RIN2 Information on inventor provided after grant (corrected)

Inventor name: JAMESON, JOHN W.

Inventor name: RING, MARK B.

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20060531

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20060531

Year of fee payment: 7

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20060601

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20070519

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20080131

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20071201

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20070519

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20060519

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20070531

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20060531

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050831