WO1993017408A1

WO1993017408A1 - Method and apparatus for ear training

Info

Publication number: WO1993017408A1
Application number: PCT/US1993/001450
Authority: WO
Inventors: Bertrand Perroud; Agnès PERROUD; Hervé PERROUD
Original assignee: Bertrand Perroud; Perroud Agnes; Perroud Herve
Priority date: 1992-02-20
Filing date: 1993-02-18
Publication date: 1993-09-02
Also published as: AU3774293A

Abstract

An ear training system (100) selects particular exercises from a structured curriculum for performance by a user. The system (100) prompts the user to play a note. The training system (100), in a practice mode, samples the played note (160), extracts its frequency (105), and provides immediate feedback (101) on the precision of the played note relative to the prompted note. The process continues for each note of the exercise. Results are stored in a session log (107), which is evaluated to select, within a framework of the predefined pedagogic scenarios, a subsequent exercise. Asynchronous performance of exercises, in which feedback is provided upon completion of the entire exercise, provides a better evaluation of a user's ability to precisely recognize and reproduce notes. A digitizer (160) coupled with a processor (frequencizer) (108) that performs fast Fourier transforms for frequency extraction permits conversion of audio sound signals to digitized sounds and ASCII representations of decimal notes. Systems employing a frequencizer are able to perform music notation for analog music notes.

Description

METHOD AND APPARATUS FOR EAR TRAINING

BACKGROUND OF THE INVENTION

The present invention relates generally to a music training apparatus and method and, more particularly, to a system for ear training which calibrates and educates a person's ear to recognize pitch and develop a sense of harmony and musicality.

Musicians enhance their musicality by their ability to identify, and to produce mentally and physically, a series of musical notes. Precise identification and production of the series of notes is of utmost importance for singers, violinists, saxophonists, and players of any other instrument for which the musician produces a proper pitch for a note by proper body mechanics.

However, pitch sensitivity of the musician's ear is very important to any musician as it provides the musician with better musicality, in addition to sight reading and sight singing abilities. Piano players need an outside source of ear training. As they produce good notes having proper pitch when they strike the proper key, provided that the piano is properly tuned, the production of proper pitch for the notes does not challenge any discrimination from the musician's ear.

Techniques to improve musicality include development of a separate tool to enhance musicality through ear training. In a best case scenario, a keyboard instrument can teach ear discrimination of notes for one-half step intervals. Thus, ear training systems relying on keyboard devices are limited as to a degree of precision attainable for identification of notes. Keyboard-type ear training systems cannot judge intonation, also referred to as perfection of pitch.

For instruments such as the voice, the violin or the saxophone, for example, playing in tune relies not only upon proper identification of notes, but also on physically producing the right pitch. For these types of instruments, the musician must identify the pitch of a note in order to produce it. The more accurate the musician is in identifying the notes, the more accurate the musician will be in reproducing the notes. For these reasons, among others, it is important for musicians to train their ears to be able to identify and reproduce accurately, mentally or by singing, a series of notes.

Ear training methods of the prior art include professor-student sessions in which the student attempts to reproduce a series of predefined notes by voice or instrument. The professor attempts to provide feedback to the student during the performance, indicating when the pitch of any particular played note is too sharp or too flat. While this method has been employed for years, it has disadvantages. One disadvantage is that it is difficult to provide real-time feedback for each note. There is some delay between the playing of a note and the professor's comment on the note. A second disadvantage is that the professor, in communicating, must rely on the ear of the student (which is the object of the training itself). Another disadvantage is that the comments may be incorrect, in that the method is only as reliable as the professor's ability to discriminate proper relative pitch. The prior art method has still another disadvantage in that students are unable to learn by themselves. Learning according to the method of the prior art requires the assistance of another, and that other person must be a skilled person.

The invention provides a novel apparatus and method to allow a student, otherwise unassisted, to properly and efficiently train their ear to recognize proper pitch of notes at a significant precision depending upon the instrument (for example one-eighth, one-twelfth or one-sixteenth of a step). Additionally, the ear training system trains each user to recognize pitch so as to produce notes themselves at such a precision, by singing or playing an instrument such as a voice, a violin, or a saxophone, for example.

SUMMARY OF THE INVENTION

The present invention provides apparatus and method for assisting a student to improve their abilities to recognize pitches and their understanding of solfege (music theory, intervals, sight reading and sight singing). The invention permits the musician to calibrate and educate their ear to enhance development of harmony and musicality. For each student, the system selects a particular exercise from a set of self-parameterizing pedagogic sessions. A difficulty level of the curriculum adjusts automatically to the student by choosing a particular exercise, a routine for the exercise and a level of audio and visual support provided by the system (this is one of several modes of the ear training system, called a practice mode, in which samples are processed synchronously, after each sample is acquired for each note of the exercise). An ultimate goal of the ear training system is to have the user perform all exercises without any support and interference from the system (this is a mode referred to as test mode, in which samples are processed asynchronously, i.e. when the exercise is complete and the user has attempted to generate each note) one hundred percent in tune (with a sensitivity requirement varying depending upon a particular instrument). Users of the ear training system have an option to elect to practice any specific element of a curriculum in a mode of their choice. Furthermore, graphic representations of sound characteristics (waveform and spectrum) are available for refining the quality of the musician's sound. Users can also prompt the ear training system to scan and analyze statistically their session log (which contains the best and last performance of each exercise executed in the test mode) for patterns of mistakes.

According to one aspect of the invention, an ear training system includes an electronic display, an analog-to-digital converter to sample generated sounds and convert the sounds into a digital format, memory for storing scenarios, exercises, and session logs, and a processor to present specific exercises to the electronic display, to determine pitch of notes actually played and compare them to expected notes, to provide feedback to the user regarding the pitch of the generated note, to record performance against predetermined standards, and to provide the user with a historical evaluation of performance relative to the predetermined standards.

In operation, the system initializes itself upon launch to determine specific hardware features present to optimize performance on the specific platform. After launch, the processor displays an exercise window with a recommended choice and a palette from which the user selects specific options or modifies system parameters. To begin, the user manually selects a particular exercise, or permits the system to select an exercise based on the curriculum and past performance. The processor displays the particular exercise, typically a series of notes, and prompts the user to sing or play the exercise. As the user generates the notes, the analog-to-digital converter samples each note and extracts the note's pitch. The processor compares the generated note against the note of the exercise. If the pitch of the generated note exceeds a predefined error margin, the system indicates whether the pitch of the generated note was too sharp, or too flat. Different exercises will prompt the user to continue to try to generate the proper pitch until correct, or will permit the user to continue with the exercise.

When ready, the user performs the exercise without any interference from the system (test mode). The notes are analyzed when the exercise is complete with results of the exercise displayed on the score and recorded in the session log. The log permits the user to review any particular exercise's performance, not just in a numerical sense, but also to playback the session using the user data. The system employs an optional pedagogic structure which will select exercises for the user, based upon data recorded in the session logs.

Using input notes, the system is able to facilitate tuning or evaluation of an instrument by plotting frequency against time (an input note's waveform), or by graphing energy versus frequency (the input note's spectrum).

In an alternate preferred embodiment, systems employing a frequencizer are able to function as a music notation system. User-generated notes are converted to graphic representations of symbols corresponding to frequencies of notes closest to the user-generated notes. In this mode, the user is able to play music and have the system record, in conventional music notation form, the generated notes.

The present invention offers many advantages over the prior art ear training systems and method. These advantages include a use of a knowledge base that includes pedagogic structured exercises and use of past performance results to advance a user efficiently. The user is able to train the ear without another person's assistance. The training system is useable for complex audio sources, such as a voice or a violin. To facilitate training, the ear training system is designed to operate in a hands-off mode, allowing the user to play their instrument without manually operating the ear training system as well.

Reference to other portions of the specification and drawings may realize other features and advantages of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is an ear training system 100 embodying a preferred embodiment of the present invention;

Fig. 2 is an illustration showing relationships among selected modules and windows of the ear training system 100;

Fig. 3 is a flow chart of the self-configuration module 210 for the ear training system 100;

Fig. 4 is a flow chart 500 of an operation of the ear training system 100 for a typical session;

Fig. 5 is a detailed view of the exercise window 280 and the palette 380;

Fig. 6 is a detailed illustration of the tuner window 340;

Fig. 7 is a detailed illustration of the spectrum window 310;

Fig. 8 is a detailed illustration of the waveform window 320;

Fig. 9 is a detailed illustration of the results window 330;

Fig. 10 is a detailed illustration of the tuning plot window 300;

Fig. 11 is a flow chart of the listening mode having synchronous processing;

Fig. 12 is a flow chart illustrating the tuning mode;

Fig. 13 is a flow chart of the process for the waveform window 320 of Fig. 8;

Fig. 14 is a flow chart of the spectrum display process for the spectrum window 310 of Fig. 7;

Fig. 15 is a flow chart showing operation of the test mode in conjunction with the processing mode described for the exercise window 280;

Fig. 16 is a flow chart detailing the sampling process described in the preceding figures;

Fig. 17 is a flow chart of the fast Fourier transform process identified in the preceding figures;

Fig. 18 is an operational diagram of a frequencizer for providing sampling and signal processing according to a preferred embodiment of the present invention;

Fig. 19 is an illustration of an exercise window 280 providing an exercise, using an alternate preferred embodiment of the present invention, for Twinkle Twinkle Little Star using animated characters to teach children; and

Fig. 20 illustrates a format for a freqDurationCmd command provided to the SoundManager.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention provides method and apparatus which assist a user in educating and calibrating their ear to correctly recognize pitches of generated notes. According to one aspect of the invention, a system selects an exercise from a curriculum and displays a series of notes for the user to generate. As the notes are generated, the system extracts a digital representation of a frequency for each of the generated notes. The system provides feedback for each note. The feedback indicates whether a pitch determined for the generated note was too sharp or too flat relative to a pitch for the note of the exercise corresponding to the note the user attempted to reproduce.

Fig. 1 is an ear training system 100 embodying a preferred embodiment of the present invention. The ear training system 100 includes a computer system 150 coupled to a digitizer 160 and an output circuit 162. The computer system 150 comprises a central processor 108, a system memory 107, an input device such as a keyboard 103, for example, a mouse 102, a fixed disk 104, a display 101, an external interface 105, a printer 109, and an input/output (I/O) controller 106. The computer system 150 optionally includes a floating point unit (FPU) 110 to provide enhanced features to embodiments implemented with computer systems 150 having central processing units 108 lacking an integral FPU, as will be explained further below. A system bus 115 couples all the components of the computer system 150, providing a link between all of them. A preferred embodiment uses an appropriately programmed Apple Macintosh® Plus or better (Apple Computers, Cupertino, California). It is to be understood that other platforms are available and may embody the invention in other forms. The invention is not limited to embodiments that include Apple computers. The digitizer 160 receives audio analog input and converts it into digitized audio. In the preferred embodiment, the digitizer 160 provides 8 bit sampling at 22.1 kHz, or better. The output circuit 162 includes a digital-to-analog converter to convert digitized sounds to analog sounds and a speaker to reproduce the analog sounds. The digitizer 160 and the output circuit 162 shown coupled to the external interface 105 are actually an integral part of the Apple Macintosh computer and provide the required audio sampling and playback for one preferred embodiment of the present invention.

Fig. 2 is an illustration showing relationships among selected modules and windows of the ear training system 100. The modules of the ear training system 100 include a launch module 200, a gestalt and self-configuration module 210, a group of session-management modules that include a scenarios module 220, an exercises module 230, and a session log module 240, and sound processing modules including a sampling module 250, a pitch determination module 260, and a sound generation module 270.

There are seven windows associated with the ear training system 100. They include an exercise window 280, a content window 290, a tuning window 300, a harmonic content window 310, a waveform window 320, a result window 330, and an electronic tuner window 340.

Window selection is made through selection of a desired window choice from an options menu integrated into each window. Other menus of the ear training system 100 permit selection of other features and customization of operation of the ear training system.

Another mechanism used to control operation of the ear training system 100 is use of a palette 380 providing a plurality of icons associated with various functions. When the ear training system 100 is implemented on a particular computer system 150 having a graphical user interface (GUI), icon selection from the palette 380, or menu choices from any of the menus is possible by use of the mouse 102, or other relative pointing device.

The launch module 200 directs initialization of the ear training system 100 followed by an execution of the gestalt and self-configuration module 210. The self-configuration module 210 automatically enters default values for ear training system 100 operation based upon a model of the computer system 150. This self-configuration optimizes operation to compensate for speed and central processor type differences between computer system platforms.

After launch and self-configuration, a particular exercise is displayed in the exercise window 280, waiting for the user. This initial exercise is selected by use of a scenario module 220, an exercise module 230 and a session log 240, if available. The exercise module 230 contains numbered sets of a series of notes. Each exercise is designed with some particular objective in mind. For example, an exercise may test an ability to play a chromatic scale, or test fingering for note generation. The exercises may be quite specific, depending upon a scope of a curriculum the ear training system 100 is designed to cover. The notes of the exercises are stored in chained ASCII format.

The scenarios module 220 includes at least one decision tree. The decision tree identifies a default progression of exercises designed to efficiently advance the user in pitch recognition and note generation. Typically, a user starts at exercise one and follows the pedagogic structured decision path after satisfactory completion of each particular exercise identified in the user's decision tree. The decision tree selected for a particular user is based in part upon the user's age, instrument type and measured performance for a set of evaluation exercises selected when the user first begins to use the ear training system 100. At a beginning of each session, the ear training system reviews the user's session log 240 to determine the user's demonstrated proficiency for any exercises completed in the user's decision branch. If performance proficiency is sufficient, the next pedagogic structured choice from the decision branch selects and loads a next particular exercise from the exercise module 230. If the demonstrated proficiency indicates that specialized work on specified aspects of the user's technique is necessary, the ear training system 100 will branch to adjustment scenarios in the scenario module 220 based upon evaluation of the information in the session log 240. The user has the option to manually select particular exercises. Particular manual choices are made from the content window 290 to select a particular aspect of their instrument to improve or from the result window 330. Thus, the session management modules select a particular exercise to load. The series of notes to be played and the mode are displayed in the exercise window 280. The ear training system 100 has two typical modes of operation, the practice mode and the test mode. In each of these modes, the user attempts to play prompted notes with a particular precision as measured within a predetermined step-interval. For each of the series of notes, the user sings or plays a note.
The sampling module 250 takes the analog sound and digitizes it to produce a set of digitized values. There are many prior art techniques available for digitizing sounds, many of which would be applicable for the sampling module 250. The user has the option to select the waveform window 320 to directly display the sampled and digitized sounds from the sampling module 250. After sampling the sound, the set of digitized values is processed by the pitch extraction module 260. The pitch extraction module 260 determines a frequency for the set of digitized sounds. In the preferred embodiment, fast Fourier transforms extract the frequency. Optionally, more powerful and accurate techniques are used, depending upon the configuration of the computer system 150, specifically whether the FPU 110 is present. The user may visualize pitch of generated notes in several ways in addition to feedback presented in the exercise window 280. The ear training system 100 presents extracted pitch information to the spectrum window 310, to the tuning plot window 300, or to the tuner window 340, depending upon a particular use for the system 100.

The pitch of the played note is compared with the pitch of the prompted note. If the played note is within the present step-interval selected for the user (a predefined margin), then no further action is required for the particular note. Should the note be flat or sharp, an appropriate indication in the exercise window 280 provides feedback to the user. Depending upon whether the ear training system 100 is operating in the practice mode or the test mode, the system will store results of an exercise into the session log 240. For test mode operation, the ear training system 100 collects information regarding all the notes of an exercise prior to adjudicating performance for each note. After completing the entire session, the system 100 provides feedback and grading to the user. The user has the option of having the exercise, or a reproduction of the user's performance, played by the system 100. The playback module 270 reproduces digitized sounds. In the practice mode, results are stored in memory, such as RAM. The storage duration is for one exercise, allowing the system to play back the performance to assist the user in evaluating the performance. The preferred embodiment does not record all this information to a session log to preserve processing and allow users to practice. It should be recognized that for some applications, storage of all exercises is desirable.
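The difference between the two modes can be illustrated with a minimal sketch in Python. It is not the code of the preferred embodiment. The function names, the event tuples, the use of a plain list as a session log, and the tolerance of 0.125 (expressed in decimal note numbers) are all assumptions introduced only for illustration.

```python
def classify(played, prompted, tolerance):
    """Return 'in tune', 'sharp' or 'flat' for one note (hypothetical helper)."""
    deviation = played - prompted
    if abs(deviation) <= tolerance:
        return "in tune"
    return "sharp" if deviation > 0 else "flat"


def run_exercise(prompted_notes, played_notes, mode, session_log, tolerance=0.125):
    """Simulate one exercise and return the ordered list of events.

    practice mode: each note is judged right after it is sampled (synchronous),
                   and nothing is written to the session log.
    test mode:     all notes are sampled first, then judged together
                   (asynchronous), and the verdicts are appended to the log.
    """
    pairs = list(zip(prompted_notes, played_notes))
    events = []
    if mode == "practice":
        for i, (target, actual) in enumerate(pairs):
            events.append(("sample", i))
            events.append(("feedback", i, classify(actual, target, tolerance)))
    elif mode == "test":
        for i in range(len(pairs)):
            events.append(("sample", i))
        verdicts = [classify(actual, target, tolerance) for target, actual in pairs]
        for i, verdict in enumerate(verdicts):
            events.append(("feedback", i, verdict))
        session_log.append(verdicts)  # only test-mode results are logged
    else:
        raise ValueError("mode must be 'practice' or 'test'")
    return events
```

In this sketch the verdicts are identical in both modes; only the ordering of sampling and feedback, and whether the session log is updated, differ.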

In the above description, a term referred to as a step interval is used to judge precision of generated notes. One popular system for transcribing musical notes is use of a harmonic scale in which each music octave is divided into 12 notes: C, C sharp, D, D sharp, E, F, F sharp, G, G sharp, A, A sharp, and B. The scale repeats itself for each octave. Advancing from one note to the next is a half-step interval; thus the notes are said to have a half-step between them. Each note of each octave has an associated frequency. In one notational system, notes are numbered consecutively, such that middle C has a note number of 60. Middle C sharp is numbered note 61. This system continues for each of the octaves. Not only does each note have a representative number, but each note also has a frequency associated with it. While the note numbering system is a linear system, the frequencies associated with the notes are exponential. Thus, the change of frequency from note 60 to note 61 is one-half the change of frequency from note 72 to note 73 (note 72 is C, one octave higher than middle C).

It is more convenient for the ear training system to operate with the linear system of note notation, thus the system converts frequencies to decimal note numbers. In the linear notation system, each step from note to note is divided appropriately. As a change from note 60 to note 61 is a half-step interval, when the system 100 judges user precision with one-eighth, one-twelfth or one-sixteenth step precision, the range is divided accordingly. That is, for one-sixteenth step intervals, ranges are established from 60 to 60.125; 60.125 to 60.25; 60.25 to 60.375; 60.375 to 60.5; 60.5 to 60.625; 60.625 to 60.75; 60.75 to 60.875; and 60.875 to 61.0. For the one-half step interval between note 60 and 61, eight divisions provide one-sixteenth step interval. A user must perform the prompted note within 0.125 on each side of the prompted note to be judged as being played properly.
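The arithmetic of the step-interval ranges can be checked with a short Python sketch. The function names are hypothetical, and treating a deviation exactly equal to the tolerance as acceptable is an assumption, since the text does not say how boundary values are handled.

```python
def tolerance_in_notes(step_fraction_denominator):
    """Width of a 1/denominator step expressed in decimal note numbers.

    A whole step spans two note numbers (two half-steps), so one-sixteenth
    of a step is 2 / 16 = 0.125 note numbers.
    """
    return 2.0 / step_fraction_denominator


def judge_note(played, prompted, tolerance):
    """Compare decimal note numbers and report the direction of any error."""
    deviation = played - prompted
    if abs(deviation) <= tolerance:  # assumption: the boundary counts as in tune
        return "in tune"
    return "sharp" if deviation > 0 else "flat"


# One-sixteenth step precision around middle C (note 60).
tol = tolerance_in_notes(16)
print(tol)                          # 0.125
print(judge_note(60.1, 60, tol))    # in tune
print(judge_note(60.3, 60, tol))    # sharp
print(judge_note(59.8, 60, tol))    # flat
```

Under the same reasoning, one-eighth step precision corresponds to 0.25 note numbers and one-twelfth step precision to about 0.167.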

The mechanism for converting frequency to a note number is performed with the following mathematical equation:

F = 8.18 exp(0.0578 * N)

where F is frequency in Hertz and N is a numerical, i.e. "decimal", value for a note. For example, for C4, N=60 and F = 261.625566 Hz. Table I identifies decimal values for eleven intervals of notes.

The TABLE I values identify, for each octave interval, a decimal note value for a corresponding chromatic scale note in the identified interval. The values of Table I correspond to MIDI values for the corresponding note.
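As a worked illustration, the conversion can be sketched in Python. The constants below are derived from the assumption that note 69 (A4) sounds at 440 Hz, which is the usual MIDI convention and reproduces the stated value of 261.625566 Hz for N=60; under that assumption the constants 8.18 and 0.0578 in the equation are rounded forms of the values computed here. The function names are hypothetical.

```python
import math

# Assumptions: MIDI-style numbering with A4 = note 69 = 440 Hz.
A4_NOTE = 69
A4_FREQ = 440.0

K = math.log(2) / 12                      # about 0.0578 per half-step
F0 = A4_FREQ * math.exp(-K * A4_NOTE)     # about 8.18 Hz, the frequency of note 0


def note_to_frequency(n):
    """F = F0 * exp(K * N), with F in Hertz and N a decimal note number."""
    return F0 * math.exp(K * n)


def frequency_to_note(f):
    """Inverse of note_to_frequency: N = ln(F / F0) / K."""
    return math.log(f / F0) / K


print(round(note_to_frequency(60), 4))    # 261.6256 (middle C)
print(round(frequency_to_note(440.0), 6)) # 69.0
```

Because the relation is exponential, the frequency gap between notes 72 and 73 is twice the gap between notes 60 and 61, while both gaps equal exactly one note number on the linear scale.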

A first step in obtaining the note number is to determine a frequency for a played note. There are many mechanisms and algorithms, as well known in the art, to extract a frequency from an audio signal. The preferred embodiments of the present invention use the digitizer 160 and the central processor 108 including the FPU 110 (if available) or a frequencizer (described below) to determine a frequency to establish a note number. Fast Fourier Transforms (FFT) performed on a set of sampled digitized sounds facilitate extraction of a note's frequency in the preferred embodiment.
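A minimal sketch of FFT-based frequency extraction follows, in pure Python. It is only an illustration of the general technique: it uses a textbook recursive radix-2 transform and picks the bin of maximum magnitude, which is an assumption and not necessarily the algorithm of the preferred embodiment. The function names are hypothetical, and the sample count must be a power of two.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(samples, sample_rate):
    """Return the center frequency (Hz) of the strongest non-DC bin."""
    n = len(samples)
    spectrum = fft(samples)
    magnitudes = [abs(c) for c in spectrum[1:n // 2]]  # skip DC, keep positive bins
    peak_bin = 1 + max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * sample_rate / n


# Usage: a synthetic 440 Hz tone sampled at 22.1 kHz, 2048 samples.
rate = 22100
tone = [math.sin(2 * math.pi * 440.0 * i / rate) for i in range(2048)]
estimate = dominant_frequency(tone, rate)
print(abs(estimate - 440.0) <= rate / 2048)  # True: within one bin width
```

The resolution of this simple peak search is one bin, that is, the sample rate divided by the number of samples (about 10.8 Hz here). Finer resolution requires more samples or interpolation between bins, which is one reason numerical load matters on slower processors.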

Fig. 3 is a flow chart of the self-configuration module 210 for the ear training system 100. The flow chart of Fig. 3 illustrates a relationship between a main process flow of the system 100 and resources, user interface elements, auxiliary flow and function. For example, for the Macintosh system, the self-configuration module 210 checks, at step 211, the system version number of the operating system to be sure that the version is at least version 6.0.7. This check, specific to the Apple computers, ensures that SoundManager 2.0 is available on the system. SoundManager is a part of the Macintosh® toolbox which handles sound sampling and synthesizing. The SoundManager is further described in Apple® Inside Macintosh, volume VI, hereby expressly incorporated by reference for all purposes. Users are required to upgrade if their system version is less than version 6.0.7. As illustrated, a dialog box 211A appears directing the user to upgrade the operating system to version 6.0.7 or later if the installed version is earlier than 6.0.7. Other checks may be necessary for other systems, such as checking whether a user employs Windows 3.1 with its multimedia extensions, for example.

Fig. 20 illustrates a format for a freqDurationCmd command provided to the SoundManager. The SoundManager responds to commands in this format to generate sounds. Sample code in Appendix A1 illustrates use of the freqDurationCmd command.

At step 212, the self-configuration module 210 checks for a presence of the digitizer 160. The SoundManager attributes an identification number to the digitizer 160 at start-up. The identification number permits access to the digitizer 160 by the ear training system 100. If the ear training system 100 does not detect the presence of the digitizer 160, it notifies the user of a possible malfunction through a dialog box 212A. Next, at step 213, the self-configuration module 210 checks for the FPU 110 or an FPU integral to the central processor 108. Without the FPU 110, the ear training system 100 enters a PowerMiser ON mode. A dialog box 213A notifies the user that the PowerMiser mode is on. PowerMiser refers to a reduced features mode available for the ear training system 100 which compensates for an absence of the FPU 110 by reducing numerical processing so that note processing time is acceptable.

In PowerMiser on mode, the preferred embodiment, for example, provides for smaller sample sizes, disables octave determinations (permitting determinations of a note of the chromatic scale only), and reduces range of the ear training system 100. For computer systems 150 with central processors 108 operating at a relatively slow clock rate, for example less than or equal to 16 MHz, or missing the FPU 110, extensive numerical calculations degrade an ability of the computer system 150 to provide real-time feedback to the user (in synchronous mode). For these computer systems 150, the ear training system 100 will determine a note by the played note's frequency of maximum energy. The system 100 does not determine an octave for the note. If the PowerMiser mode is off, such as when the FPU 110 is present, the system 100 establishes both the note and its octave. A full power mode is possible for computer systems 150 having the slow central processor 108 lacking the FPU 110 through addition of a custom hardware device referred to as a frequencizer. The frequencizer includes the digitizer 160, a ROM-based frequency-extraction algorithm, and a digital signal processor (DSP) to perform the frequency extraction (e.g., fast Fourier transforms and autocorrelation functions). The frequencizer sends a signal representative of a frequency of a sampled sound set to the computer system 150. The frequencizer, to maintain an ability for asynchronous processing, includes an ability to send sound samples to the computer system 150, just as the digitizer 160 operates. Alternate embodiments of systems employing a frequencizer are able to function as music notation systems.
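The practical effect of the two modes on the reported note can be illustrated with a short Python sketch. The function name, the use of "#" for sharps, and the octave numbering (note 60 taken as C4, following the example given for the frequency equation) are assumptions for illustration only; the sketch does not reproduce how the system actually estimates the note in either mode.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def describe_note(note_number, power_miser):
    """Name the chromatic note nearest to a decimal note number.

    With power_miser=True only the note of the chromatic scale is reported.
    With power_miser=False the octave is reported as well.
    """
    nearest = round(note_number)
    name = NOTE_NAMES[nearest % 12]
    if power_miser:
        return name
    octave = nearest // 12 - 1  # assumption: note 60 is C4
    return f"{name}{octave}"


print(describe_note(60.1, power_miser=True))   # C
print(describe_note(60.1, power_miser=False))  # C4
print(describe_note(72.9, power_miser=False))  # C#5
```

Mapping each decimal note number to its nearest named note in this way is also the basic operation a notation system would need when transcribing user-generated notes.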

At step 214, a color depth check loads attributes used for the monitor 101. Identification of screen depth by the ear training system 100 selects between a monochrome display (1 bit depth), or color (4 or 8 bits depth). These resources include various graphic objects including but not limited to icons, notes, musical symbols, pictures, et cetera. Step 211 through step 214 define a gestalt function for the ear training system 100 to access and optimize the ear training system 100 for operation with the computer system 150.
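Steps 211 through 214 can be summarized as a sequence of checks, sketched here in Python. The inputs are passed in as plain values because the actual Macintosh toolbox calls are not reproduced; the class name, function name, field names, and message strings are hypothetical.

```python
from dataclasses import dataclass

MIN_SYSTEM_VERSION = (6, 0, 7)  # minimum version stated for step 211


@dataclass
class Configuration:
    power_miser: bool   # reduced-features mode when no FPU is found
    color: bool         # False for a 1-bit (monochrome) display
    messages: list      # text for the dialog boxes that would be shown


def self_configure(system_version, has_digitizer, has_fpu, screen_depth_bits):
    """Run the gestalt checks in order and collect the resulting settings."""
    messages = []
    if system_version < MIN_SYSTEM_VERSION:          # step 211
        messages.append("upgrade the operating system to 6.0.7 or later")
    if not has_digitizer:                            # step 212
        messages.append("digitizer not detected")
    power_miser = not has_fpu                        # step 213
    if power_miser:
        messages.append("PowerMiser mode is on")
    color = screen_depth_bits > 1                    # step 214
    return Configuration(power_miser, color, messages)


config = self_configure((6, 0, 5), has_digitizer=True, has_fpu=False, screen_depth_bits=1)
print(config.power_miser, config.color)  # True False
print(config.messages)
```

Comparing versions as tuples keeps the ordering correct for multi-part version numbers, which a plain string comparison would not guarantee.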

At step 215, the correct resources for color depth and other configuration parameters are collected. These other configuration parameters include screen pixel resolution, RAM size, Macintosh® type, digitizer 160 type, and the port used for the digitizer 160, if applicable. For first time operation of the ear training system 100, a dialog box 215A appears requesting user information. Auxiliary flow performs a registration step at 215B which saves a serial number, location of system licensor, and various settings chosen by the user through checking a special window. After either step 215 or step 215B, the self-configuration module 210 saves user preference and system information to a file. Thereafter, the ear training system 100 initializes itself at step 216. Step 216 presents an about window which loads information from the user file and any QuickTime™ animation files if QuickTime™ is present. QuickTime™ is an animation standard developed for use with Apple Macintosh computers.

The about window step 216 presents a dialog box 216A which identifies the product, the manufacturer, configuration and user information, in addition to other information. During the initialization process or at any time upon selection of a menu item, a preferences window is displayed to permit the user to make special choices regarding some attributes of the interface, such as, for example, colors of various objects, musical nomenclature style (e.g., ABC, mobile Do, fixed Do, or fingering when applicable), a volume of sound output, enabling or disabling a voice prompt incorporated within the ear training system 100, adjusting a background rejection level, and switching PowerMiser mode. Changing any of these values results in a branch to the auxiliary step of 216C to set preferences. These new values are saved in the user preference file and the system 100 reinitializes at step 216.

Thereafter, at step 217, the ear training system 100, depending upon user preferences set in the preferences window at step 216, either automatically sets conditions or allows the user to self select an exercise. For automatic operation, the ear training system 100 checks the curriculum (from the scenarios module 220 of Fig. 2) to determine a decision branch applicable to the user based upon information supplied in the about window at step 216. This information includes age, instrument and skill level, for example. Next, the automatic mode reviews the session log to evaluate performance to date to properly place the user in the decision tree. Based upon the user's decision tree and past performances, the ear training system selects a suggested exercise for the user. The suggested exercise is loaded from the exercises module 230 (Fig. 2). The system 100 displays the notes and parameters of the conditions in the exercise window 280.

For self-selection, the ear training system 100 permits the user to identify one particular exercise from the exercises module (Fig. 2). The ear training system 100 at step 217 will use the identified exercise to set conditions for performance at step 217. The set conditions step 217 thereafter displays the particular exercise in the exercise window 280.

Fig. 4 is a flow chart 500 of an operation of the ear training system 100 for a typical session. The flow chart 500 of Fig. 4 illustrates a relationship of the main flow process with resources and user-interfaces. Session-management modules and sound processing modules participate in execution of a session. The session-management modules are responsible for selection of a particular exercise from a curriculum and presenting a series of notes to the user to prompt generation of the notes. The session-management modules initially select and load a particular exercise and display a series of notes in the exercise window 280. Associated with the exercise is a performance mode, such as practice or test mode. Selection of the particular exercise results in a display of the exercise window 280 which has a plurality of scores and superimposed notes. At start, step 502, the ear training system 100 displays an exercise window 280 with a loaded exercise. At 504, the system 100 enters a listening mode and prompts the user to get ready by use of a dialog box 504A. At step 506, a performance process synchronously prompts the user to play a note, at which time the system 100 provides feedback on the played note relative to the prompted note to the user. The system 100 optionally displays the feedback in many different forms; a typical way is directly in the exercise window. Upon completion of the performance at step 506, the system 100 evaluates the user's performance. Thereafter at step 508, the system 100 selects a new exercise. The session-management modules present the new exercise to the user in an exercise window 280 (Fig. 2). For the new exercise, which may be the same previous series of notes, the system 100 selects an appropriate mode. The practice mode provides the user with relatively large amounts of audio and visual support to satisfactorily perform the exercise.
The same series of notes, with less audio and visual feedback, can present an entirely different exercise to the student. A goal of the ear training system 100 is to train the user to recognize and perform to a specified precision without extraneous support from the system 100.

Thus, the new exercise selected at step 508 can return to step 502 if the exercise is to be a practice mode exercise. If the ear training system 100 is to test the user based upon the new exercise loaded at step 508, the program flow advances to step 510. Step 510 places the sound processing modules into a sample mode. At step 512, the user completely performs the exercise. Thereafter, at step 514, the performance is asynchronously processed to determine precision of the performance. The processing of step 514 includes saving the data in a session log and providing feedback regarding the user's precision. Depending upon the exercise, if the performance was not above a particular level, the system 100 expects the user to perform the exercise again. The user continues to perform steps 512 and 514 until the user achieves the exercise at a particular result level. After the processing step 514, the system 100 advances to step 516 to load information from the session log and the display resources. Information from the user log allows the user to monitor progress and proficiency. One way to provide the user with the information is through display of the results window 330 (Fig. 2).

Thereafter, at step 518, a global auditing of past performances provides a historical assessment of proficiency for different aspects of the ear training. For example, a statistical analysis of the user file evaluates all exercises by note. The analysis could indicate that notes F and G are consistently sharp while D is often flat and the other notes are, on average, performed adequately. An audition window 518A displays a graphic representation of the intonation abilities of the user, for the range of notes stored in the session log. The view in the audition window 518A represents actual patterns of recognition of pitches for each of the notes practiced in the curriculum. The graphic display of the audition window 518A allows the user, or others, to visually perceive strengths and weaknesses of the user's ear. The audition window 518A provides the user with an option to print the summary information displayed. Other analysis tools are possible, depending upon implementations of the ear training system 100. The system 100 saves global auditing information in the session log.
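By way of non-limiting illustration, the per-note audit described above might be sketched in C as follows. The type LogEntry, the function audit_by_note and the use of twelve pitch classes indexed from C are assumptions of this sketch and are not taken from the disclosed embodiment; notes are expressed as decimal notes in which 60 is middle C, as in Appendix A2.

```c
#include <assert.h>
#include <stddef.h>

/* One logged result: the prompted decimal note and the note the user produced.
   Hypothetical layout; the actual session log format is not disclosed. */
typedef struct {
    double prompted;   /* e.g. 65.0 for the F above middle C */
    double played;     /* e.g. 65.2 if the user was 0.2 step sharp */
} LogEntry;

/* Mean signed error, in steps, for each of the 12 pitch classes (0 = C).
   Positive means sharp, negative means flat. counts[] records how many
   entries contributed, so a caller can ignore classes with no data. */
void audit_by_note(const LogEntry *entries, size_t n,
                   double mean_error[12], int counts[12])
{
    double sums[12] = {0};
    for (int pc = 0; pc < 12; pc++) counts[pc] = 0;

    for (size_t i = 0; i < n; i++) {
        assert(entries[i].prompted >= 0.0);
        int pc = (int)(entries[i].prompted + 0.5) % 12;
        sums[pc] += entries[i].played - entries[i].prompted;
        counts[pc]++;
    }
    for (int pc = 0; pc < 12; pc++)
        mean_error[pc] = counts[pc] ? sums[pc] / counts[pc] : 0.0;
}
```

A display such as the audition window 518A could then plot mean_error for every pitch class whose count is nonzero.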

Fig. 5 is a detailed view of the exercise window 280 and the palette 380. The exercise window displays a series of notes 281 for the exercise in traditional musical notation on musical scores 282. The exercise window 280 includes a menu bar 283 on top listing different menu choices available to the user. On the right of the exercise window is a command bar 284 illustrating particular options set for the mode. For example, the command bar 284 shows the view mode, which controls the note and staff size (reduced in the example), the tempo (80 in the example), the routine type (loop on miss in the example), the level of visual or audio support provided, the sensitivity and the nomenclature chosen. Audio support includes playing an exercise or note prior to or subsequent to the user generating the note. A status bar 285 along the bottom of the exercise window indicates the instrument type, system mode, status and prompted note.

The palette 380 identifies various selectable features by association of the feature with an icon. The highlighted ear icon in the palette selects display of the exercise window which will prompt notes and wait for the user to play the note. The speaker icon will play an exercise, while the microphone icon samples and stores notes the user generates. The hammer icon initiates an evaluation of any sampled and stored notes resulting from selection of the microphone icon. The tuning fork icon selects the tuning window 340, the spectrum icon selects the spectrum window 310 and the waveform icon selects the waveform window 320.

In the practice mode, the system 100 prompts the user to play a particular note by highlighting the note, for example by displaying the note with inverse video or surrounding the note with a blinking rectangle 286. After the user generates the prompted note, the ear training system 100 determines a frequency for the played note and provides the user with feedback regarding performance on the note prior to prompting a new note. The feedback in the preferred embodiment results from superimposing a chevron 287 of one or two stripes either above or below each note. For example, for a note played slightly sharp, a single chevron, pointing down, is displayed above the note symbol. Similarly, for notes slightly flat, the system 100 places a single upwards directed chevron under the note. For notes which are much further off, such as playing above C sharp for a prompted C, an appropriately pointing double chevron juxtaposed near the note denotes to the user that the generated note differed significantly from the expected note. The exercise window 280 also includes another user notation symbol for a bad data sample. The cross symbol 288 signifies that a problem exists with the sample, and the problem prevents evaluation of the note. Possible causes of a bad data sample include the user generating two notes at one time if multi-frequency tones are not allowed (for example, for systems 100 lacking the frequencizer), or some other sound superimposed over the user generated note, such as a telephone signal. Other visual and auditory feedback mechanisms are possible to indicate performance precision to the user.
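By way of non-limiting illustration, the selection among the chevron and cross symbols might be sketched in C as follows. The enumeration Feedback, the function classify_note and both thresholds are assumptions of this sketch: tolerance stands for the precision step interval in use, and far_limit stands for the error beyond which the double chevron is shown (one full step in the example of a prompted C played above C sharp).

```c
#include <assert.h>

typedef enum {
    FB_CORRECT,      /* no chevron */
    FB_SHARP,        /* single chevron pointing down, above the note */
    FB_FLAT,         /* single chevron pointing up, below the note */
    FB_VERY_SHARP,   /* double chevron pointing down */
    FB_VERY_FLAT,    /* double chevron pointing up */
    FB_BAD_SAMPLE    /* cross symbol 288: the sample could not be evaluated */
} Feedback;

/* prompted and played are decimal notes; tolerance and far_limit are in steps. */
Feedback classify_note(double prompted, double played, int sample_ok,
                       double tolerance, double far_limit)
{
    assert(tolerance > 0.0 && far_limit >= tolerance);
    if (!sample_ok) return FB_BAD_SAMPLE;

    double err = played - prompted;          /* positive = sharp */
    if (err >  far_limit) return FB_VERY_SHARP;
    if (err < -far_limit) return FB_VERY_FLAT;
    if (err >  tolerance) return FB_SHARP;
    if (err < -tolerance) return FB_FLAT;
    return FB_CORRECT;
}
```

With a tolerance of one-eighth step and a far_limit of one step, a prompted note 60 played as 60.3 would be marked sharp, and played as 61.2 would be marked very sharp.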

In evaluating a user's performance for particular notes, the ear training system 100 uses information from the user regarding instrument type and performance ability. The ear training system uses this information to select an initial precision step. The preferred embodiment selects from one-eighth, one-twelfth and one-sixteenth step intervals, although other steps could be used. This interval step indicates how precise the user's reproduction of a note must be to warrant adjudication that the note was properly played. The ear training system dynamically monitors performances during test modes against this interval step. After a user performs satisfactorily such that a predetermined percentage, for example eighty percent in the preferred embodiment, of the reproduced notes are correct, the ear training system automatically selects a more precise step interval for use. Thereafter, the user's performance is gauged against the new step interval. As the user continues to improve and performance is judged correct for eighty percent of the notes, the ear training system uses progressively more precise step intervals for the user. Should the user ever demonstrate a proficiency for which fewer than a second predetermined percentage, for example forty percent, of notes are reproduced within the precision of the present step interval, the ear training system 100 adapts to thereafter use a less precise step interval. The ear training system is thus able to adjust itself to a user's ability level, providing a valuable tool for users of all ability levels.
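By way of non-limiting illustration, the adjustment of the step interval might be sketched in C as follows. The names adapt_level and tolerance_for are hypothetical, as is the assumption that the three intervals named above form an ordered ladder traversed one rung at a time; the eighty and forty percent thresholds are the example values given for the preferred embodiment.

```c
#include <assert.h>

/* Denominators of the step fractions named in the text: 1/8, 1/12, 1/16. */
static const int STEP_DENOMS[] = {8, 12, 16};
enum { NUM_LEVELS = 3 };

/* Returns the precision level to use after a test in which `correct`
   of `total` notes were within the current step interval. */
int adapt_level(int level, int correct, int total)
{
    assert(level >= 0 && level < NUM_LEVELS);
    assert(total > 0 && correct >= 0 && correct <= total);

    if (correct * 100 >= total * 80 && level < NUM_LEVELS - 1)
        return level + 1;                 /* more precise interval */
    if (correct * 100 < total * 40 && level > 0)
        return level - 1;                 /* less precise interval */
    return level;                         /* unchanged */
}

/* Step interval, as a fraction of one step, for a given level. */
double tolerance_for(int level)
{
    assert(level >= 0 && level < NUM_LEVELS);
    return 1.0 / STEP_DENOMS[level];
}
```

A user at the one-eighth level who plays 8 of 10 notes correctly would move to the one-twelfth level; a user at the one-twelfth level who plays only 3 of 10 correctly would return to one-eighth.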

User performances executed while in test mode, which is the asynchronous mode, are recorded into a session log, along with sampled sounds to permit the session to be replayed for the user. The test mode provides feedback, for each note, after the user has played all notes of the session. It is this mode which grades a user's performance and determines the user's proficiency level. Different statistical analyses and tests of the session log influence selection of a next exercise from the curriculum.

Fig. 6 is a detailed illustration of the tuner window 340. When optionally selected by the user, the user is able to synchronously determine proper pitch of a played note. The window updates, based upon the sampled sound, the present frequency, acting like an electronic tuner. The window includes an arrow 341 which moves relative to a scale 342. The arrow 341 indicates a percentage of step the note of the sampled sound is away from a proper note (i.e., tuned). The three notes displayed indicate the one step range (-50% to 50% corresponds to one step) represented on the window, and which frames the sampled note. This window provides quantitative information for feedback rather than indicating only whether a note was flat or sharp.
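By way of non-limiting illustration, the position of the arrow 341 might be computed from a decimal note as in the following C sketch. The function names are hypothetical, and the sketch assumes that the scale is centered on the nearest whole note.

```c
#include <assert.h>

/* Nearest whole note to a non-negative decimal note, e.g. 59.6 -> 60. */
int nearest_note(double decimal_note)
{
    assert(decimal_note >= 0.0);
    return (int)(decimal_note + 0.5);
}

/* Offset from the nearest note as a percentage of one step:
   negative values are flat, positive values are sharp, and the
   result runs from -50 up to (but not including) +50. */
double tuner_percent(double decimal_note)
{
    return (decimal_note - nearest_note(decimal_note)) * 100.0;
}
```

A sampled sound measured as decimal note 60.25 would place the arrow at +25%, that is, a quarter of a step sharp of note 60.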

Fig. 7 is a detailed illustration of the spectrum window 310. By selecting the spectrum window 310, a user is able to evaluate energy versus frequency of a sampled sound. It is sometimes useful to determine a distribution of energy between the different frequencies composing a note. This permits the quantitative analysis of a content of a note or tone according to its fundamental and harmonic frequencies. This analysis is useful to assess a quality of a particular instrument by careful review of notes it generates.

Fig. 8 is a detailed illustration of the waveform window 320. The waveform window 320 is another diagnostic tool providing specific information regarding a played note. The plot in the waveform window 320 is amplitude versus time. This is a traditional representation of a wave. By observing a regularity and homogeneity of the waveform, a user is able to judge a purity of a generated note or a pertinence of a sample.

Fig. 9 is a detailed illustration of the results window 330. The results window 330 is available to present summary information regarding the completed exercises. The results window 330 presents summary information in histogram format. The format graphically indicates, for each completed exercise, a last score and a best score. Statistical and other evaluations of trends and proficiency measurements are accessed from the results window. Some of these tests include mean, variance, standard deviation, tempo problems, range addressed, and patterns of mistakes such as specific notes, intervals or fingering.

Fig. 10 is a detailed illustration of the tuning plot window 300. This window permits a user to accurately tune their instrument. A desired frequency is used as a reference frequency. The tuning plot window 300 displays the closest note. When the user generates a note, the note's frequency, relative to the reference, is plotted against time. Additionally, pitch error indicators dynamically indicate the played note's relative pitch. That is, as shown, the note began too flat and became sharp. As time progressed, the note became decreasingly sharp, approaching the reference frequency. The proper tuning is graphically displayed in the tuning window 300, facilitating accurate reproduction of notes. Sharp and flat indicators graphically provide information regarding relative pitch compared to a reference note.

Fig. 11 is a flow chart of the practice mode having synchronous processing. As shown, the system prompts a note at step 510. A sample process 522 converts analog audio signals to digitized signals, through eight bit sampling at a sampling rate of 22.1 kHz or better. The sampling step 522 provides a time-out error if it does not detect audio input within a prespecified period. At step 524, the ear training system 100 performs fast Fourier transforms of the sampled and digitized data to extract its frequency. After extracting the frequency, a storage step 526 records the information to memory for later playback. Thereafter, a delay step 528 precedes displaying the result at step 530. The delay step allows adjustment of a time interval between consecutive samples to an assigned tempo, independently of a speed of processing of the sample. The processing speed will vary significantly for different hardware configurations of the ear training system 100, such as central processor 100 clock speed or type, or a presence of an FPU 110 or a frequencizer. The display step 530 provides feedback regarding the played note relative to the prompted note. In the preferred embodiment, the feedback results directly in the exercise window 280. The highlighted icon of the palette 380 commands the system into the particular mode.
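By way of non-limiting illustration, the arithmetic behind the delay step 528 might be sketched in C as follows. The function names are hypothetical, and the sketch assumes one prompted note per beat of the assigned tempo, which the description does not specify.

```c
#include <assert.h>

/* Milliseconds between consecutive prompts at a given tempo,
   assuming one prompted note per beat (an assumption of this sketch). */
long beat_interval_ms(int tempo_bpm)
{
    assert(tempo_bpm > 0);
    return 60000L / tempo_bpm;
}

/* Time the delay step should still wait after sampling and processing
   have already consumed elapsed_ms. The result is never negative: on
   slow hardware the delay simply shrinks to zero. */
long remaining_delay_ms(int tempo_bpm, long elapsed_ms)
{
    assert(elapsed_ms >= 0);
    long remaining = beat_interval_ms(tempo_bpm) - elapsed_ms;
    return remaining > 0 ? remaining : 0;
}
```

At a tempo of 80, notes are 750 ms apart; a machine that needs 200 ms to sample and process waits a further 550 ms, while a machine that needs 900 ms does not wait at all.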

Fig. 12 is a flow chart illustrating the tuning mode. The tuning mode is available only on computer systems 150 having the FPU 110. In this mode, the system 100 continuously processes an input audio frequency and plots the result against time. Displaying the information as an electronic tuner is one preferred embodiment. The plotting window permits visualization of sustained accuracy and vibrato characteristics. A quantitative indication of flatness or sharpness of a note results from use of the tuner window. To better simulate a Vu-meter, the system does not plot frequency versus time, but rather completely redraws the arrow.

A tuning mode begins with a particular display window at step 550. The system 100 samples input audio at step 552. Fast Fourier transforms, step 554, extract a frequency for the sampled sound. The system 100 records pitch errors detected after extraction of the frequency at the step 554. At step 556, the system 100 plots frequency, either versus time or per acquisition. The flow returns to step 552 for the next sample. The process continues until the user terminates this mode.

Fig. 13 is a flow chart of the process for the waveform window 320 of Fig. 8. Selecting display of the waveform window 320 executes the steps 560 through 564. The system 100 will review a set of sampled notes. At step 560, the system 100 loads a note. At step 562, the system 100 opens the waveform window 320. Thereafter, at step 564, the system 100 plots the sample in the waveform window 320. The waveform of the processed sample is displayed until the user deselects this waveform view mode.

Fig. 14 is a flow chart of the spectrum display process for the spectrum window 310 of Fig. 7. Similar to the flow chart of Fig. 13, the process loads a note from a set of sampled notes at step 570. After performing a fast Fourier transform on the loaded note, step 572, the process opens the spectrum window 310 at step 574. The spectrum display process then plots, step 576, a power content of the sample versus frequency. Again, the spectrum of the sample processed is displayed until the user deselects the mode.

Fig. 15 is a flow chart showing operation of the test mode in conjunction with the processing mode described for the exercise window 280. The preferred embodiments contemplate providing a test mode after each exercise. The test mode is an asynchronous processing of the user's notes. The test mode provides a faster interactivity allowing work at faster tempos. This faster interactivity is possible without regard to the computer system 150 because the system 100 samples and stores data and waits until completion of the exercise to process the stored samples. Processing the data will be dependent upon performance capabilities of the computer system 150, but the processing does not impact the execution of the exercise.

In the test mode, the system 100 prompts a note, step 580, samples the sound, step 582, and stores the samples at 584. These steps 580 through 584 continue for each note of the exercise. Thereafter, for steps 586 through 590, the stored sampled notes are processed to extract frequency and compare to expected values. Step 586 loads a note, step 588 extracts its frequency, and step 590 displays the extracted frequency relative to a value for the corresponding prompted note. The processing continues for all the stored notes.
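By way of non-limiting illustration, the separation between the storing steps 580 through 584 and the deferred processing steps 586 through 590 might be sketched in C as follows. All names, the fixed capacity MAX_NOTES and the function-pointer interface are assumptions of this sketch; stub_extract is a placeholder used only to exercise the code and does not perform frequency extraction.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_NOTES = 64 };   /* arbitrary capacity for this sketch */

typedef struct {
    const unsigned char *samples;  /* recorded audio for one prompted note */
    size_t length;
    double prompted;               /* decimal note the user was asked for */
} StoredNote;

typedef struct {
    StoredNote notes[MAX_NOTES];
    size_t count;
} Recording;

/* Phase 1 (steps 580-584): store only; no analysis while the user plays.
   Returns 1 on success, 0 if the recording is full. */
int store_note(Recording *rec, const unsigned char *samples, size_t length,
               double prompted)
{
    assert(rec != NULL && samples != NULL);
    if (rec->count >= MAX_NOTES) return 0;
    rec->notes[rec->count].samples = samples;
    rec->notes[rec->count].length = length;
    rec->notes[rec->count].prompted = prompted;
    rec->count++;
    return 1;
}

/* Hypothetical extractor interface: returns a decimal note for a buffer. */
typedef double (*Extractor)(const unsigned char *samples, size_t length);

/* Phase 2 (steps 586-590): process every stored note after the exercise,
   writing played minus prompted (in steps) into errors[]. */
void process_recording(const Recording *rec, Extractor extract, double *errors)
{
    assert(rec != NULL && extract != NULL && errors != NULL);
    for (size_t i = 0; i < rec->count; i++) {
        const StoredNote *n = &rec->notes[i];
        errors[i] = extract(n->samples, n->length) - n->prompted;
    }
}

/* Placeholder extractor: treats the first byte as the note number. */
double stub_extract(const unsigned char *samples, size_t length)
{
    return length > 0 ? (double)samples[0] : 0.0;
}
```

Because store_note does no signal processing, the time it takes does not depend on how fast the host can compute a transform, which is the property the test mode relies on.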

Fig. 16 is a flow chart detailing the sampling process described in the preceding figures. The sampling process begins at start, step 600. At step 602, the system 100 initializes the sound manager. A timeout check at step 604 prevents the system 100 from waiting indefinitely for user input. If timeout is true, the sample process asserts a timeout error, step 606, and ends.

If there is not a timeout at step 604, the process advances to step 608 to get a set of samples. Step 610 generates a value based on the deviation of the last one-eighth of the samples. Step 612 checks the value to ensure that the value is greater than a minimum value. If it is not, the sampling process returns to step 604 to test for timeout. If the value tested at step 612 is greater than the threshold, the process advances to step 614 to delay, then to step 616 to obtain more samples. At step 618, an amplitude value is derived from a deviation of a first one-eighth of the samples. The amplitude value must exceed a minimum at step 620 to proceed. If the amplitude is not sufficiently large, the process returns to step 604 to test for a timeout error.
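By way of non-limiting illustration, an amplitude check on one-eighth of a sample set might be sketched in C as follows. The description does not define how the deviation is computed; this sketch assumes a mean absolute deviation about the mean of the examined samples, and all names and the threshold parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mean absolute deviation of `count` 8-bit samples starting at `start`.
   The choice of this particular deviation measure is an assumption. */
double amplitude(const unsigned char *samples, size_t start, size_t count)
{
    assert(samples != NULL && count > 0);
    double mean = 0.0, dev = 0.0;
    for (size_t i = 0; i < count; i++) mean += samples[start + i];
    mean /= (double)count;
    for (size_t i = 0; i < count; i++) {
        double d = samples[start + i] - mean;
        dev += d < 0 ? -d : d;
    }
    return dev / (double)count;
}

/* Nonzero if the last one-eighth of the buffer exceeds the minimum
   (compare steps 610 and 612). */
int tail_is_loud(const unsigned char *samples, size_t n, double minimum)
{
    assert(n >= 8);
    size_t eighth = n / 8;
    return amplitude(samples, n - eighth, eighth) > minimum;
}

/* Nonzero if the first one-eighth of the buffer exceeds the minimum
   (compare steps 618 and 620). */
int head_is_loud(const unsigned char *samples, size_t n, double minimum)
{
    assert(n >= 8);
    return amplitude(samples, 0, n / 8) > minimum;
}
```

Checking only one-eighth of the buffer keeps each test cheap, so the loop can return quickly to the timeout check while the user has not yet started to play.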

Passing the amplitude test at step 620 results in calculation of a new amplitude value derived from a deviation of a last one-eighth of the samples of a most recent sample set, step 622. Again, step 624 tests the amplitude and if it is too low, the process branches to step 604 to test for timeout. Passing the amplitude test at step 624 results in a calculation of a new amplitude value based upon a deviation of all samples, step 626. If the amplitude is sufficiently large, step 628, the process returns the samples, step 630, and ends, step 632. If the amplitude at step 628 is not above a threshold, then the process branches to step 604 to test for a timeout error. Thus the sampling process performs amplitude qualification on samples from the digitizer 160.

Fig. 17 is a flowchart of the fast Fourier transform process identified in the figures above. The FFT process begins at 700, start. Step 702 computes a mean of the samples provided from a sampling process, for example. Step 704 subtracts the mean from all the samples. Step 706 computes a Fourier transform of all samples. Step 708 obtains the discrete frequency of maximum energy. Step 710 interpolates a real frequency from all neighboring energies. Step 712 tests for a presence of the FPU 110 in the computer system 150. If there is an FPU 110, the process continues to step 714 to compute an autocorrelation of a frequency spectrum for the sample. Step 716 obtains a frequency corresponding to a first maximum of autocorrelation energy. Step 718 interpolates real frequency from neighboring energies. Next, step 720 chooses a best estimate of true frequency. Sample code in Appendix A2 illustrates a representative process for extraction of frequency of a sampled note.
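By way of non-limiting illustration, steps 702, 704, 708 and 710 might be sketched in C as follows; the transform of step 706 is omitted and the sketch operates on an already computed power spectrum. The description does not state how the interpolation of step 710 is performed, so the three-point parabolic fit used here is an assumption, as are all function names.

```c
#include <assert.h>
#include <stddef.h>

/* Steps 702-704: subtract the mean so the constant term does not dominate. */
void remove_mean(double *x, size_t n)
{
    assert(x != NULL && n > 0);
    double mean = 0.0;
    for (size_t i = 0; i < n; i++) mean += x[i];
    mean /= (double)n;
    for (size_t i = 0; i < n; i++) x[i] -= mean;
}

/* Step 708: index of the spectrum bin holding the most energy. */
size_t peak_bin(const double *power, size_t n)
{
    assert(power != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (power[i] > power[best]) best = i;
    return best;
}

/* Step 710, sketched as a parabola through the peak and its two
   neighbours: returns a fractional bin position near k. */
double interpolate_peak(const double *power, size_t n, size_t k)
{
    assert(power != NULL && k < n);
    if (k == 0 || k + 1 >= n) return (double)k;    /* no neighbour on one side */
    double a = power[k - 1], b = power[k], c = power[k + 1];
    double denom = a - 2.0 * b + c;
    if (denom == 0.0) return (double)k;            /* flat: nothing to refine */
    return (double)k + 0.5 * (a - c) / denom;
}

/* Convert a fractional bin of an n_fft-point transform to Hz. */
double bin_to_hz(double bin, double sample_rate, size_t n_fft)
{
    assert(n_fft > 0);
    return bin * sample_rate / (double)n_fft;
}
```

Interpolating between bins matters because the bin spacing of a 512-point transform at 22.1 kHz is about 43 Hz, which is far coarser than a fraction of a step at most pitches.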

If the test at step 712 indicates an absence of the FPU 110, the process skips steps 714 through 718 by branching to step 720. Step 722 converts the frequency into a decimal note by techniques well known in the art. Step 724 returns the decimal note and the frequency spectrum of the sample. The process is complete at step 726, end.

Fig. 18 is an operational diagram of a frequencizer for providing sampling and signal processing according to a preferred embodiment of the present invention. The frequencizer has two modes of operation, the frequencizer mode and the digitizer mode. In the digitizer mode, the frequencizer performs like the digitizer 160 of the computer system 150. It samples analog sound at eight bits and at 22.1 kHz or better. Digitizer operation is shown on the right hand side of Fig. 18. The digitizer performs ASCII conversion on the sampled values and chains the ASCII values to produce an output file. The output of the frequencizer is coupled to the computer serial port. This digitizer mode permits operation of the ear training system in asynchronous mode. This chained ASCII is a common convention for communication of values. For example, to send 60.25, five ASCII characters are sent. These characters represent "6", "0", ".", "2", and "5".
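By way of non-limiting illustration, the chained ASCII convention might be sketched in C as follows. The function names and the fixed choice of two decimal places are assumptions of this sketch; the description gives only the 60.25 example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write a value as chained ASCII with two decimals, e.g. "60.25".
   Returns the number of characters written, or -1 if buf is too small. */
int note_to_ascii(double note, char *buf, size_t size)
{
    assert(buf != NULL);
    int n = snprintf(buf, size, "%.2f", note);
    if (n < 0 || (size_t)n >= size) return -1;
    return (int)strlen(buf);
}

/* Receiving side: parse the characters back into a number. */
double ascii_to_note(const char *text)
{
    assert(text != NULL);
    return strtod(text, NULL);
}
```

Formatting with a fixed number of decimals keeps leading zeros in the fractional part, so that 60.05 is sent as "60.05" and cannot be confused with 60.5.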

A more powerful mode is the frequencizer mode which includes signal processing circuitry to perform autocorrelation and fast Fourier transforms to convert a spectrum of energy. The frequencizer converts the energy spectrum into a decimal note, followed by an ASCII conversion. The output of the frequencizer, coupled to a serial port of the computer system 150, consists of frequencies of sampled analog sounds. The ear training system 100 employing the frequencizer has an ability to quickly process synchronous exercises at relatively fast tempo rates and provide real-time feedback. The inclusion of a frequencizer, by virtue of increased processing speed, allows an embodiment of the ear training system 100 to process multiple frequencies, such as two notes being played at once. Further, the frequencizer allows coupling with a notation capacity such that the user can play notes, have the pitch extracted, and have the played notes graphically represented on the display.

In conclusion, the present invention provides an efficient mechanism to enhance musicality and to educate and calibrate a user's ear to recognize and produce precise pitches for a series of notes. The use of a structured pedagogic scenario that selects particular exercises to efficiently advance a user is an important aspect of a preferred embodiment of the present invention. The structured pedagogic exercises are modified by analyses performed on recorded data of past performances to correct aspects of performances below a desired proficiency. The ear training system customizes itself to existing software to enable peak performance on multiple classes of computer systems. A digitizer converts analog audio into digital samples. Processing of the digitized samples extracts a frequency of the samples and permits comparison to expected values. A frequencizer can improve performance for all classes of computer systems. Providing a practice mode and a test mode (synchronous and asynchronous) with varying levels of feedback enhances an ability of the ear training system to train a user in precise recognition and reproduction of pitch.

While the above describes particular preferred embodiments of the present invention, various alternatives, modifications or alterations are possible. For example, different applications of the invention have an ability to adjust a curriculum and notation to different user needs. For young children, for example, the curriculum will be made of animations related to a song or a specific exercise. Children would be prompted to sing notes by use of an animated character which would indicate the note to be played. Rather than using a symbol, such as the chevrons which could be cryptic to the child, the animated character provides the feedback to the child user. For example, in an exercise designed to target a particular pitch, a dog, running at a proper tempo, would jump through a hoop for correctly sung notes. For sharp notes, the dog would overjump the hoop, while flat notes would cause the dog to undershoot the hoop.

The graphic character is customizable to particular songs. For example, to teach children to sing Twinkle Twinkle Little Star, an animated star walking above a staff would prompt a user to sing particular notes. Similarly to the example with the dog, the animated star could leave a smiling star for properly produced notes, while producing frowning stars for sharp or flat notes. Fig. 19 is an illustration of an exercise window 280 providing an exercise for Twinkle Twinkle Little Star using animated characters. As shown, the animated character 800 advances above the staff 282, from note to note. Correct notes produce a smiling star 802, sharp notes produce frowning, downward looking stars 804, and flat notes produce frowning, upward looking stars 806. Bad samples have frowning stars with crossed-out eyes 808.

Therefore, the above description should not be taken as limiting the scope of the present invention. The scope of the invention is set forth in the appended claims.

APPENDIX A1

VAR
   mySndChan: SndChannelPtr;   {pointer to a SndChannel}
   mySndCmd: SndCommand;       {a sound command}
   myErr: OSErr;
   amplitude: Integer;
   frequency: Integer;
BEGIN
   amplitude := $FF;           {loudest possible amplitude}
   frequency := 60;
   mySndCmd.cmd := freqDurationCmd;
   mySndCmd.param1 := 2000;    {duration in half-milliseconds}
   mySndCmd.param2 := BitShift(amplitude, 24) + frequency;
   {play the sound}
   {assume that mySndChan points to a valid SndChannel}
   myErr := SndDoCommand(mySndChan, mySndCmd, false);
   IF myErr <> noErr THEN DoError(myErr);
END;

APPENDIX A2 source code of function extract (synchronous processing)

/*
XFCN name: Extract -- extract the frequency of sound --
Form: Extract(<SID number>,<note>,<tempo1>,<tempo2>)
Example: Extract(1,60,100,500)
Note: if note is 0, resampling step will be 2; otherwise it will be adjusted
for best precision in frequency extraction around the given Hypercard note

MPW compilation syntax
C -b extract.c
Link -o testXcmd -w -rt XFCN=201 -sg Extract -m ENTRYPOINT extract.c.o ∂
{Libraries}Runtime.o {CLibraries}StdCLib.o {Libraries}Interface.o ∂
{Libraries}HyperXLib.o {CLibraries}CSANELib.o
*/

#include <HyperXCmd.h>
#include <Types.h>
#include <String.h>
#include <strings.h>
#include <Memory.h>
#include <ToolUtils.h>
#include <OSUtils.h>
#include <time.h>
#include <FixMath.h>
#include <Math.h>
#include <SANE.h>
#include <SoundInput.h>
#include <GestaltEqu.h>

/* Function prototypes */
void Fail(XCmdPtr, short);
short GetParamCount(XCmdPtr, short, short);
Handle CopyStrToHand(char *);
extern void sndinfo(int n, float fl[], float *mean, float *amp);
extern float fp(short n, float *v, float *f, float *a, float tp, short which);
extern void four(short nn, float *x, short sens);
extern void realft(short n, float *z, short sens);
extern void autoflt(short n, float *z);

/* Definition of constants */
#define minParamCount (short) 4
#define maxParamCount (short) 4
#define ParamError (short) 1
#define MemoryError (short) 2
#define SIDError (short) 3
#define SampleError (short) 4
#define GestaltError (short) 6
#define TimeError (short) 7
#define DELAY 1024
#define NBMAX 512
#define kAsynch true

pascal void EntryPoint(XCmdPtr paramPtr)
{
/*
   Local declarations
*/
   OSType selector;
   OSErr err;
   long response,refnum,soundsize,duree,timeout,tempo,tempo1,tempo2;
   short l,count,i,j,k,step,nbs,note;
   float mean,amp,ff,tpas,num,theta;
   Fixed rate;
   extended xrat;
   Str255 myInputDevice,fname;
   Handle myDeviceIcon,msghndl;
   Boolean cfflag;
   SPB mySound;
   Ptr samples;
   char result[32];
   /* freq and buf are indexed from 1 to NBMAX below, hence NBMAX+1 elements */
   float freq[NBMAX+1],buf[NBMAX+1],fen[NBMAX+1],ac[NBMAX+1];
   clock_t debut,stemp;

/*
   Call parameters
*/
   if (GetParamCount(paramPtr,minParamCount,maxParamCount) == (short) -1) {
      Fail(paramPtr,ParamError);
      return;
   }
   strcpy((char *) fname,"\pStatus");
   cfflag=false;
   count=(short) HandleToNum(paramPtr,paramPtr->params[0]);
   note=(short) HandleToNum(paramPtr,paramPtr->params[1]);
   /* expected frequency in Hz of the given Hypercard note */
   if (note) theta=8.1758*exp(0.05776*note);
   tempo1 = HandleToNum(paramPtr,paramPtr->params[2]);
   tempo2 = HandleToNum(paramPtr,paramPtr->params[3]);

/*
   Sound attributes
*/
   selector='snd ';
   err=Gestalt(selector,&response);
   if (err) {
      Fail(paramPtr,GestaltError);
      return;
   } else {
      if (!(response & 0x0008)) {
         Fail(paramPtr,SIDError);
      } else {
         err=SPBGetIndexedDevice(count,myInputDevice,&myDeviceIcon);
         if (err) {
            Fail(paramPtr,SIDError);
         } else {
            err=SPBOpenDevice(myInputDevice,siWritePermission,&refnum);
            if (err) {
               Fail(paramPtr,SIDError);
            } else {

               err=SPBGetDeviceInfo(refnum,'srat',(Ptr) &rate);
               xrat=Fix2X(rate);
               /* resampling step: 2 by default, else tuned to the expected note */
               step=(note ? (int) (xrat/(16*theta)) : 2);
               if (step < 1) step = 1;
               tpas=step/xrat;
               nbs=NBMAX;
               soundsize=DELAY+step*nbs;
               samples=NewPtr(soundsize);
               if (samples == NULL) {
                  Fail(paramPtr,MemoryError);
                  return;
               }
               mySound.inRefNum=refnum;
               mySound.count=soundsize;
               mySound.milliseconds=0;
               mySound.bufferLength=soundsize;
               mySound.bufferPtr=samples;
               mySound.completionRoutine=NULL;
               mySound.interruptRoutine=NULL;
               mySound.userLong=0;
               mySound.error=noErr;
               mySound.unused1=0;
               debut=clock();
               timeout=30;
               msghndl=(Handle) CopyStrToHand("Sampling");
               SetFieldByName(paramPtr,cfflag,fname,msghndl);
               j=(nbs > 64 ? nbs/8 : 8);

               for (;;) {
                  duree=(clock()-debut)/CLOCKS_PER_SEC;
                  if (duree >= timeout) {
                     Fail(paramPtr,TimeError);
                     break;
                  }
                  err=SPBRecord(&mySound,!kAsynch);
                  if (!err && mySound.count == soundsize) {
                     /* amplitude of the last one-eighth of the samples;
                        the -1 keeps the first read inside the buffer */
                     for (i=0; i<j; i++) buf[j-i]=(float)((unsigned char)
                        *(samples+soundsize-1-step*i))-127.;
                     sndinfo(j,buf,&mean,&amp);
                     if (amp < 100) continue;
                     /* wait tempo1 milliseconds, then sample again */
                     stemp=clock();
                     do {
                        tempo=(1000*(clock()-stemp))/CLOCKS_PER_SEC;
                     } while (tempo < tempo1);
                     err=SPBRecord(&mySound,!kAsynch);
                     if (err) continue;
                     /* amplitude of the first one-eighth of the samples */
                     for (i=0; i<j; i++) buf[i+1]=(float)((unsigned char)
                        *(samples+DELAY+step*i))-127.;
                     sndinfo(j,buf,&mean,&amp);
                     if (amp < 100.) continue;
                     /* amplitude of the last one-eighth of the samples */
                     for (i=0; i<j; i++) buf[j-i]=(float)((unsigned char)
                        *(samples+soundsize-1-step*i))-127.;
                     sndinfo(j,buf,&mean,&amp);
                     if (amp < 100.) continue;
                     /* amplitude of all samples */
                     for (i=0; i<nbs; i++) buf[i+1]=(float)((unsigned char)
                        *(samples+DELAY+step*i))-127.;
                     sndinfo(nbs,buf,&mean,&amp);
                     if (amp < 100.) continue;

msghndl=(Handle) CopyStrToHand("Processing");

SetFieldByName(paramPtr,cfflag,fname,msghndl);

for (i=1; i<=nbs; i++) freq[i]=buf[i]-mean;

ff=fp(nbs,freq,fen,ac,tpas,1);

/* decimal note number; 17.312342 is approximately 12/ln(2) */

num= 17.312342*log(0.1223122*ff);

i=(int) num;

k=(int) (100*(num-i));

numtostring(i,result);

l=strlen(result);

strcpy(result+l,".");

numtostring(k,result+l+1);

l=strlen(result);

strcpy(result+l," \n");

paramPtr->returnValue = (Handle) CopyStrToHand(result);

}else{

Fail(paramPtr,SampleError);

}

break;

}

DisposPtr(samples);

err=SPBCloseDevice(refnum);

/* wait tempo2 milliseconds before returning */

stemp=clock();

do {

tempo=(1000*(clock()-stemp))/CLOCKS_PER_SEC;

} while (tempo < tempo2);

}

}

}

}

return;

}

/* Functions */

void Fail(paramPtr, errorNumber)

XCmdPtr paramPtr;

short errorNumber;

{

switch (errorNumber) {

case ParamError:

paramPtr->returnValue = (Handle) CopyStrToHand("Error - Usage: Extract(1,60,100,500)");

break;

case MemoryError:

paramPtr->returnValue = (Handle) CopyStrToHand("Error - Memory error");

break;

case SIDError:

paramPtr->returnValue = (Handle) CopyStrToHand("Error - SID error");

break;

case SampleError:

paramPtr->returnValue = (Handle) CopyStrToHand("Error - Sample error");

break;

case GestaltError:

paramPtr->returnValue = (Handle) CopyStrToHand("Error - Gestalt error");

break;

case TimeError:

paramPtr->returnValue = (Handle) CopyStrToHand("Error - Timeout error");

break;

default:

paramPtr->returnValue = (Handle) CopyStrToHand("Error - Unknown error");

break;

}

return;

}

short GetParamCount(paramPtr, min, max)

XCmdPtr paramPtr;

short min;

short max;

{

short count;

count=paramPtr->paramCount;

if (count > max || count < min) return((short) -1);

return(count);

}

Handle CopyStrToHand(str)

char *str;

{

Handle newHndl;

newHndl=(Handle) NewHandle((long) strlen(str)+1);

strcpy((char *) (*newHndl),str);

return(newHndl);

}

long HandleToNum(paramPtr,hndl)

XCmdPtr paramPtr;

Handle hndl;

{

char str[32];

long num;

strcpy(str,*hndl);

num=StrToLong(paramPtr,c2pstr(str));

return(num);

}
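The sampling portion of the listing above derives a decimation step from the device sample rate and the target frequency, so that roughly 16 retained samples span one period of the prompted note. The following is a minimal sketch of that computation under stated assumptions: the structure `SamplingPlan` and the function `plan_sampling` are illustrative names, and the values of NBMAX and DELAY are not shown in this part of the listing, so they are passed in as parameters.

```c
/* Illustrative sketch; SamplingPlan and plan_sampling are assumed names. */
typedef struct {
    int step;        /* keep one raw sample out of every `step` */
    double tpas;     /* seconds between retained samples */
    long soundsize;  /* number of bytes to record */
} SamplingPlan;

/* rate:  device sample rate in Hz
   theta: target frequency in Hz, or 0 when no target note is given
   nbs:   number of retained samples (NBMAX in the listing)
   delay: leading bytes to skip (DELAY in the listing) */
SamplingPlan plan_sampling(double rate, double theta, int nbs, int delay)
{
    SamplingPlan p;

    /* About 16 retained samples per period; fall back to 2 with no target. */
    p.step = (theta > 0.0) ? (int) (rate / (16.0 * theta)) : 2;
    if (p.step < 1) p.step = 1;

    p.tpas = p.step / rate;
    p.soundsize = (long) delay + (long) p.step * nbs;
    return p;
}
```

For example, with an assumed rate of 22050 Hz, a 440 Hz target, 256 retained samples and a delay of 100 bytes, the step is 3 and 868 bytes are recorded. Lower target notes give a larger step and therefore a lower effective sampling rate.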

Claims

WHAT IS CLAIMED IS:

1. An ear training method, comprising the steps of:

prompting a user to generate a particular note;

sampling digitally sounds generated by said user to collect a set of digitized sampled sounds;

processing synchronously said set of digitized sampled sounds to determine a frequency and a sampled note corresponding to said set of digitized sampled sounds;

comparing said sampled note to said particular note; and

providing said user with a visual indication when said frequency of said sampled note differs from a frequency of said particular note by a predefined margin.
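The comparing and indicating steps of claim 1 can be sketched as follows. This is an illustration only, assuming that both notes are expressed as decimal note numbers (whole semitones plus a fraction) and that the margin is given in semitones; the function name `classify_pitch` is hypothetical.

```c
/* Illustrative sketch; classify_pitch is an assumed name.
   Returns -1 when the sampled note is flat by more than the margin,
   +1 when it is sharp by more than the margin, and 0 otherwise. */
int classify_pitch(double sampled_note, double target_note, double margin)
{
    double diff = sampled_note - target_note;

    if (diff > margin) return 1;    /* too high: indicate to the user */
    if (diff < -margin) return -1;  /* too low: indicate to the user */
    return 0;                       /* within the predefined margin */
}
```

A non-zero result would drive the visual indication; a zero result means no indication is needed.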

2. The method of claim 1 wherein said processing step includes the steps of:

computing a Fourier transform for said set of digitized sampled sounds;

extracting a discrete frequency of maximum energy; and

interpolating a real frequency for said set of digitized sampled sounds from said discrete frequency and a plurality of energies of frequencies adjacent to said discrete frequency.
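The extracting and interpolating steps of claim 2 can be illustrated with a three-point parabolic fit around the bin of maximum energy. This is one common interpolation choice and is an assumption here, since the claim does not fix how many adjacent energies are used. The energies are assumed to have been computed already by the Fourier transform step, and the names `interpolate_peak` and `bin_to_hz` are hypothetical.

```c
/* Illustrative sketch; function names are assumptions.
   energy: spectral energies for bins 0..n-1, with n >= 3.
   Returns the interpolated peak position in (fractional) bins. */
double interpolate_peak(const double *energy, int n)
{
    int k, peak = 1;
    double a, b, c, denom;

    /* Discrete frequency of maximum energy, excluding the two end bins
       so that both neighbours exist. */
    for (k = 1; k < n - 1; k++)
        if (energy[k] > energy[peak]) peak = k;

    /* Fit a parabola through the peak and its two neighbours. */
    a = energy[peak - 1];
    b = energy[peak];
    c = energy[peak + 1];
    denom = a - 2.0 * b + c;
    if (denom == 0.0) return (double) peak;  /* flat top: no refinement */
    return peak + 0.5 * (a - c) / denom;
}

/* Convert a bin position to Hz for an n_fft-point transform. */
double bin_to_hz(double bin, double sample_rate, int n_fft)
{
    return bin * sample_rate / n_fft;
}
```

The fractional bin returned by `interpolate_peak` is converted to a real frequency by `bin_to_hz`, which gives a finer estimate than the bin spacing alone allows.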

3. The method of claim 1 wherein said prompting step further comprises the steps of:

analyzing past performances to select a particular one exercise from a plurality of predefined exercises, with each said exercise including a plurality of notes; and

periodically selecting for prompting each of said plurality of notes.

4. A method for improving an ability of a user to determine a pitch of a note, comprising the steps of:

entering a set of parameters for the user;

storing a knowledge base including a plurality of structured pedagogic exercises which develop particular abilities, each exercise of said plurality of structured pedagogic exercises including a plurality of notes;

testing said set of parameters and an analysis of past performances against said knowledge base to select a particular one of said plurality of structured pedagogic exercises;

prompting generation of each note of said plurality of notes of said particular one of said plurality of structured pedagogic exercises;

sampling digitally generated sounds corresponding to each prompted note of said plurality of notes to collect a plurality of sets of digitized sampled sounds;

processing each said set of digitized sampled sounds to determine a frequency and a sampled note corresponding to each said set of digitized sampled sounds;

comparing each said sampled note to each said particular note; and

providing said user with an indication when any of said frequencies of said sampled notes differs by a predefined margin from a frequency of said corresponding particular note.

5. The pitch determining ability improving method of claim 4 wherein said digitized sampled sounds processing step is synchronous.

6. The pitch determining ability improving method of claim 4 wherein said digitized sampled sounds processing step is asynchronous.

7. The pitch determining ability improving method of claim 4 further comprising the step of modifying said predefined margin responsive to an analysis of instances of generated notes within said predefined margin compared to instances of generated notes exceeding said predefined margin.

8. The pitch determining ability improving method of claim 7 wherein said predefined margin is decreased when a percentage of properly generated notes exceeds a predetermined value.

9. The pitch determining ability improving method of claim 7 wherein said predefined margin is increased when a percentage of improperly generated notes exceeds a predetermined value.

10. An ear training system, comprising:

a frequencizer for producing a frequency of a sampled sound;

a computer having a memory, a display and a processor, said computer selectively coupled to said frequencizer;

input means, coupled to said computer, for entering a user-specific parameter;

a knowledge base, coupled to said memory, for storing a plurality of predefined pedagogic structured exercises, with each exercise including a plurality of notes, and a past performance history of exercises performed by a particular user; and

programming means, coupled to said memory, for instructing said processor to:

test said user-specific parameter against said knowledge base to select a particular one of said predefined pedagogic structured exercises;

display selected notes of said particular one exercise;

prompt said particular user to generate a sound corresponding to a particular one of said displayed notes;

access said frequencizer and receive a sampled frequency corresponding to said prompted particular one note;

compare said sampled frequency with a reference frequency of said prompted particular one note; and

indicate an improperly generated note to said particular user when a difference between said sampled frequency and said reference frequency exceeds a predefined margin of error.

11. The ear training apparatus of claim 10 wherein said programming means further instructs said processor to store in said memory a session log of said sampled frequency and instances of improperly generated notes.

12. An ear training system, comprising:

a digitizer for producing a set of digitized sampled sounds from a user-generated note;

a computer having a memory, a display and a processor, said computer selectively coupled to said digitizer;

input means, coupled to said computer, for entering a user-specific parameter;

access said digitizer and receive said set of sampled sounds generated by said user responsive to said prompted particular one note;

extract a frequency for said accessed set of sampled sounds to produce a sampled frequency;

13. A method for improving an ability of a user to determine a pitch of a note, comprising the steps of:

entering a set of parameters for the user;

storing a knowledge base including a plurality of structured pedagogic exercises which develop particular abilities, each exercise of said plurality of structured pedagogic exercises including a plurality of notes;

prompting generation of each note of said plurality of notes of said particular one of said plurality of structured pedagogic exercises by use of an animated character;

corresponding particular note by said animated character.

14. An ear training frequencizer, comprising:

a digitizer for producing a set of digitized sampled sounds;

a digital signal processor, coupled to said digitizer, for processing said set of digitized sampled sounds to extract a frequency for said set of digitized sampled sounds; and

switching means, coupled to said digitizer and to said digital signal processor and responsive to a signal from an ear training system, for providing said ear training system with one of said set of digitized sampled sounds and said extracted frequency.

15. An ear training system, comprising:

a frequencizer for producing a frequency of a sampled sound;

a computer having a memory, a display and a processor, said computer selectively coupled to said frequencizer; and

programming means, coupled to said memory, for instructing said processor to:

access said frequencizer and receive a frequency corresponding to a note generated by a user; and

present synchronously a symbol on said display indicating said frequency of said note.

16. An analog music notation system, comprising:

a frequencizer for producing a frequency of a sampled sound;

a computer having a memory, a display, and a processor, said computer selectively coupled to said frequencizer; and

programming means, coupled to said memory, for instructing said processor to:

present synchronously the nearest musical note on a musical score corresponding to said frequency of said analog sound on said display.

AMENDED CLAIMS

[received by the International Bureau on 16 August 1993 (16.08.93);

original claims 1-3 replaced by amended claims 1-4;

original claims 4,11 and 12 amended and renumbered 5,12 and 13 respectively; original claims 5-10 and 13-16 unchanged but renumbered 6-11 and 14-17 respectively;

(6 pages) ]

1. An ear training method, comprising the steps of:

prompting a user to generate a particular note;

processing synchronously said set of digitized sampled sounds to determine a frequency and a sampled note corresponding to said set of digitized sampled sounds;

comparing said sampled note to said particular note; and

providing said user with a visual indication when said frequency of said sampled note differs from a frequency of said particular note by a predefined variable margin.

2. The method of claim 1 wherein said processing step includes the steps of:

normalizing digital data by mathematic formula;

computing a Fourier transform or other mathematic formula for said set of normalized digitized sampled sounds;

extracting a discrete frequency of maximum energy;

interpolating a real frequency for said set of normalized digitized sampled sounds from said discrete frequency and a plurality of energies of frequencies adjacent to said discrete frequency;

computing an auto-correlation of frequency spectrum;

extracting a discrete frequency of maximum energy;

interpolating a real frequency for said set of normalized digitized sampled sounds from said discrete frequency and a plurality of energies of frequencies adjacent to said discrete frequency; and

choosing the best estimate of the true frequency.
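The auto-correlation step of amended claim 2 can be illustrated as follows. For a harmonic sound the partials lie at multiples of the fundamental, so the lag that maximises the auto-correlation of the magnitude spectrum approximates the bin of the fundamental. This is a sketch under stated assumptions, not the implementation used by the system: the function name `spectrum_autocorr_lag`, the lag search range, and the use of magnitudes rather than energies are all illustrative choices. It covers only the auto-correlation step, not the final choice between estimates.

```c
/* Illustrative sketch; spectrum_autocorr_lag is an assumed name.
   mag:     magnitude spectrum for bins 0..n-1
   min_lag: smallest lag considered (excludes the trivial lag 0)
   Returns the lag, in bins, with the largest auto-correlation,
   searched over min_lag..n/2. */
int spectrum_autocorr_lag(const double *mag, int n, int min_lag)
{
    int lag, k, best_lag = min_lag;
    double r, best = -1.0;

    for (lag = min_lag; lag <= n / 2; lag++) {
        r = 0.0;
        for (k = 0; k + lag < n; k++)
            r += mag[k] * mag[k + lag];   /* overlap of spectrum with itself */
        if (r > best) {
            best = r;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

For a spectrum with partials at bins 8, 16 and 24, the best lag is 8, which converts to a frequency in the same way as any other bin position.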

3. The method of claim 1 wherein said prompting step includes the steps of:

playing digitized sounds for said user to match, including a plurality of predefined musical nomenclature choices;

analyzing past performances to select a particular one exercise from a plurality of predefined exercises, with each said exercise including a plurality of notes; and

periodically selecting for prompting each of said plurality of notes.

4. The method of claim 1 wherein said sampling step includes the steps of:

delaying the acquisition of the sample by a predetermined, but variable, length of time; and

changing the sampling rate based on the target note of the sample.

5. A method for improving an ability of a user to determine a pitch of a note, comprising the steps of:

entering a set of parameters for the user;

testing said set of parameters and an analysis of past performances against said knowledge base to select a particular one of said plurality of structured pedagogic exercises;

prompting generation of each note of said plurality of notes of said particular one of said plurality of structured pedagogic exercises;

sampling digitally user generated sounds corresponding to each prompted note of said plurality of notes to collect a plurality of sets of digitized sampled sounds;

processing each set of digitized sampled sounds to determine a frequency and a sampled note corresponding to each said set of digitized sampled sounds;

comparing each said sampled note to each said particular note; and

providing said user with a qualitative or quantitative indication when any of said frequencies of said sampled notes differs by a predefined variable margin from said corresponding particular note.

6. The pitch determining ability improving method of claim 5 wherein said digitized sampled sounds processing step is synchronous.

7. The pitch determining ability improving method of claim 5 wherein said digitized sampled sounds processing step is asynchronous.

8. The pitch determining ability improving method of claim 5 further comprising the step of modifying said predefined margin responsive to an analysis of instances of generated notes within said predefined margin compared to instances of generated notes exceeding said predefined margin.

9. The pitch determining ability improving method of claim 8 wherein said predefined margin is decreased when a percentage of properly generated notes exceeds a predetermined value.

10. The pitch determining ability improving method of claim 8 wherein said predefined margin is increased when a percentage of improperly generated notes exceeds a predetermined value.
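The margin adjustment of claims 8 to 10 can be sketched as a single update rule. The sketch assumes a multiplicative adjustment and a single threshold shared by both directions; neither is specified in the claims, and the function name `adapt_margin` and its parameters are hypothetical.

```c
/* Illustrative sketch; adapt_margin and its parameters are assumed names.
   margin:        current tolerated error
   in_margin:     count of notes generated within the margin
   out_of_margin: count of notes generated outside the margin
   threshold:     fraction (0..1) that triggers an adjustment
   factor:        multiplicative change, greater than 1
   Returns the updated margin. */
double adapt_margin(double margin, int in_margin, int out_of_margin,
                    double threshold, double factor)
{
    int total = in_margin + out_of_margin;

    if (total == 0) return margin;               /* nothing to analyse */
    if ((double) in_margin / total > threshold)
        return margin / factor;                  /* tighten the margin */
    if ((double) out_of_margin / total > threshold)
        return margin * factor;                  /* relax the margin */
    return margin;                               /* leave unchanged */
}
```

For example, with a threshold of 0.8 and a factor of 2, a user who produces 9 of 10 notes within a margin of 0.5 would next be held to a margin of 0.25, while a user who misses 9 of 10 would be given a margin of 1.0.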

11. An ear training system, comprising:

a frequencizer for producing a frequency of a sampled sound;

a computer having memory, a display and a processor, said computer selectively coupled to said frequencizer;

input means, coupled to said computer, for entering user-specific parameter;

a knowledge base, coupled to said memory, for storing a plurality of predefined pedagogic structured exercises, with each exercise including a plurality of notes, and a past performance history of exercises performed by a particular user; and

programming means, coupled to said memory, for instructing said processor to:

test said user-specific parameter against said knowledge base to select a particular one of said predefined pedagogic structured exercises;

display selected notes of said particular one exercise;

prompt said particular user to generate a sound corresponding to a particular one of said displayed notes;

access said frequencizer and receive a sampled frequency corresponding to said prompted particular one note;

compare said sampled frequency with a reference frequency of said particular one note; and

12. The ear training apparatus of claim 11 wherein said programming means further instructs said processor to store in said memory a session log of said sampled frequency and user generated notes;

a means for graphically displaying a report of exercises performed by said user;

a means for calculating and graphically displaying a profile of said user generated notes plotted against targeted notes; and

a means for playing user generated notes and displaying their relationship to targeted notes.

13. An ear training system, comprising:

a computer having memory, a display and a processor, said computer selectively coupled to said digitizer;

input means, coupled to said computer, for entering user-specific parameter;

programming means, coupled to said memory, for instructing said processor to:

display selected notes of said particular one exercise;

prompt said particular user to generate a sound corresponding to a particular one of said displayed notes;

access said digitizer and receive said set of sampled sounds generated by said user responsive to said prompted particular one note;

extract a frequency for said accessed set of sampled sounds to produce a sampled frequency;

compare said sampled frequency with a reference frequency of said particular one note; and

14. A method for improving an ability of a user to determine a pitch of a note, comprising the steps of:

entering a set of parameters for the user;

testing said set of parameters and an analysis of past

sampling digitally user generated sounds corresponding to each prompted note of said plurality of notes to collect a plurality of sets of digitized sampled sounds;

comparing each said sampled note to each said particular note; and

providing said user with an indication when any of said frequencies of said sampled notes differs by a predefined margin from said corresponding particular note by said animated character.

15. An ear training frequencizer, comprising:

16. An ear training system, comprising:

a frequencizer for producing a frequency of a sampled sound;

a computer having memory, a display and a processor, said computer selectively coupled to said frequencizer; and

programming means, coupled to said memory, for instructing said processor to:

17. An analog music notation system, comprising:

a frequencizer for producing a frequency of a sampled sound;

a computer having memory, a display and a processor, said computer selectively coupled to said frequencizer; and

programming means, coupled to said memory, for instructing said processor to: