WO1993017408A1 - Method and apparatus for ear training - Google Patents

Info

Publication number
WO1993017408A1
Authority
WO
WIPO (PCT)
Prior art keywords
note
user
frequency
sampled
notes
Prior art date
Application number
PCT/US1993/001450
Other languages
French (fr)
Inventor
Bertrand Perroud
Agnès PERROUD
Hervé PERROUD
Original Assignee
Bertrand Perroud
Perroud Agnes
Perroud Herve
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bertrand Perroud, Perroud Agnes, Perroud Herve
Publication of WO1993017408A1

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B15/00 Teaching music
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/06 Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B5/065 Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems

Definitions

  • Pitch sensitivity of the ear is very important to any musician, as it provides better musicality in addition to sight-reading and sight-singing abilities.
  • Piano players need an outside source of ear training. A properly tuned piano produces a note of proper pitch whenever the proper key is struck, so producing that pitch does not challenge the musician's ear to discriminate.
  • Ear training systems can teach ear discrimination of notes at one-half step intervals.
  • Ear training systems relying on keyboard devices are limited in the degree of precision attainable for identification of notes.
  • Keyboard-type ear training systems cannot judge intonation, also referred to as perfection of pitch.
  • Ear training methods of the prior art include professor-student sessions in which the student attempts to reproduce a series of predefined notes by voice or instrument.
  • The professor attempts to provide feedback to the student during the performance.
  • The professor attempts to indicate to the student when the pitch of any particular played note is too sharp or too flat.
  • Although this method has been employed for years, it has disadvantages.
  • One disadvantage is that it is difficult to provide real-time feedback for each note. There is some delay between the playing of a note and the professor's comment on it.
  • A second disadvantage is that the professor, to communicate feedback, must rely on the ear of the student (which is the very object of the training).
  • Another disadvantage is that the comments may be incorrect, because the method is only as reliable as the professor's ability to discriminate proper relative pitch.
  • The prior-art method has still another disadvantage: students are unable to learn by themselves. Learning according to the prior-art method requires the assistance of another person, and that person must be skilled.
  • The invention provides a novel apparatus and method that allow a student, otherwise unassisted, to train their ear properly and efficiently. The student learns to recognize the proper pitch of notes at a significant precision that depends upon the instrument (for example, one-eighth, one-twelfth or one-sixteenth of a step). Additionally, the ear training system trains each user to recognize pitch well enough to produce notes themselves at such a precision, by singing or by playing an instrument such as a violin or a saxophone, for example.
  • The present invention provides an apparatus and a method for assisting students to improve their ability to recognize pitches and their understanding of solfege (music theory, intervals, sight reading and sight singing).
  • The invention permits musicians to calibrate and educate their ear to enhance development of harmony and musicality.
  • The system selects a particular exercise from a set of self-parameterizing pedagogic sessions.
  • The difficulty level of the curriculum adjusts automatically to the student. The system chooses a particular exercise, a routine for the exercise, and the level of audio and visual support it provides. This is one of several modes of the ear training system, called the practice mode, in which samples are processed synchronously: each note of the exercise is processed as soon as its sample is acquired.
  • An ultimate goal of the ear training system is to have the user perform all exercises one hundred percent in tune, without any support or interference from the system. The required sensitivity varies with the particular instrument. This is the test mode, in which samples are processed asynchronously, i.e., only when the exercise is complete and the user has attempted to generate each note.
  • Users of the ear training system may elect to practice any specific element of a curriculum in a mode of their choice.
  • Graphic representations of sound characteristics are available for refining the quality of the musician's sound. Users can also prompt the ear training system to scan their session log and analyze it statistically for patterns of mistakes. The session log contains the best and last performance of each exercise executed in the test mode.
  • An ear training system includes an electronic display, an analog-to-digital converter, memory, and a processor. The converter samples generated sounds and converts them into a digital format. The memory stores scenarios, exercises, and session logs. The processor:
    • presents specific exercises on the electronic display;
    • determines the pitch of notes actually played and compares them to expected notes;
    • provides feedback to the user regarding the pitch of each generated note;
    • records performance against predetermined standards;
    • provides the user with a historical evaluation of performance relative to those standards.
  • Upon launch, the system initializes itself by determining which hardware features are present, in order to optimize performance on the specific platform.
  • The processor displays an exercise window with a recommended choice and a palette from which the user selects specific options or modifies system
  • The user manually selects a particular exercise, or permits the system to select an exercise based on the curriculum and past performance.
  • The processor displays the particular exercise, typically a series of notes, and prompts the user to sing or play the exercise.
  • The analog-to-digital converter samples each note and extracts the note's pitch.
  • The processor compares each generated note against the corresponding note of the exercise. If the pitch of the generated note deviates by more than a predefined error margin, the system indicates whether it was too sharp or too flat. Depending on the exercise, the user is either prompted to keep trying until the proper pitch is generated or permitted to continue with the exercise.
  • When ready, the user performs the exercise without any interference from the system (test mode).
  • The notes are analyzed when the exercise is complete. The results are displayed on the score and recorded in the session log.
  • The log permits the user to review the performance of any particular exercise, not just numerically but also by playing back the session from the recorded user data.
  • The system employs an optional pedagogic structure that selects exercises for the user based upon data recorded in the session logs.
  • The system can facilitate tuning or evaluation of an instrument by plotting frequency against time (an input note's waveform), or by graphing energy against frequency (the input note's spectrum).
  • Systems employing a frequencizer are able to function as a music notation system.
  • User-generated notes are converted to graphic symbols for the notes whose frequencies are closest to those of the user-generated notes.
  • The user is able to play music and have the system record the generated notes in conventional music notation.
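The notation feature amounts to snapping each detected frequency to the nearest chromatic-scale note. Below is a minimal Python sketch under several assumptions: equal temperament, A4 = 440 Hz, and MIDI-style note numbers with middle C = 60. The octave labels follow the common convention that middle C is written "C4", which the source does not state. The function names are hypothetical.

```python
import math
from typing import Iterable, List

# Chromatic-scale note names, indexed by note number modulo 12.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note_name(freq_hz: float) -> str:
    """Name of the note closest to freq_hz, e.g. 'C4' for about 261.6 Hz.

    Assumes equal temperament, A4 = 440 Hz = note 69, and that note 60
    (middle C) is written C4.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Fractional note number, rounded to the nearest chromatic note.
    note = round(69 + 12 * math.log2(freq_hz / 440.0))
    octave = note // 12 - 1
    return f"{NOTE_NAMES[note % 12]}{octave}"


def transcribe(frequencies: Iterable[float]) -> List[str]:
    """Convert a sequence of detected frequencies into note names."""
    return [nearest_note_name(f) for f in frequencies]
```

For example, `transcribe([261.63, 440.0])` yields `['C4', 'A4']`. A slightly mistuned 450 Hz input also maps to `'A4'`, because it is still closer to A than to A sharp.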
  • The present invention offers many advantages over prior-art ear training systems and methods. These include a knowledge base of pedagogically structured exercises and the use of past performance results to advance a user efficiently.
  • The user is able to train the ear without another person's assistance.
  • The training system is usable with complex audio sources, such as a voice or a violin.
  • The ear training system is designed to operate in a hands-off mode, so users can play their instrument without also having to operate the ear training system manually.
  • Fig. 1 is an ear training system 100 embodying a preferred embodiment of the present invention;
  • Fig. 2 is an illustration showing relationships among selected modules and windows of the ear training system 100;
  • Fig. 3 is a flow chart of the self-configuration module 210 for the ear training system 100;
  • Fig. 4 is a flow chart 500 of the operation of the ear training system 100 for a typical session;
  • Fig. 5 is a detailed view of the exercise window 280 and the palette 380;
  • Fig. 6 is a detailed illustration of the tuner window 340;
  • Fig. 7 is a detailed illustration of the spectrum window 310;
  • Fig. 8 is a detailed illustration of the waveform window 320;
  • Fig. 9 is a detailed illustration of the results window 330;
  • Fig. 10 is a detailed illustration of the tuning plot window 300;
  • Fig. 11 is a flow chart of the listening mode having synchronous processing;
  • Fig. 12 is a flow chart illustrating the tuning mode;
  • Fig. 13 is a flow chart of the process for the waveform window 320 of Fig. 8;
  • Fig. 14 is a flow chart of the spectrum display process for the spectrum window 310 of Fig. 7;
  • Fig. 15 is a flow chart showing operation of the test mode in conjunction with the processing mode described for the exercise window 280;
  • Fig. 16 is a flow chart detailing the sampling process described in the preceding figures;
  • Fig. 17 is a flow chart of the fast Fourier transform process identified in the preceding figures;
  • Fig. 18 is an operational diagram of a frequencizer for providing sampling and signal processing according to a preferred embodiment of the present invention;
  • Fig. 19 is an illustration of an exercise window 280 providing an exercise, using an alternate preferred embodiment of the present invention, for
  • Fig. 20 illustrates a format for a freqDurationCmd command provided to the SoundManager.
  • The present invention provides a method and apparatus that help users educate and calibrate their ear to correctly recognize the pitches of generated notes.
  • A system selects an exercise from a curriculum and displays a series of notes for the user to generate. As the notes are generated, the system extracts a digital representation of the frequency of each generated note and provides feedback for each note. The feedback indicates whether the pitch of the generated note was too sharp or too flat relative to the pitch of the exercise note the user attempted to reproduce.
  • Fig. 1 is an ear training system 100 embodying a preferred embodiment of the present invention.
  • The ear training system 100 includes a computer system 150 coupled to a digitizer 160 and an output circuit 162.
  • The computer system 150 comprises a central processor 108, a system memory 107, input devices such as a keyboard 103 and a mouse 102, a fixed disk 104, a display 101, an external interface 105, a printer 109, and an input/output (I/O) controller 106.
  • The computer system 150 optionally includes a floating point unit (FPU) 110. It provides enhanced features when the central processor 108 lacks an integral FPU, as explained further below.
  • FPU: floating point unit
  • A system bus 115 links all the components of the computer system 150.
  • A preferred embodiment uses an appropriately programmed Apple Macintosh® Plus or better (Apple Computer, Cupertino, California). Other platforms are available and may embody the invention in other forms; the invention is not limited to embodiments that include Apple computers.
  • The digitizer 160 receives analog audio input and converts it into digitized audio. In the preferred embodiment, the digitizer 160 provides 8-bit sampling at 22.1 kHz, or better.
  • The output circuit 162 includes a digital-to-analog converter to convert digitized sounds to analog sounds and a speaker to reproduce the analog sounds.
  • The digitizer 160 and the output circuit 162 are shown coupled to the external interface 105. In one preferred embodiment, however, they are an integral part of the Apple Macintosh computer and provide the required audio sampling and playback.
  • Fig. 2 is an illustration showing relationships among selected modules and windows of the ear training system 100.
  • The modules of the ear training system 100 include:
    • a launch module 200;
    • a gestalt and self-configuration module 210;
    • session-management modules: a scenarios module 220, an exercises module 230, and a session log module 240;
    • sound processing modules: a sampling module 250, a pitch determination module 260, and a sound generation module 270.
  • There are seven windows associated with the ear training system 100: an exercise window 280, a content window 290, a tuning window 300, a harmonic content window 310, a waveform window 320, a result window 330, and an electronic tuner window 340.
  • A window is selected by choosing it from an options menu integrated into each window.
  • Other menus of the ear training system 100 permit selection of other features and customization of its operation.
  • A palette 380 provides a plurality of icons associated with various functions.
  • GUI: graphical user interface
  • The launch module 200 directs initialization of the ear training system.
  • The self-configuration module 210 automatically enters default values for operation of the ear training system 100 based upon the model of the computer system 150. This compensates for differences in speed and central processor type between computer platforms.
  • A particular exercise is displayed in the exercise window 280, waiting for the user.
  • This initial exercise is selected using the scenarios module 220, the exercises module 230 and the session log 240, if available.
  • The exercises module 230 contains numbered sets, each consisting of a series of notes.
  • Each exercise is designed with a particular objective in mind. For example, an exercise may test the ability to play a chromatic scale, or test fingering for note generation.
  • The exercises may be quite specific, depending upon the scope of the curriculum the ear training system 100 is designed to cover.
  • The notes of the exercises are stored in chained ASCII format.
  • The scenarios module 220 includes at least one decision tree.
  • The decision tree identifies a default progression of exercises designed to advance the user efficiently in pitch recognition and note generation. Typically, a user starts at exercise one. After satisfactorily completing each exercise in the decision tree, the user follows the pedagogically structured path to the next.
  • The decision tree selected for a particular user is based in part upon the user's age, instrument type and measured performance. That performance is measured on a set of evaluation exercises given when the user first begins to use the ear training system 100.
  • The ear training system reviews the user's session log 240 to determine the user's demonstrated proficiency for any exercises completed in the user's decision branch.
  • The next pedagogically structured choice in the decision branch selects and loads the next exercise from the exercises module 230. The demonstrated proficiency may indicate that specific aspects of the user's technique need specialized work. In that case, based on the session log 240, the ear training system 100 branches to adjustment scenarios in the scenarios module 220.
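The scenario progression can be pictured as a walk over a small decision tree keyed by exercise number. The Python sketch below is illustrative only. The `Branch` structure, the session-log score format, and the 0.8 pass threshold are all hypothetical assumptions, not details given in the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Branch:
    """One node of a hypothetical scenario decision tree."""
    exercise_id: int
    next_id: Optional[int]        # default progression after a pass
    adjustment_id: Optional[int]  # remedial (adjustment) exercise, if any


def next_exercise(tree: Dict[int, Branch],
                  session_log: Dict[int, float],
                  current_id: int,
                  pass_score: float = 0.8) -> int:
    """Choose the next exercise from the tree and the logged best scores.

    session_log maps exercise_id -> best score in [0, 1] (assumed format).
    """
    node = tree[current_id]
    score = session_log.get(current_id)
    if score is None:
        return current_id                 # not attempted yet: stay here
    if score >= pass_score:
        # Satisfactory: follow the default path (or stay at the end).
        return node.next_id if node.next_id is not None else current_id
    if node.adjustment_id is not None:
        return node.adjustment_id         # branch to an adjustment scenario
    return current_id                     # no remedial branch: repeat


tree = {
    1: Branch(1, next_id=2, adjustment_id=10),
    2: Branch(2, next_id=None, adjustment_id=None),
    10: Branch(10, next_id=2, adjustment_id=None),
}
```

With this tree, a best score of 0.9 on exercise 1 advances to exercise 2. A score of 0.5 branches to remedial exercise 10.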
  • The user has the option to select particular exercises manually. Choices are made either from the content window 290, to pick a particular aspect of the instrument to improve, or from the result window 330.
  • The session-management modules select a particular exercise to load.
  • The series of notes to be played and the mode are displayed in the exercise window 280.
  • The ear training system 100 has two typical modes of operation: the practice mode and the test mode.
  • The user attempts to play prompted notes with a particular precision, as measured within a predetermined step interval. For each of the series of notes, the user sings or plays a note.
  • The sampling module 250 takes the analog sound and digitizes it to produce a set of digitized values. Many prior-art techniques for digitizing sounds exist, and many of them would be applicable to the sampling module 250.
  • The user has the option to select the waveform window 320 to directly display the sampled and digitized sounds from the sampling module 250.
  • The set of digitized values is processed by the pitch extraction module 260.
  • The pitch extraction module 260 determines a frequency for the set of digitized values. In the preferred embodiment, fast Fourier transforms extract the frequency.
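As a rough illustration of FFT-based pitch extraction, the sketch below computes a radix-2 Cooley-Tukey FFT in pure Python. It then reports the frequency of the bin with the most energy, the "frequency of maximum energy" criterion the document mentions for PowerMiser mode. This is a simplified stand-in, not the patent's implementation. The function names are hypothetical, and the input length is assumed to be a power of two.

```python
import cmath
import math
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def peak_frequency(samples: Sequence[float], sample_rate: float) -> float:
    """Frequency (Hz) of the FFT bin with maximum energy."""
    spectrum = fft([complex(s) for s in samples])
    half = len(spectrum) // 2  # real input: bins above N/2 mirror these
    # Skip bin 0 (the DC offset) when searching for the peak.
    k = max(range(1, half), key=lambda i: abs(spectrum[i]))
    return k * sample_rate / len(samples)
```

The bin spacing is `sample_rate / N`. At 22.1 kHz with 1,024 samples it is about 21.6 Hz. Near middle C, one-sixteenth of a step is roughly 2 Hz, so the spacing is far too coarse for that precision. A practical extractor would need longer windows, peak interpolation, or another estimator such as the autocorrelation the document mentions for the frequencizer.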
  • The user may visualize the pitch of generated notes in several ways in addition to the feedback presented in the exercise window 280.
  • The ear training system 100 presents extracted pitch information to the spectrum window 310, the tuning plot window 300, or the tuner window 340, depending upon the particular use of the system 100.
  • The pitch of the played note is compared with the pitch of the prompted note. If the played note is within the step interval currently selected for the user (a predefined margin), no further action is required for that note. If the note is flat or sharp, an indication in the exercise window 280 provides feedback to the user.
  • The system stores the results of an exercise in the session log 240. In test mode, the ear training system 100 collects information on all the notes of an exercise before adjudicating performance for each note. After the user completes the entire session, the system 100 provides feedback and grading. The user can also have the system 100 play the exercise or a reproduction of the user's performance.
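The two modes differ mainly in timing. Practice mode judges each note as soon as it is sampled, while test mode gathers the whole exercise before adjudicating. The Python sketch below assumes pitches have already been converted to fractional note numbers. `sample_note` is a hypothetical callback that returns the next sung or played note, and the retry option models exercises that prompt the user until the pitch is correct.

```python
from typing import Callable, List, Sequence, Tuple


def classify(played: float, target: float, tol: float) -> str:
    """Compare fractional note numbers; tol is the allowed deviation."""
    if played > target + tol:
        return "sharp"
    if played < target - tol:
        return "flat"
    return "ok"


def run_practice_mode(targets: Sequence[float],
                      sample_note: Callable[[], float],
                      tol: float,
                      max_attempts: int = 1) -> List[Tuple[float, str]]:
    """Synchronous processing: judge each note as soon as it is sampled.

    With max_attempts > 1 the user retries a note until it is in tune.
    """
    feedback = []
    for target in targets:
        for _ in range(max_attempts):
            result = classify(sample_note(), target, tol)
            feedback.append((target, result))  # shown to the user at once
            if result == "ok":
                break
    return feedback


def run_test_mode(targets: Sequence[float],
                  sample_note: Callable[[], float],
                  tol: float) -> List[str]:
    """Asynchronous processing: record every note, then adjudicate."""
    played = [sample_note() for _ in targets]  # no feedback while playing
    return [classify(p, t, tol) for p, t in zip(played, targets)]
```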
  • The playback module 270 reproduces digitized sounds. In the practice mode, results are stored in memory, such as RAM. The storage duration is one exercise, allowing the system to play back the performance to assist the user in evaluating the
  • To conserve processing and let users practice freely, the preferred embodiment does not record all of this information to the session log. For some applications, however, storage of all exercises is desirable.
  • A measure referred to as the step interval is used to judge the precision of generated notes.
  • One popular system for transcribing musical notes uses a chromatic scale in which each octave is divided into 12 notes: C, C sharp, D, D sharp, E, F, F sharp, G, G sharp, A, A sharp, and B.
  • The scale repeats itself for each octave. Advancing from one note to the next is a half-step interval; thus adjacent notes are said to be a half step apart.
  • Each note of each octave has a frequency associated with it.
  • The notes are numbered consecutively, such that middle C has note number 60.
  • Middle C sharp is note number 61, and the numbering continues through each of the octaves. Each note thus has both a representative number and an associated frequency. The note numbering is linear, but frequency increases exponentially across an octave. Thus, the frequency change from note 60 to note 61 is one-half the frequency change from note 72 to note 73 (note 72 is C, one octave higher than middle C).
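The exponential relationship between note numbers and frequencies can be written compactly. Assume standard equal temperament with the A above middle C (note 69) tuned to 440 Hz; the source does not state this reference explicitly. Then f(n) = 440 · 2^((n − 69)/12). Below is a minimal Python sketch of both directions of the mapping (function names hypothetical).

```python
import math

A4_NOTE = 69     # note number of the A above middle C (MIDI convention)
A4_FREQ = 440.0  # assumed reference tuning, in Hz


def note_to_freq(note: float) -> float:
    """Equal-tempered frequency (Hz) of a possibly fractional note number."""
    return A4_FREQ * 2 ** ((note - A4_NOTE) / 12)


def freq_to_note(freq_hz: float) -> float:
    """Fractional note number for a measured frequency (inverse mapping)."""
    return A4_NOTE + 12 * math.log2(freq_hz / A4_FREQ)


# Frequency changes for one half step, one octave apart.
low_step = note_to_freq(61) - note_to_freq(60)
high_step = note_to_freq(73) - note_to_freq(72)
```

Frequency doubles every 12 note numbers. That is why `high_step` is exactly twice `low_step` (about 31.1 Hz versus 15.6 Hz).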
  • Each step from note to note is divided appropriately.
  • A change from note 60 to note 61 is a half-step interval. When the system 100 judges user precision to one-eighth, one-twelfth or one-sixteenth of a step, that range is divided accordingly. For one-sixteenth step intervals, the ranges are 60 to 60.125; 60.125 to 60.25; 60.25 to 60.375; 60.375 to 60.5; 60.5 to 60.625; 60.625 to 60.75; 60.75 to 60.875; and 60.875 to 61.0.
  • Eight divisions of the half step provide a one-sixteenth step interval. A user must perform the prompted note within 0.125 on either side of it to be judged as playing it properly.
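The precision bins and the in-tune test reduce to simple arithmetic on fractional note numbers. For a precision of one-Nth of a (whole) step, each bin is 2/N note numbers wide, giving 0.125 for one-sixteenth. The Python sketch below illustrates that reading. The function names and the inclusive boundary at exactly ±tolerance are assumptions.

```python
from typing import List, Tuple


def bin_width(step_fraction: int) -> float:
    """Width, in note numbers, of one precision bin.

    A half step spans 1.0 note number, so one-Nth of a whole step is 2/N:
    0.25 for eighths, about 0.167 for twelfths, 0.125 for sixteenths.
    """
    return 2.0 / step_fraction


def precision_bins(note: int, step_fraction: int) -> List[Tuple[float, float]]:
    """Ranges dividing the half step from note to note + 1."""
    width = bin_width(step_fraction)
    count = step_fraction // 2  # bins per half step
    return [(note + i * width, note + (i + 1) * width) for i in range(count)]


def judge(played: float, target: float, step_fraction: int = 16) -> str:
    """'in tune' if within one bin width on either side, else sharp/flat."""
    error = played - target
    tol = bin_width(step_fraction)
    if error > tol:
        return "too sharp"
    if error < -tol:
        return "too flat"
    return "in tune"
```

`precision_bins(60, 16)` reproduces the eight ranges listed above, from (60.0, 60.125) to (60.875, 61.0). A deviation of 0.125 note numbers corresponds to 12.5 cents.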
  • The values of Table I identify, for each octave interval, a decimal note value for the corresponding chromatic-scale note in that interval.
  • The values of Table I correspond to MIDI values for the corresponding notes.
  • A first step in obtaining the note number is to determine the frequency of a played note.
  • To determine the frequency from which the note number is established, the preferred embodiments use either the digitizer 160 with the central processor 108 (and the FPU 110, if available) or a frequencizer (described below).
  • FFT: fast Fourier transform
  • Fig. 3 is a flow chart of the self-configuration module 210 for the ear training system 100.
  • The flow chart of Fig. 3 illustrates the relationship between the main process flow of the system 100 and resources, user interface elements, auxiliary flow and functions.
  • At step 211, the self-configuration module 210 checks that the operating system is at least version 6.0.7. This check, specific to Apple computers, ensures that SoundManager 2.0 is available on the system. SoundManager is the part of the Macintosh® Toolbox that handles sound sampling and synthesis.
  • SoundManager is further described in Apple® Inside Macintosh, Volume VI, hereby expressly incorporated by reference for all purposes. Users are required to upgrade if their system version is earlier than 6.0.7. As illustrated, a dialog box 211A then directs the user to upgrade the operating system to version 6.0.7 or later. Other systems may need other checks, for example whether a user runs Windows 3.1 with its multimedia extensions.
  • Fig. 20 illustrates a format for a freqDurationCmd command provided to the SoundManager.
  • The SoundManager responds to commands in this format to generate sounds.
  • Sample code in Appendix A1 illustrates use of the freqDurationCmd command.
  • The self-configuration module 210 checks for the presence of the digitizer 160.
  • The Sound Manager assigns an identification number to the digitizer 160 at start-up, which lets the ear training system 100 access the digitizer. If the system does not detect the digitizer 160, it notifies the user of a possible malfunction through a dialog box 212A.
  • The self-configuration module 210 checks for the FPU 110 or an FPU integral to the central processor 108. Without an FPU, the ear training system 100 enters PowerMiser ON mode, and a dialog box 213A notifies the user. PowerMiser is a reduced-features mode that compensates for the missing FPU by reducing numerical processing, keeping note processing time acceptable.
  • In PowerMiser mode, the preferred embodiment uses smaller sample sizes, disables octave determination (permitting determination of the chromatic-scale note only), and reduces the range of the ear training system 100.
  • Central processors 108 that run at a relatively slow clock rate (for example, 16 MHz or less) or lack the FPU 110 have a problem: extensive numerical calculations degrade the computer system 150's ability to give the user real-time feedback (in synchronous mode).
  • In PowerMiser mode, the ear training system 100 determines a note from the played note's frequency of maximum energy but does not determine an octave for it.
  • In full power mode, the system 100 establishes both the note and its octave.
  • Adding a custom hardware device called a frequencizer makes full power mode possible on computer systems 150 whose central processor 108 is slow and lacks the FPU 110.
  • The frequencizer includes the digitizer 160, a ROM-based frequency-extraction algorithm, and a digital signal processor (DSP) to perform the frequency extraction (e.g., fast Fourier transforms and autocorrelation functions).
  • DSP: digital signal processor
  • The frequencizer sends the computer system 150 a signal representing the frequency of a sampled sound set.
  • To preserve asynchronous processing, the frequencizer can also send raw sound samples to the computer system 150, just as the digitizer 160 does. Alternate embodiments of systems employing a frequencizer are able to function as music notation systems.
  • A color depth check loads the attributes used for the monitor 101. From the screen depth, the ear training system 100 selects monochrome (1-bit) or color (4- or 8-bit) resources. These resources include various graphic objects, including but not limited to icons, notes, musical symbols, and pictures. Steps 211 through 214 define a gestalt function through which the ear training system 100 examines and optimizes itself for operation with the computer system 150.
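The gestalt checks of steps 211 through 214 can be summarized as a function that maps detected platform features to a configuration. In this hypothetical Python sketch, the `Platform` and `Config` types are invented for illustration. Treating a clock rate of 16 MHz or less as a PowerMiser trigger is an interpretation of the example threshold given above, not a stated rule.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MIN_OS_VERSION = (6, 0, 7)  # minimum version providing SoundManager 2.0


@dataclass
class Platform:
    """Hypothetical summary of the host features detected at launch."""
    os_version: Tuple[int, int, int]
    has_digitizer: bool
    has_fpu: bool
    clock_mhz: float
    screen_depth_bits: int


@dataclass
class Config:
    """Hypothetical configuration derived from the gestalt checks."""
    power_miser: bool
    color_resources: bool
    warnings: List[str] = field(default_factory=list)


def gestalt(p: Platform) -> Config:
    warnings: List[str] = []
    # Step 211: operating-system version check (dialog box 211A).
    if p.os_version < MIN_OS_VERSION:
        warnings.append("upgrade system software to 6.0.7 or later")
    # Digitizer presence check (dialog box 212A).
    if not p.has_digitizer:
        warnings.append("digitizer not found: possible malfunction")
    # FPU check (dialog box 213A); treating a slow clock as a PowerMiser
    # trigger is an assumption of this sketch.
    power_miser = (not p.has_fpu) or p.clock_mhz <= 16
    # Color-depth check: 1-bit screens get monochrome resources,
    # 4- or 8-bit screens get color resources.
    color = p.screen_depth_bits >= 4
    return Config(power_miser=power_miser, color_resources=color,
                  warnings=warnings)
```

Comparing version tuples element by element is the same check as "at least 6.0.7": `(6, 0, 5) < (6, 0, 7)` is true.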
  • At step 215, the correct resources for color depth and other configuration parameters are collected. These other parameters include screen pixel resolution, RAM size, Macintosh® type, digitizer 160 type, and the port used for the digitizer 160, if applicable.
  • A dialog box 215A appears requesting user information.
  • Auxiliary flow performs a registration step at 215B. It saves a serial number, the location of the system licensor, and various settings the user chooses by checking options in a special window.
  • The self-configuration module 210 saves user preferences and system information to a file.
  • The ear training system 100 initializes itself at step 216. Step 216 presents an about window, which loads information from the user file and any QuickTime™ animation files if QuickTime™ is present.
  • QuickTime™ is an animation standard developed for use with Apple Macintosh computers.
  • In the about window step 216, a dialog box 216A identifies the product, the manufacturer, and the configuration and user information, among other things.
  • A preferences window lets the user make special choices about some interface attributes. These include:
    • colors of various objects;
    • musical nomenclature style (e.g., ABC, movable Do, fixed Do, or fingering when applicable);
    • volume of sound output;
    • enabling or disabling a voice prompt built into the ear training system 100;
    • adjusting a background rejection level;
    • switching PowerMiser mode.
    Changing any of these values branches to auxiliary step 216C to set preferences. The new values are saved in the user preference file, and the system 100 reinitializes at step 216.
  • Depending on the user preferences set in the preferences window at step 216, the ear training system 100 either sets conditions automatically or lets the user select an exercise.
  • The ear training system 100 checks the curriculum (from the scenarios module 220 of Fig. 2) to find the decision branch that applies to the user. It bases this on information supplied in the about window at step 216, such as age, instrument and skill level.
  • The automatic mode reviews the session log to evaluate performance to date and properly place the user in the decision tree.
  • The ear training system selects a suggested exercise for the user.
  • The suggested exercise is loaded from the exercises module 230 (Fig. 2).
  • The system 100 displays the notes and the parameters of the conditions in the exercise window 280.
  • The ear training system 100 permits the user to identify a particular exercise from the exercises module 230 (Fig. 2).
  • The ear training system 100 at step 217 will use the identified exercise to set conditions for
  • The set conditions step 217 thereafter displays the particular exercise in the exercise window 280.
  • Fig. 4 is a flow chart 500 of a operation of the ear training system 100 for a typical session.
  • the flow chart 500 of Fig. 4 illustrates a relationship of the main flow process with resources and user-interfaces.
  • Session-management modules and sound processing modules participate in execution of a session.
  • the session-management modules are responsible for selection of a particular exercise from a curriculum and presenting a series of notes to the user to prompt generation of the notes.
  • the session-management modules initially select and load a particular exercise and display a series of notes in the exercise window 280.
  • the exercise is a performance mode, such as practice or test mode. Selection of the particular exercise results in a. display of the exercise window 280 which has a plurality of scores and superimposed notes.
  • the ear training system 100 displays an exercise window 280 with a loaded exercise.
  • the system 100 enters a listening mode and prompts the user to get ready by use of a dialog box 504A.
  • a performance process synchronously prompts the user to play a note, at which time the system 100 provides feedback on the played note relative to the prompted note to the user.
  • the system 100 optionally displays the feedback in many different forms, a typical way is directly in the exercise window.
  • the system 100 evaluates the user's performance.
  • the system 100 selects a new exercise.
  • the new exercise derived from the session-management modules presents the new exercise to the user in an exercise window 280 (Fig. 2).
  • the system 100 selects an appropriate mode.
  • the practice mode provides the user with relatively large amounts of audio and visual support to satisfactorily perform the exercise.
  • the same series of notes, with less audio and visual feedback, can present an entirely different exercise to the student.
  • a goal of the ear training system 100 is to train the user to recognize and perform to a specified precision without extraneous support from the system 100.
  • the new exercise selected at step 508 can return to step 502 if the exercise is to be a practice mode exercise. If the ear training system 100 is to test the user based upon the new exercise loaded at step 508, the program flow advances to step 510. Step 510 places the sound processing modules into a sample mode. At step 512, the user completely performs the exercise. Thereafter, at step 514, the
  • step 514 includes saving the data in a session log and providing feedback regarding the user's precision. Depending upon the exercise, if the performance was not above a particular level, the system 100 expects the user to perform the exercise again. The user continues to perform, steps 512 and 514, until reaching that result level.
  • at step 516, the system 100 loads information from the session log and the display resources. Information from the user log allows the user to monitor progress and proficiency. One way to provide the user with the information is through display of the results window 330 (Fig. 2).
  • a global auditing of past performances provides a historical assessment of proficiency for different aspects of the ear training.
  • a statistical analysis of the user file evaluates all exercises by note. The analysis could indicate, for example, that notes F and G are consistently sharp while D is often flat and the other notes are, on average, performed adequately.
  • An audition window 518A displays a graphic representation of the intonation abilities of the user for the range of notes stored in the session log. The view in the audition window 518A represents actual patterns of recognition of pitches for each of the notes practiced in the curriculum.
  • the graphic display of the audition window 518A allows the user, or others, to visually perceive strengths and weaknesses of the user's ear.
  • the audition window 518A provides the user with an option to print the summary information displayed.
  • Other analysis tools are possible, depending upon implementations of the ear training system 100.
  • the system 100 saves global auditing information in the session log.
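The per-note analysis described above (notes F and G consistently sharp, D often flat) amounts to averaging each pitch class's signed error across the session log. The C sketch below is illustrative only. The `LogEntry` record, the measurement of error in cents and the tolerance argument are hypothetical and are not taken from the patent.

```c
#include <stddef.h>

/* Hypothetical session-log record: one sampled note. */
typedef struct {
    int pitch_class;     /* 0 = C, 1 = C#, ..., 11 = B */
    double deviation;    /* signed error in cents (assumed unit); > 0 means sharp */
} LogEntry;

typedef enum { TENDENCY_NONE, TENDENCY_FLAT, TENDENCY_OK, TENDENCY_SHARP } Tendency;

/* Average the signed deviation for each of the 12 pitch classes and
 * classify the average against a (hypothetical) tolerance. */
void analyze_by_note(const LogEntry *log, size_t n, double tolerance,
                     double mean[12], Tendency tendency[12])
{
    double sum[12] = {0};
    size_t count[12] = {0};

    for (size_t i = 0; i < n; i++) {
        int pc = log[i].pitch_class;
        if (pc < 0 || pc > 11)
            continue;                       /* skip malformed entries */
        sum[pc] += log[i].deviation;
        count[pc]++;
    }
    for (int pc = 0; pc < 12; pc++) {
        if (count[pc] == 0) {
            mean[pc] = 0.0;
            tendency[pc] = TENDENCY_NONE;   /* note never practiced */
            continue;
        }
        mean[pc] = sum[pc] / (double)count[pc];
        if (mean[pc] > tolerance)
            tendency[pc] = TENDENCY_SHARP;
        else if (mean[pc] < -tolerance)
            tendency[pc] = TENDENCY_FLAT;
        else
            tendency[pc] = TENDENCY_OK;
    }
}
```

The sign of each per-note mean is what separates "consistently sharp" from "often flat". A fuller audit could also report the spread of each note's errors.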
  • Fig. 5 is a detailed view of the exercise window 280 and the palette 380.
  • the exercise window displays a series of notes 281 for the exercise in traditional musical notation on musical scores 282.
  • the exercise window 280 includes a menu bar 283 on top listing different menu choices available to the user.
  • On the right of the exercise window is a command bar 284 showing the particular options set for the mode. These include the view mode, which controls the note and staff size (reduced in the example), the tempo (80 in the example), the routine type (loop on miss in the example), the level of visual or audio support provided, the sensitivity, and the chosen nomenclature. Audio support includes playing an exercise or note before or after the user generates the note.
  • a status bar 285 along the bottom of the exercise window indicates the instrument type, system mode, status and prompted note.
  • the palette 380 identifies various selectable features by association of the feature with an icon.
  • the highlighted ear icon in the palette selects display of the exercise window which will prompt notes and wait for the user to play the note.
  • the speaker icon will play an exercise, while the microphone icon samples and stores notes the user generates.
  • the hammer icon initiates an evaluation of any sampled and stored notes resulting from selection of the microphone icon.
  • the tuning fork icon selects the tuning window 340, the spectrum icon selects the spectrum window 310 and the waveform icon selects the waveform window 320.
  • the system 100 prompts the user to play a particular note by highlighting it, for example by displaying the note in inverse video or surrounding it with a blinking rectangle 286.
  • the ear training system 100 determines a frequency for the played note and provides the user with feedback regarding performance on the note prior to prompting a new note.
  • the feedback consists of a chevron 287, of one or two stripes, superimposed either above or below each note. For example, for a note played slightly sharp, a single downward-pointing chevron is displayed above the note symbol. Similarly, for notes slightly flat, the system 100 places a single upward-pointing chevron under the note.
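The chevron rule above can be sketched as a mapping from signed pitch error to a marker. Several assumptions are not in the source and are hypothetical: the error is measured in fractions of a step, a single chevron appears when the error exceeds the current step interval, and a double chevron appears when it exceeds twice that interval.

```c
#include <math.h>

/* Where the chevron is drawn relative to the note and which way it points. */
typedef enum {
    MARK_NONE,
    MARK_ABOVE_POINTING_DOWN,   /* sharp: the user should play lower */
    MARK_BELOW_POINTING_UP      /* flat: the user should play higher */
} ChevronPlacement;

typedef struct {
    ChevronPlacement placement;
    int stripes;                /* 0, 1 or 2 */
} Chevron;

/* error_steps: played pitch minus prompted pitch, in steps (> 0 = sharp).
 * interval: current precision step interval, e.g. 1.0 / 8.0.
 * The 1x / 2x thresholds are hypothetical. */
Chevron chevron_for_error(double error_steps, double interval)
{
    Chevron c = { MARK_NONE, 0 };
    double magnitude = fabs(error_steps);

    if (magnitude <= interval)
        return c;                               /* within tolerance: no mark */

    c.stripes = (magnitude > 2.0 * interval) ? 2 : 1;
    c.placement = (error_steps > 0.0) ? MARK_ABOVE_POINTING_DOWN
                                      : MARK_BELOW_POINTING_UP;
    return c;
}
```

With a one-eighth-step interval, an error of +0.2 step yields one downward chevron above the note, and an error of -0.3 step yields two upward chevrons below it.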
  • the exercise window 280 also includes another user notation symbol for a bad data sample.
  • the cross symbol 288 signifies that a problem exists with the sample, and the problem prevents evaluation of the note. Possible causes of a bad data sample include the user generating two notes at one time if multi-frequency tones are not allowed (for example, for systems 100 lacking the frequencizer), or some other sound
  • the ear training system 100 uses information from the user regarding instrument type and
  • the ear training system uses this information to select an initial precision step.
  • the preferred embodiment selects from one-eighth, one-twelfth and one-sixteenth step intervals, although other steps could be used. This interval step indicates how precise the user's reproduction of a note must be to warrant
  • the ear training system dynamically monitors performances during test modes against this interval step. After a user performs satisfactorily, such that a predetermined percentage (for example eighty percent in the preferred embodiment) of the reproduced notes are correct, the ear training system automatically selects a more precise step interval. Thereafter, the user's performance is gauged against the new step interval. As the user continues to improve and performance is judged correct for eighty percent of the notes, the ear training system uses progressively more precise step intervals for the user.
  • a predetermined percentage, for example eighty percent in the preferred embodiment
  • the ear training system 100 adapts the system 100 to thereafter use a less precise step interval.
  • the ear training system is able to adjust itself to a user's ability level, providing a valuable tool for users of all ability levels.
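The self-adjusting precision described above can be sketched as a small state machine over the three step intervals. Two parts of this sketch are assumptions. First, a note counts as correct when its absolute error is within the current interval. Second, the source does not fully state the fall-back rule, so the `DEMOTE_BELOW` threshold is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Precision levels named in the text: 1/8, 1/12, 1/16 of a step. */
static const double STEP_INTERVALS[] = { 1.0 / 8.0, 1.0 / 12.0, 1.0 / 16.0 };
#define NUM_LEVELS (sizeof STEP_INTERVALS / sizeof STEP_INTERVALS[0])

#define PROMOTE_AT   0.80  /* eighty percent correct, as in the preferred embodiment */
#define DEMOTE_BELOW 0.50  /* hypothetical fall-back threshold */

/* Fraction of notes whose |error| (in steps) is within the interval. */
double fraction_correct(const double *errors, size_t n, double interval)
{
    size_t ok = 0;
    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        if (fabs(errors[i]) <= interval)
            ok++;
    return (double)ok / (double)n;
}

/* Returns the precision level (index into STEP_INTERVALS) for the next test. */
size_t next_level(size_t level, const double *errors, size_t n)
{
    double score = fraction_correct(errors, n, STEP_INTERVALS[level]);

    if (score >= PROMOTE_AT && level + 1 < NUM_LEVELS)
        return level + 1;          /* ready for a finer interval */
    if (score < DEMOTE_BELOW && level > 0)
        return level - 1;          /* fall back to a coarser interval */
    return level;
}
```

Keeping the demotion threshold below the promotion threshold prevents the level from oscillating when a user scores near eighty percent.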
  • asynchronous mode are recorded into a session log, along with sampled sounds to permit the session to be replayed for the user.
  • the test mode provides feedback, for each note, after the user has played all notes of the session. It is this mode which grades a user's performance and determines the user's proficiency level. Different statistical analyses and tests of the session log influence selection of a next exercise from the curriculum.
  • Fig. 6 is a detailed illustration of the tuner window 340.
  • the window updates, based upon the sampled sound, the present frequency, acting like an electronic tuner.
  • the window includes an arrow 341 which moves relative to a scale 342.
  • the arrow 341 indicates how far, as a percentage of a step, the note of the sampled sound is from the proper (i.e., tuned) note.
  • the three notes displayed mark the one-step range represented on the window (-50% to 50% corresponds to one step), which frames the sampled note.
  • This window provides quantitative information for feedback rather than indicating only whether a note was flat or sharp.
  • Fig. 7 is a detailed illustration of the spectrum window 310.
  • By selecting the spectrum window 310, a user is able to evaluate energy versus frequency of a sampled sound. It is sometimes useful to determine the distribution of energy among the different frequencies composing a note. This permits quantitative analysis of the content of a note or tone in terms of its fundamental and harmonic frequencies. This analysis is useful to assess the quality of a particular instrument by careful review of the notes it generates.
  • Fig. 8 is a detailed illustration of the waveform window 320.
  • the waveform window 320 is another diagnostic tool providing specific information regarding a played note.
  • the plot in the waveform window 320 is amplitude versus time, a traditional representation of a wave. By observing the regularity and homogeneity of the waveform, a user is able to judge the purity of a generated note or the pertinence of a sample.
  • Fig. 9 is a detailed illustration of the results window 330.
  • the results window 330 is available to present summary information regarding the completed exercises.
  • the results window 330 presents summary information in histogram format. The format graphically indicates, for each completed exercise, a last score and a best score. Statistical and other evaluations of trends and proficiency measurements are accessed from the results window. Some of these tests include mean, variance, standard deviation, tempo problems, range addressed, and patterns of mistakes such as specific notes, intervals or fingering.
  • Fig. 10 is a detailed illustration of the tuning plot window 300.
  • This window permits a user to accurately tune their instrument.
  • a desired frequency is used as a reference frequency.
  • the tuning plot window 300 displays the closest note.
  • the note's frequency, relative to the reference, is plotted against time.
  • pitch error indicators dynamically indicate the played note's relative pitch. That is, as shown, the note began too flat and became sharp. As time progressed, the note became decreasingly sharp, approaching the reference frequency.
  • the proper tuning is graphically displayed in the tuning window 300, facilitating accurate reproduction of notes. Sharp and flat indicators graphically provide information regarding relative pitch compared to a reference note.
  • Fig. 11 is a flow chart of the practice mode having synchronous processing.
  • the system prompts a note at step 510.
  • a sample process 522 converts analog audio signals to digitized signals through eight-bit sampling at a sampling rate of 22.1 kHz or better.
  • the sampling step 522 provides a time-out error if it does not detect audio input within a prespecified period.
  • the ear training system 100 performs fast Fourier transforms of the sampled and digitized data to extract its frequency.
  • a storage step 526 records the information to memory for later playback. Thereafter, a delay step 528 precedes displaying the result at step 530.
  • the delay step allows the time interval between consecutive samples to be adjusted to the assigned tempo independently of the sample-processing speed. That speed varies significantly across hardware configurations of the ear training system 100, such as central processor 100 clock speed or type, or the presence of an FPU 110 or a frequencizer.
  • the display step 530 provides feedback regarding the played note relative to the prompted note. In the preferred embodiment, the feedback results directly in the exercise window 280.
  • the highlighted icon of the palette 380 commands the system into the particular mode.
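The delay step 528 can be sketched as padding each note's processing time up to the length of one beat, so that the prompt rate stays on the assigned tempo on both fast and slow hardware. One prompt per beat is an assumption. Timing with `clock()` and `CLOCKS_PER_SEC` follows the appendix fragment, but `clock()` measures processor time rather than wall-clock time, so a real implementation might prefer a wall clock.

```c
#include <time.h>

/* Length of one beat in seconds at the given tempo (beats per minute). */
double beat_seconds(double tempo_bpm)
{
    return 60.0 / tempo_bpm;
}

/* Seconds still to wait after processing took `elapsed` seconds.
 * Returns 0 if processing already overran the beat (slow hardware). */
double remaining_delay(double tempo_bpm, double elapsed)
{
    double wait = beat_seconds(tempo_bpm) - elapsed;
    return wait > 0.0 ? wait : 0.0;
}

/* Processor time elapsed since `start`, in seconds. */
double seconds_since(clock_t start)
{
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

At the example tempo of 80, a beat lasts 0.75 s. If extracting a note's frequency took 0.3 s, the delay step would wait a further 0.45 s before displaying the result and prompting the next note.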
  • Fig. 12 is a flow chart illustrating the tuning mode.
  • the tuning mode is available only on computer systems 150 having the FPU 110. In this mode, the system 100 continuously processes an input audio frequency and plots the result against time. Displaying the information as an electronic tuner is one preferred embodiment. The plotting window permits visualization of sustained accuracy and vibrato.
  • a tuning mode begins with a particular display window at step 550.
  • the system 100 samples input audio at step 552.
  • Fast Fourier transforms, step 554, extract a frequency for the sampled sound.
  • the system 100 records pitch errors detected after extraction of the frequency at the step 554.
  • the system 100 plots frequency, either versus time or per acquisition. The flow returns to step 552 for the next sample. The process continues until the user terminates this mode.
  • Fig. 13 is a flow chart of the process for the waveform window 320 of Fig. 8. Selecting display of the waveform window 320 executes the steps 560 through 564.
  • the system 100 will review a set of sampled notes.
  • the system 100 loads a note.
  • the system 100 opens the waveform window 320.
  • at step 564, the system 100 plots the sample in the waveform window 320.
  • the waveform of the processed sample is displayed until the user deselects this waveform view mode.
  • Fig. 14 is a flow chart of the spectrum display process for the spectrum window 310 of Fig. 7. Similar to the flow chart of Fig. 13, the process loads a note from a set of sampled notes at step 570. After performing a fast Fourier transform on the loaded note, step 572, the process opens the spectrum window 310 at step 574. The spectrum display process then plots, step 576, a power content of the sample versus frequency. Again, the spectrum of the sample processed is displayed until the user deselects the mode.
  • Fig. 15 is a flow chart showing operation of the test mode in conjunction with the processing mode described for the exercise window 280.
  • the preferred embodiments contemplate providing a test mode after each exercise.
  • the test mode is an asynchronous processing of the user's notes.
  • the test mode provides faster interactivity, allowing work at faster tempos. This faster interactivity is possible regardless of the computer system 150 because the system 100 samples and stores data and waits until completion of the exercise to process the stored samples. Processing the data depends upon the performance capabilities of the computer system 150, but the processing does not affect the execution of the exercise.
  • the system 100 prompts a note, step 580, samples the sound, step 582, and stores the samples at 584. These steps 580 through 584 continue for each note of the exercise. Thereafter, for steps 586 through 590, the stored sampled notes are processed to extract frequency and compare to expected values. Step 586 loads a note, step 588 extracts its frequency, and step 590 displays the extracted frequency relative to a value for the corresponding prompted note. The processing continues for all the stored notes.
  • Fig. 16 is a flow chart detailing the sampling process described in the preceding figures.
  • the sampling process begins at start, step 600.
  • the system 100 initializes the sound manager.
  • a timeout check at step 604 prevents the system 100 from waiting indefinitely for user input. If timeout is true, the sample process asserts a timeout error, step 606, and ends.
  • If there is not a timeout at step 604, the process advances to step 608 to get a set of samples.
  • Step 610 generates a value based on the deviation of the last one-eighth of the samples.
  • Step 612 checks the value to ensure that it is greater than a minimum value. If it is not, the sampling process returns to step 604 to test for timeout. If the value at step 612 is greater than the threshold, the process advances to step 614 to delay, then to step 616 to obtain more samples.
  • At step 618, the process derives an amplitude value from the deviation of the first one-eighth of the samples. The amplitude value must exceed a minimum at step 620 to proceed. If the amplitude is not sufficiently large, the process returns to step 604 to test for a timeout error.
  • Passing the amplitude test at step 620 results in calculation of a new amplitude value derived from the deviation of the last one-eighth of the samples of the most recent sample set, step 622. Again, step 624 tests the amplitude, and if it is too low, the process branches to step 604 to test for timeout. Passing the amplitude test at step 624 results in calculation of a new amplitude value based upon the deviation of all samples, step 626. If the amplitude is sufficiently large, step 628, the process returns the samples, step 630, and ends, step 632. If the amplitude at step 628 is not above a threshold, the process branches to step 604 to test for a timeout error.
  • Fig. 17 is a flowchart of the fast Fourier transform process identified in the figures above.
  • the FFT process begins at 700, start.
  • Step 702 computes a mean of the samples provided from a sampling process, for example.
  • Step 704 subtracts the mean from all the samples.
  • Step 706 computes a Fourier transform of all samples.
  • Step 708 obtains the discrete frequency of maximum energy.
  • Step 710 interpolates a real frequency from all neighboring energies.
  • Step 712 tests for the presence of the FPU 110 in the computer system 150. If there is an FPU 110, the process continues to step 714 to compute an autocorrelation of the frequency spectrum for the sample.
  • Step 716 obtains a frequency corresponding to a first maximum of autocorrelation energy.
  • Step 718 interpolates real frequency from neighboring energies.
  • step 720 chooses a best estimate of true frequency.
  • Sample code in Appendix A2 illustrates a
  • if step 712 indicates an absence of the FPU 110, the process skips steps 714-718 by branching to step 720.
  • Step 722 converts the frequency into a decimal note by techniques well known in the art.
  • Step 724 returns the decimal note and the frequency spectrum of the sample. The process is complete at step 726, end.
  • Fig. 18 is an operational diagram of a frequencizer for providing sampling and signal processing according to a preferred embodiment of the present invention.
  • the frequencizer has two modes of operation, the frequencizer mode and the digitizer mode.
  • in the digitizer mode, the frequencizer performs like the digitizer 160 of the computer system 150: it samples analog sound at eight bits and 22.1 kHz or better. Digitizer operation is shown on the right-hand side of Fig. 18.
  • the digitizer performs ASCII conversion on the sampled values and chains the ASCII values to produce an output file.
  • the output of the frequencizer is coupled to the computer serial port.
  • This digitizer mode permits operation of the ear training system in asynchronous mode.
  • This chained ASCII is a common convention for communication of values. For example, to send 60.25, five ASCII characters are sent. These characters represent "6", "0", ".", "2", and "5".
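The chained-ASCII convention above can be sketched with the standard `snprintf`, which writes a value's decimal text into a buffer whose characters are then sent one by one. The function name, the fixed two-decimal format and the caller-supplied buffer are assumptions made for illustration.

```c
#include <stdio.h>

/* Writes `value` as decimal text into `out` (e.g. 60.25 -> "60.25").
 * Returns the number of characters to send, or 0 on error or if the
 * buffer is too small. The two-decimal format is an assumption. */
size_t encode_value(double value, char *out, size_t out_size)
{
    int len = snprintf(out, out_size, "%.2f", value);
    if (len < 0 || (size_t)len >= out_size)
        return 0;
    return (size_t)len;
}
```

Encoding 60.25 yields the five characters '6', '0', '.', '2', '5', matching the example in the text. The receiving side can parse them back with `strtod`.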
  • a more powerful mode is the frequencizer mode which includes signal processing circuitry to perform autocorrelation and fast Fourier transforms to convert a spectrum of energy.
  • the frequencizer converts the energy spectrum into a decimal note, followed by an ASCII conversion.
  • the outputs of the frequencizer, coupled to the serial port of a computer system 150, are frequencies of sampled analog sounds.
  • the ear training system 100 employing the frequencizer has an ability to quickly process synchronous exercises at relatively fast tempo rates and provide real-time feedback.
  • by virtue of its increased processing speed, the frequencizer allows an embodiment of the ear training system 100 employing it to process multiple frequencies, such as two notes being played at once.
  • the frequencizer allows coupling with a notation capacity such that the user can play notes, have the pitch extracted, and have the played notes graphically represented on the display.
  • the present invention provides an efficient mechanism to enhance musicality and to educate and calibrate a user's ear to recognize and produce precise pitches for a series of notes.
  • the use of a structured pedagogic scenario that selects particular exercises to efficiently advance a user is an important aspect of a preferred embodiment of the present invention.
  • the structured pedagogic exercises are modified by analyses performed on recorded data of past performances to correct aspects of performances below a desired proficiency.
  • the ear training system customizes itself to existing software to enable peak performance on multiple classes of computer systems.
  • a digitizer converts analog audio into digital samples.
  • Processing of the digitized samples extracts a frequency of the samples and permits comparison to expected values.
  • a frequencizer can improve performance for all classes of computer systems. Providing a practice mode and a test mode (synchronous and asynchronous) with varying levels of feedback enhances an ability of the ear training system to train a user in precise recognition and reproduction of pitch.
  • the curriculum will be made of animations related to a song or a specific exercise. Children would be prompted to sing notes by an animated character that indicates the note to be played. Rather than a symbol, such as the chevrons, which could be cryptic to the child, the animated character provides the feedback to the child user. For example, in an exercise designed to target a particular pitch, a dog running at the proper tempo would jump through a hoop for correctly sung notes. For sharp notes, the dog would overjump the hoop, while flat notes would cause the dog to undershoot the hoop.
  • the graphic character is customizable to particular songs. For example, to teach children to sing Twinkle Twinkle Little Star, an animated star walking above a staff would prompt a user to sing particular notes. Similarly to the example with the dog, the animated star could leave a smiling star for properly produced notes, while producing frowning stars for sharp or flat notes.
  • Fig. 19 is an illustration of an exercise window 280 providing an exercise for Twinkle Twinkle Little Star using animated characters. As shown, the animated character 800 advances above the staff 282, from note to note. Correct notes produce a smiling star 802, sharp notes produce frowning, downward looking stars 804, and flat notes produce frowning, upward looking stars 806. Bad samples have frowning stars with crossed-out eyes 808.
  • step = (note ? (int) xrat / (16*theta) : 2);
  • duree = (clock()-debutl)/CLOCKS_PER_SEC;
  • ff = fp(nbs, freq, fen, ac, tpas, 1);
  • paramPtr->returnValue = (Handle) CopyStiToHand("Error - Memory error"); break;

Abstract

An ear training system (100) which selects particular exercises from a structured curriculum for performance by a user. The system (100) prompts the user to play a note. The training system (100), in a practice mode, samples the played note (160), extracts its frequency (105), and provides immediate feedback (101) on the precision of the played note relative to the prompted note. The process continues for each note of the exercise. Results are stored in a session log (107), which is evaluated to select, within a framework of the predefined pedagogic scenarios, a subsequent exercise. Asynchronous performance of exercises, with feedback provided upon completion of the entire exercise, provides a better evaluation of a user's ability to precisely recognize and reproduce notes. A digitizer (160) coupled with a processor (frequencizer) (108) that performs fast Fourier transforms for frequency extraction permits conversion of audio sound signals to digitized sounds and ASCII representations of decimal notes. Systems employing a frequencizer are able to perform music notation for analog music notes.

Description

METHOD AND APPARATUS FOR EAR TRAINING
BACKGROUND OF THE INVENTION
The present invention relates generally to a music training apparatus and method and, more particularly, to a system for ear training which calibrates and educates a person's ear to recognize pitch and develop a sense of harmony and musicality.
Musicians enhance their musicality by their ability to identify, and to produce mentally and physically, a series of musical notes. Precise identification and production of the series of notes is of utmost importance for singers, violinists, saxophonists and other musicians whose instruments require them to produce the proper pitch of a note through proper body mechanics.
However, pitch sensitivity of the ear is very important to any musician, as it provides better musicality in addition to sight reading and sight singing abilities. Piano players need an outside source of ear training: because they produce notes with proper pitch simply by striking the proper key, provided that the piano is properly tuned, producing proper pitch does not challenge the musician's ear to discriminate pitch.
Techniques to improve musicality include development of a separate tool to enhance musicality through ear training. In a best case scenario, a keyboard instrument can teach ear discrimination of notes for one-half step intervals. Thus, ear training systems relying on keyboard devices are limited as to a degree of precision attainable for identification of notes. Keyboard-type ear training systems cannot judge intonation, also referred to as perfection of pitch.
For instruments such as the voice, the violin or the saxophone, for example, playing in tune relies not only upon proper identification of notes, but also on physically producing the right pitch. For these types of instruments, the musician must identify the pitch of a note in order to produce it. The more accurate the musician is in identifying the notes, the more accurate the musician will be in reproducing the notes. For these reasons, among others, it is important for musicians to train their ears to be able to identify and reproduce accurately, mentally or by singing, a series of notes.
Ear training methods of the prior art include professor-student sessions in which the student attempts to reproduce a series of predefined notes by voice or instrument. The professor attempts to provide feedback to the student during the performance, indicating when the pitch of any particular played note is too sharp or too flat. While this method has been employed for years, it has disadvantages. One disadvantage is that it is difficult to provide real-time feedback for each note: there is some delay between playing a note and the professor's comment on it. A second disadvantage is that the professor, to communicate, must rely on the ear of the student (which is the object of the training itself). Another disadvantage is that the comments may be incorrect, since the method is only as reliable as the professor's ability to discriminate proper relative pitch. The prior art method has still another disadvantage in that students are unable to learn by themselves. Learning according to the method of the prior art requires the assistance of another person, and that person must be skilled.
The invention provides a novel apparatus and method to allow a student, otherwise unassisted, to properly and efficiently train their ear to recognize proper pitch of notes at a significant precision depending upon the instrument (for example one-eighth, one-twelfth or one-sixteenth of a step). Additionally, the ear training system trains each user to recognize pitch so as to produce notes themselves at such a precision, by singing or playing an instrument such as a voice, a violin, or a saxophone, for example.
SUMMARY OF THE INVENTION
The present invention provides apparatus and method for assisting a student to improve their abilities to recognize pitches and their understanding of solfege (music theory, intervals, sight reading and sight singing). The invention permits the musician to calibrate and educate their ear to enhance development of harmony and musicality. For each student, the system selects a particular exercise from a set of self-parameterizing pedagogic sessions. The difficulty level of the curriculum adjusts automatically to the student by choosing a particular exercise, a routine for the exercise and a level of audio and visual support provided by the system (this is one of several modes of the ear training system, called a practice mode, in which samples are processed synchronously, after each sample is acquired for each note of the exercise). An ultimate goal of the ear training system is to have the user perform all exercises one hundred percent in tune (with a sensitivity requirement varying depending upon the particular instrument) without any support or interference from the system (this is a mode referred to as test mode, in which samples are processed asynchronously, i.e. when the exercise is complete and the user has attempted to generate each note). Users of the ear training system have an option to practice any specific element of a curriculum in a mode of their choice. Furthermore, graphic representations of sound characteristics (waveform and spectrum) are available for refining the quality of the musician's sound. Users can also prompt the ear training system to scan and statistically analyze their session log (which contains the best and last performance of each exercise executed in the test mode) for patterns of mistakes.
According to one aspect of the invention, an ear training system includes an electronic display, an analog-to-digital converter to sample generated sounds and convert the sounds into a digital format, memory for storing scenarios, exercises, and session logs, and a processor to present specific exercises to the electronic display, to determine pitch of notes actually played and compare them to expected notes, to provide feedback to the user regarding the pitch of the generated note, to record performance against predetermined standards, and to provide the user with a historical evaluation of performance relative to the predetermined standards.
In operation, the system initializes itself upon launch to determine the specific hardware features present, in order to optimize performance on the specific platform. After launch, the processor displays an exercise window with a recommended choice and a palette from which the user selects specific options or modifies system parameters. To begin, the user manually selects a particular exercise, or permits the system to select an exercise based on the curriculum and past performance. The processor displays the particular exercise, typically a series of notes, and prompts the user to sing or play the exercise. As the user generates the notes, the analog-to-digital converter samples each note and the processor extracts the note's pitch. The processor compares the generated note against the note of the exercise. If the pitch of the generated note exceeds a predefined error margin, the system indicates whether the pitch of the generated note was too sharp or too flat. Different exercises will prompt the user to continue trying to generate the proper pitch until correct, or will permit the user to continue with the exercise.
When ready, the user performs the exercise without any interference from the system (test mode). The notes are analyzed when the exercise is complete with results of the exercise displayed on the score and recorded in the session log. The log permits the user to review any particular exercise's performance, not just in a numerical sense, but also to playback the session using the user data. The system employs an optional pedagogic structure which will select exercises for the user, based upon data recorded in the session logs.
Using input notes, the system is able to facilitate tuning or evaluation of an instrument by plotting amplitude against time (an input note's waveform), or by graphing energy versus frequency (the input note's spectrum).
In an alternate preferred embodiment, systems employing a frequencizer are able to function as a music notation system. User-generated notes are converted to graphic representations of symbols corresponding to frequencies of notes closest to the user-generated notes. In this mode, the user is able to play music and have the system record, in conventional music notation form, the generated notes.
The present invention offers many advantages over prior art ear training systems and methods. These advantages include the use of a knowledge base containing pedagogically structured exercises, and the use of past performance results to advance a user efficiently. The user is able to train the ear without another person's assistance. The training system is usable with complex audio sources, such as a voice or a violin. To facilitate training, the ear training system is designed to operate in a hands-off mode, so that the user can play an instrument without also having to operate the ear training system manually.
Other features and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is an ear training system 100 embodying a preferred embodiment of the present invention;
Fig. 2 is an illustration showing relationships among selected modules and windows of the ear training system 100;
Fig. 3 is a flow chart of the self-configuration module 210 for the ear training system 100;
Fig. 4 is a flow chart 500 of the operation of the ear training system 100 for a typical session;
Fig. 5 is a detailed view of the exercise window 280 and the palette 380;
Fig. 6 is a detailed illustration of the tuner window 340;
Fig. 7 is a detailed illustration of the spectrum window 310;
Fig. 8 is a detailed illustration of the waveform window 320;
Fig. 9 is a detailed illustration of the results window 330;
Fig. 10 is a detailed illustration of the tuning plot window 300;
Fig. 11 is a flow chart of the listening mode having synchronous processing;
Fig. 12 is a flow chart illustrating the tuning mode;
Fig. 13 is a flow chart of the process for the waveform window 320 of Fig. 8;
Fig. 14 is a flow chart of the spectrum display process for the spectrum window 310 of Fig. 7;
Fig. 15 is a flow chart showing operation of the test mode in conjunction with the processing mode described for the exercise window 280;
Fig. 16 is a flow chart detailing the sampling process described in the preceding figures;
Fig. 17 is a flow chart of the fast Fourier transform process identified in the preceding figures;
Fig. 18 is an operational diagram of a frequencizer for providing sampling and signal processing according to a preferred embodiment of the present invention;
Fig. 19 is an illustration of an exercise window 280 providing an exercise for Twinkle Twinkle Little Star, using animated characters to teach children, according to an alternate preferred embodiment of the present invention; and
Fig. 20 illustrates a format for a freqDurationCmd command provided to the SoundManager.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention provides a method and apparatus that assist a user in educating and calibrating the ear to correctly recognize the pitches of generated notes. According to one aspect of the invention, a system selects an exercise from a curriculum and displays a series of notes for the user to generate. As the notes are generated, the system extracts a digital representation of the frequency of each generated note. The system provides feedback for each note, indicating whether the pitch determined for the generated note was too sharp or too flat relative to the pitch of the exercise note that the user attempted to reproduce.
Fig. 1 is an ear training system 100 embodying a preferred embodiment of the present invention. The ear training system 100 includes a computer system 150 coupled to a digitizer 160 and an output circuit 162. The computer system 150 comprises a central processor 108, a system memory 107, an input device (for example, a keyboard 103), a mouse 102, a fixed disk 104, a display 101, an external interface 105, a printer 109, and an input/output (I/O) controller 106. The computer system 150 optionally includes a floating point unit (FPU) 110 to provide enhanced features to embodiments implemented with computer systems 150 whose central processing units 108 lack an integral FPU, as will be explained further below. A system bus 115 couples all the components of the computer system 150, providing a link between them. A preferred embodiment uses an appropriately programmed Apple Macintosh® Plus or better (Apple Computer, Inc., Cupertino, California). It is to be understood that other platforms are available and may embody the invention in other forms; the invention is not limited to embodiments that include Apple computers. The digitizer 160 receives analog audio input and converts it into digitized audio. In the preferred embodiment, the digitizer 160 provides 8-bit sampling at 22.1 kHz, or better. The output circuit 162 includes a digital-to-analog converter to convert digitized sounds to analog sounds and a speaker to reproduce the analog sounds. The digitizer 160 and the output circuit 162, shown coupled to the external interface 105, are actually integral parts of the Apple Macintosh computer and provide the required audio sampling and playback for one preferred embodiment of the present invention.
Fig. 2 is an illustration showing relationships among selected modules and windows of the ear training system 100. The modules of the ear training system 100 include a launch module 200, a gestalt and self-configuration module 210, a group of session-management modules that include a scenarios module 220, an exercises module 230, and a session log module 240, and sound processing modules including a sampling module 250, a pitch determination module 260, and a sound generation module 270.
There are seven windows associated with the ear training system 100. They include an exercise window 280, a content window 290, a tuning window 300, a harmonic content window 310, a waveform window 320, a result window 330, and an electronic tuner window 340.
A window is selected by choosing it from an options menu integrated into each window. Other menus of the ear training system 100 permit selection of other features and customization of the operation of the ear training system.
Another mechanism used to control operation of the ear training system 100 is a palette 380 providing a plurality of icons associated with various functions. When the ear training system 100 is implemented on a computer system 150 having a graphical user interface (GUI), icons in the palette 380 or choices from any of the menus can be selected with the mouse 102 or another relative pointing device.
The launch module 200 directs initialization of the ear training system 100, followed by execution of the gestalt and self-configuration module 210. The self-configuration module 210 automatically enters default values for operation of the ear training system 100 based upon the model of the computer system 150. This self-configuration optimizes operation to compensate for differences in speed and central processor type between computer system platforms.
After launch and self-configuration, a particular exercise is displayed in the exercise window 280, waiting for the user. This initial exercise is selected by use of the scenarios module 220, the exercises module 230, and the session log 240, if available. The exercises module 230 contains numbered exercises, each consisting of a series of notes. Each exercise is designed with a particular objective in mind; for example, an exercise may test the ability to play a chromatic scale, or test fingering for note generation. The exercises may be quite specific, depending upon the scope of the curriculum the ear training system 100 is designed to cover. The notes of the exercises are stored in chained ASCII format.
The scenarios module 220 includes at least one decision tree. The decision tree identifies a default progression of exercises designed to efficiently advance the user in pitch recognition and note generation. Typically, a user starts at exercise one and follows the pedagogically structured decision path after satisfactory completion of each exercise identified in the user's decision tree. The decision tree selected for a particular user is based in part upon the user's age, instrument type, and measured performance on a set of evaluation exercises selected when the user first begins to use the ear training system 100. At the beginning of each session, the ear training system reviews the user's session log 240 to determine the user's demonstrated proficiency on any exercises completed in the user's decision branch. If proficiency is sufficient, the next pedagogically structured choice in the decision branch selects and loads the next exercise from the exercises module 230. If the demonstrated proficiency indicates that specialized work on specific aspects of the user's technique is necessary, the ear training system 100 branches to adjustment scenarios in the scenarios module 220 based upon evaluation of the information in the session log 240. The user also has the option to select exercises manually, either from the content window 290, to choose a particular aspect of the instrument to improve, or from the result window 330.
Thus, the session-management modules select a particular exercise to load. The series of notes to be played and the mode are displayed in the exercise window 280. The ear training system 100 has two typical modes of operation: the practice mode and the test mode. In each of these modes, the user attempts to play the prompted notes to a particular precision, measured against a predetermined step interval. For each of the series of notes, the user sings or plays a note.
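The session-management logic described above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the patented implementation: the `Scenario` structure, the `next_exercise` function, the per-exercise scores from 0.0 to 1.0 stored in the session log, and the `ADVANCE_THRESHOLD` value of 0.8 are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical proficiency needed to advance (fraction of notes judged correct).
ADVANCE_THRESHOLD = 0.8


@dataclass
class Scenario:
    """A decision branch: a default exercise order plus optional remedial exercises."""
    branch: list[int]
    adjustments: dict[int, int] = field(default_factory=dict)


def next_exercise(scenario: Scenario, session_log: dict[int, float]) -> int:
    """Choose the next exercise number from logged scores in the range 0.0 to 1.0."""
    for exercise in scenario.branch:
        score = session_log.get(exercise)
        if score is None:
            # First exercise in the branch that the user has not yet attempted.
            return exercise
        if score < ADVANCE_THRESHOLD:
            # Insufficient proficiency: use an adjustment exercise if one is
            # defined for this step, otherwise repeat the same exercise.
            return scenario.adjustments.get(exercise, exercise)
    # Every exercise in the branch has been passed; stay on the last one.
    return scenario.branch[-1]
```

In this sketch the first unattempted exercise in the branch is offered next, and a low score diverts the user to a remedial exercise when one is defined for that point in the branch.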
The sampling module 250 takes the analog sound and digitizes it to produce a set of digitized values. Many prior art techniques are available for digitizing sounds, and many of them would be applicable to the sampling module 250. The user has the option to select the waveform window 320 to display the sampled and digitized sounds from the sampling module 250 directly. After sampling, the set of digitized values is processed by the pitch extraction module 260, which determines a frequency for the set of digitized values. In the preferred embodiment, fast Fourier transforms extract the frequency. Optional, more powerful and accurate techniques are also used depending upon the configuration of the computer system 150, specifically whether the FPU 110 is present. In addition to the feedback presented in the exercise window 280, the user may visualize the pitch of generated notes in several ways: the ear training system 100 presents extracted pitch information to the spectrum window 310, the tuning plot window 300, or the tuner window 340, depending upon the particular use of the system 100.
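As a rough illustration of FFT-based pitch extraction, the following Python sketch windows a frame of samples, computes a radix-2 fast Fourier transform, and reports the frequency of the strongest bin. The function names, the Hann window, and the frame length are assumptions for illustration, and a 22,050 Hz sample rate is used as an approximation of the roughly 22 kHz rate mentioned above. Taking the strongest bin is the simplest approach; for rich sources such as a voice or violin it can select a harmonic rather than the fundamental, which is one reason more powerful techniques are useful.

```python
import cmath
import math


def fft(x: list[complex]) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def peak_frequency(samples: list[float], sample_rate: float) -> float:
    """Return the frequency (Hz) of the FFT bin holding the most energy."""
    n = len(samples)
    # A Hann window reduces leakage caused by the frame cutting the wave mid-cycle.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    spectrum = fft([complex(v) for v in windowed])
    # For real input only bins below n/2 are distinct; skip the DC bin 0.
    energies = [abs(c) ** 2 for c in spectrum[1 : n // 2]]
    peak_bin = 1 + max(range(len(energies)), key=energies.__getitem__)
    return peak_bin * sample_rate / n
```

The frequency resolution is `sample_rate / len(samples)`, about 5.4 Hz for 4,096 samples at 22,050 Hz. That is coarse compared with one-sixteenth of a step (roughly 3 Hz near 440 Hz), so judging precision that finely would call for longer frames or interpolation between bins.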
The pitch of the played note is compared with the pitch of the prompted note. If the played note is within the present step interval selected for the user (a predefined margin), no further action is required for that note. Should the note be flat or sharp, an appropriate indication in the exercise window 280 provides feedback to the user. Depending upon whether the ear training system 100 is operating in the practice mode or the test mode, the system stores the results of an exercise in the session log 240. In test mode, the ear training system 100 collects information regarding all the notes of an exercise before adjudicating performance on each note; after the entire session is complete, the system 100 provides feedback and grading to the user. The user has the option of having the system 100 play the exercise, or a reproduction of the user's performance. The playback module 270 reproduces digitized sounds. In the practice mode, results are stored in memory, such as RAM, for the duration of one exercise, allowing the system to play back the performance to help the user evaluate it. The preferred embodiment does not record this practice information in a session log, in order to conserve processing and to allow users to practice. It should be recognized that for some applications, storage of all exercises is desirable.
In the above description, a term referred to as a step interval is used to judge the precision of generated notes. One popular system for transcribing musical notes uses a chromatic scale in which each octave is divided into 12 notes: C, C sharp, D, D sharp, E, F, F sharp, G, G sharp, A, A sharp, and B. The scale repeats for each octave. Advancing from one note to the next is a half-step interval; thus adjacent notes are said to be a half-step apart. In one notational system, notes are numbered consecutively, such that middle C has note number 60 and middle C sharp is note 61; the numbering continues through each of the octaves. In addition to its number, each note has an associated frequency. While the note numbering system is linear, the frequencies of the notes are exponential: frequency doubles with each octave. Thus, the change in frequency from note 60 to note 61 is one-half the change in frequency from note 72 to note 73 (note 72 is C, one octave above middle C).
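The consecutive numbering can be expressed in a few lines of Python. This is a sketch that assumes the convention used later in this description, in which C4 (middle C) is note 60, so the octave number is `number // 12 - 1`; the function name `note_name` is hypothetical.

```python
# The 12 chromatic note names, starting from C, in the order listed above.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(number: int) -> str:
    """Name a note number, assuming middle C (C4) is note 60."""
    octave = number // 12 - 1          # e.g. 60 // 12 - 1 == 4
    return f"{NOTE_NAMES[number % 12]}{octave}"
```

For example, `note_name(61)` gives `"C#4"` and `note_name(72)` gives `"C5"`, the C one octave above middle C.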
It is more convenient for the ear training system to operate with the linear system of note notation, so the system converts frequencies to decimal note numbers. In the linear notation system, each step from note to note is subdivided appropriately. Because a change from note 60 to note 61 is a half-step interval, when the system 100 judges user precision to one-eighth, one-twelfth, or one-sixteenth of a step, the range is divided accordingly. That is, for one-sixteenth step intervals, ranges are established from 60 to 60.125, 60.125 to 60.25, 60.25 to 60.375, 60.375 to 60.5, 60.5 to 60.625, 60.625 to 60.75, 60.75 to 60.875, and 60.875 to 61.0: eight divisions of the half-step between notes 60 and 61 provide one-sixteenth step intervals. A user must perform the prompted note within 0.125 on either side of the prompted note for it to be judged as played properly.
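The acceptance test can be sketched as follows, assuming, as the one-sixteenth example implies, that a "step" means a whole step, i.e. two note numbers, so a precision of one-sixteenth of a step allows 2/16 = 0.125 on either side of the prompted note. The `tolerance` and `judge` names are hypothetical.

```python
def tolerance(step_fraction: float) -> float:
    """Half-width of the acceptance window, in note-number units.

    Assumes one whole step spans two note numbers, so 1/16 step -> 0.125.
    """
    return 2.0 * step_fraction


def judge(played: float, prompted: int, step_fraction: float) -> str:
    """Classify a played decimal note number against the prompted note."""
    deviation = played - prompted          # positive means sharp
    if abs(deviation) <= tolerance(step_fraction):
        return "in tune"
    return "sharp" if deviation > 0 else "flat"
```

With a one-sixteenth step, `judge(60.2, 60, 1 / 16)` reports `"sharp"`, while the same note passes at the looser one-eighth step, whose window is 0.25 on either side.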
The mechanism for converting frequency to a note number is performed with the following equation: $F = 8.18\,e^{0.0578\,N}$, where $F$ is the frequency in hertz and $N$ is a numerical ("decimal") value for a note. For example, for C4, N = 60 and F = 261.625566 Hz. Table I identifies decimal values for eleven intervals of notes.
[Table I: decimal note values for the chromatic scale notes of each octave interval (image not reproduced)]
The TABLE I values identify, for each octave interval, a decimal note value for a corresponding chromatic scale note in the identified interval. The values of Table I correspond to MIDI values for the corresponding note.
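Rather than the rounded constants, a sketch can use the exact equal-temperament form F = 440 · 2^((N − 69)/12), which is the same curve as F = 8.1758 e^(0.05776 N) and reproduces the 261.625566 Hz value quoted for C4; with the rounded constants 8.18 and 0.0578, the formula gives roughly 262.4 Hz for N = 60. The reference A4 = note 69 = 440 Hz matches the MIDI numbering of Table I; the Python function names are illustrative.

```python
import math


def note_to_frequency(n: float) -> float:
    """Equal-tempered frequency in Hz for a decimal note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((n - 69.0) / 12.0)


def frequency_to_note(f: float) -> float:
    """Inverse mapping: decimal note number for a frequency in Hz."""
    return 69.0 + 12.0 * math.log2(f / 440.0)
```

Because the mapping is logarithmic, equal frequency ratios become equal note-number differences, which is what makes the linear note scale convenient. For example, `note_to_frequency(73) - note_to_frequency(72)` is twice `note_to_frequency(61) - note_to_frequency(60)`, matching the octave doubling described above.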
A first step in obtaining the note number is to determine the frequency of a played note. Many mechanisms and algorithms for extracting a frequency from an audio signal are well known in the art. The preferred embodiments of the present invention use the digitizer 160 and the central processor 108, including the FPU 110 (if available), or a frequencizer (described below) to determine a frequency and establish a note number. In the preferred embodiment, fast Fourier transforms (FFTs) performed on a set of sampled, digitized sounds facilitate extraction of a note's frequency.
Fig. 3 is a flow chart of the self-configuration module 210 for the ear training system 100. The flow chart of Fig. 3 illustrates the relationship between the main process flow of the system 100 and its resources, user interface elements, and auxiliary flows and functions. For example, on the Macintosh system, the self-configuration module 210 checks, at step 211, the version number of the operating system to be sure that it is at least version 6.0.7. This check, specific to Apple computers, ensures that SoundManager 2.0 is available on the system. The SoundManager is the part of the Macintosh® toolbox that handles sound sampling and synthesis; it is further described in Apple® Inside Macintosh, Volume VI, hereby expressly incorporated by reference for all purposes. Users are required to upgrade if their system version is earlier than 6.0.7: as illustrated, a dialog box 211A appears directing the user to upgrade the operating system to version 6.0.7 or later. Other checks may be necessary on other systems, such as checking whether a user employs Windows 3.1 with its multimedia extensions.
Fig. 20 illustrates a format for a freqDurationCmd command provided to the SoundManager. The SoundManager responds to commands in this format to generate sounds. Sample code in Appendix A1 illustrates use of the freqDurationCmd command.
At step 212, the self-configuration module 210 checks for the presence of the digitizer 160. The SoundManager assigns an identification number to the digitizer 160 at start-up; this identification number permits the ear training system 100 to access the digitizer 160. If the ear training system 100 does not detect the digitizer 160, it notifies the user of a possible malfunction through a dialog box 212A. Next, at step 213, the self-configuration module 210 checks for the FPU 110 or an FPU integral to the central processor 108. Without an FPU, the ear training system 100 enters PowerMiser ON mode, and a dialog box 213A notifies the user that PowerMiser mode is on. PowerMiser refers to a reduced-feature mode of the ear training system 100 which compensates for the absence of the FPU 110 by reducing numerical processing so that note processing time remains acceptable.
In PowerMiser ON mode, the preferred embodiment, for example, uses smaller sample sizes, disables octave determination (permitting determination of a note of the chromatic scale only), and reduces the range of the ear training system 100. For computer systems 150 whose central processors 108 operate at a relatively slow clock rate (for example, 16 MHz or less) or lack the FPU 110, extensive numerical calculations degrade the ability of the computer system 150 to provide real-time feedback to the user (in synchronous mode). For these computer systems 150, the ear training system 100 determines a note from the frequency of maximum energy in the played note, without determining its octave. If PowerMiser mode is off, such as when the FPU 110 is present, the system 100 establishes both the note and its octave. A full-power mode is possible for computer systems 150 having a slow central processor 108 lacking the FPU 110 through the addition of a custom hardware device referred to as a frequencizer. The frequencizer includes the digitizer 160, a ROM-based frequency-extraction algorithm, and a digital signal processor (DSP) to perform the frequency extraction (e.g., fast Fourier transforms and autocorrelation functions). The frequencizer sends a signal representative of the frequency of a sampled sound set to the computer system 150. To preserve the ability to perform asynchronous processing, the frequencizer can also send sound samples to the computer system 150, just as the digitizer 160 does. Alternate embodiments of systems employing a frequencizer are able to function as music notation systems.
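For the reduced PowerMiser determination, a sketch under stated assumptions: given the frequency of maximum energy (obtained, for example, from an FFT), it names only the chromatic note and discards the octave. The `pitch_class` function and the A4 = 440 Hz reference are assumptions. Ignoring the octave has a side benefit: if the strongest component happens to be the second or fourth harmonic, which lie whole octaves above the fundamental, the note name is unchanged.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(peak_hz: float) -> str:
    """Name the chromatic note nearest to a frequency, ignoring its octave."""
    note_number = 69 + 12 * math.log2(peak_hz / 440.0)   # A4 = 69 = 440 Hz
    return NOTE_NAMES[round(note_number) % 12]
```

Both `pitch_class(440.0)` and `pitch_class(880.0)` return `"A"`, since the octave is deliberately dropped.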
At step 214, a color depth check loads the attributes used for the display 101. By identifying the screen depth, the ear training system 100 selects between resources for a monochrome display (1-bit depth) and for color (4- or 8-bit depth). These resources include various graphic objects, including but not limited to icons, notes, musical symbols, and pictures. Steps 211 through 214 define a gestalt function by which the ear training system 100 assesses the computer system 150 and optimizes itself for operation on it.
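Steps 211 to 214 amount to a small decision procedure, sketched below in Python. The `Hardware` record, the `configure` function, and the returned dictionary are hypothetical; the 6.0.7 version floor, the digitizer and FPU checks, and the monochrome/color split come from the text, and treating a clock of 16 MHz or less as a PowerMiser trigger is an interpretation of the preceding paragraph.

```python
from dataclasses import dataclass

# Minimum system version named in the text (ensures SoundManager 2.0).
MIN_SYSTEM_VERSION = (6, 0, 7)


@dataclass
class Hardware:
    system_version: tuple[int, int, int]
    has_digitizer: bool
    has_fpu: bool
    clock_mhz: float
    screen_depth_bits: int


def configure(hw: Hardware) -> dict:
    """Mirror steps 211-214: check OS version, digitizer, FPU, and screen depth."""
    # Step 211: tuples compare element by element, so (6, 0, 5) < (6, 0, 7).
    if hw.system_version < MIN_SYSTEM_VERSION:
        raise RuntimeError("System upgrade required (dialog 211A)")
    warnings = []
    # Step 212: report a possible digitizer malfunction.
    if not hw.has_digitizer:
        warnings.append("digitizer not found (dialog 212A)")
    # Step 213: reduced processing without an FPU or on a slow processor.
    power_miser = (not hw.has_fpu) or hw.clock_mhz <= 16
    # Step 214: pick graphic resources by colour depth.
    resources = "monochrome" if hw.screen_depth_bits == 1 else "color"
    return {"power_miser": power_miser, "resources": resources, "warnings": warnings}
```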
At step 215, the correct resources for color depth and other configuration parameters are collected. These other configuration parameters include screen pixel resolution, RAM size, Macintosh® type, digitizer 160 type, and the port used for the digitizer 160, if applicable. On first-time operation of the ear training system 100, a dialog box 215A appears requesting user information. An auxiliary flow performs a registration step at 215B which saves a serial number, the location of the system licensor, and various settings chosen by the user in a special window. After step 215 or 215B, the self-configuration module 210 saves user preference and system information to a file. Thereafter, the ear training system 100 initializes itself at step 216. Step 216 presents an about window, which loads information from the user file and any QuickTime animation files, if QuickTime is present. QuickTime is an animation standard developed for use with Apple Macintosh computers.
The about window step 216 presents a dialog box 216A which identifies the product, the manufacturer, and configuration and user information, in addition to other information. During the initialization process, or at any time upon selection of a menu item, a preferences window is displayed to permit the user to make special choices regarding some attributes of the interface, such as the colors of various objects, the musical nomenclature style (e.g., ABC, movable Do, fixed Do, or fingering when applicable), the volume of sound output, enabling or disabling a voice prompt incorporated within the ear training system 100, adjusting a background rejection level, and switching PowerMiser mode on or off. Changing any of these values results in a branch to the auxiliary step 216C to set preferences. The new values are saved in the user preference file, and the system 100 reinitializes at step 216.
Thereafter, at step 217, depending upon user preferences set in the preferences window at step 216, the ear training system 100 either sets conditions automatically or allows the user to self-select an exercise. For automatic operation, the ear training system 100 checks the curriculum (from the scenarios module 220 of Fig. 2) to determine the decision branch applicable to the user based upon information supplied in the about window at step 216. This information includes, for example, age, instrument, and skill level. Next, the automatic mode reviews the session log to evaluate performance to date and properly place the user in the decision tree. Based upon the user's decision tree and past performances, the ear training system selects a suggested exercise for the user. The suggested exercise is loaded from the exercises module 230 (Fig. 2). The system 100 displays the notes and the parameters of the conditions in the exercise window 280.
For self-selection, the ear training system 100 permits the user to identify a particular exercise from the exercises module 230 (Fig. 2). At step 217, the ear training system 100 uses the identified exercise to set conditions for performance and then displays that exercise in the exercise window 280.
Fig. 4 is a flow chart 500 of the operation of the ear training system 100 for a typical session. The flow chart 500 of Fig. 4 illustrates the relationship of the main process flow with resources and user interfaces. Session-management modules and sound processing modules participate in execution of a session. The session-management modules are responsible for selecting a particular exercise from a curriculum and presenting a series of notes to the user to prompt generation of the notes. The session-management modules initially select and load a particular exercise and display a series of notes in the exercise window 280. Associated with the exercise is a performance mode, such as practice or test mode. Selection of the particular exercise results in display of the exercise window 280, which has a plurality of scores with superimposed notes. At start, step 502, the ear training system 100 displays an exercise window 280 with a loaded exercise. At step 504, the system 100 enters a listening mode and prompts the user to get ready by use of a dialog box 504A. At step 506, a performance process synchronously prompts the user to play a note, and the system 100 then provides the user with feedback on the played note relative to the prompted note. The system 100 can display the feedback in many different forms; a typical way is directly in the exercise window. Upon completion of the performance at step 506, the system 100 evaluates the user's performance. Thereafter, at step 508, the system 100 selects a new exercise, which the session-management modules present to the user in an exercise window 280 (Fig. 2). For the new exercise, which may be the same series of notes as before, the system 100 selects an appropriate mode. The practice mode provides the user with relatively large amounts of audio and visual support to satisfactorily perform the exercise.
The same series of notes, with less audio and visual feedback, can present an entirely different exercise to the student. A goal of the ear training system 100 is to train the user to recognize and perform to a specified precision without extraneous support from the system 100.
Thus, the new exercise selected at step 508 can return to step 502 if the exercise is to be a practice mode exercise. If the ear training system 100 is to test the user on the new exercise loaded at step 508, program flow advances to step 510. Step 510 places the sound processing modules into a sample mode. At step 512, the user performs the complete exercise. Thereafter, at step 514, the performance is processed asynchronously to determine its precision. The processing of step 514 includes saving the data in a session log and providing feedback regarding the user's precision. Depending upon the exercise, if the performance was not above a particular level, the system 100 expects the user to perform the exercise again; the user repeats steps 512 and 514 until achieving a particular result level. After processing step 514, the system 100 advances to step 516 to load information from the session log and the display resources. Information from the session log allows the user to monitor progress and proficiency; one way to provide the user with this information is through display of the results window 330 (Fig. 2).
Thereafter, at step 518, a global audit of past performances provides a historical assessment of proficiency for different aspects of the ear training. For example, a statistical analysis of the user file evaluates all exercises by note. The analysis could indicate that notes F and G are consistently sharp while D is often flat, and the other notes are, on average, performed adequately. An audition window 518A displays a graphic representation of the intonation abilities of the user for the range of notes stored in the session log. The view in the audition window 518A represents actual patterns of pitch recognition for each of the notes practiced in the curriculum, allowing the user, or others, to perceive visually the strengths and weaknesses of the user's ear. The audition window 518A provides the user with an option to print the summary information displayed. Other analysis tools are possible, depending upon the implementation of the ear training system 100. The system 100 saves global auditing information in the session log.
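One way to perform the per-note audit is to average each note's signed deviation across the session log, as in this Python sketch. The `(note, deviation)` log format, the `audit` function, and the 0.1 note-number threshold separating "sharp" or "flat" from "adequate" are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def audit(log: list[tuple[str, float]], threshold: float = 0.1) -> dict[str, str]:
    """Summarize intonation per note name from (note, deviation) pairs.

    Deviation is played minus prompted, in note-number units (positive = sharp).
    """
    by_note: dict[str, list[float]] = defaultdict(list)
    for name, deviation in log:
        by_note[name].append(deviation)
    summary = {}
    for name, deviations in by_note.items():
        avg = mean(deviations)                 # average tendency for this note
        if avg > threshold:
            summary[name] = "sharp"
        elif avg < -threshold:
            summary[name] = "flat"
        else:
            summary[name] = "adequate"
    return summary
```

Averaging signed deviations exposes a consistent bias, such as the F and G sharpness in the example above; a spread measure such as the standard deviation could be added to flag notes that are merely inconsistent.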
Fig. 5 is a detailed view of the exercise window 280 and the palette 380. The exercise window displays a series of notes 281 for the exercise in traditional musical notation on musical scores 282. The exercise window 280 includes a menu bar 283 along the top listing the menu choices available to the user. On the right of the exercise window is a command bar 284 showing the particular options set for the mode: for example, the view mode, which controls the note and staff size (reduced in the example); the tempo (80 in the example); the routine type (loop on miss in the example); the level of visual or audio support provided; the sensitivity; and the nomenclature chosen. Audio support includes playing an exercise or note before or after the user generates the note. A status bar 285 along the bottom of the exercise window indicates the instrument type, system mode, status, and prompted note.
The palette 380 identifies various selectable features by associating each feature with an icon. The highlighted ear icon in the palette selects display of the exercise window, which prompts notes and waits for the user to play them. The speaker icon plays an exercise, while the microphone icon samples and stores notes the user generates. The hammer icon initiates an evaluation of any notes sampled and stored through the microphone icon. The tuning fork icon selects the tuner window 340, the spectrum icon selects the spectrum window 310, and the waveform icon selects the waveform window 320.
In the practice mode, the system 100 prompts the user to play a particular note by highlighting the note, for example by displaying the note in inverse video or surrounding it with a blinking rectangle 286. After the user generates the prompted note, the ear training system 100 determines a frequency for the played note and provides the user with feedback on the note before prompting a new note. In the preferred embodiment, the feedback consists of a chevron 287 of one or two stripes superimposed either above or below each note. For example, for a note played slightly sharp, a single downward-pointing chevron is displayed above the note symbol. Similarly, for a note played slightly flat, the system 100 places a single upward-pointing chevron under the note. For notes that are much further off, such as playing above C sharp for a prompted C, an appropriately pointing double chevron next to the note tells the user that the generated note differed significantly from the expected note. The exercise window 280 also includes a notation symbol for a bad data sample: the cross symbol 288 signifies that a problem with the sample prevents evaluation of the note. Possible causes of a bad data sample include the user generating two notes at once when multi-frequency tones are not allowed (for example, for systems 100 lacking the frequencizer), or some other sound, such as a telephone signal, superimposed over the user-generated note. Other visual and auditory feedback mechanisms are possible to indicate performance precision to the user.
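The feedback rules above can be sketched as a mapping from a note's signed deviation to a symbol. This is a hypothetical Python function: the deviation is in note-number units (played minus prompted), `None` stands for a bad sample, and the one-half-step threshold for a double chevron is inferred from the C versus C sharp example rather than stated as a rule.

```python
from typing import Optional


def chevron(deviation: Optional[float], tol: float) -> str:
    """Map a note's deviation (note-number units) to an exercise-window symbol."""
    if deviation is None:
        return "cross"                      # bad sample: note cannot be evaluated
    if abs(deviation) <= tol:
        return "none"                       # within the current step interval
    # Assumed: more than one half-step off (beyond C sharp for a C) is "much further off".
    count = "double" if abs(deviation) > 1.0 else "single"
    if deviation > 0:
        return f"{count} down-chevron above note"   # sharp
    return f"{count} up-chevron below note"         # flat
```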
In evaluating a user's performance on particular notes, the ear training system 100 uses information from the user regarding instrument type and performance ability to select an initial precision step. The preferred embodiment selects from one-eighth, one-twelfth, and one-sixteenth step intervals, although other steps could be used. This interval step indicates how precisely the user must reproduce a note for the note to be adjudicated as properly played. The ear training system dynamically monitors performances in test mode against this interval step. After the user performs satisfactorily, such that a predetermined percentage (for example, eighty percent in the preferred embodiment) of the reproduced notes are correct, the ear training system automatically selects a more precise step interval, against which the user's performance is thereafter gauged. As the user continues to improve and performance is judged correct for eighty percent of the notes, the ear training system uses progressively more precise step intervals. Should the user ever demonstrate a proficiency for which less than a second predetermined percentage, for example forty percent, of notes are reproduced within the precision of the present step interval, the ear training system 100 thereafter uses a less precise step interval. The ear training system is thus able to adjust itself to a user's ability level, providing a valuable tool for users of all ability levels.
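The adaptive precision rule can be sketched as follows. The list of step fractions and the eighty and forty percent thresholds come from the text; the `adjust_step` function and the index-based representation are hypothetical.

```python
STEPS = [1 / 8, 1 / 12, 1 / 16]   # coarse to fine, as listed in the text
PROMOTE_AT = 0.80                 # example threshold from the text
DEMOTE_BELOW = 0.40               # example threshold from the text


def adjust_step(index: int, fraction_correct: float) -> int:
    """Return the index into STEPS to use after a test-mode performance."""
    if fraction_correct >= PROMOTE_AT and index < len(STEPS) - 1:
        return index + 1          # tighten: require more precise intonation
    if fraction_correct < DEMOTE_BELOW and index > 0:
        return index - 1          # relax: fall back to a coarser step
    return index                  # otherwise keep the current step
```

Scores between the two thresholds leave the step unchanged, which keeps the precision from oscillating on small swings in performance.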
User performances executed in test mode, which is the asynchronous mode, are recorded in a session log along with the sampled sounds, so that the session can be replayed for the user. The test mode provides feedback for each note after the user has played all notes of the session. It is this mode that grades a user's performance and determines the user's proficiency level. Various statistical analyses and tests of the session log influence the selection of the next exercise from the curriculum.
Fig. 6 is a detailed illustration of the tuner window 340. When the user optionally selects this window, the user is able to determine synchronously the proper pitch of a played note. Acting like an electronic tuner, the window updates the present frequency based on the sampled sound. The window includes an arrow 341 which moves relative to a scale 342. The arrow 341 indicates, as a percentage of a step, how far the note of the sampled sound is from the properly tuned note. The three notes displayed mark the one-step range represented on the window (-50% to 50% corresponds to one step), which frames the sampled note. This window provides quantitative feedback rather than indicating only whether a note was flat or sharp.
Fig. 7 is a detailed illustration of the spectrum window 310. By selecting the spectrum window 310, a user is able to evaluate the energy versus frequency of a sampled sound. It is sometimes useful to determine the distribution of energy among the different frequencies composing a note. This permits quantitative analysis of the content of a note or tone in terms of its fundamental and harmonic frequencies. This analysis is useful for assessing the quality of a particular instrument by careful review of the notes it generates.
Fig. 8 is a detailed illustration of the waveform window 320. The waveform window 320 is another diagnostic tool providing specific information regarding a played note. The plot in the waveform window 320 is amplitude versus time, the traditional representation of a wave. By observing the regularity and homogeneity of the waveform, a user is able to judge the purity of a generated note or the pertinence of a sample.
Fig. 9 is a detailed illustration of the results window 330. The results window 330 presents summary information regarding the completed exercises in histogram format, graphically indicating, for each completed exercise, the last score and the best score. Statistical and other evaluations of trends and proficiency measurements are accessed from the results window. These include the mean, variance, standard deviation, tempo problems, range addressed, and patterns of mistakes such as specific notes or interval fingerings.
Fig. 10 is a detailed illustration of the tuning plot window 300. This window permits a user to tune an instrument accurately. A desired frequency is used as a reference frequency, and the tuning plot window 300 displays the closest note. When the user generates a note, the note's frequency relative to the reference is plotted against time. Additionally, pitch error indicators dynamically indicate the played note's relative pitch. In the example shown, the note began flat and then became sharp; as time progressed, it became decreasingly sharp, approaching the reference frequency. Displaying the tuning graphically in the tuning plot window 300 facilitates accurate reproduction of notes, and the sharp and flat indicators show the pitch relative to the reference note.
Fig. 11 is a flow chart of the practice mode, which uses synchronous processing. As shown, the system prompts a note at step 510. A sampling process 522 converts analog audio signals to digitized signals using eight-bit sampling at a rate of 22.1 kHz or better. The sampling step 522 reports a time-out error if it does not detect audio input within a prespecified period. At step 524, the ear training system 100 performs a fast Fourier transform of the sampled and digitized data to extract its frequency. After the frequency is extracted, a storage step 526 records the information to memory for later playback. A delay step 528 then precedes display of the result at step 530. The delay step adjusts the time interval between consecutive samples to the assigned tempo independently of the speed at which the sample is processed, which varies significantly across hardware configurations of the ear training system 100, such as the clock speed or type of the central processor, or the presence of an FPU 110 or a frequencizer. The display step 530 provides feedback regarding the played note relative to the prompted note. In the preferred embodiment, the feedback appears directly in the exercise window 280. The highlighted icon of the palette 380 commands the system into the particular mode.
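The purpose of the delay step 528 is to make the spacing between prompts depend on the tempo rather than on processor speed. Assuming one prompted note per beat and a tempo given in beats per minute (neither is specified above), the delay is the beat interval minus the time already spent sampling and processing, clamped at zero. This is a sketch with illustrative names.

```c
/* Beat length in milliseconds for a tempo in beats per minute,
   assuming one prompted note per beat. */
long beat_interval_ms(double bpm)
{
    if (bpm <= 0.0)
        return 0;
    return (long) (60000.0 / bpm + 0.5);   /* rounded to the nearest ms */
}

/* Delay step 528: wait only for the part of the beat not already spent
   sampling and extracting the frequency, so prompts stay on tempo on both
   slow and fast hardware.  elapsed_ms is measured from the prompt. */
long remaining_delay_ms(long beat_ms, long elapsed_ms)
{
    long rest = beat_ms - elapsed_ms;
    return rest > 0 ? rest : 0;   /* processing overran the beat: no wait */
}
```

At 120 beats per minute the beat is 500 ms, so a machine that needs 120 ms to sample and transform a note waits 380 ms, while one that needs 650 ms does not wait at all.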
Fig. 12 is a flow chart illustrating the tuning mode. The tuning mode is available only on computer systems 150 having the FPU 110. In this mode, the system 100 continuously processes the input audio to extract its frequency and plots the result against time. Displaying the information as an electronic tuner is one preferred embodiment. The plotting window permits visualization of sustained accuracy and vibrato characteristics. The tuner window gives a quantitative indication of the flatness or sharpness of a note. To better simulate a VU meter, the system does not plot frequency versus time in the tuner window, but instead completely redraws the arrow.
The tuning mode begins by displaying a particular window at step 550. The system 100 samples input audio at step 552. A fast Fourier transform, step 554, extracts a frequency for the sampled sound, and the system 100 records any pitch errors detected after the frequency is extracted. At step 556, the system 100 plots frequency, either versus time or per acquisition. The flow then returns to step 552 for the next sample. The process continues until the user terminates this mode.
Fig. 13 is a flow chart of the process for the waveform window 320 of Fig. 8. Selecting display of the waveform window 320 executes steps 560 through 564, which review a set of sampled notes. At step 560, the system 100 loads a note. At step 562, the system 100 opens the waveform window 320. Thereafter, at step 564, the system 100 plots the sample in the waveform window 320. The waveform of the processed sample remains displayed until the user deselects the waveform view mode.
Fig. 14 is a flow chart of the spectrum display process for the spectrum window 310 of Fig. 7. As in the flow chart of Fig. 13, the process loads a note from a set of sampled notes at step 570. After performing a fast Fourier transform on the loaded note, step 572, the process opens the spectrum window 310 at step 574. The spectrum display process then plots the power content of the sample versus frequency, step 576. Again, the spectrum of the processed sample remains displayed until the user deselects the mode.
Fig. 15 is a flow chart showing operation of the test mode in conjunction with the processing mode described for the exercise window 280. The preferred embodiments contemplate providing a test mode after each exercise. The test mode processes the user's notes asynchronously. It provides faster interactivity, allowing work at faster tempos regardless of the computer system 150, because during the exercise the system 100 only samples and stores data, and waits until the exercise is complete to process the stored samples. The processing time depends on the performance capabilities of the computer system 150, but the processing does not affect the execution of the exercise.
In the test mode, the system 100 prompts a note, step 580, samples the sound, step 582, and stores the samples, step 584. Steps 580 through 584 repeat for each note of the exercise. Thereafter, in steps 586 through 590, the stored sampled notes are processed to extract their frequencies and compare them to the expected values: step 586 loads a note, step 588 extracts its frequency, and step 590 displays the extracted frequency relative to the value for the corresponding prompted note. The processing continues for all the stored notes.
Fig. 16 is a flow chart detailing the sampling process described in the preceding figures. The sampling process begins at start, step 600. At step 602, the system 100 initializes the sound manager. A timeout check at step 604 prevents the system 100 from waiting indefinitely for user input. If timeout is true, the sample process asserts a timeout error, step 606, and ends.
If there is no timeout at step 604, the process advances to step 608 to get a set of samples. Step 610 computes a value based on the deviation of the last one-eighth of the samples. Step 612 checks that this value is greater than a minimum value. If it is not, the sampling process returns to step 604 to test for timeout. If the value at step 612 exceeds the threshold, the process advances to step 614 to delay, then to step 616 to obtain more samples. At step 618, an amplitude value is derived from the deviation of the first one-eighth of the samples. The amplitude value must exceed a minimum at step 620 for the process to proceed. If the amplitude is not sufficiently large, the process returns to step 604 to test for a timeout error.
Passing the amplitude test at step 620 results in the calculation of a new amplitude value derived from the deviation of the last one-eighth of the most recent sample set, step 622. Again, step 624 tests the amplitude, and if it is too low, the process branches to step 604 to test for timeout. Passing the amplitude test at step 624 results in the calculation of a new amplitude value based on the deviation of all samples, step 626. If the amplitude is sufficiently large, step 628, the process returns the samples, step 630, and ends, step 632. If the amplitude at step 628 is not above the threshold, the process branches to step 604 to test for a timeout error. Thus the sampling process performs amplitude qualification on samples from the digitizer 160.
Fig. 17 is a flow chart of the fast Fourier transform process identified in the figures above. The FFT process begins at step 700, start. Step 702 computes the mean of the samples provided, for example, by the sampling process. Step 704 subtracts the mean from all the samples. Step 706 computes a Fourier transform of all samples. Step 708 obtains the discrete frequency of maximum energy. Step 710 interpolates a real frequency from all neighboring energies. Step 712 tests for the presence of the FPU 110 in the computer system 150. If the FPU 110 is present, the process continues to step 714 to compute an autocorrelation of the frequency spectrum of the sample. Step 716 obtains the frequency corresponding to the first maximum of the autocorrelation energy. Step 718 interpolates the real frequency from neighboring energies. Next, step 720 chooses the best estimate of the true frequency. Sample code in Appendix A2 illustrates a representative process for extracting the frequency of a sampled note.
If the test at step 712 indicates an absence of the FPU 110, the process skips steps 714-718 by branching to step 720. Step 722 converts the frequency into a decimal note by techniques well known in the art. Step 724 returns the decimal note and the frequency spectrum of the sample. The process is complete at step 726, end.
Fig. 18 is an operational diagram of a frequencizer for providing sampling and signal processing according to a preferred embodiment of the present invention. The frequencizer has two modes of operation, the frequencizer mode and the digitizer mode. In the digitizer mode, the frequencizer performs like the digitizer 160 of the computer system 150: it samples analog sound at eight bits and 22.1 kHz or better. Digitizer operation is shown on the right-hand side of Fig. 18. The digitizer performs ASCII conversion on the sampled values and chains the ASCII values to produce an output file. The output of the frequencizer is coupled to the computer's serial port. This digitizer mode permits operation of the ear training system in asynchronous mode. Chained ASCII is a common convention for communicating values: for example, to send 60.25, five ASCII characters are sent, representing "6", "0", ".", "2", and "5".
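The chained ASCII convention can be illustrated with a short encoder and decoder. The sketch assumes two decimals per value and a newline between values; the description does not specify the separator or precision used on the serial link, and the function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Encode one value as chained ASCII characters with two decimals,
   e.g. 60.25 -> '6' '0' '.' '2' '5'.  Returns the number of characters
   written (without the terminating NUL), or -1 if out is too small. */
int encode_value(double v, char *out, size_t cap)
{
    int len = snprintf(out, cap, "%.2f", v);
    return (len < 0 || (size_t) len >= cap) ? -1 : len;
}

/* Decode a stream of values separated by newlines (an assumed separator).
   Returns the number of values stored in vals. */
size_t decode_stream(const char *stream, double *vals, size_t max_vals)
{
    size_t count = 0;
    const char *p = stream;
    while (*p != '\0' && count < max_vals) {
        char *end;
        double v = strtod(p, &end);
        if (end == p)
            break;                      /* no number here: stop */
        vals[count++] = v;
        p = end;
        while (*p == '\n' || *p == '\r')
            p++;                        /* skip the separator */
    }
    return count;
}
```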
A more powerful mode is the frequencizer mode, which includes signal processing circuitry to perform autocorrelation and fast Fourier transforms to obtain an energy spectrum. The frequencizer converts the energy spectrum into a decimal note, followed by an ASCII conversion. The output of the frequencizer, coupled to the serial port of a computer system 150, consists of the frequencies of sampled analog sounds. An ear training system 100 employing the frequencizer can quickly process synchronous exercises at relatively fast tempos and provide real-time feedback. By virtue of its increased processing speed, the frequencizer also allows an embodiment of the ear training system 100 to process multiple frequencies, such as two notes played at once. Further, the frequencizer allows coupling with a notation capability, so that the user can play notes, have the pitch extracted, and have the played notes graphically represented on the display.
In conclusion, the present invention provides an efficient mechanism to enhance musicality and to educate and calibrate a user's ear to recognize and produce precise pitches for a series of notes. The use of a structured pedagogic scenario that selects particular exercises to efficiently advance a user is an important aspect of a preferred embodiment of the present invention. The structured pedagogic exercises are modified by analyses performed on recorded data of past performances to correct aspects of performances below a desired proficiency. The ear training system customizes itself to existing software to enable peak performance on multiple classes of computer systems. A digitizer converts analog audio into digital samples.
Processing of the digitized samples extracts a frequency of the samples and permits comparison to expected values. A frequencizer can improve performance for all classes of computer systems. Providing a practice mode and a test mode (synchronous and asynchronous) with varying levels of feedback enhances an ability of the ear training system to train a user in precise recognition and reproduction of pitch.
While the above describes particular preferred embodiments of the present invention, various alternatives, modifications or alterations are possible. For example, different applications of the invention can adapt the curriculum and notation to different user needs. For young children, for example, the curriculum would consist of animations related to a song or a specific exercise. Children would be prompted to sing notes by an animated character indicating the note to be sung. Rather than using symbols such as the chevrons, which could be cryptic to a child, the animated character provides the feedback to the child user. For example, in an exercise designed to target a particular pitch, a dog running at the proper tempo would jump through a hoop for correctly sung notes. For sharp notes, the dog would overshoot the hoop, while flat notes would cause the dog to undershoot it.
The graphic character is customizable to particular songs. For example, to teach children to sing Twinkle Twinkle Little Star, an animated star walking above a staff would prompt a user to sing particular notes. As in the dog example, the animated star could leave a smiling star for properly produced notes and a frowning star for sharp or flat notes. Fig. 19 is an illustration of an exercise window 280 providing an exercise for Twinkle Twinkle Little Star using animated characters. As shown, the animated character 800 advances above the staff 282 from note to note. Correct notes produce a smiling star 802, sharp notes produce frowning, downward-looking stars 804, and flat notes produce frowning, upward-looking stars 806. Bad samples produce frowning stars with crossed-out eyes 808.
Therefore, the above description should not be taken as limiting the scope of the present invention. The scope of the invention is set forth in the appended claims.
APPENDIX A1
VAR
  mySndChan: SndChannelPtr;   {pointer to a SndChannel}
  mySndCmd: SndCommand;       {a sound command}
  myErr: OSErr;
  amplitude: Integer;
  frequency: Integer;
BEGIN
  amplitude := $FF;           {loudest possible amplitude}
  frequency := 60;            {MIDI note value; 60 = middle C}
  mySndCmd.cmd := freqDurationCmd;
  mySndCmd.param1 := 2000;    {duration in half-milliseconds}
  mySndCmd.param2 := BitShift(amplitude, 24) + frequency;
  {play the sound}
  {assume that mySndChan points to a valid SndChannel}
  myErr := SndDoCommand(mySndChan, mySndCmd, FALSE);
  IF myErr <> noErr THEN DoError(myErr);
END;
APPENDIX A2 source code of function extract (synchronous processing)
/*
XFCN name: Extract - extract the frequency of sound
Form: Extract(<SID number>,<note>,<tempo1>,<tempo2>)
Example: Extract(1,60,100,500)
Note: if note is 0, the resampling step will be 2; otherwise it will be adjusted for best precision in frequency extraction around the given HyperCard note.
tempo1 and tempo2 are delays in milliseconds: tempo1 between detecting sound and the analysed recording, tempo2 after sampling ends.
MPW compilation syntax
C -b extract.c
Link -o testXcmd -w -rt XFCN=201 -sg Extract -m ENTRYPOINT extract.c.o ∂
  {Libraries}Runtime.o {CLibraries}StdCLib.o {Libraries}Interface.o ∂
  {Libraries}HyperXLib.o {CLibraries}CSANELib.o
*/
#include <HyperXCmd.h>
#include <Types.h>
#include <String.h>
#include <Strings.h>
#include <Memory.h>
#include <ToolUtils.h>
#include <OSUtils.h>
#include <time.h>
#include <FixMath.h>
#include <Math.h>
#include <SANE.h>
#include <SoundInput.h>
#include <GestaltEqu.h>
/* Function prototypes */
void Fail(XCmdPtr, short);
short GetParamCount(XCmdPtr, short, short);
Handle CopyStrToHand(char *);
long HandleToNum(XCmdPtr, Handle);
extern void sndinfo(int n, float *v, float *mean, float *amp);
extern float fp(short n, float *v, float *f, float *a, float tp, short which);
extern void four(short nn, float *x, short sens);
extern void realft(short n, float *z, short sens);
extern void autoflt(short n, float *z);
/* Definition of constants */
#define minParamCount (short) 4
#define maxParamCount (short) 4
#define ParamError (short) 1
#define MemoryError (short) 2
#define SIDError (short) 3
#define SampleError (short) 4
#define GestaltError (short) 6
#define TimeError (short) 7
#define DELAY 1024
#define NBMAX 512
#define kAsynch true
pascal void EntryPoint(XCmdPtr paramPtr)
{
  /*
  Local declarations
  */
  OSType selector;
  OSErr err;
  long response, refnum, soundsize, duree, timeout, tempo, tempo1, tempo2;
  short l, count, i, j, k, step, nbs, note;
  float mean, amp, ff, tpas, num, theta;
  Fixed rate;
  extended xrat;
  Str255 myInputDevice, fname;
  Handle myDeviceIcon, msghndl;
  Boolean cfflag;
  SPB mySound;
  Ptr samples;
  char result[32];
  /* arrays are used 1-based (indices 1..NBMAX), hence NBMAX+1 elements */
  float freq[NBMAX+1], buf[NBMAX+1], fen[NBMAX+1], ac[NBMAX+1];
  clock_t debut, stemp;
  /*
  Call parameters
  */
  if (GetParamCount(paramPtr, minParamCount, maxParamCount) == (short) -1) {
    Fail(paramPtr, ParamError);
    return;
  }
  strcpy((char *) fname, "\pStatus");
  cfflag = false;
  count = (short) HandleToNum(paramPtr, paramPtr->params[0]);
  note = (short) HandleToNum(paramPtr, paramPtr->params[1]);
  /* expected frequency of the note: 8.1758 Hz * 2^(note/12) */
  if (note) theta = 8.1758 * exp(0.05776 * note);
  tempo1 = HandleToNum(paramPtr, paramPtr->params[2]);
  tempo2 = HandleToNum(paramPtr, paramPtr->params[3]);
  /*
  Sound attributes
  */
  selector = 'snd ';
  err = Gestalt(selector, &response);
  if (err) {
    Fail(paramPtr, GestaltError);
    return;
  } else {
    if (!(response & 0x0008)) {        /* bit 3 of the 'snd ' attributes must be set */
      Fail(paramPtr, SIDError);
    } else {
      err = SPBGetIndexedDevice(count, myInputDevice, &myDeviceIcon);
      if (err) {
        Fail(paramPtr, SIDError);
      } else {
        err = SPBOpenDevice(myInputDevice, siWritePermission, &refnum);
        if (err) {
          Fail(paramPtr, SIDError);
        } else {
          err = SPBGetDeviceInfo(refnum, 'srat', (Ptr) &rate);
          xrat = Fix2X(rate);
          /* keep about 16 samples per period of the expected note */
          step = (note ? (short) (xrat / (16 * theta)) : 2);
          if (step < 1) step = 1;
          tpas = step / xrat;            /* time between retained samples */
          nbs = NBMAX;
          soundsize = DELAY + step * nbs;
          samples = NewPtr(soundsize);
          if (samples == NULL) {
            Fail(paramPtr, MemoryError);
            SPBCloseDevice(refnum);
            return;
          }
          mySound.inRefNum = refnum;
          mySound.count = soundsize;
          mySound.milliseconds = 0;
          mySound.bufferLength = soundsize;
          mySound.bufferPtr = samples;
          mySound.completionRoutine = NULL;
          mySound.interruptRoutine = NULL;
          mySound.userLong = 0;
          mySound.error = noErr;
          mySound.unused1 = 0;
          debut = clock();
          timeout = 30;                  /* seconds */
          msghndl = (Handle) CopyStrToHand("Sampling");
          SetFieldByName(paramPtr, cfflag, fname, msghndl);
          j = (nbs > 64 ? nbs / 8 : 8);  /* one eighth of the samples */
          for (;;) {
            duree = (clock() - debut) / CLOCKS_PER_SEC;
            if (duree >= timeout) {
              Fail(paramPtr, TimeError);
              break;
            }
            err = SPBRecord(&mySound, !kAsynch);
            if (!err && mySound.count == soundsize) {
              /* last eighth of the first recording */
              for (i = 0; i < j; i++)
                buf[j-i] = (float) ((unsigned char) *(samples + soundsize - 1 - step * i)) - 127.;
              sndinfo(j, buf, &mean, &amp);
              if (amp < 100.) continue;
              /* wait tempo1 milliseconds, then record again */
              stemp = clock();
              do {
                tempo = (1000 * (clock() - stemp)) / CLOCKS_PER_SEC;
              } while (tempo < tempo1);
              err = SPBRecord(&mySound, !kAsynch);
              if (err) continue;
              /* first eighth */
              for (i = 0; i < j; i++)
                buf[i+1] = (float) ((unsigned char) *(samples + DELAY + step * i)) - 127.;
              sndinfo(j, buf, &mean, &amp);
              if (amp < 100.) continue;
              /* last eighth */
              for (i = 0; i < j; i++)
                buf[j-i] = (float) ((unsigned char) *(samples + soundsize - 1 - step * i)) - 127.;
              sndinfo(j, buf, &mean, &amp);
              if (amp < 100.) continue;
              /* all retained samples */
              for (i = 0; i < nbs; i++)
                buf[i+1] = (float) ((unsigned char) *(samples + DELAY + step * i)) - 127.;
              sndinfo(nbs, buf, &mean, &amp);
              if (amp < 100.) continue;
              msghndl = (Handle) CopyStrToHand("Processing");
              SetFieldByName(paramPtr, cfflag, fname, msghndl);
              for (i = 1; i <= nbs; i++) freq[i] = buf[i] - mean;
              ff = fp(nbs, freq, fen, ac, tpas, 1);
              /* decimal note = 12 * log2(ff / 8.1758) */
              num = 17.312342 * log(0.1223122 * ff);
              i = (short) num;
              k = (short) (100 * (num - i));
              /* format as "<note>.<hundredths>", zero-padding the hundredths */
              numtostring(i, result);
              l = strlen(result);
              strcpy(result + l, (k < 10 ? ".0" : "."));
              numtostring(k, result + strlen(result));
              l = strlen(result);
              strcpy(result + l, " \n");
              paramPtr->returnValue = (Handle) CopyStrToHand(result);
            } else {
              Fail(paramPtr, SampleError);
            }
            break;
          }
          DisposPtr(samples);
          err = SPBCloseDevice(refnum);
          stemp = clock();
          do {
            tempo = (1000 * (clock() - stemp)) / CLOCKS_PER_SEC;
          } while (tempo < tempo2);
        }
      }
    }
  }
  return;
}
/* Functions */
void Fail(XCmdPtr paramPtr, short errorNumber)
{
  switch (errorNumber) {
  case ParamError:
    paramPtr->returnValue = (Handle) CopyStrToHand("Error - Usage: Extract(1,60,100,500)");
    break;
  case MemoryError:
    paramPtr->returnValue = (Handle) CopyStrToHand("Error - Memory error");
    break;
  case SIDError:
    paramPtr->returnValue = (Handle) CopyStrToHand("Error - SID error");
    break;
  case SampleError:
    paramPtr->returnValue = (Handle) CopyStrToHand("Error - Sample error");
    break;
  case GestaltError:
    paramPtr->returnValue = (Handle) CopyStrToHand("Error - Gestalt error");
    break;
  case TimeError:
    paramPtr->returnValue = (Handle) CopyStrToHand("Error - Timeout error");
    break;
  default:
    paramPtr->returnValue = (Handle) CopyStrToHand("Error - Unknown error");
    break;
  }
  return;
}
short GetParamCount(XCmdPtr paramPtr, short min, short max)
{
  short count;
  count = paramPtr->paramCount;
  if (count > max || count < min) return ((short) -1);
  return (count);
}
Handle CopyStrToHand(char *str)
{
  Handle newHndl;
  newHndl = (Handle) NewHandle((long) strlen(str) + 1);
  if (newHndl == NULL) return NULL;
  strcpy((char *) (*newHndl), str);
  return (newHndl);
}
long HandleToNum(XCmdPtr paramPtr, Handle hndl)
{
  char str[32];
  long num;
  /* copy at most 31 characters so the Pascal string fits a Str31 */
  strncpy(str, *hndl, sizeof(str) - 1);
  str[sizeof(str) - 1] = '\0';
  num = StrToLong(paramPtr, (StringPtr) c2pstr(str));
  return (num);
}

Claims

WHAT IS CLAIMED IS:
1. An ear training method, comprising the steps of:
prompting a user to generate a particular note;
sampling digitally sounds generated by said user to collect a set of digitized sampled sounds;
processing synchronously said set of digitized sampled sounds to determine a frequency and a sampled note corresponding to said set of digitized sampled sounds;
comparing said sampled note to said particular note; and
providing said user with a visual indication when said frequency of said sampled note differs from a frequency of said particular note by a predefined margin.
2. The method of claim 1 wherein said processing step includes the steps of:
computing a Fourier transform for said set of digitized sampled sounds;
extracting a discrete frequency of maximum energy; and
interpolating a real frequency for said set of digitized sampled sounds from said discrete frequency and a plurality of energies of frequencies adjacent to said discrete frequency.
3. The method of claim 1 wherein said prompting step further comprises the steps of:
analyzing past performances to select a particular one exercise from a plurality of predefined exercises, with each said exercise including a plurality of notes; and
periodically selecting for prompting each of said plurality of notes.
4. A method for improving an ability of a user to determine a pitch of a note, comprising the steps of:
entering a set of parameters for the user;
storing a knowledge base including a plurality of structured pedagogic exercises which develop particular abilities, each exercise of said plurality of structured pedagogic exercises including a plurality of notes;
testing said set of parameters and an analysis of past performances against said knowledge base to select a particular one of said plurality of structured pedagogic exercises;
prompting generation of each note of said plurality of notes of said particular one of said plurality of structured pedagogic exercises;
sampling digitally generated sounds corresponding to each prompted note of said plurality of notes to collect a plurality of sets of digitized sampled sounds;
processing each said set of digitized sampled sounds to determine a frequency and a sampled note corresponding to each said set of digitized sampled sounds;
comparing each said sampled note to each said particular note; and
providing said user with an indication when any of said frequencies of said sampled notes differs by a predefined margin from a frequency of said corresponding particular note.
5. The pitch determining ability improving method of claim 4 wherein said digitized sampled sounds processing step is synchronous.
6. The pitch determining ability improving method of claim 4 wherein said digitized sampled sounds processing step is asynchronous.
7. The pitch determining ability improving method of claim 4 further comprising the step of modifying said predefined margin responsive to an analysis of instances of generated notes within said predefined margin compared to instances of generated notes exceeding said predefined margin.
8. The pitch determining ability improving method of claim 7 wherein said predefined margin is decreased when a percentage of properly generated notes exceeds a predetermined value.
9. The pitch determining ability improving method of claim 7 wherein said predefined margin is increased when a percentage of improperly generated notes exceeds a predetermined value.
10. An ear training system, comprising:
a frequencizer for producing a frequency of a sampled sound;
a computer having a memory, a display and a processor, said computer selectively coupled to said frequencizer;
input means, coupled to said computer, for entering a user-specific parameter;
a knowledge base, coupled to said memory, for storing a plurality of predefined pedagogic structured exercises, with each exercise including a plurality of notes, and a past performance history of exercises performed by a particular user; and
programming means, coupled to said memory, for instructing said processor to:
test said user-specific parameter against said knowledge base to select a particular one of said predefined pedagogic structured exercises;
display selected notes of said particular one exercise;
prompt said particular user to generate a sound corresponding to a particular one of said displayed notes;
access said frequencizer and receive a sampled frequency corresponding to said prompted particular one note;
compare said sampled frequency with a reference frequency of said prompted particular one note; and
indicate an improperly generated note to said particular user when a difference between said sampled frequency and said reference frequency exceeds a predefined margin of error.
11. The ear training apparatus of claim 10 wherein said programming means further instructs said processor to store in said memory a session log of said sampled frequency and instances of improperly generated notes.
12. An ear training system, comprising:
a digitizer for producing a set of digitized sampled sounds from a user-generated note;
a computer having a memory, a display and a processor, said computer selectively coupled to said digitizer;
input means, coupled to said computer, for entering a user-specific parameter;
a knowledge base, coupled to said memory, for storing a plurality of predefined pedagogic structured exercises, with each exercise including a plurality of notes, and a past performance history of exercises performed by a particular user; and
programming means, coupled to said memory, for instructing said processor to:
test said user-specific parameter against said knowledge base to select a particular one of said predefined pedagogic structured exercises;
display selected notes of said particular one exercise;
prompt said particular user to generate a sound corresponding to a particular one of said displayed notes;
access said digitizer and receive said set of sampled sounds generated by said user responsive to said prompted particular one note;
extract a frequency for said accessed set of sampled sounds to produce a sampled frequency;
compare said sampled frequency with a reference frequency of said prompted particular one note; and
indicate an improperly generated note to said particular user when a difference between said sampled frequency and said reference frequency exceeds a predefined margin of error.
13. A method for improving an ability of a user to determine a pitch of a note, comprising the steps of:
entering a set of parameters for the user;
storing a knowledge base including a plurality of structured pedagogic exercises which develop particular abilities, each exercise of said plurality of structured pedagogic exercises including a plurality of notes;
testing said set of parameters and an analysis of past performances against said knowledge base to select a particular one of said plurality of structured pedagogic exercises;
prompting generation of each note of said plurality of notes of said particular one of said plurality of structured pedagogic exercises by use of an animated character;
sampling digitally generated sounds corresponding to each prompted note of said plurality of notes to collect a plurality of sets of digitized sampled sounds;
processing each said set of digitized sampled sounds to determine a frequency and a sampled note corresponding to each said set of digitized sampled sounds;
comparing each said sampled note to each said particular note; and
providing said user with an indication when any of said frequencies of said sampled notes differs by a predefined margin from a frequency of said corresponding particular note by said animated character.
14. An ear training frequencizer, comprising:
a digitizer for producing a set of digitized sampled sounds;
a digital signal processor, coupled to said digitizer, for processing said set of digitized sampled sounds to extract a frequency for said set of digitized sampled sounds; and
switching means, coupled to said digitizer and to said digital signal processor and responsive to a signal from an ear training system, for providing said ear training system with one of said set of digitized sampled sounds and said extracted frequency.
15. An ear training system, comprising:
a frequencizer for producing a frequency of a sampled sound;
a computer having a memory, a display and a processor, said computer selectively coupled to said frequencizer; and
programming means, coupled to said memory, for instructing said processor to:
access said frequencizer and receive a frequency corresponding to a note generated by a user; and
present synchronously a symbol on said display indicating said frequency of said note.
16. An analog music notation system, comprising:
a frequencizer for producing a frequency of a sampled sound;
a computer having a memory, a display, and a processor, said computer selectively coupled to said frequencizer; and
programming means, coupled to said memory, for instructing said processor to:
access said frequencizer and receive a frequency corresponding to a note generated by a user; and
present synchronously the nearest musical note on a musical score corresponding to said frequency of said analog sound on said display.
AMENDED CLAIMS
[received by the International Bureau on 16 August 1993 (16.08.93);
original claims 1-3 replaced by amended claims 1-4;
original claims 4, 11 and 12 amended and renumbered 5, 12 and 13 respectively; original claims 5-10 and 13-16 unchanged but renumbered 6-11 and 14-17 respectively;
(6 pages) ]
1. An ear training method, comprising the steps of:
prompting a user to generate a particular note;
sampling digitally sounds generated by said user to collect a set of digitized sampled sounds;
processing synchronously said set of digitized sampled sounds to determine a frequency and a sampled note corresponding to said set of digitized sampled sounds;
comparing said sampled note to said particular note; and
providing said user with a visual indication when said frequency of said sampled note differs from a frequency of said particular note by a predefined variable margin.
2. The method of claim 1 wherein said processing step includes the steps of:
normalizing digital data by mathematic formula;
computing a Fourier transform or other mathematic formula for said set of normalized digitized sampled sounds;
extracting a discrete frequency of maximum energy;
interpolating a real frequency for said set of normalized digitized sampled sounds from said discrete frequency and a plurality of energies of frequencies adjacent to said discrete frequency;
computing an auto-correlation of frequency spectrum;
extracting a discrete frequency of maximum energy;
interpolating a real frequency for said set of normalized digitized sampled sounds from said discrete frequency and a plurality of energies of frequencies adjacent to said discrete frequency; and
choosing the best estimate of the true frequency.
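The first branch of the processing steps in claim 2 (normalize, transform, extract the discrete frequency of maximum energy, interpolate from the adjacent energies) can be illustrated with a minimal sketch. It assumes a plain DFT in place of an FFT, a Hann window, and parabolic interpolation on log energy; none of these choices is specified by the claim. The spectrum auto-correlation branch and the final "best estimate" selection are not shown, and the function name `estimate_frequency` is hypothetical.

```python
import math


def estimate_frequency(samples, sample_rate):
    """Hypothetical sketch: estimate the dominant frequency (Hz) of a sample block."""
    n = len(samples)
    if n < 4:
        raise ValueError("need at least 4 samples")
    # Normalize: remove the DC offset and scale to unit peak amplitude.
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    peak = max(abs(s) for s in centered) or 1.0
    # Hann window to limit spectral leakage (an assumption, not from the claim).
    x = [(s / peak) * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
         for i, s in enumerate(centered)]
    # Naive DFT with precomputed twiddle tables; energy = squared magnitude per bin.
    cos_t = [math.cos(2 * math.pi * m / n) for m in range(n)]
    sin_t = [math.sin(2 * math.pi * m / n) for m in range(n)]
    energies = []
    for k in range(n // 2 + 1):
        re = im = 0.0
        for i, v in enumerate(x):
            idx = (k * i) % n
            re += v * cos_t[idx]
            im -= v * sin_t[idx]
        energies.append(re * re + im * im)
    # Discrete frequency of maximum energy (skip DC and the Nyquist bin).
    k = max(range(1, len(energies) - 1), key=lambda j: energies[j])
    # Parabolic interpolation on log energy using the two adjacent bins.
    a, b, c = (math.log(energies[j] + 1e-30) for j in (k - 1, k, k + 1))
    denom = a - 2 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return (k + offset) * sample_rate / n
```

For a 440 Hz tone sampled at 8000 Hz over 1024 samples, the bin spacing is about 7.8 Hz, and the interpolation step recovers an estimate well inside one bin. A production version would use an FFT library instead of the O(n²) loop.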
3. The method of claim 1 wherein said prompting step includes the steps of:
playing digitized sounds for said user to match, including a plurality of predefined musical nomenclature choices;
analyzing past performances to select a particular one exercise from a plurality of predefined exercises, with each said exercise including a plurality of notes; and
periodically selecting for prompting each of said plurality of notes.
4. The method of claim 1 wherein said sampling step includes the steps of:
delaying the acquisition of the sample by a predetermined, but variable, length of time; and
changing the sampling rate based on the target note of the sample.
5. A method for improving an ability of a user to determine a pitch of a note, comprising the steps of:
entering a set of parameters for the user;
storing a knowledge base including a plurality of structured pedagogic exercises which develop particular abilities, each exercise of said plurality of structured pedagogic exercises including a plurality of notes;
testing said set of parameters and an analysis of past performances against said knowledge base to select a particular one of said plurality of structured pedagogic exercises;
prompting generation of each note of said plurality of notes of said particular one of said plurality of structured pedagogic exercises;
sampling digitally user generated sounds corresponding to each prompted note of said plurality of notes to collect a plurality of sets of digitized sampled sounds;
processing each set of digitized sampled sounds to determine a frequency and a sampled note corresponding to each said set of digitized sampled sounds;
comparing each said sampled note to each said particular note; and
providing said user with a qualitative or quantitative indication when any of said frequencies of said sampled notes differs by a predefined variable margin from said corresponding particular note.
6. The pitch determining ability improving method of claim 5 wherein said digitized sampled sounds processing step is synchronous.
7. The pitch determining ability improving method of claim 5 wherein said digitized sampled sounds processing step is asynchronous.
8. The pitch determining ability improving method of claim 5 further comprising the step of modifying said predefined margin responsive to an analysis of instances of generated notes within said predefined margin compared to instances of generated notes exceeding said predefined margin.
9. The pitch determining ability improving method of claim 8 wherein said predefined margin is decreased when a percentage of properly generated notes exceeds a predetermined value.
10. The pitch determining ability improving method of claim 8 wherein said predefined margin is increased when a percentage of improperly generated notes exceeds a predetermined value.
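The margin adjustment of claims 8 to 10 can be sketched as a small update rule applied after each batch of notes. The claims do not fix the units, thresholds or step size, so the function name `update_margin`, the use of cents, the 80% and 50% thresholds, the multiplicative step of 0.8 and the clamping bounds are all hypothetical.

```python
def update_margin(margin_cents, results, success_threshold=0.8,
                  failure_threshold=0.5, step=0.8,
                  min_margin=5.0, max_margin=100.0):
    """Hypothetical sketch of claims 8-10.

    results: list of booleans, True when a generated note fell within the margin.
    """
    if not results:
        return margin_cents
    proper = sum(results) / len(results)  # share of properly generated notes
    improper = 1.0 - proper               # share of improperly generated notes
    if proper > success_threshold:
        margin_cents *= step              # claim 9: tighten the margin
    elif improper > failure_threshold:
        margin_cents /= step              # claim 10: relax the margin
    # Keep the margin within sensible (assumed) bounds.
    return max(min_margin, min(max_margin, margin_cents))
```

A multiplicative step makes the margin shrink quickly for a strong singer while never reaching zero; the clamp prevents it from growing without bound for a struggling user.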
11. An ear training system, comprising:
a frequencizer for producing a frequency of a sampled sound;
a computer having memory, a display and a processor, said computer selectively coupled to said frequencizer;
input means, coupled to said computer, for entering user-specific parameter;
a knowledge base, coupled to said memory, for storing a plurality of predefined pedagogic structured exercises, with each exercise including a plurality of notes, and a past performance history of exercises performed by a particular user; and
programming means, coupled to said memory, for instructing said processor to:
test said user-specific parameter against said knowledge base to select a particular one of said predefined pedagogic structured exercises;
display selected notes of said particular one exercise;
prompt said particular user to generate a sound corresponding to a particular one of said displayed notes;
access said frequencizer and receive a sampled frequency corresponding to said prompted particular one note;
compare said sampled frequency with a reference frequency of said particular one note; and
indicate an improperly generated note to said particular user when a difference between said sampled frequency and said reference frequency exceeds a predefined margin of error.
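The comparison step of claim 11 can be sketched by measuring the difference in cents, which makes a single margin behave the same across the whole pitch range. The claim only speaks of a "predefined margin of error"; expressing it in cents, deriving the reference frequency from equal temperament with A4 = 440 Hz and MIDI note numbers, and the function names below are assumptions for illustration.

```python
import math


def reference_hz(midi_note, a4_hz=440.0):
    """Equal-tempered reference frequency for a MIDI note number (assumption)."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)


def cents_difference(sampled_hz, ref_hz):
    """Signed distance in cents from the reference; positive means sharp."""
    return 1200.0 * math.log2(sampled_hz / ref_hz)


def is_improperly_generated(sampled_hz, ref_hz, margin_cents):
    """True when the sung note misses the reference by more than the margin."""
    return abs(cents_difference(sampled_hz, ref_hz)) > margin_cents
```

For example, a user aiming at A4 (MIDI 69, 440 Hz) who sings 446 Hz is about 23 cents sharp and would be flagged with a 20-cent margin, while 442 Hz (about 8 cents sharp) would pass.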
12. The ear training apparatus of claim 11 wherein said programming means further instructs said processor to store in said memory a session log of said sampled frequency and user generated notes;
a means for graphically displaying a report of exercises performed by said user;
a means for calculating and graphically displaying a profile of said user generated notes plotted against targeted notes; and
a means for playing user generated notes and displaying their relationship to targeted notes.
13. An ear training system, comprising:
a digitizer for producing a set of digitized sampled sounds from a user-generated note;
a computer having memory, a display and a processor, said computer selectively coupled to said digitizer;
input means, coupled to said computer, for entering user-specific parameter;
a knowledge base, coupled to said memory, for storing a plurality of predefined pedagogic structured exercises, with each exercise including a plurality of notes, and a past performance history of exercises performed by a particular user; and
programming means, coupled to said memory, for instructing said processor to:
test said user-specific parameter against said knowledge base to select a particular one of said predefined pedagogic structured exercises;
display selected notes of said particular one exercise;
prompt said particular user to generate a sound corresponding to a particular one of said displayed notes;
access said digitizer and receive said set of sampled sounds generated by said user responsive to said prompted particular one note;
extract a frequency for said accessed set of sampled sounds to produce a sampled frequency;
compare said sampled frequency with a reference frequency of said particular one note; and
indicate an improperly generated note to said particular user when a difference between said sampled frequency and said reference frequency exceeds a predefined margin of error.
14. A method for improving an ability of a user to determine a pitch of a note, comprising the steps of:
entering a set of parameters for the user;
storing a knowledge base including a plurality of structured pedagogic exercises which develop particular abilities, each exercise of said plurality of structured pedagogic exercises including a plurality of notes;
testing said set of parameters and an analysis of past performances against said knowledge base to select a particular one of said plurality of structured pedagogic exercises;
prompting generation of each note of said plurality of notes of said particular one of said plurality of structured pedagogic exercises by use of an animated character;
sampling digitally user generated sounds corresponding to each prompted note of said plurality of notes to collect a plurality of sets of digitized sampled sounds;
processing each set of digitized sampled sounds to determine a frequency and a sampled note corresponding to each said set of digitized sampled sounds;
comparing each said sampled note to each said particular note; and
providing said user with an indication when any of said frequencies of said sampled notes differs by a predefined margin from said corresponding particular note by said animated character.
15. An ear training frequencizer, comprising:
a digitizer for producing a set of digitized sampled sounds; a digital signal processor, coupled to said digitizer, for processing said set of digitized sampled sounds to extract a frequency for said set of digitized sampled sounds; and
switching means, coupled to said digitizer and to said digital signal processor and responsive to a signal from an ear training system, for providing said ear training system with one of said set of digitized sampled sounds and said extracted frequency.
16. An ear training system, comprising:
a frequencizer for producing a frequency of a sampled sound;
a computer having memory, a display and a processor, said computer selectively coupled to said frequencizer; and
programming means, coupled to said memory, for instructing said processor to:
access said frequencizer and receive a frequency corresponding to a note generated by a user; and
present synchronously a symbol on said display indicating said frequency of said note.
17. An analog music notation system, comprising:
a frequencizer for producing a frequency of a sampled sound;
a computer having memory, a display and a processor, said computer selectively coupled to said frequencizer; and
programming means, coupled to said memory, for instructing said processor to:
access said frequencizer and receive a frequency corresponding to a note generated by a user; and
present synchronously the nearest musical note on a musical score corresponding to said frequency of said analog sound on said display.
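Finding "the nearest musical note" for a measured frequency, as in claims 16 and 17, can be sketched by rounding to the nearest equal-tempered semitone. The twelve-tone equal temperament, the A4 = 440 Hz reference, sharps-only spelling and scientific pitch notation (C4 = middle C) are assumptions; placing the result on a rendered score is outside this sketch, and the name `nearest_note` is hypothetical.

```python
import math

# Sharps-only spelling of the twelve pitch classes (an assumption).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz, a4_hz=440.0):
    """Return the nearest equal-tempered note name, e.g. 'A4', for a frequency."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI-style semitone number, rounded to the nearest note (A4 = 69).
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    name = NOTE_NAMES[midi % 12]
    octave = midi // 12 - 1  # MIDI 60 maps to C4
    return f"{name}{octave}"
```

A display routine could call this on every frequency returned by the frequencizer and draw the resulting note head, which matches the "synchronous" presentation the claim describes.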
PCT/US1993/001450 1992-02-20 1993-02-18 Method and apparatus for ear training WO1993017408A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US83914992A 1992-02-20 1992-02-20
US07/839,149 1992-02-20

Publications (1)

Publication Number Publication Date
WO1993017408A1 true WO1993017408A1 (en) 1993-09-02

Family

ID=25278976

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1993/001450 WO1993017408A1 (en) 1992-02-20 1993-02-18 Method and apparatus for ear training

Country Status (2)

Country Link
AU (1) AU3774293A (en)
WO (1) WO1993017408A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BE1007899A3 (en) * 1993-12-22 1995-11-14 Philips Electronics Nv Information system with means for user interactions, an informationprocessing and information management system and the means for operating thesaid system and the responses to the user
DE19621518A1 (en) * 1996-05-29 1997-12-11 Rosmann Karl Musical tone pitch visualiser
WO1999034345A1 (en) * 1997-12-30 1999-07-08 Cohen Gal A Method and apparatus for training auditory skills
GB2339629A (en) * 1998-07-13 2000-02-02 Timothy Keith Gray An interactive music teaching device
EP1033157A2 (en) * 1999-02-26 2000-09-06 Konami Co., Ltd. Music game system, game control method for the game system, and computer-readable memory medium
GB2350228A (en) * 1999-05-20 2000-11-22 Kar Ming Chow Digital processing of analogue audio signals
WO2009156901A2 (en) * 2008-06-27 2009-12-30 Koninklijke Philips Electronics N.V. Device, method, and computer-readable medium for providing musical feedback
WO2012146829A1 (en) * 2011-04-26 2012-11-01 Ovelin Oy System and method for providing exercise in playing a music instrument
US8338684B2 (en) 2010-04-23 2012-12-25 Apple Inc. Musical instruction and assessment systems
AT513073A1 (en) * 2011-12-24 2014-01-15 Isma Ag Ear Training Unit
WO2017211292A1 (en) * 2016-06-08 2017-12-14 Wai Yuk Annie YUEN Computer-assisted method and computer system for teaching piano

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4024789A (en) * 1973-08-30 1977-05-24 Murli Advani Tone analysis system with visual display
US4510840A (en) * 1982-12-30 1985-04-16 Victor Company Of Japan, Limited Musical note display device
US4546690A (en) * 1983-04-27 1985-10-15 Victor Company Of Japan, Limited Apparatus for displaying musical notes indicative of pitch and time value
US5183398A (en) * 1990-06-01 1993-02-02 The Software Toolworks Apparatus and method for interactive instruction of a student

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BE1007899A3 (en) * 1993-12-22 1995-11-14 Philips Electronics Nv Information system with means for user interactions, an informationprocessing and information management system and the means for operating thesaid system and the responses to the user
DE19621518A1 (en) * 1996-05-29 1997-12-11 Rosmann Karl Musical tone pitch visualiser
WO1999034345A1 (en) * 1997-12-30 1999-07-08 Cohen Gal A Method and apparatus for training auditory skills
GB2339629A (en) * 1998-07-13 2000-02-02 Timothy Keith Gray An interactive music teaching device
EP1033157A2 (en) * 1999-02-26 2000-09-06 Konami Co., Ltd. Music game system, game control method for the game system, and computer-readable memory medium
GB2350228A (en) * 1999-05-20 2000-11-22 Kar Ming Chow Digital processing of analogue audio signals
GB2350228B (en) * 1999-05-20 2001-04-04 Kar Ming Chow An apparatus for and a method of processing analogue audio signals
US6288318B1 (en) 1999-05-20 2001-09-11 Kar Ming Chow Apparatus for and a method of processing analogue audio signals
WO2009156901A2 (en) * 2008-06-27 2009-12-30 Koninklijke Philips Electronics N.V. Device, method, and computer-readable medium for providing musical feedback
WO2009156901A3 (en) * 2008-06-27 2010-06-10 Koninklijke Philips Electronics N.V. Device, method, and computer-readable medium for providing musical feedback
US8338684B2 (en) 2010-04-23 2012-12-25 Apple Inc. Musical instruction and assessment systems
US8785757B2 (en) 2010-04-23 2014-07-22 Apple Inc. Musical instruction and assessment systems
WO2012146829A1 (en) * 2011-04-26 2012-11-01 Ovelin Oy System and method for providing exercise in playing a music instrument
US9218748B2 (en) 2011-04-26 2015-12-22 Yousician Oy System and method for providing exercise in playing a music instrument
AT513073A1 (en) * 2011-12-24 2014-01-15 Isma Ag Ear Training Unit
AT513073B1 (en) * 2011-12-24 2019-07-15 Isma Ag Ear Training Unit
WO2017211292A1 (en) * 2016-06-08 2017-12-14 Wai Yuk Annie YUEN Computer-assisted method and computer system for teaching piano
US10019911B2 (en) 2016-06-08 2018-07-10 Wai Yuk Annie YUEN Computer-assisted method and computer system for teaching piano

Also Published As

Publication number Publication date
AU3774293A (en) 1993-09-13

Similar Documents

Publication Publication Date Title
US10789921B2 (en) Audio extraction apparatus, machine learning apparatus and audio reproduction apparatus
US7323629B2 (en) Real time music recognition and display system
US6856923B2 (en) Method for analyzing music using sounds instruments
Gupta et al. Perceptual evaluation of singing quality
US20210074267A1 (en) Machine learning method, audio source separation apparatus, and electronic instrument
US20230402026A1 (en) Audio processing method and apparatus, and device and medium
CN110070847B (en) Musical tone evaluation method and related products
CN108038146B (en) Music playing artificial intelligence analysis method, system and equipment
KR101459324B1 (en) Evaluation method of sound source and Apparatus for evaluating sound using it
Wu et al. Towards the objective assessment of music performances
WO1993017408A1 (en) Method and apparatus for ear training
JP2008040259A (en) Musical piece practice assisting device, dynamic time warping module, and program
Stevens et al. Design and evaluation of an auditory glance at algebra for blind readers
Arom et al. Theory and technology in African music
Bozkurt A system for tuning instruments using recorded music instead of theory-based frequency presets
JP5005445B2 (en) Code name detection device and code name detection program
JP2004325744A (en) Musical interval judging device
WO2019180830A1 (en) Singing evaluating method, singing evaluating device, and program
KR20190121080A (en) media contents service system using terminal
JP2008040260A (en) Musical piece practice assisting device, dynamic time warping module, and program
CN110956870A (en) Solfeggio teaching method and device
JP2008040258A (en) Musical piece practice assisting device, dynamic time warping module, and program
JPS61120188A (en) Musical sound analyzer
Półrolniczak et al. Analysis of the signal of singing using the vibrato parameter in the context of choir singers
Piszczalski et al. Computer analysis and transcription of performed music: A project report

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU BR CA JP KR RU

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA