CA2165862A1 - Method and system for producing sounds - Google Patents
Method and system for producing sounds
- Publication number
- CA2165862A1
- Authority
- CA
- Canada
- Prior art keywords
- demi
- sound
- syllable
- concatenation
- files
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01L—MEASURING FORCE, STRESS, TORQUE, WORK, MECHANICAL POWER, MECHANICAL EFFICIENCY, OR FLUID PRESSURE
- G01L9/00—Measuring steady of quasi-steady pressure of fluid or fluent solid material by electric or magnetic pressure-sensitive elements; Transmitting or indicating the displacement of mechanical pressure-sensitive elements, used to measure the steady or quasi-steady pressure of a fluid or fluent solid material, by electric or magnetic means
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- Computational Linguistics (AREA)
- Theoretical Computer Science (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
Apparatus for producing digital sound records from demi-syllable sound files includes a keyboard having a first set of keys for entering data corresponding to beginning consonant sounds used in a selected language, a second set of keys for entering data corresponding to ending consonant sounds used in the language, and a third set of keys for entering data corresponding to vowel sounds used in the language. The apparatus further includes a memory for digitally storing a plurality of demi-syllable sound files, each file corresponding to and representing the sound of a predetermined demi-syllable in the language. Input signal processing means provides file address signals to the memory in response to data entered from the keyboard. Output signal processing means combines the contents of successively read demi-syllable files to produce a derivative sound record which, depending upon the sounds represented by the demi-syllable sound files, corresponds to either a full or a partial concatenation of those files.
Description
APPARATUS FOR PRODUCING DIGITAL SOUND RECORDS
FIELD OF THE INVENTION
The present invention relates to a method of producing sounds and to its application in a language training method.
BACKGROUND TO THE INVENTION
Various devices are known in the art for high-speed phonetic keyboarding. These include the stenotype, which is most commonly used in North America, and the palantype, which is most commonly used in Europe. The palantype is similar in operation to the stenotype, but the keys on its keyboard are grouped in a "V" angle, which is more comfortable to operate.
The digital processing of data entered from such keyboards is also known. For example, an IBM Technical Disclosure Bulletin entitled Speech Translation Machine For The Deaf, Vol. 17, No. 12, May 1975, describes a stenotype machine adapted to translate and display speech in a phonetic word form. See also, for example: U.S. Patent No. 3,665,115 entitled Stenographic Apparatus For Providing A Magnetically Recorded Digitally Encoded Record, granted May 23, 1972 (Snook); and U.S. Patent No. 4,632,578 entitled Computerized Printing System, granted December 30, 1986 (Cuff et al.).
As well, it is known in text or word processing systems to store words or portions of words in digital memory for the purpose of subsequent addressing and retrieval depending upon input from a keyboard or other source: see, for example, U.S. Patent No. 4,342,085 entitled Stem Processing For Data Reduction In A Dictionary Storage File, granted July 27, 1982 (Glickman et al.); and U.S. Patent No. 4,439,836 entitled Electronic Translator, granted March 27, 1984 (Yoshida).
Further, it is well known to store words in digital memory in the form of sound files which can be addressed, retrieved and played out depending upon input from a keyboard or other source.
With systems designed to access words in memory, signal processing speed can present a fundamental difficulty, particularly when the data input may be fast and may require interpretation in order to locate the proper word in memory.
One area where the problem of signal processing speed can become particularly acute is sound production. If a keyboard input is used to enter data to produce the sound of desired words, the keyboard operator may effectively outrun the system unless the input is processed quickly and efficiently. The input may be at the rate of normal speech, yet the output may lag. Any sense of real-time operation is then lost.
This problem can arise in learning a new language. Here, it is desirable for the student to work with sound patterns and combinations of sound patterns rather than with the series of letters which make up the language. But when a student keys in the input for a desired sound or combination of sounds in the language, it is important to hear the result simultaneously. Otherwise, the learning process can be impaired. In this regard, it is desirable to provide a real-time sense of whether or not a mistake has been made at the keyboard.
Accordingly, a primary object of the present invention is to provide new and improved apparatus which can rapidly receive and process keyboard input representing sounds in a selected language.
A further object of the present invention is to provide a new and improved system which can rapidly receive and process such keyboard input and produce sounds or other indicia corresponding to such input without significant delay.
SUMMARY OF THE INVENTION
In accordance with one aspect of the present invention, there is provided apparatus for producing digital sound records from demi-syllable sound files, such apparatus including a keyboard means having a first set of keys for entering data corresponding to beginning consonant sounds used in a selected language, a second set of keys for entering data corresponding to ending consonant sounds used in that language, and a third set of keys for entering data corresponding to vowel sounds used in that language. A memory means digitally stores a plurality of demi-syllable sound files, each file corresponding to and representing the sound of a predetermined demi-syllable in the selected language. Further, the apparatus includes an input signal processing means operatively connected to the keyboard means and to the memory means for providing file address signals to the memory means in response to data entered from the keyboard means, and output signal processing means operatively connected to the memory means for reading the contents of the files so addressed and for combining the contents of successively read first and second ones of the files so read to produce a derivative sound record which, depending upon the sounds represented by the first and second files, corresponds to either a full or a partial concatenation of the first and second files.
Depending upon the sounds represented by the first and second files, the derivative sound record preferably corresponds to a full concatenation of the first and second files, a concatenation of the first file with a part of the second file, a concatenation of a part of the first file with the second file, or a concatenation of a part of the first file with a part of the second file.
By breaking the language down into demi-syllables and utilizing a memory store of demi-syllable sound files addressed with such a keyboard input, the basis for enhanced processing speed is established. However, when one demi-syllable sound file is combined with another, the combined or derivative sound may be a distorted representation of that which is actually desired. For example, if the demi-syllable "at" is simply concatenated at the end of the demi-syllable "ca", the result would not be the simple word "cat". Rather, it would be a disconcerting version of the word with a stretched-out central portion. Accordingly, depending upon the demi-syllables being combined, only a portion of the respective sound files may be utilized to produce the derivative sound record.
In accordance with another aspect of the present invention, there is provided a language training system which includes a keyboard means as described above for entering beginning consonant sounds, ending consonant sounds, and vowel sounds in a selected language; a first memory means for digitally storing a plurality of word sound files, each word sound file corresponding to and representing the sound of a predetermined word or words having defined meaning in that language; and a second memory means for digitally storing a plurality of demi-syllable sound files, each demi-syllable sound file corresponding to and representing the sound of a predetermined demi-syllable in that language. Further, the system includes signal processing means operatively connected to the keyboard means and to the second memory means for producing the sound of a desired word from the demi-syllable sound files in response to data entered from the keyboard means, said desired word corresponding to a word stored in the first memory means. As well, the system includes means operatively connected to the first memory means for producing the sound of the desired word as represented in the word sound files.
Advantageously, the word sound files in the first memory contain predetermined words stored with their proper stress and intonation, features which will be lacking in words formed by combining demi-syllable sound files. In the process of language training, the student may be prompted by the sound of such a word and called upon to enter the same word from the keyboard. Alternatively, the student may be visually prompted to enter the word from the keyboard, hear the sound resulting from the combination of demi-syllable sound files, and then hear the sound of the same word as produced from the word sound file.
To further advantage, the foregoing system may include means for visually indicating keys that were pressed to produce the sound of the desired word and keys that should have been pressed to produce the sound of the desired word.
The invention will now be described in more detail with reference to the following drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of a language training system in accordance with the present invention.
Figure 2 illustrates in more detail a portion of a keyboard used with the system shown in Figure 1.
Figure 3 is a chart illustrating the keys and key combinations used to produce desired beginning consonant sounds, vowel sounds, and ending consonant sounds with the keyboard shown in Figure 1.
Figure 4 is a graphic representation of a model speaker's throat, mouth and tongue while speaking.
Figure 5 illustrates a display device and the graphical representation of the keys pressed and the proper keys to press.
DETAILED DESCRIPTION
The block diagram shown in Figure 1 illustrates a language training system comprising a keyboard 50 for entering data, and a processing unit 60 which is in communication with the keyboard via bus 51 and is capable of decoding and executing instructions. Conveniently, unit 60 will be part of a commercially available personal computer having a 386 or higher microprocessor running at 25 MHz or faster. The system includes two memories: a first memory 70 for digitally storing a plurality of word sound files; and a second memory 80 for digitally storing a plurality of demi-syllable sound files. Memory 70 is in bi-directional communication with processing unit 60 via bus 71. Similarly, memory 80 is in bi-directional communication with processing unit 60 via bus 81.
Processing unit 60 is also in communication with two output devices: one being an audio or sound device 90 connected via bus 91; the other being a visual or display device 100 connected via bus 101.
Each word sound file stored in memory 70 corresponds to and represents the sound of a predetermined word or words in a selected language; complete sentences may be included. While the selected language may be any one of a number of known languages, the language chosen for the embodiment now being described is English. In other words, as a language training system, the present embodiment is designed to help teach the English language, and the words stored in the sound files of memory 70 are predetermined English words to be presented and taught to a student who is endeavouring to learn English. To enhance the learning of proper pronunciation, auditory recognition and auditory discrimination, the words are preferably stored with their proper intonation and stress.
Each demi-syllable sound file of memory 80 corresponds to and represents the sound of a predetermined demi-syllable in the English language. There are approximately 1400 such files, each prerecorded and stored, but necessarily without any comparative stress or intonation, because the words or syllables they may be used to form are not known in advance.
As shown in Figure 2, keyboard 50 includes twenty-four keys. Data input from these keys corresponds to the forty phonemes, or sound segments, of the English language. As such, keyboard 50 may be referred to as a phonemic keyboard.
The twenty-four keys are organized in three sets: a first set of eight keys 1 to 8 on the left-hand side for entering data corresponding to beginning consonant sounds used in the English language; a second set of ten keys 9 to 18 on the right-hand side for entering data corresponding to ending consonant sounds used in the English language; and a third set of six keys 19 to 24 for entering data corresponding to vowel sounds.
The chart shown in Figure 3 illustrates the keys and key combinations of keyboard 50 used to produce desired data input. Input occurs upon the release of selected keys, not when keys are pressed. Thus, as illustrated by way of example in Figure 3, data input corresponding to the beginning consonant sound "b" as in the word "book" is achieved upon the release of keys 2 and 8, normally operated with the left hand. Likewise, data input corresponding to the ending consonant sound "b" as in the word "cab" is achieved upon the release of keys 11 and 14, normally operated with the right hand.
In the case of a demi-syllable such as "be", for example, the appropriate data input is achieved by using the left hand to press keys 2 and 8, representing the beginning "b" sound, and the thumbs to press vowel sound keys 21 and 24, representing the ending "e" sound, and then releasing all keys.
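The chorded entry just described, where data is entered only once every key in the chord has been released, can be sketched as follows. This is a minimal Python illustration, not the Boswell keyboard's firmware: the class and function names are hypothetical, the chord table holds only the three Figure 3 examples quoted in the text, and mapping the "e" of "be" to the DECtalk symbol "iy" is an assumption.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def key_group(key: int) -> str:
    """Return the key set a key belongs to: keys 1-8 enter beginning consonants
    (left hand), 9-18 ending consonants (right hand), 19-24 vowels (thumbs)."""
    if 1 <= key <= 8:
        return "BC"
    if 9 <= key <= 18:
        return "EC"
    if 19 <= key <= 24:
        return "V"
    raise ValueError(f"no such key: {key}")


# Hypothetical, partial chord table: only the three examples quoted in the text.
# Mapping the "e" of "be" to the DECtalk symbol "iy" is an assumption.
CHORDS: Dict[FrozenSet[int], str] = {
    frozenset({2, 8}): "b",     # beginning "b" as in "book"
    frozenset({11, 14}): "b",   # ending "b" as in "cab"
    frozenset({21, 24}): "iy",  # vowel "e" as in "be"
}


def decode(chord: Set[int]) -> List[Tuple[str, str]]:
    """Split a released chord by key set and look up the sound for each part."""
    sounds = []
    for group in ("BC", "V", "EC"):  # speaking order: onset, vowel, coda
        part = frozenset(k for k in chord if key_group(k) == group)
        if not part:
            continue
        if part not in CHORDS:
            raise ValueError(f"unrecognized {group} chord: {sorted(part)}")
        sounds.append((group, CHORDS[part]))
    return sounds


class ChordKeyboard:
    """Collects pressed keys and enters data only when the last key is released."""

    def __init__(self) -> None:
        self.down: Set[int] = set()   # keys currently held
        self.chord: Set[int] = set()  # every key pressed since the last entry

    def press(self, key: int) -> None:
        key_group(key)  # reject key numbers outside 1-24
        self.down.add(key)
        self.chord.add(key)

    def release(self, key: int) -> Optional[List[Tuple[str, str]]]:
        self.down.discard(key)
        if self.down:
            return None  # other keys still held, so nothing is entered yet
        chord, self.chord = self.chord, set()
        return decode(chord)
```

Pressing keys 2, 8, 21 and 24 and then releasing them returns nothing until the last key comes up, at which point the chord decodes to a beginning "b" followed by the vowel "iy".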
It should be understood that the student will not normally be cognizant of the fact that a demi-syllable or combination of demi-syllables has been entered from the keyboard. From his or her standpoint, such recognition and appropriate processing occurs in the background.
A keyboard which enables data entry in accordance with the chart shown in Figure 3 is available from Boswell Industries Inc., 8455 Lougheed Highway, Burnaby, British Columbia, Canada.
Processing unit 60 includes an input signal processing means 62 and an output signal processing means 64. The input signal processing means includes an input register for receiving input data from keyboard 50, to be processed in accordance with instructions stored in a program memory of unit 60. Similarly, the output signal processing means includes an output register for storing signals which result from the program instructions and which are made available to audio device 90 and display device 100 over buses 91 and 101. Audio device 90 may comprise a conventional processor and amplifier, or headphones, capable of receiving digital sound signals from the output register and producing corresponding sounds. Display device 100 may comprise a conventional monitor compatible with the personal computer which is part of processing unit 60.
When input data is received from keyboard 50, it will not necessarily represent a demi-syllable or a pair of demi-syllables forming a word or syllable. However, when the data does represent a demi-syllable, the instruction program of processing unit 60 responds by addressing and looking up the corresponding sound file in memory 80. Output signal processing means 64 reads the sound file and makes the record available at the output register. When the keyboard data represents a pair of demi-syllables forming a word or syllable, the instruction program responds by addressing and looking up both demi-syllable sound files. Output signal processing means 64, also following the instruction program, then combines the files to produce a derivative sound record which is made available at the output register.
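The lookup step described above might look like the following sketch. It assumes memory 80 can be modeled as a Python dictionary keyed by a demi-syllable name (for example "kae"), and that the combining operation is passed in as a function; `produce_record` and the name-based keys are hypothetical, since the text does not describe the actual addressing scheme.

```python
from typing import Callable, Dict, List, Optional, Sequence

Record = List[float]  # a digitized sound record, modeled as a list of samples


def produce_record(
    demi_syllables: Sequence[str],
    memory: Dict[str, Record],
    combine: Callable[[Record, Record], Record],
) -> Optional[Record]:
    """Look up one or two demi-syllable files and return the record to output."""
    if not demi_syllables or any(d not in memory for d in demi_syllables):
        return None  # the input does not (yet) name a stored demi-syllable
    if len(demi_syllables) == 1:
        return list(memory[demi_syllables[0]])  # a single file is output as stored
    if len(demi_syllables) == 2:
        first, second = (memory[d] for d in demi_syllables)
        return combine(first, second)  # the derivative sound record
    raise ValueError("expected one demi-syllable or a beginning/ending pair")
```

Keeping `combine` as a parameter leaves room for the full or partial concatenation rules that follow, without changing the lookup logic.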
Since the number of demi-syllable sound files required for the English language is far smaller than the number of meaningful words or syllables that might be produced by combining the sounds represented by such files, memory 80 requires a significantly smaller store of sounds than a complete store of syllables would. Further, the time required to address and look up a given file in response to keyboard data is shortened.
However, as noted above with the example of the word "cat", the simple concatenation, or addition, of one demi-syllable sound file to another can result in a distorted representation of the sound which is actually desired. Accordingly, to reproduce a desired sound accurately, it is necessary to consider the demi-syllable sound files which are being combined, and in many cases to remove or truncate a portion of the files. Thus, depending upon the sounds represented by the two demi-syllable sound files being combined, the derivative sound record may be a full concatenation of the files, a concatenation of the first file with a part of the second file, a concatenation of a part of the first file with the second file, or a concatenation of a part of the first file with a part of the second file.
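The four possible forms of the derivative sound record differ only in whether part of each file is trimmed before joining. A minimal sketch, assuming the trimmed material is the vowel tail of the beginning demi-syllable and the vowel head of the ending demi-syllable (the two files overlap at the shared vowel); the function names and sample-count parameters are illustrative, not taken from the patent.

```python
from typing import List, Sequence


def derivative_record(
    first: Sequence[float],
    second: Sequence[float],
    trim_first_tail: int = 0,
    trim_second_head: int = 0,
) -> List[float]:
    """Join a beginning demi-syllable file to an ending one.

    trim_first_tail: samples dropped from the end of the first file (its vowel tail).
    trim_second_head: samples dropped from the start of the second file (its vowel head).
    Both zero gives a full concatenation; any non-zero trim gives a partial one.
    """
    if not 0 <= trim_first_tail <= len(first):
        raise ValueError("trim_first_tail out of range")
    if not 0 <= trim_second_head <= len(second):
        raise ValueError("trim_second_head out of range")
    head = list(first[: len(first) - trim_first_tail])
    tail = list(second[trim_second_head:])
    return head + tail


def concatenation_mode(trim_first_tail: int, trim_second_head: int) -> str:
    """Name which of the four concatenation cases a pair of trims produces."""
    first = "part of first" if trim_first_tail else "first"
    second = "part of second" if trim_second_head else "second"
    return f"{first} + {second}"
```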
In English, there are twenty-four consonant sounds. Using the DECtalk symbols, these are {"b" "ch" "d" "dh" "f" "g" "hx" "jx" "k" "l" "m" "n" "nx" "p" "r" "s" "sh" "t" "th" "v" "w" "yx" "z" "zh"}. There are sixteen vowel sounds {"ae" "ah" "ao" "aw" "ax" "ay" "eh" "er" "ey" "ih" "iy" "ow" "oy" "uh" "uw" "yu"}, which will be denoted Vm or Vn, where m, n = 1 to 16.
The smallest acoustic units which allow a difference in meaning, and which serve as the building blocks of a language, are known as "phonemes". Phonemes comprise all consonant sounds and vowel sounds. Accordingly, in the English language, for example, there are 40 phonemes (24 consonant sounds plus 16 vowel sounds).
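The symbol inventory above can be captured directly, which also makes the 24 + 16 = 40 phoneme count easy to check. This sketch assumes that the subscript m (or n) of a vowel is its 1-based position in the order the vowels are listed; that reading agrees with the "high" (BV6) and "strength" (BV7) examples given later. `vowel_index` is a hypothetical helper name.

```python
# The 24 consonant and 16 vowel symbols (DECtalk notation), in the order listed.
CONSONANTS = ["b", "ch", "d", "dh", "f", "g", "hx", "jx", "k", "l", "m", "n",
              "nx", "p", "r", "s", "sh", "t", "th", "v", "w", "yx", "z", "zh"]
VOWELS = ["ae", "ah", "ao", "aw", "ax", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw", "yu"]


def vowel_index(symbol: str) -> int:
    """Return the subscript m (or n), from 1 to 16, of a vowel symbol."""
    return VOWELS.index(symbol) + 1  # raises ValueError for a non-vowel
```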
A demi-syllable sound is an acoustic unit which comprises one or more phonemes and spans from the start of a consonant sound to the middle of the following vowel sound in a syllable or from the middle of a vowel sound to the end of the following consonant sound.
Accordingly, we define a beginning demi-syllable to be an acoustic unit of the form

BC(i) + BVm(j)

where m = 1 to 16, BC is the beginning consonant sound with i = 0, 1 or more, and BV is the beginning vowel sound with j = 0 or 1, i and j representing the number of such sounds; and an ending demi-syllable to be an acoustic unit of the form

EVn(j) + EC(k)

where n = 1 to 16, EV is the ending vowel sound with j = 0 or 1, and EC is the ending consonant sound with k = 0 to 5, j and k representing the number of such sounds.

Note that in the formulaic definitions above, a demi-syllable need not have a vowel sound. That is, when j = 0, there is no vowel sound and m and n are undefined. Accordingly, the combination of a beginning demi-syllable and an ending demi-syllable does not necessarily form a syllable, which is commonly understood to contain a vowel sound. When j = 1 and m = n, the beginning demi-syllable and the ending demi-syllable have the same vowel sound.
{Vm} and {Vn}, for m, n = 1 to 16, are the sixteen vowel sounds in English mentioned above. In principle, more vowel sounds are possible, and some languages require them.
Although not denoted in the above formula, there are 24 consonant sounds in English, but more consonant sounds are possible, and some languages require them. In English, the number of contiguous beginning consonant sounds typically ranges from i = 0 to 4 (with i > 1 commonly called a "blend"), while the number of contiguous ending consonant sounds typically ranges from k = 0 to 5 (with k > 1 commonly called a "cluster"). In principle, these ranges may be extended in languages other than English.
Some examples follow to illustrate the above formula.
Example: the English word "pit". The phoneme string is {p ih t}. The beginning and ending demi-syllables are {p ih} and {ih t}, respectively.
Example: the English word "am" is represented by the phoneme string {ae m} and by the demi-syllables {ae} + {ae m}, or {BC(0) + BV1(1)} + {EV1(1) + EC(1)}.
Example: the English word "high" is represented by the phoneme string {hx ay} and by the demi-syllables {hx ay} + {ay}, or {BC(1) + BV6(1)} + {EV6(1) + EC(0)}.
Example: the English consonant sound "s" is represented by the phoneme string and demi-syllable {s}, or {BC(1) + BV(0)} + {EV(0) + EC(0)}.
Example: the English word "strength" is represented by the phoneme string {s t r eh nx th}, and its demi-syllables are {s t r eh} and {eh nx th}, or {BC(3) + BV7(1)} + {EV7(1) + EC(2)}.
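The split illustrated by these examples amounts to cutting a single syllable's phoneme string at its vowel, with the vowel shared by both halves. The Python sketch below reproduces the examples under the assumptions that the input is exactly one syllable in DECtalk symbols and that vowel subscripts follow the listed vowel order; `split_syllable`, `formula` and `DemiSyllables` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

VOWELS = ["ae", "ah", "ao", "aw", "ax", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw", "yu"]


@dataclass
class DemiSyllables:
    beginning: Tuple[str, ...]  # BC(i) + BVm(j)
    ending: Tuple[str, ...]     # EVn(j) + EC(k)
    i: int                      # number of beginning consonants
    j: int                      # number of vowels (0 or 1)
    k: int                      # number of ending consonants
    m: Optional[int]            # vowel subscript, undefined (None) when j = 0


def split_syllable(phonemes: Sequence[str]) -> DemiSyllables:
    """Split one syllable's phoneme string at its vowel into two demi-syllables."""
    vowel_positions = [p for p, ph in enumerate(phonemes) if ph in VOWELS]
    if len(vowel_positions) > 1:
        raise ValueError("more than one vowel: not a single syllable")
    if not vowel_positions:
        # j = 0: a lone consonant unit such as "s" is all beginning consonant
        return DemiSyllables(tuple(phonemes), (), len(phonemes), 0, 0, None)
    v = vowel_positions[0]
    return DemiSyllables(
        beginning=tuple(phonemes[: v + 1]),  # consonants up to and including the vowel
        ending=tuple(phonemes[v:]),          # the vowel and the consonants after it
        i=v,
        j=1,
        k=len(phonemes) - v - 1,
        m=VOWELS.index(phonemes[v]) + 1,
    )


def formula(d: DemiSyllables) -> str:
    """Render a split in the notation used above, e.g. {BC(1) + BV6(1)} + {EV6(1) + EC(0)}."""
    sub = "" if d.m is None else str(d.m)
    return f"{{BC({d.i}) + BV{sub}({d.j})}} + {{EV{sub}({d.j}) + EC({d.k})}}"
```

For example, `formula(split_syllable(["s", "t", "r", "eh", "nx", "th"]))` renders the "strength" split as {BC(3) + BV7(1)} + {EV7(1) + EC(2)}.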
The mechanical wave embodiments of consonant sounds are generally irregular and therefore difficult to manipulate without severe loss of fidelity to the original sound. In contrast, the mechanical wave embodiments of vowel sounds are regular and periodic and therefore can be easily duplicated, truncated and otherwise edited while retaining fidelity to the original sound.
By recognizing that vowel sounds are easily edited and manipulated, and by restricting concatenation of acoustic units to the vowel sounds, distortion is minimized. For a given vowel sound, i.e. where m = n, j = 1, and the sound of BVm(j) is the same as that of EVn(j), the exact formula for concatenating the BV sound of a first demi-syllable with the EV sound of a second demi-syllable to produce a satisfactory result is determined by trial and error.
It has been found that a concatenation ratio of 20/80 produces a suitable concatenation of demi-syllable sounds in most cases. In other words, 20% of the BV sound is joined with 80% of the EV sound. Consider the English word "base". If the "ey" sound is embodied in 25 cycles of a certain mechanical wave, then using the 20/80 concatenation ratio:

bey ("b" plus 5 cycles of "ey") + eys (20 cycles of "ey" plus "s") = "base"
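The "base" arithmetic can be written out as a short sketch. Here each list element stands for one cycle of the vowel waveform (real files hold samples, not labeled cycles), and which EV cycles to keep (the last ones, nearest the ending consonant) is an assumption; `split_vowel_cycles` and `join_at_vowel` are hypothetical names.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_vowel_cycles(total_cycles: int, bv_percent: int = 20) -> Tuple[int, int]:
    """Split a vowel's cycle count into (BV cycles, EV cycles), rounding half up."""
    bv = (total_cycles * bv_percent + 50) // 100
    return bv, total_cycles - bv


def join_at_vowel(
    bc: Sequence[T],
    bv: Sequence[T],
    ev: Sequence[T],
    ec: Sequence[T],
    total_cycles: int,
    bv_percent: int = 20,
) -> List[T]:
    """Keep the first BV cycles of the beginning file's vowel and the last EV cycles
    of the ending file's vowel, then join them between the two consonants."""
    n_bv, n_ev = split_vowel_cycles(total_cycles, bv_percent)
    if n_bv > len(bv) or n_ev > len(ev):
        raise ValueError("recorded vowel portions are shorter than the ratio requires")
    return list(bc) + list(bv[:n_bv]) + list(ev[len(ev) - n_ev:]) + list(ec)
```

With 25 cycles and the 20/80 ratio, `split_vowel_cycles(25)` gives 5 BV cycles and 20 EV cycles, matching the "base" example.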
This ratio is associated with a given vowel sound and represents a good compromise for producing a satisfactory sound when a beginning demi-syllable is concatenated with "each" ending demi-syllable. In other words, there are 24 ratios (one for each beginning demi-syllable) associated with each of the 16 vowel sounds.
As mentioned above, these ratios are a good compromise for the general case. There are exceptions for which this ratio produces unsatisfactory results. In such exceptional cases, a second ratio is developed and used instead of the one for the general case. To return to the example of "cat" above, the ratio of 20/80 may work well for the vowel sound "ae" in "cat" {kae + aet} and "cap" {kae + aep}. However, in "cam", the result may be unsatisfactory, and therefore a different, second ratio is used in joining {kae + aem}. This often occurs when the first ratio works well with unvoiced consonants (like p and t) but the second ratio must be used with voiced consonants which have no unvoiced counterparts. The voiced or liquid consonants which have no unvoiced counterparts are "m", "n", "l" and "r". The voiced consonants which have unvoiced counterparts are "d", "b", "g", "z", "zh" and "j", with unvoiced counterparts "t", "p", "k", "s", "sh" and "ch", respectively.
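A ratio table with exceptions could be organized as below. The general 20/80 ratio and the 24 x 16 layout come from the text; keying exceptions by (beginning consonant, vowel, ending consonant) is an assumption, and `SECOND_RATIO_PLACEHOLDER` is a hypothetical value, because the text does not state the second ratio.

```python
from typing import Dict, Tuple

Ratio = Tuple[int, int]  # (percent of BV sound kept, percent of EV sound kept)

CONSONANTS = ["b", "ch", "d", "dh", "f", "g", "hx", "jx", "k", "l", "m", "n",
              "nx", "p", "r", "s", "sh", "t", "th", "v", "w", "yx", "z", "zh"]
VOWELS = ["ae", "ah", "ao", "aw", "ax", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw", "yu"]

GENERAL_RATIO: Ratio = (20, 80)

# One general-case ratio per (beginning consonant, vowel): 24 x 16 = 384 entries.
GENERAL_TABLE: Dict[Tuple[str, str], Ratio] = {
    (c, v): GENERAL_RATIO for c in CONSONANTS for v in VOWELS
}

# Hypothetical second ratio: the text does not give its value.
SECOND_RATIO_PLACEHOLDER: Ratio = (30, 70)

# Exceptions keyed by (beginning consonant, vowel, ending consonant).
EXCEPTIONS: Dict[Tuple[str, str, str], Ratio] = {
    ("k", "ae", "m"): SECOND_RATIO_PLACEHOLDER,  # "cam": {kae + aem}
}

# Voiced consonants with no unvoiced counterpart, where exceptions often arise.
NO_UNVOICED_COUNTERPART = {"m", "n", "l", "r"}


def ratio_for(bc: str, vowel: str, ec: str) -> Ratio:
    """Return the exception ratio if one exists, otherwise the general ratio."""
    return EXCEPTIONS.get((bc, vowel, ec), GENERAL_TABLE[(bc, vowel)])


def likely_exception(ec: str) -> bool:
    """Flag ending consonants worth testing for a second ratio."""
    return ec in NO_UNVOICED_COUNTERPART
```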
The original digitized sound files for the demi-syllable sounds should be created as normalized as possible: voice spectrum (pitch and harmonics), volume, duration and aspiration should be as constant as possible for each sound file. To meet these requirements, the same human speaker should record all the sound files.
Consonant sounds are digitally recorded in their entirety and are, depending on the consonant, about 0.25 to 1 second in duration. A native English speaker typically pronounces a vowel sound in about 40 to 50 cycles. Vowel sounds may be digitally recorded in truncated form, or dynamically truncated, and used according to certain concatenation ratios. Beginning vowel (BV) sounds are typically recorded for about 20% of their natural duration, i.e. around 5 to 10 cycles; and ending vowel (EV) sounds are typically recorded for about 80% of their natural duration, i.e. around 20 cycles.
Learning the spoken part of a foreign language is most effectively accomplished through different and simultaneous (or almost simultaneous) sensory channels (seeing, hearing, touching, understanding). Learning through the different channels reinforces and augments the learning process. This is especially so when learning English, because the written and spoken forms of English lost their direct correlation centuries ago. Also, it is a common experience that the student of a foreign language has difficulty correctly hearing the spoken language: he or she hears what is not spoken or fails to hear what is spoken. The spoken part is often too abstract, in the sense that without context or some other anchor, the sounds are filtered inaccurately through the student's native language processes. The sounds float, as it were.
In response to this phenomenon, an important part of this invention is assigning a permanent physical value to each of the forty phonemes in English. A kinesthetic relationship is developed through the use of the keyboard.
The system introduces the sound to the student's aural sense by audibly creating the sound. It then addresses the visual sense by printing the sound on the screen (display device 100). The student then responds kinesthetically, entering data corresponding to what he or she thought was heard and seen, through his or her hands and fingers interacting spatially with a phonemic keyboard, such as keyboard 50, whose keys are spatially organized as described and illustrated.
Then a visual representation of the throat, mouth and tongue operating properly to make the proper sounds is presented, as illustrated in Figure 4. The student speaks the printed sound into a microphone, and the recording is played back along with the recording of a native speaker saying the sound. A multimedia dictionary entry is provided to give the student information about the sound, if it is an English word. Variations of the above-described procedures are possible, all to reinforce the learning process.
An extra advantage is provided by the keyboard illustrated in Figure 2. It "assigns" a "left" physical area to beginning demi-syllables, a "right" physical area to ending demi-syllables, and a "central" physical area to vowel sounds. Other physical layouts are possible (for example, with the keys arranged in areas, or in relative positions, different from those illustrated). These physical areas are spaced apart from one another.
Within each physical area are the specific keys necessary to create the desired demi-syllable.
Through use of the keyboard, a physical and spatial relationship is created between the written language and spoken language. One speaks with one's fingers, as it were. One speaks with one's eyes, as it were. Meaning dictionary.
Other keyboards are possible in which the physical areas of the keys are arranged differently with respect to each other than in the layout illustrated in Figure 2.
2 1 ~ 5~2 Also, the above scheme will work with an ordinary QWERTY keyboard (demi-syllables can be entered normally, one letter at a time). But there is a disadvantage. Because the unit of entering is a syllable on a Boswell board, a person trained on the Boswell board will normally be able to enter more quickly than a person trained on a QWERTY board which enters in units of a single letter. Also, the desired spatial connection or correlation does not exist in the QWERTY board because the keys for the demi-syllables are diffused across the entire board in a pattern which is unrelated to the phonemes the student is entering.
The conc~ten~tion process can be simple (audible production of the beginning demi-10 syllable followed immediately by the ending demi-syllable). But alternative smoothing techniques are possible and preferable depending on the opel~ling environment and requirements.
Such techniques include ~mming windows.
Thus, with the system of Figure 1, a student may create desired syllables and words by entering data from keyboard 50, and may hear the result over audio device 90. Similarly, with app,opliate progr~mming, the student may see the result on display 100. However, since words produced using keyboard 50 will not have characteristic stress or intonation in their syllables, the system includes word memory 70 in which predetermined words are stored with these features. Processing unit 70 is programmed to retrieve and playback a desired word, words or 20 complete sentences from memory 70 - which the student can hear with audio device 90 or see with display device 100, and then endeavour to reproduce with appropliate keyboard input.
Similarly, the student can be prolllpted beforehand to produce a desired word from the keyboard, and may subsequently listen to the sound of the word as reproduced from memory 70.
To enhance the learning process, processing unit 60 may advantageously include aprogram which visually indicates on display device 100 keys that were pressed to produce the sound of said desired word and keys that should have been pressed to produce the sound of the desired word. Preferably, as illustratd in Figure 5, this is achieved by a graphic illustration of 2~i 65862 the keys on keyboard 50, the keys which were pressed being visually highli~hted, the keys that should have been pressed being ~imil~rly hi~hlighted but with the addition of a distinguishing dot in the centre of the key. When all the highli~hte~ keys include the distinguishing dot, then the student knows that the correct keys have been pressed. Alternatively, the graphic illustration of the keys on display device 100 is indicated only upon an incorrect key being pressed.
Various modifications are possible to the embodiment which has been described herein without departing from the principles of the present invention. Accordingly, the present invention should be understood as encompassing all such modifications as are within the spirit 10 and scope of the claims which follow.
FIELD OF THE INVENTION
The present invention relates to a method of producing sounds and to its application in a language training method.
BACKGROUND TO THE INVENTION
Various devices are known in the art for high-speed phonetic keyboarding. These include the stenotype, which is most commonly used in North America, and the Palantype, which is most commonly used in Europe. The Palantype is similar in operation to the stenotype, but the keys on its keyboard are grouped in a "V" angle, which is more comfortable to operate.
The digital processing of data entered from such keyboards is also known. For example, an IBM Technical Disclosure Bulletin entitled Speech Translation Machine For The Deaf, Vol. 17, No. 12, May 1975, describes a stenotype machine adapted to translate and display speech in a phonetic word form. See also, for example: U.S. Patent No. 3,665,115 entitled Stenographic Apparatus For Providing A Magnetically Recorded Digitally Encoded Record, granted May 23, 1972 (Snook); and U.S. Patent No. 4,632,578 entitled Computerized Printing System, granted December 30, 1986 (Cuff et al.).
As well, it is known in text or word processing systems to store words or portions of words in digital memory for the purpose of subsequent addressing and retrieval depending upon input from a keyboard or other source: see, for example, U.S. Patent No. 4,342,085 entitled Stem Processing For Data Reduction In A Dictionary Storage File, granted July 27, 1982 (Glickman et al.); and U.S. Patent No. 4,439,836 entitled Electronic Translator, granted March 27, 1984 (Yoshida).
Further, it is well known to store words in digital memory in the form of sound files which can be addressed, retrieved and played out depending upon input from a keyboard or other source.
With systems which are designed to access words in memory, signal processing speed can present a fundamental difficulty, particularly when the data input may be fast and may require interpretation in order to locate the proper word in memory.
One area where the problem of signal processing speed can become particularly acute is sound production. If a keyboard input is used to enter data to produce the sound of desired words, the keyboard operator may effectively outrun the system unless the input is processed quickly and efficiently. The input may be at the rate of normal speech, yet the output may lag; any sense of real-time operation is then lost.
This problem can arise in the learning of a new language. Here, it is desirable for the student to work with sound patterns and combinations of sound patterns rather than with the series of letters that make up the language. But when a student keys in the input for a desired sound or combination of sounds in the language, it is important to hear the result immediately; otherwise, the learning process can be impaired. In this regard, it is desirable to provide a real-time sense of whether or not a mistake has been made at the keyboard.
Accordingly, a primary object of the present invention is to provide new and improved apparatus which can rapidly receive and process keyboard input representing sounds in a selected language.
A further object of the present invention is to provide a new and improved system which can rapidly receive and process such keyboard input and produce sounds or other indicia corresponding to such input without significant delay.
SUMMARY OF THE INVENTION
In accordance with one aspect of the present invention, there is provided apparatus for producing digital sound records from demi-syllable sound files, such apparatus including a keyboard means having a first set of keys for entering data corresponding to beginning consonant sounds used in a selected language, a second set of keys for entering data corresponding to ending consonant sounds used in that language, and a third set of keys for entering data corresponding to vowel sounds used in that language. A memory means digitally stores a plurality of demi-syllable sound files, each file corresponding to and representing the sound of a predetermined demi-syllable in the selected language. Further, the apparatus includes an input signal processing means operatively connected to the keyboard means and to the memory means for providing file address signals to the memory means in response to data entered from the keyboard means, and output signal processing means operatively connected to the memory means for reading the contents of the files so addressed and for combining the contents of successively read first and second ones of the files so read to produce a derivative sound record which, depending upon the sounds represented by the first and second files, corresponds to either a full or a partial concatenation of the first and second files.
Depending upon the sounds represented by the first and second files, the derivative sound record preferably corresponds to a full concatenation of the first and second files, a concatenation of the first file with a part of the second file, a concatenation of a part of the first file with the second file, or a concatenation of a part of the first file with a part of the second file.
By breaking the language down into demi-syllables and utilizing a memory store of demi-syllable sound files addressed with such a keyboard input, the basis for enhanced processing speed is established. However, when one demi-syllable sound file is combined with another, the combined or derivative sound may be a distorted representation of that which is actually desired. For example, if the demi-syllable "at" is simply concatenated at the end of the demi-syllable "ca", the result would not be the simple word "cat". Rather, it would be a disconcerting version of the word with a stretched-out central portion. Accordingly, depending upon the demi-syllables being combined, only a portion of the respective sound files may be utilized to produce the derivative sound record.
In accordance with another aspect of the present invention, there is provided a language training system which includes a keyboard means as described above for entering beginning consonant sounds, ending consonant sounds, and vowel sounds in a selected language; a first memory means for digitally storing a plurality of word sound files, each word sound file corresponding to and representing the sound of a predetermined word or words having defined meaning in that language; and a second memory means for digitally storing a plurality of demi-syllable sound files, each demi-syllable sound file corresponding to and representing the sound of a predetermined demi-syllable in that language. Further, the system includes signal processing means operatively connected to the keyboard means and to the second memory means for producing the sound of a desired word from the demi-syllable sound files in response to data entered from the keyboard means, said desired word corresponding to a word stored in the first memory means. As well, the system includes means operatively connected to the first memory means for producing the sound of the desired word as represented in the word sound files.
Advantageously, the word sound files in the first memory contain predetermined words stored with their proper stress and intonation, features which will be lacking in words formed from the combination of demi-syllable sound files. In the process of language training, the student may be prompted by the sound of such a word and called upon to enter the same word from the keyboard. Alternatively, the student may be visually prompted to enter the word from the keyboard, hear the sound resulting from the combination of demi-syllable sound files, and then hear the sound of the same word as produced from the word sound file.
To further advantage, the foregoing system may include means for visually indicating keys that were pressed to produce the sound of the desired word and keys that should have been pressed to produce the sound of the desired word.
The invention will now be described in more detail with reference to the following drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of a language training system in accordance with the present invention.
Figure 2 illustrates in more detail a portion of a keyboard used with the system shown in Figure 1.
Figure 3 is a chart illustrating keys and key combinations used to produce desired beginning consonant sounds, vowel sounds, and ending consonant sounds with the keyboard shown in Figure 1.
Figure 4 is a graphic representation of a model speaker's speaking.
Figure 5 illustrates a display device and the graphical representation of the keys pressed and the proper keys to press.
DETAILED DESCRIPTION
The block diagram shown in Figure 1 illustrates a language training system comprising a keyboard 50 for entering data, and a processing unit 60 which is in communication with the keyboard via bus 51 and capable of decoding and executing instructions. Conveniently, unit 60 will be part of a commercially available personal computer having a 386 or higher microprocessor running at 25 MHz or faster. The system includes two memories: a first memory 70 for digitally storing a plurality of word sound files; and a second memory 80 for digitally storing a plurality of demi-syllable sound files. Memory 70 is in bi-directional communication with processing unit 60 via bus 71. Similarly, memory 80 is in bi-directional communication with processing unit 60 via bus 81.
Processing unit 60 is also in communication with two output devices: one being an audio or sound device 90 connected via bus 91; the other being a visual or display device 100 connected via bus 101.
Each word sound file stored in memory 70 corresponds to and represents the sound of a predetermined word or words in a selected language; complete sentences may be included. While the selected language may be any one of a number of known languages, the language chosen for the embodiment now being described is English. In other words, as a language training system, the present embodiment is designed to help teach the English language, and the words stored in the word sound files of memory 70 are predetermined English words to be presented and taught to a student who is endeavouring to learn English. To enhance the learning of proper pronunciation, auditory recognition and auditory discrimination, the words are preferably stored with their proper intonation and stress.
Each demi-syllable sound file of memory 80 corresponds to and represents the sound of a predetermined demi-syllable in the English language. There are approximately 1400 such files, each prerecorded and stored, but necessarily without any comparative stress or intonation, because the words or syllables they may be used to form are not known in advance.
As shown in Figure 2, keyboard 50 includes twenty-four keys. Data input from these keys corresponds to the forty phonemes or sound segments of the English language. As such, keyboard 50 may be referred to as a phonemic keyboard.
The twenty-four keys are organized in three sets: a first set of eight keys 1 to 8 on the left-hand side for entering data corresponding to beginning consonant sounds used in the English language; a second set of ten keys 9 to 18 on the right-hand side for entering data corresponding to ending consonant sounds used in the English language; and a third set of six keys 19 to 24 for entering data corresponding to vowel sounds.
The chart shown in Figure 3 illustrates the keys and key combinations of keyboard 50 used to produce desired data input. Input occurs upon the release of selected keys and not when keys are pressed. Thus, as illustrated by way of example in Figure 3, data input corresponding to the beginning consonant sound "b" as in the word "book" is achieved upon the release of keys 2 and 8, normally operated with the left hand. Likewise, data input corresponding to the ending consonant sound "b" as in the word "cab" is achieved upon the release of keys 11 and 14, normally operated with the right hand.
In the case of a demi-syllable such as "be", for example, the appropriate data input is achieved by using the left hand to press keys 2 and 8, representing the beginning "b" sound, and the thumbs to press vowel sound keys 21 and 24, representing the ending "e" sound, and then releasing all keys.
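The release-triggered chord entry just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Boswell keyboard's firmware: the press/release event interface and the `ChordDecoder` class are hypothetical, and the chord tables hold only the three chords quoted in the text (beginning "b" = keys 2 and 8, ending "b" = keys 11 and 14, vowel "e" = keys 21 and 24), since the full Figure 3 chart is not reproduced here.

```python
# Key sets of keyboard 50 as described for Figure 2.
BEGIN_KEYS = frozenset(range(1, 9))    # keys 1-8: beginning consonants (left hand)
END_KEYS = frozenset(range(9, 19))     # keys 9-18: ending consonants (right hand)
VOWEL_KEYS = frozenset(range(19, 25))  # keys 19-24: vowel sounds (thumbs)

# Hypothetical, partial chord tables holding only the chords quoted in the text.
BEGIN_CHORDS = {frozenset({2, 8}): "b"}
END_CHORDS = {frozenset({11, 14}): "b"}
VOWEL_CHORDS = {frozenset({21, 24}): "e"}


class ChordDecoder:
    """Collects pressed keys and decodes the chord only when every key is released."""

    def __init__(self):
        self.held = set()   # keys currently down
        self.chord = set()  # every key pressed since the chord started

    def press(self, key):
        self.held.add(key)
        self.chord.add(key)

    def release(self, key):
        """Return (beginning, vowel, ending) when the last key goes up, else None."""
        self.held.discard(key)
        if self.held:
            return None  # input occurs on release of the chord, not on press
        chord, self.chord = self.chord, set()
        return (
            BEGIN_CHORDS.get(frozenset(chord & BEGIN_KEYS)),
            VOWEL_CHORDS.get(frozenset(chord & VOWEL_KEYS)),
            END_CHORDS.get(frozenset(chord & END_KEYS)),
        )


# Entering the demi-syllable "be": press 2, 8, 21, 24, then release all keys.
decoder = ChordDecoder()
for key in (2, 8, 21, 24):
    decoder.press(key)
for key in (2, 8, 21):
    decoder.release(key)
print(decoder.release(24))  # ('b', 'e', None)
```

Splitting the chord by key set mirrors the left, central and right areas of the keyboard, so each hand's contribution is decoded independently.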
It should be understood that the student will not normally be cognizant of the fact that a demi-syllable or combination of demi-syllables has been entered from the keyboard. From his or her standpoint, such recognition and the appropriate processing occur in the background.
A keyboard which enables data entry in accordance with the chart shown in Figure 3 is available from Boswell Industries Inc., 8455 Lougheed Highway, Burnaby, British Columbia, Canada.
Processing unit 60 includes an input signal processing means 62 and an output signal processing means 64. The input signal processing means includes an input register for receiving input data from keyboard 50 to be processed in accordance with instructions stored in a program memory of unit 60. Similarly, the output signal processing means includes an output register for storing signals which result from the program instructions and which are made available to audio device 90 and display device 100 over buses 91 and 101. Audio device 90 may comprise a conventional processor and amplifier, or headphones, capable of receiving digital sound signals from the output register and producing corresponding sounds. Display device 100 may comprise a conventional monitor compatible with the personal computer which is part of processing unit 60.
When input data is received from keyboard 50, it will not necessarily represent a demi-syllable or a pair of demi-syllables forming a word or syllable. However, when the data does represent a demi-syllable, the instruction program of processing unit 60 responds by addressing and looking up the corresponding sound file in memory 80. Output signal processing means 64 reads the sound file and makes the record available at the output register. When the keyboard data represents a pair of demi-syllables forming a word or syllable, the instruction program responds by addressing and looking up both demi-syllable sound files. Output signal processing means 64, also following the instruction program, then combines the files to produce a derivative sound record which is made available at the output register.
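The dispatch performed by processing unit 60 can be sketched as follows, as a rough illustration only. Modelling memory 80 as a dictionary keyed by phoneme tuples, and sound files as plain lists of samples, is an assumption made for the sketch; the `produce` function is hypothetical, and its combination step is a plain join, whereas the text goes on to explain why truncation is often needed.

```python
from typing import Dict, List, Optional, Tuple

Phonemes = Tuple[str, ...]   # a demi-syllable written as its phoneme string
Samples = List[float]        # a sound file's digitized samples


def produce(memory_80: Dict[Phonemes, Samples],
            beginning: Phonemes,
            ending: Optional[Phonemes] = None) -> Optional[Samples]:
    """Address memory 80 and return the record to place in the output register."""
    first = memory_80.get(beginning)
    if first is None:
        return None                    # keyboard data is not a known demi-syllable
    if ending is None:
        return list(first)             # single demi-syllable: read the file as-is
    second = memory_80.get(ending)
    if second is None:
        return None
    return list(first) + list(second)  # derivative record by simple joining


memory_80 = {("p", "ih"): [0.1, 0.2], ("ih", "t"): [0.3, 0.4]}
print(produce(memory_80, ("p", "ih"), ("ih", "t")))  # [0.1, 0.2, 0.3, 0.4]
```

A hash lookup keyed directly by the demi-syllable keeps addressing time roughly constant, which is in the spirit of the speed argument made in the background section.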
Since the number of demi-syllable sound files required for the English language is significantly smaller than the number of meaningful words or syllables that might be produced by combining the sounds represented by such files, a significantly smaller store of sounds is required in memory 80 than with a complete store of syllables. Further, the time required to address and look up a given file in response to keyboard data is shortened.
However, as noted above with the example of the word "cat", the simple concatenation or addition of one demi-syllable sound file to another can result in a distorted representation of the sound which is actually desired. Accordingly, in order to accurately reproduce a desired sound, it is necessary to consider the demi-syllable sound files which are being combined and, in many cases, to remove or truncate a portion of the files. Thus, depending upon the sounds which are represented by the two demi-syllable sound files being combined, the derivative sound record may be a full concatenation of the files, a concatenation of the first file with a part of the second file, a concatenation of a part of the first file with the second file, or a concatenation of a part of the first file with a part of the second file.
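The four forms of derivative sound record listed above can be captured by one function with two optional truncation lengths. This is a sketch under an assumption: truncation is taken to trim each file at the joint (the end of the first file and the start of the second), consistent with the later statement that concatenation is restricted to the vowel sounds. The function name and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence


def derivative_record(first: Sequence[float], second: Sequence[float],
                      keep_first: Optional[int] = None,
                      keep_second: Optional[int] = None) -> List[float]:
    """Combine two demi-syllable files, optionally truncating each at the joint.

    keep_first:  samples kept from the start of the first file (None = all).
    keep_second: samples kept from the end of the second file (None = all).
    """
    head = list(first) if keep_first is None else list(first[:keep_first])
    if keep_second is None:
        tail = list(second)
    else:
        tail = list(second[max(0, len(second) - keep_second):])
    return head + tail


a, b = [1, 2, 3, 4], [5, 6, 7, 8]
print(derivative_record(a, b))                  # full: [1, 2, 3, 4, 5, 6, 7, 8]
print(derivative_record(a, b, keep_second=2))   # first + part of second: [1, 2, 3, 4, 7, 8]
print(derivative_record(a, b, keep_first=2))    # part of first + second: [1, 2, 5, 6, 7, 8]
print(derivative_record(a, b, 2, 2))            # part + part: [1, 2, 7, 8]
```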
In English, there are twenty-four consonant sounds. Using the DECtalk symbols, these are {"b" "ch" "d" "dh" "f" "g" "hx" "jx" "k" "l" "m" "n" "nx" "p" "r" "s" "sh" "t" "th" "v" "w" "yx" "z" "zh"}. There are sixteen vowel sounds, {"ae" "ah" "ao" "aw" "ax" "ay" "eh" "er" "ey" "ih" "iy" "ow" "oy" "uh" "uw" "yu"}, which will be denoted Vm or Vn, where m, n = 1 to 16.
The smallest acoustic units which distinguish differences in meaning, and which serve as the building blocks of a language, are known as "phonemes". Phonemes comprise all consonant sounds and vowel sounds. Accordingly, in the English language, for example, there are 40 phonemes (24 consonant sounds and 16 vowel sounds).
A demi-syllable sound is an acoustic unit which comprises one or more phonemes and spans from the start of a consonant sound to the middle of the following vowel sound in a syllable or from the middle of a vowel sound to the end of the following consonant sound.
Accordingly, we define a beginning demi-syllable to be an acoustic unit of the form
BC(i) + BVm(j)
where m = 1 to 16, BC is the beginning consonant sound with i = 0, 1 or more, and BV is the beginning vowel sound with j = 0 or 1, i and j representing the number of such sounds; and we define an ending demi-syllable to be an acoustic unit of the form
EVn(j) + EC(k)
where n = 1 to 16, EV is the ending vowel sound with j = 0 or 1, and EC is the ending consonant sound with k = 0 to 5, j and k representing the number of such sounds.
Note that in the formulaic definitions of demi-syllable above, a demi-syllable need not necessarily have a vowel sound. That is, when j = 0, there is no vowel sound and m and n are undefined. Accordingly, the combination of a beginning demi-syllable and an ending demi-syllable does not necessarily form a syllable, which is commonly understood to have a vowel sound. When j = 1 and m = n, the beginning demi-syllable and the ending demi-syllable have the same vowel sound.
{Vm} and {Vn}, for m, n = 1 to 16, are the sixteen vowel sounds in English mentioned above. In principle, more vowel sounds are possible and, in some languages, are required.
Although not denoted in the above formula, there are 24 consonant sounds in English, but more consonant sounds are possible and, in some languages, are required. In English, the number of contiguous beginning consonant sounds typically ranges from i = 0 to 4 (i > 1 is commonly called a "blend"), while the number of contiguous ending consonant sounds typically ranges from k = 0 to 5 (k > 1 is commonly called a "cluster"). In principle, these ranges may be extended in languages other than English.
Some examples follow to illustrate the above formula.
Example: the English word "pit". The phoneme string is {p ih t}. The beginning demi-syllable and ending demi-syllable are respectively {p ih} and {ih t}.
Example: the English word "am" is represented by the phoneme string {ae m} and by the demi-syllables {ae} + {ae m}, or {BC(0) + BV1(1)} + {EV1(1) + EC(1)}.
Example: the English word "high" is represented by the phoneme string {hx ay} and by the demi-syllables {hx ay} + {ay}, or {BC(1) + BV6(1)} + {EV6(1) + EC(0)}.
Example: the English consonant sound "s" is represented by the phoneme string and demi-syllable {s}, or {BC(1) + BV(0)} + {EV(0) + EC(0)}.
Example: the English word "strength" is represented by the phoneme string {s t r eh nx th}, and the demi-syllables are {s t r eh} and {eh nx th}, or {BC(3) + BV7(1)} + {EV7(1) + EC(2)}.
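The splitting rule behind these examples can be sketched as a small function that cuts a syllable's phoneme string at its vowel and renders the result in the notation used above. Two assumptions apply: the input is a single syllable, and its first vowel symbol marks the split. The vowel index m is taken from the order in which the sixteen vowel sounds were listed earlier ("ae" = 1, ..., "yu" = 16). The functions `split_syllable` and `formula` are hypothetical names for illustration.

```python
from typing import List, Sequence, Tuple

# The sixteen English vowel sounds in DECtalk symbols, in the order listed above;
# position + 1 gives the index m (or n) used in BVm / EVn.
VOWELS = ["ae", "ah", "ao", "aw", "ax", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw", "yu"]


def split_syllable(phonemes: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split one syllable into (beginning, ending) demi-syllables at its vowel."""
    for pos, p in enumerate(phonemes):
        if p in VOWELS:
            # The vowel belongs to both halves: BC... + V and V + EC...
            return list(phonemes[: pos + 1]), list(phonemes[pos:])
    return list(phonemes), []  # j = 0: no vowel, e.g. the consonant sound "s"


def formula(phonemes: Sequence[str]) -> str:
    """Render the demi-syllable pair in the {BC(i) + BVm(j)} notation above."""
    begin, end = split_syllable(phonemes)
    if not end:
        return f"{{BC({len(begin)}) + BV(0)}} + {{EV(0) + EC(0)}}"
    m = VOWELS.index(begin[-1]) + 1
    return (f"{{BC({len(begin) - 1}) + BV{m}(1)}} + "
            f"{{EV{m}(1) + EC({len(end) - 1})}}")


print(split_syllable(["p", "ih", "t"]))           # (['p', 'ih'], ['ih', 't'])
print(formula(["s", "t", "r", "eh", "nx", "th"]))
# {BC(3) + BV7(1)} + {EV7(1) + EC(2)}
```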
The mechanical wave embodiments of consonant sounds are generally irregular and therefore difficult to manipulate without severe loss of fidelity to the original sound. In contrast, the mechanical wave embodiments of vowel sounds are regular and periodic and therefore can be easily duplicated, truncated and otherwise edited while retaining fidelity to the original sound.
By recognizing that vowel sounds are easily editable and manipulable, and by restricting the concatenation of acoustic units to the vowel sounds, distortion is minimized. For a given vowel sound (i.e. when j = 1, m = n, and the sound of BVm(j) is the same as that of EVn(j)), the exact formula for concatenating the BV sound of a first demi-syllable with the EV sound of a second demi-syllable so as to produce a satisfactory result is determined by trial and error.
It has been found that a concatenation ratio of 20/80 produces a suitable concatenation of demi-syllable sounds in most cases. In other words, 20% of the BV sound is joined with 80% of the EV sound. Consider the example of the English word "base". If the "ey" sound is embodied in 25 cycles of a certain mechanical wave, then, using the 20/80 concatenation ratio:
bey ("b" plus 5 cycles of "ey") + eys (20 cycles of "ey" plus "s") = "base"
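The "base" arithmetic above (25 cycles split 20/80 into 5 + 20) can be turned into a sample-level sketch. The assumptions here are the sketch's own, not the patent's: each demi-syllable's vowel portion is available as a separate list of samples, the vowel's period in samples is known, the cycles kept are those adjacent to each consonant, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_cycles(total_cycles: int, bv_percent: int = 20) -> Tuple[int, int]:
    """Vowel cycles taken from the BV and EV sides for a BV/EV ratio (default 20/80)."""
    bv = round(total_cycles * bv_percent / 100)
    return bv, total_cycles - bv


def join_at_vowel(bc: Sequence[float], bv: Sequence[float],
                  ev: Sequence[float], ec: Sequence[float],
                  period: int, total_cycles: int,
                  bv_percent: int = 20) -> List[float]:
    """Concatenate two demi-syllables at their shared vowel.

    bc/ec are the consonant samples, bv/ev the recorded vowel samples, and
    period is the (assumed known) number of samples in one vowel cycle.
    """
    bv_cycles, ev_cycles = split_cycles(total_cycles, bv_percent)
    head = bv[: bv_cycles * period]                   # cycles just after the onset
    tail = ev[max(0, len(ev) - ev_cycles * period):]  # cycles just before the coda
    return list(bc) + list(head) + list(tail) + list(ec)


print(split_cycles(25))  # (5, 20): "b" + 5 cycles of "ey", then 20 cycles of "ey" + "s"
```

Cutting on whole cycles exploits the periodicity of vowel waveforms noted earlier, so the joint falls at a matching point in the wave.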
This ratio is associated with a given vowel sound and represents a good compromise for producing a satisfactory sound upon concatenating a beginning demi-syllable with "each" ending demi-syllable. In other words, there are 24 ratios (one for each beginning demi-syllable) associated with each of the 16 vowel sounds.
As mentioned above, these ratios are a good compromise for the general case. There are exceptions for which this ratio produces unsatisfactory results. In such exceptional cases, a second ratio is developed and used instead of the one for the general case. To return to the example of "cat" above, the ratio of 20/80 may work well for the vowel sound "ae" in "cat" {kae + aet} and "cap" {kae + aep}. However, in "cam", the result may be unsatisfactory, and therefore a different, second ratio is used in joining {kae + aem}. This often occurs when the first ratio works well with unvoiced consonants (like "p" and "t") but the second ratio must be used with voiced consonants which have no unvoiced counterparts. The voiced or liquid consonants which have no unvoiced counterparts are "m", "n", "l" and "r". The voiced consonants which have unvoiced counterparts are "d", "b", "g", "z", "zh" and "j", with unvoiced counterparts respectively of "t", "p", "k", "s", "sh" and "ch".
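The ratio selection just described can be sketched as an override table, filled by listening tests, with a fallback heuristic for codas that begin with "m", "n", "l" or "r". Treating that "often occurs" observation as a hard rule is a simplification made for this sketch. `SECOND_RATIO` is a hypothetical placeholder, because the text gives no value for the second ratio, and `choose_ratio` is an illustrative name.

```python
from typing import Dict, Optional, Sequence, Tuple

Ratio = Tuple[int, int]                    # (percent of BV kept, percent of EV kept)
Key = Tuple[Tuple[str, ...], Tuple[str, ...]]

FIRST_RATIO: Ratio = (20, 80)   # general-case ratio given in the text
SECOND_RATIO: Ratio = (30, 70)  # HYPOTHETICAL placeholder; the text gives no value

# Voiced consonants with no unvoiced counterparts, as listed above.
NO_UNVOICED_COUNTERPART = {"m", "n", "l", "r"}


def choose_ratio(beginning: Sequence[str], ending: Sequence[str],
                 overrides: Optional[Dict[Key, Ratio]] = None) -> Ratio:
    """Pick the ratio for joining a beginning and an ending demi-syllable."""
    key = (tuple(beginning), tuple(ending))
    if overrides and key in overrides:
        return overrides[key]              # exception found by trial and error
    coda = list(ending[1:])                # consonants after the shared vowel
    if coda and coda[0] in NO_UNVOICED_COUNTERPART:
        return SECOND_RATIO                # heuristic for cases like {kae + aem}
    return FIRST_RATIO


print(choose_ratio(["k", "ae"], ["ae", "t"]))  # (20, 80)
print(choose_ratio(["k", "ae"], ["ae", "m"]))  # (30, 70), the placeholder second ratio
```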
The original digitized sound files for the demi-syllable sounds should be as normalized as possible. Voice spectrum (pitch and harmonics), volume, duration and aspiration should be as constant as possible for each sound file. Obviously, to meet these requirements, the same human speaker should record all of the sound files.
Consonant sounds are digitally recorded in their entirety and are, depending on the consonant, about 0.25 to 1 second in duration. The English native speaker typically pronounces a vowel sound in about 40 to 50 cycles. Vowel sounds may be digitally recorded in truncated form, or dynamically truncated and used according to certain concatenation ratios. Beginning vowel (BV) sounds are typically recorded for about 20% of their natural duration, i.e. around 5 to 10 cycles; and ending vowel (EV) sounds are typically recorded for about 80% of their natural duration, i.e. around 20 cycles.
Learning the spoken part of a foreign language is most effectively accomplished through different and simultaneous (or almost simultaneous) sensory channels (seeing, hearing, touching, understanding). Learning through the different channels reinforces and augments the learning process. This is even more so when learning English, because the written form of English and the spoken form of English lost their direct correlation centuries ago. Also, it is a common experience that the student of a foreign language has difficulty correctly hearing the spoken language: he or she hears what is not spoken or fails to hear what is spoken. The spoken part is often too abstract, in the sense that, without context or some other anchor, the sounds are filtered inaccurately through the student's native language processes. The sounds float, as it were.
In response to this phenomenon, an important part of this invention is the assigning of a permanent physical value to each of the forty phonemes in English. A kinesthetic relationship is developed through the use of the keyboard.
The system introduces the sound to the student's aural sense by audibly creating the sound. It then addresses the visual sense by printing the sound on the screen (display device 100). The student then responds kinesthetically by entering data corresponding to what he or she thought was heard and seen, through his or her hands and fingers interacting spatially with a phonemic keyboard, such as keyboard 50, whose keys are spatially organized as described and illustrated.
A visual representation is then presented of the throat, mouth and tongue operating properly to make the proper sounds, as illustrated in Figure 4. The student speaks the printed sound into a microphone, and the recording is played back together with the recording of a native speaker saying the sound. A multi-media dictionary entry is provided to give the student information about the sound, if it is an English word. Obviously, variations of the above-described procedures are possible, all to reinforce the learning process.
An extra advantage is provided by the keyboard illustrated in Figure 2. It "assigns" a "left" physical area to beginning demi-syllables, a "right" physical area to ending demi-syllables, and a "central" physical area to vowel sounds. Obviously, other physical layouts are possible (for example, with the keys arranged in areas, or at relative positions, different from those illustrated). These physical areas are spaced apart from one another.
Within each physical area are the specific keys necessary to create the desired demi-syllable.
Through use of the keyboard, a physical and spatial relationship is created between the written language and spoken language. One speaks with one's fingers, as it were. One speaks with one's eyes, as it were. Meaning dictionary.
Other keyboards are possible in which the physical areas of the keys are arranged differently with respect to each other than in the layout illustrated in Figure 2.
Also, the above scheme will work with an ordinary QWERTY keyboard (demi-syllables can be entered normally, one letter at a time). But there is a disadvantage. Because the unit of entry on a Boswell board is a syllable, a person trained on the Boswell board will normally be able to enter data more quickly than a person trained on a QWERTY board, which enters in units of a single letter. Also, the desired spatial connection or correlation does not exist on the QWERTY board, because the keys for the demi-syllables are diffused across the entire board in a pattern which is unrelated to the phonemes the student is entering.
The concatenation process can be simple (audible production of the beginning demi-syllable followed immediately by the ending demi-syllable). But alternative smoothing techniques are possible, and may be preferable, depending on the operating environment and requirements.
Such techniques include Hamming windows.
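The text names Hamming windows only as one example of a smoothing technique. The sketch below shows one plausible reading, and it is a choice made for this example rather than the patent's stated method: the last samples of the first demi-syllable are crossfaded into the first samples of the second, using the rising and falling halves of a Hamming window as weights, normalized at each sample so the level does not bump at the joint. The overlap length is an assumed parameter.

```python
import math
from typing import List, Sequence


def hamming(length: int) -> List[float]:
    """Symmetric Hamming window: 0.54 - 0.46*cos(2*pi*n/(length-1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def crossfade(a: Sequence[float], b: Sequence[float], overlap: int) -> List[float]:
    """Join a and b, blending the last/first `overlap` samples of each.

    The weights are the rising and falling halves of a Hamming window,
    normalized at each sample so that they sum to 1.
    """
    overlap = max(0, min(overlap, len(a), len(b)))
    if overlap == 0:
        return list(a) + list(b)   # plain concatenation
    w = hamming(2 * overlap)
    rise, fall = w[:overlap], w[overlap:]
    joint = [(x * f + y * r) / (f + r)
             for x, y, r, f in zip(a[len(a) - overlap:], b[:overlap], rise, fall)]
    return list(a[: len(a) - overlap]) + joint + list(b[overlap:])


out = crossfade([0.0] * 10, [1.0] * 10, overlap=4)
print(len(out))  # 16
```

Because a Hamming window never reaches zero, the weights `f + r` are always positive and the normalization is safe.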
Thus, with the system of Figure 1, a student may create desired syllables and words by entering data from keyboard 50, and may hear the result over audio device 90. Similarly, with appropriate programming, the student may see the result on display device 100. However, since words produced using keyboard 50 will not have characteristic stress or intonation in their syllables, the system includes word memory 70, in which predetermined words are stored with these features. Processing unit 60 is programmed to retrieve and play back a desired word, words or complete sentences from memory 70, which the student can hear with audio device 90 or see with display device 100, and then endeavour to reproduce with appropriate keyboard input.
Similarly, the student can be prompted beforehand to produce a desired word from the keyboard, and may subsequently listen to the sound of the word as reproduced from memory 70.
To enhance the learning process, processing unit 60 may advantageously include a program which visually indicates on display device 100 the keys that were pressed to produce the sound of said desired word and the keys that should have been pressed to produce the sound of the desired word. Preferably, as illustrated in Figure 5, this is achieved by a graphic illustration of the keys on keyboard 50, the keys which were pressed being visually highlighted, and the keys that should have been pressed being similarly highlighted but with the addition of a distinguishing dot in the centre of the key. When all the highlighted keys include the distinguishing dot, the student knows that the correct keys have been pressed. Alternatively, the graphic illustration of the keys on display device 100 is shown only when an incorrect key has been pressed.
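The Figure 5 feedback can be sketched as a mapping from key numbers to markings, plus a test for the alternative "display only on error" mode. The marking strings and function names are hypothetical. One design note: with only the two markings described, "every highlighted key has a dot" detects extra keys but not omitted ones, so this sketch compares the full key sets when deciding whether an error occurred. That comparison is an addition made for the sketch.

```python
from typing import Dict, Iterable


def key_marks(pressed: Iterable[int], expected: Iterable[int]) -> Dict[int, str]:
    """Figure 5 style markings: pressed keys are highlighted; keys that should
    have been pressed are highlighted and carry a distinguishing dot."""
    marks = {key: "highlight" for key in pressed}
    marks.update({key: "highlight+dot" for key in expected})
    return marks


def show_keyboard(pressed: Iterable[int], expected: Iterable[int],
                  only_on_error: bool = False) -> bool:
    """Whether to draw the keyboard; the alternative mode draws it only on mistakes."""
    return not only_on_error or set(pressed) != set(expected)


# Student meant "be" (keys 2, 8, 21, 24) but pressed 7 instead of 8.
marks = key_marks({2, 7, 21, 24}, {2, 8, 21, 24})
print(marks[7], marks[8])  # highlight highlight+dot
```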
Various modifications are possible to the embodiment which has been described herein without departing from the principles of the present invention. Accordingly, the present invention should be understood as encompassing all such modifications as are within the spirit and scope of the claims which follow.
Claims (19)
1. A method of converting an input string of phonemes into speech, comprising the performance of the steps of:
(a) creating a plurality of beginning demi-syllable sounds, where each beginning demi-syllable sound is of the form {BC(i) + BVm(j)}, where m = 1 to 16, BC is the beginning consonant sound with i = 0, 1 or more, and BV is the beginning vowel sound with j = 0 or 1; and
(b) creating a plurality of ending demi-syllable sounds, where each ending demi-syllable sound is of the form {EVn(j) + EC(k)}, where n = 1 to 16, EV is the ending vowel sound with j = 0 or 1, and EC is the ending consonant sound with k = 0, 1 or more;
(c) for each beginning demi-syllable sound having a given vowel sound, developing a first concatenation ratio for use with each ending demi-syllable sound of said plurality of ending demi-syllable sounds which has the same vowel sound, said first concatenation ratio indicating the relative amounts respectively of the BV of said beginning demi-syllable sound and the EV of said ending demi-syllable sound, that are to be concatenated, said first concatenation ratio being such that a satisfactory sound is produced upon concatenation of said beginning demi-syllable sound and each said ending demi-syllable sound which has the same vowel sound, which have been respectively truncated according to said first concatenation ratio;
(d) upon instruction to convert the input string of phonemes, producing the sound of the input string of phonemes by (i) identifying the beginning demi-syllable sound corresponding to the input string of phonemes, (ii) truncating said corresponding beginning demi-syllable sound according to the first concatenation ratio associated with said beginning demi-syllable sound, (iii) audibly producing said truncated corresponding beginning demi-syllable sound, (iv) identifying the ending demi-syllable sound corresponding to the input string of phonemes, (v) truncating said corresponding ending demi-syllable sound according to the first concatenation ratio associated with said ending demi-syllable sound, (vi) audibly producing said truncated corresponding ending demi-syllable sound,
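Steps (a) to (d) of claim 1 can be illustrated with integer sample buffers. This is a sketch only, with several assumptions. Each stored beginning file is taken to hold BC followed by BV with a known vowel boundary, and each ending file holds EV followed by EC. A ratio p/q is read as "keep the leading p/(p+q) share of BV and the trailing q/(p+q) share of EV". The consonant portions are kept whole. The claim does not spell out the split this precisely, and every name below is hypothetical.

```python
from dataclasses import dataclass

Key = tuple[str, str]  # hypothetical demi-syllable key, e.g. ("k", "a") or ("a", "t")


@dataclass(frozen=True)
class BeginDemi:
    samples: tuple[int, ...]  # BC samples followed by BV samples
    vowel_start: int          # index at which BV begins


@dataclass(frozen=True)
class EndDemi:
    samples: tuple[int, ...]  # EV samples followed by EC samples
    vowel_end: int            # index at which EV ends and EC begins


def truncate_begin(b: BeginDemi, p: int, total: int) -> list[int]:
    """Step (d)(ii): keep all of BC plus the leading p/total share of BV."""
    vowel_len = len(b.samples) - b.vowel_start
    return list(b.samples[: b.vowel_start + vowel_len * p // total])


def truncate_end(e: EndDemi, q: int, total: int) -> list[int]:
    """Step (d)(v): keep the trailing q/total share of EV plus all of EC."""
    kept = e.vowel_end * q // total
    return list(e.samples[e.vowel_end - kept :])


def synthesize(
    syllable: tuple[str, str, str],
    begins: dict[Key, BeginDemi],
    ends: dict[Key, EndDemi],
    ratios: dict[tuple[Key, Key], tuple[int, int]],
) -> list[int]:
    bc, v, ec = syllable
    bkey, ekey = (bc, v), (v, ec)    # steps (d)(i) and (iv): identify both demi-syllables
    p, q = ratios[(bkey, ekey)]      # first concatenation ratio for this pair (step (c))
    total = p + q
    head = truncate_begin(begins[bkey], p, total)
    tail = truncate_end(ends[ekey], q, total)
    return head + tail  # played back in order: steps (iii) then (vi)
```

Audio output is out of scope here. The returned buffer simply places the truncated beginning sound ahead of the truncated ending sound.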
2. A method of claim 1, further comprising the steps of:
(e) for each beginning demi-syllable sound of said given vowel, developing a second concatenation ratio for use with each ending demi-syllable sound, said second concatenation ratio indicating the relative amounts respectively of the BV of said beginning demi-syllable sound and the EV of said ending demi-syllable sound, that are to be concatenated, said second concatenation ratio being such that a satisfactory sound is produced upon concatenation of said beginning demi-syllable sound and each said ending demi-syllable sound which has the same vowel sound, which have been respectively truncated according to said second concatenation ratio;
(f) detecting the presence of a designated demi-syllable in the input string of phonemes;
(g) in case of detection of said designated demi-syllable sound, truncating in step (d) according to said second concatenation ratio instead of said first concatenation ratio.
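Steps (f) and (g) of claim 2 amount to a lookup that switches ratio tables when a designated demi-syllable appears anywhere in the input. This minimal sketch keys ratios by (beginning, ending) demi-syllable pairs, an assumption carried over from claim 1. The function name and key format are hypothetical.

```python
Key = tuple[str, str]   # hypothetical demi-syllable key, e.g. ("k", "a")
Pair = tuple[Key, Key]  # (beginning demi-syllable, ending demi-syllable)
Ratio = tuple[int, int]


def select_ratio(
    pair: Pair,
    input_demis: list[Key],
    designated: set[Key],
    first: dict[Pair, Ratio],
    second: dict[Pair, Ratio],
) -> Ratio:
    """Step (f): detect a designated demi-syllable; step (g): use the second ratio if found."""
    if designated.intersection(input_demis):
        return second[pair]
    return first[pair]
```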
3. The method of claims 1-2, further comprising the step of:
(h) a user inputting the input string of phonemes by means of a keyboard in which a beginning demi-syllable sound and an ending demi-syllable sound are physically requested by the user touching an area of the keyboard associated with the beginning demi-syllable which is physically spaced apart from an area of the keyboard associated with the ending demi-syllable.
4. The method of claim 3, further comprising the step of:
(i) automatically producing audibly for the user a test input string of phonemes, in response to which the user inputs the input string of phonemes.
5. The method of claim 4, further comprising the step of:
(j) in response to the user inputting in step (h), producing in a visual or auditory medium the sounds corresponding to the inputted data.
6. The method of claims 1-5, wherein said first concatenation ratio is 20/80.
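Claim 6 fixes the first concatenation ratio at 20/80 but does not say how the split is measured. One possible reading, which is an assumption and not stated in the claims, is this: the leading 20 % of BV and the trailing 80 % of EV are kept, and the consonant portions are kept whole. Under that reading, a hypothetical pair of demi-syllables whose vowel portions each last 200 ms would give:

```latex
T_{\text{out}} = T_{BC} + 0.2\,T_{BV} + 0.8\,T_{EV} + T_{EC}

0.2 \times 200\ \text{ms} + 0.8 \times 200\ \text{ms}
  = 40\ \text{ms} + 160\ \text{ms} = 200\ \text{ms of vowel}
```

With equal-length vowel portions, the joined vowel is therefore as long as either source vowel.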
7. The method of claims 1-6, wherein said demi-syllable sound is recorded after being truncated according to said first concatenation ratio and, in said step (d), audibly producing the sound of said demi-syllable sound is accomplished by playing said recorded demi-syllable sound.
8. A system for a user to input an input string of phonemes, comprising: (a) input means for entering by the user a desired demi-syllable;
(b) computing means for implementing the method of claim 1 (or 2, etc.);
(c) output means for outputting a demi-syllable in a form receivable by the user.
9. The system of claim 8 wherein said indication means includes means for audibly producing the input string of demi-syllable sounds entered.
10. The system of claim 9 wherein said indication means includes means for visually displaying the input string of demi-syllable sounds entered.
11. The system of claims 8-10 wherein said input means is a keyboard having a first set of keys for entering data corresponding to beginning consonant sounds of a demi-syllable sound, a second set of keys for entering data corresponding to vowel sounds of a demi-syllable sound and a third set of keys for entering data corresponding to end consonant sounds of a demi-syllable sound.
12. The system of claims 8-11 wherein said entering of data is accomplished by pressing the suitable keys of said first, second and third sets of keys and then releasing simultaneously said pressed keys.
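Claims 11 and 12 describe a chord keyboard. Keys from three sets (beginning consonants, vowels, ending consonants) are held together and the chord is registered on release. The sketch below assumes the common chording convention that "simultaneous" release is detected when the last held key comes up. The `LAYOUT` table, the key names and the `ChordReader` class are hypothetical; the patent does not give a layout. Consonant clusters (i or k greater than 1) are ordered by their position in the layout.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical layout: key -> (key set, sound). "t" and "T" are distinct keys.
LAYOUT = {
    "s": ("BC", "s"), "k": ("BC", "k"), "t": ("BC", "t"),
    "a": ("V", "a"), "o": ("V", "o"),
    "T": ("EC", "t"), "N": ("EC", "n"),
}


def decode(chord: set[str]) -> tuple[str, str, str]:
    """Turn a set of pressed keys into (beginning consonants, vowel, ending consonants)."""
    parts = {"BC": "", "V": "", "EC": ""}
    for key, (group, sound) in LAYOUT.items():  # layout order fixes cluster order
        if key in chord:
            parts[group] += sound
    return parts["BC"], parts["V"], parts["EC"]


@dataclass
class ChordReader:
    down: set[str] = field(default_factory=set)   # keys currently held
    chord: set[str] = field(default_factory=set)  # every key pressed in this chord

    def press(self, key: str) -> None:
        self.down.add(key)
        self.chord.add(key)

    def release(self, key: str) -> Optional[tuple[str, str, str]]:
        """Return the decoded syllable once all keys are up, otherwise None."""
        self.down.discard(key)
        if self.down:
            return None
        chord, self.chord = self.chord, set()
        return decode(chord)
```

Pressing `k`, `a` and `T` and then releasing all three yields `("k", "a", "t")`. Under this assumed layout, that is the syllable "kat", split into its beginning and ending demi-syllables.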
13. The system of claims 8-12 further comprising means for automatically selecting a desired phoneme for the user to enter data through said input means to attempt to match, and wherein said indication means displays the phoneme which said entered data represents and displays a visual representation of the keyboard indicating which keys were pressed and which keys correspond to the desired phoneme.
14. Apparatus for producing digital sound records from demi-syllable sound files, comprising:
(a) a keyboard means comprising a first set of keys for entering data corresponding to beginning consonant sounds used in a selected language, a second set of keys for entering data corresponding to ending consonant sounds used in said language, and a third set of keys for entering data corresponding to vowel sounds used in said language;
(b) memory means for digitally storing a plurality of demi-syllable sound files, each file corresponding to and representing the sound of a predetermined demi-syllable in said language;
(c) input signal processing means operatively connected to said keyboard means and to said memory means for providing file address signals to said memory means in response to data entered from said keyboard means; and,
(d) output signal processing means operatively connected to said memory means for reading the contents of said files so addressed and for combining the contents of successively read first and second ones of said files so read to produce a derivative sound record which, depending upon the sounds represented by said first and second files, corresponds to either a full or a partial concatenation of said first and second files.
15. A system as defined in Claim 14 wherein, depending upon the sounds represented by said first and second files, said derivative sound record corresponds to:
(a) a full concatenation of said first and second files;
(b) a concatenation of said first file with a part of said second file;
(c) a concatenation of a part of said first file with said second file; or,
(d) a concatenation of a part of said first file with a part of said second file.
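The four outcomes listed in claim 15 map naturally onto two optional cut points, one for each file read from memory. In this minimal sketch, `first_cut` and `second_cut` are hypothetical parameters, and sound files are modelled as lists of integer samples. The claims leave open how the cut points are derived from the sounds represented; claim 1 suggests one way, using a concatenation ratio.

```python
from typing import List, Optional


def combine(
    first: List[int],
    second: List[int],
    first_cut: Optional[int] = None,
    second_cut: Optional[int] = None,
) -> List[int]:
    """Concatenate two demi-syllable files, optionally truncating either one.

    first_cut:  keep first[:first_cut]   (None keeps the whole first file)
    second_cut: keep second[second_cut:] (None keeps the whole second file)
    """
    head = first if first_cut is None else first[:first_cut]
    tail = second if second_cut is None else second[second_cut:]
    return head + tail


def concatenation_case(first_cut: Optional[int], second_cut: Optional[int]) -> str:
    """Label a call with the matching case (a)-(d) of claim 15."""
    key = (first_cut is not None, second_cut is not None)
    return {(False, False): "a", (False, True): "b", (True, False): "c", (True, True): "d"}[key]
```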
16. A language training system, comprising:
(a) a keyboard means comprising a first set of keys for entering data corresponding to beginning consonant sounds used in a selected language, a second set of keys for entering data corresponding to ending consonant sounds used in said language, and a third set of keys for entering data corresponding to vowel sounds used in said language;
(b) a first memory means for digitally storing a plurality of demi-syllable sound files, each demi-syllable sound file corresponding to and representing the sound of a predetermined word or words having defined meaning in said language;
(c) a second memory means for digitally storing a plurality of demi-syllable sound files, each demi-syllable sound file corresponding to and representing the sound of a predetermined demi-syllable in said selected language;
(d) signal processing means operatively connected to said keyboard means and to said second memory means for producing the sound of a desired word from said demi-syllable sound files in response to data entered from said keyboard means, said desired word corresponding to a word stored in said first memory means; and,
(e) means operatively connected to said first memory means for producing the sound of said desired word as represented in said demi-syllable sound files.
17. A language training system as defined in Claim 16, further including means for visually indicating keys that were pressed to produce the sound of said desired word and keys that should have been pressed to produce the sound of said desired word.
18. A language training system as defined in Claim 16, wherein said signal processing means comprises:
(a) input signal processing means for providing file address signals to said second memory means in response to said data entered from said keyboard means; and,
(b) output signal processing means for reading the contents of demi-syllable sound files so addressed and for combining the contents of successively read first and second ones of said demi-syllable sound files so read to produce a derivative sound record which, depending upon the sounds represented by said first and second demi-syllable sound files, corresponds to either a full or a partial concatenation of said first and second demi-syllable sound files.
19. A language training system as defined in Claims 16-18 wherein, depending upon the sounds represented by said first and second demi-syllable sound files, said derivative sound record corresponds to:
(a) a full concatenation of said first and second demi-syllable sound files;
(b) a concatenation of said first demi-syllable sound file with a part of said second demi-syllable sound file;
(c) a concatenation of a part of said first demi-syllable sound file with said second demi-syllable sound file; or,
(d) a concatenation of a part of said first demi-syllable sound file with a part of said second demi-syllable sound file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002165862A CA2165862A1 (en) | 1995-12-21 | 1995-12-21 | Method and system for producing sounds |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2165862A1 true CA2165862A1 (en) | 1997-06-22 |
Family
ID=4157212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002165862A Abandoned CA2165862A1 (en) | 1995-12-21 | 1995-12-21 | Method and system for producing sounds |
Country Status (1)
Country | Link |
---|---|
CA (1) | CA2165862A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107134177A (en) * | 2017-04-15 | 2017-09-05 | 黄宪成 | A kind of children learn Pinyin Chinese input method button acoustical generator coding and keyboard |
1995-12-21: application CA002165862A (CA2165862A1) filed in Canada; status: not active, abandoned
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FZDE | Discontinued |