CA1258317A - Data card system for initializing spoken-word recognition units - Google Patents
Data card system for initializing spoken-word recognition units
- Publication number
- CA1258317A
- Authority
- CA
- Canada
- Prior art keywords
- data
- spoken
- words
- card
- speaker
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G07—CHECKING-DEVICES
- G07C—TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
- G07C9/00—Individual registration on entry or exit
- G07C9/20—Individual registration on entry or exit involving the use of a pass
- G07C9/22—Individual registration on entry or exit involving the use of a pass in combination with an identity check of the pass holder
- G07C9/25—Individual registration on entry or exit involving the use of a pass in combination with an identity check of the pass holder using biometric data, e.g. fingerprints, iris scans or voice recognition
- G07C9/257—Individual registration on entry or exit involving the use of a pass in combination with an identity check of the pass holder using biometric data, e.g. fingerprints, iris scans or voice recognition electronically
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Credit Cards Or The Like (AREA)
- Optical Recording Or Reproduction (AREA)
Abstract
Data Card System for Initializing Spoken-Word Recognition Units
A spoken-word recognition system recognizes a speaker's words with the assistance of a data card initializing system. The data is stored on data cards, each card having sufficient data to teach the recognition unit to recognize the words of the speaker.
A data card reader reads the optical data on the card and inputs this data into the spoken-word recognition unit.
An auxiliary system encodes the cards with the speaker's voice characteristics: the speaker reads a selected set of words into a microphone, and a data card writer then writes the resulting data on the data card.
Description
Technical Field
The invention relates to spoken-word recognition systems.
Background Art
Suzuki et al. (U.S. patents 4,060,694, 4,078,154 and 4,100,370) teach a voice recognition system in which both the phonemes as spoken by different speakers and the voice of the person speaking can be recognized. A key phrase is spoken. Parallel filters derive a spectral characteristic parameter which contains weighting factors extracted and compared with the selected phoneme in memory. Improved specificity over other speakers can be obtained by varying the weighting factors through a number of different values, and storing in memory the set of parameters for each sound as spoken by a specific speaker. The system can thus be used for voice verification.
Felix et al. (U.S. patent 4,449,189) disclose a method for identifying an individual using a combination of speech and face recognition. The voice signature of a person uttering a key word into a microphone is compared in a pattern matcher with the previously stored voice signature of a known person uttering the same key word. At the same time, a momentary image of that person's mouth region is recorded and compared with that of the same known person. The results of the comparison are analyzed to verify that the identity of the speaker is that of the known person.
Katayama (U.S. patent 4,461,023) discloses a method of storing spoken words for use in a speech recognition system. Spoken words are input, then analyzed. The resulting patterns are stored in memory. A second memory stores the digitized speech that was input. The addresses of both memory units are specified to be the same for a particular word.
Systems that recognize voice commands require memory for storing speech characteristics for later comparison. The system must first be "taught" the speech characteristics before it can recognize specific voice commands. The difficulty is that each speaker has his own speech characteristics that the system must learn. Because computer memory space is limited, the speech characteristics of an individual must be relearned each time the speaker changes.
An object of the invention is to devise a spoken-word recognition system which is of reduced complexity and which can be quickly and easily programmed to understand any individual's voice commands more readily.
Another object of the invention is to devise a system in which a voice command unit can be initialized easily by each individual user without needing a knowledge of programming or of the unit's operation.
Disclosure of Invention
The above objects have been met by a system which stores a person's voice characteristics on a wallet-size card containing a laser recordable strip. These cards may be inserted into and removed from a word recognition processor. Every user has a card with his own pre-recorded speech characteristics thereon. Upon insertion into a word recognition processor, the processor unit would be initialized with respect to the particular voice characteristics of the owner of the card.
To encode the card, a set of words is spoken by a user into a microphone. The spoken words are analyzed and speech characteristics are extracted. Such characteristics include pitch, intonation, speed of speaking, accent parameters, and other parameters. A sufficient number of these characteristics is recorded on the card so that the words spoken later by the speaker may be understood by the word recognition unit.
A spoken-word recognition unit receives a user's voice message and identifies the words with the help of the speaker's voice characteristics in its memory, which was initialized by the spoken-word identification data on a card. In this manner, the spoken words can be recognized. For each new speaker, the unit must first be "taught" that speaker's characteristics so that the unit can more easily recognize the spoken words. The card provides the information to teach the unit. A record of an individual's speech characteristics is laser recorded on a card. The card is later placed in a card reader, and the characteristics are entered into the short-term memory of the spoken-word recognition unit.
The card has a strip of laser recording material, such as the reflective direct-read-after-write material described in U.S. patent 4,284,716 to Drexler et al. A modulated laser beam records data on the strip, in situ, by ablation, melting, physical or chemical change or deformation, thereby forming spots having a detectable change in an optical characteristic relative to the strip. The recording process on the above mentioned direct-read-after-write material produces differences in reflectivity detectable by a light detector. No processing after laser recording is required when the recording strip is a direct-read-after-write material. Laser recording materials also may be used that require heat processing after laser recording.
Each person has his own speech characteristics, in much the same way that each person has his own set of fingerprints. The card with the recorded speech characteristics is read by shining a beam from a laser or light emitting diode onto the strip. The beam, typically, has an intensity of ten percent of the recording intensity. The beam is reflected from the strip to a photodetector. The detector detects the contrast in optical characteristics between the strip and the recorded spots, and transmits corresponding signals to the speech recognition unit's short-term memory. The system is now ready to listen to words spoken by the user and to identify the words with the help of the speaker's voice characteristics stored in the memory. By this procedure the speaker's words are more clearly identified.
The uniform surface reflectivity of this reflective strip before recording typically would range between 8% and 65%. For a highly reflective strip the average reflectivity over a laser recorded spot might be in the range of 5% to 25%. Thus, the reflective contrast ratio of the recorded spots would range between 2:1 and 7:1. Laser recording materials are known in the art that create either low reflectivity spots in a reflective field or highly reflective spots in a low reflectivity field. An example of the latter type is described in U.S. patent 4,343,879. When the reflectivity of the field is in the range of 8% to 20% the reflective spots have a reflectivity of about 40%. The reflective contrast ratio would range from 2:1 to 5:1. Photographic pre-formatting would create spots having a 10% reflectivity in a reflective field or 40% in a low reflectivity field.
The voice information on the card would typically be in digital form. It would inform the word recognition unit of macro aspects of speech such as accent parameters, speed of speaking, dropping of "th" beginnings or "g" endings, and variations in intensity, as well as micro aspects such as tone, pitch, intonation, etc. With this advance knowledge about the speech characteristics of the words about to be spoken the words can more easily be recognized.
The card can store tens, hundreds or even thousands of deviation parameters from a "normal" voice. When a word is not understood the word interpreter unit would add in corrections to the unidentified word based upon the individual's speech deviation information. Attempts to recognize the word are then repeated.
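The retry behaviour described above can be sketched in Python. This is only an illustration under stated assumptions: the name `recognize_with_corrections`, the representation of each stored deviation as a callable, and the toy vocabulary are hypothetical and do not come from the patent, which does not specify how corrections are applied.

```python
from typing import Callable, Iterable, Optional

# Hypothetical types: a recognizer returns a word, or None when it fails,
# and each deviation correction rewrites the unidentified input.
Recognizer = Callable[[str], Optional[str]]
Correction = Callable[[str], str]


def recognize_with_corrections(
    utterance: str,
    recognize: Recognizer,
    corrections: Iterable[Correction],
) -> Optional[str]:
    """Try plain recognition first, then retry once per stored correction."""
    word = recognize(utterance)
    if word is not None:
        return word
    for correct in corrections:
        word = recognize(correct(utterance))
        if word is not None:
            return word
    return None


# Toy stand-ins (assumptions for illustration only).
VOCABULARY = {"going", "think"}


def toy_recognizer(utterance: str) -> Optional[str]:
    return utterance if utterance in VOCABULARY else None


def restore_g_ending(utterance: str) -> str:
    # Speaker drops "g" endings: "goin" becomes "going".
    return utterance + "g" if utterance.endswith("in") else utterance


def restore_th_beginning(utterance: str) -> str:
    # Speaker drops "th" beginnings: "ink" becomes "think".
    return "th" + utterance


# Stands in for the deviation parameters read from the card.
CARD_CORRECTIONS = [restore_g_ending, restore_th_beginning]
```

Here text strings stand in for acoustic features purely to keep the sketch short; a real unit would apply the corrections to analyzed speech parameters rather than to spellings.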
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the spoken-word recognition system of the present invention.
Fig. 2 is a schematic diagram of the data card encoding of the present invention.
Fig. 3 is a plan view of one side of a data card in accord with the present invention.
Fig. 4 is a partial side sectional view taken along lines 4-4 in Fig. 3.
Fig. 5 is a detail of laser writing on a portion of the laser recording strip illustrated by dashed lines in Fig. 3.
Fig. 6 is a plan view of an apparatus for reading and writing on the optical recording media strip illustrated in Fig. 3.
Best Mode for Carrying Out the Invention
With reference to Fig. 1, a spoken-word recognition system 10 reads a person's voice characteristics from a wallet-size card 31 containing a strip of laser recordable material. Each person would have a card 31 with his own speech characteristics prerecorded thereon. The system 10 is initialized with respect to the particular voice characteristics of the card owner by inserting the card 31 into system 10. A sufficient number of characteristics is recorded so that words spoken by a particular speaker may be identified.
With reference to Fig. 2, a data card encoding system 110 is used to form a card 131. A set of words 116 is spoken by a person into a microphone 117. The resulting signal is analyzed by a speech analyzer 121 and speech characteristics 122 are extracted. Such characteristics 122 include pitch, formants, ratio of voiced to unvoiced amplitudes, and other parameters used to help identify words and parts of words. The exact set of parameters will vary from one system to another, depending on the type of speech analysis which is used. Macro aspects of speech such as accent parameters, speed of speaking, dropping of particular sounds at the beginning or ending of words, and variations in tone may also be included to make word recognition even easier. In any case, speech analyzer 121 sends a digital signal 122 representing a person's speech characteristics to a data card writer/reader 129, which writes the data onto card 131 by shining a modulated laser beam 130 onto the card 131. The card 131 has a strip of optical contrast laser recording material disposed thereon. The beam 130 records data onto the card 131, in situ, by ablation, melting, physical or chemical change or deformation, thereby forming spots with contrasting reflectivity relative to the unrecorded strip. Reflected beam 132 is read by the card writer/reader 129 to confirm laser writing.
In Fig. 1 the spoken-word recognition system 10 is initialized by placing a prerecorded card 31 in data card reader 29. The card reader 29 shines a light beam 30 from a laser or an LED onto the prerecorded strip. This read beam, typically, has an intensity of five to ten percent of the typical semiconductor laser recording intensity. The light beam 32 is reflected from the strip to a photodetector, which detects the contrast in reflectivity between the strip and recorded spots. Card reader 29 transmits a signal 24 corresponding to the recorded data to the short-term memory of the spoken-word recognition unit 23.
The system 10 is now ready to listen to words 16 spoken by the user. The words 16 spoken into microphone 17 are analyzed and interpreted by the speech recognition unit 23 with respect to the voice characteristics 24, now stored in its short-term memory. The words 16 are recognized and the result is sent to an output device 27, such as a CRT terminal.
With reference to Figs. 3 and 4, a data card 11 is illustrated having a size common to most credit cards. The width dimension of such a card is approximately 54 mm and the length dimension is approximately 85 mm. These dimensions are not critical, but preferred because such a size easily fits into a wallet and has historically been adopted as a convenient size for automatic teller machines and the like. The card's base 13 is a dielectric, usually a plastic material such as polyvinyl chloride or similar material. Polycarbonate plastic is preferred. The surface finish of the base should have low specular reflectivity, preferably less than 10%.
Base 13 carries strip 15. The strip is about 16 or 35 millimeters wide and extends the length of the card. Alternatively, the strip may have other sizes and orientations. The strip is relatively thin, approximately 60-200 microns, although this is not critical. The strip may be applied to the card by any convenient method which achieves flatness.
The strip is adhered to the card with an adhesive and covered by a transparent laminating sheet 19 which serves to keep strip 15 flat, as well as protecting the strip from dust and scratches. Sheet 19 is a thin, transparent plastic sheet laminating material or a coating, such as a transparent lacquer. The material is preferably made of polycarbonate plastic.
The opposite side of base 13 may have user identification indicia embossed on the surface of the card. Other indicia such as card number and the like may be optionally provided.
The high resolution laser recording material which forms strip 15 may be any of the reflective recording materials which have been developed for use as direct read-after-write (DRAW) optical disks, so long as the materials can be formed on thin substrates. An advantage of reflective materials over transmissive materials is that the read/write equipment is all on one side of the card, the data storage capacity is doubled, and the automatic focus is easier. For example, U.S. patent 4,230,939 issued to de Bont et al. teaches a high resolution material with a thin metallic recording layer of reflective metals such as Bi, Te, Ind, Sn, Cu, Al, Pt, Au, Rh, As, Sb, Ge, Se, Ga.
Materials which are preferred are those having high reflectivity and low melting point, particularly Cd, Sn, Tl, Ind, Bi and amalgams. Suspensions of reflective metal particles in organic colloids also form low melting temperature laser recording media. Silver is one such metal. Typical recording media are described in U.S. patents Nos. 4,314,260, 4,298,684, 4,278,758, 4,278,756 and 4,269,917, all assigned to the assignee of the present invention.
The laser recording material which is selected should be compatible with the laser which is used for writing on it. Some materials are more sensitive than others at certain wavelengths. Good sensitivity to infrared light is preferred because infrared is affected least by scratches and dirt on the transparent laminating sheet. The selected recording material should have a favorable signal-to-noise ratio and form high contrast data bits with the read/write system with which it is used.
The material should not lose data when subjected to temperatures of about 175°F (79°C) for long periods. The material should also be capable of recording at speeds of at least several thousand bits/sec. This generally precludes the use of materials that require long heating times or that rely on slow chemical reactions in the presence of heat, which may permit recording of only a few bits/sec. A large number of highly reflective laser recording materials have been used for optical data disk applications.
Data is recorded by forming spots in the surrounding field of the reflective layer itself, thereby altering the reflectivity in the data spot. Data is read by detecting the optical reflective contrast between the surrounding reflective field of unrecorded areas and the recorded spots. Spot reflectivity of less than half the reflectivity of the surrounding field produces a contrast ratio of at least two to one, which is sufficient contrast for reading. Greater contrast is preferred. Reflectivity of the strip field of about 50% is preferred with reflectivity of a spot in the reflective field being less than 10%, thus creating a contrast ratio of greater than five to one. Alternatively, data may also be recorded by increasing the reflectivity of the strip. For example, the recording laser can melt a field of dull microscopic spikes on the strip to create flat shiny spots. This method is described in SPIE, Vol. 329, Optical Disk Technology (1982), p. 202. A spot reflectivity of more than twice the surrounding spiked field reflectivity produces a contrast ratio of at least two to one, which is sufficient contrast for reading.
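The contrast arithmetic in this passage can be checked with a short Python sketch. The function names `contrast_ratio` and `is_readable` are hypothetical, and the two-to-one threshold is taken from the text as the stated minimum for reading, not as a measured property of any particular reader.

```python
def contrast_ratio(field_percent: float, spot_percent: float) -> float:
    """Ratio of the brighter to the darker of field and spot reflectivity.

    Works for both recording styles in the text: dark spots in a
    reflective field, or shiny spots in a dull field.
    """
    brighter = max(field_percent, spot_percent)
    darker = min(field_percent, spot_percent)
    if darker <= 0:
        raise ValueError("reflectivity must be positive")
    return brighter / darker


def is_readable(field_percent: float, spot_percent: float,
                minimum_ratio: float = 2.0) -> bool:
    """True when the contrast meets the two-to-one minimum stated above."""
    return contrast_ratio(field_percent, spot_percent) >= minimum_ratio


# Preferred case from the text: 50% field, 10% spot.
preferred = contrast_ratio(50, 10)   # 5.0, i.e. five to one
# Brightening case: 10% dull field, 40% shiny spot.
brightened = contrast_ratio(10, 40)  # 4.0, i.e. four to one
```

Because the text specifies spots of less than 10% reflectivity in a 50% field, five to one is the lower bound for the preferred case rather than a typical value.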
With reference to Fig. 5, a magnified view of laser writing on the laser recording material strip 15 may be seen. The dashed line 33 corresponds to the dashed line 33 in Fig. 3. The oblong spots 35 are aligned in a path and have generally similar dimensions. The spots are generally circular or oval in shape with the axis of the oval perpendicular to the lengthwise dimension of the strip. A second group of spots 37 is shown aligned in a second path. The spots 37 have similar dimensions to the spots 35. The spacing between paths is not critical, except that the optics of the readback system should be able to easily distinguish between paths. Presently, in optical data storage technology, tracks which are separated by only a few microns may be resolved. The spacing and pattern of the spots along each path is selected for easy decoding.
The spots illustrated in Fig. 5 have a recommended size of approximately 5 microns by 20 microns, or circular spots 5 microns or 10 microns in diameter. Generally, the smallest dimension of a spot should be less than 50 microns. In the preferred embodiment the largest dimension would also be less than 50 microns. Of course, to offset lower densities from larger spots, the size of the strip 15 could be expanded to the point where it covers a large extent of the card. In Fig. 3, the laser recording strip 15 could completely cover a single side of the card. A minimum information capacity of 250,000 bits is indicated and a storage capacity of over one million bits is preferable.
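A rough capacity estimate shows how the spot sizes above relate to the stated capacity targets. The Python sketch below rests on assumptions that are not in the patent: one bit per spot position, a center-to-center pitch of twice the spot dimension in each direction, paths running along the length of the strip, and no space reserved for servo tracks or formatting. The function name `capacity_bits` is hypothetical.

```python
def capacity_bits(strip_width_um: int, strip_length_um: int,
                  spot_across_um: int, spot_along_um: int,
                  pitch_factor: int = 2) -> int:
    """Estimate raw bit capacity of a strip, one bit per spot position.

    pitch_factor is an assumed ratio of center-to-center spacing to
    spot size; the patent says only that spacing is not critical.
    """
    track_pitch_um = spot_across_um * pitch_factor
    spot_pitch_um = spot_along_um * pitch_factor
    tracks = strip_width_um // track_pitch_um
    spots_per_track = strip_length_um // spot_pitch_um
    return tracks * spots_per_track


# 16 mm wide strip running the 85 mm length of the card.
STRIP_WIDTH_UM = 16_000
STRIP_LENGTH_UM = 85_000

# Circular 5 micron spots: 1,600 tracks of 8,500 positions each.
circular = capacity_bits(STRIP_WIDTH_UM, STRIP_LENGTH_UM, 5, 5)
# Oblong 5 x 20 micron spots, long axis across the strip:
# 400 tracks of 8,500 positions each.
oblong = capacity_bits(STRIP_WIDTH_UM, STRIP_LENGTH_UM, 20, 5)
```

Under these assumptions both spot shapes give several million raw positions, comfortably above the preferred capacity of one million bits, which leaves room for formatting overhead.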
In Fig. 6, a side view of the lengthwise dimension of a card 41 is shown inserted into card reader/writer 29. The card is usually received in a movable holder 42 which brings the card into the beam trajectory. A laser light source 43, preferably a pulsed semiconductor laser of near infrared wavelength, emits a beam 45 which passes through collimating and focusing optics 47. The beam is sampled by a beam splitter 49 which transmits a portion of the beam through a focusing lens 51 to a photodetector 53. The detector 53 confirms laser writing and is not essential. The beam is then directed to a first servo controlled mirror 55 which is mounted for rotation along the axis 57 in the direction indicated by the arrows A. The purpose of the mirror 55 is to find the lateral edges of the laser recording material in a coarse mode of operation and then in a fine mode of operation identify data paths which exist predetermined distances from the edges.
From mirror 55, the beam is directed toward mirror 61. This mirror is mounted for rotation at pivot 63. The purpose of mirror 61 is fine control of motion of the beam along the length of the card. Coarse control of the lengthwise position of the card relative to the beam is achieved by motion of movable holder 42. The position of the holder may be established by a linear motor adjusted by a closed loop position servo system of the type used in magnetic disk drives.
During its manufacture the card may be prerecorded with database information or a preinscribed pattern containing servo tracks, timing marks, program instructions, and related functions. These positioning marks can be used as a reference for the laser recording system to record or read data at particular locations. Each of the various spoken-word recognition systems may have formats specific to its particular needs. U.S. patent No. 4,304,848 describes how formatting may be done photolithographically. Formatting may also be done using laser recording or surface molding of the servo tracks, timing marks, programming and related functions. Dil, in U.S. patent 4,209,804, teaches a type of surface molding. Reference position information may be prerecorded on the card so that position error signals may be generated and used as feedback in motor control. Upon reading one data path, the mirror 55 is slightly rotated. The motor moves holder 42 lengthwise so that the path can be read, and so on.
Light scattered and reflected from the spots contrasts with the surrounding field where no spots exist. The beam should deliver sufficient laser pulse energy to the surface of the recording material to create spots. Typically, 5-20 milliwatts is required, depending on the recording material. A 20 milliwatt semiconductor laser, focused to a five micron beam size, records at temperatures of about 200°C and is capable of creating spots in less than 25 microseconds. The wavelength of the laser should be compatible with the recording material. In the read mode, power is lowered to about 5% to 10% of the record power.
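The figures above imply an upper bound on the energy delivered per spot and a read power range, which the following Python sketch works out. The function names are hypothetical, and the calculation assumes constant power over the full pulse, which the patent does not state.

```python
def pulse_energy_microjoules(power_mw: float, pulse_us: float) -> float:
    """Energy of a constant-power pulse.

    Milliwatts times microseconds gives nanojoules, so divide by
    1000 to express the result in microjoules.
    """
    return power_mw * pulse_us / 1000.0


def read_power_range_mw(record_power_mw: float,
                        low_fraction: float = 0.05,
                        high_fraction: float = 0.10) -> tuple:
    """Read power as 5% to 10% of record power, per the text."""
    return (record_power_mw * low_fraction, record_power_mw * high_fraction)


# 20 mW for at most 25 microseconds: at most about 0.5 microjoule per spot.
max_energy_uj = pulse_energy_microjoules(20.0, 25.0)
# Read mode for a 20 mW recording laser: about 1 to 2 mW.
read_low_mw, read_high_mw = read_power_range_mw(20.0)
```

Since spots form in less than 25 microseconds, the half-microjoule figure is a ceiling on the energy per spot rather than the energy actually needed.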
Optical contrast between a spot and the surrounding field is detected by light detector 65, which may be a photodiode. Light is focused onto detector 65 by beam splitter 67 and focusing lens 69. Servo motors, not shown, control the positions of the mirrors and drive the mirrors in accord with instructions received from control circuits, as well as from feedback devices. The detector 65 produces electrical signals corresponding to spots. These signals are processed by the spoken-word recognition unit and used for identifying words spoken by a particular speaker.
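One way the detector signals might be turned into digital data is a simple threshold on measured reflectivity, sketched below in Python. This is an assumption for illustration: the patent does not describe the bit encoding, so the choices here (one sample per spot position, a spot read as 1, and a threshold midway between the nominal field and spot reflectivities) and the name `decode_spots` are hypothetical.

```python
from typing import List, Sequence


def decode_spots(samples_percent: Sequence[float],
                 field_percent: float,
                 spot_percent: float) -> List[int]:
    """Classify each reflectivity sample as spot (1) or field (0).

    The threshold sits midway between the nominal field and spot
    reflectivities, so the same code handles dark spots in a bright
    field and bright spots in a dull field.
    """
    threshold = (field_percent + spot_percent) / 2.0
    spot_is_darker = spot_percent < field_percent
    bits = []
    for sample in samples_percent:
        if spot_is_darker:
            looks_like_spot = sample < threshold
        else:
            looks_like_spot = sample > threshold
        bits.append(1 if looks_like_spot else 0)
    return bits


# Dark spots (about 10%) in a reflective field (about 50%).
dark_spot_bits = decode_spots([50, 8, 49, 9, 9, 51], 50, 10)
# Shiny spots (about 40%) in a dull field (about 10%).
bright_spot_bits = decode_spots([12, 41, 10], 10, 40)
```

A practical reader would also need timing recovery from the prerecorded marks and some error checking, both of which are outside this sketch.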
The invention relates to spoken-word recogni-tion systems.
~O
Background Art Suzuki, et al. (U.S. patents 4,060,694, 4,078,154 and 4,100,370) teach a voice recognition system in which the phonemes as spoken by different speakers and the voice of the person speaking can be recognized. A
key phrase is spoken. Parallel filters derive a spectral characteristic parameter which contains weighting factors extracted and compared with the selected phoneme in memory. Improved specificity over other speakers can be obtained by varying the weighting factors through a num-ber of different values, and storing in memory the set of parameters for each sound as spoken by a specific speaker. The system can be used thus for voice verifica-tion.
Felix et al. (U.S. patent 4,449,189) disclose amethod for identifying an individual using a combination Dl: ~
1258~17 of speech and face recognition. The voice signature of a person uttering a key word into a microphone is compared in a pattern matched with the previously stored voice signature of a known person uttering the same key word.
At the same time, a momentary image of that person's mouth region is recorded and compared with that of the same known person. The results of the comparison are analyzed to verify that the identity of the speaker is that of the known person.
Katayama (U.S. patent 4,461,023) discloses a method of storing spoken words for use in a speech recog-nition system. Spoken words are input, then analyzed.
The resulting patterns are stored in memory. A-second memory stores the digitized speech that was input. The addresses of both memory units are specified to be the same for a particular word.
Systems that recognize voice commands require memory for storing speech characteristics for later com-parison. The system must first be "taught" the speech characteristics before it can rècognize specific voice commands. A problem with this is that each speaker has his own speech characteristics that the system must learn. Computer memory space is limited, so the speech characteristics of an individual must be relearned each time the speaker changes.
An object of the invention is to devise a spoken word recognition system which is of reduced com-plexity and which can be quickly and easily programmed to understand any individual's voice commands more readily.
Another object of the invention is to devise a system in which a voice command unit can be initialized easily by each individual user without needing a knowledge of programming or of the unit's operation.
Disclosure of Invention
The above objects have been met by a system which stores a person's voice characteristics on a wallet-size card containing a laser recordable strip.
These cards may be inserted into and removed from a word recognition processor. Every user has a card with his own pre-recorded speech characteristics thereon.
Upon insertion into a word recognition processor, the processor unit would be initialized with respect to the particular voice characteristics of the owner of the card.
To encode the card, a set of words is spoken by a user into a microphone. The spoken words are analyzed and speech characteristics are extracted. Such characteristics include pitch, intonation, speed of speaking, accent parameters, and other parameters. A sufficient number of these characteristics is recorded on the card so that the words spoken later by the speaker may be understood by the word recognition unit.
A spoken-word recognition unit receives a user's voice message and identifies the words with the help of the speaker's voice characteristics in its memory, which was initialized by the spoken-word identification data on a card. In this manner, the spoken words can be recognized. For each new speaker, the unit must first be "taught" a particular speaker's characteristics so that the unit can more easily recognize the spoken words.
The card provides the information to teach the unit. A record of an individual's speech characteristics is laser recorded on a card which is later read into the unit by placing it in a card reader, and the characteristics are entered into the short-term memory of the spoken-word recognition unit.
The card has a strip of laser recording material, such as the reflective direct-read-after-write material described in U.S. patent 4,284,716 to Drexler et al. A modulated laser beam records data on the strip, in situ, by ablation, melting, physical or chemical change or deformation, thereby forming spots having a detectable change in an optical characteristic relative to the strip. The recording process on the above mentioned direct-read-after-write material produces differences in reflectivity detectable by a light detector. No processing after laser recording is required when the recording strip is a direct-read-after-write material.
Laser recording materials also may be used that require heat processing after laser recording.
Each person has his own speech characteristics, in much the same way that each person has his own set of fingerprints. The card with the recorded speech characteristics is read by shining a laser beam or light emitting diode onto the strip. The beam, typically, has an intensity of ten percent of the recording intensity.
The beam is reflected from the strip to a photodetector.
The detector detects the contrast in optical characteristics between the strip and the recorded spots, and transmits corresponding signals to the speech recognition unit's short-term memory. The system is now ready to listen to words spoken by the user and to identify the words with the help of the speaker's voice characteristics stored in the memory. By this procedure the speaker's words are more clearly identified.
The uniform surface reflectivity of this reflective strip before recording typically would range between 8% and 65%. For a highly reflective strip the average reflectivity over a laser recorded spot might be in the range of 5% to 25%. Thus, the reflective contrast ratio of the recorded spots would range between 2:1 and 7:1. Laser recording materials are known in the art that create either low reflectivity spots in a reflective field or high reflectivity spots in a low reflectivity field. An example of the latter type is described in U.S. patent 4,343,879. When the reflectivity of the field is in the range of 8% to 20% the reflective spots have a reflectivity of about 40%. The reflective contrast ratio would range from 2:1 to 5:1. Photographic pre-formatting would create spots having a 10% reflectivity in a reflective field or 40% in a low reflectivity field.
The voice information on the card would typically be in digital form. It would inform the word recognition unit of macro aspects of speech such as accent parameters, speed of speaking, dropping of "th" beginnings or "g" endings, variations in intensity as well as the micro aspects such as tone, pitch, intonation, etc.
With this advance knowledge about the speech characteristics of the words about to be spoken the words can more easily be recognized.
The card can store tens, hundreds or even thousands of deviation parameters from a "normal" voice.
When a word is not understood the word interpreter unit would add in corrections to the unidentified word based upon the individual's speech deviation information. Attempts to recognize the word are then repeated.
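The retry step just described can be pictured with a minimal sketch. Everything concrete in it is an assumption made for illustration only: the specification does not say how corrections are applied, so the feature vectors, the Euclidean distance measure, the acceptance threshold, and the names TEMPLATES, best_match and recognize are all hypothetical.

```python
import math

# Hypothetical "normal"-voice templates: word -> feature vector.
TEMPLATES = {
    "start": (1.0, 4.0, 2.0),
    "stop": (3.0, 1.0, 5.0),
}

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(features):
    """Return (word, distance) for the template closest to the features."""
    return min(((word, distance(features, template))
                for word, template in TEMPLATES.items()),
               key=lambda pair: pair[1])

def recognize(features, deviation, threshold=0.5):
    """Try the raw features first; if no template is close enough,
    subtract the speaker's stored deviation and try once more."""
    word, dist = best_match(features)
    if dist <= threshold:
        return word
    corrected = tuple(f - d for f, d in zip(features, deviation))
    word, dist = best_match(corrected)
    return word if dist <= threshold else None
```

In this sketch a word that fails the first comparison is accepted on the second attempt only when the card's deviation data brings it within the threshold of a template; without that data the same input stays unrecognized.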
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the spoken-word recognition system of the present invention.
Fig. 2 is a schematic diagram of the data card encoding of the present invention.
Fig. 3 is a plan view of one side of a data card in accord with the present invention.
Fig. 4 is a partial side sectional view taken along lines 4-4 in Fig. 3.
Fig. 5 is a detail of laser writing on a portion of the laser recording strip illustrated by dashed lines in Fig. 3.
Fig. 6 is a plan view of an apparatus for reading and writing on the optical recording media strip illustrated in Fig. 3.
Best Mode for Carrying Out the Invention
With reference to Fig. 1, a spoken-word recognition system 10 reads a person's voice characteristics from a wallet-size card 31 containing a strip of laser recordable material. Each person would have a card 31 with his own speech characteristics prerecorded thereon.
The system 10 is initialized with respect to the particular voice characteristics of the card owner by inserting the card 31 into system 10. A sufficient number of characteristics is recorded so that words spoken by a particular speaker may be identified.
With reference to Fig. 2, a data card encoding system 110 is used to form a card 131. A set of words 116 is spoken by a person into a microphone 117. The resulting signal is analyzed by a speech analyzer 121 and speech characteristics 122 are extracted. Such characteristics 122 include pitch, formants, ratio of voiced to unvoiced amplitudes, and other parameters used to help identify words and parts of words. The exact set of parameters will vary from one system to another, depending on the type of speech analysis which is used.
Macro aspects of speech such as accent parameters, speed of speaking, dropping of particular sounds at the beginning or ending of words, and variations in tone may also be included to make word recognition even easier. In any case, speech analyzer 121 sends a digital signal 122 representing a person's speech characteristics to a data card writer/reader 129 which writes the data with a laser onto card 131 by shining a modulated laser beam 130 onto the card 131. The card 131 has a strip of optical contrast laser recording material disposed thereon. The beam 130 records data onto the card 131, in situ, by ablation, melting, physical or chemical change or deformation, thereby forming spots with contrasting reflectivity relative to the unrecorded strip. Reflected beam 132 is read by the card reader/writer 129 to confirm laser writing.
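As an illustration of how extracted characteristics might be turned into the digital data sent to the card writer, the following sketch packs a list of parameter values into bytes. The layout is an assumption and not taken from the specification: a 16-bit count followed by 16-bit signed fixed-point values with 0.01 resolution, big-endian. The names SCALE, pack_characteristics and unpack_characteristics are hypothetical.

```python
import struct

SCALE = 100  # assumed fixed-point scale: each value is stored in units of 0.01

def pack_characteristics(values):
    """Pack floats as a 16-bit count followed by 16-bit signed integers."""
    quantized = [round(v * SCALE) for v in values]
    return struct.pack(f">H{len(quantized)}h", len(quantized), *quantized)

def unpack_characteristics(data):
    """Recover the list of parameter values from the packed bytes."""
    (count,) = struct.unpack_from(">H", data, 0)
    quantized = struct.unpack_from(f">{count}h", data, 2)
    return [q / SCALE for q in quantized]
```

Under these assumptions a thousand parameters occupy 16,016 bits, well inside the minimum card capacity of 250,000 bits given later in the description.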
In Fig. 1 the spoken-word recognition system 10 is initialized by placing a prerecorded card 31 in data card reader 29. The card reader 29 shines a light beam 30 from a laser or a LED onto the prerecorded strip.
This read beam, typically, has an intensity of five to ten percent of the typical semiconductor laser recording intensity. The light beam 32 is reflected from the strip to a photodetector, which detects this contrast in reflectivity between the strip and recorded spots. Card reader 29 transmits a signal 24 corresponding to the recorded data to the short-term memory of the spoken-word recognition unit 23.
The system 10 is now ready to listen to words 16 spoken by the user. The words 16 spoken into microphone 17 are analyzed and interpreted by the speech recognition unit 23 with respect to the voice characteristics 24, now stored in its short-term memory. The words 16 are recognized and the result is sent to an output device 27, such as a CRT terminal.
With reference to Figs. 3 and 4, a data card 11 is illustrated having a size common to most credit cards.
The width dimension of such a card is approximately 54 mm and the length dimension is approximately 85 mm. These dimensions are not critical, but preferred because such a size easily fits into a wallet and has historically been adopted as a convenient size for automatic teller machines and the like. The card's base 13 is a dielectric, usually a plastic material such as polyvinyl chloride or similar material. Polycarbonate plastic is preferred.
The surface finish of the base should have low specular reflectivity, preferably less than 10%.
Base 13 carries strip 15. The strip is about 16 or 35 millimeters wide and extends the length of the card. Alternatively, the strip may have other sizes and orientations. The strip is relatively thin, approximately 60-200 microns, although this is not critical. The strip may be applied to the card by any convenient method which achieves flatness.
The strip is adhered to the card with an adhesive and covered by a transparent laminating sheet 19 which serves to keep strip 15 flat, as well as protecting the strip from dust and scratches. Sheet 19 is a thin, transparent plastic sheet laminating material or a coating, such as a transparent lacquer. The material is preferably made of polycarbonate plastic.
The opposite side of base 13 may have user identification indicia embossed on the surface of the card. Other indicia such as card number and the like may be optionally provided.
The high resolution laser recording material which forms strip 15 may be any of the reflective recording materials which have been developed for use as direct-read-after-write (DRAW) optical disks, so long as the materials can be formed on thin substrates. An advantage of reflective materials over transmissive materials is that the read/write equipment is all on one side of the card, the data storage capacity is doubled, and the automatic focus is easier. For example, the high resolution material described in U.S. patent 4,230,939 issued to de Bont, et al. teaches a thin metallic recording layer of reflective metals such as Bi, Te, Ind, Sn, Cu, Al, Pt, Au, Rh, As, Sb, Ge, Se, Ga.
Materials which are preferred are those having high reflectivity and low melting point, particularly Cd, Sn, Tl, Ind, Bi and amalgams. Suspensions of reflective metal particles in organic colloids also form low melting temperature laser recording media. Silver is one such metal. Typical recording media are described in U.S. patents Nos. 4,314,260, 4,298,684, 4,278,758, 4,278,756 and 4,269,917, all assigned to the assignee of the present invention.
The laser recording material which is selected should be compatible with the laser which is used for writing on it. Some materials are more sensitive than others at certain wavelengths. Good sensitivity to infrared light is preferred because infrared is affected least by scratches and dirt on the transparent laminating sheet. The selected recording material should have a favorable signal-to-noise ratio and form high contrast data bits with the read/write system with which it is used.
The material should not lose data when subjected to temperatures of about 175°F (79°C) for long periods. The material should also be capable of recording at speeds of at least several thousand bits/sec.
This generally precludes the use of materials that require long heating times or that rely on slow chemical reactions in the presence of heat, which may permit recording of only a few bits/sec. A large number of highly reflective laser recording materials have been used for optical data disk applications.
Data is recorded by forming spots in the surrounding field of the reflective layer itself, thereby altering the reflectivity in the data spot. Data is read by detecting the optical reflective contrast between the surrounding reflective field of unrecorded areas and the recorded spots. Spot reflectivity of less than half the reflectivity of the surrounding field produces a contrast ratio of at least two to one, which is sufficient contrast for reading. Greater contrast is preferred. Reflectivity of the strip field of about 50% is preferred with reflectivity of a spot in the reflective field being less than 10%, thus creating a contrast ratio of greater than five to one. Alternatively, data may also be recorded by increasing the reflectivity of the strip. For example, the recording laser can melt a field of dull microscopic spikes on the strip to create flat shiny spots. This method is described in SPIE, Vol. 329, Optical Disk Technology (1982), p. 202. A spot reflectivity of more than twice the surrounding spiked field reflectivity produces a contrast ratio of at least two to one, which is sufficient contrast for reading.
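The contrast figures above reduce to simple arithmetic, sketched here for both dark-spot and bright-spot media. The function names contrast_ratio and is_readable are hypothetical, and reflectivities are assumed to be given as positive fractions between 0 and 1.

```python
def contrast_ratio(field_reflectivity, spot_reflectivity):
    """Ratio of the brighter to the darker reflectivity (both positive fractions)."""
    brighter = max(field_reflectivity, spot_reflectivity)
    darker = min(field_reflectivity, spot_reflectivity)
    return brighter / darker

def is_readable(field_reflectivity, spot_reflectivity, minimum=2.0):
    """True when the contrast meets the two-to-one minimum stated for reading."""
    return contrast_ratio(field_reflectivity, spot_reflectivity) >= minimum
```

For the preferred case of a 50% field, a spot at exactly 10% gives a ratio of five to one, so any spot below 10% exceeds it. A 50% field with a 30% spot would fall short of the two-to-one minimum.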
With reference to Fig. 5, a magnified view of laser writing on the laser recording material strip 15 may be seen. The dashed line 33 corresponds to the dashed line 33 in Fig. 3. The oblong spots 35 are aligned in a path and have generally similar dimensions.
The spots are generally circular or oval in shape with the axis of the oval perpendicular to the lengthwise dimension of the strip. A second group of spots 37 is shown aligned in a second path. The spots 37 have similar dimensions to the spots 35. The spacing between paths is not critical, except that the optics of the readback system should be able to easily distinguish between paths. Presently, in optical data storage technology, tracks which are separated by only a few microns may be resolved. The spacing and pattern of the spots along each path is selected for easy decoding.
The spots illustrated in Fig. 5 have a recommended size of approximately 5 microns by 20 microns, or circular spots 5 microns or 10 microns in diameter.
Generally, the smallest dimension of a spot should be less than 50 microns. In the preferred embodiment the largest dimension would also be less than 50 microns. Of course, to offset lower densities from larger spots, the size of the strip 15 could be expanded to the point where it covers a large extent of the card. In Fig. 3, the laser recording strip 15 could completely cover a single side of the card. A minimum information capacity of 250,000 bits is indicated and a storage capacity of over one million bits is preferable.
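A rough capacity estimate shows how the spot sizes relate to the stated capacities. The sketch assumes one bit per square cell and no overhead for servo tracks, timing marks or error correction, none of which the specification quantifies. The 85 mm strip length follows from the strip extending the length of the card; the function name strip_capacity_bits is hypothetical.

```python
def strip_capacity_bits(width_mm, length_mm, cell_um):
    """Bits on a strip if each bit occupies a square cell of side cell_um microns."""
    tracks = int(width_mm * 1000) // cell_um          # cells across the strip width
    cells_per_track = int(length_mm * 1000) // cell_um  # cells along the strip length
    return tracks * cells_per_track
```

With these assumptions a 16 mm by 85 mm strip holds 544,000 bits at a 50 micron cell and 13,600,000 bits at a 10 micron cell, so even the coarsest spot size considered meets the 250,000 bit minimum.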
In Fig. 6, a side view of the lengthwise dimension of a card 41 is shown inserted into card reader/writer 29. The card is usually received in a movable holder 42 which brings the card into the beam trajectory. A laser light source 43, preferably a pulsed semiconductor laser of near infrared wavelength, emits a beam 45 which passes through collimating and focussing optics 47. The beam is sampled by a beam splitter 49 which transmits a portion of the beam through a focusing lens 51 to a photodetector 53. The detector 53 confirms laser writing and is not essential. The beam is then directed to a first servo controlled mirror 55 which is mounted for rotation along the axis 57 in the direction indicated by the arrows A. The purpose of the mirror 55 is to find the lateral edges of the laser recording material in a coarse mode of operation and then in a fine mode of operation identify data paths which exist predetermined distances from the edges.
From mirror 55, the beam is directed toward mirror 61. This mirror is mounted for rotation at pivot 63. The purpose of mirror 61 is for fine control of motion of the beam along the length of the card. Coarse control of the lengthwise position of the card relative to the beam is achieved by motion of movable holder 42.
The position of the holder may be established by a linear motor adjusted by a closed loop position servo system of the type used in magnetic disk drives.
During its manufacture the card may be prerecorded with database information or a preinscribed pattern containing servo tracks, timing marks, program instructions, and related functions. These positioning marks can be used as a reference for the laser recording system to record or read data at particular locations.
Each of the various spoken word recognition systems may have formats specific to its particular needs. U.S. patent No. 4,304,848 describes how formatting may be done photolithographically. Formatting may also be done using laser recording or surface molding of the servo tracks, timing marks, programming and related functions. Dil, in U.S. patent 4,209,804, teaches a type of surface molding.
Reference position information may be prerecorded on the card so that position error signals may be generated and used as feedback in motor control. Upon reading one data path, the mirror 55 is slightly rotated. The motor moves holder 42 lengthwise so that the path can be read, and so on.
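The use of a position error signal as feedback can be illustrated with a simple proportional loop. This is only a sketch of the general closed-loop idea: the gain, the tolerance, the micron units and the name servo_steps are assumptions, and the specification does not describe the control law actually used.

```python
def servo_steps(target_um, start_um, gain=0.5, tolerance_um=1.0, max_steps=100):
    """Move toward the target by a fixed fraction of the position error each step.

    Returns the list of positions visited, starting with start_um.
    """
    position = start_um
    history = [position]
    for _ in range(max_steps):
        error = target_um - position  # position error signal
        if abs(error) <= tolerance_um:
            break
        position += gain * error      # correction proportional to the error
        history.append(position)
    return history
```

Starting 100 microns from the target with a gain of 0.5, the error halves on every step and the loop settles within one micron after seven corrections.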
Light scattered and reflected from the spots contrasts with the surrounding field where no spots exist.
The beam should deliver sufficient laser pulse energy to the surface of the recording material to create spots.
Typically, 5-20 milliwatts is required, depending on the recording material. A 20 milliwatt semiconductor laser, focussed to a five micron beam size, records at temperatures of about 200°C and is capable of creating spots in less than 25 microseconds. The wavelength of the laser should be compatible with the recording material. In the read mode, power is lowered to about 5% to 10% of the record power.
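The power and timing figures above imply a pulse energy, a read power range and an upper bound on recording rate. The following sketch works them out; the function names are hypothetical, and the rate bound assumes back-to-back pulses with no time spent positioning the beam.

```python
def pulse_energy_joules(power_watts, duration_seconds):
    """Energy delivered by a rectangular pulse of constant power."""
    return power_watts * duration_seconds

def read_power_range_watts(record_power_watts, low=0.05, high=0.10):
    """Read power at 5% to 10% of the record power."""
    return record_power_watts * low, record_power_watts * high

def max_spot_rate_per_second(spot_time_seconds):
    """Spots per second if pulses follow one another without gaps."""
    return 1.0 / spot_time_seconds
```

A 20 milliwatt pulse lasting 25 microseconds delivers about 0.5 microjoule, which is an upper bound since spots form in less than that time. The matching read power is 1 to 2 milliwatts, and the spot time corresponds to at least about 40,000 spots per second, consistent with the requirement of several thousand bits/sec.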
Optical contrast between a spot and surrounding field is detected by light detector 65 which may be a photodiode. Light is focussed onto detector 65 by beam splitter 67 and focusing lens 69. Servo motors, not shown, control the positions of the mirrors and drive the mirrors in accord with instructions received from control circuits, as well as from feedback devices. The detector 65 produces electrical signals corresponding to spots.
These signals are processed by the spoken-word recognition unit and used for identifying words spoken by a particular speaker.
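How detector signals might be turned back into data can be sketched as a threshold decision on each sample. The coding is an assumption: the specification says only that the spot pattern is selected for easy decoding, so one sample per bit cell, a spot meaning 1, and a threshold midway between the two reflectivity levels are illustrative choices. The name decode_spots is hypothetical.

```python
def decode_spots(samples, field_level, spot_level):
    """Classify each detector sample as spot (1) or field (0).

    The decision threshold is the midpoint between the two expected levels,
    and the comparison direction depends on whether spots are darker or
    brighter than the surrounding field.
    """
    threshold = (field_level + spot_level) / 2.0
    spot_is_darker = spot_level < field_level
    bits = []
    for sample in samples:
        is_spot = (sample < threshold) if spot_is_darker else (sample > threshold)
        bits.append(1 if is_spot else 0)
    return bits
```

The same function covers both media types described earlier, dark spots in a reflective field and reflective spots in a dull field, because only the two expected levels need to be supplied.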
Claims (6)
THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:
1. A system for initializing spoken-word recognition units comprising, a spoken-word recognition unit having a voice input and a data input, and providing an output of word recognition, a data card reader connected to said data input of said spoken-word recognition unit, and a plurality of data cards for reading by the data card reader, each card associated with a speaker and having a strip of laser recording material for the optical storage of prerecorded data which is speaker characteristic data, said storage being in the form of spots having a reflective contrast ratio to surrounding unrecorded material of at least two to one, said speaker characteristic data derived from speech analysis of an input set of words spoken by said speaker and sufficient in extent to teach said spoken-word recognition unit words spoken into said voice input by said speaker, said words spoken into said voice input including words not found in said input set of words.
2. The system of claim 1 wherein said data card reader comprises, a light source having a light beam directed at one of said data cards, and a light detector disposed to receive said light beam reflected from said card, said detector connected to said spoken-word recognition unit for inputting said data on said card.
3. A system for encoding speech characteristic data on a card comprising, speech input means for inputting an input set of spoken words, analyzing means connected to said input means for extracting speech characteristics of individual users from said input set of spoken words, a data card writer/reader connected to said analyzing means, a plurality of data cards, adapted for writing by the card writer/reader, each card having a strip of laser recording material disposed thereon, said laser recording material having an encoded optical storage of speaker characteristic data sufficient in extent to characterize how a speaker would say words including words other than said input set of spoken words, said optical storage being in the form of spots written by said card writer/reader into said laser recording material, the reflective contrast ratio of the spots with respect to surrounding unrecorded material being at least two to one.
4. The system of claim 3 wherein said data card writer/reader comprises, a laser having a laser beam directed at one of said data cards, said laser connected to said analyzing means for receiving said speech characteristics, and a light detector disposed to receive said light beam reflected from said card.
5. A method for storing speaker dependent voice recognition data comprising, having at least one speaker speak an input set of words, analyzing said input set of words spoken by each speaker for extraction of speech characteristics from said spoken input set of words, said speech characteristics sufficient in extent to characterize words for a specific speaker including other than said input set of words, generating digital data corresponding to said speech characteristics of each speaker, and recording with a modulated laser beam said digital data corresponding to the speech characteristics of a speaker onto a card having a strip of laser recording material, said recording forming spots representing said speech characteristics, said spots having a detectable change in an optical characteristic relative to said strip.
6. A method for initializing spoken-word recognition units comprising, placing a data card in data reading relation to a data card reader, said data card associated with a speaker and having a strip of laser recording material having speaker characteristic data prerecorded thereon, said data prerecorded in the form of spots having a reflectivity distinct from the reflectivity of unrecorded laser recording material, said speaker characteristic data derived from analysis of an input set of words spoken by said speaker and sufficient in extent to teach a spoken-word recognition unit to understand words spoken by said speaker, said words spoken by said speaker including words not found in said input set of words, reading said data on said data card with said data card reader, and inputting said data read by the card reader to a spoken-word recognition unit, said unit being connected to said card reader for receiving said data input, whereby said unit is initialized for understanding the words spoken by said speaker based on the input data.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US72138185A | 1985-04-09 | 1985-04-09 | |
US721,381 | 1985-04-09 |
Publications (1)
Publication Number | Publication Date |
---|---|
CA1258317A true CA1258317A (en) | 1989-08-08 |
Family
ID=24897749
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA000506053A Expired CA1258317A (en) | 1985-04-09 | 1986-04-08 | Data card system for initializing spoken-word recognition units |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0218723A1 (en) |
CA (1) | CA1258317A (en) |
WO (1) | WO1986006197A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0248593A1 (en) * | 1986-06-06 | 1987-12-09 | Speech Systems, Inc. | Preprocessing system for speech recognition |
JPH0795240B2 (en) * | 1986-12-19 | 1995-10-11 | 株式会社日立製作所 | Card system with personal voice pattern |
US4827518A (en) * | 1987-08-06 | 1989-05-02 | Bell Communications Research, Inc. | Speaker verification system using integrated circuit cards |
FR2642882B1 (en) * | 1989-02-07 | 1991-08-02 | Ripoll Jean Louis | SPEECH PROCESSING APPARATUS |
ES2114493A1 (en) * | 1996-05-22 | 1998-05-16 | Univ Madrid Politecnica | System for verifying the identity of persons by means of a portable data medium based on voice recognition. |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
BE787377A (en) * | 1971-08-09 | 1973-02-09 | Waterbury Nelson J | SECURITY CARDS AND SYSTEM FOR USING SUCH CARDS |
US4284716A (en) * | 1979-07-06 | 1981-08-18 | Drexler Technology Corporation | Broadband reflective laser recording and data storage medium with absorptive underlayer |
-
1986
- 1986-03-10 WO PCT/US1986/000494 patent/WO1986006197A1/en unknown
- 1986-03-10 EP EP86903718A patent/EP0218723A1/en not_active Withdrawn
- 1986-04-08 CA CA000506053A patent/CA1258317A/en not_active Expired
Also Published As
Publication number | Publication date |
---|---|
WO1986006197A1 (en) | 1986-10-23 |
EP0218723A1 (en) | 1987-04-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MKEX | Expiry |