WO1986006197A1 - Data card system for initializing spoken-word recognition units - Google Patents
- Publication number
- WO1986006197A1 (application PCT/US1986/000494)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- card
- spoken
- words
- spots
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G07—CHECKING-DEVICES
- G07C—TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
- G07C9/00—Individual registration on entry or exit
- G07C9/20—Individual registration on entry or exit involving the use of a pass
- G07C9/22—Individual registration on entry or exit involving the use of a pass in combination with an identity check of the pass holder
- G07C9/25—Individual registration on entry or exit involving the use of a pass in combination with an identity check of the pass holder using biometric data, e.g. fingerprints, iris scans or voice recognition
- G07C9/257—Individual registration on entry or exit involving the use of a pass in combination with an identity check of the pass holder using biometric data, e.g. fingerprints, iris scans or voice recognition electronically
Definitions
- The spoken-word recognition system 10 is initialized by placing a prerecorded card 31 in data card reader 29.
- The card reader 29 shines a light beam 30 from a laser or an LED onto the prerecorded strip.
- This read beam typically has an intensity of five to ten percent of the typical semiconductor laser recording intensity.
- The light beam 32 is reflected from the strip to a photodetector, which detects the contrast in reflectivity between the strip and the recorded spots.
- Card reader 29 transmits a signal 24 corresponding to the recorded data to the short-term memory of the spoken-word recognition unit 23.
- The system 10 is now ready to listen to words spoken by the user.
- The words 16 spoken by the user are analyzed and interpreted by the speech recognition unit 23 with respect to the voice characteristics 24, now stored in its short-term memory.
- The words 16 are recognized and the result is sent to an output device 27, such as a CRT terminal.
- A data card 11 is illustrated having a size common to most credit cards.
- The width dimension of such a card is approximately 54 mm and the length dimension is approximately 85 mm. These dimensions are not critical, but are preferred because such a size easily fits into a wallet and has historically been adopted as a convenient size for automatic teller machines and the like.
- The card's base 13 is a dielectric, usually a plastic material such as polyvinyl chloride or a similar material. Polycarbonate plastic is preferred.
- The surface finish of the base should have low specular reflectivity, preferably less than 10%.
- Base 13 carries strip 15.
- The strip is about 16 or 35 millimeters wide and extends the length of the card. Alternatively, the strip may have other sizes and orientations.
- The strip is relatively thin, approximately 60-200 microns, although this is not critical.
- The strip may be applied to the card by any convenient method which achieves flatness.
- The strip is adhered to the card with an adhesive and covered by a transparent laminating sheet 19, which keeps strip 15 flat and protects it from dust and scratches.
- Sheet 19 is a thin, transparent plastic laminating sheet or a coating, such as a transparent lacquer.
- The material is preferably polycarbonate plastic.
- The opposite side of base 13 may have user identification indicia embossed on the surface of the card. Other indicia, such as a card number and the like, may optionally be provided.
- The high-resolution laser recording material which forms strip 15 may be any of the reflective recording materials that have been developed for use in direct-read-after-write (DRAW) optical disks, so long as the materials can be formed on thin substrates.
- Advantages of reflective materials over transmissive materials are that the read/write equipment is all on one side of the card, the data storage capacity is doubled, and automatic focusing is easier.
- Preferred materials are those having high reflectivity and a low melting point, particularly Cd, Sn, Tl, In, Bi and amalgams. Suspensions of reflective metal particles in organic colloids also form low-melting-temperature laser recording media. Silver is one such metal. Typical recording media are described in U.S. patents Nos. 4,314,260, 4,298,684, 4,278,758, 4,278,758, 4,278,756 and 4,269,917, all assigned to the assignee of the present invention.
- The laser recording material selected should be compatible with the laser used for writing on it. Some materials are more sensitive than others at certain wavelengths. Good sensitivity to infrared light is preferred because infrared is least affected by scratches and dirt on the transparent laminating sheet.
- The selected recording material should have a favorable signal-to-noise ratio and form high-contrast data bits with the read/write system with which it is used.
- The material should not lose data when subjected to temperatures of about 122°F (50°C) for long periods.
- The material should also be capable of recording at speeds of at least several thousand bits/sec. This generally precludes materials that require long heating times or that rely on slow chemical reactions in the presence of heat, which may permit recording of only a few bits/sec.
- A large number of highly reflective laser recording materials have been used for optical data disk applications.
- Data is recorded by forming spots in the surrounding field of the reflective layer itself, thereby altering the reflectivity in the data spot.
- Data is read by detecting the optical reflective contrast between the surrounding reflective field of unrecorded areas and the recorded spots. Spot reflectivity of less than half the reflectivity of the surrounding field produces a contrast ratio of at least two to one, which is sufficient contrast for reading. Greater contrast is preferred.
- A strip field reflectivity of about 50% is preferred, with the reflectivity of a spot in the reflective field being less than 10%, thus creating a contrast ratio of greater than five to one.
- Data may also be recorded by increasing the reflectivity of the strip.
- The recording laser can melt a field of dull microscopic spikes on the strip to create flat, shiny spots. This method is described in SPIE, Vol. 329, Optical Disk Technology (1982), p. 202.
- A spot reflectivity of more than twice the surrounding spiked-field reflectivity produces a contrast ratio of at least two to one, which is sufficient contrast for reading.
- In Fig. 5, the dashed line 33 corresponds to the dashed line 33 in Fig. 3.
- The oblong spots 35 are aligned in a path and have generally similar dimensions.
- The spots are generally circular or oval in shape, with the axis of the oval perpendicular to the lengthwise dimension of the strip.
- A second group of spots 37 is shown aligned in a second path.
- The spots 37 have dimensions similar to those of the spots 35.
- The spacing between paths is not critical, except that the optics of the readback system should be able to easily distinguish between paths.
- Tracks separated by only a few microns may be resolved. The spacing and pattern of the spots along each path are selected for easy decoding.
- The spots illustrated in Fig. 5 have a recommended size of approximately 5 microns by 20 microns, or are circular spots 5 or 10 microns in diameter.
- The smallest dimension of a spot should be less than 50 microns. In the preferred embodiment the largest dimension would also be less than 50 microns.
- The size of the strip 15 could be expanded to the point where it covers a large extent of the card.
- The laser recording strip 15 could completely cover a single side of the card.
- A minimum information capacity of 250,000 bits is indicated, and a storage capacity of over one million bits is preferable.
- A side view of the lengthwise dimension of a card 41 is shown inserted into card reader/writer 29.
- The card is usually received in a movable holder 42 which brings the card into the beam trajectory.
- A laser light source 43, preferably a pulsed semiconductor laser of near-infrared wavelength, emits a beam 45 which passes through collimating and focusing optics 47.
- The beam is sampled by a beam splitter 49, which transmits a portion of the beam through a focusing lens 51 to a photodetector 53.
- The detector 53 confirms laser writing and is not essential.
- The beam is then directed to a first servo-controlled mirror 55, which is mounted for rotation about the axis 57 in the direction indicated by the arrows A.
- The purpose of the mirror 55 is to find the lateral edges of the laser recording material in a coarse mode of operation and then, in a fine mode of operation, to identify data paths which exist at predetermined distances from the edges. From mirror 55, the beam is directed toward mirror 61. This mirror is mounted for rotation at pivot 63. The purpose of mirror 61 is fine control of the motion of the beam along the length of the card. Coarse control of the lengthwise position of the card relative to the beam is achieved by motion of movable holder 42. The position of the holder may be established by a linear motor adjusted by a closed-loop position servo system of the type used in magnetic disk drives.
- The card may be prerecorded with database information or a preinscribed pattern containing servo tracks, timing marks, program instructions, and related functions. These positioning marks can be used as a reference for the laser recording
- U.S. patent No. 4,304,848 describes how formatting may be done photolithographically. Formatting may also be done using laser recording or surface molding of the servo tracks, timing marks, programming and related functions. Dil, in U.S. patent 4,209,804, teaches a type of surface molding. Reference position information may be prerecorded on the card so that position error signals may be generated and used as feedback in motor control. Upon reading one data path, the mirror 55 is slightly rotated. The motor moves holder 42 lengthwise so that the path can be read, and so on.
- The beam should deliver sufficient laser pulse energy to the surface of the recording material to create spots. Typically, 5-20 milliwatts is required, depending on the recording material.
- A 20 milliwatt semiconductor laser, focused to a five micron beam size, records at temperatures of about 200°C and is capable of creating spots in about 75 microseconds.
- The wavelength of the laser should be compatible with the recording material. In the read mode, power is lowered to about 5% to 10% of the record power.
- Optical contrast between a spot and the surrounding field is detected by light detector 65, which may be a photodiode.
- Light is focused onto detector 65 by beam splitter 67 and focusing lens 69.
- Servo motors, not shown, control the positions of the mirrors and drive the mirrors in accord with instructions received from control circuits, as well as from feedback devices.
- The detector 65 produces electrical signals corresponding to spots. These signals are processed by the spoken-word recognition unit and used for identifying words spoken by a particular speaker.
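The spot-writing time and capacity figures above allow a rough recording budget. The sketch below assumes one bit per spot, back-to-back spot writing at the quoted 75 microseconds per spot with no seek, servo-track or error-correction overhead, and a hypothetical 15 micron track pitch and 10 micron bit pitch; those two pitches are illustrative only and do not come from the text.

```python
def write_rate_bits_per_s(spot_time_s):
    """Bits per second if one spot (one bit) is written every spot_time_s."""
    return 1.0 / spot_time_s


def write_time_s(n_bits, spot_time_s):
    """Seconds to write n_bits back to back, ignoring all overhead."""
    return n_bits * spot_time_s


def raw_capacity_bits(strip_width_um, strip_length_um, track_pitch_um, bit_pitch_um):
    """Upper bound on bit cells: whole tracks times whole cells per track."""
    tracks = strip_width_um // track_pitch_um
    cells_per_track = strip_length_um // bit_pitch_um
    return tracks * cells_per_track


# 75 microseconds per spot, as quoted for a 20 mW semiconductor laser.
print(round(write_rate_bits_per_s(75e-6)))         # 13333
print(round(write_time_s(1_000_000, 75e-6), 3))    # 75.0
# 35 mm x 85 mm strip with the hypothetical 15 um / 10 um pitches.
print(raw_capacity_bits(35_000, 85_000, 15, 10))   # 19830500
```

Under these assumptions the write rate is about 13,000 bits/sec, consistent with the requirement of at least several thousand bits/sec, and a one-million-bit record would take roughly 75 seconds to write. The raw capacity figure is only a ceiling; formatting and servo tracks would reduce it.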
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Credit Cards Or The Like (AREA)
- Optical Recording Or Reproduction (AREA)
Abstract
A spoken-word recognition system for recognizing a speaker's words with the assistance of a data card initializing system. The data is stored on data cards, each card (31) having sufficient data to teach the recognition unit (23) to recognize the words of the speaker. A data card reader (29) reads the optical data on the card and inputs this data into the spoken-word recognition unit. An auxiliary system is used to encode the cards (131) with the speaker's voice characteristics: the speaker speaks a selected set of words (116) into a microphone (117), and a data card writer (129) then writes the data onto the data card.
Description
Data Card System for Initializing Spoken-Word Recognition Units
Technical Field.
The invention relates to spoken-word recognition systems.
Background Art
Suzuki, et al. (U.S. patents 4,060,694, 4,078,154 and 4,100,370) teach a voice recognition system in which the phonemes as spoken by different speakers and the voice of the person speaking can be recognized. A key phrase is spoken. Parallel filters derive a spectral characteristic parameter which contains weighting factors extracted and compared with the selected phoneme in memory. Improved specificity over other speakers can be obtained by varying the weighting factors through a number of different values, and storing in memory the set of parameters for each sound as spoken by a specific speaker. The system can thus be used for voice verification.
Felix et al. (U.S. patent 4,449,189) disclose a method for identifying an individual using a combination of speech and face recognition. The voice signature of a person uttering a key word into a microphone is pattern-matched against the previously stored voice signature of a known person uttering the same key word. At the same time, a momentary image of that person's mouth region is recorded and compared with that of the same known person. The results of the comparisons are analyzed to verify that the identity of the speaker is that of the known person.
Katayama (U.S. patent 4,461,023) discloses a method of storing spoken words for use in a speech recognition system. Spoken words are input, then analyzed. The resulting patterns are stored in memory. A second memory stores the digitized speech that was input. The addresses of both memory units are specified to be the same for a particular word.
Systems that recognize voice commands require memory for storing speech characteristics for later comparison. The system must first be "taught" the speech characteristics before it can recognize specific voice commands. A problem with this is that each speaker has his own speech characteristics that the system must learn. Computer memory space is limited, so the speech characteristics of an individual must be relearned each time the speaker changes.
An object of the invention is to devise a spoken word recognition system which is of reduced complexity and which can be quickly and easily programmed to understand any individual's voice commands more readily.
Another object of the invention is to devise a system in which a voice command unit can be initialized easily by each individual user without needing a knowledge of programming or of the unit's operation.
Disclosure of Invention
The above objects have been met by a system which stores a person's voice characteristics on a wallet-size card containing a laser recordable strip. These cards may be inserted into and removed from a word recognition processor. Every user has a card with his own pre-recorded speech characteristics thereon. Upon insertion of the card into a word recognition processor, the processor unit is initialized with respect to the particular voice characteristics of the owner of the card.
To encode the card, a set of words is spoken by a user into a microphone. The spoken words are analyzed and speech characteristics are extracted. Such characteristics include pitch, intonation, speed of speaking, accent parameters, and other parameters. A sufficient number of these characteristics is recorded on the card so that the words spoken later by the speaker may be understood by the word recognition unit.
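The extraction step is left open in the text. As one illustration only, pitch can be estimated from a short voiced frame by autocorrelation, a common technique swapped in here and not necessarily the analyzer the inventors had in mind. The function names, the 60-400 Hz search range and the synthetic test tone are assumptions.

```python
import math


def estimate_pitch_hz(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one voiced frame by autocorrelation.

    Searches the lags that correspond to fmax..fmin and returns the
    frequency of the lag at which the frame best matches a shifted copy
    of itself. Returns None if no lag correlates positively (e.g. silence).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else sample_rate / best_lag


def sine_frame(freq_hz, sample_rate, n):
    """Synthetic tone standing in for a recorded voiced frame."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


frame = sine_frame(200.0, 8000, 400)   # 50 ms at 8 kHz
print(estimate_pitch_hz(frame, 8000))  # 200.0
```

A real analyzer would run this per frame, skip unvoiced frames, and summarize the results (for example as a mean pitch) before anything is written to the card.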
A spoken-word recognition unit receives a user's voice message and identifies the words with the help of the speaker's voice characteristics in its memory, which was initialized by the spoken-word identification data on a card. In this manner, the spoken words can be recognized. For each new speaker, the unit must first be "taught" a particular speaker's characteristics so that the unit can more easily recognize the spoken words. The card provides the information to teach the unit. A record of an individual's speech characteristics is laser recorded on a card which is later read into the unit by placing it in a card reader, and the characteristics are entered into the short-term memory of the spoken-word recognition unit.
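The initialization described above can be modeled as a unit whose short-term memory holds exactly one speaker profile, which each card read replaces. This is a hypothetical sketch: `SpeakerProfile`, `RecognitionUnit`, `insert_card` and the single mean-pitch field are illustrative names, not structures taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakerProfile:
    # Hypothetical subset of the characteristics read from a card.
    speaker_id: str
    mean_pitch_hz: float


@dataclass
class RecognitionUnit:
    """Hypothetical stand-in for recognition unit 23 and its short-term memory."""
    short_term_memory: Optional[SpeakerProfile] = None

    def insert_card(self, profile: SpeakerProfile) -> None:
        # Reading a card overwrites the previous speaker's data, so the unit
        # is re-initialized without being retaught by voice.
        self.short_term_memory = profile

    def relative_pitch(self, observed_hz: float) -> float:
        # Express an observed pitch relative to the loaded speaker's mean,
        # one simple way stored characteristics can normalize new input.
        if self.short_term_memory is None:
            raise RuntimeError("no card has been read; unit is not initialized")
        return observed_hz / self.short_term_memory.mean_pitch_hz


unit = RecognitionUnit()
unit.insert_card(SpeakerProfile("A", 120.0))
print(unit.relative_pitch(240.0))  # 2.0
unit.insert_card(SpeakerProfile("B", 200.0))
print(unit.relative_pitch(240.0))  # 1.2
```

Inserting the second card changes how the same observed pitch is interpreted, with no retraining of the unit.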
The card has a strip of laser recording material, such as the reflective direct-read-after-write material described in U.S. patent 4,284,716 to Drexler et al. A modulated laser beam records data on the strip, in situ, by ablation, melting, physical or chemical change or deformation, thereby forming spots having a detectable change in an optical characteristic relative to the strip. The recording process on the above-mentioned direct-read-after-write material produces differences in reflectivity detectable by a light detector. No processing after laser recording is required when the recording strip is a direct-read-after-write material. Laser recording materials that require heat processing after laser recording may also be used.
Each person has his own speech characteristics, in much the same way that each person has his own set of fingerprints. The card with the recorded speech characteristics is read by shining a laser beam or light-emitting diode onto the strip. The beam typically has an intensity of ten percent of the recording intensity. The beam is reflected from the strip to a photodetector. The detector detects the contrast in optical characteristics between the strip and the recorded spots, and transmits corresponding signals to the speech recognition unit's short-term memory. The system is now ready to listen to words spoken by the user and to identify the words with the help of the speaker's voice characteristics stored in the memory. By this procedure the speaker's words are more clearly identified.
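Converting the photodetector output into data can be sketched as a simple threshold halfway between the field and spot reflectivities. This sketch assumes one reflectivity sample per bit cell and that a recorded spot encodes a 1; neither the sampling scheme nor the bit convention is specified in the text, and `decode_track` is a hypothetical name.

```python
def decode_track(samples, field_reflectivity, spot_reflectivity):
    """Turn reflectivity samples read along one track into bits.

    Assumes one sample per bit cell and that a recorded spot means 1.
    Handles dark spots in a bright field and bright spots in a dark field.
    """
    threshold = (field_reflectivity + spot_reflectivity) / 2
    dark_spots = spot_reflectivity < field_reflectivity
    bits = []
    for r in samples:
        is_spot = r < threshold if dark_spots else r > threshold
        bits.append(1 if is_spot else 0)
    return bits


# Dark spots (about 10%) in a bright field (about 50%):
print(decode_track([0.49, 0.09, 0.51, 0.12], 0.50, 0.10))  # [0, 1, 0, 1]
# Bright spots (about 40%) in a dark field (about 12%):
print(decode_track([0.12, 0.41, 0.38, 0.13], 0.12, 0.40))  # [0, 1, 1, 0]
```

A midpoint threshold leaves the same margin on both sides, which is why a higher contrast ratio makes reading more tolerant of noise.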
The uniform surface reflectivity of this reflective strip before recording typically would range between 8% and 65%. For a highly reflective strip the average reflectivity over a laser recorded spot might be in the range of 5% to 25%. Thus, the reflective contrast ratio of the recorded spots would range between 2:1 and 7:1. Laser recording materials are known in the art that create either low-reflectivity spots in a reflective field or highly reflective spots in a low-reflectivity field. An example of the latter type is described in U.S. patent 4,343,879. When the reflectivity of the field is in the range of 8% to 20%, the reflective spots have a reflectivity of about 40%. The reflective contrast ratio would then range from 2:1 to 5:1. Photographic pre-formatting would create spots having a 10% reflectivity in a reflective field or 40% in a low-reflectivity field.
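The contrast ratio quoted here is the reflectivity of the brighter region divided by that of the darker one. The helper below computes it for either kind of material and checks it against a 2:1 minimum, the figure the description of the recording materials treats as sufficient for reading. The function names are illustrative.

```python
def contrast_ratio(field_reflectivity, spot_reflectivity):
    """Brighter reflectivity divided by darker; always >= 1."""
    brighter = max(field_reflectivity, spot_reflectivity)
    darker = min(field_reflectivity, spot_reflectivity)
    return brighter / darker


def readable(field_reflectivity, spot_reflectivity, minimum=2.0):
    """True if the contrast meets the 2:1 reading minimum (adjustable)."""
    return contrast_ratio(field_reflectivity, spot_reflectivity) >= minimum


# Reflectivities in percent.
print(contrast_ratio(50, 10))  # 5.0  dark spots in a bright field
print(contrast_ratio(20, 40))  # 2.0  bright spots in a dark field
print(contrast_ratio(8, 40))   # 5.0
print(readable(50, 30))        # False (about 1.67:1)
```

For the second type of material, 40% spots give 2:1 against a 20% field and 5:1 against an 8% field, which reproduces the 2:1 to 5:1 range stated above.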
The voice information on the card would typically be in digital form. It would inform the word recognition unit of macro aspects of speech, such as accent parameters, speed of speaking, dropping of "th" beginnings or "g" endings, and variations in intensity, as well as micro aspects such as tone, pitch, and intonation. With this advance knowledge of the speech characteristics of the words about to be spoken, the words can be recognized more easily. The card can store tens, hundreds, or even thousands of parameters describing deviations from a "normal" voice. When a word is not understood, the word interpreter unit would apply corrections to the unidentified word based upon the individual's speech deviation information, and the attempt to recognize the word is then repeated.
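The retry step described above can be sketched as "match against normal-voice templates; if nothing matches, remove the speaker's stored deviations and match again." This is a hypothetical illustration only: the template table `TEMPLATES`, the two-element feature vectors, the Euclidean nearest-template matcher, and the distance threshold are all assumptions, not the recognizer the specification describes.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical "normal voice" templates: word -> feature vector
# (for example mean pitch in Hz and a speaking-rate measure).
TEMPLATES: Dict[str, List[float]] = {
    "yes": [120.0, 4.0],
    "no": [110.0, 2.0],
}


def nearest_word(features: Sequence[float], threshold: float) -> Optional[str]:
    """Return the closest template word, or None if none is within threshold."""
    best_word, best_dist = None, math.inf
    for word, template in TEMPLATES.items():
        dist = math.dist(features, template)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= threshold else None


def recognize(features: Sequence[float], deviations: Sequence[float],
              threshold: float = 5.0) -> Optional[str]:
    """First try plain matching; on failure, correct the features using the
    speaker's deviation parameters (read from the card) and try again."""
    word = nearest_word(features, threshold)
    if word is not None:
        return word
    corrected = [f - d for f, d in zip(features, deviations)]
    return nearest_word(corrected, threshold)


# A speaker whose pitch runs 80 Hz above "normal" is only recognized
# after the deviation correction is applied.
print(recognize([200.0, 4.0], [80.0, 0.0]))  # yes
```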
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the spoken-word recognition system of the present invention.
Fig. 2 is a schematic diagram of the data card encoding of the present invention.
Fig. 3 is a plan view of one side of a data card in accord with the present invention.
Fig. 4 is a partial side sectional view taken along lines 4-4 in Fig. 3.
Fig. 5 is a detail of laser writing on a portion of the laser recording strip illustrated by dashed lines in Fig. 3.
Fig. 6 is a plan view of an apparatus for reading and writing on the optical recording media strip illustrated in Fig. 3.
Best Mode for Carrying Out the Invention
With reference to Fig. 1, a spoken-word recognition system 10 reads a person's voice characteristics from a wallet-size card 31 containing a strip of laser-recordable material. Each person would have a card 31 with his own speech characteristics prerecorded on it.
The system 10 is initialized with respect to the particular voice characteristics of the card owner by inserting the card 31 into system 10. A sufficient number of characteristics is recorded so that words spoken by a particular speaker may be identified.
With reference to Fig. 2, a data card encoding system 110 is used to form a card 131. A set of words 116 is spoken by a person into a microphone 117. The resulting signal is analyzed by a speech analyzer 121, and speech characteristics 122 are extracted. Such characteristics 122 include pitch, formants, the ratio of voiced to unvoiced amplitudes, and other parameters used to help identify words and parts of words. The exact set of parameters will vary from one system to another, depending on the type of speech analysis used. Macro aspects of speech, such as accent parameters, speed of speaking, dropping of particular sounds at the beginning or ending of words, and variations in tone, may also be included to make word recognition even easier. In any case, speech analyzer 121 sends a digital signal 122 representing a person's speech characteristics to a data card writer/reader 129, which writes the data onto card 131 by shining a modulated laser beam 130 onto the card. The card 131 has a strip of optical-contrast laser recording material disposed on it. The beam 130 records data onto the card 131 in situ by ablation, melting, physical or chemical change, or deformation, thereby forming spots with contrasting reflectivity relative to the unrecorded strip. The reflected beam 132 is read by the card writer/reader 129 to confirm the laser writing.
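The specification leaves the analysis method open ("the exact set of parameters will vary ... depending on the type of speech analysis"). As one illustration of how a speech analyzer like 121 might extract pitch, the sketch below uses short-time autocorrelation, a standard technique swapped in here as an assumption rather than taken from the source. The function name `estimate_pitch` and the 50 to 400 Hz search range are hypothetical choices.

```python
import math
from typing import Sequence


def estimate_pitch(samples: Sequence[float], sample_rate: int,
                   min_hz: float = 50.0, max_hz: float = 400.0) -> float:
    """Estimate the fundamental frequency of a voiced frame by autocorrelation.

    The lag with the largest autocorrelation inside the plausible pitch
    range is taken as the pitch period.
    """
    n = len(samples)
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), n - 1)
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# A synthetic 200 Hz tone sampled at 8 kHz stands in for a voiced frame.
rate = 8000
frame = [math.sin(2 * math.pi * 200 * t / rate) for t in range(800)]
print(round(estimate_pitch(frame, rate)))  # 200
```

Real voiced speech needs more care (windowing, normalization, octave-error checks); the sketch only shows the core idea.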
In Fig. 1, the spoken-word recognition system 10 is initialized by placing a prerecorded card 31 in data card reader 29. The card reader 29 shines a light beam 30 from a laser or an LED onto the prerecorded strip. This read beam typically has an intensity of five to ten percent of the typical semiconductor laser recording intensity. The light beam 32 is reflected from the strip to a photodetector, which detects the contrast in reflectivity between the strip and the recorded spots. Card reader 29 transmits a signal 24 corresponding to the recorded data to the short-term memory of the spoken-word recognition unit 23. The system 10 is now ready to listen to words 16 spoken by the user. The words 16 spoken into microphone 17 are analyzed and interpreted by the speech recognition unit 23 with respect to the voice characteristics 24 now stored in its short-term memory. The words 16 are recognized, and the result is sent to an output device 27, such as a CRT terminal.
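The text says only that the speaker data is stored in digital form; it does not define a card format. The sketch below shows one hypothetical way the digital signal written by writer/reader 129 and later read by reader 29 could be laid out: each parameter is quantized to 8 bits over an assumed range, and a 1 bit would correspond to a recorded spot. The layout table `LAYOUT`, the parameter names, and their ranges are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical card layout: parameter name -> (minimum, maximum) of its range.
LAYOUT: Dict[str, Tuple[float, float]] = {
    "mean_pitch_hz": (50.0, 400.0),
    "speaking_rate_syll_per_s": (1.0, 9.0),
    "voiced_unvoiced_ratio": (0.0, 10.0),
}
BITS_PER_PARAM = 8
LEVELS = (1 << BITS_PER_PARAM) - 1  # 255 quantization steps


def encode(params: Dict[str, float]) -> List[int]:
    """Quantize each parameter in LAYOUT order and emit its bits MSB first."""
    bits: List[int] = []
    for name, (lo, hi) in LAYOUT.items():
        value = min(max(params[name], lo), hi)  # clamp to the assumed range
        code = round((value - lo) / (hi - lo) * LEVELS)
        bits.extend((code >> shift) & 1 for shift in range(BITS_PER_PARAM - 1, -1, -1))
    return bits


def decode(bits: List[int]) -> Dict[str, float]:
    """Inverse of encode(): rebuild approximate parameter values from the bits."""
    params: Dict[str, float] = {}
    for index, (name, (lo, hi)) in enumerate(LAYOUT.items()):
        chunk = bits[index * BITS_PER_PARAM:(index + 1) * BITS_PER_PARAM]
        code = 0
        for bit in chunk:
            code = (code << 1) | bit
        params[name] = lo + code / LEVELS * (hi - lo)
    return params


speaker = {"mean_pitch_hz": 180.0, "speaking_rate_syll_per_s": 4.5,
           "voiced_unvoiced_ratio": 3.2}
print(decode(encode(speaker)))  # values within one quantization step of the input
```

Three parameters occupy only 24 bits, far below the card capacities discussed later, which leaves room for the hundreds or thousands of deviation parameters mentioned above.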
With reference to Figs. 3 and 4, a data card 11 is illustrated having a size common to most credit cards.
The width of such a card is approximately 54 mm and the length is approximately 85 mm. These dimensions are not critical but are preferred, because such a size fits easily into a wallet and has historically been adopted as a convenient size for automatic teller machines and the like. The card's base 13 is a dielectric, usually a plastic material such as polyvinyl chloride or a similar material. Polycarbonate plastic is preferred. The surface finish of the base should have low specular reflectivity, preferably less than 10%.
Base 13 carries strip 15. The strip is about 16 or 35 millimeters wide and extends the length of the card. Alternatively, the strip may have other sizes and orientations. The strip is relatively thin, approximately 60 to 200 microns, although this is not critical. The strip may be applied to the card by any convenient method that achieves flatness.
The strip is adhered to the card with an adhesive and covered by a transparent laminating sheet 19, which serves to keep strip 15 flat as well as to protect the strip from dust and scratches. Sheet 19 is a thin, transparent plastic laminating sheet or a coating, such as a transparent lacquer. The material is preferably polycarbonate plastic. The opposite side of base 13 may have user identification indicia embossed on the surface of the card. Other indicia, such as a card number, may optionally be provided.
The high-resolution laser recording material that forms strip 15 may be any of the reflective recording materials that have been developed for use as direct read-after-write (DRAW) optical disks, so long as the materials can be formed on thin substrates. Reflective materials have several advantages over transmissive materials: the read/write equipment is all on one side of the card, the data storage capacity is doubled, and automatic focusing is easier. For example, U.S. patent 4,230,939, issued to de Bont et al., teaches a high-resolution material consisting of a thin metallic recording layer of reflective metals such as Bi, Te, In, Sn, Cu, Al, Pt, Au, Rh, As, Sb, Ge, Se, and Ga.
Preferred materials are those having high reflectivity and a low melting point, particularly Cd, Sn, Tl, In, Bi, and amalgams. Suspensions of reflective metal particles in organic colloids also form low-melting-temperature laser recording media; silver is one such metal. Typical recording media are described in U.S. patents Nos. 4,314,260, 4,298,684, 4,278,758, 4,278,756, and 4,269,917, all assigned to the assignee of the present invention.
The selected laser recording material should be compatible with the laser used for writing on it, since some materials are more sensitive than others at certain wavelengths. Good sensitivity to infrared light is preferred, because infrared light is least affected by scratches and dirt on the transparent laminating sheet. The selected recording material should have a favorable signal-to-noise ratio and form high-contrast data bits with the read/write system with which it is used.
The material should not lose data when subjected to temperatures of about 122°F (50°C) for long periods. The material should also be capable of recording at speeds of at least several thousand bits/sec. This generally precludes the use of materials that require long heating times or that rely on slow chemical reactions in the presence of heat, which may permit recording of only a few bits/sec. A large number of highly reflective laser recording materials have been used for optical data disk applications.
Data is recorded by forming spots in the surrounding field of the reflective layer itself, thereby altering the reflectivity in the data spot. Data is read by detecting the optical reflective contrast between the surrounding reflective field of unrecorded areas and the recorded spots. A spot reflectivity of less than half the reflectivity of the surrounding field produces a contrast ratio of at least two to one, which is sufficient contrast for reading. Greater contrast is preferred. A strip field reflectivity of about 50% is preferred, with the reflectivity of a spot in the reflective field being less than 10%, thus creating a contrast ratio of greater than five to one. Alternatively, data may be recorded by increasing the reflectivity of the strip. For example, the recording laser can melt a field of dull microscopic spikes on the strip to create flat, shiny spots. This method is described in SPIE, Vol. 329, Optical Disk Technology (1982), p. 202. A spot reflectivity of more than twice the surrounding spiked-field reflectivity produces a contrast ratio of at least two to one, which is sufficient contrast for reading.
With reference to Fig. 5, a magnified view of laser writing on the laser recording material strip 15 may be seen. The dashed line 33 corresponds to the dashed line 33 in Fig. 3. The oblong spots 35 are aligned in a path and have generally similar dimensions. The spots are generally circular or oval in shape, with the axis of the oval perpendicular to the lengthwise dimension of the strip. A second group of spots 37 is shown aligned in a second path. The spots 37 have dimensions similar to those of the spots 35. The spacing between paths is not critical, except that the optics of the readback system should be able to distinguish easily between paths. In present optical data storage technology, tracks separated by only a few microns can be resolved. The spacing and pattern of the spots along each path are selected for easy decoding.
The spots illustrated in Fig. 5 have a recommended size of approximately 5 microns by 20 microns; circular spots would be 5 or 10 microns in diameter. Generally, the smallest dimension of a spot should be less than 50 microns. In the preferred embodiment, the largest dimension would also be less than 50 microns. Of course, to offset the lower density that results from larger spots, the size of the strip 15 could be expanded until it covers a large part of the card. In Fig. 3, the laser recording strip 15 could completely cover a single side of the card. A minimum information capacity of 250,000 bits is indicated, and a storage capacity of over one million bits is preferable.
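A rough upper bound on capacity follows from the strip area divided by the area each spot position occupies. The sketch below assumes a 25-micron track pitch (room for the 20-micron spot dimension across the track) and a 10-micron bit pitch along the track; both pitches and the function name `strip_capacity_bits` are assumptions for illustration, and the result ignores servo tracks, timing marks, and error-correction overhead.

```python
def strip_capacity_bits(strip_width_mm: float, strip_length_mm: float,
                        track_pitch_um: float, bit_pitch_um: float) -> int:
    """Raw spot positions on a strip: tracks across the width times
    bit cells along the length. Formatting overhead is ignored."""
    tracks = int(strip_width_mm * 1000 // track_pitch_um)
    bits_per_track = int(strip_length_mm * 1000 // bit_pitch_um)
    return tracks * bits_per_track


# 16 mm and 35 mm strips running the 85 mm length of the card.
print(strip_capacity_bits(16, 85, 25, 10))  # 5440000
print(strip_capacity_bits(35, 85, 25, 10))  # 11900000
```

Under these assumptions either strip width comfortably exceeds both the 250,000-bit minimum and the preferred one million bits.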
In Fig. 6, a side view of the lengthwise dimension of a card 41 is shown inserted into card reader/writer 29. The card is usually received in a movable holder 42, which brings the card into the beam trajectory. A laser light source 43, preferably a pulsed semiconductor laser of near-infrared wavelength, emits a beam 45 that passes through collimating and focusing optics 47. The beam is sampled by a beam splitter 49, which transmits a portion of the beam through a focusing lens 51 to a photodetector 53. The detector 53 confirms laser writing and is not essential. The beam is then directed to a first servo-controlled mirror 55, which is mounted for rotation about the axis 57 in the direction indicated by the arrows A. The purpose of mirror 55 is to find the lateral edges of the laser recording material in a coarse mode of operation and then, in a fine mode of operation, to identify data paths that lie at predetermined distances from the edges. From mirror 55, the beam is directed toward mirror 61. This mirror is mounted for rotation at pivot 63. The purpose of mirror 61 is fine control of the motion of the beam along the length of the card. Coarse control of the lengthwise position of the card relative to the beam is achieved by motion of movable holder 42. The position of the holder may be established by a linear motor adjusted by a closed-loop position servo system of the type used in magnetic disk drives.
During its manufacture, the card may be prerecorded with database information or a preinscribed pattern containing servo tracks, timing marks, program instructions, and related functions. These positioning marks can be used as a reference for the laser recording system to record or read data at particular locations. Each of the various spoken-word recognition systems may have formats specific to its particular needs. U.S. patent No. 4,304,848 describes how formatting may be done photolithographically. Formatting may also be done using laser recording or surface molding of the servo tracks, timing marks, programming, and related functions. Dil, in U.S. patent 4,209,804, teaches a type of surface molding. Reference position information may be prerecorded on the card so that position error signals may be generated and used as feedback in motor control. Upon reading one data path, the mirror 55 is slightly rotated, the motor moves holder 42 lengthwise so that the next path can be read, and so on.
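The feedback idea above (prerecorded reference positions produce a position error signal that drives the motor) can be illustrated with the simplest closed loop, a proportional controller. This is a hypothetical sketch: the function `servo_to`, the gain, and the tolerance are assumptions, and the idealized plant moves exactly by the commanded correction. Disk-drive servos of the kind the text mentions use considerably more elaborate controllers.

```python
from typing import List


def servo_to(target_um: float, start_um: float, gain: float = 0.5,
             tolerance_um: float = 1.0, max_steps: int = 100) -> List[float]:
    """Drive a hypothetical holder position toward target_um using the
    position error (as derived from reference marks) as feedback.

    Each step moves by gain * error, so the error shrinks by (1 - gain)
    per step; the loop converges for 0 < gain < 2.
    Returns the sequence of positions visited.
    """
    if not 0.0 < gain < 2.0:
        raise ValueError("gain outside the range where this loop converges")
    position = start_um
    history = [position]
    for _ in range(max_steps):
        error = target_um - position  # position error signal
        if abs(error) <= tolerance_um:
            break
        position += gain * error      # proportional correction
        history.append(position)
    return history


path = servo_to(1000.0, 0.0)
print(len(path), path[-1])  # 11 999.0234375
```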
Light scattered and reflected from the spots contrasts with the surrounding field where no spots exist. The beam should deliver sufficient laser pulse energy to the surface of the recording material to create spots; typically, 5 to 20 milliwatts is required, depending on the recording material. A 20-milliwatt semiconductor laser focused to a five-micron beam size records at temperatures of about 200°C and is capable of creating spots in about 75 microseconds. The wavelength of the laser should be compatible with the recording material. In the read mode, power is lowered to about 5% to 10% of the recording power.
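The figures in the preceding paragraph can be checked with simple arithmetic. This worked example uses only the stated values (20 mW, 75 µs per spot, read power at 5% to 10% of the recording power) and assumes no dead time between pulses:

```latex
\begin{aligned}
E_{\text{spot}} &= P\,t = (20\times10^{-3}\,\text{W})(75\times10^{-6}\,\text{s}) = 1.5\times10^{-6}\,\text{J} = 1.5\,\mu\text{J} \\
R &\le \frac{1}{t} = \frac{1}{75\times10^{-6}\,\text{s}} \approx 1.3\times10^{4}\ \text{spots/s} \\
P_{\text{read}} &= (0.05\ \text{to}\ 0.10)\times 20\,\text{mW} = 1\ \text{to}\ 2\,\text{mW}
\end{aligned}
```

An upper bound of roughly 13,000 spots per second is consistent with the requirement, stated earlier, that the material record at least several thousand bits/sec.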
The optical contrast between a spot and the surrounding field is detected by light detector 65, which may be a photodiode. Light is directed onto detector 65 by beam splitter 67 and focusing lens 69. Servo motors, not shown, control the positions of the mirrors and drive the mirrors in accordance with instructions received from control circuits as well as from feedback devices. The detector 65 produces electrical signals corresponding to the spots. These signals are processed by the spoken-word recognition unit and used to identify words spoken by a particular speaker.
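Turning the detector signal into bits can be sketched as thresholding each bit cell against a level halfway between the unrecorded field and a recorded spot. This is a simplified, hypothetical decoder: the function `decode_track`, the fixed number of samples per bit cell, and the midpoint threshold are assumptions. A practical reader would recover bit timing from the prerecorded timing marks rather than assume it.

```python
from statistics import mean
from typing import List, Sequence


def decode_track(samples: Sequence[float], samples_per_cell: int,
                 field_level: float, spot_level: float) -> List[int]:
    """Turn detector samples along one data path into bits.

    Each bit cell is averaged and compared with a threshold halfway
    between the field and spot levels; a cell that looks like a spot
    decodes as 1. Handles dark spots in a bright field and vice versa.
    """
    threshold = (field_level + spot_level) / 2.0
    spot_is_darker = spot_level < field_level
    bits: List[int] = []
    for start in range(0, len(samples) - samples_per_cell + 1, samples_per_cell):
        level = mean(samples[start:start + samples_per_cell])
        is_spot = level < threshold if spot_is_darker else level > threshold
        bits.append(1 if is_spot else 0)
    return bits


# Reflectivity samples (%) for a 50% field with roughly 10% spots.
track = [50, 49, 51, 10, 12, 9, 48, 50, 52, 11, 10, 10]
print(decode_track(track, 3, 50.0, 10.0))  # [0, 1, 0, 1]
```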
Claims
1. A system for initializing spoken-word recognition units comprising, a spoken-word recognition unit having a voice input and a data input, and providing an output of word recognition, a data card reader connected to said spoken-word recognition unit, and a plurality of data cards, adapted to be read by the data card reader, each card having prerecorded data which is speaker characteristic data sufficient in extent to teach said spoken-word recognition unit to understand the words of a speaker.
2. The system of claim 1 wherein said data card reader comprises, a light source having a light beam directed at one of said data cards, and a light detector disposed to receive said light beam reflected from said card, said detector connected to said spoken-word recognition unit for inputting said data on said card.
3. The system of claim 1 wherein each of said data cards has a strip of laser recording material for the optical storage of said speaker characteristic data, said storage being in the form of spots, the reflective contrast ratio of the spots with respect to the surrounding unrecorded material being at least two to one.
4. A system for encoding speech characteristic data on a card comprising, speech input means for inputting spoken words, analyzing means connected to said input means for extracting speech characteristics of individual users from said spoken words, a data card writer/reader connected to said analyzing means, and a plurality of data cards, adapted for writing by the card writer/reader, each card having data storage material disposed thereon capable of storing speaker characteristic data sufficient in extent to characterize the words of a speaker.
5. The system of claim 4 wherein said data card writer/reader comprises, a laser having a laser beam directed at one of said data cards, said laser connected to said analyzing means for receiving said speech characteristics, and a light detector disposed to receive said light beam reflected from said card.
6. The system of claim 4 wherein each of said data cards has a strip of laser recording material for the optical storage of said speaker characteristic data, said storage being in the form of spots, the reflective contrast ratio of the spots with respect to the surrounding unrecorded material being at least two to one.
7. A method for storing speaker dependent voice recognition data comprising, speaking a set of words, analyzing said spoken words for extraction of speech characteristics from said spoken words, generating digital data corresponding to said speech characteristics, recording with a modulated laser beam said digital data onto a card having a strip of laser recording material, said recording forming spots representing said speech characteristics, said spots having a detectable change in an optical characteristic relative to said strip.
8. A method for initializing spoken-word recognition units comprising, placing a data card in data reading relation to a data card reader, said data card having speaker characteristic data prerecorded thereon, reading said data on said data card with said data card reader, and inputting said data to a spoken-word recognition unit, said unit being connected to said card reader for receiving said data input.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US72138185A | 1985-04-09 | 1985-04-09 | |
US721,381 | 1985-04-09 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1986006197A1 (en) | 1986-10-23 |
Family ID: 24897749
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US1986/000494 WO1986006197A1 (en) | 1985-04-09 | 1986-03-10 | Data card system for initializing spoken-word recognition units |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0218723A1 (en) |
CA (1) | CA1258317A (en) |
WO (1) | WO1986006197A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3896266A (en) * | 1971-08-09 | 1975-07-22 | Nelson J Waterbury | Credit and other security cards and card utilization systems therefore |
US4284716A (en) * | 1979-07-06 | 1981-08-18 | Drexler Technology Corporation | Broadband reflective laser recording and data storage medium with absorptive underlayer |
Legal Status
- 1986-03-10: EP86903718A (published as EP0218723A1), not active (withdrawn)
- 1986-03-10: PCT/US1986/000494 (published as WO1986006197A1), status unknown
- 1986-04-08: CA000506053A (published as CA1258317A), not active (expired)
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0248593A1 (en) * | 1986-06-06 | 1987-12-09 | Speech Systems, Inc. | Preprocessing system for speech recognition |
EP0271835A2 (en) * | 1986-12-19 | 1988-06-22 | Hitachi, Ltd. | Personal voice pattern carrying card system |
EP0271835A3 (en) * | 1986-12-19 | 1989-02-22 | Hitachi, Ltd. | Personal voice pattern carrying card system |
US4827518A (en) * | 1987-08-06 | 1989-05-02 | Bell Communications Research, Inc. | Speaker verification system using integrated circuit cards |
FR2642882A1 (en) * | 1989-02-07 | 1990-08-10 | Ripoll Jean Louis | SPEECH PROCESSING APPARATUS |
WO1990009656A1 (en) * | 1989-02-07 | 1990-08-23 | Alcept | Speech processing machine |
ES2114493A1 (en) * | 1996-05-22 | 1998-05-16 | Univ Madrid Politecnica | System for verifying the identity of persons by means of a portable data medium based on voice recognition. |
Also Published As
Publication number | Publication date |
---|---|
CA1258317A (en) | 1989-08-08 |
EP0218723A1 (en) | 1987-04-22 |
Similar Documents
Publication | Title |
---|---|
US4711996A (en) | Redundant optical recording of information in different formats |
US4360728A (en) | Banking card for automatic teller machines and the like |
US4544835A (en) | Data system containing a high capacity optical contrast laser recordable wallet-size plastic card |
US4609812A (en) | Prerecorded dual strip data storage card |
US4680459A (en) | Updatable micrographic pocket data card |
US5421619A (en) | Laser imaged identification card |
US4683371A (en) | Dual stripe optical data card |
US4680460A (en) | System and method for making recordable wallet-size optical card |
US4910725A (en) | Optical recording method for data cards |
US4680458A (en) | Laser recording and storage medium |
CA1231780A (en) | Method for making a laser recordable wallet-size plastic card |
EP1218878B1 (en) | Method and system for laser writing microscopic data spots on cards and labels readable with a ccd array |
US4680456A (en) | Data system employing wallet-size optical card |
US4835376A (en) | Laser read/write system for personal information card |
JPH02501241A (en) | Updateable Micrographic Pocket Data Card |
US4588665A (en) | Micrographic film member with laser written data |
US4656346A (en) | System for optically reading and annotating text on a data card |
AU549957B2 (en) | Banking card for automatic teller machines and the like |
CA1258317A (en) | Data card system for initializing spoken-word recognition units |
JP2001126267A (en) | Optical recording medium recorder and optical recording medium |
JPS63127435A (en) | Optical card system |
JPS6082396A (en) | Method of discriminating card by hologram |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1; designated state(s): DE GB JP |
AL | Designated countries for regional patents | Kind code of ref document: A1; designated state(s): AT BE CH DE FR GB IT LU NL SE |
REG | Reference to national code | Ref country code: DE; ref legal event code: 8642 |