CA1258317A

CA1258317A - Data card system for initializing spoken-word recognition units

Info

Publication number: CA1258317A
Application number: CA000506053A
Authority: CA
Inventors: Jerome Drexler
Original assignee: Drexler Technology Corp
Current assignee: LaserCard Corp
Priority date: 1985-04-09
Filing date: 1986-04-08
Publication date: 1989-08-08
Also published as: WO1986006197A1; EP0218723A1

Abstract

Data Card System for Initializing Spoken-Word Recognition Units
A spoken-word recognition system for recognizing a speaker's words with the assistance of a data card initializing system. The data is stored on data cards, each card having sufficient data to teach the recognition unit to recognize the words of the speaker.
A data card reader reads the optical data on the card and inputs this data into the spoken-word recognition unit.
An auxiliary system is used to encode the cards with the speaker's voice characteristics through use of selected speech inputting of a set of words to a microphone, followed by a data card writer which writes the data on the data card.

Description

Technical Field
The invention relates to spoken-word recognition systems.
Background Art
Suzuki, et al. (U.S. patents 4,060,694, 4,078,154 and 4,100,370) teach a voice recognition system in which the phonemes as spoken by different speakers and the voice of the person speaking can be recognized. A key phrase is spoken. Parallel filters derive a spectral characteristic parameter which contains weighting factors extracted and compared with the selected phoneme in memory. Improved specificity over other speakers can be obtained by varying the weighting factors through a number of different values, and storing in memory the set of parameters for each sound as spoken by a specific speaker. The system can thus be used for voice verification.
Felix et al. (U.S. patent 4,449,189) disclose a method for identifying an individual using a combination of speech and face recognition. The voice signature of a person uttering a key word into a microphone is compared in a pattern matcher with the previously stored voice signature of a known person uttering the same key word. At the same time, a momentary image of that person's mouth region is recorded and compared with that of the same known person. The results of the comparison are analyzed to verify that the identity of the speaker is that of the known person.
Katayama (U.S. patent 4,461,023) discloses a method of storing spoken words for use in a speech recognition system. Spoken words are input, then analyzed. The resulting patterns are stored in memory. A second memory stores the digitized speech that was input. The addresses of both memory units are specified to be the same for a particular word.
Systems that recognize voice commands require memory for storing speech characteristics for later comparison. The system must first be "taught" the speech characteristics before it can recognize specific voice commands. A problem with this is that each speaker has his own speech characteristics that the system must learn. Computer memory space is limited, so the speech characteristics of an individual must be relearned each time the speaker changes.
An object of the invention is to devise a spoken-word recognition system which is of reduced complexity and which can be quickly and easily programmed to understand any individual's voice commands more readily.
Another object of the invention is to devise a system in which a voice command unit can be initialized easily by each individual user without needing a knowledge of programming or of the unit's operation.

Disclosure of Invention
The above objects have been met by a system which stores a person's voice characteristics on a wallet-size card containing a laser recordable strip. These cards may be inserted into and removed from a word recognition processor. Every user has a card with his own pre-recorded speech characteristics thereon.
Upon insertion into a word recognition processor, the processor unit would be initialized with respect to the particular voice characteristics of the owner of the card.
To encode the card, a set of words is spoken by a user into a microphone. The spoken words are analyzed and speech characteristics are extracted. Such characteristics include pitch, intonation, speed of speaking, accent parameters, and other parameters. A sufficient number of these characteristics is recorded on the card so that the words spoken later by the speaker may be understood by the word recognition unit.
A spoken-word recognition unit receives a user's voice message and identifies the words with the help of the speaker's voice characteristics in its memory, which was initialized by the spoken-word identification data on a card. In this manner, the spoken words can be recognized. For each new speaker, the unit must first be "taught" a particular speaker's characteristics so that the unit can more easily recognize the spoken words. The card provides the information to teach the unit. A record of an individual's speech characteristics is laser recorded on a card, which is later placed in a card reader so that the characteristics are read and entered into the short-term memory of the spoken-word recognition unit.
The card has a strip of laser recording material, such as the reflective direct-read-after-write material described in U.S. patent 4,284,716 to Drexler et al. A modulated laser beam records data on the strip, in situ, by ablation, melting, physical or chemical change or deformation, thereby forming spots having a detectable change in an optical characteristic relative to the strip. The recording process on the above mentioned direct-read-after-write material produces differences in reflectivity detectable by a light detector. No processing after laser recording is required when the recording strip is a direct-read-after-write material.
Laser recording materials also may be used that require heat processing after laser recording.
Each person has his own speech characteristics, in much the same way that each person has his own set of fingerprints. The card with the recorded speech characteristics is read by shining a beam from a laser or light emitting diode onto the strip. The beam, typically, has an intensity of ten percent of the recording intensity.
The beam is reflected from the strip to a photodetector.
The detector detects the contrast in optical characteristics between the strip and the recorded spots, and transmits corresponding signals to the speech recognition unit's short-term memory. The system is now ready to listen to words spoken by the user and to identify the words with the help of the speaker's voice characteristics stored in the memory. By this procedure the speaker's words are more clearly identified.
The uniform surface reflectivity of this reflective strip before recording typically would range between 8% and 65%. For a highly reflective strip the average reflectivity over a laser recorded spot might be in the range of 5% to 25%. Thus, the reflective contrast ratio of the recorded spots would range between 2:1 and 7:1. Laser recording materials are known in the art that create either low reflectivity spots in a reflective field or high reflectivity spots in a low reflectivity field. An example of the latter type is described in U.S. patent 4,343,879. When the reflectivity of the field is in the range of 8% to 20% the reflective spots have a reflectivity of about 40%. The reflective contrast ratio would range from 2:1 to 5:1. Photographic pre-formatting would create spots having a 10% reflectivity in a reflective field or 40% in a low reflectivity field.
The voice information on the card would typically be in digital form. It would inform the word recognition unit of macro aspects of speech such as accent parameters, speed of speaking, dropping of "th" beginnings or "g" endings, variations in intensity, as well as the micro aspects such as tone, pitch, intonation, etc. With this advance knowledge about the speech characteristics of the words about to be spoken, the words can more easily be recognized.
The card can store tens, hundreds or even thousands of deviation parameters from a "normal" voice.
When a word is not understood, the word interpreter unit would add in corrections to the unidentified word based upon the individual's speech deviation information. Attempts to recognize the word are then repeated.
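By way of illustration only, the correct-and-retry behaviour described above may be sketched as follows. This is a minimal sketch and not the implementation of the word interpreter unit: the feature vectors, the template table, the distance threshold and the function names are all hypothetical, and a deviation parameter is assumed here to be a simple additive offset from the "normal" voice.

```python
from math import dist

# Hypothetical "normal"-voice templates: word -> feature vector (made-up values).
TEMPLATES = {"start": (1.0, 2.0), "stop": (4.0, 0.5)}


def recognize(features, templates, max_distance=0.5):
    """Return the nearest template word, or None if no template is close enough."""
    word, distance = min(
        ((w, dist(features, t)) for w, t in templates.items()),
        key=lambda pair: pair[1],
    )
    return word if distance <= max_distance else None


def recognize_with_card(features, templates, deviation, max_distance=0.5):
    """Try to recognize a word; on failure, apply the card's deviation data and retry.

    `deviation` stands in for the speaker's deviation parameters read from the
    card (assumed additive for this sketch).
    """
    word = recognize(features, templates, max_distance)
    if word is not None:
        return word
    corrected = tuple(f - d for f, d in zip(features, deviation))
    return recognize(corrected, templates, max_distance)


# A speaker whose features are shifted away from the "normal" templates.
spoken = (1.8, 2.6)
card_deviation = (0.8, 0.5)
first_attempt = recognize(spoken, TEMPLATES)
second_attempt = recognize_with_card(spoken, TEMPLATES, card_deviation)
```

In this sketch the first attempt fails because the shifted features are too far from every template, while the retry succeeds once the card's deviation data has been subtracted.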

Brief Description of the Drawings
Fig. 1 is a schematic diagram of the spoken-word recognition system of the present invention.
Fig. 2 is a schematic diagram of the data card encoding of the present invention.
Fig. 3 is a plan view of one side of a data card in accord with the present invention.
Fig. 4 is a partial side sectional view taken along lines 4-4 in Fig. 3.
Fig. 5 is a detail of laser writing on a portion of the laser recording strip illustrated by dashed lines in Fig. 3.
Fig. 6 is a plan view of an apparatus for reading and writing on the optical recording media strip illustrated in Fig. 3.

Best Mode for Carrying Out the Invention
With reference to Fig. 1, a spoken-word recognition system 10 reads a person's voice characteristics from a wallet-size card 31 containing a strip of laser recordable material. Each person would have a card 31 with his own speech characteristics prerecorded thereon. The system 10 is initialized with respect to the particular voice characteristics of the card owner by inserting the card 31 into system 10. A sufficient number of characteristics is recorded so that words spoken by a particular speaker may be identified.
With reference to Fig. 2, a data card encoding system 110 is used to form a card 131. A set of words 116 is spoken by a person into a microphone 117. The resulting signal is analyzed by a speech analyzer 121 and speech characteristics 122 are extracted. Such characteristics 122 include pitch, formants, ratio of voiced to unvoiced amplitudes, and other parameters used to help identify words and parts of words. The exact set of parameters will vary from one system to another, depending on the type of speech analysis which is used.
Macro aspects of speech such as accent parameters, speed of speaking, dropping of particular sounds at the beginning or ending of words, and variations in tone may also be included to make word recognition even easier. In any case, speech analyzer 121 sends a digital signal 122 representing a person's speech characteristics to a data card writer/reader 129 which writes the data with a laser onto card 131 by shining a modulated laser beam 130 onto the card 131. The card 131 has a strip of optical contrast laser recording material disposed thereon. The beam 130 records data onto the card 131, in situ, by ablation, melting, physical or chemical change or deformation, thereby forming spots with contrasting reflectivity relative to the unrecorded strip. Reflected beam 132 is read by the card writer/reader 129 to confirm laser writing.
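As one illustration of how a characteristic such as pitch might be extracted from the microphone signal, the sketch below estimates pitch by autocorrelation. The specification does not state which analysis method the speech analyzer uses; autocorrelation, the function name, the sample rate, the search range and the synthetic test tone are all assumptions chosen for the example.

```python
import math


def estimate_pitch_hz(samples, sample_rate, min_hz=50.0, max_hz=400.0):
    """Estimate pitch as the lag with the largest (unnormalized) autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic tone standing in for a voiced sound: 100 Hz sampled at 8000 Hz.
RATE = 8000
tone = [math.sin(2 * math.pi * 100 * n / RATE) for n in range(800)]
pitch = estimate_pitch_hz(tone, RATE)
```

A real analyzer would extract many such parameters per speaker; the resulting values would then be digitized for recording on the card.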
In Fig. 1 the spoken-word recognition system 10 is initialized by placing a prerecorded card 31 in data card reader 29. The card reader 29 shines a light beam 30 from a laser or a LED onto the prerecorded strip. This read beam, typically, has an intensity of five to ten percent of the typical semiconductor laser recording intensity. The light beam 32 is reflected from the strip to a photodetector, which detects this contrast in reflectivity between the strip and recorded spots. Card reader 29 transmits a signal 24 corresponding to the recorded data to the short-term memory of the spoken-word recognition unit 23.
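The write and read-back of digital data as reflective spots can be sketched as below. This is an illustrative model only: the specification does not define the bit encoding, so one bit per spot position, the 50% and 10% reflectivity values, the detection threshold and the function names are assumptions.

```python
FIELD_REFLECTIVITY = 50  # percent, unrecorded strip (assumed value)
SPOT_REFLECTIVITY = 10   # percent, laser recorded spot (assumed value)


def write_strip(data):
    """Model laser writing: a 1 bit becomes a low reflectivity spot, a 0 bit is left blank."""
    cells = []
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            cells.append(SPOT_REFLECTIVITY if bit else FIELD_REFLECTIVITY)
    return cells


def read_strip(cells, threshold=30):
    """Model the photodetector: reflectivity below the threshold is read as a 1 bit."""
    bits = [1 if reflectivity < threshold else 0 for reflectivity in cells]
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


# Hypothetical speaker-characteristic payload, written to and read back from the strip model.
payload = b"pitch=100;speed=1.2"
strip = write_strip(payload)
recovered = read_strip(strip)
```

The recovered bytes correspond to the signal that the card reader would pass to the short-term memory of the recognition unit.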
The system 10 is now ready to listen to words 16 spoken by the user. The words 16 spoken into microphone 17 are analyzed and interpreted by the speech recognition unit 23 with respect to the voice characteristics 24, now stored in its short-term memory. The words 16 are recognized and the result is sent to an output device 27, such as a CRT terminal.
With reference to Figs. 3 and 4, a data card 11 is illustrated having a size common to most credit cards.
The width dimension of such a card is approximately 54 mm and the length dimension is approximately 85 mm. These dimensions are not critical, but preferred because such a size easily fits into a wallet and has historically been adopted as a convenient size for automatic teller machines and the like. The card's base 13 is a dielectric, usually a plastic material such as polyvinyl chloride or similar material. Polycarbonate plastic is preferred.
The surface finish of the base should have low specular reflectivity, preferably less than 10%.
Base 13 carries strip 15. The strip is about 16 or 35 millimeters wide and extends the length of the card. Alternatively, the strip may have other sizes and orientations. The strip is relatively thin, approximately 60-200 microns, although this is not critical. The strip may be applied to the card by any convenient method which achieves flatness.
The strip is adhered to the card with an adhesive and covered by a transparent laminating sheet 19 which serves to keep strip 15 flat, as well as protecting the strip from dust and scratches. Sheet 19 is a thin, transparent plastic sheet laminating material or a coating, such as a transparent lacquer. The material is preferably made of polycarbonate plastic.
The opposite side of base 13 may have user identification indicia embossed on the surface of the card. Other indicia such as card number and the like may be optionally provided.
The high resolution laser recording material which forms strip 15 may be any of the reflective recording materials which have been developed for use as direct-read-after-write (DRAW) optical disks, so long as the materials can be formed on thin substrates. An advantage of reflective materials over transmissive materials is that the read/write equipment is all on one side of the card, the data storage capacity is doubled, and the automatic focus is easier. For example, the high resolution material described in U.S. patent 4,230,939 issued to de Bont, et al. teaches a thin metallic recording layer of reflective metals such as Bi, Te, Ind, Sn, Cu, Al, Pt, Au, Rh, As, Sb, Ge, Se, Ga.
Materials which are preferred are those having high reflectivity and low melting point, particularly Cd, Sn, Tl, Ind, Bi and amalgams. Suspensions of reflective metal particles in organic colloids also form low melting temperature laser recording media. Silver is one such metal. Typical recording media are described in U.S. patents Nos. 4,314,260, 4,298,684, 4,278,758, 4,278,756 and 4,269,917, all assigned to the assignee of the present invention.
The laser recording material which is selected should be compatible with the laser which is used for writing on it. Some materials are more sensitive than others at certain wavelengths. Good sensitivity to infrared light is preferred because infrared is affected least by scratches and dirt on the transparent laminating sheet. The selected recording material should have a favorable signal-to-noise ratio and form high contrast data bits with the read/write system with which it is used.
The material should not lose data when subjected to temperatures of about 175°F (79°C) for long periods. The material should also be capable of recording at speeds of at least several thousand bits/sec. This generally precludes the use of materials that require long heating times or that rely on slow chemical reactions in the presence of heat, which may permit recording of only a few bits/sec. A large number of highly reflective laser recording materials have been used for optical data disk applications.
Data is recorded by forming spots in the surrounding field of the reflective layer itself, thereby altering the reflectivity in the data spot. Data is read by detecting the optical reflective contrast between the surrounding reflective field of unrecorded areas and the recorded spots. Spot reflectivity of less than half the reflectivity of the surrounding field produces a contrast ratio of at least two to one, which is sufficient contrast for reading. Greater contrast is preferred. Reflectivity of the strip field of about 50% is preferred with reflectivity of a spot in the reflective field being less than 10%, thus creating a contrast ratio of greater than five to one. Alternatively, data may also be recorded by increasing the reflectivity of the strip. For example, the recording laser can melt a field of dull microscopic spikes on the strip to create flat shiny spots. This method is described in SPIE, Vol. 329, Optical Disk Technology (1982), p. 202. A spot reflectivity of more than twice the surrounding spiked field reflectivity produces a contrast ratio of at least two to one, which is sufficient contrast for reading.
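The contrast ratio arithmetic above can be checked with a short calculation. The sketch below is illustrative only; the function names are hypothetical, and reflectivities are given in percent. It treats dark spots in a bright field and bright spots in a dull field the same way, as the ratio of the higher reflectivity to the lower.

```python
def contrast_ratio(field_pct, spot_pct):
    """Ratio of the higher reflectivity to the lower, both given in percent."""
    high, low = max(field_pct, spot_pct), min(field_pct, spot_pct)
    if low <= 0:
        raise ValueError("reflectivity must be positive")
    return high / low


def is_readable(field_pct, spot_pct, minimum_ratio=2.0):
    """Apply the two-to-one criterion stated above as sufficient for reading."""
    return contrast_ratio(field_pct, spot_pct) >= minimum_ratio


# Preferred case: 50% field with 10% spots gives 5:1; darker spots exceed 5:1.
preferred = contrast_ratio(50, 10)
# Bright spots of about 40% in a dull 8% to 20% field give 5:1 down to 2:1.
bright_spot_best = contrast_ratio(8, 40)
bright_spot_worst = contrast_ratio(20, 40)
```

A spot at 30% in a 50% field, by contrast, yields a ratio below two to one and would not meet the stated criterion.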
With reference to Fig. 5, a magnified view of laser writing on the laser recording material strip 15 may be seen. The dashed line 33 corresponds to the dashed line 33 in Fig. 3. The oblong spots 35 are aligned in a path and have generally similar dimensions. The spots are generally circular or oval in shape with the axis of the oval perpendicular to the lengthwise dimension of the strip. A second group of spots 37 is shown aligned in a second path. The spots 37 have similar dimensions to the spots 35. The spacing between paths is not critical, except that the optics of the readback system should be able to easily distinguish between paths. Presently, in optical data storage technology, tracks which are separated by only a few microns may be resolved. The spacing and pattern of the spots along each path is selected for easy decoding.
The spots illustrated in Fig. 5 have a recommended size of approximately 5 microns by 20 microns, or circular spots 5 microns or 10 microns in diameter.
Generally, the smallest dimension of a spot should be less than 50 microns. In the preferred embodiment the largest dimension would also be less than 50 microns. Of course, to offset lower densities from larger spots, the size of the strip 15 could be expanded to the point where it covers a large extent of the card. In Fig. 3, the laser recording strip 15 could completely cover a single side of the card. A minimum information capacity of 250,000 bits is indicated and a storage capacity of over one million bits is preferable.
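A rough capacity estimate follows from the strip and spot dimensions given above. The sketch below is a back-of-envelope calculation under stated assumptions that are not in the specification: one bit per spot position, a cell pitch of twice the spot dimension in each direction, a strip running the full 85 mm card length, and no allowance for servo tracks, timing marks or error correction.

```python
def strip_capacity_bits(strip_width_mm, strip_length_mm,
                        spot_along_um=5, spot_across_um=20, pitch_factor=2):
    """Estimate the number of spot positions on the strip (one bit each, assumed)."""
    cell_along_um = spot_along_um * pitch_factor    # spacing of spots along a path
    cell_across_um = spot_across_um * pitch_factor  # spacing between adjacent paths
    spots_per_path = (strip_length_mm * 1000) // cell_along_um
    paths = (strip_width_mm * 1000) // cell_across_um
    return spots_per_path * paths


narrow_strip_bits = strip_capacity_bits(16, 85)
wide_strip_bits = strip_capacity_bits(35, 85)
meets_minimum = narrow_strip_bits >= 250_000
```

Under these assumptions a 16 mm strip offers about 3.4 million spot positions and a 35 mm strip about 7.4 million, so both the 250,000 bit minimum and the preferred capacity of over one million bits leave room for formatting overhead.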
In Fig. 6, a side view of the lengthwise dimension of a card 41 is shown inserted into card reader/writer 29. The card is usually received in a movable holder 42 which brings the card into the beam trajectory. A laser light source 43, preferably a pulsed semiconductor laser of near infrared wavelength, emits a beam 45 which passes through collimating and focussing optics 47. The beam is sampled by a beam splitter 49 which transmits a portion of the beam through a focusing lens 51 to a photodetector 53. The detector 53 confirms laser writing and is not essential. The beam is then directed to a first servo controlled mirror 55 which is mounted for rotation along the axis 57 in the direction indicated by the arrows A. The purpose of the mirror 55 is to find the lateral edges of the laser recording material in a coarse mode of operation and then in a fine mode of operation identify data paths which exist predetermined distances from the edges.
From mirror 55, the beam is directed toward mirror 61. This mirror is mounted for rotation at pivot 63. The purpose of mirror 55 is for fine control of motion of the beam along the length of the card. Coarse control of the lengthwise position of the card relative to the beam is achieved by motion of movable holder 42.
The position of the holder may be established by a linear motor adjusted by a closed loop position servo system of the type used in magnetic disk drives.
During its manufacture the card may be pre-recorded with database information or a preinscribed pattern containing servo tracks, timing marks, program instructions, and related functions. These positioning marks can be used as a reference for the laser recording system to record or read data at particular locations.
Each of the various spoken-word recognition systems may have formats specific to its particular needs. U.S. patent No. 4,304,848 describes how formatting may be done photolithographically. Formatting may also be done using laser recording or surface molding of the servo tracks, timing marks, programming and related functions. Dil, in U.S. patent 4,209,804, teaches a type of surface molding.
Reference position information may be prerecorded on the card so that position error signals may be generated and used as feedback in motor control. Upon reading one data path, the mirror 55 is slightly rotated. The motor moves holder 42 lengthwise so that the path can be read, and so on.
Light scattered and reflected from the spots contrasts with the surrounding field where no spots exist.
The beam should deliver sufficient laser pulse energy to the surface of the recording material to create spots.
Typically, 5-20 milliwatts is required, depending on the recording material. A 20 milliwatt semiconductor laser, focussed to a five micron beam size, records at temperatures of about 200°C and is capable of creating spots in less than 25 microseconds. The wavelength of the laser should be compatible with the recording material. In the read mode, power is lowered to about 5% to 10% of the record power.
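The figures above imply some simple bounds, worked through in the sketch below. It is illustrative arithmetic only: the function names are hypothetical, the pulse is assumed to deliver constant power for its full duration, and the spot rate assumes pulses issued back to back with no positioning overhead.

```python
def pulse_energy_nj(power_mw, pulse_us):
    """Energy of one constant-power pulse; milliwatts times microseconds gives nanojoules."""
    return power_mw * pulse_us


def read_power_range_mw(record_power_mw, low_fraction=0.05, high_fraction=0.10):
    """Read power as 5% to 10% of the record power."""
    return record_power_mw * low_fraction, record_power_mw * high_fraction


def max_spot_rate_per_s(pulse_us):
    """Upper bound on spots per second if pulses follow one another directly."""
    return 1_000_000 / pulse_us


energy_nj = pulse_energy_nj(20, 25)            # upper bound, since spots form in under 25 us
read_low_mw, read_high_mw = read_power_range_mw(20)
spot_rate = max_spot_rate_per_s(25)
```

For a 20 milliwatt laser this gives at most 500 nanojoules per spot, a read power of roughly 1 to 2 milliwatts, and a spot rate on the order of tens of thousands per second, consistent with the requirement of recording at several thousand bits/sec or more.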

Optical contrast between a spot and surrounding field is detected by light detector 65, which may be a photodiode. Light is focussed onto detector 65 by beam splitter 67 and focusing lens 69. Servo motors, not shown, control the positions of the mirrors and drive the mirrors in accord with instructions received from control circuits, as well as from feedback devices. The detector 65 produces electrical signals corresponding to spots.
These signals are processed by the spoken-word recognition unit and used for identifying words spoken by a particular speaker.

Claims

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. A system for initializing spoken-word recognition units comprising, a spoken-word recognition unit having a voice input and a data input, and providing an output of word recognition, a data card reader connected to said data input of said spoken-word recognition unit, and a plurality of data cards for reading by the data card reader, each card associated with a speaker and having a strip of laser recording material for the optical storage of prerecorded data which is speaker characteristic data, said storage being in the form of spots having a reflective contrast ratio to surrounding unrecorded material of at least two to one, said speaker characteristic data derived from speech analysis of an input set of words spoken by said speaker and sufficient in extent to teach said spoken-word recognition unit words spoken into said voice input by said speaker, said words spoken into said voice input including words not found in said input set of words.

2. The system of claim 1 wherein said data card reader comprises, a light source having a light beam directed at one of said data cards, and a light detector disposed to receive said light beam reflected from said card, said detector connected to said spoken-word recognition unit for inputting said data on said card.

3. A system for encoding speech characteristic data on a card comprising, speech input means for inputting an input set of spoken words, analyzing means connected to said input means for extracting speech characteristics of individual users from said input set of spoken words, a data card writer/reader connected to said analyzing means, a plurality of data cards, adapted for writing by the card writer/reader, each card having a strip of laser recording material disposed thereon, said laser recording material having an encoded optical storage of speaker characteristic data sufficient in extent to characterize how a speaker would say words including words other than said input set of spoken words, said optical storage being in the form of spots written by said card writer/reader into said laser recording material, the reflective contrast ratio of the spots with respect to surrounding unrecorded material being at least two to one.

4. The system of claim 3 wherein said data card writer/reader comprises, a laser having a laser beam directed at one of said data cards, said laser connected to said analyzing means for receiving said speech characteristics, and a light detector disposed to receive said light beam reflected from said card.

5. A method for storing speaker dependent voice recognition data comprising, having at least one speaker speak an input set of words, analyzing said input set of words spoken by each speaker for extraction of speech characteristics from said spoken input set of words, said speech characteristics sufficient in extent to characterize words for a specific speaker including other than said input set of words, generating digital data corresponding to said speech characteristics of each speaker, and recording with a modulated laser beam said digital data corresponding to the speech characteristics of a speaker onto a card having a strip of laser recording material, said recording forming spots representing said speech characteristics, said spots having a detectable change in an optical characteristic relative to said strip.

6. A method for initializing spoken-word recognition units comprising, placing a data card in data reading relation to a data card reader, said data card associated with a speaker and having a strip of laser recording material having speaker characteristic data prerecorded thereon, said data prerecorded in the form of spots having a reflectivity distinct from the reflectivity of unrecorded laser recording material, said speaker characteristic data derived from analysis of an input set of words spoken by said speaker and sufficient in extent to teach a spoken-word recognition unit to understand words spoken by said speaker, said words spoken by said speaker including words not found in said input set of words, reading said data on said data card with said data card reader, and inputting said data read by the card reader to a spoken-word recognition unit, said unit being connected to said card reader for receiving said data input, whereby said unit is initialized for understanding the words spoken by said speaker based on the input data.