WO1986006197A1

WO1986006197A1 - Data card system for initializing spoken-word recognition units

Info

Publication number: WO1986006197A1
Application number: PCT/US1986/000494
Authority: WO
Inventors: Jerome Drexler
Original assignee: Drexler Technology Corporation
Priority date: 1985-04-09
Filing date: 1986-03-10
Publication date: 1986-10-23
Also published as: CA1258317A; EP0218723A1

Abstract

A spoken-word recognition system for recognizing a speaker's words with the assistance of a data card initializing system. The data is stored on data cards, each card (31) having sufficient data to teach the recognition unit (23) to recognize the words of the speaker. A data card reader (29) reads the optical data on the card and inputs this data into the spoken-word recognition unit. An auxiliary system is used to encode the cards (131) with the speaker's voice characteristics through use of selected speech inputting of a set of words (116) to a microphone (117), followed by a data card writer (129) which writes the data on the data card.

Description

Data Card System for Initializing Spoken-Word Recognition Units

Technical Field

The invention relates to spoken-word recognition systems.

Background Art

Suzuki, et al. (U.S. patents 4,060,694, 4,078,154 and 4,100,370) teach a voice recognition system in which the phonemes as spoken by different speakers and the voice of the person speaking can be recognized. A key phrase is spoken. Parallel filters derive a spectral characteristic parameter which contains weighting factors extracted and compared with the selected phoneme in memory. Improved specificity over other speakers can be obtained by varying the weighting factors through a number of different values, and storing in memory the set of parameters for each sound as spoken by a specific speaker. The system can thus be used for voice verification.

Felix et al. (U.S. patent 4,449,189) disclose a method for identifying an individual using a combination of speech and face recognition. The voice signature of a person uttering a key word into a microphone is compared in a pattern matcher with the previously stored voice signature of a known person uttering the same key word. At the same time, a momentary image of that person's mouth region is recorded and compared with that of the same known person. The results of the comparison are analyzed to verify that the identity of the speaker is that of the known person.

Katayama (U.S. patent 4,461,023) discloses a method of storing spoken words for use in a speech recognition system. Spoken words are input, then analyzed. The resulting patterns are stored in memory. A second memory stores the digitized speech that was input. The addresses of both memory units are specified to be the same for a particular word.

Systems that recognize voice commands require memory for storing speech characteristics for later comparison. The system must first be "taught" the speech characteristics before it can recognize specific voice commands. A problem with this is that each speaker has his own speech characteristics that the system must learn. Computer memory space is limited, so the speech characteristics of an individual must be relearned each time the speaker changes.

An object of the invention is to devise a spoken-word recognition system which is of reduced complexity and which can be quickly and easily programmed to understand any individual's voice commands more readily.

Another object of the invention is to devise a system in which a voice command unit can be initialized easily by each individual user without needing a knowledge of programming or of the unit's operation.

Disclosure of Invention

The above objects have been met by a system which stores a person's voice characteristics on a wallet-size card containing a laser recordable strip. These cards may be inserted into and removed from a word recognition processor. Every user has a card with his own prerecorded speech characteristics thereon. Upon insertion into a word recognition processor, the processor unit would be initialized with respect to the particular voice characteristics of the owner of the card.

To encode the card, a set of words is spoken by a user into a microphone. The spoken words are analyzed and speech characteristics are extracted. Such characteristics include pitch, intonation, speed of speaking, accent parameters, and other parameters. A sufficient number of these characteristics is recorded on the card so that the words spoken later by the speaker may be understood by the word recognition unit.

A spoken-word recognition unit receives a user's voice message and identifies the words with the help of the speaker's voice characteristics in its memory, which was initialized by the spoken-word identification data on a card. In this manner, the spoken words can be recognized. For each new speaker, the unit must first be "taught" a particular speaker's characteristics so that the unit can more easily recognize the spoken words. The card provides the information to teach the unit. A record of an individual's speech characteristics is laser recorded on a card which is later read into the unit by placing it in a card reader, and the characteristics are entered into the short-term memory of the spoken-word recognition unit.

The card has a strip of laser recording material, such as the reflective direct-read-after-write material described in U.S. patent 4,284,716 to Drexler et al. A modulated laser beam records data on the strip, in situ, by ablation, melting, physical or chemical change or deformation, thereby forming spots having a detectable change in an optical characteristic relative to the strip. The recording process on the above-mentioned direct-read-after-write material produces differences in reflectivity detectable by a light detector. No processing after laser recording is required when the recording strip is a direct-read-after-write material. Laser recording materials also may be used that require heat processing after laser recording.

Each person has his own speech characteristics, in much the same way that each person has his own set of fingerprints. The card with the recorded speech characteristics is read by shining a laser beam or light emitting diode onto the strip. The beam, typically, has an intensity of ten percent of the recording intensity. The beam is reflected from the strip to a photodetector. The detector detects the contrast in optical characteristics between the strip and the recorded spots, and transmits corresponding signals to the speech recognition unit's short-term memory. The system is now ready to listen to words spoken by the user and to identify the words with the help of the speaker's voice characteristics stored in the memory. By this procedure the speaker's words are more clearly identified.

The uniform surface reflectivity of this reflective strip before recording typically would range between 8% and 65%. For a highly reflective strip the average reflectivity over a laser recorded spot might be in the range of 5% to 25%. Thus, the reflective contrast ratio of the recorded spots would range between 2:1 and 7:1. Laser recording materials are known in the art that create either low reflectivity spots in a reflective field or high reflectivity spots in a low reflectivity field. An example of the latter type is described in U.S. patent 4,343,879. When the reflectivity of the field is in the range of 8% to 20% the reflective spots have a reflectivity of about 40%. The reflective contrast ratio would range from 2:1 to 5:1. Photographic pre-formatting would create spots having a 10% reflectivity in a reflective field or 40% in a low reflectivity field.

The voice information on the card would typically be in digital form. It would inform the word recognition unit of macro aspects of speech such as accent parameters, speed of speaking, dropping of "th" beginnings or "g" endings, variations in intensity as well as the micro aspects such as tone, pitch, intonation, etc. With this advance knowledge about the speech characteristics of the words about to be spoken the words can more easily be recognized. The card can store tens, hundreds or even thousands of deviation parameters from a "normal" voice. When a word is not understood the word interpreter unit would add in corrections to the unidentified word based upon the individual's speech deviation information. Attempts to recognize the word are then repeated.
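
By way of illustration only, the retry step described above can be sketched as follows. The sketch assumes toy two-value features (pitch and duration), a hypothetical table of "normal" voice templates, and two hypothetical deviation parameters; none of these names or values is specified in this disclosure, and a real recognition unit would use far richer features.

```python
from dataclasses import dataclass

# Hypothetical templates for a "normal" voice: word -> (pitch in Hz, duration in ms).
# These values are invented for illustration and are not real acoustic data.
NORMAL_TEMPLATES = {
    "start": (120.0, 400.0),
    "stop": (110.0, 300.0),
}


@dataclass
class SpeakerCard:
    """Deviation parameters as they might be read from a data card (assumed fields)."""
    pitch_offset_hz: float = 0.0   # speaker's pitch minus the "normal" pitch
    duration_scale: float = 1.0    # speaker's duration divided by the "normal" duration


def match(features, tolerance=0.1):
    """Return the first template word matched within a relative tolerance, else None."""
    for word, template in NORMAL_TEMPLATES.items():
        if all(abs(f - t) <= tolerance * t for f, t in zip(features, template)):
            return word
    return None


def recognize(features, card):
    """Try a direct match; if the word is not understood, apply corrections and retry."""
    word = match(features)
    if word is not None:
        return word
    pitch, duration = features
    corrected = (pitch - card.pitch_offset_hz, duration / card.duration_scale)
    return match(corrected)
```

With a card holding `SpeakerCard(pitch_offset_hz=80.0, duration_scale=1.5)`, the utterance `(200.0, 600.0)` fails the direct match but is identified as "start" after correction; with an empty card the same utterance stays unrecognized.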

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the spoken-word recognition system of the present invention.

Fig. 2 is a schematic diagram of the data card encoding of the present invention.

Fig. 3 is a plan view of one side of a data card in accord with the present invention.

Fig. 4 is a partial side sectional view taken along lines 4-4 in Fig. 3.

Fig. 5 is a detail of laser writing on a portion of the laser recording strip illustrated by dashed lines in Fig. 3.

Fig. 6 is a plan view of an apparatus for reading and writing on the optical recording media strip illustrated in Fig. 3.

Best Mode for Carrying Out the Invention

With reference to Fig. 1, a spoken-word recognition system 10 reads a person's voice characteristics from a wallet-size card 31 containing a strip of laser recordable material. Each person would have a card 31 with his own speech characteristics prerecorded thereon. The system 10 is initialized with respect to the particular voice characteristics of the card owner by inserting the card 31 into system 10. A sufficient number of characteristics is recorded so that words spoken by a particular speaker may be identified.

With reference to Fig. 2, a data card encoding system 110 is used to form a card 131. A set of words 116 is spoken by a person into a microphone 117. The resulting signal is analyzed by a speech analyzer 121 and speech characteristics 122 are extracted. Such characteristics 122 include pitch, formants, ratio of voiced to unvoiced amplitudes, and other parameters used to help identify words and parts of words. The exact set of parameters will vary from one system to another, depending on the type of speech analysis which is used. Macro aspects of speech such as accent parameters, speed of speaking, dropping of particular sounds at the beginning or ending of words, and variations in tone may also be included to make word recognition even easier. In any case, speech analyzer 121 sends a digital signal 122 representing a person's speech characteristics to a data card writer/reader 129 which writes the data with a laser onto card 131 by shining a modulated laser beam 130 onto the card 131. The card 131 has a strip of optical contrast laser recording material disposed thereon. The beam 130 records data onto the card 131, in situ, by ablation, melting, physical or chemical change or deformation, thereby forming spots with contrasting reflectivity relative to the unrecorded strip. Reflected beam 132 is read by the card writer/reader 129 to confirm laser writing.
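
As a non-limiting illustration of how a digital signal of speech characteristics might be turned into a spot pattern and confirmed by read-back, consider the sketch below. The 32-bit float packing, the convention that a 1 bit is a recorded spot, and the `write_spot` and `read_spot` callables are all assumptions made for the example; the disclosure does not fix a data format.

```python
import struct


def profile_to_bits(values):
    """Pack parameters as big-endian 32-bit floats, then expand to bits (MSB first)."""
    payload = struct.pack(f">{len(values)}f", *values)
    return [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]


def bits_to_profile(bits):
    """Inverse of profile_to_bits: regroup bits into bytes and unpack the floats."""
    payload = bytes(
        sum(bit << shift for bit, shift in zip(bits[i:i + 8], range(7, -1, -1)))
        for i in range(0, len(bits), 8)
    )
    return list(struct.unpack(f">{len(payload) // 4}f", payload))


def write_and_verify(bits, write_spot, read_spot):
    """Write each bit and read it back at once, mimicking a read-after-write check.

    write_spot(index, bit) and read_spot(index) are hypothetical stand-ins for
    the laser writer and the reflected-beam detector.
    """
    for index, bit in enumerate(bits):
        write_spot(index, bit)
        if read_spot(index) != bit:
            return False
    return True
```

A plain dictionary can stand in for the strip when trying this out: `strip = {}` followed by `write_and_verify(bits, strip.__setitem__, strip.__getitem__)`.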

In Fig. 1 the spoken-word recognition system 10 is initialized by placing a prerecorded card 31 in data card reader 29. The card reader 29 shines a light beam 30 from a laser or an LED onto the prerecorded strip. This read beam, typically, has an intensity of five to ten percent of the typical semiconductor laser recording intensity. The light beam 32 is reflected from the strip to a photodetector, which detects this contrast in reflectivity between the strip and recorded spots. Card reader 29 transmits a signal 24 corresponding to the recorded data to the short-term memory of the spoken-word recognition unit 23. The system 10 is now ready to listen to words 16 spoken by the user. The words 16 spoken into microphone 17 are analyzed and interpreted by the speech recognition unit 23 with respect to the voice characteristics 24, now stored in its short-term memory. The words 16 are recognized and the result is sent to an output device 27, such as a CRT terminal.

With reference to Figs. 3 and 4, a data card 11 is illustrated having a size common to most credit cards. The width dimension of such a card is approximately 54 mm and the length dimension is approximately 85 mm. These dimensions are not critical, but preferred because such a size easily fits into a wallet and has historically been adopted as a convenient size for automatic teller machines and the like. The card's base 13 is a dielectric, usually a plastic material such as polyvinyl chloride or similar material. Polycarbonate plastic is preferred. The surface finish of the base should have low specular reflectivity, preferably less than 10%.

Base 13 carries strip 15. The strip is about 16 or 35 millimeters wide and extends the length of the card. Alternatively, the strip may have other sizes and orientations. The strip is relatively thin, approximately 60-200 microns, although this is not critical. The strip may be applied to the card by any convenient method which achieves flatness.

The strip is adhered to the card with an adhesive and covered by a transparent laminating sheet 19 which serves to keep strip 15 flat, as well as protecting the strip from dust and scratches. Sheet 19 is a thin, transparent plastic sheet laminating material or a coating, such as a transparent lacquer. The material is preferably made of polycarbonate plastic. The opposite side of base 13 may have user identification indicia embossed on the surface of the card. Other indicia such as card number and the like may be optionally provided.

The high resolution laser recording material which forms strip 15 may be any of the reflective recording materials which have been developed for use as direct read-after-write (DRAW) optical disks, so long as the materials can be formed on thin substrates. An advantage of reflective materials over transmissive materials is that the read/write equipment is all on one side of the card, the data storage capacity is doubled, and the automatic focus is easier. For example, U.S. patent 4,230,939 issued to de Bont, et al. teaches a high resolution material having a thin metallic recording layer of reflective metals such as Bi, Te, In, Sn, Cu, Al, Pt, Au, Rh, As, Sb, Ge, Se, Ga.

Materials which are preferred are those having high reflectivity and low melting point, particularly Cd, Sn, Tl, In, Bi and amalgams. Suspensions of reflective metal particles in organic colloids also form low melting temperature laser recording media. Silver is one such metal. Typical recording media are described in U.S. patents Nos. 4,314,260, 4,298,684, 4,278,758, 4,278,756 and 4,269,917, all assigned to the assignee of the present invention.

The laser recording material which is selected should be compatible with the laser which is used for writing on it. Some materials are more sensitive than others at certain wavelengths. Good sensitivity to infrared light is preferred because infrared is affected least by scratches and dirt on the transparent laminating sheet. The selected recording material should have a favorable signal-to-noise ratio and form high contrast data bits with the read/write system with which it is used.

The material should not lose data when subjected to temperatures of about 122°F (50°C) for long periods. The material should also be capable of recording at speeds of at least several thousand bits/sec. This generally precludes the use of materials that require long heating times or that rely on slow chemical reactions in the presence of heat, which may permit recording of only a few bits/sec. A large number of highly reflective laser recording materials have been used for optical data disk applications.

Data is recorded by forming spots in the surrounding field of the reflective layer itself, thereby altering the reflectivity in the data spot. Data is read by detecting the optical reflective contrast between the surrounding reflective field of unrecorded areas and the recorded spots. Spot reflectivity of less than half the reflectivity of the surrounding field produces a contrast ratio of at least two to one, which is sufficient contrast for reading. Greater contrast is preferred. Reflectivity of the strip field of about 50% is preferred with reflectivity of a spot in the reflective field being less than 10%, thus creating a contrast ratio of greater than five to one. Alternatively, data may also be recorded by increasing the reflectivity of the strip. For example, the recording laser can melt a field of dull microscopic spikes on the strip to create flat shiny spots. This method is described in SPIE, Vol. 329, Optical Disk Technology (1982), p. 202. A spot reflectivity of more than twice the surrounding spiked field reflectivity produces a contrast ratio of at least two to one, which is sufficient contrast for reading.
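
The contrast figures above are simple quotients of the brighter reflectivity over the darker one. The short sketch below works the arithmetic for both recording polarities; the function names are chosen for the example only.

```python
def contrast_ratio(field_reflectivity, spot_reflectivity):
    """Ratio of the brighter to the darker reflectivity (both in percent)."""
    brighter = max(field_reflectivity, spot_reflectivity)
    darker = min(field_reflectivity, spot_reflectivity)
    if darker <= 0:
        raise ValueError("reflectivities must be positive")
    return brighter / darker


def is_readable(field_reflectivity, spot_reflectivity, minimum=2.0):
    """Apply the two-to-one minimum contrast stated in the text."""
    return contrast_ratio(field_reflectivity, spot_reflectivity) >= minimum


# A 50% field with a 10% spot sits at the five-to-one boundary, so any
# spot darker than 10% gives a ratio greater than five to one.
print(contrast_ratio(50, 10))  # 5.0
# A shiny 40% spot in a dull 20% field just meets the minimum.
print(is_readable(20, 40))     # True
```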

With reference to Fig. 5, a magnified view of laser writing on the laser recording material strip 15 may be seen. The dashed line 33 corresponds to the dashed line 33 in Fig. 3. The oblong spots 35 are aligned in a path and have generally similar dimensions. The spots are generally circular or oval in shape with the axis of the oval perpendicular to the lengthwise dimension of the strip. A second group of spots 37 is shown aligned in a second path. The spots 37 have similar dimensions to the spots 35. The spacing between paths is not critical, except that the optics of the readback system should be able to easily distinguish between paths. Presently, in optical data storage technology, tracks which are separated by only a few microns may be resolved. The spacing and pattern of the spots along each path are selected for easy decoding.

The spots illustrated in Fig. 5 have a recommended size of approximately 5 microns by 20 microns, or circular spots 5 microns or 10 microns in diameter. Generally, the smallest dimension of a spot should be less than 50 microns. In the preferred embodiment the largest dimension would also be less than 50 microns. Of course, to offset lower densities from larger spots, the size of the strip 15 could be expanded to the point where it covers a large extent of the card. In Fig. 3, the laser recording strip 15 could completely cover a single side of the card. A minimum information capacity of 250,000 bits is indicated and a storage capacity of over one million bits is preferable.
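
A rough capacity estimate shows how the spot sizes relate to the stated capacities. The sketch assumes, purely for illustration, that each bit occupies one square cell, that cells tile a 16 mm by 85 mm strip with no gaps, and that no area is lost to servo tracks or formatting; actual capacity would be lower.

```python
def strip_capacity_bits(width_mm, length_mm, cell_um):
    """Bits that fit if each bit occupies a square cell of side cell_um (an assumption)."""
    cells_across = int(width_mm * 1000) // cell_um
    cells_along = int(length_mm * 1000) // cell_um
    return cells_across * cells_along


# Even 50 micron cells on a 16 mm x 85 mm strip exceed the 250,000 bit minimum.
print(strip_capacity_bits(16, 85, 50))  # 544000
# 10 micron cells comfortably exceed one million bits.
print(strip_capacity_bits(16, 85, 10))  # 13600000
```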

In Fig. 6, a side view of the lengthwise dimension of a card 41 is shown inserted into card reader/writer 29. The card is usually received in a movable holder 42 which brings the card into the beam trajectory. A laser light source 43, preferably a pulsed semiconductor laser of near infrared wavelength, emits a beam 45 which passes through collimating and focussing optics 47. The beam is sampled by a beam splitter 49 which transmits a portion of the beam through a focusing lens 51 to a photodetector 53. The detector 53 confirms laser writing and is not essential. The beam is then directed to a first servo controlled mirror 55 which is mounted for rotation along the axis 57 in the direction indicated by the arrows A. The purpose of the mirror 55 is to find the lateral edges of the laser recording material in a coarse mode of operation and then in a fine mode of operation identify data paths which exist predetermined distances from the edges. From mirror 55, the beam is directed toward mirror 61. This mirror is mounted for rotation at pivot 63. The purpose of mirror 61 is for fine control of motion of the beam along the length of the card. Coarse control of the lengthwise position of the card relative to the beam is achieved by motion of movable holder 42. The position of the holder may be established by a linear motor adjusted by a closed loop position servo system of the type used in magnetic disk drives.

During its manufacture the card may be prerecorded with database information or a preinscribed pattern containing servo tracks, timing marks, program instructions, and related functions. These positioning marks can be used as a reference for the laser recording system to record or read data at particular locations. Each of the various spoken-word recognition systems may have formats specific to its particular needs. U.S. patent No. 4,304,848 describes how formatting may be done photolithographically. Formatting may also be done using laser recording or surface molding of the servo tracks, timing marks, programming and related functions. Dil, in U.S. patent 4,209,804, teaches a type of surface molding. Reference position information may be prerecorded on the card so that position error signals may be generated and used as feedback in motor control. Upon reading one data path, the mirror 55 is slightly rotated. The motor moves holder 42 lengthwise so that the path can be read, and so on.

Light scattered and reflected from the spots contrasts with the surrounding field where no spots exist. The beam should deliver sufficient laser pulse energy to the surface of the recording material to create spots. Typically, 5-20 milliwatts is required, depending on the recording material. A 20 milliwatt semiconductor laser, focussed to a five micron beam size, records at temperatures of about 200°C and is capable of creating spots in about 75 microseconds. The wavelength of the laser should be compatible with the recording material. In the read mode, power is lowered to about 5% to 10% of the record power.
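
The figures in this paragraph imply a recording rate and a read power that can be checked with simple arithmetic. The sketch assumes one bit per spot and back-to-back spots with no positioning overhead, which the text does not state, so the results are upper bounds on speed rather than measured values.

```python
def spots_per_second(spot_time_s):
    """Upper bound on the recording rate if spots are written back to back."""
    return 1.0 / spot_time_s


def write_time_s(n_bits, spot_time_s):
    """Time to record n_bits at one spot per bit (an assumption)."""
    return n_bits * spot_time_s


def read_power_mw(record_power_mw, fraction):
    """Read-mode power as a fraction of the record power."""
    return record_power_mw * fraction


SPOT_TIME_S = 75e-6  # about 75 microseconds per spot, from the text

rate = spots_per_second(SPOT_TIME_S)          # roughly 13,000 spots per second
card_time = write_time_s(250_000, SPOT_TIME_S)  # roughly 19 seconds for the minimum capacity
low, high = read_power_mw(20, 0.05), read_power_mw(20, 0.10)  # about 1 to 2 milliwatts
```

On these assumptions the rate is consistent with the requirement of at least several thousand bits/sec stated earlier.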

Optical contrast between a spot and surrounding field is detected by light detector 65 which may be a photodiode. Light is focussed onto detector 65 by beam splitter 67 and focusing lens 69. Servo motors, not shown, control the positions of the mirrors and drive the mirrors in accord with instructions received from control circuits, as well as from feedback devices. The detector 65 produces electrical signals corresponding to spots. These signals are processed by the spoken-word recognition unit and used for identifying words spoken by a particular speaker.
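
One simple way the detector signals could be turned into bits is by thresholding each reflectivity sample against the field level. The sketch below assumes low reflectivity spots in a reflective field, one sample per bit position, and the convention that a spot represents a 1; these choices are illustrative and are not prescribed by the disclosure.

```python
def samples_to_bits(reflectivities, field_reflectivity, min_contrast=2.0):
    """Mark a sample as a recorded spot (bit 1) when the field is at least
    min_contrast times brighter than the sample; otherwise mark it as 0."""
    threshold = field_reflectivity / min_contrast
    return [1 if r <= threshold else 0 for r in reflectivities]


# With a 50% field, samples at or below 25% are treated as spots.
print(samples_to_bits([50, 8, 48, 10, 30], field_reflectivity=50))  # [0, 1, 0, 1, 0]
```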

Claims

1. A system for initializing spoken-word recognition units comprising, a spoken-word recognition unit having a voice input and a data input, and providing an output of word recognition, a data card reader connected to said spoken-word recognition unit, and a plurality of data cards, adapted to be read by the data card reader, each card having prerecorded data which is speaker characteristic data sufficient in extent to teach said spoken-word recognition unit to understand the words of a speaker.

2. The system of claim 1 wherein said data card reader comprises, a light source having a light beam directed at one of said data cards, and a light detector disposed to receive said light beam reflected from said card, said detector connected to said spoken-word recognition unit for inputting said data on said card.

3. The system of claim 1 wherein each of said data cards has a strip of laser recording material for the optical storage of said speaker characteristic data, said storage being in the form of spots, the reflective contrast ratio of the spots with respect to the surrounding unrecorded material being at least two to one.

4. A system for encoding speech characteristic data on a card comprising, speech input means for inputting spoken words, analyzing means connected to said input means for extracting speech characteristics of individual users from said spoken words, a data card writer/reader connected to said analyzing means, and a plurality of data cards, adapted for writing by the card writer/reader, each card having data storage material disposed thereon capable of storing speaker characteristic data sufficient in extent to characterize the words of a speaker.

5. The system of claim 4 wherein said data card writer/reader comprises, a laser having a laser beam directed at one of said data cards, said laser connected to said analyzing means for receiving said speech characteristics, and a light detector disposed to receive said light beam reflected from said card.

6. The system of claim 4 wherein each of said data cards has a strip of laser recording material for the optical storage of said speaker characteristic data, said storage being in the form of spots, the reflective contrast ratio of the spots with respect to the surrounding unrecorded material being at least two to one.

7. A method for storing speaker dependent voice recognition data comprising, speaking a set of words, analyzing said spoken words for extraction of speech characteristics from said spoken words, generating digital data corresponding to said speech characteristics, recording with a modulated laser beam said digital data onto a card having a strip of laser recording material, said recording forming spots representing said speech characteristics, said spots having a detectable change in an optical characteristic relative to said strip.

8. A method for initializing spoken-word recognition units comprising, placing a data card in data reading relation to a data card reader, said data card having speaker characteristic data prerecorded thereon, reading said data on said data card with said data card reader, and inputting said data to a spoken-word recognition unit, said unit being connected to said card reader for receiving said data input.