EP0233285A1 - Speaker verification system - Google Patents

Speaker verification system

Info

Publication number
EP0233285A1
Authority
EP
European Patent Office
Prior art keywords
values
value
acoustic
source
speaker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP86907234A
Other languages
German (de)
English (en)
Other versions
EP0233285A4 (fr)
Inventor
Huseyin Abut
Thomas A. Denker
Jeffrey L. Elman
Bertram P. M. Tao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ECCO INDUSTRIES Inc
Original Assignee
ECCO INDUSTRIES Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ECCO INDUSTRIES Inc filed Critical ECCO INDUSTRIES Inc
Publication of EP0233285A1
Publication of EP0233285A4


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification techniques
    • G10L 17/06: Decision making techniques; Pattern matching strategies

Definitions

  • This invention relates to speaker verification, and particularly to a method and system for identifying an individual based on speech input.
  • There are numerous situations in which it is necessary to establish an individual's identity. Such situations include controlling physical access to a secure environment, validating data entry or data transfer, controlling access to an automated teller machine, and establishing identification for credit card usage. Identification by voice is advantageous over alternative systems in all of these cases because it is more convenient, optionally may be carried out at a distance such as over the telephone, and can be very reliable.
  • One such type of system utilizes selected peaks and valleys of successive pitch periods to obtain characteristic coordinates of the voiced input of an unknown speaker. These coordinates are selectively compared to previously stored reference coordinates. As a result of the comparison, a decision is made as to the identity of the unknown speaker.
  • a serious limitation of this system is that it may experience problems resulting from changes in overall intensity levels of the received utterance as compared to the stored utterances.
  • Another system in this area of technology compares the characteristic way an individual utters a test sentence with a previously stored utterance of the same sentence. The system relies on a spectral and fundamental frequency match between the test utterance and the stored reference utterances. As a result, the system is subject to errors from changes in the pitch of the speaker's voice.
  • Still another type of arrangement which has been utilized for verification of a speaker filters each utterance to obtain parameters that are highly indicative of the individual, but are independent of the content of the utterance. This is accomplished through linear prediction analysis, based on the unique properties of the speaker's vocal tract.
  • A set of reference coefficient signals is adapted to transform signals identifying the properties of the speaker's vocal tract into a set of signals which are representative of the identity of the speaker, and indicative of the speaker's physical characteristics.
  • Prescribed linear prediction parameters are utilized in the system to produce a hypothesized identity of an unknown speaker, which is then compared with the signals representative of the identified speaker's physical characteristics, whereby the identity of the unknown speaker is recognized.
  • the reference describes no mechanism in its system by which the distortion between the test and reference utterances can be compared. As a result, no explicit method is provided for actually carrying out the stage of speaker verification, as opposed to the preliminary steps of utterance analysis.
  • Another system which takes a somewhat different approach comprises a speaker verification system utilizing moment invariants of a waveform which corresponds to a standard phrase uttered by a user.
  • a number of independent utterances of the same phrase by the same user are used to compile a set of moment invariants which each correspond to an utterance vector.
  • An average utterance vector is then computed. Comparison of the average utterance vector with later obtained utterance vectors provides for verification of the speaker.
  • Moment invariants from a group of persons of different ages and sexes are also stored, and the group of invariants from persons in the group who are closest in age and sex to the user are compared against the stored utterance vectors of the user to arrive at a weight vector.
  • the user's weight vector and computed utterance vector are stored on a card and used in computing a threshold, and in evaluating later utterances of the user.
  • the system provides no mechanism by which the distortion between the test and reference utterances may be compared. Further, reliability of acceptances based on comparison of new utterances against the average utterance vector could be questionable. This is especially true if the utterances of the user tend to have a large variance in their various characteristics. For example, if the speaker says a word differently at any given time, the single average value provides no flexibility in recognizing such varying speech. Still another system of interest calculates "distance thresholds" and "mean values" between a test word to be classified and other reference samples.
  • Weighting factors are utilized to gauge the importance of particular variables.
  • a fixed threshold for a given user is required, except that a comparison of portions of the test word outside the threshold may still be used to verify the speech if those portions come within a minimum distance of portions of a reference sample. If no reference sample happens to be near the test sample, there is no means to gain acceptance where the test sample is outside the basic threshold. For example, a user may not be verified if he unintentionally does not properly pronounce the word.
  • the average values of acoustic features from a plurality of speakers are stored in standardized templates for a given reference word.
  • A set of signals representative of the correspondence of the identified speaker's features with the feature template of the reference word is generated.
  • An unknown speaker's utterance is analyzed by comparing the unknown speaker's utterance features and the stored templates for the recognized words.
  • this system experiences the problem of comparing the single sample of incoming speech with a threshold for that particular user.
  • the user may be unable to qualify for verification if his single attempt to pronounce the word varies by too great an amount from the reference information stored in the system.
  • Another problem with many prior art systems is that they have no reliable or tractable means for detecting the beginning and end of speech.
  • This invention comprises a method and apparatus for carrying out verification of a user's identity by evaluation of his utterance.
  • the invention comprises a system which initially develops a data base comprising word samples from a user which are processed by comparison with themselves and with generic utterance data, for developing measures of the probability of erroneously accepting an impostor with respect to verification based on a given word. With this data base created, the system operates to verify the identity of a speaker based upon plural trials and in light of the information in the data base.
  • Before a speaker can use the system for verification purposes, he must be enrolled. In order to enroll the user, the system repeatedly prompts the user for tokens (a token is a single utterance of a word) of a series of reference words until a sufficient number of tokens of the words are obtained and stored in a data base.
  • the tokens are subjected to feature analysis whereby certain coefficients representative of the speaker's vocal tract are obtained.
  • the tokens are also subjected to end point detection. By comparison of the tokens with themselves, and with corresponding tokens of a generic group of people, the system obtains measures of the probabilities of erroneously accepting an impostor or rejecting the true speaker.
  • When an enrolled user wishes to have his speech verified, he enters the identity he is claiming and the system then prompts him for an utterance. His utterance is digitally encoded and analyzed. The start and end points are detected, and coefficients corresponding to features of the utterance are developed. Selected coefficients developed from the user's previously recorded tokens of the selected word are compared with the coefficients of the newly received utterance, producing a measure of the distance between the new utterance and each of the reference tokens. This process may be repeated for additional utterances from the user. By analyzing one or more of the measures of distance against the probability information developed during enrollment, the system determines the probabilities of making erroneous decisions.
  • decisions are made at successive stages whether to accept or reject the user.
  • The use of cumulative probabilities from stage to stage provides a means of dynamically evaluating the speaker in conjunction with several trials of speech directed to different words, so that the verification decision is based on the user's performance on each of the various words, reducing the likelihood of erroneously accepting an impostor.
  • One general feature of the invention is in making the verification decision in stages, where in an earlier stage a decision is made whether or not to proceed to a subsequent stage, and in the subsequent stage a verification decision is based both on the analysis made in the first stage and the analysis made in subsequent stages.
  • Another general feature is in basing the verification decision on at least one probability value derived from probability data which is in turn derived from stored speech information.
  • Another general feature of the invention is in updating the stored information based on test information about a speaker's utterance, but only if the speaker has been verified as being the known person.
  • Another general feature is a non-mechanical, non-magnetic power switch for triggering a device (e.g., a solenoid of a door lock) upon verifying a speaker as a known person.
  • Another general feature is in both detecting and decoding coded tone signals, and performing speech verification using the same digital processor. Another general feature is in time interleaving the analyses of different utterances received from different stations so that the analyses can proceed simultaneously.
  • Another general feature is in the combination of a plurality of stations for receiving utterances, a plurality of processors for serving respective stations, and a host computer having a real-time operating system for serving respective stations in real time.
  • Figure 1 illustrates a general block diagram of the method of speech verification as used in the present invention.
  • Figure 2 is a detailed block diagram of one preferred embodiment of an apparatus for use in the speaker verification system of the present invention.
  • Figures 3 and 4 are flowcharts illustrating the operation of the system of the present invention.
  • Figure 5 is a state diagram of the speech detector system of the present invention.
  • Figures 6 through 8 are flow charts illustrating the operation of the system of the present invention.
  • Figure 9 is a graphical representation of the method for obtaining the global distortion representative of comparison of tokens of a given word.
  • Figures 10 and 11 are flow charts illustrating the operation of the system of the present invention.
  • Figure 12 is a tabular representation of the array "D" for organizing distortions developed in operation of the present invention.
  • Figure 13 is a tabular representation of a per-word STAT file created during operation of the present invention.
  • Figures 14 through 16 are flow charts illustrating the operation of the system of the present invention.
  • Figure 17 is a block-diagram of a multiple access module system served by a host computer with a real-time operating system.
  • Figure 18 is a diagram of an access module.
  • Figure 19 is a block diagram of a portion of the access module.
  • Figure 20 is a circuit diagram of a relay-type circuitry in the access module.
  • Figure 21 is a diagram of memory locations and tables for a real time operating system.
  • Figures 22, 23 show various tables used in the real-time operating system.
  • Figure 24 is a flow-chart of a verify procedure.
  • Figures 25, 26 are flow-charts of an alternate verify procedure.
  • Figure 27 shows Gaussian functions for use in the alternate verify procedure.
  • FUNCTIONAL DESCRIPTION The present invention may be functionally described by reference to Figure 1.
  • the operation of the system is initiated in block 20 by an external condition, such as the operation of a switch or other mechanical activating device, by electronic detection equipment, such as photosensors which detect the presence of a user or by voice or sound activation which detect the user speaking or otherwise making a noise, such as activating a "touch tone" signal.
  • the activated system will function to perform operations which have been requested either by the user, or which have been preselected for system operation by an operator at an earlier time.
  • The system moves to decision block 22 and, based on the instructions which it has received, determines whether it is to enter the "enroll" mode which is indicated generally at 24.
  • The enroll mode accomplishes a procedure whereby information relating to a particular user is obtained from that user, processed and stored for use at subsequent times in verifying the identity of the user.
  • This information includes utterances (i.e., tokens) by the user of selected reference words which are processed to form a data base which is used for comparison purposes during subsequent verification of the user.
  • The system passes from decision block 22 to block 26 where the operator keys in information to the system, including the user's name, access points through which the user may pass, and maximum permissible levels for false acceptance and false rejection errors in verification of the user. If the access points and maximum permissible levels are not specified, default values are used.
  • the system passes to block 28 where it prompts the user to provide an utterance of one of the list of reference words which are to be recorded by the user.
  • Such utterances produce an acoustic message or communication whose pattern is indicative of the identity of the user.
  • After prompting the user, the system passes to block 30 where it samples incoming signals. Upon detecting an utterance the system passes to block 32 wherein the detected utterance is converted into digital form.
  • The system passes to block 34 where the incoming signals are periodically sampled, with each period forming a sample containing signals representing the detected speech during the particular period of time.
  • The samples are stored until a specified number of them form a frame, which is then processed to obtain autocorrelation functions, normalized autocorrelation functions, and linear prediction coefficients representative of the frame.
  • the set of coefficients which are extracted for each token are utilized in the verification process to be described subsequently.
  • The linear prediction coefficients comprise a representation of the characteristics of the speaker's vocal tract. From block 34 the system passes to block 36 wherein the features extracted in block 34 are utilized for detecting the beginning and end of the utterance. A state machine is employed for accomplishing this end point detection, based upon energy and spectral levels of the incoming signals.
  • the extracted features and end point information are stored in a temporary store 38 and the system then passes from block 36 to decision block 40.
  • If it is determined in decision block 40 that more tokens are needed, the system returns to block 28 and the user is prompted to provide another utterance, and processing continues as described above. If it is determined in block 40 that the necessary tokens have been obtained then the system passes to block 42 and waits in a loop there until such time as instructions are received to form a data base. Upon being instructed to form a data base, the system moves from block 42 to block 44 wherein global distortion values are formed by accessing the extracted features from block 38, and conducting comparisons between the individual templates of a given reference word against themselves, and also comparisons of these features against corresponding tokens from a generic group of speakers. As a result of these comparisons, a set of global distortions is produced, each distortion being representative of a distance measure between two corresponding templates.
  • The system passes to block 46 where the global distortions for both the intra-speaker comparisons of corresponding templates from the same speaker, and inter-speaker comparisons of the user's templates with the generic templates, are processed and ordered into arrays which indicate the probability of a false acceptance of an impostor depending upon a selected threshold level.
  • the templates and corresponding thresholds are stored in reference templates block 47, and the identity of the user relating to the templates and thresholds stored in block 47 is stored in block 48. After all templates have been formed and thresholds have been computed, the system moves from block 46 to block 49 and terminates further operation of the enroll mode.
  • In decision block 50 it is determined whether the system is to enter the verify mode, generally indicated at 53.
  • The verification procedure is initiated when a user presents himself at any access point. If the system is instructed to enter the verify mode, the system moves to block 52 and awaits activation by a user. The user may activate the system in several ways including, but not limited to, entering a personal identification number on a key pad; inserting a plastic card in a reader; or saying his name into a microphone. Upon receiving this user data in block 52, the system moves to block 54 where the system sends a message to the user requesting that he input an utterance which is one of the reference words previously stored in the enrollment mode.
  • Each frame is processed to obtain its linear prediction coefficient parameters (referred to hereafter as LPC parameters).
  • Obtaining the LPC parameters first comprises computation of autocorrelation coefficients, which measure the degree of statistical dependency between neighboring speech samples within the frame.
  • Each frame contains from 100 to 300 speech samples, and the autocorrelation coefficients represent the relationship between these various samples.
  • the autocorrelation coefficients are processed by use of a well-known algorithm referred to as "Levinson's Algorithm" to produce the group of coefficients representative of the frame.
  • These coefficients comprise the linear prediction coefficients, and are representative of the geometry of the vocal tract of the speaker.
  • These coefficients are then processed to transform them into a single, transformed coefficient which is representative of the group of linear prediction coefficients representing the frame of speech. This information is stored for future use.
  • The system detects the end point of an utterance, and forms a template comprising the group of transformed coefficients corresponding to that utterance, with the template thus being representative of the utterance.
  • The reference templates for comparison are retrieved from a reference templates store 66 which contains those templates which were produced by the individual user during the enrollment process, as well as templates of the corresponding word which are representative of a generic group of speakers.
  • The particular templates to be referenced in block 66 are identified by a signal from an identity store 68 which selects the particular reference templates in block 66 based upon the identity of the user which was entered through block 52, and by use of the particular word being prompted, which was identified in block 54.
  • The distance comparisons or "distortions" generated in block 64 comprise a set of values, with each value being a measure of the difference between the present utterance being tested and one of the reference templates of this particular user. These difference values are then processed to provide a "score" which indicates the relative correspondence between the current utterance and the reference utterances.
  • The system next passes to decision block 70 and references the appropriate threshold value from block 72 in determining whether to accept, reject or make no decision yet.
  • The threshold values in block 72 may be set by the operator, but if not set by the operator they will be set to default values. These threshold values comprise a set of "scores" which are based upon the results of comparisons in the enrollment mode, and which indicate the probability of falsely accepting an impostor given a selected test utterance.
  • the score from block 64 is compared with the threshold value from block 72.
  • The system passes to block 74 from whence a signal is produced terminating operation of the verification mode. If the user has not provided at least two samples, even though the score is below the threshold, the system returns to block 54 where the user is prompted to provide an utterance corresponding to another of the reference words provided in the enrollment process.
  • In block 70, if the user has a score which is above any threshold value in block 72, this is counted as a failure. If there are fewer than two failures, the system proceeds to block 54 and prompts the user for another of the user's reference words. If the user has failed twice, the system passes from block 70 to block 74 and rejects the user as an impostor. From block 74, operation of the verification mode is then terminated.
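  • As an illustration of the accept, reject, or continue logic of blocks 70 through 74, the sketch below follows one plausible reading of the description above: a score at or below the word's threshold counts toward acceptance once at least two samples have been given, a score above the threshold counts as a failure, and two failures cause rejection. The names used here (get_score, thresholds, min_samples and so on) are illustrative only and are not taken from the patent.

```python
def verify_user(reference_words, thresholds, get_score, max_failures=2, min_samples=2):
    """Prompt reference words one at a time and return "ACCEPT" or "REJECT".

    get_score(word) is assumed to return the distortion of the newly received
    utterance of `word` against the stored reference templates (block 64).
    """
    failures = 0
    samples = 0
    for word in reference_words:
        score = get_score(word)             # score for this trial (block 64)
        samples += 1
        if score <= thresholds[word]:       # below threshold: a passing trial (block 70)
            if samples >= min_samples:      # at least two samples required before accepting
                return "ACCEPT"             # block 74: terminate with acceptance
        else:
            failures += 1                   # above threshold: count a failure
            if failures >= max_failures:
                return "REJECT"             # block 74: reject as an impostor
    return "REJECT"                         # reference words exhausted without acceptance
```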
  • an IBM Personal Computer 112 includes a real-time operating system 114 that provides real-time servicing via a bus 116 connected to four speaker verification boards (SVBs) 118.
  • Each SVB in turn can serve two stations 102 on a multiprocessing basis.
  • One of the stations, station 103, is dedicated to enrollment and is denoted an enrollment station.
  • the system of the present invention has been designed to work on a host computer's bus in either a synchronous or direct memory access (DMA) manner after an interrupt protocol.
  • Although the host interface can be programmed for virtually any standard, the embodiment disclosed herein assumes use of an IBM PC/XT as the host.
  • the host may be selected from among many of the well-known IBM compatible personal computers which are presently on the commercial market, with only minor modifications necessary to interface to virtually any computer bus.
  • The host provides for user interaction to collect and store reference templates as described above, to calculate, update and store statistical information about the reference templates, and to perform certain verification tasks.
  • the physical configuration of the system illustrated in Figure 17 is such that the bus 116 and the SVBs 118 are housed internally in the host computer with the stations 102 being external to the host.
  • each station 590 includes a 16-key touch-tone pad 592, a microphone 594 and indicator lights 596.
  • Station 590 is connected to an SVB 600 via a multiplexor 602 and two-way three-wire signal lines 598.
  • the user depresses the touch-tone keys (e.g. to identify himself during verification) and speaks into the microphone (e.g. to speak the reference words) , although never simultaneously.
  • Depressing the touch-tone keys causes DTMF tone generator 604 to produce one of the standard DTMF tone signal pairs which is then passed through signal combiner 606 and out onto lines 598 for delivery to the SVB 118.
  • the microphone produces voice signals which also pass through the signal combiner 606 and out onto lines 598.
  • the SD 106 receives the signals and passes them through analog-to-digital (A/D) converter 608.
  • the digitized samples are passed from SD 106 to SP 122 (described below) which, as described below, assembles them into frames and, for each frame, derives linear prediction coefficients.
  • CP 110 analyzes the subsequent incoming frames to determine whether speech or tone signals are being received. If tone signals are detected, a conventional Fourier analysis is performed to determine which key has been depressed. If, however, speech signals are being received they are processed as described below.
  • The decoding of DTMF signals thus uses the same hardware and software as is required for the processing of speech signals.
  • a solid state relay 610 (for operating, e.g., a door lock solenoid) is connected to a power source 612 (power supply) which is capable of supplying 12 volts or 24 volts, AC or DC (pulsating DC) , and is switch selectable.
  • Relay 610 includes (within the dashed lines) an "open" collector non-inverting buffer 614 for providing power to a light source for a Dionics DIG-12-08-45 opto-coupler 616.
  • The light source contained within the opto-coupler is a light emitting diode (LED) of the Gallium Aluminum Arsenide type, emitting light in the near infrared spectrum.
  • a door open signal 618 of a specific wave shape delivered over line 598 resets a counter 620 which is driven by a clock signal from an on-board door strike oscillator 622.
  • a second counter 624 is switch selectable (switches 626) for various door strike “on” times (2 sec to 50 sec) .
  • the output of counter 624 is connected to a gate 628 which implements either "norm off” or “norm on” door strike power as selected by jumper 630.
  • the signal output of gate 628 directly controls the state of the solid state relay 610.
  • First, the optically coupled (electrically isolating) relay prevents referencing the power MOSFETs of the relay 610 or their input drive to system ground and, secondly, it isolates the system logic from noise and any high voltage (11500 volt) transients.
  • The output of opto-coupler 616 drives two power MOSFETs 632, 634 (Motorola MTP25N06 or International Rectifier Corp. IRF-541) which are characterized by a very low drain-to-source resistance (0.085 ohms max., known as R "on").
  • All power MOSFETS have an integral diode, drain to source, connected in a reverse direction from the normal polarity of the operating voltage. The diode plays an important role in the operation of relay 610, which allows AC or DC operation. With one input polarity, the current flows through the bulk of the left hand power MOSFET and through the diode of the right power MOSFET. When the polarity is reversed the path of the current flow also reverses.
  • MOSFETs are not subject to changes over time in their contact resistance and are therefore considerably more reliable. They are also unaffected by humidity and require extremely low input currents (less than 100 x 10⁻⁹ amps) to turn the device "on" (thus a very low power voltage source can be employed for turn-on).
  • Element 634 is a V47ZA7 available from General Electric and is employed for transient limiting and protection.
  • the system of Figure 2 is capable of multi-tasking two verification channels for handling two users at different stations simultaneously in real-time. In addition, the system is designed to support as many as 16 separate user stations 102.
  • The SD 106 is comprised of elements including a PAL20RA10CNS programmable logic array from MMI for control purposes, in addition to other well-known conventional logic elements for constructing such a data system.
  • The SD 106 receives analog signals from the stations 102, and multiplexes the signals to select two channels from among the set of 16. The two channels are then PCM filtered, both on their inputs and outputs, and are sampled every 125 to 150 microseconds. SD 106 is driven by its own clock 108 which is interconnected thereto to obtain the samples. The SD 106 is electrically connected to a control processor subsystem (CP) 110, and interrupts the CP 110 at every sample conversion time set by the clock 108. During this time, the CP 110 can perform successive read and write operations to both channels from the SD 106. At no other time can the CP 110 gain valid access to the SD 106 for reading or writing.
  • the CP 110 comprises a 10 MHz MC68000 microprocessor 112, a memory 114 and logic 116 to generate various control signals for input/output and memory accesses.
  • the CP 110 communicates with and coordinates all other functions in the system, including communicating and coordinating with the host computer.
  • Although the CP 110 can perform signal processing tasks, it mainly functions as a data administrator. Thus, it can directly control and gain access to each part of the system, and it also can communicate with the host computer in the form of coded data via a host interface unit (HIU) 118.
  • the CP 110 is also electrically connected to a central data buffer (CDB) 120 which is a fast 4K X 16 random access memory data buffer, through which a majority of the inter-processor data and messages can be quickly and efficiently passed.
  • the buffer could comprise, for example, four IMS1421S-50 RAM devices (4 X 4096) produced by INMOS.
  • CDB 120 is the sole source of data transfer between processors. The only other data path of this kind is the port between the CP 110 and the HIU 118.
  • CDB 120 includes data ports for the CP 110, for the HIU 118 and for a signal processor subsystem (SP) 122.
  • CDB 120 also includes an address port for memory-mapped input/output communication with the CP 110. For direct memory access purposes, there is a 12-bit counter 124 which allows fast sequential auto-incrementing access to the CDB.
  • the 4K X 16 memory is accessed by the SP 122, and the HIU 118, but only byte-wide by the latter.
  • the HIU 118 provides a signal on the line illustrated at 126 which, when the CDB 120 is placed in the proper mode, only allows auto-incrementing every two bytes. Either word or byte-wide accessing is available to the CP 110, however, for maximum efficiency.
  • One of the most important functions of the CP 110 is to manage allocation of the CDB 120. In light of this, and to maximize efficiency, the CDB 120 was designed to interface with the CP 110 in memory-mapped fashion. The SP 122 and HIU 118, however, view the CDB as an input/output device, but attainable via direct memory access. Most inter-processor communications must utilize the CDB 120, therefore much of the system time in the verification mode is spent in this element of the system. Nevertheless, any processor can gain access to the CDB 120 at any time since there is no hard-wired priority system.
  • Accesses to the CDB 120 are strictly controlled by the CP 110, to the extent that CDB 120 requests are first granted by the CP 110.
  • the protocol starts with an interrupt to the CP 110 by the requesting processor.
  • The CP checks the queue to see if the CDB 120 is being used. If not, an interrupt is sent back to the requester, which then can assume that it has been designated possession of the CDB bus.
  • Since the SP 122 has no direct communications port to either of the other processors, the CDB 120 must be employed for this purpose.
  • the HIU 118 can also deposit and retrieve messages and/or data via the CDB 120, even though there is a port between it and the CP 110.
  • Each processor must have a pre-assigned window of memory which is known to the other processors. Therefore, data can be passed from any processor to any processor by simply accessing the appropriate window in memory while in possession of the CDB 120.
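  • The request and grant protocol described above can be pictured with the small model below. It is only an illustrative sketch of the bookkeeping the CP 110 performs; the class and method names are hypothetical, and in the actual system the requests and grants travel as interrupts rather than function calls.

```python
from collections import deque

class CDBArbiter:
    """CP-side bookkeeping: the CDB is granted to one requester at a time."""

    def __init__(self):
        self.owner = None          # processor currently holding the CDB, if any
        self.queue = deque()       # requests received while the CDB was busy

    def request(self, processor):
        """A processor interrupts the CP to ask for the CDB; True means granted now."""
        if self.owner is None:
            self.owner = processor     # free: grant immediately (interrupt back to requester)
            return True
        self.queue.append(processor)   # busy: remember the request
        return False

    def release(self, processor):
        """The current owner returns the CDB; it is handed to the next requester, if any."""
        assert processor == self.owner
        self.owner = self.queue.popleft() if self.queue else None
        return self.owner

arbiter = CDBArbiter()
arbiter.request("SP")    # granted at once
arbiter.request("HIU")   # queued until the SP releases the buffer
arbiter.release("SP")    # the HIU now owns the CDB
```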
  • the CP 110 is also connected to the SP 122 which is a 16/32-bit TMS32010 signal processing microprocessor.
  • the SP 122 also includes memory storage for the SP microcode provided by elements such as two 8 X 1024 bit PROMS of 754 ns speed, such as MMI6381.
  • This subsystem includes random logic for input/output decoding, communications with the CP 110, and a data path to the CDB 120.
  • SP 122 performs arithmetic calculations at a rate of 5 million operations per second, and therefore is used for all of the heavy number handling tasks.
  • the CP 110 initiates a task in the SP 122 by sending a special signal over line 128 which causes the SP 122 to branch to the location specified by a command written by the CP into the CDB.
  • SP 122 has a pre-stored library of commands comprising uncoded vector locations, which the CP 110 calls to perform various tasks. Therefore, the SP 122 is a slave system whose tasks are initiated by the CP 110. It is also possible for the CP 110 to assign the CDB 120 to the SP 122 for the entire duration of a selected task, in which case the CDB 120 becomes a dedicated storage medium for the SP 122.
  • The HIU 118 functions as the interface between the host processor (not shown) and the CP 110, as well as the CDB 120. All signal interfacing is buffered by the HIU 118, which conditions the timing and control signals from the host to meet compatibility requirements. For example, any incompatibilities between the timing system of Figure 2 and that of the host processor are taken care of by a programmable array logic device (PAL) (not shown) which is internal to the HIU 118.
  • The interrupt vector can either be read from the CP host processor data port or from the message window in the CDB as described above.
  • The host processor must know the system's base address.
  • The base address is installed in the HIU 118 input selector device, which is simply an address comparator.
  • Each system's base address is user programmable via switches, giving the host processor multiple systems to use for greater processing power.
  • Up to 64 stations 102 can be provided for use in the verification mode by merely installing more circuit boards onto the host processor's bus.
  • Figure 3 illustrates the enroll mode which is generally indicated at 24 of Figure 1.
  • the system begins operation so that in block 152 it obtains the user's name, the access points through which this user may pass, and the maximum permissible levels for false acceptance of a user and false rejection of a user. Default values are provided where necessary, if no values are specified.
  • the system also retrieves a list of reference words which are to be recorded by the user and stores them for reference. For purposes of discussion, the use of 10 reference words is described herein.
  • substantially any desired number of reference words may be selected without changing the function or structure of the system.
  • the number of reference words used is increased, the ability to obtain words having this low probability of error is enhanced.
  • The enroll mode functions to generate a set of data from this particular user which may be utilized in the later verification portion of the system operation.
  • From block 152 the system passes to block 154 where the information is assigned a temporary identification number, and then the system passes to step 156.
  • Block 156 is a decision block relating to the number of tokens of the 10 reference words which have been collected. If a selected number of tokens (or utterances) of each of the 10 words have not been made, the host processor passes to step 158 and requests a new word from the user by generating a prompt command. For purposes of discussion, the use of four tokens for each reference word is described herein. Of course, it will be appreciated that substantially any number of tokens could be utilized for proper operation of the method and apparatus of the present invention.
  • The CP 110 of Figure 2 produces a signal through SD 106 which is communicated via input/output line 104 to a station 102 whereby, through means of a speaker or other audio or visual communication means, the user is prompted to utter one of the ten reference words for which tokens are to be stored.
  • The microphone in station 102 receives an utterance of the reference word from the user, and communicates it from station 102 via line 104 and SD 106 to the CP 110, where it is further processed as explained hereafter.
  • the host processor moves to block 160 where it instructs the system to produce autocorrelation coefficients (r coefficients) for the incoming speech by calling the "enroll" routine.
  • the enroll routine will be discussed more completely hereafter in reference to Figure 4.
  • The system examines those coefficients to determine whether incoming speech is present in the system. If no speech is present within a preset time period following the prompt signal, the host processor determines that there is a failure and passes to block 162. If the number of times the system has detected no speech following a prompt signal exceeds a predetermined threshold number, the processor moves to block 164 and aborts further attempts to enroll the user.
  • the processor returns to block 158 and again causes the user to be prompted to input an utterance.
  • the host processor moves to block 166 and initiates conversion of the r coefficients to transformed linear prediction coefficients (aa coefficients) which provide a correlation between the detected utterance and the geometry of the user's vocal tract. This conversion is performed as part of the enroll routine to be described with respect to Figure 4.
  • The host processor moves to block 168 and causes the autocorrelation coefficients and transformed linear prediction coefficients to be stored in an array for future use.
  • The processor then moves to block 170 and increments the word and token counter to identify the next token of the current word if four tokens have not yet been received from the user, or to the next word if four tokens have been received. As was indicated above, the number of tokens is programmable, and is selected by an operator prior to use. From block 170, the processor returns to block 156.
  • The system moves from block 156 to block 169 and sets a flag indicating that the stored information from the enroll mode must be processed to produce the data base necessary for later system operation in the verify mode. Having set this flag, the system passes to block 171 where it exits the enroll mode of operation and returns to the initiate block 20 of Figure 1 to await further instructions. If four tokens of each of the ten words have not been received, the system passes from block 156 to block 158 and operates as described above. By reference to Figure 4, it is possible to describe the enroll routine which is activated by the host processor in block 160 of Figure 3.
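  • Before turning to Figure 4, the token-collection loop of Figure 3 can be summarized in the hedged sketch below. The helper names (prompt_user, capture_token, extract_features) are placeholders for the prompt, enroll-routine and storage steps of blocks 158 through 170; the counts of four tokens per word and three silent retries follow the discussion above but are, as noted there, programmable.

```python
def enroll(reference_words, prompt_user, capture_token, extract_features,
           tokens_per_word=4, max_silent_retries=3):
    """Collect tokens_per_word tokens of each reference word and return the raw data base."""
    database = {word: [] for word in reference_words}
    for word in reference_words:
        while len(database[word]) < tokens_per_word:        # block 156: enough tokens yet?
            retries = 0
            prompt_user(word)                               # block 158: prompt for a token
            token = capture_token()                         # block 160: sample, compute r coefficients
            while token is None:                            # no speech detected after the prompt
                retries += 1
                if retries > max_silent_retries:            # blocks 162/164: abort enrollment
                    raise RuntimeError("enrollment aborted: no speech detected")
                prompt_user(word)
                token = capture_token()
            database[word].append(extract_features(token))  # blocks 166/168: aa coefficients stored
    return database                                         # block 169: ready to build the data base
```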
  • Upon entering the enroll routine 160 of Figure 4, the system moves to block 172 wherein the system allows the MC 68000 microprocessor of the CP 110 of Figure 2 to digitize the analog speech signal received from SD 106 at a sampling rate of approximately 8,000 samples per second. Sample frames are formed and stored in buffers allocated in memory 114 of the CP 110. Each one of the samples is read from a coder/decoder (CODEC) chip in the SD 106 as an 8-bit wide μ-law encoded word whenever control is passed to the CODEC interrupt handler. Once a buffer in the CP memory 114 is filled with a frame of samples, the system moves to block 174 and copies the samples from the buffer into the CDB 120.
  • Each of the 8-bit code words making up the samples is decompanded to define a 16-bit sample. This decompansion is done on the fly utilizing a table driven procedure for the sake of speed.
  • The use of μ-law compansion is well-known in the technology. A comprehensive discussion of this subject matter is presented in "Digital Signal Processing Application
  • the system moves to block 176 and accomplishes preemphasis by forming a frame which includes the present frame plus the last half of the previous frame, thereby permitting an overlapping analysis.
  • The samples are preemphasized by use of a first order finite impulse response filter which is applied to the input samples, where n is the index number of the sample, x denotes the samples prior to preemphasis, and y denotes the samples after preemphasis.
  • Preemphasis is performed to emphasize the high frequencies and cancel the low frequency emphasis caused by the sound transition from the speaker's lips to.the open space between the speaker and the microphone.
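  • A minimal sketch of such a first-order FIR preemphasis step is shown below. The filter form y[n] = x[n] - a·x[n-1] and the coefficient value of 0.95 are assumptions; the patent's own formula and coefficient are not given in this excerpt.

```python
import numpy as np

def preemphasize(x, coeff=0.95):
    """First-order FIR preemphasis: y[n] = x[n] - coeff * x[n-1] over one (overlapped) frame."""
    x = np.asarray(x, dtype=np.float64)
    y = np.empty_like(x)
    y[0] = x[0]                      # first sample has no predecessor in this frame
    y[1:] = x[1:] - coeff * x[:-1]   # emphasize high frequencies before autocorrelation analysis
    return y
```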
  • The unnormalized autocorrelations from Equation (2) are converted into normalized correlations by the relation:
  • the R(0) correlation is referred to as the "energy” and is a quantity which is utilized by the system in the detection of the beginning and end of an utterance as is explained hereafter.
  • The normalized autocorrelations (r coefficients) from Equation (3) together with the energy term R(0) are returned to the CP 110 as soon as the SP 122 has finished the calculations.
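  • The following sketch illustrates this autocorrelation step: the unnormalized autocorrelations R(k), the zero-lag "energy" term R(0) used by the speech detector, and the normalized r coefficients. The normalization r(k) = R(k)/R(0) and the analysis order of 10 are assumptions made for the example, since Equations (2) and (3) themselves are not reproduced here.

```python
import numpy as np

def autocorrelations(frame, order=10):
    """Return the normalized autocorrelations r(0..order) and the frame energy R(0)."""
    frame = np.asarray(frame, dtype=np.float64)
    R = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    energy = R[0]                           # R(0): used for detecting the start/end of speech
    r = R / energy if energy > 0 else R     # assumed normalization r(k) = R(k) / R(0)
    return r, energy
```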
  • The system passes to block 178 where it develops the linear prediction (LP) coefficients. Specifically, the CP 110 retrieves a copy of the r coefficients from the CDB 120 and then starts the SP 122 to develop the LP coefficients.
  • Levinson's Algorithm delivers a set of filter coefficients by the recursive procedure which relies on the relationships indicated below:
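  • The patent's own recurrence relations are not reproduced in this excerpt, but the sketch below shows Levinson's recursion as it is commonly implemented: it converts the autocorrelation values R(0..p) into the p linear prediction coefficients and the residual prediction error used later as the e term of a test template.

```python
import numpy as np

def levinson_durbin(R):
    """R: autocorrelation values R[0..p]. Returns (LP coefficients a[1..p], residual energy)."""
    p = len(R) - 1
    a = np.zeros(p + 1)
    error = R[0]
    for i in range(1, p + 1):
        # reflection coefficient for this prediction order
        k = (R[i] - np.dot(a[1:i], R[i - 1:0:-1])) / error
        a_next = a.copy()
        a_next[i] = k
        a_next[1:i] = a[1:i] - k * a[i - 1:0:-1]   # update the lower-order coefficients
        a = a_next
        error *= (1.0 - k * k)                     # residual energy shrinks at each order
    return a[1:], error
```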
  • The system moves to block 180 and transforms the LP coefficients into aa coefficients.
  • The procedure for accomplishing this transformation is as follows:
  • the system moves to decision block 182.
  • the CP 110 feeds the extracted data into a state machine to determine whether a user was speaking into the microphone of station 102 during the time the frame was recorded. This state machine is described in more detail hereafter with reference to Figure 5.
  • If the speech detector finds that no speech is active, the system moves to block 184 and determines whether the speech detector state machine is in an exit state. If it is in an exit state this indicates that speech is completed and the system moves to block 186, terminates operation of the algorithm of Figure 4, and then returns to block 160 of Figure 3. If the state machine is not in an exit state while the system is in block 184, the system returns to block 172 to obtain the next speech samples for processing as described above.
  • The system moves to block 188 and determines whether the system is in the enroll mode. If it is in the enroll mode, the system moves to block 190 and stores the aa coefficients in the memory 114 of Figure 2. If the system is not in the enroll mode, then the system moves from block 188 to block 192 and stores the normalized autocorrelation coefficients and the residual energy from the Levinson's Algorithm in memory 114. From either of blocks 190 or 192, the system returns to block 172 and continues functioning as described above. As soon as the last frame of active speech from the user has been found, all of the parameter sets extracted so far are copied from the memory 114 into the CDB 120. The host is then interrupted by the CP 110 indicating that the parameters for a full utterance are being delivered to the CDB 120. The host then reads the parameters from CDB 120 and eventually stores the results on its mass storage devices.
  • The state diagram includes a silence state 200 which is both the initial and final state. In this state, the machine waits to detect the beginning of an utterance and returns after the end of the utterance is detected.
  • The state machine goes from the silence state 200 to the attention state 202. Specifically, the machine goes to the attention state if the energy of the detected signal is either above a certain upper threshold level or, if the energy is above a certain lower threshold level and the normalized autocorrelation function (r) has a Euclidean distance which is more than a preselected threshold distance "a" from that value of the autocorrelation function (r) which the machine has measured for noise.
  • This noise autocorrelation function is recursively updated by the machine in the silence state 200.
  • From the attention state 202 the machine will go to a speech state 204 when the detected energy is high enough to prevent return of the system to the silence state, and when the machine has spent three cycles in the attention state 202.
  • The system will remain there until the detected energy drops below an "end of speech" threshold, indicating that a possible end of the utterance has been detected. At that time, the machine will go to an exit state 206. From exit state 206 the machine will go to silence if the detected energy is not high enough to exceed a lower threshold after five cycles. If energy of a sufficiently high value is detected, the machine will move from exit state 206 to resumption state 208 which functions, similar to the attention state, to move the machine to the speech state 204 when the energy is high enough not to return to the exit state 206 and when the machine has spent three cycles in the resumption state 208.
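  • The state machine of Figure 5 can be sketched as below. The cycle counts of three and five follow the description above; the threshold names and the fall-back transitions from the attention and resumption states when energy drops are assumptions, as is the omission here of the recursive update of the noise autocorrelation estimate.

```python
from enum import Enum, auto

class State(Enum):
    SILENCE = auto()
    ATTENTION = auto()
    SPEECH = auto()
    EXIT = auto()
    RESUMPTION = auto()

class SpeechDetector:
    def __init__(self, upper, lower, end_of_speech, dist_threshold):
        self.state = State.SILENCE
        self.cycles = 0
        self.upper, self.lower = upper, lower
        self.end_of_speech = end_of_speech        # "end of speech" energy threshold
        self.dist_threshold = dist_threshold      # distance "a" from the noise autocorrelation

    def step(self, energy, r_distance_from_noise):
        """Advance one frame; returns the new state."""
        s = self.state
        if s is State.SILENCE:
            # (the running noise autocorrelation estimate would be updated here)
            if energy > self.upper or (energy > self.lower and
                                       r_distance_from_noise > self.dist_threshold):
                self.state, self.cycles = State.ATTENTION, 0
        elif s is State.ATTENTION:
            if energy <= self.lower:              # assumed fall-back to silence
                self.state = State.SILENCE
            else:
                self.cycles += 1
                if self.cycles >= 3:              # three cycles of sustained energy
                    self.state = State.SPEECH
        elif s is State.SPEECH:
            if energy < self.end_of_speech:       # possible end of the utterance
                self.state, self.cycles = State.EXIT, 0
        elif s is State.EXIT:
            if energy > self.lower:
                self.state, self.cycles = State.RESUMPTION, 0
            else:
                self.cycles += 1
                if self.cycles >= 5:              # five low-energy cycles: utterance finished
                    self.state = State.SILENCE
        elif s is State.RESUMPTION:
            if energy <= self.lower:              # assumed fall-back to the exit state
                self.state, self.cycles = State.EXIT, 0
            else:
                self.cycles += 1
                if self.cycles >= 3:
                    self.state = State.SPEECH
        return self.state
```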
  • the speech detection automaton also controls the attenuator system of the present invention.
  • the speech detection machine comprises software, although a hardware embodiment could be readily provided by one skilled in the technology and based on the above description of the machine.
  • the system can be activated either by an operator, or by other means such as a timing device, to utilize the information generated in the enrollment mode for forming the data base which will be used to accomplish comparisons of incoming speech signals in the verification mode.
  • Figure 6 illustrates a method of system performance by which the appropriate data base may be constructed.
  • Upon receiving a signal requesting formation of a data base, the system moves from block 220 of Figure 6 to block 222, and determines whether the flag set in block 169 of Figure 3 is still set. If the flag is not set, then there is no data to be processed and the system moves to block 224, terminates operation of the procedure for making the data base, and returns to initialize state 20 in Figure 1.
  • the system moves from block 222 to block 226 where it obtains the next word stored in CDB 120.
  • The system also determines whether it is finished with the preparation of the data base. If it is not, the system passes from block 226 to block 228. In block 228 the system makes "intra-speaker" comparisons wherein the statistics developed in the enrollment mode are compared with each other to develop "global distortions" indicating the extent to which the words differ amongst themselves. These intra-speaker comparisons are computed by obtaining, for each of the speaker's N tokens of that word, the global distortion resulting from comparing that token with the speaker's other tokens of the same word.
  • The host processor obtains the global distortion G from this intra-speaker comparison by use of an intra-speaker routine which implements the above-described procedure and which will be described hereafter with respect to Figures 7 - 9.
  • The inter-speaker comparisons for a speaker's word are computed by obtaining, for each of the speaker's N tokens of that word, the global distortion (G) that results from comparing the r(i) versions of that token with the aa(i) versions of all other tokens of the same word, produced by all speakers in a generic data base.
  • the system moves to block 232 where the distortions from the intra-speaker and the inter-speaker comparisons are merged, sorted in numerically descending order, and stored in an array D.
  • the distortions in this array continue to be labeled as inter-speaker or intra-speaker distortions.
  • An example of the array D created in block 232 is illustrated in Figure 12.
  • a statistics file STAT is created.
  • The system creates a file which, for each intra-speaker distortion, provides an indication of the likelihood that the system will erroneously reject the actual speaker, or erroneously accept an impostor, if the threshold value for making the accept/reject decision is based on a distortion value corresponding to the distortion of that particular intra-speaker distortion.
  • a STAT file based upon the information in array D of Figure 12 is illustrated in Figure 13.
  • The procedure utilized in block 234 for developing the STAT file of Figure 13 will be described hereafter with respect to Figure 11.
  • the system of Figure 6 returns to block 226 to obtain the enrollee's next word. If all of the words have been processed in the manner described above, the system moves to block 236 and constructs an ORDER file which indicates the relative ability of each of the reference words to distinguish the present speaker from other speakers whose data is stored in the generic data base. Words with high discriminability will be those that have, at any given level of false reject error rate, low false accept error rates. The system determines the relative discriminative power of the words for the current user by sorting those words based on the highest value of the false accept designation in that word's per-word STAT file.
  • The value used for sorting the word represented by the STAT file in Figure 13 would be the first entry under the ERROR FALSE ACCEPT heading, which is the value of 2/6.
  • Upon completing the comparison and obtaining the global distortion, the system moves to block 244 and stores the distortion for later use. From block 244 the system returns to block 240 and obtains the next of the N tokens, and then processes this as described above. If no more tokens are available for the given word, the system moves to block 230 which corresponds to block 230 of Figure 6 and initiates an inter-speaker comparison.
  • In Figures 8 and 9 the COMPARE procedure utilized in block 242 of Figure 7 is described.
  • Upon entering the COMPARE block 242 of Figure 7, the system moves to block 242 of Figure 8 and initiates a comparison between a reference token and a test token.
  • the reference token is one of the previous utterances of the enrollee.
  • The test token is the token which is currently being processed in the making of the data base, as illustrated in block 228 of Figure 6.
  • The comparison is accomplished by comparing a reference pattern of aa coefficient sets against the r(i) and e values of the test template.
  • The circumstances of the test are represented graphically in Figure 9, where it is seen that the reference template is defined by designating the aa values for each frame from zero to j on the reference axis 280. Likewise, the r and e values for each frame from zero to i of the test template are located along the i axis 282.
  • the length of the reference template is designated as extending from the origin 284 to M on the j axis. Likewise, the length of the test template is indicated as extending from the origin 284 to the location indicated as N on the i axis.
  • The results of the test comprise a global distortion value G which is representative of the minimum amount of distortion experienced over all possible paths between the origin 284 and the intersection, designated at 288, of lines perpendicular to the axes at the M and N locations.
  • This global distortion G is derived as the result of a two-stage process which is embodied in the flow diagram of Figure 8 and which will now be explained. From block 242 of Figure 8 the system passes to a decision block 246 where the number of frames M of the reference template is compared against the number of frames N of the test template. If the difference between these numbers of frames is outside preselected threshold values, the system moves to block 248 and provides an output signal "999" indicating that the lengths of the utterances are incompatible for comparison purposes.
  • the threshold values are:
  • the SP 122 of Figure 2 compares the "N" vectors of aa coefficients with the normalized autocorrelations "r". The "N" resulting local distortions are stored as 16-bit values in the CDB 120. The relationship for obtaining these local distortions is as follows:
  • a distortion value may be derived representative of the distance measures between the reference and test templates at that location.
  • The system next moves to block 256 and develops the global distance value g(i,j) for the particular location indicated by the coordinates i and j on the graph of Figure 9.
  • the process evaluates different paths to the given location and selects the path having the minimum distortion value.
  • the minimum distortion from among the three paths indicated at 292, 294 and 296 is accepted.
  • This minimum path can be determined mathematically by the following relation: g(i,j) = min[ g(i-2,j-1) + 2d(i-1,j) + d(i,j) , g(i-1,j-1) + 2d(i,j) , g(i-1,j-2) + 2d(i,j-1) + d(i,j) ].
  • the final global distortion which defines the value of the minimum path, is found at point 288 in Figure 9.
  • the system in block 256 returns a final G value which is normalized based on the length of the tokens being compared. This final G value is defined as:
  • The CP 110 maintains only three rows of global distortions and two rows of local distortions, since this is all the information necessary to continue the above-described computation. This saves a significant amount of memory space. Further, due to the parallelogram boundaries in the dynamic programming algorithm described above, approximately 30% of the points can be excluded from an explicit search. These points would be outside of a parallelogram with its corners connecting the origin 284 and the end point 286 of the line in Figure 9. These points would be particularly concentrated near those boundaries of the graph which are in the vicinity of the M value on the j axis, and the N value on the i axis.
  • the system moves to block 260 and increments the j index by 1.
  • the system then moves to decision block 262 and determines whether the new j index is equal to the number of aa coefficients in the reference token. If the j index does not equal the number of aa coefficients, then the system returns to block 254 and computes the distortions as described above for the new location in the graph of Figure 9.
  • The system moves from block 262 to block 264 and increments the i index by 1. From block 264 the system moves to decision block 266 and determines whether the new i index is equal to the number of r coefficients of the test token. If it is not, the system returns to block 252, sets the j index equal to zero and continues developing the distortions at the new location in the graph of Figure 9.
  • the system If the new index i equals the number of r coefficient sets for. t e test token, then the system is at the final point 288 in the graph of Figure 9. The system then moves to block 268 where the final, normalized global distortion value at point 288 is defined as indicated in Equation (13) above. This normalized value then becomes the global distortion correponding to the comparison of the selected reference token and the test token.
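  • The dynamic-programming comparison described above can be summarized by the following sketch. It is illustrative only: the local distortions d(i,j) (the comparison of the aa coefficients with the autocorrelations r) are taken as given, the three-path recurrence is the one set out above, and the final normalization by the combined token length is an assumed reading of Equation (13), which is not reproduced in this text. All identifiers are illustrative.

```c
#include <stdlib.h>

#define NO_PATH 1.0e30   /* sentinel for grid points no legal path can reach */

/* Minimal dynamic-time-warping sketch (illustrative only, not the
 * patented implementation).  d is an N x M matrix of local distortions
 * with d[i*M + j] = d(i, j), i indexing the test token and j the
 * reference token.  Returns the end-point global distortion normalized
 * by the combined token length. */
double global_distortion(const double *d, int N, int M)
{
    double *g = malloc((size_t)N * (size_t)M * sizeof *g);
    if (g == NULL) return NO_PATH;

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            double best = NO_PATH;

            if (i == 0 && j == 0) {
                best = d[0];                  /* origin 284 of the grid */
            } else {
                double v;
                /* the three admissible predecessor paths */
                if (i >= 1 && j >= 2) {
                    v = g[(i-1)*M + (j-2)] + 2.0*d[i*M + (j-1)] + d[i*M + j];
                    if (v < best) best = v;
                }
                if (i >= 1 && j >= 1) {
                    v = g[(i-1)*M + (j-1)] + 2.0*d[i*M + j];
                    if (v < best) best = v;
                }
                if (i >= 2 && j >= 1) {
                    v = g[(i-2)*M + (j-1)] + 2.0*d[(i-1)*M + j] + d[i*M + j];
                    if (v < best) best = v;
                }
            }
            g[i*M + j] = best;
        }
    }

    double G = g[(N-1)*M + (M-1)];
    free(g);
    return (G >= NO_PATH) ? NO_PATH : G / (double)(N + M);
}
```

  • As noted above, only three rows of global distortions and two rows of local distortions actually need to be retained at any time; the full-matrix form above is used only for brevity.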
  • the system passes to block 230 of Figure 6 where the inter-speaker comparisons are performed. By reference to Figure 10, the procedure for accomplishing the inter-speaker comparisons is more clearly described.
  • the system moves to block 300 wherein it advances to the user's next token for the particular word being dealt with. Once this token is identified, the system moves to block 302 where it advances to the next generic speaker, and then moves to block 304 where it loads that specific generic speaker's next token for the current word being tried. The system next passes to block 306 where it conducts the comparison of the current token of the word from the user with the token from the particular generic speaker as identified in block 304.
  • the procedure for accomplishing this comparison is the same as was described above with respect to Figures 8 and 9, with each comparison producing a global distortion representative of a score value for the compared tokens. After completing the comparison, the global distortion is stored for further use.
  • the system returns to block 304 and obtains the next token of that particular generic speaker. If there are no more tokens for that speaker relating to the word under consideration, the system passes from block 304 to block 302 and advances to the next generic speaker, and then proceeds as described above to evaluate the tokens for that generic speaker. If there are no more tokens of the present word from generic speakers, the system moves from block 302 to block 300, and advances to the user's next token of the particular word. If the comparisons for all of the user's tokens of that word have been completed, then the system moves from block 300 to block 232 in the flow chart of Figure 6 wherein the distortions which have been stored in block 308 of Figure 10 are merged and sorted as described previously.
  • each of the intra-speaker distortions in the sorted array D of Figure 12 is assigned a number
  • the first element of the ERROR_FALSE_ACCEPT (ERROR_FA) column in the per-word STAT file is arrived at by applying the above formula (14) to the first intra-speaker distortion value which is encountered when working from the top down in the D array of Figure 12.
  • this first encountered distortion value would be identified in order as number 6, having a distortion value of 87.
  • the K term equals 6, since there are 6 distortions which are equal to or less than the first intra-speaker distortion encountered, which is order number 6.
  • the system moves to block 314 where it develops the ERROR_FALSE_REJECT (ERROR_FR) column of the per-word STAT file of Figure 13.
  • the ERROR_FR column can be developed by the following relation:
  • the system next passes to block 316 and stores the information developed in blocks 312 and 314 in the per-word STAT file as illustrated for discussion purposes in Figure 13.
  • the system moves from block 316 to block 310 where it gets the next intra-speaker distortion from the D array, and forms a per-word STAT file for that word, in the manner as described above. If there are no more intra-speaker distortions in the array D the system moves to block 226 of Figure 6 and functions as described previously.
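  • Because formula (14) and the ERROR_FR relation are not reproduced in this text, the following sketch only illustrates one plausible construction of the per-word STAT file: it assumes that ERROR_FA at a given intra-speaker distortion is the fraction of inter-speaker distortions at or below it, and that ERROR_FR is the fraction of intra-speaker distortions above it. All identifiers are illustrative.

```c
/* Illustrative per-word STAT construction under the assumptions stated
 * above.  d[] is the merged array D sorted in ascending order and
 * intra[k] is nonzero when d[k] came from an intra-speaker comparison. */
typedef struct {
    double distortion;
    double error_fa;   /* estimated false-accept rate */
    double error_fr;   /* estimated false-reject rate */
} stat_entry;

int build_stat(const double *d, const int *intra, int n,
               stat_entry *out, int max_out)
{
    int n_intra = 0, n_inter = 0;
    for (int k = 0; k < n; k++) {
        if (intra[k]) n_intra++; else n_inter++;
    }

    int rows = 0, intra_seen = 0, inter_seen = 0;
    for (int k = 0; k < n && rows < max_out; k++) {
        if (intra[k]) intra_seen++; else inter_seen++;
        if (!intra[k]) continue;     /* one STAT row per intra-speaker distortion */

        out[rows].distortion = d[k];
        out[rows].error_fa = n_inter ? (double)inter_seen / n_inter : 0.0;
        out[rows].error_fr = n_intra ? (double)(n_intra - intra_seen) / n_intra : 0.0;
        rows++;
    }
    return rows;   /* number of STAT rows written */
}
```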
  • a user may activate the verify mode of the present invention in attempting to gain access to a secured area, or to otherwise use the present invention for purposes of access, identification, and the like for which the verification system of the present invention finds application.
  • the system in Figure 14 passes to block 400 and determines whether there is a present claim for access at one of the stations 102.
  • the host processor continuously scans all access points for activity. Thus, if no activity is indicated the system loops via line 402 back to block 400 and continues to function here until such time as a claim is made.
  • the user may present an identity claim in several ways, including (but not limited to) entering a personal identification number on a key pad, inserting a plastic card in a reader, or saying his name into a microphone.
  • the system passes to block 404 and issues a request to the user for an identification. This request is transmitted from the CP 100 through the SD 106 to the station 102.
  • the system passes to block 406 where it monitors incoming signals from the station 102 via SD 106, and detects energy levels of the signals corresponding to an indication that identification information may be present. If no incoming signals of this type are detected after a preset period of time, the system returns to block 404 and again requests an identification from the user. Upon detecting signals comprising identification, the system moves to block 408. In block 408 the system compares the identification information which was previously stored for the user to determine whether the identification corresponds to an enrolled user, and whether the identification permits access from the particular access point.
  • the system moves to block 410 and produces a signal rejecting the user. If the identification is found to be acceptable for the particular access point, the system moves to block 412 and sets the threshold values based upon data for the particular user which was previously entered and stored. If no threshold values are specified, the system will utilize default values.
  • The threshold values which are set include the maximum number of trials to permit before rejecting the claim (where each word constitutes a trial); the minimum number of trials to require before permitting a claim to be accepted; the level of false accept error rate
  • the system moves from block 412 to block 414 from whence it accesses the ORDER file 416 for this particular user.
  • the system gets the next word from the ORDER file which was created in the enrollment mode.
  • Upon receiving the current token uttered by the user, the system moves to block 420 and obtains the normalized autocorrelation coefficients r. These r coefficients are obtained by use of the test procedure.
  • the test procedure functions on the current speech to develop the r coefficients, as well as other parameters, in order to facilitate the verification process. This procedure was previously described in connection with the illustration of Figure 4.
  • in block 422 the system develops the ERROR_FA by comparing the present token with the corresponding four templates of the user and then utilizing the global distortion obtained from those comparisons to obtain the correct value for ERROR_FA from the STAT file for this word.
  • the procedure for obtaining this ERROR_FA is illustrated in more detail in Figure 15 and will be discussed hereafter.
  • the system passes from block 422 to block 424, wherein the fail counter is incremented.
  • the system then passes to block 428 where further testing is conducted to determine whether any action as to acceptance or rejection should be taken.
  • the test procedure which is performed in block 428, is described hereafter with reference to Figure 16.
  • Based upon the decision of the test procedure in block 428, the system either makes no decision and returns to block 414, wherein the next word is obtained from the ORDER file and operation continues as described above, or the system passes to either block 430 and rejects the claimant, or to block 432 and accepts the claimant. From blocks 430 and 432 signals are produced which are transmitted to station 102 of Figure 2 to advise the claimant of the decision. Of course, these signals could also be provided to other external equipment to accomplish things such as opening doors to secured areas, initiating operation of selected equipment and the like.
  • the system moves to block 426 where the cumulative error is adjusted.
  • the cumulative error corresponds to the ERROR_FA value provided from block 422.
  • the cumulative error comprises a combination of the previous cumulative error value and the current ERROR_FA value received from block 422. In one preferred embodiment, this combination comprises the product of the previous cumulative error value and the current ERROR_FA value received from block 422.
  • the procedure for obtaining the ERROR_FA in block 422 may now be described.
  • Upon entering block 422, the system immediately passes to block 440 of Figure 15 and gets the next of the four aa files developed in the enrollment mode which correspond to the word being tested.
  • the system passes to block 442 and compares the new r coefficients obtained in block 420 of Figure 14 with the aa values obtained from block 440. This comparison is accomplished by the procedure outlined previously with reference to Figures 8 and 9, producing a global distortion representative of the difference between the r values of the new token, and the aa values of the token obtained in block 440.
  • the system next passes to block 444 and reads the global distortion value obtained in block 442. This distortion value is saved and the system passes to block 446 and determines whether any of the N stored tokens corresponding to this user have not yet been compared with the r values of the current utterance. If there are tokens that have not been compared, the system returns to block 440 and functions as described above. If no more tokens remain to be compared, the system moves to block 448, where the global distortions produced by the comparisons of the new r values with the N stored templates of the user are processed to develop a composite distortion. This composite distortion may comprise any of several different values.
  • in one preferred embodiment, the token distortion values are averaged together to comprise the composite distortion.
  • in another preferred embodiment the lowest value of the distortions is selected, and in yet another preferred embodiment the lowest two distortions are averaged.
  • the basis upon which the make-up of the composite distortion is selected may be dependent upon the type of application for which the invention is utilized and upon the desires of the operator.
  • the system passes to block 450 and references the per-word STAT file created in the enrollment mode.
  • the composite distortion is compared with the distortions in the per-word STAT file and that distortion which is closest to, but greater than, the composite distortion is identified.
  • the ERROR_FA associated with the identified distortion is extracted, and is utilized as the ERROR_FA which is provided to block 422 in Figure 14.
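  • The composite-distortion options and the per-word STAT lookup described above may be sketched as follows; the arrays, the identifiers, and the behaviour when the composite exceeds every tabulated distortion are assumptions of this sketch.

```c
#include <float.h>

/* Illustrative composite-distortion and STAT-file lookup helpers. */

enum composite_mode { COMPOSITE_MEAN, COMPOSITE_MIN, COMPOSITE_MEAN_LOWEST_TWO };

double composite_distortion(const double *g, int n, enum composite_mode mode)
{
    double lo1 = DBL_MAX, lo2 = DBL_MAX, sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += g[i];
        if (g[i] < lo1)      { lo2 = lo1; lo1 = g[i]; }
        else if (g[i] < lo2) { lo2 = g[i]; }
    }
    switch (mode) {
    case COMPOSITE_MIN:             return lo1;
    case COMPOSITE_MEAN_LOWEST_TWO: return (n >= 2) ? 0.5 * (lo1 + lo2) : lo1;
    default:                        return (n > 0) ? sum / n : DBL_MAX;
    }
}

/* Return the ERROR_FA stored with the smallest tabulated distortion that
 * is greater than the composite value.  stat_d[] is assumed ascending
 * and rows is assumed to be at least 1; when the composite exceeds every
 * tabulated distortion, the last (worst) row is returned here. */
double lookup_error_fa(const double *stat_d, const double *stat_fa,
                       int rows, double composite)
{
    for (int k = 0; k < rows; k++)
        if (stat_d[k] > composite)
            return stat_fa[k];
    return stat_fa[rows - 1];
}
```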
  • block 428 of Figure 14, which comprises the test block, is described with respect to Figure 16. Specifically, upon entering block 428 the system passes to block 460 and determines whether the fail counter from block 424 indicates that the user has exceeded a preselected number of failed trials. In a default condition this preselected threshold value is two. Of course, this value may be set at the discretion of the operator. If the maximum number of failed trials has been exceeded, the system passes to block 430 and rejects the claimant as described previously.
  • the system passes to block 462 and determines whether the maximum number of trials has been exceeded. This condition arises when the cumulative error of the claimant is greater than the threshold value for acceptance, but the trial has not been failed in block 460. A default threshold value for the maximum number of trials is five. Again, the operator can select another value if he desires. If the maximum number of trials has been exceeded, the system passes to block 430 and rejects the claimant in the manner described above. If the maximum number of trials has not been exceeded, the system passes to block 464 and determines whether the cumulative error from block 426 of Figure 14 is greater than the threshold value. If it is, the system passes to block 414 of Figure 14 and obtains the next word from the ORDER file and then proceeds as described above.
  • the system passes to block 466 and determines whether the claimant has passed at least two trials. This means that at least two trials have been conducted for this claimant, and that the cumulative error at the end of the second trial is below the threshold value for acceptance.
  • the cumulative error is adjusted with each trial in block 426.
  • the threshold is also adjusted along with the cumulative error. This adjustment can be made as a proportion of the change in cumulative error, such as by multiplying the threshold by the present ERROR_FA value, in the same manner as is done for the cumulative error.
  • the tendency of the cumulative error value to go down as it is multiplied by fractions would be accompanied by a lowering of the threshold level, so that the likelihood of an impostor being accepted could not be increased with an increasing number of trials.
  • if at least two trials have not been passed, the system passes from block 466 to block 414, gets the next word from ORDER file 416, and proceeds as discussed previously. If at least two trials have been conducted and passed, the system passes to block 468 and accepts the claimant by producing a signal which is transmitted through the station 102 to communicate to the claimant that he is accepted, and to otherwise activate equipment which may be operated by such acceptance signals.
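  • The per-trial bookkeeping of Figures 14 and 16 can be summarized by a sketch such as the one below. It assumes the cumulative error (and, optionally, the threshold) is multiplied by each passed word's ERROR_FA, applies the default limits of two failed trials and five total trials, and requires at least two passed trials before acceptance; the structure, the identifiers, and the return convention are illustrative only, and how an individual word is judged failed is left to the caller.

```c
/* Illustrative per-claim verification state (Figures 14 and 16). */
typedef struct {
    double cumulative_error;   /* product of ERROR_FA values so far   */
    double threshold;          /* acceptance threshold, may be scaled */
    int    trials;             /* words tried so far                  */
    int    failed;             /* failed trials so far                */
    int    max_failed;         /* default 2                           */
    int    max_trials;         /* default 5                           */
    int    scale_threshold;    /* nonzero: scale threshold with error */
} verify_state;

enum verdict { CONTINUE, ACCEPT, REJECT };

enum verdict verify_step(verify_state *s, double error_fa, int trial_failed)
{
    s->trials++;
    if (trial_failed) {
        s->failed++;                              /* block 424 */
    } else {
        s->cumulative_error *= error_fa;          /* block 426 */
        if (s->scale_threshold)
            s->threshold *= error_fa;             /* optional adjustment */
    }

    if (s->failed > s->max_failed)  return REJECT;    /* block 460 */
    if (s->trials > s->max_trials)  return REJECT;    /* block 462 */
    if (s->cumulative_error > s->threshold)
        return CONTINUE;                              /* block 464: try another word */
    if (s->trials - s->failed < 2)
        return CONTINUE;                              /* block 466: need two passed trials */
    return ACCEPT;                                    /* block 468 */
}
```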
  • the memory of the personal computer is organized into blocks to be used as stacks 124. These stacks will hold all modifiable data for each of twelve processes that may be contemporaneously in progress.
  • An "image" of each process to be run is stored in its stack.
  • Process 0 is designated to be special in that it needs no image initialization and it performs all other initializations and setups. Thereafter process 0 simply reads the time continuously and makes the "time" available to all other processes in a shared semaphored memory location. Only after the images have been initialized is a routine called the "clock interrupt service routine" or "clock ISR" permitted to begin operation.
  • the clock ISR pushes the image of the current process onto that process's stack. Then, based on the status of other processes, a decision is made by a "scheduler" as to which process deserves use of the digital processor in the personal computer.
  • the chosen process's stack address is then retrieved from the process table 126 and its image is popped off the stack. Operation of that process begins immediately upon returning from the interrupt.
  • a critical part of the initial image is the establishment of the correct interrupt return address and flags.
  • the procedure of changing stacks and images is called a "context switch”.
  • a second mechanism also performs a context switch.
  • a process waiting for some event hands control to the context switch mechanism, which builds an image as if it were interrupted and then continues with the stack decision and image popping of another process.
  • a semaphore is a flag which can be set and examined only as an atomic, non-interruptible operation. At issue is the possibility that two processes may try to examine or modify the same location, device, or resource and expect no interference during that operation.
  • a semaphore is guaranteed validity by having a process turn off interrupts and read the flag. If the flag is free, the flag is set and interrupts are turned on again. If the flag is in use, a context switch is performed. The flag is examined again when the processor returns to this context. Interrupts remain off within this context until the semaphore is free.
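  • The semaphore discipline described above can be sketched in C as follows; the interrupt and context-switch primitives are hardware specific, so they appear here only as placeholder stubs, and all names are illustrative.

```c
#include <stdint.h>

/* Placeholder hooks standing in for the real interrupt and scheduler
 * primitives, which are hardware and operating-system specific. */
static void interrupts_off(void) { /* e.g. disable interrupts on the target CPU */ }
static void interrupts_on(void)  { /* e.g. enable interrupts on the target CPU  */ }
static void context_switch(void) { /* yield to the scheduler                    */ }

typedef volatile uint8_t semaphore;   /* 0 = free, 1 = in use */

/* Acquire: examine and set the flag with interrupts off, so that the
 * test-and-set pair behaves as one atomic, non-interruptible step. */
void semaphore_take(semaphore *s)
{
    for (;;) {
        interrupts_off();
        if (*s == 0) {            /* flag is free: claim it */
            *s = 1;
            interrupts_on();
            return;
        }
        /* flag in use: give the processor away (interrupts remain off
         * within this context) and re-examine the flag when the
         * scheduler returns to this context. */
        context_switch();
    }
}

void semaphore_give(semaphore *s)
{
    *s = 0;
}
```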
  • Each device i.e. the clock, screen, keyboard, disk, diskette, rs232 bus, and SVB
  • the SVBs are examined in a "round robin" fashion in the interrupt service routine to determine which board has the right to use a DMA function which is shared by up to four boards.
  • When using a device, a process sets the device semaphore, turns off interrupts, commands the device, sets the device busy flag, and then context switches out.
  • the device interrupt service routine must reset the device busy flag.
  • the scheduler can then return to that process.
  • the process must then release the device semaphore and turn interrupts back on.
  • the decision of the scheduler as to which process gets use of the processor is based on priorities, device busy flags, and process states. There are only two levels of priority. High priority is given only to speech output which must remain contiguous (continuous) to sound correct to the user of the system. All other processes have low priority. High priority processes have absolute precedence over low priority processes.
  • the scheduler examines the process table in a "round robin" fashion starting at the process immediately following the one being switched out. During this round robin examination, processes waiting for devices and dead processes are skipped. The first runnable low priority process is remembered. If a high priority process is waiting, the first one found is served first. If none is found, the remembered low priority task gets service. It is not possible for process 0 to be blocked or "dead"; therefore the scheduler always has a process to run. Processes may be "killed" or "doomed". A killed process will terminate immediately, context switch out (if in) and be subsequently skipped in the round robin examination. A doomed process will continue until the process finishes, at which time it kills itself.
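  • The scheduling decision described above can be sketched as follows; the process-table layout, the field names, and the fixed table size of twelve entries (the number of contemporaneous processes mentioned earlier) are assumptions of this sketch, not the actual data structures of the system.

```c
#define NPROC 12

/* Assumed process-table entry; field names are illustrative. */
typedef struct {
    int high_priority;       /* nonzero for speech-output processes   */
    int waiting_for_device;
    int dead;                /* killed processes are skipped entirely */
} proc_entry;

/* Pick the next process to run, scanning round-robin from the slot
 * after `current`.  Process 0 is never blocked or dead, so a runnable
 * process always exists. */
int schedule(const proc_entry *tbl, int current)
{
    int low = -1;
    for (int step = 1; step <= NPROC; step++) {
        int i = (current + step) % NPROC;
        if (tbl[i].dead || tbl[i].waiting_for_device)
            continue;                   /* skip blocked and dead processes      */
        if (tbl[i].high_priority)
            return i;                   /* first waiting high-priority wins     */
        if (low < 0)
            low = i;                    /* remember first runnable low-priority */
    }
    return low;                         /* always valid: process 0 is runnable  */
}
```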
  • the process driver is an infinite loop which executes the function specified in the process table. This infinite loop will kill itself upon finding the process doom flag set and then context switch out.
  • Processes may be "exec'ed". To exec a process, first it must be assuredly dead. A kill command is issued for that process and a function pointer is copied into the process table entry for that process, then the remainder of the process table entry is initialized except for the process state which must remain as killed until the entry is up to date. Then the process state is raised to "live".
  • the operating system must maintain the fixed disk in the equivalent of a file system.
  • the disk is organized into seven major blocks: the BOOT, OPERATING SYSTEM, STRUCTURES, HEADERS, IMPOSTORS, ENROLLEES, and AUDIT TRAIL.
  • the BOOT contains the program necessary to bring up the operating system including the disk linkage for finding the operating system and structures.
  • the OPERATING SYSTEM contains the program that operates, e.g., all devices and resources in the management of verification.
  • STRUCTURES is a map of where all the 7 major blocks are located. These major blocks are sometimes split into separated minor blocks. For instance, two copies of the HEADER are stored in widely separated disk sectors so that damage to one copy does not permanently destroy valuable data.
  • the HEADERS contain all the enrollee personal data and linkages into the ENROLLEES block and IMPOSTORS block.
  • the IMPOSTORS block is an array of tokens. Each IMPOSTOR has 5 tokens for each of 5 words.
  • the ENROLLEES block is an array of tokens. For each enrollee there are 5 tokens each for 5 words.
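  • For illustration only, the disk organization described above might be represented by declarations such as the following; the region names and the 5-word-by-5-token organization come from the text, while the token size and all identifiers are assumptions.

```c
/* Illustrative shapes for the fixed-disk layout described above. */
#define WORDS_PER_SPEAKER  5
#define TOKENS_PER_WORD    5
#define TOKEN_BYTES        512          /* assumed size of one stored token */

enum disk_region {
    REGION_BOOT, REGION_OPERATING_SYSTEM, REGION_STRUCTURES,
    REGION_HEADERS, REGION_IMPOSTORS, REGION_ENROLLEES, REGION_AUDIT_TRAIL
};

typedef struct {
    unsigned char data[TOKEN_BYTES];    /* one stored token */
} token;

typedef struct {
    token t[WORDS_PER_SPEAKER][TOKENS_PER_WORD];   /* 5 tokens for each of 5 words */
} speaker_tokens;
```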
  • the verification procedure is coded in a separate module which uses primitive functions provided by the operating system.
  • Verification is broken down into three major sub-modules: VERIFY, ENROLL, and MANAGE.
  • ENROLL can only be seen from the action of the "Manager Access Point".
  • the two buffers are required so that the operation of generating intra- and inter-speaker statistics can be kept in RAM. Shuffling data off and onto disk during this operation would slow the system down considerably.
  • (NOTE: These buffers are also used by other modules which may not run concurrently with ENROLL, such as BACKUP. BACKUP also shuffles blocks of 25 tokens each off and on disks and diskettes.)
  • the prompts are kept in RAM for similar speed reasons.
  • the sharing of the processor by the scheduler could cause portions of output speech to fragment, leading to poor quality sound. Therefore, speech output is given high priority.
  • the process sets up a command for the interrupt handler, marks itself as waiting for service and signals the board interrupt handler. Since the interrupt handler has interrupts turned off there is never contention for the DMA so that semaphores and device busy flags are unnecessary.
  • the interrupt handler drops the waiting-for-service flag for a process as soon as it is finished with that process's request.
  • the process context switches out while waiting for service and is not executed again until the waiting-for-service flag drops.
  • the process semaphores the disk, sets up the command and DMA, marks the disk as busy, starts the DMA, and then requests a context switch to give other processes time to execute while the disk is busy.
  • the disk executes an interrupt which marks the disk as idle and turns off the semaphore.
  • the scheduler will, when the round robin loop reaches this process again, continue execution of this process.
  • ENROLL is started from the MANAGER which "dooms" process 1. Since process 1 is normally a verify process, the system waits for the verify to time out and kill itself. It would be inappropriate to interrupt a valid verification which may be occurring.
  • ENROLL begins by getting vital statistics, and then as each token is retrieved it is placed in the R token buffer in sequence according to its word number and token number. The tokens are then examined for large variances in length. If any such variances are found, an attempt is made to replace the tokens that have standout lengths. If too many tokens are "standout" within a word, a fresh set of tokens for that word is prompted. If this is unsuccessful, the ENROLL aborts. If the ENROLL is successful, the tokens are then converted to k tokens which are placed appropriately in the k token buffer.
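  • The standout-length screening described above can be sketched as follows; the text does not give the actual variance criterion, so the mean and standard-deviation test and the tol parameter below are assumptions, and the function name is illustrative.

```c
#include <math.h>

/* Illustrative "standout length" test for one word's enrollment tokens.
 * len[] holds the token lengths; a token is flagged as a standout when
 * its length differs from the mean by more than tol standard deviations
 * (tol is an assumed parameter).  Returns the number of standouts; the
 * caller re-prompts the word when this count is too large. */
int find_standouts(const int *len, int n, double tol, int *standout)
{
    if (n <= 0) return 0;

    double mean = 0.0, var = 0.0;
    for (int i = 0; i < n; i++) mean += len[i];
    mean /= n;
    for (int i = 0; i < n; i++) var += (len[i] - mean) * (len[i] - mean);
    var /= n;

    double sd = sqrt(var);
    int count = 0;
    for (int i = 0; i < n; i++) {
        standout[i] = (sd > 0.0) && (fabs(len[i] - mean) > tol * sd);
        if (standout[i]) count++;
    }
    return count;
}
```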
  • the board is then commanded to generate scores on each token as compared to all the other tokens of the same word. These scores are used to determine the intra-speaker statistics which are stored in the header. Then each installed impostor is read into the r token buffer and inter-speaker statistics are generated in the same manner. For each impostor that is not installed a default set of statistics is generated. These statistics are combined into generalized inter-speaker statistics. These statistics are then stored in the header. The k token buffer is then assigned to a User Disk Index (udi) and written out. It is read back into the r token buffer and a byte-by-byte comparison is made to determine if there is a bad sector on the disk.
  • udi: User Disk Index
  • VERIFY is the normal function for processes 1 through 8.
  • Appendix A is a copy of the object code for the subprograms for verification, data
  • the object code is in octal form, written in the programming language C and the Masscomp version of the M68000 assembler language for use on the IBM PC/XT. Also attached, as Appendix B, is a copy of the SP object code which is written in the
  • the apparatus and method described above comprise a significant improvement over the prior art systems by providing a reliable and efficient speech verification system which: (1) includes an efficient and accurate means for detecting the beginning and end of speech; (2) provides parameters for use in verification which are related to features of the vocal cord, and which may provide for comparison between reference and test utterances without limitations based on time or intensity restrictions; (3) provides a verification mode which develops an indication of the probability of erroneously accepting an impostor or rejecting the true speaker.
  • the system may proceed to calculate (500) the intra-speaker global distortion values for the enrollee's utterances of the first reference word. Once all the global distortion values have been calculated for that word, the largest value is discarded and the mean and variance for the remainder of the global distortion values are calculated and stored (502) for later use in the verification operation.
  • the inter-speaker comparisons for calculating the inter-speaker global distortion values are performed (504) , the largest distortion value is discarded, and, for the remainder of the global distortion values, the inter-speaker mean and variance are calculated and stored (506) .
  • the intra-speaker and the inter-speaker mean and variance are calculated and stored, as described above.
  • the system retrieves from memory the intra-speaker and inter-speaker mean and variance values, the reference words, the k coefficients for the utterances associated with each reference word, and the two threshold values, U and V, corresponding to the user's claimed identity (522).
  • the system randomly selects one word and prompts the user to speak the selected word into the microphone (524) .
  • the received utterance is immediately processed to obtain the normalized autocorrelation coefficients r (526) .
  • the r coefficients are then compared with the k coefficients for each of the stored utterances of that word (previously spoken by the enrolled user) to calculate a new set of global distortions (528) . From the set of global distortions a single combined score (i.e. the mean global distortion) is calculated.
  • p and q are calculated; they represent, respectively, the probability that the enrolled user would produce an utterance yielding a combined score as poor as or worse than the one just calculated and the probability that an impostor would produce an utterance yielding a combined score as good as or better than the one just calculated.
  • p is calculated by integrating the Gaussian density function 540 characterized by the two intra-speaker values (i.e. the mean 542 and the variance 544) from the combined score 546 to positive infinity (i.e. p equals the area of the shaded region 548) .
  • q is calculated by integrating the Gaussian density function 550 characterized by the two inter-speaker values (i.e. the mean 552 and the variance 554) from negative infinity to the combined score 546 (i.e. q equals the area of shaded region 556).
  • P and Q are cumulative probability values which are updated each time new values for p and q are calculated, i.e. each time a new utterance is received from the user, P and Q (which are initialized to 1) are updated by multiplying each by its respective corresponding individual probability value, p or q. Consequently, after the user's first utterance, P and Q equal p and q, respectively (560).
  • P and Q are compared, respectively, with the two threshold values, U and V, for the enrolled user. If P is less than U but Q remains greater than V (564), the user is rejected; if, conversely, Q is less than V but P remains greater than U (566), the user is accepted. If, however, P remains greater than U and Q remains greater than V (568), no decision to accept or reject the user is made; instead another one of the reference words is chosen and the user is prompted for another utterance. New values for p, q, P, and Q are calculated for this utterance and the updated values of P and Q are once again compared to U and V. The system continues to prompt the user for new utterances until a verification decision is made to either accept or reject the user.
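  • The probability computation and decision rule described above may be sketched as follows, using the complementary error function to evaluate the two Gaussian integrals; the function names and the return convention are illustrative only (compile with -lm).

```c
#include <math.h>

/* Upper-tail probability of a Gaussian: P(X >= x) for X ~ N(mean, var). */
static double gaussian_upper_tail(double x, double mean, double var)
{
    return 0.5 * erfc((x - mean) / sqrt(2.0 * var));
}

/* One verification utterance: update the cumulative probabilities P and Q
 * from the combined score and the stored statistics, then compare them
 * against the thresholds U and V.  Returns +1 to accept, -1 to reject,
 * 0 to prompt for another word. */
int update_and_decide(double score,
                      double intra_mean, double intra_var,
                      double inter_mean, double inter_var,
                      double U, double V, double *P, double *Q)
{
    /* p: enrolled user scores this badly or worse (shaded area 548) */
    double p = gaussian_upper_tail(score, intra_mean, intra_var);
    /* q: impostor scores this well or better (shaded area 556)      */
    double q = 1.0 - gaussian_upper_tail(score, inter_mean, inter_var);

    *P *= p;                    /* P and Q are initialized to 1 */
    *Q *= q;

    if (*P < U && *Q > V) return -1;   /* reject (564) */
    if (*Q < V && *P > U) return +1;   /* accept (566) */
    return 0;                          /* no decision yet (568) */
}
```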
  • the sensitivity of the verification decision, and thus the security level, is related directly to the two threshold values, U and V. If U and V are assigned relatively small values, the verification operation will be an extended one requiring a long sequence of utterances and analysis.
  • the benefit of low threshold values is increased accuracy, i.e. greater certainty that an impostor has not been accepted and that a valid user has not been rejected; the benefit of increased accuracy must of course be weighed against a lengthy verification operation.
  • the utterances he or she spoke during the verification operation are used to update both the stored reference utterances and the intra-speaker and the inter-speaker mean and variance values. This is done to accommodate the changes which occur in the human voice over time. The updating is done autoregressively such that very recent utterances are given more weight than very old ones.
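  • The autoregressive update mentioned above might take the form of a simple first-order (exponential) recursion such as the sketch below; the weight alpha is an assumed parameter, not a value given in this text.

```c
/* Illustrative first-order autoregressive (exponential) update of a
 * stored statistic.  Larger alpha weights recent utterances more
 * heavily than very old ones. */
static double ar_update(double stored, double recent, double alpha)
{
    return (1.0 - alpha) * stored + alpha * recent;
}

/* Example: refresh the intra-speaker mean and variance after an
 * accepted verification, using the combined score just observed. */
void refresh_statistics(double *mean, double *var, double score, double alpha)
{
    double dev = score - *mean;
    *mean = ar_update(*mean, score, alpha);
    *var  = ar_update(*var, dev * dev, alpha);
}
```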

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

A speaker verification system and method establish, in an enrollment mode (22), a database comprising a plurality of templates (47) for each of a group of words spoken by a speaker, the templates having parameters corresponding to the vocal apparatus; these parameters are compared with one another and with parameters representing templates of those words from a generic group of speakers, in order to establish distortion values indicating how closely the speaker's vocal apparatus corresponds to his own templates and how closely a generic vocal apparatus corresponds to those same templates of the speaker. These distortions are used to establish information indicating the probabilities, for given words, of erroneously accepting an impostor or erroneously rejecting the true speaker (46). In a verification mode (50), the speaker utters test words corresponding to the stored templates. Parameters are extracted (60) from these test words and compared with the parameters of the speaker's stored templates to establish distortion values representing the difference between the compared parameters. The distortion values for the current speech are analyzed against the probabilities of erroneous acceptance or rejection. A plurality of speech samples may be processed during verification. Acceptance or rejection is based on the cumulative probabilities (70). Access to controlled areas is granted in response to an acceptance signal from the verification system.
EP19860907234 1985-07-01 1986-07-01 Systeme de verification du locuteur. Withdrawn EP0233285A4 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US75103185A 1985-07-01 1985-07-01
US751031 1985-07-01

Publications (2)

Publication Number Publication Date
EP0233285A1 true EP0233285A1 (fr) 1987-08-26
EP0233285A4 EP0233285A4 (fr) 1987-12-01

Family

ID=25020188

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19860907234 Withdrawn EP0233285A4 (fr) 1985-07-01 1986-07-01 Systeme de verification du locuteur.

Country Status (4)

Country Link
EP (1) EP0233285A4 (fr)
JP (1) JPS63500126A (fr)
AU (1) AU6128586A (fr)
WO (1) WO1987000332A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0397399A2 (fr) * 1989-05-09 1990-11-14 Texas Instruments Incorporated Dispositif de vérification de la voix pour contrôler l'identité de l'utilisateur d'une carte de crédit téléphonique

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1990008379A1 (fr) * 1989-01-17 1990-07-26 The University Court Of The University Of Edinburgh Reconnaissance de personnes sur la base de leur voix
GB2237135A (en) * 1989-10-16 1991-04-24 Logica Uk Ltd Speaker recognition
GB0427205D0 (en) 2004-12-11 2005-01-12 Ncr Int Inc Biometric system
JP5272141B2 (ja) * 2009-05-26 2013-08-28 学校法人早稲田大学 音声処理装置およびプログラム
US9251792B2 (en) 2012-06-15 2016-02-02 Sri International Multi-sample conversational voice verification

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3673331A (en) * 1970-01-19 1972-06-27 Texas Instruments Inc Identity verification by voice signals in the frequency domain
US4032711A (en) * 1975-12-31 1977-06-28 Bell Telephone Laboratories, Incorporated Speaker recognition arrangement
EP0078014A1 (fr) * 1981-10-22 1983-05-04 Nissan Motor Co., Ltd. Dispositif de reconnaissance vocal pour véhicule automobile
EP0121248A1 (fr) * 1983-03-30 1984-10-10 Nec Corporation Procédé et système de contrôle de l'identité d'un locuteur
EP0154020A1 (fr) * 1983-12-19 1985-09-11 CSELT Centro Studi e Laboratori Telecomunicazioni S.p.A. Dispositif pour vérifier l'identité d'un locuteur

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE2511233C2 (de) * 1975-03-14 1977-03-10 Dornier Gmbh Verfahren zur Verhinderung ungewollter Land- oder Wasserberührung von in niedriger Höhe fliegenden Fluggeräten
GB1569450A (en) * 1976-05-27 1980-06-18 Nippon Electric Co Speech recognition system
DE2844156A1 (de) * 1978-10-10 1980-04-24 Philips Patentverwaltung Verfahren zum verifizieren eines sprechers
JPS6057261B2 (ja) * 1980-03-18 1985-12-13 日本電気株式会社 多回線音声入出力装置
JPS5876893A (ja) * 1981-10-30 1983-05-10 日本電気株式会社 音声認識装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3673331A (en) * 1970-01-19 1972-06-27 Texas Instruments Inc Identity verification by voice signals in the frequency domain
US4032711A (en) * 1975-12-31 1977-06-28 Bell Telephone Laboratories, Incorporated Speaker recognition arrangement
EP0078014A1 (fr) * 1981-10-22 1983-05-04 Nissan Motor Co., Ltd. Dispositif de reconnaissance vocal pour véhicule automobile
EP0121248A1 (fr) * 1983-03-30 1984-10-10 Nec Corporation Procédé et système de contrôle de l'identité d'un locuteur
EP0154020A1 (fr) * 1983-12-19 1985-09-11 CSELT Centro Studi e Laboratori Telecomunicazioni S.p.A. Dispositif pour vérifier l'identité d'un locuteur

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
ICASSP'79, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 2nd-4th April 1979, Washington, D.C., pages 789-792, IEEE, New York, US; U. HÖFKER et al.: "Structure and performance of an on-line speaker verification system" *
ICASSP'80, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 9th-11th April 1980, Denver, Colorado, vol. 3, pages 1060-1062, IEEE, New York, US; S. FURUI et al.: "Experimental studies in a new automatic speaker verification system using telephone speech" *
ICASSP'85, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 26th-29th March 1985, Tampa, Florida, vol. 1, pages 399-402, IEEE, New York, US; W. FEIX et al.: "A speaker verification system for access-control" *
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. ASSP-24, no. 3, June 1976, pages 201-212, IEEE, New York, US; B.S. ATAL et al.: "A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition" *
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. ASSP-29, no. 4, August 1981, pages 777-785, IEEE, New York, US; L.F. LAMEL et al.: "An improved endpoint detector for isolated word recognition" *
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. ASSP-33, no. 3, June 1985, pages 574-586, IEEE, New York, US; A.E. ROSENBERG et al.: "Talker recognition in tandem with talker-independent isolated word recognition" *
PROCEEDINGS OF THE FOURTH INTERNATIONAL JOINT CONFERENCE ON PATTERN RECOGNITION, 7th-10th November 1978, Kyoto, pages 1019-1021, IEEE, New York, US; Y. GRENIER: "Speaker identification from linear prediction" *
PROCEEDINGS OF THE IEEE, vol. 64, no. 4, April 1976, pages 475-487, New York, US; A.E. ROSENBERG: "Automatic speaker verification: a review" *
See also references of WO8700332A1 *
THE BELL SYSTEM TECHNICAL JOURNAL, vol. 58, no. 10, December 1979, pages 2217-2233, American Telephone and Telegraph Co., New York, US; L.R. RABINER et al.: "Application of clustering techniques to speaker-trained isolated word recognition" *
WESCON TECHNICAL PAPERS, vol. 19, 16th September 1975, pages 31.3.1 - 31.3.5, Western Periodicals Co., North Hollywood, US; G.R. DODDINGTON: "Speaker verification for entry control" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0397399A2 (fr) * 1989-05-09 1990-11-14 Texas Instruments Incorporated Dispositif de vérification de la voix pour contrôler l'identité de l'utilisateur d'une carte de crédit téléphonique
EP0397399A3 (fr) * 1989-05-09 1991-07-31 Texas Instruments Incorporated Dispositif de vérification de la voix pour contrôler l'identité de l'utilisateur d'une carte de crédit téléphonique

Also Published As

Publication number Publication date
EP0233285A4 (fr) 1987-12-01
WO1987000332A1 (fr) 1987-01-15
AU6128586A (en) 1987-01-30
JPS63500126A (ja) 1988-01-14

Similar Documents

Publication Publication Date Title
US5548647A (en) Fixed text speaker verification method and apparatus
EP3599606B1 (fr) Apprentissage machine d'authentification vocale
US10950245B2 (en) Generating prompts for user vocalisation for biometric speaker recognition
US5339385A (en) Speaker verifier using nearest-neighbor distance measure
US6510415B1 (en) Voice authentication method and system utilizing same
US6463415B2 (en) 69voice authentication system and method for regulating border crossing
US5794196A (en) Speech recognition system distinguishing dictation from commands by arbitration between continuous speech and isolated word modules
EP0983587B1 (fr) Procede de verification du locuteur mettant en oeuvre de multiples groupes de classe
US5897616A (en) Apparatus and methods for speaker verification/identification/classification employing non-acoustic and/or acoustic models and databases
US20060222210A1 (en) System, method and computer program product for determining whether to accept a subject for enrollment
KR100406307B1 (ko) 음성등록방법 및 음성등록시스템과 이에 기초한음성인식방법 및 음성인식시스템
EP0892388B1 (fr) Méthode et dispositif d'authentification d'interlocuteur par vérification d'information utilisant un décodage forcé
EP0389541A1 (fr) Systeme de reduction d'erreurs de reconnaissance de structures
US5937381A (en) System for voice verification of telephone transactions
WO1998022936A1 (fr) Identification d'un locuteur fondee par le sous-mot par fusion de plusieurs classificateurs, avec adaptation de canal, de fusion, de modele et de seuil
CA2318262A1 (fr) Systeme et procede multiresolution destines a une verification du locuteur
KR19980086697A (ko) 음성 인식 시스템에서의 화자 인식 방법 및 장치
US8032380B2 (en) Method of accessing a dial-up service
EP1027700A1 (fr) Systeme d'adaptation de modele et procede de verification de locuteur
US7490043B2 (en) System and method for speaker verification using short utterance enrollments
US20080071538A1 (en) Speaker verification method
EP0233285A1 (fr) Systeme de verification du locuteur
JPH1173196A (ja) 話者の申し出識別を認証する方法
JP4245948B2 (ja) 音声認証装置、音声認証方法及び音声認証プログラム
CN117351945A (zh) 身份鉴权方法、装置及介质

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LI NL SE

17P Request for examination filed

Effective date: 19870706

RIN1 Information on inventor provided before grant (corrected)

Inventor name: ABUT, HUSEYIN

Inventor name: ELMAN, JEFFREY, L.

Inventor name: TAO, BERTRAM, P., M.

Inventor name: DENKER, THOMAS, A.

A4 Supplementary search report drawn up and despatched

Effective date: 19871201

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19900201

RIN1 Information on inventor provided before grant (corrected)

Inventor name: ELMAN, JEFFREY, L.

Inventor name: ABUT, HUSEYIN

Inventor name: DENKER, THOMAS, A.

Inventor name: TAO, BERTRAM, P., M.