US4477925A - Clipped speech-linear predictive coding speech processor - Google Patents

Clipped speech-linear predictive coding speech processor

Publication number
US4477925A
Authority
US
United States
Prior art keywords
spoken speech
speech
linear predictive
output
speech utterance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US06/329,776
Inventor
James M. Avery
Elmer A. Hoyer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MagnaChip Semiconductor Ltd
LSI Logic FSI Corp
NCR Voyix Corp
Original Assignee
NCR Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NCR Corp
Assigned to NCR CORPORATION, A CORP OF MD reassignment NCR CORPORATION, A CORP OF MD ASSIGNMENT OF ASSIGNORS INTEREST. Assignors: AVERY, JAMES M., HOYER, ELMER A.
Priority to US06/329,776 (US4477925A)
Priority to DE8383900305T (DE3271705D1)
Priority to JP83500435A (JPS58502113A)
Priority to CA000417214A (CA1180447A)
Priority to PCT/US1982/001716 (WO1983002190A1)
Priority to EP83900305A (EP0096712B1)
Priority to DE198383900305T (DE96712T1)
Publication of US4477925A
Application granted
Assigned to HYUNDAI ELECTRONICS AMERICA reassignment HYUNDAI ELECTRONICS AMERICA ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AT&T GLOBAL INFORMATION SOLUTIONS COMPANY (FORMERLY KNOWN AS NCR CORPORATION)
Assigned to SYMBIOS LOGIC INC. reassignment SYMBIOS LOGIC INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HYUNDAI ELECTRONICS AMERICA
Assigned to SYMBIOS, INC . reassignment SYMBIOS, INC . CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: SYMBIOS LOGIC INC.
Assigned to LEHMAN COMMERCIAL PAPER INC., AS ADMINISTRATIVE AGENT reassignment LEHMAN COMMERCIAL PAPER INC., AS ADMINISTRATIVE AGENT SECURITY AGREEMENT Assignors: HYUNDAI ELECTRONICS AMERICA, A CORP. OF CALIFORNIA, SYMBIOS, INC., A CORP. OF DELAWARE
Assigned to HYUNDAI ELECTRONICS AMERICA reassignment HYUNDAI ELECTRONICS AMERICA TERMINATION AND LICENSE AGREEMENT Assignors: SYMBIOS, INC.
Anticipated expiration
Assigned to HYNIX SEMICONDUCTOR INC. reassignment HYNIX SEMICONDUCTOR INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HYNIX SEMICONDUCTOR AMERICA, INC.
Assigned to HYNIX SEMICONDUCTOR AMERICA INC. reassignment HYNIX SEMICONDUCTOR AMERICA INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: HYUNDAI ELECTRONICS AMERICA
Assigned to MAGNACHIP SEMICONDUCTOR, LTD. reassignment MAGNACHIP SEMICONDUCTOR, LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HYNIX SEMICONDUCTOR, INC.
Assigned to SYMBIOS, INC., HYUNDAI ELECTRONICS AMERICA reassignment SYMBIOS, INC. RELEASE OF SECURITY INTEREST Assignors: LEHMAN COMMERICAL PAPER INC.
Expired - Fee Related

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/10: Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters

Definitions

  • the present invention relates to speech recognition systems and more particularly, to a system for recognizing an utterance as one of a plurality of reference utterances, and the method therefor.
  • Speech input arrangements may be utilized to record transactions, to record and request information, to control machine tools, or to permit a person to interact with data processing and control equipment without diverting attention from other activity. Because of the complex nature of speech, its considerable variability from speaker to speaker and variability even for a particular speaker, it is difficult to attain perfect recognition of speech segments.
  • One type of priorly known speech recognition system converts an input speech signal into a sequence of phonetically based features.
  • the derived features generally obtained from a spectral analysis of speech segments, are compared to a stored set of reference features corresponding to the speech segment or word to be recognized. If an input speech segment meets prescribed recognition criteria, the segment is accepted as the reference speech segment. Otherwise it is rejected.
  • the reliability of the recognition system is thus highly dependent on the prescribed set of reference features and on the recognition criteria.
  • a speech recognition system which comprises a clipping element, having an input terminal adapted to receive an input signal representative of a spoken utterance, to generate a clipped input signal.
  • An element for sampling the clipped input signal which is operatively connected to the means for clipping, generates a plurality of discrete binary values, each discrete binary value corresponding to a sample value of the clipped input signal.
  • An element for analyzing the plurality of sample values thereby identifies the spoken utterance.
  • the method of speech recognition of the present invention comprises the steps of clipping an input signal representative of a spoken utterance to generate a clipped input signal.
  • the clipped input signal is sampled, generating a plurality of discrete binary values, each discrete binary value corresponding to a sample value of the clipped input signal.
  • the plurality of sample values is then analyzed thereby identifying the spoken utterance.
  • a signal recognition system of the present invention includes a signal quantizer having an input terminal for receiving an analog input signal and an output terminal.
  • the signal quantizer is designed to quantize the input signal into binary values on its output terminal.
  • a sampler is connected to the output terminal of the signal quantizer for periodically sampling the binary value on the output terminal and generating a string of binary bits responsive thereto.
  • An analyzer is included which is responsive to each string of bits generated by the sampler and operative to determine autocorrelation functions of each string of bits produced by the sampling means for providing a discernible representation thereof.
  • FIG. 1 shows a block diagram of the preferred embodiment of the speech recognition system of the present invention
  • FIGS. 2A through 2C, which taken together as shown in FIG. 2D comprise FIG. 2, show a logic diagram of the digital computer input elements for the speech recognition system of the preferred embodiment
  • FIG. 3 shows the waveforms associated with the various logic elements of FIG. 2;
  • FIG. 4 is a flow diagram of the data base building process, or learn mode, of the present invention.
  • FIG. 5 is a general flow diagram of the recognition process or mode of the present invention.
  • FIGS. 6A and 6B are a detailed flow diagram of the learn mode of FIG. 4.
  • FIGS. 7A and 7B are a detailed flow diagram of the recognition mode of FIG. 5.
  • the speech recognition system 1 comprises a bandpass filter 10 which receives an INPUT SIGNAL.
  • the INPUT SIGNAL is an electrical signal representation of the uttered speech provided by a transducer or electroacoustical device (not shown).
  • An infinite clipper 20 is operatively connected to a sample clock 30, a shift register 40 and the bandpass filter 10.
  • a first in-first out buffer (FIFO buffer) 50 operates as a buffer between a digital computer 60 and the shift register 40, the FIFO buffer 50 and shift register 40 being clocked by the sample clock 30.
  • the digital computer 60 has an associated storage 70 for providing storage capability, and outputs a signal (OUTPUT SIGNAL) which is a digital quantity, in BCD or other appropriate format, indicative of the recognized speech utterance.
  • a speech utterance contains a multiplicity of resonant frequency components which are modified dynamically by the characteristics of an individual's vocal and nasal tracts during the speech utterance.
  • a speech utterance refers to a word or group of words spoken in a continuous chain and is not meant to refer to a grunt or other unintelligible sound.
  • These resonant characteristics or frequencies are called the formant frequencies and reside in a spectral band as follows:
  • Fundamental formant F0 contributes significantly to the "pitch" of the uttered speech but contains little intelligence.
  • Formants F4 and F5 contribute little in terms of energy in a spectrogram and have been shown to have little effect on the intelligibility of speech. Therefore, in order to eliminate the fundamental formant F0, and in order to eliminate the higher frequency formants which contribute little intelligence, the INPUT SIGNAL is passed through bandpass filter 10.
  • bandpass filter 10 comprises a low pass filter 11 in conjunction with a resistor 12 and capacitor 13, which together form a high pass filter.
  • the resistor and capacitor values are selected to yield a cutoff frequency of 300 Hz.
  • the low pass filter 11 is a Krohn-Hite filter having a cutoff frequency of approximately 5 kHz.
  • the output of the bandpass filter 10 results in a filtered input signal as shown in FIG. 3A.
  • the filtered input signal is then coupled to infinite clipper 20, resulting in a CLIPPED-SPEECH signal as shown in FIG. 3B.
  • the infinite clipper 20 of the preferred embodiment comprises integrated circuit chip LM311 well known in the art. (The numbers around the outside periphery of the chip indicate the pin number and the letters inside the chip indicate a function, e.g., CLR signifying clear.)
  • the resulting output signal from infinite clipper 20, the CLIPPED-SPEECH signal is coupled to a shift register 40.
  • the shift register 40 of the preferred embodiment comprises two integrated circuit chips 74164. The shift register 40 performs the sampling and the serial to parallel transfer of a sampled CLIPPED-SPEECH signal, under the control of sample clock 30.
  • When the shift register 40 is full, the contents of the shift register 40, a data word, are then shifted in parallel to the FIFO buffer 50 under control of sample clock 30.
  • the number of stages of shift register 40 is selected to correspond to a data word size of digital computer 60.
  • the digital computer 60 accepts the data word from the FIFO buffer 50 from the data output lines D0 through D15, the transfer being controlled by a handshaking interface which comprises the READ signal from digital computer 60 and the SAMPLE READY OUT signal from FIFO buffer 50.
  • the FIFO buffer 50 comprises four 3341 integrated circuit chips 51-54 and the control section 55 comprises integrated circuit chip 74161.
  • Two NAND-gates 56, 57 combine control signals from the four 3341 integrated circuit chips 51-54 to yield a SAMPLE ERROR signal and the SAMPLE READY OUT signal, these signals comprising part of the interface with the digital computer 60.
  • the sample clock 30 comprises oscillator 31 and gate 32.
  • the oscillator 31 utilized is a Wavetek 159 programmable signal generator which can be turned on under control of gate 32, gate 32 comprising a J-K flip-flop, integrated circuit chip 74109.
  • the clock input (C) of gate 32 is operatively connected to the output of infinite clipper 20 for detecting when the CLIPPED-SPEECH signal is present and is to be sampled.
  • a reset or initialization signal, INIT is provided for the speech recognition system 1.
  • the digital computer 60 of the preferred embodiment is a Hewlett-Packard 2117F computer with 512k bytes of main memory. Storage 70 is provided by a 125M byte HP7925 disk drive.
  • the computer operating system is Hewlett-Packard's Real Time Environment RTE IV B Software, and the data base architecture is supported by Hewlett-Packard's IMAGE 1000 Data Base Management System Software. It will be recognized by those skilled in the art that a variety of processors or digital computers may be utilized without departing from the scope of the invention. It will also be further recognized that the various elements of the speech recognition system 1 may be modified within the scope and spirit of the present invention.
  • each sample comprises a digital quantity, the digital quantity being made up of a number of bits. The number of bits may be a byte, computer word, etc.
  • storage 70 has stored therein all the sampled clipped speech data read from FIFO buffer 50 by the digital computer 60. After the speech utterance is in storage 70 in the sampled clipped-speech format, the digital computer analyzes the stored data to yield the recognized speech, the digital computer processing to be discussed in detail hereinunder.
  • the oscillator 31 frequency determines the sampling rate.
  • the sampling rate of the system of the present invention should be sufficient to maintain zero crossing accuracy.
  • a nominal sampling rate utilized by the system of the present invention is 24 kHz. It will be understood by those skilled in the art that the values, parameters, etc., contained herein are intended for illustrative purposes only, to aid in the understanding of the concept and implementation of the present invention, and are not intended in any way to limit the scope of the present invention.
  • the learn process refers to the building of the data base for the speech to be recognized. This data base is also referred to herein as the learned speech, vocabulary, and dictionary.
  • the input speech is input to the system both verbally (i.e., via the INPUT SIGNAL discussed above) and also via an input device for identifying the verbal input (block 100).
  • the INPUT SIGNAL is then filtered, clipped and sampled (block 110) and inputted to the digital computer 60.
  • the digital computer calculates the linear predictive coding (LPC) parameters (block 120) and then stores the respective LPC parameters, distance measures, and identifying voice information (blocks 131, 132, 133). These stored quantities are stored in storage 70 consistent with data base management techniques well known in the art. If any more input speech or voicings are to be made (block 140), block 100 is repeated. If no more voicings are to be made, the process stops.
  • the speech recognition system 1 is ready to perform the recognition process.
  • Referring to FIG. 5, there is shown a flowchart of the recognition process of the speech recognition system 1 of the present invention.
  • the speech utterance to be recognized is inputted into the speech recognition system 1 (block 200).
  • the INPUT SIGNAL is then filtered, clipped and sampled (block 210) and then inputted to the digital computer 60.
  • the digital computer 60 then calculates the LPC parameters (block 220) and calculates the minimum distance (block 230).
  • the distance measure calculated is then compared to the distance measure stored in the data base (block 240), and the comparison process is repeated until the minimum distance measure is found (block 250).
  • When the minimum distance measure is found, the computer outputs the identifying voice information stored in the data base with the associated parameters determined as the OUTPUT SIGNAL (block 260). If any further voice recognition is to be performed (block 270), the process repeats at block 200, otherwise the process halts.
  • the linear prediction analysis is based on the all-pole linear prediction filter model well known in the art.
  • the linear prediction coefficients a_k are the coefficients of the sampled clipped-speech signal y(n) in accordance with the representation of equation (1).
  • a 16-pole filter model is used. It is to be understood that other pole arrangements may be used.
  • the coefficients a_k are the coefficients of the sampled speech signal y(n) in accordance with the representation of equation (1).
  • the actual clipped speech sampled values ±V are replaced via the infinite clipper 20 with a corresponding binary value (binary 1 for +V and binary 0 for -V).
  • the LPC method utilizes the clipped speech sampled values of ±V, the binary 1 or binary 0 being a sample value, as stored in storage 70 for the value of the signal y(n).
  • Equation (2) forms a set of "p" equations with "p" unknowns in the form.
  • the Levinson recursion method is used to solve the "p" linear equations.
  • the p × p autocorrelation matrix is symmetric with identical elements along the diagonals and is identified as a Toeplitz matrix.
  • the a_k coefficients, resulting from the solution of Equation (2) for each short time segment of sampled speech, are stored in a data base structure within storage 70. These stored a_k parameters are then used as elements of comparison templates during the recognition process.
  • FIG. 6 comprises FIGS. 6A and 6B.
  • the process described in FIG. 6 will be what is referred to as the learn mode, i.e., setting up the data bases to contain the vocabulary or dictionary for the speech utterances to be recognized.
  • a second mode of the program, the recognition mode is also included in the same program. It will be recognized by those skilled in the art that the programs may be separated. If a fixed data base vocabulary or learned speech is established and resides in storage 70, there is no need to perform the learn mode. Because of the common processing between the two modes, the programs for each mode are combined into a single program.
  • the program sets the number of poles (IP) (block 300), initializes the autocorrelation window (IW) (block 310), and initializes all the working arrays contained within the program (block 320).
  • (The mnemonics IRECOG, IW, and IP are symbols used in the program which is included herein as Appendix I.)
  • an input of the text (voice identifying information) of the utterance to be stored in the data base is input by an input device (block 330).
  • the input speech utterance (verbal input or voicing) is then inputted to the speech recognition system 1 (block 340) and goes through the filtering, clipping and sampling process described above.
  • the binary information is then stored in storage 70.
  • the program begins the analysis.
  • the program retrieves a clipped speech sample of one bit (block 350) and computes R(i) and R(0) for the current window of N speech samples (block 360) in accordance with Equations (4) and (5), where P is the number of poles, N is the number of samples in a window, and n is the individual sample instant.
  • the program solves for the coefficients a_k using the Levinson recursion method in accordance with Equation (6), and saves the coefficients a_k in the data base (block 370).
  • the program calculates the gain G in accordance with Equation (7) and saves that information in the data base (block 380), and calculates the residuals in accordance with Equation (8) and saves the results in the data base (block 390).
  • the program then calculates the measure (or distance measure) in accordance with Equation (9), and saves that information in the data base (block 325).
  • the recognition mode will now be described.
  • the process starts by initializing the program which comprises setting the number of poles p (IP)(block 400), initializing the autocorrelation window (IW) (block 410), and initializing all working arrays (block 420).
  • the speech utterance is then inputted (block 430) and is filtered, clipped, sampled, and stored in storage 70.
  • the digital computer 60 then proceeds with processing the speech utterance as stored in storage 70.
  • the program retrieves a single clipped speech sample of one bit (block 440).
  • the program computes R'(i) and R'(0) of N speech samples (block 450) in accordance with Equations (4) and (5). (The "prime" indicates the speech parameters to be recognized, or "unknown" speech parameters, versus the parameters stored in the data base.)
  • the program then calculates the gain G' in accordance with Equation (7) and solves for coefficients a'_k (block 460) in accordance with Equation (6).
  • the program calculates the residuals (block 470) in accordance with Equation (10) and then calculates the measure for the unknown speech input (block 480) in accordance with Equation (9).
  • the program shifts to the next window (block 425) and repeats the processing starting with block 440. If all the speech windows have been analyzed, the program retrieves all the data base members which have a measure in accordance with Equation (11) less than a predetermined value, the predetermined value of the distance measure of the preferred embodiment being 200 (block 435).
  • the data base item numbers for the words retrieved are saved (block 445), and each member retrieved is examined according to a distance measure specified by Equations (12) or (13) (block 446).
  • the distance measure of the preferred embodiment utilized in block 446 is that specified by Equation (13).
  • the items are then sorted to find the item with the minimum distance (block 455).
  • the item having the minimum distance can then be retrieved using the item pointer. The information contained in the retrieved item includes the voice identifying information, thereby identifying the speech utterance from the previously-learned vocabulary (block 465).
  • the program then outputs the voice identifying information which constitutes the OUTPUT SIGNAL (block 475). If more speech is to be recognized (block 485), the program repeats the process starting at block 430. If no more recognition is to be performed, the program stops.
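The analysis steps listed above (computing R(i) over a window of clipped samples, then solving for the a_k with the Levinson recursion) can be sketched as follows. This is a minimal illustration, not the program of Appendix I. Equations (1) through (9) are not reproduced in this text, so the sketch assumes the textbook autocorrelation method with the predictor written as y(n) ≈ Σ a_k·y(n−k); the patent's own sign convention may differ. The function names, the 256-sample window, and the test signal (a tone plus seeded noise) are all hypothetical. Clipped samples are taken as +1 and −1, standing in for +V and −V.

```python
import math
import random


def autocorrelation(y, max_lag):
    """Short-time autocorrelation R(0..max_lag) over one window of samples."""
    n = len(y)
    return [sum(y[k] * y[k + lag] for k in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve sum_k a[k] * R(|i - k|) = R(i), i = 1..order, for a[1..order].

    Returns the coefficients and the final prediction-error energy.
    """
    a = [0.0] * (order + 1)      # a[0] is unused
    error = float(r[0])
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error          # reflection coefficient for this stage
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


ORDER = 16      # the 16-pole model named in the text
WINDOW = 256    # hypothetical window length N

# Hypothetical test input: a 700 Hz tone plus seeded noise, sampled at 24 kHz.
rng = random.Random(1982)
signal = [math.sin(2 * math.pi * 700 * n / 24000) + 0.5 * rng.gauss(0.0, 1.0)
          for n in range(WINDOW)]

# Infinite clipping: keep only the polarity of each sample.
y = [1.0 if s > 0.0 else -1.0 for s in signal]

r = autocorrelation(y, ORDER)
coeffs, err = levinson_durbin(r, ORDER)
print(coeffs[:4], err)
```

Because every clipped sample has magnitude 1, R(0) equals the window length, and the Toeplitz structure of the autocorrelation matrix is what lets the recursion avoid a general matrix inversion. The gain, residual, and distance computations are omitted because their equations are not available here.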

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Complex Calculations (AREA)
  • Navigation (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Machine Translation (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present invention relates to a speech recognition system and the method therefor, which analyzes a sampled clipped speech signal for identifying a spoken utterance. An input signal representative of the spoken utterance is passed through a clipper to generate a clipped input signal. A sampler generates a plurality of discrete binary values, each discrete binary value corresponding to a sample value of the clipped input signal. A processor then analyzes the plurality of sample values thereby identifying the spoken utterance. Analysis includes determining linear prediction coefficients of the autocorrelation function of speech utterances.

Description

BACKGROUND OF THE INVENTION
The present invention relates to speech recognition systems and more particularly, to a system for recognizing an utterance as one of a plurality of reference utterances, and the method therefor.
In communication, data processing and control systems, it is often desirable to utilize speech as direct input for data, commands, or other information. Speech input arrangements may be utilized to record transactions, to record and request information, to control machine tools, or to permit a person to interact with data processing and control equipment without diverting attention from other activity. Because of the complex nature of speech, its considerable variability from speaker to speaker and variability even for a particular speaker, it is difficult to attain perfect recognition of speech segments.
One type of priorly known speech recognition system converts an input speech signal into a sequence of phonetically based features. The derived features, generally obtained from a spectral analysis of speech segments, are compared to a stored set of reference features corresponding to the speech segment or word to be recognized. If an input speech segment meets prescribed recognition criteria, the segment is accepted as the reference speech segment. Otherwise it is rejected. The reliability of the recognition system is thus highly dependent on the prescribed set of reference features and on the recognition criteria.
Another type of speech recognition system disclosed in the article "Minimum Prediction Residual Principle Applied to Speech Recognition," by Fumitada Itakura in the IEEE Transactions on Acoustics, Speech, and Signal Processing, February 1975, pages 67-72, does not rely on a prescribed set of spectrally derived phonetic features but instead obtains a sequence of vectors representative of the linear prediction characteristics of a speech signal and compares these linear prediction characteristic vectors with a corresponding sequence of reference vectors representative of the linear prediction characteristics of a previous utterance of an identified speech segment or word. As is well known in the art, linear prediction characteristics include combinations of a large number of speech features and thus can provide an improved recognition over arrangements in which only a limited number of selected spectrally derived phonetic features are used.
The prior art systems mentioned above require the use of an A-D converter in order to digitize the input speech signal, the digitized quantities being stored for subsequent processing by a digital computer or processor. The amount of storage required to store the digitized quantities, while dependent upon the sampling rate, can be extremely large. Therefore, there exists a need for a speech recognition system which would eliminate the plurality of spectral filters, eliminate the bulky and costly A-D converters, and reduce memory requirements of the prior art systems while maintaining a high degree of speech recognition capability, and also be more readily implementable in VLSI technology.
SUMMARY OF THE INVENTION
In accordance with the present invention there is provided a speech recognition system which comprises a clipping element, having an input terminal adapted to receive an input signal representative of a spoken utterance, to generate a clipped input signal. An element for sampling the clipped input signal, which is operatively connected to the means for clipping, generates a plurality of discrete binary values, each discrete binary value corresponding to a sample value of the clipped input signal. An element for analyzing the plurality of sample values thereby identifies the spoken utterance.
The method of speech recognition of the present invention comprises the steps of clipping an input signal representative of a spoken utterance to generate a clipped input signal. The clipped input signal is sampled, generating a plurality of discrete binary values, each discrete binary value corresponding to a sample value of the clipped input signal. The plurality of sample values is then analyzed thereby identifying the spoken utterance.
In a preferred embodiment, a signal recognition system of the present invention includes a signal quantizer having an input terminal for receiving an analog input signal and an output terminal. The signal quantizer is designed to quantize the input signal into binary values on its output terminal. A sampler is connected to the output terminal of the signal quantizer for periodically sampling the binary value on the output terminal and generating a string of binary bits responsive thereto. An analyzer is included which is responsive to each string of bits generated by the sampler and operative to determine autocorrelation functions of each string of bits produced by the sampling means for providing a discernible representation thereof.
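One consequence of working on a string of binary bits is that autocorrelation needs no multiplications. As a rough sketch (an illustration, not something the patent states): if each bit is read as +1 or −1, the product of two samples is +1 when the bits agree and −1 when they differ, so each lag reduces to counting mismatches with an exclusive-OR. The function names and the sample bit string below are hypothetical.

```python
def bit_autocorrelation(bits, max_lag):
    """Autocorrelation R(0..max_lag) of a 0/1 sequence read as -1/+1 values.

    For each lag, R = (number of overlapping pairs) - 2 * (number of mismatches).
    """
    n = len(bits)
    result = []
    for lag in range(max_lag + 1):
        mismatches = sum(bits[k] ^ bits[k + lag] for k in range(n - lag))
        result.append((n - lag) - 2 * mismatches)
    return result


def direct_autocorrelation(bits, max_lag):
    """Reference version: map bits to -1/+1 and multiply directly."""
    y = [2 * b - 1 for b in bits]
    return [sum(y[k] * y[k + lag] for k in range(len(y) - lag))
            for lag in range(max_lag + 1)]


# Hypothetical 12-bit string standing in for one sampled window.
bits = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0]
print(bit_autocorrelation(bits, 4))
print(direct_autocorrelation(bits, 4))
```

Both functions return the same values; the first uses only comparison and counting.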
Accordingly, it is an object of the present invention to provide a speech recognition system.
It is another object of the present invention to provide a speech recognition system with reduced memory requirements.
It is a further object of the present invention to provide a binary speech recognition system.
These and other objects of the present invention will become more apparent when taken in conjunction with the following description and attached drawings, wherein like characters indicate like parts, and which drawings form a part of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a block diagram of the preferred embodiment of the speech recognition system of the present invention;
FIGS. 2A through 2C, which taken together as shown in FIG. 2D comprise FIG. 2, show a logic diagram of the digital computer input elements for the speech recognition system of the preferred embodiment;
FIG. 3 shows the waveforms associated with the various logic elements of FIG. 2;
FIG. 4 is a flow diagram of the data base building process, or learn mode, of the present invention;
FIG. 5 is a general flow diagram of the recognition process or mode of the present invention;
FIGS. 6A and 6B are a detailed flow diagram of the learn mode of FIG. 4; and
FIGS. 7A and 7B are a detailed flow diagram of the recognition mode of FIG. 5.
DETAILED DESCRIPTION
Referring to FIG. 1, there is shown a block diagram of the preferred embodiment of the speech recognition system of the present invention. The speech recognition system 1 comprises a bandpass filter 10 which receives an INPUT SIGNAL. The INPUT SIGNAL is an electrical signal representation of the uttered speech provided by a transducer or electroacoustical device (not shown). An infinite clipper 20 is operatively connected to a sample clock 30, a shift register 40 and the bandpass filter 10. A first in-first out buffer (FIFO buffer) 50 operates as a buffer between a digital computer 60 and the shift register 40, the FIFO buffer 50 and shift register 40 being clocked by the sample clock 30. The digital computer 60 has an associated storage 70 for providing storage capability, and outputs a signal (OUTPUT SIGNAL) which is a digital quantity, in BCD or other appropriate format, indicative of the recognized speech utterance.
The operation of the speech recognition system 1 will now be described generally in conjunction with FIGS. 2 and 3.
A speech utterance contains a multiplicity of resonant frequency components which are modified dynamically by the characteristics of an individual's vocal and nasal tracts during the speech utterance. (A speech utterance refers to a word or group of words spoken in a continuous chain and is not meant to refer to a grunt or other unintelligible sound.) These resonant characteristics or frequencies are called the formant frequencies and reside in a spectral band as follows:
F0: 0-300 Hz (fundamental)
F1: 200-999 Hz
F2: 550-2700 Hz
F3: 1100-2950 Hz
Fundamental formant F0 contributes significantly to the "pitch" of the uttered speech but contains little intelligence. Formants F4 and F5 contribute little in terms of energy in a spectrogram and have been shown to have little effect on the intelligibility of speech. Therefore, in order to eliminate the fundamental formant F0, and in order to eliminate the higher frequency formants which contribute little intelligence, the INPUT SIGNAL is passed through bandpass filter 10.
Referring to FIG. 2, bandpass filter 10 comprises a low pass filter 11 in conjunction with a resistor 12 and a capacitor 13, which together form a high pass filter. In the preferred embodiment, the resistor and capacitor values are selected to yield a cutoff frequency of 300 Hz, and the low pass filter 11 is a Krohn-Hite filter having a cutoff frequency of approximately 5 KHz. The output of the bandpass filter 10 is a filtered input signal as shown in FIG. 3A.
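By way of illustration only, the relationship between the resistor and capacitor values and the 300 Hz cutoff of the high pass section can be sketched with the standard first-order RC formula, f = 1/(2πRC). The function names and the 0.1 µF capacitor value below are assumptions chosen for the example; the component values of the preferred embodiment are not given in this description.

```python
import math


def rc_cutoff_hz(resistance_ohms: float, capacitance_farads: float) -> float:
    """Return the -3 dB cutoff frequency of a first-order RC filter."""
    return 1.0 / (2.0 * math.pi * resistance_ohms * capacitance_farads)


def resistor_for_cutoff(cutoff_hz: float, capacitance_farads: float) -> float:
    """Return the resistance giving the requested cutoff for a chosen capacitor."""
    return 1.0 / (2.0 * math.pi * cutoff_hz * capacitance_farads)


# Hypothetical component choice: a 0.1 uF capacitor and a 300 Hz target.
capacitor = 0.1e-6
resistor = resistor_for_cutoff(300.0, capacitor)  # roughly 5.3 kilohms
```

In practice the nearest standard resistor value would be chosen, which moves the cutoff slightly away from 300 Hz.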
The filtered input signal is then coupled to infinite clipper 20, resulting in a CLIPPED-SPEECH signal as shown in FIG. 3B. The infinite clipper 20 of the preferred embodiment comprises an LM311 integrated circuit chip, well known in the art. (The numbers around the outside periphery of the chip indicate the pin number and the letters inside the chip indicate a function, e.g., CLR signifying clear.) The resulting output signal from infinite clipper 20, the CLIPPED-SPEECH signal, is coupled to a shift register 40. The shift register 40 of the preferred embodiment comprises two 74164 integrated circuit chips. The shift register 40 performs the sampling and the serial-to-parallel transfer of the sampled CLIPPED-SPEECH signal, under the control of sample clock 30. When the shift register 40 is full, its contents, a data word, are shifted in parallel to the FIFO buffer 50 under control of sample clock 30. The number of stages of shift register 40 is selected to correspond to the data word size of digital computer 60.
The digital computer 60 accepts the data word from the FIFO buffer 50 via the data output lines D0 through D15, the transfer being controlled by a handshaking interface which comprises the READ signal from digital computer 60 and the SAMPLE READY OUT signal from FIFO buffer 50.
In the preferred embodiment, the FIFO buffer 50 comprises four 3341 integrated circuit chips 51-54, and the control section 55 comprises a 74161 integrated circuit chip. Two NAND gates 56, 57 combine control signals from the four 3341 integrated circuit chips 51-54 to yield a SAMPLE ERROR signal and the SAMPLE READY OUT signal, these signals comprising part of the interface with the digital computer 60. In the preferred embodiment, the sample clock 30 comprises oscillator 31 and gate 32. The oscillator 31 utilized is a Wavetek 159 programmable signal generator which can be turned on under control of gate 32, gate 32 comprising a 74109 J-K flip-flop integrated circuit chip. It will be recognized by those skilled in the art that any oscillator may be substituted which has the function and characteristics utilized in the preferred embodiment. The clock input (C) of gate 32 is operatively connected to the output of infinite clipper 20 for detecting when the CLIPPED-SPEECH signal is present and is to be sampled. A reset or initialization signal, INIT, is provided for the speech recognition system 1.
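The handshaking between the FIFO buffer 50 and the digital computer 60 may be pictured with a small software model. This is a toy sketch only, not a description of the logic of FIG. 2: the class name `WordFifo`, the depth of 64 words, and the reading of SAMPLE ERROR as "a word was lost because the buffer was full" are all assumptions made for the example.

```python
from collections import deque


class WordFifo:
    """Toy model of a word FIFO between a shift register and a computer."""

    def __init__(self, depth: int = 64):
        self.depth = depth         # assumed depth, not taken from the description
        self._words = deque()
        self.sample_error = False  # assumed meaning: a word was dropped on overflow

    @property
    def sample_ready_out(self) -> bool:
        """True when at least one word is waiting to be read."""
        return bool(self._words)

    def write(self, word: int) -> None:
        """Store one 16-bit word, as when the shift register becomes full."""
        if len(self._words) >= self.depth:
            self.sample_error = True  # the reader did not keep up
            return
        self._words.append(word & 0xFFFF)

    def read(self) -> int:
        """Model the computer asserting READ after seeing SAMPLE READY OUT."""
        if not self.sample_ready_out:
            raise RuntimeError("READ asserted with no word ready")
        return self._words.popleft()
```

A reader loop would poll `sample_ready_out`, call `read()` while it is true, and treat a set `sample_error` flag as grounds for discarding the utterance.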
The digital computer 60 of the preferred embodiment is a Hewlett-Packard 2117F computer with 512k bytes of main memory. Storage 70 is provided by a 125M byte HP7925 disk drive. The computer operating system is Hewlett-Packard's Real Time Environment RTE IV B Software, and the data base architecture is supported by Hewlett-Packard's IMAGE 1000 Data Base Management System Software. It will be recognized by those skilled in the art that a variety of processors or digital computers may be utilized without departing from the scope of the invention. It will be further recognized that the various elements of the speech recognition system 1 may be modified within the scope and spirit of the present invention.
Referring to FIGS. 3B and 3C, it can be seen that the sampling of the CLIPPED-SPEECH signal results in a discrete value of +V or -V for each sample, which is subsequently translated to a logic 1 or a logic 0, respectively. Each sample is then represented by a single bit, the 16-bit words stored in storage 70 as shown in FIG. 3D thereby containing 16 sample values. It will be understood that under previous digital techniques of speech recognition not utilizing clipped speech (clipped speech implies infinite clipped speech herein unless otherwise noted), each sample comprises a digital quantity, the digital quantity being made up of a number of bits. The number of bits may be a byte, computer word, etc. In the previous digital speech recognition systems mentioned above, if sixteen bits per sample are required to yield the desired results, then it can be seen that a sixteen to one reduction of memory is obtained in the clipped speech system of the present invention. Hence, storage 70 has stored therein all the sampled clipped speech data read from FIFO buffer 50 by the digital computer 60. After the speech utterance is in storage 70 in the sampled clipped-speech format, the digital computer analyzes the stored data to yield the recognized speech, the digital computer processing to be discussed in detail hereinunder.
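The reduction of each sample to one bit and the grouping of sixteen samples into a word can be sketched in software as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, a sample of exactly zero is treated as -V, and the first sample is placed in the most significant bit, whereas the actual bit ordering on lines D0 through D15 is not specified in this description.

```python
from typing import Iterable, List


def infinite_clip(samples: Iterable[float]) -> List[int]:
    """Map each sample to 1 if it is positive (+V) and 0 otherwise (-V)."""
    return [1 if s > 0 else 0 for s in samples]


def pack_words(bits: List[int], word_size: int = 16) -> List[int]:
    """Pack bits into words, first sample in the most significant bit.

    A trailing partial word is dropped, mirroring a shift register that
    transfers its contents only when it is full.
    """
    words = []
    for start in range(0, len(bits) - word_size + 1, word_size):
        word = 0
        for bit in bits[start:start + word_size]:
            word = (word << 1) | bit
        words.append(word)
    return words


# An alternating waveform of 16 samples packs into a single word.
example_words = pack_words(infinite_clip([0.5, -0.2] * 8))
```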
The oscillator 31 frequency determines the sampling rate. The sampling rate of the system of the present invention should be sufficient to maintain zero crossing accuracy. A nominal sampling rate utilized by the system of the present invention is 24 KHz. It will be understood by those skilled in the art that the values, parameters, etc., contained herein are intended for illustrative purposes only, to aid in the understanding of the concept and implementation of the present invention, and are not intended in any way to limit the scope of the present invention.
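As a worked example of the storage figures implied by the nominal values above (a 24 KHz sampling rate, one bit per clipped sample, and 16-bit words), and assuming for comparison a conventional system using sixteen bits per sample:

```latex
\begin{aligned}
\text{clipped speech:}\quad & 24{,}000\ \tfrac{\text{samples}}{\text{s}} \times 1\ \tfrac{\text{bit}}{\text{sample}}
  = 24{,}000\ \tfrac{\text{bits}}{\text{s}} = 1{,}500\ \tfrac{\text{words}}{\text{s}} = 3{,}000\ \tfrac{\text{bytes}}{\text{s}} \\
\text{16-bit samples:}\quad & 24{,}000\ \tfrac{\text{samples}}{\text{s}} \times 16\ \tfrac{\text{bits}}{\text{sample}}
  = 384{,}000\ \tfrac{\text{bits}}{\text{s}} = 48{,}000\ \tfrac{\text{bytes}}{\text{s}} \\
\text{ratio:}\quad & 48{,}000 / 3{,}000 = 16
\end{aligned}
```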
Referring to FIG. 4, there is shown a block diagram of the learn process of the speech recognition system 1 of the present invention. The learn process refers to the building of the data base for the speech to be recognized. This data base is also referred to herein as the learned speech, vocabulary, and dictionary. The input speech is input to the system both verbally (i.e., via the INPUT SIGNAL discussed above) and also via an input device for identifying the verbal input (block 100). The INPUT SIGNAL is then filtered, clipped and sampled (block 110) and inputted to the digital computer 60. The digital computer calculates the linear predictive coding (LPC) parameters (block 120) and then stores the respective LPC parameters, distance measures, and identifying voice information (blocks 131, 132, 133). These stored quantities are stored in storage 70 consistent with data base management techniques well known in the art. If any more input speech or voicings are to be made (block 140), block 100 is repeated. If no more voicings are to be made, the process stops.
Once the data base has been established, the speech recognition system 1 is ready to perform the recognition process. Referring to FIG. 5, there is shown a flowchart of the recognition process of the speech recognition system 1 of the present invention. The speech utterance to be recognized is inputted into the speech recognition system 1 (block 200). The INPUT SIGNAL is then filtered, clipped and sampled (block 210) and then inputted to the digital computer 60. The digital computer 60 then calculates the LPC parameters (block 220) and calculates the minimum distance (block 230). The distance measure calculated is then compared to the distance measure stored in the data base (block 240), and the comparison process is repeated until the minimum distance measure is found (block 250). When the minimum distance measure is found, the computer outputs, as the OUTPUT SIGNAL, the identifying voice information stored in the data base with the associated parameters (block 260). If any further voice recognition is to be performed (block 270), the process repeats at block 200; otherwise the process halts.
A linear prediction analysis of the sampled clipped-speech signal y(n) is made by digital computer 60 in accordance with ##EQU1## where n=1 to N, N being the number of samples within a window, and p is the number of poles of a prediction analysis model. The linear prediction analysis is based on the all-pole linear prediction filter model well known in the art.
The linear prediction coefficients ak, referred to herein more simply as coefficients ak, are the coefficients of the sampled clipped-speech signal y(n) in accordance with the representation of equation (1). In the preferred embodiment a 16-pole filter model is used. It is to be understood that other pole arrangements may be used.
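For orientation, the all-pole linear prediction model is commonly written in the following textbook form. This is offered as an assumption about notation only: Equation (1) is not reproduced in this text, and sign conventions for the coefficients differ between references.

```latex
\hat{y}(n) = \sum_{k=1}^{p} a_k \, y(n-k), \qquad
e(n) = y(n) - \hat{y}(n), \qquad
H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}
```

Here \hat{y}(n) is the predicted sample, e(n) is the prediction residual, G is a gain term, and p is the number of poles (16 in the preferred embodiment).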
For the 16-pole filter model used in the preferred embodiment, the coefficients a(1) through a(16) are generated by the digital computer 60 for each window of N samples by the short term autocorrelation analysis in accordance with Equation (2). Since the digital filter model is representative of the sampled clipped-speech for a time period of approximately 10-12 ms, it is desirable to update the coefficients ak about every 10 ms. For a sample rate of 24 KHz, there are 256 samples (i.e., N=256) in a window time of approximately 10.7 ms. The number of windows is dependent upon the length of time of the speech utterance T1, namely T1 ÷ window time. ##EQU2## R(i) and R(i-k), Equation (3), are arrived at through windowing of sampled clipped-speech signal y(n). ##EQU3##
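A short term autocorrelation over one window can be sketched as below. The sketch uses the common textbook definition in which the sum is truncated at the window edge; whether Equation (3) uses exactly this form cannot be confirmed from this text, and the function names are hypothetical.

```python
from typing import List, Sequence


def bits_to_levels(bits: Sequence[int]) -> List[int]:
    """Convert stored bits (1/0) back to normalized levels (+1/-1)."""
    return [1 if b else -1 for b in bits]


def short_term_autocorrelation(y: Sequence[int], max_lag: int) -> List[int]:
    """Return R(0)..R(max_lag) for one window of samples.

    R(i) is the sum over n of y(n) * y(n + i); no samples outside the
    window are used, so lag i contributes len(y) - i products.
    """
    n_samples = len(y)
    return [
        sum(y[n] * y[n + lag] for n in range(n_samples - lag))
        for lag in range(max_lag + 1)
    ]


# A square wave with a period of four samples, eight samples long.
levels = bits_to_levels([1, 1, 0, 0, 1, 1, 0, 0])
r = short_term_autocorrelation(levels, max_lag=2)
```

For a 16-pole model the lags R(0) through R(16) would be computed for each 256-sample window.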
As discussed above, the actual clipped speech sampled values ±V (or normalized to ±1) are replaced via the infinite clipper 20 with a corresponding binary value (binary 1 for +V and binary 0 for -V). The LPC method utilizes the clipped speech sampled values of ±V, the binary 1 or binary 0 being a sample value, as stored in storage 70 for the value of the signal y(n). Equation (2) forms a set of "p" equations with "p" unknowns in the form ##EQU4## The Levinson recursion method is used to solve the "p" linear equations. The p×p autocorrelation matrix is symmetric with identical elements along the diagonals and is identified as a Toeplitz matrix. The ak coefficients, resulting from the solution of Equation (2) for each short time segment of sampled speech, are stored in a data base structure within storage 70. These stored ak parameters are then used as elements of comparison templates during the recognition process.
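The Levinson recursion exploits the Toeplitz structure to solve the "p" equations in on the order of p² operations. The following is a minimal sketch of the standard Levinson-Durbin form, assuming the normal equations are written as the sum over k of a(k)·R(|i-k|) = R(i) for i = 1 to p. It is not the program of Appendix I, and the function name and return convention are assumptions.

```python
from typing import List, Sequence, Tuple


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve sum_k a[k] * R(|i - k|) = R(i) for i = 1..order.

    Returns (coefficients a(1)..a(order), final prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused padding
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("degenerate autocorrelation sequence")
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err
        # Update a(1)..a(i-1) from the previous stage, then set a(i).
        prev = a[:]
        a[i] = k_i
        for k in range(1, i):
            a[k] = prev[k] - k_i * prev[i - k]
        err *= 1.0 - k_i * k_i
    return a[1:], err


# Autocorrelation of a first-order process with coefficient 0.5.
coeffs, error = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

In this example the recursion recovers a(1) = 0.5 and a(2) = 0, as expected for a sequence that a single pole already explains.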
The processing by the digital computer 60 will now be described in conjunction with FIG. 6, which comprises FIGS. 6A and 6B. The process described in FIG. 6 will be what is referred to as the learn mode, i.e., setting up the data bases to contain the vocabulary or dictionary for the speech utterances to be recognized. A second mode of the program, the recognition mode, is also included in the same program. It will be recognized by those skilled in the art that the programs may be separated. If a fixed data base vocabulary or learned speech is established and resides in storage 70, there is no need to perform the learn mode. Because of the common processing between the two modes, the programs for each mode are combined into a single program.
Referring to FIG. 6, after the learn mode has been established (IRECOG=0), the program sets the number of poles (IP) (block 300), initializes the autocorrelation window (IW) (block 310), and initializes all the working arrays contained within the program (block 320). (The mnemonics IRECOG, IW, and IP are symbols used in the program which is included herein as Appendix I.) At this point, the text (voice identifying information) of the utterance to be stored in the data base is entered via an input device (block 330). The speech utterance (verbal input or voicing) is then inputted to the speech recognition system 1 (block 340) and goes through the filtering, clipping and sampling process described above. The binary information is then stored in storage 70. When the complete utterance has been stored in storage 70, the program begins the analysis. The program retrieves a clipped speech sample of one bit (block 350) and computes R(i) and R(0) for the current window of N speech samples (block 360) in accordance with Equations (4) and (5). ##EQU5## where P is the number of poles,
N is the number of samples in a window, and
n is the individual sample instant.
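Because each sample is either +1 or -1, the product of two samples is +1 when the corresponding stored bits agree and -1 when they differ. The autocorrelation of a window can therefore be obtained from packed bits with an exclusive-OR and a bit count, as sketched below. This is an alternative formulation offered for illustration; it is not stated to be the method of the program of Appendix I, and the function names and the most-significant-bit-first packing are assumptions.

```python
from typing import List, Sequence


def pack_window(bits: Sequence[int]) -> int:
    """Pack a window of bits into one integer, first sample in the top bit."""
    value = 0
    for b in bits:
        value = (value << 1) | (1 if b else 0)
    return value


def binary_autocorrelation(window: int, n_samples: int, max_lag: int) -> List[int]:
    """Return R(0)..R(max_lag) for +1/-1 samples stored as packed bits.

    For lag i there are n_samples - i sample pairs; each mismatching pair
    contributes -1 and each matching pair +1, so
    R(i) = pairs - 2 * mismatches.
    """
    result = []
    for lag in range(max_lag + 1):
        n_pairs = n_samples - lag
        mask = (1 << n_pairs) - 1  # keep only positions where both samples exist
        mismatches = bin((window ^ (window >> lag)) & mask).count("1")
        result.append(n_pairs - 2 * mismatches)
    return result


packed = pack_window([1, 1, 0, 0, 1, 1, 0, 0])
r_binary = binary_autocorrelation(packed, n_samples=8, max_lag=2)
```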
The program then solves for the coefficients ak using the Levinson recursion method in accordance with Equation (6), and saves the coefficients ak in the data base (block 370). The program then calculates the gain G in accordance with Equation (7) and saves that information in the data base (block 380), and calculates the residuals in accordance with Equation (8) and saves the results in the data base (block 390). ##EQU6## The program then calculates the measure (or distance measure) in accordance with Equation (9), and saves that information in the data base (block 325). ##EQU7## If all the speech windows have not been analyzed (block 335), the program shifts the autocorrelation window (IW) (block 345) and repeats the process starting at block 350. If all the speech windows have been analyzed and more speech is to be learned (block 355), the process repeats starting with block 340. If there is no more speech to learn, i.e., the data base vocabulary or dictionary of learned speech is completed, the program stops.
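The window-by-window loop of blocks 335 through 350 can be sketched as follows. The sketch assumes consecutive, non-overlapping windows of N samples, which is consistent with the number of windows being the utterance time divided by the window time but is not stated explicitly; the function names are hypothetical, and the per-window analysis is passed in as a callable rather than spelled out.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_windows(samples: Sequence[int], window_size: int = 256) -> Iterator[Sequence[int]]:
    """Yield consecutive, non-overlapping windows; a short tail is skipped."""
    for start in range(0, len(samples) - window_size + 1, window_size):
        yield samples[start:start + window_size]


def analyze_utterance(
    samples: Sequence[int],
    analyze_window: Callable[[Sequence[int]], T],
    window_size: int = 256,
) -> List[T]:
    """Apply the per-window analysis to every complete window in turn."""
    return [analyze_window(w) for w in iter_windows(samples, window_size)]


# One second of speech at 24 KHz yields 93 complete 256-sample windows.
one_second = [0] * 24000
window_count = len(list(iter_windows(one_second)))
```

In the learn mode the callable would compute the autocorrelation, coefficients, gain, residuals, and measure for the window and save them in the data base.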
Referring to FIG. 7, which comprises FIGS. 7A, 7B, and 7C, the recognition mode will now be described. Once the digital computer 60 has been set up for the recognition mode (IRECOG=1), the process starts by initializing the program, which comprises setting the number of poles p (IP) (block 400), initializing the autocorrelation window (IW) (block 410), and initializing all working arrays (block 420). The speech utterance is then inputted (block 430) and is filtered, clipped, sampled, and stored in storage 70. After all the information has been stored in storage 70, the digital computer 60 then proceeds with processing the speech utterance as stored in storage 70. The program retrieves a single clipped speech sample of one bit (block 440). The program computes R'(i) and R'(0) of N speech samples (block 450) in accordance with Equations (4) and (5). (The "prime" indicates the speech parameters to be recognized, or "unknown" speech parameters, as opposed to the parameters stored in the data base.) The program then calculates the gain G' in accordance with Equation (7) and solves for coefficients a'k (block 460) in accordance with Equation (6). The program calculates the residuals (block 470) in accordance with Equation (10) and then calculates the measure for the unknown speech input (block 480) in accordance with Equation (9). ##EQU8##
If all the speech windows have not been analyzed (block 490), the program shifts to the next window (block 425) and repeats the processing starting with block 440. If all the speech windows have been analyzed, the program retrieves all the data base members which have a measure in accordance with Equation (11) less than a predetermined value, the predetermined value of the distance measure of the preferred embodiment being 200 (block 435). ##EQU9## The data base item numbers for the words retrieved are saved (block 445), and each member retrieved is examined according to a distance measure specified by Equations (12) or (13) (block 446). The distance measure of the preferred embodiment utilized in block 446 is that specified by Equation (13). It will be recognized by those skilled in the art that many types of distance measures exist and may be employed herein without departing from the spirit and scope of the invention. ##EQU10## The items are then sorted to find the item with the minimum distance (block 455). The item having the minimum distance can then be retrieved using the item pointer; the information contained in the retrieved item includes the voice identifying information, thereby identifying the speech utterance from the previously learned vocabulary (block 465). The program then outputs the voice identifying information which constitutes the OUTPUT SIGNAL (block 475). If more speech is to be recognized (block 485), the program repeats the process starting at block 430. If no more recognition is to be performed, the program stops.
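The two-stage search of blocks 435 through 465 (a coarse retrieval by threshold followed by a finer comparison of the retrieved members) can be sketched as below. Equations (11) through (13) are not reproduced in this text, so the sketch substitutes stand-ins that are assumptions: the coarse test is taken to be the absolute difference between stored and unknown measures, the fine distance is a plain Euclidean distance between coefficient vectors, and each template is reduced to a single coefficient vector rather than one per window. The names `Template` and `recognize` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Template:
    label: str                     # voice identifying information
    measure: float                 # coarse scalar measure saved at learn time
    coefficients: Sequence[float]  # stored predictor coefficients


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in fine distance; not the measure of Equation (12) or (13)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def recognize(
    unknown_measure: float,
    unknown_coefficients: Sequence[float],
    database: List[Template],
    threshold: float = 200.0,
) -> Optional[str]:
    """Return the label of the closest template, or None if none qualifies."""
    # Stage 1: keep only members whose coarse measure is near the unknown's.
    candidates = [
        t for t in database if abs(t.measure - unknown_measure) < threshold
    ]
    if not candidates:
        return None
    # Stage 2: choose the candidate with the minimum fine distance.
    best = min(
        candidates, key=lambda t: euclidean(t.coefficients, unknown_coefficients)
    )
    return best.label


vocabulary = [
    Template("yes", 100.0, [0.5, 0.1]),
    Template("no", 150.0, [0.2, -0.3]),
    Template("stop", 900.0, [0.5, 0.1]),
]
```

The coarse stage limits how many stored members must undergo the more expensive comparison, which is the apparent purpose of the threshold in block 435.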
Although the above description has been directed to a speech recognition system in which the INPUT SIGNAL is representative of uttered speech, it will be recognized that the system may be capable of recognizing any analog input signal representative of some phenomenon, event, occurrence, material characteristic, information, quantity, parameter, etc. in which there may be defined an associated set of reference features capable of identification, consistent with the concepts described herein.
While there has been shown what is considered to be the preferred embodiment of the invention, it will be manifest that many changes and modifications can be made therein without departing from the essential spirit and scope of the invention. It is intended, therefore, in the annexed claims, to cover all such changes and modifications which fall within the true scope of the invention. ##SPC1##

Claims (4)

We claim:
1. An apparatus for matching a presently spoken speech utterance with a corresponding one of a desired plurality of previously spoken speech utterances comprising:
signal quantizing means having an output terminal, and an input terminal for receiving analog input signals representing spoken speech utterances including a desired plurality of previously spoken speech utterances and a presently spoken speech utterance, said signal quantizing means quantizing said analog input signal on its input terminal into a binary value on its output terminal;
sampling means having an output, and an input connected to the output terminal of said signal quantizing means, said sampling means periodically sampling the binary value on the output terminal of said quantizing means and placing on its output, said binary value in a string of binary bits having a predetermined number of bits;
buffer means having an output, and an input connected to the output of said sampling means, said buffering means storing each complete string of binary bits appearing on the output of said sampling means in a sequential fashion throughout the duration of said analog input signal on the input terminal of said quantizing means such that the number of said strings for a particular one of said spoken speech utterances is dependent upon the duration of said analog input signal;
analyzing means having a storage interconnect, an output, and an input connected to the output of said buffer means, said analyzing means determining autocorrelation functions of each of said strings of binary bits of said spoken speech utterances stored in said buffer means, determining linear predictive coefficients of said autocorrelation functions for each of said spoken speech utterances, and placing said linear predictive coefficients on the output of said analyzing means;
storage means connected to the storage interconnect of said analyzing means, said storage means successively storing said linear predictive coefficients and identifying data for each of said desired plurality of previously spoken speech utterances; and
said analyzing means further including means calculating distance measures between the linear predictive coefficients of said presently spoken speech utterance and the linear predictive coefficients of selected ones of said desired plurality of previously spoken speech utterances stored in said storage means, means finding the minimum values of said distance measures, and means for placing on the output of said analyzing means, the identifying data stored for the previously spoken speech utterance corresponding to said minimum values of said distance measures, thereby matching said presently spoken speech utterance with a corresponding one of said desired plurality of previously spoken speech utterances.
2. A method for matching a presently spoken speech utterance with a corresponding one of a desired plurality of previously spoken speech utterances comprising the steps of:
a. quantizing an analog input signal representing a spoken speech utterance into a binary value;
b. periodically sampling said binary value;
c. placing said binary value in a string of binary bits having a predetermined number of bits;
d. repeating steps b and c for the duration of said analog input signal such that the number of said strings is dependent upon the duration of said analog input signal;
e. determining autocorrelation functions of each of said strings of binary bits of said spoken speech utterance;
f. determining linear predictive coefficients of said autocorrelation functions of said spoken speech utterance;
g. storing said linear predictive coefficients and identifying data for said spoken speech utterance;
h. repeating steps a-g until linear predictive coefficients and identifying data for each spoken speech utterance of a desired plurality of previously spoken speech utterances is stored;
i. repeating steps a-f for a presently spoken speech utterance;
j. calculating distance measures between the linear predictive coefficients of said presently spoken speech utterance and the linear predictive coefficients of selected ones of said desired plurality of previously spoken speech utterances stored in said storage means;
k. finding the minimum values of said distance measures;
l. providing said identifying data stored for the spoken speech utterance corresponding to said minimum values of said distance measures, thereby matching said presently spoken speech utterance with a corresponding one of said desired plurality of previously spoken speech utterances.
3. A method according to claim 2 further comprising:
determining the linear predictive coefficients (a'k) of the autocorrelation function of a string of bits (y(n)) in accordance with ##EQU11## where p is the number of poles of an all-pole linear prediction filter model.
4. A method according to claim 3, wherein the step of determining a distance measure is accomplished by evaluating the distance measure in accordance with ##EQU12##
US06/329,776 1981-12-11 1981-12-11 Clipped speech-linear predictive coding speech processor Expired - Fee Related US4477925A (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
US06/329,776 US4477925A (en) 1981-12-11 1981-12-11 Clipped speech-linear predictive coding speech processor
PCT/US1982/001716 WO1983002190A1 (en) 1981-12-11 1982-12-07 A system and method for recognizing speech
JP83500435A JPS58502113A (en) 1981-12-11 1982-12-07 voice recognition device
CA000417214A CA1180447A (en) 1981-12-11 1982-12-07 Clipped speech-linear predictive coding speech processor
DE8383900305T DE3271705D1 (en) 1981-12-11 1982-12-07 A system and method for recognizing speech
EP83900305A EP0096712B1 (en) 1981-12-11 1982-12-07 A system and method for recognizing speech
DE198383900305T DE96712T1 (en) 1981-12-11 1982-12-07 METHOD AND SYSTEM FOR VOICE RECOGNITION.

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US06/329,776 US4477925A (en) 1981-12-11 1981-12-11 Clipped speech-linear predictive coding speech processor

Publications (1)

Publication Number Publication Date
US4477925A true US4477925A (en) 1984-10-16

Family

ID=23286972

Family Applications (1)

Application Number Title Priority Date Filing Date
US06/329,776 Expired - Fee Related US4477925A (en) 1981-12-11 1981-12-11 Clipped speech-linear predictive coding speech processor

Country Status (6)

Country Link
US (1) US4477925A (en)
EP (1) EP0096712B1 (en)
JP (1) JPS58502113A (en)
CA (1) CA1180447A (en)
DE (2) DE96712T1 (en)
WO (1) WO1983002190A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3419636C2 (en) * 1984-05-25 1986-08-28 Rolf 8000 München Treutlin Method for generating and processing control information arranged at certain points in a sound recording for controlling acoustic or optical devices and apparatus for carrying out the method
CN111384051B (en) * 2016-03-07 2022-09-27 杭州海存信息技术有限公司 Memory with speech recognition function

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3071652A (en) * 1959-05-08 1963-01-01 Bell Telephone Labor Inc Time domain vocoder
US3278685A (en) * 1962-12-31 1966-10-11 Ibm Wave analyzing system
US3416080A (en) * 1964-03-06 1968-12-10 Int Standard Electric Corp Apparatus for the analysis of waveforms
US3742146A (en) * 1969-10-21 1973-06-26 Nat Res Dev Vowel recognition apparatus
US3816722A (en) * 1970-09-29 1974-06-11 Nippon Electric Co Computer for calculating the similarity between patterns and pattern recognition system comprising the similarity computer
US4015088A (en) * 1975-10-31 1977-03-29 Bell Telephone Laboratories, Incorporated Real-time speech analyzer

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3521235A (en) * 1965-07-08 1970-07-21 Gen Electric Pattern recognition system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hellwarth et al., "Automatic Conditioning of Speech Signals", IEEE Trans. on Audio etc., Jun. 1968, pp. 169-179.

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4763278A (en) * 1983-04-13 1988-08-09 Texas Instruments Incorporated Speaker-independent word recognizer
US4860357A (en) * 1985-08-05 1989-08-22 Ncr Corporation Binary autocorrelation processor
US5136652A (en) * 1985-11-14 1992-08-04 Ncr Corporation Amplitude enhanced sampled clipped speech encoder and decoder
US4817154A (en) * 1986-12-09 1989-03-28 Ncr Corporation Method and apparatus for encoding and decoding speech signal primary information
US4945568A (en) * 1986-12-12 1990-07-31 U.S. Philips Corporation Method of and device for deriving formant frequencies using a Split Levinson algorithm
US5809464A (en) * 1994-09-24 1998-09-15 Alcatel N.V. Apparatus for recording speech for subsequent text generation
EP1850328A1 (en) * 2006-04-26 2007-10-31 Honda Research Institute Europe GmbH Enhancement and extraction of formants of voice signals
US20090030690A1 (en) * 2007-07-25 2009-01-29 Keiichi Yamada Speech analysis apparatus, speech analysis method and computer program
US8165873B2 (en) * 2007-07-25 2012-04-24 Sony Corporation Speech analysis apparatus, speech analysis method and computer program
US20090326942A1 (en) * 2008-06-26 2009-12-31 Sean Fulop Methods of identification using voice sound analysis
US8036891B2 (en) 2008-06-26 2011-10-11 California State University, Fresno Methods of identification using voice sound analysis

Also Published As

Publication number Publication date
DE96712T1 (en) 1984-05-10
CA1180447A (en) 1985-01-02
EP0096712A1 (en) 1983-12-28
WO1983002190A1 (en) 1983-06-23
EP0096712B1 (en) 1986-06-11
JPS58502113A (en) 1983-12-08
DE3271705D1 (en) 1986-07-17

Similar Documents

Publication Publication Date Title
EP1610301B1 (en) Speech recognition method based on word duration modelling
US4910784A (en) Low cost speech recognition system and method
US4994983A (en) Automatic speech recognition system using seed templates
US4624011A (en) Speech recognition system
US4454586A (en) Method and apparatus for generating speech pattern templates
US4661915A (en) Allophone vocoder
US4087632A (en) Speech recognition system
JPS62231997A (en) Voice recognition system and method
US4424415A (en) Formant tracker
JPH0422276B2 (en)
JPH0816187A (en) Speech recognition method in speech analysis
US4477925A (en) Clipped speech-linear predictive coding speech processor
US5764853A (en) Voice recognition device and method using a (GGM) Guaranteed Global minimum Mapping
US4922539A (en) Method of encoding speech signals involving the extraction of speech formant candidates in real time
US5202926A (en) Phoneme discrimination method
EP0421744B1 (en) Speech recognition method and apparatus for use therein
Davis et al. Evaluation of acoustic parameters for monosyllabic word identification
Kalaiarasi et al. Performance Analysis and Comparison of Speaker Independent Isolated Speech Recognition System
RU2119196C1 (en) Method and system for lexical interpretation of fused speech
JP2580768B2 (en) Voice recognition device
EP0190489B1 (en) Speaker-independent speech recognition method and system
EP0116324A1 (en) Speaker-dependent connected speech word recognizer
JPH03116100A (en) Large vocabulary voice recognizing device
Al Mahmud Performance analysis of hidden markov model in Bangla speech recognition
Bellemin 18-551 Digital Communications and Signal Processing System Design Spring 2002 Professor Casasent

Legal Events

Date Code Title Description
AS Assignment

Owner name: NCR CORPORATION DAYTON, OH A CORP OF MD

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:AVERY, JAMES M.;HOYER, ELMER A.;REEL/FRAME:003952/0775

Effective date: 19811209

Owner name: NCR CORPORATION, A CORP OF MD, OHIO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:AVERY, JAMES M.;HOYER, ELMER A.;REEL/FRAME:003952/0775

Effective date: 19811209

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

AS Assignment

Owner name: HYUNDAI ELECTRONICS AMERICA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AT&T GLOBAL INFORMATION SOLUTIONS COMPANY (FORMERLY KNOWN AS NCR CORPORATION);REEL/FRAME:007408/0104

Effective date: 19950215

AS Assignment

Owner name: SYMBIOS LOGIC INC., COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HYUNDAI ELECTRONICS AMERICA;REEL/FRAME:007629/0431

Effective date: 19950818

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
FP Expired due to failure to pay maintenance fee

Effective date: 19961016

AS Assignment

Owner name: SYMBIOS, INC., COLORADO

Free format text: CHANGE OF NAME;ASSIGNOR:SYMBIOS LOGIC INC.;REEL/FRAME:009089/0936

Effective date: 19971210

AS Assignment

Owner name: LEHMAN COMMERCIAL PAPER INC., AS ADMINISTRATIVE AG

Free format text: SECURITY AGREEMENT;ASSIGNORS:HYUNDAI ELECTRONICS AMERICA, A CORP. OF CALIFORNIA;SYMBIOS, INC., A CORP. OF DELAWARE;REEL/FRAME:009396/0441

Effective date: 19980226

AS Assignment

Owner name: HYUNDAI ELECTRONICS AMERICA, CALIFORNIA

Free format text: TERMINATION AND LICENSE AGREEMENT;ASSIGNOR:SYMBIOS, INC.;REEL/FRAME:009596/0539

Effective date: 19980806

AS Assignment

Owner name: HYNIX SEMICONDUCTOR AMERICA INC., CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:HYUNDAI ELECTRONICS AMERICA;REEL/FRAME:015246/0599

Effective date: 20010412

Owner name: HYNIX SEMICONDUCTOR INC., KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HYNIX SEMICONDUCTOR AMERICA, INC.;REEL/FRAME:015279/0556

Effective date: 20040920

AS Assignment

Owner name: MAGNACHIP SEMICONDUCTOR, LTD., KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HYNIX SEMICONDUCTOR, INC.;REEL/FRAME:016216/0649

Effective date: 20041004

AS Assignment

Owner name: SYMBIOS, INC., COLORADO

Free format text: RELEASE OF SECURITY INTEREST;ASSIGNOR:LEHMAN COMMERCIAL PAPER INC.;REEL/FRAME:016602/0895

Effective date: 20050107

Owner name: HYUNDAI ELECTRONICS AMERICA, CALIFORNIA

Free format text: RELEASE OF SECURITY INTEREST;ASSIGNOR:LEHMAN COMMERCIAL PAPER INC.;REEL/FRAME:016602/0895

Effective date: 20050107

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362