EP0148171A1 - Apparatus and method for recognizing speech utterances independently of the speaker - Google Patents

Apparatus and method for recognizing speech utterances independently of the speaker

Info

Publication number
EP0148171A1
Authority
EP
European Patent Office
Prior art keywords
utterance
word
unknown
signal
predefined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP19830902050
Other languages
German (de)
English (en)
Inventor
Robert D. Kirkpatrick
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Voice Control Systems Inc
Original Assignee
Voice Control Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Voice Control Systems Inc
Publication of EP0148171A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/78 - Detection of presence or absence of voice signals
    • G10L 25/87 - Detection of discrete points within a voice signal

Definitions

  • This invention relates to improvements in apparatuses and methods for recognizing unknown speech utterances or words, and more particularly to improvements in such apparatuses and methods in which such recognition is enabled independently of the speaker; i.e., without requiring a prior memorization of a particular speaker's voice patterns for the particular words to be recognized.
  • Speaker-independent recognition is generally held to mean recognition of at least a predefined vocabulary or set of words without a requirement for prior knowledge of the particular voice characteristics, such as dialect, pitch, speaking rate, etc., of the speaker.
  • The invention employs a technique of word or speech-utterance identification which is referred to herein as "feature analysis". This is in contradistinction to the prior art, which is often referred to as "template analysis".
  • In template analysis, an utterance of a particular speaker is memorized in digital or analog form and subsequent utterances are compared against the template. If a match is found, the word is identified; otherwise the word is not identified.
  • One of the problems of the "template analysis" techniques of the prior art is that they are, in general, speaker dependent. That is, a particular word spoken by one individual produces a unique template which does not match the speech pattern of most other speakers saying the same word.
  • Template analysis therefore requires that each speaker whose words are to be identified pre-produce a template vocabulary of the words to be recognized. It can be seen that it would be of great advantage to provide a system which is speaker independent, that is, one which does not require a series of individual speaker templates.
  • Feature analysis, in contrast, recognizes words by determining several predefined characteristics of the word-acoustic patterns and, through decisional software routines, eliminating from consideration, or including in consideration, possible word candidates to be identified. The process may be accomplished at various levels and stages using some or all of the characteristics, but it should be emphasized that a particular word or utterance to be recognized is not routinely compared to each of the word candidates possible to be recognized in the system. It is this feature which distinguishes the technique of the invention from the "template analysis" of the prior art, which required such precise, complete unknown-word-to-template comparisons. (The prior art, in fact, usually compared the utterance to each and every word of the vocabulary even though a match was found early in the comparison process.)
  • FIG. 1 is a diagrammatic box-diagram of an apparatus for speaker independently determining unknown speech utterances, in accordance with the invention.
  • FIG. 2 is a flow chart illustrating the general or generic method for speaker independently determining unknown speech utterances in conjunction with the apparatus of FIG. 1, and in accordance with the invention.
  • FIG. 3 is a silhouette of an acoustic waveform of the word "zero".
  • FIG. 4 is a silhouette of an acoustic waveform of the word "one".
  • FIG. 5 is a silhouette of an acoustic waveform of the word "two".
  • FIG. 6 is a silhouette of an acoustic waveform of the word "three".
  • FIG. 7 is a silhouette of an acoustic waveform of the word "four".
  • FIG. 8 is a silhouette of an acoustic waveform of the word "five".
  • FIG. 9 is a silhouette of an acoustic waveform of the word "six".
  • FIG. 10 is a silhouette of an acoustic waveform of the word "seven".
  • FIG. 11 is a silhouette of an acoustic waveform of the word "eight".
  • FIG. 12 is a silhouette of an acoustic waveform of the word "nine".
  • It is an object of the invention to provide a speaker-independent voice recognition apparatus and method.
  • The invention provides an apparatus and method for identifying voice utterances or words, including the steps of generating the digitized signal representing the utterance; determining the features of the digital representation, including the zero-crossing frequencies, energy, and zero-crossing rates; grouping the determined features into vowel, consonant, and syllable groups; and identifying the grouped features.
  • The method includes the step of companding the digitized signal to generate a complete, decodable signal representation of the utterance over the dynamic energy range of the signal, so as to increase the accuracy of speaker-independent voice recognition.
  • The method for recognizing an unknown word or speech utterance as one of a predefined set of words includes the steps of establishing at least one flag (feature indicator) which is set when a predefined utterance pattern is present in the unknown speech utterance. At least one signal representing a gross parameter of the unknown utterance is established, and a plurality of signals representing fine parameters of predefined representations of the unknown utterance are also established. The unknown speech utterance is tested to determine whether to set the flag, and the gross-parameter-representing signal is determined.
  • The predefined set of words is searched to identify at least one of them which is characterized by at least the utterance pattern indicated by the set flag, if any, and by at least the gross parameter indicated by the gross-parameter-indicating signal. Finally, it is determined whether the set of identified features associated with the unknown word is adequate to identify the word.
  • The voice control recognition system, in accordance with a preferred embodiment of the invention, is achieved with both hardware and software requirements presently defined. It should be emphasized that the apparatus and method herein described achieve speaker-independent voice recognition as a time-domain process, without regard to frequency-domain analysis. That is, no complicated orthogonal analysis or fast Fourier transform analysis is used, thereby enabling rapid, real-time operation of the system.
  • A system 10 utilizes a central processing unit (CPU) 11 in conjunction with other hardware elements to be described.
  • the CPU 11 can be any general purpose appropriately programmed computer, and, in fact, can be a CPU portion of any widely available home computer, such as those sold by IBM, APPLE, COMMODORE, RADIO SHACK, and other vendors.
  • A memory element 12 is in data and control communication with the CPU 11.
  • An output device 15 is provided to convert the output signal generated by the CPU 11 to an appropriate useable form.
  • The output of the apparatus 10 is delivered to a CRT screen 16 for display; consequently, the output box 15 in the embodiment illustrated will contain appropriate data decoders and CRT drivers, as will be apparent to those skilled in the art. Also, it will be apparent to those skilled in the art that many other output utilization devices can be equally advantageously employed. For instance, by way of example and not limitation, robots, data receivers for flight control apparatuses, automatic banking devices, and the like may be used. Diverse utilization application devices will be clearly apparent to those skilled in the art.
  • At the data input port of the CPU 11, a bandpass filter 20 and analog-to-digital (A/D) and companding circuitry 21 are provided.
  • The companding operation provided by the A/D and companding circuitry 21 enables a wide dynamic energy range of the received speech signal to be captured and processed.
  • A microphone or other speech-receiving apparatus 23 is provided, into which a speech utterance made by a person 24 is received and applied to the CPU 11 via the bandpass filter 20 and the A/D and companding circuitry 21.
  • A control algorithm portion 27 of the memory is provided, in which the various CPU operational steps are contained. Also contained in the memory element 12 is a word buffer 28, as well as a portion for the various data tables developed by the control algorithm 27 from the received utterance delivered to the CPU 11. The detailed characteristic data tables are contained in the area 29.
  • The analog signal is applied to the bandpass filter 20, then to the A/D and companding circuitry 21, where it is digitized and companded.
  • The bandpass filter 20 and A/D compander 21 may be any commercially available bandpass, A/D, and companding circuits, examples of which are, respectively, the MK5912 and MK5116 provided by Mostek Corporation of Dallas, Texas.
  • the companding circuitry 21 follows the u-255 Law Companding Code.
  • The path including the microphone 23, bandpass filter 20, and A/D and companding circuitry 21 continuously receives and applies a signal to the CPU 11 which, under the control of the control algorithm 27 in the memory element 12, examines or determines when the signal represents a speech utterance or, as referred to herein, a word.
  • the detailed characteristic data tables 29 and summary tables 33 are developed, as well as the various flags 35.
  • The wordset 36 is then examined, and word subsets determined, both of possible candidates for the word to be detected and also of sets of words which are not the word to be detected.
  • the characteristics of the received word are then further refined and the word selection/elimination process is continued until the final word is selected, or until no word is selected.
  • The CPU develops an output to the output device 15, either directly on line 45 or indirectly via the A/D and companding circuitry 21 on line 46, for application to the output utilization device, in the embodiment illustrated, the television or CRT display 16.
  • The A/D and companding circuitry 21 includes a decoder circuit so that, if desired, the unknown word can be decoded and applied in analog form to the output device 15 for application to a speaker (not shown) or the like for verification. More particularly, the steps for determining the word or other speech utterance are shown in box-diagram form in FIG. 2.
  • The unknown word 50, in analog form, is digitized and companded, box 52.
  • While the human ear is particularly adapted for detecting subtle nuances of volume or intensity of sounds in human speech recognition, machines or apparatuses used in the past do not have such dynamic range readily available.
  • the companded signals can be uncompanded or decoded at a receiving terminal to enable the original signal to be recreated for the listener.
  • The signal to be recognized is first companded in the box indicated by the reference numeral 52. This enables the low-level portions of the signal to be amplified so that the otherwise unrecognizable low energy levels of the signal can be recognized, and, additionally, the high-level portions of the signal can be preserved for appropriate processing.
  • Companding is well recognized in the telecommunications art, one companding law being, for example, the u-255 Law Companding Code.
  • The companding process in essence produces a signal having an output energy level which is nonlinearly related to the input signal level at high and low signal levels, and linearly related at the mid-range signal level values.
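For illustration, a standard u-255 law transfer pair can be sketched in a few lines of Python; the patent gives no code, so the normalization of the signal to [-1, 1] and the function names here are assumptions.

```python
import numpy as np

MU = 255.0  # the "255" in the u-255 law

def compress(x):
    """u-law compression of a signal scaled to [-1, 1]: low-level detail
    is boosted while the full signal range is preserved."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def expand(y):
    """The decoder's inverse transfer function."""
    return np.sign(y) * (np.power(1.0 + MU, np.abs(y)) - 1.0) / MU

# Round trip: expanding the compressed signal recovers the original.
x = np.linspace(-1.0, 1.0, 9)
assert np.allclose(expand(compress(x)), x)
```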
  • The decoder has an appropriate inverse transfer function. As mentioned with reference to FIG. 1, the digitized and companded signal is applied to the CPU 11 continuously, and as the various sounds and other background noises are received, the presence, if any, of a word or any other speech utterance is determined, box 53.
  • The start of an unknown word is determined by continuously monitoring the signal output from the A/D and companding circuitry 21 and noting when the signal level exceeds a predetermined level for a predetermined time.
  • a "word" is initially defined (although, as below described, if a major vowel is not found in the word, the development of the data representing the characteristics of the word is not completed) . It has been found that a level of approximately l/8th of the maximum anticipated signal level for about 45 milliseconds can be used to reliably determine the beginning of a signal having a higher probability of being an appropriate word to be recognized.
  • a word length of 600 milliseconds is stored for analysis.
  • The word to be recognized is defined to be contained in a window of 600 milliseconds. It will be appreciated that some words will occupy more of the 600 millisecond window than others, but the amount of window space occupied can, if desired, be one of the parameters or characteristics used in including and excluding word patterns from the possible set of words which the unknown word could be.
  • The signal detected at the input to the system is continuously circulated through the unknown word buffer of the memory until a signal having the required level and duration occurs.
  • the central processing unit 11 on an interrupt level begins the process of defining the word "window".
  • The first step in defining the word "window" is to examine the data in the unknown word buffer 28 backwards in time until a silence condition is detected. From the point at which silence is detected, a 600 millisecond window is defined. It should be emphasized that during this time, once the beginning of the 600 millisecond window is defined, the various characteristic data tables are being generated, thus enabling rapid, dynamic processing of the unknown word.
  • the window is divided into ten millisecond sections or frames, each frame containing a portion of the digitized word signal in the window. This is indicated in box 55.
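The back-scan and framing steps might look as follows; the silence threshold and the buffer representation are assumptions, since the patent fixes only the 600 millisecond window and the 10 millisecond frames.

```python
def frame_word_window(buffer, trigger, sample_rate,
                      silence_level, window_ms=600, frame_ms=10):
    """Walk backwards from the trigger point to the last silent sample,
    then slice a 600 ms window from there into 10 ms frames."""
    start = trigger
    while start > 0 and abs(buffer[start - 1]) > silence_level:
        start -= 1
    window = buffer[start:start + sample_rate * window_ms // 1000]
    flen = sample_rate * frame_ms // 1000
    return [window[i:i + flen] for i in range(0, len(window), flen)]
```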
  • Table 1 represents the number of zero crossings of the companded signal, each of the numbers in each frame representing the number of zero crossings of the signal in that particular frame.
  • In the first ten millisecond frame of the companded signal there are thirteen(H) zero crossings. In the second ten millisecond frame there are fifteen(H) crossings, in the third ten millisecond frame there are also fifteen(H) zero crossings, and in the fourth ten millisecond frame there are eighteen(H) zero crossings, and so forth.
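A per-frame zero-crossing count of the kind tabulated in Table 1 (printed in hexadecimal in the patent) reduces to a sign-change count; treating exact zeros as positive is an assumption.

```python
import numpy as np

def zero_crossings(frame):
    """Number of sign changes of the companded signal in one 10 ms frame."""
    s = np.sign(np.asarray(frame, dtype=float))
    s[s == 0] = 1.0  # treat exact zeros as positive
    return int(np.count_nonzero(np.diff(s)))
```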
  • Table 2 represents an energy-related value of the signal represented in each ten millisecond frame.
  • The energy approximation in accordance with the invention is determined as the sum of the absolute values of the positive-going signal excursions of the companded signal in each frame, multiplied by four.
  • The value in the first ten millisecond frame is eight(H).
  • The values in the second and third ten millisecond frames are A(H) and B(H), respectively.
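One reading of this rule sums the positive portions of the companded signal and scales by four; the exact meaning of "positive-going signal excursions" is not spelled out, so this is a sketch under that interpretation.

```python
import numpy as np

def frame_energy(frame):
    """Energy-related value of Table 2: four times the sum of the positive
    portions of the companded signal in the frame (one interpretation)."""
    f = np.asarray(frame, dtype=float)
    return 4.0 * float(f[f > 0].sum())
```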
  • Table 3 represents, in accordance with the invention, the value of the peak-to-peak maximum voltage of the companded signal in each of the respective ten millisecond frames.
  • The peak-to-peak maximum voltage in the first three ten millisecond frames is four(H).
  • the value in the fourth ten millisecond frame is five(H) and the value in the fifth ten millisecond frame is five(H), and so on.
  • Table 4 sets forth the number of major cycles contained in each of the ten millisecond frames.
  • The number of major cycles is determined as the number of cycles which exceed 50 percent of the maximum amplitude of the signal contained throughout the entire 600 millisecond word window.
  • In the first ten millisecond frame the number of major cycles is one(H); in the second, the number of major cycles is two(H); in the third, the number of major cycles is five(H), and so on.
  • Table 5 represents the absolute number of cycles contained within each ten millisecond frame.
  • The absolute number of cycles includes both the number of major cycles, as set forth in Table 4, as well as the lesser cycles, i.e., those having less than 50 percent of the maximum amplitude of the signal contained in the 600 millisecond window.
  • The first ten millisecond frame has a value of D(H), the second frame has a value of thirteen(H), and the third frame has a value of seventeen(H), and so on.
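Tables 4 and 5 can be approximated by segmenting each frame at rising zero crossings and testing each segment's peak against half of the window-wide maximum; the segmentation rule is an assumption, not the patent's stated procedure.

```python
import numpy as np

def cycle_counts(frame, window_peak):
    """Return (major, total) cycle counts for one frame. A cycle is 'major'
    when its peak exceeds 50% of the largest amplitude in the 600 ms window."""
    f = np.asarray(frame, dtype=float)
    s = np.sign(f)
    s[s == 0] = 1.0
    rising = np.where(np.diff(s) > 0)[0]  # candidate cycle boundaries
    major = total = 0
    for a, b in zip(rising[:-1], rising[1:]):
        total += 1
        if f[a:b].max() > 0.5 * window_peak:
            major += 1
    return major, total
```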
  • Summary Characteristic Data Table 6 includes four values:
  • VST - Vowel Start
  • VEN - Vowel End (Start of Second Syllable)
  • VEN2 - End of Second Syllable, here frame 1B(H)
  • The vowel start frame is determined from the data contained in the detailed characteristic data table representing the peak-to-peak maximum voltage, Table 3.
  • The vowel start is determined by determining whether a predetermined peak-to-peak voltage value exists for a predetermined number of frames. For example, if 3/4ths of the maximum peak-to-peak value exists in six frames, a major vowel would appear to be present. From the point at which the first 3/4ths peak-to-peak value appears in the 600 millisecond word window, the previous word frames are examined back to a point at which the voltage has dropped by at least 1/8th of the peak-to-peak voltage value.
  • Other methods for determining the vowel start can be used from the data in the tables; for example, another method which can be used is to examine the data preceding the 3/4ths peak-to-peak value frame.
  • The vowel end (or start of the second syllable) frame is determined from an examination of the detailed characteristic data tables. More particularly, the detailed characteristic data table containing the energy-related values, Table 2, is examined to determine the vowel end. Thus, when the energy falls to a predetermined level for a predetermined number of frames, the vowel end is defined. For example, if the energy falls to a value of 1/8th of the maximum energy for a period of about eight frames, the frame at which the energy falls to the 1/8th level is defined as the vowel end.
  • The end of the second syllable is determined from the detailed characteristic data tables in a fashion similar to the Vowel End (VEN) determination described above.
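Both boundary searches reduce to run-length tests over the per-frame tables. The sketch below is one reading of the worked example: the exact back-off rule for the vowel start and the run lengths are assumptions.

```python
import numpy as np

def vowel_start(pk, run_frames=6):
    """VST from Table 3: find a run of frames at >= 3/4 of the window maximum,
    then walk back to where the value has fallen below 1/8 of that maximum."""
    pk = np.asarray(pk, dtype=float)
    hi, lo = 0.75 * pk.max(), pk.max() / 8.0
    run = 0
    for i, v in enumerate(pk):
        run = run + 1 if v >= hi else 0
        if run >= run_frames:
            j = i - run_frames + 1
            while j > 0 and pk[j - 1] >= lo:
                j -= 1
            return j
    return None

def vowel_end(energy, start, run_frames=8):
    """VEN from Table 2: the first frame after VST where the energy stays at
    or below 1/8 of its maximum for about eight frames. VEN2 is found the
    same way after the second syllable."""
    en = np.asarray(energy, dtype=float)
    lo = en.max() / 8.0
    run = 0
    for i in range(start, len(en)):
        run = run + 1 if en[i] <= lo else 0
        if run >= run_frames:
            return i - run_frames + 1
    return None
```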
  • The maximum peak-to-peak voltage over the entire sample is determined directly from the data in Table 3, from which it can be seen that the largest value in the table is F1(H), which occurs at frame 16.
  • The data in the summary characteristic data tables 7, 8, and 9 are generated from grouped data from the detailed characteristic data tables 1-5.
  • the word to be evaluated within the 6/10ths second window is divided into four sections, beginning with the vowel-start frame indicated by the parameter VST in Table 6 and ending with the vowel end frame, as indicated by the parameter VEN in Table 6.
  • The vowel-start frame begins in frame D(H) and the vowel ends in frame 17(H).
  • The first 1/4th of the divided data is contained in frames 13 to 15, the second 1/4th is contained in frames 16 to 18, the third 1/4th is contained in frames 19 to 21, and the last 1/4th is contained in frames 22 to 23.
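The patent does not state how remainder frames are assigned when the span does not divide evenly by four, but giving extras to the earlier sections reproduces the worked example (VST = D(H) = 13, VEN = 17(H) = 23); a sketch under that inferred rule:

```python
def quarters(vst, ven):
    """Divide the VST..VEN frame span into the four sections used by the
    summary tables, giving any remainder frames to the earlier sections."""
    n = ven - vst + 1
    q, r = divmod(n, 4)
    bounds, lo = [], vst
    for k in range(4):
        size = q + (1 if k < r else 0)
        bounds.append((lo, lo + size - 1))
        lo += size
    return bounds

# Matches the worked example in the text.
assert quarters(0x0D, 0x17) == [(13, 15), (16, 18), (19, 21), (22, 23)]
```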
  • Summary characteristic data table 7 is developed from the detailed characteristic data table presenting the approximate energy set forth in Table 2.
  • The summary values listed in Table 7 are as follows: HZ1 - the average number of zero crossings of the companded signal in the first quarter of the frames, beginning at the vowel start (VST) and ending at the end of the first vowel (VEN).
  • HZ2 - the average number of zero crossings of the companded signal for the second and third quarters of the frames during the period VST-VEN.
  • HZ3 - the average number of zero crossings of the companded signal for the last quarter during the period VST-VEN.
  • HZ4 - a post-end characteristic which represents the average number of zero crossings of the companded signal beginning at a point after the end of the first vowel, and includes the second syllable, if any. In the case illustrated, the values in this region represent the "ven" sound of the word "seven".
  • In similar fashion, a summary table of the average number of major cycles over the same one-quarter portions of the period between the vowel start (VST) and vowel end (VEN) is developed.
  • HZF1 is the average number of major cycles in the first quarter.
  • HZF2 is the average number of major cycles in the second and third quarters.
  • HZF3 is the average number of major cycles in the fourth quarter.
  • HZF4 is the average number of major cycles in the post-end period.
  • HZA1 represents the average of the absolute number of cycles in the first quarter.
  • HZA2 represents the average of the absolute number of cycles in the second and third quarters.
  • HZA3 represents the average of the absolute number of cycles in the fourth quarter.
  • HZA4 represents the average of the absolute number of cycles in the post-end period.
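All three summary tables (7, 8, and 9) apply the same four-region averaging to a different per-frame series, so a single helper suffices; passing the post-end region in explicitly is an assumption, since its extent depends on the second syllable, if any.

```python
import numpy as np

def summarize(per_frame, quarter_bounds, post_end):
    """Average a per-frame series (zero crossings, major cycles, or absolute
    cycles) over: first quarter, middle half, last quarter, post-end region.
    Returns e.g. (HZ1, HZ2, HZ3, HZ4) when given the Table 1 series."""
    v = np.asarray(per_frame, dtype=float)
    avg = lambda a, b: float(v[a:b + 1].mean())
    (a1, b1), (a2, _), (_, b3), (a4, b4) = quarter_bounds
    return avg(a1, b1), avg(a2, b3), avg(a4, b4), avg(*post_end)
```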
  • a number of special flags are determined.
  • The preliminary special flags are set upon the existence of: a leading "s", a leading "t", a trailing "s", a trailing "t", a multiple-syllable word, a second-syllable-emphasized word, a trailing "p", a leading "f", a trailing "f", a leading "w", a possible leading "s", a possible leading "t", a possible trailing "s", a possible trailing "t", and a possible trailing "p".
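The patent lists the flags but not their tests. As one hypothetical example, a possible-leading-"s" flag might look for the high zero-crossing, low-energy frames that an "s" hiss produces just before the vowel start; both thresholds below are invented for illustration.

```python
def leading_s_flag(zc, energy, vst, hiss_zc=0x12, quiet_energy=0x08):
    """Set the leading-'s' flag when the frames just before the vowel start
    show high zero-crossing counts and low energy (thresholds are guesses)."""
    pre = range(max(0, vst - 4), vst)
    return (len(pre) >= 2 and
            all(zc[i] >= hiss_zc for i in pre) and
            all(energy[i] <= quiet_energy for i in pre))
```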
  • The energy values of the word "one" have a relatively shallow leading edge building up to a peak energy value.
  • The word "two" has an explosive initial energy value, followed by a period of minimum energy, and then by the remaining sounds which complete the word after the initial "t" sound.
  • One or more special characteristics can be defined.
  • The characteristics which are defined depend primarily upon the type of characteristics necessary to recognize the word, as well as upon distinguishing characteristics which distinguish the word from other words which may be in the system vocabulary. The greater the number of words in the vocabulary, and the closer in sound the words from which an unknown word must be distinguished, the greater the number of characteristics which must be included and examined before an unknown word can be identified.
  • The word "six" can have six different characteristics by which the word "six" is determined when the included vocabulary against which it is compared includes only the numbers 0 through 9, and not words of close sound, such as the words "sex", "socks", "sucks", etc.
  • The six numerically distinguishing characteristics are: (1) a leading "s" must be present (leading "s" flag set);
  • the vowel must be a high-frequency vowel (determined from Tables 1 and 5);
  • the post-end characteristic must be of low amplitude with a high-frequency tail of duration, for example, greater than 30 milliseconds.
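The screening style this enables, including or excluding candidates from tests already performed rather than matching against every template, can be caricatured as set operations over the 0 through 9 vocabulary; the rules shown are illustrative stand-ins, not the patent's actual six tests.

```python
DIGITS = {"zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"}

def screen(flags, gross):
    """Narrow the candidate set from features already measured instead of
    matching the utterance against every template."""
    live = set(DIGITS)
    if flags.get("leading_s"):
        live &= {"six", "seven"}        # only digits that begin with 's'
    else:
        live -= {"six", "seven"}
    if flags.get("multi_syllable"):
        live &= {"zero", "seven"}       # the only multi-syllable digits
    if gross.get("high_freq_vowel"):
        live -= {"one", "two", "four"}  # low-frequency vowels (assumption)
    return live

# A leading 's', two syllables, and a high-frequency vowel leave {"seven"}.
print(screen({"leading_s": True, "multi_syllable": True},
             {"high_freq_vowel": True}))
```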
  • The speed of word identification is therefore not dependent upon the number of words in the vocabulary against which each unknown word would otherwise be compared.
  • The issue to be determined in each particular system is that of determining the number of characteristics of a particular unknown word which must be identified before the word can be recognized.
  • The issue is determined principally by the size of the vocabulary and the similarity of the words within it.

Abstract

Apparatus (Figure 1) and method (Figure 2) for identifying voice utterances or words, comprising generating the digitized signal (52) representing the utterance; determining the characteristics of the digital representation, including the zero-crossing frequencies, the energy, and the zero-crossing rates (56); grouping the determined characteristics into vowel, consonant, and syllable groups; and identifying the grouped characteristics. The method comprises companding (52) the digitized signal to produce a complete, decodable signal representation of the utterance over the entire dynamic energy range of the signal, so as to increase the accuracy of speaker-independent voice recognition. The method for recognizing an unknown word or utterance as one of a predefined set of words comprises establishing at least one flag (feature indicator) which is set (60) when a predefined utterance pattern is present in the unknown utterance. At least the gross parameters of the unknown utterance are established, together with a plurality of fine parameters of predefined representations of the unknown utterance. The unknown utterance is tested to determine whether the flag should be set, and the gross parameter representing the signal is determined. The predefined set of words is searched to identify at least one word characterized by at least the gross parameters. Finally, it is determined whether the identified set of characteristics associated with the unknown word is adequate to identify the word.
EP19830902050 1983-05-16 1983-05-16 Apparatus and method for recognizing speech utterances independently of the speaker Withdrawn EP0148171A1 (fr)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US1983/000750 WO1984004620A1 (fr) 1983-05-16 1983-05-16 Apparatus and method for recognizing speech utterances independently of the speaker

Publications (1)

Publication Number Publication Date
EP0148171A1 true EP0148171A1 (fr) 1985-07-17

Family

ID=22175144

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19830902050 1983-05-16 1983-05-16 Apparatus and method for recognizing speech utterances independently of the speaker Withdrawn EP0148171A1 (fr)

Country Status (2)

Country Link
EP (1) EP0148171A1 (fr)
WO (1) WO1984004620A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5774851A (en) * 1985-08-15 1998-06-30 Canon Kabushiki Kaisha Speech recognition apparatus utilizing utterance length information

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3553372A (en) * 1965-11-05 1971-01-05 Int Standard Electric Corp Speech recognition apparatus
US3499987A (en) * 1966-09-30 1970-03-10 Philco Ford Corp Single equivalent formant speech recognition system
US3940565A (en) * 1973-07-27 1976-02-24 Klaus Wilhelm Lindenberg Time domain speech recognition system
US4335302A (en) * 1980-08-20 1982-06-15 R.L.S. Industries, Inc. Bar code scanner using non-coherent light source

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO8404620A1 *

Also Published As

Publication number Publication date
WO1984004620A1 (fr) 1984-11-22

Similar Documents

Publication Publication Date Title
Abdelatty Ali et al. Acoustic-phonetic features for the automatic classification of fricatives
EP0109190A1 Apparatus for the recognition of monosyllables
US3770892A (en) Connected word recognition system
US5025471A (en) Method and apparatus for extracting information-bearing portions of a signal for recognizing varying instances of similar patterns
EP0178509A1 System for forming reference elements for speech recognition
JPH0376472B2 (fr)
US4665548A (en) Speech analysis syllabic segmenter
Ivanov et al. Modulation Spectrum Analysis for Speaker Personality Trait Recognition.
US5995924A (en) Computer-based method and apparatus for classifying statement types based on intonation analysis
CN111724770A Audio keyword recognition method based on a deep convolutional generative adversarial network
Shareef et al. Gender voice classification with huge accuracy rate
US3198884A (en) Sound analyzing system
JP2996019B2 (ja) 音声認識装置
EP0148171A1 (fr) Apparatus and method for recognizing speech utterances independently of the speaker
CN110933236A Machine-learning-based method for identifying vacant telephone numbers
David Artificial auditory recognition in telephony
Hasija et al. Recognition of Children Punjabi Speech using Tonal Non-Tonal Classifier
Mishra et al. Speaker identification, differentiation and verification using deep learning for human machine interface
Mufungulwa et al. Enhanced running spectrum analysis for robust speech recognition under adverse conditions: A case study on japanese speech
Paudzi et al. Evaluation of prosody-related features and word frequency for Malay speeches
Silipo et al. Automatic detection of prosodic stress in american english discourse
Wehde Computerized speech analysis
Vysotsky A speaker-independent discrete utterance recognition system, combining deterministic and probabilistic strategies
CN115547340A Voice identity verification method, apparatus, electronic device, and storage medium
CN111599381A Audio data processing method, apparatus, device, and computer storage medium

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19850704

RIN1 Information on inventor provided before grant (corrected)

Inventor name: KIRKPATRICK, ROBERT, D.