EP0148171A1 - Dispositif et procede permettant de reconnaitre des emissions sonores vocales independamment du locuteur - Google Patents
Info
- Publication number
- EP0148171A1 EP0148171A1 EP19830902050 EP83902050A EP0148171A1 EP 0148171 A1 EP0148171 A1 EP 0148171A1 EP 19830902050 EP19830902050 EP 19830902050 EP 83902050 A EP83902050 A EP 83902050A EP 0148171 A1 EP0148171 A1 EP 0148171A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- utterance
- word
- unknown
- signal
- predefined
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 238000000034 method Methods 0.000 title claims abstract description 65
- 230000000063 preceeding effect Effects 0.000 claims description 2
- 230000005540 biological transmission Effects 0.000 abstract description 3
- 238000004458 analytical method Methods 0.000 description 15
- 238000012545 processing Methods 0.000 description 6
- 230000001419 dependent effect Effects 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 239000002360 explosive Substances 0.000 description 2
- 230000003466 anticipated effect Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000007717 exclusion Effects 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
Definitions
- This invention relates to improvements in apparatuses and methods for recognizing unknown speech utterances or words, and more particularly to improvements in such apparatuses and methods in which such recognition is enabled independently of the speaker; i.e., without requiring a prior memorization of a particular speaker's voice patterns for particular words to be recognized.
- Speech independent recognition is generally held to mean recognition of at least a predefined vocabulary or set of words without a requirement for prior knowledge of the particular voice characteristics, such as dialect, pitch, speaking rate, etc., of the speaker.
- feature analysis a technique of word or speech-utterance identification which is referred to herein as "feature analysis". This is in contradistinction to the prior art, which is often referred to as "template analysis".
- template analysis In "template analysis", an utterance of a particular speaker is memorized in digital or analog form and subsequent utterances are compared against the template. If a match is found, the word is identified; otherwise, the word is not identified.
- One of the problems of the "template analysis" techniques of the prior art is that they are, in general, speaker dependent. That is, a particular word spoken by one individual produces a unique template which does not match the speech pattern of most speakers saying the same word.
- each speaker whose words are to be identified must therefore preproduce a template vocabulary of the words to be recognized. It can be seen that it would be of great advantage to provide a system which is speaker independent, that is, one that does not require a series of individual speaker templates.
- feature analysis recognizes words by determining several predefined characteristics of the word-acoustic patterns and, through decisional software routines, eliminating from consideration, or including in consideration, possible word candidates to be identified. The process may be accomplished at various levels and stages using some or all of the characteristics, but it should be emphasized that a particular word or utterance to be recognized is not routinely compared to each of the word candidates recognizable by the system. It is this feature which distinguishes the technique of the invention from the "template analysis" of the prior art, which required such complete unknown-word-to-template comparisons. (The prior art, in fact, usually compared the utterance to each and every word of the vocabulary even when a match was found early in the comparison process.)
- FIG. 1 is a diagrammatic box-diagram of an apparatus for speaker independently determining unknown speech utterances, in accordance with the invention.
- FIG. 2 is a flow chart illustrating the general or generic method for speaker independently determining unknown speech utterances in conjunction with the apparatus of FIG. 1, and in accordance with the invention.
- FIG. 3 is a silhouette of an acoustic waveform of the word "zero".
- FIG. 4 is a silhouette of an acoustic waveform of the word "one".
- FIG. 5 is a silhouette of an acoustic waveform of the word "two".
- FIG. 6 is a silhouette of an acoustic waveform of the word "three".
- FIG. 7 is a silhouette of an acoustic waveform of the word "four".
- FIG. 8 is a silhouette of an acoustic waveform of the word "five".
- FIG. 9 is a silhouette of an acoustic waveform of the word "six”.
- FIG. 10 is a silhouette of an acoustic waveform of the word "seven”.
- FIG. 11 is a silhouette of an acoustic waveform of the word "eight".
- FIG. 12 is a silhouette of an acoustic waveform of the word "nine".
- an object of the invention to provide a speaker-independent voice recognition apparatus and method.
- the invention provides an apparatus and method for identifying voice utterances or words including the steps of generating the digitized signal representing the utterance; determining the features of the digital representation, including the zero crossing frequencies, energy, and zero crossing rates; grouping the determined features into vowel, consonant and syllable groups; and identifying the grouped features.
- the method includes the steps of companding the digitized signal to generate a complete decodeable signal representation of the utterance over the dynamic energy range of the signal, thereby increasing the accuracy of the speaker-independent voice recognition.
- the method for recognizing an unknown word or speech utterance as one of a predefined set of words includes the steps of establishing at least one flag (feature indicator) which is set when a predefined utterance pattern is present in the unknown speech utterance. At least one signal representing the gross parameter of the unknown utterance is established, and a plurality of signals representing fine parameters of predefined representations of the unknown utterance are also established. The unknown speech utterance is tested to determine whether to set the flag, and the gross parameter indicating signal is determined.
- the predefined set of words is searched to identify at least one of them which is characterized by at least the utterance pattern indicated by the set flag, if any, and by at least the gross parameter indicated by the gross parameter indicating signal. Finally, it is determined whether the set of identified features associated with the unknown word is adequate to identify the word.
- the voice control recognition system, in accordance with a preferred embodiment of the invention, is achieved with both hardware and software requirements presently defined. It should be emphasized that the apparatus and method herein described achieve speaker-independent voice recognition as a time-domain process, without regard to frequency-domain analysis. That is, no complicated orthogonal analysis or fast Fourier transform analysis is used, thereby enabling rapid, real time operation of the system.
- a system 10 utilizes a central processing unit (CPU) 11 in conjunction with other hardware elements to be described.
- the CPU 11 can be any general purpose appropriately programmed computer, and, in fact, can be a CPU portion of any widely available home computer, such as those sold by IBM, APPLE, COMMODORE, RADIO SHACK, and other vendors.
- a memory element 12 is in data and control communication with the CPU 11.
- An output device 15 is provided to convert the output signal generated by the CPU 11 to an appropriate useable form.
- the output of the apparatus 10 is delivered to a CRT screen 16 for display; consequently, the output box 15 in the embodiment illustrated will contain appropriate data decoders and CRT drivers, as will be apparent to those skilled in the art. Also, it will be apparent to those skilled in the art that many other output utilization devices can be equally advantageously employed. For instance, by way of example and not limitation, robots, data receivers for flight control apparatuses, automatic banking devices, and the like may be used. Diverse utilization application devices will be clearly apparent to those skilled in the art.
- a bandpass filter 20 At the data input port of the CPU 11, a bandpass filter 20 and an analog-to-digital (A/D) and companding circuitry 21 are provided.
- A/D analog-to-digital
- companding circuitry 21 At the data input port of the CPU 11, a bandpass filter 20 and an analog-to-digital (A/D) and companding circuitry 21 are provided.
- the companding operation provided by the A/D and companding circuitry 21 enables a wide, dynamic energy range of the signal to be processed.
- a microphone or other speech-receiving apparatus 23 is provided into which a speech utterance made by a person 24 is received and applied to the CPU 11 via the bandpass filter 20 and the A/D and companding circuitry 21.
- a control algorithm portion 27 of the memory is provided in which the various machine CPU operational steps are contained. Also contained in the memory element 12 is a word buffer 28 as well as a portion for various data tables developed by the control algorithm 27 from the received utterance delivered to the CPU 11. The detailed characteristic portions are contained in the area 29.
- the analog signal is applied to the bandpass filter 20, then to the A/D and companding circuitry 21, where it is digitized and companded.
- the bandpass filter and A/D compander 21 may be any commercially available bandpass, A/D and companding circuits, examples of which are, respectively, the MK5912 and MK5116 provided by Mostek Corporation of Dallas, Texas.
- the companding circuitry 21 follows the u-255 Law Companding Code.
- the signal path including the microphone 23, bandpass filter 20, and A/D and companding circuitry 21 continuously receives and applies a signal to the CPU 11 which, under the control of the control algorithm 27 in the memory element 12, examines or determines when the signal represents a speech utterance or, as referred to herein, a word.
- the detailed characteristic data tables 29 and summary tables 33 are developed, as well as the various flags 35.
- the wordset 36 is then examined, and wordsubsets determined, both of possible can ⁇ didates for the word to be detected and also sets of words which are not the word to be detected.
- the characteristics of the received word are then further refined and the word selection/elimination process is continued until the final word is selected, or until no word is selected.
- the CPU develops an output to the output device 15, either directly on line 45 or indirectly via the A/D and companding circuitry 21 on line 46, for application to the output utilization device, in the embodiment illustrated, the television or CRT display 16.
- the A/D and companding circuitry 21 includes a decoder circuit so that, if desired, the unknown word can be decoded and applied in analog form to the output device 15 for application to a speaker (not shown) or the like for verification. More particularly, the steps for determining the word or other speech utterance are shown in box-diagram form in FIG. 2.
- the unknown word 50, in analog form, is digitized and companded, box 52.
- While the human ear is particularly adapted for detecting subtle nuances of volume or intensity of sounds in human speech recognition, machines or apparatuses used in the past do not have such dynamic range readily available.
- the companded signals can be uncompanded or decoded at a receiving terminal to enable the original signal to be recreated for the listener.
- the signal to be recognized is first companded in the box indicated by the reference numeral 52. This enables the signal having low levels to be amplified so that the ordinarily unrecognizable low energy levels of the signal can be recognized, and, additionally, the high level portions of the signal can be preserved for appropriate processing.
- companding is recognized in the telecommunications art, one companding law being referred to, for example, is the u-255 Law Companding Code.
- the companding process in essence produces a signal having an output energy level which is nonlinearly related to the input signal level at high and low signal levels, and linearly related at the mid-range signal level values.
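The continuous form of the u-255 companding law described above can be sketched as follows. This is a Python sketch with illustrative function names; the MK5116 codec referenced earlier implements a piecewise-segmented hardware version of this curve:

```python
import math

MU = 255  # u-255 law compression parameter

def compress(x):
    """Compress a sample x in [-1.0, 1.0].

    Low-level signals are boosted and high-level signals preserved,
    matching the nonlinear-at-the-extremes behaviour described above.
    """
    sign = 1.0 if x >= 0 else -1.0
    return sign * math.log(1 + MU * abs(x)) / math.log(1 + MU)

def expand(y):
    """Decoder side: the appropriate inverse transfer function."""
    sign = 1.0 if y >= 0 else -1.0
    return sign * ((1 + MU) ** abs(y) - 1) / MU
```

A compressed sample round-trips through `expand` back to the original value, which is what lets the receiving terminal recreate the signal for the listener.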
- the decoder has an appropriate inverse transfer function. As mentioned with reference to FIG. 1, the digitized and companded signal is applied to the CPU 11 continuously, and as the various sounds and other background noises are received, the presence, if any, of a word or any other speech utterance is determined, box 53.
- the manner in which the start of an unknown word is determined is to monitor continuously the signal output from the A/D and companding circuitry 21, noting when the signal level exceeds a predetermined level for a predetermined time.
- a "word" is initially defined (although, as below described, if a major vowel is not found in the word, the development of the data representing the characteristics of the word is not completed) . It has been found that a level of approximately l/8th of the maximum anticipated signal level for about 45 milliseconds can be used to reliably determine the beginning of a signal having a higher probability of being an appropriate word to be recognized.
- a word length of 600 milliseconds is stored for analysis.
- the word to be recognized is defined to be contained in a window of 600 milliseconds. It will be appreciated that some words will occupy more time space of the 600 millisecond window than others, but the amount of window space occupied can, if desired, be one of the parameters or characteristics used in including and excluding word patterns of the possible set of words which the unknown word could possibly be.
- the signal detected at the input to the system is continuously circulated through the unknown word buffer of the memory until the signal having the required level and duration occurs.
- the central processing unit 11 on an interrupt level begins the process of defining the word "window".
- the first step in defining the word "window” is to examine the data in the unknown word buffer 28 backwards in time until a silence condition is detected. From the point at which silence is detected, a 600 millisecond window is defined. It should be emphasized that during this time, once the beginning of the 600 millisecond window is defined, the various characteristic data tables are being generated, thus, enabling rapid, dynamic processing of the unknown word.
- the window is divided into ten millisecond sections or frames, each frame containing a portion of the digitized word signal in the window. This is indicated in box 55.
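Dividing the 600 millisecond window into ten millisecond frames can be sketched as below. The 8 kHz sampling rate is an assumption (typical for u-law telephone codecs of this era); the text itself does not state a rate:

```python
def frames(window, rate_hz=8000, frame_ms=10):
    """Split a word window (a list of samples) into 10 ms frames.

    At 8 kHz each frame holds 80 samples, and a full 600 ms window
    yields 60 frames."""
    n = rate_hz * frame_ms // 1000  # samples per frame
    return [window[i:i + n] for i in range(0, len(window), n)]
```

Each of the detailed characteristic tables below (Tables 1-5) records one value per such frame.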
- Table 1 represents the number of zero crossings of the companded signal, each of the numbers in each frame representing the number of zero crossings of the signal in that particular frame.
- In the first ten millisecond frame there are thirteen(H) zero crossings of the companded signal.
- In the second ten millisecond frame there are fifteen(H) crossings.
- In the third ten millisecond frame there are also fifteen(H) zero crossings.
- In the fourth ten millisecond frame there are eighteen(H) zero crossings, and so forth.
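A per-frame zero-crossing count of the kind tabulated in Table 1 can be sketched as follows (treating a sign change between adjacent samples as one crossing):

```python
def zero_crossings(frame):
    """Count sign changes of the companded signal within one frame."""
    count = 0
    for a, b in zip(frame, frame[1:]):
        if (a >= 0) != (b >= 0):  # the signal crossed zero here
            count += 1
    return count
```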
- Table 2 represents an energy-related value of the signal represented in each ten millisecond frame.
- the energy approximation in accordance with the invention is determined as the sum of the absolute values of the positive-going signal excursions of the companded signal in each frame, multiplied by four.
- the value in the first ten millisecond frames is eight(H).
- the values in the second and third ten millisecond frames are A(H) and B(H), respectively.
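The Table 2 energy approximation admits a short sketch. Treating each positive sample as part of a positive-going excursion is one plausible reading of the description, so this is an assumption rather than the patent's exact definition:

```python
def frame_energy(frame):
    """Approximate the energy of one frame as four times the sum of
    the positive-going excursions of the companded signal."""
    return 4 * sum(s for s in frame if s > 0)
```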
- Table 3 represents in accordance with the invention, the value of the peak-to-peak maximum voltage of the companded signal in each of the respective ten millisecond frames.
- the peak-to-peak maximum voltage in the first three ten millisecond frames is four(H).
- the value in the fourth ten millisecond frame is five(H) and the value in the fifth ten millisecond frame is five(H), and so on.
- Table 4 is the number of major cycles contained in each of the ten millisecond frames.
- a major cycle is determined by the number of cycles which exceed 50 percent of the maximum amplitude of the signal contained throughout the entire 600 millisecond word window.
- In the first frame, the number of major cycles is one(H).
- In the second frame, the number of major cycles is two(H).
- In the third frame, the number of major cycles is five(H), and so on.
- Table 5 represents the absolute number of cycles contained within each ten millisecond frame.
- the absolute number of cycles includes both the number of major cycles, as set forth in Table 4, as well as the lesser cycles, i.e., those having less than 50 percent of the maximum amplitude of the signal contained in the 600 millisecond window.
- the first ten millisecond frame has a value of D(H).
- the second frame has a value of thirteen(H) and the third frame has a value of seventeen(H), and so on.
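Tables 4 and 5 can be sketched together. The text does not define how an individual cycle is delimited, so the sketch below approximates a cycle as one positive lobe of the waveform — an assumption:

```python
def count_cycles(frame, window_max):
    """Return (major, absolute) cycle counts for one frame.

    A cycle is approximated as one run of positive samples; it is
    'major' when its peak exceeds 50% of the maximum amplitude seen
    anywhere in the 600 ms word window."""
    major = absolute = 0
    peak = 0
    in_lobe = False
    for s in list(frame) + [0]:  # trailing 0 flushes the last lobe
        if s > 0:
            in_lobe = True
            peak = max(peak, s)
        elif in_lobe:
            absolute += 1
            if peak > 0.5 * window_max:
                major += 1
            in_lobe = False
            peak = 0
    return major, absolute
```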
- Summary Characteristic Data Table 6 includes four values.
- VST Vowel Start
- VEN Vowel End - Start of Second Syllable
- frame 1B Frame 1B(H)
- VEN2 End of Second Syllable
- the vowel start frame is determined from the data contained in the detailed characteristic data table representing the peak-to-peak maximum voltage, Table 3.
- the method by which the vowel start is determined is by determining whether a predetermined peak-to-peak voltage value exists for a predetermined number of frames. For example, if 3/4ths of the maximum peak-to-peak value exists in six frames, a major vowel would appear to be present. From the point at which the first 3/4ths peak-to-peak value appears in the 600 millisecond word window, the previous word frames are examined back to a point at which the voltage drops to at least 1/8th of the peak-to-peak voltage value.
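The vowel-start search — a run of six frames at or above 3/4ths of the maximum peak-to-peak value, then a backward scan to the 1/8th drop — can be sketched as follows (function and parameter names are illustrative):

```python
def vowel_start(p2p, run_len=6):
    """Locate the vowel-start frame from per-frame peak-to-peak
    values (Table 3 style), or return None if no major vowel."""
    peak = max(p2p)
    hi, lo = 0.75 * peak, peak / 8.0
    run = 0
    first = None
    for i, v in enumerate(p2p):
        run = run + 1 if v >= hi else 0
        if run >= run_len:
            first = i - run_len + 1  # start of the high-value run
            break
    if first is None:
        return None  # no major vowel: word processing is abandoned
    j = first
    while j > 0 and p2p[j - 1] > lo:  # walk back to the 1/8th drop
        j -= 1
    return j
```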
- Other methods for determining the vowel start can be used from the data in the tables; for example, another method which can be used is to examine the data preceding the 3/4ths peak-to-peak value frame.
- the vowel end (or start of the second syllable) frame is determined from an examination of the detailed characteristic data tables. More particularly, the detailed characteristic data table containing the energy related values, Table 2, is examined to determine the vowel end. Thus, when the energy falls to a predetermined level of energy for a predetermined number of frames, the vowel end is defined. For example, if the energy falls to a value of 1/8th of the maximum energy for a period of about eight frames, the frame at which the energy falls to the 1/8th level is defined as the vowel end.
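The vowel-end rule — energy at or below 1/8th of the maximum for about eight consecutive frames — can be sketched as follows (names are illustrative):

```python
def vowel_end(energy, start, run_len=8):
    """Locate the vowel-end frame from per-frame energy values
    (Table 2 style): the first frame, at or after `start`, where the
    energy falls to 1/8th of the window maximum and stays there for
    run_len consecutive frames."""
    lo = max(energy) / 8.0
    run = 0
    for i in range(start, len(energy)):
        run = run + 1 if energy[i] <= lo else 0
        if run >= run_len:
            return i - run_len + 1  # frame where the drop began
    return None
```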
- the end of the second syllable is determined from the detailed characteristic data tables in a fashion similar to the determination of the Vowel End (VEN) determination above described.
- the maximum peak-to-peak voltage over the entire sample is determined directly from the data in Table 3, from which it can be seen that the largest value in the table is the value F(H), which occurs at frame 16.
- the data in the summary characteristic data tables 7, 8, and 9 are generated from grouped data from the detailed characteristic data tables 1-5.
- the word to be evaluated within the 6/10ths second window is divided into four sections, beginning with the vowel-start frame indicated by the parameter VST in Table 6 and ending with the vowel end frame, as indicated by the parameter VEN in Table 6.
- the vowel-start frame begins in frame D(H) and the vowel ends in frame 17(H).
- the first 1/4th of the divided data is contained in frames 13 to 15, the second 1/4th is contained in frames 16 to 18, the third 1/4th is contained in frames 19 to 21, and the last 1/4th is contained in frames 22 to 23.
- Summary characteristic data table 7 is developed from the detailed characteristic data table representing the approximate energy set forth in Table 2.
- the summary values listed in Table 7 are as follows: HZ1 - the average number of zero crossings of the companded signal in the first quarter of the frames beginning at the vowel start (VST) and ending at the end of the first vowel (VEN).
- HZ2 - is the average number of zero crossings of the companded signal for the second and third quarters of the frames during the period VST-VEN.
- HZ3 - is the average number of zero crossings of the com ⁇ panded signal for the last quarter during the period VST-VEN.
- HZ4 - set forth in Table 7 is a post-end characteristic which represents the average number of zero crossings of the companded signal beginning at a point after the end of the first vowel and includes the second syllable, if any. In the case illustrated, the values in this region represent the "VEN" sound of the word "seven". In similar fashion, a summary table of the average number of major cycles over the same one-quarter portions of the period between the vowel start (VST) and vowel end (VEN) is developed.
- HZF1 is the average number of major cycles in the first quarter
- HZF2 is the average number of major cycles in the second and third quarters
- HZF3 is the average number of major cycles in the fourth quarter
- HZF4 is the average number of major cycles in the post-end period.
- HZA1 represents the average of the absolute number of cycles in the first quarter
- HZA2 represents the average of the absolute number of cycles in the second and third quarters
- HZA3 represents the average of the absolute number of cycles in the fourth quarter
- HZA4 represents the average of the absolute number of cycles in the post-end period.
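The HZ, HZF, and HZA summaries share one pattern: average a detailed per-frame table over the first quarter, the middle two quarters, and the last quarter of the VST-VEN span, plus the post-end region. A sketch follows; the exact rounding of the quarter boundaries is not specified in the text, so the integer-division split is an assumption:

```python
def quarter_averages(table, vst, ven, end):
    """Summarize one detailed table (zero crossings, major cycles,
    or absolute cycles) into four HZ-style values: first quarter,
    middle half, and last quarter of frames vst..ven, plus the
    post-end region up to `end`."""
    def avg(xs):
        return sum(xs) / len(xs) if xs else 0.0
    span = table[vst:ven + 1]
    q = max(len(span) // 4, 1)  # frames per quarter (assumed split)
    return (
        avg(span[:q]),                 # e.g. HZ1: first quarter
        avg(span[q:len(span) - q]),    # e.g. HZ2: middle two quarters
        avg(span[len(span) - q:]),     # e.g. HZ3: last quarter
        avg(table[ven + 1:end + 1]),   # e.g. HZ4: post-end region
    )
```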
- a number of special flags are determined.
- the preliminary special flags are set upon the existence of: a leading "s", a leading "t", a trailing "s", a trailing "t", a multiple syllable word, a second-syllable-emphasized word, a trailing "p", a leading "f", a trailing "f", a leading "w", a possible leading "s", a possible leading "t", a possible trailing "s", a possible trailing "t", and a possible trailing "p".
- the energy values of the word "one" have a relatively shallow leading edge building up to a peak energy value.
- the word “two” has an explosive initial energy value, followed by a period of minimum energy, then the remaining word sounds completing the word after the initial "t” sound.
- one or more special characteristics can be defined.
- the characteristics which are defined depend primarily upon the type of characteristics necessary to recognize the word, as well as distinguishing characteristics which distinguish the word from other words which may be in the system vocabulary. The greater the number of words in the vocabulary, and the closer in sound the words from which an unknown word must be distinguished, the greater the number of characteristics which must be included and examined before an unknown word can be identified.
- the word "six" can have six different characteristics by which the word "six" is determined when the included vocabulary against which it is compared includes only the numbers 0 through 9 and not words of close sound, such as the words "sex", "socks", "sucks", etc.
- the six numerically distinguishing characteristics are: (1) a leading "s" must be present (leading "s" flag set);
- the vowel must be a high-frequency vowel (determined from Tables 1 and 5);
- the post-end characteristic must be of low amplitude with a high frequency tail of duration, for example, greater than 30 milliseconds.
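The flag-driven inclusion/exclusion that distinguishes feature analysis from template matching can be sketched as a table of per-word feature requirements. The vocabulary entries and flag names below are illustrative only, not the patent's actual tables:

```python
# Hypothetical per-word feature requirements; flag names loosely
# follow the special flags listed above, values are illustrative.
VOCABULARY = {
    "six":   {"leading_s": True, "high_freq_vowel": True},
    "seven": {"leading_s": True, "multi_syllable": True},
    "one":   {"leading_s": False},
}

def candidates(flags):
    """Keep only words whose required features match the flags set
    for the unknown utterance; unlike template analysis, no
    word-by-word waveform comparison is performed."""
    return [
        word
        for word, required in VOCABULARY.items()
        if all(flags.get(name, False) == value
               for name, value in required.items())
    ]
```

With the leading-"s" and high-frequency-vowel flags set, only "six" survives the elimination; every other candidate is excluded without any template comparison.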
- the speed of word identification is not dependent upon the number of words in the vocabulary against which each word would otherwise be compared.
- the issue to be determined in each particular system is that of determining the number of characteristics of a particular unknown word which must be identified before the word can be identified.
- the issue is determined principally by the
Abstract
Apparatus (Figure 1) and method (Figure 2) for identifying voice utterances or words, comprising generating the digitized signal (52) representing the utterance; determining the features of the digital representation, including the zero crossing frequencies, the energy, and the zero crossing rates (56); grouping the determined features into vowel, consonant and syllable groups; and identifying the grouped features. The method comprises companding (52) the digitized signal to produce a complete decodeable signal representation of the utterance over the entire dynamic energy range of the signal, so as to increase the accuracy of speaker-independent speech recognition. The method for recognizing an unknown word or speech utterance as one of a predefined set of words comprises establishing at least one flag (feature indicator) which is set (60) when a predefined utterance pattern is present in the unknown utterance. At least the gross parameters of the unknown utterance are established, as well as a plurality of fine parameters of predefined representations of the unknown utterance. The unknown utterance is tested to determine whether the flag should be set, and the gross parameter representing the signal is determined. The predefined set of words is searched to identify at least one word characterized by at least the gross parameters. Finally, it is determined whether the set of identified features associated with the unknown word is adequate to identify the word.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US1983/000750 WO1984004620A1 (fr) | 1983-05-16 | 1983-05-16 | Dispositif et procede permettant de reconnaitre des emissions sonores vocales independamment du locuteur |
Publications (1)
Publication Number | Publication Date |
---|---|
EP0148171A1 true EP0148171A1 (fr) | 1985-07-17 |
Family
ID=22175144
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP19830902050 Withdrawn EP0148171A1 (fr) | 1983-05-16 | 1983-05-16 | Dispositif et procede permettant de reconnaitre des emissions sonores vocales independamment du locuteur |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP0148171A1 (fr) |
WO (1) | WO1984004620A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5774851A (en) * | 1985-08-15 | 1998-06-30 | Canon Kabushiki Kaisha | Speech recognition apparatus utilizing utterance length information |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3553372A (en) * | 1965-11-05 | 1971-01-05 | Int Standard Electric Corp | Speech recognition apparatus |
US3499987A (en) * | 1966-09-30 | 1970-03-10 | Philco Ford Corp | Single equivalent formant speech recognition system |
US3940565A (en) * | 1973-07-27 | 1976-02-24 | Klaus Wilhelm Lindenberg | Time domain speech recognition system |
US4335302A (en) * | 1980-08-20 | 1982-06-15 | R.L.S. Industries, Inc. | Bar code scanner using non-coherent light source |
-
1983
- 1983-05-16 EP EP19830902050 patent/EP0148171A1/fr not_active Withdrawn
- 1983-05-16 WO PCT/US1983/000750 patent/WO1984004620A1/fr unknown
Non-Patent Citations (1)
Title |
---|
See references of WO8404620A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO1984004620A1 (fr) | 1984-11-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Abdelatty Ali et al. | Acoustic-phonetic features for the automatic classification of fricatives | |
EP0109190A1 (fr) | Dispositif pour la reconnaissance de monosyllabes | |
US3770892A (en) | Connected word recognition system | |
US5025471A (en) | Method and apparatus for extracting information-bearing portions of a signal for recognizing varying instances of similar patterns | |
EP0178509A1 (fr) | Système de formation d'éléments de référence pour la reconnaissance de la parole | |
JPH0376472B2 (fr) | ||
US4665548A (en) | Speech analysis syllabic segmenter | |
Ivanov et al. | Modulation Spectrum Analysis for Speaker Personality Trait Recognition. | |
US5995924A (en) | Computer-based method and apparatus for classifying statement types based on intonation analysis | |
CN111724770A (zh) | 一种基于深度卷积生成对抗网络的音频关键词识别方法 | |
Shareef et al. | Gender voice classification with huge accuracy rate | |
US3198884A (en) | Sound analyzing system | |
JP2996019B2 (ja) | 音声認識装置 | |
EP0148171A1 (fr) | Dispositif et procede permettant de reconnaitre des emissions sonores vocales independamment du locuteur | |
CN110933236A (zh) | 一种基于机器学习的空号识别方法 | |
David | Artificial auditory recognition in telephony | |
Hasija et al. | Recognition of Children Punjabi Speech using Tonal Non-Tonal Classifier | |
Mishra et al. | Speaker identification, differentiation and verification using deep learning for human machine interface | |
Mufungulwa et al. | Enhanced running spectrum analysis for robust speech recognition under adverse conditions: A case study on japanese speech | |
Paudzi et al. | Evaluation of prosody-related features and word frequency for Malay speeches | |
Silipo et al. | Automatic detection of prosodic stress in american english discourse | |
Wehde | Computerized speech analysis | |
Vysotsky | A speaker-independent discrete utterance recognition system, combining deterministic and probabilistic strategies | |
CN115547340A (zh) | 语音同一性的检验方法、装置、电子设备及存储介质 | |
CN111599381A (zh) | 音频数据处理方法、装置、设备及计算机存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): DE FR GB SE |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 19850704 |
|
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: KIRKPATRICK, ROBERT, D. |