EP0141497A1 - Voice recognition - Google Patents
- Publication number
- EP0141497A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- tes
- signal
- voice
- speech
- symbol stream
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
Definitions
- symbols 6 and 7 are assigned the same descriptor (4)
- symbols 8, 9 and 10 are assigned the same descriptor (5) with no shape definition and the descriptor (6) with one maximum or minimum.
- A detailed flow diagram for the matrix formation 31 is shown in Fig. 9. Boxes 34 and 35 correspond to the speech symbol transformation or TES coder 5 of Fig. 5 and the feature pattern extractor or matrix formation box 31 of Fig. 5 corresponds with boxes 32 and 33 of Fig. 9.
- the flow diagram of Fig. 9 operates as follows:-
- the alphabet is structured to produce a minimum bit-rate digital output from an input speech waveform, band-limited from 300 Hz to 3.3 kHz. To economise on bit-rate, this alphabet maps the three shortest speech segments, of duration 1, 2 and 3 time quanta, into the single TES symbol "1". This is a sensible economy for digital speech processing, but for voice recognition it reduces the options available for discriminating between a variety of different short symbol distributions usually associated with unvoiced sounds.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
- Electric Clocks (AREA)
- Selective Calling Equipment (AREA)
Abstract
Description
- This invention relates to a method of and system for recognising voice signals.
- The state of the art is described in IEEE Transactions on Communications, Vol. COM-29, No. 5, May 1981.
- M.H. Kuhn of Philips GmbH, Hamburg, has written a series of articles in European Electronics, Issue 6, 1981 to Issue 1, 1982, describing the theory of voice recognition and the systems devised by Texas Instruments, Philips and Bell Laboratories.
- Then again, Messrs. B.E. Ray and C.R. Evans of the National Physical Laboratories, Teddington, Middlesex, wrote an article in Int. J. Man-Machine Studies (1981) 14, 13 to 27, describing their work in developing a practical system of speech recognition.
- These various papers describe complete analyses and in most cases attempt to recognise a large vocabulary and even continuous speech.
- In some applications only a limited vocabulary is required, such as ten or twenty different instruction words to operate a machine.
- Such voice recognition equipments are already commercially available, e.g.:-
- Nippon Electric Co. DP-200
- Interstate Electronics VRC-100-1
- Votan V-5000
- Auricle T-950
- Intel i SBC-570.
- They are relatively expensive and complex, operating on the principle of dividing the sounds into frequency bands with filters and analysing the energy levels in each band.
- We have now discovered that a technique previously only used for speech encoding in, for example digital speech transmission applications, can be used for speech recognition in a relatively cheap, effective way.
- According to the present invention there is provided a method of recognising voice signals characterised by utilising time encoded speech (TES).
- According to a further aspect of the present invention there is provided a voice recognition system, characterised in that voice signals are encoded in a TES format, and that the relationships between at least some of the parameters comprising the TES symbol stream and a test signal (or signals) are examined to provide an output signal indicative of the nature of the voice signal as a result of the examination.
- Time encoded speech has previously only been considered in respect of digital speech transmission. See for example published patent application 2020517A.
- In order that the invention can be more clearly understood reference will now be made to the accompanying drawings, in which:-
- Fig. 1 is a random speech waveform;
- Fig. 2 represents the quantised duration of each segment of the waveform of Fig. 1;
- Fig. 3 represents the maxima or minima occurring in each segment of the waveform of Fig. 1;
- Fig. 4 is a symbol alphabet derived for use in an embodiment of the present invention;
- Fig. 5 is a flow diagram of a voice recognition system according to the embodiment of the present invention;
- Fig. 6 shows a block diagram of the encoder part of the system of Fig. 5;
- Fig. 7 shows a symbol stream for the word SIX generated in the system of Fig. 5, to be read sequentially in rows left to right and top to bottom;
- Fig. 8 shows a two dimensional "A" matrix for the symbol stream of Fig. 7;
- Fig. 9 shows a flow diagram for generating the A matrix of Fig. 8, and
- Fig. 10 is a diagram of the application of the system of Fig. 5 to a loudspeaking telephone.
- Time encoded speech is a form of speech waveform coding. The speech waveform is broken into segments between successive real zeros. As an example Fig. 1 shows a random speech waveform and the arrows indicate the points of zero crossing. For each segment of the waveform the code consists of a single digital word. This word is derived from two parameters of the segment, namely its quantised time duration and its shape. The measure of duration is straightforward and Fig. 2 illustrates the quantised time duration for each successive segment - two, three, six etcetera.
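The segmentation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the encoder of the embodiment: the function name `tes_segments` is hypothetical, zero-valued samples are treated as positive, durations are counted in samples rather than in quantised time units, and the shape measure is the count of positive minima or negative maxima that the description adopts as its preferred strategy. The trailing segment is reported even though it is not closed by a zero crossing.

```python
from typing import List, Tuple


def tes_segments(samples: List[float]) -> List[Tuple[int, int]]:
    """Split a sampled waveform at its real zeros (sign changes) and describe
    each segment by (duration in samples, shape count).

    The shape count is the number of positive minima (in a positive segment)
    or negative maxima (in a negative segment). Treating zero as positive is
    an assumption of this sketch.
    """
    segments = []
    start = 0
    for i in range(1, len(samples) + 1):
        # A segment ends at the end of the data or where the sign changes.
        if i == len(samples) or (samples[i] >= 0) != (samples[start] >= 0):
            seg = samples[start:i]
            positive = seg[0] >= 0
            shape = 0
            for k in range(1, len(seg) - 1):
                if positive and seg[k - 1] > seg[k] < seg[k + 1]:
                    shape += 1  # local minimum inside a positive lobe
                elif not positive and seg[k - 1] < seg[k] > seg[k + 1]:
                    shape += 1  # local maximum inside a negative lobe
            segments.append((len(seg), shape))
            start = i
    return segments


# Made-up sample values: a positive lobe with one dip, then two plain lobes.
print(tes_segments([1, 3, 1, 2, -1, -2, -1, 2, 1]))  # [(4, 1), (3, 0), (2, 0)]
```

Each (duration, shape) pair would then be looked up in an alphabet such as that of Fig. 4 to give one symbol per segment.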
- The preferred strategy for shape description is to classify wave segments on the basis of the number of positive minima or negative maxima occurring therein, although other shape descriptions are also appropriate. This is represented in Fig. 3 - nought, nought, one etcetera. These two parameters can then be compounded into a matrix to produce a unique alphabet of numerical symbols. Fig. 4 shows such an alphabet. Along the rows the "S" parameter is the number of maxima or minima and down the columns the D parameter is the quantised time duration. However this naturally occurring alphabet has been simplified based on the following observations. For economical coding it has been found acoustically that the number of naturally occurring distinguishable symbols produced by this process may be mapped in a non-linear fashion to form a much smaller number ("Alphabet") of code descriptors and in accordance with a preferred feature of the invention such code or event descriptors produced in the time encoded speech format are used for Voice Recognition. If the speech signal is band limited - for example to 3.5 kHz - then some of the shorter events cannot have maxima or minima. In the preferred embodiment quantising is carried out at 20 Kbits per second. The range of time intervals expected for normal speech is about three to thirty 20 Kbit samples, i.e. three 20 Kbit samples represent one half cycle at 3.3 kHz and thirty 20 Kbit samples represent one half cycle at 300 Hz.
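The range of "about three to thirty" samples can be checked with a short calculation. The sketch below assumes that "20 Kbits per second" means 20,000 samples per second and that the band edges are pure sine waves; the function name is hypothetical.

```python
def half_cycle_samples(frequency_hz: float, sample_rate_hz: float = 20_000.0) -> float:
    """Number of sample periods spanned by one half cycle of a sine wave."""
    return sample_rate_hz / (2.0 * frequency_hz)


print(round(half_cycle_samples(3300.0), 2))  # 3.03
print(round(half_cycle_samples(300.0), 2))   # 33.33
```

Under these assumptions the upper band edge gives about three samples per half cycle and the lower band edge about thirty-three, which is consistent with the approximate range quoted.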
- Another important aspect associated with the time encoded speech format is that it is not necessary to quantise the lower frequencies so precisely as the higher frequencies.
- Thus referring to Fig. 4, the first three symbols (1, 2 and 3), having three different time durations but no maxima and minima, are assigned the same descriptor (1), symbols 6 and 7 are assigned the same descriptor (4), and symbols 8, 9 and 10 are assigned the same descriptor (5) with no shape definition and the descriptor (6) with one maximum or minimum.
- It is now proposed to explain how these descriptors are used in Voice Recognition and as an example it is appropriate at this point to look at the descriptors defining a word spoken by a given speaker. Take for example the word "SIX". Fig. 7 shows part of the time encoded speech symbol stream for this word spoken by the given speaker, and this represents the symbol stream which will be produced by an encoder such as the one to be described with reference to Figs. 5 and 6, utilising the alphabet shown in Fig. 4.
- Fig. 7 shows a symbol stream for the word "SIX", and Fig. 8 shows a two dimensional plot or "A" matrix of time encoded speech events for the word "SIX". Thus the first figure, 239, represents the total number of descriptors (1) each followed by another descriptor (1). The figure 71 represents the number of descriptors (2) each followed by a descriptor (1). The figure 74 represents the total number of descriptors (1) each followed by a descriptor (2). And so on.
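The counting of adjacent descriptor pairs described above can be illustrated as follows. This is a minimal sketch: the function name `a_matrix` is hypothetical, symbols are assumed to be integers starting at 1, and the choice of rows for the first descriptor and columns for the second is an assumption, since the orientation of Fig. 8 is not stated in the text.

```python
from typing import List, Sequence


def a_matrix(symbols: Sequence[int], alphabet_size: int) -> List[List[int]]:
    """Count how often symbol i is immediately followed by symbol j.

    Symbols are assumed to be integers 1..alphabet_size; entry [i-1][j-1]
    holds the count for the pair (i, j). The row/column orientation is an
    assumption of this sketch.
    """
    matrix = [[0] * alphabet_size for _ in range(alphabet_size)]
    for first, second in zip(symbols, symbols[1:]):
        matrix[first - 1][second - 1] += 1
    return matrix


# A short made-up stream, not the stream of Fig. 7.
stream = [1, 1, 2, 1, 1, 2, 3]
for row in a_matrix(stream, alphabet_size=3):
    print(row)
```

For the made-up stream the pair (1, 1) occurs twice, (1, 2) twice, (2, 1) once and (2, 3) once; the figures 239, 71 and 74 quoted for "SIX" are counts of exactly this kind.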
- This matrix gives a basic set of criteria used to identify a word or a speaker in a preferred embodiment of the invention. Many relationships between the events comprising the matrix are relatively immune to certain variations in the pronunciation of the word. For example the location of the most significant events in the matrix would be relatively immune to changing the length of the word from "SIX" (normally spoken) to "SI....IX", spoken in a more long drawn-out manner. It is merely the profile of the time encoded speech events as they occur which would vary in this case, and other relationships would identify the speaker.
- It should be noted that the TES symbol stream may be formed to advantage into matrices of higher dimensionality and that the simple two dimensional "A"-matrix is described here for illustration purposes only.
- Referring back now to Fig. 5, there is shown a flow diagram of a voice recognition system according to an embodiment of the present invention.
- The speech utterance from a microphone, tape recording or telephone line is fed at "IN" to a pre-processing stage 1 which includes filters to limit the spectral content of the signal, for example from 300 Hz to 3.3 kHz. Dependent on the characteristics of the microphone used, some additional pre-processing such as partial differentiation/integration may be required to give the input speech a predetermined spectral content. AC coupling/DC removal may also be required prior to time encoding the speech (TES coding).
- Fig. 5a shows one arrangement in which, following the filtering, there is a DC removal stage 2, a first order recursive filter 3 and an ambient noise DC threshold sensing stage 4 which responds only if a DC threshold, dependent upon ambient noise, is exceeded.
- The signal then enters a TES coder 5 and one embodiment of this is shown in Fig. 6. Referring to Fig. 6, the band-limited and pre-processed input speech is converted into a TES symbol stream via an A/D converter 6 and suitable logic: RZ logic 7, RZ counter 8, extremum logic 9 and positive minimum and negative maximum counter 10. A programmable read-only-memory 11 and associated logic act as a look-up table containing the TES alphabet of Fig. 4 to produce an "n" bit TES symbol stream in response to being addressed by a) the count of zero crossings and b) the count of positive minimums and negative maximums, such as shown for example for part of the word "SIX" in Fig. 7.
- Thus the coding structure of Fig. 4 is programmed into the architecture of the TES coder 5. The TES coder identifies the DS combinations shown in Fig. 4, converts these into the symbols shown appropriately in Fig. 4 and outputs them at the output of the coder 5, and they then form the TES symbol stream.
- A clock signal generator 12 synchronises the logic.
- From the TES symbol stream the appropriate matrix is created by the matrix feature-pattern extractor 31, Fig. 5, which in this example is a two dimensional "A" matrix. The A-matrix appears in the Feature Pattern Extractor box 31. In this case the pattern or feature to be extracted is the A matrix, that is, the two dimensional matrix representation of the TES symbols. At the end of the utterance of the word "six" the two dimensional A matrix which has been formed is compared with the reference patterns previously generated and stored in the Reference Pattern block 21. This comparison takes place in the Feature Pattern Comparison block 41, successive reference patterns being compared with the test pattern or alternatively the test pattern being compared with the sequence of reference patterns, to provide a decision as to which reference pattern best matches the test pattern. This and the other functions shown in the flow diagram of Fig. 5 and within the broken line L are implemented in real time on a Plessey MIPROC computer. A PDP11 has been used as a system builder and loader and to analyse results.
- A detailed flow diagram for the matrix formation 31 is shown in Fig. 9. Boxes 34 and 35 correspond to the speech symbol transformation or TES coder 5 of Fig. 5 and the feature pattern extractor or matrix formation box 31 of Fig. 5 corresponds with boxes 32 and 33 of Fig. 9. The flow diagram of Fig. 9 operates as follows:-
- 1. Given input sample [xn], define "centre clipped" input:-
- 2. Define an "epoch" as consecutive samples of like sign.
- 3. Define "Difference" [dn]
- 4. Define "Extremum" at n with value e if sgn(d_{n+1}) ≠ sgn(d_n), where e = s_n; 0 is accorded a +ve sign.
- 5. From the sequence of extrema, delete those pairs whose absolute difference in value is less than a given "fluctuation error".
- 6. The output from the TES analysis occurs at the first sample of the new epoch. It consists of the number of contained samples and the number of contained extrema.
- 7. If both numbers fall within given ranges, a TES number is allocated according to a simple mapping. This is done in box 34 "Screening" in Fig. 9.
- 8. If the number of extrema exceeds the maximum, then this maximum is taken as the input. If the number of extrema is less than 1, then the event is considered as arising from background noise (within the value of the [+ve] fluctuation error) and the delay line is cleared.
- 9. If the number of samples is greater than the maximum permitted then the delay line is also cleared.
- 10. The TES numbers are written to a resettable delay line. If the delay line is full, then a delayed number is read and the input/output combination is accumulated into an N dimensional matrix, and in this example N=2. Once reset, the delay line must be reaccumulated before the histogram is updated.
- 11. The assigned number of highest entries ("Significant events") are selected from the histogram and stored with their matrix coordinates; in this example of the "A" matrix these are two dimensional coordinates, producing for example Fig. 8.
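Steps 7 to 11 above can be sketched in Python as follows. This is an illustration under several assumptions rather than the MIPROC implementation: the names `accumulate` and `significant_events` are hypothetical, the limits of 30 samples and 5 extrema are placeholder values, the mapping from (samples, extrema) to a TES number is supplied by the caller because the table of Fig. 4 is not reproduced in the text, and the delay line holds a single number so that the histogram is the two dimensional case (N=2).

```python
from collections import Counter, deque
from typing import Callable, Iterable, List, Tuple

Event = Tuple[int, int]  # (number of contained samples, number of contained extrema)


def accumulate(events: Iterable[Event],
               mapping: Callable[[int, int], int],
               delay: int = 1,
               max_samples: int = 30,
               max_extrema: int = 5) -> Counter:
    """Accumulate (delayed number, current number) pairs into a histogram."""
    line: deque = deque()
    histogram: Counter = Counter()
    for n_samples, n_extrema in events:
        # Steps 8 and 9: noise events and over-long events clear the delay line.
        if n_extrema < 1 or n_samples > max_samples:
            line.clear()
            continue
        n_extrema = min(n_extrema, max_extrema)  # step 8: clamp to the maximum
        number = mapping(n_samples, n_extrema)   # step 7: allocate a TES number
        if len(line) == delay:                   # step 10: delay line is full
            delayed = line.popleft()
            histogram[(delayed, number)] += 1
        line.append(number)
    return histogram


def significant_events(histogram: Counter, count: int) -> List[Tuple[Tuple[int, int], int]]:
    """Step 11: the highest entries together with their matrix coordinates."""
    return histogram.most_common(count)


# Hypothetical mapping for demonstration only: use the duration as the TES number.
events = [(3, 1), (4, 1), (3, 1), (40, 1), (3, 1), (4, 1), (5, 0), (4, 1)]
hist = accumulate(events, mapping=lambda n, e: n)
print(significant_events(hist, 1))  # [((3, 4), 2)]
```

In the example the over-long event (40 samples) and the event without extrema each clear the delay line, so the pairs that straddle them are never counted, which is the behaviour steps 8 to 10 describe.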
- One application of the voice recognition system is illustrated in Fig. 10 of the accompanying drawings. A telephone set comprises a voice recogniser 102 such as already described with reference to Figs. 5 and 5a. A microphone 103 receives the acoustic signals and feeds them to the recogniser 102 which has a control switch 104/1 for switching it on or off, coupled with a hook switch 104/2. This switch would be pressed to operate each time the 'phone is used and would maintain the recogniser active for a predetermined period until a recognised command had been received. Such commands would include the word "DIAL". There would then follow, for example, the number as a series of digits "ZERO", "ONE" etcetera to "NINE". The word "PAUSE" would cause a pause in the dialling sequence, for inserting dialling pauses for, say, level nine PABXs. Other commands would include "CANCEL", "OFF HOOK", "ON HOOK" or their equivalent. The command "DIAL" could be arranged to effect the "OFF HOOK" condition for dialling for example.
- The TES recogniser would be implemented on a single chip computer such as INTEL 8049 etcetera.
- The recogniser has another switch 105, which could also be implemented by voice commands, to switch between a recognising mode in which the telephone is operable, and a training or learning mode, in which the recognisable patterns are generated in the reference pattern store 21, Fig. 5, or updated with a changed voice, such as might be necessary when the operator has a cold or with a different-from-usual operator. In a continuous learning machine, the last recognised pattern would be used as a new input to the reference patterns, to replace that reference pattern which had been least frequently used up until that time. By this means as the input voice gradually changes, the recognition matrix would change with it, without the machine having to be specifically re-programmed.
- The telephone set has an automatic dialling chip TCM5089 which is controlled by the recogniser 102.
- The reference patterns are generated by speaking the various command words while the system is switched to the training mode. The system will then store the test pattern, e.g. for the word "SIX", in the set 21 of reference patterns.
- In the recognition mode the word "SIX" is converted into the A matrix and in the software the feature pattern correlation is carried out, whereby all the A or higher dimensional matrices held in store are compared in turn with the A or higher dimensional matrix generated by the spoken command, looking for a relationship which may be a correlation. A delay will be imposed to enable the comparison to be made.
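The comparison of a test matrix with each stored reference in turn might be sketched as below. The text says only that the relationship sought "may be a correlation", so the normalised correlation used here (cosine similarity of the flattened matrices) is an assumed measure, and the names `correlation` and `best_match` and the example matrices are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def correlation(a: Matrix, b: Matrix) -> float:
    """Normalised correlation (cosine similarity) of two equal-sized matrices."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    return dot / norm if norm else 0.0


def best_match(test: Matrix, references: Dict[str, Matrix]) -> Tuple[str, float]:
    """Compare the test matrix with each stored reference and keep the best score."""
    scores = {word: correlation(test, ref) for word, ref in references.items()}
    word = max(scores, key=scores.get)
    return word, scores[word]


# Made-up 2 x 2 reference patterns, for illustration only.
references = {"SIX": [[10, 2], [3, 0]], "ONE": [[0, 5], [5, 1]]}
word, score = best_match([[9, 2], [4, 0]], references)
print(word)  # SIX
```

Because the score is normalised, a longer utterance that scales all counts by roughly the same factor would score about the same, which fits the earlier remark that the location of the significant events is relatively immune to the length of the word.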
- The 26 symbol alphabet used in the current VR evaluation was designed for a digital speech system. The alphabet is structured to produce a minimum bit-rate digital output from an input speech waveform, band-limited from 300 Hz to 3.3 kHz. To economise on bit-rate, this alphabet maps the three shortest speech segments, of duration 1, 2 and 3 time quanta, into the single TES symbol "1". This is a sensible economy for digital speech processing, but for voice recognition it reduces the options available for discriminating between a variety of different short symbol distributions usually associated with unvoiced sounds.
- We have determined that the predominance of "1" symbols resulting from this alphabet and this bandwidth may dominate the 'A' matrix distribution to an extent which limits effective discrimination between some words, when comparing using the simpler distance measures. In these circumstances, more effective discrimination may be obtained by arbitrarily excluding "1" symbols and "1" symbol combinations from the 'A' matrix. Although improving VR scores, this effectively limits the examination/comparison to events associated with a much reduced bandwidth of 2.2 kHz (0.3 kHz - 2.5 kHz). Alternatively and to advantage the TES alphabet may be increased in size to include descriptors for these shorter events.
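Excluding the "1" symbols and their combinations amounts to blanking one row and one column of the A matrix before comparison. A minimal sketch follows, assuming symbols are numbered from 1 and the matrix is stored as a list of rows; the function name is hypothetical, and only the values 239, 74 and 71 in the example are taken from the text, the rest being made up.

```python
from typing import List


def exclude_symbol(matrix: List[List[int]], symbol: int = 1) -> List[List[int]]:
    """Return a copy of the A matrix with every pair involving `symbol` zeroed."""
    idx = symbol - 1
    return [
        [0 if (r == idx or c == idx) else value for c, value in enumerate(row)]
        for r, row in enumerate(matrix)
    ]


a = [[239, 74, 5],
     [71, 12, 3],
     [4, 2, 1]]
print(exclude_symbol(a))  # [[0, 0, 0], [0, 12, 3], [0, 2, 1]]
```

With the dominant entries removed, the remaining smaller counts carry the comparison, at the cost of discarding the shortest events as the paragraph above notes.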
- Under conditions of high background noise alternative TES alphabets could be used to advantage; for example pseudo zeros (PZ) and Interpolated zeros (IZ).
- As a means for an economical voice recognition algorithm, a very simple TES converter can be considered which produces a TES symbol stream from speech without the need for an A/D converter. The proposal utilises Zero Crossing detectors, clocks, counters and logic gates. Two Zero Crossing detectors (ZCD) are used, one operating on the original speech signal, and the other operating on the differentiated speech signal.
- The d/dt output can simply provide a count related to the number of extrema in the original speech signal over any specified time interval. The time interval chosen is the time between the real zeros of the signal, viz. the number of clock periods between the outputs of the ZCD associated with the undifferentiated speech signal. These numbers may be paired and manipulated with suitable logic to provide a TES symbol stream.
- This option has a number of obvious advantages for commercial embodiments but it lacks the flexibility associated with the A/D version. Nevertheless, it represents a level of 'front end' simplicity which could have a dramatic impact on a number of important commercial factors.
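The behaviour of the two-detector converter can be simulated in software, as in the Python sketch below. The proposal itself works on the analogue signal without an A/D converter, so the sketch necessarily departs from it: it assumes a list of samples, uses the sample index in place of the clock count, and uses a first difference in place of the d/dt stage. It stops at the (duration, extrema count) pairs, since the logic that maps pairs to TES symbols is not specified here. The function names are hypothetical.

```python
def sign_changes(samples):
    """Software zero crossing detector: indices at which the sign flips.

    Exact zeros are skipped, so a crossing is reported at the first
    non-zero sample whose sign differs from the last non-zero sample.
    """
    out, last = [], 0
    for i, v in enumerate(samples):
        s = (v > 0) - (v < 0)
        if s and last and s != last:
            out.append(i)
        if s:
            last = s
    return out


def tes_pairs(signal):
    """Return one (duration, extrema) pair per interval between real zeros."""
    zeros = sign_changes(signal)
    # First difference stands in for the differentiated speech signal.
    diff = [b - a for a, b in zip(signal, signal[1:])]
    # A sign change at diff index i means signal[i] is a local extremum.
    extrema = sign_changes(diff)
    pairs = []
    for start, end in zip(zeros, zeros[1:]):
        duration = end - start  # "clock periods" between successive real zeros
        count = sum(1 for e in extrema if start <= e < end)
        pairs.append((duration, count))
    return pairs
```

For the samples `[-1, 2, 5, 3, 4, 1, -2, -3, -1, 2, 1, -1]`, `tes_pairs` returns `[(5, 3), (3, 1), (2, 1)]`: the first interval lasts five samples and contains two peaks and one dip, and the next two contain one extremum each.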
Claims (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AT84305702T ATE48199T1 (en) | 1983-09-01 | 1984-08-22 | VOICE RECOGNITION. |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB8323481 | 1983-09-01 | ||
GB08323481A GB2145864B (en) | 1983-09-01 | 1983-09-01 | Voice recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0141497A1 (en) | 1985-05-15 |
EP0141497B1 (en) | 1989-11-23 |
Family
ID=10548188
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP84305702A Expired EP0141497B1 (en) | 1983-09-01 | 1984-08-22 | Voice recognition |
Country Status (6)
Country | Link |
---|---|
US (1) | US5091949A (en) |
EP (1) | EP0141497B1 (en) |
JP (1) | JP2619852B2 (en) |
AT (1) | ATE48199T1 (en) |
DE (1) | DE3480569D1 (en) |
GB (1) | GB2145864B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB8416496D0 (en) * | 1984-06-28 | 1984-08-01 | King R A | Encoding method |
GB2223844A (en) * | 1988-10-12 | 1990-04-18 | Graviner Ltd | Flame detector |
NO180737C (en) * | 1988-10-12 | 1997-06-04 | Detector Electronics | Apparatus and method for discriminating between electromagnetic radiation from a fire source and from a non-fire source |
US5237512A (en) * | 1988-12-02 | 1993-08-17 | Detector Electronics Corporation | Signal recognition and classification for identifying a fire |
US5355430A (en) * | 1991-08-12 | 1994-10-11 | Mechatronics Holding Ag | Method for encoding and decoding a human speech signal by using a set of parameters |
GB9806401D0 (en) * | 1998-03-25 | 1998-05-20 | Domain Dynamics Ltd | Improvements in voice operated mobile communications |
US6301562B1 (en) * | 1999-04-27 | 2001-10-09 | New Transducers Limited | Speech recognition using both time encoding and HMM in parallel |
US7085717B2 (en) * | 2002-05-21 | 2006-08-01 | Thinkengine Networks, Inc. | Scoring and re-scoring dynamic time warping of speech |
US6983246B2 (en) * | 2002-05-21 | 2006-01-03 | Thinkengine Networks, Inc. | Dynamic time warping using frequency distributed distance measures |
JP3827317B2 (en) * | 2004-06-03 | 2006-09-27 | 任天堂株式会社 | Command processing unit |
US20080284409A1 (en) * | 2005-09-07 | 2008-11-20 | Biloop Tecnologic, S.L. | Signal Recognition Method With a Low-Cost Microcontroller |
US9697824B1 (en) * | 2015-12-30 | 2017-07-04 | Thunder Power New Energy Vehicle Development Company Limited | Voice control system with dialect recognition |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3278685A (en) * | 1962-12-31 | 1966-10-11 | Ibm | Wave analyzing system |
GB1155422A (en) * | 1965-08-24 | 1969-06-18 | Nat Res Dev | Speech Recognition |
US3742143A (en) * | 1971-03-01 | 1973-06-26 | Bell Telephone Labor Inc | Limited vocabulary speech recognition circuit for machine and telephone control |
GB2020517A (en) * | 1978-04-04 | 1979-11-14 | King R A | Methods and apparatus for encoding and constructing signals |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3125723A (en) * | 1964-03-17 | shaver | ||
USB83255I5 (en) * | 1961-11-14 | |||
GB1170306A (en) * | 1967-11-16 | 1969-11-12 | Standard Telephones Cables Ltd | Apparatus for Analysing Complex Waveforms |
GB1012765A (en) * | 1964-03-06 | 1965-12-08 | Standard Telephones Cables Ltd | Apparatus for the analysis of waveforms |
US3466394A (en) * | 1966-05-02 | 1969-09-09 | Ibm | Voice verification system |
GB1139711A (en) * | 1966-11-30 | 1969-01-15 | Standard Telephones Cables Ltd | Apparatus for analysing complex waveforms |
FR1543791A (en) * | 1966-12-29 | Ibm | Speech analysis system | |
FR2150174A5 (en) * | 1971-08-18 | 1973-03-30 | Dreyfus Jean | |
CH549849A (en) * | 1972-12-29 | 1974-05-31 | Ibm | PROCEDURE FOR DETERMINING THE INTERVAL CORRESPONDING TO THE PERIOD OF THE EXCITATION FREQUENCY OF THE VOICE RANGES. |
US3940565A (en) * | 1973-07-27 | 1976-02-24 | Klaus Wilhelm Lindenberg | Time domain speech recognition system |
US4178472A (en) * | 1977-02-21 | 1979-12-11 | Hiroyasu Funakubo | Voiced instruction identification system |
CA1172366A (en) * | 1978-04-04 | 1984-08-07 | Harold W. Gosling | Methods and apparatus for encoding and constructing signals |
GB2084433B (en) * | 1978-04-04 | 1982-11-24 | Gosling Harold William | Methods and apparatus for encoding and constructing signals |
US4181813A (en) * | 1978-05-08 | 1980-01-01 | John Marley | System and method for speech recognition |
US4763278A (en) * | 1983-04-13 | 1988-08-09 | Texas Instruments Incorporated | Speaker-independent word recognizer |
- 1983
  - 1983-09-01: GB application GB08323481A, patent GB2145864B (not active, Expired)
- 1984
  - 1984-08-22: DE application DE8484305702T, patent DE3480569D1 (not active, Expired)
  - 1984-08-22: AT application AT84305702T, patent ATE48199T1 (not active, IP Right Cessation)
  - 1984-08-22: EP application EP84305702A, patent EP0141497B1 (not active, Expired)
  - 1984-08-31: JP application JP59182535A, patent JP2619852B2 (not active, Expired - Fee Related)
- 1989
  - 1989-01-25: US application US07/301,365, patent US5091949A (not active, Expired - Lifetime)
Non-Patent Citations (1)
Title |
---|
ICASSP 82 - PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 3rd-5th May 1982, Paris, FR, vol. 2, pages 879-882, IEEE, New York, US; M. BAUDRY et al.: "A simple and efficient isolated words recognition system" * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1989002146A1 (en) * | 1987-09-01 | 1989-03-09 | Reginald Alfred King | Improvements in or relating to apparatus and methods for voice recognition |
US5101434A (en) * | 1987-09-01 | 1992-03-31 | King Reginald A | Voice recognition using segmented time encoded speech |
WO1992015089A1 (en) * | 1991-02-18 | 1992-09-03 | Reginald Alfred King | Signal processing arrangements |
GB2268609A (en) * | 1991-02-18 | 1994-01-12 | Reginald Alfred King | Signal processing arrangements |
GB2268609B (en) * | 1991-02-18 | 1994-10-05 | Reginald Alfred King | Signal processing arrangements |
AU655235B2 (en) * | 1991-02-18 | 1994-12-08 | Domain Dynamics Limited | Signal processing arrangements |
US6748354B1 (en) | 1998-08-12 | 2004-06-08 | Domain Dynamics Limited | Waveform coding method |
Also Published As
Publication number | Publication date |
---|---|
ATE48199T1 (en) | 1989-12-15 |
GB2145864B (en) | 1987-09-03 |
GB2145864A (en) | 1985-04-03 |
US5091949A (en) | 1992-02-25 |
JP2619852B2 (en) | 1997-06-11 |
EP0141497B1 (en) | 1989-11-23 |
JPS6078500A (en) | 1985-05-04 |
DE3480569D1 (en) | 1989-12-28 |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
AK | Designated contracting states | Designated state(s): AT BE CH DE FR IT LI LU NL SE |
17P | Request for examination filed | Effective date: 19851104 |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: KING, REGINALD ALFRED |
17Q | First examination report despatched | Effective date: 19870219 |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): AT BE CH DE FR IT LI LU NL SE |
REF | Corresponds to: | Ref document number: 48199; Country of ref document: AT; Date of ref document: 19891215; Kind code of ref document: T |
ET | Fr: translation filed | |
ITF | It: translation for a ep patent filed | |
REF | Corresponds to: | Ref document number: 3480569; Country of ref document: DE; Date of ref document: 19891228 |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | |
ITTA | It: last paid annual fee | |
EPTA | Lu: last paid annual fee | |
EAL | Se: european patent in force in sweden | Ref document number: 84305702.7 |
REG | Reference to a national code | Ref country code: CH; Ref legal event code: PUE; Owner name: REGINALD ALFRED KING TRANSFER- DOMAIN DYNAMICS LIM |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: TP |
BECA | Be: change of holder's address | Free format text: 951016 *DOMAIN DYNAMICS LTD:7-9 IRWELL TERRACE, BACUP LANCASHIRE OL13 9AJ |
NLS | Nl: assignments of ep-patents | Owner name: DOMAIN DYNAMICS LIMITED |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR; Payment date: 20030717; Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: SE; Payment date: 20030728; Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: BE; Payment date: 20030805; Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: LU; Payment date: 20030811; Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: AT; Payment date: 20030829; Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: NL; Payment date: 20030831; Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE; Payment date: 20030929; Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: CH; Payment date: 20031024; Year of fee payment: 20 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: LI; Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION; Effective date: 20040821. Ref country code: CH; Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION; Effective date: 20040821 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: NL; Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION; Effective date: 20040822. Ref country code: LU; Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION; Effective date: 20040822. Ref country code: AT; Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION; Effective date: 20040822 |
BE20 | Be: patent expired | Owner name: *DOMAIN DYNAMICS LTD; Effective date: 20040822 |
REG | Reference to a national code | Ref country code: CH; Ref legal event code: PL |
EUG | Se: european patent has lapsed | |
NLV7 | Nl: ceased due to reaching the maximum lifetime of a patent | |