WO1993009531A1 - Processing of electrical and audio signals - Google Patents

Processing of electrical and audio signals

Publication number
WO1993009531A1
Authority
WO
WIPO (PCT)
Prior art keywords
accordance
electrical signal
peak
sound
periodic elements
Prior art date
Application number
PCT/GB1992/001987
Other languages
French (fr)
Inventor
Peter John Charles Spurgeon
Original Assignee
Peter John Charles Spurgeon
Priority date
Filing date
Publication date
Priority claimed from GB919122995A
Application filed by Peter John Charles Spurgeon
Publication of WO1993009531A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0232: Processing in the frequency domain

Definitions

  • the present invention relates to an apparatus and method for enhancing the intelligibility of speech as well as to a method and apparatus for detecting turning points, particularly peaks, within an electrical signal.
  • an apparatus for enhancing the intelligibility of speech comprising means to generate an electrical signal representative of a detected audio signal, means for identifying a plurality of periodic elements comprising said electrical signal, and means for selectively altering the frequency and/or number of said periodic elements in response to signals characteristic of the periodic elements identified so as to thereby generate a modified output signal.
  • Advantageously means may be provided to assess whether the detected audio signal derives from speech.
  • Advantageously means may also be provided to identify whether a particular sound comprised within the detected audio signal is of long or short duration. If said sound is identified as being of long duration the number of periodic elements representative of said sound may be reduced, preferably by omitting one or more periodic elements from said modified output signal. However, if said sound is identified as being of short duration the number of periodic elements representative of said sound may be increased, preferably by repeating one or more of said periodic elements when generating said modified output signal. Alternatively, if said sound is identified as being of long duration the frequency of the periodic elements representative of said sound may be increased. Likewise, if said sound is identified as being of short duration the frequency of the periodic elements representative of said sound may be reduced. Preferably the frequency of the periodic elements representative of said sound is altered by inputting and outputting the periodic elements to and from a storage means at different rates.
  • Advantageously means may be provided for selectively amplifying one or more of said periodic elements.
  • Advantageously means may also be provided to selectively alter the envelope of one or more of said periodic elements.
  • said periodic elements may be defined as comprising that part of the electrical signal disposed between successive peaks.
  • said periodic elements may be defined as comprising that part of the electrical signal disposed between successive positive zero crossings.
  • the modified output signal may serve as the input for a loudspeaker drive circuit, an inductive loop or a similar outputting means.
  • Advantageously means may be provided to determine whether the detected audio signal derives from specific non-speech inputs such as door bells, alarms, communication tones and the like.
  • the output signal is modified in accordance with the specific non-speech input identified and used to initiate an appropriate response.
  • a method of enhancing the intelligibility of speech comprising the steps of generating an electrical signal representative of a detected audio signal, identifying a plurality of periodic elements comprising said electrical signal, and selectively altering the frequency and/or number of said periodic elements in response to signals characteristic of the periodic elements identified so as to thereby generate a modified output signal.
  • the method may comprise the additional step of assessing whether the detected audio signal derives from speech.
  • the method may comprise the additional step of identifying whether a particular sound comprised within the detected audio signal is of long or short duration. If said sound is identified as being of long duration the number of periodic elements representative of said sound may be reduced, preferably by omitting one or more of said periodic elements from said modified output signal. However, if said sound is identified as being of short duration the number of periodic elements representative of said sound may be increased, preferably by repeating one or more of said periodic elements when generating said modified output signal. Alternatively, if said sound is identified as being of long duration the frequency of the periodic elements representative of said sound may be increased. Likewise, if said sound is identified as being of short duration the frequency of the periodic elements representative of said sound may be reduced. Preferably the frequency of the periodic elements representative of said sound may be altered by inputting and outputting the periodic elements to and from a storage means at different rates.
  • the method may comprise the additional step of selectively amplifying one or more of said periodic elements.
  • Advantageously the method may comprise the additional step of selectively altering the envelope of one or more of said periodic elements.
  • said periodic elements may be defined as comprising that part of the electrical signal disposed between successive peaks.
  • said periodic elements may be defined as comprising that part of the electrical signal disposed between successive positive zero crossings.
  • the method may comprise the additional step of employing the modified output signal as the input for a loudspeaker drive circuit, an inductive loop or a similar outputting means.
  • the method may comprise the additional step of determining whether the detected audio signal derives from specific non-speech inputs such as door bells, alarms, communication tones and the like.
  • the output signal is modified in accordance with the specific non-speech input identified and used to initiate an appropriate response.
  • an apparatus for processing an electrical signal comprising means for detecting peaks in said electrical signal, means for storing peak-to-peak elements of said electrical signal and means for processing said electrical signal by manipulating one or more of said peak-to-peak elements.
  • said means for detecting peaks in said electrical signal may comprise means for periodically sampling said electrical signal, means to calculate the difference in amplitude between successive samples, and means to detect when successive values of said difference in amplitude change in sign from positive to negative.
  • the validity of a potential peak may be assessed by determining whether the amplitude at the potential peak is greater than a minimum threshold value.
  • the value of the minimum threshold may be derived from the rms value of the electrical signal in which peaks are to be detected or alternatively may be predetermined.
  • the validity of a potential peak may be assessed by determining whether the time interval between the potential peak and a previous verified peak is less than a maximum threshold value.
  • the value of the maximum threshold may be predetermined.
  • the validity of a potential peak may be assessed by comparing the duration of the element thereby defined with those defined by previously verified peaks.
  • a method of processing an electrical signal comprising the steps of detecting peaks in said electrical signal, storing peak-to-peak elements of said electrical signal and processing said electrical signal by manipulating one or more of said peak-to-peak elements.
  • the step of detecting peaks in said electrical signal may comprise the additional steps of periodically sampling said electrical signal, calculating the difference in amplitude between successive samples, and detecting when successive values of said difference in amplitude change in sign from positive to negative.
  • the validity of a potential peak may be assessed by determining whether the amplitude at the potential peak is greater than a minimum threshold value.
  • the value of the minimum threshold may be derived from the rms value of the electrical signal in which peaks are to be detected or alternatively may be predetermined.
  • the validity of a potential peak may be assessed by determining whether the time interval between the potential peak and a previous verified peak is less than a maximum threshold value.
  • the value of the maximum threshold value may be predetermined.
  • the validity of a potential peak may be assessed by comparing the duration of the element thereby defined with those defined by previously verified peaks.
  • an apparatus for detecting turning points within an electrical signal comprising means to periodically sample said electrical signal, means to calculate the difference in amplitude between successive samples, and means to detect when successive values of said difference in amplitude change in sign.
  • the turning point to be detected may comprise a peak and said means to detect a change in sign may be sensitive to detect a change in successive values of said difference in amplitude from positive to negative.
  • the validity of a potential turning point may be assessed by determining whether the modulus of the amplitude at the potential turning point is greater than a minimum threshold value.
  • the value of the minimum threshold may be derived from the rms value of the electrical signal in which turning points are to be detected or alternatively may be predetermined.
  • the validity of a potential turning point may be assessed by determining whether the time interval between the potential turning point and a previous verified turning point is less than a maximum threshold value.
  • Preferably the value of the maximum threshold may be predetermined.
  • Advantageously the validity of a potential peak may be assessed by comparing the duration of the element thereby defined with those defined by previously verified peaks.
  • a method for detecting turning points within an electrical signal comprising the steps of periodically sampling said electrical signal, calculating the difference in amplitude between successive samples, and detecting when successive values of said difference in amplitude change in sign.
  • the turning point to be detected may comprise a peak and said step of detecting when successive values of said difference in amplitude change in sign may comprise detecting a change from positive to negative.
  • the validity of a potential turning point may be assessed by determining whether the modulus of the amplitude of the potential turning point is greater than a minimum threshold value.
  • the value of the minimum threshold may be derived from the rms value of the electrical signal in which turning points are to be detected or alternatively may be predetermined.
  • the validity of a potential turning point may be assessed by determining whether the time interval between the potential turning point and the previous verified turning point is less than a maximum threshold value.
  • the value of the maximum threshold is predetermined.
  • the validity of the potential peak may be assessed by comparing the duration of the element thereby defined with those defined by previously verified peaks.
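The turning-point test described above (sample the signal, difference successive samples, look for a positive-to-negative sign change, then reject candidates below a minimum amplitude) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `detect_peaks`, the use of the rms value itself as the default threshold, and the choice to ignore flat-topped peaks are all assumptions, and the maximum-interval and element-duration checks are omitted for brevity.

```python
import math


def detect_peaks(samples, min_amplitude=None):
    """Return the indices of candidate peaks that pass an amplitude test.

    samples       -- equally spaced amplitude values
    min_amplitude -- minimum peak amplitude; if None, the rms value of the
                     samples is used (an assumed way of deriving the
                     threshold from the rms value)
    """
    if min_amplitude is None:
        min_amplitude = math.sqrt(sum(s * s for s in samples) / len(samples))

    peaks = []
    last_diff = 0
    for i in range(1, len(samples)):
        diff = samples[i] - samples[i - 1]
        # A change from a positive to a negative difference means the
        # previous sample was a local maximum.
        if last_diff > 0 and diff < 0:
            candidate = i - 1
            if samples[candidate] > min_amplitude:
                peaks.append(candidate)
        last_diff = diff
    return peaks
```

With a fixed threshold, small ripples between the main peaks are rejected; lowering the threshold lets them through, which is why the choice of threshold matters for how the signal is divided into elements.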
  • Figure 1 is a typical waveform representative of speech
  • Figure 2 is a schematic block diagram of an apparatus for enhancing the intelligibility of speech in accordance with a first embodiment of the present invention
  • Figure 3 is a schematic block diagram of an apparatus for enhancing the intelligibility of speech in accordance with a second embodiment of the present invention
  • Figure 4 is a schematic illustration of a typical electrical signal which may be generated as a result of a detected audio signal
  • Figure 5 is a schematic illustration of the signal of Figure 4 showing the signal being sampled at regular time intervals and divided into successive elements with the start of each element being defined at a peak;
  • Figure 6 illustrates the effect of omitting one of the elements of Figure 5
  • Figure 7 illustrates the effect of repeating a first of the elements of Figure 5 to counter the earlier omission
  • Figure 8 is a schematic illustration of the signal of Figure 4 showing the signal being sampled at regular time intervals and divided into successive elements with the start of each element being defined on the basis of a positive zero crossing;
  • Figure 9 illustrates the effect of omitting one of the elements of Figure 8.
  • Figure 10 illustrates the effect of repeating a first of the elements of Figure 8 to counter the earlier omission
  • Figure 11 is a schematic block diagram of an apparatus for identifying peaks in an input signal
  • Figure 12 is a flow diagram illustrating the steps of an interrupt programme used by the apparatus for identifying peaks
  • Figure 13 is a flow diagram illustrating the steps of a main programme loop used by the apparatus for identifying peaks
  • Figure 14 is a flow diagram illustrating the steps of a sub-routine used by the apparatus for detecting peaks.
  • Figure 15 is a schematic block diagram of an apparatus for enhancing the intelligibility of speech in accordance with a third embodiment of the present invention.
  • each part of a word is made up of a number of periodically repeating elements 10. It is believed that the intelligibility of speech is dependent upon the number, frequency and amplitude of these periodic elements and that strings of such elements may be identified in terms of actual sounds by virtue of certain characteristics in various frequency bands such as those that extend from 70-210Hz, 210-630Hz, 630-1890Hz and from 1800-5400Hz.
  • an overall sound envelope can be used to identify whether or not a detected sound derives from speech, it is also possible to differentiate between vowels and consonants within that speech by examining the ratio between the high and low frequencies present in each periodic element and counting the number of characteristic elements that serve to make up a particular string.
  • the apparatus for enhancing the intelligibility of speech to be described below operates by first generating an electrical signal representative of a detected audio signal. This electrical signal is then analysed to determine whether or not the audio signal is derived from speech. If it is so derived, the electrical signal is processed in a variety of ways before being used to generate an audio signal representative of the processed signal. If, however, this initial analysis determines that the audio signal is not derived from speech, then the electrical signal is processed differently.
  • the processing step may simply comprise storing the input electrical signal within a temporary store at one rate and outputting it at another. This has the effect of altering the pitch of the resulting audio signal and can be used, for example, to reduce the frequency of high pitched sounds to which the elderly may have become deaf.
  • the processing of the electrical signal may include the step of determining the duration of a sound making up the audio signal in terms of the number of characteristic periodic elements of which it is comprised and manipulating those elements in response to that duration. For example, short sounds which are common among consonants, such as the consonants "t" and "d" in the words "two" and "do", may be lengthened by either reducing the rate at which they are output, thereby decreasing their pitch, or by repeating one or more of the periodic elements of which they are comprised. Conversely, long sounds, such as those associated with vowels, may be shortened either by increasing the rate at which they are output, thereby raising their pitch, or by deleting one or more of their constituent periodic elements from the processed signal.
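As an illustration of lengthening a short sound by repeating its periodic elements and shortening a long sound by deleting them, the following sketch operates on a list of samples that has already been divided at known element boundaries. The helper names (`split_elements`, `shorten`, `lengthen`, `join`) and the rule of dropping every n-th element are hypothetical choices made for the example; the text does not specify which elements are repeated or omitted.

```python
def split_elements(samples, boundaries):
    """Cut samples into elements, each starting at a boundary index.

    Samples before the first boundary and after the last are discarded.
    """
    return [samples[a:b] for a, b in zip(boundaries, boundaries[1:])]


def shorten(elements, omit_every=2):
    """Shorten a sound by omitting every omit_every-th element."""
    return [e for i, e in enumerate(elements, start=1) if i % omit_every != 0]


def lengthen(elements, repeats=2):
    """Lengthen a sound by outputting each element `repeats` times."""
    return [e for e in elements for _ in range(repeats)]


def join(elements):
    """Concatenate elements back into a single sample stream."""
    return [s for e in elements for s in e]
```

Because whole elements are removed or duplicated, the period of each remaining element, and therefore the pitch, is unchanged; only the duration of the sound is altered.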
  • the apparatus to be described may serve to selectively process vowel and consonant sounds by altering their pitch and/or duration so as to emphasise the differences between them. This can be of particular value to those with hearing difficulties such as children with Down's syndrome who commonly have difficulty distinguishing between the sounds of certain consonants.
  • the apparatus to be described may also process an input signal in such a way as to selectively increase the amplitude of particular sounds thereby providing them with an increased emphasis. It has been found for example, that for the hard of hearing it is particularly difficult to distinguish between leading consonants. By amplifying and/or duplicating the periodic elements that comprise these parts of speech it is possible to emphasise the sounds concerned and so improve their intelligibility.
  • the rhythm of speech also provides an aid to its intelligibility and this rhythm may be emphasised by selectively increasing the amplitude of sounds originating at the start of a word in preference to those that occur later. Graphical tests would appear to indicate that a phrase or word is easiest to understand when its sound envelope is generally "hump" shaped.
  • the apparatus to be described is designed to be capable of manipulating the periodic elements of which an audio signal is comprised so as to give rise to sound envelopes having this shape and to do so in real time.
  • the apparatus in audio processing headphones and in recording systems for audio tapes or telephone messages.
  • In Figure 2 there is shown a block diagram of an apparatus for enhancing the intelligibility of speech in accordance with an embodiment of the first aspect of the present invention.
  • the apparatus can be seen to comprise a means 20 for generating an electrical signal representative of a detected audio signal, such as a microphone, and a processing unit 22, such as a CPU 80535 or an 8751 microcontroller having separate a/d converters.
  • the output of the electrical signal generating means 20 is connected via an amplitude gain and compression circuit 24 to the input of a number of filters 26, 28 and 30 each having a differing passband.
  • a first of these filters 26 has a passband of from 0.1 to 4kHz the output of which is rectified and smoothed by a rectifying circuit 32.
  • the resulting DC signal is proportional to the amplitude of the detected audio signal and is passed from the rectifying circuit 32 to an a/d converter contained within the processing unit 22.
  • the resulting DC signal is then digitised in the processing unit 22 by a further a/d converter.
  • the digitised signal may be used to determine whether the input audio signal derives from speech.
  • the third filter 30 has a passband of from 50Hz to 5kHz and its output is amplified by a gain control circuit 36 whose gain is controlled by the processing unit 22 using a signal level derived from the output of the first filter 26. This ensures that the analyser within the processing unit 22 operates at a more constant signal level.
  • the signal output from the variable gain amplifier 36 is digitised and passed to the processing unit 22 via an input store 38 such as a FIFO. It is this signal that constitutes the main audio signal to be processed by the processing unit 22.
  • the output of the amplitude gain and compression circuit 24 is also connected to a zero crossing detector 40.
  • the output of the zero crossing detector 40 is digitised within the processing unit 22 and it is from this digitised signal that the processing unit 22 is able to identify periodic elements comprised within the audio signal. Thereafter an analysis of the ratio of the amplitudes of their high and low frequency components can be used to determine whether the periodic elements concerned derive from a vowel or a consonant.
  • the output of the zero crossing detector 40 when digitised can also be used to assist in the synchronisation of subsequent processing.
  • the first and second filters 26 and 28 and the zero crossing detector 40 provide the processing unit 22 with signals indicative of the content of the input audio signal which can be used in processing the bandwidth reduced signal held in the FIFO store 38.
  • the output from the first filter 26 provides a signal indicative of the amplitude of the input audio signal; the output from the second filter 28 can be analysed to assess whether the input audio signal derives from speech; and the output from the zero crossing detector 40 can be used to identify periodic elements comprised within the audio signal and to determine whether particular sounds are of long or short duration.
  • the processing unit 22 reacts to the results of this analysis and selects the stored processing algorithms appropriate to the speech content revealed by the analysis.
  • the processing unit 22 can process the amplified output from the third filter 30 according to an analysis of the signal as it is stored rather than according to an analysis of the signal several samples earlier.
  • the processed signal is passed to a second FIFO store 42.
  • the output of the second FIFO store 42 is subsequently converted by a d/a converter 44 to provide a signal capable of driving an output means 46 such as a loud speaker or headphones or which is capable of operating some other electronic equipment.
  • the input clock rate of the first FIFO store 38 and the output clock rate of the second FIFO store 42 may be varied by the processing unit 22 via a programmable clock generator 48.
  • the apparatus shown in Figure 2 also comprises an LCD display 52 which is connected to and operated by the processing unit 22.
  • This LCD display 52 may be used to show a representation of the sound envelope being received and/or an alphabetic representation of any recognised speech. Means may also be provided to provide a printout of this display thereby providing a record of the detected audio signal.
  • the processing unit 22 may control three LEDs 54, 56 and 58 which may be used to indicate when the processing unit 22 is either recognising speech, rejecting the input signal or unable to keep up with the incoming speech or signal.
  • amplified samples of the output from the third filter 30 are stored in the first FIFO 38 on each pulse of its write clock.
  • the output of the first FIFO 38 is connected to the input of the second FIFO 42 and to an input/output port of the processing unit 22.
  • the processing unit 22 has separate control of the read and write functions of the first and second FIFOs 38 and 42 respectively and can elect merely to transfer data directly from the first FIFO 38 to the second FIFO 42. In this way the pitch of the audio signal emanating from the output means 46 may be altered with respect to that detected by the input means 20 enabling the apparatus to be used, for example, to reduce the frequency of high pitched sounds to which the elderly may have become deaf.
  • the output from the first FIFO 38 may be operated on by the processing unit 22 in accordance with the algorithms stored within it.
  • the processing unit 22 may be used to determine whether the signal passed by the third filter 30 is derived from speech. This is done by determining whether the period of the signal output from the third filter 30 lies between predetermined limits, by checking the rate of change of the amplitude and period of adjacent elements and by checking the repetitiveness of the waveform. If, as a result of this process, the processing unit 22 determines that the signal output from the third filter 30 is not derived from speech, the samples stored in the first FIFO 38 are read out to be either discarded or used for some other purpose.
  • the output from the first FIFO 38 may be further operated on by the processing unit 22 to determine whether the particular sounds making up the speech are of long or short duration. This can be achieved by measuring the ratio of the amplitude of high frequencies to that of low frequencies while a classification of the sound in question may be obtained by examining the duration and shape of its envelope. If it is determined that the signal output from the third filter 30 derives from a long sound of low frequency the samples stored in the first FIFO 38 may be fed directly to the second FIFO 42 whereupon the pitch of the audio signal generated by the output means 46 may be raised and its duration shortened by increasing the read clock rate of the second FIFO 42. Alternatively, the duration of the sound may be shortened without raising its pitch by omitting some of the periodic elements of which it is comprised when transferring the samples stored in the first FIFO 38 to the second FIFO 42.
  • the samples stored in the first FIFO 38 may again be fed to the second FIFO 42 via the processing unit 22 and the audio signal generated by the output means 46 "stretched" by the insertion of one or more additional copies of the periodic elements represented by the samples stored in the first FIFO 38.
  • the read clock rate of the second FIFO 42 may be reduced under the control of both the processing unit 22 and the clock generator 50 so as to lower the pitch and increase the duration of the audio signal generated by the output means 46 thereby enabling the apparatus to assist those with high frequency hearing loss.
  • the processing unit 22 may also be used to amplify selected sounds to help differentiate between, for example, similar sounding consonants.
  • the envelopes of selected sounds may also be altered to aid the listener.
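The effect of clocking samples into the first FIFO at one rate and out of the second FIFO at another can be worked through numerically. The sketch below assumes an ideal store in which every sample written is eventually read; the function name `replay` and the sample rates in the usage note are illustrative values, not figures taken from the text.

```python
def replay(frequency_hz, duration_s, write_rate_hz, read_rate_hz):
    """Return (output frequency, output duration) for a stored sound.

    A sound sampled at write_rate_hz and replayed at read_rate_hz has all
    of its frequencies scaled by read_rate_hz / write_rate_hz and its
    duration scaled by the inverse of that ratio.
    """
    ratio = read_rate_hz / write_rate_hz
    return frequency_hz * ratio, duration_s / ratio
```

For example, `replay(4000, 0.1, 14000, 7000)` describes a 4 kHz sound lasting 0.1 s that is written at 14,000 samples per second and read at half that rate: it emerges at 2 kHz and lasts 0.2 s. Raising the read rate above the write rate has the opposite effect, raising the pitch and shortening the sound.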
  • A second embodiment of an apparatus to enhance the intelligibility of speech is shown in Figure 3.
  • the apparatus shown in Figure 3 includes a portable computer 62 such as a PSION which may be used via a serial adaptor 64 to relay to the processing unit 22 new or predetermined values of variables employed in the processing algorithms.
  • inputs to the processing unit 22 may be used to read switches and links 66 as well as to monitor the listener's speed controls 68 while the outputs of the processing unit 22 may be used to operate parallel interfaces for printers, vibrators or electrodes 70 by way of a buffer 72. It is to be noted that by applying the processed speech to vibrators or electrodes it is possible to aid the intelligibility of speech for the profoundly deaf.
  • the processing unit 22 may insert sample values into the second FIFO 42 which are based on modifications of the samples stored in the first FIFO 38.
  • the processing unit 22 may insert sample values into the second FIFO 42 which are based on either modifications of the filtered input audio signal, predetermined values which may be representative of an idealised sound element, or values representative of silence.
  • the apparatus may be provided with a second means for generating an electrical signal representative of a detected audio signal so as to thereby obtain directional information relating to its origin.
  • One of the problems with the use of a zero crossing detector to define the various periodic elements of which an input signal is comprised is that some speech waveforms are made up of several zero crossings. This necessitates the use of more elaborate algorithms or the introduction of CPU control of either the zero crossing detection time constant or the zero crossing hysteresis.
  • the zero crossing detector may be inhibited from detecting another zero crossing for a predetermined time following a first such detection or alternatively until after the amplitude of the signal has passed through a predetermined range of values.
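One way to picture the inhibit period is a positive zero crossing detector with a hold-off counted in samples. This is a sketch under assumptions: the function name `positive_zero_crossings` and the `holdoff` parameter are hypothetical, and the alternative of waiting until the amplitude has passed through a predetermined range of values is not shown.

```python
def positive_zero_crossings(samples, holdoff=0):
    """Return indices where the signal crosses zero going upwards.

    A crossing is reported at index i when samples[i - 1] is negative and
    samples[i] is zero or positive. After a crossing is reported, further
    crossings are ignored until more than `holdoff` samples have elapsed.
    """
    crossings = []
    last = None
    for i in range(1, len(samples)):
        if samples[i - 1] < 0 <= samples[i]:
            if last is None or i - last > holdoff:
                crossings.append(i)
                last = i
    return crossings
```

With `holdoff=0` every upward crossing is reported, including those caused by small ripples near zero; a longer hold-off suppresses the extra crossings so that each periodic element is marked only once.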
  • each element may instead be defined as extending from peak to peak.
  • This re-definition can have a significant impact upon the result of the subsequent manipulation of the electrical signal since at a peak the rate of change of the waveform is considerably less than that at a zero crossing. This means that a small timing error between the occurrence of the true signal peak and the time at which the signal is next sampled will have less effect on the perception of the processed signal than would the same timing error between a true zero crossing and the taking of the next sample. This is particularly so when the signal to be sampled is a non-symmetrically varying waveform.
  • Figure 4 illustrates a typical electrical signal which may be generated as a result of a detected audio signal.
  • Figure 5 shows this signal being sampled at regular time intervals and divided into successive elements with the start of each element being defined at a peak.
  • Figure 6 shows the effect of omitting one of these elements as a result of a typical processing step while Figure 7 shows the effect of repeating a first of the elements to counter the earlier omission.
  • the processed waveform of Figure 7 shows a close correspondence to that of Figure 4.
  • Figure 8 shows the signal of Figure 4 being sampled at regular time intervals but divided into successive elements on the basis of a positive zero crossing.
  • Figure 9 shows the result of omitting one of the elements from the waveform of Figure 8 while Figure 10 shows the result of repeating a first of the elements to compensate for this earlier omission.
  • the processed waveform in Figure 10 does not correspond as closely with that shown in Figure 4 as does that shown in Figure 7.
  • a means for identifying peaks in an input signal can be seen to comprise an a/d converter 80 and a microprocessor 82.
  • the signal in which peaks are to be identified is fed to the input of the a/d converter 80 which continuously digitises the incoming signal and stores the resulting data in an output buffer.
  • the microprocessor 82 operating under an interrupt programme, reads the data stored in the output buffer of the a/d converter 80 and writes the data into its own input buffer at a location R0 which is incremented upon the execution of each interrupt so as to circulate the location within the input buffer between BUFSTART (to which R0 is equated on start up) and BUFEND.
  • variable LASTDIFF represents the angle of the slope of the signal in which peaks are to be identified and its sign signifies whether the waveform is increasing, ie. approaching a peak, or decreasing, ie. moving away from a peak.
  • the interrupt programme signals that a peak has been detected whose address is that of the previous value of R0 and which is now stored as the variable LASTADDR. This address is copied and stored as the variable PEAKCAND (peak candidate) to be verified by the main programme loop illustrated in Figures 13 and 14.
  • R0 is stored as the variable LASTADDR and R0 is incremented to the next position in the input buffer or, if the present value of R0 is equal to BUFEND, is reset to BUFSTART.
  • the end of the interrupt programme allows the main programme loop to resume execution in a manner to be described. However, since the interrupt programme typically occurs every 70 microseconds, it is essential that the interrupt programme is capable of being executed in a much shorter time. To this end a microcontroller and software instructions are chosen that will result in an execution time for the interrupt programme of approximately 30 microseconds or less. This leaves the main programme loop with a period that may be calculated by multiplying the figure of approximately 40 microseconds by the number of programme interrupts that may be performed during the propagation of a signal element in which to process and output the data representative of that element. If for any reason this period should prove to be insufficient the programme interrupt may be performed at slightly longer intervals.
  • the sub-routine determines whether the sample represented by the variable THISPEAK comprises the end of a signal element in accordance with the amplitude and duration of the element that this would entail and clears a valid peak flag and sets PEAKCAND to 0 if the candidate is rejected. This has the effect of clearing the accumulator and prevents the main programme loop from entering the sub-routine until another peak candidate is identified by the programme interrupt.
  • the test to determine whether the amplitude of a prospective signal element is too small may comprise the subtracting of a value from the data stored at the address represented by the variable THISPEAK.
  • the value that is subtracted in this way may be a fixed value or alternatively may be based on the rms value of the input signal as determined by itself or by another microcontroller.
  • the duration of a prospective signal element may be determined from the difference between the addresses represented by the variables THISPEAK and LASTPEAK and which in turn will equate to the number of programme interrupts performed during the propagation of the signal element. Since the time between programme interrupts is set by the interrupt timer and is typically of the order of 70 microseconds, a fixed value of 4 can be used to test that the frequency of the prospective signal element is less than about 5000Hz.
  • the difference between the addresses represented by the variables THISPEAK and LASTPEAK can be compared with the number of cycles contained in the actual fundamental input frequency. The actual fundamental input frequency can be found by relating the amplitudes of rms values of the input after it has been filtered through bandpass filters having different passbands preferably under the control of a separate microcontroller.
  • Also included in the sub-routine is a test to check that the address represented by the variable LASTPEAK does not equate to 0. This situation can arise when the input buffer does not already contain a valid peak such as will be the case at system start-up. However by setting the variable LASTPEAK equal to THISPEAK at least a signal element will be processed on the next valid peak.
  • the sub-routine may also include a test to compare the duration of a prospective signal element with those that have been verified previously. This can be achieved using a variable ELELNGTH which is incremented by one during the performance of each programme interrupt.
  • the variable ELELNGTH could also then be made available to the main programme loop to enable an investigation to be conducted should a peak not be detected when expected.
  • the variable could be used to prepare the main programme loop for signal elements of longer than usual duration or for the start of a new word. In either case ELELNGTH is set to zero when a prospective peak is verified.
  • the tests may be applied separately or in combination.
  • a yes/no/? technique can be employed where a failure of any criterion will result in the peak candidate being declared invalid.
  • a scoring method can be adopted where each criterion will contribute toward a final score.
  • the data stored between the addresses represented by the variables LASTPEAK and THISPEAK is moved from the input buffer to an output buffer during which the signal element that it represents may be processed in any of the ways previously described and in particular may be either duplicated, erased or modified in some other way.
  • the output buffer is itself automatically output and this can be achieved by using a FIFO as previously described with its read clock operated by hardware and its output inhibited should it become empty.
  • a further use for the peak detector described would be to provide "scrambling" of speech by changing the order of signal elements before transmission and sorting the elements into the original order at the receiver.
  • Some public auditoriums are provided with inductive loop systems to enable an audience wearing specially adapted headsets or hearing aids with a Telecoil pickup to hear the speakers.
  • the apparatus described could be provided in the amplifier so that the signal received by the audience would be of an improved intelligibility. Similar considerations would apply to radio hearing aids or infra-red beam devices in which the inductive loop is replaced by a radio or infra-red link.
  • Such systems allow the listener to adjust the characteristics of the output remotely whilst at the same time giving interference free listening provided of course that in the case of an infra-red system there is a direct path for the infra-red beam.
  • Broadcasting studios could also be fitted with the described apparatus to improve the intelligibility of their programmes to many of their audience.
  • telephone receivers for the hard of hearing could benefit from the apparatus while telephone repeater amplifiers could be programmed to choose which of the incoming audio signals to re-transmit or improve and which to reject as too noisy.
  • An example of an apparatus for enhancing the intelligibility of speech by manipulating speech elements defined on a peak-to-peak basis is shown in Figure 15. Unlike the apparatus previously described with reference to Figures 2 and 3, the apparatus of Figure 15 is shown to comprise two separate microcontrollers 100 and 102 although, as will be apparent to those skilled in the art, this need not necessarily be the case.
  • means 104 such as a microphone, are provided to generate an electrical signal representative of a detected audio signal.
  • This electrical signal is then passed via a pre-amplifier circuit 106 to an amplitude compressor circuit 108 where the signal is combined with that generated by a second generating means 110 responsive to higher amplitude audio frequency signals.
  • the amplitude compressor circuit 108 serves to reduce the dynamic range of the signals which it receives and provides an output which is then used by both the system control microcontroller 100 and the speech processing microcontroller 102.
  • the system control microcontroller 100 is used to determine whether the input audio signal derives from speech and if so to determine which of a predetermined number of speech processing algorithms are to be performed on the signal data which is stored in the speech processing microcontroller 102.
  • the output from the amplitude compressor circuit 108 is fed to four bandpass filters 112,114,116 and 118 whose outputs are rectified by respective rectifying circuits 120, 122, 124, and 126 to provide a four channel a/d converter 128 with slowly varying DC signals indicative of the audio signal detected by the generating means 104 and 110.
  • the digitised signals produced by the a/d converter 128 are passed to the microcontroller 100 whereupon the microcontroller 100 performs the steps necessary to determine whether the detected audio signal derives from speech and if so whether a particular sound making up the signal is of long or short duration.
  • the microcontroller 100 is able to select the algorithms to be performed on the data stored in the speech processing microcontroller 102. Control data representative of the algorithms to be performed is then directed to a first FIFO 130. Alternatively, the decision as to which of the algorithms to employ may be made as a result of other data received by the system control microcontroller 100 such as by means of analogue input values, input switch data or RS232 commands. In another arrangement a digital display 132 operated by the microcontroller 100 may be provided for diagnostics purposes during the development of the apparatus and to display characteristics of the detected audio signal during operation.
  • the output of the compressor circuit 108 is also connected to a low pass filter and track/hold circuit 134.
  • the low pass filter contained within circuit 134 may be programmed so as to pass frequencies up to a predetermined maximum within the range from 1kHz to 25kHz.
  • the a/d converter 136 to which the circuit 134 is connected may typically be prevented from operating on frequencies in excess of 6kHz.
  • the output of the a/d converter 136 which typically samples incoming signals at a rate of 13,000 a second, is connected to the speech processing microcontroller 102.
  • Data derived from these samples is input into an internal buffer by the speech processing microcontroller 102 under an internal timer interrupt to ensure that an accurate reconstruction of the sampled signal will be possible once it has been processed.
  • the microcontroller 102 may be programmed to set software flags and/or other values when a peak is detected in the input signal.
  • a background software task uses this information together with the control data stored in the first FIFO 130 to identify the data values that relate to a single signal element and to decide whether to omit that element, transfer it to a second output FIFO 138, duplicate it in the output FIFO 138 or modify the data values in some way before only then transferring them to the output FIFO 138.
  • the data memory of the microcontroller 102 may be used in an intermediate output stage to increase the amount of data capable of being stored prior to its output by the second FIFO 138. This enables several words of speech to be stored for further modification or for outputting without further modification or stored for outputting at a later time in response to a switch or serial command or in response to a signal included in the analogue input.
  • the output FIFO 138 may characteristically be loaded with a number of consecutive data values until such a time when loading stops whilst another signal element is processed. Despite this data may be output from the FIFO 138 at regular intervals with the aid of a clock from a programmable frequency generator 140. This processed data is then converted to an analogue signal by a d/a converter 142 whereupon any high frequency noise may be removed by a filter 144 and the resulting signal either expanded in amplitude range by an expander 146 or left at a reduced amplitude range to assist the hard of hearing who may typically be unable to hear the whole range of sound amplitudes.
  • This resulting signal may be used to generate an output audio signal by serving either as the input for a loudspeaker drive circuit 148, as the input for another audio circuit 150 or as the input for an inductive loop 152. It is to be noted however that an advantage of using the present apparatus in conjunction with an inductive loop is that the inductive loop provides a direct input to many hearing aids having a Telecoil input thereby enabling a processed audio signal to be fed to a hard of hearing audience while an unprocessed audio signal may still reach unimpaired members in the normal way.
  • the data output FIFO 138 may simply be read at a slower clock rate and the low pass filter 144 set to attenuate any high frequencies still present after the operation of the d/a converter 142.
  • various other outputting methods may be used for the seriously handicapped.
  • vibrators can be used to signify the quality and loudness of sounds while information can also be embossed in a Braille-like manner or displayed as numbers or characters. Sounds may also be displayed graphically as waveforms or in terms of the proportions of different frequency bands that are present.
  • Specialist output devices would be required for some of these latter methods but these devices could be operated by a computer in conjunction with the apparatus described using an RS232 serial datalink.
  • the present invention is not limited to the processing of audio signals derived from speech. Indeed, depending on the passbands of the filters that are used, the apparatus may be made sensitive to specific non-speech inputs such as door bells, alarms, communication tones and the like. Thus an apparatus may be provided that is capable of detecting and interpreting Standard International Morse Codes which may then be used to trigger a variety of responses. Likewise, an apparatus may be provided that is sensitive to communication tones such as CCIR and ZVEI to facilitate the remote operation of associated equipment.
  • the processing unit of Figures 2 and 3 could in another arrangement comprise a microprocessor, a microcontroller, a digital signal processor, an ASIC, or a digital speech synthesizer. It is to be noted however that an advantage of using a microprocessor is its ability to include a choice of algorithms and methods in the software, selectable individually or in combination by links or switches to suit different listeners or situations.
  • a fast serial memory may provide an alternative to the FIFO stores of the embodiments described.
  • the advantage of using FIFO memories with separate input and output clock controls is their ability to store waveforms at one speed and to read, process and store them in a second FIFO at a different speed. This enables the audio output signal to be driven at a speed partly controlled by the listener.
  • the passbands specified for the various filters are those which are considered preferable for the type of microprocessor algorithms described. It will be appreciated however that different passbands may be more appropriate for other algorithms, languages or applications. Indeed in one arrangement the filters may be replaced by variable passband filters whose passbands can be digitally selected by the processor or pre-programmed during production. The filters may be separate from the microprocessor or combined in a dedicated chip.
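
The interrupt programme described above (a circulating write location R0, the slope variable LASTDIFF and the peak candidate PEAKCAND) might be sketched as follows. This is a minimal illustration only and not the flow diagram of Figure 12: the class name, the buffer size of 16 and the use of None for "not yet set" are assumptions made for the example.

```python
BUFSTART = 0   # first location of the circular input buffer
BUFEND = 15    # last location (a buffer of 16 samples is an assumption)


class PeakInterrupt:
    """Hypothetical holder for the state kept between timer interrupts."""

    def __init__(self):
        self.buf = [0] * (BUFEND + 1)
        self.r0 = BUFSTART      # write location, equated to BUFSTART on start-up
        self.lastaddr = None    # address of the previously stored sample
        self.lastdiff = 0       # slope between the two most recent samples
        self.peakcand = None    # peak candidate awaiting verification

    def on_sample(self, value):
        """Run once per interrupt with the latest a/d converter reading."""
        self.buf[self.r0] = value
        if self.lastaddr is not None:
            diff = value - self.buf[self.lastaddr]
            # Slope was rising and is now falling: the previous sample was a peak.
            if self.lastdiff > 0 and diff < 0:
                self.peakcand = self.lastaddr
            self.lastdiff = diff
        self.lastaddr = self.r0
        # Advance R0, wrapping from BUFEND back to BUFSTART.
        self.r0 = BUFSTART if self.r0 == BUFEND else self.r0 + 1


isr = PeakInterrupt()
for reading in [0, 2, 5, 3, 1, 4, 6, 2]:
    isr.on_sample(reading)
```

In this sketch a flat-topped peak (two equal successive samples) is not reported, since the slope passes through zero rather than changing directly from positive to negative; how such cases are to be treated is not stated here.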

Abstract

There is described an apparatus and method for enhancing the intelligibility of speech comprising means (20) to generate an electrical signal representative of a detected audio signal, means (40) for identifying a plurality of periodic elements comprising said electrical signal, and means (22) for selectively altering the frequency and/or number of said periodic elements in response to signals characteristic of the periodic elements identified so as to thereby generate a modified output signal. There is also described an apparatus and method for processing an electrical signal comprising means for detecting peaks in said electrical signal, means for storing peak-to-peak elements of said electrical signal and means for processing said electrical signal by manipulating one or more of said peak-to-peak elements. There is also described an apparatus and method for detecting turning points within an electrical signal comprising means to periodically sample said electrical signal, means to calculate the difference in amplitude between successive samples, and means to detect when successive values of the said difference in amplitude change in sign.

Description

Processing of electrical and audio signals.
The present invention relates to an apparatus and method for enhancing the intelligibility of speech as well as to a method and apparatus for detecting turning points, particularly peaks, within an electrical signal.
According to a first aspect of the present invention there is provided an apparatus for enhancing the intelligibility of speech comprising means to generate an electrical signal representative of a detected audio signal, means for identifying a plurality of periodic elements comprising said electrical signal, and means for selectively altering the frequency and/or number of said periodic elements in response to signals characteristic of the periodic elements identified so as to thereby generate a modified output signal.
Advantageously means may be provided to assess whether the detected audio signal derives from speech.
Advantageously means may also be provided to identify whether a particular sound comprised within the detected audio signal is of long or short duration. If said sound is identified as being of long duration the number of periodic elements representative of said sound may be reduced, preferably by omitting one or more periodic elements from said modified output signal. However, if said sound is identified as being of short duration the number of periodic elements representative of said sound may be increased, preferably by repeating one or more of said periodic elements when generating said modified output signal. Alternatively, if said sound is identified as being of long duration the frequency of the periodic elements representative of said sound may be increased. Likewise, if said sound is identified as being of short duration the frequency of the periodic elements representative of said sound may be reduced. Preferably the frequency of the periodic elements representative of said sound is altered by inputting and outputting the periodic elements to and from a storage means at different rates.
Advantageously means may be provided for selectively amplifying one or more of said periodic elements.
Advantageously means may also be provided to selectively alter the envelope of one or more of said periodic elements.
Advantageously said periodic elements may be defined as comprising that part of the electrical signal disposed between successive peaks. Alternatively, said periodic elements may be defined as comprising that part of the electrical signal disposed between successive positive zero crossings.
Advantageously the modified output signal may serve as the input for a loudspeaker drive circuit, an inductive loop or a similar outputting means.
Advantageously means may be provided to determine whether the detected audio signal derives from specific non-speech inputs such as door bells, alarms, communication tones and the like. Preferably the output signal is modified in accordance with the specific non-speech input identified and used to initiate an appropriate response.
According to a second aspect of the present invention there is provided a method of enhancing the intelligibility of speech comprising the steps of generating an electrical signal representative of a detected audio signal, identifying a plurality of periodic elements comprising said electrical signal, and selectively altering the frequency and/or number of said periodic elements in response to signals characteristic of the periodic elements identified so as to thereby generate a modified output signal. Advantageously the method may comprise the additional step of assessing whether the detected audio signal derives from speech.
Advantageously the method may comprise the additional step of identifying whether a particular sound comprised within the detected audio signal is of long or short duration. If said sound is identified as being of long duration the number of periodic elements representative of said sound may be reduced, preferably by omitting one or more of said periodic elements from said modified output signal. However, if said sound is identified as being of short duration the number of periodic elements representative of said sound may be increased, preferably by repeating one or more of said periodic elements when generating said modified output signal. Alternatively, if said sound is identified as being of long duration the frequency of the periodic elements representative of said sound may be increased. Likewise, if said sound is identified as being of short duration the frequency of the periodic elements representative of said sound may be reduced. Preferably the frequency of the periodic elements representative of said sound may be altered by inputting and outputting the periodic elements to and from a storage means at different rates.
Advantageously the method may comprise the additional step of selectively amplifying one or more of said periodic elements.
Advantageously the method may comprise the additional step of selectively altering the envelope of one or more of said periodic elements.
Advantageously said periodic elements may be defined as comprising that part of the electrical signal disposed between successive peaks. Alternatively, said periodic elements may be defined as comprising that part of the electrical signal disposed between successive positive zero crossings. Advantageously the method may comprise the additional step of employing the modified output signal as the input for a loudspeaker drive circuit, an inductive loop or a similar outputting means.
Advantageously the method may comprise the additional step of determining whether the detected audio signal derives from specific non-speech inputs such as door bells, alarms, communication tones and the like. Preferably the output signal is modified in accordance with the specific non-speech input identified and used to initiate an appropriate response.
According to a third aspect of the present invention there is provided an apparatus for processing an electrical signal comprising means for detecting peaks in said electrical signal, means for storing peak-to-peak elements of said electrical signal and means for processing said electrical signal by manipulating one or more of said peak-to-peak elements.
Advantageously, said means for detecting peaks in said electrical signal may comprise means for periodically sampling said electrical signal, means to calculate the difference in amplitude between successive samples, and means to detect when successive values of said difference in amplitude change in sign from positive to negative.
Advantageously the validity of a potential peak may be assessed by determining whether the amplitude at the potential peak is greater than a minimum threshold value. The value of the minimum threshold may be derived from the rms value of the electrical signal in which peaks are to be detected or alternatively may be predetermined.
Advantageously the validity of a potential peak may be assessed by determining whether the time interval between the potential peak and a previous verified peak is less than a maximum threshold value. Preferably the value of the maximum threshold may be predetermined. Advantageously the validity of a potential peak may be assessed by comparing the duration of the element thereby defined with those defined by previously verified peaks.
According to a fourth aspect of the present invention there is provided a method of processing an electrical signal comprising the steps of detecting peaks in said electrical signal, storing peak-to-peak elements of said electrical signal and processing said electrical signal by manipulating one or more of said peak-to-peak elements.
Advantageously the step of detecting peaks in said electrical signal may comprise the additional steps of periodically sampling said electrical signal, calculating the difference in amplitude between successive samples, and detecting when successive values of said difference in amplitude change in sign from positive to negative.
Advantageously the validity of a potential peak may be assessed by determining whether the amplitude at the potential peak is greater than a minimum threshold value. The value of the minimum threshold may be derived from the rms value of the electrical signal in which peaks are to be detected or alternatively may be predetermined.
Advantageously the validity of a potential peak may be assessed by determining whether the time interval between the potential peak and a previous verified peak is less than a maximum threshold value. Preferably the value of the maximum threshold value may be predetermined.
Advantageously the validity of a potential peak may be assessed by comparing the duration of the element thereby defined with those defined by previously verified peaks.
According to a fifth aspect of the present invention there is provided an apparatus for detecting turning points within an electrical signal comprising means to periodically sample said electrical signal, means to calculate the difference in amplitude between successive samples, and means to detect when successive values of said difference in amplitude change in sign.
Advantageously the turning point to be detected may comprise a peak and said means to detect a change in sign may be sensitive to detect a change in successive values of said difference in amplitude from positive to negative.
Advantageously the validity of a potential turning point may be assessed by determining whether the modulus of the amplitude at the potential turning point is greater than a minimum threshold value. The value of the minimum threshold may be derived from the rms value of the electrical signal in which turning points are to be detected or alternatively may be predetermined.
Advantageously the validity of a potential turning point may be assessed by determining whether the time interval between the potential turning point and a previous verified turning point is less than a maximum threshold value. Preferably the value of the maximum threshold may be predetermined.
Advantageously the validity of a potential peak may be assessed by comparing the duration of the element thereby defined with those defined by previously verified peaks.
According to a sixth aspect of the present invention there is provided a method for detecting turning points within an electrical signal comprising the steps of periodically sampling said electrical signal, calculating the difference in amplitude between successive samples, and detecting when successive values of said difference in amplitude change in sign.
Advantageously the turning point to be detected may comprise a peak and said step of detecting when successive values of said difference in amplitude change in sign may comprise detecting a change from positive to negative.
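By way of illustration only, the detection of turning points from a change in sign of successive amplitude differences might be sketched as below. The function name and the labels "peak" and "trough" are assumptions made for the example; a change from positive to negative marks a peak and the opposite change marks a trough.

```python
def turning_points(samples):
    """Return (index, kind) pairs for each turning point in the samples."""
    found = []
    last_diff = 0
    for i in range(1, len(samples)):
        # Difference in amplitude between successive samples.
        diff = samples[i] - samples[i - 1]
        if last_diff > 0 and diff < 0:
            found.append((i - 1, "peak"))      # rising, then falling
        elif last_diff < 0 and diff > 0:
            found.append((i - 1, "trough"))    # falling, then rising
        last_diff = diff
    return found


points = turning_points([0, 2, 5, 3, 1, 4, 6, 2])
# points == [(2, "peak"), (4, "trough"), (6, "peak")]
```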
Advantageously the validity of a potential turning point may be assessed by determining whether the modulus of the amplitude of the potential turning point is greater than a minimum threshold value. The value of the minimum threshold may be derived from the rms value of the electrical signal in which turning points are to be detected or alternatively may be predetermined.
Advantageously the validity of a potential turning point may be assessed by determining whether the time interval between the potential turning point and the previous verified turning point is less than a maximum threshold value. Preferably the value of the maximum threshold is predetermined.
Advantageously the validity of the potential peak may be assessed by comparing the duration of the element thereby defined with those defined by previously verified peaks.
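The three validity tests set out above might be combined as in the following sketch, in which failure of any one test rejects the candidate. The function names, the fraction of the rms value used as the minimum threshold, the maximum interval and the tolerance on duration are all assumed values chosen for illustration; no particular figures are implied by the text.

```python
import math


def rms(samples):
    """Root-mean-square value of the samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_valid_turning_point(samples, candidate, last_verified, previous_durations,
                           rms_fraction=0.5, max_interval=200, tolerance=0.5):
    """Apply the amplitude, interval and duration tests to a candidate index."""
    # Test 1: modulus of the amplitude must exceed a threshold derived from rms.
    if abs(samples[candidate]) <= rms_fraction * rms(samples):
        return False
    # Test 2: interval since the previous verified turning point must be
    # less than a maximum threshold (measured here in sample periods).
    duration = candidate - last_verified
    if duration >= max_interval:
        return False
    # Test 3: duration must be comparable with previously verified elements.
    if previous_durations:
        typical = sum(previous_durations) / len(previous_durations)
        if abs(duration - typical) > tolerance * typical:
            return False
    return True


wave = [0, 4, 8, 4, 0, -4, -8, -4, 0, 4, 8, 4]
accepted = is_valid_turning_point(wave, 10, 2, [8, 8])
```

A scoring scheme, in which each test contributes toward a final score rather than acting as a veto, would replace the early returns with a weighted sum.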
A number of embodiments of the present invention will now be described by way of example with reference to the accompanying drawings in which:
Figure 1 is a typical waveform representative of speech;
Figure 2 is a schematic block diagram of an apparatus for enhancing the intelligibility of speech in accordance with a first embodiment of the present invention;
Figure 3 is a schematic block diagram of an apparatus for enhancing the intelligibility of speech in accordance with a second embodiment of the present invention;
Figure 4 is a schematic illustration of a typical electrical signal which may be generated as a result of a detected audio signal;
Figure 5 is a schematic illustration of the signal of Figure 4 showing the signal being sampled at regular time intervals and divided into successive elements with the start of each element being defined at a peak;
Figure 6 illustrates the effect of omitting one of the elements of Figure 5;
Figure 7 illustrates the effect of repeating a first of the elements of Figure 5 to counter the earlier omission;
Figure 8 is a schematic illustration of the signal of Figure 4 showing the signal being sampled at regular time intervals and divided into successive elements with the start of each element being defined on the basis of a positive zero crossing;
Figure 9 illustrates the effect of omitting one of the elements of Figure 8;
Figure 10 illustrates the effect of repeating a first of the elements of Figure 8 to counter the earlier omission;
Figure 11 is a schematic block diagram of an apparatus for identifying peaks in an input signal;
Figure 12 is a flow diagram illustrating the steps of an interrupt programme used by the apparatus for identifying peaks;
Figure 13 is a flow diagram illustrating the steps of a main programme loop used by the apparatus for identifying peaks;
Figure 14 is a flow diagram illustrating the steps of a sub-routine used by the apparatus for detecting peaks; and
Figure 15 is a schematic block diagram of an apparatus for enhancing the intelligibility of speech in accordance with a third embodiment of the present invention.
Upon visual examination of waveforms representative of speech such as that shown in Figure 1 it can be seen that each part of a word is made up of a number of periodically repeating elements 10. It is believed that the intelligibility of speech is dependent upon the number, frequency and amplitude of these periodic elements and that strings of such elements may be identified in terms of actual sounds by virtue of certain characteristics in various frequency bands such as those that extend from 70-210Hz, 210-630Hz, 630-1890Hz and from 1800-5400Hz. Thus, while the shape of an overall sound envelope can be used to identify whether or not a detected sound derives from speech, it is also possible to differentiate between vowels and consonants within that speech by examining the ratio between the high and low frequencies present in each periodic element and counting the number of characteristic elements that serve to make up a particular string.
The apparatus for enhancing the intelligibility of speech to be described below operates by first generating an electrical signal representative of a detected audio signal. This electrical signal is then analysed to determine whether or not the audio signal is derived from speech whereupon if it is so derived the electrical signal is processed in a variety of ways before then being used to generate an audio signal representative of the processed signal. If however as a result of this initial analysis it is determined that the audio signal is not derived from speech, then the electrical signal is processed differently.
In one arrangement the processing step may simply comprise storing the input electrical signal within a temporary store at one rate and outputting it at another. This has the effect of altering the pitch of the resulting audio signal and can be used, for example, to reduce the frequency of high pitched sounds to which the elderly may have become deaf.
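The effect of storing at one rate and outputting at another may be illustrated by simple arithmetic: every frequency in the stored signal is scaled by the ratio of the output rate to the input rate, and the duration is scaled by the inverse. The sketch below assumes a hypothetical function name and example rates; the figure of 13,000 samples a second is used merely as a convenient input rate.

```python
def playback_effect(input_rate_hz, output_rate_hz, tone_hz, n_samples):
    """Return (output tone in Hz, output duration in seconds).

    n_samples captured at input_rate_hz are read out at output_rate_hz.
    """
    ratio = output_rate_hz / input_rate_hz
    return tone_hz * ratio, n_samples / output_rate_hz


# One second of signal stored at 13,000 samples a second and read out at
# half that rate: a 4000 Hz component is heard at 2000 Hz and lasts 2 s.
tone, seconds = playback_effect(13000, 6500, 4000, 13000)
```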
In another arrangement the processing of the electrical signal may include the step of determining the duration of a sound making up the audio signal in terms of the number of characteristic periodic elements of which it is comprised and manipulating those elements in response to that duration. For example, short sounds which are common among consonants, such as the consonants "t" and "d" in the words "two" and "do", may be lengthened by either reducing the rate at which they are output, thereby decreasing their pitch, or by repeating one or more of the periodic elements of which they are comprised. Conversely, long sounds, such as those associated with vowels, may be shortened either by increasing the rate at which they are output, thereby raising their pitch, or by deleting one or more of their constituent periodic elements from the processed signal. In this way the apparatus to be described may serve to selectively process vowel and consonant sounds by altering their pitch and/or duration so as to emphasise the differences between them. This can be of particular value to those with hearing difficulties such as children with Down's Syndrome who commonly have difficulty distinguishing between the sounds of certain consonants.
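A minimal sketch of lengthening and shortening a sound by repeating or deleting periodic elements is given below. It assumes that the sound has already been divided into elements, each held as a list of samples; the function names and sample values are hypothetical.

```python
def lengthen(elements, index, copies=1):
    """Repeat the periodic element at `index` so the sound lasts longer."""
    return elements[:index + 1] + [elements[index]] * copies + elements[index + 1:]


def shorten(elements, drop_every=2):
    """Delete every `drop_every`-th periodic element so the sound is shorter."""
    return [e for i, e in enumerate(elements, start=1) if i % drop_every != 0]


# Each element is a list of samples spanning one period (values are made up).
consonant = [[0, 5, 0, -5], [0, 4, 0, -4]]
vowel = [[0, 9, 0, -9]] * 6

longer = lengthen(consonant, 0, copies=2)   # first element now occurs three times
shorter = shorten(vowel, drop_every=3)      # every third element removed
```

Because whole periods are repeated or removed, the period of each remaining element, and hence the pitch, is left unchanged.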
The apparatus to be described may also process an input signal in such a way as to selectively increase the amplitude of particular sounds thereby providing them with an increased emphasis. It has been found, for example, that for the hard of hearing it is particularly difficult to distinguish between leading consonants. By amplifying and/or duplicating the periodic elements that comprise these parts of speech it is possible to emphasise the sounds concerned and so improve their intelligibility. The rhythm of speech also provides an aid to its intelligibility and this rhythm may be emphasised by selectively increasing the amplitude of sounds originating at the start of a word in preference to those that occur later. Graphical tests would appear to indicate that a phrase or word is easiest to understand when its sound envelope is generally "hump" shaped. The apparatus to be described is designed to be capable of manipulating the periodic elements of which an audio signal is comprised so as to give rise to sound envelopes having this shape and to do so in real time.
As such the apparatus to be described has many applications, most notably in the improvement of hearing aids as well as in speech intelligibility circuits for use with telephones and in amplifiers for radio, television and public address systems. There is also scope for the use of the apparatus in audio processing headphones and in recording systems for audio tapes or telephone messages.
Referring to Figure 2 there is shown a block diagram of an apparatus for enhancing the intelligibility of speech in accordance with an embodiment of the first aspect of the present invention. In particular the apparatus can be seen to comprise a means 20 for generating an electrical signal representative of a detected audio signal, such as a microphone, and a processing unit 22, such as a CPU 80535 or an 8751 microcontroller having separate a/d converters. The output of the electrical signal generating means 20 is connected via an amplitude gain and compression circuit 24 to the input of a number of filters 26, 28 and 30 each having a differing passband. A first of these filters 26 has a passband of from 0.1 to 4kHz, the output of which is rectified and smoothed by a rectifying circuit 32. The resulting DC signal is proportional to the amplitude of the detected audio signal and is passed from the rectifying circuit 32 to an a/d converter contained within the processing unit 22.
The output of the second filter 28, which has a passband of from 200 to 800Hz, is amplified in an amplifying circuit 34 before being rectified and smoothed by a rectifying circuit (not shown). The resulting DC signal is then digitised in the processing unit 22 by a further a/d converter. The digitised signal may be used to determine whether the input audio signal derives from speech.
In contrast to the first and second filters 26 and 28 the third filter 30 has a passband of from 50Hz to 5kHz and its output is amplified by a gain control circuit 36 whose gain is controlled by the processing unit 22 using a signal level derived from the output of the first filter 26. This ensures that the analyser within the processing unit 22 operates at a more constant signal level. The signal output from the variable gain amplifier 36 is digitised and passed to the processing unit 22 via an input store 38 such as a FIFO. It is this signal that constitutes the main audio signal to be processed by the processing unit 22.
As well as the three filters 26,28 and 30, the output of the amplitude gain and compression circuit 24 is also connected to a zero crossing detector 40. The output of the zero crossing detector 40 is digitised within the processing unit 22 and it is from this digitised signal that the processing unit 22 is able to identify periodic elements comprised within the audio signal. Thereafter an analysis of the ratio of the amplitudes of their high and low frequency components can be used to determine whether the periodic elements concerned derive from a vowel or a consonant. The output of the zero crossing detector 40 when digitised can also be used to assist in the synchronisation of subsequent processing.
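The use of the ratio of high to low frequency amplitudes to label a periodic element might be sketched as follows. The threshold value, the labels and the treatment of a silent element are assumptions made for illustration; the sketch further assumes, in line with the description of short sounds of high frequency and long sounds of low frequency, that consonants carry relatively more high frequency energy than vowels.

```python
def classify_element(low_band_level, high_band_level, threshold=1.0):
    """Label a periodic element from the ratio of its high to low frequency amplitudes."""
    if low_band_level <= 0:
        # No low frequency content at all: avoid dividing by zero.
        return "consonant" if high_band_level > 0 else "silence"
    ratio = high_band_level / low_band_level
    return "consonant" if ratio > threshold else "vowel"


# Hypothetical rectified band levels for two elements.
first = classify_element(low_band_level=10.0, high_band_level=2.0)
second = classify_element(low_band_level=2.0, high_band_level=10.0)
```

In practice the threshold would need to be chosen by experiment and might differ between speakers.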
Thus the first and second filters 26 and 28 and the zero crossing detector 40 provide the processing unit 22 with signals indicative of the content of the input audio signal which can be used in processing the bandwidth reduced signal held in the FIFO store 38. In particular the output from the first filter 26 provides a signal indicative of the amplitude of the input audio signal; the output from the second filter 28 can be analysed to assess whether the input audio signal derives from speech; and the output from the zero crossing detector 40 can be used to identify periodic elements comprised within the audio signal and to determine whether particular sounds are of long or short duration. The processing unit 22 reacts to the results of this analysis and selects the stored processing algorithms appropriate to the speech content revealed by the analysis. As the contents of the FIFO store 38 are necessarily delayed with respect to the outputs of the first and second filters 26 and 28 and the zero crossing detector 40, the processing unit 22 can process the amplified output from the third filter 30 according to an analysis of the signal as it is stored rather than according to an analysis of the signal several samples earlier.
Once the contents of the FIFO store 38 have been processed by the processing unit 22 in accordance with the stored algorithms the processed signal is passed to a second FIFO store 42. The output of the second FIFO store 42 is subsequently converted by a d/a converter 44 to provide a signal capable of driving an output means 46 such as a loud speaker or headphones or which is capable of operating some other electronic equipment. In such an arrangement, the input clock rate of the first FIFO store 38 and the output clock rate of the second FIFO store 42 may be varied by the processing unit 22 via a programmable clock generator 48.
The apparatus shown in Figure 2 also comprises an LCD display 52 which is connected to and operated by the processing unit 22. This LCD display 52 may be used to show a representation of the sound envelope being received and/or an alphabetic representation of any recognised speech. Means may also be provided to provide a printout of this display thereby providing a record of the detected audio signal.
In addition the processing unit 22 may control three LED's 54,56 and 58 which may be used to indicate when the processing unit 22 is either recognising speech, rejecting the input signal or unable to keep up with the incoming speech or signal.
In use amplified samples of the output from the third filter 30 are stored in the first FIFO 38 on each pulse of its write clock. The output of the first FIFO 38 is connected to the input of the second FIFO 42 and to an input/output port of the processing unit 22. The processing unit 22 has separate control of the read and write functions of the first and second FIFO's 38 and 42 respectively and can elect merely to transfer data directly from the first FIFO 38 to the second FIFO 42. Where the write clock rate of the first FIFO 38 differs from the read clock rate of the second FIFO 42, the pitch of the audio signal emanating from the output means 46 is altered with respect to that detected by the input means 20, enabling the apparatus to be used, for example, to reduce the frequency of high pitched sounds to which the elderly may have become deaf.
Alternatively the output from the first FIFO 38 may be operated on by the processing unit 22 in accordance with the algorithms stored within it. In particular the processing unit 22 may be used to determine whether the signal passed by the third filter 30 is derived from speech. This is done by determining whether the period of the signal output from the third filter 30 lies between predetermined limits, by checking the rate of change of the amplitude and period of adjacent elements and by checking the repetitiveness of the waveform. If, as a result of this process, the processing unit 22 determines that the signal output from the third filter 30 is not derived from speech, the samples stored in the first FIFO 38 are read out to be either discarded or used for some other purpose. Assuming that it is determined that the output from the third filter 30 derives from speech, the output from the first FIFO 38 may be further operated on by the processing unit 22 to determine whether the particular sounds making up the speech are of long or short duration. This can be achieved by measuring the ratio of the amplitude of high frequencies to that of low frequencies while a classification of the sound in question may be obtained by examining the duration and shape of its envelope. If it is determined that the signal output from the third filter 30 derives from a long sound of low frequency the samples stored in the first FIFO 38 may be fed directly to the second FIFO 42 whereupon the pitch of the audio signal generated by the output means 46 may be raised and its duration shortened by increasing the read clock rate of the second FIFO 42. Alternatively, the duration of the sound may be shortened without raising its pitch by omitting some of the periodic elements of which it is comprised when transferring the samples stored in the first FIFO 38 to the second FIFO 42.
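The test for speech set out at the start of the preceding paragraph might be sketched as follows. Every limit shown is a hypothetical placeholder rather than a value from the apparatus, and the check on the repetitiveness of the waveform is represented here only by requiring the period and amplitude of adjacent elements to change slowly.

```python
def looks_like_speech(periods, amplitudes,
                      min_period=0.002, max_period=0.014,
                      max_period_change=0.25, max_amplitude_change=0.5):
    """Return True if successive periodic elements behave like speech.

    periods are in seconds and amplitudes in arbitrary units; all of the
    limits are assumptions made for this sketch.
    """
    if len(periods) < 2 or len(periods) != len(amplitudes):
        return False
    # The period of every element must lie between the predetermined limits.
    if any(not (min_period <= p <= max_period) for p in periods):
        return False
    # Adjacent elements must not change too quickly in period or amplitude.
    for i in range(1, len(periods)):
        period_change = abs(periods[i] - periods[i - 1]) / periods[i - 1]
        reference = max(amplitudes[i - 1], 1e-12)
        amplitude_change = abs(amplitudes[i] - amplitudes[i - 1]) / reference
        if period_change > max_period_change or amplitude_change > max_amplitude_change:
            return False
    return True


steady = looks_like_speech([0.008, 0.0082, 0.0081], [1.0, 1.1, 1.05])
erratic = looks_like_speech([0.008, 0.012], [1.0, 1.0])
```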
If, on the other hand, it is determined that the signal output from the third filter 30 derives from a short sound of high frequency the samples stored in the first FIFO 38 may again be fed to the second FIFO 42 via the processing unit 22 and the audio signal generated by the output means 46 "stretched" by the insertion of one or more additional copies of the periodic elements represented by the samples stored in the first FIFO 38. In addition, or as an alternative, the read clock rate of the second FIFO 42 may be reduced under the control of both the processing unit 22 and the clock generator 50 so as to lower the pitch and increase the duration of the audio signal generated by the output means 46 thereby enabling the apparatus to assist those with high frequency hearing loss.
In addition to performing these algorithms, the processing unit 22 may also be used to amplify selected sounds to help differentiate between, for example, similar sounding consonants. The envelopes of selected sounds may also be altered to aid the listener.
A second embodiment of an apparatus to enhance the intelligibility of speech is shown in Figure 3. This second embodiment is similar to the first and where appropriate the same reference numerals have been used to identify corresponding components. Unlike the first embodiment however, the apparatus shown in Figure 3 includes a portable computer 62 such as a PSION which may be used via a serial adaptor 64 to relay to the processing unit 22 new or predetermined values of variables employed in the processing algorithms. In addition inputs to the processing unit 22 may be used to read switches and links 66 as well as to monitor the listener's speed controls 68 while the outputs of the processing unit 22 may be used to operate parallel interfaces for printers, vibrators or electrodes 70 by way of a buffer 72. It is to be noted that by applying the processed speech to vibrators or electrodes it is possible to aid the intelligibility of speech for the profoundly deaf.
In another embodiment the processing unit 22 may insert sample values into the second FIFO 42 which are based on modifications of the samples stored in the first FIFO 38. Alternatively, the processing unit 22 may insert sample values into the second FIFO 42 which are based on either modifications of the filtered input audio signal, predetermined values which may be representative of an idealised sound element, or values representative of silence.
In another embodiment the apparatus may be provided with a second means for generating an electrical signal representative of a detected audio signal so as to thereby obtain directional information relating to its origin.
One of the problems with the use of a zero crossing detector to define the various periodic elements of which an input signal is comprised is that some speech waveforms include several zero crossings within each periodic element. This necessitates the use of more elaborate algorithms or the introduction of CPU control of either the zero crossing detection time constant or the zero crossing hysteresis. Thus the zero crossing detector may be inhibited from detecting another zero crossing for a predetermined time following a first such detection or alternatively until after the amplitude of the signal has passed through a predetermined range of values.
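A software sketch of a positive zero crossing detector with these two forms of inhibition is given below. The parameter names `holdoff` and `hysteresis`, and the sample values, are assumptions made for illustration; in the apparatus itself the detector may equally be implemented in hardware.

```python
def positive_zero_crossings(samples, holdoff=0, hysteresis=0):
    """Indices where the signal rises through zero, with two optional guards.

    holdoff:    number of samples after a detection during which further
                crossings are ignored.
    hysteresis: the signal must first fall below -hysteresis before a new
                crossing is accepted.
    """
    crossings = []
    armed = True
    last = -holdoff - 1
    for i in range(1, len(samples)):
        if samples[i] < -hysteresis:
            armed = True
        rising = samples[i - 1] < 0 <= samples[i]
        if rising and armed and i - last > holdoff:
            crossings.append(i)
            last = i
            # With no hysteresis the detector stays armed at all times.
            armed = hysteresis == 0
    return crossings


# Made-up waveform with a small ripple around zero shortly after the first crossing.
wave = [-5, 3, -1, 2, 6, 2, -6, -2, 4, 5]
unguarded = positive_zero_crossings(wave)
with_hysteresis = positive_zero_crossings(wave, hysteresis=2)
with_holdoff = positive_zero_crossings(wave, holdoff=3)
```

In this example either guard removes the spurious crossing caused by the ripple, leaving one crossing per periodic element.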
In a further embodiment, rather than defining each periodic element of a sound as being from one positive zero crossing point to the next, each element may instead be defined as extending from peak to peak. This re-definition can have a significant impact upon the result of the subsequent manipulation of the electrical signal since at a peak the rate of change of the waveform is considerably less than that at a zero crossing. This means that a small timing error between the occurrence of the true signal peak and the time at which the signal is next sampled will have less effect on the perception of the processed signal than would the same timing error between a true zero crossing and the taking of the next sample. This is particularly so when the signal to be sampled is a non-symmetrically varying waveform.
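The effect of a timing error at a peak and at a zero crossing can be compared numerically for the simple case of a sine wave of unit amplitude. The frequency and timing error used below are hypothetical figures chosen only for illustration.

```python
import math


def amplitude_error(frequency_hz, timing_error_s, at_peak):
    """Change in a unit sine wave over timing_error_s, starting at a peak or a zero crossing."""
    phase = 2 * math.pi * frequency_hz * timing_error_s
    if at_peak:
        return abs(1.0 - math.cos(phase))   # value falls from 1 to cos(phase)
    return abs(math.sin(phase))             # value rises from 0 to sin(phase)


# Hypothetical case: a 500 Hz component and a timing error of 35 microseconds,
# half of a 70 microsecond sampling interval.
error_at_peak = amplitude_error(500.0, 35e-6, at_peak=True)
error_at_zero = amplitude_error(500.0, 35e-6, at_peak=False)
```

For small timing errors the error at a zero crossing grows in proportion to the timing error whereas the error at a peak grows only with its square, which is why the discontinuity introduced when elements are joined at their peaks is considerably smaller.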
The difference in the processed signal as a result of redefining the elements to be manipulated as extending from peak-to-peak rather than from zero crossing-to-zero crossing is shown schematically in Figures 4 to 10. Figure 4 illustrates a typical electrical signal which may be generated as a result of a detected audio signal. Figure 5 shows this signal being sampled at regular time intervals and divided into successive elements with the start of each element being defined at a peak. Figure 6 shows the effect of omitting one of these elements as a result of a typical processing step while Figure 7 shows the effect of repeating a first of the elements to counter the earlier omission. As can be seen, the processed waveform of Figure 7 shows a close correspondence to that of Figure 4.
In contrast Figure 8 shows the signal of Figure 4 being sampled at regular time intervals but divided into successive elements on the basis of a positive zero crossing. In analogy with Figures 6 and 7, Figure 9 shows the result of omitting one of the elements from the waveform of Figure 8 while Figure 10 shows the result of repeating a first of the elements to compensate for this earlier omission. As can be seen, the processed waveform in Figure 10 does not correspond as closely with that shown in Figure 4 as does that shown in Figure 7.
In order to take advantage of the improved distortion characteristics of a signal whose elements are manipulated after having first been defined as extending from peak-to-peak it is of course necessary to provide a means for identifying those peaks. One such means is shown schematically in Figure 11 and will be described with reference to the flow charts of Figures 12 to 14.
Referring to Figure 11, a means for identifying peaks in an input signal can be seen to comprise an a/d converter 80 and a microprocessor 82. The signal in which peaks are to be identified is fed to the input of the a/d converter 80 which continuously digitises the incoming signal and stores the resulting data in an output buffer. At predetermined intervals, typically of the order of 70 microseconds, the microprocessor 82, operating under an interrupt programme, reads the data stored in the output buffer of the a/d converter 80 and writes the data into its own input buffer at a location R0 which is incremented upon the execution of each interrupt so as to circulate the location within the input buffer between BUFSTART (to which R0 is equated on start up) and BUFEND.
As can be seen from Figure 12, within the interrupt programme the difference between the most recent data and that previously written into the input buffer is calculated and stored as a variable LASTDIFF. This variable represents the slope of the signal in which peaks are to be identified and its sign signifies whether the waveform is increasing, i.e. approaching a peak, or decreasing, i.e. moving away from a peak. Thus when successive values stored in the variable LASTDIFF change in sign from being positive to negative the interrupt programme signals that a peak has been detected whose address is that of the previous value of R0 and which is now stored as the variable LASTADDR. This address is copied and stored as the variable PEAKCAND (peak candidate) to be verified by the main programme loop illustrated in Figures 13 and 14. Thereafter, before exiting from the interrupt programme, the present value of R0 is stored as the variable LASTADDR and R0 is incremented to the next position in the input buffer or, if the present value of R0 is equal to BUFEND, is reset to BUFSTART.
If no signal is present or if the waveform is of longer duration than the length of the input buffer, data written into the input buffer can start to overwrite that of a previous cycle before it is removed by the main programme loop. This situation can be detected by comparing the address in R0 with that of the variable LASTPEAK (the start of the signal element which has yet to be processed by the main programme loop) and if they are equal the values in the input buffer may be rejected by setting LASTPEAK equal to PEAKCAND.
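The interrupt programme described above might be modelled in software as follows. This is a sketch under stated assumptions rather than a transcription of Figure 12: the helper `last_value`, the point at which the overrun comparison is made and the absence of any special handling for flat-topped peaks are all choices made for illustration.

```python
class PeakInterrupt:
    """Software model of the sampling interrupt; addresses run from BUFSTART to BUFEND."""

    BUFSTART = 1          # address 0 is reserved to mean "invalid"

    def __init__(self, length):
        self.BUFEND = self.BUFSTART + length - 1
        self.buffer = [0] * (self.BUFEND + 1)
        self.R0 = self.BUFSTART
        self.LASTADDR = 0
        self.LASTDIFF = 0
        self.PEAKCAND = 0
        self.LASTPEAK = 0
        self.last_value = 0   # hypothetical helper holding the previous sample

    def interrupt(self, sample):
        """Store one sample and flag a peak candidate on a positive-to-negative slope change."""
        self.buffer[self.R0] = sample
        diff = sample - self.last_value
        if self.LASTDIFF > 0 and diff < 0:
            # The previous sample, whose address is in LASTADDR, was a peak.
            self.PEAKCAND = self.LASTADDR
        self.LASTDIFF = diff
        self.last_value = sample
        self.LASTADDR = self.R0
        # Advance the write location, wrapping from BUFEND back to BUFSTART.
        self.R0 = self.BUFSTART if self.R0 == self.BUFEND else self.R0 + 1
        # Overrun (assumed placement): the next write would overwrite the
        # start of the element that the main loop has yet to process.
        if self.R0 == self.LASTPEAK:
            self.LASTPEAK = self.PEAKCAND


isr = PeakInterrupt(length=8)
for sample in [0, 3, 7, 4, 1, 2, 6, 5]:
    isr.interrupt(sample)
```

After the eight samples shown, the most recent peak candidate is the address holding the value 6 and the write location has wrapped round to BUFSTART.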
The end of the interrupt programme allows the main programme loop to resume execution in a manner to be described. However, since the interrupt programme typically occurs every 70 microseconds, it is essential that the interrupt programme is capable of being executed in a much shorter time. To this end a microcontroller and software instructions are chosen that will result in an execution time for the interrupt programme of approximately 30 microseconds or less. This leaves the main programme loop with approximately 40 microseconds between successive interrupts, so that the period available in which to process and output the data representative of a signal element may be calculated by multiplying this figure by the number of programme interrupts performed during the propagation of that element. If for any reason this period should prove to be insufficient the programme interrupt may be performed at slightly longer intervals.
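The calculation of the time available to the main programme loop can be illustrated as follows. The default figures of 70 and 30 microseconds are those mentioned above; the function name and the example element duration are assumptions.

```python
def main_loop_budget_us(element_period_us, interrupt_interval_us=70.0, interrupt_time_us=30.0):
    """Time left to the main loop while one signal element arrives, in microseconds."""
    interrupts = int(element_period_us // interrupt_interval_us)
    spare_per_interrupt = interrupt_interval_us - interrupt_time_us
    return interrupts * spare_per_interrupt


# Hypothetical element lasting 7,000 microseconds (a pitch of about 143 Hz):
# 100 interrupts occur, each leaving about 40 microseconds spare.
budget = main_loop_budget_us(7000.0)
```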
The way in which a peak candidate is verified as being a true peak, thereby enabling the data stored between the addresses PEAKCAND and LASTPEAK to be processed, is illustrated in the flow chart of Figure 13. Since the interrupt programme could conceivably store a new address in the variable PEAKCAND at any time it is necessary to copy this address into a new variable THISPEAK during the verification process so that a new PEAKCAND will not be lost. Since each of these variables represents an address in the input buffer between BUFSTART and BUFEND a value of 0 may be used to identify an invalid peak candidate or variable. Thus having checked that the variable PEAKCAND is not invalid and having set THISPEAK equal to PEAKCAND the main programme loop enters a sub-routine to test the validity of the peak candidate which is shown in more detail in Figure 14.
As can be seen, the sub-routine determines whether the sample represented by the variable THISPEAK comprises the end of a signal element in accordance with the amplitude and duration of the element that this would entail and clears a valid peak flag and sets PEAKCAND to 0 if the candidate is rejected. This has the effect of clearing the accumulator and prevents the main programme loop from entering the sub-routine until another peak candidate is identified by the programme interrupt.
The test to determine whether the amplitude of a prospective signal element is too small may comprise the subtracting of a value from the data stored at the address represented by the variable THISPEAK. The value that is subtracted in this way may be a fixed value or alternatively may be based on the rms value of the input signal as determined by itself or by another microcontroller.
The duration of a prospective signal element may be determined from the difference between the addresses represented by the variables THISPEAK and LASTPEAK and which in turn will equate to the number of programme interrupts performed during the propagation of the signal element. Since the time between programme interrupts is set by the interrupt timer and is typically of the order of 70 microseconds, a fixed value of 4 can be used to test that the frequency of the prospective signal element is less than about 5000Hz. Alternatively, the difference between the addresses represented by the variables THISPEAK and LASTPEAK can be compared with the number of cycles contained in the actual fundamental input frequency. The actual fundamental input frequency can be found by relating the amplitudes of rms values of the input after it has been filtered through bandpass filters having different passbands preferably under the control of a separate microcontroller.
Also included in the sub-routine is a test to check that the address represented by the variable LASTPEAK does not equate to 0. This situation can arise when the input buffer does not already contain a valid peak such as will be the case at system start-up. However by setting the variable LASTPEAK equal to THISPEAK at least a signal element will be processed on the next valid peak.
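The amplitude, duration and start-up tests of the sub-routine might be combined as in the following sketch. The threshold values, the use of a dictionary to hold the variables and the clearing of PEAKCAND in the start-up case are assumptions made for illustration and are not taken from Figure 14.

```python
def element_length(this_peak, last_peak, bufstart, bufend):
    """Number of samples from last_peak to this_peak in a circular buffer."""
    size = bufend - bufstart + 1
    return (this_peak - last_peak) % size


def verify_peak(state, buffer, bufstart, bufend, min_amplitude=10, min_length=4):
    """Return True if state['THISPEAK'] ends a valid element; thresholds are hypothetical."""
    this_peak = state["THISPEAK"]
    if state["LASTPEAK"] == 0:
        # No earlier peak (for example at start-up): remember this one so that
        # an element can be processed on the next valid peak (assumed handling).
        state["LASTPEAK"] = this_peak
        state["PEAKCAND"] = 0
        return False
    too_small = buffer[this_peak] - min_amplitude < 0
    too_short = element_length(this_peak, state["LASTPEAK"], bufstart, bufend) < min_length
    if too_small or too_short:
        state["PEAKCAND"] = 0      # reject the candidate
        return False
    return True


# Made-up buffer contents; address 0 is unused so that 0 can mean "invalid".
buffer = [0, 0, 3, 50, 4, 1, 2, 60, 5]
state = {"PEAKCAND": 3, "THISPEAK": 3, "LASTPEAK": 0}
at_start_up = verify_peak(state, buffer, bufstart=1, bufend=8)

state.update(PEAKCAND=7, THISPEAK=7)
second_peak = verify_peak(state, buffer, bufstart=1, bufend=8)
```

The modulo operation in `element_length` allows for an element that wraps from BUFEND back to BUFSTART.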
In another arrangement the sub-routine may also include a test to compare the duration of a prospective signal element with those that have been verified previously. This can be achieved using a variable ELELNGTH which is incremented by one during the performance of each programme interrupt. The variable ELELNGTH could also then be made available to the main programme loop to enable an investigation to be conducted should a peak not be detected when expected. Likewise, the variable could be used to prepare the main programme loop for signal elements of longer than usual duration or for the start of a new word. In either case ELELNGTH is set to zero when a prospective peak is verified.
Whatever the nature of the tests included in the sub-routine the tests may be applied separately or plurally. In arrangements where several tests are applied a yes/no/? technique can be employed where a failure of any criterion will result in the peak candidate being declared invalid. Alternatively a scoring method can be adopted where each criterion will contribute toward a final score. Assuming that the test peak sub-routine returns with a positive verification of the peak and with PEAKCAND not equal to zero, the data stored between the addresses represented by the variables LASTPEAK and THISPEAK is moved from the input buffer to an output buffer during which the signal element that it represents may be processed in any of the ways previously described and in particular may be either duplicated, erased or modified in some other way. Preferably the output buffer is itself automatically output and this can be achieved by using a FIFO as previously described with its read clock operated by hardware and its output inhibited should it become empty.
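The two ways of combining several tests mentioned above, namely rejection on the failure of any criterion and a scoring method, can be contrasted in a short sketch. The test names, weights and pass mark are hypothetical.

```python
def all_or_nothing(results):
    """Yes/no technique: any failed criterion invalidates the peak candidate."""
    return all(results.values())


def scored(results, weights, pass_mark):
    """Scoring method: each passed criterion adds its weight to a final score."""
    score = sum(weights[name] for name, passed in results.items() if passed)
    return score >= pass_mark


# Hypothetical outcome of three tests on one peak candidate.
results = {"amplitude": True, "duration": True, "consistency": False}
weights = {"amplitude": 2, "duration": 2, "consistency": 1}

strict = all_or_nothing(results)
lenient = scored(results, weights, pass_mark=4)
```

With these figures the candidate is rejected by the first technique but accepted by the second, since the two tests that it passes together reach the pass mark.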
Although the manipulation of speech elements defined on a peak-to-peak basis was conceived in conjunction with an apparatus for enhancing the intelligibility of speech, the above method and apparatus clearly has far wider implications. For example, the omission of cycles at the transmit end of a communications system can be used as a means of reducing the amount of traffic on the communications link. This would be particularly valuable for landlines and under sea cables where transmission at high frequency is not possible. At the receiving end of the link duplicate cycles could be inserted to renovate the signal. It is also considered that the fidelity of most music would not be badly affected by the same processing and that the concept could be applied to affect the storage of audio signals, for example in audio record and play-back.
A further use for the peak detector described would be to provide "scrambling" of speech by changing the order of signal elements before transmission and sorting the elements into the original order at the receiver.
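One possible form of such scrambling is sketched below, in which the order of the elements is derived from a key known to both the transmitter and the receiver. The use of a shared key and of a seeded pseudo-random shuffle are assumptions made for illustration; such a scheme would not by itself provide strong security.

```python
import random


def scramble(elements, key):
    """Reorder signal elements with a permutation derived from a shared key."""
    order = list(range(len(elements)))
    random.Random(key).shuffle(order)
    return [elements[i] for i in order]


def unscramble(received, key):
    """Regenerate the same permutation at the receiver and invert it."""
    order = list(range(len(received)))
    random.Random(key).shuffle(order)
    restored = [None] * len(received)
    for position, original_index in enumerate(order):
        restored[original_index] = received[position]
    return restored


# Four made-up signal elements, each a list of samples between two peaks.
elements = [[1, 2], [3, 4], [5, 6], [7, 8]]
recovered = unscramble(scramble(elements, key=42), key=42)
```

Since each element begins and ends at a peak, the reordered elements join with comparatively small discontinuities.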
Some public auditoriums are provided with inductive loop systems to enable an audience wearing specially adapted headsets or hearing aids with a Telecoil pickup to hear the speakers. The apparatus described could be provided in the amplifier so that the signal received by the audience would be of an improved intelligibility. Similar considerations would apply to radio hearing aids or infra-red beam devices in which the inductive loop is replaced by a radio or infra-red link. Such systems allow the listener to adjust the characteristics of the output remotely whilst at the same time giving interference free listening provided of course that in the case of an infra-red system there is a direct path for the infra-red beam. Broadcasting studios could also be fitted with the described apparatus to improve the intelligibility of their programmes to many of their audience. Furthermore telephone receivers for the hard of hearing could benefit from the apparatus while telephone repeater amplifiers could be programmed to choose which of the incoming audio signals to re-transmit or improve and which to reject as too noisy.
An example of an apparatus for enhancing the intelligibility of speech by manipulating speech elements defined on a peak-to-peak basis is shown in Figure 15. Unlike the apparatus previously described with reference to Figures 2 and 3, the apparatus of Figure 15 is shown to comprise two separate microcontrollers 100 and 102 although, as will be apparent to those skilled in the art, this need not necessarily be the case.
As in the previous apparatus, means 104, such as a microphone, are provided to generate an electrical signal representative of a detected audio signal. This electrical signal is then passed via a pre-amplifier circuit 106 to an amplitude compressor circuit 108 where the signal is combined with that generated by a second generating means 110 responsive to higher amplitude audio frequency signals. The amplitude compressor circuit 108 serves to reduce the dynamic range of the signals which it receives and provides an output which is then used by both the system control microcontroller 100 and the speech processing microcontroller 102. The system control microcontroller 100 is used to determine whether the input audio signal derives from speech and if so to determine which of a predetermined number of speech processing algorithms are to be performed on the signal data which is stored in the speech processing microcontroller 102. To this end the output from the amplitude compressor circuit 108 is fed to four bandpass filters 112, 114, 116 and 118 whose outputs are rectified by respective rectifying circuits 120, 122, 124 and 126 to provide a four channel a/d converter 128 with slowly varying DC signals indicative of the audio signal detected by the generating means 104 and 110. The digitised signals produced by the a/d converter 128 are passed to the microcontroller 100 whereupon the microcontroller 100 performs the steps necessary to determine whether the detected audio signal derives from speech and if so whether a particular sound making up the signal is of long or short duration. The steps involved in this analysis have already been described in relation to previous apparatus and will not be described here. However, as a result of this analysis the microcontroller 100 is able to select the algorithms to be performed on the data stored in the speech processing microcontroller 102.
Control data representative of the algorithms to be performed is then directed to a first FIFO 130. Alternatively, the decision as to which of the algorithms to employ may be made as a result of other data received by the system control microcontroller 100 such as by means of analogue input values, input switch data or RS232 commands. In another arrangement a digital display 132 operated by the microcontroller 100 may be provided for diagnostics purposes during the development of the apparatus and to display characteristics of the detected audio signal during operation.
As well as being connected to the four bandpass filters 112, 114, 116 and 118, the output of the compressor circuit 108 is also connected to a low pass filter and track/hold circuit 134. The low pass filter contained within circuit 134 may be programmed so as to pass frequencies up to a predetermined maximum within the range from 1kHz to 25kHz. As a result the a/d converter 136 to which the circuit 134 is connected may typically be prevented from operating on frequencies in excess of 6kHz. The output of the a/d converter 136, which typically samples incoming signals at a rate of 13,000 a second, is connected to the speech processing microcontroller 102. Data derived from these samples is input into an internal buffer by the speech processing microcontroller 102 under an internal timer interrupt to ensure that an accurate reconstruction of the sampled signal will be possible once it has been processed. As has already been described with reference to Figures 11 to 14, the microcontroller 102 may be programmed to set software flags and/or other values when a peak is detected in the input signal. A background software task uses this information together with the control data stored in the first FIFO 130 to identify the data values that relate to a single signal element and to decide whether to omit that element, transfer it to a second output FIFO 138, duplicate it in the output FIFO 138 or modify the data values in some way before only then transferring them to the output FIFO 138.
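The decision taken by the background task for each signal element might be sketched as follows. The four control codes, the use of a simple gain as the modification and the sample values are assumptions made for illustration; the form of the control data actually held in the first FIFO 130 is not specified here.

```python
from collections import deque

OMIT, TRANSFER, DUPLICATE, MODIFY = range(4)   # hypothetical control codes


def process_element(element, control, output_fifo, gain=2):
    """Apply one control code to a signal element and load the output FIFO."""
    if control == OMIT:
        return
    if control == MODIFY:
        # Modify the data values before transferring them (here, a simple gain).
        element = [sample * gain for sample in element]
    output_fifo.extend(element)
    if control == DUPLICATE:
        output_fifo.extend(element)


control_fifo = deque([TRANSFER, DUPLICATE, OMIT, MODIFY])
output_fifo = deque()
for element in ([1, 2], [3, 4], [5, 6], [7, 8]):
    process_element(element, control_fifo.popleft(), output_fifo)
```

Here the first element is transferred unchanged, the second appears twice, the third is omitted and the fourth is amplified before being transferred.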
For the purpose of some applications the data memory of the microcontroller 102 may be used in an intermediate output stage to increase the amount of data capable of being stored prior to its output by the second FIFO 138. This enables several words of speech to be stored for further modification or for outputting without further modification or stored for outputting at a later time in response to a switch or serial command or in response to a signal included in the analogue input.
The output FIFO 138 may characteristically be loaded with a number of consecutive data values until such a time when loading stops whilst another signal element is processed. Despite this data may be output from the FIFO 138 at regular intervals with the aid of a clock from a programmable frequency generator 140. This processed data is then converted to an analogue signal by a d/a converter 142 whereupon any high frequency noise may be removed by a filter 144 and the resulting signal either expanded in amplitude range by an expander 146 or left at a reduced amplitude range to assist the hard of hearing who may typically be unable to hear the whole range of sound amplitudes. This resulting signal may be used to generate an output audio signal by serving either as the input for a loudspeaker drive circuit 148, as the input for another audio circuit 150 or as the input for an inductive loop 152. It is to be noted however that an advantage of using the present apparatus in conjunction with an inductive loop is that the inductive loop provides a direct input to many hearing aids having a Telecoil input thereby enabling a processed audio signal to be fed to a hard of hearing audience while an unprocessed audio signal may still reach unimpaired members in the normal way.
In the case where the apparatus is used to change high frequencies which would normally fall outside the hearer's range of hearing to a somewhat lower frequency, the data output FIFO 138 may simply be read at a slower clock rate and the low pass filter 144 set to attenuate any high frequencies still present after the operation of the d/a converter 142.
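The effect of reading the output FIFO at a slower clock can be illustrated with a short Python sketch. It assumes, purely for illustration, that data is written at the typical 13,000 samples a second and read out at half that rate; the function names are hypothetical. Every frequency component is scaled by the ratio of the read rate to the write rate.

```python
import math


def shifted_frequency(f_in_hz, write_rate_hz, read_rate_hz):
    """Frequency heard when samples stored at write_rate_hz are read out
    at read_rate_hz."""
    return f_in_hz * read_rate_hz / write_rate_hz


def estimate_frequency(samples, playback_rate_hz):
    """Estimate the dominant frequency by counting positive-going zero
    crossings over the playback duration."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * playback_rate_hz / len(samples)


if __name__ == "__main__":
    write_rate = 13000  # samples a second into the FIFO
    read_rate = 6500    # assumed slower read clock
    tone = [math.sin(2 * math.pi * 1000 * n / write_rate + 0.1)
            for n in range(write_rate)]
    print(shifted_frequency(1000, write_rate, read_rate))
    print(estimate_frequency(tone, write_rate))
    print(estimate_frequency(tone, read_rate))
```

The stored samples are unchanged; only the clock used to read them differs, so the lowered output also lasts proportionally longer.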
In addition to the output means described above, various other outputting methods may be used for the seriously handicapped. For example, vibrators can be used to signify the quality and loudness of sounds, while information can also be embossed in a Braille-like manner or displayed as numbers or characters. Sounds may also be displayed graphically as waveforms or in terms of the proportions of different frequency bands that are present. Specialist output devices would be required for some of these latter methods but these devices could be operated by a computer in conjunction with the apparatus described using an RS232 serial datalink.
It will be apparent to those skilled in the art that whilst the apparatus described has been developed for use with the English language and with English pronunciation, the present invention may also find use in connection with other languages and with other norms of pronunciation.
It will also be apparent to those skilled in the art that the present invention is not limited to the processing of audio signals derived from speech. Indeed, depending on the passbands of the filters that are used, the apparatus may be made sensitive to specific non-speech inputs such as door bells, alarms, communication tones and the like. Thus an apparatus may be provided that is capable of detecting and interpreting Standard International Morse Codes which may then be used to trigger a variety of responses. Likewise, an apparatus may be provided that is sensitive to communication tones such as CCIR and ZVEI to facilitate the remote operation of associated equipment.
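As a rough illustration of making a detector sensitive to one specific tone, the following Python sketch measures the level of a single frequency in a block of samples. It uses the Goertzel algorithm, which is not mentioned in the specification and stands in here for a narrow bandpass filter; the 1 kHz target, the block length and the function name are assumptions chosen for the example and are not actual door bell, CCIR or ZVEI frequencies.

```python
import math


def tone_amplitude(samples, sample_rate_hz, target_hz):
    """Estimate the amplitude of the target_hz component in samples using
    the Goertzel recurrence (a software stand-in for a bandpass filter)."""
    omega = 2 * math.pi * target_hz / sample_rate_hz
    coeff = 2 * math.cos(omega)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2
    power = max(power, 0.0)  # guard against tiny negative rounding errors
    return 2 * math.sqrt(power) / len(samples)


if __name__ == "__main__":
    rate = 13000  # samples a second, as in the description
    wanted = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(1300)]
    other = [math.sin(2 * math.pi * 2000 * n / rate) for n in range(1300)]
    print(tone_amplitude(wanted, rate, 1000))  # close to the tone's amplitude
    print(tone_amplitude(other, rate, 1000))   # close to zero
```

Comparing the returned level with a threshold would give a simple present/absent decision for each tone of interest.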
It will also be apparent to those skilled in the art that the constituent parts of the apparatus described may be replaced by other suitable devices. For example, the processing unit of Figures 2 and 3 could in another arrangement comprise a microprocessor, a microcontroller, a digital signal processor, an ASIC, or a digital speech synthesizer. It is to be noted however that an advantage of using a microprocessor is its ability to include a choice of algorithms and methods in the software, selectable individually or in combination by links or switches to suit different listeners or situations.
Likewise, a fast serial memory may provide an alternative to the FIFO stores of the embodiments described. However, the advantage of using FIFO memories with separate input and output clock controls is their ability to store waveforms at one speed and to read, process and store them in a second FIFO at a different speed. This enables the audio output signal to be driven at a speed partly controlled by the listener.
In the examples given, the passbands specified for the various filters are those which are considered preferable for the type of microprocessor algorithms described. It will be appreciated however that different passbands may be more appropriate for other algorithms, languages or applications. Indeed, in one arrangement the filters may be replaced by variable passband filters whose passbands can be digitally selected by the processor or pre-programmed during production. The filters may be separate from the microprocessor or combined in a dedicated chip.
It will also be apparent that different analysis techniques may be used as alternatives to the examination of the filter output. For example, successive approximation techniques could also be used, as could comparison type techniques or those based on a synthesis of harmonics using a phase-locked loop.

Claims

1. An apparatus for enhancing the intelligibility of speech comprising means to generate an electrical signal representative of a detected audio signal, means for identifying a plurality of periodic elements comprising said electrical signal, and means for selectively altering the frequency and/or number of said periodic elements in response to signals characteristic of the periodic elements identified so as to thereby generate a modified output signal.
2. An apparatus in accordance with claim 1, wherein means are provided to assess whether the detected audio signal derives from speech.
3. An apparatus in accordance with claim 1 or claim 2, wherein means are provided to identify whether a particular sound comprised within the detected audio signal is of long or short duration.
4. An apparatus in accordance with claim 3, wherein if said sound is identified as being of long duration the number of periodic elements representative of said sound is reduced.
5. An apparatus in accordance with claim 4, wherein the number of periodic elements representative of said sound is reduced by omitting one or more periodic elements from said modified output signal.
6. An apparatus in accordance with any of claims 3 to 5, wherein if said sound is identified as being of short duration the number of periodic elements representative of said sound is increased.
7. An apparatus in accordance with claim 6, wherein the number of periodic elements representative of said sound is increased by repeating one or more of said periodic elements when generating said modified output signal.
8. An apparatus in accordance with any of claims 3 to 7, wherein if said sound is identified as being of long duration the frequency of the periodic elements representative of said sound is increased.
9. An apparatus in accordance with any of claims 3 to 8, wherein if said sound is identified as being of short duration the frequency of the periodic elements representative of said sound is reduced.
10. An apparatus in accordance with claim 8 or claim 9, wherein the frequency of the periodic elements representative of said sound is altered by inputting and outputting the periodic elements to and from a storage means at different rates.
11. An apparatus in accordance with any preceding claim, wherein means are provided for selectively amplifying one or more of said periodic elements.
12. An apparatus in accordance with any preceding claim, wherein means are provided to selectively alter the envelope of one or more of said periodic elements.
13. An apparatus in accordance with any preceding claim, wherein said periodic elements are defined as comprising that part of the electrical signal disposed between successive peaks.
14. An apparatus in accordance with any of claims 1 to 12, wherein said periodic elements are defined as comprising that part of the electrical signal disposed between successive positive zero crossings.
15. An apparatus in accordance with any preceding claim, wherein the modified output signal serves as the input for a loudspeaker drive circuit, an inductive loop or a similar outputting means.
16. An apparatus in accordance with any preceding claim, wherein means are provided to determine whether the detected audio signal derives from specific non-speech inputs such as door bells, alarms, communication tones and the like.
17. An apparatus in accordance with claim 16, wherein the output signal is modified in accordance with the specific non-speech input identified and used to initiate an appropriate response.
18. A method of enhancing the intelligibility of speech comprising the steps of generating an electrical signal representative of a detected audio signal, identifying a plurality of periodic elements comprising said electrical signal, and selectively altering the frequency and/or number of said periodic elements in response to signals characteristic of the periodic elements identified so as to thereby generate a modified output signal.
19. A method in accordance with claim 18, and comprising the additional step of assessing whether the detected audio signal derives from speech.
20. A method in accordance with claim 18 or claim 19 and comprising the additional step of identifying whether a particular sound comprised within the detected audio signal is of long or short duration.
21. A method in accordance with claim 20, wherein if said sound is identified as being of long duration the number of periodic elements representative of said sound is reduced.
22. A method in accordance with claim 21, wherein the number of periodic elements representative of said sound is reduced by omitting one or more of said periodic elements from said modified output signal.
23. A method in accordance with any of claims 20 to 22, wherein if said sound is identified as being of short duration the number of periodic elements representative of said sound is increased.
24. A method in accordance with claim 23, wherein the number of periodic elements representative of said sound is increased by repeating one or more of said periodic elements when generating said modified output signal.
25. A method in accordance with any of claims 20 to 24, wherein if said sound is identified as being of long duration the frequency of the periodic elements representative of said sound is increased.
26. A method in accordance with any of claims 20 to 25, wherein if said sound is identified as being of short duration the frequency of the periodic elements representative of said sound is reduced.
27. A method in accordance with claim 25 or claim 26, wherein the frequency of the periodic elements representative of said sound is altered by inputting and outputting the periodic elements to and from a storage means at different rates.
28. A method in accordance with any of claims 18 to 27, and comprising the additional step of selectively amplifying one or more of said periodic elements.
29. A method in accordance with any of claims 18 to 28, and comprising the additional step of selectively altering the envelope of one or more of said periodic elements.
30. A method in accordance with any of claims 18 to 29, wherein said periodic elements are defined as comprising that part of the electrical signal disposed between successive peaks.
31. A method in accordance with any of claims 18 to 29, wherein said periodic elements are defined as comprising that part of the electrical signal disposed between successive positive zero crossings.
32. A method in accordance with any of claims 18 to 31 and comprising the additional step of employing the modified output signal as the input for a loudspeaker drive circuit, an inductive loop or a similar outputting means.
33. A method in accordance with any of claims 18 to 32, and comprising the additional step of determining whether the detected audio signal derives from specific non-speech inputs such as door bells, alarms, communication tones and the like.
34. A method in accordance with claim 33, wherein the output signal is modified in accordance with the specific non-speech input identified and used to initiate an appropriate response.
35. An apparatus for processing an electrical signal comprising means for detecting peaks in said electrical signal, means for storing peak-to-peak elements of said electrical signal and means for processing said electrical signal by manipulating one or more of said peak-to-peak elements.
36. An apparatus in accordance with claim 35, wherein said means for detecting peaks in said electrical signal comprises means for periodically sampling said electrical signal, means to calculate the difference in amplitude between successive samples, and means to detect when successive values of said difference in amplitude change in sign from positive to negative.
37. An apparatus in accordance with claim 35 or claim 36, wherein the validity of a potential peak is assessed by determining whether the amplitude at the potential peak is greater than a minimum threshold value.
38. An apparatus in accordance with claim 37, wherein the value of the minimum threshold is derived from the rms value of the electrical signal in which peaks are to be detected.
39. An apparatus in accordance with claim 37, wherein the value of the minimum threshold is predetermined.
40. An apparatus in accordance with any of claims 35 to 39, wherein the validity of a potential peak is assessed by determining whether the time interval between the potential peak and a previous verified peak is less than a maximum threshold value.
41. An apparatus in accordance with claim 40, wherein the value of the maximum threshold is predetermined.
42. An apparatus in accordance with any of claims 35 to 41, wherein the validity of a potential peak is assessed by comparing the duration of the element thereby defined with those defined by previously verified peaks.
43. An apparatus in accordance with any of claims 35 to 42, wherein said processing means comprises means to repeat, omit or modify one or more of said peak-to-peak elements.
44. A method of processing an electrical signal comprising the steps of detecting peaks in said electrical signal, storing peak-to-peak elements of said electrical signal and processing said electrical signal by manipulating one or more of said peak-to-peak elements.
45. A method in accordance with claim 44, wherein the step of detecting peaks in said electrical signal comprises the additional steps of periodically sampling said electrical signal, calculating the difference in amplitude between successive samples, and detecting when successive values of said difference in amplitude change in sign from positive to negative.
46. A method in accordance with claim 44 or claim 45, wherein the validity of a potential peak is assessed by determining whether the amplitude at the potential peak is greater than a minimum threshold value.
47. A method in accordance with claim 46, wherein the value of the minimum threshold is derived from the rms value of the electrical signal in which peaks are to be detected.
48. A method in accordance with claim 46, wherein the value of the minimum threshold is predetermined.
49. A method in accordance with any of claims 44 to 48, wherein the validity of a potential peak is assessed by determining whether the time interval between the potential peak and a previous verified peak is less than a maximum threshold value.
50. A method in accordance with claim 49, wherein the value of the maximum threshold is predetermined.
51. A method in accordance with any of claims 44 to 50, wherein the validity of a potential peak is assessed by comparing the duration of the element thereby defined with those defined by previously verified peaks.
52. A method in accordance with any of claims 44 to 51, wherein the step of processing said electrical signal comprises either repeating, omitting or modifying one or more of said peak-to-peak elements.
53. An apparatus for detecting turning points within an electrical signal comprising means to periodically sample said electrical signal, means to calculate the difference in amplitude between successive samples, and means to detect when successive values of said difference in amplitude change in sign.
54. An apparatus in accordance with claim 53, wherein the turning point to be detected is a peak and said means to detect a change in sign is sensitive to detect a change in successive values of said difference in amplitude from positive to negative.
55. An apparatus in accordance with claim 53 or claim 54, wherein the validity of a potential turning point is assessed by determining whether the modulus of the amplitude at the potential turning point is greater than a minimum threshold value.
56. An apparatus in accordance with claim 55, wherein the value of the minimum threshold is derived from the rms value of the electrical signal in which turning points are to be detected.
57. An apparatus in accordance with claim 55, wherein the value of the minimum threshold is predetermined.
58. An apparatus in accordance with any of claims 53 to 57, wherein the validity of a potential turning point is assessed by determining whether the time interval between the potential turning point and a previous verified turning point is less than a maximum threshold value.
59. An apparatus in accordance with claim 58, wherein the value of the maximum threshold is predetermined.
60. An apparatus in accordance with any of claims 53 to 59, wherein the validity of a potential peak is assessed by comparing the duration of the element thereby defined with those defined by previously verified peaks.
61. A method of detecting turning points within an electrical signal comprising the steps of periodically sampling said electrical signal, calculating the difference in amplitude between successive samples, and detecting when successive values of said difference in amplitude change in sign.
62. A method in accordance with claim 61, wherein the turning point to be detected is a peak and the step of detecting when successive values of said difference in amplitude change in sign comprises detecting a change from positive to negative.
63. A method in accordance with claim 61 or claim 62, wherein the validity of a potential turning point is assessed by determining whether the modulus of the amplitude at the potential turning point is greater than a minimum threshold value.
64. A method in accordance with claim 63, wherein the value of the minimum threshold is derived from the rms value of the electrical signal in which turning points are to be detected.
65. A method in accordance with claim 63, wherein the value of the minimum threshold is predetermined.
66. A method in accordance with any of claims 61 to 65, wherein the validity of a potential turning point is assessed by determining whether the time interval between the potential turning point and the previous verified turning point is less than a maximum threshold value.
67. A method in accordance with claim 66, wherein the value of the maximum threshold is predetermined.
68. A method in accordance with any of claims 61 to 67, wherein the validity of a potential peak is assessed by comparing the duration of the element thereby defined with those defined by previously verified peaks.
PCT/GB1992/001987 1991-10-30 1992-10-30 Processing of electrical and audio signals WO1993009531A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB9122995.5 1991-10-30
GB919122995A GB9122995D0 (en) 1991-10-30 1991-10-30 Audio processing system
GB9211684.7 1992-06-03
GB929211684A GB9211684D0 (en) 1991-10-30 1992-06-03 Audio compression/expansion system

Publications (1)

Publication Number Publication Date
WO1993009531A1 (en) 1993-05-13

Family

ID=26299769

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1992/001987 WO1993009531A1 (en) 1991-10-30 1992-10-30 Processing of electrical and audio signals

Country Status (1)

Country Link
WO (1) WO1993009531A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US2943152A (en) * 1957-11-07 1960-06-28 Joseph C R Licklider Audio pitch control
US3104284A (en) * 1961-12-29 1963-09-17 Ibm Time duration modification of audio waveforms
GB1068282A (en) * 1964-06-09 1967-05-10 Ibm Speech waveform modification
US3723667A (en) * 1972-01-03 1973-03-27 Pkm Corp Apparatus for speech compression
FR2178410A5 (en) * 1972-03-28 1973-11-09 Ibm France
US3852535A (en) * 1972-11-16 1974-12-03 Zurcher Jean Frederic Pitch detection processor
US4468804A (en) * 1982-02-26 1984-08-28 Signatron, Inc. Speech enhancement techniques
US4652857A (en) * 1983-04-29 1987-03-24 Meiksin Zvi H Method and apparatus for transmitting wide-bandwidth frequency signals from mines and other power restricted environments
US4792975A (en) * 1983-06-03 1988-12-20 The Variable Speech Control ("Vsc") Digital speech signal processing for pitch change with jump control in accordance with pitch period

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
EASCON'75 Record (IEEE Electronics and Aerospace Systems Convention, Washington, DC, 29 September - 1 October 1975), IEEE, (New York, US), M.L. MALPASS: "The gold-rabiner pitch detector in a real time environment", pages 31-A - 31-G, see page 31-B,31-C: Appendix: "Detailed description of frame by frame pitch detector" *
European Conference on Speech Technology, Edinburgh, September 1987, vol. 1, CEP Consultants, D.M. HOWARD et al.: "Speech fundamental frequency estimation by multi-channel peak-picking", pages 318-321, see pages 318-319: "Implementation" *
IBM Technical Disclosure Bulletin, vol. 24, no. 2, July 1981, (New York, US), E.G. NASSIMBENE: "Speech compression and reconstruction", pages 1017-1018, see the whole article *
ICASSP'79 (1979 IEEE International Conference on Acoustics, Speech & Signal Processing, Washington, DC, 2-4 April 1979), IEEE, (New York, US), M. DALRYMPLE et al.: "Pitch extraction using MOS-LSI circuitry", pages 768-772, see paragraph IV: "Peak detector"; paragraph V: "Pitch estimator" *
ICASSP'86 (IEEE-IECEJ-ASJ International Conference on Acoustics, Speech, and Signal Processing, Tokyo, 7-11 April 1986), vol. 4, IEEE, (New York, US), F.J. CASAJUS-QUIROS et al.: "Dynamic time-scale compression for waveform speech coding", pages 2399-2401, see page 2399, right-hand column, lines 6-19; figure 1 *
ICC'84 (IEEE International Conference on Communications, Amsterdam, 14-17 May 1984), vol. 3, IEEE, (New York, US), E. BRAZDA: "High quality bandwidth reduction of speech signals", pages 1504-1507, see paragraph 2: "Algorithm" *
IEEE Transactions on Consumer Electronics, vol. 34, no. 2, May 1988, (New York, US), P. JIANPING: "Effective time-domain method for speech rate-change", pages 339-346, see paragraph I: "Introduction" *
Wireless World, vol. 88, no. 1552, January 1982, (Haywards Heath, GB), I.H. WITTEN: "Digital storage and analysis of speech", pages 44,45,49, see page 45, left-hand column: "Feature-extraction methods" *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005099190A1 (en) 2004-04-07 2005-10-20 Telefonaktiebolaget Lm Ericsson (Publ) Method and apparatus for increasing perceived interactivity in communications systems
EP1735968B1 (en) * 2004-04-07 2014-09-10 TELEFONAKTIEBOLAGET LM ERICSSON (publ) Method and apparatus for increasing perceived interactivity in communications systems
US8311814B2 (en) 2006-09-19 2012-11-13 Avaya Inc. Efficient voice activity detector to detect fixed power signals
FR2941836A1 (en) * 2009-02-05 2010-08-06 Sound4 METHOD FOR MEASURING CRETE VALUES AND POWER OF AUDIOFREQUENCY SIGNAL
US8222887B2 (en) 2009-02-05 2012-07-17 Sound4 Process for measuring peak values and power of an audio frequency signal
EP3210207A4 (en) * 2014-10-20 2018-09-26 Audimax LLC Systems, methods, and devices for intelligent speech recognition and processing
US10475467B2 (en) 2014-10-20 2019-11-12 Audimax Llc Systems, methods and devices for intelligent speech recognition and processing

Similar Documents

Publication Publication Date Title
KR100283421B1 (en) Speech rate conversion method and apparatus
KR100302370B1 (en) Speech interval detection method and system, and speech speed converting method and system using the speech interval detection method and system
JP2017538146A (en) Systems, methods, and devices for intelligent speech recognition and processing
JPH0764598A (en) Audio-signal discrimination device and audio apparatus
US8582792B2 (en) Method and hearing aid for enhancing the accuracy of sounds heard by a hearing-impaired listener
JPH0431898A (en) Voice/noise separating device
JPH0968997A (en) Method and device for processing voice
WO1993009531A1 (en) Processing of electrical and audio signals
JP2002252894A (en) Sound signal processor
JPH0916193A (en) Speech-rate conversion device
KR20220144997A (en) Apparatus for detecting forgery of Voice file and method thereof
JPH06289898A (en) Speech signal processor
JPS6367197B2 (en)
JP3284968B2 (en) Hearing aid with speech speed conversion function
JP3303446B2 (en) Audio signal processing device
JP4079478B2 (en) Audio signal processing circuit and processing method
KR100372576B1 (en) Method of Processing Audio Signal
KR100359988B1 (en) real-time speaking rate conversion system
JPH08298698A (en) Environmental sound analyzer
JP2870421B2 (en) Hearing aid with speech speed conversion function
JPS63278100A (en) Voice recognition equipment
JPH09146587A (en) Speech speed changer
JP4005166B2 (en) Audio signal processing circuit
JPH0698398A (en) Non-voice section detecting/expanding device/method
JP2975808B2 (en) Voice recognition device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): GB JP US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase