EP0850472A2 - Method and device for processing sound signals - Google Patents

Method and device for processing sound signals

Info

Publication number
EP0850472A2
EP0850472A2 EP96928357A EP96928357A EP0850472A2 EP 0850472 A2 EP0850472 A2 EP 0850472A2 EP 96928357 A EP96928357 A EP 96928357A EP 96928357 A EP96928357 A EP 96928357A EP 0850472 A2 EP0850472 A2 EP 0850472A2
Authority
EP
European Patent Office
Prior art keywords
time
leading edge
maximum
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP96928357A
Other languages
English (en)
French (fr)
Inventor
Frank Uldall Leonhard
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band

Definitions

  • The present invention relates to a method and system for signal processing, by which method and system features representing distinct sound pictures in auditory signals are extracted from transients in auditory signals.
  • The result of the processing may be used for identification of sound or of speech signals or for quality measurement of audio products or systems, such as loudspeakers, hearing instruments or hearing aids, telecommunication systems, or for quality measurement of acoustic conditions.
  • The method of the present invention may also be used in connection with speech compression and decompression in narrow band telecommunication or speech storing systems.
  • The human ear has the ability to catch fast sound signals, detect sound frequency with great accuracy and differentiate between sound signals in complicated sound environments. For instance, it is possible to understand what a singer is singing in an accompaniment of musical instruments.
  • A transient component in an auditory signal may in this invention be interpreted as a fast change of the energy in an auditory signal, where the rise time of the energy change is at most 3 ms, and a slower change of the energy level may be interpreted as a change of the quasi steady state component of an auditory signal.
  • The transient and the quasi steady state component in an auditory signal may be defined as follows:
  • The transient component in an auditory signal is the fast energy changes that may be detected by means of an envelope detection using a lowpass filter with a relatively high cutoff frequency in the range 50-1500 Hz, and preferably in the range 300-1500 Hz.
  • The quasi steady state component in an auditory signal is the energy level that may be detected by means of an envelope detection using a lowpass filter with a relatively low cutoff frequency in the range below 400 Hz, and preferably below 150 Hz.
  • The fast energy changes in the auditory signal may also be detected without the use of envelope detection or without the use of a lowpass filter.
  • The nerve pulses launched from the cochlea are synchronised to the frequency of a sine tone if the frequency is less than about 1.4 kHz. If the frequency of the tone is higher than about 1.4 kHz, the pulses are launched randomly and less than once per period. Therefore the auditory perceptive faculty is tone oriented in the range up to about 1.4 kHz and transient oriented above.
  • The frequency spectra of speech signals from human beings contain energy bands, called formants. These formants are carriers of outstanding transients, and if the formants are selected for transient analyses an important noise suppression may be obtained.
  • In WO 94/25958 it is described how the information held in the shape of pulses representing the fast energy changes in auditory signals is used for identifying distinct sound pictures, and in a preferred embodiment the shape of the leading edge of a pulse is determined by determining the pulse rise time or determining the slope variation. It is further preferred that the shape of the top part of the leading edge is determined, the top part starting at the point of the edge where the slope is maximum.
  • If the rise time of a pulse provided as an input to a filter is faster than the rise time of the impulse response of the filter, then the rise time of the output of the filter generated in response to the input pulse will be substantially equal to the rise time of the impulse response of the filter.
  • Conversely, if the rise time of the input pulse is slower than the rise time of the impulse response of the filter, the rise time of the output of the filter generated in response to the input pulse will be substantially equal to the rise time of the input pulse.
  • The signal processing of sound signals in the cochlea may be simulated by a filter bank comprising a set of bandpass filters with different centre frequencies, and the bandwidths of these filters increase with increasing centre frequencies, which again means that the rise times of the impulse responses of the filters increase with increasing centre frequencies.
  • The rise time of an output pulse generated by a corresponding filter of the filter bank will be substantially equal to the rise time of the impulse response of the filter when the rise time of the input pulse is faster than the rise time of the impulse response of the filter, and substantially equal to the rise time of the input pulse when the rise time of the input pulse is slower than the rise time of the impulse response of the filter.
  • The rise time of the input pulse may be determined by determination of the two filters A and B of the filter bank having the narrowest bandwidths of the filters of the bank generating output pulses in response to the input pulse with substantially identical rise times, as the rise time of the input pulse must be within the rise time range between the rise time of the impulse response of the filter A, B with the narrowest bandwidth and the rise time of the impulse response of the filter with the largest bandwidth that is also lower than the bandwidths of the filters A, B.
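  • The bracketing argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example rise times and the tolerance `rel_tol` used for "substantially identical" are hypothetical and do not come from the patent text.

```python
def output_rise_time(input_rise_ms, filter_rise_ms):
    """Output rise time is roughly the slower of the input pulse and the
    filter's impulse response, as stated in the text above."""
    return max(input_rise_ms, filter_rise_ms)


def bracket_input_rise_time(filter_rises_ms, output_rises_ms, rel_tol=0.05):
    """Return (lower, upper) bounds in ms for the input rise time, or None.

    rel_tol is a hypothetical tolerance for "substantially identical".
    """
    shortest = min(output_rises_ms)
    limit = shortest * (1.0 + rel_tol)
    pairs = list(zip(filter_rises_ms, output_rises_ms))
    # Filters that passed the pulse with its own rise time unchanged.
    passed = [f for f, o in pairs if o <= limit]
    # Filters whose slower impulse response lengthened the rise time.
    slowed = [f for f, o in pairs if o > limit]
    if len(passed) < 2:  # the text requires two agreeing filters, A and B
        return None
    lower = max(passed)
    upper = min(slowed) if slowed else None
    return lower, upper


bank = [0.2, 0.5, 1.0, 2.0]  # assumed impulse-response rise times in ms
outputs = [output_rise_time(0.7, f) for f in bank]
print(bracket_input_rise_time(bank, outputs))  # (0.5, 1.0)
```

  • In this example the two fastest filters agree on 0.7 ms, so the input rise time is bracketed between 0.5 ms and 1.0 ms.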
  • This rise time detection principle may be utilized by the auditory organs of living beings, and this could explain why the bandwidths of the filters simulating cochlea sound processing are increasing with increasing centre frequencies.
  • Sound speech signals may be generated by modulation of pulses in filters that modulate the shape of the pulses as described above.
  • Pulses to be modulated correspond to speech signals generated in the articulation channel, e.g. by the vocal cords, and the processing in the filters corresponds to the modulation performed by adjustment of the articulation channel according to the phoneme processed, whereby the filters modulate the shape of the pulses.
  • The time between pulses to be modulated should be sufficiently long to ensure that there is no interference between output pulses generated in response to different input pulses.
  • This object is accomplished by providing a method of processing an auditory signal to facilitate identification of abrupt energy changes within the auditory signal, which abrupt energy changes have a rise time of at most 3 ms, and which abrupt energy changes can be perceived by an animal ear such as a human ear as representing a distinct sound picture.
  • The method comprises: deriving, from the auditory signal, a first signal comprising transient pulses corresponding to at least part of the abrupt energy changes; tracing or monitoring pulses in the first transient signal; determining local maxima of the transient pulses; and generating a second transient signal wherein the value of at least one determined local maximum of a pulse in the first transient signal is held at said maximum value for a predetermined period of time t_rfpr, thereby generating a corresponding pulse in the second transient signal, said predetermined period of time t_rfpr being at most 5 ms.
  • When pulses in a train of two or more successive pulses in the first transient signal are subjected to the above described holding procedure, and one or more of the pulses is/are located at a distance in time from a preceding pulse which is shorter than the predetermined period of time t_rfpr and has/have a local maximum greater than the local maximum of said preceding pulse, the hold of the local maximum of said preceding pulse is maintained until the occurrence of the subsequent, greater local maximum and is replaced by said subsequent, greater local maximum.
  • Preferably, the predetermined period of time t_rfpr is shorter than or equal to 3 ms, or shorter than or equal to 2 ms. It is even more preferred that t_rfpr is shorter than or equal to 1 ms, or about 0.7 ms.
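  • The holding procedure described above can be illustrated with a minimal Python sketch. It is not the flow diagram of Fig. 12, which is not reproduced here; the function name, the sample-based representation and the choice to count the peak sample as part of the hold are assumptions made for illustration.

```python
def hold_local_maxima(x, fs_hz, t_rfpr_s=0.001):
    """Build a second transient signal by holding local maxima of x.

    Each local maximum is held for t_rfpr_s seconds. A greater local
    maximum arriving during a hold replaces the held value and restarts
    the hold; smaller maxima during a hold are ignored.
    """
    hold_len = int(round(t_rfpr_s * fs_hz))  # hold length in samples
    y = []
    held = 0.0
    remaining = 0
    for n, v in enumerate(x):
        is_peak = 0 < n < len(x) - 1 and x[n - 1] < v >= x[n + 1]
        if is_peak and (remaining == 0 or v > held):
            held, remaining = v, hold_len
        if remaining > 0:
            y.append(held)
            remaining -= 1
        else:
            y.append(v)
    return y


# 10 kHz sampling and a 0.5 ms hold give a hold length of 5 samples.
print(hold_local_maxima([0, 2, 0, 5, 0, 0, 0, 0, 0, 0], fs_hz=10000, t_rfpr_s=0.0005))
# [0, 2, 2, 5, 5, 5, 5, 5, 0, 0]
```

  • The first maximum (2) is held until the greater maximum (5) occurs, which then replaces it and is held for the full period.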
  • The shape of a pulse in the second transient pulse signal is an important feature for identification of the pulse.
  • The shapes of pulses in the second transient pulse signal are determined or identified, and preferably one or more distinct sound pictures is/are identified from the determined shape.
  • The shape of a pulse may be characterized by the pulse rise time, the form of the leading edge, the duration of the pulse, and/or the fall time or the form of the lagging edge, and it is preferred that the form of the leading edge is determined by determining rise time, slope and/or slope variation of at least part of the leading edge.
  • The frequency of the auditory signal is determined from the second transient signal based on the distance in time between succeeding leading edges of pulses in the signal.
  • The method includes selecting pulses where the shape of the leading edge has a maximum slope greater than a predetermined minimum value, thereby discarding pulses with a rather small maximum slope, which pulses may be considered as representing noise components in the process of identification or representation of distinct sound pictures of the auditory signal.
  • This object is accomplished by providing a method for selecting leading edges of transient pulses in a transient signal, said transient signal being derived from an auditory signal having abrupt energy changes with a rise time of at most 3 ms, and which abrupt energy changes can be perceived by an animal ear such as a human ear as representing a distinct sound picture.
  • The method comprises: determining or measuring the maximum slope of a leading edge of a pulse in the transient signal; comparing the obtained maximum slope with a predetermined lower threshold value for maximum slopes of leading edges; and, if the obtained maximum slope is equal to or greater than the predetermined lower threshold value, selecting said leading edge as a candidate to the leading edge of a pulse.
  • Several leading pulse edges being candidates for a selected leading edge may be observed within a short period of time. Thus, it is preferred that if the transient signal comprises one or more subsequent pulse or pulses, the leading edge or edges of which is/are located within a distance in time from the selected candidate, which distance in time is shorter than a predetermined period of time, t_s, of at most 4 ms, then the method further comprises: determining or measuring the maximum slope or slopes of the leading edge or edges of said subsequent pulse or pulses in the transient signal; comparing the obtained maximum slope or slopes of the subsequent leading edge or edges and the obtained maximum slope of the selected candidate with one another; determining which of said leading edges has the largest maximum slope; and selecting the leading edge with the largest maximum slope as the leading edge of a
  • The predetermined period of time t_s is shorter than or equal to 3.3 ms, or shorter than or equal to 2 ms, or even shorter than or equal to 1 ms.
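  • The candidate selection steps above can be sketched as follows. The sketch assumes the leading edges have already been reduced to (time, maximum slope) pairs; that representation, the function name and the example values are hypothetical, and the search window is measured from the first candidate, which is one reading of the text.

```python
def select_leading_edge(edges, lower_threshold, t_s_ms=2.0):
    """Pick the leading edge of a pulse from (time_ms, max_slope) pairs.

    The first edge whose maximum slope reaches lower_threshold becomes the
    candidate. Edges closer than t_s_ms to that candidate compete with it,
    and the edge with the largest maximum slope wins.
    """
    candidate = next((e for e in edges if e[1] >= lower_threshold), None)
    if candidate is None:
        return None
    window_end = candidate[0] + t_s_ms
    rivals = [e for e in edges if candidate[0] <= e[0] < window_end]
    return max(rivals, key=lambda e: e[1])


edges = [(0.0, 0.01), (1.0, 0.30), (2.2, 0.55), (3.5, 0.90)]
print(select_leading_edge(edges, lower_threshold=0.025))  # (2.2, 0.55)
```

  • The edge at 3.5 ms has a larger slope but lies outside the 2 ms search time, so it is left for the selection of a later pulse.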
  • The method for selecting the leading edge of a second pulse in the transient signal further comprises: determining or measuring the maximum slope or slopes of the leading edge or edges of a pulse or pulses in the transient signal subsequent to the selected leading edge of the first pulse within a distance in time from the leading edge of the first pulse which is shorter than a predetermined period of time, t_ep, of at most 4 ms, said time period t_ep being longer than or equal to the predetermined time period t_s; comparing the obtained maximum slope or slopes of the subsequent leading edge or edges with the obtained maximum slope of the leading edge of the fir
  • The method for selecting the leading edge of the second pulse in the transient signal further comprises: determining or measuring the maximum slope or slopes of one or more leading pulse edges located at a distance in time from the leading edge of the first pulse which is longer than or equal to the predetermined period of time, t_ep; reducing the required threshold value of the maximum slope below the maximum slope of the leading edge of the first pulse; and selecting the first leading edge with a maximum slope greater than the required threshold value as the leading edge of a second pulse, which second pulse may correspond to an abrupt energy change representing a distinct sound picture.
  • The required threshold value for the maximum slope is decreased as a function of time from the maximum slope of the leading edge of the first pulse down to the predetermined lower threshold value.
  • The required threshold value is decreased exponentially with a predetermined time constant t_c.
  • The predetermined period of time t_ep is shorter than or equal to 3.3 ms, or shorter than or equal to 2 ms, or even shorter than or equal to 1 ms.
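  • The time-varying threshold described above might be sketched as below. The text only states that the threshold stays at the previous maximum slope during t_ep and then decays exponentially with time constant t_c towards the lower threshold; the exact decay law used here (a pure exponential floored at the lower threshold), the function name and the default values of `t_ep_ms` and `t_c_ms` are assumptions.

```python
import math


def required_slope(dt_ms, d_m, d_lower, t_ep_ms=1.5, t_c_ms=2.0):
    """Threshold a later edge must exceed, dt_ms after the selected edge.

    d_m     : maximum slope of the previously selected leading edge
    d_lower : predetermined lower threshold value
    t_c_ms  : hypothetical time constant, not given in the text
    """
    if dt_ms < t_ep_ms:
        return d_m  # within t_ep only a steeper edge is accepted
    decayed = d_m * math.exp(-(dt_ms - t_ep_ms) / t_c_ms)
    return max(d_lower, decayed)  # never drop below the lower threshold


for dt in (0.5, 1.5, 3.5, 100.0):
    print(dt, round(required_slope(dt, d_m=1.0, d_lower=0.025), 4))
```

  • With these assumed values the threshold stays at 1.0 up to 1.5 ms, has fallen to about 0.37 after one further time constant, and settles at the lower threshold 0.025.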
  • The shape of a selected leading edge of a pulse may represent an important feature for identification or representation of the corresponding distinct sound picture. Thus, it is preferred that the shape of the selected leading edges of pulses is determined, and/or a distinct sound picture is identified from the determined shape.
  • The shape of the selected leading edge of a pulse is determined by the obtained maximum slope of the selected leading edge.
  • The rise time of a selected leading edge of a pulse may also represent an important feature for identification or representation of the corresponding distinct sound picture.
  • The shape of a selected leading edge of a pulse is characterised by the rise time of the edge, where the rise time is determined as the time period from t_b to t_e, or by the shape of the leading edge in the time period from t_b to t_e, where t_b is the point in time where the slope of the leading edge has reached a threshold value for the beginning of the edge, d_b, the ratio of said threshold value d_b to the obtained maximum slope being predetermined, and t_e is the point in time where the slope of the leading edge has decreased from the maximum value to a threshold value for the end of the edge, d_e, the ratio of said threshold value d_e
  • The value of d_b is in the range of 30-100% of the obtained maximum slope, and the value of d_e is in the range of 30-90% of the obtained maximum slope.
  • The value of d_b may even more preferably be substantially equal to 50% or 100% of the obtained maximum slope, and the value of d_e may even more preferably be substantially equal to 70% of the obtained maximum slope.
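  • A minimal Python sketch of the rise time measurement between t_b and t_e is given below. It assumes a uniformly sampled leading edge and a simple first difference as the slope estimate; the function name, the sample values and the sampling rate are illustrative only.

```python
def edge_rise_time(samples, fs_hz, ratio_b=1.0, ratio_e=0.7):
    """Return (t_b, t_e, rise_time) in seconds for one sampled leading edge.

    ratio_b and ratio_e are d_b and d_e expressed as fractions of the
    maximum slope d_m (defaults: 100 % and 70 %, as preferred in the text).
    """
    slopes = [(b - a) * fs_hz for a, b in zip(samples, samples[1:])]
    m = max(range(len(slopes)), key=slopes.__getitem__)
    d_m = slopes[m]
    # Walk backwards from the maximum while the slope is still >= d_b.
    b = m
    while b > 0 and slopes[b - 1] >= ratio_b * d_m:
        b -= 1
    # Walk forwards until the slope has decreased to d_e or below.
    e = m
    while e < len(slopes) - 1 and slopes[e] > ratio_e * d_m:
        e += 1
    return b / fs_hz, e / fs_hz, (e - b) / fs_hz


edge = [0, 1, 3, 7, 10, 12, 13, 13]
print(edge_rise_time(edge, fs_hz=1000))               # (0.002, 0.004, 0.002)
print(edge_rise_time(edge, fs_hz=1000, ratio_b=0.5))  # (0.001, 0.004, 0.003)
```

  • Lowering d_b from 100 % to 50 % of the maximum slope moves t_b earlier and therefore lengthens the measured rise time.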
  • The transient signal from which the leading edge or edges is/are selected is a transient signal generated in accordance with one of the embodiments referring to generation of the second transient signal.
  • The system comprises means for deriving, from the auditory signal, a first signal comprising transient pulses corresponding to at least part of the abrupt energy changes, and means for generating a second transient signal from said first transient signal, said second signal generation means being adapted to hold the value of at least one local maximum of a pulse in the first transient signal at said maximum value for a predetermined period of time, t_rfpr, thereby generating a corresponding pulse in the second transient signal, said predetermined period of time t_rfpr being at most 5 ms.
  • t_rfpr is at most 1 ms or about 0.7 ms.
  • The invention also relates to a system for selecting leading edges of pulses in a transient signal, which signal represents abrupt energy changes within an auditory signal.
  • The system comprises means for determining or measuring the maximum slope of a leading edge of a pulse in the transient signal, means for comparing the obtained maximum slope with a predetermined lower threshold value for maximum slopes of leading edges, and means for, based on the result of said comparison, selecting a candidate to the leading edge of a pulse.
  • The means for determining or measuring the maximum slope of a leading edge of a pulse are further adapted to determine or measure the maximum slope or slopes of a leading edge or edges of one or more pulses subsequent to the selected candidate.
  • The comparing means are further adapted for comparing the obtained maximum slope or slopes of the subsequent leading edge or edges and the obtained maximum slope of the selected candidate with one another.
  • The selecting means are further adapted for, based on the result of said comparison, selecting the leading edge with the largest maximum slope.
  • Any of the systems which comprises means for generating the second transient signal further comprises means for selecting leading edges of pulses in a transient signal in accordance with an embodiment of the present invention, the leading edges being selected from the second transient signal.
  • Fig. 1 shows a filter bank with N bandpass filters.
  • Figs. 2 and 3 show transient detection signals of the speech signal "softkey" for two filters having different centre frequencies in a filter bank.
  • Fig. 4 shows the transient detection signals of Fig. 3 of the vowel "i" as in key.
  • Fig. 5 shows transient detection signals corresponding to the speech signal of Fig. 4, with the speech signal being processed according to a preferred embodiment of refractoriness period processing.
  • Fig. 6 shows transient detection signals corresponding to the speech signal of Fig. 4, with the speech signal being processed according to another preferred embodiment of refractoriness period processing.
  • Fig. 7 illustrates selection of a leading edge of a transient pulse according to a preferred embodiment of the invention.
  • Fig. 8 illustrates the principles of determination of maximum slope and rise time of a leading edge of a transient pulse.
  • Fig. 9 shows transient detection signals, including an edge signal and a measure of the pitch period, corresponding to the speech signal "softkey" pronounced by a female.
  • Fig. 10 shows transient detection signals, including an edge signal and a measure of the pitch period, corresponding to the vowel "i" as in key.
  • Fig. 11 shows the edge signal of Fig. 10 filtered by a bandpass filter.
  • Fig. 12 is a flow diagram illustrating a preferred embodiment of refractoriness period processing.
  • Fig. 13 is a flow diagram illustrating a preferred embodiment of detection of a leading edge.
  • Fig. 14 is a plot of the bandwidths of cochlea bandpass filters as a function of centre frequency.
  • Fig. 15 is a plot of rise times of input and output pulses of a bandpass filter and of the impulse response of the filter.
  • The cochlea in the human ear can be regarded as an infinite number of bandpass filters, IBP, within the frequency range of the human ear.
  • A filter bank may be employed for detecting formants and thereby detecting the transient conditions that hold the most well qualified information with a substantial suppression of noise.
  • The bandwidth of the bandpass filters is chosen to be the same for all filters in order to obtain the same envelope.
  • Another choice might be to scale the bandwidth of the filters in accordance with the Bark scale or Mel scale.
  • Fig. 1 shows a filter bank with N bandpass filters, BP_1 to BP_N, followed by an envelope detection performed by use of rectification means, R_1 to R_N, and lowpass filters, LP_1 to LP_N.
  • The rectification means are preferably one-way rectification means.
  • The filter bank has to cover the transient oriented frequency range, and the centre frequency of the bandpass filters has therefore to be from about 1.4 kHz and upwards. To be able to detect sufficiently fast transients the bandwidth has to be about 1.4 kHz.
  • In Figs. 2 and 3 the transient detection by means of a filter bank is illustrated.
  • Figs. 2 and 3 show processed curves for the word "softkey" pronounced by a female and detected by means of two different bandpass filters.
  • The abscissas represent a time interval of 1 s, and the ordinates in Figs. 2a, 2b, 3a and 3b represent the sound pressure of the corresponding speech signal, whereas the ordinates of Figs. 2c and 3c represent the energy of the corresponding speech signal.
  • The bandpass filters are Butterworth filters of 6th order with a bandwidth of 1.4 kHz.
  • In Fig. 2, the centre frequency is about 1.5 kHz with a lower cutoff frequency at about 0.8 kHz and an upper cutoff frequency at about 2.2 kHz.
  • In Fig. 3, the centre frequency is about 2.8 kHz with a lower cutoff frequency at about 2.1 kHz and an upper cutoff frequency at about 3.5 kHz.
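  • The quoted cutoff frequencies follow from the centre frequency and the 1.4 kHz bandwidth if the centre is taken as the arithmetic midpoint of the band. The short check below makes that arithmetic explicit; the function name is hypothetical.

```python
def band_edges(centre_hz, bandwidth_hz):
    """Lower and upper cutoff for a band centred arithmetically on centre_hz."""
    half = bandwidth_hz / 2
    return centre_hz - half, centre_hz + half


print(band_edges(1500, 1400))  # (800.0, 2200.0)
print(band_edges(2800, 1400))  # (2100.0, 3500.0)
```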
  • The lowpass filter is a 1st order Butterworth filter with a cutoff frequency at 700 Hz, and the pretransient signal is the output signal from the bandpass filter.
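  • The envelope detection stage of one branch (one-way rectification followed by a lowpass filter) can be sketched in plain Python. The bandpass stage is omitted, and the one-pole smoother below is only a simple stand-in for a first order lowpass with the stated 700 Hz cutoff; it is not claimed to match the Butterworth design used for the figures. The function name and sampling rate are assumptions.

```python
import math


def envelope(pretransient, fs_hz, cutoff_hz=700.0):
    """Half-wave rectify the pretransient signal and smooth it.

    Uses a one-pole lowpass y[n] = y[n-1] + alpha * (x[n] - y[n-1]),
    an assumed approximation of a first order lowpass filter.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    y = 0.0
    out = []
    for v in pretransient:
        rectified = v if v > 0.0 else 0.0  # one-way rectification
        y += alpha * (rectified - y)
        out.append(y)
    return out


fs = 16000
tone = [math.sin(2.0 * math.pi * 2800.0 * n / fs) for n in range(320)]
env = envelope(tone, fs)
print(len(env), min(env) >= 0.0)
```

  • Feeding the function the output of a bandpass filter gives one branch of the filter bank of Fig. 1; a steady tone yields a roughly constant, non-negative envelope with some residual ripple.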
  • In Fig. 2c the vowel "o" is very outstanding in the transient signal, but the other phonemes are very indistinct.
  • In Fig. 3c the vowel "o" is less outstanding but the other phonemes are much more distinct.
  • The conclusion may be drawn that the vowel "o" should preferably be detected from the transient signal processed by the bandpass filter with a centre frequency at 1.5 kHz, and the remaining phonemes should preferably be detected from the transient signal processed by the bandpass filter with a centre frequency at 2.8 kHz.
  • Each branch can be regarded as a TSD (Transient Signal Detector).
  • The number of branches in the system depends on the demand on the system, but the number should be in the range of 2-40.
  • TSD1: the TSD used in connection with the results of Fig. 2, having a centre frequency at 1.5 kHz.
  • TSD2: the TSD used in connection with the results of Fig. 3, having a centre frequency at 2.8 kHz.
  • Fig. 1 then illustrates a TSD bank.
  • Important features of fast energy changes of an auditory signal for identifying or representing features that can be perceived by a human ear as representing a distinct sound picture may be the shape of the leading edge and the period between the leading edges.
  • This period is called the refractoriness period.
  • The nerve pulses launched from the cochlea are synchronized to the frequency of a sine tone if the frequency is less than about 1.4 kHz but not above this frequency.
  • This means that the refractoriness period of interest may be about 0.7 ms.
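  • The text does not spell out how the 0.7 ms figure is obtained. One plausible reading, stated here as an assumption, is that it is the period of a tone at the 1.4 kHz synchronisation limit:

```latex
T = \frac{1}{f} = \frac{1}{1.4\ \text{kHz}} \approx 0.71\ \text{ms} \approx 0.7\ \text{ms}
```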
  • The refractoriness period may be used for simplifying the process of detecting the leading edge of a transient pulse in the transient component.
  • Fig. 4 shows part of the curves of Fig. 3 processed by TSD2. The curves shown in Fig. 4 represent the signals obtained for the vowel "i" as in key.
  • The transient signal of Fig. 4c is processed without a refractoriness period.
  • Figs. 5a and 6a are identical to Fig. 4a, and Figs. 5b and 6b are identical to Fig. 4b.
  • The transient signal of Fig. 5c, which represents the energy of the corresponding speech signal, is obtained from the bandpass filtered pretransient signal in Fig. 5b by way of a rectification and by using a refractoriness period of 1 ms.
  • The signal of Fig. 5c has not been subject to a lowpass filtration. It is preferred that the implementation of the refractoriness period is performed by using a software algorithm which is described below in connection with Fig. 12.
  • Fig. 6c shows a transient signal which represents the energy of the corresponding speech signal and which is obtained by performing a lowpass filtration on the signal of Fig. 5c.
  • All the signals of Figs. 4 and 5 hold the speech information and may easily be perceived by a human ear, although some noise is introduced during the process of transient detection resulting in the signals of Figs. 5c and 6c.
  • The abscissas represent a time interval of 50 ms.
  • The refractoriness period may be about 0.5 ms or longer but preferably less than the minimum pitch period, that means less than about 3.3 ms.
  • The shape of the leading edge may be one of the important features for representing a sound picture, and the maximum slope of the leading edge may be an important feature for the edge.
  • The maximum slope of the leading edge may be the basis for detecting the important features for identifying or representing a distinct sound picture.
  • In Fig. 7 the abscissa represents a time interval of 50 ms, and the signals of Figs. 7a, b and c correspond to the signals of Figs. 6a, b and c, whereas in Fig. 7d the differentiated signal of the signal of Fig. 7c, called the differential signal, is shown.
  • d_em: a predetermined minimum value.
  • The size of d_em may depend on how the signal is normalised.
  • The signals of Figs. 2-7 are normalised to the maximum numerical value in the whole signal, and d_em is preferably selected to be 2.5% of the maximum detected slope value.
  • d_em may be selected otherwise, and preferably higher.
  • The maximum slope may be detected by finding a maximum greater than the threshold d_em and selecting this as a candidate to be the maximum slope of a leading edge, called d_m. If there is a greater maximum slope within a given search time, t_s, then this point is chosen as having the maximum slope of a leading edge; otherwise the candidate is chosen.
  • The search time t_s may be selected to be less than the minimum pitch period, which means less than about 3.3 ms, but preferably around 2 ms.
  • The following leading edge may be detected as illustrated in Fig. 7d.
  • When the point for the maximum slope for a leading edge is detected, then for a time period, t_ep, only a maximum slope greater than the previous maximum slope will be accepted; in other words, in this time period the threshold for accepting a leading edge is equal to the previous maximum slope.
  • The threshold may be exponentially decreased with a time constant t_c, which is also illustrated in Fig. 7d.
  • The time period t_ep may be less than the minimum pitch period, that means less than about 3.3 ms, but preferably between 1 and 2 ms. However, t_ep should be longer than or equal to the search time t_s.
  • A leading edge may be described as beginning at a point in time, t_b, where the slope has the maximum slope, or a point in time before the point with the maximum slope, where the slope has reached a threshold value, d_b, having a predetermined ratio to the maximum slope, and ending at the point, t_e, after the point with the maximum slope, where the slope has decreased to a threshold value, d_e, having a predetermined ratio to the maximum slope.
  • This principle is illustrated in Fig. 8, where the amplitude of the leading edge is shown as A in Fig. 8a, and the differential of the leading edge is shown as D in Fig. 8b.
  • In Figs. 9 and 10 an edge detection following the above defined edge detector principles is illustrated.
  • The abscissas in Fig. 9 represent a time interval of 1 s, while a time interval of 50 ms of the signals in Fig. 9 is represented in Fig. 10, in which time interval the signals for the vowel "i" in the word key are shown.
  • The transient signal of Figs. 9c and 10c has been processed in accordance with the signal presented in Fig. 6c, and a leading edge signal named edge signal, see Figs. 9d and 10d, has been obtained by determining the rise time of selected leading edges.
  • A graph of the pitch period between the selected edges is shown in Figs. 9e and 10e. If the pitch period is longer than 15 ms it is set equal to 15 ms. A low resolution is obtained in the printout of Fig. 9d due to a limited printer resolution.
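  • The pitch period measure with its 15 ms ceiling can be sketched as follows, assuming the selected leading edges are available as a list of time stamps in milliseconds. The function name and the example time stamps are illustrative.

```python
def pitch_periods(edge_times_ms, max_period_ms=15.0):
    """Time between successive selected leading edges, clipped at 15 ms."""
    pairs = zip(edge_times_ms, edge_times_ms[1:])
    return [min(later - earlier, max_period_ms) for earlier, later in pairs]


print(pitch_periods([0.0, 5.0, 10.0, 30.0, 34.0]))  # [5.0, 5.0, 15.0, 4.0]
```

  • The 20 ms gap between the third and fourth edges is reported as 15 ms, as in the rule stated above.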
  • The transient signal detector TSD2 is used when processing the signals of Figs. 9 and 10.
  • The maximum slopes of pulses in the transient signal, Figs. 9c and 10c, are determined, and for the selected leading edges the starting point in time, t_b, of the edge is set equal to the point in time where the maximum slope is detected, i.e. d_b is equal to d_m.
  • t_e is equal to the point in time where the slope has decreased to 70% of d_m, i.e. d_e is equal to 70% of d_m.
  • The part of the leading edge of a pulse in the transient signal corresponding to the time interval of t_b to t_e is represented as the leading edge of a pulse in the edge signal, Figs. 9d and 10d.
  • The edge signal holds the full speech information and may easily be perceived by a human ear, although some noise may be introduced during the processing.
  • The leading edge may be defined as beginning at a leading threshold value, d_b, greater than 50% of the maximum slope, but preferably equal to the maximum slope, and ending at a lagging threshold value, d_e, greater than 50% of the maximum slope, but preferably 70% of the maximum slope.
  • The rise time of the leading edge may be defined as the time period between t_b and t_e, and may in a preferred embodiment be used as representing a measure for the shape of the leading edge, thus forming the basis for identification of a distinct sound picture.
  • The pulses of the edge signal may also be chosen as the basis for identification of a distinct sound picture.
  • edge detector can be u ⁇ ed a ⁇ a pitch detector, but known technique ⁇ for pitch detection can al ⁇ o be applied.
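The edge definition given above (the edge starts at the maximum slope d_m and ends where the slope has fallen to 70% of d_m) can be sketched in Python as follows. This is only an illustration and not the flow chart of Fig. 13a: the function names, the first-difference slope estimate and the sample rates are assumptions, while the 70% ratio and the 15 ms cap on the pitch period are taken from the text.

```python
def leading_edge(transient, start, stop, end_ratio=0.7):
    """Locate one leading edge inside transient[start:stop].

    Returns (t_b, t_e, d_m): start index, end index and maximum slope,
    or None if the window contains no rising edge.
    The slope is approximated by a first difference (an assumption).
    """
    # d[i] approximates the slope between samples start+i and start+i+1.
    d = [transient[n + 1] - transient[n] for n in range(start, stop - 1)]
    d_m = max(d)
    if d_m <= 0:
        return None
    t_b = start + d.index(d_m)  # the edge starts at the maximum slope
    t_e = t_b
    # Walk forward until the slope has decreased to end_ratio * d_m.
    while t_e - start < len(d) - 1 and d[t_e - start] > end_ratio * d_m:
        t_e += 1
    return t_b, t_e, d_m


def rise_time_ms(t_b, t_e, sample_rate_hz):
    """Rise time of the edge: the time period between t_b and t_e."""
    return 1000.0 * (t_e - t_b) / sample_rate_hz


def pitch_periods_ms(edge_starts, sample_rate_hz, cap_ms=15.0):
    """Pitch period between successive selected edges, capped at cap_ms."""
    periods = []
    for prev, cur in zip(edge_starts, edge_starts[1:]):
        periods.append(min(1000.0 * (cur - prev) / sample_rate_hz, cap_ms))
    return periods
```

For example, `pitch_periods_ms([0, 80, 400], 8000)` treats the second gap (40 ms) as 15 ms, mirroring the cap used for the pitch period graphs.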
  • The shape of the leading edge of a speech signal, which signal may be a phoneme, may be considered a conclusive feature for narrow band communication. Therefore, only information about the leading edge, unvoiced or voiced, and/or pitch period, and/or loudness of the speech signal should need to be transmitted. Thus, it should not be necessary to transmit information concerning the vocal filter, thereby saving bandwidth.
  • Information about a speech signal being unvoiced or voiced, and/or the pitch period and/or loudness of the speech signal may be compressed and decompressed by means of known technology, in which speech signals are framed in time periods of 20-40 ms, and only the change in the parameters needs to be transmitted.
  • The leading edge may be compressed by identifying and representing the edge according to one of the embodiments of the present invention, for time frames of 20-40 ms, by means of a template identification from a library or a book.
  • The speech signal may be decompressed by means of a library or book of edge templates with corresponding standard filters, which filters should be excited by the edge template. Otherwise the speech signal may be decompressed by means of a library or book with standard waveforms identified by means of the edge template identification.
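The template identification mentioned above could, for instance, be realized as a nearest-neighbour search over a library of edge shapes, so that only the index of the matching template has to be transmitted. The sketch below is an assumption about how such a lookup might work: the text does not specify the distance measure, the template length or the contents of the library, so the squared-error metric and the three-entry `CODEBOOK` are purely illustrative.

```python
def normalise(edge):
    """Scale an edge segment to the range 0..1 so that only its shape matters."""
    lo, hi = min(edge), max(edge)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in edge]


def identify_template(edge, codebook):
    """Return the index of the codebook template closest to the edge.

    The edge is assumed to have been resampled to the template length.
    """
    shape = normalise(edge)

    def squared_error(template):
        return sum((a - b) ** 2 for a, b in zip(shape, template))

    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


# Hypothetical library of three 5-sample edge shapes.
CODEBOOK = [
    [0.0, 0.25, 0.5, 0.75, 1.0],  # linear rise
    [0.0, 0.6, 0.85, 0.95, 1.0],  # fast initial rise
    [0.0, 0.05, 0.15, 0.4, 1.0],  # slow initial rise
]
```

A decoder holding the same library would look up the template by its index and use it to excite the corresponding standard filter or to select a standard waveform.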
  • Fig. 11 shows the edge signal of Fig. 10d filtered with the same bandpass filter used for processing the pretransient signal, Fig. 10b, i.e. the centre frequency is about 2.8 kHz with a lower cutoff frequency of about 2.1 kHz and an upper cutoff frequency of about 3.5 kHz.
  • The sound quality of the signal represented in Fig. 11 is improved when compared to the signal of Fig. 10d.
  • The signal of Fig. 11 may be compared with the pretransient signal of Fig. 10b.
  • The edge signal may be processed by means of a filter with another filter characteristic or by means of waveform decoding.
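The bandpass filtering of the edge signal described for Fig. 11 can be illustrated with a second-order digital filter. The text gives only the centre frequency (about 2.8 kHz) and the cutoff frequencies (about 2.1 kHz and 3.5 kHz); the filter order, the widely used biquad design with unity gain at the centre frequency, and the 16 kHz sample rate in the usage note are all assumptions made for this sketch.

```python
import math


def bandpass_coefficients(f_centre_hz, f_low_hz, f_high_hz, sample_rate_hz):
    """Second-order bandpass (biquad) with unity gain at the centre frequency.

    Q is taken as centre frequency divided by bandwidth, which only
    approximates the stated cutoff frequencies after discretisation.
    """
    w0 = 2.0 * math.pi * f_centre_hz / sample_rate_hz
    q = f_centre_hz / (f_high_hz - f_low_hz)
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def filter_signal(x, b, a):
    """Apply the biquad to the sequence x (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in x:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return y
```

With `b, a = bandpass_coefficients(2800, 2100, 3500, 16000)`, a 2.8 kHz tone passes essentially unchanged while a 300 Hz tone is strongly attenuated.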
  • Fig. 12 shows a preferred embodiment of implementation of the refractoriness period.
  • The definitions of the flow chart variables of the process of Fig. 12 are given as follows:
  • PrvSi: value of the previous input sample (Si(n-1), n > 0).
  • LeadingEdge: a Boolean variable; it is true if the sample is in a leading edge or in a refractoriness period, else it is false.
  • Fig. 13a shows a preferred embodiment of implementation of the edge detection principle.
  • d: differentiated transient signal (differential signal).
  • n: index for samples of the differential signal.
  • d_prv: a help variable, mostly the previous sample of the differential signal.
  • d_em: relative minimum threshold for the differential signal.
  • d_m: maximum slope for the edge.
  • t_s: search time in samples for the greatest local maximum of the slope greater than d_m.
  • t_m: sample number for the detected maximum slope.
  • k: index for the detected edge.
  • thr: predetermined ratio of the threshold value for the slope at the beginning of the edge, d_b, to the maximum slope d_m.
  • thr_c: predetermined ratio of the threshold value for the slope at the end of the edge, d_e, to the maximum slope d_m.
  • Fig. 15 illustrates that if the rise time of a pulse provided as an input to a filter is slower than the rise time of the impulse response of the filter, then the rise time of the output of the filter generated in response to the input pulse will be substantially equal to the rise time of the input pulse.
  • Signal processing of sound signals in the cochlea may be simulated by a filter bank comprising a set of bandpass filters with different centre frequencies, wherein the bandwidths of these filters increase with increasing centre frequencies, which in turn means that the rise times of the impulse responses of the filters decrease with increasing centre frequencies.
  • The rise time of an output pulse generated by a corresponding filter of the filter bank will be substantially equal to the rise time of the impulse response of the filter when the rise time of the input pulse is faster than the rise time of the impulse response of the filter, and substantially equal to the rise time of the input pulse when the rise time of the input pulse is slower than the rise time of the impulse response of the filter.
  • The rise time of the input pulse may be determined by determination of the two filters A and B of the filter bank having the narrowest bandwidths of the filters generating output pulses in response to the input pulse with substantially identical rise times, as the rise time of the input pulse must be within the rise time range between the rise time of the impulse response of the filter A, B with the narrowest bandwidth and the rise time of the impulse response of the filter with the largest bandwidth that is also lower than the bandwidths of the filters A, B.
  • Speech signals may be generated by modulation of pulses in a filter that modulates the shape of the pulses as described above.
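The bracketing argument above can be sketched in Python under an idealised model in which the output rise time of each filter is simply the slower of the input rise time and the filter's own impulse-response rise time. The function names, the tolerance used to decide that two rise times are "substantially identical", and the example filter bank are assumptions for illustration only.

```python
def output_rise_time(input_rise, filter_rise):
    """Idealised model from the text: the slower rise time dominates."""
    return max(input_rise, filter_rise)


def bracket_input_rise_time(filter_rises, output_rises, tol=0.05):
    """Bracket the input rise time from filter-bank measurements.

    filter_rises: impulse-response rise time of each filter.
    output_rises: measured rise time of each filter's output pulse.
    Returns (lower, upper). lower is None when fewer than two filters
    share the common output rise time, and upper is None when no filter
    is slower than the input pulse.
    """
    # Wide-band filters all reproduce the input rise time, which is
    # therefore the smallest output rise time observed.
    common = min(output_rises)
    pairs = list(zip(filter_rises, output_rises))
    passing = [f for f, o in pairs if o <= common * (1.0 + tol)]
    limited = [f for f, o in pairs if o > common * (1.0 + tol)]
    # The narrowest passing filter (slowest impulse response) gives the
    # lower bound; the widest limiting filter gives the upper bound.
    lower = max(passing) if len(passing) >= 2 else None
    upper = min(limited) if limited else None
    return lower, upper
```

With filter rise times of 0.2, 0.5, 1.0, 2.0 and 4.0 ms and an input pulse rising in 1.3 ms, the three fastest filters all report 1.3 ms, and the input rise time is bracketed between 1.0 ms and 2.0 ms.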
  • Pulses to be modulated correspond to sound signals generated in the articulation channel, e.g. by the vocal cords, and the processing in the filters corresponds to the modulation performed by adjustment of the articulation channel according to the phoneme processed, whereby the filters modulate the shape of the pulses.
  • The time between pulses to be modulated should be sufficiently long to ensure that there is no interference between output pulses generated in response to different input pulses.
  • The shape of the leading edge and the rise time may both be conclusive features.
  • The leading edge may be detected as described above, and in a preferred embodiment the edge detection is based on a transient signal processed with a refractoriness period either without lowpass filtering, as shown in Fig. 5, or with a lowpass filter, as shown in Fig. 6.
  • A phoneme may be identified by means of features such as a classification of the shape of the leading edges, mean pitch period, variation of pitch periods, and/or dynamic trend of the edge height in a time frame of 10-100 ms.
  • The present invention is preferably implemented utilizing a programmed processor such as a microcomputer for real time applications, but this is not to be limiting.
  • The present invention may also be implemented using a dedicated hardware processor if desired, or by a more powerful mainframe computer, without departing from the present invention.
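Several of the frame-level features named above for phoneme identification (mean pitch period, variation of pitch periods, and dynamic trend of the edge height) might be computed on a programmed processor as in the following sketch. The function name, the use of the standard deviation as the measure of variation, and the least-squares slope as the measure of trend are assumptions; the text does not fix these choices.

```python
from statistics import mean, pstdev


def frame_features(edge_times_ms, edge_heights):
    """Summarise the edges detected in one time frame (10-100 ms in the text).

    edge_times_ms: start time of each detected edge, in milliseconds.
    edge_heights:  height of each edge, same length as edge_times_ms.
    """
    if len(edge_times_ms) < 2 or len(edge_times_ms) != len(edge_heights):
        raise ValueError("need at least two edges with matching heights")
    # Pitch periods are the intervals between successive edges.
    periods = [b - a for a, b in zip(edge_times_ms, edge_times_ms[1:])]
    # Least-squares slope of edge height against time (assumed trend measure).
    t_bar, h_bar = mean(edge_times_ms), mean(edge_heights)
    num = sum((t - t_bar) * (h - h_bar)
              for t, h in zip(edge_times_ms, edge_heights))
    den = sum((t - t_bar) ** 2 for t in edge_times_ms)
    return {
        "mean_pitch_period_ms": mean(periods),
        "pitch_period_variation_ms": pstdev(periods),
        "edge_height_trend_per_ms": num / den,
    }
```

A classification of the leading edge shape, the remaining feature listed in the text, would be supplied separately, for example as a template index.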

EP96928357A 1995-09-05 1996-09-04 Verfahren und vorrichtung zur verarbeitung von tonsignalen Withdrawn EP0850472A2 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
DK97495 1995-09-05
PCT/DK1996/000370 WO1997009712A2 (en) 1995-09-05 1996-09-04 Method and system for processing auditory signals

Publications (1)

Publication Number Publication Date
EP0850472A2 true EP0850472A2 (de) 1998-07-01

Family

ID=8099600

Family Applications (1)

Application Number Title Priority Date Filing Date
EP96928357A Withdrawn EP0850472A2 (de) 1995-09-05 1996-09-04 Verfahren und vorrichtung zur verarbeitung von tonsignalen

Country Status (3)

Country Link
EP (1) EP0850472A2 (de)
AU (1) AU6785696A (de)
WO (1) WO1997009712A2 (de)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002507776A (ja) * 1998-03-13 2002-03-12 レオンハルト,フランク,ウルダル 音声信号の過渡現象を解析するための信号処理方法
AU2001289593A1 (en) * 2000-09-20 2002-04-02 Leonhard Research A/S Quality control of electro-acoustic transducers
WO2002080618A1 (en) * 2001-03-30 2002-10-10 Leonhard Research A/S Noise suppression in measurement of a repetitive signal
EP2214165A3 (de) * 2009-01-30 2010-09-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung, Verfahren und Computerprogramm zur Änderung eines Audiosignals mit einem Transientenereignis

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AT276495B (de) * 1967-08-03 1969-11-25 Ibm Oesterreich Internationale Verfahren zur Multiplex-Sprachsynthese
JPS50155105A (de) * 1974-06-04 1975-12-15
US4382164A (en) * 1980-01-25 1983-05-03 Bell Telephone Laboratories, Incorporated Signal stretcher for envelope generator
DK46493D0 (da) * 1993-04-22 1993-04-22 Frank Uldall Leonhard Metode for signalbehandling til bestemmelse af transientforhold i auditive signaler

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9709712A3 *

Also Published As

Publication number Publication date
WO1997009712A3 (en) 1997-04-10
AU6785696A (en) 1997-03-27
WO1997009712A2 (en) 1997-03-13

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19980406

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): DE ES FR GB

17Q First examination report despatched

Effective date: 19991006

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20000217