EP1143417B1 - Method of converting the speech rate of a speech signal, use of the method, and a device adapted therefor - Google Patents
- Publication number
- EP1143417B1 (application EP00610036A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- segment
- speech
- pitch period
- signal
- speech signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/04—Time compression or expansion
Definitions
- the invention relates to a method of converting the speech rate of a speech signal having a pitch period below a maximum expected pitch period.
- the method comprises the steps of dividing the speech signal into segments, estimating the pitch period of the speech signal in a segment, copying a fraction of the speech signal in the segment, said fraction having a duration equal to said estimated pitch period, providing from said fraction an intermediate signal having the same duration, and expanding the segment by inserting said intermediate signal pitch synchronously into the speech signal of the segment.
- the invention also relates to the use of the method in a mobile telephone. Further, the invention relates to a device adapted to convert the speech rate of a speech signal.
- One way of enhancing the intelligibility of the speech is to slow down the speech.
- the principal objective of this approach is to give the listener some extra time to recognize what is being said. This can be obtained by using time-scaling techniques, which means that the temporal evolution of the signal is changed.
- the speech rate is adjusted by adding extra time data to the signal according to a chosen algorithm.
- a device utilizing such an algorithm is known from the article Y. Nejime, T. Aritsuka, T. Imamura, T. Ifukube, and J. Matsushima, "A Portable Digital Speech-Rate Converter for Hearing Impairment", IEEE Transactions on Rehabilitation Engineering, vol. 4, no. 2, pp. 73-83, June 1996.
- the device is a hand-sized portable device that converts the speech rate without changing the pitch.
- a time delay occurs between the input and the output speech.
- the speech signals are recorded into a solid-state memory while previously recorded signals are being slowed and generated.
- the user activates the device by holding down a button on the device. The longer the user holds the button to slow the speech, the longer the delay. Although the delay may be reduced by cutting silent intervals in excess of one second, this is not sufficient to eliminate the delay.
- the user can return to non-delay by releasing the button.
- the speech data in the memory are partitioned into frames.
- the time-scaling process expands the time scale of the speech data frame by frame.
- the time expansion is obtained by inserting a composite pitch pattern created from the signal of three consecutive pitch periods.
- the composite pattern is used in order to avoid reverberation of the expanded signal. Because the time-scaling process used needs four-pitch-length data elements, the length of each frame is 48 ms corresponding to four times the assumed maximum pitch interval which is set to 12 ms in this document. Other documents mention assumed maximum pitch periods of 16 ms or even close to 20 ms, which would necessitate even longer frame lengths and thus larger amounts of data to be processed for each frame.
- this object is achieved in that a segment size longer than said maximum expected pitch period but shorter than twice the maximum expected pitch period is used.
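As a worked check of this constraint, the following lines apply it to the 20 ms segment size and the 60 Hz lower pitch limit quoted elsewhere in this description. The symbols $N$ (segment size) and $T_{\max}$ (maximum expected pitch period) are notation introduced here for illustration and do not appear in the source.

```latex
\begin{align*}
T_{\max} < N < 2\,T_{\max}
  &\iff \tfrac{N}{2} < T_{\max} < N \\
N = 20\ \text{ms}
  &\implies 10\ \text{ms} < T_{\max} < 20\ \text{ms} \\
T_{\max} = \tfrac{1}{60\ \text{Hz}} \approx 16.7\ \text{ms}
  &\implies 16.7\ \text{ms} < 20\ \text{ms} < 33.3\ \text{ms} \\
\text{prior-art frame: } 4 \times 12\ \text{ms}
  &= 48\ \text{ms}
\end{align*}
```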
- the method further comprises the step of providing, if the actual estimated pitch period of the segment is greater than half the segment size, the intermediate signal by using the copied fraction directly as the intermediate signal. This avoids the extra calculation of a composite signal.
- the method may further comprise the steps of copying two consecutive fractions, each having a duration equal to the estimated pitch period, and providing the intermediate signal as an average of the two consecutive fractions. In this way reverberation may be minimized for speech with shorter pitch periods which actually have a higher risk for such reverberation.
- When the method further comprises the steps of classifying a segment of the speech signal as a silent segment, if the content of speech information is below a preset threshold, and shortening a segment, if that segment and a number of immediately preceding segments have been classified as silent segments, to compensate for expansion of previous segments, it is possible to keep the delay between the input signal and the (expanded) output signal at a very low level, thus providing a substantially real-time conversion of the speech.
- This makes the algorithm more suited for use in mobile telephones in which it is desired to keep the expanded speech as close to real time as possible.
- An embodiment especially expedient for use in mobile telephones is obtained when a segment size of 20 ms is used, because this segment size is also used by the existing speech signal processing in many mobile telephones, and thus, a great many computational resources can be saved by using the same segments for the speech expansion algorithm.
- a better result without the introduction of spikes or similar discontinuities in the insertion may be achieved when an overlapping window is used when copying said fraction and inserting said intermediate signal.
- a typical use of the method is in portable communications devices, and in an expedient embodiment the method is used in a mobile telephone.
- the invention also relates to a device adapted to convert the speech rate of a speech signal having a pitch period below a maximum expected pitch period.
- the device comprises means for dividing the speech signal into segments, means for estimating the pitch period of the speech signal in a segment, means for copying a fraction of the speech signal in the segment, said fraction having a duration equal to said estimated pitch period, means for providing from the fraction an intermediate signal having the same duration, and means for expanding the segment by inserting said intermediate signal pitch synchronously into the speech signal of the segment.
- When the device is adapted to use a segment size longer than said maximum expected pitch period but shorter than twice the maximum expected pitch period, a considerably smaller amount of data has to be processed for a frame, so that the method can be implemented with the limited computational resources of e.g. a mobile telephone.
- the device is further adapted to provide, if the actual estimated pitch period of the segment is greater than half the segment size, the intermediate signal by using the copied fraction directly as the intermediate signal. This avoids the extra calculation of a composite signal.
- the device may further be adapted to copy two consecutive fractions, each having a duration equal to the estimated pitch period, and to provide the intermediate signal as an average of the two consecutive fractions. In this way reverberation may be minimized for speech with shorter pitch periods which actually have a higher risk for such reverberation.
- When the device is further adapted to classify a segment of the speech signal as a silent segment, if the content of speech information is below a preset threshold, and to shorten a segment, if that segment and a number of immediately preceding segments have been classified as silent segments, to compensate for expansion of previous segments, it is possible to keep the delay between the input signal and the (expanded) output signal at a very low level, thus providing a substantially real-time conversion of the speech.
- This makes the algorithm more suited for use in mobile telephones in which it is desired to keep the expanded speech as close to real time as possible.
- An embodiment especially expedient for use in mobile telephones is obtained when the device is adapted to use a segment size of 20 ms, because this segment size is also used by the existing speech signal processing in many mobile telephones, and thus, a great many computational resources can be saved by using the same segments for the speech expansion algorithm.
- When the device is adapted to expand a segment by inserting the intermediate signal pitch synchronously into the speech signal of the segment a plurality of times, higher expansion rates can be achieved without increasing the use of computational resources considerably.
- a better result without the introduction of spikes or similar discontinuities in the insertion may be achieved when the device is adapted to use an overlapping window when copying said fraction and inserting said intermediate signal.
- In one embodiment the device is a mobile telephone, although it may also be another type of portable communications device.
- In another embodiment the device is an integrated circuit which can be used in different types of equipment.
- Figure 1 shows a block diagram of an example of a speech rate conversion system 1 in which the method and the device of the invention may be implemented.
- the shown speech rate conversion system can be used in a mobile telephone or a similar communications device.
- a speech signal 2 is sampled in a sampling circuit 3 with a sampling rate of 8 kHz and the samples are divided into segments or frames of 160 consecutive samples. Thus, each segment corresponds to 20 ms of the speech signal.
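The segmentation step can be sketched as follows. This is a minimal illustration, assuming the samples are already available as a Python list; the names `SAMPLE_RATE_HZ`, `FRAME_LEN` and `split_into_frames` are hypothetical, and dropping a trailing partial frame is an assumption, not something the text specifies.

```python
SAMPLE_RATE_HZ = 8000   # sampling rate stated in the description
FRAME_LEN = 160         # samples per segment stated in the description


def split_into_frames(samples, frame_len=FRAME_LEN):
    """Split a sample sequence into consecutive full frames.

    Assumption: a trailing partial frame is dropped.
    """
    n_frames = len(samples) // frame_len
    return [list(samples[i * frame_len:(i + 1) * frame_len])
            for i in range(n_frames)]


# 400 dummy samples give two full frames; the last 80 samples are left over.
frames = split_into_frames(list(range(400)))
frame_ms = 1000 * FRAME_LEN / SAMPLE_RATE_HZ  # duration of one frame in ms
```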
- This is the sampling and segmentation normally used for the speech processing in a standard mobile telephone and thus, the sampling circuit 3 is a normal part of such a telephone.
- Each segment or frame of 160 samples is then sent to a noise threshold unit 4 in which a classification step is performed which separates speech from silence.
- Frames classified as speech will be further processed while the others are sent to a silence shortening unit 5, which will be described later.
- the separation of speech from silence is a necessary operation when speech extension is to operate in real-time, since the extra time created by the extended speech is compensated by taking time from the silence or noise part of the signal.
- the classification is based on an energy measurement in combination with memory in the form of recorded history of energy from previous frames. It is presumed that the background noise changes slowly while the speech envelope changes more rapidly.
- a threshold is calculated. The short-time energy of each frame is calculated, and the short-time energy values of the latest 150 frames are continuously saved. The energy values of those frames classified as silence are selected and the mean energy is calculated over these selected energy values. Also the minimum energy value of the selected energy values is stored.
- the threshold is calculated by adding the difference between the mean value and the minimum value, multiplied by a pre-selected factor, to the mean energy. To decide whether a given frame is speech or silence the energy of the current frame is simply compared with the threshold value. If the energy of the frame exceeds this value it is classified as speech, otherwise it is classified as silence.
- the frames classified as speech are then sent to the voiced/unvoiced classification unit 6, because a separation of the speech into voiced and unvoiced portions is needed before an extension can be made.
- This separation can be performed by several methods, one of which will be described in detail below.
- a speech signal is modelled as an output of a slowly time-varying linear filter.
- the filter is either excited by a quasi-periodic sequence of pulses or random noise depending on whether a voiced or an unvoiced sound is to be created.
- the pulse train which creates voiced sounds is produced by pressing air out of the lungs through the vibrating vocal cords.
- the period of time between the pulses is called the pitch period and is of great importance for the singularity of the speech.
- Unvoiced sounds are generated by forming a constriction in the vocal tract and producing turbulence by forcing air through the constriction at a high velocity.
- the filter has to be time-varying.
- the properties of a speech signal change relatively slowly with time. It is reasonable to believe that the general properties of speech remain fixed for periods of 10-20 ms. This has led to the basic principle that if short segments of the speech signal are considered, each segment can effectively be modelled as having been generated by exciting a linear time-invariant system during that period of time.
- the effect of the filter can be seen as caused by the vocal tract, the tongue, the mouth and the lips.
- voiced speech can be interpreted as the output signal from a linear filter driven by an excitation signal.
- This is shown in the upper part of figure 2 in which the pulse train 21 is processed by the filter 22 to produce the voiced speech signal 23.
- a good signal for the voiced/unvoiced classification is obtained if the excitation signal can be extracted from the speech.
- By linear predictive analysis (LPA) followed by inverse filtering, a signal 26 similar to the excitation signal can be obtained. This signal is called the residual signal.
- The blocks 24 and 25 are included in the voiced/unvoiced classification unit 6 in figure 1.
- A classifying signal is then produced by calculating the autocorrelation function of the residual signal and scaling the result to lie within ±1. As the inverse filtering has removed much of the smearing introduced by the filter, a clear peak is more likely than when the autocorrelation is calculated directly from the speech frame.
- a voiced/unvoiced decision is then made by comparing the value of the highest peak in the classifying signal to a threshold value, because a sufficiently high peak in the classifying signal means that a pulse train was actually present in the residual signal and thus also in the original speech signal of the frame.
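The decision on the classifying signal can be sketched as follows. The sketch assumes the residual signal is already available (the linear predictive analysis and inverse filtering are not shown), and the function names, the `peak_threshold` of 0.5 and the lag limits of 24 to 133 samples (60 to 333 Hz at 8 kHz) are assumptions chosen for illustration; the text does not give a threshold value.

```python
def normalized_autocorrelation(x, max_lag):
    """Autocorrelation for lags 0..max_lag, scaled so that lag 0 equals 1."""
    n = len(x)
    r0 = sum(v * v for v in x)
    if r0 == 0:
        return [0.0] * (max_lag + 1)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / r0
            for k in range(max_lag + 1)]


def is_voiced(residual, min_lag=24, max_lag=133, peak_threshold=0.5):
    """Voiced if the highest peak of the classifying signal exceeds the threshold."""
    acf = normalized_autocorrelation(residual, max_lag)
    return max(acf[min_lag:max_lag + 1]) > peak_threshold


# Idealised residual of a voiced frame: one pulse every 42 samples.
periodic = [1.0 if n % 42 == 0 else 0.0 for n in range(160)]
# Pulses at irregular positions: no lag is shared by more than one pulse pair.
aperiodic = [1.0 if n in (0, 30, 70, 120) else 0.0 for n in range(160)]

voiced_flags = (is_voiced(periodic), is_voiced(aperiodic))
```

For the periodic pulse train the scaled peak at lag 42 is 0.75 (three of four pulses coincide), whereas the irregular pulses never exceed 0.25.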
- Alternatively, the voiced/unvoiced decision can be made by a simple comparison of the power or energy level of the frame with a threshold similar to the one used in the noise threshold unit 4, just with a higher threshold value, because signals below a certain power level primarily contain consonants or semi-vowels, which are typically unvoiced.
- However, the results of this method are not as precise as those obtained by the above-mentioned classification.
- If the frame is classified as unvoiced, it is sent directly to a combination or concatenation unit 7. Otherwise, i.e. if it is classified as voiced, it is forwarded to the pitch estimation unit 8, which will be described below.
- the pitch is estimated as a preparation for the extension process which should be pitch synchronous.
- the general idea of the estimation originates in the speech model described above, where the pitch represents the period of the glottal excitation. As the pitch expresses the natural quality and singularity of the speech it is important to carry out a good estimation of the pitch.
- the estimation of the pitch is based on the autocorrelation of the residual signal, which is obtained by LPA as described above in the voiced/unvoiced classification. This can be done because the highest peak in the autocorrelation of the residual signal represents the pitch period and can thus be used as an estimate thereof. By thus reusing data the complexity of the method is lowered.
- Figure 3a shows an example of a 20 ms segment of a voiced speech signal and figure 3b the corresponding autocorrelation function of the residual signal. It will be seen from figure 3a that the actual pitch period is about 5.25 ms corresponding to 42 samples, and thus the pitch estimation should end up with this value.
- the first step in the estimation of the pitch is to apply a peak picking algorithm to the autocorrelation function provided by the unit 6. This is done with a peak detector which identifies the maximum peak (i.e. the largest value) in the autocorrelation function. The index value, i.e. the sample number or the lag, of the maximum peak is then used as an estimate of the pitch period. In the case shown in figure 3b it will be seen that the maximum peak is actually located at a lag of 42 samples. The search of the maximum peak is only performed in the range where a pitch period is likely to be located. In this case the range is set to 60-333 Hz.
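The peak-picking step can be sketched as follows. The function name `estimate_pitch_lag` and its parameters are hypothetical; the conversion of the 60 to 333 Hz search range into lags of 24 to 133 samples by truncation is an assumption about rounding, and the input is an idealised residual rather than real speech.

```python
def estimate_pitch_lag(residual, fs=8000, f_min=60.0, f_max=333.0):
    """Return the lag (in samples) of the largest autocorrelation value
    inside the range where a pitch period is likely to be located."""
    n = len(residual)
    min_lag = int(fs / f_max)                 # 24 samples at 8 kHz
    max_lag = min(int(fs / f_min), n - 1)     # 133 samples at 8 kHz
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        val = sum(residual[i] * residual[i + lag] for i in range(n - lag))
        if val > best_val:                    # keep the first maximum found
            best_lag, best_val = lag, val
    return best_lag


# Idealised residual with one pulse every 42 samples, as in the figure 3 example.
residual = [1.0 if n % 42 == 0 else 0.0 for n in range(160)]
pitch_lag = estimate_pitch_lag(residual)
pitch_ms = 1000 * pitch_lag / 8000
```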
- the result of the estimation is forwarded to the extension unit 9 along with the speech frame.
- the extension algorithm is a time-domain based method which operates on whole pitch period blocks. The use of this technique means that unwanted changes of the pitch can be avoided, and thereby the singularity of the speech can be preserved.
- the extension algorithm described below is a modified version of a Pitch Synchronous OverLap Add (PSOLA) method.
- the algorithm makes a copy of one or two pitch periods and adds it or them to the original speech data, possibly with some overlap.
- the modifications are due to the fact that the relatively short frame or segment length of 20 ms is used.
- the first approach is used for relatively short pitch periods. This could be pitch periods below 8.75 ms corresponding to 70 samples using a sample rate of 8 kHz. It also corresponds to pitch frequencies above 114 Hz.
- the second approach is then used for pitch periods above 8.75 ms, i.e. relatively long pitch periods.
- the reason for using two different approaches is that due to the short frame or segment length of 20 ms only one full pitch length of the signal, including a certain overlap, can be extracted for extension purposes for signals having long pitch periods, while two consecutive pitch periods (and overlap) may be extracted for signals with shorter pitch periods.
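The choice between the two approaches can be sketched as a simple comparison against the 70-sample limit. The function name and return labels are hypothetical, and assigning a pitch period of exactly 70 samples to the second approach is an assumption, since the text only covers periods below and above 8.75 ms.

```python
SAMPLE_RATE_HZ = 8000
SHORT_PITCH_LIMIT = 70  # samples, i.e. 8.75 ms at 8 kHz


def choose_approach(pitch_lag, limit=SHORT_PITCH_LIMIT):
    """Pick the extension approach from the estimated pitch period in samples."""
    if pitch_lag < limit:
        return "average_two_periods"   # first approach: short pitch periods
    return "single_period"             # second approach: long pitch periods


limit_ms = 1000 * SHORT_PITCH_LIMIT / SAMPLE_RATE_HZ   # limit expressed in ms
limit_hz = SAMPLE_RATE_HZ / SHORT_PITCH_LIMIT          # limit expressed in Hz
choices = (choose_approach(42), choose_approach(100))
```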
- the first approach utilizes the circumstance that the pitch period is relatively short.
- the different steps performed in this approach are illustrated in figure 4.
- From the incoming frame two subsequent pitch periods Tp, along with an extra piece corresponding to the overlapping part L, are copied.
- the overlapping part could be set to 10% of Tp.
- a window is applied to the two segments I and II, thereby creating what will be referred to as segment IWin and segment IIWin.
- the window being used could be a raised cosine window or trapezoid window.
- By forming an averaged segment MWin from the segments IWin and IIWin, unnecessary repetition of an already existing segment can be avoided. Thereby the risk of undesired artifacts, such as reverberation, can be reduced.
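The construction of the averaged segment in the first approach can be sketched as follows. This is a simplified illustration: the function names are hypothetical, a trapezoid window with linear ramps of L samples is assumed, taking segment I from the start of the frame is an assumption, and the insertion of the result into the frame is not shown.

```python
def trapezoid_window(length, ramp):
    """Window with linear ramps of `ramp` samples at both ends and a flat top."""
    w = [1.0] * length
    for j in range(ramp):
        gain = (j + 1) / (ramp + 1)
        w[j] = gain
        w[length - 1 - j] = gain
    return w


def averaged_segment(frame, tp, overlap_fraction=0.10):
    """Average two consecutive windowed segments of length Tp + L."""
    overlap = max(1, round(overlap_fraction * tp))   # L, about 10% of Tp
    seg_len = tp + overlap
    if 2 * tp + overlap > len(frame):
        raise ValueError("frame too short for two pitch periods plus overlap")
    w = trapezoid_window(seg_len, overlap)
    seg1 = frame[0:seg_len]            # segment I
    seg2 = frame[tp:tp + seg_len]      # segment II, one pitch period later
    return [0.5 * w[j] * (seg1[j] + seg2[j]) for j in range(seg_len)]


# Toy frame that repeats exactly every 42 samples.
frame = [(n % 42) / 42 for n in range(160)]
m_win = averaged_segment(frame, tp=42)
```

For an exactly periodic frame the two segments are identical, so the flat part of the averaged segment reproduces the original samples.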
- In the second approach the pitch periods are longer, so the first approach cannot be used: the frame is not long enough to include two pitch periods.
- A demonstration of the stages in the second approach can be seen in figure 6. From the incoming frame only one segment I of the length Tp+L is copied out and windowed with a chosen window. Also in this case the length of L corresponds to 10% of Tp. Then the windowed segment IWin is inserted with an overlap of L samples with the original samples. The insertion of IWin can be seen in the lower part of figure 6 showing the outgoing data, in which it can be seen that the extended frame now has a length of 160+2Tp samples instead of the original 160 samples, because the original pitch length segment is used before as well as after the inserted IWin.
- The frame can be further extended by adding IWin including overlap again.
- The original pitch length segment could also be used only twice so that the extended frame length is 160+Tp samples.
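A pitch-synchronous insertion with overlap can be sketched as follows. This is a simplified variant under stated assumptions and not the exact procedure of figure 6: it repeats a single pitch period once (frame length 160+Tp), copies the segment from the start of the frame, and uses linear cross-fade ramps of L samples; the function name `extend_by_one_period` is hypothetical.

```python
def extend_by_one_period(frame, tp, overlap_fraction=0.10):
    """Repeat one pitch period by overlap-adding a windowed copy of the
    first Tp + L samples with the original samples shifted by Tp."""
    n = len(frame)
    overlap = max(1, round(overlap_fraction * tp))   # L, about 10% of Tp
    if tp + overlap > n:
        raise ValueError("frame too short for one pitch period plus overlap")
    fade_in = [(j + 1) / (overlap + 1) for j in range(overlap)]
    out = [0.0] * (n + tp)
    # Windowed copy I: first Tp + L samples, with the last L samples faded out.
    for j in range(tp + overlap):
        gain = 1.0 if j < tp else 1.0 - fade_in[j - tp]
        out[j] += gain * frame[j]
    # Original samples, delayed by exactly one pitch period, first L samples faded in.
    for j in range(n):
        gain = fade_in[j] if j < overlap else 1.0
        out[tp + j] += gain * frame[j]
    return out


# Toy frame that repeats exactly every 100 samples (12.5 ms at 8 kHz).
frame = [(n % 100) / 100 for n in range(160)]
extended = extend_by_one_period(frame, tp=100)
```

Because the shift equals the pitch period and the two ramps sum to one, an exactly periodic frame stays periodic across the joint, which is the point of inserting pitch synchronously.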
- the extended frame is now sent to the concatenation unit 7 where it will be merged with the other frames.
- the speech extension causes delays in the speech that are not desirable, especially in a mobile telephone environment. To avoid this delay some parts of the input signal have to be removed. A natural choice is to use the speech pauses which consist of silence only. A shortening algorithm fulfilling the demands for real time is performed in the shortening unit 5 and will be described below.
- Shortening is applied only under one condition: the current frame and the preceding three frames must all be silent frames. If this condition is satisfied, the number of samples corresponding to the extended part is removed. Fractions of a frame can also be removed in order to maintain real time.
- the second reason for the condition is that there are pauses in the speech which are necessary for the natural flow of the speech. If they are removed, the speech is harder to understand, which is the opposite result of what is wanted.
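The shortening rule can be sketched as bookkeeping of the samples added by extension. The class name `SilenceShortener`, its methods, and the choice to remove samples from the start of a silent frame are assumptions made for illustration; the text only states the four-silent-frames condition and that whole or partial frames may be removed.

```python
class SilenceShortener:
    def __init__(self, required_silent_run=4):
        self.required_silent_run = required_silent_run  # current + 3 preceding frames
        self.silent_run = 0   # consecutive silent frames seen so far
        self.surplus = 0      # samples added by extension, not yet compensated

    def note_extension(self, added_samples):
        """Record how many samples an extended speech frame has added."""
        self.surplus += added_samples

    def process(self, frame, is_silence):
        """Return the frame, shortened if the silence condition is met."""
        if not is_silence:
            self.silent_run = 0
            return list(frame)
        self.silent_run += 1
        if self.silent_run < self.required_silent_run or self.surplus == 0:
            return list(frame)
        cut = min(self.surplus, len(frame))
        self.surplus -= cut
        return list(frame[cut:])   # assumption: samples are dropped from the start


shortener = SilenceShortener()
shortener.note_extension(200)     # e.g. two extensions of 100 samples each
lengths = [len(shortener.process([0.0] * 160, is_silence=True))
           for _ in range(6)]
```

In this run the first three silent frames pass unchanged, the fourth is removed entirely, the fifth loses the remaining 40 samples, and the sixth passes unchanged because the delay has been compensated.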
- an incoming frame can take three ways in the system to the concatenation or combination unit 7 depending on whether the frame is classified as silence, unvoiced speech or voiced speech. Independent of which way the frames have taken, the incoming frames must be sent out in the same order as they arrived, irrespective of whether they have been altered or not. Therefore, the combination unit 7 can be viewed as a First In First Out (FIFO) buffer.
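The FIFO behaviour of the combination unit can be sketched with a double-ended queue. The class name `CombinationUnit` and its methods are hypothetical, and the three toy frames merely stand in for a shortened silent frame, an unchanged unvoiced frame and an extended voiced frame.

```python
from collections import deque


class CombinationUnit:
    """First In First Out buffer that concatenates processed frames."""

    def __init__(self):
        self._queue = deque()

    def push(self, frame):
        self._queue.append(list(frame))

    def pop_all(self):
        """Emit all buffered samples in the order the frames arrived."""
        out = []
        while self._queue:
            out.extend(self._queue.popleft())
        return out


unit = CombinationUnit()
unit.push([1, 1, 1])        # stands in for a shortened silent frame
unit.push([2, 2, 2, 2])     # stands in for an unaltered unvoiced frame
unit.push([3] * 6)          # stands in for an extended voiced frame
stream = unit.pop_all()
```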
- The autocorrelation function may be calculated directly from the speech signal instead of the residual signal, or other conformity functions may be used instead of the autocorrelation function.
- a cross correlation could be calculated between the speech signal and the residual signal.
- different sampling rates may be used.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Mobile Radio Communication Systems (AREA)
Claims (17)
- A method of converting the speech rate of a speech signal (2) having a pitch period below a maximum expected pitch period, the method comprising the steps of: dividing the speech signal into segments; estimating the pitch period (Tp) of the speech signal in a segment; and expanding the segment by replicating a part of the speech signal in the segment; using a segment size longer than the maximum expected pitch period but shorter than twice the maximum expected pitch period; copying a fraction of the speech signal in the segment, said fraction having a duration equal to the estimated pitch period (Tp); providing from said fraction an intermediate signal (MWin; IWin) having the same duration; and expanding the segment by inserting said intermediate signal pitch synchronously into the speech signal of the segment.
- A method according to claim 1, characterized in that it further comprises the step of: providing the intermediate signal by using the copied fraction directly as the intermediate signal, if the actual estimated pitch period of the segment is greater than half the segment size.
- A method according to claim 1 or 2, characterized in that it further comprises the steps of: copying two consecutive fractions, each having a duration equal to the estimated pitch period, if the actual estimated pitch period of the segment is less than half the segment size; and providing the intermediate signal as an average of the two consecutive fractions.
- A method according to any one of claims 1 to 3, characterized in that it further comprises the steps of: classifying a segment of the speech signal as a silent segment, if the content of speech information is below a preset threshold; and shortening a segment, if that segment and a number of immediately preceding segments have been classified as silent segments, to compensate for expansion of previous segments.
- A method according to any one of claims 1 to 4, characterized in that a segment size of 20 ms is used.
- A method according to any one of claims 1 to 5, characterized in that the segment is expanded by inserting the intermediate signal pitch synchronously into the speech signal of the segment a plurality of times.
- A method according to any one of claims 1 to 6, characterized in that an overlapping window is used when copying said fraction and inserting said intermediate signal.
- Use of the method according to any one of claims 1 to 7 in a mobile telephone.
- A device adapted to convert the speech rate of a speech signal (2) having a pitch period below a maximum expected pitch period, the device comprising: means (3) for dividing the speech signal into segments; means (8) for estimating the pitch period (Tp) of the speech signal in a segment; and means (9) for expanding the segment by replicating a part of the speech signal in the segment; means (2) for selecting a segment size longer than the maximum expected pitch period but shorter than twice the maximum expected pitch period; means for copying a fraction of the speech signal in a segment, said fraction having a duration equal to the estimated pitch period (Tp); means for providing from the fraction an intermediate signal (MWin; IWin) having the same duration; and means (9) for expanding the segment by inserting said intermediate signal pitch synchronously into the speech signal of the segment.
- A device according to claim 9, characterized in that it is further adapted to provide, if the actual estimated pitch period of the segment is greater than half the segment size, the intermediate signal by using the copied fraction directly as the intermediate signal.
- A device according to claim 9 or 10, characterized in that it is further adapted, if the actual estimated pitch period of the segment is less than half the segment size, to copy two consecutive fractions, each having a duration equal to the estimated pitch period, and to provide the intermediate signal as an average of the two consecutive fractions.
- A device according to any one of claims 9 to 11, characterized in that it is further adapted to: classify a segment of the speech signal as a silent segment, if the content of speech information is below a preset threshold; and shorten a segment, if that segment and a number of immediately preceding segments have been classified as silent segments, to compensate for expansion of previous segments.
- A device according to any one of claims 9 to 12, characterized in that it is adapted to use a segment size of 20 ms.
- A device according to any one of claims 9 to 13, characterized in that it is adapted to expand the segment by inserting the intermediate signal pitch synchronously into the speech signal of the segment a plurality of times.
- A device according to any one of claims 9 to 14, characterized in that it is adapted to use an overlapping window when copying said fraction and inserting said intermediate signal.
- A device according to any one of claims 9 to 15, characterized in that the device is a mobile telephone.
- A device according to any one of claims 9 to 15, characterized in that the device is an integrated circuit.
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AT00610036T ATE314719T1 (de) | 2000-04-06 | 2000-04-06 | Method of converting the speech rate of a speech signal, use of the method, and a device adapted therefor |
EP00610036A EP1143417B1 (de) | 2000-04-06 | 2000-04-06 | Method of converting the speech rate of a speech signal, use of the method, and a device adapted therefor |
DE60025158T DE60025158T2 (de) | 2000-04-06 | 2000-04-06 | Method of converting the speech rate of a speech signal, use of the method, and a device adapted therefor |
PCT/EP2001/003491 WO2001078066A1 (en) | 2000-04-06 | 2001-03-27 | Speech rate conversion |
CN01810565.3A CN1432177A (zh) | 2000-04-06 | 2001-03-27 | Speech rate conversion |
AU2001242520A AU2001242520A1 (en) | 2000-04-06 | 2001-03-27 | Speech rate conversion |
US09/827,195 US6763329B2 (en) | 2000-04-06 | 2001-04-05 | Method of converting the speech rate of a speech signal, use of the method, and a device adapted therefor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00610036A EP1143417B1 (de) | 2000-04-06 | 2000-04-06 | Method of converting the speech rate of a speech signal, use of the method, and a device adapted therefor |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1143417A1 (de) | 2001-10-10 |
EP1143417B1 (de) | 2005-12-28 |
Family
ID=8174384
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00610036A Expired - Lifetime EP1143417B1 (de) | 2000-04-06 | 2000-04-06 | Method of converting the speech rate of a speech signal, use of the method, and a device adapted therefor |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1143417B1 (de) |
AT (1) | ATE314719T1 (de) |
DE (1) | DE60025158T2 (de) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3871657B2 (ja) * | 2003-05-27 | 2007-01-24 | 株式会社東芝 | Speech rate conversion apparatus, method, and program therefor |
CN101719371B (zh) * | 2009-11-20 | 2012-04-04 | 安凯(广州)微电子技术有限公司 | A method for changing speech speed |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5828995A (en) * | 1995-02-28 | 1998-10-27 | Motorola, Inc. | Method and apparatus for intelligible fast forward and reverse playback of time-scale compressed voice messages |
US5933808A (en) * | 1995-11-07 | 1999-08-03 | The United States Of America As Represented By The Secretary Of The Navy | Method and apparatus for generating modified speech from pitch-synchronous segmented speech waveforms |
JPH09198089A (ja) * | 1996-01-19 | 1997-07-31 | Matsushita Electric Ind Co Ltd | Playback speed conversion device |
CN1163868C (zh) * | 1996-11-11 | 2004-08-25 | 松下电器产业株式会社 | Method and apparatus for converting the speech reproduction rate |
- 2000
  - 2000-04-06 EP EP00610036A patent/EP1143417B1/de not_active Expired - Lifetime
  - 2000-04-06 DE DE60025158T patent/DE60025158T2/de not_active Expired - Lifetime
  - 2000-04-06 AT AT00610036T patent/ATE314719T1/de not_active IP Right Cessation
Also Published As
Publication number | Publication date |
---|---|
DE60025158T2 (de) | 2006-07-06 |
EP1143417A1 (de) | 2001-10-10 |
ATE314719T1 (de) | 2006-01-15 |
DE60025158D1 (de) | 2006-02-02 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US6763329B2 (en) | Method of converting the speech rate of a speech signal, use of the method, and a device adapted therefor | |
KR102158743B1 (ko) | Data augmentation apparatus and method for improving the performance of natural-language speech recognition | |
Arons | Techniques, perception, and applications of time-compressed speech | |
JP2008058983A (ja) | Method for robust classification of noise in speech coding | |
JPH07319496A (ja) | Method for changing the speed of an input speech signal | |
CN111508498A (zh) | Conversational speech recognition method, system, electronic device, and storage medium | |
JP2007003682A (ja) | Speech rate conversion device | |
JP4545941B2 (ja) | Method and apparatus for determining speech coding parameters | |
Mousa | Voice conversion using pitch shifting algorithm by time stretching with PSOLA and re-sampling | |
KR20050010927A (ko) | Audio signal processing apparatus | |
EP1143417B1 (de) | Method of converting the speech rate of a speech signal, use of the method, and a device adapted therefor | |
JP2006119647A (ja) | Device for pseudo-converting whispered speech into normal voiced speech | |
JP2905112B2 (ja) | Environmental sound analysis device | |
WO2009055718A1 (en) | Producing phonitos based on feature vectors | |
JP6313619B2 (ja) | Audio signal processing device and program | |
Kondo et al. | A packet loss concealment method using recursive linear prediction. | |
KR100359988B1 (ko) | Real-time speech rate conversion device | |
JPH0777999A (ja) | Speech time-axis compression and expansion method | |
JP2002297200A (ja) | Speech rate conversion device | |
JP2006038956A (ja) | Speech rate delay device and method | |
JPH10224898A (ja) | Hearing aid | |
KR100345402B1 (ko) | Real-time speech detection apparatus and method using pitch information | |
KR100384898B1 (ko) | Method for synchronizing audio and video using a speech-rate adjustment function | |
JP2007047313A (ja) | Speech rate conversion device | |
JPH07210192A (ja) | Output data control method and apparatus |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
AX | Request for extension of the european patent | Free format text: AL;LT;LV;MK;RO;SI |
17P | Request for examination filed | Effective date: 20020320 |
AKX | Designation fees paid | Free format text: AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL) |
17Q | First examination report despatched | Effective date: 20040513 |
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: BE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20051228 Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; WARNING: LAPSES OF ITALIAN PATENTS WITH EFFECTIVE DATE BEFORE 2007 MAY HAVE OCCURRED AT ANY TIME BEFORE 2007. THE CORRECT EFFECTIVE DATE MAY BE DIFFERENT FROM THE ONE RECORDED. Effective date: 20051228 Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20051228 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20051228 Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20051228 Ref country code: LI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20051228 Ref country code: CH Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20051228 |
REG | Reference to a national code | Ref country code: GB Ref legal event code: FG4D |
REG | Reference to a national code | Ref country code: CH Ref legal event code: EP |
REG | Reference to a national code | Ref country code: IE Ref legal event code: FG4D |
REF | Corresponds to: | Ref document number: 60025158 Country of ref document: DE Date of ref document: 20060202 Kind code of ref document: P |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060328 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060328 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060328 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060406 Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060406 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060408 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: MC Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060430 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060529 |
NLV1 | Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act | |
REG | Reference to a national code | Ref country code: CH Ref legal event code: PL |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | Effective date: 20060929 |
GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20060406 |
REG | Reference to a national code | Ref country code: IE Ref legal event code: MM4A |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20070216 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060406 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060430 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20051228 Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20051228 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R082 Ref document number: 60025158 Country of ref document: DE Representative's name: HOFFMANN - EITLE, DE Effective date: 20140618 Ref country code: DE Ref legal event code: R081 Ref document number: 60025158 Country of ref document: DE Owner name: OPTIS WIRELESS TECHNOLOGY, LLC, PLANO, US Free format text: FORMER OWNER: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL), STOCKHOLM, SE Effective date: 20140618 Ref country code: DE Ref legal event code: R082 Ref document number: 60025158 Country of ref document: DE Representative's name: GRUENECKER, KINKELDEY, STOCKMAIR & SCHWANHAEUS, DE Effective date: 20140618 Ref country code: DE Ref legal event code: R082 Ref document number: 60025158 Country of ref document: DE Representative's name: GRUENECKER PATENT- UND RECHTSANWAELTE PARTG MB, DE Effective date: 20140618 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R082 Ref document number: 60025158 Country of ref document: DE Representative's name: GRUENECKER, KINKELDEY, STOCKMAIR & SCHWANHAEUS, DE Ref country code: DE Ref legal event code: R082 Ref document number: 60025158 Country of ref document: DE Representative's name: GRUENECKER PATENT- UND RECHTSANWAELTE PARTG MB, DE |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE Payment date: 20170321 Year of fee payment: 18 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R119 Ref document number: 60025158 Country of ref document: DE |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20181101 |