JP2010026323A

JP2010026323A - Speech speed detection device

Info

Publication number: JP2010026323A
Application number: JP2008188950A
Authority: JP
Inventors: Teppei Washi; 哲平鷲; Keiichi Yoshida; 恵一吉田; Katsuhiko Kimura; 克彦木村
Original assignee: Panasonic Electric Works Co Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2008-07-22
Filing date: 2008-07-22
Publication date: 2010-02-04

Abstract

PROBLEM TO BE SOLVED: To provide a speech speed detection device which is hardly affected by environmental noise and can accurately detect the speech speed of a speaker.

SOLUTION: The speech speed detection device includes: a linear prediction coefficient variation calculation section 11 for calculating the variation of linear prediction coefficients by performing linear prediction on a speech signal; a first envelope calculation section 12 for calculating the envelope of the sum of the variations of the linear prediction coefficients; a first peak detection section 13 for detecting the peaks of that envelope; a first speech speed calculation section 14 for calculating the number of peaks per unit time from the number of detected peaks; a speech absolute value calculation section 21 for calculating the absolute value of the speech signal; a second envelope calculation section 22 for calculating the envelope of the absolute value; a second peak detection section 23 for detecting the peaks of that envelope; a second speech speed calculation section 24 for calculating the number of peaks per unit time from the number of detected peaks; and a comprehensive speech speed calculation section 10 for calculating the speech speed on the basis of both numbers of peaks per unit time.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a speech speed detection device that is used in real-time communication devices such as intercoms and that detects the speed at which a speaker talks (the speech speed) for the purpose of speech speed conversion.

Conventionally, in fields such as IC recorders, the playback speed of speech has been changed by compressing or expanding a digitized speech signal on the time axis, converting the compressed or expanded signal into an analog signal, and outputting it from a speaker. As is well known, speech speed varies from speaker to speaker, and even a single speaker does not talk at a constant speed. Consequently, if speech speed conversion is applied to a speech signal at a fixed compression/expansion rate, the reproduced speech may be faster or slower than the speed the user (listener) wants, and may be hard for the user to follow.

For this reason, methods have been proposed that detect the actual speech speed of the speaker and perform speech speed conversion with a compression/expansion rate set according to the detected speed. For example, in Patent Document 1, smoothing is applied to the envelope of the time-domain speech waveform, the number of waveform peaks per unit time is counted to calculate the speech speed, and speech speed conversion is performed according to that value. However, because changes in speech appear not only in amplitude but also in frequency, the detection accuracy of such a method is low, and the reproduced speech may sound unnatural.

In a device used for real-time conversation, such as an intercom, the two parties often do not know each other and cannot predict how fast the other will speak, so the listener is likely to miss what is said at the start of the call. Real-time communication devices are therefore required to detect the speaker's speech speed more accurately.

Patent Document 1: Japanese Patent Laid-Open No. 7-64597

The present invention has been made to solve the problems of the conventional example described above, and its object is to provide a speech speed detection device capable of detecting a speaker's speech speed more accurately.

To achieve the above object, the invention of claim 1 is a speech speed detection device characterized in that it performs linear prediction analysis on an input speech signal, obtains the envelope of the sum of the change amounts of the resulting prediction coefficients, and takes the number of peaks per unit time in that envelope as a first speech speed parameter; takes the envelope of the absolute value of the speech signal and takes the number of peaks per unit time in that envelope as a second speech speed parameter; and calculates the speech speed based on the sum of the first speech speed parameter and the second speech speed parameter, each multiplied by a predetermined contribution rate.

The invention of claim 2 is a speech speed calculation device characterized by comprising: a linear prediction coefficient change amount calculation unit that performs linear prediction on an input speech signal and calculates the change amount of the linear prediction coefficients; a first envelope calculation unit that obtains the envelope of the sum of the change amounts of the linear prediction coefficients obtained by the linear prediction coefficient change amount calculation unit; a first peak detection unit that detects the peaks of the envelope obtained by the first envelope calculation unit; a first speech speed calculation unit that calculates the number of peaks per unit time from the number of peaks detected by the first peak detection unit and outputs the result as a first speech speed parameter; a speech absolute value calculation unit that obtains the absolute value of the input speech signal; a second envelope calculation unit that obtains the envelope of the speech absolute value obtained by the speech absolute value calculation unit; a second peak detection unit that detects the peaks of the envelope obtained by the second envelope calculation unit; a second speech speed calculation unit that calculates the number of peaks per unit time from the number of peaks detected by the second peak detection unit and outputs the result as a second speech speed parameter; and an overall speech speed calculation unit that calculates the speech speed based on the sum of the first speech speed parameter and the second speech speed parameter, each multiplied by a predetermined contribution rate.

The invention of claim 3 is the speech speed calculation device according to claim 2, further comprising a voice section detection unit that detects voice sections in the input speech signal, wherein the first speech speed calculation unit and the second speech speed calculation unit each calculate the number of peaks per unit time only for sections that the voice section detection unit has judged to be voice sections.

The invention of claim 4 is the speech speed calculation device according to claim 2, wherein the first peak detection unit and the second peak detection unit do not detect a local maximum of the envelope as a peak when the amount of change from that local maximum to a local minimum is smaller than a predetermined set value.

According to the invention of claim 1 or 2, changes in speech such as changes in its frequency characteristics can be extracted from the envelope of the change amount of the linear prediction coefficients. On the other hand, the change amount of the linear prediction coefficients varies little for signals of small amplitude, so changes in speech features sometimes cannot be extracted from it. By combining it with the envelope of the speech absolute value, which can capture changes even in speech of low signal level, highly accurate speech speed detection can be achieved.

According to the invention of claim 3, speech speed detection is performed only on the voice sections of the speech signal in which the speaker is actually talking. The influence of environmental noise and the like in non-voice sections, where the speaker is not talking, can therefore be eliminated, and the speech speed can be detected more accurately.

According to the invention of claim 4, only peaks with a large change are detected, so the influence of noise and the like can be reduced and the speech speed can be detected with high accuracy.

A speech speed detection device according to an embodiment of the present invention will be described with reference to the drawings. First, FIG. 1 shows an example of how the speech speed detection device 1 is used. A speech signal input from, for example, a microphone (not shown) is fed to the speech speed detection device 1, where the processing described below is performed and the speaker's speech speed is detected. The detected speech speed is input as a speech speed parameter to the speech expansion rate determination unit 2, which determines the speech expansion rate. The speech signal is also input, in parallel with the speech speed detection device 1, to the speech speed conversion unit 3, which converts the speech speed of the signal based on the speech expansion rate determined by the speech expansion rate determination unit 2. The converted speech signal is then output as sound from the speaker 4. The speech speed detection device 1 may be configured as part of the circuitry of a communication device with a speech speed conversion function, such as a telephone or intercom, or may be modularized as a dedicated IC.

FIG. 2 shows one configuration example of the speech speed detection device 1 according to this embodiment, and FIG. 3 shows another configuration example. The speech speed detection device 1 comprises: a linear prediction coefficient change amount calculation unit 11 that performs linear prediction on the input speech signal and calculates the change amount of the linear prediction coefficients; a first envelope calculation unit 12 that obtains the envelope of the sum of the change amounts of the linear prediction coefficients obtained by the linear prediction coefficient change amount calculation unit 11; a first peak detection unit 13 that detects the peaks of the envelope obtained by the first envelope calculation unit 12; a first speech speed calculation unit 14 that calculates the number of peaks per unit time from the number of peaks detected by the first peak detection unit 13 and outputs the result as a first speech speed parameter; a speech absolute value calculation unit 21 that obtains the absolute value of the input speech signal; a second envelope calculation unit 22 that obtains the envelope of the speech absolute value obtained by the speech absolute value calculation unit 21; a second peak detection unit 23 that detects the peaks of the envelope obtained by the second envelope calculation unit 22; a second speech speed calculation unit 24 that calculates the number of peaks per unit time from the number of peaks detected by the second peak detection unit 23 and outputs the result as a second speech speed parameter; and an overall speech speed calculation unit 10 that calculates the speech speed based on the sum of the first speech speed parameter and the second speech speed parameter, each multiplied by a predetermined contribution rate. The configuration example shown in FIG. 2 further includes a voice section detection unit 20 that detects voice sections in the input speech signal.

The linear prediction coefficient change amount calculation unit 11, first envelope calculation unit 12, first peak detection unit 13, first speech speed calculation unit 14, speech absolute value calculation unit 21, second envelope calculation unit 22, second peak detection unit 23, second speech speed calculation unit 24, overall speech speed calculation unit 10, and voice section detection unit 20 may each be implemented as a separate circuit, or may all be implemented with a shared CPU, ROM, RAM, and the like. Each performs predetermined processing on the digitized speech signal.

As is clear from FIG. 2 or FIG. 3, the speech speed detection device 1 according to this embodiment has two processing chains. The first speech speed calculation chain performs linear prediction analysis on the input speech signal, obtains the envelope of the sum of the change amounts of the resulting prediction coefficients, and outputs the number of peaks per unit time in that envelope as the first speech speed parameter. The second speech speed calculation chain takes the envelope of the absolute value of the speech signal and outputs the number of peaks per unit time in that envelope as the second speech speed parameter. Finally, the speech speed is calculated based on the sum of the first and second speech speed parameters, each multiplied by a predetermined contribution rate.

The linear prediction coefficient change amount calculation unit 11 feeds the speech signal into an FIR filter and obtains the linear prediction coefficients by applying the LMS algorithm or a similar method to the filter coefficients. The sum he(n) of the temporal change amounts of the linear prediction coefficients of an M-th order linear prediction filter at time n is obtained from the following equation, where h_m(n) is the m-th linear prediction coefficient at time n.

[Equation 1: defines he(n) in terms of h_m(n), m = 1, ..., M; not reproduced in this text]
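As a rough illustration of this step, the Python sketch below adapts an FIR predictor with the LMS rule and records how much the coefficients move at each sample. The function name `lpc_change_sum`, the step size `mu`, the default order, and in particular the use of the summed absolute coefficient difference as he(n) are assumptions made for illustration, since the equation itself is not reproduced in this text.

```python
def lpc_change_sum(signal, order=4, mu=0.01):
    """Return a list he where he[n] is the summed coefficient change at sample n.

    Assumption (not confirmed by the source): he(n) = sum_m |h_m(n) - h_m(n-1)|.
    """
    h = [0.0] * order  # predictor coefficients h_1 .. h_M
    he = []
    for n in range(len(signal)):
        # Past samples x(n-1) .. x(n-M); zero before the signal starts.
        past = [signal[n - m] if n - m >= 0 else 0.0 for m in range(1, order + 1)]
        prediction = sum(hm * xm for hm, xm in zip(h, past))
        error = signal[n] - prediction
        # LMS update: h_m <- h_m + mu * e(n) * x(n-m)
        new_h = [hm + mu * error * xm for hm, xm in zip(h, past)]
        he.append(sum(abs(a - b) for a, b in zip(new_h, h)))
        h = new_h
    return he
```

In this LMS form each coefficient moves by `mu * error * x(n-m)`, so the recorded change shrinks with the signal amplitude, which matches the behaviour the description attributes to low-level signals.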

The first envelope calculation unit 12 takes the envelope of the sum of the change amounts of the prediction coefficients. The first peak detection unit 13 detects the envelope peaks as a syllable feature, and the first speech speed calculation unit 14 counts the number of peaks per unit time and outputs that count as the first speech speed parameter. When linear prediction analysis is applied to a stationary signal, the resulting linear prediction coefficients settle to constant values that do not change over time. Therefore, even when the input is a speech signal with stationary noise superimposed on it, the syllable features can still be extracted, and the speech speed can be detected stably in a noisy environment.
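The envelope, peak detection, and peak rate steps can be sketched as follows. The source does not say how the envelope is computed, so the one-pole smoothing filter here is an assumption, as are the function names, the `alpha` parameter, and the choice of one second as the unit time.

```python
def smooth_envelope(values, alpha=0.5):
    """One-pole low-pass smoothing of a non-negative sequence (assumed envelope method)."""
    env, level = [], 0.0
    for v in values:
        level = alpha * level + (1.0 - alpha) * v
        env.append(level)
    return env


def find_peaks(env):
    """Indices of local maxima; a flat top is counted once, at its first sample."""
    return [i for i in range(1, len(env) - 1) if env[i - 1] < env[i] >= env[i + 1]]


def peaks_per_second(env, sample_rate):
    """Number of envelope peaks divided by the duration of the envelope in seconds."""
    duration = len(env) / sample_rate
    return len(find_peaks(env)) / duration
```

For the first chain the input to `smooth_envelope` would be the coefficient-change sum; the same helpers apply unchanged to the absolute-value chain.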

Meanwhile, the speech absolute value calculation unit 21 obtains the absolute value of the input speech signal, and the second envelope calculation unit 22 takes the envelope of that absolute value. The second peak detection unit 23 detects the envelope peaks as a syllable feature, and the second speech speed calculation unit 24 counts the number of peaks per unit time and outputs that count as the second speech speed parameter. The overall speech speed calculation unit 10 calculates the speech speed based on the sum of the first speech speed parameter and the second speech speed parameter, each multiplied by a predetermined contribution rate (for example 3:7, 4:6, 5:5, 6:4, or 7:3).
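A minimal sketch of the weighted combination, assuming the contribution rates are given as a ratio such as 3:7 and normalized so that the two weights sum to one. The normalization and the function name `overall_speech_speed` are assumptions; the source only states that each parameter is multiplied by its contribution rate and the products are added.

```python
def overall_speech_speed(first_param, second_param, ratio=(5, 5)):
    """Combine the two peak rates using a contribution ratio such as (3, 7)."""
    w1, w2 = ratio
    total = w1 + w2
    if total <= 0:
        raise ValueError("contribution ratio must have a positive sum")
    # Weighted sum, normalized so the weights add up to 1 (assumption).
    return (w1 * first_param + w2 * second_param) / total
```

For example, with a first parameter of 4.0 peaks per second, a second parameter of 6.0, and a 3:7 ratio, the result is (3 x 4.0 + 7 x 6.0) / 10 = 5.4.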

FIG. 4(a) shows the time waveform of an input signal. FIG. 4(b) shows an example of the input signal's time waveform and the waveform obtained by applying an envelope to the sum of the temporal change amounts of the linear prediction coefficients. FIG. 4(c) shows an example of the waveform obtained by applying an envelope to the absolute value of the input signal's time waveform.

As described above, changes in speech such as changes in its frequency characteristics can be extracted from the envelope of the change amount of the linear prediction coefficients, but that change amount varies little for signals of small amplitude, so changes in speech features sometimes cannot be extracted. For example, comparing circle A in FIG. 4(b) with circle B in FIG. 4(c) shows that the envelope of the change amount of the linear prediction coefficients reveals a change in the speech that cannot be extracted from the envelope of the signal absolute value. Conversely, looking at circle C in FIG. 4(a), circle D in FIG. 4(b), and circle E in FIG. 4(c), in a portion where the speech amplitude is small (circle C) no change in the speech can be extracted from the envelope of the change amount of the linear prediction coefficients (circle D), but a change can be extracted from the envelope of the signal absolute value (circle E). Using the two envelopes together thus lets each compensate for the weakness of the other, so that highly accurate speech speed detection can be achieved.

In the configuration example shown in FIG. 3, the voice section detection unit 20 detects voice sections in the input speech signal, and the first speech speed calculation unit 14 and the second speech speed calculation unit 24 each calculate the number of peaks per unit time only for sections that the voice section detection unit 20 has judged to be voice sections, that is, only while the voice section flag is on. Because speech speed detection is performed only on the voice sections in which the speaker is actually talking, the influence of environmental noise and the like in non-voice sections, where the speaker is not talking, can be eliminated, and the speech speed can be detected more accurately. The voice section detection unit 20 distinguishes voice sections from non-voice sections and detects the voice sections by, for example, the method described in Japanese Patent Application Laid-Open No. 2005-156886; a detailed description is omitted here.
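The gating by the voice section flag might look like the sketch below. It assumes a per-sample boolean flag and assumes that the unit time is measured over voiced samples only; the source does not state how the time base is handled, and the function name `gated_peak_rate` is hypothetical.

```python
def gated_peak_rate(peak_indices, voice_flags, sample_rate):
    """Peaks per second, counting only peaks and time inside voice sections.

    peak_indices: sample indices of detected envelope peaks
    voice_flags:  one boolean per sample, True while the voice section flag is on
    """
    voiced_samples = sum(1 for flag in voice_flags if flag)
    if voiced_samples == 0:
        return 0.0  # no speech detected, so no rate can be measured
    voiced_peaks = sum(1 for i in peak_indices if voice_flags[i])
    return voiced_peaks / (voiced_samples / sample_rate)
```

Peaks that fall in non-voice sections, such as those caused by background noise, are simply never counted.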

As shown in FIG. 5, the first peak detection unit 13 and the second peak detection unit 23 may be configured so that, when the amount of change from a local maximum of the envelope to a local minimum is smaller than a predetermined set value (threshold), that local maximum is not detected as a peak. In both the envelope of the change amount of the linear prediction coefficients and the envelope of the signal absolute value, local maxima and local minima alternate. For the peaks marked with an x in FIG. 5, the change from the local minimum to the local maximum is relatively large, but the change from the local maximum to the local minimum is very small. Therefore, by comparing the amount of change from the local maximum to the local minimum with the set value and not counting the maximum as a peak when the change is smaller than the set value, only peaks with a large change are detected. This reduces the influence of noise and the like and, as a result, allows the speech speed to be detected with high accuracy.
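The thresholded peak detection can be sketched as below. The function name `significant_peaks`, the parameter `min_drop`, and the treatment of the final sample as the closing minimum for the last maximum are assumptions for illustration.

```python
def significant_peaks(env, min_drop):
    """Indices of local maxima whose drop to the next local minimum is at least min_drop."""
    peaks = []
    candidate = None  # index of the latest local maximum awaiting its minimum
    for i in range(1, len(env) - 1):
        if env[i - 1] < env[i] >= env[i + 1]:
            candidate = i  # local maximum
        elif candidate is not None and env[i - 1] > env[i] <= env[i + 1]:
            # Local minimum: keep the preceding maximum only if the drop is large enough.
            if env[candidate] - env[i] >= min_drop:
                peaks.append(candidate)
            candidate = None
    # Assumption: the final sample stands in for the minimum after the last maximum.
    if candidate is not None and env[candidate] - env[-1] >= min_drop:
        peaks.append(candidate)
    return peaks
```

With the envelope `[0.0, 5.0, 4.8, 6.0, 1.0, 3.0, 0.0]` and `min_drop=1.0`, the maximum at index 1 is rejected because it falls by only 0.2 before the envelope rises again, while the maxima at indices 3 and 5 are kept.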

The speech expansion rate determination unit 2 shown in FIG. 1 determines the speech expansion rate to be used when reproducing the speech signal, based on the speech speed parameter calculated by the overall speech speed calculation unit 10. The speech speed conversion unit 3 converts the speech speed of the speech signal based on the speech expansion rate determined by the speech expansion rate determination unit 2. As the speech speed conversion algorithm, the PICOLA (Pointer Interval Controlled OverLap and Add) algorithm, for example, can be used.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the spirit of the invention. For example, the present invention can of course be used for speech speed detection not only in real-time communication devices such as telephones and intercoms but also in audio playback devices such as IC recorders.

FIG. 1 is a block diagram showing a usage example of a speech speed detection device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing one configuration example of the speech speed detection device according to the embodiment.
FIG. 3 is a block diagram showing another configuration example of the speech speed detection device according to the embodiment.
FIG. 4: (a) shows the time waveform of an input signal, (b) shows an example of the waveform obtained by applying an envelope to the time waveform of the input signal, and (c) shows an example of the waveform obtained by applying an envelope to the time waveform of the input signal and the sum of the temporal change amounts of the linear prediction coefficients.
FIG. 5 is a waveform diagram explaining a modification in which a local maximum of the envelope is not counted as a peak when the change from the local maximum to the local minimum is small.

Explanation of symbols

1 Speech speed detection device
10 Overall speech speed calculation unit
11 Linear prediction coefficient change amount calculation unit
12 First envelope calculation unit
13 First peak detection unit
14 First speech speed calculation unit
20 Voice section detection unit
21 Speech absolute value calculation unit
22 Second envelope calculation unit
23 Second peak detection unit
24 Second speech speed calculation unit

Claims

A speech speed detection device characterized by:
performing linear prediction analysis on an input speech signal, obtaining the envelope of the sum of the change amounts of the resulting prediction coefficients, and taking the number of peaks per unit time in that envelope as a first speech speed parameter;
taking the envelope of the absolute value of the speech signal and taking the number of peaks per unit time in that envelope as a second speech speed parameter; and
calculating a speech speed based on the sum of the first speech speed parameter and the second speech speed parameter, each multiplied by a predetermined contribution rate.

A speech speed calculation device comprising:
a linear prediction coefficient change amount calculation unit that performs linear prediction on an input speech signal and calculates the change amount of the linear prediction coefficients;
a first envelope calculation unit that obtains the envelope of the sum of the change amounts of the linear prediction coefficients obtained by the linear prediction coefficient change amount calculation unit;
a first peak detection unit that detects the peaks of the envelope obtained by the first envelope calculation unit;
a first speech speed calculation unit that calculates the number of peaks per unit time from the number of peaks detected by the first peak detection unit and outputs the result as a first speech speed parameter;
a speech absolute value calculation unit that obtains the absolute value of the input speech signal;
a second envelope calculation unit that obtains the envelope of the speech absolute value obtained by the speech absolute value calculation unit;
a second peak detection unit that detects the peaks of the envelope obtained by the second envelope calculation unit;
a second speech speed calculation unit that calculates the number of peaks per unit time from the number of peaks detected by the second peak detection unit and outputs the result as a second speech speed parameter; and
an overall speech speed calculation unit that calculates a speech speed based on the sum of the first speech speed parameter and the second speech speed parameter, each multiplied by a predetermined contribution rate.

The speech speed calculation device according to claim 2, further comprising a voice section detection unit that detects voice sections in the input speech signal, wherein the first speech speed calculation unit and the second speech speed calculation unit each calculate the number of peaks per unit time only for sections that the voice section detection unit has judged to be voice sections.

The speech speed calculation device according to claim 2, wherein the first peak detection unit and the second peak detection unit do not detect a local maximum of the envelope as a peak when the amount of change from that local maximum to a local minimum is smaller than a predetermined set value.