JP2008107706A - Speech speed conversion apparatus and program - Google Patents


Info

Publication number
JP2008107706A
Authority
JP
Japan
Prior art keywords
frame
speech
speech speed
frames
speed conversion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2006292470A
Other languages
Japanese (ja)
Inventor
Yuji Hisaminato
裕司 久湊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yamaha Corp
Original Assignee
Yamaha Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2006-10-27
Filing date
2006-10-27
Publication date
2008-05-08
Application filed by Yamaha Corp
Priority to JP2006292470A
Publication of JP2008107706A
Pending (current legal status)

Landscapes

  • Telephone Function (AREA)

Abstract

PROBLEM TO BE SOLVED: To prevent the waveform of a voice signal from becoming discontinuous after processing when speech speed is converted by applying time-axis companding to the voice signal, without imposing an excessive processing load on the speech speed conversion apparatus.
SOLUTION: The speech speed conversion apparatus comprises: a calculation means that calculates a value indicating the time rate of change of the logarithm of the volume of the sound represented by the voice signal to be processed; a determination means that, among the plurality of frames constituting the voice signal, determines that speech speed conversion is inhibited for each frame in which the magnitude of the value calculated by the calculation means exceeds a prescribed threshold and for a prescribed number of frames following that frame; and a speech speed conversion means that outputs the frames determined to be inhibited unchanged, while outputting the other frames after inserting or deleting waveform segments so that the specified speech speed is achieved.
COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a technique for applying time-axis companding (compression and expansion along the time axis) to an audio signal.

Various techniques have been proposed for applying time-axis companding to an audio signal in order to adjust the speech speed of the audio it represents. For example, Non-Patent Document 1 discloses an algorithm called PICOLA that performs speech speed conversion as follows: it calculates the autocorrelation of the speech while varying the frame length, treats the frame length giving the highest correlation as the period of the speech, and inserts or deletes waveform segments in units of that period.
Non-Patent Document 1: Naotaka Morita and Fumitada Itakura, "Time-axis expansion and compression of speech using the overlap-add method with pointer movement control (PICOLA) and its evaluation," Proceedings of the Acoustical Society of Japan, pp. 149-150, October 1986.
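To make the PICOLA description concrete, here is a minimal Python sketch of its two core operations under stated assumptions: period estimation by picking the lag whose two adjacent windows correlate best, and deletion or insertion of one period by linear cross-fade (overlap-add). The function names, the normalized cross-correlation measure, the linear fade, and the lag search range are illustrative assumptions, not details taken from this patent or from Non-Patent Document 1.

```python
import math


def estimate_period(x, start, min_lag, max_lag):
    """Pick the lag in [min_lag, max_lag] whose adjacent windows
    x[start:start+lag] and x[start+lag:start+2*lag] are most similar,
    using normalized cross-correlation (an assumed similarity measure)."""
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        a = x[start:start + lag]
        b = x[start + lag:start + 2 * lag]
        if len(b) < lag:  # not enough signal left for this lag
            break
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b)) or 1.0
        if num / den > best_score:
            best_lag, best_score = lag, num / den
    return best_lag


def delete_one_period(x, start, period):
    """Compression: cross-fade period A (fading out) with the following
    period B (fading in) into one period; the signal loses `period` samples."""
    a = x[start:start + period]
    b = x[start + period:start + 2 * period]
    merged = [a[i] * (1 - i / period) + b[i] * (i / period) for i in range(period)]
    return x[:start] + merged + x[start + 2 * period:]


def insert_one_period(x, start, period):
    """Expansion: after A, insert a cross-faded period that starts like B and
    ends like A, so the original B still follows smoothly; adds `period` samples."""
    a = x[start:start + period]
    b = x[start + period:start + 2 * period]
    extra = [b[i] * (1 - i / period) + a[i] * (i / period) for i in range(period)]
    return x[:start + period] + extra + x[start + period:]
```

For a sine wave with a 20-sample period, `estimate_period(x, 0, 10, 30)` picks 20, and deleting one period at the start leaves a signal that matches `x[20:]`.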

However, simply applying the technique disclosed in Non-Patent Document 1 can cause audible problems in portions where the volume changes sharply and in the consonant portions of plosives. Specifically, if a waveform segment is inserted or deleted where the volume changes sharply, the volume contour becomes discontinuous at that point and an audible thump (a "bokotsu" sound) may be heard. Likewise, if a waveform segment is inserted or deleted in the consonant portion of a plosive, the consonant may be heard more than once.

One conceivable way to avoid such waveform discontinuities is to apply speech recognition to the audio signal to be processed, identify in advance the locations where discontinuities are likely to occur (such as plosive consonants), and exclude those locations from time-axis companding. However, speech recognition generally requires substantial hardware resources (in other words, it imposes a high processing load), which makes it difficult to apply on terminal devices with limited processing capability, such as mobile phones.

The present invention has been made in view of the above problems, and its object is to provide a technique that, when speech speed is converted by applying time-axis companding to an audio signal, avoids waveform discontinuities in the processed audio signal without imposing an excessive processing load on the apparatus performing the speech speed conversion.

To solve the above problems, the present invention provides a speech speed conversion apparatus comprising: a determination means that identifies, from the temporal change in the volume of the sound represented by the audio signal to be processed, the frames corresponding to the rising portion and the falling portion of that sound, and determines that each such frame and a predetermined number of frames before and after it are speech speed conversion prohibited frames, for which speech speed conversion is prohibited; and a speech speed conversion means that outputs the frames determined by the determination means to be speech speed conversion prohibited frames unchanged, while outputting the other frames after performing waveform insertion or waveform deletion so that the designated speech speed is achieved.

To solve the above problems, the present invention also provides a program that causes a computer device to execute: a first step of identifying, from the temporal change in the volume of the sound represented by the audio signal to be processed, the frames corresponding to the rising portion and the falling portion of that sound, and determining that each such frame and a predetermined number of frames before and after it are speech speed conversion prohibited frames, for which speech speed conversion is prohibited; and a second step of outputting the frames determined in the first step to be speech speed conversion prohibited frames unchanged, while outputting the other frames after performing waveform insertion or waveform deletion so that the designated speech speed is achieved.

According to the present invention, when speech speed is converted by applying time-axis companding to an audio signal, waveform discontinuities in the processed audio signal can be avoided without imposing an excessive processing load on the apparatus performing the speech speed conversion.

The best mode for carrying out the present invention is described below with reference to the drawings.
(A: Configuration)
FIG. 1 shows an example of the hardware configuration of a speech speed conversion apparatus 10 according to an embodiment of the present invention.
A sound source (not shown) that outputs an analog audio signal S-in(t) is connected to the input terminal CH-in in FIG. 1, and the audio signal output from this sound source is input there.
Although not shown in detail in FIG. 1, the audio signal S-in(t) input to the input terminal CH-in is converted into a digital signal by an A/D conversion circuit (not shown), cut into frames of a predetermined duration (8 milliseconds in this embodiment), and passed to the delay processing circuit 110 and the volume change rate calculation unit 120. Hereinafter, the audio signal after digital conversion and framing is also denoted "S-in(t)".
In the following, it is assumed that an audio signal S-in(t) having the waveform shown in FIG. 2(a) is input to the input terminal CH-in. Although this embodiment describes the case where the audio signal input to the input terminal CH-in is analog, it may of course be digital, in which case no A/D conversion circuit is needed.
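A minimal sketch of the framing step. It assumes non-overlapping frames and an 8 kHz sampling rate, neither of which is stated in the text (only the 8 ms frame duration is); at 8 kHz each frame holds 64 samples. The function name is hypothetical.

```python
def split_into_frames(samples, sample_rate, frame_ms=8):
    """Cut a digitized signal into consecutive, non-overlapping frames of
    frame_ms milliseconds; a trailing partial frame is dropped."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]
```

Dropping the trailing partial frame keeps every frame the same length; a streaming implementation would instead carry the remainder over into the next block of samples.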

The delay processing circuit 110 in FIG. 1 delays the input audio signal S-in(t) by a delay time Δt of about several tens of milliseconds and outputs it to the speech speed conversion unit 140. The delay is applied so that speech speed conversion is executed on the signal as it was Δt earlier, which lets the user perceive the conversion as taking place in near real time. Because Δt is only a few tens of milliseconds, it has almost no audible effect.
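The delay of circuit 110 can be modeled as a first-in, first-out line of frames. In this sketch the class and helper names are hypothetical, and a 40 ms delay is used only as an assumed instance of "several tens of milliseconds"; with 8 ms frames it corresponds to five frames.

```python
from collections import deque


class FrameDelay:
    """Fixed delay line of `delay_frames` frames. Each push returns the frame
    pushed `delay_frames` calls earlier, or None while the line is filling."""

    def __init__(self, delay_frames):
        self._line = deque([None] * delay_frames)

    def push(self, frame):
        self._line.append(frame)
        return self._line.popleft()


def delay_in_frames(delay_ms, frame_ms=8):
    """Number of whole frames needed to cover a delay of delay_ms."""
    return -(-delay_ms // frame_ms)  # ceiling division
```

Rounding the delay up to whole frames keeps the delay line aligned with the 8 ms framing used everywhere else.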

The volume change rate calculation unit 120 in FIG. 1 receives the digitized audio signal S-in(t) frame by frame, calculates for each frame a value ΔP(t) representing the degree of temporal change in the volume of the sound represented by the signal, and passes it to the determination unit 130.
More specifically, the volume change rate calculation unit 120 determines the volume of each frame by computing the envelope of the audio signal S-in(t) received frame by frame (see FIG. 2(b)), then calculates, for each frame, the first derivative (FIG. 2(d)) of the common (base-10) logarithm of the volume (FIG. 2(c)) as ΔP(t), and passes it to the determination unit 130.
The first derivative of the common logarithm of the volume (FIG. 2(c)) is used as ΔP(t) in order to accurately identify the rising portions, where the volume of the sound represented by S-in(t) rises sharply, and the falling portions, where it drops sharply. However, if the rising and falling portions can also be identified using the first derivative of the volume itself, that derivative may of course be used as ΔP(t) instead.
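A sketch of the per-frame calculation in unit 120. It assumes the frame RMS stands in for the envelope (the text does not say how the envelope is computed) and uses the first difference between consecutive frames as the discrete first derivative; the function names and the silence floor are hypothetical. Working in log10 makes ΔP(t) depend on the ratio of consecutive volumes, so a tenfold rise yields about 1.0 whether it starts from 0.01 or from 0.1.

```python
import math


def frame_volume(frame, ref=1.0, floor=1e-12):
    """Per-frame volume: RMS (assumed stand-in for the envelope) normalized
    by a reference level; the floor keeps log10 finite for silent frames."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return max(rms / ref, floor)


def log_volume_slope(frames, ref=1.0):
    """First difference of log10(volume) between consecutive frames.
    The first frame has no predecessor, so its slope is reported as 0."""
    logs = [math.log10(frame_volume(f, ref)) for f in frames]
    if not logs:
        return []
    return [0.0] + [logs[t] - logs[t - 1] for t in range(1, len(logs))]
```

This scale invariance is one way to read the statement that the logarithm helps locate rising and falling portions accurately: a quiet onset and a loud onset with the same relative jump produce the same ΔP(t).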

The volume change rate calculation unit 120 according to this embodiment calculates ΔP(t) for the t-th frame according to the following equation (1):
$$\Delta P(t) = \frac{P(t) - P(t-1)}{\ln(10)\,P(t)} \tag{1}$$
In equation (1), ln() denotes the natural logarithm, P(t) is the volume of the t-th frame normalized by a predetermined reference volume, and P(t−1) is the volume of the preceding frame normalized by the same reference volume. Equation (1) is a first-order difference approximation of the derivative of the common logarithm, because d(log10 P)/dt = (dP/dt) / (ln(10)·P); up to the constant factor 1/ln(10), it is the "change in volume divided by the volume" recited in claim 3.
Although this embodiment uses the first derivative of the common logarithm of the volume as the value ΔP(t) indicating the degree of temporal change in the volume of each frame, and calculates it according to equation (1), other methods may of course be used, for example obtaining ΔP(t) by referring to a table that stores, for each of a plurality of different argument values, the value of the first derivative of the common logarithm at that argument.
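The following sketch evaluates equation (1) and the table-lookup alternative, assuming P(t) > 0. The function names and the table grid are hypothetical. The table stores d(log10 x)/dx = 1/(x·ln 10) at fixed grid points and uses the first grid point at or above P(t); how the patent's table would be indexed is not specified.

```python
import bisect
import math

LN10 = math.log(10)


def delta_p_eq1(p_now, p_prev):
    """Equation (1): (P(t) - P(t-1)) / (ln(10) * P(t)).
    P values are volumes normalized by a reference level and must be > 0."""
    return (p_now - p_prev) / (LN10 * p_now)


def build_dlog10_table(grid):
    """Derivative of log10 at each grid point: d/dx log10(x) = 1 / (x ln 10)."""
    return [1.0 / (LN10 * x) for x in grid]


def delta_p_table(p_now, p_prev, grid, table):
    """Table-lookup variant: take the stored derivative at the first grid
    point >= P(t) (clamped to the last entry) and scale it by the change."""
    i = min(bisect.bisect_left(grid, p_now), len(grid) - 1)
    return table[i] * (p_now - p_prev)
```

For P(t−1) = 1.0 and P(t) = 1.01, equation (1) gives about 0.0043, within about 2×10^-5 of the exact difference log10(1.01) − log10(1.0). The approximation is accurate for small frame-to-frame changes and departs from the exact log difference for large jumps. A lookup table replaces the per-frame division with a multiply, which may suit a low-power processor.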

Although not shown in detail in FIG. 1, the determination unit 130 includes a comparator and a buffer with enough storage capacity to hold the number of frames corresponding to the delay time Δt.
The determination unit 130 receives ΔP(t) from the volume change rate calculation unit 120 and stores the values in the buffer in order of receipt. At the same time, it reads from the buffer the value indicating the degree of volume change at the time Δt earlier (that is, ΔP(t−Δt)), determines whether the magnitude of ΔP(t−Δt) exceeds a predetermined threshold th, and outputs a control signal CS according to the result.
More specifically, the determination unit 130 determines that each frame in which the magnitude of ΔP(t) exceeds the threshold th (the portions delimited by broken lines in FIG. 2(d)), together with a predetermined number of frames before and after it (the hatched portions in FIG. 2(d)), is a frame for which speech speed conversion is prohibited (hereinafter, a "speech speed conversion prohibited frame"), and outputs to the speech speed conversion unit 140, as the control signal CS indicating this, a signal whose value is "1" (hereinafter, the speech speed conversion prohibition signal). For all other frames, the determination unit 130 outputs to the speech speed conversion unit 140 a control signal CS whose value is "0" (hereinafter, the speech speed conversion permission signal). In this way, the speech speed conversion unit 140 is informed which of the frames constituting the audio signal S-in(t) are speech speed conversion prohibited frames.
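A minimal offline sketch of the decision rule in unit 130: every frame whose |ΔP(t)| exceeds th, plus `guard` frames on each side, gets CS = 1, and all others get CS = 0. The text gives no numerical values for the threshold or the number of surrounding frames, so the values in any call are hypothetical, and this version assumes the whole ΔP sequence is available at once.

```python
def prohibition_flags(delta_p, th, guard):
    """Control signal CS per frame: 1 (prohibit) for each frame with
    |delta_p| > th and for `guard` frames on either side, else 0 (permit)."""
    n = len(delta_p)
    cs = [0] * n
    for t, d in enumerate(delta_p):
        if abs(d) > th:
            for k in range(max(0, t - guard), min(n, t + guard + 1)):
                cs[k] = 1
    return cs
```

For example, `prohibition_flags([0, 0, 0, 0.5, 0, 0, 0, 0], 0.2, 2)` returns `[0, 1, 1, 1, 1, 1, 0, 0]`. Using the absolute value makes rising and falling portions trigger the same way.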

The speech speed conversion unit 140 is, for example, a DSP. Following the speech speed conversion algorithm disclosed in Non-Patent Document 1 (that is, PICOLA), it applies to S-in(t−Δt) a process that inserts or deletes as many waveform periods as required by the speech speed the user has specified via an operation unit (not shown). The resulting audio signal S-out(t−Δt) is converted into an analog signal by a D/A conversion circuit (not shown) and output externally via the output terminal CH-out. A sound emitting device (not shown) such as a loudspeaker is connected to the output terminal CH-out and emits sound corresponding to the audio signal processed by the speech speed conversion apparatus 10.
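The text says only that the number of periods inserted or deleted depends on the user-specified speed. One self-consistent bookkeeping for a PICOLA-style converter, derived here rather than taken from the patent or from Non-Patent Document 1, is to copy Lc samples unchanged after each one-period splice of length L so that input consumed divided by output produced equals the rate r. Compression consumes 2L + Lc and produces L + Lc, giving Lc = L(2 − r)/(r − 1) for 1 < r ≤ 2; expansion consumes L + Lc and produces 2L + Lc, giving Lc = L(2r − 1)/(1 − r) for 0.5 ≤ r < 1. The function name is hypothetical.

```python
def copy_length(period, rate):
    """Samples to copy unchanged after one splice so that the long-run ratio
    input_consumed / output_produced equals `rate`.
    rate > 1 speeds speech up (one period deleted per splice);
    rate < 1 slows it down (one period inserted per splice)."""
    if 1.0 < rate <= 2.0:
        # consumed 2L + Lc, produced L + Lc
        return period * (2.0 - rate) / (rate - 1.0)
    if 0.5 <= rate < 1.0:
        # consumed L + Lc, produced 2L + Lc
        return period * (2.0 * rate - 1.0) / (1.0 - rate)
    raise ValueError("rate must be in [0.5, 1) or (1, 2]")
```

At r = 2 every period pair is merged with no copy in between, and at r = 1.5 one period is copied after each splice. Rates outside [0.5, 2] would need more than one splice per cycle, which this sketch does not cover.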

However, the speech speed conversion unit 140 according to this embodiment differs from the technique disclosed in Non-Patent Document 1 in that, when the control signal CS received from the determination unit 130 for the frame being processed is the speech speed conversion prohibition signal (that is, when the frame being processed is a speech speed conversion prohibited frame), it outputs S-in(t−Δt) unchanged as S-out(t−Δt′). The delay of the output signal S-out is written "Δt′" because the speech speed conversion performed up to that point can make the delay differ from Δt.
Consequently, in the speech speed conversion apparatus 10 according to this embodiment, no waveform insertion or deletion is performed where the volume changes greatly (for example, at the rising or falling portion of a sound), so the problem described above for Non-Patent Document 1 does not arise. The consonant portion of a plosive generally shows a large volume change, and the determination unit 130 outputs the speech speed conversion prohibition signal for such portions, so a plosive consonant is very rarely heard more than once. Plosive consonants do not always show a large volume change, but where the volume change is small, any waveform discontinuity that does occur has almost no audible effect.
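The gating performed by unit 140 can be sketched as follows. `time_scale` is a hypothetical callable standing in for the PICOLA processing, and the returned drift list is an illustrative way to see why the output delay becomes Δt′: pass-through frames leave the drift unchanged, while time-scaled frames change it.

```python
def gate_conversion(frames, cs, rate, time_scale):
    """Pass frames whose CS is 1 through unchanged and send the others to
    time_scale(frame, rate). Also return, after each frame, how many more
    input samples have been consumed than output (the drift behind dt')."""
    out, drift = [], []
    consumed = 0
    for frame, flag in zip(frames, cs):
        if flag == 1:
            processed = list(frame)  # prohibited frame: output as-is
        else:
            processed = list(time_scale(frame, rate))
        out.extend(processed)
        consumed += len(frame)
        drift.append(consumed - len(out))
    return out, drift
```

With a placeholder `time_scale` that merely truncates each frame to len/rate samples (for testing only; it is not PICOLA and would damage real audio), four 64-sample frames with CS = [0, 1, 1, 0] and rate 2.0 produce 192 output samples and a drift of [32, 32, 32, 64].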

As described above, the speech speed conversion apparatus 10 according to this embodiment avoids waveform discontinuities in the audio signal at portions with large volume changes when converting speech speed by applying time-axis companding. In addition, the apparatus identifies the portions to be excluded from waveform insertion and deletion on the basis of the rate of change of the volume, without complex processing such as speech recognition, so no excessive load is placed on the speech speed conversion apparatus 10.
Thus, according to this embodiment, when speech speed is converted by applying time-axis companding to an audio signal, waveform discontinuities in the processed audio signal can be avoided without imposing an excessive processing load on the apparatus performing the speech speed conversion.

(B: Modifications)
One embodiment of the present invention has been described above, but the embodiment may of course be modified as described below.
(1) The embodiment above prohibits speech speed conversion for each frame in which the magnitude of ΔP(t) exceeds a predetermined threshold and for a predetermined number of frames before and after it. However, the length of the section in which speech speed conversion is prohibited may differ between the rising portion of a sound (where ΔP(t) is positive) and the falling portion (where ΔP(t) is negative); for example, the prohibited section may be made longer for falling portions.
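Modification (1) can be sketched by giving rising and falling triggers their own guard lengths. Applying the same guard before and after each trigger is an assumption of this sketch, as are the function name and any parameter values; the text only says the prohibited section may be longer for falling portions.

```python
def asymmetric_flags(delta_p, th, guard_rise, guard_fall):
    """CS per frame with separate guard lengths: frames with delta_p > th
    (rising) use guard_rise, frames with delta_p < -th (falling) use
    guard_fall; each guard extends that many frames before and after."""
    n = len(delta_p)
    cs = [0] * n
    for t, d in enumerate(delta_p):
        if d > th:
            g = guard_rise
        elif d < -th:
            g = guard_fall
        else:
            continue
        for k in range(max(0, t - g), min(n, t + g + 1)):
            cs[k] = 1
    return cs
```

With th = 0.2, a rise at frame 2 guarded by one frame and a fall at frame 7 guarded by two frames flag frames 1 to 3 and 5 to 9.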

(2) In the embodiment above, each of the following means is implemented as a hardware module, and the speech speed conversion apparatus according to the present invention is configured by combining these modules: a calculation means (volume change rate calculation unit 120) that calculates, for each frame, a value indicating the time rate of change of the logarithm of the volume of the sound represented by the audio signal to be processed; a determination means (determination unit 130) that, among the plurality of frames constituting the audio signal, determines that speech speed conversion is prohibited for each frame in which the magnitude of the value calculated by the calculation means exceeds a predetermined threshold and for a predetermined number of frames following that frame; and a speech speed conversion means (speech speed conversion unit 140) that outputs the frames determined to be prohibited unchanged, while outputting the other frames after waveform insertion or waveform deletion so that the designated speech speed is achieved. Each of these means may of course be implemented as a software module instead.

Specifically, a program that causes a CPU (Central Processing Unit) to function as the volume change rate calculation means, the determination means, and the speech speed conversion means (for example, a program that causes the CPU to execute the speech speed conversion process shown in FIG. 3) may be installed on a general-purpose computer device such as a personal computer, so that the computer device functions as the speech speed conversion apparatus according to the present invention.
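In a software implementation, a per-frame loop must decide CS before a frame is released, yet the frames preceding a trigger also need flagging. One way to reconcile this, assumed here and suggested by the Δt buffer in the determination unit 130, is to hold frames for a delay of at least `guard` frames. `StreamingJudge` and its parameters are hypothetical, and the sketch follows the embodiment's "before and after" rule rather than the "following frames" wording of the abstract and modification (2).

```python
from collections import deque


class StreamingJudge:
    """Frame-by-frame determination step. Frames are held for `delay` frames
    before release, so that when a frame with |delta_p| > th arrives, the
    `guard` frames before it can still be flagged."""

    def __init__(self, th, guard, delay):
        if delay < guard:
            raise ValueError("delay must be at least guard frames")
        self.th, self.guard, self.delay = th, guard, delay
        self._pending = deque()  # entries are [frame, cs]
        self._hold = 0           # frames still to flag after the last trigger

    def push(self, frame, delta_p):
        """Add one frame; return (frame, cs) for the frame leaving the
        delay line, or None while the line is still filling."""
        if abs(delta_p) > self.th:
            if self.guard:
                for entry in list(self._pending)[-self.guard:]:
                    entry[1] = 1  # frames before the trigger
            cs, self._hold = 1, self.guard
        elif self._hold > 0:
            cs, self._hold = 1, self._hold - 1  # frames after the trigger
        else:
            cs = 0
        self._pending.append([frame, cs])
        if len(self._pending) > self.delay:
            frame_out, cs_out = self._pending.popleft()
            return frame_out, cs_out
        return None

    def flush(self):
        """Release all frames still held at the end of the stream."""
        out = [(f, c) for f, c in self._pending]
        self._pending.clear()
        return out
```

With th = 0.2, guard = 2 and delay = 2, the ΔP sequence [0, 0, 0, 0.5, 0, 0, 0, 0] releases frames 0 to 7 in order with CS = [0, 1, 1, 1, 1, 1, 0, 0], the same result as flagging the whole sequence at once.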

FIG. 1 is a block diagram showing a configuration example of the speech speed conversion apparatus 10 according to an embodiment of the present invention.
FIG. 2 shows the stages of the speech speed conversion process according to the embodiment.
FIG. 3 is a flowchart showing the flow of the speech speed conversion process according to modification (1).

Explanation of reference numerals

10 ... Speech speed conversion apparatus, CH-in ... Input terminal, 110 ... Delay circuit, 120 ... Volume change rate calculation unit, 130 ... Determination unit, 140 ... Speech speed conversion unit, CH-out ... Output terminal.

Claims (4)

1. A speech speed conversion apparatus comprising:
a determination means that identifies, from the temporal change in the volume of the sound represented by an audio signal to be processed, the frames corresponding to the rising portion and the falling portion of the sound, and determines that each such frame and a predetermined number of frames before and after it are speech speed conversion prohibited frames, for which speech speed conversion is prohibited; and
a speech speed conversion means that outputs the frames determined by the determination means to be speech speed conversion prohibited frames unchanged, while outputting the other frames after performing waveform insertion or waveform deletion so that a designated speech speed is achieved.
2. The speech speed conversion apparatus according to claim 1, further comprising a calculation means that calculates, for each frame, a value indicating the time rate of change of the logarithm of the volume of the sound represented by the audio signal,
wherein the determination means determines that, among the plurality of frames constituting the audio signal, each frame in which the magnitude of the value calculated by the calculation means exceeds a predetermined threshold, together with a predetermined number of frames before and after that frame, is a speech speed conversion prohibited frame.
3. The speech speed conversion apparatus according to claim 1, further comprising a calculation means that calculates, for each frame, a value obtained by dividing the temporal change in the volume of the sound represented by the audio signal by that volume,
wherein the determination means determines that, among the plurality of frames constituting the audio signal, each frame in which the magnitude of the value calculated by the calculation means exceeds a predetermined threshold, together with a predetermined number of frames before and after that frame, is a speech speed conversion prohibited frame.
4. A program causing a computer device to execute:
a first step of identifying, from the temporal change in the volume of the sound represented by an audio signal to be processed, the frames corresponding to the rising portion and the falling portion of the sound, and determining that each such frame and a predetermined number of frames before and after it are speech speed conversion prohibited frames, for which speech speed conversion is prohibited; and
a second step of outputting the frames determined in the first step to be speech speed conversion prohibited frames unchanged, while outputting the other frames after performing waveform insertion or waveform deletion so that a designated speech speed is achieved.
JP2006292470A 2006-10-27 2006-10-27 Speech speed conversion apparatus and program Pending JP2008107706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2006292470A JP2008107706A (en) 2006-10-27 2006-10-27 Speech speed conversion apparatus and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2006292470A JP2008107706A (en) 2006-10-27 2006-10-27 Speech speed conversion apparatus and program

Publications (1)

Publication Number Publication Date
JP2008107706A true JP2008107706A (en) 2008-05-08

Family

ID=39441084

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2006292470A Pending JP2008107706A (en) 2006-10-27 2006-10-27 Speech speed conversion apparatus and program

Country Status (1)

Country Link
JP (1) JP2008107706A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63223798A (en) * 1987-03-13 1988-09-19 松下電器産業株式会社 Voice recognition
JPH10288995A (en) * 1997-04-16 1998-10-27 Oki Electric Ind Co Ltd Method for learning hidden markov model
JP2000276200A (en) * 1999-03-26 2000-10-06 Matsushita Electric Works Ltd Voice quality converting system
JP2001184100A (en) * 1999-12-24 2001-07-06 Anritsu Corp Speaking speed converting device
JP2001242900A (en) * 2000-02-25 2001-09-07 Yamaha Corp Sound's time expansion device, method and recoding medium for recording sound's times expansion program
JP2004355015A (en) * 1996-11-20 2004-12-16 Yamaha Corp Device and method to analyze sound signal
WO2006077626A1 (en) * 2005-01-18 2006-07-27 Fujitsu Limited Speech speed changing method, and speech speed changing device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012070655A1 (en) 2010-11-25 2012-05-31 ヤマハ株式会社 Masker sound generation device, storage medium which stores masker sound signal, masker sound player device, and program
US9390703B2 (en) 2010-11-25 2016-07-12 Yamaha Corporation Masking sound generating apparatus, storage medium stored with masking sound signal, masking sound reproducing apparatus, and program

Similar Documents

Publication Publication Date Title
CN110265064B (en) Audio frequency crackle detection method, device and storage medium
JP4587160B2 (en) Signal processing apparatus and method
CN105190746B (en) Method and apparatus for detecting target keyword
US8473282B2 (en) Sound processing device and program
JPH10257596A (en) Speech speed conversion method and its device
JP2011027825A (en) Device, and method for processing sound, and program
US9754606B2 (en) Processing apparatus, processing method, program, computer readable information recording medium and processing system
JP2005292812A (en) Method and device to discriminate voice and noise, method and device to reduce noise, voice and noise discriminating program, noise reducing program, and recording medium for program
KR100806155B1 (en) Method and system for enabling audio speed conversion
US11367457B2 (en) Method for detecting ambient noise to change the playing voice frequency and sound playing device thereof
JP4548953B2 (en) Voice automatic gain control apparatus, voice automatic gain control method, storage medium storing computer program having algorithm for voice automatic gain control, and computer program having algorithm for voice automatic gain control
JP5815435B2 (en) Sound source position determination apparatus, sound source position determination method, program
JP3555490B2 (en) Voice conversion system
CN111243618B (en) Method, device and electronic equipment for determining specific voice fragments in audio
JP2008107706A (en) Speech speed conversion apparatus and program
WO2014099740A1 (en) Histogram based pre-pruning scheme for active hmms
CN113658581B (en) Acoustic model training method, acoustic model processing method, acoustic model training device, acoustic model processing equipment and storage medium
CN111477246A (en) Voice processing method and device and intelligent terminal
CN105632523A (en) Method and device for regulating sound volume output value of audio data, and terminal
CN112669872B (en) Audio data gain method and device
WO2017085815A1 (en) Perplexed state determination system, perplexed state determination method, and program
US8306828B2 (en) Method and apparatus for audio signal expansion and compression
CN112786047A (en) Voice processing method, device, equipment, storage medium and intelligent sound box
JP5585432B2 (en) Acoustic signal processing apparatus and method, and program
JP2008139573A (en) Vocal quality conversion method, vocal quality conversion program and vocal quality conversion device

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20090820

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20110601

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20110614

RD13 Notification of appointment of power of sub attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7433

Effective date: 20110622

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A821

Effective date: 20110624

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20110810

A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20120508