JP4461985B2

JP4461985B2 - Speech waveform expansion device, waveform expansion method, speech waveform reduction device, waveform reduction method, program, and speech processing device

Info

Publication number: JP4461985B2
Application number: JP2004281430A
Authority: JP
Inventors: 博康井手
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2004-09-28
Filing date: 2004-09-28
Publication date: 2010-05-12
Anticipated expiration: 2024-09-28
Also published as: JP2006098477A

Abstract

PROBLEM TO BE SOLVED: To reduce or expand speech by an arbitrary magnification while limiting degradation of the speech waveform.

SOLUTION: A speech processing device 100 divides an input speech signal into pitch waveforms. For each pitch waveform, the similarity to the pitch waveforms immediately before and after it is calculated, and a new pitch waveform is generated from that pitch waveform and whichever neighbor is more similar. For expansion, the generated pitch waveform is inserted between the two source pitch waveforms; for reduction, it replaces the two source pitch waveforms. Within each processing unit, pitch waveforms are processed in decreasing order of similarity.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a speech waveform expansion device, a waveform expansion method, a speech waveform reduction device, a waveform reduction method, a program, and a speech processing device. In particular, it relates to a speech waveform expansion device, waveform expansion method, and program that expand an input speech waveform in time and output it; a speech waveform reduction device, waveform reduction method, and program that reduce an input speech waveform in time and output it; and a speech processing device that expands or reduces an input speech waveform in time and outputs it.

One process for transforming speech data is the TDHS (Time Domain Harmonic Scaling) method, which reduces or expands the length of a speech waveform in time by a factor of m/n (m and n are natural numbers) (see, for example, Patent Document 1). FIG. 7 illustrates the principle of the TDHS method.

As shown in the figure, the TDHS method takes a partial speech waveform of length mT starting at position 0, where the current processing begins (T is the length of one period of the repeating waveform), and a partial speech waveform of length mT starting at position (n - m)T relative to the current position. The two are combined by weighted addition, and the resulting partial speech waveform (length mT) replaces the portion from 0 to nT. Each replacement of a partial speech waveform therefore changes the length by (m - n)T. Repeating this reduces or expands the speech waveform as a whole by a factor of m/n (m and n are natural numbers) in time.

In this case, of the two partial speech waveforms being added, the one earlier in time is multiplied by the weight W(k) and the one later in time by the weight 1 - W(k). The value of W(k) changes linearly from 0 to 1 from the first sample position to the last sample position of the partial speech waveform. Using the weighting coefficients W(k) and 1 - W(k) allows the waveform to be reduced or expanded while preserving continuity.

Patent Document 1: JP-A-8-146993 (pages 3-5, FIGS. 12-15)
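
The weighted addition described above for the TDHS method can be sketched as follows. This is only an illustration of the weighting stated in the text (W(k) on the earlier segment, 1 - W(k) on the later one, with W(k) rising linearly from 0 to 1); the function name `tdhs_weighted_add` and the use of plain Python lists are assumptions made for the example.

```python
def tdhs_weighted_add(earlier, later):
    """Blend two equal-length segments sample by sample.

    The earlier segment is weighted by W(k) and the later segment by
    1 - W(k), where W(k) rises linearly from 0 (first sample) to 1
    (last sample), as stated in the background description.
    """
    if len(earlier) != len(later):
        raise ValueError("segments must have the same length")
    n = len(earlier)
    out = []
    for k in range(n):
        # Hypothetical handling of a one-sample segment: use W = 0.
        w = k / (n - 1) if n > 1 else 0.0
        out.append(w * earlier[k] + (1.0 - w) * later[k])
    return out


blended = tdhs_weighted_add([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

Because the two weights always sum to 1, the blended segment stays within the range spanned by the two inputs at every sample.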

With the TDHS method described above, a signal waveform can be reduced or expanded only by a rational factor (m/n).

The present invention has been made in view of the above problem. Its object is to provide a speech waveform expansion device, waveform expansion method, and program capable of expanding a signal waveform by an arbitrary magnification; a speech waveform reduction device, waveform reduction method, and program capable of reducing a signal waveform by an arbitrary magnification; and a speech processing device capable of expanding or reducing a signal waveform by an arbitrary magnification.

A speech waveform expansion device according to a first aspect of the present invention is
a speech waveform expansion device that expands an input waveform on the time axis and outputs it, comprising:
input waveform receiving means for receiving data representing the input waveform;
expansion magnification receiving means for receiving input of a magnification by which the input waveform is to be expanded;
cutting means for cutting out pitch waveforms from the input waveform received by the input waveform receiving means;
similarity calculation means for calculating, for each pitch waveform cut out by the cutting means, the similarity between that pitch waveform and one of the pitch waveforms adjacent to it before and after on the time axis;
ordering means for assigning a processing order to each pitch waveform using the similarity calculated by the similarity calculation means as a measure;
expanded waveform generation means for selecting a pitch waveform in the processing order assigned by the ordering means and generating a waveform for insertion by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it; and
expanded waveform connection means for inserting the waveform generated by the expanded waveform generation means between the two pitch waveforms on the input waveform that were subjected to the weighted addition,
wherein the processing by the expanded waveform generation means and the expanded waveform connection means is repeated, while updating the pitch waveform selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

According to this invention, pitch waveforms are selected in descending order of waveform similarity, that is, starting from the portions where similar waveforms repeat, and the expanded waveform is generated from them. The input speech can therefore be expanded to any specified magnification and output while limiting degradation of sound quality.
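
The expansion procedure of the first aspect can be sketched roughly as follows. This is a simplified illustration, not the claimed implementation: it assumes mean squared error as the (inverse) similarity measure, uses equal weights of 0.5 as a placeholder for the weighted addition (the actual weights are not given in this passage), treats each adjacent pair as one candidate, and stops at the first whole inserted waveform that reaches the target length. The names `mse`, `average`, and `expand` are hypothetical.

```python
def mse(a, b):
    """Mean squared error over the overlapping samples (lower = more similar)."""
    n = min(len(a), len(b))
    return sum((x - y) ** 2 for x, y in zip(a, b)) / n


def average(a, b):
    """Placeholder weighted addition with equal weights (an assumption)."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def expand(pitches, magnification):
    """Insert blended waveforms between the most similar adjacent pairs first."""
    total = sum(len(p) for p in pitches)
    target = magnification * total
    # Processing order: most similar adjacent pair first.
    order = sorted(range(len(pitches) - 1),
                   key=lambda i: mse(pitches[i], pitches[i + 1]))
    inserts = {}
    length = total
    for i in order:
        if length >= target:
            break
        inserts[i] = average(pitches[i], pitches[i + 1])
        length += len(inserts[i])
    # Rebuild the waveform, placing each new waveform between its two sources.
    out = []
    for i, p in enumerate(pitches):
        out.extend(p)
        if i in inserts:
            out.extend(inserts[i])
    return out


expanded = expand([[0.0, 1.0], [0.0, 1.0], [4.0, 8.0]], 1.3)
```

In this small example only the first pair, which is identical, receives an inserted waveform; the dissimilar second pair is left untouched because the target length has already been reached.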

In the above speech waveform expansion device,
the cutting means preferably cuts out partial waveforms of a fixed length from the input waveform and cuts out pitch waveforms within each partial waveform.
In this case, the ordering means orders the pitch waveforms for processing within each fixed-length partial waveform.

According to this invention, when the pitch changes partway through the input, the input waveform can be reduced or expanded while following that change.

A waveform expansion method according to a second aspect of the present invention comprises:
an input waveform receiving step of receiving data representing an input waveform;
an expansion magnification receiving step of receiving input of a magnification by which the input waveform is to be expanded;
a cutting step of cutting out pitch waveforms from the input waveform received in the input waveform receiving step;
a similarity calculation step of calculating, for each pitch waveform cut out in the cutting step, the similarity between that pitch waveform and one of the pitch waveforms adjacent to it before and after on the time axis;
an ordering step of assigning a processing order to each pitch waveform using the similarity calculated in the similarity calculation step as a measure;
an expanded waveform generation step of selecting a pitch waveform in the processing order assigned in the ordering step and generating a waveform for insertion by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it; and
an expanded waveform connection step of inserting the waveform generated in the expanded waveform generation step between the two pitch waveforms on the input waveform that were subjected to the weighted addition,
wherein the processing of the expanded waveform generation step and the expanded waveform connection step is repeated, while updating the pitch waveform selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

A program according to a third aspect of the present invention causes
a computer used in a speech waveform expansion device that expands an input waveform on the time axis and outputs it to function as:
input waveform receiving means for receiving data representing the input waveform;
expansion magnification receiving means for receiving input of a magnification by which the input waveform is to be expanded;
cutting means for cutting out pitch waveforms from the input waveform received by the input waveform receiving means;
similarity calculation means for calculating, for each pitch waveform cut out by the cutting means, the similarity between that pitch waveform and one of the pitch waveforms adjacent to it before and after on the time axis;
ordering means for assigning a processing order to each pitch waveform using the similarity calculated by the similarity calculation means as a measure;
expanded waveform generation means for selecting a pitch waveform in the processing order assigned by the ordering means and generating a waveform for insertion by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it; and
expanded waveform connection means for inserting the waveform generated by the expanded waveform generation means between the two pitch waveforms on the input waveform that were subjected to the weighted addition,
such that the processing by the expanded waveform generation means and the expanded waveform connection means is repeated, while updating the pitch waveform selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

A speech waveform reduction device according to a fourth aspect of the present invention is
a speech waveform reduction device that reduces an input waveform on the time axis and outputs it, comprising:
input waveform receiving means for receiving data representing the input waveform;
reduction magnification receiving means for receiving input of a magnification by which the input waveform is to be reduced;
cutting means for cutting out pitch waveforms from the input waveform received by the input waveform receiving means;
similarity calculation means for calculating, for each pitch waveform cut out by the cutting means, the similarity between that pitch waveform and one of the pitch waveforms adjacent to it before and after on the time axis;
ordering means for assigning a processing order to each pitch waveform using the similarity calculated by the similarity calculation means as a measure;
reduced waveform generation means for selecting a pitch waveform in the processing order assigned by the ordering means and generating a replacement waveform by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it; and
reduced waveform connection means for replacing, on the input waveform, the two pitch waveforms that were subjected to the weighted addition in the reduced waveform generation means with the waveform generated by the reduced waveform generation means,
wherein the processing by the reduced waveform generation means and the reduced waveform connection means is repeated, while updating the pitch waveform selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

According to this invention, pitch waveforms are selected in descending order of waveform similarity, that is, starting from the portions where similar waveforms repeat, and the reduced waveform is generated from them. The input speech can therefore be reduced to any specified magnification and output while limiting degradation of sound quality.
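
The reduction procedure of the fourth aspect can be sketched in the same spirit. Again this is a simplified illustration under stated assumptions: mean squared error stands in for the similarity measure, equal weights stand in for the weighted addition, and a pair is skipped if one of its pitch waveforms has already been replaced (a rule chosen here for simplicity, not taken from the text). The names `mse`, `average`, and `reduce_waveform` are hypothetical.

```python
def mse(a, b):
    """Mean squared error over the overlapping samples (lower = more similar)."""
    n = min(len(a), len(b))
    return sum((x - y) ** 2 for x, y in zip(a, b)) / n


def average(a, b):
    """Placeholder weighted addition with equal weights (an assumption)."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def reduce_waveform(pitches, magnification):
    """Replace the most similar adjacent pairs with one blended waveform each."""
    total = sum(len(p) for p in pitches)
    target = magnification * total
    order = sorted(range(len(pitches) - 1),
                   key=lambda i: mse(pitches[i], pitches[i + 1]))
    merged = {}   # index of the first waveform of a pair -> replacement
    used = set()  # waveforms already consumed by a replacement
    length = total
    for i in order:
        if length <= target:
            break
        if i in used or i + 1 in used:
            continue  # assumption: each source waveform is replaced only once
        new = average(pitches[i], pitches[i + 1])
        merged[i] = new
        used.update((i, i + 1))
        length -= len(pitches[i]) + len(pitches[i + 1]) - len(new)
    # Rebuild the waveform, emitting one replacement per merged pair.
    out = []
    i = 0
    while i < len(pitches):
        if i in merged:
            out.extend(merged[i])
            i += 2
        else:
            out.extend(pitches[i])
            i += 1
    return out


reduced = reduce_waveform([[0.0, 1.0], [0.0, 1.0], [4.0, 8.0], [4.0, 8.0]], 0.75)
```

Since each replacement swaps two pitch waveforms for one, a single pass of this kind cannot shrink the waveform below about half its length.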

In the above speech waveform reduction device,
the cutting means preferably cuts out partial waveforms of a fixed length from the input waveform and cuts out pitch waveforms within each partial waveform.
In this case, the ordering means orders the pitch waveforms for processing within each fixed-length partial waveform.

A waveform reduction method according to a fifth aspect of the present invention comprises:
an input waveform receiving step of receiving data representing an input waveform;
a reduction magnification receiving step of receiving input of a magnification by which the input waveform is to be reduced;
a cutting step of cutting out pitch waveforms from the input waveform received in the input waveform receiving step;
a similarity calculation step of calculating, for each pitch waveform cut out in the cutting step, the similarity between that pitch waveform and one of the pitch waveforms adjacent to it before and after on the time axis;
an ordering step of assigning a processing order to each pitch waveform using the similarity calculated in the similarity calculation step as a measure;
a reduced waveform generation step of selecting a pitch waveform in the processing order assigned in the ordering step and generating a replacement waveform by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it; and
a reduced waveform connection step of replacing, on the input waveform, the two pitch waveforms that were subjected to the weighted addition in the reduced waveform generation step with the waveform generated in the reduced waveform generation step,
wherein the processing of the reduced waveform generation step and the reduced waveform connection step is repeated, while updating the pitch waveform selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

A program according to a sixth aspect of the present invention causes
a computer used in a speech waveform reduction device that reduces an input waveform on the time axis and outputs it to function as:
input waveform receiving means for receiving data representing the input waveform;
reduction magnification receiving means for receiving input of a magnification by which the input waveform is to be reduced;
cutting means for cutting out pitch waveforms from the input waveform received by the input waveform receiving means;
similarity calculation means for calculating, for each pitch waveform cut out by the cutting means, the similarity between that pitch waveform and one of the pitch waveforms adjacent to it before and after on the time axis;
ordering means for assigning a processing order to each pitch waveform using the similarity calculated by the similarity calculation means as a measure;
reduced waveform generation means for selecting a pitch waveform in the processing order assigned by the ordering means and generating a replacement waveform by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it; and
reduced waveform connection means for replacing, on the input waveform, the two pitch waveforms that were subjected to the weighted addition in the reduced waveform generation means with the waveform generated by the reduced waveform generation means,
such that the processing by the reduced waveform generation means and the reduced waveform connection means is repeated, while updating the pitch waveform selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

A speech processing device according to a seventh aspect of the present invention is
a speech processing device that expands or reduces an input waveform on the time axis and outputs it, comprising:
input waveform receiving means for receiving data representing the input waveform;
magnification receiving means for receiving input of a magnification by which the input waveform is to be expanded or reduced;
cutting means for cutting out pitch waveforms from the input waveform received by the input waveform receiving means;
similarity calculation means for calculating, for each pitch waveform cut out by the cutting means, the similarity between that pitch waveform and one of the pitch waveforms adjacent to it before and after on the time axis;
ordering means for assigning a processing order to each pitch waveform using the similarity calculated by the similarity calculation means as a measure;
expanded waveform generation means for selecting a pitch waveform in the processing order assigned by the ordering means and generating a waveform for insertion by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it;
expanded waveform connection means for inserting the waveform generated by the expanded waveform generation means between the two pitch waveforms on the input waveform that were subjected to the weighted addition;
reduced waveform generation means for selecting a pitch waveform in the processing order assigned by the ordering means and generating a replacement waveform by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it;
reduced waveform connection means for replacing, on the input waveform, the two pitch waveforms that were subjected to the weighted addition in the reduced waveform generation means with the waveform generated by the reduced waveform generation means;
magnification determination means for determining whether the input waveform is to be expanded or reduced;
first repeating means for repeating, when the magnification determination means determines that the input waveform is to be expanded, the processing by the expanded waveform generation means and the expanded waveform connection means, while updating the pitch waveform selected according to the processing order, until the waveform length corresponding to the specified magnification is reached; and
second repeating means for repeating, when the magnification determination means determines that the input waveform is to be reduced, the processing by the reduced waveform generation means and the reduced waveform connection means, while updating the pitch waveform selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

According to the present invention, a speech waveform can be reduced or expanded by an arbitrary magnification while limiting degradation of sound quality.

Embodiments of the present invention are described below with reference to the drawings.

(Embodiment 1)
Embodiment 1 first describes the case where the specified magnification is between 1/2 and 2, and then the case where it is between 0 and 1/2, or is 2 or greater. When the specified magnification is greater than 1, the input speech signal waveform is expanded and output; when it is less than 1, the input speech signal waveform is reduced and output.
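
The choice between expansion and reduction stated above amounts to a comparison of the magnification with 1. A minimal sketch follows; the function name `choose_mode`, the returned labels, and the handling of a magnification of exactly 1 (which the text does not address) are assumptions made for illustration.

```python
def choose_mode(magnification):
    """Decide whether the input waveform is to be expanded or reduced."""
    if magnification <= 0:
        raise ValueError("magnification must be positive")
    if magnification > 1:
        return "expand"
    if magnification < 1:
        return "reduce"
    # Not covered by the text: treated here as "leave the waveform as is".
    return "unchanged"


mode = choose_mode(1.5)
```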

FIG. 1 is a block diagram showing the configuration of a speech processing device according to an embodiment of the present invention. As shown in FIG. 1, the speech processing device 100 is, for example, an information processing device such as a computer. An input device 12, an output device 13, and a recording medium 17 are connected to the speech processing device 100. On instruction from the input device 12, the speech processing device 100 expands or reduces the speech waveform data read from the recording medium 17 to the length given by the specified magnification and outputs the result to the recording medium 17.

Here, speech waveform data is sample-value data obtained by sampling and quantizing analog speech at a predetermined sampling frequency (for example, 8 kHz).

The recording medium 17 is, for example, a CD-RW (Compact Disk ReWritable) disc, and stores speech waveform data.

The speech processing device 100 includes a control unit 110, an input control unit 120, an output control unit 130, a program storage unit 140, a storage unit 150, and a data recording unit 170.

The control unit 110 includes, for example, a CPU (Central Processing Unit) and RAM (Random Access Memory). Based on an operation program stored in advance in the program storage unit 140, it controls each unit of the speech processing device 100, reads speech waveform data stored on the recording medium 17 via the data recording unit 170, writes expanded or reduced speech waveform data to the recording medium 17, and executes the waveform expansion processing and waveform reduction processing described later.

The control unit 110 performs waveform expansion processing or waveform reduction processing on the speech waveform data temporarily stored in the storage unit 150 and stores the expanded or reduced speech waveform data in the storage unit 150. In waveform expansion processing, the control unit 110 divides the speech waveform data into parts, one per repetition unit (hereinafter, pitch waveforms), generates a speech waveform from each pitch waveform and one of the pitch waveforms immediately before or after it so that the specified magnification is obtained, and inserts the generated waveform between the two pitch waveforms from which it was generated. In waveform reduction processing, the control unit 110 divides the speech waveform data into pitch waveforms, generates a speech waveform from each pitch waveform and one of the pitch waveforms immediately before or after it so that the specified magnification is obtained, and replaces the section of the speech waveform that begins and ends with the two source pitch waveforms with the generated waveform.

When generating a pitch waveform, the control unit 110 determines which of the preceding and following pitch waveforms correlates more strongly with the pitch waveform of interest. A high correlation means that the two pitch waveforms are similar. The more similar the source pitch waveforms, the less the resulting expanded (or reduced) pitch waveform is degraded.

By reading and executing the operation program and other data stored in advance in the program storage unit 140, the control unit 110 implements, for example, the dividing unit 111, pitch extraction unit 112, pitch selection unit 113, and waveform reduction/expansion unit 114 shown in FIG. 2.

The dividing unit 111 divides the input speech signal into speech frames of a fixed duration (M samples) and sends them to the pitch extraction unit 112 and the waveform reduction/expansion unit 114. The speech frame must be sufficiently long compared with the pitch waveform to be identified, so that pitch waveforms can be identified and cut out within a single speech frame. Empirically, a speech frame needs to be at least about 3.4 to 4 times as long as the pitch waveforms it contains.

The pitch extraction unit 112 identifies the pitch waveforms present in a speech frame. For example, with the sample values in the speech frame written as {s_0, s_1, ..., s_(M-1)}, the formula of Equation 1 is evaluated for t from t_start to t_end, and the t that minimizes e_t is taken as the length of the pitch waveform (hereinafter, N). Here, t_start and t_end can be changed, according to the speech signal being reduced or expanded, to any values for which it is reasonable that the pitch length lies within that range. For example, when human speech is the target, t_start is set to the number of samples corresponding to about 400 Hz (sampling rate / 400) and t_end to the number of samples corresponding to about 50 Hz (sampling rate / 50).
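
The search over candidate lengths described above can be sketched as follows. Equation 1 itself is not reproduced in this text, so the error measure used here (the mean squared difference between the frame and a copy of itself shifted by t samples) is an assumption standing in for e_t; the function name `estimate_pitch_length` and its parameters are likewise hypothetical. Only the search range (sampling rate / 400 to sampling rate / 50) and the rule "take the t with the smallest error" come from the text.

```python
def estimate_pitch_length(frame, sample_rate, f_high=400, f_low=50):
    """Return the lag t in [t_start, t_end] with the smallest error e_t."""
    t_start = sample_rate // f_high                     # e.g. 8000 // 400 = 20
    t_end = min(sample_rate // f_low, len(frame) - 1)   # e.g. 8000 // 50 = 160
    best_t, best_e = None, float("inf")
    for t in range(t_start, t_end + 1):
        n = len(frame) - t
        # Assumed error measure: mean squared difference at lag t.
        e = sum((frame[i] - frame[i + t]) ** 2 for i in range(n)) / n
        if e < best_e:  # strict '<' keeps the shortest lag on ties
            best_t, best_e = t, e
    return best_t


# A sawtooth that repeats every 40 samples, sampled at 8 kHz.
frame = [i % 40 for i in range(400)]
N = estimate_pitch_length(frame, 8000)
```

For a perfectly periodic frame the error is also zero at multiples of the true period, so the sketch keeps the shortest lag on ties; real speech would need a more careful rule.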

As another example of a pitch extraction method, the pitch length can be estimated from the sample points at which the sample value takes a local minimum or local maximum, or at which the sign of the sample value changes, and the pitch waveforms cut out accordingly. For example, if the sample points taking local minima are located at positions 6, 12, 35, 42, 66, 72, 95, 102, 126, ... from the start of the speech frame, the pitch extraction unit 112 examines the pattern of the differences between them and estimates the pitch waveform length to be 60.
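
The example positions above repeat with a spacing of 60 (6 and 66, 12 and 72, 35 and 95, and so on). One simple way to recover that spacing, shown here as an illustrative stand-in rather than the method the pitch extraction unit 112 actually uses, is to count every pairwise distance between the marked positions and take the most frequent one. The function name `estimate_period_from_marks` is hypothetical.

```python
from collections import Counter


def estimate_period_from_marks(marks):
    """Return the most frequent distance between any two marked positions."""
    if len(marks) < 2:
        raise ValueError("need at least two marked positions")
    distances = Counter(
        later - earlier
        for i, earlier in enumerate(marks)
        for later in marks[i + 1:]
    )
    return distances.most_common(1)[0][0]


# Positions of the local minima listed in the text (the nine values given).
minima = [6, 12, 35, 42, 66, 72, 95, 102, 126]
period = estimate_period_from_marks(minima)
```

With these nine positions the distance 60 occurs five times, more often than any other distance, which matches the estimate stated in the text.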

When pitch waveforms are cut out of an audio frame according to the length calculated by the pitch extraction unit 112, a portion shorter than one pitch waveform remains at the end of the frame. The pitch extraction unit 112 carries this remainder over to the next pass: at that time it extracts pitch waveforms from the concatenation of the unprocessed remainder and the next audio frame. When the last audio frame is being processed, the remainder is output as it is.
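The carry-over of the unprocessed tail can be sketched as below. This is a minimal sketch that assumes a single pitch length per pass; the function name and the list-based representation are illustrative choices, not part of the source.

```python
def cut_pitch_waveforms(leftover, frame, pitch_len):
    """Split leftover + frame into whole pitch waveforms plus a new leftover."""
    data = list(leftover) + list(frame)
    count = len(data) // pitch_len
    waveforms = [data[i * pitch_len:(i + 1) * pitch_len] for i in range(count)]
    remainder = data[count * pitch_len:]  # shorter than one pitch waveform
    return waveforms, remainder


# First pass: 10 samples, pitch length 4, so 2 samples are carried over.
waves1, rest1 = cut_pitch_waveforms([], range(10), 4)
# Second pass: the 2 carried samples are prepended to the next frame.
waves2, rest2 = cut_pitch_waveforms(rest1, range(10, 16), 4)
```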

The pitch extraction unit 112 extracts pitch waveforms frame by frame in this way to cope with the possibility that the pitch length of the input waveform changes partway through.

The pitch selection unit 113 orders the pitch waveforms of the audio data, which the pitch extraction unit 112 has divided into pitch waveform units, according to a predetermined criterion. The criterion is that sound quality is unlikely to deteriorate even if the pitch waveform is reduced or expanded. A specific example is ascending order of the mean squared error given by Equation 2. Here, S_x is the sample value at the x-th position from the head of the audio frame, Pl_n is the number of samples in the n-th pitch waveform from the head, and p_n is the position, counted from the head of the audio frame, of the first sample of the n-th pitch waveform.

The waveform reduction/expansion unit 114 generates a new pitch waveform from consecutive pitch waveforms and, depending on whether the operation is reduction or expansion, either replaces existing pitch waveforms with it or inserts it. The new pitch waveform is generated from the pitch waveform of interest and one of its two neighbors, either the preceding or the following pitch waveform. The neighbor is chosen as follows.

Let the sample sequence of the pitch waveform of interest be {x_k, x_{k+1}, ..., x_{k+N−1}}, that of the preceding pitch waveform be {x_{k−N}, x_{k−N+1}, ..., x_{k−1}}, and that of the following pitch waveform be {x_{k+N}, x_{k+N+1}, ..., x_{k+2N−1}}. The correlation coefficient c_a between the pitch waveform of interest and the preceding one is obtained from Equation 3, and the correlation coefficient c_b between the pitch waveform of interest and the following one is obtained from Equation 4. The control unit 110 judges that the neighbor corresponding to the larger of c_a and c_b is the more strongly correlated one, and selects it.
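The neighbor selection can be illustrated with the sketch below. Equations 3 and 4 are not reproduced in this passage, so the Pearson correlation coefficient is used as an assumed stand-in. The function names and the tie-breaking rule (the following waveform wins a tie) are also assumptions.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length sequences (assumed stand-in)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def choose_neighbor(x, k, n):
    """Return ('previous' or 'next', c_a, c_b) for the waveform x[k:k+n]."""
    current = x[k:k + n]
    c_a = correlation(current, x[k - n:k])          # with preceding waveform
    c_b = correlation(current, x[k + n:k + 2 * n])  # with following waveform
    side = "previous" if c_a > c_b else "next"      # tie-break is arbitrary
    return side, c_a, c_b


signal = [3, 2, 1, 0] + [0, 1, 2, 3] + [0, 2, 4, 6]
side, c_a, c_b = choose_neighbor(signal, 4, 4)
```

In this toy signal the preceding waveform is the mirror image of the current one and the following waveform is a scaled copy, so the following waveform is selected.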

Equations 3 and 4 take the cross-correlation of two pitch waveforms, but both waveforms are taken from the same speech waveform data. In effect, therefore, Equations 3 and 4 compute the autocorrelation of the speech waveform data.

For waveform reduction, the waveform reduction/expansion unit 114 generates a pitch waveform from the pitch waveform of interest and whichever neighbor has the higher correlation coefficient with it. The generation procedure is described later. The unit then replaces both the pitch waveform of interest and that neighbor with the generated pitch waveform.

For waveform expansion, the waveform reduction/expansion unit 114 likewise generates a pitch waveform from the pitch waveform of interest and whichever neighbor has the higher correlation coefficient with it. The generation procedure is described later. The unit then inserts the generated pitch waveform between the pitch waveform of interest and that neighbor.

Returning to FIG. 1, the input control unit 120 is connected to an input device 12 such as a keyboard or pointing device; it accepts instructions for the control unit 110 entered from the input device 12 and passes them to the control unit 110.

The output control unit 130 is connected to an output device 13 such as a display or speaker, and outputs the processing results of the control unit 110 to the output device 13 as needed.

The program storage unit 140 consists of a ROM (Read Only Memory) or the like and stores the operation program executed by the control unit 110.

The storage unit 150 consists of a storage device such as a hard disk drive or RAM (Random Access Memory), and temporarily stores the speech waveform data sent from the data recording unit 170 as well as the speech waveform data resulting from waveform expansion or waveform reduction. The storage unit 150 sends the temporarily stored speech waveform data to the data recording unit 170 or the control unit 110.

The data recording unit 170 is, for example, a CD-RW drive. Following instructions from the control unit 110, it reads the speech waveform data stored on the recording medium 17 and writes the expanded or reduced speech waveform data to the recording medium 17.

The waveform reduction/expansion processing is described below with reference to the drawings. FIG. 3 is a flowchart of this processing. In the following description, the reduction/expansion ratio a is assumed to lie between 1/2 and 2.

First, the control unit 110 calculates the target reduction/expansion length Fa from the specified reduction/expansion ratio a (step S101). Fa is given by Equation 5.
(Equation 5)
$$\mathit{Fa} = F \times a - F$$
Here, F is the length of an audio frame (number of samples). Fa is negative when the input waveform is reduced and positive when it is expanded.
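As a quick arithmetic check of Equation 5, the snippet below evaluates Fa for an assumed frame length of 1000 samples. The frame length and the function name are illustrative values chosen here, not figures from the source.

```python
def target_length_change(frame_len, ratio):
    """Equation 5: Fa = F * a - F (positive for expansion, negative for reduction)."""
    return frame_len * ratio - frame_len


fa_expand = target_length_change(1000, 1.5)  # 500 samples to be added
fa_reduce = target_length_change(1000, 0.5)  # 500 samples to be removed
```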

Next, the dividing unit 111 in the control unit 110 divides the input waveform into audio frames of length F (step S102). The first audio frame becomes the frame of interest.

The control unit 110 determines whether any input waveform remains to be processed (step S103). If none remains and the entire input waveform has been reduced or expanded (step S103: NO), the waveform reduction/expansion processing ends. If input waveform remains (step S103: YES), the control unit 110 repeats steps S104 to S112 below until no input waveform remains.

In step S104, the pitch extraction unit 112 cuts pitch waveforms out of the audio frame by the method described above and passes them to the pitch selection unit 113. The pitch selection unit 113 calculates the similarity of each pitch waveform (its correlation coefficient with the adjacent pitch waveforms) (step S105; the formula is Equation 2). It then determines the processing order of the pitch waveforms so that waveforms with higher similarity, as calculated in step S105, are reduced or expanded first (step S106).

The waveform reduction/expansion unit 114 reduces or expands pitch waveforms in the processing order determined by the pitch selection unit 113. It first selects the next pitch waveform in that order (step S107), and then determines whether the cumulative amount processed within the current audio frame exceeds the absolute value of Fa (step S108).

If it determines that the cumulative amount processed within the audio frame (the total length of the speech waveforms generated by expansion or reduction) exceeds the absolute value of Fa (step S108: YES), the waveform reduction/expansion unit 114 carries the difference between the cumulative amount and |Fa| over to the next pass (step S109). That is, when the second audio frame is processed, step S108 determines whether the cumulative amount processed within that frame has exceeded 2|Fa| − (the cumulative amount processed for the previous frame). From then on, the cumulative amount processed so far affects the range processed next, so the value against which the current cumulative amount is compared in step S108 varies. When step S109 finishes, processing moves to step S103 and the next audio frame is processed.
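The carry-over in step S109 can be illustrated by computing the threshold used in step S108 for successive frames. The generalization to the n-th frame (n|Fa| minus everything processed so far) is an assumption extrapolated from the two-frame case stated in the text. The function name and sample figures are hypothetical.

```python
def frame_thresholds(fa_abs, processed_per_frame):
    """Return the step S108 threshold that applies to each frame in turn.

    processed_per_frame[i] is the amount actually processed in frame i.
    ASSUMPTION: frame n is compared against n * |Fa| minus the total amount
    processed in all earlier frames.
    """
    thresholds = []
    total = 0
    for n, processed in enumerate(processed_per_frame, start=1):
        thresholds.append(n * fa_abs - total)
        total += processed
    return thresholds


# |Fa| = 100; the first frame overshoots by 20, so the second needs only 80.
limits = frame_thresholds(100, [120, 90, 95])
```

Because whole pitch waveforms are processed, each frame tends to overshoot its target slightly; carrying the difference forward keeps the overall ratio close to the specified value.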

If it determines that the cumulative amount processed within the audio frame does not exceed the absolute value of Fa (step S108: NO), the waveform reduction/expansion unit 114 determines whether the specified ratio a is 1 or more (step S110). If a is 1 or more (step S110: YES), it performs expansion processing on the pitch waveform (step S111); if a is less than 1 (step S110: NO), it performs reduction processing on the pitch waveform (step S112). The control unit 110 then returns to step S107.

The expansion processing of step S111 is described with reference to the flowchart of FIG. 4.

In what follows, the sample sequence of the pitch waveform of interest (the one designated for expansion) is {x_k, x_{k+1}, ..., x_{k+N−1}}, that of the preceding pitch waveform is {x_{k−N}, x_{k−N+1}, ..., x_{k−1}}, and that of the following pitch waveform is {x_{k+N}, x_{k+N+1}, ..., x_{k+2N−1}}.

First, the waveform reduction/expansion unit 114 calculates the correlation coefficient c_a between the pitch waveform of interest and the preceding pitch waveform using Equation 3, and the correlation coefficient c_b between the pitch waveform of interest and the following pitch waveform using Equation 4 (FIG. 4: step S301).

The waveform reduction/expansion unit 114 then compares c_a and c_b calculated in step S301 and determines which neighbor correlates more strongly with the pitch waveform of interest (step S302).

When the past-side pitch waveform correlates more strongly than the future-side one (step S302: past side (preceding)), the waveform reduction/expansion unit 114 generates a pitch waveform according to Equation 6 (step S303).
(Equation 6)
$$s_i = \frac{i}{N-1}\,x_{k-N+i} + \frac{N-1-i}{N-1}\,x_{k+i} \qquad (i = 0, 1, \dots, N-1)$$

Equation 6 is a weighted sum of the corresponding samples of the past-side pitch waveform and the pitch waveform of interest. The weight of the past-side pitch waveform, i/(N−1), starts at 0 and ends at 1. The weight of the pitch waveform of interest, (N−1−i)/(N−1), starts at 1 and ends at 0.

Next, the waveform reduction/expansion unit 114 connects (inserts) the generated pitch waveform between the preceding pitch waveform and the pitch waveform of interest (step S304), ends the expansion processing, and proceeds to step S107.

The speech waveform obtained by these steps has the sample values {..., x_{k−1}, s_0, s_1, ..., s_{N−1}, x_k, x_{k+1}, ..., x_{k+N−1}, ...}.

On the other hand, when the future-side pitch waveform correlates more strongly than the past-side one (FIG. 4: step S302: future side (following)), the waveform reduction/expansion unit 114 generates a pitch waveform according to Equation 7 (step S305).
(Equation 7)
$$s_i = \frac{i}{N-1}\,x_{k+i} + \frac{N-1-i}{N-1}\,x_{k+N+i} \qquad (i = 0, 1, \dots, N-1)$$

Next, the waveform reduction/expansion unit 114 connects (inserts) the generated pitch waveform between the pitch waveform of interest and the following pitch waveform (step S306), and ends the expansion processing.

The speech waveform obtained by these steps has the sample values {..., x_k, x_{k+1}, ..., x_{k+N−1}, s_0, s_1, ..., s_{N−1}, x_{k+N}, ...}.
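The two expansion cases (Equations 6 and 7, with insertion as in steps S304 and S306) can be sketched together as follows. The function names, the list representation, and the toy signal are illustrative assumptions; the sketch requires N of at least 2.

```python
def blend(rising_src, falling_src):
    """Weighted sum: rising_src is weighted i/(N-1), falling_src (N-1-i)/(N-1)."""
    n = len(rising_src)
    return [(i / (n - 1)) * rising_src[i] + ((n - 1 - i) / (n - 1)) * falling_src[i]
            for i in range(n)]


def expand_at(x, k, n, use_previous):
    """Insert one generated pitch waveform next to the waveform x[k:k+n]."""
    if use_previous:
        # Equation 6: preceding waveform fades in, waveform of interest fades out.
        s = blend(x[k - n:k], x[k:k + n])
        return x[:k] + s + x[k:]          # inserted before the waveform of interest
    # Equation 7: waveform of interest fades in, following waveform fades out.
    s = blend(x[k:k + n], x[k + n:k + 2 * n])
    return x[:k + n] + s + x[k + n:]      # inserted after the waveform of interest


signal = [0.0] * 4 + [3.0] * 4 + [6.0] * 4
out_prev = expand_at(signal, 4, 4, use_previous=True)
out_next = expand_at(signal, 4, 4, use_previous=False)
```

In both cases the output is exactly N samples longer than the input, and the first and last generated samples equal the samples that would have followed or preceded the splice point one period away, which is what keeps the junctions continuous for a periodic signal.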

The reduction processing of step S112 is described with reference to the flowchart of FIG. 5.

Steps S401 and S402 are the same as steps S301 and S302 of FIG. 4, respectively, and their description is omitted.

In step S403, as in step S303 of FIG. 4, the waveform reduction/expansion unit 114 performs a weighted addition of the pitch waveform of interest and the preceding pitch waveform, using Equation 8, to generate a new pitch waveform. As a comparison of Equations 6 and 8 shows, the weights are the reverse of those used for expansion. The weight of the past-side pitch waveform, (N−1−i)/(N−1), therefore starts at 1 and ends at 0, and the weight of the pitch waveform of interest, i/(N−1), starts at 0 and ends at 1.
(Equation 8)
$$s_i = \frac{N-1-i}{N-1}\,x_{k-N+i} + \frac{i}{N-1}\,x_{k+i} \qquad (i = 0, 1, \dots, N-1)$$

The waveform reduction/expansion unit 114 then replaces the pitch waveform of interest and the preceding pitch waveform with the generated pitch waveform (step S404). In other words, two consecutive pitch waveforms are replaced by the single generated pitch waveform.

In step S405, as in step S305 of FIG. 4, the waveform reduction/expansion unit 114 performs a weighted addition of the pitch waveform of interest and the following pitch waveform, using Equation 9, to generate a new pitch waveform. It then replaces the pitch waveform of interest and the following pitch waveform with the generated pitch waveform (step S406).
(Equation 9)
$$s_i = \frac{N-1-i}{N-1}\,x_{k+i} + \frac{i}{N-1}\,x_{k+N+i} \qquad (i = 0, 1, \dots, N-1)$$
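The two reduction cases (Equations 8 and 9, with replacement as in steps S404 and S406) can be sketched in the same way. The function name and toy signal are illustrative assumptions; the sketch requires N of at least 2.

```python
def reduce_at(x, k, n, use_previous):
    """Replace two adjacent pitch waveforms by one cross-faded waveform."""
    if use_previous:
        # Equation 8: preceding waveform fades out, waveform of interest fades in.
        s = [((n - 1 - i) / (n - 1)) * x[k - n + i] + (i / (n - 1)) * x[k + i]
             for i in range(n)]
        return x[:k - n] + s + x[k + n:]
    # Equation 9: waveform of interest fades out, following waveform fades in.
    s = [((n - 1 - i) / (n - 1)) * x[k + i] + (i / (n - 1)) * x[k + n + i]
         for i in range(n)]
    return x[:k] + s + x[k + 2 * n:]


signal = [0.0] * 4 + [3.0] * 4 + [6.0] * 4
short_prev = reduce_at(signal, 4, 4, use_previous=True)
short_next = reduce_at(signal, 4, 4, use_previous=False)
```

Each call shortens the signal by exactly N samples. The generated waveform starts on the first sample of the earlier waveform and ends on the last sample of the later one, so it joins the untouched neighbors without a jump.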

With this configuration, a pitch waveform is generated from the pitch waveform of interest and whichever of its past-side and future-side neighbors correlates more strongly with it. For expansion, the generated pitch waveform is inserted immediately before or after the pitch waveform of interest. For reduction, it is put in place of the speech waveforms from which it was generated. This reduces the noise produced when speech waveforms in transitional periods are reproduced. In addition, for expansion, the earlier of the two source pitch waveforms is multiplied by weights that start at 0 and end at 1, and the later one by weights that start at 1 and end at 0 (the weights are reversed for reduction). The generated pitch waveform therefore connects to the pitch waveforms before and after it while preserving the continuity of the waveform.

When the input speech waveform is reduced, some pitch waveforms are replaced through weighted addition with another pitch waveform and so disappear before their turn in the processing order comes. Such pitch waveforms are not processed in step S112.

The weights in Equations 6 to 9 are only one example. Any sequence of N values a_i that starts at 0 and ends at 1 (i/(N−1) in Embodiment 1) and any sequence of N values b_i that starts at 1 and ends at 0 ((N−1−i)/(N−1) in Embodiment 1) may be used, provided that the relationship in Equation 10 holds for every i from 0 to N−1.
(Equation 10)
$$a_i + b_i = 1$$
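As an illustration of an alternative weight pair that satisfies Equation 10, the sketch below builds raised-cosine weights next to the linear weights of Embodiment 1. The raised-cosine shape is an example chosen here, not one named in the source, and the function names are hypothetical.

```python
import math


def linear_weights(n):
    """Weights used in Embodiment 1: a_i = i/(N-1), b_i = (N-1-i)/(N-1)."""
    a = [i / (n - 1) for i in range(n)]
    b = [(n - 1 - i) / (n - 1) for i in range(n)]
    return a, b


def raised_cosine_weights(n):
    """ASSUMED alternative: a_i rises smoothly from 0 to 1, b_i = 1 - a_i."""
    a = [0.5 * (1.0 - math.cos(math.pi * i / (n - 1))) for i in range(n)]
    b = [1.0 - a_i for a_i in a]
    return a, b


a_lin, b_lin = linear_weights(8)
a_cos, b_cos = raised_cosine_weights(8)
```

Both pairs start and end at the required values and sum to 1 at every index, so either could be substituted into the weighted additions above.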

In the description above, when the remaining section falls to or below the specified multiple, the pitch extraction unit 112 places the remaining section at the head of the next pass. Alternatively, the next pass (pitch waveform detection) may start from the point immediately after the last reduction or expansion operation. Processing is then performed in shorter units than in the scheme above, so the output comes closer to the specified ratio.

Also, the condition for stopping the cutting out of pitch waveforms was based on the largest sample count (the maximum) among the pitch waveforms cut out so far, but the arithmetic mean of the sample counts (hereinafter simply the average) can be used instead. This avoids needlessly deferring samples to the next pass when a pitch waveform length has been misjudged.

The description so far has covered ratios between 1/2 and 2, but the ratio may lie outside this range.

For example, when the ratio lies between n and (n+1) (where n is an integer of 2 or more; the range excludes n but includes (n+1)), the pitch waveforms selected in processing order undergo (n+1)-fold expansion. In that case, in step S302 the waveform reduction/expansion unit 114 selects the waveforms from which the expanded waveform is generated as follows, taking the head position of the pitch waveform of interest as 0 and the pitch waveform length as N.
1) When the past-side correlation is greater than the future-side correlation:
the speech waveform from −nN to 0 and the speech waveform from 0 to nN
2) When the future-side correlation is greater than the past-side correlation:
the speech waveform from (1−n)N to N and the speech waveform from N to (1+n)N

The weights in Equations 6 and 7 then become i/(nN−1) in place of i/(N−1), and (nN−1−i)/(nN−1) in place of (N−1−i)/(N−1). It is also desirable to obtain the correlation coefficient c_a from the speech waveform from −nN to 0 and the speech waveform from 0 to nN, and the correlation coefficient c_b from the speech waveform from (1−n)N to N and the speech waveform from N to (1+n)N.
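The segment boundaries and the lengthened weights for (n+1)-fold expansion can be tabulated with a small helper. Positions are sample offsets relative to the head of the pitch waveform of interest, as in the text; the function names are hypothetical and the half-open (start, end) tuples are a representation chosen here.

```python
def expansion_segments(n, pitch_len, use_previous):
    """Return the two (start, end) source ranges for (n+1)-fold expansion."""
    span = n * pitch_len
    if use_previous:
        return (-span, 0), (0, span)
    return (pitch_len - span, pitch_len), (pitch_len, pitch_len + span)


def long_weights(n, pitch_len):
    """Weights i/(nN-1) and (nN-1-i)/(nN-1) for i = 0 .. nN-1."""
    length = n * pitch_len
    rising = [i / (length - 1) for i in range(length)]
    falling = [(length - 1 - i) / (length - 1) for i in range(length)]
    return rising, falling


past_case = expansion_segments(2, 100, use_previous=True)
future_case = expansion_segments(2, 100, use_previous=False)
rising, falling = long_weights(2, 100)
```

Each source range is nN samples long, which matches the nN weights produced by `long_weights`.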

Similarly, when the ratio lies between 1/(m+1) and 1/m (where m is an integer of 2 or more; the range excludes 1/m but includes 1/(m+1)), the selected pitch waveforms undergo 1/(m+1)-fold reduction. In that case, in step S402 the waveform reduction/expansion unit 114 selects the waveforms from which the reduced waveform is generated as follows, taking the head position of the pitch waveform of interest as 0 and the pitch waveform length as N.
1) When the past-side correlation is greater than the future-side correlation:
the speech waveform from −mN to (−m+1)N and the speech waveform from 0 to N
2) When the future-side correlation is greater than the past-side correlation:
the speech waveform from 0 to N and the speech waveform from mN to (m+1)N
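The source ranges for 1/(m+1)-fold reduction can be tabulated in the same style. As before, positions are sample offsets from the head of the pitch waveform of interest, and the function name and tuple representation are assumptions made for illustration.

```python
def reduction_segments(m, pitch_len, use_previous):
    """Return the two (start, end) source ranges for 1/(m+1)-fold reduction."""
    if use_previous:
        return (-m * pitch_len, (1 - m) * pitch_len), (0, pitch_len)
    return (0, pitch_len), (m * pitch_len, (m + 1) * pitch_len)


past_case = reduction_segments(2, 100, use_previous=True)
future_case = reduction_segments(2, 100, use_previous=False)

# The two ranges together bound a stretch of (m+1) pitch lengths.
covered = past_case[1][1] - past_case[0][0]
```

With m = 2 and N = 100, the two source waveforms bound a stretch of 300 samples; replacing that stretch with a single generated waveform of 100 samples gives the 1/3 ratio.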

Then, in step S404, the generated pitch waveform replaces the speech waveform running from the earlier pitch waveform through the pitch waveform of interest, and in step S406 it replaces the speech waveform running from the pitch waveform of interest through the later pitch waveform.

In this case it is also desirable to obtain the correlation coefficient c_a from the speech waveform from −mN to 0 and the speech waveform from (−m+1)N to N, and the correlation coefficient c_b from the speech waveform from 0 to mN and the speech waveform from N to (m+1)N.

(Embodiment 2)
In Embodiment 1, the similarity of each pitch waveform is obtained and pitch waveforms are expanded or reduced in descending order of similarity. Because similarity is the only measure used, however, pitch waveforms are reduced or expanded regardless of the linguistic importance of the phonemes they belong to. As a result, when human conversation is played back at high speed, for example, pitch waveforms that carry stress (accent) are also reduced, and the conversation becomes hard for a listener to follow. Embodiment 2 therefore describes a speech processing device that determines whether an audio frame contains a stressed portion and, where it does, avoids shortening the playback time of that portion as far as possible.

The configuration of the speech processing device 100 according to this embodiment is the same as in Embodiment 1, so only the differences from Embodiment 1 are described.

When reduction of the audio signal has been specified, the pitch selection unit 113 of this embodiment, in ordering the pitch waveforms into which the audio frame has been divided, additionally determines whether there is a stressed portion and defers the processing of pitch waveforms that contain one.

The pitch selection unit 113 determines whether an audio frame contains a stressed portion by the following procedure.

First, the pitch waveforms are provisionally ordered by the method described in Embodiment 1 (descending order of similarity). Because pitch waveforms closely resembling their neighbors appear in stressed portions, pitch waveforms containing a stressed portion rank high in this ordering. Moreover, because their amplitude is larger, stressed portions carry more wave energy than other portions (pitch waveforms).

The pitch selection unit 113 therefore determines, from how the energy values of the provisionally ordered pitch waveforms change, whether any pitch waveform contains a stressed portion (hereinafter "stress determination processing"). If it determines that there are stressed portions, it lowers their rank to a predetermined position in the order.

The stress determination processing is described with reference to the flowchart of FIG. 6. It is executed after step S106 and before step S107 of FIG. 3.

First, as initialization, the pitch selection unit 113 assigns 1 to the counter variable i (step S601).

Next, it compares i with N/t and determines whether i is smaller than N/t (step S602). If i is smaller than N/t (step S602: YES), the pitch selection unit 113 continues the stress determination processing and moves to step S603. Here N is the number of pitch waveforms handled by the stress determination processing (not the pitch waveform length used earlier), and t is an experimentally determined value, 4 in this embodiment.

Otherwise (step S602: NO), the pitch selection unit 113 determines that the audio frame contains no stress (step S607) and ends the stress determination processing. In this case the order established in step S106 is left unchanged.

ステップＳ６０３では、数１１、１２に示す式により、ｋ_０、ｋ_１を計算する。なお、ｓｗ_ｊは上述のステップＳ１０６で付与された処理順ｊに対応するピッチ波形が有する波のエネルギー値である。ｓｗ_ｊは数１３に示す式により算出する。なお、Ｓ_ｘ、Ｐｌ_ｎ、ｐ_ｎは数２で使用した定義と同一とする。また、数１３でのｎは処理順ｊに対応するピッチ波形が音声フレームの先頭からｎ番目のピッチ波形であったことを意味する。 In step S603, k ₀ and k ₁ are calculated by the equations shown in equations 11 and 12. Note that sw _j is an energy value of a wave included in the pitch waveform corresponding to the processing order j given in step S106 described above. sw _j is calculated by the equation shown in Equation 13. S _x , Pl _n , and _pn are the same as the definitions used in Equation 2. Further, n in Equation 13 means that the pitch waveform corresponding to the processing order j is the nth pitch waveform from the beginning of the audio frame.

If a speech frame contains a stressed portion, the amplitude of that portion is larger than the amplitude of the other portions. Therefore, as long as the frequency hardly changes within the speech frame, the energy of a pitch waveform containing the stressed portion is markedly larger than the energy of the other pitch waveforms that do not contain it. Accordingly, k_0 is compared with b times k_1, where b is a constant (step S604). If k_0 is less than or equal to b times k_1 (step S604: NO), 1 is added to i (step S605) and the process returns to step S602. The value of b is an experimentally determined optimum, for example 1.5.

If k_0 is determined to be larger than b times k_1 (step S604: YES), the pitch selection unit 113 determines that the speech frame contains stress (step S606). It then changes the processing order of the i pitch waveforms included in k_0, that is, the i pitch waveforms taken in descending order of similarity, so that they are reduced later than the other pitch waveforms. The value of i used here is the value of i at the point in step S604 when k_0 became larger than b times k_1. The stress determination process then ends.
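The loop of steps S601 to S607 can be sketched as follows. This is a minimal illustration, not the patented implementation: Equations 11 and 12 are not reproduced in this text, so the sketch assumes that k_0 is the mean energy of the first i pitch waveforms in processing order and k_1 is the mean energy of the remaining ones. The function names `find_stress_count` and `demote_stressed` are hypothetical.

```python
from typing import List, Optional, Sequence


def find_stress_count(sw: Sequence[float], t: int = 4, b: float = 1.5) -> Optional[int]:
    """Return i if stress is detected (step S606), or None otherwise (step S607).

    sw[j - 1] is the energy sw_j of the pitch waveform with processing order j.
    ASSUMPTION: k_0 is the mean of the first i energies and k_1 the mean of
    the rest; the actual Equations 11 and 12 are not available here.
    """
    n = len(sw)
    i = 1                                # S601: initialize the counter
    while i < n / t:                     # S602: continue while i < N/t
        k0 = sum(sw[:i]) / i             # S603 (assumed form of Equation 11)
        k1 = sum(sw[i:]) / (n - i)       # S603 (assumed form of Equation 12)
        if k0 > b * k1:                  # S604: YES
            return i                     # S606: stress found
        i += 1                           # S605
    return None                          # S607: no stress in this frame


def demote_stressed(order: List[int], i: int) -> List[int]:
    """Move the first i entries of the processing order to the very end,
    so that those pitch waveforms are reduced after all the others."""
    return order[i:] + order[:i]
```

With N = 8 and t = 4, for example, the loop only tests i = 1, so a single dominant pitch waveform is either detected at once or the frame is treated as unstressed.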

However, if the processing order of a pitch waveform containing a stressed portion is lowered all the way to last, pitch waveforms that are inherently unsuited to reduction may end up being reduced as well, which may degrade sound quality further. The pitch selection unit 113 can therefore suppress degradation of sound quality by, for example, not lowering the processing order of such a pitch waveform below the processing order given to the pitch waveform whose similarity equals the arithmetic mean of the maximum and minimum similarity values. If no pitch waveform has a similarity equal to this arithmetic mean, the processing order is not lowered below that assigned to the pitch waveform with the largest similarity among those whose similarity is smaller than the arithmetic mean.
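One way to compute this lower limit is sketched below. It assumes that the arithmetic mean is taken over the largest and smallest similarity values in the frame, and that a larger similarity value means an earlier processing order. The function name `demotion_floor` and the zero-based rank convention are hypothetical.

```python
from typing import Sequence


def demotion_floor(similarity_by_rank: Sequence[float]) -> int:
    """Return the latest rank (0 = processed first) to which a stressed
    pitch waveform may be moved.

    similarity_by_rank[r] is the similarity of the waveform at rank r.
    ASSUMPTION: the reference value is the mean of the maximum and minimum
    similarity; the limit is the rank of the waveform whose similarity
    equals that mean or, failing that, of the waveform with the largest
    similarity below the mean.
    """
    mid = (max(similarity_by_rank) + min(similarity_by_rank)) / 2
    candidates = [(s, r) for r, s in enumerate(similarity_by_rank) if s <= mid]
    best = max(s for s, _ in candidates)
    return min(r for s, r in candidates if s == best)


# A stressed waveform that would otherwise go to the last rank is clamped.
floor_rank = demotion_floor([8, 6, 4, 2, 0])
new_rank = min(4, floor_rank)  # 4 is the last rank in a five-waveform frame
```

Because the minimum similarity is never larger than the mean, the candidate list is never empty for a non-empty frame.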

With this configuration, the speech waveform of stressed portions is less likely to be reduced, which improves intelligibility during, for example, high-speed playback. In addition, because the processing order of stressed portions is not lowered below a certain rank, degradation of sound quality during playback can be suppressed.

As described above, the speech processing apparatus 100 of Embodiments 1 and 2 can reduce or expand input speech waveform data at an arbitrary magnification and output the result.
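The two elementary operations behind this, inserting a blended waveform between two adjacent pitch waveforms (expansion) and replacing the two with the blend (reduction), can be sketched as follows. The weights used in the weighted addition are not given in this passage, so the sketch assumes a constant weight w and truncation to the shorter waveform. The names `blend`, `expand_at`, and `reduce_at` are hypothetical.

```python
from typing import List

Waveform = List[float]


def blend(a: Waveform, b: Waveform, w: float = 0.5) -> Waveform:
    """Weighted addition of two adjacent pitch waveforms.

    ASSUMPTION: a constant weight w for a and (1 - w) for b, over the
    length of the shorter waveform. The actual weighting may differ.
    """
    n = min(len(a), len(b))
    return [w * a[k] + (1.0 - w) * b[k] for k in range(n)]


def expand_at(pitches: List[Waveform], n: int) -> List[Waveform]:
    """Insert the blend between pitches[n - 1] and pitches[n]."""
    return pitches[:n] + [blend(pitches[n - 1], pitches[n])] + pitches[n:]


def reduce_at(pitches: List[Waveform], n: int) -> List[Waveform]:
    """Replace pitches[n - 1] and pitches[n] with their blend."""
    return pitches[:n - 1] + [blend(pitches[n - 1], pitches[n])] + pitches[n + 1:]
```

Repeating one of these operations at positions chosen in processing order, until the total length matches the requested magnification, gives expansion or reduction in steps of roughly one pitch period.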

The present invention is not limited to the above embodiments, and various modifications and applications are possible.

For example, in the above embodiments, the least square error given by Equation 2 was used to select locations where similar pitch waveforms occur in succession, but the mean error v_n given by Equation 14 or the vector angle coefficient h_n given by Equation 15 may be used instead. The pitch selection unit 113 selects the waveforms to reduce or expand in ascending order of mean error when using the mean error v_n, and in descending order of angle coefficient when using the angle coefficient h_n. In the case of Embodiment 2, the pitch waveforms are first reordered using Equation 14 or Equation 15, then the stress determination process described above is executed and the pitch waveforms are reordered again.

The definitions of S_x, Pl_n, and p_n in Equations 14 and 15 are the same as in Equation 2.

Equation 2 above calculates the mean square error between the pitch waveform of interest and the pitch waveform located before it on the input waveform, but the mean square error with the pitch waveform located after it on the input waveform may be calculated instead. Alternatively, the mean square errors with both the preceding and the following pitch waveforms may be calculated, and one of them (the larger or the smaller) adopted as the representative similarity value.
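The alternative measures can be illustrated with the following sketch. Equations 14 and 15 are not reproduced in this text, so the sketch assumes that the mean error is the mean absolute sample difference and that the angle coefficient is the cosine of the angle between the two waveforms viewed as vectors. Both are computed over the length of the shorter waveform, and all function names are hypothetical.

```python
import math
from typing import Sequence


def mean_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference (assumed form of Equation 14).
    Smaller values mean the two pitch waveforms are more alike."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("pitch waveforms must not be empty")
    return sum(abs(a[k] - b[k]) for k in range(n)) / n


def angle_coefficient(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between the waveforms (assumed form of
    Equation 15). Larger values mean the waveforms are more alike."""
    n = min(len(a), len(b))
    dot = sum(a[k] * b[k] for k in range(n))
    norm_a = math.sqrt(sum(a[k] * a[k] for k in range(n)))
    norm_b = math.sqrt(sum(b[k] * b[k] for k in range(n)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a silent waveform as dissimilar
    return dot / (norm_a * norm_b)


def pick_representative(prev_value: float, next_value: float,
                        use_larger: bool = True) -> float:
    """Choose one of the values computed against the preceding and the
    following pitch waveform as the representative similarity value."""
    return max(prev_value, next_value) if use_larger else min(prev_value, next_value)
```

Note the opposite sort directions: candidates are taken in ascending order of `mean_error` but in descending order of `angle_coefficient`.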

The speech processing apparatus according to each of the above embodiments may also be configured to perform only one of expansion and reduction.

The speech processing apparatus according to each of the above embodiments may further include a communication control unit that communicates with other devices via a network such as the Internet, and may send and receive speech waveform data to and from other devices via this communication control unit.

The speech processing apparatus 100 may also accept analog speech input. In this case, the speech processing apparatus 100 further includes a speech sampling unit that samples the analog speech data using a predetermined method such as PCM (Pulse Code Modulation). The speech processing apparatus 100 may also output analog speech.

The speech processing apparatus 100 in each of the above embodiments can be realized with an ordinary computer system rather than a dedicated system. For example, a program for executing the operations described above may be stored on a computer-readable recording medium (FD, CD-ROM, DVD, etc.) and distributed, and the speech processing apparatus 100 that executes the processing described above may be configured by installing the program on a computer. The program may also be stored on a disk device of a server on a network such as the Internet and, for example, downloaded to a computer.

When the above functions are shared with the OS, or are realized jointly by the OS and an application, only the portion other than the OS may be stored on a medium and distributed, or downloaded to a computer.

A block diagram of the speech processing apparatus according to an embodiment of the present invention.
A logical configuration diagram of the control unit of FIG. 1.
A flowchart explaining the waveform reduction/expansion process according to an embodiment of the present invention.
A flowchart explaining the expansion process according to an embodiment of the present invention.
A flowchart explaining the reduction process according to an embodiment of the present invention.
A flowchart explaining the stress determination process according to Embodiment 2 of the present invention.
A diagram explaining the principle of the TDHS method, a conventional speech compression/reduction method.

Explanation of symbols

100 ... speech processing apparatus, 110 ... control unit, 111 ... dividing unit, 112 ... pitch extraction unit, 113 ... pitch selection unit, 114 ... waveform reduction/expansion unit, 120 ... input control unit, 12 ... input device, 130 ... output speech processing unit, 13 ... output device, 140 ... program storage unit, 150 ... storage unit, 170 ... external storage IO device, 17 ... storage medium

Claims

A speech waveform expansion device for expanding an input waveform on a time axis and outputting it, comprising:
input waveform receiving means for receiving data representing the input waveform;
expansion magnification receiving means for receiving input of a magnification by which to expand the input waveform;
cutting means for cutting out pitch waveforms from the input waveform received by the input waveform receiving means;
similarity calculation means for calculating, for each pitch waveform cut out by the cutting means, the similarity between that pitch waveform and one pitch waveform adjacent to it, before or after, on the time axis;
ordering means for assigning a processing order to each pitch waveform using the similarity calculated by the similarity calculation means as a measure;
expanded waveform generating means for selecting a pitch waveform in the processing order assigned by the ordering means and generating a waveform for insertion by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it; and
expanded waveform connecting means for inserting the waveform generated by the expanded waveform generating means between the two pitch waveforms on the input waveform that were subjected to the weighted addition,
wherein the processing by the expanded waveform generating means and the expanded waveform connecting means is repeated, while updating the pitch waveform to be selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

The speech waveform expansion device according to claim 1, wherein
the cutting means cuts out partial waveforms of a fixed length from the input waveform and cuts out pitch waveforms in units of the cut-out partial waveforms, and
the ordering means orders the pitch waveforms in units of the fixed-length partial waveforms.

A waveform expansion method comprising:
an input waveform receiving step of receiving data representing an input waveform;
an expansion magnification receiving step of receiving input of a magnification by which to expand the input waveform;
a cutting step of cutting out pitch waveforms from the input waveform received in the input waveform receiving step;
a similarity calculation step of calculating, for each pitch waveform cut out in the cutting step, the similarity between that pitch waveform and one pitch waveform adjacent to it, before or after, on the time axis;
an ordering step of assigning a processing order to each pitch waveform using the similarity calculated in the similarity calculation step as a measure;
an expanded waveform generating step of selecting a pitch waveform in the processing order assigned in the ordering step and generating a waveform for insertion by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it; and
an expanded waveform connecting step of inserting the waveform generated in the expanded waveform generating step between the two pitch waveforms on the input waveform that were subjected to the weighted addition,
wherein the processing of the expanded waveform generating step and the expanded waveform connecting step is repeated, while updating the pitch waveform to be selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

A program for causing a computer, used in a speech waveform expansion device that expands an input waveform on a time axis and outputs it, to function as:
input waveform receiving means for receiving data representing the input waveform;
expansion magnification receiving means for receiving input of a magnification by which to expand the input waveform;
cutting means for cutting out pitch waveforms from the input waveform received by the input waveform receiving means;
similarity calculation means for calculating, for each pitch waveform cut out by the cutting means, the similarity between that pitch waveform and one pitch waveform adjacent to it, before or after, on the time axis;
ordering means for assigning a processing order to each pitch waveform using the similarity calculated by the similarity calculation means as a measure;
expanded waveform generating means for selecting a pitch waveform in the processing order assigned by the ordering means and generating a waveform for insertion by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it; and
expanded waveform connecting means for inserting the waveform generated by the expanded waveform generating means between the two pitch waveforms on the input waveform that were subjected to the weighted addition,
wherein the program causes the processing by the expanded waveform generating means and the expanded waveform connecting means to be repeated, while updating the pitch waveform to be selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

A speech waveform reduction device for reducing an input waveform on a time axis and outputting it, comprising:
input waveform receiving means for receiving data representing the input waveform;
reduction magnification receiving means for receiving input of a magnification by which to reduce the input waveform;
cutting means for cutting out pitch waveforms from the input waveform received by the input waveform receiving means;
similarity calculation means for calculating, for each pitch waveform cut out by the cutting means, the similarity between that pitch waveform and one pitch waveform adjacent to it, before or after, on the time axis;
ordering means for assigning a processing order to each pitch waveform using the similarity calculated by the similarity calculation means as a measure;
reduced waveform generating means for selecting a pitch waveform in the processing order assigned by the ordering means and generating a replacement waveform by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it; and
reduced waveform connecting means for replacing, on the input waveform, the two pitch waveforms that were subjected to the weighted addition in the reduced waveform generating means with the waveform generated by the reduced waveform generating means,
wherein the processing by the reduced waveform generating means and the reduced waveform connecting means is repeated, while updating the pitch waveform to be selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

The speech waveform reduction device according to claim 5, wherein
the cutting means cuts out partial waveforms of a fixed length from the input waveform and cuts out pitch waveforms in units of the cut-out partial waveforms, and
the ordering means orders the pitch waveforms in units of the fixed-length partial waveforms.

A waveform reduction method comprising:
an input waveform receiving step of receiving data representing an input waveform;
a reduction magnification receiving step of receiving input of a magnification by which to reduce the input waveform;
a cutting step of cutting out pitch waveforms from the input waveform received in the input waveform receiving step;
a similarity calculation step of calculating, for each pitch waveform cut out in the cutting step, the similarity between that pitch waveform and one pitch waveform adjacent to it, before or after, on the time axis;
an ordering step of assigning a processing order to each pitch waveform using the similarity calculated in the similarity calculation step as a measure;
a reduced waveform generating step of selecting a pitch waveform in the processing order assigned in the ordering step and generating a replacement waveform by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it; and
a reduced waveform connecting step of replacing, on the input waveform, the two pitch waveforms that were subjected to the weighted addition in the reduced waveform generating step with the waveform generated in the reduced waveform generating step,
wherein the processing of the reduced waveform generating step and the reduced waveform connecting step is repeated, while updating the pitch waveform to be selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

A program for causing a computer, used in a speech waveform reduction device that reduces an input waveform on a time axis and outputs it, to function as:
input waveform receiving means for receiving data representing the input waveform;
reduction magnification receiving means for receiving input of a magnification by which to reduce the input waveform;
cutting means for cutting out pitch waveforms from the input waveform received by the input waveform receiving means;
similarity calculation means for calculating, for each pitch waveform cut out by the cutting means, the similarity between that pitch waveform and one pitch waveform adjacent to it, before or after, on the time axis;
ordering means for assigning a processing order to each pitch waveform using the similarity calculated by the similarity calculation means as a measure;
reduced waveform generating means for selecting a pitch waveform in the processing order assigned by the ordering means and generating a replacement waveform by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it; and
reduced waveform connecting means for replacing, on the input waveform, the two pitch waveforms that were subjected to the weighted addition in the reduced waveform generating means with the waveform generated by the reduced waveform generating means,
wherein the program causes the processing by the reduced waveform generating means and the reduced waveform connecting means to be repeated, while updating the pitch waveform to be selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.

A speech processing apparatus for expanding or reducing an input waveform on a time axis and outputting it, comprising:
input waveform receiving means for receiving data representing the input waveform;
magnification receiving means for receiving input of a magnification by which to expand or reduce the input waveform;
cutting means for cutting out pitch waveforms from the input waveform received by the input waveform receiving means;
similarity calculation means for calculating, for each pitch waveform cut out by the cutting means, the similarity between that pitch waveform and one pitch waveform adjacent to it, before or after, on the time axis;
ordering means for assigning a processing order to each pitch waveform using the similarity calculated by the similarity calculation means as a measure;
expanded waveform generating means for selecting a pitch waveform in the processing order assigned by the ordering means and generating a waveform for insertion by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it;
expanded waveform connecting means for inserting the waveform generated by the expanded waveform generating means between the two pitch waveforms on the input waveform that were subjected to the weighted addition;
reduced waveform generating means for selecting a pitch waveform in the processing order assigned by the ordering means and generating a replacement waveform by weighted addition of the selected pitch waveform and one pitch waveform adjacent to it;
reduced waveform connecting means for replacing, on the input waveform, the two pitch waveforms that were subjected to the weighted addition in the reduced waveform generating means with the waveform generated by the reduced waveform generating means;
magnification determining means for determining whether the input waveform is to be expanded or reduced;
first repeating means for, when the magnification determining means determines that the input waveform is to be expanded, repeating the processing by the expanded waveform generating means and the expanded waveform connecting means, while updating the pitch waveform to be selected according to the processing order, until the waveform length corresponding to the specified magnification is reached; and
second repeating means for, when the magnification determining means determines that the input waveform is to be reduced, repeating the processing by the reduced waveform generating means and the reduced waveform connecting means, while updating the pitch waveform to be selected according to the processing order, until the waveform length corresponding to the specified magnification is reached.