JPH04219797A - Time base compressing and elongating method - Google Patents

Time base compressing and elongating method

Info

Publication number
JPH04219797A
Authority
JP
Japan
Prior art keywords
frame
buffer
frames
pitch
sent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2404324A
Other languages
Japanese (ja)
Inventor
Hiroyuki Hirai
平井啓之
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanyo Electric Co Ltd
Original Assignee
Sanyo Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sanyo Electric Co Ltd
Priority to JP2404324A
Publication of JPH04219797A

Abstract

PURPOSE: To join frames with little distortion, and to enable time-axis compression and expansion on a frame-by-frame basis, by making the spacing between the peaks of the joined frames an integer multiple of the pitch. CONSTITUTION: A compression/expansion adjustment unit 1 chooses, frame by frame and according to the input compression or expansion ratio, whether the input speech is sent to a working buffer 2, sent to an output buffer 3, or deleted. For example, for compression to 2/3, the first frame is sent to the output buffer 3, the next frame is deleted, and the frame after that is sent to the working buffer 2. The frame that follows a deleted frame is always sent to the working buffer 2. A cross-correlation calculation unit 4 calculates the cross-correlation between the working buffer 2 and the output buffer 3, and the interval at which the correlation is maximum is set as the pitch. As a result, when the time axis is compressed or expanded while compression-encoded speech is being decoded, frames can be joined without pitch misalignment.

Description

[Detailed description of the invention]

[0001]

[Field of industrial application] The present invention relates to a time-axis compression/expansion method used when recorded speech, such as English conversation or a lecture, is played back at a speed different from the speed at which it was recorded.

[0002]

[Prior art] Techniques for converting the time axis of speech include the TDHS method and the PICOLA method. Because these methods process speech on the basis of its pitch, they cannot process natural speech, whose pitch changes constantly, in frames of a fixed length.

[0003]

[Problem to be solved by the invention] Processing that requires pitch analysis, such as the PICOLA method, yields high-quality fast and slow playback, but the pitch analysis itself is complex and requires a very large amount of computation.

[0004] With the development of digital technology, speech is increasingly compression-encoded packet by packet for recording, playback, and transmission. To convert the time axis of such speech, it has therefore been necessary to decode it, store it once in a buffer, and only then compress or expand its time axis. When all of this is done on a single CPU, the frames used for decoding have a fixed length while the frames used for time-axis compression/expansion vary in size with the pitch, which makes the processing very complicated.

[0005] Alternatively, the frames used for speech processing can be given a fixed length large enough to hold two or more periods of the maximum pitch, with time-axis compression performed by deleting a frame every few frames and time-axis expansion performed by inserting the same frame again every few frames. In that case, however, the pitch is misaligned at the joints between frames.
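
The pitch misalignment described in this paragraph can be reproduced with a short sketch. The Python below is illustrative only: the pitch period of 50 samples, the frame length of 120 samples, the test sine wave, and the helper name `drop_frames` are all assumptions that do not appear in the source. It deletes the middle frame of three and compares the sample jump at the joint with the largest sample-to-sample step in the original signal.

```python
import math

def drop_frames(samples, frame_len, period, drop_index):
    """Naive compression: delete one fixed-length frame out of every `period`."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    kept = [f for k, f in enumerate(frames) if k % period != drop_index]
    return [s for f in kept for s in f]

PITCH = 50        # assumed pitch period in samples
FRAME_LEN = 120   # assumed fixed frame length, longer than two pitch periods

signal = [math.sin(2 * math.pi * n / PITCH) for n in range(3 * FRAME_LEN)]
compressed = drop_frames(signal, FRAME_LEN, period=3, drop_index=1)

# Largest sample-to-sample step in the original signal.
max_step = max(abs(b - a) for a, b in zip(signal, signal[1:]))
# Step across the joint between the first kept frame and the second kept frame.
joint_step = abs(compressed[FRAME_LEN] - compressed[FRAME_LEN - 1])

print(len(compressed), round(max_step, 3), round(joint_step, 3))
```

Because the assumed frame length is not a multiple of the pitch period, the waveform phase jumps at the joint, so `joint_step` comes out far larger than `max_step`.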

[0006]

[Means for solving the problem] The present invention is intended to solve the above problems. It proposes a speech time-axis compression/expansion method in which the frames used for speech processing have a fixed length large enough to hold two or more periods of the maximum pitch, time-axis compression is performed by deleting a frame every few frames, and time-axis expansion is performed by inserting the same frame again every few frames. When two frames are joined, the cross-correlation between them is calculated and the interval at which the cross-correlation is maximum is taken as the pitch; the position of maximum amplitude within the last pitch length of the preceding frame and the position of maximum amplitude within the first pitch length of the following frame are found; the frames are joined so that the spacing between these two positions is an integer multiple of the pitch; and the output waveform is shaped at the joint.
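
One way to read the pitch estimation step is as a search for the lag that maximizes the cross-correlation between the two frames. The sketch below is a minimal, hypothetical version in Python: the function name `best_lag`, the lag search range, the correlation window, and the energy normalization are assumptions, since the source specifies none of them. The demonstration uses the same frame on both sides, as happens during expansion when one frame is repeated, so the cross-correlation reduces to an autocorrelation.

```python
import math

def best_lag(a, b, min_lag, max_lag, window):
    """Lag in [min_lag, max_lag] that maximizes the normalized
    cross-correlation between a[0:window] and b[lag:lag + window]."""
    best, best_score = None, float("-inf")
    ref = a[:window]
    for lag in range(min_lag, max_lag + 1):
        seg = b[lag:lag + window]
        energy = sum(v * v for v in seg)
        if energy == 0.0:
            continue  # silent segment: correlation is undefined, skip it
        score = sum(x * y for x, y in zip(ref, seg)) / math.sqrt(energy)
        if score > best_score:
            best, best_score = lag, score
    return best

TRUE_PITCH = 50  # assumed pitch period of the test signal, in samples
frame = [
    math.sin(2 * math.pi * n / TRUE_PITCH)
    + 0.5 * math.sin(4 * math.pi * n / TRUE_PITCH)
    for n in range(240)
]

# Same frame on both sides (the repeated-frame case during expansion).
estimated_pitch = best_lag(frame, frame, min_lag=20, max_lag=80, window=120)
print(estimated_pitch)
```

The search range here is deliberately kept below twice the true period so that only one correlation maximum falls inside it; a practical implementation would need its own rule for choosing that range.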

[0007]

[Operation] With the above configuration, when time-axis compression/expansion is performed while compression-encoded speech is being decoded, frames can be joined to one another without pitch misalignment.

[0008]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. In Fig. 1, 1 is a compression/expansion adjustment unit that chooses, frame by frame and according to the input compression or expansion ratio, whether the input speech is sent to the working buffer 2, sent to the output buffer 3, or deleted. For example, for compression to 2/3, one frame in every three is deleted: the first frame is sent to the output buffer 3, the next frame is deleted, and the frame after that is sent to the working buffer 2. The frame that follows a deleted frame is always sent to the working buffer 2. For expansion to 3/2, one frame in every two is repeated: the first frame is sent to the output buffer 3, and the next frame is sent to both the working buffer 2 and the output buffer 3. The working buffer 2 holds the waveform to be joined after the frame output from the output buffer 3. The output buffer 3 outputs the older frame once two frames have accumulated.

4 is a cross-correlation calculation unit, which calculates the cross-correlation between the working buffer 2 and the output buffer 3 and sets the interval at which the correlation is maximum as the pitch.

5 is an output waveform shaping unit, whose operation is described with reference to Figs. 2 and 3. In Fig. 2, t1 is the time from the end of frame F1, output from the output buffer 3, back to the position of maximum amplitude within the last pitch length; t2 is the time from the start of frame F2, output from the working buffer 2, to the position of maximum amplitude within the first pitch length; and waveform 1 is the waveform of one pitch length starting at t2. The t1 interval is windowed with a straight line falling from 1 to 0, as shown by the broken line, and the t2 interval is windowed with a straight line rising from 0 to 1, as shown by the broken line. In addition, waveform 1, shown in part (A) of the figure, is windowed so that it rises from 0 to 1 and then returns to 0, as shown by the broken line. The waveforms in the upper and lower rows are then added together to form a waveform like that in Fig. 4.

Fig. 3 shows the case where t1 + t2 is longer than the pitch length. Waveform 2 is the waveform of length t2 + pitch length taken from the start of frame F2 on the working-buffer side. The window at the rear of frame F1 on the output-buffer side is applied over the interval t3 (= t1 + t2 - pitch length), and the window at the front of frame F2 on the working-buffer side is applied over the t2 interval. In addition, waveform 2, shown in part (B) of the figure, is windowed so that it rises from 0 to 1 and then returns to 0, as shown by the broken line, and the waveforms in the upper and lower rows are added together to form a waveform like that in Fig. 5.
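
The routing performed by the compression/expansion adjustment unit 1 for the two ratios given in this paragraph can be sketched as a repeating pattern. This is only an illustration in Python: the names `PATTERNS`, `route_frames`, and `delivery_ratio` are hypothetical, and treating the two examples as cycles that repeat every three and every two frames is an assumption, since the source describes only the first cycle of each.

```python
from itertools import cycle

# Destinations per input frame, taken from the two examples in the text.
# An empty tuple means the frame is deleted. Repeating the pattern
# indefinitely is an assumption of this sketch.
PATTERNS = {
    "compress_2_3": [("output",), (), ("work",)],
    "expand_3_2": [("output",), ("work", "output")],
}

def route_frames(num_frames, mode):
    """Return a list of (frame_index, destinations) for each input frame."""
    pattern = cycle(PATTERNS[mode])
    return [(i, next(pattern)) for i in range(num_frames)]

def delivery_ratio(num_frames, mode):
    """Frames delivered to either buffer, divided by frames received."""
    routed = route_frames(num_frames, mode)
    return sum(len(dest) for _, dest in routed) / num_frames

for mode in PATTERNS:
    print(mode, route_frames(6, mode), delivery_ratio(6, mode))
```

Counting every delivery to either buffer as one frame of output is a simplification, but it shows why the two patterns correspond to the stated ratios: four deliveries per six input frames for compression, and nine per six for expansion.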

[0009] In both the Fig. 2 and Fig. 3 cases, after the waveforms have been joined and shaped by the output waveform shaping unit 5, two frames' worth of samples starting from frame F1 are taken and returned to the output buffer 3, and, as shown in Figs. 4 and 5, the portion by which the joined waveform exceeds the fixed frame length is discarded.
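
The peak-spacing rule and the resulting extension at the joint can be sketched as follows. This Python is a hedged illustration, not the patented procedure: the function names `peak_offsets` and `joint_extension` are hypothetical, "maximum amplitude" is taken to mean the largest signed sample value, and the choice of a spacing of one pitch when t1 + t2 does not exceed the pitch length and two pitches when it does is an inference from the window lengths described for Figs. 2 and 3, not something the text states outright. The windowing and summation of the waveforms are not shown.

```python
import math

def peak_offsets(f1, f2, pitch):
    """Return (t1, t2) in samples.

    t1: distance from the largest sample in the last `pitch` samples of f1
        to the end of f1.
    t2: distance from the start of f2 to the largest sample in its first
        `pitch` samples.
    """
    tail = f1[-pitch:]
    head = f2[:pitch]
    t1 = len(tail) - tail.index(max(tail))
    t2 = head.index(max(head))
    return t1, t2

def joint_extension(t1, t2, pitch):
    """Return (k, extra): adding `extra` samples makes the peak spacing k * pitch."""
    spacing = t1 + t2                   # peak spacing under plain concatenation
    k = max(1, -(-spacing // pitch))    # ceiling division, at least 1
    return k, k * pitch - spacing

PITCH = 50        # assumed pitch period in samples
FRAME_LEN = 120   # assumed fixed frame length
signal = [math.cos(2 * math.pi * n / PITCH) for n in range(3 * FRAME_LEN)]
f1 = signal[0:FRAME_LEN]                    # frame held in the output buffer
f2 = signal[2 * FRAME_LEN:3 * FRAME_LEN]    # frame after the deleted one

t1, t2 = peak_offsets(f1, f2, PITCH)
k, extra = joint_extension(t1, t2, PITCH)
print(t1, t2, k, extra)
```

For this test signal the sketch gives t1 = 20, t2 = 10, k = 1, and an extension of 20 samples, which corresponds to the portion that the paragraph above says is discarded after two frames have been returned to the output buffer.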

[0010]

[Effects of the invention] In the present invention, the spacing between the peaks of the frames being joined is made an integer multiple of the pitch, so frames are joined with little distortion and time-axis compression/expansion can be performed frame by frame.

[Brief description of the drawings]

[Fig. 1] A schematic block diagram for explaining the present invention.

[Fig. 2] A diagram for explaining the processing of the output waveform shaping unit of the present invention.

[Fig. 3] A diagram for explaining the processing of the output waveform shaping unit of the present invention.

[Fig. 4] A diagram for explaining the processing of the output waveform shaping unit of the present invention.

[Fig. 5] A diagram for explaining the processing of the output waveform shaping unit of the present invention.

[Explanation of reference numerals]

1 Compression/expansion adjustment unit
2 Working buffer
3 Output buffer
4 Cross-correlation calculation unit
5 Output waveform shaping unit

Claims (1)

[Claims]

[Claim 1] A speech time-axis compression/expansion method in which the frames used for speech processing have a fixed length large enough to hold two or more periods of the maximum pitch, time-axis compression is performed by deleting a frame every few frames, and time-axis expansion is performed by inserting the same frame again every few frames, the method being characterized in that, when two frames are joined, the cross-correlation between them is calculated and the interval at which the cross-correlation is maximum is taken as the pitch; the position of maximum amplitude within the last pitch length of the preceding frame and the position of maximum amplitude within the first pitch length of the following frame are found; the frames are joined so that the spacing between these two positions is an integer multiple of the pitch; and the output waveform is shaped at the joint between the frames.
JP2404324A 1990-12-20 1990-12-20 Time base compressing and elongating method Pending JPH04219797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2404324A JPH04219797A (en) 1990-12-20 1990-12-20 Time base compressing and elongating method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2404324A JPH04219797A (en) 1990-12-20 1990-12-20 Time base compressing and elongating method

Publications (1)

Publication Number Publication Date
JPH04219797A (en) 1992-08-10

Family

ID=18514002

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2404324A Pending JPH04219797A (en) 1990-12-20 1990-12-20 Time base compressing and elongating method

Country Status (1)

Country Link
JP (1) JPH04219797A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997026647A1 (en) * 1996-01-19 1997-07-24 Matsushita Electric Industrial Co., Ltd. Reproducing speed changer
US6085157A (en) * 1996-01-19 2000-07-04 Matsushita Electric Industrial Co., Ltd. Reproducing velocity converting apparatus with different speech velocity between voiced sound and unvoiced sound
JP2006084754A (en) * 2004-09-16 2006-03-30 Oki Electric Ind Co Ltd Voice recording and reproducing apparatus

Similar Documents

Publication Publication Date Title
JP3546755B2 (en) Method and apparatus for companding time axis of rhythm sound source signal
US6735738B1 (en) Method and device for reconstructing acoustic data and animation data in synchronization
CN108259965B (en) Video editing method and system
JP3063838B2 (en) Audio / video synchronous playback apparatus and method
JP4630876B2 (en) Speech speed conversion method and speech speed converter
JP4319548B2 (en) Audio program playback method and apparatus during video trick mode playback
CN100555876C (en) Signal processor and method
EP1655963A3 (en) Compressed picture data editing apparatus and method
US6925340B1 (en) Sound reproduction method and sound reproduction apparatus
WO1997019552A3 (en) Method and apparatus for implementing playback features for compressed video data
US6292454B1 (en) Apparatus and method for implementing a variable-speed audio data playback system
JPH04219797A (en) Time base compressing and elongating method
KR20090058522A (en) Network jitter smoothing with reduced delay
JP3947352B2 (en) Playback device
JP4534168B2 (en) Information processing apparatus and method, recording medium, and program
JP4212253B2 (en) Speaking speed converter
JP2000324448A5 (en)
JP2000259200A (en) Method and device for converting speaking speed, and recording medium storing speaking speed conversion program
JP3162945B2 (en) Video tape recorder
EP1453036B1 (en) Method and apparatus for synthesizing speech from text
JP2987246B2 (en) How to edit multimedia data
JP5325059B2 (en) Video / audio synchronized playback device, video / audio synchronized processing device, video / audio synchronized playback program
JP4648183B2 (en) Continuous media data shortening reproduction method, composite media data shortening reproduction method and apparatus, program, and computer-readable recording medium
KR19990087745A (en) Partial editing method of digital compressed storage data
JPH04285769A (en) Multi-media data editing method