JPH08146985A - Speaking speed control system

Info

Publication number: JPH08146985A
Application number: JP6283641A
Authority: JP
Inventors: Koji Tanaka; 浩司田中; Masanori Miyatake; 正典宮武; Masayuki Iida; 正幸飯田
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1994-11-17
Filing date: 1994-11-17
Publication date: 1996-06-07

Abstract

PURPOSE: To provide a speaking speed control system in which information for controlling the speaking speed is stored in advance in transmitted data, on a sound recording medium, or the like, so that a reproducing device that receives and reproduces the transmitted data, or a reproducing device for the sound recording medium, can control the speaking speed on the basis of that speaking speed control information. CONSTITUTION: The speaking speed control system is equipped with an editing device 1, which generates a signal by adding speaking speed control information to a speech signal, and a reproducing device 2, which separates the signal generated by the editing device 1 into the speech signal and the speaking speed control information and controls the speaking speed of the speech signal according to the speaking speed control information.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech speed conversion control system in which control information for controlling the speech speed is inserted into transmission data for broadcasting, communication such as telephony, personal computers, voice mail, and the like, and the receiving side controls the speech speed based on the control information; or to a speech speed control system in which control information for controlling the speech speed is recorded on a recording medium such as a recording tape, magneto-optical disk, laser disk, CD-ROM, video CD, or IC memory, and the speech speed is controlled based on the control information at playback.

[0002]

[Prior Art] The present applicant has been developing a speech speed conversion device having means for compressing and expanding the time length of the input audio signal in voice sections, and deletion means for deleting the input audio signal in silent sections of a predetermined length or longer. In such a speech speed conversion device, the input audio signal is analyzed, and compression/expansion processing, deletion processing, and the like are performed according to the analysis result.

[0003] Input audio signals for such a speech speed conversion device include transmission data from broadcasting, communication such as telephony, personal computers, voice mail, and the like, and data read from recording media such as recording tapes, magneto-optical disks, laser disks, CD-ROMs, video CDs, and IC memories.

[0004]

[Problem to Be Solved by the Invention] An object of the present invention is to provide a speech speed control system in which information for controlling the speech speed is placed in advance in transmission data, on a recording medium, or the like, so that a reproducing device that receives and reproduces the transmission data, or a reproducing device for the recording medium, can control the speech speed based on the speech speed control information.

[0005]

[Means for Solving the Problem] A speech speed control system according to the present invention is characterized by comprising an editing device that generates a signal in which speech speed control information is added to an audio signal, and a reproducing device that separates the audio signal and the speech speed control information from the signal generated by the editing device and controls the speech speed of the audio signal according to the speech speed control information.

[0006] As the editing device, for example, a device is used that generates a signal in which speech speed control information is added to an audio signal and records it on a recording medium. As the reproducing device, for example, a device is used that reads the signal in which the speech speed control information is added to the audio signal from that recording medium, separates the audio signal and the speech speed control information, and controls the speech speed of the audio signal according to the speech speed control information.

[0007] Alternatively, as the editing device, for example, a device is used that generates a signal in which speech speed control information is added to an audio signal and creates transmission data from it. As the reproducing device, for example, a device is used that receives the transmission data, separates the audio signal and the speech speed control information from the received transmission data, and controls the speech speed of the audio signal according to the speech speed control information.

[0008] As the method of adding the speech speed control information to the audio signal, a time division multiplexing method, a frequency division multiplexing method, a switched transmission method, or the like is used.

[0009] As the reproducing device, for example, a device is used that has a speech speed conversion unit including processing means for time-axis compression/expansion of the audio signal and processing means for deleting the audio signal. As the speech speed control information, for example, information for performing time-axis expansion/compression on the audio signal and information for deleting the audio signal are used. Also, as the speech speed control information, information indicating that a section is a voice section and information indicating the continuation count of silent sections are used.

[0010] An editing device according to the present invention is characterized by comprising means for analyzing an audio signal to generate speech speed control information, and means for adding the generated speech speed control information to the audio signal.

[0011] A reproducing device according to the present invention is characterized by comprising means for separating the audio signal and the speech speed control information from an audio signal to which speech speed control information has been added, and means for controlling the speech speed of the audio signal according to the speech speed control information.

[0012]

[Operation] In the speech speed control system according to the present invention, the editing device generates a signal in which speech speed control information is added to an audio signal. On the reproducing device side, the audio signal and the speech speed control information are separated from the signal generated by the editing device, and the speech speed of the audio signal is controlled according to the speech speed control information.

[0013] In the editing device according to the present invention, the audio signal is analyzed to generate speech speed control information, and the generated speech speed control information is added to the audio signal.

[0014] In the reproducing device according to the present invention, the audio signal and the speech speed control information are separated from the audio signal to which the speech speed control information has been added, and the speech speed of the audio signal is controlled according to the speech speed control information.

[0015]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0016] FIG. 1 shows the speech speed control system. The speech speed control system comprises an editing device 1, which generates a signal in which speech speed control information is added to an audio signal, and a reproducing device 2, which separates the audio signal and the speech speed control information from the signal generated by the editing device 1 and controls the speech speed of the audio signal according to the speech speed control information.

[0017] The editing device 1 may be one that creates transmission data from the signal in which the speech speed control information is added to the audio signal, or one that records the signal in which the speech speed control information is added to the audio signal on a recording medium 3. In the former case, the transmission data in which the speech speed control information is added to the audio signal is transmitted from the editing device 1 to the reproducing device. In the latter case, the signal in which the speech speed control information was added to the audio signal by the editing device 1 is read from the recording medium 3.

[0018] FIG. 2 shows the configuration of the editing device 1. In the editing device 1, the source audio signal is analyzed by the audio signal analysis unit 11 to generate speech speed control information. The generated speech speed control information is coded by the coding unit 12. The source audio signal and the generated code are then multiplexed by the multiplexing unit 13. Transmission data is created from this multiplexed signal, or the multiplexed signal is recorded on a recording medium.

[0019] FIG. 3 shows the configuration of the reproducing device 2. The reproducing device 2 receives as input either the transmission data created by the editing device 1 or data read from a recording medium on which the editing device 1 has recorded the multiplexed signal.

[0020] The input multiplexed signal is demodulated by the demodulation unit 21 and separated into the speech speed control information and the audio signal. The audio signal is input to the speech speed control unit 22. The speech speed control information and an audio/speech-speed-control-information synchronization control signal from the demodulation unit 21 are input to the speech speed control unit 22 as control signals. The speech speed control unit 22 controls the speech speed of the input audio based on the speech speed control information.

[0021] A frequency division multiplexing method, a time division multiplexing method, or the like is used as the multiplexing method. When the audio signal has been encoded in the editing device 1, the audio signal separated by the demodulation unit 21 is decoded before being sent to the speech speed control unit 22.
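As a rough illustration of the time division multiplexing option, the sketch below places one control code ahead of each section of audio samples and separates them again on the reproducing side. The frame layout (one control code followed by a fixed number of samples) and the names `multiplex` and `demultiplex` are assumptions made for illustration; the specification does not define a frame format.

```python
from typing import List, Sequence, Tuple

# One frame: (control code, audio samples of one section). Hypothetical layout.
Frame = Tuple[int, List[int]]


def multiplex(sections: Sequence[Sequence[int]], codes: Sequence[int]) -> List[int]:
    """Interleave one control code ahead of each audio section."""
    if len(sections) != len(codes):
        raise ValueError("one control code is required per section")
    stream: List[int] = []
    for code, samples in zip(codes, sections):
        stream.append(code)      # time slot carrying the control information
        stream.extend(samples)   # time slots carrying the audio samples
    return stream


def demultiplex(stream: Sequence[int], section_len: int) -> List[Frame]:
    """Split the stream back into (control code, samples) pairs."""
    frame_len = section_len + 1
    if len(stream) % frame_len != 0:
        raise ValueError("stream length is not a whole number of frames")
    frames: List[Frame] = []
    for start in range(0, len(stream), frame_len):
        code = stream[start]
        samples = list(stream[start + 1:start + frame_len])
        frames.append((code, samples))
    return frames
```

Because each control code travels in the same frame as its audio section, the two stay aligned without extra bookkeeping, which is one simple way to picture the synchronization role that the demodulation unit 21 plays.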

[0022] FIG. 4 shows the configuration of the speech speed control unit 22. The speech speed control unit 22 includes a speech speed control information analysis unit 31 that analyzes the speech speed control information; a control information synchronization unit 32 that sends a synchronization signal to the speech speed control information analysis unit 31 based on the audio/speech-speed-control-information synchronization control signal; and a speech speed conversion unit 33 that converts the speech speed of the audio signal based on the analysis result of the speech speed control information analysis unit 31.

[0023] The speech speed conversion unit 33 performs compression/expansion processing of the input audio signal, deletion processing of the input audio signal, and the like, according to the analysis result of the speech speed control information analysis unit 31.

[0024] FIG. 5 shows a first operation example of the editing device 1. Here, the audio signal of a broadcast program with video and audio is assumed to be input. For convenience of explanation, two playback speeds are assumed to be selectable on the reproducing device 2 side: 1x playback and 2x playback. The speech speed control information multiplexed with the audio signal includes information for 1x playback and information for 2x playback.

[0025] First, the audio signal analysis unit 11 calculates the average power P of the input audio signal for each predetermined section of the input audio signal (step 1). Next, it is determined whether the average power P is greater than or equal to a predetermined threshold Th (step 2). If P ≥ Th, the section is determined to be a voice section and is judged to be a section in which the time length of the audio signal is to be expanded (step 3). Here, the threshold is set so that low-level stationary noise and environmental sound outside voice sections are also treated as silent sections. The compression rate α is then set (step 4).
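Steps 1 and 2 can be sketched as follows. This is a minimal illustration, assuming that "average power" means the mean of the squared sample values over a section; the specification does not give the formula, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one section (0.0 for an empty section)."""
    if not samples:
        return 0.0
    return sum(x * x for x in samples) / len(samples)


def classify_sections(signal: Sequence[float], section_len: int, th: float) -> List[bool]:
    """Return True for each section judged to be a voice section (P >= Th)."""
    if section_len <= 0:
        raise ValueError("section_len must be positive")
    flags: List[bool] = []
    for start in range(0, len(signal), section_len):
        p = average_power(signal[start:start + section_len])
        flags.append(p >= th)  # P < Th is treated as a silent section
    return flags
```

Raising `th` causes more low-level background sound to fall below the threshold and be handled as silence, which matches the threshold choice described in the paragraph above.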

[0026] The compression rate α for 2x playback is set, for example, to a predetermined value within the range 1/2 ≤ α ≤ 1. When α is 1/2, the typical compression rate for 2x playback, the output speech speed is twice the input speech speed. When α is 1, the output speech speed equals the input speech speed. The compression rate α is therefore set within the range in which the output speech speed is at least 1 time and at most 2 times the input speech speed.

[0027] Here, the compression rate α is assumed to be set to 2/3. In this case, at playback, every 3 pitch periods of the audio are thinned to 2 pitch periods, so the output speech speed is 3/2 times the input speech speed. Thus, compared with the typical compression rate of 1/2 for 2x playback, compression at a rate of 2/3 expands the audio by 2/3 − 1/2 = 1/6.

[0028] The compression rate α for 1x playback is set, for example, to a predetermined value within the range 1 ≤ α ≤ 3/2. When α is 1 at 1x playback, the output speech speed equals the input speech speed. When α is 3/2, the output speech speed is 2/3 times the input speech speed. The compression rate α is therefore set within the range in which the output speech speed is at least 2/3 times and at most 1 time the input speech speed.

[0029] Here, the compression rate α is assumed to be set to 3/2. In this case, at playback, every 2 pitch periods of the audio are expanded to 3 pitch periods, so the output speech speed is 2/3 times the input speech speed. Thus, expansion at a rate of 3/2 expands the audio by 3/2 − 1 = 1/2 relative to normal 1x playback.
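The relation between α, the number of pitch periods, and the output speech speed can be checked with a small sketch. It assumes the audio has already been split into pitch periods and simply drops or repeats whole periods; a practical converter would also smooth the joins, which is not shown. The function names are hypothetical.

```python
import math
from fractions import Fraction
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def resample_pitch_periods(periods: Sequence[T], alpha: Fraction) -> List[T]:
    """Keep floor(n * alpha) periods by thinning (alpha < 1) or repeating (alpha > 1)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n_out = math.floor(len(periods) * alpha)
    # Output period j is taken from input period floor(j / alpha).
    return [periods[math.floor(j / alpha)] for j in range(n_out)]


def speed_ratio(alpha: Fraction) -> Fraction:
    """Output speech speed relative to the input speech speed."""
    return 1 / alpha
```

With α = 2/3, six periods `a b c d e f` become `a b d e` (3 periods thinned to 2), and with α = 3/2, four periods `a b c d` become `a a b c c d` (2 periods expanded to 3), giving speed ratios of 3/2 and 2/3 respectively.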

[0030] Next, speech speed control information consisting of the compression rates α set for the 1x and 2x playback speeds respectively is generated and coded by the coding unit 12 (step 5). The multiplexing unit 13 then multiplexes the input audio signal of the section with the speech speed control information (step 6). Transmission data is created from this multiplexed signal, or the multiplexed signal is recorded on a recording medium (step 7).

[0031] If, in step 2, the average power P is smaller than the predetermined threshold Th (P < Th), the section is determined to be a silent section, and the continuation count of silent sections is calculated (step 8). It is then determined whether the continuation count of silent sections is greater than or equal to a predetermined number Tdel (step 9). If the continuation count is less than Tdel, the section is judged to be a section in which the time length of the audio signal is to be expanded (step 10).

[0032] Next, as in step 4, the compression rates α for 1x playback and 2x playback are set (step 11). Speech speed control information consisting of the compression rates α for 1x and 2x playback is then generated and coded by the coding unit 12 (step 11). The multiplexing unit 13 then multiplexes the input audio signal of the section with the speech speed control information (step 6). Transmission data is created from this multiplexed signal, or the multiplexed signal is recorded on a recording medium (step 7).

[0033] If it is determined in step 9 that the continuation count of silent sections is greater than or equal to the predetermined number Tdel, the section is judged to be a section to be deleted (step 12). Control information marking the input audio signal of the section as a deletion section is then generated and coded by the coding unit 12 (step 13).
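The decision flow of the first operation example (steps 2 to 13) can be summarized in a short sketch. It assumes the continuation count includes the current section and resets at each voice section, which the text implies but does not state; `ControlInfo` and `make_control_info` are hypothetical names.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class ControlInfo:
    delete: bool                          # True marks a deletion section
    alpha: Optional[Dict[int, Fraction]]  # playback speed -> compression rate


def make_control_info(voice_flags: Sequence[bool], t_del: int,
                      alphas: Dict[int, Fraction]) -> List[ControlInfo]:
    """Produce per-section control information from voice/silence flags."""
    infos: List[ControlInfo] = []
    silent_run = 0  # continuation count of silent sections (assumed to reset on voice)
    for is_voice in voice_flags:
        if is_voice:
            silent_run = 0
            infos.append(ControlInfo(False, dict(alphas)))   # steps 3 to 5
        else:
            silent_run += 1                                   # step 8
            if silent_run >= t_del:                           # step 9
                infos.append(ControlInfo(True, None))         # steps 12 and 13
            else:
                infos.append(ControlInfo(False, dict(alphas)))  # steps 10 and 11
    return infos
```

Under these assumptions, short pauses shorter than `t_del` sections are kept and time-scaled like speech, while only the tail of a long pause is marked for deletion.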

[0034] Next, the multiplexing unit 13 multiplexes the input audio signal of the section with the speech speed control information (step 6). Transmission data is created from this multiplexed signal, or the multiplexed signal is recorded on a recording medium (step 7).

[0035] When the multiplexed signal generated in this way is reproduced on the reproducing device 2 side, the operator sets the playback speed (1x or 2x). In sections of the input audio signal for which a compression rate α for the set playback speed is given as speech speed control information, the input audio signal is subjected to time-axis compression/expansion at that compression rate α and then reproduced. In sections of the input audio signal that the speech speed control information designates as sections to be deleted, the input audio signal is deleted.
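The reproducing side of the first operation example can be pictured as below. The sketch only computes how long each section lasts after processing, assuming the output duration of a time-scaled section is α times its input duration and that a deleted section contributes nothing; the per-section representation and the name `plan_playback` are assumptions.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Sequence

# Per-section control information: None marks a deletion section,
# otherwise a mapping from playback speed (1 or 2) to compression rate α.
SectionInfo = Optional[Dict[int, Fraction]]


def plan_playback(durations: Sequence[Fraction], infos: Sequence[SectionInfo],
                  speed: int) -> List[Fraction]:
    """Output duration of each section for the operator-selected speed."""
    if len(durations) != len(infos):
        raise ValueError("one control entry is required per section")
    out: List[Fraction] = []
    for duration, info in zip(durations, infos):
        if info is None:
            out.append(Fraction(0))            # deletion section: nothing is played
        else:
            out.append(duration * info[speed])  # time-axis compression/expansion
    return out
```

For three 3-second sections where the middle one is a deletion section and α is 2/3 for 2x playback, the plan is 2 s, 0 s, 2 s.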

[0036] FIG. 6 shows a second operation example of the editing device 1. Here, the audio signal of a broadcast program with video and audio is assumed to be input. For convenience of explanation, two playback speeds are assumed to be selectable on the reproducing device 2 side: 1x playback and 2x playback. The speech speed control information multiplexed with the audio signal includes information for 1x playback and information for 2x playback.

[0037] First, the audio signal analysis unit 11 calculates the average power P of the input audio signal for each predetermined section of the input audio signal (step 21). Next, it is determined whether the average power P is greater than or equal to a predetermined threshold Th (step 22). If P ≥ Th, the section is determined to be a voice section and is judged to be a section in which the time length of the audio signal is to be expanded (step 23). Speech speed control information indicating that the section is one in which the time length of the audio signal is to be expanded is then generated and coded by the coding unit 12 (step 24).

[0038] Next, the multiplexing unit 13 multiplexes the input audio signal of the section with the speech speed control information (step 25). Transmission data is created from this multiplexed signal, or the multiplexed signal is recorded on a recording medium (step 26).

[0039] If, in step 22, the average power P is smaller than the predetermined threshold Th (P < Th), the section is determined to be a silent section, and the continuation count of silent sections is calculated (step 27). Speech speed control information indicating the continuation count of silent sections is then generated by the coding unit 12 (step 28).
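In the second operation example the editing side only labels each section, leaving the rate and deletion decisions to the reproducing side. A minimal sketch of that labeling follows, assuming the continuation count includes the current section and resets at each voice section; the `"voice"` marker and the name `encode_controls` are hypothetical.

```python
from typing import List, Sequence, Union

VOICE = "voice"  # hypothetical marker meaning "section to be expanded"


def encode_controls(voice_flags: Sequence[bool]) -> List[Union[str, int]]:
    """Label each section as VOICE or with its silent-section continuation count."""
    controls: List[Union[str, int]] = []
    silent_run = 0
    for is_voice in voice_flags:
        if is_voice:
            silent_run = 0
            controls.append(VOICE)       # steps 23 and 24
        else:
            silent_run += 1              # step 27
            controls.append(silent_run)  # step 28
    return controls
```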

[0040] Next, the multiplexing unit 13 multiplexes the input audio signal of the section with the speech speed control information (step 25). Transmission data is created from this multiplexed signal, or the multiplexed signal is recorded on a recording medium (step 26).

[0041] When the multiplexed signal generated in this way is reproduced on the reproducing device 2 side, the operator sets the playback speed (1x or 2x). The operator also sets a compression rate α corresponding to the set playback speed, and a duration Tdel used to determine deletion sections.

[0042] In sections of the input audio signal that the speech speed control information designates as sections to be expanded, the input audio signal is subjected to time-axis compression/expansion at the compression rate α set by the operator. In sections for which the speech speed control information specifies a continuation count of silent sections and that count is greater than or equal to the duration Tdel determined by the operator, the input audio signal is deleted. In sections for which the speech speed control information specifies a continuation count of silent sections and that count is less than the duration Tdel determined by the operator, the input audio signal is subjected to time-axis compression/expansion at the compression rate α set by the operator.
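The three cases in the paragraph above reduce to a single comparison per section. The sketch below assumes each section's control information is either a `"voice"` marker or an integer continuation count; that representation and the name `decide_actions` are illustrative assumptions, not a format given in the specification.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple, Union

VOICE = "voice"  # hypothetical marker for a section designated for expansion
Action = Tuple[str, Optional[Fraction]]  # ("scale", α) or ("delete", None)


def decide_actions(controls: Sequence[Union[str, int]], alpha: Fraction,
                   t_del: int) -> List[Action]:
    """Choose, per section, between time-scaling at α and deletion."""
    actions: List[Action] = []
    for control in controls:
        if isinstance(control, int) and control >= t_del:
            actions.append(("delete", None))   # long silence: delete the section
        else:
            actions.append(("scale", alpha))   # voice or short silence: scale at α
    return actions
```

Because Tdel is applied at playback rather than at editing time, the same recorded or transmitted signal can be played with different pause-removal settings.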

[0043] FIG. 7 shows a third operation example of the editing device 1. Here, the audio signal of a broadcast program with video and audio is assumed to be input. For convenience of explanation, two playback speeds are assumed to be selectable on the reproducing device 2 side: 1x playback and 2x playback. The speech speed control information multiplexed with the audio signal includes information for 1x playback and information for 2x playback.

[0044] First, the audio signal analysis unit 11 calculates the average power P of the input audio signal for each predetermined section of the input audio signal (step 31). Next, it is determined whether the average power P is greater than or equal to a predetermined threshold Th (step 32). If P ≥ Th, the section is determined to be a voice section and is judged to be a section in which the time length of the audio signal is to be expanded (step 33).

[0045] Next, the current expansion amount of the audio, that is, a value corresponding to the delay of the output signal relative to the input signal at playback, is calculated for each of the 1x and 2x playback speeds (step 34). A compression rate α corresponding to the expansion amount is then set for each of the 1x and 2x playback speeds (step 35).

[0046] The compression rate α for 2x playback is set, for example, within the range 1/2 ≤ α ≤ 1; the smaller the current expansion amount, the larger α is made. When α is 1/2, the typical compression rate for 2x playback, the output speech speed is twice the input speech speed, and when α is 1, the output speech speed equals the input speech speed. The compression rate α is therefore set within the range in which the output speech speed is at least 1 time and at most 2 times the input speech speed.

[0047] Conversely, the larger the current expansion amount, the more the compression rate α is set so that the output speech speed becomes faster. This is to prevent the delay of the output signal relative to the input signal from reaching or exceeding a predetermined time.
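One way to realize "smaller expansion amount, larger α" is a clamped linear mapping, sketched below. The linear shape, the `max_expansion` parameter, and the function name are assumptions for illustration; the specification states only the direction of the relationship and the allowed range of α.

```python
from fractions import Fraction


def alpha_for_expansion(expansion: Fraction, max_expansion: Fraction,
                        alpha_min: Fraction, alpha_max: Fraction) -> Fraction:
    """Map the current expansion amount to a compression rate.

    expansion == 0 gives alpha_max (slowest speech); expansion at or beyond
    max_expansion gives alpha_min (fastest speech), which limits the delay.
    """
    if max_expansion <= 0:
        raise ValueError("max_expansion must be positive")
    if alpha_min > alpha_max:
        raise ValueError("alpha_min must not exceed alpha_max")
    clamped = min(max(expansion, Fraction(0)), max_expansion)
    return alpha_max - (alpha_max - alpha_min) * clamped / max_expansion
```

For 2x playback the range would be `alpha_min = 1/2` and `alpha_max = 1`; with a hypothetical `max_expansion` of 2 seconds, an accumulated delay of 1 second yields α = 3/4.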

[0048] The compression rate α for 1x playback is set, for example, to a predetermined value within the range 1 ≤ α ≤ 3/2; the smaller the current expansion amount, the larger α is made. When α is 1 at 1x playback, the output speech speed equals the input speech speed. When α is 3/2, the output speech speed is 2/3 times the input speech speed. The compression rate α is therefore set within the range in which the output speech speed is at least 2/3 times and at most 1 time the input speech speed.

[0049] For the compression rate α for 1x playback as well, the larger the current expansion amount, the more α is set so that the output speech speed becomes faster. This is to prevent the delay of the output signal relative to the input signal from reaching or exceeding a predetermined time.

[0050] Next, speech speed control information consisting of the compression rates α set for the 1x and 2x playback speeds respectively is generated and coded by the coding unit 12 (step 36). The multiplexing unit 13 then multiplexes the input audio signal of the section with the speech speed control information (step 37). Transmission data is created from this multiplexed signal, or the multiplexed signal is recorded on a recording medium (step 38).

【００５１】上記ステップ３２において、パワー平均値
Ｐが所定のしきい値Ｔｈより小さいときには（Ｐ＜Ｔ
ｈ）、当該区間は無音区間であると判別され、無音区間
の継続数が算出される（ステップ３９）。そして、無音
区間の継続数が所定数Ｔｄｅｌ以上であるか否かが判別
される（ステップ４０）。無音区間の継続数が所定数Ｔ
ｄｅｌより少ないときには、音声信号の時間長さを伸長
する区間と判定される（ステップ４１）。In step 32, when the power average value P is smaller than the predetermined threshold Th (P < Th), the section is determined to be a silent section, and the number of consecutive silent sections is calculated (step 39). It is then determined whether the number of consecutive silent sections is equal to or greater than a predetermined number Tdel (step 40). When the number of consecutive silent sections is less than Tdel, the section is determined to be one in which the time length of the audio signal is to be extended (step 41).

【００５２】次に、上記ステップ３４と同様に、現時点
での音声の伸長量、すなわち、再生時の入力信号に対す
る出力信号の遅延時間に応じた値が、１倍速再生速度お
よび２倍再生速度ごとに算出される（ステップ４２）。
そして、上記ステップ３５と同様に、１倍速再生速度お
よび２倍再生速度こどに、伸長量に応じた圧縮率αが設
定される（ステップ４３）。Next, as in step 34, the current audio expansion amount, that is, a value corresponding to the delay time of the output signal with respect to the input signal during reproduction, is calculated for each of the 1× and 2× reproduction speeds (step 42). Then, as in step 35, a compression rate α corresponding to the expansion amount is set for each of the 1× and 2× reproduction speeds (step 43).

【００５３】そして、１倍速再生速度および２倍速再生
速度に応じてそれぞれ設定された圧縮率αからなる話速
制御情報が生成され、コード化部１２によってコード化
される（ステップ４４）。次に、多重化部１３によっ
て、当該区間の入力音声信号と話速制御情報とが多重化
される（ステップ３７）。そして、この多重化信号に基
づいて送信データが作成されるかまたはこの多重化信号
が録音メディアに記録される（ステップ３８）。Then, the voice speed control information including the compression rate α set according to the 1 × speed reproduction speed and the 2 × speed reproduction speed is generated and encoded by the encoding unit 12 (step 44). Next, the multiplexing unit 13 multiplexes the input voice signal and the voice speed control information in the section (step 37). Then, transmission data is created based on this multiplexed signal, or this multiplexed signal is recorded on the recording medium (step 38).

【００５４】上記ステップ４０において、無音区間の継
続数が所定数Ｔｄｅｌ以上であると判定されたときに
は、当該区間は削除すべき区間と判定される（ステップ
４５）。そして、当該区間の入力音声信号を削除区間と
する制御情報がコード化部１２によって生成される（ス
テップ４６）。When it is determined in step 40 that the number of consecutive silent sections is equal to or greater than the predetermined number Tdel, the section is determined to be a section to be deleted (step 45). The coding unit 12 then generates control information that marks the input audio signal of the section as a deletion section (step 46).
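
The section classification described in steps 32 and 39 to 46 can be sketched as follows. This is a simplified illustration under stated assumptions: the power average is taken here as the mean of squared samples, and the labels `"voice"`, `"extend"`, and `"delete"` as well as the function names are hypothetical.

```python
from typing import List, Sequence


def power_average(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one section (assumed definition of P)."""
    return sum(x * x for x in samples) / len(samples)


def classify_sections(powers: Sequence[float], th: float, tdel: int) -> List[str]:
    """Label each section from its power average P.

    P >= th                    -> "voice"
    P <  th, silent run < tdel -> "extend" (short silence, to be extended)
    P <  th, silent run >= tdel -> "delete" (long silence, to be deleted)
    """
    labels = []
    silent_run = 0  # number of consecutive silent sections so far
    for p in powers:
        if p >= th:
            silent_run = 0
            labels.append("voice")
        else:
            silent_run += 1
            labels.append("extend" if silent_run < tdel else "delete")
    return labels


print(classify_sections([5.0, 1.0, 1.0, 1.0, 6.0], th=2.0, tdel=3))
```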

【００５５】次に、多重化部１３によって、当該区間の
入力音声信号と話速制御情報とが多重化される（ステッ
プ３７）。そして、この多重化信号に基づいて送信デー
タが作成されるかまたはこの多重化信号が録音メディア
に記録される（ステップ３８）。Next, the multiplexer 13 multiplexes the input voice signal and the voice speed control information in the section (step 37). Then, transmission data is created based on this multiplexed signal, or this multiplexed signal is recorded on the recording medium (step 38).

【００５６】このようにして生成された多重化信号が再
生装置２側で再生される場合には、操作者によって再生
速度（１倍速または２倍速）が設定される。入力音声信
号のうち、設定された再生速度に対する圧縮率αが話速
制御情報として設定されている区間では、その圧縮率α
で入力音声信号が時間軸圧縮伸長処理された後、再生さ
れる。また、入力音声信号のうち、話速制御情報によっ
て削除される区間であると指定されている区間では、入
力音声信号が削除される。When the multiplexed signal generated in this way is reproduced on the reproducing device 2 side, the operator sets the reproduction speed (1× or 2×). In sections of the input audio signal for which a compression rate α for the selected reproduction speed is set as voice speed control information, the input audio signal is subjected to time-axis compression/expansion processing at that compression rate α and then reproduced. In sections designated for deletion by the voice speed control information, the input audio signal is deleted.
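
How the reproducing side might consume the per-section control information is sketched below. The `Section` record and its fields are hypothetical, and the sketch computes only the resulting output duration; the actual time-axis compression/expansion algorithm is not specified in this passage and is not implemented here.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Section:
    duration_s: float                # length of the input section
    alpha_by_speed: Dict[int, float] # e.g. {1: alpha for 1x, 2: alpha for 2x}
    delete: bool = False             # True if marked as a deletion section


def output_duration(sections: Sequence[Section], playback_speed: int) -> float:
    """Total output length for the reproduction speed chosen by the operator."""
    total = 0.0
    for section in sections:
        if section.delete:
            continue  # deletion sections contribute nothing to the output
        alpha = section.alpha_by_speed[playback_speed]
        total += section.duration_s * alpha  # duration scales by alpha
    return total


sections = [
    Section(1.0, {1: 1.5, 2: 0.75}),
    Section(0.5, {}, delete=True),
    Section(2.0, {1: 1.0, 2: 0.5}),
]
print(output_duration(sections, 1), output_duration(sections, 2))
```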

【００５７】この第３動作例においても図６の第２動作
例と同様に、無音区間と判別された場合には（ステップ
３２でＮＯ）、無音区間の継続長を話速制御情報として
生成するようにしてもよい。In this third operation example as well, as in the second operation example of FIG. 6, when a section is determined to be a silent section (NO in step 32), the duration of the silent section may be generated as voice speed control information.

【００５８】図８は、編集装置１の第４動作例を示して
いる。ここでは、映像と音声とを伴う放送番組の音声信
号が入力されているものとする。また、説明の便宜上、
再生装置２側で設定される再生速度としては、１倍速再
生と２倍速再生との２種があるとする。また、音声信号
に多重化される話速制御情報には、１倍速再生用と２倍
速再生用とがある。FIG. 8 shows a fourth operation example of the editing device 1. Here, it is assumed that the audio signal of a broadcast program containing video and audio is input. For convenience of explanation, it is also assumed that two reproduction speeds can be set on the reproducing device 2 side: 1× speed reproduction and 2× speed reproduction. The voice speed control information multiplexed with the audio signal includes information for 1× speed reproduction and information for 2× speed reproduction.

【００５９】まず、音声信号分析部１１によって入力音
声信号の所定区間毎に入力音声信号のパワー平均値Ｐが
算出される（ステップ５１）。次に、パワー平均値Ｐが
所定のしきい値Ｔｈ以上か否かが判別される（ステップ
５２）。First, the voice signal analyzer 11 calculates the power average value P of the input voice signal for each predetermined section of the input voice signal (step 51). Next, it is determined whether the power average value P is greater than or equal to a predetermined threshold Th (step 52).

【００６０】パワー平均値Ｐが所定のしきい値Ｔｈ以上
（Ｐ≧Ｔｈ）である場合には、当該区間は音声区間であ
ると判別される。そして、フラグＦの状態に基づいて、
前回の区間は継続数が所定数Ｔｄｅｌ以上の無音区間で
あったか否かが判別される（ステップ５３）。When the power average value P is equal to or greater than the predetermined threshold Th (P ≥ Th), the section is determined to be a voice section. Then, based on the state of the flag F, it is determined whether the previous section was a silent section whose continuation count was equal to or greater than the predetermined number Tdel (step 53).

【００６１】前回の区間が、継続数が所定数Ｔｄｅｌ以
上の無音区間でない場合（Ｆ＝０）には、圧縮率αが現
在設定されている圧縮率に対して単調減少するように設
定される（ステップ５４）。２倍速再生用の圧縮率α
は、たとえば、１／２≦α≦１の範囲内で設定される。
圧縮率αが２倍速再生時の一般的な圧縮率である１／２
である場合には、出力音声速度は入力音声速度の２倍と
なり、圧縮率αが１である場合には、出力音声速度は入
力音声速度の１倍となる。したがって、出力音声速度が
入力音声速度の１倍以上で２倍以下となる範囲内で、圧
縮率αが設定される。When the previous section is not a silent section whose continuation count is equal to or greater than the predetermined number Tdel (F = 0), the compression rate α is set so as to decrease monotonically from the currently set compression rate (step 54). The compression rate α for 2× speed reproduction is set, for example, within the range 1/2 ≤ α ≤ 1. When α is 1/2, the typical compression rate for 2× speed reproduction, the output voice speed is twice the input voice speed; when α is 1, the output voice speed is 1 times the input voice speed. Therefore, α is set within a range in which the output voice speed is at least 1 times and at most 2 times the input voice speed.

【００６２】ステップ５４で、圧縮率αが単調減少され
るということは、２倍速再生時において、出力音声速度
が入力音声速度の１倍以上で２倍以下となる範囲内で、
再生時の出力音声速度が速くなるように圧縮率αが設定
されることを意味する。これは、２倍速再生時におい
て、入力音声信号に対する出力音声信号の遅延時間が所
定時間以上になるのを防止するためである。The monotonic decrease of the compression rate α in step 54 means that, during 2× speed reproduction, α is set so that the output voice speed during reproduction increases, within the range in which the output voice speed is at least 1 times and at most 2 times the input voice speed. This prevents the delay time of the output audio signal relative to the input audio signal from reaching or exceeding a predetermined time during 2× speed reproduction.

【００６３】１倍速再生用の圧縮率αは、たとえば、１
≦α≦３／２の範囲内で設定される。圧縮率αが１であ
る場合には、出力音声速度は入力音声速度の１倍とな
る。圧縮率αが３／２である場合には、出力音声速度は
入力音声速度の２／３倍となる。したがって、出力音声
速度が入力音声速度の２／３倍以上で１倍以下となる範
囲内で、圧縮率αが設定される。The compression rate α for 1× speed reproduction is set, for example, within the range 1 ≤ α ≤ 3/2. When α is 1, the output voice speed is 1 times the input voice speed. When α is 3/2, the output voice speed is 2/3 times the input voice speed. Therefore, α is set within a range in which the output voice speed is at least 2/3 times and at most 1 times the input voice speed.

【００６４】ステップ５４で、圧縮率αが単調減少され
るということは、１倍速再生時においては、出力音声速
度が入力音声速度の２／３倍以上で１倍以下となる範囲
内で、再生時の出力音声速度が速くなるように圧縮率α
が設定されることを意味する。これは、１倍速再生時に
おいて、入力音声信号に対する出力音声信号の遅延時間
が所定時間以上になるのを防止するためである。The monotonic decrease of the compression rate α in step 54 means that, during 1× speed reproduction, α is set so that the output voice speed during reproduction increases, within the range in which the output voice speed is at least 2/3 times and at most 1 times the input voice speed. This prevents the delay time of the output audio signal relative to the input audio signal from reaching or exceeding a predetermined time during 1× speed reproduction.

【００６５】次に、１倍速再生速度および２倍速再生速
度に応じてそれぞれ設定された圧縮設定αからなる話速
制御情報が生成され、コード化部１２によってコード化
される（ステップ５５）。次に、多重化部１３によっ
て、当該区間の入力音声信号と話速制御情報とが多重化
される（ステップ５６）。そして、この多重化信号に基
づいて送信データが作成されるかまたはこの多重化信号
が録音メディアに記録される（ステップ５７）。Next, the speech speed control information consisting of the compression setting α set in accordance with the 1 × speed reproduction speed and the 2 × speed reproduction speed is generated and coded by the coding unit 12 (step 55). Next, the multiplexer 13 multiplexes the input voice signal and the voice speed control information in the section (step 56). Then, transmission data is created based on this multiplexed signal or this multiplexed signal is recorded on the recording medium (step 57).

【００６６】上記ステップ５３において、前回の区間
が、継続数が所定数Ｔｄｅｌ以上の無音区間である（Ｆ
＝１）と判別された場合には、フラグＦがリセット（Ｆ
＝０）にされる（ステップ５８）。そして、圧縮率αが
現在設定されている圧縮率に対して大きくなるように設
定される（ステップ５９）。つまり、２倍速再生用の圧
縮率αが１／２≦α≦１の範囲内で設定される場合に
は、２倍速再生時において、出力音声速度が入力音声速
度の１倍以上で２倍以下となる範囲内で、再生時の出力
音声速度が遅くなるように２倍速再生用の圧縮率αが設
定される。また、１倍速再生用の圧縮率αが１≦α≦３
／２の範囲内で設定される場合には、１倍速再生時にお
いて、出力音声速度が入力音声速度の２／３倍以上で１
倍以下となる範囲内で、再生時の出力音声速度が遅くな
るように１倍速再生用の圧縮率αが設定される。When it is determined in step 53 that the previous section was a silent section whose continuation count was equal to or greater than the predetermined number Tdel (F = 1), the flag F is reset (F = 0) (step 58). The compression rate α is then set to be larger than the currently set compression rate (step 59). That is, when the compression rate α for 2× speed reproduction is set within the range 1/2 ≤ α ≤ 1, it is set so that the output voice speed during reproduction decreases, within the range in which the output voice speed is at least 1 times and at most 2 times the input voice speed during 2× speed reproduction. Likewise, when the compression rate α for 1× speed reproduction is set within the range 1 ≤ α ≤ 3/2, it is set so that the output voice speed during reproduction decreases, within the range in which the output voice speed is at least 2/3 times and at most 1 times the input voice speed during 1× speed reproduction.
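
Steps 54 and 59 adjust the compression rate in opposite directions while keeping it inside the range for the selected reproduction speed. The sketch below illustrates one way to do this; the fixed `step` size, the clamping behavior, and the name `next_alpha` are assumptions, since the text does not say by how much α changes.

```python
def next_alpha(current: float, lo: float, hi: float,
               after_long_silence: bool, step: float = 0.05) -> float:
    """Update the compression rate for one voice section.

    after_long_silence=False: decrease alpha (step 54, faster output).
    after_long_silence=True:  increase alpha (step 59, slower output).
    The result is clamped to [lo, hi], so it stays constant once a
    bound is reached (an assumption of this sketch).
    """
    candidate = current + step if after_long_silence else current - step
    return min(max(candidate, lo), hi)


# 2x speed reproduction range from the text: 1/2 <= alpha <= 1
alpha = 1.0
for _ in range(3):
    alpha = next_alpha(alpha, 0.5, 1.0, after_long_silence=False, step=0.25)
    print(alpha)
print(next_alpha(alpha, 0.5, 1.0, after_long_silence=True, step=0.25))
```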

【００６７】そして、１倍速再生速度および２倍速再生
速度に応じてそれぞれ設定された圧縮設定αからなる話
速制御情報が生成され、コード化部１２によってコード
化される（ステップ５５）。次に、多重化部１３によっ
て、当該区間の入力音声信号と話速制御情報とが多重化
される（ステップ５６）。そして、この多重化信号に基
づいて送信データが作成されるかまたはこの多重化信号
が録音メディアに記録される（ステップ５７）。Then, the voice speed control information including the compression setting α set according to the 1 × speed reproduction speed and the 2 × speed reproduction speed is generated and coded by the coding unit 12 (step 55). Next, the multiplexer 13 multiplexes the input voice signal and the voice speed control information in the section (step 56). Then, transmission data is created based on this multiplexed signal or this multiplexed signal is recorded on the recording medium (step 57).

【００６８】上記ステップ５２において、パワー平均値
Ｐが所定のしきい値Ｔｈより小さいときには（Ｐ＜Ｔ
ｈ）、当該区間は無音区間であると判別され、無音区間
の継続数が算出される（ステップ６０）。そして、無音
区間の継続数が所定数Ｔｄｅｌ以上であるか否かが判別
される（ステップ６１）。無音区間の継続数が所定数Ｔ
ｄｅｌより少ないときには、フラグＦがセット（Ｆ＝
１）される（ステップ６２）。また、音声信号の時間長
さを伸長する区間と判定される（ステップ６３）。In step 52, when the power average value P is smaller than the predetermined threshold Th (P < Th), the section is determined to be a silent section, and the number of consecutive silent sections is calculated (step 60). It is then determined whether the number of consecutive silent sections is equal to or greater than the predetermined number Tdel (step 61). When the number of consecutive silent sections is less than Tdel, the flag F is set (F = 1) (step 62), and the section is determined to be one in which the time length of the audio signal is to be extended (step 63).

【００６９】次に、上記ステップ５４と同様に、１倍速
再生用および２倍速再生用の圧縮率αが現在設定されて
いる対応する圧縮率に対して単調減少するように設定さ
れる（ステップ６４）。そして、１倍速再生速度および
２倍速再生速度に応じてそれぞれ設定された圧縮設定α
からなる話速制御情報が生成され、コード化部１２によ
ってコード化される（ステップ６５）。次に、多重化部
１３によって、当該区間の入力音声信号と話速制御情報
とが多重化される（ステップ５６）。そして、この多重
化信号に基づいて送信データが作成されるかまたはこの
多重化信号が録音メディアに記録される（ステップ５
７）。Next, as in step 54, the compression rates α for 1× and 2× speed reproduction are each set so as to decrease monotonically from the corresponding currently set compression rate (step 64). Voice speed control information consisting of the compression settings α set for the 1× and 2× reproduction speeds is then generated and coded by the coding unit 12 (step 65). Next, the multiplexing unit 13 multiplexes the input voice signal of the section with the voice speed control information (step 56). Transmission data is then created from this multiplexed signal, or the multiplexed signal is recorded on a recording medium (step 57).

【００７０】上記ステップ６１において、無音区間の継
続数が所定数Ｔｄｅｌ以上であると判定されたときに
は、当該区間は削除すべき区間と判定される（ステップ
６６）。そして、当該区間の入力音声信号を削除区間と
する制御情報が生成され、コード化部１２によってコー
ド化される（ステップ６７）。When it is determined in step 61 that the number of continuous silent sections is equal to or greater than the predetermined number Tdel, the section is determined to be a section to be deleted (step 66). Then, the control information in which the input voice signal of the section is set as the deletion section is generated and coded by the coding unit 12 (step 67).

【００７１】次に、多重化部１３によって、当該区間の
入力音声信号と話速制御情報とが多重化される（ステッ
プ５６）。そして、この多重化信号に基づいて送信デー
タが作成されるかまたはこの多重化信号が録音メディア
に記録される（ステップ５７）。Next, the multiplexer 13 multiplexes the input voice signal and the voice speed control information in the section (step 56). Then, transmission data is created based on this multiplexed signal or this multiplexed signal is recorded on the recording medium (step 57).

【００７２】このようにして生成された多重化信号が再
生装置２側で再生される場合には、操作者によって再生
速度（１倍速または２倍速）が設定される。入力音声信
号のうち、設定された再生速度に対する圧縮率αが話速
制御情報として設定されている区間では、その圧縮率α
で入力音声信号が時間軸圧縮伸長処理された後、再生さ
れる。また、入力音声信号のうち、話速制御情報によっ
て削除される区間であると指定されている区間では、入
力音声信号が削除される。When the multiplexed signal generated in this way is reproduced on the reproducing device 2 side, the operator sets the reproduction speed (1× or 2×). In sections of the input audio signal for which a compression rate α for the selected reproduction speed is set as voice speed control information, the input audio signal is subjected to time-axis compression/expansion processing at that compression rate α and then reproduced. In sections designated for deletion by the voice speed control information, the input audio signal is deleted.

【００７３】この第４動作例においても図６の第２動作
例と同様に、無音区間と判別された場合には（ステップ
５２でＮＯ）、無音区間の継続長を話速制御情報として
生成するようにしてもよい。In this fourth operation example as well, as in the second operation example of FIG. 6, when a section is determined to be a silent section (NO in step 52), the duration of the silent section may be generated as voice speed control information.

【００７４】また、図９に示すように、所定数以上の無
音区間に挟まれた音声の１文章の長さを測定し、１文章
の長さに応じて圧縮率αを単調減少させるようにしても
よい。つまり、文頭は圧縮率αが大きく（出力音声速度
を遅く）なり、文末にいくに従って圧縮率αが小さく
（出力音声速度を速く）なるように、話速制御情報を作
成してもよい。Further, as shown in FIG. 9, the length of one sentence of speech bounded by silent sections of a predetermined number or more may be measured, and the compression rate α may be decreased monotonically according to the length of the sentence. That is, the voice speed control information may be created so that α is large (slow output voice speed) at the beginning of a sentence and becomes smaller (faster output voice speed) toward the end of the sentence.
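
A per-sentence schedule of this kind could look like the sketch below, which assigns a compression rate to each section of one sentence. The linear ramp and the function name `sentence_alphas` are illustrative assumptions; the text requires only that α decrease monotonically from the beginning of the sentence to its end.

```python
from typing import List


def sentence_alphas(n_sections: int, alpha_start: float,
                    alpha_end: float) -> List[float]:
    """Compression rates for the sections of one sentence.

    Starts at alpha_start (slow output) and ramps down linearly to
    alpha_end (fast output) at the last section.
    """
    if n_sections <= 0:
        return []
    if n_sections == 1:
        return [alpha_start]
    step = (alpha_start - alpha_end) / (n_sections - 1)
    return [alpha_start - i * step for i in range(n_sections)]


# 1x speed reproduction range from the text: 1 <= alpha <= 3/2
print(sentence_alphas(3, 1.5, 1.0))
```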

【００７５】また、入力音声のピッチが急激に上昇した
地点では圧縮率αが大きく（出力音声速度を遅く）な
り、その後入力音声のピッチが下降するにしたがって、
圧縮率αが小さく（出力音声速度を速く）なるように、
話速制御情報を作成してもよい。The voice speed control information may also be created so that the compression rate α becomes large (slow output voice speed) at a point where the pitch of the input voice rises sharply, and then becomes smaller (faster output voice speed) as the pitch of the input voice subsequently falls.

【００７６】また、入力音声の発声速度を検出し、発声
速度に応じて圧縮率αを決定するようにしてもよい。す
なわち、発声速度が速い場合には、圧縮率αが大きく
（出力音声速度を遅く）なり、発声速度が遅い場合に
は、圧縮率αが小さく（出力音声速度を速く）なるよう
に、話速制御情報を作成してもよい。発声速度の検出方
法としては、単位時間当りの母音の数または母音の継続
長を検出する方法、単位時間当りの音声区間と無音区間
の割合を検出する方法等がある。Further, the utterance speed of the input voice may be detected and the compression rate α determined according to the utterance speed. That is, the voice speed control information may be created so that α is large (slow output voice speed) when the utterance speed is high and small (fast output voice speed) when the utterance speed is low. Methods of detecting the utterance speed include detecting the number of vowels per unit time or the duration of vowels, and detecting the ratio of voice sections to silent sections per unit time.

【００７７】上記各実施例では、入力音声を音声区間と
無音区間とに区別しているが、入力音声を有声区間、無
声区間および無音区間に区別するようにしてもよい。こ
れらの判別は、パワー平均値により、まず、有声区間
と、無声区間および無音区間とを区別する。次に、無声
区間と無音区間との判別は、音声信号の零交差数（零レ
ベルとの単位時間当りの交差数）を計算し、零交差数が
所定数以上の区間を無声区間と判別し、零交差数が所定
数未満の区間を無音区間と判別する。有声区間に対して
は音声信号の時間長さを伸長する区間とし、所定長以上
の無音区間を削除区間とし、無声区間は処理しない区間
として、話速制御情報を作成する。また、有声区間の場
合には、自己相関法を用いて音声のピッチを抽出し、こ
のピッチ抽出情報を話速制御情報として作成してもよ
い。In each of the above embodiments, the input voice is divided into the voice section and the silent section, but the input voice may be divided into the voiced section, the unvoiced section and the silent section. In these determinations, the voiced section is first distinguished from the unvoiced section and the silent section based on the power average value. Next, the unvoiced section and the unvoiced section are distinguished by calculating the number of zero crossings of the voice signal (the number of crossings with the zero level per unit time), and distinguishing the section in which the number of zero crossings is a predetermined number or more as the unvoiced section. , A section in which the number of zero crossings is less than a predetermined number is determined as a silent section. The voice speed control information is created for the voiced section as a section in which the time length of the voice signal is extended, a silent section having a predetermined length or more as a deleted section, and the unvoiced section as an unprocessed section. In the case of a voiced section, the pitch of the voice may be extracted by using the autocorrelation method, and this pitch extraction information may be created as the speech speed control information.
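
The two-stage decision described above (power average first, then zero-crossing count) can be sketched as follows. The thresholds `power_th` and `zc_th` and the function names are hypothetical, and the zero-crossing count is taken per fixed-length section rather than per unit time, which is equivalent only when all sections have the same length.

```python
from typing import Sequence


def zero_crossings(samples: Sequence[float]) -> int:
    """Count sign changes between consecutive samples."""
    count = 0
    for a, b in zip(samples, samples[1:]):
        if (a < 0) != (b < 0):
            count += 1
    return count


def classify_section(samples: Sequence[float], power_th: float, zc_th: int) -> str:
    """Return "voiced", "unvoiced", or "silent" for one section."""
    power = sum(x * x for x in samples) / len(samples)
    if power >= power_th:
        return "voiced"  # separated from the rest by power average
    # Low power: many zero crossings -> unvoiced, few -> silent
    return "unvoiced" if zero_crossings(samples) >= zc_th else "silent"


print(classify_section([1.0, -1.0] * 4, power_th=0.5, zc_th=3))
print(classify_section([0.1, -0.1] * 4, power_th=0.5, zc_th=3))
print(classify_section([0.01] * 8, power_th=0.5, zc_th=3))
```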

【００７８】また、音声信号が音楽である場合には、再
生装置側で圧縮処理が行なわれないようにするために、
話速制御情報として音楽であることを示す情報を入れる
ようにしてもよい。音楽であるか否かを判別する方法と
しては、入力音声信号を周波数分析し、高周波数域（４
ＫＨｚ以上）の信号成分と、低周波域（４ＫＨｚ未満）
の信号成分との割合を算出し、高周波数域成分が大きい
場合は音楽として判断する方法がある。When the audio signal is music, information indicating that it is music may be included in the voice speed control information so that compression processing is not performed on the reproducing device side. One method of determining whether the signal is music is to perform frequency analysis on the input audio signal, calculate the ratio between the signal components in the high-frequency range (4 kHz or higher) and those in the low-frequency range (below 4 kHz), and judge the signal to be music when the high-frequency component is large.
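
The band-ratio test described above might be sketched as below, using a direct discrete Fourier transform on a short frame. Only the 4 kHz split comes from the text; the decision threshold `ratio_th`, the frame length, and the function names are assumptions, and a practical detector would likely need longer frames and a tuned threshold.

```python
import cmath
import math
from typing import Sequence, Tuple


def band_energies(samples: Sequence[float], fs: float,
                  split_hz: float = 4000.0) -> Tuple[float, float]:
    """Return (low_band_energy, high_band_energy) split at split_hz."""
    n = len(samples)
    low = high = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        energy = abs(coeff) ** 2
        if k * fs / n >= split_hz:
            high += energy
        else:
            low += energy
    return low, high


def looks_like_music(samples: Sequence[float], fs: float,
                     ratio_th: float = 1.0) -> bool:
    """Judge as music when high-band energy exceeds ratio_th * low-band energy."""
    low, high = band_energies(samples, fs)
    return high > ratio_th * low


fs = 16000.0
tone_6k = [math.sin(2 * math.pi * 6000 * i / fs) for i in range(64)]
tone_1k = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(64)]
print(looks_like_music(tone_6k, fs), looks_like_music(tone_1k, fs))
```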

【００７９】また、技術情報番組等においては、音声信
号を分析することによって所定の技術用語のフレーズを
検出し、検出した区間に所定のキーワードを話速制御情
報として付加しておき、再生時において操作者が上記キ
ーワードを再生装置に設定した場合には、キーワードに
対応する区間（上記技術用語のフレーズ）をゆっくり再
生するようにしてもよい。Further, in a technical information program or the like, phrases containing predetermined technical terms may be detected by analyzing the audio signal, and a predetermined keyword may be added to each detected section as voice speed control information. When the operator sets that keyword on the reproducing device at the time of reproduction, the sections corresponding to the keyword (the phrases containing the technical terms) may be reproduced slowly.

【００８０】なお、上記実施例では、編集装置において
音声信号と話速制御情報とが多重化されて出力されてい
るが、音声信号と話速制御情報とを切り替え伝送方式に
よって出力するようにしてもよい。In the above embodiments, the editing device multiplexes the audio signal and the voice speed control information and outputs the result. However, the audio signal and the voice speed control information may instead be output by a switching transmission method.

【００８１】また、再生装置側において、１倍速再生、
２倍速再生の他、３倍速再生等が可能な場合にも、圧縮
率αを全ての再生倍率速度ごとに設定して、話速制御情
報とし音声信号に付加することが好ましい。When the reproducing device also supports 3× speed reproduction or the like in addition to 1× and 2× speed reproduction, it is preferable to set a compression rate α for every reproduction speed and add these to the audio signal as voice speed control information.

【００８２】[0082]

【発明の効果】この発明によれば、予め送信データ、録
音メディア等に話速を制御するための情報を入れてお
き、受信側または再生側において、制御情報に基づいて
話速を制御できる話速変換システムが得られる。According to the present invention, a speech speed conversion system is obtained in which information for controlling the speech speed is stored in advance in transmission data, recording media, or the like, and the speech speed can be controlled on the receiving or reproducing side based on that control information.

[Brief description of drawings]

【図１】話速制御システムの構成を示すブロック図であ
る。FIG. 1 is a block diagram showing a configuration of a speech speed control system.

【図２】編集装置の構成を示すブロック図である。FIG. 2 is a block diagram showing a configuration of an editing device.

【図３】再生装置の構成を示すブロック図である。FIG. 3 is a block diagram showing a configuration of a playback device.

【図４】再生装置の話速制御部の構成を示すブロック図
である。FIG. 4 is a block diagram showing a configuration of a speech speed control unit of the playback device.

【図５】編集装置の第１動作例を示すフローチャートで
ある。FIG. 5 is a flowchart showing a first operation example of the editing apparatus.

【図６】編集装置の第２動作例を示すフローチャートで
ある。FIG. 6 is a flowchart showing a second operation example of the editing apparatus.

【図７】編集装置の第３動作例を示すフローチャートで
ある。FIG. 7 is a flowchart showing a third operation example of the editing apparatus.

【図８】編集装置の第４動作例を示すフローチャートで
ある。FIG. 8 is a flowchart showing a fourth operation example of the editing apparatus.

【図９】所定長以上の無音区間に挟まれた１文章の区間
を示す模式図である。FIG. 9 is a schematic diagram showing a section of one sentence sandwiched between silent sections of a predetermined length or more.

[Explanation of symbols]

１編集装置２再生装置１１音声信号分析部１２話速制御情報コード化部１３多重化部２１復調部２２話速制御部３１話速制御情報解析部３２制御情報同期化部３３話速変換部
1 Editing device
2 Playback device
11 Voice signal analysis unit
12 Speech rate control information coding unit
13 Multiplexing unit
21 Demodulation unit
22 Speech rate control unit
31 Speech rate control information analysis unit
32 Control information synchronization unit
33 Speech rate conversion unit

Claims

[Claims]

1. A speech speed control system comprising: an editing device that generates a signal in which voice speed control information is added to a voice signal; and a reproducing device that separates the signal generated by the editing device into the voice signal and the voice speed control information and controls the speech speed of the voice signal according to the voice speed control information.

2. The voice speed control system according to claim 1, wherein the editing device generates a signal in which the voice speed control information is added to the voice signal and records the signal on a recording medium, and the reproducing device reads, from the recording medium, the signal in which the voice speed control information is added to the voice signal, separates the voice signal and the voice speed control information from each other, and controls the voice speed of the voice signal according to the voice speed control information.

3. The voice speed control system according to claim 1, wherein the editing device generates a signal in which the voice speed control information is added to the voice signal and creates transmission data from it, and the reproducing device receives the transmission data, separates the voice signal and the voice speed control information from the received transmission data, and controls the voice speed of the voice signal according to the voice speed control information.

4. The voice speed control system according to claim 1, wherein the voice speed control information is added to the voice signal by a time division multiplexing or a frequency division multiplexing method.

5. The voice speed control system according to claim 1, wherein the voice speed control information is added to the voice signal by the switching transmission method.

6. The speech speed control system according to any one of claims 2, 3, 4 and 5, wherein the reproducing device includes a speech speed conversion unit having processing means for time-axis compression/expansion processing of the audio signal and processing means for deleting the audio signal, and the speech speed control information includes at least information for performing time-axis compression/expansion processing on the audio signal and information for deleting the audio signal.

7. The speech speed control system according to claim 1, wherein the reproducing device comprises a speech speed conversion unit having processing means for time-axis compression/expansion processing of an audio signal and processing means for deleting an audio signal, and the speech speed control information includes at least information indicating that a section is a voice section and information indicating the number of consecutive silent sections.

8. An editing apparatus comprising means for analyzing a voice signal to generate voice speed control information, and means for adding the generated voice speed control information to the voice signal.

9. A reproducing apparatus comprising means for separating a voice signal and voice speed control information from a voice signal to which voice speed control information is added, and means for controlling the voice speed of the voice signal according to the voice speed control information.