JP4765460B2

JP4765460B2 - Speech coding apparatus and speech coding method

Info

Publication number: JP4765460B2
Application number: JP2005213676A
Authority: JP
Inventors: 達也鈴木
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2005-07-25
Filing date: 2005-07-25
Publication date: 2011-09-07
Anticipated expiration: 2025-07-25
Also published as: JP2007033585A

Description

The present invention relates to a speech encoding apparatus and a speech encoding method for re-encoding encoded audio data such as MP3 or AAC data.

In recent years, audio encoding (compression) technology, which compresses digitized audio data by discarding information that human hearing cannot perceive, has advanced to the point where sound quality comparable to that of the original is obtained even when the data is compressed to one tenth of its original size or less. This technology has come into wide use in audio players based on optical discs, hard disks, semiconductor memory, and the like.

With such an audio player, content is typically stored on a personal computer after its data size has been reduced with the encoding technology described above, and is then transferred to the audio player or to a recording medium for use.

Encoded audio data is generally created from compact discs or other media owned by the user, but in some cases, as with music distribution, the user obtains data that has already been encoded by a particular encoding method. At present, the encoding method or the encoding rate of such encoded audio data does not necessarily match the capabilities of the user's audio player.

To address this, some software that manages encoded audio data on a personal computer provides a re-encoding function that converts the data to an encoding method desired by the user or to one suited to the capabilities of the audio player. Such a technique is disclosed, for example, in Patent Document 1. The conventional speech encoding apparatus is described below.

In the conventional speech encoding apparatus, a decoding unit decodes audio data encoded by a first encoding method and stores the decoded data in a temporary storage unit. An encoding unit then encodes the audio data by a second encoding method. Once a predetermined amount of data has accumulated in the temporary storage unit, the encoding unit reads the data from it, encodes the data by the second encoding method, outputs the result to the temporary storage unit, and then outputs it to a recording medium. This is repeated for as many pieces of content as required.

The decoding unit switches its processing as appropriate to match the encoding method of the source data, and the encoding unit switches its processing as appropriate to match an encoding method suited to the device that will use the converted audio data.

As described above, re-encoding encoded audio data to suit the device that will use it and the intended purpose greatly widens the range of uses for that audio data.
Patent Document 1: JP 2005-12277 A

Audio encoding methods in common use today, such as MP3 and AAC, cut a fixed amount of data out of the audio data and process it after converting it from the time domain to the frequency domain with the MDCT (modified discrete cosine transform). This unit of processing is called a frame. During encoding, the data is cut out with a window function that overlaps each frame with its neighbors by a certain proportion. Conversely, during decoding, the overlapped portions are added together to restore the original audio data. Frames are overlapped with their neighbors in order to reduce the distortion between frames caused by compression. Consequently, restoring a given frame of encoded audio data requires information from the adjacent frames.
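The dependence on adjacent frames can be illustrated with a minimal sketch of a windowed MDCT with overlap-add. The sine window, the frame size `N_HALF`, the test signal, and all function names are assumptions chosen for illustration and are not taken from this document; actual MP3 and AAC codecs add quantization, filter banks, and block switching that are not modeled here. Samples covered by two overlapping frames are restored, whereas the leading and trailing half frames, which have only one frame covering them, are not.

```python
import math

def sine_window(n_half):
    """Sine window of length 2*n_half; satisfies w[n]^2 + w[n + n_half]^2 = 1."""
    size = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]

def _basis(n, k, n_half):
    """Cosine basis shared by the forward and inverse transforms."""
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))

def mdct(frame, n_half):
    """2*n_half windowed samples -> n_half coefficients."""
    return [sum(frame[n] * _basis(n, k, n_half) for n in range(2 * n_half))
            for k in range(n_half)]

def imdct(coeffs, n_half):
    """n_half coefficients -> 2*n_half samples that still contain aliasing."""
    return [2.0 / n_half * sum(coeffs[k] * _basis(n, k, n_half) for k in range(n_half))
            for n in range(2 * n_half)]

def encode(signal, n_half):
    """Cut overlapping frames (hop = n_half), window them, and transform."""
    w = sine_window(n_half)
    return [mdct([signal[s + n] * w[n] for n in range(2 * n_half)], n_half)
            for s in range(0, len(signal) - 2 * n_half + 1, n_half)]

def decode(frames, n_half):
    """Inverse-transform each frame, window again, and overlap-add."""
    w = sine_window(n_half)
    out = [0.0] * (n_half * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        block = imdct(coeffs, n_half)
        for n in range(2 * n_half):
            out[i * n_half + n] += block[n] * w[n]  # aliasing cancels where frames overlap
    return out

N_HALF = 8  # illustrative hop size
pcm = [0.5 + math.sin(0.3 * i) for i in range(4 * N_HALF)]
restored = decode(encode(pcm, N_HALF), N_HALF)
errors = [abs(a - b) for a, b in zip(pcm, restored)]
interior_err = max(errors[N_HALF:-N_HALF])           # covered by two frames
edge_err = max(errors[:N_HALF] + errors[-N_HALF:])   # covered by one frame only
print(f"interior error {interior_err:.1e}, edge error {edge_err:.1e}")
```

In this sketch the interior error is at the level of floating-point rounding, while the edge error is large, because the aliasing in the first and last half frames has no neighboring frame to cancel it.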

When encoded audio data is created from audio data on a compact disc or similar source, it is usual to store the encoded audio data as one file per track. Some music content on compact discs is divided into tracks but contains audio that runs continuously across several tracks. Even when such content is encoded and stored as one file per track, the audio of these tracks must be played back continuously, without interruption. At encoding time, therefore, frames must be overlapped with their neighbors even at a track division point, and the data must be divided only after it has been encoded. Likewise, at decoding time, the frames before and after the track division point must be placed next to each other, and the decoded audio data must be overlap-added to restore continuous audio data.

With the conventional speech encoding apparatus, however, the first and last frames of each piece of content have no adjacent frame at decoding time, so these frames cannot be restored correctly. When the result is encoded again, the original frames cannot be recovered even if the re-encoded data is overlapped with adjacent frames, and an audible gap occurs.

An object of the present invention is to provide a speech encoding apparatus that performs re-encoding in such a way that no audible gap occurs during continuous playback, even when the data being re-encoded is encoded audio data whose sound should connect seamlessly when played back in succession.

To achieve the above object, a speech encoding apparatus according to a first aspect of the present invention comprises: decoding means for continuously decoding a plurality of pieces of encoded audio data, which have been encoded by a first encoding method and divided by content, to generate a single piece of continuous audio data; and encoding means for encoding the continuous audio data generated by the decoding means by a second encoding method and for dividing the encoded audio data by content again.

A speech encoding apparatus according to a second aspect of the present invention is the apparatus of the first aspect, wherein, when dividing the audio data encoded by the second encoding method by content, the encoding means divides it at an encoded-frame boundary of the second encoding method that is close to the division point that existed before decoding by the decoding means.

According to the speech encoding apparatus of the present invention, decoding and encoding can be performed continuously even when encoded audio data is re-encoded, so that re-encoded content is obtained in which no audible gap occurs during continuous playback.

Embodiments of the present invention are described below with reference to the drawings.

(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of a speech encoding apparatus according to Embodiment 1 of the present invention. First, the configuration of the speech encoding apparatus of this embodiment is described with reference to FIG. 1.

As shown in FIG. 1, the speech encoding apparatus according to Embodiment 1 comprises: a first recording medium 10 that stores a plurality of pieces of encoded audio data encoded by a first encoding method; a first temporary storage unit 11 that temporarily holds encoded audio data read sequentially from the first recording medium 10; a decoding unit 12 that decodes the encoded audio data accumulated in the first temporary storage unit 11; a second temporary storage unit 13 that temporarily holds the audio data decoded by the decoding unit 12; an encoding unit 14 that encodes the decoded audio data accumulated in the second temporary storage unit 13 by a second encoding method; a third temporary storage unit 15 that temporarily holds the audio data encoded by the encoding unit 14; a second recording medium 16 that stores the re-encoded audio data; and a control unit 17 that controls the apparatus as a whole.

Next, the operation is described. The first recording medium 10 stores a plurality of encoded audio data files encoded by the first encoding method. The playback order of these files is specified according to some rule. The control unit 17 reads encoded audio data from the first recording medium 10 in this playback order and stores it sequentially in the first temporary storage unit 11. Once a predetermined amount of audio data has accumulated in the first temporary storage unit 11, the control unit 17 instructs the decoding unit 12 to start decoding. After decoding has started, whenever the amount of audio data in the first temporary storage unit 11 falls below a predetermined amount, encoded audio data is again read from the first recording medium 10 and appended to the first temporary storage unit 11. This is repeated until the end of the encoded audio data file. When all the data of one file has been read, the next encoded audio data file in the playback order is opened, and its audio data is read and stored in the first temporary storage unit 11 immediately after the data of the previous file. At this point, the control unit 17 records the timing of the file boundary.

The audio data decoded by the decoding unit 12 is stored sequentially in the second temporary storage unit 13. Because the decoding unit 12 decodes continuously, without regard to the file boundaries of the encoded audio data encoded by the first encoding method, the audio data stored in the second temporary storage unit 13 is continuous data. Once a predetermined amount of audio data has accumulated in the second temporary storage unit 13, the control unit 17 instructs the encoding unit 14 to start encoding. The audio data encoded by the encoding unit 14 is stored sequentially in the third temporary storage unit 15. The control unit 17 records the encoded audio data accumulated in the third temporary storage unit 15 sequentially onto the second recording medium 16. While recording onto the second recording medium 16, the control unit 17 divides the encoded audio data into files based on the file boundary timing that it recorded when the encoded audio data was stored in the first temporary storage unit 11.

As described above, when the speech encoding apparatus according to Embodiment 1 of the present invention re-encodes a plurality of encoded audio data files, it decodes them with the adjacent frames joined at each connection point between files, restoring them as continuous audio data, and then encodes that continuous audio data before dividing it into the original file units. As a result, no audible gap occurs even when the re-encoded audio data is played back continuously.
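The flow of Embodiment 1 can be sketched as follows, under the simplifying assumption that both encoding methods use the same frame length, so that a boundary remembered as a frame count remains valid after re-encoding. The function `reencode_gapless`, the stand-in codecs `toy_decode` and `toy_encode`, and the constant `FRAME` are hypothetical names introduced for illustration; the stand-ins merely flatten and re-chunk samples and do not model a real codec. The point of the sketch is that decoding and encoding are each invoked once over the joined stream, rather than once per file.

```python
def reencode_gapless(files, decode, encode):
    """Re-encode `files` (lists of encoded frames, in playback order) as one stream."""
    joined = []      # plays the role of the first temporary storage unit
    cut_frames = []  # file boundaries remembered by the control unit, in frames
    for frames in files:
        joined.extend(frames)
        cut_frames.append(len(joined))

    pcm = decode(joined)       # one continuous decode, no reset at file boundaries
    out_frames = encode(pcm)   # one continuous encode

    # Divide the re-encoded stream at the remembered boundaries.
    # Valid here only because equal frame lengths are assumed.
    result, start = [], 0
    for end in cut_frames:
        result.append(out_frames[start:end])
        start = end
    return result

FRAME = 4          # illustrative frame length in samples
decode_calls = []  # used only to show that decoding happens once

def toy_decode(frames):
    """Stand-in decoder: concatenates the samples of all frames."""
    decode_calls.append(len(frames))
    return [sample for frame in frames for sample in frame]

def toy_encode(pcm):
    """Stand-in encoder: cuts the sample stream into fixed-size frames."""
    return [tuple(pcm[i:i + FRAME]) for i in range(0, len(pcm), FRAME)]

track1 = [(1, 2, 3, 4), (5, 6, 7, 8)]
track2 = [(9, 10, 11, 12)]
result = reencode_gapless([track1, track2], toy_decode, toy_encode)
print(result, decode_calls)
```

When the two encoding methods use different frame lengths, the division points have to be mapped to frame boundaries of the second method, which this sketch does not attempt.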

The unit of processing along the time axis of the audio data (called a frame) may differ between the first and second encoding methods. Processing in this case is described with reference to FIG. 2.

Because the frame is the smallest unit into which encoded audio data can be divided, care must be taken over the file division points when converting between encoding methods with different frame lengths. In the processing according to Embodiment 1 of the present invention, as shown in FIG. 2, the audio data of the plurality of files encoded by the first encoding method is first decoded into continuous audio data. This continuous audio data is then encoded continuously by the second encoding method. Finally, the data is divided into files at the frame boundary closest to each division point of the original files, and recorded. Although the division points differ from those of the original files, the data can thus be re-encoded as a plurality of encoded audio data files that can be played back continuously.

Whether to select the closest frame boundary overall, or the closest one before or after the original division point, may be decided as a design matter of the apparatus.
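The choice of division point can be sketched as a small helper that maps an original division point, expressed in samples, to a frame index of the second encoding method. The function name `split_frame_index`, the policy names, and the example figures are assumptions for illustration. The frame lengths of 1152 samples (typical for MP3) and 1024 samples (typical for AAC) are used only as an example; this document does not specify frame lengths.

```python
def split_frame_index(cut_sample, frame_len, policy="nearest"):
    """Return the index of the output frame boundary at which to divide the file.

    cut_sample: original division point, in samples from the start of the stream.
    frame_len:  frame length of the second encoding method, in samples.
    policy:     "nearest", "before", or "after" (a design choice of the apparatus).
    """
    if policy == "before":
        return cut_sample // frame_len
    if policy == "after":
        return -(-cut_sample // frame_len)  # ceiling division
    if policy == "nearest":
        return (cut_sample + frame_len // 2) // frame_len  # ties go to the later boundary
    raise ValueError(f"unknown policy: {policy}")

# Example: the first file holds 101 frames of 1152 samples,
# and the stream is re-encoded with 1024-sample frames.
cut = 101 * 1152
for policy in ("nearest", "before", "after"):
    index = split_frame_index(cut, 1024, policy)
    shift = index * 1024 - cut
    print(f"{policy}: frame {index}, shifted by {shift} samples")
```

With the "before" policy a few hundred samples from the end of the first track move to the start of the second file; with "after" the opposite happens. In either case the stream itself is unchanged, so continuous playback is unaffected.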

(Embodiment 2)
FIG. 3 is a block diagram showing the configuration of a speech encoding apparatus according to Embodiment 2 of the present invention. The configuration is the same as in Embodiment 1 of the present invention, except that the control unit 17 obtains, from the first recording medium 10, information associated with each encoded audio data file.

The operation is also the same as in Embodiment 1 of the present invention, so only the differences are described.

In Embodiment 2 of the present invention, the control unit 17 obtains, as the information associated with the encoded audio data files, information about continuity between different audio data files. This information may be flag information that indicates continuity between files directly, or information that indicates it indirectly, such as a group attribute of the content and the order within that group. In the latter case, for example, if the original audio data was created from a compact disc, each piece of content can be given attributes such as "album" and "track number". This information may be entered by the user or obtained from an external network such as the Internet. Pieces of audio data that have the same "album" attribute and consecutive "track number" attributes can then be judged to be continuous.

When re-encoding a plurality of encoded audio data files that it has judged to be continuous, the control unit 17 performs re-encoding by continuous decoding and continuous encoding, so that the continuity of the decoded audio is preserved. When re-encoding a plurality of encoded audio data files that it has judged not to be continuous, it re-encodes each encoded audio data file independently.

In this way, Embodiment 2 of the present invention makes it possible to apply re-encoding that preserves audio continuity only to those encoded audio data files that are meant to be played back continuously.
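The indirect continuity check based on the "album" and "track number" attributes can be sketched as follows. The dictionary keys `album` and `track`, the function name `group_continuous`, and the sample metadata are hypothetical; the document does not define a metadata format. Each resulting group would be re-encoded as one continuous stream, and a group containing a single file corresponds to independent re-encoding.

```python
def group_continuous(files):
    """Group files (in playback order) that are judged to be continuous.

    Two neighboring files are treated as continuous when they share the same
    album attribute and their track numbers are consecutive.
    """
    groups = []
    for item in files:
        previous = groups[-1][-1] if groups else None
        continuous = (
            previous is not None
            and item["album"] is not None
            and item["album"] == previous["album"]
            and item["track"] == previous["track"] + 1
        )
        if continuous:
            groups[-1].append(item)   # extend the current continuous run
        else:
            groups.append([item])     # start a new, independent run
    return groups

playlist = [
    {"album": "Live Concert", "track": 1},
    {"album": "Live Concert", "track": 2},
    {"album": "Live Concert", "track": 3},
    {"album": "Singles", "track": 1},
    {"album": "Live Concert", "track": 5},
]
group_sizes = [len(group) for group in group_continuous(playlist)]
print(group_sizes)
```

A direct continuity flag, which the text also allows, could replace the album and track comparison without changing the grouping logic.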

In each of the above embodiments, the first recording medium 10 and the second recording medium 16 may be separate recording media or the same recording medium. Likewise, the first to third temporary storage units 11, 13, and 15 may be configured as separate storage units or as separate areas of the same storage unit.

Part or all of the speech encoding apparatus in each of the above embodiments can of course be implemented as software executable on a computer comprising a CPU and memory. In that case, the present invention can also be realized as a speech encoding method having steps corresponding to the components shown in FIG. 1 or FIG. 3.

The speech encoding apparatus according to the present invention can be configured as a dedicated apparatus. It is also useful as software that manages audio data on a personal computer and re-encodes that data before outputting it to an external device that plays it back, and as a system, such as a music distribution server or terminal, that re-encodes audio data with the encoding method best suited to the user when selling the audio data.

FIG. 1: Block diagram showing the configuration of the speech encoding apparatus according to Embodiment 1 of the present invention
FIG. 2: Explanatory diagram showing an example of file division by the speech encoding apparatus according to Embodiment 1
FIG. 3: Block diagram showing the configuration of the speech encoding apparatus according to Embodiment 2 of the present invention

Explanation of symbols

10, 16 Recording medium
11, 13, 15 Temporary storage unit
12 Decoding unit
14 Encoding unit
17 Control unit

Claims

1. A speech encoding apparatus comprising: decoding means for continuously decoding a plurality of pieces of encoded audio data, which have been encoded by a first encoding method and divided by content, to generate a single piece of continuous audio data; and encoding means for encoding the continuous audio data generated by the decoding means by a second encoding method and for dividing the encoded audio data by content again.

2. The speech encoding apparatus according to claim 1, wherein, when dividing the audio data encoded by the second encoding method by content, the encoding means divides it at an encoded-frame boundary of the second encoding method that is close to the division point that existed before decoding by the decoding means.

3. A speech encoding method comprising: a decoding step of continuously decoding a plurality of pieces of encoded audio data, which have been encoded by a first encoding method and divided by content, to generate a single piece of continuous audio data; and an encoding step of encoding the continuous audio data generated in the decoding step by a second encoding method and of dividing the encoded audio data by content again.

4. The speech encoding method according to claim 3, wherein, in the encoding step, when the audio data encoded by the second encoding method is divided by content, it is divided at an encoded-frame boundary of the second encoding method that is close to the division point that existed before decoding in the decoding step.