JP4520922B2

JP4520922B2 - Data format determination method, apparatus, program, and recording medium

Info

Publication number: JP4520922B2
Application number: JP2005266870A
Authority: JP
Inventors: 健弘守谷; 登原田; 優鎌本
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-09-14
Filing date: 2005-09-14
Publication date: 2010-08-11
Anticipated expiration: 2025-09-14
Also published as: JP2007079127A

Description

The present invention relates to a method, an apparatus, a program, and a recording medium for determining whether digital data includes acoustic data.

There are many proposals for lossless (distortion-free) compression coding. For text data such as documents and programs, a general-purpose compression method called ZIP is widely used. For lossless compression of music signals and the like, there is, for example, the method of Patent Document 1, which compresses such signals far more effectively than ZIP.
If the file storing the digital data to be compressed carries a standardized extension, the extension shows whether the file is text data or includes acoustic data. However, the digital data may have no extension or an incorrect one, so it is often unclear whether the data is text data or includes acoustic data. In such cases an efficient compression coding method cannot be selected, and data that includes acoustic data in particular is not compressed efficiently.

In other words, if it could be determined whether digital data includes acoustic data, the data could be compressed efficiently. However, the prior art offers no such determination technique.
Patent Document 1: JP 2005-115267 A

In the prior art, it is possible to determine whether acoustic data is included by looking at the extension of the file in which the digital data to be compressed is stored. However, if no extension is attached, it cannot be determined whether the digital data includes acoustic data. As a result, encoding that exploits the characteristics of acoustic data cannot be applied, and the encoding efficiency deteriorates greatly. The same holds for digital waveform data obtained by sampling analog waveform data. An object of the present invention is to determine whether input digital data includes waveform data such as acoustic data.

In the present invention, the input digital data is assumed to be a sample value sequence of waveform data in each of one or more known data formats, and the waveform-data likelihood for each format is obtained using the relationship between sample values in the sample value sequence. If any of the obtained waveform-data likelihoods satisfies a preset condition, it is determined that the input digital data includes waveform data; if none satisfies the condition, it is determined that the input digital data does not include waveform data.

According to the present invention, even when it is not known whether the input digital data includes waveform data, this can be determined by evaluating the waveform-data likelihood of the input digital data. Furthermore, when the data is determined to include waveform data, its data format as waveform data is also estimated, so the waveform data format to assume for encoding can be determined as well. Efficient encoding can therefore be expected.

In the following, embodiments in which the present invention is applied to acoustic data, the most representative kind of analog waveform data, are described. To avoid duplicated description, components having the same function and processing steps performing the same processing are given the same numbers, and their description is omitted.
[First Embodiment]
FIG. 1 shows an example of the functional configuration of the acoustic data format determination device of the present invention, and FIG. 2 shows the processing flow of the data format determination device 100. The data format determination device 100 includes a data recording unit 110, an index determination unit 120, and a data format determination unit 130. The index determination unit 120 includes a storage format recording unit 121 and an index calculation unit 124. The data format determination unit 130 includes an index comparison unit 131.

The input digital data is first recorded in the data recording unit 110 (S110). The index determination unit 120 then assumes that the input digital data is acoustic data in each preset format, evaluates its acoustic-data likelihood, and outputs the value of the likelihood index for each data format (S124), as follows. First, the index calculation unit 124 of the index determination unit 120 sequentially selects a data format from the one or more acoustic data formats preset in the storage format recording unit 121. Next, for each selected data format, it evaluates how much the input digital data resembles acoustic data under that format, using the relationship between sample values in the sample value sequence obtained by interpreting the data in that format. It then outputs the value of the acoustic-data likelihood index for each data format (S124). Here, the preset data formats cover, for example, several numbers of bits per sample, integer or floating-point representation, several byte orders within a sample, and several numbers of channels. The digital data used for the evaluation may be all of the input or only a part of it. When a part is used, for example the first several thousand to several tens of thousands of samples of the digital data may be used. The acoustic-data likelihood is evaluated for each frame consisting of several thousand samples.
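As an illustration of interpreting the same bytes under several candidate formats, the following is a minimal sketch, not the patented implementation. The function name `decode_samples`, the restriction to signed integer samples, and the interleaved channel layout are assumptions made for this example; floating-point formats, which the text also mentions, are not covered.

```python
from typing import List


def decode_samples(data: bytes, bytes_per_sample: int,
                   byte_order: str, channels: int) -> List[List[int]]:
    """Interpret raw bytes as interleaved signed-integer samples.

    byte_order is "little" or "big". Returns one list of samples per
    channel. Trailing bytes that do not fill a whole multi-channel
    sample frame are ignored (an assumption of this sketch).
    """
    frame_size = bytes_per_sample * channels
    usable = len(data) - len(data) % frame_size
    flat = [
        int.from_bytes(data[i:i + bytes_per_sample], byte_order, signed=True)
        for i in range(0, usable, bytes_per_sample)
    ]
    # De-interleave: sample n of channel c sits at flat[n * channels + c].
    return [flat[c::channels] for c in range(channels)]


# The same eight bytes read under two different hypothetical formats.
raw = bytes([0x01, 0x00, 0xFF, 0xFF, 0x02, 0x00, 0xFE, 0xFF])
as_le_stereo = decode_samples(raw, 2, "little", 2)
as_be_mono = decode_samples(raw, 2, "big", 1)
```

Each candidate interpretation would then be scored with a likelihood index, and the interpretation with the best score kept.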

Specifically, the following expression is used as the index of acoustic-data likelihood.

$$F = \frac{1}{\prod_{i=1}^{p} \left(1 - k_i^2\right)}$$

Here, p is a predetermined positive integer, and k_i (i = 1 to p) are PARCOR coefficients.
F approximates the ratio of the energy of the input signal to the energy of the prediction residual for p-th order prediction. The larger the value of F, the more the data resembles acoustic data, and when the input digital data is compressed, a high compression effect can be expected from an encoder designed for acoustic data. When the index is calculated frame by frame and the sample value sequence in the assumed data format spans several frames, several index values are obtained for each data format, one per frame. When several index values exist for the same data format in this way, the largest of them, that is, the value that most strongly indicates acoustic data, is used as the index value for the sample value sequence in that data format.
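The index F can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the PARCOR coefficients are obtained with the Levinson-Durbin recursion on an unwindowed autocorrelation, which is one common way to compute them and is not specified by the text; all function names are hypothetical.

```python
import math
from typing import List


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def parcor_coefficients(x: List[float], p: int) -> List[float]:
    """PARCOR (reflection) coefficients k_1..k_p via Levinson-Durbin."""
    r = autocorrelation(x, p)
    a = [1.0] + [0.0] * p      # prediction polynomial coefficients
    err = r[0]                 # residual energy of the current order
    ks = []
    for i in range(1, p + 1):
        if err <= 0.0:         # silent or perfectly predicted signal
            ks.append(0.0)
            continue
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return ks


def likelihood_f(x: List[float], p: int) -> float:
    """F = 1 / prod(1 - k_i^2); the sign convention of k_i does not matter."""
    prod = 1.0
    for k in parcor_coefficients(x, p):
        prod *= 1.0 - k * k
    return math.inf if prod <= 0.0 else 1.0 / prod


def best_frame_f(samples: List[float], frame_len: int, p: int) -> float:
    """Largest F over all full frames, as described for multi-frame input."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return max(likelihood_f(f, p) for f in frames or [samples])
```

With this sketch, a slowly varying sinusoid yields a large F while uncorrelated noise yields a value close to 1, which matches the interpretation of F as a prediction gain.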

The data format determination unit 130 compares the values of the acoustic-data likelihood index (F) for the data formats and obtains the data format that gives the largest index, that is, the format judged most likely to be acoustic data, together with the value of F in that format (S130). The index comparison unit 131 of the data format determination unit 130 then compares this value of F with a threshold to judge whether the data is likely to be acoustic data (S131). As explained in the experimental example below, the data is judged likely to be acoustic data when, for example, F is 1000 or more. If step S131 is Yes, the data format determination unit 130 determines that acoustic data is included and outputs that result together with the data format of the acoustic data (S141). If step S131 is No, the data format determination unit 130 determines that acoustic data is not included and outputs that result (S142). When only one data format is preset, step S124 does not select data formats sequentially but simply obtains the acoustic-data likelihood for that one format, and step S130 is unnecessary.
By determining the data format in this way, it is possible to estimate whether the input digital data includes acoustic data and, if so, in which data format.
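Steps S130 and S131 amount to an argmax followed by a threshold test. The sketch below assumes the per-format index values have already been computed and are passed in as a dictionary; the function name, the format labels, and the example values are hypothetical, while the default threshold of 1000 is the example value given in the text.

```python
from typing import Dict, Optional, Tuple


def determine_format(index_by_format: Dict[str, float],
                     threshold: float = 1000.0
                     ) -> Tuple[bool, Optional[str], float]:
    """Pick the format with the largest index (S130) and test it (S131)."""
    best_format = max(index_by_format, key=index_by_format.get)
    best_value = index_by_format[best_format]
    if best_value >= threshold:
        return True, best_format, best_value    # corresponds to S141
    return False, None, best_value              # corresponds to S142


# Illustrative values only, not measurements from the patent.
result = determine_format({"16-bit, 1 ch": 3.2,
                           "16-bit, 2 ch": 5400.0,
                           "16-bit, 3 ch": 2.7})
```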

[Modification 1]
In the first embodiment, F is used as the index of acoustic-data likelihood. For encoding that transforms the signal into the frequency domain, the following expression can be used instead.

$$E = \frac{\frac{1}{M} \sum_{j=0}^{M-1} Y_j}{\left(\prod_{j=0}^{M-1} Y_j\right)^{1/M}}$$

Here, M is a predetermined positive integer, and Y_j (j = 0 to M−1) is the squared value of the j-th frequency-domain coefficient.
E is the ratio of the arithmetic mean of the squared coefficient values, which correspond to the power spectrum, to their geometric mean. If the coefficient values are constant regardless of j, E takes its minimum value, E = 1. In this case, the sample value sequence obtained by interpreting the input digital data in that data format behaves like random numbers, and the data cannot be said to resemble acoustic data in that format. Consequently, little compression can be expected even if a compression method specialized for acoustic data is applied. If, on the other hand, the squared coefficient values vary widely, E becomes large. A large variation in the frequency-domain coefficients (the spectrum) indicates that the data resembles acoustic data, so a compression method specialized for acoustic data can reduce the amount of data after compression; that is, a high compression effect can be expected. A large E also means that the correlation between sample values is strong.
Therefore, this method can also be used to estimate whether the input digital data is acoustic data.
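A minimal sketch of computing E follows. The text does not say which frequency transform supplies the coefficients, so a plain discrete Fourier transform is assumed here; the small constant `eps` added to each Y_j reflects the remark elsewhere in the document about adding a small positive value to the power spectrum, and its magnitude is an arbitrary choice for this example.

```python
import cmath
import math
from typing import List


def power_spectrum(x: List[float]) -> List[float]:
    """Squared DFT magnitudes for bins 0..n//2 (naive O(n^2) transform)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * j * t / n)
                for t in range(n))) ** 2
        for j in range(n // 2 + 1)
    ]


def likelihood_e(y: List[float], eps: float = 1e-12) -> float:
    """Arithmetic mean of Y_j divided by the geometric mean of Y_j."""
    shifted = [v + eps for v in y]   # keeps the geometric mean non-zero
    m = len(shifted)
    arithmetic = sum(shifted) / m
    log_geometric = sum(math.log(v) for v in shifted) / m
    return arithmetic / math.exp(log_geometric)
```

A flat spectrum gives E close to 1, while a spectrum concentrated in a few bins gives a very large E. The geometric mean is evaluated in the log domain to avoid underflow when M is large.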

[Modification 2]
Energy/F or energy/E can also be used as the index of acoustic-data likelihood. Energy/F and energy/E correspond to approximations of the prediction error energy, and the smaller these values, the more the data resembles acoustic data. When this index is used, the data format determination unit 130 therefore compares the index values (energy/F or energy/E) for the data formats and obtains the data format with the smallest value together with that value (S130). The index comparison unit 131 of the data format determination unit 130 compares this index (energy/F or energy/E) with a threshold, judges the data likely to be acoustic data if the index is smaller than the threshold, and judges it unlikely to be acoustic data otherwise (S131).
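The reversed comparison direction of this modification can be sketched as follows. Here "energy" is taken to be the mean squared sample value, which is an assumption, since the text does not define it precisely; the function names and the threshold are hypothetical, and F (or E) is supplied by the caller.

```python
from typing import Dict, List, Optional, Tuple


def mean_energy(samples: List[float]) -> float:
    """Mean squared amplitude (assumed definition of 'energy')."""
    return sum(s * s for s in samples) / len(samples)


def residual_energy_index(samples: List[float], gain: float) -> float:
    """energy / F (or energy / E): an approximate prediction-error energy."""
    return mean_energy(samples) / gain


def determine_by_residual(index_by_format: Dict[str, float],
                          threshold: float) -> Tuple[bool, Optional[str]]:
    """Smallest index wins (S130); acoustic if below the threshold (S131)."""
    best_format = min(index_by_format, key=index_by_format.get)
    if index_by_format[best_format] < threshold:
        return True, best_format
    return False, None
```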

[Modification 3]
In the first embodiment and Modifications 1 and 2, the judgment in step S131 is a simple comparison with a threshold. However, the judgment of whether acoustic data is included may depend on whether the data format of the input signal is 1 byte or 2 or more bytes per sample, or on the magnitude of the energy when the sample value sequence is regarded as waveform amplitudes. This modification therefore shows an example in which the processing of the data format determination unit 130 is changed.

FIG. 3 shows the processing flow of the data format determination device 100 in this modification. In this processing flow, steps S132 to S136 are executed instead of step S131 of FIG. 2. Steps S132 to S136 are described below. The data format determination unit 130 checks whether the data format of the input signal has been determined to be in 1-byte units (S132). If step S132 is No, the data format determination unit 130 calculates the energy obtained when the sample value sequence is regarded as waveform amplitudes (S133) and checks whether the energy is smaller than a threshold (S134). The threshold may be set, for example, to 1/100 of the energy at maximum amplitude. If step S134 is Yes, the process proceeds to step S241. If step S134 is No, the index comparison unit 131 checks whether the acoustic-data likelihood index indicates acoustic data more strongly than the threshold (S135). In this step, the threshold for F or E, the indices of acoustic-data likelihood, may be set to about 100. If step S135 is Yes, the process proceeds to step S241; if No, it proceeds to step S242. If step S132 is Yes, the index comparison unit 131 checks whether the acoustic-data likelihood index indicates acoustic data more strongly than the threshold (S136). This step is the same as step S131 of the first embodiment (FIG. 2).
Because whether acoustic data is included is judged in finer detail according to the number of bytes and the energy, more accurate determination can be expected.
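The branch structure of steps S132 to S136 can be written as a small function. This is a sketch under assumptions: the index is F or E (larger means more acoustic-like), energy is the mean squared amplitude, "energy at maximum amplitude" is taken as the square of the largest representable amplitude, and the parameter names are hypothetical. The thresholds 1000 and 100 and the factor 1/100 are the example values from the text.

```python
def includes_acoustic_data(bytes_per_sample: int,
                           mean_energy: float,
                           index: float,
                           max_amplitude: float,
                           one_byte_threshold: float = 1000.0,
                           multi_byte_threshold: float = 100.0) -> bool:
    """Decision flow of Modification 3 (steps S132 to S136)."""
    if bytes_per_sample == 1:                      # S132: Yes
        return index >= one_byte_threshold         # S136, same test as S131
    energy_threshold = max_amplitude ** 2 / 100.0  # 1/100 of full-scale energy
    if mean_energy < energy_threshold:             # S134: Yes
        return True
    return index >= multi_byte_threshold           # S135
```

For 16-bit data, `max_amplitude` would be 32767 under this sketch's assumption of signed integer samples.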

[Second Embodiment]
FIG. 4 shows an example of the functional configuration of the encoding device of this embodiment, and FIG. 5 shows the processing flow of the encoding device 200. The encoding device 200 includes a data recording unit 110, an index determination unit 120, a data format determination unit 130, and an encoding unit 240. The encoding unit 240 includes an acoustic signal encoding unit 241, which is an encoder for acoustic signals, and a non-acoustic signal encoding unit 242, which is a general-purpose encoder such as ZIP. As a comparison of FIG. 4 with FIG. 1 shows, the encoding device 200 of FIG. 4 is the acoustic data format determination device 100 of FIG. 1 with the encoding unit 240 added.

Likewise, the processing flow of FIG. 5 is the processing flow of FIG. 2 with steps S241 and S242 added. In step S241, the acoustic signal encoding unit 241 encodes the digital data recorded in the data recording unit 110 according to the acoustic data format determined by the data format determination unit 130. The acoustic signal encoding unit 241 must therefore include encoders that can handle several acoustic data formats. Conversely, the data formats used when the index calculation unit 124 calculates the acoustic-data likelihood index (step S124) are limited to those that the encoding methods of the acoustic signal encoding unit 241 can handle. In step S242, the non-acoustic signal encoding unit 242 encodes the digital data recorded in the data recording unit 110 with a general-purpose encoder such as ZIP.
In this embodiment, when the input digital data includes acoustic data, it is encoded on the assumption that it is in the data format judged most likely to be acoustic data; when it does not include acoustic data, it is encoded with a general-purpose encoder. High compression efficiency can therefore be expected.
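The dispatch between the two encoders (steps S241 and S242) can be sketched as below. The acoustic encoder is passed in as a callable because the text does not specify it; Python's `zlib` (DEFLATE, the algorithm commonly used inside ZIP archives) stands in for the general-purpose encoder. The function name `encode` and the format label are hypothetical.

```python
import zlib
from typing import Callable, Optional, Tuple


def encode(data: bytes,
           detected_format: Optional[str],
           acoustic_encoder: Callable[[bytes, str], bytes]
           ) -> Tuple[str, bytes]:
    """Route data to the acoustic or the general-purpose encoder.

    detected_format is the format label produced by the determination
    stage, or None when no acoustic data was detected.
    """
    if detected_format is not None:
        return "acoustic", acoustic_encoder(data, detected_format)  # S241
    return "general", zlib.compress(data)                           # S242
```

A decoder needs to know which branch was taken, which is why this sketch returns a tag alongside the encoded bytes; how that tag is stored is outside the scope of the example.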

[Modification]
In the second embodiment, the judgment in step S131 is a simple comparison with a threshold. However, whether the acoustic encoder or the non-acoustic encoder is the better choice may depend on whether the data format of the input signal is 1 byte or 2 or more bytes per sample, or on the magnitude of the energy when the sample value sequence is regarded as waveform amplitudes. This modification therefore shows an example in which the processing of the data format determination unit 130 is changed.

FIG. 6 shows the processing flow of the encoding device 200 in this modification. The processing flow of FIG. 6 is the processing flow of FIG. 3 with steps S241 and S242 added. The difference between FIG. 6 and FIG. 3 is therefore the same as the difference between FIG. 5 (second embodiment) and FIG. 2 (first embodiment).
Because whether acoustic data is included is judged in finer detail according to the number of bytes and the energy, more accurate determination, and hence more efficient encoding, can be expected.

The first and second embodiments describe the case where the input data may be digital acoustic data obtained by sampling an acoustic signal. However, the present invention can also be used for other data, such as other time-series data, whenever the input data may have been obtained by sampling analog waveform data. In that case, the present invention applies to general waveform data simply by replacing "determination of acoustic-data likelihood" in each embodiment with "determination of waveform-data likelihood".
The above embodiments can also be implemented by having a computer read and execute a program that performs each step of the above methods. The program can be loaded into the computer by, for example, recording it on a computer-readable recording medium and reading it from that medium, or by reading a program stored on a server or the like through a telecommunication line.

[Experimental example]
To evaluate the effect of the present invention, simulation results for the first embodiment are shown below. The simulation conditions are as follows. The first input signal is 2-channel acoustic data with 2 bytes per sample and 2048 samples per frame (175 frames). The preset data formats differ in the number of channels (1, 2, 3). The second input signal is text data in 1-byte units. The calculated indices are the energy and F (p = 1, 2, 3, 20). The acoustic data format assumed when calculating F is the same as the format of the first input (2 bytes, 2 channels). In the simulation, the energy and F were calculated for the first nine frames of each input signal. For the first input signal, for example, the number of channels is 2, so these are five frames of the first channel and four frames of the second channel.

FIG. 7 shows the simulation results for the first input, and FIG. 8 shows those for the second input. For the first input, the index values are small in the first two frames but large elsewhere. In particular, the sixth frame at p = 20 gives a value as large as 27608. For the second input, by contrast, every value of F is 10 or less.
When the first input was encoded as acoustic data, it was compressed to 30.90% of its original size; when encoded with ZIP, to 88.14%. When the second input was encoded as acoustic data, it was compressed to 96.40%; when encoded with ZIP, to 20.36%.

These results show that evaluating digital data with the acoustic-data likelihood index (F) makes it possible to judge whether acoustic data is included, so that efficient encoding can be expected.
The value of F increases monotonically as p increases. It is also known that when E is calculated with Y_j taken as a smoothed power spectrum, or as a power spectrum with a small positive value added, the value of E approximates the value of F for large p. Therefore, although it can be seen even for small p that the number of channels should be 3, it is desirable to use as the index F with p of about 20, or E, or energy/F or energy/E. In any case, p should be at least 3.

FIG. 1 is a diagram showing an example of the functional configuration of the acoustic data format determination unit of the first embodiment.
FIG. 2 is a diagram showing the processing flow of the acoustic data format determination unit of the first embodiment.
FIG. 3 is a diagram showing the processing flow of the acoustic data format determination unit of Modification 3 of the first embodiment.
FIG. 4 is a diagram showing an example of the functional configuration of the encoding device of the second embodiment.
FIG. 5 is a diagram showing the processing flow of the encoding device of the second embodiment.
FIG. 6 is a diagram showing the processing flow of the encoding device of the modification of the second embodiment.
FIG. 7 is a diagram showing the simulation results when acoustic data is the input.
FIG. 8 is a diagram showing the simulation results when text data is the input.

Claims

A data format determination method for determining whether or not input digital data includes acoustic data, comprising:
an index calculation step of obtaining, for a sample value sequence in the input digital data,

$$F = \frac{1}{\prod_{i=1}^{p} \left(1 - k_i^2\right)}$$

where p is a predetermined positive integer and k_i (i = 1 to p) are PARCOR coefficients; and
a data format determination step of determining that the input digital data includes acoustic data when the value of F obtained in the index calculation step is greater than or equal to a preset threshold, and determining that the input digital data does not include acoustic data when the value of F is smaller than the threshold.

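A data format determination method for determining whether or not input digital data includes acoustic data, comprising:
an index calculation step of obtaining, for a sample value sequence in the input digital data,

$$E = \frac{\frac{1}{M} \sum_{j=0}^{M-1} Y_j}{\left(\prod_{j=0}^{M-1} Y_j\right)^{1/M}}$$

where M is a predetermined positive integer and Y_j (j = 0 to M−1) is the squared value of the j-th frequency-domain coefficient; and
a data format determination step of determining that the input digital data includes acoustic data when the value of E obtained in the index calculation step is greater than or equal to a preset threshold, and determining that the input digital data does not include acoustic data when the value of E is smaller than the threshold.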

A data format determination method for determining whether or not input digital data includes acoustic data, comprising:
an index calculation step of obtaining, for a sample value sequence in the input digital data, energy/F, where

$$F = \frac{1}{\prod_{i=1}^{p} \left(1 - k_i^2\right)}$$

p is a predetermined positive integer, and k_i (i = 1 to p) are PARCOR coefficients; and
a data format determination step of determining that the input digital data includes acoustic data when the value of energy/F obtained in the index calculation step is smaller than a preset threshold, and determining that the input digital data does not include acoustic data when the value of energy/F is greater than or equal to the threshold.

A data format determination method for determining whether or not input digital data includes acoustic data, comprising:
an index calculation step of obtaining, for a sample value sequence in the input digital data, energy/E, where

$$E = \frac{\frac{1}{M} \sum_{j=0}^{M-1} Y_j}{\left(\prod_{j=0}^{M-1} Y_j\right)^{1/M}}$$

M is a predetermined positive integer, and Y_j (j = 0 to M−1) is the squared value of the j-th frequency-domain coefficient; and
a data format determination step of determining that the input digital data includes acoustic data when the value of energy/E obtained in the index calculation step is smaller than a preset threshold, and determining that the input digital data does not include acoustic data when the value of energy/E is greater than or equal to the threshold.

A data format determination device for determining whether or not input digital data includes acoustic data, comprising:
a data recording unit that records the input digital data;
an index calculation unit that obtains, for a sample value sequence in the input digital data,

$$F = \frac{1}{\prod_{i=1}^{p} \left(1 - k_i^2\right)}$$

where p is a predetermined positive integer and k_i (i = 1 to p) are PARCOR coefficients; and
a data format determination unit that determines that the input digital data includes acoustic data when the value of F obtained by the index calculation unit is greater than or equal to a preset threshold, and determines that the input digital data does not include acoustic data when the value of F is smaller than the threshold.

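A data format determination device for determining whether or not input digital data includes acoustic data, comprising:
a data recording unit that records the input digital data;
an index calculation unit that obtains, for a sample value sequence in the input digital data,

$$E = \frac{\frac{1}{M} \sum_{j=0}^{M-1} Y_j}{\left(\prod_{j=0}^{M-1} Y_j\right)^{1/M}}$$

where M is a predetermined positive integer and Y_j (j = 0 to M−1) is the squared value of the j-th frequency-domain coefficient; and
a data format determination unit that determines that the input digital data includes acoustic data when the value of E obtained by the index calculation unit is greater than or equal to a preset threshold, and determines that the input digital data does not include acoustic data when the value of E is smaller than the threshold.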

A data format determination device for determining whether or not input digital data includes acoustic data, comprising:
a data recording unit that records the input digital data;
an index calculation unit that obtains, for a sample value sequence in the input digital data, energy/F, where

$$F = \frac{1}{\prod_{i=1}^{p} \left(1 - k_i^2\right)}$$

p is a predetermined positive integer, and k_i (i = 1 to p) are PARCOR coefficients; and
a data format determination unit that determines that the input digital data includes acoustic data when the value of energy/F obtained by the index calculation unit is smaller than a preset threshold, and determines that the input digital data does not include acoustic data when the value of energy/F is greater than or equal to the threshold.

A data format determination device for determining whether or not input digital data includes acoustic data, comprising:
a data recording unit that records the input digital data;
an index calculation unit that obtains, for a sample value sequence in the input digital data, energy/E, where

$$E = \frac{\frac{1}{M} \sum_{j=0}^{M-1} Y_j}{\left(\prod_{j=0}^{M-1} Y_j\right)^{1/M}}$$

M is a predetermined positive integer, and Y_j (j = 0 to M−1) is the squared value of the j-th frequency-domain coefficient; and
a data format determination unit that determines that the input digital data includes acoustic data when the value of energy/E obtained by the index calculation unit is smaller than a preset threshold, and determines that the input digital data does not include acoustic data when the value of energy/E is greater than or equal to the threshold.

A program that causes a computer to execute the method according to any one of claims 1 to 4.

A computer-readable recording medium on which the program according to claim 9 is recorded.