JP2011186187A

JP2011186187A - Speech processor, speech processing method and speech processing program

Info

Publication number: JP2011186187A
Application number: JP2010051360A
Authority: JP
Inventors: Toshiharu Kuwaoka; 俊治桑岡
Original assignee: JVCKenwood Holdings Inc
Current assignee: JVCKenwood Holdings Inc
Priority date: 2010-03-09
Filing date: 2010-03-09
Publication date: 2011-09-22

Abstract

PROBLEM TO BE SOLVED: To create a digital speech signal which is closer to original sound by suitably adding a high-frequency component according to an acquired digital speech signal.

SOLUTION: A speech processor 100 includes: a signal analysis section 124 which determines whether or not the high-frequency component of a predetermined frequency or more and a predetermined sound pressure or more is included in the digital speech signal; a correction value creation section 130 which creates a correction value for expanding an amplitude of the digital speech signal based on a coefficient that is different depending on whether or not the signal analysis section determines that the high-frequency component is included in the digital speech signal; and an addition section 134 for adding the correction value to the digital speech signal.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an audio processing device, an audio processing method, and an audio processing program that analyze a digital audio signal and process the digital audio signal according to the analysis result.

In recent years, advances in audio coding technology have made it possible to reduce file size while preserving, as far as possible, the sound quality of music recorded on CDs (Compact Discs) and similar media. As a result, a large amount of music can be stored on a memory-type portable audio player and carried around.

However, the audio coding technology described above exploits human auditory characteristics to cut audio signals in high-frequency bands that are normally inaudible, and to thin out data for sounds that cannot be heard because of the masking effect. Compared with the original sound before digitization, the result therefore lacks extension, spaciousness, dynamic range, and luster. For this reason, techniques have been developed to improve the sound quality of digital audio signals compressed by audio coding.

For example, the inventor of the present invention has proposed a technique that adds high-frequency components at or above a predetermined frequency to a digital audio signal. The technique takes the difference between an extreme-value sample of the digital audio signal and the sample immediately before it, multiplies that difference by a coefficient that depends on the number of samples between extreme values, and adds the result to the digital audio signal (for example, Patent Documents 1 and 2).

Japanese Patent No. 3401171
Japanese Patent No. 3659489

Sound quality improvement processing that adds high-frequency components, as in the techniques of Patent Documents 1 and 2, can be applied at various points where digital audio signals are encoded and decoded under standards such as the CD standard, MPEG (Moving Picture Expert Group)-2, AAC (registered trademark) (Advanced Audio Coding), ATRAC (registered trademark) (Adaptive TRansform Acoustic Coding), MP3 (MPEG Audio Layer-3), and WMA (Windows (registered trademark) Media Audio). Consequently, sound quality improvement processing may be applied a second time to a digital audio signal that has already undergone it.

However, if a digital audio signal that has already undergone sound quality improvement processing is given the same processing as an ordinary digital audio signal that has not, high-frequency components are added excessively. The reproduced sound then departs from the original sound; for example, the balance of the mid-to-high range changes.

In view of this problem, an object of the present invention is to provide an audio processing device, an audio processing method, and an audio processing program that can generate a digital audio signal closer to the original sound by adding high-frequency components appropriately according to the acquired digital audio signal.

To solve the above problem, the audio processing device of the present invention comprises: a signal analysis unit that determines whether a digital audio signal contains a high-frequency component at or above a predetermined frequency and at or above a predetermined sound pressure; a correction value generation unit that generates a correction value that expands the amplitude of the digital audio signal, based on a coefficient that differs depending on whether the signal analysis unit has determined that the digital audio signal contains the high-frequency component; and an addition unit that adds the correction value to the digital audio signal.

The signal analysis unit may determine that the digital audio signal contains a high-frequency component when the number of samples from an arbitrary extreme value of the digital audio signal to the next extreme value is less than a predetermined number and the inter-extreme difference value exceeds a predetermined value. The inter-extreme difference value is the difference between the local maximum, which is the larger of the arbitrary extreme value and the next extreme value, and the local minimum, which is the smaller of the two.

The signal analysis unit may determine that the digital audio signal contains a high-frequency component when, among all of the samples for which the number of samples from an arbitrary extreme value of the digital audio signal to the next extreme value is less than a predetermined number, the occupancy ratio of the samples whose inter-extreme difference value exceeds a predetermined value is above a predetermined ratio. The inter-extreme difference value is the difference between the local maximum, which is the larger of the arbitrary extreme value and the next extreme value, and the local minimum, which is the smaller of the two.
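The occupancy-ratio determination can be illustrated with a minimal Python sketch. It assumes that the occupancy ratio is counted per half-cycle (one entry per pair of adjacent extreme values) rather than per individual sample, which the text does not settle. The function name, the parameter names, and the `(sample_count, difference)` tuple layout are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple


def contains_high_frequency(
    intervals: Iterable[Tuple[int, float]],
    sample_number_threshold: int,
    level_threshold: float,
    ratio_threshold: float,
) -> bool:
    """Occupancy-ratio test (illustrative sketch, not the patented implementation).

    intervals: one (sample_count, inter_extreme_difference) pair per half-cycle.
    """
    # Keep only the half-cycles short enough to count as high frequency.
    short = [diff for count, diff in intervals if count < sample_number_threshold]
    if not short:
        return False  # assumption: no short half-cycles means no high-frequency content
    # Among those, count the ones loud enough not to be treated as noise.
    loud = sum(1 for diff in short if diff > level_threshold)
    return loud / len(short) > ratio_threshold
```

With the hypothetical intervals `[(1, 0.5), (1, 0.01), (3, 0.9), (2, 0.4)]`, a sample number threshold of 3, and a level threshold of 0.1, two of the three short half-cycles exceed the level threshold, so the result depends on whether the chosen ratio threshold lies below two thirds.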

When the signal analysis unit determines that the digital audio signal contains a high-frequency component, the correction value generation unit may generate the correction value based on a smaller coefficient than when the signal analysis unit determines that the digital audio signal does not contain a high-frequency component.

The correction value generation unit may generate the correction value based on a coefficient that differs depending on the format of the digital audio signal.

The signal analysis unit may determine the predetermined number and the predetermined value based on the format of the digital audio signal.

The audio processing device may further comprise a first conversion unit that upsamples the digital audio signal acquired by the audio processing device, and a second conversion unit that downsamples the digital audio signal, after the addition unit has added the correction value, to the sampling frequency it had before being upsampled by the first conversion unit.

The correction value generation unit may generate correction values by multiplying a maximum difference value, which is the difference between a local maximum of the digital audio signal and the value of the sample one sample before it, by the coefficient, and by multiplying a minimum difference value, which is the difference between a local minimum of the digital audio signal and the value of the sample one sample before it, by the coefficient. The correction value generated from the maximum difference value may be associated with the sample at the local maximum so that it is added to the local maximum, and the correction value generated from the minimum difference value may be associated with the sample at the local minimum so that it is subtracted from the local minimum.
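A minimal Python sketch of this correction follows. It rests on several assumptions that the text does not fix: a single coefficient is used for every extreme value (the text only says the coefficient depends on the analysis result), both difference values are taken as non-negative magnitudes so that the amplitude is expanded, and an extreme value is a sample strictly above or strictly below both neighbours. The function name `correct_extremes` is hypothetical.

```python
from typing import List, Sequence


def correct_extremes(samples: Sequence[float], coefficient: float) -> List[float]:
    """Expand local maxima and minima (illustrative sketch only)."""
    out = list(samples)
    for i in range(1, len(samples) - 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        if prev < cur > nxt:
            # Local maximum: add (maximum - previous sample) * coefficient.
            out[i] = cur + (cur - prev) * coefficient
        elif prev > cur < nxt:
            # Local minimum: subtract (previous sample - minimum) * coefficient.
            out[i] = cur - (prev - cur) * coefficient
    return out
```

For the hypothetical input `[0.0, 1.0, 0.0, -1.0, 0.0]` with a coefficient of 0.5, the maximum at index 1 becomes 1.5 and the minimum at index 3 becomes -1.5, while the other samples are unchanged.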

The correction value generation unit may generate correction values by multiplying by the coefficient each of the differences between a local maximum of the digital audio signal and the values of the samples one sample before and one sample after it, and each of the differences between a local minimum of the digital audio signal and the values of the samples one sample before and one sample after it. The correction value generated from the difference between the local maximum and the sample one sample before it may be associated with that preceding sample so that it is added to its value, and the correction value generated from the difference between the local maximum and the sample one sample after it may be associated with that following sample so that it is added to its value. Likewise, the correction value generated from the difference between the local minimum and the sample one sample before it may be associated with that preceding sample so that it is subtracted from its value, and the correction value generated from the difference between the local minimum and the sample one sample after it may be associated with that following sample so that it is subtracted from its value.
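The neighbour-based variant can be sketched in Python as follows. The sketch assumes a single coefficient, differences taken as non-negative magnitudes, strict local extremes, and that corrections accumulate when one sample is adjacent to both a maximum and a minimum; none of these points is stated in the text. The function name `correct_neighbours` is hypothetical.

```python
from typing import List, Sequence


def correct_neighbours(samples: Sequence[float], coefficient: float) -> List[float]:
    """Apply corrections to the samples adjacent to each extreme (sketch only)."""
    out = list(samples)
    for i in range(1, len(samples) - 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        if prev < cur > nxt:
            # Local maximum: raise both neighbours toward the peak.
            out[i - 1] += (cur - prev) * coefficient
            out[i + 1] += (cur - nxt) * coefficient
        elif prev > cur < nxt:
            # Local minimum: lower both neighbours toward the trough.
            out[i - 1] -= (prev - cur) * coefficient
            out[i + 1] -= (nxt - cur) * coefficient
    return out
```

In this sketch the extreme samples themselves are left untouched, and all differences are read from the original samples so that one correction does not influence the next.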

To solve the above problem, the audio processing method of the present invention comprises: determining whether a digital audio signal contains a high-frequency component at or above a predetermined frequency and at or above a predetermined sound pressure; generating a correction value that expands the amplitude of the digital audio signal, based on a coefficient that differs depending on whether the digital audio signal contains the high-frequency component; and adding the correction value to the digital audio signal.

To solve the above problem, the audio processing program of the present invention causes a computer to execute the steps of: determining whether a digital audio signal contains a high-frequency component at or above a predetermined frequency and at or above a predetermined sound pressure; generating a correction value that expands the amplitude of the digital audio signal, based on a coefficient that differs depending on whether the digital audio signal contains the high-frequency component; and adding the correction value to the digital audio signal.

As described above, according to the present invention, a digital audio signal closer to the original sound can be generated by adding high-frequency components appropriately according to the acquired digital audio signal.

An explanatory diagram showing how the audio processing device is used.
A functional block diagram illustrating the overall configuration of the audio processing device.
An explanatory diagram illustrating the relationship between the number of samples from an arbitrary extreme value to the next extreme value and the frequency corresponding to the half-cycle of the digital audio signal that contains those two extreme values.
An explanatory diagram illustrating the processing of the signal analysis unit.
An explanatory diagram illustrating the processing in which the signal analysis unit determines whether the digital audio signal contains a high-frequency component according to whether the occupancy ratio exceeds the predetermined ratio.
An explanatory diagram showing an example of the relationship between the format of the digital audio signal and the sample number threshold and extreme value level threshold.
An explanatory diagram illustrating the coefficient table group and the coefficient tables.
An explanatory diagram illustrating in more detail the processing that adds the correction signal to the digital audio signal.
An explanatory diagram illustrating another process that adds the correction signal to the digital audio signal.
An explanatory diagram illustrating another process that adds the correction signal to the digital audio signal.
An explanatory diagram illustrating the processing of the first conversion unit.
A functional block diagram showing a typical example of a computer (information processing device) capable of the sound quality enhancement processing of the audio processing device.
A flowchart showing the overall processing flow of the audio processing method.
A flowchart showing the overall processing flow of the audio processing method.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. The dimensions, materials, and other specific numerical values shown in the embodiments are merely examples to facilitate understanding of the invention and do not limit the present invention unless otherwise specified. In this specification and the drawings, elements having substantially the same function and configuration are given the same reference numerals to avoid redundant description, and elements not directly related to the present invention are not illustrated.

(Audio processing device 100)
FIG. 1 is an explanatory diagram illustrating how the audio processing device 100 is used. The audio processing device 100 acquires a digital audio signal from a broadcast station 102 via broadcast waves, from a content server 104 via a communication network 106, or directly from a storage medium 108, and improves the sound quality of the digital audio signal by adding high-frequency components to it. The user can listen to the improved digital audio signal directly from the audio processing device 100, or after transferring it to a playback device 110 such as a portable audio player or a mobile phone.

The content server 104 may also include the audio processing device 100. In that case, the audio signal to which the audio processing device 100 of the content server 104 has added high-frequency components is distributed through the communication network 106 to a playback device 110 such as a personal computer, a portable audio player, or a mobile phone.

A playback device 110 such as a portable audio player or a mobile phone may also include the audio processing device 100. In that case, the digital audio signal distributed from the content server 104 through the communication network 106 is played back after the audio processing device 100 of the playback device 110 has added high-frequency components to it.

Digital audio signals whose sound quality the audio processing device 100 can improve include, for example, audio signals based on the CD and DVD (Digital Versatile Disc) standards, and audio signals whose high-frequency band has been cut by audio coding such as MPEG-2, AAC (registered trademark), HE-AAC, ATRAC (registered trademark), MP3, and WMA.

Normally, when an audio signal is digitized, it is limited to frequency components at or below half the sampling frequency. In addition, a digitized audio signal (digital audio signal) may be compressed before it is sent over the communication network 106 in order to reduce the communication load on the network. As a result, a digital audio signal, or a digital audio signal that has been digitized and then compressed, lacks high-frequency components and reproduces the original sound poorly. Sound quality improvement processing that adds high-frequency components may therefore be applied. The original sound here means the audio signal before digitization.

This sound quality improvement processing is not limited to adding high-frequency components after digitization or after compression in order to approach the original sound. It also covers, for example, anticipating that compression will impair high-frequency components and adding, in advance, more high-frequency content than the original sound has, so that a moderate amount of high-frequency content remains after compression.

Consequently, the digital audio signal acquired by the audio processing device 100 may or may not have undergone sound quality improvement processing. If the audio processing device 100 added high-frequency components uniformly to every digital audio signal under these circumstances, a signal that already had high-frequency components added would receive them in excess. The result would depart from the original sound; for example, the balance of the mid-to-high range would change.

The audio processing device 100 of the present embodiment adds high-frequency components appropriately according to whether the acquired digital audio signal has already undergone sound quality improvement processing, and can therefore generate a digital audio signal closer to the original sound. The detailed configuration of the audio processing device 100 is described below.

FIG. 2 is a functional block diagram illustrating the overall configuration of the audio processing device 100. The audio processing device 100 comprises a signal acquisition unit 120, an extreme value identification unit 122, a signal analysis unit 124, a table selection unit 126, a coefficient storage unit 128, a correction value generation unit 130, a delay unit 132, an addition unit 134, a signal output unit 136, a first conversion unit 138, and a second conversion unit 140.

The audio processing device 100 of the present embodiment first performs a selection process, which selects a coefficient table according to whether the acquired digital audio signal contains a high-frequency component, and then performs a correction process using the selected coefficient table. The selection process and the correction process are described in turn below.

(Selection process)
In the selection process, the signal acquisition unit 120 first acquires a digital audio signal. It then identifies the number of quantization bits and the sampling frequency of the acquired digital audio signal from the signal's header information and from information about its standard, such as CD, AAC (registered trademark), ATRAC (registered trademark), or MP3. However, where the input digital audio signal always has the same format, as in a CD player, the signal acquisition unit 120 need not have the function of identifying the number of quantization bits and the sampling frequency.

The first conversion unit 138 upsamples the digital audio signal acquired by the signal acquisition unit 120 and outputs it to the extreme value identification unit 122. Upsampling is not essential, and the configuration may omit the first conversion unit 138.

The extreme value identification unit 122 identifies the local maxima and local minima, which are the extreme values of the digital audio signal acquired by the signal acquisition unit 120. Specifically, the extreme value identification unit 122 compares the sample values (sound pressure values of the samples) of the digital audio signal in sequence. When the sample value turns from increasing or unchanged to decreasing, the value of the sample immediately before the decrease is taken as a local maximum; when the sample value turns from decreasing or unchanged to increasing, the value of the sample immediately before the increase is taken as a local minimum. The extreme value identification unit 122 then counts the number of samples from an arbitrary extreme value to the next extreme value, that is, from a local maximum to a local minimum or from a local minimum to a local maximum. This sample count indicates the frequency corresponding to the portion of the digital audio signal from the extreme value at which counting started to the next extreme value.
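The sequential comparison described above can be sketched in Python as follows. The sketch makes two assumptions beyond the text: a run of unchanged samples keeps the preceding rising or falling trend, and unchanged samples at the very start of the signal, before any trend is known, produce no extreme value. The function names `find_extrema` and `extreme_intervals` are hypothetical.

```python
from typing import List, Sequence, Tuple


def find_extrema(samples: Sequence[float]) -> List[Tuple[int, str]]:
    """Return (index, kind) for each extreme value; kind is "max" or "min"."""
    extrema: List[Tuple[int, str]] = []
    trend = 0  # +1 rising, -1 falling, 0 not yet known
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if delta > 0:
            if trend == -1:
                extrema.append((i - 1, "min"))  # sample just before the increase
            trend = 1
        elif delta < 0:
            if trend == 1:
                extrema.append((i - 1, "max"))  # sample just before the decrease
            trend = -1
        # delta == 0: the value is unchanged, so the trend is kept as is
    return extrema


def extreme_intervals(samples: Sequence[float]) -> List[Tuple[int, float]]:
    """For each pair of adjacent extremes, return (sample count, level difference)."""
    ext = find_extrema(samples)
    return [
        (b - a, abs(samples[b] - samples[a]))
        for (a, _), (b, _) in zip(ext, ext[1:])
    ]
```

Here the number of samples between two extreme values is taken as the difference of their indices, so a signal that alternates on every sample yields a count of 1.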

FIG. 3 is an explanatory diagram illustrating the relationship between the number of samples from an arbitrary extreme value to the next extreme value and the frequency corresponding to the half-cycle of the digital audio signal that contains those two extreme values. FIG. 3(a) shows the case of a 44.1 kHz sampling frequency, and FIG. 3(b) shows the case of a 96 kHz sampling frequency. As FIGS. 3(a) and 3(b) show, the fewer the samples from an arbitrary extreme value to the next, the shorter the waveform period of that half-cycle and the higher its frequency band. For example, the CD standard uses a sampling frequency of 44.1 kHz, so the relationship in FIG. 3(a) applies: when the number of samples from an arbitrary extreme value to the next is 1, the half-cycle containing the two extreme values lies in the 11.025 to 22.050 kHz band; when the number is 2, it lies in the 7.35 to 11.025 kHz band.
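The example values for 44.1 kHz are consistent with treating a count of n samples between adjacent extreme values as a half period, which places the half-cycle between f_s / (2(n + 1)) and f_s / (2n), where f_s is the sampling frequency. This relation is inferred here from the quoted bands and is not stated as a formula in the text; the Python function below, with the hypothetical name `half_cycle_band_hz`, simply evaluates it.

```python
from typing import Tuple


def half_cycle_band_hz(sample_count: int, sampling_rate_hz: float) -> Tuple[float, float]:
    """Return the (lower, upper) frequency bounds in Hz for a half-cycle.

    sample_count is the number of samples from one extreme value to the next.
    The formula is an inference from the 44.1 kHz example values.
    """
    if sample_count < 1:
        raise ValueError("sample_count must be at least 1")
    upper = sampling_rate_hz / (2 * sample_count)
    lower = sampling_rate_hz / (2 * (sample_count + 1))
    return lower, upper


print(half_cycle_band_hz(1, 44100))  # (11025.0, 22050.0)
print(half_cycle_band_hz(2, 44100))  # (7350.0, 11025.0)
```

The two printed bands match the 11.025 to 22.050 kHz and 7.35 to 11.025 kHz ranges given for sample counts of 1 and 2.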

The signal analysis unit 124 determines, for example for a predetermined period of the digital audio signal starting from when the signal acquisition unit 120 begins acquiring it, whether the digital audio signal contains a high-frequency component at or above a predetermined frequency and at or above a predetermined sound pressure. The predetermined period may be preset, or, if the digital audio signal can be divided into predetermined time units such as pieces of music, it may be the duration of one piece. A digital audio signal may contain a noise signal generated during AD conversion or transmission, and this noise signal also contains frequency components at or above the predetermined frequency. Because such a noise signal has low sound pressure, the present embodiment distinguishes the high-frequency component from the noise signal by defining it as a frequency component at or above the predetermined frequency and at or above the predetermined sound pressure. The predetermined sound pressure is set, for example, to a level midway between the average sound pressure of the noise signal and the average sound pressure of the high-frequency components of the digital audio signal excluding the noise signal, or to a predetermined multiple of the average sound pressure of the noise signal.

Specifically, the signal analysis unit 124 determines that the digital audio signal contains a high-frequency component when both of the following hold: the number of samples from an arbitrary extreme value of the digital audio signal to the next extreme value, as counted by the extreme value identification unit 122, is less than a predetermined number (hereinafter, the sample number threshold); and the inter-extreme difference value exceeds a predetermined value (hereinafter, the extreme value level threshold). The inter-extreme difference value is the difference between the level of the local maximum, which is the larger of the arbitrary extreme value and the next extreme value, and the level of the local minimum, which is the smaller of the two; in other words, it is the difference between the arbitrary extreme value and the next extreme value, or between the next extreme value and the arbitrary extreme value.
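The two-threshold test for a single pair of adjacent extreme values can be written as a short Python sketch. The function and parameter names are hypothetical, and the threshold values in the usage lines are arbitrary illustrations rather than values taken from the text.

```python
def is_high_frequency_half_cycle(
    sample_count: int,
    extreme_a: float,
    extreme_b: float,
    sample_number_threshold: int,
    extreme_level_threshold: float,
) -> bool:
    """Apply both thresholds to one pair of adjacent extremes (sketch only)."""
    # Inter-extreme difference: the larger extreme minus the smaller one.
    difference = max(extreme_a, extreme_b) - min(extreme_a, extreme_b)
    return (
        sample_count < sample_number_threshold
        and difference > extreme_level_threshold
    )


# Short and loud half-cycle: treated as a high-frequency component.
print(is_high_frequency_half_cycle(1, 0.4, -0.3, 3, 0.1))      # True
# Short but very quiet half-cycle: treated as noise.
print(is_high_frequency_half_cycle(1, 0.01, -0.01, 3, 0.1))    # False
```

Using strict inequalities mirrors the wording "less than" the sample number threshold and "exceeds" the extreme value level threshold.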

The sample number threshold is a threshold for determining whether the half-cycle of the digital audio signal containing an arbitrary extreme value and the next extreme value is at or above the predetermined frequency, and it is determined by the standard, sampling frequency, and the like of the acquired digital audio signal. If the acquired digital audio signal always has the same sampling frequency, as in a CD player, the sample number threshold may always be the same value. When digital audio signals of various sampling frequencies and standards are acquired, as in a personal computer, the signal analysis unit 124 sets the sample number threshold according to the sampling frequency and standard. Specifically, the signal analysis unit 124 detects the sampling frequency of the acquired digital audio signal from header information or information about the standard, and sets as the sample number threshold the number of samples corresponding to a frequency component higher than the frequency components contained at that sampling frequency.

The extreme value level threshold is a threshold for determining whether the half-cycle of the digital audio signal containing an arbitrary extreme value and the next extreme value is at or above the predetermined sound pressure; it serves to exclude the influence of the noise signal.

FIG. 4 is an explanatory diagram for explaining the processing of the signal analysis unit 124. FIG. 4(a) shows the frequency spectrum of a digital audio signal 150 that has undergone sound quality improvement processing superimposed on the frequency spectrum of a digital audio signal 152 that has not. FIG. 4(b) shows only the frequency spectrum of the digital audio signal 150 that has undergone sound quality improvement processing, and FIG. 4(c) shows only the frequency spectrum of the digital audio signal 152 that has not.

As shown in FIG. 4(a), the digital audio signal 150 that has undergone sound quality improvement processing contains a high-frequency component (hatched in FIG. 4(a)) that is not contained in the digital audio signal 152 that has not undergone it.

Therefore, when, for example, a frequency component 154 in the range at or above frequency f1 (in practice, the range from f1 to f2) has a sound pressure at or above the predetermined sound pressure, it can be judged to be not a frequency component of the digital audio signal 152 shown in FIG. 4(c) but a high-frequency component contained only in the digital audio signal 150 that has undergone sound quality improvement processing, shown in FIG. 4(b). In other words, when the number of samples from an arbitrary extreme value of the digital audio signal to the next extreme value is less than the sample number threshold and the inter-extreme difference value between those two extreme values exceeds the extreme value level threshold, the digital audio signal has a frequency component that would not be present without sound quality improvement processing, and can be judged to have undergone sound quality improvement processing in the past.

Here, the signal analysis unit 124 judges that the digital audio signal contains a high-frequency component and has undergone sound quality improvement processing if the digital audio signal for the predetermined period contains even one sample for which the number of samples from an arbitrary extreme value to the next extreme value is less than the sample number threshold and the inter-extreme difference value exceeds the extreme value level threshold. The judgment is not limited to this case, however. For example, the signal analysis unit 124 may judge that the digital audio signal contains a high-frequency component and has undergone sound quality improvement processing when the number of such samples exceeds a predetermined threshold.
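The two decision rules just described (at least one qualifying half-cycle, or more than a predetermined number of them) differ only in the count they require. A minimal Python sketch, assuming the half-cycles are already available as `(sample_count, inter_extreme_difference)` pairs; the function name and the `hit_threshold` parameter are hypothetical:

```python
def judged_enhanced(half_cycles, sample_number_threshold, level_threshold, hit_threshold=0):
    """Judge that the signal was already enhanced if the number of qualifying
    half-cycles exceeds hit_threshold (0 reproduces the 'even one' rule)."""
    hits = sum(
        1
        for count, diff in half_cycles
        if count < sample_number_threshold and diff > level_threshold
    )
    return hits > hit_threshold


observed = [(1, 300), (1, 50), (4, 900)]  # one short, large half-cycle
print(judged_enhanced(observed, 2, 128))                   # 'even one' rule
print(judged_enhanced(observed, 2, 128, hit_threshold=1))  # stricter rule
```

Raising `hit_threshold` trades sensitivity for robustness against isolated spikes; the description leaves the actual value open.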

When the number of samples from an arbitrary extreme value to the next extreme value is less than the sample number threshold and the inter-extreme difference value exceeds the extreme value level threshold, the signal analysis unit 124 judges that the digital audio signal contains a high-frequency component at or above the predetermined frequency and at or above the predetermined sound pressure. The signal analysis unit 124 makes this judgment through two simple comparisons: the number of samples from an arbitrary extreme value to the next extreme value is compared with the sample number threshold, which identifies the frequency band, and the inter-extreme difference value is compared with the extreme value level threshold, which identifies whether the high-frequency component is a noise signal. The sound processing apparatus 100 can therefore carry out sound quality improvement processing while suppressing an increase in processing load.

Furthermore, because the signal analysis unit 124 compares the inter-extreme difference value with the extreme value level threshold, it avoids misjudging, under the influence of a noise signal or the like, that sound quality improvement processing has been applied to a digital audio signal that has not yet undergone it.

Next, another example of a method for judging whether a digital audio signal has already undergone sound quality improvement processing is described. In this example, for the digital audio signal acquired by the signal acquisition unit 120 during a predetermined period, the signal analysis unit 124 judges that the digital audio signal contains a high-frequency component when, among all the samples for which the number of samples from an arbitrary extreme value to the next extreme value is less than the sample number threshold, the occupancy rate of samples whose inter-extreme difference value exceeds the extreme value level threshold exceeds a predetermined ratio. Here, the occupancy rate is the ratio of the number of samples whose inter-extreme difference value exceeds the extreme value level threshold to the total number of samples for which the sample count is less than the sample number threshold.

FIG. 5 is an explanatory diagram for explaining the processing by which the signal analysis unit 124 judges whether the digital audio signal contains a high-frequency component according to whether the occupancy rate exceeds the predetermined ratio. As in FIG. 4(a), FIG. 5(a) shows the frequency spectrum of a digital audio signal 156 that has undergone sound quality improvement processing superimposed on the frequency spectrum of a digital audio signal 158 that has not. FIG. 5(b) shows only the frequency spectrum of the digital audio signal 156 that has undergone sound quality improvement processing, and FIG. 5(c) shows only the frequency spectrum of the digital audio signal 158 that has not.

In FIG. 5(a) as well, the frequency spectrum 156 of the digital audio signal that has undergone sound quality improvement processing contains a high-frequency component (hatched in FIG. 5(a)) that is not contained in the frequency spectrum 158 of the signal that has not. However, the frequency band of the high-frequency component produced by the sound quality improvement processing may be narrower than the frequency band used as the criterion (frequencies f3 to f4), so that the range f3 to f4, which is what the number of samples from an arbitrary extreme value to the next extreme value can identify, contains both the high-frequency component produced by the processing and other frequency components. In that case, comparison with the extreme value level threshold alone, as described with reference to FIG. 4, may not reveal whether sound quality improvement processing has been applied. In the following description of FIG. 5, the sampling frequency of the digital audio signal is 44.1 kHz, the range of frequencies f3 to f4 is 11.025 kHz to 22.050 kHz, and the sample number threshold is 2.

As shown in FIG. 5, when the frequency component 160 represented by a half-cycle of the digital audio signal containing an extreme value and the next extreme value is contained both in the frequency spectrum 156 of the processed digital audio signal in FIG. 5(b) and in the frequency spectrum 158 of the unprocessed signal in FIG. 5(c), the mere fact that the digital audio signal has a frequency component within the range f3 to f4 does not show whether sound quality improvement processing has already been applied. That is, even if the number of samples from an arbitrary extreme value to the next extreme value is "1", which is less than the sample number threshold, all that can be established is that the half-cycle containing those extreme values has a frequency component in the range 11.025 kHz to 22.050 kHz. Whether that frequency component was added by sound quality improvement processing cannot be judged.
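The figures quoted above follow from simple arithmetic, on the assumption (not stated explicitly in the description) that a half-cycle spanning n sample intervals corresponds to a full period of 2n samples, so its frequency is the sampling frequency divided by 2n. A short Python sketch with a hypothetical helper name:

```python
def half_cycle_frequency(sample_count, sampling_rate_hz):
    """Frequency of a component whose half-cycle spans sample_count sample intervals."""
    if sample_count < 1:
        raise ValueError("sample_count must be at least 1")
    return sampling_rate_hz / (2 * sample_count)


# At 44.1 kHz, a count of 1 maps to the upper edge f4 and a count of 2 to the lower edge f3.
print(half_cycle_frequency(1, 44100))  # 22050.0
print(half_cycle_frequency(2, 44100))  # 11025.0
```

Under this reading, a sample number threshold of 2 admits only half-cycles with a count of 1, which is why the count alone can place a component somewhere in 11.025 kHz to 22.050 kHz but no more precisely.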

The signal analysis unit 124 therefore judges, for the plurality of samples for which the number of samples from an arbitrary extreme value to the next extreme value is less than the sample number threshold, that is, the samples falling in the range at or above frequency f3 (in practice, the range from f3 to f4), whether the inter-extreme difference value exceeds the extreme value level threshold. The signal analysis unit 124 then derives, for example over a predetermined period, the occupancy rate of samples whose inter-extreme difference value exceeds the extreme value level threshold among all the samples for which the number of samples from an arbitrary extreme value to the next extreme value is less than the sample number threshold. Specifically, among the half-cycles of the digital audio signal for which the number of samples from an arbitrary extreme value to the next extreme value is less than 2, the signal analysis unit 124 derives the occupancy rate of half-cycles whose inter-extreme difference value exceeds the extreme value level threshold.

As a comparison of the cross-hatched regions in FIGS. 5(b) and 5(c) shows, for the frequency components in the range f3 to f4, where the number of samples from an arbitrary extreme value to the next extreme value is less than the sample number threshold, the proportion whose inter-extreme difference value exceeds the extreme value level threshold is higher for the digital audio signal of FIG. 5(b), which contains the high-frequency component produced by sound quality improvement processing. The signal analysis unit 124 therefore judges whether the digital audio signal has already undergone sound quality improvement processing by comparing the occupancy rate of samples whose inter-extreme difference value exceeds the extreme value level threshold with a preset predetermined ratio (for example, 50%).
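The occupancy-rate judgment can be sketched as below. This is an illustrative sketch under stated assumptions: half-cycles are given as `(sample_count, inter_extreme_difference)` pairs, the function names are hypothetical, and an empty set of short half-cycles is treated as an occupancy rate of 0, a case the description does not address.

```python
def occupancy_rate(half_cycles, sample_number_threshold, level_threshold):
    """Share of short half-cycles whose inter-extreme difference exceeds the level threshold."""
    short = [diff for count, diff in half_cycles if count < sample_number_threshold]
    if not short:
        return 0.0  # assumption: no short half-cycles means nothing to judge
    loud = sum(1 for diff in short if diff > level_threshold)
    return loud / len(short)


def judged_enhanced_by_occupancy(half_cycles, sample_number_threshold,
                                 level_threshold, ratio_threshold=0.5):
    """True if the occupancy rate exceeds the predetermined ratio (50% in the example)."""
    rate = occupancy_rate(half_cycles, sample_number_threshold, level_threshold)
    return rate > ratio_threshold


processed = [(1, 300), (1, 200), (1, 50), (3, 900)]   # 2 of 3 short half-cycles are loud
unprocessed = [(1, 300), (1, 50), (1, 40), (1, 10)]   # 1 of 4 short half-cycles is loud
print(judged_enhanced_by_occupancy(processed, 2, 128))
print(judged_enhanced_by_occupancy(unprocessed, 2, 128))
```

Note that the long half-cycle `(3, 900)` is excluded from both numerator and denominator, matching the definition of the occupancy rate as a ratio taken only over samples below the sample number threshold.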

Thus, depending on the combination of the sampling frequency of the digital audio signal and the frequency band of the high-frequency component, it may be impossible to judge from the samples of a given half-cycle alone whether that half-cycle contains a high-frequency component. Even in that case, the signal analysis unit 124 of the present embodiment analyzes samples across a plurality of cycles and can thereby reliably judge whether the digital audio signal over those cycles has already undergone sound quality improvement processing.

Next, the means by which the signal analysis unit 124 determines the sample number threshold and the extreme value level threshold based on the format of the digital audio signal is described.

FIG. 6 is an explanatory diagram showing an example of the relationship between the format of the digital audio signal and the sample number threshold and extreme value level threshold. As shown in FIG. 6, the signal analysis unit 124 determines the sample number threshold, which applies to the number of samples from an arbitrary extreme value to the next extreme value, and the extreme value level threshold according to the format of the digital audio signal (for example, AAC (registered trademark), HE-AAC, or MP3) and the bit rate. In FIG. 6, the extreme value level threshold is expressed as the multiple (for example, 128) of the LSB (Least Significant Bit), the quantization unit of AD conversion, that results when the sound pressure corresponding to the threshold is AD-converted.

As shown in FIG. 6, the format of the digital audio signal determines the frequency band removed by digitization, compression processing, and the like, and hence the frequency band that may have been added by sound quality improvement processing. Therefore, once the format and bit rate of the digital audio signal are fixed, the frequency of the digital audio signal is derived from the sampling frequency and the number of samples from an arbitrary extreme value to the next extreme value. The sound pressure level of the noise signal contained in the digital audio signal is likewise determined by the number of quantization bits, the bit rate, and the like. By determining the sample number threshold and the extreme value level threshold based on the format, the signal analysis unit 124 can therefore reliably judge whether sound quality improvement processing has already been applied.

The table selection unit 126 selects one coefficient table from a plurality of coefficient tables used for selecting a coefficient, according to whether the signal analysis unit 124 has judged that the digital audio signal contains a high-frequency component. When the digital audio signal contains a high-frequency component, the table selection unit 126 selects a coefficient table containing smaller coefficients than when it does not. The correction value generation unit 130, described later, generates a correction value based on the coefficients contained in this coefficient table.

When the digital audio signal contains a high-frequency component, it can be judged that sound quality improvement processing has already been applied. For a digital audio signal that has undergone sound quality improvement processing, the table selection unit 126 selects a coefficient table containing smaller coefficients, thereby suppressing the correction amount and reliably avoiding excessive sound quality improvement processing.

The coefficient storage unit 128 is composed of a RAM (Random Access Memory), an EEPROM, a nonvolatile RAM, a flash memory, an HDD (Hard Disk Drive), or the like, and stores a first coefficient table group and a second coefficient table group in advance.

FIG. 7 is an explanatory diagram for explaining the coefficient table groups and the coefficient tables. As shown in FIGS. 7(a) and 7(b), in the coefficient table groups (first coefficient table group 166 and second coefficient table group 168), coefficients corresponding to the number of samples are set, for example, for each format (standard) of the digital audio signal. These appear as the columns for the coefficients of coefficient table A, coefficient table B, and so on in the first coefficient table group 166, and as the columns for the coefficients of coefficient table A', coefficient table B', and so on in the second coefficient table group 168. For ease of understanding, the coefficients of coefficient tables A, B, A', B', and so on are shown here as columns of the coefficient table groups; more precisely, in each of these coefficient tables, sample counts and coefficients are associated one to one.

The first coefficient table group 166 shown in FIG. 7(a) is used for digital audio signals that have not undergone sound quality improvement processing, and the second coefficient table group 168 shown in FIG. 7(b) is used for digital audio signals that have. In the present embodiment, the coefficients of the coefficient tables in the second coefficient table group 168 (coefficient table A', coefficient table B', and so on) differ from those of the corresponding coefficient tables in the first coefficient table group 166 (coefficient table A, coefficient table B, and so on); here they are 1/2 of the corresponding values. Coefficient table A and coefficient table A' are used when the sampling frequency of the acquired digital audio signal is, for example, 44.1 kHz, and coefficient table B and coefficient table B' are used when it is, for example, 96 kHz. As shown in FIG. 7, even for digital audio signals of the same sampling frequency, the coefficient table differs according to whether sound quality improvement processing has already been applied.

Alternatively, the first coefficient table group 166 and the second coefficient table group 168 may be combined into a single table. For example, an item (column) of identification information corresponding to whether sound quality improvement processing is applied may be added to the table, and the table selection unit 126 may select a coefficient table based on the identification information in addition to the format of the digital audio signal.

The table selection unit 126 first selects one coefficient table group from the first coefficient table group 166 and the second coefficient table group 168 according to whether the digital audio signal contains a high-frequency component, and then selects, from the selected coefficient table group, the one coefficient table corresponding to the format of the digital audio signal. The correction value generation unit 130, described later, generates a correction value using the coefficient table selected by the table selection unit 126.
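The two-stage selection (group first, then table) can be sketched as a pair of dictionary lookups. The sketch below is illustrative only: the data layout and names are assumptions, the tables are keyed by sampling frequency following the 44.1 kHz example, and only coefficient table A and its halved counterpart A' are populated because the values of the other tables are not given in the text.

```python
from fractions import Fraction

# Coefficient table A as described for FIG. 7(a):
# (largest sample count in the band, coefficient); None marks the open-ended last band.
TABLE_A = [
    (5, Fraction(1, 2)),
    (9, Fraction(1, 4)),
    (14, Fraction(1, 8)),
    (None, Fraction(1, 16)),
]
# Table A' holds half of each coefficient in table A.
TABLE_A_PRIME = [(upper, coeff / 2) for upper, coeff in TABLE_A]

# Hypothetical layout: one dictionary per coefficient table group, keyed by sampling rate in Hz.
FIRST_GROUP = {44100: TABLE_A}          # signal not yet enhanced
SECOND_GROUP = {44100: TABLE_A_PRIME}   # signal already enhanced


def select_table(has_high_frequency, sampling_rate_hz):
    """Pick the group from the analysis result, then the table for the signal format."""
    group = SECOND_GROUP if has_high_frequency else FIRST_GROUP
    return group[sampling_rate_hz]


print(select_table(False, 44100)[0])  # first band of table A
print(select_table(True, 44100)[0])   # first band of table A'
```

Using `Fraction` keeps the halving exact; a fixed-point implementation would more likely use bit shifts, which the power-of-two coefficients suggest but the text does not state.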

In coefficient tables A and B of the first coefficient table group 166 and coefficient tables A' and B' of the second coefficient table group 168 in FIGS. 7(a) and 7(b), the coefficient is smaller for larger sample counts for the following reason. When the number of samples from an arbitrary extreme value to the next extreme value is large, the frequency of the digital audio signal is low. Therefore, even if filtering with, for example, a 22.1 kHz low-pass filter (LPF) has already been applied, the harmonics of that low-frequency component remain unsuppressed. High sound quality can thus be maintained without adding a large high-frequency component, so a small coefficient suffices.

Conversely, when the number of samples from an arbitrary extreme value to the next extreme value is small, the frequency of the digital audio signal is high. Therefore, if filtering with, for example, a 22.1 kHz low-pass filter has already been applied, the harmonics of that high-frequency component have been almost entirely removed. Sound quality cannot be improved unless a sufficient high-frequency component is added, so the coefficient needs to be large.

The table selection unit 126 therefore uses coefficient tables A and B of the first coefficient table group 166 and coefficient tables A' and B' of the second coefficient table group 168, in which coefficients are associated with the number of samples from an arbitrary extreme value to the next extreme value, and selects a coefficient table so that the correction amount is appropriate for the number of samples.

Thus, when sound quality improvement processing has already been applied to the digital audio signal, the table selection unit 126 selects the second coefficient table group 168, whose coefficient tables hold coefficients that are smaller overall than those of the first coefficient table group 166 used when the processing has not yet been applied, and then selects one coefficient table from that group. With this configuration, the sound processing apparatus 100 can apply a correction suited to the frequency band of the digital audio signal and to whether sound quality improvement processing has already been applied.

Through the selection processing described above, an appropriate coefficient table is selected according to whether a high-frequency component is contained. The sound processing apparatus 100 then performs correction processing using the selected coefficient table. In the correction processing, the signal acquisition unit 120 acquires the digital audio signal again, and the processing of the first conversion unit 138 and the extreme value specifying unit 122 is performed. Alternatively, instead of acquiring the digital audio signal again, the signal acquisition unit 120 may temporarily hold it in a buffer unit (not shown). The processing from the signal acquisition unit 120 to the extreme value specifying unit 122 is substantially the same as in the selection processing, except that the first conversion unit 138 also outputs the upsampled digital audio signal to the delay unit 132. Its description is therefore omitted, and the correction processing is described starting from the processing of the correction value generation unit 130.

(Correction processing)
In the correction processing, the correction value generation unit 130 generates, according to a coefficient selected from the coefficient table, a correction value that expands the amplitude of the digital audio signal. Specifically, the correction value generation unit 130 takes the maximum difference value, which is the difference between a local maximum of the digital audio signal and the value of the sample one sample before the sample at that local maximum (hereinafter referred to as the maximum value sample), and multiplies it by the coefficient corresponding to the number of samples in the coefficient table selected by the table selection unit 126, thereby generating a correction value to be added to the maximum value sample. Likewise, it takes the minimum difference value, which is the difference between a local minimum of the digital audio signal and the value of the sample one sample before the sample at that local minimum (hereinafter referred to as the minimum value sample), and multiplies it by the coefficient corresponding to the number of samples in the selected coefficient table, thereby generating a correction value to be subtracted from the minimum value sample.

For example, when the sampling frequency of the digital audio signal acquired by the signal acquisition unit 120 is 44.1 kHz and the signal analysis unit 124 determines that the signal has not previously undergone sound quality improvement processing, the table selection unit 126 selects coefficient table A. The correction value generation unit 130 obtains from the extreme value specifying unit 122 the number of samples from the minimum value sample immediately preceding the maximum value sample to be corrected up to that maximum value sample, selects the one coefficient corresponding to that number of samples from coefficient table A stored in the coefficient storage unit 128, and multiplies the maximum difference value by that coefficient to generate the correction value to be added to the maximum value sample. That is, when coefficient table A is selected by the table selection unit 126, as shown in FIG. 7(a), the correction value generation unit 130 multiplies the maximum difference value by 1/2 if the number of samples is 1 to 5, by 1/4 if it is 6 to 9, by 1/8 if it is 10 to 14, and by 1/16 if it is 15 or more.
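The lookup in coefficient table A can be sketched as follows. This is a minimal illustration in Python; the names `COEFFICIENT_TABLE_A` and `coefficient_for`, and the list-of-bounds layout, are assumptions made for this example and do not appear in the specification. Only the ranges and coefficients are taken from the description of FIG. 7(a).

```python
from fractions import Fraction

# Coefficient table A as described for FIG. 7(a):
# (largest sample count in the range, coefficient).
# The data layout is illustrative only.
COEFFICIENT_TABLE_A = [
    (5, Fraction(1, 2)),    # 1 to 5 samples
    (9, Fraction(1, 4)),    # 6 to 9 samples
    (14, Fraction(1, 8)),   # 10 to 14 samples
]
COEFFICIENT_A_LONG_RUN = Fraction(1, 16)  # 15 samples or more


def coefficient_for(sample_count):
    """Return the single coefficient that table A assigns to a sample count."""
    if sample_count < 1:
        raise ValueError("sample count must be at least 1")
    for upper_bound, coefficient in COEFFICIENT_TABLE_A:
        if sample_count <= upper_bound:
            return coefficient
    return COEFFICIENT_A_LONG_RUN
```

A short distance between extreme values corresponds to a higher-frequency swing, so the table assigns it a larger coefficient and therefore a larger correction.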

When the signal analysis unit 124 determines that the digital audio signal to be corrected has already undergone sound quality improvement processing, the table selection unit 126 selects coefficient table A'. Consequently, for the same number of samples, the correction value generation unit 130 generates the correction value using a smaller coefficient than when the signal analysis unit 124 determines that the signal has not undergone such processing.

Similarly, the correction value generation unit 130 obtains from the extreme value specifying unit 122 the number of samples from the maximum value sample immediately preceding the minimum value sample to be corrected up to that minimum value sample, selects the one coefficient corresponding to that number of samples from coefficient table A, and multiplies the minimum difference value by that coefficient to generate the correction value to be subtracted from the minimum value sample. That is, when coefficient table A is selected by the table selection unit 126, as shown in FIG. 7(a), the correction value generation unit 130 multiplies the minimum difference value by 1/2 if the number of samples is 1 to 5, by 1/4 if it is 6 to 9, by 1/8 if it is 10 to 14, and by 1/16 if it is 15 or more.

The same applies to the correction values for minimum value samples: when the signal analysis unit 124 determines that the digital audio signal to be corrected has already undergone sound quality improvement processing, the table selection unit 126 selects coefficient table A', so that, for the same number of samples, the correction value generation unit 130 generates the correction value using a smaller coefficient than when the signal is judged not to have undergone such processing.

The correction value generation unit 130 then associates the correction value generated from the maximum difference value with the maximum value sample so that it is added to the maximum value, and associates the correction value generated from the minimum difference value with the minimum value sample so that it is subtracted from the minimum value.

The correction value generation unit 130 also generates a correction signal in which the correction values are arranged so that each correction value generated from a maximum difference value is added to the corresponding maximum value sample, and each correction value generated from a minimum difference value is subtracted from the corresponding minimum value sample of the digital audio signal.

The addition unit 134 adds the correction signal generated by the correction value generation unit 130 to the digital audio signal. In this embodiment, the correction values are added to the digital audio signal by adding the correction signal to it.

As a result, in the addition unit 134, the maximum difference value multiplied by the coefficient is added to the maximum value, and the minimum difference value multiplied by the coefficient is subtracted from the minimum value, as expressed by the two formulas below. Let Vmax be the maximum value before the correction signal is added, Vmin the minimum value before the correction signal is added, V′max the maximum value after the correction signal is added, V′min the minimum value after the correction signal is added, dl0 the maximum difference value, and ds0 the minimum difference value, and let Amax and Amin be the coefficients that the correction value generation unit 130 selects, based on the number of samples, from the coefficient table selected by the table selection unit 126. The maximum and minimum values after the correction signal is added are then given by Formula 1 and Formula 2, respectively.
V′max = Vmax + Amax × dl0 (Formula 1)
V′min = Vmin − Amin × ds0 (Formula 2)

FIG. 8 is an explanatory diagram illustrating in more detail the process of adding the correction signal to the digital audio signal. Here, Amax × dl0 = Δdl0 and Amin × ds0 = Δds0. When a digital audio signal 170 such as that shown in FIG. 8(a) is acquired, the correction value generation unit 130 multiplies the maximum difference value dl0 and the minimum difference value ds0 by the coefficients Amax and Amin, respectively, to derive the correction values Δdl0 and Δds0, and generates a correction signal 172 such as that shown in FIG. 8(b). When the addition unit 134 adds the correction signal 172 to the digital audio signal, the corrected digital audio signal 174 has a larger amplitude at the maximum and minimum values, as indicated by the white arrows in FIG. 8(c).
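The generation and addition of the correction signal in FIG. 8 can be sketched as follows, as a minimal illustration under stated assumptions rather than the apparatus's actual implementation. The function names are hypothetical. The sketch assumes that extreme values are strict local maxima and minima, that the difference values dl0 and ds0 are magnitudes, that the number of samples is the index distance from the previously detected extreme value, and that the first extreme value, which has no predecessor, is left uncorrected. The coefficients are those of coefficient table A.

```python
def coefficient_for(sample_count):
    """Coefficient table A as described for FIG. 7(a)."""
    if sample_count <= 5:
        return 0.5
    if sample_count <= 9:
        return 0.25
    if sample_count <= 14:
        return 0.125
    return 0.0625


def build_correction_signal(x):
    """Place +A*dl0 at each maximum and -A*ds0 at each minimum, 0 elsewhere."""
    correction = [0.0] * len(x)
    previous_extremum = None  # index of the last detected extreme value
    for i in range(1, len(x) - 1):
        is_max = x[i - 1] < x[i] > x[i + 1]
        is_min = x[i - 1] > x[i] < x[i + 1]
        if not (is_max or is_min):
            continue
        if previous_extremum is not None:
            a = coefficient_for(i - previous_extremum)
            difference = abs(x[i] - x[i - 1])  # dl0 or ds0 as a magnitude
            # Formula 1 adds at a maximum; Formula 2 subtracts at a minimum.
            correction[i] = a * difference if is_max else -a * difference
        previous_extremum = i
    return correction


def apply_correction(x):
    """Add the correction signal to the signal sample by sample."""
    return [s + c for s, c in zip(x, build_correction_signal(x))]
```

For a triangle-like input such as `[0, 2, 4, 2, 0, -2, -4, -2, 0, 2, 4, 2, 0]`, the minimum at index 6 and the maximum at index 10 are each four samples from the preceding extreme value, so the coefficient is 1/2 and the peaks move from -4 to -5 and from 4 to 5.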

As described above, a digital audio signal from which high-frequency components have been cut may lack, for example, extension, spaciousness, dynamic range, and luster. The correction value generation unit 130 and the addition unit 134 of this embodiment compensate for the cut high-frequency components through the simple process of increasing the absolute values of the maximum and minimum values of the digital audio signal, and can therefore generate a digital audio signal closer to the original sound.

Next, another example of the correction values generated by the correction value generation unit 130 is described. The correction value generation unit 130 generates correction values by multiplying the difference between the maximum value and each of the values of the samples one sample before and one sample after the maximum value by a coefficient from the coefficient table selected by the table selection unit 126, and by multiplying the difference between the minimum value and each of the values of the samples one sample before and one sample after the minimum value by a coefficient from the selected coefficient table.

The correction value generation unit 130 then associates each correction value with its sample. The correction value generated from the difference between the maximum value and the value of the sample one sample before it is associated with that preceding sample so that it is added to that sample's value, and the correction value generated from the difference between the maximum value and the value of the sample one sample after it is associated with that following sample so that it is added to that sample's value. The correction value generated from the difference between the minimum value and the value of the sample one sample before it is associated with that preceding sample so that it is subtracted from that sample's value, and the correction value generated from the difference between the minimum value and the value of the sample one sample after it is associated with that following sample so that it is subtracted from that sample's value.

Here, correction values are generated for the samples one sample before and one sample after an extreme value, but the process is not limited to this case; correction values may also be generated for samples two or more before and two or more after the extreme value. In that case, the maximum-neighbor and minimum-neighbor difference values are the difference between the value of the sample for which the correction value is generated and the value of the sample one sample closer to the extreme value. This processing of the correction value generation unit 130 is described below with reference to FIGS. 9 and 10.

In FIG. 9, as in FIG. 8(a), the sampling frequency of the digital audio signal acquired by the signal acquisition unit 120 is 44.1 kHz and the signal analysis unit 124 has determined that the signal has not previously undergone sound quality improvement processing, so the table selection unit 126 selects coefficient table A. When a digital audio signal 170 such as that shown in FIG. 9(a) is acquired, the correction value generation unit 130 obtains from the extreme value specifying unit 122 the number of samples from the minimum value sample immediately preceding the maximum value sample to which the maximum-neighbor sample to be corrected corresponds, up to that maximum value sample. The correction value generation unit 130 then obtains the one coefficient corresponding to that number of samples from coefficient table A stored in the coefficient storage unit 128 and multiplies dl1 and dl2, the differences between the values of the maximum-neighbor samples and the maximum value (maximum-neighbor difference values), by that coefficient to generate the correction values Δdl1 and Δdl2, as shown in FIG. 9(b).

Similarly, the correction value generation unit 130 obtains from the extreme value specifying unit 122 the number of samples from the maximum value sample immediately preceding the minimum value sample to which the minimum-neighbor sample to be corrected corresponds, up to that minimum value sample. It selects the one coefficient corresponding to that number of samples from coefficient table A stored in the coefficient storage unit 128 and multiplies ds1 and ds2, the differences between the values of the minimum-neighbor samples and the minimum value (minimum-neighbor difference values), by that coefficient to generate the correction values Δds1 and Δds2, as shown in FIG. 9(b).

The correction value generation unit 130 then generates a correction signal 176 by associating Δdl1 with the sample one sample before the maximum value sample, Δdl2 with the sample one sample after the maximum value sample, Δds1 with the sample one sample before the minimum value sample, and Δds2 with the sample one sample after the minimum value sample, and the addition unit 134 adds the correction signal to the digital audio signal. The correction values Δdl1 and Δdl2 generated from the maximum-neighbor difference values dl1 and dl2 are thereby added to the values of the corresponding maximum-neighbor samples, and the correction values Δds1 and Δds2 generated from the minimum-neighbor difference values ds1 and ds2 are subtracted from the values of the corresponding minimum-neighbor samples. As a result, the waveform of the corrected digital audio signal 178 approaches a rectangular wave, as indicated by the white arrows in FIG. 9(c).
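The neighbor correction of FIG. 9 can be sketched in the same way. This is an illustrative sketch only; the function names are hypothetical, and it assumes strict local extreme values, a sample count measured as the index distance from the previously detected extreme value, and no correction around the first extreme value. Because the difference is taken as the extreme value minus the neighbor value, one expression covers both cases: it is positive around a maximum (the correction is added) and negative around a minimum (the magnitude is subtracted).

```python
def coefficient_for(sample_count):
    """Coefficient table A as described for FIG. 7(a)."""
    if sample_count <= 5:
        return 0.5
    if sample_count <= 9:
        return 0.25
    if sample_count <= 14:
        return 0.125
    return 0.0625


def correct_neighbors(x):
    """Pull the samples just before and after each extreme value toward it."""
    correction = [0.0] * len(x)
    previous_extremum = None
    for i in range(1, len(x) - 1):
        is_max = x[i - 1] < x[i] > x[i + 1]
        is_min = x[i - 1] > x[i] < x[i + 1]
        if not (is_max or is_min):
            continue
        if previous_extremum is not None:
            a = coefficient_for(i - previous_extremum)
            for j in (i - 1, i + 1):
                # x[i] - x[j] is dl1/dl2 at a maximum and -ds1/-ds2 at a minimum.
                correction[j] += a * (x[i] - x[j])
        previous_extremum = i
    return [s + c for s, c in zip(x, correction)]
```

Since every coefficient in table A is at most 1/2, a neighbor moves no more than halfway toward its extreme value, which flattens the top of each peak without overshooting it.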

The correction value generation unit 130 and the addition unit 134 of this embodiment bring the waveform closer to a rectangular wave through the simple process of increasing the absolute values of the maximum-neighbor and minimum-neighbor samples of the digital audio signal, and can therefore generate a digital audio signal closer to the original sound.

Correction values may also be generated and applied for all of the maximum value samples, minimum value samples, maximum-neighbor samples, and minimum-neighbor samples. Suppose that the maximum difference value and the maximum-neighbor difference values before and after the maximum value sample are all equal to dl3, and that the minimum difference value and the minimum-neighbor difference values before and after the minimum value sample are all equal to ds3. In that case, when a digital audio signal 170 such as that shown in FIG. 10(a) is acquired, the correction value generation unit 130 generates a correction signal 180 by associating the correction value Δdl3 with the maximum value sample and its preceding and following maximum-neighbor samples, and the correction value Δds3 with the minimum value sample and its preceding and following minimum-neighbor samples, as shown in FIG. 10(b), and the addition unit 134 adds the correction signal 180 to the digital audio signal. The resulting corrected digital audio signal 182, indicated by the white arrows in FIG. 10(c), combines the effects of the corrections described with reference to FIGS. 8 and 9, making it possible to reproduce a sound still closer to the original.

In this embodiment, the correction value generation unit 130 generates the correction signal and the addition unit 134 adds it to the digital audio signal. The configuration is not limited to this, however: the correction value generation unit 130 may output only the correction values to the addition unit 134 without generating a correction signal, and the addition unit 134 may adjust the sign and timing of the correction values and add them to the digital audio signal.

The delay unit 132 delays the digital audio signal input from the signal acquisition unit 120 by the total processing time of the functional units from the first conversion unit 138 to the second conversion unit 140, thereby synchronizing it with the digital audio signal that has passed through those functional units.

As described above, a digital audio signal that has undergone sound quality improvement processing contains high-frequency components. The signal analysis unit 124 performs frequency analysis of the digital audio signal and determines whether sound quality improvement processing has been applied by identifying the presence or absence of high-frequency components. The table selection unit 126 then selects coefficients appropriate to whether sound quality improvement processing has already been applied, and the correction value generation unit 130 generates the correction signal. With this configuration, the sound processing apparatus 100 applies sound quality improvement processing suited to whether the acquired digital audio signal has already undergone such processing, preserves the balance of the mid-to-high range, and can generate a digital audio signal closer to the original sound.

Next, the operation of the first conversion unit 138 and the second conversion unit 140 is described. By changing the sampling frequency, the first conversion unit 138 can secure a band in which to add high-frequency components.

The first conversion unit 138 up-samples the digital audio signal acquired by the signal acquisition unit 120 and outputs it to the extreme value specifying unit 122, the correction value generation unit 130, and the delay unit 132. The second conversion unit 140 down-samples the digital audio signal, after the addition unit 134 has added the correction signal, to the sampling frequency it had before being up-sampled by the first conversion unit 138.

FIG. 11 is an explanatory diagram illustrating the processing of the first conversion unit 138. For the digital audio signal with sampling frequency fs shown in FIG. 11(a), if the correction process adds frequency components at or above one half of the sampling frequency to the digital audio signal, aliasing noise may occur.

Therefore, as shown in FIG. 11(b), the first conversion unit 138 up-samples the digital audio signal to twice its sampling frequency. High-frequency components (hatched in FIG. 11(c)) are then added to this signal as described above, as shown in FIG. 11(c). The second conversion unit 140 then down-samples the corrected digital audio signal back to the original sampling frequency, which suppresses the occurrence of aliasing noise.
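The order of operations in FIG. 11 (up-sample, correct, down-sample) can be sketched as follows. The specification does not state how the conversion units interpolate or filter, so the linear interpolation and the 3-tap low-pass filter used here are assumptions chosen only to keep the sketch short; the function names and the `correct` callback are likewise hypothetical.

```python
def upsample_2x(x):
    """Double the sampling rate by linear interpolation (assumed method)."""
    out = []
    for i, sample in enumerate(x):
        following = x[i + 1] if i + 1 < len(x) else sample
        out.append(float(sample))
        out.append((sample + following) / 2.0)
    return out


def downsample_2x(y):
    """Halve the rate: smooth with (1/4, 1/2, 1/4), then keep every 2nd sample.

    The smoothing is a stand-in for a proper anti-aliasing filter.
    """
    n = len(y)
    smoothed = [
        0.25 * y[max(i - 1, 0)] + 0.5 * y[i] + 0.25 * y[min(i + 1, n - 1)]
        for i in range(n)
    ]
    return smoothed[::2]


def process(x, correct):
    """Up-sample, apply a correction function at the higher rate, down-sample."""
    return downsample_2x(correct(upsample_2x(x)))
```

Running the correction at twice the sampling frequency leaves headroom between fs/2 and fs for the components it adds, so they can be attenuated before the rate is reduced instead of folding back into the audible band.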

(Computer 200)
The sound processing apparatus 100 described above can be implemented using a computer. An example of implementing the sound processing apparatus 100 on a computer is described below.

FIG. 12 is a functional block diagram showing a typical example of a computer (information processing apparatus) 200 capable of performing the sound quality improvement processing of the sound processing apparatus 100. The computer 200 includes a central processing unit 210, a temporary storage device 212, an external storage device 214, an acquisition unit 216, and an output unit 218.

The central processing unit (CPU) 210 controls the entire computer 200 according to programs and applications in the temporary storage device 212 and the external storage device 214. The temporary storage device 212 is composed of RAM, EEPROM, nonvolatile RAM, or the like, and temporarily stores the digital audio signals and other data processed by the central processing unit 210. The external storage device 214 is composed of flash memory, an HDD, or the like, and stores the programs executed by the central processing unit 210. The acquisition unit 216 acquires a digital audio signal and has the temporary storage device 212 hold it temporarily. The output unit 218 outputs the digital audio signal corrected by the computer 200 to the playback device 110 or the like.

By executing a program, the central processing unit 210 functions as the extreme value specifying unit 122, the signal analysis unit 124, the table selection unit 126, the correction value generation unit 130, the delay unit 132, the addition unit 134, the first conversion unit 138, and the second conversion unit 140. This embodiment therefore also provides a sound processing program that causes the computer 200 to execute a step of determining whether the digital audio signal contains a high-frequency component at or above a predetermined frequency and at or above a predetermined sound pressure, a step of generating a correction value that expands the amplitude of the digital audio signal based on a coefficient that differs depending on whether the digital audio signal contains the high-frequency component, and a step of adding the correction value to the digital audio signal. The program may be read from a storage medium and loaded into the computer 200, or may be loaded into the computer 200 via the communication network 106.

(Sound processing method)
Next, a sound processing method is described in which the sound processing apparatus 100 described above analyzes a digital audio signal and corrects the digital audio signal using the analysis result.

FIGS. 13 and 14 are flowcharts showing the overall flow of the sound processing method. The method is divided here into two main parts: the process of selecting a coefficient table is described with reference to FIG. 13, and the correction process based on the selected coefficient table is described with reference to FIG. 14.

When the coefficient table selection process starts, counting of a predetermined period (not shown) begins. Next, as shown in FIG. 13, when the signal acquisition unit 120 acquires a digital audio signal (YES in S300) and the extreme value specifying unit 122 identifies the maximum and minimum values, which are the extreme values of the acquired digital audio signal (YES in S302), the extreme value specifying unit 122 counts the number of samples from one extreme value to the next (S304) and derives the inter-extreme difference value (the maximum difference value and the minimum difference value), which is the difference between the maximum value and the minimum value (S306).

The signal analysis unit 124 then determines whether the number of samples from one extreme value to the next is smaller than the sample count threshold (S308). If it is (YES in S308), the signal analysis unit 124 determines whether the inter-extreme difference value, which is the difference between the maximum value and the minimum value, exceeds the extreme level threshold (S310). If it does (YES in S310), the signal analysis unit 124 determines that the digital audio signal contains a high-frequency component, and the table selection unit 126 selects a coefficient table from the second coefficient table group 168 based on the format of the digital audio signal (S312). Thus, when a coefficient table is selected from the second coefficient table group 168, the selection process ends without waiting for the predetermined period to elapse.

When the signal acquisition unit 120 has not acquired a digital audio signal (NO in S300), when the extreme value specifying unit 122 has not yet identified the maximum and minimum values of the acquired digital audio signal (NO in S302), when the number of samples from one extreme value to the next is equal to or greater than the sample count threshold (NO in S308), or when the inter-extreme difference value is equal to or less than the extreme level threshold (NO in S310), the table selection unit 126 determines whether the predetermined period has elapsed (S314). If it has not (NO in S314), processing repeats from the signal acquisition step (S300). If it has (YES in S314), the table selection unit 126 selects a coefficient table from the first coefficient table group 166 based on the format of the digital audio signal (S316). In this way, a coefficient table is selected from the first coefficient table group 166 if the number of samples and the inter-extreme difference value do not satisfy the predetermined conditions within the predetermined period.
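The decision logic of steps S308 to S316 can be sketched as follows. The function name, argument names, and return strings are hypothetical, and the threshold values in the usage note are placeholders: the specification does not give concrete thresholds in this section. The finite sequence of measurements stands in for everything observed during the predetermined period.

```python
def select_table_group(measurements, sample_count_threshold, level_threshold):
    """Choose a coefficient table group from extreme-value measurements.

    `measurements` yields (sample_count, inter_extreme_difference) pairs
    observed during the predetermined period.
    """
    for sample_count, difference in measurements:
        # S308: extreme values close together. S310: swing large enough.
        if sample_count < sample_count_threshold and difference > level_threshold:
            # High-frequency component found: stop early (S312).
            return "second coefficient table group 168"
    # Period elapsed without meeting both conditions (S314 -> S316).
    return "first coefficient table group 166"
```

For example, with a sample count threshold of 3 and a level threshold of 0.1, the measurements `[(10, 0.5), (2, 0.3)]` select the second group at the second pair, while `[(10, 0.5), (2, 0.01)]` fall through to the first group.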

When the table selection unit 126 finishes selecting the coefficient table in the second coefficient table group selection step (S312) or the first coefficient table group selection step (S316), processing moves to the correction process shown in FIG. 14.

As shown in FIG. 14, in the correction process, once the signal acquisition unit 120 acquires the digital audio signal (YES in S300), the signal acquisition step (S300) is repeated until the extreme value specifying unit 122 identifies the maximum and minimum values, which are the extreme values of the acquired digital audio signal (NO in S302). When the maximum and minimum values are identified (YES in S302), the extreme value specifying unit 122 counts the number of samples from one extreme value to the next (S304). The digital audio signal in FIG. 14 is the same as the digital audio signal in FIG. 13.

The correction value generation unit 130 then uses the coefficient table selected by the table selection unit 126 in the second coefficient table group selection step (S312) or the first coefficient selection step (S316) to select one coefficient corresponding to the number of samples of the digital audio signal (S330).
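As a rough illustration of S330, a coefficient table can be modelled as a mapping from the number of samples to a coefficient. The table values, the name `select_coefficient`, and the fallback of 0.0 (no correction) for a sample count missing from the table are all assumptions for the sake of the example.

```python
# Hypothetical coefficient table: key is the number of samples between
# extreme values, value is the coefficient. The numbers are made up.
COEFF_TABLE = {2: 0.5, 3: 0.4, 4: 0.3}


def select_coefficient(table, sample_count, default=0.0):
    """Pick the one coefficient that corresponds to the sample count."""
    return table.get(sample_count, default)


known = select_coefficient(COEFF_TABLE, 3)
unknown = select_coefficient(COEFF_TABLE, 9)
print(known, unknown)  # 0.4 0.0
```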

The correction value generation unit 130 multiplies the maximum difference value, which is the difference between a maximum value of the digital audio signal and the value of the sample one sample before the maximum value sample, by the coefficient from the coefficient table selected by the table selection unit 126, thereby generating a correction value to be added to the maximum value sample. Likewise, it multiplies the minimum difference value, which is the difference between a minimum value of the digital audio signal and the value of the sample one sample before the minimum value sample, by the coefficient from the selected coefficient table, thereby generating a correction value to be subtracted from the minimum value sample (S332). It then associates the correction value generated from the maximum difference value with the maximum value sample so that it is added to the maximum value, and associates the correction value generated from the minimum difference value with the minimum value sample so that it is subtracted from the minimum value (S334).
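The computation in S332 and S334 can be sketched as below. The sketch assumes that each difference value is taken as a magnitude, so that adding at a maximum and subtracting at a minimum both expand the amplitude. The function name, the signed-dictionary representation, and the coefficient 0.5 are hypothetical.

```python
def correction_values(samples, extrema, coeff):
    """Return {index: signed correction} for each extreme value sample.

    extrema: list of (index, kind) with kind "max" or "min".
    A positive entry is added to a maximum; a negative entry realises the
    subtraction from a minimum.
    """
    corrections = {}
    for index, kind in extrema:
        # Difference to the sample one sample before, as a magnitude (assumption).
        difference = abs(samples[index] - samples[index - 1])
        value = coeff * difference
        corrections[index] = value if kind == "max" else -value
    return corrections


samples = [0, 3, 5, 2, -4, -1, 6, 1]
extrema = [(2, "max"), (4, "min"), (6, "max")]
corrections = correction_values(samples, extrema, coeff=0.5)
print(corrections)  # {2: 1.0, 4: -3.0, 6: 3.5}
```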

Next, the correction value generation unit 130 generates a correction signal in which the correction values are arranged so that, in the addition unit 134, each correction value generated from a maximum difference value is synchronized with the maximum value of the digital audio signal with which it is associated, and each correction value generated from a minimum difference value is synchronized with the minimum value of the digital audio signal with which it is associated (S336). The addition unit 134 adds the digital audio signal and the correction signal (S338), the signal output unit 136 outputs the digital audio signal to which the correction signal has been added (S340), and processing returns to the signal acquisition step (S300 shown in FIG. 14).
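Steps S336 and S338 amount to placing each correction value at the position of its extreme value and adding the two signals sample by sample. A minimal sketch follows, assuming the corrections are held as a hypothetical `{index: signed value}` mapping in which negative values stand for the subtraction at minimum values; the function names are likewise illustrative.

```python
def build_correction_signal(length, corrections):
    """Place each correction at its sample position; zero elsewhere (S336)."""
    correction_signal = [0.0] * length
    for index, value in corrections.items():
        correction_signal[index] = value
    return correction_signal


def add_signals(samples, correction_signal):
    """Add the digital audio signal and the correction signal (S338)."""
    return [s + c for s, c in zip(samples, correction_signal)]


samples = [0, 3, 5, 2, -4, -1, 6, 1]
corrections = {2: 1.0, 4: -3.0, 6: 3.5}  # example values, signed
correction_signal = build_correction_signal(len(samples), corrections)
output = add_signals(samples, correction_signal)
print(output)  # [0.0, 3.0, 6.0, 2.0, -7.0, -1.0, 9.5, 1.0]
```

In this example the maximum values grow (5 to 6.0, 6 to 9.5) and the minimum value falls (-4 to -7.0), while every other sample is unchanged.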

The audio processing method described above can likewise distinguish a digital audio signal that has undergone sound quality improvement processing from an ordinary digital audio signal, and performs the correction process using coefficients suited to each, so that a digital audio signal closer to the original sound can be generated.

Preferred embodiments of the present invention have been described above with reference to the accompanying drawings, but it goes without saying that the present invention is not limited to these embodiments. It is clear that a person skilled in the art could conceive of various changes or modifications within the scope set out in the claims, and these are understood to belong naturally to the technical scope of the present invention.

Note that the steps of the audio processing method in this specification need not be processed in time series in the order shown in the flowcharts, and may include processing performed in parallel or by subroutines.

The present invention can be used in an audio processing apparatus, an audio processing method, and an audio processing program that analyze a digital audio signal and process the digital audio signal according to the analysis result.

Description of symbols
100 ... Audio processing apparatus
120 ... Signal acquisition unit
124 ... Signal analysis unit
126 ... Table selection unit
128 ... Coefficient storage unit
130 ... Correction value generation unit
134 ... Addition unit
136 ... First conversion unit
138 ... Second conversion unit
200 ... Computer

Claims

An audio processing apparatus comprising:
a signal analysis unit that determines whether a digital audio signal includes a high-frequency component at or above a predetermined frequency and at or above a predetermined sound pressure;
a correction value generation unit that generates a correction value for expanding the amplitude of the digital audio signal, based on a coefficient that differs depending on whether the signal analysis unit has determined that the digital audio signal includes the high-frequency component; and
an addition unit that adds the correction value to the digital audio signal.

The audio processing apparatus according to claim 1, wherein the signal analysis unit determines that the digital audio signal includes the high-frequency component when the number of samples from an arbitrary extreme value of the digital audio signal to the next extreme value is smaller than a predetermined number and a difference value between extreme values exceeds a predetermined value, the difference value between extreme values being the difference between the maximum value, which is the larger of the arbitrary extreme value and the next extreme value, and the minimum value, which is the smaller of the two.

The audio processing apparatus according to claim 1, wherein, among a plurality of samples for which the number of samples from an arbitrary extreme value of the digital audio signal to the next extreme value is smaller than a predetermined number, the signal analysis unit determines that the digital audio signal includes the high-frequency component when the proportion of samples for which a difference value between extreme values exceeds a predetermined value is greater than a predetermined ratio, the difference value between extreme values being the difference between the maximum value, which is the larger of the arbitrary extreme value and the next extreme value, and the minimum value, which is the smaller of the two.

The audio processing apparatus according to any one of claims 1 to 3, wherein, when the signal analysis unit determines that the digital audio signal includes the high-frequency component, the correction value generation unit generates the correction value based on a smaller coefficient than when the signal analysis unit determines that the digital audio signal does not include the high-frequency component.

The audio processing apparatus according to claim 1, wherein the correction value generation unit generates the correction value based on a coefficient that differs according to a format of the digital audio signal.

The audio processing apparatus according to claim 2, wherein the signal analysis unit determines the predetermined number and the predetermined value based on a format of the digital audio signal.

The audio processing apparatus according to claim 1, further comprising:
a first conversion unit that upsamples the digital audio signal acquired by the audio processing apparatus; and
a second conversion unit that downsamples the digital audio signal, after the addition unit has added the correction value, to the sampling frequency it had before being upsampled by the first conversion unit.

The audio processing apparatus according to claim 1, wherein the correction value generation unit generates the correction value by multiplying the coefficient by a maximum difference value, which is the difference between a maximum value of the digital audio signal and the value of the sample one sample before the sample having the maximum value, and by multiplying the coefficient by a minimum difference value, which is the difference between a minimum value of the digital audio signal and the value of the sample one sample before the sample having the minimum value; associates the correction value generated from the maximum difference value with the sample having the maximum value so that the correction value is added to the maximum value; and associates the correction value generated from the minimum difference value with the sample having the minimum value so that the correction value is subtracted from the minimum value.

The audio processing apparatus according to claim 1, wherein the correction value generation unit:
generates the correction values by multiplying by the coefficient the difference between a maximum value of the digital audio signal and the value of each of the samples one sample before and one sample after the maximum value, and by multiplying by the coefficient the difference between a minimum value of the digital audio signal and the value of each of the samples one sample before and one sample after the minimum value;
associates the correction value generated from the difference between the maximum value and the value of the sample one sample before the maximum value with that preceding sample, so that the correction value is added to the value of that sample;
associates the correction value generated from the difference between the maximum value and the value of the sample one sample after the maximum value with that following sample, so that the correction value is added to the value of that sample;
associates the correction value generated from the difference between the minimum value and the value of the sample one sample before the minimum value with that preceding sample, so that the correction value is subtracted from the value of that sample; and
associates the correction value generated from the difference between the minimum value and the value of the sample one sample after the minimum value with that following sample, so that the correction value is subtracted from the value of that sample.

An audio processing method comprising:
determining whether a digital audio signal includes a high-frequency component at or above a predetermined frequency and at or above a predetermined sound pressure;
generating a correction value for expanding the amplitude of the digital audio signal, based on a coefficient that differs depending on whether the digital audio signal includes the high-frequency component; and
adding the correction value to the digital audio signal.

An audio processing program that causes a computer to execute:
determining whether a digital audio signal includes a high-frequency component at or above a predetermined frequency and at or above a predetermined sound pressure;
generating a correction value for expanding the amplitude of the digital audio signal, based on a coefficient that differs depending on whether the digital audio signal includes the high-frequency component; and
adding the correction value to the digital audio signal.