JP2024003855A

JP2024003855A - Sound quality generation means and acoustic data generation means

Info

Publication number: JP2024003855A
Application number: JP2022103159A
Authority: JP
Inventors: 裕児玉; Yutaka Kodama; 純一角元; Junichi Kakumoto
Original assignee: Individual
Current assignee: Individual
Priority date: 2022-06-28
Filing date: 2022-06-28
Publication date: 2024-01-16
Anticipated expiration: 2042-06-28
Also published as: JP7179250B1

Abstract

PROBLEM TO BE SOLVED: The part of a system that extracts sound quality operates in a complex and restricted environment, so it is technically difficult to express an intended sound quality freely. In particular, the final stage of sound quality tuning becomes a search for compromises among various constraints.

SOLUTION: In the process of creating sound quality, data from which a plurality of sound quality components have been extracted are prepared in advance for each unit of acoustic data, without depending on filter design or parameter selection. Each piece of sound quality component data of a given unit of acoustic data is multiplied by its own coefficient, and the products are added to synthesize acoustic data of the desired sound quality.

SELECTED DRAWING: Figure 1

Description

Digital acoustic signal processing technology
Human hearing, the words used to express auditory impressions, the signal processing functions corresponding to those words, and the ambiguity of the relationships among them
Smartphone acoustic signal processing environments
Internet communication environments
Sound quality expression technology in the field known as virtual space

The terms defined in the claims apply.
"APK" refers to an application program.

Matters related to the background art (1)
Sound quality evaluation depends on human sensibility, and sensibility differs from one evaluator to another. Tuning a purely theory-based calculation process to suit human sensibility is a notoriously difficult task for engineers.
Because there are limits to expressing sensibility-based sound quality evaluation in objective numerical values, in most cases today sound quality is decided by the sensibility of experienced experts.
Manufacturers responsible for product production often finish the sound quality of their own products by comparing them with, and matching them to, the target products in the target market.

Matters related to the background art (2)
An environment in which the process of planning and developing products in light of human sensibility regarding sound quality, and APK development work in particular, can proceed without assigning advanced acoustic signal processing engineers is also a latent need among those in the industry.

It is publicly known that sound quality can be generated by adding independent sound quality components.
With this method, however, the filters must be designed with properties subtly different from those of the filters explained in standard reference books. In addition, because the outputs of multiple filters are added together, the final output is hard to follow when written as a formula.
The method is not in general use.
There is no particular literature to cite.

Terms defined in the claims have the same meaning in the description below.

There are two problems to be solved.
The first problem is as follows.
When finishing a sound to a satisfactory quality, there is often no choice but to rely on cut-and-try, which makes accurate process estimation difficult.
Because the signal processing targets human sensibility, skill is required in programming techniques, GUI design, methods of selecting parameter variations, and the like that make it easy to set criteria for sound quality evaluation and to cope with variation in the evaluation results.

The second problem is as follows.
In recent years the expression of AV content has diversified. For video, there is no problem at all in producing visual effects by applying nonlinear processing. Audio, by contrast, must be processed linearly except in very special cases. The requirement of linear signal processing places restrictions on the signal processing and on the way it is programmed.
Furthermore, unlike vision, hearing involves individual differences in sensibility, and it is difficult for the members working on a project to share an accurate evaluation. It is also troublesome to modify the signal processing every time a different kind of content expression must be supported.

The essence of the present proposal is to divide the sound quality finishing process into two independent tasks.
In the first task, means for extracting the sound quality components of multiple types of original acoustic data are prepared in advance. This task is handled by a person highly skilled in acoustic signal processing. Because the means for extracting sound quality components are common to acoustic data in general, they can be given sound quality names and then standardized and shared.
In the second task, the target sound quality is finished by combining the prepared sound quality components. The target sound quality can be created by the simple method of combining the sound quality component data extracted by the extraction means.
Alternatively, the choice of sound quality can be left to the listener's preference.

The present invention takes little time and effort and, although simple, provides a sound quality generation environment with little discrepancy between sound quality names and perception, sophisticated sound quality expression, and a high degree of freedom.
Regarding the first problem:
Although the method is simple enough for any ordinary programmer to write, an advanced sound quality control mechanism can be designed to match the image of the sound quality to be created.
Unstable phenomena such as oscillation of the filter itself when the sound quality is adjusted, or overflow of the dynamic range in the course of calculation, do not occur at all. As for noise generated during adjustment, the only measure needed is smoothing of the coefficients. The method is extremely easy to use.
Regarding the second problem:
Because the control axes of sound quality coincide with human sensibility, and adjusting the intensity along an axis corresponds to adjusting a coefficient, the members involved in sound quality evaluation share the actual sound and its evaluation to a high degree.
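The coefficient smoothing mentioned above can be illustrated with a short sketch. The description does not say which smoothing method is intended; the one-pole smoother, the function names, and the value of `alpha` below are all assumptions made for illustration only.

```python
def smooth_coefficients(targets, start=0.0, alpha=0.01):
    """One-pole smoothing (an assumed method): each output moves a
    fraction `alpha` of the remaining distance toward the target."""
    current = start
    smoothed = []
    for target in targets:
        current += alpha * (target - current)
        smoothed.append(current)
    return smoothed


def apply_gain(component, coefficients):
    """Multiply a sound quality component by a per-sample coefficient."""
    return [h * x for h, x in zip(coefficients, component)]


# Hypothetical scenario: the listener switches a coefficient from 0 to 1
# at sample 10. The smoothed coefficient ramps up instead of jumping.
steps = [0.0] * 10 + [1.0] * 990
ramp = smooth_coefficients(steps, start=0.0, alpha=0.01)
scaled = apply_gain([1.0] * len(ramp), ramp)
```

A smaller `alpha` gives a slower, quieter transition; a larger one responds faster at the cost of a sharper change.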

FIG. 1: Block diagram showing the data flow of an embodiment and an application example of claim 1.
FIG. 2: Example of sound quality selection and determination using the coefficient set of claim 1.
FIG. 3: Gain and phase characteristics of an embodiment of the bass component extraction means of claim 2.
FIG. 4: Gain and phase characteristics of an embodiment of the treble component extraction means of claim 2.
FIG. 5: Overall characteristics of the reproduced sound data in an embodiment using the bass and treble component extraction means of claim 2.
FIG. 6: Block diagram of an example of the means for obtaining the clarity-component equalized data of claim 3.
FIG. 7: Gain and phase characteristics of an embodiment of the extraction filters of claim 3.
FIG. 8: Clarity improvement effect of an embodiment of the clarity-component equalization means of claim 3.

Supply of reproduced sound data Gh(t)
Supply of a GUI whose sound quality representation is easy for human sound perception to relate to, together with component data Fk{S(t)}
Supply of acoustic data consisting of multiple sets of component data Fk{S(t)} and coefficient data Hk(t)
Supply of acoustic data consisting of sets of a sound quality name nameh and reproduced sound data Gh(t)

Standardization of the mechanism for expressing sound quality
Creation of a library of sound quality component extraction algorithms
Advanced sound quality finishing achievable with ordinary programming skills

The following is a supplementary explanation of the claims with reference to the drawings.
FIG. 1 is a block diagram showing the data flow of an embodiment and an application example of claim 1. It shows an embodiment and an application example of sound quality selection using the component coefficient set Hk(t).
S(t) is the original acoustic data, and Gh(t) is the reproduced sound data when the component coefficient set is H.
Fk{}, k = 1, 2, ..., n, are the extraction filters that extract the respective sound quality components.
Fk{S(t)}, k = 1, 2, ..., n, are the sound quality components of the original acoustic data, where F0{S(t)} = S(t).
Hk(t), k = 0, 1, 2, ..., n, is the component coefficient set used for sound quality generation.
Hk*Fk{S(t)}, k = 0, 1, 2, ..., n, are the weighted sound quality components used for sound quality synthesis.
SV is the server. Ga(t), Gb(t), and Gc(t) are the reproduced sound data sets provided for reproduction when the component coefficient set H is A, B, and C, respectively.
TH is the listener's terminal, and A, B, and C are the types of sound quality selected on the terminal side.
Gh(t) = Σ Hk(t)*Fk{S(t)}, k = 0, 1, 2, ..., n
is the reproduced data when the component coefficient set is H, and
Ga(t) = Σ Hak(t)*Fk{S(t)}, Gb(t) = Σ Hbk(t)*Fk{S(t)}, Gc(t) = Σ Hck(t)*Fk{S(t)}, k = 0, 1, 2, ..., n
are the reproduced data when the component coefficient sets are A, B, and C, respectively.
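The synthesis formula Gh(t) = Σ Hk(t)*Fk{S(t)} can be sketched in a few lines. This is a minimal illustration, assuming time-invariant coefficients and components already extracted and stored as plain sample lists; the sample values and the contents of coefficient sets A, B, and C are invented for the example and do not come from the patent.

```python
def synthesize(components, coefficients):
    """Return G_h(t) = sum_k H_k * F_k{S(t)} for constant coefficients H_k.

    components[k] holds the samples of F_k{S(t)}; components[0] is S(t)
    itself, matching F0{S(t)} = S(t).
    """
    if len(components) != len(coefficients):
        raise ValueError("one coefficient is needed per component")
    length = len(components[0])
    return [
        sum(h * comp[t] for h, comp in zip(coefficients, components))
        for t in range(length)
    ]


# Hypothetical stored data: S(t) plus two pre-extracted components.
original = [1.0, 2.0, 3.0, 4.0]
component_1 = [0.5, 0.5, 0.5, 0.5]
component_2 = [0.0, 1.0, 0.0, -1.0]
components = [original, component_1, component_2]

# Hypothetical coefficient sets A, B, C as a server SV might offer them.
coefficient_sets = {
    "A": [1.0, 0.0, 0.0],
    "B": [1.0, 2.0, 0.0],
    "C": [1.0, 0.0, 0.5],
}
reproduced = {
    name: synthesize(components, h) for name, h in coefficient_sets.items()
}
```

Because the extraction filters never change, switching from one sound quality to another only swaps the coefficient list passed to `synthesize`.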

FIG. 1 is a block diagram showing an embodiment and an application example in which multiple sound quality components are prepared in advance, each sound quality component has its own coefficient, and the overall sound quality is determined by setting the coefficient of each component.
The component data are assumed to have been extracted in advance and to exist as stored data.
Because the target sound quality is synthesized by adding the sound quality components after each has been multiplied by its coefficient, the sound quality representation is intuitive and the structure is simple.
For each sound quality component, the type and parameters of the extraction filter are fixed, so no countermeasures are needed against unstable operation inside the filter, noise generation, or dynamic range overflow when the sound quality is changed. Because the only calculation is the multiplication of each sound quality component by its coefficient, changing the sound quality dynamically is extremely easy.

FIG. 2 is an explanatory diagram of an embodiment of the selection and determination of sound quality using the sound quality data set and the component coefficient set of claim 1.
Each sound quality component of the component data is created either by the supplier of the original acoustic data or by a third party.
The sound quality is selected by the party that creates the acoustic data, by a third party, or by the listener.
With the method of determining sound quality shown in FIG. 2, the sound quality name, the actual effect, and the perceptual evaluation agree closely among the parties involved.
Sound quality can be expressed as a combination of selected independent vector axes and their magnitudes.
Namek, k = 0, 1, 2, ..., n, is the sound quality vector name of the sound quality component Fk{S(t)}.
Hk(t), k = 0, 1, 2, ..., n, is the component coefficient set used for sound quality generation.
Vk(t), k = 0, 1, 2, ..., n, is the value of each component coefficient.
FIG. 2(a) is an example of a sound quality selection GUI in which the intensity of a sound quality component is the length of an axis vector.
FIG. 2(b) is another example of a sound quality selection GUI in which the intensity of a sound quality component is the length of an axis vector.
FIG. 2(c) is an example of a sound quality selection GUI in which the intensity of a sound quality component corresponds to a horizontal position.
FIG. 2(d) is an example of a sound quality selection GUI in which the intensity of a sound quality component corresponds to a numerical value.
Although the sound quality selection GUIs of FIG. 2(a), (b), (c), and (d) differ in presentation, they share the same mechanism for setting the intensity of each named sound quality vector.
The sound quality generation method of claim 1 gives closer agreement between the expression of sound quality and its perception than methods based on a combination of band filters, such as a commonly used sound quality adjustment console. This is because the sound quality is designed by additively synthesizing sound quality components whose intensities have been selected.
FIG. 2 also shows that those who design, evaluate, and use sound quality can readily raise the degree to which they share and agree on expression and perception by using the sound quality representations illustrated there.
As a result, it becomes possible to standardize the sound quality component extraction algorithms and the means of supplying software, the means of selecting and determining sound quality, and the evaluation of sound quality.
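The shared mechanism behind the GUIs of FIG. 2, a named axis whose chosen intensity becomes a coefficient value, might be modeled as follows. This is only a sketch: the class `QualityAxis`, the axis names, the normalized slider range of 0 to 1, and the maximum values are hypothetical and are not specified in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAxis:
    name: str         # Name_k shown to the user (hypothetical labels below)
    max_value: float  # largest coefficient value V_k the GUI allows (assumed)


def coefficients_from_sliders(axes, slider_positions):
    """Map slider positions in [0, 1] to coefficient values V_k by name."""
    coefficients = {}
    for axis in axes:
        position = slider_positions.get(axis.name, 0.0)
        position = min(1.0, max(0.0, position))  # clamp out-of-range input
        coefficients[axis.name] = position * axis.max_value
    return coefficients


axes = [
    QualityAxis("original", 1.0),
    QualityAxis("bass", 30.0),
    QualityAxis("treble", 20.0),
]
selected = coefficients_from_sliders(
    axes, {"original": 1.0, "bass": 0.5, "treble": 1.5}
)
```

Whether the GUI draws an axis vector, a horizontal slider, or a number field, only the position passed in changes; the mapping to coefficients stays the same.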

FIG. 3 shows the characteristics of an embodiment of the bass component extraction means of claim 2.
FIG. 4 shows the characteristics of an embodiment of the treble component extraction means of claim 2.
FIG. 3 and FIG. 4 differ only in whether they concern bass or treble, so in the following the text in parentheses describes the treble case of FIG. 4.
Although the extraction filters themselves are publicly known, the essence of claim 2 is to use them as the means of generating the sound quality components of claim 1.
The bass (treble) component extraction means produces one of the sound quality components described in claim 1, and this is the most important component when the original acoustic data is music. A sound quality component used in the present proposal must be in phase or in antiphase with the original acoustic data within the effective band of the sound quality effect described in claim 1.
In particular, if the bass (treble) sound quality effect influences the vocal band, a trade-off arises between the sound quality of the bass (treble) range and that of the vocal range, and a compromise is forced at the sound quality finishing stage. A filter is therefore desirable that, while acting on the bass (treble), has no perceptible effect on the vocal band.
The second-order high-cut (low-cut) filter described in claim 2, that is, a second-order low-pass (high-pass) filter, has the characteristics shown in FIG. 3 (FIG. 4).
These are among the simplest publicly known filters.
With these characteristics, even when a high gain is secured in the very low (very high) frequency range, the effect on both the intensity and the phase of the vocal band is extremely small. Moreover, because the filter has no pole over the entire band, it does not color the sound, and it is one of the optimal filters for use with claim 1.
"2nd Low(High) Pass Gain" indicates that the graph is a frequency-gain characteristic, and "2nd Low(High) Pass Phase" indicates that the graph is a frequency-phase characteristic.
"Frequency Hz" indicates that the horizontal axis is frequency.
"Gain*30 (Gain*15)" indicates that the vertical axis is gain and that the extraction filter is given a gain of 30 (15).
The output data has a gain of about 6 (8) at 70 Hz (8000 Hz). When this output is phase-inverted and added to the original acoustic data, a strong yet natural-sounding bass (treble) emphasis effect is obtained.
Also, although this is only an empirical observation, both the bass and the treble take on the quality heard when the listener is close to the sound source.
"Phase Degree" indicates that the vertical axis is phase in degrees.
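As an illustration of the bass extraction filter, the sketch below cascades two first-order low-pass stages and applies the gain of 30 quoted for FIG. 3. The source does not state the cutoff frequency or the sample rate. The 35 Hz cutoff is an assumption, chosen because an ideal analog cascade then gives 30 / (1 + (70/35)^2) = 6 at 70 Hz, consistent with the "about 6" quoted above. The one-pole discretization is likewise an assumed implementation, and the phase inversion and addition to the original data are left out.

```python
import math


def cascade_lowpass_gain(freq_hz, cutoff_hz, gain):
    """Magnitude of two cascaded first-order low-pass stages (analog model)."""
    return gain / (1.0 + (freq_hz / cutoff_hz) ** 2)


def extract_bass(samples, sample_rate, cutoff_hz, gain):
    """Run two one-pole low-pass stages in series, then apply the gain."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    stage1 = stage2 = 0.0
    out = []
    for x in samples:
        stage1 += a * (x - stage1)       # first first-order high-cut stage
        stage2 += a * (stage1 - stage2)  # second stage fed by the first
        out.append(gain * stage2)
    return out


# Assumed values: only the gain of 30 comes from the text.
SAMPLE_RATE = 48_000
CUTOFF_HZ = 35.0
bass = extract_bass([1.0] * SAMPLE_RATE, SAMPLE_RATE, CUTOFF_HZ, gain=30.0)
```

Both stages are simple recursions with a fixed coefficient, which fits the claim that the filter parameters never need to change while the sound quality is being adjusted.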

FIG. 5 shows the overall characteristics of an embodiment that combines the bass and treble component extraction means of claim 2: the frequency-gain characteristics of the synthesized sound when the coefficients of the bass and treble components are varied. Empirically, increasing or decreasing the coefficients is perceived as a change in the apparent distance from the sound source.
Gh(t) = H0*S(t) + H1*Low{S(t)} + H2*High{S(t)}
indicates that the overall reproduced data is obtained by adding the original acoustic data H0*S(t), the bass component H1*Low{S(t)}, and the treble component H2*High{S(t)}.
FIG. 5 shows the case where H0, H1, and H2 are fixed coefficients that do not vary with time.
{H0, H1, H2} = {1, (30, 21, 15, 9, 6, 3), (20, 14, 10, 6, 4, 2)}
indicates that H0 is 1 and that the combinations of H1 and H2 are (30, 20), (21, 14), (15, 10), (9, 6), (6, 4), and (3, 2). "Hz" indicates that the horizontal axis is frequency, and "dBV" indicates that the vertical axis is intensity in decibels.
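The coefficient notation for FIG. 5 pairs the H1 and H2 values position by position, giving six settings. The sketch below expands the notation and applies each setting to a few samples; the sample values for the original, bass, and treble data are invented for illustration.

```python
def synthesize_bass_treble(original, low, high, h0, h1, h2):
    """G_h(t) = H0*S(t) + H1*Low{S(t)} + H2*High{S(t)}, fixed coefficients."""
    return [h0 * s + h1 * lo + h2 * hi for s, lo, hi in zip(original, low, high)]


H0 = 1
H1_VALUES = (30, 21, 15, 9, 6, 3)
H2_VALUES = (20, 14, 10, 6, 4, 2)

# Pair the values position by position: (30, 20), (21, 14), ..., (3, 2).
settings = [(H0, h1, h2) for h1, h2 in zip(H1_VALUES, H2_VALUES)]

# Hypothetical samples of S(t), Low{S(t)} and High{S(t)}.
original = [1.0, -1.0]
low = [0.5, 0.5]
high = [0.0, -0.25]
curves = [synthesize_bass_treble(original, low, high, *s) for s in settings]
```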

FIG. 6 is a block diagram of an example of the means for obtaining the clarity-component equalized data of claim 3.
The filters Band1{}, Band2{}, and Band3{} are publicly known second-order band filters, but the essence of claim 3 is to use them as the means of extracting a sound quality component of claim 1.
The clarity-component equalized data is one of the sound quality components described in claim 1, and it is the most important component when the original acoustic data is an announcement or spoken dialogue. A sound quality component used in the present proposal must be in phase or in antiphase with the original acoustic data within the effective band of the sound quality effect described in claim 1.
S(t) is the original acoustic data, and Element(t) is the clarity-component equalized data.
SALC is an intensity equalization means that evens out short-term fluctuations in the intensity of the original acoustic data. Equalizing the intensity of sections that are locally weak is especially important. Evening out these short-term fluctuations compensates for the difficulty of hearing caused by strong or weak word endings, which tend to be an announcer's individual habit, and at the same time serves as the data preprocessing needed to equalize the clarity component.
SALC{S(t)} is the original acoustic data after intensity equalization, referred to as the intensity-equalized data.
PK{SALC{S(t)}} is the intensity of the intensity-equalized data, referred to as the equalized data intensity.
Even when the intensity of the original acoustic data is uniform, the intensity of the clarity component fluctuates greatly. Fluctuation of the clarity component is one of the main reasons listeners with mild hearing loss or hearing loss find speech hard to understand.
Band1, Band2, and Band3 are three extraction filters for detecting the clarity component.
PK is a means of detecting the intensity of acoustic data.
COEFF1, COEFF2, and COEFF3 are coefficient adjustment means that adjust the intensity of the clarity component of announcements and dialogue, as perceived by the listener, to a constant level.
R1, R2, and R3 are the ratios, for each band, of the required clarity-component intensity to the intensity of the original acoustic data.
R1*PK{SALC{S(t)}}*Band1{S(t)}, R2*PK{SALC{S(t)}}*Band2{S(t)}, and R3*PK{SALC{S(t)}}*Band3{S(t)} are the clarity components of the required intensity in the respective bands.
By multiplying the equalized data intensity PK{SALC{S(t)}}, which is the intensity of the intensity-equalized data, by the weights R1, R2, and R3 needed to improve clarity, a clarity component at the target level of clarity is obtained.
Element(t) = R1*PK{SALC{S(t)}}*Band1{S(t)} + R2*PK{SALC{S(t)}}*Band2{S(t)} + R3*PK{SALC{S(t)}}*Band3{S(t)}
is the data in which the clarity component has been equalized, and it is one of the sound quality components of claim 1.
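The Element(t) formula can be sketched as below, taking the SALC output and the three band filter outputs as given inputs. The patent does not describe how PK detects intensity; the peak follower used here (instant attack, exponential release) and its `release` value are assumptions, and all sample values are invented.

```python
def peak_envelope(samples, release=0.999):
    """Hypothetical PK: follow |x| with instant attack, exponential release."""
    level = 0.0
    envelope = []
    for x in samples:
        level = max(abs(x), level * release)
        envelope.append(level)
    return envelope


def clarity_element(equalized, band_outputs, ratios):
    """Element(t) = sum_i R_i * PK{SALC{S(t)}} * Band_i{S(t)}.

    equalized    -- SALC{S(t)}, produced elsewhere
    band_outputs -- [Band1{S(t)}, Band2{S(t)}, Band3{S(t)}]
    ratios       -- [R1, R2, R3]
    """
    pk = peak_envelope(equalized)
    return [
        pk[t] * sum(r * band[t] for r, band in zip(ratios, band_outputs))
        for t in range(len(equalized))
    ]


# Hypothetical inputs for a four-sample excerpt.
equalized = [0.5, -0.5, 0.5, -0.5]
bands = [
    [1.0, 0.0, -1.0, 0.0],
    [0.0, 2.0, 0.0, -2.0],
    [4.0, 4.0, 4.0, 4.0],
]
element = clarity_element(equalized, bands, ratios=[1.0, 0.5, 0.25])
```

The result is one more stored component, so it can be weighted and added to S(t) exactly like the bass and treble components.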

FIG. 7 shows the characteristics of an embodiment of the extraction filters of claim 3.
"2nd Band Pass Gain Gain=3 1045,2080Hz,4180Hz" indicates the frequency-gain characteristics of the extraction filters, each with a gain of 3 and with center frequencies of 1045 Hz, 2080 Hz, and 4180 Hz.
"2nd Band Pass Phase Gain=3 1045,2080Hz,4180Hz" indicates the frequency-phase characteristics of the same extraction filters.
"Frequency" indicates that the horizontal axis is frequency, "Gain*3" indicates that the gain at the center frequency is 3, and "Phase Degree" indicates that the unit of the vertical axis is degrees.
These are the characteristics of very ordinary band filters.
The band filter is divided into several filters, each given characteristics that are important for clarity.
For example, if the intensity of the third formant band is weak relative to that of the first formant band, clarity is improved by reinforcing the third formant band.
Furthermore, because the required distribution and intensity of the clarity components differ between mild hearing loss and hearing loss, the supplier can provide data suited to each use and leave the choice to the listener.
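A second-order band-pass filter with a gain of 3 at each of the three center frequencies might be realized as a digital biquad, as sketched below. The coefficient formulas follow the widely used constant-peak-gain band-pass form. The sample rate and the Q value are assumptions, since the description gives only the gain and the center frequencies.

```python
import cmath
import math


def bandpass_coefficients(center_hz, sample_rate, q, gain):
    """Second-order band-pass (biquad) with peak gain `gain` at center_hz."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (gain * alpha / a0, 0.0, -gain * alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def response(b, a, freq_hz, sample_rate):
    """Complex frequency response of the biquad at freq_hz."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    numerator = b[0] + b[1] * z1 + b[2] * z1 * z1
    denominator = a[0] + a[1] * z1 + a[2] * z1 * z1
    return numerator / denominator


def apply_biquad(b, a, samples):
    """Filter samples with the direct form I difference equation."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


SAMPLE_RATE = 48_000  # assumed; not given in the description
Q = 2.0               # assumed; not given in the description
CENTERS_HZ = (1045.0, 2080.0, 4180.0)
filters = [bandpass_coefficients(f, SAMPLE_RATE, Q, gain=3.0) for f in CENTERS_HZ]
```

At its center frequency this filter form has zero phase shift, so the extracted band is in phase with the original data there, which is the relationship the description requires of a sound quality component.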

FIG. 8 shows an example of the effect on the synthesized sound quality of using the clarity-component equalization means of claim 3.
FIG. 8(a) is a graph of an example of the intensity of the extracted clarity component for an announcement that is hard for a person with mild hearing loss to understand.
The horizontal axis is time and the vertical axis is amplitude. For a listener who is aware of having hearing loss, the clarity-component intensity needed for comprehension is, as an example, at the level marked "Required Clearness". The intensity reaches this level only twice. In general, the clarity component is not necessarily correlated with the intensity of the original acoustic data. The clarity of a trained announcer is close to uniform.
With simultaneous interpreters, by contrast, the intonation of the voice often varies.
Even when the broadcast equipment keeps the volume uniform, the clarity fluctuates extremely frequently.

FIG. 8(b) shows the intensity of the clarity component of the data obtained by equalizing the fluctuating clarity component of FIG. 8(a) using an embodiment of the method of FIG. 6. Although it is difficult to count exactly how many times Required Clearness is reached, the improvement over FIG. 8(a) is evident.
In fact, when the reproduced announcements of FIG. 8(a) and FIG. 8(b) are compared by listening, the improvement in clarity is remarkable.

FIG. 8(c) compares the spectral intensities of the clarity components of FIG. 8(a) and FIG. 8(b).
"Not Needed Element for Clearness" marks the components in the band that is unnecessary for clarity, and "Needed Element for Clearness" marks the components in the band that is necessary for clarity.
"Original" is the clarity component of FIG. 8(a), and "Improved" is the clarity component of FIG. 8(b).
The intensity of bands that are unnecessary for clarity, for example the portion corresponding to pitch, is attenuated, while the fluctuation of the useful formant bands is equalized, corrected, and emphasized.
Because the balance of intensity over time and across the spectrum is equalized, emphasizing the clarity component does not simply raise the volume and make the sound more unpleasant.
Turning the block diagram of FIG. 6 into a working APK that yields the results of FIG. 8 requires advanced signal processing skills. The essence of claim 3 is to combine component data extracted by a completed signal processing means of this kind with the sound quality synthesis means, without devoting manpower to the troublesome production of such an APK.

Supplementary explanation of claim 1
The essence of the present proposal is that it makes division of labor possible in the process of creating sound quality.
Division of labor here means separating the work into a complex process that creates each sound quality component and a process that finishes the overall sound quality through the simple work of combining sound quality components prepared in advance.
The supplier of the APK that extracts sound quality components supplies a highly finished sound quality component extraction APK.
The supplier of sound quality combines the sound quality components extracted by the already prepared extraction APK and supplies one or more types of acoustic data to the listener.
From the listener's point of view, if only one type of sound quality is offered, the listener simply accepts it; if multiple named sound qualities are offered, the listener selects one of them.

Supplementary explanation of claim 2
The most basic elements of sound quality creation are bass and treble.
Using, for bass, a second-order high-cut filter formed by cascading first-order high-cut filters and, for treble, a second-order low-cut filter formed by cascading first-order low-cut filters as the sound quality component extraction filters is one of the most suitable methods.
Although these filters themselves are known, the essence of claim 2 is to use them in the method of FIG. 1.
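As an illustration only, the cascaded filters of claim 2 can be sketched in Python as follows. The one-pole recursion, the function names, and the cutoff values are assumptions made for this sketch and are not taken from the specification. The last function evaluates the ideal analog response of each cascade, which is 90 degrees away from the input at the cutoff frequency and closer to in-phase inside the effective band.

```python
import cmath
import math


def one_pole_lowpass(x, fc, fs):
    """First-order high-cut filter: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    y, out = 0.0, []
    for v in x:
        y += a * (v - y)
        out.append(y)
    return out


def one_pole_highpass(x, fc, fs):
    """First-order low-cut filter, formed as the input minus its low-passed copy."""
    return [v - low for v, low in zip(x, one_pole_lowpass(x, fc, fs))]


def bass_component(x, fc, fs):
    """Second-order high-cut filter: two first-order stages in cascade."""
    return one_pole_lowpass(one_pole_lowpass(x, fc, fs), fc, fs)


def treble_component(x, fc, fs):
    """Second-order low-cut filter: two first-order stages in cascade."""
    return one_pole_highpass(one_pole_highpass(x, fc, fs), fc, fs)


def cascade_phase_deg(f, fc, kind):
    """Phase in degrees of the ideal analog cascade at frequency f ("low" or "high")."""
    s = 1j * f / fc
    stage = 1.0 / (1.0 + s) if kind == "low" else s / (1.0 + s)
    return math.degrees(cmath.phase(stage * stage))
```

For example, `bass_component(samples, 200.0, 48000.0)` would produce a bass component Fk{S(t)} for a hypothetical 200 Hz cutoff at a 48 kHz sampling rate.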

Supplementary explanation of claim 3
Clear announcements and dialogue are important in creating sound quality.
Among the sound quality components of a voice that conveys a message, the pitch component is unnecessary, and the components mainly in the formant bands are important. In particular, people with mild hearing loss or hearing loss find speech easier to follow when the intensity of the clear component is stable, whereas fluctuation in the intensity of the clear component is one of the factors that make speech hard to hear.
Raising the overall volume just because speech is hard to hear disturbs the people nearby.
High-performance clarification requires complex signal processing. Dividing the labor into the work of handling the complex signal processing and the work of combining the high-clarity component data with the original acoustic data to create high-clarity sound quality makes it easier to provide high-clarity sound quality.

Supplementary explanation of claim 4
Within the mechanism of claim 1, when the function that creates the sound quality components and the created component data reside inside the terminal device, the terminal device alone synthesizes the reproduced sound data of the selected sound quality.

S(t) Original acoustic data
Gh(t) Reproduced sound data for component coefficient set H
Fk{S(t)} k=0,1,2,,,n Sound quality data set
Hk(t) k=0,1,2,,,n Component coefficient set
MIX Means for adding the sound quality data set
SV Server
TM Terminal device
A, B, C Types of sound quality
Ga(t), Gb(t), Gc(t) Sound quality data sets for sound quality A, sound quality B, and sound quality C, respectively
Hak(t) k=0,1,2,,,n Component coefficient set for generating sound quality A
Hbk(t) k=0,1,2,,,n Component coefficient set for generating sound quality B
Hck(t) k=0,1,2,,,n Component coefficient set for generating sound quality C

Name0 Sound quality vector name of the original acoustic data S(t)
Namek k=1,2,,,n Sound quality vector name of the sound quality component Fk{S(t)}

2nd Low Pass Gain Frequency-gain characteristic of the second-order low-pass filter
2nd Low Pass Phase Frequency-phase characteristic of the second-order low-pass filter
Gain*30 The vertical axis is gain; the gain at the center frequency is 30
Frequency Hz The horizontal axis is frequency
Phase Degree The vertical axis is phase (degrees)

2nd High Pass Gain Frequency-gain characteristic of the second-order high-pass filter
2nd High Pass Phase Frequency-phase characteristic of the second-order high-pass filter
Gain*15 The vertical axis is gain; the gain at the center frequency is 15

Gh(t)=H0*S(t)+H1*Low{S(t)}+H2*High{S(t)} Bass and treble sound quality data set
{H0,H1,H2}={1,(30,21,15,9,6,3),(20,14,10,6,4,2)} Fixed component coefficient sets
Hz The horizontal axis is frequency
dBV The vertical axis is gain, in decibels
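The low- and high-frequency limits of this mix follow from the coefficients alone, whatever cutoff frequencies are chosen, assuming both filters have unity gain in their pass bands: at low frequencies Low{S(t)} approaches S(t) and High{S(t)} vanishes, so the gain approaches H0+H1, and at high frequencies it approaches H0+H2. The Python sketch below evaluates those limits in decibels for the listed values. Pairing the H1 and H2 values by position and the helper name `limit_gain_db` are assumptions of this sketch.

```python
import math

H0 = 1.0
H1_VALUES = (30, 21, 15, 9, 6, 3)   # bass coefficients listed above
H2_VALUES = (20, 14, 10, 6, 4, 2)   # treble coefficients listed above


def limit_gain_db(h0, hk):
    """Gain in dB where one filter passes fully and the other is fully cut."""
    return 20.0 * math.log10(h0 + hk)


for h1, h2 in zip(H1_VALUES, H2_VALUES):
    print(f"H1={h1:>2} H2={h2:>2}  "
          f"bass limit {limit_gain_db(H0, h1):5.1f} dB  "
          f"treble limit {limit_gain_db(H0, h2):5.1f} dB")
```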

S(t) Original acoustic data
SALC Intensity equalization means
SALC{S(t)} Intensity equalization data
Element(t) Clear component equalization data
PK Means for calculating data intensity
PK{SALC{S(t)}} Equalized data intensity
Band1, Band2, Band3 Clear component extraction filters
COFF1, COFF2, COFF3 Coefficient units
R1, R2, R3 Ratio of the output intensity of each extraction filter to the equalized data intensity
R1*PK{SALC{S(t)}}*Band1{S(t)}, R2*PK{SALC{S(t)}}*Band2{S(t)}, R3*PK{SALC{S(t)}}*Band3{S(t)} Intensity of the output data of each extraction filter
Element(t)=R1*PK{SALC{S(t)}}*Band1{S(t)}+R2*PK{SALC{S(t)}}*Band2{S(t)}+R3*PK{SALC{S(t)}}*Band3{S(t)} Clear component equalization data

2nd Band Pass Gain Gain=3 1045Hz,2080Hz,4180Hz Frequency-gain characteristics of the three extraction filters with a gain of 3 at the stated frequencies
2nd Band Pass Phase Gain=3 1045,2080Hz,4180Hz Frequency-phase characteristics of the three extraction filters with a gain of 3 at the stated frequencies

Gain*3 The gain at the center frequency of each extraction filter is 3
Frequency Frequency
Phase Degree Phase in degrees

Clearness Element in S(t) Intensity characteristic of the clear component of the original acoustic data S(t)
Required Clearness Required clear component intensity
Un-stable Clearness Intensity of a clear component with strong fluctuation
Clearness Element in Element{S(t)} Intensity of the clear component
Stable Clearness Intensity of a clear component with uniform level
Formant Spectrum Spectrum of the voice formants
Not Needed Element for Clearness Component unnecessary for clarity
Needed Element for Clearness Component necessary for clarity

Generation of reproduced sound data Gh(t)
Generation of component data Fk{S(t)} and of a GUI whose sound quality expressions are easy to relate to a person's sense of sound quality
Generation of acoustic data consisting of a plurality of component data Fk{S(t)} and coefficient data Hk(t)
Generation of acoustic data consisting of sets of a sound quality name Nameh and reproduced sound data Gh(t)

The following is a supplementary explanation of the claims with reference to the drawings.
FIG. 1 is a block diagram showing the flow of data for one embodiment and application example of claim 1.
It is an embodiment and application example of sound quality selection using the component coefficient set Hk(t).
S(t) is the original acoustic data, and Gh(t) is the reproduced sound data when the component coefficient set is H.
Fk{} k=1,2,,,n are the extraction filters that extract each sound quality component.
Fk{S(t)} k=0,1,2,,,n are the sound quality components of the original acoustic data, where F0{S(t)}=S(t).
Hk(t) k=0,1,2,,,n is the component coefficient set used for sound quality generation.
Hk(t)*Fk{S(t)} k=0,1,2,,,n are the sound quality components used in the sound quality synthesis.
SV is the server, and Ga(t), Gb(t), and Gc(t) are the reproduced sound data sets offered for reproduction when the component coefficient set H is A, B, and C, respectively.
TH is the listener's terminal device, and A, B, and C are the types of sound quality selected on the terminal side.
Gh(t)=Σ{Hk(t)*Fk{S(t)}} k=0,1,2,,,n
is the reproduced data when the component coefficient set is H, and
Ga(t)=Σ{Hak(t)*Fk{S(t)}}, Gb(t)=Σ{Hbk(t)*Fk{S(t)}}, Gc(t)=Σ{Hck(t)*Fk{S(t)}} k=0,1,2,,,n
are the reproduced data when the component coefficient sets are A, B, and C, respectively.
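The additive synthesis of FIG. 1 can be sketched in Python as follows. The function name `mix`, the preset table, and all numeric values are hypothetical and serve only to illustrate the formula. A coefficient may be a fixed number or a per-sample sequence, matching the fixed and time-series cases of Hk(t).

```python
from typing import Dict, List, Sequence, Union

Coefficient = Union[float, Sequence[float]]


def mix(components: Sequence[Sequence[float]],
        coefficients: Sequence[Coefficient]) -> List[float]:
    """Return Gh(t) = sum over k of Hk(t) * Fk{S(t)}, sample by sample."""
    if len(components) != len(coefficients):
        raise ValueError("one coefficient is required per component")
    n = len(components[0])
    out = [0.0] * n
    for fk, hk in zip(components, coefficients):
        if len(fk) != n:
            raise ValueError("all components must have the same length")
        for j in range(n):
            # A fixed coefficient applies to every sample; a sequence is indexed by time.
            h = hk if isinstance(hk, (int, float)) else hk[j]
            out[j] += h * fk[j]
    return out


# F0{S(t)} = S(t) is the original data; f1 and f2 stand in for extracted components.
s = [1.0, 2.0, 3.0]
f1 = [0.5, 0.5, 0.5]
f2 = [0.0, 1.0, 0.0]

# Hypothetical named coefficient sets Ha, Hb, Hc.
presets: Dict[str, List[Coefficient]] = {
    "A": [1.0, 2.0, 0.0],
    "B": [1.0, 0.0, 2.0],
    "C": [1.0, [0.0, 1.0, 2.0], -1.0],  # H1 varies over time, H2 is negative
}
reproduced = {name: mix([s, f1, f2], h) for name, h in presets.items()}
```

Here `reproduced["A"]`, `reproduced["B"]`, and `reproduced["C"]` play the roles of Ga(t), Gb(t), and Gc(t); only the small coefficient sets differ between them, while the component data is shared.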

FIG. 2 is an explanatory diagram of an embodiment of the selection and determination of sound quality using the sound quality data set and component coefficient set of claim 1.
Each sound quality component of the component data is created either by the supplier of the original acoustic data or by a third party.
The sound quality is selected by the creator of the acoustic data, a third party, or the listener.
With the sound quality determination method of FIG. 2, the sound quality name, the actual effect, and the sensory evaluation agree closely among the parties involved.
A sound quality can be expressed as a combination of selected independent sound quality axes and their magnitudes.
Namek k=0,1,2,,,n is the name of the sound quality Fk{S(t)},
Hk(t) k=0,1,2,,,n is the component coefficient set used for sound quality generation, and
Vk(t) k=0,1,2,,,n is the value of each component coefficient.
FIG. 2(a) is an example of a sound quality selection GUI in which the intensity of a sound quality component is the length of its sound quality axis.
FIG. 2(b) is an example of a sound quality selection GUI in which the intensity of a sound quality component is the length of its sound quality axis.
FIG. 2(c) is an example of a sound quality selection GUI in which the intensity of a sound quality component corresponds to a horizontal position.
FIG. 2(d) is an example of a sound quality selection GUI in which the intensity of a sound quality component corresponds to a numerical value.
Although the sound quality selection GUIs of FIG. 2(a), (b), (c), and (d) differ in presentation, they share the same mechanism of setting the intensity of each named sound quality.
Compared with a method based on a combination of band filters, such as a commonly used sound quality adjustment console, the sound quality generation method of claim 1 gives a closer match between the expression of a sound quality and the sensation it produces. This is because the sound quality is designed by additively synthesizing various sound quality components whose intensities have been selected.
It also shows that those who design, evaluate, and use the sound quality can easily raise the degree to which they share and agree on expression and sensation by using the sound quality expressions shown in the example of FIG. 2.
As a result, it becomes possible to standardize the sound quality component extraction algorithms and the means of generating and supplying the software, to standardize the means of selecting and determining sound quality, and to standardize its evaluation.

FIG. 6 is a block diagram of an example of the means for obtaining the clear component equalization data of claim 3.
The filters Band1{}, Band2{}, and Band3{} are known second-order band filters, but the essence of claim 3 is to use them as the sound quality component extraction means of claim 1.
The clear component equalization data is one of the sound quality components described in claim 1, and is the most important component when the original acoustic data is an announcement or dialogue. The sound quality components provided for this proposal must be in an in-phase or anti-phase relationship with the original acoustic data in the effective band of the sound quality effect, as described in claim 1.
S(t) is the original acoustic data, and Element(t) is the clear component equalization data.
SALC is the intensity equalization means, which equalizes the fluctuation of the intensity of the original acoustic data within a short time. Equalizing the intensity of sections where the intensity is locally weak is particularly important. Equalizing the short-time fluctuation compensates for the difficulty of hearing caused by strong or weak word endings, which tend to be a habit peculiar to an announcer, and at the same time serves as the data preprocessing needed to equalize the clear component.
SALC{S(t)} is the original acoustic data whose intensity has been equalized, that is, the intensity equalization data.
PK{SALC{S(t)}} is the intensity of the intensity equalization data, that is, the equalized data intensity.
Even when the intensity of the original acoustic data is uniform, the intensity of the clear component fluctuates widely. This fluctuation of the clear component is one of the biggest causes of difficulty in hearing for listeners with mild hearing loss or hearing loss.
Band1, Band2, and Band3 are the three extraction filters for detecting the clear component,
PK is the means for detecting the intensity of acoustic data,
COEFF1, COEFF2, and COEFF3 are coefficient adjustment means that adjust the intensity of the clear component of announcements and dialogue perceived by the listener to a constant level, and
R1, R2, and R3 are the ratios, for each band, of the required clear component intensity to the intensity of the original acoustic data.
R1*PK{SALC{S(t)}}*Band1{S(t)}, R2*PK{SALC{S(t)}}*Band2{S(t)}, and R3*PK{SALC{S(t)}}*Band3{S(t)} are the clear components of the required intensity in each band.
Multiplying the equalized data intensity PK{SALC{S(t)}}, which is the intensity of the intensity equalization data (the original acoustic data whose intensity has been equalized), by the weights R1, R2, and R3 needed to improve clarity yields clear components at the target level of clarity.
Element(t)=R1*PK{SALC{S(t)}}*Band1{S(t)}+R2*PK{SALC{S(t)}}*Band2{S(t)}+R3*PK{SALC{S(t)}}*Band3{S(t)}
is data in which the clear component has been equalized, and is one of the sound quality components of claim 1.
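One possible reading of the block diagram of FIG. 6 is sketched below in Python. The envelope follower used for PK, the gain-normalizing form of SALC, the biquad band filter design, and every numeric constant (time constant, target level, gain limit, Q) are assumptions of this sketch; the specification does not define them. Following claim 3, the band filters take the intensity equalization data as their common input. Each band is scaled so that its intensity tracks Ri times the equalized data intensity, and the scaled bands are added to give Element(t).

```python
import math


def envelope(x, fs, tau=0.02):
    """Assumed PK: one-pole smoothing of |x| with time constant tau (seconds)."""
    a = 1.0 - math.exp(-1.0 / (tau * fs))
    e, out = 0.0, []
    for v in x:
        e += a * (abs(v) - e)
        out.append(e)
    return out


def salc(x, fs, target=0.1, max_gain=20.0):
    """Assumed SALC: scale each sample so the short-time intensity approaches target."""
    env = envelope(x, fs)
    return [v * min(max_gain, target / max(e, 1e-9)) for v, e in zip(x, env)]


def bandpass(x, f0, fs, q=1.0):
    """Second-order band filter (biquad) with unity gain at the center frequency f0."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha                      # b1 is zero for this design
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for v in x:
        y = (b0 * v + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, v
        y2, y1 = y1, y
        out.append(y)
    return out


def clear_component(s, fs, centers=(1045.0, 2080.0, 4180.0),
                    ratios=(1.0, 1.0, 1.0), max_gain=20.0):
    """Element(t): sum of band outputs, each held at Ri times the equalized intensity."""
    leveled = salc(s, fs)                       # SALC{S(t)}
    ref = envelope(leveled, fs)                 # PK{SALC{S(t)}}
    element = [0.0] * len(s)
    for f0, r in zip(centers, ratios):
        band = bandpass(leveled, f0, fs)
        band_env = envelope(band, fs)
        for j, v in enumerate(band):
            # Assumed COEFF: gain that keeps band intensity / equalized intensity at r.
            gain = min(max_gain, r * ref[j] / max(band_env[j], 1e-9))
            element[j] += gain * v
    return element
```

The result of `clear_component` would then be supplied as one of the components Fk{S(t)} and mixed with the original acoustic data using a coefficient chosen by the sound quality supplier or the listener.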

FIG. 7 shows the characteristics of an embodiment of the clear component extraction filters of claim 3.
2nd Band Pass Gain Gain=3 1045,2080Hz,4180Hz indicates the frequency-gain characteristics of the extraction filters, each with a gain of 3 and with center frequencies of 1045 Hz, 2080 Hz, and 4180 Hz.
2nd Band Pass Phase Gain=3 1045,2080Hz,4180Hz indicates the frequency-phase characteristics of the extraction filters.
Frequency indicates that the horizontal axis is frequency, Gain*3 that the gain at the center frequency is 3, and Phase Degree that the unit of the vertical axis is degrees.
These are the characteristics of very common band filters.
The band filter is divided into several filters, each given a characteristic that is important for clarity.
For example, when the intensity of the third formant band is weak relative to that of the first formant band, reinforcing the third formant band improves clarity.
Furthermore, because the required distribution and intensity of the clear components differ between mild hearing loss and hearing loss, the supplier can supply data suited to each use and leave the choice to the listener.
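For reference, the ideal response of a second-order band filter with peak gain G and center frequency f0 is G/(1+jQ(f/f0-f0/f)), so its gain is exactly G at f0 and its phase stays within plus or minus 90 degrees at every frequency. The Python sketch below evaluates this for the three center frequencies of FIG. 7. The value Q=1 and the function name are assumptions, since the specification does not state the filter bandwidth.

```python
import cmath
import math

CENTERS_HZ = (1045.0, 2080.0, 4180.0)   # center frequencies given for FIG. 7
PEAK_GAIN = 3.0                          # Gain=3 as given for FIG. 7


def bandpass_response(f, f0, gain=PEAK_GAIN, q=1.0):
    """Complex response of an ideal second-order band filter at frequency f."""
    return gain / (1.0 + 1j * q * (f / f0 - f0 / f))


for f0 in CENTERS_HZ:
    for f in (f0 / 2.0, f0, 2.0 * f0):
        h = bandpass_response(f, f0)
        print(f"f0={f0:6.0f} Hz  f={f:6.0f} Hz  gain={abs(h):.2f}  "
              f"phase={math.degrees(cmath.phase(h)):+6.1f} deg")
```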

Supplementary explanation of claim 1
The essence of this proposal lies in the signal generation means that enables division of labor in the process of creating sound quality.
Division of labor means splitting the process into a complex work process that creates each sound quality component, and a process that finishes the overall sound quality through the simple work of combining sound quality components prepared in advance.
The supplier of the APK that extracts sound quality components supplies a highly complete sound quality component extraction APK.
The sound quality supplier combines the sound quality components extracted by already prepared sound quality component extraction APKs, and supplies one or more types of supplied acoustic data to the listener.
From the listener's perspective, if only one type of sound quality is supplied, the listener unconditionally selects it; if a plurality of named sound qualities are provided, the listener selects one of them.

Name0 Sound quality name of the original acoustic data S(t)
Namek k=1,2,,,n Sound quality name of the sound quality component Fk{S(t)}

Claims

A listener is a person who plays and listens to a sound signal,
The acoustic data provided to the listener is the original acoustic data,
The original acoustic data shall consist of single or multiple time series data,
Let the original acoustic data be S(t), where (t) means time series data,
Sound quality names are words that correspond to the expression of the texture of the sound, such as bass, treble, presence, distance, tight bass, vocal prominence, clarity, etc.
Corresponding to multiple types of sound quality names extracted by calculation based on the original acoustic data S(t),
Let Fk{S(t)} k=0,1,2,,,k,,,n be a specific sound quality component,
A means for extracting sound quality components is referred to as a sound quality component extraction means,
The sound quality component Fk{S(t)} shall be in an in-phase or anti-phase relationship with the original acoustic data S(t) in the band where the sound quality effect is effective,
In-phase means that, in the effective band of the sound quality effect, the phase relative to the original acoustic data is within plus or minus 90 degrees, and anti-phase means that, in the effective band of the sound quality effect, the phase relative to the original acoustic data is the opposite of the above in-phase relationship,
Let Fk{S(t)} k=1,2,,,n be component data extracted from the original acoustic data,
The component data is based on the PCM format, and if it has been compressed and converted, it is demodulated PCM format time series data,
Let F0{S(t)} be the original acoustic data S(t).
Let Fk{S(t)} k=0,1,2,,,n be the sound quality dataset,
The first method is to obtain a sound quality data set,
Let Hk(t) k=0,1,2,,,k,,,n be the component coefficient set,
The component coefficient set has a magnitude that may be 0 and a sign that is negative or positive, and is either a time-series variable or a fixed coefficient that is constant throughout and does not depend on time,
Let Hk(t)*Fk{S(t)} k=0,1,2,,,n be the product Hk(tj)*Fk{S(tj)} of Hk(tj) and Fk{S(tj)} at each sampling time tj j=0,1,2,,,j,,,m,
Let Gh(t)=Σ{Hk(t)*Fk{S(t)}} k=0,1,2,,,n be the reproduced sound data,
The second means of obtaining the reproduced sound data Gh(t) is
In the relationship between the supply side and the user side of the reproduced sound data Gh(t),
The reproduction sound data is supplied either by the source of the original sound data or by a third party.
Let the multiple types of component coefficient sets be Ha(t), Hb(t), Hc(t),,,
Let the sound quality data sets Gh(t) when H of the component coefficient set Hk(t) k=0,1,2,,,n is A, B, C,,, be
Ga(t)=Σ{Hak(t)*Fk{S(t)}} k=0,1,2,,,n
Gb(t)=Σ{Hbk(t)*Fk{S(t)}} k=0,1,2,,,n
Gc(t)=Σ{Hck(t)*Fk{S(t)}} k=0,1,2,,,n
,,,,
Regarding the supply of Gh(t) to listeners,
There may be cases where one type is supplied or two or more types are supplied.
Let the sound quality names when H is A, B, C,,, be Namea, Nameb, Namec,,, respectively,
By supplying the set of sound quality name and playback sound data {Nameh,Gh(t)},
The means by which the listener can select the sound quality name Nameh and use the reproduced sound data Gh(t) is defined as the supplied sound data selection means,
The supplied sound data selection means includes the case where the reproduced sound data Gh(t) is of one type,
a third supply acoustic data selection means;
A sound data supply means having a first, a second, and a third.

Regarding the sound quality component extraction means of claim 1,
The bass component extraction means is a second-order filter in which known first-order high-cut filters are connected in two stages in cascade, and a fourth means makes the output data of the bass component extraction means one of the sound quality data set Fk{S(t)} described in claim 1,
The treble component extraction means is a second-order filter in which known first-order low-cut filters are connected in two stages in cascade, and a fifth means makes the output data of the treble component extraction means one of the sound quality data set Fk{S(t)} described in claim 1,
Where the reproduced sound data Gh(t) according to claim 1 has at least a fourth and a fifth,
An acoustic data supply means comprising the first, second and third means according to claim 1.

When the original acoustic data is audio data, a means for equalizing the intonation of the intensity within a short time period of the original acoustic data is defined as an intensity equalization means,
Let the output of the intensity equalization means be intensity equalization data,
Let the intensity of the intensity homogenized data be the homogenized data intensity,
Regarding the sound quality component extraction means of claim 1,
A known second-order bandpass filter is referred to as a second-order bandpass filter,
Intensity equalization data is used as a common input, and the effective band has a correlation with speech clarity.
A filter composed of multiple second-order band filters is called a clear component extraction filter,
Let the output of the clear component extraction filter be the clear component,
At least one of the clear components shall have a means for keeping the ratio of the intensity of the clear component and the equalized data intensity constant, and this function shall be defined as a clear component equalization means,
Data obtained by adding the output data of a plurality of clear component extraction filters having at least one clear component equalization means is defined as clear component equalization data,
The clear component equalization data shall be provided as one of the sound quality components Fk{S(t)} of claim 1,
The clear component uniformization means is the sixth,
having the sixth,
An acoustic data supply means comprising the first, second and third means according to claim 1.

A terminal device is an electronic device with a built-in computer and a sound reproduction function.
In the terminal device, regarding the sound quality data set Fk{S(t)} of claim 1,
A seventh means for storing the sound quality data set supplied from the supply side in the terminal device,
having means for generating a sound quality data set within the terminal; and
An eighth means includes a means for storing the generated sound quality data set in the terminal,
The ninth means is a means for generating reproduced sound data Hk*Fk{S(t)} k=0,1,2,,,n in the terminal,
Sound quality generating means for a terminal device having one of the seventh and eighth elements and the ninth element.