JP6482880B2 - Mixing apparatus, signal mixing method, and mixing program

Info

Publication number: JP6482880B2
Application number: JP2015007380A
Authority: JP
Inventors: 弘太高橋
Original assignee: The University of Electro-Communications
Current assignee: The University of Electro-Communications
Priority date: 2015-01-19
Filing date: 2015-01-19
Publication date: 2019-03-13
Anticipated expiration: 2035-01-19
Also published as: JP2016134706A

Description

The present invention relates to a technique for mixing two or more input signals.

The basic operation of mixing is the addition of multiple input signals. In recording and broadcasting, equalizers are placed before and after the adder so that the mixed instruments and voices are heard in good balance. Equalizers are used, for example, to emphasize a desired frequency band of each input signal, or to lower the level of unimportant frequency bands in the background so that the priority sound stands out in the mix.

Separately, a method called "smart mixing" has been proposed, in which the input signals are expanded onto the time-frequency plane, amplitude and phase are adjusted at each point of that plane, and the results are added and converted back to a time-domain signal (see, for example, Patent Document 1). Voices and instrument sounds have fine structure on the time-frequency plane, so mixing the input signals in accordance with that structure allows finer-grained mixing. In Patent Document 1, the signal characteristic at a target point is judged using input data at other points that have a predetermined relationship with the target point on the time-frequency plane, and the clarity of the priority sound is determined according to that characteristic.

Patent Document 1: Japanese Patent No. 5057535

Patent Document 1 does not specify a rational method for optimizing the gain applied to each input signal. When the mixing method of Patent Document 1 is applied in practice, the gain for each frequency has to be determined by trial and error to suit the sound source, and an appropriate gain is not always obtained. Moreover, conventional mixing practice relies on experience and intuition, and no rational criteria have been established for setting gains or equalizer characteristics.

If the per-frequency gain of the sound that should be made clearer by mixing (hereinafter the "priority sound") is not set appropriately, the following problems arise. If the gain of the priority sound changes too strongly, the priority sound feels unnatural in the output (the mixed sound): it can be heard as a sound, but its content cannot be understood as speech. Excessive changes in the volume or tone of the priority sound can also cause discomfort. Conversely, if the gain change is too weak, the priority sound cannot be heard well enough.

If the gain of sounds other than the priority sound (hereinafter the "non-priority sound") cannot be set appropriately, the following problems arise. If the gain changes too strongly, dropouts and abrupt changes in the tone of the non-priority sound become noticeable in the output (the mixed sound) and feel unnatural. The listener is distracted by this, which interferes with hearing the priority sound. Conversely, if the gain change is too weak, the priority sound cannot be brought out sufficiently.

The object is therefore to establish a rational gain-setting method for mixing audio data, and thereby to improve and stabilize the operation of a mixing apparatus.

To solve the above problem, the present invention uses the following two principles as its basis.
(1) The logarithmic intensity of the output signal is limited to a range that does not exceed the sum of the logarithmic intensities of the input signals. This is called the "principle of the sum of logarithmic intensities". It keeps the priority sound from being boosted so much that the mixed sound feels unnatural.
(2) The reduction in the power of the non-priority sound is limited to a range that does not exceed the increase in the power of the priority sound. This is called the "principle of hole filling". It keeps the non-priority sound from being suppressed so much that the mixed sound feels unnatural.

Specifically, in one aspect of the present invention, a mixing apparatus includes:
a frequency analysis unit that expands a first input signal and a second input signal in the time domain into a first signal and a second signal on a time-frequency plane, respectively;
a signal processing unit that generates a mixed signal by mixing the first signal and the second signal;
a frequency-time conversion unit that converts the mixed signal into a time-domain signal; and
a signal output unit that outputs the converted signal,
wherein the signal processing unit includes:
a gain determination unit that determines, for each point on the time-frequency plane, a first gain that adjusts the power of the first signal within a range in which the logarithmic intensity of the output signal does not exceed the sum of the logarithmic intensity of the first signal and the logarithmic intensity of the second signal, and a second gain that reduces the power of the second signal within a range that does not exceed the increase in the power of the first signal; and
an adder that adds the first signal adjusted by the first gain and the second signal adjusted by the second gain.

An appropriate gain is set when the input signals are mixed, so the operation of the mixing apparatus is improved and stabilized.

FIG. 1 is a diagram explaining the basic principle of the present invention.
FIG. 2 is a schematic block diagram of the mixing apparatus of the embodiment.
FIG. 3 is a diagram showing the power of the priority sound and the non-priority sound together with the listener's audible level.
FIG. 4 is a diagram in which the audible level is subtracted from each of the priority sound and the non-priority sound of FIG. 3 and the result is plotted as the sound intensity perceived by the listener (audibility-corrected power).
FIG. 5 is a diagram showing the sum, on a logarithmic scale, of the priority sound and the non-priority sound of FIG. 4.
FIG. 6 is a diagram showing the simple sum of the priority sound and the non-priority sound of FIG. 4 multiplied by a constant.
FIG. 7 is a diagram in which the range satisfying both FIG. 5 (sum of logarithmic intensities) and FIG. 6 (constant multiple of the simple sum) is taken as the upper limit for addition (mixing).
FIG. 8 is a diagram showing the priority sound and the non-priority sound after gain adjustment within the range of FIG. 7.
FIG. 9 is a diagram showing an example of the loudness curves used to select the minimum audible power.
FIG. 10 is a diagram showing an operation example of the mixing apparatus of the embodiment.
FIG. 11 is a diagram showing the operating state of the mixing apparatus when the gain is smoothed.
FIG. 12 is a diagram showing an example in which the correction coefficient B[k] is set directly.
FIG. 13A is a flowchart showing an example of the mixing signal processing method of the embodiment.
FIG. 13B is a flowchart showing an example of the mixing signal processing method of the embodiment, continuing from FIG. 13A.
FIG. 14 is a flowchart showing another example of the mixing signal processing method of the embodiment.

FIG. 1 is a diagram explaining the basic principle of the present invention. The following description takes as an example the mixing of input signals x1[n] and x2[n]. The input signal x1[n] is a priority signal such as a voice, and the input signal x2[n] is a non-priority signal such as a background sound.

The input signals x1[n] and x2[n] are each expanded onto the time-frequency plane (labeled "tf plane" in the figure) by frequency analysis. Any method can be used for the frequency analysis, such as a short-time FFT (fast Fourier transform), a short-time Fourier transform, a wavelet transform, a filter-bank transform, or a transform to a time-frequency distribution such as the Wigner distribution. The signals expanded onto the time-frequency plane are denoted X1[i, k] and X2[i, k]; each is the value of the input signal at the point on the time-frequency plane given by the time coordinate i and the frequency coordinate k.

Based on the power at each point of the input signals expanded onto the time-frequency plane, the gains of the priority sound and the non-priority sound at each point are determined using the "principle of the sum of logarithmic intensities" and the "principle of hole filling". As described above, the former limits the power of the output signal to a range that does not exceed the sum of the logarithmic intensities of the input signals, and the latter limits the reduction in the power of the non-priority sound to a range that does not exceed the increase in the power of the priority sound. Specific processing for these principles is described later.

In the embodiment, to determine the optimum gain, the following processes (3) to (5) are optionally introduced in addition to process (1), based on the principle of the sum of logarithmic intensities, and process (2), based on the principle of hole filling.
(3) When determining the gain, at least one of the following three conditions is added: (a) the rate of power increase determined by the principle of the sum of logarithmic intensities is capped so that it does not exceed a level obtained by multiplying the simple sum of the input sounds by a constant; (b) a fixed upper limit is placed on the gain of the priority sound; (c) a fixed lower limit is placed on the gain of the non-priority sound. This makes the mixed sound more natural and gentle.
(4) In time intervals where the signal-to-noise ratio is extremely low, the upper and lower limits of (3) are relaxed. This makes the priority sound stand out even in such intervals so that it is easy for the listener to hear.
(5) The parameters of the mixing process are updated sequentially rather than computed as the solution of an optimization problem. With sequential updating, "solving an equation" can be replaced by "testing whether an inequality holds", which removes operations such as exponential and logarithmic functions and allows a fast algorithm built only from multiplication, addition, and subtraction. This simplifies implementation on programmable logic devices such as FPGAs (field-programmable gate arrays) and as plug-ins for DAWs (digital audio workstations), and enables real-time processing.
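The replacement of equation solving by an inequality test in process (5) can be illustrated with a minimal sketch. It assumes, hypothetically, that the output power at one time-frequency point is alpha1^2 * E1 + alpha2^2 * E2 and that the audibility-corrected power is E divided by an audibility threshold A. Under those assumptions the log-sum bound reduces to A * (alpha1^2 * E1 + alpha2^2 * E2) <= E1 * E2, which needs no logarithm. The function name, the step size, and the update rule are illustrative and are not taken from the patent.

```python
def update_priority_gain(alpha1, alpha2, e1, e2, a_min, step=0.01):
    """One sequential update of the priority gain at a single point (sketch).

    e1, e2 : smoothed powers of the priority and non-priority signals
    a_min  : assumed minimum audible power at this frequency bin
    step   : hypothetical relative step size per update
    """
    trial = alpha1 * (1.0 + step)
    out_power = trial * trial * e1 + alpha2 * alpha2 * e2
    # log(out/A) <= log(E1/A) + log(E2/A)  is equivalent to  out * A <= E1 * E2,
    # so the test uses only multiplication, addition, and comparison.
    if out_power * a_min <= e1 * e2:
        return trial                 # bound still satisfied: accept the increase
    return alpha1 * (1.0 - step)     # bound violated: back off slightly
```

Repeating this step once per block lets the gain track the bound without ever solving for the exact gain value.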

At each point on the time-frequency plane, X1[i, k] and X2[i, k] are multiplied by their respective gains. The signals after gain multiplication are denoted M1[i, k] and M2[i, k]. The gain-adjusted signals M1[i, k] and M2[i, k] are added so that the two signals are superimposed on the time-frequency plane. The result is then converted back to a time-domain signal and output as the mixed sound.

Determining and applying the gain for each input signal at every point on the time-frequency plane in this way produces a natural mixed sound.

FIG. 2 is a schematic diagram of the mixing apparatus 1 of the embodiment. The mixing apparatus 1 includes a signal input unit 11, a frequency analysis unit 12, a signal processing unit 15, a frequency-time conversion unit 16, and a signal output unit 17. The signal input unit 11 receives the multiple input signals to be mixed. The input signals are, for example, audio signals and include a priority signal such as a voice and a non-priority signal such as a background sound.

The frequency analysis unit 12 expands the input signals onto the time-frequency plane by, for example, a short-time FFT. The signal processing unit 15 calculates the power of each input signal at each point on the time-frequency plane, smooths the power, and then uses the gain determination unit 151 to calculate the gains for the priority signal and the non-priority signal. It then multiplies the priority signal and the non-priority signal by their calculated gains, adds them, and outputs the sum. The frequency-time conversion unit 16 converts the output of the signal processing unit 15 into a time-domain signal. The signal output unit 17 outputs the signal restored to the time domain.

The basic processing in the signal processing unit 15 is described with reference to FIGS. 3 to 8. The symbols used in the following description are listed in Tables 1 and 2: Table 1 lists the constants, and Table 2 lists the variables.

<Principle of the sum of logarithmic intensities>
FIG. 3 schematically shows the power [dB] of the priority sound (thick line) and the non-priority sound (solid line) at a certain time on the time-frequency plane, as a function of frequency. This power is the power value E smoothed by the signal processing unit 15. The dotted line is the hearing limit: a listener can detect a sound at or above this level.

Human hearing is said to perceive loudness as the logarithm of power. On this view, sound components that are each 10 dB above the dotted hearing limit are perceived as about equally loud, and so are components that are each 20 dB above it. Likewise, the loudness difference between a component 10 dB above the hearing limit and one 20 dB above it can be regarded as the same as the difference between a component 20 dB above the limit and one 30 dB above it.

FIG. 4 plots the priority sound and the non-priority sound of FIG. 3 after subtracting the dotted hearing-limit level from each. The power values of the priority and non-priority signals obtained in FIG. 4 are the "audibility-corrected power P", corrected so that the human hearing limit is 0 dB. The vertical axis of FIG. 4 represents the sound intensity perceived by the listener.
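The correction from FIG. 3 to FIG. 4 is a per-frequency subtraction in dB. A minimal sketch, assuming both curves are given as lists of dB values on the same frequency bins (the function and argument names are illustrative):

```python
def audibility_corrected_db(power_db, hearing_limit_db):
    """Subtract the hearing-limit level from the power, bin by bin (dB)."""
    if len(power_db) != len(hearing_limit_db):
        raise ValueError("curves must cover the same frequency bins")
    # A result of 0 dB means the component sits exactly at the hearing limit;
    # negative values are below the limit.
    return [p - a for p, a in zip(power_db, hearing_limit_db)]
```

For example, a 30 dB component in a bin whose hearing limit is 10 dB has an audibility-corrected power of 20 dB.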

In FIG. 5, the dotted line A shows the sum, on a logarithmic scale, of the audibility-corrected priority and non-priority sounds of FIG. 4. The view of human hearing described above leads to the principle that it is reasonable for the listener to perceive the mixed sound as the sum of the audibility-corrected priority and non-priority sounds (the sum of their logarithmic intensities). That is, the power of the mixed sound is set to the dotted line A. This is the "principle of the sum of logarithmic intensities". Taking a sound equal to the human hearing limit as a factor of 1, 20 dB corresponds to a factor of 10, whose logarithm is 1, and 40 dB corresponds to a factor of 100, whose logarithm is 2. On the logarithmic scale, the sum of a sound 10 times the hearing limit and a sound 100 times the hearing limit is therefore 10^3, a power 1000 times the limit.

If this principle is applied as is, however, the power of the mixed sound, given by the logarithmic-scale sum of the audibility-corrected (FIG. 4) priority-sound power P1 and non-priority-sound power P2, becomes P1 × P2, which in some cases is too large. For example, if P1 = P2 = 10^5, the power of the mixed sound is 10^10, which can overflow on many processing systems. The mixed-sound power derived from the "principle of the sum of logarithmic intensities" is therefore used as an upper limit for mixing.
<Additional conditions>
To address the problem that the mixed sound obtained from the principle of the sum of logarithmic intensities can be too loud, at least one of the three conditions (a) to (c) of process (3) is added.

FIG. 6 illustrates condition (a). Condition (a) limits the power increase of the mixed sound to a fixed multiple of the simple sum of the powers of the two input sounds. In nature, when a person hears two sounds together (a mixed sound), what is heard is their simple sum. For example, the simple sum of an input sound 10 times the hearing limit and one 100 times the hearing limit is 110 times the limit, whereas the sum on the logarithmic scale is 1000 times.

An amplification limit T_G is therefore set for the mixed power. T_G is expressed as an amplitude ratio relative to the simple sum and is set to, for example, T_G = 4.0. In that case, 4.0 times the amplitude obtained by simple addition (for example, 110) is the amplification limit of the mixed power. The dotted line B in FIG. 6 is this amplification limit, which restricts the power of the sum (mix) of the priority and non-priority sounds to the predetermined multiple T_G of the simple sum.
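The numbers used in this passage can be reproduced with a short worked example. It follows the text in treating 20 dB as a factor of 10 and 40 dB as a factor of 100 relative to the hearing limit, and it takes the lower of the log-sum bound and the T_G bound as the resulting upper limit. The function names are illustrative, not from the patent.

```python
import math

def db_to_ratio(level_db):
    """Convert a level in dB above the hearing limit to a ratio (20 dB -> 10)."""
    return 10.0 ** (level_db / 20.0)

def mixing_bounds(level1_db, level2_db, t_g=4.0):
    """Return (simple sum, log-scale sum, upper limit) as ratios to the hearing limit."""
    r1 = db_to_ratio(level1_db)
    r2 = db_to_ratio(level2_db)
    simple_sum = r1 + r2
    # Adding logarithms is the same as multiplying the ratios.
    log_sum = 10.0 ** (math.log10(r1) + math.log10(r2))
    capped = t_g * simple_sum          # condition (a): T_G times the simple sum
    return simple_sum, log_sum, min(log_sum, capped)
```

With inputs of 20 dB and 40 dB this gives a simple sum of 110, a log-scale sum of 1000, and, with T_G = 4.0, an upper limit of 440.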

FIG. 7 shows the process of setting, as the upper limit for signal addition, the level that satisfies both the upper limit of the mixed-sound power obtained from the principle of the sum of logarithmic intensities (FIG. 5) and the amplification limit set as a fixed multiple of the simple sum (FIG. 6), that is, whichever of the two is lower. In FIG. 7, the solid line, which at each point takes the lower of the dotted line A (the upper limit from the principle of the sum of logarithmic intensities) and the dotted line B (the amplification limit based on the simple sum), is the upper limit for signal addition.
<Principle of hole filling>
FIG. 8 shows how the gains are set within the upper limit for signal addition of FIG. 7. To make the priority sound easier to hear, the non-priority sound must be suppressed in the necessary parts of the time-frequency plane. More suppression is not always better, however. If the non-priority sound is suppressed unconditionally, its volume changes become too obtrusive, which not only increases the sense of unnaturalness but can also interfere with listening to the priority sound. A rational criterion is therefore needed for suppressing the non-priority sound as well.

In the embodiment, the power of the non-priority sound is reduced within a range that does not exceed the increase in power obtained by raising the gain of the priority sound. In other words, the hole created by suppressing the non-priority sound is filled by the increase in the priority sound. This avoids an unnatural impression of the non-priority sound.

In FIG. 8, the priority sound is amplified within the upper limit for signal addition of FIG. 7. The dotted line C is the power of the priority sound after gain adjustment. The non-priority sound, in turn, is reduced within a range that does not exceed the change in the priority sound (that is, its power increase). The dotted line D is the power of the non-priority sound after gain adjustment.
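The hole-filling constraint can be written as an inequality on the two gains. A minimal sketch, assuming the gains alpha1 and alpha2 are amplitude gains so that power scales with their squares: the power removed from the non-priority sound, E2 * (1 - alpha2^2), must not exceed the power added to the priority sound, E1 * (alpha1^2 - 1). The function below returns the smallest alpha2 that this allows; its name and the handling of edge cases are illustrative choices.

```python
import math

def min_nonpriority_gain(e1, e2, alpha1):
    """Smallest non-priority amplitude gain permitted by hole filling (sketch).

    e1, e2 : powers of the priority and non-priority sounds at one point
    alpha1 : amplitude gain already chosen for the priority sound
    """
    increase = e1 * (alpha1 * alpha1 - 1.0)   # power added by the priority sound
    if increase <= 0.0 or e2 <= 0.0:
        return 1.0                            # no hole may be dug
    remaining = 1.0 - increase / e2           # lower bound on alpha2 squared
    return math.sqrt(remaining) if remaining > 0.0 else 0.0
```

For E1 = 1, E2 = 4 and alpha1 = 2, the priority sound gains 3 units of power, so the non-priority sound may lose at most 3 units, which corresponds to alpha2 = 0.5.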

In this way, the signal processing unit 15 determines each gain based on the limit on the increase in priority-sound power and the limit on the decrease in non-priority-sound power. The gain mask is the set of increases of the priority sound and decreases of the non-priority sound (peaks and dips) determined at each point on the time-frequency plane.

In place of, or in addition to, condition (a) of process (3), condition (b), which sets a fixed upper limit on the gain of the priority sound, or condition (c), which sets a fixed lower limit on the gain of the non-priority sound, may be added. Adding these conditions to the principle of the sum of logarithmic intensities and the principle of hole filling produces a natural mixed sound.
<Short-time FFT>
Next, the processing of the frequency analysis unit 12 is described in detail. In the embodiment, the frequency analysis unit 12 performs a short-time FFT with about 256 FFT points. The short-time FFT expands a one-dimensional input signal onto the two-dimensional time-frequency (tf) plane.

Let the signals x1[n] and x2[n], sampled at the sampling frequency F_s, be the priority sound and the non-priority sound, respectively. Both signals xj[n] (j = 1, 2) undergo an N_F-point short-time Fourier transform with a shift of N_d points. Denoting the transform result at block number i and frequency bin number k by Xj[i, k], Xj[i, k] is given by Equation (1).

Here, h[n] is the window function. N_h is a parameter that determines the width of the window function: h[n] = 0 for every n with |n| ≥ N_h. Any window function can be used, such as a Hann window, Hanning window, or Gaussian window. The embodiment uses the Gaussian window of Equation (2).

Here, σ is a parameter that adjusts the width of the window function.
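Equation (2) is not reproduced in this text, so the following is only a sketch of a truncated Gaussian window in a common form, h[n] = exp(-n^2 / (2 σ^2)) for |n| < N_h and zero otherwise. The exact normalization and exponent used in the patent may differ; the function names are illustrative.

```python
import math

def gaussian_window(n_h, sigma):
    """Return {n: h[n]} for -n_h < n < n_h (assumed Gaussian form)."""
    return {n: math.exp(-(n * n) / (2.0 * sigma * sigma))
            for n in range(-n_h + 1, n_h)}

def window_value(window, n):
    """h[n], which is zero for every n with |n| >= n_h."""
    return window.get(n, 0.0)
```

A larger sigma widens the window, which trades time resolution for frequency resolution.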

For the FFT of a real signal, the positive- and negative-frequency outputs are complex conjugates of each other, so negative frequencies need not be handled. With N_H = N_F/2, only the frequency bins in the range 0 ≤ k ≤ N_H need to be processed. In addition, the conversion of Equation (3) is applied in advance so that the inverse FFT for N_d = 1 can be carried out with only additions followed by multiplication by a constant. Restricting the operations to addition and multiplication reduces the computational load.
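The block-wise transform and the restriction to bins 0 ≤ k ≤ N_H can be sketched as follows. Equation (1) is not reproduced in this text, so the indexing here is an assumption: block i covers the N_F samples starting at i * N_d, and the window is supplied as a list of N_F weights. A direct DFT is used for clarity rather than an FFT, and the conversion of Equation (3) is not included.

```python
import cmath
import math

def stft_frame(x, i, n_f, n_d, window):
    """Bins 0..n_f//2 of block i of a real signal x (naive DFT sketch)."""
    start = i * n_d
    frame = []
    for n in range(n_f):
        # Samples outside the signal are treated as zero.
        sample = x[start + n] if 0 <= start + n < len(x) else 0.0
        frame.append(window[n] * sample)
    bins = []
    for k in range(n_f // 2 + 1):        # the negative frequencies are redundant
        acc = 0j
        for n, v in enumerate(frame):
            acc += v * cmath.exp(-2j * math.pi * k * n / n_f)
        bins.append(acc)
    return bins
```

With N_F = 256 this yields 129 bins per block instead of 256.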

The inverse FFT is performed by the frequency-time conversion unit 16 of the mixing apparatus 1. The signal processing unit 15 of the embodiment performs no phase processing and mixes the input signals by amplitude processing alone, because the number of FFT points N_F is small. As an example, let N_F = 256 and the sampling frequency F_s = 44.1 kHz. Under these conditions the resolution is insufficient to resolve the line-spectrum structure of speech; multiple harmonic components fall into a single frequency bin, which makes the phase difficult to use.

Because the embodiment performs amplitude processing only, the mixing output Y[i, k] is generated by multiplying X1[i, k] and X2[i, k] by the gains α1[i, k] and α2[i, k], respectively, and adding the products.
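Written out, the mixing step described here is Y[i, k] = α1[i, k] X1[i, k] + α2[i, k] X2[i, k] with real, non-negative gains. A minimal sketch, assuming the time-frequency planes are stored as nested lists indexed [i][k] (the storage layout and function names are illustrative):

```python
def mix_point(x1, x2, alpha1, alpha2):
    """Mix one time-frequency point; real gains leave the phases untouched."""
    return alpha1 * x1 + alpha2 * x2

def mix_plane(X1, X2, A1, A2):
    """Apply mix_point at every block i and frequency bin k."""
    return [
        [mix_point(x1, x2, a1, a2)
         for x1, x2, a1, a2 in zip(row1, row2, g1, g2)]
        for row1, row2, g1, g2 in zip(X1, X2, A1, A2)
    ]
```

Because the gains are real, each input keeps its own phase and only its magnitude is scaled.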

The time-domain output y is obtained by applying the inverse FFT to Y[i, k] of Equation (4).

With a one-sample shift (N_d = 1), y can be generated even with n fixed at zero, so the processing simplifies to Equation (6).

Furthermore, applying the conversion of Equation (3) to X[i, k] beforehand reduces the number of frequency bins to be added to roughly half, as in Equation (7).

<Calculation of smoothed power>
Next, the calculation of the smoothed power by the signal processing unit 15 is described. Before smoothing, the squared magnitude |Xj[i, k]|^2 of the time-frequency-domain signal Xj[i, k] is calculated, and this value is then smoothed. The smoothing uses, for example, the exponential smoothing of Equation (8). Exponential smoothing requires little computation and memory and is therefore well suited to FPGA implementation.

Here, μ is the coefficient of the exponential smoothing and is derived from the smoothing time constant τ_s by Equation (9).
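Equations (8) and (9) are not reproduced in this text, so the sketch below uses the textbook first-order form E[i] = (1 - μ) E[i-1] + μ |X[i]|^2 as an assumption. If τ_s is taken as the time for the impulse response of that recursion to fall to 1/e of its peak, then (1 - μ)^(τ_s F_s / N_d) = 1/e, which gives μ = 1 - exp(-N_d / (τ_s F_s)). Both expressions, and the function names, are illustrative rather than the patent's own equations.

```python
import math

def smoothing_coefficient(tau_s, f_s, n_d=1):
    """mu such that the impulse response decays to 1/e after tau_s seconds."""
    return 1.0 - math.exp(-n_d / (tau_s * f_s))

def smooth_power(spectrum_values, mu, initial=0.0):
    """Exponentially smooth |X|^2 over successive blocks for one frequency bin."""
    e = initial
    smoothed = []
    for x in spectrum_values:
        e = (1.0 - mu) * e + mu * abs(x) ** 2   # one multiply-accumulate per block
        smoothed.append(e)
    return smoothed
```

Only the previous value E[i-1] has to be stored per bin, which is why this form maps well onto small hardware.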

式（８）をＩＩＲ（Infinite Impulse Response）型ディジタルフィルタとみたとき、そのインパルス応答がピーク値の１／ｅに減衰する時間がτ_ｓである。実施形態では、平滑化に指数平滑化を用いるが、ＦＩＲ（Finite Impulse Response）フィルタ、ＩＩＲフィルタ等、任意の平滑化法を用いることができる。
＜最小可聴パワーの計算＞
入力信号のミキシングには、時間周波数平面上での各点の成分について、それが聴こえる成分なのか、聴こえない成分なのかを判定する必要がある。そのために、それぞれの音源ｊの各周波数ビンｋについて、その成分が可聴であるための最小のパワーＡ［ｋ］を定義する。 When Expression (8) is regarded as an IIR (Infinite Impulse Response) type digital filter, the time for the impulse response to decay to 1 / e of the peak value is τ _s . In the embodiment, exponential smoothing is used for smoothing, but any smoothing method such as an FIR (Finite Impulse Response) filter or an IIR filter can be used.
<Calculation of minimum audible power>
To mix the input signals, it is necessary to determine, for the component at each point on the time-frequency plane, whether that component is audible or inaudible. For this purpose, for each frequency bin k of each sound source j, the minimum power A[k] at which the component becomes audible is defined.

図９（Ａ）は、国際標準化規格ＩＳＯ２２６：２００３で規定された等ラウドネス曲線のうち、２０ phonと７０ phonの曲線から主要部分を抽出してサンプリングしたものである。これらをそれぞれＣ_２０［ｋ］とＣ_７０［ｋ］と呼ぶ。 FIG. 9(A) shows the main portions of the 20 phon and 70 phon curves, extracted and sampled from the equal-loudness contours specified in the international standard ISO 226:2003. These are referred to as C₂₀[k] and C₇₀[k], respectively.

本来であれば、０ phonの曲線が最小可聴パワーである。しかし、リスナーにどのような音量で音が提示されるかは電気音響装置のボリューム設定によってその都度違うので、実施形態のミキシング装置１の信号処理部１５は、ラウドネスレベルが指定された値になったときに可聴であると判断する。ミキシング装置１のユーザが最小可聴パワーとして、等ラウドネス曲線の中からＬ_ｐ phonの曲線を選択できるように設計してもよい。Ｌ_ｐ phonの曲線は、Ｃ_２０［ｋ］とＣ_７０［ｋ］を補間または補外した近似値として、式（１０）で得ることができる。 Strictly speaking, the 0 phon curve is the minimum audible power. However, the loudness at which sound is actually presented to the listener depends on the volume setting of the electroacoustic device and differs from case to case. The signal processing unit 15 of the mixing apparatus 1 of the embodiment therefore judges a component to be audible when its loudness level reaches a specified value. The apparatus may be designed so that the user of the mixing apparatus 1 can select the L_p phon curve from among the equal-loudness contours as the minimum audible power. The L_p phon curve can be obtained by Equation (10) as an approximation that interpolates or extrapolates C₂₀[k] and C₇₀[k].
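Equation (10) is not reproduced in this text. As a hedged illustration only, the sketch below assumes that the interpolation or extrapolation is linear in the phon level, applied bin by bin to the two sampled contours expressed in dB. The function name `loudness_curve` and its argument names are hypothetical.

```python
def loudness_curve(c20, c70, lp):
    """Approximate the lp-phon contour (dB per bin) from the 20- and 70-phon contours.
    Linear in phon (an assumption); lp outside [20, 70] extrapolates."""
    w = (lp - 20.0) / 50.0          # 0 at 20 phon, 1 at 70 phon
    return [a + w * (b - a) for a, b in zip(c20, c70)]
```

With this form, `lp = 20` and `lp = 70` return the stored contours unchanged, and a value such as `lp = 0` extrapolates below the 20 phon curve.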

なお、平滑化されたパワーレベルＥｊ［ｉ，ｋ］が可聴であるか否かを判断するとき、Ｅｊ［ｉ，ｋ］をＣ_Lp［ｋ］と比較することはできず、信号ｘｊ［ｎ］の絶対値の最大値ｘ_maxや窓関数ｈ［ｎ］を勘案する必要がある。そこで、Ｃ_Lp［ｋ］を式（１１）のように変換し、最小可聴パワーＡ［ｋ］を導出する。 Note that, when determining whether the smoothed power level Ej[i, k] is audible, Ej[i, k] cannot be compared with C_Lp[k] directly; the maximum absolute value x_max of the signal xj[n] and the window function h[n] must also be taken into account. C_Lp[k] is therefore converted as in Equation (11) to derive the minimum audible power A[k].

ここで、定数Ｌｆは、ｘｊ［ｎ］がフルスケールの信号であったときに、それを図９（Ａ）の縦軸の音圧レベル（ＳＰＬ：Sound Pressure Level）の何ｄＢに相当させるかを自由に設定するための定数である。 Here, the constant Lf is a freely configurable constant that specifies how many dB of sound pressure level (SPL), the vertical axis of FIG. 9(A), a full-scale signal xj[n] is taken to correspond to.

ミキシング装置１の動作を自由に設定するという観点に立てば、Ｃ_２０［ｋ］とＣ_７０［ｋ］をＩＳＯ２２６：２００３に準拠させる必然性はなく、図９（Ｂ）のような等ラウドネス曲線を生成してもよい。図９（Ｂ）の曲線を用いると、８ｋＨｚ程度の高い周波数の音は可聴とみなされやすくなるので、優先音においてこの帯域付近の成分は尊重されることになる。結果として、優先音にメリハリがつくので、実際に聴いた感じとして高評価が得られやすい。後述する実験結果は、図９（Ｂ）のＣ_２０［ｋ］とＣ_７０［ｋ］を用いている。
＜聴感補正パワーの計算＞
ゲインを決定するための聴感補正パワーの計算について説明する。聴感補正パワーの計算は図４の処理に該当する。平滑化後のパワーＥｊ［ｉ，ｋ］を最小可聴パワーＡ［ｋ］で除算した結果が１より大きければ可聴であり、その可聴のレベルは、Ｅｊ［ｉ，ｋ］／Ａ［ｋ］で表現される。たとえば、Ｅｊ［ｉ，ｋ］／Ａ［ｋ］＝１００であれば、最小可聴の音に比べて１００倍のパワーを持っている。 From the viewpoint of freely setting the operation of the mixing apparatus 1, there is no necessity to make C ₂₀ [k] and C ₇₀ [k] conform to ISO 226: 2003, and an equal loudness curve as shown in FIG. May be generated. If the curve in FIG. 9B is used, a sound with a high frequency of about 8 kHz is likely to be audible, so that the components near this band are respected in the priority sound. As a result, since the priority sound is sharp, it is easy to obtain high evaluation as a feeling of actual listening. The experimental results to be described later use C ₂₀ [k] and C ₇₀ [k] in FIG. 9B.
<Calculation of auditory correction power>
The calculation of the auditory correction power used to determine the gain will be described. This calculation corresponds to the processing of FIG. 4. If the result of dividing the smoothed power Ej[i, k] by the minimum audible power A[k] is greater than 1, the component is audible, and its level of audibility is expressed by Ej[i, k] / A[k]. For example, if Ej[i, k] / A[k] = 100, the component has 100 times the power of the minimum audible sound.

この評価法では除算が生じるが、ＦＰＧＡは除算が苦手である。そこで、最小可聴パワーＡ［ｋ］は事前に決定されているので、あらかじめその逆数Ｂ［ｋ］を作っておくことで除算を回避する。 Although division occurs in this evaluation method, FPGA is not good at division. Therefore, since the minimum audible power A [k] is determined in advance, division is avoided by making the reciprocal B [k] in advance.

この補正係数Ｂ［ｋ］を用いて、平滑化パワーＥｊ［ｉ，ｋ］から聴感補正パワーＰｊ［ｉ，ｋ］を式（１４）の乗算により生成する。 Using this correction coefficient B [k], auditory correction power Pj [i, k] is generated from the smoothing power Ej [i, k] by multiplication of Expression (14).

聴感補正パワーＰｊ［ｉ，ｋ］は、時間周波数平面の１点ごとに値が決まる量である。各点での聴感補正パワーＰｊ［ｉ，ｋ］から、式（１５）で定義する聴感補正総パワーＱｊ［ｉ］を算出する。 The auditory correction power Pj [i, k] is an amount whose value is determined for each point on the time-frequency plane. From the auditory correction power Pj [i, k] at each point, the total auditory correction power Qj [i] defined by the equation (15) is calculated.
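The division-free evaluation described above can be sketched as follows. Equations (14) and (15) are not reproduced in this text, so the sketch assumes P[i, k] = B[k]·E[i, k] and takes Q[i] to be a plain sum of P[i, k] over the frequency bins. All function names are hypothetical.

```python
def reciprocal_table(a):
    """Precompute B[k] = 1 / A[k] once, so the per-block loop needs no division."""
    return [1.0 / v for v in a]

def corrected_power(e, b):
    """P[i, k] = B[k] * E[i, k]; a value above 1 is treated as audible."""
    return [bk * ek for bk, ek in zip(b, e)]

def total_corrected_power(p):
    """Q[i]: P[i, k] accumulated over frequency bins (plain sum, an assumption)."""
    return sum(p)
```

The only division happens when the table is built, which matches the stated aim of keeping division out of the per-block processing.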

聴感補正総パワーＱｊ［ｉ］は、各点のパワーを周波数方向に積算した量であり、リスナーが感じることのできる音のエネルギーの簡略化された推定値である。聴感補正総パワーＱｊ［ｉ］は、以下で説明する時区間の属性判定に用いられる。
＜時区間の属性判定計算＞
信号処理部１５は、ミキシング処理を行う際に各時間区間において、有音判定、低ＳＮＲ（Signal to Noise Ratio：信号対雑音比）判定、及びブースト判定を行う。これらの判定は、上述した処理（４）と関連する。 The auditory sensation correction total power Qj [i] is an amount obtained by integrating the power of each point in the frequency direction, and is a simplified estimated value of sound energy that the listener can feel. The auditory correction total power Qj [i] is used for attribute determination of a time interval described below.
<Time zone attribute determination calculation>
The signal processing unit 15 performs sound determination, low SNR (Signal to Noise Ratio) determination, and boost determination in each time interval when performing the mixing processing. These determinations are related to the process (4) described above.

まず、有音判定について説明する。有音でない部分でミキシング処理を行うと、優先信号に含まれるわずかな音、たとえばナレーションの合間の風の音などが増強され、好ましくない混合音が生成される。これを防ぐために、優先音の中でこの時間区間は聴き落してはならないという部分を有音部としてあらかじめ設定しておく。 First, the sound determination will be described. If the mixing process is applied to portions where no sound is present, faint sounds contained in the priority signal, for example wind noise between passages of narration, are amplified and an undesirable mixed sound is produced. To prevent this, the portions of the priority sound whose time intervals must not be missed by the listener are designated in advance as sound portions.

有音部の判定は、有音時に１となる関数ｅ［ｉ］を式（１６）により定義する。 For the determination of the sound part, a function e [i] that becomes 1 when sound is present is defined by Expression (16).

ここで、Ｔｅは有音判定のためのパラメータである。たとえば、Ｔｅ＝１．０とすれば、全ビンが可聴判定ぎりぎりであるときに有音と判定される。 Here, Te is a parameter for sound determination. For example, if Te = 1.0, it is determined to be sound when all bins are at the audible limit.

次に、低ＳＮＲ判定について説明する。図５〜図７を参照して説明したように、ミキシング装置１では、優先音のゲインに上限を設ける。このため、優先音が非優先音に比べて極端にレベルが低い場合は、ゲインの上限値を使っても、優先音の聴き取りが困難になる場合がある。これを防ぐため、低ＳＮＲか否かを判定し、低ＳＮＲと判定された時間区間で上限の引き上げを行う。 Next, the low SNR determination will be described. As described with reference to FIGS. 5 to 7, the mixing device 1 sets an upper limit on the gain of the priority sound. For this reason, when the priority sound is extremely low compared to the non-priority sound, it may be difficult to listen to the priority sound even if the upper limit value of the gain is used. In order to prevent this, it is determined whether or not the SNR is low, and the upper limit is increased in the time interval determined to be low SNR.

低ＳＮＲの判定は、低ＳＮＲ時に１となる関数l［ｉ］を式（１７）で定義することができる。 For the determination of the low SNR, the function l [i] that becomes 1 at the time of the low SNR can be defined by Expression (17).

ここで、Ｔ_ＳＮは低ＳＮＲ判定のためのパラメータである。たとえば、Ｔ_ＳＮ＝１０．０とすれば、聴感補正総パワーについて、優先音と非優先音の間に、振幅比で１０倍（パワー比で１００倍）の開きがあるときに低ＳＮＲと判定される。 Here, T_SN is a parameter for the low SNR determination. For example, if T_SN = 10.0, a time interval is determined to have low SNR when the total auditory correction powers of the priority sound and the non-priority sound differ by a factor of 10 in amplitude ratio (a factor of 100 in power ratio).

最後に、ブースト判定について説明する。ブースト判定は、優先音が有音であり、かつ低ＳＮＲであるときに行われる。ブースト時に１となるｂ［ｉ］を、式（１８）で定義する。 Finally, boost determination will be described. The boost determination is performed when the priority sound is sound and has a low SNR. B [i] which becomes 1 at the time of boosting is defined by Expression (18).
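The three determinations can be sketched as below. Equations (16) to (18) are not reproduced in this text, so the threshold forms are assumptions inferred from the worked values: sound is declared when Q1[i] reaches T_e times the number of bins (all bins exactly at the audible limit for T_e = 1.0), and low SNR is declared when Q2[i] is at least T_SN² times Q1[i] (a power ratio of 100 for T_SN = 10.0). The function name `classify_block` and the parameter `n_bins` are hypothetical.

```python
def classify_block(q1, q2, n_bins, t_e=1.0, t_sn=10.0):
    """Return the flags (e, l, b) for one time block.
    q1, q2: total auditory correction power of the priority / non-priority sound."""
    e = 1 if q1 >= t_e * n_bins else 0        # sound present (assumed threshold form)
    l = 1 if t_sn * t_sn * q1 <= q2 else 0    # low SNR, compared without division
    b = e & l                                 # boost only when both flags are set
    return e, l, b
```

Writing the SNR test as a product rather than a ratio keeps it free of division, in line with the FPGA-oriented design described in the text.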

ブースト判定が真となったときに、除算なしでブースト動作を行うために、ブーストレシオを分数表示したときの分子ｂ_ｎと分母ｂ_ｄを、それぞれ式（１９）と式（２０）で求めておく。これらを用いて、各種の評価基準に対してｂ_ｎ／ｂ_ｄのブーストが行われる。 So that the boost operation can be performed without division when the boost determination is true, the numerator b_n and the denominator b_d of the boost ratio expressed as a fraction are obtained in advance by Equations (19) and (20), respectively. Using these, a boost of b_n / b_d is applied to the various evaluation criteria.

＜ゲインの生成＞
ゲインの生成は実施形態のミキシング処理の核心である。優先音のためのゲインα１［ｉ，ｋ］と、非優先音のためのゲインα２［ｉ，ｋ］を生成する。ミキシング装置１の動作開始時は、両ゲインを１に初期化しておく。すべてのｋについて、α１［０，ｋ］＝α２［０，ｋ］＝１である。 <Generation of gain>
The generation of the gain is the core of the mixing process of the embodiment. A gain α1 [i, k] for the priority sound and a gain α2 [i, k] for the non-priority sound are generated. Both gains are initialized to 1 when the operation of the mixing apparatus 1 is started. For all k, α1 [0, k] = α2 [0, k] = 1.

今、時間ブロックｉに関する処理を始めたところであるとする。このとき、すべてのｋについてα１［ｉ−１，ｋ］とα２［ｉ−１，ｋ］がすでに決定している。α１［ｉ，ｋ］はα１［ｉ−１，ｋ］にΔ１を使った増減を行うことで更新される。α２［ｉ，ｋ］はα２［ｉ−１，ｋ］にΔ２を使った増減を行うことで更新される。 It is assumed that processing related to the time block i has just started. At this time, α1 [i−1, k] and α2 [i−1, k] have already been determined for all k. α1 [i, k] is updated by increasing or decreasing α1 [i−1, k] using Δ1. α2 [i, k] is updated by increasing or decreasing α2 [i−1, k] using Δ2.

α１［ｉ，ｋ］の増減はα１［ｉ−１，ｋ］に対して（１＋Δ１）の乗算、もしくは（１＋Δ１）^−１の乗算を行うことで実現する。一方、α２［ｉ，ｋ］の増減は、α２［ｉ−１，ｋ］に±Δ２を加算することで行う。 The increase / decrease of α1 [i, k] is realized by multiplying α1 [i−1, k] by (1 + Δ1) or (1 + Δ1) ⁻¹ . On the other hand, α2 [i, k] is increased or decreased by adding ± Δ2 to α2 [i−1, k].

このように異なる更新方法を採用する理由を説明する。優先音のためのゲインα１［ｉ，ｋ］は、条件によっては１０以上の値にすることがある。特に、α１［ｉ，ｋ］が大きいときには変化の差分を大きくする必要があり、乗算的更新が適している。一方、非優先音のためのゲインα２［ｉ，ｋ］は、０から１の範囲に限定されているため、一定刻みで十分であるし、一定刻みのほうが低レベルになったときの信号の抑圧をシャープに行うことができる。 The reason for adopting different update methods is as follows. The gain α1[i, k] for the priority sound may take a value of 10 or more depending on conditions. In particular, when α1[i, k] is large the change per update must also be large, so a multiplicative update is appropriate. The gain α2[i, k] for the non-priority sound, on the other hand, is limited to the range from 0 to 1, so a fixed step is sufficient; moreover, a fixed step allows the signal to be suppressed more sharply once the gain has fallen to a low level.
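The two update styles can be illustrated with a short sketch. The step sizes below are hypothetical values chosen only for illustration, and the function names are not from the source. The reciprocal of (1 + Δ1) is computed once so that both directions of the α1 update are plain multiplications.

```python
DELTA1 = 0.01              # hypothetical step for the priority gain alpha1
DELTA2 = 0.01              # hypothetical step for the non-priority gain alpha2
UP1 = 1.0 + DELTA1
DOWN1 = 1.0 / UP1          # precomputed once; the update loop itself never divides

def step_alpha1(alpha1, up):
    """Multiplicative update: the step is constant in dB."""
    return alpha1 * (UP1 if up else DOWN1)

def step_alpha2(alpha2, up):
    """Additive update: the step is constant in linear gain."""
    return alpha2 + (DELTA2 if up else -DELTA2)
```

With these illustrative values, raising α1 from 1 to 10 takes 232 multiplicative steps, whereas an additive step of 0.01 would need 900, which shows why the multiplicative form suits a gain that can grow large.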

ゲインα１［ｉ，ｋ］、α２［ｉ，ｋ］の更新を加減算と乗算のみにしたのは、処理（５）で説明したとおり、演算を軽くするためである。方程式を解いて次のゲインを決めるという方法では、多くの場合、除算や平方根等が発生する。また、ゲインが大きく変動して出力波形に不連続が生じることも懸念される。 The reason why the gains α1 [i, k] and α2 [i, k] are updated only by addition / subtraction and multiplication is to reduce the calculation as described in the process (5). In the method of determining the next gain by solving the equation, in many cases, division, square root, or the like occurs. There is also a concern that the output waveform may be discontinuous due to a large fluctuation in gain.

これに対し、実施形態では微小量の増減に限定することで、ゲインは滑らかに変化し、出力に段差が生じることを抑止できる。 In contrast, in the embodiment the update is limited to small increments and decrements, so the gain changes smoothly and steps in the output are suppressed.
（Ａ）ゲイン調整信号の聴感補正パワーの計算 (A) Calculation of auditory correction power of the gain-adjusted signal
もし、ゲインの増減を行わず、ひとつ前のフレームのゲインαｊ［ｉ−１，ｋ］をそのまま用いた場合、すなわち、αｊ［ｉ，ｋ］＝αｊ［ｉ−１，ｋ］とした場合、音源ｊに関する優先音と非優先音の聴感補正パワーは、それぞれ式（２１）と式（２２）で表される。 If the gain is neither increased nor decreased and the gain αj[i−1, k] of the previous frame is used as it is, that is, if αj[i, k] = αj[i−1, k], the auditory correction powers of the priority sound and the non-priority sound for sound source j are expressed by Equations (21) and (22), respectively.

このとき、ミキシング出力の聴感補正パワーＬ［ｉ，ｋ］は、両音源の寄与の和として式（２３）で表される。 At this time, the perceptual correction power L [i, k] of the mixing output is expressed by the equation (23) as the sum of contributions of both sound sources.

優先音のゲインを増加させた場合の聴感補正パワーをＬ_１ｐ［ｉ，ｋ］と定義しておく。 The audibility correction power when the gain of the priority sound is increased is defined as L _1p [i, k].

増加時のミキシング出力の聴感補正パワーをＬ_ｐ［ｉ，ｋ］とする。 Let L_p[i, k] be the auditory correction power of the mixing output when the gain is increased.

非優先音のゲインをΔ２だけ減少させた増加させた場合の聴感補正パワーをＬ_２ｍ［ｉ，ｋ］と定義しておく。 The auditory correction power when the gain of the non-priority sound is decreased by Δ2 is defined as L_2m[i, k].

調整後のゲインα１［ｉ，ｋ］を用いた場合の優先音に関する聴感補正パワーをＬ_１α［ｉ，ｋ］と定義しておく。 The auditory correction power related to the priority sound when the adjusted gain α1 [i, k] is used is defined as L _1α [i, k].

（Ｂ）操作する帯域の制限 (B) Restriction of the band to be adjusted
次に、ゲイン調整する帯域の制限について説明する。０Ｈｚに相当する周波数ビンの信号ゲインを操作すると、音の自然感が損なわれる場合がある。また、高い周波数の信号ゲインを操作すると、聴き取り易さ向上のメリットよりも耳障りな音の付加というデメリットが大きくなる場合がある。 Next, the restriction of the band in which the gain is adjusted will be described. Manipulating the signal gain of the frequency bin corresponding to 0 Hz may impair the naturalness of the sound. In addition, manipulating the signal gain at high frequencies may add harsh sound, a drawback that can outweigh the benefit of improved intelligibility.

そこで、優先音に対しては、ｆ_１Ｌ≦ｆ≦ｆ_１Ｈの範囲にある周波数ｆでのみα１［ｉ，ｋ］を更新する。この範囲は、周波数ビンｋの範囲で、ｋ_１Ｌ≦ｋ≦ｋ_１Ｈの範囲に相当する。ただし、
ｋ_１Ｌ＝ｒｄ（Ｎ_Ｆｆ_１Ｌ／Ｆ_ｓ）
ｋ_１Ｈ＝ｒｄ（Ｎ_Ｆｆ_１Ｈ／Ｆ_ｓ）
である。ここで、「ｒｄ（）」は最も近い整数への丸め関数（四捨五入関数）を意味する。 Therefore, for the priority sound, α1[i, k] is updated only at frequencies f in the range f_1L ≤ f ≤ f_1H. In terms of the frequency bin k, this corresponds to the range k_1L ≤ k ≤ k_1H, where
k_1L = rd(N_F f_1L / F_s)
k_1H = rd(N_F f_1H / F_s)
Here, "rd()" denotes rounding to the nearest integer (round half up).
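The bin limits above can be computed as in the following sketch. The helper implements round-half-up explicitly because Python's built-in `round()` rounds halves to the nearest even integer. The function name `bin_range` and the example values are hypothetical.

```python
import math

def rd(x):
    """Round to the nearest integer, halves upward (valid for non-negative x)."""
    return int(math.floor(x + 0.5))

def bin_range(f_low, f_high, n_fft, fs):
    """Lowest and highest frequency bin for the band f_low <= f <= f_high (Hz)."""
    return rd(n_fft * f_low / fs), rd(n_fft * f_high / fs)
```

For example, with an assumed N_F = 1024 and F_s = 48000 Hz, a band of 350 Hz to 4000 Hz maps to bins 7 through 85.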

非優先音に対しても、同様に、ｆ_２Ｌ≦ｆ≦ｆ_２Ｈの範囲に限定してゲイン調整を行い、ｋ_２Ｌ≦ｋ≦ｋ_２Ｈを満たすα２［ｉ，ｋ］だけを増減させる。
（Ｃ）α１を増加するための条件
α１の増加、すなわちα１［ｉ，ｋ］＝（１＋Δ１）×α１［ｉ−１，ｋ］の演算を行うのは、式（２８）〜（３２）の条件がすべて満たされるときである。 Similarly, for non-priority sounds, gain adjustment is performed only in the range of f _2L ≦ f ≦ f _2H , and only α2 [i, k] that satisfies k _2L ≦ k ≦ k _2H is increased or decreased.
(C) Conditions for increasing α1
α1 is increased, that is, α1[i, k] = (1 + Δ1) × α1[i−1, k] is computed, when all of the conditions of Equations (28) to (32) are satisfied.

式（２８）と式（２９）は、優先音と非優先音の双方が可聴であるときにのみ増加を行うことを規定している。式（３０）は、混合音の対数強度（パワー）が優先音と非優先音の対数強度の和を上回らないように働く（対数強度の和の原理）。式（３１）は、優先音に対するゲインを一定値（Ｔ_１Ｈ）以下に抑えるように働く。式（３２）は、単純加算の場合の混合と比較して、時間周波数平面の局所であってもパワーの上昇を一定限界（振幅比でＴ_Ｇ倍）以下に抑えるように働く（処理（３）の条件(a)）。 Equations (28) and (29) specify that the increase is performed only when both the priority sound and the non-priority sound are audible. Equation (30) acts so that the logarithmic intensity (power) of the mixed sound does not exceed the sum of the logarithmic intensities of the priority sound and the non-priority sound (the principle of the sum of logarithmic intensities). Equation (31) acts to keep the gain for the priority sound at or below a fixed value (T_1H). Equation (32) acts to keep the rise in power, even locally on the time-frequency plane, at or below a fixed limit (T_G times in amplitude ratio) relative to mixing by simple addition (condition (a) of process (3)).

式（３０）〜（３２）に対しては、低ＳＮＲ判定時には補正をかけるのが望ましい。この補正は、Ｐ１を（ｂ_ｎ／ｂ_ｄ）Ｐ１に置き換えることによって優先音のレベルを上昇させたとみなすことによって行われる。
(Ｄ) α１を減少するための条件
α１の減少、すなわちα１［ｉ，ｋ］＝（１＋Δ１）^−１×α１［ｉ−１，ｋ］の演算を行うのは、式（３３）〜（３７）のいずれかが成り立ち、かつ式（３８）が成り立つときである。 It is desirable to correct the expressions (30) to (32) when determining the low SNR. This correction is performed by considering that the priority sound level has been raised by replacing P1 with (b _n / b _d ) P1.
(D) Conditions for decreasing α1
α1 is decreased, that is, α1[i, k] = (1 + Δ1)⁻¹ × α1[i−1, k] is computed, when any one of Equations (33) to (37) holds and Equation (38) also holds.

式（３３）と式（３４）は、時間周波数平面上の点（ｉ，ｋ）において、優先音と非優先音の少なくとも一方が可聴レベルを満たさない場合は、優先音のゲインを戻すことを意図する。式（３５）は、混合音の対数強度が優先音と非優先音の対数強度の和を上回っている場合に、優先音のゲインを戻すように働く。式（３６）は、優先音に対するゲインα１があらかじめ設定された上限Ｔ_１Ｈを超えていたとき、その超過を解消する方向に働く（処理（３）の条件(b)）。式（３７）は、単純加算による混合音に所定の倍率（比率）Ｔ_Ｇを乗算したレベル（図６参照）を超える場合に優先音のゲインを戻す方向に働く。式（３８）は、優先音のゲイン値が１よりも大きいときにのみ減少させることを示す。 Equations (33) and (34) are intended to return the gain of the priority sound toward its original value when, at the point (i, k) on the time-frequency plane, at least one of the priority sound and the non-priority sound does not reach the audible level. Equation (35) acts to return the gain of the priority sound when the logarithmic intensity of the mixed sound exceeds the sum of the logarithmic intensities of the priority sound and the non-priority sound. Equation (36) acts to remove the excess when the gain α1 for the priority sound exceeds the preset upper limit T_1H (condition (b) of process (3)). Equation (37) acts to return the gain of the priority sound when the mixed sound exceeds the level obtained by multiplying the simple-addition mix by a predetermined factor T_G (see FIG. 6). Equation (38) specifies that the gain of the priority sound is decreased only when its value is greater than 1.

式（３３）〜（３６）は、式（２８）〜（３１）の否定である。一方、式（３７）は式（３２）の否定になっていない。式（３７）は、修正前に対する条件式であり、式（３２）は修正後に対する条件式であるという差異がある。この差異により、ゲインが振動することを抑制している。 Expressions (33) to (36) are negations of Expressions (28) to (31). On the other hand, Expression (37) is not the negative of Expression (32). Equation (37) is a conditional expression before correction, and equation (32) is a conditional expression after correction. This difference prevents the gain from vibrating.

このような減少操作によって、α１は増加の必要がないときには１に戻っていく。減少操作によってα１［ｉ，ｋ］＜１となってしまった場合は、１を強制代入することで、α１［ｉ，ｋ］＝１を回復させる。この回復操作がある場合は、式（３８）の条件は必ずしも必要ではないが、ソフトウエア実装の場合は、無駄な乗算時間の増大を防止するため、ＦＰＧＡ実装の場合は消費電力抑制のために、式（３８）の判定があったほうがよい。 By such a decrease operation, α1 returns to 1 when there is no need to increase it. When α1 [i, k] <1 is obtained by the reduction operation, α1 [i, k] = 1 is recovered by forcibly substituting 1. If there is this recovery operation, the condition of equation (38) is not necessarily required. However, in the case of software implementation, in order to prevent an unnecessary increase in multiplication time, in the case of FPGA implementation, to reduce power consumption. It is better to have the determination of the equation (38).
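The increase, decrease, and hold decisions for α1 at one point (i, k) can be sketched as below. Equations (21) to (38) are not reproduced in this text, so every inequality here is an assumed form inferred from the prose: gains are taken to act on amplitude, so power scales with the square of the gain; the "sum of logarithmic intensities" bound is written as the product P1·P2; the boost correction is omitted (b_n = b_d = 1). The function name and the default parameter values are hypothetical.

```python
def update_alpha1(alpha1, alpha2, p1, p2, delta1=0.05, t_1h=10.0, t_g=1.4):
    """Return the new priority gain for one (i, k) point (assumed inequality forms)."""
    up = 1.0 + delta1
    l2 = alpha2 * alpha2 * p2                  # non-priority power, previous gain
    l = alpha1 * alpha1 * p1 + l2              # mix power before any update
    l_p = (alpha1 * up) ** 2 * p1 + l2         # mix power if alpha1 were increased
    mul = p1 * p2                              # bound from the sum of log intensities
    plus = t_g * t_g * (p1 + p2)               # bound relative to simple addition
    increase = (p1 > 1.0 and p2 > 1.0 and l <= mul
                and alpha1 <= t_1h and l_p <= plus)
    decrease = ((p1 <= 1.0 or p2 <= 1.0 or l > mul
                 or alpha1 > t_1h or l > plus) and alpha1 > 1.0)
    if increase:
        return alpha1 * up
    if decrease:
        return max(1.0, alpha1 * (1.0 / up))   # never fall below unity gain
    return alpha1                              # neither condition met: hold
```

Because the increase test looks at the post-update power `l_p` while the decrease test looks at the pre-update power `l`, there is a band of values in which neither fires and the gain is held, which is the mechanism the text credits with suppressing oscillation.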

α１の増加と減少の条件がどちらも満たされない場合は、値の保持、すなわちα１［ｉ，ｋ］＝α１［ｉ−１，ｋ］を行う。
（Ｅ）α２を減少するための条件
α２の減少、すなわちα２［ｉ，ｋ］＝α２［ｉ−１，ｋ］−Δ２の演算を行うのは、式（３９）と式（４０）の双方が満たされる場合である。 When neither the increase nor decrease condition of α1 is satisfied, the value is retained, that is, α1 [i, k] = α1 [i−1, k] is performed.
(E) Conditions for decreasing α2
α2 is decreased, that is, α2[i, k] = α2[i−1, k] − Δ2 is computed, when both Equation (39) and Equation (40) are satisfied.

式（３９）は、優先音のパワー増加分を超えない量であれば、非優先音のパワーを減少させてもよいことを示す。式（４０）は、非優先音に対するゲインを一定値（Ｔ_２Ｌ）以上に保つように働く。
（Ｆ）α２を増加するための条件
α２の増加、すなわちα２［ｉ，ｋ］＝α２［ｉ−１，ｋ］＋Δ２の演算を行うのは、式（４１）と式（４２）の双方が満たされる場合である。 Equation (39) indicates that the power of the non-priority sound may be decreased as long as the amount does not exceed the power increase of the priority sound. Equation (40) works to keep the gain for non-priority sounds above a certain value (T _2L ).
(F) Conditions for increasing α2
α2 is increased, that is, α2[i, k] = α2[i−1, k] + Δ2 is computed, when both Equation (41) and Equation (42) are satisfied.

式（４１）は、この時点までに決定されたゲインα１［ｉ，ｋ］、α２［ｉ−１，ｋ］を用いると、優先音のパワー増加分よりも非優先音のパワー減少のほうがおおきくなってしまうことを示している。式（４１）は式（３９）の否定に近いが、式（４１）は修正前に対する条件式であるのに対し、式（３９）は修正後に対する条件式であるという差異がある。この差異によって、ゲインが振動することを防止する。 Equation (41) indicates that, with the gains α1[i, k] and α2[i−1, k] determined up to this point, the power decrease of the non-priority sound would be larger than the power increase of the priority sound. Equation (41) is close to the negation of Equation (39), with the difference that Equation (41) is a condition on the values before modification whereas Equation (39) is a condition on the values after modification. This difference prevents the gain from oscillating.

この操作により、α２は減少させる必要がないときは１に戻っていく。α２の増加によりα２［ｉ，ｋ］＞１となった場合は、１を強制代入することで、α２［ｉ，ｋ］＝１を回復する。 With this operation, α2 returns to 1 when it is not necessary to decrease it. When α2 [i, k]> 1 due to an increase in α2, α2 [i, k] = 1 is recovered by forcibly substituting 1.
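A corresponding sketch for α2 follows. Equations (39) to (42) are not reproduced in this text, so the forms below are assumptions based on the prose: power changes are measured relative to unity gain, the decrease test uses the would-be gain α2 − Δ2, the increase test uses the previous gain, and Equation (42) is taken to be α2 < 1. The function name and the default parameter values are hypothetical.

```python
def update_alpha2(alpha2, alpha1_new, p1, p2, delta2=0.05, t_2l=0.3):
    """Return the new non-priority gain for one (i, k) point (assumed forms)."""
    gain_up = alpha1_new * alpha1_new * p1 - p1     # power added to the priority sound
    lowered = alpha2 - delta2
    loss_after = p2 - lowered * lowered * p2        # power removed if alpha2 is lowered
    loss_now = p2 - alpha2 * alpha2 * p2            # power removed with the previous gain
    if loss_after <= gain_up and lowered >= t_2l:   # assumed (39) and (40)
        return lowered
    if loss_now > gain_up and alpha2 < 1.0:         # assumed (41) and (42)
        return min(1.0, alpha2 + delta2)            # forced back to 1 if it overshoots
    return alpha2                                   # neither condition met: hold
```

The function takes the already updated priority gain as input, reflecting the order in which the text updates α1 before α2.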

α２の増加と減少の条件がどちらも満たされない場合は、値の保持、すなわちα２［ｉ，ｋ］＝α２［ｉ−１，ｋ］を行う。
＜動作例＞
図１０は、実施形態のミキシング装置１の動作例を示す図である。２つの音源セット（セット１、セット２）を用意し、各音源セットで音声を優先音とし、楽器音を非優先音とした。図１０（Ａ）はブーストが効いていない場合の例、図１０（Ｂ）はブーストが効いている場合の例であり、ともに音源セット１を対象にしたものである。上述のように、優先音が有音であって、かつ低ＳＮＲのときにブースト処理が行われる。図１０（Ａ）と図１０（Ｂ）はともに、修正前の各種変量をプロットしている。 If neither the increase nor decrease condition of α2 is satisfied, the value is retained, that is, α2 [i, k] = α2 [i−1, k] is performed.
<Operation example>
FIG. 10 is a diagram illustrating an operation example of the mixing apparatus 1 according to the embodiment. Two sound source sets (set 1 and set 2) were prepared; in each set, speech was the priority sound and instrument sound was the non-priority sound. FIG. 10(A) shows an example in which the boost is not active and FIG. 10(B) an example in which it is active, both for sound source set 1. As described above, boost processing is performed when the priority sound is present and its SNR is low. Both FIG. 10(A) and FIG. 10(B) plot the various quantities before modification.

図中のＭＵＬは、ｂ_ｎＰ１・Ｐ２／ｂ_ｄ（ｂｎ／ｂｄはブーストレシオ）であり、ＰＬＵＳは、Ｔ_Ｇ ^２（ｂ_ｎＰ１＋ｂ_ｄＰ２）／ｂ_ｄである。図中のＬは、式（２３）で定義したミキシング出力の聴感補正パワーＬである。ＬをＭＵＬを超えない範囲でできるだけ大きくするというのが式（３０）の条件であり、ＬをＰＬＵＳを超えない範囲でできるだけ大きくするというのが式（３２）の条件である。 In the figure, MUL is b _n P1 · P2 / b _d (bn / bd is a boost ratio), and PLUS is T _G ² (b _n P1 + b _d P2) / b _d . L in the figure is the audibility correction power L of the mixing output defined by the equation (23). The condition of equation (30) is to make L as large as possible within a range not exceeding MUL, and the condition of equation (32) is to make L as large as possible within a range not exceeding PLUS.

図１０（Ａ）と図１０（Ｂ）の双方で、ＭＵＬとＰＬＵＳの大小関係は周波数に依存しており、常にどちらかが高いということはない。このことから、式（３０）と式（３２）の条件は両方とも効いており、併用すべきであることがわかる。
＜発展例１＞
発展例１として、ゲインの平滑化による改良例を示す。上述した方法で、２つの音源セットのいずれに対しても良好な結果を得ることができたが、入力のＳＮＲが低い部分で混合音がやや聴き取りにくくなることがわかった。 In both FIG. 10A and FIG. 10B, the magnitude relationship between MUL and PLUS depends on the frequency, and one of them is not always high. From this, it can be seen that both the conditions of the equations (30) and (32) are effective and should be used together.
<Development example 1>
As a first development example, an improvement example by smoothing the gain is shown. With the method described above, good results could be obtained for both of the two sound source sets, but it was found that the mixed sound was somewhat difficult to hear in the portion where the input SNR was low.

その原因を探ったところ、優先音のゲインα１の上昇が穏やかすぎて必要な値が確保できていないためであるとわかった。これに対処するためにはゲイン増加のステップサイズΔ１を大きくすればよいが、Δ１を大きくすると、ゲインの推移やゲインの差分の推移に大きな不連続が生じるおそれがある。この場合、スペクトルの散逸（ノイズの発生）が起こってしまう。 As a result of searching for the cause, it was found that the increase in the gain α1 of the priority sound was too gentle to secure a necessary value. In order to cope with this, the gain increase step size Δ1 may be increased. However, if Δ1 is increased, there is a possibility that a large discontinuity occurs in the gain transition and the gain difference transition. In this case, spectrum dissipation (noise generation) occurs.

そこで、発展例１では、以下のようにしてα１、α２を平滑化し、平滑化されたゲインβ１、β２を用いる。これによってゲイン調整のステップサイズΔ１、Δ２を１０倍以上に大きくしても、スペクトル散逸の問題を回避することができる。 Accordingly, in the first development example, α1 and α2 are smoothed as follows, and smoothed gains β1 and β2 are used. As a result, the problem of spectral dissipation can be avoided even if the gain adjustment step sizes Δ1 and Δ2 are increased by 10 times or more.

ここで、ηは指数平滑化法の係数であり、平滑の時定数τ_αから式（４４）で導出する。 Here, η is a coefficient of the exponential smoothing method, and is derived from the smoothing time constant τ _α by the equation (44).
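Equation (43) is not reproduced in this text. As a minimal sketch, assuming it has the same first-order form as the power smoothing, each smoothed gain β can be updated from the raw gain α as follows. The function name is hypothetical.

```python
def smooth_gain(beta_prev, alpha, eta):
    """One exponential-smoothing step: beta moves a fraction eta toward alpha."""
    return beta_prev + eta * (alpha - beta_prev)
```

Under this assumption, a sudden jump of α from 1 to 2 with η = 0.5 yields β = 1.5, 1.75, 1.875 over three blocks, so a large step in α reaches the output only gradually.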

このようにして生成したβ１、β２は、上述した音源セット１、セット２のいずれに対しても良好なミキシング結果をもたらす。計算負荷や回路規模に支障がなければ、発展例１のゲインの平滑化を行うのが望ましい。 The gains β1 and β2 generated in this way give good mixing results for both sound source set 1 and set 2 described above. Unless the computational load or circuit scale poses a problem, it is desirable to apply the gain smoothing of the first development example.

図１１は、音源セット２を用い、平滑化されたゲインβ１、β２でミキシングしたときのミキシング装置１の動作状態を示す。図１１で、横軸は時間、縦軸は周波数である。図１１（Ａ）は優先信号Ｘ１としての音声、図１１（Ｂ）は非優先信号Ｘ２としての音楽、図１１（Ｃ）は従来の単純加算によるミキシング結果（Ｘ１＋Ｘ２）、図１１（Ｄ）は実施形態のミキシング結果である。図１１（Ｅ）は優先信号の平滑化後のパワーＥ１、図１１（Ｆ）は非優先信号の平滑化後のパワーＥ２である（図３参照）。図１１（Ｇ）は聴感補正後のパワーＰ１、図１１（Ｈ）は聴感補正後のパワーＰ２（図４参照）を諧調表示したものである。 FIG. 11 shows an operation state of the mixing apparatus 1 when the sound source set 2 is used and mixing is performed with smoothed gains β1 and β2. In FIG. 11, the horizontal axis represents time, and the vertical axis represents frequency. 11A shows the sound as the priority signal X1, FIG. 11B shows the music as the non-priority signal X2, FIG. 11C shows the mixing result (X1 + X2) by the conventional simple addition, and FIG. It is a mixing result of an embodiment. FIG. 11E shows power E1 after smoothing the priority signal, and FIG. 11F shows power E2 after smoothing the non-priority signal (see FIG. 3). FIG. 11 (G) is a gradation display of power P1 after auditory correction, and FIG. 11 (H) is a gradation display of power P2 (see FIG. 4) after auditory correction.

図１１（Ｇ）と図１１（Ｈ）において、淡い灰色の領域が０ｄＢ以上２０ｄＢ未満、黒色の領域が２０ｄＢ以上４０ｄＢ未満、濃い灰色の領域が４０ｄＢ以上の領域である。すなわち、聴感補正により可聴として取り扱われたのは、白色以外の領域である。図１１（Ｇ）で、線分ｅで示される領域が有音判定（ｅ［ｉ］）された領域、線分lで示される領域が低ＳＮＲ判定（l［ｉ］）された領域、線分ｂで示される領域がブースト判定（ｂ［ｉ］）された領域である。用いた音源セット２は、優先音のＳＮＲが低い音源セットであり、５秒以降の時間区間では、有音区間はすべてブースト処理の対象となっている。 In FIGS. 11(G) and 11(H), light gray regions are 0 dB or more and less than 20 dB, black regions are 20 dB or more and less than 40 dB, and dark gray regions are 40 dB or more. That is, the regions treated as audible after auditory correction are those other than white. In FIG. 11(G), the region indicated by line segment e is where sound was determined (e[i]), the region indicated by line segment l is where low SNR was determined (l[i]), and the region indicated by line segment b is where boost was determined (b[i]). Sound source set 2 used here is a set in which the SNR of the priority sound is low; in the time intervals after 5 seconds, all sound intervals are subject to boost processing.

図１１（Ｉ）は、発展例１で平滑化されたゲインβ１に基づいて作成された優先音のゲインマスクであり、β１の対数を濃淡表示した図である。白色が０ｄＢ、黒色が３５ｄＢに相当する。図１１（Ｊ）は、発展例１で平滑化されたゲインβ２に基づいて作成された非優先音のゲインマスクであり、β２の値を濃淡表示した図である。白色が１．０、黒色が０．０に相当する。 FIG. 11 (I) is a gain mask for the priority sound created based on the gain β1 smoothed in the first development example, and is a diagram in which the logarithm of β1 is displayed in shades. White corresponds to 0 dB and black corresponds to 35 dB. FIG. 11J is a non-priority sound gain mask created based on the gain β2 smoothed in the first development example, and shows the value of β2 in grayscale. White corresponds to 1.0 and black corresponds to 0.0.

図１１（Ｉ）及び図１１（Ｊ）のゲインマスクを用いてゲイン調整した後に加算することによって、時間周波数平面でのきめ細かなミキシングが可能になる。従来法による図１１（Ｃ）では、低周波領域で非優先音（ギター）の成分しか見えないのに対し、図１１（Ｄ）では、優先音（音声）の成分が混ざりあっている。
＜発展例２＞
上述した実施形態では、演算量を低減するために、方程式を解くのではなく、不等式の真偽判定による逐次更新を行っている（処理（５））。特に、ＦＰＧＡの実装に際しては、極力処理を簡略化したい。そこで、７０ phonと２０ phonのラウドネス曲線を信号処理部１５にセットして、式（１０）〜（１２）により順次Ｃ_Ｌｐ［ｋ］、Ａ［ｋ］、Ｂ［ｋ］を導出する方法に替えて、最初からＢ［ｋ］を与える。たとえば出荷時に補正係数Ｂ［ｋ］（最小可聴パワーＡ［ｉ］の逆数）を定数テーブルとして与えておく。動作中に一時的に合理性を無視しても特に強い優先感を与えたくなったりしたなどの場合、Ｂ［ｋ］に強制的に任意の値を代入して自由に好みの特性を持たせることも可能である。 By performing gain adjustment using the gain masks of FIGS. 11 (I) and 11 (J) and then adding, fine mixing in the time-frequency plane becomes possible. In FIG. 11C according to the conventional method, only the non-priority sound (guitar) component can be seen in the low frequency region, whereas in FIG. 11D, the priority sound (sound) component is mixed.
<Development example 2>
In the embodiment described above, to reduce the amount of computation, the gains are updated sequentially by evaluating whether inequalities hold, rather than by solving equations (process (5)). For an FPGA implementation in particular, the processing should be simplified as far as possible. Therefore, instead of setting the 70 phon and 20 phon loudness curves in the signal processing unit 15 and deriving C_Lp[k], A[k], and B[k] in turn from Equations (10) to (12), B[k] is supplied directly from the outset. For example, the correction coefficient B[k] (the reciprocal of the minimum audible power A[k]) is provided as a constant table at the time of shipment. When, for instance, the user temporarily wants to give a sound an especially strong sense of priority during operation, even at the expense of strict rationale, arbitrary values can be forcibly assigned to B[k] to obtain any desired characteristic.

図１２は、図９（Ｂ）の等ラウドネス曲線を設定し、発展例２を適用したときの補正係数Ｂ［ｉ］の具体例を示す。図１２の場合、Ｂ［ｉ］をテーブルとして記憶する替わりに関数としてあらかじめ記憶しておいてもよい。
＜処理フロー＞
図１３Ａ及び図１３Ｂは、ミキシング装置１の信号処理部１５のゲイン決定部１５１で実行されるゲイン決定の処理の一例を示すフローである。この処理フローは、ゲインαを平滑化してゲインβを生成する発展例１に対応する。 FIG. 12 shows a specific example of the correction coefficient B [i] when the equal loudness curve of FIG. 9B is set and the development example 2 is applied. In the case of FIG. 12, B [i] may be stored in advance as a function instead of being stored as a table.
<Processing flow>
13A and 13B are flowcharts showing an example of gain determination processing executed by the gain determination unit 151 of the signal processing unit 15 of the mixing apparatus 1. This processing flow corresponds to the first development example in which the gain α is smoothed to generate the gain β.

まず、α１［ｋ］、α２［ｋ］、β１［ｋ］、β２［ｋ］をすべての周波数ビンｋについて「１」に初期化し、聴感補正総パワーＱ１＝０、Ｑ２＝０、ｉ＝０に設定して係数Ｂ［ｋ］を読み込む（Ｓ１１）。ｋ＝０から処理を開始し（Ｓ１２），平滑化パワーＥ１［ｉ，ｋ］、Ｅ２［ｉ，ｋ］を読み込んで（Ｓ１３）、聴感補正パワーＰ１［ｉ，ｋ］、Ｐ２［ｉ，ｋ］を求め（Ｓ１４）、聴感補正総パワーＱ１［ｉ］とＱ２［ｉ］を計算する（Ｓ１５）。ｋの値をインクリメントして（Ｓ１６）、ｋが周波数ビン数Ｎ_Ｈに達するまで（Ｓ１７でＮＯ）、Ｓ１３〜Ｓ１６を繰り返す。これは、周波数ビンｋについてのループの１回目のパスである。 First, α1[k], α2[k], β1[k], and β2[k] are initialized to 1 for all frequency bins k, the total auditory correction powers are set to Q1 = 0 and Q2 = 0, i is set to 0, and the coefficients B[k] are read (S11). Processing starts from k = 0 (S12): the smoothed powers E1[i, k] and E2[i, k] are read (S13), the auditory correction powers P1[i, k] and P2[i, k] are obtained (S14), and the total auditory correction powers Q1[i] and Q2[i] are accumulated (S15). The value of k is incremented (S16), and S13 to S16 are repeated until k reaches the number of frequency bins N_H (NO in S17). This is the first pass of the loop over the frequency bins k.

ｋがＮ_Ｈを超えると（Ｓ１７でＹＥＳ）、有音判定結果ｅ［ｉ］、低ＳＮＲ判定結果l［ｉ］、ブースト判定結果ｂ［ｉ］、ブーストレシオの分子ｂ_ｎ［ｉ］、ブーストレシオの分母ｂ_ｄ［ｉ］を求めて（Ｓ１８）、ｋについてのループの２回目のパスの処理を開始する（Ｓ１９）。周波数ビンｋについて、優先音のｋがゲイン調整する最低ビンｋ_１Ｌと最高ビンｋ_１Ｈの範囲内にあるか否かを判断する（Ｓ２０）。範囲内にある場合に、平滑化パワーＥ１［ｉ，ｋ］、Ｅ２［ｉ，ｋ］を読み込み（Ｓ２１）、Ｐ１（優先音の聴感補正パワー），Ｐ２（非優先音の聴感補正パワー）、Ｌ１（更新前のゲインα１での優先音の聴感補正パワー）、Ｌ_１ｐ（優先音のゲインを増加させたときの聴感補正パワー）、Ｌ２（更新前のゲインα２での非優先音の聴感補正パワー）、Ｌ_２ｍ（非優先音のゲインをΔ２減少させたときの聴感補正パワー）、Ｌ（式（２３））、Ｌ_ｐ（式（２５））を求める（Ｓ２２）。 When k exceeds N_H (YES in S17), the sound determination result e[i], the low SNR determination result l[i], the boost determination result b[i], the boost ratio numerator b_n[i], and the boost ratio denominator b_d[i] are obtained (S18), and the second pass of the loop over k is started (S19). For each frequency bin k, it is determined whether k lies within the range from the lowest bin k_1L to the highest bin k_1H in which the gain of the priority sound is adjusted (S20). If it is within the range, the smoothed powers E1[i, k] and E2[i, k] are read (S21), and the following are obtained (S22): P1 (auditory correction power of the priority sound), P2 (auditory correction power of the non-priority sound), L1 (auditory correction power of the priority sound with the gain α1 before the update), L_1p (auditory correction power when the gain of the priority sound is increased), L2 (auditory correction power of the non-priority sound with the gain α2 before the update), L_2m (auditory correction power when the gain of the non-priority sound is decreased by Δ2), L (Equation (23)), and L_p (Equation (25)).

Using the obtained values, it is determined whether all of equations (28) to (32) hold, that is, whether α1 should be increased (S23). If they hold (YES in S23), α1 is increased (S24); otherwise (NO in S23), α1 is left unchanged.

Next, it is determined whether any of equations (33) to (37) holds and equation (38) also holds, that is, whether α1 should be decreased (S25). If the condition of S25 is not satisfied, α1 is left unchanged; if it is satisfied, α1 is decreased (S26). It is then determined whether the decreased α1 is less than 1 (S27). If α1 has fallen below 1, it is reset to 1 (S28); if α1 is 1 or more, the updated α1 is kept.

Subsequently, it is determined whether k lies within the range from the lowest bin k_2L to the highest bin k_2H in which the gain of the non-priority sound is adjusted (S29). If it does, L_1a (the auditory correction power of the priority sound when the adjusted gain α1 is used) is obtained (S30), and it is determined whether equations (39) and (40) hold, that is, whether α2 should be decreased (S31). If they hold (YES in S31), α2 is decreased (S32); otherwise (NO in S31), α2 is left unchanged.

Next, it is determined whether equations (41) and (42) hold, that is, whether α2 should be increased (S33). If the condition of S33 is not satisfied, α2 is left unchanged; if it is satisfied, α2 is increased (S34). It is then determined whether the increased α2 exceeds 1 (S35). If it is 1 or less, the increased α2 is kept; if it exceeds 1, α2 is reset to 1 (S36).
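The bounded updates of S23 to S28 and S31 to S36 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the additive step sizes `delta1` and `delta2`, and the boolean arguments are assumptions. The booleans stand in for the outcomes of equations (28) to (42), which are not reproduced in this passage.

```python
def step_alpha1(alpha1, may_increase, may_decrease, delta1=0.25):
    """One sequential update of the priority-sound gain (S23 to S28).

    may_increase / may_decrease are hypothetical flags standing in for the
    results of the tests in S23 and S25. The additive step is an assumption.
    """
    if may_increase:           # S23 YES -> S24
        alpha1 += delta1
    if may_decrease:           # S25 satisfied -> S26
        alpha1 -= delta1
        if alpha1 < 1.0:       # S27: fell below 1 -> S28: reset to 1
            alpha1 = 1.0
    return alpha1


def step_alpha2(alpha2, may_decrease, may_increase, delta2=0.25):
    """One sequential update of the non-priority-sound gain (S31 to S36)."""
    if may_decrease:           # S31 YES -> S32
        alpha2 -= delta2
    if may_increase:           # S33 satisfied -> S34
        alpha2 += delta2
        if alpha2 > 1.0:       # S35: exceeded 1 -> S36: reset to 1
            alpha2 = 1.0
    return alpha2
```

In this sketch the priority-sound gain never ends a decrease step below 1 and the non-priority-sound gain never ends an increase step above 1, which mirrors the resets in S28 and S36.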

Next, based on equation (43), α2 and α1 are each smoothed to generate β2 and β1 (S37 and S38), and β1 and α1 are output (S39). After that, k is incremented (S40), and S20 to S40 are repeated until k reaches N_H (NO in S41).
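Equation (43) is not reproduced in this passage, so the following only illustrates what smoothing a gain can look like. It assumes a first-order recursive average over successive time blocks; the function name `smooth_gain` and the coefficient `gamma` are hypothetical, and the actual form of equation (43) may differ.

```python
def smooth_gain(beta_prev, alpha, gamma=0.5):
    """Assumed stand-in for equation (43): first-order recursive smoothing.

    beta_prev: smoothed gain from the previous update
    alpha:     newly updated raw gain
    gamma:     hypothetical smoothing coefficient in [0, 1)
    """
    return gamma * beta_prev + (1.0 - gamma) * alpha


# Example: the smoothed gain follows a step in alpha gradually.
beta = 1.0
history = []
for alpha in [2.0, 2.0, 2.0]:
    beta = smooth_gain(beta, alpha, gamma=0.5)
    history.append(beta)
```

A smoother of this kind uses only multiplication and addition, which is in keeping with the passage's emphasis on simple arithmetic.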

When k exceeds N_H (YES in S41), the time block i is incremented (S42), and S12 to S42 are repeated until the last time block i is reached (NO in S43). When the last time block i has been processed, the process ends.

In the processing of FIGS. 13A and 13B, determining the numerator b_n[i] and the denominator b_d[i] of the boost ratio (S18) requires referring to X1[i, k] and X2[i, k] for all frequency bins k. Updating the gains α1[k] and α2[k], on the other hand, requires not only X1[i, k] and X2[i, k] but also b_n[i] and b_d[i]. Consequently, the gains cannot be updated once unless the loop over the frequency bins k is run twice.
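The data dependency that forces two passes can be sketched as below. This is a structural illustration only. The callbacks `block_stats` (standing in for S18) and `update_gains` (standing in for S20 to S39) are hypothetical, and the form P[k] = B[k] * E[k] for the auditory correction power is an assumption, since the defining equations are not reproduced in this passage.

```python
def two_pass_block(E1, E2, B, block_stats, update_gains):
    """Process one time block in two passes over the frequency bins.

    E1, E2: smoothed powers of the priority / non-priority sound per bin
    B:      per-bin coefficients read in S11
    """
    n_bins = len(B)

    # Pass 1 (S12 to S17): accumulate the auditory correction total powers.
    Q1 = Q2 = 0.0
    for k in range(n_bins):
        Q1 += B[k] * E1[k]   # assumed form of P1[i, k]
        Q2 += B[k] * E2[k]   # assumed form of P2[i, k]

    # S18: block-level quantities (e, l, b, b_n, b_d) need the full sums.
    stats = block_stats(Q1, Q2)

    # Pass 2 (S19 to S41): per-bin gain updates that depend on those stats.
    for k in range(n_bins):
        update_gains(k, E1[k], E2[k], stats)
    return stats
```

Because `stats` is only available after the first loop completes, the per-bin updates cannot be folded into that loop without changing which block the statistics come from.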

Therefore, in FIG. 14, only the boost ratio is calculated in advance from the previous sample and then reused, which reduces the two-pass processing to one-pass processing. This simplifies the circuit and increases its speed.

FIG. 14 is described only in terms of its differences from FIGS. 13A and 13B. After initialization and the reading and calculation of the necessary parameters in S11 to S15, the process jumps to S20, where α1 is adjusted in S21 to S28 and α2 is adjusted in S29 to S36. The adjusted α1 and α2 are then smoothed to obtain β1 and β2, and the gains are obtained for all frequencies k in the processing range for the time block i of interest (repetition of S13 to S41). After that, the time block i is incremented (S42), e[i], l[i], b[i], b_n[i], and b_d[i] are obtained (S51), and these parameters are used in processing the next time block i = i + 1.
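The one-pass variant can be sketched in the same style. As before, `block_stats` (standing in for S51) and `update_gains` are hypothetical callbacks, P[k] = B[k] * E[k] is an assumed form of the auditory correction power, and `initial_stats` is an assumed placeholder for whatever values the first block uses before any statistics exist.

```python
def one_pass_stream(blocks, B, block_stats, update_gains, initial_stats):
    """Process successive time blocks with a single loop over the bins.

    blocks: iterable of (E1, E2) pairs, one pair of per-bin lists per block
    The gain updates for block i use statistics computed from block i - 1.
    """
    stats = initial_stats
    for E1, E2 in blocks:
        Q1 = Q2 = 0.0
        for k in range(len(B)):
            # Accumulate this block's totals (S13 to S15) ...
            Q1 += B[k] * E1[k]
            Q2 += B[k] * E2[k]
            # ... while updating gains with the previous block's stats (S20 on).
            update_gains(k, E1[k], E2[k], stats)
        # S51: compute the statistics to be used for the next block.
        stats = block_stats(Q1, Q2)
    return stats
```

The statistics lag the gain updates by one block, which is why the outputs of the two variants are close but not identical.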

This simplification was examined using sound source set 2, in which the SNR of the priority sound is low. The output values were not exactly identical, but the difference was too small to be audible.

With the method described above, an optimum gain value can be determined for each of input signal 1 and input signal 2 based on rational criteria. In addition, sequential updating that uses only addition, subtraction, and multiplication greatly reduces the amount of computation.

The processing of the signal processing unit 15 described above can be implemented in either hardware or software. The signal processing unit 15 performs the mixing using the input signals as they are, makes the priority sound stand out while sounding natural (the principle of the sum of logarithmic intensities in process (1)), and suppresses the non-priority sound without producing an unnatural impression (the hole-filling principle in process (2)). Because the criteria that determine the gains of the priority sound and the non-priority sound are defined rationally (processes (1) to (3)), the user does not need to adjust parameters while listening to the sound.

Even when the power of the priority sound is extremely small relative to that of the non-priority sound, the priority sound can be made to stand out (process (4)). For example, when narration is laid over music, even a whisper can be heard without being buried in the music.

In addition, updating the gains using only multiplication, addition, and subtraction makes hardware implementation on an FPGA easy (process (5)). A mixing apparatus that is implemented as a plug-in on a DAW and operates in real time can also be realized. These effects arise because sequential updating lightens the computational load and because high-performance mixing is achieved by adjusting the gains alone, which allows the FFT size to be reduced to about 256 points.

Note that smoothing the powers of the priority sound and the non-priority sound is not essential; the gains α1 and α2 may be obtained directly from the powers of the input signals expanded on the time-frequency plane.

Because the mixing apparatus described above can automatically mix input audio signals at high speed using rationally determined gains, it can be applied widely, not only to recording but also to breaking news, car navigation, disc jockeying, conferences, karaoke machines, and the like. For example, an emergency bulletin can be broadcast without disrupting the program, car navigation voice guidance remains easy to hear while music is playing on the car stereo, a disc jockey can talk without lowering the volume of the music, the moderator's voice can be made to stand out during a conference, and the vocal can be adjusted automatically against the accompaniment.

In addition, by installing the mixing program on a user terminal device such as a personal computer or a smartphone, a user can mix desired music, or superimpose desired music on a desired image, and send the result to a communication partner. On the receiving side, further audio can be superimposed on the received data, which can then be saved or sent back.

DESCRIPTION OF SYMBOLS
1 Mixing apparatus
11 Signal input unit
12 Frequency analysis unit
15 Signal processing unit
16 Frequency-time conversion unit
17 Signal output unit
151 Gain determination unit

Claims

A mixing apparatus comprising:
a frequency analysis unit that expands a first input signal and a second input signal in the time domain into a first signal and a second signal on a time-frequency plane, respectively;
a signal processing unit that generates a mixed signal by mixing the first signal and the second signal;
a frequency-time conversion unit that converts the mixed signal into a time-domain signal; and
a signal output unit that outputs the converted signal,
wherein the signal processing unit includes:
a gain determination unit that determines, for each point on the time-frequency plane, a first gain and a second gain, the first gain being determined on the condition that the logarithmic intensity of the output signal does not exceed the sum of the logarithmic intensity of the first signal and the logarithmic intensity of the second signal and adjusting the power of the first signal in a first direction, and the second gain being determined on the condition that the amount of adjustment of the power of the first signal is not exceeded and changing the power of the second signal in a second direction opposite to the first direction; and
an adder that adds the first signal adjusted by the first gain and the second signal adjusted by the second gain.

The mixing apparatus according to claim 1, wherein the gain determination unit adds at least one of the following conditions: (a) providing a first upper limit for the adjustment of the first gain so that a fixed multiple of the power obtained by simply adding the first signal and the second signal is not exceeded, (b) providing a fixed second upper limit for the first gain, or (c) providing a fixed lower limit for the second gain, and determines the first gain and the second gain within a range that satisfies the added condition.

The mixing apparatus according to claim 2, wherein the gain determination unit relaxes the at least one condition when the ratio of the power of the first signal to the power of the second signal is equal to or less than a predetermined ratio.

The mixing apparatus, wherein the gain determination unit corrects the power of the first signal and the power of the second signal to a first auditory correction power and a second auditory correction power, respectively, based on the hearing limit level, and determines the first gain and the second gain within a range in which the logarithmic intensity of the output signal does not exceed the sum of the logarithmic intensity of the first auditory correction power and the logarithmic intensity of the second auditory correction power.

The mixing apparatus according to claim 1, wherein the gain determination unit generates a third gain obtained by smoothing the first gain and a fourth gain obtained by smoothing the second gain, and
the adder adds the first signal adjusted by the third gain and the second signal adjusted by the fourth gain.

The mixing apparatus according to claim 1, wherein the gain determination unit sequentially updates the first gain and the second gain for each point on the time-frequency plane.

The mixing apparatus, wherein the first input signal is a priority sound that is preferentially made clear by the mixing processing, and the second input signal is a non-priority sound other than the priority sound.

A signal mixing method comprising:
receiving a first input signal and a second input signal in the time domain;
expanding the first input signal and the second input signal into a first signal and a second signal on a time-frequency plane, respectively;
determining, for each point on the time-frequency plane, a first gain and a second gain, the first gain being determined on the condition that the logarithmic intensity of the output signal does not exceed the sum of the logarithmic intensity of the first signal and the logarithmic intensity of the second signal and adjusting the power of the first signal in a first direction, and the second gain being determined on the condition that the amount of adjustment of the power of the first signal is not exceeded and changing the power of the second signal in a second direction opposite to the first direction;
adding a first multiplication result obtained by multiplying the first signal by the first gain and a second multiplication result obtained by multiplying the second signal by the second gain to generate a mixed signal; and
converting the mixed signal into a time-domain signal and outputting the time-domain signal.

A mixing program for causing a computer to execute signal mixing processing, the program causing the computer to execute:
a procedure of receiving a first input signal and a second input signal in the time domain;
a procedure of expanding the first input signal and the second input signal into a first signal and a second signal on a time-frequency plane, respectively;
a procedure of determining, for each point on the time-frequency plane, a first gain and a second gain, the first gain being determined on the condition that the logarithmic intensity of the output signal does not exceed the sum of the logarithmic intensity of the first signal and the logarithmic intensity of the second signal and adjusting the power of the first signal in a first direction, and the second gain being determined on the condition that the amount of adjustment of the power of the first signal is not exceeded and changing the power of the second signal in a second direction opposite to the first direction;
a procedure of adding a first multiplication result obtained by multiplying the first signal by the first gain and a second multiplication result obtained by multiplying the second signal by the second gain to generate a mixed signal; and
a procedure of converting the mixed signal into a time-domain signal and outputting the time-domain signal.