JP4378098B2 - Sound source selection apparatus and method

Info

Publication number
JP4378098B2
Authority
JP
Japan
Prior art keywords
sound source
average sparsity
average
sparsity
target
Prior art date
Legal status
Expired - Fee Related
Application number
JP2003081858A
Other languages
Japanese (ja)
Other versions
JP2004287311A (en)
Inventor
数学 丁
Current Assignee
Faurecia Clarion Electronics Co Ltd
Original Assignee
Clarion Co Ltd
Priority date
Filing date
Publication date
Application filed by Clarion Co Ltd filed Critical Clarion Co Ltd
Priority to JP2003081858A
Publication of JP2004287311A
Application granted
Publication of JP4378098B2
Anticipated expiration
Expired - Fee Related (current status)

Description

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a sound source selection apparatus and a sound source selection method that receive sound source signals from a plurality of sound sources, select a target sound source from among them, and output only the sound source signal from the selected sound source.
[0002]
[Prior art]
Sound source selection methods can be broadly classified into two types: the utterance content dependent type and the utterance content independent type. The utterance content dependent type selects a sound source based on linguistic features of the sound source signal. The utterance content independent type selects a sound source based on acoustic characteristics of the sound source signal, performing the selection from average spectral features, the pitch (fundamental frequency of the sound source), and the like (see, for example, Patent Document 1).
[0003]
The utterance content dependent type has the advantage of a low erroneous selection rate, but it requires a large amount of long training for each speaker corresponding to the target sound source. In practical applications, the target speaker cannot always be assumed in advance; in that case the target sound source cannot be learned, so information about it cannot be loaded into the system beforehand. Even when such information can be loaded in advance, its volume is large, and the memory required to hold it becomes correspondingly large.
[0004]
The utterance content independent type, on the other hand, still requires a learning process, but an acoustic feature pattern estimated from short-term learning shows no significant performance degradation compared with one estimated from long-term learning.
[0005]
[Patent Document 1]
JP-A-5-181464
[0006]
[Problems to be solved by the invention]
However, the conventional utterance content independent type performs selection based only on average spectral features and the pitch (fundamental frequency of the sound source), so the probability of selecting the wrong sound source is high. Moreover, the computation required to extract the pitch is heavy, which lengthens the processing time.
[0007]
An object of the present invention is to provide a sound source selection device having a low erroneous selection rate and a short processing time.
[0008]
[Means for Solving the Problems]
In order to solve the above problem, the present invention comprises: target sound source information calculating means that takes in a sound source signal output from a target sound source, performs arithmetic processing on it, and calculates a first average sparsity representing an acoustic feature of the sound source signal; storage means for storing the first average sparsity; plural sound source information calculating means that, when a target sound source is to be selected from among a plurality of sound sources including the target sound source, takes in the sound source signals output from the plurality of sound sources, performs arithmetic processing on them, and calculates a second average sparsity representing an acoustic feature of each sound source signal; calculating means for calculating a correlation between the first average sparsity stored in the storage means and the second average sparsity calculated by the plural sound source information calculating means; and selection output means that finds the sound source for which the absolute value of the correlation value between the first average sparsity and the second average sparsity is maximum, selects that sound source as the target sound source, and outputs the sound source signal from the selected sound source.
[0009]
In the above configuration, the sound source is selected based on the acoustic characteristics of the sound source signal; that is, an utterance content independent selection method is adopted. Adopting an utterance content independent method resolves the problems of the utterance content dependent type described above.
[0010]
Next, to address the problems of the conventional utterance content independent method described above, the above configuration first avoids adopting the pitch (fundamental frequency of the sound source) as an acoustic feature, which keeps the amount of computation down. Selecting a sound source from the average spectrum alone, however, degrades selection accuracy, so the first and second average sparsities of the sound source signals are used in place of the average spectrum. The correlation between the first average sparsity and the second average sparsity is then calculated, the sound source for which the absolute value of that correlation is maximum is found, and that sound source is selected as the target sound source; this keeps the erroneous selection rate low and shortens the processing time.
[0011]
More specifically, the first average sparsity and the second average sparsity are as follows.
That is, in the invention according to claim 1, when Sk(tc, f), the short-time Fourier transform of the sound source signal sk(t) from the k-th of the plurality of sound sources computed by fast Fourier transform, is expressed as
Sk(tc, f) = FFT(w(t′) sk(tc + t′)),
the target sound source information calculating means and the plural sound source information calculating means calculate the first average sparsity and the second average sparsity, respectively, using the following equation.
[Equation 3]
Figure 0004378098
where w(t) is the window function, tc is the time coordinate of the window function, t′ is the time coordinate inside the window, f is the frequency of the sound source signal, and Spar(Sk(f)) is the first or second average sparsity.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of a sound source selection device according to the present invention. As shown in the figure, the device comprises a target sound source feature extraction unit 1, a target sound source parameter estimation unit 2, a storage device 3, a per-source feature extraction unit 4, a per-source parameter estimation unit 5, a similarity calculation unit 6, and a sound source selection unit 7.
[0014]
Here, the target sound source feature extraction unit 1 and parameter estimation unit 2 constitute the target sound source information calculating means; the storage device 3 constitutes the storage means; the per-source feature extraction unit 4 and per-source parameter estimation unit 5 constitute the plural sound source information calculating means; the similarity calculation unit 6 constitutes the calculating means; and the sound source selection unit 7 constitutes the selection output means.
[0015]
The number of input signals is n, and each input signal is denoted sk(t) (k = 1, …, n). Assuming that the j-th sound source is the target sound source, the input signal sj(t) is fed to the target sound source feature extraction unit 1.
[0016]
Meanwhile, the input signals s1(t) to sn(t) from sound sources 1 to n (which include the j-th, target, sound source) are fed to the per-source feature extraction unit 4. The input signals s1(t) to sn(t) are also fed to the sound source selection unit 7.
[0017]
For the input signal sj(t), learning and storage of the target signal are performed through acoustic feature extraction and parameter estimation, as shown in FIG. 2. That is, the target sound source feature extraction unit 1 takes in the input signal sj(t) as a teacher signal (step S11) and performs acoustic feature extraction on it.
[0018]
The index representing the acoustic feature used in this embodiment is the average sparsity. The short-time Fourier transform (STFT) of the k-th input signal sk(t), computed by the fast Fourier transform (FFT), is denoted Sk(tc, f) and is given by equation (1):
Sk(tc, f) = FFT(w(t′) sk(tc + t′))   (1)
where w(t) is the window function, tc is the time coordinate of the window function, t′ is the time coordinate inside the window, and f is the frequency. The number of FFT data points equals the window length, and adjacent windows are spaced at a fixed interval.
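For concreteness, here is a minimal Python sketch of the windowed transform in equation (1); the Hann window, window length, and hop size are illustrative assumptions, since the patent fixes none of them:

```python
import numpy as np

def stft_frames(s, win_len=512, hop=256):
    """Compute Sk(tc, f) = FFT(w(t') sk(tc + t')) per equation (1).

    Each row is the FFT of one windowed segment of s; adjacent windows
    are a fixed `hop` samples apart, and the FFT length equals the
    window length, as the text requires.
    """
    w = np.hanning(win_len)  # window function w(t); Hann is an assumed choice
    starts = range(0, len(s) - win_len + 1, hop)
    return np.array([np.fft.fft(w * s[tc:tc + win_len]) for tc in starts])
```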
[0019]
The average sparsity is defined as the following equation (2).
[Expression 4]
Figure 0004378098
Here, E[·] denotes the average over tc.
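Equation (2) itself appears above only as an image, so its exact form is not recoverable here. The sketch below therefore substitutes a common sparsity statistic, a per-frequency L1/L2 norm ratio averaged over tc, purely as a labeled stand-in with the input/output shape the text describes (one value per frequency bin f):

```python
def average_sparsity(S):
    """Stand-in for Spar(Sk(f)) of equation (2); the patent's actual
    formula is an image, so this L1/L2 ratio is an assumption.

    E[.] averages over the frame index tc, giving one value per
    frequency bin f.
    """
    mag = np.abs(S)                        # |Sk(tc, f)|, shape (frames, bins)
    l1 = mag.mean(axis=0)                  # E[|S|] over tc
    l2 = np.sqrt((mag ** 2).mean(axis=0))  # sqrt(E[|S|^2]) over tc
    return l1 / np.maximum(l2, 1e-12)      # guard against silent bins
```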
[0020]
As the acoustic feature extraction process, the target sound source feature extraction unit 1 performs the windowed Fourier transform (FFT) of equation (1) (step S12). The target sound source parameter estimation unit 2 then calculates the first average sparsity according to equation (2) (step S13). This first average sparsity is stored in the storage device 3 as Spar(R(f)) (step S14), where R(f) is the short-time Fourier transform (STFT) of the target sound source signal r(t).
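Using the two sketches above, the learning stage of FIG. 2 (steps S12 to S14) reduces to computing and retaining the target profile; `r`, standing for the teacher signal r(t), and the bare variable in place of storage device 3 are illustrative simplifications:

```python
# Learning stage (FIG. 2): compute Spar(R(f)) from the teacher signal r(t)
# and retain it for the selection stage (a variable stands in for
# storage device 3).
spar_r = average_sparsity(stft_frames(r))  # steps S12-S13, then S14: store
```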
[0021]
In general, in the learning process shown in FIG. 2, the longer the learning time, the better the learning result. In actual processing, a time of several seconds to several minutes is required.
[0022]
As shown in FIG. 3, the sound source selection process is executed on the input signals s1(t) to sn(t). Acoustic features are extracted from each input block (a block being a number of samples equal to a multiple of the window length), and the parameters representing these features are compared with the target sound source parameters obtained in the learning stage of FIG. 2, deciding for each sound source whether it matches the target sound source.
[0023]
First, the per-source feature extraction unit 4 takes in the input signals s1(t) to sn(t) and performs the windowed Fourier transform (FFT) for each sound source (step S21). Next, using each source's short-time Fourier transform (STFT), the second average sparsity according to equation (2) is calculated (step S22). The correlation between the calculated second average sparsity Spar(Sk(f)) of each source and the first average sparsity Spar(R(f)) of the target sound source stored in the storage device 3 is then computed (step S23); this correlation value is written corr(Spar(Sk(f)), Spar(R(f))).
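A sketch of step S23 follows, assuming corr(·, ·) is the ordinary Pearson correlation coefficient taken across frequency bins; the patent states only that a correlation value is computed, so the choice of Pearson is an assumption:

```python
def sparsity_correlation(spar_s, spar_r):
    """corr(Spar(Sk(f)), Spar(R(f))): Pearson correlation across
    frequency bins (an assumed choice of correlation measure)."""
    return float(np.corrcoef(spar_s, spar_r)[0, 1])
```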
[0024]
The number kmax of the sound source that maximizes the correlation value computed in step S23 is then determined (step S24); kmax is the number of the target sound source, i.e. j = kmax. The sound source selection unit 7 selects, from the input signals s1(t) to sn(t), only the signal of source j = kmax, that is, the input signal sj(t), and outputs it (step S25).
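Steps S21 through S25 can then be strung together as below; `signals`, a list of the n input signals, is an illustrative name. Following claim 1, the source with the largest absolute correlation is selected:

```python
def select_source(signals, spar_r):
    """Steps S21-S25: transform each source, compute its average
    sparsity, correlate with the stored target profile, and return
    the index and signal of the best-matching source."""
    corrs = [sparsity_correlation(average_sparsity(stft_frames(s)), spar_r)
             for s in signals]            # steps S21-S23
    kmax = int(np.argmax(np.abs(corrs)))  # step S24: j = kmax
    return kmax, signals[kmax]            # step S25: output sj(t)
```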
[0025]
According to the present embodiment, the target sound source's input signal sj(t) is selected from among the input signals s1(t) to sn(t) of the plural sound sources based on its similarity to the pre-estimated parameter representing the target sound source's average sparsity, so the processing time can be shortened and the memory required by the storage device 3 can be reduced.
[0026]
[Effects of the Invention]
As described above, according to the present invention, the correlation between the first average sparsity and the second average sparsity is calculated, the sound source for which the absolute value of that correlation is maximum is found, and that sound source is selected as the target sound source, making it possible to keep the erroneous selection rate low and to shorten the processing time.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a sound source selection device according to the present invention.
FIG. 2 is a flowchart showing a procedure for learning and storing a target signal.
FIG. 3 is a flowchart showing a procedure of a sound source selection stage.
[Explanation of symbols]
1 Target sound source feature extraction unit (target sound source information calculating means)
2 Target sound source parameter estimation unit (target sound source information calculating means)
3 Storage device (storage means)
4 Per-source feature extraction unit (plural sound source information calculating means)
5 Per-source parameter estimation unit (plural sound source information calculating means)
6 Similarity calculation unit (calculating means)
7 Sound source selection unit (selection output means)

Claims (4)

1. A sound source selection device comprising:
target sound source information calculating means that takes in a sound source signal output from a target sound source, performs arithmetic processing on it, and calculates a first average sparsity representing an acoustic feature of the sound source signal;
storage means for storing the first average sparsity;
plural sound source information calculating means that, when a target sound source is to be selected from among a plurality of sound sources including the target sound source, takes in the sound source signals output from the plurality of sound sources, performs arithmetic processing on them, and calculates a second average sparsity representing an acoustic feature of each sound source signal;
calculating means for calculating a correlation between the first average sparsity stored in the storage means and the second average sparsity calculated by the plural sound source information calculating means; and
selection output means that finds the sound source for which the absolute value of the correlation value between the first average sparsity and the second average sparsity is maximum, selects that sound source as the target sound source, and outputs the sound source signal from the selected sound source.
2. The sound source selection device according to claim 1, wherein, when Sk(tc, f), the short-time Fourier transform of the sound source signal sk(t) from the k-th of the plurality of sound sources computed by fast Fourier transform, is expressed as
Sk(tc, f) = FFT(w(t′) sk(tc + t′)),
the target sound source information calculating means and the plural sound source information calculating means calculate the first average sparsity and the second average sparsity, respectively, using the following equation:
Figure 0004378098
where w(t) is the window function, tc is the time coordinate of the window function, t′ is the time coordinate inside the window, f is the frequency of the sound source signal, and Spar(Sk(f)) is the first or second average sparsity.
3. A sound source selection method comprising:
taking in a sound source signal output from a target sound source, performing arithmetic processing on it, calculating a first average sparsity representing an acoustic feature of the sound source signal, and storing the first average sparsity in a storage unit;
when a target sound source is to be selected from among a plurality of sound sources including the target sound source, taking in the sound source signals output from the plurality of sound sources, performing arithmetic processing on them, and calculating a second average sparsity representing an acoustic feature of each sound source signal; and
calculating a correlation value between the calculated second average sparsity and the first average sparsity stored in the storage unit, finding the sound source for which the absolute value of the correlation value is maximum, selecting that sound source as the target sound source, and outputting the sound source signal from the selected sound source.
4. The sound source selection method according to claim 3, wherein, when Sk(tc, f), the short-time Fourier transform of the sound source signal sk(t) from the k-th of the plurality of sound sources computed by fast Fourier transform, is expressed as
Sk(tc, f) = FFT(w(t′) sk(tc + t′)),
the first average sparsity and the second average sparsity are each calculated using the following equation:
Figure 0004378098
where w(t) is the window function, tc is the time coordinate of the window function, t′ is the time coordinate inside the window, f is the frequency of the sound source signal, and Spar(Sk(f)) is the first or second average sparsity.
JP2003081858A 2003-03-25 2003-03-25 Sound source selection apparatus and method Expired - Fee Related JP4378098B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003081858A JP4378098B2 (en) 2003-03-25 2003-03-25 Sound source selection apparatus and method


Publications (2)

Publication Number Publication Date
JP2004287311A JP2004287311A (en) 2004-10-14
JP4378098B2 true JP4378098B2 (en) 2009-12-02

Family

ID=33295284


Country Status (1)

Country Link
JP (1) JP4378098B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023127058A1 (en) * 2021-12-27 2023-07-06 日本電信電話株式会社 Signal filtering device, signal filtering method, and program

Also Published As

Publication number Publication date
JP2004287311A (en) 2004-10-14

Similar Documents

Publication Publication Date Title
US20210089967A1 (en) Data training in multi-sensor setups
KR101153093B1 (en) Method and apparatus for multi-sensory speech enhamethod and apparatus for multi-sensory speech enhancement ncement
US20150228277A1 (en) Voiced Sound Pattern Detection
JP4797342B2 (en) Method and apparatus for automatically recognizing audio data
EP1995723A1 (en) Neuroevolution training system
WO2022012195A1 (en) Audio signal processing method and related apparatus
KR20060082465A (en) Method and apparatus for classifying voice and non-voice using sound model
US9058384B2 (en) System and method for identification of highly-variable vocalizations
US11830521B2 (en) Voice activity detection method and system based on joint deep neural network
US20160027421A1 (en) Audio signal analysis
US9767846B2 (en) Systems and methods for analyzing audio characteristics and generating a uniform soundtrack from multiple sources
Ismail et al. MFCC-VQ approach for qalqalah tajweed rule checking
US9570060B2 (en) Techniques of audio feature extraction and related processing apparatus, method, and program
US10522160B2 (en) Methods and apparatus to identify a source of speech captured at a wearable electronic device
CN112052880A (en) Underwater sound target identification method based on weight updating support vector machine
JP4378098B2 (en) Sound source selection apparatus and method
JP2004287010A (en) Method and device for wavelength recognition, and program
CN107919136B (en) Digital voice sampling frequency estimation method based on Gaussian mixture model
US11322169B2 (en) Target sound enhancement device, noise estimation parameter learning device, target sound enhancement method, noise estimation parameter learning method, and program
CN115273904A (en) Angry emotion recognition method and device based on multi-feature fusion
JP4127511B2 (en) Sound source selection method and sound source selection device
US11004463B2 (en) Speech processing method, apparatus, and non-transitory computer-readable storage medium for storing a computer program for pitch frequency detection based upon a learned value
CN113593604A (en) Method, device and storage medium for detecting audio quality
CN112735477A (en) Voice emotion analysis method and device
CN112562647A (en) Method and device for marking audio starting point

Legal Events

Date        Code  Title
2006-02-08  A621  Written request for application examination (JAPANESE INTERMEDIATE CODE: A621)
2009-02-03  A977  Report on retrieval (JAPANESE INTERMEDIATE CODE: A971007)
2009-02-10  A131  Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A132)
2009-04-08  A521  Request for written amendment filed (JAPANESE INTERMEDIATE CODE: A523)
(no date)   TRDD  Decision of grant or rejection written
2009-08-18  A01   Written decision to grant a patent or to grant a registration (utility model) (JAPANESE INTERMEDIATE CODE: A01)
2009-09-14  A61   First payment of annual fees (during grant procedure) (JAPANESE INTERMEDIATE CODE: A61)
(no date)   R150  Certificate of patent or registration of utility model (JAPANESE INTERMEDIATE CODE: R150)
(no date)   FPAY  Renewal fee payment (payment until 2012-09-18; year of fee payment: 3)
(no date)   FPAY  Renewal fee payment (payment until 2013-09-18; year of fee payment: 4)
(no date)   R250  Receipt of annual fees (JAPANESE INTERMEDIATE CODE: R250; recorded five times)
(no date)   LAPS  Cancellation because of no payment of annual fees