EP1605440A1 - Verfahren zur Quellentrennung eines Signalgemisches (Method for source separation of a signal mixture) - Google Patents
Verfahren zur Quellentrennung eines Signalgemisches (Method for source separation of a signal mixture)
- Publication number
- EP1605440A1 (application EP05291254A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- separation
- sources
- covariance
- source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
Definitions
- The present invention relates to a method for determining the signals respectively relating to sound sources from a signal resulting from the mixture of these signals.
- The field of the present invention is that of the digital processing of signals relating to sound sources, also simply called sound or audio signals.
- the processing performed on the sound signals is not in the time domain but in the frequency domain.
- A short-term Fourier transform is a linear transform that associates with a sampled time-domain signal {x(t_1), ..., x(t_N)} a two-dimensional time-frequency signal, denoted here x(t_k, f), where t_k is a frame index of the sampled digital signal and f is a generally discrete frequency index.
- The signal x(t_k, f) is therefore a frequency-domain signal and takes the form of frames indexed by t_k.
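- As a minimal illustration of this transform (not part of the patent text), the following Python sketch computes a time-frequency signal x(t_k, f) from a sampled signal using scipy; the sampling rate, frame length and overlap are arbitrary assumptions.

```python
# Sketch only: obtaining the time-frequency signal x(t_k, f) by a
# short-term Fourier transform. All parameter values are illustrative
# assumptions, not values taken from the patent.
import numpy as np
from scipy.signal import stft

fs = 16000                                  # assumed sampling rate (Hz)
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 440 * t)             # toy time-domain mixture x(t)

# freqs: frequency indices f, frames: frame instants t_k, X[f, k] = x(t_k, f)
freqs, frames, X = stft(x, fs=fs, nperseg=1024, noverlap=512)
print(X.shape)                              # (number of frequencies, number of frames)
```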
- s_i(t_k, f) follows a centered Gaussian law of variance σ_i²(f).
- Each component of the vector Ŝ_W(t_k, f) can be obtained by the following relation: ŝ_{i,W}(t_k, f) = e_i(f) · x(t_k, f), where e_i(f) = σ_i²(f) / (σ_1²(f) + ... + σ_N²(f)) is the energy fraction of the source i contained a priori in the mixture signal at the frequency index f, N is the total number of sources and x(t_k, f) is the mixture signal.
- The two sound sources have been analysed beforehand and their respective characteristic spectral shapes σ_1²(f) and σ_2²(f) determined; these represent, as is known, their energy distributions as a function of frequency. If the frequency-domain signals relating to these two sources, s_1(t, f) and s_2(t, f), are considered to be non-stationary Gaussian random variables, then σ_1²(f) and σ_2²(f) represent their respective variances.
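- A minimal sketch (assumption-laden, not taken from the patent) of this basic Wiener time-frequency filter: given the characteristic spectral shapes σ_i²(f) of the sources, the gain applied to the mixture is the energy fraction e_i(f) = σ_i²(f) / Σ_j σ_j²(f); array names and shapes are illustrative.

```python
# Sketch: basic (non-adaptive) Wiener time-frequency filtering of a mixture.
# sigma2 has shape (N_sources, N_freq) and holds the characteristic spectral
# shapes; X has shape (N_freq, N_frames) and holds the mixture STFT x(t_k, f).
import numpy as np

def wiener_separate(X, sigma2, eps=1e-12):
    # e[i, f] = sigma2[i, f] / sum_j sigma2[j, f]: energy fraction of source i.
    e = sigma2 / (sigma2.sum(axis=0, keepdims=True) + eps)
    # The same gain is applied to every frame t_k: the filter is not adaptive,
    # which is precisely the drawback discussed in the next paragraph.
    return e[:, :, None] * X[None, :, :]    # shape (N_sources, N_freq, N_frames)
```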
- The Wiener filter has the following main disadvantages. It operates identically on all the frames of the mixture sound signal, and therefore does not take into account changes in the sound energy content from one frame to another; in short, it is not an adaptive filter. Another disadvantage is that it takes into account only one characteristic spectral shape per sound source, even though sound sources exhibit a great spectral variety in terms of timbre, pitch, intensity, etc.
- The sound signal of each source s_i(t) is characterized by a set of K_i spectral shapes σ_{k_i}²(f), k_i ∈ {1, ..., K_i}.
- For N sources, their mixture is characterized by a set of K_1 × K_2 × ... × K_N N-tuples of characteristic spectral shapes (σ_{k_1}²(f), ..., σ_{k_N}²(f)).
- the method consists in first choosing the N-tuple of spectral shapes that best corresponds to the sound signal of the mixture.
- It can consist in maximizing the probability of correspondence between the spectrogram of the mixture and the N-tuple of spectral shapes under consideration.
- The method then consists of filtering the mixture by conventional Wiener filtering using the N-tuple of spectral shapes thus selected. This method is adaptive, since the choice of the filter parameters depends on the frame index t_k considered.
- The main disadvantage of this method lies in its algorithmic complexity. Indeed, if K characteristic spectral shapes per source and N sources are considered in the mixture, K^N N-tuples of characteristic spectral shapes must be tested for each frame, so that the complexity is O(K^N × T), where T is the number of frames of the mixture signal to be analysed. This complexity can make the method unacceptable, especially when the number of characteristic spectral shapes per source is relatively large.
- Here too, the sound signal of each source s_i(t) is characterized by a set of K_i characteristic spectral shapes σ_{k_i}²(f), but these are now grouped into a dictionary of spectral shapes.
- The spectrogram of the mixture, |x(t_k, f)|², is decomposed on the union of the dictionaries present, and it is thus possible to write |x(t_k, f)|² ≈ Σ_i Σ_{k_i} a_{k_i}(t_k) · σ_{k_i}²(f), where the coefficients a_{k_i}(t_k), called "amplitude factors", are the unknowns to be solved for.
- The above equation can be rewritten in terms of the quantities e_i(t_k, f) = (Σ_{k_i} a_{k_i}(t_k) · σ_{k_i}²(f)) / (Σ_j Σ_{k_j} a_{k_j}(t_k) · σ_{k_j}²(f)), where e_i(t_k, f) represents the fraction of energy of the source i contained in the mixture to be analysed.
- A first method for estimating the sound signals of sources 1 to N is to implement Wiener time-frequency filtering, which is nevertheless adaptive since it depends on the frame index t_k.
- This filter is called a generalized Wiener filter. For the source i, the estimate is ŝ_{i,W}(t_k, f) = e_i(t_k, f) · x(t_k, f).
- This second method, through the use of a dictionary of characteristic spectral shapes, has the advantage over the previous method of reducing the algorithmic complexity. Indeed, for N sources each having K spectral shapes, the algorithmic complexity is O(N × K × T), where T is the number of frames to be analysed, and is therefore lower than that of the previous method, which was O(K^N × T).
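- As an illustration (an assumption-based sketch, not the patent's own code), the generalized Wiener filter above can be written as follows, assuming the amplitude factors a_{k_i}(t_k) have already been estimated from the decomposition of the spectrogram on the union of the dictionaries. With, say, N = 3 sources and K = 8 shapes each, the dictionary approach handles N × K = 24 shapes per frame instead of K^N = 512 N-tuples.

```python
# Sketch: generalized (adaptive) Wiener filtering with dictionaries of
# characteristic spectral shapes. For source i the gain is the adaptive
# energy fraction e_i(t_k, f), and the estimate is e_i(t_k, f) * x(t_k, f).
import numpy as np

def generalized_wiener(X, dictionaries, amplitudes, eps=1e-12):
    """X: (N_freq, N_frames) mixture STFT.
    dictionaries: list of N arrays, each (K_i, N_freq), the shapes sigma2_{k_i}(f).
    amplitudes:   list of N arrays, each (K_i, N_frames), the factors a_{k_i}(t_k)."""
    # Per-source energy at (f, t_k): sum over its K_i elementary sources.
    energies = [d.T @ a for d, a in zip(dictionaries, amplitudes)]  # each (N_freq, N_frames)
    total = sum(energies) + eps
    # Adaptive gains e_i(t_k, f) and per-source estimates.
    return [(E / total) * X for E in energies]
```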
- The human auditory system is indeed very sensitive to phase coherence in audio signals, in particular inter-frame coherence at fixed f (coherent phase between s(t_{k+1}, f) and s(t_k, f)) and phase coherence within the same frame but across different values of the frequency f (phase of s(t_k, f) for different values of f).
- Phase coherence effects are particularly noticeable on harmonic sounds, such as the sounds of a musical instrument or voiced sounds, whereas they are less important for white or pink noise, or for the sounds of percussion instruments.
- The purpose of the present invention is to propose a method of separating signals relating to sound sources from a signal derived from a mixture of these signals that does not present the phase inconsistencies of the methods cited above.
- This method also applies to non-sound signals, such as any digital signal obtained by sampling the output of a transducer that converts a physical quantity into an electrical signal.
- said step of determining the separation signal consists in summing the estimated signal and the predicted signal in a weighted manner, said weighting coefficient being determined so as to minimize the covariance of the separation signal.
- the estimation signal is weighted by a first matrix coefficient while the predicted signal is weighted by a second matrix coefficient equal to the unit matrix minus the first matrix coefficient, said first matrix coefficient being determined so as to minimize the covariance of the separation signal.
- the present invention provides connecting means between adjacent frames.
- Each elementary sound source is determined recursively and iteratively.
- FIG. 1 shows a system for separating sound signals from sound sources according to an embodiment of the present invention, which comprises these connecting means between adjacent frames.
- This system essentially consists of an estimation unit 10 which, on the basis of a frequency-domain mixture signal denoted x(t_k, f), obtained for example by a short-term Fourier transform of the signal x(t) in the sampled time domain, delivers an estimation signal represented by the random variable S_e(t_k, f), each component s_{e,i}(t_k, f) of which is the estimation signal for the source of index i of the mixture.
- The estimation signal is represented by a vector each component of which relates to one source: S_e(t_k, f) = [s_{e,1}(t_k, f), ..., s_{e,N}(t_k, f)].
- The estimation unit 10 is such that the expectation of the signal at its output is conditioned by the signals x(t_k, f) that are actually observed:
- S_e(t_k, f) = E[S(t_k, f) | x(t_k, f)]
- The estimation unit 10 is, for example, a Wiener filter (see the different forms of this type of filter given in the preamble of the present description), a unit operating by a time-frequency thresholding method, or by the so-called Ephraim and Malah method, etc.
- Each component of the vector S_e(t_k, f) can be obtained by the following relation: s_{e,i}(t_k, f) = e_i(t_k, f) · x(t_k, f), where e_i(t_k, f) is the energy fraction of the source i contained in the mixture signal, in the frame of index t_k and at the frequency of index f, N is the total number of sources and x(t_k, f) is the mixture signal.
- K_i represents the number of elementary sources considered for the source i.
- a_{k_i}(t_k) represents the amplitude factor of the elementary source of index k_i, and σ_{k_i}²(f) the variance of this elementary source of index k_i.
- The system for separating sound signals from sound sources shown in FIG. 1 further comprises an updating unit 20 and a prediction unit 30. It is these units 20 and 30 that constitute the inter-frame connecting means mentioned above.
- The prediction unit 30 is provided to deliver a prediction signal, considered as a corresponding random variable S_p(t_k, f).
- The prediction signal is a vector each component of which relates to one source: S_p(t_k, f) = [s_{p,1}(t_k, f), ..., s_{p,N}(t_k, f)].
- The updating unit 20, on the basis of the prediction signal S_p(t_k, f) delivered by the prediction unit 30 and the estimation signal S_e(t_k, f) delivered by the estimation unit 10, delivers in turn the separation signal, whose random variable is denoted S_tot(t_k, f).
- The separation signal is represented by a vector each component of which relates to one source: S_tot(t_k, f) = [s_{tot,1}(t_k, f), ..., s_{tot,N}(t_k, f)].
- the predicted signal for the present frame is based on the separation signal for the previous frame.
- The updating unit 20 is intended to determine the separation signal S_tot(t_k, f) by summing, in a weighted manner, the estimation signal S_e(t_k, f) and the predicted signal S_p(t_k, f).
- The estimation signal S_e(t_k, f) is weighted by a matrix coefficient Λ(t_k, f), while the predicted signal is weighted by the coefficient I − Λ(t_k, f), I being the unit matrix: S_tot(t_k, f) = Λ(t_k, f) · S_e(t_k, f) + (I − Λ(t_k, f)) · S_p(t_k, f).
- The separation system shown in FIG. 1 is provided for determining the optimum coefficient matrix Λ(t_k, f) that minimizes the variance of the estimate of the separation signal S_tot(t_k, f). It can be shown that this optimum value of the weighting matrix is given by the ratio of the covariance of the predicted signal Cov_p(t_k, f) to the sum of the covariance of the predicted signal Cov_p(t_k, f) and the covariance of the estimation signal Cov_e(t_k, f), that is: Λ(t_k, f) = Cov_p(t_k, f) · [Cov_p(t_k, f) + Cov_e(t_k, f)]^(-1).
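- A minimal numpy sketch of this update (an illustration under simplifying assumptions, not the patent's implementation): the covariances are assumed diagonal and stored as one value per source, so the matrix ratio reduces to element-wise division; the line giving the covariance of the result is a Kalman-style assumption not spelled out at this point of the text.

```python
# Sketch: covariance-weighted combination of the estimation signal S_e and
# the predicted signal S_p for one frame t_k and one frequency f, with
# diagonal covariances stored as vectors (one entry per source).
import numpy as np

def combine(S_e, Cov_e, S_p, Cov_p, eps=1e-12):
    # Lambda = Cov_p / (Cov_p + Cov_e): weight given to the estimation signal.
    Lam = Cov_p / (Cov_p + Cov_e + eps)
    S_tot = Lam * S_e + (1.0 - Lam) * S_p    # weighted sum of the two signals
    Cov_tot = (1.0 - Lam) * Cov_p            # assumed Kalman-style covariance of S_tot
    return S_tot, Cov_tot
```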
- In step E10, the covariance of the predicted signal, represented, it will be recalled, by the random variable S_p(t_{k+1}, f), is updated.
- The modulus of the function H(f) is indeed equal to 1.
- The variance of the prediction noise, var(b_p(t_k, f)), depends on the sources or sub-sources considered and on the frequency f. It does not depend on the frame considered, so it can also be written var(b_p(f)).
- Cov_tot(t_{k-1}, f) is a quantity that was calculated at the previous iteration (see step E30 below).
- In step E20, the optimal coefficient matrix Λ(t_k, f) is determined, using the expression given above.
- The covariance of the predicted separation signal, Cov_p(t_k, f), is given by the calculation performed in step E10.
- The covariance of the estimation signal, Cov_e(t_k, f), is determined by the characteristic spectral shapes σ_{k_i}²(f) and the amplitude factors a_{k_i}(t_k) of the sources or elementary sources considered.
- The estimation signal S_e(t_k, f) of the mixture of the set of elementary sources is a Gaussian random variable of variance Cov_e(t_k, f).
- In step E30, for the covariance calculations, the next frame is considered and the process resumes at step E10.
- The expectation of the separation signal, S_tot,0(t_k, f), is the output signal of the system. Its components are the separation signals of each of the sources or elementary sources considered.
- In step E60, the expectation of the separation signal of the frame t_k, S_tot,0(t_k, f), is shifted by one frame to obtain the expectation of the separation signal of the frame t_{k-1}, and this latter expectation is used in step E40.
- After steps E50 and E60, the following frame is considered and the process is repeated from step E40 for the steps relating to the calculation of expectations.
- Steps E10 and E40 are implemented by the prediction unit 30, while steps E20, E30 and E50 are implemented by the updating unit 20.
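- To make the ordering of steps E10 to E60 concrete, here is a schematic per-frame loop for a single frequency bin (an assumption-based sketch, not the patented implementation): the prediction model S_p(t_k, f) = H(f) · S_tot(t_{k-1}, f) with |H(f)| = 1, the Kalman-style covariance of the separation signal, and the mapping of individual code lines to steps E10 to E60 are assumptions consistent with, but not quoted from, the description above.

```python
# Schematic per-frame recursion over steps E10-E60, for one frequency bin f.
# Shapes, names, the prediction model and the posterior covariance update are
# illustrative assumptions consistent with the description above.
import numpy as np

def separate_bin(X_f, Cov_e_f, H_f, var_bp, eps=1e-12):
    """X_f: (N_frames,) complex mixture spectrum at frequency f.
    Cov_e_f: (N_frames, N_sources) covariance of the estimation signal per source.
    H_f: complex prediction factor with |H_f| = 1; var_bp: (N_sources,) prediction noise."""
    n_frames, n_src = Cov_e_f.shape
    S_tot = np.zeros((n_frames, n_src), dtype=complex)
    prev_S = np.zeros(n_src, dtype=complex)
    prev_Cov = np.full(n_src, 1e3)                    # vague prior for the first frame
    for k in range(n_frames):
        S_p = H_f * prev_S                            # E40: predicted signal from previous frame
        Cov_p = prev_Cov + var_bp                     # E10: covariance of the predicted signal
        e = Cov_e_f[k] / (Cov_e_f[k].sum() + eps)     # adaptive energy fractions
        S_e = e * X_f[k]                              # estimation signal (unit 10 output)
        Lam = Cov_p / (Cov_p + Cov_e_f[k] + eps)      # E20: optimal weighting (diagonal case)
        S_tot[k] = Lam * S_e + (1.0 - Lam) * S_p      # E50: separation signal
        Cov_tot = (1.0 - Lam) * Cov_p                 # E30: its covariance (assumed)
        prev_S, prev_Cov = S_tot[k], Cov_tot          # E60: shift by one frame
    return S_tot
```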
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0406365A FR2871593B1 (fr) | 2004-06-11 | 2004-06-11 | Procede de determination des signaux de separation respectivement relatifs a des sources sonores a partir d'un signal issu du melange de ces signaux |
FR0406365 | 2004-06-11 |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1605440A1 true EP1605440A1 (de) | 2005-12-14 |
EP1605440B1 EP1605440B1 (de) | 2010-11-24 |
Family
ID=34942399
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20050291254 Ceased EP1605440B1 (de) | 2004-06-11 | 2005-06-10 | Verfahren zur Quellentrennung eines Signalgemisches |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1605440B1 (de) |
DE (1) | DE602005024890D1 (de) |
FR (1) | FR2871593B1 (de) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11558699B2 (en) | 2020-03-11 | 2023-01-17 | Sonova Ag | Hearing device component, hearing device, computer-readable medium and method for processing an audio-signal for a hearing device |
-
2004
- 2004-06-11 FR FR0406365A patent/FR2871593B1/fr not_active Expired - Fee Related
-
2005
- 2005-06-10 DE DE200560024890 patent/DE602005024890D1/de active Active
- 2005-06-10 EP EP20050291254 patent/EP1605440B1/de not_active Ceased
Non-Patent Citations (4)
Title |
---|
BENAROYA L ET AL: "Non negative sparse representation for wiener based source separation with a single sensor", 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. PROCEEDINGS. (ICASSP). HONG KONG, APRIL 6 - 10, 2003, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), NEW YORK, NY : IEEE, US, vol. VOL. 1 OF 6, 6 April 2003 (2003-04-06), pages VI613 - VI616, XP010640826, ISBN: 0-7803-7663-3 * |
ELIE LAURENT BENNAROYA: "Séparation de plusieurs sources sonores avec un seul microphone", 26 June 2003, UNIVERSITE DE RENNES 1, RENNES, XP002346340, 2874 * |
MANDIC D P ET AL: "An on-line algorithm for blind source extraction based on nonlinear prediction approach", NEURAL NETWORKS FOR SIGNAL PROCESSING, 2003. NNSP'03. 2003 IEEE 13TH WORKSHOP ON TOULOUSE, FRANCE SEPT. 17-19, 2003, PISCATAWAY, NJ, USA,IEEE, 17 September 2003 (2003-09-17), pages 429 - 438, XP010712478, ISBN: 0-7803-8177-7 * |
STONE J V: "Blind source separation using temporal predictability", NEURAL COMPUTATION MIT PRESS USA, vol. 13, no. 7, 2001, pages 1559 - 1574, XP002303769, ISSN: 0899-7667 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112863537A (zh) * | 2021-01-04 | 2021-05-28 | 北京小米松果电子有限公司 | 一种音频信号处理方法、装置及存储介质 |
CN112863537B (zh) * | 2021-01-04 | 2024-06-04 | 北京小米松果电子有限公司 | 一种音频信号处理方法、装置及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
EP1605440B1 (de) | 2010-11-24 |
DE602005024890D1 (de) | 2011-01-05 |
FR2871593A1 (fr) | 2005-12-16 |
FR2871593B1 (fr) | 2007-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2419900B1 (de) | Verfahren und einrichtung zur objektiven evaluierung der sprachqualität eines sprachsignals unter berücksichtigung der klassifikation der in dem signal enthaltenen hintergrundgeräusche | |
EP2415047B1 (de) | Klassifizieren von in einem Tonsignal enthaltenem Hintergrundrauschen | |
EP1730729A1 (de) | Verbessertes sprachsignalumsetzungsverfahren und -system | |
FR2522179A1 (fr) | Procede et appareil de reconnaissance de paroles permettant de reconnaitre des phonemes particuliers du signal vocal quelle que soit la personne qui parle | |
EP0608174A1 (de) | System zur prädiktiven Kodierung/Dekodierung eines digitalen Sprachsignals mittels einer adaptiven Transformation mit eingebetteten Kodes | |
WO2005106853A1 (fr) | Procede et systeme de conversion rapides d'un signal vocal | |
EP0594480A1 (de) | Verfahren zur Erkennung von Sprachsignalen | |
EP3330964A1 (de) | Neuabtastung eines audiosignals für eine kodierung/dekodierung mit geringer verzögerung | |
EP1606792B1 (de) | Verfahren zur analyse der grundfrequenz, verfahren und vorrichtung zur sprachkonversion unter dessen verwendung | |
EP0511095A1 (de) | Verfahren und Vorrichtung zur Kodierung und Dekodierung eines numerischen Signals | |
EP0685833B1 (de) | Verfahren zur Sprachkodierung mittels linearer Prädiktion | |
FR2882458A1 (fr) | Procede de mesure de la gene due au bruit dans un signal audio | |
FR2702075A1 (fr) | Procédé de génération d'un filtre de pondération spectrale du bruit dans un codeur de la parole. | |
EP3040989A1 (de) | Verbessertes trennverfahren und computerprogrammprodukt | |
EP1605440B1 (de) | Verfahren zur Quellentrennung eines Signalgemisches | |
Emiya | Transcription automatique de la musique de piano | |
EP0714088B1 (de) | Sprachaktivitätsdetektion | |
FR2717294A1 (fr) | Procédé et dispositif de synthèse dynamique sonore musicale et vocale par distorsion non linéaire et modulation d'amplitude. | |
EP3155609B1 (de) | Frequenzanalyse mittels phasendemodulation von einem akustischen signal | |
EP0821345B1 (de) | Verfahren zur Bestimmung der Grundfrequenz in einem Sprachsignal | |
FR3051959A1 (fr) | Procede et dispositif pour estimer un signal dereverbere | |
EP1192619B1 (de) | Audio-kodierung, dekodierung zur interpolation | |
EP1194923B1 (de) | Verfahren und system für audio analyse und synthese | |
EP1192618B1 (de) | Audiokodierung mit adaptiver lifterung | |
WO2007068861A2 (fr) | Procede d'estimation de phase pour la modelisation sinusoidale d'un signal numerique |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR |
|
AX | Request for extension of the european patent |
Extension state: AL BA HR LV MK YU |
|
17P | Request for examination filed |
Effective date: 20060509 |
|
REG | Reference to a national code |
Ref country code: HK Ref legal event code: DE Ref document number: 1084228 Country of ref document: HK |
|
17Q | First examination report despatched |
Effective date: 20060718 |
|
AKX | Designation fees paid |
Designated state(s): DE ES GB IT PL |
|
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: AUDIONAMIX SA |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): DE ES GB IT PL |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D Free format text: NOT ENGLISH |
|
REF | Corresponds to: |
Ref document number: 602005024890 Country of ref document: DE Date of ref document: 20110105 Kind code of ref document: P |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110307 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101124 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed |
Effective date: 20110825 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602005024890 Country of ref document: DE Effective date: 20110825 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20101124 |
|
REG | Reference to a national code |
Ref country code: HK Ref legal event code: WD Ref document number: 1084228 Country of ref document: HK |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20180604 Year of fee payment: 14 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20190506 Year of fee payment: 15 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R119 Ref document number: 602005024890 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20200101 |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20200610 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20200610 |