WO2023148955A1 - Time window generation device, method, and program - Google Patents
Time window generation device, method, and program
- Publication number
- WO2023148955A1 (PCT/JP2022/004616)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Definitions
- the present invention relates to technology for processing sound signals such as speech.
- a method based on the short-time Fourier transform (STFT) is used to analyze acoustic signals in real time in terms of time and frequency.
- a time window is used to clip a segment of fixed length from the signal and treat it as a periodic signal.
- the time window has frequency resolution and dynamic range determined according to its shape.
- frequency resolution and dynamic range are in a trade-off relationship: if the ability to separate adjacent frequency components is improved, minute components may be overlooked.
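This trade-off can be illustrated numerically. The following is an illustrative sketch (not part of the publication; the tone frequency, frame length, and the `peak_sidelobe_db` helper are all chosen here for demonstration): a rectangular window has a narrow main lobe, giving good separation of adjacent components, but its high spectral leakage can mask minute components; a Hann window widens the main lobe but lowers the leakage floor, improving dynamic range.

```python
import numpy as np

# Illustrative sketch of the window trade-off (assumed values, not from the publication).
N = 512
n = np.arange(N)
tone = np.sin(2 * np.pi * 50.4 * n / N)  # a tone that falls between FFT bins

def peak_sidelobe_db(window):
    """Largest spectral level (dB) outside the tone's main-lobe region."""
    spec = np.abs(np.fft.rfft(tone * window))
    spec /= spec.max()
    mask = np.ones_like(spec, dtype=bool)
    mask[40:61] = False                  # exclude bins around the 50.4-bin tone
    return 20 * np.log10(spec[mask].max())

rect_leakage = peak_sidelobe_db(np.ones(N))      # narrow main lobe, high leakage
hann_leakage = peak_sidelobe_db(np.hanning(N))   # wider main lobe, low leakage
# hann_leakage lies far below rect_leakage: better dynamic range,
# at the cost of frequency resolution (a wider main lobe)
```

Running this shows the Hann window's leakage floor sitting tens of decibels below the rectangular window's, which is exactly why a component much weaker than its neighbor survives Hann analysis but drowns under rectangular analysis.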
- conventionally, a specific window function is fixed and used, or a plurality of window functions prepared in advance are switched between (see, for example, Non-Patent Document 1).
- An object of the present invention is to provide a time window generation device, method, and program for generating an appropriate time window.
- a time window generation device includes: a signal clipping unit that creates a clipped signal by clipping a sound signal to a predetermined length; an analysis window generation unit that generates an analysis window using the clipped signal and an analysis window model determined by analysis window model parameters; a synthesis window generation unit that generates a synthesis window using at least the clipped signal or using the analysis window; a frequency domain transformation unit that generates a frequency domain signal by transforming the clipped signal into the frequency domain using the analysis window; a signal processing unit that performs predetermined processing on the frequency domain signal to generate a post-processing frequency domain signal; a time domain transformation unit that generates a time domain signal by transforming the post-processing frequency domain signal into the time domain using the synthesis window; and a learning unit that learns at least the analysis window model parameters using the time domain signal and correct data corresponding to the time domain signal.
- An appropriate time window can be generated.
- FIG. 1 is a diagram showing an example of the functional configuration of a time window generation device.
- FIG. 2 is a diagram showing an example of the processing procedure of the time window generation method.
- FIG. 3 is a diagram illustrating a functional configuration example of a computer.
- the time window generation device includes, for example, a signal clipping unit 1, an analysis window generation unit 2, a synthesis window generation unit 3, a frequency domain transformation unit 4, a signal processing unit 5, a time domain transformation unit 6, and a learning unit 7.
- the time window generation method is realized, for example, by each component of the time window generation device performing the processing of steps S1 to S7, described below and shown in FIG. 2.
- the time window means at least one of the analysis window and the synthesis window.
- the signal clipping unit 1 generates a clipped signal by clipping the sound signal to a predetermined length (step S1).
- the generated clipped signal is output to the analysis window generation unit 2 and the synthesis window generation unit 3.
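As a rough sketch of this clipping step (the frame length, hop, and sampling rate below are assumptions; the publication only says "a predetermined length"), the sound signal can be cut into overlapping fixed-length frames:

```python
import numpy as np

# Minimal sketch of the signal clipping unit: cut the sound signal into
# overlapping frames of a predetermined length (values here are assumptions).
def clip_frames(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of length frame_len."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len]
                     for i in range(n_frames)])

sound = np.random.default_rng(0).standard_normal(16000)  # e.g. 1 s at 16 kHz
frames = clip_frames(sound, frame_len=512, hop=256)      # 50% overlap
# each row of `frames` is one clipped signal of the predetermined length
```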
- the clipped signal is input to the analysis window generation unit 2.
- analysis window model parameters generated by the learning unit 7, which will be described later, are also input to the analysis window generation unit 2.
- the analysis window generation unit 2 generates an analysis window using the clipped signal and the analysis window model determined by the analysis window model parameters (step S2).
- the generated analysis window is output to the frequency domain transformation unit 4.
- the analysis window model parameters are, for example, those generated by the learning unit 7, which will be described later. If no parameters generated by the learning unit 7 are present yet in the first processing of the analysis window generation unit 2, the analysis window generation unit 2 uses predetermined analysis window model parameters. In this case, the analysis window generation unit 2 may generate and output a predetermined analysis window.
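The publication fixes no concrete form for the analysis window model, so the following is a purely hypothetical parametrization: unconstrained model parameters are mapped through a sigmoid so that the generated window stays in (0, 1), and the "predetermined" parameters used before any learning are chosen to reproduce a Hann-like shape.

```python
import numpy as np

# Hypothetical analysis window model (an assumption; the publication does not
# specify an architecture): unconstrained parameters -> positive window.
def analysis_window(model_params):
    """Map unconstrained analysis window model parameters to a window."""
    return 1.0 / (1.0 + np.exp(-model_params))   # element-wise sigmoid

frame_len = 512
hann = np.hanning(frame_len)
eps = 1e-6
# "predetermined" parameters: the logit of a Hann window as a starting point
init_params = np.log(hann + eps) - np.log(1.0 - hann + eps)
window = analysis_window(init_params)
# `window` is strictly positive and lies within about 1e-3 of the Hann window
```

An unconstrained-to-constrained mapping like this is a common design choice because it lets a learning unit update the parameters freely by gradient descent while the resulting window remains a valid taper.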
- <Synthesis window generation unit 3> The clipped signal is input to the synthesis window generation unit 3.
- synthesis window model parameters generated by the learning unit 7, which will be described later, are also input to the synthesis window generation unit 3.
- the synthesis window generation unit 3 generates a synthesis window using the clipped signal and the synthesis window model determined by the synthesis window model parameters (step S3).
- the generated synthesis window is output to the time domain transformation unit 6.
- the synthesis window model parameters are, for example, those generated by the learning unit 7, which will be described later. If no parameters generated by the learning unit 7 are present yet in the first processing of the synthesis window generation unit 3, the synthesis window generation unit 3 uses predetermined synthesis window model parameters. In this case, the synthesis window generation unit 3 may generate and output a predetermined synthesis window.
- <Frequency domain transformation unit 4> The clipped signal and the analysis window are input to the frequency domain transformation unit 4.
- the frequency domain transformation unit 4 generates a frequency domain signal by transforming the clipped signal into the frequency domain using the analysis window (step S4).
- the generated frequency domain signal is output to the signal processing unit 5.
- the frequency domain transformation unit 4 performs the transformation into the frequency domain using a technique such as the short-time Fourier transform.
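A sketch of step S4, assuming the STFT-style transform the text suggests (the frame length and stand-in window below are assumptions): the clipped signal is multiplied by the analysis window and transformed with a real-input DFT to obtain the frequency domain signal.

```python
import numpy as np

# Sketch of the frequency domain transformation: window, then DFT.
frame_len = 512
rng = np.random.default_rng(1)
clipped_signal = rng.standard_normal(frame_len)
analysis_win = np.hanning(frame_len)     # stand-in for the generated window

freq_domain_signal = np.fft.rfft(clipped_signal * analysis_win)
# one complex value per frequency bin: frame_len // 2 + 1 bins in total
```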
- a frequency domain signal is input to the signal processing unit 5 .
- the signal processing unit 5 performs predetermined processing on the frequency domain signal to generate a post-processing frequency domain signal (step S5).
- the generated post-processing frequency domain signal is output to the time domain transformation unit 6.
- the predetermined processing is, for example, at least one of processing for enhancing a predetermined signal such as speech (in other words, speech signal enhancement processing, i.e., processing for suppressing noise) and classification processing for classifying noise and the like.
- the signal processing unit 5 estimates a speech enhancement filter using, for example, a frequency domain signal. Then, the signal processing unit 5 multiplies the estimated speech enhancement filter and the frequency domain signal to generate a frequency domain signal with enhanced speech.
- This frequency domain signal is an example of a post-processing frequency domain signal.
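A minimal, hypothetical version of this enhancement step (the Wiener-style spectral-subtraction mask and the noise power value are assumptions; the publication only says a speech enhancement filter is estimated and multiplied with the frequency domain signal, e.g. by a model such as a deep network):

```python
import numpy as np

# Hypothetical predetermined processing: estimate a magnitude mask in [0, 1)
# from the frequency domain signal and multiply it in, bin by bin.
rng = np.random.default_rng(2)
freq_domain_signal = rng.standard_normal(257) + 1j * rng.standard_normal(257)

noise_power = 0.5                        # assumed noise power estimate
power = np.abs(freq_domain_signal) ** 2
enhancement_filter = np.maximum(power - noise_power, 0.0) / power

post_freq_signal = enhancement_filter * freq_domain_signal
# enhanced magnitudes never exceed the input magnitudes
```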
- the predetermined processing may include noise classification processing.
- the signal processing unit 5 uses the frequency domain signal to estimate noise included in the frequency domain signal.
- a noise label, which is the estimation result, is output to the learning unit 7.
- <Time domain transformation unit 6> The post-processing frequency domain signal and the synthesis window are input to the time domain transformation unit 6.
- the time domain transformation unit 6 generates a time domain signal by transforming the post-processing frequency domain signal into the time domain using the synthesis window (step S6).
- the generated time domain signal is output to the learning unit 7.
- the generated time domain signal may also be output from the time window generation device as the result of the predetermined processing of the signal processing unit 5.
- the time domain transformation unit 6 performs the transformation into the time domain using a technique such as the inverse short-time Fourier transform.
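A sketch of step S6 under common STFT assumptions (the sqrt-Hann window pair and 50% overlap are choices made here, not prescribed by the publication): each frame is inverse-transformed, multiplied by the synthesis window, and overlap-added. With a periodic sqrt-Hann analysis/synthesis pair, an unmodified signal is reconstructed exactly away from the edges, the usual sanity check for an analysis-synthesis pipeline.

```python
import numpy as np

# Inverse STFT sketch: inverse DFT, synthesis window, overlap-add (OLA).
frame_len, hop = 512, 256
n = np.arange(frame_len)
win = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len))  # periodic sqrt-Hann

rng = np.random.default_rng(3)
x = rng.standard_normal(4096)

n_frames = 1 + (len(x) - frame_len) // hop
out = np.zeros(len(x))
for i in range(n_frames):
    s = i * hop
    spec = np.fft.rfft(x[s:s + frame_len] * win)      # analysis window + DFT
    out[s:s + frame_len] += np.fft.irfft(spec) * win  # synthesis window + OLA

recon_err = np.max(np.abs(out[frame_len:-frame_len] - x[frame_len:-frame_len]))
# recon_err is at floating-point precision for interior samples
```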
- a time domain signal is input to the learning unit 7.
- correct data corresponding to the time domain signal is also input to the learning unit 7.
- the learning unit 7 uses the time domain signal and the correct data corresponding to the time domain signal to learn the analysis window model parameters and the synthesis window model parameters (step S7).
- the analysis window model parameters and the synthesis window model parameters are learned by, for example, the gradient descent method.
- when the predetermined processing in the signal processing unit 5 is speech signal enhancement processing, the original sound signal without noise serves as the correct data.
- in this case, a gradient method or the like is used to learn the analysis window model parameters and the synthesis window model parameters so that the mean squared error obtained by averaging the squared differences between the time domain signal and the original sound signal becomes small.
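The learning step can be illustrated with a toy example (the one-parameter `hann**p` window family, the fixed band-pass mask standing in for the predetermined processing, the windowed-domain comparison, and all numeric values are assumptions made here, not taken from the publication): the window parameter is updated by gradient descent, using a numerical gradient, so that the squared error to the clean signal shrinks.

```python
import numpy as np

# Toy end-to-end learning of a window parameter by gradient descent.
rng = np.random.default_rng(4)
N = 512
n = np.arange(N)
clean = np.sin(2 * np.pi * 32 * n / N)           # correct data (clean tone)
noisy = clean + 0.3 * rng.standard_normal(N)     # observed sound signal

mask = np.zeros(N // 2 + 1)
mask[30:35] = 1.0                                # fixed "predetermined processing"

def loss(p):
    """Mean squared error after windowing, masking, and inverse transform."""
    w = np.hanning(N) ** p + 1e-8                # window model: one parameter p
    y = np.fft.irfft(mask * np.fft.rfft(noisy * w))
    return np.mean((y - clean * w) ** 2)         # compare in the windowed domain

p, lr, eps = 2.0, 0.5, 1e-4
loss_init = loss(p)
for _ in range(50):                              # plain gradient descent
    grad = (loss(p + eps) - loss(p - eps)) / (2 * eps)
    p -= lr * grad
loss_final = loss(p)
# the loss does not increase over the descent
```

The point mirrors the publication's idea: because the loss is computed on the time domain signal after the signal processing step, the window parameter is optimized with that subsequent processing taken into account.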
- when the predetermined processing includes noise classification processing, the true noise label is input to the learning unit 7 as correct data.
- the noise label estimated by the predetermined processing in the signal processing unit 5 is also input to the learning unit 7.
- in this case, the learning unit 7 uses the true noise label and the estimated noise label, in addition to the time domain signal and the correct data corresponding to the time domain signal, to learn the analysis window model parameters and the synthesis window model parameters.
- step S1 to step S7 described so far may be repeated as appropriate.
- the learning unit 7 learns the analysis window model parameters and the synthesis window model parameters based on the time domain signal, that is, the signal that has undergone the predetermined processing in the signal processing unit 5 and has been transformed back into the time domain. It can therefore be said that the analysis window model parameters and the synthesis window model parameters are learned in consideration of the predetermined processing of the signal processing unit 5, which follows the analysis window generation unit 2 and the synthesis window generation unit 3. By learning the window parameters with this subsequent processing taken into account, a time window can be generated more appropriately than before.
- data exchange between components of the time window generation device may be performed directly or may be performed via a storage unit (not shown).
- in some cases, the synthesis window can be generated from the analysis window. Therefore, the synthesis window generation unit 3 may generate the synthesis window using the analysis window generated by the analysis window generation unit 2. That is, the synthesis window generation unit 3 may generate the synthesis window using at least the clipped signal or the analysis window.
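One standard construction for deriving the synthesis window from the analysis window (an assumption here; the publication only notes that such derivation is possible) is the least-squares dual window: the analysis window is divided by the overlap-added sum of its squared shifts, so that windowed analysis followed by windowed overlap-add synthesis is the identity on interior samples.

```python
import numpy as np

# Assumed construction: least-squares dual of the analysis window.
def dual_synthesis_window(analysis, hop):
    """Synthesis window g with sum_m a[n - m*hop] * g[n - m*hop] = 1."""
    N = len(analysis)
    denom = np.zeros(N)
    for m in range(-(N // hop) + 1, N // hop):   # all overlapping shifts
        idx = np.arange(N) - m * hop
        valid = (idx >= 0) & (idx < N)
        denom[valid] += analysis[idx[valid]] ** 2
    return analysis / denom

N, hop = 256, 64
a = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann analysis
g = dual_synthesis_window(a, hop)

# check: the overlap-added product a * g sums to one over interior samples
total = np.zeros(N + 8 * hop)
for i in range(9):
    total[i * hop:i * hop + N] += a * g
# total[N:-N] is (numerically) all ones
```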
- in that case, the learning unit 7 does not need to learn the synthesis window model parameters. That is, the learning unit 7 may learn at least the analysis window model parameters using the time domain signal and the correct data corresponding to the time domain signal.
- the predetermined processing in the signal processing unit 5 may be implemented by deep learning.
- the predetermined processing in the signal processing unit 5 may be processing for generating the post-processing frequency domain signal using the frequency domain signal and a model determined by the model parameters.
- the learning unit 7 may further learn the model parameters by using at least the time domain signal and the correct data corresponding to the time domain signal.
- the predetermined processing in the signal processing unit 5 may include processing for estimating a noise label using a frequency domain signal and a model determined by noise estimation model parameters.
- in that case, the learning unit 7 may further learn the noise estimation model parameters using the true noise label input to the learning unit 7 and the noise label estimated by the signal processing unit 5, in addition to the time domain signal and the correct data corresponding to the time domain signal.
- a program describing this processing can be recorded on a computer-readable recording medium.
- the computer-readable recording medium is, for example, a non-transitory recording medium, specifically a magnetic recording device, an optical disc, or the like.
- this program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded.
- the program may also be distributed by storing it in the storage device of a server computer and transferring it from the server computer to other computers via a network.
- a computer that executes such a program, for example, first stores the program recorded on the portable recording medium, or the program transferred from the server computer, in its own non-transitory auxiliary recording unit 1050. When executing the processing, the computer reads the program stored in its auxiliary recording unit 1050 into the storage unit 1020 and executes the processing according to the read program. Alternatively, the computer may read the program directly from the portable recording medium into the storage unit 1020 and execute the processing according to the program, or it may execute the processing according to the received program each time the program is transferred from the server computer to this computer.
- the above processing may also be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to this computer.
- the program in this embodiment includes information that is used for processing by a computer and that is equivalent to a program (such as data that is not a direct instruction to the computer but has the property of prescribing the processing of the computer).
- the device is configured by executing a predetermined program on a computer, but at least part of these processing contents may be implemented by hardware.
- the signal clipping unit 1, the analysis window generating unit 2, the synthesis window generating unit 3, the frequency domain transforming unit 4, the signal processing unit 5, the time domain transforming unit 6, and the learning unit 7 may be configured by processing circuits.
Abstract
The invention relates to a time window generation device comprising: a signal clipping unit 1 for generating a clipped signal by clipping a sound signal to a predetermined length; an analysis window generation unit 2 for generating an analysis window using the clipped signal and an analysis window model determined by analysis window model parameters; a synthesis window generation unit 3 for generating a synthesis window using at least the clipped signal or using the analysis window; a frequency domain transformation unit 4 for generating a frequency domain signal by transforming the clipped signal into the frequency domain using the analysis window; a signal processing unit 5 for generating a post-processing frequency domain signal by performing predetermined processing on the frequency domain signal; a time domain transformation unit 6 for generating a time domain signal by transforming the post-processing frequency domain signal into the time domain using the synthesis window; and a learning unit 7 for learning at least the analysis window model parameters using the time domain signal and correct data corresponding to the time domain signal.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023578328A JPWO2023148955A1 (fr) | 2022-02-07 | 2022-02-07 | |
PCT/JP2022/004616 WO2023148955A1 (fr) | 2022-02-07 | 2022-02-07 | Time window generation device, method, and program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2022/004616 WO2023148955A1 (fr) | 2022-02-07 | 2022-02-07 | Time window generation device, method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023148955A1 true WO2023148955A1 (fr) | 2023-08-10 |
Family
ID=87551996
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/004616 WO2023148955A1 (fr) | 2022-02-07 | 2022-02-07 | Time window generation device, method, and program |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2023148955A1 (fr) |
WO (1) | WO2023148955A1 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH103297A (ja) * | 1996-06-14 | 1998-01-06 | Oki Electric Ind Co Ltd | Background noise canceling device |
JP2015049354A (ja) * | 2013-08-30 | 2015-03-16 | Fujitsu Ltd | Speech processing device, speech processing method, and computer program for speech processing |
JP2016537891A (ja) * | 2013-11-28 | 2016-12-01 | Widex A/S | Method of operating a hearing aid system and a hearing aid system |
JP2017102488A (ja) * | 2012-05-04 | 2017-06-08 | Kaonyx Labs LLC | System and method for source signal separation |
JP2020030373A (ja) * | 2018-08-24 | 2020-02-27 | Nippon Telegraph and Telephone Corp | Sound source enhancement device, sound source enhancement learning device, sound source enhancement method, and program |
- 2022
- 2022-02-07 JP JP2023578328A patent/JPWO2023148955A1/ja active Pending
- 2022-02-07 WO PCT/JP2022/004616 patent/WO2023148955A1/fr active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JPWO2023148955A1 (fr) | 2023-08-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22924865; Country of ref document: EP; Kind code of ref document: A1
WWE | Wipo information: entry into national phase |
Ref document number: 2023578328; Country of ref document: JP
NENP | Non-entry into the national phase |
Ref country code: DE |