WO2021012403A1 - Dual sensor speech enhancement method and implementation device - Google Patents
Dual sensor speech enhancement method and implementation device
- Publication number: WO2021012403A1
- Application number: PCT/CN2019/110290 (CN2019110290W)
- Authority: WIPO (PCT)
- Prior art keywords: speech, air conduction, dual, channel
Classifications
- G10L21/0216 — Noise filtering characterised by the method used for estimating noise
- G10L21/0264 — Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
- G10L25/60 — Speech or voice analysis techniques specially adapted for measuring the quality of voice signals
- G10L2021/02165 — Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
- G10L25/06 — Speech or voice analysis techniques characterised by the extracted parameters being correlation coefficients
- G10L25/21 — Speech or voice analysis techniques characterised by the extracted parameters being power information
- G10L25/24 — Speech or voice analysis techniques characterised by the extracted parameters being the cepstrum
Definitions
- the invention relates to the technical field of speech signal processing, in particular to a dual-sensor speech enhancement method and implementation device based on dual-channel Wiener filtering.
- Speech enhancement technology is an important branch of speech signal processing. The purpose is to extract as much pure original speech as possible from noisy speech, and it is widely used in speech communication, speech compression coding and speech recognition in noisy environments.
- air-conducted speech, i.e. speech collected with air conduction sensors (such as microphones), is strongly affected by the various acoustic noises in the environment, so the enhancement effect suffers and performance is usually poor in noisy surroundings.
- non-air-conducted speech sensors, such as throat microphones and bone conduction microphones, are therefore often used for voice collection in noisy environments.
- a non-air-conducted speech sensor uses the vibration of the speaker's vocal cords, jawbone and other parts to drive the reed or carbon film inside the sensor, changing its resistance and hence the voltage across its terminals; in this way the vibration signal is converted into an electrical signal, i.e. a speech signal. Since sound waves conducted through the air cannot deform the reed or carbon film, the non-air-conducted sensor is unaffected by air-conducted sound and strongly resists acoustic noise.
- however, because the non-air-conducted sensor collects speech transmitted through the vibration of the jawbone, muscle, skin and other tissue, the high-frequency part is severely attenuated: the sound is muffled and ambiguous, and speech intelligibility is poor.
- Chinese invention patent 201610025390.7 discloses a dual-sensor speech enhancement method and device based on a statistical model.
- that invention first combines non-air-conducted and air-conducted speech to construct a joint statistical model for classification and endpoint detection, computes the current optimal air conduction speech filter to filter and enhance the air-conducted speech, then uses a non-air-conducted-to-air-conducted mapping model to convert the non-air-conducted speech into air-conducted speech, and finally performs a weighted fusion of the converted speech with the filtered, enhanced speech.
- this partially addresses the insufficient use of the correlation and prior knowledge between the speech recovered from the non-air-conducted sensor and the air-conducted speech, but the second-stage fusion still relies on air-conducted speech recovered from non-air-conducted speech, so it suffers from shortcomings such as high-frequency and silent-segment noise and insufficient use of the air conduction speech information.
- the purpose of the present invention is to remedy the above-mentioned defects in the prior art and provide a dual-sensor speech enhancement method and implementation device based on dual-channel Wiener filtering.
- the method first exploits the complementarity between air-conducted and non-air-conducted speech to establish a dual-channel joint speech classification model for frame classification of the dual-channel input signals from the air conduction and non-air conduction sensors; it uses this model to classify the speech frames collected on the two channels, and finally constructs a dual-channel Wiener filter based on the classification result to filter and enhance the speech signals collected on both channels.
- the present invention fuses the information contained in air-conducted and non-air-conducted speech more fully, and introduces prior knowledge of the speech signal through a statistical model, which can effectively improve the enhancement performance of the speech enhancement system in a noisy environment.
- the invention can be widely used in various occasions such as video calls, car phones, multimedia classrooms, and military communications.
- a dual-sensor speech enhancement method based on dual-channel Wiener filtering includes the following steps:
- step S1: synchronously collect clean air conduction training speech and non-air conduction training speech, establish a dual-channel joint classification model of air conduction and non-air conduction speech frames, and compute, for each class of the model, the air conduction speech power spectrum mean Φss(ω,l), the non-air conduction speech power spectrum mean Φbb(ω,l) and the cross-spectrum mean between air conduction and non-air conduction speech Φbs(ω,l), where ω is the frequency and l is the class index;
- step S2: synchronously collect air conduction test speech and non-air conduction test speech, use the pure-noise segment of the air conduction test speech to establish a statistical model of the air conduction noise, and compute the noise power spectrum mean Φvv(ω);
- step S3: use the statistical model of air conduction noise and the dual-channel joint classification model of step S1 to classify the synchronously input air conduction test speech frames and non-air conduction test speech frames;
- step S4: construct a dual-channel Wiener filter from the classification result of step S3 and the power spectrum mean Φvv(ω), and filter the air conduction and non-air conduction test speech frames to obtain the enhanced air conduction speech.
- step S1 proceeds as follows:
- step S1.1: frame and preprocess the synchronously collected clean air conduction and non-air conduction training speech and extract the characteristic parameters of each frame, the characteristic parameters being mel-frequency cepstral coefficients;
- step S1.2: use the air conduction and non-air conduction speech features obtained in step S1.1 to train the dual-channel joint speech classification model;
- the dual-channel joint speech classification model adopts a multi-stream Gaussian mixture model (GMM), namely
- p(o_x(k), o_b(k)) = Σ_{l=1..L} c_l · N(o_x(k); μ_{x,l}, σ_{x,l})^{w_x} · N(o_b(k); μ_{b,l}, σ_{b,l})^{w_b}
- where N(o, μ, σ) is a Gaussian function; o_x(k) and o_b(k) are the feature vectors extracted from the k-th frame of air conduction and non-air conduction speech; μ_{x,l} and μ_{b,l} are the means of the l-th Gaussian component of the air conduction and non-air conduction speech data streams in the multi-stream GMM; σ_{x,l} and σ_{b,l} are the corresponding variances; c_l is the weight of the l-th Gaussian component; w_x and w_b are the stream weights of the air conduction and non-air conduction speech data streams; and L is the number of Gaussian components.
- each Gaussian component in the dual-channel joint classification model represents one class. For each pair of synchronized air conduction and non-air conduction training speech frames, the score for each class is computed as the weighted component likelihood, and the current frame pair is assigned to the class with the highest score.
- after all air conduction and non-air conduction training speech frames have been classified, the air conduction speech power spectrum mean Φss(ω,l), the non-air conduction speech power spectrum mean Φbb(ω,l) and the cross-spectrum mean Φbs(ω,l) are computed over the frame pairs contained in each class.
- the statistical model of air conduction noise is the mean value of the power spectrum of air conduction noise ⁇ vv ( ⁇ ), which is calculated by the following method:
- step S2.3 Use the time corresponding to the endpoint of the non-air conduction test voice signal detected in step S2.2 as the endpoint of the air conduction test voice, and extract the pure noise segment in the air conduction test voice;
- in step S3, vector Taylor series (VTS) model compensation is first applied: the air conduction noise statistical model is used to correct the parameters of the air conduction speech data stream in the dual-channel joint classification model, and the input air conduction and non-air conduction test speech frames are then classified.
- the following formula is used to correct the mean of each Gaussian component of the air conduction speech data stream in the dual-channel joint classification model:
- μ̂_{x,l} = μ_{x,l} + C·log(1 + exp(C⁻¹·(μ_v − μ_{x,l}))), where μ_v is the noise mean in the same cepstral domain and C is the DCT matrix.
- in step S4, for the air conduction and non-air conduction test speech synchronously collected in the k-th frame, the enhanced air conduction speech spectrum is calculated as
- Y(ω,k) = Σ_{l=1..L} q(k,l)·[H_a(ω,k,l)·X(ω,k) + H_na(ω,k,l)·B(ω,k)]
- where Y(ω,k), X(ω,k) and B(ω,k) are the spectra of the enhanced air conduction speech, the air conduction test speech and the non-air conduction test speech at the k-th frame, respectively; q(k,l) is the classification score of the k-th frame pair for the l-th class of the dual-channel joint classification model; H_a(ω,k,l) is the Wiener filter frequency response applied to the k-th frame of air conduction test speech for the l-th class; and H_na(ω,k,l) is the Wiener filter frequency response applied to the k-th frame of non-air conduction test speech for the l-th class. These filter responses are constructed from the per-class spectral means Φss(ω,l), Φbb(ω,l), Φbs(ω,l) and the noise power spectrum mean Φvv(ω).
- a device for implementing the dual-sensor speech enhancement method based on dual-channel Wiener filtering includes an air-conducted speech sensor, a non-air-conducted speech sensor, a noise model estimation module, a dual-channel joint speech classification model, a model compensation module, a frame classification module, a filter coefficient generation module and a dual-channel filter, where:
- the air-conducted and non-air-conducted speech sensors are each connected to the noise model estimation module, the frame classification module and the dual-channel filter; the dual-channel joint classification model, the model compensation module, the frame classification module, the filter coefficient generation module and the dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel joint classification model is connected to the filter coefficient generation module;
- the air conduction and non-air conduction speech sensors collect the air-conducted and non-air-conducted speech signals, respectively, and the noise model estimation module estimates the current air conduction noise model and power spectrum.
- the dual-channel joint speech classification model is built from synchronously collected clean air conduction and non-air conduction training speech, jointly classifying air conduction and non-air conduction speech frames; for each class of the model, the air conduction speech power spectrum mean is Φss(ω,l), the non-air conduction speech power spectrum mean is Φbb(ω,l), and the cross-spectrum mean between air conduction and non-air conduction speech is Φbs(ω,l).
- the model compensation module uses the statistical model of air conduction noise to correct the parameters of the dual-channel joint classification model, and the frame classification module classifies the currently synchronized input air conduction and non-air conduction test speech frames.
- the air-conducted speech sensor is a microphone
- the non-air-conducted speech sensor is a throat microphone
- the present invention has the following advantages and effects:
- the present invention uses the information of both the air conduction and the non-air conduction test speech during enhancement and can therefore achieve a better enhancement effect.
- the present invention adopts a dual-channel joint speech classification model to fuse the information of the air conduction and non-air conduction test speech, which makes frame classification more accurate and makes full use of the correlation and prior knowledge of the two channels.
- the present invention uses a dual-channel Wiener filter to recover the air-conducted speech. Compared with Chinese invention patent 201610025390.7, the computation is simpler, and it avoids both the high-frequency and silent-segment noise that arises when air-conducted speech is recovered from non-air-conducted speech and the insufficient use of the air conduction speech information, giving better performance.
- the present invention uses a dual-channel Wiener filter to recover the air-conducted speech, avoiding the assumption that non-air-conducted and air-conducted speech are mutually independent.
- Figure 1 is a structural block diagram of a device for implementing a dual-sensor voice enhancement method based on dual-channel Wiener filtering disclosed in an embodiment of the present invention
- Fig. 2 is a flowchart of a dual-sensor speech enhancement method based on dual-channel Wiener filtering disclosed in an embodiment of the present invention.
- This embodiment discloses the structure of a device for implementing the dual-sensor speech enhancement method based on dual-channel Wiener filtering, as shown in Figure 1.
- the device consists of an air-conducted speech sensor, a non-air-conducted speech sensor, a noise model estimation module, a dual-channel joint speech classification model, a model compensation module, a frame classification module, a filter coefficient generation module and a dual-channel filter.
- the air-conducted and non-air-conducted speech sensors are each connected to the noise model estimation module, the frame classification module and the dual-channel filter; the dual-channel joint classification model, model compensation module, frame classification module, filter coefficient generation module and dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel joint classification model is connected to the filter coefficient generation module.
- the air-conducted speech sensor is a microphone
- the non-air-conducted speech sensor is a throat microphone, both of which are used to collect the air-conducted and non-air-conducted speech signals
- the noise model estimation module estimates the current air conduction noise model and power spectrum.
- the dual-channel joint speech classification model is built from the synchronously collected clean air conduction and non-air conduction training speech, jointly classifying air conduction and non-air conduction speech frames.
- for each class of the dual-channel joint classification model, the air conduction speech power spectrum mean is Φss(ω,l), the non-air conduction speech power spectrum mean is Φbb(ω,l), and the cross-spectrum mean between air conduction and non-air conduction speech is Φbs(ω,l).
- the model compensation module uses the statistical model of air conduction noise to correct the parameters of the dual-channel speech joint classification model.
- the frame classification module classifies the air conduction test speech and non-air conduction test speech frames input simultaneously.
- the filter coefficient generation module constructs a dual-channel Wiener filter based on the classification result and the power spectrum of air conduction noise.
- the dual-channel filter filters air conduction test speech frames and non-air conduction test speech frames to obtain enhanced air conduction speech.
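The final filtering stage described above — combining per-class filter outputs with classification scores to produce the enhanced spectrum — can be sketched per frame as follows. This is an illustrative numpy sketch, not the patent's implementation; the symbol names (q, H_a, H_na) mirror those used later in the text, and the array shapes are assumptions:

```python
import numpy as np

def enhance_frame(X, B, q, H_a, H_na):
    """One frame of dual-channel Wiener enhancement.
    X, B   : spectra of the air / non-air conduction test frame, shape (n_freq,)
    q      : per-class classification scores, shape (L,)
    H_a    : per-class filter responses for the air channel, shape (L, n_freq)
    H_na   : per-class filter responses for the non-air channel, shape (L, n_freq)
    Returns the enhanced air conduction spectrum Y, shape (n_freq,)."""
    Y = np.zeros_like(X, dtype=complex)
    for l in range(len(q)):
        # score-weighted sum of the two filtered channels for class l
        Y += q[l] * (H_a[l] * X + H_na[l] * B)
    return Y
```

Inverse-transforming Y frame by frame and overlap-adding would then yield the enhanced time-domain speech.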
- This embodiment discloses a dual-sensor speech enhancement method based on dual-channel Wiener filtering. Using the implementation device disclosed in the above embodiment, the enhanced air conduction speech is computed from the input air conduction and non-air conduction test speech by the following steps, whose flow is shown in Figure 2:
- Step S1: synchronously collect clean air conduction training speech and non-air conduction training speech, establish a dual-channel joint classification model of air conduction and non-air conduction speech frames, and compute, for each class of the model, the air conduction speech power spectrum mean Φss(ω,l), the non-air conduction speech power spectrum mean Φbb(ω,l) and the cross-spectrum mean between air conduction and non-air conduction speech Φbs(ω,l), where
- ⁇ is the frequency
- l is the serial number of the classification.
- the synchronously collected clean air conduction training speech and non-air conduction training speech are divided into frames with a frame length of 30 ms and a frame shift of 10 ms.
- each frame of clean air conduction and non-air conduction training speech is Hamming-windowed and pre-emphasized, and its power spectrum is computed.
- the power spectra of the air conduction and non-air conduction training speech are each passed through a 24-band mel filter bank; the logarithm of the filter bank output is taken and then DCT-transformed, yielding two sets of 12-dimensional mel-frequency cepstral coefficients, which serve as the training features of the dual-channel joint classification model.
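The feature extraction just described (30 ms frames with 10 ms shift, pre-emphasis, Hamming window, power spectrum, 24-band mel filter bank, log, DCT, 12 coefficients kept) can be sketched in numpy as follows. The sampling rate of 8 kHz, FFT size of 512 and pre-emphasis coefficient 0.97 are assumptions not stated in the patent:

```python
import numpy as np

def mel_filterbank(n_filters=24, n_fft=512, sr=8000):
    """Triangular mel filter bank over the rFFT bins (parameters assumed)."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = np.linspace(hz2mel(0.0), hz2mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel2hz(pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)   # rising slope
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)   # falling slope
    return fb

def mfcc_features(signal, sr=8000, n_ceps=12, n_fft=512):
    """Step S1.1 as described: frame, pre-emphasize, window, power spectrum,
    24-band mel filter bank, log, DCT, keep 12 coefficients per frame."""
    frame_len, shift = int(0.030 * sr), int(0.010 * sr)      # 30 ms / 10 ms
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    fb = mel_filterbank(24, n_fft, sr)
    n = np.arange(24)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / 48.0)  # DCT-II
    feats = []
    for s in range(0, len(emph) - frame_len + 1, shift):
        frame = emph[s:s + frame_len] * np.hamming(frame_len)
        pspec = np.abs(np.fft.rfft(frame, n_fft)) ** 2       # power spectrum
        feats.append(dct @ np.log(fb @ pspec + 1e-10))       # log-mel -> cepstra
    return np.array(feats)
```

The same function is applied to both the air conduction and the non-air conduction channel, giving the two 12-dimensional feature streams.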
- step S1.2 Use the air-guided speech and non-air-guided speech features obtained in step S1.1 to train a dual-channel speech joint classification model.
- the dual-channel joint speech classification model adopts a multi-stream GMM, namely
- p(o_x(k), o_b(k)) = Σ_{l=1..L} c_l · N(o_x(k); μ_{x,l}, σ_{x,l})^{w_x} · N(o_b(k); μ_{b,l}, σ_{b,l})^{w_b}
- where N(o, μ, σ) is a Gaussian function; o_x(k) and o_b(k) are the feature vectors extracted from the k-th frame of air conduction and non-air conduction speech; μ_{x,l} and μ_{b,l} are the means of the l-th Gaussian component of the air conduction and non-air conduction speech data streams in the multi-stream GMM; σ_{x,l} and σ_{b,l} are the corresponding variances; c_l is the weight of the l-th Gaussian component; w_x and w_b are the stream weights of the air conduction and non-air conduction speech data streams; and L is the number of Gaussian components.
- each Gaussian component in the dual-channel speech joint classification model represents a category.
- for each pair of synchronized air conduction and non-air conduction training speech frames, the score for class l is computed as the weighted component likelihood q(k,l) = c_l · N(o_x(k); μ_{x,l}, σ_{x,l})^{w_x} · N(o_b(k); μ_{b,l}, σ_{b,l})^{w_b}.
- the current air conduction and non-air conduction training frame pair is assigned to the class with the highest score. After classifying all training frame pairs, the air conduction speech power spectrum mean Φss(ω,l), the non-air conduction speech power spectrum mean Φbb(ω,l) and the cross-spectrum mean between air conduction and non-air conduction speech Φbs(ω,l) are computed over the frames contained in each class.
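The multi-stream GMM scoring above can be sketched as follows. Diagonal covariances and equal stream weights are assumptions; the computation is done in the log domain for numerical stability, and the scores are normalized so they can also serve as the combination weights q(k,l):

```python
import numpy as np

def stream_gmm_scores(o_x, o_b, mu_x, var_x, mu_b, var_b, c, w_x=0.5, w_b=0.5):
    """Per-class scores of a multi-stream GMM (illustrative sketch).
    o_x, o_b : feature vectors of one frame pair, shape (D,)
    mu_*, var_* : per-class means/variances, shape (L, D); c : weights (L,)
    Returns normalized scores, shape (L,)."""
    def log_gauss(o, mu, var):           # diagonal Gaussian, one value per class
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (o - mu) ** 2 / var, axis=1)
    log_q = (np.log(c)
             + w_x * log_gauss(o_x, mu_x, var_x)     # air conduction stream
             + w_b * log_gauss(o_b, mu_b, var_b))    # non-air conduction stream
    log_q -= log_q.max()                 # stabilize before exponentiating
    q = np.exp(log_q)
    return q / q.sum()
```

The winning class for a frame pair is then simply `scores.argmax()`.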
- Step S2 Collect air conduction test speech and non-air conduction test speech simultaneously, use the pure noise section of the air conduction test speech to establish a statistical model of air conduction noise, and calculate the power spectrum mean value ⁇ vv ( ⁇ ) of air conduction noise.
- the statistical model of air conduction noise is the mean value of the power spectrum of air conduction noise ⁇ vv ( ⁇ ), which is calculated by the following method:
- step S2.3 Use the time corresponding to the endpoint of the non-air conduction test voice signal detected in step S2.2 as the endpoint of the air conduction test voice, and extract the pure noise segment in the air conduction test voice;
- the statistical model of air conduction noise is Gaussian function, GMM model or HMM model.
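For the simplest of these choices — the noise model reduced to the power spectrum mean Φvv(ω) — the estimate over the pure-noise segment can be sketched as follows (framing parameters reused from training; the Gaussian/GMM/HMM variants are not shown, and the 8 kHz rate and 512-point FFT are assumptions):

```python
import numpy as np

def noise_power_spectrum_mean(noise_segment, sr=8000, n_fft=512):
    """Mean power spectrum Phi_vv(omega) over the pure-noise frames
    extracted from the air conduction test speech."""
    frame_len, shift = int(0.030 * sr), int(0.010 * sr)   # 30 ms / 10 ms
    spectra = []
    for s in range(0, len(noise_segment) - frame_len + 1, shift):
        frame = noise_segment[s:s + frame_len] * np.hamming(frame_len)
        spectra.append(np.abs(np.fft.rfft(frame, n_fft)) ** 2)
    return np.mean(spectra, axis=0)      # one mean value per frequency bin
```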
- Step S3 Use the statistical model of air conduction noise and the two-channel speech joint classification model in step S1 to classify the air conduction test speech frames and non-air conduction test speech frames input simultaneously.
- in step S3, VTS model compensation is first applied: the air conduction noise statistical model is used to correct the parameters of the air conduction speech data stream in the dual-channel joint classification model, and the input air conduction and non-air conduction test speech frames are then classified.
- specifically, the following formula is used to correct the mean of each Gaussian component of the air conduction speech data stream in the dual-channel joint classification model:
- μ̂_{x,l} = μ_{x,l} + C·log(1 + exp(C⁻¹·(μ_v − μ_{x,l})))
- where μ_{x,l} and μ_v are obtained by passing the power spectra of the clean air conduction training speech belonging to the l-th class and of the noise, respectively, through the 24-band mel filter bank and taking the logarithm of the mean, and C is the discrete cosine transform (DCT) matrix.
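A numpy sketch of VTS-style mean compensation consistent with the formula above. The orthonormal DCT-II convention and the use of the truncated transpose as C⁻¹ are modeling assumptions (the patent does not specify its DCT normalization):

```python
import numpy as np

def vts_mean_compensation(mu_x, mu_v, n_mel=24):
    """Shift a clean cepstral mean mu_x toward the noisy domain given the
    noise cepstral mean mu_v, via the log-mel domain (VTS-style sketch).
    C is an orthonormal DCT-II matrix; only the first len(mu_x) rows are
    used, so C^{-1} is approximated by the truncated transpose C.T."""
    n = np.arange(n_mel)
    k = np.arange(n_mel)
    C_full = np.cos(np.pi * np.outer(k, 2 * n + 1) / (2 * n_mel))
    C_full[0] *= np.sqrt(1.0 / n_mel)
    C_full[1:] *= np.sqrt(2.0 / n_mel)        # orthonormal DCT-II
    d = len(mu_x)
    C, Cinv = C_full[:d], C_full[:d].T        # truncated transform pair
    lx, lv = Cinv @ mu_x, Cinv @ mu_v         # back to log-mel domain
    # log(1 + e^(n - x)) mismatch term, then forward DCT again
    return mu_x + C @ np.log1p(np.exp(lv - lx))
```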
- Step S4 Construct a two-channel Wiener filter according to the classification result of step S3 and ⁇ vv ( ⁇ ), and filter the air conduction test speech frame and the non-air conduction test speech frame to obtain an enhanced air conduction speech.
- for the air conduction and non-air conduction test speech synchronously collected in the k-th frame, the enhanced air conduction speech spectrum is obtained by combining the per-class filter outputs with the classification scores:
- Y(ω,k) = Σ_{l=1..L} q(k,l)·[H_a(ω,k,l)·X(ω,k) + H_na(ω,k,l)·B(ω,k)]
- where Y(ω,k), X(ω,k) and B(ω,k) are the spectra of the enhanced air conduction speech, the air conduction test speech and the non-air conduction test speech at the k-th frame, respectively; q(k,l) is the classification score of the k-th frame pair for the l-th class of the dual-channel joint classification model; H_a(ω,k,l) is the Wiener filter frequency response applied to the k-th frame of air conduction test speech for the l-th class; and H_na(ω,k,l) is the Wiener filter frequency response applied to the k-th frame of non-air conduction test speech for the l-th class. These filter responses are constructed from the per-class spectral means Φss(ω,l), Φbb(ω,l), Φbs(ω,l) and the noise power spectrum mean Φvv(ω).
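The closed-form filter expressions are not reproduced in this extract, but a standard dual-channel Wiener (MMSE) construction from exactly the quantities named above — Φss(ω,l), Φbb(ω,l), Φbs(ω,l) and Φvv(ω) — is sketched below. It assumes the model X = S + V with a noise-free non-air channel B correlated with the clean speech S, which may differ from the patent's exact derivation:

```python
import numpy as np

def dual_channel_wiener(phi_ss, phi_bb, phi_bs, phi_vv):
    """Per-class dual-channel Wiener filter sketch: the MMSE estimator of
    the clean spectrum S from the pair [X, B], i.e. [H_a, H_na] =
    Phi_sz @ inv(Phi_zz) solved per frequency bin with a 2x2 inverse.
    All inputs are arrays over frequency for one class l."""
    phi_sb = np.conj(phi_bs)                 # cross-spectrum S<->B
    a, b, d = phi_ss + phi_vv, phi_sb, phi_bb   # Phi_zz = [[a, b], [b*, d]]
    det = a * d - b * np.conj(b)
    h_a = (phi_ss * d - phi_sb * np.conj(b)) / det   # weight on X
    h_na = (phi_sb * a - phi_ss * b) / det           # weight on B
    return h_a, h_na
```

As a sanity check, when the cross-spectrum Φbs is zero the non-air channel carries no usable information and the filter collapses to the classical single-channel Wiener gain Φss/(Φss+Φvv) on X alone.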
Abstract
Description
Claims (10)
- 1. A dual-sensor speech enhancement method based on dual-channel Wiener filtering, characterized in that the dual-sensor speech enhancement method comprises the following steps: S1. Synchronously collect clean air conduction training speech and non-air conduction training speech, establish a dual-channel joint classification model of air conduction speech frames and non-air conduction speech frames, and compute, for each class of the model, the air conduction speech power spectrum mean Φss(ω,l), the non-air conduction speech power spectrum mean Φbb(ω,l) and the cross-spectrum mean between air conduction and non-air conduction speech Φbs(ω,l), where ω is the frequency and l is the class index; S2. Synchronously collect air conduction test speech and non-air conduction test speech, use the pure-noise segment of the air conduction test speech to establish a statistical model of the air conduction noise, and compute the noise power spectrum mean Φvv(ω); S3. Use the statistical model of air conduction noise and the dual-channel joint classification model of step S1 to classify the synchronously input air conduction and non-air conduction test speech frames; S4. Construct a dual-channel Wiener filter from the classification result of step S3 and the power spectrum mean Φvv(ω), and filter the air conduction and non-air conduction test speech frames to obtain the enhanced air conduction speech.
- The dual-sensor speech enhancement method according to claim 1, wherein step S1 proceeds as follows:
S1.1. Frame and preprocess the synchronously collected clean air-conduction and non-air-conduction training speech, and extract the feature parameters of each speech frame, the feature parameters being mel-frequency cepstral coefficients;
S1.2. Train the dual-channel joint classification model on the air-conduction and non-air-conduction speech features obtained in step S1.1;
S1.3. Classify all air-conduction and non-air-conduction training speech frames with the trained dual-channel joint classification model, then compute, over the frames assigned to each class, the mean air-conduction power spectrum Φ_ss(ω,l), the mean non-air-conduction power spectrum Φ_bb(ω,l), and the mean cross-spectrum between air-conduction and non-air-conduction speech Φ_bs(ω,l).
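The front end of step S1.1 (framing, windowing, mel filtering, log compression, cepstral truncation) can be sketched in a few lines of numpy. This is an illustrative sketch, not the patent's implementation: the sampling rate, frame length, hop size, 12-coefficient truncation, and the simplified triangular filter placement are all assumed values.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    # Split the signal into overlapping frames and apply a Hamming window.
    n = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx] * np.hamming(frame_len)

def cepstral_features(frames, n_mel=24, n_cep=12, sr=8000):
    # Power spectrum -> triangular mel filter bank -> log -> DCT-II truncation.
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    n_bins = spec.shape[1]
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = imel(np.linspace(0.0, mel(sr / 2), n_mel + 2))
    bins = np.floor(edges / (sr / 2) * (n_bins - 1)).astype(int)
    fbank = np.zeros((n_mel, n_bins))
    for i in range(n_mel):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fbank[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fbank[i, c:r] = (r - np.arange(c, r)) / (r - c)
    logmel = np.log(spec @ fbank.T + 1e-10)
    # DCT-II to decorrelate the log-mel channels; keep the first n_cep terms.
    k = np.arange(n_mel)
    dct = np.cos(np.pi * np.outer(np.arange(n_cep), (2 * k + 1) / (2 * n_mel)))
    return logmel @ dct.T
```

Both channels would be passed through the same front end so that each synchronized frame pair yields one feature vector per stream.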
- The dual-sensor speech enhancement method according to claim 2, wherein in step S1.2 the dual-channel joint classification model is a multi-stream GMM, where GMM denotes a Gaussian mixture model, namely
p(o_x(k), o_b(k)) = Σ_{l=1..L} c_l · N(o_x(k), μ_l^x, σ_l^x)^{w_x} · N(o_b(k), μ_l^b, σ_l^b)^{w_b}
where N(o, μ, σ) is a Gaussian density, o_x(k) and o_b(k) are the feature vectors extracted from the k-th frame of air-conduction test speech and non-air-conduction test speech, μ_l^x and μ_l^b are the means of the l-th Gaussian component of the air-conduction and non-air-conduction speech streams of the multi-stream GMM, σ_l^x and σ_l^b are the corresponding variances, c_l is the weight of the l-th Gaussian component, w_x and w_b are the weights of the air-conduction and non-air-conduction streams, and L is the number of Gaussian components.
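The multi-stream GMM above scores a frame pair by raising each stream's Gaussian likelihood to its stream weight. Working in the log domain avoids numerical underflow; the sketch below assumes diagonal covariances (the claim does not state the covariance structure) and equal stream weights by default.

```python
import numpy as np

def log_gauss_diag(o, mu, var):
    # log N(o; mu, var) with a diagonal covariance.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (o - mu) ** 2 / var, axis=-1)

def multistream_log_scores(o_x, o_b, c, mu_x, var_x, mu_b, var_b,
                           w_x=0.5, w_b=0.5):
    # Per-component score of the stream-weighted GMM, in the log domain:
    # log c_l + w_x * log N(o_x; mu_x_l, var_x_l) + w_b * log N(o_b; mu_b_l, var_b_l)
    return (np.log(c)
            + w_x * log_gauss_diag(o_x[None, :], mu_x, var_x)
            + w_b * log_gauss_diag(o_b[None, :], mu_b, var_b))
```

The total model likelihood is the log-sum-exp over components; for classification only the per-component scores are needed, and the frame pair is assigned to the arg-max component.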
- The dual-sensor speech enhancement method according to claim 3, wherein in step S1.3 each Gaussian component of the dual-channel joint classification model represents one class, and for each pair of synchronized air-conduction and non-air-conduction training speech frames the score for each class is computed as
q(k,l) = c_l · N(o_x(k), μ_l^x, σ_l^x)^{w_x} · N(o_b(k), μ_l^b, σ_l^b)^{w_b}
The current frame pair is assigned to the class with the highest score. After the class of every training frame pair has been determined, the mean air-conduction power spectrum Φ_ss(ω,l), the mean non-air-conduction power spectrum Φ_bb(ω,l), and the mean cross-spectrum between air-conduction and non-air-conduction speech Φ_bs(ω,l) are computed over the frame pairs belonging to each class.
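Once every training frame pair has been assigned to its best-scoring class, the per-class spectral statistics of step S1.3 are plain conditional averages over the complex STFT frames of the two channels. A sketch (the use of raw complex spectra as input is an assumption about the representation):

```python
import numpy as np

def class_average_spectra(X, B, labels, L):
    # X, B: (n_frames, n_bins) complex spectra of the air- and
    # non-air-conducted frames; labels: class index per frame pair.
    # Returns per-class mean auto-power and cross-power spectra.
    n_bins = X.shape[1]
    phi_ss = np.zeros((L, n_bins))
    phi_bb = np.zeros((L, n_bins))
    phi_bs = np.zeros((L, n_bins), dtype=complex)
    for l in range(L):
        sel = labels == l
        if not np.any(sel):
            continue  # empty class: leave its statistics at zero
        phi_ss[l] = np.mean(np.abs(X[sel]) ** 2, axis=0)
        phi_bb[l] = np.mean(np.abs(B[sel]) ** 2, axis=0)
        phi_bs[l] = np.mean(B[sel] * np.conj(X[sel]), axis=0)
    return phi_ss, phi_bb, phi_bs
```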
- The dual-sensor speech enhancement method according to claim 1, wherein the statistical model of the air-conduction noise is its mean power spectrum Φ_vv(ω), computed as follows:
S2.1. Synchronously collect air-conduction and non-air-conduction test speech and divide them into frames;
S2.2. From the short-time autocorrelation function R_b(m) and the short-time energy E_b of each non-air-conduction test speech frame, compute the frame's short-time average threshold-crossing rate C_b, where sgn[·] is the sign operation, an adjustment factor scales the threshold, T is the initial threshold value, and M is the frame length; when C_b exceeds a preset threshold the frame is judged to be speech, otherwise noise, and the endpoints of the non-air-conduction speech signal are obtained from the per-frame decisions;
S2.3. Take the instants of the endpoints detected in step S2.2 as the endpoints of the air-conduction test speech, and extract the noise-only segments of the air-conduction test speech;
S2.4. Compute the mean power spectrum Φ_vv(ω) of the noise-only segments of the air-conduction test speech.
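A minimal version of the endpoint-detection idea in S2.2–S2.4: because the non-air-conducted channel is largely immune to acoustic noise, even a crude statistic separates speech from silence on it. The sketch below substitutes a plain short-time-energy threshold for the claim's threshold-crossing-rate statistic C_b, and the percentile-based floor estimate and ratio are assumed values, not parameters from the claim.

```python
import numpy as np

def noise_psd_from_vad(b_frames, x_frames, ratio=4.0):
    # b_frames: framed non-air-conducted signal (nearly noise-free);
    # x_frames: synchronized framed air-conducted signal.
    # A frame is marked as speech when its short-time energy on the
    # non-air-conducted channel rises well above a noise-floor estimate;
    # the noise-only frames of the air-conducted channel are then averaged
    # to obtain the mean noise power spectrum Phi_vv(omega).
    e = np.sum(b_frames ** 2, axis=1)
    floor = np.percentile(e, 10)      # rough noise-floor estimate
    noise_mask = e <= ratio * floor
    spec = np.abs(np.fft.rfft(x_frames, axis=1)) ** 2
    return spec[noise_mask].mean(axis=0), noise_mask
```

Transferring the endpoints detected on the non-air-conducted channel to the air-conducted channel is what makes this robust: the decision is made on the clean channel, the noise statistics are gathered on the noisy one.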
- The dual-sensor speech enhancement method according to claim 1, wherein in step S3 vector Taylor series (VTS) model compensation is first applied: the statistical model of the air-conduction noise is used to correct the parameters of the air-conduction speech stream of the dual-channel joint classification model, after which the input air-conduction and non-air-conduction test speech frames are classified. The mean of each Gaussian component of the air-conduction stream is corrected as
μ̂_l^x = μ_l^x + C·log(1 + exp(C^{-1}(μ_l^v − μ_l^x)))
where μ_l^x and μ_l^v are, respectively, the means of the log mel spectra (the power spectra of the clean air-conduction training speech and of the noise belonging to the l-th class, each passed through a 24-channel mel filter bank and log-compressed), and C is the discrete cosine transform matrix. All other parameters of the dual-channel joint classification model remain unchanged. The corrected model is then used to classify the synchronously input air-conduction and non-air-conduction test speech frames, yielding the class score q(k,l) of the current frame pair for each class.
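The core of zeroth-order VTS compensation is a per-channel log-add correction in the log-mel domain; the claim applies it to cepstral means by wrapping it between C^{-1} and C. A sketch of the log-mel-domain step, with the cepstral wrapping omitted for brevity:

```python
import numpy as np

def vts_compensate_mean(mu_log_s, mu_log_v):
    # Zeroth-order log-add VTS: if Y = S + V in the linear power domain, then
    # in the log domain  log Y = log S + log(1 + exp(log V - log S)).
    # mu_log_s, mu_log_v: per-channel log-mel means of clean speech and noise.
    return mu_log_s + np.log1p(np.exp(mu_log_v - mu_log_s))
```

The two limiting cases make the behavior easy to check: when the noise mean is far below the speech mean the correction vanishes, and when the two are equal the compensated mean rises by log 2 (the powers add).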
- The dual-sensor speech enhancement method according to claim 2, wherein in step S4, for the k-th pair of synchronously collected air-conduction and non-air-conduction test speech frames, the enhanced air-conduction speech spectrum is computed as
Y(ω,k) = H_a(ω,k)·X(ω,k) + H_na(ω,k)·B(ω,k)
where Y(ω,k), X(ω,k), and B(ω,k) are the spectra of the enhanced air-conduction speech, the air-conduction test speech, and the non-air-conduction test speech of the k-th frame, and H_a(ω,k) and H_na(ω,k) are the frequency responses of the Wiener filters applied to the air-conduction and non-air-conduction channels, computed as the score-weighted combinations
H_a(ω,k) = Σ_l q(k,l)·H_a(ω,k,l) / Σ_l q(k,l)
H_na(ω,k) = Σ_l q(k,l)·H_na(ω,k,l) / Σ_l q(k,l)
where q(k,l) is the class score of the k-th frame pair for the l-th class of the dual-channel joint classification model, H_a(ω,k,l) is the Wiener filter frequency response of the air-conduction channel for the l-th class, computed as
H_a(ω,k,l) = [Φ_ss(ω,l)·Φ_bb(ω,l) − Φ_sb(ω,l)·Φ_bs(ω,l)] / [(Φ_ss(ω,l) + Φ_vv(ω))·Φ_bb(ω,l) − Φ_sb(ω,l)·Φ_bs(ω,l)]
and H_na(ω,k,l) is the Wiener filter frequency response of the non-air-conduction channel for the l-th class, computed as
H_na(ω,k,l) = Φ_sb(ω,l)·Φ_vv(ω) / [(Φ_ss(ω,l) + Φ_vv(ω))·Φ_bb(ω,l) − Φ_sb(ω,l)·Φ_bs(ω,l)]
with Φ_sb(ω,l) the complex conjugate of Φ_bs(ω,l).
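Under the usual assumption that the air-conduction noise V is uncorrelated with both speech channels, the per-class two-channel MMSE filters follow from the 2x2 normal equations built from the stored class spectra Φ_ss, Φ_bb, Φ_bs and the noise spectrum Φ_vv. The sketch below implements that textbook solution together with the score-weighted blending; it is a hedged reconstruction, since the claim's filter formulas are rendered as images in this source, and the `eps` regularizer is an added numerical safeguard.

```python
import numpy as np

def per_class_filters(phi_ss, phi_bb, phi_bs, phi_vv, eps=1e-12):
    # MMSE estimate of S from X = S + V and B, with V uncorrelated with S, B:
    # [H_a, H_na] = [phi_ss, phi_sb] * inv([[phi_ss+phi_vv, phi_sb],
    #                                       [phi_bs,        phi_bb]])
    phi_sb = np.conj(phi_bs)
    det = (phi_ss + phi_vv) * phi_bb - phi_sb * phi_bs
    h_a = (phi_ss * phi_bb - phi_sb * phi_bs) / (det + eps)
    h_na = (phi_sb * phi_vv) / (det + eps)
    return h_a, h_na

def enhance_frame(Xk, Bk, q, h_a_l, h_na_l):
    # Blend the per-class filters with the normalized class scores q(k, l),
    # then filter both channels and sum: Y = H_a * X + H_na * B.
    w = q / q.sum()
    Ha = np.einsum('l,lf->f', w, h_a_l)
    Hna = np.einsum('l,lf->f', w, h_na_l)
    return Ha * Xk + Hna * Bk
```

A sanity check on the design: when the cross-spectrum is zero the second channel carries no usable information, and the solution collapses to the classical single-channel Wiener gain Φ_ss/(Φ_ss + Φ_vv) with a zero non-air-conduction branch.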
- A device for implementing the dual-sensor speech enhancement method based on dual-channel Wiener filtering, characterized in that the device comprises an air-conduction speech sensor, a non-air-conduction speech sensor, a noise model estimation module, a dual-channel joint classification model, a model compensation module, a frame classification module, a filter coefficient generation module, and a dual-channel filter, wherein:
the air-conduction speech sensor and the non-air-conduction speech sensor are each connected to the noise model estimation module, the frame classification module, and the dual-channel filter; the dual-channel joint classification model, the model compensation module, the frame classification module, the filter coefficient generation module, and the dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel joint classification model is connected to the filter coefficient generation module;
the air-conduction and non-air-conduction speech sensors collect the air-conduction and non-air-conduction speech signals, respectively; the noise model estimation module estimates the model and power spectrum of the current air-conduction noise; the dual-channel joint classification model is built from synchronously collected clean air-conduction and non-air-conduction training speech frames, each class of the model having a mean air-conduction power spectrum Φ_ss(ω,l), a mean non-air-conduction power spectrum Φ_bb(ω,l), and a mean cross-spectrum between air-conduction and non-air-conduction speech Φ_bs(ω,l); the model compensation module corrects the parameters of the dual-channel joint classification model using the statistical model of the air-conduction noise; the frame classification module classifies the currently input synchronized air-conduction and non-air-conduction test speech frames; the filter coefficient generation module constructs a dual-channel Wiener filter from the classification result and the power spectrum of the air-conduction noise; and the dual-channel filter filters the air-conduction and non-air-conduction test speech frames to obtain the enhanced air-conduction speech.
- The device for implementing the dual-sensor speech enhancement method according to claim 9, characterized in that the air-conduction speech sensor is a microphone and the non-air-conduction speech sensor is a throat microphone.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910678398.7A CN110390945B (en) | 2019-07-25 | 2019-07-25 | Dual-sensor voice enhancement method and implementation device |
CN201910678398.7 | 2019-07-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021012403A1 true WO2021012403A1 (en) | 2021-01-28 |
Family
ID=68287587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/110290 WO2021012403A1 (en) | 2019-07-25 | 2019-10-10 | Dual sensor speech enhancement method and implementation device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110390945B (en) |
WO (1) | WO2021012403A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111009253B (en) * | 2019-11-29 | 2022-10-21 | 联想(北京)有限公司 | Data processing method and device |
CN111524531A (en) * | 2020-04-23 | 2020-08-11 | 广州清音智能科技有限公司 | Method for real-time noise reduction of high-quality two-channel video voice |
CN116470959A (en) * | 2022-07-12 | 2023-07-21 | 苏州旭创科技有限公司 | Filter implementation method, noise suppression method, device and computer equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004279768A (en) * | 2003-03-17 | 2004-10-07 | Mitsubishi Heavy Ind Ltd | Device and method for estimating air-conducted sound |
CN103208291A (en) * | 2013-03-08 | 2013-07-17 | 华南理工大学 | Speech enhancement method and device applicable to strong noise environments |
CN105513605A (en) * | 2015-12-01 | 2016-04-20 | 南京师范大学 | Voice enhancement system and method for cellphone microphone |
CN105632512A (en) * | 2016-01-14 | 2016-06-01 | 华南理工大学 | Dual-sensor voice enhancement method based on statistics model and device |
US20170294179A1 (en) * | 2011-09-19 | 2017-10-12 | Bitwave Pte Ltd | Multi-sensor signal optimization for speech communication |
CN107886967A (en) * | 2017-11-18 | 2018-04-06 | 中国人民解放军陆军工程大学 | A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network |
JP2018063400A (en) * | 2016-10-14 | 2018-04-19 | 富士通株式会社 | Audio processing apparatus and audio processing program |
CN108986834A (en) * | 2018-08-22 | 2018-12-11 | 中国人民解放军陆军工程大学 | The blind Enhancement Method of bone conduction voice based on codec framework and recurrent neural network |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN203165457U (en) * | 2013-03-08 | 2013-08-28 | 华南理工大学 | Voice acquisition device used for noisy environment |
CN106328156B (en) * | 2016-08-22 | 2020-02-18 | 华南理工大学 | Audio and video information fusion microphone array voice enhancement system and method |
GB201713946D0 (en) * | 2017-06-16 | 2017-10-18 | Cirrus Logic Int Semiconductor Ltd | Earbud speech estimation |
CN110010143B (en) * | 2019-04-19 | 2020-06-09 | 出门问问信息科技有限公司 | Voice signal enhancement system, method and storage medium |
- 2019-07-25: CN application CN201910678398.7A filed, granted as patent CN110390945B (active)
- 2019-10-10: PCT application PCT/CN2019/110290 filed as WO2021012403A1
Also Published As
Publication number | Publication date |
---|---|
CN110390945A (en) | 2019-10-29 |
CN110390945B (en) | 2021-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI763073B (en) | Deep learning based noise reduction method using both bone-conduction sensor and microphone signals | |
WO2021012403A1 (en) | Dual sensor speech enhancement method and implementation device | |
CN110070880B (en) | Establishment method and application method of combined statistical model for classification | |
CN109273021B (en) | RNN-based real-time conference noise reduction method and device | |
CN111916101B (en) | Deep learning noise reduction method and system fusing bone vibration sensor and double-microphone signals | |
CN100573663C (en) | Mute detection method based on speech characteristic to jude | |
Zhang et al. | On end-to-end multi-channel time domain speech separation in reverberant environments | |
WO2022027423A1 (en) | Deep learning noise reduction method and system fusing signal of bone vibration sensor with signals of two microphones | |
KR102429152B1 (en) | Deep learning voice extraction and noise reduction method by fusion of bone vibration sensor and microphone signal | |
CN110197665A (en) | A kind of speech Separation and tracking for police criminal detection monitoring | |
CN103208291A (en) | Speech enhancement method and device applicable to strong noise environments | |
CN110942784A (en) | Snore classification system based on support vector machine | |
Zheng et al. | Spectra restoration of bone-conducted speech via attention-based contextual information and spectro-temporal structure constraint | |
CN203165457U (en) | Voice acquisition device used for noisy environment | |
CN111341351A (en) | Voice activity detection method and device based on self-attention mechanism and storage medium | |
CN113327589B (en) | Voice activity detection method based on attitude sensor | |
CN112992131A (en) | Method for extracting ping-pong command of target voice in complex scene | |
Heracleous et al. | Fusion of standard and alternative acoustic sensors for robust automatic speech recognition | |
Srinivasan et al. | Robustness analysis of speech enhancement using a bone conduction microphone-preliminary results | |
Thomsen et al. | Speech enhancement and noise-robust automatic speech recognition | |
Chandra | Hindi vowel classification using QCN-PNCC features | |
Radha et al. | A Study on Alternative Speech Sensor | |
Jiang et al. | Using energy difference for speech separation of dual-microphone close-talk system | |
Saudi et al. | Robust Audio-Visual Speech Recognition System based on Gabor Features and Dynamic Stream Weight Adaption | |
Sathiamoorthy et al. | Performance of Speaker Verification Using CSM and TM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19938708 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19938708 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.09.2022) |