CN110390945B - Dual-sensor voice enhancement method and implementation device - Google Patents
- Publication number
- CN110390945B, CN201910678398.7A
- Authority
- CN
- China
- Prior art keywords
- air conduction
- speech
- voice
- dual
- air
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02165—Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/06—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention discloses a dual-sensor speech enhancement method based on dual-channel Wiener filtering, and a device implementing it. Compared with the prior art, the method and device fuse the information contained in the air conduction speech and the non-air conduction speech more fully and introduce prior knowledge of the speech signals through a statistical model, which can effectively improve the performance of a speech enhancement system in noisy environments. The invention is widely applicable to video calls, in-vehicle telephones, multimedia classrooms, military communication and similar settings.
Description
Technical Field
The invention relates to the technical field of speech signal processing, and in particular to a dual-sensor speech enhancement method based on dual-channel Wiener filtering and a device implementing it.
Background
In practical voice communication, the speech signal is often corrupted by ambient noise, which degrades the quality of the received speech. Speech enhancement, an important branch of speech signal processing, aims to recover the clean original speech from noisy speech as faithfully as possible, and is widely used in voice communication, speech compression coding and speech recognition in noisy environments.
Because the human ear perceives sound through vibrations of the air, most existing speech enhancement algorithms target air-conducted speech, that is, speech collected by an air conduction sensor such as a microphone. Their performance is strongly affected by acoustic noise in the environment and is usually poor in noisy conditions. To reduce the impact of ambient noise on speech quality, non-air-conduction sensors such as throat microphones and bone conduction microphones are often used for speech acquisition in noisy environments. Unlike an air conduction sensor, a non-air-conduction sensor uses the vibration of the speaker's vocal cords, jaw bone and other parts to deform a reed or carbon film inside the sensor; the deformation changes its resistance and hence the voltage across it, converting the vibration into an electrical signal, namely the speech signal. Because sound waves travelling through the air cannot deform the reed or carbon film, the non-air-conduction sensor is unaffected by airborne sound and is highly robust to acoustic noise. However, the speech it collects has travelled through the jaw bone, muscle, skin and other tissue, so its high-frequency content is severely attenuated; the speech sounds muffled and indistinct, and intelligibility is poor.
Given the shortcomings of air conduction and non-air-conduction sensors when each is used alone, speech enhancement methods that combine the strengths of both have appeared in recent years. These methods exploit the complementarity of air conduction speech and non-air conduction speech and use multi-sensor fusion to achieve speech enhancement, often with better results than single-sensor systems. Existing dual-sensor speech enhancement follows two main approaches. The first recovers air conduction speech from the non-air conduction speech and then fuses it with the noisy air conduction speech. The second recovers air conduction speech from the non-air conduction speech, enhances the noisy air conduction speech using the signals of both sensors, and then fuses the two. These techniques have the following disadvantages: (1) When air conduction speech is recovered from non-air conduction speech, additional noise may be introduced in the high-frequency band or in silent segments, degrading the enhancement. (2) When air conduction speech is recovered from non-air conduction speech, the information in the current air conduction speech is not used. (3) When the air conduction speech recovered from non-air conduction speech is fused with the actual air conduction speech, the correlation between the two and the prior knowledge about them cannot be fully exploited. (4) Fusion generally assumes that the non-air conduction speech and the air conduction speech are mutually independent, which does not hold in practice.
Chinese patent 201610025390.7 discloses a dual-sensor speech enhancement method and apparatus based on statistical models. That invention first combines non-air conduction speech and air conduction speech to build a joint statistical model for classification and endpoint detection, computes the currently optimal air conduction speech filter from the joint statistical model, and filters the air conduction speech to enhance it. It then converts the non-air conduction speech into air conduction speech using a mapping model from non-air conduction speech to air conduction speech, and fuses the result with the filtered air conduction speech by weighting. This partially remedies the failure to fully exploit the correlation and prior knowledge when the two are fused. However, the second fusion step still uses air conduction speech recovered from non-air conduction speech, so the method still suffers from noise in the high-frequency band and in silent segments, and still cannot use the information in the air conduction speech when recovering air conduction speech from non-air conduction speech.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art by providing a dual-sensor speech enhancement method based on dual-channel Wiener filtering, and a device implementing it. Compared with the prior art, the method and device fuse the information contained in the air conduction speech and the non-air conduction speech more fully and introduce prior knowledge of the speech signals through a statistical model, which can effectively improve the performance of a speech enhancement system in noisy environments. The invention is widely applicable to video calls, in-vehicle telephones, multimedia classrooms, military communication and similar settings.
The first object of the invention is achieved by the following technical solution:
A dual-sensor speech enhancement method based on dual-channel Wiener filtering comprises the following steps:
S1, synchronously collecting clean air conduction training speech and non-air conduction training speech, establishing a dual-channel speech joint classification model of air conduction speech frames and non-air conduction speech frames, and calculating, for each class in the model, the air conduction speech power spectrum mean $\Phi_{ss}(\omega, l)$, the non-air conduction speech power spectrum mean $\Phi_{bb}(\omega, l)$, and the cross-spectrum mean $\Phi_{bs}(\omega, l)$ between the air conduction speech and the non-air conduction speech, where $\omega$ is the frequency and $l$ is the class index;
S2, synchronously collecting air conduction test speech and non-air conduction test speech, establishing a statistical model of the air conduction noise from the pure-noise segments of the air conduction test speech, and calculating the power spectrum mean $\Phi_{vv}(\omega)$ of the air conduction noise;
S3, classifying the synchronously input air conduction test speech frame and non-air conduction test speech frame using the statistical model of the air conduction noise and the dual-channel speech joint classification model of step S1;
S4, constructing a dual-channel Wiener filter according to the classification result of step S3 and the power spectrum mean $\Phi_{vv}(\omega)$ of the air conduction noise, and filtering the air conduction test speech frame and the non-air conduction test speech frame to obtain the enhanced air conduction speech.
Further, step S1 proceeds as follows:
S1.1, framing and preprocessing the synchronously collected clean air conduction training speech and non-air conduction training speech, and extracting the feature parameters of each frame, namely its Mel-frequency cepstral coefficients;
S1.2, training the dual-channel speech joint classification model using the clean air conduction speech features and non-air conduction speech features obtained in step S1.1;
S1.3, classifying all air conduction training speech frames and non-air conduction training speech frames with the trained dual-channel speech joint classification model, and then calculating, over the frames contained in each class, the air conduction speech power spectrum mean $\Phi_{ss}(\omega, l)$, the non-air conduction speech power spectrum mean $\Phi_{bb}(\omega, l)$, and the cross-spectrum mean $\Phi_{bs}(\omega, l)$ between the air conduction speech and the non-air conduction speech.
Further, in step S1.2, the dual-channel speech joint classification model is a multi-data-stream Gaussian mixture model (GMM), that is,
$$p\big(o_x(k), o_b(k)\big) = \sum_{l=1}^{L} c_l \left[N\big(o_x(k), \mu_l^x, \sigma_l^x\big)\right]^{w_x} \left[N\big(o_b(k), \mu_l^b, \sigma_l^b\big)\right]^{w_b}$$
where $N(o, \mu, \sigma)$ is a Gaussian function, $o_x(k)$ and $o_b(k)$ are the feature vectors extracted from the k-th frame of air conduction test speech and non-air conduction test speech, $\mu_l^x$ and $\mu_l^b$ are the means of the l-th Gaussian component of the air conduction speech data stream and the non-air conduction speech data stream in the multi-data-stream GMM, $\sigma_l^x$ and $\sigma_l^b$ are the variances of the l-th Gaussian component of the two data streams, $c_l$ is the weight of the l-th Gaussian component, $w_x$ and $w_b$ are the weights of the air conduction speech data stream and the non-air conduction speech data stream respectively, and $L$ is the number of Gaussian components.
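As a minimal sketch of how a multi-stream GMM of this form could be evaluated for one pair of synchronized frames, the Python below works in the log domain. It assumes diagonal covariances, so each variance is a vector of per-dimension values. The function names, and the normalization of the per-component terms into scores that sum to one, are illustrative choices and are not taken from the text.

```python
import numpy as np

def log_gauss_diag(o, mu, var):
    """Log density of a diagonal-covariance Gaussian for every component.
    o: (D,) feature vector; mu, var: (L, D). Returns an array of shape (L,)."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (o - mu) ** 2 / var, axis=1)

def class_log_scores(o_x, o_b, c, mu_x, var_x, mu_b, var_b, w_x=1.0, w_b=1.0):
    """Log of each mixture term: log c_l + w_x log N(o_x) + w_b log N(o_b)."""
    return (np.log(c)
            + w_x * log_gauss_diag(o_x, mu_x, var_x)
            + w_b * log_gauss_diag(o_b, mu_b, var_b))

def joint_log_likelihood(log_scores):
    """Log of the sum over components, computed with the log-sum-exp trick."""
    m = log_scores.max()
    return m + np.log(np.sum(np.exp(log_scores - m)))

def class_posteriors(log_scores):
    """Assumed normalization: per-component terms rescaled to sum to one."""
    p = np.exp(log_scores - log_scores.max())
    return p / p.sum()
```

Working with logarithms avoids numerical underflow when the feature dimension is large; the class with the largest entry of `class_log_scores` is the same whether or not the scores are normalized.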
Further, in step S1.3, each Gaussian component in the dual-channel speech joint classification model represents one class. For each pair of synchronous air conduction training speech frame and non-air conduction training speech frame, the score of each class is calculated using the following formula:
The current air conduction training speech frame and non-air conduction training speech frame belong to the class with the highest score. After the class of every air conduction training speech frame and non-air conduction training speech frame has been determined, the air conduction speech power spectrum mean $\Phi_{ss}(\omega, l)$, the non-air conduction speech power spectrum mean $\Phi_{bb}(\omega, l)$, and the cross-spectrum mean $\Phi_{bs}(\omega, l)$ between the air conduction speech and the non-air conduction speech are calculated over the frames contained in the same class.
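The per-class averaging described above can be sketched as follows. This is an illustration under stated assumptions: the frames are given as complex short-time spectra, the class labels have already been assigned, and the cross-spectrum is taken as the average of $B(\omega)S^*(\omega)$. The conjugation convention and the function name `class_spectral_means` are assumptions, not taken from the text.

```python
import numpy as np

def class_spectral_means(S, B, labels, num_classes):
    """Average power spectra and cross-spectra per class.
    S, B: complex spectra of shape (frames, bins) for the synchronized clean
    air conduction and non-air conduction training frames.
    labels: integer class index of each frame, shape (frames,).
    Returns phi_ss, phi_bb (real) and phi_bs (complex), each (num_classes, bins)."""
    n_bins = S.shape[1]
    phi_ss = np.zeros((num_classes, n_bins))
    phi_bb = np.zeros((num_classes, n_bins))
    phi_bs = np.zeros((num_classes, n_bins), dtype=complex)
    for l in range(num_classes):
        idx = labels == l
        if not idx.any():
            continue  # a class with no frames keeps all-zero statistics
        phi_ss[l] = np.mean(np.abs(S[idx]) ** 2, axis=0)
        phi_bb[l] = np.mean(np.abs(B[idx]) ** 2, axis=0)
        phi_bs[l] = np.mean(B[idx] * np.conj(S[idx]), axis=0)  # assumed convention E[B S*]
    return phi_ss, phi_bb, phi_bs
```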
Further, the statistical model of the air conduction noise is the power spectrum mean $\Phi_{vv}(\omega)$ of the air conduction noise, calculated as follows:
S2.1, synchronously collecting air conduction test speech and non-air conduction test speech and dividing them into frames;
S2.2, calculating the short-time average threshold-crossing rate $C_b$ of each non-air conduction test speech frame from its short-time autocorrelation function $R_b(m)$ and short-time energy $E_b$:
where $\mathrm{sgn}[\cdot]$ is the sign operation, the formula's remaining coefficient is an adjustment factor, $T$ is the initial threshold value and $M$ is the frame length. When $C_b$ is larger than a preset threshold, the frame is judged to be speech; otherwise it is judged to be noise. The endpoint positions of the non-air conduction test speech signal are obtained from the per-frame decisions;
S2.3, taking the times corresponding to the endpoints of the non-air conduction test speech signal detected in step S2.2 as the endpoints of the air conduction test speech, and extracting the pure-noise segments of the air conduction test speech;
S2.4, calculating the power spectrum mean $\Phi_{vv}(\omega)$ of the pure-noise segments of the air conduction test speech.
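Steps S2.1 to S2.4 can be illustrated with the sketch below. Because the threshold-crossing-rate formula is not reproduced in the text, the sketch swaps in a plain short-time energy threshold on the non-air conduction channel as the speech/noise detector; that detector, the Hamming window, and the names `frame_signal` and `noise_power_spectrum` are assumptions made for illustration only.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def noise_power_spectrum(air, non_air, frame_len, hop, energy_threshold):
    """Mean power spectrum of the air conduction frames judged to be pure noise.
    The speech/noise decision is made on the non-air conduction channel, which
    is largely immune to acoustic noise, and reused for the air conduction channel."""
    frames_air = frame_signal(air, frame_len, hop)
    frames_non_air = frame_signal(non_air, frame_len, hop)
    # Stand-in detector (assumption): short-time energy above a fixed threshold.
    is_speech = np.mean(frames_non_air ** 2, axis=1) > energy_threshold
    noise_frames = frames_air[~is_speech]
    if len(noise_frames) == 0:
        raise ValueError("no pure-noise frames found")
    spectra = np.fft.rfft(noise_frames * np.hamming(frame_len), axis=1)
    return np.mean(np.abs(spectra) ** 2, axis=0)
```

The design point carried over from the text is that endpoint decisions come from the non-air conduction channel and are then applied, frame for frame, to the synchronized air conduction channel.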
Further, in step S3, a vector Taylor series (VTS) model compensation technique is first used to correct the parameters of the air conduction speech data stream in the dual-channel speech joint classification model with the statistical model of the air conduction noise, and the input air conduction test speech frame and non-air conduction test speech frame are then classified. The mean of each Gaussian component of the air conduction speech data stream in the dual-channel speech joint classification model is corrected using the following formula:
where the two log-Mel terms are the means obtained by passing the power spectra of the clean air conduction training speech belonging to the l-th class and of the noise, respectively, through a 24-dimensional Mel filter bank and taking the logarithm, and $C$ is the DCT (discrete cosine transform) matrix. The other parameters of the dual-channel speech joint classification model are kept unchanged. The synchronously input air conduction test speech frame and non-air conduction test speech frame are classified with the corrected model to obtain the classification score $q(k, l)$ of the current frame pair for each class.
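The correction formula itself is not reproduced above. As a hedged illustration of the kind of compensation involved, the sketch below uses the common zero-order ("log-add") approximation, in which clean-speech and noise energies add in the Mel domain before the DCT. Whether the original formula is exactly this one is an assumption, as are the function names and the choice of DCT rows 1 to 12 (dropping the zeroth coefficient).

```python
import numpy as np

def dct_matrix(n_ceps, n_mels):
    """DCT-II rows 1..n_ceps mapping log-Mel energies to cepstral coefficients.
    The row selection and scaling are assumptions; they must match training."""
    k = np.arange(1, n_ceps + 1)[:, None]
    n = np.arange(n_mels)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_mels)

def compensate_means(log_mel_speech, log_mel_noise, n_ceps=12):
    """Noise-compensated cepstral means for every class.
    log_mel_speech: (L, n_mels) mean log-Mel energies of clean speech per class.
    log_mel_noise: (n_mels,) mean log-Mel energies of the air conduction noise.
    Returns an array of shape (L, n_ceps)."""
    C = dct_matrix(n_ceps, log_mel_speech.shape[1])
    # log(exp(speech) + exp(noise)), evaluated stably per Mel channel
    noisy_log_mel = np.logaddexp(log_mel_speech, log_mel_noise)
    return noisy_log_mel @ C.T
```

Only the air conduction stream means are modified here, in line with the statement that the other model parameters stay unchanged.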
Further, in step S4, for the air conduction test speech and the non-air conduction test speech acquired synchronously in the k-th frame, the enhanced air conduction speech spectrum is calculated using the following formula:
$$Y(\omega, k) = H_a(\omega, k)\, X(\omega, k) + H_{na}(\omega, k)\, B(\omega, k)$$
where $Y(\omega, k)$, $X(\omega, k)$ and $B(\omega, k)$ are the spectra of the k-th frame of enhanced air conduction speech, air conduction test speech and non-air conduction test speech respectively, and $H_a(\omega, k)$ and $H_{na}(\omega, k)$ are the frequency responses of the Wiener filters corresponding to the k-th frame of air conduction test speech and non-air conduction test speech, calculated respectively as
$$H_a(\omega, k) = \sum_{l=1}^{L} q(k, l)\, H_a(\omega, k, l), \qquad H_{na}(\omega, k) = \sum_{l=1}^{L} q(k, l)\, H_{na}(\omega, k, l)$$
where $q(k, l)$ is the classification score of the k-th frame of air conduction test speech and non-air conduction test speech for the l-th class of the dual-channel speech joint classification model, and $H_a(\omega, k, l)$ is the frequency response of the Wiener filter of the k-th frame of air conduction test speech corresponding to the l-th class, calculated as follows:
$H_{na}(\omega, k, l)$ is the frequency response of the Wiener filter of the k-th frame of non-air conduction test speech corresponding to the l-th class of the dual-channel speech joint classification model, calculated as follows:
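The per-class filter expressions are not reproduced above. One way to obtain filters of this kind is the standard minimum-mean-square-error derivation, sketched here under explicit assumptions: the air conduction observation is clean speech plus additive noise, the noise is uncorrelated with both the clean speech and the non-air conduction speech, the non-air conduction channel is treated as noise-free, and $\Phi_{bs}$ denotes the average of $B S^*$. Solving the two orthogonality conditions then gives $H_a = (\Phi_{ss}\Phi_{bb} - |\Phi_{bs}|^2)/D$ and $H_{na} = \Phi_{bs}^{*}\Phi_{vv}/D$ with $D = (\Phi_{ss} + \Phi_{vv})\Phi_{bb} - |\Phi_{bs}|^2$. These closed forms, the function names, and the assumption that the scores sum to one are illustrative and may differ from the original formulas.

```python
import numpy as np

def two_channel_wiener(phi_ss, phi_bb, phi_bs, phi_vv, eps=1e-12):
    """Per-class two-channel Wiener filters under the assumptions stated above.
    phi_ss, phi_bb: real arrays (L, F); phi_bs: complex array (L, F), taken as
    E[B S*]; phi_vv: real array (F,). Returns h_a, h_na, each of shape (L, F)."""
    cross_power = np.abs(phi_bs) ** 2
    det = (phi_ss + phi_vv) * phi_bb - cross_power
    det = np.maximum(det, eps)  # guard against division by zero
    h_a = (phi_ss * phi_bb - cross_power) / det
    h_na = np.conj(phi_bs) * phi_vv / det
    return h_a, h_na

def enhance_frame(X, B, q, h_a, h_na):
    """Combine the per-class filters with class scores q (assumed to sum to 1)
    and filter one frame. X, B: complex spectra (F,); q: (L,). Returns Y (F,)."""
    H_a = q @ h_a
    H_na = q @ h_na
    return H_a * X + H_na * B
```

Two limits give a quick sanity check: with no noise the air conduction channel passes unchanged and the non-air conduction channel is ignored, and with zero cross-spectrum the air conduction filter reduces to the familiar single-channel Wiener gain $\Phi_{ss}/(\Phi_{ss} + \Phi_{vv})$.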
The other object of the invention is achieved by the following technical solution:
A device implementing the dual-sensor speech enhancement method based on dual-channel Wiener filtering comprises an air conduction speech sensor, a non-air conduction speech sensor, a noise model estimation module, a dual-channel speech joint classification model, a model compensation module, a frame classification module, a filter coefficient generation module and a dual-channel filter, wherein:
the air conduction speech sensor and the non-air conduction speech sensor are each connected to the noise model estimation module, the frame classification module and the dual-channel filter; the dual-channel speech joint classification model, the model compensation module, the frame classification module, the filter coefficient generation module and the dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel speech joint classification model is connected to the filter coefficient generation module;
the air conduction speech sensor and the non-air conduction speech sensor collect the air conduction speech signal and the non-air conduction speech signal respectively; the noise model estimation module estimates the model and power spectrum of the current air conduction noise; the dual-channel speech joint classification model is built from synchronously collected clean air conduction training speech and non-air conduction training speech to classify air conduction speech frames and non-air conduction speech frames, and holds, for each class, the air conduction speech power spectrum mean $\Phi_{ss}(\omega, l)$, the non-air conduction speech power spectrum mean $\Phi_{bb}(\omega, l)$, and the cross-spectrum mean $\Phi_{bs}(\omega, l)$ between the air conduction speech and the non-air conduction speech; the model compensation module corrects the parameters of the dual-channel speech joint classification model using the statistical model of the air conduction noise; the frame classification module classifies the currently synchronously input air conduction test speech frame and non-air conduction test speech frame; the filter coefficient generation module constructs the dual-channel Wiener filter according to the classification result and the power spectrum of the air conduction noise; and the dual-channel filter filters the air conduction test speech frame and the non-air conduction test speech frame to obtain the enhanced air conduction speech.
Further, the air conduction speech sensor is a microphone, and the non-air conduction speech sensor is a throat microphone.
Compared with the prior art, the invention has the following advantages and effects:
(1) Compared with speech enhancement techniques based only on air conduction test speech or only on non-air conduction test speech, the invention uses the information in both simultaneously during enhancement and can achieve a better enhancement effect.
(2) The invention uses the dual-channel speech joint classification model to fuse the information in the air conduction test speech and the non-air conduction test speech, which makes frame classification more accurate and fully exploits the correlation between the two and the prior knowledge about them.
(3) Compared with Chinese patent 201610025390.7, recovering the air conduction speech with the dual-channel Wiener filter is computationally simpler, avoids the high-frequency and silent-segment noise and the failure to use air conduction speech information that arise when air conduction speech is recovered from non-air conduction speech, and performs better.
(4) By recovering the air conduction speech with the dual-channel Wiener filter, the invention avoids the assumption that the non-air conduction speech and the air conduction speech are mutually independent.
Drawings
FIG. 1 is a block diagram of the device implementing the dual-sensor speech enhancement method based on dual-channel Wiener filtering disclosed in the embodiments of the present invention;
FIG. 2 is a flowchart of the dual-sensor speech enhancement method based on dual-channel Wiener filtering disclosed in the embodiments of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
This embodiment discloses a device implementing the dual-sensor speech enhancement method based on dual-channel Wiener filtering. As shown in FIG. 1, the device comprises an air conduction speech sensor, a non-air conduction speech sensor, a noise model estimation module, a dual-channel speech joint classification model, a model compensation module, a frame classification module, a filter coefficient generation module and a dual-channel filter. The air conduction speech sensor and the non-air conduction speech sensor are each connected to the noise model estimation module, the frame classification module and the dual-channel filter. The dual-channel speech joint classification model, the model compensation module, the frame classification module, the filter coefficient generation module and the dual-channel filter are connected in sequence. The noise model estimation module is connected to the model compensation module and the filter coefficient generation module, and the dual-channel speech joint classification model is connected to the filter coefficient generation module.
In this embodiment, the air conduction speech sensor is a microphone and the non-air conduction speech sensor is a throat microphone; they collect the air conduction speech signal and the non-air conduction speech signal respectively. The noise model estimation module estimates the model and power spectrum of the current air conduction noise. The dual-channel speech joint classification model is built from synchronously collected clean air conduction training speech and non-air conduction training speech to classify air conduction speech frames and non-air conduction speech frames, and holds, for each class, the air conduction speech power spectrum mean $\Phi_{ss}(\omega, l)$, the non-air conduction speech power spectrum mean $\Phi_{bb}(\omega, l)$, and the cross-spectrum mean $\Phi_{bs}(\omega, l)$ between the air conduction speech and the non-air conduction speech. The model compensation module corrects the parameters of the dual-channel speech joint classification model using the statistical model of the air conduction noise. The frame classification module classifies the currently synchronously input air conduction test speech frame and non-air conduction test speech frame. The filter coefficient generation module constructs the dual-channel Wiener filter according to the classification result and the power spectrum of the air conduction noise. The dual-channel filter filters the air conduction test speech frame and the non-air conduction test speech frame to obtain the enhanced air conduction speech.
Example two
This embodiment discloses a dual-sensor speech enhancement method based on dual-channel Wiener filtering. Using the implementation device disclosed in the preceding embodiment, the enhanced air conduction speech is calculated from the input air conduction test speech and non-air-conduction test speech by the following steps; the flow is shown in fig. 2:
Step S1: Synchronously collect clean air conduction training speech and non-air-conduction training speech, establish a dual-channel speech joint classification model of air conduction speech frames and non-air-conduction speech frames, and calculate, for each classification in the model, the air conduction speech power spectrum mean Φ_ss(ω, l), the non-air-conduction speech power spectrum mean Φ_bb(ω, l), and the cross-spectrum mean Φ_bs(ω, l) between the air conduction speech and the non-air-conduction speech, where ω is the frequency and l is the index of the classification.
The following steps are adopted in the embodiment to complete the process:
S1.1: Frame and preprocess the synchronously collected clean air conduction training speech and non-air-conduction training speech, and extract the feature parameters of each speech frame.
In this embodiment, the synchronously acquired clean air conduction training speech and non-air-conduction training speech are divided into frames with a frame length of 30 ms and a frame shift of 10 ms. Each frame of each signal is windowed with a Hamming window and pre-emphasized, and its power spectrum is then computed. The power spectra of the air conduction training speech and the non-air-conduction training speech are each passed through a 24-filter Mel filter bank; the logarithm of the filter bank output is taken and a DCT is applied, yielding two sets of 12-dimensional Mel-frequency cepstral coefficients (MFCCs) that serve as the training features of the dual-channel speech joint classification model.
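The feature extraction in S1.1 can be sketched roughly as follows. This is a minimal illustration rather than the patent's implementation: the 8 kHz sampling rate, the pre-emphasis coefficient 0.97, applying pre-emphasis before the window, the triangular filter shape, and skipping cepstral coefficient 0 are all assumptions, and every function name is hypothetical. A plain DFT is used so that the sketch needs only the standard library.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def frame_signal(x, frame_len, frame_shift):
    """Split x into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, frame_shift)]

def power_spectrum(frame, preemph=0.97):
    """Pre-emphasise, apply a Hamming window, return |DFT|^2 for bins 0..N/2."""
    n = len(frame)
    emphasised = [frame[0]] + [frame[i] - preemph * frame[i - 1] for i in range(1, n)]
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(emphasised)]
    spec = []
    for k in range(n // 2 + 1):  # O(N^2) DFT, acceptable for a short frame
        acc = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                  for i, w in enumerate(windowed))
        spec.append(abs(acc) ** 2)
    return spec

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters spaced evenly on the Mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = (sample_rate / 2.0) / (n_bins - 1)
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for b in range(n_bins):
            f = b * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, bank, n_ceps=12):
    """Log Mel filter bank energies followed by a DCT-II (coefficients 1..n_ceps)."""
    spec = power_spectrum(frame)
    log_mel = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    n = len(log_mel)
    return [sum(v * math.cos(math.pi * c * (j + 0.5) / n) for j, v in enumerate(log_mel))
            for c in range(1, n_ceps + 1)]

# Usage on a synthetic tone; 30 ms frames and 10 ms shift at the assumed 8 kHz rate.
sample_rate = 8000
frame_len, frame_shift = 240, 80
signal = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(800)]
frames = frame_signal(signal, frame_len, frame_shift)
bank = mel_filterbank(24, frame_len // 2 + 1, sample_rate)
features = [mfcc(f, bank) for f in frames]
```

The same routine would be run separately on the air conduction and the non-air-conduction channel to obtain the two 12-dimensional feature streams.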
S1.2: Train the dual-channel speech joint classification model with the clean air conduction and non-air-conduction speech features obtained in step S1.1. In this embodiment, the dual-channel speech joint classification model is a multi-stream GMM.
In the model, N(o, μ, σ) is a Gaussian function; o_x(k) and o_b(k) are the feature vectors extracted from the k-th frame of air conduction test speech and non-air-conduction test speech; μ_l^x and μ_l^b are the means of the l-th Gaussian component of the air conduction speech data stream and the non-air-conduction speech data stream in the multi-stream GMM; σ_l^x and σ_l^b are the variances of the l-th Gaussian component of the two data streams; c_l is the weight of the l-th Gaussian component in the multi-stream GMM; w_x and w_b are the weights of the air conduction speech data stream and the non-air-conduction speech data stream, respectively; and L is the number of Gaussian components.
The parameters c_l, w_x, w_b and the component means and variances of the dual-channel speech joint classification model are estimated with the Expectation-Maximization (EM) algorithm.
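As an illustration of how such a model can be evaluated, the sketch below uses the common multi-stream form in which component l contributes c_l · N(o_x)^w_x · N(o_b)^w_b, computed in the log domain. That form, the diagonal covariances, the dictionary layout, and all function names are assumptions made for this example; the source does not reproduce the exact formula.

```python
import math

def log_gauss_diag(o, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(o, mean, var))

def component_log_scores(o_x, o_b, model, w_x, w_b):
    """log(c_l * N_x^w_x * N_b^w_b) for every Gaussian component l."""
    return [math.log(g["c"])
            + w_x * log_gauss_diag(o_x, g["mean_x"], g["var_x"])
            + w_b * log_gauss_diag(o_b, g["mean_b"], g["var_b"])
            for g in model]

def mixture_log_likelihood(o_x, o_b, model, w_x, w_b):
    """Log of the summed component scores, using the log-sum-exp trick."""
    logs = component_log_scores(o_x, o_b, model, w_x, w_b)
    peak = max(logs)
    return peak + math.log(sum(math.exp(v - peak) for v in logs))

# Toy two-component model with made-up parameters.
model = [
    {"c": 0.5, "mean_x": [0.0, 0.0], "var_x": [1.0, 1.0], "mean_b": [0.0], "var_b": [1.0]},
    {"c": 0.5, "mean_x": [3.0, 3.0], "var_x": [1.0, 1.0], "mean_b": [3.0], "var_b": [1.0]},
]
logs = component_log_scores([0.1, -0.2], [0.0], model, w_x=0.5, w_b=0.5)
best = max(range(len(logs)), key=logs.__getitem__)
```

Working with logarithms avoids underflow when the feature vectors have many dimensions; an EM trainer would call `component_log_scores` in its E-step.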
S1.3: Classify all air conduction training speech frames and non-air-conduction training speech frames with the trained dual-channel speech joint classification model, and then, over the frames contained in each classification, calculate the air conduction speech power spectrum mean Φ_ss(ω, l), the non-air-conduction speech power spectrum mean Φ_bb(ω, l), and the cross-spectrum mean Φ_bs(ω, l) between the air conduction speech and the non-air-conduction speech.
In this embodiment, each Gaussian component of the dual-channel speech joint classification model represents one classification. For each pair of synchronous air conduction and non-air-conduction training speech frames, the score of each classification is calculated, and the frame pair is assigned to the classification with the highest score.
After the classification of all air conduction and non-air-conduction training speech frames has been determined, the air conduction speech power spectrum mean Φ_ss(ω, l), the non-air-conduction speech power spectrum mean Φ_bb(ω, l), and the cross-spectrum mean Φ_bs(ω, l) are calculated over the frames that belong to the same classification.
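The per-class averaging in S1.3 can be sketched as below. The function takes the class label of every frame pair as an input (for instance, the index of the highest-scoring component). The cross-spectrum convention Φ_bs = mean of B·conj(S) is an assumption, and the function and variable names are hypothetical.

```python
def class_spectral_means(S, B, labels, n_classes):
    """Average |S|^2, |B|^2 and B*conj(S) over the frames assigned to each class.

    S, B   : per-frame complex spectra of the air conduction and
             non-air-conduction channel (lists of equal-length lists)
    labels : class index of every frame pair
    """
    n_bins = len(S[0])
    phi_ss = [[0.0] * n_bins for _ in range(n_classes)]
    phi_bb = [[0.0] * n_bins for _ in range(n_classes)]
    phi_bs = [[0j] * n_bins for _ in range(n_classes)]
    counts = [0] * n_classes
    for s, b, l in zip(S, B, labels):
        counts[l] += 1
        for w in range(n_bins):
            phi_ss[l][w] += abs(s[w]) ** 2
            phi_bb[l][w] += abs(b[w]) ** 2
            phi_bs[l][w] += b[w] * s[w].conjugate()
    for l in range(n_classes):
        if counts[l]:  # classes without frames keep all-zero statistics
            phi_ss[l] = [v / counts[l] for v in phi_ss[l]]
            phi_bb[l] = [v / counts[l] for v in phi_bb[l]]
            phi_bs[l] = [v / counts[l] for v in phi_bs[l]]
    return phi_ss, phi_bb, phi_bs

# Two frames with two frequency bins each, both assigned to class 0.
S = [[1 + 0j, 2j], [3 + 0j, 0j]]
B = [[1j, 1 + 0j], [1 + 0j, 1 + 0j]]
phi_ss, phi_bb, phi_bs = class_spectral_means(S, B, [0, 0], n_classes=1)
```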
Step S2: Synchronously collect air conduction test speech and non-air-conduction test speech, establish a statistical model of the air conduction noise from the pure noise segments of the air conduction test speech, and calculate the power spectrum mean Φ_vv(ω) of the air conduction noise.
In this embodiment, the statistical model of the air conduction noise is the power spectrum mean Φ_vv(ω) of the air conduction noise, calculated as follows:
S2.1: Synchronously acquire the air conduction test speech and the non-air-conduction test speech and divide them into frames;
S2.2: From the short-time autocorrelation function R_b(m) and the short-time energy E_b of each non-air-conduction test speech frame, calculate the short-time average threshold-crossing rate C_b of that frame.
Here sgn[·] is the sign operation, T is the initial threshold value, M is the frame length, and the formula further contains an adjustment factor. When C_b is greater than a preset threshold, the frame is judged to be a speech signal; otherwise it is judged to be noise. The endpoint positions of the non-air-conduction test speech signal are obtained from the per-frame decisions;
S2.3: Take the times corresponding to the endpoints of the non-air-conduction test speech signal detected in step S2.2 as the endpoints of the air conduction test speech, and extract the pure noise segments of the air conduction test speech;
S2.4: Calculate the power spectrum mean Φ_vv(ω) of the pure noise segment signal in the air conduction test speech.
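Steps S2.3 and S2.4 amount to averaging the air conduction power spectra over the frames that the non-air-conduction channel marks as non-speech. A minimal sketch, assuming the per-frame speech/noise decisions from S2.2 are already available as booleans (the threshold-crossing detector itself is not reproduced here, and the function name is hypothetical):

```python
def noise_power_spectrum(air_power_frames, is_speech):
    """Mean power spectrum of the air conduction frames flagged as pure noise.

    air_power_frames : per-frame power spectra of the air conduction channel
    is_speech        : per-frame decision derived from the non-air-conduction
                       channel (True = speech, False = noise)
    """
    noise = [p for p, speech in zip(air_power_frames, is_speech) if not speech]
    if not noise:
        raise ValueError("no pure-noise frames available")
    n_bins = len(noise[0])
    return [sum(frame[w] for frame in noise) / len(noise) for w in range(n_bins)]

# Three frames, two bins; the middle frame contains speech and is skipped.
phi_vv = noise_power_spectrum([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                              [False, True, False])
```

Because the throat microphone is largely insensitive to ambient noise, its speech/noise decision is a convenient mask for the air conduction channel even at low signal-to-noise ratios.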
The statistical model of the air conduction noise may also be a Gaussian function, a GMM, or an HMM.
Step S3: Classify the synchronously input air conduction test speech frames and non-air-conduction test speech frames using the statistical model of the air conduction noise and the dual-channel speech joint classification model from step S1.
In this embodiment, a vector Taylor series (VTS) model compensation technique is applied first: the statistical model of the air conduction noise is used to correct the parameters of the air conduction speech data stream in the dual-channel speech joint classification model, and the input air conduction test speech frame and non-air-conduction test speech frame are then classified. Specifically, the mean of each Gaussian component of the air conduction speech data stream in the model is corrected by a compensation formula.
The formula uses the means obtained by passing the power spectra of the clean air conduction training speech belonging to the l-th class and of the noise, respectively, through the 24-filter Mel filter bank and taking the logarithm; C is the discrete cosine transform (DCT) matrix. All other parameters of the dual-channel speech joint classification model remain unchanged. The corrected model is then used to classify the synchronously input air conduction test speech frame and non-air-conduction test speech frame, giving the classification score q(k, l) of the current frame pair for each classification.
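One common zero-order form of this kind of compensation adds speech and noise in the linear Mel domain and maps the result back to cepstra: corrected mean = C · log(exp(speech log-Mel mean) + exp(noise log-Mel mean)). The sketch below implements that form as an assumption; the source does not reproduce the patent's exact expression, and the function names and the unnormalised DCT-II without row 0 are illustrative choices.

```python
import math

def dct_matrix(n_ceps, n_filters):
    """Rows 1..n_ceps of an unnormalised DCT-II (row 0 is omitted)."""
    return [[math.cos(math.pi * c * (j + 0.5) / n_filters) for j in range(n_filters)]
            for c in range(1, n_ceps + 1)]

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def compensate_cepstral_mean(log_mel_speech, log_mel_noise, C):
    """Noise-compensated cepstral mean of one Gaussian component.

    log_mel_speech : mean log Mel filter bank output of clean speech in the class
    log_mel_noise  : mean log Mel filter bank output of the noise
    C              : DCT matrix mapping log Mel energies to cepstra
    """
    noisy = [log_add(s, v) for s, v in zip(log_mel_speech, log_mel_noise)]
    return [sum(c * y for c, y in zip(row, noisy)) for row in C]

C = dct_matrix(12, 24)
speech = [float(j % 5) for j in range(24)]   # made-up log Mel means
quiet = [-100.0] * 24                        # negligible noise
compensated = compensate_cepstral_mean(speech, quiet, C)
clean = [sum(c * s for c, s in zip(row, speech)) for row in C]
```

With negligible noise the compensated mean coincides with the clean cepstral mean, which is a quick sanity check for any implementation of this step.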
Step S4: Construct a dual-channel Wiener filter from the classification result of step S3 and Φ_vv(ω), and filter the air conduction test speech frame and the non-air-conduction test speech frame to obtain the enhanced air conduction speech.
In this embodiment, the spectrum of the enhanced air conduction speech is calculated for the k-th frame of synchronously acquired air conduction test speech and non-air-conduction test speech.
Here Y(ω, k), X(ω, k) and B(ω, k) are the spectra of the enhanced air conduction speech, the air conduction test speech and the non-air-conduction test speech of the k-th frame, respectively. The two filter terms are the frequency responses of the Wiener filters applied to the k-th frame of air conduction test speech and of non-air-conduction test speech, and each is calculated from the class-specific responses below.
q(k, l) is the classification score of the k-th frame of air conduction test speech and non-air-conduction test speech for the l-th class of the dual-channel speech joint classification model. H_a(ω, k, l) is the frequency response of the Wiener filter for the k-th frame of air conduction test speech corresponding to the l-th class of the model.
H_na(ω, k, l) is the frequency response of the Wiener filter for the k-th frame of non-air-conduction test speech corresponding to the l-th class of the model. Each of the two class-specific responses is calculated by its own formula.
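For orientation, a textbook two-channel Wiener solution can be written with the quantities defined above. Assume X = S + V, with the air conduction noise V uncorrelated with both the clean speech S and the non-air-conduction signal B, and take Φ_bs as the mean of B·conj(S). Minimising the mean squared error of H_a·X + H_na·B with respect to S then gives H_a = (Φ_ss·Φ_bb − |Φ_bs|²) / D and H_na = Φ_vv·conj(Φ_bs) / D, with D = (Φ_ss + Φ_vv)·Φ_bb − |Φ_bs|². The sketch below implements this derivation and a score-weighted combination over classes; it is not claimed to match the patent's formulas term by term, and the names, the flooring of D, and the assumption that the scores q sum to one are illustrative.

```python
def class_wiener(phi_ss, phi_bb, phi_bs, phi_vv, floor=1e-12):
    """Per-bin two-channel Wiener responses (H_a, H_na) for one class."""
    h_a, h_na = [], []
    for ss, bb, bs, vv in zip(phi_ss, phi_bb, phi_bs, phi_vv):
        cross = abs(bs) ** 2
        det = max((ss + vv) * bb - cross, floor)  # floor guards against division by zero
        h_a.append((ss * bb - cross) / det)
        h_na.append(vv * complex(bs).conjugate() / det)
    return h_a, h_na

def enhance_frame(X, B, q, classes, phi_vv):
    """Score-weighted sum over classes of H_a * X + H_na * B.

    X, B    : complex spectra of the air conduction and non-air-conduction frame
    q       : classification scores of this frame, one per class
    classes : list of (phi_ss, phi_bb, phi_bs) tuples, one per class
    phi_vv  : noise power spectrum mean
    """
    Y = [0j] * len(X)
    for weight, (phi_ss, phi_bb, phi_bs) in zip(q, classes):
        h_a, h_na = class_wiener(phi_ss, phi_bb, phi_bs, phi_vv)
        for w in range(len(X)):
            Y[w] += weight * (h_a[w] * X[w] + h_na[w] * B[w])
    return Y

# Without noise the air conduction spectrum passes through unchanged.
stats = [([2.0], [1.0], [0.5 + 0.5j])]
Y = enhance_frame([1 + 1j], [2 + 0j], [1.0], stats, [0.0])
```

Two limiting cases illustrate the behaviour: with Φ_vv = 0 the filter reduces to H_a = 1 and H_na = 0, and when B is fully coherent with S the output is taken entirely from the non-air-conduction channel.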
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to them. Any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent and is intended to fall within the scope of protection of the present invention.
Claims (10)
1. A dual-sensor speech enhancement method based on dual-channel Wiener filtering, characterized by comprising the following steps:
S1, synchronously collecting clean air conduction training speech and non-air-conduction training speech, establishing a dual-channel speech joint classification model of air conduction speech frames and non-air-conduction speech frames, and calculating, for each classification in the dual-channel speech joint classification model, the air conduction speech power spectrum mean Φ_ss(ω, l), the non-air-conduction speech power spectrum mean Φ_bb(ω, l), and the cross-spectrum mean Φ_bs(ω, l) between the air conduction speech and the non-air-conduction speech, where ω is the frequency and l is the index of the classification;
S2, synchronously collecting air conduction test speech and non-air-conduction test speech, establishing a statistical model of the air conduction noise from the pure noise segments of the air conduction test speech, and calculating the power spectrum mean Φ_vv(ω) of the air conduction noise;
S3, classifying the synchronously input air conduction test speech frames and non-air-conduction test speech frames using the statistical model of the air conduction noise and the dual-channel speech joint classification model of step S1;
S4, constructing a dual-channel Wiener filter from the classification result of step S3 and the power spectrum mean Φ_vv(ω), and filtering the air conduction test speech frames and the non-air-conduction test speech frames to obtain the enhanced air conduction speech.
2. The dual-sensor speech enhancement method of claim 1, wherein step S1 is performed as follows:
S1.1, framing and preprocessing the synchronously collected clean air conduction training speech and non-air-conduction training speech, and extracting the feature parameters of each speech frame, wherein the feature parameters are Mel-frequency cepstral coefficients;
S1.2, training the dual-channel speech joint classification model with the clean air conduction and non-air-conduction speech features obtained in step S1.1;
S1.3, classifying all air conduction training speech frames and non-air-conduction training speech frames with the trained dual-channel speech joint classification model, and then calculating, over the frames contained in each classification, the air conduction speech power spectrum mean Φ_ss(ω, l), the non-air-conduction speech power spectrum mean Φ_bb(ω, l), and the cross-spectrum mean Φ_bs(ω, l) between the air conduction speech and the non-air-conduction speech.
3. The dual-sensor speech enhancement method of claim 2, wherein in step S1.2 the dual-channel speech joint classification model is a multi-stream GMM, GMM denoting a Gaussian mixture model,
in which N(o, μ, σ) is a Gaussian function; o_x(k) and o_b(k) are the feature vectors extracted from the k-th frame of air conduction test speech and non-air-conduction test speech; μ_l^x and μ_l^b are the means of the l-th Gaussian component of the air conduction speech data stream and the non-air-conduction speech data stream in the multi-stream GMM; σ_l^x and σ_l^b are the variances of the l-th Gaussian component of the two data streams; c_l is the weight of the l-th Gaussian component in the multi-stream GMM; w_x and w_b are the weights of the air conduction speech data stream and the non-air-conduction speech data stream, respectively; and L is the number of Gaussian components.
4. The dual-sensor speech enhancement method of claim 3, wherein in step S1.3 each Gaussian component of the dual-channel speech joint classification model represents one classification, and for each pair of synchronous air conduction and non-air-conduction training speech frames the score of each classification is calculated,
wherein the current air conduction training speech frame and non-air-conduction training speech frame belong to the classification with the highest score; after the classification of all air conduction and non-air-conduction training speech frames has been determined, the air conduction speech power spectrum mean Φ_ss(ω, l), the non-air-conduction speech power spectrum mean Φ_bb(ω, l), and the cross-spectrum mean Φ_bs(ω, l) between the air conduction speech and the non-air-conduction speech are calculated over the frames contained in the same classification.
5. The dual-sensor speech enhancement method of claim 1, wherein the statistical model of the air conduction noise is the power spectrum mean Φ_vv(ω) of the air conduction noise, calculated as follows:
S2.1, synchronously acquiring the air conduction test speech and the non-air-conduction test speech and dividing them into frames;
S2.2, calculating the short-time average threshold-crossing rate C_b of each non-air-conduction test speech frame from its short-time autocorrelation function R_b(m) and short-time energy E_b,
where sgn[·] is the sign operation, T is the initial threshold value, M is the frame length, and the formula further contains an adjustment factor; when C_b is greater than a preset threshold the frame is judged to be a speech signal, otherwise it is judged to be noise, and the endpoint positions of the non-air-conduction test speech signal are obtained from the per-frame decisions;
S2.3, taking the times corresponding to the endpoints of the non-air-conduction test speech signal detected in step S2.2 as the endpoints of the air conduction test speech, and extracting the pure noise segments of the air conduction test speech;
S2.4, calculating the power spectrum mean Φ_vv(ω) of the pure noise segment signal in the air conduction test speech.
6. The dual-sensor speech enhancement method of claim 1, wherein in step S3 a vector Taylor series model compensation technique is first applied, in which the statistical model of the air conduction noise is used to correct the parameters of the air conduction speech data stream in the dual-channel speech joint classification model, and the input air conduction test speech frame and non-air-conduction test speech frame are then classified, the mean of each Gaussian component of the air conduction speech data stream in the dual-channel speech joint classification model being corrected by a compensation formula,
wherein the formula uses the means obtained by passing the power spectra of the clean air conduction training speech belonging to the l-th class and of the noise, respectively, through a 24-filter Mel filter bank and taking the logarithm, C is a discrete cosine transform matrix, all other parameters of the dual-channel speech joint classification model remain unchanged, and the corrected dual-channel speech joint classification model is used to classify the synchronously input air conduction test speech frame and non-air-conduction test speech frame, giving the classification score q(k, l) of the current frame pair for each classification.
7. The dual-sensor speech enhancement method of claim 2, wherein in step S4, for the k-th frame of synchronously acquired air conduction test speech and non-air-conduction test speech, the spectrum of the enhanced air conduction speech is calculated by a filtering formula,
wherein Y(ω, k), X(ω, k) and B(ω, k) are the spectra of the enhanced air conduction speech, the air conduction test speech and the non-air-conduction test speech of the k-th frame, respectively, and the two filter terms are the frequency responses of the Wiener filters applied to the k-th frame of air conduction test speech and of non-air-conduction test speech, each calculated from class-specific responses,
where q(k, l) is the classification score of the k-th frame of air conduction test speech and non-air-conduction test speech for the l-th class of the dual-channel speech joint classification model, and H_a(ω, k, l) is the frequency response of the Wiener filter for the k-th frame of air conduction test speech corresponding to the l-th class of the dual-channel speech joint classification model;
H_na(ω, k, l) is the frequency response of the Wiener filter for the k-th frame of non-air-conduction test speech corresponding to the l-th class of the dual-channel speech joint classification model; each of the two class-specific responses is calculated by its own formula.
9. An implementation device for a dual-sensor speech enhancement method based on dual-channel Wiener filtering, characterized by comprising an air conduction speech sensor, a non-air-conduction speech sensor, a noise model estimation module, a dual-channel speech joint classification model, a model compensation module, a frame classification module, a filter coefficient generation module and a dual-channel filter, wherein
the air conduction speech sensor and the non-air-conduction speech sensor are each connected to the noise model estimation module, the frame classification module and the dual-channel filter; the dual-channel speech joint classification model, the model compensation module, the frame classification module, the filter coefficient generation module and the dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel speech joint classification model is connected to the filter coefficient generation module;
the air conduction speech sensor and the non-air-conduction speech sensor are used for collecting the air conduction speech signal and the non-air-conduction speech signal, respectively; the noise model estimation module is used for estimating the model and the power spectrum of the current air conduction noise; the dual-channel speech joint classification model is a joint classification model of air conduction speech frames and non-air-conduction speech frames established from synchronously collected clean air conduction training speech and non-air-conduction training speech, in which, for each classification, the air conduction speech power spectrum mean is Φ_ss(ω, l), the non-air-conduction speech power spectrum mean is Φ_bb(ω, l), and the cross-spectrum mean between the air conduction speech and the non-air-conduction speech is Φ_bs(ω, l); the model compensation module corrects the parameters of the dual-channel speech joint classification model using the statistical model of the air conduction noise; the frame classification module classifies the currently and synchronously input air conduction test speech frame and non-air-conduction test speech frame; the filter coefficient generation module constructs a dual-channel Wiener filter from the classification result and the power spectrum of the air conduction noise; and the dual-channel filter filters the air conduction test speech frame and the non-air-conduction test speech frame to obtain the enhanced air conduction speech.
10. The apparatus for implementing a dual-sensor speech enhancement method according to claim 9, wherein said air conduction speech sensor is a microphone and said non-air conduction speech sensor is a throat microphone.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910678398.7A CN110390945B (en) | 2019-07-25 | 2019-07-25 | Dual-sensor voice enhancement method and implementation device |
PCT/CN2019/110290 WO2021012403A1 (en) | 2019-07-25 | 2019-10-10 | Dual sensor speech enhancement method and implementation device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910678398.7A CN110390945B (en) | 2019-07-25 | 2019-07-25 | Dual-sensor voice enhancement method and implementation device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110390945A (en) | 2019-10-29 |
CN110390945B (en) | 2021-09-21 |
Family
ID=68287587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910678398.7A Expired - Fee Related CN110390945B (en) | 2019-07-25 | 2019-07-25 | Dual-sensor voice enhancement method and implementation device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110390945B (en) |
WO (1) | WO2021012403A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111009253B (en) * | 2019-11-29 | 2022-10-21 | 联想(北京)有限公司 | Data processing method and device |
CN111524531A (en) * | 2020-04-23 | 2020-08-11 | 广州清音智能科技有限公司 | Method for real-time noise reduction of high-quality two-channel video voice |
CN116470959A (en) * | 2022-07-12 | 2023-07-21 | 苏州旭创科技有限公司 | Filter implementation method, noise suppression method, device and computer equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004279768A (en) * | 2003-03-17 | 2004-10-07 | Mitsubishi Heavy Ind Ltd | Device and method for estimating air-conducted sound |
CN203165457U (en) * | 2013-03-08 | 2013-08-28 | 华南理工大学 | Voice acquisition device used for noisy environment |
CN106328156A (en) * | 2016-08-22 | 2017-01-11 | 华南理工大学 | Microphone array voice reinforcing system and microphone array voice reinforcing method with combination of audio information and video information |
WO2018229503A1 (en) * | 2017-06-16 | 2018-12-20 | Cirrus Logic International Semiconductor Limited | Earbud speech estimation |
CN110010143A (en) * | 2019-04-19 | 2019-07-12 | 出门问问信息科技有限公司 | A kind of voice signals enhancement system, method and storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9711127B2 (en) * | 2011-09-19 | 2017-07-18 | Bitwave Pte Ltd. | Multi-sensor signal optimization for speech communication |
CN103208291A (en) * | 2013-03-08 | 2013-07-17 | 华南理工大学 | Speech enhancement method and device applicable to strong noise environments |
CN105513605B (en) * | 2015-12-01 | 2019-07-02 | 南京师范大学 | The speech-enhancement system and sound enhancement method of mobile microphone |
CN110070883B (en) * | 2016-01-14 | 2023-07-28 | 深圳市韶音科技有限公司 | Speech enhancement method |
JP2018063400A (en) * | 2016-10-14 | 2018-04-19 | 富士通株式会社 | Audio processing apparatus and audio processing program |
CN107886967B (en) * | 2017-11-18 | 2018-11-13 | 中国人民解放军陆军工程大学 | A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network |
CN108986834B (en) * | 2018-08-22 | 2023-04-07 | 中国人民解放军陆军工程大学 | Bone conduction voice blind enhancement method based on codec framework and recurrent neural network |
Also Published As
Publication number | Publication date |
---|---|
WO2021012403A1 (en) | 2021-01-28 |
CN110390945A (en) | 2019-10-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110390945B (en) | Dual-sensor voice enhancement method and implementation device | |
CN109273021B (en) | RNN-based real-time conference noise reduction method and device | |
TWI763073B (en) | Deep learning based noise reduction method using both bone-conduction sensor and microphone signals | |
CN110070880B (en) | Establishment method and application method of combined statistical model for classification | |
CN110197665B (en) | Voice separation and tracking method for public security criminal investigation monitoring | |
JP2003255993A (en) | System, method, and program for speech recognition, and system, method, and program for speech synthesis | |
KR102429152B1 (en) | Deep learning voice extraction and noise reduction method by fusion of bone vibration sensor and microphone signal | |
Aichner et al. | Time domain blind source separation of non-stationary convolved signals by utilizing geometric beamforming | |
CN103325381A (en) | Speech separation method based on fuzzy membership function | |
WO2022027423A1 (en) | Deep learning noise reduction method and system fusing signal of bone vibration sensor with signals of two microphones | |
CN103208291A (en) | Speech enhancement method and device applicable to strong noise environments | |
CN110942784A (en) | Snore classification system based on support vector machine | |
Zheng et al. | Spectra restoration of bone-conducted speech via attention-based contextual information and spectro-temporal structure constraint | |
JP2002268698A (en) | Voice recognition device, device and method for standard pattern generation, and program | |
CN112185405B (en) | Bone conduction voice enhancement method based on differential operation and combined dictionary learning | |
CN203165457U (en) | Voice acquisition device used for noisy environment | |
CN113327589B (en) | Voice activity detection method based on attitude sensor | |
CN111968627B (en) | Bone conduction voice enhancement method based on joint dictionary learning and sparse representation | |
CN115410591A (en) | Dual self-adaptive intelligent voice recognition method for VR live broadcast scene | |
CN114566179A (en) | Time delay controllable voice noise reduction method | |
CN112992131A (en) | Method for extracting ping-pong command of target voice in complex scene | |
Deng et al. | Vision-Guided Speaker Embedding Based Speech Separation | |
CN106971733A (en) | The method and system and intelligent terminal of Application on Voiceprint Recognition based on voice de-noising | |
KR20100056859A (en) | Voice recognition apparatus and method | |
Thomsen et al. | Speech enhancement and noise-robust automatic speech recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210921 |