CN110390945B - Dual-sensor voice enhancement method and implementation device - Google Patents

Dual-sensor voice enhancement method and implementation device Download PDF

Info

Publication number
CN110390945B
CN110390945B CN201910678398.7A CN201910678398A CN110390945B CN 110390945 B CN110390945 B CN 110390945B CN 201910678398 A CN201910678398 A CN 201910678398A CN 110390945 B CN110390945 B CN 110390945B
Authority
CN
China
Prior art keywords
air conduction
speech
voice
dual
air
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910678398.7A
Other languages
Chinese (zh)
Other versions
CN110390945A (en
Inventor
张军
李�学
宁更新
冯义志
余华
季飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910678398.7A priority Critical patent/CN110390945B/en
Priority to PCT/CN2019/110290 priority patent/WO2021012403A1/en
Publication of CN110390945A publication Critical patent/CN110390945A/en
Application granted granted Critical
Publication of CN110390945B publication Critical patent/CN110390945B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0264Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/06Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a dual-sensor speech enhancement method based on dual-channel wiener filtering and an implementation device. Compared with the prior art, the method and the device have the advantages that the information contained in the air conduction voice and the non-air conduction voice is more fully fused, the priori knowledge of the voice signals is introduced through the statistical model, and the enhancement effect of the voice enhancement system in a noise environment can be effectively improved. The invention can be widely applied to various occasions such as video call, vehicle-mounted telephone, multimedia classroom, military communication and the like.

Description

Dual-sensor voice enhancement method and implementation device
Technical Field
The invention relates to the technical field of voice signal processing, in particular to a dual-sensor voice enhancement method based on dual-channel wiener filtering and an implementation device.
Background
In actual voice communication, a voice signal is often interfered by external environmental noise, and the quality of received voice is affected. The speech enhancement technology is an important branch of speech signal processing, aims to extract pure original speech from noisy speech as far as possible, and is widely applied to the fields of speech communication, speech compression coding, speech recognition and the like in a noisy environment.
Since human ears sense sound through air vibration, most of the existing speech enhancement algorithms are directed at air conduction (air conduction for short), that is, speech is collected by an air conduction sensor (such as a microphone), the enhancement effect is greatly influenced by various acoustic noises in the environment, and the performance is usually not good in a noisy environment. To reduce the impact of ambient noise on speech quality, non-air-conduction (referred to as non-air-conduction) sensors such as laryngeal microphones, bone conduction microphones, etc. are often used for speech acquisition in noisy environments. Different from the air conduction sensor, the non-air conduction voice sensor utilizes the vibration of the vocal cords, the jaw bones and other parts of a speaker to drive the reed or the carbon film in the sensor to change, change the resistance value of the reed or the carbon film, change the voltage at two ends of the reed or the carbon film, and convert a vibration signal into an electric signal, namely a voice signal. The reed or the carbon film of the non-air-conduction sensor cannot be deformed by the sound waves conducted in the air, so that the non-air-conduction sensor is not influenced by the air-conduction sound and has strong acoustic noise resistance. However, the non-air conduction sensor collects the voice transmitted through the vibration of the jaw bone, muscle, skin and other parts, and the high frequency part of the voice is seriously lost, which is manifested as stuffiness and vague voice and poorer speech intelligibility.
In view of the shortcomings of both air conduction and non-air conduction sensors when used alone, some speech enhancement methods have been developed in recent years that combine the advantages of both. These methods utilize the complementarity of air-borne speech and non-air-borne speech, and employ multi-sensor fusion techniques to achieve speech enhancement, often achieving better results than single-sensor speech enhancement systems. The existing dual-sensor voice enhancement mainly comprises two modes, namely, firstly recovering air conduction voice from non-air conduction voice, and then fusing the air conduction voice with noise; and the other method is to recover the air conduction voice from the non-air conduction voice, enhance the air conduction voice with noise by using signals of the air conduction sensor and the non-air conduction sensor, and then fuse the air conduction voice and the non-air conduction sensor. These techniques suffer from the following disadvantages: (1) when restoring air conduction speech using non-air conduction speech, additional noise may be introduced in the high frequency or silence, affecting the enhancement effect. (2) When recovering air conduction speech using non-air conduction speech, information of current air conduction speech is not utilized. (3) When the air conduction speech restored by using the non-air conduction speech is fused with the air conduction speech, the correlation and the prior knowledge of the air conduction speech and the air conduction speech cannot be fully utilized. (4) The non-air-guided speech and the air-guided speech are generally assumed to be independent of each other in the fusion, but this assumption does not hold in practice.
Chinese patent 201610025390.7 discloses a method and apparatus for dual-sensor speech enhancement based on statistical models, the invention firstly combines non-air conduction voice and air conduction voice to construct a combined statistical model for classification and carry out endpoint test, calculates the current optimal air conduction voice filter through the combined statistical model, the air conduction voice is subjected to filtering enhancement, then the non-air conduction voice is converted into the air conduction voice by utilizing a mapping model from the non-air conduction voice to the air conduction voice, and the weighted fusion is carried out on the air conduction voice after the filtering enhancement, the defects that the correlation and the prior knowledge of the air conduction voice and the air conduction voice recovered by a non-air conduction sensor cannot be fully utilized when the air conduction voice and the air conduction voice are fused are partially solved, however, the second step of fusion still uses the air conduction voice recovered from the non-air conduction voice, so that the method also has the defects of high-frequency and mute noise, information of the air conduction voice which cannot be utilized when the non-air conduction voice is used for recovering the air conduction voice, and the like.
Disclosure of Invention
The invention aims to solve the defects in the prior art and provides a dual-sensor speech enhancement method based on dual-channel wiener filtering and a realization device. Compared with the prior art, the method and the device have the advantages that the information contained in the air conduction voice and the non-air conduction voice is more fully fused, the priori knowledge of the voice signals is introduced through the statistical model, and the enhancement effect of the voice enhancement system in a noise environment can be effectively improved. The invention can be widely applied to various occasions such as video call, vehicle-mounted telephone, multimedia classroom, military communication and the like.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a dual-sensor speech enhancement method based on dual-channel wiener filtering comprises the following steps:
s1, synchronously collecting clean air conduction training voice and non-air conduction training voice, establishing a dual-channel voice combined classification model of air conduction voice frames and non-air conduction voice frames, and calculating an air conduction voice power spectrum average value phi corresponding to each classification in the dual-channel voice combined classification modelss(omega, l) non-air conduction speech power spectrum mean value phibbCross-spectral mean phi between (omega, l), air-conducting speech and non-air-conducting speechbs(ω, l), where ω is frequency and l is the number of the class;
s2, synchronously collecting air conduction test voice and non-air conduction test voice, establishing a statistical model of air conduction noise by using pure noise section of the air conduction test voice, and calculating the power spectrum mean value phi of the air conduction noisevv(ω);
S3, classifying the synchronously input air conduction test voice frame and the non-air conduction test voice frame by using the statistical model of the air conduction noise and the dual-channel voice combined classification model in the step S1;
s4, classifying result and power spectrum mean value phi according to the step S3vv(omega) constructing a dual-channel wiener filter, and filtering the air conduction test voice frame and the non-air conduction test voice frame to obtain the enhanced air conduction voice.
Further, the step S1 is as follows:
s1.1, framing and preprocessing clean air conduction training voice and non-air conduction training voice which are synchronously collected, and extracting a characteristic parameter, namely a reverse Mel spectral coefficient, of each frame of voice;
s1.2, training a dual-channel speech joint classification model by using the clean air conduction speech and non-air conduction speech characteristics obtained in the step S1.1;
s1.3, use of trained pairsClassifying all air conduction training speech frames and non-air conduction speech frames by a channel speech combined classification model, and then calculating the air conduction speech power spectrum mean value phi of the air conduction training speech frames and the non-air conduction speech frames contained in each classificationss(omega, l) non-air conduction speech power spectrum mean value phibbCross-spectral mean phi between (omega, l), air-conducting speech and non-air-conducting speechbs(ω,l)。
Further, in step S1.2, the dual-channel speech joint classification Model adopts a multiple data stream Gaussian Mixture Model (GMM), that is, a Gaussian Mixture Model (GMM)
Figure GDA0003086461740000041
Where N (o, μ, σ) is a Gaussian function, ox(k) And ob(k) For the feature vectors extracted from the k-th frame of air conduction test speech and non-air conduction test speech,
Figure GDA0003086461740000042
and
Figure GDA0003086461740000043
is the mean of the first gaussian components of the air-guide speech data stream and the non-air-guide speech data stream in the multi-data stream GMM,
Figure GDA0003086461740000044
and
Figure GDA0003086461740000045
variance of the first Gaussian component of the flow of air-guiding and non-air-guiding speech data in a multi-data-flow GMM, clIs the weight of the first Gaussian component in multiple data streams GMM, wxAnd wbThe weights of the air-guide voice data stream and the non-air-guide voice data stream in the multi-data stream GMM are respectively, and L is the number of Gaussian components.
Further, in step S1.3, each gaussian component in the dual-channel speech joint classification model represents a classification, and for each pair of synchronous air conduction training speech frame and non-air conduction speech frame, the score of each classification is calculated by using the following formula
Figure GDA0003086461740000046
The current air conduction training speech frame and the non-air conduction speech frame belong to the classification with the highest score; calculating the classification of all air conduction training speech frames and non-air conduction speech frames, and then calculating the air conduction speech power spectrum mean value phi of the air conduction training speech frames and the non-air conduction speech frames contained in the same classificationss(omega, l) non-air conduction speech power spectrum mean value phibbCross-spectral mean phi between (omega, l), air-conducting speech and non-air-conducting speechbs(ω,l)。
Further, the statistical model of the air conduction noise is the power spectrum mean value phi of the air conduction noisevv(ω), calculated using the following method:
s2.1, synchronously acquiring air conduction test voice and non-air conduction test voice and framing;
s2.2, testing the short-time autocorrelation function R of the non-air conduction testing voice frame according to the testb(m) and short-term energy EbCalculating the short-time average threshold crossing rate C of each frame of test non-air conduction test voice frameb
Figure GDA0003086461740000051
Wherein sgn [. C]In order to take the sign of the operation,
Figure GDA0003086461740000052
is an adjustment factor, T is the initial threshold value, M is the frame length, when CbWhen the value is larger than the preset threshold value, judging the frame as a voice signal, otherwise, judging the frame as noise, and obtaining the end point position of the non-air conduction test voice signal according to the judgment result of each frame;
s2.3, taking the time corresponding to the non-air conduction test voice signal end point tested in the step S2.2 as an end point of the air conduction test voice, and extracting a pure noise section in the air conduction test voice;
s2.4, calculating the power spectrum mean value phi of the pure noise section signal in the air conduction test voicevv(ω)。
Further, in step S3, a Vector Taylor series model (VTS) compensation technique is first adopted, a statistical model of air conduction noise is used to correct parameters of an air conduction speech data stream in the dual-channel speech combined classification model, and then the input air conduction test speech frame and the input non-air conduction test speech frame are classified, wherein the following formula is adopted to correct the mean value of each gaussian component of the air conduction speech data stream in the dual-channel speech combined classification model:
Figure GDA0003086461740000053
wherein
Figure GDA0003086461740000054
And
Figure GDA0003086461740000055
and respectively enabling power spectrums of clean air conduction training voice and noise belonging to the l-th class to respectively pass through a 24-dimensionalmel filter bank and take the mean values after logarithm, C is a DCT (discrete cosine transformation) matrix, other parameters in the dual-channel voice combined classification model are kept unchanged, and classifying the synchronously input air conduction test voice frame and the non-air conduction test voice frame by adopting the modified dual-channel voice combined classification model to obtain the classification scores q (k, l) of each classification corresponding to the current air conduction test voice frame and the non-air conduction test voice frame.
Further, in step S4, for the air conduction test speech and the non-air conduction test speech acquired synchronously at the kth frame, the enhanced air conduction speech spectrum is calculated by using the following formula:
Figure GDA0003086461740000061
wherein Y (omega, k), X (omega, k) and B (omega, k) are respectively the enhanced air conduction voice of the kth frameThe frequency spectra of the air conduction test speech and the non-air conduction test speech,
Figure GDA0003086461740000062
for the frequency responses of the wiener filters corresponding to the k-th frame of air conduction test speech and the non-air conduction test speech, the following equations are respectively used to calculate
Figure GDA0003086461740000063
Figure GDA0003086461740000064
Where q (k, l) is the classification score for the kth frame of air conduction test speech and the non-air conduction test speech corresponding to class I of the two-channel speech joint classification model, Ha(omega, k, l) is the frequency response of the wiener filter of the kth frame air conduction test voice corresponding to the l class of the dual-channel voice joint classification model, and the calculation method comprises the following steps:
Figure GDA0003086461740000065
Hna(omega, k, l) is the frequency response of the wiener filter of the kth frame of non-air conduction test voice corresponding to the l class of the dual-channel voice joint classification model, and the calculation method comprises the following steps:
Figure GDA0003086461740000066
further, the
Figure GDA0003086461740000067
And
Figure GDA0003086461740000068
calculated using the formula:
Figure GDA0003086461740000069
the other purpose of the invention is realized by the following technical scheme:
an implementation device of a dual-sensor speech enhancement method based on dual-channel wiener filtering comprises an air conduction speech sensor, a non-air conduction speech sensor, a noise model estimation module, a dual-channel speech joint classification model, a model compensation module, a frame classification module, a filter coefficient generation module and a dual-channel filter, wherein,
the air conduction voice sensor and the non-air conduction voice sensor are respectively connected with the noise model estimation module, the frame classification module and the dual-channel filter; the dual-channel speech joint classification model, the model compensation module, the frame classification module, the filter coefficient generation module and the dual-channel filter are sequentially connected, the noise model estimation module is connected with the model compensation module and the filter coefficient generation module, and the dual-channel speech joint classification model is connected with the filter coefficient generation module;
the air conduction voice sensor and the non-air conduction voice sensor are respectively used for collecting air conduction voice signals and non-air conduction voice signals, the noise model estimation module is used for estimating a model and a power spectrum of current air conduction noise, the dual-channel voice combined classification model adopts clean air conduction training voice and non-air conduction training voice which are synchronously collected to establish an air conduction voice frame and a non-air conduction voice frame, and the mean value of the power spectrum of each classified air conduction voice in the dual-channel voice combined classification model is phiss(omega, l) and the mean value of the power spectrum of the non-air-conduction speech is phibbThe cross-spectral mean between (ω, l), air-guided speech and non-air-guided speech is Φbs(omega, l), the model compensation module utilizes the statistical model of air conduction noise to revise the parameter of the dual-channel speech joint classification model, the frame classification module classify the current synchronous input air conduction test speech frame and the non-air conduction test speech frame, the filter coefficient generation module construct the dual-channel wiener filter according to the classification result and the power spectrum of the air conduction noise, the dual-channel filter measure the air conductionAnd filtering the test voice frame and the non-air conduction test voice frame to obtain the enhanced air conduction voice.
Further, the air conduction voice sensor is a microphone, and the non-air conduction voice sensor is a throat microphone.
Compared with the prior art, the invention has the following advantages and effects:
(1) compared with the voice enhancement technology only based on the air conduction test voice or the non-air conduction test voice, the method and the device have the advantages that the information of the air conduction test voice and the non-air conduction test voice is simultaneously utilized during enhancement, and a better enhancement effect can be achieved.
(2) The invention adopts the dual-channel speech joint classification model to fuse the information of the air conduction test speech and the non-air conduction test speech, can make the frame classification more accurate, and fully utilizes the correlation and the prior knowledge of the two.
(3) Compared with the Chinese patent 201610025390.7, the method for restoring the air conduction voice by the two-channel wiener filter is simpler in calculation, can avoid the defects of high-frequency or mute noise and failure in utilizing air conduction voice information when restoring the air conduction voice from non-air conduction voice, and has better performance.
(4) The invention adopts the two-channel wiener filter to recover the air conduction voice, and avoids the assumption that the non-air conduction voice and the air conduction voice are mutually independent.
Drawings
FIG. 1 is a block diagram of an apparatus for implementing a dual-channel wiener filtering-based dual-sensor speech enhancement method disclosed in the embodiments of the present invention;
FIG. 2 is a flowchart of a dual-channel wiener filtering-based dual-sensor speech enhancement method disclosed in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
The embodiment discloses a structural block diagram of an implementation device of a dual-sensor voice enhancement method based on dual-channel wiener filtering, as shown in fig. 1, the device comprises an air conduction voice sensor, a non-air conduction voice sensor, a noise model estimation module, a dual-channel voice joint classification model, a model compensation module, a frame classification module, a filter coefficient generation module and a dual-channel filter, the air conduction voice sensor and the non-air conduction voice sensor are respectively connected with the noise model estimation module, the frame classification module and the dual-channel filter, the dual-channel voice combined classification model, the model compensation module, the frame classification module, the filter coefficient generation module and the dual-channel filter are sequentially connected, the noise model estimation module is connected with the model compensation module and the filter coefficient generation module, and the dual-channel voice combined classification model is connected with the filter coefficient generation module.
In this embodiment, the air conduction voice sensor is a microphone, and the non-air conduction voice sensor is a throat microphone, and the air conduction voice sensor and the throat microphone are used for acquiring air conduction voice signals and non-air conduction voice signals; the noise model estimation module is used for estimating a model and a power spectrum of the current air conduction noise. The dual-channel speech combined classification model adopts synchronously acquired clean air conduction training speech and non-air conduction training speech to establish air conduction speech frames and non-air conduction speech frames, and the average value phi of the air conduction speech power spectrum of each classification in the dual-channel speech combined classification modelss(omega, l) non-air conduction speech power spectrum mean value phibbCross-spectral mean phi between (omega, l), air-conducting speech and non-air-conducting speechbs(ω, l). And the model compensation module corrects the parameters of the dual-channel speech combined classification model by using the statistical model of the air conduction noise. And the frame classification module classifies the currently synchronously input air conduction test voice and non-air conduction test voice frames. And the filter coefficient generating module constructs a dual-channel wiener filter according to the classification result and the power spectrum of the air conduction noise. Dual-channel filter pair air conduction test voice frame and non-air conduction testAnd filtering the voice test frame to obtain the enhanced air conduction voice.
Example two
The embodiment discloses a dual-sensor speech enhancement method based on dual-channel wiener filtering, according to the implementation device disclosed in the embodiment, the following steps are adopted to calculate enhanced air conduction speech by using input air conduction test speech and non-air conduction test speech, and the flow is shown in fig. 2:
step S1, collecting clean air conduction training voice and non-air conduction training voice synchronously, establishing a dual-channel voice combined classification model of air conduction voice frames and non-air conduction voice frames, and calculating an air conduction voice power spectrum average value phi corresponding to each classification in the dual-channel voice combined classification modelss(omega, l) non-air conduction speech power spectrum mean value phibbCross-spectral mean phi between (omega, l), air-conducting speech and non-air-conducting speechbs(ω, l), where ω is frequency and l is the number of the class.
The following steps are adopted in the embodiment to complete the process:
s1.1, framing and preprocessing clean air conduction training voice and non-air conduction training voice which are synchronously collected, and extracting characteristic parameters of each frame of voice.
In this embodiment, the clean air conduction training voice and the non-air conduction training voice which are synchronously acquired are framed according to the frame length of 30ms and the frame shift of 10ms, and each frame of the clean air conduction training voice and the non-air conduction training voice is windowed by using a hamming window respectively and is subjected to pre-emphasis, and then the power spectrums of the clean air conduction training voice and the non-air conduction training voice are obtained. And respectively enabling the power spectrums of the air conduction training voice and the non-air conduction training voice to pass through a 24-dimensional Mel filter bank, logarithm is taken from the output of the filter bank, and then DCT transformation is carried out to obtain two groups of 12-dimensional Mel frequency cepstrum coefficients which are used as training characteristics of a two-channel voice combined classification model.
S1.2, training a dual-channel speech joint classification model by using the clean air conduction speech and non-air conduction speech characteristics obtained in the step S1.1. In this embodiment, the two-channel speech joint classification model uses multiple data streams GMM, i.e.
Figure GDA0003086461740000101
Where N (o, μ, σ) is a Gaussian function, ox(k) And ob(k) For the feature vectors extracted from the k-th frame of air conduction test speech and non-air conduction test speech,
Figure GDA0003086461740000102
and
Figure GDA0003086461740000103
is the mean of the first gaussian components of the air-guide speech data stream and the non-air-guide speech data stream in the multi-data stream GMM,
Figure GDA0003086461740000104
and
Figure GDA0003086461740000105
variance of the first Gaussian component of the flow of air-guiding and non-air-guiding speech data in a multi-data-flow GMM, clIs the weight of the first Gaussian component in multiple data streams GMM, wxAnd wbThe weights of the air-guide voice data stream and the non-air-guide voice data stream in the multi-data stream GMM are respectively, and L is the number of Gaussian components.
Parameter c in dual-channel speech joint classification modell、wx、wb
Figure GDA0003086461740000111
And
Figure GDA0003086461740000112
the maximum Expectation (Expectation Maximization) algorithm is used for estimation.
S1.3, classifying all air conduction training speech frames and non-air conduction speech frames by using the trained dual-channel speech combined classification model, and then calculating the air conduction speech power spectrum mean value phi of the air conduction training speech frames and the non-air conduction speech frames contained in each classificationss(omega, l) non-air conduction speech power spectrum mean value phibb(omega, l), air-guided speech and non-air-guided speechCross spectral mean phi betweenbs(ω,l)。
In this embodiment, each gaussian component in the dual-channel speech joint classification model represents a classification, and for each pair of synchronous air conduction training speech frame and non-air conduction speech frame, the score of each classification is calculated by using the following formula
Figure GDA0003086461740000113
The current air conduction training speech frame and the non-air conduction speech frame belong to the class with the highest score. Calculating the classification of all air conduction training speech frames and non-air conduction speech frames, and then calculating the air conduction speech power spectrum mean value phi of the air conduction training speech frames and the non-air conduction speech frames contained in the same classificationss(omega, l) non-air conduction speech power spectrum mean value phibbCross-spectral mean phi between (omega, l), air-conducting speech and non-air-conducting speechbs(ω,l)。
Step S2, synchronously collecting air conduction test voice and non-air conduction test voice, establishing a statistical model of air conduction noise by using pure noise section of the air conduction test voice, and calculating the power spectrum mean value phi of the air conduction noisevv(ω)。
In this embodiment, the statistical model of the air conduction noise is the power spectrum mean value Φ of the air conduction noisevv(ω), calculated using the following method:
s2.1, synchronously acquiring air conduction test voice and non-air conduction test voice and framing;
s2.2, testing the short-time autocorrelation function R of the non-air conduction testing voice frame according to the testb(m) and short-term energy EbCalculating the short-time average threshold crossing rate C of each frame of test non-air conduction test voice frameb
Figure GDA0003086461740000121
Wherein sgn [. C]In order to take the sign of the operation,
Figure GDA0003086461740000122
is the adjustment factor, T is the threshold initial value, and M is the frame length. When C is presentbWhen the value is larger than the preset threshold value, judging the frame as a voice signal, otherwise, judging the frame as noise, and obtaining the end point position of the non-air conduction test voice signal according to the judgment result of each frame;
s2.3, taking the time corresponding to the non-air conduction test voice signal end point tested in the step S2.2 as an end point of the air conduction test voice, and extracting a pure noise section in the air conduction test voice;
s2.4, calculating the power spectrum mean value phi of the pure noise section signal in the air conduction test voicevv(ω)。
The statistical model of the air conduction noise is a Gaussian function, a GMM model or an HMM model.
And S3, classifying the synchronously input air conduction test speech frames and non-air conduction test speech frames by utilizing the statistical model of the air conduction noise and the dual-channel speech joint classification model in the step S1.
In this embodiment, a VTS model compensation technique is first adopted, and a statistical model of air conduction noise is used to correct parameters of an air conduction speech data stream in a dual-channel speech combined classification model, and then an input air conduction test speech frame and a non-air conduction test speech frame are classified. The specific method is to adopt the following formula to correct the mean value of each Gaussian component of the air guide voice data flow in the dual-channel voice combined classification model:
Figure GDA0003086461740000123
wherein
Figure GDA0003086461740000124
And
Figure GDA0003086461740000125
the power spectra of clean air conduction training speech and noise belonging to the first class are passed through a 24-dimensionalmel filter bank and the mean values after logarithmic calculation are taken, and C is Discrete Cosine Transform (DCT). Dual channel speech joint classificationOther parameters in the model remain unchanged. And classifying the synchronously input air conduction test voice frame and the non-air conduction test voice frame by adopting the corrected two-channel voice combined classification model to obtain the classification score q (k, l) of each classification corresponding to the current air conduction test voice frame and the non-air conduction test voice frame.
Step S4, sorting result according to step S3 and phivv(omega) constructing a dual-channel wiener filter, and filtering the air conduction test voice frame and the non-air conduction test voice frame to obtain the enhanced air conduction voice.
In this embodiment, for the air conduction test voice and the non-air conduction test voice acquired synchronously at the kth frame, the enhanced air conduction voice spectrum is calculated by using the following formula:
Figure GDA0003086461740000139
wherein Y (omega, k), X (omega, k) and B (omega, k) are respectively the frequency spectrums of the enhanced air conduction voice, the air conduction test voice and the non-air conduction test voice of the kth frame,
Figure GDA0003086461740000131
for the frequency responses of the wiener filters corresponding to the k-th frame of air conduction test speech and the non-air conduction test speech, the following equations are respectively used to calculate
Figure GDA0003086461740000132
Figure GDA0003086461740000133
Q (k, l) in the formula is the classification score of the kth frame of air conduction test voice and the non-air conduction test voice corresponding to the l class of the dual-channel voice joint classification model. Ha(omega, k, l) is the frequency response of the wiener filter of the kth frame air conduction test voice corresponding to the l class of the dual-channel voice joint classification model, and the calculation method comprises the following steps:
Figure GDA0003086461740000134
Hna(omega, k, l) is the frequency response of the wiener filter of the kth frame of non-air conduction test voice corresponding to the l class of the dual-channel voice joint classification model, and the calculation method comprises the following steps:
Figure GDA0003086461740000135
in another embodiment, the above
Figure GDA0003086461740000136
And
Figure GDA0003086461740000137
calculated using the formula:
Figure GDA0003086461740000138
the above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. A dual-sensor speech enhancement method based on dual-channel wiener filtering is characterized by comprising the following steps:
s1, synchronously collecting clean air conduction training voice and non-air conduction training voice, establishing a dual-channel voice combined classification model of air conduction voice frames and non-air conduction voice frames, and calculating an air conduction voice power spectrum average value phi corresponding to each classification in the dual-channel voice combined classification modelss(omega, l) non-air conduction speech power spectrum mean value phibbBetween (omega, l), air-conducting speech and non-air-conducting speechCross spectral mean phibs(ω, l), where ω is frequency and l is the number of the class;
s2, synchronously collecting air conduction test voice and non-air conduction test voice, establishing a statistical model of air conduction noise by using pure noise section of the air conduction test voice, and calculating the power spectrum mean value phi of the air conduction noisevv(ω);
S3, classifying the synchronously input air conduction test voice frame and the non-air conduction test voice frame by using the statistical model of the air conduction noise and the dual-channel voice combined classification model in the step S1;
s4, classifying result and power spectrum mean value phi according to the step S3vv(omega) constructing a dual-channel wiener filter, and filtering the air conduction test voice frame and the non-air conduction test voice frame to obtain the enhanced air conduction voice.
2. The dual-sensor speech enhancement method of claim 1, wherein the step S1 is performed as follows:
s1.1, framing and preprocessing clean air conduction training voice and non-air conduction training voice which are synchronously collected, and extracting characteristic parameters of each frame of voice, wherein the characteristic parameters are reverse Mel spectral coefficients;
s1.2, training a dual-channel speech joint classification model by using the clean air conduction speech and non-air conduction speech characteristics obtained in the step S1.1;
s1.3, classifying all air conduction training speech frames and non-air conduction speech frames by using the trained dual-channel speech combined classification model, and then calculating the air conduction speech power spectrum mean value phi of the air conduction training speech frames and the non-air conduction speech frames contained in each classificationss(omega, l) non-air conduction speech power spectrum mean value phibbCross-spectral mean phi between (omega, l), air-conducting speech and non-air-conducting speechbs(ω,l)。
3. The dual-sensor speech enhancement method of claim 2, wherein in step S1.2, the dual-channel speech joint classification model uses multiple data streams GMM, where GMM is a Gaussian Mixture Model (GMM)
Figure FDA0003086461730000021
Where N (o, μ, σ) is a Gaussian function, ox(k) And ob(k) For the feature vectors extracted from the k-th frame of air conduction test speech and non-air conduction test speech,
Figure FDA0003086461730000022
and
Figure FDA0003086461730000023
is the mean of the first gaussian components of the air-guide speech data stream and the non-air-guide speech data stream in the multi-data stream GMM,
Figure FDA0003086461730000024
and
Figure FDA0003086461730000025
variance of the first Gaussian component of the flow of air-guiding and non-air-guiding speech data in a multi-data-flow GMM, clIs the weight of the first Gaussian component in multiple data streams GMM, wxAnd wbThe weights of the air-guide voice data stream and the non-air-guide voice data stream in the multi-data stream GMM are respectively, and L is the number of Gaussian components.
4. The dual-sensor speech enhancement method of claim 3 wherein in step S1.3, each Gaussian component in the dual-channel speech joint classification model represents a class, and for each pair of synchronous air conduction training speech frames and non-air conduction speech frames, the score for each class is calculated using the following equation
Figure FDA0003086461730000026
Wherein the current air conduction training speech frame andthe non-air conduction speech frame belongs to the class with the highest score; calculating the classification of all air conduction training speech frames and non-air conduction speech frames, and then calculating the air conduction speech power spectrum mean value phi of the air conduction training speech frames and the non-air conduction speech frames contained in the same classificationss(omega, l) non-air conduction speech power spectrum mean value phibbCross-spectral mean phi between (omega, l), air-conducting speech and non-air-conducting speechbs(ω,l)。
5. The dual-sensor speech enhancement method of claim 1, wherein the statistical model of the air conduction noise is the power spectrum mean Φ of the air conduction noisevv(ω), calculated using the following method:
s2.1, synchronously acquiring air conduction test voice and non-air conduction test voice and framing;
s2.2, testing the short-time autocorrelation function R of the non-air conduction testing voice frame according to the testb(m) and short-term energy EbCalculating the short-time average threshold crossing rate C of each frame of test non-air conduction test voice frameb
Figure FDA0003086461730000031
Wherein sgn [. C]In order to take the sign of the operation,
Figure FDA0003086461730000032
is an adjustment factor, T is the initial threshold value, M is the frame length, when CbWhen the value is larger than the preset threshold value, judging the frame as a voice signal, otherwise, judging the frame as noise, and obtaining the end point position of the non-air conduction test voice signal according to the judgment result of each frame;
s2.3, taking the time corresponding to the non-air conduction test voice signal end point tested in the step S2.2 as an end point of the air conduction test voice, and extracting a pure noise section in the air conduction test voice;
s2.4, calculating the power spectrum mean value phi of the pure noise section signal in the air conduction test voicevv(ω)。
6. The dual-sensor speech enhancement method of claim 1, wherein in step S3, a vector taylor series model compensation technique is first used, a statistical model of the air conduction noise is used to correct parameters of the air conduction speech data stream in the dual-channel speech combined classification model, and then the input air conduction test speech frame and the input non-air conduction test speech frame are classified, wherein the mean value of each gaussian component of the air conduction speech data stream in the dual-channel speech combined classification model is corrected by the following formula:
Figure FDA0003086461730000033
wherein
Figure FDA0003086461730000034
And
Figure FDA0003086461730000035
and respectively enabling power spectrums of clean air conduction training voice and noise belonging to the l-th class to respectively pass through a 24-dimensional Mel filter bank and take the mean values after logarithm, C is a discrete cosine transform matrix, other parameters in the dual-channel voice combined classification model are kept unchanged, and classifying synchronously input air conduction test voice frames and non-air conduction test voice frames by adopting the modified dual-channel voice combined classification model to obtain classification scores q (k, l) of the current air conduction test voice frames and the non-air conduction test voice frames corresponding to each classification.
7. The dual-sensor speech enhancement method of claim 2, wherein in step S4, for the k-th frame of synchronously acquired air conduction test speech and non-air conduction test speech, the spectrum of the enhanced air conduction speech is calculated by using the following formula:
Figure FDA0003086461730000041
wherein Y (omega, k), X (omega, k) and B (omega, k) are respectively the frequency spectrums of the enhanced air conduction voice, the air conduction test voice and the non-air conduction test voice of the kth frame,
Figure FDA0003086461730000042
for the frequency responses of the wiener filters corresponding to the k-th frame of air conduction test speech and the non-air conduction test speech, the following equations are respectively used to calculate
Figure FDA0003086461730000048
Figure FDA0003086461730000043
Where q (k, l) is the classification score for the kth frame of air conduction test speech and the non-air conduction test speech corresponding to class I of the two-channel speech joint classification model, Ha(omega, k, l) is the frequency response of the wiener filter of the kth frame air conduction test voice corresponding to the l class of the dual-channel voice joint classification model, and the calculation method comprises the following steps:
Figure FDA0003086461730000044
Hna(omega, k, l) is the frequency response of the wiener filter of the kth frame of non-air conduction test voice corresponding to the l class of the dual-channel voice joint classification model, and the calculation method comprises the following steps:
Figure FDA0003086461730000045
8. the dual-sensor speech enhancement method of claim 7, wherein the speech enhancement is performed by a speech enhancement processor
Figure FDA0003086461730000046
And
Figure FDA0003086461730000047
calculated using the formula:
Figure FDA0003086461730000051
9. an implementation device of a dual-sensor speech enhancement method based on dual-channel wiener filtering is characterized by comprising an air conduction speech sensor, a non-air conduction speech sensor, a noise model estimation module, a dual-channel speech joint classification model, a model compensation module, a frame classification module, a filter coefficient generation module and a dual-channel filter, wherein,
the air conduction voice sensor and the non-air conduction voice sensor are respectively connected with the noise model estimation module, the frame classification module and the dual-channel filter; the dual-channel speech joint classification model, the model compensation module, the frame classification module, the filter coefficient generation module and the dual-channel filter are sequentially connected, the noise model estimation module is connected with the model compensation module and the filter coefficient generation module, and the dual-channel speech joint classification model is connected with the filter coefficient generation module;
the air conduction voice sensor and the non-air conduction voice sensor are respectively used for collecting air conduction voice signals and non-air conduction voice signals, the noise model estimation module is used for estimating a model and a power spectrum of current air conduction noise, the dual-channel voice combined classification model adopts clean air conduction training voice and non-air conduction training voice which are synchronously collected to establish an air conduction voice frame and a non-air conduction voice frame, and the mean value of the power spectrum of each classified air conduction voice in the dual-channel voice combined classification model is phiss(omega, l) and the mean value of the power spectrum of the non-air-conduction speech is phibbThe cross-spectral mean between (ω, l), air-guided speech and non-air-guided speech is Φbs(ω, l), said model compensation module jointly classifying the two-channel speech using a statistical model of air conduction noiseThe parameters of the model are corrected, the frame classification module classifies the current synchronously input air conduction test voice frame and the non-air conduction test voice frame, the filter coefficient generation module constructs a dual-channel wiener filter according to the classification result and the power spectrum of air conduction noise, and the dual-channel filter filters the air conduction test voice frame and the non-air conduction test voice frame to obtain enhanced air conduction voice.
10. The apparatus for implementing a dual-sensor speech enhancement method according to claim 9, wherein said air conduction speech sensor is a microphone and said non-air conduction speech sensor is a throat microphone.
CN201910678398.7A 2019-07-25 2019-07-25 Dual-sensor voice enhancement method and implementation device Expired - Fee Related CN110390945B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910678398.7A CN110390945B (en) 2019-07-25 2019-07-25 Dual-sensor voice enhancement method and implementation device
PCT/CN2019/110290 WO2021012403A1 (en) 2019-07-25 2019-10-10 Dual sensor speech enhancement method and implementation device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910678398.7A CN110390945B (en) 2019-07-25 2019-07-25 Dual-sensor voice enhancement method and implementation device

Publications (2)

Publication Number Publication Date
CN110390945A CN110390945A (en) 2019-10-29
CN110390945B true CN110390945B (en) 2021-09-21

Family

ID=68287587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910678398.7A Expired - Fee Related CN110390945B (en) 2019-07-25 2019-07-25 Dual-sensor voice enhancement method and implementation device

Country Status (2)

Country Link
CN (1) CN110390945B (en)
WO (1) WO2021012403A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111009253B (en) * 2019-11-29 2022-10-21 联想(北京)有限公司 Data processing method and device
CN111524531A (en) * 2020-04-23 2020-08-11 广州清音智能科技有限公司 Method for real-time noise reduction of high-quality two-channel video voice
CN116470959A (en) * 2022-07-12 2023-07-21 苏州旭创科技有限公司 Filter implementation method, noise suppression method, device and computer equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004279768A (en) * 2003-03-17 2004-10-07 Mitsubishi Heavy Ind Ltd Device and method for estimating air-conducted sound
CN203165457U (en) * 2013-03-08 2013-08-28 华南理工大学 Voice acquisition device used for noisy environment
CN106328156A (en) * 2016-08-22 2017-01-11 华南理工大学 Microphone array voice reinforcing system and microphone array voice reinforcing method with combination of audio information and video information
WO2018229503A1 (en) * 2017-06-16 2018-12-20 Cirrus Logic International Semiconductor Limited Earbud speech estimation
CN110010143A (en) * 2019-04-19 2019-07-12 出门问问信息科技有限公司 A kind of voice signals enhancement system, method and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9711127B2 (en) * 2011-09-19 2017-07-18 Bitwave Pte Ltd. Multi-sensor signal optimization for speech communication
CN103208291A (en) * 2013-03-08 2013-07-17 华南理工大学 Speech enhancement method and device applicable to strong noise environments
CN105513605B (en) * 2015-12-01 2019-07-02 南京师范大学 The speech-enhancement system and sound enhancement method of mobile microphone
CN110070883B (en) * 2016-01-14 2023-07-28 深圳市韶音科技有限公司 Speech enhancement method
JP2018063400A (en) * 2016-10-14 2018-04-19 富士通株式会社 Audio processing apparatus and audio processing program
CN107886967B (en) * 2017-11-18 2018-11-13 中国人民解放军陆军工程大学 A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network
CN108986834B (en) * 2018-08-22 2023-04-07 中国人民解放军陆军工程大学 Bone conduction voice blind enhancement method based on codec framework and recurrent neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004279768A (en) * 2003-03-17 2004-10-07 Mitsubishi Heavy Ind Ltd Device and method for estimating air-conducted sound
CN203165457U (en) * 2013-03-08 2013-08-28 华南理工大学 Voice acquisition device used for noisy environment
CN106328156A (en) * 2016-08-22 2017-01-11 华南理工大学 Microphone array voice reinforcing system and microphone array voice reinforcing method with combination of audio information and video information
WO2018229503A1 (en) * 2017-06-16 2018-12-20 Cirrus Logic International Semiconductor Limited Earbud speech estimation
CN110010143A (en) * 2019-04-19 2019-07-12 出门问问信息科技有限公司 A kind of voice signals enhancement system, method and storage medium

Also Published As

Publication number Publication date
WO2021012403A1 (en) 2021-01-28
CN110390945A (en) 2019-10-29

Similar Documents

Publication Publication Date Title
CN110390945B (en) Dual-sensor voice enhancement method and implementation device
CN109273021B (en) RNN-based real-time conference noise reduction method and device
TWI763073B (en) Deep learning based noise reduction method using both bone-conduction sensor and microphone signals
CN110070880B (en) Establishment method and application method of combined statistical model for classification
CN110197665B (en) Voice separation and tracking method for public security criminal investigation monitoring
JP2003255993A (en) System, method, and program for speech recognition, and system, method, and program for speech synthesis
KR102429152B1 (en) Deep learning voice extraction and noise reduction method by fusion of bone vibration sensor and microphone signal
Aichner et al. Time domain blind source separation of non-stationary convolved signals by utilizing geometric beamforming
CN103325381A (en) Speech separation method based on fuzzy membership function
WO2022027423A1 (en) Deep learning noise reduction method and system fusing signal of bone vibration sensor with signals of two microphones
CN103208291A (en) Speech enhancement method and device applicable to strong noise environments
CN110942784A (en) Snore classification system based on support vector machine
Zheng et al. Spectra restoration of bone-conducted speech via attention-based contextual information and spectro-temporal structure constraint
JP2002268698A (en) Voice recognition device, device and method for standard pattern generation, and program
CN112185405B (en) Bone conduction voice enhancement method based on differential operation and combined dictionary learning
CN203165457U (en) Voice acquisition device used for noisy environment
CN113327589B (en) Voice activity detection method based on attitude sensor
CN111968627B (en) Bone conduction voice enhancement method based on joint dictionary learning and sparse representation
CN115410591A (en) Dual self-adaptive intelligent voice recognition method for VR live broadcast scene
CN114566179A (en) Time delay controllable voice noise reduction method
CN112992131A (en) Method for extracting ping-pong command of target voice in complex scene
Deng et al. Vision-Guided Speaker Embedding Based Speech Separation
CN106971733A (en) The method and system and intelligent terminal of Application on Voiceprint Recognition based on voice de-noising
KR20100056859A (en) Voice recognition apparatus and method
Thomsen et al. Speech enhancement and noise-robust automatic speech recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210921