CN112420018A - Language identification method suitable for low signal-to-noise ratio environment


Info

Publication number
CN112420018A
CN112420018A
Authority
CN
China
Prior art keywords
signal
sampling
amplitude spectrum
voice
low
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011154863.6A
Other languages
Chinese (zh)
Inventor
邵玉斌
刘晶
龙华
杜庆治
李一民
杨贵安
唐维康
陈亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202011154863.6A priority Critical patent/CN112420018A/en
Publication of CN112420018A publication Critical patent/CN112420018A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/005 - Language recognition
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a language identification method suitable for low signal-to-noise ratio environments, addressing the low identification rates obtained at low SNR; it belongs to the field of speech recognition. The method processes noisy speech with filtering and resampling before extracting the amplitude spectrum. It mainly comprises the following parts: first, a low-pass filter removes the high-frequency part; the signal is then resampled; the resampled signal is preprocessed, its amplitude spectrum is extracted, and the amplitude spectrum is resampled again to obtain the sampled amplitude-spectrum feature for low SNR conditions. The extracted features are input to a training model to obtain the corresponding language model, and the trained model is mounted on a server; speech to be recognized is collected at a client and sent to the server, where its features are extracted and scored against the trained language models, and the recognition result is returned to the client. Experimental tests show that applying the speech features extracted by this method to language identification improves the overall identification accuracy, and the identification speed is very high.

Description

Language identification method suitable for low signal-to-noise ratio environment
Technical Field
The invention relates to a language identification method in a low signal-to-noise ratio environment, belonging to the field of speech recognition.
Background
In recent years, awareness of cross-border communication has grown steadily, and as more and more countries join Silk Road cooperation, the benefits of international cooperation attract still more countries into cross-border collaboration, which raises many problems. The most pressing one is the language barrier, which prevents effective communication and cooperation. Although machine translation already performs well, the absence of language identification at the front end forces manual switching of the translation target, so language identification is an important research problem; identification at low signal-to-noise ratio in particular has long been a difficult open topic. The introduction of machine learning has advanced the technology considerably, but recognition at low SNR remains limited and worldwide deployment is still an open question, so language recognition at low SNR needs further improvement.
Disclosure of Invention
The technical problem to be solved by the invention is the extraction of effective features in a low signal-to-noise ratio environment. The invention introduces a filter at the front end to remove the high-frequency information and obtain the low-frequency part of the signal, then samples the low-frequency signal at intervals of A points in the time domain. The sampled signal is pre-emphasized, amplitude-normalized and framed; each frame undergoes FFT (fast Fourier transform), modulus taking, smoothing, logarithm taking and IFFT (inverse fast Fourier transform), from which the vocal-tract impulse cepstrum sequence and the vocal-tract impulse response spectrum are constructed to obtain the amplitude spectrum; the amplitude spectrum is then sampled according to the Nyquist sampling theorem to obtain the sampled amplitude spectrum. Finally, the sampled amplitude-spectrum features of each language are input to the training model to train the corresponding language model, and the trained models are mounted on the server side. Speech to be recognized is collected by the client and sent to the server, where it is low-pass filtered and resampled; the sampled amplitude-spectrum features are then extracted and scored against the trained language models, and the recognition result is returned to the client web page. The algorithm has been implemented in simulation software for feature extraction and identification and achieves a good recognition effect. To solve the technical problem, the invention adopts the following technical scheme: a language identification method in a low signal-to-noise ratio environment, comprising the following steps:
S1, filtering
The principle is as follows:
Observation and statistics of spectrograms show that, for noisy audio, only the energy information of the low-frequency part remains visible: most of the high-frequency energy is covered by noise. Since most of the signal energy is concentrated in the low-frequency part while the noise energy is concentrated mainly in the high-frequency part, filtering out the high-frequency part reduces part of the noise interference and improves the signal-to-noise ratio relative to the original speech.
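As an illustration of this filtering step, here is a minimal sketch using a low-pass Butterworth filter (the filter type named in the embodiment below), assuming the embodiment's 8000 Hz sampling rate and 1000 Hz cutoff; the filter order is an assumed value, not taken from the patent.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_filter(x, fs=8000, cutoff=1000, order=6):
    # Normalized cutoff: scipy expects a fraction of the Nyquist frequency.
    b, a = butter(order, cutoff / (fs / 2), btype="low")
    # Zero-phase filtering keeps the retained low-frequency band undistorted.
    return filtfilt(b, a, x)
```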
S2, time-domain interval sampling
The principle is as follows:
Using a resampling technique, the time-domain signal is sampled at intervals of A points, so that the high-frequency part of the filtered signal is folded onto the low-frequency part. Because the superposed spectra of the speech information are unequal while the superposed spectra of the noise are equal, the average signal-to-noise ratio formula shows that the SNR is improved over its value before sampling. Noisy speech is defined as x(n) = s(n) + w(n), and the signal-to-noise ratio is given by

SNR = 10·log10( Σ_{n=1..H} s^2(n) / Σ_{n=1..H} w^2(n) )   (1)

where Σ_{n=1..H} s^2(n) is the signal energy, Σ_{n=1..H} w^2(n) is the white-noise energy, s(n) is the original speech, w(n) is zero-mean white Gaussian noise, and H is the total number of sampling points of the whole utterance.
The interval sampling formula is defined as

x'(n) = x(An), 1 ≤ n ≤ ⌊H/A⌋   (2)

where x'(n) is the signal sampled at intervals of A points and ⌊H/A⌋, the integer part of H/A, is the total number of retained sampling points.
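A minimal sketch of the average-SNR computation of equation (1) and the A-point interval sampling of equation (2); the 0-based slicing convention is an assumption, since the patent writes its formulas 1-based.

```python
import numpy as np

def snr_db(s, w):
    # Equation (1): average SNR of x(n) = s(n) + w(n) over all H samples.
    return 10 * np.log10(np.sum(s ** 2) / np.sum(w ** 2))

def interval_sample(x, A):
    # Equation (2): keep every A-th sample; floor(H / A) points remain, and
    # the filtered signal's high-frequency band folds onto the low band.
    return x[::A]
```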
S3, extracting the sampled amplitude spectrum
The extraction comprises pre-emphasis, amplitude normalization, framing, FFT (fast Fourier transform), modulus taking, smoothing, logarithm taking, IFFT (inverse fast Fourier transform), construction of the vocal-tract impulse cepstrum sequence, the vocal-tract impulse response spectrum, and interval sampling.
S3.1, Pre-emphasis
To avoid losing signal information in the FFT, pre-emphasis is applied to boost the energy of the high-frequency part, which also facilitates transmission in a channel; the signal obtained after pre-emphasis is x''(n).
S3.2 amplitude normalization
The amplitude normalization formula is as follows:
z(n) = x''(n) / max(x''(n))   (3)

where z(n) is the normalized signal and max(x''(n)) is the maximum value of the pre-emphasized signal.
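A sketch of the pre-emphasis and amplitude-normalization steps; the first-order pre-emphasis coefficient 0.97 is a conventional choice, not a value given in the patent, and the absolute value in the normalizer is an assumption to guard against negative peaks.

```python
import numpy as np

def preemphasize(x, alpha=0.97):
    # First-order pre-emphasis x''(n) = x'(n) - alpha * x'(n - 1),
    # boosting the high-frequency energy before the FFT.
    return np.append(x[0], x[1:] - alpha * x[:-1])

def normalize(x):
    # Equation (3): z(n) = x''(n) / max(x''(n)).
    return x / np.max(np.abs(x))
```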
S3.3, framing
Because a speech signal is short-time stationary, the normalized signal z(n) is divided into a number of frames; adjacent frames overlap so that the transition between frames is smooth and continuity is preserved. The signal of the i-th frame after framing is z^(i)(n), with frame length E, frame shift K, and F frames in total.
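A sketch of the framing step, using E = 256 and K = 128 as in the embodiment described later:

```python
import numpy as np

def frame_signal(z, E=256, K=128):
    # Split z(n) into F overlapping frames z^(i)(n): frame length E,
    # frame shift K, so adjacent frames overlap by E - K samples.
    F = 1 + (len(z) - E) // K
    return np.stack([z[i * K : i * K + E] for i in range(F)])
```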
S3.4, FFT transformation
The fast Fourier transform converts the time-domain data z^(i)(n) of each frame to frequency-domain data:

z^(i)(k) = FFT[z^(i)(n)], 1 ≤ i ≤ F, 1 ≤ n ≤ E, 1 ≤ k ≤ E   (4)

where z^(i)(k) is the signal after the fast Fourier transform.
S3.5, taking a modulus value
The modulus of each data point of z^(i)(k) is taken, giving |z^(i)(k)|.
S3.6, smoothing
|z^(i)(k)| is smoothed to reduce the noise in the speech, suppressing the noise of the target speech while preserving the speech detail as far as possible. The neighbourhood size is directly related to the smoothing effect: the larger the neighbourhood, the stronger the smoothing, but a neighbourhood that is too large loses edge information and blurs the output speech, so the neighbourhood size must be chosen reasonably. The smoothed signal is y^(i)(k).
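A sketch of steps S3.4 to S3.6 (FFT, modulus, smoothing); the Savitzky-Golay filter with a quadratic fit follows the embodiment below, while the window length is an assumed neighbourhood size to be tuned as discussed above.

```python
import numpy as np
from scipy.signal import savgol_filter

def smoothed_magnitude(frames, window=11, polyorder=2):
    # S3.4: FFT each frame, giving z^(i)(k).
    Z = np.fft.fft(frames, axis=1)
    # S3.5: take the modulus |z^(i)(k)|.
    mag = np.abs(Z)
    # S3.6: Savitzky-Golay smoothing (quadratic polynomial fit per window);
    # `window` is the neighbourhood size whose choice is discussed above.
    return savgol_filter(mag, window, polyorder, axis=1)
```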
S3.7, taking the logarithm
The logarithm of the smoothed signal y^(i)(k) is taken:

s^(i)(k) = log(y^(i)(k))   (5)

where s^(i)(k) is the logarithmic signal.
S3.8, IFFT transformation
The inverse Fourier transform of the logarithmic signal gives the cepstrum; after cepstral analysis, the glottal excitation pulses and the vocal-tract impulse response are easy to separate because they lie in different intervals of the cepstrum:

c^(i)(n) = FT^(-1)[s^(i)(k)]   (6)

where c^(i)(n) is the inverse Fourier transform of the log-magnitude spectrum s^(i)(k).
S3.9, constructing the vocal-tract impulse cepstrum sequence
The vocal-tract impulse cepstrum sequence g^(i)(n) is constructed from c^(i)(n).
S3.10, amplitude spectrum
The vocal-tract impulse cepstrum sequence g^(i)(n) is fast-Fourier-transformed and the real part is taken to obtain the amplitude spectrum:

g^(i)(k) = FFT[g^(i)(n)]   (7)

where g^(i)(k) is the Fourier-transformed spectrum; taking the real part of g^(i)(k) gives the amplitude spectrum r^(i)(k).
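A sketch of steps S3.7 to S3.10. The patent does not state how g^(i)(n) is constructed from c^(i)(n), so low-quefrency liftering with an assumed cutoff L is used here as a standard way to isolate the vocal-tract component; L is a hypothetical parameter.

```python
import numpy as np

def vocal_tract_amplitude_spectrum(y_smooth, L=30):
    # S3.7: logarithm; the small epsilon guards against log(0).
    s = np.log(y_smooth + 1e-12)
    # S3.8: inverse FFT gives the cepstrum c^(i)(n).
    c = np.fft.ifft(s, axis=1).real
    # S3.9: build the vocal-tract impulse cepstrum sequence g^(i)(n) by
    # keeping only the low-quefrency region (assumed cutoff L).
    g = np.zeros_like(c)
    g[:, :L] = c[:, :L]
    g[:, -(L - 1):] = c[:, -(L - 1):]  # symmetric high-index counterpart
    # S3.10: FFT and real part give the amplitude spectrum r^(i)(k).
    return np.fft.fft(g, axis=1).real
```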
S3.11, interval sampling
Since the amplitude spectrum is bilaterally symmetric, only the first half is sampled. The amplitude spectrum is sampled at intervals of B points in accordance with the Nyquist sampling theorem, mainly to reduce the data volume without destroying the speech information, thereby increasing the processing and recognition speed:

y^(i) = [r^(i)(1), r^(i)(B), r^(i)(2B), r^(i)(3B), ..., r^(i)(D)]^T   (8)

where y^(i) is the sampled amplitude spectrum of the i-th frame and D is the index of the last sampled point of r^(i)(k).
The sampled amplitude spectra of all frames are fused to form the fused amplitude-spectrum feature matrix:

Y = [y^(1) y^(2) y^(3) ... y^(i) ... y^(F)]   (9)

where Y is the sampled amplitude-spectrum matrix of the speech segment.
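A sketch of the interval sampling of equation (8) and the fusion into the matrix Y of equation (9); the exact sample positions are an assumption, since the patent's index sequence r^(i)(1), r^(i)(B), r^(i)(2B), ... mixes 1-based conventions.

```python
import numpy as np

def sampled_amplitude_features(r, B=6):
    # The amplitude spectrum is symmetric, so only the first half is used.
    half = r[:, : r.shape[1] // 2]
    # Equation (8): take every B-th point of each frame's half-spectrum.
    y = half[:, ::B]
    # Equation (9): columns y^(1) ... y^(F) form the feature matrix Y.
    return y.T
```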
S4, generating a training model
Referring to FIG. 1, the extracted sampled amplitude spectrum of each language is input to the training model for training to obtain the corresponding language model.
S5, language identification
The trained language models are mounted on a server. Speech to be recognized is collected at the client and input to the server, where it is filtered and resampled; its sampled amplitude-spectrum features are extracted and scored against the trained language models, and the recognition result is output and returned to the client web page.
Drawings
FIG. 1 is a diagram of language training and recognition
FIG. 2 shows local waveforms at different signal-to-noise ratios
FIG. 3 shows the waveform and spectrogram before and after filtering
FIG. 4 shows the waveform and spectrogram of the filtered signal after sampling
FIG. 5 is a flow chart of the spectral feature extraction
FIG. 6 is the amplitude spectrum of one frame
FIG. 7 is the sampled amplitude spectrum
FIG. 8 is a diagram of training the language models with the GMM-UBM model
FIG. 9 is a diagram of language identification between the server and the client
FIG. 10 shows the recognition result returned to the client
Detailed Description
The invention will be further described by means of specific embodiments in conjunction with the accompanying drawings.
S1, test audio data acquisition:
The corpus comes from China Radio International and comprises five languages: Chinese, Tibetan, Uyghur, English and Kazakh. All five languages are recorded as monaural audio files at an 8000 Hz sampling rate, each 10 seconds long.
S2, noise-containing speech generation
Referring to FIG. 2, the waveforms show local speech waveforms at different signal-to-noise ratios. As the SNR decreases, a larger area of the speech waveform is submerged in white noise; at SNR = -5 dB essentially only the locally strong waveform segments stand out, so identification at low SNR is very difficult.
S3, filtering
Referring to FIG. 3, which shows the speech waveform and spectrogram before and after filtering, a Butterworth filter removes the high-frequency part above 1000 Hz and retains the low-frequency information. This reduces part of the noise interference and improves the signal-to-noise ratio by about 7 dB over the original speech.
S4, time-domain interval sampling
Referring to FIG. 4, which shows the speech waveform and spectrum before and after sampling, the time-domain signal is resampled at intervals of 8 points so that the high-frequency part of the filtered signal is folded onto the low-frequency part. Because the superposed spectra of the speech information are unequal while the superposed spectra of the noise are equal, the average signal-to-noise ratio formula shows an improvement of about 5 dB over the SNR before sampling.
The interval sampling formula is

x'(n) = [x(1), x(8), x(16), ..., x(N)], 1 ≤ n ≤ 10000, N = 80000   (10)

where x'(n) is the signal after interval sampling.
S5, extracting the sampled amplitude spectrum
Referring to FIG. 5, the extraction steps are pre-emphasis, amplitude normalization, framing, FFT, modulus taking, smoothing, logarithm taking, IFFT, construction of the vocal-tract impulse cepstrum sequence, amplitude spectrum, and interval sampling.
S5.1, pre-emphasis
S5.2, amplitude normalization
S5.3, framing
The normalized signal z(n) is framed with frame length E = 256 and frame shift K = 128.
S5.4, FFT transformation
S5.5, taking the modulus
S5.6, smoothing
This embodiment uses a Savitzky-Golay filter, which smooths by fitting a quadratic polynomial over each window.
S5.7, taking the logarithm
S5.8, IFFT transformation
S5.9, constructing the vocal-tract impulse cepstrum sequence
S5.10, amplitude spectrum
Referring to FIG. 6, the amplitude spectrum is obtained.
S5.11, interval sampling
Referring to FIG. 7, the sampled amplitude spectrum. Since the amplitude spectrum is bilaterally symmetric, only the first half is sampled. The amplitude spectrum is sampled at intervals of 6 points in accordance with the Nyquist sampling theorem, mainly to reduce the data volume without destroying the speech information:

y^(i) = [r^(i)(1), r^(i)(6), r^(i)(12), r^(i)(18), ..., r^(i)(126)]^T, 1 ≤ i ≤ 78   (11)

where y^(i) is the sampled amplitude spectrum of the i-th frame.
The sampled amplitude spectra of all frames are fused to form the fused amplitude-spectrum feature matrix:

Y = [y^(1) y^(2) y^(3) ... y^(i) ... y^(78)]   (12)

where Y is the sampled amplitude-spectrum matrix of the speech segment.
S6, generating the training model
Referring to FIG. 8, the invention trains the language models with a GMM-UBM language identification system and then mounts the trained models on the server. The GMM-UBM is an improved GMM: its model parameters are estimated by the MAP algorithm, which avoids overfitting and does not require adjusting all parameters of the target GMM; estimating only the mean parameters of the Gaussian components already achieves the best recognition performance, which effectively compensates for scarce training data. In the experiment, 1675 utterances are used as the universal background training corpus and 300 utterances per language as training samples, of which 50 are noise-free and the rest are 50 each at SNRs of 25 dB, 20 dB, 15 dB, 10 dB and 5 dB, which better simulates a real noisy environment. The trained models are mounted on the server side.
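A sketch of GMM-UBM training with mean-only MAP adaptation, as the description prescribes. The component count and relevance factor r are assumed values, and scikit-learn's GaussianMixture stands in for whatever trainer the authors used.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(pooled_features, n_components=64):
    # Universal background model fitted on the pooled multi-language corpus;
    # the component count is an assumed value.
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag")
    ubm.fit(pooled_features)
    return ubm

def map_adapt_means(ubm, lang_features, r=16.0):
    # Mean-only MAP adaptation; r is the customary relevance factor.
    post = ubm.predict_proba(lang_features)        # frame responsibilities
    n_k = post.sum(axis=0)                         # soft counts per component
    ex = (post.T @ lang_features) / np.maximum(n_k[:, None], 1e-10)
    alpha = (n_k / (n_k + r))[:, None]             # data-dependent weight
    return alpha * ex + (1 - alpha) * ubm.means_   # adapted means
```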
S7, example of applying the method of the invention to a single utterance
Referring to FIG. 9, a segment of monaural 10 s speech is collected at random at the client and transmitted to the server. After filtering and sampling, the amplitude-spectrum features are extracted and a scoring decision is made against the trained models; the identification result is then output and returned to the client web page. FIG. 10 shows the identification result returned to the client.
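A sketch of the scoring decision in this step: each language's mean-adapted model scores the utterance's frame features, and the highest-scoring language is returned.

```python
import copy

def identify_language(features, lang_means, ubm):
    # Score the utterance against each language's mean-adapted GMM.
    scores = {}
    for lang, means in lang_means.items():
        gmm = copy.deepcopy(ubm)
        gmm.means_ = means                  # swap in the MAP-adapted means
        scores[lang] = gmm.score(features)  # avg log-likelihood per frame
    return max(scores, key=scores.get)
```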
S8, performance test of the method of the invention on a large number of utterances
For testing, 171 utterances per language are used; corpora at SNRs of -5 dB, 0 dB, 5 dB, 10 dB, 15 dB and 20 dB are then added in turn and identified in separate experiments. The identification results are shown in Table 1.
The experimental results demonstrate that the method is highly robust to noise: even at 0 dB the recognition rate exceeds 70%, as shown in Table 1.
TABLE 1 Identification rates of the five languages at different SNRs using the fused features (unit: %)
[Table 1 is reproduced only as an image in the original publication.]
The above description is only a preferred embodiment of the invention and is not intended to limit it; any modifications, equivalent substitutions and improvements made within the spirit and principles of the invention are intended to fall within its scope of protection.

Claims (8)

1. A language identification method in a low signal-to-noise ratio environment, comprising the following steps:
S1, filtering the speech: filtering out the high-frequency information with a filter to obtain the low-frequency part of the speech signal.
S2, interval sampling of the filtered signal: sampling the filtered signal at intervals of A points in the time domain using a resampling technique.
S3, preprocessing: preprocessing the sampled signal, including pre-emphasis, amplitude normalization and framing.
S4, extracting the amplitude spectrum: extracting the amplitude spectrum of the preprocessed signal.
S5, sampling the amplitude spectrum: sampling the extracted amplitude spectrum at intervals of K points according to the Nyquist sampling theorem.
S6, generating the training model: inputting the extracted amplitude spectrum of each language to the training model for training to obtain the corresponding language model.
S7, language identification: mounting the trained language model on a server, collecting the speech data to be recognized at a client and inputting it to the server for scoring-decision recognition, and outputting the recognition result and returning it to the client web page.
2. The language identification method in a low signal-to-noise ratio environment according to claim 1, wherein:
the high-frequency information of the speech is filtered out with a filter to obtain the low-frequency part of the speech signal.
3. The language identification method in a low signal-to-noise ratio environment according to claim 1, wherein: the filtered signal is sampled at intervals of A points in the time domain using a resampling technique.
4. The language identification method in a low signal-to-noise ratio environment according to claim 1, wherein: the sampled signal is preprocessed, including pre-emphasis, amplitude normalization and framing.
5. The language identification method in a low signal-to-noise ratio environment according to claim 1, wherein: the preprocessed signal undergoes FFT, modulus taking, smoothing, logarithm taking and IFFT, and the vocal-tract impulse cepstrum sequence and its spectrum are constructed to obtain the amplitude spectrum.
6. The language identification method in a low signal-to-noise ratio environment according to claim 1, wherein: the extracted amplitude spectrum is sampled at intervals of K points according to the Nyquist sampling theorem.
7. The language identification method in a low signal-to-noise ratio environment according to claim 1, wherein: the extracted sampled amplitude-spectrum features of each language are input to the training model for training to obtain the corresponding language model.
8. The language identification method in a low signal-to-noise ratio environment according to claim 1, wherein: the speech is filtered and sampled, the sampled amplitude-spectrum features are extracted and scored against the trained language model, and finally the recognition result is output and returned to the client.
CN202011154863.6A 2020-10-26 2020-10-26 Language identification method suitable for low signal-to-noise ratio environment Pending CN112420018A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011154863.6A CN112420018A (en) 2020-10-26 2020-10-26 Language identification method suitable for low signal-to-noise ratio environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011154863.6A CN112420018A (en) 2020-10-26 2020-10-26 Language identification method suitable for low signal-to-noise ratio environment

Publications (1)

Publication Number Publication Date
CN112420018A true CN112420018A (en) 2021-02-26

Family

ID=74841679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011154863.6A Pending CN112420018A (en) 2020-10-26 2020-10-26 Language identification method suitable for low signal-to-noise ratio environment

Country Status (1)

Country Link
CN (1) CN112420018A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160796A (en) * 2021-04-28 2021-07-23 北京中科模识科技有限公司 Language identification method, device, equipment and storage medium of broadcast audio
CN114548221A (en) * 2022-01-17 2022-05-27 苏州大学 Generation type data enhancement method and system for small sample unbalanced voice database

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120116756A1 (en) * 2010-11-10 2012-05-10 Sony Computer Entertainment Inc. Method for tone/intonation recognition using auditory attention cues
CN110111769A (en) * 2019-04-28 2019-08-09 深圳信息职业技术学院 A kind of cochlear implant control method, device, readable storage medium storing program for executing and cochlear implant
CN110223134A (en) * 2019-04-28 2019-09-10 平安科技(深圳)有限公司 Products Show method and relevant device based on speech recognition
CN110827793A (en) * 2019-10-21 2020-02-21 成都大公博创信息技术有限公司 Language identification method
CN110853632A (en) * 2018-08-21 2020-02-28 蔚来汽车有限公司 Voice recognition method based on voiceprint information and intelligent interaction equipment
CN111161713A (en) * 2019-12-20 2020-05-15 北京皮尔布莱尼软件有限公司 Voice gender identification method and device and computing equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120116756A1 (en) * 2010-11-10 2012-05-10 Sony Computer Entertainment Inc. Method for tone/intonation recognition using auditory attention cues
CN110853632A (en) * 2018-08-21 2020-02-28 蔚来汽车有限公司 Voice recognition method based on voiceprint information and intelligent interaction equipment
CN110111769A (en) * 2019-04-28 2019-08-09 深圳信息职业技术学院 A kind of cochlear implant control method, device, readable storage medium storing program for executing and cochlear implant
CN110223134A (en) * 2019-04-28 2019-09-10 平安科技(深圳)有限公司 Products Show method and relevant device based on speech recognition
CN110827793A (en) * 2019-10-21 2020-02-21 成都大公博创信息技术有限公司 Language identification method
CN111161713A (en) * 2019-12-20 2020-05-15 北京皮尔布莱尼软件有限公司 Voice gender identification method and device and computing equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孙裕晶 (Sun Yujing), Design and Application of Agricultural Engineering Test Systems, 31 January 2011, pages 106-107 *
钟东 (Zhong Dong), Signals and Systems, 31 January 2018, pages 83-86 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160796A (en) * 2021-04-28 2021-07-23 北京中科模识科技有限公司 Language identification method, device, equipment and storage medium of broadcast audio
CN113160796B (en) * 2021-04-28 2023-08-08 北京中科模识科技有限公司 Language identification method, device and equipment for broadcast audio and storage medium
CN114548221A (en) * 2022-01-17 2022-05-27 苏州大学 Generation type data enhancement method and system for small sample unbalanced voice database

Similar Documents

Publication Publication Date Title
CN102054480B (en) Method for separating monaural overlapping speeches based on fractional Fourier transform (FrFT)
CN108447495A (en) A kind of deep learning sound enhancement method based on comprehensive characteristics collection
Alsteris et al. Further intelligibility results from human listening tests using the short-time phase spectrum
CN109256127B (en) Robust voice feature extraction method based on nonlinear power transformation Gamma chirp filter
CN113012720B (en) Depression detection method by multi-voice feature fusion under spectral subtraction noise reduction
CN105825852A (en) Oral English reading test scoring method
CN104183245A (en) Method and device for recommending music stars with tones similar to those of singers
CN112420018A (en) Language identification method suitable for low signal-to-noise ratio environment
CN107767859A (en) The speaker's property understood detection method of artificial cochlea's signal under noise circumstance
Zhou et al. Classification of speech under stress based on features derived from the nonlinear Teager energy operator
CN107785028B (en) Voice noise reduction method and device based on signal autocorrelation
CN108198545A (en) A kind of audio recognition method based on wavelet transformation
CN113436606B (en) Original sound speech translation method
CN107464563B (en) Voice interaction toy
CN109036470A (en) Speech differentiation method, apparatus, computer equipment and storage medium
Chennupati et al. Significance of phase in single frequency filtering outputs of speech signals
CN108281150B (en) Voice tone-changing voice-changing method based on differential glottal wave model
CN105845126A (en) Method for automatic English subtitle filling of English audio image data
CN113963713A (en) Audio noise reduction method and device
CN113744715A (en) Vocoder speech synthesis method, device, computer equipment and storage medium
TW582024B (en) Method and system for determining reliable speech recognition coefficients in noisy environment
Chi et al. Multiband analysis and synthesis of spectro-temporal modulations of Fourier spectrogram
CN114401168B (en) Voice enhancement method applicable to short wave Morse signal under complex strong noise environment
CN112331178A (en) Language identification feature fusion method used in low signal-to-noise ratio environment
CN106997766B (en) Homomorphic filtering speech enhancement method based on broadband noise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210226