CN112420018A - Language identification method suitable for low signal-to-noise ratio environment - Google Patents
Language identification method suitable for low signal-to-noise ratio environment
- Publication number
- CN112420018A CN112420018A CN202011154863.6A CN202011154863A CN112420018A CN 112420018 A CN112420018 A CN 112420018A CN 202011154863 A CN202011154863 A CN 202011154863A CN 112420018 A CN112420018 A CN 112420018A
- Authority
- CN
- China
- Prior art keywords
- signal
- sampling
- amplitude spectrum
- voice
- low
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/005—Language recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
Abstract
The invention discloses a language identification method suited to low signal-to-noise ratio (SNR) environments, addressing the poor recognition rates of existing methods at low SNR; it belongs to the field of speech recognition. The method processes noisy speech with filtering and resampling and then extracts an amplitude spectrum. It mainly comprises the following parts: first, a low-pass filter removes the high-frequency band; the signal is then resampled; the resampled signal is preprocessed, an amplitude spectrum is extracted, and the amplitude spectrum is itself resampled to obtain the sampled amplitude-spectrum feature for low-SNR speech. The extracted features are input to a training model to obtain a model for each language, and the trained models are deployed on a server. A client collects the speech to be recognized and sends it to the server, where features are extracted and scored against the trained language models; the recognition result is then returned to the client. Experimental tests show that applying the speech features extracted by this method to language identification improves overall recognition accuracy while keeping recognition very fast.
Description
Technical Field
The invention relates to a language identification method for low signal-to-noise ratio environments, and belongs to the field of speech recognition.
Background
In recent years, cross-border communication has grown rapidly. As more countries join Silk Road cooperation, the benefits of international cooperation attract still more participants, which in turn raises new problems. The most pressing is the language barrier, which hinders communication and cooperation. Although machine translation now performs well, the absence of language identification at the front end forces users to switch the translation source language manually, so language identification is an important research problem; identification at low signal-to-noise ratio in particular has long been a difficult open issue. The introduction of machine learning has advanced the technology considerably, but performance at low SNR remains very limited and worldwide deployment is still uncertain, so language recognition at low SNR needs further improvement.
Disclosure of Invention
The technical problem addressed by the invention is extracting effective features in a low signal-to-noise ratio environment. A filter is introduced at the front end to remove the high-frequency information and obtain the low-frequency portion of the signal, which is then sampled at every A-th point in the time domain. The sampled signal is pre-emphasized, amplitude-normalized, and framed; each frame then undergoes FFT (fast Fourier transform), modulus extraction, smoothing, logarithm extraction, IFFT (inverse fast Fourier transform), construction of the vocal tract impulse cepstrum sequence, and computation of the vocal tract impulse response spectrum to obtain an amplitude spectrum, which is in turn sampled according to the Nyquist sampling theorem to obtain the sampled amplitude spectrum. Finally, the amplitude-spectrum features of each language are input to a training model to train the corresponding language model, and the trained models are deployed on the server. Speech to be recognized is collected by the client and sent to the server, where it is low-pass filtered and resampled; its sampled amplitude-spectrum features are then extracted and scored against the trained language models, and the recognition result is returned to the client web page. Feature extraction and identification were carried out in simulation software and achieved good recognition results. To solve the above technical problems, the invention adopts the following technical scheme: a language identification method for low signal-to-noise ratio environments, comprising the following steps:
s1, filtering
The principle is as follows:
Observation and statistics of spectrograms show that, for noisy audio, only the low-frequency energy remains visible: most high-frequency energy is masked by noise. Since most of the speech energy is concentrated in the low-frequency band while the noise energy is concentrated mainly in the high-frequency band, filtering out the high-frequency band reduces part of the noise interference and raises the signal-to-noise ratio relative to the original speech.
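The low-pass filtering described above can be sketched as follows. This is a minimal illustration only: the Butterworth design, 8000 Hz sampling rate, and 1000 Hz cutoff are taken from the embodiment later in the text, while the filter order is an arbitrary assumption.

```python
import numpy as np
from scipy.signal import butter, lfilter

def lowpass(x, fs=8000, cutoff=1000, order=6):
    """Butterworth low-pass filter: keep the low band where speech energy
    is concentrated, attenuate the noise-dominated high band."""
    b, a = butter(order, cutoff / (fs / 2), btype="low")
    return lfilter(b, a, x)

# Test tone: a 200 Hz "speech" component plus a 3000 Hz "noise" component.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 3000 * t)
y = lowpass(x, fs)
# After filtering, the 3000 Hz component is strongly attenuated while
# the 200 Hz component passes through almost unchanged.
```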
S2 time domain alternate sampling
The principle is as follows:
Every-A-point sampling is applied to the time-domain signal by a resampling technique, so that the high-frequency part of the filtered signal is superposed onto the low-frequency part. Because the superposed speech spectra are unequal while the superposed noise spectra are equal, the average signal-to-noise ratio formula shows that the SNR is improved relative to before sampling. Noisy speech is defined as x(n) = s(n) + w(n), and the signal-to-noise ratio is given by:
SNR = 10·lg( Σ s²(n) / Σ w²(n) ), with the sums taken over n = 1, …, H (1)
where Σ s²(n) denotes the signal energy, Σ w²(n) denotes the white-noise energy, s(n) is the original speech, w(n) is zero-mean white Gaussian noise, and H is the total number of sampling points of the whole utterance.
The every-A-point sampling formula is defined as follows:
x′(n) = [x(1), x(A), x(2A), …, x(H)], 1 ≤ n ≤ ⌊H/A⌋ (2)
where x′(n) is the signal sampled at every A-th point and ⌊H/A⌋ (the integer part) is the total number of retained sampling points.
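The every-A-point sampling and the average SNR formula above can be sketched as follows; the choice A = 8 follows the embodiment, and the toy sine-plus-noise signal is an illustrative assumption.

```python
import numpy as np

def snr_db(s, w):
    """Average SNR of noisy speech x(n) = s(n) + w(n):
    10*log10(sum s^2 / sum w^2) over all H sampling points, as in eq. (1)."""
    return 10 * np.log10(np.sum(s ** 2) / np.sum(w ** 2))

def sample_every(x, A):
    """Every-A-point time-domain sampling: keeps floor(H/A) points."""
    return x[::A]

rng = np.random.default_rng(0)
H = 8000
s = np.sin(2 * np.pi * 5 * np.arange(H) / H)   # stand-in "speech" signal
w = rng.normal(0.0, 0.5, H)                    # zero-mean white Gaussian noise
x = s + w
x_sub = sample_every(x, 8)                     # A = 8, as in the embodiment
```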
S3, extracting a sampling amplitude spectrum
The extraction comprises pre-emphasis, amplitude normalization, framing, FFT (fast Fourier transform), modulus extraction, smoothing, logarithm extraction, IFFT (inverse fast Fourier transform), construction of the vocal tract impulse cepstrum sequence, computation of the vocal tract impulse response spectrum, and every-B-point sampling.
S3.1, Pre-emphasis
To avoid losing signal information in the FFT, pre-emphasis is applied to boost the energy of the high-frequency part, which also eases transmission through a channel. The signal obtained after pre-emphasis is x″(n).
S3.2 amplitude normalization
The amplitude normalization formula is as follows:
z(n) = x″(n) / max(x″(n)) (3)
where z(n) is the normalized signal and max(x″(n)) is the maximum of the pre-emphasized signal values.
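Pre-emphasis and the normalization step can be sketched as below. The first-order pre-emphasis filter with coefficient 0.97 is a common convention and an assumption here, since the text does not give the pre-emphasis formula itself.

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    # Boost the high-frequency band: x''(n) = x'(n) - alpha * x'(n-1).
    # (The coefficient 0.97 is a conventional choice, not from the text.)
    return np.append(x[0], x[1:] - alpha * x[:-1])

def normalize(x):
    # Eq. (3): z(n) = x''(n) / max(x''(n)), so the amplitude lies in [-1, 1].
    return x / np.max(np.abs(x))

x = np.sin(2 * np.pi * np.arange(800) / 80.0)
z = normalize(preemphasis(x))
```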
S3.3, framing
The normalized signal z(n) is divided into frames. Because a speech signal is only short-time stationary, a segment of speech is split into several frames, with adjacent frames overlapping; the overlap makes the transition between frames smooth and preserves continuity. The i-th frame after framing is z^(i)(n); the frame length is E, the frame shift is K, and there are F frames in total.
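The overlapping framing step can be sketched as follows, assuming the frame length E = 256 and frame shift K = 128 given later in the embodiment.

```python
import numpy as np

def frame_signal(z, E=256, K=128):
    """Split z(n) into F overlapping frames: frame length E, frame shift K,
    so adjacent frames overlap by E - K samples."""
    F = 1 + (len(z) - E) // K
    return np.stack([z[i * K : i * K + E] for i in range(F)])

frames = frame_signal(np.arange(1024, dtype=float))
# 1024 samples with E=256, K=128 give F = 1 + (1024-256)//128 = 7 frames.
```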
S3.4, FFT transformation
The time-domain data z^(i)(n) is transformed into frequency-domain data with the fast Fourier transform:
z^(i)(k) = FFT[z^(i)(n)], 1 ≤ i ≤ F, 1 ≤ n ≤ E, 1 ≤ k ≤ E (4)
where z^(i)(k) is the signal after the fast Fourier transform.
S3.5, taking a modulus value
Each data point of z^(i)(k) is replaced by its modulus, giving |z^(i)(k)|.
S3.5, smoothing
|z^(i)(k)| is smoothed to reduce noise: the noise in the target speech is suppressed while preserving as much speech detail as possible. The neighborhood size is directly related to the smoothing effect: a larger neighborhood smooths more strongly, but an overly large neighborhood loses edge information and blurs the output speech, so the neighborhood size must be chosen reasonably. The smoothed signal is y^(i)(k).
S3.6, taking logarithm
The logarithm of the smoothed signal y^(i)(k) is taken:
s^(i)(k) = log(y^(i)(k)) (5)
where s^(i)(k) is the logarithmic signal.
s3.7 IFFT transformation
The logarithmic signal is inverse-Fourier-transformed, i.e., the cepstrum is taken. After cepstral analysis, the glottal excitation pulses and the vocal tract impulse response are easy to separate, since they lie in different regions of the cepstrum:
c^(i)(n) = FT⁻¹[s^(i)(k)] (6)
where c^(i)(n) is the inverse Fourier transform of the log-amplitude spectrum s^(i)(k).
S3.8, constructing the sound channel impulse cepstrum sequence
A vocal tract impulse cepstrum sequence g^(i)(n) is constructed from c^(i)(n).
S3.9, amplitude spectrum
The vocal tract impulse cepstrum sequence g^(i)(n) is fast-Fourier-transformed, and the real part is taken to obtain the amplitude spectrum:
g^(i)(k) = FFT[g^(i)(n)] (7)
where g^(i)(k) is the Fourier-transformed spectrum; taking its real part gives the amplitude spectrum r^(i)(k).
S3.10, alternate sampling
Since the amplitude spectrum is bilaterally symmetric, only its first half is sampled. Every-B-point sampling is applied to the amplitude spectrum in accordance with the Nyquist law, mainly to reduce the data volume without destroying the speech information, thereby speeding up feature extraction and recognition:
y^(i) = [r^(i)(1), r^(i)(B), r^(i)(2B), r^(i)(3B), …, r^(i)(D)]^T (8)
where y^(i) is the sampled amplitude spectrum of the i-th frame and D is the index in r^(i)(k) of the last sampling point.
The sampled amplitude spectra of the frames are concatenated to form the fused amplitude-spectrum feature matrix:
Y = [y^(1) y^(2) y^(3) … y^(i) … y^(F)] (9)
where Y is the sampled amplitude spectrum matrix of one speech segment.
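The whole of step S3 for one frame, from the FFT through the every-B-point sampling, can be sketched end to end. Two details are assumptions, since the text does not specify them: the smoother (a simple moving average here) and the construction of the vocal tract impulse cepstrum sequence (modeled here as low-time liftering that keeps the first L cepstral coefficients); B = 6 follows the embodiment.

```python
import numpy as np

def frame_sampled_spectrum(frame, L=32, B=6):
    Z = np.fft.fft(frame)                          # FFT
    mag = np.abs(Z)                                # modulus
    sm = np.convolve(mag, np.ones(5) / 5, "same")  # smoothing (moving average, assumed)
    c = np.fft.ifft(np.log(sm + 1e-12)).real       # logarithm, then IFFT -> cepstrum
    g = np.zeros_like(c)                           # low-time lifter (assumed) keeps the
    g[:L], g[-(L - 1):] = c[:L], c[-(L - 1):]      # vocal-tract part of the cepstrum
    r = np.fft.fft(g).real                         # real part -> amplitude spectrum
    half = r[: len(r) // 2]                        # spectrum is bilaterally symmetric
    return half[::B]                               # every-B-point sampling

y_i = frame_sampled_spectrum(np.random.default_rng(1).normal(size=256))
```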
S4, generating a training model
Referring to fig. 1, the extracted sampling magnitude spectrum of each language is input to a training model for training, so as to obtain a corresponding language model.
S5, language identification
The invention provides a language identification method suitable for a low signal-to-noise ratio environment.
Drawings
FIG. 1 is a diagram of the language training and recognition process
FIG. 2 is a graph of local waveforms for different signal-to-noise ratios
FIG. 3 is a waveform diagram and spectrogram before and after filtering
FIG. 4 is a waveform diagram and spectrogram of a filtered signal sample
FIG. 5 is a flow chart of spectral feature extraction
FIG. 6 is a frame amplitude spectrum
FIG. 7 is a sampled amplitude spectrum
FIG. 8 is a diagram of GMM-UBM model training language model
FIG. 9 is a diagram of language identification between a server and a client
FIG. 10 is a diagram of client returned recognition results
Detailed Description
The invention will be further described by means of specific embodiments in conjunction with the accompanying drawings.
S1, test audio data acquisition:
The corpus comes from China Radio International and mainly comprises five languages: Chinese, Tibetan, Uyghur, English, and Kazakh. All five languages use single-channel recordings at an 8000 Hz sampling rate, with each audio file 10 seconds long.
S2, noise-containing speech generation
Referring to fig. 2, each waveform shows a local speech waveform at a different signal-to-noise ratio. As the SNR decreases, a larger portion of the speech waveform is submerged by white noise; at an SNR of −5 dB only the locally strong parts of the waveform still stand out, so identification at low SNR is very difficult.
S3, filtering
Referring to fig. 3, which shows the speech waveform and spectrogram before and after filtering, a Butterworth filter removes the components above 1000 Hz while retaining the low-frequency information. This reduces part of the noise interference and improves the SNR by about 7 dB relative to the original speech.
S2 time domain alternate sampling
Referring to fig. 4, which shows the waveform and spectrum before and after sampling, the time-domain signal is resampled at every 8th point so that the high-frequency part of the filtered signal is superposed onto the low-frequency part. Because the superposed speech spectra are unequal while the superposed noise spectra are equal, the average SNR formula shows an improvement of about 5 dB over the pre-sampling SNR.
The alternate sampling formula is defined as follows:
x′(n)=[x(1),x(8),x(16),...,x(N)],1≤n≤10000,N=80000 (10)
where x′(n) is the signal after every-8-point sampling.
S3, extracting a sampling amplitude spectrum
Referring to fig. 5, the extraction steps are pre-emphasis, amplitude normalization, framing, FFT, modulus extraction, smoothing, logarithm extraction, IFFT, construction of the vocal tract impulse cepstrum sequence, amplitude spectrum computation, and every-B-point sampling.
S3.1, Pre-emphasis
S3.2 amplitude normalization
S3.3, framing
The normalized signal z(n) is framed with frame length E = 256 and frame shift K = 128.
S3.4, FFT transformation
S3.5, taking a modulus value
S3.5, smoothing
This embodiment uses a Savitzky-Golay filter, which smooths by fitting a quadratic polynomial over each sliding window.
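The Savitzky-Golay smoothing mentioned here can be reproduced with `scipy.signal.savgol_filter`; the window length of 11 is an illustrative assumption, since the embodiment does not state it.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(2)
mag = np.abs(np.fft.fft(rng.normal(size=256)))   # a noisy magnitude curve

# Savitzky-Golay: fit a quadratic polynomial (polyorder=2) over each
# sliding window, matching the quadratic fit described in the text.
smooth = savgol_filter(mag, window_length=11, polyorder=2)
```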
S3.6, taking logarithm
S3.7 IFFT transformation
S3.8, constructing the sound channel impulse cepstrum sequence
S3.9, amplitude spectrum
Referring to fig. 6, a magnitude spectrum is obtained.
S3.10, alternate sampling
See fig. 7 for the sampled amplitude spectrum. Since the amplitude spectrum is bilaterally symmetric, only its first half is sampled. Every-6-point sampling is applied to the amplitude spectrum in accordance with the Nyquist law, mainly to reduce the data volume without destroying the speech information:
y^(i) = [r^(i)(1), r^(i)(6), r^(i)(12), r^(i)(18), …, r^(i)(126)]^T, 1 ≤ i ≤ 78 (11)
where y^(i) is the sampled amplitude spectrum of the i-th frame.
The sampled amplitude spectra of the frames are concatenated to form the fused amplitude-spectrum feature matrix:
Y = [y^(1) y^(2) y^(3) … y^(i) … y^(78)] (12)
where Y is the sampled amplitude spectrum matrix of one speech segment.
S4, generating a training model
Referring to fig. 8, the invention trains the language models with a GMM-UBM language identification system and then deploys the trained models on the server. GMM-UBM is an improved form of the GMM: model parameters are estimated with the MAP algorithm, which avoids overfitting and does not require adjusting all parameters of the target GMM; good recognition performance is obtained by estimating only the mean parameters of the Gaussian components, which effectively compensates for scarce training data. In the experiment, 1675 utterances are used as the universal background training corpus, and 300 utterances per language are used as training samples: 50 clean utterances plus 50 each at SNRs of 25 dB, 20 dB, 15 dB, 10 dB, and 5 dB, to better simulate real noisy environments. The trained models are then deployed on the server side.
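A minimal, illustrative sketch of GMM-UBM training with mean-only MAP adaptation using scikit-learn. The synthetic 2-D features, the component count, and the relevance factor r = 16 are all assumptions for this toy example; the real system would use the sampled amplitude-spectrum features described above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Synthetic 2-D stand-ins for per-frame features of two languages.
feats_a = rng.normal([0, 0], 1.0, (500, 2))
feats_b = rng.normal([4, 4], 1.0, (500, 2))
# UBM trained on the pooled data of all languages.
ubm = GaussianMixture(n_components=4, random_state=0).fit(np.vstack([feats_a, feats_b]))

def map_adapt_means(ubm, X, r=16.0):
    """Mean-only MAP adaptation: shift each UBM mean toward the data mean
    in proportion to its soft occupancy count (relevance factor r)."""
    post = ubm.predict_proba(X)                        # component responsibilities
    n = post.sum(axis=0)
    Ex = (post.T @ X) / np.maximum(n, 1e-8)[:, None]   # per-component data mean
    alpha = (n / (n + r))[:, None]
    gmm = GaussianMixture(n_components=ubm.n_components)
    gmm.weights_ = ubm.weights_                        # weights/covariances kept from UBM
    gmm.covariances_ = ubm.covariances_
    gmm.precisions_cholesky_ = ubm.precisions_cholesky_
    gmm.means_ = alpha * Ex + (1 - alpha) * ubm.means_
    return gmm

models = {"A": map_adapt_means(ubm, feats_a), "B": map_adapt_means(ubm, feats_b)}
test_utt = rng.normal([0, 0], 1.0, (50, 2))
scores = {name: m.score(test_utt) for name, m in models.items()}
best = max(scores, key=scores.get)   # the highest-scoring language model wins
```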
S5 example of application of the method of the invention to single speech
Referring to fig. 9, a 10 s single-channel speech segment is collected at the client and transmitted to the server, where it is filtered and sampled, the amplitude-spectrum features are extracted, and a scoring decision is made against the trained models; the recognition result is then output and returned to the client web page. Fig. 10 shows the recognition result returned to the client.
S6 test performance examples of a large number of voices by using the method of the invention
For each language, 171 test utterances are used; corpora at SNRs of −5 dB, 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB are then added in turn for separate identification experiments. The identification results are shown in Table 1.
The experimental results show that the method is highly robust to noise: at 0 dB the recognition rate exceeds 70%, as shown in Table 1.
TABLE 1 identification rate of five languages with different SNR by fusing features
(Unit/%)
The above description is only a preferred embodiment of the present invention, and should not be taken as limiting the invention, and all changes, equivalents, and improvements that come within the spirit and principle of the invention are intended to be embraced therein.
Claims (8)
1. A language identification method in a low signal-to-noise ratio environment, comprising the following steps:
S1, filtering the speech: filter out the high-frequency information with a filter to obtain the low-frequency speech signal.
S2, every-A-point sampling of the filtered signal: sample the filtered signal at every A-th point in the time domain using a resampling technique.
S3, preprocessing: preprocess the sampled signal, including pre-emphasis, amplitude normalization, and framing.
S4, amplitude spectrum extraction: extract the amplitude spectrum of the preprocessed signal.
S5, sampling of the amplitude spectrum: sample the extracted amplitude spectrum at every B-th point according to the Nyquist sampling theorem.
S6, generating the training model: input the extracted amplitude spectrum of each language into the training model to obtain the corresponding language model.
S7, language identification: deploy the trained language model on a server, collect the speech data to be recognized at a client and send it to the server for scoring-decision recognition, and output the recognition result and return it to the client web page.
2. The method according to claim 1, wherein: the high-frequency information of the speech is filtered out with a filter to obtain the low-frequency speech signal.
3. The method according to claim 1, wherein: the filtered signal is sampled at every A-th point in the time domain using a resampling technique.
4. The method according to claim 1, wherein: the sampled signal is preprocessed, including pre-emphasis, amplitude normalization, and framing.
5. The method according to claim 1, wherein: the preprocessed signal undergoes FFT, modulus extraction, smoothing, logarithm extraction, IFFT, construction of the vocal tract impulse cepstrum sequence, and amplitude spectrum computation to obtain the amplitude spectrum.
6. The method according to claim 1, wherein: the extracted amplitude spectrum is sampled at every B-th point according to the Nyquist sampling theorem.
7. The method according to claim 1, wherein: the extracted sampled amplitude-spectrum features of each language are input into the training model to obtain the corresponding language model.
8. The method according to claim 1, wherein: the speech is filtered and sampled, the sampled amplitude-spectrum features are extracted and scored against the trained language model, and the recognition result is output and returned to the client.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011154863.6A CN112420018A (en) | 2020-10-26 | 2020-10-26 | Language identification method suitable for low signal-to-noise ratio environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011154863.6A CN112420018A (en) | 2020-10-26 | 2020-10-26 | Language identification method suitable for low signal-to-noise ratio environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112420018A true CN112420018A (en) | 2021-02-26 |
Family
ID=74841679
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011154863.6A Pending CN112420018A (en) | 2020-10-26 | 2020-10-26 | Language identification method suitable for low signal-to-noise ratio environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112420018A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113160796A (en) * | 2021-04-28 | 2021-07-23 | 北京中科模识科技有限公司 | Language identification method, device, equipment and storage medium of broadcast audio |
CN114548221A (en) * | 2022-01-17 | 2022-05-27 | 苏州大学 | Generation type data enhancement method and system for small sample unbalanced voice database |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120116756A1 (en) * | 2010-11-10 | 2012-05-10 | Sony Computer Entertainment Inc. | Method for tone/intonation recognition using auditory attention cues |
CN110111769A (en) * | 2019-04-28 | 2019-08-09 | 深圳信息职业技术学院 | A kind of cochlear implant control method, device, readable storage medium storing program for executing and cochlear implant |
CN110223134A (en) * | 2019-04-28 | 2019-09-10 | 平安科技(深圳)有限公司 | Products Show method and relevant device based on speech recognition |
CN110827793A (en) * | 2019-10-21 | 2020-02-21 | 成都大公博创信息技术有限公司 | Language identification method |
CN110853632A (en) * | 2018-08-21 | 2020-02-28 | 蔚来汽车有限公司 | Voice recognition method based on voiceprint information and intelligent interaction equipment |
CN111161713A (en) * | 2019-12-20 | 2020-05-15 | 北京皮尔布莱尼软件有限公司 | Voice gender identification method and device and computing equipment |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120116756A1 (en) * | 2010-11-10 | 2012-05-10 | Sony Computer Entertainment Inc. | Method for tone/intonation recognition using auditory attention cues |
CN110853632A (en) * | 2018-08-21 | 2020-02-28 | 蔚来汽车有限公司 | Voice recognition method based on voiceprint information and intelligent interaction equipment |
CN110111769A (en) * | 2019-04-28 | 2019-08-09 | 深圳信息职业技术学院 | A kind of cochlear implant control method, device, readable storage medium storing program for executing and cochlear implant |
CN110223134A (en) * | 2019-04-28 | 2019-09-10 | 平安科技(深圳)有限公司 | Products Show method and relevant device based on speech recognition |
CN110827793A (en) * | 2019-10-21 | 2020-02-21 | 成都大公博创信息技术有限公司 | Language identification method |
CN111161713A (en) * | 2019-12-20 | 2020-05-15 | 北京皮尔布莱尼软件有限公司 | Voice gender identification method and device and computing equipment |
Non-Patent Citations (2)
Title |
---|
Sun Yujing, "Design and Application of Agricultural Engineering Test Systems", 31 January 2011, pages 106-107 *
Zhong Dong, "Signals and Systems", 31 January 2018, pages 83-86 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113160796A (en) * | 2021-04-28 | 2021-07-23 | 北京中科模识科技有限公司 | Language identification method, device, equipment and storage medium of broadcast audio |
CN113160796B (en) * | 2021-04-28 | 2023-08-08 | 北京中科模识科技有限公司 | Language identification method, device and equipment for broadcast audio and storage medium |
CN114548221A (en) * | 2022-01-17 | 2022-05-27 | 苏州大学 | Generation type data enhancement method and system for small sample unbalanced voice database |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102054480B (en) | Method for separating monaural overlapping speeches based on fractional Fourier transform (FrFT) | |
CN108447495A (en) | A kind of deep learning sound enhancement method based on comprehensive characteristics collection | |
Alsteris et al. | Further intelligibility results from human listening tests using the short-time phase spectrum | |
CN109256127B (en) | Robust voice feature extraction method based on nonlinear power transformation Gamma chirp filter | |
CN113012720B (en) | Depression detection method by multi-voice feature fusion under spectral subtraction noise reduction | |
CN105825852A (en) | Oral English reading test scoring method | |
CN104183245A (en) | Method and device for recommending music stars with tones similar to those of singers | |
CN112420018A (en) | Language identification method suitable for low signal-to-noise ratio environment | |
CN107767859A (en) | The speaker's property understood detection method of artificial cochlea's signal under noise circumstance | |
Zhou et al. | Classification of speech under stress based on features derived from the nonlinear Teager energy operator | |
CN107785028B (en) | Voice noise reduction method and device based on signal autocorrelation | |
CN108198545A (en) | A kind of audio recognition method based on wavelet transformation | |
CN113436606B (en) | Original sound speech translation method | |
CN107464563B (en) | Voice interaction toy | |
CN109036470A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
Chennupati et al. | Significance of phase in single frequency filtering outputs of speech signals | |
CN108281150B (en) | Voice tone-changing voice-changing method based on differential glottal wave model | |
CN105845126A (en) | Method for automatic English subtitle filling of English audio image data | |
CN113963713A (en) | Audio noise reduction method and device | |
CN113744715A (en) | Vocoder speech synthesis method, device, computer equipment and storage medium | |
TW582024B (en) | Method and system for determining reliable speech recognition coefficients in noisy environment | |
Chi et al. | Multiband analysis and synthesis of spectro-temporal modulations of Fourier spectrogram | |
CN114401168B (en) | Voice enhancement method applicable to short wave Morse signal under complex strong noise environment | |
CN112331178A (en) | Language identification feature fusion method used in low signal-to-noise ratio environment | |
CN106997766B (en) | Homomorphic filtering speech enhancement method based on broadband noise |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210226 |