CN115602190A - Forged voice detection algorithm and system based on main body filtering - Google Patents
- Publication number: CN115602190A
- Application number: CN202211217858.4A
- Authority
- CN
- China
- Prior art keywords
- filtering
- masking
- voice
- spectrogram
- amplitude
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the analysis technique
- G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters
- G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use, for comparison or discrimination
Abstract
The robustness of existing forged voice detection methods in re-encoding and noise-mismatch scenarios is weak, and existing research on forged voice detection has proposed data augmentation of the training data set to improve this robustness. However, the data augmentation strategy increases the amount of training data, reduces model training efficiency, and applies only to scenarios with known coding algorithms and noise differences. The invention relates to the field of forged voice detection, in particular to forged voice detection in re-encoding and noise-interference scenarios, and provides a forged voice detection algorithm and system based on main body filtering.
Description
Technical field:
The invention relates to the field of forged voice detection, in particular to forged voice detection in re-encoding and noise-interference scenarios, and more particularly to a forged voice detection algorithm and system based on main body filtering.
Technical background:
The non-robustness of a forged voice detection model means that when the data set used to train the detection model and the data set used to evaluate it are mismatched, detection performance drops sharply. The mismatch between training and evaluation data sets can be divided into several scenarios: speaker mismatch, forgery-algorithm mismatch, re-encoding mismatch, noise-interference mismatch, and so on. The speaker mismatch scenario means that the evaluation data set contains speakers that do not exist in the training data set; the forgery-algorithm mismatch scenario means that the forged voices in the evaluation data set were produced with speech synthesis algorithms not used when the training data set was constructed; the re-encoding mismatch scenario means that the voice data in the evaluation data set may have been processed by several unknown coding algorithms, while the voice data in the training data set is either uncoded or processed by a limited set of algorithms; the noise-interference mismatch scenario means that the speech in the evaluation data set contains various noise interferences, while the training data set has only clean, noise-free speech. These mismatch scenarios may coexist; for example, the LA evaluation data set and the training data set of ASVspoof 2019 exhibit both speaker mismatch and forgery-algorithm mismatch.
Improving the robustness of forged voice detection models is a gradual process. Early research on forged voice detection focused on robustness in speaker mismatch and forgery-algorithm mismatch scenarios, and a number of targeted detection models and novel loss functions were proposed. These methods achieve high detection accuracy in those two scenarios. However, existing methods pay little attention to the robustness of forged voice detection models in re-encoding mismatch and noise mismatch scenarios. Coding is a common processing mode for digital audio, and audio inevitably encounters various noise interferences during acquisition, transmission, and playback. Therefore, a forged voice detection model aimed at practical scenarios should consider its performance under re-encoding mismatch and noise mismatch.
Experiments show that existing forged voice detection models have poor robustness in re-encoding mismatch and noise mismatch scenarios. The distribution mismatch between the LA evaluation data set of ASVspoof 2021 and the LA training data set of ASVspoof 2019 is mainly a re-encoding mismatch. When an existing forged voice detection model is trained on the LA training data set of ASVspoof 2019, its EER is about 4%-7% when evaluated on the LA evaluation data set of ASVspoof 2019, but generally close to 20% when evaluated on the LA evaluation data set of ASVspoof 2021. The LF data set of the ADD challenge simulates the noise mismatch of real environments: the EER of an existing forged voice detection model is close to 10% on the ADD training data but about 30% at evaluation, showing that the model's discrimination ability in noise mismatch scenarios is weak. The performance of existing algorithms in re-encoding mismatch and noise mismatch scenarios therefore still needs improvement.
Disclosure of Invention
The technical problem of the invention is mainly solved by the following technical scheme:
a forged voice detection algorithm based on main body filtering comprises
Collecting voice data, extracting characteristics, and dividing the voice data into a training set and a testing set;
respectively performing masking filtering main body extraction and amplitude filtering main body extraction on the training set and the test set, eliminating, in the spectrogram, the interference introduced into the speech data by re-encoding and noise, to obtain the extracted training set and test set;
training the detection model by using a training set to obtain a trained detection model;
and detecting the forged voice in real time by using the trained detection model.
In the above forged voice detection algorithm based on main body filtering, the feature extraction is to extract the spectrogram feature of the voice to be detected by using a general feature extraction algorithm.
In the foregoing forged speech detection algorithm based on main body filtering, when the masking filtering main body is extracted:
masking the spectrogram characteristics to calculate a spectrogram characteristic masking curve;
and eliminating the masked frequency components in the original speech spectrum features according to the masking curve to obtain a non-masking power spectrogram.
In the above forged voice detection algorithm based on the main body filtering, when the amplitude filtering main body is extracted:
performing frequency band division on the non-masking power spectrogram, dividing it into several frequency band parts according to the characteristics of human speech production and hearing;
and eliminating the noise signals by adopting a self-adaptive amplitude filtering algorithm on each frequency band part to obtain a main signal power spectrogram.
In the forged voice detection algorithm based on main body filtering, the Bark-domain spectrogram, the sound pressure level (SPL) of the spectrogram amplitude, and the local peak points of the frequency curve are respectively calculated from the speech power spectrogram;
the Bark band, the SPL of the spectrogram amplitude, and the local peak points are substituted into a masking transfer function to calculate the masking curve;
and the frequency components whose amplitude lies below the masking curve are eliminated, yielding the non-masking speech power spectrogram.
In the above forged voice detection algorithm based on the main body filtering, the non-masking power spectrogram is divided into three frequency bands, namely a high frequency band, a middle frequency band and a low frequency band;
and carrying out amplitude filtering according to the adaptive energy level in the frequency band region for different frequency bands.
In the above forged voice detection algorithm based on main body filtering, the Bark-domain spectrogram is calculated by converting the frequency domain to the Bark domain as shown in formula 1,
where f_hz represents the frequency value and f_bark represents the corresponding value on the Bark scale.
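Formula 1 itself is rendered as an image in the original and is not reproduced here. The sketch below uses the widely known Zwicker approximation of the Hz-to-Bark conversion as a stand-in; it plays the same role as the patent's formula 1 but is not necessarily the exact formula claimed:

```python
import numpy as np

def hz_to_bark(f_hz):
    """Frequency (Hz) -> Bark scale.

    Stand-in for the patent's formula 1: the common Zwicker approximation,
    not necessarily the exact conversion used in the patent.
    """
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)
```

The conversion is monotonic and maps the audible range onto roughly 24 Bark bands, matching the 24 critical-band regions the description refers to.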
In the above forged voice detection algorithm based on main body filtering, the core of the adaptive filtering is shown in formula 2:
f_abs represents the absolute value of the amplitude; Top_10% sorts the amplitudes of all frequency components within the band from high to low and selects the amplitude falling at the 10th percentile; f_h takes two parameters, the first being all frequency components of the band and the second being the amplitude threshold computed by Top_10%, and f_h sets the amplitude of every frequency component below the second parameter to a minimal value.
A forged voice detection system based on main body filtering comprises
A first module: configured to collect voice data, extract features from it, and divide it into a training set and a test set;
a second module: configured to respectively perform masking filtering main body extraction and amplitude filtering main body extraction on the training set and the test set, eliminating, in the spectrogram, the interference introduced into the speech data by re-encoding and noise, to obtain the extracted training set and test set;
a third module: configured to train the detection model with the training set to obtain a trained detection model;
a fourth module: configured to detect forged voice in real time using the trained detection model.
Therefore, the invention has the following advantages: 1. Aiming at the common problem of data distribution mismatch in the field of forged voice detection, the main body extraction module eliminates the easily changed non-main-body parts of the voice signal, effectively improving the robustness of existing forged voice detection models in re-encoding and noise-interference scenarios. 2. The main body extraction module exploits the auditory masking effect to filter out the non-main-body part of the voice content while retaining the main body part, so the semantics and naturalness of the original voice are preserved. When the training and evaluation data sets are not mismatched, the main body extraction module does not noticeably reduce the detection accuracy of existing forged voice detection models.
Drawings
Figure 1 is a schematic of a subject extraction scheme versus a general process.
Fig. 2 is a flow of a subject extraction calculation based on auditory masking effects.
Fig. 3 is a specific calculation flow of the principal extraction scheme based on spectrogram amplitude filtering.
Detailed Description
The technical scheme of the invention is further specifically described by the following embodiments and the accompanying drawings.
Example (b):
By deeply analyzing the coding flow of voice and the characteristics of signal and noise in the spectrogram, the invention provides a masking filtering main body extraction scheme based on the human auditory masking effect and an amplitude filtering main body extraction scheme based on the high energy difference between signal and noise. These schemes eliminate, in the spectrogram, the interference components introduced into the voice signal by re-encoding and noise, while retaining the main body part of speech production, thereby achieving more robust forged voice detection. The method specifically comprises the following steps:
1. and extracting the original spectrogram characteristics of the voice sample to be detected by using an STFT algorithm.
2. Processing the original spectrogram characteristics by using a main body filtering module, wherein the main body filtering module comprises the following steps:
2.1 calculating a masking curve of the original spectrogram characteristic according to a calculation formula of the masking effect.
And 2.2, removing the masked frequency components in the original spectrogram features according to the masking curve to obtain a non-masking power spectrogram.
2.3 according to human auditory characteristics, applying adaptive amplitude filtering to different frequency bands in the non-masking power spectrogram, and eliminating noise signals to obtain main body signals.
3. The subject signal is used as input to train a model for detecting forged voice (many deep neural networks can be used for forged voice detection; any network can be chosen here, as the choice of network is not within the scope of the present invention).
4. The trained model can be used for detecting the forged voice. Before detection, however, the main signal is still extracted by the main filtering manner described in step 2.
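Steps 1 through 4 above can be sketched as follows. The STFT parameters are illustrative assumptions, the main body filter is shown as an identity placeholder (its two stages are detailed in the schemes below), and the detection network is omitted since the text notes any network may be used:

```python
import numpy as np
from scipy.signal import stft

def extract_spectrogram(wave, sr=16000, n_fft=512):
    """Step 1: power spectrogram of the speech sample via STFT."""
    _, _, Z = stft(wave, fs=sr, nperseg=n_fft)
    return np.abs(Z) ** 2            # shape: (n_fft // 2 + 1, n_frames)

def subject_filter(power_spec):
    """Steps 2.1-2.3 placeholder: masking filtering followed by adaptive
    amplitude filtering would go here; identity stand-in for this sketch."""
    return power_spec

# Steps 3-4: the filtered spectrogram feeds any forged-voice detection network,
# and the same filtering is applied again before detection.
wave = np.random.default_rng(1).standard_normal(16000)   # 1 s of dummy audio
features = subject_filter(extract_spectrogram(wave))
```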
The core of the method is the main body feature extraction module; its overall structure is shown in figure 1, and its workflow is as follows. First, the spectrogram feature of the voice to be detected is extracted with a general feature extraction algorithm. Second, masking measurement is performed on the spectrogram feature in order to calculate its masking curve. Then, the masking elimination module removes the masked frequency components from the original spectrogram feature according to the masking curve, yielding a non-masking power spectrogram. Finally, the non-masking power spectrogram is divided into several frequency band parts according to the characteristics of human speech production and hearing, and an adaptive amplitude filtering algorithm removes the noise signal from each band. The main body extraction module processes the spectrogram features of both the training data and the evaluation data, eliminating the interference information caused by re-encoding and noise signals between the features, and preventing the forged voice detection model from relying on unstable interference information during training and evaluation, thereby improving its robustness.
The main body extraction module firstly eliminates interference signals which can not be sensed by human ears based on masking effect, and secondly performs amplitude filtering operation according to the amplitude relation between noise and main body signals. This processing sequence is because the computation of the masking curve is required to maintain the integrity of the original speech signal, and if the amplitude filtering process is performed first, the relationship between the signals is destroyed and the computed masking curve will lose its original meaning. The masking filtering only eliminates the parts which can not be sensed by human ears, which has no influence on the subsequent noise elimination according to the amplitude, so the processing sequence of the main body extraction module is first masking filtering and then amplitude filtering.
The subject extraction module includes a subject extraction scheme based on auditory masking effect and a subject extraction scheme based on amplitude filtering, which are respectively described below.
1. Subject extraction scheme based on auditory masking effect (masking filtering subject extraction).
The specific processing flow of the subject extraction scheme based on the masking effect is shown in fig. 2. First, the Bark-domain spectrogram, the sound pressure level (SPL) of the spectrogram amplitude, and the local peak points of the frequency curve are calculated from the speech power spectrogram. The frequency-to-Bark conversion is shown in formula 1, where f_hz represents the frequency value and f_bark the value on the Bark scale. The frequency domain is converted to the Bark domain because the Bark scale better matches the human auditory system: there are 24 Bark sub-bands, corresponding to 24 critical-band regions of the human ear, and the physiological basis of the masking effect is the mutual interference of voice frequency components within each of these 24 regions. The SPL value, in dB, expresses the ratio of the sound at a point to the standard reference sound pressure; the frequency-to-SPL formula is shown in formula 2, where the square of the absolute value of the power represents the energy of the frequency component, and N_fft is the number of points used for the Fourier transform, typically slightly larger than the speech framing window length and a power of 2, which is convenient for the fast Fourier transform algorithm. A local peak point is a frequency point whose component is higher than the surrounding components in the frequency curve; algorithms for locating local peaks in a sequence are very mature, and the find_peaks function of the scipy library is used here to locate the peak points. When calculating the masking curve, each peak point is treated as a non-noise part.
The reason for calculating the peak points is that, in the auditory masking effect, the masking of noise parts by non-noise parts differs from the masking of noise parts by other noise, and generally only the masking effect of the non-noise parts on the noise parts needs to be counted.
The Bark band, SPL, and local peak points are substituted into the masking transfer function to calculate the masking curve. The masking transfer function is an iterative process: the masking effect of each non-noise point on the surrounding signal is calculated and accumulated, finally yielding the masking curve of the whole frequency curve. The formula for the global masking effect of each non-noise point in the masking transfer function is shown in equation 3. SPL_i is the SPL value of the peak point of the current iteration; only non-noise parts with SPL greater than 40 produce a masking effect. sf_j temporarily stores the masking effect of the i-th peak on the global j-th frequency component. dz is the difference between the Bark value of the i-th frequency and the Bark value of the j-th frequency, taken in absolute value when used. θ depends on dz: it takes 1 if dz is positive and 0 otherwise, meaning the level-dependent term applies when the energy of the local peak point i exceeds that of point j. The masking elimination module then removes the frequency components whose amplitude lies below the masking curve, yielding the non-masking speech power spectrogram.
sf_j = abs(dz) · (-27 + 0.37 · max(SPL_i - 40, 0) · θ) (3)
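A minimal sketch of this iteration, assuming equation 3's form sf_j = abs(dz) · (-27 + 0.37 · max(SPL_i - 40, 0) · θ) with θ = 1 on the upper (dz > 0) side of the masker, and using scipy's find_peaks as the text indicates. Combining the peaks by taking the maximum over their individual curves is one plausible reading of "accumulated":

```python
import numpy as np
from scipy.signal import find_peaks

def masking_curve(spl, bark):
    """Accumulate the spread of every local peak into a global masking curve.

    Assumed eq. 3 form: sf_j = |dz| * (-27 + 0.37 * max(SPL_i - 40, 0) * theta).
    """
    peaks, _ = find_peaks(spl)               # local peak points = non-noise parts
    curve = np.full_like(spl, -np.inf)
    for i in peaks:
        dz = bark - bark[i]                  # signed Bark-scale distance to peak i
        theta = (dz > 0).astype(float)       # level-dependent term, upper side only
        sf = np.abs(dz) * (-27.0 + 0.37 * max(spl[i] - 40.0, 0.0) * theta)
        curve = np.maximum(curve, spl[i] + sf)   # combine across all peaks
    return curve
```

Frequency components whose SPL falls below the returned curve would then be removed to form the non-masking power spectrogram.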
2. Magnitude filtering based subject extraction scheme (magnitude filtering subject extraction).
The specific flow of the principal extraction scheme based on spectrogram amplitude filtering is shown in fig. 3. The non-masking power spectrogram is first divided into high, middle, and low frequency bands, and amplitude filtering is applied to each band according to the adaptive energy level within it. The core of the adaptive filtering is shown in equation 4: f_abs represents the absolute value of the amplitude; Top_10% sorts the amplitudes of all frequency components within the band from high to low and selects the amplitude that falls exactly at the 10th percentile (Top_30% and Top_5% are defined analogously); f_h takes two parameters, the first being all frequency components of the band and the second being the amplitude threshold computed by Top_10%, and f_h sets the amplitude of every frequency component below the second parameter to a minimal value.
In the different band components, different percentages are retained according to the amplitude distribution of each utterance itself. Low frequencies are the main component of human speech, and only frequencies with sufficient energy need to be preserved. The middle frequencies are an important basis for the human ear to distinguish different sounds, so more frequency components are retained there. The high frequencies are mostly consonants, noise, etc., so only minimal information needs to be retained.
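The band-wise filtering can be sketched as below. The band split points and the assignment of the Top_10%/Top_30%/Top_5% retention rates to the low/mid/high bands are illustrative assumptions (the text does not state which band uses which rate), and sub-threshold amplitudes are zeroed as a stand-in for the minimal value used in the patent:

```python
import numpy as np

def amplitude_filter(band, keep_fraction):
    """Keep only components whose amplitude reaches the top `keep_fraction`
    of the band; everything below the threshold is zeroed."""
    mag = np.abs(band)
    threshold = np.quantile(mag, 1.0 - keep_fraction)  # Top 10% -> 90th percentile
    return np.where(mag >= threshold, band, 0.0)

def band_split_filter(power_spec):
    """Split the non-masking power spectrogram into low/mid/high bands and
    filter each adaptively. Split points and retention rates are illustrative."""
    n_bins = power_spec.shape[0]
    low, mid = n_bins // 8, n_bins // 2
    return np.vstack([
        amplitude_filter(power_spec[:low], 0.10),    # low: strongest components only
        amplitude_filter(power_spec[low:mid], 0.30), # mid: keep more detail
        amplitude_filter(power_spec[mid:], 0.05),    # high: minimal information
    ])
```

Because the threshold is a quantile of each band's own amplitude distribution, the filter adapts per utterance, matching the "adaptive energy level" described above.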
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments, or alternatives may be employed, by those skilled in the art, without departing from the spirit or ambit of the invention as defined in the appended claims.
Claims (9)
1. A forged voice detection algorithm based on main body filtering is characterized by comprising
Collecting voice data, extracting characteristics, and dividing the voice data into a training set and a testing set;
respectively performing masking filtering main body extraction and amplitude filtering main body extraction on the training set and the test set, eliminating, in the spectrogram, the interference introduced into the speech data by re-encoding and noise, to obtain the extracted training set and test set;
training the detection model by using a training set to obtain a trained detection model;
and detecting the forged voice in real time by using the trained detection model.
2. The algorithm for detecting counterfeit voice based on body filtering as claimed in claim 1, wherein the feature extraction is a general feature extraction algorithm to extract spectrogram features of the voice to be detected.
3. A forged voice detection algorithm based on body filtering according to claim 1, characterized in that, when extracting the masking filtering body:
masking measurement is carried out on spectrogram features, and a spectrogram feature masking curve is calculated;
and eliminating the masked frequency components in the original speech spectrum characteristics according to the masking curve to obtain a non-masking power spectrogram.
4. The forged voice detection algorithm based on main body filtering according to claim 1, wherein, when extracting the amplitude filtering main body:
performing frequency band division on the non-masking power spectrogram, dividing it into several frequency band parts according to the characteristics of human speech production and hearing;
and eliminating the noise signals by adopting a self-adaptive amplitude filtering algorithm on each frequency band part to obtain a main signal power spectrogram.
5. A subject filtering based counterfeit speech detection algorithm according to claim 1,
respectively calculating, from the speech power spectrogram, the Bark-domain spectrogram, the sound pressure level (SPL) of the spectrogram amplitude, and the local peak points of the frequency curve;
substituting the Bark band, the SPL of the spectrogram amplitude, and the local peak points into a masking transfer function to calculate the masking curve;
and eliminating frequency components with amplitude lower than the masking curve to obtain a non-masking voice power spectrogram.
6. A subject filtering based counterfeit speech detection algorithm according to claim 1,
dividing the frequency band of the non-masking power spectrogram into a high frequency band, a middle frequency band and a low frequency band;
and carrying out amplitude filtering according to the adaptive energy level in the frequency band region for different frequency bands.
7. A subject filtering based counterfeit speech detection algorithm according to claim 1,
calculating the Bark-domain spectrogram, i.e., converting the frequency domain to the Bark domain as shown in formula 1,
where f_hz represents the frequency value and f_bark represents the corresponding value on the Bark scale.
8. The algorithm for detecting forged speech based on body filtering as claimed in claim 1, wherein the core of the adaptive filtering is as shown in equation 2:
f_abs represents the absolute value of the amplitude; Top_10% sorts the amplitudes of all frequency components within the band from high to low; f_h takes two parameters, the first being all frequency components of the band and the second being the amplitude threshold computed by Top_10%, and f_h sets the amplitude of every frequency component below the second parameter to a minimal value.
9. A system for detecting forged voice based on main body filtering is characterized by comprising
A first module: configured to collect voice data, extract features from it, and divide it into a training set and a test set;
a second module: configured to respectively perform masking filtering main body extraction and amplitude filtering main body extraction on the training set and the test set, eliminating, in the spectrogram, the interference introduced into the speech data by re-encoding and noise, to obtain the extracted training set and test set;
a third module: configured to train the detection model with the training set to obtain a trained detection model;
a fourth module: configured to detect forged voice in real time using the trained detection model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211217858.4A CN115602190A (en) | 2022-09-30 | 2022-09-30 | Forged voice detection algorithm and system based on main body filtering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115602190A true CN115602190A (en) | 2023-01-13 |
Family
ID=84845344
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211217858.4A Pending CN115602190A (en) | 2022-09-30 | 2022-09-30 | Forged voice detection algorithm and system based on main body filtering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115602190A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108447495B (en) | Deep learning voice enhancement method based on comprehensive feature set | |
CN105513605B (en) | The speech-enhancement system and sound enhancement method of mobile microphone | |
CN108831499A (en) | Utilize the sound enhancement method of voice existing probability | |
CN109256127B (en) | Robust voice feature extraction method based on nonlinear power transformation Gamma chirp filter | |
CN111261189B (en) | Vehicle sound signal feature extraction method | |
CN108198545B (en) | Speech recognition method based on wavelet transformation | |
CN112201255A (en) | Voice signal spectrum characteristic and deep learning voice spoofing attack detection method | |
CN110299141B (en) | Acoustic feature extraction method for detecting playback attack of sound record in voiceprint recognition | |
CN105427859A (en) | Front voice enhancement method for identifying speaker | |
CN107293306B (en) | A kind of appraisal procedure of the Objective speech quality based on output | |
Mallidi et al. | Novel neural network based fusion for multistream ASR | |
Jangjit et al. | A new wavelet denoising method for noise threshold | |
CN112542174A (en) | VAD-based multi-dimensional characteristic parameter voiceprint identification method | |
CN103544961A (en) | Voice signal processing method and device | |
Couvreur et al. | Automatic noise recognition in urban environments based on artificial neural networks and hidden markov models | |
Wang et al. | Low pass filtering and bandwidth extension for robust anti-spoofing countermeasure against codec variabilities | |
CN110197657B (en) | Dynamic sound feature extraction method based on cosine similarity | |
CN111261192A (en) | Audio detection method based on LSTM network, electronic equipment and storage medium | |
CN115602190A (en) | Forged voice detection algorithm and system based on main body filtering | |
CN112863517B (en) | Speech recognition method based on perceptual spectrum convergence rate | |
Rao et al. | Speech enhancement using sub-band cross-correlation compensated Wiener filter combined with harmonic regeneration | |
CN114613391B (en) | Snore identification method and device based on half-band filter | |
CN115064182A (en) | Fan fault feature identification method of self-adaptive Mel filter in strong noise environment | |
CN111933140A (en) | Method, device and storage medium for detecting voice of earphone wearer | |
CN112750451A (en) | Noise reduction method for improving voice listening feeling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||