CN115602190A - Forged voice detection algorithm and system based on subject filtering - Google Patents

Forged voice detection algorithm and system based on subject filtering

Info

Publication number
CN115602190A
CN115602190A
Authority
CN
China
Prior art keywords
filtering
masking
voice
spectrogram
amplitude
Legal status
Pending
Application number
CN202211217858.4A
Other languages
Chinese (zh)
Inventor
任延珍
刘轶文
王丽娜
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Application filed by Wuhan University (WHU)
Priority to CN202211217858.4A
Publication of CN115602190A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 characterised by the analysis technique
    • G10L 25/03 characterised by the type of extracted parameters
    • G10L 25/48 specially adapted for particular use
    • G10L 25/51 specially adapted for particular use for comparison or discrimination


Abstract

Existing forged voice detection methods are weakly robust in re-encoding and noise mismatch scenarios. To improve robustness, prior research has proposed augmenting the training data set. However, data augmentation increases the amount of training data, reduces model training efficiency, and can only cover known coding algorithms and noise conditions. The invention relates to the field of forged voice detection, in particular to forged voice detection under re-encoding and noise interference, and provides a forged voice detection algorithm and system based on subject filtering.

Description

Forged voice detection algorithm and system based on subject filtering
Technical field:
The invention relates to the field of forged voice detection, in particular to forged voice detection in re-encoding and noise interference scenarios, and more particularly to a forged voice detection algorithm and system based on subject filtering.
Technical background:
Non-robustness of a forged voice detection model means that its detection performance drops sharply when the data set used to train the model and the data set used to evaluate it are mismatched. Such mismatches fall into several scenarios, including speaker mismatch, forgery-algorithm mismatch, re-encoding mismatch, and noise-interference mismatch. Speaker mismatch means the evaluation data set contains speakers that do not appear in the training data set. Forgery-algorithm mismatch means the forged voices in the evaluation data set were synthesized with algorithms that were not used when the training data set was built. Re-encoding mismatch means the voice data in the evaluation data set may have been processed by several unknown coding algorithms, while the voice data in the training data set is uncoded or processed by only a limited set of algorithms. Noise-interference mismatch means the speech in the evaluation data set contains various noise interferences while the training data set has only clean, noise-free speech. These mismatch scenarios can coexist; for example, the LA evaluation data set of ASVspoof2019 and its training data set exhibit both speaker mismatch and forgery-algorithm mismatch.
Improving the robustness of forged voice detection models has been a gradual process. Early research focused on robustness under speaker mismatch and forgery-algorithm mismatch, proposing many targeted detection models and novel loss functions, and these methods achieve high detection accuracy in those two scenarios. Existing methods, however, pay little attention to robustness under re-encoding mismatch and noise mismatch. Coding is a routine processing step for digital audio, and audio inevitably encounters various noise interferences during acquisition, transmission, and playback. A forged voice detection model intended for practical scenarios should therefore consider its performance under re-encoding mismatch and noise mismatch.
Experiments show that existing forged voice detection models are poorly robust in re-encoding and noise mismatch scenarios. The main distribution mismatch between the LA evaluation data set of ASVspoof2021 and the LA training data set of ASVspoof2019 is re-encoding mismatch. When existing models are trained on the ASVspoof2019 LA training data set, their EER is about 4%-7% on the ASVspoof2019 LA evaluation data set but generally close to 20% on the ASVspoof2021 LA evaluation data set. The LF data set of ADD simulates the noise mismatch of a real environment: existing models reach an EER close to 10% during training on the ADD data set but about 30% during evaluation, showing that their discrimination ability under noise mismatch is weak. The performance of existing algorithms in re-encoding and noise mismatch scenarios therefore still needs improvement.
Disclosure of Invention
The technical problem of the invention is mainly solved by the following technical scheme:
a forged voice detection algorithm based on main body filtering comprises
Collecting voice data, extracting characteristics, and dividing the voice data into a training set and a testing set;
respectively extracting a masking filtering main body and an amplitude filtering main body from the training set and the test set, and eliminating interference data of speech data due to recoding and noise in a spectrogram to obtain an extracted training set and a test set;
training the detection model by using a training set to obtain a trained detection model;
and detecting the forged voice in real time by using the trained detection model.
In the above forged voice detection algorithm based on subject filtering, the feature extraction uses a general feature extraction algorithm to extract the spectrogram features of the voice to be detected.
In the above forged voice detection algorithm based on subject filtering, when extracting the masking-filtering subject:
masking measurement is performed on the spectrogram features to compute their masking curve;
and the masked frequency components are eliminated from the original spectrogram features according to the masking curve, obtaining a non-masked power spectrogram.
In the above forged voice detection algorithm based on subject filtering, when extracting the amplitude-filtering subject:
the non-masked power spectrogram is divided into several frequency bands according to the characteristics of human speech production and hearing;
and an adaptive amplitude filtering algorithm is applied to each band to eliminate noise signals, obtaining the subject-signal power spectrogram.
In the above forged voice detection algorithm based on subject filtering, a Bark-domain spectrogram, the sound pressure level (SPL) of the spectrogram amplitudes, and the local peak points of the frequency curve are computed from the voice power spectrogram;
the Bark bands, the SPL values, and the local peak points are substituted into the masking transfer function to compute the masking curve;
and frequency components whose amplitude lies below the masking curve are eliminated, obtaining the non-masked voice power spectrogram.
In the above forged voice detection algorithm based on subject filtering, the non-masked power spectrogram is divided into three frequency bands: high, middle, and low;
and each band is amplitude-filtered according to the adaptive energy level within that band.
In the above forged voice detection algorithm based on subject filtering, the Bark-domain spectrogram is computed by converting the frequency domain to the Bark domain, as shown in formula 1:

f_bark = 13 · arctan(0.00076 · f_hz) + 3.5 · arctan((f_hz / 7500)²)   (1)

where f_hz is the frequency value in Hz and f_bark is the corresponding value on the Bark scale.
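For illustration, a minimal NumPy sketch of this conversion, assuming the classic Zwicker-style Bark formula reconstructed above (the sample rate and FFT size in the example are arbitrary choices, not values fixed by the invention):

```python
import numpy as np

def hz_to_bark(f_hz):
    """Convert frequency in Hz to the Bark scale (formula 1)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

# Example: Bark values of the FFT bin centre frequencies for a
# 512-point FFT at a 16 kHz sample rate (illustrative parameters).
freqs = np.fft.rfftfreq(512, d=1.0 / 16000.0)
print(hz_to_bark(freqs[:5]))
```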
In the above forged voice detection algorithm based on subject filtering, the core of the adaptive filtering is as shown in formula 2:

F'_band = f_h(F_band, Top_10%(f_abs(F_band)))   (2)

where f_abs denotes the absolute value of the amplitude; Top_10% sorts the amplitudes of all frequency components in the band from high to low and returns the amplitude point at the 10th percentile; and f_h takes two parameters, the first being all frequency components of the band and the second being the amplitude point computed by Top_10%, and sets the amplitude of every frequency component below the second parameter to a singular (floor) value.
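A minimal sketch of this thresholding step; the function names follow the formula, while the concrete floor value is an assumption, since the text only calls it a "singular value":

```python
import numpy as np

def top_percent(band, p=0.10):
    """Top_p: the amplitude point that the top fraction p of components lies above."""
    return np.quantile(np.abs(band), 1.0 - p)

def f_h(band, threshold, floor=1e-10):
    """f_h: keep components at or above `threshold`; push the rest to a floor value."""
    band = band.copy()
    band[np.abs(band) < threshold] = floor
    return band

# Example: suppress all but the top 10% of components in one band.
rng = np.random.default_rng(0)
band = rng.rayleigh(size=128)                 # stand-in for one band of a power spectrogram
filtered = f_h(band, top_percent(band, 0.10))
```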
A forged voice detection system based on subject filtering comprises:
a first module, configured to collect voice data, extract its features, and divide the data into a training set and a test set;
a second module, configured to perform masking-filtering subject extraction and amplitude-filtering subject extraction on the training set and the test set respectively, eliminating from the spectrogram the interference introduced into the voice data by re-encoding and noise, to obtain the extracted training set and test set;
a third module, configured to train the detection model with the training set to obtain a trained detection model;
and a fourth module, configured to detect forged voice in real time with the trained detection model.
The invention therefore has the following advantages. 1. Addressing the common problem of data-distribution mismatch in the field of forged voice detection, the subject extraction module eliminates the easily changed non-subject parts of the voice signal, effectively improving the robustness of existing forged voice detection models under re-encoding and noise interference. 2. The subject extraction module exploits the auditory masking effect to filter out the non-subject parts of the voice content while retaining the subject part, preserving the semantics and naturalness of the original voice. When the training data set and the evaluation data set are not mismatched, the subject extraction module does not noticeably reduce the detection accuracy of existing forged voice detection models.
Drawings
Fig. 1 is a schematic of the subject extraction scheme and the overall process.
Fig. 2 is the calculation flow of subject extraction based on the auditory masking effect.
Fig. 3 is the specific calculation flow of the subject extraction scheme based on spectrogram amplitude filtering.
Detailed Description
The technical solution of the invention is further described below through embodiments and the accompanying drawings.
Embodiment:
the invention provides a masking filtering main body extraction scheme and an amplitude filtering main body extraction scheme respectively based on the characteristics of human auditory masking effect and high energy difference of signal noise by deeply analyzing the coding flow of voice and the characteristics of signals and noise in a spectrogram, eliminates interference components caused by recoding and noise on voice signals in the spectrogram, and reserves the main body part of voice production in the spectrogram, thereby realizing more robust detection of forged voice. The method specifically comprises the following steps:
1. Extract the original spectrogram features of the voice sample to be detected with the STFT algorithm.
2. Process the original spectrogram features with the subject filtering module, which comprises:
2.1 computing the masking curve of the original spectrogram features according to the masking-effect formulas;
2.2 removing the masked frequency components from the original spectrogram features according to the masking curve, obtaining a non-masked power spectrogram;
2.3 applying adaptive amplitude filtering to different frequency bands of the non-masked power spectrogram according to human auditory characteristics, eliminating noise signals to obtain the subject signal.
3. Use the subject signal as input to train a forged voice detection model (many deep neural networks can be used to detect forged voice; any network may be chosen here, and the network itself is not covered by the invention).
4. Use the trained model to detect forged voice; before detection, the subject signal is still extracted in the subject filtering manner described in step 2. An end-to-end sketch of these steps is given below.
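As a minimal sketch of steps 1-4, assuming librosa for the STFT and leaving the two subject filters and the detector network as components to be supplied (mask_filter, amp_filter, and model are placeholder names, not parts of the patent):

```python
import numpy as np
import librosa

def subject_filter(wav, n_fft=512, hop_length=160, filters=()):
    """Steps 1-2: STFT power spectrogram passed through the subject filters
    in order (masking filter first, amplitude filter second)."""
    spec = np.abs(librosa.stft(wav, n_fft=n_fft, hop_length=hop_length)) ** 2  # step 1
    for f in filters:                                                          # step 2
        spec = f(spec)
    return spec

# Step 3: train any detector on subject-filtered features, e.g.
#   X_train = [subject_filter(w, filters=(mask_filter, amp_filter)) for w in train_waves]
# Step 4: apply the SAME subject filtering before inference:
#   score = model.predict(subject_filter(test_wave, filters=(mask_filter, amp_filter)))
```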
The core of the method is the subject feature extraction module; its overall structure is shown in Fig. 1 and its workflow is as follows. First, the spectrogram features of the voice to be detected are extracted with a general feature extraction algorithm. Second, masking measurement is performed on the spectrogram features to compute their masking curve. The masking elimination module then removes the masked frequency components from the original spectrogram features according to the masking curve, obtaining a non-masked power spectrogram. Finally, the non-masked power spectrogram is divided into several frequency bands according to the characteristics of human speech production and hearing, and an adaptive amplitude filtering algorithm removes the noise signals from each band. The subject extraction module processes the spectrogram features of both the training data and the evaluation data, eliminating the interference information that re-encoding and noise introduce between the features, so that the forged voice detection model does not come to depend on unstable interference information during training and evaluation; this improves the robustness of the model.
The subject extraction module first eliminates the interference signals that the human ear cannot perceive, based on the masking effect, and then performs amplitude filtering according to the amplitude relationship between the noise and the subject signal. This order is required because computing the masking curve needs the original voice signal to be intact: if amplitude filtering were performed first, the relationships between the signal components would be destroyed and the computed masking curve would lose its original meaning. Masking filtering removes only the parts the human ear cannot perceive, which does not affect the subsequent amplitude-based noise elimination; hence the subject extraction module applies masking filtering first and amplitude filtering second.
The subject extraction module comprises a subject extraction scheme based on the auditory masking effect and a subject extraction scheme based on amplitude filtering, which are described in turn below.
1. Subject extraction scheme based on the auditory masking effect (masking-filtering subject extraction).
The specific processing flow of the subject extraction scheme based on the masking effect is shown in Fig. 2. First, the Bark-domain spectrogram, the sound pressure level (SPL) of the spectrogram amplitudes, and the local peak points of the frequency curve are computed from the voice power spectrogram. The frequency-to-Bark conversion is shown in formula 1, where f_hz is the frequency value and f_bark is the value on the Bark scale. The frequency domain is converted to the Bark domain because the Bark scale better matches the human auditory system: there are 24 Bark sub-bands corresponding to 24 regions of the human ear, and the physiological basis of the masking effect is the mutual interference of voice frequency components within each of these 24 regions. The SPL value, in dB, expresses the ratio of the sound at a point to the standard sound pressure; the frequency-to-SPL conversion is shown in formula 2, where |X(k)|² is the squared absolute value of the spectrum, representing the energy of the frequency component, and N_fft is the number of points used for the Fourier transform, typically slightly larger than the speech framing window length and a power of 2, which is convenient for the fast Fourier transform algorithm. A local peak point is a frequency point whose component is higher than the surrounding components of the frequency curve; algorithms for finding local peaks in a sequence are very mature, and the find_peaks function of the scipy library is used to locate the peak points. When computing the masking curve, each peak point is treated as a non-noise part. The peak points are computed because, in the auditory masking effect, the masking of the noise part by the non-noise part differs from the masking of the noise part by the noise part, and generally only the masking effect of the non-noise parts on the noise parts needs to be counted.
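For illustration, a sketch of the SPL values and local peak points for one spectrogram frame; the 96 dB reference in spl_from_power follows formula 2 as reconstructed below and is an assumption, while find_peaks is the scipy routine the text names:

```python
import numpy as np
from scipy.signal import find_peaks

def spl_from_power(power, n_fft):
    """Formula 2 (as reconstructed): normalized FFT energy mapped to dB SPL."""
    return 96.0 + 10.0 * np.log10(np.maximum(power, 1e-12) / n_fft ** 2)

n_fft = 512
rng = np.random.default_rng(1)
frame = np.abs(np.fft.rfft(rng.standard_normal(n_fft))) ** 2  # one frame's power spectrum
spl = spl_from_power(frame, n_fft)     # SPL per frequency bin
peaks, _ = find_peaks(spl)             # indices of the local peak (tonal) points
```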
The Bark bands, SPL values, and local peak points are substituted into the masking transfer function to compute the masking curve. The masking transfer function is an iterative process: the masking effect of each non-noise point on the surrounding signal is computed in turn and accumulated, finally yielding the masking curve of the whole frequency curve. The contribution of each non-noise point to the global masking effect is computed as shown in formula 3. SPL_i is the SPL value of the peak point in the current iteration; only a non-noise part with SPL greater than 40 produces a masking effect. Sf_j temporarily stores the masking effect of the i-th peak on the j-th frequency component. dz is the difference between the Bark value of the i-th frequency and the Bark value of the j-th frequency, taken in absolute value when used. θ depends on dz: it takes 1 if dz is positive and 0 otherwise, meaning a masking effect exists only if the energy of the local peak point i is greater than that of j. The masking elimination module then removes the frequency components whose amplitude lies below the masking curve, obtaining the non-masked voice power spectrogram.
f_bark = 13 · arctan(0.00076 · f_hz) + 3.5 · arctan((f_hz / 7500)²)   (1)

SPL(k) = 96 + 10 · log10(|X(k)|² / N_fft²)   (2)

Sf_j = abs(dz) · (-27 + 0.37 · max(SPL_i - 40, 0) · θ)   (3)
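A sketch of this accumulation under stated assumptions: formula 3 is applied per masker with the 40 dB threshold and θ gate described above, and the individual contributions are combined by a per-bin maximum relative to each masker's level, a common convention that the text does not spell out:

```python
import numpy as np

def masking_curve(bark, spl, peaks):
    """Accumulate each tonal peak's masking effect (formula 3) into a global curve."""
    curve = np.full_like(spl, -np.inf)
    for i in peaks:
        if spl[i] <= 40.0:              # only parts with SPL above 40 dB mask others
            continue
        dz = bark - bark[i]             # Bark distance from masker i to every bin j
        theta = (dz > 0).astype(float)  # θ: 1 where dz is positive, else 0
        sf = np.abs(dz) * (-27.0 + 0.37 * max(spl[i] - 40.0, 0.0) * theta)
        curve = np.maximum(curve, spl[i] + sf)  # spread relative to the masker level
    return curve

# Bins whose SPL falls below the curve are treated as masked and eliminated:
#   masked = spl < masking_curve(bark, spl, peaks)
```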
2. Subject extraction scheme based on amplitude filtering (amplitude-filtering subject extraction).
The specific flow of the subject extraction scheme based on spectrogram amplitude filtering is shown in Fig. 3. The non-masked power spectrogram is first divided into high, middle, and low frequency bands, and each band is amplitude-filtered according to the adaptive energy level within that band. The core of the adaptive filtering is shown in formula 4. f_abs denotes the absolute value of the amplitude; Top_10% sorts the amplitudes of all frequency components in the band from high to low and selects the amplitude that falls exactly at the 10th percentile, with Top_30% and Top_5% defined analogously. f_h takes two parameters, the first being all frequency components of the band and the second being the amplitude point computed by Top_10%; f_h sets the amplitude of every frequency component below the second parameter to a singular (floor) value.
Different percentages are retained in the different band components according to the amplitude distribution of each utterance itself. The low frequencies are the main component of human speech, and only the frequencies with sufficient energy need to be kept. The middle frequencies are an important basis for the human ear to distinguish different sounds, so more frequency components must be retained. The high frequencies are mostly consonants, noise, and the like, so only minimal information needs to be kept.
F'_band = f_h(F_band, Top_p(f_abs(F_band))),   p ∈ {10%, 30%, 5%}   (4)
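A sketch of the three-band filtering under stated assumptions: the band edges (1 kHz and 4 kHz) and the assignment of retained fractions to bands (low 10%, middle 30%, high 5%) are illustrative choices consistent with the rationale above, not values the text fixes:

```python
import numpy as np

def band_adaptive_filter(spec, freqs, edges=(1000.0, 4000.0),
                         keep=(0.10, 0.30, 0.05), floor=1e-10):
    """Formula 4 per band: keep each band's top fraction of components by amplitude."""
    out = spec.copy()
    bounds = [0.0, *edges, np.inf]
    for (lo, hi), p in zip(zip(bounds[:-1], bounds[1:]), keep):
        idx = (freqs >= lo) & (freqs < hi)
        band = out[idx]
        if band.size == 0:
            continue
        tau = np.quantile(np.abs(band), 1.0 - p)  # Top_p amplitude point
        band[np.abs(band) < tau] = floor          # f_h: push the rest to a floor value
        out[idx] = band
    return out

# Example on one frame (16 kHz sample rate, 512-point FFT):
freqs = np.fft.rfftfreq(512, 1.0 / 16000.0)
frame = np.abs(np.fft.rfft(np.random.default_rng(2).standard_normal(512))) ** 2
subject = band_adaptive_filter(frame, freqs)
```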
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments, or alternatives may be employed, by those skilled in the art, without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (9)

1. A forged voice detection algorithm based on subject filtering, characterized by comprising:
collecting voice data, extracting features, and dividing the data into a training set and a test set;
performing masking-filtering subject extraction and amplitude-filtering subject extraction on the training set and the test set respectively, eliminating from the spectrogram the interference introduced into the voice data by re-encoding and noise, to obtain the extracted training set and test set;
training the detection model with the training set to obtain a trained detection model;
and detecting forged voice in real time with the trained detection model.
2. The forged voice detection algorithm based on subject filtering according to claim 1, characterized in that the feature extraction uses a general feature extraction algorithm to extract the spectrogram features of the voice to be detected.
3. The forged voice detection algorithm based on subject filtering according to claim 1, characterized in that, when extracting the masking-filtering subject:
masking measurement is performed on the spectrogram features, and their masking curve is computed;
and the masked frequency components are eliminated from the original spectrogram features according to the masking curve, obtaining a non-masked power spectrogram.
4. The forged voice detection algorithm based on subject filtering according to claim 1, characterized in that, when extracting the amplitude-filtering subject:
the non-masked power spectrogram is divided into several frequency bands according to the characteristics of human speech production and hearing;
and an adaptive amplitude filtering algorithm is applied to each band to eliminate noise signals, obtaining the subject-signal power spectrogram.
5. The forged voice detection algorithm based on subject filtering according to claim 1, characterized in that:
a Bark-domain spectrogram, the sound pressure level SPL of the spectrogram amplitudes, and the local peak points of the frequency curve are computed from the voice power spectrogram;
the Bark bands, the sound pressure level SPL of the spectrogram amplitudes, and the local peak points are substituted into the masking transfer function to compute the masking curve;
and frequency components whose amplitude lies below the masking curve are eliminated, obtaining the non-masked voice power spectrogram.
6. The forged voice detection algorithm based on subject filtering according to claim 1, characterized in that:
the non-masked power spectrogram is divided into high, middle, and low frequency bands;
and each band is amplitude-filtered according to the adaptive energy level within that band.
7. The forged voice detection algorithm based on subject filtering according to claim 1, characterized in that the Bark-domain spectrogram is computed by converting the frequency domain to the Bark domain, as shown in formula 1:

f_bark = 13 · arctan(0.00076 · f_hz) + 3.5 · arctan((f_hz / 7500)²)   (1)

wherein f_hz is the frequency value and f_bark is the corresponding value on the Bark scale.
8. The forged voice detection algorithm based on subject filtering according to claim 1, characterized in that the core of the adaptive filtering is as shown in formula 2:

F'_band = f_h(F_band, Top_10%(f_abs(F_band)))   (2)

wherein f_abs denotes the absolute value of the amplitude; Top_10% sorts the amplitudes of all frequency components in the band from high to low and returns the amplitude point at the 10th percentile; and f_h takes two parameters, the first being all frequency components of the band and the second being the amplitude point computed by Top_10%, and sets the amplitude of every frequency component below the second parameter to a singular (floor) value.
9. A forged voice detection system based on subject filtering, characterized by comprising:
a first module, configured to collect voice data, extract its features, and divide the data into a training set and a test set;
a second module, configured to perform masking-filtering subject extraction and amplitude-filtering subject extraction on the training set and the test set respectively, eliminating from the spectrogram the interference introduced into the voice data by re-encoding and noise, to obtain the extracted training set and test set;
a third module, configured to train the detection model with the training set to obtain a trained detection model;
and a fourth module, configured to detect forged voice in real time with the trained detection model.
CN202211217858.4A 2022-09-30 2022-09-30 Forged voice detection algorithm and system based on subject filtering Pending CN115602190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211217858.4A 2022-09-30 2022-09-30 Forged voice detection algorithm and system based on subject filtering

Publications (1)

Publication Number Publication Date
CN115602190A 2023-01-13

Family

ID=84845344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211217858.4A Pending CN115602190A (en) 2022-09-30 2022-09-30 Forged voice detection algorithm and system based on subject filtering

Country Status (1)

Country Link
CN (1) CN115602190A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination