CN110556125B - Feature extraction method and device based on voice signal and computer storage medium - Google Patents


Publication number
CN110556125B
CN110556125B CN201910976850.8A CN201910976850A CN110556125B CN 110556125 B CN110556125 B CN 110556125B CN 201910976850 A CN201910976850 A CN 201910976850A CN 110556125 B CN110556125 B CN 110556125B
Authority
CN
China
Prior art keywords
noise
value
power spectrum
mel
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910976850.8A
Other languages
Chinese (zh)
Other versions
CN110556125A (en)
Inventor
李勤
付聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mobvoi Information Technology Co Ltd
Original Assignee
Mobvoi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mobvoi Information Technology Co Ltd
Priority to CN201910976850.8A
Publication of CN110556125A
Application granted
Publication of CN110556125B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Abstract

The invention discloses a method, a device, and a computer storage medium for extracting features from a speech signal. The method comprises the following steps: converting a noisy speech signal from the time domain to the frequency domain to obtain a frequency domain signal of the noisy speech signal; applying Mel filtering to the frequency domain signal to obtain a Mel power spectrum value of the frequency domain signal; denoising the Mel power spectrum value to obtain a noise-reduced Mel power spectrum value; and performing speech recognition according to the noise-reduced Mel power spectrum value to obtain the speech features corresponding to the noisy speech signal.

Description

Feature extraction method and device based on voice signal and computer storage medium
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to a method and an apparatus for extracting feature values based on speech signals, and a computer storage medium.
Background
Speech recognition technology converts a speaker's speech signal into information that a computer program can process, so that the speaker's voice commands and spoken text can be recognized. At present, speech recognition is widely applied in fields such as customer service quality inspection, navigation, and smart homes. A speech recognition system generally includes modules such as front-end processing and feature extraction. The input speech data stream undergoes front-end processing before a recognition result is obtained.
Speech denoising removes noise from noisy input speech. The noisy speech signal is the input to a denoising module: the time domain signal is generally first transformed into the frequency domain, the frequency domain signal is then denoised, and the result is finally transformed back into the time domain. A carrier with speech processing capabilities typically hosts several functions, such as speech recognition and noise reduction. As functions are added, the amount of data the carrier must process grows, which increases memory usage.
Disclosure of Invention
The embodiment of the invention provides a method and equipment for extracting a characteristic value based on a voice signal and a computer storage medium, which have the effect of reducing the data volume of operation.
The embodiment of the invention provides a feature extraction method based on a voice signal, which comprises the following steps: carrying out time domain to frequency domain conversion on the noisy speech signal to obtain a frequency domain signal of the noisy speech signal; carrying out Mel filtering processing on the frequency domain signal to obtain a Mel power spectrum value of the frequency domain signal; denoising the Mel power spectrum value to obtain a noise-reduced Mel power spectrum value; and performing voice recognition according to the noise-reduced Mel power spectrum value to obtain voice characteristics corresponding to the noise-containing voice signal.
In an implementation manner, the performing mel filtering processing on the frequency domain signal to obtain a mel power spectrum value of the frequency domain signal includes: calculating a signal power spectrum of the frequency domain signal; and carrying out Mel filtering processing on the signal power spectrum obtained by calculation through a Mel filter bank to obtain a Mel power spectrum value of the frequency domain signal.
In an embodiment, denoising the mel-power spectrum values to obtain denoised mel-power spectrum values includes: carrying out noise estimation on the signal power spectrum to obtain a noise estimation value; and carrying out noise suppression on the Mel power spectrum value according to the noise estimation value to obtain a noise-reduced Mel power spectrum value.
In one embodiment, performing noise estimation on the signal power spectrum to obtain a noise estimation value includes: calculating the signal power spectrum to obtain the minimum value of the noisy power within a set time; determining the minimum value of the noisy power as a noise estimation reference value; and compensating the noise estimation reference value to obtain the noise estimation value.
In an embodiment, the noise suppressing the mel-power spectrum value according to the noise estimation value to obtain a noise-reduced mel-power spectrum value includes: determining a first gain value of the mel-power spectrum value according to the noise estimation value; performing inter-spectrum smoothing on the first gain value to obtain a second gain value; and performing noise reduction processing on the Mel power spectrum value by using the second gain value to obtain a noise-reduced Mel power spectrum value.
In a further possible embodiment, determining a first gain value for the mel-power spectrum value based on the noise estimate comprises: calculating the posterior signal-to-noise ratio of the Mel power spectrum value according to the noise estimation value to obtain the posterior signal-to-noise ratio; carrying out prior signal-to-noise ratio calculation according to the posterior signal-to-noise ratio to obtain a prior signal-to-noise ratio; and calculating a gain value according to the prior signal-to-noise ratio to obtain the first gain value corresponding to the Mel power spectrum value.
Another aspect of the present invention provides a feature extraction apparatus based on a speech signal, the apparatus including: the conversion module is used for converting a noisy speech signal from a time domain to a frequency domain to obtain a frequency domain signal of the noisy speech signal; the filtering module is used for carrying out Mel filtering processing on the frequency domain signal to obtain a Mel power spectrum value of the frequency domain signal; the noise reduction module is used for carrying out noise reduction on the Mel power spectrum value to obtain a noise-reduced Mel power spectrum value; and the recognition module is used for carrying out voice recognition according to the noise-reduced Mel power spectrum value to obtain the voice characteristics corresponding to the noise-containing voice signal.
In an embodiment, the filtering module includes: the calculation submodule is used for calculating a signal power spectrum of the frequency domain signal; and the filtering submodule is used for carrying out Mel filtering processing on the signal power spectrum obtained by calculation through a Mel filter bank to obtain a Mel power spectrum value of the frequency domain signal.
In one embodiment, the noise reduction module includes: the noise estimation submodule is used for carrying out noise estimation on the signal power spectrum to obtain a noise estimation value; and the noise suppression submodule is used for performing noise suppression on the Mel power spectrum value according to the noise estimation value to obtain a noise-reduced Mel power spectrum value.
In one embodiment, the noise estimation sub-module includes: the computing unit is used for computing the signal power spectrum to obtain a minimum value of the noisy power within a set time; a first determining unit, configured to determine the minimum value of the noisy power as a noise estimation reference value; and the compensation unit is used for compensating the noise estimation reference value to obtain the noise estimation value.
In one embodiment, the noise suppression sub-module includes: a second determining unit for determining a first gain value of the mel-power spectrum value according to the noise estimation value; the smoothing unit is used for carrying out inter-spectrum smoothing processing on the first gain value to obtain a second gain value; and the noise reduction unit is used for carrying out noise reduction processing on the Mel power spectrum value by using the second gain value to obtain a noise-reduced Mel power spectrum value.
In an embodiment, the second determining unit is specifically configured to: calculate the posterior signal-to-noise ratio of the Mel power spectrum value according to the noise estimation value; calculate the prior signal-to-noise ratio according to the posterior signal-to-noise ratio; and calculate a gain value according to the prior signal-to-noise ratio to obtain the first gain value corresponding to the Mel power spectrum value.
Another aspect of the present invention provides a computer-readable storage medium, which includes a set of computer-executable instructions, when executed, for performing any one of the above-mentioned methods for extracting feature values based on a speech signal.
With the feature extraction method and device based on a voice signal and the computer storage medium provided by the invention, voice noise reduction and voice feature recognition are combined into a single process rather than carried out separately. The signal therefore needs to be converted from the time domain to the frequency domain only once, the amount of noisy data that actually takes part in the noise reduction operation is reduced, and the memory and computing resources consumed by the noise reduction algorithm during voice feature recognition are greatly reduced.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like or corresponding reference characters designate like or corresponding parts.
Fig. 1 is a schematic flow chart illustrating an implementation of a feature extraction method based on a speech signal according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a Mel filtering process implementation flow of the extraction method of the embodiment of the present invention;
Fig. 3 is a schematic view of a noise reduction process implementation flow of the extraction method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a noise estimation implementation flow of the extraction method according to the embodiment of the present invention;
Fig. 5 is a schematic diagram of a noise suppression implementation flow of the extraction method according to an embodiment of the present invention;
Fig. 6 is a schematic block diagram of a feature extraction device based on a speech signal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart illustrating an implementation of a feature extraction method based on a speech signal according to an embodiment of the present invention.
Referring to Fig. 1, an embodiment of the present invention provides a feature extraction method based on a speech signal, where the method includes: step 101, converting a noisy speech signal from the time domain to the frequency domain to obtain a frequency domain signal of the noisy speech signal; step 102, carrying out Mel filtering processing on the frequency domain signal to obtain a Mel power spectrum value of the frequency domain signal; step 103, denoising the Mel power spectrum value to obtain a noise-reduced Mel power spectrum value; and step 104, performing voice recognition according to the noise-reduced Mel power spectrum value to obtain voice features corresponding to the noisy speech signal.
The feature extraction method based on the voice signal provided by the embodiment of the invention takes the intermediate result 'Mel power spectrum value' generated in the voice feature extraction process as input, noise reduction is carried out on the voice signal with noise, and then voice recognition is carried out on the Mel power spectrum value after noise reduction, so as to obtain the voice feature corresponding to the voice signal with noise. In the whole process, the voice noise reduction and the voice feature recognition are combined without being separately carried out, so that the whole process only needs to carry out one-time conversion from a time domain to a frequency domain on a signal, the noise-carrying data volume which really participates in noise reduction operation is reduced, and the memory resource and the operation resource consumed by a noise reduction algorithm in the voice feature recognition process are greatly reduced.
In step 101, the noisy speech signal is converted from the time domain to the frequency domain to obtain its frequency domain signal. Specifically, the embodiment of the invention performs this conversion by framing the noisy speech signal, windowing each frame, and computing its FFT. In step 102, Mel filtering is applied to the frequency domain signal to obtain its Mel power spectrum value. Mel filtering reduces the dimensionality of the frequency domain signal, so the Mel power spectrum value has low complexity while still giving a good processing result. In step 103, the Mel power spectrum value is denoised to obtain the noise-reduced Mel power spectrum value. The noise reduction comprises noise estimation and noise suppression; the embodiment of the invention does not limit the specific methods used for either. In step 104, voice recognition is performed according to the noise-reduced Mel power spectrum value to obtain the voice features corresponding to the noisy speech signal. Specifically, the voice feature in the embodiment of the invention is the Fbank value, which is obtained by taking the natural logarithm of the noise-reduced Mel power spectrum value. The Fbank value is sent as the voice feature to a voice recognition engine for voice recognition.
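The framing, windowing and transform stage of step 101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the Hann window, the frame length and hop size, and the helper names (`frame_signal`, `hann_window`, `dft_half`, `to_frequency_domain`) are all hypothetical, and a naive DFT stands in for the FFT so that the example needs only the standard library.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames (a trailing partial frame is dropped)."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def hann_window(n):
    """Periodic Hann window of length n (the window type is an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def dft_half(frame):
    """Naive O(n^2) DFT returning bins 0..n/2; a real system would use an FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def to_frequency_domain(samples, frame_len=8, hop=4):
    """Frame, window, and transform each frame of a noisy signal."""
    window = hann_window(frame_len)
    return [dft_half([s * w for s, w in zip(frame, window)])
            for frame in frame_signal(samples, frame_len, hop)]
```

Each returned frame holds `frame_len // 2 + 1` complex bins, which is the frequency domain signal that the later steps consume.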
Fig. 2 is a schematic flow diagram illustrating a mel filtering process of the extraction method according to the embodiment of the present invention.
Referring to fig. 2, in the embodiment of the present invention, in step 102, a mel filtering process is performed on the frequency domain signal to obtain a mel power spectrum value of the frequency domain signal, which includes: step 1021, calculating a signal power spectrum of the frequency domain signal; and step 1022, performing mel filtering processing on the calculated signal power spectrum through a mel filter bank to obtain a mel power spectrum value of the frequency domain signal.
Specifically, Mel filtering of the frequency domain signal proceeds in two stages. First, the signal power spectrum is calculated from the frequency domain signal obtained in step 101, which is a first dimension reduction of the frequency domain signal. Second, the signal power spectrum is passed through a Mel filter bank, which reduces the dimension of the signal power spectrum and yields the Mel power spectrum value.
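Steps 1021 and 1022 can be sketched as follows. The text does not specify the filter shapes or the Mel scale formula, so this sketch assumes triangular filters and the common conversion mel = 2595 · log10(1 + f/700); the function names and parameters are hypothetical.

```python
import math


def power_spectrum(spectrum):
    """Squared magnitude of each complex frequency bin."""
    return [abs(c) ** 2 for c in spectrum]


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(num_filters, num_bins, sample_rate):
    """Triangular filters with centres equally spaced on the Mel scale (assumed design)."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # num_filters + 2 edge frequencies, expressed as fractional bin positions.
    edges = [mel_to_hz(top * i / (num_filters + 1)) / nyquist * (num_bins - 1)
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(num_bins):
            if left < k <= centre:
                row.append((k - left) / (centre - left))
            elif centre < k < right:
                row.append((right - k) / (right - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_power(power, bank):
    """Weighted sum of the power spectrum under each Mel filter."""
    return [sum(w * p for w, p in zip(row, power)) for row in bank]
```

With 33 frequency bins and 4 filters, for example, each frame shrinks from 33 power values to 4 Mel power spectrum values, which is the dimension reduction the paragraph above describes.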
Fig. 3 is a schematic view of a noise reduction process implementation flow of the extraction method according to the embodiment of the present invention.
Referring to fig. 3, in the embodiment of the present invention, in step 103, performing noise reduction on the mel-power spectrum value to obtain a noise-reduced mel-power spectrum value, includes: step 1031, performing noise estimation on the signal power spectrum to obtain a noise estimation value; and step 1032, carrying out noise suppression on the Mel power spectrum value according to the noise estimation value to obtain a noise-reduced Mel power spectrum value.
Specifically, in the process of denoising the mel-frequency filtered spectrum value, the method includes performing noise estimation on the signal power spectrum obtained in the step 1021 to obtain a noise estimation value, and performing noise suppression by combining the noise estimation value and the mel-frequency power spectrum value to obtain a mel-frequency power spectrum value after denoising. The noise-reduced mel-power spectrum value can be used for voice recognition and can also be used as the input of other modules, such as the storage and other processing of the noise-reduced voice signal.
Fig. 4 is a schematic diagram of a noise estimation implementation flow of the extraction method according to the embodiment of the present invention.
Referring to fig. 4, in the embodiment of the present invention, in step 1031, performing noise estimation on the signal power spectrum to obtain a noise estimation value, includes: step 10311, calculating a signal power spectrum to obtain a minimum value of the noisy power within a set time; step 10312, determining the minimum value of the power with noise as a noise estimation reference value; and step 10313, compensating the noise estimation reference value to obtain a noise estimation value.
Specifically, the embodiment of the invention estimates the noise with an algorithm based on minimum statistics. The method first takes the signal power spectrum and finds the minimum value of the noisy power within a set time, where the set time is a period chosen as required, such as 1 s, 2 s, or any other duration. This minimum value is used as the noise estimation reference value, and the noise estimation value is then obtained by compensating the reference value. Alternatively, the noise estimation value may be obtained with another algorithm, such as a recursive averaging noise algorithm, a minimum tracking algorithm, or a histogram noise estimation algorithm.
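Steps 10311 to 10313 can be sketched as a sliding-window minimum followed by a compensation step. The text does not say how the reference value is compensated, so the multiplicative `compensation` factor below is purely an assumption, as are the class name and the choice to express the set time as a number of frames.

```python
from collections import deque


class MinimumStatisticsNoiseEstimator:
    """Tracks the per-bin minimum power over the last `window_frames` frames."""

    def __init__(self, window_frames, compensation=1.5):
        # `compensation` is a hypothetical bias factor; the source only says
        # that the minimum is "compensated", not how.
        self.history = deque(maxlen=window_frames)
        self.compensation = compensation

    def update(self, power):
        """Add one frame's power spectrum and return the current noise estimate."""
        self.history.append(list(power))
        # Noise estimation reference value: minimum noisy power per bin in the window.
        reference = [min(frame[k] for frame in self.history)
                     for k in range(len(power))]
        return [self.compensation * r for r in reference]
```

For a set time of 1 s and a 10 ms hop, `window_frames` would be 100 under these assumptions. Storing every frame in the window is the simplest form; practical minimum-statistics trackers usually keep sub-window minima to save memory.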
Fig. 5 is a schematic diagram of a flow chart of implementing noise suppression by the extraction method according to the embodiment of the present invention.
Referring to fig. 5, in the embodiment of the present invention, in step 1032, the noise suppression is performed on the mel-power spectrum value according to the noise estimation value, and the obtaining of the mel-power spectrum value after noise reduction includes: step 10321, determining a first gain value of the mel-power spectrum value according to the noise estimation value; step 10322, performing inter-spectrum smoothing on the first gain value to obtain a second gain value; and 10323, performing noise reduction on the mel-power spectrum value by using the second gain value to obtain a mel-power spectrum value after noise reduction.
Specifically, the embodiment of the invention adopts a noise suppression algorithm based on wiener filtering to suppress noise. The method comprises the steps of firstly processing a Mel power spectrum value through a noise estimation value to obtain a first gain value, and after the first gain value is obtained, performing certain inter-spectrum smoothing on the first gain value to obtain a second gain value, namely a final gain value. And then, carrying out noise reduction treatment by multiplying the Mel power spectrum value by a second gain value to obtain a Mel power spectrum value after noise reduction. The noise-reduced mel power spectrum value can be used for recognizing voice characteristics and can also be used as an input signal of other voice processing processes. Other noise suppression algorithms can be adopted for noise suppression to obtain a noise-reduced mel power spectrum value, such as a spectral subtraction method and the like.
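Steps 10322 and 10323 can be sketched as follows. The text does not define the inter-spectrum smoothing; this sketch reads it as a weighted average across neighbouring Mel bands, and the three-tap kernel, the edge handling, and the function names are assumptions.

```python
def smooth_across_bands(gains, weights=(0.25, 0.5, 0.25)):
    """Three-tap weighted average over neighbouring bands, with edge values replicated."""
    last = len(gains) - 1
    smoothed = []
    for k in range(len(gains)):
        left = gains[max(k - 1, 0)]
        right = gains[min(k + 1, last)]
        smoothed.append(weights[0] * left + weights[1] * gains[k] + weights[2] * right)
    return smoothed


def apply_gain(mel_power, gains):
    """Multiply each Mel power spectrum value by its (second) gain value."""
    return [p * g for p, g in zip(mel_power, gains)]
```

For example, first gain values `[1.0, 0.0, 1.0]` smooth to `[0.75, 0.5, 0.75]`, so an isolated band is no longer suppressed completely while its neighbours pass unchanged.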
In this embodiment of the present invention, step 10321, determining a first gain value of the mel-power spectrum value according to the noise estimation value, includes: firstly, calculating the posterior signal-to-noise ratio of a Mel power spectrum value according to a noise estimation value to obtain the posterior signal-to-noise ratio; then, carrying out prior signal-to-noise ratio calculation according to the posterior signal-to-noise ratio to obtain the prior signal-to-noise ratio; and then, calculating a gain value according to the prior signal-to-noise ratio to obtain a first gain value corresponding to the Mel power spectrum value.
Specifically, in determining the first gain value, the method first calculates the posterior signal-to-noise ratio (SNRpost) of the Mel power spectrum value according to the noise estimation value. The prior signal-to-noise ratio (SNRprio) is then calculated from the posterior signal-to-noise ratio using the decision-directed approach, according to the formula:
\[ \mathrm{SNR}_{\mathrm{prio}}(i) = \mathrm{factor} \cdot \mathrm{SNR}_{\mathrm{prio}}(i-1) + (1 - \mathrm{factor}) \cdot \max\bigl(\mathrm{SNR}_{\mathrm{post}}(i) - 1,\ 0\bigr) \]
where factor is a smoothing factor, a positive real number between 0 and 1. After the prior signal-to-noise ratio is obtained, the first gain value (gain) of the noisy speech is calculated from it according to the formula:
\[ \mathrm{gain}(i) = \frac{\mathrm{SNR}_{\mathrm{prio}}(i)}{\mathrm{SNR}_{\mathrm{prio}}(i) + 1} \]
A second gain value is obtained by applying inter-spectrum smoothing to the first gain value, and the Mel power spectrum value is multiplied by the second gain value to obtain the noise-reduced Mel power spectrum value.
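The first-gain calculation can be sketched as follows. Several points are assumptions rather than statements of the text: the posterior signal-to-noise ratio is taken as the noisy Mel power divided by a noise estimate already expressed per Mel band (`noise_mel`), the index i is read as the frame index so that SNRprio(i-1) is the previous frame's value, the two-argument term in the prior formula is read as a maximum with 0, and the default `factor` of 0.98 is illustrative only.

```python
def posterior_snr(mel_power, noise_mel, eps=1e-12):
    """Assumed definition: noisy Mel power divided by the per-band noise estimate."""
    return [p / max(n, eps) for p, n in zip(mel_power, noise_mel)]


def prior_snr(snr_post, prev_snr_prio, factor=0.98):
    """SNRprio(i) = factor * SNRprio(i-1) + (1 - factor) * max(SNRpost(i) - 1, 0)."""
    return [factor * prev + (1.0 - factor) * max(post - 1.0, 0.0)
            for post, prev in zip(snr_post, prev_snr_prio)]


def wiener_gain(snr_prio):
    """gain(i) = SNRprio(i) / (SNRprio(i) + 1), always between 0 and 1."""
    return [s / (s + 1.0) for s in snr_prio]
```

A band whose power does not exceed the noise estimate contributes nothing new to the prior signal-to-noise ratio, so its gain decays towards 0 over successive frames, while a band well above the noise floor approaches a gain of 1.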
To facilitate understanding of the above embodiments, a specific implementation scenario is described below. The feature extraction method based on a voice signal can be applied to devices with data processing functions, such as computers, mobile phones, smart speakers, smart headsets, smart watches, and smart robots. In this implementation scenario, the device is a mobile phone. When a user instructs the device to look up certain information, the device receives through its microphone a noisy speech signal containing the command, such as "query information A". The device must extract voice features from the noisy speech signal in order to determine the user's instruction. To do so, the device first frames and windows the noisy speech signal and applies an FFT (fast Fourier transform), converting the signal from the time domain to the frequency domain to obtain a frequency domain signal. It then calculates the signal power spectrum from the frequency domain signal and passes it through a Mel filter bank to obtain a series of Mel power spectrum values. Noise estimation and noise suppression are then performed to reduce the noise in the Mel power spectrum values, yielding the noise-reduced Mel power spectrum values, and the Fbank value is obtained by taking the natural logarithm of each noise-reduced Mel power spectrum value. The Fbank value serves as the voice feature and can be sent to a voice recognition system for voice recognition. The noise-reduced Mel power spectrum values can also be used as input to other processing, such as conversion to text or storage.
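The final step of the scenario, taking the natural logarithm of each noise-reduced Mel power spectrum value to obtain the Fbank value, can be sketched as follows. The `floor` parameter is an assumption added here so that a fully suppressed band does not produce log(0); the text does not mention it.

```python
import math


def fbank(mel_power, floor=1e-10):
    """Natural log of each noise-reduced Mel power value, floored to avoid log(0)."""
    return [math.log(max(p, floor)) for p in mel_power]
```

The resulting list has one Fbank value per Mel band and is what would be passed to the voice recognition system.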
This method provides a voice noise reduction algorithm with low memory usage and low computational complexity, which is of great practical value on embedded devices with limited resources.
Fig. 6 is a schematic block diagram of a feature extraction device based on a speech signal according to an embodiment of the present invention.
Referring to fig. 6, another aspect of the present invention provides a feature extraction device based on a speech signal, where the device includes: a conversion module 601, configured to perform time-domain to frequency-domain conversion on the noisy speech signal to obtain a frequency-domain signal of the noisy speech signal; a filtering module 602, configured to perform mel filtering on the frequency domain signal to obtain a mel power spectrum value of the frequency domain signal; a noise reduction module 603, configured to perform noise reduction on the mel-power spectrum value to obtain a mel-power spectrum value after noise reduction; the recognition module 604 is configured to perform speech recognition according to the noise-reduced mel power spectrum value to obtain speech features corresponding to the noise-containing speech signal.
In an embodiment of the present invention, the filtering module 602 includes: the calculation submodule 6021 is used for calculating a signal power spectrum of the frequency domain signal; and the filtering submodule 6022 is configured to perform mel filtering on the calculated signal power spectrum through a mel filter bank to obtain a mel power spectrum value of the frequency domain signal.
In this embodiment of the present invention, the noise reduction module 603 includes: a noise estimation sub-module 6031, configured to perform noise estimation on the signal power spectrum to obtain a noise estimation value; and a noise suppression sub-module 6032, configured to perform noise suppression on the Mel power spectrum value according to the noise estimation value, so as to obtain a noise-reduced Mel power spectrum value.
In this embodiment of the present invention, the noise estimation sub-module 6031 includes: the calculating unit 60311 is configured to calculate a signal power spectrum to obtain a minimum value of the noisy power within a set time; a first determining unit 60312 for determining the minimum value of the noisy power as a noise estimation reference value; and a compensation unit 60313, configured to compensate the noise estimation reference value to obtain a noise estimation value.
In the embodiment of the present invention, the noise suppression sub-module 6032 includes: a second determining unit 60321, configured to determine a first gain value of the Mel power spectrum value according to the noise estimation value; a smoothing unit 60322, configured to perform inter-spectrum smoothing on the first gain value to obtain a second gain value; and a noise reduction unit 60323, configured to perform noise reduction processing on the Mel power spectrum value by using the second gain value, so as to obtain a noise-reduced Mel power spectrum value.
In the embodiment of the present invention, the second determining unit is specifically configured to perform a posterior signal-to-noise ratio calculation on the mel-power spectrum value according to the noise estimation value to obtain a posterior signal-to-noise ratio; carrying out prior signal-to-noise ratio calculation according to the posterior signal-to-noise ratio to obtain a prior signal-to-noise ratio; and calculating a gain value according to the prior signal-to-noise ratio to obtain a first gain value corresponding to the Mel power spectrum value.
Another aspect of the embodiments of the present invention provides a computer-readable storage medium, where the storage medium includes a set of computer-executable instructions which, when executed, perform any one of the above-mentioned feature extraction methods based on a speech signal.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The above describes only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims (8)

1. A method for extracting features based on a speech signal, the method comprising:
carrying out time domain to frequency domain conversion on the noisy speech signal to obtain a frequency domain signal of the noisy speech signal;
carrying out Mel filtering processing on the frequency domain signal to obtain a Mel power spectrum value of the frequency domain signal;
denoising the Mel power spectrum value to obtain a noise-reduced Mel power spectrum value;
performing voice recognition according to the noise-reduced Mel power spectrum value to obtain voice characteristics corresponding to the noisy speech signal;
wherein denoising the Mel power spectrum value to obtain the noise-reduced Mel power spectrum value comprises:
calculating a signal power spectrum of the frequency domain signal;
carrying out noise estimation on the signal power spectrum to obtain a noise estimation value;
carrying out noise suppression on the Mel power spectrum value according to the noise estimation value to obtain the noise-reduced Mel power spectrum value;
and wherein performing voice recognition according to the noise-reduced Mel power spectrum value to obtain the voice characteristics corresponding to the noisy speech signal comprises:
taking the natural logarithm of the noise-reduced Mel power spectrum value to obtain an Fbank value, wherein the Fbank value is sent as a voice characteristic to a voice recognition engine for voice recognition.
2. The method of claim 1, wherein the performing the mel filtering process on the frequency domain signal to obtain the mel power spectrum value of the frequency domain signal comprises:
and carrying out Mel filtering processing on the signal power spectrum obtained by calculation through a Mel filter bank to obtain a Mel power spectrum value of the frequency domain signal.
3. The method of claim 1, wherein performing noise estimation on the power spectrum of the signal to obtain a noise estimate comprises:
calculating the signal power spectrum to obtain the minimum value of the noisy power within a set time;
determining the minimum value of the noisy power as a noise estimation reference value;
and compensating the noise estimation reference value to obtain the noise estimation value.
4. The method of claim 1, wherein performing noise suppression on the mel-power spectral values according to the noise estimation values to obtain denoised mel-power spectral values comprises:
determining a first gain value of the mel power spectrum value according to the noise estimation value;
performing inter-spectrum smoothing on the first gain value to obtain a second gain value;
and performing noise reduction processing on the Mel power spectrum value by using the second gain value to obtain a noise-reduced Mel power spectrum value.
5. The method of claim 4, wherein determining a first gain value for the mel-power spectral value as a function of the noise estimate comprises:
calculating the posterior signal-to-noise ratio of the Mel power spectrum value according to the noise estimation value to obtain the posterior signal-to-noise ratio;
carrying out prior signal-to-noise ratio calculation according to the posterior signal-to-noise ratio to obtain a prior signal-to-noise ratio;
and calculating a gain value according to the prior signal-to-noise ratio to obtain the first gain value corresponding to the Mel power spectrum value.
6. A feature extraction device based on a speech signal, characterized in that the device comprises:
the conversion module is used for converting a time domain to a frequency domain of the noisy speech signal to obtain a frequency domain signal of the noisy speech signal;
the filtering module is used for carrying out Mel filtering processing on the frequency domain signal to obtain a Mel power spectrum value of the frequency domain signal;
the noise reduction module is used for carrying out noise reduction on the Mel power spectrum value to obtain a noise-reduced Mel power spectrum value;
the recognition module is used for carrying out voice recognition according to the noise-reduced Mel power spectrum value to obtain voice characteristics corresponding to the noisy speech signal;
the noise reduction module comprises a calculation submodule, a noise estimation submodule and a noise suppression submodule;
the calculation submodule is used for calculating a signal power spectrum of the frequency domain signal;
the noise estimation submodule is used for carrying out noise estimation on the signal power spectrum to obtain a noise estimation value;
the noise suppression submodule is used for performing noise suppression on the Mel power spectrum value according to the noise estimation value to obtain a noise-reduced Mel power spectrum value;
the recognition module is further configured to take the natural logarithm of the noise-reduced Mel power spectrum value to obtain an Fbank value, and the Fbank value is sent as a voice characteristic to a voice recognition engine for voice recognition.
7. The apparatus of claim 6, wherein the filtering module comprises:
and the filtering submodule is used for carrying out Mel filtering processing on the signal power spectrum obtained by calculation through a Mel filter bank to obtain a Mel power spectrum value of the frequency domain signal.
8. A computer storage medium comprising a set of computer-executable instructions which, when executed, perform the speech-signal-based feature extraction method of any one of claims 1-5.
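As an illustration only, the overall flow of claim 1 (time-to-frequency conversion, signal power spectrum, Mel filtering, optional denoising, natural logarithm) could be sketched in Python as below. Nothing in this sketch is specified by the claims: the 16 kHz sample rate, 400-sample frames with a 160-sample hop, Hamming window, 512-point FFT, 40 triangular Mel filters, the small constant `eps` inside the logarithm, and the names `mel_filterbank`, `fbank`, and the `denoise` hook are all assumptions.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fb = np.zeros((n_mels, bin_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m:m + 3]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def fbank(signal, sample_rate=16000, frame_len=400, hop=160,
          n_fft=512, n_mels=40, denoise=None, eps=1e-10):
    """Log Mel power (Fbank) features, shape (frames, n_mels).

    Assumes len(signal) >= frame_len.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Time domain to frequency domain conversion.
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    # Signal power spectrum.
    power = np.abs(spectrum) ** 2
    # Mel filtering of the power spectrum gives the Mel power spectrum values.
    mel_power = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    # Hypothetical hook: a callable that returns noise-reduced Mel power.
    if denoise is not None:
        mel_power = denoise(mel_power)
    # Natural logarithm gives the Fbank values.
    return np.log(mel_power + eps)
```

In this arrangement the denoising operates on `n_mels` values per frame rather than on every FFT bin, which is the reason the claims place noise suppression after Mel filtering and before the logarithm.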

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910976850.8A CN110556125B (en) 2019-10-15 2019-10-15 Feature extraction method and device based on voice signal and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910976850.8A CN110556125B (en) 2019-10-15 2019-10-15 Feature extraction method and device based on voice signal and computer storage medium

Publications (2)

Publication Number Publication Date
CN110556125A CN110556125A (en) 2019-12-10
CN110556125B true CN110556125B (en) 2022-06-10

Family

ID=68742854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910976850.8A Active CN110556125B (en) 2019-10-15 2019-10-15 Feature extraction method and device based on voice signal and computer storage medium

Country Status (1)

Country Link
CN (1) CN110556125B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111475633B (en) * 2020-04-10 2022-06-10 复旦大学 Speech support system based on seat voice
CN111415313B (en) * 2020-04-13 2022-08-30 展讯通信(上海)有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN111883181A (en) * 2020-06-30 2020-11-03 海尔优家智能科技(北京)有限公司 Audio detection method and device, storage medium and electronic device
CN113299302A (en) * 2021-04-22 2021-08-24 维沃移动通信(杭州)有限公司 Audio noise reduction method and device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101740030A (en) * 2008-11-04 2010-06-16 北京中星微电子有限公司 Method and device for transmitting and receiving speech signals
CN108510979A (en) * 2017-02-27 2018-09-07 芋头科技(杭州)有限公司 A kind of training method and audio recognition method of mixed frequency acoustics identification model
CN109036470A (en) * 2018-06-04 2018-12-18 平安科技(深圳)有限公司 Speech differentiation method, apparatus, computer equipment and storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916567B (en) * 2009-11-23 2012-02-01 瑞声声学科技(深圳)有限公司 Speech enhancement method applied to dual-microphone system
JP5875414B2 (en) * 2012-03-07 2016-03-02 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Noise suppression method, program and apparatus
CN102982801B (en) * 2012-11-12 2014-12-10 中国科学院自动化研究所 Phonetic feature extracting method for robust voice recognition
CN104021796B (en) * 2013-02-28 2017-06-20 华为技术有限公司 Speech enhancement processing method and apparatus
CN103871421B (en) * 2014-03-21 2018-02-02 厦门莱亚特医疗器械有限公司 A kind of self-adaptation noise reduction method and system based on subband noise analysis
CN105788603B (en) * 2016-02-25 2019-04-16 深圳创维数字技术有限公司 A kind of audio identification methods and system based on empirical mode decomposition
CN105976812B (en) * 2016-04-28 2019-04-26 腾讯科技(深圳)有限公司 A kind of audio recognition method and its equipment
CN106653043B (en) * 2016-12-26 2019-09-27 云知声(上海)智能科技有限公司 Reduce the Adaptive beamformer method of voice distortion
CN107331384B (en) * 2017-06-12 2018-05-04 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium
CN108682418B (en) * 2018-06-26 2022-03-04 北京理工大学 Speech recognition method based on pre-training and bidirectional LSTM
CN109215634A (en) * 2018-10-22 2019-01-15 上海声瀚信息科技有限公司 A kind of method and its system of more word voice control on-off systems
CN109147818A (en) * 2018-10-30 2019-01-04 Oppo广东移动通信有限公司 Acoustic feature extracting method, device, storage medium and terminal device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Far-field speaker recognition technology based on speech enhancement (基于语音增强的远场说话人识别技术); 覃晓逸 et al.; 《网络新媒体技术》; 2019-07-31; Vol. 8, No. 4; pp. 1-10 *

Also Published As

Publication number Publication date
CN110556125A (en) 2019-12-10

Similar Documents

Publication Publication Date Title
CN110556125B (en) Feature extraction method and device based on voice signal and computer storage medium
CN109767783B (en) Voice enhancement method, device, equipment and storage medium
CN111418010B (en) Multi-microphone noise reduction method and device and terminal equipment
CN111341336B (en) Echo cancellation method, device, terminal equipment and medium
CN110634497B (en) Noise reduction method and device, terminal equipment and storage medium
US10839820B2 (en) Voice processing method, apparatus, device and storage medium
CN113539285B (en) Audio signal noise reduction method, electronic device and storage medium
CN110875049B (en) Voice signal processing method and device
WO2021007841A1 (en) Noise estimation method, noise estimation apparatus, speech processing chip and electronic device
WO2022218254A1 (en) Voice signal enhancement method and apparatus, and electronic device
CN110503973B (en) Audio signal transient noise suppression method, system and storage medium
CN103824563A (en) Hearing aid denoising device and method based on module multiplexing
CN105144290A (en) Signal processing device, signal processing method, and signal processing program
RU2616534C2 (en) Noise reduction during audio transmission
CN110970051A (en) Voice data acquisition method, terminal and readable storage medium
CN108053834B (en) Audio data processing method, device, terminal and system
CN107919136B (en) Digital voice sampling frequency estimation method based on Gaussian mixture model
CN113611319B (en) Wind noise suppression method, device, equipment and system based on voice component
CN112669869B (en) Noise suppression method, device, apparatus and storage medium
CN113593599A (en) Method for removing noise signal in voice signal
CN117280414A (en) Noise reduction based on dynamic neural network
CN117351925B (en) Howling suppression method, device, electronic equipment and storage medium
CN113870884B (en) Single-microphone noise suppression method and device
CN113035222B (en) Voice noise reduction method and device, filter determination method and voice interaction equipment
CN115985337B (en) Transient noise detection and suppression method and device based on single microphone

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant