CN110610696A - MFCC feature extraction method and device based on mixed signal domain - Google Patents

Publication number
CN110610696A
Authority
CN
China
Prior art keywords
domain
signal
frequency
time domain
frequency band
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810615611.5A
Other languages
Chinese (zh)
Other versions
CN110610696B (en)
Inventor
李钦
乔飞
魏琦
朱慧峰
刘辛军
杨华中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26 Pre-filtering or post-filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Abstract

An embodiment of the invention provides an MFCC feature extraction method and device based on a mixed signal domain, where the mixed signal domain comprises an analog signal domain and a digital signal domain. The method comprises the following steps: acquiring a preprocessed voice signal in the analog signal domain; performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands; operating on the time domain signal in each frequency band according to a preset operation rule; low-pass filtering the operation result and taking the filtered result as the energy value of the time domain signal in each frequency band; and converting the energy value into a digital signal, performing data processing on the converted energy value in the digital signal domain, and taking the result of the data processing as the extracted Mel frequency cepstral coefficient (MFCC) feature. The device performs the above method. The method and device provided by the embodiment can effectively extract MFCC features, improve the extraction speed, and reduce the energy consumed in the extraction process.

Description

MFCC feature extraction method and device based on mixed signal domain
Technical Field
The embodiment of the invention relates to the technical field of voice feature extraction, in particular to a mixed signal domain-based MFCC feature extraction method and device.
Background
Voice interaction has become an important mode of human-computer interaction, so automatic speech recognition is very important. In energy-constrained application scenarios in particular, low-power, energy-efficient automatic speech recognition is essential.
Auditory feature extraction is a key step in automatic speech recognition. Mel-scale Frequency Cepstral Coefficients (hereinafter referred to as "MFCC") intuitively show the distribution of a speech signal in the frequency domain, so MFCC features are widely extracted as auditory features and are currently the most commonly used speech features. FIG. 1 is a flow chart of a prior art MFCC feature extraction method. As shown in fig. 1, the speech signal is converted from the analog domain to the digital domain, where data processing, including Fourier transform, Mel filtering, etc., is performed. In the course of carrying out the embodiments of the present invention, the inventors found that, in the MFCC feature extraction process in fig. 1, the Fourier transform consumes considerable computation time and computation resources, and the analog-to-digital conversion also consumes computation time and computation resources, which causes excessive energy consumption in the prior art.
Therefore, how to avoid the above drawbacks, extract MFCC features effectively, and reduce the energy consumed in the extraction process is a problem that needs to be solved for low-power automatic speech recognition.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a method and a device for extracting MFCC features based on a mixed signal domain.
In a first aspect, an embodiment of the present invention provides a method for MFCC feature extraction based on a mixed signal domain, where the mixed signal domain includes an analog signal domain and a digital signal domain, and the method includes:
acquiring a preprocessed voice signal in the analog signal domain; performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands;
calculating the time domain signals in each frequency band according to a preset operation rule;
carrying out low-pass filtering processing on the operation result, and taking the operation result after the low-pass filtering processing as the energy value of the time domain signal in each frequency band;
and converting the energy value into a digital signal, performing data processing on the converted energy value in the digital signal domain, and taking the result of the data processing as the extracted Mel frequency cepstrum coefficient MFCC characteristic.
In a second aspect, an embodiment of the present invention provides an MFCC feature extraction apparatus based on a mixed signal domain, where the mixed signal domain includes an analog signal domain and a digital signal domain, and the apparatus includes:
an acquisition unit configured to acquire a preprocessed voice signal in the analog signal domain; performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands;
the operation unit is used for operating the time domain signals in each frequency band according to a preset operation rule;
the filtering unit is used for performing low-pass filtering processing on the operation result and taking the operation result after the low-pass filtering processing as the energy value of the time domain signal in each frequency band;
and the extraction unit is used for converting the energy value into a digital signal, performing data processing on the converted energy value in the digital signal domain, and taking the result of the data processing as the extracted Mel frequency cepstrum coefficient MFCC characteristic.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor, a memory, and a bus, wherein,
the processor and the memory are communicated with each other through the bus;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform a method comprising:
acquiring a preprocessed voice signal in the analog signal domain; performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands;
calculating the time domain signals in each frequency band according to a preset operation rule;
carrying out low-pass filtering processing on the operation result, and taking the operation result after the low-pass filtering processing as the energy value of the time domain signal in each frequency band;
and converting the energy value into a digital signal, performing data processing on the converted energy value in the digital signal domain, and taking the result of the data processing as the extracted Mel frequency cepstrum coefficient MFCC characteristic.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, including:
the non-transitory computer readable storage medium stores computer instructions that cause the computer to perform a method comprising:
acquiring a preprocessed voice signal in the analog signal domain; performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands;
calculating the time domain signals in each frequency band according to a preset operation rule;
carrying out low-pass filtering processing on the operation result, and taking the operation result after the low-pass filtering processing as the energy value of the time domain signal in each frequency band;
and converting the energy value into a digital signal, performing data processing on the converted energy value in the digital signal domain, and taking the result of the data processing as the extracted Mel frequency cepstrum coefficient MFCC characteristic.
According to the mixed signal domain-based MFCC feature extraction method and device provided by the embodiment of the invention, time domain signals of voice signals in different frequency bands are extracted from the analog signal domain, the time domain signals in each frequency band are subjected to operation and low-pass filtering, and the energy value obtained after the low-pass filtering is subjected to data processing in the digital signal domain, so that the MFCC feature can be effectively extracted, and the energy consumed in the extraction process is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of a prior art MFCC feature extraction method;
FIG. 2 is a schematic flow chart of a mixed signal domain-based MFCC feature extraction method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a MFCC feature extraction method according to another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a mixed signal domain-based MFCC feature extraction device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 2 is a schematic flow diagram of a mixed signal domain-based MFCC feature extraction method according to an embodiment of the present invention. As shown in fig. 2, the mixed signal domain includes an analog signal domain and a digital signal domain, and the method includes the following steps:
S201: acquiring a preprocessed voice signal in the analog signal domain; and performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands.
Specifically, the device acquires a preprocessed voice signal in the analog signal domain; and performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands. FIG. 3 is a flowchart of a MFCC feature extraction method according to another embodiment of the present invention; as shown in fig. 3, the preprocessed voice signal may be a voice signal obtained by amplifying an original voice signal through a low noise amplifier.
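As an illustration of how the frequency bands for the Mel frequency analysis in S201 might be laid out, the following minimal Python sketch computes band edges that are equally spaced on the Mel scale. The conversion formula (2595 · log10(1 + f/700)), the 100 Hz to 8000 Hz range and the count of 20 bands are assumptions chosen for illustration; the text above does not specify them, and in the embodiment the band separation itself is done by analog circuitry rather than software.

```python
import math

def hz_to_mel(f_hz):
    # Common Mel-scale approximation (an assumption, not taken from the text).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_low_hz, f_high_hz, n_bands):
    """Return n_bands + 2 edge frequencies in Hz, equally spaced in Mel.

    Band k spans edges[k] .. edges[k + 2] and is centred on edges[k + 1].
    """
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (n_bands + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_bands + 2)]

# Hypothetical configuration: 20 bands between 100 Hz and 8 kHz.
edges = mel_band_edges(100.0, 8000.0, 20)
```

The bands come out narrow at low frequencies and progressively wider at high frequencies, which is the property a bank of analog band-pass filters would need to reproduce.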
S202: and calculating the time domain signals in each frequency band according to a preset operation rule.
Specifically, the device operates on the time domain signal in each frequency band according to a preset operation rule. Referring to fig. 3, the time domain signal in each frequency band may be squared, that is, the following quantity is computed:
$$|x(t)|^2$$
where $x(t)$ is the time domain signal of the voice signal. According to Parseval's theorem:
$$E_i = \int |x_i(t)|^2 \, dt = \frac{1}{2\pi} \int |X_i(\omega)|^2 \, d\omega$$
where $E_i$ is the energy of the i-th frame of the voice signal in each frequency band, $x_i(t)$ is the time domain signal of the i-th frame of the voice signal in each frequency band, and $X_i(\omega)$ is the frequency domain signal of the i-th frame of the voice signal in each frequency band. That is, for a given frequency band, the integral of the squared time domain signal equals $1/(2\pi)$ times the integral of the squared magnitude of the frequency domain signal.
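The energy equivalence that Parseval's theorem provides can be checked numerically. The sketch below uses the discrete counterpart of the theorem, in which the factor 1/N (N being the number of samples) takes the place of 1/(2π); the test signal and the function names are illustrative assumptions only.

```python
import cmath
import math

def dft(x):
    # Direct O(N^2) discrete Fourier transform, sufficient for a small check.
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def time_energy(x):
    # Sum of squared samples: what squaring plus accumulation yields.
    return sum(abs(v) ** 2 for v in x)

def freq_energy(x):
    # Same energy computed from the spectrum (discrete Parseval, factor 1/N).
    n = len(x)
    return sum(abs(c) ** 2 for c in dft(x)) / n

# Illustrative signal: two sinusoids with whole numbers of cycles in 64 samples.
signal = [
    math.sin(2 * math.pi * 5 * t / 64) + 0.5 * math.cos(2 * math.pi * 12 * t / 64)
    for t in range(64)
]
```

Both functions return the same value for this signal, which is why the band energy can be obtained from the time domain signal without computing any Fourier transform.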
S203: and performing low-pass filtering processing on the operation result, and taking the operation result after the low-pass filtering processing as the energy value of the time domain signal in each frequency band.
Specifically, the device performs low-pass filtering on the operation result, and uses the operation result after the low-pass filtering as the energy value of the time domain signal in each frequency band. Referring to fig. 3, a preset analog low-pass filter may be used to perform low-pass filtering on the operation result.
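To illustrate why squaring followed by low-pass filtering gives an energy estimate, the following sketch simulates the two operations in discrete time. It is a software stand-in for the analog squarer and analog low-pass filter, not a description of the circuit: the first-order RC filter model, the 20 Hz cutoff, the 16 kHz simulation rate and the 1 kHz test tone are all assumptions.

```python
import math

def square_and_lowpass(samples, sample_rate_hz, cutoff_hz):
    """Square each sample, then smooth with a first-order low-pass filter.

    The filter is a discretised RC stage: y += alpha * (input - y).
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)
    y = 0.0
    out = []
    for x in samples:
        y += alpha * (x * x - y)  # low-pass filtering of the squared signal
        out.append(y)
    return out

# Unit-amplitude 1 kHz tone, 0.1 s long, simulated at 16 kHz (assumed values).
fs = 16000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1600)]
energy = square_and_lowpass(tone, fs, 20.0)
```

For a sinusoid of amplitude A, the squared signal is A²/2 plus a component at twice the tone frequency; the low-pass filter suppresses the latter, so the output settles near A²/2 (here about 0.5), a slowly varying value that a low-rate converter can sample.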
S204: and converting the energy value into a digital signal, performing data processing on the converted energy value in the digital signal domain, and taking the result of the data processing as the extracted Mel frequency cepstrum coefficient MFCC characteristic.
Specifically, the device converts the energy value into a digital signal, performs data processing on the converted energy value in the digital signal domain, and takes the result of the data processing as the extracted Mel frequency cepstral coefficient (MFCC) feature. Referring to fig. 3, an analog-to-digital converter with an ultra-low sampling rate (lower than a preset sampling rate threshold) may be used to convert the energy value into a digital signal. The converted energy values are then framed, their logarithm is taken, and a Discrete Cosine Transform (DCT) is applied. For each frame of the voice signal, the embodiment of the present invention generates and outputs the energy value in each frequency band in the analog signal domain, and this output changes slowly (for example, at 80 Hz). Furthermore, the framing step of the prior art implementation shown in fig. 1 is performed at the front end of the digital signal domain, whereas in the embodiment of the present invention the front end is in the analog signal domain, where signal values cannot be stored, so overlapping frames cannot be formed there. The embodiment of the invention therefore moves the framing step into the digital signal domain after the analog-to-digital converter. Because the overlap length is half of the frame length, so that the output of the analog-to-digital converter changes at 80 Hz, the output value of one half frame is stored in the digital signal domain and averaged with the output value of the next half frame to obtain the average energy value of the frame.
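The digital-domain processing of S204 can be sketched as follows: consecutive half-frame energy values are averaged to form frames that overlap by half a frame, the logarithm is taken, and a DCT is applied. This is a minimal sketch under stated assumptions. The function names, the unnormalised DCT-II, the natural logarithm, the small floor used to avoid log(0), and the example numbers are not taken from the text.

```python
import math

def frames_from_half_frames(half_frames):
    """half_frames[t][b] is the energy of band b in half frame t.

    Frame i is the average of half frames i and i + 1, so consecutive
    frames overlap by half a frame.
    """
    return [
        [(a + b) / 2.0 for a, b in zip(cur, nxt)]
        for cur, nxt in zip(half_frames, half_frames[1:])
    ]

def dct_ii(values):
    # Unnormalised DCT-II: c[k] = sum_n v[n] * cos(pi * k * (n + 0.5) / N).
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]

def mfcc_from_half_frames(half_frames, n_coeffs, floor=1e-12):
    # Framing, logarithm, then DCT; keep the first n_coeffs coefficients.
    feats = []
    for frame in frames_from_half_frames(half_frames):
        log_energy = [math.log(max(e, floor)) for e in frame]
        feats.append(dct_ii(log_energy)[:n_coeffs])
    return feats

# Three half frames of four band energies each (made-up numbers).
half = [[1.0, 2.0, 4.0, 8.0], [3.0, 2.0, 4.0, 8.0], [5.0, 2.0, 4.0, 8.0]]
features = mfcc_from_half_frames(half, 3)
```

Only one half frame of values per band has to be held in memory at a time, which matches the storage scheme described above.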
The embodiment of the invention does not need the computationally expensive FFT (fast Fourier transform) step shown in fig. 1, and it also exploits the high energy efficiency and high speed of analog circuits to extract and compute the energy distribution of the input voice signal faster and more efficiently. The prior art method connects a 16-bit, 16 kHz analog-to-digital converter directly after the sensor; for a voice frame 25 ms long, each frame then contains 400 16-bit sampling points (16 kHz × 25 ms), which greatly increases the cost of the FFT and squaring operations and also introduces higher ADC energy consumption. In the embodiment of the invention, each frame at the analog-to-digital conversion stage has only 40 16-bit sampling points, so the energy consumption of the analog-to-digital converter is greatly reduced, the speed of this stage is improved, and the operation cost of the logarithm, multiplication and DCT stage is also reduced.
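The reduction in converter workload can be made concrete with a short calculation. The sketch below compares raw converter bit rates and counts the band energy values per frame. The count of 20 frequency bands is an assumption (the text does not state it); it is chosen because 20 bands × 2 half-frame values per 25 ms frame gives the 40 values per frame mentioned above.

```python
def raw_adc_bit_rate(sample_rate_hz, bits_per_sample):
    # Bits per second produced by a converter at the given rate and resolution.
    return sample_rate_hz * bits_per_sample

def band_energy_values_per_frame(n_bands, output_rate_hz, frame_ms):
    # Number of band energy values digitised during one frame.
    return round(n_bands * output_rate_hz * frame_ms / 1000.0)

N_BANDS = 20  # assumption: not stated in the text

# Conventional front end: one 16-bit converter running at 16 kHz.
conventional = raw_adc_bit_rate(16000, 16)

# Mixed-signal front end: each band energy is digitised at 80 Hz, 16 bits.
mixed = raw_adc_bit_rate(N_BANDS * 80, 16)

per_frame = band_energy_values_per_frame(N_BANDS, 80, 25)
```

Under these assumptions the converter handles 25,600 bit/s instead of 256,000 bit/s, a tenfold reduction in the amount of data to digitise and process.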
In the embodiment of the invention, the analog signal domain processing circuit was simulated on the Cadence platform using a 180 nm CMOS process. To evaluate the MFCC features extracted by the embodiment, automatic speech recognition accuracy was tested on the TensorFlow platform using the TIDIGITS speech data set and an LSTM neural network. The test results are shown in Table 1:
TABLE 1
Referring to Table 1 above, the embodiment of the present invention shows a very significant reduction in energy consumption compared with the prior art. Compared with an FPGA (field programmable gate array), the energy consumed per frame of MFCC feature extraction is reduced by 97.2%, and compared with an ASIC (application specific integrated circuit) it is reduced by 95.1%. The embodiment therefore saves a substantial amount of energy. The embodiment also has an advantage in MFCC feature extraction speed: the MFCC extraction time of the FPGA, DSP and ASIC implementations is several times or even tens of times that of the embodiment. The GPU trades very high energy consumption for faster speed, but when energy consumption and extraction speed are considered together, the GPU has no advantage in low-power application scenarios. Because the data dimension is reduced in the analog signal domain front end, the requirements on the analog-to-digital conversion stage are greatly reduced, which is reflected in its sampling rate.
In summary, the embodiment of the present invention greatly reduces the energy and time consumed in the extraction process and eliminates the FFT, which accounts for a large share of the operation cost in the existing method. Compared with the prior art, the method saves at least 95.1% of the energy consumption and improves the operation speed by more than 6.4 times. Simulation results also show that the accuracy obtained with the extracted MFCC features is as high as 99%. Compared with the prior art MFCC feature extraction methods, the method and device of the embodiment have clear advantages in low-power application scenarios.
According to the mixed signal domain-based MFCC feature extraction method provided by the embodiment of the invention, time domain signals of voice signals in different frequency bands are extracted in the analog signal domain, the time domain signals in each frequency band are subjected to operation and low-pass filtering, and the energy value obtained after the low-pass filtering is subjected to data processing in the digital signal domain, so that the MFCC feature can be effectively extracted, and the energy consumed in the extraction process is reduced.
On the basis of the above embodiment, the calculating the time domain signal in each frequency band according to the preset calculation rule includes:
and performing square operation on the time domain signals in each frequency band.
Specifically, the device performs a squaring operation on the time domain signals in each frequency band. Reference may be made to the above embodiments, which are not described in detail.
According to the MFCC feature extraction method based on the mixed signal domain, provided by the embodiment of the invention, the time domain signals in each frequency band are subjected to square operation, so that the operation result is more reasonable, and the normal operation of the method is ensured.
On the basis of the foregoing embodiment, the low-pass filtering processing on the operation result includes:
and carrying out low-pass filtering processing on the operation result by adopting a preset analog low-pass filter.
Specifically, the device performs low-pass filtering processing on the operation result by using a preset analog low-pass filter. Reference may be made to the above embodiments, which are not described in detail.
According to the MFCC feature extraction method based on the mixed signal domain provided by the embodiment of the invention, the operation result can be effectively low-pass filtered by using the preset analog low-pass filter.
On the basis of the above embodiment, the data processing of the converted energy value in the digital domain includes:
and framing, logarithm processing and Discrete Cosine Transform (DCT) are carried out on the converted energy value.
Specifically, the device performs framing, logarithm processing and Discrete Cosine Transform (DCT) on the converted energy value. Reference may be made to the above embodiments, which are not described in detail.
The MFCC feature extraction method based on the mixed signal domain provided by the embodiment of the invention can effectively extract the MFCC feature by performing framing, logarithm taking processing and Discrete Cosine Transform (DCT) on the converted energy value.
On the basis of the above embodiment, before the step of performing operation on the time domain signal in each frequency band according to the preset operation rule, the method further includes:
and acquiring the frequency characteristics of the voice signals.
Specifically, the device acquires a frequency characteristic of the voice signal. For example, a male voice is concentrated in a lower frequency region than a female voice, so the frequency characteristic can be used to determine whether the voice is male or female.
And according to the frequency characteristics, determining the frequency distribution range in which the frequency characteristics are positioned, and closing the frequency bands which are not in the frequency distribution range.
Specifically, the device determines, according to the frequency characteristic, the frequency distribution range in which the frequency characteristic is located, and closes the frequency bands that are not within the frequency distribution range. Referring to fig. 3, a frequency band switching device (corresponding to the user switching device in fig. 3) may be provided in advance after the low noise amplifier in fig. 3. Following the above example, if a male voice is identified, the frequency distribution range of the male voice is determined, and the frequency band switching device is adjusted to close the paths of the frequency bands that are not within that range, which ensures that the characteristics of the low frequency part are not affected.
The MFCC feature extraction method based on the mixed signal domain provided by the embodiment of the invention thus avoids sampling information in useless frequency bands, which improves the analysis speed and reduces the energy consumption.
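The band selection logic can be sketched in software as a mask over the frequency bands. In the embodiment the paths are closed by a hardware switching device, so the code below is only a behavioural illustration; the band ranges, the 80 Hz to 1200 Hz voice range and the function names are hypothetical.

```python
def enabled_bands(band_ranges_hz, voice_range_hz):
    """Return True for each band that overlaps the voice frequency range.

    band_ranges_hz: list of (low, high) tuples, one per band.
    voice_range_hz: (low, high) frequency distribution range of the voice.
    """
    lo, hi = voice_range_hz
    return [band_hi > lo and band_lo < hi for band_lo, band_hi in band_ranges_hz]

def gate(band_energies, enabled):
    # A closed path contributes nothing; modelled here as a zero energy value.
    return [e if on else 0.0 for e, on in zip(band_energies, enabled)]

# Hypothetical five-band layout and a hypothetical low-frequency voice range.
bands = [(100, 300), (300, 700), (700, 1500), (1500, 3000), (3000, 6000)]
mask = enabled_bands(bands, (80, 1200))
gated = gate([1.0, 2.0, 3.0, 4.0, 5.0], mask)
```

With these example values the three lowest bands stay open and the two highest are closed, so no conversion or digital processing is spent on them.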
On the basis of the above embodiment, the turning off the frequency bands not within the frequency distribution range includes:
and closing the path of the frequency band which is not in the frequency distribution range through a preset frequency band switching device.
Specifically, the device closes the path of the frequency band not within the frequency distribution range by a preset frequency band switching device. Reference may be made to the above embodiments, which are not described in detail.
According to the MFCC feature extraction method based on the mixed signal domain, the preset frequency band switch device is used for closing the access of the frequency band which is not in the frequency distribution range, information sampling in the useless frequency band is further effectively avoided, the analysis speed is improved, and the energy consumption is reduced.
Fig. 4 is a schematic structural diagram of a mixed signal domain-based MFCC feature extraction device according to an embodiment of the present invention, and as shown in fig. 4, an embodiment of the present invention provides a mixed signal domain-based MFCC feature extraction device, where the mixed signal domain includes an analog signal domain and a digital signal domain, the device includes an obtaining unit 401, an arithmetic unit 402, a filtering unit 403, and an extracting unit 404, where:
the obtaining unit 401 is configured to obtain a preprocessed voice signal in the analog signal domain; performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands; the operation unit 402 is configured to perform operation on the time domain signals in each frequency band according to a preset operation rule; the filtering unit 403 is configured to perform low-pass filtering on the operation result, and use the operation result after the low-pass filtering as an energy value of the time domain signal in each frequency band; the extracting unit 404 is configured to convert the energy value into a digital signal, perform data processing on the converted energy value in the digital signal domain, and use a result of the data processing as the extracted mel-frequency cepstrum coefficient MFCC characteristic.
Specifically, the obtaining unit 401 is configured to obtain a preprocessed voice signal in the analog signal domain; performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands; the operation unit 402 is configured to perform operation on the time domain signals in each frequency band according to a preset operation rule; the filtering unit 403 is configured to perform low-pass filtering on the operation result, and use the operation result after the low-pass filtering as an energy value of the time domain signal in each frequency band; the extracting unit 404 is configured to convert the energy value into a digital signal, perform data processing on the converted energy value in the digital signal domain, and use a result of the data processing as the extracted mel-frequency cepstrum coefficient MFCC characteristic.
According to the MFCC feature extraction device based on the mixed signal domain, provided by the embodiment of the invention, time domain signals of voice signals in different frequency bands are extracted in the analog signal domain, the time domain signals in each frequency band are subjected to operation and low-pass filtering, and the energy value obtained after the low-pass filtering is subjected to data processing in the digital signal domain, so that the MFCC feature can be effectively extracted, and the energy consumed in the extraction process is reduced.
The MFCC feature extraction apparatus based on a mixed signal domain provided in the embodiments of the present invention may be specifically configured to execute the processing flows of the above method embodiments, and the functions thereof are not described herein again, and refer to the detailed description of the above method embodiments.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 5, the electronic device includes: a processor 501, a memory 502, and a bus 503;
the processor 501 and the memory 502 complete communication with each other through a bus 503;
the processor 501 is configured to call program instructions in the memory 502 to perform the methods provided by the above-mentioned method embodiments, for example, including: acquiring a preprocessed voice signal in the analog signal domain; performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands; calculating the time domain signals in each frequency band according to a preset operation rule; carrying out low-pass filtering processing on the operation result, and taking the operation result after the low-pass filtering processing as the energy value of the time domain signal in each frequency band; and converting the energy value into a digital signal, performing data processing on the converted energy value in the digital signal domain, and taking the result of the data processing as the extracted Mel frequency cepstrum coefficient MFCC characteristic.
The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method provided by the above-mentioned method embodiments, for example, comprising: acquiring a preprocessed voice signal in the analog signal domain; performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands; calculating the time domain signals in each frequency band according to a preset operation rule; carrying out low-pass filtering processing on the operation result, and taking the operation result after the low-pass filtering processing as the energy value of the time domain signal in each frequency band; and converting the energy value into a digital signal, performing data processing on the converted energy value in the digital signal domain, and taking the result of the data processing as the extracted Mel frequency cepstrum coefficient MFCC characteristic.
The present embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example: acquiring a preprocessed voice signal in the analog signal domain; performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands; operating on the time domain signal in each frequency band according to a preset operation rule; low-pass filtering the operation result, and taking the filtered result as the energy value of the time domain signal in each frequency band; and converting the energy values into digital signals, processing the converted energy values in the digital signal domain, and taking the result of that processing as the extracted Mel frequency cepstral coefficient (MFCC) features.
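As an illustration only, the signal chain described above can be simulated numerically. The following is a minimal sketch, not the patented circuit: the analog stages are emulated in software, with brick-wall FFT filters standing in for the analog Mel band-pass filters and a first-order recursive filter standing in for the analog low-pass filter. Squaring is used as the preset operation. The function names, the band count, the 50 Hz cutoff, the ADC rate, the frame length, and framing by averaging are all assumptions chosen for the example.

```python
import numpy as np

def mel_band_edges(n_bands, f_lo, f_hi):
    """Return n_bands + 1 edge frequencies (Hz), equally spaced on the Mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mels = np.linspace(mel(f_lo), mel(f_hi), n_bands + 1)
    return 700.0 * (10.0 ** (mels / 2595.0) - 1.0)

def bandpass(x, fs, lo, hi):
    """Brick-wall band-pass by FFT masking (stand-in for an analog filter)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def one_pole_lowpass(x, fs, cutoff):
    """First-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff / fs)
    y = np.empty(len(x))
    acc = 0.0
    for i, v in enumerate(x):
        acc += a * (v - acc)
        y[i] = acc
    return y

def band_energies(x, fs, n_bands=8, lp_cutoff=50.0, adc_rate=1000, frame_len=20):
    """Emulated analog part plus ADC: per-band energy, one row per frame."""
    edges = mel_band_edges(n_bands, 100.0, 0.45 * fs)
    step = int(fs // adc_rate)                 # the "ADC" samples the slow envelope
    cols = []
    for b in range(n_bands):
        band = bandpass(x, fs, edges[b], edges[b + 1])
        smooth = one_pole_lowpass(band ** 2, fs, lp_cutoff)   # square, then low-pass
        samples = smooth[::step]
        n_frames = len(samples) // frame_len
        frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
        cols.append(frames.mean(axis=1))       # framing by averaging (assumption)
    return np.stack(cols, axis=1)

def mfcc_from_energies(energies, n_coeffs=6):
    """Digital part: logarithm followed by an unnormalised DCT-II across bands."""
    log_e = np.log(energies + 1e-12)           # small offset avoids log(0)
    n = log_e.shape[1]
    k = np.arange(n)
    basis = np.cos(np.pi * k[:, None] * (2 * k[None, :] + 1) / (2 * n))
    return (log_e @ basis.T)[:, :n_coeffs]

# Example: a 1 kHz tone, 0.5 s long, sampled at 8 kHz
fs = 8000
t = np.arange(4000) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
energies = band_energies(tone, fs)
features = mfcc_from_energies(energies)
```

In this sketch only the slowly varying band energies cross the emulated analog-to-digital boundary, so the converter runs at `adc_rate` rather than at the audio sample rate.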
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The above-described embodiments of the electronic device and the like are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may also be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the embodiments of the present invention and do not limit them. Although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for MFCC feature extraction based on a mixed signal domain, wherein the mixed signal domain comprises an analog signal domain and a digital signal domain, the method comprising:
acquiring a preprocessed voice signal in the analog signal domain; performing Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands;
operating on the time domain signal in each frequency band according to a preset operation rule;
low-pass filtering the operation result, and taking the filtered result as the energy value of the time domain signal in each frequency band;
and converting the energy values into digital signals, processing the converted energy values in the digital signal domain, and taking the result of that processing as the extracted Mel frequency cepstral coefficient (MFCC) features.
2. The method of claim 1, wherein operating on the time domain signals in each frequency band according to a preset operation rule comprises:
squaring the time domain signal in each frequency band.
3. The method according to claim 1, wherein low-pass filtering the operation result comprises:
low-pass filtering the operation result with a preset analog low-pass filter.
4. The method of claim 1, wherein processing the converted energy values in the digital signal domain comprises:
performing framing, logarithm processing, and a discrete cosine transform (DCT) on the converted energy values.
5. The method according to any one of claims 1 to 4, wherein before the step of operating on the time domain signals in each frequency band according to the preset operation rule, the method further comprises:
acquiring the frequency characteristics of the voice signal;
and determining, according to the frequency characteristics, the frequency distribution range in which the frequency characteristics lie, and turning off the frequency bands that are not within that frequency distribution range.
6. The method of claim 5, wherein turning off the frequency bands that are not within the frequency distribution range comprises:
closing the path of each frequency band that is not within the frequency distribution range by means of a preset frequency band switching device.
7. An MFCC feature extraction apparatus based on a mixed signal domain, wherein the mixed signal domain comprises an analog signal domain and a digital signal domain, the apparatus comprising:
an acquisition unit configured to acquire a preprocessed voice signal in the analog signal domain and to perform Mel frequency analysis on the voice signal to extract time domain signals of the voice signal in different frequency bands;
an operation unit configured to operate on the time domain signal in each frequency band according to a preset operation rule;
a filtering unit configured to low-pass filter the operation result and to take the filtered result as the energy value of the time domain signal in each frequency band;
and an extraction unit configured to convert the energy values into digital signals, process the converted energy values in the digital signal domain, and take the result of that processing as the extracted Mel frequency cepstral coefficient (MFCC) features.
8. An electronic device, comprising: a processor, a memory, and a bus, wherein,
the processor and the memory communicate with each other through the bus;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1 to 6.
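
The band-closing step of claims 5 and 6 can be pictured with a short sketch. The claims do not say how the frequency characteristics are obtained or how the switching device is built, so everything below is an assumption for illustration: the occupied range is estimated from the share of spectral power, the switching device is modelled as a boolean enable mask, and the names `occupied_range`, `band_enable_mask`, and the `keep` parameter are hypothetical.

```python
import numpy as np

def occupied_range(x, fs, keep=0.98):
    """Frequency span (Hz) holding the central `keep` share of spectral power."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    cdf = np.cumsum(power) / np.sum(power)
    tail = (1.0 - keep) / 2.0
    lo = freqs[np.searchsorted(cdf, tail)]
    hi = freqs[np.searchsorted(cdf, 1.0 - tail)]
    return lo, hi

def band_enable_mask(edges, f_lo, f_hi):
    """True for each band [edges[i], edges[i+1]) that overlaps [f_lo, f_hi]."""
    edges = np.asarray(edges, dtype=float)
    return (edges[:-1] <= f_hi) & (edges[1:] > f_lo)

# Example: a signal with components at 300 Hz and 600 Hz
fs = 8000
t = np.arange(4000) / fs
x = np.sin(2 * np.pi * 300.0 * t) + np.sin(2 * np.pi * 600.0 * t)
lo, hi = occupied_range(x, fs)
edges = [100, 250, 500, 1000, 2000, 4000]   # hypothetical band edges in Hz
mask = band_enable_mask(edges, lo, hi)       # bands with mask False would be switched off
```

Only the bands flagged `True` would then need to be squared, filtered, and converted; in this example those are the 250 to 500 Hz and 500 to 1000 Hz bands.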

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810615611.5A CN110610696B (en) 2018-06-14 2018-06-14 MFCC feature extraction method and device based on mixed signal domain

Publications (2)

Publication Number Publication Date
CN110610696A (en) 2019-12-24
CN110610696B (en) 2021-11-09

Family

ID=68888018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810615611.5A Active CN110610696B (en) 2018-06-14 2018-06-14 MFCC feature extraction method and device based on mixed signal domain

Country Status (1)

Country Link
CN (1) CN110610696B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667838A (en) * 2020-06-22 2020-09-15 清华大学 Low-power-consumption analog domain feature vector extraction method for voiceprint recognition
CN112634937A (en) * 2020-12-02 2021-04-09 爱荔枝科技(北京)有限公司 Sound classification method without digital feature extraction calculation
CN112951268A (en) * 2021-02-26 2021-06-11 北京百度网讯科技有限公司 Audio recognition method, apparatus and storage medium
CN112992123A (en) * 2021-03-05 2021-06-18 西安交通大学 Voice feature extraction circuit and method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101030369A (en) * 2007-03-30 2007-09-05 清华大学 Built-in speech discriminating method based on sub-word hidden Markov model
US8131543B1 (en) * 2008-04-14 2012-03-06 Google Inc. Speech detection
CN103390403A (en) * 2013-06-19 2013-11-13 北京百度网讯科技有限公司 Extraction method and device for mel frequency cepstrum coefficient (MFCC) characteristics
US20150066495A1 (en) * 2013-08-28 2015-03-05 Texas Instruments Incorporated Robust Feature Extraction Using Differential Zero-Crossing Countes
CN105118501A (en) * 2015-09-07 2015-12-02 徐洋 Speech recognition method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIHYUCK JO ET AL: "Energy-Efficient Floating-Point MFCC Extraction Architecture for Speech Recognition Systems", 《IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667838A (en) * 2020-06-22 2020-09-15 清华大学 Low-power-consumption analog domain feature vector extraction method for voiceprint recognition
CN111667838B (en) * 2020-06-22 2022-10-14 清华大学 Low-power-consumption analog domain feature vector extraction method for voiceprint recognition
CN112634937A (en) * 2020-12-02 2021-04-09 爱荔枝科技(北京)有限公司 Sound classification method without digital feature extraction calculation
CN112951268A (en) * 2021-02-26 2021-06-11 北京百度网讯科技有限公司 Audio recognition method, apparatus and storage medium
CN112951268B (en) * 2021-02-26 2023-01-10 北京百度网讯科技有限公司 Audio recognition method, apparatus and storage medium
CN112992123A (en) * 2021-03-05 2021-06-18 西安交通大学 Voice feature extraction circuit and method

Also Published As

Publication number Publication date
CN110610696B (en) 2021-11-09

Similar Documents

Publication Publication Date Title
CN110610696B (en) MFCC feature extraction method and device based on mixed signal domain
CN110634497B (en) Noise reduction method and device, terminal equipment and storage medium
CN106486131B (en) A kind of method and device of speech de-noising
US10178228B2 (en) Method and apparatus for classifying telephone dialing test audio based on artificial intelligence
WO2021139327A1 (en) Audio signal processing method, model training method, and related apparatus
WO2018223727A1 (en) Voiceprint recognition method, apparatus and device, and medium
CN109360572B (en) Call separation method and device, computer equipment and storage medium
US20200365173A1 (en) Method for constructing voice detection model and voice endpoint detection system
CN107221343B (en) Data quality evaluation method and evaluation system
CN110880329A (en) Audio identification method and equipment and storage medium
CN102945673A (en) Continuous speech recognition method with speech command range changed dynamically
CN111540342B (en) Energy threshold adjusting method, device, equipment and medium
CN113870885B (en) Bluetooth audio squeal detection and suppression method, device, medium, and apparatus
CN109065043A (en) A kind of order word recognition method and computer storage medium
WO2022218254A1 (en) Voice signal enhancement method and apparatus, and electronic device
US20230267947A1 (en) Noise reduction using machine learning
WO2024041512A1 (en) Audio noise reduction method and apparatus, and electronic device and readable storage medium
CN116052706B (en) Low-complexity voice enhancement method based on neural network
CN111326159A (en) Voice recognition method, device and system
CN111489739A (en) Phoneme recognition method and device and computer readable storage medium
CN106340310A (en) Speech detection method and device
CN112289311A (en) Voice wake-up method and device, electronic equipment and storage medium
CN115273880A (en) Voice noise reduction method, model training method, device, equipment, medium and product
CN105513587B (en) MFCC extraction method and device
CN113889098A (en) Command word recognition method and device, mobile terminal and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant