CN117636901A - Low-power consumption MFCC feature extraction method and device

Info

Publication number: CN117636901A
Application number: CN202311755750.5A
Authority: CN
Inventors: 何伊妮; 曹国忠; 张涌
Original assignee: Xiamen Semiconductor Industry Technology Research And Development Co ltd
Current assignee: Xiamen Semiconductor Industry Technology Research And Development Co ltd
Priority date: 2023-12-19
Filing date: 2023-12-19
Publication date: 2024-03-01

Abstract

The invention discloses a low-power MFCC feature extraction method and device. The method comprises: acquiring a voice signal; performing pre-emphasis, framing, windowing, fast Fourier transform, mel filtering, logarithmic operation and discrete cosine transform on the voice signal using different processing modules to obtain the MFCC features corresponding to the voice signal; and acquiring a first-order amplitude in the MFCC features and generating a corresponding initial bit width from it, so that the bit widths of the different processing modules are adjusted according to the initial bit width. By quantizing the bit width from the first-order amplitude of the MFCC features and feeding it back to the preceding stages, the input bit width of each stage of operation is precisely quantized, so the data bit width is truncated while recognition accuracy is preserved and the operation power consumption is reduced.

Description

Low-power consumption MFCC feature extraction method and device

Technical Field

The present invention relates to the field of speech processing technologies, and in particular, to a low power MFCC feature extraction method, a low power MFCC feature extraction device, and an electronic device.

Background

In the related art, as shown in FIG. 1, existing extraction of mel-frequency cepstral coefficient (MFCC, Mel Frequency Cepstrum Coefficient) features from speech works as follows: the time-domain voice signal collected by a microphone is converted into a digital signal by an ADC, and the feature values of the voice signal are then obtained through pre-emphasis, framing, windowing, fast Fourier transform, mel filtering, logarithmic operation and discrete cosine transform in turn. The pre-emphasis, framing and windowing modules enhance the suppressed high-frequency components and enable short-time analysis; the fast Fourier transform (FFT, Fast Fourier Transform) converts the time-domain voice signal into the frequency domain; mel band-pass filtering simplifies and compresses the frequency-domain signal so that it can be analyzed in each frequency band; and finally the logarithmic operation and discrete cosine transform convert the voice energy distribution of each frequency band into the MFCC features. Because existing MFCC feature extraction requires a large-scale circuit running at high frequency and computes directly at the maximum bit width, the power consumption of the feature extraction itself and of the post-processing modules is high, and bit-width redundancy arises easily, which affects the overall area and power consumption of the circuit and system.

Disclosure of Invention

The present invention aims to solve, at least to some extent, one of the technical problems in the above technology. To this end, an object of the present invention is to provide a low-power MFCC feature extraction method that obtains the first-order amplitude in the MFCC features, quantizes a bit width from it, and feeds that bit width back to the preceding stages to precisely quantize the input bit width of each stage of operation, so that the data bit width is truncated while recognition accuracy is preserved, thereby reducing operation power consumption.

In order to achieve the above objective, a low-power MFCC feature extraction method according to an embodiment of the first aspect of the present invention includes the following steps: acquiring a voice signal; performing pre-emphasis, framing, windowing, fast Fourier transform, mel filtering, logarithmic operation and discrete cosine transform on the voice signal using different processing modules to obtain the MFCC features corresponding to the voice signal; and acquiring a first-order amplitude in the MFCC features and generating a corresponding initial bit width according to the first-order amplitude, so as to adjust the bit widths corresponding to the different processing modules according to the initial bit width.

According to the low-power MFCC feature extraction method of the embodiment of the present invention, a voice signal is first acquired; pre-emphasis, framing, windowing, fast Fourier transform, mel filtering, logarithmic operation and discrete cosine transform are then performed on the voice signal using different processing modules to obtain the MFCC features corresponding to the voice signal; finally, a first-order amplitude in the MFCC features is acquired and a corresponding initial bit width is generated from it, so that the bit widths corresponding to the different processing modules are adjusted according to the initial bit width. The bit width quantized from the first-order amplitude is thus fed back to the preceding stages to precisely quantize the input bit width of each stage of operation, so that the data bit width is truncated while recognition accuracy is preserved and the operation power consumption is reduced.

In addition, the low-power consumption MFCC feature extraction method according to the embodiment of the present invention may further have the following additional technical features:

Optionally, the method further comprises: determining, according to the first-order amplitude, whether to output a control signal that turns on the gating circuit, so as to extend the standby time of the device.

Optionally, generating a corresponding initial bit width according to the first-order amplitude includes: judging whether the first-order amplitude is less than or equal to a first preset threshold, and if so, setting the initial bit width to a fifth preset threshold; if not, judging whether the first-order amplitude is less than or equal to a second preset threshold, and if so, setting the initial bit width to a sixth preset threshold; if not, judging whether the first-order amplitude is less than or equal to a third preset threshold, and if so, setting the initial bit width to a seventh preset threshold; if not, judging whether the first-order amplitude is less than or equal to a fourth preset threshold, and if so, setting the initial bit width to an eighth preset threshold.

Further, after the first-order amplitude value in the MFCC feature is obtained, it is further determined whether the first-order amplitude value is greater than a feature data threshold value, and if so, the count value of the counter is cleared.

Further, the first preset threshold is smaller than the second preset threshold, the second preset threshold is smaller than the third preset threshold, and the third preset threshold is smaller than the fourth preset threshold; the fifth preset threshold is smaller than the sixth preset threshold, the sixth preset threshold is smaller than the seventh preset threshold, and the seventh preset threshold is smaller than the eighth preset threshold.

Optionally, adjusting the bit widths corresponding to the different processing modules according to the initial bit width includes: taking the initial bit width as the input bit width of the discrete cosine transform and feeding it back to the preceding stages, so as to adaptively adjust the bit widths corresponding to the different processing modules.

Optionally, determining, according to the first-order amplitude, whether to output a control signal that turns on the gating circuit includes: judging whether the first-order amplitude is less than or equal to the feature data threshold; if so, incrementing the count value of the counter by one and judging whether the incremented count value is greater than or equal to a ninth preset threshold; and if so, judging that the voice is invalid, outputting the control signal to turn on the gating circuit, and clearing the count value of the counter.

In order to achieve the above object, a second aspect of the present invention provides a low-power MFCC feature extraction device, including: an acquisition module, configured to acquire a voice signal; a feature extraction module, configured to perform pre-emphasis, framing, windowing, fast Fourier transform, mel filtering, logarithmic operation and discrete cosine transform on the voice signal using different processing modules to obtain the MFCC features corresponding to the voice signal; and a control module, configured to acquire a first-order amplitude in the MFCC features and generate a corresponding initial bit width according to the first-order amplitude, so as to adjust the bit widths corresponding to the different processing modules according to the initial bit width.

According to the low-power MFCC feature extraction device of the embodiment of the present invention, the acquisition module acquires a voice signal; the feature extraction module performs pre-emphasis, framing, windowing, fast Fourier transform, mel filtering, logarithmic operation and discrete cosine transform on the voice signal using different processing modules to obtain the MFCC features corresponding to the voice signal; and the control module acquires a first-order amplitude in the MFCC features and generates a corresponding initial bit width from it, so that the bit widths corresponding to the different processing modules are adjusted according to the initial bit width. The bit width quantized from the first-order amplitude is thus fed back to the preceding stages to precisely quantize the input bit width of each stage of operation, so that the data bit width is truncated while recognition accuracy is preserved and the operation power consumption is reduced.

In addition, the low-power MFCC feature extraction device according to the embodiment of the present invention may further have the following additional technical features:

Optionally, the control module is further configured to determine, according to the first-order amplitude, whether to output a control signal that turns on the gating circuit, so as to extend the standby time of the device.

To achieve the above object, an embodiment of a third aspect of the present invention provides an electronic device, including a processor, a memory, and a bus, where the processor and the memory communicate with each other through the bus; the memory stores program instructions executable by the processor, and the processor invokes these instructions to perform the low-power MFCC feature extraction method described above.

According to the electronic device of the embodiment of the present invention, the low-power MFCC feature extraction method allows the data bit width to be truncated while recognition accuracy is preserved, so that the operation power consumption is reduced.

Drawings

FIG. 1 is a block diagram of a prior art low power MFCC feature extraction method;

FIG. 2 is a flow chart of a low power MFCC feature extraction method according to one embodiment of the present invention;

FIG. 3 is a block diagram of a low power MFCC feature extraction method according to one embodiment of the present invention;

FIG. 4 is a schematic circuit architecture diagram of low power MFCC feature extraction according to one embodiment of the present invention;

FIG. 5 is a flow chart of a low power MFCC feature extraction method according to one embodiment of the present invention;

FIG. 6 is a block schematic diagram of a low power MFCC feature extraction apparatus in accordance with one embodiment of the invention.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.

In the related art, to keep voice interaction always on, a large number of small voice-sensing nodes must perform uninterrupted voice recognition so that they can be woken up and interact. The power consumption of the whole voice recognition system is therefore critical, because it directly determines the standby time of the voice recognition device. A voice wake-up system should accordingly emphasize low power consumption and reduce the power consumed by voice recognition as far as possible while preserving recognition accuracy. In hardware design, multiplication and addition increase the data bit width, so the bit width accumulates with every stage of operation in the MFCC circuit and is eventually amplified many times over; this causes bit-width redundancy and affects the overall area and power consumption of the circuit and system.

Therefore, the invention provides a low-power MFCC feature extraction method in which pre-emphasis, framing, windowing, fast Fourier transform, mel filtering, logarithmic operation and discrete cosine transform are performed on an acquired voice signal using different processing modules to obtain the MFCC features corresponding to the voice signal; a first-order amplitude in the MFCC features is then acquired and a corresponding initial bit width is generated from it, so that the bit widths corresponding to the different processing modules are adjusted according to the initial bit width. The bit width quantized from the first-order amplitude is thus fed back to the preceding stages to precisely quantize the input bit width of each stage of operation, so that the data bit width is truncated while recognition accuracy is preserved and the operation power consumption is reduced.

In order that the above-described aspects may be better understood, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.

The low power MFCC feature extraction method according to the embodiment of the present invention is described below with reference to the accompanying drawings.

Referring to fig. 2, the low-power MFCC feature extraction method provided by the embodiment of the invention includes the following steps:

S101, acquiring a voice signal.

It should be noted that the voice signal may be obtained by converting the time-domain voice signal collected by a microphone into a digital signal through an analog-to-digital converter.

S102, performing pre-emphasis, framing, windowing, fast Fourier transformation, mel filtering, logarithmic operation and discrete cosine transformation processing on the voice signal by adopting different processing modules so as to obtain the corresponding MFCC characteristics of the voice signal.

It should be noted that, as shown in FIG. 3, the different processing modules include a pre-emphasis module, a framing module, a windowing module, a fast Fourier transform module, a mel filtering module, a logarithmic operation module and a discrete cosine transform module, connected in sequence in that order. The voice signal passes through these modules in turn, undergoing pre-emphasis, framing, windowing, fast Fourier transform, mel filtering, logarithmic operation and discrete cosine transform, to obtain the MFCC features corresponding to the voice signal.

As an embodiment, as shown in FIG. 4, the pre-emphasis module is equivalent to a high-pass filter. To compensate for the attenuation of the high-frequency part of the voice signal, the voice signal is first passed through this high-pass filter for pre-emphasis; the pre-emphasis coefficient is set to 15/16, a value that has little influence on the result and reduces the amount of computation. Because the voice signal is approximately stationary over a short period of time, the pre-emphasized voice signal is divided into frames by the framing module, and the windowing module smooths the boundaries of the segmented continuous signal by applying a Hamming window function to obtain a time-sequence signal. The fast Fourier transform module converts this time-sequence signal into a frequency-domain signal; the mel filtering module then converts the frequency-domain signal to the mel frequency scale, which matches human auditory perception; the logarithmic operation then enhances the features of the low-frequency part and weakens the high-frequency features; and finally the discrete cosine transform extracts the signal frequency features to obtain the MFCC features.
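
One plausible reason a coefficient of 15/16 keeps the computation light is that multiplying by 15/16 can be done with a 4-bit right shift and a subtraction, without a multiplier. The following Python sketch illustrates this under stated assumptions: integer samples, a previous sample of 0 before the first input, and a hypothetical function name `pre_emphasis` that does not come from the source.

```python
def pre_emphasis(samples):
    """Apply y[n] = x[n] - (15/16) * x[n-1] using only shift and subtract.

    Assumptions (not from the source): integer samples, and the sample
    preceding the first input is taken to be 0.
    """
    out = []
    prev = 0
    for x in samples:
        # (15/16) * prev is approximated as prev - (prev >> 4),
        # so no multiplier is needed.
        out.append(x - (prev - (prev >> 4)))
        prev = x
    return out


print(pre_emphasis([16, 32, 48]))
```

Because the shift discards low-order bits, the result is an integer approximation of the exact 15/16 product rather than a bit-exact equivalent.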

S103, acquiring a first-order amplitude value in the MFCC feature, and generating a corresponding initial bit width according to the first-order amplitude value so as to adjust bit widths corresponding to different processing modules according to the initial bit width.

That is, as shown in FIG. 3, based on the mel-cepstral coefficients output by the discrete cosine transform (DCT, Discrete Cosine Transform) operation, the control module calculates their maximum amplitude energy to determine the maximum bit width required, and the bit width of the DCT output is used as the target value for adjusting the bit widths set in the preceding steps, which reduces the amount of data toggling and effectively reduces the overall power consumption.

In addition, since the amplitude of the first order (first frame) in the MFCC features output by the DCT is the largest, the required bit width can be obtained from the first-order amplitude.

As one embodiment, generating a corresponding initial bit width according to the first-order amplitude includes: judging whether the first-order amplitude is less than or equal to a first preset threshold, and if so, setting the initial bit width to a fifth preset threshold; if not, judging whether the first-order amplitude is less than or equal to a second preset threshold, and if so, setting the initial bit width to a sixth preset threshold; if not, judging whether the first-order amplitude is less than or equal to a third preset threshold, and if so, setting the initial bit width to a seventh preset threshold; if not, judging whether the first-order amplitude is less than or equal to a fourth preset threshold, and if so, setting the initial bit width to an eighth preset threshold.

It should be noted that, the first preset threshold value is smaller than the second preset threshold value, the second preset threshold value is smaller than the third preset threshold value, and the third preset threshold value is smaller than the fourth preset threshold value; the fifth preset threshold is smaller than the sixth preset threshold, the sixth preset threshold is smaller than the seventh preset threshold, and the seventh preset threshold is smaller than the eighth preset threshold.

That is, a larger amplitude takes more bits to represent, and a value with more bits requires a larger bit width for transmission.
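
The threshold ladder described above can be sketched as a small lookup. The Python example below is illustrative only: the threshold values 32/64/128/256 and the bit widths 5 to 8 are borrowed from the specific embodiment of fig. 5 in this document, the function name `initial_bit_width` is hypothetical, and returning `None` for amplitudes above the fourth threshold is an assumption, since the text does not say what happens in that case.

```python
from bisect import bisect_left

# Values borrowed from the specific embodiment of fig. 5; other designs may differ.
AMPLITUDE_THRESHOLDS = [32, 64, 128, 256]  # first to fourth preset thresholds
BIT_WIDTHS = [5, 6, 7, 8]                  # fifth to eighth preset thresholds


def initial_bit_width(first_order_amplitude):
    """Return the initial (DCT) bit width for a first-order amplitude.

    Assumption: None is returned above the fourth threshold, a case the
    text does not cover.
    """
    # bisect_left finds the first threshold that is >= the amplitude,
    # which matches the "less than or equal to" comparisons in the text.
    index = bisect_left(AMPLITUDE_THRESHOLDS, first_order_amplitude)
    if index == len(AMPLITUDE_THRESHOLDS):
        return None
    return BIT_WIDTHS[index]


for amplitude in (20, 32, 33, 200, 300):
    print(amplitude, initial_bit_width(amplitude))
```

A table-driven lookup is used here instead of a chain of if statements so that both lists can be changed without touching the logic; a hardware implementation would more likely use a few comparators.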

As one embodiment, adjusting the bit widths corresponding to different processing modules according to the initial bit width includes: the initial bit width is used as the input bit width of discrete cosine transform, so as to feed back to the previous stage to adaptively adjust the bit widths corresponding to different processing modules.

That is, the obtained initial bit width is used as the DCT bit width, and the bit widths of the logarithmic operation module, the mel filtering module, the fast Fourier transform module, the windowing module, the framing module and the pre-emphasis module are derived from back to front according to the actual operation requirements. The bit widths of these modules are set according to actual requirements; for example, different Hamming windows correspond to different bit-width requirements. Once the bit widths of the different processing modules are obtained, the originally redundant bits are removed, so that the data bit width is truncated to the required precision while recognition accuracy is preserved, achieving low-power operation.
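
The back-to-front derivation can be pictured as walking the module chain in reverse, starting from the DCT bit width. The Python sketch below only illustrates that traversal: the per-stage `EXTRA_BITS` values and the `max_width` cap are entirely hypothetical placeholders, because the source states only that each module's width is set according to actual requirements and gives no numbers.

```python
# Stages in processing order; the DCT is last and receives the initial bit width.
STAGES = ["pre_emphasis", "framing", "windowing", "fft", "mel_filter", "log", "dct"]

# HYPOTHETICAL extra bits each stage needs relative to the stage after it.
# These numbers are placeholders for illustration, not values from the source.
EXTRA_BITS = {
    "log": 2,
    "mel_filter": 4,
    "fft": 2,
    "windowing": 0,
    "framing": 0,
    "pre_emphasis": 0,
}


def derive_stage_widths(dct_width, max_width=16):
    """Derive a bit width for every stage, walking from the DCT backwards.

    max_width is an assumed upper bound standing in for the original
    (untruncated) bit width of the design.
    """
    widths = {"dct": dct_width}
    current = dct_width
    for stage in reversed(STAGES[:-1]):
        current = min(current + EXTRA_BITS[stage], max_width)
        widths[stage] = current
    return widths


print(derive_stage_widths(5))
```

In a real design each entry would come from an analysis of the stage's arithmetic, for example the window coefficients chosen or the number of filter taps.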

As one embodiment, it is further determined, according to the first-order amplitude, whether to output a control signal that turns on the gating circuit, so as to extend the standby time of the device.

In order to further extend the standby time of a voice wake-up device, the operation power consumption of the device needs to be reduced. Circuit gating is therefore provided for the voice data and an MFCC feature data threshold is set: if the first-order maximum amplitude of the DCT output data stays below the threshold for 30 consecutive frames (1 s), the voice is determined to be invalid and gating is performed, which reduces the neural network recognition and post-processing workload, further reduces power consumption, and extends the standby time of the device.

As a specific embodiment, determining, according to the first-order amplitude, whether to output the control signal that turns on the gating circuit includes: judging whether the first-order amplitude is greater than the feature data threshold; if so, clearing the count value of the counter; if not, incrementing the count value of the counter by one and judging whether the incremented count value is greater than or equal to a ninth preset threshold; and if so, judging that the voice is invalid, outputting the control signal to turn on the gating circuit, and clearing the count value of the counter.

That is, if the first-order amplitude is greater than the feature data threshold, the voice is valid and the bit width adjustment flow described above is entered; if the first-order amplitude is less than or equal to the feature data threshold for 30 consecutive frames (1 s), the voice is determined to be invalid and gating is performed. The feature data threshold is set according to actual conditions.
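
The counter-based gating decision can be sketched as a small state holder that is updated once per frame. This Python example is a minimal illustration, not the circuit itself: the class name `GatingCounter` and its parameter names are hypothetical, the default limit of 30 frames follows the 30-frame (1 s) example in the text, and the feature data threshold must be supplied by the caller because the source leaves it application-specific.

```python
class GatingCounter:
    """Counts consecutive low-amplitude frames and signals when to gate.

    feature_threshold is application-specific; frame_limit=30 follows the
    30-frame (1 s) example in the text.
    """

    def __init__(self, feature_threshold, frame_limit=30):
        self.feature_threshold = feature_threshold
        self.frame_limit = frame_limit
        self.count = 0

    def update(self, first_order_amplitude):
        """Return True when the gating control signal should be output."""
        if first_order_amplitude > self.feature_threshold:
            self.count = 0  # valid speech: clear the counter
            return False
        self.count += 1  # quiet frame: count it
        if self.count >= self.frame_limit:
            self.count = 0  # speech judged invalid: gate and clear
            return True
        return False


gate = GatingCounter(feature_threshold=10, frame_limit=3)
print([gate.update(a) for a in [5, 5, 20, 5, 5, 5, 5]])
```

The short `frame_limit` in the usage lines is only there to keep the trace readable; a single loud frame resets the count, so gating fires only after an uninterrupted run of quiet frames.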

As a specific embodiment, as shown in fig. 5, the low power MFCC feature extraction method includes the steps of:

S1, obtaining the mel-cepstral coefficients output by the DCT.

S2, calculating the maximum amplitude of the first order.

S3, judging whether the maximum amplitude of the first order is larger than a threshold value. If not, the counter is incremented by one, and step S4 is executed; if so, the counter is cleared and step S5 is performed.

S4, judging whether the count value of the counter is greater than or equal to 30. If yes, gating is carried out; if not, returning to the step S2.

S5, judging whether the maximum amplitude of the first order is less than or equal to 32. If yes, setting the DCT bit width to 5 bits and executing step S9; if not, executing step S6.

S6, judging whether the maximum amplitude of the first order is less than or equal to 64. If yes, setting the DCT bit width to 6 bits and executing step S9; if not, executing step S7.

S7, judging whether the maximum amplitude of the first order is less than or equal to 128. If yes, setting the DCT bit width to 7 bits and executing step S9; if not, executing step S8.

S8, judging whether the maximum amplitude of the first order is less than or equal to 256. If yes, setting the DCT bit width to 8 bits and executing step S9.

S9, adjusting bit widths of different modules of the MFCC according to the DCT bit width.

It should be noted that the bit width of the overall recognition flow is adjusted according to the DCT output amplitude. If the amplitude is below the threshold (set according to the actual situation) for 30 consecutive frames (1 s), the voice is determined to be invalid and gating is performed. When the voice is valid, the output bit width is determined from the magnitude of the DCT output amplitude: for example, when the maximum amplitude is greater than 128 and no greater than 256, the DCT output bit width is 8 bits, and when the maximum amplitude is no greater than 32, the DCT output bit width is 5 bits. This bit width is fed back to the preceding stages to adaptively adjust the bit widths of the different MFCC modules, which effectively reduces the overall amount of data toggling, achieves precise quantization, and reduces power consumption by 15%. In a comparison of inference results between the hardware MFCC and the software MFCC, the recognition rate dropped by only 0.7%.

According to experiments, with the bit width adjustment design, a control module is added to the MFCC algorithm flow and the data bit width of the MFCC operations is precisely quantized, which effectively reduces the operation bit width and the computation of the neural network module and the post-processing module. Because resources such as a state machine must be added, the circuit area of the feature recognition module increases by about 10%; however, in simulation the dynamic power consumption of MFCC data processing is reduced by 20% and the overall power consumption by 10%, so for the voice wake-up system as a whole the power consumption is significantly reduced while the voice recognition accuracy drops by no more than 0.7%. With the gating design applied to the voice wake-up system, the power consumption of the MFCC feature extraction module is reduced by 15%; and because thresholding the feature parameters reduces the MFCC output bit width, the power consumption of subsequent modules such as the neural network module and the post-processing module can be reduced by about 20%, and the total power consumption by about 30%.

In order to implement the above embodiment, as shown in FIG. 6, an embodiment of the present invention further provides a low-power MFCC feature extraction device, which includes: an acquisition module 10, configured to acquire a voice signal; a feature extraction module 20, configured to perform pre-emphasis, framing, windowing, fast Fourier transform, mel filtering, logarithmic operation and discrete cosine transform on the voice signal using different processing modules to obtain the MFCC features corresponding to the voice signal; and a control module 30, configured to acquire a first-order amplitude in the MFCC features and generate a corresponding initial bit width according to the first-order amplitude, so as to adjust the bit widths corresponding to the different processing modules according to the initial bit width.

Optionally, the control module 30 is further configured to determine, according to the first-order amplitude, whether to output a control signal that turns on the gating circuit, so as to extend the standby time of the device.

It should be noted that the above description of the low-power MFCC feature extraction method is also applicable to the low-power MFCC feature extraction device, and will not be described herein.

In summary, according to the low-power MFCC feature extraction device of the embodiment of the present invention, the acquisition module acquires the voice signal; the feature extraction module performs pre-emphasis, framing, windowing, fast Fourier transform, mel filtering, logarithmic operation and discrete cosine transform on the voice signal using different processing modules to obtain the MFCC features corresponding to the voice signal; and the control module acquires a first-order amplitude in the MFCC features and generates a corresponding initial bit width from it, so that the bit widths corresponding to the different processing modules are adjusted according to the initial bit width. The bit width quantized from the first-order amplitude is thus fed back to the preceding stages to precisely quantize the input bit width of each stage of operation, so that the data bit width is truncated while recognition accuracy is preserved and the operation power consumption is reduced.

In order to implement the above embodiments, the embodiments of the present invention further provide an electronic device, including a processor, a memory, and a bus, where the processor and the memory communicate with each other through the bus; the memory stores program instructions executable by the processor, and the processor can invoke these instructions to perform the low-power MFCC feature extraction method described above.

In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", etc. indicate orientations or positional relationships based on those shown in the drawings. They are used merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation or be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.

Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "a plurality" means two or more, unless explicitly defined otherwise.

In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly. A connection may be, for example, fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.

In the present invention, unless expressly stated or limited otherwise, a first feature being "above" or "below" a second feature may mean that the first and second features are in direct contact, or that they are not in direct contact but are in contact through an additional feature between them. Moreover, a first feature being "above," "over" or "on" a second feature includes the first feature being directly above or obliquely above the second feature, or simply indicates that the first feature is at a higher level than the second feature. A first feature being "under," "below" or "beneath" a second feature includes the first feature being directly below or obliquely below the second feature, or simply indicates that the first feature is at a lower level than the second feature.

In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of the above terms should not be understood as necessarily referring to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Further, one skilled in the art can join and combine the different embodiments or examples described in this specification.

While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims

1. A method for low power MFCC feature extraction, the method comprising:

acquiring a voice signal;

performing pre-emphasis, framing, windowing, fast Fourier transform, mel filtering, logarithmic operation and discrete cosine transform processing on the voice signal by using different processing modules, so as to obtain MFCC features corresponding to the voice signal; and

acquiring a first-order amplitude in the MFCC features, and generating a corresponding initial bit width according to the first-order amplitude, so as to adjust the bit widths corresponding to the different processing modules according to the initial bit width.

2. The low-power MFCC feature extraction method of claim 1, further comprising: determining, according to the first-order amplitude, whether to output a control signal that turns on a gating circuit, so as to extend the standby time of the device.

3. The low-power MFCC feature extraction method of claim 2, wherein generating the corresponding initial bit width according to the first-order amplitude comprises:

determining whether the first-order amplitude is less than or equal to a first preset threshold, and if so, setting the initial bit width to a fifth preset threshold; if not, further determining whether the first-order amplitude is less than or equal to a second preset threshold, and if so, setting the initial bit width to a sixth preset threshold;

if not, further determining whether the first-order amplitude is less than or equal to a third preset threshold, and if so, setting the initial bit width to a seventh preset threshold;

if not, further determining whether the first-order amplitude is less than or equal to a fourth preset threshold, and if so, setting the initial bit width to an eighth preset threshold.

4. The low-power MFCC feature extraction method of claim 3, wherein, after the first-order amplitude in the MFCC features is acquired, the method further comprises determining whether the first-order amplitude is greater than a feature data threshold, and if so, clearing the count value of a counter.

5. The low-power MFCC feature extraction method of claim 4, wherein the first preset threshold is less than the second preset threshold, the second preset threshold is less than the third preset threshold, and the third preset threshold is less than the fourth preset threshold; and the fifth preset threshold is less than the sixth preset threshold, the sixth preset threshold is less than the seventh preset threshold, and the seventh preset threshold is less than the eighth preset threshold.

6. The low-power MFCC feature extraction method of claim 5, wherein adjusting the bit widths corresponding to the different processing modules according to the initial bit width comprises: using the initial bit width as the input bit width of the discrete cosine transform, and feeding it back to the preceding stages so as to adaptively adjust the bit widths corresponding to the different processing modules.

7. The low-power MFCC feature extraction method of claim 6, wherein determining, according to the first-order amplitude, whether to output a control signal that turns on the gating circuit comprises:

determining whether the first-order amplitude is less than or equal to the feature data threshold, and if so, incrementing the count value of the counter by one and determining whether the incremented count value is greater than or equal to a ninth preset threshold; and if so, determining that the voice is invalid, outputting the control signal to turn on the gating circuit, and clearing the count value of the counter.

8. A low power MFCC feature extraction apparatus, comprising:

an acquisition module configured to acquire a voice signal;

a feature extraction module configured to perform pre-emphasis, framing, windowing, fast Fourier transform, mel filtering, logarithmic operation and discrete cosine transform processing on the voice signal by using different processing modules, so as to obtain MFCC features corresponding to the voice signal; and

a control module configured to acquire a first-order amplitude in the MFCC features and to generate a corresponding initial bit width according to the first-order amplitude, so as to adjust the bit widths corresponding to the different processing modules according to the initial bit width.

9. The low-power MFCC feature extraction apparatus of claim 8, wherein the control module is further configured to determine, according to the first-order amplitude, whether to output a control signal that turns on a gating circuit, so as to extend the standby time of the device.

10. An electronic device comprising a processor, a memory, and a bus, wherein

the processor and the memory communicate with each other through the bus; and

the memory stores program instructions executable by the processor, the program instructions being invocable by the processor to perform the low-power MFCC feature extraction method of any one of claims 1-7.
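
As a non-limiting illustration of the counter-based gating decision recited in claims 4 and 7, the following Python sketch processes one first-order amplitude per frame. The class name `GatingController`, its field names and the numeric values in the usage note are hypothetical; only the comparison, counting and clearing steps are taken from the claims.

```python
from dataclasses import dataclass


@dataclass
class GatingController:
    """Per-frame decision on whether to turn on the gating circuit."""

    feature_threshold: int   # the "feature data threshold"
    max_quiet_frames: int    # the "ninth preset threshold"
    count: int = 0           # count value of the counter

    def update(self, first_order_amplitude: int) -> bool:
        """Return True when the control signal should be output."""
        if first_order_amplitude > self.feature_threshold:
            # Claim 4: amplitude above the threshold clears the counter.
            self.count = 0
            return False
        # Claim 7: amplitude at or below the threshold increments the counter.
        self.count += 1
        if self.count >= self.max_quiet_frames:
            # Voice is judged invalid: signal the gating circuit and
            # clear the counter.
            self.count = 0
            return True
        return False
```

For example, with a hypothetical feature data threshold of 20 and a ninth preset threshold of 3, the amplitude sequence 5, 30, 5, 5, 5 produces the control signal only on the last frame, because the second frame clears the counter and three consecutive low-amplitude frames are then needed.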