CN115798491B

CN115798491B - Audio feature extraction device, method, chip and electronic equipment

Info

Publication number: CN115798491B
Application number: CN202310044184.0A
Authority: CN
Inventors: 苏尼尔·希拉万特; 赛义德·哈格哈特舒尔; 图芭·代米尔吉; 乔宁
Original assignee: Shenzhen Shizhi Technology Co ltd
Current assignee: Shenzhen Shizhi Technology Co ltd
Priority date: 2023-01-29
Filing date: 2023-01-29
Publication date: 2023-05-30
Anticipated expiration: 2043-01-29
Also published as: CN115798491A

Abstract

The invention discloses an audio feature extraction device, an audio feature extraction method, a chip and electronic equipment. To solve the technical problem that existing analog audio front ends are sensitive to process and manufacturing variations, the audio front end of the invention uses a digital-analog hybrid circuit comprising a low-noise amplifier and an asynchronous delta modulator. The asynchronous delta modulator is coupled to the low-noise amplifier and performs digitization in an asynchronous manner, so that as much of the front end as possible after the low-noise amplifier is implemented digitally. The device achieves a sparse data representation of the input audio signal, occupies few resources, and has power consumption almost equal to that of an analog audio front end. The invention is suitable for the field of brain-like perception and/or computation.

Description

Audio feature extraction device, method, chip and electronic equipment

Technical Field

The present invention relates to an audio feature extraction device, method, chip and electronic device, and more particularly, to an audio feature extraction device, method, chip and electronic device that are based on an event-driven mechanism and implement digitization in an asynchronous manner.

Background

Neuromorphic chips (also known as brain-like chips) often use an audio front end (AFE) to process the original audio signal, extract audio features, and encode the extracted features into a stream of pulse events, which is then recognized by a neural network processor (e.g., a pulsed, or spiking, neural network, SNN), as shown in Fig. 1. The audio front end includes a low noise amplifier (LNA), a band-pass filter (BPF), a full-wave rectifier (FWR), a pulse generator (leaky integrate-and-fire, LIF, or integrate-and-fire, IAF), etc.

Existing audio front ends are often implemented with analog circuits, which offer high processing speed and ultra-low power consumption. However, the analog band-pass filter (BPF) in such a front end occupies a large area (it needs a large number of resistors and capacitors), is sensitive to temperature changes, and has performance that depends on factors such as process technology and manufacturing.

To solve these problems, the invention seeks an alternative that implements as much of the audio front end as possible digitally after the low noise amplifier, while achieving power consumption and speed comparable to those of an analog AFE.

Disclosure of Invention

In order to solve or alleviate some or all of the above technical problems, the present invention is implemented by the following technical solutions:

an audio feature extraction device comprising a low noise amplifier and an asynchronous delta modulator; the low-noise amplifier is used for amplifying the audio acquired by the microphone with low noise; the asynchronous delta modulator is coupled to the low noise amplifier for digitizing in an asynchronous manner.

In one embodiment, the asynchronous delta modulator performs sampling with non-uniform time steps and uniform amplitude.

In an embodiment, at least one threshold is obtained based on the uniform amplitude, the at least one threshold being n times the uniform amplitude, n = 0, 1, 2, …; the asynchronous delta modulator receives a signal from the low noise amplifier that is continuous in time and amplitude, and samples the signal based on the at least one threshold; an event is generated where the signal crosses the level of the at least one threshold.

In certain classes of embodiments, the events include a first event and a second event; generating a first event when the signal slope is positive and the signal rises to or above the at least one threshold; a second event is generated when the signal slope is negative and the signal falls to or below the at least one threshold.

In certain types of embodiments, a high-frequency digital clock is used to track and locate the timestamps at which the first event and/or the second event occur for the respective thresholds.

In some embodiments, the frequency $f_s$ of the high-frequency clock satisfies

$$f_s \ge 2\pi W N$$

where W is the audio bandwidth and N is the quantization accuracy of the maximum amplitude of the signal, the maximum amplitude being equal to N times the uniform amplitude.

In a class of embodiments, the audio feature extraction device further comprises: a plurality of parallel channels, each parallel channel comprising a digital bandpass filter having a different center frequency; the digital bandpass filter in each parallel channel is coupled to the asynchronous delta modulator; the output of the digital band pass filter is calculated using differential encoding.

In certain classes of embodiments, using differential encoding, the timestamp of the first event corresponding to each threshold is encoded as +1, and/or the timestamp of the second event corresponding to each threshold is encoded as -1, the remaining timestamps being encoded as 0.

In a class of embodiments, a shift register is coupled between the asynchronous delta modulator and a plurality of parallel channels; the shift register stores the sampling result of the asynchronous delta modulator according to a threshold value; the digital band-pass filter performs differential encoding on the values in the shift register to obtain differential encoding event streams corresponding to the thresholds.

In certain classes of embodiments, the asynchronous delta modulator resets one bit in a multi-bit shift register according to the level crossing of the signal, in the positive or negative direction, with the at least one threshold.

An audio feature extraction method is characterized in that:

the low-noise amplifier is used for amplifying the audio acquired by the microphone with low noise;

an asynchronous delta modulator is coupled to the low noise amplifier, receives signals from the low noise amplifier that are continuous in time and amplitude, and digitizes them in an asynchronous manner.

In certain classes of embodiments, at least one threshold is derived based on a uniform amplitude, the asynchronous delta modulator sampling the signal based on the at least one threshold; the at least one threshold is n times the uniform amplitude, n=0, 1,2 … …; the signal generates an event at a crossing with the at least one threshold.

In certain classes of embodiments, the events include a first event and a second event; generating a first event when the signal slope is positive and the signal rises to or above the at least one threshold; when the signal slope is negative and the signal falls to or below the at least one threshold, a second event is generated.

In certain classes of embodiments, the audio feature extraction method further comprises: a plurality of parallel channels, each parallel channel comprising a digital bandpass filter having a different center frequency; the digital bandpass filter is coupled to the asynchronous delta modulator; the output of the digital band pass filter is calculated using differential encoding.

A chip comprising an audio feature extraction device as described above or an audio feature extraction method as described above, and a processor; the processor is coupled with the audio feature extraction device and is used for executing classification tasks.

In certain classes of embodiments, the processor is a decision tree or neural network processor.

In certain classes of embodiments, the chip is a brain-like chip.

An electronic device comprising a chip as described above.

Some or all embodiments of the present invention have the following beneficial technical effects:

1) The audio front end AFE is realized by adopting a digital-analog hybrid circuit, is realized in a digital mode after a low-noise amplifier, and is realized in an asynchronous mode, and has almost the same power consumption as an analog AFE.

2) The audio front-end AFE of the present invention is based on an event-based mechanism comprising an asynchronous delta Modulator (Async delta Modulator, asynchronous ADM for short), the output data stream of which depends on the input signal and generates data only when the amplitude of the input signal changes.

3) Unlike prior delta modulators that use continuous time sampling, the asynchronous ADM of the present invention is implemented based on an asynchronous approach, using non-uniform time steps and uniform amplitude steps for sampling, thereby implementing sparse data representation of the input audio signal.

4) The invention uses pulse differential coding to generate sparse event stream (also called event sequence), and has good event sparsity, less resource occupation and low power consumption.

Further advantageous effects will be further described in the preferred embodiments.

The above-described technical solutions/features are intended to summarize the technical solutions and technical features described in the detailed description section, and thus the ranges described may not be exactly the same. However, these new solutions disclosed in this section are also part of the numerous solutions disclosed in this document, and the technical features disclosed in this section and the technical features disclosed in the following detailed description section, and some contents in the drawings not explicitly described in the specification disclose more solutions in a reasonable combination with each other.

The technical scheme combined by all the technical features disclosed in any position of the invention is used for supporting the generalization of the technical scheme, the modification of the patent document and the disclosure of the technical scheme.

Drawings

FIG. 1 is a schematic diagram of audio feature extraction and processing;

FIG. 2 is a schematic block diagram of the audio feature extraction of the present invention;

FIG. 3 is a schematic diagram of an asynchronous ADM performing sampling in accordance with an embodiment of the present invention;

FIG. 4 shows the values in the shift register when the input to the asynchronous ADM crosses different thresholds;

FIG. 5 shows a sequence of events resulting from differential encoding.

Detailed Description

Since various alternatives are not exhaustive, the gist of the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention. Other technical solutions and details not disclosed in detail below, which generally belong to technical objects or technical features that can be achieved by conventional means in the art, are limited in space and the present invention is not described in detail.

Except where division is used, any position "/" in this disclosure means a logical "or". The ordinal numbers "first", "second", etc., in any position of the present invention are used merely for distinguishing between the labels in the description and do not imply an absolute order in time or space, nor do they imply that the terms preceded by such ordinal numbers are necessarily different from the same terms preceded by other ordinal terms.

The present invention is described in terms of various elements that may be used in various combinations of embodiments, and these elements may be combined in various methods and products. In the present invention, even if a feature is described only when introducing a method/product scheme, the corresponding product/method scheme is understood to explicitly include that technical feature.

The description of a step, module, or feature at any location in this disclosure does not imply that it is the only step or feature present; those skilled in the art may implement other embodiments with the aid of other technical means according to the disclosed technical solutions. The embodiments of the present invention are generally disclosed as preferred embodiments, but this does not mean that embodiments contrary to the preferred embodiments are excluded from the scope of the invention, as long as such embodiments solve at least one technical problem addressed by the present invention. Based on the gist of the specific embodiments of the present invention, a person skilled in the art can apply substitution, deletion, addition, combination, reordering, etc. to certain technical features to obtain a technical solution that still follows the inventive concept. Such solutions, which do not depart from the technical idea of the invention, are also within the scope of protection of the invention.

The audio front end AFE (also called audio feature extractor) of the present invention is based on an event driven mechanism, whose energy efficiency, power consumption, are proportional to the activity of the audio signal. The audio front end AFE performs frequency topology on the audio signal, and divides the audio into a plurality of parallel channels for processing according to the frequency. In an embodiment, a digital Band Pass Filter (BPF), a digital rectifier and a digital pulse generator are included in any channel that fits within the channel frequency band to effect conversion of quantized digital audio signals to pulse signals within the channel frequency band. Specifically, the BPF in each channel retains only a small portion of the audio input signal whose frequency matches the center frequency of the channel BPF, and detects the signal activity of the portion of the audio input signal over time. An event is generated when the amplitude or energy variation intensity of the BPF-filtered audio signal exceeds a threshold value in any channel.

The invention implements as much of the audio front end AFE as possible digitally after the LNA, and digitizes the signal after the LNA in an asynchronous manner. Fig. 2 is a schematic block diagram of audio feature extraction according to the present invention, including a low noise amplifier LNA, an asynchronous delta modulator (Async delta-Mod, also referred to herein simply as asynchronous ADM), and a plurality of parallel channels, each channel including a digital band pass filter BPF, a digital rectifier, a digital pulse generator, etc., wherein the digital pulse generator may be an LIF (leaky integrate-and-fire) or IAF (integrate-and-fire) generator, or another pulse encoder.
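The per-channel stages that follow the band-pass filter can be illustrated with a minimal Python sketch. It assumes a full-wave rectifier (absolute value) and a non-leaky integrate-and-fire generator that subtracts the threshold each time it fires. The function name `rectify_and_fire` and the `threshold` parameter are hypothetical and are not taken from this disclosure, which does not specify the internals of these blocks.

```python
from typing import List, Sequence


def rectify_and_fire(filtered: Sequence[float], threshold: float) -> List[int]:
    """Full-wave rectify a filtered channel signal and emit IAF spikes.

    Returns one integer per input sample: the number of spikes fired at
    that time step. Assumed behavior, for illustration only.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    membrane = 0.0
    spikes = []
    for sample in filtered:
        membrane += abs(sample)        # digital rectifier feeding the integrator
        fired = 0
        while membrane >= threshold:   # fire, then subtract the threshold (no leak)
            membrane -= threshold
            fired += 1
        spikes.append(fired)
    return spikes


quiet = rectify_and_fire([0.0, 0.0, 0.0, 0.0], threshold=1.0)
active = rectify_and_fire([0.5, -0.5, 0.25, -0.75], threshold=1.0)
```

A silent channel produces no spikes at all, which is the sense in which the output activity follows the activity of the audio signal.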

In existing synchronous ADCs, the signal is generally sampled at fixed time instants nT₀ (where T₀ is the sampling time interval and n = 0, 1, 2, …), and since the signal amplitude is a continuous real number, the sampled amplitude x(nT₀) must be quantized so that it can be stored in a memory of limited precision; that is, synchronous ADCs perform sampling at uniform time intervals and non-uniform amplitudes.

The asynchronous ADM of the present invention differs substantially from conventional continuous-time signal sampling methods in that sampling is performed with non-uniform time steps and a uniform amplitude step Δ. Specifically, the signal is sampled at fixed amplitude intervals nΔ (n = 0, 1, 2, …, i.e. n is a non-negative integer); the set of fixed amplitude intervals is referred to as the threshold set, such as {Δ, 2Δ, 3Δ, 4Δ, …}. In contrast to the synchronous case, the sampling time corresponding to a generated event is a real value that needs to be quantized.

Let A be the maximum amplitude (which may also be referred to as the dynamic range) of the sampled signal, with A = NΔ, where N is the amplitude quantization accuracy; the larger N is, the higher the sampling accuracy.

Fig. 3 is a schematic diagram of an asynchronous ADM performing sampling according to an embodiment of the present invention. The asynchronous ADM receives a signal x (the input of the asynchronous ADM) from the low noise amplifier LNA that is continuous in time and amplitude, and generates events at the level crossings with discrete voltage steps (i.e. the fixed amplitude intervals). Specifically, the signal x is sampled at the fixed amplitude intervals nΔ: an event occurs when the signal rises to (or above) or falls to (or below) a threshold nΔ.

In some embodiments, the generated events include an up event (also referred to as a start event or a first event) and a down event (also referred to as an end event or a second event). An event is generated where the signal crosses the level of a discrete amplitude interval nΔ. When the signal slope is positive (the signal is increasing), the event generated at the level crossing with nΔ is an up event; that is, an up event is generated when the signal rises to (or exceeds) the threshold nΔ.

Conversely, when the signal slope is negative (the signal is decreasing), the event generated at the level crossing with nΔ is a down event; that is, a down event is generated when the signal falls to (or below) the threshold nΔ. In addition, the up and down events generated by the asynchronous ADM may be distinguished by any reasonable technical means, and the invention is not limited in this respect.

In fig. 3, the input signal x is compared with a set of thresholds, such as Δ,2Δ,3Δ, and 4Δ, and when the input signal rises to (or exceeds) the threshold, or falls to (or falls below) the threshold, events are generated, and the time stamps of the generated events are t1 to t8. Fig. 3 is merely an exemplary example, and the present invention is not limited in terms of the time stamps and the number of generated events.
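The level-crossing behavior described for Fig. 3 can be emulated in software with a minimal Python sketch. It assumes the continuous signal is available as a densely sampled list; the actual asynchronous ADM works on the continuous signal and needs no such sampling grid. The names `level_crossing_events`, `delta` and `num_levels` are hypothetical, and the handling of a sample that lands exactly on a threshold is one possible choice.

```python
from typing import List, Sequence, Tuple

# (time index, threshold index n, polarity): +1 is an up event, -1 a down event
Event = Tuple[int, int, int]


def level_crossing_events(samples: Sequence[float], delta: float,
                          num_levels: int) -> List[Event]:
    """Emit an event whenever the signal crosses a threshold n * delta."""
    if delta <= 0 or num_levels < 1:
        raise ValueError("delta must be positive and num_levels at least 1")
    events = []
    for t in range(1, len(samples)):
        prev, cur = samples[t - 1], samples[t]
        for n in range(1, num_levels + 1):
            level = n * delta
            if prev < level <= cur:      # rises to or above the threshold
                events.append((t, n, +1))
            elif prev >= level > cur:    # drops below the threshold
                events.append((t, n, -1))
    return events


# A single hump that climbs past 4 * delta and comes back down.
hump = [0.0, 1.2, 2.3, 3.4, 4.5, 3.4, 2.3, 1.2, 0.0]
events = level_crossing_events(hump, delta=1.0, num_levels=4)
```

For this input the sketch yields eight events, one up event and one down event per threshold, which mirrors the timestamps t1 to t8 in the example above.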

The asynchronous ADM records the timestamps of the different threshold crossings, and digital logic writes the data to a register (e.g., the shift register has 2¹⁰ bits). Fig. 4 shows the values in the shift register as the different thresholds are crossed: for each up event, the register is shifted right by one bit by appending a "1" at the leftmost bit; for each down event, a left shift is performed by appending a "0" at the rightmost bit.

As shown in Figs. 3 and 4, the signal x is sampled at the first threshold Δ to obtain a quantized pulse e1: the signal crosses the level Δ at t1, generating an up event (also called a start event) that marks the start position/moment of the quantized pulse e1, and crosses it again at t8, generating a down event (also called an end event) that marks the end position/moment of the quantized pulse; the value in the register remains unchanged from t2 to t7.

The signal x is sampled at the second threshold 2Δ, resulting in a quantized pulse e2: the level crossing of the signal with the discrete amplitude interval 2Δ at t2 produces an up event, indicating the start of quantized pulse e2, and the level crossing at t7 produces a down event, indicating the end of quantized pulse e2; the value in the register remains unchanged from t3 to t6. Therefore, for the second threshold 2Δ, an up event and a down event are generated at times t2 and t7, respectively, and the value in the shift register is 001111100.

For the third threshold 3Δ, an up event and a down event are generated at times t3 and t6, respectively, and the value in the shift register is 000111000, and so on.
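The register values quoted above are consistent with a simple reading in which each threshold has one bit per time slot t0 to t8, and the bit is 1 from the up event until just before the down event. The Python sketch below reproduces the quoted values under that assumption, which is inferred from the numbers rather than stated in the text. The name `threshold_bits` is hypothetical, and the pattern computed for the first threshold Δ is not given in the text but follows from the same rule.

```python
def threshold_bits(up_time: int, down_time: int, length: int) -> str:
    """Bit pattern that is 1 from the up event until just before the down event."""
    if not 0 <= up_time <= down_time <= length:
        raise ValueError("expected 0 <= up_time <= down_time <= length")
    return "".join("1" if up_time <= t < down_time else "0" for t in range(length))


# Up/down timestamps per threshold index n, taken from the Fig. 3 example.
FIG3_EVENTS = {1: (1, 8), 2: (2, 7), 3: (3, 6)}
patterns = {n: threshold_bits(up, down, 9) for n, (up, down) in FIG3_EVENTS.items()}
```

Under this reading, `patterns[2]` and `patterns[3]` match the values 001111100 and 000111000 given for the thresholds 2Δ and 3Δ.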

The present invention quantizes the timestamps of the up and down events generated for each threshold nΔ, based on event timing, to produce a pulse decomposition of the signal x.

For pulse decomposition of the signal, a high-frequency digital clock is used to locate and track when the start event (up event) and the end event (down event) corresponding to each threshold occur. To ensure good accuracy, the timestamps of adjacent events must not overlap, which requires accurate time quantization, so the frequency of the high-frequency digital clock needs to be sufficiently high. For audio bandwidth W and amplitude quantization N (the maximum amplitude of signal x is A = NΔ), the sampling frequency f_s of the high-frequency digital clock must satisfy:

$$f_s \ge 2\pi W N$$

For example, for 8-bit amplitude quantization (N = 2⁸ = 256) and audio of bandwidth W = 16 kHz, the minimum sampling rate of the high-frequency digital clock must be f_s = 25.72 MHz so that it can correctly capture and locate the occurrence of events.
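The quoted figure of 25.72 MHz is reproduced by the bound f_s ≥ 2πWN when π is rounded to 3.14. One way to read this bound, offered here as an interpretation rather than as the stated derivation: a full-scale sinusoid of amplitude NΔ at frequency W has a maximum slope of 2πW·NΔ, so it can traverse one amplitude step Δ in as little as 1/(2πWN) seconds, and the clock period should not exceed that time. The Python sketch below simply evaluates the bound; the function name `min_clock_hz` is hypothetical.

```python
import math


def min_clock_hz(bandwidth_hz: float, levels: int, pi: float = math.pi) -> float:
    """Lower bound 2 * pi * W * N on the clock frequency (assumed form)."""
    if bandwidth_hz <= 0 or levels <= 0:
        raise ValueError("bandwidth_hz and levels must be positive")
    return 2.0 * pi * bandwidth_hz * levels


exact = min_clock_hz(16e3, 256)              # with full-precision pi
rounded = min_clock_hz(16e3, 256, pi=3.14)   # with pi rounded to 3.14
```

With π = 3.14 the result is about 25.72 MHz; with full-precision π it is slightly higher, about 25.74 MHz.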

In some embodiments, an approximation of the original signal is obtained by a set of pulses:

$$x(t) \approx \sum_{e \in E} p_e(t),$$

where $E$ represents the set of events corresponding to all threshold levels, and $p_e(t)$ represents the pulse signal corresponding to a certain threshold, lasting from the rise time of the event (up event) to the end of its fall time (down event).

Quantizing the pulses $p_e(t)$ in time yields $\hat{x}[n]$:

$$\hat{x}[n] = \sum_{e \in E} \hat{p}_e[n],$$

where $\hat{p}_e[n]$ is the quantized pulse signal corresponding to a certain threshold, lasting from the rise time of the event (the time of the up event) to the end of its fall time (the time of the down event).

The present invention couples a bank of digital band pass filters (BPFs) to the asynchronous ADM. The digital BPF bank is efficient and easy to implement, and filters the event-based quantized signal $\hat{x}[n]$. Let the impulse responses of the digital BPF bank be $h_k[n]$, where $k = 1, 2, \dots, F$ and $F$ is the number of BPFs. For any BPF, the output is:

$$y_k[n] = (h_k * \hat{x})[n] = \sum_{e \in E} (h_k * \hat{p}_e)[n],$$

where $h_k[n]$ is the impulse response of the digital BPF and $h_k * \hat{x}$ denotes the convolution of the BPF impulse response with the signal sampled/quantized by the asynchronous ADM. Every $\hat{p}_e$ has the same amplitude, so each term can be obtained directly as the sum of $L_e$ successive samples of the impulse response $h_k$, where $L_e$ denotes the length of event $e$ (the distance/difference between the start and end times of the event).

In a preferred embodiment of the invention, the filter output $y_k[n]$ is calculated from the differential encoding of the quantized pulses $\hat{p}_e$. The above formula can then be simplified as:

$$y_k[n] = \sum_{e \in E} (I(h_k) * d_e)[n],$$

where $d_e$ is the differential encoding of the quantized pulse $\hat{p}_e$, and $I(\cdot)$ represents the accumulation operation.

The differential encoding $d_e$ of a quantized pulse $\hat{p}_e$ is very sparse: it is 0 for the whole duration of the event, +1 at the beginning of the event and -1 at the end of the event. Thus, in the present invention, the differential encoding is a digital stream over {+1, 0, -1}.

Therefore, $y_k[n]$ can be calculated at each time position $n$ by performing two operations (one addition and one subtraction, which in practice may also be realized as two additions or two subtractions; the invention is not limited in this respect). The method ignores the instantaneous value in the calculation process, and is computationally simple and low in power consumption.
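The two-operation shortcut can be checked numerically. The Python sketch below assumes a finite (FIR) impulse response and a single unit-amplitude quantized pulse that is 1 from its start index up to, but not including, its end index. It compares direct convolution of the pulse with the shortcut that looks up the accumulated impulse response once at the start event and once at the end event. The names `convolve` and `filter_from_events` and the example coefficients are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def filter_from_events(h: Sequence[float], start: int, end: int,
                       length: int) -> List[float]:
    """Filter output from one event using the accumulated impulse response.

    The differential code is +1 at `start` and -1 at `end`, so each output
    sample needs one lookup added and one lookup subtracted.
    """
    running = list(accumulate(h))          # accumulated impulse response

    def running_at(m: int) -> float:
        if m < 0:
            return 0.0
        return running[min(m, len(running) - 1)]   # constant after the last tap

    return [running_at(n - start) - running_at(n - end) for n in range(length)]


h = [0.5, 0.25, -0.25, -0.5]                             # example FIR taps
pulse = [1.0 if 2 <= n < 6 else 0.0 for n in range(12)]  # event from n=2 to n=6
direct = convolve(h, pulse)[:12]
fast = filter_from_events(h, start=2, end=6, length=12)
```

Both lists agree sample by sample. With several events, the per-event outputs are summed, so the cost scales with the number of events rather than with the pulse lengths.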

In a preferred embodiment, the ADM generates an asynchronous output, sets/resets one bit on a 10-bit shift register according to the crossing of the signal in either the positive or negative direction, and then differentially encodes by the difference in two consecutive time steps.

Based on differential encoding of the values in the shift register, a differentially encoded event stream corresponding to each threshold can be obtained. For example, the differential encoding corresponding to the threshold Δ is …, +1, 0, 0, 0, 0, 0, 0, -1, 0, …; that is, +1 is generated only when an up event occurs (the shift register value changes from 0 to 1), -1 is generated when a down event occurs (the shift register value changes from 1 to 0), and the value is 0 the rest of the time. Fig. 5 shows a sequence of events resulting from differential encoding.
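Differential encoding of a register value can be sketched in Python as the difference between consecutive bits. The sketch assumes the register is read as a string of bits ordered in time, which is an interpretation of the example values rather than a stated format. The name `differential_encode` is hypothetical.

```python
from typing import List


def differential_encode(bits: str) -> List[int]:
    """Return +1 on each 0-to-1 step, -1 on each 1-to-0 step, 0 otherwise."""
    if any(b not in "01" for b in bits):
        raise ValueError("bits must contain only '0' and '1'")
    values = [int(b) for b in bits]
    return [cur - prev for prev, cur in zip(values, values[1:])]


code_first = differential_encode("011111110")    # assumed pattern for threshold Δ
code_second = differential_encode("001111100")   # pattern quoted for threshold 2Δ
```

The first result is +1 followed by six zeros and then -1, matching the sequence given for the threshold Δ. Only two entries per pulse are nonzero, which is the source of the sparsity.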

In general, the audio front-end of the present invention implements digitization in an asynchronous manner based on an event driven mechanism, including a low noise amplifier and an asynchronous delta modulator; the low noise amplifier LNA is coupled with the audio sensor and is used for amplifying the audio collected by the microphone with low noise; an asynchronous delta modulator is coupled to the LNA for digitizing in an asynchronous manner.

The asynchronous delta modulator performs sampling with non-uniform time steps and uniform amplitude. Specifically, at least one threshold nΔ (n = 0, 1, 2, …) is derived based on the uniform amplitude Δ; the asynchronous delta modulator receives from the LNA a signal x that is continuous in time and amplitude and samples it based on the thresholds nΔ; an event is generated where the signal x crosses a threshold nΔ.

In certain preferred embodiments, the events generated where the signal x, which is received from the LNA and is continuous in time and amplitude, crosses a threshold nΔ include up events and down events; an up event is generated when the slope of the signal x is positive and the signal x rises to or above the at least one threshold; a down event is generated when the slope of the signal x is negative and the signal x falls to or below the at least one threshold.

In certain preferred embodiments, a high-frequency digital clock is used to track and locate the timestamps at which up and/or down events corresponding to the various thresholds (e.g., Δ, 2Δ, 3Δ, 4Δ, …) occur. Preferably, the sampling frequency $f_s$ of the high-frequency clock satisfies

$$f_s \ge 2\pi W N,$$

where W is the audio bandwidth and N is the quantization accuracy of the maximum amplitude of signal x (the maximum amplitude of signal x is A = NΔ).

The audio front-end of the present invention performs frequency topology on an audio signal, comprising a plurality of parallel channels, each comprising a digital BPF, a digital FWR and a digital IAF with different center frequencies, to maximally realize digitization after the LNA, to reduce sensitivity of the audio front-end to temperature, process, etc. The digital BPF of each channel is coupled with the asynchronous delta modulator, and the output of the digital band-pass filter is calculated by utilizing differential coding, so that the sparseness of output events of each channel is improved. Tests show that the audio front end realized by the invention has equivalent power consumption and speed of the analog AFE.

Preferably, the digital BPF of the present invention is a digital band-pass filter as described in patent application No. CN202211373605.6, which is incorporated herein by reference in its entirety.

Using differential encoding, the timestamp of the up event corresponding to each threshold (such as Δ, 2Δ, 3Δ, 4Δ, …) is encoded as +1, and/or the timestamp of the down event corresponding to each threshold is encoded as -1, and the remaining timestamps between the up event and the down event of each threshold are encoded as 0.

In certain preferred embodiments, the audio front-end includes a shift register coupled between the asynchronous delta modulator and the plurality of parallel channels, the sampling result of the asynchronous delta modulator being stored at a threshold; meanwhile, the digital BPF performs differential coding on the values in the shift register to obtain differential coding event streams corresponding to the thresholds.

The invention also relates to a method for extracting audio features based on an event-driven mechanism, which utilizes a low-noise amplifier for low-noise amplification of audio collected by a microphone, and comprises an asynchronous delta modulator coupled with the low-noise amplifier, wherein the asynchronous delta modulator receives a signal x with continuous time and amplitude from the low-noise amplifier and realizes digitization in an asynchronous manner.

Specifically, at least one threshold nΔ (n = 0, 1, 2, …) is derived based on the uniform amplitude Δ; the asynchronous delta modulator samples the signal x based on the at least one threshold nΔ; the signal x generates an event at each crossing with the at least one threshold.

The invention also relates to a chip comprising an audio front-end as described above or an audio feature extraction method as described above, and a processor coupled to the front-end for performing classification tasks.

In a preferred embodiment, the processor is a decision tree or neural network processor.

Preferably, the chip is a brain-like chip.

Preferably, the neural network processor is a pulsed neural network processor (SNN).

The invention also relates to an electronic device comprising a chip as described above for audio processing, such as ambient sound detection, always-on keyword spotting (KWS), voice activity detection (VAD), vibration anomaly detection, smart agriculture, smart farming, smart toys, smart home, assisted driving, etc.

Although the present invention has been described with reference to specific features and embodiments thereof, various modifications, combinations, substitutions can be made thereto without departing from the invention. The scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification, but rather, the methods and modules may be practiced in one or more products, methods, and systems of the associated, interdependent, inter-working, pre/post stages.

The specification and drawings are, accordingly, to be regarded in an abbreviated manner as an introduction to some embodiments of the technical solutions defined by the appended claims and are thus to be construed in accordance with the doctrine of greatest reasonable interpretation and are intended to cover as much as possible all modifications, changes, combinations or equivalents within the scope of the disclosure of the invention while also avoiding unreasonable interpretation.

Further improvements in the technical solutions may be made by those skilled in the art on the basis of the present invention in order to achieve better technical results or for the needs of certain applications. However, even if the partial improvement/design has creative or/and progressive characteristics, the technical idea of the present invention is relied on to cover the technical features defined in the claims, and the technical scheme shall fall within the protection scope of the present invention.

The features recited in the appended claims may be presented in the form of alternative features or in the order of some of the technical processes or the sequence of organization of materials may be combined. Those skilled in the art will readily recognize that such modifications, changes, and substitutions can be made herein after with the understanding of the present invention, by changing the sequence of the process steps and the organization of the materials, and then by employing substantially the same means to solve substantially the same technical problem and achieve substantially the same technical result, and therefore such modifications, changes, and substitutions should be made herein by the equivalency of the claims even though they are specifically defined in the appended claims.

The steps and components of the embodiments have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. The various steps or modules described in connection with the embodiments disclosed herein may be implemented in hardware, software, or a combination of both. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Those of ordinary skill in the art may implement the described functionality in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the claimed invention.

Claims

1. An audio feature extraction device, characterized in that:

the audio feature extraction device comprises a low noise amplifier and an asynchronous delta modulator;

the asynchronous delta modulator is coupled to the low noise amplifier;

the asynchronous delta modulator performs sampling with non-uniform time steps and uniform amplitude for digitizing in an asynchronous manner.

2. The audio feature extraction device of claim 1, wherein:

at least one threshold value is obtained based on the uniform amplitude, wherein the at least one threshold value is n times the uniform amplitude, and n is a non-negative integer;

the asynchronous delta modulator receives a signal from the low noise amplifier that is continuous in time and amplitude, and samples the signal based on the at least one threshold;

an event is generated at a level crossing between the signal and the at least one threshold value.

3. The audio feature extraction device of claim 2, wherein:

the events include a first event and a second event;

a first event is generated when the slope of the signal is positive and the signal rises to or above the at least one threshold;

a second event is generated when the slope of the signal is negative and the signal falls to or below the at least one threshold.
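
The event generation recited in claims 2 and 3 can be illustrated with a minimal sketch. It assumes that a densely sampled sequence stands in for the continuous-time output of the low noise amplifier, and that the modulator tracks the last level reached, so that a second event needs a fall of one full step (a hysteresis that is common in delta modulators but is not stated in the claims). The function name `level_crossing_events` and its parameters are hypothetical.

```python
import math

def level_crossing_events(samples, times, delta):
    """Return (time, polarity) events for crossings of the levels k * delta.

    polarity is +1 for an upward crossing (first event) and -1 for a
    downward crossing (second event). `samples` and `times` are an assumed
    dense stand-in for the continuous-time, continuous-amplitude signal.
    """
    events = []
    # Index of the last level reached; the initial value is rounded down.
    level = math.floor(samples[0] / delta)
    for t, x in zip(times[1:], samples[1:]):
        # Rising signal: one event per level reached or exceeded.
        while x >= (level + 1) * delta:
            level += 1
            events.append((t, +1))
        # Falling signal: one event per level reached or undershot.
        while x <= (level - 1) * delta:
            level -= 1
            events.append((t, -1))
    return events
```

Events appear only when the signal moves by at least one amplitude step, so a slowly varying or silent input yields few or no events, which is the sparse representation the device aims for.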

4. The audio feature extraction device of claim 3, wherein:

the timestamps of occurrence of the first event and/or the second event corresponding to each threshold are tracked and located using a high-frequency digital clock with a frequency greater than or equal to

where W is the audio bandwidth and N is the quantization precision of the maximum amplitude of the signal, the maximum amplitude being equal to N times the uniform amplitude.

5. The audio feature extraction apparatus according to any one of claims 1 to 4, comprising:

a plurality of parallel channels, each parallel channel comprising a digital bandpass filter having a different center frequency;

the digital bandpass filter in each parallel channel is coupled to the asynchronous delta modulator;

the output of the digital band pass filter is calculated using differential encoding.
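
As a rough illustration of the parallel channels in claim 5, the sketch below runs one ternary (+1/0/-1) event stream through several second-order band-pass sections with different center frequencies. The coefficient formulas follow the widely used audio-EQ cookbook band-pass form, which is an assumption here and not necessarily the filter structure of the invention; the names `bandpass_coeffs`, `run_channel`, and `filter_bank` are hypothetical.

```python
import math

def bandpass_coeffs(center_hz, q, sample_hz):
    """Normalized second-order band-pass coefficients (cookbook form, assumed)."""
    w0 = 2.0 * math.pi * center_hz / sample_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)            # feed-forward b0, b1, b2
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)  # feedback a1, a2
    return b, a

def run_channel(stream, b, a):
    """Filter one input stream with a single band-pass section."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in stream:
        # With inputs limited to +1, 0, -1 the products b[k] * x reduce to
        # adding, skipping, or subtracting a coefficient.
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def filter_bank(stream, centers_hz, q, sample_hz):
    """Run the same stream through one channel per center frequency."""
    return [run_channel(stream, *bandpass_coeffs(fc, q, sample_hz))
            for fc in centers_hz]
```

Each channel sees the same input and differs only in its coefficients, so the channels can be evaluated independently and in parallel.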

6. The audio feature extraction device of claim 5, wherein:

by differential encoding, the timestamp of the first event corresponding to each threshold is encoded as +1, and/or the timestamp of the second event corresponding to each threshold is encoded as -1, and all other timestamps are encoded as 0.
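
The encoding in claim 6 can be sketched as a mapping from timestamped events onto clock ticks. The sketch assumes events are given as (time in seconds, polarity) pairs, that all thresholds share one output stream, and that the clock is fast enough for at most one event per tick; `encode_events` and its parameters are hypothetical names.

```python
import math

def encode_events(events, clock_hz, num_ticks):
    """Map (time_s, polarity) events to a ternary stream, one symbol per tick.

    A first (upward) event becomes +1, a second (downward) event becomes -1,
    and every tick without an event stays 0.
    """
    stream = [0] * num_ticks
    for time_s, polarity in events:
        tick = math.floor(time_s * clock_hz)  # tick interval containing the event
        if not 0 <= tick < num_ticks:
            continue  # event lies outside the encoded window
        if stream[tick] != 0:
            # Assumed policy: treat two events in one tick as a clock that is too slow.
            raise ValueError("clock too slow: two events fall in one tick")
        stream[tick] = 1 if polarity > 0 else -1
    return stream
```

For example, `encode_events([(0.25, +1), (0.625, -1)], 8, 8)` places +1 at tick 2 and -1 at tick 5 and leaves the remaining ticks at 0.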

7. The audio feature extraction device of claim 5, comprising:

a shift register coupled between the asynchronous delta modulator and a plurality of parallel channels;

the shift register stores the sampling result of the asynchronous delta modulator according to a threshold value;

the digital bandpass filter performs differential encoding on the values in the shift register to obtain a differentially encoded event stream corresponding to each threshold.

8. The audio feature extraction device of claim 7, wherein:

the asynchronous delta modulator resets one bit in a multi-bit shift register based on a level crossing of the signal, in either the positive or the negative direction, with at least one threshold value.

9. An audio feature extraction method, characterized in that:

an asynchronous delta modulator is coupled to the low noise amplifier, receives a signal from the low noise amplifier that is continuous in time and amplitude, performs sampling based on a non-uniform time step and a uniform amplitude, and digitizes in an asynchronous manner.

10. The audio feature extraction method according to claim 9, characterized in that:

obtaining at least one threshold value based on the uniform amplitude, the asynchronous delta modulator sampling the signal based on the at least one threshold value;

the at least one threshold is n times the uniform amplitude, where n is a non-negative integer;

the signal generates an event at a crossing with the at least one threshold.

11. The audio feature extraction method according to claim 10, characterized in that:

the events include a first event and a second event;

generating a first event when the signal slope is positive and the signal rises to or above the at least one threshold;

when the signal slope is negative and the signal falls to or below the at least one threshold, a second event is generated.

12. The audio feature extraction method according to any one of claims 9 to 11, characterized by comprising:

a digital bandpass filter is coupled to the asynchronous delta modulator.

13. A chip, characterized in that:

the chip comprises the audio feature extraction device of any one of claims 1 to 8 and a processor;

the processor is coupled to the audio feature extraction device and is used to perform classification tasks.

14. The chip of claim 13, wherein:

the processor is a decision tree processor or a neural network processor, and/or the chip is a brain-like chip.

15. An electronic device, characterized in that:

the electronic device comprises the chip of any one of claims 13 to 14.