WO2023226193A1 - Audio processing method and apparatus, and non-transitory computer-readable storage medium - Google Patents


Info

Publication number
WO2023226193A1
WO2023226193A1 · PCT/CN2022/110275 · CN2022110275W
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
audio
time
processing method
encoding
Prior art date
Application number
PCT/CN2022/110275
Other languages
French (fr)
Chinese (zh)
Inventor
林功艺
Original Assignee
神盾股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 神盾股份有限公司 filed Critical 神盾股份有限公司
Priority to PCT/CN2022/117526 priority Critical patent/WO2023226234A1/en
Publication of WO2023226193A1 publication Critical patent/WO2023226193A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178 Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K11/1781 Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase, characterised by the analysis of input or output signals, e.g. frequency range, modes, transfer functions
    • G10K11/1785 Methods, e.g. algorithms; Devices

Definitions

  • Embodiments of the present disclosure relate to an audio processing method, an audio processing device, and a non-transitory computer-readable storage medium.
  • noise reduction methods mainly include active noise reduction and passive noise reduction.
  • Active noise reduction uses a noise reduction system to generate an inverted signal of equal magnitude and opposite phase to the external noise, neutralizing the noise and thereby achieving the noise reduction effect.
  • Passive noise reduction mainly achieves the noise reduction effect by forming a closed space around the object or using sound insulation materials to block external noise.
  • Active noise reduction usually uses lagging inverted audio that destructively superimposes onto the originally received audio (for example, noise) to achieve the effect of audio suppression.
  • An active noise reduction process is as follows: first, the audio Vn generated by the sound source is received through the microphone and sent to the processor; then, the processor performs inversion processing on the audio Vn to generate the inverted audio Vn' and outputs it to the speaker, which emits the inverted audio Vn'.
  • The human ear can receive both the inverted audio Vn' and the audio Vn, and the two can be destructively superimposed to achieve the effect of suppressing the audio.
  • However, the time at which the speaker outputs the inverted audio Vn' necessarily lags behind the time at which the microphone originally received the audio Vn. Therefore, the time at which the human ear receives the inverted audio Vn' also lags behind the time at which it receives the audio Vn, so the silencing effect is poor and may even be impossible to achieve.
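The lag problem described above can be illustrated numerically. The sketch below (an illustration, not part of the patent; sample rate, tone frequency, and the 1 ms delay are assumed values) superimposes a tone with its inverted copy, once perfectly aligned and once lagging:

```python
import numpy as np

fs = 16000                                # assumed sample rate
t = np.arange(0, 0.05, 1 / fs)
noise = np.sin(2 * np.pi * 200 * t)       # assumed 200 Hz noise tone

# A perfectly aligned inverted signal cancels the noise completely.
aligned = noise + (-noise)

# An inverted signal lagging by 1 ms (the feed-through delay of a
# conventional backward system) leaves a large residual instead of silence.
lag = int(0.001 * fs)                     # 16 samples
delayed_inv = np.concatenate([np.zeros(lag), -noise[:-lag]])
residual = noise + delayed_inv

print(np.max(np.abs(aligned)))            # 0.0
print(np.max(np.abs(residual)) > 0.1)     # True
```

At 200 Hz a 1 ms lag is a 72° phase error, so the "cancelling" signal actually leaves a residual larger than a tenth of the original amplitude.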
  • At least one embodiment of the present disclosure provides an audio processing method, which includes: generating a control instruction based on a first audio signal; generating a second audio signal based on the control instruction; and outputting the second audio signal to suppress a third audio signal, wherein the sum of the phase of the second audio signal and the phase of the third audio signal is less than a phase threshold, and the first audio signal appears earlier than the third audio signal.
  • the outputting the second audio signal to suppress the third audio signal includes: determining, based on the control instruction, a first moment at which to output the second audio signal; and outputting the second audio signal at the first moment, wherein the third audio signal starts to appear from a second moment, and the absolute value of the time difference between the first moment and the second moment is less than a time threshold.
  • the time difference between the first moment and the second moment is 0.
  • generating a control instruction based on a first audio signal includes: acquiring the first audio signal; processing the first audio signal to predict a fourth audio signal; based on the fourth audio signal, the control instruction is generated.
  • the second audio signal and/or the third audio signal and/or the fourth audio signal are periodic or intermittent time-domain signals.
  • processing the first audio signal to predict a fourth audio signal includes: generating a first audio feature code based on the first audio signal; querying a lookup table based on the first audio feature code to obtain a second audio feature code; and predicting the fourth audio signal based on the second audio feature code.
  • the lookup table includes at least one first encoding field.
  • the lookup table further includes at least one second encoding field, and multiple first encoding fields constitute one second encoding field.
  • the second audio feature encoding includes at least one first encoding field and/or at least one second encoding field.
  • obtaining the first audio signal includes: collecting an initial audio signal; and performing downsampling processing on the initial audio signal to obtain the first audio signal.
  • obtaining the first audio signal includes: collecting an initial audio signal; and filtering the initial audio signal to obtain the first audio signal.
  • the phase of the second audio signal is opposite to the phase of the third audio signal.
  • At least one embodiment of the present disclosure also provides an audio processing device, including: an instruction generation module configured to generate a control instruction based on a first audio signal; an audio generation module configured to generate a second audio signal based on the control instruction; and an output module configured to output the second audio signal to suppress a third audio signal; wherein the sum of the phase of the second audio signal and the phase of the third audio signal is less than a phase threshold, and the first audio signal appears earlier than the third audio signal.
  • the output module includes a time determination sub-module and an output sub-module; the time determination sub-module is configured to determine, based on the control instruction, a first moment at which to output the second audio signal; the output sub-module is configured to output the second audio signal at the first moment, wherein the third audio signal begins to appear from a second moment, and the absolute value of the time difference between the first moment and the second moment is less than a time threshold.
  • the time difference between the first time and the second time is 0.
  • the instruction generation module includes an audio acquisition sub-module, a prediction sub-module and a generation sub-module, and the audio acquisition sub-module is configured to acquire the first audio signal; the prediction sub-module is configured to process the first audio signal to predict a fourth audio signal; the generation sub-module is configured to generate the control instruction based on the fourth audio signal.
  • the second audio signal and/or the third audio signal and/or the fourth audio signal are periodic or intermittent time-domain signals.
  • the prediction sub-module includes a query unit and a prediction unit; the query unit is configured to generate a first audio feature code based on the first audio signal and to query a lookup table based on the first audio feature code to obtain a second audio feature code; the prediction unit is configured to predict the fourth audio signal based on the second audio feature code.
  • the lookup table includes at least one first encoding field.
  • the lookup table further includes at least one second encoding field, and multiple first encoding fields constitute one second encoding field.
  • the second audio feature encoding includes at least one first encoding field and/or at least one second encoding field.
  • the audio acquisition sub-module includes a collection unit and a downsampling processing unit; the collection unit is configured to collect an initial audio signal; the downsampling processing unit is configured to perform downsampling processing on the initial audio signal to obtain the first audio signal.
  • the audio acquisition sub-module includes an acquisition unit and a filtering unit; the acquisition unit is configured to acquire an initial audio signal; the filtering unit is configured to filter the initial audio signal to obtain the first audio signal.
  • the phase of the second audio signal is opposite to the phase of the third audio signal.
  • At least one embodiment of the present disclosure also provides an audio processing device, including: one or more memories non-transiently storing computer-executable instructions; and one or more processors configured to run the computer-executable instructions, wherein the computer-executable instructions, when run by the one or more processors, implement the audio processing method according to any embodiment of the present disclosure.
  • At least one embodiment of the present disclosure also provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions, when executed by a processor, implement the audio processing method according to any embodiment of the present disclosure.
  • In the audio processing method provided by embodiments of the present disclosure, a future inverted audio signal (i.e., the second audio signal) is generated by learning the characteristics of the current audio signal (i.e., the first audio signal) to suppress the future audio signal (i.e., the third audio signal). This avoids the inverted audio signal falling out of synchronization with the audio signal to be suppressed due to the delay between the input end and the output end, and can significantly reduce or even eliminate the impact of input-to-output delay on noise cancellation; the audio suppression effect is better than that of the backward (lagging) active noise cancellation systems commonly used in the industry.
  • Figure 1 is a schematic block diagram of an audio processing system provided by at least one embodiment of the present disclosure
  • Figure 2A is a schematic flow chart of an audio processing method provided by at least one embodiment of the present disclosure
  • Figure 2B is a schematic flow chart of step S10 shown in Figure 2A;
  • Figure 2C is a schematic flow chart of step S102 shown in Figure 2B;
  • Figure 3 is a schematic diagram of a first audio signal and a third audio signal provided by at least one embodiment of the present disclosure
  • Figure 4 is a schematic diagram of a third audio signal and a fourth audio signal provided by at least one embodiment of the present disclosure
  • Figure 5A is a schematic diagram of an audio signal provided by some embodiments of the present disclosure.
  • Figure 5B is an enlarged schematic diagram of the audio signal in the dotted rectangular frame P1 in Figure 5A;
  • Figure 6 is a schematic block diagram of an audio processing device provided by at least one embodiment of the present disclosure.
  • Figure 7 is a schematic block diagram of another audio processing device provided by at least one embodiment of the present disclosure.
  • Figure 8 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure.
  • At least one embodiment of the present disclosure provides an audio processing method.
  • the audio processing method includes: generating a control instruction based on the first audio signal; generating a second audio signal based on the control instruction; and outputting the second audio signal to suppress the third audio signal.
  • the sum of the phase of the second audio signal and the phase of the third audio signal is less than the phase threshold, and the first audio signal appears earlier than the third audio signal.
  • By generating a future inverted audio signal (i.e., the second audio signal) to suppress the future audio signal (i.e., the third audio signal), the impact of input-to-output delay on noise reduction is reduced, and the noise reduction effect is better than that of the backward active noise reduction systems commonly used in the industry.
  • Embodiments of the present disclosure also provide an audio processing device and a non-transitory computer-readable storage medium.
  • the audio processing method can be applied to the audio processing device provided by the embodiment of the present disclosure, and the audio processing device can be configured on an electronic device.
  • the electronic device may be a personal computer, a mobile terminal, a car headrest, etc.
  • the mobile terminal may be a mobile phone, a headset, a tablet computer or other hardware devices.
  • Figure 1 is a schematic block diagram of an audio processing system provided by at least one embodiment of the present disclosure.
  • Figure 2A is a schematic flow chart of an audio processing method provided by at least one embodiment of the present disclosure.
  • Figure 2B is a schematic flow chart of step S10 shown in Figure 2A.
  • Figure 2C is a schematic flow chart of step S102 shown in Figure 2B.
  • Figure 3 is a schematic diagram of a first audio signal and a third audio signal provided by at least one embodiment of the present disclosure.
  • the audio processing system shown in Figure 1 can be used to implement the audio processing method provided by any embodiment of the present disclosure, for example, the audio processing method shown in Figure 2A.
  • the audio processing system may include an audio receiving part, an audio processing part and an audio output part.
  • the audio receiving part can receive the audio signal Sn1 emitted by the sound source at time tt1, and then transmit the audio signal Sn1 to the audio processing part.
  • the audio processing part processes the audio signal Sn1 to predict the future inverted audio signal Sn2; the future inverted audio signal Sn2 is then output through the audio output part.
  • the future inverted audio signal Sn2 may be used to suppress the future audio signal Sn3 generated by the sound source at time tt2 later than time tt1.
  • the target object (e.g., a human ear) can receive the future inverted audio signal Sn2 and the future audio signal Sn3 at the same time, so that the future inverted audio signal Sn2 and the future audio signal Sn3 can be destructively superimposed, thereby achieving noise elimination.
  • the audio receiving part may include a microphone, an amplifier (for example, a microphone amplifier), an analog-to-digital converter (ADC), a downsampler, etc.
  • the audio processing part may include an AI engine and/or a digital signal processor (DSP), etc.
  • the audio output part can include an upsampler, a digital-to-analog converter (DAC), an amplifier (for example, a speaker amplifier), a speaker, etc.
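As a sketch of the receive path just described, the snippet below decimates a captured signal as the downsampler might. The 4× factor and the moving-average anti-alias filter are illustrative assumptions, not details from the patent (a real system would use a proper FIR/IIR low-pass):

```python
import numpy as np

def downsample(x, factor):
    # Crude anti-alias low-pass (moving average) followed by decimation.
    kernel = np.ones(factor) / factor
    filtered = np.convolve(x, kernel, mode="same")
    return filtered[::factor]

fs = 48000                               # assumed capture rate
t = np.arange(0, 0.01, 1 / fs)           # 480 samples
x = np.sin(2 * np.pi * 300 * t)
y = downsample(x, 4)                     # 48 kHz -> 12 kHz for processing
print(len(y) == len(x) // 4)             # True
```

Reducing the rate this way shrinks the amount of data the AI engine / DSP must process per unit time, which matters when the prediction must finish before the third audio signal appears.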
  • an audio processing method includes steps S10 to S12.
  • in step S10, a control instruction is generated based on the first audio signal;
  • in step S11, a second audio signal is generated based on the control instruction;
  • in step S12, the second audio signal is output to suppress the third audio signal.
  • the first audio signal may be the audio signal Sn1 shown in FIG. 1
  • the second audio signal may be the inverted audio signal Sn2 shown in FIG. 1
  • the third audio signal may be the future audio signal Sn3 shown in FIG. 1 .
  • the audio receiving part can receive the first audio signal; the audio processing part can process the first audio signal to generate a control instruction and generate a second audio signal based on the control instruction; the audio output part can output the second audio signal, thereby suppressing the third audio signal.
  • the first audio signal appears earlier than the third audio signal.
  • the time when the first audio signal starts to appear is t11
  • the time when the third audio signal starts to appear is t21.
  • time t11 is earlier than time t21.
  • the time period during which the first audio signal exists may be the time period between time t11 and time t12
  • the time period during which the third audio signal exists may be the time period between time t21 and time t22.
  • time t12 and time t21 may not be the same time, and time t12 is earlier than time t21.
  • the time period in which the audio signal exists or the time in which it appears means the time period in which the audio corresponding to the audio signal exists or the time in which it appears.
  • the sum of the phase of the second audio signal and the phase of the third audio signal is less than the phase threshold.
  • the phase threshold can be set according to the actual situation, and this disclosure does not specifically limit this.
  • the phase of the second audio signal is opposite to the phase of the third audio signal, so that complete silence can be achieved, that is, the third audio signal is completely suppressed.
  • in this case, the error energy of the audio signal received by an audio collection device is 0; if the second audio signal and the third audio signal are received by the human ear, this is equivalent to the person not hearing the sound.
  • the first audio signal may be the time-domain audio signal with the maximum volume (maximum amplitude) between time t11 and time t12, and the first audio signal is not an audio signal of a specific frequency; therefore, the audio processing method provided in the embodiments of the present disclosure does not need to extract spectral features from the audio signal to generate a spectrogram, which can simplify the audio signal processing process and save processing time.
  • the first audio signal and the third audio signal may be audio signals generated by the external environment, machines, etc., such as the sound of machine operation, or the sound of electric drills and electric saws during decoration.
  • machines may include household appliances (air conditioners, range hoods, washing machines, etc.) and the like.
  • step S10 may include steps S101 to S103.
  • in step S101, a first audio signal is obtained; in step S102, the first audio signal is processed to predict a fourth audio signal; in step S103, a control instruction is generated based on the fourth audio signal.
  • an audio signal that has not yet been generated (i.e., the fourth audio signal) is predicted by learning the characteristics of the current audio signal (i.e., the first audio signal).
  • the fourth audio signal is a predicted future audio signal.
  • the time period in which the fourth audio signal exists is later than the time period in which the first audio signal exists; for example, the time period in which the fourth audio signal exists is the same as the time period in which the third audio signal exists, so the time period in which the fourth audio signal exists may also be the time period between time t21 and time t22 shown in Figure 3.
  • Figure 4 is a schematic diagram of a third audio signal and a fourth audio signal provided by at least one embodiment of the present disclosure.
  • the horizontal axis represents time (Time)
  • the vertical axis represents amplitude (Amplitude)
  • the amplitude can be expressed as a voltage value.
  • the predicted fourth audio signal is substantially the same as the third audio signal.
  • the third audio signal and the fourth audio signal may be exactly the same.
  • the phase of the second audio signal finally generated based on the fourth audio signal is opposite to the phase of the third audio signal, thereby achieving complete silencing.
  • processing the first audio signal to predict the fourth audio signal may include processing the first audio signal through a neural network to predict the fourth audio signal.
  • neural networks may include recurrent neural networks, long short-term memory networks, or generative adversarial networks.
  • the characteristics of the audio signal can be learned based on artificial intelligence, thereby predicting the audio signal of a future time period that has not yet occurred, and thereby generating an inverted audio signal for that future time period to suppress the audio signal of that time period.
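The disclosure names recurrent networks, LSTMs, and GANs as possible predictors; a trained network is beyond a short sketch, so the toy below substitutes a plain autocorrelation period estimate to "predict" the future of a periodic signal. Everything here (sample rate, tone, `min_lag`) is an illustrative assumption, not the patent's method:

```python
import numpy as np

def predict_future(x, horizon, min_lag=8):
    # Toy stand-in for the learned predictor: estimate the dominant period
    # by autocorrelation over lags >= min_lag, then extend the signal by
    # repeating its last period.
    ac = np.correlate(x, x, mode="full")[len(x):]   # positive lags 1..N-1
    period = int(np.argmax(ac[min_lag - 1:]) + min_lag)
    reps = -(-horizon // period)                    # ceil division
    return np.tile(x[-period:], reps)[:horizon]

fs = 8000
t = np.arange(0, 0.02, 1 / fs)                      # 160 samples
x = np.sin(2 * np.pi * 500 * t)                     # period = 16 samples
future = predict_future(x, 32)                      # next 32 samples
actual = np.sin(2 * np.pi * 500 * (np.arange(160, 192) / fs))
print(np.allclose(future, actual, atol=1e-6))       # True
```

For a strictly periodic signal the repetition is exact; the patent's point is that any sufficiently regular (periodic or intermittent) signal can be forecast far enough ahead to absorb the input-to-output delay.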
  • step S102 may include steps S1021 to S1023.
  • in step S1021, a first audio feature code is generated based on the first audio signal; in step S1022, a lookup table is queried based on the first audio feature code to obtain a second audio feature code; in step S1023, the fourth audio signal is predicted based on the second audio feature code.
  • the first audio signal may be an analog signal, and the first audio signal may be processed through an analog-to-digital converter to obtain a processed first audio signal.
  • the processed first audio signal may be a digital signal, and the first audio feature code may be generated based on the processed first audio signal.
  • the first audio signal may be a digital signal, such as a PDM (pulse density modulation) signal.
  • the first audio feature code may be generated directly based on the first audio signal.
  • PDM signals can be represented by binary numbers 0 and 1.
  • any suitable encoding method may be used to implement the first audio feature encoding.
  • the changing state of the audio signal can be used to describe the audio signal, and multiple bits can be used to represent the changing state of the audio signal.
  • for example, two bits (2 bits) can be used to represent the changing state of the audio signal:
  • 00 means that the audio signal becomes larger;
  • 01 means that the audio signal becomes smaller;
  • 10 means that there is no audio signal;
  • 11 means that the audio signal remains unchanged.
  • "The audio signal becomes larger" means that the amplitude of the audio signal in the unit time period (each time step) becomes larger with time; "the audio signal becomes smaller" means that the amplitude of the audio signal in the unit time period becomes smaller with time; "the audio signal remains unchanged" means that the amplitude of the audio signal in the unit time period does not change with time; and "no audio signal" means that there is no audio signal in the unit time period, that is, the amplitude of the audio signal is 0.
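A minimal sketch of this 2-bit state coding, treating each list element as one unit time period's amplitude (the zero-amplitude tie-breaking rule is an assumption made for the example):

```python
# 2-bit changing-state codes, as described above:
# 00 = signal becomes larger, 01 = becomes smaller,
# 10 = no audio signal, 11 = unchanged.
def encode_states(amplitudes, eps=1e-9):
    codes = []
    for prev, cur in zip(amplitudes, amplitudes[1:]):
        if abs(cur) < eps and abs(prev) < eps:
            codes.append("10")   # no audio signal in this unit time period
        elif cur > prev:
            codes.append("00")   # audio signal becomes larger
        elif cur < prev:
            codes.append("01")   # audio signal becomes smaller
        else:
            codes.append("11")   # audio signal remains unchanged
    return codes

print(encode_states([0.1, 0.1, 0.2, 0.3, 0.2, 0.0, 0.0]))
# ['11', '00', '00', '01', '01', '10']
```

Note that the coding discards absolute amplitude and keeps only the trend per time step, which is what makes the lookup-table matching below compact.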
  • Figure 5A is a schematic diagram of an audio signal provided by some embodiments of the present disclosure.
  • Figure 5B is an enlarged schematic diagram of the audio signal in the dotted rectangular box P1 in Figure 5A.
  • the abscissa is time (in milliseconds), and the ordinate is the amplitude of the audio signal (in volts).
  • the audio signal V is a periodically changing signal, and the periodic pattern of the audio signal V is the pattern shown by the dotted rectangular frame P2.
  • the amplitude of the audio signal represented by waveform segment 30 does not change with time t, and the time corresponding to waveform segment 30 is one unit time period, so waveform segment 30 can be expressed as the audio feature code (11); similarly, the amplitude of the audio signal represented by waveform segment 31 gradually increases with time t, and the time corresponding to waveform segment 31 is four unit time periods, so waveform segment 31 can be expressed as the audio feature code (00,00,00,00); the amplitude of the audio signal represented by waveform segment 32 remains unchanged with time t over one unit time period, so waveform segment 32 can be expressed as the audio feature code (11); the amplitude of the audio signal represented by waveform segment 33 gradually becomes smaller with time t over five unit time periods, so waveform segment 33 can be expressed as the audio feature code (01,01,01,01,01); and so on for the remaining waveform segments.
  • the audio feature coding corresponding to the audio signal shown in Figure 5B can be expressed as {11,00,00,00,00,11,01,01,01,01,01,11,00,00,00,00,00,00,00,00,01,01,01,01,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,00,00,00,...}.
  • a lookup table includes at least one first code field.
  • the lookup table further includes at least one second encoding field, and multiple first encoding fields constitute a second encoding field, so that dimensionally reduced high-order features can be formed from combinations of low-level features.
  • the coding method of the encoding fields in the lookup table (codewords; for example, a codeword may include a first encoding field and a second encoding field) may be the same as the coding method of the above-mentioned first audio feature code.
  • for example, when two bits are used to represent the changing state of the audio signal to implement feature coding, the first encoding field may be one of 00, 01, 10, and 11, and 00, 01, 10, and 11 can be combined to form the second encoding field.
  • a second encoding field may be represented as ⁇ 00,00,00,01,01,01,11,11,01,... ⁇ , which is composed of a combination of 00, 01 and 11.
  • when the lookup table includes a plurality of second encoding fields, the number of first encoding fields included in each of the plurality of second encoding fields may be different.
  • there can be more types of the first encoding field; for example, when 3 bits are used to represent the changing state of the audio signal, there can be up to eight types of first encoding fields, which may be some or all of 000, 001, 010, 011, 100, 101, 110, and 111.
  • one or more second encoding fields can also be combined to obtain a third encoding field, or one or more second encoding fields can be combined with one or more first encoding fields to obtain a third encoding field; similarly, one or more third encoding fields can be combined, or combined with first encoding fields and/or second encoding fields, to obtain a higher-order encoding field.
  • low-order feature codes can be combined to obtain high-order feature codes, thereby achieving more efficient and longer predictions.
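As a sketch of this composition of low-order codes into higher-order fields (the field contents and names below are illustrative assumptions, not values from the disclosure):

```python
# First encoding fields: the four 2-bit changing-state codes.
first_fields = ["00", "01", "10", "11"]

# Second encoding fields: ordered combinations of first encoding fields.
W2 = ("11", "01", "00", "00", "01", "01")
W3 = ("11", "00", "01", "00", "00", "01")

# A third encoding field combines second encoding fields, optionally with
# first encoding fields, giving a dimensionally reduced long pattern.
third_field = W2 + W3 + ("11",)

print(len(third_field))                             # 13
print(all(f in first_fields for f in third_field))  # True
```

Storing one higher-order field is cheaper than storing its expansion repeatedly, and matching it predicts a longer stretch of future signal in one step.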
  • the second audio feature encoding includes at least one first encoding field and/or at least one second encoding field.
  • the second audio feature encoding may include one or more complete second encoding fields, or the second audio feature encoding may include part of the first encoding field in one second encoding field.
  • the second audio feature encoding may include at least one first encoding field and/or at least one second encoding field and/or at least one third encoding field.
  • W2 ⁇ 11,01,00,00,01, 01,01,01,01,01,01,01. ⁇
  • W3 ⁇ 11,00,01,00,00,01,01,01,11,00,00,00,01,01,01,01,01,01,01,01,01. ⁇ .
  • the audio collection device continues to collect the first audio signal.
  • for example, if the first feature encoding field corresponding to the first audio signal collected by the audio collection device is expressed as {11}, corresponding to waveform segment 30, a query is performed on the lookup table to determine whether any encoding field in the lookup table (including the first encoding fields and the second encoding fields) includes {11}.
  • the query finds that the second encoding field W1, the second encoding field W2, and the second encoding field W3 in the lookup table all include {11}; therefore, the second encoding field W1, the second encoding field W2, and the second encoding field W3 are all used as encoding fields to be output.
  • when the subsequently collected feature encoding field {00} no longer matches the second encoding field W2 (whose second field is {01}), the second encoding field W2 can be deleted from the list of encoding fields to be output, and the second encoding field W1 and the second encoding field W3 remain as the encoding fields to be output in the list.
  • the third feature encoding field corresponding to the first audio signal collected by the audio collection device is represented as {00}, corresponding to the second unit time period in waveform segment 31; the lookup table continues to be queried to determine whether any encoding field in the lookup table includes {11,00,00}.
  • the second encoding field W1 in the lookup table is found to include {11,00,00}; it can then be predicted that the subsequent audio signal should follow the pattern of the second encoding field W1.
  • the fourth field in the second encoding field W1 (i.e., {00}) can then be output as the predicted second audio feature encoding.
  • the second audio feature encoding is expressed as {00,00,11,01,01,01,01,01,01,01,11,00,00,00,00,00,00,01,01,01,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,…}.
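The matching procedure walked through above can be sketched as follows. This is an illustrative reading of the disclosure, not its implementation: the table contents, field values, and function name are assumptions chosen to mirror the {11} → {11,00} → {11,00,00} narrowing in the example.

```python
# Hypothetical lookup table: each entry is a sequence of 2-bit
# feature-code fields (values here are illustrative only).
LOOKUP_TABLE = {
    "W1": ["11", "00", "00", "00", "01", "01"],
    "W2": ["11", "01", "00", "00", "01", "01"],
    "W3": ["11", "00", "01", "00", "00", "01"],
}

def predict(observed_fields):
    """Return the remaining fields of the unique table entry whose
    prefix matches the fields observed so far, or None while the
    match is still ambiguous (zero or several candidates)."""
    candidates = {
        name: fields for name, fields in LOOKUP_TABLE.items()
        if fields[:len(observed_fields)] == observed_fields
    }
    if len(candidates) == 1:
        (fields,) = candidates.values()
        return fields[len(observed_fields):]  # predicted continuation
    return None  # keep collecting audio before predicting

# After {11} all three entries still match, so no prediction is made;
# after {11,00,00} only W1 remains and its continuation is emitted.
```

Under these assumed table contents, `predict(["11"])` returns `None` while `predict(["11", "00", "00"])` returns the rest of W1.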
  • the number of feature encoding fields that must be matched before the second audio feature encoding is determined can be adjusted according to the actual application scenario, design requirements, and other factors; for example, three fields are matched in the above example, while in practical applications the second audio feature encoding may be determined after 10, 20, 50, etc. feature encoding fields are matched.
  • the first audio feature encoding corresponding to the first audio signal includes three feature encoding fields and is represented as {11,00,00}.
  • the time period corresponding to the first audio signal is from time t31 to time t32.
  • the system actually needs to output the second audio signal at time t33, which is later than time t32.
  • the time period corresponding to the first two feature encoding fields {00,00} in the second audio feature encoding (that is, the period between time t32 and time t33) has already passed, so the audio feature encoding corresponding to the predicted fourth audio signal is actually expressed as {11,01,01,01,01,01,01,11,00,00,00,00,00,00,01,01,01,01,01,01,01,01,01,01,11,11,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,…}.
  • the audio feature encoding corresponding to the third audio signal is also expressed as {11,01,01,01,01,01,11,00,00,00,00,00,00,00,00,01,01,01,01,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,01,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,…}.
  • the second audio signal is a signal obtained by inverting the fourth audio signal; that is, the second audio signal can be the inverted version of the audio signal whose pattern is {11,01,01,01,01,01,11,00,00,00,00,00,00,00,00,01,01,01,01,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,…}.
  • the duration of the second audio signal, the duration of the third audio signal, and the duration of the fourth audio signal are substantially the same, e.g., identical.
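The timing compensation and inversion described in the bullets above can be sketched as follows. The per-field duration, the field values, and the sample amplitudes are assumed for illustration; the disclosure does not specify them.

```python
FIELD_DURATION = 0.01  # seconds per feature-code field (assumed value)

def compensate_delay(predicted_fields, processing_delay):
    """Drop the leading fields whose time slots elapsed while the
    system was producing the prediction (t32 to t33 in the example)."""
    elapsed = round(processing_delay / FIELD_DURATION)
    return predicted_fields[elapsed:]

def invert(samples):
    """The second audio signal is the sample-wise inversion of the
    predicted fourth audio signal."""
    return [-s for s in samples]

predicted = ["00", "00", "11", "01", "01"]
# Two field durations pass before output, so the first two predicted
# fields are discarded, leaving ['11', '01', '01'] to be rendered.
to_output = compensate_delay(predicted, 0.02)
# Inverting illustrative samples of the fourth audio signal:
cancel = invert([0.5, -0.25, 0.1])   # [-0.5, 0.25, -0.1]
```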
  • the leading feature coding field may be set for at least part of the first coding field and/or the second coding field in the lookup table.
  • a leading feature encoding field may be set for the second encoding field W1 {11,00,00}; when the leading feature encoding field is detected, the second encoding field W1 is output as the second audio feature encoding.
  • the first audio feature encoding corresponding to the first audio signal is {11,00,00}
  • the first audio feature encoding corresponding to the first audio signal matches the leading feature encoding field {11,00,00}, so that the second encoding field W1 can be output as the second audio feature encoding.
  • the leading feature encoding field {11,00,00,01,01} can be set for the second encoding field W1.
  • the second encoding field W1 and the remaining fields of the leading feature encoding field are output as the second audio feature encoding.
  • the first audio feature encoding corresponding to the first audio signal matches the first three fields {11,00,00} in the leading feature encoding field, so that the remaining fields {01,01} of the leading feature encoding field and the second encoding field W1 can be output as the second audio feature encoding.
  • the time corresponding to the first two feature encoding fields {01,01} in the second audio feature encoding (that is, the remaining fields of the leading feature encoding field) can serve as the time for the system to process the signal, so that the audio feature encoding corresponding to the predicted fourth audio signal may be the complete second encoding field W1.
  • the length of the leading feature encoding field can be adjusted according to actual conditions, and this disclosure does not limit this.
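The leading-field (preamble) mechanism above can be sketched as follows. The preamble contents and the announced field are illustrative assumptions: when the observed codes match the start of a preamble, the preamble's unmatched remainder (which covers the system's processing time) is emitted followed by the announced second encoding field.

```python
# Hypothetical preamble table: a preamble of feature-code fields maps
# to the second encoding field it announces (values illustrative).
PREAMBLES = {
    ("11", "00", "00", "01", "01"): ("11", "00", "00", "00", "01", "01"),
}

def predict_from_preamble(observed):
    """If the observed codes match the start of a preamble, emit the
    preamble's unmatched remainder followed by the announced field."""
    for preamble, field in PREAMBLES.items():
        if list(preamble[:len(observed)]) == observed:
            remainder = list(preamble[len(observed):])
            return remainder + list(field)
    return None  # no preamble matched

# Observing {11,00,00} matches the first three preamble fields, so the
# remaining {01,01} plus the announced field W1 are output.
```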
  • when the memory used to store the lookup table is large enough and the content stored in the lookup table is rich enough (that is, there are enough combinations of encoding fields in the lookup table), any type of audio signal that the user wishes to eliminate can be predicted.
  • when the samples used to train the neural network are sufficiently numerous and varied, any type of audio signal that the user wants to eliminate can be predicted by the neural network.
  • the lookup table may be stored in the memory in the form of a table, etc.
  • the embodiments of the present disclosure do not limit the specific form of the lookup table.
  • the prediction performed by a neural network can also be achieved by means of a lookup table.
  • the second audio signal and/or the third audio signal and/or the fourth audio signal are periodic or intermittent time domain signals
  • the second audio signal and/or the third audio signal and/or the fourth audio signal have the characteristics of continuous repetition or intermittent repetition, and have a fixed pattern.
  • for intermittent audio signals, since there is no audio signal during the pause period, there is no spectral feature to extract during the pause period, but the pause period itself can become one of the time-domain features of the intermittent audio signal.
  • step S101 may include: collecting an initial audio signal; performing downsampling on the initial audio signal to obtain a first audio signal.
  • when the sampling rate of the initial audio signal collected by the audio acquisition device is high, it is not convenient for the back-end audio signal processing device (for example, an artificial intelligence (AI) engine, a digital signal processor (DSP), etc.) to process; therefore, the initial audio signal can be down-sampled to reduce its rate, which facilitates processing by the audio signal processing device.
  • the sampling rate can be reduced to 48 kHz or even lower.
  • step S101 may include: collecting an initial audio signal; and filtering the initial audio signal to obtain a first audio signal.
  • filtering can also be performed through a bandwidth controller (Bandwidth controller) to suppress audio signals within a specific frequency range.
  • for continuous and intermittent audio signals (for example, knocking or dripping noise, etc.), the bandwidth controller sets the effective bandwidth of the first audio signal to the frequency range corresponding to the audio signal that needs to be suppressed, for example, 1 kHz to 6 kHz, thereby ensuring that users can still hear more important sounds; for example, when used in the automotive field, it must be ensured that the driver can hear the horn, etc., to improve driving safety.
  • obtaining the first audio signal may include: collecting an initial audio signal; filtering the initial audio signal to obtain an audio signal within a predetermined frequency range; and performing downsampling processing on the audio signal within the predetermined frequency range to obtain the first audio signal. Alternatively, obtaining the first audio signal may include: collecting an initial audio signal; performing downsampling processing on the initial audio signal; and performing filtering processing on the downsampled audio signal to obtain the first audio signal.
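The two acquisition pipelines above (filter then downsample, or downsample then filter) can be sketched as follows. The smoothing filter standing in for the bandwidth controller and the decimation factor are placeholders, not the disclosed design; a real implementation would use a proper band-pass filter for the 1 kHz to 6 kHz range mentioned above.

```python
def band_limit(samples):
    # Placeholder two-tap smoothing filter standing in for the
    # bandwidth controller (illustrative, not the disclosed filter).
    return [(a + b) / 2 for a, b in zip(samples, samples[1:])]

def downsample(samples, factor):
    # Keep every `factor`-th sample to reduce the rate (e.g. toward
    # 48 kHz or lower, as described above).
    return samples[::factor]

initial = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]

# Pipeline A: filter first, then downsample.
first_audio_a = downsample(band_limit(initial), 2)

# Pipeline B: downsample first, then filter.
first_audio_b = band_limit(downsample(initial, 2))
```

The two orderings generally yield different results, which is why the disclosure lists them as alternatives rather than equivalents.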
  • control instruction may include the time at which the second audio signal is output, the fourth audio signal, a control signal instructing to invert the fourth audio signal, and the like.
  • step S11 may include: based on the control instruction, determining a fourth audio signal and a control signal indicating inversion of the fourth audio signal; and, based on the control signal, performing inversion processing on the fourth audio signal to generate the second audio signal.
  • step S12 may include: determining a first moment to output the second audio signal based on the control instruction; and outputting the second audio signal at the first moment.
  • the third audio signal starts to appear from the second moment, and the absolute value of the time difference between the first moment and the second moment is less than the time threshold.
  • the time threshold can be specifically set according to the actual situation, and this disclosure does not limit this. The smaller the time threshold, the better the silencing effect.
  • the time difference between the first moment and the second moment is 0, that is, the moment when the second audio signal starts to be output and the moment when the third audio signal starts to appear are the same.
  • the time when the second audio signal starts to be output and the time when the third audio signal starts to appear are both time t21.
  • the time difference between the first moment and the second moment can be set according to the actual situation.
  • the first moment and the second moment can be set to ensure that the second audio signal and the third audio signal reach the target object at the same time, thereby preventing audio-signal transmission from causing the second audio signal and the third audio signal to be out of sync, further improving the noise-canceling effect.
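The alignment condition above can be expressed as a small timing calculation. The propagation delays below are assumed example values; the disclosure only requires that both signals reach the target object together.

```python
def first_moment(second_moment, noise_propagation, speaker_propagation):
    """Time at which to start outputting the second audio signal so
    that it reaches the target object at the same time as the third
    audio signal (all times in seconds)."""
    arrival = second_moment + noise_propagation  # when the noise arrives
    return arrival - speaker_propagation

# Assumed example: the third audio signal appears at t = 1.000 s and
# needs 5 ms to reach the ear; the speaker's output needs 2 ms, so the
# second audio signal should start 3 ms after t = 1.000 s.
t1 = first_moment(1.000, 0.005, 0.002)
```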
  • the target object can be a human ear, a microphone, etc.
  • the second audio signal can be output through a device such as a speaker that can convert an electrical signal into a sound signal for output.
  • when no audio signal is collected, the audio processing method provided by the present disclosure may not be executed until the audio collection device collects an audio signal, thereby saving power consumption.
  • the audio processing method can reduce or eliminate periodic audio signals (for example, noise) in environmental audio signals.
  • the sound of construction at a nearby construction site can be eliminated, etc.
  • this type of scenario does not require any special knowledge of the audio signals to be kept; it simply reduces the target sounds to be silenced that need to be eliminated from the environment.
  • these target sounds to be silenced usually have the characteristics of continuous repetition or intermittent repetition, so they can be obtained through prediction.
  • the "target sound to be silenced" can be determined according to the actual situation. For example, for an application scenario such as a library, when there is a construction site around the library, the external environment audio signal can include two audio signals.
  • the first audio signal can be the sound of drilling at the construction site.
  • the second audio signal can be the sound of discussions by people around you.
  • the sound of construction site drilling has periodic characteristics and usually has a fixed pattern.
  • the discussion sound most likely does not have a fixed pattern and does not have periodic characteristics.
  • the target sound to be silenced is the construction site drilling sound.
  • the audio processing method provided by embodiments of the present disclosure can be applied to automobile headrests to create a silent zone near the driver's ears, preventing unnecessary external audio signals (such as engine noise, road noise, wind noise, and tire noise while the car is driving) from interfering with the driver.
  • this audio processing method can also be applied to hair dryers, range hoods, vacuum cleaners, non-inverter air conditioners, and other equipment to reduce the operating sound they emit, allowing users to stay in noisy environments without being affected by the surrounding environmental noise.
  • This audio processing method can also be applied to headphones, etc., to reduce or eliminate external sounds, so that users can better receive the sounds from the headphones (music or phone calls, etc.).
  • FIG. 6 is a schematic block diagram of an audio processing device provided by at least one embodiment of the present disclosure.
  • the audio processing device 600 includes an instruction generation module 601 , an audio generation module 602 and an output module 603 .
  • the components and structures of the audio processing device 600 shown in FIG. 6 are only exemplary and not restrictive.
  • the audio processing device 600 may also include other components and structures as needed.
  • the instruction generation module 601 is configured to generate a control instruction based on the first audio signal.
  • the instruction generation module 601 is used to execute step S10 shown in Figure 2A.
  • the audio generation module 602 is configured to generate a second audio signal based on the control instruction.
  • the audio generation module 602 is used to perform step S11 shown in Figure 2A.
  • the output module 603 is configured to output the second audio signal to suppress the third audio signal.
  • the output module 603 is used to perform step S12 shown in Figure 2A.
  • for specific descriptions of the instruction generation module 601, the audio generation module 602, and the output module 603, reference may be made to steps S10, S11, and S12 shown in FIG. 2A in the embodiments of the audio processing method above.
  • the audio processing device can achieve similar or identical technical effects to the foregoing audio processing method, which will not be described again here.
  • the first audio signal appears earlier than the third audio signal.
  • the sum of the phases of the second audio signal and the third audio signal is less than the phase threshold.
  • the phase of the second audio signal is opposite to the phase of the third audio signal, so that the third audio signal can be completely suppressed.
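As stated above, the third audio signal is completely suppressed when the second audio signal is its phase inverse. A toy numeric check of this destructive superposition (illustrative values only, not part of the disclosure) is:

```python
import math

N = 64
# An illustrative periodic "third audio signal" (the noise).
third = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
# The "second audio signal": its exact phase inverse.
second = [-s for s in third]
# Superposition at the target object.
residual = [a + b for a, b in zip(second, third)]
peak = max(abs(r) for r in residual)  # 0 for a perfect inversion
```

In practice the residual is nonzero, which is why the disclosure states the criterion as the phase sum being below a phase threshold rather than exactly zero.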
  • the instruction generation module 601 may include an audio acquisition sub-module, a prediction sub-module and a generation sub-module.
  • the audio acquisition sub-module is configured to acquire the first audio signal;
  • the prediction sub-module is configured to process the first audio signal to predict a fourth audio signal;
  • the generation sub-module is configured to generate a control instruction based on the fourth audio signal.
  • the second audio signal and/or the third audio signal and/or the fourth audio signal are periodic or intermittent time domain signals.
  • the third audio signal and the fourth audio signal may be exactly the same.
  • the prediction sub-module may process the first audio signal based on a neural network to predict the fourth audio signal.
  • the prediction sub-module may include the AI engine and/or digital signal processor in the audio processing part shown in Figure 1.
  • the AI engine may include a neural network.
  • the AI engine may include at least one neural network among a recurrent neural network, a long short-term memory network, a generative adversarial network, and the like.
  • the prediction sub-module includes a query unit and a prediction unit.
  • the query unit is configured to generate a first audio feature code based on the first audio signal and query the lookup table based on the first audio feature code to obtain a second audio feature code.
  • the prediction unit is configured to predict the fourth audio signal based on the second audio feature encoding.
  • the query unit may include a memory for storing the lookup table.
  • the lookup table may include at least one first encoding field.
  • the lookup table further includes at least one second encoding field, and multiple first encoding fields constitute one second encoding field.
  • the second audio feature encoding includes at least one first encoding field and/or at least one second encoding field.
  • the audio acquisition sub-module includes an acquisition unit and a downsampling processing unit.
  • the acquisition unit is configured to collect the initial audio signal;
  • the down-sampling processing unit is configured to perform down-sampling processing on the initial audio signal to obtain the first audio signal.
  • the audio acquisition sub-module includes an acquisition unit and a filtering unit.
  • the acquisition unit is configured to acquire an initial audio signal; and the filtering unit is configured to filter the initial audio signal to obtain a first audio signal.
  • the audio acquisition sub-module can be implemented as the audio receiving part shown in Figure 1.
  • the collection unit may include an audio collection device, such as a microphone in the audio receiving part shown in FIG. 1 , or the like.
  • the acquisition unit may also include an amplifier, an analog-to-digital converter, etc.
  • the output module 603 may include a moment determination sub-module and an output sub-module.
  • the time determination sub-module is configured to determine a first time to output the second audio signal based on the control instruction; the output sub-module is configured to output the second audio signal at the first time.
  • the output module 603 may be implemented as the audio output part shown in FIG. 1 .
  • the third audio signal starts to appear from the second moment, and the absolute value of the time difference between the first moment and the second moment is less than the time threshold.
  • the time difference between the first time and the second time may be zero.
  • the output sub-module may include audio output devices such as speakers.
  • the output sub-module may also include a digital-to-analog converter, etc.
  • the instruction generation module 601, the audio generation module 602, and/or the output module 603 may be hardware, software, firmware, or any feasible combination thereof.
  • the instruction generation module 601, the audio generation module 602 and/or the output module 603 can be a dedicated or general-purpose circuit, chip or device, or a combination of a processor and a memory.
  • the embodiments of the present disclosure do not limit the specific implementation forms of each of the above modules, sub-modules and units.
  • FIG. 7 is a schematic block diagram of another audio processing device provided by at least one embodiment of the present disclosure.
  • the audio processing device 700 includes one or more memories 701 and one or more processors 702 .
  • One or more memories 701 are configured to store non-transitory computer-executable instructions; one or more processors 702 are configured to execute the computer-executable instructions.
  • the computer-executable instructions when executed by one or more processors 702, implement the audio processing method according to any of the above embodiments.
  • each step of the audio processing method please refer to the description of the above embodiments of the audio processing method, and will not be described again here.
  • the audio processing device 700 may further include a communication interface and a communication bus.
  • the memory 701, the processor 702, and the communication interface can communicate with each other through the communication bus; the memory 701, the processor 702, the communication interface, and other components can also communicate through a network connection. This disclosure does not limit the type and function of the network.
  • the communication bus may be a Peripheral Component Interconnect Standard (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the communication bus can be divided into address bus, data bus, control bus, etc.
  • the communication interface is used to implement communication between the audio processing device 700 and other devices.
  • the communication interface may be a Universal Serial Bus (USB) interface, etc.
  • the processor 702 and the memory 701 can be provided on the server side (or cloud).
  • processor 702 may control other components in audio processing device 700 to perform desired functions.
  • the processor 702 may be a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • the central processing unit (CPU) can be X86 or ARM architecture, etc.
  • memory 701 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), etc.
  • Non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), USB memory, flash memory, and the like.
  • One or more computer-executable instructions may be stored on the computer-readable storage medium, and the processor 702 may execute the computer-executable instructions to implement various functions of the audio processing device 700 .
  • Various applications and various data can also be stored in the storage medium.
  • the audio processing device 700 may be embodied in the form of a chip, a small device/device, or the like.
  • FIG. 8 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure.
  • one or more computer-executable instructions 1001 may be stored non-transitorily on a non-transitory computer-readable storage medium 1000.
  • one or more steps in the audio processing method described above may be performed when the computer-executable instructions 1001 are executed by a processor.
  • the non-transitory computer-readable storage medium 1000 can be applied in the above-mentioned audio processing device 700, and for example, it can include the memory 701 in the audio processing device 700.
  • for the description of the non-transitory computer-readable storage medium 1000, reference may be made to the description of the memory 701 in the embodiment of the audio processing device 700 shown in FIG. 7, and repeated descriptions will not be provided here.
  • At least one embodiment of the present disclosure provides an audio processing method, an audio processing device and a non-transitory computer-readable storage medium.
  • an audio signal that has not yet been generated (i.e., the fourth audio signal) can be predicted based on the first audio signal.
  • the inverted audio signal for a future audio signal is generated based on the prediction, so as to suppress that future audio signal; this avoids the problem of the inverted audio signal being out of sync with the audio signal to be suppressed due to the delay between the input end and the output end, and improves the noise reduction effect. The impact of the input-to-output delay on noise reduction can be significantly reduced or even eliminated, and the audio suppression effect is better than that of the reactive active noise reduction systems commonly used in the industry. Because the first audio signal is a time-domain signal rather than an audio signal of a specific frequency, the audio processing method provided by the embodiments of the present disclosure does not need to extract spectral features from the audio signal to generate a spectrogram, thereby simplifying the audio signal processing flow and saving processing time.
  • low-order feature codes can be combined to obtain high-order feature codes, thereby achieving more efficient and longer-range predictions. In this audio processing method, filtering can also be performed through the bandwidth controller to suppress audio signals within a specific frequency range, ensuring that users can still hear more important sounds; for example, when used in the automotive field, it must be ensured that the driver can hear the horn, etc., to improve driving safety. In addition, when no audio signal is collected, the audio processing method provided by the present disclosure may not be executed until an audio signal is collected, thereby saving power consumption.

Abstract

An audio processing method, an audio processing apparatus, and a non-transitory computer-readable storage medium. The audio processing method comprises: generating a control instruction on the basis of a first audio signal; generating a second audio signal on the basis of the control instruction; and outputting the second audio signal to suppress a third audio signal, wherein the sum of the phase of the second audio signal and the phase of the third audio signal is less than a phase threshold, and the time at which the first audio signal appears is earlier than the time at which the third audio signal appears.

Description

Audio processing method and device, non-transitory computer-readable storage medium
This application claims priority to U.S. Provisional Patent Application No. 63/344,642, filed on May 23, 2022, U.S. Provisional Patent Application No. 63/351,439, filed on June 13, 2022, and U.S. Provisional Patent Application No. 63/352,213, filed on June 14, 2022, the contents of which are incorporated herein by reference in their entirety as part of this application.
Technical Field
Embodiments of the present disclosure relate to an audio processing method, an audio processing device, and a non-transitory computer-readable storage medium.
Background
At present, noise reduction methods mainly include active noise reduction and passive noise reduction. Active noise reduction uses a noise reduction system to generate an inverted signal equal in magnitude to the external noise to neutralize it, thereby achieving the noise reduction effect. Passive noise reduction mainly blocks external noise by forming a closed space around the object or by using sound-insulating materials, thereby achieving the noise reduction effect.
Active noise reduction usually destructively superimposes a lagging inverted audio signal on the originally received audio (for example, noise) to achieve audio suppression. One active noise cancellation flow is as follows: first, the audio Vn produced by the sound source is received through a microphone and sent to a processor; then the processor inverts the audio Vn to generate inverted audio Vn' and outputs the inverted audio Vn' to a speaker, which emits it. The human ear receives both the inverted audio Vn' and the audio Vn, and the two can destructively superimpose to suppress the audio. In this active noise reduction scheme, because signal processing and signal transmission take time, the time at which the speaker outputs the inverted audio Vn' necessarily lags the time at which the microphone originally received the audio Vn; consequently, the time at which the human ear receives the inverted audio Vn' also lags the time at which it receives the audio Vn, so the noise-canceling effect is poor and cancellation may even be impossible. There is necessarily a delay from the input end (i.e., the microphone) to the output end (i.e., the speaker); the lower this input-to-output delay, the smaller the time difference between the human ear receiving the inverted audio Vn' and receiving the audio Vn, and the better the noise-canceling effect. Therefore, active noise reduction has extremely strict requirements on end-to-end delay, so the architecture of an active noise cancellation system must use high-speed analog-to-digital converters, high-speed computing hardware, and the like to achieve low latency and good audio suppression, resulting in high development costs and an inflexible architecture. Therefore, how to avoid the impact of end-to-end delay on active noise reduction and how to achieve a better audio suppression effect have become problems that need to be solved.
Summary
In view of the above problems, at least one embodiment of the present disclosure provides an audio processing method, including: generating a control instruction based on a first audio signal; generating a second audio signal based on the control instruction; and outputting the second audio signal to suppress a third audio signal, wherein the sum of the phase of the second audio signal and the phase of the third audio signal is less than a phase threshold, and the first audio signal appears earlier than the third audio signal.
For example, in the audio processing method provided by at least one embodiment of the present disclosure, outputting the second audio signal to suppress the third audio signal includes: determining, based on the control instruction, a first moment at which to output the second audio signal; and outputting the second audio signal at the first moment, wherein the third audio signal starts to appear from a second moment, and the absolute value of the time difference between the first moment and the second moment is less than a time threshold.
For example, in the audio processing method provided by at least one embodiment of the present disclosure, the time difference between the first moment and the second moment is 0.
For example, in the audio processing method provided by at least one embodiment of the present disclosure, generating a control instruction based on a first audio signal includes: acquiring the first audio signal; processing the first audio signal to predict a fourth audio signal; and generating the control instruction based on the fourth audio signal.
For example, in the audio processing method provided by at least one embodiment of the present disclosure, the second audio signal and/or the third audio signal and/or the fourth audio signal are periodic or intermittent time-domain signals.
For example, in the audio processing method provided by at least one embodiment of the present disclosure, processing the first audio signal to predict a fourth audio signal includes: generating a first audio feature encoding based on the first audio signal; querying a lookup table based on the first audio feature encoding to obtain a second audio feature encoding; and predicting the fourth audio signal based on the second audio feature encoding.
For example, in the audio processing method provided by at least one embodiment of the present disclosure, the lookup table includes at least one first encoding field.
For example, in the audio processing method provided by at least one embodiment of the present disclosure, the lookup table further includes at least one second encoding field, and a plurality of the first encoding fields constitute one second encoding field.
For example, in the audio processing method provided by at least one embodiment of the present disclosure, the second audio feature encoding includes at least one first encoding field and/or at least one second encoding field.
For example, in the audio processing method provided by at least one embodiment of the present disclosure, acquiring the first audio signal includes: collecting an initial audio signal; and performing downsampling processing on the initial audio signal to obtain the first audio signal.
例如,在本公开至少一个实施例提供的音频处理方法中,所述获取所述第一音频信号,包括:采集初始音频信号;对所述初始音频信号进行滤波处理以得到所述第一音频信号。For example, in the audio processing method provided by at least one embodiment of the present disclosure, obtaining the first audio signal includes: collecting an initial audio signal; filtering the initial audio signal to obtain the first audio signal .
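The two acquisition variants above (downsampling the collected initial audio signal, or filtering it) can be sketched together. This is a minimal illustration rather than the disclosed implementation: the moving-average filter, its window, and the decimation factor are all hypothetical choices, and a real system would apply a proper anti-alias filter before decimating.

```python
def moving_average_filter(signal, window=4):
    # Crude low-pass filter: each output sample is the mean of up to
    # `window` preceding input samples (hypothetical filter choice).
    out = []
    for i in range(len(signal)):
        lo = max(0, i - window + 1)
        out.append(sum(signal[lo:i + 1]) / (i + 1 - lo))
    return out

def downsample(signal, factor=4):
    # Keep every `factor`-th sample; the factor is illustrative only.
    return signal[::factor]

# A hypothetical "initial audio signal": 16 samples of a rising ramp.
initial = list(range(16))
first_audio = downsample(moving_average_filter(initial), factor=4)
print(first_audio)  # 4 filtered samples remain out of 16
```

Reducing the sample rate this way shrinks the amount of data the prediction stage has to process, which is presumably why the embodiments describe downsampling before prediction.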
For example, in the audio processing method provided by at least one embodiment of the present disclosure, the phase of the second audio signal is opposite to the phase of the third audio signal.
At least one embodiment of the present disclosure further provides an audio processing apparatus, including: an instruction generation module configured to generate a control instruction based on a first audio signal; an audio generation module configured to generate a second audio signal based on the control instruction; and an output module configured to output the second audio signal to suppress a third audio signal; wherein the sum of the phase of the second audio signal and the phase of the third audio signal is less than a phase threshold, and the first audio signal appears earlier than the third audio signal.
For example, in the audio processing apparatus provided by at least one embodiment of the present disclosure, the output module includes a moment-determination submodule and an output submodule; the moment-determination submodule is configured to determine, based on the control instruction, a first moment at which to output the second audio signal; the output submodule is configured to output the second audio signal at the first moment, wherein the third audio signal starts to appear at a second moment, and the absolute value of the time difference between the first moment and the second moment is less than a time threshold.
For example, in the audio processing apparatus provided by at least one embodiment of the present disclosure, the time difference between the first moment and the second moment is 0.
For example, in the audio processing apparatus provided by at least one embodiment of the present disclosure, the instruction generation module includes an audio acquisition submodule, a prediction submodule, and a generation submodule; the audio acquisition submodule is configured to acquire the first audio signal; the prediction submodule is configured to process the first audio signal to predict a fourth audio signal; the generation submodule is configured to generate the control instruction based on the fourth audio signal.
For example, in the audio processing apparatus provided by at least one embodiment of the present disclosure, the second audio signal and/or the third audio signal and/or the fourth audio signal are periodic or intermittent time-domain signals.
For example, in the audio processing apparatus provided by at least one embodiment of the present disclosure, the prediction submodule includes a query unit and a prediction unit; the query unit is configured to generate a first audio feature code based on the first audio signal and to query a lookup table based on the first audio feature code to obtain a second audio feature code; the prediction unit is configured to predict the fourth audio signal based on the second audio feature code.
For example, in the audio processing apparatus provided by at least one embodiment of the present disclosure, the lookup table includes at least one first coding field.
For example, in the audio processing apparatus provided by at least one embodiment of the present disclosure, the lookup table further includes at least one second coding field, and a plurality of the first coding fields constitute one second coding field.
For example, in the audio processing apparatus provided by at least one embodiment of the present disclosure, the second audio feature code includes at least one of the first coding fields and/or at least one of the second coding fields.
For example, in the audio processing apparatus provided by at least one embodiment of the present disclosure, the audio acquisition submodule includes a collection unit and a downsampling unit; the collection unit is configured to collect an initial audio signal; the downsampling unit is configured to downsample the initial audio signal to obtain the first audio signal.
For example, in the audio processing apparatus provided by at least one embodiment of the present disclosure, the audio acquisition submodule includes a collection unit and a filtering unit; the collection unit is configured to collect an initial audio signal; the filtering unit is configured to filter the initial audio signal to obtain the first audio signal.
For example, in the audio processing apparatus provided by at least one embodiment of the present disclosure, the phase of the second audio signal is opposite to the phase of the third audio signal.
At least one embodiment of the present disclosure further provides an audio processing apparatus, including: one or more memories non-transitorily storing computer-executable instructions; and one or more processors configured to run the computer-executable instructions, wherein the computer-executable instructions, when run by the one or more processors, implement the audio processing method according to any embodiment of the present disclosure.
At least one embodiment of the present disclosure further provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions that, when executed by a processor, implement the audio processing method according to any embodiment of the present disclosure.
According to the audio processing method, audio processing apparatus, and non-transitory computer-readable storage medium provided by any embodiment of the present disclosure, the characteristics of the current audio signal (i.e., the first audio signal) are learned to generate a future inverted audio signal (i.e., the second audio signal) that suppresses a future audio signal (i.e., the third audio signal). This avoids the inverted audio signal falling out of sync with the audio signal to be suppressed because of the delay between the input and the output, improves the noise-cancellation effect, can greatly reduce or even eliminate the influence of input-to-output delay on cancellation, and suppresses audio better than the reactive active noise-cancellation systems commonly used in the industry.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present disclosure and do not limit the present disclosure.
Figure 1 is a schematic block diagram of an audio processing system provided by at least one embodiment of the present disclosure;
Figure 2A is a schematic flowchart of an audio processing method provided by at least one embodiment of the present disclosure;
Figure 2B is a schematic flowchart of step S10 shown in Figure 2A;
Figure 2C is a schematic flowchart of step S102 shown in Figure 2B;
Figure 3 is a schematic diagram of a first audio signal and a third audio signal provided by at least one embodiment of the present disclosure;
Figure 4 is a schematic diagram of a third audio signal and a fourth audio signal provided by at least one embodiment of the present disclosure;
Figure 5A is a schematic diagram of an audio signal provided by some embodiments of the present disclosure;
Figure 5B is an enlarged schematic diagram of the audio signal in the dashed rectangular box P1 in Figure 5A;
Figure 6 is a schematic block diagram of an audio processing apparatus provided by at least one embodiment of the present disclosure;
Figure 7 is a schematic block diagram of another audio processing apparatus provided by at least one embodiment of the present disclosure; and
Figure 8 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the described embodiments, without creative effort, fall within the scope of protection of the present disclosure.
Unless otherwise defined, the technical or scientific terms used in the present disclosure have the meanings commonly understood by a person of ordinary skill in the art to which the present disclosure belongs. Words such as "first" and "second" do not denote any order, quantity, or importance, and are only used to distinguish different components. Words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connect" or "connected" are not limited to physical or mechanical connections, and may include electrical connections, whether direct or indirect.
To keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of some known functions and known components are omitted.
At least one embodiment of the present disclosure provides an audio processing method. The audio processing method includes: generating a control instruction based on a first audio signal; generating a second audio signal based on the control instruction; and outputting the second audio signal to suppress a third audio signal. The sum of the phase of the second audio signal and the phase of the third audio signal is less than a phase threshold, and the first audio signal appears earlier than the third audio signal.
In the audio processing method provided by the embodiments of the present disclosure, the characteristics of the current audio signal (i.e., the first audio signal) are learned to generate a future inverted audio signal (i.e., the second audio signal) that suppresses a future audio signal (i.e., the third audio signal). This avoids the inverted audio signal falling out of sync with the audio signal to be suppressed because of the delay between the input and the output, improves the noise-cancellation effect, can greatly reduce or even eliminate the influence of input-to-output delay on cancellation, and suppresses audio better than the reactive active noise-cancellation systems commonly used in the industry.
Embodiments of the present disclosure further provide an audio processing apparatus and a non-transitory computer-readable storage medium. The audio processing method can be applied to the audio processing apparatus provided by the embodiments of the present disclosure, and the apparatus can be configured on an electronic device. The electronic device may be a personal computer, a mobile terminal, a car headrest, etc.; the mobile terminal may be a hardware device such as a mobile phone, a headset, or a tablet computer.
The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings, but the present disclosure is not limited to these specific embodiments.
Figure 1 is a schematic block diagram of an audio processing system provided by at least one embodiment of the present disclosure; Figure 2A is a schematic flowchart of an audio processing method provided by at least one embodiment of the present disclosure; Figure 2B is a schematic flowchart of step S10 shown in Figure 2A; Figure 2C is a schematic flowchart of step S102 shown in Figure 2B; Figure 3 is a schematic diagram of a first audio signal and a third audio signal provided by at least one embodiment of the present disclosure.
The audio processing system shown in Figure 1 can be used to implement the audio processing method provided by any embodiment of the present disclosure, for example, the audio processing method shown in Figure 2A. As shown in Figure 1, the audio processing system may include an audio receiving part, an audio processing part, and an audio output part. The audio receiving part may receive the audio signal Sn1 emitted by a sound source at time tt1 and transmit the audio signal Sn1 to the audio processing part; the audio processing part processes the audio signal Sn1 to predict a future inverted audio signal Sn2, which is then output through the audio output part. The future inverted audio signal Sn2 can be used to suppress the future audio signal Sn3 produced by the sound source at time tt2, later than time tt1. For example, a target object (e.g., a human ear) can receive the inverted audio signal Sn2 and the future audio signal Sn3 at the same time, so that the two superpose destructively, thereby achieving noise cancellation.
For example, the audio receiving part may include a microphone, an amplifier (e.g., a microphone amplifier), an analog-to-digital converter (ADC), a downsampler, etc.; the audio processing part may include an AI engine and/or a digital signal processor (DSP); and the audio output part may include an upsampler, a digital-to-analog converter (DAC), an amplifier (e.g., a speaker amplifier), a speaker, etc.
As shown in Figure 2A, the audio processing method provided by one embodiment of the present disclosure includes steps S10 to S12. In step S10, a control instruction is generated based on a first audio signal; in step S11, a second audio signal is generated based on the control instruction; in step S12, the second audio signal is output to suppress a third audio signal.
For example, the first audio signal may be the audio signal Sn1 shown in Figure 1, the second audio signal may be the inverted audio signal Sn2 shown in Figure 1, and the third audio signal may be the future audio signal Sn3 shown in Figure 1.
For example, the audio receiving part may receive the first audio signal; the audio processing part may process the first audio signal to generate a control instruction and generate the second audio signal based on the control instruction; and the audio output part may output the second audio signal, thereby suppressing the third audio signal.
For example, the first audio signal appears earlier than the third audio signal. As shown in Figure 3, the first audio signal starts to appear at time t11 and the third audio signal starts to appear at time t21; on the time axis t, time t11 is earlier than time t21. For example, the first audio signal may exist during the period from time t11 to time t12, and the third audio signal during the period from time t21 to time t22. Considering factors such as the time taken by signal processing, time t12 and time t21 may not be the same moment; time t12 is earlier than time t21.
It should be noted that, in the embodiments of the present disclosure, "the time period during which an audio signal exists, or the time at which it appears" refers to the time period during which the audio corresponding to that audio signal exists, or the time at which that audio appears.
For example, the sum of the phase of the second audio signal and the phase of the third audio signal is less than a phase threshold; the phase threshold can be set according to the actual situation, and the present disclosure does not specifically limit it. For example, in some embodiments, the phase of the second audio signal is opposite to the phase of the third audio signal, so that complete cancellation can be achieved, i.e., the third audio signal is completely suppressed. In this case, when the second and third audio signals are received by an audio collection device (e.g., a microphone), the error energy of the received audio signal is 0; if the second and third audio signals are received by a human ear, it is as if the person heard no sound.
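The complete-cancellation case described above can be checked numerically: when the second audio signal is an exact anti-phase copy of the third, their superposition carries zero error energy. A minimal sketch, using a hypothetical 100 Hz tone sampled at 8 kHz:

```python
import math

rate = 8000  # samples per second (hypothetical)
# The "third audio signal": one second of a 100 Hz tone.
third = [math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]
# The "second audio signal": its phase-inverted copy.
second = [-x for x in third]

# If both arrive at the listener at the same moment they superpose;
# the residual a microphone would pick up is their sample-wise sum.
residual = [a + b for a, b in zip(second, third)]
error_energy = sum(x * x for x in residual)
print(error_energy)  # 0.0: complete cancellation
```

Any mismatch in timing or phase leaves a nonzero residual, which is why the embodiments bound both the phase sum (phase threshold) and the output-timing difference (time threshold).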
For example, in some embodiments, the first audio signal may be the loudest (largest-amplitude) time-domain audio signal between time t11 and time t12, rather than an audio signal of a specific frequency. The audio processing method provided by the embodiments of the present disclosure therefore does not need to extract spectral features from the audio signal to produce a spectrogram, which simplifies the processing of the audio signal and saves processing time.
For example, the first audio signal and the third audio signal may be audio signals produced by the external environment, machines, etc., such as the sound of machines running, or the sound of electric drills and electric saws during renovation. For example, the machines may include household appliances (air conditioners, range hoods, washing machines, etc.).
For example, in some embodiments, as shown in Figure 2B, step S10 may include steps S101 to S103. In step S101, the first audio signal is acquired; in step S102, the first audio signal is processed to predict a fourth audio signal; in step S103, the control instruction is generated based on the fourth audio signal. In the audio processing method provided by the embodiments of the present disclosure, the characteristics of the current audio signal (i.e., the first audio signal) are learned to predict an audio signal that has not yet been produced (i.e., the fourth audio signal).
For example, the fourth audio signal is a predicted future audio signal. On the time axis, the time period during which the fourth audio signal exists is later than the time period during which the first audio signal exists; for example, it is the same as the time period during which the third audio signal exists, so the fourth audio signal may also exist during the period from time t21 to time t22 shown in Figure 3.
Figure 4 is a schematic diagram of a third audio signal and a fourth audio signal provided by at least one embodiment of the present disclosure. In the example shown in Figure 4, the horizontal axis represents time and the vertical axis represents amplitude, which may be expressed as a voltage value. As shown in Figure 4, in one embodiment the predicted fourth audio signal is substantially the same as the third audio signal.
For example, in one embodiment, the third audio signal and the fourth audio signal may be exactly the same; in this case, the phase of the second audio signal finally generated based on the fourth audio signal is opposite to the phase of the third audio signal, thereby achieving complete cancellation.
For example, in step S102, processing the first audio signal to predict the fourth audio signal may include processing the first audio signal through a neural network to predict the fourth audio signal.
For example, the neural network may include a recurrent neural network, a long short-term memory network, or a generative adversarial network. In the embodiments of the present disclosure, the characteristics of the audio signal can be learned based on artificial intelligence to predict the audio signal of a future time period that has not yet occurred, and accordingly to produce an inverted audio signal for that time period to suppress the audio signal of that time period.
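Because the signals targeted here are periodic or intermittent, the prediction step can be illustrated with a deliberately simple stand-in for the recurrent/LSTM/GAN predictor named above: estimate the period by unnormalized autocorrelation and assume the next window repeats the last observed period. This is only a sketch of the idea; the disclosure's predictor is a learned model, and `estimate_period`/`predict_next` are hypothetical helpers.

```python
def estimate_period(x, max_lag=None):
    # Pick the lag with the largest autocorrelation; crude, but enough
    # for a clean periodic toy signal.
    max_lag = max_lag or len(x) // 2
    best_lag, best_score = 1, float("-inf")
    for lag in range(1, max_lag + 1):
        score = sum(a * b for a, b in zip(x, x[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def predict_next(x, period):
    # Assume the signal keeps repeating: the predicted "fourth audio
    # signal" is simply the last observed period.
    return x[-period:]

observed = [0, 1, 0, -1] * 3        # three periods heard so far
p = estimate_period(observed)
future = predict_next(observed, p)  # predicted next period
print(p, future)
```

Inverting `future` would then give the basis for the second audio signal in this toy setting.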
For example, in some embodiments, as shown in Figure 2C, step S102 may include steps S1021 to S1023. In step S1021, a first audio feature code is generated based on the first audio signal; in step S1022, a lookup table is queried based on the first audio feature code to obtain a second audio feature code; in step S1023, the fourth audio signal is predicted based on the second audio feature code.
For example, the first audio signal may be an analog signal, which can be processed by an analog-to-digital converter to obtain a processed first audio signal that is a digital signal; the first audio feature code can then be generated based on the processed first audio signal.
For another example, the first audio signal may be a digital signal, e.g., a PDM (pulse-density modulation) signal; in this case, the first audio feature code can be generated directly based on the first audio signal. A PDM signal can be represented by the binary digits 0 and 1.
For example, any suitable coding scheme can be used for the first audio feature code. For example, in some embodiments, an audio signal can be described by its change of state, and multiple bits can be used to represent that state. For example, two bits (2 bits) can be used to represent the change state of the audio signal. In some examples, as shown in Table 1 below, 00 means the audio signal is getting louder, 01 means the audio signal is getting softer, 10 means there is no audio signal, and 11 means the audio signal is unchanged.
Bits | Change state of the audio signal
00   | audio signal getting louder
01   | audio signal getting softer
10   | no audio signal
11   | audio signal unchanged
Table 1
"Getting louder" means the amplitude of the audio signal in a unit time period (each time step) increases over time; "getting softer" means the amplitude decreases over time; "unchanged" means the amplitude stays the same over the unit time period; and "no audio signal" means there is no audio signal in the unit time period, i.e., the amplitude of the audio signal is 0.
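The 2-bit change-state coding of Table 1 can be sketched as a comparison of consecutive samples, one code per time step. Note one assumption: Table 1 does not say how a step that lands exactly on zero amplitude is coded, so the rule below emits the "no audio signal" code only when two consecutive samples are both 0.

```python
LOUDER, SOFTER, SILENT, STEADY = "00", "01", "10", "11"  # Table 1 codes

def encode(samples):
    # One 2-bit code per time step, describing how the amplitude
    # changed relative to the previous sample.
    codes = []
    for prev, cur in zip(samples, samples[1:]):
        if prev == 0 and cur == 0:
            codes.append(SILENT)   # no audio signal (assumed rule)
        elif cur > prev:
            codes.append(LOUDER)   # audio signal getting louder
        elif cur < prev:
            codes.append(SOFTER)   # audio signal getting softer
        else:
            codes.append(STEADY)   # audio signal unchanged
    return codes

print(encode([0, 0, 1, 2, 2, 1, 0, 0]))
```

Run over a whole captured waveform, this produces code sequences of exactly the form discussed for Figure 5B below.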
图5A为本公开一些实施例提供的一种音频信号的示意图,图5B为图5A中的虚线矩形框P1中的音频信号的放大示意图。Figure 5A is a schematic diagram of an audio signal provided by some embodiments of the present disclosure. Figure 5B is an enlarged schematic diagram of the audio signal in the dotted rectangular box P1 in Figure 5A.
在图5A中,横坐标为时间(ms,毫秒),纵坐标为音频信号的振幅(volts,伏特)。如图5A所示,音频信号V是周期性变化的信号,音频信号V的周期性的模式(pattern)为虚线矩形框P2所示的模式。In Figure 5A, the abscissa is time (ms, milliseconds), and the ordinate is the amplitude of the audio signal (volts, volts). As shown in FIG. 5A , the audio signal V is a periodically changing signal, and the periodic pattern of the audio signal V is the pattern shown by the dotted rectangular frame P2.
As shown in Figure 5B, the amplitude of the audio signal represented by waveform segment 30 does not change with time t, and waveform segment 30 spans one unit time period, so waveform segment 30 can be expressed as the audio feature code (11). Similarly, the amplitude of the audio signal represented by waveform segment 31 gradually increases with time t, and waveform segment 31 spans four unit time periods, so waveform segment 31 can be expressed as the audio feature code (00,00,00,00). The amplitude of the audio signal represented by waveform segment 32 does not change with time t, and waveform segment 32 spans one unit time period, so waveform segment 32 can be expressed as the audio feature code (11). The amplitude of the audio signal represented by waveform segment 33 gradually decreases with time t, and waveform segment 33 spans six unit time periods, so waveform segment 33 can be expressed as the audio feature code (01,01,01,01,01,01). The amplitude of the audio signal represented by waveform segment 34 does not change with time t, and waveform segment 34 spans one unit time period, so waveform segment 34 can be expressed as the audio feature code (11). The amplitude of the audio signal represented by waveform segment 35 gradually increases with time t, and waveform segment 35 spans eight unit time periods, so waveform segment 35 can be expressed as the audio feature code (00,00,00,00,00,00,00,00). By analogy, waveform segment 36 can be expressed as the audio feature code (01,01,01,01,01,01,01,01,01,01,01,01), waveform segment 37 can be expressed as the audio feature code (11), and waveform segment 38 can be expressed as the audio feature code (00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00). Therefore, the audio feature code corresponding to the audio signal shown in Figure 5B can be expressed as {11,00,00,00,00,11,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,01,01,01,01,01,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,…}.
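The two-bit encoding walked through above can be sketched as a short routine. This is an illustrative sketch only: the function name, the comparison tolerance, and the assumption of one amplitude sample per unit time period are ours, not details specified by the disclosure; the code-to-state mapping (00 rising, 01 falling, 11 unchanged) is taken from the examples above.

```python
def encode_amplitude_changes(amplitudes, tolerance=1e-9):
    """Map each unit-time amplitude change to a two-bit audio feature code:
    '00' = amplitude rising, '01' = falling, '11' = unchanged."""
    codes = []
    for prev, curr in zip(amplitudes, amplitudes[1:]):
        if abs(curr - prev) <= tolerance:
            codes.append("11")  # amplitude unchanged over this unit time period
        elif curr > prev:
            codes.append("00")  # amplitude gradually increasing
        else:
            codes.append("01")  # amplitude gradually decreasing
    return codes

# A flat segment followed by a rising ramp, mirroring waveform segments 30 and 31:
print(encode_amplitude_changes([1.0, 1.0, 2.0, 3.0, 4.0, 5.0]))  # ['11', '00', '00', '00', '00']
```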
For example, in some embodiments, the lookup table (codebook) includes at least one first encoding field. For example, in other embodiments, the lookup table further includes at least one second encoding field, and a plurality of first encoding fields constitute one second encoding field, so that dimensionally reduced higher-order features can be formed by combining low-level features. For example, the encoding scheme of the encoding fields in the lookup table (codewords; for example, a codeword may include a first encoding field and a second encoding field) may be the same as the encoding scheme of the first audio feature code described above.
For example, in some embodiments, when two bits are used to represent the changing state of the audio signal so as to implement feature encoding, the first encoding field may be one of 00, 01, 10, and 11. The values 00, 01, 10, and 11 can be combined to form a second encoding field. For example, a second encoding field may be represented as {00,00,00,01,01,01,11,11,01,…}, which is composed of a combination of 00, 01, and 11.
For example, when the lookup table includes a plurality of second encoding fields, the numbers of first encoding fields included in the respective second encoding fields may differ from one another.
It should be noted that when more bits (for example, 3 bits, 4 bits, etc.) are used to represent the changing state of the audio signal so as to implement feature encoding, there can be more types of first encoding fields. For example, when 3 bits are used to represent the changing state of the audio signal, there can be up to eight types of first encoding fields; in this case, the first encoding fields can be some or all of 000, 001, 010, 011, 100, 101, 110, and 111.
For example, one or more second encoding fields can also be combined to obtain a third encoding field, or one or more second encoding fields and one or more first encoding fields can be combined to obtain a third encoding field. Similarly, one or more third encoding fields can be combined, or one or more third encoding fields can be combined with first encoding fields and/or second encoding fields, to obtain still higher-order encoding fields. In the embodiments of the present disclosure, low-order feature codes can be combined to obtain high-order feature codes, thereby enabling more efficient and longer-range prediction.
For example, the second audio feature code includes at least one first encoding field and/or at least one second encoding field. For example, in some embodiments, the second audio feature code may include one or more complete second encoding fields, or the second audio feature code may include some of the first encoding fields within a single second encoding field.
It should be noted that when the lookup table includes third encoding fields, the second audio feature code may include at least one first encoding field and/or at least one second encoding field and/or at least one third encoding field.
For example, in one embodiment, the lookup table includes a second encoding field W1, a second encoding field W2, and a second encoding field W3, where W1={11,00,00,00,00,11,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,01,01,01,01,01,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,…}, W2={11,01,00,00,01,01,01,01,01,01,01,…}, and W3={11,00,01,00,00,01,01,01,11,00,00,00,01,01,01,01,01,01,01,01,01,…}.
In one embodiment, as shown in Figure 5B, starting from time t31, the audio collection device continuously collects the first audio signal. When the first feature encoding field corresponding to the first audio signal collected by the audio collection device is expressed as {11}, corresponding to waveform segment 30, the lookup table is queried to determine whether any encoding field in the lookup table (including the first encoding fields and the second encoding fields) includes {11}. In the above example, the query finds that the second encoding field W1, the second encoding field W2, and the second encoding field W3 in the lookup table all include {11}; at this point, the second encoding fields W1, W2, and W3 all serve as candidate encoding fields in the list of encoding fields to be output.
Then, as shown in Figure 5B, when the second feature encoding field corresponding to the first audio signal collected by the audio collection device is expressed as {00}, corresponding to the first unit time period in waveform segment 31, the lookup table continues to be queried (at this point, only the candidate encoding fields in the list of encoding fields to be output may be queried, which saves query time; however, the entire lookup table may also be queried) to determine whether any encoding field includes {11,00}. In the above example, the query finds that both the second encoding field W1 and the second encoding field W3 in the lookup table include {11,00}. Since the second encoding field W2 includes {11,01} rather than {11,00}, it does not match the characteristics of the first audio signal collected by the audio collection device; therefore, the second encoding field W2 can be deleted from the list of encoding fields to be output. At this point, the second encoding field W1 and the second encoding field W3 remain as the candidate encoding fields in the list.
Then, when the third feature encoding field corresponding to the first audio signal collected by the audio collection device is expressed as {00}, corresponding to the second unit time period in waveform segment 31, the lookup table continues to be queried to determine whether any encoding field includes {11,00,00}. In the above example, the query finds that the second encoding field W1 in the lookup table includes {11,00,00}. It can then be predicted that the upcoming audio signal should follow the pattern of the second encoding field W1. For the first three encoding fields {11,00,00} of the second encoding field W1, the corresponding audio signal has already passed in time, so all subsequent encoding fields of the second encoding field W1, starting from its fourth field (namely {00}), can be output as the predicted second audio feature code. In this case, the second audio feature code is expressed as {00,00,11,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,01,01,01,01,01,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,…}.
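The candidate-narrowing query described in the three steps above can be sketched as follows. The sketch assumes each lookup-table entry is a list of two-bit code strings; the function and variable names are illustrative, and the entries W1–W3 below are truncated versions of the ones in the example.

```python
def predict_suffix(codebook, observed):
    """Keep only the codebook entries whose leading fields match the codes
    observed so far; once exactly one candidate remains, return its not-yet-
    observed suffix as the predicted second audio feature code."""
    n = len(observed)
    candidates = [entry for entry in codebook if entry[:n] == observed]
    if len(candidates) == 1:
        return candidates[0][n:]
    return None  # still ambiguous, or nothing matches

# Truncated stand-ins for the second encoding fields of the example:
W1 = ["11", "00", "00", "00", "00", "11", "01", "01"]
W2 = ["11", "01", "00", "00", "01"]
W3 = ["11", "00", "01", "00", "00"]

print(predict_suffix([W1, W2, W3], ["11"]))              # None: W1, W2, W3 all match
print(predict_suffix([W1, W2, W3], ["11", "00"]))        # None: W1 and W3 still match
print(predict_suffix([W1, W2, W3], ["11", "00", "00"]))  # only W1 matches; its suffix is output
```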
It should be noted that, in practical applications, the number of feature encoding fields that must be matched before the second audio feature code is determined can be adjusted according to the actual application scenario, design requirements, and other factors. For example, in the above example, the second audio feature code is determined once 3 feature encoding fields are matched (in practical applications, 10, 20, 50, etc. feature encoding fields may be matched).
For example, in the above example, the first audio feature code corresponding to the first audio signal includes 3 feature encoding fields and is expressed as {11,00,00}; as shown in Figure 5B, the time period corresponding to the first audio signal runs from time t31 to time t32. When factors such as the time the system needs to process the signal are taken into account, the system can in fact only output the second audio signal at time t33, which is later than time t32. At this point, the time period corresponding to the first two feature encoding fields {00,00} of the second audio feature code (that is, the time period between time t32 and time t33) has already passed, so the audio feature code corresponding to the actually predicted fourth audio signal is expressed as {11,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,01,01,01,01,01,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,…}.
For example, if the third audio signal and the fourth audio signal are exactly the same, the audio feature code corresponding to the third audio signal is likewise expressed as {11,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,01,01,01,01,01,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,…}.
For example, the second audio signal is a signal obtained by inverting the fourth audio signal; that is, the second audio signal may be the phase-inverted audio signal of the pattern {11,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,01,01,01,01,01,01,01,01,01,01,01,01,11,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,…}.
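The inversion step can be sketched at the sample level: playing the inverted waveform in sync with the predicted sound yields destructive interference. This is a minimal illustration, assuming the predicted fourth audio signal is available as a list of amplitude samples; the function and variable names are ours, not the disclosure's.

```python
def invert(samples):
    """Phase-invert a predicted waveform: each output sample cancels the
    corresponding sample of the target sound when the two are summed."""
    return [-s for s in samples]

predicted = [0.5, 0.25, -0.75, 0.0]            # hypothetical fourth-audio-signal samples
anti = invert(predicted)                       # the second audio signal
residual = [p + a for p, a in zip(predicted, anti)]  # ideally all zeros after superposition
```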
For example, in some embodiments, the duration of the second audio signal, the duration of the third audio signal, and the duration of the fourth audio signal are substantially the same, for example, exactly the same.
For example, in some embodiments, a leading feature encoding field may be set for at least some of the first encoding fields and/or second encoding fields in the lookup table. For example, the leading feature encoding field {11,00,00} may be set for the second encoding field W1; when this leading feature encoding field is detected, the second encoding field W1 is output as the second audio feature code. In this case, when it is detected that the first audio feature code corresponding to the first audio signal is {11,00,00}, the first audio feature code matches the leading feature encoding field {11,00,00}, so the second encoding field W1 can be output as the second audio feature code.
As another example, the leading feature encoding field {11,00,00,01,01} may be set for the second encoding field W1; when some of the fields in this leading feature encoding field are detected, the second encoding field W1 together with the remaining fields of the leading feature encoding field is output as the second audio feature code. In this case, when it is detected that the first audio feature code corresponding to the first audio signal is {11,00,00}, the first audio feature code matches the first three fields {11,00,00} of the leading feature encoding field, so the remaining fields {01,01} of the leading feature encoding field and the second encoding field W1 can be output as the second audio feature code. Here, the time corresponding to the first two feature encoding fields {01,01} of the second audio feature code (that is, the remaining fields of the leading feature encoding field) can serve as the time for the system to process the signal, so that the audio feature code corresponding to the actually predicted fourth audio signal can be the complete second encoding field W1.
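The preamble variant above can be sketched in the same style. As before, entries and preambles are assumed to be lists of two-bit code strings and the names are illustrative; per the example, the leftover preamble fields absorb the system's processing latency so that the complete entry W1 remains as the usable prediction.

```python
def match_preamble(preamble, entry, observed):
    """If the observed codes match a leading part of the preamble, return the
    remaining preamble fields followed by the complete entry as the second
    audio feature code; otherwise return None."""
    n = len(observed)
    if n <= len(preamble) and preamble[:n] == observed:
        return preamble[n:] + entry
    return None

W1 = ["11", "00", "00", "00", "00", "11", "01", "01"]  # truncated example entry
preamble_W1 = ["11", "00", "00", "01", "01"]

# The detected first audio feature code {11,00,00} matches the preamble's first
# three fields, so {01,01} plus the whole of W1 is output as the prediction.
prediction = match_preamble(preamble_W1, W1, ["11", "00", "00"])
```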
It should be noted that the length of the leading feature encoding field can be adjusted according to the actual situation; the present disclosure does not limit this.
It is worth noting that, for the lookup table, when the memory used to store the lookup table is large enough and the content stored in it is rich enough (that is, the lookup table contains enough combinations of encoding fields), all types of audio signals that the user wants to eliminate can be eliminated. As for the neural network, when the samples used to train it are sufficiently numerous and sufficiently varied in type, any type of audio signal that the user wants to eliminate can likewise be predicted based on the neural network.
For example, the lookup table may be stored in a memory in the form of a table or the like; the embodiments of the present disclosure do not limit the specific form of the lookup table.
For example, the prediction otherwise performed by the neural network can be implemented by means of the lookup table.
For example, the second audio signal and/or the third audio signal and/or the fourth audio signal are periodic or intermittent time-domain signals, and their signal characteristics are periodic or intermittent time-domain amplitude changes; that is, the second audio signal and/or the third audio signal and/or the fourth audio signal repeat continuously or intermittently and follow a fixed pattern. For an intermittent audio signal, since no audio signal exists during its pauses, there are no spectral features available for extraction during the pauses, yet the pauses themselves can serve as one of the time-domain features of the intermittent audio signal.
For example, in some embodiments, step S101 may include: collecting an initial audio signal; and downsampling the initial audio signal to obtain the first audio signal.
Since the sample rate of the initial audio signal collected by the audio collection device is relatively high, which is unfavorable for processing by the back-end audio signal processing device (for example, an artificial intelligence (AI) engine, a digital signal processor (DSP), etc.), the initial audio signal can be downsampled to reduce its rate and facilitate processing by the audio signal processing device, for example, down to 48 kHz or even lower.
For example, in other embodiments, step S101 may include: collecting an initial audio signal; and filtering the initial audio signal to obtain the first audio signal.
In some application scenarios, an environment that is too quiet is not safe. Therefore, filtering can also be performed by a bandwidth controller to suppress audio signals within a specific frequency range. For continuous and intermittent audio signals (for example, knocking or dripping noise), the effective bandwidth of the first audio signal is set to the frequency range corresponding to the audio signal that needs to be suppressed, for example, 1 kHz to 6 kHz, thereby ensuring that the user can still hear the more important sounds. For example, when applied in the automotive field, it must be ensured that the driver can still hear horns and the like, so as to improve driving safety.
For example, in some embodiments, the filtering and the downsampling can also be used in combination; the present disclosure does not limit the order in which the filtering and the downsampling are performed. For example, in some embodiments, acquiring the first audio signal may include: collecting an initial audio signal; filtering the initial audio signal to obtain an audio signal within a predetermined frequency range; and downsampling the audio signal within the predetermined frequency range to obtain the first audio signal. Alternatively, acquiring the first audio signal may include: collecting an initial audio signal; downsampling the initial audio signal; and filtering the downsampled audio signal to obtain the first audio signal.
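The acquisition pipeline described above can be sketched with a naive decimator and a crude smoothing filter. This is a sketch under stated assumptions only: a real implementation would apply an anti-aliasing low-pass filter before decimating and would use a proper band-pass design (e.g. for the 1 kHz–6 kHz range) rather than the moving average shown here, and the function names and the decimation factor are ours for illustration.

```python
def downsample(samples, factor):
    """Naive decimation: keep every `factor`-th sample, reducing the sample
    rate by `factor` (e.g. a factor of 4 would take 192 kHz down to 48 kHz)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return samples[::factor]

def moving_average(samples, width=3):
    """Crude smoothing filter standing in for the bandwidth controller."""
    half = width // 2
    return [sum(samples[max(0, i - half):i + half + 1]) /
            len(samples[max(0, i - half):i + half + 1])
            for i in range(len(samples))]

captured = list(range(16))  # stand-in for the initial audio signal
# Filter first, then downsample (the disclosure allows either order):
first_audio = downsample(moving_average(captured), 4)
```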
For example, the control instruction may include the time at which the second audio signal is to be output, the fourth audio signal, a control signal instructing that the fourth audio signal be inverted, and the like.
For example, in some embodiments, step S11 may include: based on the control instruction, determining the fourth audio signal and a control signal instructing that the fourth audio signal be inverted; and based on that control signal, inverting the fourth audio signal to generate the second audio signal.
For example, in some embodiments, step S12 may include: based on the control instruction, determining a first time at which to output the second audio signal; and outputting the second audio signal at the first time.
For example, the third audio signal begins to appear at a second time, and the absolute value of the time difference between the first time and the second time is less than a time threshold. It should be noted that the time threshold can be set according to the actual situation, and the present disclosure does not limit this; the smaller the time threshold, the better the noise cancellation effect.
For example, in some embodiments, the time difference between the first time and the second time is 0; that is, the time at which the second audio signal starts to be output is the same as the time at which the third audio signal starts to appear. In the example shown in Figure 3, the time at which the second audio signal starts to be output and the time at which the third audio signal starts to appear are both time t21.
For example, the time difference between the first time and the second time can be set according to the actual situation. For example, the first time and the second time can be set so as to ensure that the second audio signal and the third audio signal reach the target object simultaneously, thereby avoiding the loss of synchronization between the second audio signal and the third audio signal caused by audio signal transmission, and further improving the noise cancellation effect. For example, the target object can be a human ear, a microphone, etc.
For example, the second audio signal can be output through a device, such as a speaker, capable of converting an electrical signal into a sound signal for output.
It should be noted that when the audio collection device does not collect any audio signal, the audio processing method provided by the present disclosure need not be executed until the audio collection device collects an audio signal, thereby saving power.
In the embodiments of the present disclosure, the audio processing method can reduce or eliminate periodic audio signals (for example, noise) in the ambient audio, for example, eliminating the sound of construction at a nearby building site in an application scenario such as a library. Scenarios of this kind do not require specific knowledge of the audio signals to be retained; they simply attenuate the target sounds to be silenced in the environment, and these target sounds usually repeat continuously or intermittently, so they can be obtained by prediction. It should be noted that the "target sound to be silenced" can be determined according to the actual situation. For example, in an application scenario such as a library with a construction site nearby, the external ambient audio may include two kinds of audio signals: the first may be the sound of drilling at the construction site, and the second may be the sound of nearby people talking. Usually, the drilling sound at a construction site is periodic and tends to follow a fixed pattern, whereas the sound of conversation most likely follows no fixed pattern and is not periodic. In this case, the target sound to be silenced is the drilling sound at the construction site, and through the audio processing method provided by the embodiments of the present disclosure, the drilling sound can be predicted and thereby eliminated or reduced.
The audio processing method provided by the embodiments of the present disclosure can be applied to a car headrest, thereby creating a quiet zone near the driver's ears and preventing unnecessary external audio signals (for example, noise generated while the car is moving, such as engine noise, road noise, wind noise, and tire noise) from disturbing the driver. As another example, the audio processing method can also be applied to devices such as hair dryers, range hoods, vacuum cleaners, and non-inverter air conditioners to reduce the operating noise emitted by these devices, so that users can stay in a noisy environment without being affected by the surrounding ambient noise. The audio processing method can also be applied to earphones and the like to reduce or eliminate external sounds, so that users can better hear the sound emitted by the earphones (music, calls, etc.).
At least one embodiment of the present disclosure further provides an audio processing device. Figure 6 is a schematic block diagram of an audio processing device provided by at least one embodiment of the present disclosure.
As shown in Figure 6, the audio processing device 600 includes an instruction generation module 601, an audio generation module 602, and an output module 603. The components and structures of the audio processing device 600 shown in Figure 6 are merely exemplary, not restrictive; the audio processing device 600 may also include other components and structures as needed.
The instruction generation module 601 is configured to generate a control instruction based on the first audio signal. The instruction generation module 601 is used to perform step S10 shown in Figure 2A.
The audio generation module 602 is configured to generate the second audio signal based on the control instruction. The audio generation module 602 is used to perform step S11 shown in Figure 2A.
The output module 603 is configured to output the second audio signal so as to suppress the third audio signal. The output module 603 is used to perform step S12 shown in Figure 2A.
For a detailed description of the functions implemented by the instruction generation module 601, reference may be made to the description of step S10 shown in Figure 2A in the embodiments of the audio processing method above; for the functions implemented by the audio generation module 602, reference may be made to the description of step S11 shown in Figure 2A; and for the functions implemented by the output module 603, reference may be made to the description of step S12 shown in Figure 2A. The audio processing device can achieve technical effects similar or identical to those of the foregoing audio processing method, which will not be repeated here.
For example, the first audio signal appears earlier in time than the third audio signal.
For example, the sum of the phase of the second audio signal and the phase of the third audio signal is less than a phase threshold. In some embodiments, the phase of the second audio signal is opposite to the phase of the third audio signal, so that the third audio signal can be completely suppressed.
For example, in some embodiments, the instruction generation module 601 may include an audio acquisition sub-module, a prediction sub-module, and a generation sub-module. The audio acquisition sub-module is configured to acquire the first audio signal; the prediction sub-module is configured to process the first audio signal so as to predict the fourth audio signal; and the generation sub-module is configured to generate the control instruction based on the fourth audio signal.
For example, the second audio signal and/or the third audio signal and/or the fourth audio signal are periodic or intermittent time-domain signals.
For example, the third audio signal and the fourth audio signal may be exactly the same.
For example, in some embodiments, the prediction sub-module may process the first audio signal based on a neural network so as to predict the fourth audio signal. For example, the prediction sub-module may include the AI engine and/or the digital signal processor in the audio processing part shown in Figure 1, and the AI engine may include a neural network; for example, the AI engine may include at least one of a recurrent neural network, a long short-term memory network, a generative adversarial network, and the like.
For example, in some implementations, the prediction sub-module includes a query unit and a prediction unit. The query unit is configured to generate the first audio feature code based on the first audio signal and to query the lookup table based on the first audio feature code so as to obtain the second audio feature code. The prediction unit is configured to predict the fourth audio signal based on the second audio feature code.
例如,查询单元可以包括存储器以用于存储查找表。For example, the query unit may include a memory for storing the lookup table.
例如,在一些实施例中,查找表可以包括至少一个第一编码字段。例如,在另一些实施例中,查找表还包括至少一个第二编码字段,多个第一编码字段组成一个第二编码字段。关于查找表的具体内容可以参考上述音频处理方法的实施例中的相关描述,重复之处不再赘述。For example, in some embodiments, the lookup table may include at least one first encoding field. For example, in other embodiments, the lookup table further includes at least one second encoding field, and multiple first encoding fields constitute one second encoding field. Regarding the specific content of the lookup table, reference may be made to the relevant descriptions in the embodiments of the audio processing method described above, and repeated details will not be described again.
例如,第二音频特征编码包括至少一个第一编码字段和/或至少一个第二编码字段。For example, the second audio feature encoding includes at least one first encoding field and/or at least one second encoding field.
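例如,"基于第一音频特征编码查询查找表,并优先匹配由多个第一编码字段组成的第二编码字段"可以用下面的假设性示例示意(其中的编码取值均为虚构):For example, querying the lookup table with the first audio feature code, preferring a second encoding field composed of multiple first encoding fields, can be sketched with the hypothetical example below (all code values are invented):

```python
# Hypothetical code values; real feature codes would be derived
# from the first audio signal.
first_level = {"A": "B", "B": "C"}       # first encoding fields
second_level = {("A", "B"): ("C", "D")}  # second encoding fields,
                                         # composed of first fields

def query(codes):
    """Prefer a second-level (longer-horizon) match; otherwise fall
    back to querying the first-level fields one by one."""
    key = tuple(codes)
    if key in second_level:
        return list(second_level[key])
    return [first_level[c] for c in codes if c in first_level]

print(query(["A", "B"]))  # second-level hit: ['C', 'D']
print(query(["B"]))       # first-level fallback: ['C']
```

低阶字段组合成高阶字段后,一次查询即可覆盖更长的时间范围。Once low-order fields are combined into a higher-order field, a single query covers a longer time span.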
例如,在一些实施例中,音频获取子模块包括采集单元和下采样处理单元。采集单元被配置为采集初始音频信号;下采样处理单元被配置为对初始音频信号进行下采样处理以得到第一音频信号。For example, in some embodiments, the audio acquisition sub-module includes an acquisition unit and a downsampling processing unit. The acquisition unit is configured to collect the initial audio signal; the down-sampling processing unit is configured to perform down-sampling processing on the initial audio signal to obtain the first audio signal.
例如,在一些实施例中,音频获取子模块包括采集单元和滤波单元,采集单元被配置为采集初始音频信号;滤波单元被配置为对初始音频信号进行滤波处理以得到第一音频信号。For example, in some embodiments, the audio acquisition sub-module includes an acquisition unit and a filtering unit. The acquisition unit is configured to acquire an initial audio signal; and the filtering unit is configured to filter the initial audio signal to obtain a first audio signal.
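例如,滤波与下采样这两种得到第一音频信号的方式可以组合示意如下(移动平均滤波器与各参数均为示意性假设,并非本公开限定的滤波器类型):For example, the filtering and down-sampling ways of obtaining the first audio signal can be sketched together as follows (the moving-average filter and all parameters are illustrative assumptions, not the filter type prescribed by this disclosure):

```python
import numpy as np

def acquire_first_signal(initial, factor=4, kernel=5):
    """Low-pass filter the initial signal (a moving average stands in
    for whatever filter an implementation uses), then decimate by
    `factor` to obtain the first audio signal."""
    kern = np.ones(kernel) / kernel
    filtered = np.convolve(initial, kern, mode="same")
    return filtered[::factor]

initial = np.arange(16, dtype=float)
# kernel=1 disables the filter so the decimation is easy to see.
first_signal = acquire_first_signal(initial, factor=4, kernel=1)
print(first_signal)  # [ 0.  4.  8. 12.]
```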
例如,音频获取子模块可以实现为图1所示的音频接收部分。例如,采集单元可以包括音频采集装置,例如,图1所示的音频接收部分中的麦克风等。例如,采集单元还可以包括放大器、模数转换器等。For example, the audio acquisition sub-module can be implemented as the audio receiving part shown in Figure 1. For example, the collection unit may include an audio collection device, such as a microphone in the audio receiving part shown in FIG. 1 , or the like. For example, the acquisition unit may also include an amplifier, an analog-to-digital converter, etc.
例如,在一些实施例中,输出模块603可以包括时刻确定子模块和输出子模块。时刻确定子模块被配置为基于控制指令,确定输出第二音频信号的第一时刻;输出子模块被配置为在第一时刻输出第二音频信号。For example, in some embodiments, the output module 603 may include a moment determination sub-module and an output sub-module. The time determination sub-module is configured to determine a first time to output the second audio signal based on the control instruction; the output sub-module is configured to output the second audio signal at the first time.
例如,输出模块603可以实现为图1所示的音频输出部分。For example, the output module 603 may be implemented as the audio output part shown in FIG. 1 .
例如,第三音频信号从第二时刻开始出现,第一时刻和第二时刻之间的时间差的绝对值小于时间阈值。For example, the third audio signal starts to appear from the second moment, and the absolute value of the time difference between the first moment and the second moment is less than the time threshold.
例如,第一时刻和所述第二时刻之间的时间差可以为0。For example, the time difference between the first time and the second time may be zero.
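例如,"第一时刻与第二时刻之间的时间差的绝对值小于时间阈值"的判断可以简化示意为(阈值取值仅为假设):For example, the check that the absolute value of the time difference between the first time and the second time is less than the time threshold can be sketched as (the threshold value is only an assumption):

```python
def should_output(first_time_s, second_time_s, threshold_s=0.001):
    """True if the second audio signal, output at first_time_s, is close
    enough to second_time_s (onset of the third audio signal) for
    cancellation. The 1 ms default threshold is an illustrative guess."""
    return abs(first_time_s - second_time_s) < threshold_s

print(should_output(1.0000, 1.0005))  # within the threshold: True
print(should_output(1.0000, 1.1000))  # 100 ms apart: False
```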
例如,输出子模块可以包括扬声器等音频输出装置。例如,输出子模块还可以包括数模转换器等。For example, the output sub-module may include audio output devices such as speakers. For example, the output sub-module may also include a digital-to-analog converter, etc.
例如,指令生成模块601、音频生成模块602和/或输出模块603可以为硬件、软件、固件以及它们的任意可行的组合。例如,指令生成模块601、音频生成模块602和/或输出模块603可以为专用或通用的电路、芯片或装置等,也可以为处理器和存储器的结合。本公开的实施例不对上述各个模块、子模块和单元的具体实现形式进行限制。For example, the instruction generation module 601, the audio generation module 602, and/or the output module 603 may be hardware, software, firmware, or any feasible combination thereof. For example, the instruction generation module 601, the audio generation module 602, and/or the output module 603 may be a dedicated or general-purpose circuit, chip, or device, or a combination of a processor and a memory. The embodiments of the present disclosure do not limit the specific implementation forms of the above modules, sub-modules, and units.
本公开至少一个实施例还提供一种音频处理装置,图7为本公开至少一个实施例提供的另一种音频处理装置的示意性框图。At least one embodiment of the present disclosure also provides an audio processing device. FIG. 7 is a schematic block diagram of another audio processing device provided by at least one embodiment of the present disclosure.
例如,如图7所示,音频处理装置700包括一个或多个存储器701和一个或多个处理器702。一个或多个存储器701被配置为非瞬时性地存储有计算机可执行指令;一个或多个处理器702配置为运行计算机可执行指令。计算机可执行指令被一个或多个处理器702运行时实现根据上述任一实施例所述的音频处理方法。关于该音频处理方法的各个步骤的具体实现以及相关解释内容可以参见上述音频处理方法的实施例的描述,在此不做赘述。For example, as shown in FIG. 7 , the audio processing device 700 includes one or more memories 701 and one or more processors 702 . One or more memories 701 are configured to store non-transitory computer-executable instructions; one or more processors 702 are configured to execute the computer-executable instructions. The computer-executable instructions, when executed by one or more processors 702, implement the audio processing method according to any of the above embodiments. For the specific implementation and related explanations of each step of the audio processing method, please refer to the description of the above embodiments of the audio processing method, and will not be described again here.
例如,在一些实施例中,音频处理装置700还可以包括通信接口和通信总线。存储器701、处理器702和通信接口可以通过通信总线实现相互通信,存储器701、处理器702和通信接口等组件之间也可以通过网络连接进行通信。本公开对网络的类型和功能在此不作限制。For example, in some embodiments, the audio processing device 700 may further include a communication interface and a communication bus. The memory 701, the processor 702, and the communication interface can communicate with each other through the communication bus; components such as the memory 701, the processor 702, and the communication interface can also communicate through a network connection. This disclosure does not limit the type and function of the network.
例如,通信总线可以是外设部件互连标准(PCI)总线或扩展工业标准结构(EISA)总线等。该通信总线可以分为地址总线、数据总线、控制总线等。For example, the communication bus may be a Peripheral Component Interconnect Standard (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into address bus, data bus, control bus, etc.
例如,通信接口用于实现音频处理装置700与其他设备之间的通信。通信接口可以为通用串行总线(Universal Serial Bus,USB)接口等。For example, the communication interface is used to implement communication between the audio processing device 700 and other devices. The communication interface may be a Universal Serial Bus (USB) interface, etc.
例如,处理器702和存储器701可以设置在服务器端(或云端)。For example, the processor 702 and the memory 701 can be provided on the server side (or cloud).
例如,处理器702可以控制音频处理装置700中的其它组件以执行期望的功能。处理器702可以是中央处理器(CPU)、网络处理器(NP)等;还可以是数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。中央处理单元(CPU)可以为X86或ARM架构等。For example, the processor 702 may control other components in the audio processing device 700 to perform desired functions. The processor 702 may be a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. The central processing unit (CPU) may be of X86 or ARM architecture, etc.
例如,存储器701可以包括一个或多个计算机程序产品的任意组合,计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。非易失性存储器例如可以包括只读存储器(ROM)、硬盘、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、USB存储器、闪存等。在所述计算机可读存储介质上可以存储一个或多个计算机可执行指令,处理器702可以运行所述计算机可执行指令,以实现音频处理装置700的各种功能。在存储介质中还可以存储各种应用程序和各种数据等。For example, memory 701 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), etc. Non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer-executable instructions may be stored on the computer-readable storage medium, and the processor 702 may execute the computer-executable instructions to implement various functions of the audio processing device 700 . Various applications and various data can also be stored in the storage medium.
例如,关于音频处理装置700执行音频处理的过程的详细说明可以参考音频处理方法的实施例中的相关描述,重复之处不再赘述。For example, for detailed description of the process of audio processing performed by the audio processing device 700, reference may be made to the relevant descriptions in the embodiments of the audio processing method, and repeated details will not be described again.
例如,在一些实施例中,音频处理装置700可以通过芯片、小型装置/设备等形式呈现。For example, in some embodiments, the audio processing device 700 may be embodied in the form of a chip, a small apparatus/device, or the like.
图8为本公开至少一个实施例提供的一种非瞬时性计算机可读存储介质的示意图。例如,如图8所示,在非瞬时性计算机可读存储介质1000上可以非暂时性地存储一个或多个计算机可执行指令1001。例如,当计算机可执行指令1001由处理器执行时可以执行根据上文所述的音频处理方法中的一个或多个步骤。FIG. 8 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 8, one or more computer-executable instructions 1001 may be stored non-transitorily on the non-transitory computer-readable storage medium 1000. For example, one or more steps of the audio processing method described above may be performed when the computer-executable instructions 1001 are executed by a processor.
例如,该非瞬时性计算机可读存储介质1000可以应用于上述音频处理装置700中,例如,其可以包括音频处理装置700中的存储器701。For example, the non-transitory computer-readable storage medium 1000 can be applied in the above-mentioned audio processing device 700, and for example, it can include the memory 701 in the audio processing device 700.
关于非瞬时性计算机可读存储介质1000的说明可以参考图7所示的音频处理装置700的实施例中对于存储器701的描述,重复之处不再赘述。For description of the non-transitory computer-readable storage medium 1000, reference may be made to the description of the memory 701 in the embodiment of the audio processing device 700 shown in FIG. 7, and repeated descriptions will not be repeated.
本公开的至少一个实施例提供一种音频处理方法、音频处理装置和非瞬时性计算机可读存储介质,通过学习当前音频信号的特征,预测得到尚未产生的音频信号(即第四音频信号),据此预测得到的音频信号产生未来的反相音频信号以抑制未来音频信号,避免由于输入端和输出端之间的延迟导致的反相音频信号和需要抑制的音频信号不同步的问题,提升消音效果,可大幅降低或甚至消除输入端对输出端的延迟对消音的影响,抑制音频的效果比业界常用的落后式的主动消音系统的抑制音频的效果更好;由于第一音频信号为时域信号,第一音频信号不是特定频率的音频信号,从而本公开的实施例提供的音频处理方法不需要从音频信号中提取频谱特征来产生频谱图,由此可以简化音频信号的处理过程,节省处理时间;在查找表中,低阶的特征编码可以进行组合以得到高阶的特征编码,从而实现更高效且更长时间的预测;并且在该音频处理方法中,还可以通过带宽控制器进行滤波处理,从而实现针对特定频率范围内的音频信号进行抑制,确保使用者还能听到较为重要的声音,例如,当应用在汽车领域时,必须确保驾驶员能够听到喇叭声等,以提升驾驶安全性;此外,当没有采集到音频信号,则可以不执行本公开提供的音频处理方法,直到采集到音频信号为止,从而可以节省功耗。At least one embodiment of the present disclosure provides an audio processing method, an audio processing device, and a non-transitory computer-readable storage medium. By learning the features of the current audio signal, an audio signal that has not yet occurred (i.e., the fourth audio signal) is predicted, and a future inverted audio signal is generated from the predicted audio signal to suppress the future audio signal. This avoids the problem that the inverted audio signal and the audio signal to be suppressed fall out of sync due to the delay between the input end and the output end, improves the noise-cancellation effect, and can greatly reduce or even eliminate the influence of the input-to-output delay on noise cancellation; the audio-suppression effect is better than that of the reactive active noise-cancellation systems commonly used in the industry. Because the first audio signal is a time-domain signal rather than an audio signal of a specific frequency, the audio processing method provided by the embodiments of the present disclosure does not need to extract spectral features from the audio signal to generate a spectrogram, which simplifies the processing of the audio signal and saves processing time. In the lookup table, low-order feature codes can be combined into high-order feature codes, enabling more efficient and longer-horizon prediction. In this audio processing method, filtering can also be performed by a bandwidth controller so that only audio signals within a specific frequency range are suppressed, ensuring that the user can still hear more important sounds; for example, in automotive applications it must be ensured that the driver can hear the horn and the like, to improve driving safety. In addition, when no audio signal is collected, the audio processing method provided by the present disclosure need not be executed until an audio signal is collected, thereby saving power consumption.
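例如,反相抵消的基本原理可以用下面的最小示例说明(假设预测完全准确的理想情形):For example, the basic principle of anti-phase cancellation can be illustrated with the minimal example below (assuming the ideal case of a perfectly accurate prediction):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
third = np.sin(2 * np.pi * 50 * t)  # the (predicted) future noise
second = -third                      # anti-phase output signal
residual = third + second           # what the listener hears
print(np.max(np.abs(residual)))     # 0.0 -- perfect cancellation
```

实际中预测存在误差,残差不为零,但只要第二音频信号与第三音频信号足够同步,残差即被显著抑制。In practice the prediction has errors and the residual is non-zero, but as long as the second audio signal is sufficiently synchronized with the third audio signal, the residual is significantly suppressed.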
对于本公开,还有以下几点需要说明:Regarding this disclosure, there are still several points that need to be explained:
(1)本公开实施例附图只涉及到与本公开实施例涉及到的结构,其他结构可参考通常设计。(1) The drawings of the embodiments of the present disclosure only involve the structures related to the embodiments of the present disclosure; for other structures, reference may be made to common designs.
(2)在不冲突的情况下,本公开的实施例及实施例中的特征可以相互组合以得到新的实施例。(2) Without conflict, the embodiments of the present disclosure and the features in the embodiments can be combined with each other to obtain new embodiments.
以上所述仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,本公开的保护范围应以所述权利要求的保护范围为准。The above are only specific implementation modes of the present disclosure, but the protection scope of the present disclosure is not limited thereto. The protection scope of the present disclosure should be subject to the protection scope of the claims.

Claims (26)

  1. 一种音频处理方法,包括:An audio processing method including:
    基于第一音频信号,生成控制指令;Based on the first audio signal, generate a control instruction;
    基于所述控制指令,生成第二音频信号;Based on the control instruction, generate a second audio signal;
    输出所述第二音频信号,以抑制第三音频信号,outputting the second audio signal to suppress the third audio signal,
    其中,所述第二音频信号的相位与所述第三音频信号的相位之和小于相位阈值,所述第一音频信号出现的时间早于所述第三音频信号出现的时间。Wherein, the sum of the phases of the second audio signal and the third audio signal is less than a phase threshold, and the first audio signal appears earlier than the third audio signal.
  2. 根据权利要求1所述的音频处理方法,其中,所述输出所述第二音频信号,以抑制第三音频信号,包括:The audio processing method according to claim 1, wherein the outputting the second audio signal to suppress the third audio signal includes:
    基于所述控制指令,确定输出所述第二音频信号的第一时刻;Based on the control instruction, determine a first moment to output the second audio signal;
    在所述第一时刻输出所述第二音频信号,Output the second audio signal at the first moment,
    其中,所述第三音频信号从第二时刻开始出现,所述第一时刻和所述第二时刻之间的时间差的绝对值小于时间阈值。Wherein, the third audio signal starts to appear from the second time, and the absolute value of the time difference between the first time and the second time is less than a time threshold.
  3. 根据权利要求2所述的音频处理方法,其中,所述第一时刻和所述第二时刻之间的时间差为0。The audio processing method according to claim 2, wherein the time difference between the first time and the second time is 0.
  4. 根据权利要求1~3任一项所述的音频处理方法,其中,所述基于第一音频信号,生成控制指令,包括:The audio processing method according to any one of claims 1 to 3, wherein generating a control instruction based on the first audio signal includes:
    获取所述第一音频信号;Obtain the first audio signal;
    对所述第一音频信号进行处理以预测得到第四音频信号;Process the first audio signal to predict a fourth audio signal;
    基于所述第四音频信号,生成所述控制指令。The control instruction is generated based on the fourth audio signal.
  5. 根据权利要求4所述的音频处理方法,其中,所述第二音频信号和/或所述第三音频信号和/或所述第四音频信号是周期性的或间歇性的时域信号。The audio processing method according to claim 4, wherein the second audio signal and/or the third audio signal and/or the fourth audio signal are periodic or intermittent time domain signals.
  6. 根据权利要求4或5所述的音频处理方法,其中,所述对所述第一音频信号进行处理以预测得到第四音频信号,包括:The audio processing method according to claim 4 or 5, wherein said processing the first audio signal to predict a fourth audio signal includes:
    基于所述第一音频信号生成第一音频特征编码;Generate a first audio feature code based on the first audio signal;
    基于所述第一音频特征编码查询查找表,以得到第二音频特征编码;Query a lookup table based on the first audio feature coding to obtain a second audio feature coding;
    基于所述第二音频特征编码,预测得到所述第四音频信号。Based on the second audio feature encoding, the fourth audio signal is predicted.
  7. 根据权利要求6所述的音频处理方法,其中,所述查找表包括至少一个第一编码字段。The audio processing method of claim 6, wherein the lookup table includes at least one first encoding field.
  8. 根据权利要求7所述的音频处理方法,其中,所述查找表还包括至少一个第二编码字段,多个所述第一编码字段组成一个所述第二编码字段。The audio processing method according to claim 7, wherein the lookup table further includes at least one second encoding field, and a plurality of the first encoding fields constitute one second encoding field.
  9. 根据权利要求8所述的音频处理方法,其中,所述第二音频特征编码包括至少一个所述第一编码字段和/或至少一个所述第二编码字段。The audio processing method according to claim 8, wherein the second audio feature encoding includes at least one of the first encoding field and/or at least one of the second encoding field.
  10. 根据权利要求4~9任一项所述的音频处理方法,其中,所述获取所述第一音频信号,包括:The audio processing method according to any one of claims 4 to 9, wherein said obtaining the first audio signal includes:
    采集初始音频信号;Collect initial audio signal;
    对所述初始音频信号进行下采样处理以得到所述第一音频信号。Perform downsampling processing on the initial audio signal to obtain the first audio signal.
  11. 根据权利要求4~9任一项所述的音频处理方法,其中,所述获取所述第一音频信号,包括:The audio processing method according to any one of claims 4 to 9, wherein said obtaining the first audio signal includes:
    采集初始音频信号;Collect initial audio signal;
    对所述初始音频信号进行滤波处理以得到所述第一音频信号。The initial audio signal is filtered to obtain the first audio signal.
  12. 根据权利要求1~11任一项所述的音频处理方法,其中,所述第二音频信号的相位与所述第三音频信号的相位相反。The audio processing method according to any one of claims 1 to 11, wherein the phase of the second audio signal is opposite to the phase of the third audio signal.
  13. 一种音频处理装置,包括:An audio processing device, including:
    指令生成模块,被配置为基于第一音频信号,生成控制指令;an instruction generation module configured to generate a control instruction based on the first audio signal;
    音频生成模块,被配置为基于所述控制指令,生成第二音频信号;an audio generation module configured to generate a second audio signal based on the control instruction;
    输出模块,被配置为输出所述第二音频信号,以抑制第三音频信号;an output module configured to output the second audio signal to suppress the third audio signal;
    其中,所述第二音频信号的相位与所述第三音频信号的相位之和小于相位阈值,所述第一音频信号出现的时间早于所述第三音频信号出现的时间。Wherein, the sum of the phases of the second audio signal and the third audio signal is less than a phase threshold, and the first audio signal appears earlier than the third audio signal.
  14. 根据权利要求13所述的音频处理装置,其中,所述输出模块包括时刻确定子模块和输出子模块,The audio processing device according to claim 13, wherein the output module includes a time determination sub-module and an output sub-module,
    所述时刻确定子模块被配置为基于所述控制指令,确定输出所述第二音频信号的第一时刻;The time determination submodule is configured to determine the first time to output the second audio signal based on the control instruction;
    所述输出子模块被配置为在所述第一时刻输出所述第二音频信号,The output sub-module is configured to output the second audio signal at the first moment,
    其中,所述第三音频信号从第二时刻开始出现,所述第一时刻和所述第二时刻之间的时间差的绝对值小于时间阈值。Wherein, the third audio signal starts to appear from the second time, and the absolute value of the time difference between the first time and the second time is less than a time threshold.
  15. 根据权利要求14所述的音频处理装置,其中,所述第一时刻和所述第二时刻之间的时间差为0。The audio processing device according to claim 14, wherein the time difference between the first time and the second time is 0.
  16. 根据权利要求13~15任一项所述的音频处理装置,其中,所述指令生成模块包括音频获取子模块、预测子模块和生成子模块,The audio processing device according to any one of claims 13 to 15, wherein the instruction generation module includes an audio acquisition sub-module, a prediction sub-module and a generation sub-module,
    所述音频获取子模块被配置为获取所述第一音频信号;The audio acquisition sub-module is configured to acquire the first audio signal;
    所述预测子模块被配置为对所述第一音频信号进行处理以预测得到第四音频信号;The prediction sub-module is configured to process the first audio signal to predict a fourth audio signal;
    所述生成子模块被配置为基于所述第四音频信号,生成所述控制指令。The generating sub-module is configured to generate the control instruction based on the fourth audio signal.
  17. 根据权利要求16所述的音频处理装置,其中,所述第二音频信号和/或所述第三音频信号和/或所述第四音频信号是周期性的或间歇性的时域信号。The audio processing device according to claim 16, wherein the second audio signal and/or the third audio signal and/or the fourth audio signal are periodic or intermittent time domain signals.
  18. 根据权利要求16或17所述的音频处理装置,其中,所述预测子模块包括查询单元和预测单元,The audio processing device according to claim 16 or 17, wherein the prediction sub-module includes a query unit and a prediction unit,
    所述查询单元被配置为基于所述第一音频信号生成第一音频特征编码以及基于所述第一音频特征编码查询查找表,以得到第二音频特征编码;The query unit is configured to generate a first audio feature code based on the first audio signal and query a lookup table based on the first audio feature code to obtain a second audio feature code;
    所述预测单元被配置为基于所述第二音频特征编码,预测得到所述第四音频信号。The prediction unit is configured to predict the fourth audio signal based on the second audio feature encoding.
  19. 根据权利要求18所述的音频处理装置,其中,所述查找表包括至少一个第一编码字段。The audio processing device of claim 18, wherein the lookup table includes at least one first encoding field.
  20. 根据权利要求19所述的音频处理装置,其中,所述查找表还包括至少一个第二编码字段,多个所述第一编码字段组成一个所述第二编码字段。The audio processing device according to claim 19, wherein the lookup table further includes at least one second encoding field, and a plurality of the first encoding fields constitute one second encoding field.
  21. 根据权利要求20所述的音频处理装置,其中,所述第二音频特征编码包括至少一个所述第一编码字段和/或至少一个所述第二编码字段。The audio processing device according to claim 20, wherein the second audio feature encoding includes at least one of the first encoding field and/or at least one of the second encoding field.
  22. 根据权利要求16~21任一项所述的音频处理装置,其中,所述音频获取子模块包括采集单元和下采样处理单元,The audio processing device according to any one of claims 16 to 21, wherein the audio acquisition sub-module includes a collection unit and a downsampling processing unit,
    所述采集单元被配置为采集初始音频信号;The collection unit is configured to collect an initial audio signal;
    所述下采样处理单元被配置为对所述初始音频信号进行下采样处理以得到所述第一音频信号。The down-sampling processing unit is configured to perform down-sampling processing on the initial audio signal to obtain the first audio signal.
  23. 根据权利要求16~21任一项所述的音频处理装置,其中,所述音频获取子模块包括采集单元和滤波单元,The audio processing device according to any one of claims 16 to 21, wherein the audio acquisition sub-module includes a collection unit and a filtering unit,
    所述采集单元被配置为采集初始音频信号;The collection unit is configured to collect an initial audio signal;
    所述滤波单元被配置为对所述初始音频信号进行滤波处理以得到所述第一音频信号。The filtering unit is configured to perform filtering processing on the initial audio signal to obtain the first audio signal.
  24. 根据权利要求13~23任一项所述的音频处理装置,其中,所述第二音频信号的相位与所述第三音频信号的相位相反。The audio processing device according to any one of claims 13 to 23, wherein the phase of the second audio signal is opposite to the phase of the third audio signal.
  25. 一种音频处理装置,包括:An audio processing device, including:
    一个或多个存储器,非瞬时性地存储有计算机可执行指令;One or more memories that non-transitoryly store computer-executable instructions;
    一个或多个处理器,配置为运行所述计算机可执行指令,one or more processors configured to execute the computer-executable instructions,
    其中,所述计算机可执行指令被所述一个或多个处理器运行时实现根据权利要求1~12任一项所述的音频处理方法。Wherein, when the computer-executable instructions are run by the one or more processors, the audio processing method according to any one of claims 1 to 12 is implemented.
  26. 一种非瞬时性计算机可读存储介质,其中,所述非瞬时性计算机可读存储介质存储有计算机可执行指令,所述计算机可执行指令被处理器执行时实现根据权利要求1~12任一项所述的音频处理方法。A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions, when executed by a processor, implement the audio processing method according to any one of claims 1 to 12.
PCT/CN2022/110275 2022-05-23 2022-08-04 Audio processing method and apparatus, and non-transitory computer-readable storage medium WO2023226193A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/117526 WO2023226234A1 (en) 2022-05-23 2022-09-07 Model training method and apparatus, and computer-readable non-transitory storage medium

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US202263344642P 2022-05-23 2022-05-23
US63/344,642 2022-05-23
US202263351439P 2022-06-13 2022-06-13
US63/351,439 2022-06-13
US202263352213P 2022-06-14 2022-06-14
US63/352,213 2022-06-14

Publications (1)

Publication Number Publication Date
WO2023226193A1 true WO2023226193A1 (en) 2023-11-30

Family

ID=83825587

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2022/110275 WO2023226193A1 (en) 2022-05-23 2022-08-04 Audio processing method and apparatus, and non-transitory computer-readable storage medium
PCT/CN2022/117526 WO2023226234A1 (en) 2022-05-23 2022-09-07 Model training method and apparatus, and computer-readable non-transitory storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117526 WO2023226234A1 (en) 2022-05-23 2022-09-07 Model training method and apparatus, and computer-readable non-transitory storage medium

Country Status (3)

Country Link
CN (1) CN115294952A (en)
TW (1) TW202347318A (en)
WO (2) WO2023226193A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0820653D0 (en) * 2008-11-11 2008-12-17 Isis Innovation Acoustic noise reduction during magnetic resonance imaging
CN102110438A (en) * 2010-12-15 2011-06-29 方正国际软件有限公司 Method and system for authenticating identity based on voice
CN104900237A (en) * 2015-04-24 2015-09-09 上海聚力传媒技术有限公司 Method, device and system for denoising audio information
CN110970010A (en) * 2019-12-03 2020-04-07 广州酷狗计算机科技有限公司 Noise elimination method, device, storage medium and equipment
CN113470684A (en) * 2021-07-23 2021-10-01 平安科技(深圳)有限公司 Audio noise reduction method, device, equipment and storage medium
CN113903322A (en) * 2021-10-16 2022-01-07 艾普科模具材料(上海)有限公司 Automobile active noise reduction system and method based on mobile terminal and programmable logic device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9053697B2 (en) * 2010-06-01 2015-06-09 Qualcomm Incorporated Systems, methods, devices, apparatus, and computer program products for audio equalization
US9208771B2 (en) * 2013-03-15 2015-12-08 Cirrus Logic, Inc. Ambient noise-based adaptation of secondary path adaptive response in noise-canceling personal audio devices
CN109671440B (en) * 2019-01-09 2020-08-14 四川虹微技术有限公司 Method, device, server and storage medium for simulating audio distortion
CN112634923B (en) * 2020-12-14 2021-11-19 广州智讯通信系统有限公司 Audio echo cancellation method, device and storage medium based on command scheduling system
CN113707167A (en) * 2021-08-31 2021-11-26 北京地平线信息技术有限公司 Training method and training device for residual echo suppression model


Also Published As

Publication number Publication date
TW202347319A (en) 2023-12-01
TW202347318A (en) 2023-12-01
CN115294952A (en) 2022-11-04
WO2023226234A1 (en) 2023-11-30

Similar Documents

Publication Publication Date Title
US9294834B2 (en) Method and apparatus for reducing noise in voices of mobile terminal
CN110164451B (en) Speech recognition
JP5085556B2 (en) Configure echo cancellation
US6801889B2 (en) Time-domain noise suppression
JP2011511571A (en) Improve sound quality by intelligently selecting between signals from multiple microphones
JP2004511823A (en) Dynamically reconfigurable speech recognition system and method
AU2017405291B2 (en) Method and apparatus for processing speech signal adaptive to noise environment
JP2014520284A (en) Generation of masking signals on electronic devices
JPH03256411A (en) High efficient encoder for digital data
US20200045166A1 (en) Acoustic signal processing device, acoustic signal processing method, and hands-free communication device
CN106251856B (en) Environmental noise elimination system and method based on mobile terminal
EP3815082A1 (en) Adaptive comfort noise parameter determination
CN112309416B (en) Vehicle-mounted voice echo eliminating method, system, vehicle and storage medium
WO2023226193A1 (en) Audio processing method and apparatus, and non-transitory computer-readable storage medium
WO2019239977A1 (en) Echo suppression device, echo suppression method, and echo suppression program
TWI837756B (en) Audio processing method, device, and non-transitory computer-readable storage medium
CN116612778B (en) Echo and noise suppression method, related device and medium
CN109429125B (en) Electronic device and control method of earphone device
US10650834B2 (en) Audio processing method and non-transitory computer readable medium
EP2884726B1 (en) Method for metadata - based collaborative voice processing for voice communication
CN115457930A (en) Model training method and device, and non-transitory computer readable storage medium
US7693294B2 (en) Method and system for reducing audible side effects of dynamic current consumption
CN117392994B (en) Audio signal processing method, device, equipment and storage medium
WO2023220918A1 (en) Audio signal processing method and apparatus, storage medium and vehicle
CN109841222B (en) Audio communication method, communication apparatus, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22943384

Country of ref document: EP

Kind code of ref document: A1