WO2020020247A1 - Signal processing method and device, and computer storage medium - Google Patents

Signal processing method and device, and computer storage medium

Info

Publication number
WO2020020247A1
Authority
WO
WIPO (PCT)
Prior art keywords
microphone
speaker
signal
model
audio signal
Prior art date
Application number
PCT/CN2019/097552
Other languages
French (fr)
Chinese (zh)
Inventor
崔腾飞
Original Assignee
西安中兴新软件有限责任公司
Priority date
Filing date
Publication date
Application filed by 西安中兴新软件有限责任公司
Publication of WO2020020247A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2420/00 Details of connection covered by H04R, not provided for in its groups
    • H04R2420/03 Connection circuits to selectively connect loudspeakers or headphones to amplifiers

Definitions

  • The present disclosure relates to, but is not limited to, the field of audio signal processing technology.
  • An embodiment of the present disclosure provides a signal processing method, which is applied to a signal processing device having at least one speaker and at least one microphone.
  • The method includes: receiving a first audio signal and playing the first audio signal through the at least one speaker; obtaining at least one echo estimation signal corresponding to the first audio signal according to at least one speaker model, at least one microphone model, and the first audio signal, wherein the at least one speaker model is obtained based on the at least one speaker, and the at least one microphone model is obtained based on the at least one microphone; receiving a second audio signal using the at least one microphone, wherein the second audio signal includes an echo signal that is generated by the first audio signal output by the at least one speaker and is received by the at least one microphone; and removing the at least one echo estimation signal corresponding to the first audio signal from the second audio signal to obtain an echo-processed audio signal.
  • An embodiment of the present disclosure further provides a signal processing device including at least one speaker, at least one microphone, a first receiving part, a first obtaining part, a second receiving part, and a second obtaining part.
  • The first receiving part is configured to receive a first audio signal and play the first audio signal through the at least one speaker.
  • The first obtaining part is configured to obtain at least one echo estimation signal corresponding to the first audio signal based on at least one speaker model, at least one microphone model, and the first audio signal, wherein the at least one speaker model is obtained based on the at least one speaker, and the at least one microphone model is obtained based on the at least one microphone.
  • The second receiving part is configured to receive a second audio signal by using the at least one microphone, wherein the second audio signal includes an echo signal that is generated by the first audio signal output by the at least one speaker and is received by the at least one microphone.
  • The second obtaining part is configured to remove the at least one echo estimation signal corresponding to the first audio signal from the second audio signal to obtain an echo-processed audio signal.
  • An embodiment of the present disclosure further provides a signal processing apparatus including a memory and a processor, wherein the memory stores a computer program, and when the processor runs the computer program, the processor executes the signal processing method according to the present disclosure.
  • An embodiment of the present disclosure further provides a computer storage medium on which a computer program is stored.
  • When the computer program is executed by at least one processor, the at least one processor executes the signal processing method according to the present disclosure.
  • FIG. 1 shows a schematic circuit structure of a single speaker and a single microphone
  • FIG. 2 shows a schematic circuit structure of a single speaker and a dual microphone
  • FIG. 3 shows a schematic circuit structure of a dual speaker and a single microphone
  • FIG. 4 is a schematic diagram showing a curve comparison between an echo reference signal and a microphone recording signal
  • FIG. 5 is another schematic diagram of comparison between an echo reference signal and a microphone recording signal
  • FIG. 6 shows another schematic circuit structure diagram of a dual speaker and a single microphone
  • FIG. 7 shows a circuit structure diagram of a dual speaker and a dual microphone
  • FIG. 8 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a circuit structure of a four-speaker and a single microphone according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of another circuit structure of a four-speaker and a single microphone according to an embodiment of the present disclosure
  • FIG. 11 is a schematic diagram showing a circuit structure of a four-speaker and a dual microphone according to an embodiment of the present disclosure
  • FIG. 12 is a schematic diagram of another circuit structure of a four-speaker and a single microphone according to an embodiment of the present disclosure
  • FIG. 13 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 14 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 15 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 16 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 17 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 18 is a schematic diagram of a circuit structure of a single speaker and a single microphone according to an embodiment of the present disclosure
  • FIG. 19 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 20 is a schematic diagram of a circuit structure of a dual speaker and a single microphone according to an embodiment of the present disclosure
  • FIG. 21 is a schematic structural diagram of a signal processing device according to an embodiment of the present disclosure.
  • FIG. 22 is another schematic structural diagram of a signal processing device according to an embodiment of the present disclosure.
  • FIG. 23 is another schematic structural diagram of a signal processing device according to an embodiment of the present disclosure.
  • FIG. 24 is another schematic structural diagram of a signal processing apparatus according to an embodiment of the present disclosure.
  • FIG. 25 is another schematic structural diagram of a signal processing device according to an embodiment of the present disclosure.
  • FIG. 26 is a schematic diagram of a hardware structure of a signal processing apparatus according to an embodiment of the present disclosure.
  • Echo is mainly the repetition of sound caused by the reflection of sound waves, that is, the sound emitted by the sound source is reflected back to the position of the sound source.
  • Microphones and speakers are often used together.
  • The microphone transmits voice or other sound data from the near end to the far end.
  • Since the speaker is disposed adjacent to the microphone, the sound emitted by the speaker is received by the microphone; this is called an echo.
  • In the absence of processing, the echo is heard by the far-end user, resulting in considerable noise and an unpleasant psychoacoustic experience. Therefore, echo cancellation technology is introduced to process the echoes picked up by the microphone.
  • Echo cancellation technology processes the speaker-played sound that the microphone receives, so that only sound that does not come from the speakers is retained.
  • The echo cancellation technology commonly used at present is single-speaker (earpiece) echo cancellation; stereo (dual-speaker) echo cancellation and echo cancellation for more channels (speakers) are used less often.
  • As shown in FIG. 1, the circuit structure 10 includes a speaker 101, a microphone 102, an adder 103, an adaptive filter (AF) module 104, a voice processing module 105, a noise reduction and other processing module 106, a decoder 107, an encoder 108, and a radio frequency end 109.
  • The radio frequency end 109 sends the received audio signal to the decoder 107, and the decoder 107 demodulates the audio signal.
  • The demodulated audio signal enters the voice processing module 105 for voice processing such as noise reduction and filtering, and is then played by the speaker 101.
  • The audio signal recorded by the microphone 102 includes an echo signal generated by the audio signal played by the speaker 101.
  • To process this echo, an audio signal is extracted at the front end of the speaker 101 as a reference signal.
  • The reference signal is input to the adder 103 after passing through the AF module 104, and the audio signal recorded by the microphone 102 is also input to the adder 103.
  • The two signals are subtracted in the adder 103, so that echo processing is performed on the audio signal recorded by the microphone 102.
  • The echo-processed audio signal is then processed by the noise reduction and other processing module 106, modulated in the encoder 108, and finally transmitted by the transmitting end 109 through the transmission line.
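  • The reference-filter-subtract loop of FIG. 1 can be sketched as follows. This is a minimal illustration, not the circuit's actual algorithm: the text only names an adaptive filter (AF) module, so the normalized LMS (NLMS) update, the tap count, the step size, and the synthetic echo path below are all assumptions made for the example.

```python
import numpy as np


def nlms_echo_cancel(reference, mic, taps=64, mu=0.5, eps=1e-8):
    """Remove the echo of `reference` from `mic` with an NLMS adaptive filter."""
    w = np.zeros(taps)    # adaptive filter coefficients (role of the AF module)
    buf = np.zeros(taps)  # most recent reference samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        echo_estimate = w @ buf                   # filtered reference signal
        err = mic[n] - echo_estimate              # subtraction performed in the adder
        w += mu * err * buf / (buf @ buf + eps)   # NLMS coefficient update (assumed)
        out[n] = err
    return out


# Synthetic check: the microphone records only the echo of the reference.
rng = np.random.default_rng(0)
reference = rng.standard_normal(8000)
echo_path = np.array([0.0, 0.6, 0.3, -0.1])  # hypothetical speaker-to-microphone path
mic = np.convolve(reference, echo_path)[: len(reference)]
residual = nlms_echo_cancel(reference, mic)
```

  • In this idealized setup the filter converges and the residual becomes small compared with the recorded echo; a real device would also have near-end speech and noise in the microphone signal.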
  • FIG. 2 shows an application example of the circuit structure 20 of a single speaker and a dual microphone.
  • The circuit structure 20 includes a speaker 201, a first microphone 202a, a second microphone 202b, a first adder 203a, a second adder 203b, a first AF module 204a, a second AF module 204b, and a voice processing module 205.
  • The audio signal at the front end of the speaker 201 is selected as a reference signal.
  • Using the reference signal, echo processing is performed on the audio signal received by the first microphone 202a through the first AF module 204a and the first adder 203a, and on the audio signal received by the second microphone 202b through the second AF module 204b and the second adder 203b.
  • The echo-processed audio signal is input to the noise reduction and other processing module 206 for noise reduction processing, is modulated in the encoder 208, and is finally transmitted by the transmitting end 209 through the transmission line.
  • FIG. 3 shows an application example of the circuit structure 30 of a dual speaker and a single microphone.
  • The circuit structure 30 includes a first speaker 301a, a second speaker 301b, a microphone 302, an adder 303, an AF module 304, a voice processing module 305, a noise reduction and other processing module 306, a decoder 307, an encoder 308, and a transmitting end 309.
  • The audio signals at the front ends of the first speaker 301a and the second speaker 301b are selected as reference signals for echo processing.
  • Using the reference signal, echo processing is performed on the audio signal received by the microphone 302 through the AF module 304 and the adder 303.
  • The echo-processed audio signal is input to the noise reduction and other processing module 306 for noise reduction processing, is then modulated in the encoder 308, and is finally sent by the transmitting end 309 through the transmission line.
  • As can be seen from FIG. 4, the echo reference signal is larger than the signal recorded by the microphone (that is, the echo signal to be processed), and the two are fairly linear with respect to each other. In this case, the echo is easier to process and to remove cleanly. As can be seen from FIG. 5, at around 4 kHz the signal recorded by the microphone (that is, the echo signal to be processed) is larger than the echo reference signal. In this case, the echo is difficult to process.
  • The circuit structures in FIG. 1 and FIG. 2 are suitable for single-speaker application scenarios.
  • In a multi-speaker application scenario such as FIG. 3, if only the signal at the front end of one speaker is used as the reference signal for echo processing, the superposition of the sound from multiple speakers is not taken into account and the echo processing result is poorer, especially when the sound played by the speakers is loud.
  • FIG. 6 shows an application example of the circuit structure 60 of a dual speaker and a single microphone.
  • The circuit structure 60 includes a first speaker 601a, a second speaker 601b, a microphone 602, a first adder 603a, a second adder 603b, a first AF module 604a, a second AF module 604b, and a voice processing module 605.
  • The audio signals at the front ends of the first speaker 601a and the second speaker 601b are selected as reference signals for echo processing.
  • The audio signal at the front end of the first speaker 601a passes through the first AF module 604a and is input to the first adder 603a.
  • The audio signal received by the microphone 602 is also input to the first adder 603a.
  • In the first adder 603a, echo processing is performed on the echo that the audio signal played by the first speaker 601a produces in the audio signal recorded by the microphone 602.
  • The audio signal at the front end of the second speaker 601b passes through the second AF module 604b and is input to the second adder 603b.
  • The audio signal received by the microphone 602 is also input to the second adder 603b.
  • In the second adder 603b, echo processing is performed on the echo that the audio signal played by the second speaker 601b produces in the audio signal recorded by the microphone 602.
  • The echo-processed audio signal is input to the noise reduction and other processing module 606 for processing, is then modulated in the encoder 608, and is finally sent by the transmitting end 609 through the transmission line.
  • This processing method takes into account the effect of the superposition of multiple speakers on the echo signal, and therefore gives a better result than using the digital signal at the front end of only one speaker as the echo reference signal.
  • FIG. 7 shows an application example of the circuit structure 70 of a dual speaker and a dual microphone. It can be seen from FIG. 7 that the left/right two-channel stereo signals XL and XR input to the line input terminals LI(L) and LI(R) bypass the sum/difference signal generating device 52, are output through the sound output terminals SO(L) and SO(R), respectively, and are reproduced by the speakers SP(L) and SP(R). They are then collected by the microphones MC(L) and MC(R) and input to the sound input terminals SI(L) and SI(R).
  • The filters 40-1, 40-2, 40-3, and 40-4 are formed by, for example, FIR filters.
  • The impulse responses set in the filters 40-1, 40-2, 40-3, and 40-4 correspond respectively to the transfer functions between the speakers SP(L) and SP(R) and the microphones MC(L) and MC(R), so that the filters generate the echo processing signals EC1, EC2, EC3, and EC4.
  • Adders 44 and 46 and subtractors 48 and 50 are used for echo processing. The echo-processed signals are output from the line output terminals LO(L) and LO(R), respectively.
  • The sum/difference signal generating device 52 includes an adder 54 and a subtractor 56 for generating a sum signal XM and a difference signal XS.
  • The correlation detection device 59 detects the correlation between the sum signal XM and the difference signal XS based on a correlation value calculation (or a similar calculation).
  • The transfer function calculation device 58 is used to calculate the transfer functions of the four audio transmission systems between the speakers SP(L), SP(R) and the microphones MC(L), MC(R).
  • This technical solution uses the sum signal and the difference signal of the stereo sound signal as reference signals, and calculates the transfer functions of the four audio transmission systems between the two speakers and the two microphones from a cross-spectrum calculation between the reference signals and the sound signals recorded by the microphones.
  • The obtained transfer functions are subjected to an inverse Fourier transform to obtain impulse responses, and these impulse responses are set in the filter devices to generate the echo reference signals and perform echo processing.
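  • The cross-spectrum step described above can be illustrated for a single speaker-to-microphone path. The sketch below assumes the common estimator H(f) = S_xy(f) / S_xx(f) averaged over non-overlapping frames, followed by an inverse FFT; the frame length, the regularization constant, the function name, and the synthetic path are assumptions for illustration, and the four-path sum/difference arrangement of FIG. 7 is not reproduced.

```python
import numpy as np


def estimate_impulse_response(x, y, n_fft=256):
    """Estimate an impulse response from H(f) = S_xy(f) / S_xx(f)."""
    n_frames = len(x) // n_fft
    s_xx = np.zeros(n_fft // 2 + 1)                  # reference auto-spectrum
    s_xy = np.zeros(n_fft // 2 + 1, dtype=complex)   # cross-spectrum
    for k in range(n_frames):
        seg = slice(k * n_fft, (k + 1) * n_fft)
        X = np.fft.rfft(x[seg])
        Y = np.fft.rfft(y[seg])
        s_xx += np.abs(X) ** 2
        s_xy += np.conj(X) * Y
    H = s_xy / (s_xx + 1e-12)            # transfer function estimate
    return np.fft.irfft(H, n=n_fft)      # inverse transform gives the impulse response


rng = np.random.default_rng(1)
x = rng.standard_normal(256 * 200)             # reference signal
h_true = np.array([0.0, 0.6, 0.3, -0.1])       # hypothetical acoustic path
y = np.convolve(x, h_true)[: len(x)]           # signal recorded by the microphone
h_est = estimate_impulse_response(x, y)
```

  • The leading taps of `h_est` approximate `h_true`; they would then be loaded into an FIR filter such as the filters 40-1 to 40-4. Frame-edge effects make the estimate slightly biased, which is acceptable for a sketch.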
  • This technical solution also considers the effect of the acoustic path, the acoustic structure, and the components on the echo signal.
  • Because it takes into account the sum signal and difference signal of the two speaker signals as well as the echo signal recorded by the microphone, it is more comprehensive.
  • However, it requires the speaker to play the audio signal while the microphone receives it at the same time, which makes it sensitive to environmental fluctuations; moreover, the speaker and the microphone behave inconsistently under different audio signals and environments. As a result, echo processing is unsatisfactory in some situations.
  • As the number of speakers and microphones increases, so does the complexity of calculating the transfer functions.
  • In addition, the acoustic response models of the speaker and microphone are not considered. When the high-frequency resonance peak of the speaker or microphone is relatively low (for example, around 4 kHz) and the amplitude of the signal recorded by the microphone exceeds that of the reference signal at that resonance peak, echo howling is easily produced.
  • To address these problems, the technical solutions of the embodiments of the present disclosure are proposed. The embodiments of the present disclosure will be described in detail below with reference to the drawings.
  • FIG. 8 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • The signal processing method is applied to a signal processing apparatus having at least one speaker and at least one microphone.
  • The method may include steps S801 to S804.
  • In step S801, a first audio signal is received and the first audio signal is played by at least one speaker.
  • In step S802, at least one echo estimation signal corresponding to the first audio signal is obtained according to at least one speaker model, at least one microphone model, and the first audio signal, wherein the at least one speaker model is obtained based on the at least one speaker, and the at least one microphone model is obtained based on the at least one microphone.
  • In step S803, a second audio signal is received by using at least one microphone, wherein the second audio signal includes an echo signal that is generated by the first audio signal output by the at least one speaker and is received by the at least one microphone.
  • In step S804, the at least one echo estimation signal corresponding to the first audio signal is removed from the second audio signal to obtain an echo-processed audio signal.
  • In this way, the echo howling that occurs when the high-frequency resonance peak of the speaker or the microphone is low can be effectively suppressed, and the calculation workload in a multi-microphone design is also reduced.
  • The method further includes: demodulating the first audio signal and performing voice preprocessing on it, wherein the first audio signal is generated and transmitted by a remote device.
  • After the signal processing device receives the first audio signal, it performs demodulation and voice preprocessing. The processed first audio signal then enters the speaker, and the speaker plays it.
  • The method further includes: establishing at least one speaker model according to characteristic information of the at least one speaker, wherein the characteristic information of the at least one speaker includes circuit information and structural information corresponding to the at least one speaker; and establishing at least one microphone model according to characteristic information of the at least one microphone, wherein the characteristic information of the at least one microphone includes circuit information and structural information corresponding to the at least one microphone.
  • A speaker model and a microphone model are thus introduced.
  • The speaker model simulates the acoustic response of the speaker based on the circuit information and structural information of the speaker.
  • The microphone model simulates the acoustic response of the microphone based on the circuit information and structural information of the microphone.
  • As a result, the reference signal obtained for the first audio signal is more accurate and closer to the echo signal of the first audio signal played by the speaker, so that the echo signal is processed more effectively.
  • Step S802 may include: inputting the first audio signal to each speaker model and performing acoustic response processing to obtain an acoustic response signal of each speaker; obtaining a first reference signal of each speaker according to the acoustic response signal of each speaker and first delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained based on distance and position information between each speaker and the microphone; performing superposition and bit expansion processing on the first reference signals of the speakers to obtain a first superimposed reference signal; and performing acoustic response processing on the first superimposed reference signal based on the microphone model to obtain a first echo estimation signal of the first audio signal.
  • The echo estimation signal of the first audio signal is thus obtained through a series of signal processing steps (speaker model, delay and attenuation, signal superposition, and microphone model), so that the echo estimation signal is closer to the echo signal of the first audio signal played by the speaker and the subsequent echo processing is more effective.
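  • The chain described above (speaker model, delay and attenuation, superposition, microphone model) can be sketched as follows. This is an illustration under stated assumptions only: the speaker and microphone models are represented here as short FIR impulse responses, the per-speaker delay and gain values are hypothetical numbers standing in for values derived from distance and position, and the removal in step S804 is shown as a plain subtraction. Bit expansion is left out because the sketch uses floating point.

```python
import numpy as np


def estimate_echo(first_audio, speaker_models, delay_gain, mic_model):
    """Speaker models -> delay/attenuation -> superposition -> microphone model."""
    n = len(first_audio)
    superimposed = np.zeros(n)
    for h_speaker, (delay, gain) in zip(speaker_models, delay_gain):
        response = np.convolve(first_audio, h_speaker)[:n]   # speaker acoustic response
        reference = np.zeros(n)
        reference[delay:] = gain * response[: n - delay]     # delay and attenuation
        superimposed += reference                            # superposition
    return np.convolve(superimposed, mic_model)[:n]          # microphone acoustic response


rng = np.random.default_rng(2)
first_audio = rng.standard_normal(1000)
speaker_models = [np.array([1.0, 0.2]), np.array([0.9, -0.1])]  # hypothetical FIR models
delay_gain = [(3, 0.5), (7, 0.3)]   # hypothetical (delay in samples, gain) per speaker
mic_model = np.array([1.0, 0.1])    # hypothetical FIR microphone model

echo_estimate = estimate_echo(first_audio, speaker_models, delay_gain, mic_model)
near_end = 0.1 * rng.standard_normal(1000)   # stand-in for the near-end talker
second_audio = near_end + echo_estimate      # idealized microphone recording
processed = second_audio - echo_estimate     # step S804 shown as plain subtraction
```

  • Because the synthetic recording is built from the same models, `processed` recovers `near_end` exactly; in practice the models only approximate the real transducers and the echo processing module would handle the remaining mismatch.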
  • As shown in FIG. 9, the circuit structure 90 includes a speaker group 901, a microphone 902, a speaker model group 903, a delay attenuation module group 904, a summing module 905, a microphone model 906, an echo processing module 907, a voice processing module 908, a noise reduction and other processing module 909, a decoder 910, an encoder 911, and a transmitting end 912.
  • The speaker group 901 includes a first speaker 901a, a second speaker 901b, a third speaker 901c, and a fourth speaker 901d.
  • The speaker model group 903 includes a first speaker model 903a, a second speaker model 903b, a third speaker model 903c, and a fourth speaker model 903d.
  • The delay attenuation module group 904 includes a first delay attenuation module 904a, a second delay attenuation module 904b, a third delay attenuation module 904c, and a fourth delay attenuation module 904d.
  • The speaker model group 903 is established based on the speaker group 901, and the delay attenuation module group 904 applies the corresponding delay and attenuation to the audio signal after it has passed through the speaker model group 903.
  • The radio frequency end 912 sends the received first audio signal to the decoder 910, and the decoder 910 demodulates the first audio signal.
  • The demodulated first audio signal enters the voice processing module 908 for preprocessing such as noise reduction and filtering, and the first audio signal is then played by the speaker group 901.
  • The second audio signal recorded by the microphone 902 includes an echo signal generated by the first audio signal played by the speaker group 901.
  • The first audio signal is extracted at the front end of the speaker group 901 and input to the first speaker model 903a, the second speaker model 903b, the third speaker model 903c, and the fourth speaker model 903d, respectively, for acoustic response processing.
  • The resulting signals are delayed and attenuated by the delay attenuation module group 904, and the summing module 905 then performs superposition processing to obtain a superimposed reference signal.
  • The superimposed reference signal is input to the microphone model 906, which performs acoustic response processing according to the sound pressure excitation of the microphone model 906, thereby obtaining the echo reference signal as recorded through the microphone 902, that is, the first echo estimation signal of the first audio signal.
  • The first echo estimation signal and the second audio signal recorded by the microphone 902 are input to the echo processing module 907 for echo processing.
  • The echo-processed audio signal is then subjected to noise reduction processing in the noise reduction and other processing module 909, modulated in the encoder 911, and finally sent by the transmitting end 912 through the transmission line.
  • When the sound signals of multiple speakers are superimposed, the number of bits is expanded, mainly because overflow may occur after superposition.
  • The superimposed sound signal is input to the echo processing module 907 as an echo estimation signal, and is mainly used to perform echo processing on the echo signal that the first audio signal played by the speakers produces in the second audio signal recorded by the microphone 902. Because the echo processing module 907 does not increase the amplitude of the original audio signal, no bit expansion is needed for the second audio signal recorded by the microphone 902.
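  • The need for bit expansion can be seen with 16-bit samples: four references that each reach 30000 at the same instant sum to 120000, which exceeds the int16 maximum of 32767. The sketch below accumulates in 32 bits; the 16-bit sample format, the 32-bit accumulator width, and the function name are assumptions chosen for illustration, since the text does not state the word lengths used.

```python
import numpy as np


def superimpose_with_bit_expansion(references):
    """Sum int16 reference signals in a 32-bit accumulator to avoid overflow."""
    stacked = np.stack([np.asarray(r, dtype=np.int16) for r in references])
    return stacked.sum(axis=0, dtype=np.int32)


# Four speakers whose references peak near the int16 limit at the same sample.
references = [np.array([30000, -30000, 100], dtype=np.int16) for _ in range(4)]
superimposed = superimpose_with_bit_expansion(references)
```

  • The widened result keeps the true sum for every sample, whereas a 16-bit accumulator could not represent the first two values.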
  • The method further includes: obtaining a high-frequency resonance peak of the microphone; comparing the high-frequency resonance peak of the microphone with a preset high-frequency resonance peak; and in response to the high-frequency resonance peak of the microphone being higher than the preset high-frequency resonance peak, omitting the acoustic response processing of the first superimposed reference signal by the microphone model.
  • If the bandwidth of the microphone (such as the microphone 902 in FIG. 9) is relatively wide, that is, the high-frequency resonance peak of the microphone 902 is relatively high, it is not necessary to use the microphone model 906 to perform acoustic response processing on the obtained first superimposed reference signal. In other words, the echo estimation signal does not need to be corrected by the microphone model 906, and the obtained first superimposed reference signal can be used directly as the echo estimation signal.
  • After the first superimposed reference signal is obtained through the summing module 905, if the high-frequency resonance peak of the microphone 902 is not higher than the preset high-frequency resonance peak, the microphone model 906 is required to perform acoustic response processing on the first superimposed reference signal, so that a corrected echo estimation signal can be obtained. If the high-frequency resonance peak of the microphone 902 is 9 kHz, that is, higher than the preset high-frequency resonance peak, the microphone model 906 is not required to perform acoustic response processing on the first superimposed reference signal.
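  • The decision described above amounts to a simple branch. In the sketch below the microphone model is again assumed to be an FIR impulse response, and the preset resonance peak of 6 kHz is a hypothetical value chosen only so that the 9 kHz example from the text falls above it; the text does not state the actual preset value.

```python
import numpy as np


def echo_estimate_from_superimposed(superimposed, mic_model, mic_peak_hz, preset_peak_hz):
    """Apply the microphone model only when the microphone's resonance peak is low."""
    if mic_peak_hz > preset_peak_hz:
        # Wide-band microphone: use the superimposed reference directly.
        return superimposed
    # Otherwise correct the reference with the microphone's acoustic response.
    return np.convolve(superimposed, mic_model)[: len(superimposed)]


superimposed = np.array([1.0, 0.0, 0.0, 0.0])
mic_model = np.array([0.5, 0.25])   # hypothetical FIR microphone model
PRESET_PEAK_HZ = 6000.0             # hypothetical preset high-frequency resonance peak

wideband = echo_estimate_from_superimposed(superimposed, mic_model, 9000.0, PRESET_PEAK_HZ)
narrowband = echo_estimate_from_superimposed(superimposed, mic_model, 4000.0, PRESET_PEAK_HZ)
```

  • With a 9 kHz peak the reference passes through unchanged, which corresponds to the structure without a microphone model; with a 4 kHz peak the microphone model shapes the reference before echo processing.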
  • As shown in FIG. 10, the circuit structure 100 includes a speaker group 1001, a microphone 1002, a speaker model group 1003, a delay attenuation module group 1004, a summing module 1005, an echo processing module 1006, a voice processing module 1007, a noise reduction and other processing module 1008, a decoder 1009, an encoder 1010, and a transmitting end 1011.
  • The speaker group 1001 includes a first speaker 1001a, a second speaker 1001b, a third speaker 1001c, and a fourth speaker 1001d.
  • The speaker model group 1003 includes a first speaker model 1003a, a second speaker model 1003b, a third speaker model 1003c, and a fourth speaker model 1003d.
  • The delay attenuation module group 1004 includes a first delay attenuation module 1004a, a second delay attenuation module 1004b, a third delay attenuation module 1004c, and a fourth delay attenuation module 1004d.
  • Compared with the circuit structure 90 shown in FIG. 9, the circuit structure 100 shown in FIG. 10 omits the microphone model. That is, in the circuit structure shown in FIG. 10, the step in which the microphone model performs acoustic response processing on the first superimposed reference signal is eliminated.
  • The superimposed reference signal is input directly to the echo processing module 1006 as the echo estimation signal to perform echo processing on the second audio signal recorded by the microphone 1002.
  • The other operations are the same as those of the circuit structure 90 shown in FIG. 9 and are not described again here.
  • Step S802 may include: inputting the first audio signal to each speaker model and performing acoustic response processing to obtain an acoustic response signal of each speaker; obtaining a first reference signal and a second reference signal of each speaker according to the acoustic response signal of each speaker, first delay and attenuation information corresponding to the acoustic response signal, and second delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained based on distance and position information between each speaker and a first microphone of the at least one microphone, and the second delay and attenuation information is obtained based on distance and position information between each speaker and a second microphone of the at least one microphone; the first reference signal and the second reference signal of each speaker are
  • the circuit structure 110 includes a speaker group 1101, a first microphone 1102a, a second microphone 1102b, a speaker model group 1103, a first delay attenuation module group 1104-1, and a second delay attenuation module group 1104-2.
  • the speaker group 1101 includes a first speaker 1101a, a second speaker 1101b, a third speaker 1101c, and a fourth speaker 1101d.
  • the speaker model group 1103 includes a first speaker model 1103a, a second speaker model 1103b, a third speaker model 1103c, and a fourth speaker model 1103d.
  • the first delay attenuation module group 1104-1 includes a first delay attenuation module 1104-1a, a second delay attenuation module 1104-1b, a third delay attenuation module 1104-1c, and a fourth delay attenuation module 1104-1d.
  • the second delay attenuation module group 1104-2 includes a fifth delay attenuation module 1104-2a, a sixth delay attenuation module 1104-2b, a seventh delay attenuation module 1104-2c, and an eighth delay attenuation module 1104-2d.
  • the speaker model group 1103 is correspondingly established based on the speaker group 1101, and the first delay attenuation module group 1104-1 and the second delay attenuation module group 1104-2 perform corresponding delay and attenuation on the audio signal after passing through the speaker model group 1103.
  • the radio frequency end 1112 sends the received first audio signal to the decoder 1111, and the decoder 1111 demodulates the first audio signal.
  • the demodulated first audio signal enters the voice processing module 1108 for preprocessing such as noise reduction and filtering, and then the first audio signal is played by the speaker group 1101.
  • the second audio signal recorded by the first microphone 1102a includes an echo signal generated by the first audio signal played by the speaker group 1101, and the second audio signal recorded by the second microphone 1102b also includes an echo signal generated by the first audio signal played by the speaker group 1101. In order to eliminate these echo signals as much as possible, the first audio signal is extracted at the front end of the speaker group 1101.
  • each of the extracted first audio signals is correspondingly input to the first speaker model 1103a, the second speaker model 1103b, the third speaker model 1103c, and the fourth speaker model 1103d, and acoustic response processing is then performed to obtain an acoustic response signal of each speaker in the speaker group 1101.
  • since sound takes time to propagate and its energy decays along the way, the delay and attenuation of the sound travelling from each speaker to each microphone position must be represented: the corresponding delay and attenuation in the first delay attenuation module group 1104-1 are obtained according to the distance and position information between the first microphone 1102a and each speaker in the speaker group 1101, and the corresponding delay and attenuation in the second delay attenuation module group 1104-2 are obtained according to the distance and position information between the second microphone 1102b and each speaker in the speaker group 1101.
  • the acoustic response signals are input to the first delay attenuation module group 1104-1 and the second delay attenuation module group 1104-2, respectively, for delay and attenuation processing, yielding eight channels of delayed and attenuated sound signals (four first reference signals and four second reference signals).
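  • as an illustrative sketch only (not the disclosed implementation), the delay and attenuation for one speaker-to-microphone path could be derived from geometry as shown here. The function names, the speed of sound of 343 m/s, and the 1/distance spreading law are assumptions introduced for this example; the disclosure only states that delay and attenuation are obtained from distance and position information.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees Celsius


def delay_and_attenuation(speaker_xyz, mic_xyz, sample_rate_hz, ref_distance_m=1.0):
    """Return (delay in whole samples, linear gain) for one speaker-to-microphone path."""
    distance_m = math.dist(speaker_xyz, mic_xyz)
    delay_samples = round(distance_m / SPEED_OF_SOUND_M_S * sample_rate_hz)
    # Assumption: free-field spherical spreading, amplitude falls off as 1/distance.
    # The gain is clamped to 1.0 for distances shorter than the reference distance.
    gain = ref_distance_m / max(distance_m, ref_distance_m)
    return delay_samples, gain


def apply_delay_attenuation(signal, delay_samples, gain):
    """Delay a list of samples by prepending zeros (length preserved), then scale it."""
    delayed = [0.0] * delay_samples + list(signal)
    return [gain * s for s in delayed[:len(signal)]]
```

  • with four speakers and two microphones, such a function would be evaluated once per path, giving the eight delay and attenuation pairs used by module groups 1104-1 and 1104-2.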
  • the first reference signal is input to the first summing module 1105a for superposition and bit expansion processing to obtain a first superposed reference signal.
  • the first superimposed reference signal is input to the first microphone model 1106a, and an acoustic response process is performed according to the sound pressure excitation of the first microphone model 1106a, thereby obtaining a first echo estimation signal recorded by the first microphone 1102a.
  • the second reference signal is input to the second summation module 1105b for superposition and bit expansion processing to obtain a second superimposed reference signal.
  • the second superimposed reference signal is input to the second microphone model 1106b, and the acoustic response processing is performed according to the sound pressure excitation of the second microphone model 1106b, thereby obtaining a second echo estimation signal recorded by the second microphone 1102b.
  • the first echo estimation signal and the second audio signal recorded by the first microphone 1102a are input to the first echo processing module 1107a for echo processing, so that the echo signal generated by the first audio signal played by the speaker group 1101 can be eliminated as far as possible from the second audio signal recorded by the first microphone 1102a.
  • the second echo estimation signal and the second audio signal recorded by the second microphone 1102b are input to the second echo processing module 1107b for echo processing, so that the echo signal generated by the first audio signal played by the speaker group 1101 can be eliminated as far as possible from the second audio signal recorded by the second microphone 1102b.
  • the audio signal after the echo processing is then subjected to noise reduction processing by the noise reduction and other processing module 1109, then modulated in the encoder 1111, and finally transmitted by the transmitting end 1112 through the transmission line.
  • in the first echo processing module 1107a or the second echo processing module 1107b, no bit expansion processing needs to be performed on the second audio signal recorded by the first microphone or the second microphone.
  • the reference signal of the echo signal (that is, the echo estimation signal) is obtained for each microphone, and the echo processing is performed separately for each microphone (such as by the first echo processing module 1107a and the second echo processing module 1107b in FIG. 11).
  • since the speaker model and the delay attenuation module are provided separately, the speaker model is applied first and the position and distance information between the speaker and each microphone is applied afterwards, so that the audio signal at the front end of the speaker needs to undergo the acoustic response processing of the speaker model only once, which helps reduce the computational workload in a multi-microphone design.
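  • the saving described above can be sketched as follows. This is a minimal illustration under stated assumptions: each speaker model is represented as a short FIR filter, and every name (`fir_filter`, `delay_attenuate`, `reference_signals`, `paths`) is hypothetical rather than taken from the disclosure. The point shown is that the speaker models run once per speaker, while only the inexpensive delay and attenuation step is repeated per microphone.

```python
def fir_filter(signal, taps):
    """Causal FIR filtering; the output is truncated to the input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def delay_attenuate(signal, delay_samples, gain):
    """Delay by whole samples (zeros prepended, length preserved) and scale."""
    delayed = [0.0] * delay_samples + list(signal)
    return [gain * s for s in delayed[:len(signal)]]


def reference_signals(first_audio, speaker_taps, paths):
    """paths[m][s] is the (delay_samples, gain) pair for microphone m and speaker s.

    Returns the superimposed reference signal of each microphone and the number
    of speaker-model evaluations that were needed.
    """
    # Speaker models are evaluated once per speaker, independent of the microphone count.
    responses = [fir_filter(first_audio, taps) for taps in speaker_taps]
    model_runs = len(responses)
    superimposed = []
    for mic_paths in paths:
        channels = [delay_attenuate(r, d, g) for r, (d, g) in zip(responses, mic_paths)]
        superimposed.append([sum(samples) for samples in zip(*channels)])
    return superimposed, model_runs
```

  • for the four-speaker, two-microphone structure of FIG. 11 this arrangement would need four speaker-model evaluations rather than eight.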
  • the method further includes: acquiring a high-frequency resonance peak of the first microphone and a high-frequency resonance peak of the second microphone; comparing the high-frequency resonance peak of the first microphone and the high-frequency resonance peak of the second microphone with a preset high-frequency resonance peak, respectively; in response to the high-frequency resonance peak of the first microphone being higher than the preset high-frequency resonance peak, canceling the acoustic response processing of the first microphone model on the first superimposed reference signal; and in response to the high-frequency resonance peak of the second microphone being higher than the preset high-frequency resonance peak, canceling the acoustic response processing of the second microphone model on the second superimposed reference signal.
  • if the bandwidth of the first microphone or the second microphone is relatively wide, that is, the high-frequency resonance peak of the microphone is high, there is no need to use the first microphone model or the second microphone model to perform acoustic response processing on the superimposed reference signal; that is, there is no need to correct the echo reference signal through the first microphone model or the second microphone model.
  • whether the first microphone model 1106a is needed may be judged according to the bandwidth of the first microphone 1102a, that is, by comparing the high-frequency resonance peak of the first microphone 1102a with a preset high-frequency resonance peak.
  • assume that the preset high-frequency resonance peak is 8 kHz. If the high-frequency resonance peak of the first microphone 1102a is 5 kHz, that is, lower than the preset high-frequency resonance peak, the first microphone model 1106a performs acoustic response processing on the first superimposed reference signal, so that a corrected first echo estimation signal is obtained. If the high-frequency resonance peak of the first microphone 1102a is 9 kHz, that is, higher than the preset high-frequency resonance peak, the first microphone model 1106a does not need to perform acoustic response processing on the superimposed reference signal; that is, the first microphone model 1106a does not need to correct the first echo estimation signal, and the step in which the first microphone model 1106a performs acoustic response processing on the first superimposed reference signal can be cancelled.
  • similarly, whether the second microphone model 1106b is needed to correct the second echo estimation signal may be judged according to the bandwidth of the second microphone 1102b, that is, by comparing the high-frequency resonance peak of the second microphone 1102b with the preset high-frequency resonance peak. If the high-frequency resonance peak of the second microphone 1102b is higher than the preset high-frequency resonance peak, the second microphone model 1106b does not need to perform acoustic response processing on the superimposed reference signal, and the step in which the second microphone model 1106b performs acoustic response processing on the second superimposed reference signal may be cancelled.
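  • the per-microphone decision described above can be sketched as follows. This is a hedged illustration: the 8 kHz threshold is the example value from the text, while the function names and the representation of a microphone model as a Python callable are assumptions made here. A peak exactly equal to the preset value is treated as "not higher", so the microphone model is still applied.

```python
PRESET_PEAK_HZ = 8000.0  # example threshold used in the description


def use_microphone_model(mic_peak_hz, preset_peak_hz=PRESET_PEAK_HZ):
    """True when the microphone's high-frequency resonance peak is not higher than the preset."""
    return mic_peak_hz <= preset_peak_hz


def echo_estimate(superimposed_reference, mic_peak_hz, microphone_model,
                  preset_peak_hz=PRESET_PEAK_HZ):
    """Apply the microphone model only for narrow-band microphones.

    microphone_model is a hypothetical callable mapping a list of samples to a
    list of samples; for wide-band microphones the reference is passed through.
    """
    if use_microphone_model(mic_peak_hz, preset_peak_hz):
        return microphone_model(superimposed_reference)
    return list(superimposed_reference)
```

  • in the two-microphone structure of FIG. 11, this check would be made independently for the first microphone 1102a and the second microphone 1102b.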
  • the step of inputting the first audio signal into each speaker model and performing acoustic response processing to obtain an acoustic response signal of each speaker includes: in response to the first audio signal received by the at least one speaker being the same, inputting the first audio signal to the at least one speaker model and performing acoustic response processing to obtain an acoustic response signal of each speaker; and in response to the first audio signals received by the at least one speaker being different, correspondingly inputting the first audio signals to the at least one speaker model and performing acoustic response processing to obtain an acoustic response signal of each speaker.
  • the first audio signals received by the multiple speakers may be the same. In this case, the same audio signal is input to the multiple speakers and is also input to the multiple speaker models, as in the circuit structure 90 shown in FIG. 9. Alternatively, the first audio signals received by the multiple speakers may be different. In this case, the first audio signals are correspondingly input to the multiple speakers and, at the same time, correspondingly input to the multiple speaker models, as in the circuit structure 120 shown in FIG. 12.
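  • the two routing cases can be illustrated with a short sketch. The function name and the representation of speaker models as callables are hypothetical; the sketch only shows one shared signal being fed to every speaker model versus one signal per speaker model.

```python
def route_to_speaker_models(first_audio_signals, speaker_models):
    """Feed the first audio signal(s) to the speaker models.

    A single signal is shared by all models (as in FIG. 9); otherwise there must
    be exactly one signal per model, routed in order (as in FIG. 12).
    """
    if len(first_audio_signals) == 1:
        inputs = list(first_audio_signals) * len(speaker_models)
    elif len(first_audio_signals) == len(speaker_models):
        inputs = list(first_audio_signals)
    else:
        raise ValueError("expected one shared signal or one signal per speaker model")
    return [model(signal) for model, signal in zip(speaker_models, inputs)]
```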
  • the circuit structure 120 includes a speaker group 1201, a microphone 1202, a speaker model group 1203, a delay attenuation module group 1204, a summing module 1205, a microphone model 1206, an echo processing module 1207, a voice processing module 1208, a noise reduction and other processing module 1209, a decoder 1210, an encoder 1211, and a transmitting end 1212.
  • the speaker group 1201 includes a first speaker 1201a, a second speaker 1201b, a third speaker 1201c, and a fourth speaker 1201d.
  • the speaker model group 1203 includes a first speaker model 1203a, a second speaker model 1203b, a third speaker model 1203c, and a fourth speaker model 1203d.
  • the delay attenuation module group 1204 includes a first delay attenuation module 1204a, a second delay attenuation module 1204b, a third delay attenuation module 1204c, and a fourth delay attenuation module 1204d.
  • the circuit structure 120 shown in FIG. 12 is similar to the circuit structure 90 shown in FIG. 9, except that the first audio signal is divided into four first audio signals after passing through the voice processing module 1208. The four first audio signals correspondingly enter and are played by the speaker group 1201, and the first audio signals received by the first speaker 1201a, the second speaker 1201b, the third speaker 1201c, and the fourth speaker 1201d are correspondingly input to the first speaker model 1203a, the second speaker model 1203b, the third speaker model 1203c, and the fourth speaker model 1203d.
  • an echo howling phenomenon generated when a high-frequency resonance peak of a speaker or a microphone is low can be effectively solved, and a calculation workload in a multi-microphone design is also reduced.
  • the method may include steps S1301 to S1305.
  • step S1301 a digital signal is extracted from the front end of the speaker group 1001 to obtain a first audio signal.
  • step S1302 the first audio signal is input to the first speaker model 1003a, the second speaker model 1003b, the third speaker model 1003c, and the fourth speaker model 1003d, respectively, and an acoustic response process is performed to obtain an acoustic response signal of each speaker.
  • step S1303 a reference signal of each speaker is obtained according to the acoustic response signal of each speaker in the speaker group 1001 and the first delay and attenuation information corresponding to the acoustic response signal.
  • step S1304 the reference signal of each speaker is superposed and the number of bits is expanded to obtain a superposed reference signal, and the superposed reference signal is used as an echo estimation signal.
  • step S1305 the echo estimation signal is input to an echo processing module 1006 to perform echo processing on a second audio signal recorded by the microphone 1002.
  • the above process does not consider the modification of the echo estimation signal by the microphone model.
  • the superposed reference signal can be used directly as the echo estimation signal, which is then used to perform echo processing on the second audio signal recorded by the microphone 1002.
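  • the superposition and bit expansion of step S1304 can be illustrated as follows, assuming that "bit expansion" means widening the word length so that the sum of several fixed-point channels cannot overflow. That reading, the 16-bit sample width, and the function names are assumptions made for this sketch, not statements from the disclosure.

```python
def expanded_bit_width(sample_bits, channel_count):
    """Smallest signed word length that holds the sum of channel_count samples."""
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    return sample_bits + (channel_count - 1).bit_length()


def superimpose(channels, sample_bits=16):
    """Sum integer sample channels and report the word length the sum requires."""
    width = expanded_bit_width(sample_bits, len(channels))
    lowest, highest = -(1 << (width - 1)), (1 << (width - 1)) - 1
    summed = [sum(samples) for samples in zip(*channels)]
    # Python integers never overflow; this check mirrors a fixed-point datapath.
    if not all(lowest <= s <= highest for s in summed):
        raise OverflowError("sum does not fit the expanded word length")
    return summed, width
```

  • with four 16-bit reference signals, the sum can reach values such as 120000, which no longer fits in 16 bits; under this reading the summing module would widen the result to 18 bits.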
  • the method may include steps S1401 to S1409.
  • step S1401 a digital signal is extracted from the front end of the speaker group 901 to obtain a first audio signal.
  • step S1402 the first audio signal is input to the first speaker model 903a, the second speaker model 903b, the third speaker model 903c, and the fourth speaker model 903d, respectively, and an acoustic response process is performed to obtain an acoustic response signal of each speaker.
  • step S1403 a reference signal of each speaker is obtained according to an acoustic response signal of each speaker in the speaker group 901 and first delay and attenuation information corresponding to the acoustic response signal.
  • step S1404 the reference signal of each speaker is subjected to superposition and bit expansion processing to obtain a superposed reference signal.
  • step S1405 a high-frequency resonance peak of the microphone 902 is acquired.
  • step S1406 the high-frequency resonance peak of the microphone 902 is compared with a preset high-frequency resonance peak.
  • step S1407 if the high-frequency resonance peak of the microphone 902 is higher than a preset high-frequency resonance peak, the superimposed reference signal is directly used as an echo estimation signal.
  • step S1408 if the high-frequency resonance peak of the microphone 902 is not higher than a preset high-frequency resonance peak, an acoustic response process is performed on the superimposed reference signal based on the microphone model 906 to obtain an echo estimation signal.
  • step S1409 the echo estimation signal is input to an echo processing module 907 to perform echo processing on the second audio signal recorded by the microphone 902.
  • the process shown in FIG. 14 adds a microphone model and a judgment as to whether the microphone model is required to perform acoustic response processing on the superimposed reference signal.
  • a high-frequency resonance peak of the microphone 902 is obtained. Assume that the preset high-frequency resonance peak is 8 kHz. If the high-frequency resonance peak of the microphone 902 is 5 kHz, that is, lower than the preset high-frequency resonance peak, the microphone model 906 is required to perform acoustic response processing on the superimposed reference signal, so that a corrected echo estimation signal is obtained, and echo processing is performed on the second audio signal recorded by the microphone 902 using the corrected echo estimation signal.
  • if the high-frequency resonance peak of the microphone 902 is 9 kHz, that is, higher than the preset high-frequency resonance peak, the microphone model 906 is not required to perform acoustic response processing on the superimposed reference signal; that is, the microphone model 906 is not required to correct the echo estimation signal, and the superimposed reference signal is used directly as the echo estimation signal to perform echo processing on the second audio signal recorded by the microphone 902.
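  • steps S1401 to S1409 can be sketched end to end as follows. This is an illustration under explicit assumptions: the speaker and microphone models are stood in for by short FIR filters, echo processing is reduced to subtracting the echo estimation signal from the microphone signal, and all function and parameter names are hypothetical.

```python
def fir_filter(signal, taps):
    """Causal FIR filter; the output has the same length as the input."""
    return [
        sum(tap * signal[n - k] for k, tap in enumerate(taps) if n - k >= 0)
        for n in range(len(signal))
    ]


def delay_attenuate(signal, delay_samples, gain):
    """Delay by whole samples (zeros prepended, length preserved) and scale."""
    delayed = [0.0] * delay_samples + list(signal)
    return [gain * s for s in delayed[:len(signal)]]


def process_echo(first_audio, mic_signal, speaker_taps, paths, mic_taps,
                 mic_peak_hz, preset_peak_hz=8000.0):
    """Single-microphone flow; paths[s] is the (delay_samples, gain) of speaker s."""
    # S1402: acoustic response of each speaker model (assumed FIR).
    responses = [fir_filter(first_audio, taps) for taps in speaker_taps]
    # S1403: per-speaker reference signals after delay and attenuation.
    references = [delay_attenuate(r, d, g) for r, (d, g) in zip(responses, paths)]
    # S1404: superposition of the reference signals.
    superimposed = [sum(samples) for samples in zip(*references)]
    # S1405 to S1408: apply the microphone model only when the peak is not higher.
    if mic_peak_hz > preset_peak_hz:
        estimate = superimposed
    else:
        estimate = fir_filter(superimposed, mic_taps)
    # S1409: echo processing, assumed here to be a plain subtraction.
    return [m - e for m, e in zip(mic_signal, estimate)]
```

  • when the models match the real acoustic path exactly, the residual equals the near-end signal; in practice the models are approximations and some residual echo would remain.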
  • referring to FIG. 15, there is shown an application example in which the signal processing method according to the embodiment of the present disclosure is applied to the four-speaker, single-microphone circuit structure 120 shown in FIG. 12, and the method may include steps S1501 to S1509.
  • step S1501 one digital signal is extracted from the front end of each speaker in the speaker group 1201 to obtain four first audio signals.
  • step S1502 the four first audio signals are correspondingly input to the first speaker model 1203a, the second speaker model 1203b, the third speaker model 1203c, and the fourth speaker model 1203d, and the acoustic response processing is performed to obtain the acoustic response signal of each speaker.
  • step S1503 a reference signal for each speaker is obtained according to the acoustic response signal of each speaker in the speaker group 1201 and the first delay and attenuation information corresponding to the acoustic response signal.
  • step S1504 the reference signal of each speaker is subjected to superposition and bit expansion processing to obtain a superposed reference signal.
  • step S1505 a high-frequency resonance peak of the microphone 1202 is acquired.
  • step S1506 the high-frequency resonance peak of the microphone 1202 is compared with a preset high-frequency resonance peak.
  • step S1507 if the high-frequency resonance peak of the microphone 1202 is higher than a preset high-frequency resonance peak, the superimposed reference signal is used as an echo estimation signal.
  • step S1508 if the high-frequency resonance peak of the microphone 1202 is not higher than a preset high-frequency resonance peak, an acoustic response process is performed on the superimposed reference signal based on the microphone model 1206 to obtain an echo estimation signal.
  • step S1509 the echo estimation signal is input to an echo processing module 1207 to perform echo processing on the second audio signal recorded by the microphone 1202.
  • the process shown in FIG. 15 considers that the audio signals input by each speaker in the speaker group may be different.
  • the other processing steps are the same as the corresponding processing steps shown in FIG. 14 and will not be described in detail here.
  • the method may include steps S1601 to S1617.
  • step S1601 one digital signal is extracted from the front end of each speaker in the speaker group 1101 to obtain four first audio signals.
  • step S1602 the four first audio signals are correspondingly input to the first speaker model 1103a, the second speaker model 1103b, the third speaker model 1103c, and the fourth speaker model 1103d, and an acoustic response process is performed to obtain an acoustic response signal of each speaker.
  • step S1603 a first reference signal of each speaker is obtained according to an acoustic response signal of each speaker in the speaker group 1101 and first delay and attenuation information corresponding to the acoustic response signal.
  • step S1604 the first reference signal of each speaker is subjected to superposition and bit expansion processing to obtain a first superposed reference signal.
  • step S1605 a high-frequency resonance peak of the first microphone 1102a is acquired.
  • step S1606 the high-frequency resonance peak of the first microphone 1102a is compared with a preset high-frequency resonance peak.
  • step S1607 if the high-frequency resonance peak of the first microphone 1102a is higher than a preset high-frequency resonance peak, the first superimposed reference signal is used as a first echo estimation signal.
  • step S1608 if the high-frequency resonance peak of the first microphone 1102a is not higher than a preset high-frequency resonance peak, an acoustic response process is performed on the first superimposed reference signal based on the first microphone model 1106a to obtain a first Echo estimation signal.
  • step S1609 the first echo estimation signal is input to a first echo processing module 1107a to perform a first echo processing on a second audio signal recorded by the first microphone 1102a.
  • step S1610 the second reference signal of each speaker is obtained according to the acoustic response signal of each speaker in the speaker group 1101 and the second delay and attenuation information corresponding to the acoustic response signal.
  • step S1611 the second reference signal of each speaker is subjected to superposition and bit expansion processing to obtain a second superimposed reference signal.
  • step S1612 a high-frequency resonance peak of the second microphone 1102b is acquired.
  • step S1613 the high-frequency resonance peak of the second microphone 1102b is compared with a preset high-frequency resonance peak.
  • step S1614 if the high-frequency resonance peak of the second microphone 1102b is higher than a preset high-frequency resonance peak, the second superimposed reference signal is used as a second echo estimation signal.
  • step S1615 if the high-frequency resonance peak of the second microphone 1102b is not higher than a preset high-frequency resonance peak, an acoustic response process is performed on the second superimposed reference signal based on the second microphone model 1106b to obtain a second Echo estimation signal.
  • step S1616 the second echo estimation signal is input to a second echo processing module 1107b to perform a second echo processing on the second audio signal recorded by the second microphone 1102b.
  • step S1617 the audio signals after the first echo processing and the second echo processing are input to the noise reduction and other processing module 1109.
  • the reference signal of the echo signal (that is, the echo estimation signal) is obtained for each microphone, and echo processing needs to be performed separately for each microphone (such as by the first echo processing module 1107a in FIG. 11).
  • since the speaker group is shared and the speaker model and the delay attenuation module are provided separately, the speaker model is applied first and the position and distance information between the speaker and each microphone is applied afterwards, so that the audio signal at the front end of the speaker needs to undergo the acoustic response processing of the speaker model only once, thereby helping to reduce the computational workload in a multi-microphone design.
  • the foregoing embodiments perform echo processing on the echo signals of the four speakers.
  • the embodiments of the present disclosure are also suitable for echo processing of other multi-speakers and multi-microphones, as well as echo processing of a single speaker.
  • the number of speakers is not specifically limited.
  • the method may include steps S1701 to S1708.
  • step S1701 a digital signal is extracted from the front end of the speaker 1801 to obtain a first audio signal.
  • step S1702 the first audio signal is input to a speaker model 1803 for acoustic response processing to obtain an acoustic response signal of the speaker 1801.
  • step S1703 a reference signal of the speaker 1801 is obtained according to the acoustic response signal of the speaker 1801 and the delay and attenuation information corresponding to the acoustic response signal.
  • step S1704 a high-frequency resonance peak of the microphone 1802 is acquired.
  • step S1705 the high-frequency resonance peak of the microphone 1802 is compared with a preset high-frequency resonance peak.
  • step S1706 if the high-frequency resonance peak of the microphone 1802 is higher than a preset high-frequency resonance peak, the reference signal is used as an echo estimation signal.
  • step S1707 if the high-frequency resonance peak of the microphone 1802 is not higher than a preset high-frequency resonance peak, an acoustic response process is performed on the reference signal based on the microphone model 1805 to obtain an echo estimation signal.
  • step S1708 the echo estimation signal is input to the adder 1806 to perform echo processing on the second audio signal recorded by the microphone 1802.
  • the circuit structure 180 shown in FIG. 18 is taken as an example.
  • the circuit structure 180 includes a speaker 1801, a microphone 1802, a speaker model 1803, a delay attenuation module 1804, a microphone model 1805, an adder 1806,
  • the speaker model 1803 is correspondingly established based on the speaker 1801, and the delay attenuation module 1804 performs corresponding delay and attenuation on the audio signal after passing through the speaker model 1803.
  • the delay and attenuation information is obtained according to the distance and position information between the speaker 1801 and the microphone 1802.
  • the function of the adder 1806 is the same as that of the echo processing module, and is used to perform echo processing on the echo signal generated by the first audio signal played by the speaker 1801 among the second audio signals recorded by the microphone 1802.
  • the operation process of the circuit structure shown in FIG. 18 is similar to the operation process of the aforementioned multi-speaker circuit structure, and will not be described in detail here.
  • the method may include steps S1901 to S1909.
  • step S1901 one digital signal is extracted from the front ends of the first speaker 2001a and the second speaker 2001b respectively to obtain two first audio signals.
  • step S1902 two channels of the first audio signal are correspondingly input to the first speaker model 2003a and the second speaker model 2003b to perform an acoustic response process to obtain an acoustic response signal of each speaker.
  • step S1903 a reference signal of each speaker is obtained according to the acoustic response signal of each speaker and the first delay attenuation module 2004a and the second delay attenuation module 2004b.
  • step S1904 the reference signal of each speaker is subjected to superposition and bit expansion processing to obtain a superposed reference signal.
  • step S1905 a high-frequency resonance peak of the microphone 2002 is acquired.
  • step S1906 the high-frequency resonance peak of the microphone 2002 is compared with a preset high-frequency resonance peak.
  • step S1907 if the high-frequency resonance peak of the microphone 2002 is higher than a preset high-frequency resonance peak, the superimposed reference signal is used as an echo estimation signal.
  • step S1908 if the high-frequency resonance peak of the microphone 2002 is not higher than a preset high-frequency resonance peak, an acoustic response process is performed on the superimposed reference signal based on the microphone model 2006 to obtain an echo estimation signal.
  • step S1909 the echo estimation signal is input to an echo processing module 2007 to perform echo processing on the second audio signal recorded by the microphone 2002.
  • the circuit structure 200 shown in FIG. 20 is taken as an example.
  • the circuit structure 200 includes a first speaker 2001a, a second speaker 2001b, a microphone 2002, a first speaker model 2003a, a second speaker model 2003b, a first delay attenuation module 2004a, a second delay attenuation module 2004b, a summing module 2005, a microphone model 2006, an adder 2007, a voice processing module 2008, a noise reduction and other processing module 2009, a decoder 2010, an encoder 2011, and a transmitting end 2012.
  • the function of the adder 2007 is the same as that of the echo processing module.
  • the operation process of the circuit structure shown in FIG. 20 is similar to the operation process of the aforementioned multi-speaker circuit structure (such as FIG. 11), and will not be described in detail here.
  • FIGS. 21 to 25 are schematic structural diagrams of a signal processing apparatus according to an embodiment of the present disclosure.
  • the signal processing device 210 may include at least one speaker 2101, at least one microphone 2102, a first receiving part 2103, a first obtaining part 2104, a second receiving part 2105, and a second obtaining part 2106.
  • the first receiving part 2103 is configured to receive a first audio signal and play the first audio signal by the at least one speaker.
  • the first obtaining part 2104 is configured to obtain at least one echo estimation signal corresponding to the first audio signal according to at least one speaker model, at least one microphone model, and the first audio signal, wherein the at least one speaker model is obtained based on the at least one speaker, and the at least one microphone model is obtained based on the at least one microphone.
  • the second receiving part 2105 is configured to receive a second audio signal by using the at least one microphone, wherein the second audio signal includes the echo signal produced when the first audio signal output by the at least one speaker is received by the at least one microphone.
  • the second obtaining part 2106 is configured to remove at least one echo estimation signal corresponding to the first audio signal from the second audio signal to obtain an echo-processed audio signal.
  • the signal processing device 210 may further include a pre-processing component 2107 configured to demodulate and pre-process the first audio signal, where the first audio signal is generated and transmitted by a remote device.
  • the signal processing device 210 may further include a modeling component 2108 configured to: correspondingly establish at least one speaker model according to the characteristic information of the at least one speaker, where the characteristic information of the at least one speaker includes circuit information and structure information corresponding to the at least one speaker; and correspondingly establish at least one microphone model according to the characteristic information of the at least one microphone, where the characteristic information of the at least one microphone includes circuit information and structure information corresponding to the at least one microphone.
  • the first obtaining part 2104 may be configured to: input the first audio signal into each speaker model and perform acoustic response processing to obtain an acoustic response signal of each speaker; obtain a first reference signal for each speaker according to the acoustic response signal of each speaker and the first delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained based on the distance and position information between each speaker and the microphone; perform superposition and bit expansion processing on the first reference signal of each speaker to obtain a first superimposed reference signal; and perform acoustic response processing on the first superimposed reference signal based on the microphone model to obtain a first echo estimation signal of the first audio signal.
  • the signal processing device 210 may further include a first comparison component 2109 configured to: obtain a high-frequency resonance peak of the microphone; compare the high-frequency resonance peak of the microphone with a preset high-frequency resonance peak; and, in response to the high-frequency resonance peak of the microphone being higher than the preset high-frequency resonance peak, cancel the acoustic response processing that the microphone model performs on the first superimposed reference signal.
  • the first obtaining part 2104 may be configured to: input the first audio signal into each speaker model and perform acoustic response processing to obtain an acoustic response signal of each speaker; obtain a first reference signal and a second reference signal for each speaker according to the acoustic response signal of that speaker and the first delay and attenuation information and second delay and attenuation information corresponding to the acoustic response signal, where the first delay and attenuation information is obtained based on the distance and position information between each speaker and the first microphone of the at least one microphone, and the second delay and attenuation information is obtained based on the distance and position information between each speaker and the second microphone of the at least one microphone; perform superposition and bit expansion processing on the first reference signals and the second reference signals of the speakers to obtain a first superimposed reference signal and a second superimposed reference signal; perform acoustic response processing on the first superimposed reference signal based on a first microphone model of the at least one microphone model to obtain a first echo estimation signal of the first audio signal; and perform acoustic response processing on the second superimposed reference signal based on a second microphone model of the at least one microphone model to obtain a second echo estimation signal of the first audio signal.
  • the signal processing device 210 may further include a second comparison component 2110 configured to: obtain a high-frequency resonance peak of the first microphone and a high-frequency resonance peak of the second microphone; compare the high-frequency resonance peak of the first microphone and the high-frequency resonance peak of the second microphone, respectively, with a preset high-frequency resonance peak; in response to the high-frequency resonance peak of the first microphone being higher than the preset high-frequency resonance peak, cancel the acoustic response processing that the first microphone model performs on the first superimposed reference signal; and, in response to the high-frequency resonance peak of the second microphone being higher than the preset high-frequency resonance peak, cancel the acoustic response processing that the second microphone model performs on the second superimposed reference signal.
  • the first obtaining part 2104 may be configured to: in response to the first audio signals received by the at least one speaker being the same, input the first audio signal to the at least one speaker model and perform acoustic response processing to obtain an acoustic response signal of each speaker; and, in response to the first audio signals received by the at least one speaker being different, input each first audio signal to the corresponding speaker model and perform acoustic response processing to obtain an acoustic response signal of each speaker.
  • a “component” may be part of a circuit, part of a processor, part of a program or software, and the like; it may also be a unit, and it may be modular or non-modular.
  • each component in this embodiment may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional modules.
  • if the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this embodiment may be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
  • the foregoing storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.
  • an embodiment of the present disclosure provides a computer storage medium on which a computer program is stored.
  • when the computer program is executed by at least one processor, the at least one processor executes a signal processing method according to embodiments of the present disclosure.
  • the signal processing device 210 may include a network interface 2601, a memory 2602, and a processor 2603.
  • the various components are coupled together by a bus system 2604.
  • the bus system 2604 is used to implement connection and communication between these components.
  • the bus system 2604 may include a data bus, and may further include a power bus, a control bus, and a status signal bus.
  • various buses are labeled as the bus system 2604 in FIG. 26.
  • the network interface 2601 is used to receive and send signals during the process of transmitting and receiving information with other external network elements.
  • the memory 2602 stores a computer program capable of running on the processor 2603.
  • when the processor 2603 runs the computer program, it can execute a signal processing method according to various embodiments of the present disclosure.
  • the memory 2602 in the embodiment of the present disclosure may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • the volatile memory may be Random Access Memory (RAM), which is used as an external cache.
  • many forms of RAM may be used, for example static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
  • the memory 2602 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
  • the processor 2603 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 2603 or an instruction in the form of software.
  • the above-mentioned processor 2603 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in combination with the embodiments of the present disclosure may be directly embodied as being executed by a hardware decoding processor, or may be executed and completed by using a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium that is well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 2602, and the processor 2603 reads the information in the memory 2602 and completes the steps of the foregoing method in combination with its hardware.
  • the embodiments described herein may be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof.
  • the processing unit can be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this disclosure, or a combination thereof.
  • the techniques described herein can be implemented through modules (e.g., procedures, functions, etc.) that perform the functions described herein.
  • Software codes may be stored in a memory and executed by a processor.
  • the memory may be implemented in the processor or external to the processor.
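
The processing attributed above to the first obtaining part 2104 (speaker model, delay and attenuation, superposition, microphone model) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the disclosed implementation: each speaker model and microphone model is assumed to be a short FIR impulse response, and the delay and attenuation information is assumed to reduce to an integer sample delay and a scalar gain per speaker. The names `fir`, `delay_and_attenuate`, `echo_estimate`, and `remove_echo` are hypothetical. The bit expansion step is not modeled, because Python floats stand in for fixed-point samples. The `bypass_mic_model` flag stands for the decision of the comparison component; how that decision is derived from the resonance peaks is left to the caller.

```python
from typing import List, Sequence, Tuple


def fir(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Filter `signal` with FIR `taps`; the output has the same length as `signal`."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def delay_and_attenuate(signal: Sequence[float], delay: int, gain: float) -> List[float]:
    """Delay by `delay` samples and scale by `gain` (assumed form of the
    delay and attenuation information for one speaker-to-microphone path)."""
    n = len(signal)
    kept = max(n - delay, 0)
    return [0.0] * (n - kept) + [gain * s for s in signal[:kept]]


def echo_estimate(
    first_audio: Sequence[float],
    speaker_models: Sequence[Sequence[float]],
    paths: Sequence[Tuple[int, float]],
    mic_model: Sequence[float],
    bypass_mic_model: bool = False,
) -> List[float]:
    """Build an echo estimate for one microphone from the played signal."""
    references = []
    for taps, (delay, gain) in zip(speaker_models, paths):
        response = fir(first_audio, taps)  # acoustic response of this speaker
        references.append(delay_and_attenuate(response, delay, gain))
    # Superposition of the per-speaker reference signals, sample by sample.
    superimposed = [sum(samples) for samples in zip(*references)]
    if bypass_mic_model:
        return superimposed
    return fir(superimposed, mic_model)  # acoustic response of the microphone


def remove_echo(second_audio: Sequence[float], estimate: Sequence[float]) -> List[float]:
    """Subtract the echo estimate from the microphone signal."""
    return [m - e for m, e in zip(second_audio, estimate)]


# Toy data: two speakers, one microphone, all values hypothetical.
first_audio = [1.0, 0.0, 0.0, 0.0]
speaker_models = [[1.0], [0.5]]
paths = [(1, 0.5), (2, 0.25)]  # (delay in samples, gain) per speaker
mic_model = [1.0]

estimate = echo_estimate(first_audio, speaker_models, paths, mic_model)
near_end = [0.25, 0.0, 0.0, 0.5]
second_audio = [n + e for n, e in zip(near_end, estimate)]
cleaned = remove_echo(second_audio, estimate)
```

In this toy case the microphone signal is constructed from the same models used for the estimate, so the subtraction recovers the near-end signal exactly; with real hardware the models are approximations and some residual echo would remain.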

Abstract

Embodiments of the present invention provide a signal processing method, a signal processing device, and a computer storage medium. The method comprises: receiving a first audio signal and playing the first audio signal by at least one loudspeaker; obtaining, according to at least one loudspeaker model, at least one microphone model, and the first audio signal, at least one echo estimation signal corresponding to the first audio signal; receiving a second audio signal using at least one microphone, wherein the second audio signal comprises an echo signal generated by the first audio signal, output by the at least one loudspeaker, and received by the at least one microphone; and removing, from the second audio signal, at least one echo estimation signal corresponding to the first audio signal to obtain an echo-processed audio signal.

Description

Signal processing method, device, and computer storage medium
Technical Field
The present disclosure relates to, but is not limited to, the field of audio signal processing technology.
Background
During a call, people can sometimes hear their own voice. This is mainly because the sound played by the other party's speaker is picked up by that party's microphone (MIC) and transmitted back; the cause is acoustic. Echo therefore generally occurs in duplex scenarios that involve a speaker and a MIC, for example terminal calls, personal computer (PC) Internet calls, personal digital assistant (PDA) Internet calls, and simultaneous recording and playback.
Summary of the Invention
An embodiment of the present disclosure provides a signal processing method applied to a signal processing device having at least one speaker and at least one microphone. The method includes: receiving a first audio signal and playing the first audio signal through the at least one speaker; obtaining at least one echo estimation signal corresponding to the first audio signal according to at least one speaker model, at least one microphone model, and the first audio signal, where the at least one speaker model is obtained based on the at least one speaker and the at least one microphone model is obtained based on the at least one microphone; receiving a second audio signal using the at least one microphone, where the second audio signal includes an echo signal that is generated by the first audio signal, output by the at least one speaker, and received by the at least one microphone; and removing the at least one echo estimation signal corresponding to the first audio signal from the second audio signal to obtain an echo-processed audio signal.
An embodiment of the present disclosure further provides a signal processing device including at least one speaker, at least one microphone, a first receiving part, a first obtaining part, a second receiving part, and a second obtaining part. The first receiving part is configured to receive a first audio signal, which is played through the at least one speaker. The first obtaining part is configured to obtain at least one echo estimation signal corresponding to the first audio signal according to at least one speaker model, at least one microphone model, and the first audio signal, where the at least one speaker model is obtained based on the at least one speaker and the at least one microphone model is obtained based on the at least one microphone. The second receiving part is configured to receive a second audio signal using the at least one microphone, where the second audio signal includes an echo signal that is generated by the first audio signal, output by the at least one speaker, and received by the at least one microphone. The second obtaining part is configured to remove the at least one echo estimation signal corresponding to the first audio signal from the second audio signal to obtain an echo-processed audio signal.
An embodiment of the present disclosure further provides a signal processing device including a memory and a processor, where the memory stores a computer program and, when the processor runs the computer program, the processor executes the signal processing method according to the present disclosure.
An embodiment of the present disclosure further provides a computer storage medium on which a computer program is stored. When the computer program is executed by at least one processor, the at least one processor executes the signal processing method according to the present disclosure.
Brief Description of the Drawings
FIG. 1 shows a schematic diagram of a circuit structure with a single speaker and a single microphone;
FIG. 2 shows a schematic diagram of a circuit structure with a single speaker and dual microphones;
FIG. 3 shows a schematic diagram of a circuit structure with dual speakers and a single microphone;
FIG. 4 shows a schematic diagram comparing the curves of an echo reference signal and a microphone recording signal;
FIG. 5 shows another schematic diagram comparing the curves of an echo reference signal and a microphone recording signal;
FIG. 6 shows a schematic diagram of another circuit structure with dual speakers and a single microphone;
FIG. 7 shows a schematic diagram of a circuit structure with dual speakers and dual microphones;
FIG. 8 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a circuit structure with four speakers and a single microphone according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of another circuit structure with four speakers and a single microphone according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a circuit structure with four speakers and dual microphones according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of another circuit structure with four speakers and a single microphone according to an embodiment of the present disclosure;
FIG. 13 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure;
FIG. 14 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure;
FIG. 15 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure;
FIG. 16 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure;
FIG. 17 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure;
FIG. 18 is a schematic diagram of a circuit structure with a single speaker and a single microphone according to an embodiment of the present disclosure;
FIG. 19 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure;
FIG. 20 is a schematic diagram of a circuit structure with dual speakers and a single microphone according to an embodiment of the present disclosure;
FIG. 21 is a schematic structural diagram of a signal processing device according to an embodiment of the present disclosure;
FIG. 22 is another schematic structural diagram of a signal processing device according to an embodiment of the present disclosure;
FIG. 23 is another schematic structural diagram of a signal processing device according to an embodiment of the present disclosure;
FIG. 24 is another schematic structural diagram of a signal processing device according to an embodiment of the present disclosure;
FIG. 25 is another schematic structural diagram of a signal processing device according to an embodiment of the present disclosure; and
FIG. 26 is a schematic diagram of a hardware structure of a signal processing device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.
Echo is mainly the repetition of sound caused by the reflection of sound waves; that is, the sound emitted by a sound source is reflected back to the position of the source. Electronic devices usually use a microphone and a speaker. While the speaker plays sound data received from the far end, the microphone transmits voice or other sound data from the near end to the far end. In a typical hands-free system the speaker is located close to the microphone, so the sound emitted by the speaker is picked up by the microphone; this is the so-called echo. Without processing, the echo is heard by the user at the far end, producing unexpectedly loud noise and an unpleasant psychoacoustic experience. Echo cancellation technology was therefore introduced to process the echo picked up by the microphone.
Echo cancellation is a technique that processes the speaker-played sound picked up by the microphone so that only sound not played by the speaker is retained. The echo cancellation technique in common use today is single-speaker (earpiece) echo cancellation; stereo (dual-speaker) echo cancellation and echo cancellation for more channels (speakers) are used comparatively rarely.
Single-speaker echo cancellation is used relatively widely; that is, only one speaker plays sound during a call. As an example, FIG. 1 shows an application example of a circuit structure 10 with a single speaker and a single microphone. As shown in FIG. 1, the circuit structure 10 includes a speaker 101, a microphone 102, an adder 103, an adaptive filter (AF) module 104, a voice processing module 105, a noise reduction and other processing module 106, a decoder 107, an encoder 108, and a radio frequency end 109. The radio frequency end 109 sends the received audio signal to the decoder 107, which demodulates it. The demodulated audio signal enters the voice processing module 105 for voice processing such as noise reduction and filtering, and is then played by the speaker 101. The audio signal recorded by the microphone 102 includes an echo signal generated by the audio signal played by the speaker 101. To eliminate this echo signal as far as possible, the audio signal at the front end of the speaker 101 is extracted as a reference signal. The reference signal passes through the AF module 104 and is input to the adder 103, and the audio signal recorded by the microphone 102 is also input to the adder 103. The adder 103 performs subtraction on these two signals, which applies echo processing to the audio signal recorded by the microphone 102. The echo-processed audio signal then undergoes voice processing in the noise reduction and other processing module 106, is modulated in the encoder 108, and is finally sent out by the transmitting end 109 over the transmission line.
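The reference, AF module, and adder arrangement of FIG. 1 can be illustrated with a small adaptive filter. The source does not say which adaptation algorithm the AF module 104 uses; the sketch below assumes normalized least mean squares (NLMS), a common choice, and the function name `nlms_echo_cancel`, the tap count, the step size, and the toy echo path are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def nlms_echo_cancel(
    reference: Sequence[float],
    mic: Sequence[float],
    num_taps: int = 8,
    step: float = 0.5,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float]]:
    """Subtract an adaptively filtered reference from the microphone signal.

    Returns the residual (echo-processed) signal and the final filter weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference sample first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))  # AF output
        error = d - estimate  # the subtraction performed at the adder
        norm = sum(h * h for h in history) + eps
        # NLMS update: move the weights along the normalized error gradient.
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Toy scenario: the microphone hears only the echo of the reference.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.0, 0.6, 0.3]  # hypothetical speaker-to-microphone response
mic = [
    sum(h * reference[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(reference))
]
residual, weights = nlms_echo_cancel(reference, mic)
```

In this noise-free toy case the weights converge toward the hypothetical echo path and the residual decays toward zero. Note that the reference here is the purely digital signal ahead of the speaker, which is the arrangement FIG. 1 describes.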
FIG. 2 shows an application example of a circuit structure 20 with a single speaker and dual microphones. As shown in FIG. 2, the circuit structure 20 includes a speaker 201, a first microphone 202a, a second microphone 202b, a first adder 203a, a second adder 203b, a first AF module 204a, a second AF module 204b, a voice processing module 205, a noise reduction and other processing module 206, a decoder 207, an encoder 208, and a transmitting end 209. The audio signal at the front end of the speaker 201 is selected as the reference signal. On one hand, the reference signal is used, through the first AF module 204a and the first adder 203a, to perform echo processing on the audio signal received by the first microphone 202a; on the other hand, it is used, through the second AF module 204b and the second adder 203b, to perform echo processing on the audio signal received by the second microphone 202b. The echo-processed audio signal is input to the noise reduction and other processing module 206 for noise reduction, is modulated in the encoder 208, and is finally sent out by the transmitting end 209 over the transmission line.
FIG. 3 shows an application example of a circuit structure 30 with dual speakers and a single microphone. As shown in FIG. 3, the circuit structure 30 includes a first speaker 301a, a second speaker 301b, a microphone 302, an adder 303, an AF module 304, a voice processing module 305, a noise reduction and other processing module 306, a decoder 307, an encoder 308, and a transmitting end 309. The audio signals at the front ends of the first speaker 301a and the second speaker 301b are selected as the reference signal for echo processing. The reference signal is used, through the AF module 304 and the adder 303, to perform echo processing on the audio signal received by the microphone 302. The echo-processed audio signal is input to the noise reduction and other processing module 306 for noise reduction, is then modulated in the encoder 308, and is finally sent out by the transmitting end 309 over the transmission line.
Research has found that the echo cancellation techniques above have obvious defects. Referring to FIG. 4 and FIG. 5, when the high-frequency resonance peak of the speaker or microphone frequency response is relatively low, for example when the resonance peak of the speaker is near 4 kHz, the echo signal cannot be removed cleanly; if the hands-free loudness of the speaker is also high, echo howling occurs easily. This is because the reference signal is only the digital signal at the front end of the speaker and does not account for the acoustic effects introduced by the speaker and the microphone. The signal near the resonance peak is amplified, which causes loop amplification of the sound signal near the resonance peak, so the echo cannot be removed. As FIG. 4 shows, the echo reference signal is larger than the microphone recording signal (that is, the echo signal to be processed) and the linearity between the two is good; echo in this case is relatively easy to remove cleanly. As FIG. 5 shows, near 4 kHz the microphone recording signal (that is, the echo signal to be processed) is larger than the echo reference signal, and the echo is then difficult to remove.
In addition, the circuit structures in FIG. 1 and FIG. 2 are suited to single-speaker application scenarios. In a multi-speaker application scenario such as FIG. 3, if only the signal at the front end of one speaker is used as the echo reference signal, without considering the effect of the superposition of sound from multiple speakers, the echo processing performs poorly, and it performs even worse when the speakers play at high loudness.
In multi-channel echo cancellation, because every speaker affects the echo signal, multiple echo processing modules are generally added to the algorithm on the microphone input path. In general, an electronic device uses as many echo processing modules as it has speakers, and each echo processing module takes the audio signal of one speaker as its reference signal. FIG. 6 shows an application example of a circuit structure 60 with dual speakers and a single microphone. As shown in FIG. 6, the circuit structure 60 includes a first speaker 601a, a second speaker 601b, a microphone 602, a first adder 603a, a second adder 603b, a first AF module 604a, a second AF module 604b, a voice processing module 605, a noise reduction and other processing module 606, a decoder 607, an encoder 608, and a transmitting end 609. The audio signals at the front ends of the first speaker 601a and the second speaker 601b are selected as the reference signals for echo processing. The audio signal at the front end of the first speaker 601a passes through the first AF module 604a and is input to the first adder 603a, and the audio signal received by the microphone 602 is also input to the first adder 603a; the first adder 603a performs echo processing on the echo that the audio signal played by the first speaker 601a produces in the audio signal recorded by the microphone 602. The audio signal at the front end of the second speaker 601b passes through the second AF module 604b and is input to the second adder 603b, and the audio signal received by the microphone 602 is also input to the second adder 603b; the second adder 603b performs echo processing on the echo that the audio signal played by the second speaker 601b produces in the audio signal recorded by the microphone 602. The echo-processed audio signal is input to the noise reduction and other processing module 606 for processing, is then modulated in the encoder 608, and is finally sent out by the transmitting end 609 over the transmission line. This approach takes into account the effect of the superposition of multiple speakers on the echo signal, and it performs better than the approach of FIG. 3, which introduces only the digital signal at the front end of one speaker as the echo reference signal. It nevertheless has the same defect described above: because the acoustic effects introduced by the speakers and the microphone are not considered, fairly obvious echo howling still occurs when the resonance peaks of the speakers and the microphone are low. In addition, because multiple echo processing modules are introduced, the more speakers and microphones there are, the more data the echo processing modules involve, and the complexity increases markedly. Furthermore, one microphone may have several echo processing modules, and their mutual influence makes tuning markedly more complex.
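The per-speaker arrangement of FIG. 6 can be sketched as a chain of subtraction stages, one per reference. This is a simplified illustration: fixed FIR taps stand in for AF modules that have already converged, the helper names `fir` and `cascaded_echo_removal` are hypothetical, and all signal values are toy data chosen for the example.

```python
from typing import List, Sequence


def fir(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Filter `signal` with FIR `taps`; the output has the same length as `signal`."""
    return [
        sum(h * signal[n - k] for k, h in enumerate(taps) if n - k >= 0)
        for n in range(len(signal))
    ]


def cascaded_echo_removal(
    mic: Sequence[float],
    references: Sequence[Sequence[float]],
    filters: Sequence[Sequence[float]],
) -> List[float]:
    """Run one filter-and-subtract stage per speaker reference, in series."""
    signal = list(mic)
    for reference, taps in zip(references, filters):
        estimate = fir(reference, taps)  # stand-in for one converged AF module
        signal = [s - e for s, e in zip(signal, estimate)]  # one adder stage
    return signal


# Two speakers, one microphone; every value below is hypothetical.
ref_a = [1.0, 0.0, 0.5, 0.0]
ref_b = [0.0, 1.0, 0.0, 0.0]
path_a = [0.0, 0.5]
path_b = [0.25]
near_end = [0.5, 0.0, 0.0, 0.0]

echo_a = fir(ref_a, path_a)
echo_b = fir(ref_b, path_b)
mic = [n + a + b for n, a, b in zip(near_end, echo_a, echo_b)]

both_stages = cascaded_echo_removal(mic, [ref_a, ref_b], [path_a, path_b])
one_stage = cascaded_echo_removal(mic, [ref_a], [path_a])
```

With both stages the toy near-end signal is recovered, while a single stage leaves the second speaker's echo in the output. The number of stages grows with the number of speakers for each microphone, which is the growth in complexity the text points out.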
In multi-channel echo cancellation technology, the transfer functions between speaker playback and microphone recording are also used for calculation. FIG. 7 shows an application example of a circuit structure 70 with dual speakers and dual microphones. As can be seen from FIG. 7, the left and right two-channel stereo signals XL and XR input to the line input terminals LI(L) and LI(R) do not pass through the sum/difference signal generating device 52; they are output through the sound output terminals SO(L) and SO(R), reproduced by the speakers SP(L) and SP(R), and then picked up by the microphones MC(L) and MC(R) and input to the sound input terminals SI(L) and SI(R). The filters 40-1, 40-2, 40-3, and 40-4 are formed by, for example, FIR filters. The impulse responses set in the filters 40-1, 40-2, 40-3, and 40-4 correspond respectively to the transfer functions between the speakers SP(L), SP(R) and the microphones MC(L), MC(R), and the filters thereby generate the echo processing signals EC1, EC2, EC3, and EC4. The adders 44 and 46 and the subtractors 48 and 50 perform the echo processing. These echo processing signals are output from the line output terminals LO(L) and LO(R), respectively. Here, the sum/difference signal generating device 52 includes an adder 54 and a subtractor 56 for generating a sum signal XM and a difference signal XS. The correlation detection device 59 detects the correlation between the sum signal XM and the difference signal XS based on a correlation value calculation (or a similar calculation). The transfer function calculation device 58 calculates the transfer functions of the four audio transmission systems between the speakers SP(L), SP(R) and the microphones MC(L), MC(R). This technical solution uses the sum signal and the difference signal of the stereo sound signal as reference signals, and calculates the transfer functions of the four audio transmission systems between the two speakers and the two microphones from the cross-spectra of the reference signals and the sound signals recorded by the microphones. The resulting transfer functions are inverse Fourier transformed to obtain impulse responses, which are set in the filter devices to generate the reference signals for echo processing and to perform the echo processing. This solution also takes into account the effect of the acoustic path and of the acoustic structure and components on the echo signal. Moreover, in analyzing the transfer functions, it introduces both the sum and difference signals of the two speaker signals and the echo signals recorded by the microphones, so its consideration is fairly comprehensive. However, obtaining the transfer functions requires the speakers to play an audio signal while the microphones receive it at the same time, which is strongly affected by environmental fluctuations. In addition, the working states of the speakers and microphones are inconsistent across different audio signals and environments, so the echo processing is unsatisfactory in some situations. Furthermore, as the number of speakers and microphones increases, the complexity of the transfer functions increases very markedly.
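As a rough illustration of the cross-spectrum approach described above, the sketch below estimates one transfer function from a single reference frame and a single recorded frame, then applies an inverse DFT to obtain an impulse response. It is a simplification under stated assumptions, not the prior-art implementation: it uses one reference signal instead of the sum and difference signals, treats the frame as circular, does no averaging, and adds a small regularization constant `eps`. All function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT, keeping only the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def estimate_impulse_response(reference: Sequence[float],
                              recorded: Sequence[float],
                              eps: float = 1e-12) -> List[float]:
    """Transfer function as cross-spectrum over auto-spectrum, then inverse DFT.

    Assumption: `recorded` is the circular convolution of `reference` with the
    unknown impulse response over one frame.
    """
    ref_spec = dft(reference)
    rec_spec = dft(recorded)
    transfer = [(y * x.conjugate()) / (abs(x) ** 2 + eps)
                for x, y in zip(ref_spec, rec_spec)]
    return idft_real(transfer)


# Hypothetical frame: a unit impulse played back, and its recorded response.
taps = estimate_impulse_response([1.0, 0.0, 0.0, 0.0], [0.5, 0.25, 0.0, 0.0])
```

Because the estimate depends on what the microphone actually records during playback, any change in the environment changes `recorded` and therefore the taps, which is the sensitivity the paragraph above points out.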
None of the application examples of the circuit structures shown in FIG. 1 to FIG. 3 and FIG. 6 to FIG. 7 introduces an acoustic response model of the speaker or the microphone. As a result, when the high-frequency resonance peak of the speaker or the microphone is at a relatively low frequency (for example, around 4 kHz), echo howling readily occurs because the signal recorded by the microphone has a higher amplitude than the reference signal at the high-frequency resonance peak. The technical solutions of the embodiments of the present disclosure are proposed to effectively solve the echo howling that arises when the high-frequency resonance peak of a speaker or a microphone is relatively low. The embodiments of the present disclosure are described in detail below with reference to the drawings.
FIG. 8 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
Referring to FIG. 8, the signal processing method according to an embodiment of the present disclosure is applied to a signal processing apparatus having at least one speaker and at least one microphone. The method may include steps S801 to S804.
In step S801, a first audio signal is received and is played by the at least one speaker.
In step S802, at least one echo estimation signal corresponding to the first audio signal is obtained according to at least one speaker model, at least one microphone model, and the first audio signal, wherein the at least one speaker model is obtained based on the at least one speaker, and the at least one microphone model is obtained based on the at least one microphone.
In step S803, a second audio signal is received by the at least one microphone, wherein the second audio signal includes an echo signal that is produced by the first audio signal when it is output by the at least one speaker and received by the at least one microphone.
In step S804, the at least one echo estimation signal corresponding to the first audio signal is removed from the second audio signal to obtain an echo-processed audio signal.
The signal processing method shown in FIG. 8 can effectively resolve the echo howling that occurs when the high-frequency resonance peak of the speaker or the microphone is low, and it also reduces the computational workload in multi-microphone designs.
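Step S804 can be illustrated with a minimal sketch. It assumes that removing an echo estimation signal means sample-wise subtraction, which the text does not specify; the function name `remove_echo` and the sample values are hypothetical.

```python
from typing import List, Sequence


def remove_echo(second_audio: Sequence[float],
                echo_estimates: Sequence[Sequence[float]]) -> List[float]:
    """Subtract every echo estimation signal from the microphone signal.

    Assumption: the estimates are already time-aligned with `second_audio`
    and have the same length.
    """
    residual = list(second_audio)
    for estimate in echo_estimates:
        residual = [r - e for r, e in zip(residual, estimate)]
    return residual


# Hypothetical data: near-end speech plus an echo that is estimated exactly.
near_end = [0.25, -0.5, 0.125]
echo = [0.5, 0.25, 0.0]
second_audio = [n + e for n, e in zip(near_end, echo)]
processed = remove_echo(second_audio, [echo])
```

In this idealized case `processed` equals `near_end`; in practice the quality of the result depends on how closely the echo estimation signal from step S802 matches the real echo.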
For the signal processing method shown in FIG. 8, in a possible implementation, before step S801 the method further includes: demodulating the first audio signal and performing voice preprocessing on it, wherein the first audio signal is generated and sent by a remote device.
It should be noted that, in general, the remote device generates and sends the first audio signal. After receiving the first audio signal, the signal processing apparatus demodulates it and performs voice preprocessing. The processed first audio signal then enters the speaker and is played by the speaker.
For the signal processing method shown in FIG. 8, in a possible implementation, before step S802 the method further includes: establishing at least one speaker model according to characteristic information of the at least one speaker, wherein the characteristic information of the at least one speaker includes circuit information and structure information corresponding to the at least one speaker; and establishing at least one microphone model according to characteristic information of the at least one microphone, wherein the characteristic information of the at least one microphone includes circuit information and structure information corresponding to the at least one microphone.
It should be noted that the embodiments of the present disclosure introduce a speaker model and a microphone model. The speaker model is built from the circuit information and structure information of the speaker to simulate the acoustic response of the speaker, and the microphone model is built from the circuit information and structure information of the microphone to simulate the acoustic response of the microphone. This makes the reference signal obtained for the first audio signal more accurate and closer to the echo signal of the first audio signal played by the speaker, so that the echo signal is processed more effectively.
Understandably, the number of microphones may be one or more, which is not specifically limited in the embodiments of the present disclosure. When there is one microphone, one microphone model is correspondingly established. In this implementation, step S802 may include: inputting the first audio signal into each speaker model for acoustic response processing, to obtain an acoustic response signal of each speaker; obtaining a first reference signal of each speaker according to the acoustic response signal of that speaker and first delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained based on the distance and position information between each speaker and the microphone; performing superposition and bit extension processing on the first reference signals of the speakers, to obtain a first superimposed reference signal; and performing acoustic response processing on the first superimposed reference signal based on the microphone model, to obtain a first echo estimation signal of the first audio signal.
It should be noted that, in the embodiments of the present disclosure, the echo estimation signal of the first audio signal is obtained through a series of signal processing steps: the speaker model, delay and attenuation, signal superposition, and the microphone model. This brings the echo estimation signal closer to the echo signal of the first audio signal played by the speaker, so that the subsequent echo processing is more effective.
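The single-microphone chain described above (speaker model, delay and attenuation, superposition, microphone model) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes each speaker model and the microphone model can be approximated by a short FIR filter, whereas the text only says the models are built from circuit and structure information. The names `fir`, `delay_and_attenuate`, and `estimate_echo` and all tap, delay, and gain values are hypothetical.

```python
from typing import List, Sequence


def fir(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Convolve `signal` with FIR `taps`, truncated to the input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def delay_and_attenuate(signal: Sequence[float], delay_samples: int,
                        gain: float) -> List[float]:
    """Shift the signal by `delay_samples` (zero-filled) and scale by `gain`."""
    n = len(signal)
    delayed = [0.0] * min(delay_samples, n) + list(signal[:max(n - delay_samples, 0)])
    return [gain * s for s in delayed]


def estimate_echo(first_audio: Sequence[float],
                  speaker_taps: Sequence[Sequence[float]],
                  delays: Sequence[int],
                  gains: Sequence[float],
                  mic_taps: Sequence[float]) -> List[float]:
    """Speaker model -> delay/attenuation -> superposition -> microphone model."""
    references = []
    for taps, delay, gain in zip(speaker_taps, delays, gains):
        response = fir(first_audio, taps)            # acoustic response of one speaker
        references.append(delay_and_attenuate(response, delay, gain))
    summed = [sum(samples) for samples in zip(*references)]  # superimposed reference
    return fir(summed, mic_taps)                     # acoustic response of the microphone


# Hypothetical two-speaker example driven by a unit impulse.
echo_estimate = estimate_echo(
    first_audio=[1.0, 0.0, 0.0, 0.0],
    speaker_taps=[[1.0], [0.5]],
    delays=[1, 2],
    gains=[0.5, 0.25],
    mic_taps=[1.0],
)
```

The sketch works in floating point, so it leaves out the bit extension step that a fixed-point implementation would need at the superposition stage.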
For example, FIG. 9 shows an application example of a circuit structure 90 with four speakers and a single microphone according to an embodiment of the present disclosure. As shown in FIG. 9, the circuit structure 90 includes a speaker group 901, a microphone 902, a speaker model group 903, a delay attenuation module group 904, a summing module 905, a microphone model 906, an echo processing module 907, a voice processing module 908, a noise reduction and other processing module 909, a decoder 910, an encoder 911, and a transmitting end 912. The speaker group 901 includes a first speaker 901a, a second speaker 901b, a third speaker 901c, and a fourth speaker 901d. The speaker model group 903 includes a first speaker model 903a, a second speaker model 903b, a third speaker model 903c, and a fourth speaker model 903d. The delay attenuation module group 904 includes a first delay attenuation module 904a, a second delay attenuation module 904b, a third delay attenuation module 904c, and a fourth delay attenuation module 904d. The speaker model group 903 is established in correspondence with the speaker group 901, and the delay attenuation module group 904 applies the corresponding delay and attenuation to the audio signals that have passed through the speaker model group 903.
The radio frequency end 912 sends the received first audio signal to the decoder 910, which demodulates it. The demodulated first audio signal enters the voice processing module 908 for preprocessing such as noise reduction and filtering, and is then played by the speaker group 901. The second audio signal recorded by the microphone 902 includes the echo signal produced by the first audio signal played by the speaker group 901. To process this echo signal, the first audio signal is extracted at the front end of the speaker group 901 and input to the first speaker model 903a, the second speaker model 903b, the third speaker model 903c, and the fourth speaker model 903d for acoustic response analysis, yielding an acoustic response signal for each speaker in the speaker group 901. Sound takes time to propagate, and its energy decays along the way. To represent the delay and attenuation of the sound travelling from each speaker to the microphone position (that is, the sound transmission time and the amount of attenuation), the acoustic response signals are input to the first delay attenuation module 904a, the second delay attenuation module 904b, the third delay attenuation module 904c, and the fourth delay attenuation module 904d, respectively, for delay and attenuation processing, yielding four delayed and attenuated first reference signals. The summing module 905 superimposes them to obtain the superimposed reference signal. Because the superposition may overflow, bit extension is also required: the word length can be extended by at least two binary bits, and in general the range is extended by at most a factor of four to prevent overflow after superposition. The superimposed reference signal is input to the microphone model 906, which performs acoustic response processing according to its sound pressure excitation. This yields the echo reference signal as it would be recorded by the microphone 902, that is, the first echo estimation signal of the first audio signal. The first echo estimation signal and the second audio signal recorded by the microphone 902 are both input to the echo processing module 907 for echo processing. The echo-processed audio signal then undergoes noise reduction in the noise reduction and other processing module 909, is modulated in the encoder 911, and is finally sent out by the transmitting end 912 over the transmission line.
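The delay and attenuation applied by each delay attenuation module depend on the speaker-to-microphone geometry. One plausible way to derive them from distance is sketched below. The speed of sound, the inverse-distance attenuation law, the reference distance, and the function names are all assumptions made for illustration; the text only states that the values are obtained from distance and position information.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def delay_in_samples(distance_m: float, sample_rate_hz: float) -> int:
    """Propagation time over `distance_m`, rounded to whole samples."""
    return round(distance_m / SPEED_OF_SOUND_M_S * sample_rate_hz)


def attenuation_gain(distance_m: float, reference_m: float = 0.05) -> float:
    """Assumed inverse-distance law, normalized to 1.0 at `reference_m`."""
    return reference_m / distance_m


# Hypothetical layout: a speaker 0.1 m from the microphone, sampled at 16 kHz.
delay = delay_in_samples(0.1, 16000.0)
gain = attenuation_gain(0.1)
```

With these assumed values the delay is 5 samples and the gain is 0.5. A real device would more likely calibrate both numbers from measurements of its own enclosure.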
It should be noted that the sound signals of multiple speakers (such as the four speakers 901a, 901b, 901c, and 901d in FIG. 9) undergo superposition and bit extension mainly because the superposition may overflow, so the word length must be extended. The superimposed sound signal is input to the echo processing module 907 as the echo estimation signal, where it is used to process the echo that the first audio signal played by the speakers produces in the second audio signal recorded by the microphone 902. Passing through the echo processing module 907 does not increase the amplitude of the original audio signal, so the echo processing module 907 does not need to perform bit extension on the second audio signal recorded by the microphone 902.
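The need for bit extension can be checked with simple arithmetic. The sketch below assumes 16-bit two's-complement samples, which the text does not specify, and shows that summing four full-scale samples wraps around at 16 bits but fits once two extra bits are added. The helper names are hypothetical.

```python
def extra_bits_for_sum(num_signals: int) -> int:
    """Headroom bits needed to add `num_signals` samples of equal width."""
    return (num_signals - 1).bit_length()


def wrap_to_width(value: int, bits: int) -> int:
    """Interpret `value` as a two's-complement integer that is `bits` wide."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


SAMPLE_BITS = 16                         # assumed sample width
samples = [32767, 32767, 32767, 32767]   # four speakers at positive full scale
total = sum(samples)                     # true sum: 131068

overflowed = wrap_to_width(total, SAMPLE_BITS)  # wraps to -4 at 16 bits
extended_bits = SAMPLE_BITS + extra_bits_for_sum(len(samples))
preserved = wrap_to_width(total, extended_bits)  # fits at 18 bits
```

Two extra bits multiply the representable range by four, which matches the four-speaker case in FIG. 9.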
Understandably, when the high-frequency resonance peak of the microphone is relatively high (for example, above 8 kHz), the acoustic response processing that the microphone model applies to the first superimposed reference signal may be cancelled. Therefore, in the above specific implementation, after the step of obtaining the first superimposed reference signal, the method further includes: obtaining the high-frequency resonance peak of the microphone; comparing the high-frequency resonance peak of the microphone with a preset high-frequency resonance peak; and, in response to the high-frequency resonance peak of the microphone being higher than the preset high-frequency resonance peak, cancelling the acoustic response processing of the first superimposed reference signal by the microphone model.
It should be noted that if the bandwidth of the first microphone (such as the microphone 902 in FIG. 9) is relatively wide, that is, if the high-frequency resonance peak of the microphone 902 is relatively high, the microphone model 906 does not need to perform acoustic response processing on the first superimposed reference signal. In other words, the echo estimation signal does not need to be corrected by the microphone model 906, and the first superimposed reference signal can be used directly as the echo estimation signal. For example, with the circuit structure 90 shown in FIG. 9, after the summing module 905 produces the first superimposed reference signal, a decision can be made according to the bandwidth of the microphone 902 by comparing its high-frequency resonance peak with the preset high-frequency resonance peak. Assume that the preset high-frequency resonance peak is 8 kHz. If the high-frequency resonance peak of the microphone 902 is 5 kHz, which is lower than the preset value, the microphone model 906 must perform acoustic response processing on the first superimposed reference signal to obtain a corrected echo estimation signal. If the high-frequency resonance peak of the microphone 902 is 9 kHz, which is higher than the preset value, the microphone model 906 does not need to perform acoustic response processing on the first superimposed reference signal.
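The comparison described above amounts to a small branch. The sketch below uses the 8 kHz preset from the example; treating a peak exactly equal to the preset as still requiring the microphone model is an assumption, since the text only covers the strictly higher and strictly lower cases. The function names and the stand-in microphone model are hypothetical.

```python
from typing import Callable, List, Sequence

PRESET_PEAK_HZ = 8000.0  # example preset high-frequency resonance peak from the text


def apply_microphone_model(mic_peak_hz: float,
                           preset_peak_hz: float = PRESET_PEAK_HZ) -> bool:
    """Return True when the microphone model should process the reference."""
    # Assumption: a peak exactly at the preset still uses the model.
    return mic_peak_hz <= preset_peak_hz


def echo_estimate(summed_reference: Sequence[float],
                  mic_peak_hz: float,
                  mic_model: Callable[[Sequence[float]], List[float]]) -> List[float]:
    """Use the microphone model only when the resonance peak is not above the preset."""
    if apply_microphone_model(mic_peak_hz):
        return mic_model(summed_reference)
    return list(summed_reference)


def halve(signal: Sequence[float]) -> List[float]:
    """Stand-in microphone model used only for this example."""
    return [0.5 * s for s in signal]


with_model = echo_estimate([2.0, 4.0], 5000.0, halve)     # 5 kHz: model applied
without_model = echo_estimate([2.0, 4.0], 9000.0, halve)  # 9 kHz: model skipped
```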
FIG. 10 shows an application example of a circuit structure 100 with four speakers and a single microphone according to an embodiment of the present disclosure. As shown in FIG. 10, the circuit structure 100 includes a speaker group 1001, a microphone 1002, a speaker model group 1003, a delay attenuation module group 1004, a summing module 1005, an echo processing module 1006, a voice processing module 1007, a noise reduction and other processing module 1008, a decoder 1009, an encoder 1010, and a transmitting end 1011. The speaker group 1001 includes a first speaker 1001a, a second speaker 1001b, a third speaker 1001c, and a fourth speaker 1001d. The speaker model group 1003 includes a first speaker model 1003a, a second speaker model 1003b, a third speaker model 1003c, and a fourth speaker model 1003d. The delay attenuation module group 1004 includes a first delay attenuation module 1004a, a second delay attenuation module 1004b, a third delay attenuation module 1004c, and a fourth delay attenuation module 1004d. Compared with the circuit structure 90 shown in FIG. 9, the circuit structure 100 shown in FIG. 10 omits the microphone model. That is, in the circuit structure shown in FIG. 10, the step in which the microphone model performs acoustic response processing on the first superimposed reference signal is cancelled. Once the superimposed reference signal is obtained, it is input directly to the echo processing module 1006 as the echo estimation signal, to perform echo processing on the second audio signal recorded by the microphone 1002. The remaining operations are the same as those of the circuit structure 90 shown in FIG. 9 and are not described again here.
Understandably, the number of microphones may be one or more, which is not specifically limited in the embodiments of the present disclosure. When there are two microphones, two microphone models are correspondingly established. In this implementation, step S802 may include: inputting the first audio signal into each speaker model for acoustic response processing, to obtain an acoustic response signal of each speaker; obtaining a first reference signal and a second reference signal of each speaker according to the acoustic response signal of that speaker, first delay and attenuation information corresponding to the acoustic response signal, and second delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained based on the distance and position information between each speaker and a first microphone of the at least one microphone, and the second delay and attenuation information is obtained based on the distance and position information between each speaker and a second microphone of the at least one microphone; performing superposition and bit extension processing separately on the first reference signals and on the second reference signals of the speakers, to obtain a first superimposed reference signal and a second superimposed reference signal; performing acoustic response processing on the first superimposed reference signal based on a first microphone model of the at least one microphone model, to obtain a first echo estimation signal of the first audio signal; and performing acoustic response processing on the second superimposed reference signal based on a second microphone model of the at least one microphone model, to obtain a second echo estimation signal of the first audio signal.
For example, FIG. 11 shows an application example of a circuit structure 110 with four speakers and dual microphones according to an embodiment of the present disclosure. As shown in FIG. 11, the circuit structure 110 includes a speaker group 1101, a first microphone 1102a, a second microphone 1102b, a speaker model group 1103, a first delay attenuation module group 1104-1, a second delay attenuation module group 1104-2, a first summing module 1105a, a second summing module 1105b, a first microphone model 1106a, a second microphone model 1106b, a first echo processing module 1107a, a second echo processing module 1107b, a voice processing module 1108, a noise reduction and other processing module 1109, a decoder 1111, an encoder 1111, and a transmitting end 1112. The speaker group 1101 includes a first speaker 1101a, a second speaker 1101b, a third speaker 1101c, and a fourth speaker 1101d. The speaker model group 1103 includes a first speaker model 1103a, a second speaker model 1103b, a third speaker model 1103c, and a fourth speaker model 1103d. The first delay attenuation module group 1104-1 includes a first delay attenuation module 1104-1a, a second delay attenuation module 1104-1b, a third delay attenuation module 1104-1c, and a fourth delay attenuation module 1104-1d. The second delay attenuation module group 1104-2 includes a fifth delay attenuation module 1104-2a, a sixth delay attenuation module 1104-2b, a seventh delay attenuation module 1104-2c, and an eighth delay attenuation module 1104-2d. The speaker model group 1103 is established in correspondence with the speaker group 1101, and the first delay attenuation module group 1104-1 and the second delay attenuation module group 1104-2 apply the corresponding delay and attenuation to the audio signals that have passed through the speaker model group 1103.
The radio frequency end 1112 sends the received first audio signal to the decoder 1111, which demodulates it. The demodulated first audio signal enters the voice processing module 1108 for preprocessing such as noise reduction and filtering, and is then played by the speaker group 1101. The second audio signal recorded by the first microphone 1102a includes the echo signal produced by the first audio signal played by the speaker group 1101, and so does the second audio signal recorded by the second microphone 1102b. To eliminate these echo signals as far as possible, the first audio signal is extracted at the front end of the speaker group 1101. If the first audio signals input to the four speakers differ, each extracted first audio signal is input to the corresponding one of the first speaker model 1103a, the second speaker model 1103b, the third speaker model 1103c, and the fourth speaker model 1103d. Acoustic response analysis is then performed to obtain an acoustic response signal for each speaker in the speaker group 1101. Sound takes time to propagate, and its energy decays along the way. To represent the delay and attenuation of the sound travelling from each speaker to each microphone position, the delays and attenuations in the first delay attenuation module group 1104-1 are obtained from the distance and position information between the first microphone 1102a and each speaker in the speaker group 1101, and the delays and attenuations in the second delay attenuation module group 1104-2 are obtained from the distance and position information between the second microphone 1102b and each speaker in the speaker group 1101. The acoustic response signals are input to the first delay attenuation module group 1104-1 and the second delay attenuation module group 1104-2, respectively, for delay and attenuation processing, yielding eight delayed and attenuated sound signals. The first delay attenuation module group 1104-1 yields four delayed and attenuated first reference signals. The first reference signals are input to the first summing module 1105a for superposition and bit extension, to obtain the first superimposed reference signal. The first superimposed reference signal is input to the first microphone model 1106a, which performs acoustic response processing according to its sound pressure excitation, yielding the first echo estimation signal as it would be recorded by the first microphone 1102a. The second delay attenuation module group 1104-2 yields four delayed and attenuated second reference signals. The second reference signals are input to the second summing module 1105b for superposition and bit extension, to obtain the second superimposed reference signal. The second superimposed reference signal is input to the second microphone model 1106b, which performs acoustic response processing according to its sound pressure excitation, yielding the second echo estimation signal as it would be recorded by the second microphone 1102b. The first echo estimation signal and the second audio signal recorded by the first microphone 1102a are both input to the first echo processing module 1107a for echo processing, so that the echo signal produced by the first audio signal played by the speaker group 1101 is removed as far as possible from the second audio signal recorded by the first microphone 1102a. The second echo estimation signal and the second audio signal recorded by the second microphone 1102b are both input to the second echo processing module 1107b for echo processing, so that the echo signal produced by the first audio signal played by the speaker group 1101 is removed as far as possible from the second audio signal recorded by the second microphone 1102b. The echo-processed audio signals then undergo noise reduction in the noise reduction and other processing module 1109, are modulated in the encoder 1111, and are finally sent out by the transmitting end 1112 over the transmission line.
It should be noted that the first echo processing module 1107a and the second echo processing module 1107b do not need to perform bit expansion processing on account of the second audio signals recorded by the first microphone and the second microphone.
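By contrast, the summing modules do expand the word length, because the sum of several full-scale signals no longer fits in the original width. A worked sketch follows, assuming 16-bit signed samples (the source does not state the sample width); the function names are hypothetical.

```python
import math


def expanded_bit_width(sample_bits, channel_count):
    """Bits needed to hold the sum of channel_count signed samples without overflow."""
    if channel_count < 1:
        raise ValueError("need at least one channel")
    return sample_bits + math.ceil(math.log2(channel_count))


def sum_channels(channels):
    """Superimpose equal-length integer channels sample by sample.

    Python integers are unbounded, so nothing is clipped here; a fixed-point
    implementation would store the result in the expanded width.
    """
    return [sum(samples) for samples in zip(*channels)]
```

For four 16-bit channels the sum needs 18 bits: four samples of 32767 add up to 131068, which exceeds the 16-bit maximum but fits in a signed 18-bit word.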
It should also be noted that when there are multiple microphones (such as the first microphone 1102a and the second microphone 1102b in FIG. 11), the reference signal for the echo signal in each microphone (that is, the echo estimation signal) needs to be obtained separately, and echo processing also needs to be performed separately for each microphone (for example, by the first echo processing module 1107a and the second echo processing module 1107b in FIG. 11). As for the acoustic response processing performed by the speaker models, the speaker group is shared and the speaker models are provided separately from the delay attenuation modules. The speaker models are therefore applied first, and the position and distance information between the speakers and each microphone is taken into account afterwards, so that the audio signal at the front end of each speaker needs to undergo the acoustic response processing of its speaker model only once. This helps reduce the computational workload in multi-microphone designs.
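The saving described above can be sketched by counting how often each speaker model runs. This is an illustration under simplifying assumptions: the speaker model is a stand-in that applies a plain gain, the per-path delay is omitted, and all names are hypothetical.

```python
class CountingSpeakerModel:
    """Stand-in speaker model (a plain gain) that counts how often it is run."""

    def __init__(self, gain):
        self.gain = gain
        self.calls = 0

    def process(self, signal):
        self.calls += 1
        return [self.gain * x for x in signal]


def reference_signals(speaker_signals, models, path_gains_per_mic):
    """Run each speaker model once, then reuse its output for every microphone.

    path_gains_per_mic[m][s] is the attenuation from speaker s to microphone m.
    Returns references[m][s], the reference signal of speaker s for microphone m.
    """
    # One pass through each speaker model, independent of the microphone count.
    responses = [model.process(sig) for model, sig in zip(models, speaker_signals)]
    # Only the cheap per-path processing is repeated for each microphone.
    return [
        [[g * x for x in resp] for g, resp in zip(path_gains, responses)]
        for path_gains in path_gains_per_mic
    ]
```

With four speakers and two microphones the models run four times in total, whereas folding the model into each of the eight speaker-to-microphone paths would run them eight times.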
Understandably, when the high-frequency resonance peak of the first microphone or the second microphone is relatively high (for example, above 8 kHz), the acoustic response processing that the corresponding microphone model performs on the first superimposed reference signal or the second superimposed reference signal can be omitted. Therefore, in the above specific implementation, after the step of obtaining the first superimposed reference signal and the second superimposed reference signal, the method further includes: acquiring the high-frequency resonance peak of the first microphone and the high-frequency resonance peak of the second microphone; comparing the high-frequency resonance peak of the first microphone and the high-frequency resonance peak of the second microphone, respectively, with a preset high-frequency resonance peak; in response to the high-frequency resonance peak of the first microphone being higher than the preset high-frequency resonance peak, omitting the acoustic response processing of the first superimposed reference signal by the first microphone model; and in response to the high-frequency resonance peak of the second microphone being higher than the preset high-frequency resonance peak, omitting the acoustic response processing of the second superimposed reference signal by the second microphone model.
It should be noted that if the bandwidth of the first microphone or the second microphone is relatively wide, that is, the high-frequency resonance peak of the microphone is relatively high, the superimposed reference signal does not need to undergo acoustic response processing by the first microphone model or the second microphone model; in other words, the echo reference signal does not need to be corrected by the microphone model. For example, with reference to the circuit structure 110 shown in FIG. 11, after the first superimposed reference signal is obtained by the first summing module 1105a, a judgment can be made according to the bandwidth of the first microphone 1102a, that is, the high-frequency resonance peak of the first microphone 1102a is compared with a preset high-frequency resonance peak. Assume that the preset high-frequency resonance peak is 8 kHz. If the high-frequency resonance peak of the first microphone 1102a is 5 kHz, which is lower than the preset high-frequency resonance peak, the first microphone model 1106a needs to perform acoustic response processing on the first superimposed reference signal, which yields a corrected first echo estimation signal. If the high-frequency resonance peak of the first microphone 1102a is 9 kHz, which is higher than the preset high-frequency resonance peak, the first microphone model 1106a does not need to perform acoustic response processing on the first superimposed reference signal, that is, the first echo estimation signal does not need to be corrected by the first microphone model 1106a, and the step in which the first microphone model 1106a performs acoustic response processing on the first superimposed reference signal can be omitted. The same judgment can be made according to the bandwidth of the second microphone 1102b: the high-frequency resonance peak of the second microphone 1102b is compared with the preset high-frequency resonance peak to determine whether the second echo estimation signal needs to be corrected by the second microphone model 1106b. If the high-frequency resonance peak of the second microphone 1102b is higher than the preset high-frequency resonance peak, the second microphone model 1106b does not need to perform acoustic response processing on the second superimposed reference signal, and that step can be omitted.
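The per-microphone judgment can be sketched as a simple comparison. The 8 kHz preset and the 5 kHz and 9 kHz peaks are the example values from the text; the function name and the dictionary layout are hypothetical, and the case of a peak equal to the preset is treated as "not higher", so the microphone model is applied.

```python
PRESET_PEAK_HZ = 8_000  # example preset high-frequency resonance peak from the text


def microphones_needing_model(mic_peaks_hz, preset_peak_hz=PRESET_PEAK_HZ):
    """Map each microphone to True if its microphone model must be applied.

    The model is bypassed only when the microphone's high-frequency resonance
    peak is strictly higher than the preset value.
    """
    return {name: peak_hz <= preset_peak_hz for name, peak_hz in mic_peaks_hz.items()}


# Example values: first microphone at 5 kHz, second microphone at 9 kHz.
decisions = microphones_needing_model({"1102a": 5_000, "1102b": 9_000})
```

Here `decisions` is `{"1102a": True, "1102b": False}`: the first microphone model corrects its reference signal, while the second superimposed reference signal is used directly as the second echo estimation signal.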
For the signal processing method shown in FIG. 8, in a possible implementation, the step of inputting the first audio signal into each speaker model for acoustic response processing to obtain the acoustic response signal of each speaker includes: in response to the first audio signals received by the at least one speaker being the same, inputting the first audio signal into each of the at least one speaker model and performing acoustic response processing to obtain the acoustic response signal of each speaker; and in response to the first audio signals received by the at least one speaker being different, inputting each first audio signal into its corresponding speaker model and performing acoustic response processing to obtain the acoustic response signal of each speaker.
It should be noted that the first audio signals received by multiple speakers may be the same. In that case, the same audio signal is input to each of the speakers and is also input to each of the speaker models, as in the circuit structure 90 shown in FIG. 9. Alternatively, the first audio signals received by the speakers may be different. In that case, each first audio signal is input to its corresponding speaker and is also input to the corresponding speaker model, as in the circuit structure 120 shown in FIG. 12.
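The two routing cases can be sketched as follows. This is an illustration only; the function name is hypothetical, and a "signal" is represented as a list of samples.

```python
def route_to_speaker_models(first_audio_signals, model_count):
    """Return one input signal per speaker model.

    A single shared signal is fanned out to every model (the FIG. 9 case);
    otherwise signal i is routed to model i (the FIG. 12 case).
    """
    if len(first_audio_signals) == 1:
        return [first_audio_signals[0]] * model_count
    if len(first_audio_signals) != model_count:
        raise ValueError("expected one signal, or one signal per speaker model")
    return list(first_audio_signals)
```

For example, `route_to_speaker_models([[1, 2]], 4)` yields four references to the same signal, while passing four distinct signals returns them in speaker order.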
Referring to FIG. 12, an application example of a circuit structure 120 with four speakers and a single microphone according to an embodiment of the present disclosure is shown. As shown in FIG. 12, the circuit structure 120 includes a speaker group 1201, a microphone 1202, a speaker model group 1203, a delay attenuation module group 1204, a summing module 1205, a microphone model 1206, an echo processing module 1207, a voice processing module 1208, a noise reduction and other processing module 1209, a decoder 1210, an encoder 1211, and a transmitting end 1212. The speaker group 1201 includes a first speaker 1201a, a second speaker 1201b, a third speaker 1201c, and a fourth speaker 1201d. The speaker model group 1203 includes a first speaker model 1203a, a second speaker model 1203b, a third speaker model 1203c, and a fourth speaker model 1203d. The delay attenuation module group 1204 includes a first delay attenuation module 1204a, a second delay attenuation module 1204b, a third delay attenuation module 1204c, and a fourth delay attenuation module 1204d. It should also be noted that the circuit structure 120 shown in FIG. 12 is similar to the circuit structure 90 shown in FIG. 9. The difference is that, after passing through the voice processing module 1208, the first audio signal is divided into four first audio signals, which enter the corresponding speakers of the speaker group 1201 and are played by the speaker group 1201. At the same time, the first audio signals received by the first speaker 1201a, the second speaker 1201b, the third speaker 1201c, and the fourth speaker 1201d are input to the first speaker model 1203a, the second speaker model 1203b, the third speaker model 1203c, and the fourth speaker model 1203d, respectively. The other operations are the same as those of the circuit structure 90 shown in FIG. 9 and are not described in detail here.
The signal processing method provided in this embodiment can effectively address the echo howling that arises when the high-frequency resonance peak of a speaker or a microphone is low, and also reduces the computational workload in multi-microphone designs.
Referring to FIG. 13, an application example is shown in which the signal processing method according to an embodiment of the present disclosure is applied to the four-speaker, single-microphone circuit structure 100 shown in FIG. 10. The method may include steps S1301 to S1305.
In step S1301, one digital signal is extracted from the front end of the speaker group 1001 to obtain a first audio signal.
In step S1302, the first audio signal is input to each of the first speaker model 1003a, the second speaker model 1003b, the third speaker model 1003c, and the fourth speaker model 1003d for acoustic response processing to obtain an acoustic response signal of each speaker.
In step S1303, a reference signal of each speaker is obtained according to the acoustic response signal of each speaker in the speaker group 1001 and the first delay and attenuation information corresponding to that acoustic response signal.
In step S1304, the reference signals of the speakers are subjected to superposition and bit expansion processing to obtain a superimposed reference signal, and the superimposed reference signal is used as the echo estimation signal.
In step S1305, the echo estimation signal is input to the echo processing module 1006 to perform echo processing on the second audio signal recorded by the microphone 1002.
It should be noted that the above process does not include correction of the echo estimation signal by a microphone model. For example, taking the circuit structure 100 shown in FIG. 10 and the foregoing example, once the superimposed reference signal is obtained, it can be used directly as the echo estimation signal, which is then used to perform echo processing on the second audio signal recorded by the microphone 1002.
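Steps S1302 to S1305 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: each speaker model is represented by a short FIR filter, the delay and attenuation values are given rather than derived, and echo processing is assumed to be a sample-wise subtraction, whereas the echo processing module in the source may use a different technique. All function names are hypothetical.

```python
def fir_filter(signal, taps):
    """Causal FIR filter with output the same length as the input (stand-in speaker model)."""
    return [
        sum(t * signal[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(signal))
    ]


def delay_attenuate(signal, delay, gain):
    """Delay by whole samples (zero padded), scale, and keep the original length."""
    return ([0.0] * delay + [gain * x for x in signal])[: len(signal)]


def echo_estimate(first_audio, speaker_taps, paths):
    """S1302 to S1304: speaker models, per-path delay/attenuation, then superposition."""
    references = [
        delay_attenuate(fir_filter(first_audio, taps), delay, gain)
        for taps, (delay, gain) in zip(speaker_taps, paths)
    ]
    return [sum(samples) for samples in zip(*references)]


def echo_process(second_audio, estimate):
    """S1305, assumed here to be a sample-wise subtraction of the echo estimate."""
    return [m - e for m, e in zip(second_audio, estimate)]


# Two speakers for brevity: an impulse is played, the microphone records speech plus echo.
estimate = echo_estimate([1.0, 0.0, 0.0, 0.0], [[1.0], [0.5]], [(1, 0.5), (2, 1.0)])
cleaned = echo_process([0.25, 1.0, 1.25, 1.0], estimate)
```

In this toy case `estimate` is `[0.0, 0.5, 0.5, 0.0]` and `cleaned` is `[0.25, 0.5, 0.75, 1.0]`, the near-end signal with the modeled echo removed.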
Referring to FIG. 14, an application example is shown in which the signal processing method according to an embodiment of the present disclosure is applied to the four-speaker, single-microphone circuit structure 90 shown in FIG. 9. The method may include steps S1401 to S1409.
In step S1401, one digital signal is extracted from the front end of the speaker group 901 to obtain a first audio signal.
In step S1402, the first audio signal is input to each of the first speaker model 903a, the second speaker model 903b, the third speaker model 903c, and the fourth speaker model 903d for acoustic response processing to obtain an acoustic response signal of each speaker.
In step S1403, a reference signal of each speaker is obtained according to the acoustic response signal of each speaker in the speaker group 901 and the first delay and attenuation information corresponding to that acoustic response signal.
In step S1404, the reference signals of the speakers are subjected to superposition and bit expansion processing to obtain a superimposed reference signal.
In step S1405, the high-frequency resonance peak of the microphone 902 is acquired.
In step S1406, the high-frequency resonance peak of the microphone 902 is compared with a preset high-frequency resonance peak.
In step S1407, if the high-frequency resonance peak of the microphone 902 is higher than the preset high-frequency resonance peak, the superimposed reference signal is used directly as the echo estimation signal.
In step S1408, if the high-frequency resonance peak of the microphone 902 is not higher than the preset high-frequency resonance peak, acoustic response processing is performed on the superimposed reference signal based on the microphone model 906 to obtain the echo estimation signal.
In step S1409, the echo estimation signal is input to the echo processing module 907 to perform echo processing on the second audio signal recorded by the microphone 902.
It should be noted that, compared with the process shown in FIG. 13, the process shown in FIG. 14 adds the microphone model and the judgment of whether the microphone model needs to perform acoustic response processing on the superimposed reference signal. For example, taking the circuit structure 90 shown in FIG. 9 and the foregoing example, once the superimposed reference signal is obtained, the high-frequency resonance peak of the microphone 902 is acquired. Assume that the preset high-frequency resonance peak is 8 kHz. If the high-frequency resonance peak of the microphone 902 is 5 kHz, which is lower than the preset high-frequency resonance peak, the microphone model 906 needs to perform acoustic response processing on the superimposed reference signal to obtain a corrected echo estimation signal, and the corrected echo estimation signal is used to perform echo processing on the second audio signal recorded by the microphone 902. If the high-frequency resonance peak of the microphone 902 is 9 kHz, which is higher than the preset high-frequency resonance peak, the microphone model 906 does not need to perform acoustic response processing on the superimposed reference signal, that is, the echo estimation signal does not need to be corrected by the microphone model 906, and the superimposed reference signal is used directly as the echo estimation signal to perform echo processing on the second audio signal recorded by the microphone 902.
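The branch in steps S1406 to S1408 can be sketched as a conditional bypass of the microphone model. The source does not specify the form of the microphone model; the one-pole low-pass filter below is purely a hypothetical stand-in, and the function names and the smoothing coefficient are assumptions.

```python
def one_pole_lowpass(signal, alpha):
    """Hypothetical microphone model: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]."""
    out, prev = [], 0.0
    for x in signal:
        prev = alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


def echo_estimate_for_mic(superimposed, mic_peak_hz, mic_model, preset_peak_hz=8_000):
    """S1406 to S1408: bypass the microphone model when the peak exceeds the preset."""
    if mic_peak_hz > preset_peak_hz:
        # S1407: the superimposed reference signal is used directly.
        return list(superimposed)
    # S1408: the microphone model corrects the superimposed reference signal.
    return mic_model(superimposed)


def mic_model(signal):
    """Example microphone model with an assumed smoothing coefficient of 0.5."""
    return one_pole_lowpass(signal, 0.5)


direct = echo_estimate_for_mic([1.0, 0.0], 9_000, mic_model)
corrected = echo_estimate_for_mic([1.0, 0.0], 5_000, mic_model)
```

With the 9 kHz microphone the estimate stays `[1.0, 0.0]`; with the 5 kHz microphone the stand-in model smooths it to `[0.5, 0.25]`.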
Referring to FIG. 15, an application example is shown in which the signal processing method according to an embodiment of the present disclosure is applied to the four-speaker, single-microphone circuit structure 120 shown in FIG. 12. The method may include steps S1501 to S1509.
In step S1501, one digital signal is extracted from the front end of each speaker in the speaker group 1201 to obtain four first audio signals.
In step S1502, the four first audio signals are input to the corresponding first speaker model 1203a, second speaker model 1203b, third speaker model 1203c, and fourth speaker model 1203d for acoustic response processing to obtain an acoustic response signal of each speaker.
In step S1503, a reference signal of each speaker is obtained according to the acoustic response signal of each speaker in the speaker group 1201 and the first delay and attenuation information corresponding to that acoustic response signal.
In step S1504, the reference signals of the speakers are subjected to superposition and bit expansion processing to obtain a superimposed reference signal.
In step S1505, the high-frequency resonance peak of the microphone 1202 is acquired.
In step S1506, the high-frequency resonance peak of the microphone 1202 is compared with a preset high-frequency resonance peak.
In step S1507, if the high-frequency resonance peak of the microphone 1202 is higher than the preset high-frequency resonance peak, the superimposed reference signal is used as the echo estimation signal.
In step S1508, if the high-frequency resonance peak of the microphone 1202 is not higher than the preset high-frequency resonance peak, acoustic response processing is performed on the superimposed reference signal based on the microphone model 1206 to obtain the echo estimation signal.
In step S1509, the echo estimation signal is input to the echo processing module 1207 to perform echo processing on the second audio signal recorded by the microphone 1202.
It should be noted that, compared with the process shown in FIG. 14, the process shown in FIG. 15 covers the case in which the audio signals input to the individual speakers of the speaker group may differ. The other processing steps are the same as the corresponding steps shown in FIG. 14 and are not described in detail here.
Referring to FIG. 16, an application example is shown in which the signal processing method according to an embodiment of the present disclosure is applied to the four-speaker, dual-microphone circuit structure 110 shown in FIG. 11. The method may include steps S1601 to S1617.
In step S1601, one digital signal is extracted from the front end of each speaker in the speaker group 1101 to obtain four first audio signals.
In step S1602, the four first audio signals are input to the corresponding first speaker model 1103a, second speaker model 1103b, third speaker model 1103c, and fourth speaker model 1103d for acoustic response processing to obtain an acoustic response signal of each speaker.
In step S1603, a first reference signal of each speaker is obtained according to the acoustic response signal of each speaker in the speaker group 1101 and the first delay and attenuation information corresponding to that acoustic response signal.
In step S1604, the first reference signals of the speakers are subjected to superposition and bit expansion processing to obtain a first superimposed reference signal.
In step S1605, the high-frequency resonance peak of the first microphone 1102a is acquired.
In step S1606, the high-frequency resonance peak of the first microphone 1102a is compared with a preset high-frequency resonance peak.
In step S1607, if the high-frequency resonance peak of the first microphone 1102a is higher than the preset high-frequency resonance peak, the first superimposed reference signal is used as the first echo estimation signal.
In step S1608, if the high-frequency resonance peak of the first microphone 1102a is not higher than the preset high-frequency resonance peak, acoustic response processing is performed on the first superimposed reference signal based on the first microphone model 1106a to obtain the first echo estimation signal.
In step S1609, the first echo estimation signal is input to the first echo processing module 1107a to perform first echo processing on the second audio signal recorded by the first microphone 1102a.
In step S1610, a second reference signal of each speaker is obtained according to the acoustic response signal of each speaker in the speaker group 1101 and the second delay and attenuation information corresponding to that acoustic response signal.
In step S1611, the second reference signals of the speakers are subjected to superposition and bit expansion processing to obtain a second superimposed reference signal.
In step S1612, the high-frequency resonance peak of the second microphone 1102b is acquired.
In step S1613, the high-frequency resonance peak of the second microphone 1102b is compared with the preset high-frequency resonance peak.
In step S1614, if the high-frequency resonance peak of the second microphone 1102b is higher than the preset high-frequency resonance peak, the second superimposed reference signal is used as the second echo estimation signal.
In step S1615, if the high-frequency resonance peak of the second microphone 1102b is not higher than the preset high-frequency resonance peak, acoustic response processing is performed on the second superimposed reference signal based on the second microphone model 1106b to obtain the second echo estimation signal.
In step S1616, the second echo estimation signal is input to the second echo processing module 1107b to perform second echo processing on the second audio signal recorded by the second microphone 1102b.
In step S1617, the audio signals that have undergone the first echo processing and the second echo processing are input to the noise reduction and other processing module 1109.
It should be noted that, compared with the process shown in FIG. 15, the process shown in FIG. 16 covers echo processing for two microphones. The other processing steps are basically the same as the corresponding steps shown in FIG. 15 and are not described in detail here.
It should be noted that in a dual-microphone application, the reference signal for the echo signal in each microphone (that is, the echo estimation signal) needs to be obtained separately, and echo processing also needs to be performed separately for each microphone (for example, by the first echo processing module 1107a and the second echo processing module 1107b in FIG. 11). As for the acoustic response processing performed by the speaker models, the speaker group is shared and the speaker models are provided separately from the delay attenuation modules. The speaker models are therefore applied first, and the position and distance information between the speakers and each microphone is taken into account afterwards, so that the audio signal at the front end of each speaker needs to undergo the acoustic response processing of its speaker model only once. This helps reduce the computational workload in multi-microphone designs.
The foregoing embodiments perform echo processing on the echo signals of four speakers. The embodiments of the present disclosure are also applicable to echo processing with other numbers of speakers and microphones, as well as to echo processing with a single speaker. The number of speakers is not specifically limited in the embodiments of the present disclosure.
Referring to FIG. 17, an application example is shown in which the signal processing method according to an embodiment of the present disclosure is applied to the single-speaker, single-microphone circuit structure 180 shown in FIG. 18. The method may include steps S1701 to S1708.
In step S1701, one digital signal is extracted from the front end of the speaker 1801 to obtain a first audio signal.
In step S1702, the first audio signal is input to the speaker model 1803 for acoustic response processing to obtain an acoustic response signal of the speaker 1801.
In step S1703, a reference signal of the speaker 1801 is obtained according to the acoustic response signal of the speaker 1801 and the delay and attenuation information corresponding to that acoustic response signal.
In step S1704, the high-frequency resonance peak of the microphone 1802 is acquired.
In step S1705, the high-frequency resonance peak of the microphone 1802 is compared with a preset high-frequency resonance peak.
In step S1706, if the high-frequency resonance peak of the microphone 1802 is higher than the preset high-frequency resonance peak, the reference signal is used as the echo estimation signal.
In step S1707, if the high-frequency resonance peak of the microphone 1802 is not higher than the preset high-frequency resonance peak, acoustic response processing is performed on the reference signal based on the microphone model 1805 to obtain the echo estimation signal.
In step S1708, the echo estimation signal is input to the adder 1806 to perform echo processing on the second audio signal recorded by the microphone 1802.
For example, as shown in FIG. 18, the circuit structure 180 includes a speaker 1801, a microphone 1802, a speaker model 1803, a delay attenuation module 1804, a microphone model 1805, an adder 1806, a voice processing module 1807, a noise reduction and other processing module 1808, a decoder 1809, an encoder 1810, and a transmitting end 1811. The speaker model 1803 is established to correspond to the speaker 1801, and the delay attenuation module 1804 applies the corresponding delay and attenuation to the audio signal output by the speaker model 1803. The delay and attenuation information is obtained according to the distance and position information between the speaker 1801 and the microphone 1802. The adder 1806 has the same function as the echo processing module: it performs echo processing on the echo signal that the first audio signal played by the speaker 1801 produces in the second audio signal recorded by the microphone 1802. The operation of the circuit structure shown in FIG. 18 is similar to that of the multi-speaker circuit structures described above and is not described in detail here.
Referring to FIG. 19, an application example is shown in which the signal processing method according to an embodiment of the present disclosure is applied to the dual-speaker, single-microphone circuit structure 200 shown in FIG. 20. The method may include steps S1901 to S1909.
In step S1901, one digital signal is extracted from the front end of each of the first speaker 2001a and the second speaker 2001b, yielding two first audio signals.
In step S1902, the two first audio signals are input to the first speaker model 2003a and the second speaker model 2003b, respectively, for acoustic response processing, to obtain the acoustic response signal of each speaker.
In step S1903, the reference signal of each speaker is obtained from that speaker's acoustic response signal by means of the first delay attenuation module 2004a and the second delay attenuation module 2004b.
In step S1904, the reference signals of the speakers are superimposed and subjected to bit-width extension to obtain a superimposed reference signal.
In step S1905, the high-frequency resonance peak of the microphone 2002 is acquired.
In step S1906, the high-frequency resonance peak of the microphone 2002 is compared with a preset high-frequency resonance peak.
In step S1907, if the high-frequency resonance peak of the microphone 2002 is higher than the preset high-frequency resonance peak, the superimposed reference signal is used as the echo estimation signal.
In step S1908, if the high-frequency resonance peak of the microphone 2002 is not higher than the preset high-frequency resonance peak, acoustic response processing is performed on the superimposed reference signal based on the microphone model 2006 to obtain the echo estimation signal.
In step S1909, the echo estimation signal is input to the echo processing module 2007 to perform echo processing on the second audio signal recorded by the microphone 2002.
It should be noted that, taking the circuit structure 200 shown in FIG. 20 as an example, the circuit structure 200 includes a first speaker 2001a, a second speaker 2001b, a microphone 2002, a first speaker model 2003a, a second speaker model 2003b, a first delay attenuation module 2004a, a second delay attenuation module 2004b, a summing module 2005, a microphone model 2006, an adder 2007, a speech processing module 2008, a noise reduction and other processing module 2009, a decoder 2010, an encoder 2011, and a transmitting end 2012. The adder 2007 has the same function as the echo processing module. The operation of the circuit structure shown in FIG. 20 is similar to that of the multi-speaker circuit structure described above (for example, FIG. 11) and is not detailed again here.
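Steps S1901 to S1909 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the disclosed implementation: the speaker and microphone models are assumed to be FIR filters, the delay and attenuation are assumed to be a sample shift and a scalar gain, and echo processing is assumed to be a plain subtraction. All function and parameter names (`fir`, `delay_attenuate`, `estimate_echo`, `cancel_echo`) are hypothetical.

```python
from typing import List, Sequence


def fir(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter; the output has the same length as the input."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def delay_attenuate(signal: Sequence[float], delay_samples: int, gain: float) -> List[float]:
    """Shift the signal right by delay_samples and scale it by gain."""
    n = len(signal)
    shifted = [0.0] * min(delay_samples, n) + list(signal[: max(n - delay_samples, 0)])
    return [gain * x for x in shifted]


def estimate_echo(first_audio, speaker_taps, delays, gains,
                  mic_taps, mic_peak_hz, preset_peak_hz) -> List[float]:
    # S1902: pass each speaker's signal through its speaker model.
    responses = [fir(x, h) for x, h in zip(first_audio, speaker_taps)]
    # S1903: apply the per-speaker delay and attenuation.
    refs = [delay_attenuate(r, d, g) for r, d, g in zip(responses, delays, gains)]
    # S1904: superimpose; Python floats stand in for the widened word length.
    summed = [sum(samples) for samples in zip(*refs)]
    # S1905 to S1907: bypass the microphone model when the peak is high enough.
    if mic_peak_hz > preset_peak_hz:
        return summed
    # S1908: otherwise shape the reference with the microphone model.
    return fir(summed, mic_taps)


def cancel_echo(second_audio: Sequence[float], echo_estimate: Sequence[float]) -> List[float]:
    # S1909: remove the echo estimate from the recorded signal.
    return [m - e for m, e in zip(second_audio, echo_estimate)]
```

The branch on `mic_peak_hz` mirrors the comparison in steps S1906 to S1908: the microphone model is applied only when the measured peak does not exceed the preset value.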
The above embodiments describe in detail the specific implementation of the foregoing embodiments. As can be seen, the technical solutions of the foregoing embodiments effectively suppress the echo howling that arises when the high-frequency resonance peak of a speaker or microphone is low, and also reduce the computational workload in multi-microphone designs.
FIGS. 21 to 25 are schematic structural diagrams of a signal processing device according to embodiments of the present disclosure.
Referring to FIG. 21, the composition of a signal processing device 210 according to an embodiment of the present disclosure is shown. The signal processing device 210 may include at least one speaker 2101, at least one microphone 2102, a first receiving component 2103, a first acquiring component 2104, a second receiving component 2105, and a second acquiring component 2106.
The first receiving component 2103 is configured to receive a first audio signal, which is played by the at least one speaker.
The first acquiring component 2104 is configured to acquire at least one echo estimation signal corresponding to the first audio signal according to at least one speaker model, at least one microphone model, and the first audio signal, wherein the at least one speaker model is obtained based on the at least one speaker, and the at least one microphone model is obtained based on the at least one microphone.
The second receiving component 2105 is configured to receive a second audio signal using the at least one microphone, wherein the second audio signal includes an echo signal that is produced by the first audio signal, output by the at least one speaker, and received by the at least one microphone.
The second acquiring component 2106 is configured to remove the at least one echo estimation signal corresponding to the first audio signal from the second audio signal to obtain an echo-processed audio signal.
Referring to FIG. 22, the signal processing device 210 may further include a preprocessing component 2107 configured to perform demodulation and speech preprocessing on the first audio signal, wherein the first audio signal is generated and sent by a remote device.
Referring to FIG. 23, the signal processing device 210 may further include a modeling component 2108 configured to: establish at least one corresponding speaker model according to characteristic information of the at least one speaker, wherein the characteristic information of the at least one speaker includes circuit information and structure information corresponding to the at least one speaker; and establish at least one corresponding microphone model according to characteristic information of the at least one microphone, wherein the characteristic information of the at least one microphone includes circuit information and structure information corresponding to the at least one microphone.
The first acquiring component 2104 may be configured to: input the first audio signal to each speaker model for acoustic response processing to obtain the acoustic response signal of each speaker; obtain a first reference signal of each speaker according to the acoustic response signal of that speaker and first delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained based on the distance and position information between each speaker and the microphone; superimpose the first reference signals of the speakers and apply bit-width extension to obtain a first superimposed reference signal; and perform acoustic response processing on the first superimposed reference signal based on the microphone model to obtain a first echo estimation signal of the first audio signal.
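The disclosure states only that the delay and attenuation information is obtained from the distance and position information between each speaker and the microphone; it does not give a formula. One plausible sketch, assuming free-field propagation at a fixed speed of sound and an inverse-distance gain relative to a reference distance, is shown below. The constant `SPEED_OF_SOUND_M_S`, the function `delay_and_gain`, and the parameter `ref_distance_m` are hypothetical and not taken from the source.

```python
import math

# Assumption: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_S = 343.0


def delay_and_gain(speaker_pos, mic_pos, sample_rate_hz, ref_distance_m=0.01):
    """Return (delay in samples, linear gain) for one speaker-to-microphone path.

    Assumes free-field propagation: the delay is the travel time rounded to
    whole samples, and the gain falls off as ref_distance_m / distance,
    capped at 1.0.
    """
    distance = math.dist(speaker_pos, mic_pos)
    if distance == 0:
        return 0, 1.0
    delay_samples = round(distance / SPEED_OF_SOUND_M_S * sample_rate_hz)
    gain = min(1.0, ref_distance_m / distance)
    return delay_samples, gain
```

In a real handset the path is not free-field, so measured values would likely replace this geometric estimate; the sketch only shows how distance and position could map to the two parameters.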
Referring to FIG. 24, the signal processing device 210 may further include a first comparison component 2109 configured to: acquire the high-frequency resonance peak of the microphone; compare the high-frequency resonance peak of the microphone with a preset high-frequency resonance peak; and, in response to the high-frequency resonance peak of the microphone being higher than the preset high-frequency resonance peak, cancel the acoustic response processing that the microphone model performs on the first superimposed reference signal.
The first acquiring component 2104 may be configured to: input the first audio signal to each speaker model for acoustic response processing to obtain the acoustic response signal of each speaker; obtain a first reference signal and a second reference signal of each speaker according to the acoustic response signal of that speaker, first delay and attenuation information corresponding to the acoustic response signal, and second delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained based on the distance and position information between each speaker and a first microphone of the at least one microphone, and the second delay and attenuation information is obtained based on the distance and position information between each speaker and a second microphone of the at least one microphone; superimpose the first reference signals and the second reference signals of the speakers separately and apply bit-width extension to obtain a first superimposed reference signal and a second superimposed reference signal; perform acoustic response processing on the first superimposed reference signal based on a first microphone model of the at least one microphone model to obtain a first echo estimation signal of the first audio signal; and perform acoustic response processing on the second superimposed reference signal based on a second microphone model of the at least one microphone model to obtain a second echo estimation signal of the first audio signal.
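The two-microphone configuration reuses each speaker's acoustic response signal and only changes the delay and attenuation per microphone. The sketch below illustrates that reuse and why the superposition needs bit-width extension, assuming 16-bit integer PCM samples. It stops at the superimposed reference signals and omits the microphone-model filtering. The names `delay_attenuate`, `references_per_microphone`, and `signed_bits` are hypothetical.

```python
def delay_attenuate(signal, delay_samples, gain):
    """Shift integer samples right by delay_samples and scale them by gain."""
    n = len(signal)
    shifted = [0] * min(delay_samples, n) + list(signal[: max(n - delay_samples, 0)])
    return [int(round(gain * x)) for x in shifted]


def references_per_microphone(responses, paths):
    """Build one superimposed reference signal per microphone.

    responses: one acoustic response signal per speaker (computed once).
    paths[m][s]: (delay_samples, gain) from speaker s to microphone m.
    Python integers do not overflow, so the sums model a widened accumulator.
    """
    superimposed = []
    for mic_paths in paths:
        refs = [delay_attenuate(r, d, g) for r, (d, g) in zip(responses, mic_paths)]
        superimposed.append([sum(samples) for samples in zip(*refs)])
    return superimposed


def signed_bits(value):
    """Smallest two's-complement width that can hold value."""
    magnitude = value if value >= 0 else -value - 1
    return magnitude.bit_length() + 1
```

With two speakers both near full scale, the sum for the first microphone needs 17 bits, which is why a 16-bit word would clip without the extension.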
Referring to FIG. 25, the signal processing device 210 may further include a second comparison component 2110 configured to: acquire the high-frequency resonance peak of the first microphone and the high-frequency resonance peak of the second microphone; compare each of the two high-frequency resonance peaks with a preset high-frequency resonance peak; in response to the high-frequency resonance peak of the first microphone being higher than the preset high-frequency resonance peak, cancel the acoustic response processing that the first microphone model performs on the first superimposed reference signal; and, in response to the high-frequency resonance peak of the second microphone being higher than the preset high-frequency resonance peak, cancel the acoustic response processing that the second microphone model performs on the second superimposed reference signal.
The first acquiring component 2104 may be configured to: in response to the first audio signals received by the at least one speaker being the same, input the first audio signal to each of the at least one speaker model separately and perform acoustic response processing to obtain the acoustic response signal of each speaker; and, in response to the first audio signals received by the at least one speaker being different, input each first audio signal to its corresponding speaker model and perform acoustic response processing to obtain the acoustic response signal of each speaker.
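The same-signal and different-signal cases differ only in how the first audio signal is routed to the speaker models. A minimal sketch, assuming the shared case is represented by a single channel and the different case by one channel per speaker, is given below. The function name `route_to_speaker_models` is hypothetical.

```python
def route_to_speaker_models(first_audio_channels, num_speakers):
    """Return the input signal for each speaker model.

    One channel: every speaker plays the same signal, so it is fed to
    each speaker model. Otherwise the channels must match the speakers
    one to one.
    """
    if len(first_audio_channels) == 1:
        return [list(first_audio_channels[0]) for _ in range(num_speakers)]
    if len(first_audio_channels) != num_speakers:
        raise ValueError("expected one channel, or one channel per speaker")
    return [list(channel) for channel in first_audio_channels]
```

In the shared case only one digital signal has to be extracted, while each speaker model still produces its own acoustic response signal.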
It will be understood that, in this embodiment, a "component" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a unit, and may be modular or non-modular.
In addition, the components in this embodiment may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional module.
If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Accordingly, an embodiment of the present disclosure provides a computer storage medium storing a computer program; when the computer program is executed by at least one processor, the at least one processor performs the signal processing method according to the embodiments of the present disclosure.
Referring to FIG. 26, the hardware structure of the signal processing device 210 according to an embodiment of the present disclosure is shown. The signal processing device 210 may include a network interface 2601, a memory 2602, and a processor 2603. The components are coupled together by a bus system 2604, which provides the connections and communication between them. The bus system 2604 may include a data bus and may further include a power bus, a control bus, and a status signal bus. For clarity, all of these buses are labeled as the bus system 2604 in FIG. 26.
The network interface 2601 is used to receive and send signals while exchanging information with other external network elements.
The memory 2602 stores a computer program that can run on the processor 2603.
When running the computer program, the processor 2603 can perform the signal processing method according to the embodiments of the present disclosure.
It will be understood that the memory 2602 in the embodiments of the present disclosure may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDRSDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DRRAM). The memory 2602 of the systems and methods described herein is intended to include, without being limited to, these and any other suitable types of memory.
The processor 2603 may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method may be completed by an integrated logic circuit in hardware in the processor 2603 or by instructions in the form of software. The processor 2603 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present disclosure. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present disclosure may be carried out directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium well established in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 2602; the processor 2603 reads the information in the memory 2602 and completes the steps of the above method in combination with its hardware.
It will be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSP Devices, DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in the present disclosure, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by modules (for example, procedures or functions) that perform the functions described herein. The software code may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to it.
It should be noted that the technical solutions described in the embodiments of the present disclosure may be combined arbitrarily, provided that they do not conflict.
The above are only specific implementations of the present disclosure, and the protection scope of the present disclosure is not limited to them. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed herein shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be defined by the claims.

Claims (18)

  1. A signal processing method, applied to a signal processing device having at least one speaker and at least one microphone, the method comprising:
    receiving a first audio signal and playing the first audio signal through the at least one speaker;
    acquiring at least one echo estimation signal corresponding to the first audio signal according to at least one speaker model, at least one microphone model, and the first audio signal, wherein the at least one speaker model is obtained based on the at least one speaker, and the at least one microphone model is obtained based on the at least one microphone;
    receiving a second audio signal using the at least one microphone, wherein the second audio signal includes an echo signal that is produced by the first audio signal, output by the at least one speaker, and received by the at least one microphone; and
    removing the at least one echo estimation signal corresponding to the first audio signal from the second audio signal to obtain an echo-processed audio signal.
  2. The signal processing method according to claim 1, wherein, before the step of receiving a first audio signal and playing the first audio signal through the at least one speaker, the method further comprises:
    performing demodulation and speech preprocessing on the first audio signal, wherein the first audio signal is generated and sent by a remote device.
  3. The signal processing method according to claim 1, wherein, before the step of acquiring at least one echo estimation signal corresponding to the first audio signal according to at least one speaker model, at least one microphone model, and the first audio signal, the method further comprises:
    establishing at least one corresponding speaker model according to characteristic information of the at least one speaker, wherein the characteristic information of the at least one speaker includes circuit information and structure information corresponding to the at least one speaker; and
    establishing at least one corresponding microphone model according to characteristic information of the at least one microphone, wherein the characteristic information of the at least one microphone includes circuit information and structure information corresponding to the at least one microphone.
  4. The signal processing method according to claim 3, wherein the number of the at least one microphone is one, the number of corresponding microphone models is one, and the step of acquiring at least one echo estimation signal corresponding to the first audio signal according to at least one speaker model, at least one microphone model, and the first audio signal comprises:
    inputting the first audio signal to each speaker model for acoustic response processing to obtain the acoustic response signal of each speaker;
    obtaining a first reference signal of each speaker according to the acoustic response signal of that speaker and first delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained based on the distance and position information between each speaker and the microphone;
    superimposing the first reference signals of the speakers and applying bit-width extension to obtain a first superimposed reference signal; and
    performing acoustic response processing on the first superimposed reference signal based on the microphone model to obtain a first echo estimation signal of the first audio signal.
  5. The signal processing method according to claim 4, wherein, after the step of obtaining a first superimposed reference signal, the method further comprises:
    acquiring the high-frequency resonance peak of the microphone;
    comparing the high-frequency resonance peak of the microphone with a preset high-frequency resonance peak; and
    in response to the high-frequency resonance peak of the microphone being higher than the preset high-frequency resonance peak, canceling the acoustic response processing that the microphone model performs on the first superimposed reference signal.
  6. The signal processing method according to claim 3, wherein the number of the at least one microphone is two, the number of corresponding microphone models is two, and the step of acquiring at least one echo estimation signal corresponding to the first audio signal according to at least one speaker model, at least one microphone model, and the first audio signal comprises:
    inputting the first audio signal to each speaker model for acoustic response processing to obtain the acoustic response signal of each speaker;
    obtaining a first reference signal and a second reference signal of each speaker according to the acoustic response signal of that speaker, first delay and attenuation information corresponding to the acoustic response signal, and second delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained based on the distance and position information between each speaker and a first microphone of the at least one microphone, and the second delay and attenuation information is obtained based on the distance and position information between each speaker and a second microphone of the at least one microphone;
    superimposing the first reference signals and the second reference signals of the speakers separately and applying bit-width extension to obtain a first superimposed reference signal and a second superimposed reference signal;
    performing acoustic response processing on the first superimposed reference signal based on a first microphone model of the at least one microphone model to obtain a first echo estimation signal of the first audio signal; and
    performing acoustic response processing on the second superimposed reference signal based on a second microphone model of the at least one microphone model to obtain a second echo estimation signal of the first audio signal.
  7. The signal processing method according to claim 6, wherein, after the step of obtaining a first superimposed reference signal and a second superimposed reference signal, the method further comprises:
    acquiring the high-frequency resonance peak of the first microphone and the high-frequency resonance peak of the second microphone;
    comparing each of the high-frequency resonance peak of the first microphone and the high-frequency resonance peak of the second microphone with a preset high-frequency resonance peak;
    in response to the high-frequency resonance peak of the first microphone being higher than the preset high-frequency resonance peak, canceling the acoustic response processing that the first microphone model performs on the first superimposed reference signal; and
    in response to the high-frequency resonance peak of the second microphone being higher than the preset high-frequency resonance peak, canceling the acoustic response processing that the second microphone model performs on the second superimposed reference signal.
  8. The signal processing method according to any one of claims 4 to 7, wherein the step of inputting the first audio signal into each speaker model for acoustic response processing to obtain the acoustic response signal of each speaker comprises:
    in response to the first audio signals received by the at least one speaker being the same, inputting the first audio signal into each of the at least one speaker model and performing acoustic response processing to obtain the acoustic response signal of each speaker; and
    in response to the first audio signals received by the at least one speaker being different, inputting each first audio signal into the corresponding one of the at least one speaker model and performing acoustic response processing to obtain the acoustic response signal of each speaker.
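The two branches of claim 8 can be read as a routing rule between first audio signals and speaker models. The following is a minimal sketch of that reading, not the applicant's implementation: the name `speaker_responses`, the representation of a speaker model as a plain callable, and the convention that a single-element `signals` list means "all speakers receive the same signal" are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Signal = List[float]
SpeakerModel = Callable[[Signal], Signal]  # hypothetical: maps input audio to an acoustic response


def speaker_responses(signals: Sequence[Signal],
                      models: Sequence[SpeakerModel]) -> List[Signal]:
    """Return one acoustic response signal per speaker model."""
    if len(signals) == 1:
        # Same first audio signal for every speaker: feed it to each model.
        return [model(signals[0]) for model in models]
    if len(signals) != len(models):
        raise ValueError("need exactly one signal per speaker model")
    # Different signals: signal i is routed to speaker model i.
    return [model(sig) for model, sig in zip(models, signals)]
```

With two placeholder models (for example, one that doubles and one that halves each sample), a single shared signal yields two responses, while two distinct signals are each processed only by their own model.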
  9. A signal processing device, comprising at least one speaker, at least one microphone, a first receiving component, a first obtaining component, a second receiving component, and a second obtaining component, wherein:
    the first receiving component is configured to receive a first audio signal, the first audio signal being played by the at least one speaker;
    the first obtaining component is configured to obtain at least one echo estimation signal corresponding to the first audio signal according to at least one speaker model, at least one microphone model, and the first audio signal, wherein the at least one speaker model is obtained based on the at least one speaker, and the at least one microphone model is obtained based on the at least one microphone;
    the second receiving component is configured to receive a second audio signal using the at least one microphone, wherein the second audio signal includes an echo signal that is generated from the first audio signal, output by the at least one speaker, and received by the at least one microphone; and
    the second obtaining component is configured to remove the at least one echo estimation signal corresponding to the first audio signal from the second audio signal to obtain an echo-processed audio signal.
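The final step of claim 9, removing the echo estimation signal from the second audio signal, can be illustrated as a sample-wise subtraction. The claim only says "remove", so the subtraction, the function name `remove_echo`, and the handling of unequal lengths below are assumptions of this sketch.

```python
from typing import List


def remove_echo(mic_signal: List[float], echo_estimate: List[float]) -> List[float]:
    """Subtract the echo estimate from the microphone signal, sample by sample.

    Samples of mic_signal beyond the end of echo_estimate are passed through
    unchanged (an assumption; the source does not address length mismatch).
    """
    n = min(len(mic_signal), len(echo_estimate))
    cleaned = [mic_signal[i] - echo_estimate[i] for i in range(n)]
    cleaned.extend(mic_signal[n:])
    return cleaned
```

If the microphone signal is the sum of near-end speech and an echo, and the estimate matches that echo exactly, the output equals the near-end speech. In practice the estimate is only approximate, so some residual echo would remain.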
  10. The signal processing device according to claim 9, further comprising a pre-processing component configured to perform demodulation and speech pre-processing on the first audio signal, wherein the first audio signal is generated and sent by a far-end device.
  11. The signal processing device according to claim 9, further comprising a modeling component configured to:
    establish the at least one speaker model correspondingly according to characteristic information of the at least one speaker, wherein the characteristic information of the at least one speaker includes circuit information and structure information corresponding to the at least one speaker; and
    establish the at least one microphone model correspondingly according to characteristic information of the at least one microphone, wherein the characteristic information of the at least one microphone includes circuit information and structure information corresponding to the at least one microphone.
  12. The signal processing device according to claim 11, wherein the first obtaining component is configured to:
    input the first audio signal into each speaker model for acoustic response processing to obtain an acoustic response signal of each speaker;
    obtain a first reference signal of each speaker according to the acoustic response signal of each speaker and first delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained correspondingly based on distance and position information between each speaker and the microphone;
    perform superposition and bit expansion processing on the first reference signals of the speakers to obtain a first superimposed reference signal; and
    perform, based on the microphone model, acoustic response processing on the first superimposed reference signal to obtain a first echo estimation signal of the first audio signal.
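The four operations of claim 12 form a short pipeline: speaker model, delay and attenuation, superposition with bit expansion, then microphone model. The sketch below chains them on integer PCM samples. Everything concrete in it is an assumption for illustration: the models are arbitrary callables, the delay and attenuation are passed in as a `(delay_samples, gain)` pair per speaker, and "bit expansion" is interpreted as holding the sum of 16-bit signals in a 32-bit range so that it cannot overflow.

```python
from typing import Callable, List, Sequence, Tuple

Pcm = List[int]                # integer PCM samples
Model = Callable[[Pcm], Pcm]   # hypothetical acoustic-response model

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def delay_attenuate(signal: Pcm, delay: int, gain: float) -> Pcm:
    """Shift the signal by `delay` samples and scale it by `gain`, keeping its length."""
    shifted = [0] * delay + list(signal)
    return [round(gain * x) for x in shifted[:len(signal)]]


def superimpose_widened(refs: Sequence[Pcm]) -> Pcm:
    """Sum the per-speaker reference signals sample by sample.

    The sum of several 16-bit signals can leave the 16-bit range, so the
    result is treated as 32-bit (the assumed meaning of bit expansion).
    """
    total = [sum(samples) for samples in zip(*refs)]
    if any(not INT32_MIN <= x <= INT32_MAX for x in total):
        raise OverflowError("sum does not fit the widened 32-bit range")
    return total


def echo_estimate(audio: Pcm,
                  speaker_models: Sequence[Model],
                  paths: Sequence[Tuple[int, float]],
                  mic_model: Model) -> Pcm:
    """Speaker models -> delay/attenuation -> superposition -> microphone model."""
    refs = [delay_attenuate(model(audio), delay, gain)
            for model, (delay, gain) in zip(speaker_models, paths)]
    return mic_model(superimpose_widened(refs))
```

With two identity speaker models, paths `(0, 1.0)` and `(1, 0.5)`, and the input `[30000, 30000, 0, 0]`, the second superimposed sample is 45000, which exceeds the 16-bit maximum of 32767. This is the situation the widened sum is meant to accommodate.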
  13. The signal processing device according to claim 12, further comprising a first comparison component configured to:
    obtain a high-frequency resonance peak of the microphone;
    compare the high-frequency resonance peak of the microphone with a preset high-frequency resonance peak; and
    in response to the high-frequency resonance peak of the microphone being higher than the preset high-frequency resonance peak, cancel the acoustic response processing performed by the microphone model on the first superimposed reference signal.
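The comparison in claim 13 acts as a gate on the microphone-model stage. A minimal sketch of that gate follows, assuming the peaks are expressed as comparable numbers (decibels are used here only as an example unit) and that "canceling" the processing means the superimposed reference signal is passed through unchanged. The function name and parameters are hypothetical.

```python
from typing import Callable, List

Signal = List[float]


def apply_microphone_model(reference: Signal,
                           mic_model: Callable[[Signal], Signal],
                           mic_peak_db: float,
                           preset_peak_db: float) -> Signal:
    """Run the microphone model unless the microphone's resonance peak exceeds the preset."""
    if mic_peak_db > preset_peak_db:
        # Peak above the preset: microphone-model processing is cancelled.
        return list(reference)
    return mic_model(reference)
```

The same gate could be applied once per microphone in a two-microphone arrangement, with each microphone's peak compared against the preset independently.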
  14. The signal processing device according to claim 11, wherein the first obtaining component is configured to:
    input the first audio signal into each speaker model for acoustic response processing to obtain an acoustic response signal of each speaker;
    obtain a first reference signal and a second reference signal of each speaker according to the acoustic response signal of each speaker, first delay and attenuation information corresponding to the acoustic response signal, and second delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained correspondingly based on distance and position information between each speaker and a first microphone of the at least one microphone, and the second delay and attenuation information is obtained correspondingly based on distance and position information between each speaker and a second microphone of the at least one microphone;
    perform superposition and bit expansion processing on the first reference signals and on the second reference signals of the speakers, respectively, to obtain a first superimposed reference signal and a second superimposed reference signal;
    perform, based on a first microphone model of the at least one microphone model, acoustic response processing on the first superimposed reference signal to obtain a first echo estimation signal of the first audio signal; and
    perform, based on a second microphone model of the at least one microphone model, acoustic response processing on the second superimposed reference signal to obtain a second echo estimation signal of the first audio signal.
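Claim 14 derives separate delay and attenuation information for each speaker-to-microphone path from distance and position information, but the claims do not give a formula. One plausible way to compute such values is sketched below, under clearly hypothetical assumptions: positions are 3-D coordinates in metres, sound travels at 343 m/s, and attenuation follows an inverse-distance law relative to 1 m, clamped so that it never amplifies.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def path_info(speaker: Point, mic: Point, sample_rate: int) -> Tuple[int, float]:
    """Return (delay in samples, gain) for one speaker-to-microphone path."""
    distance = math.dist(speaker, mic)
    delay = round(distance / SPEED_OF_SOUND * sample_rate)
    gain = 1.0 / max(distance, 1.0)  # assumed inverse-distance law, never above 1
    return delay, gain


def path_table(speakers: Sequence[Point],
               mics: Sequence[Point],
               sample_rate: int) -> List[List[Tuple[int, float]]]:
    """table[m][s] holds the delay and gain from speaker s to microphone m."""
    return [[path_info(spk, mic, sample_rate) for spk in speakers] for mic in mics]
```

Row 0 of the table would then play the role of the first delay and attenuation information (paths to the first microphone) and row 1 the second (paths to the second microphone).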
  15. The signal processing device according to claim 14, further comprising a second comparison component configured to:
    obtain a high-frequency resonance peak of the first microphone and a high-frequency resonance peak of the second microphone;
    compare the high-frequency resonance peak of the first microphone and the high-frequency resonance peak of the second microphone each with a preset high-frequency resonance peak;
    in response to the high-frequency resonance peak of the first microphone being higher than the preset high-frequency resonance peak, cancel the acoustic response processing performed by the first microphone model on the first superimposed reference signal; and
    in response to the high-frequency resonance peak of the second microphone being higher than the preset high-frequency resonance peak, cancel the acoustic response processing performed by the second microphone model on the second superimposed reference signal.
  16. The signal processing device according to any one of claims 12 to 15, wherein the first obtaining component is configured to:
    in response to the first audio signals received by the at least one speaker being the same, input the first audio signal into each of the at least one speaker model and perform acoustic response processing to obtain the acoustic response signal of each speaker; and
    in response to the first audio signals received by the at least one speaker being different, input each first audio signal into the corresponding one of the at least one speaker model and perform acoustic response processing to obtain the acoustic response signal of each speaker.
  17. A signal processing device, comprising a memory and a processor, wherein
    the memory stores a computer program, and when the processor runs the computer program, the processor performs the signal processing method according to any one of claims 1 to 8.
  18. A computer storage medium having a computer program stored thereon, wherein, when the computer program is executed by at least one processor, the at least one processor performs the signal processing method according to any one of claims 1 to 8.
PCT/CN2019/097552 2018-07-25 2019-07-24 Signal processing method and device, and computer storage medium WO2020020247A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810826906.7A CN110769352B (en) 2018-07-25 2018-07-25 Signal processing method and device and computer storage medium
CN201810826906.7 2018-07-25

Publications (1)

Publication Number Publication Date
WO2020020247A1 (en) 2020-01-30

Family

ID=69182149

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/097552 WO2020020247A1 (en) 2018-07-25 2019-07-24 Signal processing method and device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN110769352B (en)
WO (1) WO2020020247A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112037808B (en) * 2020-09-01 2024-04-19 杭州岁丰信息技术有限公司 Echo cancellation method and device for elevator car
CN113096678A (en) * 2021-03-31 2021-07-09 康佳集团股份有限公司 Voice echo cancellation method, device, terminal equipment and storage medium
CN115223581A (en) * 2021-04-17 2022-10-21 华为技术有限公司 Echo time delay estimation method for distributed equipment and electronic equipment
WO2024080590A1 (en) * 2022-10-14 2024-04-18 삼성전자주식회사 Electronic device and method for detecting signal error
CN116582803B (en) * 2023-06-01 2023-10-20 广州市声讯电子科技股份有限公司 Self-adaptive control method, system, storage medium and terminal for loudspeaker array

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1176034A (en) * 1995-02-24 1998-03-11 艾利森公司 Apparatus and method for canceling acoustic echoes including non-linear distortions in loudspeaker telephones
CN106297816A (en) * 2015-05-20 2017-01-04 广州质音通讯技术有限公司 Non-linear processing method and device for echo cancellation, and electronic equipment
WO2017065989A1 (en) * 2015-10-12 2017-04-20 Microsoft Technology Licensing, Llc Audio signal processing

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2675063B1 (en) * 2012-06-13 2016-04-06 Dialog Semiconductor GmbH Agc circuit with optimized reference signal energy levels for an echo cancelling circuit
US9531433B2 (en) * 2014-02-07 2016-12-27 Analog Devices Global Echo cancellation methodology and assembly for electroacoustic communication apparatuses
JP6258061B2 (en) * 2014-02-17 2018-01-10 クラリオン株式会社 Acoustic processing apparatus, acoustic processing method, and acoustic processing program

Also Published As

Publication number Publication date
CN110769352B (en) 2022-07-15
CN110769352A (en) 2020-02-07

Similar Documents

Publication Publication Date Title
WO2020020247A1 (en) Signal processing method and device, and computer storage medium
KR101547035B1 (en) Three-dimensional sound capturing and reproducing with multi-microphones
JP4286637B2 (en) Microphone device and playback device
KR101210313B1 (en) System and method for utilizing inter?microphone level differences for speech enhancement
JP6703525B2 (en) Method and device for enhancing sound source
US11297178B2 (en) Method, apparatus, and computer-readable media utilizing residual echo estimate information to derive secondary echo reduction parameters
US9516411B2 (en) Signal-separation system using a directional microphone array and method for providing same
US10250975B1 (en) Adaptive directional audio enhancement and selection
EP3005362B1 (en) Apparatus and method for improving a perception of a sound signal
Benesty et al. Binaural noise reduction in the time domain with a stereo setup
Schwartz et al. Nested generalized sidelobe canceller for joint dereverberation and noise reduction
TWI465121B (en) System and method for utilizing omni-directional microphones for speech enhancement
JP3583980B2 (en) Sound collecting device and receiving device
US20220225024A1 (en) Method and system for using single adaptive filter for echo and point noise cancellation
Garre et al. An Acoustic Echo Cancellation System based on Adaptive Algorithm
US10419851B2 (en) Retaining binaural cues when mixing microphone signals
Ruiz et al. Distributed combined acoustic echo cancellation and noise reduction using GEVD-based distributed adaptive node specific signal estimation with prior knowledge
Beracoechea et al. On building immersive audio applications using robust adaptive beamforming and joint audio-video source localization
Pasha et al. Multi-Channel Compression and Coding of Reverberant Ad-Hoc Recordings Through Spatial Autoregressive Modelling
Braun Speech Dereverberation in Noisy Environments Using Time-Frequency Domain Signal Models/Enthallung von Sprachsignalen unter Einfluss von Störgeräuschen mittels Signalmodellen im Zeit-Frequenz-Bereich
Lombard et al. Improved wideband blind adaptive system identification using decorrelation filters for the localization of multiple speakers
Arote et al. Multimicrophone Based Speech Dereverberation
CN116364104A (en) Audio transmission method, device, chip, equipment and medium
Boland et al. A Multichannel Speech Dereverberation Technique Based Upon the Wiener Filter

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19840134

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19840134

Country of ref document: EP

Kind code of ref document: A1