WO2020020247A1 - Signal processing method and device, and computer storage medium

Signal processing method and device, and computer storage medium

Info

Publication number
WO2020020247A1
Authority
WO
WIPO (PCT)
Prior art keywords
microphone
speaker
signal
model
audio signal
Prior art date
Application number
PCT/CN2019/097552
Other languages
English (en)
Chinese (zh)
Inventor
崔腾飞 (Cui Tengfei)
Original Assignee
西安中兴新软件有限责任公司 (Xi'an ZTE New Software Co., Ltd.)
Application filed by 西安中兴新软件有限责任公司 (Xi'an ZTE New Software Co., Ltd.)
Publication of WO2020020247A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 9/00: Arrangements for interconnection not involving centralised switching
    • H04M 9/08: Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 3/00: Circuits for transducers, loudspeakers or microphones
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208: Noise filtering
    • G10L 21/0216: Noise filtering characterised by the method used for estimating noise
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 2420/00: Details of connection covered by H04R, not provided for in its groups
    • H04R 2420/03: Connection circuits to selectively connect loudspeakers or headphones to amplifiers

Definitions

  • the present disclosure relates to, but is not limited to, the field of audio signal processing technology.
  • An embodiment of the present disclosure provides a signal processing method, which is applied to a signal processing device having at least one speaker and at least one microphone.
  • the method includes: receiving a first audio signal and playing the first audio signal through the at least one speaker; obtaining at least one echo estimation signal corresponding to the first audio signal according to at least one speaker model, at least one microphone model, and the first audio signal, wherein the at least one speaker model is obtained based on the at least one speaker and the at least one microphone model is obtained based on the at least one microphone; receiving a second audio signal using the at least one microphone, wherein the second audio signal includes an echo signal generated by the first audio signal output by the at least one speaker and received by the at least one microphone; and removing the at least one echo estimation signal corresponding to the first audio signal from the second audio signal to obtain an echo-processed audio signal.
  • An embodiment of the present disclosure further provides a signal processing device including at least one speaker, at least one microphone, a first receiving part, a first obtaining part, a second receiving part, and a second obtaining part, wherein the first receiving part is configured to receive a first audio signal and play the first audio signal through the at least one speaker;
  • the first acquisition component is configured to acquire at least one echo estimation signal corresponding to the first audio signal based on at least one speaker model, at least one microphone model, and the first audio signal, wherein the at least one speaker model is obtained based on the at least one speaker, and the at least one microphone model is obtained based on the at least one microphone;
  • the second receiving component is configured to receive a second audio signal by using the at least one microphone, wherein the second audio signal includes an echo signal generated by the first audio signal output by the at least one speaker and received by the at least one microphone; and the second acquisition component is configured to remove the at least one echo estimation signal corresponding to the first audio signal from the second audio signal to obtain an echo-processed audio signal.
  • An embodiment of the present disclosure further provides a signal processing apparatus including a memory and a processor, wherein the memory stores a computer program, and when the processor runs the computer program, the processor executes a signal processing method according to the present disclosure.
  • An embodiment of the present disclosure further provides a computer storage medium on which a computer program is stored.
  • When the computer program is executed by at least one processor, the at least one processor executes a signal processing method according to the present disclosure.
  • FIG. 1 shows a schematic circuit structure of a single speaker and a single microphone
  • FIG. 2 shows a schematic circuit structure of a single speaker and a dual microphone
  • FIG. 3 shows a schematic circuit structure of a dual speaker and a single microphone
  • FIG. 4 is a schematic diagram showing a curve comparison between an echo reference signal and a microphone recording signal
  • FIG. 5 is another schematic diagram of comparison between an echo reference signal and a microphone recording signal
  • FIG. 6 shows another schematic circuit structure diagram of a dual speaker and a single microphone
  • FIG. 7 shows a circuit structure diagram of a dual speaker and a dual microphone
  • FIG. 8 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a circuit structure of a four-speaker and a single microphone according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of another circuit structure of a four-speaker and a single microphone according to an embodiment of the present disclosure
  • FIG. 11 is a schematic diagram showing a circuit structure of a four-speaker and a dual microphone according to an embodiment of the present disclosure
  • FIG. 12 is a schematic diagram of another circuit structure of a four-speaker and a single microphone according to an embodiment of the present disclosure
  • FIG. 13 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 14 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 15 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 16 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 17 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 18 is a schematic diagram of a circuit structure of a single speaker and a single microphone according to an embodiment of the present disclosure
  • FIG. 19 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • FIG. 20 is a schematic diagram of a circuit structure of a dual speaker and a single microphone according to an embodiment of the present disclosure
  • FIG. 21 is a schematic structural diagram of a signal processing device according to an embodiment of the present disclosure.
  • FIG. 22 is another schematic structural diagram of a signal processing device according to an embodiment of the present disclosure.
  • FIG. 23 is another schematic structural diagram of a signal processing device according to an embodiment of the present disclosure.
  • FIG. 24 is another schematic structural diagram of a signal processing apparatus according to an embodiment of the present disclosure.
  • FIG. 25 is another schematic structural diagram of a signal processing device according to an embodiment of the present disclosure.
  • FIG. 26 is a schematic diagram of a hardware structure of a signal processing apparatus according to an embodiment of the present disclosure.
  • Echo is mainly the repetition of sound caused by the reflection of sound waves, that is, the sound emitted by the sound source is reflected back to the position of the sound source.
  • In voice communication scenarios, microphones and speakers are often used together.
  • the microphone transmits voice or other sound data from the near end to the far end.
  • Since the speaker is disposed adjacent to the microphone, the sound emitted by the speaker will also be received by the microphone; this received sound is called an echo.
  • In the absence of processing, the echo will be heard by the far-end user, resulting in considerable noise and an unpleasant listening experience. Therefore, echo cancellation technology is introduced to perform echo processing on the echoes picked up by the microphone.
  • Echo cancellation technology removes from the microphone signal the sound that was played by the speakers, retaining only the sound that did not originate from the speakers.
  • The echo cancellation technology most commonly used at present is single-speaker (earpiece) echo cancellation; stereo (dual-speaker) echo cancellation and multi-channel (multi-speaker) echo cancellation are used relatively rarely.
  • FIG. 1 shows an application example of the circuit structure 10 of a single speaker and a single microphone. The circuit structure 10 includes a speaker 101, a microphone 102, an adder 103, an adaptive filter (AF) module 104, a voice processing module 105, a noise reduction and other processing module 106, a decoder 107, an encoder 108, and a radio frequency end 109.
  • the radio frequency terminal 109 sends the received audio signal to the decoder 107, and the decoder 107 demodulates the audio signal.
  • the demodulated audio signal enters the voice processing module 105 for voice processing such as noise reduction and filtering, and then the voice signal will be played by the speaker 101.
  • the audio signal recorded by the microphone 102 includes an echo signal generated by the audio signal played by the speaker 101.
  • an audio signal is extracted at the front end of the speaker 101 as a reference signal.
  • the reference signal is input to the adder 103 after passing through the AF module 104, and the audio signal recorded by the microphone 102 is also input to the adder 103.
  • the two signals are subjected to subtraction processing in the adder 103, so that the audio signals recorded by the microphone 102 can be subjected to echo processing.
  • the audio signal that has undergone the echo processing is then processed by the noise reduction and other processing module 106 for voice processing, modulated in the encoder 108, and finally transmitted by the transmitting end 109 through the transmission line.
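  • For illustration only, the following is a minimal sketch of the adaptive-filter loop described above (the AF module 104 and the adder 103 in FIG. 1); the NLMS update rule, filter length, and function names are assumptions made for this sketch and are not specified by the patent.

```python
import numpy as np

def nlms_echo_cancel(reference, mic_signal, filter_len=256, mu=0.5, eps=1e-8):
    """Remove the speaker echo from mic_signal using the speaker front-end
    reference signal: an adaptive FIR filter (the AF module) estimates the
    echo, and a subtraction stage (the adder) removes it sample by sample."""
    w = np.zeros(filter_len)                 # adaptive FIR weights
    buf = np.zeros(filter_len)               # most recent reference samples
    out = np.zeros(len(mic_signal), dtype=float)
    for n in range(len(mic_signal)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        echo_est = np.dot(w, buf)            # AF module output (echo estimate)
        out[n] = mic_signal[n] - echo_est    # adder: subtract the estimate
        # NLMS weight update driven by the residual error
        w += mu * out[n] * buf / (np.dot(buf, buf) + eps)
    return out
```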
  • FIG. 2 shows an application example of the circuit structure 20 of a single speaker and a dual microphone.
  • the circuit structure 20 includes a speaker 201, a first microphone 202a, a second microphone 202b, a first adder 203a, a second adder 203b, a first AF module 204a, a second AF module 204b, and a voice processing module 205.
  • the audio signal at the front end of the speaker 201 is selected as a reference signal.
  • The reference signal is used to perform echo processing, through the first AF module 204a and the first adder 203a, on the audio signal received by the first microphone 202a, and, through the second AF module 204b and the second adder 203b, on the audio signal received by the second microphone 202b.
  • the audio signal after the echo processing is input to the noise reduction and other processing module 206 for noise reduction processing, is modulated in the encoder 208, and is finally transmitted by the transmitting end 209 through the transmission line.
  • FIG. 3 shows an application example of the circuit structure 30 of a dual speaker and a single microphone.
  • the circuit structure 30 includes a first speaker 301a, a second speaker 301b, a microphone 302, an adder 303, an AF module 304, a voice processing module 305, a noise reduction and other processing module 306, a decoder 307, an encoder 308, and a transmitting end 309.
  • the audio signals at the front ends of the first speaker 301a and the second speaker 301b are selected as reference signals for echo processing.
  • the reference signal performs echo processing on the audio signal received by the microphone 302 through the AF module 304 and the adder 303.
  • the audio signal after the echo processing is input to the noise reduction and other processing module 306 for noise reduction processing, then is modulated in the encoder 308, and finally sent by the transmitting terminal 309 through the transmission line.
  • It can be seen from FIG. 4 that the echo reference signal is larger than the signal recorded by the microphone (that is, the echo signal to be processed) and the two are highly linear; in this case the echo is easier to process cleanly. It can be seen from FIG. 5 that, at around 4 kHz, the microphone recording signal (that is, the echo signal to be processed) is larger than the echo reference signal; in this case the echo is difficult to process.
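  • As an aside, the comparison illustrated by FIG. 4 and FIG. 5 can be reproduced numerically; the sketch below (an assumption, not part of the patent) flags the frequency bands in which the microphone recording exceeds the echo reference signal, which is the situation around 4 kHz where the echo becomes hard to process.

```python
import numpy as np

def risky_bands(reference, mic_recording, fs, n_fft=1024):
    """Return the frequencies (Hz) where the microphone recording's magnitude
    spectrum exceeds that of the echo reference signal."""
    ref_mag = np.abs(np.fft.rfft(reference, n_fft))
    mic_mag = np.abs(np.fft.rfft(mic_recording, n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs[mic_mag > ref_mag]
```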
  • The circuit structures in FIG. 1 and FIG. 2 are suitable for single-speaker application scenarios.
  • In a multi-speaker application scenario such as FIG. 3, if only the signal at the front end of one speaker is used as the reference signal for echo processing, the superposition of the sounds from the multiple speakers is not taken into account, and the echo processing result is degraded; the degradation is especially severe when the loudness of the sound played by the speakers is high.
  • FIG. 6 shows an application example of the circuit structure 60 of a dual speaker and a single microphone.
  • the circuit structure 60 includes a first speaker 601a, a second speaker 601b, a microphone 602, a first adder 603a, a second adder 603b, a first AF module 604a, a second AF module 604b, and a voice processing module 605.
  • the audio signals at the front ends of the first speaker 601a and the second speaker 601b are selected as reference signals for echo processing.
  • the audio signal at the front end of the first speaker 601a passes through the first AF module 604a and is input to the first adder 603a.
  • the audio signal received by the microphone 602 is also input to the first adder 603a.
  • In the first adder 603a, the echo generated by the audio signal played by the first speaker 601a is removed from the audio signal recorded by the microphone 602.
  • the audio signal at the front end of the second speaker 601b passes through the second AF module 604b and is input to the second adder 603b.
  • the audio signal received by the microphone 602 is also input to the second adder 603b.
  • In the second adder 603b, the echo generated by the audio signal played by the second speaker 601b is removed from the audio signal recorded by the microphone 602.
  • the audio signal after the echo processing is input to the noise reduction and other processing module 606 for processing, and then is modulated in the encoder 608, and finally sent by the transmitting end 609 through the transmission line.
  • This processing method considers the effect of the superposition of multiple speakers on the echo signal, and is better than the approach of FIG. 3, in which the digital signal at the front end of only one speaker is introduced as the echo reference signal.
  • FIG. 7 shows an application example of the circuit structure 70 of a dual speaker and a dual microphone. It can be seen from FIG. 7 that the left/right two-channel stereo signals XL and XR input at the line input terminals LI(L) and LI(R) are output, without passing through the sum/difference signal generating device 52, via the sound output terminals SO(L) and SO(R), reproduced at the speakers SP(L) and SP(R), and then picked up by the microphones MC(L) and MC(R) and input to the sound input terminals SI(L) and SI(R).
  • the filters 40-1, 40-2, 40-3, and 40-4 are formed by, for example, FIR filters.
  • The impulse responses set in the filters 40-1, 40-2, 40-3, and 40-4 correspond respectively to the transfer functions between the speakers SP(L), SP(R) and the microphones MC(L), MC(R), and the filters accordingly generate the echo processing signals EC1, EC2, EC3, and EC4.
  • Adders 44 and 46 and subtractors 48 and 50 are used for the echo processing, and the echo-processed signals are output from the line output terminals LO(L) and LO(R), respectively.
  • the sum/difference signal generating device 52 includes an adder 54 and a subtractor 56 for generating a sum signal XM and a difference signal XS.
  • the correlation detection device 59 detects the correlation between the sum signal XM and the difference signal XS based on a correlation value calculation.
  • the transfer function calculation device 58 is used to calculate the transfer functions of the four audio transmission systems between the speakers SP (L), SP (R) and the microphones MC (L), MC (R).
  • the technical solution uses the sum signal and the difference signal of the stereo sound signals as reference signals, and calculates, from the cross-spectrum of the reference signals and the sound signals recorded by the microphones, the transfer functions of the four audio transmission systems between the two speakers and the two microphones.
  • the obtained transfer function is subjected to an inverse Fourier transform to obtain impulse responses, and these impulse responses are set in a filter device to generate an echo-processed reference signal and perform an echo process.
  • the technical solution also considers the impact of the acoustic path and acoustic structure and components on the echo signal.
  • The sum signal and the difference signal of the two speaker signals, as well as the echo signal recorded by the microphones, are all taken into account, which is more comprehensive.
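  • The cross-spectrum step described above can be sketched as follows; windowing, block averaging, and regularization are omitted, and the function and variable names are assumptions made for illustration only.

```python
import numpy as np

def sum_diff(x_left, x_right):
    """Form the sum and difference reference signals from the stereo channels."""
    return x_left + x_right, x_left - x_right

def estimate_impulse_response(x_ref, y_mic, n_fft=4096, eps=1e-12):
    """Estimate one speaker-to-microphone impulse response from a reference
    signal (e.g. the sum or difference signal) and the microphone recording:
    transfer function = cross-spectrum / reference power spectrum, then an
    inverse FFT gives the impulse response to set in the FIR filter."""
    X = np.fft.rfft(x_ref, n_fft)
    Y = np.fft.rfft(y_mic, n_fft)
    H = (np.conj(X) * Y) / (np.abs(X) ** 2 + eps)
    return np.fft.irfft(H, n_fft)
```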
  • However, this solution requires the speaker to play the audio signal while the microphone receives it at the same time, which is greatly affected by environmental fluctuations, and the speaker and the microphone behave inconsistently under different audio signals and environments. This causes situations in which the echo processing is not ideal.
  • Moreover, as the number of speakers and microphones increases, so does the complexity of the transfer function calculation.
  • In addition, the acoustic response models of the speaker and the microphone are not considered, so when the high-frequency resonance peak of the speaker or the microphone is relatively low (for example, around 4 kHz) and the signal recorded by the microphone is higher in amplitude than the reference signal at the high-frequency resonance peak, echo howling is easily produced.
  • In view of this, the technical solutions of the embodiments of the present disclosure are proposed. The embodiments of the present disclosure will be described in detail below with reference to the drawings.
  • FIG. 8 is a schematic flowchart of a signal processing method according to an embodiment of the present disclosure.
  • a signal processing method is applied to a signal processing apparatus having at least one speaker and at least one microphone.
  • the method may include steps S801 to S804.
  • step S801 a first audio signal is received and the first audio signal is played by at least one speaker.
  • In step S802, at least one echo estimation signal corresponding to the first audio signal is obtained according to at least one speaker model, at least one microphone model, and the first audio signal, wherein the at least one speaker model is obtained based on the at least one speaker and the at least one microphone model is obtained based on the at least one microphone.
  • In step S803, a second audio signal is received by using the at least one microphone, wherein the second audio signal includes an echo signal generated by the first audio signal output by the at least one speaker and received by the at least one microphone.
  • step S804 at least one echo estimation signal corresponding to the first audio signal is removed from the second audio signal to obtain an echo-processed audio signal.
  • the echo howling phenomenon generated when the high-frequency resonance peak of the speaker or the microphone is low can be effectively solved, and the calculation workload in the multi-microphone design is also reduced.
  • the method further includes: demodulating and performing voice preprocessing on the first audio signal, wherein the first audio signal is generated and transmitted by the remote device.
  • the first audio signal is generated and transmitted by the remote device.
  • After the signal processing device receives the first audio signal, it performs demodulation and voice preprocessing.
  • the processed first audio signal enters the speaker and the speaker plays the first audio signal.
  • the method further includes: correspondingly establishing at least one speaker model according to the characteristic information of the at least one speaker, wherein the characteristic information of the at least one speaker includes circuit information and structural information corresponding to the at least one speaker; and correspondingly establishing at least one microphone model according to the characteristic information of the at least one microphone, wherein the characteristic information of the at least one microphone includes circuit information and structure information corresponding to the at least one microphone.
  • a speaker model and a microphone model are introduced.
  • The speaker model is established based on the circuit information and structure information of the speaker to simulate the acoustic response of the speaker, and the microphone model is established based on the circuit information and structure information of the microphone to simulate the acoustic response of the microphone.
  • the acquired reference signal of the first audio signal can be made more accurate and closer to the echo signal of the first audio signal played by the speaker, so that the processing effect of the echo signal is better.
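  • The patent does not give explicit model equations; as a hedged illustration, the sketch below assumes that a speaker or microphone acoustic response can be approximated by a single second-order resonance (a biquad band-pass filter) whose parameters would be derived from the device's circuit and structure information.

```python
import numpy as np
from scipy.signal import lfilter

def resonant_response_model(f_resonance_hz, q_factor, fs):
    """Biquad (RBJ band-pass) coefficients approximating a transducer's
    acoustic response as one resonance; the single-resonance form is an
    assumption made only for this sketch."""
    w0 = 2.0 * np.pi * f_resonance_hz / fs
    alpha = np.sin(w0) / (2.0 * q_factor)
    b = np.array([alpha, 0.0, -alpha])
    a = np.array([1.0 + alpha, -2.0 * np.cos(w0), 1.0 - alpha])
    return b, a

def apply_model(signal, model):
    """Apply a speaker or microphone model (b, a coefficients) to a signal."""
    b, a = model
    return lfilter(b, a, signal)
```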
  • step S802 may include: inputting the first audio signal to each speaker model and performing acoustic response processing to obtain an acoustic response signal of each speaker; obtaining a first reference signal of each speaker according to the acoustic response signal of each speaker and first delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained based on the distance and position information between each speaker and the microphone; performing superposition and bit expansion processing on the first reference signals of the speakers to obtain a first superimposed reference signal; and performing acoustic response processing on the first superimposed reference signal based on the microphone model to obtain a first echo estimation signal of the first audio signal.
  • In this way, the echo estimation signal of the first audio signal is obtained through a series of signal processing steps, namely the speaker model, delay and attenuation, signal superposition, and the microphone model, so that the echo estimation signal is closer to the echo signal of the first audio signal played by the speaker and the subsequent echo processing works better.
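  • A compact sketch of this pipeline is given below, assuming the speaker models and the microphone model are callables that map a signal to a signal, and that the per-speaker delay (in samples) and attenuation gain have already been derived from the distance and position information; all interfaces here are assumptions, not the patent's own API.

```python
import numpy as np

def echo_estimate(first_audio, speaker_models, delays_atten, mic_model=None):
    """Speaker models -> per-speaker delay and attenuation -> superposition
    (in floating point, so no overflow) -> optional microphone model."""
    superimposed = np.zeros(len(first_audio), dtype=np.float64)
    for model, (delay, gain) in zip(speaker_models, delays_atten):
        response = model(first_audio)            # speaker acoustic response
        delayed = np.zeros_like(response)
        if delay < len(response):
            delayed[delay:] = response[:len(response) - delay]
        superimposed += gain * delayed           # superposition of all speakers
    if mic_model is not None:
        return mic_model(superimposed)           # microphone acoustic response
    return superimposed
```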
  • FIG. 9 shows an application example of the circuit structure 90 of a four-speaker and a single microphone. The circuit structure 90 includes a speaker group 901, a microphone 902, a speaker model group 903, a delay attenuation module group 904, a summing module 905, a microphone model 906, an echo processing module 907, a voice processing module 908, a noise reduction and other processing module 909, a decoder 910, an encoder 911, and a transmitting end 912.
  • the speaker group 901 includes a first speaker 901a, a second speaker 901b, a third speaker 901c, and a fourth speaker 901d.
  • the speaker model group 903 includes a first speaker model 903a, a second speaker model 903b, a third speaker model 903c, and a fourth speaker model 903d.
  • the delay attenuation module group 904 includes a first delay attenuation module 904a, a second delay attenuation module 904b, a third delay attenuation module 904c, and a fourth delay attenuation module 904d.
  • the speaker model group 903 is correspondingly established based on the speaker group 901, and the delay attenuation module group 904 performs corresponding delay and attenuation on the audio signal after passing through the speaker model group 903.
  • the radio frequency end 912 sends the received first audio signal to the decoder 910, and the decoder 910 demodulates the first audio signal.
  • the demodulated first audio signal enters the voice processing module 908 for pre-processing such as noise reduction and filtering, and then the first audio signal is played by the speaker group 901.
  • the second audio signal recorded by the microphone 902 includes an echo signal generated by the first audio signal played by the speaker group 901.
  • In order to eliminate this echo as much as possible, a first audio signal is extracted at the front end of the speaker group 901, and the first audio signal is input to the first speaker model 903a, the second speaker model 903b, the third speaker model 903c, and the fourth speaker model 903d, respectively, for acoustic response processing.
  • The resulting acoustic response signals are delayed and attenuated by the delay attenuation module group 904 to obtain the reference signal of each speaker, and the summing module 905 then performs superposition processing on these reference signals to obtain a superimposed reference signal.
  • the superimposed reference signal is input to the microphone model 906 to perform acoustic response processing according to the sound pressure excitation of the microphone model 906, thereby obtaining an echo reference signal recorded through the microphone 902, that is, a first echo estimation signal of the first audio signal.
  • the first echo estimation signal and the second audio signal recorded by the microphone 902 are input to the echo processing module 907 for echo processing.
  • the echo-processed audio signal is then subjected to noise reduction processing through the noise reduction and other processing module 909, and then modulated in the encoder 911, and finally sent by the transmitting end 912 through the transmission line.
  • When the sound signals of the multiple speakers are superimposed, the number of bits is expanded, mainly because overflow may occur after superposition, so the bit width needs to be extended.
  • The superimposed sound signal is input to the echo processing module 907 as the echo estimation signal and is mainly used to remove, from the second audio signal recorded by the microphone 902, the echo signal generated by the first audio signal played by the speakers. Since the echo processing does not increase the amplitude of the original audio signal, the echo processing module 907 does not need to perform bit expansion on the second audio signal recorded by the microphone 902.
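  • The bit-expansion step can be illustrated as follows; the 16-bit input and 32-bit accumulator widths are assumptions chosen only to show why the superposition needs a wider container.

```python
import numpy as np

def superimpose_with_bit_expansion(reference_signals):
    """Sum several 16-bit reference signals into a 32-bit accumulator so the
    superposition cannot overflow the original sample width."""
    acc = np.zeros(len(reference_signals[0]), dtype=np.int32)  # expanded width
    for sig in reference_signals:
        acc += sig.astype(np.int32)
    return acc
```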
  • the method further includes: obtaining a high-frequency resonance peak of the microphone; comparing the high-frequency resonance peak of the microphone with a preset high-frequency resonance peak; and, in response to the high-frequency resonance peak of the microphone being higher than the preset high-frequency resonance peak, canceling the acoustic response processing of the microphone model on the first superimposed reference signal.
  • If the bandwidth of the microphone (such as the microphone 902 in FIG. 9) is relatively wide, that is, the high-frequency resonance peak of the microphone 902 is relatively high, it is not necessary to use the microphone model 906 to perform acoustic response processing on the obtained first superimposed reference signal; in other words, the echo estimation signal does not need to be corrected by the microphone model 906, and the obtained first superimposed reference signal can be used directly as the echo estimation signal.
  • After the first superimposed reference signal is obtained through the summing module 905, if the high-frequency resonance peak of the microphone 902 is lower than the preset high-frequency resonance peak, the microphone model 906 is required to perform acoustic response processing on the first superimposed reference signal, so that a corrected echo estimation signal can be obtained. If the high-frequency resonance peak of the microphone 902 is 9 kHz, that is, higher than the preset high-frequency resonance peak, the microphone model 906 is not required to perform acoustic response processing on the first superimposed reference signal.
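  • The decision just described reduces to a simple comparison; in the sketch below, the 8 kHz preset and the callable microphone-model interface are assumptions used for illustration.

```python
PRESET_RESONANCE_HZ = 8000  # example preset value used in the text

def select_echo_estimate(superimposed_ref, mic_resonance_hz, mic_model):
    """Skip the microphone model when the microphone's high-frequency
    resonance peak is above the preset value; otherwise use the model to
    correct the superimposed reference signal."""
    if mic_resonance_hz > PRESET_RESONANCE_HZ:
        return superimposed_ref           # wide-band microphone: no correction
    return mic_model(superimposed_ref)    # apply the acoustic response model
```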
  • FIG. 10 shows another application example, the circuit structure 100 of a four-speaker and a single microphone. The circuit structure 100 includes a speaker group 1001, a microphone 1002, a speaker model group 1003, a delay attenuation module group 1004, a summing module 1005, an echo processing module 1006, a voice processing module 1007, a noise reduction and other processing module 1008, a decoder 1009, an encoder 1010, and a transmitting end 1011.
  • the speaker group 1001 includes a first speaker 1001a, a second speaker 1001b, a third speaker 1001c, and a fourth speaker 1001d.
  • the speaker model group 1003 includes a first speaker model 1003a, a second speaker model 1003b, a third speaker model 1003c, and a fourth speaker model 1003d.
  • the delay attenuation module group 1004 includes a first delay attenuation module 1004a, a second delay attenuation module 1004b, a third delay attenuation module 1004c, and a fourth delay attenuation module 1004d.
  • Compared with the circuit structure 90 shown in FIG. 9, the circuit structure 100 shown in FIG. 10 omits the microphone model; that is, in the circuit structure shown in FIG. 10, the step of performing acoustic response processing on the superimposed reference signal by the microphone model is eliminated.
  • the superimposed reference signal is directly input as an echo estimation signal to the echo processing module 1006 to perform echo processing on the second audio signal recorded by the microphone 1002.
  • the other operation processes are the same as those of the circuit structure 90 shown in FIG. 9 described above, and will not be described in detail here.
  • step S802 may include: inputting the first audio signal to each speaker model and performing acoustic response processing to obtain an acoustic response signal of each speaker; obtaining a first reference signal and a second reference signal of each speaker according to the acoustic response signal of each speaker, first delay and attenuation information corresponding to the acoustic response signal, and second delay and attenuation information corresponding to the acoustic response signal, wherein the first delay and attenuation information is obtained based on the distance and position information between each speaker and a first microphone of the at least one microphone, and the second delay and attenuation information is obtained based on the distance and position information between each speaker and a second microphone of the at least one microphone; performing superposition and bit expansion processing on the first reference signals and on the second reference signals of the speakers to obtain a first superimposed reference signal and a second superimposed reference signal, respectively; and performing acoustic response processing on the first superimposed reference signal based on a first microphone model and on the second superimposed reference signal based on a second microphone model to obtain a first echo estimation signal and a second echo estimation signal of the first audio signal.
  • FIG. 11 shows an application example of the circuit structure 110 of a four-speaker and a dual microphone. The circuit structure 110 includes a speaker group 1101, a first microphone 1102a, a second microphone 1102b, a speaker model group 1103, a first delay attenuation module group 1104-1, and a second delay attenuation module group 1104-2.
  • the speaker group 1101 includes a first speaker 1101a, a second speaker 1101b, a third speaker 1101c, and a fourth speaker 1101d.
  • the speaker model group 1103 includes a first speaker model 1103a, a second speaker model 1103b, a third speaker model 1103c, and a fourth speaker model 1103d.
  • the first delay attenuation module group 1104-1 includes a first delay attenuation module 1104-1a, a second delay attenuation module 1104-1b, a third delay attenuation module 1104-1c, and a fourth delay attenuation module 1104-1d.
  • the second delay attenuation module group 1104-2 includes a fifth delay attenuation module 1104-2a, a sixth delay attenuation module 1104-2b, a seventh delay attenuation module 1104-2c, and an eighth delay attenuation module 1104-2d. .
  • the speaker model group 1103 is correspondingly established based on the speaker group 1101, and the first delay attenuation module group 1104-1 and the second delay attenuation module group 1104-2 perform corresponding delay and attenuation on the audio signal after passing through the speaker model group 1103. .
  • the radio frequency end 1112 sends the received first audio signal to the decoder 1111, and the decoder 1111 demodulates the first audio signal.
  • the demodulated first audio signal enters the voice processing module 1108 for preprocessing such as noise reduction and filtering, and then the first audio signal is played by the speaker group 1101.
  • the second audio signal recorded by the first microphone 1102a includes an echo signal generated by the first audio signal played by the speaker group 1101, and the second audio signal recorded by the second microphone 1102b also includes an echo signal generated by the first audio signal played by the speaker group 1101. In order to eliminate these echo signals as much as possible, the first audio signal is extracted at the front end of the speaker group 1101.
  • Each of the extracted first audio signals is correspondingly input to the first speaker model 1103a, the second speaker model 1103b, the third speaker model 1103c, and the fourth speaker model 1103d, and acoustic response processing is then performed to obtain an acoustic response signal of each speaker in the speaker group 1101. Since sound takes time to propagate and its energy also decays during propagation, in order to represent the delay and attenuation of the sound transmitted from each speaker to each microphone position, the distance and position information between the first microphone 1102a and each speaker in the speaker group 1101 is used to obtain the corresponding delay and attenuation in the first delay attenuation module group 1104-1, and the distance and position information between the second microphone 1102b and each speaker in the speaker group 1101 is used to obtain the corresponding delay and attenuation in the second delay attenuation module group 1104-2.
  • the acoustic response signals are input to the first delay attenuation module group 1104-1 and the second delay attenuation module group 1104-2, respectively, for delay and attenuation processing, which yields eight channels of delayed and attenuated sound signals.
  • the first reference signal is input to the first summing module 1105a for superposition and bit expansion processing to obtain a first superposed reference signal.
  • the first superimposed reference signal is input to the first microphone model 1106a, and an acoustic response process is performed according to the sound pressure excitation of the first microphone model 1106a, thereby obtaining a first echo estimation signal recorded by the first microphone 1102a.
  • the second reference signal is input to the second summation module 1105b for superposition and bit expansion processing to obtain a second superimposed reference signal.
  • the second superimposed reference signal is input to the second microphone model 1106b, and the acoustic response processing is performed according to the sound pressure excitation of the second microphone model 1106b, thereby obtaining a second echo estimation signal recorded by the second microphone 1102b.
  • the first echo estimation signal and the second audio signal recorded by the first microphone 1102a are input to the first echo processing module 1107a for echo processing, so that the echo signal generated by the first audio signal played by the speaker group 1101 can be eliminated as much as possible from the second audio signal recorded by the first microphone 1102a.
  • the second echo estimation signal and the second audio signal recorded by the second microphone 1102b are input to the second echo processing module 1107b for echo processing, so that the echo signal generated by the first audio signal played by the speaker group 1101 can be eliminated as much as possible from the second audio signal recorded by the second microphone 1102b.
  • the audio signal after the echo processing is then subjected to noise reduction processing by the noise reduction and other processing module 1109, then modulated in the encoder 1111, and finally transmitted by the transmitting end 1112 through the transmission line.
  • In the first echo processing module 1107a and the second echo processing module 1107b, it is not necessary to perform bit expansion on the second audio signals recorded by the first microphone and the second microphone. Because the reference signal of the echo signal (that is, the echo estimation signal) differs for each microphone, the echo processing is performed separately for each microphone (such as by the first echo processing module 1107a and the second echo processing module 1107b in FIG. 11).
  • Since the speaker group is shared, and the speaker model and the delay attenuation modules are provided separately, the speaker model is considered first and the position and distance information between each speaker and each microphone is considered afterwards, so that the audio signal at the front end of each speaker only needs to undergo the acoustic response processing of the speaker model once, which helps reduce the calculation workload in a multi-microphone design.
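  • The computation saving can be seen in the sketch below, which runs each speaker model only once and repeats only the cheap delay-and-attenuation and superposition steps per microphone; the data layout and callable interfaces are assumptions made for illustration.

```python
import numpy as np

def multi_mic_echo_estimates(first_audio, speaker_models, delays_atten_per_mic,
                             mic_models):
    """Return one echo estimation signal per microphone while performing the
    speaker-model acoustic response processing only once per speaker.
    delays_atten_per_mic[m][s] holds the (delay_samples, gain) pair from
    speaker s toward microphone m; mic_models[m] is a callable or None."""
    speaker_outputs = [model(first_audio) for model in speaker_models]  # shared
    estimates = []
    for mic_idx, mic_model in enumerate(mic_models):
        acc = np.zeros(len(first_audio), dtype=np.float64)
        for spk_idx, response in enumerate(speaker_outputs):
            delay, gain = delays_atten_per_mic[mic_idx][spk_idx]
            delayed = np.zeros_like(response)
            if delay < len(response):
                delayed[delay:] = response[:len(response) - delay]
            acc += gain * delayed
        estimates.append(mic_model(acc) if mic_model is not None else acc)
    return estimates
```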
  • the method further includes: acquiring a high-frequency resonance peak of the first microphone and a high-frequency resonance peak of the second microphone; comparing the high-frequency resonance peak of the first microphone and the high-frequency resonance peak of the second microphone with a preset high-frequency resonance peak, respectively; in response to the high-frequency resonance peak of the first microphone being higher than the preset high-frequency resonance peak, canceling the acoustic response processing of the first microphone model on the first superimposed reference signal; and, in response to the high-frequency resonance peak of the second microphone being higher than the preset high-frequency resonance peak, canceling the acoustic response processing of the second microphone model on the second superimposed reference signal.
  • If the bandwidth of the first microphone or the second microphone is relatively wide, that is, its high-frequency resonance peak is high, it is not necessary to use the first microphone model or the second microphone model to perform acoustic response processing on the corresponding superimposed reference signal; in other words, the echo reference signal does not need to be corrected by the first microphone model or the second microphone model.
  • Specifically, a judgment may be made according to the bandwidth of the first microphone 1102a, that is, the high-frequency resonance peak of the first microphone 1102a is compared with a preset high-frequency resonance peak. Suppose the preset high-frequency resonance peak is 8 kHz. If the high-frequency resonance peak of the first microphone 1102a is 5 kHz, that is, lower than the preset high-frequency resonance peak, the first microphone model 1106a performs acoustic response processing on the first superimposed reference signal, so that a corrected first echo estimation signal can be obtained. If the high-frequency resonance peak of the first microphone 1102a is 9 kHz, that is, higher than the preset high-frequency resonance peak, the first microphone model 1106a does not need to perform acoustic response processing on the first superimposed reference signal; that is, the first microphone model 1106a does not need to correct the first echo estimation signal, and the step of performing acoustic response processing on the first superimposed reference signal can be canceled. Similarly, a judgment may be made according to the bandwidth of the second microphone 1102b, that is, the high-frequency resonance peak of the second microphone 1102b is compared with the preset high-frequency resonance peak, to determine whether the second microphone model 1106b is required to correct the second echo estimation signal. If the high-frequency resonance peak of the second microphone 1102b is higher than the preset high-frequency resonance peak, the second microphone model 1106b does not need to perform acoustic response processing on the second superimposed reference signal, that is, the step of performing acoustic response processing on the second superimposed reference signal by the second microphone model 1106b can be canceled.
  • the step of inputting the first audio signal into each speaker model and performing acoustic response processing to obtain an acoustic response signal of each speaker includes: in response to the first audio signals received by the at least one speaker being the same, inputting the first audio signal to the at least one speaker model and performing acoustic response processing to obtain the acoustic response signal of each speaker; and, in response to the first audio signals received by the at least one speaker being different, correspondingly inputting the first audio signals to the at least one speaker model and performing acoustic response processing to obtain the acoustic response signal of each speaker.
  • the first audio signals received by the multiple speakers may be the same; in this case the same audio signal is input to the multiple speakers and is also input to the multiple speaker models, as in the circuit structure 90 shown in FIG. 9. Alternatively, the first audio signals received by the multiple speakers may be different; in this case, these first audio signals are correspondingly input to the multiple speakers and, at the same time, correspondingly input to the multiple speaker models, as in the circuit structure 120 shown in FIG. 12.
  • FIG. 12 shows another application example, the circuit structure 120 of a four-speaker and a single microphone. The circuit structure 120 includes a speaker group 1201, a microphone 1202, a speaker model group 1203, a delay attenuation module group 1204, a summing module 1205, a microphone model 1206, an echo processing module 1207, a voice processing module 1208, a noise reduction and other processing module 1209, a decoder 1210, an encoder 1211, and a transmitting end 1212.
  • the speaker group 1201 includes a first speaker 1201a, a second speaker 1201b, a third speaker 1201c, and a fourth speaker 1201d.
  • the speaker model group 1203 includes a first speaker model 1203a, a second speaker model 1203b, a third speaker model 1203c, and a fourth speaker model 1203d.
  • the delay attenuation module group 1204 includes a first delay attenuation module 1204a, a second delay attenuation module 1204b, a third delay attenuation module 1204c, and a fourth delay attenuation module 1204d.
  • The circuit structure 120 shown in FIG. 12 is similar to the circuit structure 90 shown in FIG. 9, except that the first audio signal is divided into four first audio signals after passing through the voice processing module 1208. Each first audio signal correspondingly enters and is played by a speaker in the speaker group 1201, and each first audio signal received by the first speaker 1201a, the second speaker 1201b, the third speaker 1201c, and the fourth speaker 1201d is correspondingly input to the first speaker model 1203a, the second speaker model 1203b, the third speaker model 1203c, and the fourth speaker model 1203d, respectively.
  • an echo howling phenomenon generated when a high-frequency resonance peak of a speaker or a microphone is low can be effectively solved, and a calculation workload in a multi-microphone design is also reduced.
  • Referring to FIG. 13, there is shown an application example in which the signal processing method according to an embodiment of the present disclosure is applied to the circuit structure 100 of the four-speaker and single-microphone shown in FIG. 10; the method may include steps S1301 to S1305.
  • step S1301 a digital signal is extracted from the front end of the speaker group 1001 to obtain a first audio signal.
  • step S1302 the first audio signal is input to the first speaker model 1003a, the second speaker model 1003b, the third speaker model 1003c, and the fourth speaker model 1003d, respectively, and an acoustic response process is performed to obtain an acoustic response signal of each speaker .
  • step S1303 a reference signal of each speaker is obtained according to the acoustic response signal of each speaker in the speaker group 1001 and the first delay and attenuation information corresponding to the acoustic response signal.
  • step S1304 the reference signal of each speaker is superposed and the number of bits is expanded to obtain a superposed reference signal, and the superposed reference signal is used as an echo estimation signal.
  • step S1305 the echo estimation signal is input to an echo processing module 1006 to perform echo processing on a second audio signal recorded by the microphone 1002.
  • the above process does not consider the modification of the echo estimation signal by the microphone model.
  • The superimposed reference signal can be directly used as the echo estimation signal, and the echo estimation signal is then used to perform echo processing on the second audio signal recorded by the microphone 1002.
  • Referring to FIG. 14, there is shown an application example in which the signal processing method according to an embodiment of the present disclosure is applied to the circuit structure 90 of the four-speaker and single-microphone shown in FIG. 9; the method may include steps S1401 to S1409.
  • step S1401 a digital signal is extracted from the front end of the speaker group 901 to obtain a first audio signal.
  • In step S1402, the first audio signal is input to the first speaker model 903a, the second speaker model 903b, the third speaker model 903c, and the fourth speaker model 903d, respectively, and an acoustic response process is performed to obtain an acoustic response signal of each speaker.
  • step S1403 a reference signal of each speaker is obtained according to an acoustic response signal of each speaker in the speaker group 901 and first delay and attenuation information corresponding to the acoustic response signal.
  • step S1404 the reference signal of each speaker is subjected to superposition and bit expansion processing to obtain a superposed reference signal.
  • step S1405 a high-frequency resonance peak of the microphone 902 is acquired.
  • step S1406 the high-frequency resonance peak of the microphone 902 is compared with a preset high-frequency resonance peak.
  • step S1407 if the high-frequency resonance peak of the microphone 902 is higher than a preset high-frequency resonance peak, the superimposed reference signal is directly used as an echo estimation signal.
  • step S1408 if the high-frequency resonance peak of the microphone 902 is not higher than a preset high-frequency resonance peak, an acoustic response process is performed on the superimposed reference signal based on the microphone model 906 to obtain an echo estimation signal.
  • step S1409 the echo estimation signal is input to an echo processing module 907 to perform echo processing on the second audio signal recorded by the microphone 902.
  • the process shown in FIG. 14 adds a microphone model and a judgment as to whether the microphone model is required to perform acoustic response processing on the superimposed reference signal.
  • Specifically, a high-frequency resonance peak of the microphone 902 is obtained. Assume that the preset high-frequency resonance peak is 8 kHz. If the high-frequency resonance peak of the microphone 902 is 5 kHz, that is, lower than the preset high-frequency resonance peak, the microphone model 906 is required to perform acoustic response processing on the superimposed reference signal, so that a corrected echo estimation signal can be obtained, and the corrected echo estimation signal is used to perform echo processing on the second audio signal recorded by the microphone 902. If the high-frequency resonance peak of the microphone 902 is 9 kHz, that is, higher than the preset high-frequency resonance peak, the microphone model 906 is not required to perform acoustic response processing on the superimposed reference signal; that is, the microphone model 906 is not required to correct the echo estimation signal, and the superimposed reference signal is directly used as the echo estimation signal to perform echo processing on the second audio signal recorded by the microphone 902.
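  • Tying the earlier sketches together, the flow of steps S1401 to S1409 for one microphone could look like the following; it reuses the illustrative helpers echo_estimate and select_echo_estimate defined above, and every name remains an assumption rather than the patent's own interface.

```python
def process_single_mic(first_audio, mic_recording, speaker_models,
                       delays_atten, mic_resonance_hz, mic_model):
    """S1401-S1404: build the superimposed reference signal; S1405-S1408:
    decide whether the microphone model corrects it; S1409: remove the
    resulting echo estimation signal from the microphone recording."""
    superimposed = echo_estimate(first_audio, speaker_models, delays_atten)
    estimate = select_echo_estimate(superimposed, mic_resonance_hz, mic_model)
    n = min(len(mic_recording), len(estimate))
    return mic_recording[:n] - estimate[:n]
```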
  • Referring to FIG. 15, there is shown an application example in which the signal processing method according to an embodiment of the present disclosure is applied to the circuit structure 120 of the four-speaker and single-microphone shown in FIG. 12; the method may include steps S1501 to S1509.
  • step S1501 one digital signal is extracted from the front end of each speaker in the speaker group 1201 to obtain four first audio signals.
  • step S1502 the four first audio signals are correspondingly input to the first speaker model 1203a, the second speaker model 1203b, the third speaker model 1203c, and the fourth speaker model 1203d, and the acoustic response processing is performed to obtain the acoustic response signal of each speaker .
  • step S1503 a reference signal for each speaker is obtained according to the acoustic response signal of each speaker in the speaker group 1201 and the first delay and attenuation information corresponding to the acoustic response signal.
  • step S1504 the reference signal of each speaker is subjected to superposition and bit expansion processing to obtain a superposed reference signal.
  • step S1505 a high-frequency resonance peak of the microphone 1202 is acquired.
  • step S1506 the high-frequency resonance peak of the microphone 1202 is compared with a preset high-frequency resonance peak.
  • step S1507 if the high-frequency resonance peak of the microphone 1202 is higher than a preset high-frequency resonance peak, the superimposed reference signal is used as an echo estimation signal.
  • step S1508 if the high-frequency resonance peak of the microphone 1202 is not higher than a preset high-frequency resonance peak, an acoustic response process is performed on the superimposed reference signal based on the microphone model 1206 to obtain an echo estimation signal.
  • step S1509 the echo estimation signal is input to an echo processing module 1207 to perform echo processing on the second audio signal recorded by the microphone 1202.
  • the process shown in FIG. 15 considers that the audio signals input by each speaker in the speaker group may be different.
  • the other processing steps are the same as the corresponding processing steps shown in FIG. 14 and will not be described in detail here.
  • Referring to FIG. 16, there is shown an application example in which the signal processing method according to an embodiment of the present disclosure is applied to the circuit structure 110 of the four-speaker and dual-microphone shown in FIG. 11; the method may include steps S1601 to S1617.
  • step S1601 one digital signal is extracted from the front end of each speaker in the speaker group 1101 to obtain four first audio signals.
  • step S1602 the four first audio signals are correspondingly input to the first speaker model 1103a, the second speaker model 1103b, the third speaker model 1103c, and the fourth speaker model 1103d, and an acoustic response process is performed to obtain an acoustic response signal of each speaker. .
  • step S1603 a first reference signal of each speaker is obtained according to an acoustic response signal of each speaker in the speaker group 1101 and first delay and attenuation information corresponding to the acoustic response signal.
  • step S1604 the first reference signal of each speaker is subjected to superposition and bit expansion processing to obtain a first superposed reference signal.
  • step S1605 a high-frequency resonance peak of the first microphone 1102a is acquired.
  • step S1606 the high-frequency resonance peak of the first microphone 1102a is compared with a preset high-frequency resonance peak.
  • step S1607 if the high-frequency resonance peak of the first microphone 1102a is higher than a preset high-frequency resonance peak, the first superimposed reference signal is used as a first echo estimation signal.
  • step S1608 if the high-frequency resonance peak of the first microphone 1102a is not higher than a preset high-frequency resonance peak, an acoustic response process is performed on the first superimposed reference signal based on the first microphone model 1106a to obtain a first Echo estimation signal.
  • step S1609 the first echo estimation signal is input to a first echo processing module 1107a to perform a first echo processing on a second audio signal recorded by the first microphone 1102a.
  • step S1610 the second reference signal of each speaker is obtained according to the acoustic response signal of each speaker in the speaker group 1101 and the second delay and attenuation information corresponding to the acoustic response signal.
  • step S1611 the second reference signal of each speaker is subjected to superposition and bit expansion processing to obtain a second superimposed reference signal.
  • step S1612 a high-frequency resonance peak of the second microphone 1102b is acquired.
  • step S1613 the high-frequency resonance peak of the second microphone 1102b is compared with a preset high-frequency resonance peak.
  • step S1614 if the high-frequency resonance peak of the second microphone 1102b is higher than a preset high-frequency resonance peak, the second superimposed reference signal is used as a second echo estimation signal.
  • step S1615 if the high-frequency resonance peak of the second microphone 1102b is not higher than a preset high-frequency resonance peak, an acoustic response process is performed on the second superimposed reference signal based on the second microphone model 1106b to obtain a second Echo estimation signal.
  • step S1616 the second echo estimation signal is input to a second echo processing module 1107b to perform a second echo processing on the second audio signal recorded by the second microphone 1102b.
  • step S1617 the audio signals after the first echo processing and the second echo processing are input to the noise reduction and other processing module 1109.
  • Because the reference signal of the echo signal (that is, the echo estimation signal) differs for each microphone, echo processing needs to be performed separately for each microphone (such as by the first echo processing module 1107a and the second echo processing module 1107b in FIG. 11). Since the speaker group is shared, and the speaker model and the delay attenuation modules are provided separately, the speaker model is considered first and the position and distance information between each speaker and each microphone is considered afterwards, so that the audio signal at the front end of each speaker only needs to undergo the acoustic response processing of the speaker model once, thereby helping to reduce the computational workload in a multi-microphone design.
  • the foregoing embodiments perform echo processing on the echo signals of the four speakers.
  • the embodiments of the present disclosure are also suitable for echo processing of other multi-speakers and multi-microphones, as well as echo processing of a single speaker.
  • the number of speakers is not specifically limited.
  • Referring to FIG. 17, there is shown an application example of the signal processing method applied to a single speaker and a single microphone; the method may include steps S1701 to S1708.
  • step S1701 a digital signal is extracted from the front end of the speaker 1801 to obtain a first audio signal.
  • step S1702 the first audio signal is input to a speaker model 1803 for acoustic response processing to obtain an acoustic response signal of the speaker 1801.
  • step S1703 a reference signal of the speaker 1801 is obtained according to the acoustic response signal of the speaker 1801 and the delay and attenuation information corresponding to the acoustic response signal.
  • step S1704 a high-frequency resonance peak of the microphone 1802 is acquired.
  • step S1705 the high-frequency resonance peak of the microphone 1802 is compared with a preset high-frequency resonance peak.
  • step S1706 if the high-frequency resonance peak of the microphone 1802 is higher than a preset high-frequency resonance peak, the reference signal is used as an echo estimation signal.
  • step S1707 if the high-frequency resonance peak of the microphone 1802 is not higher than a preset high-frequency resonance peak, an acoustic response process is performed on the reference signal based on the microphone model 1805 to obtain an echo estimation signal.
  • step S1708 the echo estimation signal is input to the adder 1806 to perform echo processing on the second audio signal recorded by the microphone 1802.
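A hedged sketch of the decision in steps S1704 to S1707: how the high-frequency resonance peak is measured is not specified above, so it is assumed here to be the maximum magnitude of the microphone model's frequency response above an assumed cutoff; the cutoff frequency, FFT size, and sample rate are illustrative values only.

```python
import numpy as np

def high_freq_resonance_peak(mic_ir, fs=16000, cutoff_hz=4000.0, n_fft=1024):
    """Assumed measure: peak magnitude of the microphone response above cutoff_hz."""
    spectrum = np.abs(np.fft.rfft(mic_ir, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return float(spectrum[freqs >= cutoff_hz].max())

def echo_estimate(reference, mic_ir, preset_peak, fs=16000):
    """Skip the microphone model when the mic's high-frequency resonance peak
    is higher than the preset peak (steps S1706/S1707)."""
    if high_freq_resonance_peak(mic_ir, fs) > preset_peak:
        return reference                                       # step S1706
    return np.convolve(reference, mic_ir)[: len(reference)]    # step S1707
```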
  • The circuit structure 180 shown in FIG. 18 is taken as an example.
  • The circuit structure 180 includes a speaker 1801, a microphone 1802, a speaker model 1803, a delay attenuation module 1804, a microphone model 1805, an adder 1806, and the like.
  • The speaker model 1803 is established correspondingly based on the speaker 1801, and the delay attenuation module 1804 performs corresponding delay and attenuation on the audio signal after it passes through the speaker model 1803.
  • The delay and attenuation information is obtained according to the distance and position information between the speaker 1801 and the microphone 1802 (one possible mapping is sketched below).
  • The function of the adder 1806 is the same as that of the echo processing module; it is used to remove, from the second audio signal recorded by the microphone 1802, the echo signal generated by the first audio signal played by the speaker 1801.
  • The operation process of the circuit structure shown in FIG. 18 is similar to that of the aforementioned multi-speaker circuit structure, and is not described in detail here.
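The exact mapping from distance and position information to delay and attenuation values is not given above. The sketch below assumes free-field propagation, so that the delay is the acoustic travel time expressed in samples and the gain rolls off with distance; the speed of sound, sample rate, and reference distance are assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound

def delay_attenuation_from_geometry(speaker_pos, mic_pos, fs=16000, ref_dist_m=0.01):
    """Return (delay_samples, gain) for one speaker/microphone pair.

    speaker_pos, mic_pos : (x, y, z) coordinates in metres, assumed available
                           from the device's structure information.
    """
    distance = math.dist(speaker_pos, mic_pos)
    delay_samples = int(round(distance / SPEED_OF_SOUND_M_S * fs))
    # Simple 1/d roll-off, clamped so the gain never exceeds 1.
    gain = min(1.0, ref_dist_m / max(distance, ref_dist_m))
    return delay_samples, gain
```

Pairs produced this way correspond to the geometry entries used in the earlier sketch; the delay attenuation module 1804 plays the same role in FIG. 18.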
  • The method may include steps S1901 to S1909.
  • In step S1901, one digital signal is extracted from the front ends of the first speaker 2001a and the second speaker 2001b, respectively, to obtain two first audio signals.
  • In step S1902, the two first audio signals are correspondingly input to the first speaker model 2003a and the second speaker model 2003b for acoustic response processing to obtain an acoustic response signal of each speaker.
  • In step S1903, a reference signal of each speaker is obtained according to the acoustic response signal of each speaker and the first delay attenuation module 2004a and the second delay attenuation module 2004b.
  • In step S1904, the reference signals of the speakers are subjected to superposition and bit expansion processing to obtain a superimposed reference signal (one possible reading of this processing is sketched after this list).
  • In step S1905, a high-frequency resonance peak of the microphone 2002 is acquired.
  • In step S1906, the high-frequency resonance peak of the microphone 2002 is compared with a preset high-frequency resonance peak.
  • In step S1907, if the high-frequency resonance peak of the microphone 2002 is higher than the preset high-frequency resonance peak, the superimposed reference signal is used as an echo estimation signal.
  • In step S1908, if the high-frequency resonance peak of the microphone 2002 is not higher than the preset high-frequency resonance peak, acoustic response processing is performed on the superimposed reference signal based on the microphone model 2006 to obtain the echo estimation signal.
  • In step S1909, the echo estimation signal is input to an echo processing module 2007 to perform echo processing on the second audio signal recorded by the microphone 2002.
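"Superposition and bit expansion" is not defined in detail above; one plausible reading, assumed here, is that the per-speaker 16-bit reference frames are accumulated in a wider 32-bit buffer so that the sum cannot overflow before further processing.

```python
import numpy as np

def superimpose_with_bit_expansion(reference_frames):
    """Sum equal-length int16 reference frames (one per speaker) into an int32 buffer."""
    acc = np.zeros(len(reference_frames[0]), dtype=np.int32)  # expanded bit width
    for frame in reference_frames:
        acc += frame.astype(np.int32)
    return acc
```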
  • the circuit structure 200 shown in FIG. 20 is taken as an example.
  • The circuit structure 200 includes a first speaker 2001a, a second speaker 2001b, a microphone 2002, a first speaker model 2003a, a second speaker model 2003b, a first delay attenuation module 2004a, a second delay attenuation module 2004b, a summing module 2005, a microphone model 2006, an adder 2007, a speech processing module 2008, a noise reduction and other processing module 2009, a decoder 2010, an encoder 2011, and a transmitter 2012.
  • the function of the adder 2007 is the same as that of the echo processing module.
  • the operation process of the circuit structure shown in FIG. 20 is similar to the operation process of the aforementioned multi-speaker circuit structure (such as FIG. 11), and will not be described in detail here.
  • FIGS. 21 to 25 are schematic structural diagrams of a signal processing apparatus according to an embodiment of the present disclosure.
  • The signal processing device 210 may include at least one speaker 2101, at least one microphone 2102, a first receiving part 2103, a first obtaining part 2104, a second receiving part 2105, and a second obtaining part 2106.
  • The first receiving part 2103 is configured to receive a first audio signal and to have the first audio signal played by the at least one speaker.
  • The first obtaining part 2104 is configured to obtain at least one echo estimation signal corresponding to the first audio signal according to at least one speaker model, at least one microphone model, and the first audio signal, where the at least one speaker model is obtained based on the at least one speaker and the at least one microphone model is obtained based on the at least one microphone.
  • The second receiving part 2105 is configured to receive a second audio signal using the at least one microphone, where the second audio signal includes an echo signal generated by the first audio signal that is output by the at least one speaker and received by the at least one microphone.
  • The second obtaining part 2106 is configured to remove, from the second audio signal, the at least one echo estimation signal corresponding to the first audio signal to obtain an echo-processed audio signal.
  • The signal processing device 210 may further include a pre-processing component 2107 configured to demodulate and pre-process the first audio signal, where the first audio signal is generated and transmitted by a remote device.
  • The signal processing device 210 may further include a modeling component 2108 configured to: correspondingly establish the at least one speaker model according to characteristic information of the at least one speaker, where the characteristic information of the at least one speaker includes circuit information and structure information corresponding to the at least one speaker; and correspondingly establish the at least one microphone model according to characteristic information of the at least one microphone, where the characteristic information of the at least one microphone includes circuit information and structure information corresponding to the at least one microphone.
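A hedged sketch of what the modeling component 2108 might produce: the patent derives the models from the transducers' circuit and structure information, but does not fix a model form here. The sketch assumes that this characteristic information can be reduced to a target magnitude response, from which a short FIR impulse response is built (a simple frequency-sampling design); all parameter names and sizes are illustrative.

```python
import numpy as np

def model_from_frequency_response(target_magnitude, num_taps=128):
    """Build an FIR model (impulse response) from a target magnitude response.

    target_magnitude : magnitude response sampled on num_taps//2 + 1 bins,
                       assumed to be derived from the transducer's circuit and
                       structure information or from measurements.
    """
    impulse = np.fft.irfft(target_magnitude, n=num_taps)  # zero-phase prototype
    impulse = np.roll(impulse, num_taps // 2)             # make it causal
    impulse *= np.hanning(num_taps)                       # smooth truncation
    return impulse
```

Impulse responses produced this way could serve as the speaker and microphone models (speaker_irs and mic_irs) in the earlier sketch.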
  • The first obtaining part 2104 may be configured to: input the first audio signal into each speaker model for acoustic response processing to obtain an acoustic response signal of each speaker; obtain a first reference signal of each speaker according to the acoustic response signal of each speaker and first delay and attenuation information corresponding to the acoustic response signal, where the first delay and attenuation information is correspondingly obtained based on the distance and position information between each speaker and the microphone; perform superposition and bit expansion processing on the first reference signals of the speakers to obtain a first superimposed reference signal; and perform acoustic response processing on the first superimposed reference signal based on the microphone model to obtain a first echo estimation signal of the first audio signal.
  • The signal processing device 210 may further include a first comparison component 2109 configured to: obtain a high-frequency resonance peak of the microphone; compare the high-frequency resonance peak of the microphone with a preset high-frequency resonance peak; and, in response to the high-frequency resonance peak of the microphone being higher than the preset high-frequency resonance peak, cancel the acoustic response processing of the microphone model on the first superimposed reference signal.
  • The first obtaining part 2104 may be configured to: input the first audio signal into each speaker model for acoustic response processing to obtain an acoustic response signal of each speaker; obtain a first reference signal and a second reference signal of each speaker according to the acoustic response signal of each speaker, first delay and attenuation information corresponding to the acoustic response signal, and second delay and attenuation information corresponding to the acoustic response signal, where the first delay and attenuation information is correspondingly obtained based on the distance and position information between each speaker and a first microphone of the at least one microphone, and the second delay and attenuation information is correspondingly obtained based on the distance and position information between each speaker and a second microphone of the at least one microphone; perform superposition and bit expansion processing on the first reference signals and the second reference signals of the speakers to obtain a first superimposed reference signal and a second superimposed reference signal; perform acoustic response processing on the first superimposed reference signal based on a first microphone model of the at least one microphone model to obtain a first echo estimation signal of the first audio signal; and perform acoustic response processing on the second superimposed reference signal based on a second microphone model of the at least one microphone model to obtain a second echo estimation signal of the first audio signal.
  • The signal processing device 210 may further include a second comparison component 2110 configured to: obtain a high-frequency resonance peak of the first microphone and a high-frequency resonance peak of the second microphone; compare the high-frequency resonance peak of the first microphone and the high-frequency resonance peak of the second microphone with a preset high-frequency resonance peak, respectively; in response to the high-frequency resonance peak of the first microphone being higher than the preset high-frequency resonance peak, cancel the acoustic response processing of the first microphone model on the first superimposed reference signal; and, in response to the high-frequency resonance peak of the second microphone being higher than the preset high-frequency resonance peak, cancel the acoustic response processing of the second microphone model on the second superimposed reference signal.
  • The first obtaining part 2104 may be configured to: in response to the first audio signals received by the at least one speaker being the same, input the first audio signal to the at least one speaker model for acoustic response processing to obtain the acoustic response signal of each speaker; and, in response to the first audio signals received by the at least one speaker being different, correspondingly input the first audio signals to the at least one speaker model for acoustic response processing to obtain the acoustic response signal of each speaker.
  • Here, a "component" may be a part of a circuit, a part of a processor, a part of a program or software, and the like; of course, it may also be a unit, and it may be modular or non-modular.
  • each component in this embodiment may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional modules.
  • If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of this embodiment may be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
  • The foregoing storage media include media that can store program codes, such as USB flash drives, removable hard disks, read-only memories (ROM, Read Only Memory), random access memories (RAM, Random Access Memory), magnetic disks, and optical discs.
  • an embodiment of the present disclosure provides a computer storage medium on which a computer program is stored.
  • When the computer program is executed by at least one processor, the at least one processor executes a signal processing method according to the embodiments of the present disclosure.
  • the signal processing device 210 may include a network interface 2601, a memory 2602, and a processor 2603.
  • the various components are coupled together by a bus system 2604.
  • the bus system 2604 is used to implement connection and communication between these components.
  • the bus system 2604 may include a data bus, and may further include a power bus, a control bus, and a status signal bus.
  • various buses are labeled as the bus system 2604 in FIG. 26.
  • the network interface 2601 is used to receive and send signals during the process of transmitting and receiving information with other external network elements.
  • the memory 2602 stores a computer program capable of running on the processor 2603.
  • When the processor 2603 runs the computer program, it can execute a signal processing method according to various embodiments of the present disclosure.
  • the memory 2602 in the embodiment of the present disclosure may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
  • Non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), or a flash memory.
  • the volatile memory may be Random Access Memory (RAM), which is used as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
  • the memory 2602 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
  • the processor 2603 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 2603 or an instruction in the form of software.
  • The above-mentioned processor 2603 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in combination with the embodiments of the present disclosure may be directly embodied as being executed by a hardware decoding processor, or may be executed and completed by using a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a mature storage medium such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, and the like.
  • the storage medium is located in the memory 2602, and the processor 2603 reads the information in the memory 2602 and completes the steps of the foregoing method in combination with its hardware.
  • the embodiments described herein may be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof.
  • The processing unit may be implemented in one or more Application-Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field-Programmable Gate Arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units designed to perform the functions described in this disclosure, or a combination thereof.
  • the techniques described herein can be implemented through modules (e.g., procedures, functions, etc.) that perform the functions described herein.
  • Software codes may be stored in a memory and executed by a processor.
  • the memory may be implemented in the processor or external to the processor.

Abstract

Embodiments of the present disclosure relate to a signal processing method, a signal processing device, and a computer storage medium. The method comprises: receiving a first audio signal and playing the first audio signal by at least one speaker; obtaining, according to at least one speaker model, at least one microphone model, and the first audio signal, at least one echo estimation signal corresponding to the first audio signal; receiving a second audio signal using at least one microphone, the second audio signal comprising an echo signal generated by the first audio signal that is output by the at least one speaker and received by the at least one microphone; and removing from the second audio signal the at least one echo estimation signal corresponding to the first audio signal, so as to obtain an echo-processed audio signal.
PCT/CN2019/097552 2018-07-25 2019-07-24 Procédé et dispositif de traitement de signal, et support de stockage informatique WO2020020247A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810826906.7 2018-07-25
CN201810826906.7A CN110769352B (zh) 2018-07-25 2018-07-25 一种信号处理方法、装置以及计算机存储介质

Publications (1)

Publication Number Publication Date
WO2020020247A1 true WO2020020247A1 (fr) 2020-01-30

Family

ID=69182149

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/097552 WO2020020247A1 (fr) 2018-07-25 2019-07-24 Procédé et dispositif de traitement de signal, et support de stockage informatique

Country Status (2)

Country Link
CN (1) CN110769352B (fr)
WO (1) WO2020020247A1 (fr)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2675063B1 (fr) * 2012-06-13 2016-04-06 Dialog Semiconductor GmbH Circuit agc avec des niveaux d'énergie de signal de référence optimisés pour un circuit d'annulation d'écho
US9531433B2 (en) * 2014-02-07 2016-12-27 Analog Devices Global Echo cancellation methodology and assembly for electroacoustic communication apparatuses
JP6258061B2 (ja) * 2014-02-17 2018-01-10 クラリオン株式会社 Sound processing apparatus, sound processing method, and sound processing program

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1176034A (zh) * 1995-02-24 1998-03-11 Ericsson Inc. Apparatus and method for canceling acoustic echoes, including non-linear distortions, in loudspeaker telephones
CN106297816A (zh) * 2015-05-20 2017-01-04 广州质音通讯技术有限公司 Non-linear processing method and apparatus for echo cancellation, and electronic device
WO2017065989A1 (fr) * 2015-10-12 2017-04-20 Microsoft Technology Licensing, Llc Audio signal processing

Also Published As

Publication number Publication date
CN110769352B (zh) 2022-07-15
CN110769352A (zh) 2020-02-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19840134

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19840134

Country of ref document: EP

Kind code of ref document: A1