CN115696140B - Classroom audio multichannel echo cancellation method
- Publication number
- CN115696140B (application CN202211546136.3A)
- Authority
- CN
- China
- Prior art keywords
- echo
- classroom
- filter
- signals
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
Abstract
The invention relates to the technical field of echo cancellation and discloses a classroom audio multichannel echo cancellation method comprising the following steps: the preprocessed microphone signals and loudspeaker signals are taken as the audio signals to be echo-cancelled and are segmented using a cascade of classification filters; a multichannel echo filter is constructed and the segmented spectra are input into it; the multichannel echo filter is adjusted in real time according to the solved filter parameters; and, when the loudspeaker signal is dominant in the classroom, the adjusted multichannel echo filter filters the loudspeaker signal to intelligently eliminate the loudspeaker echo in the classroom. The invention extracts the echo-bearing spectra of the audio signal across channels with different numbers of frequency sampling points, and constructs a corresponding multichannel echo filter for these extracted spectra to perform echo cancellation in the classroom.
Description
Technical Field
The invention relates to the technical field of echo cancellation, in particular to a classroom audio multichannel echo cancellation method.
Background
In a classroom audio/video system, multiple loudspeakers and microphones are deployed in the room to give a better listening experience. The echoes produced when multiple loudspeakers and microphones are used are referred to as multichannel echoes. As requirements for communication quality rise, multichannel echo cancellation is receiving increasing attention. At the same time, increasingly complex acoustic environments produce echo paths with significant nonlinear characteristics. Existing methods cannot effectively suppress nonlinear echoes. To address this problem, this patent provides a classroom audio multichannel echo cancellation method for cancelling echo in a classroom audio system.
Disclosure of Invention
In view of this, the present invention provides a classroom audio multichannel echo cancellation method, which aims to: (1) adopt a cascade of classification filters, segmenting the audio signal spectra obtained with different numbers of Fourier transform sampling points by constructing classification filters with different center frequencies, and select the spectra in which the energy difference between the microphone signal and the loudspeaker signal is large as the spectra of the audio signal to be echo-cancelled, thereby extracting the echo-bearing spectra across channels with different numbers of frequency-domain sampling points; (2) construct a corresponding multichannel echo filter for the extracted spectra and rapidly update the parameters of the multichannel echo filter using the extracted spectra; and (3) according to the detection result, when the loudspeaker is the main sound source in the classroom, filter the loudspeaker signal with the constructed multichannel echo filter, thereby intelligently eliminating the loudspeaker echo in the classroom.
The invention provides a classroom audio multichannel echo cancellation method, which comprises the following steps:
S1: collect classroom audio signals in real time and preprocess the collected classroom audio signals, wherein the classroom audio signals comprise microphone signals, loudspeaker signals, echo signals, human voice signals and noise signals;
S2: take the preprocessed microphone signals and loudspeaker signals as the audio signals to be echo-cancelled, and segment them using a cascade of classification filters to obtain a plurality of spectra of the audio signals to be echo-cancelled;
S3: construct a multichannel echo filter, input the segmented spectra into the multichannel echo filter, and solve for the parameters of the multichannel echo filter;
S4: adjust the multichannel echo filter in real time according to the solved parameters to obtain the adjusted multichannel echo filter;
S5: detect the state of the audio signal in the classroom in real time and, when the loudspeaker signal is dominant in the classroom, filter the loudspeaker signal with the adjusted multichannel echo filter to eliminate the loudspeaker echo in the classroom.
As a further improvement of the present invention:
optionally, the step S1 of collecting the classroom audio signal in real time includes:
Classroom audio signals are collected in real time to obtain a classroom audio signal sequence. The classroom audio signals comprise microphone signals, loudspeaker signals, echo signals, human voice signals and noise signals, wherein the microphone signal is the talker's audio as received by a microphone, the loudspeaker signal is the audio emitted by a loudspeaker, the echo signal is the signal produced by the loudspeaker output along the echo path, the human voice signal is the audio of talkers not using a microphone, and the noise signal is ambient noise.
The classroom audio signal sequence Q consists of the classroom audio signals at L consecutive times t_1, ..., t_L, with an interval of 0.5 seconds between adjacent times. The classroom audio signal at time t_i comprises:
the microphone signal at time t_i;
the loudspeaker signal acquired at time t_i, which comprises the clean loudspeaker signal and the echo signal;
the other-noise audio signal at time t_i, which comprises the human voice signal and the noise signal.
In the embodiment of the invention, the microphone signal and the loudspeaker signal are each the total signal of the several cascaded groups of microphones or loudspeakers in the classroom. The loudspeaker signal is acquired by a sound sensor placed near the loudspeaker. A sound sensor is also placed on the wall near the classroom window, and the other-noise audio signal is obtained by subtracting the loudspeaker signal from the audio signal acquired by that sensor.
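The composition of the sequence Q described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `ClassroomSample` and `build_sequence` are hypothetical, each signal is reduced to one scalar per time step, and the other-noise signal is taken as the window-sensor reading minus the loudspeaker signal, as the embodiment describes.

```python
from typing import List, NamedTuple, Sequence


class ClassroomSample(NamedTuple):
    """One element of the sequence Q (hypothetical container)."""
    mic: float      # microphone signal at time t_i
    speaker: float  # loudspeaker signal at time t_i (clean signal plus echo)
    other: float    # other-noise signal at time t_i (human voice plus noise)


def build_sequence(mic: Sequence[float],
                   speaker: Sequence[float],
                   window_sensor: Sequence[float]) -> List[ClassroomSample]:
    """Build Q from three equally long recordings.

    The other-noise signal is the window-side sensor reading minus the
    loudspeaker signal, following the embodiment's description.
    """
    if not (len(mic) == len(speaker) == len(window_sensor)):
        raise ValueError("all recordings must have the same length L")
    return [ClassroomSample(m, s, w - s)
            for m, s, w in zip(mic, speaker, window_sensor)]
```

A real system would hold a block of samples per 0.5-second step rather than a single value; the scalar form is used here only to keep the structure visible.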
Optionally, the preprocessing the collected classroom audio signal in step S1 includes:
The collected classroom audio signals are preprocessed. The signals to be preprocessed comprise the microphone signals, the loudspeaker signals and the other-noise audio signals, indexed by j = 1, 2, 3. The preprocessing flow for the classroom audio signals is as follows:
S11: construct a Hamming window function, where L denotes the length of the audio signal to be windowed;
S12: window each classroom audio signal with the Hamming window function;
S13: reassemble the windowed signals into the preprocessed classroom audio signal sequences of the different categories, namely the preprocessed in-classroom microphone signal sequence, the preprocessed in-classroom loudspeaker signal sequence, and the preprocessed in-classroom other-noise audio signal sequence.
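The windowing in S11 and S12 can be sketched as follows. The patent's own window formula is not reproduced in this text, so the sketch assumes the common Hamming definition w(n) = 0.54 - 0.46 cos(2πn / (L - 1)); the function names `hamming_window` and `apply_window` are hypothetical.

```python
import math
from typing import List, Sequence


def hamming_window(length: int) -> List[float]:
    """Return a Hamming window of the given length (assumed standard form)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(signal: Sequence[float]) -> List[float]:
    """Multiply a signal sample by sample with a Hamming window of equal length."""
    window = hamming_window(len(signal))
    return [x * w for x, w in zip(signal, window)]
```

Under this assumption the same function would be applied in turn to the microphone, loudspeaker and other-noise sequences (j = 1, 2, 3).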
Optionally, in the step S2, the audio signal to be echo cancelled is sliced by adopting a multi-classification filter cascade method, including:
The preprocessed microphone signal and loudspeaker signal are taken as the audio signals to be echo-cancelled and are segmented using a cascade of classification filters to obtain the audio sequences to be echo-cancelled in different frequency domains. The segmentation flow based on the cascade of classification filters is as follows:
S21: construct M classification filters, each with its own center frequency; the frequency response of the m-th classification filter is defined over the spectrum of an audio signal with a given number of Fourier transform sampling points, where L denotes the length of the audio signal sequence;
S22: apply the fast Fourier transform separately to the preprocessed microphone signal sequence and the preprocessed loudspeaker signal sequence, giving the spectrum of each audio signal sequence at the chosen number of Fourier transform sampling points;
S23: input each spectrum into the M classification filters, compute the logarithmic energy output by each classification filter, and sum the logarithmic energies of the M classification filters. If the summed logarithmic energies, compared against an energy threshold, show that the spectral energy of the loudspeaker signal sequence at this number of Fourier transform sampling points is significantly higher than that of the microphone signal sequence, then high-energy echo is present in that spectrum of the loudspeaker signal sequence, and the spectrum is marked as a segmented spectrum to be echo-cancelled;
S24: repeat steps S22 to S23 to obtain the spectra to be echo-cancelled at different numbers of Fourier transform sampling points. These spectra form a set of N groups, in which the v-th element is the v-th group of spectra to be echo-cancelled and N is the number of groups.
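The segmentation flow in S21 to S24 can be sketched as follows. This is a minimal illustration, not the patented formulas: the filter responses and energy expressions are not reproduced in this text, so the triangular filter shape, the evenly spaced centers, the difference-versus-threshold test, and all function names (`dft`, `band_weights`, `summed_log_energy`, `select_spectra`) are assumptions.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def dft(signal: Sequence[float], n_points: int) -> List[complex]:
    """Naive n_points-point DFT; the signal is truncated or zero-padded."""
    x = list(signal[:n_points]) + [0.0] * max(0, n_points - len(signal))
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                for n in range(n_points))
            for k in range(n_points)]


def band_weights(n_bins: int, n_filters: int) -> List[List[float]]:
    """Triangular filters with evenly spaced centers (assumed shape)."""
    step = (n_bins - 1) / (n_filters + 1)
    return [[max(0.0, 1.0 - abs(k - m * step) / step) for k in range(n_bins)]
            for m in range(1, n_filters + 1)]


def summed_log_energy(spectrum: Sequence[complex], n_filters: int,
                      eps: float = 1e-12) -> float:
    """Sum over all filters of the log of the filter-weighted power."""
    n_bins = len(spectrum) // 2 + 1
    power = [abs(c) ** 2 for c in spectrum[:n_bins]]
    return sum(math.log(sum(w * p for w, p in zip(weights, power)) + eps)
               for weights in band_weights(n_bins, n_filters))


def select_spectra(mic: Sequence[float], speaker: Sequence[float],
                   point_counts: Sequence[int], n_filters: int,
                   threshold: float) -> List[Tuple[int, List[complex]]]:
    """Keep loudspeaker spectra whose summed log energy exceeds the
    microphone's by more than the threshold (assumed comparison)."""
    selected = []
    for n_points in point_counts:
        mic_spec = dft(mic, n_points)
        spk_spec = dft(speaker, n_points)
        gap = (summed_log_energy(spk_spec, n_filters)
               - summed_log_energy(mic_spec, n_filters))
        if gap > threshold:
            selected.append((n_points, spk_spec))
    return selected
```

Each entry of the returned list plays the role of one group in the set of N spectra to be echo-cancelled, with N equal to the length of the list.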
Optionally, in the step S3, the segmented spectrum is input into the constructed multi-channel echo filter, and parameters of the multi-channel echo filter are obtained by solving, including:
A multichannel echo filter is constructed, wherein the multichannel echo filter comprises N taps, each tap has a tap vector, N is the number of groups of spectra to be echo-cancelled that are input into the multichannel echo filter, and the tap vectors are the parameters of the multichannel echo filter.
The N groups of segmented spectra are input into the constructed multichannel echo filter, and the filter parameters are solved as follows:
S31: set the order of the multichannel echo filter to K and denote the current order by k, with initial value k = 0; initialize the multichannel echo filter parameters H(0);
S32: input the spectrum of the segmented loudspeaker signal sequence into the k-th order multichannel echo filter and compute its output;
S33: compute the filtering error of the k-th order multichannel echo filter;
S34: if k = K, apply the inverse Fourier transform to H(k) and take the result as the solved multichannel echo filter parameters H*; otherwise, update the filter parameters to obtain H(k+1) for the (k+1)-th order multichannel echo filter, set k = k + 1, and return to step S32.
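The iterate, measure error, update loop of S31 to S34 has the same shape as a frequency-domain adaptive filter. The sketch below swaps in a per-bin normalized LMS (NLMS) update as a stand-in, because the patent's own update formula is not reproduced in this text. The step size, the zero initialization, the use of the microphone spectrum as the reference, and the names `idft` and `solve_filter` are all assumptions.

```python
import cmath
import math
from typing import List, Sequence


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Naive inverse DFT."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def solve_filter(speaker_spec: Sequence[complex],
                 mic_spec: Sequence[complex],
                 iterations: int,
                 step: float = 0.5,
                 eps: float = 1e-12) -> List[complex]:
    """Iteratively fit per-bin coefficients H so that H * speaker ~ mic,
    then return the inverse transform of H as time-domain parameters."""
    h = [0j] * len(speaker_spec)                      # S31: initialize H(0)
    for _ in range(iterations):
        out = [hk * xk for hk, xk in zip(h, speaker_spec)]    # S32: output
        err = [dk - yk for dk, yk in zip(mic_spec, out)]      # S33: error
        h = [hk + step * xk.conjugate() * ek / (abs(xk) ** 2 + eps)
             for hk, xk, ek in zip(h, speaker_spec, err)]     # S34: update
    return idft(h)                                    # S34: back to time domain
```

With a step size of 0.5 the per-bin error roughly halves on each pass, so a few dozen iterations are enough for this toy setting. A production canceller would use block processing and a step size tuned to the room.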
Optionally, in the step S4, the adjusting the multichannel echo filter in real time according to the solved parameter of the multichannel echo filter includes:
The parameters of the current multichannel echo filter are adjusted in real time according to the solved multichannel echo filter parameters H*, giving the adjusted multichannel echo filter.
Optionally, the detecting, in real time, the state of the audio signal in the classroom in step S5 includes:
The state of the audio signal in the classroom is detected in real time. The states comprise a loudspeaker-dominant state, in which the loudspeaker is the main sound source in the classroom, and a loudspeaker non-dominant state, in which human voice and noise in the classroom are the main sound sources.
The detection flow for the audio signal state in the classroom is as follows:
S51: construct the state decision function E1 from the covariance matrix of the microphone signal sequence and the loudspeaker signal sequence acquired in step S1 and the autocorrelation matrix of the microphone signal sequence;
S52: construct the state decision function value E2 from the signal mean of the microphone signal sequence, the signal mean of the other-noise signals, the signal mean of the loudspeaker signal sequence, and the average update of each tap vector of the current multichannel echo filter relative to the previous multichannel echo filter;
S53: if either of the two decision conditions holds, one of which uses an autocorrelation threshold, the other noise is too strong and the loudspeaker is in the non-dominant state, where the other noise comprises the audio of talkers not using a microphone and ambient noise; otherwise the loudspeaker is in the dominant state.
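The two-test decision in S51 to S53 can be illustrated with simple stand-in statistics. The patent's expressions for E1 and E2 are not reproduced in this text, so this sketch substitutes a normalized correlation between the microphone and loudspeaker sequences for E1 and a noise-to-loudspeaker level ratio for E2. Both thresholds and the names `correlation`, `mean_abs` and `speaker_is_dominant` are assumptions.

```python
import math
from typing import Sequence


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def mean_abs(signal: Sequence[float]) -> float:
    """Mean absolute level of a sequence."""
    return sum(abs(v) for v in signal) / len(signal)


def speaker_is_dominant(mic: Sequence[float], speaker: Sequence[float],
                        other: Sequence[float],
                        corr_threshold: float = 0.6,
                        noise_ratio: float = 1.0) -> bool:
    """Return True when the loudspeaker appears to be the main sound source.

    Stand-in for E1: the microphone should track the loudspeaker closely.
    Stand-in for E2: the other-noise level should not exceed the loudspeaker level.
    If either test fails, the loudspeaker is treated as non-dominant.
    """
    e1 = abs(correlation(mic, speaker))
    speaker_level = mean_abs(speaker)
    e2 = mean_abs(other) / speaker_level if speaker_level > 0.0 else math.inf
    return e1 >= corr_threshold and e2 <= noise_ratio
```

The tap-update term that the source includes in E2 is omitted here for brevity.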
Optionally, in the step S5, filtering the speaker signal with the adjusted multichannel echo filter includes:
When the loudspeaker signal is dominant in the classroom, the time-domain loudspeaker signal is input into the adjusted multichannel echo filter, which filters it to produce the echo-cancelled loudspeaker signal, thereby eliminating the loudspeaker echo in the classroom.
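The gated filtering step can be sketched as a plain FIR convolution that runs only in the dominant state. The names `fir_filter` and `cancel_echo` are hypothetical, and passing the signal through unchanged in the non-dominant state is an assumption, since the source does not say what happens in that case.

```python
from typing import List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filtering: out[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def cancel_echo(speaker: Sequence[float], taps: Sequence[float],
                dominant: bool) -> List[float]:
    """Filter the loudspeaker signal only when the loudspeaker is dominant."""
    if dominant:
        return fir_filter(speaker, taps)
    return list(speaker)  # assumed behaviour: leave the signal untouched
```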
Compared with the prior art, the invention provides a classroom audio multichannel echo cancellation method, which has the following advantages:
First, the scheme provides an audio segmentation method. The preprocessed microphone signal and loudspeaker signal are taken as the audio signals to be echo-cancelled and are segmented using a cascade of classification filters to obtain the audio sequences to be echo-cancelled in different frequency domains. The segmentation flow is as follows. M classification filters are constructed, each with its own center frequency and with a frequency response defined over the spectrum of an audio signal with a given number of Fourier transform sampling points, where L denotes the length of the audio signal sequence. The fast Fourier transform, in which c denotes the imaginary unit, is applied separately to the preprocessed microphone signal sequence and loudspeaker signal sequence to give the spectrum of each sequence at the chosen number of Fourier transform sampling points. Each spectrum is input into the M classification filters, the logarithmic energy output by each classification filter is computed, and the logarithmic energies of the M classification filters are summed.
If the summed logarithmic energies, compared against an energy threshold, show that the spectral energy of the loudspeaker signal sequence at this number of Fourier transform sampling points is significantly higher than that of the microphone signal sequence, then higher-energy echo is present in that spectrum of the loudspeaker signal sequence, and the spectrum is marked as a segmented spectrum to be echo-cancelled.
These steps are repeated to obtain the spectra to be echo-cancelled at different numbers of Fourier transform sampling points. The spectra form a set of N groups, in which the v-th element is the v-th group of spectra to be echo-cancelled and N is the number of groups. By cascading classification filters with different center frequencies, the scheme segments the audio signal spectra obtained with different numbers of Fourier transform sampling points and selects the spectra in which the energy difference between the microphone signal and the loudspeaker signal is large as the spectra of the audio signal to be echo-cancelled, thereby extracting the echo-bearing spectra across channels with different numbers of frequency-domain sampling points.
Meanwhile, the scheme provides an intelligent echo cancellation method based on detecting the state of the audio signal in the classroom in real time. The states comprise a loudspeaker-dominant state, in which the loudspeaker is the main sound source in the classroom, and a loudspeaker non-dominant state, in which human voice and noise in the classroom are the main sound sources. The detection flow is as follows. A state decision function E1 is constructed from the covariance matrix of the acquired microphone signal sequence and loudspeaker signal sequence, the autocorrelation matrix of the microphone signal sequence, and the standard deviation of the microphone signal sequence. A state decision function value E2 is constructed from the signal mean of the microphone signal sequence, the signal mean of the other-noise signals, the signal mean of the loudspeaker signal sequence, and the average update of each tap vector of the current multichannel echo filter relative to the previous multichannel echo filter. If either of the two decision conditions holds, one of which uses an autocorrelation threshold, the other noise is too strong and the loudspeaker is in the non-dominant state, where the other noise comprises the audio of talkers not using a microphone and ambient noise; otherwise the loudspeaker is in the dominant state.
For the extracted spectra of the channels with different numbers of frequency sampling points, the scheme constructs a corresponding multichannel echo filter and rapidly updates its parameters using the extracted spectra. According to the detection result, when the loudspeaker is the main sound source in the classroom, the loudspeaker signal is filtered with the constructed multichannel echo filter, thereby intelligently eliminating the loudspeaker echo in the classroom.
Drawings
Fig. 1 is a schematic flow chart of a classroom audio multichannel echo cancellation method according to an embodiment of the present invention;
Fig. 2 is a functional block diagram of a classroom audio multichannel echo cancellation device according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device for implementing a classroom audio multichannel echo cancellation method according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the application provides a classroom audio multichannel echo cancellation method. The execution subject of the method includes, but is not limited to, at least one of a server, a terminal, and similar electronic devices that can be configured to execute the method provided by the embodiments of the present application. In other words, the method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster.
Example 1:
s1: classroom audio signals are collected in real time and preprocessed, wherein the classroom audio signals include microphone signals, speaker signals, echo signals, human voice signals, and noise signals.
In step S1, collecting the classroom audio signals in real time comprises:
Classroom audio signals are collected in real time to obtain a classroom audio signal sequence. The classroom audio signals comprise microphone signals, loudspeaker signals, echo signals, human voice signals and noise signals, wherein the microphone signal is the talker's audio as received by a microphone, the loudspeaker signal is the audio emitted by a loudspeaker, the echo signal is the signal produced by the loudspeaker output along the echo path, the human voice signal is the audio of talkers not using a microphone, and the noise signal is ambient noise.
The classroom audio signal sequence Q consists of the classroom audio signals at L consecutive times t_1, ..., t_L, with an interval of 0.5 seconds between adjacent times. The classroom audio signal at time t_i comprises:
the microphone signal at time t_i;
the loudspeaker signal acquired at time t_i, which comprises the clean loudspeaker signal and the echo signal;
the other-noise audio signal at time t_i, which comprises the human voice signal and the noise signal.
In step S1, preprocessing the collected classroom audio signals comprises:
The collected classroom audio signals are preprocessed. The signals to be preprocessed comprise the microphone signals, the loudspeaker signals and the other-noise audio signals, indexed by j = 1, 2, 3. The preprocessing flow for the classroom audio signals is as follows:
S11: construct a Hamming window function, where L denotes the length of the audio signal to be windowed;
S12: window each classroom audio signal with the Hamming window function;
S13: reassemble the windowed signals into the preprocessed classroom audio signal sequences of the different categories, namely the preprocessed in-classroom microphone signal sequence, the preprocessed in-classroom loudspeaker signal sequence, and the preprocessed in-classroom other-noise audio signal sequence.
S2: take the preprocessed microphone signal and loudspeaker signal as the audio signals to be echo-cancelled, and segment them using a cascade of classification filters to obtain a plurality of spectra of the audio signals to be echo-cancelled.
In step S2, segmenting the audio signal to be echo-cancelled using a cascade of classification filters comprises:
The preprocessed microphone signal and loudspeaker signal are taken as the audio signals to be echo-cancelled and are segmented using a cascade of classification filters to obtain the audio sequences to be echo-cancelled in different frequency domains. The segmentation flow based on the cascade of classification filters is as follows:
S21: construct M classification filters, each with its own center frequency; the frequency response of the m-th classification filter is defined over the spectrum of an audio signal with a given number of Fourier transform sampling points, where L denotes the length of the audio signal sequence;
S22: apply the fast Fourier transform separately to the preprocessed microphone signal sequence and the preprocessed loudspeaker signal sequence, giving the spectrum of each audio signal sequence at the chosen number of Fourier transform sampling points;
S23: input each spectrum into the M classification filters, compute the logarithmic energy output by each classification filter, and sum the logarithmic energies of the M classification filters. If the summed logarithmic energies, compared against an energy threshold, show that the spectral energy of the loudspeaker signal sequence at this number of Fourier transform sampling points is significantly higher than that of the microphone signal sequence, then high-energy echo is present in that spectrum of the loudspeaker signal sequence, and the spectrum is marked as a segmented spectrum to be echo-cancelled;
S24: repeat steps S22 to S23 to obtain the spectra to be echo-cancelled at different numbers of Fourier transform sampling points. These spectra form a set of N groups, in which the v-th element is the v-th group of spectra to be echo-cancelled and N is the number of groups.
S3: constructing a multichannel echo filter, inputting the segmented frequency spectrum into the multichannel echo filter, and solving to obtain parameters of the multichannel echo filter.
In step S3, inputting the segmented spectra into the constructed multichannel echo filter and solving for its parameters comprises:
A multichannel echo filter is constructed, wherein the multichannel echo filter comprises N taps, each tap has a tap vector, N is the number of groups of spectra to be echo-cancelled that are input into the multichannel echo filter, and the tap vectors are the parameters of the multichannel echo filter.
The N groups of segmented spectra are input into the constructed multichannel echo filter, and the filter parameters are solved as follows:
S31: set the order of the multichannel echo filter to K and denote the current order by k, with initial value k = 0; initialize the multichannel echo filter parameters H(0);
S32: input the spectrum of the segmented loudspeaker signal sequence into the k-th order multichannel echo filter and compute its output;
S33: compute the filtering error of the k-th order multichannel echo filter;
S34: if k = K, apply the inverse Fourier transform to H(k) and take the result as the solved multichannel echo filter parameters H*; otherwise, update the filter parameters to obtain H(k+1) for the (k+1)-th order multichannel echo filter, set k = k + 1, and return to step S32.
S4: adjust the multichannel echo filter in real time according to the solved multichannel echo filter parameters to obtain the adjusted multichannel echo filter.
In step S4, adjusting the multichannel echo filter in real time according to the solved parameters comprises:
The parameters of the current multichannel echo filter are adjusted in real time according to the solved multichannel echo filter parameters H*, giving the adjusted multichannel echo filter.
S5: the state of the audio signal in the classroom is detected in real time, and when the speaker signal is in a dominant state in the classroom, the adjusted multi-channel echo filter is utilized to carry out filtering processing on the speaker signal, so that the echo of the speaker in the classroom is eliminated.
In step S5, detecting the state of the audio signal in the classroom in real time comprises:
The state of the audio signal in the classroom is detected in real time. The states comprise a loudspeaker-dominant state, in which the loudspeaker is the main sound source in the classroom, and a loudspeaker non-dominant state, in which human voice and noise in the classroom are the main sound sources.
The detection flow for the audio signal state in the classroom is as follows:
S51: construct the state decision function E1 from the covariance matrix of the microphone signal sequence and the loudspeaker signal sequence acquired in step S1 and the autocorrelation matrix of the microphone signal sequence;
S52: construct the state decision function value E2 from the signal mean of the microphone signal sequence, the signal mean of the other-noise signals, the signal mean of the loudspeaker signal sequence, and the average update of each tap vector of the current multichannel echo filter relative to the previous multichannel echo filter;
S53: if either of the two decision conditions holds, one of which uses an autocorrelation threshold, the other noise is too strong and the loudspeaker is in the non-dominant state, where the other noise comprises the audio of talkers not using a microphone and ambient noise; otherwise the loudspeaker is in the dominant state.
And in the step S5, filtering the loudspeaker signal by using the adjusted multichannel echo filter, wherein the filtering comprises the following steps:
when the loudspeaker signal is in a dominant state in the classroom, the loudspeaker signal represented in the time domain is input into the adjusted multi-channel echo filter, which filters it to produce the echo-cancelled loudspeaker signal, thereby eliminating the echo of the loudspeaker in the classroom.
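A minimal Python sketch of this gated filtering step follows. It assumes the adjusted multi-channel echo filter can be represented as a single time-domain FIR tap vector, which the text does not state; the function name `cancel_echo` and its arguments are hypothetical.

```python
import numpy as np


def cancel_echo(spk, taps, speaker_dominant):
    """Filter the loudspeaker signal only while the loudspeaker is dominant."""
    spk = np.asarray(spk, dtype=float)
    if not speaker_dominant:
        # Non-dominant state: the signal is passed through unchanged.
        return spk.copy()
    # Dominant state: apply the tap vector as an FIR filter and keep the
    # first len(spk) samples so the output length matches the input.
    return np.convolve(spk, np.asarray(taps, dtype=float))[: len(spk)]
```

For example, `cancel_echo([1, 1, 1], [1, -0.5], True)` subtracts half of the previous sample from each sample, while the same call with `False` returns the input as it is.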
Example 2:
fig. 2 is a functional block diagram of a classroom audio multichannel echo cancellation device according to an embodiment of the present invention, which can implement the classroom audio multichannel echo cancellation method in embodiment 1.
The classroom audio multichannel echo cancellation device 100 of the present invention may be installed in an electronic device. Depending on the implemented functions, the classroom audio multichannel echo cancellation device may include an audio signal processing module 101, an audio state detection module 102, and an echo cancellation device 103. The module of the invention, which may also be referred to as a unit, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.
The audio signal processing module 101 is used for collecting classroom audio signals in real time and preprocessing the collected classroom audio signals;
the audio state detection module 102 is configured to detect a state of an audio signal in a classroom, and when the speaker signal is in a dominant state in the classroom, perform filtering processing on the speaker signal by using the adjusted multi-channel echo filter, so as to eliminate echo of the speaker in the classroom;
the echo cancellation device 103 is configured to take the preprocessed microphone signal and the preprocessed speaker signal as audio signals to be echo cancelled, segment them by means of a cascade of multi-classification filters to obtain a plurality of spectrums of the audio signals to be echo cancelled, construct a multi-channel echo filter, input the segmented spectrums into the multi-channel echo filter, solve the parameters of the multi-channel echo filter, and adjust the multi-channel echo filter in real time according to the solved parameters, so as to obtain the adjusted multi-channel echo filter.
In detail, the modules in the classroom audio multichannel echo cancellation device 100 in the embodiment of the present invention use the same technical means as the classroom audio multichannel echo cancellation method described in fig. 1, and can produce the same technical effects, which are not described herein.
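The parameter-solving role of the echo cancellation device 103 can be pictured with the Python sketch below. The patent's own initialization, output, error, and update formulas are not available in this text, so the sketch swaps in a per-bin normalized LMS (NLMS) update in the frequency domain as an assumed technique. The function name `solve_taps`, the step size, the tolerance, and the stopping test on the mean squared error are all hypothetical.

```python
import numpy as np


def solve_taps(spk_spec, mic_spec, max_order=50, step=0.5, tol=1e-6):
    """Iteratively fit per-bin filter weights, then return time-domain taps.

    Assumed NLMS-style update; not the patent's (unavailable) formulas.
    """
    # Start from all-zero filter parameters.
    w = np.zeros_like(spk_spec, dtype=complex)
    for _ in range(max_order):
        # Output of the current filter for the loudspeaker spectrum.
        y = w * spk_spec
        # Filtering error against the microphone spectrum.
        err = mic_spec - y
        # Stop once the mean squared error is small enough.
        if np.mean(np.abs(err) ** 2) < tol:
            break
        # Normalized LMS update of each frequency bin.
        w = w + step * np.conj(spk_spec) * err / (np.abs(spk_spec) ** 2 + 1e-12)
    # Inverse Fourier transform gives the time-domain tap vector.
    return np.fft.irfft(w)
```

In this sketch each iteration shrinks the per-bin error by roughly the factor `1 - step`, so a simulated echo path built from a short impulse response is recovered within a few dozen iterations.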
Example 3:
fig. 3 is a schematic structural diagram of an electronic device for implementing a classroom audio multichannel echo cancellation method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, a communication interface 13 and a bus, and may further comprise a computer program, such as program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of the program 12, but also for temporarily storing data that has been output or is to be output.
The communication interface 13 may comprise a wired interface and/or a wireless interface (e.g. a Wi-Fi interface, a Bluetooth interface, etc.), typically used to establish a communication connection between the electronic device 1 and other electronic devices and to enable connection communication between internal components of the electronic device.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit of the electronic device: it connects the respective parts of the entire electronic device using various interfaces and lines, runs or executes programs or modules (a program 12 for echo cancellation, etc.) stored in the memory 11, and invokes data stored in the memory 11 to perform various functions of the electronic device 1 and process data.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 3 shows only an electronic device with certain components; it will be understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, which may comprise fewer or more components than shown, may combine certain components, or may use a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described herein.
The electronic device 1 may optionally further comprise a user interface, which may be a display, an input unit such as a keyboard, or a standard wired or wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, as appropriate, and is used for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the described embodiments are for illustrative purposes only, and that the scope of the patent application is not limited to this configuration.
The program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
collecting classroom audio signals in real time, and preprocessing the collected classroom audio signals;
the preprocessed microphone signals and the loudspeaker signals are used as audio signals to be echo eliminated, and the audio signals to be echo eliminated are segmented in a multi-classification filter cascading mode to obtain a plurality of frequency spectrums of the audio signals to be echo eliminated;
constructing a multichannel echo filter, inputting the segmented frequency spectrum into the multichannel echo filter, and solving to obtain parameters of the multichannel echo filter;
according to the solved multi-channel echo filter parameters, the multi-channel echo filter is adjusted in real time, and the adjusted multi-channel echo filter is obtained;
the state of the audio signal in the classroom is detected in real time, and when the speaker signal is in a dominant state in the classroom, the adjusted multi-channel echo filter is utilized to carry out filtering processing on the speaker signal, so that the echo of the speaker in the classroom is eliminated.
Specifically, for the way the processor 10 implements the above instructions, reference may be made to the descriptions of the related steps in the embodiments corresponding to figs. 1 to 3, which are not repeated herein.
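The first two program steps listed above (preprocessing and filter-cascade segmentation) could be sketched in Python as follows. The Hamming window uses the standard coefficients 0.54 and 0.46. The triangular shape and even spacing of the classification filters, as well as the names `triangular_bank`, `summed_log_energy`, and `has_high_energy_echo`, are assumptions made for illustration, because the filter frequency response is not reproduced in this text. Only a single Fourier transform size is shown.

```python
import numpy as np


def hamming_window(length):
    """Standard Hamming window; requires length >= 2."""
    n = np.arange(length)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (length - 1))


def triangular_bank(num_filters, num_bins):
    """Hypothetical bank of M triangular filters with evenly spaced centers."""
    edges = np.linspace(0.0, num_bins - 1, num_filters + 2)
    bins = np.arange(num_bins)
    bank = np.zeros((num_filters, num_bins))
    for m in range(num_filters):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bins - left) / (center - left)
        falling = (right - bins) / (right - center)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def summed_log_energy(signal, bank):
    """Window the signal, take its FFT, and sum the log energies of all filters."""
    windowed = np.asarray(signal, dtype=float) * hamming_window(len(signal))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    # The small constant keeps the logarithm finite for empty filters.
    return float(np.sum(np.log(bank @ power + 1e-12)))


def has_high_energy_echo(mic, spk, bank, energy_threshold):
    """True when the loudspeaker's summed log energy exceeds the microphone's
    summed log energy by more than the energy threshold."""
    difference = summed_log_energy(spk, bank) - summed_log_energy(mic, bank)
    return difference > energy_threshold
```

A bank built with `triangular_bank(8, 129)` matches the 129 bins that `np.fft.rfft` returns for a 256-sample frame; spectra flagged by `has_high_energy_echo` would be the ones passed on to the echo filter.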
It should be noted that, the foregoing reference numerals of the embodiments of the present invention are merely for describing the embodiments, and do not represent the advantages and disadvantages of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.
Claims (6)
1. A classroom audio multichannel echo cancellation method, the method comprising:
s1: collecting classroom audio signals in real time, and preprocessing the collected classroom audio signals, wherein the classroom audio signals comprise microphone signals, loudspeaker signals, echo signals, human voice signals and noise signals;
s2: the preprocessed microphone signals and the loudspeaker signals are used as audio signals to be echo eliminated, and the audio signals to be echo eliminated are segmented in a multi-classification filter cascading mode to obtain a plurality of frequency spectrums of the audio signals to be echo eliminated;
splitting an audio signal to be subjected to echo cancellation in a multi-classification filter cascade manner, wherein the method comprises the following steps:
the method comprises the steps of taking a preprocessed microphone signal and a preprocessed loudspeaker signal as audio signals to be echo eliminated, segmenting the audio signals to be echo eliminated in a multi-classification filter cascading mode to obtain audio sequence numbers to be echo eliminated in different frequency domains, wherein the segmentation flow of the audio signals to be echo eliminated based on the multi-classification filter cascading is as follows:
S21: constructing M classification filters, each with its own center frequency, wherein the frequency response of the m-th classification filter is:
wherein:
the frequency response is that of the m-th classification filter for the audio signal spectrum at a given number of Fourier transform sampling points;
L represents the length of the audio signal sequence;
S22: performing fast Fourier transform processing on the preprocessed microphone signal sequence and the preprocessed loudspeaker signal sequence respectively:
wherein:
the result is the spectrum of the audio signal sequence at the given number of Fourier transform points;
L represents the length of the audio signal sequence;
S23: inputting the spectra into the M classification filters, wherein the logarithmic energy output by each classification filter is:
The logarithmic energies of the M classification filters are summed:
if the threshold condition holds, it indicates that the spectral energy of the loudspeaker signal sequence at that number of Fourier transform sampling points is significantly higher than that of the microphone signal sequence, i.e. that a high-energy echo is present in that spectrum of the loudspeaker signal sequence, and the spectrum is marked as a segmented spectrum to be echo eliminated, wherein the threshold is an energy threshold;
S24: repeating steps S22 to S23 to obtain spectra to be echo eliminated for different numbers of Fourier transform sampling points, the set of spectra to be echo eliminated being:
wherein:
each element represents the v-th group of spectra to be echo eliminated, and N represents the number of groups of spectra to be echo eliminated;
s3: constructing a multichannel echo filter, inputting the segmented frequency spectrum into the multichannel echo filter, and solving to obtain parameters of the multichannel echo filter, wherein the method comprises the following steps of:
constructing a multi-channel echo filter, wherein the multi-channel echo filter comprises N taps, each tap has a tap vector, N represents the number of groups of spectra to be echo eliminated that are input to the multi-channel echo filter, and the tap vectors are the parameters of the multi-channel echo filter;
inputting the N groups of frequency spectrums after segmentation into a constructed multi-channel echo filter, and solving to obtain parameters of the multi-channel echo filter, wherein the solving flow of the parameters of the multi-channel echo filter is as follows:
S31: setting the order of the multichannel echo filter as K and the current order of the multichannel echo filter as k, with the initial value of k being 0, and initializing the parameters of the multichannel echo filter:
Wherein:
S32: inputting the spectrum of the segmented loudspeaker signal sequence into the k-order multichannel echo filter, wherein the output of the k-order multichannel echo filter is:
S33: calculating the filtering error of the k-order multichannel echo filter:
S34: if the stopping condition holds, applying inverse Fourier transform processing to the current filter parameters and taking the result as the solved multi-channel echo filter parameters; otherwise, updating the filtering parameters of the multichannel echo filter for the next order:
wherein:
s4: according to the solved multi-channel echo filter parameters, the multi-channel echo filter is adjusted in real time, and the adjusted multi-channel echo filter is obtained;
s5: the state of the audio signal in the classroom is detected in real time, and when the speaker signal is in a dominant state in the classroom, the adjusted multi-channel echo filter is utilized to carry out filtering processing on the speaker signal, so that the echo of the speaker in the classroom is eliminated.
2. The classroom audio multichannel echo cancellation method according to claim 1, wherein the step S1 of acquiring the classroom audio signal in real time includes:
and acquiring classroom audio signals in real time to obtain a classroom audio signal sequence, wherein the format of the classroom audio signal sequence Q is as follows:
wherein:
each element of Q represents the classroom audio signal at one moment, the time difference between adjacent moments is 0.5 seconds, and the classroom audio signal sequence covers L continuous moments;
the remaining symbols denote the microphone signal at a given moment and the loudspeaker signal acquired at that moment, wherein the acquired loudspeaker signal comprises a clean loudspeaker signal and an echo signal;
3. The classroom audio multichannel echo cancellation method according to claim 1, wherein the preprocessing of the acquired classroom audio signal in step S1 includes:
preprocessing the collected classroom audio signals, wherein the classroom audio signals to be preprocessed comprise microphone signals, loudspeaker signals and other noise audio signals, and the preprocessing flow of the classroom audio signal is as follows:
S11: constructing a Hamming window function:
wherein:
L represents the length of the audio signal to be windowed;
S12: windowing the classroom audio signal using the Hamming window function, wherein the windowing formula is:
wherein:
S13: reconstructing to obtain the preprocessed classroom audio signal sequences of different categories:
wherein:
4. The classroom audio multichannel echo cancellation method according to claim 1, wherein the step S4 of adjusting the multichannel echo filter in real time according to the solved multichannel echo filter parameters comprises:
5. The classroom audio multichannel echo cancellation method as claimed in claim 1, wherein said step S5 of detecting the state of the audio signal in the classroom in real time comprises:
detecting the state of the audio signal in the classroom in real time, wherein the state is either a loudspeaker-dominant state or a loudspeaker-non-dominant state; the dominant state indicates that the loudspeaker is the main sound source in the classroom, and the non-dominant state indicates that human voice and noise in the classroom are the main sound sources;
the detection flow of the audio signal state in the classroom is as follows:
Wherein:
the symbols of the formula denote, in order: the covariance matrix of the microphone signal sequence and the loudspeaker signal sequence acquired in step S1; and the autocorrelation matrix of the microphone signal sequence;
Wherein:
the symbols of the formula denote, in order: the signal mean of the microphone signal sequence; the signal mean of the other noise signals; the signal mean of the loudspeaker signal sequence; and the average update value of each tap vector of the current multi-channel echo filter compared to the previous multi-channel echo filter;
S53: if either of the two threshold conditions holds, it indicates that the other noise is too strong and the speaker is in a non-dominant state, wherein the other noise comprises speaker audio signals without a microphone and ambient noise, and the threshold is an autocorrelation threshold; otherwise, the speaker is in a dominant state.
6. The classroom audio multichannel echo cancellation method as claimed in claim 5, wherein said step S5 of filtering the speaker signal with the adjusted multichannel echo filter comprises:
when the loudspeaker signal is in a dominant state in the classroom, the loudspeaker signal represented in the time domain is input into the adjusted multi-channel echo filter, which filters it to produce the echo-cancelled loudspeaker signal, thereby eliminating the echo of the loudspeaker in the classroom.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211546136.3A CN115696140B (en) | 2022-12-05 | 2022-12-05 | Classroom audio multichannel echo cancellation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115696140A CN115696140A (en) | 2023-02-03 |
CN115696140B true CN115696140B (en) | 2023-05-26 |
Family
ID=85055130
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211546136.3A Active CN115696140B (en) | 2022-12-05 | 2022-12-05 | Classroom audio multichannel echo cancellation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115696140B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106210368A (en) * | 2016-06-20 | 2016-12-07 | 百度在线网络技术(北京)有限公司 | The method and apparatus eliminating multiple channel acousto echo |
CN107105366A (en) * | 2017-06-15 | 2017-08-29 | 歌尔股份有限公司 | A kind of multi-channel echo eliminates circuit, method and smart machine |
CN107564539A (en) * | 2017-08-29 | 2018-01-09 | 苏州奇梦者网络科技有限公司 | Towards the acoustic echo removing method and device of microphone array |
CN108630219A (en) * | 2018-05-08 | 2018-10-09 | 北京小鱼在家科技有限公司 | A kind of audio frequency processing system, method, apparatus, equipment and storage medium |
CN108630217A (en) * | 2017-03-21 | 2018-10-09 | 豪威科技股份有限公司 | The echo cancelling system and method for residual echo with reduction |
CN111031448A (en) * | 2019-11-12 | 2020-04-17 | 西安讯飞超脑信息科技有限公司 | Echo cancellation method, echo cancellation device, electronic equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10446165B2 (en) * | 2017-09-27 | 2019-10-15 | Sonos, Inc. | Robust short-time fourier transform acoustic echo cancellation during audio playback |
CN111755020B (en) * | 2020-08-07 | 2023-02-28 | 南京时保联信息科技有限公司 | Stereo echo cancellation method |
2022-12-05: CN application CN202211546136.3A, patent CN115696140B (en), status Active
Also Published As
Publication number | Publication date |
---|---|
CN115696140A (en) | 2023-02-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||