CN115696140B - Classroom audio multichannel echo cancellation method - Google Patents


Info

Publication number
CN115696140B
Authority
CN
China
Prior art keywords
echo
classroom
filter
signals
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211546136.3A
Other languages
Chinese (zh)
Other versions
CN115696140A (en)
Inventor
刘建洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Dongmak Information Technology Co ltd
Original Assignee
Changsha Dongmak Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Dongmak Information Technology Co ltd filed Critical Changsha Dongmak Information Technology Co ltd
Priority to CN202211546136.3A
Publication of CN115696140A
Application granted
Publication of CN115696140B

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 - Reducing energy consumption in communication networks
    • Y02D 30/70 - Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

The invention relates to the technical field of echo cancellation and discloses a classroom audio multichannel echo cancellation method comprising the following steps: the preprocessed microphone signals and loudspeaker signals are taken as the audio signals to be echo-cancelled, and the audio signals to be echo-cancelled are segmented by a cascade of classification filters; a multichannel echo filter is constructed and the segmented spectra are input into it; the multichannel echo filter is adjusted in real time according to the solved multichannel echo filter parameters; and when the loudspeaker signal is in the dominant state in the classroom, the adjusted multichannel echo filter is used to filter the loudspeaker signal, so that the echo of the loudspeaker in the classroom is cancelled intelligently. The invention extracts, over channels with different numbers of frequency-domain sampling points, the spectra in which echo is present, and constructs a corresponding multichannel echo filter for classroom echo cancellation based on the extracted spectra.

Description

Classroom audio multichannel echo cancellation method
Technical Field
The invention relates to the technical field of echo cancellation, in particular to a classroom audio multichannel echo cancellation method.
Background
In a classroom audio/video system, multiple loudspeakers and microphones are deployed in a room to obtain better listening quality. The echo generated when multiple loudspeaker and microphone devices are used is referred to as multichannel echo. As requirements on communication quality increase, multichannel echo cancellation receives more and more attention. At the same time, increasingly complex acoustic environments give the echo path pronounced nonlinear characteristics. Existing methods cannot effectively suppress nonlinear echo; to address this problem, this patent provides a classroom audio multichannel echo cancellation method that realises echo cancellation for a classroom audio system.
Disclosure of Invention
In view of this, the present invention provides a classroom audio multichannel echo cancellation method with the following aims: a cascade of classification filters with different centre frequencies is used to segment the audio signal spectra obtained with different numbers of Fourier transform sampling points, and the spectra in which the energy difference between the microphone signal and the loudspeaker signal is large are selected as the spectra of the audio signal to be echo-cancelled, thereby extracting, over channels with different numbers of frequency-domain sampling points, the spectra in which echo is present; a corresponding multichannel echo filter is constructed for the extracted spectra and its parameters are rapidly updated using the extracted spectra; and, according to the detection result, when the loudspeaker is the main sound source in the classroom, the loudspeaker signal is filtered by the constructed multichannel echo filter, so that the echo of the loudspeaker in the classroom is cancelled intelligently.
The invention provides a classroom audio multichannel echo cancellation method, which comprises the following steps:
s1: collecting classroom audio signals in real time, and preprocessing the collected classroom audio signals, wherein the classroom audio signals comprise microphone signals, loudspeaker signals, echo signals, human voice signals and noise signals;
s2: the preprocessed microphone signals and the loudspeaker signals are used as audio signals to be echo eliminated, and the audio signals to be echo eliminated are segmented in a multi-classification filter cascading mode to obtain a plurality of frequency spectrums of the audio signals to be echo eliminated;
s3: constructing a multichannel echo filter, inputting the segmented frequency spectrum into the multichannel echo filter, and solving to obtain parameters of the multichannel echo filter;
s4: according to the solved multi-channel echo filter parameters, the multi-channel echo filter is adjusted in real time, and the adjusted multi-channel echo filter is obtained;
s5: the state of the audio signal in the classroom is detected in real time, and when the speaker signal is in a dominant state in the classroom, the adjusted multi-channel echo filter is utilized to carry out filtering processing on the speaker signal, so that the echo of the speaker in the classroom is eliminated.
As a further improvement of the present invention:
optionally, the step S1 of collecting the classroom audio signal in real time includes:
Classroom audio signals are collected in real time to obtain a classroom audio signal sequence, wherein the classroom audio signals comprise microphone signals, loudspeaker signals, echo signals, human voice signals and noise signals; the microphone signal represents the speaker's audio as received by a microphone, the loudspeaker signal represents the audio emitted by a loudspeaker, the echo signal represents the signal produced by the loudspeaker along the echo path, the human voice signal represents the speaker's audio without a microphone, and the noise signal represents environmental noise.
The classroom audio signal sequence Q has the following format:
Q = {q(t_1), q(t_2), ..., q(t_L)}
wherein:
q(t_i) represents the classroom audio signal at time t_i, adjacent times differ by 0.5 seconds, and the classroom audio signal sequence covers L consecutive times;
x(t_i) represents the microphone signal at time t_i;
y(t_i) represents the loudspeaker signal collected at time t_i, which comprises the clean loudspeaker signal y_0(t_i) and the echo signal y_e(t_i);
in the embodiment of the invention, the microphone signal and the loudspeaker signal are each the total signal of several groups of cascaded microphones or loudspeakers in the classroom, and the loudspeaker signal is acquired by a sound sensor placed near the loudspeaker;
o(t_i) represents the other noise audio signal at time t_i, which comprises the human voice signal v(t_i) and the noise signal n(t_i);
in the embodiment of the invention, a sound sensor is arranged near the window-side wall of the classroom, and the other noise audio signal is obtained by subtracting the loudspeaker signal from the audio signal acquired by this sensor.
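A minimal sketch of the signal composition described above is given below. The additive combination and all symbol names are assumptions made for illustration, since the original formulas are only published as images.

```python
import numpy as np

def classroom_frame(mic: np.ndarray, spk_clean: np.ndarray, spk_echo: np.ndarray,
                    voice: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Compose one classroom audio sample q(t_i) from its constituent signals."""
    speaker = spk_clean + spk_echo   # collected loudspeaker signal = clean playback + echo
    other = voice + noise            # other noise audio = human voice + environmental noise
    return mic + speaker + other     # classroom audio signal at time t_i

# Example: a sequence of L = 4 frames sampled 0.5 s apart, as in the patent's format.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [classroom_frame(*(rng.standard_normal(8000) for _ in range(5))) for _ in range(4)]
    print(len(frames), frames[0].shape)
```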
Optionally, the preprocessing the collected classroom audio signal in step S1 includes:
The collected classroom audio signals are preprocessed, wherein the classroom audio signals to be preprocessed comprise the microphone signals, the loudspeaker signals and the other noise audio signals, denoted s_j(t_i), j = 1, 2, 3, with s_1(t_i) = x(t_i), s_2(t_i) = y(t_i) and s_3(t_i) = o(t_i);
the preprocessing flow of the classroom audio signals is as follows:
S11: construct a Hamming window function w(l);
wherein:
w(l) is the window function, the window function coefficient is set to 0.43, and L represents the length of the audio signal to be windowed;
S12: window the classroom audio signal with the Hamming window function;
wherein:
the windowed result of the classroom audio signal s_j(t_i) at time t_i is denoted s'_j(t_i);
S13: reconstruct the preprocessed classroom audio signal sequences of the different categories:
Q_j = {s'_j(t_1), s'_j(t_2), ..., s'_j(t_L)}, j = 1, 2, 3
wherein:
Q_1 represents the preprocessed microphone signal sequence in the classroom, Q_2 represents the preprocessed loudspeaker signal sequence in the classroom, and Q_3 represents the preprocessed other-noise audio signal sequence in the classroom.
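A minimal sketch of the preprocessing flow follows. The patent only states that a Hamming-type window with coefficient 0.43 is used; the closed form below is the standard generalized-Hamming expression and is an assumption.

```python
import numpy as np

def hamming_window(length: int, alpha: float = 0.43) -> np.ndarray:
    """Generalized Hamming window; the patent fixes the coefficient at 0.43.
    The closed form alpha - (1 - alpha) * cos(...) is assumed here."""
    l = np.arange(length)
    return alpha - (1.0 - alpha) * np.cos(2.0 * np.pi * l / (length - 1))

def preprocess(signals: dict) -> dict:
    """Window each category of classroom audio (microphone, loudspeaker, other noise)."""
    out = {}
    for name, seq in signals.items():
        w = hamming_window(len(seq))
        out[name] = seq * w          # element-wise windowing of the time sequence
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = {
        "microphone": rng.standard_normal(16000),
        "loudspeaker": rng.standard_normal(16000),
        "other_noise": rng.standard_normal(16000),
    }
    windowed = preprocess(raw)
    print({k: v.shape for k, v in windowed.items()})
```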
Optionally, in the step S2, the audio signal to be echo cancelled is sliced by adopting a multi-classification filter cascade method, including:
The preprocessed microphone signal and loudspeaker signal are taken as the audio signals to be echo-cancelled, and the audio signals to be echo-cancelled are segmented by a cascade of classification filters to obtain audio spectra to be echo-cancelled in different frequency domains; the segmentation flow of the audio signals to be echo-cancelled based on the cascade of classification filters is as follows:
S21: construct M classification filters, the m-th classification filter having centre frequency f_m and frequency response B_m(n);
wherein:
B_m(n) represents the frequency response of the m-th classification filter for the spectrum of an audio signal obtained with n Fourier transform sampling points;
L represents the length of the audio signal sequence;
S22: perform fast Fourier transform processing on the preprocessed microphone signal sequence and loudspeaker signal sequence respectively;
wherein:
F_j(n) represents the spectrum of the audio signal sequence Q_j under n Fourier transform sampling points, where j = 1 is the microphone signal sequence and j = 2 is the loudspeaker signal sequence;
S23: input F_j(n) into the M classification filters, compute the logarithmic energy g_m(F_j(n)) output by each classification filter, and sum the logarithmic energies of the M classification filters:
G(F_j(n)) = g_1(F_j(n)) + g_2(F_j(n)) + ... + g_M(F_j(n))
If G(F_2(n)) exceeds G(F_1(n)) by more than the energy threshold, the spectral energy of the loudspeaker signal sequence at Fourier transform sampling point number n is significantly higher than that of the microphone signal sequence, indicating that high-energy echo is present in the spectrum of the loudspeaker signal sequence at Fourier transform sampling point number n; the spectrum F_2(n) is then marked as a spectrum to be echo-cancelled after segmentation, wherein ε_1 represents the energy threshold;
S24: repeat steps S22-S23 with different numbers of Fourier transform sampling points to obtain the spectra to be echo-cancelled, which are collected as:
{U_1, U_2, ..., U_N}
wherein:
U_v represents the v-th group of spectra to be echo-cancelled, and N represents the number of groups of spectra to be echo-cancelled.
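The segmentation step can be sketched as below. The triangular filter shape, the candidate FFT sizes and the threshold value are illustrative assumptions, since the patent's frequency-response and energy formulas are only published as images.

```python
import numpy as np

def filterbank(num_filters: int, num_bins: int) -> np.ndarray:
    """M band-pass 'classification' filters with evenly spaced centre frequencies.
    A triangular response is assumed."""
    centres = np.linspace(0, num_bins - 1, num_filters + 2)
    bank = np.zeros((num_filters, num_bins))
    k = np.arange(num_bins)
    for m in range(num_filters):
        left, centre, right = centres[m], centres[m + 1], centres[m + 2]
        rise = np.clip((k - left) / max(centre - left, 1e-9), 0.0, 1.0)
        fall = np.clip((right - k) / max(right - centre, 1e-9), 0.0, 1.0)
        bank[m] = np.minimum(rise, fall)
    return bank

def slice_echo_spectra(mic, spk, fft_sizes=(256, 512, 1024, 2048),
                       num_filters=24, energy_threshold=5.0):
    """Keep the loudspeaker spectra whose summed filter log-energy exceeds the
    microphone's by more than the threshold (steps S22-S24)."""
    selected = []
    for n in fft_sizes:
        mic_spec = np.abs(np.fft.rfft(mic, n))
        spk_spec = np.abs(np.fft.rfft(spk, n))
        bank = filterbank(num_filters, mic_spec.size)
        e_mic = np.log(bank @ (mic_spec ** 2) + 1e-12).sum()   # summed log energy G(F_1(n))
        e_spk = np.log(bank @ (spk_spec ** 2) + 1e-12).sum()   # summed log energy G(F_2(n))
        if e_spk - e_mic > energy_threshold:    # high-energy echo present at this FFT size
            selected.append((n, spk_spec))
    return selected
```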
Optionally, in the step S3, the segmented spectrum is input into the constructed multi-channel echo filter, and parameters of the multi-channel echo filter are obtained by solving, including:
A multichannel echo filter is constructed, wherein the multichannel echo filter comprises N taps, each tap has a tap vector, N represents the number of groups of spectra to be echo-cancelled that are input to the multichannel echo filter, and the tap vectors are the parameters of the multichannel echo filter;
the N groups of segmented spectra are input into the constructed multichannel echo filter and the parameters of the multichannel echo filter are obtained by solving; the solving flow of the multichannel echo filter parameters is as follows:
S31: set the order of the multichannel echo filter to K, let the current order of the multichannel echo filter be k with initial value 0, and initialise the multichannel echo filter parameters:
H(0) = [h_1(0), h_2(0), ..., h_N(0)]^T
wherein:
T represents transposition;
h_n(0) represents the spectral representation of the n-th tap vector;
S32: input the spectra of the segmented loudspeaker signal sequence into the k-order multichannel echo filter, obtaining the output y(k) of the k-order multichannel echo filter;
S33: calculate the filtering error e(k) of the k-order multichannel echo filter;
S34: if k = L, process H(k) with an inverse Fourier transform and take the result as the solved multichannel echo filter parameters H*; otherwise update the filtering parameters of the (k+1)-order multichannel echo filter using the sign function sign(·) and the filtering error e(k), scaled by the update step size b, which is set to 0.1;
then let k = k + 1 and return to step S32.
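A sketch of the parameter-solving loop is given below. A sign-LMS style update with the 0.1 step size quoted above is assumed, and the desired signal is taken to be the corresponding microphone spectrum; the exact output, error and update formulas are only published as images in the original.

```python
import numpy as np

def solve_filter_taps(spk_spectra, mic_spectra, num_iters=64, step=0.1):
    """Iteratively update the N tap vectors of the multichannel echo filter and
    return their time-domain form (inverse FFT), as in step S34.

    spk_spectra / mic_spectra: lists of N magnitude spectra (one pair per tap),
    e.g. the groups selected by the segmentation step.
    """
    taps = []
    for x, d in zip(spk_spectra, mic_spectra):
        h = np.zeros_like(x)                  # tap vector, initialised to zero (H(0))
        for _ in range(num_iters):
            e = d - h * x                     # filtering error of the current-order filter
            h = h + step * e * np.sign(x)     # sign-function update with step size b = 0.1
            # NOTE: for large spectral magnitudes the step may need to be scaled
            # down to keep this update stable.
        taps.append(np.fft.irfft(h))          # back to the time domain
    return taps
```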
Optionally, in the step S4, the adjusting the multichannel echo filter in real time according to the solved parameter of the multichannel echo filter includes:
The parameters of the current multichannel echo filter are adjusted in real time according to the solved multichannel echo filter parameters H*, and the adjusted multichannel echo filter is obtained.
Optionally, the detecting, in real time, the state of the audio signal in the classroom in step S5 includes:
The state of the audio signal in the classroom is detected in real time, wherein the state of the audio signal in the classroom is either the loudspeaker-dominant state or the loudspeaker-non-dominant state; the loudspeaker-dominant state indicates that the loudspeaker is the main sound source in the classroom, and the loudspeaker-non-dominant state indicates that human voice and noise in the classroom are the main sound sources;
the detection flow of the audio signal state in the classroom is as follows:
S51: construct the state decision function E_1;
wherein:
E_1 is computed from the covariance matrix of the microphone signal sequence and the loudspeaker signal sequence acquired in step S1, the autocorrelation matrix of the microphone signal sequence, and the standard deviation of the microphone signal sequence, and E_1 represents a state decision function value;
S52: construct the state decision function E_2;
wherein:
E_2 is computed from the signal mean of the microphone signal sequence, the signal mean of the other noise signals, the signal mean of the loudspeaker signal sequence, and the average update value of each tap vector of the current multichannel echo filter compared with the previous multichannel echo filter;
S53: if the decision condition on E_1 or the decision condition on E_2 is met, where the conditions compare the state decision function values with the autocorrelation threshold ε_2, the other noise, which comprises the speaker's audio without a microphone and the environmental noise, is too strong and the loudspeaker is in the non-dominant state; otherwise the loudspeaker is in the dominant state.
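A hedged sketch of this state decision follows. Because the formulas for E_1 and E_2 are only published as images, E_1 is approximated here by the normalised cross-correlation between the microphone and loudspeaker sequences, E_2 by a noise-to-loudspeaker level ratio penalised by the average tap update, and both thresholds are illustrative.

```python
import numpy as np

def speaker_dominant(mic, spk, other, tap_update,
                     corr_threshold=0.6, noise_ratio_threshold=1.0) -> bool:
    """Decide whether the classroom loudspeaker is the dominant sound source."""
    # E1: normalised cross-correlation between the microphone and loudspeaker sequences.
    e1 = np.cov(mic, spk)[0, 1] / (np.std(mic) * np.std(spk) + 1e-12)
    # E2: level ratio of other noise to the loudspeaker, penalised by the average tap update.
    e2 = (np.mean(np.abs(other)) / (np.mean(np.abs(spk)) + 1e-12)) * (1.0 + tap_update)
    # Low correlation or a high noise ratio means the other noise is too strong:
    # the loudspeaker is then in the non-dominant state.
    return not (e1 < corr_threshold or e2 > noise_ratio_threshold)
```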
Optionally, in the step S5, filtering the speaker signal with the adjusted multichannel echo filter includes:
When the loudspeaker signal is in the dominant state in the classroom, the loudspeaker signal represented in the time domain is input into the adjusted multichannel echo filter, the echo-cancelled loudspeaker signal is obtained, and the echo of the loudspeaker in the classroom is thereby eliminated.
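A sketch of applying the adjusted filter when the loudspeaker is dominant is shown below. Subtracting the summed echo estimate from the microphone channel is the conventional acoustic echo cancellation arrangement and is an assumption here; the patent only states that the loudspeaker signal is passed through the adjusted multichannel filter.

```python
import numpy as np

def cancel_echo(mic, spk, taps):
    """Filter the time-domain loudspeaker signal with the adjusted tap vectors and
    subtract the summed echo estimate from the microphone channel.
    Assumes spk is at least as long as mic."""
    echo_estimate = np.zeros(mic.shape, dtype=float)
    for h in taps:                                          # one time-domain tap vector per channel
        echo_estimate += np.convolve(spk, h)[: mic.size]    # this channel's echo contribution
    return mic - echo_estimate                              # echo-cancelled output
```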
Compared with the prior art, the invention provides a classroom audio multichannel echo cancellation method, which has the following advantages:
Firstly, the scheme provides an audio segmentation method: the preprocessed microphone signal and loudspeaker signal are taken as the audio signals to be echo-cancelled, and the audio signals to be echo-cancelled are segmented by a cascade of classification filters to obtain audio spectra to be echo-cancelled in different frequency domains. The segmentation flow based on the cascade of classification filters is that of steps S21 to S24: M classification filters with different centre frequencies f_m and frequency responses B_m(n) are constructed; the preprocessed microphone signal sequence and loudspeaker signal sequence are each transformed by a fast Fourier transform to obtain the spectra F_1(n) and F_2(n) under n Fourier transform sampling points; these spectra are input into the M classification filters and the logarithmic energies output by the filters are summed to give G(F_1(n)) and G(F_2(n)); if G(F_2(n)) exceeds G(F_1(n)) by more than the energy threshold ε_1, the spectral energy of the loudspeaker signal sequence at Fourier transform sampling point number n is significantly higher than that of the microphone signal sequence, high-energy echo is present in the loudspeaker spectrum at that number of sampling points, and F_2(n) is marked as a spectrum to be echo-cancelled; repeating this for different numbers of Fourier transform sampling points yields the spectra to be echo-cancelled {U_1, U_2, ..., U_N}, where U_v is the v-th group and N is the number of groups. The method thus segments the audio signal spectra obtained with different numbers of Fourier transform sampling points by constructing classification filters with different centre frequencies, selects the spectra in which the energy difference between the microphone signal and the loudspeaker signal is large as the spectra of the audio signal to be echo-cancelled, and extracts, over channels with different numbers of frequency-domain sampling points, the spectra in which echo is present.
Meanwhile, the scheme provides an intelligent echo cancellation method: the state of the audio signal in the classroom is detected in real time, the state being either loudspeaker-dominant (the loudspeaker is the main sound source in the classroom) or loudspeaker-non-dominant (human voice and noise in the classroom are the main sound sources). The detection flow constructs the state decision functions E_1 and E_2 of steps S51 to S53: E_1 is computed from the covariance matrix of the acquired microphone and loudspeaker signal sequences, the autocorrelation matrix of the microphone signal sequence and the standard deviation of the microphone signal sequence; E_2 is computed from the signal means of the microphone signal sequence, the other noise signals and the loudspeaker signal sequence together with the average update value of each tap vector of the current multichannel echo filter compared with the previous one; if either decision condition is met, where the conditions compare the state decision function values with the autocorrelation threshold ε_2, the other noise, comprising the speaker's audio without a microphone and the environmental noise, is too strong and the loudspeaker is in the non-dominant state, otherwise the loudspeaker is in the dominant state. For the extracted spectra of the channels with different numbers of frequency-domain sampling points, the scheme constructs a corresponding multichannel echo filter and rapidly updates its parameters using the extracted spectra; according to the detection result, when the loudspeaker in the classroom is the main sound source, the loudspeaker signal is filtered by the constructed multichannel echo filter, so that the echo of the loudspeaker in the classroom is cancelled intelligently.
Drawings
Fig. 1 is a schematic flow chart of a classroom audio multichannel echo cancellation method according to an embodiment of the present invention;
fig. 2 is a functional block diagram of a classroom audio multichannel echo cancellation device according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device for implementing a classroom audio multichannel echo cancellation method according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the application provides a classroom audio multichannel echo cancellation method. The execution subject of the classroom audio multichannel echo cancellation method includes, but is not limited to, at least one of a server, a terminal, and the like, which can be configured to execute the method provided by the embodiments of the present application. In other words, the classroom audio multi-channel echo cancellation method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The service end includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Example 1:
s1: classroom audio signals are collected in real time and preprocessed, wherein the classroom audio signals include microphone signals, speaker signals, echo signals, human voice signals, and noise signals.
And in the step S1, classroom audio signals are collected in real time, and the method comprises the following steps:
Classroom audio signals are collected in real time to obtain a classroom audio signal sequence, wherein the classroom audio signals comprise microphone signals, loudspeaker signals, echo signals, human voice signals and noise signals; the microphone signal represents the speaker's audio as received by a microphone, the loudspeaker signal represents the audio emitted by a loudspeaker, the echo signal represents the signal produced by the loudspeaker along the echo path, the human voice signal represents the speaker's audio without a microphone, and the noise signal represents environmental noise.
The classroom audio signal sequence Q has the following format:
Q = {q(t_1), q(t_2), ..., q(t_L)}
wherein:
q(t_i) represents the classroom audio signal at time t_i, adjacent times differ by 0.5 seconds, and the classroom audio signal sequence covers L consecutive times;
x(t_i) represents the microphone signal at time t_i;
y(t_i) represents the loudspeaker signal collected at time t_i, which comprises the clean loudspeaker signal y_0(t_i) and the echo signal y_e(t_i);
o(t_i) represents the other noise audio signal at time t_i, which comprises the human voice signal v(t_i) and the noise signal n(t_i).
And in the step S1, preprocessing the acquired classroom audio signals, wherein the preprocessing comprises the following steps:
The collected classroom audio signals are preprocessed, wherein the classroom audio signals to be preprocessed comprise the microphone signals, the loudspeaker signals and the other noise audio signals, denoted s_j(t_i), j = 1, 2, 3, with s_1(t_i) = x(t_i), s_2(t_i) = y(t_i) and s_3(t_i) = o(t_i);
the preprocessing flow of the classroom audio signals is as follows:
S11: construct a Hamming window function w(l);
wherein:
w(l) is the window function, the window function coefficient is set to 0.43, and L represents the length of the audio signal to be windowed;
S12: window the classroom audio signal with the Hamming window function;
wherein:
the windowed result of the classroom audio signal s_j(t_i) at time t_i is denoted s'_j(t_i);
S13: reconstruct the preprocessed classroom audio signal sequences of the different categories:
Q_j = {s'_j(t_1), s'_j(t_2), ..., s'_j(t_L)}, j = 1, 2, 3
wherein:
Q_1 represents the preprocessed microphone signal sequence in the classroom, Q_2 represents the preprocessed loudspeaker signal sequence in the classroom, and Q_3 represents the preprocessed other-noise audio signal sequence in the classroom.
S2: and taking the preprocessed microphone signal and the preprocessed loudspeaker signal as audio signals to be echo eliminated, and segmenting the audio signals to be echo eliminated in a multi-classification filter cascading mode to obtain a plurality of frequency spectrums of the audio signals to be echo eliminated.
In the step S2, the audio signal to be echo cancelled is split in a multi-classification filter cascade manner, which includes:
The preprocessed microphone signal and loudspeaker signal are taken as the audio signals to be echo-cancelled, and the audio signals to be echo-cancelled are segmented by a cascade of classification filters to obtain audio spectra to be echo-cancelled in different frequency domains; the segmentation flow of the audio signals to be echo-cancelled based on the cascade of classification filters is as follows:
S21: construct M classification filters, the m-th classification filter having centre frequency f_m and frequency response B_m(n);
wherein:
B_m(n) represents the frequency response of the m-th classification filter for the spectrum of an audio signal obtained with n Fourier transform sampling points;
L represents the length of the audio signal sequence;
S22: perform fast Fourier transform processing on the preprocessed microphone signal sequence and loudspeaker signal sequence respectively;
wherein:
F_j(n) represents the spectrum of the audio signal sequence Q_j under n Fourier transform sampling points, where j = 1 is the microphone signal sequence and j = 2 is the loudspeaker signal sequence;
S23: input F_j(n) into the M classification filters, compute the logarithmic energy g_m(F_j(n)) output by each classification filter, and sum the logarithmic energies of the M classification filters:
G(F_j(n)) = g_1(F_j(n)) + g_2(F_j(n)) + ... + g_M(F_j(n))
If G(F_2(n)) exceeds G(F_1(n)) by more than the energy threshold, the spectral energy of the loudspeaker signal sequence at Fourier transform sampling point number n is significantly higher than that of the microphone signal sequence, indicating that high-energy echo is present in the spectrum of the loudspeaker signal sequence at Fourier transform sampling point number n; the spectrum F_2(n) is then marked as a spectrum to be echo-cancelled after segmentation, wherein ε_1 represents the energy threshold;
S24: repeat steps S22-S23 with different numbers of Fourier transform sampling points to obtain the spectra to be echo-cancelled, which are collected as:
{U_1, U_2, ..., U_N}
wherein:
U_v represents the v-th group of spectra to be echo-cancelled, and N represents the number of groups of spectra to be echo-cancelled.
S3: constructing a multichannel echo filter, inputting the segmented frequency spectrum into the multichannel echo filter, and solving to obtain parameters of the multichannel echo filter.
In the step S3, the segmented spectrum is input into the constructed multi-channel echo filter, and parameters of the multi-channel echo filter are obtained by solving, including:
A multichannel echo filter is constructed, wherein the multichannel echo filter comprises N taps, each tap has a tap vector, N represents the number of groups of spectra to be echo-cancelled that are input to the multichannel echo filter, and the tap vectors are the parameters of the multichannel echo filter;
the N groups of segmented spectra are input into the constructed multichannel echo filter and the parameters of the multichannel echo filter are obtained by solving; the solving flow of the multichannel echo filter parameters is as follows:
S31: set the order of the multichannel echo filter to K, let the current order of the multichannel echo filter be k with initial value 0, and initialise the multichannel echo filter parameters:
H(0) = [h_1(0), h_2(0), ..., h_N(0)]^T
wherein:
T represents transposition;
h_n(0) represents the spectral representation of the n-th tap vector;
S32: input the spectra of the segmented loudspeaker signal sequence into the k-order multichannel echo filter, obtaining the output y(k) of the k-order multichannel echo filter;
S33: calculate the filtering error e(k) of the k-order multichannel echo filter;
S34: if k = L, process H(k) with an inverse Fourier transform and take the result as the solved multichannel echo filter parameters H*; otherwise update the filtering parameters of the (k+1)-order multichannel echo filter using the sign function sign(·) and the filtering error e(k), scaled by the update step size b, which is set to 0.1;
then let k = k + 1 and return to step S32.
S4: and adjusting the multichannel echo filter in real time according to the solved multichannel echo filter parameters to obtain an adjusted multichannel echo filter.
And in the step S4, the multichannel echo filter is adjusted in real time according to the solved multichannel echo filter parameters, and the method comprises the following steps:
The parameters of the current multichannel echo filter are adjusted in real time according to the solved multichannel echo filter parameters H*, and the adjusted multichannel echo filter is obtained.
S5: the state of the audio signal in the classroom is detected in real time, and when the speaker signal is in a dominant state in the classroom, the adjusted multi-channel echo filter is utilized to carry out filtering processing on the speaker signal, so that the echo of the speaker in the classroom is eliminated.
And in the step S5, detecting the state of the audio signal in the classroom in real time, wherein the method comprises the following steps:
The state of the audio signal in the classroom is detected in real time, wherein the state of the audio signal in the classroom is either the loudspeaker-dominant state or the loudspeaker-non-dominant state; the loudspeaker-dominant state indicates that the loudspeaker is the main sound source in the classroom, and the loudspeaker-non-dominant state indicates that human voice and noise in the classroom are the main sound sources;
the detection flow of the audio signal state in the classroom is as follows:
S51: construct the state decision function E_1;
wherein:
E_1 is computed from the covariance matrix of the microphone signal sequence and the loudspeaker signal sequence acquired in step S1, the autocorrelation matrix of the microphone signal sequence, and the standard deviation of the microphone signal sequence, and E_1 represents a state decision function value;
S52: construct the state decision function E_2;
wherein:
E_2 is computed from the signal mean of the microphone signal sequence, the signal mean of the other noise signals, the signal mean of the loudspeaker signal sequence, and the average update value of each tap vector of the current multichannel echo filter compared with the previous multichannel echo filter;
S53: if the decision condition on E_1 or the decision condition on E_2 is met, where the conditions compare the state decision function values with the autocorrelation threshold ε_2, the other noise, which comprises the speaker's audio without a microphone and the environmental noise, is too strong and the loudspeaker is in the non-dominant state; otherwise the loudspeaker is in the dominant state.
And in the step S5, filtering the loudspeaker signal by using the adjusted multichannel echo filter, wherein the filtering comprises the following steps:
When the loudspeaker signal is in the dominant state in the classroom, the loudspeaker signal represented in the time domain is input into the adjusted multichannel echo filter, the echo-cancelled loudspeaker signal is obtained, and the echo of the loudspeaker in the classroom is thereby eliminated.
Example 2:
fig. 2 is a functional block diagram of a classroom audio multichannel echo cancellation device according to an embodiment of the present invention, which can implement the classroom audio multichannel echo cancellation method in embodiment 1.
The classroom audio multichannel echo cancellation device 100 of the present invention may be installed in an electronic device. Depending on the implemented functions, the classroom audio multichannel echo cancellation device may include an audio signal processing module 101, an audio state detection module 102, and an echo cancellation device 103. The module of the invention, which may also be referred to as a unit, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.
The audio signal processing module 101 is used for collecting classroom audio signals in real time and preprocessing the collected classroom audio signals;
the audio state detection module 102 is configured to detect a state of an audio signal in a classroom, and when the speaker signal is in a dominant state in the classroom, perform filtering processing on the speaker signal by using the adjusted multi-channel echo filter, so as to eliminate echo of the speaker in the classroom;
the echo cancellation device 103 is configured to segment the microphone signal and the speaker signal after being preprocessed as audio signals to be echo cancelled by adopting a multi-classification filter cascade connection manner, obtain a plurality of spectrums of the audio signals to be echo cancelled, construct a multi-channel echo filter, input the segmented spectrums into the multi-channel echo filter, solve parameters of the multi-channel echo filter, and adjust the multi-channel echo filter in real time according to the solved parameters of the multi-channel echo filter, so as to obtain the adjusted multi-channel echo filter.
In detail, the modules in the classroom audio multichannel echo cancellation device 100 in the embodiment of the present invention use the same technical means as the classroom audio multichannel echo cancellation method described in fig. 1, and can produce the same technical effects, which are not described herein.
Example 3:
fig. 3 is a schematic structural diagram of an electronic device for implementing a classroom audio multichannel echo cancellation method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, a communication interface 13 and a bus, and may further comprise a computer program, such as program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of the program 12, but also for temporarily storing data that has been output or is to be output.
The communication interface 13 may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used to establish a communication connection between the electronic device 1 and other electronic devices and to enable connection communication between internal components of the electronic device.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects respective parts of the entire electronic device using various interfaces and lines, executes or executes programs or modules (a program 12 for echo cancellation, etc.) stored in the memory 11, and invokes data stored in the memory 11 to perform various functions of the electronic device 1 and process data.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 3 shows only an electronic device with some of its components; those skilled in the art will understand that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, which may comprise fewer or more components than shown, combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
The electronic device 1 may optionally further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and are not limited to this configuration in the scope of the patent application.
The program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
collecting classroom audio signals in real time, and preprocessing the collected classroom audio signals;
the preprocessed microphone signals and the loudspeaker signals are used as audio signals to be echo eliminated, and the audio signals to be echo eliminated are segmented in a multi-classification filter cascading mode to obtain a plurality of frequency spectrums of the audio signals to be echo eliminated;
constructing a multichannel echo filter, inputting the segmented frequency spectrum into the multichannel echo filter, and solving to obtain parameters of the multichannel echo filter;
according to the solved multi-channel echo filter parameters, the multi-channel echo filter is adjusted in real time, and the adjusted multi-channel echo filter is obtained;
the state of the audio signal in the classroom is detected in real time, and when the speaker signal is in a dominant state in the classroom, the adjusted multi-channel echo filter is utilized to carry out filtering processing on the speaker signal, so that the echo of the speaker in the classroom is eliminated.
Specifically, the specific implementation method of the above instructions by the processor 10 may refer to descriptions of related steps in the corresponding embodiments of fig. 1 to 3, which are not repeated herein.
It should be noted that, the foregoing reference numerals of the embodiments of the present invention are merely for describing the embodiments, and do not represent the advantages and disadvantages of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (6)

1. A classroom audio multichannel echo cancellation method, the method comprising:
s1: collecting classroom audio signals in real time, and preprocessing the collected classroom audio signals, wherein the classroom audio signals comprise microphone signals, loudspeaker signals, echo signals, human voice signals and noise signals;
s2: the preprocessed microphone signals and the loudspeaker signals are used as audio signals to be echo eliminated, and the audio signals to be echo eliminated are segmented in a multi-classification filter cascading mode to obtain a plurality of frequency spectrums of the audio signals to be echo eliminated;
splitting an audio signal to be subjected to echo cancellation in a multi-classification filter cascade manner, wherein the method comprises the following steps:
taking the preprocessed microphone signal and the preprocessed loudspeaker signal as the audio signals to be echo-cancelled, and segmenting the audio signals to be echo-cancelled by a cascade of classification filters to obtain audio spectra to be echo-cancelled in different frequency domains, wherein the segmentation flow of the audio signals to be echo-cancelled based on the cascade of classification filters is as follows:
S21: constructing M classification filters, the m-th classification filter having centre frequency f_m and frequency response B_m(n);
wherein:
B_m(n) represents the frequency response of the m-th classification filter for the spectrum of an audio signal obtained with n Fourier transform sampling points;
L represents the length of the audio signal sequence;
S22: performing fast Fourier transform processing on the preprocessed microphone signal sequence and loudspeaker signal sequence respectively;
wherein:
F_j(n) represents the spectral sequence obtained by the Fourier transform of the audio signal sequence Q_j under n Fourier transform sampling points, where j = 1 is the microphone signal sequence and j = 2 is the loudspeaker signal sequence;
L represents the length of the audio signal sequence, and n takes different numbers of Fourier transform sampling points;
S23: inputting F_j(n) into the M classification filters, computing the logarithmic energy g_m(F_j(n)) output by each classification filter, and summing the logarithmic energies of the M classification filters:
G(F_j(n)) = g_1(F_j(n)) + g_2(F_j(n)) + ... + g_M(F_j(n))
if G(F_2(n)) exceeds G(F_1(n)) by more than the energy threshold, the spectral energy of the loudspeaker signal sequence at Fourier transform sampling point number n is significantly higher than that of the microphone signal sequence, indicating that high-energy echo is present in the spectrum of the loudspeaker signal sequence at Fourier transform sampling point number n; the spectrum F_2(n) is then marked as a spectrum to be echo-cancelled after segmentation, wherein ε_1 represents the energy threshold;
S24: repeating steps S22-S23 with different numbers of Fourier transform sampling points to obtain the spectra to be echo-cancelled, which are collected as:
{U_1, U_2, ..., U_N}
wherein:
U_v represents the v-th group of spectra to be echo-cancelled, and N represents the number of groups of spectra to be echo-cancelled;
s3: constructing a multichannel echo filter, inputting the segmented frequency spectrum into the multichannel echo filter, and solving to obtain parameters of the multichannel echo filter, wherein the method comprises the following steps of:
constructing a multichannel echo filter, wherein the multichannel echo filter comprises N taps, each tap is provided with a tap vector, N represents the number of groups of to-be-echo-cancelled spectra input to the multichannel echo filter, and the tap vectors are the parameters of the multichannel echo filter;
inputting the N groups of segmented spectra into the constructed multichannel echo filter, and solving to obtain the parameters of the multichannel echo filter, wherein the solving flow of the multichannel echo filter parameters is as follows:
S31: setting the order of the multichannel echo filter as K, setting the current order as k with an initial value of 0, and initializing the multichannel echo filter parameters [formula QLYQS_27]:
[formula QLYQS_28]
wherein:
T represents the transpose;
[formula QLYQS_29] represents the spectral representation of the n-th tap vector;
S32: inputting the spectrum of the segmented loudspeaker signal sequence into the k-order multichannel echo filter, the output [formula QLYQS_30] of the k-order multichannel echo filter being:
[formula QLYQS_31]
[formula QLYQS_32]
S33: calculating the filtering error of the k-order multichannel echo filter:
[formula QLYQS_33]
[formula QLYQS_34]
S34: if [formula QLYQS_35], applying the inverse Fourier transform to [formula QLYQS_36] and taking the result as the solved multichannel echo filter parameters [formula QLYQS_37]; otherwise updating the filtering parameters of the [formula QLYQS_38]-order multichannel echo filter:
[formula QLYQS_39]
wherein:
[formula QLYQS_40] represents the sign function, and b represents the update step size, which is set to 0.1;
[formula QLYQS_41] represents the filtering error of the k-order multichannel echo filter;
and letting [formula QLYQS_42], returning to step S32;
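Illustrative note (not part of the claims): a hedged sketch of the solving flow S31-S34, using the microphone spectra as the reference signal, magnitude spectra for simplicity, and an assumed stopping tolerance; only the sign-function update and the step size b = 0.1 are taken from the claim text.

```python
import numpy as np

def solve_filter_params(speaker_spectra, mic_spectra, order_K=50, b=0.1, tol=1e-3):
    """Iterative solving of the N tap vectors (a sketch of steps S31-S34, on magnitude spectra)."""
    X = np.abs(np.asarray(speaker_spectra))   # N segmented loudspeaker spectra, shape (N, n_bins)
    D = np.abs(np.asarray(mic_spectra))       # reference spectra of the same shape (assumed desired signal)
    W = np.zeros_like(X)                      # S31: initialize the tap vectors to zero
    for k in range(order_K):                  # current order k, starting from 0
        Y = W * X                             # S32: output of the k-order filter (one tap per spectrum group)
        E = D - Y                             # S33: filtering error of the k-order filter
        if np.mean(np.abs(E)) < tol:          # S34: error small enough -> accept the parameters
            break
        W = W + b * np.sign(E) * X            # otherwise sign-function update with step size b = 0.1
    return np.fft.irfft(W, axis=1)            # inverse Fourier transform gives the final filter parameters
```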
s4: according to the solved multi-channel echo filter parameters, the multi-channel echo filter is adjusted in real time, and the adjusted multi-channel echo filter is obtained;
s5: the state of the audio signal in the classroom is detected in real time, and when the speaker signal is in a dominant state in the classroom, the adjusted multi-channel echo filter is utilized to carry out filtering processing on the speaker signal, so that the echo of the speaker in the classroom is eliminated.
2. The classroom audio multichannel echo cancellation method according to claim 1, wherein the step S1 of acquiring the classroom audio signal in real time includes:
and acquiring classroom audio signals in real time to obtain a classroom audio signal sequence, wherein the format of the classroom audio signal sequence Q is as follows:
[formula QLYQS_43]
wherein:
[formula QLYQS_44] represents the classroom audio signal at moment [formula QLYQS_45], [formula QLYQS_46]; the time difference between adjacent moments is 0.5 seconds, and the classroom audio signal sequence consists of L consecutive moments;
[formula QLYQS_47] represents the microphone signal at moment [formula QLYQS_48];
[formula QLYQS_49] represents the loudspeaker signal acquired at moment [formula QLYQS_50], wherein the acquired loudspeaker signal comprises the loudspeaker clean signal [formula QLYQS_51] and the echo signal [formula QLYQS_52];
[formula QLYQS_53] represents the other noise audio signal at moment [formula QLYQS_54], including the human voice signal [formula QLYQS_55] and the noise signal [formula QLYQS_56].
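Illustrative note (not part of the claims): a small container showing one way the classroom audio signal sequence Q of claim 2 could be laid out in code; the 0.5-second spacing and the clean-plus-echo and voice-plus-noise decompositions follow the claim, while the field names and the 16 kHz sample rate are assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ClassroomMoment:
    """One 0.5-second entry of the classroom audio signal sequence Q (illustrative layout)."""
    mic: np.ndarray            # microphone signal at this moment
    speaker_clean: np.ndarray  # clean loudspeaker component
    speaker_echo: np.ndarray   # echo component contained in the acquired loudspeaker signal
    voice: np.ndarray          # human voice part of the other-noise audio signal
    noise: np.ndarray          # background noise part of the other-noise audio signal

    @property
    def speaker(self) -> np.ndarray:
        """Acquired loudspeaker signal = clean component + echo component."""
        return self.speaker_clean + self.speaker_echo

    @property
    def other_noise(self) -> np.ndarray:
        """Other noise audio signal = human voice + background noise."""
        return self.voice + self.noise

# Example: a sequence of L consecutive moments spaced 0.5 s apart at an assumed 16 kHz sample rate.
sr, hop = 16000, 0.5
block = int(sr * hop)
L = 4
Q = [ClassroomMoment(*(np.zeros(block) for _ in range(5))) for _ in range(L)]
```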
3. The classroom audio multichannel echo cancellation method according to claim 1, wherein the preprocessing of the acquired classroom audio signal in step S1 includes:
preprocessing the collected classroom audio signals, wherein the classroom audio signals to be preprocessed comprise microphone signals, loudspeaker signals and other noise audio signals, expressed as [formula QLYQS_57], [formula QLYQS_58], [formula QLYQS_59], [formula QLYQS_60], [formula QLYQS_61];
the preprocessing flow of the classroom audio signals is as follows:
S11: constructing a Hamming window function:
[formula QLYQS_62]
wherein:
[formula QLYQS_63] is the window function;
[formula QLYQS_64] represents the window function coefficient, which is set to 0.43;
L represents the length of the audio signal to be windowed;
S12: windowing the classroom audio signal with the Hamming window function, wherein the windowing formula is:
[formula QLYQS_65]
wherein:
[formula QLYQS_66] is the windowing result of the classroom audio signal [formula QLYQS_68] at moment [formula QLYQS_67];
S13: reconstructing to obtain the preprocessed classroom audio signal sequences of the different categories:
[formula QLYQS_69]
wherein:
[formula QLYQS_70] represents the preprocessed microphone signal sequence in the classroom, [formula QLYQS_71] represents the preprocessed loudspeaker signal sequence in the classroom, and [formula QLYQS_72] represents the preprocessed other-noise audio signal sequence in the classroom.
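Illustrative note (not part of the claims): a minimal sketch of the preprocessing in S11-S13, assuming the generalized Hamming form (1 - a) - a*cos(2*pi*n/(L - 1)) with the coefficient a = 0.43 named in the claim and equal-length input signals; the exact window formula sits behind a formula image.

```python
import numpy as np

def hamming_window(length, a=0.43):
    """Generalized Hamming window with coefficient a = 0.43 (assumed shape of the claimed window)."""
    n = np.arange(length)
    return (1.0 - a) - a * np.cos(2.0 * np.pi * n / (length - 1))

def preprocess(mic, speaker, other_noise):
    """Window each classroom audio signal (cf. S12) and regroup the results by category (cf. S13)."""
    w = hamming_window(len(mic))
    return {
        "mic": np.asarray(mic) * w,                  # preprocessed microphone signal sequence
        "speaker": np.asarray(speaker) * w,          # preprocessed loudspeaker signal sequence
        "other_noise": np.asarray(other_noise) * w,  # preprocessed other-noise audio signal sequence
    }
```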
4. The classroom audio multichannel echo cancellation method according to claim 1, wherein the step S4 of adjusting the multichannel echo filter in real time according to the solved multichannel echo filter parameters comprises:
according to the solved multichannel echo filter parameters [formula QLYQS_73], the parameters of the current multichannel echo filter are adjusted in real time to obtain the adjusted multichannel echo filter.
5. The classroom audio multichannel echo cancellation method as claimed in claim 1, wherein said step S5 of detecting the state of the audio signal in the classroom in real time comprises:
detecting the state of the audio signal in the classroom in real time, wherein the state of the audio signal in the classroom includes a loudspeaker-dominant state and a loudspeaker-non-dominant state; the loudspeaker-dominant state indicates that the loudspeaker is the main sound source in the classroom, and the loudspeaker-non-dominant state indicates that human voice and noise in the classroom are the main sound sources;
the detection flow of the audio signal state in the classroom is as follows:
S51: constructing a state decision function [formula QLYQS_74]:
[formula QLYQS_75]
wherein:
[formula QLYQS_76] represents the covariance matrix of the microphone signal sequence and the loudspeaker signal sequence acquired in step S1;
[formula QLYQS_77] represents the autocorrelation matrix of the microphone signal sequence;
[formula QLYQS_78] represents the standard deviation of the microphone signal sequence;
[formula QLYQS_79] represents the state decision function value;
S52: constructing the state decision function value [formula QLYQS_80]:
[formula QLYQS_81]
wherein:
[formula QLYQS_82] represents the signal mean of the microphone signal sequence, [formula QLYQS_83] represents the signal mean of the other noise signals, [formula QLYQS_84] represents the signal mean of the loudspeaker signal sequence, and [formula QLYQS_85] represents the average update of each tap vector of the current multichannel echo filter compared with the previous multichannel echo filter;
S53: if [formula QLYQS_86] or [formula QLYQS_87], the other noise is too strong, wherein the other noise includes speaker audio signals that do not pass through the microphone as well as ambient noise, indicating that the loudspeaker is in the non-dominant state, and [formula QLYQS_88] represents an autocorrelation threshold; otherwise the loudspeaker is in the dominant state.
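Illustrative note (not part of the claims): a sketch of the dominance check in S51-S53; the claim names the covariance, autocorrelation, standard deviation, signal means and average tap update as inputs, but how they combine into the decision values is behind formula images, so the combinations and thresholds below are assumptions.

```python
import numpy as np

def speaker_is_dominant(mic, speaker, other_noise, tap_update_mean,
                        decision_threshold=0.5, autocorr_threshold=0.3):
    """Decide whether the loudspeaker is the dominant sound source (a sketch of S51-S53)."""
    mic = np.asarray(mic, dtype=float)
    speaker = np.asarray(speaker, dtype=float)
    other_noise = np.asarray(other_noise, dtype=float)

    cov_ms = np.cov(mic, speaker)[0, 1]   # covariance of microphone and loudspeaker sequences
    autocorr_m = np.mean(mic * mic)       # zero-lag autocorrelation of the microphone sequence
    std_m = np.std(mic)

    # cf. S51: state decision value from covariance, autocorrelation and standard deviation (assumed combination)
    d1 = abs(cov_ms) / (autocorr_m * std_m + 1e-12)

    # cf. S52: decision value from the signal means and the average tap-vector update (assumed combination)
    d2 = (np.mean(np.abs(other_noise)) + abs(tap_update_mean)) / (
        np.mean(np.abs(speaker)) + np.mean(np.abs(mic)) + 1e-12)

    # cf. S53: if either test trips, the other noise is too strong and the loudspeaker is not dominant
    return not (d1 < decision_threshold or d2 > autocorr_threshold)
```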
6. The classroom audio multichannel echo cancellation method as claimed in claim 5, wherein said step S5 of filtering the speaker signal with the adjusted multichannel echo filter comprises:
when the loudspeaker signal is in the dominant state in the classroom, the loudspeaker signal represented in the time domain is input into the adjusted multichannel echo filter, and the adjusted multichannel echo filter filters the loudspeaker signal to obtain the echo-cancelled loudspeaker signal, thereby eliminating the echo of the loudspeaker in the classroom.
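Illustrative note (not part of the claims): a hedged sketch of claim 6, filtering the time-domain loudspeaker signal frame by frame with the tap responses produced by the adjusted filter; the frame length and the frame-to-tap mapping are assumptions.

```python
import numpy as np

def cancel_speaker_echo(speaker, tap_impulse_responses, n_fft=512):
    """Filter the time-domain loudspeaker signal with the adjusted filter (a sketch of claim 6)."""
    speaker = np.asarray(speaker, dtype=float)
    out = np.zeros_like(speaker)
    n_frames = len(speaker) // n_fft
    for v in range(n_frames):
        frame = speaker[v * n_fft:(v + 1) * n_fft]
        tap = tap_impulse_responses[v % len(tap_impulse_responses)]  # assumed frame-to-tap mapping
        spec = np.fft.rfft(frame, n_fft) * np.fft.rfft(tap, n_fft)   # apply the tap response per frequency bin
        out[v * n_fft:(v + 1) * n_fft] = np.fft.irfft(spec, n_fft)
    return out
```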
CN202211546136.3A 2022-12-05 2022-12-05 Classroom audio multichannel echo cancellation method Active CN115696140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211546136.3A CN115696140B (en) 2022-12-05 2022-12-05 Classroom audio multichannel echo cancellation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211546136.3A CN115696140B (en) 2022-12-05 2022-12-05 Classroom audio multichannel echo cancellation method

Publications (2)

Publication Number Publication Date
CN115696140A CN115696140A (en) 2023-02-03
CN115696140B true CN115696140B (en) 2023-05-26

Family

ID=85055130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211546136.3A Active CN115696140B (en) 2022-12-05 2022-12-05 Classroom audio multichannel echo cancellation method

Country Status (1)

Country Link
CN (1) CN115696140B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106210368A (en) * 2016-06-20 2016-12-07 百度在线网络技术(北京)有限公司 The method and apparatus eliminating multiple channel acousto echo
CN107105366A (en) * 2017-06-15 2017-08-29 歌尔股份有限公司 A kind of multi-channel echo eliminates circuit, method and smart machine
CN107564539A (en) * 2017-08-29 2018-01-09 苏州奇梦者网络科技有限公司 Towards the acoustic echo removing method and device of microphone array
CN108630219A (en) * 2018-05-08 2018-10-09 北京小鱼在家科技有限公司 A kind of audio frequency processing system, method, apparatus, equipment and storage medium
CN108630217A (en) * 2017-03-21 2018-10-09 豪威科技股份有限公司 The echo cancelling system and method for residual echo with reduction
CN111031448A (en) * 2019-11-12 2020-04-17 西安讯飞超脑信息科技有限公司 Echo cancellation method, echo cancellation device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10446165B2 (en) * 2017-09-27 2019-10-15 Sonos, Inc. Robust short-time fourier transform acoustic echo cancellation during audio playback
CN111755020B (en) * 2020-08-07 2023-02-28 南京时保联信息科技有限公司 Stereo echo cancellation method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106210368A (en) * 2016-06-20 2016-12-07 百度在线网络技术(北京)有限公司 The method and apparatus eliminating multiple channel acousto echo
CN108630217A (en) * 2017-03-21 2018-10-09 豪威科技股份有限公司 The echo cancelling system and method for residual echo with reduction
CN107105366A (en) * 2017-06-15 2017-08-29 歌尔股份有限公司 A kind of multi-channel echo eliminates circuit, method and smart machine
CN107564539A (en) * 2017-08-29 2018-01-09 苏州奇梦者网络科技有限公司 Towards the acoustic echo removing method and device of microphone array
CN108630219A (en) * 2018-05-08 2018-10-09 北京小鱼在家科技有限公司 A kind of audio frequency processing system, method, apparatus, equipment and storage medium
CN111031448A (en) * 2019-11-12 2020-04-17 西安讯飞超脑信息科技有限公司 Echo cancellation method, echo cancellation device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115696140A (en) 2023-02-03

Similar Documents

Publication Publication Date Title
CN110335620B (en) Noise suppression method and device and mobile terminal
US10504539B2 (en) Voice activity detection systems and methods
US8065115B2 (en) Method and system for identifying audible noise as wind noise in a hearing aid apparatus
WO2021196905A1 (en) Voice signal dereverberation processing method and apparatus, computer device and storage medium
KR20100065811A (en) Apparatus and method for speech recognition by using source separation and source identification
CN108962231B (en) Voice classification method, device, server and storage medium
US20110022361A1 (en) Sound processing device, sound processing method, and program
CN110827843A (en) Audio processing method and device, storage medium and electronic equipment
US9928848B2 (en) Audio signal noise reduction in noisy environments
CN105225672B (en) Merge the system and method for the dual microphone orientation noise suppression of fundamental frequency information
CN103903612A (en) Method for performing real-time digital speech recognition
CN103811023A (en) Audio processing device, method and program
CN111883135A (en) Voice transcription method and device and electronic equipment
CN111968651A (en) WT (WT) -based voiceprint recognition method and system
CN105931648B (en) Audio signal solution reverberation method and device
CN112992190B (en) Audio signal processing method and device, electronic equipment and storage medium
CN115696140B (en) Classroom audio multichannel echo cancellation method
CN106340310B (en) Speech detection method and device
Valero et al. Classification of audio scenes using narrow-band autocorrelation features
CN116312561A (en) Method, system and device for voice print recognition, authentication, noise reduction and voice enhancement of personnel in power dispatching system
CN111429937B (en) Voice separation method, model training method and electronic equipment
JP2003271168A (en) Method, device and program for extracting signal, and recording medium recorded with the program
KR101096091B1 (en) Apparatus for Separating Voice and Method for Separating Voice of Single Channel Using the Same
CN111782860A (en) Audio detection method and device and storage medium
Sapozhnykov Sub-band detector for wind-induced noise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant