WO2023065317A1 - Conference terminal and echo cancellation method - Google Patents
Conference terminal and echo cancellation method
- Publication number
- WO2023065317A1 (application PCT/CN2021/125763)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- omnidirectional
- sound
- conference terminal
- microphones
- echo
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M9/00—Arrangements for interconnection not involving centralised switching
- H04M9/08—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
- H04M9/082—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/02—Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback
Definitions
- the present application relates to the technical field of voice processing, in particular to a conference terminal, an echo cancellation method and device, and a sound pickup device.
- as shown in FIG. 1, a common structure of the sound pickup device of an audio and video conferencing system is centered on the loudspeaker, with one dipole directional microphone arranged in each of the three surrounding connecting rods (a, b, c).
- Dipole directional microphones rely on acoustic design to form a dipole beam pattern, which can suppress the echo from the loudspeaker and pick up speech from the target direction.
- because a dipole directional microphone relies on acoustic design to form its dipole beam pattern, its gain in the loudspeaker direction is not small enough, so echo arriving from the loudspeaker direction cannot be suppressed effectively.
- the present application provides a conference terminal to solve the problem in the prior art that the echo cancellation effect of the conference terminal is poor.
- the present application additionally provides an echo cancellation method and device, and sound pickup equipment.
- This application provides a conference terminal, including:
- At least one omnidirectional microphone set comprising at least two omnidirectional microphones
- the memory is used to store the program for implementing the echo cancellation method. After the device is powered on and runs the program of the method through the processor, the following steps are performed:
- For the omnidirectional microphone group, determine, according to the weight vector, a weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones as the sound signal after echo cancellation.
- the at least two omnidirectional microphones are two omnidirectional microphones.
- the at least one omnidirectional microphone group is three omnidirectional microphone groups centered on the speaker, and the three omnidirectional microphone groups cover omnidirectional target sound sources.
- the present application also provides an echo cancellation method for a conference terminal, where the conference terminal includes: a loudspeaker, at least one omnidirectional microphone group, and the omnidirectional microphone group includes at least two omnidirectional microphones;
- the method includes:
- For the omnidirectional microphone group, determine, according to the weight vector of the beamformer, the weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones as the sound signal after echo cancellation.
- the determining a weight vector of a beamformer that enables the at least two omnidirectional microphones to form a dipole beam pattern includes:
- the weight vector is determined from the noise covariance matrix and the steering vector through the minimum variance distortionless response (MVDR) beamforming algorithm.
- the noise covariance matrix is determined in the following manner:
- a speech autocorrelation matrix is determined from the sound signal, collected by the omnidirectional microphones, that includes the preset sound, and is used as the noise covariance matrix.
- the autocorrelation matrix is updated according to the sound signal including the conference sound collected by the omnidirectional microphone, as an updated noise covariance matrix.
- the echo-eliminated sound signals corresponding to the at least two omnidirectional microphones of the target omnidirectional microphone group are selected.
- the present application also provides an echo cancellation device, the device is located in a conference terminal, and the conference terminal includes: a loudspeaker, at least one omnidirectional microphone group, and the omnidirectional microphone group includes at least two omnidirectional microphones;
- the device includes:
- a parameter determination unit configured to determine a weight vector of a beamformer that enables the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal in the loudspeaker direction and enhance the sound signal in the target direction;
- a sound signal acquisition unit configured to collect sound signals through the omnidirectional microphones;
- a beamforming unit configured to, for the omnidirectional microphone group, determine, according to the weight vector of the beamformer, the weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones as the sound signal after echo cancellation.
- the application also provides a sound pickup device, including:
- At least one omnidirectional microphone set comprising at least two omnidirectional microphones
- the memory is used to store a program for implementing the above echo cancellation method, and the terminal is powered on to run the program of the method through the processor.
- the present application also provides an electronic device, including:
- a processor and a memory; the memory is used to store a program for implementing the above method, and after the device is powered on, it runs the program of the method through the processor.
- the present application also provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the above-mentioned methods.
- the present application also provides a computer program product including instructions, which, when run on a computer, cause the computer to execute the above-mentioned various methods.
- the conference terminal provided in the embodiment of the present application includes: a loudspeaker, and at least one omnidirectional microphone group, where the omnidirectional microphone group includes at least two omnidirectional microphones.
- the conference terminal determines the weight vector of a beamformer that makes the at least two omnidirectional microphones form a dipole beam pattern, so as to suppress the echo signal in the loudspeaker direction and enhance the sound signal in the target direction; collects multiple microphone signals through the omnidirectional microphones; and, for the at least two omnidirectional microphones, determines, according to the weight vector, a weighted sum of the at least two microphone signals as the echo-cancelled sound signal.
- with this processing method, two or more omnidirectional microphones replace one dipole directional microphone. Combined with beamforming technology, the beam pattern forms a smaller gain in the loudspeaker direction, and that gain is not limited by the acoustic design of the microphone; therefore, the echo cancellation effect can be effectively improved.
- FIG. 1 is a schematic structural diagram of a conference terminal in the prior art
- Fig. 2 is a schematic structural diagram of an embodiment of a conference terminal provided by the present application.
- Fig. 3 is a schematic flowchart of an embodiment of the echo cancellation method provided by the present application.
- a conference terminal, an echo cancellation method and device, and a sound pickup device are provided.
- Various schemes are described in detail in the following examples one by one.
- the conference terminal may include: a loudspeaker, and at least one omnidirectional microphone set, where the omnidirectional microphone set includes at least two omnidirectional microphones.
- the conference terminal can be used in an audio and video conference system.
- an audio and video conferencing system is system equipment that lets two or more individuals or groups in different places exchange audio, video and document data through transmission lines, conference terminals and other equipment, achieving real-time, interactive communication so that a conference can be held simultaneously.
- the conference terminal may be a speakerphone, or a video conference terminal including a display and a camera.
- the loudspeaker, also known as a "horn", is a transducer device that converts electrical signals into acoustic signals.
- the omnidirectional microphone is a microphone that receives sound equally from all directions; for example, magnetic, ceramic and electret microphones are all omnidirectional microphones.
- the conference terminal includes a loudspeaker and multiple omnidirectional microphone groups, and the multiple omnidirectional microphone groups can be installed around the loudspeaker to cover all directions of the conference site.
- a plurality of connecting rods may be extended from the loudspeaker, and an omnidirectional microphone group is mounted on each connecting rod.
- Each omnidirectional microphone group includes at least two omnidirectional microphones instead of one directional microphone in the prior art.
- the core technology of the conference terminal provided by the embodiment of the present application includes: how to combine the beamforming technology to form each omnidirectional microphone group into a dipole beam pattern, so as to suppress the echo signal in the speaker direction and enhance the sound signal in the target direction.
- the dipole beam pattern formed from two or more omnidirectional microphones in the embodiment of the present application has a smaller gain in the loudspeaker direction and therefore suppresses echo better.
- the reason is that the gain a dipole directional microphone obtains through "acoustic design" is not small enough, whereas two omnidirectional microphones driven by a beamforming algorithm (such as MVDR) can reach a gain smaller than what acoustic design can achieve.
- FIG. 3 is a schematic diagram of an echo cancellation process of an embodiment of the conference terminal of the present application.
- echo cancellation includes the following processing steps:
- Step S301: Determine a weight vector of a beamformer that enables the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal in the loudspeaker direction and enhance the sound signal in the target direction.
- the conference terminal provided in this embodiment combines the beamforming technology to form a dipole beam pattern for each group of omnidirectional microphones.
- a variety of beamforming algorithms can be used to form a dipole beam pattern for each omnidirectional microphone group, such as the minimum variance distortionless response (MVDR) beamforming algorithm, a differential beamforming algorithm, and the like.
- each group of omnidirectional microphones is formed into a dipole beam pattern by an MVDR algorithm.
- step S301 may include the following sub-steps:
- Step S3031: Determine the noise covariance matrix and steering vector of the conference terminal.
- the principle of the MVDR algorithm is to minimize the noise power spectrum while ensuring that the target direction is not distorted, as shown in formula 1:
- $w$ represents the weight vector
- $R$ represents the noise covariance matrix
- $w^{H} R w$ represents the noise power spectrum
- the objective function is $\min_{w} w^{H} R w$, which represents the minimum noise power spectrum.
- the solution result of this objective function is the weight vector w. According to the weight vector w, the weighted sum of the multiple sound signals in one group is calculated, and the result is the sound signal after the echo is suppressed.
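The formulas themselves do not appear in this text. As a hedged sketch, the textbook MVDR formulation that is consistent with the definitions above is shown below. The symbol $a$ for the steering vector of the target direction is an assumption, since the text does not show which symbol the application uses.

```latex
\begin{aligned}
\min_{w}\; & w^{H} R\, w \quad \text{subject to} \quad w^{H} a = 1, \\
w &= \frac{R^{-1} a}{a^{H} R^{-1} a}.
\end{aligned}
```

The constraint $w^{H} a = 1$ expresses that the target direction passes without distortion, and the second line is the closed-form solution of the constrained minimization.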
- the noise covariance matrix can be determined in two stages.
- One stage is the initialization stage of the conference terminal, in which the initial value of the noise covariance matrix is determined. The other stage is during use of the conference terminal: when the participants in the environment where the conference terminal is located are not speaking (the target sound source is silent), the noise covariance matrix can be updated to better adapt to the conference environment. This improves the accuracy of the noise covariance matrix, hence the accuracy of the weight vector w, and in turn the echo suppression effect.
- the noise covariance matrix is related to the conference environment where the conference terminal is located.
- the same conference terminal is usually used in multiple conference environments, so the noise covariance matrix can be determined when the conference terminal is initialized.
- when the conference terminal starts, preset sound data can be played; the speech autocorrelation matrix is then determined from the multi-channel sound signals, collected by the omnidirectional microphones, that include the preset sound, and is used as the noise covariance matrix. For example, the loudspeaker of the conference terminal (such as a speakerphone) plays speech for 2 to 4 seconds, and the autocorrelation matrix of this speech is calculated as the noise covariance matrix. With this processing method, the conference terminal can obtain a better echo suppression effect in different conference environments.
- when the conference terminal is working and the target direction is detected to be silent, the autocorrelation matrix is updated according to the multi-channel sound signals, collected by the omnidirectional microphones, that include the conference sound (the speaker's voice played by the loudspeaker), and is used as the updated noise covariance matrix.
- the speech autocorrelation matrix of this segment is calculated as the noise covariance matrix.
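A minimal sketch of how such an autocorrelation matrix could be estimated from the collected multi-channel signal. It assumes each snapshot holds one complex sample per microphone, for example one frequency bin of a short-time Fourier transform frame. The function name `autocorrelation_matrix` and the snapshot layout are illustrative assumptions, not taken from the application.

```python
def autocorrelation_matrix(snapshots):
    """Average of x * x^H over all snapshots.

    snapshots: list of per-frame vectors, one complex sample per microphone
    (hypothetical layout chosen for this sketch).
    """
    if not snapshots:
        raise ValueError("at least one snapshot is required")
    n_ch = len(snapshots[0])
    R = [[0j] * n_ch for _ in range(n_ch)]
    for x in snapshots:
        for i in range(n_ch):
            for j in range(n_ch):
                # Entry (i, j) accumulates x_i * conj(x_j).
                R[i][j] += x[i] * complex(x[j]).conjugate()
    n = len(snapshots)
    return [[value / n for value in row] for row in R]
```

With two microphones per group the result is a 2x2 Hermitian matrix, which is the size used by the other sketches in this section.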
- a smoothing method may be used to update the previous noise covariance matrix.
- $R_{t+1} = \alpha R_{t-1} + (1-\alpha) R_{t}$ (Formula 3), where $\alpha$ denotes the smoothing factor.
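The smoothing update can be sketched as a weighted blend of the previous noise covariance matrix and a newly measured autocorrelation matrix. The function name `smooth_covariance`, the parameter name `alpha` and its default of 0.9 are assumptions made for illustration; the text does not give a value for the smoothing factor.

```python
def smooth_covariance(R_prev, R_new, alpha=0.9):
    """Blend two equally sized matrices: alpha * R_prev + (1 - alpha) * R_new.

    alpha close to 1 keeps most of the previous estimate; the default value
    is an assumption, not a figure from the application.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * p + (1.0 - alpha) * q for p, q in zip(row_prev, row_new)]
        for row_prev, row_new in zip(R_prev, R_new)
    ]
```

A larger `alpha` makes the estimate more stable but slower to follow changes in the conference environment.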
- Step S3033: Determine the weight vector from the noise covariance matrix and the steering vector through the minimum variance distortionless response (MVDR) beamforming algorithm.
- the weight vector may be determined based on the noise covariance matrix and the steering vector according to Formula 2.
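As a hedged illustration of this step, the sketch below computes the standard MVDR solution w = R^{-1} a / (a^H R^{-1} a) for one group of two microphones, using the closed-form inverse of a 2x2 matrix. The function name `mvdr_weights_2ch` and the symbol `a` for the steering vector are assumptions; the application's own Formula 2 is not shown in this text.

```python
def mvdr_weights_2ch(R, a):
    """Return the MVDR weight vector for a 2x2 noise covariance matrix R
    and a 2-element steering vector a (standard textbook solution)."""
    (r00, r01), (r10, r11) = R
    det = r00 * r11 - r01 * r10
    if abs(det) < 1e-12:
        raise ValueError("noise covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[r11 / det, -r01 / det], [-r10 / det, r00 / det]]
    # u = R^{-1} a
    u = [inv[0][0] * a[0] + inv[0][1] * a[1],
         inv[1][0] * a[0] + inv[1][1] * a[1]]
    # denom = a^H R^{-1} a
    denom = complex(a[0]).conjugate() * u[0] + complex(a[1]).conjugate() * u[1]
    return [u[0] / denom, u[1] / denom]
```

By construction the returned weights satisfy w^H a = 1, so a signal arriving from the direction described by `a` passes with unit gain while the power contributed by R is minimized.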
- Step S303: Collect multiple sound signals through the omnidirectional microphones.
- multiple omnidirectional microphones of the conference terminal can be used to collect multiple sound signals.
- Step S305: For the at least two omnidirectional microphones, determine, according to the weight vector, a weighted sum of the at least two sound signals as the echo-cancelled sound signal.
- the weight vector includes the weights respectively corresponding to each of the at least two omnidirectional microphones. For example, if one omnidirectional microphone group includes two omnidirectional microphones, the weight vector includes two weights.
- for each omnidirectional microphone group, the weighted sum of its at least two sound signals is determined as the sound signal after echo cancellation. For example, if the conference terminal includes three omnidirectional microphone groups, three channels of echo-cancelled sound signals are obtained.
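The weighted sum for one omnidirectional microphone group can be sketched as follows. The convention y = w^H x (each weight is conjugated before it multiplies its channel) is the usual one for beamformers and is assumed here; the function name `beamform` and the channel layout are illustrative.

```python
def beamform(weights, channels):
    """Weighted sum of one microphone group.

    weights:  one (possibly complex) weight per microphone
    channels: one equally long sample sequence per microphone
    Returns y[k] = sum_i conj(w_i) * x_i[k].
    """
    if len(weights) != len(channels):
        raise ValueError("one weight per channel is required")
    length = len(channels[0])
    return [
        sum(complex(w).conjugate() * ch[k] for w, ch in zip(weights, channels))
        for k in range(length)
    ]
```

For a terminal with three groups, calling `beamform` once per group with that group's weight vector yields the three echo-cancelled channels mentioned above.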
- the number of omnidirectional microphone groups is usually related to the size of the conference space. For most environments with limited space, three omnidirectional microphone groups can cover target sound sources in all directions around the conference terminal; for a large conference space, more omnidirectional microphone groups can be set up, such as 4 or 5 groups, to cover all directions of the conference site.
- when each omnidirectional microphone group includes two omnidirectional microphones, a dipole beam pattern can be formed by combining the beamforming technology. In this way, the echo cancellation effect can be improved, and the equipment cost can be reduced.
- each omnidirectional microphone group may also include more than two omnidirectional microphones, but this will increase equipment costs.
- when each omnidirectional microphone group includes two omnidirectional microphones, the distance between the two omnidirectional microphones affects the echo suppression performance.
- a distance of 3 cm between the two omnidirectional microphones gives better results than 7 cm.
- the gain facing the loudspeaker direction (broadside direction) formed by the prior art is larger than the gain facing the loudspeaker direction formed by the solution of the present application, so the solution of the present application can suppress echo better.
- the target-oriented gain (endfire direction) formed by the prior art is smaller than the target-oriented gain formed by the solution of the present application, so the solution of the present application can better enhance conference voice.
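To illustrate what a dipole beam pattern looks like in the broadside and endfire directions, the sketch below evaluates the gain |w^H a(theta)| of a two-microphone group. It uses a simple differential weight pair rather than MVDR weights, and it assumes free-field plane waves, a speed of sound of 343 m/s, a 1 kHz tone and 3 cm spacing; all names and values other than the 3 cm spacing are assumptions for this example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_vector(theta_rad, spacing_m, freq_hz):
    """Plane-wave steering vector for two microphones on a line.

    theta_rad is measured from the array axis: 0 is endfire, pi/2 is broadside.
    """
    delay = spacing_m * math.cos(theta_rad) / SPEED_OF_SOUND
    return [1.0 + 0j, cmath.exp(-2j * math.pi * freq_hz * delay)]


def response_magnitude(weights, theta_rad, spacing_m, freq_hz):
    """Gain |w^H a(theta)| of the beamformer toward direction theta."""
    a = steering_vector(theta_rad, spacing_m, freq_hz)
    return abs(sum(complex(w).conjugate() * ai for w, ai in zip(weights, a)))


# Subtracting the two microphone signals yields a dipole pattern.
dipole = [0.5, -0.5]
broadside = response_magnitude(dipole, math.pi / 2, 0.03, 1000.0)
endfire = response_magnitude(dipole, 0.0, 0.03, 1000.0)
```

In this sketch the broadside gain is essentially zero while the endfire gain is not, which matches the arrangement described above: the loudspeaker lies in the broadside direction and the target in the endfire direction.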
- a speaker may move in a conference site, such as a host at a party.
- the echo cancellation process may also include the following steps:
- Step S401: If movement of the target sound source is detected, determine the signal-to-noise ratio of the omnidirectional microphones.
- existing techniques can be used both to detect that the target sound source is moving and to determine the signal-to-noise ratio (SNR) of each omnidirectional microphone.
- Step S403: According to the signal-to-noise ratio, select the echo-cancelled sound signal corresponding to the at least two omnidirectional microphones of the target omnidirectional microphone group.
- by executing step S401 and step S403, a good echo suppression effect can still be obtained when the sound source moves.
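One way steps S401 and S403 might be combined is sketched below: each group is scored by the average SNR of its microphones, and the echo-cancelled output of the best-scoring group is selected. Averaging per group and picking the maximum are assumptions; the text only states that the selection is made according to the signal-to-noise ratio. The function name `select_group_output` is hypothetical.

```python
def select_group_output(group_outputs, mic_snrs_db):
    """Pick the echo-cancelled signal of the group with the highest mean SNR.

    group_outputs: one echo-cancelled signal per microphone group
    mic_snrs_db:   for each group, the SNR in dB of each of its microphones
    Returns (index of the selected group, its echo-cancelled signal).
    """
    if len(group_outputs) != len(mic_snrs_db) or not group_outputs:
        raise ValueError("one non-empty SNR list per group is required")
    scores = [sum(snrs) / len(snrs) for snrs in mic_snrs_db]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, group_outputs[best]
```

For example, with three groups whose microphones report `[[3, 5], [12, 10], [7, 7]]` dB, the second group has the highest mean SNR and its output is returned.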
- the conference terminal provided in the embodiments of the present application includes: a loudspeaker, and at least one omnidirectional microphone group, where the omnidirectional microphone group includes at least two omnidirectional microphones.
- the conference terminal determines the weight vector of a beamformer that makes the at least two omnidirectional microphones form a dipole beam pattern, so as to suppress the echo signal in the loudspeaker direction and enhance the sound signal in the target direction; collects sound signals through the omnidirectional microphones; and, for the at least two omnidirectional microphones, determines, according to the weight vector of the beamformer, the weighted sum of at least two sound signals as the echo-cancelled sound signal.
- two or more omnidirectional microphones can be used to replace a dipole directional microphone.
- combined with beamforming technology, the beam pattern forms a smaller gain in the loudspeaker direction, and that gain is not limited by the acoustic design of the microphone; therefore, the echo cancellation effect can be effectively improved.
- a conference terminal is provided, and correspondingly, the present application also provides an echo cancellation method.
- the method corresponds to the embodiment of the above-mentioned device. Since the method embodiments are basically similar to the device embodiments, the description is relatively simple, and for relevant parts, please refer to part of the description of the device embodiments. The method embodiments described below are illustrative only.
- the present application further provides an echo cancellation method, which is used in a conference terminal, and the conference terminal includes: a loudspeaker, and at least one omnidirectional microphone group, and the omnidirectional microphone group includes at least two omnidirectional microphones.
- the method may include the following steps:
- Step S301: Determine a weight vector of a beamformer that enables the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal in the loudspeaker direction and enhance the sound signal in the target direction.
- Step S303: Collect sound signals through the omnidirectional microphones.
- Step S305: For the omnidirectional microphone group, determine, according to the weight vector of the beamformer, the weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones as the echo-cancelled sound signal.
- step S301 may include the following sub-steps: determine the noise covariance matrix and steering vector of the conference terminal; then determine the weight vector from the noise covariance matrix and the steering vector through the minimum variance distortionless response (MVDR) beamforming algorithm.
- the noise covariance matrix can be determined in the following manner: when the conference terminal is started, preset sound data is played; the speech autocorrelation matrix is determined from the sound signal, collected by the omnidirectional microphones, that includes the preset sound, and is used as the noise covariance matrix.
- the conference terminal can obtain a better echo suppression effect in different conference environments.
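- the method may further include the following step: when the conference terminal is working, if the target direction is detected to be silent, update the autocorrelation matrix according to the sound signal, collected by the omnidirectional microphones, that includes the conference sound, as the updated noise covariance matrix.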
- this processing method can better adapt to the conference environment, improve the accuracy of the noise covariance matrix, thereby improving the accuracy of the weight vector, and further improving the echo suppression effect.
- the method may further include the following steps: if movement of the target sound source is detected, determine the signal-to-noise ratio of the omnidirectional microphones; according to the signal-to-noise ratio, select the echo-cancelled sound signal corresponding to the at least two omnidirectional microphones of the target omnidirectional microphone group.
- an echo cancellation method is provided, and correspondingly, the present application also provides an echo cancellation device.
- the device corresponds to the embodiment of the above-mentioned method. Since the device embodiment is basically similar to the method embodiment, the description is relatively simple, and for relevant parts, refer to the part of the description of the method embodiment.
- the device embodiments described below are illustrative only.
- the present application further provides an echo canceling device, the device is located in a conference terminal, and the conference terminal includes: a loudspeaker, at least one omnidirectional microphone group, and the omnidirectional microphone group includes at least two omnidirectional microphones;
- the device includes:
- a parameter determination unit configured to determine a weight vector of a beamformer that enables the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal in the loudspeaker direction and enhance the sound signal in the target direction;
- a sound signal acquisition unit configured to collect sound signals through the omnidirectional microphones;
- a beamforming unit configured to, for the omnidirectional microphone group, determine, according to the weight vector of the beamformer, the weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones as the sound signal after echo cancellation.
- the parameter determination unit may be specifically configured to determine the noise covariance matrix and steering vector of the conference terminal, and to determine the weight vector from the noise covariance matrix and the steering vector through the minimum variance distortionless response (MVDR) beamforming algorithm.
- the noise covariance matrix may be determined in the following manner: when the conference terminal is started, preset sound data is played; the speech autocorrelation matrix is determined from the sound signal, collected by the omnidirectional microphones, that includes the preset sound, and is used as the noise covariance matrix.
- the conference terminal can obtain a better echo suppression effect in different conference environments.
- the device may also include:
- a noise covariance matrix update unit configured to, when the conference terminal is working and the target direction is detected to be silent, update the autocorrelation matrix according to the sound signal, collected by the omnidirectional microphones, that includes the conference sound, as the updated noise covariance matrix.
- this processing method can better adapt to the conference environment and improve the accuracy of the noise covariance matrix, thereby improving the accuracy of the weight vector and, in turn, the echo suppression effect.
- the device may also include:
- a signal-to-noise ratio determining unit configured to determine the signal-to-noise ratio of the omnidirectional microphones if movement of the target sound source is detected;
- the echo suppression signal selection unit is configured to select the echo-cancelled sound signal corresponding to the at least two omnidirectional microphones of the target omnidirectional microphone group according to the signal-to-noise ratio.
- an echo cancellation method is provided, and correspondingly, the present application also provides an electronic device.
- the device corresponds to an embodiment of the method described above. Since the device embodiment is basically similar to the method embodiment, the description is relatively simple, and for related parts, please refer to part of the description of the method embodiment.
- the device embodiments described below are illustrative only.
- the present application further provides an electronic device, including: a loudspeaker; at least one omnidirectional microphone set, the omnidirectional microphone set including at least two omnidirectional microphones; a processor; and a memory.
- the memory is used to store a program for implementing the above echo cancellation method, and the terminal is powered on to run the program of the method through the processor.
- the electronic device may be an audio-video conference terminal, or a sound pickup device.
- a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
- Memory may include non-permanent storage in computer-readable media, in the form of random access memory (RAM) and/or nonvolatile memory such as read-only memory (ROM) or flash RAM. Memory is an example of computer readable media.
- Computer-readable media including permanent and non-permanent, removable and non-removable media, can be implemented by any method or technology for information storage.
- Information may be computer readable instructions, data structures, modules of a program, or other data.
- Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
- computer-readable media excludes transitory computer-readable media, such as modulated data signals and carrier waves.
- the embodiments of the present application may be provided as methods, systems or computer program products. Accordingly, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- General Health & Medical Sciences (AREA)
- Otolaryngology (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
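This application discloses a conference terminal, an echo cancellation method and apparatus, and a sound pickup device. The conference terminal comprises a loudspeaker and at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones. The conference terminal determines the weight vector of a beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction; collects sound signals through the omnidirectional microphones; and, for the at least two omnidirectional microphones, determines the weighted sum of the at least two sound signals according to the weight vector, as the echo-cancelled signal. With this approach, two or more omnidirectional microphones replace a single dipole directional microphone and, combined with beamforming, give the beam pattern a smaller gain in the loudspeaker direction, which can effectively improve echo cancellation.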
Description
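This application relates to the field of speech processing, and in particular to a conference terminal, an echo cancellation method and apparatus, and a sound pickup device.
Internet technology has changed the tools people use to communicate, and cloud-based audio and video conferencing systems are becoming widespread. A conference terminal may produce echo during use, so that speakers hear their own voice, which degrades the meeting. Echo cancellation in video conferencing has therefore long been an active research topic.
As shown in FIG. 1, a common structure for the sound pickup unit of an audio/video conferencing system places the loudspeaker at the center, with one dipole directional microphone in each of three surrounding connecting rods (a, b, c). Each dipole directional microphone relies on acoustic design to form a dipole beam pattern, which can suppress the loudspeaker echo and capture speech from the target direction.
However, in the course of making the present invention, the inventors found at least the following problem with the existing solution: a single dipole directional microphone that relies on acoustic design to form its dipole beam pattern does not have a small enough gain in the loudspeaker direction, and so cannot effectively suppress echo from that direction.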
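Summary of the Invention
This application provides a conference terminal to solve the prior-art problem of poor echo cancellation in conference terminals. This application additionally provides an echo cancellation method and apparatus, and a sound pickup device.
This application provides a conference terminal, comprising:
a loudspeaker;
at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones;
a processor; and
a memory for storing a program implementing an echo cancellation method, wherein after the device is powered on and the processor runs the program, the following steps are performed:
determining the weight vector of a beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction;
collecting sound signals through the omnidirectional microphones;
for the omnidirectional microphone group, determining, according to the weight vector, the weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones, as the echo-cancelled sound signal.
Optionally, the at least two omnidirectional microphones are two omnidirectional microphones.
Optionally, the at least one omnidirectional microphone group is three omnidirectional microphone groups centered on the loudspeaker, the three groups covering target sound sources in all directions.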
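This application further provides an echo cancellation method for a conference terminal, the conference terminal comprising: a loudspeaker and at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones;
the method comprises:
determining the weight vector of a beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction;
collecting sound signals through the omnidirectional microphones;
for the omnidirectional microphone group, determining, according to the weight vector of the beamformer, the weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones, as the echo-cancelled sound signal.
Optionally, determining the weight vector of the beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern comprises:
determining the noise covariance matrix and the steering vector of the conference terminal;
determining the weight vector from the noise covariance matrix and the steering vector by means of the minimum variance distortionless response (MVDR) beamforming algorithm.
Optionally, the noise covariance matrix is determined as follows:
playing preset sound data when the conference terminal starts up;
determining a speech autocorrelation matrix from the sound signals, containing the preset sound, collected by the omnidirectional microphones, and using it as the noise covariance matrix.
Optionally, the method further comprises:
while the conference terminal is operating, if the target direction is detected to be silent, updating the autocorrelation matrix from the sound signals, containing conference sound, collected by the omnidirectional microphones, and using it as the updated noise covariance matrix.
Optionally, the method further comprises:
if movement of the target sound source is detected, determining the signal-to-noise ratio of the omnidirectional microphones;
selecting, according to the signal-to-noise ratio, the echo-cancelled sound signal corresponding to the at least two omnidirectional microphones of a target omnidirectional microphone group.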
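This application further provides an echo cancellation apparatus located in a conference terminal, the conference terminal comprising: a loudspeaker and at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones;
the apparatus comprises:
a parameter determination unit, configured to determine the weight vector of a beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction;
a sound signal collection unit, configured to collect sound signals through the omnidirectional microphones;
a beamforming unit, configured to determine, for the omnidirectional microphone group and according to the weight vector of the beamformer, the weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones, as the echo-cancelled sound signal.
This application further provides a sound pickup device, comprising:
a loudspeaker;
at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones;
a processor; and
a memory for storing a program implementing the above echo cancellation method, wherein the terminal is powered on and runs the program through the processor.
This application further provides an electronic device, comprising:
a processor and a memory, the memory storing a program implementing the above method, wherein the device is powered on and runs the program through the processor.
This application further provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the above methods.
This application further provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the above methods.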
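Compared with the prior art, this application has the following advantages:
The conference terminal provided by the embodiments of this application comprises a loudspeaker and at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones. The conference terminal determines the weight vector of a beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction; collects multiple microphone signals through the omnidirectional microphones; and, for the at least two omnidirectional microphones, determines the weighted sum of the at least two microphone signals according to the weight vector, as the echo-cancelled sound signal. With this approach, two or more omnidirectional microphones replace a single dipole directional microphone and, combined with beamforming, give the beam pattern a smaller gain in the loudspeaker direction, a gain that is not limited by the acoustic design of the microphone. Echo cancellation can therefore be effectively improved.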
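The accompanying drawings, which form part of this disclosure, provide a further understanding of it. The illustrative embodiments of this disclosure and their descriptions serve to explain the disclosure and do not unduly limit it. In the drawings:
FIG. 1 is a schematic structural diagram of a conference terminal in the prior art;
FIG. 2 is a schematic structural diagram of an embodiment of the conference terminal provided by this application;
FIG. 3 is a schematic flowchart of an embodiment of the echo cancellation method provided by this application.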
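Many specific details are set out in the following description to give a thorough understanding of this application. This application can, however, be implemented in many ways other than those described here, and those skilled in the art can generalize it similarly without departing from its substance; this application is therefore not limited by the specific implementations disclosed below.
This application provides a conference terminal, an echo cancellation method and apparatus, and a sound pickup device. Each is described in detail in the embodiments below.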
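First Embodiment
Refer to FIG. 2, a schematic structural diagram of an embodiment of the conference terminal of this application. In this embodiment, the conference terminal may comprise: a loudspeaker and at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones.
The conference terminal can be used in an audio/video conferencing system. Such a system lets individuals or groups in two or more locations exchange sound, images, and documents over transmission lines and devices such as conference terminals, communicating instantly and interactively so that a meeting can be held simultaneously. The conference terminal may be a speakerphone, or a video conference terminal that includes a display and a camera.
The loudspeaker, also called a "horn", is a transducer that converts electrical signals into acoustic signals.
An omnidirectional microphone picks up sound equally from all directions; magnetic, ceramic, and electret microphones, for example, are omnidirectional microphones.
The conference terminal comprises a loudspeaker and multiple omnidirectional microphone groups, which may be mounted around the loudspeaker to cover all directions of the meeting site. In a specific implementation, multiple connecting rods may extend from the loudspeaker, with one omnidirectional microphone group mounted on each rod.
Experiments show that a conference terminal with the structure shown in FIG. 2, with the loudspeaker at the center and three omnidirectional microphone groups mounted around it, can cover target sound sources in all directions around the terminal.
Each omnidirectional microphone group comprises at least two omnidirectional microphones, replacing the single directional microphone of the prior art. The core of the conference terminal provided by this embodiment is how to use beamforming to make each omnidirectional microphone group form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction.
It should be emphasized that, compared with the dipole beam pattern formed by a single dipole directional microphone in the prior art, the dipole beam pattern formed by two or more omnidirectional microphones in this embodiment has a smaller gain in the loudspeaker direction and therefore suppresses echo better. The reason is that the gain a single dipole directional microphone achieves through "acoustic design" is not small enough, whereas two omnidirectional microphones driven by a beamforming algorithm (such as MVDR) can reach a smaller gain than is achievable acoustically.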
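Refer to FIG. 3, a schematic flowchart of echo cancellation in an embodiment of the conference terminal of this application. In this embodiment, echo cancellation comprises the following processing steps:
Step S301: determine the weight vector of a beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction.
The conference terminal provided by this embodiment uses beamforming to make each omnidirectional microphone group form a dipole beam pattern. In a specific implementation, various beamforming algorithms can be used for this purpose, such as the minimum variance distortionless response (MVDR) beamforming algorithm or a differential beamforming algorithm.
In one example, the MVDR algorithm is used to make each omnidirectional microphone group form a dipole beam pattern. In a specific implementation, step S301 may comprise the following sub-steps:
Step S3031: determine the noise covariance matrix and the steering vector of the conference terminal.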
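The principle of the MVDR algorithm is to minimize the noise power spectrum while keeping the target direction undistorted, as in Formula 1:

$$\min_{w}\; w^{H} R\, w \quad \text{s.t.}\quad w^{H} d(\theta) = 1 \qquad \text{(Formula 1)}$$

where $w$ denotes the weight vector, $R$ denotes the noise covariance matrix, and $w^{H} R\, w$ denotes the noise power spectrum. The objective function $\min_{w} w^{H} R\, w$ minimizes the noise power spectrum. The constraint $w^{H} d(\theta) = 1$ ensures that the target direction is not distorted. Solving this problem yields the weight vector $w$. The weighted sum of the multiple sound signals within a group, computed with $w$, is the echo-suppressed sound signal.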
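From Formula 1, the expression for the weight vector $w$ can be derived, as in Formula 2:

$$w = \frac{R^{-1} d(\theta)}{d(\theta)^{H} R^{-1} d(\theta)} \qquad \text{(Formula 2)}$$

where $d(\theta)$ denotes the steering vector and $R$ denotes the noise covariance matrix.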
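The step from the constrained minimization to a closed-form weight vector can be sketched with a Lagrange multiplier. This is the standard textbook derivation, not text from the application, and it assumes that $R$ is Hermitian and positive definite (so that $R^{-1}$ exists and $d(\theta)^{H} R^{-1} d(\theta)$ is real and positive). The multiplier $\lambda$ is a symbol introduced here for illustration.

$$
\begin{aligned}
\mathcal{L}(w,\lambda) &= w^{H} R\, w + \lambda\left(1 - w^{H} d(\theta)\right) + \lambda^{*}\left(1 - d(\theta)^{H} w\right) \\
\frac{\partial \mathcal{L}}{\partial w^{*}} = 0 \;&\Rightarrow\; R\, w = \lambda\, d(\theta) \;\Rightarrow\; w = \lambda\, R^{-1} d(\theta) \\
d(\theta)^{H} w = 1 \;&\Rightarrow\; \lambda = \frac{1}{d(\theta)^{H} R^{-1} d(\theta)} \\
&\Rightarrow\; w = \frac{R^{-1} d(\theta)}{d(\theta)^{H} R^{-1} d(\theta)}
\end{aligned}
$$

Because the denominator is real under the stated assumption, the constraint $w^{H} d(\theta) = 1$ and its conjugate $d(\theta)^{H} w = 1$ are satisfied together.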
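In this embodiment, the noise covariance matrix can be determined in two stages. The first is the initialization stage of the conference terminal, in which the initial value of the noise covariance matrix is determined. The second is during use of the conference terminal: when the participants in the terminal's environment are not speaking (the target sound source is silent), the noise covariance matrix can be updated to adapt better to the meeting environment. This improves the accuracy of the noise covariance matrix, hence the accuracy of the weight vector $w$, and in turn the echo suppression.
1) Determining the initial value of the noise covariance matrix during terminal initialization
The noise covariance matrix depends on the meeting environment in which the conference terminal is located. The same terminal is usually used in several meeting environments, so the noise covariance matrix can be determined when the terminal is initialized.
In one example, preset sound data is played when the conference terminal starts up; a speech autocorrelation matrix is then determined from the multiple sound signals, containing the preset sound, collected by the omnidirectional microphones, and used as the noise covariance matrix. For example, the loudspeaker of the conference terminal (such as a speakerphone) first plays 2 to 4 seconds of speech, and the autocorrelation matrix of this segment is computed and used as the noise covariance matrix. With this approach, the conference terminal can achieve good echo suppression in different meeting environments.
Determining the autocorrelation matrix of a speech signal is mature prior art and is not described further here.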
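The autocorrelation estimate mentioned above is not spelled out in the source. One common way to obtain it, shown here only as a minimal sketch, is to average the outer product x·xᴴ over many multichannel snapshots recorded while the preset sound plays. The function name `estimate_covariance` and the assumption that each snapshot holds one complex sample per microphone (for instance one frequency bin of a short-time Fourier transform) are illustrative choices, not part of the application.

```python
def estimate_covariance(snapshots):
    """Average x * x^H over snapshots.

    snapshots: list of snapshots; each snapshot is a list with one complex
    sample per microphone (hypothetical layout, e.g. one STFT bin per mic).
    Returns an M x M matrix as nested lists.
    """
    if not snapshots:
        raise ValueError("at least one snapshot is required")
    m = len(snapshots[0])
    acc = [[0j] * m for _ in range(m)]
    for x in snapshots:
        if len(x) != m:
            raise ValueError("every snapshot needs one sample per microphone")
        for i in range(m):
            for j in range(m):
                # Element (i, j) of the outer product x x^H.
                acc[i][j] += x[i] * x[j].conjugate()
    n = len(snapshots)
    return [[value / n for value in row] for row in acc]
```

The result is Hermitian by construction, which is the property the MVDR solution relies on. In practice several seconds of audio would supply many snapshots per frequency bin.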
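2) Updating the noise covariance matrix while the terminal is in use
In one example, while the conference terminal is operating, if the target direction is detected to be silent, the autocorrelation matrix is updated from the multiple sound signals collected by the omnidirectional microphones, which contain conference sound (the far-end speaker's voice played by the loudspeaker), and is used as the updated noise covariance matrix. In other words, while the terminal is operating, when the target sound source is silent and the loudspeaker is playing sound, the speech autocorrelation matrix of that segment is computed and used as the noise covariance matrix.
In a specific implementation, the previous noise covariance matrix can be updated by smoothing, as in Formula 3: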
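$$R_{t+1} = \alpha R_{t-1} + (1-\alpha) R_{t} \qquad \text{(Formula 3)}$$

where $\alpha \in [0,1]$ is a constant coefficient. The formula states that the noise covariance matrix at the current time $t+1$ depends on the noise covariance matrices at time $t$ and time $t-1$.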
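As a minimal sketch of this smoothing step, the update can be written as an element-wise blend of an older matrix and a newer one, with α weighting the older matrix. The function name `smooth_covariance` and the nested-list matrix layout are assumptions made for illustration.

```python
def smooth_covariance(r_older, r_newer, alpha):
    """Blend two covariance matrices element-wise.

    Returns alpha * r_older + (1 - alpha) * r_newer, where alpha in [0, 1]
    is the constant smoothing coefficient. Matrices are nested lists of
    equal shape (hypothetical layout).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * old + (1.0 - alpha) * new for old, new in zip(row_old, row_new)]
        for row_old, row_new in zip(r_older, r_newer)
    ]
```

A value of α close to 1 makes the estimate change slowly, while a value close to 0 lets the newer matrix dominate. The source does not state which value it uses.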
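Detecting whether the target direction is silent and updating the noise covariance matrix $R$ are both mature prior art, as is the steering vector $d(\theta)$, so they are not described further here.
Step S3033: determine the weight vector from the noise covariance matrix and the steering vector by means of the minimum variance distortionless response (MVDR) beamforming algorithm.
In this step, the weight vector can be determined from the noise covariance matrix and the steering vector according to Formula 2.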
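For a group of two microphones, the MVDR weight computation reduces to inverting a 2×2 matrix. The sketch below uses the standard closed-form MVDR solution w = R⁻¹d / (dᴴR⁻¹d). The far-field steering-vector model, the speed of sound of 343 m/s, and all function names are assumptions introduced for illustration; the source treats the steering vector as prior art and does not define it.

```python
import cmath
import math


def steering_vector(freq_hz, spacing_m, theta_rad, speed=343.0):
    """Assumed far-field model for two mics on a line; theta is measured
    from the array axis (0 = endfire, pi/2 = broadside)."""
    delay = spacing_m * math.cos(theta_rad) / speed
    return [1 + 0j, cmath.exp(-2j * math.pi * freq_hz * delay)]


def invert_2x2(r):
    """Invert a 2 x 2 complex matrix given as nested lists."""
    (a, b), (c, e) = r
    det = a * e - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[e / det, -b / det], [-c / det, a / det]]


def mvdr_weights(r, d):
    """Return w = R^-1 d / (d^H R^-1 d) for a two-microphone group."""
    r_inv = invert_2x2(r)
    r_inv_d = [
        r_inv[0][0] * d[0] + r_inv[0][1] * d[1],
        r_inv[1][0] * d[0] + r_inv[1][1] * d[1],
    ]
    denom = d[0].conjugate() * r_inv_d[0] + d[1].conjugate() * r_inv_d[1]
    return [value / denom for value in r_inv_d]
```

If the noise covariance matrix is dominated by a source at broadside (the loudspeaker direction in the source's terminology), the resulting weights keep unit gain toward the target while giving a much smaller gain toward broadside. A production implementation would compute one weight vector per frequency bin and would usually regularize R before inverting it.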
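Step S303: collect multiple sound signals through the omnidirectional microphones.
During the meeting, multiple sound signals can be collected through the multiple omnidirectional microphones of the conference terminal.
Step S305: for the at least two omnidirectional microphones, determine the weighted sum of the at least two sound signals according to the weight vector, as the echo-cancelled sound signal.
The weight vector may comprise one weight for each of the at least two omnidirectional microphones. For example, if an omnidirectional microphone group comprises two omnidirectional microphones, the weight vector comprises two weights.
For any omnidirectional microphone group, the weighted sum of the at least two sound signals is determined according to the weight vector of the beamformer, as the echo-cancelled sound signal. For example, if the conference terminal comprises three omnidirectional microphone groups, three echo-cancelled sound signals are obtained.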
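The weighted sum for one microphone group can be sketched as y[n] = Σᵢ conj(wᵢ)·xᵢ[n], which is the convention that matches the constraint wᴴd(θ) = 1. The function name `beamform` and the representation of each channel as a list of complex samples (for instance one frequency bin over time) are assumptions for illustration.

```python
def beamform(weights, channels):
    """Return y[n] = sum_i conj(w_i) * x_i[n] for one microphone group.

    weights: one complex weight per microphone.
    channels: one list of complex samples per microphone, all of equal
    length (hypothetical layout).
    """
    if len(weights) != len(channels):
        raise ValueError("one weight per channel is required")
    if len({len(channel) for channel in channels}) > 1:
        raise ValueError("all channels must have the same length")
    return [
        sum(w.conjugate() * x for w, x in zip(weights, frame))
        for frame in zip(*channels)
    ]
```

With three microphone groups, this function would be called three times, once per group with that group's weights, giving three output signals.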
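It should be noted that the number of omnidirectional microphone groups usually depends on the size of the meeting space. For most environments with limited space, three omnidirectional microphone groups can cover target sound sources in all directions around the conference terminal; for larger meeting environments, more groups can be provided, such as four or five, to cover all directions of the meeting site.
Experiments show that when each omnidirectional microphone group comprises two omnidirectional microphones, a dipole beam pattern can already be formed with beamforming. This both improves echo cancellation and lowers device cost. In a specific implementation, each group may also comprise more than two omnidirectional microphones, but this increases device cost.
In addition, when each omnidirectional microphone group comprises two omnidirectional microphones, the spacing between them affects echo suppression performance. Experiments show that, for suppressing echo from the loudspeaker direction, a spacing of 3 cm between the two omnidirectional microphones performs somewhat better than 7 cm.
Comparing FIG. 1 and FIG. 2 shows that the gain toward the loudspeaker (the broadside direction) formed by the prior art is larger than that formed by the solution of this application, so the solution of this application suppresses echo better. Likewise, the gain toward the target (the endfire direction) formed by the prior art is smaller than that formed by the solution of this application, so the solution of this application enhances conference speech better.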
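To illustrate what a dipole pattern with a broadside null and endfire maxima looks like, the sketch below evaluates the response of a simple two-microphone difference (weights 1 and -1), normalized to unit gain at endfire. This is a fixed differential beamformer used purely for illustration, not the MVDR weights of the source. The far-field plane-wave model, the 1 kHz frequency, the 3 cm spacing default, and the speed of sound of 343 m/s are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def dipole_response(theta_rad, freq_hz=1000.0, spacing_m=0.03):
    """Magnitude response of x1 - x2 to a plane wave arriving from theta,
    normalized so that the endfire direction (theta = 0) has gain 1.

    theta is measured from the line joining the two microphones.
    """
    phi = 2.0 * math.pi * freq_hz * spacing_m / SPEED_OF_SOUND

    def raw(angle):
        # The second microphone sees the wave delayed by phi * cos(angle).
        return abs(1.0 - cmath.exp(-1j * phi * math.cos(angle)))

    return raw(theta_rad) / raw(0.0)
```

Evaluating `dipole_response` at 0 and at π gives the two lobes of the figure-of-eight, and evaluating it at π/2 gives the null that would face the loudspeaker in the arrangement the source describes.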
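In one example, the speaker may move around the meeting site, such as a host at an evening event. In this case, the echo cancellation process may further comprise the following steps:
Step S401: if movement of the target sound source is detected, determine the signal-to-noise ratio of the omnidirectional microphones.
In a specific implementation, existing techniques can be used to detect whether the target sound source has moved and to determine the signal-to-noise ratio (SNR) of each omnidirectional microphone.
Step S403: according to the signal-to-noise ratio, select the echo-cancelled sound signal corresponding to the at least two omnidirectional microphones of the target omnidirectional microphone group.
For example, select the echo-cancelled sound signal obtained through steps S301 to S305 by the omnidirectional microphone group whose signal-to-noise ratio ranks first.
By performing steps S401 and S403, this embodiment can still achieve good echo suppression when the sound source moves.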
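The selection in steps S401 and S403 can be sketched as picking the group with the highest SNR and forwarding that group's echo-cancelled output. The source does not say how per-microphone SNRs are combined into a per-group figure, so this sketch simply assumes one SNR value per group is already available. The function names and the power-ratio definition of SNR in decibels are illustrative assumptions.

```python
import math


def snr_db(signal_power, noise_power):
    """SNR in decibels from two positive power estimates."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def select_group_output(group_snrs_db, group_outputs):
    """Return (index, output) of the group whose SNR ranks first.

    group_snrs_db: one SNR value per microphone group (assumed given).
    group_outputs: the echo-cancelled signal of each group, same order.
    """
    if not group_snrs_db or len(group_snrs_db) != len(group_outputs):
        raise ValueError("need one SNR and one output per group")
    best = max(range(len(group_snrs_db)), key=lambda i: group_snrs_db[i])
    return best, group_outputs[best]
```

A real system would likely add hysteresis or cross-fading when the winning group changes, to avoid audible switching; the source does not discuss this.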
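As the above embodiment shows, the conference terminal provided by the embodiments of this application comprises a loudspeaker and at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones. The conference terminal determines the weight vector of a beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction; collects sound signals through the omnidirectional microphones; and, for the at least two omnidirectional microphones, determines the weighted sum of the at least two sound signals according to the weight vector of the beamformer, as the echo-cancelled sound signal. With this approach, two or more omnidirectional microphones replace a single dipole directional microphone and, combined with beamforming, give the beam pattern a smaller gain in the loudspeaker direction, a gain that is not limited by the acoustic design of the microphone. Echo cancellation can therefore be effectively improved.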
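Second Embodiment
The embodiment above provides a conference terminal; correspondingly, this application further provides an echo cancellation method. The method corresponds to the device embodiment above. Because the method embodiment is largely similar to the device embodiment, it is described briefly; for related details, see the corresponding parts of the device embodiment. The method embodiment described below is merely illustrative.
This application additionally provides an echo cancellation method for a conference terminal, the conference terminal comprising: a loudspeaker and at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones. In this embodiment, the method may comprise the following steps:
Step S301: determine the weight vector of a beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction.
Step S303: collect sound signals through the omnidirectional microphones.
Step S305: for the omnidirectional microphone group, determine, according to the weight vector of the beamformer, the weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones, as the echo-cancelled sound signal.
In a specific implementation, step S301 may comprise the following sub-steps: determining the noise covariance matrix and the steering vector of the conference terminal; and determining the weight vector from the noise covariance matrix and the steering vector by means of the minimum variance distortionless response (MVDR) beamforming algorithm.
In a specific implementation, the noise covariance matrix may be determined as follows: playing preset sound data when the conference terminal starts up; and determining a speech autocorrelation matrix from the sound signals, containing the preset sound, collected by the omnidirectional microphones, and using it as the noise covariance matrix. With this approach, the conference terminal can achieve good echo suppression in different meeting environments.
In one example, the method may further comprise the following step: while the conference terminal is operating, if the target direction is detected to be silent, updating the autocorrelation matrix from the sound signals, containing conference sound, collected by the omnidirectional microphones, and using it as the updated noise covariance matrix. This adapts better to the meeting environment and improves the accuracy of the noise covariance matrix, hence the accuracy of the weight vector, and in turn the echo suppression.
In one example, the method may further comprise the following steps: if movement of the target sound source is detected, determining the signal-to-noise ratio of the omnidirectional microphones; and selecting, according to the signal-to-noise ratio, the echo-cancelled sound signal corresponding to the at least two omnidirectional microphones of the target omnidirectional microphone group. With this approach, good echo suppression can still be achieved when the sound source moves.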
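Third Embodiment
The embodiment above provides an echo cancellation method; correspondingly, this application further provides an echo cancellation apparatus. The apparatus corresponds to the method embodiment above. Because the apparatus embodiment is largely similar to the method embodiment, it is described briefly; for related details, see the corresponding parts of the method embodiment. The apparatus embodiment described below is merely illustrative.
This application additionally provides an echo cancellation apparatus located in a conference terminal, the conference terminal comprising: a loudspeaker and at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones;
the apparatus comprises:
a parameter determination unit, configured to determine the weight vector of a beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction;
a sound signal collection unit, configured to collect sound signals through the omnidirectional microphones;
a beamforming unit, configured to determine, for the omnidirectional microphone group and according to the weight vector of the beamformer, the weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones, as the echo-cancelled sound signal.
In one example, the parameter determination unit may be specifically configured to determine the noise covariance matrix and the steering vector of the conference terminal, and to determine the weight vector from the noise covariance matrix and the steering vector by means of the minimum variance distortionless response (MVDR) beamforming algorithm.
In one example, the noise covariance matrix may be determined as follows: playing preset sound data when the conference terminal starts up; and determining a speech autocorrelation matrix from the sound signals, containing the preset sound, collected by the omnidirectional microphones, and using it as the noise covariance matrix. With this approach, the conference terminal can achieve good echo suppression in different meeting environments.
In one example, the apparatus may further comprise:
a noise covariance matrix update unit, configured to, while the conference terminal is operating and if the target direction is detected to be silent, update the autocorrelation matrix from the sound signals, containing conference sound, collected by the omnidirectional microphones, as the updated noise covariance matrix. This adapts better to the meeting environment and improves the accuracy of the noise covariance matrix, hence the accuracy of the weight vector, and in turn the echo suppression.
In one example, the apparatus may further comprise:
a signal-to-noise ratio determination unit, configured to, while the conference terminal is operating and if movement of the target sound source is detected, determine the signal-to-noise ratio of the omnidirectional microphones;
an echo-suppressed signal selection unit, configured to select, according to the signal-to-noise ratio, the echo-cancelled sound signal corresponding to the at least two omnidirectional microphones of the target omnidirectional microphone group. With this approach, good echo suppression can still be achieved when the sound source moves.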
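Fourth Embodiment
The embodiment above provides an echo cancellation method; correspondingly, this application further provides an electronic device. The device corresponds to the method embodiment above. Because the device embodiment is largely similar to the method embodiment, it is described briefly; for related details, see the corresponding parts of the method embodiment. The device embodiment described below is merely illustrative.
This application additionally provides an electronic device, comprising: a loudspeaker; at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones; a processor; and a memory. The memory stores a program implementing the above echo cancellation method, and the terminal is powered on and runs the program through the processor.
The electronic device may be an audio/video conference terminal or a sound pickup device.
Although this application is disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may make possible changes and modifications without departing from the spirit and scope of this application, so the scope of protection of this application shall be that defined by its claims.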
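In a typical configuration, a computing device comprises one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent storage in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
1. Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
2. Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.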
Claims (12)
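- A conference terminal, characterized by comprising: a loudspeaker; at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones; a processor; and a memory for storing a program implementing an echo cancellation method, wherein after the device is powered on and the processor runs the program, the following steps are performed: determining the weight vector of a beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction; collecting sound signals through the omnidirectional microphones; and, for the omnidirectional microphone group, determining, according to the weight vector of the beamformer, the weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones, as the echo-cancelled sound signal.
- The conference terminal according to claim 1, characterized in that the at least two omnidirectional microphones are two omnidirectional microphones.
- The conference terminal according to claim 1, characterized in that the at least one omnidirectional microphone group is three omnidirectional microphone groups centered on the loudspeaker, the three omnidirectional microphone groups covering target sound sources in all directions.
- An echo cancellation method for a conference terminal, characterized in that the conference terminal comprises: a loudspeaker and at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones; the method comprises: determining the weight vector of a beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction; collecting sound signals through the omnidirectional microphones; and, for the omnidirectional microphone group, determining, according to the weight vector of the beamformer, the weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones, as the echo-cancelled sound signal.
- The method according to claim 4, characterized in that determining the weight vector of the beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern comprises: determining the noise covariance matrix and the steering vector of the conference terminal; and determining the weight vector from the noise covariance matrix and the steering vector by means of the minimum variance distortionless response (MVDR) beamforming algorithm.
- The method according to claim 5, characterized in that the noise covariance matrix is determined as follows: playing preset sound data when the conference terminal starts up; and determining a speech autocorrelation matrix from the sound signals, containing the preset sound, collected by the omnidirectional microphones, and using it as the noise covariance matrix.
- The method according to claim 6, characterized by further comprising: while the conference terminal is operating, if the target direction is detected to be silent, updating the autocorrelation matrix from the sound signals, containing conference sound, collected by the omnidirectional microphones, as the updated noise covariance matrix.
- The method according to claim 4, characterized by further comprising: if movement of the target sound source is detected, determining the signal-to-noise ratio of the omnidirectional microphones; and selecting, according to the signal-to-noise ratio, the echo-cancelled sound signal corresponding to the at least two omnidirectional microphones of the target omnidirectional microphone group.
- An echo cancellation apparatus, characterized in that the apparatus is located in a conference terminal, the conference terminal comprising: a loudspeaker and at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones; the apparatus comprises: a parameter determination unit, configured to determine the weight vector of a beamformer that causes the at least two omnidirectional microphones to form a dipole beam pattern, so as to suppress the echo signal from the loudspeaker direction and enhance the sound signal from the target direction; a sound signal collection unit, configured to collect sound signals through the omnidirectional microphones; and a beamforming unit, configured to determine, for the omnidirectional microphone group and according to the weight vector of the beamformer, the weighted sum of the at least two sound signals corresponding to the at least two omnidirectional microphones, as the echo-cancelled sound signal.
- A sound pickup device, characterized by comprising: a loudspeaker; at least one omnidirectional microphone group, the omnidirectional microphone group comprising at least two omnidirectional microphones; a processor; and a memory for storing a program implementing the echo cancellation method according to any one of claims 4-8, wherein the terminal is powered on and runs the program through the processor.
- A computer program, characterized by comprising computer-readable code which, when run on a computing and processing device, causes the computing and processing device to perform the echo cancellation method according to any one of claims 4-8.
- A computer-readable medium storing the computer program according to claim 11.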
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/125763 WO2023065317A1 (zh) | 2021-10-22 | 2021-10-22 | 会议终端及回声消除方法 |
CN202180101805.3A CN117981352A (zh) | 2021-10-22 | 2021-10-22 | 会议终端及回声消除方法 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/125763 WO2023065317A1 (zh) | 2021-10-22 | 2021-10-22 | 会议终端及回声消除方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023065317A1 (zh) | 2023-04-27 |
Family
ID=86058745
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/125763 WO2023065317A1 (zh) | 2021-10-22 | 2021-10-22 | 会议终端及回声消除方法 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117981352A (zh) |
WO (1) | WO2023065317A1 (zh) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007251406A (ja) * | 2006-03-14 | 2007-09-27 | Yamaha Corp | 音声信号送受信装置及び音声会議装置 |
WO2009034524A1 (en) * | 2007-09-13 | 2009-03-19 | Koninklijke Philips Electronics N.V. | Apparatus and method for audio beam forming |
CN101763858A (zh) * | 2009-10-19 | 2010-06-30 | 瑞声声学科技(深圳)有限公司 | 双麦克风信号处理方法 |
CN104464739A (zh) * | 2013-09-18 | 2015-03-25 | 华为技术有限公司 | 音频信号处理方法及装置、差分波束形成方法及装置 |
CN110085247A (zh) * | 2019-05-06 | 2019-08-02 | 上海互问信息科技有限公司 | 一种针对复杂噪声环境的双麦克风降噪方法 |
CN111312269A (zh) * | 2019-12-13 | 2020-06-19 | 辽宁工业大学 | 一种智能音箱中的快速回声消除方法 |
CN111866439A (zh) * | 2020-07-21 | 2020-10-30 | 厦门亿联网络技术股份有限公司 | 一种优化音视频体验的会议装置、系统及其运行方法 |
CN111918169A (zh) * | 2020-06-28 | 2020-11-10 | 佳禾智能科技股份有限公司 | 基于多波束成形麦克风阵列的会议音箱及其声波拾取方法 |
-
2021
- 2021-10-22 WO PCT/CN2021/125763 patent/WO2023065317A1/zh active Application Filing
- 2021-10-22 CN CN202180101805.3A patent/CN117981352A/zh active Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007251406A (ja) * | 2006-03-14 | 2007-09-27 | Yamaha Corp | 音声信号送受信装置及び音声会議装置 |
WO2009034524A1 (en) * | 2007-09-13 | 2009-03-19 | Koninklijke Philips Electronics N.V. | Apparatus and method for audio beam forming |
CN101763858A (zh) * | 2009-10-19 | 2010-06-30 | 瑞声声学科技(深圳)有限公司 | 双麦克风信号处理方法 |
CN104464739A (zh) * | 2013-09-18 | 2015-03-25 | 华为技术有限公司 | 音频信号处理方法及装置、差分波束形成方法及装置 |
CN110085247A (zh) * | 2019-05-06 | 2019-08-02 | 上海互问信息科技有限公司 | 一种针对复杂噪声环境的双麦克风降噪方法 |
CN111312269A (zh) * | 2019-12-13 | 2020-06-19 | 辽宁工业大学 | 一种智能音箱中的快速回声消除方法 |
CN111918169A (zh) * | 2020-06-28 | 2020-11-10 | 佳禾智能科技股份有限公司 | 基于多波束成形麦克风阵列的会议音箱及其声波拾取方法 |
CN111866439A (zh) * | 2020-07-21 | 2020-10-30 | 厦门亿联网络技术股份有限公司 | 一种优化音视频体验的会议装置、系统及其运行方法 |
Also Published As
Publication number | Publication date |
---|---|
CN117981352A (zh) | 2024-05-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9913022B2 (en) | System and method of improving voice quality in a wireless headset with untethered earbuds of a mobile device | |
US9497544B2 (en) | Systems and methods for surround sound echo reduction | |
JP6703525B2 (ja) | 音源を強調するための方法及び機器 | |
US8233352B2 (en) | Audio source localization system and method | |
JP6121481B2 (ja) | マルチマイクロフォンを用いた3次元サウンド獲得及び再生 | |
US9438985B2 (en) | System and method of detecting a user's voice activity using an accelerometer | |
US9197974B1 (en) | Directional audio capture adaptation based on alternative sensory input | |
KR101456866B1 (ko) | 혼합 사운드로부터 목표 음원 신호를 추출하는 방법 및장치 | |
US20110096915A1 (en) | Audio spatialization for conference calls with multiple and moving talkers | |
WO2015035785A1 (zh) | 语音信号处理方法与装置 | |
JP2013543987A (ja) | 遠距離場マルチ音源追跡および分離のためのシステム、方法、装置およびコンピュータ可読媒体 | |
US11496830B2 (en) | Methods and systems for recording mixed audio signal and reproducing directional audio | |
CN110379439A (zh) | 一种音频处理的方法以及相关装置 | |
US11122381B2 (en) | Spatial audio signal processing | |
CN111078185A (zh) | 录制声音的方法及设备 | |
CN109859769A (zh) | 一种掩码估计方法及装置 | |
WO2023065317A1 (zh) | 会议终端及回声消除方法 | |
Ba et al. | Enhanced MVDR beamforming for arrays of directional microphones | |
WO2023056905A1 (zh) | 声源定位方法、装置及设备 | |
TW202312140A (zh) | 會議終端及回授抑制方法 | |
CN111243615A (zh) | 麦克风阵列信号处理方法及手持式装置 | |
Ogawa et al. | Speech enhancement using a square microphone array in the presence of directional and diffuse noise | |
Suzuki et al. | Spot-forming method by using two shotgun microphones | |
WO2022041030A1 (en) | Low complexity howling suppression for portable karaoke | |
CN115508777A (zh) | 说话人定位方法、装置及设备 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21961065; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 202180101805.3; Country of ref document: CN |
| NENP | Non-entry into the national phase | Ref country code: DE |