WO2022140928A1 - Audio signal processing method and system for suppressing echo - Google Patents

Audio signal processing method and system for suppressing echo

Info

Publication number
WO2022140928A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
speaker
audio
threshold
control signal
Prior art date
Application number
PCT/CN2020/140215
Other languages
English (en)
French (fr)
Inventor
郑金波
周美林
廖风云
齐心
Original Assignee
深圳市韶音科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市韶音科技有限公司
Priority to PCT/CN2020/140215: WO2022140928A1 (zh)
Priority to CN202080104434.XA: CN116158090A (zh)
Priority to KR1020237018110A: KR20230098282A (ko)
Priority to JP2023533789A: JP2023551556A (ja)
Priority to EP20967280.7A: EP4270987A1 (en)
Priority to US17/397,797: US20220208207A1 (en)
Publication of WO2022140928A1: WO2022140928A1 (zh)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Definitions

  • This specification relates to the field of audio signal processing, and in particular, to an audio signal processing method and system for suppressing echoes.
  • Vibration sensors are used in electronic products such as headphones, and they are increasingly used as bone conduction microphones to receive voice signals.
  • the system converts the vibration signal collected by the bone conduction microphone into an electrical signal or other type of signal, and transmits it to the electronic device to realize the sound pickup function.
  • More and more electronic devices combine air conduction microphones and bone conduction microphones, which have different characteristics: the air conduction microphone picks up the external audio signal, the bone conduction microphone picks up the vibration signal of the sound-producing part, and the picked-up signals undergo speech enhancement processing and fusion.
  • When a bone conduction microphone is placed in an earphone or other electronic device with a speaker, it receives not only the vibration signal produced when the person speaks but also the vibration signal generated when the speaker of the earphone or other electronic device plays sound, resulting in an echo signal. The echo then has to be handled by an echo cancellation algorithm.
  • The echo signals of the speaker also affect the voice quality of the microphone to different degrees. For example, when the input signal of the speaker is strong, the speaker vibration signal received by the bone conduction microphone is large, much larger than the vibration signal that the bone conduction microphone receives when the person speaks.
  • In that case, it is difficult for a traditional echo cancellation algorithm to eliminate the echo in the bone conduction microphone signal.
  • The voice quality obtained by using the microphone signals output by both the air conduction microphone and the bone conduction microphone as the sound source signal is then poor. It is therefore unreasonable to ignore the echo signal of the speaker when selecting the sound source signal of the microphone.
  • This specification provides a new audio signal processing method and system for suppressing echo, so as to improve the effect of echo cancellation and improve the voice quality.
  • The present specification provides an audio signal processing method for suppressing echo, comprising: selecting a target audio processing mode of an electronic device from a plurality of audio processing modes based on at least a speaker signal, the speaker signal being an audio signal sent by a control device to the electronic device; generating target audio by processing a microphone signal in the target audio processing mode so as to at least reduce the echo in the target audio, the microphone signal being the output signal of a microphone module obtained by the electronic device, and the microphone module including at least one first type microphone and at least one second type microphone; and outputting the target audio.
  • the at least one microphone of the first type outputs a first audio signal; and the at least one microphone of the second type outputs a second audio signal, wherein the microphone signal includes the first audio signal and the second audio signal.
  • the at least one microphone of the first type is used to collect human body vibration signals; and the at least one microphone of the second type is used to collect air vibration signals.
  • the plurality of audio processing modes include at least: a first mode for performing signal processing on the first audio signal and the second audio signal; and a second mode for performing signal processing on the second audio signal.
  • the selecting a target audio processing mode of the electronic device from a plurality of audio processing modes based at least on the speaker signal includes: generating a control signal corresponding to the speaker signal based on at least the strength of the speaker signal, the control signal including a first control signal or a second control signal; and, based on the control signal, selecting the target audio processing mode corresponding to the control signal, wherein the first mode corresponds to the first control signal and the second mode corresponds to the second control signal.
  • the generating a control signal corresponding to the speaker signal based on at least the strength of the speaker signal includes: determining that the strength of the speaker signal is lower than a preset speaker threshold, and generating the first control signal; or determining that the strength of the speaker signal is higher than the speaker threshold, and generating the second control signal.
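The single-threshold rule above can be sketched as follows. This is a minimal illustration, not the specification's implementation: the specification does not say how signal strength is measured, so the RMS level in dB relative to full scale, the -30 dB default threshold, and the names `rms_dbfs` and `control_from_speaker` are all assumptions. A level exactly equal to the threshold is treated here as "not exceeding" it.

```python
import math

FIRST_CONTROL = 1   # corresponds to the first mode (both microphone signals)
SECOND_CONTROL = 2  # corresponds to the second mode (second audio signal only)


def rms_dbfs(samples):
    """Return the RMS level of a frame in dB relative to full scale (1.0).

    Assumed strength measure; the specification only says "strength".
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def control_from_speaker(speaker_frame, speaker_threshold_db=-30.0):
    """Map the speaker signal strength to a control signal.

    Above the preset threshold -> second control signal,
    otherwise -> first control signal.
    """
    if rms_dbfs(speaker_frame) > speaker_threshold_db:
        return SECOND_CONTROL
    return FIRST_CONTROL
```

In practice the level would be computed per frame, and the threshold would be tuned to the device.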
  • the generating a control signal corresponding to the speaker signal based on at least the strength of the speaker signal includes: generating a corresponding control signal based on the strength of the speaker signal and the microphone signal.
  • the generating a corresponding control signal based on the strength of the speaker signal and the microphone signal includes: acquiring an evaluation parameter of the microphone signal, where the evaluation parameter includes an environmental noise evaluation parameter, and the environmental noise evaluation parameter includes at least one of an environmental noise level and a signal-to-noise ratio; and generating the control signal based on the strength of the speaker signal and the evaluation parameter.
  • the generating the control signal based on the strength of the speaker signal and the evaluation parameter includes one of the following cases: determining that the strength of the speaker signal is higher than a preset speaker threshold, and generating the second control signal; determining that the strength of the speaker signal is lower than the speaker threshold and the environmental noise evaluation parameter is outside a preset noise evaluation range, and generating the first control signal; and determining that the strength of the speaker signal is lower than the speaker threshold and the environmental noise evaluation parameter is within the noise evaluation range, and generating the first control signal or the second control signal.
  • the environmental noise evaluation parameter being within the noise evaluation range includes at least one of the following situations: the environmental noise level is lower than a preset environmental noise threshold; and the signal-to-noise ratio is higher than a preset SNR threshold.
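The three cases above, together with the definition of the noise evaluation range, can be written as a small decision function. This is a sketch under stated assumptions: all threshold values, the dB units, and the names `noise_in_range` and `control_with_noise` are hypothetical. Because the text allows either control signal in the quiet, low-speaker case, the choice is left to a caller-supplied `quiet_choice` parameter rather than fixed here.

```python
FIRST_CONTROL = 1
SECOND_CONTROL = 2


def noise_in_range(noise_level_db=None, snr_db=None,
                   noise_threshold_db=-50.0, snr_threshold_db=15.0):
    """Return True if at least one supplied parameter satisfies its threshold.

    "Within the noise evaluation range" means low ambient noise
    and/or high signal-to-noise ratio. Thresholds are assumed values.
    """
    low_noise = noise_level_db is not None and noise_level_db < noise_threshold_db
    high_snr = snr_db is not None and snr_db > snr_threshold_db
    return low_noise or high_snr


def control_with_noise(speaker_db, in_range, speaker_threshold_db=-30.0,
                       quiet_choice=FIRST_CONTROL):
    """Combine speaker strength with the noise evaluation result."""
    if speaker_db > speaker_threshold_db:
        return SECOND_CONTROL   # strong speaker signal
    if not in_range:
        return FIRST_CONTROL    # weak speaker signal, noisy environment
    return quiet_choice         # weak speaker signal, quiet environment
```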
  • the evaluation parameter further includes a vocal signal strength, and the generating the control signal based on the strength of the speaker signal and the evaluation parameter includes one of the following cases: determining that the strength of the speaker signal is higher than the preset speaker threshold, the vocal signal strength exceeds a preset vocal threshold, and the environmental noise evaluation parameter is outside the preset noise evaluation range, and generating the first control signal; determining that the strength of the speaker signal is higher than the speaker threshold, the vocal signal strength exceeds the vocal threshold, and the environmental noise evaluation parameter is within the noise evaluation range, and generating the second control signal; determining that the strength of the speaker signal is higher than the speaker threshold and the vocal signal strength is lower than the vocal threshold, and generating the second control signal; determining that the strength of the speaker signal is lower than the speaker threshold and the environmental noise evaluation parameter is outside the noise evaluation range, and generating the first control signal; and determining that the strength of the speaker signal is lower than the speaker threshold and the environmental noise evaluation parameter is within the noise evaluation range,
  • the environmental noise evaluation parameter being within the noise evaluation range includes at least one of the following situations: the environmental noise level is lower than a preset environmental noise threshold; and the signal-to-noise ratio is higher than a preset SNR threshold.
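The cases that add vocal signal strength can be sketched as a decision function as well. The thresholds, dB units, and the name `control_with_voice` are assumptions. The text of the last case (weak speaker signal, noise within range) is cut off in this copy, so the sketch does not guess its outcome and instead returns a caller-supplied `undecided` value. A vocal strength exactly equal to the threshold is not covered by the text and is treated here like the "lower than" case.

```python
FIRST_CONTROL = 1
SECOND_CONTROL = 2


def control_with_voice(speaker_db, voice_db, in_range,
                       speaker_threshold_db=-30.0, voice_threshold_db=-35.0,
                       undecided=None):
    """Decide the control signal from speaker level, voice level and noise."""
    if speaker_db > speaker_threshold_db:
        if voice_db > voice_threshold_db and not in_range:
            # Strong speaker signal, strong voice, noisy environment.
            return FIRST_CONTROL
        # Strong speaker signal with either a quiet environment
        # or a voice level that does not exceed the vocal threshold.
        return SECOND_CONTROL
    if not in_range:
        # Weak speaker signal, noisy environment.
        return FIRST_CONTROL
    # Weak speaker signal, quiet environment: outcome not stated in this copy.
    return undecided
```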
  • the generating the target audio includes: performing signal processing on the first audio signal and the second audio signal by a first algorithm in the first mode to generate a first target audio; or performing signal processing on the second audio signal by a second algorithm in the second mode to generate a second target audio, wherein the target audio includes one of the first target audio and the second target audio.
  • the outputting the target audio includes: smoothing the target audio, wherein, when the target audio switches between the first target audio and the second target audio, the smoothing is performed at the junction between the first target audio and the second target audio; and outputting the smoothed target audio.
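One common way to smooth the junction between two audio streams is a short linear crossfade. The specification does not name a smoothing method, so the crossfade below is a swapped-in illustration, and the function name `crossfade` and its arguments are hypothetical. It blends the last samples of the outgoing target audio with the first samples of the incoming one.

```python
def crossfade(outgoing_tail, incoming_head):
    """Linearly blend two equal-length segments around a mode switch.

    The weight of the incoming audio rises from just above 0 to just
    below 1, so the blended segment joins both neighbours without a jump.
    """
    if len(outgoing_tail) != len(incoming_head):
        raise ValueError("segments must have the same length")
    n = len(outgoing_tail)
    blended = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the incoming audio
        blended.append((1.0 - w) * outgoing_tail[i] + w * incoming_head[i])
    return blended
```

A longer overlap gives a gentler transition at the cost of a little extra latency around each switch.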
  • the method further comprises controlling the strength of a speaker input signal of the speaker based on the control signal.
  • the controlling the strength of the speaker input signal of the speaker based on the control signal includes: determining that the control signal is the first control signal, and reducing the strength of the speaker input signal input to the speaker, thereby reducing the strength of the sound output by the speaker.
  • the present specification also provides an audio signal processing system for suppressing echo, comprising at least one storage medium and at least one processor, wherein the at least one storage medium stores at least one instruction set for audio signal processing for suppressing echo; the at least one processor is communicatively connected to the at least one storage medium; and, when the system is running, the at least one processor reads the at least one instruction set and, according to the instructions of the at least one instruction set, executes the audio signal processing method for suppressing echo described in the first aspect of this specification.
  • The audio signal processing method and system for suppressing echo can generate a control signal corresponding to the speaker signal according to the strength of the speaker signal, and control or switch the audio processing mode according to the control signal, so that signal processing is performed on the audio source signal corresponding to the audio processing mode to obtain better voice quality.
  • When the speaker signal does not exceed the threshold, the system generates the first control signal, selects the first mode, uses the first audio signal and the second audio signal as the first audio source signal, and performs signal processing on the first audio source signal to obtain the first target audio.
  • When the speaker signal exceeds the threshold, the speaker echo in the first audio signal is larger.
  • In that case, the system generates the second control signal, selects the second mode, uses the second audio signal as the second audio source signal, and performs signal processing on the second audio source signal to obtain the second target audio.
  • The method and system can switch between audio processing modes according to the speaker signal, and thereby switch the audio source signal taken from the microphone signal, so as to ensure better voice quality in different scenarios.
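The mapping from control signal to mode to audio source signal can be sketched as a small dispatch. The enum and function names (`Mode`, `mode_for_control`, `select_sources`) and the integer encoding of the control signals are assumptions made for illustration.

```python
from enum import Enum

FIRST_CONTROL = 1
SECOND_CONTROL = 2


class Mode(Enum):
    FIRST = 1   # processes the first and second audio signals together
    SECOND = 2  # processes the second audio signal only


def mode_for_control(control_signal):
    """Return the audio processing mode that corresponds to a control signal."""
    if control_signal == FIRST_CONTROL:
        return Mode.FIRST
    if control_signal == SECOND_CONTROL:
        return Mode.SECOND
    raise ValueError("unknown control signal: %r" % (control_signal,))


def select_sources(mode, first_audio, second_audio):
    """Pick the audio source signal(s) that the selected mode operates on."""
    if mode is Mode.FIRST:
        return (first_audio, second_audio)
    if mode is Mode.SECOND:
        return (second_audio,)
    raise ValueError("unknown mode: %r" % (mode,))
```

Keeping source selection separate from the processing algorithm makes it straightforward to add further modes later, such as one that uses only the first audio signal.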
  • FIG. 1 shows a schematic diagram of application scenarios of some audio signal processing systems for suppressing echoes provided according to embodiments of the present specification
  • FIG. 2 shows a schematic diagram of some electronic devices provided according to an embodiment of the present specification
  • FIG. 3 shows a working schematic diagram of some first modes provided according to the embodiments of this specification.
  • FIG. 4 shows a working schematic diagram of some second modes provided according to the embodiments of this specification.
  • FIG. 5 shows a flowchart of some audio signal processing methods for suppressing echoes provided according to an embodiment of the present specification
  • FIG. 6 shows a flowchart of some audio signal processing methods for suppressing echoes provided according to an embodiment of the present specification.
  • FIG. 7 shows a flowchart of some audio signal processing methods for suppressing echoes provided according to an embodiment of the present specification.
  • FIG. 1 shows a schematic diagram of some application scenarios of an audio signal processing system 100 for echo suppression (hereinafter referred to as the system 100 ) provided according to an embodiment of the present specification.
  • System 100 may include electronic device 200 and control device 400 .
  • the electronic device 200 may store data or instructions for performing the method of audio signal processing for suppressing echoes described in this specification, and may execute the data and/or instructions.
  • the electronic device 200 may be a wireless headset, a wired headset, or a smart wearable device, such as smart glasses, a smart helmet, or a smart watch, and other devices with a voice collection function and a voice playback function.
  • the electronic device 200 may also be a mobile device, a tablet computer, a laptop computer, an in-vehicle device, or the like, or any combination thereof.
  • the mobile device may comprise a smart home device, a smart mobile device, or the like, or any combination thereof.
  • the intelligent mobile device may include a mobile phone, a personal digital assistant, a game device, a navigation device, an Ultra-mobile Personal Computer (UMPC), etc., or any combination thereof.
  • the smart home devices may include smart TVs, desktop computers, etc., or any combination.
  • built-in devices in an automobile may include an onboard computer, an onboard television, and the like.
  • Control device 400 may be a remote device in wired and/or wireless audio signal communication with electronic device 200 .
  • the control device 400 may also be a device communicatively connected to the electronic device 200 locally.
  • the electronic device 200 can collect local audio signals and output them to the control device 400 .
  • the electronic device 200 can also receive and output the far-end audio signal sent by the control device 400 .
  • the far-end audio signal may also be referred to as a loudspeaker signal.
  • the control device 400 may also be a device with a voice collection function and a voice playback function.
  • the control device 400 may be a terminal device communicatively connected to the earphone, such as a mobile phone, a computer, and the like.
  • the electronic device 200 may include a microphone module 240 and a speaker 280 .
  • the microphone module 240 may be configured to acquire local audio signals and output microphone signals, that is, electronic signals carrying audio information.
  • the microphone module 240 may be an out-of-ear microphone module or an in-ear microphone module.
  • the microphone module 240 may be a microphone disposed outside the ear canal, or may be a microphone disposed in the ear canal.
  • the microphone module 240 may include at least one first type microphone 242 and at least one second type microphone 244 .
  • the first type of microphone 242 is different from the second type of microphone 244 .
  • the first type of microphone 242 may be a microphone that directly collects human body vibration signals, such as a bone conduction microphone.
  • the second type of microphone 244 may be a microphone that directly collects air vibration signals, such as an air conduction microphone.
  • the first type of microphone 242 and the second type of microphone 244 may also be other types of microphones.
  • the first type of microphone 242 may be an optical microphone; the second type of microphone 244 may be a microphone for receiving electromyographic signals, and so on. Since the first type of microphone 242 differs from the second type of microphone 244, the two sense audio signals differently, which results in different noise and echo components in the corresponding audio signals.
  • the present disclosure will use a bone conduction microphone as an example of the first type of microphone 242 and an air conduction microphone as an example of the second type of microphone 244 in the following statements.
  • the bone conduction microphone may include vibration sensors, such as optical vibration sensors, acceleration sensors, and the like.
  • the vibration sensor can collect mechanical vibration signals (e.g., signals generated by the vibration of the skin or bones when the user 002 speaks), and convert the mechanical vibration signals into electrical signals.
  • the mechanical vibration signal mentioned here mainly refers to vibration transmitted through a solid body.
  • the bone conduction microphone contacts the skin or bones of the user 002 through the vibration sensor or a vibration component connected to the vibration sensor, so as to collect the vibration signals generated by the bones or skin of the user 002 when making sounds, and convert the vibration signals into electrical signals.
  • the vibration sensor may be a device that is sensitive to mechanical vibration but not to air vibration (i.e., the vibration sensor responds more strongly to mechanical vibration than to air vibration). Since the bone conduction microphone can directly pick up the vibration signal of the vocal part, the bone conduction microphone can reduce the influence of environmental noise.
  • the air conduction microphone collects the air vibration signal caused by the user 002 when making a sound, and converts the air vibration signal into an electrical signal.
  • the air conduction microphone may be a single air conduction microphone, or a microphone array composed of two or more air conduction microphones.
  • the microphone array may be a beamforming microphone array or other similar microphone array. Sounds from different directions or different locations in space can be collected through the microphone array.
  • the first type of microphone 242 may output a first audio signal 243 .
  • the second type of microphone 244 may output a second audio signal 245 .
  • the microphone signal includes the first audio signal 243 and the second audio signal 245.
  • the second audio signal 245 has better speech quality than the first audio signal 243 .
  • the voice quality of the first audio signal 243 in the low frequency part is higher, and the voice quality of the second audio signal 245 in the high frequency part is higher. Therefore, the audio signal obtained by feature fusion of the first audio signal 243 and the second audio signal 245 has good speech quality in a scenario with relatively large ambient noise.
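The idea of taking the low-frequency part from the first audio signal and the high-frequency part from the second can be illustrated with a complementary filter pair. This is only a sketch: the specification does not describe how feature fusion is carried out, so the one-pole low-pass filter, the smoothing factor `alpha`, and the names `one_pole_lowpass` and `fuse_low_high` are assumptions.

```python
def one_pole_lowpass(samples, alpha):
    """Simple first-order low-pass filter; alpha in (0, 1] sets the cutoff."""
    filtered = []
    state = 0.0
    for s in samples:
        state += alpha * (s - state)
        filtered.append(state)
    return filtered


def fuse_low_high(first_audio, second_audio, alpha=0.2):
    """Combine the low band of the first signal with the high band of the second.

    The high band is obtained as the second signal minus its own low-passed
    version, so the two bands are complementary.
    """
    if len(first_audio) != len(second_audio):
        raise ValueError("signals must have the same length")
    low = one_pole_lowpass(first_audio, alpha)
    second_low = one_pole_lowpass(second_audio, alpha)
    return [l + (s - sl) for l, s, sl in zip(low, second_audio, second_low)]
```

With identical inputs the output reproduces the input, which is a quick sanity check that the two bands add back up to the full signal.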
  • the noise of the environment may change all the time, and the low-noise scene and the high-noise scene are repeatedly switched.
  • the speaker 280 may convert electrical signals into audio signals.
  • the speaker 280 may be configured to receive and output the speaker signal from the control device 400 .
  • the audio signal input to the speaker 280 may be the speaker input signal.
  • the speaker input signal may be the speaker signal.
  • the electronic device 200 may perform signal processing on the speaker signal, and send the signal-processed audio signal to the speaker 280 for output.
  • the speaker input signal may be an audio signal obtained after the electronic device 200 performs signal processing on the speaker signal.
  • the sound produced when the speaker 280 plays the speaker input signal may be transmitted to the user 002 by air conduction or bone conduction.
  • the speaker 280 may be a speaker that transmits vibration signals to the human body to transmit sound, such as a bone conduction speaker, or a speaker that transmits vibration signals through air, such as an air conduction speaker.
  • the bone conduction speaker generates mechanical vibration through the vibration module, and conducts the mechanical vibration into the ear through the bone.
  • the speaker 280 may contact the head of the user 002 directly or through a specific medium (e.g., one or more panels), and transmit the audio signal to the user's auditory nerve through skull vibration.
  • the air conduction speaker generates vibration in the air through the vibration module, and conducts the air vibration into the ear through the air.
  • Speaker 280 may also be a combination of bone conduction speakers and air conduction speakers. Speaker 280 may also be other types of speakers.
  • the sound produced when the speaker 280 plays the speaker input signal may be collected by the microphone module 240 and form an echo. The greater the strength of the speaker input signal, the greater the strength of the sound output by the speaker 280, and the stronger the echo signal.
  • the microphone module 240 and the speaker 280 may be integrated on the electronic device 200 , or may be external devices of the electronic device 200 .
  • the electronic device 200 can collect audio signals through the microphone module 240 and generate the microphone signals.
  • the microphone signal may include a first audio signal 243 and a second audio signal 245 . In different scenarios, the voice quality of the first audio signal 243 and the second audio signal 245 are different.
  • the electronic device 200 can select a target audio processing mode from multiple audio processing modes according to the application scenario, so as to select an audio signal with better voice quality from the microphone signal as the audio source signal, perform signal processing on the audio source signal in the target audio processing mode, and output the result to the control device 400 .
  • the audio source signal may be an input signal of the target audio processing mode.
  • the signal processing may include noise suppression to reduce noise signals.
  • the signal processing may include echo suppression to reduce echo signals.
  • the signal processing may include both the noise suppression and the echo suppression.
  • the signal processing may also directly output the audio source signal.
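As an illustration of what echo suppression can look like, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a conventional technique that is swapped in here for illustration; the specification does not state which echo suppression algorithm it uses. The function name `nlms_echo_cancel`, the tap count, and the step size are assumptions, and the sketch has no double-talk handling.

```python
def nlms_echo_cancel(mic, speaker, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated speaker echo from the microphone signal.

    mic and speaker are equal-length sample sequences; the return value is
    the residual signal after the estimated echo has been removed.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent speaker samples, newest first
    residual = []
    for m, x in zip(mic, speaker):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = m - echo_estimate
        # Normalize the update by the recent speaker energy.
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual
```

When the echo is much stronger than the user's voice, as the text describes for the first audio signal under a loud speaker signal, an adaptive canceller of this kind may not remove it reliably, which is the motivation for switching modes instead.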
  • the selection of the target audio processing mode by the electronic device 200 is not only related to the ambient noise, but also related to the speaker signal.
  • the first audio signal 243 output by the first type of microphone 242 and the second audio signal 245 output by the second type of microphone 244 may undergo feature fusion.
  • the voice quality of the fused audio signal is better than the voice quality of the second audio signal 245 output by the second type of microphone 244 .
  • when the speaker signal is large, the sound output by the speaker 280 is also large and has a large impact on the first audio signal 243 output by the first type of microphone 242, so the echo in the first audio signal 243 is larger.
  • the echo signal in the first audio signal 243 may exceed the voice signal of the user 002 .
  • when the speaker 280 is a bone conduction speaker, the echo signal in the first audio signal 243 is more obvious. It is difficult for a traditional echo cancellation algorithm to cancel the echo signal in the first audio signal 243, and the effect of echo cancellation cannot be guaranteed.
  • in this case, the voice quality of the second audio signal 245 output by the second type of microphone 244 is better than that of the audio signal obtained by feature fusion of the first audio signal 243 output by the first type of microphone 242 and the second audio signal 245 output by the second type of microphone 244 .
  • the electronic device 200 may select the target audio processing mode from the plurality of audio processing modes based on the speaker signal to perform the signal processing on the microphone signal.
  • the plurality of audio processing modes may include at least a first mode 1 and a second mode 2 .
  • the first mode 1 may perform signal processing on the first audio signal 243 and the second audio signal 245 .
  • the signal processing may include noise suppression to reduce noise signals.
  • the signal processing may include echo suppression to reduce echo signals.
  • the signal processing may include both the noise suppression and the echo suppression. For the convenience of presentation, in the following description, we will describe the signal processing including the echo suppression. Those skilled in the art should understand that other signal processing methods are within the protection scope of this specification.
  • the second mode 2 may perform signal processing on the second audio signal 245 .
  • the signal processing may include noise suppression to reduce noise signals.
  • the signal processing may include echo suppression to reduce echo signals.
  • the signal processing may include both the noise suppression and the echo suppression.
  • the target audio processing mode is one of the first mode 1 and the second mode 2.
  • the plurality of audio processing modes may also include other modes, such as a processing mode for performing signal processing on the first audio signal 243 .
  • the electronic device 200 selects the first mode 1, uses the first audio signal 243 and the second audio signal 245 as the audio source signal, and performs signal processing on the audio source signal to generate and output the first target audio 291 for use in voice communication.
  • the electronic device 200 selects the second mode 2, uses the second audio signal 245 as the audio source signal, and performs signal processing on the audio source signal to generate and output the second target audio 292 for use in voice communication.
  • the electronic device 200 may execute the data or instructions of the method of audio signal processing for echo suppression described in this specification, and acquire the microphone signal and the speaker signal; based on the signal strength of the speaker signal, the electronic device 200 may select the corresponding target audio processing mode and use it to perform signal processing on the microphone signal.
  • the electronic device 200 may select, from a plurality of audio processing modes, a target audio processing mode corresponding to the strength of the speaker signal, select from the first audio signal 243 and the second audio signal 245 an audio signal with better voice quality, or a combination thereof, as the audio source signal, and use a corresponding signal processing algorithm to perform signal processing (such as echo cancellation and noise reduction) on the audio source signal, generating and outputting the target audio so as to reduce echoes in the target audio.
  • the target audio may include one of the first target audio 291 and the second target audio 292 .
  • the electronic device 200 may output the target audio to the control device 400 .
  • the electronic device 200 can select the target audio processing mode based on the strength of the speaker signal, so as to select an audio signal with better voice quality as the audio source signal of the electronic device 200, and perform signal processing on the audio source signal to obtain different target audios for different usage scenarios, so that the voice quality of the target audio is optimal in each usage scenario.
  • FIG. 2 shows a schematic diagram of an electronic device 200 .
  • the electronic device 200 may perform the method of audio signal processing for suppressing echoes described in this specification.
  • the described method of audio signal processing for echo suppression is described elsewhere in this specification.
  • the audio signal processing method for echo suppression is introduced in the description of FIGS. 5 to 7 .
  • the electronic device 200 may include a microphone module 240 and a speaker 280 .
  • the electronic device 200 may also include at least one storage medium 230 and at least one processor 220 .
  • the storage medium 230 may include data storage devices.
  • the data storage device may be a non-transitory storage medium or a temporary storage medium.
  • the data storage device may include one or more of a magnetic disk, a read-only memory (ROM), or a random access memory (RAM).
  • the storage medium 230 also includes at least one set of instructions stored in the data storage device for echo-suppressed audio signal processing.
  • the instructions are computer program code, which may include programs, routines, objects, components, data structures, procedures, modules, etc. that perform the methods provided herein for echo-suppressed audio signal processing.
  • the at least one instruction set may include control instructions, issued by the control module 231, configured to generate a control signal corresponding to the speaker signal based on the speaker signal, or on the speaker signal and the microphone signal.
  • the control signal includes a first control signal or a second control signal. Wherein, the first control signal corresponds to the first mode 1 .
  • the second control signal corresponds to the second mode 2 .
  • the control signal may be any signal, for example, the first control signal may be signal 1, the second control signal may be signal 2, and so on.
  • the control instruction sent by the control module 231 may generate a corresponding control signal according to the signal strength of the speaker signal or the signal strength of the speaker signal and the evaluation parameter of the microphone signal.
  • the control module 231 may also select a target audio signal processing mode corresponding to the control signal according to the control signal.
  • When the control signal is the first control signal, the control module 231 selects the first mode 1; when the control signal is the second control signal, the control module 231 selects the second mode 2.
  • the at least one instruction set may further include an echo processing instruction, which is issued by the echo processing module 233 and is configured to, based on the control signal, use the target audio processing mode of the electronic device 200 to perform signal processing (such as echo suppression, noise reduction, etc.) on the microphone signal.
  • When the control signal is the first control signal, the echo processing module 233 uses the first mode 1 to perform signal processing on the microphone signal.
  • When the control signal is the second control signal, the echo processing module 233 uses the second mode 2 to perform signal processing on the microphone signal.
  • the echo processing module 233 may include a first algorithm 233-1 and a second algorithm 233-8.
  • the first algorithm 233-1 corresponds to the first control signal and the first mode 1.
  • the second algorithm 233 - 8 corresponds to the second control signal and the second mode 2 .
  • the electronic device 200 uses the first algorithm 233-1 to perform signal processing on the first audio signal 243 and the second audio signal 245 respectively, performs feature fusion on the first audio signal 243 and the second audio signal 245 after the signal processing, and outputs the first target audio 291.
  • FIG. 3 shows a schematic working diagram of a first mode 1 provided according to an embodiment of the present specification.
  • the first algorithm 233-1 may receive the first audio signal 243 and the second audio signal 245 and the speaker input signal.
  • the first algorithm 233-1 may use a first echo cancellation module 233-2 to perform echo cancellation on the first audio signal 243 based on the speaker input signal.
  • the speaker input signal may be an audio signal after noise reduction processing.
  • the first echo cancellation module 233-2 receives the first audio signal 243 and the speaker input signal, and outputs the first audio signal 243 after echo cancellation.
  • the first echo cancellation module 233-2 may be a single microphone echo cancellation algorithm.
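  • As an illustration of what a single-microphone echo cancellation algorithm can look like, the following is a minimal sketch using a normalized least-mean-squares (NLMS) adaptive filter. The specification does not name a particular algorithm, so NLMS, the function name `nlms_echo_cancel`, and the parameters `taps`, `mu`, and `eps` are assumptions chosen for illustration only.

```python
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `ref` from `mic`.

    mic: microphone samples (stand-in for the first audio signal 243)
    ref: reference samples (stand-in for the speaker input signal)
    NLMS is an assumed technique; the source only says "single microphone
    echo cancellation algorithm".
    """
    weights = [0.0] * taps   # running estimate of the speaker-to-mic echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    out = []
    for d, r in zip(mic, ref):
        history = [r] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = d - echo_estimate            # echo-cancelled output sample
        norm = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        out.append(error)
    return out


# Toy scenario (hypothetical): the microphone hears only a delayed,
# attenuated copy of the speaker signal and no near-end speech.
rng = random.Random(0)
speaker = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [
    0.6 * (speaker[n - 1] if n >= 1 else 0.0)
    + 0.3 * (speaker[n - 3] if n >= 3 else 0.0)
    for n in range(2000)
]
residual = nlms_echo_cancel(echo, speaker)
```

  • In this toy case the residual echo shrinks as the filter adapts; with near-end speech present, a practical canceller would also need double-talk handling, which this sketch omits.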
  • the first algorithm 233-1 may use a second echo cancellation module 233-3 to echo cancel the second audio signal 245 based on the speaker input signal.
  • the second echo cancellation module 233-3 receives the second audio signal 245 and the speaker input signal, and outputs the second audio signal 245 after echo cancellation.
  • the second echo cancellation module 233-3 may be a single-microphone echo cancellation algorithm or a multi-microphone echo cancellation algorithm.
  • the first echo cancellation module 233-2 and the second echo cancellation module 233-3 may be the same or different.
  • the first algorithm 233-1 may use a first noise suppression module 233-4 to perform noise suppression on the first audio signal 243 and the second audio signal 245 after echo cancellation.
  • the first noise suppression module 233-4 is used to suppress noise signals in the first audio signal 243 and the second audio signal 245.
  • the first noise suppression module 233-4 receives the echo-removed first audio signal 243 and the second audio signal 245, and outputs the noise-suppressed first audio signal 243 and the second audio signal 245 .
  • the first noise suppression module 233-4 may perform noise reduction on the first audio signal 243 and the second audio signal 245 independently, or may perform noise reduction on the first audio signal 243 and the second audio signal 245 simultaneously.
  • the first algorithm 233-1 may use a feature fusion module 233-5 to perform feature fusion processing on the noise-suppressed first audio signal 243 and the second audio signal 245.
  • the feature fusion module 233-5 receives the first audio signal 243 and the second audio signal 245 after noise reduction processing.
  • the feature fusion module 233 - 5 can analyze the voice quality of the first audio signal 243 and the second audio signal 245 .
  • the feature fusion module 233-5 can analyze the effective voice signal strength, noise signal strength, echo signal strength, signal-to-noise ratio, etc. in the first audio signal 243 and the second audio signal 245, and, based on the analysis, fuse the first audio signal 243 and the second audio signal 245 into the first target audio 291 and output it.
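  • The specification does not state how the fusion weights are derived from the analyzed quantities. One simple possibility, shown here purely as an assumption, is to mix the two signals sample by sample with weights proportional to their estimated signal-to-noise ratios. The function name `fuse_by_snr` and the use of time-domain mixing are hypothetical.

```python
def fuse_by_snr(sig_a, snr_a_db, sig_b, snr_b_db):
    """Mix two equally long signals, favouring the one with the higher SNR.

    sig_a / sig_b stand in for the noise-suppressed first audio signal 243
    and second audio signal 245; the SNR values (in dB) are assumed to come
    from a separate estimator that is not shown here.
    """
    weight_a = 10.0 ** (snr_a_db / 10.0)   # dB -> linear power ratio
    weight_b = 10.0 ** (snr_b_db / 10.0)
    total = weight_a + weight_b
    return [(weight_a * a + weight_b * b) / total for a, b in zip(sig_a, sig_b)]


# Equal SNRs give an even mix; a 20 dB advantage makes one signal dominate.
even_mix = fuse_by_snr([1.0], 10.0, [0.0], 10.0)
skewed_mix = fuse_by_snr([1.0], 20.0, [0.0], 0.0)
```

  • A real feature fusion module might instead weight per frequency band or per frame; the sketch only shows the idea of quality-driven weighting.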
  • the first algorithm 233-1 may also perform noise suppression on the speaker signal using the second noise suppression module 233-6.
  • the second noise suppression module 233-6 is used to suppress the noise signal in the speaker signal.
  • the second noise suppression module 233-6 receives the speaker signal sent by the control device 400, eliminates noise signals in the speaker signal, such as far-end noise, channel noise, and electronic noise in the electronic device 200, and outputs the noise-reduced speaker processing signal.
  • the first algorithm 233-1 may include a feature fusion module 233-5.
  • the first algorithm 233-1 may further include any one of a first echo cancellation module 233-2, a second echo cancellation module 233-3, a first noise suppression module 233-4, and a second noise suppression module 233-6, or any combination thereof.
  • the electronic device 200 uses the second algorithm 233-8 to perform signal processing on the second audio signal 245, and outputs the second target audio 292.
  • FIG. 4 shows a schematic working diagram of a second mode 2 provided according to an embodiment of the present specification.
  • the second algorithm 233-8 may receive the second audio signal 245 and the speaker input signal.
  • the second algorithm 233-8 may use a third echo cancellation module 233-9 to echo cancel the second audio signal 245 based on the speaker input signal.
  • the third echo cancellation module 233-9 receives the second audio signal 245 and the speaker input signal, and outputs the second audio signal 245 after echo cancellation.
  • the third echo cancellation module 233-9 may be the same as or different from the second echo cancellation module 233-3.
  • the second algorithm 233-8 may use the third noise suppression module 233-10 to perform noise suppression on the second audio signal 245 after echo cancellation.
  • the third noise suppression module 233 - 10 is used to suppress the noise signal in the second audio signal 245 .
  • the third noise suppression module 233 - 10 receives the second audio signal 245 after echo cancellation, and outputs the second audio signal 245 after noise suppression as the second target audio 292 .
  • the third noise suppression module 233-10 may be the same as or different from the first noise suppression module 233-4.
  • the second algorithm 233-8 may also perform noise suppression on the speaker signal using the fourth noise suppression module 233-11.
  • the fourth noise suppression module 233-11 is used to suppress the noise signal in the speaker signal.
  • the fourth noise suppression module 233-11 receives the speaker signal sent by the control device 400, eliminates noise signals in the speaker signal, such as far-end noise, channel noise, and electronic noise in the electronic device 200, and outputs the noise-reduced speaker processing signal.
  • the fourth noise suppression module 233-11 may be the same as or different from the second noise suppression module 233-6.
  • the second algorithm 233-8 may include any one of the third echo cancellation module 233-9, the third noise suppression module 233-10, and the fourth noise suppression module 233-11, or any combination thereof. In other embodiments, the second algorithm 233-8 may not include any of the above signal processing modules, and may directly output the second audio signal 245.
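  • Because each module of the second algorithm 233-8 is optional, the second mode 2 can be pictured as a chain of stages that are applied only when present. The sketch below is one hypothetical way to express this; the function name `run_second_mode` and the stage signatures are assumptions, and the stage implementations are left to the caller.

```python
from typing import Callable, List, Optional

Signal = List[float]


def run_second_mode(
    second_audio: Signal,
    speaker_input: Signal,
    echo_cancel: Optional[Callable[[Signal, Signal], Signal]] = None,
    noise_suppress: Optional[Callable[[Signal], Signal]] = None,
) -> Signal:
    """Apply whichever optional stages are supplied, in the described order."""
    out = list(second_audio)
    if echo_cancel is not None:        # stand-in for module 233-9
        out = echo_cancel(out, speaker_input)
    if noise_suppress is not None:     # stand-in for module 233-10
        out = noise_suppress(out)
    return out                         # stand-in for second target audio 292


# With no stages, the second audio signal is passed through unchanged.
passthrough = run_second_mode([0.2, -0.1], [0.0, 0.0])

# With a trivial, purely illustrative echo stage that subtracts the reference.
subtracted = run_second_mode(
    [1.0, 2.0], [0.5, 0.5],
    echo_cancel=lambda mic, ref: [m - r for m, r in zip(mic, ref)],
)
```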
  • Only one of the first mode 1 and the second mode 2 can be run to save computing resources.
  • When the first mode 1 is running, the second mode 2 can be turned off.
  • When the second mode 2 is running, the first mode 1 can be turned off.
  • the first mode 1 and the second mode 2 can also run simultaneously, and when one of the modes is running, the other mode can update the algorithm parameters.
  • some parameters in the first mode 1 and the second mode 2 can be shared (for example, the noise parameters obtained by the noise estimation algorithm, the human voice parameters obtained by the human voice estimation algorithm, the signal-to-noise ratio parameters obtained by the signal-to-noise ratio algorithm, etc.), thereby saving computing resources and making the calculation results more accurate.
  • the first algorithm 233-1 and the second algorithm 233-8 in the first mode 1 and the second mode 2 can also share some parameters with the control instructions issued by the control module 231, such as the noise parameters obtained by the noise estimation algorithm, the human voice parameters obtained by the human voice estimation algorithm, and the signal-to-noise ratio parameters obtained by the signal-to-noise ratio algorithm, which saves computing resources and makes the calculation results more accurate.
  • the at least one instruction set may further include a microphone control instruction, executed by the microphone control module 235, configured to perform smoothing processing on the target audio, and output the smoothed target audio to the control device 400.
  • the microphone control module 235 may receive the control signal generated by the control module 231 and the target audio, and perform the smoothing process on the target audio based on the control signal.
  • When the control signal is the first control signal, the first mode 1 is run, and the first target audio 291 output by the first algorithm 233-1 is used as the input signal; when the control signal is the second control signal, the second mode 2 is run, and the second target audio 292 output by the second algorithm 233-8 is used as the input signal.
  • the microphone control module 235 may smooth the signal discontinuity caused by switching between the first target audio 291 and the second target audio 292. Specifically, the microphone control module 235 may adjust the parameters of the first target audio 291 and the second target audio 292 so that the target audio is continuous.
  • the parameters may be stored in the at least one storage medium 230 in advance.
  • the parameters may be amplitude, phase, frequency response, and the like.
  • the content of the adjustment may include adjustment of the volume of the target audio, adjustment of EQ equalization, adjustment of residual noise, and the like.
  • the microphone control module 235 can keep the target audio a continuous signal when the target audio processing mode switches between the first mode 1 and the second mode 2, so that the user 002 cannot easily perceive the switch between the two.
  • the at least one instruction set may further include speaker control instructions, which are executed by the speaker control module 237 and configured to adjust the speaker processing signal to obtain the speaker input signal, and output the speaker input signal to the speaker 280 to output sound.
  • the speaker control module 237 may receive the speaker processing signal and the control signal output by the first algorithm 233-1 and the second algorithm 233-8.
  • the speaker control module 237 can lower or turn off the speaker processing signal output by the first algorithm 233-1 before outputting it to the speaker 280, so as to reduce the sound output by the speaker 280, reduce the echo, and improve the echo cancellation effect of the first algorithm 233-1.
  • the speaker control module 237 may not adjust the speaker processing signal output by the second algorithm 233-8.
  • the speaker control module 237 may smooth the speaker processing signals output by the first algorithm 233-1 and the second algorithm 233-8.
  • the speaker control module 237 tries to ensure the continuity of the switching, so that the user 002 cannot easily perceive the switching between the two.
  • the first algorithm 233-1 focuses on the voice quality of the user 002 picked up by the near-end microphone module 240.
  • the speaker processing signal is processed by the speaker control module 237 to reduce the speaker input signal, thereby reducing the sound output by the speaker 280 to reduce echo to ensure near-end voice quality.
  • the second algorithm 233 - 8 focuses on the speaker input signal of the speaker 280 , and does not use the first audio signal 243 output by the first type of microphone 242 to ensure the voice quality and intelligibility of the speaker input signal of the speaker 280 .
  • At least one processor 220 may be communicatively connected with at least one storage medium 230 , microphone module 240 and speaker 280 .
  • the communication connection refers to any form of connection capable of directly or indirectly receiving information.
  • At least one processor 220 is configured to execute the above-mentioned at least one instruction set.
  • At least one processor 220 reads the at least one instruction set, acquires the data of the microphone module 240 and the speaker 280 according to the instructions of the at least one instruction set, and executes the method of audio signal processing for echo suppression provided in this specification.
  • The processor 220 may perform all steps involved in the method of audio signal processing for echo suppression.
  • Processor 220 may be in the form of one or more processors. In some embodiments, processor 220 may include one or more hardware processors, such as microcontrollers, microprocessors, reduced instruction set computers (RISC), Application-Specific Integrated Circuits (ASICs), Application-Specific Instruction Set Processors (ASIPs), Central Processing Units (CPUs), Graphics Processing Units (GPUs), Physical Processing Units (PPUs), Microcontroller Units, Digital Signal Processors (DSP), Field Programmable Gate Array (FPGA), Advanced RISC Machine (ARM), Programmable Logic Device (PLD), any circuit or processor capable of performing one or more functions, etc., or any combination thereof.
  • For the sake of illustration only, only one processor 220 is described in the electronic device 200 in this specification. However, it should be noted that the electronic device 200 in this specification may also include a plurality of processors; therefore, the operations and/or method steps disclosed in this specification may be performed by one processor as described in this specification, or may be performed jointly by a plurality of processors. For example, if the processor 220 of the electronic device 200 performs step A and step B in this specification, it should be understood that step A and step B may also be performed jointly or separately by two different processors 220 (e.g., the first processor performs step A and the second processor performs step B, or the first and second processors jointly perform steps A and B).
  • the system 100 may select the target audio processing mode of the electronic device 200 according to the signal strength of the speaker signal. In some embodiments, the system 100 may select the target audio processing mode of the electronic device 200 according to the signal strength of the speaker signal and the microphone signal.
  • FIG. 5 shows a flowchart of an audio signal processing method P100 for suppressing echoes provided according to an embodiment of the present specification.
  • the method P100 is a flowchart of a method for the system 100 to select the target audio processing mode of the electronic device 200 according to the signal strength of the speaker signal.
  • the method P100 may include executing by at least one processor 220:
  • step S120 Select a target audio processing mode of the electronic device 200 from the first mode 1 and the second mode 2 based on at least the speaker signal.
  • the target audio processing mode may include one of the first mode 1 and the second mode 2 .
  • step S120 may include:
  • Step S122 Generate a control signal corresponding to the speaker signal based on at least the strength of the speaker signal.
  • the control signal includes a first control signal or a second control signal.
  • the electronic device 200 may receive the speaker signal sent by the control device 400, compare the intensity of the speaker signal with a preset speaker threshold, and generate the control signal according to the comparison result.
  • Step S122 may include one of the following situations:
  • S122-2 Determine that the intensity of the speaker signal is lower than the speaker threshold, and generate the first control signal
  • S122-4 Determine that the intensity of the speaker signal is higher than a preset speaker threshold, and generate the second control signal.
  • Step S120 may also include:
  • S124 Based on the control signal, select the target audio processing mode corresponding to the control signal.
  • the first control signal corresponds to the first mode 1 .
  • the second control signal corresponds to the second mode 2 .
  • When the control signal is the first control signal, the first mode 1 is selected; when the control signal is the second control signal, the second mode 2 is selected.
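  • Steps S122 and S124 can be sketched as a threshold comparison followed by a lookup. The sketch below assumes that signal strength is measured as the root-mean-square (RMS) value of a frame and that a strength exactly equal to the threshold yields the first control signal; neither choice is stated in the specification, and the names `ControlSignal`, `signal_strength`, `generate_control_signal`, and `select_mode` are hypothetical.

```python
from enum import Enum


class ControlSignal(Enum):
    FIRST = 1    # corresponds to the first mode 1
    SECOND = 2   # corresponds to the second mode 2


MODE_FOR_SIGNAL = {
    ControlSignal.FIRST: "first mode 1",
    ControlSignal.SECOND: "second mode 2",
}


def signal_strength(frame):
    """RMS of a frame of samples (assumed strength measure)."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def generate_control_signal(speaker_frame, speaker_threshold):
    """S122: compare the speaker signal strength with the preset threshold."""
    if signal_strength(speaker_frame) > speaker_threshold:
        return ControlSignal.SECOND
    return ControlSignal.FIRST   # equality handled here by assumption


def select_mode(control_signal):
    """S124: pick the target audio processing mode for the control signal."""
    return MODE_FOR_SIGNAL[control_signal]


quiet = select_mode(generate_control_signal([0.1, -0.1], 0.5))
loud = select_mode(generate_control_signal([0.9, -0.9], 0.5))
```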
  • When the strength of the speaker signal is higher than the speaker threshold, the first algorithm 233-1 in the first mode 1, when performing signal processing on the first audio signal 243 and the second audio signal 245, cannot eliminate the echo signal while preserving a good voice signal, so the voice quality of the resulting first target audio 291 is poor; by contrast, the second target audio 292 obtained by the second algorithm 233-8 in the second mode 2 performing signal processing on the second audio signal 245 has good voice quality. Therefore, the electronic device 200 generates the second control signal corresponding to the second mode 2 when the strength of the speaker signal is higher than the speaker threshold.
  • When the intensity of the speaker signal is lower than the speaker threshold, the first algorithm 233-1 in the first mode 1, when performing signal processing on the first audio signal 243 and the second audio signal 245, can eliminate the echo signal while preserving a good voice signal, so the voice quality of the resulting first target audio 291 is good; the second target audio 292 obtained by the second algorithm 233-8 in the second mode 2 performing signal processing on the second audio signal 245 also has good voice quality. Therefore, when the intensity of the speaker signal is lower than the speaker threshold, the electronic device 200 generates the first control signal corresponding to the first mode 1, and may also generate the second control signal corresponding to the second mode 2.
  • the control signal is generated by the control module 231 .
  • the electronic device 200 can monitor the intensity of the speaker signal in real time and compare it with the speaker threshold.
  • the electronic device 200 may also periodically detect the intensity of the speaker signal and compare it with the speaker threshold.
  • the electronic device 200 may further compare the speaker signal with the speaker threshold when it is detected that the intensity of the speaker signal changes significantly, and the change value exceeds a preset range.
  • When the strength of the speaker signal is higher than the speaker threshold, the electronic device 200 generates the second control signal; when the speaker signal changes and its strength falls below the speaker threshold, the electronic device 200 generates the first control signal.
  • When the strength of the speaker signal is lower than the speaker threshold, the electronic device 200 generates the first control signal; when the speaker signal changes and its strength rises above the speaker threshold, the electronic device 200 generates the second control signal.
  • the speaker threshold may be a range.
  • the speaker threshold may lie within the range between a first speaker threshold and a second speaker threshold.
  • the first speaker threshold is smaller than the second speaker threshold.
  • the strength of the loudspeaker signal being above the loudspeaker threshold may include the strength of the loudspeaker signal being above the second loudspeaker threshold.
  • the strength of the loudspeaker signal being below the loudspeaker threshold may include the strength of the loudspeaker signal being below the first loudspeaker threshold.
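  • When the strength of the speaker signal is between the first speaker threshold and the second speaker threshold, the electronic device 200 may generate the first control signal or the second control signal.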
  • When the speaker signal strength is higher than the second speaker threshold, the electronic device 200 generates the second control signal; when the speaker signal strength then decreases to a value between the first speaker threshold and the second speaker threshold, the electronic device 200 may generate the second control signal.
  • When the speaker signal strength is lower than the first speaker threshold, the electronic device 200 generates the first control signal; when the speaker signal strength then increases to a value between the first speaker threshold and the second speaker threshold, the electronic device 200 may generate the first control signal.
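  • The two-threshold behaviour described above amounts to hysteresis: between the first and second speaker thresholds the previous control signal may simply be kept, which avoids rapid toggling when the strength hovers near a single threshold. A minimal sketch follows, in which the control signals are represented by the integers 1 and 2 and the function name `next_control_signal` is hypothetical. Keeping the previous signal in the middle band is one permitted choice, not the only one.

```python
def next_control_signal(strength, first_threshold, second_threshold, previous):
    """Return 1 (first control signal) or 2 (second control signal).

    Assumes first_threshold < second_threshold, as stated in the text.
    """
    if strength > second_threshold:
        return 2
    if strength < first_threshold:
        return 1
    return previous   # inside the band: keep the current control signal


# Walk a hypothetical strength trajectory up through the band and back down.
signal = 1
trajectory = []
for strength in [0.1, 0.5, 0.9, 0.5, 0.1]:
    signal = next_control_signal(strength, 0.3, 0.7, signal)
    trajectory.append(signal)
```

  • In this example the strength 0.5 yields the first control signal on the way up and the second control signal on the way down, matching the two cases described above.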
  • the electronic device 200 may also obtain a control model through machine learning, input the speaker signal into the control model, and the control model outputs the control signal.
  • the method P100 may further include executing, by at least one processor 220:
  • step S140 Process the microphone signal in the target audio processing mode to generate the target audio to at least reduce echoes in the microphone signal.
  • step S140 may include one of the following situations:
  • S142 Determine that the control signal is the first control signal, and use the first algorithm 233-1 in the first mode 1 corresponding to the first control signal to perform, based on the speaker input signal, signal processing and feature fusion on the first audio signal 243 and the second audio signal 245 to generate the first target audio 291.
  • the specific process is as described above and will not be repeated here.
  • S144 Determine that the control signal is the second control signal, and use the second algorithm 233-8 in the second mode 2 corresponding to the second control signal to perform, based on the speaker input signal, signal processing on the second audio signal 245 to generate the second target audio 292. The specific process is as described above and will not be repeated here.
  • step S160 Output the target audio.
  • the electronic device 200 may directly output the target audio.
  • the electronic device 200 may also perform smoothing processing on the target audio, so that the user 002 does not perceive the switch when the target audio switches between the first target audio 291 and the second target audio 292.
  • step S160 may include: smoothing the target audio and outputting the smoothed target audio.
  • the electronic device 200 may perform smoothing processing on the target audio through the microphone control module 235 .
  • the microphone control module 235 can perform the smoothing process at the junction between the first target audio 291 and the second target audio 292, that is, perform signal conditioning on the first target audio 291 and the second target audio 292 so that the transition at the junction is smooth.
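  • One common way to smooth the junction between two audio streams is a short linear crossfade over the overlapping samples. The specification does not say which conditioning is used (it mentions amplitude, phase, and frequency response as adjustable parameters), so the crossfade below is only an assumed stand-in for the amplitude part, and the function name `crossfade` is hypothetical.

```python
def crossfade(outgoing_tail, incoming_head):
    """Blend the end of the outgoing target audio into the start of the new one.

    Both arguments cover the same time span; the gain of the incoming audio
    rises linearly while the gain of the outgoing audio falls.
    """
    n = min(len(outgoing_tail), len(incoming_head))
    blended = []
    for i in range(n):
        gain_in = (i + 1) / (n + 1)    # strictly between 0 and 1
        blended.append((1.0 - gain_in) * outgoing_tail[i] + gain_in * incoming_head[i])
    return blended


# Switching from a constant level of 1.0 to silence over three samples.
faded = crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```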
  • the method P100 may further include:
  • step S180 Based on the control signal, control the intensity of the speaker input signal of the speaker 280. Specifically, step S180 may be performed by the speaker control module 237 .
  • the speaker control module 237 may determine that the control signal is the first control signal and process the speaker processing signal to reduce the intensity of the speaker input signal input to the speaker 280, thereby reducing the intensity of the sound output by the speaker 280 and the echo signal in the microphone signal, so as to improve the voice quality of the first target audio 291.
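  • Step S180 can be pictured as applying a gain to the speaker processing signal that depends on the control signal. In the sketch below the control signals are represented by the integers 1 and 2, and the function name `speaker_input_from` and the default gain of 0.5 are assumptions; a gain of 0.0 would correspond to turning the speaker signal off.

```python
def speaker_input_from(speaker_processing_signal, control_signal, first_mode_gain=0.5):
    """Derive the speaker input signal from the speaker processing signal.

    Under the first control signal the signal is attenuated to reduce echo;
    under the second control signal it is passed through unchanged.
    """
    gain = first_mode_gain if control_signal == 1 else 1.0
    return [gain * s for s in speaker_processing_signal]


attenuated = speaker_input_from([1.0, -2.0], control_signal=1)
unchanged = speaker_input_from([1.0, -2.0], control_signal=2)
```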
  • Table 1 shows the result graph of the target audio processing mode corresponding to FIG. 5 .
  • the four scenarios include the first one, in which the near-end sound signal is less than the threshold (for example, user 002 does not emit sound) and the speaker signal does not exceed the speaker threshold
  • whether the near-end sound signal is greater than the threshold can be determined by the control module 231 according to the microphone signal.
  • the fact that the near-end sound signal is greater than the threshold may be that the strength of the audio signal sent by the user 002 exceeds a preset threshold.
  • the target audio processing modes corresponding to the four scenarios are as follows: the first and second scenarios correspond to the first mode 1, and the third and fourth scenarios correspond to the second mode 2.
  • the electronic device 200 may select the target audio processing mode of the electronic device 200 according to the speaker signal, so that the voice quality produced by the selected target audio processing mode is optimal in any scenario, thereby ensuring call quality.
  • the selection of the target audio processing mode is not only related to the echo of the speaker signal, but also to ambient noise.
  • the ambient noise may be evaluated by at least one of an ambient noise level and a signal-to-noise ratio in the microphone signal.
  • Fig. 6 shows a flowchart of an audio signal processing method P200 for suppressing echo provided according to an embodiment of the present specification.
  • the method P200 is a flowchart of a method for the system 100 to select the target audio processing mode of the electronic device 200 according to the signal strength of the speaker signal and the microphone signal.
  • Specifically, the method P200 is a flowchart of a method for the system 100 to select the target audio processing mode according to the speaker signal and at least one of the ambient noise level and the signal-to-noise ratio in the microphone signal.
  • the method P200 may include performing, by at least one processor 220:
  • step S220 Select a target audio processing mode of the electronic device 200 from the first mode 1 and the second mode 2 based on at least the speaker signal.
  • step S220 may include:
  • Step S222 Generate a control signal corresponding to the speaker signal based on at least the strength of the speaker signal.
  • the control signal includes a first control signal or a second control signal.
  • step S222 may be that the electronic device 200 generates a corresponding control signal based on the strength of the speaker signal and the noise in the microphone signal.
  • Step S222 may include:
  • the evaluation parameter may be an environmental noise evaluation parameter in the microphone signal.
  • the environmental noise evaluation parameter may include at least one of an environmental noise level and a signal-to-noise ratio.
  • the electronic device 200 may acquire the environmental noise evaluation parameter in the microphone signal through the control module 231 . Specifically, the electronic device 200 may acquire the environmental noise evaluation parameter according to at least one of the first audio signal 243 and the second audio signal 245 . The electronic device 200 may obtain the environmental noise level or the signal-to-noise ratio through a noise estimation algorithm, and details are not described herein again.
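  • The specification leaves the noise estimation algorithm open. As one possible illustration only, the sketch below estimates an ambient noise level and a signal-to-noise ratio from frame powers, under the assumption (not taken from the specification) that the quietest frames contain ambient noise only. The function names, the frame length and the quantile are hypothetical.

```python
import math

def frame_powers(samples, frame_len):
    """Mean power of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def estimate_noise_and_snr(samples, frame_len=160, quantile=0.1):
    """Return (noise_power, snr_db); snr_db is None when it is undefined."""
    powers = sorted(frame_powers(samples, frame_len))
    if not powers:
        raise ValueError("need at least one complete frame")
    # Assumption: a low quantile of the frame powers approximates the noise floor.
    noise_power = powers[int(quantile * (len(powers) - 1))]
    mean_power = sum(powers) / len(powers)
    speech_power = mean_power - noise_power
    if noise_power <= 0.0 or speech_power <= 0.0:
        return noise_power, None
    return noise_power, 10.0 * math.log10(speech_power / noise_power)
```

  • Either returned value could then serve as the environmental noise evaluation parameter that is compared with the noise evaluation range.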
  • Step S222-4 Generate the control signal based on the strength of the speaker signal and the environmental noise evaluation parameter. Specifically, the electronic device 200 may compare the intensity of the speaker signal with a preset speaker threshold, and compare the environmental noise evaluation parameter with a preset noise evaluation range, and generate the control signal according to the comparison result .
  • Step S222-4 may include one of the following situations:
  • S222-5 Determine that the intensity of the speaker signal is higher than a preset speaker threshold, and generate the second control signal
  • S222-6 Determine that the intensity of the speaker signal is lower than the speaker threshold, and the environmental noise evaluation parameter is outside a preset noise evaluation range, and generate the first control signal;
  • S222-7 Determine that the intensity of the speaker signal is lower than the speaker threshold, and the environmental noise evaluation parameter is within the noise evaluation range, and generate the first control signal or the second control signal.
  • the environmental noise evaluation parameter being within the noise evaluation range may include at least one of the environmental noise level being lower than a preset environmental noise threshold, and the signal-to-noise ratio being higher than a preset signal-to-noise ratio threshold.
  • the ambient noise at this time is small.
  • the fact that the environmental noise evaluation parameter is outside the noise evaluation range may include at least one of the environmental noise level being higher than a preset environmental noise threshold, and the signal-to-noise ratio being lower than a preset signal-to-noise ratio threshold. At this time, the ambient noise is large.
  • When the environmental noise evaluation parameter is outside the noise evaluation range, that is, in a loud noise environment, the voice quality of the first target audio 291 is better than that of the second target audio 292.
  • When the environmental noise evaluation parameter is within the noise evaluation range, the voice quality of the first target audio 291 is not much different from that of the second target audio 292.
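  • The cases S222-5 to S222-7 can be sketched as a small decision function. This is a minimal illustration, not the specification's implementation: the names `ControlSignal`, `generate_control_signal_p200` and the `default` argument (used where either control signal is allowed) are hypothetical, and the thresholds are treated as single values.

```python
from enum import Enum

class ControlSignal(Enum):
    FIRST = 1   # selects the first mode 1
    SECOND = 2  # selects the second mode 2

def noise_outside_range(noise_level, snr_db, noise_threshold, snr_threshold):
    """Noise is 'large' if the level is too high or the SNR is too low."""
    return noise_level > noise_threshold or snr_db < snr_threshold

def generate_control_signal_p200(speaker_strength, noise_level, snr_db, *,
                                 speaker_threshold, noise_threshold,
                                 snr_threshold, default=ControlSignal.SECOND):
    # S222-5: strong speaker signal, ambient noise is ignored.
    if speaker_strength > speaker_threshold:
        return ControlSignal.SECOND
    # S222-6: weak speaker signal and large ambient noise.
    if noise_outside_range(noise_level, snr_db, noise_threshold, snr_threshold):
        return ControlSignal.FIRST
    # S222-7: weak speaker signal and small ambient noise, either signal is allowed.
    return default
```

  • Defaulting to the second control signal in the last case is only one choice; the text permits either signal there.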
  • Step S220 may also include:
  • S224 Based on the control signal, select the target audio processing mode corresponding to the control signal.
  • the first control signal corresponds to the first mode 1 .
  • the second control signal corresponds to the second mode 2.
  • When the control signal is the first control signal, the first mode 1 is selected; when the control signal is the second control signal, the second mode 2 is selected.
  • When the intensity of the speaker signal is higher than the speaker threshold, the first algorithm 233-1 in the first mode 1, performing signal processing on the first audio signal 243 and the second audio signal 245, cannot eliminate the echo signal while preserving a good human voice signal, so the voice quality of the resulting first target audio 291 is poor; whereas the second target audio 292 obtained when the second algorithm 233-8 in the second mode 2 performs signal processing on the second audio signal 245 is of good quality. Therefore, when the intensity of the speaker signal is higher than the speaker threshold, the electronic device 200 generates the second control signal corresponding to the second mode 2 regardless of the range of the ambient noise.
  • When the intensity of the speaker signal is lower than the speaker threshold, the first algorithm 233-1 in the first mode 1, performing signal processing on the first audio signal 243 and the second audio signal 245, can eliminate the echo signal while preserving a good human voice signal, so the voice quality of the resulting first target audio 291 is good; the second target audio 292 obtained when the second algorithm 233-8 in the second mode 2 performs signal processing on the second audio signal 245 is also of good quality. Therefore, when the intensity of the speaker signal is lower than the speaker threshold, the control signal generated by the electronic device 200 depends on the ambient noise.
  • When the ambient noise level is higher than the ambient noise threshold or the signal-to-noise ratio is lower than the signal-to-noise ratio threshold, the ambient noise in the microphone signal is relatively large.
  • In this case, the first algorithm 233-1 in the first mode 1, performing signal processing on the first audio signal 243 and the second audio signal 245, can reduce the noise in the signal while retaining a good human voice signal, so the voice quality of the resulting first target audio 291 is good; whereas the voice quality of the second target audio 292 obtained when the second algorithm 233-8 in the second mode 2 performs signal processing on the second audio signal 245 is not as good as that of the first target audio 291.
  • Therefore, when the intensity of the speaker signal is lower than the speaker threshold, and the ambient noise level is higher than the ambient noise threshold or the signal-to-noise ratio is lower than the signal-to-noise ratio threshold, the electronic device 200 generates the first control signal corresponding to the first mode 1.
  • The electronic device 200 can always generate the second control signal to select the second algorithm 233-8 in the second mode 2 to perform signal processing on the second audio signal 245, which reduces the amount of computation and saves resources while still ensuring the voice quality of the target audio.
  • When the ambient noise level is lower than the ambient noise threshold or the signal-to-noise ratio is higher than the signal-to-noise ratio threshold, the ambient noise in the microphone signal is small.
  • In this case, both the first target audio 291, obtained when the first algorithm 233-1 in the first mode 1 performs signal processing on the first audio signal 243 and the second audio signal 245, and the second target audio 292, obtained when the second algorithm 233-8 in the second mode 2 performs signal processing on the second audio signal 245, have good voice quality. Therefore, when the intensity of the speaker signal is lower than the speaker threshold, and the ambient noise level is lower than the ambient noise threshold or the signal-to-noise ratio is higher than the signal-to-noise ratio threshold, the electronic device 200 generates the first control signal or the second control signal.
  • The electronic device 200 may determine the control signal in the current scene according to the control signal of the previous scene. That is, if the electronic device 200 generated the first control signal in the previous scene, it also generates the first control signal in the current scene, and vice versa, thereby ensuring the continuity of the signal.
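  • The continuity rule can be illustrated as follows. In this sketch a candidate of `None` stands for the case in which either control signal is acceptable; the function name `hold_previous`, the string labels and the initial value are hypothetical.

```python
def hold_previous(candidates, initial="second"):
    """Resolve a sequence of candidate control signals.

    Each candidate is "first", "second", or None when either signal is
    acceptable; in the None case the previous scene's signal is reused.
    """
    current = initial
    resolved = []
    for candidate in candidates:
        if candidate is not None:
            current = candidate
        resolved.append(current)
    return resolved
```

  • Reusing the previous signal avoids switching modes when the rules do not require it.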
  • the control signal is generated by the control module 231 .
  • the electronic device 200 can monitor the intensity of the speaker signal and the environmental noise evaluation parameter in real time, and compare with the speaker threshold and the noise evaluation range.
  • the electronic device 200 may also periodically detect the intensity of the speaker signal and the environmental noise evaluation parameter, and compare it with the speaker threshold and the noise evaluation range.
  • The electronic device 200 may further compare the speaker signal and the environmental noise evaluation parameter with the speaker threshold and the noise evaluation range.
  • the speaker threshold, the ambient noise threshold and the preset signal-to-noise ratio threshold may be a range.
  • the loudspeaker threshold is as described above and will not be repeated here.
  • The ambient noise threshold may lie within the range bounded by a first noise threshold and a second noise threshold.
  • the first noise threshold is smaller than the second noise threshold.
  • the ambient noise level being above the ambient noise threshold may include the ambient noise level being above the second noise threshold.
  • the ambient noise level being below the ambient noise threshold may include the ambient noise level being below the first noise threshold.
  • The signal-to-noise ratio threshold may lie within the range bounded by a first signal-to-noise ratio threshold and a second signal-to-noise ratio threshold.
  • the first SNR threshold is smaller than the second SNR threshold.
  • the signal-to-noise ratio being above the signal-to-noise ratio threshold may include the signal-to-noise ratio being above the second signal-to-noise ratio threshold.
  • the signal-to-noise ratio being below the signal-to-noise ratio threshold may include the signal-to-noise ratio being below the first signal-to-noise ratio threshold.
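  • Treating a threshold as a range bounded by a lower and an upper value can be sketched as a comparison with a dead band. The text defines only the "above" and "below" outcomes; keeping the previous classification inside the band is an assumption of this sketch, and the name `classify_with_band` is hypothetical.

```python
def classify_with_band(value, low, high, previous):
    """Classify value against a threshold range [low, high].

    Returns "above" when value exceeds the upper bound and "below" when it
    is under the lower bound. Inside the band the previous classification
    is kept (an assumption; the text does not define this case).
    """
    if low >= high:
        raise ValueError("low must be smaller than high")
    if value > high:
        return "above"
    if value < low:
        return "below"
    return previous
```

  • The same function applies to the ambient noise level (first and second noise thresholds) and to the signal-to-noise ratio (first and second signal-to-noise ratio thresholds).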
  • the method P200 may include performing, by at least one processor 220:
  • step S240 Process the microphone signal in the target audio processing mode to generate target audio to at least reduce echoes in the microphone signal.
  • step S240 may include one of the following situations:
  • step S242 Determine that the control signal is the first control signal, select the first mode 1, and perform signal processing on the first audio signal 243 and the second audio signal 245 to generate a first target audio 291 .
  • step S242 may be the same as step S142, and details are not repeated here.
  • step S244 Determine that the control signal is the second control signal, select the second mode 2, perform echo suppression on the second audio signal 245, and generate a second target audio 292.
  • step S244 may be the same as step S144, and details are not repeated here.
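  • The two cases of the processing step can be sketched as a dispatch on the control signal. The algorithms themselves are passed in as callables because their internals are described elsewhere in the specification; the function name and the string labels are hypothetical.

```python
def process_microphone_signal(control_signal, first_audio, second_audio,
                              first_algorithm, second_algorithm):
    """Generate the target audio in the mode selected by control_signal."""
    if control_signal == "first":
        # First mode 1: both audio signals are processed together.
        return first_algorithm(first_audio, second_audio)
    if control_signal == "second":
        # Second mode 2: only the second audio signal is processed.
        return second_algorithm(second_audio)
    raise ValueError(f"unknown control signal: {control_signal!r}")
```

  • Note that the second mode never reads the first audio signal.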
  • step S260 Output the target audio.
  • step S260 may be the same as step S160, and details are not repeated here.
  • the method P200 may also include:
  • step S280 Control the intensity of the speaker input signal of the speaker 280 based on the control signal. Specifically, step S280 may be consistent with step S180, and details are not repeated here.
  • Table 2 shows the result graph of the target audio processing mode corresponding to FIG. 6 .
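  • As shown in Table 2, for convenience of comparison, the scenarios are divided into 8 scenarios:
  • the first: the near-end sound signal is less than the threshold (for example, user 002 does not make a sound), the speaker signal does not exceed the speaker threshold, and the ambient noise is small;
  • the second: the near-end sound signal is greater than the threshold (for example, user 002 makes a sound), the speaker signal does not exceed the speaker threshold, and the ambient noise is small;
  • the third: the near-end sound signal is less than the threshold (for example, user 002 does not make a sound), the speaker signal exceeds the speaker threshold, and the ambient noise is small;
  • the fourth: the near-end sound signal is greater than the threshold (for example, user 002 makes a sound), the speaker signal exceeds the speaker threshold, and the ambient noise is small;
  • the fifth: the near-end sound signal is less than the threshold (for example, user 002 does not make a sound), the speaker signal does not exceed the speaker threshold, and the ambient noise is large;
  • the sixth: the near-end sound signal is greater than the threshold (for example, user 002 makes a sound), the speaker signal does not exceed the speaker threshold, and the ambient noise is large;
  • the seventh: the near-end sound signal is less than the threshold (for example, user 002 does not make a sound), the speaker signal exceeds the speaker threshold, and the ambient noise is large;
  • the eighth: the near-end sound signal is greater than the threshold (for example, user 002 makes a sound), the speaker signal exceeds the speaker threshold, and the ambient noise is large.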
  • the control module 231 determines whether the near-end sound signal is greater than the threshold.
  • the fact that the near-end sound signal is greater than the threshold may be that the strength of the audio signal sent by the user 002 exceeds a preset threshold.
  • The target audio processing modes corresponding to the 8 scenarios are as follows: the fifth and sixth scenarios correspond to the first mode 1; the third, fourth, seventh and eighth scenarios correspond to the second mode 2; the remaining scenarios correspond to the first mode 1 or the second mode 2.
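  • The mapping above can be reproduced by enumerating the scenarios. The sketch assumes that the scenarios are numbered with the near-end sound flag changing fastest, then the speaker flag, then the noise flag, which matches the scenarios listed for Table 2; the function names are hypothetical.

```python
from itertools import product

def p200_mode(speaker_exceeds, noise_large):
    """Mode chosen by method P200; the near-end sound is not consulted."""
    if speaker_exceeds:
        return "2"
    return "1" if noise_large else "1 or 2"

def p200_table():
    table = {}
    # product() varies the last element fastest, so unpacking as
    # (noise, speaker, voice) makes the voice flag toggle first.
    for number, (noise_large, speaker_exceeds, _voice_present) in enumerate(
        product([False, True], repeat=3), start=1
    ):
        table[number] = p200_mode(speaker_exceeds, noise_large)
    return table
```

  • Because P200 does not use the near-end sound signal, scenarios that differ only in that flag always share a mode.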
  • The method P200 can control the target audio processing mode of the electronic device 200 not only according to the speaker signal but also according to the near-end environmental noise signal, so that the voice signal output by the electronic device 200 has the best available voice quality in different scenarios, thereby ensuring call quality.
  • the selection of the target audio processing mode is not only related to the echo and ambient noise of the speaker signal, but also to the speech signal when the user 002 speaks.
  • the ambient noise signal may be evaluated by at least one of an ambient noise level and a signal-to-noise ratio in the microphone signal.
  • the speech signal when the user 002 speaks can be evaluated by the signal strength of the human voice in the microphone signal.
  • the human voice signal strength may be the voice signal strength obtained through a noise estimation algorithm, and the voice signal strength may also be the strength of the audio signal obtained after noise reduction processing.
  • FIG. 7 shows a flowchart of an audio signal processing method P300 for suppressing echoes provided according to an embodiment of the present specification.
  • the method P300 is a flowchart of a method for the system 100 to select a target audio processing mode of the electronic device 200 according to the signal strength of the speaker signal and the microphone signal.
  • the method P300 is a flowchart of a method for the system 100 to select the target audio processing mode according to at least one of the speaker signal, the human voice signal strength in the microphone signal, and ambient noise level and signal-to-noise ratio.
  • the method P300 may include performing, by at least one processor 220:
  • step S320 Select a target audio processing mode of the electronic device 200 from the first mode 1 and the second mode 2 based on at least the speaker signal.
  • step S320 may include:
  • Step S322 Generate a control signal corresponding to the speaker signal based on at least the strength of the speaker signal.
  • the control signal includes a first control signal or a second control signal.
  • Step S322 may be that the electronic device 200 generates a corresponding control signal based on the strength of the speaker signal, the noise in the microphone signal, and the strength of the human voice signal in the microphone signal.
  • step S322 may include:
  • the evaluation parameters may include environmental noise evaluation parameters in the microphone signal, and may also include human voice signal strength in the microphone signal.
  • the environmental noise evaluation parameter may include at least one of an environmental noise level and a signal-to-noise ratio.
  • the electronic device 200 may obtain the environmental noise evaluation parameter and the human voice signal strength in the microphone signal through the control module 231 . Specifically, the electronic device 200 may acquire the evaluation parameter according to at least one of the first audio signal 243 and the second audio signal 245 .
  • the electronic device 200 may acquire the human voice signal, the environmental noise level, and the signal-to-noise ratio through a noise estimation algorithm, and details are not described herein again.
  • Step S322-4 Generate the control signal based on the strength of the speaker signal and the evaluation parameter. Specifically, the electronic device 200 can compare the intensity of the speaker signal with a preset speaker threshold, compare the environmental noise evaluation parameter with a preset noise evaluation range, compare the human voice signal intensity with a preset human voice threshold, and generate the control signal according to the comparison results. Step S322-4 may include one of the following situations:
  • S322-5 Determine that the intensity of the speaker signal is higher than a preset speaker threshold, the intensity of the human voice signal exceeds the human voice threshold, and the environmental noise evaluation parameter is outside the preset noise evaluation range, and generate the first control signal;
  • S322-6 Determine that the intensity of the speaker signal is higher than the speaker threshold, the intensity of the human voice signal exceeds the human voice threshold, and the environmental noise evaluation parameter is within the noise evaluation range, and generate the second control signal;
  • S322-7 Determine that the intensity of the speaker signal is higher than the speaker threshold, and the intensity of the human voice signal is lower than the human voice threshold, and generate the second control signal;
  • S322-8 Determine that the intensity of the speaker signal is lower than the speaker threshold, and the environmental noise evaluation parameter is outside the noise evaluation range, and generate the first control signal;
  • S322-9 Determine that the intensity of the speaker signal is lower than the speaker threshold, and the environmental noise evaluation parameter is within the noise evaluation range, and generate the first control signal or the second control signal.
  • the environmental noise evaluation parameter being within the noise evaluation range may include at least one of the environmental noise level being lower than a preset environmental noise threshold, and the signal-to-noise ratio being higher than a preset signal-to-noise ratio threshold.
  • the ambient noise at this time is small.
  • the fact that the environmental noise evaluation parameter is outside the noise evaluation range may include at least one of the environmental noise level being higher than a preset environmental noise threshold, and the signal-to-noise ratio being lower than a preset signal-to-noise ratio threshold. At this time, the ambient noise is large.
  • When the environmental noise evaluation parameter is outside the noise evaluation range, that is, in a loud noise environment, the voice quality of the first target audio 291 is better than that of the second target audio 292.
  • When the environmental noise evaluation parameter is within the noise evaluation range, the voice quality of the first target audio 291 is not much different from that of the second target audio 292.
  • the loudspeaker threshold, the ambient noise threshold, and the signal-to-noise ratio threshold are as described above, and will not be repeated here.
  • The electronic device 200 may generate the first control signal and reduce the speaker signal to ensure the voice quality of the first target audio 291.
  • the speaker threshold, the ambient noise threshold, the signal-to-noise ratio threshold, and the vocal threshold may be stored in the electronic device 200 in advance.
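  • The cases S322-5 to S322-9 can be sketched as follows. This is an illustration under assumptions: the noise comparison is passed in as a boolean, a voice strength equal to the threshold is treated as not exceeding it, and the names and the `default` argument are hypothetical.

```python
def generate_control_signal_p300(speaker_strength, voice_strength, noise_outside,
                                 *, speaker_threshold, voice_threshold,
                                 default="second"):
    """Return "first" or "second" following cases S322-5 to S322-9."""
    if speaker_strength > speaker_threshold:
        if voice_strength > voice_threshold and noise_outside:
            return "first"   # S322-5: user speaks in loud noise
        return "second"      # S322-6 (low noise) and S322-7 (no voice)
    if noise_outside:
        return "first"       # S322-8: weak speaker signal, loud noise
    return default           # S322-9: either control signal is allowed
```

  • The only difference from the P200 rules is the first branch, where a strong near-end voice in loud noise overrides the strong speaker signal.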
  • Step S320 may also include:
  • S324 Based on the control signal, select the target audio processing mode corresponding to the control signal.
  • the first control signal corresponds to the first mode 1 .
  • the second control signal corresponds to the second mode 2 .
  • When the control signal is the first control signal, the first mode 1 is selected; when the control signal is the second control signal, the second mode 2 is selected.
  • the electronic device 200 can reduce or even turn off the speaker input signal input to the speaker 280 to reduce the echo in the microphone signal and ensure the voice quality of the target audio.
  • In this case, the voice quality of the first target audio 291, obtained when the first algorithm 233-1 in the first mode 1 performs signal processing on the first audio signal 243 and the second audio signal 245, is better than that of the second target audio 292, obtained when the second algorithm 233-8 in the second mode 2 performs signal processing on the second audio signal 245. Therefore, when the intensity of the speaker signal is higher than the speaker threshold, the human voice signal intensity exceeds the human voice threshold, and the environmental noise evaluation parameter is outside the preset noise evaluation range, the electronic device 200 generates the first control signal corresponding to the first mode 1. In this case, the electronic device 200 can guarantee the quality and intelligibility of the near-end user 002's speech. Although a part of the speaker input signal is missing, the electronic device 200 can retain most of the voice quality and intelligibility of the speaker input signal, thereby improving the quality of voice communication between the two parties.
  • When the intensity of the speaker signal is higher than the speaker threshold, and either the human voice signal intensity is lower than the human voice threshold, or the human voice signal intensity exceeds the human voice threshold while the environmental noise evaluation parameter is within the preset noise evaluation range, the user 002 is either not speaking or is speaking in low noise.
  • In this case, the voice quality of the first target audio 291, obtained when the first algorithm 233-1 in the first mode 1 performs signal processing on the first audio signal 243 and the second audio signal 245, is worse than that of the second target audio 292, obtained when the second algorithm 233-8 in the second mode 2 performs signal processing on the second audio signal 245.
  • Therefore, in either of these two cases, the electronic device 200 generates the second control signal corresponding to the second mode 2.
  • The remaining cases of step S322-4 are basically the same as those of step S222-4 and are not repeated here.
  • the control signal is generated by the control module 231 .
  • the electronic device 200 can monitor the strength of the speaker signal and the evaluation parameter in real time, and compare it with the speaker threshold, the noise evaluation range, and the human voice threshold.
  • the electronic device 200 may also periodically detect the strength of the speaker signal and the evaluation parameter, and compare it with the speaker threshold, the noise evaluation range, and the human voice threshold.
  • The electronic device 200 may also compare the speaker signal and the evaluation parameter with the speaker threshold, the noise evaluation range and the human voice threshold when the intensity of the speaker signal or the evaluation parameter changes significantly and the change exceeds a preset range.
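  • The change-triggered comparison can be sketched with a small helper that reports when a monitored value has moved by more than a preset amount since the last evaluation. The class name `ChangeTrigger` and the choice to measure the change against the last evaluated value, rather than the last sample, are assumptions of this sketch.

```python
class ChangeTrigger:
    """Signals a re-evaluation when a value changes by more than min_change."""

    def __init__(self, min_change):
        self.min_change = min_change
        self._last_evaluated = None

    def should_reevaluate(self, value):
        # Always evaluate the first sample; afterwards only on large changes.
        if (self._last_evaluated is None
                or abs(value - self._last_evaluated) > self.min_change):
            self._last_evaluated = value
            return True
        return False
```

  • One trigger per monitored quantity (speaker signal intensity, noise evaluation parameter, human voice signal intensity) would let the device skip comparisons while the scene is stable.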
  • the method P300 may include performing, by at least one processor 220:
  • step S340 Process the microphone signal in the target audio processing mode to generate the target audio to at least reduce the echo in the microphone signal.
  • step S340 may include one of the following situations:
  • step S342 Determine that the control signal is the first control signal, select the first mode 1, and perform signal processing on the first audio signal 243 and the second audio signal 245 to generate a first target audio 291 .
  • step S342 may be consistent with step S142, and details are not repeated here.
  • step S344 Determine that the control signal is the second control signal, select the second mode 2, and perform signal processing on the second audio signal 245 to generate a second target audio 292.
  • step S344 may be the same as step S144, and details are not repeated here.
  • the method P300 may include performing, by at least one processor 220:
  • step S360 Output the target audio.
  • step S360 may be the same as step S160, and details are not repeated here.
  • the method P300 may also include:
  • step S380 Based on the control signal, control the intensity of the speaker input signal of the speaker 280. Specifically, step S380 may be consistent with step S180, and details are not repeated here.
  • Table 3 shows the result graph of the target audio processing mode corresponding to FIG. 7 .
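  • As shown in Table 3, for convenience of comparison, the scenarios are divided into 8 scenarios:
  • the first: the near-end sound signal is less than the threshold (for example, user 002 does not make a sound), the speaker signal does not exceed the speaker threshold, and the ambient noise is small;
  • the second: the near-end sound signal is greater than the threshold (for example, user 002 makes a sound), the speaker signal does not exceed the speaker threshold, and the ambient noise is small;
  • the third: the near-end sound signal is less than the threshold (for example, user 002 does not make a sound), the speaker signal exceeds the speaker threshold, and the ambient noise is small;
  • the fourth: the near-end sound signal is greater than the threshold (for example, user 002 makes a sound), the speaker signal exceeds the speaker threshold, and the ambient noise is small;
  • the fifth: the near-end sound signal is less than the threshold (for example, user 002 does not make a sound), the speaker signal does not exceed the speaker threshold, and the ambient noise is large;
  • the sixth: the near-end sound signal is greater than the threshold (for example, user 002 makes a sound), the speaker signal does not exceed the speaker threshold, and the ambient noise is large;
  • the seventh: the near-end sound signal is less than the threshold (for example, user 002 does not make a sound), the speaker signal exceeds the speaker threshold, and the ambient noise is large;
  • the eighth: the near-end sound signal is greater than the threshold (for example, user 002 makes a sound), the speaker signal exceeds the speaker threshold, and the ambient noise is large.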
  • the control module 231 determines whether the near-end sound signal is greater than the threshold.
  • the fact that the near-end sound signal is greater than the threshold may be that the strength of the audio signal sent by the user 002 exceeds a preset threshold.
  • The target audio processing modes corresponding to the 8 scenarios are as follows: the fifth, sixth and eighth scenarios correspond to the first mode 1; the third, fourth and seventh scenarios correspond to the second mode 2; the remaining scenarios correspond to the first mode 1 or the second mode 2.
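  • To see where the two methods diverge, the sketch below derives the P300 mapping from the S322 cases and compares it with the P200 mapping stated for Table 2. The scenario numbering (near-end sound flag changing fastest, then speaker flag, then noise flag) and all names are assumptions of this sketch.

```python
from itertools import product

# P200 mapping as stated in the text for Table 2.
P200_TABLE = {1: "1 or 2", 2: "1 or 2", 3: "2", 4: "2",
              5: "1", 6: "1", 7: "2", 8: "2"}

def p300_mode(voice_present, speaker_exceeds, noise_large):
    if speaker_exceeds:
        return "1" if (voice_present and noise_large) else "2"
    return "1" if noise_large else "1 or 2"

def p300_table():
    table = {}
    # product() varies the last element fastest: voice, then speaker, then noise.
    for number, (noise_large, speaker_exceeds, voice_present) in enumerate(
        product([False, True], repeat=3), start=1
    ):
        table[number] = p300_mode(voice_present, speaker_exceeds, noise_large)
    return table

def differing_scenarios():
    """Scenario numbers for which P200 and P300 choose different modes."""
    p300 = p300_table()
    return [n for n in sorted(p300) if p300[n] != P200_TABLE[n]]
```

  • Under these assumptions the two methods differ only in the eighth scenario, where the user speaks in loud noise while the speaker signal is strong.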
  • the method P200 and the method P300 are applicable to different application scenarios.
  • the method P200 may be selected to ensure the quality of the speaker signal and the intelligibility of the speaker signal.
  • method P300 can be selected to ensure the voice quality and intelligibility of the near-end voice.
  • The system 100, the method P100, the method P200 and the method P300 can control the target audio processing mode of the electronic device 200 according to the speaker signal in different scenarios, and thereby control the sound source signal of the electronic device 200, so that the voice quality of the target audio is the best available in each scenario, improving the quality of voice communication.
  • the signal strength of environmental noise is different at each frequency.
  • the voice quality of the first target audio 291 and the second target audio 292 are also different.
  • the voice quality of the first target audio 291 obtained after the first audio signal 243 and the second audio signal 245 are processed by the first algorithm 233-1 is better than the voice quality of the second target audio 292 obtained after the second audio signal 245 is processed by the second algorithm 233-8.
  • the voice quality of the first target audio 291 obtained after the first audio signal 243 and the second audio signal 245 are processed by the first algorithm 233-1 is similar to that of the second target audio 292 obtained after the second audio signal 245 is processed by the second algorithm 233-8.
  • the electronic device 200 may also generate the control signal according to the frequency of the environmental noise.
  • the first control signal is generated at the first frequency and the second control signal is generated at a frequency other than the first frequency.
  • In this case, the first target audio 291 obtained from the first audio signal 243 and the second audio signal 245 under the signal processing of the first algorithm 233-1 may have poor speech quality at low frequencies; that is, the speech intelligibility of the first target audio 291 is poor at low frequencies and higher at high frequencies.
  • the electronic device 200 may control the selection of the target audio processing mode according to the frequency of the ambient noise.
  • In the low frequency range, the electronic device 200 can select the method P300 to control the target audio processing mode, so as to ensure that the voice of the near-end user 002 is picked up, thereby ensuring the quality of the near-end voice; in the high frequency range, the electronic device 200 may select the method P200 to control the target audio processing mode, to ensure that the near-end user 002 can hear the speaker signal.
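  • The frequency-dependent choice between the two methods can be sketched per frequency band. The specification does not give a boundary between the low and high frequency ranges, so the crossover value below is purely a placeholder, and the function names are hypothetical.

```python
def method_for_band(center_hz, crossover_hz=1000.0):
    """Pick the control method for one band (crossover_hz is a placeholder)."""
    if center_hz <= 0:
        raise ValueError("center frequency must be positive")
    # Low band: favour picking up the near-end voice (method P300).
    # High band: favour the audibility of the speaker signal (method P200).
    return "P300" if center_hz < crossover_hz else "P200"

def methods_for_bands(band_centers_hz, crossover_hz=1000.0):
    """Map each band centre frequency to the method used in that band."""
    return {hz: method_for_band(hz, crossover_hz) for hz in band_centers_hz}
```

  • For example, `methods_for_bands([250.0, 4000.0])` assigns P300 to the 250 Hz band and P200 to the 4000 Hz band under the placeholder crossover.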
  • Another aspect of the present specification provides a non-transitory storage medium storing at least one set of executable instructions for control based on a sound source signal; when the executable instructions are executed by a processor, they instruct the processor to implement the steps of the audio signal processing method for echo suppression described in this specification.
  • various aspects of this specification may also be implemented in the form of a program product, which includes program code.
  • When the program product is executed on the electronic device 200, the program code causes the electronic device 200 to perform the steps of the sound source signal-based control described in this specification.
  • a program product for implementing the above method may employ a portable compact disc read only memory (CD-ROM) including program codes, and may be executed on the electronic device 200 .
  • a readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system (eg, processor 220).
  • the program product may employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above.
  • readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A readable signal medium can also be any readable medium other than a readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a readable storage medium may be transmitted using any suitable medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Program code for carrying out the operations of this specification may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • The program code may execute entirely on the electronic device 200, partly on the electronic device 200, as a stand-alone software package, partly on the electronic device 200 and partly on a remote computing device, or entirely on the remote computing device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

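The audio signal processing method and system for suppressing echo provided in this specification control the selection of a target audio processing mode according to the strength of the speaker signal. The method and system generate a control signal corresponding to the speaker signal strength and, according to that control signal, control the target audio processing mode that processes the microphone signal, so as to obtain better speech quality. When the speaker signal does not exceed a threshold, the system selects the first mode and performs signal processing on the first audio signal and the second audio signal to obtain the first target audio. When the speaker signal exceeds the threshold, the system selects the second mode and performs signal processing on the second audio signal to obtain the second target audio. The method and system can switch the target audio processing mode according to the speaker signal, so that better speech quality is obtained in different scenarios and the quality of voice communication is improved.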

Description

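Audio signal processing method and system for suppressing echo
Technical Field
This specification relates to the field of audio signal processing, and in particular to an audio signal processing method and system for suppressing echo.
Background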
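At present, vibration sensors are increasingly used in electronic products such as earphones as bone conduction microphones for receiving speech signals. When a person speaks, the bones and skin vibrate at the same time; these vibrations are the bone-conducted speech signal, which a bone conduction microphone can pick up to generate a signal. The system converts the vibration signal collected by the bone conduction microphone into an electrical signal or another type of signal and passes it to the electronic device to implement sound pickup. More and more electronic devices now combine air conduction microphones and bone conduction microphones, which have different characteristics: the air conduction microphone picks up the external audio signal, the bone conduction microphone picks up the vibration signal of the sound-producing part, and the picked-up signals are enhanced and fused. When a bone conduction microphone is placed in an earphone or another electronic device that has a speaker, it receives not only the vibration signal produced when the person speaks but also the vibration signal produced when the speaker of that device plays sound, which gives rise to an echo signal. An echo cancellation algorithm is then needed. Differences in the speaker's echo signal also affect the speech quality of the microphones. For example, when the speaker input signal is strong, the speaker vibration signal received by the bone conduction microphone is large, far larger than the vibration signal received when the person speaks, and a conventional echo cancellation algorithm can hardly remove the echo from the bone conduction microphone. In that case, using the microphone signals output by both the air conduction microphone and the bone conduction microphone as the sound source signal yields poor speech quality. It is therefore unreasonable to select the sound source signal of the microphones without considering the echo signal of the speaker.
Therefore, a new audio signal processing method and system for suppressing echo is needed that switches the input sound source signal according to the speaker input signal, so as to improve the echo cancellation effect and the speech quality.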
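Summary
This specification provides a new audio signal processing method and system for suppressing echo, so as to improve the echo cancellation effect and the speech quality.
In a first aspect, this specification provides an audio signal processing method for suppressing echo, including: selecting a target audio processing mode of an electronic device from a plurality of audio processing modes based at least on a speaker signal, the speaker signal being an audio signal sent to the electronic device by a control device; processing a microphone signal in the target audio processing mode to generate target audio, so as to at least reduce echo in the target audio, the microphone signal being an output signal of a microphone module obtained by the electronic device, and the microphone module including at least one first-type microphone and at least one second-type microphone; and outputting the target audio.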
在一些实施例中,所述至少一个第一类麦克风输出第一音频信号;以及所述至少一个第二类麦克风输出第二音频信号,其中,所述麦克风信号包括所述第一音频信号和所述第二音频信号。
在一些实施例中,所述至少一个第一类麦克风用于采集人体振动信号;以及所述至少一个第二类麦克风用于采集空气振动信号。
在一些实施例中,所述多个音频处理模式至少包括:第一模式,对所述第一音频信号和所述第二音频信号进行信号处理;以及第二模式,对所述第二音频信号进行信号处理。
在一些实施例中,所述至少基于扬声器信号从多个音频处理模式中选择电子设备的目标音频处理模式,包括:至少基于所述扬声器信号的强度,生成与所述扬声器信号对应的控制信号,所述控制信号包括第一控制信号或第二控制信号;以及基于所述控制信号,选择与所述控制信号对应的目标音频处理模式,其中,所述第一模式与所述第一控制信号对应,所述第二模式与所述第二控制信号对应。
在一些实施例中,所述至少基于所述扬声器信号的强度,生成与所述扬声器信号对应的控制信号,包括:确定所述扬声器信号的强度低于预设的扬声器阈值,生成所述第一控制信号;或者确定所述扬声器信号的强度高于所述扬声器阈值,生成所述第二控制信号。
在一些实施例中,所述至少基于所述扬声器信号的强度,生成与所述扬声器信号对应的控制信号,包括:基于所述扬声器信号的强度以及所述麦克风信号,生成对应的控制信号。
在一些实施例中,所述基于所述扬声器信号的强度以及所述麦克风信号,生成对应的控制信号,包括:获取所述麦克风信号的评价参数,所述评价参数包括环境噪声评价参数,所述环境噪声评价参数包括环境噪声等级以及信噪比中的至少一个;以及基于所述扬声器信号的强度以及所述评价参数,生成所述控制信号。
在一些实施例中,所述基于所述扬声器信号的强度以及所述评价参数,生成所述控制信号,包括以下情况中的一种:确定所述扬声器信号的强度高于预设的扬声器阈值,生成所述第二控制信号;确定所述扬声器信号的强度低于所述扬声器阈值,且所述环境噪声评价参数处于预设的噪声评价范围外,生成所述第一控制信号;以及确定所述扬声器信号的强度低于所述扬声器阈值,且所述环境噪声评价参数处于所述噪声评价范围内,生成所述第一控制信号或所述第二控制信号。
在一些实施例中,所述环境噪声评价参数处于所述噪声评价范围内,包括以下情况中的至少一种:所述环境噪声等级低于预设环境噪声阈值;以及所述信噪比高于预设信噪比阈值。
在一些实施例中,所述评价参数还包括人声信号强度,所述基于所述扬声器信号的强度以及所述评价参数,生成所述控制信号,包括以下情况中的一种:确定所述扬声器信号的强度高于预设的扬声器阈值,且所述人声信号强度超过预设人声阈值,所述环境噪声评价参数处于预设的噪声评价范围之外,生成所述第一控制信号;确定所述扬声器信号的强度高于所述扬声器阈值,且所述人声信号强度超过所述人声阈值,所述环境噪声评价参数处于所述噪声评价范围之内,生成所述第二控制信号;确定所述扬声器信号的强度高于所述扬声器阈值,且所述人声信号强度低于所述人声阈值,生成所述第二控制信号;确定所述扬声器信号的强度低于所述扬声器阈值,且所述环境噪声评价参数处于所述噪声评价范围之外,生成所述第一控制信号;以及确定所述扬声器信号的强度低于所述扬声器阈值,且所述环境噪声评价参数处于所述噪声评价范围内,生成所述第一控制信号或所述第二控制信号。
在一些实施例中,所述环境噪声评价参数处于所述噪声评价范围内,包括以下情况中的至少一种:所述环境噪声等级低于预设环境噪声阈值;以及所述信噪比高于预设信噪比阈值。
在一些实施例中,所述生成目标音频,包括:通过所述第一模式中的第一算法,对所述第一音频信号和所述第二音频信号进行信号处理,生成第一目标音频;或者通过所述第二模式中的第二算法,对所述第二音频信号进行信号处理,生成第二目标音频,其中,所述目标音频包括所述第一目标音频和所述第二目标音频中的一个。
在一些实施例中,所述输出所述目标音频,包括:对所述目标音频做平滑处理,当所述目标音频在所述第一目标音频和所述第二目标音频之间切换时,对所述第一目标音频和所述第二目标音频的连接处进行所述平滑处理;以及输出经过所述平滑处理的所述目标音频。
在一些实施例中,所述方法还包括:基于所述控制信号,控制所述扬声器的扬声器输入信号的强度。
在一些实施例中,所述基于所述控制信号,控制所述扬声器的扬声器输入信号的强度,包括:确定所述控制信号为所述第一控制信号,降低输入所述扬声器的所述扬声器输入信号的强度,从而降低所述扬声器输出的声音的强度。
第二方面,本说明书还提供一种用于抑制回声的音频信号处理的系统,包括:至少一个存储介质以及至少一个处理器,所述至少一个存储介质存储有至少一个指令集,用于抑制回声的音频信号处理;所述至少一个处理器同所述至少一个存储介质通信连接,其中,当所述系统运行时,所述至少一个处理器读 取所述至少一个指令集,并且根据所述至少一个指令集的指示执行本说明书第一方面所述的用于抑制回声的音频信号处理的方法。
由以上技术方案可知,本说明书提供的用于抑制回声的音频信号处理方法和系统,可以根据扬声器信号的强度生成与所述扬声器信号相对应的控制信号,并根据控制信号控制或切换音频处理模式,从而对与音频处理模式对应的音源信号进行信号处理,以获得更优的语音质量。当扬声器信号未超过阈值时,所述系统生成第一控制信号,选择第一模式,并以第一音频信号和第二音频信号作为第一音源信号,对第一音源信号进行信号处理,以得到第一目标音频。而当扬声器信号超过阈值时,第一音频信号中的扬声器回声较大。此时,所述系统生成第二控制信号,选择第二模式,并以第二音频信号作为第二音源信号,对第二音源信号进行信号处理,以得到第二目标音频。所述方法和系统,能够根据扬声器信号切换不同的音频处理模式,从而切换麦克风信号的音源信号,以提升语音质量,保证在不同场景下都能获得更好的语音质量。
本说明书提供的用于抑制回声的音频信号处理方法和系统的其他功能将在以下说明中部分列出。根据描述,以下数字和示例介绍的内容将对那些本领域的普通技术人员显而易见。本说明书提供的用于抑制回声的音频信号处理方法和系统的创造性方面可以通过实践或使用下面详细示例中所述的方法、装置和组合得到充分解释。
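Brief Description of the Drawings
To describe the technical solutions in the embodiments of this specification more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this specification, and a person of ordinary skill in the art can derive other drawings from them without creative effort.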
图1示出了根据本说明书的实施例提供的一些用于抑制回声的音频信号处理系统的应用场景示意图;
图2示出了根据本说明书的实施例提供的一些电子设备的设备示意图;
图3示出了根据本说明书的实施例提供的一些第一模式的工作示意图;
图4示出了根据本说明书的实施例提供的一些第二模式的工作示意图;
图5示出了根据本说明书的实施例提供的一些用于抑制回声的音频信号处理方法流程图;
图6示出了根据本说明书的实施例提供的一些用于抑制回声的音频信号处理方法流程图;以及
图7示出了根据本说明书的实施例提供的一些用于抑制回声的音频信号处理方法流程图。
具体实施方式
以下描述提供了本说明书的特定应用场景和要求,目的是使本领域技术人员能够制造和使用本说明书中的内容。对于本领域技术人员来说,对所公开的实施例的各种局部修改是显而易见的,并且在不脱离本说明书的精神和范围的情况下,可以将这里定义的一般原理应用于其他实施例和应用。因此,本说明书不限于所示的实施例,而是与权利要求一致的最宽范围。
这里使用的术语仅用于描述特定示例实施例的目的,而不是限制性的。比 如,除非上下文另有明确说明,这里所使用的,单数形式“一”,“一个”和“该”也可以包括复数形式。当在本说明书中使用时,术语“包括”、“包含”和/或“含有”意思是指所关联的整数,步骤、操作、元素和/或组件存在,但不排除一个或多个其他特征、整数、步骤、操作、元素、组件和/或组的存在或在该系统/方法中可以添加其他特征、整数、步骤、操作、元素、组件和/或组。
考虑到以下描述,本说明书的这些特征和其他特征、以及结构的相关元件的操作和功能、以及部件的组合和制造的经济性可以得到明显提高。参考附图,所有这些形成本说明书的一部分。然而,应该清楚地理解,附图仅用于说明和描述的目的,并不旨在限制本说明书的范围。还应理解,附图未按比例绘制。
本说明书中使用的流程图示出了根据本说明书中的一些实施例的系统实现的操作。应该清楚地理解,流程图的操作可以不按顺序实现。相反,操作可以以反转顺序或同时实现。此外,可以向流程图添加一个或多个其他操作。可以从流程图中移除一个或多个操作。
图1示出了根据本说明书的实施例提供的一些用于抑制回声的音频信号处理系统100(以下简称系统100)的应用场景示意图。系统100可以包括电子设备200和控制设备400。
电子设备200可以存储有执行本说明书描述的用于抑制回声的音频信号处理的方法的数据或指令,并可以执行所述数据和/或指令。在一些实施例中,电子设备200可以是无线耳机、有线耳机、智能穿戴式设备,比如,智能眼镜、智能头盔或者智能腕表等具有语音采集功能以及语音播放功能的设备。电子设备200也可以是移动设备、平板电脑、笔记本电脑、机动车内置装置或类似内容,或其任意组合。在一些实施例中,移动设备可包括智能家居设备、智能移 动设备或类似设备,或其任意组合。比如,所述智能移动设备可包括手机、个人数字辅助、游戏设备、导航设备、超级移动个人计算机(Ultra-mobile Personal Computer,UMPC)等,或其任意组合。在一些实施例中,所述智能家居装置可包括智能电视、台式电脑等,或任意组合。在一些实施例中,机动车中的内置装置可包括车载计算机、车载电视等。
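The control device 400 may be a remote device that communicates audio signals with the electronic device 200 in a wired and/or wireless manner. The control device 400 may also be a local device communicatively connected to the electronic device 200. The electronic device 200 may collect a local audio signal and output it to the control device 400. The electronic device 200 may also receive and output a far-end audio signal sent by the control device 400. The far-end audio signal may also be called the speaker signal. The control device 400 may also be a device with voice collection and voice playback functions, for example a mobile phone, a tablet computer, a notebook computer, an earphone, a smart wearable device, a built-in device of a motor vehicle, or the like, or any combination thereof. For example, when the electronic device 200 is an earphone, the control device 400 may be a terminal device communicatively connected to the earphone, such as a mobile phone or a computer.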
如图1所示,电子设备200可以包括麦克风模组240以及扬声器280。麦克风模组240可以被配置为获取本地音频信号,并输出麦克风信号,也就是携带了音频信息的电子信号。麦克风模组240可以是耳外麦克风模组也可以是耳内麦克风模组。比如,麦克风模组240可以是设置于耳道外的麦克风,也可以是设置在耳道内的麦克风。麦克风模组240可以包括至少一个第一类麦克风242和至少一个第二类麦克风244。第一类麦克风242不同于第二类麦克风244。第一类麦克风242可以是直接采集人体振动信号的麦克风,比如骨传导麦克风。第二类麦克风244可以是直接采集空气振动信号的麦克风,比如气传导麦克风。当然,第一类麦克风242和第二类麦克风244也可以是其他类型的麦克风。比 如第一类麦克风242可以是光学麦克风;第二类麦克风244可以是接收肌电信号的麦克风,等等。由于第一类麦克风242不同于第二类麦克风244,在感知音频信号的表现上便会不同,造成相应的音频信号中的噪音和回声成分会不同。为了方便展示,本披露在下面的陈述中将使用骨传导麦克风作为第一类麦克风242的例子,使用气传导麦克风作为第二类麦克风244的例子。
骨传导麦克风可以包括振动传感器,比如光学振动传感器、加速度传感器等。所述振动传感器可以采集机械振动信号(比如,由用户002说话时皮肤或骨骼产生的振动产生的信号),并将该机械振动信号转换成电信号。这里所说的机械振动信号主要指经由固体传播的振动。骨传导麦克风通过所述振动传感器或与所述振动传感器连接的振动部件与用户002的皮肤或骨骼进行接触,从而采集用户002在发出声音时骨骼或皮肤产生的振动信号,并将振动信号转换为电信号。在一些实施例中,所述振动传感器可以是对机械振动敏感而对空气振动不敏感的装置(即所述振动传感器对于机械振动的响应能力超过所述振动传感器对于空气振动的响应能力)。由于骨传导麦克风能够直接拾取发声部位的振动信号,骨传导麦克风能降低环境噪声的影响。
气传导麦克风通过采集用户002在发出声音时引起的空气振动信号,并将空气振动信号转化为电信号。气传导麦克风可以是单独的一颗气传导麦克风,也可以是由两个及以上的气传导麦克风组成的麦克风阵列。麦克风阵列可以是波束形成麦克风阵列或者其他类似的麦克风阵列。通过麦克风阵列可以采集来自空间不同方向或不同位置的声音。
第一类麦克风242可以输出第一音频信号243。第二类麦克风244可以输出第二音频信号245。所述麦克风信号包括所述第一音频信号243和所述第二音频 信号245。在低噪声场景下,第二音频信号245较第一音频信号243具有更好的语音质量。而在环境噪声较大的场景下,在低频部分第一音频信号243的语音质量更高,而在高频部分第二音频信号245的语音质量更高。因此在环境噪声较大的场景下,将第一音频信号243和第二音频信号245进行特征融合后得到的音频信号具有良好的语音质量。在实际使用过程中,环境的噪声时刻都可能发生变化,在所述低噪声场景和所述高噪声场景之间反复转换。
扬声器280可以将电信号转换为音频信号。扬声器280可以被配置为接收来自控制设备400的所述扬声器信号并输出。为了方便描述,我们将输入扬声器280的音频信号定义为扬声器输入信号。在一些实施例中,所述扬声器输入信号可以是所述扬声器信号。在一些实施例中,电子设备200可以对所述扬声器信号进行信号处理,并将信号处理后的音频信号发送给扬声器280进行输出。此时,所述扬声器输入信号可以是电子设备200对所述扬声器信号进行信号处理后得到的音频信号。
所述扬声器输入信号经扬声器280输出后的声音可以通过空气传导或者骨传导的方式传递给用户002。扬声器280可以是通过向人体传递振动信号以传递声音的扬声器,比如骨传导扬声器,也可以是通过空气传递振动信号的扬声器,比如气传导扬声器。骨传导扬声器通过振动模块产生机械振动,并将所述机械振动经由骨骼传导至耳内。比如,扬声器280可以直接或者通过特定介质(例如,一个或多个面板)接触用户002的头部,并将所述音频信号通过颅骨振动的方式传递给用户的听觉神经。气传导扬声器通过振动模块在空气中产生振动,并将所述空气振动经由空气传导至耳内。扬声器280还可以是骨传导扬声器和气传导扬声器的组合。扬声器280还可以是其他类型的扬声器。所述扬声器输 入信号经扬声器280输出后的声音可能会被麦克风模组240采集,形成回声。所述扬声器输入信号强度越大,扬声器280输出的声音强度越大,所述回声信号越强。
需要说明的是,麦克风模组240和扬声器280可以集成在电子设备200上,也可以是电子设备200的外接式设备。
第一类麦克风242和第二类麦克风244工作时,不仅能够采集到用户002发出的声音,也能采集到环境噪声,还能采集到扬声器280发出的声音。电子设备200可以通过麦克风模组240采集音频信号并生成所述麦克风信号。所述麦克风信号可以包括第一音频信号243和第二音频信号245。不同场景下,第一音频信号243和第二音频信号245的语音质量不同。为保证语音通信质量,电子设备200可以根据不同的应用场景,从多个音频处理模式中选择目标音频处理模式,以从所述麦克风信号中选择语音质量更好的音频信号作为音源信号,并通过所述目标音频处理模式对所述音源信号进行信号处理后输出至控制设备400。所述音源信号可以是所述目标音频处理模式的输入信号。在一些实施例中,所述信号处理可以包括噪声抑制以降低噪声信号。在一些实施例中,所述信号处理可以包括回声抑制以降低回声信号。在一些实施例中,所述信号处理既可以包括所述噪声抑制,也可以包括所述回声抑制。在一些实施例中,所述信号处理也可以是直接输出所述音源信号。为了方便展示,下面的描述中我们将以所述信号处理包括所述回声抑制进行描述。本领域技术人员应当明白,其他信号处理方式都在本说明书的保护范围内。
电子设备200对所述目标音频处理模式的选择,除了与环境噪声有关外,还与所述扬声器信号有关。在一些场景下,比如,所述扬声器信号较小,扬声 器280输出的声音也较小时,第一类麦克风242输出的第一音频信号243和第二类麦克风244输出的第二音频信号245进行特征融合后的音频信号的语音质量优于第二类麦克风244输出的第二音频信号245的语音质量。
而在一些特殊场景下,比如所述扬声器信号较大,扬声器280输出的声音也较大时,对于第一类麦克风242输出的所述第一音频信号243影响较大,导致所述第一音频信号243中的回声较大。在一些实施例中,所述第一音频信号243中的回声信号会超过用户002的语音信号。特别是当扬声器280为骨传导扬声器时,所述第一音频信号243中的回声信号更明显。传统的回声消除算法难以消除所述第一音频信号243中的回声信号,无法保证回声消除的效果。此时,第二类麦克风244输出的第二音频信号245的语音质量优于第一类麦克风242输出的第一音频信号243和第二类麦克风244输出的第二音频信号245进行特征融合后的音频信号的语音质量。
因此,电子设备200可以基于所述扬声器信号从所述多个音频处理模式中选择所述目标音频处理模式对所述麦克风信号进行所述信号处理。所述多个音频处理模式至少可以包括第一模式1和第二模式2。
第一模式1可以对第一音频信号243和第二音频信号245进行信号处理。如前所述,在一些实施例中,所述信号处理可以包括噪声抑制以降低噪声信号。在一些实施例中,所述信号处理可以包括回声抑制以降低回声信号。在一些实施例中,所述信号处理既可以包括所述噪声抑制,也可以包括所述回声抑制。为了方便展示,下面的描述中我们将以所述信号处理包括所述回声抑制进行描述。本领域技术人员应当明白,其他信号处理方式都在本说明书的保护范围内。
第二模式2可以对第二音频信号245进行信号处理。在一些实施例中,所述信号处理可以包括噪声抑制以降低噪声信号。在一些实施例中,所述信号处理可以包括回声抑制以降低回声信号。在一些实施例中,所述信号处理既可以包括所述噪声抑制,也可以包括所述回声抑制。为了方便展示,下面的描述中我们将以所述信号处理包括所述回声抑制进行描述。本领域技术人员应当明白,其他信号处理方式都在本说明书的保护范围内。
所述目标音频处理模式是第一模式1和第二模式2中的一个。所述多个音频处理模式还可以包括其他模式,比如,对第一音频信号243进行信号处理的处理模式。
因此,所述扬声器信号较小时,为了保证应用于语音通信的语音具有较高的质量,电子设备200选用第一模式1,以第一音频信号243和第二音频信号245作为音源信号,并对所述音源信号进行信号处理,生成第一目标音频291并输出,应用于语音通信。所述扬声器信号较大时,为了保证应用于语音通信的语音具有较高的质量,电子设备200选用第二模式2,以第二音频信号245作为音源信号,并对所述音源信号进行信号处理,生成第二目标音频292并输出,应用于语音通信。
电子设备200可以执行本说明书描述的用于抑制回声的音频信号处理的方法的数据或指令,获取所述麦克风信号以及所述扬声器信号;电子设备200可以基于所述扬声器信号的信号强度,选择对应的目标音频处理模式对所述麦克风信号进行信号处理。具体地,电子设备200可以根据所述扬声器信号的强度,从多个音频处理模式中选择与所述扬声器信号强度对应的目标音频处理模式,从所述第一音频信号243和所述第二音频信号245中选择语音质量更好的音频 信号或者其组合作为音源信号,并采用对应的信号处理算法对所述音源信号进行信号处理(比如回声消除以及降噪处理),生成目标音频并输出,以降低所述目标音频中的回声。所述目标音频可以包括第一目标音频291和第二目标音频292中的一个。电子设备200可以将所述目标音频输出至控制设备400。
综上所述,为了保证通信的语音质量,电子设备200可以基于所述扬声器信号的强度,控制并选择目标音频处理模式,从而选择语音质量更好的音频信号作为电子设备200的音源信号,并对所述音源信号进行信号处理,以针对不同的使用场景获取不同的目标音频,从而保证不同使用场景下,所述目标音频的语音质量都是最优的。
图2示出了一种电子设备200的设备示意图。电子设备200可以执行本说明书描述的用于抑制回声的音频信号处理的方法。所述用于抑制回声的音频信号处理的方法在本说明书中的其他部分介绍。比如,在图5至图7的描述中介绍了所述用于抑制回声的音频信号处理的方法。
如图2所示,电子设备200可以包括麦克风模组240和扬声器280。在一些实施例中,电子设备200还可以包括至少一个存储介质230和至少一个处理器220。
存储介质230可以包括数据存储装置。所述数据存储装置可以是非暂时性存储介质,也可以是暂时性存储介质。比如,所述数据存储装置可以包括磁盘、只读存储介质(ROM)或随机存取存储介质(RAM)中的一种或多种。存储介质230还包括存储在所述数据存储装置中的至少一个指令集,用于抑制回声的音频信号处理。所述指令是计算机程序代码,所述计算机程序代码可以包括执 行本说明书提供的用于抑制回声的音频信号处理的方法的程序、例程、对象、组件、数据结构、过程、模块等等。
如图2所示,所述至少一个指令集可以包括控制指令,由控制模块231发出,被配置为基于所述扬声器信号或者所述扬声器信号和所述麦克风信号生成与所述扬声器信号对应的控制信号。所述控制信号包括第一控制信号或第二控制信号。其中,所述第一控制信号与第一模式1相对应。所述第二控制信号与第二模式2相对应。所述控制信号可以是任意的信号,比如,所述第一控制信号可以是信号1,所述第二控制信号可以是信号2,等等。控制模块231发出的控制指令可以根据所述扬声器信号的信号强度或者所述扬声器信号的信号强度以及所述麦克风信号的评价参数,生成相对应的控制信号。所述控制信号与所述扬声器信号或者所述扬声器信号以及所述麦克风信号的对应关系将在后面的描述中详细介绍。控制模块231还可以根据所述控制信号选择与所述控制信号对应的目标音频信号处理模式。当所述控制信号为所述第一控制信号时,控制模块231选择第一模式1;当所述控制信号为所述第二控制信号时,控制模块231选择第二模式2。
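In some embodiments, the at least one instruction set may further include an echo processing instruction, issued by the echo processing module 233 and configured to perform signal processing (such as echo suppression or noise reduction) on the microphone signal in the target audio processing mode of the electronic device 200 based on the control signal. When the control signal is the first control signal, the echo processing module 233 processes the microphone signal in the first mode 1. When the control signal is the second control signal, the echo processing module 233 processes the microphone signal in the second mode 2.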
所述回声处理模块233可以包括第一算法233-1以及第二算法233-8。所述第一算法233-1与所述第一控制信号以及所述第一模式1对应。所述第二算法233-8与所述第二控制信号以及所述第二模式2对应。
在第一模式1中,电子设备200采用第一算法233-1分别对第一音频信号243和第二音频信号245进行信号处理,并将经过所述信号处理后的第一音频信号243和第二音频信号245进行特征融合,输出所述第一目标音频291。
图3示出了根据本说明书的实施例提供的一种第一模式1的工作示意图。如图3所示,在第一模式1中,第一算法233-1可以接收所述第一音频信号243和所述第二音频信号245以及所述扬声器输入信号。第一算法233-1可以使用第一回声消除模块233-2基于所述扬声器输入信号对所述第一音频信号243进行回声消除。所述扬声器输入信号可以是经过降噪处理后的音频信号。第一回声消除模块233-2接收所述第一音频信号243以及所述扬声器输入信号,并输出消除回声后的所述第一音频信号243。第一回声消除模块233-2可以是单麦克风回声消除算法。
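The specification does not name the single-microphone echo cancellation algorithm used by the first echo cancellation module 233-2. As one common choice, the idea of cancelling echo from the first audio signal 243 using the speaker input signal as a reference could be sketched with a normalized least-mean-squares (NLMS) adaptive filter. This is a minimal sketch under that assumption; the function name `nlms_echo_cancel` and the parameters `taps`, `mu`, and `eps` are hypothetical and do not come from the specification.

```python
def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `ref` from `mic`.

    mic: microphone samples (for example, the first audio signal).
    ref: speaker input samples used as the echo reference.
    NLMS is an assumed algorithm; the specification does not prescribe one.
    """
    weights = [0.0] * taps   # adaptive estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    output = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate            # echo-reduced output sample
        norm = sum(h * h for h in history) + eps
        # Normalized update: step size scaled by the reference energy.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        output.append(error)
    return output
```

When the microphone signal contains only a delayed, scaled copy of the reference, the residual shrinks toward zero as the filter converges. With near-end speech present, a practical canceller would also need double-talk handling, which this sketch omits.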
在一些实施例中,第一算法233-1可以使用第二回声消除模块233-3基于所述扬声器输入信号对所述第二音频信号245进行回声消除。第二回声消除模块233-3接收所述第二音频信号245以及所述扬声器输入信号,并输出消除回声后的所述第二音频信号245。第二回声消除模块233-3可以是单麦克风回声消除算法,也可以是多麦克风回声消除算法。第一回声消除模块233-2与第二回声消除模块233-3可以相同也可以不同。
在一些实施例中,第一算法233-1可以使用第一噪声抑制模块233-4对消除回声后的所述第一音频信号243和所述第二音频信号245进行噪声抑制。第一 噪声抑制模块233-4用于抑制所述第一音频信号243和所述第二音频信号245中的噪声信号。第一噪声抑制模块233-4接收消除回声后的所述第一音频信号243和所述第二音频信号245,并输出噪声抑制后的所述第一音频信号243和所述第二音频信号245。第一噪声抑制模块233-4可以单独对所述第一音频信号243和所述第二音频信号245进行降噪,也可以同时对所述第一音频信号243和所述第二音频信号245进行降噪。
在一些实施例中,所述第一算法233-1可以使用特征融合模块233-5对经过噪声抑制的所述第一音频信号243和所述第二音频信号245进行特征融合处理。特征融合模块233-5接收经过降噪处理的所述第一音频信号243和所述第二音频信号245。特征融合模块233-5可以分析所述第一音频信号243和所述第二音频信号245的语音质量。比如,特征融合模块233-5可以分析所述第一音频信号243和所述第二音频信号245中的有效语音信号强度、噪声信号强度、回声信号强度以及信噪比等等,判断所述第一音频信号243和所述第二音频信号245的语音质量,将第一音频信号243和所述第二音频信号245融合成所述第一目标音频291并输出。
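The feature fusion performed by module 233-5 is described only at a high level. Elsewhere the specification states that in noisy scenes the first audio signal 243 has better speech quality at low frequencies and the second audio signal 245 at high frequencies. A minimal sketch under that assumption blends the two spectra around a fixed crossover bin. The function `fuse_spectra`, the fixed crossover, and the linear transition are illustrative assumptions; the actual module may instead weight the signals by measured speech quality.

```python
def fuse_spectra(bone_bins, air_bins, crossover_bin, transition=4):
    """Blend two spectra bin by bin (hypothetical fusion rule).

    bone_bins: spectrum of the first audio signal (bone conduction).
    air_bins:  spectrum of the second audio signal (air conduction).
    Bins well below `crossover_bin` come from the bone-conduction signal,
    bins well above it from the air-conduction signal. `transition` must be
    positive and sets the width, in bins, of the linear blend between them.
    """
    if len(bone_bins) != len(air_bins):
        raise ValueError("spectra must have the same number of bins")
    fused = []
    for k, (bone, air) in enumerate(zip(bone_bins, air_bins)):
        # Weight of the air-conduction bin rises linearly from 0 to 1.
        w_air = min(1.0, max(0.0, (k - crossover_bin) / transition + 0.5))
        fused.append((1.0 - w_air) * bone + w_air * air)
    return fused
```

The bins may be real magnitudes or complex values; the blend is applied identically in both cases.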
在一些实施例中,第一算法233-1还可以使用第二噪声抑制模块233-6对所述扬声器信号进行噪声抑制。第二噪声抑制模块233-6用来抑制所述扬声器信号中的噪声信号。第二噪声抑制模块233-6接收控制设备400发送的所述扬声器信号,消除所述扬声器信号中的远端噪声、信道噪声及电子设备200中的电子噪声等噪声信号,输出经过降噪处理的扬声器处理信号。
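It should be noted that Fig. 3 is only an example. Those skilled in the art will understand that in some embodiments the first algorithm 233-1 may include the feature fusion module 233-5. In other embodiments, the first algorithm 233-1 may further include any one or any combination of the first echo cancellation module 233-2, the second echo cancellation module 233-3, the first noise suppression module 233-4, and the second noise suppression module 233-6.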
在第二模式2中,电子设备200采用第二算法233-8对第二音频信号245进行信号处理,并输出所述第二目标音频292。
图4示出了根据本说明书的实施例提供的一种第二模式2的工作示意图。如图4所示,在第二模式2中,第二算法233-8可以接收所述第二音频信号245以及所述扬声器输入信号。第二算法233-8可以使用第三回声消除模块233-9基于所述扬声器输入信号对所述第二音频信号245进行回声消除。第三回声消除模块233-9接收所述第二音频信号245以及所述扬声器输入信号,并输出消除回声后的所述第二音频信号245。第三回声消除模块233-9可以与第二回声消除模块233-3相同,也可以不同。
在一些实施例中,第二算法233-8可以使用第三噪声抑制模块233-10对消除回声后的所述第二音频信号245进行噪声抑制。第三噪声抑制模块233-10用于抑制所述第二音频信号245中的噪声信号。第三噪声抑制模块233-10接收消除回声后的所述第二音频信号245,并输出噪声抑制后的所述第二音频信号245作为所述第二目标音频292。第三噪声抑制模块233-10可以与第一噪声抑制模块233-4相同,也可以不同。
在一些实施例中,第二算法233-8还可以使用第四噪声抑制模块233-11对所述扬声器信号进行噪声抑制。第四噪声抑制模块233-11用来抑制所述扬声器信号中的噪声信号。第四噪声抑制模块233-11接收控制设备400发送的所述扬声器信号,消除所述扬声器信号中的远端噪声、信道噪声及电子设备200中的电子噪声等噪声信号,输出经过降噪处理的扬声器处理信号。第四噪声抑制模 块233-11可以与第二噪声抑制模块233-6相同,也可以不同。
需要说明的是,图4只是示例性说明。本领域技术人员应该明白,在一些实施例中,第二算法233-8可以包括第三回声消除模块233-9、第三噪声抑制模块233-10和第四噪声抑制模块233-11中的任意一种或其任意组合。在另一些实施例中,第二算法233-8也可以不包括上述任何信号处理模块,直接输出所述第二音频信号245。
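Only one of the first mode 1 and the second mode 2 may run at a time to save computing resources. When the first mode 1 is running, the second mode 2 may be turned off; when the second mode 2 is running, the first mode 1 may be turned off. The first mode 1 and the second mode 2 may also run at the same time, in which case, while one mode is running, the other may update its algorithm parameters. When the electronic device 200 switches between the first mode 1 and the second mode 2, some parameters of the two modes may be shared (for example, the noise parameters obtained by a noise estimation algorithm, the voice parameters obtained by a voice estimation algorithm, and the signal-to-noise ratio parameters obtained by a signal-to-noise ratio algorithm), which saves computing resources and makes the results more accurate. The first algorithm 233-1 and the second algorithm 233-8 of the two modes may also share some parameters with the control instruction issued by the control module 231, such as the same noise, voice, and signal-to-noise ratio parameters, which likewise saves computing resources and makes the results more accurate.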
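In some embodiments, the at least one instruction set may further include a microphone control instruction, executed by the microphone control module 235 and configured to smooth the target audio and output the smoothed target audio to the control device 400. The microphone control module 235 may receive the control signal generated by the control module 231 and the target audio, and smooth the target audio based on the control signal. When the control signal is the first control signal, the first mode 1 runs and the first target audio 291 output by the first algorithm 233-1 is used as the input signal; when the control signal is the second control signal, the second mode 2 runs and the second target audio 292 output by the second algorithm 233-8 is used as the input signal. When the control signal switches between the first control signal and the second control signal, so that the target audio processing mode switches between the first mode 1 and the second mode 2, the microphone control module 235 may smooth the target audio to avoid the discontinuity caused by switching between the first target audio 291 and the second target audio 292. Specifically, the microphone control module 235 may adjust the parameters of the first target audio 291 and the second target audio 292 so that the target audio is continuous. The parameters may be stored in advance in the at least one storage medium 230. The parameters may be amplitude, phase, frequency response, and so on. The adjustment may include adjusting the volume, the equalization (EQ), and the residual noise of the target audio. In this way the target audio remains a continuous signal when the target audio processing mode switches between the first mode 1 and the second mode 2, and the user 002 is unlikely to notice the switch.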
在一些实施例中,所述至少一个指令集还可以包括扬声器控制指令,由扬声器控制模块237执行,被配置为对所述扬声器处理信号进行调整得到所述扬声器输入信号,并将所述扬声器输入信号输出至扬声器280输出声音。扬声器控制模块237可以接收第一算法233-1和第二算法233-8输出的所述扬声器处理信号以及所述控制信号。当所述控制信号为所述第一控制信号时,扬声器控制模块237可以对第一算法233-1输出的所述扬声器处理信号进行控制,使其降低或关闭后再输出至扬声器280进行输出,以降低扬声器280输出的声音,从而降低回声,提升第一算法233-1的回声消除的效果。当所述控制信号为所述第二控制信号时,扬声器控制模块237可以不对第二算法233-8输出的所述扬声器处 理信号进行调整。当所述控制信号在所述第一控制信号和所述第二控制信号之间切换时,为避免扬声器280输出的声音不连续,扬声器控制模块237可以对第一算法233-1和第二算法233-8输出的所述扬声器处理信号进行平滑处理。当所述第一控制信号和所述第二控制信号之间切换时,扬声器控制模块237尽量保证切换的连续性,使用户002不容易感知两者之间的切换。
第一模式1中,第一算法233-1偏重于近端麦克风模组240拾取的用户002的语音质量。当所述扬声器处理信号过大时,通过扬声器控制模块237对所述扬声器处理信号进行处理以降低所述扬声器输入信号,从而降低扬声器280输出的声音,以降低回声确保近端语音质量。第二算法233-8偏重于扬声器280的所述扬声器输入信号,不采用第一类麦克风242输出的所述第一音频信号243来确保扬声器280的所述扬声器输入信号的语音质量以及可懂度。
至少一个处理器220可以同至少一个存储介质230、麦克风模组240和扬声器280通信连接。所述通信连接是指能够直接地或者间接地接收信息的任何形式的连接。至少一个处理器220用以执行上述至少一个指令集。当系统100运行时,至少一个处理器220读取所述至少一个指令集,并且根据所述至少一个指令集的指示获取麦克风模组240以及扬声器280的数据,执行本说明书提供的用于抑制回声的音频信号处理的方法。处理器220可以执行用于抑制回声的音频信号处理的方法包含的所有步骤。处理器220可以是一个或多个处理器的形式,在一些实施例中,处理器220可以包括一个或多个硬件处理器,例如微控制器,微处理器,精简指令集计算机(RISC),专用集成电路(ASIC),特定于应用的指令集处理器(ASIP),中央处理单元(CPU),图形处理单元(GPU),物理处理单元(PPU),微控制器单元,数字信号处理器(DSP),现场可编程门 阵列(FPGA),高级RISC机器(ARM),可编程逻辑器件(PLD),能够执行一个或多个功能的任何电路或处理器等,或其任何组合。仅仅为了说明问题,在本说明书中电子设备200中仅描述了一个处理器220。然而,应当注意,本说明书中电子设备200还可以包括多个处理器,因此,本说明书中披露的操作和/或方法步骤可以如本说明书所述的由一个处理器执行,也可以由多个处理器联合执行。例如,如果在本说明书中电子设备200的处理器220执行步骤A和步骤B,则应该理解,步骤A和步骤B也可以由两个不同处理器220联合或分开执行(例如,第一处理器执行步骤A,第二处理器执行步骤B,或者第一和第二处理器共同执行步骤A和B)。
在一些实施例中,系统100可以根据所述扬声器信号的信号强度选择电子设备200的所述目标音频处理模式。在一些实施例中,系统100可以根据所述扬声器信号的信号强度以及所述麦克风信号选择电子设备200的所述目标音频处理模式。
图5示出了根据本说明书的实施例提供的一种用于抑制回声的音频信号处理方法P100的流程图。所述方法P100为系统100根据所述扬声器信号的信号强度选择电子设备200的所述目标音频处理模式的方法流程图。如图5所示,所述方法P100可以包括通过至少一个处理器220执行:
S120:至少基于所述扬声器信号从第一模式1和第二模式2中选择电子设备200的目标音频处理模式。如前所述,所述目标音频处理模式可以包括第一模式1和第二模式2中的一个。具体地,步骤S120可以包括:
S121:获取所述扬声器信号。
S122:至少基于所述扬声器信号的强度,生成与所述扬声器信号对应的控制信号。所述控制信号包括第一控制信号或第二控制信号。具体地,电子设备200可以接收控制设备400发送的所述扬声器信号,并将所述扬声器信号的强度与预设的扬声器阈值进行对比,并根据对比结果生成所述控制信号。步骤S122可以包括以下情况中的一种:
S122-2:确定所述扬声器信号的强度低于所述扬声器阈值,生成所述第一控制信号;或者
S122-4:确定所述扬声器信号的强度高于预设的扬声器阈值,生成所述第二控制信号。
步骤S120还可以包括:
S124:基于所述控制信号,选择与所述控制信号对应的所述目标音频处理模式。其中,所述第一控制信号与所述第一模式1对应。所述第二控制信号与所述第二模式2对应。当所述控制信号为所述第一控制信号时,选择第一模式1;当所述控制信号为所述第二控制信号时,选择第二模式2。
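Steps S122 and S124 can be sketched as a level measurement followed by a threshold comparison and a lookup. The specification does not say how the strength of the speaker signal is measured; the RMS level in dB used here is an assumption, as are the names `frame_level_db`, `control_signal`, and `select_mode`. The values 1 and 2 follow the remark that the first control signal may be signal 1 and the second control signal may be signal 2.

```python
import math

FIRST_CONTROL = 1    # corresponds to the first mode 1
SECOND_CONTROL = 2   # corresponds to the second mode 2


def frame_level_db(frame, eps=1e-12):
    """RMS level of one frame in dB relative to full scale (assumed metric)."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_square + eps)


def control_signal(speaker_frame, speaker_threshold_db):
    """S122: compare the speaker signal strength with the preset threshold."""
    if frame_level_db(speaker_frame) > speaker_threshold_db:
        return SECOND_CONTROL   # S122-4: strength above the threshold
    return FIRST_CONTROL        # S122-2: strength below the threshold


def select_mode(signal):
    """S124: map the control signal to the target audio processing mode."""
    return {FIRST_CONTROL: "mode_1", SECOND_CONTROL: "mode_2"}[signal]
```

A level exactly equal to the threshold is mapped to the first control signal here; the specification allows either signal in that case.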
当所述扬声器信号的强度高于所述扬声器阈值时,使用第一模式1中的第一算法233-1对所述第一音频信号243和第二音频信号245进行信号处理时,无法在保留较好的人声信号的同时,消除信号中的回声信号,因此得到的所述第一目标音频291的语音质量较差;而使用第二模式2中的第二算法233-8对所述第二音频信号245进行信号处理得到的所述第二目标音频292质量较好。因此,当所述扬声器信号的强度高于所述扬声器阈值时,电子设备200生成与第二模式2对应的所述第二控制信号。
当所述扬声器信号的强度低于所述扬声器阈值时,使用第一模式1中的第一算法233-1对所述第一音频信号243和第二音频信号245进行信号处理时,能够在保留较好的人声信号的同时,消除信号中的回声信号,因此得到的所述第一目标音频291的语音质量较好;而使用第二模式2中的第二算法233-8对所述第二音频信号245进行信号处理得到的所述第二目标音频292质量也较好。因此,当所述扬声器信号的强度低于所述扬声器阈值时,电子设备200生成与第一模式1对应的所述第一控制信号,也可以生成与第二模式2对应的所述第二控制信号。
所述控制信号由控制模块231生成。具体地,电子设备200可以实时监测所述扬声器信号的强度,并与所述扬声器阈值进行对比。电子设备200也可以定时检测所述扬声器信号的强度,并与所述扬声器阈值进行对比。电子设备200还可以在监测到所述扬声器信号的强度发生明显变化,且变化值超过预设范围时,再将所述扬声器信号与所述扬声器阈值进行对比。
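Of the three monitoring strategies above, the third (compare against the threshold only after the strength has changed by more than a preset amount) can be sketched as a small stateful object. The class name `ChangeTriggeredMonitor` and the parameter `min_change_db` are hypothetical, and measuring the change relative to the last compared level is an assumed design choice.

```python
class ChangeTriggeredMonitor:
    """Re-evaluate the control signal only after a significant level change."""

    def __init__(self, threshold_db, min_change_db):
        self.threshold_db = threshold_db
        self.min_change_db = min_change_db
        self.last_level_db = None   # level at the most recent comparison
        self.signal = None          # most recent control signal (1 or 2)

    def update(self, level_db):
        changed = (
            self.last_level_db is None
            or abs(level_db - self.last_level_db) > self.min_change_db
        )
        if changed:
            # Only now is the level compared with the speaker threshold.
            self.last_level_db = level_db
            self.signal = 2 if level_db > self.threshold_db else 1
        return self.signal
```

Small fluctuations around the threshold therefore do not trigger a new comparison, which reduces how often the mode can flip.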
当所述扬声器信号的强度高于所述扬声器阈值时,电子设备200生成所述第二控制信号;当所述扬声器信号发生变化,且所述扬声器信号的强度低于所述扬声器阈值时,电子设备生成所述第一控制信号。当所述扬声器信号的强度低于所述扬声器阈值时,电子设备200生成所述第一控制信号;当所述扬声器信号发生变化,且所述扬声器信号的强度高于所述扬声器阈值时,电子设备生成所述第二控制信号。
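To keep the user 002 from noticing when the control signal switches, the speaker threshold may be a range. The speaker threshold may lie within the range bounded by a first speaker critical value and a second speaker critical value, where the first speaker critical value is smaller than the second. The speaker signal strength being higher than the speaker threshold may mean that it is higher than the second speaker critical value; the speaker signal strength being lower than the speaker threshold may mean that it is lower than the first speaker critical value.
When the speaker signal strength equals the speaker threshold, the electronic device 200 may generate either the first control signal or the second control signal. When the speaker signal strength is higher than the second speaker critical value, the electronic device 200 generates the second control signal; when the strength then falls to between the first and second speaker critical values, the electronic device 200 may continue to generate the second control signal. When the speaker signal strength is lower than the first speaker critical value, the electronic device 200 generates the first control signal; when the strength then rises to between the first and second speaker critical values, the electronic device 200 may continue to generate the first control signal.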
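Treating the speaker threshold as a range between two critical values amounts to hysteresis: the control signal changes only when the strength leaves the range, and is held while the strength stays inside it. A minimal sketch, with the hypothetical name `hysteresis_control` and levels assumed to be in dB:

```python
def hysteresis_control(level_db, previous_signal, low_critical_db, high_critical_db):
    """Return control signal 1 or 2 using a two-sided threshold range.

    low_critical_db:  first speaker critical value (lower bound).
    high_critical_db: second speaker critical value (upper bound).
    """
    if level_db > high_critical_db:
        return 2                 # above the range: second control signal
    if level_db < low_critical_db:
        return 1                 # below the range: first control signal
    return previous_signal       # inside the range: keep the previous signal
```

For example, with critical values of -30 dB and -20 dB and an initial signal of 1, the level sequence -40, -25, -15, -25, -40 yields the control signals 1, 1, 2, 2, 1.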
电子设备200也可以通过机器学习得到控制模型,将所述扬声器信号输入所述控制模型,所述控制模型输出所述控制信号。
所述方法P100还可以包括通过至少一个处理器220执行:
S140:通过所述目标音频处理模式处理所述麦克风信号生成所述目标音频,来至少降低所述麦克风信号中的回声。具体地,步骤S140可以包括以下情况中的一种:
S142:确定所述控制信号为所述第一控制信号,通过与所述第一控制信号对应的所述第一模式1中的第一算法233-1,基于所述扬声器输入信号,对所述第一音频信号243和所述第二音频信号245进行信号处理以及特征融合,生成第一目标音频291。具体过程如前所述,在这里不再赘述。
S144:确定所述控制信号为所述第二控制信号,通过与所述第二控制信号对应的第二模式2中的第二算法233-8,基于所述扬声器输入信号,对所述第二音频信号245进行信号处理。具体过程如前所述,在这里不再赘述。
S160:输出所述目标音频。电子设备200可以直接输出所述目标音频。电子设备200也可以对所述目标音频做平滑处理,以使所述目标音频在所述第一目标音频291和所述第二目标音频292之间切换时,不被用户002感知。具体地,步骤S160可以包括:对所述目标音频做平滑处理并输出经过所述平滑处理的所述目标音频。
具体地,电子设备200可以通过麦克风控制模块235对所述目标音频做平滑处理。当所述目标音频在所述第一目标音频291和所述第二目标音频292之间切换时,麦克风控制模块235可以对所述第一目标音频291和所述第二目标音频292的连接处进行所述平滑处理,即对第一目标音频291和所述第二目标音频292进行信号调节,使得连接处平滑过渡。
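One simple way to smooth the junction between the first target audio 291 and the second target audio 292 is a short linear crossfade over the samples where the two outputs overlap. The specification speaks only of adjusting parameters such as amplitude, phase, and frequency response, so the crossfade and the name `crossfade` are assumptions for illustration.

```python
def crossfade(outgoing_tail, incoming_head):
    """Linearly fade from the outgoing target audio to the incoming one.

    outgoing_tail: last samples of the target audio being switched away from.
    incoming_head: first samples of the target audio being switched to,
                   covering the same time span.
    """
    n = len(outgoing_tail)
    if n != len(incoming_head):
        raise ValueError("both segments must have the same length")
    blended = []
    for i in range(n):
        w_in = (i + 1) / (n + 1)   # weight of the incoming audio, rising toward 1
        blended.append((1.0 - w_in) * outgoing_tail[i] + w_in * incoming_head[i])
    return blended
```

This requires both modes to produce output during the overlap, which matches the embodiment in which the first mode 1 and the second mode 2 run at the same time.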
所述方法P100还可以包括:
S180:基于所述控制信号,控制所述扬声器280的所述扬声器输入信号的强度。具体地,步骤S180可以通过扬声器控制模块237执行。步骤S180可以是通过扬声器控制模块237确定所述控制信号为所述第一控制信号;扬声器控制模块237对所述扬声器处理信号进行处理,降低输入扬声器280的所述扬声器输入信号的强度,从而降低扬声器280输出的声音的强度,以降低所述麦克风信号中的回声信号,以提高所述第一目标音频的语音质量。
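Step S180 can be sketched as a gain applied to the speaker processing signal before it reaches the speaker 280. The amount of attenuation is not given in the specification; the default of 6 dB and the names `speaker_input` and `attenuation_db` are assumptions.

```python
def speaker_input(processed_frame, control, attenuation_db=6.0):
    """Derive the speaker input signal from the speaker processing signal.

    control == 1 (first control signal): attenuate to reduce the echo picked
    up by the microphone module. control == 2: pass the frame through unchanged.
    """
    if control == 1:
        gain = 10.0 ** (-attenuation_db / 20.0)
    else:
        gain = 1.0
    return [gain * x for x in processed_frame]
```

In practice the gain would be ramped over several frames rather than switched abruptly, in line with the smoothing the speaker control module 237 applies when the control signal changes.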
表1示出了图5对应的目标音频处理模式结果图。如表1所示,为了方便对照,我们将场景分为4个场景,分别是第一种:近端声音信号小于阈值(比 如用户002不发出声音)且所述扬声器信号不超过所述扬声器阈值;第二种:近端声音信号大于阈值(比如用户002发出声音)且所述扬声器信号不超过所述扬声器阈值;第三种:近端声音信号小于阈值(比如用户002不发出声音)且所述扬声器信号超过所述扬声器阈值;以及第四种:近端声音信号大于阈值(比如用户002发出声音)且所述扬声器信号超过所述扬声器阈值。其中,近端声音信号是否大于阈值可以通过控制模块231根据所述麦克风信号进行判断。近端声音信号大于阈值可以是用户002发出的音频信号强度超过预设的阈值。所述4个场景对应的目标音频处理模式分别是第一种和第二种对应第一模式1;第三种和第四种对应第二模式2。
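Table 1. Target audio processing mode for each scenario of the method of Fig. 5
Scenario | Near-end sound signal | Speaker signal | Target audio processing mode
1 | below threshold | does not exceed the speaker threshold | first mode 1
2 | above threshold | does not exceed the speaker threshold | first mode 1
3 | below threshold | exceeds the speaker threshold | second mode 2
4 | above threshold | exceeds the speaker threshold | second mode 2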
所述方法P100中,电子设备200可以根据所述扬声器信号所选择电子设备200的目标音频处理模式,以保证电子设备200在任何场景下选择的目标音频处理模式处理的语音质量都是最优的,以保证通话质量。
在一些实施例中,所述目标音频处理模式的选择不仅与所述扬声器信号的回声有关,还可以与环境噪声有关。所述环境噪声可以通过所述麦克风信号中的环境噪声等级和信噪比中的至少一个进行评价。
图6示出了根据本说明书的实施例提供的一种用于抑制回声的音频信号处 理方法P200的流程图。所述方法P200为系统100根据所述扬声器信号的信号强度以及所述麦克风信号选择电子设备200的所述目标音频处理模式的方法流程图。具体地,所述方法P200为系统100根据所述扬声器信号以及所述麦克风信号中的环境噪声等级和信噪比中的至少一个选择所述目标音频处理模式的方法流程图。所述方法P200可以包括通过至少一个处理器220执行:
S220:至少基于所述扬声器信号从第一模式1和第二模式2中选择电子设备200的目标音频处理模式。具体地,步骤S220可以包括:
S222:至少基于所述扬声器信号的强度,生成与所述扬声器信号对应的控制信号。所述控制信号包括第一控制信号或第二控制信号。具体地,步骤S222可以是电子设备200基于所述扬声器信号的强度以及所述麦克风信号中的噪声,生成对应的控制信号。步骤S222可以包括:
S222-2:获取所述扬声器信号和所述麦克风信号的评价参数。其中,所述评价参数可以是所述麦克风信号中的环境噪声评价参数。所述环境噪声评价参数可以包括环境噪声等级以及信噪比中的至少一个。电子设备200可以通过控制模块231获取所述麦克风信号中的环境噪声评价参数。具体地,电子设备200可以根据第一音频信号243和第二音频信号245中的至少一个获取所述环境噪声评价参数。电子设备200可以通过噪声估计算法获取所述环境噪声等级或所述信噪比,本说明书在此不再赘述。
S222-4:基于所述扬声器信号的强度以及所述环境噪声评价参数,生成所述控制信号。具体地,电子设备200可以将所述扬声器信号的强度与预设的扬声器阈值进行对比,以及将所述环境噪声评价参数与预设的噪声评价范围进行对 比,并根据对比结果生成所述控制信号。步骤S222-4可以包括以下情况中的一种:
S222-5:确定所述扬声器信号的强度高于预设的扬声器阈值,生成所述第二控制信号;
S222-6:确定所述扬声器信号的强度低于所述扬声器阈值,且所述环境噪声评价参数处于预设的噪声评价范围外,生成所述第一控制信号;
S222-7:确定所述扬声器信号的强度低于所述扬声器阈值,且所述环境噪声评价参数处于所述噪声评价范围内,生成所述第一控制信号或所述第二控制信号。
其中,所述环境噪声评价参数处于所述噪声评价范围内可以包括所述环境噪声等级低于预设环境噪声阈值,以及所述信噪比高于预设信噪比阈值中的至少一种。此时的环境噪声较小。所述环境噪声评价参数处于所述噪声评价范围外可以包括所述环境噪声等级高于预设环境噪声阈值,以及所述信噪比低于预设信噪比阈值中的至少一种。此时的环境噪声较大。其中,当所述环境噪声评价参数处于所述噪声评价范围外时,即大噪声环境下,所述第一目标音频291的语音质量优于所述第二目标音频292。当所述环境噪声评价参数处于所述噪声评价范围内时,所述第一目标音频291的语音质量与所述第二目标音频292的语音质量相差不大。
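The test of whether the environmental noise evaluation parameter lies within the noise evaluation range can be sketched as follows. The specification leaves the noise estimation algorithm and the threshold values open, so the power-ratio SNR, the default thresholds, and the names `snr_db` and `noise_within_range` are assumptions. When both a noise level and an SNR are supplied, this sketch requires both to indicate low noise, which is also an assumed choice.

```python
import math


def snr_db(signal_power, noise_power, eps=1e-12):
    """Signal-to-noise ratio in dB from estimated signal and noise powers."""
    return 10.0 * math.log10((signal_power + eps) / (noise_power + eps))


def noise_within_range(noise_level_db=None, snr=None,
                       noise_threshold_db=-50.0, snr_threshold_db=15.0):
    """True when the environmental noise is low (parameter within the range)."""
    checks = []
    if noise_level_db is not None:
        checks.append(noise_level_db < noise_threshold_db)   # low noise level
    if snr is not None:
        checks.append(snr > snr_threshold_db)                # high SNR
    if not checks:
        raise ValueError("provide a noise level, an SNR, or both")
    return all(checks)
```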
步骤S220还可以包括:
S224:基于所述控制信号,选择与所述控制信号对应的所述目标音频处理模式。其中,所述第一控制信号与所述第一模式1对应。所述第二控制信号与 所述第二模式2对应。当所述控制信号为所述第一控制信号时,选择第一模式1;当所述控制信号为所述第二控制信号时,选择第二模式2。
当所述扬声器信号的强度高于所述扬声器阈值时,第一模式1中的第一算法233-1对所述第一音频信号243和第二音频信号245进行信号处理时,无法在保留较好的人声信号的同时,消除信号中的回声信号,因此得到的所述第一目标音频291的语音质量较差;而第二模式2中的第二算法233-8对所述第二音频信号245进行信号处理得到的所述第二目标音频292质量较好。因此,当所述扬声器信号的强度高于所述扬声器阈值时,不管所述环境噪声处于什么范围内,电子设备200都生成与第二模式2对应的所述第二控制信号。
当所述扬声器信号的强度低于所述扬声器阈值时,第一模式1中的第一算法233-1对所述第一音频信号243和第二音频信号245进行信号处理时,能够在保留较好的人声信号的同时,消除信号中的回声信号,因此得到的所述第一目标音频291的语音质量较好;而第二模式2中的第二算法233-8对所述第二音频信号245进行信号处理得到的所述第二目标音频292质量也较好。因此,当所述扬声器信号的强度低于所述扬声器阈值时,电子设备200生成的控制信号与环境噪声有关。
当所述环境噪声等级高于所述环境噪声阈值或所述信噪比低于所述信噪比阈值时,代表所述麦克风信号中的环境噪声较大。第一模式1中的第一算法233-1对所述第一音频信号243和第二音频信号245进行信号处理时,能够在保留较好的人声信号的同时,降低信号中的噪声,因此得到的所述第一目标音频291的语音质量较好;而第二模式2中的第二算法233-8对所述第二音频信号245进行信号处理得到的所述第二目标音频292的语音质量不如第一目标音频291 的语音质量。因此,当所述扬声器信号的强度低于所述扬声器阈值,并且所述环境噪声等级高于所述环境噪声阈值或所述信噪比低于所述信噪比阈值时,电子设备200生成与第一模式1对应的所述第一控制信号。
需要说明的是,当环境噪声较小时,即环境噪声评价参数处于所述噪声评价范围之内时,第一目标音频291的语音质量与第二目标音频292的语音质量相差不大。这时,电子设备200可以始终生成所述第二控制信号,以选择所述第二模式2中的第二算法233-8对第二音频信号245进行信号处理,在保证目标音频语音质量的前提下,减少计算量,节约资源。
当所述环境噪声等级低于所述环境噪声阈值或所述信噪比高于所述信噪比阈值时,代表所述麦克风信号中的环境噪声较小。第一模式1中的第一算法233-1对所述第一音频信号243和第二音频信号245进行信号处理时得到的所述第一目标音频291,以及第二模式2中的第二算法233-8对所述第二音频信号245进行信号处理得到的所述第二目标音频292的语音质量都较好。因此,当所述扬声器信号的强度低于所述扬声器阈值,并且所述环境噪声等级低于所述环境噪声阈值或所述信噪比高于所述信噪比阈值时,电子设备200生成所述第一控制信号或所述第二控制信号。具体地,电子设备200可以根据前一场景的控制信号决定当前场景下的控制信号。也就是说,当前一场景下,电子设备生成第一控制信号时,当处在当前场景下时,电子设备也生成第一控制信号,从而保证信号的连续性。反之亦然。
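The three cases S222-5 to S222-7, together with the rule of keeping the previous control signal when either mode is acceptable, can be sketched as one function. The name `control_signal_p200` and the boolean inputs are hypothetical; defaulting to the second control signal when there is no previous signal follows the remark that the second mode 2 saves computation in low noise.

```python
def control_signal_p200(speaker_above_threshold, noise_in_range, previous_signal=None):
    """Control signal for method P200 (1 = first mode, 2 = second mode)."""
    if speaker_above_threshold:
        return 2   # S222-5: strong speaker signal, regardless of noise
    if not noise_in_range:
        return 1   # S222-6: weak speaker signal, high environmental noise
    # S222-7: weak speaker signal and low noise; either signal is acceptable,
    # so keep the previous one for continuity.
    return previous_signal if previous_signal in (1, 2) else 2
```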
所述控制信号由控制模块231生成。具体地,电子设备200可以实时监测所述扬声器信号的强度以及所述环境噪声评价参数,并与所述扬声器阈值和所述噪声评价范围进行对比。电子设备200也可以定时检测所述扬声器信号的强 度以及所述环境噪声评价参数,并与所述扬声器阈值和所述噪声评价范围进行对比。电子设备200还可以在监测到所述扬声器信号的强度或所述环境噪声评价参数发生明显变化,且变化值超过预设范围时,再将所述扬声器信号以及所述环境噪声评价参数与所述扬声器阈值和所述噪声评价范围进行对比。
为了保证所述控制信号切换时不被用户002感知,所述扬声器阈值、所述环境噪声阈值和所述预设信噪比阈值可以是一个范围。所述扬声器阈值如前所述,在此不再赘述。所述环境噪声阈值可以在第一噪声临界值和第二噪声临界值所处的范围内。所述第一噪声临界值小于第二噪声临界值。所述环境噪声等级高于所述环境噪声阈值可以包括所述环境噪声等级高于所述第二噪声临界值。所述环境噪声等级低于所述环境噪声阈值可以包括所述环境噪声等级低于所述第一噪声临界值。所述信噪比阈值可以在第一信噪比临界值和第二信噪比临界值所处的范围内。所述第一信噪比临界值小于第二信噪比临界值。所述信噪比高于所述信噪比阈值可以包括所述信噪比高于所述第二信噪比临界值。所述信噪比低于所述信噪比阈值可以包括所述信噪比低于所述第一信噪比临界值。
所述方法P200可以包括通过至少一个处理器220执行:
S240:通过所述目标音频处理模式处理所述麦克风信号生成目标音频,来至少降低所述麦克风信号中的回声。具体地,步骤S240可以包括以下情况中的一种:
S242:确定所述控制信号为所述第一控制信号,选择所述第一模式1,对所述第一音频信号243和第二音频信号245进行信号处理,生成第一目标音频291。具体地,步骤S242可以与步骤S142一致,在此不再赘述。
S244:确定所述控制信号为所述第二控制信号,选择所述第二模式2,对所述第二音频信号245进行回声抑制,生成第二目标音频292。具体地,步骤S244可以与步骤S144一致,在此不再赘述。
S260:输出所述目标音频。具体地,步骤S260可以与步骤S160一致,在此不再赘述。
所述方法P200还可以包括:
S280:基于所述控制信号,控制所述扬声器280的所述扬声器输入信号的强度。具体地,步骤S280可以与步骤S180一致,在此不再赘述。
表2示出了图6对应的目标音频处理模式结果图。如表2所示,为了方便对照,我们将场景分为8个场景,分别是第一种:近端声音信号小于阈值(比如用户002不发出声音),所述扬声器信号不超过所述扬声器阈值,且环境噪声较小;第二种:近端声音信号大于阈值(比如用户002发出声音),所述扬声器信号不超过所述扬声器阈值,且环境噪声较小;第三种:近端声音信号小于阈值(比如用户002不发出声音),所述扬声器信号超过所述扬声器阈值,且环境噪声较小;以及第四种:近端声音信号大于阈值(比如用户002发出声音),所述扬声器信号超过所述扬声器阈值,且环境噪声较小;第五种:近端声音信号小于阈值(比如用户002不发出声音),所述扬声器信号不超过所述扬声器阈值,且环境噪声较大;第六种:近端声音信号大于阈值(比如用户002发出声音),所述扬声器信号不超过所述扬声器阈值,且环境噪声较大;第七种:近端声音信号小于阈值(比如用户002不发出声音),所述扬声器信号超过所述扬声器阈值,且环境噪声较大;以及第八种:近端声音信号大于阈值(比如用户002发出声音),所述扬声器信号超过所述扬声器阈值,且环境噪声较大。其中,近端 声音信号是否大于阈值可以通过控制模块231根据所述麦克风信号进行判断。近端声音信号大于阈值可以是用户002发出的音频信号强度超过预设的阈值。所述8个场景对应的目标音频处理模式分别是第五种和第六种对应第一模式1;第三种、第四种、第七种和第八种对应第二模式2;其余场景对应第一模式1或第二模式2。
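Table 2. Target audio processing mode for each scenario of the method of Fig. 6
Scenario | Near-end sound signal | Speaker signal | Environmental noise | Target audio processing mode
1 | below threshold | does not exceed the speaker threshold | low | first mode 1 or second mode 2
2 | above threshold | does not exceed the speaker threshold | low | first mode 1 or second mode 2
3 | below threshold | exceeds the speaker threshold | low | second mode 2
4 | above threshold | exceeds the speaker threshold | low | second mode 2
5 | below threshold | does not exceed the speaker threshold | high | first mode 1
6 | above threshold | does not exceed the speaker threshold | high | first mode 1
7 | below threshold | exceeds the speaker threshold | high | second mode 2
8 | above threshold | exceeds the speaker threshold | high | second mode 2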
所述方法P200不仅可以根据扬声器信号控制电子设备200的所述目标音频处理模式,还可以根据近端的环境噪声信号控制所述目标音频处理模式,从而保证在不同场景下,电子设备200输出的语音信号的语音质量都是最佳的,以保证通话质量。
在一些实施例中,所述目标音频处理模式的选择不仅与所述扬声器信号的回声以及环境噪声有关,还可以与用户002说话时的语音信号有关。所述环境噪声信号可以通过所述麦克风信号中的环境噪声等级和信噪比中的至少一个进行评价。用户002说话时的语音信号可以通过所述麦克风信号中的人声信号强度进行评价。所述人声信号强度可以是通过噪声估计算法得到的人声信号强度,所述人声信号强度也可以是经过降噪处理后得到的音频信号的强度。
图7示出了根据本说明书的实施例提供的一种用于抑制回声的音频信号处理方法P300的流程图。所述方法P300为系统100根据所述扬声器信号的信号强度以及所述麦克风信号选择电子设备200的目标音频处理模式的方法流程图。具体地,所述方法P300为系统100根据所述扬声器信号、所述麦克风信号中的人声信号强度以及环境噪声等级和信噪比的至少一个选择所述目标音频处理模式的方法流程图。所述方法P300可以包括通过至少一个处理器220执行:
S320:至少基于所述扬声器信号从第一模式1和第二模式2中选择电子设备200的目标音频处理模式。具体地,步骤S320可以包括:
S322:至少基于所述扬声器信号的强度,生成与所述扬声器信号对应的控制信号。所述控制信号包括第一控制信号或第二控制信号。步骤S320可以是电子设备200基于所述扬声器信号的强度、所述麦克风信号中的噪声以及所述麦克风信号中的人声信号强度,生成对应的控制信号。具体地,步骤S322可以包括:
S322-2:获取所述扬声器信号和所述麦克风信号的评价参数。其中,所述评价参数可以包括所述麦克风信号中的环境噪声评价参数,还可以包括所述麦克风信号中的人声信号强度。所述环境噪声评价参数可以包括环境噪声等级以及信噪比中的至少一个。电子设备200可以通过控制模块231获取所述麦克风信号中的环境噪声评价参数以及人声信号强度。具体地,电子设备200可以根据第一音频信号243和第二音频信号245中的至少一个获取所述评价参数。电子设备200可以通过噪声估计算法获取所述人声信号以及所述环境噪声等级和所述信噪比,本说明书在此不再赘述。
S322-4:基于所述扬声器信号的强度以及所述评价参数,生成所述控制信号。具体地,电子设备200可以将所述扬声器信号的强度与预设的扬声器阈值进行对比,将所述环境噪声评价参数与预设的噪声评价范围进行对比,以及将所述人声信号强度与预设的人声阈值进行对比,并根据对比结果生成所述控制信号。步骤S322-4可以包括以下情况中的一种:
S322-5:确定所述扬声器信号的强度高于预设的扬声器阈值,且所述人声信号强度超过所述人声阈值,所述环境噪声评价参数处于预设的噪声评价范围之外,生成所述第一控制信号;
S322-6:确定所述扬声器信号的强度高于所述扬声器阈值,且所述人声信号强度超过所述人声阈值,所述环境噪声评价参数处于所述噪声评价范围之内,生成所述第二控制信号;
S322-7:确定所述扬声器信号的强度高于所述扬声器阈值,且所述人声信号强度低于所述人声阈值,生成所述第二控制信号;
S322-8:确定所述扬声器信号的强度低于所述扬声器阈值,且所述环境噪声评价参数处于所述噪声评价范围之外,生成所述第一控制信号;
S322-9:确定所述扬声器信号的强度低于所述扬声器阈值,且所述环境噪声评价参数处于所述噪声评价范围内,生成所述第一控制信号或所述第二控制信号。
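The environmental noise evaluation parameter being within the noise evaluation range may include at least one of the environmental noise level being lower than a preset environmental noise threshold and the signal-to-noise ratio being higher than a preset signal-to-noise ratio threshold. In this case the environmental noise is low. The environmental noise evaluation parameter being outside the noise evaluation range may include at least one of the environmental noise level being higher than the preset environmental noise threshold and the signal-to-noise ratio being lower than the preset signal-to-noise ratio threshold. In this case the environmental noise is high. When the parameter is outside the range, that is, in a high-noise environment, the speech quality of the first target audio 291 is better than that of the second target audio 292. When the parameter is within the range, the speech quality of the first target audio 291 differs little from that of the second target audio 292. The speaker threshold, the environmental noise threshold, and the signal-to-noise ratio threshold are as described above and are not repeated here.
The voice signal strength exceeding the voice threshold indicates that the user 002 is speaking. In this case, to ensure the speech quality of the user 002, the electronic device 200 may generate the first control signal and reduce the speaker signal to ensure the speech quality of the first target audio 291.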
所述扬声器阈值、所述环境噪声阈值、所述信噪比阈值以及所述人声阈值可以预先存储在电子设备200中。
步骤S320还可以包括:
S324:基于所述控制信号,选择与所述控制信号对应的所述目标音频处理模式。其中,所述第一控制信号与所述第一模式1对应。所述第二控制信号与所述第二模式2对应。当所述控制信号为所述第一控制信号时,选择第一模式1;当所述控制信号为所述第二控制信号时,选择第二模式2。
当所述扬声器信号的强度高于所述扬声器阈值且所述人声信号强度超过所述人声阈值,所述环境噪声评价参数处于预设的噪声评价范围之外时,证明此时用户002正在说话,并且回声很大,噪声也较大。为了保证用户002的语音质量以及可懂度,电子设备200可降低甚至关闭输入至扬声器280的扬声器输入信号,以降低所述麦克风信号中的回声,保证目标音频的语音质量。此时,第一模式1中的第一算法233-1对所述第一音频信号243和第二音频信号245 进行信号处理得到的所述第一目标音频291的语音质量较第二模式2中的第二算法233-8对第二音频信号245进行信号处理得到的第二目标音频292更好。因此,当所述扬声器信号的强度高于所述扬声器阈值且所述人声信号强度超过所述人声阈值,所述环境噪声评价参数处于预设的噪声评价范围之外时,电子设备200生成与第一模式1对应的所述第一控制信号。在这种情况下,电子设备200可以保证近端用户002的语音质量的可懂度。虽然扬声器输入信号有一部分缺失,但电子设备200能保留扬声器输入信号的大部分语音质量和可懂度,从而提升双方的语音通信质量。
当所述扬声器信号的强度高于所述扬声器阈值且所述人声信号强度低于所述人声阈值或者所述人声信号强度超过所述人声阈值,所述环境噪声评价参数处于预设的噪声评价范围之内时,证明此时用户002没有说话,或者用户002正在说话但噪音较小。此时,第一模式1中的第一算法233-1对所述第一音频信号243和第二音频信号245进行信号处理得到的所述第一目标音频291的语音质量较第二模式2中的第二算法233-8对第二音频信号245进行信号处理得到的第二目标音频292更差。因此,当所述扬声器信号的强度高于所述扬声器阈值且所述人声信号强度低于所述人声阈值或者所述人声信号强度超过所述人声阈值,所述环境噪声评价参数处于预设的噪声评价范围之内时,电子设备200生成与第二模式2对应的所述第二控制信号。
步骤S322-4中的其他情况与步骤S222-4基本一致,在此不再赘述。
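The five cases S322-5 to S322-9 can be collected into one decision function. The name `control_signal_p300` and the boolean inputs are hypothetical; keeping the previous signal in the last case is carried over from the description of method P200 and is an assumption here.

```python
def control_signal_p300(speaker_high, voice_high, noise_in_range, previous_signal=2):
    """Control signal for method P300 (1 = first mode, 2 = second mode)."""
    if speaker_high:
        if voice_high and not noise_in_range:
            return 1        # S322-5: user speaking, strong echo, high noise
        return 2            # S322-6 and S322-7
    if not noise_in_range:
        return 1            # S322-8: weak speaker signal, high noise
    return previous_signal  # S322-9: either signal is acceptable
```

The only difference from method P200 is the first branch: when the user is speaking in high noise, the first mode 1 is chosen even though the speaker signal is strong, and the speaker input signal is reduced instead.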
所述控制信号由控制模块231生成。具体地,电子设备200可以实时监测所述扬声器信号的强度以及所述评价参数,并与所述扬声器阈值、所述噪声评价范围以及所述人声阈值进行对比。电子设备200也可以定时检测所述扬声器 信号的强度以及所述评价参数,并与所述扬声器阈值、所述噪声评价范围以及所述人声阈值进行对比。电子设备200还可以在监测到所述扬声器信号的强度或所述评价参数发生明显变化,且变化值超过预设范围时,再将所述扬声器信号以及所述评价参数与所述扬声器阈值、所述噪声评价范围以及所述人声阈值进行对比。
所述方法P300可以包括通过至少一个处理器220执行:
S340:通过所述目标音频处理模式处理所述麦克风信号生成所述目标音频,来至少降低所述麦克风信号中的回声。具体地,步骤S340可以包括以下情况中的一种:
S342:确定所述控制信号为所述第一控制信号,选择所述第一模式1,对所述第一音频信号243和第二音频信号245进行信号处理,生成第一目标音频291。具体地,步骤S342可以与步骤S142一致,在此不再赘述。
S344:确定所述控制信号为所述第二控制信号,选择所述第二模式2,对所述第二音频信号245进行信号处理,生成第二目标音频292。具体地,步骤S344可以与步骤S144一致,在此不再赘述。
所述方法P300可以包括通过至少一个处理器220执行:
S360:输出所述目标音频。具体地,步骤S360可以与步骤S160一致,在此不再赘述。
所述方法P300还可以包括:
S380:基于所述控制信号,控制所述扬声器280的所述扬声器输入信号的强度。具体地,步骤S380可以与步骤S180一致,在此不再赘述。
表3示出了图7对应的目标音频处理模式结果图。如表3所示,为了方便 对照,我们将场景分为8个场景,分别是第一种:近端声音信号小于阈值(比如用户002不发出声音),所述扬声器信号不超过所述扬声器阈值,且环境噪声较小;第二种:近端声音信号大于阈值(比如用户002发出声音),所述扬声器信号不超过所述扬声器阈值,且环境噪声较小;第三种:近端声音信号小于阈值(比如用户002不发出声音),所述扬声器信号超过所述扬声器阈值,且环境噪声较小;以及第四种:近端声音信号大于阈值(比如用户002发出声音),所述扬声器信号超过所述扬声器阈值,且环境噪声较小;第五种:近端声音信号小于阈值(比如用户002不发出声音),所述扬声器信号不超过所述扬声器阈值,且环境噪声较大;第六种:近端声音信号大于阈值(比如用户002发出声音),所述扬声器信号不超过所述扬声器阈值,且环境噪声较大;第七种:近端声音信号小于阈值(比如用户002不发出声音),所述扬声器信号超过所述扬声器阈值,且环境噪声较大;以及第八种:近端声音信号大于阈值(比如用户002发出声音),所述扬声器信号超过所述扬声器阈值,且环境噪声较大。其中,近端声音信号是否大于阈值可以通过控制模块231根据所述麦克风信号进行判断。近端声音信号大于阈值可以是用户002发出的音频信号强度超过预设的阈值。所述8个场景对应的目标音频处理模式分别是第五种、第六种和第八种对应第一模式1;第三种、第四种和第七种对应第二模式2;其余场景对应第一模式1或第二模式2。
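Table 3. Target audio processing mode for each scenario of the method of Fig. 7
Scenario | Near-end sound signal | Speaker signal | Environmental noise | Target audio processing mode
1 | below threshold | does not exceed the speaker threshold | low | first mode 1 or second mode 2
2 | above threshold | does not exceed the speaker threshold | low | first mode 1 or second mode 2
3 | below threshold | exceeds the speaker threshold | low | second mode 2
4 | above threshold | exceeds the speaker threshold | low | second mode 2
5 | below threshold | does not exceed the speaker threshold | high | first mode 1
6 | above threshold | does not exceed the speaker threshold | high | first mode 1
7 | below threshold | exceeds the speaker threshold | high | second mode 2
8 | above threshold | exceeds the speaker threshold | high | first mode 1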
需要说明的是,方法P200和方法P300适用于不同应用场景。当扬声器信号比近端语音质量重要的场景下,为了保证扬声器信号的质量以及扬声器信号的可懂度可以选择方法P200。当近端语音质量比扬声器信号重要的场景下,为了保证近端语音的语音质量和可懂度可以选择方法P300。
综上所述,系统100、所述方法P100、所述方法P200以及所述方法P300可以针对不同的场景,根据扬声器信号控制电子设备200的目标音频处理模式,从而控制电子设备200的音源信号,使得目标音频在任何场景下的语音质量都是最优的,从而提升语音通信的质量。
需要说明的是,环境噪声的信号强度在各个频率下是不同的。在不同频率下,所述第一目标音频291和所述第二目标音频292的语音质量也是不同的。比如,在第一频率下,所述第一音频信号243和第二音频信号245经过第一算法233-1做信号处理后得到的所述第一目标音频291的语音质量好于所述第二音频信号245经过第二算法233-8做信号处理后得到的第二目标音频292的语音质量。而在除所述第一频率外的其他频率下,所述第一音频信号243和第二音频信号245经过第一算法233-1做信号处理后得到的所述第一目标音频291的语音质量与所述第二音频信号245经过第二算法233-8做信号处理后得到的第二目标音频292的语音质量相近。这时,电子设备200还可以根据所述环境噪声的频率生成所述控制信号。在所述第一频率下生成所述第一控制信号,在除所述第一频率外的其他频率下生成所述第二控制信号。
当所述环境噪声是低频噪声时(比如地铁、公交等一些情况),这时可能出现第一音频信号243和第二音频信号245在第一算法233-1的信号处理下得到的第一目标音频291在低频处的语音信号质量较差,即第一目标音频291在低频时的语音可懂度较差,而在高频时的语音可懂度较高。这时,电子设备200可以根据所述环境噪声的频率控制目标音频处理模式的选择。比如,在低频范围内,电子设备200可以选择所述方法P300控制所述目标音频处理模式,以保证近端用户002的语音被拾取,从而保证近端语音质量;在高频范围内,电子设备200可以选择所述方法P200控制所述目标音频处理模式,以保证近端用户002可以听到所述扬声器信号。
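Another aspect of this specification provides a non-transitory storage medium storing at least one set of executable instructions for controlling the sound source signal. When executed by a processor, the executable instructions direct the processor to carry out the steps of the audio signal processing method for suppressing echo described in this specification. In some possible implementations, aspects of this specification may also be implemented as a program product that includes program code. When the program product runs on the electronic device 200, the program code causes the electronic device 200 to perform the sound source signal control steps described in this specification. A program product for implementing the above method may take the form of a portable compact disc read-only memory (CD-ROM) containing program code and may run on the electronic device 200. The program product of this specification is not limited to this, however; in this specification, a readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system (for example, the processor 220). The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave and carrying readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, and so on, or any suitable combination of the foregoing. Program code for carrying out the operations of this specification may be written in any combination of one or more programming languages, including object-oriented languages such as Java and C++ and conventional procedural languages such as "C" or similar languages. The program code may execute entirely on the electronic device 200, partly on the electronic device 200, as a stand-alone software package, partly on the electronic device 200 and partly on a remote computing device, or entirely on a remote computing device.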
上述对本说明书特定实施例进行了描述。其他实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的动作或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中,多任务处理和并行处理也是可以的或者是可能有利的。
综上所述,在阅读本详细公开内容之后,本领域技术人员可以明白,前述 详细公开内容可以仅以示例的方式呈现,并且可以不是限制性的。尽管这里没有明确说明,本领域技术人员可以理解本说明书需求囊括对实施例的各种合理改变,改进和修改。这些改变,改进和修改旨在由本说明书提出,并且在本说明书的示例性实施例的精神和范围内。
此外,本说明书中的某些术语已被用于描述本说明书的实施例。例如,“一个实施例”,“实施例”和/或“一些实施例”意味着结合该实施例描述的特定特征,结构或特性可以包括在本说明书的至少一个实施例中。因此,可以强调并且应当理解,在本说明书的各个部分中对“实施例”或“一个实施例”或“替代实施例”的两个或更多个引用不一定都指代相同的实施例。此外,特定特征,结构或特性可以在本说明书的一个或多个实施例中适当地组合。
应当理解,在本说明书的实施例的前述描述中,为了帮助理解一个特征,出于简化本说明书的目的,本说明书将各种特征组合在单个实施例、附图或其描述中。然而,这并不是说这些特征的组合是必须的,本领域技术人员在阅读本说明书的时候完全有可能将其中一部分特征提取出来作为单独的实施例来理解。也就是说,本说明书中的实施例也可以理解为多个次级实施例的整合。而每个次级实施例的内容在于少于单个前述公开实施例的所有特征的时候也是成立的。
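Each patent, patent application, publication of a patent application, and other material cited herein, such as articles, books, specifications, publications, documents, and the like, may be incorporated herein by reference in its entirety for all purposes, except for any prosecution file history associated with it, any of it that is inconsistent with or in conflict with this document, and any of it that may have a limiting effect on the broadest scope of the claims now or later associated with this document. For example, if there is any inconsistency or conflict between the description, definition, and/or use of a term associated with any of the incorporated material and that associated with this document, the term as used in this document prevails.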
最后,应理解,本文公开的申请的实施方案是对本说明书的实施方案的原理的说明。其他修改后的实施例也在本说明书的范围内。因此,本说明书披露的实施例仅仅作为示例而非限制。本领域技术人员可以根据本说明书中的实施例采取替代配置来实现本说明书中的申请。因此,本说明书的实施例不限于申请中被精确地描述过的实施例。

Claims (17)

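  1. An audio signal processing method for suppressing echo, characterized by comprising:
    selecting a target audio processing mode of an electronic device from a plurality of audio processing modes based at least on a speaker signal, the speaker signal being an audio signal sent to the electronic device by a control device;
    processing a microphone signal in the target audio processing mode to generate target audio, so as to at least reduce echo in the target audio, the microphone signal being an output signal of a microphone module obtained by the electronic device, and the microphone module comprising at least one first-type microphone and at least one second-type microphone; and
    outputting the target audio.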
  2. 如权利要求1所述的音频信号处理方法,其特征在于,
    所述至少一个第一类麦克风输出第一音频信号;以及
    所述至少一个第二类麦克风输出第二音频信号,
    其中,所述麦克风信号包括所述第一音频信号和所述第二音频信号。
  3. 如权利要求2所述的音频信号处理方法,其特征在于,
    所述至少一个第一类麦克风用于采集人体振动信号;以及
    所述至少一个第二类麦克风用于采集空气振动信号。
  4. 如权利要求2所述的音频信号处理方法,其特征在于,所述多个音频处理模式至少包括:
    第一模式,对所述第一音频信号和所述第二音频信号进行信号处理;以及
    第二模式,对所述第二音频信号进行信号处理。
  5. 如权利要求4所述的音频信号处理方法,其特征在于,所述至少基于扬声器信号从多个音频处理模式中选择电子设备的目标音频处理模式,包括:
    至少基于所述扬声器信号的强度,生成与所述扬声器信号对应的控制信号,所述控制信号包括第一控制信号或第二控制信号;以及
    基于所述控制信号,选择与所述控制信号对应的目标音频处理模式,其中,所述第一模式与所述第一控制信号对应,所述第二模式与所述第二控制信号对应。
  6. 如权利要求5所述的音频信号处理方法,其特征在于,所述至少基于所述扬声器信号的强度,生成与所述扬声器信号对应的控制信号,包括:
    确定所述扬声器信号的强度低于预设的扬声器阈值,生成所述第一控制信号;或者
    确定所述扬声器信号的强度高于所述扬声器阈值,生成所述第二控制信号。
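For illustration only, and not as part of the claims, the threshold rule of claims 5 and 6 can be sketched in a few lines of Python. The names `ControlSignal`, `rms`, and `control_signal` are hypothetical, the use of frame RMS as the "strength" measure is an assumption (the claims do not fix a measure), and treating a strength exactly equal to the threshold as "higher" is likewise an assumption.

```python
import math
from enum import Enum


class ControlSignal(Enum):
    FIRST = 1   # selects the first mode (first and second audio signals)
    SECOND = 2  # selects the second mode (second audio signal only)


def rms(frame):
    """Root-mean-square level of one frame of samples (assumed strength measure)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def control_signal(speaker_frame, speaker_threshold):
    """Map the speaker-signal strength to a control signal as in claim 6."""
    # The claim covers only "lower" and "higher"; equality falling into
    # the second branch is an assumption made for this sketch.
    if rms(speaker_frame) < speaker_threshold:
        return ControlSignal.FIRST
    return ControlSignal.SECOND
```

A caller would evaluate `control_signal` once per frame of the speaker signal and switch the processing mode whenever the returned value changes.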
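  7. The audio signal processing method according to claim 5, characterized in that the generating, based at least on the strength of the speaker signal, a control signal corresponding to the speaker signal comprises:
    generating the corresponding control signal based on the strength of the speaker signal and the microphone signal.
  8. The audio signal processing method according to claim 7, characterized in that the generating the corresponding control signal based on the strength of the speaker signal and the microphone signal comprises:
    obtaining an evaluation parameter of the microphone signal, the evaluation parameter comprising an ambient noise evaluation parameter, the ambient noise evaluation parameter comprising at least one of an ambient noise level and a signal-to-noise ratio; and
    generating the control signal based on the strength of the speaker signal and the evaluation parameter.
  9. The audio signal processing method according to claim 8, characterized in that the generating the control signal based on the strength of the speaker signal and the evaluation parameter comprises one of the following cases:
    determining that the strength of the speaker signal is higher than a preset speaker threshold, and generating the second control signal;
    determining that the strength of the speaker signal is lower than the speaker threshold and the ambient noise evaluation parameter is outside a preset noise evaluation range, and generating the first control signal; and
    determining that the strength of the speaker signal is lower than the speaker threshold and the ambient noise evaluation parameter is within the noise evaluation range, and generating the first control signal or the second control signal.
  10. The audio signal processing method according to claim 9, characterized in that the ambient noise evaluation parameter being within the noise evaluation range comprises at least one of the following cases:
    the ambient noise level is lower than a preset ambient noise threshold; and
    the signal-to-noise ratio is higher than a preset signal-to-noise ratio threshold.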
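The three cases of claim 9, together with the range test of claim 10, amount to a small decision table. The sketch below is illustrative only and not part of the claims. The function names, the string labels `"first"` and `"second"`, and the `in_range_choice` parameter are hypothetical; the claim allows either control signal in the third case, so the caller chooses. A speaker strength exactly equal to the threshold is handled as "lower", which is an assumption.

```python
def noise_in_range(noise_level=None, snr=None, *, noise_threshold, snr_threshold):
    """Claim 10: within range if at least one of the two conditions holds."""
    quiet = noise_level is not None and noise_level < noise_threshold
    clean = snr is not None and snr > snr_threshold
    return quiet or clean


def select_control(speaker_strength, speaker_threshold, in_range,
                   in_range_choice="second"):
    """Return "first" or "second" following the three cases of claim 9."""
    if speaker_strength > speaker_threshold:
        return "second"            # case 1: strong speaker signal
    if not in_range:
        return "first"             # case 2: weak speaker signal, noise out of range
    return in_range_choice         # case 3: either control signal is permitted
```

Usage: compute `in_range` with `noise_in_range(...)` from whichever of the noise level and signal-to-noise ratio is available, then pass it to `select_control` along with the current speaker strength.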
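  11. The audio signal processing method according to claim 8, characterized in that the evaluation parameter further comprises a voice signal strength, and the generating the control signal based on the strength of the speaker signal and the evaluation parameter comprises one of the following cases:
    determining that the strength of the speaker signal is higher than a preset speaker threshold, the voice signal strength exceeds a preset voice threshold, and the ambient noise evaluation parameter is outside a preset noise evaluation range, and generating the first control signal;
    determining that the strength of the speaker signal is higher than the speaker threshold, the voice signal strength exceeds the voice threshold, and the ambient noise evaluation parameter is within the noise evaluation range, and generating the second control signal;
    determining that the strength of the speaker signal is higher than the speaker threshold and the voice signal strength is lower than the voice threshold, and generating the second control signal;
    determining that the strength of the speaker signal is lower than the speaker threshold and the ambient noise evaluation parameter is outside the noise evaluation range, and generating the first control signal; and
    determining that the strength of the speaker signal is lower than the speaker threshold and the ambient noise evaluation parameter is within the noise evaluation range, and generating the first control signal or the second control signal.
  12. The audio signal processing method according to claim 11, characterized in that the ambient noise evaluation parameter being within the noise evaluation range comprises at least one of the following cases:
    the ambient noise level is lower than a preset ambient noise threshold; and
    the signal-to-noise ratio is higher than a preset signal-to-noise ratio threshold.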
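The five cases of claim 11 extend the decision with the voice signal strength. The following sketch is illustrative only and not part of the claims. The function name, the string labels, and the `in_range_choice` parameter are hypothetical, and values exactly equal to a threshold are handled as "not higher", which is an assumption.

```python
def select_control_with_voice(speaker_strength, voice_strength, in_range, *,
                              speaker_threshold, voice_threshold,
                              in_range_choice="second"):
    """Return "first" or "second" following the five cases of claim 11.

    in_range is True when the ambient noise evaluation parameter lies
    within the preset noise evaluation range (see claim 12).
    """
    if speaker_strength > speaker_threshold:
        # Strong speaker signal: the first control signal is generated only
        # when the user's voice is strong AND the noise is out of range.
        if voice_strength > voice_threshold and not in_range:
            return "first"
        return "second"
    # Weak speaker signal: same outcome as claim 9, cases 2 and 3.
    if not in_range:
        return "first"
    return in_range_choice
```

Written this way, the voice strength matters only in the strong-speaker branch, which mirrors how the claim lists it only in the first three cases.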
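  13. The audio signal processing method according to claim 5, characterized in that the generating target audio comprises:
    performing signal processing on the first audio signal and the second audio signal using a first algorithm of the first mode to generate first target audio; or
    performing signal processing on the second audio signal using a second algorithm of the second mode to generate second target audio,
    wherein the target audio comprises one of the first target audio and the second target audio.
  14. The audio signal processing method according to claim 13, characterized in that the outputting the target audio comprises:
    smoothing the target audio, wherein, when the target audio switches between the first target audio and the second target audio, the smoothing is applied at the junction between the first target audio and the second target audio; and
    outputting the smoothed target audio.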
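Claim 14 does not name a particular smoothing technique. One common option is a linear crossfade over a short overlap at the junction, sketched below for illustration only. The function name `crossfade` is hypothetical, and the sketch assumes that both modes produce output for the same overlap samples at the moment of switching.

```python
def crossfade(previous_tail, next_head):
    """Linearly blend the end of the outgoing target audio into the
    start of the incoming target audio over their common length."""
    n = min(len(previous_tail), len(next_head))
    blended = []
    for i in range(n):
        # Weight of the incoming audio rises from 1/(n+1) to n/(n+1),
        # so neither stream is cut in abruptly at the junction.
        w = (i + 1) / (n + 1)
        blended.append((1.0 - w) * previous_tail[i] + w * next_head[i])
    return blended
```

A longer overlap gives a smoother transition at the cost of a longer interval during which both modes must run in parallel.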
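  15. The audio signal processing method according to claim 5, characterized in that the method further comprises:
    controlling, based on the control signal, a strength of a speaker input signal of the speaker.
  16. The audio signal processing method according to claim 15, characterized in that the controlling, based on the control signal, the strength of the speaker input signal of the speaker comprises:
    determining that the control signal is the first control signal, and reducing the strength of the speaker input signal fed to the speaker, thereby reducing the strength of the sound output by the speaker.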
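As an illustration only, and not part of the claims, the strength reduction of claim 16 can be sketched as a gain applied to the speaker input frame. The function name, the string labels, and the default `reduced_gain` of 0.5 are hypothetical; the claims do not specify by how much the strength is reduced.

```python
def scale_speaker_input(frame, control, reduced_gain=0.5):
    """Attenuate the speaker input frame when the first control signal is active."""
    if not 0.0 <= reduced_gain <= 1.0:
        raise ValueError("reduced_gain must lie in [0, 1]")
    # Only the first control signal triggers attenuation (claim 16);
    # any other value leaves the frame unchanged.
    gain = reduced_gain if control == "first" else 1.0
    return [gain * x for x in frame]
```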
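  17. An audio signal processing system for echo suppression, characterized by comprising:
    at least one storage medium storing at least one instruction set for audio signal processing for echo suppression; and
    at least one processor communicatively connected to the at least one storage medium,
    wherein, when the system runs, the at least one processor reads the at least one instruction set and, as directed by the at least one instruction set, performs the audio signal processing method for echo suppression according to any one of claims 1 to 16.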

Priority Applications (6)

Application Number Priority Date Filing Date Title
PCT/CN2020/140215 WO2022140928A1 (zh) 2020-12-28 2020-12-28 Audio signal processing method and system for suppressing echo
CN202080104434.XA CN116158090A (zh) 2020-12-28 2020-12-28 Audio signal processing method and system for suppressing echo
KR1020237018110A KR20230098282A (ko) 2020-12-28 2020-12-28 Audio signal processing method and system for echo suppression
JP2023533789A JP2023551556A (ja) 2020-12-28 2020-12-28 Audio signal processing method and system for echo suppression
EP20967280.7A EP4270987A1 (en) 2020-12-28 2020-12-28 Audio signal processing method and system for suppressing echo
US17/397,797 US20220208207A1 (en) 2020-12-28 2021-08-09 Audio signal processing method and system for echo suppression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/140215 WO2022140928A1 (zh) 2020-12-28 2020-12-28 Audio signal processing method and system for suppressing echo

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/397,797 Continuation US20220208207A1 (en) 2020-12-28 2021-08-09 Audio signal processing method and system for echo suppression

Publications (1)

Publication Number Publication Date
WO2022140928A1 (zh) 2022-07-07

Family

ID=82117732

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/140215 WO2022140928A1 (zh) 2020-12-28 2020-12-28 Audio signal processing method and system for suppressing echo

Country Status (6)

Country Link
US (1) US20220208207A1 (zh)
EP (1) EP4270987A1 (zh)
JP (1) JP2023551556A (zh)
KR (1) KR20230098282A (zh)
CN (1) CN116158090A (zh)
WO (1) WO2022140928A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
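CN113421580B (zh) * 2021-08-23 2021-11-05 深圳市中科蓝讯科技股份有限公司 Noise reduction method, storage medium, chip, and electronic device
WO2024092453A1 (zh) * 2022-10-31 2024-05-10 北京小米移动软件有限公司 Wind noise measurement method, apparatus, device, and storage medium
CN116221160A (zh) * 2023-01-06 2023-06-06 歌尔股份有限公司 Fan noise adjustment method and apparatus, head-mounted display device, and storage medium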

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1715669A1 (en) * 2005-04-19 2006-10-25 Ecole Polytechnique Federale De Lausanne (Epfl) A method for removing echo in an audio signal
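CN106686496A (zh) * 2016-12-27 2017-05-17 广东小天才科技有限公司 Playback mode control method for a wearable device, and wearable device
CN107078403A (zh) * 2014-10-20 2017-08-18 株式会社村田制作所 Wireless communication module
CN107889007A (zh) * 2017-10-27 2018-04-06 恒玄科技(上海)有限公司 Active noise reduction method and system for eliminating the influence of the noise reduction path on played sound
US10187504B1 (en) * 2016-09-23 2019-01-22 Apple Inc. Echo control based on state of a device
CN110110616A (zh) * 2019-04-19 2019-08-09 出门问问信息科技有限公司 Electronic device and control method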

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7171003B1 (en) * 2000-10-19 2007-01-30 Lear Corporation Robust and reliable acoustic echo and noise cancellation system for cabin communication
US9711127B2 (en) * 2011-09-19 2017-07-18 Bitwave Pte Ltd. Multi-sensor signal optimization for speech communication
US11418874B2 (en) * 2015-02-27 2022-08-16 Harman International Industries, Inc. Techniques for sharing stereo sound between multiple users

Also Published As

Publication number Publication date
EP4270987A1 (en) 2023-11-01
JP2023551556A (ja) 2023-12-08
US20220208207A1 (en) 2022-06-30
CN116158090A8 (zh) 2024-05-24
KR20230098282A (ko) 2023-07-03
CN116158090A (zh) 2023-05-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 20967280; Country of ref document: EP; Kind code of ref document: A1
REG Reference to national code. Ref country code: BR; Ref legal event code: B01A; Ref document number: 112023003585; Country of ref document: BR
ENP Entry into the national phase. Ref document number: 20237018110; Country of ref document: KR; Kind code of ref document: A
WWE Wipo information: entry into national phase. Ref document number: 2023533789; Country of ref document: JP
WWE Wipo information: entry into national phase. Ref document number: 2020967280; Country of ref document: EP
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2020967280; Country of ref document: EP; Effective date: 20230728
ENP Entry into the national phase. Ref document number: 112023003585; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20230227