WO2024016229A1 - Audio processing method and electronic device

Audio processing method and electronic device

Info

Publication number
WO2024016229A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
noise
target
noise ratio
audio signal
Prior art date
Application number
PCT/CN2022/106850
Other languages
English (en)
French (fr)
Inventor
张立斌
刘畅
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to PCT/CN2022/106850 priority Critical patent/WO2024016229A1/zh
Publication of WO2024016229A1 publication Critical patent/WO2024016229A1/zh

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316: Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude
    • G10L21/0324: Details of processing therefor
    • G10L21/034: Automatic adjustment

Definitions

  • the embodiments of the present application relate to the technical field of terminal equipment, and in particular, to an audio processing method and electronic equipment.
  • this application provides an audio processing method and electronic device.
  • the audio signal can be adjusted based on the signal-to-noise ratio between the audio signal and the noise signal to stabilize the signal-to-noise ratio of the output audio signal and improve the user's listening experience of the audio output by the electronic device.
  • embodiments of the present application provide an audio processing method, which is applied to electronic devices.
  • the method includes: obtaining a target noise signal corresponding to the environmental sound; obtaining a first audio signal to be output; determining a target signal-to-noise ratio corresponding to the first audio signal and the target noise signal; determining, based on the target signal-to-noise ratio, the first audio signal and the target noise signal, a target gain signal corresponding to the first audio signal; adjusting the first audio signal based on the target gain signal to obtain a second audio signal; and outputting the second audio signal.
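  • For illustration only, the following minimal Python sketch (not part of the patent text) shows one way such a frame-based pipeline could be wired together; the function name, the power-based SNR estimate, and the dB-domain gain rule are assumptions of this sketch rather than the claimed method.

```python
import numpy as np

def process_audio_frame(first_audio, target_noise, target_snr_db):
    """Sketch: derive a gain from the target SNR and apply it to the
    first audio signal, yielding the second audio signal."""
    eps = 1e-12
    signal_power = np.mean(first_audio ** 2) + eps
    noise_power = np.mean(target_noise ** 2) + eps
    current_snr_db = 10.0 * np.log10(signal_power / noise_power)
    gain_db = target_snr_db - current_snr_db  # gain needed to reach the target SNR
    return first_audio * 10.0 ** (gain_db / 20.0)

# Usage: keep a 440 Hz tone 15 dB above the ambient noise for one 10 ms frame.
fs = 48000
t = np.arange(0, 0.01, 1.0 / fs)
music = 0.1 * np.sin(2 * np.pi * 440 * t)
noise = 0.05 * np.random.randn(t.size)
second_audio = process_audio_frame(music, noise, target_snr_db=15.0)
```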
  • the electronic device can be a mobile phone, a headset, a call service network element, etc. This application does not limit this.
  • the environmental sound is the sound of the environment where the electronic device is located.
  • the target noise signal may be a noise signal of environmental noise collected by an electronic device, or may be a noise signal of environmental noise after noise reduction processing, or may be a compensated noise signal.
  • the target signal-to-noise ratio is the signal-to-noise ratio expected by the electronic device between the first audio signal and the target noise signal, which the electronic device determines by combining the target noise signal and the first audio signal.
  • the first audio signal may be an audio signal of music or an audio signal of a call, which is not limited in this application.
  • the first audio signal is also called an "effective audio signal".
  • Embodiments of the present application can adjust the audio signal to be output by the electronic device based on the signal-to-noise ratio (SNR), so that the signal-to-noise ratio between the gain-adjusted audio signal and the target noise signal is more stable, and changes in the decibel intensity of the ambient noise around the electronic device do not affect the user's reception of the sound signal corresponding to the audio signal.
  • when the volume of the ambient sound suddenly becomes louder, this embodiment can make the volume of the sound signal corresponding to the audio signal become louder as well, so that the user's reception of the audio signal is not affected; in addition, when the volume of the ambient sound suddenly becomes smaller, embodiments of the present application can reduce the volume of the sound signal corresponding to the audio signal, thereby preventing the amplitude of the audio signal from being too high and affecting the user's hearing. In this way, the environmental noise signal can be combined to dynamically maintain the stability of the signal-to-noise ratio and improve the listening experience.
  • determining a target signal-to-noise ratio corresponding to the first audio signal and the target noise signal includes: dividing the first audio signal into a plurality of first subbands; dividing the target noise signal into a plurality of second subbands, wherein the frequency bands corresponding to the plurality of first subbands and the plurality of second subbands are the same; and determining a plurality of first signal-to-noise ratios between the plurality of first subbands and the plurality of second subbands, wherein each first signal-to-noise ratio is the signal-to-noise ratio between a first subband and a second subband corresponding to the same frequency band.
  • the target signal-to-noise ratio may include a signal-to-noise ratio between corresponding subbands of the audio signal and the noise signal.
  • the number of the first subbands and the second subbands is the same, wherein the corresponding frequency bands between the first subbands and the second subbands corresponding to each other are the same.
  • the target signal-to-noise ratio includes the plurality of first signal-to-noise ratios.
  • the first signal-to-noise ratio is also called the signal-to-noise ratio of a subband, such as the signal-to-noise ratio SNRi of subband i.
  • the electronic device may determine the first signal-to-noise ratio for the first subband and the second subband that correspond to each other.
  • the target noise signal N is divided into 20 sub-bands n_i, namely sub-band n_1 to sub-band n_20;
  • the audio signal S is divided into 20 sub-bands s_i, which are sub-bands s_1 to sub-band s_20 respectively;
  • the frequency band i corresponding to subband n_i is the same as the frequency band i corresponding to subband s_i.
  • subband i may represent subband n_i and subband s_i.
  • i is an integer greater than or equal to 1 and less than or equal to 20.
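  • As a non-authoritative sketch of the 20-subband example above: the equal-width rFFT band split and the dB power SNR below are assumptions of this illustration; the patent does not prescribe how the subbands are formed.

```python
import numpy as np

def subband_snrs(audio_s, noise_n, n_subbands=20):
    """Split S and N into matching frequency subbands s_i / n_i and
    return the per-subband SNR_i (in dB) for i = 1..n_subbands."""
    eps = 1e-12
    S = np.fft.rfft(audio_s)
    N = np.fft.rfft(noise_n)
    edges = np.linspace(0, S.size, n_subbands + 1, dtype=int)
    snrs_db = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        s_pow = np.sum(np.abs(S[lo:hi]) ** 2) + eps   # power of subband s_i
        n_pow = np.sum(np.abs(N[lo:hi]) ** 2) + eps   # power of subband n_i
        snrs_db.append(10.0 * np.log10(s_pow / n_pow))
    return snrs_db
```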
  • the electronic device can adjust the gain of the first audio signal based on the SNR of each sub-band, and can take into account the differences in gain adjustment details between different frequency components (which can be represented based on the sub-bands), to ensure that the SNR of the output audio signal within different subbands is more stable.
  • determining a plurality of first signal-to-noise ratios between the plurality of first subbands and the plurality of second subbands includes: determining, based on a masking curve, a plurality of first signal-to-noise ratios between the plurality of first subbands and the plurality of second subbands.
  • the masking curve is a masking curve for human hearing, which can be any masking curve in the prior art, and is not limited in this application.
  • the signal-to-noise ratio of the subband is determined based on the masking curve, which can be more consistent with the listening perception of human ears and improve the listening experience.
  • determining a plurality of first signal-to-noise ratios between the plurality of first subbands and the plurality of second subbands based on the masking curve includes: determining a plurality of second signal-to-noise ratios between the plurality of first sub-bands and the plurality of second sub-bands, wherein each second signal-to-noise ratio is the signal-to-noise ratio between a first sub-band and a second sub-band corresponding to the same frequency band; determining, based on the masking curve, the amplitude thresholds respectively corresponding to the frequency bands of the plurality of first sub-bands; and determining a plurality of first signal-to-noise ratios between the plurality of first subbands and the plurality of second subbands based on the plurality of second signal-to-noise ratios and the plurality of amplitude thresholds.
  • the second signal-to-noise ratio here may be the initially set signal-to-noise ratio of sub-band i.
  • This embodiment may determine the amplitude threshold corresponding to the frequency band of each sub-band i based on the masking curve, where, for a given sub-band i, the electronic device determines one amplitude threshold corresponding to that sub-band i.
  • the electronic device can determine, based on the amplitude threshold corresponding to each sub-band i and the initially set second signal-to-noise ratio corresponding to each sub-band i, whether to adjust the second signal-to-noise ratio corresponding to sub-band i. If the second signal-to-noise ratio is adjusted based on the amplitude threshold corresponding to sub-band i, the adjusted second signal-to-noise ratio can serve as the first signal-to-noise ratio of sub-band i, such as the above-mentioned signal-to-noise ratio SNRi; conversely, if it is determined that the second signal-to-noise ratio corresponding to sub-band i does not need to be adjusted, the second signal-to-noise ratio itself may serve as the first signal-to-noise ratio corresponding to sub-band i, such as the above-mentioned signal-to-noise ratio SNRi.
  • the electronic device can determine the amplitude threshold corresponding to each sub-band of the audio signal based on the masking curve, and refer to that amplitude threshold, in a manner that satisfies the physiological perception of human hearing, to determine whether the second signal-to-noise ratio corresponding to each sub-band needs further adjustment, so as to determine the target signal-to-noise ratio corresponding to each subband.
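  • A minimal sketch of this threshold check follows; the specific adjustment rule (raising the SNR by a fixed 3 dB step when a subband amplitude falls below its masking-derived threshold) is an assumption of the illustration, since the patent text does not give a formula.

```python
def apply_masking_thresholds(second_snrs_db, subband_amps, amp_thresholds):
    """For each subband i, decide from the masking-derived amplitude
    threshold whether the initially set second SNR needs adjustment;
    the result is the first SNR (SNRi) of the subband."""
    first_snrs_db = []
    for snr_db, amp, thr in zip(second_snrs_db, subband_amps, amp_thresholds):
        if amp >= thr:
            # No adjustment needed: the second SNR is used as SNRi.
            first_snrs_db.append(snr_db)
        else:
            # Assumed rule: raise the SNR so the subband stays audible.
            first_snrs_db.append(snr_db + 3.0)
    return first_snrs_db
```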
  • determining a plurality of first signal-to-noise ratios between the plurality of first subbands and the plurality of second subbands includes: for the first subband and the second subband corresponding to the same frequency band, determining a third signal-to-noise ratio between a third audio signal and a first noise signal corresponding to the same time frame, wherein the first subband includes the third audio signal and the second subband includes the first noise signal; and determining, based on the third signal-to-noise ratio, the first signal-to-noise ratio between the first subband and the second subband corresponding to the same frequency band.
  • one subband may include multiple signals corresponding to each time frame.
  • the first subband may include a plurality of third audio signals corresponding to different time frames.
  • the second subband may include a plurality of first noise signals corresponding to different time frames.
  • the electronic device may compare the third audio signal corresponding to the same time frame with the first noise signal to obtain the third signal-to-noise ratio.
  • the electronic device may average or weight the third signal-to-noise ratios of the time frames corresponding to the first subband and the second subband, or select the third signal-to-noise ratio corresponding to any one time frame; the result is the first signal-to-noise ratio between the first sub-band and the second sub-band, such as the above-mentioned signal-to-noise ratio SNRi.
  • the first signal-to-noise ratio corresponding to each subband may be determined for each subband based on time frames. It can be understood that, as with the above-mentioned determination of the second signal-to-noise ratio, the subband signal-to-noise ratio determined based on time frames in this embodiment can also serve as the initial signal-to-noise ratio of the subband.
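  • A sketch of the time-frame variant, assuming plain averaging of the per-frame third SNRs (the patent equally allows weighting or selecting a single frame):

```python
import numpy as np

def subband_snr_from_frames(third_audio_frames, first_noise_frames):
    """third_audio_frames / first_noise_frames: per-time-frame signals of
    one first subband and the matching second subband. Returns the first
    SNR (SNRi) of the subband as the mean of the per-frame third SNRs."""
    eps = 1e-12
    third_snrs = []
    for s, n in zip(third_audio_frames, first_noise_frames):
        s_pow = np.mean(np.asarray(s) ** 2) + eps
        n_pow = np.mean(np.asarray(n) ** 2) + eps
        third_snrs.append(10.0 * np.log10(s_pow / n_pow))
    return float(np.mean(third_snrs))
```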
  • the target signal-to-noise ratio includes the plurality of first signal-to-noise ratios; determining a target gain signal corresponding to the first audio signal based on the target signal-to-noise ratio, the first audio signal and the target noise signal includes: determining, based on the plurality of first signal-to-noise ratios, the plurality of first subbands and the plurality of second subbands, a first gain signal corresponding to each of the first sub-bands, wherein the target gain signal includes a plurality of first gain signals corresponding to the plurality of first sub-bands.
  • the electronic device can perform gain adjustment on each sub-band corresponding to the first audio signal.
  • the first signal-to-noise ratio corresponding to each sub-band i can be used as the target signal-to-noise ratio of each sub-band i.
  • the target signal-to-noise ratio is used to determine the gain signal of each first sub-band in the first audio signal to ensure that the signal-to-noise ratio of each sub-band of the first audio signal is stable within the sub-band.
  • adjusting the first audio signal based on the target gain signal to obtain the second audio signal includes: adjusting the gains of the corresponding plurality of first subbands based on the plurality of first gain signals to obtain a plurality of third subbands, wherein each third subband is a first subband after gain adjustment; and synthesizing the plurality of third subbands into the second audio signal.
  • when performing gain adjustment on a first subband based on the first gain signal corresponding to that first subband in the first audio signal, gain adjustment may be performed on each audio signal in the first subband. In addition, after gain adjustment is performed on each first sub-band divided from the first audio signal, the gain-adjusted first sub-bands can be re-synthesized into a complete audio signal, referred to here as the second audio signal.
  • the target gain signal here includes the above-mentioned plurality of first gain signals.
  • the electronic device can adjust the gain of each sub-band according to the gain signal of each sub-band corresponding to the audio signal to be output, so as to achieve a stable signal-to-noise ratio of each sub-band in the audio signal and thereby a stable signal-to-noise ratio of the overall audio signal.
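  • The per-subband gain application and resynthesis could look like the following sketch; the frequency-domain gains over equal-width rFFT bands are an assumption of the illustration.

```python
import numpy as np

def adjust_and_resynthesize(first_audio, first_gains_db):
    """Apply one first-gain value per first subband, producing the third
    subbands, then synthesize them back into the second audio signal."""
    S = np.fft.rfft(first_audio)
    edges = np.linspace(0, S.size, len(first_gains_db) + 1, dtype=int)
    for (lo, hi), g_db in zip(zip(edges[:-1], edges[1:]), first_gains_db):
        S[lo:hi] *= 10.0 ** (g_db / 20.0)   # third subband = gained first subband
    return np.fft.irfft(S, n=first_audio.size)  # second audio signal
```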
  • the method further includes: determining a clarity index based on the plurality of first signal-to-noise ratios; and adjusting the target signal-to-noise ratio based on the clarity index, wherein the adjusted target signal-to-noise ratio is used to determine the target gain signal.
  • the target signal-to-noise ratio can be the overall target signal-to-noise ratio between the first audio signal and the target noise signal, or it can be the target signal-to-noise ratio between the first audio signal and the target noise signal in each sub-band (such as the first signal-to-noise ratio mentioned above).
  • the electronic device can determine the clarity index based on the first signal-to-noise ratio corresponding to each sub-band in the first audio signal, and use the clarity index to adjust the target signal-to-noise ratio, so that the adjusted target signal-to-noise ratio is used to determine the target gain signal.
  • using the clarity index to adjust the target signal-to-noise ratio can help improve the user's listening experience of the output audio signal.
  • the method further includes: adjusting the target signal-to-noise ratio based on the decibels of the target noise signal and a preset noise threshold, wherein the adjusted target signal-to-noise ratio is used to determine the target gain signal.
  • the electronic device can adjust the target signal-to-noise ratio based on the decibel intensity of the target noise signal of the ambient sound, so as to enhance the electronic device's ability to adaptively maintain output sensitivity under different ambient noise intensities.
  • the target signal-to-noise ratio includes a fourth signal-to-noise ratio; determining the target signal-to-noise ratio corresponding to the first audio signal and the target noise signal includes: taking a first average value of the plurality of first signal-to-noise ratios, and using the first average value as the fourth signal-to-noise ratio.
  • the electronic device can ensure the stability of the overall signal-to-noise ratio between the first audio signal and the target noise signal by adjusting the gain of the first audio signal.
  • the method of determining the overall signal-to-noise ratio may be to average the first signal-to-noise ratios corresponding to the sub-bands i as the overall signal-to-noise ratio, so that the target gain signal for the first audio signal is determined based on the overall signal-to-noise ratio, which can keep the overall signal-to-noise ratio stable.
  • the target signal-to-noise ratio includes a fifth signal-to-noise ratio; determining the target signal-to-noise ratio corresponding to the first audio signal and the target noise signal includes: determining, based on time frames, a second average value of the signal-to-noise ratio between the first audio signal and the target noise signal, and using the second average value as the fifth signal-to-noise ratio.
  • the electronic device can ensure the stability of the overall signal-to-noise ratio between the first audio signal and the target noise signal by adjusting the gain of the first audio signal.
  • the method of determining the overall signal-to-noise ratio can be based on time frames. Specifically, the electronic device can calculate the ratio of the first audio signal and the target noise signal corresponding to the same time frame as their signal-to-noise ratio in that time frame; then, the electronic device can average the signal-to-noise ratios of the time frames (or weight and sum them, or select the signal-to-noise ratio corresponding to one time frame) as the overall signal-to-noise ratio between the first audio signal and the target noise signal.
  • the electronic device thereby adjusts the gain of the first audio signal based on the overall signal-to-noise ratio to ensure the stability of the overall signal-to-noise ratio of the audio signal.
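  • The fourth and fifth signal-to-noise ratios reduce to two different averages; a sketch, assuming unweighted means:

```python
import numpy as np

def fourth_snr(first_snrs_db):
    """First average: mean over the per-subband first SNRs."""
    return float(np.mean(first_snrs_db))

def fifth_snr(audio_frames, noise_frames):
    """Second average: mean over per-time-frame SNRs of the whole signals."""
    eps = 1e-12
    per_frame_db = [10.0 * np.log10((np.mean(s ** 2) + eps) / (np.mean(n ** 2) + eps))
                    for s, n in zip(audio_frames, noise_frames)]
    return float(np.mean(per_frame_db))
```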
  • obtaining the target noise signal corresponding to the environmental sound includes: obtaining a second noise signal corresponding to the environmental sound; and processing the second noise signal based on an acoustic transfer function to obtain the target noise signal.
  • the electronic device can use the acoustic transfer function to compensate for the second noise signal as the target noise signal of the environment to improve the user's listening perception.
  • the method further includes: processing the first audio signal or the second audio signal based on the spatial distance between the human ear and the electronic device and the Green's function.
  • the electronic device can compensate the audio signal to improve the user's listening perception.
  • embodiments of the present application provide an electronic device.
  • the electronic device includes: a memory and a processor, the memory being coupled to the processor; the memory stores program instructions, and when the program instructions are executed by the processor, the electronic device executes the method in any one of the first aspect and its implementations.
  • embodiments of the present application provide a computer-readable medium for storing a computer program.
  • when the computer program is run on an electronic device, it causes the electronic device to execute the method in any one of the first aspect and its implementations.
  • embodiments of the present application provide a chip, which includes one or more interface circuits and one or more processors; the interface circuit is used to receive signals from the memory of the electronic device and send them to the processor, the signals including computer instructions stored in the memory; when the processor executes the computer instructions, the electronic device is caused to execute the method in any one of the first aspect and its implementations.
  • embodiments of the present application provide a computer program product containing instructions; when the computer program product is run on a computer, it causes the computer to execute the method in any one of the first aspect and its implementations.
  • Figure 1 is a schematic structural diagram of an exemplary electronic device
  • Figure 2 is a schematic diagram of the software structure of an exemplary electronic device
  • Figure 3 is a schematic diagram illustrating an exemplary user receiving audio
  • Figure 4 is a schematic diagram of an exemplary audio processing process of an electronic device
  • Figure 5a is a schematic diagram of an exemplary call scenario
  • Figure 5b is a schematic diagram of an exemplary call scenario
  • Figure 5c is a schematic diagram of an exemplary earphone
  • Figure 6 is an exemplary comparative diagram before and after audio signal processing
  • Figure 7 is a schematic diagram of an exemplary call scenario
  • Figure 8 is a schematic diagram of an exemplary call scenario
  • Figure 9 is a schematic structural diagram of a device provided by an embodiment of the present application.
  • A and/or B can represent three situations: A exists alone, A and B exist simultaneously, or B exists alone.
  • first and second in the description and claims of the embodiments of this application are used to distinguish different objects, rather than to describe a specific order of objects.
  • first target object, the second target object, etc. are used to distinguish different target objects, rather than to describe a specific order of the target objects.
  • multiple processing units refer to two or more processing units; multiple systems refer to two or more systems.
  • FIG. 1 shows a schematic structural diagram of an electronic device 100 .
  • the electronic device 100 shown in FIG. 1 is only an example of an electronic device.
  • the electronic device 100 may be a terminal, which may also be called a terminal device.
  • the terminal may be a cellular phone, a tablet computer (pad), a wearable device (such as headphones), an Internet of Things device, etc., which is not limited in this application.
  • the electronic device 100 may have more or fewer components than shown in the figure, may combine two or more components, or may have different component configurations.
  • the various components shown in Figure 1 may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
  • the electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • different processing units can be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate operation control signals based on the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory, avoiding repeated access and reducing the waiting time of the processor 110, thus improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • Interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • processor 110 may include multiple sets of I2C buses.
  • the processor 110 can separately couple the touch sensor 180K, charger, flash, camera 193, etc. through different I2C bus interfaces.
  • the processor 110 can be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the electronic device 100 .
  • the I2S interface can be used for audio communication.
  • processor 110 may include multiple sets of I2S buses.
  • the processor 110 can be coupled with the audio module 170 through the I2S bus to implement communication between the processor 110 and the audio module 170 .
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface to implement the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communications to sample, quantize and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is generally used to connect the processor 110 and the wireless communication module 160 .
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface to implement the function of playing music through a Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
  • MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc.
  • the processor 110 and the camera 193 communicate through the CSI interface to implement the shooting function of the electronic device 100 .
  • the processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the electronic device 100 .
  • the USB interface 130 is an interface that complies with the USB standard specification, and may be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through them. This interface can also be used to connect other electronic devices, such as AR devices, etc.
  • the interface connection relationships between the modules illustrated in the embodiments of the present application are only schematic illustrations and do not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive wireless charging input through the wireless charging coil of the electronic device 100 . While the charging management module 140 charges the battery 142, it can also provide power to the electronic device through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, internal memory 121, external memory, display screen 194, camera 193, wireless communication module 160, etc.
  • the power management module 141 can also be used to monitor battery capacity, battery cycle times, battery health status (leakage, impedance) and other parameters.
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example: Antenna 1 can be reused as a diversity antenna for a wireless LAN. In other embodiments, antennas may be used in conjunction with tuning switches.
  • the mobile communication module 150 can provide solutions for wireless communication including 2G/3G/4G/5G applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation.
  • at least part of the functional modules of the mobile communication module 150 may be disposed in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs sound signals through audio devices (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 194.
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110 and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is an image processing microprocessor and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 194 is used to display images, videos, etc.
  • Display 194 includes a display panel.
  • the display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the electronic device 100 can implement the shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • Camera 193 is used to capture still images or video.
  • the object passes through the lens to produce an optical image that is projected onto the photosensitive element.
  • the photosensitive element can be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other format image signals.
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, such as saving music, video, and other files in the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes instructions stored in the internal memory 121 to execute various functional applications and data processing of the electronic device 100 .
  • the internal memory 121 may include a program storage area and a data storage area. Among them, the program storage area can store an operating system and at least one application program required for a function (such as a sound playback function, an image playback function, etc.).
  • the storage data area may store data created during use of the electronic device 100 (such as audio data, phone book, etc.).
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc.
  • the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. Such as music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
  • the speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 170A, or listen to hands-free calls.
  • the receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • when the electronic device 100 answers a call or a voice message, the voice can be heard by bringing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C and input the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which in addition to collecting sound signals, may also implement a noise reduction function. In other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions, etc.
  • the headphone interface 170D is used to connect wired headphones.
  • the headphone interface 170D may be a USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
  • OMTP open mobile terminal platform
  • CTIA Cellular Telecommunications Industry Association of the USA
  • the buttons 190 include a power button, a volume button, etc.
  • the button 190 may be a mechanical button, or it may be a touch button.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for vibration prompts for incoming calls and can also be used for touch vibration feedback.
  • touch operations for different applications can correspond to different vibration feedback effects.
  • the motor 191 can also respond to different vibration feedback effects for touch operations in different areas of the display screen 194 .
  • Different application scenarios such as time reminders, receiving information, alarm clocks, games, etc.
  • the touch vibration feedback effect can also be customized.
  • the indicator 192 may be an indicator light, which may be used to indicate charging status, power changes, or may be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be connected to or separated from the electronic device 100 by inserting it into the SIM card interface 195 or pulling it out from the SIM card interface 195 .
  • the electronic device 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different.
  • the SIM card interface 195 is also compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communications.
  • the electronic device 100 uses an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100 .
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiment of this application takes the Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100 .
  • FIG. 2 is a software structure block diagram of the electronic device 100 according to the embodiment of the present application.
  • the layered architecture of the electronic device 100 divides the software into several layers, and each layer has clear roles and division of labor.
  • the layers communicate through software interfaces.
  • the Android system is divided into four layers, from top to bottom: application layer, application framework layer, Android runtime and system libraries, and kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include camera, gallery, calendar, calling, map, navigation, WLAN, Bluetooth, music, video, short message and other applications.
  • the application framework layer provides an application programming interface (API) and programming framework for applications in the application layer.
  • API application programming interface
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, content provider, view system, phone manager, resource manager, notification manager, etc.
  • a window manager is used to manage window programs.
  • the window manager can obtain the display size, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make this data accessible to applications.
  • Said data can include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, etc.
  • a view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the electronic device 100 .
  • for example, call status management (connected, hung up, etc.).
  • the resource manager provides various resources to applications, such as localized strings, icons, pictures, layout files, video files, etc.
  • the notification manager allows applications to display notification information in the status bar, which can be used to convey notification-type messages and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also present notifications in the status bar at the top of the system in the form of charts or scroll-bar text, such as notifications of applications running in the background, or notifications that appear on the screen in the form of dialog windows; for example, text information is prompted in the status bar, a beep sounds, the electronic device vibrates, or the indicator light flashes.
  • the system library and runtime layer include system libraries and Android Runtime.
  • System libraries can include multiple functional modules. For example: surface manager (surface manager), media libraries (Media Libraries), 3D graphics processing libraries (for example: OpenGL ES), 2D graphics engines (for example: SGL), etc.
  • the Android runtime includes core libraries and a virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.
  • the core library contains two parts: one is the functional functions that need to be called by the Java language, and the other is the core library of Android.
  • the application layer and the application framework layer run in virtual machines. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform object life cycle management, stack management, thread management, security and exception management, garbage collection and other functions.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as static image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, composition, and layer processing.
  • 2D Graphics Engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • the components included in the system framework layer, system library and runtime layer shown in Figure 2 do not constitute specific limitations on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown in the figures, or some components may be combined, some components may be separated, or some components may be arranged differently.
  • Audio signals can be divided into valid audio signals and noise signals.
  • the effective audio signal is the audio signal to be output by the electronic device.
  • the music audio signal received by the mobile phone from the application server or the music audio signal to be played saved on the mobile phone itself can be a valid audio signal.
  • the voice audio signal received by mobile phone 1 from mobile phone 2 is also a valid audio signal.
  • the noise signal can be environmental sound collected by electronic equipment, etc.
  • the human ear can not only hear the effective audio signal S played by the earphones, but also hear the noise signal, such as the environmental noise signal N of the environment where the user is located.
  • the microphone of the earphone can collect the environmental noise signal N.
  • the headset worn by the local user and connected to the mobile phone can play the voice audio signal from the peer during the call (which can be used as the effective audio signal S here).
  • the headset can also play the music audio signal (which can be used as a valid audio signal S).
  • the user's perception of the audio will be affected by the unstable environmental noise around the local user, causing the volume of the effective audio signal heard by the user to fluctuate between loud and quiet; it is not stable enough, and the listening experience is poor.
  • automatic gain control (AGC) uses an effective combination of linear amplification and compression to dynamically adjust the output audio signal.
  • AGC can control the sound level by changing the compression ratio between the input and output amplitudes of the audio signal (also called the gain of the audio signal), so that the final output sound amplitude is always maintained within a constant range. For example, when a weak audio signal is input, the amplitude of the audio signal can be increased by increasing the gain of the audio signal to ensure the intensity of the output audio signal; when the input audio signal reaches a certain intensity, the amplitude of the output audio signal is reduced by reducing the gain of the audio signal. Therefore, AGC can automatically adjust the amplitude of the output audio signal by changing the gain of the audio signal, preventing the volume of the audio signal heard by the user from swinging between loud and quiet, making the listening experience more stable and comfortable.
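  • For contrast with the method of this application, a toy AGC loop is sketched below (the target RMS level and gain cap are assumptions of the illustration); note that it scales whatever it is given, signal and noise alike.

```python
import numpy as np

def agc(frame, target_rms=0.1, max_gain=4.0):
    """Boost weak frames and attenuate strong ones so the output
    amplitude stays near a constant level."""
    rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
    gain = min(target_rms / rms, max_gain)  # cap the gain for near-silent input
    return gain * frame
```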
  • however, when processing audio signals in this way, the audio signals are not distinguished, for example, between valid audio signals and noise signals; instead, automatic gain control is performed jointly on the audio signals acquired by the electronic device (including the valid audio signal S and the noise signal N) to maintain the amplitude of the audio signal within a certain range.
  • this processing method only considers the stability of the overall audio signal: increasing or decreasing the amplitude of the effective audio signal also increases or decreases the amplitude of the noise signal N. So in a scenario where the external environmental noise intensity is not stable, the volume of the effective audio signal heard by the user is still sometimes loud and sometimes quiet, and the volume is not stable enough, which affects the user's listening experience.
  • this application provides an audio processing method and electronic device.
  • This method ensures the stability of the signal-to-noise ratio (SNR, signal to noise ratio) of the output audio.
  • the method of this application can ensure the stability of the signal-to-noise ratio of the audio signal (the ratio of the effective audio signal S to the ambient noise N), so as to improve the user's listening comfort in scenarios such as listening to music or making phone calls and enhance the listening experience.
  • FIG. 4 is a flowchart illustrating the steps of the audio processing method of the electronic device of the present application.
  • This audio processing method (such as the process shown in Figure 4) can be applied to voice or video call scenarios, audio playback scenarios, and other scenarios, not listed here, in which an electronic device needs to output audio; these are not described one by one here.
  • This application takes the voice call scenario as an example to illustrate the above method.
  • this method is applied to other scenarios that require audio output, the method principles are similar and will not be described again here.
  • Figure 5a is a schematic diagram of an exemplary call scenario.
  • the call service network element 1 can collect the voice signal of the local user (here, user 1), used here as the effective audio signal S2, and send it to the call service network element 2.
  • the call service network element 2 can output the effective audio signal S2 from the call service network element 1, so that the user 2 on the call service network element 2 side can hear the call voice of the user 1.
  • the call service network element 2 can collect the voice signal of the user 2 (herein referred to as the effective audio signal S1), and send the effective audio signal S1 to the call service network element 1.
  • the call service network element 1 can output the effective audio signal S1 so that the user 1 can hear the call voice of the user 2. In this way, voice calls between users at both ends are realized.
  • the number of call service network elements can be more than two, for example, in a conference scenario or a traffic scenario.
  • the principles are similar, and this application does not limit this.
  • the call service network element may be a traffic system device or a conference system device, or a mobile phone, or a software module or hardware chip integrated in the mobile phone, etc. This application does not limit this.
  • the call service network element can directly collect the audio of the user on its side or the noise signal on its side without using equipment such as headphones, and it can also directly play the audio signal from the peer call service network element without using equipment such as headphones.
  • for example, the call service network element may be a mobile phone or a software or hardware module integrated in the mobile phone; in this case, the call service network element and the user on its side can play audio and collect audio and noise through media such as headphones.
  • Figure 5b is a schematic diagram illustrating a scenario in which the call service network element 1 interacts with the user 1 through the earphone.
  • the interaction process between the call service network element 2 and its connected headset is the same as that described in Figure 5b and will not be described again here.
  • the headset worn by user 1 may include an external microphone 201 and a speaker 202.
  • the external microphone 201 in Figure 5b can collect the voice signal of the user 1, for example, as the effective audio signal S2 in Figure 5a, and send it to the call service network element 1.
  • the headset can also collect environmental noise as a noise signal N (as an uplink signal) and send it to the call service network element 1 .
  • the noise signal N and the effective audio signal S2 are generally not collected at the same time: at some moments the earphone collects the effective audio signal S2, and at other moments the earphone collects the noise signal N.
• the call service network element 1 may also send the received effective audio signal S (for example, the effective audio signal S1 shown in Figure 5a) to the earphone as a downlink signal.
• the call service network element 1, or the headset, can execute the process of Figure 4 to perform gain processing on the effective audio signal S to be output (for example, the effective audio signal S1) in combination with the noise signal N, so as to improve the stability of the signal-to-noise ratio between the effective audio signal S and the noise signal N.
  • the process may include the following steps:
  • the earphones can acquire environmental noise, and the collected environmental noise can be used as the noise signal N here.
  • the earphones can also perform noise reduction processing on the acquired environmental noise, and use the collected environmental noise after noise reduction processing as the noise signal N here. This application does not limit this.
  • the headset not only includes the speaker 202 and the external microphone 201 shown in Figure 5b, but also includes a built-in microphone 203.
• the headset can collect the reference sensor signal x(n) for this environmental noise (basically equivalent to the external direct noise d(n)).
  • the adaptive filter of the headset can process the reference sensor signal x(n) to obtain the inverted signal y(n) of the reference sensor signal x(n).
  • the headset can play the reverse signal y(n) through the speaker 202.
• the ear of user 1, who wears the headset, is near the position of the speaker 202, so the human ear hears not only the external direct noise d(n) but also the inverted signal y(n) that is opposite to it.
• The inverted signal y(n) played by the speaker can offset part of the external direct noise d(n) in the environment, so that what remains in the environment of the speaker 202 shown in Figure 5c(1) is the residual noise signal (here, the error sensor signal e(n)).
  • the built-in microphone 203 can collect the error sensor signal e(n) in the environment.
• the headset can use the error sensor signal e(n) collected by the built-in microphone 203 as the noise signal N, or use the reference sensor signal x(n) collected by the external microphone 201 as the noise signal N; this application does not limit this.
  • the noise signal N collected by the earphone may be the reference sensor signal x(n) collected by the microphone 201 shown in FIG. 5c, or the error sensor signal e(n) collected by the microphone 203 shown in FIG. 5c.
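• The adaptive-filter processing described above can be sketched compactly. The following is a minimal illustration only, assuming a plain LMS update; a production ANC headset would typically use a filtered-x variant that models the speaker-to-ear path, which this text does not detail, and the tap count and step size here are hypothetical.

```python
import numpy as np

def lms_anc(x, d, taps=32, mu=0.01):
    """Toy LMS sketch: x is the reference sensor signal x(n) and d is the
    external direct noise d(n) at the ear. The filter learns y(n) ~ d(n);
    playing the inverted y(n) then cancels part of d(n), leaving the
    residual e(n) that the built-in microphone 203 would pick up."""
    w = np.zeros(taps)                 # adaptive filter weights
    y = np.zeros(len(x))               # estimate of the direct noise
    e = np.zeros(len(x))               # residual (error sensor signal)
    for n in range(taps, len(x)):
        x_win = x[n - taps:n][::-1]    # most recent reference samples
        y[n] = w @ x_win               # filter output
        e[n] = d[n] - y[n]             # residual after cancellation
        w += 2 * mu * e[n] * x_win     # LMS weight update
    return y, e
```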
  • the headset can compensate the collected noise signal N so that the compensated noise signal is N_true. Then when the headset executes the following S105, the compensated noise signal N_true can be processed as the noise signal in S105 to improve the user's listening perception.
  • the earphone can compensate the noise signal N based on the acoustic transfer function H( ⁇ ) to obtain N_true.
• For example, N_true = N * H(ω).
  • the acoustic transfer function H( ⁇ ) can be configured in the headset or the call service network element 1 shown in Figure 5b, and this application does not limit this.
  • the acoustic transfer function H( ⁇ ) can be obtained based on fitting a large number of headphones, and then stored in the headphones or the call service network element 1 .
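• As a minimal sketch of the compensation N_true = N * H(ω) described above, the multiplication can be carried out per frequency bin, so the collected noise frame is first transformed to the frequency domain. How H(ω) is sampled and stored is not specified in this text, so the array layout assumed below is illustrative only.

```python
import numpy as np

def compensate_noise(n_time, h_freq):
    """Compensate a collected noise frame with the acoustic transfer
    function. n_time: time-domain noise samples (length L); h_freq:
    H(w) sampled on the same rfft grid (length L//2 + 1, an assumption).
    Returns the compensated noise N_true in the time domain."""
    N = np.fft.rfft(n_time)                  # N(w)
    N_true = N * h_freq                      # N_true(w) = N(w) * H(w)
    return np.fft.irfft(N_true, n=len(n_time))
```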
• the headset can obtain the audio signal to be output from the call service network element 1 (for example, where the call service network element 1 is a mobile phone connected to the headset via Bluetooth), referred to here as the effective audio signal S.
  • this application does not limit the execution order of S101 and S103.
  • S105 Determine a target signal-to-noise ratio based on the noise signal and the effective audio signal.
  • the headset may determine the target signal-to-noise ratio of the noise signal N and the effective audio signal S based on the noise signal N and the effective audio signal S to be output.
• the target signal-to-noise ratio may be the overall target signal-to-noise ratio SNR_t of the two signals acquired in S101 and S103, and/or it may include the target signal-to-noise ratio SNRi of each sub-band corresponding to the noise signal N and the effective audio signal S, where i = 1, 2, ..., k and k is the number of sub-bands.
  • the subband may correspond to a frequency band, and the specific definition may refer to the existing technology, which is not limited in this application.
• when the earphone determines the sub-bands corresponding to the noise signal N and the effective audio signal S, it can convert both the noise signal N and the effective audio signal S from time-domain signals into frequency-domain signals and divide each into multiple sub-bands.
  • the noise signal N is divided into 20 sub-bands n_i, namely sub-band n_1 to sub-band n_20;
  • the effective audio signal S is divided into 20 sub-bands s_i, namely sub-band s_1 to sub-band s_20;
  • the frequency band i corresponding to subband n_i is the same as the frequency band i corresponding to subband s_i.
  • subband i may represent subband n_i and subband s_i.
• For example, if sub-band n_i corresponds to frequency band 1, then sub-band s_i also corresponds to frequency band 1.
  • frequency band 1 includes multiple frequency points from f1 to f4, and this application does not limit this.
  • the method by which the earphone converts the time domain signal into the frequency domain signal may be Fourier transform or other methods, which is not limited in this application.
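• A minimal sketch of the sub-band split described above, assuming a Fourier transform and k equal-width bands; the text does not fix the band edges, so the equal split is an assumption. Applying the same split to both signals guarantees that sub-band i of S and sub-band i of N cover the same frequency band.

```python
import numpy as np

def split_subbands(sig, k=20):
    """Transform a time-domain frame to the frequency domain and split
    its spectrum into k sub-bands (equal-width edges are an assumption)."""
    spec = np.fft.rfft(sig)                              # frequency domain
    edges = np.linspace(0, len(spec), k + 1, dtype=int)  # band boundaries
    return [spec[edges[i]:edges[i + 1]] for i in range(k)]

# s_bands = split_subbands(s_frame)   # sub-bands s_1 ... s_20 of S
# n_bands = split_subbands(n_frame)   # sub-bands n_1 ... n_20 of N
```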
• any of the following Methods 1 to 3 can be used to determine the target signal-to-noise ratio; this application does not limit the implementation.
  • Method 1.1 Pre-configure the value of the overall target signal-to-noise ratio SNR_t of the effective audio signal and the noise signal.
• When the overall target signal-to-noise ratio and the per-sub-band target signal-to-noise ratios are pre-configured, user configuration (such as a UI interface) or system configuration (such as a configuration file) can be used; this application does not limit this.
  • the headset can obtain the value of the target signal-to-noise ratio SNR_t from the configuration file as the overall target signal-to-noise ratio SNR_t of the noise signal N and the effective audio signal S to be output.
  • the headset can obtain the target signal-to-noise ratio corresponding to the target sub-band from the target signal-to-noise ratio of each sub-band preset in the configuration file as the signal-to-noise ratio of the target sub-band.
• the target sub-bands are the sub-bands into which the earphone divides the collected noise signal N and the acquired effective audio signal S to be output. In this way, the headset can obtain the target signal-to-noise ratio of each sub-band from the pre-configured information.
• Method 1 can quickly obtain the overall target signal-to-noise ratio of the effective audio signal and the noise signal, as well as the target signal-to-noise ratio of each sub-band, so that the gain of the effective audio signal can be adjusted to stabilize the overall signal-to-noise ratio around the overall target signal-to-noise ratio and the signal-to-noise ratio of each sub-band around that sub-band's target signal-to-noise ratio.
• Method 2.1: To obtain the overall target signal-to-noise ratio SNR_t of the effective audio signal S to be output and the environmental noise signal N, the headset can continuously calculate, frame by frame, the SNR (i.e., S/N) of the effective audio signal S and the environmental noise signal N for each time frame, and then determine the above target signal-to-noise ratio SNR_t from the SNRs of the individual time frames.
• Example 1: taking music playback by the earphones as an example, within a period of time (for example, 2 s) while the earphones play music, calculate the average amplitude S_avg of the effective audio signal S (here, the music audio signal) during this period, and the average amplitude N_avg of the environmental noise signal N collected during the same period.
• The headset can then use S_avg/N_avg as the target signal-to-noise ratio SNR_t between the audio signal of the next piece of music to be output and the collected noise signal N.
  • Example 2 When determining the mean value of SNR corresponding to each time frame, the headset can also continuously update the determined mean value of SNR, so that the updated mean value of SNR is used as the above-mentioned target signal-to-noise ratio SNR_t.
• For example, the headset can be configured with a sliding time window (for example, the 2 s mentioned above).
• Each time the window slides, the headset calculates S_avg/N_avg for the music audio signal and the noise signal collected within that window, and then updates the target signal-to-noise ratio SNR_t to the S_avg/N_avg corresponding to that window.
  • the target signal-to-noise ratio SNR_t can be continuously updated as the noise signal N changes.
  • the length of the sliding time window can be different. For example, when the amplitude of the noise signal is relatively stable, the time window can be longer, and when the amplitude of the noise signal is relatively dynamic, the time window can be shorter.
• Example 3: the headset determines the above target signal-to-noise ratio SNR_t from the SNRs of the individual time frames.
• Specifically, the headset can calculate one SNR for the effective audio signal and the noise signal at each time point; it can then take the average or a weighted sum of the SNRs of multiple time points, or sample the SNR of one time point, as the overall target signal-to-noise ratio SNR_t of the effective audio signal S and the environmental noise signal N determined on a time-frame basis.
  • the noise signal N collected by the headset within a period of time and the effective audio signal S during the period of time can correspond to each other in the time frame.
• For example, the time t0 shown in Figure 6 corresponds both to a sampling point in curve 1 (a sampling point in the effective audio signal S) and to a sampling point in curve 2 (a sampling point in the noise signal N); these two sampling points are associated with each other through time t0.
• Method 2.1 is not limited to the above Examples 1 to 3 and may also include other implementations, which are not described again here.
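• A minimal sketch of Method 2.1 with a sliding window. The average absolute amplitude as the amplitude measure and a hop equal to the window length are both assumptions not fixed by the text.

```python
import numpy as np

def target_snr_sliding(s, n, fs, win_s=2.0):
    """For each window, SNR_t = S_avg / N_avg, where S_avg and N_avg are
    the average amplitudes of the time-aligned effective audio signal s
    and noise signal n in that window; the latest value is the updated
    target signal-to-noise ratio SNR_t."""
    win = int(win_s * fs)
    snr_t = []
    for start in range(0, len(s) - win + 1, win):
        s_avg = np.mean(np.abs(s[start:start + win]))
        n_avg = np.mean(np.abs(n[start:start + win])) + 1e-12  # avoid /0
        snr_t.append(s_avg / n_avg)
    return snr_t
```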
• Method 2.2: When the headset obtains the target signal-to-noise ratio SNRi of each sub-band corresponding to the effective audio signal S to be output and the environmental noise signal N, then for the target signal-to-noise ratio SNRi of any sub-band, the headset can use the effective audio signal and noise signal within that sub-band to determine, for each time frame, the signal-to-noise ratio SNR_it between the effective audio signal and the noise signal.
  • the effective audio signal S and the noise signal N can correspond to each other in time frames.
• Specifically, within a sub-band, the headset can calculate the signal-to-noise ratio SNR_it for each pair of effective audio signal and noise signal that correspond to each other in the time frame, obtaining multiple signal-to-noise ratios SNR_it for that sub-band.
• Based on these multiple signal-to-noise ratios SNR_it, the headset can determine the target signal-to-noise ratio SNRi corresponding to the sub-band.
• For example, when determining the target signal-to-noise ratio SNRi of a sub-band from its multiple signal-to-noise ratios SNR_it, the headset can randomly sample one of them as SNRi, or average them as SNRi, or compute a weighted sum of them as SNRi.
  • This application does not limit the specific strategy for the earphone to determine the target signal-to-noise ratio SNRi corresponding to the sub-band based on the multiple signal-to-noise ratios SNR_it corresponding to the sub-band.
• In Method 2.2, the principle of obtaining each sub-band's target signal-to-noise ratio is similar to that of Method 2.1; the difference is that SNR_it is calculated within a sub-band for the effective audio signal S and noise signal N that correspond to each other in the time frame, and the sub-band's target signal-to-noise ratio SNRi is then obtained from these SNR_it values.
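• A minimal sketch of Method 2.2, assuming the per-frame sub-band amplitudes are already available as (T, k) arrays for T time frames and k sub-bands; the reduction across frames (averaging or random sampling, as described above) is selectable, and the weighted sum is omitted for brevity.

```python
import numpy as np

def subband_target_snr(s_frames, n_frames, reduce="mean"):
    """s_frames, n_frames: arrays of shape (T, k) with the sub-band
    amplitudes of the effective audio signal and the noise signal.
    Computes SNR_it per frame and sub-band, then reduces over frames
    to one target SNRi per sub-band."""
    snr_it = np.abs(s_frames) / (np.abs(n_frames) + 1e-12)  # (T, k)
    if reduce == "mean":                       # average over time frames
        return snr_it.mean(axis=0)
    if reduce == "sample":                     # randomly sample one frame
        return snr_it[np.random.randint(snr_it.shape[0])]
    raise ValueError(reduce)
```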
• Method 3.1: When the headset obtains the target signal-to-noise ratio SNRi of each sub-band corresponding to the effective audio signal S to be output and the environmental noise signal N, it can derive each sub-band's target signal-to-noise ratio SNRi from the masking curve of human hearing. For example, this can be implemented through S201 to S203:
• S201: the headset obtains the signal-to-noise ratio SNRi of each sub-band through the above Method 1.2, Method 2.2, or other methods, as the initial target signal-to-noise ratio SNRi of each sub-band.
• S202: based on the masking curve of human hearing, the earphone obtains the amplitude corresponding to each sub-band as the acoustic masking threshold thr_i of each sub-band.
• this application does not limit the execution order of S201 and S202; both are executed before S203.
  • the masking curve of human hearing may not distinguish between time domain signals and frequency domain signals.
  • the above-mentioned masking curve can be any masking curve related to human hearing in the prior art, and this application is not limited thereto.
  • the effective audio signal S and the noise signal N are divided into multiple sub-bands.
  • a subband of the effective audio signal S may include multiple sampling points, and each sampling point includes an amplitude and a frequency point.
• the frequency bands of the multiple sub-bands into which the effective audio signal S is divided (each containing multiple audio samples) are the same as those of the multiple sub-bands into which the noise signal N is divided (each containing multiple noise samples).
• For example, the effective audio signal S is divided into sub-band 1 to sub-band 20, and the noise signal N is likewise divided into sub-band 1 to sub-band 20.
• The frequency range of sub-band 1 in the effective audio signal S is f1 to f4, and the frequency range of sub-band 1 in the noise signal N is also f1 to f4; only the amplitudes differ, i.e., within the frequency band of sub-band 1, the amplitude of the noise signal at each frequency point f differs from that of the effective audio signal S at the same frequency point f.
  • the above-mentioned masking curve can be a curve in which the horizontal axis is the frequency point (also called frequency) and the vertical axis is the amplitude.
• Each of the multiple sub-bands into which the effective audio signal S and the noise signal N are divided can correspond to one frequency band. The headphone can then obtain, from the above masking curve, the amplitude corresponding to any frequency point in that frequency band, as the amplitude of the sub-band covering that band and as the acoustic masking threshold of that sub-band.
• This application's strategy for determining, on the masking curve, the amplitude corresponding to each sub-band's frequency band is not limited to the above example; other known methods can also be used to determine the amplitude corresponding to a sub-band's frequency band as the acoustic masking threshold of the sub-band covering that band.
• The above masking curve of this application is used to indicate that, within the same or an adjacent frequency band, when the energy difference between two signals exceeds the acoustic masking threshold thr_i, the lower-energy signal can be masked and become inaudible to the human ear.
• The purpose of using the acoustic masking threshold thr_i here is to adjust the gain of the effective audio signal so that, after its amplitude is changed, the effective audio signal can mask the corresponding noise signal, improving the user's listening experience of the effective audio signal.
• S203: based on the acoustic masking threshold thr_i of each sub-band and the audio signal si and noise signal ni corresponding to each sub-band, the headset determines whether to adjust the initial target signal-to-noise ratio SNRi of each sub-band, thereby determining each sub-band's target signal-to-noise ratio SNRi.
• The audio signal si corresponding to sub-band i of the effective audio signal S can be a sampling point p in that sub-band, and the noise signal ni corresponding to sub-band i of the noise signal N can be a sampling point q in that sub-band, where sampling point p and sampling point q correspond to the same frequency point.
• Here, sampling point p is any sampling point in sub-band i of the effective audio signal S, and sampling point q is the sampling point in sub-band i of the noise signal N at the same frequency point as sampling point p.
• Alternatively, the amplitude of sampling point p can be the average amplitude of the multiple sampling points in sub-band i of the effective audio signal S, with its frequency point being any frequency point in sub-band i; similarly, the amplitude of sampling point q is the average amplitude of the multiple sampling points in sub-band i of the noise signal N, with the same frequency point as sampling point p.
• The audio signal si corresponding to sub-band i of the effective audio signal S can also be multiple sampling points p in that sub-band (for example, all or some of the sampling points in sub-band i; this application does not limit this); correspondingly, the noise signal ni corresponding to sub-band i of the noise signal N can be multiple sampling points q in that sub-band (for example, all or some of the sampling points in sub-band i; this application does not limit this), where the numbers of sampling points p and q are the same and each corresponding pair of sampling points p and q has the same frequency point.
• In one case, the headset does not adjust the initial target signal-to-noise ratio SNRi of sub-band i, and uses that initial value directly as the target signal-to-noise ratio SNRi of sub-band i.
• In the other case, the headset updates the target signal-to-noise ratio SNRi of sub-band i to si/(ni + thr_i).
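• A minimal sketch of the S203 decision for one sub-band. The text states the update SNRi = si/(ni + thr_i) but not the trigger condition (which appears in formulas not reproduced here), so the condition below, keeping the initial SNRi when the effective signal already exceeds the noise by the masking threshold, is an assumption.

```python
def adjust_snri_with_masking(snr_init, s_i, n_i, thr_i):
    """snr_init: initial target SNRi of sub-band i; s_i, n_i:
    representative amplitudes of the effective audio signal and the
    noise signal in the sub-band; thr_i: acoustic masking threshold."""
    if s_i - n_i > thr_i:           # assumed condition: noise is masked
        return snr_init             # keep the initial target SNRi
    return s_i / (n_i + thr_i)      # update stated in the text
```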
• Method 3.2: When the headset obtains the overall target signal-to-noise ratio SNR_t of the effective audio signal and the noise signal, it can average the target signal-to-noise ratios SNRi of the individual sub-bands i to obtain the target signal-to-noise ratio SNR_t.
  • the method of obtaining the target signal-to-noise ratio SNR_t is not limited to method 3.2.
• In Method 3, the headset can obtain the acoustic masking threshold of each sub-band from the masking curve of human hearing and use it when determining the signal-to-noise ratio of each sub-band or of the overall signal, so that the signal-to-noise ratio incorporates the psychological perception of human hearing.
• Using such a signal-to-noise ratio to adjust the effective audio signal S allows the adjusted effective audio signal's signal-to-noise ratio to achieve the best listening effect.
  • the specific methods for obtaining the target signal-to-noise ratio SNR_t and the specific methods for obtaining the target signal-to-noise ratio SNRi corresponding to each subband in the above-mentioned methods 1 to 3 can be freely combined, and this application does not impose restrictions on this.
• For example, when the earphone obtains the above overall target signal-to-noise ratio SNR_t, it may use Method 2.1; when it obtains the target signal-to-noise ratio SNRi of each sub-band, it may use Method 3.1.
• In addition, the earphone can determine the articulation index (AI), a speech intelligibility evaluation parameter, based on the target signal-to-noise ratios SNRi of the multiple sub-bands (the sub-bands into which the noise signal N and the effective audio signal S are divided).
• the headset can adjust the per-sub-band target signal-to-noise ratios SNRi and/or the above overall target signal-to-noise ratio SNR_t based on AI, and use the adjusted SNRi and/or the adjusted SNR_t as the target signal-to-noise ratio for determining the gain in S107.
• For example, the headset can determine AI through Steps 1 and 2:
  • Step 1 The headset can normalize the target signal-to-noise ratio SNRi corresponding to multiple sub-bands.
• Specifically, the headset can clip the target signal-to-noise ratio SNRi of each sub-band obtained through Method 1, Method 2, or Method 3 (that is, SNR_dB in Formula 2) to the range [-15, 15] according to Formula 2.
• Then the headset maps the clipped target signal-to-noise ratio SNRi (i.e., SNR'_dB(f_i) in Formula 2) into [0, 1] according to Formula 3, where the per-sub-band target signal-to-noise ratio mapped into [0, 1] is denoted SNR_M(f_i) in Formula 3.
• That is, when the value of the target signal-to-noise ratio SNRi (i.e., SNR_dB in Formula 2) is less than -15, the headset updates it to -15; when the value is greater than 15, the headset updates it to 15; and when the value is greater than or equal to -15 and less than or equal to 15, the value of the target signal-to-noise ratio SNRi remains unchanged.
• If the number of sub-bands is k, the k updated target signal-to-noise ratios SNRi corresponding to the k sub-bands are denoted SNR'_dB(f_i).
  • the critical value set for updating SNRi is not limited to -15 and 15, and can also be other values, which is not limited in this application.
• Step 2: The headset can determine the articulation index AI based on the normalized target SNRi values.
• For example, the headset can perform, according to Formula 4, a weighted summation of the target SNRi values normalized by Formula 2 and Formula 3 (expressed as SNR_M(f_i) in Formula 3 and Formula 4) to determine the articulation index AI.
  • k in Formula 4 is the number of frequency bands into which the spectrum is divided, that is, the number k of sub-bands i.
• W_i represents the band-importance function (BIF) of the i-th frequency band (here, the frequency band corresponding to the i-th sub-band).
  • the BIF satisfies Formula 5 and can be obtained through a large number of experiments.
• W_i is equivalent to a weight.
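• A minimal sketch of Steps 1 and 2. Since Formulas 3 and 5 are not reproduced in this text, the linear mapping (SNR'_dB + 15)/30 and the uniform placeholder weights for the BIF are assumptions.

```python
import numpy as np

def articulation_index(snr_db, w=None):
    """snr_db: per-sub-band target SNRi values in dB. Clamps to [-15, 15]
    (Formula 2), maps to [0, 1] (Formula 3, assumed linear), and returns
    the weighted sum with band-importance weights W_i (Formula 4)."""
    snr_db = np.clip(np.asarray(snr_db, dtype=float), -15.0, 15.0)
    snr_m = (snr_db + 15.0) / 30.0            # SNR_M(f_i) in [0, 1]
    k = len(snr_m)
    w = np.full(k, 1.0 / k) if w is None else np.asarray(w)  # W_i (BIF)
    return float(w @ snr_m)                   # articulation index AI
```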
  • the headset can adjust the overall target signal-to-noise ratio SNR_t based on AI, and can also adjust the target signal-to-noise ratio SNRi corresponding to the subband.
  • the headset can adjust SNR_original through Formula 6, and the adjusted signal-to-noise ratio is SNR_target.
  • SNR_original represents the target signal-to-noise ratio before AI adjustment
  • SNR_target represents the target signal-to-noise ratio after AI adjustment.
• When adjusting the overall target, SNR_original is the overall target signal-to-noise ratio SNR_t determined in S105, and SNR_target is the overall target signal-to-noise ratio SNR_t adjusted using AI.
• When adjusting a sub-band target, SNR_original is the target signal-to-noise ratio SNRi of sub-band i determined in S105, and SNR_target is the target signal-to-noise ratio SNRi of sub-band i adjusted using AI.
  • 1/AI can be constrained to be between 1.0 and 1.3.
• When constraining 1/AI: if, after the calculation of Formula 4, 1/AI is greater than 1.3, its value is updated to 1.3; if 1/AI is less than 1.0, its value is updated to 1.0; and when 1.0 ≤ 1/AI ≤ 1.3, 1/AI remains unchanged.
• The constraint on 1/AI is not limited to the above examples of 1.0 and 1.3; other constraint values or conditions can also be used, and this application does not limit this.
• When the headset uses AI to adjust the target signal-to-noise ratio, it is not limited to Formula 6; other strategies can also be used, and this application does not limit this.
• Depending on the result, the headset can increase the target signal-to-noise ratio (which can be the above SNR_t, or the SNRi corresponding to sub-band i), keep it unchanged, or reduce it.
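• A minimal sketch of the Formula 6 adjustment, assuming the multiplicative form SNR_target = SNR_original * (1/AI); this form is consistent with the 1/AI constraint discussed above but is not confirmed by the text, since Formula 6 itself is not reproduced.

```python
import numpy as np

def adjust_snr_by_ai(snr_original, ai):
    """Constrain 1/AI to [1.0, 1.3] and scale the target SNR: a lower
    intelligibility (smaller AI) yields a higher target SNR."""
    inv_ai = np.clip(1.0 / max(ai, 1e-12), 1.0, 1.3)
    return snr_original * inv_ai
```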
  • the headset can adjust the target signal-to-noise ratio (which can be the above-mentioned SNR_t, or the SNRi corresponding to sub-band i) according to the decibel size of the collected environmental noise signal N.
  • N0 may be a configurable basic noise level (for example, 50 dB (decibel), which is not limited by this application), where N is the above-mentioned noise signal N collected by the headset when executing S101 in Figure 4 .
  • the headset can adjust the target signal-to-noise ratio based on the adjustment coefficient d.
  • the headset can adjust the target signal-to-noise ratio according to Formula 7.
  • SNR_original in Formula 7 represents the target signal-to-noise ratio before adjustment using the adjustment coefficient d
  • SNR_target represents the target signal-to-noise ratio after adjustment using the adjustment coefficient d.
  • the adjustment coefficient d can be constrained to be between 0.9 and 1.1.
• When constraining the adjustment coefficient d: if, after the above calculation of N/N0, d is greater than 1.1, its value is updated to 1.1; if d is less than 0.9, its value is updated to 0.9; and when 0.9 ≤ d ≤ 1.1, d remains unchanged.
• The constraint on d is not limited to the above examples of 0.9 and 1.1; other constraint values or conditions can also be used, and this application does not limit this.
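• A minimal sketch of the Formula 7 adjustment, assuming d = N/N0 (suggested by the reference above to "the above calculation of N/N0") and a multiplicative use of d; both assumptions are illustrative, as Formula 7 is not reproduced in the text.

```python
import numpy as np

def adjust_snr_by_noise_level(snr_original, n_db, n0_db=50.0):
    """n_db: decibel level of the collected noise signal N; n0_db: the
    configurable base noise level N0 (50 dB per the example above).
    d is constrained to [0.9, 1.1], then scales the target SNR."""
    d = np.clip(n_db / n0_db, 0.9, 1.1)   # adjustment coefficient d
    return snr_original * d
```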
  • S107 Determine a gain signal based on the target signal-to-noise ratio, the noise signal and the effective audio signal.
  • the headset can determine the gain signal G through Formula 8.
• Alternatively, the headset can determine the gain signal g_i corresponding to each sub-band i through Formula 9, where each sub-band i of the effective audio signal S corresponds to one gain signal g_i.
• g_i = SNRi * ni / si (Formula 9);
• where ni is the noise signal corresponding to sub-band i in the noise signal N, si is the effective audio signal corresponding to sub-band i in the effective audio signal S, and i = 1, 2, 3, ..., k.
  • S109 Based on the gain signal, adjust the effective audio signal and output the adjusted effective audio signal.
  • the headset can use the overall gain signal G of the effective audio signal to adjust the gain of the effective audio signal S to change the amplitude of the effective audio signal S.
• For example, S' = f(G, S).
• Alternatively, the headset can use the gain signal g_i corresponding to each sub-band i to adjust the gain of the effective audio signal within sub-band i.
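• A minimal sketch of the per-sub-band path of S107 and S109 using Formula 9. The mean absolute value as the per-sub-band amplitude measure is an assumption, and the resynthesis back to the time domain (for example, an inverse FFT with overlap-add) is omitted. Note that multiplying sub-band i by g_i drives its SNR to the target: |g_i * si| / |ni| = SNRi.

```python
import numpy as np

def apply_subband_gains(s_bands, n_bands, snr_targets):
    """s_bands, n_bands: per-sub-band spectra of the effective audio
    signal S and noise signal N; snr_targets: target SNRi per sub-band.
    Returns the gain-adjusted sub-bands to be resynthesized into S'."""
    out = []
    for s_i, n_i, snr in zip(s_bands, n_bands, snr_targets):
        si = np.mean(np.abs(s_i)) + 1e-12    # sub-band signal amplitude
        ni = np.mean(np.abs(n_i))            # sub-band noise amplitude
        g_i = snr * ni / si                  # Formula 9: g_i = SNRi*ni/si
        out.append(g_i * s_i)                # adjust gain within sub-band
    return out
```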
  • curve 1 is a signal diagram of the effective audio signal S before the effective audio signal S is processed (here, the gain is adjusted) using the method of the present application.
  • curve 2 is the signal diagram of the noise signal N, where curve 2 in Figure 6(1) and Figure 6(2) is the same.
• Take curve 2 shown in Figure 6(1) as an example for explanation; the same applies to curve 2 shown in Figure 6(2) and is not repeated here.
  • the amplitude of the noise signal N is not stable, and the decibels of the environmental noise are sometimes strong and sometimes weak.
• For example, at time t1, the amplitude of the noise signal N is much larger than its amplitude at time t0 (or time t2). It can be understood that the environmental noise shown in curve 2 suddenly becomes louder at time t1, while the sound at other times is relatively stable.
• For example, user 1 has a voice call with user 2, and user 1 uses headphones to play the effective audio signal S.
• During the call, the volume of the effective audio signal S is relatively stable, but the environmental noise on user 1's side suddenly increases in decibels at time t1, so that the sound of the effective audio signal S heard by user 1 is easily covered by the noise signal N, affecting user 1's listening to the effective audio signal S.
  • the overall signal-to-noise ratio SNR of the effective audio signal S and the noise signal N is not stable.
  • the earphone of the present application can process each sub-band of the effective audio signal S according to the above audio processing method.
• Curve 1' is the signal diagram of the processed effective audio signal S.
• Comparing curve 1' with curve 1 shows that between t11 and t12 the amplitude of the effective audio signal S increases significantly, making the signal-to-noise ratio more stable during this period, and that before t11 and after t12 the amplitude of the effective audio signal is not adjusted.
• In other words, the earphone of the present application can specifically adjust the parts of the effective audio signal, in the frequency domain or in the time domain, that need adjustment, so as to improve both the stability of the signal-to-noise ratio between the effective audio signal S and the noise signal N within each sub-band and the stability of the overall signal-to-noise ratio.
  • Figure 6 is intended to illustrate that the signal-to-noise ratio of the effective audio signal S processed by the method of the present application and the noise signal N has better stability.
  • the method of this application has the same effect after processing effective audio signals in the frequency domain.
  • the earphones can adjust the gains of different sub-bands of the effective audio signal respectively, so that the gain adjustment of different frequency components of the effective audio signal can be differentiated, so as to improve the SNR stability of each sub-band.
• After adjusting the gains of the individual sub-bands, the headset can resynthesize the adjusted effective audio signals of the sub-bands (for example, the si_l' corresponding to each sub-band i) into an overall gain-adjusted effective audio signal (which may also be denoted as the effective audio signal S').
  • the specific synthesis method may refer to known technologies and is not limited here.
  • the headphones can output and play the synthesized effective audio signal S’.
• The effective audio signal S' synthesized here can be distinguished from the effective audio signal S' obtained by adjusting the gain using the overall target signal-to-noise ratio SNR_t.
  • the earphone obtains the overall gain signal G of the effective audio signal to adjust the gain of the effective audio signal S, which can make the overall signal-to-noise ratio of the effective audio signal and the noise signal more stable.
• The headset can also obtain the gain signal g_i for each sub-band of the effective audio signal and adjust the gain of each sub-band of the effective audio signal S separately; in this way the effective audio signal S is adjusted based on each sub-band's SNR, so that the signal-to-noise ratio of the effective audio signal within each sub-band is more stable.
• The method of the present application can be used to increase the amplitude of the effective audio signal to be output, so as to increase the volume of the effective audio output and prevent the effective audio signal from being masked by the environmental noise, which would leave the user unable to hear the call audio or the audio being played.
• The method of the present application can also be used to reduce the amplitude of the effective audio signal to be output, so as to reduce the volume of the effective audio output and prevent excessive volume from affecting the user's hearing. In this way, the audio output from the headphones can maintain a stable signal-to-noise ratio.
  • the audio processing method of the present application is described with the headset as the execution subject.
• Alternatively, the call service network element (for example, the call service network element 1 shown in Figure 5b) can be used as the execution subject to execute the processing procedures of each implementation and method described in Scenario 1; the principles are similar and are not repeated here one by one.
• For example, the call service network element 1 can, according to the relevant implementations illustrated in Figure 4 and/or Scenario 1, perform gain processing on the effective audio signal S to be output (for example, the effective audio signal S1 shown in Figure 5a) to obtain the effective audio signal S' indicated by the dotted arrow in Figure 5b (for example, gain processing on the effective audio signal S1 yields the effective audio signal S1').
• When the call service network element 1 outputs the gain-processed effective audio signal S' (for example, the effective audio signal S1'), it can output the processed effective audio signal S' to the earphone, and the earphone plays it through the speaker 202.
• In other words, the effective audio signal S' output by the call service network element to the headset is the effective audio signal after gain processing according to the method of this application, rather than the effective audio signal S1 as received from the peer network element (such as the call service network element 2 shown in Figure 5a).
• The call service network element may be an electronic device such as a telephone system device, a conference system device, a server, or a mobile phone, or it may be a software module or hardware chip integrated in such an electronic device; this application does not limit this.
  • the call service network element 1 shown in Figure 5a is the mobile phone shown in Figure 7, or a software module or hardware chip in the mobile phone shown in Figure 7.
  • Figure 7 is a schematic diagram of an interaction scene between user 1's mobile phone and the earphones worn by user 1.
  • the mobile phone can be used as the execution subject to execute the corresponding steps of the audio processing method in scenario 1.
  • the specific execution principles are similar. You can refer to the relevant descriptions of scenario 1 or scenario 2, which will not be described again here.
• The scene in Figure 7 takes a call scenario as an example.
• For example, user 1 uses a headset connected to a mobile phone (for example, via Bluetooth; this application does not limit the connection type) to talk to user 2 shown in Figure 5a.
  • the mobile phone shown in Figure 7 can receive the effective audio signal S1 from the call service network element 2;
  • the mobile phone can receive the noise signal N1 from the earphone.
  • the noise signal N1 here can be the noise signal N obtained by the earphone described in Scenario 1.
• For example, N1 can be the above reference sensor signal x(n), or the error sensor signal e(n), or the compensated noise signal N_true; this application does not limit this.
  • the microphone 302 of the mobile phone may also collect the environmental noise signal, here the noise signal N2.
  • the mobile phone may use the noise signal N2 collected by itself or the noise signal N1 received from the earphone as the noise signal acquired by the mobile phone when executing S101 in FIG. 4 .
  • the mobile phone can also generate the noise signal N3 based on the noise signal N1 and the noise signal N2, and use the noise signal N3 as the noise signal obtained by the mobile phone when executing S101 in Figure 4.
  • the noise signal N3 is the average value of the noise signal N1 and the noise signal N2, which is not limited in this application.
• Based on the acquired noise signal (N1, N2, or N3) and the effective audio signal S1 received from the call service network element 2, the mobile phone can perform gain processing on the effective audio signal S1 according to the relevant methods of Scenario 1 or Scenario 2 to obtain the effective audio signal S1', and output it to the earphone so that the earphone plays the gain-processed effective audio signal S1'.
• In the scenario of Figure 7, the execution subject of the method in each embodiment of the present application can also be the headset.
• In that case, the mobile phone can send the noise signal N2 and the effective audio signal S to be output (for example, the effective audio signal S1) to the headset as downlink signals, and the headset performs the processing of the above method of the present application.
  • Figure 7 can also be applied to an audio playback scenario, where the audio is not call audio.
• the audio may be local audio of the mobile phone, such as local video audio, local music, or a local recording; this application does not limit this. In that case, the effective audio signal S on which the mobile phone performs gain processing is not received from an external electronic device, but is the audio signal to be output obtained from the mobile phone's local storage.
  • the audio can also be audio received by the mobile phone from an external electronic device.
• For example, the mobile phone uses a video application to play online videos, or a music application to play online music. The mobile phone can then receive the audio signal to be output from the application server of the video or music application and use it as the effective audio signal S of the present application; the mobile phone then performs gain processing on the effective audio signal S according to any of the above implementations of the present application, and outputs the gain-processed effective audio signal S'.
• Figure 8 shows a call scenario in which user 1 uses a mobile phone to talk to user 2.
  • user 1's mobile phone can output and play the call audio through a speaker or hands-free call.
  • user 1's mobile phone can also output and play call audio through a receiver (referred to as "earpiece").
  • the call service network element 1 may be different from the mobile phone, or may be a software module or hardware chip integrated in the mobile phone, which is not limited by this application.
  • the execution subject that executes the audio processing method of this application can be the call service network element 1, or it can be a mobile phone.
  • the following takes the mobile phone as the execution subject as an example.
• When the execution subject is the call service network element 1, the principles are similar and are not repeated here.
  • a call scenario is taken as an example to illustrate the solution of the scenario shown in Figure 8 .
  • the mobile phone may include a microphone 302, a speaker 301, and a receiver (referred to as "earpiece") 303 located at the bottom edge.
  • this application does not limit the positions of the microphone, speaker, and receiver of the mobile phone, and they can be placed at any position on the mobile phone.
  • the bottom edge of the mobile phone shown in Figure 8 can also be equipped with a speaker.
  • the microphone 302 of the mobile phone can collect the noise signal N (for example, the noise signal N2 collected by the mobile phone in scenario 3).
• When the execution subject of the method of this application is the call service network element 1, then as shown by the dotted arrow in Figure 8, the mobile phone can send the collected noise signal N to the call service network element 1.
• In Figure 8, however, the execution subject is the mobile phone, so the mobile phone does not need to transmit the noise signal N to the call service network element 1.
• the mobile phone can receive, from the call service network element 1, the effective audio signal S (for example, the effective audio signal S1) originating from the call service network element 2.
• the mobile phone can determine the gain of the effective audio signal S based on the effective audio signal S and the noise signal N according to any implementation introduced in Scenario 1, thereby performing gain adjustment on the effective audio signal S and outputting the gain-adjusted effective audio signal S'.
• Note that the audio signal S_true actually received by user 1's ear (for example, the sound signal converted from the audio signal) may differ from the effective audio signal S received by the mobile phone from the call service network element 1.
• Specifically, the mobile phone outputs the gain-adjusted effective audio signal S' through the speaker 301, and there is a certain spatial distance between the speaker 301 and user 1's ear; the effective audio signal S' output by the speaker 301 (for example, the sound signal converted from the effective audio signal S') therefore reaches the ear through the acoustic transmission path L, and the effective audio signal corresponding to the sound signal received at the ear is S_true'.
  • the mobile phone can compensate the effective audio signal S’ after gain processing to generate the above-mentioned effective audio signal S_true’.
  • the mobile phone can obtain the distance L between the human ear and the mobile phone; then, based on the distance L and Green's function G (r, r0, ⁇ ), the mobile phone performs compensation processing on the effective audio signal S' to obtain S_true' .
  • the mobile phone can obtain S_true’ according to Formula 10.
• In Formula 10, G(r, r0, ω) is the Green's function, r0 is the spatial coordinate of the mobile phone (such as the speaker 301 of the mobile phone shown in Figure 8), r is the spatial coordinate of the user's ear (such as the ear of user 1 shown in Figure 8), ‖r - r0‖ represents the distance L between the mobile phone and user 1's ear, and the effective audio signal S' represents the effective audio signal after gain processing by the method of this application.
• the mobile phone can also compensate the effective audio signal S received from the call service network element 1 shown in Figure 8 based on the above distance L and the Green's function G(r, r0, ω). For example, referring to Figure 4, when executing S103 the mobile phone can compensate the effective audio signal received from the call service network element 1 according to the distance L and the Green's function G(r, r0, ω); then, based on the compensated effective audio signal, the mobile phone performs S105 and the subsequent steps shown in Figure 4.
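• A minimal sketch of the distance compensation, assuming the free-field point-source Green's function G = exp(-jk‖r - r0‖)/(4π‖r - r0‖) with k = ω/c; Formula 10 is not reproduced in the text, and whether it multiplies or divides by G is not stated, so the pre-compensation by division below is an assumption.

```python
import numpy as np

def compensate_for_path(spec, freqs, distance, c=343.0):
    """spec: rfft spectrum of the gain-processed signal S'; freqs: bin
    frequencies in Hz; distance: ||r - r0||, the distance L between the
    speaker and the ear. Dividing by the assumed free-field Green's
    function pre-compensates the path's attenuation and delay."""
    k = 2.0 * np.pi * freqs / c                       # wavenumber per bin
    g = np.exp(-1j * k * distance) / (4.0 * np.pi * distance)
    return spec / g
```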
  • the call scenario is taken as an example to illustrate the gain processing of the call audio to be output by the electronic device.
• the electronic device can also obtain the effective audio signal to be output from an application server or from local storage, and use the audio processing method of the present application to perform gain processing on that effective audio signal, so as to output a gain-processed effective audio signal.
• In FIGS. 5a to 5c and FIGS. 6 to 8, the same reference numerals across the drawings represent the same objects, so the reference numerals are not explained drawing by drawing; for reference signs in FIGS. 6 to 8 that are not explained there, refer to the explanations of the same reference signs in FIGS. 5a to 5c, which are not repeated here.
  • the electronic device includes corresponding hardware and/or software modules that perform each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving the hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions in conjunction with the embodiments for each specific application, but such implementations should not be considered beyond the scope of this application.
  • FIG. 9 shows a schematic block diagram of a device 300 according to an embodiment of the present application.
  • the device 300 may include a processor 301 and a transceiver/transceiver pin 302, and optionally a memory 303.
• The components of the device 300 are coupled together through a bus 304, which includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
• For clarity, however, the various buses are all referred to as bus 304 in the figure.
• the memory 303 may be used to store the instructions in the foregoing method embodiments.
  • the processor 301 can be used to execute instructions in the memory 303, and control the receiving pin to receive signals, and control the transmitting pin to send signals.
  • the device 300 may be the electronic device or a chip of the electronic device in the above method embodiment.
  • This embodiment also provides a computer storage medium that stores computer instructions.
• When the computer instructions are run on an electronic device, they cause the electronic device to execute the above related method steps to implement the audio processing method in the above embodiments.
  • This embodiment also provides a computer program product.
• When the computer program product is run on a computer, it causes the computer to perform the above related steps to implement the audio processing method in the above embodiments.
• The embodiments of the present application also provide a device.
  • This device may be a chip, a component or a module.
  • the device may include a connected processor and a memory.
  • the memory is used to store computer execution instructions.
  • the processor can execute computer execution instructions stored in the memory, so that the chip executes the audio processing method in each of the above method embodiments.
• The electronic equipment, computer storage media, computer program products, and chips provided in this embodiment are all used to execute the corresponding methods provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods provided above, which are not repeated here.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
• the division of modules or units is only a logical functional division; in actual implementation there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separate.
  • a component shown as a unit may be one physical unit or multiple physical units, that is, it may be located in one place, or it may be distributed to multiple different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in various embodiments of the present application can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • Integrated units may be stored in a readable storage medium if they are implemented in the form of software functional units and sold or used as independent products.
• The technical solutions of the embodiments of the present application, in essence, or the part contributing to the existing technology, or all or part of the technical solutions, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to cause a device (which can be a microcontroller, a chip, etc.) or a processor to execute all or part of the steps of the methods of the various embodiments of the present application.
• The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.
  • the steps of the methods or algorithms described in connection with the disclosure of the embodiments of this application can be implemented in hardware or by a processor executing software instructions.
  • Software instructions can be composed of corresponding software modules.
• Software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium can also be an integral part of the processor.
  • the processor and storage media may be located in an ASIC. Additionally, the ASIC can be located in a network device. Of course, the processor and storage media can also exist as discrete components in the network device.
  • Computer-readable media includes computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
  • Storage media can be any available media that can be accessed by a general purpose or special purpose computer.


Abstract

Embodiments of the present application provide an audio processing method and an electronic device, relating to the technical field of terminal devices. The method includes: the electronic device adjusts an audio signal based on the signal-to-noise ratio between the audio signal and a noise signal, so as to stabilize the signal-to-noise ratio of the output audio signal and improve the user's listening experience of the audio output by the electronic device.

Description

Audio processing method and electronic device. Technical Field
The embodiments of this application relate to the technical field of terminal devices, and in particular to an audio processing method and an electronic device.
Background
When a user uses an electronic device to answer a call or play music or video, noise exists in the user's environment; instability in the noise intensity can make the signal-to-noise ratio of the audio signal output by the electronic device unstable, degrading the user's listening experience of that audio.
Summary
To solve the above technical problem, this application provides an audio processing method and an electronic device. In this method, the audio signal can be adjusted based on the signal-to-noise ratio between the audio signal and the noise signal, so as to stabilize the signal-to-noise ratio of the output audio signal and improve the user's listening experience of the audio output by the electronic device.
In a first aspect, an embodiment of this application provides an audio processing method applied to an electronic device. The method includes: obtaining a target noise signal corresponding to the environmental sound; obtaining a first audio signal to be output; determining a target signal-to-noise ratio corresponding to the first audio signal and the target noise signal; determining, based on the target signal-to-noise ratio, the first audio signal, and the target noise signal, a target gain signal corresponding to the first audio signal; adjusting the first audio signal based on the target gain signal to obtain a second audio signal; and outputting the second audio signal.
Exemplarily, the electronic device may be a mobile phone, a headset, a call service network element, etc.; this application does not limit this.
Exemplarily, the environmental sound is the sound of the environment where the electronic device is located.
Exemplarily, the target noise signal may be the noise signal of the environmental noise collected by the electronic device, or the noise signal of the environmental noise after noise-reduction processing, or a compensated noise signal; this application does not limit this.
Exemplarily, the target signal-to-noise ratio is the signal-to-noise ratio, desired by the electronic device, between the first audio signal and the target noise signal, determined by combining the target noise signal and the first audio signal.
Exemplarily, the first audio signal may be a music audio signal or a call audio signal; this application does not limit this. Exemplarily, the first audio signal is also called the "effective audio signal".
The embodiments of this application can adjust the audio signal to be output by the electronic device based on the signal-to-noise ratio (SNR), so that the gain-adjusted audio signal has a more stable signal-to-noise ratio relative to the target noise signal. Changes in the decibel level of the environmental noise around the electronic device then do not affect the user's reception of the sound corresponding to the audio signal. For example, if the volume of the surrounding environmental sound suddenly increases, this embodiment allows the volume of the sound corresponding to the audio signal to increase as well, so that the user's reception of the audio signal is unaffected; and if the volume of the environmental sound suddenly decreases, the embodiments of this application allow the volume of the sound corresponding to the audio signal to decrease as well, preventing the amplitude of the audio signal from being too high and harming the user's hearing. In this way, the signal-to-noise ratio can be kept dynamically stable in combination with the environmental noise signal, improving the listening experience.
According to the first aspect, determining the target signal-to-noise ratio corresponding to the first audio signal and the target noise signal includes: dividing the first audio signal into multiple first sub-bands; dividing the target noise signal into multiple second sub-bands, where the multiple first sub-bands and the multiple second sub-bands correspond to the same frequency bands; and determining multiple first signal-to-noise ratios between the multiple first sub-bands and the multiple second sub-bands, where each first signal-to-noise ratio is the signal-to-noise ratio between a first sub-band and a second sub-band corresponding to the same frequency band.
Exemplarily, the target signal-to-noise ratio may include the signal-to-noise ratios between the corresponding sub-bands of the audio signal and the noise signal.
Exemplarily, the number of first sub-bands equals the number of second sub-bands, and each pair of corresponding first and second sub-bands covers the same frequency band.
Optionally, the target signal-to-noise ratio includes the multiple first signal-to-noise ratios.
Exemplarily, a first signal-to-noise ratio is also called a sub-band signal-to-noise ratio, for example the signal-to-noise ratio SNRi of sub-band i.
Exemplarily, the electronic device may determine a first signal-to-noise ratio for each pair of corresponding first and second sub-bands.
Exemplarily, the target noise signal N is divided into 20 sub-bands n_i, namely sub-band n_1 to sub-band n_20;
exemplarily, the audio signal S is divided into 20 sub-bands s_i, namely sub-band s_1 to sub-band s_20;
where the frequency band i corresponding to sub-band n_i is the same as the frequency band i corresponding to sub-band s_i.
It can be understood that the frequency bands of the 20 sub-bands n_i correspond one-to-one to the frequency bands of the 20 sub-bands s_i.
Exemplarily, sub-band i may denote sub-band n_i and sub-band s_i.
Exemplarily, i is an integer greater than or equal to 1 and less than or equal to 20.
In the embodiments of this application, the electronic device can adjust the gain of the first audio signal based on the sub-band SNRs, taking into account the differences in gain-adjustment detail between different frequency components (represented by sub-bands), so as to ensure that the SNR of the output audio signal is more stable within each sub-band.
According to the first aspect, or any implementation of the first aspect above, determining the multiple first signal-to-noise ratios between the multiple first sub-bands and the multiple second sub-bands includes: determining the multiple first signal-to-noise ratios between the multiple first sub-bands and the multiple second sub-bands based on a masking curve.
Exemplarily, the masking curve is a masking curve of human hearing, which may be any masking curve in the prior art; this application does not limit this.
In this embodiment, determining the sub-band signal-to-noise ratios based on the masking curve better matches the auditory perception of the human ear and improves the perceived listening experience.
According to the first aspect, or any implementation of the first aspect above, determining the multiple first signal-to-noise ratios based on the masking curve includes: determining multiple second signal-to-noise ratios between the multiple first sub-bands and the multiple second sub-bands, where each second signal-to-noise ratio is the signal-to-noise ratio between a first sub-band and a second sub-band corresponding to the same frequency band; determining, based on the masking curve, amplitude thresholds respectively corresponding to the frequency bands of the multiple first sub-bands; and determining the multiple first signal-to-noise ratios between the multiple first sub-bands and the multiple second sub-bands based on the multiple second signal-to-noise ratios and the multiple amplitude thresholds.
Exemplarily, the second signal-to-noise ratio here may be the initially set signal-to-noise ratio of sub-band i. In this embodiment, the amplitude threshold corresponding to the frequency band of each sub-band i can be determined based on the masking curve, where for each sub-band i the electronic device determines one corresponding amplitude threshold. The electronic device can determine, based on the amplitude threshold of each sub-band i and the initially set second signal-to-noise ratio of each sub-band i, whether to adjust the second signal-to-noise ratio of sub-band i. If it adjusts, the second signal-to-noise ratio can be adjusted based on the amplitude threshold of sub-band i, and the adjusted second signal-to-noise ratio is then the first signal-to-noise ratio of sub-band i, for example the above signal-to-noise ratio SNRi; conversely, if it determines that the second signal-to-noise ratio of sub-band i does not need adjustment, that second signal-to-noise ratio is the first signal-to-noise ratio of sub-band i, for example the above signal-to-noise ratio SNRi.
In the embodiments of this application, the electronic device can determine the amplitude threshold corresponding to each sub-band of the audio signal based on the masking curve, and with reference to this threshold determine whether the second signal-to-noise ratio of each sub-band needs further adjustment to satisfy the physiological perception of human hearing, so as to determine the target signal-to-noise ratio of each sub-band.
According to the first aspect, or any implementation of the first aspect above, determining the multiple first signal-to-noise ratios includes: for a first sub-band and a second sub-band corresponding to the same frequency band, determining a third signal-to-noise ratio between a third audio signal and a first noise signal corresponding to the same time frame, where the first sub-band includes the third audio signal and the second sub-band includes the first noise signal; and determining, based on the third signal-to-noise ratio, the first signal-to-noise ratio between the first sub-band and the second sub-band corresponding to the same frequency band.
Exemplarily, one sub-band may include multiple signals corresponding to the individual time frames.
Exemplarily, a first sub-band may include multiple third audio signals corresponding to different time frames.
Exemplarily, a second sub-band may include multiple first noise signals corresponding to different time frames.
Exemplarily, the electronic device may take the ratio of the third audio signal and the first noise signal corresponding to the same time frame to obtain the third signal-to-noise ratio.
Exemplarily, the electronic device may average the third signal-to-noise ratios of the individual time frames of the first and second sub-bands, or compute their weighted sum, or take the third signal-to-noise ratio of any one time frame, as the first signal-to-noise ratio between the first sub-band and the second sub-band, for example the above signal-to-noise ratio SNRi.
In the embodiments of this application, the first signal-to-noise ratio of each sub-band can be determined on a time-frame basis. It can be understood that the second signal-to-noise ratio mentioned above can likewise be determined in this time-frame-based manner as the initial signal-to-noise ratio of the sub-band.
According to the first aspect, or any implementation of the first aspect above, the target signal-to-noise ratio includes the multiple first signal-to-noise ratios, and determining the target gain signal corresponding to the first audio signal based on the target signal-to-noise ratio, the first audio signal, and the target noise signal includes: determining, based on the multiple first signal-to-noise ratios, the multiple first sub-bands, and the multiple second sub-bands, a first gain signal corresponding to each first sub-band, where the target gain signal includes the multiple first gain signals corresponding to the multiple first sub-bands.
In the embodiments of this application, the electronic device can adjust the gain of each sub-band of the first audio signal separately; during adjustment, the first signal-to-noise ratio of each sub-band i can be used as that sub-band's target signal-to-noise ratio to determine the gain signal of each first sub-band of the first audio signal, so as to ensure that the signal-to-noise ratio of each sub-band of the first audio signal is stable within the sub-band.
According to the first aspect, or any implementation of the first aspect above, adjusting the first audio signal based on the target gain signal to obtain the second audio signal includes: adjusting, based on the multiple first gain signals, the gains of the corresponding multiple first sub-bands to obtain multiple third sub-bands, where each third sub-band is a gain-adjusted first sub-band; and synthesizing the multiple third sub-bands into the second audio signal.
Exemplarily, when adjusting the gain of a first sub-band based on its corresponding first gain signal, the gain of every audio signal in that sub-band can be adjusted. In addition, after the gains of the first sub-bands into which the first audio signal was divided have been adjusted, the gain-adjusted sub-bands can be resynthesized into a complete audio signal, here called the second audio signal.
Exemplarily, the target gain signal here includes the multiple first gain signals described above.
In the embodiments of this application, the electronic device can adjust the gain of each sub-band according to the gain signal of that sub-band of the audio signal to be output, so as to stabilize the signal-to-noise ratio of each sub-band within the audio signal and thereby achieve a stable signal-to-noise ratio for the overall audio signal.
According to the first aspect, or any implementation of the first aspect above, after determining the multiple first signal-to-noise ratios between the multiple first sub-bands and the multiple second sub-bands, and before determining the target gain signal corresponding to the first audio signal based on the target signal-to-noise ratio, the first audio signal, and the target noise signal, the method further includes: determining an articulation index based on the multiple first signal-to-noise ratios; and adjusting the target signal-to-noise ratio based on the articulation index, where the adjusted target signal-to-noise ratio is used to determine the target gain signal.
Exemplarily, the target signal-to-noise ratio may be the overall target signal-to-noise ratio between the first audio signal and the target noise signal, or the per-sub-band target signal-to-noise ratios between them (for example, the above first signal-to-noise ratios).
Exemplarily, the electronic device may determine the articulation index based on the first signal-to-noise ratio of each sub-band of the first audio signal, use the articulation index to adjust the target signal-to-noise ratio, and then use the adjusted target signal-to-noise ratio to determine the target gain signal.
In the embodiments of this application, adjusting the target signal-to-noise ratio with the articulation index helps improve the user's listening experience of the output audio signal.
根据第一方面,或者以上第一方面的任意一种实现方式,所述确定所述第一音频信号与所述目标噪声信号对应的目标信噪比之后,所述基于所述目标信噪比、所述第一音频信号以及所述目标噪声信号,确定与所述第一音频信号对应的目标增益信号之前,所述方法还包括:基于所述目标噪声信号的分贝和预设噪声阈值,调整所述目标信噪比;其中,调整后的所述目标信噪比,用于确定所述目标增益信号。
在本申请实施例中,电子设备可基于环境音的目标噪声信号的分贝强弱,来对目标信噪比进行调节,以使电子设备在不同强度的环境噪声下,能够自适应地保持输出的音频信号与环境噪声信号之间信噪比的稳定性。
根据第一方面,或者以上第一方面的任意一种实现方式,所述目标信噪比包括第四信噪比;所述确定所述第一音频信号与所述目标噪声信号对应的目标信噪比,包括:对所述多个第一信噪比取第一平均值;将所述第一平均值作为所述第四信噪比。
示例性的,电子设备可通过调整第一音频信号的增益,来确保第一音频信号与目标噪声信号之间的整体信噪比的稳定性,这里确定该整体信噪比的方式,可以是对各个子带i对应的第一信噪比取平均值,以作为该整体信噪比,从而基于该整体的信噪比来确定对第一音频信号的目标增益信号,能够确保整体信噪比的稳定。
根据第一方面,或者以上第一方面的任意一种实现方式,目标信噪比包括第五信噪比;所述确定所述第一音频信号与所述目标噪声信号对应的目标信噪比,包括:基于时间帧,确定所述第一音频信号与所述目标噪声信号之间的信噪比的第二平均值;将所述第二平均值作为所述第五信噪比。
示例性的,电子设备可通过调整第一音频信号的增益,来确保第一音频信号与目标噪声信号之间的整体信噪比的稳定性,这里确定该整体信噪比的方式,可以是基于时间帧的方式,具体而言,电子设备可对对应同一时间帧的第一音频信号和目标噪声信号,计算比值作为它们在该时间帧上的信噪比;然后,电子设备可对各个时间帧上的该信噪比取平均值(或者加权求和,或者任取一个时间帧对应的信噪比),作为第一音频信号与目标噪声信号之间的整体信噪比。电子设备从而基于该整体信噪比,来调节第一音频信号的增益,以确保音频信号的整体信噪比的稳定性。
根据第一方面,或者以上第一方面的任意一种实现方式,所述获取环境音对应的目标噪声信号,包括:获取环境音对应的第二噪声信号;基于声传递函数,对所述第二噪声信号进行处理,获取所述目标噪声信号。
示例性的,在由耳机来采集该环境音的第二噪声信号(可以是原始的噪声信号,例如参考传感器信号x(n),也可以是降噪后的噪声信号,例如误差传感器信号e(n),这里不做限制)时,电子设备可利用声传递函数,来对第二噪声信号进行补偿,以作为环境的目标噪声信号,以提升用户的听音感知。
根据第一方面,或者以上第一方面的任意一种实现方式,所述方法还包括:基于人耳与所述电子设备之间的空间距离和格林函数,对所述第一音频信号或所述第二音频信号进行处理。
在本申请实施例中,电子设备可对音频信号进行补偿,以提升用户的听音感知。
第二方面,本申请实施例提供一种电子设备。该电子设备包括:存储器和处理器,所述存储器和所述处理器耦合;所述存储器存储有程序指令,所述程序指令由所述处理器执行时,使得所述电子设备执行如第一方面以及第一方面的任意一种实现方式中的方法。
第二方面所对应的技术效果可参见上述第一方面以及第一方面的任意一种实现方式所对应的技术效果,此处不再赘述。
第三方面,本申请实施例提供了一种计算机可读介质,用于存储计算机程序,当所述计算机程序在电子设备上运行时,使得所述电子设备执行如第一方面以及第一方面的任意一种实施方式中的方法。
第三方面所对应的技术效果可参见上述第一方面以及第一方面的任意一种实现方式所对应的技术效果,此处不再赘述。
第四方面,本申请实施例提供了一种芯片,该芯片包括一个或多个接口电路和一个或多个处理器;所述接口电路用于从电子设备的存储器接收信号,并向所述处理器发送所述信号,所述信号包括存储器中存储的计算机指令;当所述处理器执行所述计算机指令时,使得所述电子设备执行如第一方面以及第一方面的任意一种实施方式中的方法。
第四方面所对应的技术效果可参见上述第一方面以及第一方面的任意一种实现方式所对应的技术效果,此处不再赘述。
第五方面,本申请实施例提供了一种包含指令的计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如第一方面以及第一方面的任意一种实施方式中的方法。
第五方面所对应的技术效果可参见上述第一方面以及第一方面的任意一种实现方式所对应的技术效果,此处不再赘述。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为示例性示出的电子设备的结构示意图之一;
图2为示例性示出的电子设备的软件结构示意图;
图3为示例性示出的用户接收音频的示意图;
图4为示例性示出的电子设备的音频处理过程的示意图;
图5a为示例性示出的通话场景的示意图;
图5b为示例性示出的通话场景的示意图;
图5c为示例性示出的耳机的示意图;
图6为示例性示出的音频信号处理前后的对比示意图;
图7为示例性示出的通话场景的示意图;
图8为示例性示出的通话场景的示意图;
图9为本申请实施例提供的装置的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。
本申请实施例的说明书和权利要求书中的术语“第一”和“第二”等是用于区别不同的对象,而不是用于描述对象的特定顺序。例如,第一目标对象和第二目标对象等是用于区别不同的目标对象,而不是用于描述目标对象的特定顺序。
在本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。
在本申请实施例的描述中,除非另有说明,“多个”的含义是指两个或两个以上。例如,多个处理单元是指两个或两个以上的处理单元;多个系统是指两个或两个以上的系统。
图1示出了电子设备100的结构示意图。应该理解的是,图1所示电子设备100仅是电子设备的一个范例,可选地,电子设备100可以为终端,也可以称为终端设备,终端可以为蜂窝电话(cellular phone),平板电脑(pad)、可穿戴设备(例如耳机)或物联网设备等,本申请不做限定。需要说明的是,电子设备100可以具有比图中所示的更多的或者更少的部件,可以组合两个或多个的部件,或者可以具有不同的部件配置。图1中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
电子设备100可以包括:处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器 (application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
I2C接口是一种双向同步串行总线,包括一根串行数据线(serial data line,SDA)和一根串行时钟线(serial clock line,SCL)。在一些实施例中,处理器110可以包含多组I2C总线。处理器110可以通过不同的I2C总线接口分别耦合触摸传感器180K,充电器,闪光灯,摄像头193等。例如:处理器110可以通过I2C接口耦合触摸传感器180K,使处理器110与触摸传感器180K通过I2C总线接口通信,实现电子设备100的触摸功能。
I2S接口可以用于音频通信。在一些实施例中,处理器110可以包含多组I2S总线。处理器110可以通过I2S总线与音频模块170耦合,实现处理器110与音频模块170之间的通信。在一些实施例中,音频模块170可以通过I2S接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块170与无线通信模块160可以通过PCM总线接口耦合。在一些实施例中,音频模块170也可以通过PCM接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器110与无线通信模块160。例如:处理器110通过UART接口与无线通信模块160中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块170可以通过UART接口向无线通信模块160传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display  serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过电子设备100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处 理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极管或主动矩阵有机发光二极管(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置 于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现和电子设备100的接触和分离。电子设备100可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口195可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口195可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口195也可以兼容不同类型的SIM卡。SIM卡接口195也可以兼容外部存储卡。电子设备100通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,电子设备100采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在电子设备100中,不能和电子设备100分离。
电子设备100的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本申请实施例以分层架构的Android系统为例,示例性说明电子设备100的软件结构。
图2是本申请实施例的电子设备100的软件结构框图。
电子设备100的分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。
如图2所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图2所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供电子设备100的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
系统库与运行时层包括系统库和安卓运行时(Android Runtime)。系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。3D图形库用于实现三维图形绘图,图像渲染,合成和图层处理等。安卓运行时包括核心库和虚拟机。安卓运行时负责安卓系统的调度和管理。核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库 可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
可以理解的是,图2示出的系统框架层、系统库与运行时层包含的部件,并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。
音频信号可分为有效音频信号和噪声信号。其中,有效音频信号为电子设备待输出的音频信号。例如,手机在播放音乐时,手机从应用服务器接收到的音乐音频信号或者手机本端保存的待播放的音乐音频信号均可为有效音频信号。再如,手机1在与手机2(可以是多个手机2,例如视频会议场景)进行语音或视频通话时,手机1从手机2接收到的语音音频信号也为有效音频信号。而噪声信号可为电子设备采集的环境音等。
示例性的,如图3所示,人耳不仅可听到耳机播放的有效音频信号S,还可听到噪声信号,例如用户所在环境的环境噪声信号N。其中,耳机的麦克风可采集到环境噪声信号N。例如本端用户在使用手机和对端用户通话,本端用户佩戴的与该手机通信连接的耳机,可播放通话过程中来自对端的语音音频信号(可作为这里的有效音频信号S)。或者,本端用户在使用该耳机播放手机中的音乐时,耳机还可播放该音乐音频信号(可作为有效音频信号S)。在本端用户通话或听音乐等使用电子设备播放音频的场景下,用户对音频的听感会受到本端用户周围不稳定的环境噪声影响,造成用户听到的有效音频信号的音量忽大忽小,不够稳定,听音感知体验不佳。
自动增益控制技术(Automatic Gain Control,AGC)可对音频信号的输出音量进行调整,AGC利用线性放大和压缩的有效组合,对输出的音频信号进行动态调整。其中,AGC可以通过改变音频信号的输入输出幅值的压缩比例(也称音频信号的增益)来达到控制声音大小的目的,以使最终输出的声音幅度始终保持在一个恒定的范围内。例如,当弱音频信号输入时,通过增大音频信号的增益可以提高音频信号的幅值,以保证输出音频信号的强度;当所输入的音频信号达到一定强度时,通过减小音频信号的增益可以使输出的音频信号的幅值降低。因此AGC可以通过改变音频信号的增益,来自动调整输出音频信号的幅值,避免用户听到的音频信号的音量忽大忽小,可在听音感知上更稳定、更舒适。
在现有技术中,在对音频信号进行处理时,不会对音频信号进行区分,例如区分有效音频信号和噪声信号,而是对电子设备所获取的音频信号(包括有效音频信号S和噪声信号N)一起进行自动增益控制,以将音频信号的幅值维持在一定范围内。但是这种处理方式只考虑整体音频信号的稳定性,造成在提高或降低有效音频信号的幅值的同时,也会提高或降低噪声信号N的幅值。那么在外界环境噪声强度不够稳定的场景下,用户听到的有效音频信号的音量还是有时大有时小,音量不够稳定,影响用户的听感体验。
为此,本申请提供了一种音频处理方法及电子设备。该方法可确保输出音频的信噪比SNR(Signal to Noise Ratio)的稳定性。那么在用户使用电子设备输出音频时,即便环境噪声的分贝较强且不够稳定,本申请的方法也可通过确保音频信号的信噪比的稳定性(有效音频信号S与环境噪声N的比值),来提升用户听音或通话等场景下的听音舒适性,以提升听感体验。
图4为示例性示出的本申请的电子设备的音频处理方法的步骤流程图。
该音频处理方法(例如图4所示的方法过程)可应用于语音或视频通话场景,也可以应用于音频播放场景,还可以应用于其他的未列举的电子设备需要输出音频的场景,这里不再赘述。
本申请以语音通话场景为例,来对上述方法进行说明,该方法在应用于其他需要输出音频的场景时,方法原理类似,这里不再一一赘述。
图5a为示例性示出的通话场景的示意图。
如图5a所示,在通话场景下,通话服务网元1可采集本端用户(这里为用户1)的语音信号(这里作为有效音频信号S2),并将其发送至通话服务网元2。通话服务网元2可输出来自通话服务网元1的有效音频信号S2,以使通话服务网元2侧的用户2听到用户1的通话语音。同理,通话服务网元2可采集用户2的语音信号(这里作为有效音频信号S1),并发送有效音频信号S1至通话服务网元1。通话服务网元1可输出该有效音频信号S1,以使用户1听到用户2的通话语音。这样,就实现了两端用户的语音通话。
当然,通话服务网元的数量可以多于两个,例如在会议场景下或话务场景下,原理类似,本申请对此不做限制。
示例性的,通话服务网元可以是话务系统装置或者会议系统装置,还可以是手机,或者手机中集成的软件模块或硬件芯片等,本申请对此不做限制。
示例性的,在图5a中,通话服务网元可不通过耳机等设备,而直接对该网元侧的用户的音频或该网元侧的噪声信号进行采集,也可以不通过耳机等设备而直接播放来自对端的通话服务网元的音频信号。示例性的,该通话服务网元为手机或者手机中集成的软件或硬件模块。
或者,通话服务网元与该网元侧的用户,通过耳机等介质来进行音频的播放、音频及噪声的采集。示例性的,该通话服务网元可选地为手机或者手机中集成的软件或硬件模块。
场景1
示例性的,结合于图5a,图5b为示例性示出通话服务网元1通过耳机与用户1交互的场景示意图。通话服务网元2及其连接的耳机,与图5b的描述过程同理,这里不再赘述。
示例性的,如图5b所示,用户1佩戴的耳机可包括外置麦克风201,扬声器202。
示例性的,结合图5a,图5b中的外置麦克风201可采集用户1的语音信号,例如作为图5a中的有效音频信号S2,发送至通话服务网元1。
示例性的,耳机还可采集环境噪声作为噪声信号N(作为上行信号)发送至通话服务网元1。
示例性的,噪声信号N和有效音频信号S2一般不同时采集,例如在用户1说话时,耳机采集有效音频信号S2,在用户1说话间隙,耳机采集噪声信号N。
示例性的,通话服务网元1还可将接收的有效音频信号S(例如图5a所示的有效音频信号S1)作为下行信号发送至耳机。
在图5b中,通话服务网元1,或者耳机,可执行图4的流程,以结合噪声信号N,来对待输出的有效音频信号S(例如有效音频信号S1)进行增益处理,以提升有效音频信号S和噪声信号N之间的信噪比的稳定性。
下面以图5b中的耳机执行图4的流程为例,来对本申请的音频处理方法进行描述。
如图4所示,该过程可包括如下步骤:
S101,获取环境的噪声信号。
示例性的,结合于图5b,耳机可获取环境噪声,可将采集的环境噪声作为这里的噪声信号N。或者,耳机还可对获取的环境噪声进行降噪处理,将采集到的降噪处理后的环境噪声作为这里的噪声信号N,本申请对此不做限制。
示例性的,结合图5c,对图5b所示的耳机的降噪过程进行描述。
如图5c(1)所示,该耳机不仅包括图5b所示的扬声器202、外置麦克风201,还包括内置麦克风203。
示例性的,如图5c(1)所示,耳机所处环境存在环境噪声,例如外部直达噪声d(n),外置麦克风201对于该环境噪声可采集到参考传感器信号x(n)(基本与d(n)等同)。如图5c(2)所示,耳机的自适应滤波器可对参考传感器信号x(n)进行处理,以获取参考传感器信号x(n)的反相信号y(n)。如图5c(1)所示,耳机可通过扬声器202来播放该反相信号y(n)。佩戴该耳机的用户1的耳朵位于扬声器202附近,人耳不仅可听到外部直达噪声d(n),还可听到与该外部直达噪声d(n)反相的反相信号y(n)。如图5c(2)所示,输出播放的反相信号y(n)可抵消环境中的部分外部直达噪声d(n),使得图5c(1)所示的扬声器202所处环境的噪声为残余的噪声信号(这里为误差传感器信号e(n))。如图5c(1)所示,内置麦克风203可采集到环境中的误差传感器信号e(n)。
继续参照图5b,结合图5c,耳机可将内置麦克风203所采集的误差传感器信号e(n)作为噪声信号N,或者,将外置麦克风201采集的参考传感器信号x(n)作为噪声信号N,本申请对此不做限制。
示例性的,如图5b所示,用户1佩戴耳机时,用户的耳朵位于扬声器202附近的位置,而采集环境噪声信号的外置麦克风201与扬声器202之间存在一定距离,使得在上述过程中耳机所采集的噪声信号N,与佩戴该耳机的用户1的人耳所实际接收的噪声信号N_true之间存在差异。
示例性的,耳机采集的噪声信号N可以是图5c所示的麦克风201所采集的参考传感器信号x(n),或者图5c所示的麦克风203所采集的误差传感器信号e(n)。
在一种可能的实施方式中,耳机可对采集的噪声信号N进行补偿,以使补偿后的噪声信号为N_true。那么在耳机执行下述S105时,则可以将补偿后的噪声信号N_true作为S105中的噪声信号进行处理,以提升用户的听音感知。
示例性的,耳机可基于声传递函数H(ω),对噪声信号N进行补偿,以获取N_true。
示例性的,N_true=N*H(ω)。
示例性的,该声传递函数H(ω)可配置于图5b所示的耳机或者通话服务网元1中,本申请对此不做限制。
示例性的,声传递函数H(ω),可以基于大量耳机拟合得到,然后存储在耳机或通话服务网元1中。
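作为示意,以下Python草图展示了利用声传递函数H(ω)在频域对噪声信号N做补偿、得到N_true=N*H(ω)的一种可能实现(假设h_omega已按rfft频点采样,长度为len(n)//2+1;函数与变量命名均为本文假设):

```python
import numpy as np

def compensate_noise(n, h_omega):
    """利用声传递函数H(ω)在频域补偿噪声信号,得到N_true = N * H(ω)。"""
    n_spec = np.fft.rfft(n)                      # 时域噪声信号变换到频域
    n_true_spec = n_spec * h_omega               # 逐频点乘以H(ω)
    return np.fft.irfft(n_true_spec, n=len(n))   # 反变换回时域,得到N_true
```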
S103,获取待输出的有效音频信号。
示例性的,如图5b所示,耳机可从通话服务网元1(例如该通话服务网元1为与耳机蓝牙连接的手机)获取待输出的音频信号,这里作为有效音频信号S。
其中,本申请对于S101和S103的执行顺序不做限制。
S105,基于所述噪声信号和所述有效音频信号,确定目标信噪比。
示例性的,耳机可基于噪声信号N和待输出的有效音频信号S,来确定该噪声信号N与该有效音频信号S的目标信噪比。
示例性的,该目标信噪比可以是S101和S103所获取的两组信号的目标信噪比SNR_t,和/或,该目标信噪比可包括:所述噪声信号N和所述有效音频信号S对应的各个子带的目标信噪比SNRi,其中,i=1,2,3,…,k,k为子带的数量。
可以理解的是,子带可对应一个频段,具体定义可参考现有技术,本申请对此不做限制。
示例性的,耳机在确定噪声信号N和有效音频信号S对应的各个子带时,耳机可在时域内,将噪声信号N和有效音频信号S均划分为多个子带。
示例性的,耳机在确定噪声信号N和有效音频信号S对应的各个子带时,耳机还可将时域的噪声信号N,以及时域的有效音频信号S,均转换为频域信号;然后,在频域内,将频域的噪声信号N以及频域的有效音频信号S,划分为多个子带。例如k=20,本申请对此不做限制,这样可将噪声信号N和有效音频信号S均在频域内划分为20个子带。
示例性的,噪声信号N被划分为20个子带n_i,分别为子带n_1至子带n_20;
示例性的,有效音频信号S被划分为20个子带s_i,分别为子带s_1至子带s_20;
其中,子带n_i对应的频段i与子带s_i对应的频段i相同。
可以理解的是,20个子带n_i各自的频段,与20个子带s_i各自的频段之间,一一对应。
示例性的,子带i可表示子带n_i和子带s_i。
例如,子带n_1对应频段1,子带s_1也对应频段1,例如频段1包括f1至f4的多个频点,本申请对此不做限制。
可以理解的是,不论所述噪声信号N和所述有效音频信号S是在频域上还是时域上被分别划分为20个子带,20个子带n_i各自的频段,与20个子带s_i各自的频段之间,均一一对应。
示例性的,耳机将时域信号转换为频域信号的方法可以是傅里叶变换等方法,本申请对此不做限制。
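作为示意,以下Python草图给出一种在频域将信号均匀划分为k个子带的可能实现(均匀划分、函数与变量命名均为本文假设,实际的子带划分方式以本申请上文所述为准):

```python
import numpy as np

def split_subbands(x, fs, k=20):
    """将时域信号x变换到频域,并均匀划分为k个子带(均匀划分为本文假设)。"""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    edges = np.linspace(0.0, fs / 2.0, k + 1)    # k个频段的边界
    subbands = []
    for i in range(k):
        if i == k - 1:
            idx = (freqs >= edges[i]) & (freqs <= edges[i + 1])  # 末段包含fs/2
        else:
            idx = (freqs >= edges[i]) & (freqs < edges[i + 1])
        subbands.append(spec[idx])               # 子带i:频段i内的频谱采样
    return subbands
```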
示例性的,耳机在获取上述所述噪声信号N和所述有效音频信号S的目标信噪比SNR_t,和/或,上述各个子带的目标信噪比SNRi时,可采用方式1至方式3中的任意一种实现方式,本申请对此不做限制。
方式1:
方式1.1:预先配置有效音频信号和噪声信号的整体的目标信噪比SNR_t的取值。
方式1.2:预先配置有效音频信号与噪声信号对应的各个子带的目标信噪比SNRi,i=1,2,3,4,…,20。
示例性的,预先配置的子带的数量可大于或等于k(例如k=20)。
示例性的,在预先配置整体的目标信噪比和子带的目标信噪比时,可通过用户配置(例如UI界面)或者系统配置(例如配置文件),本申请对此不做限制。
示例性的,耳机可从配置文件中获取目标信噪比SNR_t的取值,作为噪声信号N和待输出的有效音频信号S的整体的目标信噪比SNR_t。
示例性的,耳机可从配置文件预设的各个子带的目标信噪比中,获取与目标子带对应的目标信噪比,以作为目标子带的信噪比。其中,目标子带为对采集的噪声信号N和获取的待输出的有效音频信号S进行划分所得到的多个子带。这样,耳机就可从预配置信息中获取到各子带的目标信噪比。
该方式1可快速获取有效音频信号和噪声信号的整体的目标信噪比,以及各个子带的目标信噪比,以便于调节有效音频信号的增益,确保有效音频信号和噪声信号的整体的信噪比稳定在目标信噪比内,以及各个子带的信噪比稳定在各个子带的目标信噪比内。
方式2:
方式2.1:耳机在获取待输出的有效音频信号S和环境的噪声信号N的整体的目标信噪比SNR_t的过程中,耳机可基于时间帧,连续计算各时间帧的有效音频信号S和环境的噪声信号N的SNR(即S/N);然后,耳机可基于各时间帧对应的SNR,确定上述目标信噪比SNR_t。
示例1,以耳机播放音乐为例,在耳机播放音乐的开始一段时间(例如2s)内,计算这段时间的有效音频信号S(这里为音乐的音频信号)的平均幅值S_avg,以及这段时间内采集的环境的噪声信号N的平均幅值N_avg,耳机可将S_avg/N_avg作为待输出的下一段音乐的音频信号与采集的噪声信号N的目标信噪比SNR_t。
示例2,耳机在确定各时间帧对应的SNR的均值时,还可对已确定的SNR的均值不断的更新,以将更新后的SNR的均值作为上述目标信噪比SNR_t。
示例性的,耳机可设置可滑动的时间窗口(例如上述2s),在一个时间窗口内,按照示例1的方式,对该时间窗口所采集的音乐音频信号以及噪声信号,计算S_avg/N_avg,作为目标信噪比SNR_t;然后,在下一个时间窗口,再按照示例1的方式,耳机对本时间窗口内所采集的音乐音频信号以及噪声信号,计算S_avg/N_avg;然后,耳机将目标信噪比SNR_t更新为对应本次时间窗口的S_avg/N_avg。以此类推,使得目标信噪比SNR_t可随噪声信号N的变化而不断更新。
当然,可滑动的时间窗口的时长可以不同,例如在噪声信号的幅值较为稳定时,时间窗口可以较长;在噪声信号的幅值变化较为剧烈时,时间窗口可以较短。
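作为对示例1与示例2的示意,以下Python草图按时间窗口计算S_avg/N_avg并将其作为更新后的目标信噪比SNR_t(窗口取法与命名为本文假设):

```python
import numpy as np

def window_snr_t(s_win, n_win):
    """对一个时间窗口内的有效音频信号与噪声信号计算S_avg/N_avg,作为SNR_t。"""
    s_avg = np.mean(np.abs(s_win)) + 1e-12   # 本窗口有效音频信号的平均幅值S_avg
    n_avg = np.mean(np.abs(n_win)) + 1e-12   # 本窗口噪声信号的平均幅值N_avg
    return s_avg / n_avg                     # 更新后的整体目标信噪比SNR_t
```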
示例3,耳机在基于各时间帧对应的SNR,确定上述目标信噪比SNR_t时,耳机可对每个时间点的有效音频信号与噪声信号计算一次SNR;然后,耳机将多个时间点对应的SNR求平均值,或者加权求和,或者任取一个时间点对应的SNR,作为基于时间帧所确定的有效音频信号S和环境的噪声信号N的整体的目标信噪比SNR_t。
示例性的,耳机采集的一段时间内的噪声信号N,和该段时间内的有效音频信号S,可在时间帧上相互对应,例如图6所示的t0时刻不仅对应有曲线1中的一个采样点(有效音频信号S中的一个采样点),还对应有曲线2中的一个采样点(噪声信号N中的一个采样点),这两个采样点因t0时刻而相互关联对应。
方式2.1的实现方式并不限于上述示例1至示例3,还可包括其他的实现方式,这里不再赘述。
方式2.2:耳机在获取待输出的有效音频信号S和环境的噪声信号N对应的各个子带的目标信噪比SNRi时,对于任意一个子带的目标信噪比SNRi,耳机可基于该子带对应的有效音频信号和噪声信号,确定各时间帧上相互对应的有效音频信号与噪声信号之间的信噪比SNR_it。其中,如方式2.1所述,有效音频信号S和噪声信号N可在时间帧上相互对应。那么耳机可对一个子带内在时间帧上相互对应的各有效音频信号和各噪声信号,计算信噪比SNR_it,以获取该子带对应的多个信噪比SNR_it。然后,耳机可基于该子带对应的该多个信噪比SNR_it,确定该子带对应的目标信噪比SNRi。
示例性的,耳机在基于该子带对应的该多个信噪比SNR_it,确定该子带对应的目标信噪比SNRi时,可在该多个信噪比SNR_it中随机采样一个信噪比SNR_it,作为该子带对应的目标信噪比SNRi。
示例性的,耳机在基于该子带对应的该多个信噪比SNR_it,确定该子带对应的目标信噪比SNRi时,可对该多个信噪比SNR_it取平均值,作为该子带对应的目标信噪比SNRi。
示例性的,耳机在基于该子带对应的该多个信噪比SNR_it,确定该子带对应的目标信噪比SNRi时,可对该多个信噪比SNR_it加权求和,以作为该子带对应的目标信噪比SNRi。
本申请对于耳机基于该子带对应的该多个信噪比SNR_it,确定该子带对应的目标信噪比SNRi的具体策略不做限制。
方式2.2中在获取各个子带的目标信噪比时,原理与方式2.1类似,区别在于这里是对子带内在时间帧上相互关联对应的有效音频信号S和噪声信号N计算信噪比SNR_it,并基于该SNR_it,获取该子带的目标信噪比SNRi。
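作为对方式2.2的示意,以下Python草图对子带i内按时间帧相互对应的信号逐帧计算SNR_it,再以取平均等方式聚合为该子带的目标信噪比SNRi(聚合策略与命名为本文假设):

```python
import numpy as np

def subband_target_snr(s_i, n_i, mode="mean"):
    """对子带i内按时间帧对应的信号逐帧计算SNR_it,再聚合为该子带的SNRi。"""
    snr_it = np.abs(s_i) / (np.abs(n_i) + 1e-12)   # 各时间帧的SNR_it
    if mode == "mean":
        return float(np.mean(snr_it))               # 取平均值
    if mode == "any":
        return float(snr_it[0])                     # 任取一个时间帧对应的SNR_it
    raise ValueError("加权求和等其他聚合策略同理,此处从略")
```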
方式3:
方式3.1:耳机在获取待输出的有效音频信号S和环境的噪声信号N对应的各个子带的目标信噪比SNRi时,可基于人耳听觉的掩蔽曲线,获取与该各个子带对应的目标信噪比SNRi。示例性的,耳机可通过S201至S203来实现:
S201,耳机通过上述方式1.2或上述方式2.2或者其他方式,获取各个子带的信噪比SNRi,作为各个子带初始的目标信噪比SNRi。
S202,耳机基于人耳听觉的掩蔽曲线,获取与该各个子带对应的幅值,作为与各个子带对应的声学掩蔽阈值thr_i。
其中,本申请对于S201和S202的执行顺序不做限制,它们均在S203之前执行。
示例性的,人耳听觉的掩蔽曲线可不区分时域信号和频域信号。
关于上述掩蔽曲线可以是现有技术中的任意关于人耳听觉的掩蔽曲线,本申请对此不做限制。
示例性的,有效音频信号S和噪声信号N划分为多个子带。
示例性的,有效音频信号S的一个子带可包括多个采样点,每个采样点包括幅值和频点。
并且,有效音频信号S所划分为的多个子带(包括多个音频信号),与噪声信号N所划分为的多个子带(包括多个噪声信号)之间,频段是相同的。例如有效音频信号划分为子带1至子带20,同理,噪声信号N也划分为子带1至子带20。以子带1为例,有效音频信号S中的子带1对应频段为f1至f4,噪声信号N中的子带1对应的频段同样为f1至f4。只是在子带1对应的频段内,噪声信号在各频点f上的幅值,与有效音频信号S在各频点f上的幅值不同。
示例性的,上述掩蔽曲线可以是横轴为频点(也称频率),纵轴为幅值的曲线,有效音频信号S和噪声信号N所划分为的多个子带中每个子带可对应一个频段。那么耳机可从上述掩蔽曲线中,获取与该频段中的任意一个频点所对应的幅值,作为该频段的子带所对应的幅值,以作为该子带的声学掩蔽阈值。
但是,本申请对于为各个子带对应的频段在掩蔽曲线上确定对应幅值的策略,并不限于上述示例,还可通过其他已知方法,来确定与子带的频段对应的幅值,以作为该频段对应的子带的声学掩蔽阈值。
本申请的上述掩蔽曲线用于指示在相同或者相邻频段内,两个信号的能量差超出声学掩蔽阈值thr_i,低能量的信号可被掩蔽掉,使得人耳听不见低能量的信号。从而借助于该掩蔽曲线,来生成声学掩蔽阈值thr_i,以调节有效音频信号的增益,来改变有效音频信号的幅值,以使改变幅值后的有效音频信号可掩盖相应的噪声信号,提升用户对有效音频信号的听觉体验。
S203,耳机基于与各个子带对应的声学掩蔽阈值thr_i,和各个子带对应的音频信号si以及噪声信号ni,确定是否对各个子带对应的初始的目标信噪比SNRi进行调节,以确定各个子带的目标信噪比SNRi。
示例性的,有效音频信号S的子带i对应的音频信号si,可为该有效音频信号S的子带i对应的一个采样点p;噪声信号N的子带i对应的噪声信号ni,可为该噪声信号N的子带i对应的一个采样点q,其中,采样点p和采样点q对应的频点相同。
例如,采样点p为有效音频信号S的子带i中的任意一个采样点,采样点q为噪声信号N的子带i中与采样点p的频点相同的一个采样点。
再如,采样点p的幅值可以是有效音频信号S的子带i对应的多个采样点的平均幅值,该采样点p的频点可以是该子带i中的任意一个频点;采样点q的幅值为噪声信号N的子带i对应的多个采样点的平均幅值,采样点q的频点与采样点p的频点相同。
示例性的,有效音频信号S的子带i对应的音频信号si,也可为该有效音频信号S的子带i对应的多个采样点p(例如该子带i中的每个采样点或部分采样点,本申请对此不做限制);噪声信号N的子带i对应的噪声信号ni,可为该噪声信号N的子带i对应的多个采样点q(例如该子带i中的每个采样点或部分采样点,本申请对此不做限制),其中,采样点p和采样点q的数量相同,每组采样点p和采样点q对应的频点相同。
需要说明的是,本申请对于采样点p和采样点q的确定方式不做限制。
在S203中,若子带i对应的音频信号si和噪声信号ni满足公式1,考虑到ni和thr_i均为常量,则说明该子带对应的音频信号si已经足够大,该子带i的音频信号si与噪声信号ni的幅值已经存在较大差距,该子带i的音频信号的分贝在人耳感知上已经可以远大于该子带i的噪声信号,使得该子带i的有效音频信号可以掩盖噪声信号的音量,人耳易感知该子带i对应的音频信号。那么耳机可不调节该子带i对应的初始的目标信噪比SNRi,并将该初始的目标信噪比SNRi作为该子带i对应的目标信噪比SNRi。
在S203中,若子带i对应的音频信号si和噪声信号ni不满足公式1,则耳机可令该子带i的目标信噪比SNRi更新为si/(ni+thr_i)。
si/(ni+thr_i)≥SNRi,公式1;
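作为对S203判断逻辑的示意,以下Python草图按公式1判断是否更新子带i的初始目标信噪比(函数与变量命名为本文假设):

```python
def update_subband_snr(si, ni, thr_i, snr_i):
    """按公式1判断是否调节子带i的初始目标信噪比SNRi。"""
    candidate = si / (ni + thr_i)
    if candidate >= snr_i:    # 公式1成立:音频信号已足以掩盖该子带的噪声
        return snr_i          # 不调节,沿用初始目标信噪比
    return candidate          # 不满足公式1:更新为si/(ni+thr_i)
```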
方式3.2:耳机在获取有效音频信号和噪声信号的整体的目标信噪比SNR_t时,可将各个子带i对应的目标信噪比SNRi求平均值,以作为目标信噪比SNR_t。
当然,获取目标信噪比SNR_t的方式不限于方式3.2。
在本方式3对应的实施方式中,耳机可基于人耳听觉的掩蔽曲线,获取与该各个子带对应的声学掩蔽阈值,并结合该声学掩蔽阈值,来确定各子带或整体信号的信噪比,使得该信噪比融合了人耳听觉的心理感知,利用该信噪比调整有效音频信号S,可使调整后的有效音频信号的信噪比达到最佳听音感知的效果。
需要说明的是,上述方式1至方式3中关于获取目标信噪比SNR_t的具体方式,以及获取各个子带对应的目标信噪比SNRi的具体方式可以自由组合,本申请对此不做限制。例如耳机在获取上述整体的目标信噪比SNR_t时,通过方式2.1来实现,以及在获取各个子带对应的目标信噪比SNRi时,通过方式3.1来实现。
在一种可能的实施方式中,耳机还可基于多个子带(所述噪声信号N和有效音频信号S所划分为的各个子带)分别对应的目标信噪比SNRi,确定清晰度指数(articulation index,AI)。其中,AI是言语清晰度评价参数。然后,耳机可基于AI来对各个子带对应的目标信噪比SNRi,和/或,上述整体的目标信噪比SNR_t进行调节,将调节后的目标信噪比SNRi,和/或,调节后的目标信噪比SNR_t作为S107中的目标信噪比,以用于确定增益。
示例性的,耳机在确定AI时,可通过步骤1和步骤2来实现:
步骤1,耳机可对多个子带分别对应的目标信噪比SNRi进行归一化处理。
示例性的,耳机可按照公式2,将通过方式1或方式2或方式3得到的各个子带的目标信噪比SNRi(即公式2中的SNR_dB(f_i)),更新在[-15,15]范围内。然后,耳机按照公式3,将更新在[-15,15]范围内的目标信噪比SNRi(即公式2中的SNR'_dB(f_i))映射到[0,1],其中,在公式3中,取值映射到[0,1]的每个子带的目标信噪比SNRi以SNR_M(f_i)表示。
SNR'_dB(f_i)=min(max(SNR_dB(f_i),-15),15),i=1,2,…,k,公式2;
SNR_M(f_i)=(SNR'_dB(f_i)+15)/30,公式3;
其中,如公式2所示,在耳机将各个子带的目标信噪比SNRi(即公式2中的SNR_dB(f_i))更新在[-15,15]范围内时:在目标信噪比SNRi的取值小于-15时,耳机可将该目标信噪比SNRi的取值更新为-15;在目标信噪比SNRi的取值大于15时,耳机可将该目标信噪比SNRi的取值更新为15;在目标信噪比SNRi的取值大于或等于-15,且小于或等于15时,耳机可使该目标信噪比SNRi的取值保持不变。
其中,在公式2中,子带的数量为k,k个子带对应的k个更新后的目标信噪比SNRi以SNR'_dB(f_i)表示。
需要说明的是,在公式2中,设定的用于更新SNRi的临界值并不限于-15、15,还可以是其他数值,本申请对此不做限制。
步骤2,耳机可基于归一化处理后的目标SNRi,确定清晰度指数AI。
AI=∑_{i=1}^{k} W_i·SNR_M(f_i),公式4;
示例性的,耳机可基于公式4,来对通过公式2和公式3归一化处理后的目标信噪比SNRi(在公式3和公式4中以SNR_M(f_i)表示)进行加权求和,以确定清晰度指数AI。
其中,公式4中的k为频谱被划分的频段个数,即子带i的个数k。
W_i表示第i个频段(这里为第i个子带对应的频段)的频带重要性函数(Band-Importance Functions,BIF),BIF满足公式5,BIF可通过大量的实验得到。那么在上述公式4中,该W_i相当于权重。
∑_{i=1}^{k} W_i=1,公式5;
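作为对公式2至公式5的示意,以下Python草图由各子带目标信噪比(dB)计算清晰度指数AI(W_i的具体取值需由实验得到,此处作为输入参数;命名为本文假设):

```python
import numpy as np

def articulation_index(snr_db, w):
    """按公式2~公式5,由各子带目标信噪比(dB)计算清晰度指数AI。"""
    snr_clip = np.clip(np.asarray(snr_db), -15.0, 15.0)  # 公式2:截断到[-15,15]
    snr_m = (snr_clip + 15.0) / 30.0                     # 公式3:线性映射到[0,1]
    return float(np.sum(np.asarray(w) * snr_m))          # 公式4:AI=Σ W_i·SNR_M(f_i)
```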
示例性的,耳机可基于AI对整体的目标信噪比SNR_t进行调节,还可以对子带对应的目标信噪比SNRi进行调节。
示例性的,耳机可通过公式6来对SNR_original进行调节,调节后的信噪比为SNR_target。
SNR_target=(1/AI)*SNR_original,公式6;
其中,SNR_original表示利用AI调节前的目标信噪比,SNR_target表示利用AI调节后的目标信噪比。
示例性的,在SNR_original为S105确定的整体的上述目标信噪比SNR_t时,SNR_target为利用AI调节后的整体的目标信噪比SNR_t。
示例性的,在SNR_original为S105确定的子带i的目标信噪比SNRi时,SNR_target为利用AI调节后的子带i的目标信噪比SNRi。
可选地,为保证鲁棒性,可以约束1/AI在1.0和1.3之间。
示例性的,在约束1/AI时,在经过公式4的计算,确定1/AI大于1.3时,可将1/AI的取值更新为1.3;在1/AI的取值小于1.0时,可将1/AI的取值更新为1.0;在1.0≤1/AI≤1.3时,则保持1/AI不变。
需要说明的是,对于1/AI的约束条件,并不限于上述1.0和1.3的举例,还可以是其他约束的数值或其他的约束条件,本申请对此不做限制。
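作为对公式6及1/AI约束的示意,以下Python草图给出一种可能实现(约束区间[1.0,1.3]沿用上文示例,命名为本文假设):

```python
def adjust_snr_by_ai(snr_original, ai):
    """按公式6利用清晰度指数AI调节目标信噪比,并约束1/AI在[1.0, 1.3]。"""
    inv_ai = 1.0 / max(ai, 1e-12)          # 防止除零
    inv_ai = min(max(inv_ai, 1.0), 1.3)    # 约束1.0 <= 1/AI <= 1.3
    return inv_ai * snr_original           # SNR_target = (1/AI) * SNR_original
```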
需要说明的是,耳机利用AI来调节目标信噪比时,并不限于通过公式6的方式,还可通过其他策略,本申请对此不做限制。例如在AI小于预设阈值时,耳机可增大目标信噪比(可以是上述SNR_t,也可以是子带i对应的SNRi),或保持该目标信噪比不变。再如,在AI大于预设阈值时,耳机可减小该目标信噪比。
可选的,为了增加自适应能力,耳机可以根据采集的环境的噪声信号N的分贝大小来调整目标信噪比(可以是上述SNR_t,也可以是子带i对应的SNRi)。例如耳机可设置调整系数d,d=N/N0,其中,N0为预设噪声阈值。例如N0可以是可配置的基础噪声级别(例如50dB(分贝),本申请对此不做限制),N为耳机在执行图4中S101时采集的上述噪声信号N。
示例性的,耳机可基于调整系数d,来对目标信噪比进行调节。
示例性的,耳机可按照公式7对目标信噪比进行调节。
SNR_target=d*SNR_original,公式7;
其中,公式7中的SNR_original表示利用调节系数d调节前的目标信噪比,SNR_target表示利用调节系数d调节后的目标信噪比。
可选地,为保证鲁棒性,可以约束调整系数d在0.9和1.1之间。
示例性的,在约束调整系数d时,在经过上述N/N0的计算,确定d大于1.1时,可将d的取值更新为1.1;在d的取值小于0.9时,可将d的取值更新为0.9;在0.9≤d≤1.1时,则保持d不变。
需要说明的是,对于d的约束条件,并不限于上述0.9和1.1的举例,还可以是其他约束的数值或其他的约束条件,本申请对此不做限制。
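作为对公式7及调整系数d约束的示意,以下Python草图给出一种可能实现(N0=50dB仅沿用上文示例,命名为本文假设):

```python
def adjust_snr_by_noise(snr_original, n_db, n0_db=50.0):
    """按公式7利用调整系数d=N/N0调节目标信噪比,并约束d在[0.9, 1.1]。"""
    d = n_db / n0_db                  # 调整系数d,N0为预设噪声阈值(50dB仅为示例)
    d = min(max(d, 0.9), 1.1)         # 约束0.9 <= d <= 1.1
    return d * snr_original           # SNR_target = d * SNR_original
```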
可以理解的是,利用清晰度指数AI调节目标信噪比的实施方式,与利用调整系数d来调节目标信噪比的实施方式可以结合,本申请对此不做限制。
S107,基于所述目标信噪比、所述噪声信号以及所述有效音频信号,确定增益信号。
示例性的,在目标信噪比为有效音频信号S和噪声信号N的整体的目标信噪比SNR_t时,耳机可通过公式8确定增益信号G。
G=SNR_t*N/S,公式8;
示例性的,在目标信噪比为有效音频信号S和噪声信号N的各子带的目标信噪比SNRi时,耳机可通过公式9确定子带i对应的增益信号g_i,其中,有效音频信号S的每个子带i对应一个增益信号g_i。
g_i=SNRi*ni/si,公式9;
其中,ni为噪声信号N中,与子带i对应的噪声信号ni;si为有效音频信号S中,与子带i对应的有效音频信号si。其中,i=1,2,3,…,k。
其中,关于与子带i对应的噪声信号ni,以及与子带i对应的有效音频信号si的解释可参照S203关于噪声信号ni,和有效音频信号si的相关解释说明,这里不再赘述。
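作为对公式8与公式9的示意,以下Python草图分别计算整体增益G与子带增益g_i(N、S、ni、si以幅值代入,命名为本文假设):

```python
def overall_gain(snr_t, n_amp, s_amp):
    """公式8:整体增益 G = SNR_t * N / S(N、S为幅值)。"""
    return snr_t * n_amp / max(s_amp, 1e-12)

def subband_gain(snr_i, ni_amp, si_amp):
    """公式9:子带i的增益 g_i = SNRi * ni / si。"""
    return snr_i * ni_amp / max(si_amp, 1e-12)
```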
S109,基于所述增益信号,调整所述有效音频信号并输出调整后的有效音频信号。
示例性的,在目标信噪比为所述目标信噪比SNR_t时,耳机可利用有效音频信号的整体的增益信号G,来调整有效音频信号S的增益,以改变有效音频信号S的幅值,其中,S’=f(G,S)。其中,S’为调整增益后的有效音频信号S。示例性的,S’=G*S。
这样,可确保耳机输出的调整增益后的有效音频信号S’与噪声信号N之间的整体信噪比更加稳定,提升整体信噪比的稳定性。
示例性的,在目标信噪比为有效音频信号S和噪声信号N的各子带的目标信噪比SNRi时,耳机可利用各子带i对应的增益信号g_i,来调整子带i中的每个有效音频信号si_l的增益,以改变有效音频信号S中对应子带i内的每个有效音频信号si_l的幅值,其中,si_l'=f(g_i,si_l)。其中,si_l表示子带i中的每个有效音频信号,si_l'表示耳机对子带i中的有效音频信号si_l调整增益后的有效音频信号。示例性的,si_l'=g_i*si_l。
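作为示意,以下Python草图对各子带施加对应增益g_i(即si_l'=g_i*si_l),并将调整后的子带按频段顺序重新合成为第二音频信号(假设各子带恰好按频点顺序划分了完整频谱,命名为本文假设):

```python
import numpy as np

def apply_gains_and_synthesize(subband_specs, gains, n_fft):
    """对各子带施加增益g_i(si_l' = g_i * si_l),再合成第二音频信号。"""
    adjusted = [g * spec for g, spec in zip(gains, subband_specs)]  # 逐子带调增益
    full_spec = np.concatenate(adjusted)        # 按频段顺序拼接回完整频谱
    return np.fft.irfft(full_spec, n=n_fft)     # 反变换,得到时域的第二音频信号
```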
示例性的,可参照图6所示的信号曲线的对比示意图。
如图6(1)所示,曲线1为利用本申请的方法对有效音频信号S处理(这里为调整增益)之前,有效音频信号S的信号示意图。
如图6(1)和图6(2)所示,曲线2为噪声信号N的信号示意图,其中,图6(1)和图6(2)中的曲线2相同。
以图6(1)所示的曲线2为例进行说明,图6(2)所示的曲线2同理,这里不再赘述。如图6(1)的曲线2所示,噪声信号N的幅值并不稳定,环境噪声的分贝有时强,有时弱。在t1时刻,噪声信号N的幅值,比t0时刻(或者t2时刻)噪声信号N的幅值大许多。可以理解的是,曲线2所示的环境噪声,在t1时刻声音突然增大,其他时刻的声音较为稳定。
请参照图6(1)所示的曲线1,耳机输出的有效音频信号S的各个波峰是稳定的,换言之,有效音频信号S的音量是稳定的。
示例性的,在图5a和图5b所示的场景下,用户1与用户2进行语音通话,用户1使用耳机播放来自用户2的有效音频信号S。有效音频信号S的音量是比较稳定的,但是在用户1侧,t1时刻环境噪声的分贝突然增大,造成用户1的人耳所听到的有效音频信号S的声音容易被噪声信号N所掩盖,影响用户收听有效音频信号S。从图6(1)可以看到,有效音频信号S与噪声信号N的整体信噪比SNR并不稳定。
在现有技术中,由于不区分有效音频信号和噪声信号,而是对图6(1)所示的曲线1和曲线2的幅值均增大或均减小,从而存在信噪比稳定性差的问题。
请参照图6(2),本申请的耳机可在输出有效音频信号S之前,按照上述音频处理方法对有效音频信号S的各子带进行处理,曲线1'为处理后的有效音频信号S的信号示意图。对比于图6(1)所示的曲线1,如图6(2)所示的曲线1'所示,在t1时刻以及t1前后一段时间(例如t11至t12的时间段)内,有效音频信号S的幅值显著增加,使得该段时间内的信噪比稳定性更高;而对比于曲线1,从曲线1'可以确定,在t11之前以及t12之后,有效音频信号的幅值并未被调节。这样,本申请的耳机可有针对性地对需要调节的有效音频信号的部分频域或部分时域进行调节,以提升有效音频信号S与噪声信号N在子带内的信噪比稳定性,以及信号整体的信噪比稳定性。
可以理解的是,图6旨在说明通过本申请的方法处理后的有效音频信号S,与噪声信号N的信噪比的稳定性更好。本申请的方法对频域的有效音频信号处理后,效果相同。
在本实施方式中,耳机可对有效音频信号的不同子带的增益分别进行调整,使得有效音频信号的不同的频率成分的增益调整可存在区别,以提高各子带的SNR稳定性。
另外,考虑到待输出的有效音频信号被划分为多个子带,那么在耳机对各个子带中的每个有效音频信号的增益进行调节之后,耳机可将调节后的各个子带中的有效音频信号(例如各个子带i对应的si_l')重新合成为一个整体的调整增益后的有效音频信号(也可以标记为有效音频信号S')。具体合成方法可参考已知技术,这里不做限制。最后,耳机可将合成后的有效音频信号S'输出播放。示例性的,这里合成后的有效音频信号S'可区别于利用整体的目标信噪比SNR_t调整增益后的有效音频信号S'。
通过本申请的方法,耳机获取有效音频信号的整体的增益信号G来对有效音频信号S的增益进行调节,可使有效音频信号与噪声信号的整体信噪比更加稳定。此外,耳机获取各子带的有效音频信号的增益信号g_i,来对有效音频信号S的各子带的增益分别进行调节,可基于子带的SNR调节有效音频信号S,使得各子带的有效音频信号的信噪比更加稳定。那么在环境噪声的声音突然变大时,利用本申请的方法可同样增大待输出的有效音频信号的幅值,以增大输出的有效音频的音量,避免有效音频信号被环境噪声掩盖,导致用户听不清通话的音频或者播放的音频。此外,在环境噪声的声音突然变小时,利用本申请的方法同样可减小待输出的有效音频信号的幅值,以减小输出的有效音频的音量,避免有效音频的音量过大影响用户听觉。这样,可使耳机输出的音频保持信噪比的稳定性。
场景2
在上述场景1中,以耳机为执行主体来描述了本申请的音频处理方法。在本场景2中,通话服务网元(例如图5b所示的通话服务网元1),可以作为执行主体来执行场景1中描述的各实施方式以及各方法的处理过程,原理类似,这里不再一一赘述。
需要说明的是,以图5b的通话场景为例,在本申请的方法的执行主体为通话服务网元1时,则通话服务网元1可按照图4和/或场景1中示意的相关实施方式,来对待输出的有效音频信号S(例如图5a所示的有效音频信号S1)进行增益处理,得到图5b的虚线箭头所示意的有效音频信号S’(例如对有效音频信号S1进行增益处理,得到有效音频信号S1’)。通话服务网元1在输出增益处理后的有效音频信号S’(例如有效音频信号S1’)时,可将该处理后的有效音频信号S’输出到耳机,由耳机通过扬声器202进行输出播放。换言之,在本场景中,通话服务网元向耳机输出的有效音频信号S’为按照本申请的方法增益处理后的有效音频信号,而非从对侧网元(例如图5a所示的通话服务网元2) 接收到的有效音频信号S1。
示例性的,该通话服务网元可以是话务系统装置、会议系统装置、服务器、手机等电子设备,或者,该通话服务网元可以是该电子设备中集成的软件模块或硬件芯片,本申请对此不做限制。
场景3
示例性的,结合于图5a,图5a所示的通话服务网元1为图7所示的手机,或图7所示的手机中的软件模块或硬件芯片。
图7为示例性示出的用户1的手机与用户1佩戴的耳机的交互场景示意图。
在本场景3中,手机可作为执行主体来执行场景1中的音频处理方法的相应步骤,具体执行原理类似,可参照场景1或场景2的相关描述,这里不再赘述。
示例性的,以图7所在的场景为通话场景为例进行说明。
示例性的,用户1使用与手机通信连接(例如蓝牙,本申请对此不做限制)的耳机来与图5a所示的用户2进行通话。
示例性的,在通话场景下,结合图5a,图7所示的手机可从通话服务网元2接收到有效音频信号S1;
示例性的,如图7所示,手机可从耳机接收噪声信号N1,这里的噪声信号N1可为场景1中描述的耳机获取的噪声信号N,例如N1可为上述参考传感器信号x(n),或误差传感器信号e(n),或补偿后的噪声信号N_true,本申请对此不做限制。
在一种可能的实施方式中,如图7所示,手机下方可具有麦克风302,手机的麦克风302也可以采集环境的噪声信号,这里为噪声信号N2。
示例性的,手机可将自身采集的噪声信号N2,或,接收自耳机的噪声信号N1作为图4中手机在执行S101时所获取的噪声信号。
示例性的,手机还可以基于噪声信号N1和噪声信号N2,生成噪声信号N3,将噪声信号N3作为图4中手机在执行S101时所获取的噪声信号。
例如,噪声信号N3为噪声信号N1与噪声信号N2的平均值,本申请对此不做限制。
这样,手机可基于获取的噪声信号(N1或N2或N3),以及接收自通话服务网元2的有效音频信号S1,来按照场景1或场景2的相关方法,对有效音频信号S1进行增益处理,以获取有效音频信号S1’并输出至耳机,以由耳机播放增益处理后的有效音频信号S1’。
当然,在通话服务网元1为手机时,本申请各实施方式的方法的执行主体也可以是耳机,那么手机可将噪声信号N2以及待输出的有效音频信号S(例如有效音频信号S1)作为下行信号发送至耳机,由耳机来执行本申请的上述方法的过程,原理类似,这里不再赘述。
示例性的,图7还可应用于音频播放场景,其中,该音频并非通话音频。
示例性的,该音频可以是手机本地音频,例如本地视频的音频、本地音乐、本地录音等,本申请对此不做限制。那么手机进行增益处理的有效音频信号S并非手机从外部电子设备接收,而是从手机本地的存储器中获取的待输出的音频信号。
示例性的,该音频还可以是手机从外部电子设备接收到的音频,例如手机在使用视频应用播放在线视频,或者在使用音乐应用播放在线音乐。那么手机可从视频应用或音乐应用的应用服务器接收到待输出的音频信号,将该待输出的音频信号作为本申请的有效音频信号S;然后,手机按照本申请的上述任意实施方式中的方法,来对该有效音频信号S进行增益处理,以输出增益处理后的有效音频信号S'。
场景4
示例性的,结合于图5a,请参照图8,在通话场景下,用户1使用手机与用户2进行通话。
示例性的,用户1的手机可通过扬声器,以免提通话的方式输出播放通话音频。示例性的,用户1的手机也可以通过受话器(简称“听筒”),来输出播放通话音频。
示例性的,如图8所示,通话服务网元1可以与手机不同,也可以是集成在手机中的软件模块或硬件芯片,本申请对此不做限制。
在图8中,执行本申请的音频处理方法的执行主体可以是通话服务网元1,也可以是手机,下面以手机为执行主体为例进行说明,在执行主体为通话服务网元1时,原理类似,这里不再赘述。
示例性的,以通话场景为例来说明图8所示的场景的方案。
示例性的,如图8所示,手机可包括位于底部边缘的麦克风302,扬声器301,受话器(简称“听筒”)303。可以理解的是,本申请对于手机的麦克风、扬声器、受话器的位置不做限制,它们可以布局在手机的任意位置。例如,图8所示的手机的底部边缘也可以设置扬声器。
手机的麦克风302可采集噪声信号N(例如场景3中手机采集的噪声信号N2)。
在一种可能的实施方式中,在本申请的方法的执行主体为通话服务网元1时,如图8的虚线箭头所示,手机可将采集的噪声信号N发送至通话服务网元1。在本示例中,执行主体为手机,则手机无需将该噪声信号N传输至通话服务网元1。
示例性的,结合图5a,手机可从通话服务网元1接收来自通话服务网元2的有效音频信号S(例如有效音频信号S1)。
手机可按照场景1中介绍的任意实施方式的方法,来基于有效音频信号S和噪声信号N,以确定对有效音频信号S的增益,从而对有效音频信号S进行增益调整,获取调整增益后的有效音频信号S’并输出,具体过程可参照场景1的相关实施例的介绍,这里不再赘述。
在一种可能的实施方式中,用户1使用手机来接听语音时,因人耳与手机的受话器303或者扬声器301之间的距离,导致用户1的人耳接收到的音频信号S_true(例如由音频信号转换后的声音信号),与手机从通话服务网元1接收到的有效音频信号S之间存在差异。
示例性的,如图8所示,手机通过扬声器301输出调整增益后的有效音频信号S’,而手机的扬声器301与用户1的人耳存在一定的空间距离,那么扬声器301输出的有效音频信号S’(例如由有效音频信号S’转换后的声音信号)经过声学传递路径L到达 人耳,人耳接收到的声音信号对应的有效音频信号为S_true’。
为此,手机可对经过增益处理后的有效音频信号S’进行补偿,以生成上述有效音频信号S_true’。
示例性的,手机可获取人耳与手机之间的距离L;然后,手机基于该距离L和格林函数G(r,r0,ω),对有效音频信号S’进行补偿处理,以得到S_true’。
例如,手机可按照公式10,来获取S_true’。
S_true'=G(r,r0,ω)·S',G(r,r0,ω)=e^{-jk‖r-r0‖}/(4π‖r-r0‖),公式10;
其中,G(r,r0,ω)为格林函数,r0为手机(例如图8所示的手机的扬声器301)的空间坐标点,r为用户人耳(例如图8所示的用户1的人耳)的空间坐标点,‖r-r0‖表示手机和用户1的人耳之间的距离L。有效音频信号S’表示经过本申请的方法增益处理后的有效音频信号。关于公式10中的其他参数可参考格林函数的相关已知技术的介绍,这里不再赘述。
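作为对公式10的示意,以下Python草图采用自由空间格林函数对增益处理后的有效音频信号S'逐频点补偿(格林函数的具体形式以本申请为准,此处为本文假设的一种常见形式;命名亦为本文假设):

```python
import numpy as np

def green_compensate(s_prime_spec, r, r0, freqs, c=343.0):
    """按公式10,用格林函数G(r,r0,ω)对S'逐频点补偿,得到S_true'。"""
    dist = np.linalg.norm(np.asarray(r) - np.asarray(r0))  # 距离L = ‖r - r0‖
    k = 2.0 * np.pi * np.asarray(freqs) / c                # 波数k = ω/c
    g = np.exp(-1j * k * dist) / (4.0 * np.pi * dist)      # 自由空间格林函数(本文假设)
    return np.asarray(s_prime_spec) * g                    # S_true' = G · S'
```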
在一种可能的实施方式中,手机也可以基于上述距离L和格林函数G(r,r0,ω),对从图8所示的通话服务网元1接收到的有效音频信号S进行补偿。示例性的,参照图4,手机可在执行S103时,按照上述距离L和格林函数G(r,r0,ω),对接收自通话服务网元1的有效音频信号进行补偿;然后,手机基于补偿后的有效音频信号,执行图4所示的S105及后续步骤。
需要说明的是,上述场景1至场景4中的相关实施方式可以相互结合,本申请对此不做限制。
此外,上述场景1至场景4对应的图5a至图5c,以及图7、图8中,均以通话场景为例,说明对电子设备待输出的通话音频进行增益处理的过程。在其他场景中,例如电子设备输出视频或音乐的场景下,电子设备可通过从应用服务器或者从本地存储器等方式来获取待输出的有效音频信号,并采用本申请的音频处理方法对该有效音频信号进行增益处理,以输出增益处理后的有效音频信号。
另外,上述场景2至场景4中未对本申请的音频处理方法的详细细节进行介绍,具体实现过程已在场景1中介绍,具体参照场景1即可,这里不再赘述。示例性的,图4所示的过程可适用于场景1至场景4以及未列举的其他应用场景。
需要说明的是,上述图5a至图5c,以及图6至图8中,各个附图之间相同的附图标记表示相同的对象,因此,未对各附图的附图标记做逐一解释说明,上述各附图中未提及的附图标记可参照图5a至图5c,以及图6至图8中已提及的同一附图标记的解释说明,这里不再赘述。
可以理解的是,电子设备为了实现上述功能,其包含了执行各个功能相应的硬件和/或软件模块。结合本文中所公开的实施例描述的各示例的算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。本领域技术人员可以结合实施例对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为 超出本申请的范围。
一个示例中,图9示出了本申请实施例的一种装置300的示意性框图。装置300可包括:处理器301和收发器/收发管脚302,可选地,还包括存储器303。
装置300的各个组件通过总线304耦合在一起,其中总线304除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图中将各种总线都称为总线304。
可选地,存储器303可以用于前述方法实施例中的指令。该处理器301可用于执行存储器303中的指令,并控制接收管脚接收信号,以及控制发送管脚发送信号。
装置300可以是上述方法实施例中的电子设备或电子设备的芯片。
其中,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
本实施例还提供一种计算机存储介质,该计算机存储介质中存储有计算机指令,当该计算机指令在电子设备上运行时,使得电子设备执行上述相关方法步骤实现上述实施例中的音频处理方法。
本实施例还提供了一种计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述相关步骤,以实现上述实施例中的音频处理方法。
另外,本申请的实施例还提供一种装置,这个装置具体可以是芯片,组件或模块,该装置可包括相连的处理器和存储器;其中,存储器用于存储计算机执行指令,当装置运行时,处理器可执行存储器存储的计算机执行指令,以使芯片执行上述各方法实施例中的音频处理方法。
其中,本实施例提供的电子设备、计算机存储介质、计算机程序产品或芯片均用于执行上文所提供的对应的方法,因此,其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果,此处不再赘述。
通过以上实施方式的描述,所属领域的技术人员可以了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是 各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
本申请各个实施例的任意内容,以及同一实施例的任意内容,均可以自由组合。对上述内容的任意组合均在本申请的范围之内。
集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
上面结合附图对本申请的实施例进行了描述,但是本申请并不局限于上述的具体实施方式,上述的具体实施方式仅仅是示意性的,而不是限制性的,本领域的普通技术人员在本申请的启示下,在不脱离本申请宗旨和权利要求所保护的范围情况下,还可做出很多形式,均属于本申请的保护之内。
结合本申请实施例公开内容所描述的方法或者算法的步骤可以硬件的方式来实现,也可以是由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于随机存取存储器(Random Access Memory,RAM)、闪存、只读存储器(Read Only Memory,ROM)、可擦除可编程只读存储器(Erasable Programmable ROM,EPROM)、电可擦可编程只读存储器(Electrically EPROM,EEPROM)、寄存器、硬盘、移动硬盘、只读光盘(CD-ROM)或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于网络设备中。当然,处理器和存储介质也可以作为分立组件存在于网络设备中。
本领域技术人员应该可以意识到,在上述一个或多个示例中,本申请实施例所描述的功能可以用硬件、软件、固件或它们的任意组合来实现。当使用软件实现时,可以将这些功能存储在计算机可读介质中或者作为计算机可读介质上的一个或多个指令或代码进行传输。计算机可读介质包括计算机存储介质和通信介质,其中通信介质包括便于从一个地方向另一个地方传送计算机程序的任何介质。存储介质可以是通用或专用计算机能够存取的任何可用介质。
上面结合附图对本申请的实施例进行了描述,但是本申请并不局限于上述的具体实施方式,上述的具体实施方式仅仅是示意性的,而不是限制性的,本领域的普通技术人员在本申请的启示下,在不脱离本申请宗旨和权利要求所保护的范围情况下,还可做出很多形式,均属于本申请的保护之内。

Claims (17)

  1. 一种音频处理方法,应用于电子设备,其特征在于,所述方法包括:
    获取环境音对应的目标噪声信号;
    获取待输出的第一音频信号;
    确定所述第一音频信号与所述目标噪声信号对应的目标信噪比;
    基于所述目标信噪比、所述第一音频信号以及所述目标噪声信号,确定与所述第一音频信号对应的目标增益信号;
    基于所述目标增益信号调整所述第一音频信号,获取第二音频信号;
    输出所述第二音频信号。
  2. 根据权利要求1所述的方法,其特征在于,所述确定所述第一音频信号与所述目标噪声信号对应的目标信噪比,包括:
    将所述第一音频信号划分为多个第一子带;
    将所述目标噪声信号划分为多个第二子带;
    其中,所述多个第一子带与所述多个第二子带对应的频段相同;
    确定所述多个第一子带与所述多个第二子带之间的多个第一信噪比;
    其中,每个所述第一信噪比为对应相同频段的所述第一子带与所述第二子带之间的信噪比。
  3. 根据权利要求2所述的方法,其特征在于,所述确定所述多个第一子带与所述多个第二子带之间的多个第一信噪比,包括:
    基于掩蔽曲线,确定所述多个第一子带与所述多个第二子带之间的多个第一信噪比。
  4. 根据权利要求3所述的方法,其特征在于,所述基于掩蔽曲线,确定所述多个第一子带与所述多个第二子带之间的多个第一信噪比,包括:
    确定所述多个第一子带与所述多个第二子带之间的多个第二信噪比;
    其中,每个所述第二信噪比为对应相同频段的所述第一子带与所述第二子带之间的信噪比;
    基于掩蔽曲线,确定与所述多个第一子带的频段分别对应的幅值阈值;
    基于所述多个第二信噪比和多个所述幅值阈值,确定所述多个第一子带和所述多个第二子带之间的多个第一信噪比。
  5. 根据权利要求2所述的方法,其特征在于,所述确定所述多个第一子带与所述多个第二子带之间的多个第一信噪比,包括:
    对于对应相同频段的所述第一子带和所述第二子带,确定对应同一时间帧的第三音频信号和第一噪声信号之间的第三信噪比;
    其中,所述第一子带包括所述第三音频信号,所述第二子带包括所述第一噪声信号;
    基于所述第三信噪比,确定对应相同频段的所述第一子带和所述第二子带之间的第一信噪比。
  6. 根据权利要求2至5中任意一项所述的方法,其特征在于,所述目标信噪比包括所述多个第一信噪比;
    所述基于所述目标信噪比、所述第一音频信号以及所述目标噪声信号,确定与所述第一音频信号对应的目标增益信号,包括:
    基于所述多个第一信噪比、所述多个第一子带以及所述多个第二子带,确定与每个所述第一子带对应的第一增益信号;
    其中,所述目标增益信号包括与所述多个第一子带对应的多个所述第一增益信号。
  7. 根据权利要求6所述的方法,其特征在于,所述基于所述目标增益信号调整所述第一音频信号,获取第二音频信号,包括:
    基于所述多个第一增益信号,调整对应的所述多个第一子带的增益,获取多个第三子带;
    其中,每个所述第三子带为调整增益后的每个所述第一子带;
    将所述多个第三子带合成为所述第二音频信号。
  8. 根据权利要求2至7中任意一项所述的方法,其特征在于,所述确定所述多个第一子带与所述多个第二子带之间的多个第一信噪比之后,所述基于所述目标信噪比、所述第一音频信号以及所述目标噪声信号,确定与所述第一音频信号对应的目标增益信号之前,所述方法还包括:
    基于所述多个第一信噪比,确定清晰度指数;
    基于所述清晰度指数,调整所述目标信噪比;
    其中,调整后的所述目标信噪比,用于确定所述目标增益信号。
  9. 根据权利要求1至8中任意一项所述的方法,其特征在于,所述确定所述第一音频信号与所述目标噪声信号对应的目标信噪比之后,所述基于所述目标信噪比、所述第一音频信号以及所述目标噪声信号,确定与所述第一音频信号对应的目标增益信号之前,所述方法还包括:
    基于所述目标噪声信号的分贝和预设噪声阈值,调整所述目标信噪比;
    其中,调整后的所述目标信噪比,用于确定所述目标增益信号。
  10. 根据权利要求2至9中任意一项所述的方法,其特征在于,所述目标信噪比包括第四信噪比;
    所述确定所述第一音频信号与所述目标噪声信号对应的目标信噪比,包括:
    对所述多个第一信噪比取第一平均值;
    将所述第一平均值作为所述第四信噪比。
  11. 根据权利要求1至9中任意一项所述的方法,其特征在于,所述目标信噪比包括第五信噪比;
    所述确定所述第一音频信号与所述目标噪声信号对应的目标信噪比,包括:
    基于时间帧,确定所述第一音频信号与所述目标噪声信号之间的信噪比的第二平均值;
    将所述第二平均值作为所述第五信噪比。
  12. 根据权利要求1至11中任意一项所述的方法,其特征在于,所述获取环境音对应的目标噪声信号,包括:
    获取环境音对应的第二噪声信号;
    基于声传递函数,对所述第二噪声信号进行处理,获取所述目标噪声信号。
  13. 根据权利要求1至12中任意一项所述的方法,其特征在于,所述方法还包括:
    基于人耳与所述电子设备之间的空间距离和格林函数,对所述第一音频信号或所述第二音频信号进行处理。
  14. 一种电子设备,其特征在于,包括:存储器和处理器,所述存储器和所述处理器耦合;所述存储器存储有程序指令,所述程序指令由所述处理器执行时,使得所述电子设备执行如权利要求1至13中任意一项所述的音频处理方法。
  15. 一种计算机可读存储介质,其特征在于,包括计算机程序,当所述计算机程序在电子设备上运行时,使得所述电子设备执行如权利要求1至13中任意一项所述的音频处理方法。
  16. 一种包含指令的计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1至13中任意一项所述的音频处理方法。
  17. 一种芯片,其特征在于,包括一个或多个接口电路和一个或多个处理器;所述接口电路用于从电子设备的存储器接收信号,并向所述处理器发送所述信号,所述信号包括存储器中存储的计算机指令;当所述处理器执行所述计算机指令时,使得所述电子设备执行权利要求1至13中任意一项所述的音频处理方法。


