WO2023220918A1 - Audio signal processing method and apparatus, storage medium, and vehicle - Google Patents
- Publication number
- WO2023220918A1 (international application PCT/CN2022/093274)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01H—MEASUREMENT OF MECHANICAL VIBRATIONS OR ULTRASONIC, SONIC OR INFRASONIC WAVES
- G01H17/00—Measuring mechanical vibrations or ultrasonic, sonic or infrasonic waves, not provided for in the preceding groups
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
Definitions
- The present application relates to the field of artificial intelligence technology, and in particular to an audio signal processing method and device, a storage medium, and a vehicle.
- An audio signal can be processed, for example by adjusting its volume, to reduce the noise energy that people perceive and thus the noise interference they experience.
- If the audio volume is adjusted manually, it distracts people's attention in scenarios such as driving, creating safety hazards and degrading the driving experience.
- In current solutions, non-acoustic measurement values (such as vehicle speed) are usually used to process audio signals to reduce noise interference.
- However, a large number of experiments are required to calibrate the relationship between non-acoustic measurement values and noise, and when the external environment changes, it is difficult to determine the noise accurately enough to adjust the played audio signal, resulting in a poor listening experience for the user.
- Embodiments of the present application provide an audio signal processing method.
- The method includes: acquiring a first audio signal collected by a sound sensor; processing the first audio signal to determine a first noise signal in the first audio signal; adjusting a second audio signal according to the first noise signal and the second audio signal to obtain a third audio signal, where the second audio signal is the original sound source of a playback device and the adjustment includes amplitude adjustment; and playing the third audio signal through the playback device.
- By fully utilizing the audio signal collected by the sound sensor, dependence on non-acoustic state information is avoided and the current noise level can be estimated accurately, so that the estimated noise is closer to the actual noise.
- The adjusted audio signal can then be played by the playback device.
- Using the noise signal to adjust the original sound source so that it adapts to the noisy environment gives the adjusted audio signal a better masking effect on noise and provides the user with a better listening experience.
- Processing the first audio signal to determine the first noise signal in the first audio signal includes: processing one or more of the human voice information, harmonic information, and burst sound information included in the first audio signal to determine the first noise signal in the first audio signal.
- By processing one or more of the human voice information, harmonic information, and burst sound information included in the audio signal collected by the sound sensor to determine the noise signal, the current noise level can be estimated accurately, so that the estimated noise is closer to the actual noise. The adjusted audio signal then masks noise better and the user's listening experience improves; the approach also applies to a variety of scenarios, is more flexible, and supports rapid deployment.
- Adjusting the second audio signal according to the first noise signal and the second audio signal to obtain a third audio signal includes: determining a second noise signal according to the first noise signal and transmission information, the second noise signal being the estimated noise signal perceived by the user; and adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal.
- In this way, a second noise signal closer to the noise actually perceived by the user can be obtained, so that the resulting third audio signal masks the noise better and improves the user's listening experience.
- The transmission information includes transmission information from the sound sensor to the user's ear and/or transmission information within the user's ear.
- This allows the transmission path of the noise to be simulated more realistically, so that the determined second noise signal is closer to the noise the user actually perceives.
- Adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal includes: determining a gain curve according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain curve to obtain the third audio signal.
- With the gain curve, the second audio signal can be adjusted to obtain a third audio signal that masks the noise, ensuring the user's auditory experience.
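- The text does not specify how the gain curve is derived. A minimal sketch, assuming spectra are given as per-band levels in dB and assuming a hypothetical rule that lifts each band of the second audio signal to a fixed margin above the second noise signal, capped at a maximum boost. `gain_curve`, `apply_gain_curve`, `margin_db`, and `max_gain_db` are illustrative names and values, not taken from the application:

```python
def gain_curve(noise_db, music_db, margin_db=3.0, max_gain_db=12.0):
    """Per-band gain in dB (hypothetical rule): raise each band of the
    original source to margin_db above the perceived noise, never cut,
    and never boost by more than max_gain_db."""
    if len(noise_db) != len(music_db):
        raise ValueError("noise and music spectra need the same number of bands")
    curve = []
    for noise, music in zip(noise_db, music_db):
        needed = noise + margin_db - music
        curve.append(min(max(needed, 0.0), max_gain_db))
    return curve


def apply_gain_curve(music_db, curve):
    """Adding a dB gain per band scales that band's amplitude."""
    return [music + gain for music, gain in zip(music_db, curve)]


if __name__ == "__main__":
    noise = [40.0, 50.0, 30.0]  # second noise signal per band (made-up values)
    music = [45.0, 45.0, 45.0]  # second audio signal per band (made-up values)
    curve = gain_curve(noise, music)
    print(curve)                           # [0.0, 8.0, 0.0]
    print(apply_gain_curve(music, curve))  # [45.0, 53.0, 45.0]
```

- Only the band where the noise approaches the source level is boosted; the cap keeps a very loud noise band from driving the playback level without bound.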
- Alternatively, adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal includes: determining a gain value according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain value to obtain the third audio signal.
- In this way, the resulting third audio signal has no audible sense of modulation, giving the user a better listening experience.
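- One plausible reading of the gain-value variant is a single broadband gain applied to every band, which preserves the spectral balance of the original source. Smoothing that gain across frames is an added assumption, used here to keep level changes gradual. A minimal sketch with hypothetical names (`broadband_gain`, `smooth_gain`) and illustrative parameter values:

```python
def broadband_gain(noise_db, music_db, margin_db=3.0, max_gain_db=12.0):
    """One gain (dB) for all bands: the largest per-band shortfall of the
    source relative to noise + margin, clipped to [0, max_gain_db]."""
    shortfall = max(n + margin_db - m for n, m in zip(noise_db, music_db))
    return min(max(shortfall, 0.0), max_gain_db)


def smooth_gain(previous_db, target_db, alpha=0.9):
    """First-order recursive smoothing across frames (an assumption, not
    from the text): the applied gain moves part of the way to the target."""
    return alpha * previous_db + (1.0 - alpha) * target_db


if __name__ == "__main__":
    target = broadband_gain([40.0, 50.0, 30.0], [45.0, 45.0, 45.0])
    gain = 0.0
    for _ in range(30):  # 30 frames under the same noise conditions
        gain = smooth_gain(gain, target)
    print(target, round(gain, 2))
```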
- Adjusting the second audio signal to obtain the third audio signal may also include: determining, according to the second audio signal and psychoacoustic information, a masking domain of the second audio signal with respect to noise, where the masking domain indicates, at each frequency, the volume threshold below which noise is masked by the second audio signal; and adjusting the second audio signal according to the second noise signal and the masking domain to obtain the third audio signal.
- In this way, the adjustment can be made in a more targeted manner, achieving a better noise masking effect and ensuring the user's auditory experience.
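- The application does not give its psychoacoustic model. A minimal sketch, assuming a crude triangular spreading function: each band of the second audio signal masks its own band at `offset_db` below its level and neighbouring bands with an extra `slope_db` per band of distance. Because a uniform boost raises this masking domain by the same number of dB, the smallest boost that keeps every noise band at or below the threshold is the largest excess of noise over threshold. The spreading shape, parameter values, and function names are illustrative assumptions:

```python
def masking_threshold(music_db, offset_db=10.0, slope_db=15.0):
    """Masking domain per band (dB): the strongest contribution from any
    source band, attenuated by offset_db plus slope_db per band of distance."""
    return [
        max(m - offset_db - slope_db * abs(k - j) for j, m in enumerate(music_db))
        for k in range(len(music_db))
    ]


def boost_to_mask(noise_db, music_db, max_gain_db=12.0, **spread):
    """Uniform boost (dB) so that noise no longer exceeds the masking domain."""
    threshold = masking_threshold(music_db, **spread)
    excess = max(n - t for n, t in zip(noise_db, threshold))
    return min(max(excess, 0.0), max_gain_db)


if __name__ == "__main__":
    music = [60.0, 40.0, 40.0]
    noise = [45.0, 38.0, 25.0]
    print(masking_threshold(music))     # [50.0, 35.0, 30.0]
    print(boost_to_mask(noise, music))  # 3.0
```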
- Processing one or more of the human voice information, harmonic information, and burst sound information included in the first audio signal to determine the first noise signal includes: processing one or more of the human voice information, harmonic information, burst sound information, and echo information included in the first audio signal to determine the first noise signal in the first audio signal.
- This makes the processing of the first audio signal more targeted and takes various scenarios into account, so that the noise signal can be separated from the first audio signal more accurately and the noise estimate is stable. After the second audio signal is adjusted, the noise is therefore masked better and the user experience improves.
- In a seventh possible implementation of the audio signal processing method, processing one or more of the human voice information, harmonic information, and burst sound information included in the first audio signal to determine the first noise signal includes: when it is determined that the first audio signal includes human voice information and/or harmonic information, determining the first noise signal to be the first noise signal of the previous frame.
- Because the previous frame's first noise signal is used directly as the determined noise signal, no processing is needed to remove other information, which reduces the workload of the adjustment process and saves cost.
- The first audio signal includes the collected first audio signal of the current N frames, the second audio signal includes the second audio signal of the current N frames to be adjusted, and the third audio signal includes the third audio signal of the current N frames, where N is a positive integer.
- In this way, the amount of computation in the adjustment process can be adjusted flexibly to actual conditions, which facilitates deployment in different scenarios.
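- A minimal sketch of processing in blocks of N frames, assuming per-band dB spectra for each frame and a hypothetical broadband rule (`block_gain`). One gain is computed per block from the block-averaged spectra, so the number of gain computations is roughly the frame count divided by N; this is one way computation can be traded against responsiveness. Averaging dB values directly is a simplification, and `adjust_in_blocks` is an illustrative name:

```python
def block_gain(noise_db, music_db, margin_db=3.0):
    """Hypothetical broadband rule: largest per-band shortfall, never negative."""
    return max(0.0, max(n + margin_db - m for n, m in zip(noise_db, music_db)))


def adjust_in_blocks(noise_frames, music_frames, n):
    """Adjust the source N frames at a time, with one gain per block."""
    if n < 1:
        raise ValueError("N must be a positive integer")
    adjusted = []
    for start in range(0, len(music_frames), n):
        noise_block = noise_frames[start:start + n]
        music_block = music_frames[start:start + n]
        # Per-band mean over the frames of this block.
        mean_noise = [sum(band) / len(noise_block) for band in zip(*noise_block)]
        mean_music = [sum(band) / len(music_block) for band in zip(*music_block)]
        gain = block_gain(mean_noise, mean_music)
        adjusted.extend([[m + gain for m in frame] for frame in music_block])
    return adjusted


if __name__ == "__main__":
    noise = [[40.0], [50.0], [60.0], [60.0]]  # one band, four frames
    music = [[45.0]] * 4
    print(adjust_in_blocks(noise, music, 2))  # [[48.0], [48.0], [63.0], [63.0]]
```

- With N = 4 the same input gets a single gain for all four frames, which reacts more slowly to the noise rise in the second half but costs one gain computation instead of two.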
- Embodiments of the present application further provide an audio signal processing device.
- The device includes: an acquisition module configured to acquire the first audio signal collected by the sound sensor; a first determination module configured to process one or more of the human voice information, harmonic information, and burst sound information included in the first audio signal to determine the first noise signal in the first audio signal; a second determination module configured to adjust the second audio signal according to the first noise signal and the second audio signal to obtain a third audio signal, where the second audio signal is the original sound source of the playback device and the adjustment includes amplitude adjustment; and a playback module configured to play the third audio signal through the playback device.
- The second determination module is configured to: determine a second noise signal based on the first noise signal and the transmission information, the second noise signal being an estimated noise signal perceived by the user; and adjust the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal.
- The transmission information includes transmission information from the sound sensor to the user's ear and/or transmission information within the user's ear.
- Adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal includes: determining a gain curve according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain curve to obtain the third audio signal.
- Alternatively, adjusting the second audio signal to obtain the third audio signal includes: determining a gain value according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain value to obtain the third audio signal.
- Adjusting the second audio signal to obtain the third audio signal may also include: determining, according to the second audio signal and psychoacoustic information, a masking domain of the second audio signal with respect to noise, where the masking domain indicates, at each frequency, the volume threshold below which noise is masked by the second audio signal; and adjusting the second audio signal according to the second noise signal and the masking domain to obtain the third audio signal.
- The first determination module is configured to process one or more of the human voice information, harmonic information, burst sound information, and echo information included in the first audio signal to determine the first noise signal in the first audio signal.
- In a seventh possible implementation of the audio signal processing device, the first determination module is configured to determine the first noise signal to be the first noise signal of the previous frame when it is determined that the first audio signal includes human voice information and/or harmonic information.
- The first audio signal includes the collected first audio signal of the current N frames, the second audio signal includes the second audio signal of the current N frames to be adjusted, and the third audio signal includes the third audio signal of the current N frames, where N is a positive integer.
- Embodiments of the present application provide an audio signal processing device, including a processor and a memory. The memory is configured to store a program, and the processor is configured to execute the program stored in the memory, so that the device implements the audio signal processing method of the first aspect or of one or more of the possible implementations of the first aspect.
- Embodiments of the present application provide a terminal device that can perform the audio signal processing method of the first aspect or of one or more of the possible implementations of the first aspect.
- Embodiments of the present application provide a computer-readable storage medium storing program instructions that, when executed by a computer, cause the computer to implement the audio signal processing method of the first aspect or of one or more of the possible implementations of the first aspect.
- Embodiments of the present application provide a computer program product including program instructions that, when executed by a computer, cause the computer to implement the audio signal processing method of the first aspect or of one or more of the possible implementations of the first aspect.
- Embodiments of the present application provide a vehicle.
- The vehicle includes a processor configured to execute the audio signal processing method of the first aspect or of one or more of the possible implementations of the first aspect.
- Figure 1(a) shows a schematic diagram of an application scenario according to an embodiment of the present application.
- Figure 1(b) shows a schematic diagram of an application scenario according to an embodiment of the present application.
- Figure 2 shows a schematic diagram of adjusting an audio signal according to an embodiment of the present application.
- Figure 3 shows a flow chart of an audio signal processing method according to an embodiment of the present application.
- Figure 4 shows a flowchart of processing a first audio signal according to an embodiment of the present application.
- Figure 5 shows a structural diagram of an audio signal processing device according to an embodiment of the present application.
- Figure 6 shows a structural diagram of an electronic device according to an embodiment of the present application.
- Figure 7 shows a structural diagram of an electronic device according to an embodiment of the present application.
- Figure 8 shows a structural diagram of an electronic device according to an embodiment of the present application.
- Figure 9 shows a structural diagram of an electronic device according to an embodiment of the present application.
- The word "exemplary" as used herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
- The present application provides an audio signal processing method.
- The method obtains the audio signal collected by the sound sensor and processes it to determine the noise signal in the audio signal. The current noise level can thus be estimated accurately from acoustic measurements, avoiding reliance on non-acoustic state information and bringing the estimated noise closer to the actual noise. The original sound source of the playback device is then adjusted using the noise signal and the original sound source, and the adjusted audio signal is played by the playback device.
- The noise signal is thereby used to adapt the original sound source to the noisy environment, so that the adjusted audio signal masks noise better.
- The user's listening experience is improved, and because the process relies on the audio signal collected by the sound sensor, it can be used in a variety of scenarios, is more flexible, and supports rapid deployment.
- Figure 1(a) and Figure 1(b) show schematic diagrams of an application scenario according to an embodiment of the present application.
- The audio signal processing method of the embodiments can be used for noise masking in a vehicle: by adjusting the audio signal being played, it reduces the noise perceived by the driver and passengers.
- The audio signal processing system of the embodiments can be installed in the vehicle and includes a sound sensor, a processor, and a playback device.
- The sound sensor (for example, a microphone; see Figure 1(a)) can be installed at any position in the vehicle, for example near the driver and passengers, to collect the audio signal in the vehicle (referred to as the first audio signal; see Figure 1(a) and Figure 1(b)), from which the environmental noise perceived by users in the vehicle is determined.
- The processor can be built into the vehicle head unit (or audio system) as an on-board computing unit, such as a system on chip (SoC) or a digital signal processor (DSP) chip.
- The processor can determine, from the audio signal collected by the sound sensor, the noise signal corresponding to the environmental noise perceived by users in the vehicle.
- The processor can further adjust the original sound source of the playback device (referred to as the second audio signal; see Figure 1(a) and Figure 1(b)) based on the determined noise signal and the original sound source, obtaining the adjusted audio signal (referred to as the third audio signal; see Figure 1(a) and Figure 1(b)).
- Alternatively, the processor can be located externally, in a cloud server.
- The server and the vehicle can communicate over a wireless connection, for example 2G/3G/4G/5G or other mobile communication technologies, Wi-Fi, Bluetooth, frequency modulation (FM), digital radio, satellite communication, or other wireless means.
- The server can receive the audio signals collected by the sound sensor, perform the computation, and send the results back to the corresponding vehicle.
- The playback device (see Figure 1(b)) can be installed in the vehicle, may include speakers and the like, and plays the audio signal adjusted by the processor.
- Figure 2 shows a schematic diagram of adjusting an audio signal according to an embodiment of the present application. As shown in Figure 2, consider music being played in a vehicle. If the noise outside the vehicle becomes louder (for example, when driving through a congested road section) while the playback device keeps playing unadjusted music, the noise perceived by the driver and passengers grows as well, which degrades their auditory experience.
- In the embodiments, the audio signal collected by the sound sensor is used to adjust the music being played, and the adjusted audio signal is played by the playback device.
- The music heard by the driver and passengers, having changed for example in volume (it becomes louder, as shown in Figure 2), can mask the noise they perceive. In other words, the adjusted music changes what the driver and passengers hear, reducing their perception of the noise in the vehicle.
- The driver and passengers effectively "do not hear" the noise, which improves their listening experience when music is played in the vehicle.
- The audio signal processing method of the embodiments can also be used in scenarios requiring noise masking other than the vehicle scenarios shown in Figure 1(a), Figure 1(b), and Figure 2, for example with electronic devices that have audio interaction functions and microphones, such as mobile phones and smart home devices. This application does not limit this.
- Figure 3 shows a flowchart of an audio signal processing method according to an embodiment of the present application. The method can be used in the audio signal processing system described above. As shown in Figure 3, the method may include:
- Step S301: Obtain the first audio signal collected by the sound sensor.
- For the sound sensor, see Figure 1(a).
- The first audio signal may consist of one frame or multiple frames.
- For example, the first audio signal may be N consecutive frames (N can be preset), or N frames taken at intervals (for example, N frames selected with a one-frame gap between them).
- The first audio signal contains the acoustic information in the environment collected by the sound sensor, which may include, for example, human voice information, harmonic information, burst sound information, echo information, and noise information.
- Human voice information can include the voices of the driver and passengers collected by the sound sensor; harmonic information can include long vowels in their speech and horn sounds collected by the sound sensor; burst sound information can include short-term bursts of sound collected by the sound sensor, such as doors opening and closing; echo information can include sound played by the playback device and collected by the sound sensor.
- The playback device may play, for example, music, navigation prompts, or other voice announcements; noise information can include ambient noise inside and outside the vehicle.
- By processing this information, the intensity of the environmental noise perceived by users in the vehicle can be estimated, so that the audio signal can be adjusted in a more targeted way to mask the noise better.
- Step S302: Process the first audio signal to determine the first noise signal in the first audio signal.
- The processing includes removing the corresponding information from the first audio signal.
- To determine the first noise signal, for example, the human voice information, harmonic information, burst sound information, and echo information included in the first audio signal may be removed.
- Figure 4 shows a flowchart of processing the first audio signal according to an embodiment of the present application. As shown in Figure 4, the processing may include:
- Step S401: Process the human voice information in the first audio signal.
- The human voice information may correspond to audio generated by the voices of the driver and passengers or of people outside the vehicle.
- Methods such as voice activity detection (VAD) can be used for this processing.
- The VAD method can be used to determine whether the first audio signal includes human voice information.
- If it does, the first audio signal of the preceding few frames (for example, the preceding 3 to 5 frames) can be used, with methods such as smooth interpolation, to remove the human voice information included in the current first audio signal. Other methods can also be used to process the human voice information, which is not limited by this application.
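- A minimal sketch of this step, assuming frames are lists of per-band power (linear scale). A toy energy-ratio detector stands in for a real VAD, and replacing a voiced frame with the per-band mean of the last `k` frames is a simple stand-in for smooth interpolation. `remove_voice`, `k`, and `ratio_threshold` are hypothetical:

```python
def remove_voice(current, history, k=3, ratio_threshold=4.0):
    """Return (frame, voiced). If the current frame's total power exceeds
    ratio_threshold times that of the recent reference, treat it as voiced
    and replace it with the per-band mean of the last k frames."""
    recent = history[-k:]
    if not recent:
        return list(current), False  # nothing to compare against yet
    reference = [sum(band) / len(recent) for band in zip(*recent)]
    voiced = sum(current) > ratio_threshold * sum(reference)
    if voiced:
        return reference, True
    return list(current), False


if __name__ == "__main__":
    history = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
    print(remove_voice([10.0, 12.0], history))  # speech-like jump: replaced
    print(remove_voice([1.2, 0.8], history))    # noise-like frame: kept
```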
- Step S402: Process the harmonic information in the first audio signal.
- Harmonic information can correspond to audio produced by long vowels in speech, horn sounds, and the like.
- Methods such as long vowel detection (LVD) can be used for this processing.
- The LVD method can include counting the energy peaks of the first audio signal in the frequency domain to determine whether the first audio signal includes harmonic information.
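- The LVD details are not given. A minimal sketch of the peak-counting idea, assuming a power spectrum as a list of per-bin values: a bin counts as an energy peak when it is a local maximum and exceeds `prominence` times the median bin power, and the frame is flagged as harmonic when at least `min_peaks` peaks are found. The thresholds and function names are hypothetical:

```python
def count_energy_peaks(power, prominence=4.0):
    """Count local maxima that stand well above the median bin power."""
    median = sorted(power)[len(power) // 2]
    peaks = 0
    for k in range(1, len(power) - 1):
        is_local_max = power[k] > power[k - 1] and power[k] >= power[k + 1]
        if is_local_max and power[k] > prominence * median:
            peaks += 1
    return peaks


def has_harmonics(power, min_peaks=3, prominence=4.0):
    """Flag the frame as harmonic (long vowel, horn) if enough peaks exist."""
    return count_energy_peaks(power, prominence) >= min_peaks


if __name__ == "__main__":
    flat = [1.0] * 64
    vowel_like = list(flat)
    for k in (8, 16, 24, 32):  # fundamental at bin 8 plus its harmonics
        vowel_like[k] = 20.0
    print(has_harmonics(flat), has_harmonics(vowel_like))  # False True
```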
- When the first audio signal includes the above human voice information and/or harmonic information, the first noise signal may be determined to be the first noise signal of the previous frame.
- That is, the first noise signal of the previous frame can be used directly as the first noise signal of the current frame.
- The first noise signal is thus obtained directly, without removing other information, which reduces the workload of the adjustment process and saves cost.
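- The hold rule can be sketched as a per-frame update, assuming the voice and harmonic flags come from detectors such as those used in steps S401 and S402. What happens when the very first frame is flagged is not stated; this sketch, with hypothetical names, simply accepts that frame:

```python
def update_noise(previous, frame, has_voice, has_harmonics):
    """Reuse the previous frame's first noise signal when voice and/or
    harmonic information is present; otherwise take the (cleaned) frame."""
    if (has_voice or has_harmonics) and previous is not None:
        return previous
    return frame


def track_noise(frames, flags):
    """Run update_noise over frames with (has_voice, has_harmonics) flags."""
    noise = None
    estimates = []
    for frame, (voice, harmonic) in zip(frames, flags):
        noise = update_noise(noise, frame, voice, harmonic)
        estimates.append(noise)
    return estimates


if __name__ == "__main__":
    frames = [[1.0], [9.0], [7.0], [2.0]]
    flags = [(False, False), (True, False), (False, True), (False, False)]
    print(track_noise(frames, flags))  # [[1.0], [1.0], [1.0], [2.0]]
```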
- Step S403: Process the burst sound information in the first audio signal.
- The burst sound information may correspond to audio generated by short-term sounds, such as a vehicle door opening or closing.
- Methods such as minimum statistics (MS) can be used for this processing.
- The MS method can be used to estimate the burst sound information included in the first audio signal so that the estimated burst sound information can be removed; alternatively, methods other than MS can be used to estimate and remove the burst sound information included in the first audio signal, which is not limited by this application.
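- A heavily simplified sketch of minimum-statistics tracking, assuming per-band power frames: each band is smoothed recursively over time, and the noise estimate is the minimum of the last `window` smoothed frames. A short burst such as a door slam therefore leaves the estimate unchanged as long as the window still contains a pre-burst frame. Martin's full method also uses adaptive smoothing and bias compensation, both omitted here; `alpha`, `window`, and the function name are illustrative:

```python
from collections import deque


def minimum_statistics(frames, alpha=0.8, window=8):
    """Per-band noise floor: minimum of recursively smoothed power over
    the last `window` frames."""
    smoothed = None
    history = deque(maxlen=window)  # keeps only the last `window` frames
    estimates = []
    for frame in frames:
        if smoothed is None:
            smoothed = list(frame)
        else:
            smoothed = [alpha * s + (1.0 - alpha) * p for s, p in zip(smoothed, frame)]
        history.append(smoothed)
        estimates.append([min(band) for band in zip(*history)])
    return estimates


if __name__ == "__main__":
    # Steady noise at power 1.0 with a two-frame door slam at power 100.0.
    frames = [[1.0]] * 20 + [[100.0]] * 2 + [[1.0]] * 5
    print([round(e[0], 3) for e in minimum_statistics(frames)])
```

- A sustained noise increase, by contrast, is followed once the smoothed power has stayed high for a whole window, which is what distinguishes a real change in ambient noise from a burst.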
- The burst sound information can be processed after the human voice information and harmonic information, so that residual human voice or harmonic information is also removed when the burst sound information is processed.
- After one or more of the above types of information included in the first audio signal are removed, the first noise signal can be determined.
- In this way, the current noise level can be estimated more accurately, making the estimated noise closer to the actual noise.
- The subsequently adjusted audio signal therefore masks noise better and gives users a better listening experience; the approach can be used in a variety of scenarios, is more flexible, and supports rapid deployment.
- The processing of the first audio signal may further include:
- Step S404: Process the echo information in the first audio signal.
- The echo information may correspond to audio generated by the audio played by the playback device.
- First, the frequency domain adaptive filter (FDAF) method can be used to remove the echo information from the first audio signal.
- FDAF is a linear suppression method; instead of FDAF, any other linear echo cancellation (LEC) method can also be used.
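- As an illustration of a linear echo canceller other than FDAF, the following sketch uses a time-domain normalized least-mean-squares (NLMS) adaptive filter, chosen here only for brevity. It learns the echo path from the playback (far-end) signal to the microphone and subtracts the predicted echo. Tap count, step size, the simulated echo path, and the function name are assumptions, not values from the application:

```python
import random


def nlms_echo_cancel(far_end, mic, taps=16, mu=0.5, eps=1e-6):
    """Return the echo-cancelled microphone signal, sample by sample."""
    weights = [0.0] * taps
    buffer = [0.0] * taps  # most recent far-end sample first
    output = []
    for x, d in zip(far_end, mic):
        buffer = [x] + buffer[:-1]
        echo_estimate = sum(w * b for w, b in zip(weights, buffer))
        error = d - echo_estimate  # what remains after removing predicted echo
        step = mu * error / (eps + sum(b * b for b in buffer))
        weights = [w + step * b for w, b in zip(weights, buffer)]
        output.append(error)
    return output


if __name__ == "__main__":
    random.seed(0)
    far = [random.gauss(0.0, 1.0) for _ in range(4000)]
    path = [0.6, -0.3, 0.1]  # hypothetical loudspeaker-to-microphone echo path
    mic = [sum(h * far[n - i] for i, h in enumerate(path) if n - i >= 0)
           for n in range(len(far))]
    residual = nlms_echo_cancel(far, mic)
    ratio = sum(r * r for r in residual[-500:]) / sum(m * m for m in mic[-500:])
    print(f"residual/echo energy over the last 500 samples: {ratio:.2e}")
```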
- Linear suppression alone may leave residual echo, in which case the estimated noise value is not accurate enough; this leads to misadjustment of the second audio signal later on and causes a chain reaction. Therefore, on top of linear suppression, the residual echo suppression (RES) method can also be used to remove the residual echo information.
- In doing so, the first audio signal of several frames preceding the current frame (for example, 3 to 5 frames) can be used.
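- A minimal sketch of a spectral residual echo suppressor, assuming per-band powers of the linear canceller's output and of its echo estimate: a fixed fraction (`leak`) of the echo estimate is treated as residual echo, and each band is attenuated by a Wiener-like gain with a lower `floor`. The use of preceding frames is omitted. The leakage model, parameter values, and names are assumptions. Note how an aggressive gain drives a band down to the floor, which is the kind of spectral hole discussed next:

```python
def residual_echo_suppression(error_power, echo_power, leak=0.3, floor=0.05):
    """Attenuate each band by max(1 - residual/error, floor), where the
    residual echo is modelled as leak * linear echo estimate."""
    output = []
    for e, y in zip(error_power, echo_power):
        gain = 1.0 - leak * y / e if e > 0.0 else floor
        output.append(max(gain, floor) * e)
    return output


if __name__ == "__main__":
    # Band 0: no echo estimate; band 1: strong echo estimate, cut to the floor.
    print(residual_echo_suppression([1.0, 1.0], [0.0, 10.0]))  # [1.0, 0.05]
```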
- Removing echo information with FDAF and RES may produce spectral holes, that is, some frequency bins of the first audio signal may be over-cancelled in the frequency domain.
- Because the spectrum of a noise signal is usually relatively smooth, the frequency domain smoothing (FS) method can also be used to fill in the spectral holes.
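- A minimal sketch of filling spectral holes, assuming per-bin power values: a bin far below the mean of its neighbours (less than `hole_ratio` times it) is treated as over-cancelled and replaced by that mean, relying on the smoothness of noise spectra. This local-mean rule, the neighbourhood `width`, and the function name are illustrative assumptions rather than the application's FS method:

```python
def fill_spectral_holes(power, width=2, hole_ratio=0.1):
    """Replace over-cancelled bins with the mean of up to `width` neighbours
    on each side; other bins are left untouched."""
    filled = list(power)
    for k, value in enumerate(power):
        lo, hi = max(0, k - width), min(len(power), k + width + 1)
        neighbours = [power[j] for j in range(lo, hi) if j != k]
        if not neighbours:
            continue
        local_mean = sum(neighbours) / len(neighbours)
        if value < hole_ratio * local_mean:
            filled[k] = local_mean
    return filled


if __name__ == "__main__":
    print(fill_spectral_holes([1.0, 1.2, 0.01, 0.9, 1.1]))  # bin 2 is refilled
```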
- In this way, the first audio signal can be processed in a more targeted manner that takes various scenarios into account, so that the noise signal is separated from the first audio signal more accurately and the noise estimate is stable; after the second audio signal is adjusted, the noise is masked better and the user experience improves.
- This application does not limit the order in which the echo information and the human voice, harmonic, and burst sound information are processed; that is, the execution order of steps S401 to S404 is not limited.
- For example, the echo information may be processed first, and then one or more of the human voice information, harmonic information, and burst sound information.
- The resulting signal can be regarded as the first noise signal, that is, the estimate of the environmental noise perceived by users in the vehicle.
- The original sound source of the playback device can then be adjusted according to the first noise signal to determine the adjusted audio signal for playback, so that it masks the environmental noise perceived by users in the vehicle and achieves noise control.
- For the detailed process, see step S303 below.
- Step S303: Adjust the second audio signal according to the first noise signal and the second audio signal to obtain a third audio signal.
- the second audio signal is the original sound source of the playback device, and the adjustment may include amplitude adjustment.
- for the playback device, see Figure 1(b).
- the original sound source can be music, navigation sound, voice call sound, etc. This application does not limit this.
- the third audio signal obtained by adjusting the amplitude of the second audio signal can mask environmental noise, that is, it can reduce the user's perception of environmental noise, thereby achieving a noise masking effect.
- the first noise signal can be further processed so that it is closer to the noise level actually perceived by the user, as described below.
- This step S303 may include:
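- a second noise signal is determined according to the first noise signal and transfer information, where the second noise signal is the estimated user-perceived noise signal.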
- the first noise signal can be weighted according to the transfer information to determine the second noise signal, or the second noise signal can be determined using the transfer information according to other methods, which is not limited in this application.
- the transfer information may include transfer information from the sound sensor to the user's ear, and/or transfer information within the ear.
- the transfer information from the sound sensor to the user's ear can represent the transmission path of noise from the sound sensor to the user's ear, and can be determined, for example, from the relative position of the sound sensor and the user's ear (for example, the ear location area).
- the transfer information within the ear can represent the transmission path of noise in the user's ear canal (for example, from the outer ear to the middle ear).
- the transfer information can be determined, for example, by outer-ear and middle-ear attenuation functions, or approximated by A-weighting, thereby reducing the amount of calculation and improving performance.
- the second noise signal can also be determined through other methods, which is not limited by this application.
- the transmission path of the noise can be simulated more realistically, so that the determined second noise signal is closer to the noise perceived by the actual user.
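As one illustration of using A-weighting as the in-ear transfer information, the standard IEC 61672 A-weighting curve can be added (in dB) to the first noise spectrum to approximate the second noise signal. This is a sketch under assumptions: spectra are per-bin levels in dB, the optional sensor-to-ear path gain `path_gain_db` is a hypothetical per-bin input (for example, measured for the ear location area), and the function names are illustrative.

```python
import math


def a_weighting_db(f):
    """IEC 61672 A-weighting in dB at frequency f (Hz); about 0 dB at 1 kHz."""
    if f <= 0:
        return float("-inf")
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0


def second_noise_db(freqs, first_noise_db, path_gain_db=None):
    """Estimate user-perceived noise: first noise + sensor-to-ear path gain + A-weighting (all dB)."""
    if path_gain_db is None:
        path_gain_db = [0.0] * len(freqs)  # assume no sensor-to-ear correction
    return [n + p + a_weighting_db(f) for f, n, p in zip(freqs, first_noise_db, path_gain_db)]
```

Working in dB turns the weighting into a per-bin addition, which is the "reduced amount of calculation" advantage the text attributes to A-weighting compared with full outer- and middle-ear models.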
- the second audio signal can be adjusted according to the second noise signal and the second audio signal to obtain a third audio signal.
- a second noise signal that is closer to the actual noise perceived by the user can be obtained, so that the obtained third audio signal can better mask the noise and improve the user's listening experience.
- a gain may be determined according to the second noise signal and the second audio signal, and the second audio signal multiplied by this gain to obtain the third audio signal.
- the gain may be a gain curve or a gain value, as described below.
- adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal may include:
- the gain curve may be determined according to the second noise signal and the masking domain of the second audio signal with respect to the noise, where the masking domain indicates the volume threshold masked by the second audio signal at each frequency; see below for how the masking domain is obtained.
- for example, the difference between the amplitude of the second noise signal at each frequency in the frequency domain and the volume threshold at the corresponding frequency in the masking domain can be computed to determine the gain curve.
- the gain curve can represent the amplitude gain corresponding to each frequency in the frequency domain.
- the second audio signal can be adjusted according to the gain curve to obtain a third audio signal.
- the amplitude of the third audio signal at each frequency in the frequency domain can be determined by multiplying the value of the gain curve at that frequency by the amplitude of the second audio signal at the same frequency, thereby determining the third audio signal.
- the second audio signal can be adjusted to obtain the third audio signal, thereby achieving the effect of the third audio signal masking noise and ensuring the user's auditory experience.
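The gain-curve computation above can be sketched with per-bin levels in dB. The sign convention is an assumption, since the text only says the two quantities are subtracted: here the gain at each frequency is how far the estimated noise exceeds the masking threshold, floored at 0 dB (no attenuation) and capped by a hypothetical `max_gain_db` to avoid excessive boosts. The result is converted to a linear amplitude gain and multiplied bin by bin with the second audio signal's spectrum.

```python
def gain_curve(noise_db, threshold_db, max_gain_db=12.0):
    """Linear amplitude gain per bin, from the noise level above the masking threshold (dB)."""
    curve = []
    for n, t in zip(noise_db, threshold_db):
        g_db = min(max(n - t, 0.0), max_gain_db)  # boost only where noise exceeds the threshold
        curve.append(10.0 ** (g_db / 20.0))
    return curve


def apply_gain_curve(spectrum, curve):
    """Third audio spectrum: second audio spectrum scaled bin by bin by the gain curve."""
    return [g * s for g, s in zip(curve, spectrum)]
```

For example, noise at 40 dB against a 30 dB threshold yields a 10 dB boost (a linear factor of about 3.16) in that bin, while bins where the noise is already below the threshold keep unity gain.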
- the overall gain value can be used instead of the gain curve to avoid some singular values in the gain curve.
- Adjusting the second audio signal according to the second noise signal and the second audio signal to obtain a third audio signal may include:
- the gain curve may be determined according to the masking domain of the noise by the second noise signal and the second audio signal, and the gain value may be determined according to the gain curve.
- the gain value may be, for example, the root mean square value of all or part of the values on the gain curve, or a weighted average, etc., which is not limited in this application.
- the gain value may be one value, and the gain value may be determined based on the values corresponding to all frequencies on the gain curve, or may be determined based on the values corresponding to some frequencies (for example, 20 frequencies) on the gain curve.
- the gain value may also be multiple values (for example, 2-5 values), and the multiple gain values may be determined respectively based on the high frequency, intermediate frequency, and low frequency parts of the gain curve, for example.
- the second audio signal can be adjusted according to the gain value to obtain a third audio signal.
- the gain value can be multiplied by the amplitude of each frequency of the second audio signal in the frequency domain to determine the amplitude of the third audio signal corresponding to each frequency in the frequency domain, thereby determining the third audio signal.
- alternatively, the gain value for the high frequencies can be multiplied by the amplitude of the high-frequency part of the second audio signal in the frequency domain, the gain value for the intermediate frequencies by the amplitude of the intermediate-frequency part, and the gain value for the low frequencies by the amplitude of the low-frequency part, to determine the third audio signal.
- by using a gain value instead of a gain curve, the resulting third audio signal has no audible sense of modulation, so that the user's listening experience is better.
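The gain-value variants can be sketched as follows, assuming a linear gain curve sampled at known bin frequencies. The overall gain is taken as the root mean square (RMS) of all or selected bins, one of the options the text lists; the three-band split uses hypothetical band edges of 500 Hz and 4 kHz, which the application does not specify.

```python
import math


def rms(values):
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))


def overall_gain(curve, bins=None):
    """Single gain value: RMS over all bins, or only over the selected bin indices."""
    return rms(curve if bins is None else [curve[k] for k in bins])


def band_index(f, edges=(500.0, 4000.0)):
    """0 = low, 1 = intermediate, 2 = high frequency band (edges are assumptions)."""
    return 0 if f < edges[0] else (1 if f < edges[1] else 2)


def band_gains(freqs, curve, edges=(500.0, 4000.0)):
    """One RMS gain per band; a band with no bins falls back to unity gain."""
    bands = [[], [], []]
    for f, g in zip(freqs, curve):
        bands[band_index(f, edges)].append(g)
    return [rms(b) if b else 1.0 for b in bands]


def apply_band_gains(freqs, spectrum, gains, edges=(500.0, 4000.0)):
    """Multiply each bin of the second audio spectrum by the gain of its band."""
    return [gains[band_index(f, edges)] * s for f, s in zip(freqs, spectrum)]
```

Collapsing the curve to one or a few values removes bin-to-bin gain fluctuations, which is why a gain value avoids the singular values and modulation effects mentioned above.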
- the masking domain of the second audio signal with respect to the noise, which serves as the basis for calculating the gain value or gain curve above, can be obtained as follows:
- Adjusting the second audio signal according to the second noise signal and the second audio signal to obtain a third audio signal may include:
- the masking domain of the second audio signal with respect to the noise may be determined according to the second audio signal and psychoacoustic information. The masking domain may indicate the volume threshold masked by the second audio signal at each frequency: noise whose volume at a given frequency is below the volume threshold is masked by the second audio signal. For example, if the masking threshold of the second audio signal at 400 Hz is 30 dB SPL, noise below 30 dB SPL at that frequency may not be perceived by the user, achieving the effect of masking the noise.
- the above-mentioned psychoacoustic information may include, for example, the user's threshold of hearing, loudness, pitch, sound masking, and other information.
- the above-mentioned psychoacoustic information may be obtained, for example, based on a psychoacoustic model such as the perceptual evaluation of audio quality (PEAQ) model, the Johnston model, or the Terhardt model; this application does not limit this.
- through the psychoacoustic information, the volume threshold below which noise is masked by the second audio signal can be determined at different frequencies.
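A strongly simplified masking-domain computation can be sketched as below. It is not PEAQ or the full Johnston model: it converts frequency to Bark with Zwicker's formula, spreads each bin's level with the Schroeder spreading function, takes the strongest spread contribution (rather than summing energies), subtracts a fixed offset `offset_db` (a hypothetical value standing in for Johnston's tonality-dependent offset), and never lets the threshold drop below Terhardt's absolute threshold of hearing. Levels are assumed to be in dB SPL per bin.

```python
import math


def bark(f):
    """Zwicker's critical-band rate (Bark) for frequency f in Hz."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def absolute_threshold_db(f):
    """Terhardt's approximation of the threshold in quiet, dB SPL (f clamped to >= 20 Hz)."""
    khz = max(f, 20.0) / 1000.0
    return 3.64 * khz ** -0.8 - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2) + 1e-3 * khz ** 4


def spreading_db(dz):
    """Schroeder spreading function; dz = Bark(maskee) - Bark(masker)."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1.0 + (dz + 0.474) ** 2)


def masking_domain(freqs, level_db, offset_db=10.0):
    """Per-bin volume threshold (dB SPL) below which noise is assumed masked by the signal."""
    z = [bark(f) for f in freqs]
    thresholds = []
    for k, fk in enumerate(freqs):
        spread = max(level_db[j] + spreading_db(z[k] - z[j]) for j in range(len(freqs)))
        thresholds.append(max(spread - offset_db, absolute_threshold_db(fk)))
    return thresholds
```

For a 70 dB tone at 1 kHz, the threshold at 1 kHz comes out at about 60 dB SPL and falls off faster toward lower frequencies than toward higher ones, reflecting the asymmetric spreading function.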
- the second audio signal can be adjusted according to the second noise signal and the masking domain to obtain a third audio signal.
- the gain value or gain curve can be obtained by using the second noise signal and the masking domain, so that the second audio signal can be adjusted to obtain the third audio signal.
- by using psychoacoustic information to determine the masking domain, the adjustment that produces the third audio signal can be made more targeted, achieving a better noise masking effect and ensuring the user's auditory experience.
- the third audio signal determined in step S303 can also be modified in the frequency domain (for example, by equal-loudness compensation), for example by using the loudness information, hearing threshold information, and so on in the above-mentioned psychoacoustic information.
- the loudness information may include equal-loudness curves, that is, the relationship between sound pressure level and frequency for pure tones that the human ear perceives as equally loud across the audible frequency range. These curves may be used to compensate and modify the third audio signal at different frequencies, adapting it to the sensitivity of the human ear at different loudness levels.
- Step S304: Play the third audio signal through the playback device.
- by determining the noise signal from the audio signal collected by the sound sensor, that audio signal is fully utilized and dependence on non-acoustic state information is avoided; the current noise level can be estimated accurately, making the estimated noise closer to the actual noise.
- by adjusting the original sound source according to the noise signal and the original sound source, and playing the adjusted audio signal through the playback device, the noise signal is used to adapt the original sound source to the noise environment, so that the adjusted audio signal masks noise better and provides a better listening experience for the user.
- the first audio signal may include the collected first audio signal of the current N frames
- the second audio signal may include the current N frames of second audio signals to be adjusted
- the third audio signal may include the third audio signal of the current N frames, where N is a positive integer.
- the size of N may be preset, and the N frames may be spaced apart (for example, frames taken at an interval of 1 frame) or may be N consecutive frames, which is not limited by this application.
- N can be 1 so that frame-by-frame processing can be performed to dynamically determine the third audio signal.
- the amount of calculation in the adjustment process can be flexibly adjusted according to the actual situation to facilitate deployment in different scenarios.
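The trade-off between N and computation can be illustrated with a block loop that re-estimates the gain once per group of N frames and applies it to those frames; N = 1 gives frame-by-frame updates. This is a sketch: frames are plain lists of samples or spectral values, only the consecutive-frame case is shown, and `estimate_gain` is a hypothetical callable standing in for the noise estimation and gain computation described above.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]


def process_in_blocks(
    mic_frames: Sequence[Frame],
    src_frames: Sequence[Frame],
    n: int,
    estimate_gain: Callable[[Sequence[Frame], Sequence[Frame]], float],
) -> Tuple[List[Frame], int]:
    """Return the adjusted (third) frames and the number of gain updates performed."""
    if n < 1:
        raise ValueError("N must be a positive integer")
    out: List[Frame] = []
    updates = 0
    for i in range(0, len(src_frames), n):
        gain = estimate_gain(mic_frames[i:i + n], src_frames[i:i + n])  # one estimate per N frames
        updates += 1
        out.extend([gain * x for x in frame] for frame in src_frames[i:i + n])
    return out, updates
```

With 10 frames, N = 1 performs 10 gain estimates while N = 4 performs 3, which is the knob for fitting the processing load to a given deployment.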
- Figure 5 shows a structural diagram of an audio signal processing device according to an embodiment of the present application. As shown in Figure 5, the device includes:
- the acquisition module 501 is used to acquire the first audio signal collected by the sound sensor;
- the first determination module 502 is used to process one or more of the vocal information, harmonic information, and burst sound information included in the first audio signal, and determine the first noise signal in the first audio signal;
- the second determination module 503 is used to adjust the second audio signal according to the first noise signal and the second audio signal to obtain a third audio signal.
- the second audio signal is the original sound source of the playback device.
- the adjustment includes amplitude adjustment;
- the playback module 504 is used to play the third audio signal through the playback device.
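The module structure of Figure 5 can be mirrored in code by wiring four callables together, one per module. This is only an architectural sketch: the class and field names are hypothetical, signals are plain lists of floats, and each callable would be backed by the processing described above (noise estimation, gain computation, playback driver).

```python
from dataclasses import dataclass
from typing import Callable, List

Signal = List[float]


@dataclass
class AudioSignalProcessingDevice:
    acquire: Callable[[], Signal]               # acquisition module 501
    estimate_noise: Callable[[Signal], Signal]  # first determination module 502
    adjust: Callable[[Signal, Signal], Signal]  # second determination module 503
    play: Callable[[Signal], None]              # playback module 504

    def step(self, source: Signal) -> Signal:
        """Run one pass: acquire, estimate noise, adjust the original source, play."""
        first_audio = self.acquire()
        first_noise = self.estimate_noise(first_audio)
        third_audio = self.adjust(first_noise, source)
        self.play(third_audio)
        return third_audio
```

Injecting the modules as callables keeps each stage replaceable, for example swapping the noise estimator without touching playback.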
- by determining the noise signal from the audio signal collected by the sound sensor, the current noise level can be estimated accurately, making the estimated noise closer to the actual noise.
- by adjusting the original sound source according to the noise signal and playing the adjusted audio signal through the playback device, the noise signal is used to adapt the original sound source to the noisy environment, so that the adjusted audio signal masks noise better and the user's hearing experience improves.
- no non-acoustic measurement values are used in the above process: the audio signals collected by the sound sensor are fully utilized, avoiding dependence on non-acoustic state information, so the device can be used in a variety of scenarios, is more flexible, and supports rapid deployment.
- the first determination module 502 may be configured to determine that the first noise signal is the first noise signal of the previous frame when it is determined that the first audio signal includes vocal information and/or harmonic information.
- in this case the first noise signal is obtained directly by reusing the first noise signal of the previous frame, without performing the processing that removes other information, which saves workload and cost in the adjustment process.
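The hold rule can be sketched as a per-bin noise tracker that updates recursively on noise-only frames and simply reuses the previous frame's estimate when a frame is flagged as containing vocal or harmonic information. The exponential smoothing factor `alpha` and the class name are assumptions, and the detection flags are taken as inputs because the voice and harmonic detectors themselves are not specified here.

```python
from typing import List, Optional


class NoiseTracker:
    """Per-bin noise power estimate that is frozen on frames with vocal/harmonic content."""

    def __init__(self, alpha: float = 0.9) -> None:
        self.alpha = alpha
        self.noise: Optional[List[float]] = None

    def update(self, power: List[float], has_voice: bool = False, has_harmonics: bool = False) -> List[float]:
        if self.noise is None:
            self.noise = list(power)  # initialise from the first frame
        elif not (has_voice or has_harmonics):
            a = self.alpha
            self.noise = [a * n + (1.0 - a) * p for n, p in zip(self.noise, power)]
        # Otherwise keep the previous frame's first noise signal unchanged.
        return list(self.noise)
```

Freezing the estimate skips the removal processing on those frames entirely, which is where the workload saving described above comes from.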
- the first determination module 502 may be used to: process one or more of the vocal information, harmonic information, burst sound information, and echo information included in the first audio signal, and determine the first noise signal in the first audio signal.
- the processing of the first audio signal can thus be made more targeted while taking various scenarios into account, so that the noise signal is separated from the first audio signal more accurately and the noise estimate is stable; the noise signal is then masked better after the second audio signal is adjusted, resulting in a better user experience.
- the second determination module 503 may be configured to: determine a second noise signal based on the first noise signal and the transfer information, where the second noise signal is an estimated user-perceived noise signal; and adjust the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal.
- a second noise signal that is closer to the actual noise perceived by the user can be obtained, so that the obtained third audio signal can better mask the noise and improve the user's listening experience.
- the transfer information may include transfer information from the sound sensor to the user's ear, and/or transfer information within the ear.
- the transmission path of the noise can be simulated more realistically, so that the determined second noise signal is closer to the noise perceived by the actual user.
- adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal may include: determining a gain curve according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain curve to obtain the third audio signal.
- the second audio signal can be adjusted to obtain the third audio signal, thereby achieving the effect of the third audio signal masking noise and ensuring the user's auditory experience.
- adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal may include: determining a gain value according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain value to obtain the third audio signal.
- the resulting third audio signal can be made to have no sense of modulation, so that the user's listening experience is better.
- adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal may include: determining, according to the second audio signal and psychoacoustic information, a masking domain of the second audio signal with respect to the noise, the masking domain indicating the volume threshold masked by the second audio signal at each frequency, where noise whose volume at a frequency is below the volume threshold is masked by the second audio signal; and adjusting the second audio signal according to the second noise signal and the masking domain to obtain a third audio signal.
- by using psychoacoustic information to determine the masking domain, the adjustment that produces the third audio signal can be made more targeted, achieving a better noise masking effect and ensuring the user's auditory experience.
- the first audio signal may include the collected first audio signal of the current N frames
- the second audio signal may include the second audio signal of the current N frames to be adjusted
- the third audio signal may include the third audio signal of the current N frames, where N is a positive integer.
- the amount of calculation in the adjustment process can be flexibly adjusted according to actual conditions, which facilitates deployment in different scenarios.
- FIG. 6 shows a structural diagram of an electronic device according to an embodiment of the present application.
- the electronic device can be a terminal, such as a car or a car machine, or a chip built into the terminal, and can implement each step of the audio signal processing method shown in Figures 3-4, or implement the functions of the modules of the audio signal processing device shown in Figure 5.
- the electronic device 600 includes a processor 601 and an interface circuit 602 coupled with the processor. It should be understood that although only one processor and one interface circuit are shown in FIG. 6, the electronic device 600 may include other numbers of processors and interface circuits.
- the interface circuit 602 is used to communicate with other components of the terminal, such as memory or other processors.
- the processor 601 is used for signal interaction with other components through the interface circuit 602.
- Interface circuit 602 may be an input/output interface of processor 601.
- the processor 601 may be a processor in a vehicle-mounted device such as a vehicle machine, or may be a processing device sold separately.
- the processor 601 reads computer programs or instructions in a memory coupled thereto through the interface circuit 602, and decodes and executes these computer programs or instructions.
- the electronic device 600 can be enabled to implement the solution in the audio signal processing method provided by the embodiment of the present application.
- these programs or instructions are stored in a memory external to the electronic device 600.
- when the above programs or instructions are decoded and executed by the processor 601, part or all of their content is temporarily stored in the memory.
- these programs or instructions are stored in the internal memory of the electronic device 600.
- the electronic device 600 may be set in the terminal of the embodiment of the present application.
- part of the content of these programs or instructions is stored in a memory outside the electronic device 600, and other parts of the content of these programs or instructions are stored in a memory inside the electronic device 600.
- FIG. 7 shows a structural diagram of an electronic device according to an embodiment of the present application.
- the electronic device may be a terminal, such as a car or a car machine, or a chip built into the terminal, and can implement each step of the audio signal processing method shown in Figures 3-4, or implement the functions of the audio signal processing device shown in Figure 5.
- the electronic device 700 includes a processor 701 and a memory 702 coupled to the processor. It should be understood that although only one processor and one memory are shown in FIG. 7, the electronic device 700 may include other numbers of processors and memories.
- the memory 702 is used to store computer programs or computer instructions.
- when the processor 701 executes the computer programs or instructions stored in the memory 702, the electronic device 700 can implement each step in the audio signal processing method according to the embodiment of the present application.
- Figure 8 shows a structural diagram of an electronic device according to an embodiment of the present application.
- the electronic device 800 can be a terminal, such as a car or a car machine, or a chip built into the terminal, and can implement the steps of the audio signal processing method shown in Figures 3-4, or implement the functions of each module of the audio signal processing device shown in Figure 5.
- the electronic device 800 includes at least one processor 1801, at least one memory 1802, and at least one communication interface 1803.
- the electronic device may also include common components such as antennas, which will not be described in detail here.
- each component of the electronic device 800 will be introduced in detail with reference to FIG. 8.
- the processor 1801 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control the execution of the programs of the above solution.
- the processor 1801 may include one or more processing units.
- the processor 1801 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
- different processing units can be independent devices or integrated in one or more processors.
- the communication interface 1803 is used to communicate with other electronic devices or communication networks, such as Ethernet, a radio access network (RAN), a core network, or a wireless local area network (WLAN).
- the memory 1802 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
- the memory can exist independently and be connected to the processor through a bus. Memory can also be integrated with the processor.
- the memory 1802 is used to store the application program code for executing the above solution, and the processor 1801 controls the execution.
- the processor 1801 is used to execute application code stored in the memory 1802.
- the above-mentioned acquisition module 501 in Figure 5 can be implemented by the communication interface 1803 in Figure 8; the first determination module 502 and the second determination module 503 in Figure 5 may be implemented by the processor 1801 in Figure 8.
- FIG. 9 shows a structural diagram of an electronic device according to an embodiment of the present application.
- the electronic device can be the above-mentioned terminal, such as a car or a car machine, or a chip built into the terminal, and can execute the audio signal processing method shown in any one of Figures 3-4, or implement the functions of the modules of the audio signal processing device shown in Figure 5.
- the electronic device 900 includes a sound sensor 901, a processing unit 902 coupled with the sound sensor 901, and a speaker 903 coupled with the processing unit 902. It should be understood that although only one sound sensor, one speaker, and one processing unit are shown in FIG. 9, the electronic device 900 may include other numbers of sound sensors, speakers, and processing units.
- the sound sensor 901 may include a capacitive microphone, a dynamic microphone, a laser microphone, etc.
- the sound sensor 901 is used to collect the above-mentioned first audio signal.
- the processing unit 902 can be used to process the first audio signal, determine the noise signal in the first audio signal, and can also adjust the original sound source according to the noise signal and the original sound source to obtain an adjusted audio signal.
- the speaker 903 can be used to play the adjusted audio signal, so that the playing audio can have a better masking effect on noise, thereby improving the user's listening experience.
- the electronic device in the embodiment of the present application can be implemented by software, for example, by the above-mentioned computer program or instructions.
- the corresponding computer programs or instructions can be stored in the memory inside the terminal, and the processor reads them from the memory to implement the above functions.
- the electronic device in the embodiment of the present application may also be implemented by hardware.
- the processing unit 902 is a processor.
- Computer-readable storage media may be tangible devices that can retain and store instructions for use by an instruction execution device.
- the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above.
- a non-exhaustive list of computer-readable storage media includes: portable computer disks, hard drives, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punched cards or raised structures in a groove with instructions stored thereon, and any suitable combination of the above.
- Computer-readable program instructions or code described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
- the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage on a computer-readable storage medium in the respective computing/processing device.
- the computer program instructions used to perform the operations of this application can be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
- the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
- in some embodiments, electronic circuits such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs) are customized by utilizing state information of the computer-readable program instructions, and these electronic circuits can execute the computer-readable program instructions to implement various aspects of the present application.
- These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.
- These computer-readable program instructions can also be stored in a computer-readable storage medium and cause a computer, a programmable data processing apparatus, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture containing instructions that implement aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
- Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operating steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, thereby causing instructions executed on the computer, other programmable data processing apparatus, or other equipment to implement the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.
- each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions that contains one or more executable instructions for implementing the specified logical functions.
- in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
- each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by hardware that performs the corresponding function or action (such as a circuit or an application-specific integrated circuit (ASIC)), or by a combination of hardware and software, such as firmware.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
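This application relates to an audio signal processing method, apparatus, storage medium, and vehicle. The method includes: acquiring a first audio signal collected by a sound sensor; processing one or more of vocal information, harmonic information, and burst sound information included in the first audio signal to determine a first noise signal in the first audio signal; adjusting a second audio signal according to the first noise signal and the second audio signal to obtain a third audio signal, where the second audio signal is the original sound source of a playback device and the adjustment includes amplitude adjustment; and playing the third audio signal through the playback device. According to the embodiments of this application, the current noise level can be estimated accurately, so that the adjusted audio signal masks noise better and the user's listening experience is improved; moreover, the above process avoids dependence on non-acoustic state information, so the method can be used in a variety of scenarios, is more flexible, and supports rapid deployment.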
Description
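This application relates to the field of artificial intelligence technologies, and in particular, to an audio signal processing method, apparatus, storage medium, and vehicle.
In audio playback scenarios such as music playback, voice calls, navigation prompts, and human-machine interaction, the level of noise affects people's audio experience. To obtain a better audio experience, the audio signal can be processed, for example by adjusting the volume, to reduce the energy of the noise people perceive and thus the noise interference they experience. However, adjusting the audio volume manually distracts people in scenarios such as driving a vehicle, creating a safety hazard and degrading the driving experience.
Current solutions usually process the audio signal using non-acoustic measurement values (such as vehicle speed) to reduce noise interference. However, they rely on extensive experiments to calibrate the relationship between the non-acoustic measurement values and noise, and when the external environment changes it is difficult to determine the noise accurately and adjust the played audio signal accordingly, resulting in a poor listening experience for the user.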
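Summary
In view of this, an audio signal processing method and apparatus, a storage medium, and a vehicle are provided.
According to a first aspect, an embodiment of the present application provides an audio signal processing method, including: acquiring a first audio signal collected by a sound sensor; processing the first audio signal to determine a first noise signal in the first audio signal; adjusting a second audio signal according to the first noise signal and the second audio signal to obtain a third audio signal, where the second audio signal is the original audio source of a playback device and the adjustment includes amplitude adjustment; and playing the third audio signal through the playback device.
According to the embodiments of the present application, the noise signal is determined by acquiring and processing the audio signal collected by the sound sensor. Because only the audio signal collected by the sound sensor is used, dependence on non-acoustic state information is avoided, and the current noise level can be estimated accurately, so that the estimated noise is closer to the actual noise. The original audio source of the playback device is then adjusted according to the noise signal and the original audio source, and the adjusted audio signal is played by the playback device. The original audio source is thus adapted to the noise environment, so that the adjusted audio signal masks the noise better and the user's listening experience improves.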
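According to the first aspect, in a first possible implementation of the audio signal processing method, processing the first audio signal to determine the first noise signal in the first audio signal includes: processing one or more of voice information, harmonic information, and transient sound information included in the first audio signal to determine the first noise signal in the first audio signal.
According to the embodiments of the present application, the noise signal is determined by processing one or more of the voice information, harmonic information, and transient sound information included in the audio signal collected by the sound sensor. The current noise level can therefore be estimated accurately and the estimated noise is closer to the actual noise, so that the adjusted audio signal masks the noise better and the user's listening experience improves. The method can also be used in a variety of scenarios, is more flexible, and supports rapid deployment.
According to the first aspect, in a first possible implementation of the audio signal processing method, adjusting the second audio signal according to the first noise signal and the second audio signal to obtain the third audio signal includes: determining a second noise signal according to the first noise signal and transfer information, where the second noise signal is an estimate of the noise signal perceived by the user; and adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal.
According to the embodiments of the present application, processing the first noise signal yields a second noise signal that is closer to the noise the user actually perceives, so the resulting third audio signal masks the noise better and the user's listening experience improves.
According to the first possible implementation of the first aspect, in a second possible implementation of the audio signal processing method, the transfer information includes transfer information from the sound sensor to the user's ear, and/or transfer information within the ear.
According to the embodiments of the present application, the transmission path of the noise can be simulated more realistically, so that the determined second noise signal is closer to the noise the user actually perceives.
According to the first or second possible implementation of the first aspect, in a third possible implementation of the audio signal processing method, adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal includes: determining a gain curve according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain curve to obtain the third audio signal.
According to the embodiments of the present application, the second audio signal is adjusted by using the gain curve to obtain the third audio signal, so that the third audio signal masks the noise and the user's listening experience is ensured.
According to the first, second, or third possible implementation of the first aspect, in a fourth possible implementation of the audio signal processing method, adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal includes: determining a gain value according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain value to obtain the third audio signal.
According to the embodiments of the present application, adjusting the second audio signal with a gain value instead of a gain curve keeps the resulting third audio signal free of an audible modulation effect, improving the user's listening experience.
According to the first, second, third, or fourth possible implementation of the first aspect, in a fifth possible implementation of the audio signal processing method, adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal includes: determining, according to the second audio signal and psychoacoustic information, a masking domain of the second audio signal for noise, where the masking domain indicates the volume threshold masked by the second audio signal at each frequency, and noise whose volume at a frequency is below the volume threshold is masked by the second audio signal; and adjusting the second audio signal according to the second noise signal and the masking domain to obtain the third audio signal.
According to the embodiments of the present application, determining the masking domain of the second audio signal for noise by using psychoacoustic information allows the adjustment to be more targeted, achieving better noise masking and ensuring the user's listening experience.
According to the first aspect or the first, second, third, fourth, or fifth possible implementation of the first aspect, in a sixth possible implementation of the audio signal processing method, processing one or more of the voice information, harmonic information, and transient sound information included in the first audio signal to determine the first noise signal in the first audio signal includes: processing echo information, together with one or more of the voice information, harmonic information, and transient sound information included in the first audio signal, to determine the first noise signal in the first audio signal.
According to the embodiments of the present application, processing the echo information makes the processing of the first audio signal more targeted and covers a range of scenarios. The noise signal can thus be separated from the first audio signal more precisely and the noise estimate is stable, so the adjusted second audio signal masks the noise better and the user experience improves.
According to the first aspect or the first, second, third, fourth, fifth, or sixth possible implementation of the first aspect, in a seventh possible implementation of the audio signal processing method, processing one or more of the voice information, harmonic information, and transient sound information included in the first audio signal to determine the first noise signal in the first audio signal includes: when it is determined that the first audio signal includes voice information and/or harmonic information, determining the first noise signal to be the first noise signal of the previous frame.
According to the embodiments of the present application, using the first noise signal of the previous frame as the noise estimate yields the first noise signal directly without further removal processing, which reduces the workload of the adjustment process and saves cost.
According to the first aspect or the first through seventh possible implementations of the first aspect, in an eighth possible implementation of the audio signal processing method, the first audio signal includes N collected current frames of the first audio signal, the second audio signal includes N current frames of the second audio signal to be adjusted, and the third audio signal includes N current frames of the third audio signal, where N is a positive integer.
According to the embodiments of the present application, because the number of audio signal frames used is not limited, the amount of computation in the adjustment process can be adjusted flexibly according to actual conditions, which facilitates deployment in different scenarios.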
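According to a second aspect, an embodiment of the present application provides an audio signal processing apparatus, including: an acquisition module, configured to acquire a first audio signal collected by a sound sensor; a first determining module, configured to process one or more of voice information, harmonic information, and transient sound information included in the first audio signal to determine a first noise signal in the first audio signal; a second determining module, configured to adjust a second audio signal according to the first noise signal and the second audio signal to obtain a third audio signal, where the second audio signal is the original audio source of a playback device and the adjustment includes amplitude adjustment; and a playback module, configured to play the third audio signal through the playback device.
According to the second aspect, in a first possible implementation of the audio signal processing apparatus, the second determining module is configured to: determine a second noise signal according to the first noise signal and transfer information, where the second noise signal is an estimate of the noise signal perceived by the user; and adjust the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal.
According to the first possible implementation of the second aspect, in a second possible implementation of the audio signal processing apparatus, the transfer information includes transfer information from the sound sensor to the user's ear, and/or transfer information within the ear.
According to the first or second possible implementation of the second aspect, in a third possible implementation of the audio signal processing apparatus, adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal includes: determining a gain curve according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain curve to obtain the third audio signal.
According to the first, second, or third possible implementation of the second aspect, in a fourth possible implementation of the audio signal processing apparatus, adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal includes: determining a gain value according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain value to obtain the third audio signal.
According to the first, second, third, or fourth possible implementation of the second aspect, in a fifth possible implementation of the audio signal processing apparatus, adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal includes: determining, according to the second audio signal and psychoacoustic information, a masking domain of the second audio signal for noise, where the masking domain indicates the volume threshold masked by the second audio signal at each frequency, and noise whose volume at a frequency is below the volume threshold is masked by the second audio signal; and adjusting the second audio signal according to the second noise signal and the masking domain to obtain the third audio signal.
According to the second aspect or the first, second, third, fourth, or fifth possible implementation of the second aspect, in a sixth possible implementation of the audio signal processing apparatus, the first determining module is configured to process echo information, together with one or more of the voice information, harmonic information, and transient sound information included in the first audio signal, to determine the first noise signal in the first audio signal.
According to the second aspect or the first, second, third, fourth, fifth, or sixth possible implementation of the second aspect, in a seventh possible implementation of the audio signal processing apparatus, the first determining module is configured to: when it is determined that the first audio signal includes voice information and/or harmonic information, determine the first noise signal to be the first noise signal of the previous frame.
According to the second aspect or the first through seventh possible implementations of the second aspect, in an eighth possible implementation of the audio signal processing apparatus, the first audio signal includes N collected current frames of the first audio signal, the second audio signal includes N current frames of the second audio signal to be adjusted, and the third audio signal includes N current frames of the third audio signal, where N is a positive integer.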
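According to a third aspect, an embodiment of the present application provides an audio signal processing apparatus, including a processor and a memory. The memory is configured to store a program, and the processor is configured to execute the program stored in the memory, so that the apparatus implements the audio signal processing method according to the first aspect or one or more of the possible implementations of the first aspect.
According to a fourth aspect, an embodiment of the present application provides a terminal device that can perform the audio signal processing method according to the first aspect or one or more of the possible implementations of the first aspect.
According to a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing program instructions, where the program instructions, when executed by a computer, cause the computer to implement the audio signal processing method according to the first aspect or one or more of the possible implementations of the first aspect.
According to a sixth aspect, an embodiment of the present application provides a computer program product including program instructions, where the program instructions, when executed by a computer, cause the computer to implement the audio signal processing method according to the first aspect or one or more of the possible implementations of the first aspect.
According to a seventh aspect, an embodiment of the present application provides a vehicle including a processor, where the processor is configured to perform the audio signal processing method according to the first aspect or one or more of the possible implementations of the first aspect.
These and other aspects of the present application will be more readily understood from the following description of the embodiment(s).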
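The accompanying drawings, which are included in and constitute a part of the specification, together with the specification illustrate exemplary embodiments, features, and aspects of the present application and are used to explain the principles of the present application.
FIG. 1(a) is a schematic diagram of an application scenario according to an embodiment of the present application.
FIG. 1(b) is a schematic diagram of an application scenario according to an embodiment of the present application.
FIG. 2 is a schematic diagram of adjusting an audio signal according to an embodiment of the present application.
FIG. 3 is a flowchart of an audio signal processing method according to an embodiment of the present application.
FIG. 4 is a flowchart of processing a first audio signal according to an embodiment of the present application.
FIG. 5 is a structural diagram of an audio signal processing apparatus according to an embodiment of the present application.
FIG. 6 is a structural diagram of an electronic device according to an embodiment of the present application.
FIG. 7 is a structural diagram of an electronic device according to an embodiment of the present application.
FIG. 8 is a structural diagram of an electronic device according to an embodiment of the present application.
FIG. 9 is a structural diagram of an electronic device according to an embodiment of the present application.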
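The following describes various exemplary embodiments, features, and aspects of the present application in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise specified.
The word "exemplary" as used herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.
In addition, numerous specific details are given in the following detailed description to better illustrate the present application. A person skilled in the art should understand that the present application can also be implemented without some of these specific details. In some instances, methods, means, elements, and circuits well known to a person skilled in the art are not described in detail, so as to highlight the main idea of the present application.
In scenarios such as driving a vehicle, audio is used in many ways, for example music playback, voice calls, navigation prompts, and human-machine interaction, and the level of ambient noise affects people's listening experience in these scenarios. To obtain a better listening experience, the volume can be adjusted to adapt to the noise environment, reducing the perceived noise energy and the noise interference people experience. However, frequent manual volume adjustment distracts people, creates safety hazards, and degrades the driving experience. Current solutions usually either process the audio signal using non-acoustic measurements (such as vehicle speed), which relies on extensive experiments to calibrate the relationship between the non-acoustic measurements and the noise and makes it difficult to determine the noise accurately when the external environment changes, or roughly estimate the noise from collected acoustic signals, which yields an insufficiently accurate noise estimate. Either way, the user's listening experience is poor.
To solve the foregoing technical problems, the present application provides an audio signal processing method. The method acquires the audio signal collected by a sound sensor and processes it to determine the noise signal in the audio signal. The current noise level can thus be estimated accurately from acoustic measurements, avoiding dependence on non-acoustic state information and bringing the estimated noise closer to the actual noise. The original audio source of the playback device is then adjusted according to the noise signal and the original audio source, and the adjusted audio signal is played by the playback device. The original audio source is thereby adapted to the noise environment, the adjusted audio signal masks the noise better, and the user's listening experience improves. Because the process relies on the audio signal collected by the sound sensor, it can be used in a variety of scenarios, is more flexible, and supports rapid deployment.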
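FIG. 1(a) and FIG. 1(b) are schematic diagrams of application scenarios according to an embodiment of the present application. As shown in FIG. 1(a) and FIG. 1(b), in a possible application scenario, the audio signal processing method in the embodiments of the present application can be used for noise masking in a vehicle: the played audio signal is adjusted to reduce the driver's and passengers' perception of noise. The audio signal processing system in the embodiments of the present application may be disposed in the vehicle and includes a sound sensor, a processor, and a playback device.
The sound sensor (see FIG. 1(a); for example, a microphone) may be disposed at any position in the vehicle, for example near the driver or passengers, and is configured to collect the audio signal in the vehicle (see FIG. 1(a) and FIG. 1(b); this may be called the first audio signal) so that the ambient noise perceived by users in the vehicle can be determined.
The processor may be built into the head unit (or audio system) of the vehicle as an in-vehicle computing unit, for example a system on chip (SoC) or a digital signal processor (DSP) chip. The processor can determine, according to the audio signal collected by the sound sensor, the noise signal corresponding to the ambient noise perceived by users in the vehicle. The processor can further adjust the original audio source of the playback device (see FIG. 1(a) and FIG. 1(b); this may be called the second audio signal) according to the determined noise signal and the original audio source, to determine the adjusted audio signal (see FIG. 1(a) and FIG. 1(b); this may be called the third audio signal). Alternatively, the processor may be located externally in a cloud server. The server and the vehicle may communicate wirelessly, for example through mobile communication technologies such as 2G/3G/4G/5G, or through Wi-Fi, Bluetooth, frequency modulation (FM), data radio, satellite communication, and the like. Through this communication, the server can collect the audio signal captured by the sound sensor, perform the computation, and return the result to the corresponding vehicle.
The playback device (see FIG. 1(b)) may be disposed in the vehicle, may include a loudspeaker and the like, and is configured to play the audio signal adjusted by the processor. FIG. 2 is a schematic diagram of adjusting an audio signal according to an embodiment of the present application. As shown in FIG. 2, when music is played in the vehicle and the noise outside the vehicle increases (for example, when driving through a congested road section) while the playback device still plays unadjusted music, the noise perceived by the driver and passengers increases with it, which degrades their listening experience. According to the embodiments of the present application, the music being played is adjusted by using the audio signal collected by the sound sensor, and the playback device plays the adjusted audio signal. The music the driver and passengers hear changes, for example in volume (the figure shows the volume increasing), so that it masks the noise they perceive. In other words, the adjusted music affects how their ears perceive the noise, reducing their perception of noise in the vehicle; they effectively "cannot hear" the noise, which improves their listening experience when music is played in the vehicle.
It should be understood that although FIG. 1(a) and FIG. 1(b) show only one sound sensor, one processor, and one playback device, the audio signal processing system may include other numbers of sound sensors, processors, and playback devices.
It should be noted that the audio signal processing method in the embodiments of the present application can also be used in scenarios requiring noise masking other than the in-vehicle scenarios shown in FIG. 1(a), FIG. 1(b), and FIG. 2, for example with electronic devices that have audio interaction functions and microphones, such as mobile phones and smart home devices. This is not limited in the present application.
Taking the in-vehicle scenario as an example and based on the foregoing audio signal processing system, the audio signal processing method in the embodiments of the present application is described in detail below.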
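FIG. 3 is a flowchart of an audio signal processing method according to an embodiment of the present application. The method can be used in the foregoing audio signal processing system. As shown in FIG. 3, the method may include the following steps.
Step S301: Acquire the first audio signal collected by the sound sensor.
For the sound sensor, see FIG. 1(a). The first audio signal may be one frame or multiple frames. In the multi-frame case, the first audio signal may be N consecutive frames (N may be preset) or N frames taken at intervals (for example, every other frame). The first audio signal contains the acoustic information in the environment collected by the sound sensor, which may include, for example, voice information, harmonic information, transient sound information, echo information, and noise information.
Voice information may include the speech of the driver and passengers collected by the sound sensor. Harmonic information may include long vowels in the speech of the driver and passengers and horn sounds collected by the sound sensor. Transient sound information may include short bursts collected by the sound sensor, for example when a door is opened or closed. Echo information may include the sound played by the playback device and collected by the sound sensor, such as music, navigation announcements, or other voice announcements. Noise information may include ambient noise inside and outside the vehicle.
By removing the information other than the noise information from the first audio signal, the intensity of the ambient noise perceived by users in the vehicle can be estimated, so that the audio signal can be adjusted in a more targeted way and the noise masked more effectively. The detailed process is described below.
Step S302: Process the first audio signal to determine the first noise signal in the first audio signal.
The processing includes removing the corresponding information from the first audio signal. To determine the first noise signal, for example, the voice information, harmonic information, transient sound information, echo information, and the like included in the first audio signal can be removed. FIG. 4 is a flowchart of processing the first audio signal according to an embodiment of the present application. As shown in FIG. 4, processing the first audio signal may include the following steps.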
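Step S401: Process the voice information in the first audio signal.
Voice information may correspond to audio produced by the speech of the driver and passengers or of people outside the vehicle. It can be processed by methods such as voice activity detection (VAD). For example, VAD can determine whether the first audio signal includes voice information; if it does, the voice information in the current first audio signal can be removed by smoothing interpolation or similar means based on the previous several frames (for example, the previous 3 to 5 frames) of the first audio signal. The voice information may also be processed in other ways, which is not limited in the present application.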
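Step S401 can be illustrated with a minimal sketch, assuming each frame of the first audio signal is already available as a magnitude spectrum (a list of floats). A simple frame-energy threshold stands in for a real VAD, and a voiced frame is replaced by the bin-wise mean of the last few noise-only frames as a basic form of interpolation from the previous frames. The function names, the 6 dB margin, and the 4-frame history are illustrative assumptions, not values from the application.

```python
from collections import deque
from typing import Deque, List


def frame_energy(spectrum: List[float]) -> float:
    """Sum of squared magnitudes of one frame."""
    return sum(x * x for x in spectrum)


def is_voice_frame(spectrum: List[float], noise_energy: float, margin_db: float = 6.0) -> bool:
    """Crude stand-in for VAD: flag frames whose energy exceeds the noise energy by margin_db."""
    return frame_energy(spectrum) > noise_energy * 10 ** (margin_db / 10)


def mean_of_history(history: Deque[List[float]]) -> List[float]:
    """Bin-wise mean of the stored previous (noise-only) frames."""
    n = len(history)
    return [sum(frame[k] for frame in history) / n for k in range(len(history[0]))]


def remove_voice(frames: List[List[float]], history_len: int = 4) -> List[List[float]]:
    """Replace voiced frames by an interpolation over the previous history_len frames."""
    history: Deque[List[float]] = deque(maxlen=history_len)
    cleaned = []
    for spec in frames:
        if history and is_voice_frame(spec, frame_energy(mean_of_history(history))):
            spec = mean_of_history(history)  # fill the voiced frame from recent noise frames
        else:
            history.append(spec)  # only noise-like frames enter the history
        cleaned.append(spec)
    return cleaned
```

For example, `remove_voice([[1.0, 1.0], [1.0, 1.0], [10.0, 10.0]])` replaces the third, much louder frame with `[1.0, 1.0]`.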
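Step S402: Process the harmonic information in the first audio signal.
Harmonic information may correspond to audio produced by long vowels in speech, horn sounds, and the like. It can be processed by methods such as long vowel detection (LVD). For example, LVD may collect statistics on the energy peaks of the first audio signal in the frequency domain to determine whether the first audio signal includes harmonic information.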
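The peak statistics mentioned for LVD can be sketched as follows, assuming a magnitude spectrum per frame. A bin counts as a peak when it is a local maximum and exceeds `ratio` times the median magnitude of the frame; the frame is flagged as harmonic when at least `min_peaks` such peaks exist. This is a hypothetical simplification for illustration, not the LVD of the application, and both thresholds are assumed.

```python
from statistics import median
from typing import List


def count_prominent_peaks(spectrum: List[float], ratio: float = 4.0) -> int:
    """Count local maxima that stand `ratio` times above the median magnitude."""
    floor = median(spectrum)
    peaks = 0
    for k in range(1, len(spectrum) - 1):
        is_local_max = spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
        if is_local_max and spectrum[k] > ratio * floor:
            peaks += 1
    return peaks


def detect_harmonic(spectrum: List[float], ratio: float = 4.0, min_peaks: int = 3) -> bool:
    """Flag frames whose energy concentrates in several sharp peaks (e.g. a horn or long vowel)."""
    return count_prominent_peaks(spectrum, ratio) >= min_peaks
```

A broadband noise frame has no bins far above its median, so it is not flagged; a frame with a few strong partials is.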
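When it is determined that the first audio signal includes the voice information and/or the harmonic information, the first noise signal may be determined to be the first noise signal of the previous frame.
For example, in frame-by-frame processing, the first noise signal of the previous frame can be used directly as the first noise signal of the current frame. The first noise signal is thus obtained directly without further removal processing, which reduces the workload of the adjustment process and saves cost.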
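The hold rule above can be sketched frame by frame, assuming the voice and harmonic flags come from detectors such as VAD and LVD. While either flag is set, the previous frame's first noise signal is returned unchanged. The recursive update with `alpha` on noise-only frames is an assumption added so that the sketch runs; the application only specifies the hold behavior. `NoiseTracker` is a hypothetical name.

```python
from typing import List, Optional


class NoiseTracker:
    """Tracks the first noise signal per frame, holding it while speech or harmonics are present."""

    def __init__(self, alpha: float = 0.9) -> None:
        self.alpha = alpha  # assumed smoothing factor for noise-only frames
        self.noise: Optional[List[float]] = None

    def update(self, spectrum: List[float], has_voice: bool, has_harmonic: bool) -> List[float]:
        if self.noise is None:
            self.noise = list(spectrum)  # bootstrap from the first frame
        elif has_voice or has_harmonic:
            pass  # hold: reuse the previous frame's first noise signal unchanged
        else:
            a = self.alpha
            self.noise = [a * n + (1 - a) * s for n, s in zip(self.noise, spectrum)]
        return list(self.noise)
```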
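Step S403: Process the transient sound information in the first audio signal.
Transient sound information may correspond to audio produced by short sounds, for example when a vehicle door is opened or closed. It can be processed by methods such as minimum statistics (MS). For example, MS can be used to estimate the transient sound information included in the first audio signal, and the estimated transient sound information is then removed. Methods other than MS may also be used to estimate and remove the transient sound information included in the first audio signal, which is not limited in the present application.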
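A simplified minimum-statistics sketch, assuming per-frame power spectra: each bin is recursively smoothed, and the noise floor is taken as the minimum of the smoothed power over the last `window` frames, scaled by a bias-compensation factor. A short burst such as a door slam raises the smoothed power for only a few frames, so it barely affects the windowed minimum. The fixed `alpha`, `window`, and `bias` values are assumptions; Martin's full method uses time-varying smoothing and a derived bias correction.

```python
from collections import deque
from typing import Deque, List


class MinimumStatistics:
    def __init__(self, num_bins: int, alpha: float = 0.8, window: int = 50, bias: float = 1.5) -> None:
        self.alpha = alpha  # smoothing factor for the per-bin power
        self.bias = bias  # compensates the downward bias of taking a minimum
        self.smoothed = [0.0] * num_bins
        self.started = False
        self.history: Deque[List[float]] = deque(maxlen=window)

    def update(self, power: List[float]) -> List[float]:
        if not self.started:
            self.smoothed = list(power)
            self.started = True
        else:
            a = self.alpha
            self.smoothed = [a * p_s + (1 - a) * p for p_s, p in zip(self.smoothed, power)]
        self.history.append(self.smoothed)
        # Per-bin minimum over the window tracks the noise floor and ignores short bursts.
        return [self.bias * min(frame[k] for frame in self.history) for k in range(len(power))]
```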
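The present application does not limit the order in which the foregoing information is processed. For example, the transient sound information may be processed after the voice information and the harmonic information, so that residual voice or harmonic information can also be removed while processing the transient sound information. After one or more of the foregoing types of information are removed from the first audio signal, the first noise signal can be determined.
Through the foregoing process, processing one or more of the voice information, harmonic information, and transient sound information included in the audio signal collected by the sound sensor allows the current noise level to be estimated more accurately, bringing the estimated noise closer to the actual noise. The subsequently adjusted audio signal therefore masks the noise better, the user's listening experience improves, and the method can be used in a variety of scenarios, is more flexible, and supports rapid deployment.
Optionally, processing the first audio signal may further include the following step.
Step S404: Process the echo information in the first audio signal.
Echo information may correspond to audio produced by the sound played by the playback device. The echo information can first be removed from the first audio signal by a frequency domain adaptive filter (FDAF). This is a linear suppression method; any linear echo cancellation (LEC) method other than FDAF may also be used.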
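A minimal sketch of the linear stage, assuming the playback (far-end) spectrum `X` and the microphone spectrum `Y` are available per frame and that the echo path is short enough to be modeled by one complex weight per frequency bin. The per-bin normalized LMS below is a simplified, unconstrained form of a frequency domain adaptive filter; a full FDAF would use overlap-save blocks and possibly partitioned filters. `mu` and `eps` are assumed values.

```python
from typing import List


class PerBinNlmsEchoCanceller:
    """One complex adaptive weight per frequency bin (unconstrained frequency-domain NLMS)."""

    def __init__(self, num_bins: int, mu: float = 0.5, eps: float = 1e-8) -> None:
        self.w = [0j] * num_bins  # estimated echo path per bin
        self.mu = mu  # step size
        self.eps = eps  # regularization against division by zero

    def process(self, far_end: List[complex], mic: List[complex]) -> List[complex]:
        """Return the error spectrum E = Y - W*X and adapt W toward the echo path."""
        error = []
        for k, (x, y) in enumerate(zip(far_end, mic)):
            e = y - self.w[k] * x
            # Normalized update: the step is scaled by the far-end power in this bin.
            self.w[k] += self.mu * x.conjugate() * e / (abs(x) ** 2 + self.eps)
            error.append(e)
        return error
```

The returned error spectrum is the first audio signal with the linearly predictable echo removed.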
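Because linear suppression may leave residual echo information, the estimated noise value may be inaccurate, which causes erroneous adjustment of the second audio signal and a chain reaction. Therefore, in addition to linear suppression, residual echo suppression (RES) may be used to remove the residual echo information; this process may use several frames (for example, 3 to 5 frames) of the first audio signal preceding the current frame.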
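A minimal residual echo suppression sketch, assuming per-bin magnitudes of the linear-stage output `|E|` and of the estimated echo `|WX|`. The residual echo is modeled as a fixed fraction `leak` of the estimated echo, averaged recursively over recent frames, and removed with a spectral-subtraction gain that never falls below `floor`. The model and all parameter values are illustrative assumptions, not the RES of the application.

```python
from typing import List, Optional


class ResidualEchoSuppressor:
    def __init__(self, leak: float = 0.1, floor: float = 0.1, alpha: float = 0.7) -> None:
        self.leak = leak  # assumed fraction of the echo the linear filter misses
        self.floor = floor  # minimum gain, limits over-suppression
        self.alpha = alpha  # recursive averaging over recent frames
        self.residual: Optional[List[float]] = None

    def process(self, err_mag: List[float], echo_mag: List[float]) -> List[float]:
        current = [self.leak * m for m in echo_mag]
        if self.residual is None:
            self.residual = current
        else:
            a = self.alpha
            self.residual = [a * r + (1 - a) * c for r, c in zip(self.residual, current)]
        out = []
        for e, r in zip(err_mag, self.residual):
            gain = max(self.floor, 1.0 - r / e) if e > 0 else self.floor
            out.append(gain * e)
        return out
```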
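Removing echo information with FDAF and RES may produce spectral holes, that is, some frequency bins of the first audio signal may be over-suppressed. Because the spectrum of a noise signal is usually relatively smooth, frequency smoothing (FS) can be used to compensate for the spectral holes.
In this way, the first audio signal is processed in a more targeted way and a range of scenarios is covered, so the noise signal can be separated from the first audio signal more precisely and the noise estimate is stable. The adjusted second audio signal then masks the noise better and the user experience improves.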
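The frequency smoothing step can be sketched as follows, assuming a magnitude spectrum of the noise estimate. Because the noise spectrum is expected to be smooth, a bin that falls below `hole_ratio` times the mean of its neighbors within `radius` bins is treated as a spectral hole and replaced by that mean. The hole criterion and both parameters are assumptions.

```python
from typing import List


def fill_spectral_holes(spectrum: List[float], radius: int = 2, hole_ratio: float = 0.25) -> List[float]:
    """Replace over-suppressed bins with the mean of their neighbors."""
    n = len(spectrum)
    out = list(spectrum)
    for k in range(n):
        lo, hi = max(0, k - radius), min(n, k + radius + 1)
        neighbors = [spectrum[j] for j in range(lo, hi) if j != k]
        if not neighbors:
            continue
        local_mean = sum(neighbors) / len(neighbors)
        if spectrum[k] < hole_ratio * local_mean:
            out[k] = local_mean  # compensate the spectral hole
    return out
```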
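It should be noted that the present application does not limit the order in which the echo information and the voice, harmonic, and transient sound information are processed; that is, the execution order of steps S401 to S404 is not limited. For example, the echo information may be processed first, followed by one or more of the voice information, harmonic information, and transient sound information.
After the echo information and one or more of the voice information, harmonic information, and transient sound information are removed, the resulting signal can be regarded as the first noise signal, that is, the estimated ambient noise perceived by users in the vehicle. The original audio source of the playback device can then be adjusted according to the first noise signal and the original audio source, and the adjusted audio signal played, so that it masks the ambient noise perceived by users in the vehicle. For the detailed process, return to FIG. 3 and see below.
Step S303: Adjust the second audio signal according to the first noise signal and the second audio signal to obtain the third audio signal.
The second audio signal is the original audio source of the playback device, and the adjustment may include amplitude adjustment. For the playback device, see FIG. 1(b). The original audio source may be music, navigation sound, voice call audio, and the like, which is not limited in the present application. The third audio signal obtained by adjusting the amplitude of the second audio signal can mask the ambient noise, that is, reduce the user's perception of the ambient noise, thereby achieving noise masking.
Because the first noise signal differs from the noise actually perceived by the user's ear, the first noise signal can be processed to bring it closer to the noise the user actually perceives, so that the adjusted third audio signal masks the noise better. See below.
Step S303 may include:
determining a second noise signal according to the first noise signal and transfer information,
where the second noise signal is an estimate of the noise signal perceived by the user.
For example, the first noise signal may be weighted according to the transfer information to determine the second noise signal, or the second noise signal may be determined from the transfer information by other methods, which is not limited in the present application.
The transfer information may include transfer information from the sound sensor to the user's ear, and/or transfer information within the ear.
For example, the transfer information from the sound sensor to the user's ear may represent the transmission path of the noise from the sound sensor to the user's ear (for example, the region around the ear) and can be determined from the relative position of the sound sensor and the user's ear (or the region near the ear). The transfer information within the ear may represent the transmission path of the noise in the user's ear canal (for example, from the outer ear to the middle ear) and can be determined, for example, by an outer/middle-ear attenuation function or by A-weighting, which reduces computation and improves performance. The second noise signal may also be determined in other ways, which is not limited in the present application.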
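A minimal sketch of deriving the second noise signal, assuming the first noise signal is given as per-bin levels in dB. The sensor-to-ear path is reduced to a single hypothetical constant `path_gain_db` (in practice it would depend on the relative positions of the microphone and the ear), and the in-ear transfer uses the standard A-weighting curve in its IEC 61672 analytic form, one of the options named above. Frequencies must be positive.

```python
import math
from typing import List


def a_weighting_db(freq_hz: float) -> float:
    """A-weighting gain in dB (IEC 61672 analytic form, about 0 dB at 1 kHz)."""
    f2 = freq_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0


def perceived_noise_db(noise_db: List[float], freqs_hz: List[float], path_gain_db: float = -3.0) -> List[float]:
    """Second noise signal estimate: first noise level + hypothetical path gain + A-weighting."""
    return [n + path_gain_db + a_weighting_db(f) for n, f in zip(noise_db, freqs_hz)]
```

A-weighting strongly de-emphasizes low frequencies (about -19 dB at 100 Hz), which matters in vehicles where road and engine noise is concentrated at low frequencies.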
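According to the embodiments of the present application, the transmission path of the noise can be simulated more realistically, so that the determined second noise signal is closer to the noise the user actually perceives.
After the second noise signal is determined, the second audio signal can be adjusted according to the second noise signal and the second audio signal to obtain the third audio signal.
According to the embodiments of the present application, processing the first noise signal yields a second noise signal that is closer to the noise the user actually perceives, so the resulting third audio signal masks the noise better and the user's listening experience improves.
For example, the second audio signal may be multiplied by a gain determined from the second noise signal and the second audio signal to obtain the third audio signal. The gain may be a gain curve or a gain value, as described below.
Adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal may include:
determining a gain curve according to the second noise signal and the second audio signal.
Specifically, the gain curve may be determined from the second noise signal and the masking domain of the second audio signal for noise, where the masking domain indicates the volume threshold masked by the second audio signal at each frequency (how to obtain the masking domain is described below). For example, the volume threshold of the masking domain at each frequency may be subtracted from the amplitude of the second noise signal at the corresponding frequency in the frequency domain to determine the gain curve. The gain curve represents the amplitude gain at each frequency in the frequency domain.
After the gain curve is determined, the second audio signal can be adjusted according to the gain curve to obtain the third audio signal.
For example, the value of the gain curve at each frequency may be multiplied by the amplitude of the second audio signal at the corresponding frequency in the frequency domain to determine the amplitude of the third audio signal at each frequency, thereby determining the third audio signal.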
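The gain-curve computation and its application can be sketched as follows, assuming the second noise signal and the masking domain are per-bin levels in dB and the second audio signal is given as per-bin linear magnitudes. The gain in each bin is how far the noise exceeds the masking threshold; the sketch assumes that raising the masker by that many dB raises its masking threshold by roughly the same amount. Clipping to 0 dB (no attenuation) and to `max_gain_db` is an added safety assumption, not part of the application.

```python
from typing import List


def gain_curve_db(noise_db: List[float], mask_db: List[float], max_gain_db: float = 12.0) -> List[float]:
    """Per-bin gain: how far the perceived noise sits above the masking threshold."""
    return [min(max(n - m, 0.0), max_gain_db) for n, m in zip(noise_db, mask_db)]


def apply_gain_curve(magnitudes: List[float], gains_db: List[float]) -> List[float]:
    """Third audio signal magnitudes: second audio magnitudes times the linear gain."""
    return [a * 10 ** (g / 20) for a, g in zip(magnitudes, gains_db)]
```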
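According to the embodiments of the present application, the second audio signal is adjusted by using the gain curve to obtain the third audio signal, so that the third audio signal masks the noise and the user's listening experience is ensured.
Adjusting the second audio signal with a gain curve may make the adjusted audio signal sound noticeably modulated and may even cause distortion. Therefore, an overall gain value can be used instead of the gain curve, preventing outliers in the gain curve from having an excessive effect on the adjustment of the second audio signal and reducing the modulation effect. See below.
Adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal may include:
determining a gain value according to the second noise signal and the second audio signal.
Specifically, the gain curve may be determined from the second noise signal and the masking domain of the second audio signal for noise, and the gain value then determined from the gain curve. The gain value may be, for example, the root-mean-square value or a weighted average of all or some values of the gain curve, which is not limited in the present application.
There may be a single gain value, determined from the values of the gain curve at all frequencies or at some frequencies (for example, 20 frequencies). There may also be multiple gain values (for example, 2 to 5), determined separately from, for example, the high-frequency, mid-frequency, and low-frequency parts of the gain curve.
After the gain value is determined, the second audio signal can be adjusted according to the gain value to obtain the third audio signal.
For example, the gain value may be multiplied by the amplitude of the second audio signal at each frequency in the frequency domain to determine the amplitude of the third audio signal at each frequency, thereby determining the third audio signal. When separate gain values are determined for the high-, mid-, and low-frequency parts of the gain curve, the high-frequency gain value may be multiplied by the amplitudes of the high-frequency part of the second audio signal in the frequency domain, the mid-frequency gain value by the amplitudes of the mid-frequency part, and the low-frequency gain value by the amplitudes of the low-frequency part, to determine the third audio signal.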
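A minimal sketch of collapsing the gain curve into gain values, assuming a curve in dB with matching bin frequencies. A single gain value can be the root-mean-square of the whole curve (`rms(curve)`), one of the options listed; the band variant computes one RMS value each for the low, mid, and high bands and scales each band of the second audio signal by its own gain. The 250 Hz and 4 kHz band edges are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


def rms(values: List[float]) -> float:
    """Root-mean-square of a list; 0.0 for an empty list."""
    return math.sqrt(sum(v * v for v in values) / len(values)) if values else 0.0


def band_of(freq_hz: float, edges: Tuple[float, float]) -> str:
    low_edge, high_edge = edges
    return "low" if freq_hz < low_edge else ("mid" if freq_hz < high_edge else "high")


def band_gains_db(gains_db: List[float], freqs_hz: List[float],
                  edges: Tuple[float, float] = (250.0, 4000.0)) -> Dict[str, float]:
    """One gain value per band instead of a full per-bin curve."""
    bands: Dict[str, List[float]] = {"low": [], "mid": [], "high": []}
    for g, f in zip(gains_db, freqs_hz):
        bands[band_of(f, edges)].append(g)
    return {name: rms(vals) for name, vals in bands.items()}


def apply_band_gains(magnitudes: List[float], freqs_hz: List[float], gains: Dict[str, float],
                     edges: Tuple[float, float] = (250.0, 4000.0)) -> List[float]:
    """Scale each bin of the second audio signal by the gain of its band."""
    return [a * 10 ** (gains[band_of(f, edges)] / 20) for a, f in zip(magnitudes, freqs_hz)]
```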
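According to the embodiments of the present application, adjusting the second audio signal with a gain value instead of a gain curve keeps the resulting third audio signal free of an audible modulation effect, improving the user's listening experience.
To simulate how the audio signal masks noise, the masking domain of the second audio signal for noise can be obtained in addition to the second noise signal, providing the basis for computing the gain value or gain curve described above. See below.
Adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal may include:
determining, according to the second audio signal and psychoacoustic information, the masking domain of the second audio signal for noise.
The masking domain may indicate the volume threshold masked by the second audio signal at each frequency; noise whose volume at a frequency is below that threshold can be masked by the second audio signal. For example, if the masking threshold of the second audio signal at 400 Hz is 30 dB SPL, noise below 30 dB SPL at that frequency may not be perceived by the user, thereby achieving noise masking.
The psychoacoustic information may include, for example, the user's threshold of hearing, loudness, pitch, and masking. It may be obtained from a psychoacoustic model, for example the perceptual evaluation of audio quality (PEAQ) model, the Johnston model, or the Terhardt model, which is not limited in the present application. From the psychoacoustic information, the volume threshold that the second audio signal can mask at each frequency can be determined.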
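A much simplified masking-domain sketch, assuming the second audio signal is given as per-bin levels in dB SPL. Frequencies are mapped to the Bark scale with Zwicker's approximation, each bin acts as a masker whose threshold falls off linearly with Bark distance (more slowly toward higher frequencies), and the result is floored at an assumed flat hearing threshold. The 14 dB offset, the 25 and 10 dB/Bark slopes, and the 0 dB floor are assumptions; the PEAQ, Johnston, and Terhardt models are considerably more detailed. With a single 44 dB masker at 400 Hz, this sketch gives the 30 dB SPL threshold used in the example above.

```python
import math
from typing import List


def hz_to_bark(f: float) -> float:
    """Zwicker's approximation of the critical-band rate."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def masking_threshold_db(levels_db: List[float], freqs_hz: List[float],
                         offset_db: float = 14.0, lower_slope: float = 25.0,
                         upper_slope: float = 10.0, hearing_floor_db: float = 0.0) -> List[float]:
    """Per-bin volume threshold below which noise is assumed to be masked."""
    barks = [hz_to_bark(f) for f in freqs_hz]
    thresholds = []
    for z_maskee in barks:
        best = hearing_floor_db  # assumed flat absolute hearing threshold
        for level, z_masker in zip(levels_db, barks):
            dz = z_maskee - z_masker
            # Masking spreads further toward higher frequencies (upward spread of masking).
            spread = -upper_slope * dz if dz >= 0 else lower_slope * dz
            best = max(best, level - offset_db + spread)
        thresholds.append(best)
    return thresholds


def is_masked(noise_db: float, threshold_db: float) -> bool:
    return noise_db < threshold_db
```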
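After the masking domain is determined, the second audio signal can be adjusted according to the second noise signal and the masking domain to obtain the third audio signal.
For example, as described above, a gain value or gain curve can be obtained from the second noise signal and the masking domain and used to adjust the second audio signal to obtain the third audio signal.
According to the embodiments of the present application, determining the masking domain of the second audio signal for noise by using psychoacoustic information allows the adjustment to be more targeted, achieving better noise masking and ensuring the user's listening experience.
Because the sensitivity of the human ear varies with frequency and loudness level, to ensure the user's listening experience the third audio signal determined in step S303 may also be corrected in the frequency domain (for example, by equal-loudness compensation). For example, the third audio signal can be corrected using the loudness information and hearing-threshold information in the foregoing psychoacoustic information, where the loudness information may include equal-loudness curves. That is, the relationship between pure-tone sound pressure level and frequency at which the human ear perceives equal loudness across the audible frequency range can be used to compensate the third audio signal at different frequencies, adapting it to the ear's sensitivity.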
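A minimal sketch of frequency-dependent loudness compensation, assuming a correction table of `(frequency_hz, correction_db)` pairs is available, for example derived offline from an equal-loudness contour at the playback level. The sketch only interpolates such a table on a log-frequency axis and scales each bin of the third audio signal; it contains no ISO 226 data, and any table passed in is the caller's assumption.

```python
import bisect
import math
from typing import List, Tuple


def interp_correction_db(freq_hz: float, table: List[Tuple[float, float]]) -> float:
    """Linear interpolation on log-frequency; table holds sorted (freq_hz, correction_db) pairs."""
    freqs = [f for f, _ in table]
    if freq_hz <= freqs[0]:
        return table[0][1]
    if freq_hz >= freqs[-1]:
        return table[-1][1]
    i = bisect.bisect_right(freqs, freq_hz)
    (f0, c0), (f1, c1) = table[i - 1], table[i]
    t = (math.log(freq_hz) - math.log(f0)) / (math.log(f1) - math.log(f0))
    return c0 + t * (c1 - c0)


def loudness_compensate(magnitudes: List[float], freqs_hz: List[float],
                        table: List[Tuple[float, float]]) -> List[float]:
    """Scale each bin of the third audio signal by the interpolated correction."""
    return [a * 10 ** (interp_correction_db(f, table) / 20) for a, f in zip(magnitudes, freqs_hz)]
```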
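Step S304: Play the third audio signal through the playback device.
According to the embodiments of the present application, the noise signal is determined by acquiring and processing the audio signal collected by the sound sensor. Because only the audio signal collected by the sound sensor is used, dependence on non-acoustic state information is avoided, and the current noise level can be estimated accurately, so that the estimated noise is closer to the actual noise. The original audio source of the playback device is adjusted according to the noise signal and the original audio source, and the adjusted audio signal is played by the playback device. The original audio source is thus adapted to the noise environment, so that the adjusted audio signal masks the noise better and the user's listening experience improves.
During audio signal processing, the first audio signal may include N collected current frames of the first audio signal, the second audio signal may include N current frames of the second audio signal to be adjusted, and the third audio signal may include N current frames of the third audio signal, where N is a positive integer.
N may be preset, and the N frames may be taken at intervals (for example, every other frame) or be consecutive, which is not limited in the present application.
For example, N may be 1, so that processing is performed frame by frame and the third audio signal is determined dynamically.
The amount of computation in the adjustment process can thus be adjusted flexibly according to actual conditions, which facilitates deployment in different scenarios.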
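Selecting the current N frames can be sketched as follows, assuming frames arrive in a list in time order. `stride=1` returns N consecutive frames; `stride=2` takes every other frame, matching the interval example above. `current_n_frames` is a hypothetical helper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def current_n_frames(frames: Sequence[T], n: int, stride: int = 1) -> List[T]:
    """Return the latest n frames, taking every stride-th frame counted back from the newest."""
    if n <= 0 or stride <= 0:
        raise ValueError("n and stride must be positive")
    picked = list(frames[::-1][::stride][:n])
    return picked[::-1]  # restore chronological order
```

With `n=1` this reduces to frame-by-frame processing of the newest frame.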
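FIG. 5 is a structural diagram of an audio signal processing apparatus according to an embodiment of the present application. As shown in FIG. 5, the apparatus includes:
an acquisition module 501, configured to acquire the first audio signal collected by the sound sensor;
a first determining module 502, configured to process one or more of the voice information, harmonic information, and transient sound information included in the first audio signal to determine the first noise signal in the first audio signal;
a second determining module 503, configured to adjust the second audio signal according to the first noise signal and the second audio signal to obtain the third audio signal, where the second audio signal is the original audio source of the playback device and the adjustment includes amplitude adjustment; and
a playback module 504, configured to play the third audio signal through the playback device.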
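The four modules of FIG. 5 can be sketched structurally as one class, assuming each stage is passed in as a callable so that any of the techniques described above can be plugged in. The class, field names, and `Spectrum` alias are hypothetical; the sketch only fixes the data flow of steps S301 to S304.

```python
from dataclasses import dataclass
from typing import Callable, List

Spectrum = List[float]


@dataclass
class AudioSignalProcessor:
    acquire: Callable[[], Spectrum]  # acquisition module 501: first audio signal
    estimate_noise: Callable[[Spectrum], Spectrum]  # first determining module 502: first noise signal
    adjust: Callable[[Spectrum, Spectrum], Spectrum]  # second determining module 503: third audio signal
    play: Callable[[Spectrum], None]  # playback module 504

    def step(self, source: Spectrum) -> Spectrum:
        """Run S301 to S304 once for one frame of the original audio source (second audio signal)."""
        first = self.acquire()
        noise = self.estimate_noise(first)
        third = self.adjust(noise, source)
        self.play(third)
        return third
```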
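According to the embodiments of the present application, the audio signal collected by the sound sensor is acquired, and one or more of the voice information, harmonic information, and transient sound information included in it are processed to determine the noise signal. The current noise level can thus be estimated accurately and the estimated noise is closer to the actual noise. The original audio source of the playback device is adjusted according to the noise signal and the original audio source, and the adjusted audio signal is played by the playback device, adapting the original audio source to the noise environment so that the adjusted audio signal masks the noise better and the user's listening experience improves. Moreover, the process uses no non-acoustic measurements and relies entirely on the audio signal collected by the sound sensor, avoiding dependence on non-acoustic state information, so it can be used in a variety of scenarios, is more flexible, and supports rapid deployment.
Optionally, the first determining module 502 may be configured to: when it is determined that the first audio signal includes voice information and/or harmonic information, determine the first noise signal to be the first noise signal of the previous frame.
According to the embodiments of the present application, using the first noise signal of the previous frame as the noise estimate yields the first noise signal directly without further removal processing, which reduces the workload of the adjustment process and saves cost.
Optionally, the first determining module 502 may be configured to process echo information, together with one or more of the voice information, harmonic information, and transient sound information included in the first audio signal, to determine the first noise signal in the first audio signal.
According to the embodiments of the present application, processing the echo information makes the processing of the first audio signal more targeted and covers a range of scenarios. The noise signal can thus be separated from the first audio signal more precisely and the noise estimate is stable, so the adjusted second audio signal masks the noise better and the user experience improves.
For example, the second determining module 503 may be configured to: determine a second noise signal according to the first noise signal and transfer information, where the second noise signal is an estimate of the noise signal perceived by the user; and adjust the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal.
According to the embodiments of the present application, processing the first noise signal yields a second noise signal that is closer to the noise the user actually perceives, so the resulting third audio signal masks the noise better and the user's listening experience improves.
The transfer information may include transfer information from the sound sensor to the user's ear, and/or transfer information within the ear.
According to the embodiments of the present application, the transmission path of the noise can be simulated more realistically, so that the determined second noise signal is closer to the noise the user actually perceives.
Optionally, adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal may include: determining a gain curve according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain curve to obtain the third audio signal.
According to the embodiments of the present application, the second audio signal is adjusted by using the gain curve to obtain the third audio signal, so that the third audio signal masks the noise and the user's listening experience is ensured.
Optionally, adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal may include: determining a gain value according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain value to obtain the third audio signal.
According to the embodiments of the present application, adjusting the second audio signal with a gain value instead of a gain curve keeps the resulting third audio signal free of an audible modulation effect, improving the user's listening experience.
Optionally, adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal may include: determining, according to the second audio signal and psychoacoustic information, a masking domain of the second audio signal for noise, where the masking domain indicates the volume threshold masked by the second audio signal at each frequency, and noise whose volume at a frequency is below the volume threshold is masked by the second audio signal; and adjusting the second audio signal according to the second noise signal and the masking domain to obtain the third audio signal.
According to the embodiments of the present application, determining the masking domain of the second audio signal for noise by using psychoacoustic information allows the adjustment to be more targeted, achieving better noise masking and ensuring the user's listening experience.
The first audio signal may include N collected current frames of the first audio signal, the second audio signal may include N current frames of the second audio signal to be adjusted, and the third audio signal may include N current frames of the third audio signal, where N is a positive integer.
According to the embodiments of the present application, because the number of audio signal frames used is not limited, the amount of computation in the adjustment process can be adjusted flexibly according to actual conditions, which facilitates deployment in different scenarios.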
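FIG. 6 is a structural diagram of an electronic device according to an embodiment of the present application. The electronic device may be a terminal, such as a vehicle or a head unit, or a chip built into the terminal, and can implement the steps of the audio signal processing method shown in FIG. 3 and FIG. 4 or the functions of the modules of the audio signal processing apparatus shown in FIG. 5. As shown in FIG. 6, the electronic device 600 includes a processor 601 and an interface circuit 602 coupled to the processor. It should be understood that although FIG. 6 shows only one processor and one interface circuit, the electronic device 600 may include other numbers of processors and interface circuits.
The interface circuit 602 is configured to communicate with other components of the terminal, such as a memory or another processor. The processor 601 is configured to exchange signals with other components through the interface circuit 602. The interface circuit 602 may be an input/output interface of the processor 601.
The processor 601 may be a processor in an in-vehicle device such as a head unit, or a processing apparatus sold separately.
For example, the processor 601 reads, through the interface circuit 602, computer programs or instructions in a memory coupled to it, and decodes and executes them. When the programs or instructions are decoded and executed by the processor 601, the electronic device 600 can implement the solutions in the audio signal processing method provided in the embodiments of the present application.
Optionally, the programs or instructions are stored in a memory outside the electronic device 600. When they are decoded and executed by the processor 601, part or all of their content is temporarily stored in the memory.
Optionally, the programs or instructions are stored in a memory inside the electronic device 600. When the memory inside the electronic device 600 stores the programs or instructions, the electronic device 600 may be disposed in the terminal in the embodiments of the present application.
Optionally, part of the content of the programs or instructions is stored in a memory outside the electronic device 600, and the other part is stored in a memory inside the electronic device 600.
FIG. 7 is a structural diagram of an electronic device according to an embodiment of the present application. The electronic device may be a terminal, such as a vehicle or a head unit, or a chip built into the terminal, and implements the steps of the audio signal processing method shown in FIG. 3 and FIG. 4 or the functions of the modules of the audio signal processing apparatus shown in FIG. 5. As shown in FIG. 7, the electronic device 700 includes a processor 701 and a memory 702 coupled to the processor. It should be understood that although FIG. 7 shows only one processor and one memory, the electronic device 700 may include other numbers of processors and memories.
The memory 702 is configured to store computer programs or computer instructions. When these computer programs or instructions are executed by the processor 701, the electronic device 700 can implement the steps of the audio signal processing method in the embodiments of the present application.
FIG. 8 is a structural diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 8, the electronic device 800 may be a terminal, such as a vehicle or a head unit, or a chip built into the terminal, and can implement the steps of the audio signal processing method shown in FIG. 3 and FIG. 4 or the functions of the modules of the audio signal processing apparatus shown in FIG. 5. The electronic device 800 includes at least one processor 1801, at least one memory 1802, and at least one communication interface 1803. The electronic device may further include general components such as an antenna, which are not described in detail here.
The following describes the components of the electronic device 800 in detail with reference to FIG. 8.
The processor 1801 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling program execution of the foregoing solutions. The processor 1801 may include one or more processing units; for example, it may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). The different processing units may be independent devices or may be integrated into one or more processors.
The communication interface 1803 is configured to communicate with other electronic devices or communication networks, such as Ethernet, a radio access network (RAN), a core network, or a wireless local area network (WLAN).
The memory 1802 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through a bus, or may be integrated with the processor.
The memory 1802 is configured to store application program code for executing the foregoing solutions, and execution is controlled by the processor 1801. The processor 1801 is configured to execute the application program code stored in the memory 1802.
As an example, with reference to the audio signal processing apparatus shown in FIG. 5, the acquisition module 501 in FIG. 5 may be implemented by the communication interface 1803 in FIG. 8, and the first determining module 502 and the second determining module 503 in FIG. 5 may be implemented by the processor 1801 in FIG. 8.
FIG. 9 is a structural diagram of an electronic device according to an embodiment of the present application. The electronic device may be the foregoing terminal, such as a vehicle or a head unit, or a chip built into the terminal, and can perform the audio signal processing method shown in FIG. 3 and FIG. 4 or implement the functions of the modules of the audio signal processing apparatus shown in FIG. 5. The electronic device 900 includes a sound sensor 901, a processing unit 902 coupled to the sound sensor 901, and a loudspeaker 903 coupled to the processing unit 902. It should be understood that although FIG. 9 shows only one sound sensor, one loudspeaker, and one processing unit, the electronic device 900 may include other numbers of sound sensors, loudspeakers, and processing units.
The sound sensor 901 may include a condenser microphone, a dynamic microphone, a laser microphone, and the like, and is configured to collect the first audio signal. The processing unit 902 may be configured to process the first audio signal to determine the noise signal in it, and to adjust the original audio source according to the noise signal and the original audio source to obtain the adjusted audio signal. The loudspeaker 903 may be configured to play the adjusted audio signal, so that the played audio masks the noise better and the user's listening experience improves.
It should be understood that the electronic device in the embodiments of the present application may be implemented by software, for example by the foregoing computer programs or instructions, which may be stored in a memory inside the terminal and read by the processor to implement the foregoing functions. Alternatively, the electronic device in the embodiments of the present application may be implemented by hardware, in which case the processing unit 902 is a processor.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part not described in detail in one embodiment, refer to the related descriptions of other embodiments.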
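A computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a static random-access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing.
The computer-readable program instructions or code described herein may be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device through a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for performing the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present application.
Various aspects of the present application are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to the embodiments of the present application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, so that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and cause a computer, a programmable data processing apparatus, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operation steps are performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, and the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings show possible architectures, functions, and operations of apparatuses, systems, methods, and computer program products according to multiple embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by hardware that performs the corresponding functions or actions (for example, a circuit or an ASIC (Application Specific Integrated Circuit)), or by a combination of hardware and software, such as firmware.
Although the present application is described herein with reference to the embodiments, in implementing the claimed application, a person skilled in the art can understand and implement other variations of the disclosed embodiments by studying the accompanying drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may implement several functions recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not mean that these measures cannot be combined to produce a good effect.
The embodiments of the present application have been described above. The foregoing description is exemplary rather than exhaustive and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to a person of ordinary skill in the art without departing from the scope of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies in the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.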
Claims (14)
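- An audio signal processing method, characterized in that the method comprises: acquiring a first audio signal collected by a sound sensor; processing one or more of voice information, harmonic information, and transient sound information comprised in the first audio signal to determine a first noise signal in the first audio signal; adjusting a second audio signal according to the first noise signal and the second audio signal to obtain a third audio signal, wherein the second audio signal is an original audio source of a playback device, and the adjustment comprises amplitude adjustment; and playing the third audio signal through the playback device.
- The method according to claim 1, characterized in that the adjusting a second audio signal according to the first noise signal and the second audio signal to obtain a third audio signal comprises: determining a second noise signal according to the first noise signal and transfer information, wherein the second noise signal is an estimated noise signal perceived by a user; and adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal.
- The method according to claim 2, characterized in that the transfer information comprises transfer information from the sound sensor to an ear of the user, and/or transfer information within the ear.
- The method according to claim 2 or 3, characterized in that the adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal comprises: determining a gain curve according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain curve to obtain the third audio signal.
- The method according to any one of claims 2 to 4, characterized in that the adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal comprises: determining a gain value according to the second noise signal and the second audio signal; and adjusting the second audio signal according to the gain value to obtain the third audio signal.
- The method according to any one of claims 2 to 5, characterized in that the adjusting the second audio signal according to the second noise signal and the second audio signal to obtain the third audio signal comprises: determining, according to the second audio signal and psychoacoustic information, a masking domain of the second audio signal for noise, wherein the masking domain indicates a volume threshold masked by the second audio signal at each frequency, and noise whose volume at a frequency is below the volume threshold is masked by the second audio signal; and adjusting the second audio signal according to the second noise signal and the masking domain to obtain the third audio signal.
- The method according to any one of claims 1 to 6, characterized in that the processing one or more of voice information, harmonic information, and transient sound information comprised in the first audio signal to determine a first noise signal in the first audio signal comprises: processing one or more of the voice information, the harmonic information, and the transient sound information comprised in the first audio signal, together with echo information, to determine the first noise signal in the first audio signal.
- The method according to any one of claims 1 to 7, characterized in that the processing one or more of voice information, harmonic information, and transient sound information comprised in the first audio signal to determine a first noise signal in the first audio signal comprises: when it is determined that the first audio signal comprises the voice information and/or the harmonic information, determining that the first noise signal is the first noise signal of a previous frame.
- The method according to any one of claims 1 to 8, characterized in that the first audio signal comprises N collected current frames of the first audio signal, the second audio signal comprises N current frames of the second audio signal to be adjusted, and the third audio signal comprises N current frames of the third audio signal, wherein N is a positive integer.
- An audio signal processing apparatus, characterized in that the apparatus comprises: an acquisition module, configured to acquire a first audio signal collected by a sound sensor; a first determining module, configured to process one or more of voice information, harmonic information, and transient sound information comprised in the first audio signal to determine a first noise signal in the first audio signal; a second determining module, configured to adjust a second audio signal according to the first noise signal and the second audio signal to obtain a third audio signal, wherein the second audio signal is an original audio source of a playback device, and the adjustment comprises amplitude adjustment; and a playback module, configured to play the third audio signal through the playback device.
- An audio signal processing apparatus, characterized by comprising a processor and a memory, wherein the memory is configured to store a program, and the processor is configured to execute the program stored in the memory, so that the apparatus implements the method according to any one of claims 1 to 9.
- A computer-readable storage medium storing program instructions, characterized in that when the program instructions are executed by a computer, the computer is caused to implement the method according to any one of claims 1 to 9.
- A computer program product comprising program instructions, characterized in that when the program instructions are executed by a computer, the computer is caused to implement the method according to any one of claims 1 to 9.
- A vehicle, characterized in that the vehicle comprises a processor, and the processor is configured to perform the method according to any one of claims 1 to 9.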
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
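PCT/CN2022/093274 WO2023220918A1 (zh) | 2022-05-17 | 2022-05-17 | Audio signal processing method and apparatus, storage medium, and vehicle |
CN202280005472.9A CN117425812A (zh) | 2022-05-17 | 2022-05-17 | Audio signal processing method and apparatus, storage medium, and vehicle |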
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/093274 WO2023220918A1 (zh) | 2022-05-17 | 2022-05-17 | Audio signal processing method and apparatus, storage medium, and vehicle |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023220918A1 (zh) | 2023-11-23 |
Family
ID=88834402
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/093274 WO2023220918A1 (zh) | 2022-05-17 | 2022-05-17 | Audio signal processing method and apparatus, storage medium, and vehicle |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117425812A (zh) |
WO (1) | WO2023220918A1 (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
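CN101989423A (zh) * | 2009-07-30 | 2011-03-23 | NXP B.V. | Active noise cancellation method using perceptual masking |
US20160192071A1 (en) * | 2013-12-06 | 2016-06-30 | JVC Kenwood Corporation | Acoustic device, acoustic processing method, and acoustic processing program |
CN112306448A (zh) * | 2020-01-15 | 2021-02-02 | Beijing ByteDance Network Technology Co., Ltd. | Method, apparatus, device, and medium for adjusting output audio according to ambient noise |
CN112954115A (zh) * | 2021-03-16 | 2021-06-11 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Volume adjustment method and apparatus, electronic device, and storage medium |
CN113160845A (zh) * | 2021-03-29 | 2021-07-23 | Nanjing University of Science and Technology | Speech enhancement algorithm based on speech presence probability and auditory masking effect |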
2022
- 2022-05-17: CN, CN202280005472.9A (CN117425812A), active, pending
- 2022-05-17: WO, PCT/CN2022/093274 (WO2023220918A1), active, application filing
Also Published As
Publication number | Publication date |
---|---|
CN117425812A (zh) | 2024-01-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | WWE | WIPO information: entry into national phase | Ref document number: 202280005472.9; Country of ref document: CN |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22941960; Country of ref document: EP; Kind code of ref document: A1 |