CN111863006A - Audio signal processing method, audio signal processing device and earphone

Info

Publication number
CN111863006A
Authority
CN
China
Prior art keywords
audio signal
frequency
frequency component
inner ear
power spectrum
Prior art date
Legal status
Pending
Application number
CN201910363609.8A
Other languages
Chinese (zh)
Inventor
吴超
肖甫
仇存收
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201910363609.8A
Publication of CN111863006A

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0232: Processing in the frequency domain
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165: Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal

Abstract

An audio signal processing method comprising: receiving a first inner ear audio signal and an external audio signal respectively; converting the first inner ear audio signal into a second inner ear audio signal; determining a first filter from the second inner ear audio signal; filtering the second inner ear audio signal with the first filter to obtain a filtered signal; and synthesizing a first output audio signal from the filtered signal and the external audio signal. The first filter can enhance the medium-high frequency speech of the inner ear audio signal. Because the medium-high frequency speech of the external audio signal is only slightly attenuated, the audio signal synthesized from the filtered signal and the external audio signal contains good medium-high frequency speech, so the speech quality at the transmitting end of the earphone can be improved. The application also provides an audio signal processing device.

Description

Audio signal processing method, audio signal processing device and earphone
Technical Field
The present application relates to the field of audio signal processing, and in particular, to an audio signal processing method, an audio signal processing apparatus, and an earphone.
Background
The development of voice communication technology and the wide adoption of communication devices have greatly facilitated people's daily lives. However, when a user carries out voice communication through a device such as an earphone, environmental noise degrades the quality of the speech signal and reduces communication comfort.
For the problem of environmental noise in earphone communication, an existing speech enhancement technique roughly works as follows: an inner ear microphone collects the sound in the ear canal, an external microphone collects the environmental noise, and the environmental noise collected by the external microphone is used as a reference signal to cancel the noise component in the ear-canal sound while retaining the speech component, thereby obtaining the transmitting-end signal of the earphone.
However, the inner ear microphone attenuates not only the medium-high frequency noise but also the medium-high frequency speech, so the generated transmitting-end signal sounds muffled and is severely distorted.
Disclosure of Invention
In view of the above, the present application provides an audio signal processing method, an audio signal processing apparatus and a headset. The audio signal processing method can improve the quality of medium-high frequency voice, thereby improving the voice quality in earphone communication.
A first aspect of the present application provides an audio signal processing method that is applicable to a headphone or an electronic device equipped with a headphone. The audio signal processing method includes: receiving a first inner ear audio signal and an external audio signal respectively; converting the first inner ear audio signal into a second inner ear audio signal; determining a first filter from the second inner ear audio signal; filtering the second inner ear audio signal by using the first filter to obtain a filtered signal; and synthesizing a first output audio signal according to the filtering signal and the external audio signal. Wherein harmonic components of the second inner ear audio signal are greater than harmonic components of the first inner ear audio signal. The first filter is a comb filter and a passband of the first filter includes harmonic frequencies of the second inner ear audio signal, a gain of the first filter within the passband being greater than 1.
In this implementation, the harmonic components of the second inner ear audio signal are greater than those of the first inner ear audio signal. Since the passband of the comb filter includes the harmonic frequencies of the second inner ear audio signal and the gain of the comb filter within the passband is greater than 1, the harmonic components of the second inner ear audio signal can be enhanced by the comb filter. The harmonic components include medium-high frequency speech components, so the medium-high frequency speech of the second inner ear audio signal is enhanced. Because the medium-high frequency speech of the external audio signal is only slightly attenuated, the audio signal synthesized from the filtered signal and the external audio signal has good medium-high frequency speech, so the speech quality at the transmitting end of the earphone can be improved.
In one embodiment, said synthesizing a first output audio signal from said filtered signal and said external audio signal comprises: a first frequency component is extracted from the filtered signal, a second frequency component is extracted from the external audio signal, and a first output audio signal is synthesized from the first frequency component and the second frequency component. Wherein the first frequency component does not include a high frequency component. The second frequency component does not include a low frequency component. Specifically, the first frequency component is a low-frequency component, and the second frequency component is a high-frequency component; or, the first frequency component is a low-frequency component, and the second frequency component includes a high-frequency component and a medium-frequency component; or, the first frequency component includes a low frequency component and a middle frequency component, and the second frequency component is a high frequency component.
The filtering signal may be represented by a time domain signal or a frequency domain signal, and when the filtering signal is represented by a frequency domain signal, the filtering signal may be represented by a combination of a low frequency component of the filtering signal, an intermediate frequency component of the filtering signal, and a high frequency component of the filtering signal. Likewise, the external audio signal may be represented as a combination of a low frequency component of the external audio signal, a middle frequency component of the external audio signal, and a high frequency component of the external audio signal.
With this implementation, since the first inner ear audio signal and the second inner ear audio signal have good low-frequency speech quality, and the comb filter enhances the low-frequency component of the second inner ear audio signal, the low-frequency speech quality of the filtered signal is good. Because the medium-high frequency speech of the external audio signal is only slightly attenuated, after the low-frequency component of the filtered signal is synthesized with the medium-high frequency components of the external audio signal, the synthesized first output audio signal combines the advantages of the filtered signal and the external audio signal and has good low-frequency speech as well as good medium-high frequency speech.
In another embodiment, said synthesizing a first output audio signal from said filtered signal and said external audio signal comprises: extracting a low frequency component and a third frequency component from the filtered signal, the third frequency component not including the low frequency component; extracting a fourth frequency component from the external audio signal, the fourth frequency component not including a low frequency component; performing weighting operation on the power spectrum of the third frequency component and the power spectrum of the fourth frequency component to obtain a target power spectrum; and synthesizing a first output audio signal according to the low-frequency component and the frequency component corresponding to the target power spectrum. By this, since the intermediate frequency component of the filtered signal is superior to the intermediate frequency component of the first inner ear audio signal, and the high frequency component of the filtered signal is superior to the high frequency component of the first inner ear audio signal, the degree of attenuation of both the intermediate frequency speech component and the high frequency speech component of the external audio signal is low, and thus the synthesized frequency component (i.e., the frequency component corresponding to the target power spectrum) includes a good intermediate and high frequency speech component. Also, the filtered signal has good low frequency speech components. In this way, the low-frequency component of the filtered signal is synthesized with the frequency component corresponding to the target power spectrum, and the synthesized first output audio signal has both good low-frequency speech components and good medium-high frequency speech components.
In another embodiment, after the synthesizing of the first output audio signal from the filtered signal and the external audio signal, the method further comprises: acquiring a first power spectrum corresponding to the first inner ear audio signal and a second power spectrum corresponding to the external audio signal, wherein the total number of frequency points of the first power spectrum and the total number of frequency points of the second power spectrum are both N; subtracting the power spectral density of the ith frequency point in the second power spectrum from the power spectral density of the ith frequency point in the first power spectrum to obtain the power spectral density difference of the ith frequency point, wherein i is a natural number not less than 1 and not more than N; comparing the power spectral density difference of the ith frequency point with a preset threshold value of the ith frequency point, determining the VAD value of voice activity detection of the ith frequency point according to the comparison result, and taking the VAD values of the N frequency points as a first group of VAD values; performing stationary noise estimation on the filtered signal to obtain a noise component of the filtered signal; acquiring a third power spectrum corresponding to the filtering signal and acquiring a fourth power spectrum corresponding to the noise component of the filtering signal, wherein the total number of frequency points in the third power spectrum and the total number of frequency points in the fourth power spectrum are both N; determining a second group of VAD values according to the third power spectrum and the fourth power spectrum, wherein the VAD value of the ith frequency point in the second group of VAD values is determined according to the power spectral density of the ith frequency point in the third power spectrum and the power spectral density of the ith frequency point in the fourth power spectrum; determining a third group of VAD values according to the first group of VAD values and the second group of VAD values, wherein the VAD value of the ith frequency point in the third group of VAD values is obtained by performing AND operation on the VAD value of the ith frequency point in the first group of VAD values and the VAD value of the ith frequency point in the second group of VAD values; filtering the external audio signal according to the first output audio signal and the third set of VAD values to obtain a noise component in the external audio signal; and filtering the first output audio signal according to the noise component in the external audio signal to obtain a second output audio signal.
Wherein the filtering the first output audio signal according to a noise component in the external audio signal comprises: filtering the first output audio signal by adaptive filtering, wiener filtering, or spectral subtraction according to a noise component in the external audio signal.
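For illustration, the per-frequency-bin voice activity detection described in this embodiment might be sketched as the following NumPy code. The exact rule for turning each comparison into a VAD value (difference above the preset threshold for the first group, and the filtered-signal PSD exceeding a multiple of its stationary-noise PSD for the second group) is an assumption; the embodiment only states that the VAD values are determined from the respective comparisons.

```python
import numpy as np

def three_group_vad(psd_inner, psd_external, thresholds,
                    psd_filtered, psd_filtered_noise, noise_margin=2.0):
    """Per-frequency-bin VAD as described in the embodiment above.

    psd_inner, psd_external          : power spectral densities (N bins) of the
                                       first inner ear and external audio signals
    thresholds                       : preset per-bin thresholds for the PSD difference
    psd_filtered, psd_filtered_noise : PSDs of the filtered signal and of its
                                       stationary-noise estimate
    noise_margin                     : assumed factor for the second comparison
    """
    # First group: PSD of the inner ear signal minus PSD of the external signal,
    # compared with the preset threshold of each frequency bin.
    vad1 = ((psd_inner - psd_external) > thresholds).astype(int)

    # Second group: filtered-signal PSD compared with its stationary-noise PSD
    # (an SNR-style comparison is assumed here).
    vad2 = (psd_filtered > noise_margin * psd_filtered_noise).astype(int)

    # Third group: bin-wise AND of the first and second groups.
    return vad1 & vad2
```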
In another embodiment, said converting said first inner ear audio signal into a second inner ear audio signal comprises: performing half-wave rectification on the first inner ear audio signal; and carrying out weighting processing on the audio signal obtained by half-wave rectification and the first inner ear audio signal to obtain a second inner ear audio signal.
In another embodiment, the determining a first filter from the second inner ear audio signal comprises: carrying out fundamental frequency estimation on the power spectrum corresponding to the second inner ear audio signal to obtain a fundamental frequency; determining a maximum peak point from non-isolated peak points in a spectrum of the second inner ear audio signal; and when the difference value between the frequency of the maximum peak point and the harmonic frequency is smaller than or equal to a preset difference value, determining a first filter according to the frequency of the maximum peak point and a gain factor, wherein the harmonic frequency is larger than the fundamental frequency and is an integral multiple of the fundamental frequency, and the value of the gain factor is larger than 1.
A second aspect of the present application provides an audio signal processing apparatus having functions to implement the method described in any one of the embodiments of the first aspect. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the above-described functions.
A third aspect of the present application provides an earphone comprising an inner ear microphone, an external microphone, and an audio processing unit; the inner ear microphone and the outer microphone are used for collecting audio signals; the audio processing unit is configured to enable the headset to perform the functions referred to in the above aspects, e.g. to transmit, receive or process audio signals and/or data referred to in the above-mentioned methods.
A fourth aspect of the present application provides an electronic device configured with a headset, comprising a headset, a processor, and a memory; the earphone comprises an inner ear microphone and an outer microphone; the inner ear microphone and the outer microphone are both used for collecting audio signals; the memory is used for storing program instructions and audio signals; the processor is configured to enable the electronic device to implement the functions referred to in the above aspects, e.g. to transmit, receive or process audio signals and/or data as referred to in the above methods. The memory may also be used to store other types of data that are received, transmitted, or processed by the processor.
A fifth aspect of the present application provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the first aspect or embodiments of the first aspect.
A sixth aspect of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or embodiments of the first aspect.
Drawings
Fig. 1 is a schematic view of a headset according to the present application;
fig. 2 is a schematic structural diagram of an earphone according to an embodiment of the present application;
fig. 3 is another schematic structural diagram of the earphone according to the embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device configured with a headset in an embodiment of the present application;
FIG. 5 is a flowchart illustrating an audio signal processing method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an audio signal processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an audio signal processing apparatus according to an embodiment of the present application.
Detailed Description
The audio signal processing method provided by the application can be applied to an in-ear headphone that includes two earpieces, each of which includes an inner ear microphone and an external microphone. The number of inner ear microphones may be one or more, and the number of external microphones may be one or more. The in-ear headphone is described in detail below:
Referring to fig. 1, the present application provides an earphone 10 including an inner ear microphone 101 and an outer microphone 102. When the user wears the headset, the inner microphone 101 is located in the ear canal and the outer microphone 102 is located outside the ear canal. When a user makes a sound, the inner ear microphone 101 may convert the collected sound signal into an inner ear audio signal, and the external microphone 102 may convert the collected sound signal into an external audio signal. The inner ear audio signal includes a speech component and a noise component, and the external audio signal also includes a speech component and a noise component. The signal-to-noise ratio of the inner ear audio signal is higher than the signal-to-noise ratio of the outer audio signal.
The earphone can effectively isolate the auditory canal from the external environment, and can reduce medium-high frequency sound entering the auditory canal. Thus, the inner ear microphone reduces the medium-high frequency noise and the medium-high frequency voice, and causes the medium-high frequency voice included in the inner ear audio signal to be seriously attenuated. The external microphone directly picks up the sound, resulting in no significant attenuation of the middle and high frequency components of the external audio signal, but more noise components than in the inner ear audio signal.
In the prior art, a noise component in an inner ear audio signal is removed according to a noise component of an external audio signal, so that a speech component in the inner ear audio signal is obtained and is used as a transmitting end signal. However, since the middle and high frequency voice components included in the inner ear audio signal are severely attenuated, a relatively severe voice distortion is caused.
In order to solve the above problem, the audio signal processing method provided by the present application can enhance the medium-high frequency speech signal in the inner ear audio signal, thereby improving the quality of the output speech. The following describes headphones and electronic equipment to which the audio signal processing method is applied:
referring to fig. 2, one embodiment of the headset 10 of the present application includes: an inner ear microphone 201, an external microphone 202, an audio processing unit 203; the audio processing unit 203 is respectively connected with the inner ear microphone 201 and the external microphone 202;
an inner ear microphone 201 for receiving a first inner ear audio signal;
an external microphone 202 for receiving an external audio signal;
the audio processing unit 203 includes:
a signal processor 2031 configured to convert the first inner ear audio signal received by the inner ear microphone 201 into a second inner ear audio signal, a harmonic component of the second inner ear audio signal being larger than a harmonic component of the first inner ear audio signal; determining a first filter from the second inner ear audio signal, a passband of the first filter including harmonic frequencies of the second inner ear audio signal, a gain of the first filter within the passband being greater than 1;
a first filter 2032, configured to filter the second inner-ear audio signal converted from the first inner-ear audio signal by the signal processor 2031 to obtain a filtered signal;
A synthesizer 2033 for synthesizing a first output audio signal from the filtered signal filtered by the first filter 2032 and the external audio signal received by the external microphone 202.
In an alternative embodiment of the method of the invention,
the synthesizer 2033 is specifically configured to extract a first frequency component from the filtered signal obtained by filtering with the first filter 2032, where the first frequency component does not include a high frequency component; extracting a second frequency component from the external audio signal received by the external microphone 202, the second frequency component not including a low frequency component; synthesizing a first output audio signal from the first frequency component and the second frequency component.
Specifically, the first frequency component is a low-frequency component, and the second frequency component is a high-frequency component; or,
the first frequency component is a low-frequency component, and the second frequency component comprises a high-frequency component and a middle-frequency component; or,
the first frequency component includes a low frequency component and a middle frequency component, and the second frequency component is a high frequency component.
In a further alternative embodiment of the method,
the synthesizer 2033 is specifically configured to extract a low-frequency component and a third frequency component from the filtered signal obtained by filtering with the first filter 2032, where the third frequency component does not include the low-frequency component;
Extracting a fourth frequency component from the external audio signal received by the external microphone 202, the fourth frequency component not including a low frequency component;
performing weighting operation on the power spectrum corresponding to the third frequency component and the power spectrum corresponding to the fourth frequency component to obtain a target power spectrum; and synthesizing a first output audio signal according to the low-frequency component and the frequency component corresponding to the target power spectrum.
Referring to fig. 3, in another alternative embodiment, the audio processing unit 203 further includes a voice activity detection subunit 2034, a second filter 2035, and a third filter 2036;
the voice activity detection subunit 2034 is configured to obtain a first power spectrum corresponding to the first inner ear audio signal, and obtain a second power spectrum corresponding to the external audio signal, where a total number of frequency points in the first power spectrum and a total number of frequency points in the second power spectrum are both N;
subtracting the power spectral density of the ith frequency point in the second power spectrum from the power spectral density of the ith frequency point in the first power spectrum to obtain the power spectral density difference of the ith frequency point, wherein i is a natural number which is not less than 1 and not more than N;
Comparing the power spectral density difference of the ith frequency point with a preset threshold of the ith frequency point, determining a Voice Activity Detection (VAD) value of the ith frequency point according to a comparison result, and taking the VAD values of the N frequency points as a first group of VAD values;
performing stationary noise estimation on the filtering signal to obtain a noise component in the filtering signal;
acquiring a third power spectrum corresponding to the filtering signal and a fourth power spectrum corresponding to a noise component in the filtering signal, wherein the total number of frequency points in the third power spectrum and the total number of frequency points in the fourth power spectrum are both N;
determining a second group of VAD values according to the third power spectrum and the fourth power spectrum, wherein the VAD value of the ith frequency point in the second group of VAD values is determined according to the power spectral density of the ith frequency point in the third power spectrum and the power spectral density of the ith frequency point in the fourth power spectrum;
determining a third group of VAD values according to the first group of VAD values and the second group of VAD values, wherein the VAD value of the ith frequency point in the third group of VAD values is obtained by performing AND operation on the VAD value of the ith frequency point in the first group of VAD values and the VAD value of the ith frequency point in the second group of VAD values;
A second filter 2035, configured to, after the synthesizer 2033 synthesizes a first output audio signal according to the filtered signal and an external audio signal, filter the external audio signal according to the first output audio signal synthesized by the synthesizer 2033 and the third set of VAD values determined by the voice activity detection subunit 2034, so as to obtain a noise component in the external audio signal;
a third filter 2036, configured to filter the first output audio signal synthesized by the synthesizer 2033 according to the noise component in the external audio signal filtered by the second filter 2035, so as to obtain a second output audio signal.
In another alternative embodiment, the third filter 2036 is specifically configured to filter the first output audio signal by adaptive filtering, wiener filtering, or spectral subtraction based on a noise component in the external audio signal.
In a further alternative embodiment of the method,
the signal processor 2031 is specifically configured to perform half-wave rectification on the first inner ear audio signal; and carrying out weighting processing on the audio signal obtained by half-wave rectification and the first inner ear audio signal to obtain a second inner ear audio signal.
In a further alternative embodiment of the method,
the signal processor 2031 is specifically configured to perform fundamental frequency estimation on the power spectrum corresponding to the second inner ear audio signal to obtain a fundamental frequency; determining a maximum peak point from non-isolated peak points in a spectrum of the second inner ear audio signal; and when the difference value between the frequency of the maximum peak point and the harmonic frequency is smaller than or equal to a preset difference value, determining a first filter according to the frequency of the maximum peak point and a gain factor, wherein the harmonic frequency is larger than the fundamental frequency and is an integral multiple of the fundamental frequency, and the value of the gain factor is larger than 1.
The above-described headphones are provided with the audio processing unit 203, and for headphones not provided with an audio processing unit, the headphones may be connected with an electronic device to implement the audio signal processing method in the present application. In the following, an electronic device with a headset will be described, and the electronic device may be any terminal device such as a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), a Point of Sales (POS), or a vehicle-mounted computer.
The following description will be made in detail by taking a mobile phone as an example. Referring to fig. 4, the mobile phone includes: an inner ear microphone 401, an external microphone 402, a speaker 403, an audio circuit 404, a wireless fidelity (WiFi) module 405, a Radio Frequency (RF) circuit 406, a power supply 407, a memory 408, an input unit 409, a display unit 410, a sensor 411, a processor 412, and the like. Those skilled in the art will appreciate that the handset configuration shown in fig. 4 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 4:
The inner ear microphone 401, the external microphone 402, the speaker 403, and the audio circuit 404 may provide an audio interface between the user and the mobile phone. On the one hand, the audio circuit 404 may transmit the electrical signal converted from the received audio data to the speaker 403, which converts the electrical signal into a sound signal for output; on the other hand, the inner ear microphone 401 and the external microphone 402 convert the collected sound signals into electrical signals, which are received by the audio circuit 404 and converted into audio data. The audio data is then output to the processor 412 for processing and sent via the RF circuit 406 to, for example, another mobile phone, or output to the memory 408 for further processing.
The processor 412 is the control center of the mobile phone; it connects the various parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 408 and calling the data stored in the memory 408, thereby monitoring the mobile phone as a whole. Optionally, the processor 412 may include one or more processing units. Optionally, the processor 412 may integrate an application processor, which mainly handles the operating system, user interface, applications, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 412.
WiFi is a short-range wireless transmission technology. Through the WiFi module 405, the mobile phone can help the user receive and send e-mails, browse web pages, access streaming media, and the like, and it provides the user with wireless broadband Internet access. Although fig. 4 shows the WiFi module 405, it will be appreciated that it is not an essential component of the mobile phone and may be omitted as needed without changing the essence of the invention.
The RF circuit 406 may be used for receiving and transmitting signals during message transmission or a call. In particular, downlink information received from a base station is delivered to the processor 412 for processing, and uplink data is transmitted to the base station. In general, the RF circuit 406 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 406 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
The handset also includes a power supply 407 (e.g., a battery) for powering the various components, which may be logically coupled to the processor 412 via a power management system to manage charging, discharging, and power consumption via the power management system.
The memory 408 may be used to store software programs and modules, and the processor 412 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 408. The memory 408 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data, a phonebook, etc.), and the like. Further, the memory 408 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input unit 409 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the cellular phone. Specifically, the input unit 409 may include a touch panel 4091 and other input devices 4092. The touch panel 4091, also referred to as a touch screen, may collect touch operations by a user (e.g., operations by a user on or near the touch panel 4091 using any suitable object or attachment such as a finger, a stylus, etc.) and drive the corresponding connection device according to a preset program. Alternatively, the touch panel 4091 may include two portions, a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 412, and can receive and execute commands sent by the processor 412. In addition, the touch panel 4091 may be implemented by various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. The input unit 409 may include other input devices 4092 in addition to the touch panel 4091. In particular, other input devices 4092 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 410 may be used to display information input by or provided to the user and various menus of the cellular phone. The display unit 410 may include a display panel 4101, and optionally, the display panel 4101 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel 4091 may overlay the display panel 4101, and when the touch panel 4091 detects a touch operation thereon or nearby, communicate to the processor 412 to determine the type of touch event, and the processor 412 then provides a corresponding visual output on the display panel 4101 according to the type of touch event. Although in fig. 4, the touch panel 4091 and the display panel 4101 are two separate components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 4091 and the display panel 4101 may be integrated to implement the input and output functions of the mobile phone.
The handset may also include at least one sensor 411, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 4101 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 4101 and/or backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.
It is noted that the processor 412 may invoke program instructions stored in the memory 408 for performing the audio signal processing methods of the present application.
Based on the above-described headphones or electronic devices, the audio signal processing method of the present application is described below. Referring to fig. 5, an embodiment of an audio signal processing method of the present application includes:
step 501, receiving a first inner ear audio signal and an external audio signal respectively.
In this embodiment, for a sound to be processed, the inner ear microphone collects the sound to be processed to obtain a first inner ear audio signal, and the external microphone collects the sound to be processed to obtain an external audio signal. The sound to be processed may be a voice signal uttered by a user. For example, after a user wears the earphone and performs voice interaction using an electronic device connected to the earphone, when the user speaks, the inner ear microphone collects the user's voice signal, ambient noise, and the like. The audio signal collected by the inner ear microphone is the first inner ear audio signal, and the audio signal collected by the external microphone is the external audio signal.
Step 502, converting the first inner ear audio signal into a second inner ear audio signal.
In the time domain, the first inner ear audio signal may be denoted as S(t).
In an alternative embodiment, the first inner ear audio signal is half-wave rectified; and carrying out weighting processing on the audio signal obtained by half-wave rectification and the first inner ear audio signal to obtain a second inner ear audio signal.
The relationship between the half-wave rectified audio signal S'(t) and the first inner ear audio signal S(t) can be expressed by the following formula:
S'(t) = max(S(t), δ);
where δ is the threshold with which S(t) is compared, and δ is greater than or equal to 0. When δ is greater than 0, δ is a non-negative number close to 0, and its specific value can be set according to the practical application.
max(S(t), δ) means that the larger of S(t) and δ is selected: when S(t) > δ, max(S(t), δ) = S(t); when S(t) < δ, max(S(t), δ) = δ; when S(t) = δ, max(S(t), δ) = S(t) = δ.
It should be noted that, in the time domain, the first inner ear audio signal includes portions with positive amplitude and portions with negative amplitude. When δ = 0, half-wave rectification of the first inner ear audio signal sets the negative values in the first inner ear audio signal to 0, so the half-wave rectified audio signal S'(t) consists of the positive-amplitude portions and the zero-amplitude portions. When δ is a non-negative number close to 0, the half-wave rectified audio signal S'(t) consists of the portions whose amplitude is greater than or equal to δ.
After the half-wave rectified audio signal S'(t) is obtained, the half-wave rectified audio signal S'(t) and the first inner ear audio signal S(t) are weighted to obtain the second inner ear audio signal.
Optionally, in the frequency domain, the power spectrum S_N corresponding to the second inner ear audio signal, the instantaneous power spectrum S of the first inner ear audio signal S(t), and the power spectrum S_h of the half-wave rectified audio signal S'(t) satisfy the following formula:
|S_N|^2 = α_1·|S|^2 + (1 - α_1)·|S_h|^2
where α_1 is a smoothing factor whose value may be any real number in [0.8, 0.99].
S_h satisfies the following formula: S_h = FFT(max(S(t), δ)), where FFT(·) denotes the Fourier transform.
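For illustration, this conversion step can be sketched with NumPy as follows. The frame-based processing, the threshold value δ = 1e-6, and the smoothing factor α_1 = 0.9 are assumptions chosen for the example; the application only requires δ ≥ 0 and α_1 in [0.8, 0.99].

```python
import numpy as np

def convert_inner_ear_signal(s_frame, delta=1e-6, alpha1=0.9):
    """Convert one frame of the first inner ear audio signal into the power
    spectrum |S_N|^2 of the second inner ear audio signal.

    s_frame : 1-D time-domain frame of the first inner ear audio signal S(t)
    delta   : half-wave rectification threshold, a non-negative number close to 0
    alpha1  : smoothing factor, any real number in [0.8, 0.99]
    """
    # Half-wave rectification: S'(t) = max(S(t), delta)
    s_rect = np.maximum(s_frame, delta)

    # Spectra of the original frame and of the rectified frame
    S = np.fft.rfft(s_frame)       # instantaneous spectrum of S(t)
    S_h = np.fft.rfft(s_rect)      # spectrum of the half-wave rectified signal

    # Weighted combination: |S_N|^2 = alpha1*|S|^2 + (1 - alpha1)*|S_h|^2
    return alpha1 * np.abs(S) ** 2 + (1.0 - alpha1) * np.abs(S_h) ** 2
```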
Step 503, determining a first filter according to the second inner ear audio signal.
Wherein the first filter is a comb filter and the passband of the first filter includes the harmonic frequencies of the second inner ear audio signal, the gain of the first filter within the passband being greater than 1.
And step 504, filtering the second inner ear audio signal by using the first filter to obtain a filtered signal.
Since the gain of the first filter within the pass band is greater than 1, the first filter can enhance the signal components of the pass band. Since the pass band includes harmonic frequencies, the first filter may enhance harmonic components in the inner ear audio signal, thereby increasing middle and high frequency sound components of the inner ear audio signal.
And 505, synthesizing a first output audio signal according to the filtering signal and the external audio signal.
In this embodiment, since the passband of the comb filter includes harmonic frequencies and the gain of the comb filter within the passband is greater than 1, the harmonic components of the second inner ear audio signal can be enhanced by the comb filter. The harmonic components include medium-high frequency speech components, so the medium-high frequency speech of the second inner ear audio signal is enhanced. Because the medium-high frequency speech of the external audio signal is only slightly attenuated, the first output audio signal synthesized from the filtered signal and the external audio signal has good medium-high frequency speech, so the speech quality at the transmitting end of the earphone can be improved.
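To show how steps 501 to 505 fit together, a single-frame, frequency-domain sketch is given below. The 16 kHz sampling rate, the 2000 Hz band split, and the unity placeholder comb gain are assumptions for the example; the actual comb-filter gain and the band synthesis are determined as described in the embodiments that follow.

```python
import numpy as np

def process_frame(inner_frame, external_frame, fs=16000):
    """Single-frame sketch of steps 501-505 in the frequency domain.

    inner_frame    : time-domain frame of the first inner ear audio signal
    external_frame : time-domain frame of the external audio signal
    fs             : sampling rate in Hz (16 kHz assumed for the example)
    """
    # Step 502: convert the first inner ear signal into the second one
    # (half-wave rectification followed by power-spectrum weighting).
    alpha1, delta = 0.9, 1e-6
    rectified = np.maximum(inner_frame, delta)
    S = np.fft.rfft(inner_frame)
    S_h = np.fft.rfft(rectified)
    power_second = alpha1 * np.abs(S) ** 2 + (1 - alpha1) * np.abs(S_h) ** 2

    # Steps 503-504: determine the first (comb) filter from the second inner
    # ear signal and filter with it. A unity gain is used here as a placeholder
    # for the harmonic-dependent comb gain described below.
    comb_gain = np.ones_like(power_second)
    filtered_power = comb_gain * power_second

    # Step 505: synthesize the first output audio signal, e.g. low band from
    # the filtered signal and mid/high band from the external audio signal.
    external_power = np.abs(np.fft.rfft(external_frame)) ** 2
    freqs = np.fft.rfftfreq(len(inner_frame), d=1.0 / fs)
    output_power = np.where(freqs <= 2000.0, filtered_power, external_power)
    return output_power
```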
In an alternative embodiment, step 503 comprises: carrying out fundamental frequency estimation on the power spectrum corresponding to the second inner ear audio signal to obtain a fundamental frequency; determining a maximum peak point from non-isolated peak points in a spectrum of the second inner ear audio signal; and when the difference value between the frequency of the maximum peak point and the harmonic frequency is smaller than or equal to a preset difference value, determining a first filter according to the frequency of the maximum peak point and a gain factor, wherein the harmonic frequency is larger than the fundamental frequency and is an integral multiple of the fundamental frequency, and the value of the gain factor is larger than 1.
Specifically, estimating the fundamental frequency of the power spectrum corresponding to the second inner ear audio signal includes: determining the cepstrum Cepstrum according to the power spectrum corresponding to the second inner ear audio signal, and determining the fundamental frequency f_0 according to the cepstrum.
The power spectrum S_N corresponding to the second inner ear audio signal, the cepstrum Cepstrum, and the fundamental frequency f_0 satisfy the following formulas:
Cepstrum = IFFT{log|S_N|^2};
[value_pitch, pos_pitch] = max(Cepstrum(cepsMin : cepsMax));
f_0 = sample_rate / (pos_pitch + cepsMin).
IFFT{ } denotes the inverse Fourier transform.
log|S_N|^2 denotes the logarithm of |S_N|^2.
cepsMin denotes the lower bound of the interval in which the maximum of the cepstrum is searched, and cepsMax denotes the upper bound of that interval. For example, at a sampling frequency of 16 kHz (kilohertz), cepsMin takes the value 32 and cepsMax takes the value 228.
value_pitch denotes the maximum value of the cepstrum.
pos_pitch denotes the position index of that maximum in the cepstrum.
sample_rate denotes the sampling frequency, e.g. 16 kHz. It will be appreciated that the sampling frequency is not limited to the above example.
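As an illustration, the cepstrum-based estimation above might be implemented as the following NumPy sketch. The small constant added before the logarithm and the default parameter values are assumptions, chosen for numerical robustness and for the 16 kHz example.

```python
import numpy as np

def estimate_fundamental_frequency(power_spectrum, sample_rate=16000,
                                   ceps_min=32, ceps_max=228):
    """Estimate the fundamental frequency f_0 from |S_N|^2 via the cepstrum.

    power_spectrum : full-length power spectrum |S_N|^2 of one frame
    sample_rate    : sampling frequency in Hz (16 kHz example)
    ceps_min/max   : search interval for the cepstral maximum (32 and 228 are
                     the example values for 16 kHz)
    """
    # Cepstrum = IFFT{ log|S_N|^2 }; a small constant guards against log(0).
    cepstrum = np.fft.ifft(np.log(power_spectrum + 1e-12)).real

    # [value_pitch, pos_pitch] = max(Cepstrum(cepsMin : cepsMax))
    search = cepstrum[ceps_min:ceps_max]
    pos_pitch = int(np.argmax(search))   # position index of the maximum
    value_pitch = search[pos_pitch]      # maximum value of the cepstrum

    # f_0 = sample_rate / (pos_pitch + cepsMin)
    f0 = sample_rate / (pos_pitch + ceps_min)
    return f0, value_pitch
```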
A maximum peak point is determined from the non-isolated peak points in the spectrum of the second inner ear audio signal; when the difference between the frequency of the maximum peak point and a harmonic frequency is less than or equal to a preset difference, a first filter is determined according to the frequency of the maximum peak point and the gain factor. Alternatively, a maximum peak point is determined from the non-isolated peak points in the power spectrum of the second inner ear audio signal, and when the difference between the frequency of the maximum peak point and a harmonic frequency is less than or equal to the preset difference, the first filter is determined according to the frequency of the maximum peak point and the gain factor. A non-isolated peak point is a peak in a frequency band in which several peaks occur consecutively. It will be understood that a harmonic frequency is the frequency of a harmonic component. The difference between the frequency of the maximum peak point and the harmonic frequency may be taken as an absolute value.
When the difference between the frequency of the maximum peak point and the harmonic frequency is less than or equal to the preset difference, the frequency of the maximum peak point is very close to the harmonic frequency, so the filter determined according to the frequency of the maximum peak point passes the harmonic component with low attenuation. When the difference between the frequency of the maximum peak point and the harmonic frequency is equal to 0, the frequency of the maximum peak point is the harmonic frequency, and the filter applies the maximum gain to the harmonic component.
When the difference between the frequency of the maximum peak point and the harmonic frequency is greater than the preset difference, the frequency of the maximum peak point is far from the harmonic frequency, so the filter determined according to the frequency of the maximum peak point does not pass the harmonic component or cannot effectively amplify it.
The first filter H(f_k) is determined from the frequency f_c of the maximum peak point and the gain factor as follows:
when f_k ∈ [f_c - f_0/2, f_c + f_0/2], H(f_k) is given by a passband gain expression involving the gain factor A_1 and an amplitude adjustment parameter (the expression appears as formula images in the original document);
when f_k ∉ [f_c - f_0/2, f_c + f_0/2], H(f_k) = A_2.
[f_c - f_0/2, f_c + f_0/2] denotes the frequency band centered at f_c with a bandwidth of f_0.
f_k is a variable; f_k may be any frequency in [f_c - f_0/2, f_c + f_0/2].
When the difference between the frequency f_c of the maximum peak point and the harmonic frequency is less than or equal to the preset difference, [f_c - f_0/2, f_c + f_0/2] belongs to the passband of the comb filter. It will be appreciated that one or more maximum peak points may satisfy the above condition, so the first filter may enhance one or more harmonic components.
When the difference between the frequency f_c of the maximum peak point and the harmonic frequency is greater than the preset difference, [f_c - f_0/2, f_c + f_0/2] does not belong to the passband of the comb filter, and the first filter does not enhance the harmonic components of that band.
A_1 denotes the gain factor of the comb filter in [f_c - f_0/2, f_c + f_0/2]. A_1 > 1 represents a gain; it will be appreciated that the gain may also be expressed as A_1 > 0 dB (decibels).
A_2 denotes the gain factor of the comb filter in frequency bands that do not belong to the passband, with 1 > A_2 > 0. The values of A_1 and A_2 can be set according to the actual conditions.
The amplitude adjustment parameter in the passband gain expression can likewise be set according to the actual conditions.
The harmonic frequency is greater than the fundamental frequency f_0 and is an integer multiple of the fundamental frequency f_0.
The above provides a specific method of determining a comb filter from the second inner-ear audio signal. The comb filter is capable of enhancing harmonic components of the second inner ear audio signal, i.e. of enhancing medium and high frequency components of the second inner ear audio signal.
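A minimal sketch of building such a comb-filter gain curve is given below. It assumes a flat passband gain A_1 over each band [f_c - f_0/2, f_c + f_0/2]; the amplitude-shaped passband expression of the application (given only as a formula image) is not reproduced, and the default values a1 = 2.0 and a2 = 0.5 are illustrative.

```python
import numpy as np

def build_comb_filter_gain(freqs, peak_freqs, f0, a1=2.0, a2=0.5):
    """Build the gain curve H(f_k) of the first (comb) filter.

    freqs      : frequency axis f_k in Hz
    peak_freqs : frequencies f_c of the maximum peak points whose distance to a
                 harmonic frequency is within the preset difference
    f0         : estimated fundamental frequency in Hz
    a1         : passband gain factor A_1 (> 1)
    a2         : gain factor A_2 outside the passband (0 < A_2 < 1)
    """
    # Outside every passband the gain is A_2.
    gain = np.full(freqs.shape, a2, dtype=float)
    for fc in peak_freqs:
        # Passband centered at f_c with bandwidth f_0; a flat gain A_1 is used
        # here instead of the amplitude-shaped expression of the application.
        in_band = (freqs >= fc - f0 / 2.0) & (freqs <= fc + f0 / 2.0)
        gain[in_band] = a1
    return gain
```

The filtered signal of step 504 is then obtained by multiplying the spectrum of the second inner ear audio signal by this gain curve.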
In another alternative embodiment, step 505 comprises: extracting a first frequency component from the filtered signal, the first frequency component not including a high frequency component; extracting a second frequency component from the external audio signal, the second frequency component not including a low frequency component; synthesizing a first output audio signal from the first frequency component and the second frequency component.
Specifically, the first frequency component is a low-frequency component, and the second frequency component is a high-frequency component; or, the first frequency component is a low-frequency component, and the second frequency component includes a high-frequency component and a medium-frequency component; or, the first frequency component includes a low frequency component and a middle frequency component, and the second frequency component is a high frequency component.
In this embodiment, the filtered signal may be represented as a time domain signal or a frequency domain signal. When the filtered signal is represented as a frequency domain signal, it may be represented as a combination of the low frequency component of the filtered signal, the intermediate frequency component of the filtered signal, and the high frequency component of the filtered signal. Likewise, the external audio signal may be represented as a combination of the low frequency component of the external audio signal, the intermediate frequency component of the external audio signal, and the high frequency component of the external audio signal. The low frequency, the intermediate frequency, and the high frequency may be divided as follows: the low frequency is the band [0, 2000 Hz], the intermediate frequency is the band (2000 Hz, 4000 Hz], and the high frequency is the band (4000 Hz, 8000 Hz].
Both the first inner ear audio signal and the second inner ear audio signal have a good low frequency speech quality. And the comb filter filtering can enhance the low frequency component of the second inner ear audio signal, so that the quality of the low frequency speech of the filtered signal is good. In addition, the attenuation degree of the middle and high frequency voice of the external audio signal is very low, and after the low frequency component of the filtering signal and the middle and high frequency component in the external audio signal are synthesized, the synthesized first output audio signal integrates the advantages of the filtering signal and the external audio signal, and has good low frequency voice and middle and high frequency voice. The first output audio signal has a better medium-high frequency speech quality than the first inner ear audio signal. The low frequency speech quality of the first output audio signal is better compared to the external audio signal.
Moreover, the noise component of the first output audio signal is lower than the noise component of the external audio signal, so a dual-microphone noise reduction effect is achieved. The first output audio signal may be used for voiceprint recognition or voice wake-up. The audio signal processing method can therefore improve the wake-up rate in a noisy environment.
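A minimal frequency-domain sketch of this band synthesis is given below, assuming the example band split above; the 2000 Hz boundary between the low band and the mid/high band is the illustrative value from the preceding paragraphs.

```python
import numpy as np

def synthesize_first_output(filtered_spec, external_spec, freqs, low_edge=2000.0):
    """Synthesize the first output audio signal from the low band of the
    filtered signal and the mid/high band of the external audio signal.

    filtered_spec : complex spectrum of one frame of the filtered signal
    external_spec : complex spectrum of the same frame of the external signal
    freqs         : frequency axis in Hz (same length as the spectra)
    low_edge      : boundary between the low band and the mid/high band
    """
    # First frequency component (low band) from the filtered signal,
    # second frequency component (mid/high band) from the external signal.
    output_spec = np.where(freqs <= low_edge, filtered_spec, external_spec)
    # Return to the time domain.
    return np.fft.irfft(output_spec)
```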
In another alternative embodiment, step 505 comprises: extracting a low frequency component and a third frequency component from the filtered signal, the third frequency component not including the low frequency component; extracting a fourth frequency component from the external audio signal, the fourth frequency component not including a low frequency component; performing weighting operation on the power spectrum of the third frequency component and the power spectrum of the fourth frequency component to obtain a target power spectrum; and synthesizing a first output audio signal according to the low-frequency component and the frequency component corresponding to the target power spectrum.
In this embodiment, the target power spectrum S'', the power spectrum |S_f| of the third frequency component, and the power spectrum |S_o| of the fourth frequency component satisfy the following formula: S'' = α_2·|S_f| + (1 - α_2)·|S_o|.
The spectrum corresponding to the target power spectrum S'' satisfies the following formula: S_1 = (α_2·|S_f| + (1 - α_2)·|S_o|)·e^(jθ).
S_1 denotes the spectrum corresponding to the target power spectrum; |S_f| denotes the power spectrum of the third frequency component in the filtered signal; |S_o| denotes the power spectrum of the fourth frequency component in the external audio signal.
α_2 denotes the weighting factor of |S_f|; α_2 is any real number in [0, 1].
(1 - α_2) denotes the weighting factor of |S_o|. θ denotes the phase of the first inner ear audio signal or the phase of the external audio signal.
Specifically, the third frequency component may include an intermediate frequency component and a high frequency component, and the fourth frequency component includes an intermediate frequency component and a high frequency component. Alternatively, the third frequency component may include a high frequency component, and the fourth frequency component includes a high frequency component. Alternatively, the third frequency component may include an intermediate frequency component and the fourth frequency component includes an intermediate frequency component.
As can be seen from the fact that the third frequency component does not include a low frequency component and the fourth frequency component does not include a low frequency component, the frequency component corresponding to the target power spectrum obtained by performing weighting operation on the power spectrum corresponding to the third frequency component and the power spectrum corresponding to the fourth frequency component may be an intermediate frequency component or a high frequency component, or a combination of the intermediate frequency component and the high frequency component.
Since the intermediate-frequency speech component of the filtered signal is better than that of the first inner ear audio signal, the high-frequency speech component of the filtered signal is better than that of the first inner ear audio signal, and both the intermediate-frequency and high-frequency speech components of the external audio signal are only slightly attenuated, the synthesized frequency component (i.e., the frequency component corresponding to the target power spectrum) contains good mid- and high-frequency speech. The low-frequency speech quality of the filtered signal is also good, so after the low-frequency component of the filtered signal is synthesized with the frequency component corresponding to the target power spectrum, the first output audio signal has both a good low-frequency speech component and good mid- and high-frequency speech components.
Moreover, the noise component of the first output audio signal is lower than that of the external audio signal, so a dual-microphone noise reduction effect is achieved. The first output audio signal may be used for voiceprint recognition or voice wake-up, and the audio signal processing method can therefore improve the wake-up rate in noisy environments.
Suppose a user A wearing the earphone speaks while a nearby user B also speaks. In the audio signal collected by the in-ear microphone, the voice quality of user A is better than that of user B, whereas in the audio signal collected by the external microphone, the voice quality of user B is better than that of user A. The audio signal processing method of the present application can suppress the voice of user B, as described in detail in the following embodiments.
In another optional embodiment, after step 505, the method further comprises:
acquiring a first power spectrum corresponding to the first inner ear audio signal and a second power spectrum corresponding to the external audio signal, wherein the total number of frequency points of the first power spectrum and the total number of frequency points of the second power spectrum are both N; subtracting the power spectral density of the ith frequency point in the second power spectrum from the power spectral density of the ith frequency point in the first power spectrum to obtain the power spectral density difference of the ith frequency point, wherein i is a natural number not less than 1 and not more than N; comparing the power spectral density difference of the ith frequency point with a preset threshold value of the ith frequency point, determining the VAD value of voice activity detection of the ith frequency point according to the comparison result, and taking the VAD values of the N frequency points as a first group of VAD values;
performing stationary noise estimation on the filtered signal to obtain a noise component of the filtered signal; acquiring a third power spectrum corresponding to the filtering signal and acquiring a fourth power spectrum corresponding to the noise component of the filtering signal, wherein the total number of frequency points in the third power spectrum and the total number of frequency points in the fourth power spectrum are both N; determining a second group of VAD values according to the third power spectrum and the fourth power spectrum, wherein the VAD value of the ith frequency point in the second group of VAD values is determined according to the power spectral density of the ith frequency point in the third power spectrum and the power spectral density of the ith frequency point in the fourth power spectrum;
determining a third group of VAD values according to the first group of VAD values and the second group of VAD values, wherein the VAD value of the ith frequency point in the third group of VAD values is obtained by performing AND operation on the VAD value of the ith frequency point in the first group of VAD values and the VAD value of the ith frequency point in the second group of VAD values;
filtering the external audio signal according to the first output audio signal and the third set of VAD values to obtain a noise component in the external audio signal;
and filtering the first output audio signal according to the noise component in the external audio signal to obtain a second output audio signal.
In this embodiment, the first group of VAD values may be determined according to the power spectrum of the first inner ear audio signal (i.e., the first power spectrum), the power spectrum of the external audio signal (i.e., the second power spectrum), and the preset thresholds of the N frequency points. For the first group of VAD values: if the power spectral density of the ith frequency point in the first power spectrum is greater than or equal to the sum of the power spectral density of the ith frequency point in the second power spectrum and the preset threshold of the ith frequency point, the VAD value of the ith frequency point is determined to be 1; if it is smaller than that sum, the VAD value of the ith frequency point is determined to be 0. It should be noted that the preset thresholds of the N frequency points may all be the same, or each frequency point may have its own threshold, in which case the thresholds of different frequency points may differ. The preset thresholds can be set according to actual conditions. i is a variable whose value can be any natural number from 1 to N.
With this implementation, if the power spectral density of the ith frequency point in the first power spectrum is greater than or equal to the sum of the power spectral density of the ith frequency point in the second power spectrum and the preset threshold of the ith frequency point, the speech component in the first inner ear audio signal greatly exceeds the speech component in the external audio signal at that frequency point, and the speech at the ith frequency point is very likely the voice of the user wearing the earphone. Otherwise, it cannot be determined whether the speech at the ith frequency point comes from the user wearing the earphone or from another person.
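A minimal sketch of this first voice activity detection, assuming the power spectral densities of both signals are already available as length-N arrays and that the per-bin thresholds are chosen freely:

```python
import numpy as np

def first_vad(psd_inner, psd_external, thresholds):
    """First group of VAD values: 1 where the inner-ear PSD exceeds the
    external PSD by at least the per-bin threshold, else 0. All inputs are
    length-N arrays; the threshold values are assumptions."""
    psd_inner = np.asarray(psd_inner, dtype=float)
    psd_external = np.asarray(psd_external, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    return (psd_inner - psd_external >= thresholds).astype(int)

# Toy usage with simple periodograms of one frame.
fs, n = 16000, 512
t = np.arange(n) / fs
inner = np.sin(2 * np.pi * 200 * t)
outer = 0.1 * np.random.randn(n)
psd_in = np.abs(np.fft.rfft(inner)) ** 2 / n
psd_out = np.abs(np.fft.rfft(outer)) ** 2 / n
vad1 = first_vad(psd_in, psd_out, thresholds=np.full(len(psd_in), 1e-3))
```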
Stationary noise estimation is performed on the filtered signal to obtain the noise component of the filtered signal; this noise component is also referred to as the background noise of the filtered signal. Specifically, the power spectrum SN2 of the filtered signal can be calculated from the power spectrum SN of the second inner ear audio signal.
SN and SN2 satisfy the following formula: |SN2|² = |SN|²·(β + H).
β represents an array of adjustable parameters. The value of each adjustable parameter in β is a small positive number, for example 0.5; the value of β is not limited to this example.
H is the array formed by the values of H(fk) at all frequency points. β and H contain the same number of values.
From the power spectrum of the filtered signal (i.e., the third power spectrum) and the power spectrum of the noise component of the filtered signal (i.e., the fourth power spectrum), a second group of VAD values may be determined. If the power spectral density of the ith frequency point in the third power spectrum is greater than m times the power spectral density of the ith frequency point in the fourth power spectrum, the signal at the ith frequency point may be a speech signal, and the VAD value of the ith frequency point is determined to be 1. If it is less than or equal to m times that value, the signal at the ith frequency point is not a speech signal, and the VAD value of the ith frequency point is determined to be 0. m is a real number greater than 1, such as 3, 4, 5 or 6; its value is not limited to these examples.
The VAD value of the ith frequency point in the third group of VAD values is obtained by performing an AND operation on the VAD value of the ith frequency point in the first group of VAD values and the VAD value of the ith frequency point in the second group of VAD values. That is, the VAD value of the ith frequency point in the third group equals 1 only when the corresponding VAD values in both the first group and the second group are 1; when at least one of them is 0, the VAD value of the ith frequency point in the third group equals 0.
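The sketch below illustrates, under stated assumptions, how the noise power spectrum of the filtered signal could be derived from the formula |SN2|² = |SN|²·(β + H), how the second group of VAD values could be computed with the factor m, and how the third group follows from a bin-wise AND. The values of β and m and the reading of H as the comb filter's power response are assumptions for illustration.

```python
import numpy as np

def noise_psd_of_filtered(psd_inner2, comb_power_response, beta=0.5):
    """Illustrative reading of |SN2|^2 = |SN|^2 * (beta + H): scale the PSD
    associated with the second inner ear signal by the comb filter's power
    response plus a small floor beta. beta and this reading are assumptions."""
    return np.asarray(psd_inner2, dtype=float) * (beta + np.asarray(comb_power_response, dtype=float))

def second_and_third_vad(psd_filtered, psd_noise, vad1, m=4.0):
    """Second group of VAD values: 1 where the filtered-signal PSD exceeds
    m times the estimated noise PSD. Third group: bin-wise AND with the
    first group. The factor m is one of the example values mentioned above."""
    psd_filtered = np.asarray(psd_filtered, dtype=float)
    psd_noise = np.asarray(psd_noise, dtype=float)
    vad2 = (psd_filtered > m * psd_noise).astype(int)
    vad3 = np.asarray(vad1, dtype=int) & vad2
    return vad2, vad3
```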
The first group of VAD values is the result of a first voice activity detection, and the second group is the result of a second voice activity detection; the third group is more accurate because its values satisfy both detections. It should be noted that determining the first group of VAD values from the first and second power spectra, the second group from the third and fourth power spectra, and the third group from the first and second groups has no fixed order relative to step 505; this process may also be performed before step 505.
The external audio signal is filtered according to the first output audio signal and the third group of VAD values. The filtering process is controlled by the third group of VAD values, so the speech component is removed from the external audio signal and its noise component is obtained. The filtering method may be, but is not limited to, adaptive filtering, Wiener filtering or spectral subtraction.
The first output audio signal is then filtered according to the noise component of the external audio signal, so that the portion of the first output audio signal corresponding to that noise component is removed. The filtering method may be, but is not limited to, adaptive filtering, Wiener filtering or spectral subtraction. In this way, not only coherent noise but also non-stationary and incoherent noise in the first output audio signal can be removed, improving the output speech quality. The second output audio signal contains fewer noise components and its speech is therefore clearer, so it can be used for voice wake-up or voiceprint recognition, and also as a speech end signal.
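As one possible reading of this two-stage filtering, the sketch below uses spectral subtraction: the noise magnitude spectrum of the external audio signal is estimated on the bins that the third group of VAD values marks as non-speech, and is then subtracted from the first output audio signal. Single-frame processing, the spectral floor and the bin-wise noise estimate are assumptions; adaptive or Wiener filtering could equally be used, as noted above.

```python
import numpy as np

def suppress_external_noise(first_output, external, vad3, floor=0.05):
    """Two-stage sketch via spectral subtraction: (1) estimate the noise
    magnitude spectrum of the external signal from bins where vad3 == 0
    (treated as non-speech), (2) subtract it from the first output's
    magnitude spectrum. vad3 must have one value per rfft bin."""
    spec_out = np.fft.rfft(first_output)
    spec_ext = np.fft.rfft(external)
    vad3 = np.asarray(vad3, dtype=bool)

    # Step 1: noise estimate from the external signal, restricted to non-speech bins.
    noise_mag = np.where(~vad3, np.abs(spec_ext), 0.0)

    # Step 2: subtract the noise estimate, keeping a small spectral floor.
    clean_mag = np.maximum(np.abs(spec_out) - noise_mag, floor * np.abs(spec_out))
    second_output = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec_out)),
                                 n=len(first_output))
    return second_output
```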
In the present application, the noise component, the speech component, the harmonic component, the low-frequency component, the intermediate-frequency component and the high-frequency component may each be referred to by equivalent terms; where two such terms appear, they denote the same quantity.
Referring to fig. 6, the present application provides an audio signal processing apparatus capable of implementing the audio signal processing method in the embodiment shown in fig. 5. The audio signal processing apparatus may be an earphone, an electronic device provided with an earphone, or an integral part of an earphone, and is configured to perform the audio signal processing method.
The audio signal processing apparatus includes:
a first signal receiving module 601, configured to receive a first inner ear audio signal;
a second signal receiving module 602, configured to receive an external audio signal;
a signal processing module 603, configured to convert the first inner ear audio signal received by the first signal receiving module 601 into a second inner ear audio signal, where a harmonic component of the second inner ear audio signal is greater than a harmonic component of the first inner ear audio signal;
the signal processing module 603 is further configured to determine a first filter according to the second inner ear audio signal, where the first filter is a comb filter and a passband of the first filter includes a harmonic frequency of the second inner ear audio signal, and a gain of the first filter in the passband is greater than 1;
a first filter 604, configured to filter the second inner ear audio signal obtained by conversion in the signal processing module 603, so as to obtain a filtered signal;
a signal synthesis module 605, configured to synthesize a first output audio signal according to the filtered signal obtained by filtering with the first filter 604 and the external audio signal received by the second signal receiving module 602.
In an alternative embodiment of the method of the invention,
the signal synthesis module 605 is specifically configured to extract a first frequency component from the filtered signal obtained by filtering with the first filter 604, where the first frequency component does not include a high-frequency component; extracting a second frequency component from the external audio signal received by a second signal receiving module 602, the second frequency component not including a low frequency component; synthesizing a first output audio signal from the first frequency component and the second frequency component.
Optionally, the first frequency component is a low-frequency component, and the second frequency component is a high-frequency component; or,
the first frequency component is a low-frequency component, and the second frequency component comprises a high-frequency component and a middle-frequency component; or,
the first frequency component includes a low-frequency component and a middle-frequency component, and the second frequency component is a high-frequency component.
In a further alternative embodiment of the method,
the signal synthesis module 605 is specifically configured to extract a low-frequency component and a third frequency component from the filtered signal obtained by filtering with the first filter 604, where the third frequency component does not include the low-frequency component;
extracting a fourth frequency component from the external audio signal received by the second signal receiving module 602, the fourth frequency component not including a low frequency component;
performing weighting operation on the power spectrum corresponding to the third frequency component and the power spectrum corresponding to the fourth frequency component to obtain a target power spectrum; and synthesizing a first output audio signal according to the low-frequency component and the frequency component corresponding to the target power spectrum.
Referring to fig. 7, in another alternative embodiment, the audio signal processing apparatus further includes:
a voice activity detection module 701, configured to obtain a first power spectrum corresponding to a first inner ear audio signal received by the first signal receiving module 601, and obtain a second power spectrum corresponding to an external audio signal received by the second signal receiving module 602, where a total number of frequency points in the first power spectrum and a total number of frequency points in the second power spectrum are both N;
subtracting the power spectral density of the ith frequency point in the second power spectrum from the power spectral density of the ith frequency point in the first power spectrum to obtain the power spectral density difference of the ith frequency point, wherein i is a natural number which is not less than 1 and not more than N;
comparing the power spectral density difference of the ith frequency point with a preset threshold value of the ith frequency point, determining a voice activity detection VAD value of the ith frequency point according to a comparison result, and taking the VAD values of the N frequency points as a first group of VAD values;
performing stationary noise estimation on the filtered signal obtained by filtering with the first filter 604 to obtain a noise component in the filtered signal;
acquiring a third power spectrum corresponding to the filtering signal and a fourth power spectrum corresponding to a noise component in the filtering signal, wherein the total number of frequency points in the third power spectrum and the total number of frequency points in the fourth power spectrum are both N;
determining a second group of VAD values according to the third power spectrum and the fourth power spectrum, wherein the VAD value of the ith frequency point in the second group of VAD values is obtained by comparing the power spectral density of the ith frequency point in the third power spectrum with the power spectral density of the ith frequency point in the fourth power spectrum;
determining a third group of VAD values according to the first group of VAD values and the second group of VAD values, wherein the VAD value of the ith frequency point in the third group of VAD values is obtained by performing AND operation on the VAD value of the ith frequency point in the first group of VAD values and the VAD value of the ith frequency point in the second group of VAD values;
a second filter 702, configured to, after the signal synthesis module 605 synthesizes a first output audio signal according to the filtered signal and the external audio signal, filter the external audio signal according to the first output audio signal synthesized by the signal synthesis module 605 and the third set of VAD values determined by the voice activity detection module 701, so as to obtain a noise component in the external audio signal;
a third filter 703, configured to filter the first output audio signal synthesized by the signal synthesis module 605 according to the noise component in the external audio signal obtained by filtering with the second filter 702, so as to obtain a second output audio signal.
Specifically, the third filter 703 is specifically configured to filter the first output audio signal by using adaptive filtering, wiener filtering, or spectral subtraction according to the noise component in the external audio signal, so as to obtain a second output audio signal.
In another optional embodiment, the signal processing module 603 is specifically configured to perform half-wave rectification on the first inner ear audio signal; and carrying out weighting processing on the audio signal obtained by half-wave rectification and the first inner ear audio signal to obtain a second inner ear audio signal.
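A minimal sketch of this harmonic enhancement step, assuming a simple fixed mixing weight for the half-wave-rectified signal (the weight value is an assumption for illustration):

```python
import numpy as np

def enhance_harmonics(inner_ear, weight=0.5):
    """Half-wave rectification adds harmonics of the voiced fundamental;
    the rectified signal is then mixed back with the original first inner
    ear audio signal. The mixing weight is an assumption."""
    inner_ear = np.asarray(inner_ear, dtype=float)
    rectified = np.maximum(inner_ear, 0.0)     # half-wave rectification
    return weight * rectified + (1.0 - weight) * inner_ear
```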
In a further alternative embodiment of the method,
the signal processing module 603 is specifically configured to perform fundamental frequency estimation on the power spectrum corresponding to the second inner ear audio signal to obtain a fundamental frequency;
determining a maximum peak point from non-isolated peak points in a spectrum of the second inner ear audio signal;
and when the difference value between the frequency of the maximum peak point and the harmonic frequency is smaller than or equal to a preset difference value, determining a first filter according to the frequency of the maximum peak point and a gain factor, wherein the harmonic frequency is larger than the fundamental frequency and is an integral multiple of the fundamental frequency, and the value of the gain factor is larger than 1.
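The following sketch illustrates one way such a comb filter could be derived: the fundamental frequency is estimated from the strongest low-frequency peak of the power spectrum, and a frequency-domain gain greater than 1 is applied in narrow bands around the harmonics. The pitch search range, bandwidth, gain factor and peak-picking strategy are assumptions, not the exact procedure described above.

```python
import numpy as np

def estimate_f0_from_power_spectrum(frame, fs, fmin=60.0, fmax=400.0):
    """Crude fundamental-frequency estimate: pick the strongest bin of the
    power spectrum inside a plausible pitch range (an assumed range)."""
    psd = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(psd[band])]

def comb_gain(freqs, f0, gain=2.0, half_width_hz=20.0, max_harmonic=10):
    """Frequency-domain comb response: gain > 1 in narrow bands centred on
    the harmonics of f0, unity gain elsewhere. Bandwidth, gain and the
    number of harmonics are illustrative choices."""
    h = np.ones_like(freqs)
    for k in range(1, max_harmonic + 1):
        h[np.abs(freqs - k * f0) <= half_width_hz] = gain
    return h

def comb_filter(frame, fs, **kwargs):
    """Apply the comb gain to one frame of the second inner ear audio signal."""
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    f0 = estimate_f0_from_power_spectrum(frame, fs)
    spec = np.fft.rfft(frame) * comb_gain(freqs, f0, **kwargs)
    return np.fft.irfft(spec, n=len(frame))
```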
The present application provides a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform a method as in any one of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another website, computer, server, or data center via a wired link (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless link (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the present disclosure as defined by the appended claims.

Claims (23)

1. An audio signal processing method, comprising:
receiving a first inner ear audio signal and an external audio signal respectively;
converting the first inner ear audio signal to a second inner ear audio signal having a harmonic component greater than a harmonic component of the first inner ear audio signal;
determining a first filter from the second inner ear audio signal, the first filter being a comb filter and a passband of the first filter including harmonic frequencies of the second inner ear audio signal, a gain of the first filter within the passband being greater than 1;
filtering the second inner ear audio signal by using the first filter to obtain a filtered signal;
And synthesizing a first output audio signal according to the filtering signal and the external audio signal.
2. The method of claim 1, wherein synthesizing a first output audio signal from the filtered signal and the external audio signal comprises:
extracting a first frequency component from the filtered signal, the first frequency component not including a high frequency component;
extracting a second frequency component from the external audio signal, the second frequency component not including a low frequency component;
synthesizing a first output audio signal from the first frequency component and the second frequency component.
3. The method of claim 2,
the first frequency component is a low-frequency component, and the second frequency component is a high-frequency component; or,
the first frequency component is a low-frequency component, and the second frequency component comprises a high-frequency component and a middle-frequency component; or,
the first frequency component includes a low-frequency component and a middle-frequency component, and the second frequency component is a high-frequency component.
4. The method of claim 1, wherein synthesizing a first output audio signal from the filtered signal and the external audio signal comprises:
extracting a low frequency component and a third frequency component from the filtered signal, the third frequency component excluding the low frequency component;
extracting a fourth frequency component from the external audio signal, the fourth frequency component not including a low frequency component;
performing weighting operation on the power spectrum of the third frequency component and the power spectrum of the fourth frequency component to obtain a target power spectrum;
and synthesizing a first output audio signal according to the low-frequency component and the frequency component corresponding to the target power spectrum.
5. The method according to any of claims 1-4, wherein after said synthesizing a first output audio signal from said filtered signal and said external audio signal, the method further comprises:
acquiring a first power spectrum corresponding to the first inner ear audio signal and a second power spectrum corresponding to the external audio signal, wherein the total number of frequency points of the first power spectrum and the total number of frequency points of the second power spectrum are both N;
subtracting the power spectral density of the ith frequency point in the second power spectrum from the power spectral density of the ith frequency point in the first power spectrum to obtain the power spectral density difference of the ith frequency point, wherein i is a natural number not less than 1 and not more than N;
comparing the power spectral density difference of the ith frequency point with a preset threshold value of the ith frequency point, determining the VAD value of voice activity detection of the ith frequency point according to the comparison result, and taking the VAD values of the N frequency points as a first group of VAD values;
performing stationary noise estimation on the filtered signal to obtain a noise component of the filtered signal;
acquiring a third power spectrum corresponding to the filtering signal and acquiring a fourth power spectrum corresponding to the noise component of the filtering signal, wherein the total number of frequency points in the third power spectrum and the total number of frequency points in the fourth power spectrum are both N;
determining a second group of VAD values according to the third power spectrum and the fourth power spectrum, wherein the VAD value of the ith frequency point in the second group of VAD values is determined according to the power spectral density of the ith frequency point in the third power spectrum and the power spectral density of the ith frequency point in the fourth power spectrum;
determining a third group of VAD values according to the first group of VAD values and the second group of VAD values, wherein the VAD value of the ith frequency point in the third group of VAD values is obtained by performing AND operation on the VAD value of the ith frequency point in the first group of VAD values and the VAD value of the ith frequency point in the second group of VAD values;
filtering the external audio signal according to the first output audio signal and the third set of VAD values to obtain a noise component in the external audio signal;
and filtering the first output audio signal according to the noise component in the external audio signal to obtain a second output audio signal.
6. The method of any of claims 1-4, wherein said converting the first inner ear audio signal to a second inner ear audio signal comprises:
performing half-wave rectification on the first inner ear audio signal;
and carrying out weighting processing on the audio signal obtained by half-wave rectification and the first inner ear audio signal to obtain a second inner ear audio signal.
7. The method of any of claims 1-4, wherein determining a first filter from the second inner ear audio signal comprises:
carrying out fundamental frequency estimation on the power spectrum corresponding to the second inner ear audio signal to obtain a fundamental frequency;
determining a maximum peak point from non-isolated peak points in a spectrum of the second inner ear audio signal;
and when the difference value between the frequency of the maximum peak point and the harmonic frequency is smaller than or equal to a preset difference value, determining a first filter according to the frequency of the maximum peak point and a gain factor, wherein the harmonic frequency is larger than the fundamental frequency and is an integral multiple of the fundamental frequency, and the value of the gain factor is larger than 1.
8. An audio signal processing apparatus, comprising:
the first signal receiving module is used for receiving a first inner ear audio signal;
the second signal receiving module is used for receiving an external audio signal;
a signal processing module, configured to convert the first inner ear audio signal into a second inner ear audio signal, where a harmonic component of the second inner ear audio signal is greater than a harmonic component of the first inner ear audio signal;
the signal processing module is further configured to determine a first filter according to the second inner ear audio signal, where the first filter is a comb filter and a passband of the first filter includes a harmonic frequency of the second inner ear audio signal, and a gain of the first filter in the passband is greater than 1;
the first filter is used for filtering the second inner ear audio signal to obtain a filtered signal;
and the signal synthesis module is used for synthesizing a first output audio signal according to the filtering signal and the external audio signal.
9. The audio signal processing apparatus of claim 8,
the signal synthesis module is specifically configured to extract a first frequency component from the filtered signal, where the first frequency component does not include a high-frequency component; extracting a second frequency component from the external audio signal, the second frequency component not including a low frequency component; synthesizing a first output audio signal from the first frequency component and the second frequency component.
10. The audio signal processing apparatus of claim 9,
the first frequency component is a low-frequency component, and the second frequency component is a high-frequency component; or,
the first frequency component is a low-frequency component, and the second frequency component comprises a high-frequency component and a middle-frequency component; or,
the first frequency component includes a low-frequency component and a middle-frequency component, and the second frequency component is a high-frequency component.
11. The audio signal processing apparatus of claim 8,
the signal synthesis module is specifically configured to extract a low-frequency component and a third frequency component from the filtered signal, where the third frequency component does not include the low-frequency component;
extracting a fourth frequency component from the external audio signal, the fourth frequency component not including a low frequency component;
performing weighting operation on the power spectrum corresponding to the third frequency component and the power spectrum corresponding to the fourth frequency component to obtain a target power spectrum; and synthesizing a first output audio signal according to the low-frequency component and the frequency component corresponding to the target power spectrum.
12. The audio signal processing apparatus according to any one of claims 8 to 11, characterized in that the audio signal processing apparatus further comprises:
The voice activity detection module is used for acquiring a first power spectrum corresponding to the first inner ear audio signal and acquiring a second power spectrum corresponding to the external audio signal, wherein the total number of frequency points in the first power spectrum and the total number of frequency points in the second power spectrum are both N;
subtracting the power spectral density of the ith frequency point in the second power spectrum from the power spectral density of the ith frequency point in the first power spectrum to obtain the power spectral density difference of the ith frequency point, wherein i is a natural number which is not less than 1 and not more than N;
comparing the power spectral density difference of the ith frequency point with a preset threshold value of the ith frequency point, determining a voice activity detection VAD value of the ith frequency point according to a comparison result, and taking the VAD values of the N frequency points as a first group of VAD values;
performing stationary noise estimation on the filtering signal to obtain a noise component in the filtering signal;
acquiring a third power spectrum corresponding to the filtering signal and a fourth power spectrum corresponding to a noise component in the filtering signal, wherein the total number of frequency points in the third power spectrum and the total number of frequency points in the fourth power spectrum are both N;
determining a second group of VAD values according to the third power spectrum and the fourth power spectrum, wherein the VAD value of the ith frequency point in the second group of VAD values is obtained by comparing the power spectral density of the ith frequency point in the third power spectrum with the power spectral density of the ith frequency point in the fourth power spectrum;
determining a third group of VAD values according to the first group of VAD values and the second group of VAD values, wherein the VAD value of the ith frequency point in the third group of VAD values is obtained by performing AND operation on the VAD value of the ith frequency point in the first group of VAD values and the VAD value of the ith frequency point in the second group of VAD values;
a second filter, configured to filter the external audio signal according to the first output audio signal and the third set of VAD values after the signal synthesis module synthesizes a first output audio signal according to the filtered signal and the external audio signal, so as to obtain a noise component in the external audio signal;
and the third filter is used for filtering the first output audio signal according to the noise component in the external audio signal to obtain a second output audio signal.
13. The audio signal processing apparatus according to any one of claims 8 to 11,
the signal processing module is specifically configured to perform half-wave rectification on the first inner ear audio signal; and carrying out weighting processing on the audio signal obtained by half-wave rectification and the first inner ear audio signal to obtain a second inner ear audio signal.
14. The audio signal processing apparatus according to any one of claims 8 to 11,
The signal processing module is further configured to perform fundamental frequency estimation on the power spectrum corresponding to the second inner ear audio signal to obtain a fundamental frequency;
determining a maximum peak point from non-isolated peak points in a spectrum of the second inner ear audio signal;
and when the difference value between the frequency of the maximum peak point and the harmonic frequency is smaller than or equal to a preset difference value, determining a first filter according to the frequency of the maximum peak point and a gain factor, wherein the harmonic frequency is larger than the fundamental frequency and is an integral multiple of the fundamental frequency, and the value of the gain factor is larger than 1.
15. An earphone, characterized in that the earphone comprises an inner ear microphone, an external microphone and an audio processing unit, the audio processing unit comprising a signal processor, a first filter and a synthesizer;
the inner ear microphone is used for receiving a first inner ear audio signal;
the external microphone is used for receiving an external audio signal;
the signal processor is configured to convert the first inner ear audio signal into a second inner ear audio signal, a harmonic component of the second inner ear audio signal being greater than a harmonic component of the first inner ear audio signal; determining a first filter from the second inner ear audio signal, the first filter being a comb filter and a passband of the first filter including harmonic frequencies of the second inner ear audio signal, a gain of the first filter within the passband being greater than 1;
The first filter is used for filtering the second inner ear audio signal to obtain a filtered signal;
and the synthesizer is used for synthesizing a first output audio signal according to the filtering signal and the external audio signal.
16. The headset of claim 15,
the synthesizer is specifically configured to extract a first frequency component from the filtered signal, where the first frequency component does not include a high frequency component; extracting a second frequency component from the external audio signal, the second frequency component not including a low frequency component; synthesizing a first output audio signal from the first frequency component and the second frequency component.
17. The headset of claim 16,
the first frequency component is a low-frequency component, and the second frequency component is a high-frequency component; or,
the first frequency component is a low-frequency component, and the second frequency component comprises a high-frequency component and a middle-frequency component; or,
the first frequency component includes a low-frequency component and a middle-frequency component, and the second frequency component is a high-frequency component.
18. The headset of claim 15,
the synthesizer is specifically configured to extract a low-frequency component and a third frequency component from the filtered signal, where the third frequency component does not include the low-frequency component;
Extracting a fourth frequency component from the external audio signal, the fourth frequency component not including a low frequency component;
performing weighting operation on the power spectrum corresponding to the third frequency component and the power spectrum corresponding to the fourth frequency component to obtain a target power spectrum; and synthesizing a first output audio signal according to the low-frequency component and the frequency component corresponding to the target power spectrum.
19. The headphones of any of claims 15-18, wherein the audio processing unit further comprises:
the voice activity detection subunit is configured to acquire a first power spectrum corresponding to the first inner ear audio signal and acquire a second power spectrum corresponding to the external audio signal, where a total number of frequency points in the first power spectrum and a total number of frequency points in the second power spectrum are both N; subtracting the power spectral density of the ith frequency point in the second power spectrum from the power spectral density of the ith frequency point in the first power spectrum to obtain the power spectral density difference of the ith frequency point, wherein i is a natural number which is not less than 1 and not more than N; comparing the power spectral density difference of the ith frequency point with a preset threshold value of the ith frequency point, determining a voice activity detection VAD value of the ith frequency point according to a comparison result, and taking the VAD values of the N frequency points as a first group of VAD values; performing stationary noise estimation on the filtering signal to obtain a noise component in the filtering signal; acquiring a third power spectrum corresponding to the filtering signal and a fourth power spectrum corresponding to a noise component in the filtering signal, wherein the total number of frequency points in the third power spectrum and the total number of frequency points in the fourth power spectrum are both N; determining a second group of VAD values according to the third power spectrum and the fourth power spectrum, wherein the VAD value of the ith frequency point in the second group of VAD values is determined according to the power spectral density of the ith frequency point in the third power spectrum and the power spectral density of the ith frequency point in the fourth power spectrum; determining a third group of VAD values according to the first group of VAD values and the second group of VAD values, wherein the VAD value of the ith frequency point in the third group of VAD values is obtained by performing AND operation on the VAD value of the ith frequency point in the first group of VAD values and the VAD value of the ith frequency point in the second group of VAD values;
a second filter, configured to filter the external audio signal according to the first output audio signal synthesized by the synthesizer and the third set of VAD values after the synthesizer synthesizes the first output audio signal according to the filtered signal and the external audio signal, so as to obtain a noise component in the external audio signal;
and the third filter is used for filtering the first output audio signal according to the noise component in the external audio signal to obtain a second output audio signal.
20. The headset of any one of claims 15 to 18,
the signal processor is specifically configured to perform half-wave rectification on the first inner ear audio signal; and carrying out weighting processing on the audio signal obtained by half-wave rectification and the first inner ear audio signal to obtain a second inner ear audio signal.
21. The headset of any one of claims 15 to 18,
the signal processor is specifically configured to perform fundamental frequency estimation on a power spectrum corresponding to the second inner ear audio signal to obtain a fundamental frequency; determining a maximum peak point from non-isolated peak points in a spectrum of the second inner ear audio signal; and when the difference value between the frequency of the maximum peak point and the harmonic frequency is smaller than or equal to a preset difference value, determining a first filter according to the frequency of the maximum peak point and a gain factor, wherein the harmonic frequency is larger than the fundamental frequency and is an integral multiple of the fundamental frequency, and the value of the gain factor is larger than 1.
22. An electronic device, comprising:
a headset, a processor and a memory;
the headset comprises an inner ear microphone and an outer microphone, wherein the inner ear microphone is used for receiving a first inner ear audio signal, and the outer microphone is used for receiving an outer audio signal;
the memory for storing program instructions and audio signals;
the processor is configured to perform the following operations in accordance with program instructions stored by the memory:
converting the first inner ear audio signal to a second inner ear audio signal having a harmonic component greater than a harmonic component of the first inner ear audio signal; determining a first filter from the second inner ear audio signal, the first filter being a comb filter and a passband of the first filter including harmonic frequencies of the second inner ear audio signal, a gain of the first filter within the passband being greater than 1; filtering the second inner ear audio signal to obtain a filtered signal;
and synthesizing a first output audio signal according to the filtering signal and the external audio signal.
23. A computer storage medium comprising instructions that, when run on a computer, cause the computer to perform the method of any one of claims 1 to 7.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910363609.8A 2019-04-30 2019-04-30 Audio signal processing method, audio signal processing device and earphone

Publications (1)

Publication Number Publication Date
CN111863006A 2020-10-30

Family ID=72965043

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination