WO2018077109A1 - Sound processing method and apparatus - Google Patents

Sound processing method and apparatus

Info

Publication number
WO2018077109A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
signal
current frame
frequency point
sound signal
Prior art date
Application number
PCT/CN2017/106905
Other languages
English (en)
French (fr)
Inventor
王乐临
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to KR1020197014937A (KR102305066B1)
Priority to EP17863390.5A (EP3531674B1)
Publication of WO2018077109A1
Priority to US16/397,666 (US10575096B2)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00Stereophonic arrangements
    • H04R5/04Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S11/00Systems for determining distance or velocity not using reflection or reradiation
    • G01S11/14Systems for determining distance or velocity not using reflection or reradiation using ultrasonic, sonic, or infrasonic waves
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/80Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
    • G01S3/802Systems for determining direction or deviation from predetermined direction
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0224Processing in the time domain
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/005Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/04Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00Stereophonic arrangements
    • H04R5/027Spatial or constructional arrangements of microphones, e.g. in dummy heads
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S2205/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S2205/01Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations specially adapted for specific applications

Definitions

  • the present invention relates to the field of terminal technologies, and in particular, to a sound processing method and apparatus.
  • When a sound processing device collects a sound signal, it is inevitably subject to various kinds of noise interference.
  • Common noise includes stationary noise and directional interference sources, which easily interfere with the target sound signal and seriously reduce the auditory comfort and intelligibility of the collected sound.
  • Conventional noise estimation and single-channel speech enhancement algorithms are not effective at suppressing directional interference noise. Therefore, it is necessary to design systems with interference noise suppression capability according to the actual situation, so as to pick up the target speech directionally while suppressing other noise.
  • For example, the user uses the camera of the terminal to shoot video.
  • The embodiments of the present invention provide a sound processing method and apparatus, which solve the problems of severe noise aliasing and low pickup precision of the target sound source when the target sound signal is picked up directionally.
  • an embodiment of the present invention provides a sound processing method, which is applied to a terminal having two microphones on the top, and two microphones are respectively located on the front and the back of the terminal, and the method includes:
  • using the two microphones to collect the sound signal of the current frame in the current environment where the terminal is located; calculating a sound pressure difference between the two microphones according to a first preset algorithm based on the collected sound signal of the current frame;
  • determining whether the sound pressure difference between the two microphones for the current frame satisfies a sound source direction determination condition; if the condition is satisfied, determining, based on the sound pressure difference, whether the sound signal of the current frame includes a backward sound signal, where the backward sound signal is a sound signal whose sound source is located behind the camera; and if it is determined that the sound signal of the current frame includes a backward sound signal, filtering the backward sound signal out of the sound signal of the current frame.
  • an embodiment of the present invention provides a sound processing apparatus, which is applied to a terminal having two microphones on the top, and two microphones are respectively located on the front and the back of the terminal, and the apparatus includes:
  • An acquisition module configured to: when the camera of the terminal is in a shooting state, use two microphones to collect a sound signal of a current frame in a current environment where the terminal is located;
  • a calculation module configured to calculate, between the two microphones according to the first preset algorithm, according to the sound signal collected to the current frame Sound pressure difference;
  • a judging module configured to determine whether the sound pressure difference between the two microphones for the current frame satisfies a sound source direction determination condition;
  • a determining module configured to determine, according to the sound pressure difference between the two microphones for the current frame, whether the sound signal of the current frame includes a backward sound signal, where the backward sound signal is a sound signal whose sound source is located behind the camera;
  • the filtering module is configured to filter the backward sound signal in the sound signal of the current frame if it is determined that the sound signal of the current frame includes the backward sound signal.
  • the backward sound signal in the collected sound can be identified by a certain algorithm and filtered out. Therefore, noise signals outside the imaging range can be filtered out during shooting, ensuring the sound quality of the video and improving the user experience.
  • The terminal needs to detect the shooting state of the camera, and while detecting whether the camera is shooting, the position of the camera can also be determined. If the terminal has only one camera, the position of the camera can be obtained directly. If the terminal has multiple cameras, detecting whether a camera is in a shooting state also involves determining which camera is actually shooting, so that the processor can perform the corresponding signal processing with the corresponding algorithm according to the position of that camera.
  • The shooting state of the camera can be detected by a timing program or by detecting an enable signal of the camera.
  • This step can be done by the acquisition module. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • the design can obtain the enabling state of the camera and the position of the camera.
  • collecting the sound signals of the current frame in the current environment where the terminal is located by using the two microphones includes: using the two microphones to collect the sound signals S1 and S2 of the current frame, respectively. Calculating the sound pressure difference between the two microphones according to the first preset algorithm based on the collected sound signals includes: calculating power spectra P1 and P2 of S1 and S2, respectively, by using a fast Fourier transform (FFT) algorithm; and calculating the sound pressure difference between the two microphones from P1 and P2 by using the following formula, where:
  • P1 represents the sound power spectrum corresponding to the top front microphone in the current frame;
  • P2 represents the sound power spectrum corresponding to the top back microphone in the current frame;
  • P1 and P2 are both vectors containing N elements, where the N elements are the values at the N frequency points obtained after the fast Fourier transform of the current-frame sound signal, and N is an integer greater than 1; ILDnow is a vector containing the sound pressure differences corresponding to the N frequency points.
  • This step can be done by the acquisition module and the calculation module. More specifically, the processor can control the audio circuits of the microphones to collect the sound signals, and invoke the programs and instructions in the memory to perform the corresponding operations on the collected signals.
  • This design makes it possible to calculate the sound pressure difference. It is worth noting that there are many alternative ways to calculate the sound pressure difference, which are not enumerated here.
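The power-spectrum and ILD computation described above can be sketched as follows. The extract does not reproduce the exact formula (it appeared as an image in the original publication), so the log power ratio used for ILDnow below is an assumption, as are the frame length and FFT size.

```python
import numpy as np

def sound_pressure_difference(s1, s2, n_fft=512, eps=1e-12):
    """Per-frequency-point sound pressure difference (ILD) between the
    top front microphone frame s1 and the top back microphone frame s2.

    The exact formula is not reproduced in this extract; the dB power
    ratio below is a common choice and is used here as an assumption.
    """
    # Power spectra P1, P2 of the current frame via FFT
    p1 = np.abs(np.fft.rfft(s1, n_fft)) ** 2
    p2 = np.abs(np.fft.rfft(s2, n_fft)) ** 2
    # Vector of sound pressure differences, one per frequency point
    return 10.0 * np.log10((p1 + eps) / (p2 + eps))

# Example: a front-dominant frame yields positive ILD values
rng = np.random.default_rng(0)
frame = rng.standard_normal(512)
ild = sound_pressure_difference(frame, 0.5 * frame)
```

With a 512-point real FFT the ILD vector has 257 frequency points; any monotonic measure of the P1/P2 imbalance would serve the same role in the subsequent direction judgment.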
  • determining whether the sound pressure difference between the two microphones of the current frame satisfies the sound source direction determination condition includes:
  • This step can be done by the decision module. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • This design provides the rule for judging whether the sound source direction can be determined from the sound pressure difference, and provides a basis for the subsequent reasonable use of the sound pressure difference.
  • The specific determination method can take various alternative forms, which are not limited by the present invention; the first threshold value can be set according to empirical values as needed, which is likewise not limited by the present invention.
  • calculating, according to a second preset algorithm, the maximum reference value and the minimum reference value corresponding to the i-th frequency point by using the sound pressure difference of the two microphones corresponding to the i-th frequency point includes:
  • calculating the maximum reference value and the minimum reference value corresponding to the i-th frequency point by using the following formulas:
  • ILDmax = αlow * ILDnow + (1 - αlow) * ILDmax';
  • ILDmax = αfast * ILDnow + (1 - αfast) * ILDmax';
  • ILDmin = αlow * ILDnow + (1 - αlow) * ILDmin';
  • ILDmin = αfast * ILDnow + (1 - αfast) * ILDmin';
  • ILDnow represents the sound pressure difference of the two microphones corresponding to the i-th frequency point;
  • ILDmax represents the maximum reference value corresponding to the i-th frequency point;
  • ILDmax' represents the maximum reference value corresponding to the (i-1)-th frequency point;
  • ILDmin represents the minimum reference value corresponding to the i-th frequency point;
  • ILDmin' represents the minimum reference value corresponding to the (i-1)-th frequency point;
  • αfast and αlow represent preset step values, and αfast > αlow.
  • This step can be done by the decision module. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • This design provides a lower-level implementation of the rule for judging whether the sound source direction can be determined from the sound pressure difference.
  • The specific determination method can be replaced by various alternatives, which are not limited by the present invention.
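The reference-value tracking given by the formulas above can be sketched as follows. The extract lists both the αfast and αlow updates without stating when each applies; the switching rule below (track quickly toward new extremes, decay slowly otherwise) and the step values are assumptions.

```python
def update_ild_references(ild_now, ild_max_prev, ild_min_prev,
                          alpha_fast=0.3, alpha_low=0.04):
    """Update the max/min ILD reference values for one frequency point.

    The extract gives both the alpha_fast and alpha_low formulas but not
    the switching rule; using the fast step when ILDnow exceeds the
    previous maximum (mirror image for the minimum) is an assumption.
    """
    a = alpha_fast if ild_now > ild_max_prev else alpha_low
    ild_max = a * ild_now + (1.0 - a) * ild_max_prev
    a = alpha_fast if ild_now < ild_min_prev else alpha_low
    ild_min = a * ild_now + (1.0 - a) * ild_min_prev
    return ild_max, ild_min

# Example: a large positive ILD pulls the maximum up quickly
# while the minimum decays slowly toward it.
ild_max, ild_min = update_ild_references(6.0, 1.0, -1.0)
```

Because αfast > αlow, the trackers react quickly to new extremes but forget them slowly, which keeps the max-min spread meaningful for the direction judgment.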
  • determining whether the sound signal of the current frame includes a backward sound signal according to a sound pressure difference between two microphones of the current frame includes:
  • This step can be done by the determination module. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • This design provides a way to finally determine the noise from the sound pressure difference and can recognize the backward sound signal accurately; the second threshold value can be set empirically as needed.
  • filtering the backward sound signal in the sound signal of the current frame including:
  • if the camera being used for shooting is the front-facing camera, the sound signal collected by the top back microphone is used as a reference signal, and the adaptive filter of the terminal is controlled to filter out the backward sound signal in the sound signal of the current frame collected by the top front microphone;
  • if the camera being used for shooting is the rear-facing camera, the sound signal collected by the top front microphone is used as a reference signal, and the adaptive filter of the terminal is controlled to filter out the backward sound signal in the sound signal of the current frame collected by the top back microphone.
  • This step can be done by the filter module. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations. This design gives how the noise is processed for cameras at different locations.
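A minimal sketch of the reference-signal approach described above. The patent extract does not specify the filter type, so the normalized LMS (NLMS) structure, the tap count, and the step size below are assumptions.

```python
import numpy as np

def nlms_cancel(primary, reference, taps=32, mu=0.5, eps=1e-8):
    """Adaptively remove the component of `primary` that is correlated
    with `reference` (a standard NLMS interference canceller; the
    patent does not name the filter type, so NLMS is an assumption).

    Returns the error signal, i.e. the primary-microphone signal with
    the reference-correlated (backward) component suppressed.
    """
    w = np.zeros(taps)
    out = np.zeros_like(primary, dtype=float)
    for n in range(len(primary)):
        # Most recent `taps` reference samples, newest first
        x = reference[max(0, n - taps + 1):n + 1][::-1]
        x = np.pad(x, (0, taps - len(x)))
        y = w @ x                         # estimate of the interference
        e = primary[n] - y                # residual = cleaned sample
        w += mu * e * x / (x @ x + eps)   # normalized LMS update
        out[n] = e
    return out

# Usage: backward noise leaks into the primary (front) microphone;
# the canceller suppresses the component shared with the reference.
rng = np.random.default_rng(1)
noise = rng.standard_normal(4000)
primary = 0.8 * noise
cleaned = nlms_cancel(primary, noise)
```

The choice of which microphone serves as the reference follows the camera position, as the bullets above describe: the microphone facing away from the camera hears mostly the backward sources, so its signal models the interference to be subtracted.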
  • the terminal further includes a third microphone at the bottom, the position of the third microphone at the bottom being not limited, and the camera being used for shooting is the front camera; the method further includes:
  • when the up-down azimuth is greater than a first preset angle, determining that the sound signal of the current frame includes a secondary noise signal; in this case, the secondary noise signal is a noise signal whose sound source is located in front of the front camera but outside the imaging range of the front camera;
  • the sound signal collected by the top back microphone is used as the reference signal, and the adaptive filter of the terminal is controlled to filter out the secondary noise signal in the sound signal of the current frame collected by the top front microphone.
  • the foregoing apparatus may further include a secondary noise filtering module, configured to perform the foregoing method. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations. This design gives the possibility to handle secondary noise when there is a bottom microphone.
  • the terminal further includes a fourth microphone at the bottom, and the third microphone and the fourth microphone are arranged at the bottom left and bottom right of the terminal, the specific positions being not limited; the method further includes:
  • when the left-right azimuth is greater than a second preset angle, determining that the sound signal of the current frame includes a secondary noise signal;
  • the sound signal collected by the top back microphone is used as the reference signal, and the adaptive filter of the terminal is controlled to filter out the secondary noise signal in the sound signal of the current frame collected by the top front microphone.
  • It is worth noting that the secondary noise signal can be determined by using either the up-down azimuth or the left-right azimuth, but the sound source directions they focus on are different; the two complement each other, and using both is more comprehensive and accurate than using only the up-down azimuth or only the left-right azimuth.
  • the foregoing apparatus may further include a secondary noise filtering module, configured to perform the foregoing method. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations. This design gives the possibility to handle secondary noise when there are two microphones at the bottom.
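The azimuth-based decision described above might be sketched as follows; the function name, the angle convention, and the preset angle values are illustrative assumptions (the extract leaves the thresholds unspecified).

```python
def is_secondary_noise(updown_azimuth, leftright_azimuth,
                       first_preset_angle=30.0, second_preset_angle=40.0):
    """Decide whether a localized source is secondary noise, i.e. in
    front of the camera but outside its imaging range.

    Angles are in degrees relative to the camera axis; the preset
    angles are placeholder values, not taken from the patent.
    """
    return (abs(updown_azimuth) > first_preset_angle or
            abs(leftright_azimuth) > second_preset_angle)

# A source far above the camera axis is flagged as secondary noise
flag = is_secondary_noise(45.0, 10.0)
```

Checking both azimuths covers both the vertical and horizontal edges of the imaging range, matching the note above that the two complement each other.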
  • the terminal further includes a third microphone at the bottom, the position of the third microphone at the bottom being not limited, and the camera being used for shooting is a rear camera; the method further includes:
  • when the up-down azimuth is greater than a first preset angle, determining that the sound signal of the current frame includes a secondary noise signal;
  • in this case, the secondary noise signal is a noise signal whose sound source is located in front of the rear camera but outside the imaging range of the rear camera;
  • the sound signal collected by the top front microphone is used as the reference signal, and the adaptive filter of the terminal is controlled to filter out the secondary noise signal in the sound signal of the current frame collected by the top back microphone.
  • the foregoing apparatus may further include a secondary noise filtering module, configured to perform the foregoing method. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations. This design gives the possibility to handle secondary noise when there is a bottom microphone.
  • the terminal further includes a fourth microphone at the bottom, and the third microphone and the fourth microphone are arranged at the bottom of the terminal; the method further includes:
  • when the left-right azimuth is greater than a second preset angle, determining that the sound signal of the current frame includes a secondary noise signal;
  • the sound signal collected by the top front microphone is used as the reference signal, and the adaptive filter of the terminal is controlled to filter out the secondary noise signal in the sound signal of the current frame collected by the top back microphone.
  • The secondary noise signal can be determined by using either the up-down azimuth or the left-right azimuth; the sound source directions they cover are different, and the two complement each other, so determining the secondary noise signal with both is more comprehensive and accurate than using only the up-down azimuth or only the left-right azimuth.
  • the foregoing apparatus may further include a secondary noise filtering module, configured to perform the foregoing method. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations. This design gives the possibility to handle secondary noise when there are two microphones at the bottom.
  • an embodiment of the present invention provides a sound processing terminal device, which includes: a microphone, a camera, a memory, and a processor; wherein they are connected by a bus;
  • a microphone for collecting a sound signal under the control of the processor
  • a camera for acquiring an image signal under the control of the processor
  • a memory for storing computer programs and instructions
  • the processor is operative to invoke a computer program and instructions stored in the memory to perform any of the possible design methods described above.
  • the terminal device further includes an antenna system, and the antenna system transmits and receives wireless communication signals under the control of the processor to implement wireless communication with a mobile communication network;
  • the mobile communication network includes one or more of the following: GSM, CDMA, 3G, FDMA, TDMA, PDC, TACS, AMPS, WCDMA, TD-SCDMA, Wi-Fi, and LTE networks.
  • an embodiment of the present invention provides a sound processing method, which is applied to a terminal having two microphones on the top, and two microphones are respectively located on the front and the back of the terminal, and the method includes:
  • using the two microphones to collect the sound signal of the current frame in the current environment where the terminal is located; calculating the sound pressure difference between the two microphones according to the first preset algorithm based on the collected sound signal of the current frame;
  • determining whether the sound pressure difference between the two microphones for the current frame satisfies a sound source direction determination condition; if the sound source direction determination condition is satisfied, using the sound pressure difference between the two microphones for the current frame to determine whether the sound signal of the current frame includes a backward sound signal, where the backward sound signal is a sound signal whose sound source is located behind the camera; and if it is determined that the sound signal of the current frame includes a backward sound signal, filtering the backward sound signal out of the sound signal of the current frame.
  • an embodiment of the present invention provides a sound processing apparatus, which is applied to a terminal having two microphones on the top, and two microphones are respectively located on the front and the back of the terminal, and the apparatus includes:
  • An identification module configured to determine whether a target user exists within the imaging range of the camera of the terminal when the terminal is in a video call state;
  • An acquisition module configured to: when the identification module recognizes that the target user exists in the imaging range, use two microphones to collect the sound signal of the current frame in the current environment where the terminal is located;
  • a calculating module configured to calculate a sound pressure difference between the two microphones according to the first preset algorithm according to the sound signal collected to the current frame
  • a judging module configured to determine whether the sound pressure difference between the two microphones for the current frame satisfies a sound source direction determination condition;
  • a determining module configured to determine, according to the sound pressure difference between the two microphones for the current frame, whether the sound signal of the current frame includes a backward sound signal, where the backward sound signal is a sound signal whose sound source is located behind the camera;
  • the filtering module is configured to filter the backward sound signal in the sound signal of the current frame if it is determined that the sound signal of the current frame includes the backward sound signal.
  • the backward sound signal in the sound signal can be determined by a certain algorithm and filtered out. Therefore, in a scene such as a video call, the noise signal outside the imaging range can be filtered out to ensure the sound quality of the video and improve the user experience.
  • The terminal needs to detect the shooting state of the camera, and while detecting whether the camera is shooting, the position of the camera (such as a front-facing or rear-facing camera) can also be determined. If the terminal has only one camera, the position of the camera can be obtained directly. If the terminal has multiple cameras, detecting whether a camera is in a shooting state also involves determining which camera is actually shooting, so that the processor can perform the corresponding signal processing with the corresponding algorithm according to the position of that camera. The shooting state of the camera can be detected by a timing program or by detecting an enable signal of the camera.
  • This step can be done by the identification module or the acquisition module. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • When the terminal detects the shooting state of the camera, it may further determine the scenario in which the camera is turned on, for example, whether it is ordinary video shooting or a real-time video call; this can be realized by the processor identifying the enable signal through program instructions.
  • This step can be done by the identification module or the acquisition module. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • determining that the target user exists within the imaging range includes:
  • a lip-motion detection technique is used to detect whether a user with lip movement is present within the imaging range.
  • This step can be done by the identification module. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
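As a rough illustration of lip-motion detection (production systems typically track face landmarks rather than raw pixels), a frame-differencing heuristic over a mouth-region crop could look like this; the region extraction, threshold, and function name are all assumptions, not taken from the patent.

```python
import numpy as np

def has_lip_motion(mouth_frames, motion_threshold=8.0):
    """Toy lip-motion check over a sequence of grayscale mouth-region
    crops: mean absolute frame-to-frame difference above a threshold.

    This heuristic and its threshold are illustrative assumptions;
    real implementations use face-landmark tracking.
    """
    diffs = [np.mean(np.abs(b.astype(float) - a.astype(float)))
             for a, b in zip(mouth_frames, mouth_frames[1:])]
    return bool(max(diffs) > motion_threshold)

# Usage: a static mouth region vs. one that changes between frames
still = [np.full((20, 30), 100, dtype=np.uint8)] * 3
moving = [still[0], np.full((20, 30), 140, dtype=np.uint8), still[0]]
```

Only when such a check reports a target user does the pipeline proceed to collect and filter the sound signal, which avoids filtering during frames with no speaker in view.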
  • collecting the sound signals of the current frame in the current environment where the terminal is located by using the two microphones includes: using the two microphones to collect the sound signals S1 and S2 of the current frame, respectively. Calculating the sound pressure difference between the two microphones according to the first preset algorithm based on the collected sound signals includes: calculating power spectra P1 and P2 of S1 and S2, respectively, by using a fast Fourier transform (FFT) algorithm; and calculating the sound pressure difference between the two microphones from P1 and P2 by using the following formula, where:
  • P1 represents the sound power spectrum corresponding to the top front microphone in the current frame;
  • P2 represents the sound power spectrum corresponding to the top back microphone in the current frame;
  • P1 and P2 are both vectors containing N elements, where the N elements are the values at the N frequency points obtained after the fast Fourier transform of the current-frame sound signal, and N is an integer greater than 1; ILDnow is a vector containing the sound pressure differences corresponding to the N frequency points.
  • This step can be done by the acquisition module and the calculation module. More specifically, the processor can control the audio circuits of the microphones to collect the sound signals, and invoke the programs and instructions in the memory to perform the corresponding operations on the collected signals.
  • determining whether the sound pressure difference between the two microphones of the current frame satisfies the sound source direction determination condition includes:
  • This step can be done by the decision module. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • calculating, according to a second preset algorithm, the maximum reference value and the minimum reference value corresponding to the ith frequency point by using the sound pressure difference of the two microphones corresponding to the ith frequency point includes:
  • when the sound pressure difference of the two microphones corresponding to the ith frequency point is not greater than the maximum reference value corresponding to the (i−1)th frequency point, the maximum reference value corresponding to the ith frequency point is calculated by using the following formula:
  • ILD_max = α_low * ILD_now + (1 − α_low) * ILD_max';
  • otherwise, the maximum reference value corresponding to the ith frequency point is calculated by using the following formula:
  • ILD_max = α_fast * ILD_now + (1 − α_fast) * ILD_max';
  • when the sound pressure difference of the two microphones corresponding to the ith frequency point is not less than the minimum reference value corresponding to the (i−1)th frequency point, the minimum reference value corresponding to the ith frequency point is calculated by using the following formula:
  • ILD_min = α_low * ILD_now + (1 − α_low) * ILD_min';
  • otherwise, the minimum reference value corresponding to the ith frequency point is calculated by using the following formula:
  • ILD_min = α_fast * ILD_now + (1 − α_fast) * ILD_min';
  • where ILD_now represents the sound pressure difference of the two microphones corresponding to the ith frequency point; ILD_max represents the maximum reference value corresponding to the ith frequency point; ILD_max' represents the maximum reference value corresponding to the (i−1)th frequency point; ILD_min represents the minimum reference value corresponding to the ith frequency point; ILD_min' represents the minimum reference value corresponding to the (i−1)th frequency point; and α_fast and α_low represent preset step values, with α_fast > α_low.
  • This step can be done by the decision module. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
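The max/min reference tracking above can be sketched for a single frequency point as follows. The step values alpha_fast = 0.5 and alpha_low = 0.05 are illustrative assumptions; the text only requires alpha_fast > alpha_low.

```python
def update_ild_bounds(ild_now, ild_max_prev, ild_min_prev,
                      alpha_fast=0.5, alpha_low=0.05):
    """Track max/min reference values of the ILD at one frequency point.

    Implements the fast-attack / slow-release smoothing above: a bound
    moves quickly (alpha_fast) when the new ILD pushes past it, and
    relaxes slowly (alpha_low) otherwise. The step values are assumed.
    """
    if ild_now > ild_max_prev:   # new ILD exceeds the old maximum: attack fast
        ild_max = alpha_fast * ild_now + (1 - alpha_fast) * ild_max_prev
    else:                        # relax the maximum slowly
        ild_max = alpha_low * ild_now + (1 - alpha_low) * ild_max_prev
    if ild_now < ild_min_prev:   # new ILD drops below the old minimum: attack fast
        ild_min = alpha_fast * ild_now + (1 - alpha_fast) * ild_min_prev
    else:                        # relax the minimum slowly
        ild_min = alpha_low * ild_now + (1 - alpha_low) * ild_min_prev
    return ild_max, ild_min

mx, mn = update_ild_bounds(1.0, 0.0, 0.0)
```

Because the bounds chase new extremes quickly and decay slowly, the gap ILD_max − ILD_min stays wide only while the ILD actually swings, which is what the direction determination condition tests.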
  • determining, according to the sound pressure difference between the two microphones of the current frame, whether the sound signal of the current frame includes a backward sound signal includes:
  • This step can be done by the determination module. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • filtering out the backward sound signal in the sound signal of the current frame includes:
  • if the camera being used for shooting is a front camera, the sound signal collected by the top-back microphone is used as a reference signal, and the adaptive filter of the terminal is controlled to filter out the backward sound signal in the sound signal of the current frame collected by the top-front microphone;
  • if the camera being used is a rear camera, the sound signal collected by the top-front microphone is used as a reference signal, and the adaptive filter of the terminal is controlled to filter out the backward sound signal in the sound signal of the current frame collected by the top-back microphone.
  • This step can be done by the filter module. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • the terminal further includes a third microphone at the bottom; the position of the third microphone at the bottom is not limited; and when the camera being used for shooting is a front camera, the method further includes:
  • determining, when the up-down azimuth is greater than a first preset angle, that the sound signal of the current frame includes a secondary noise signal; in this case, the secondary noise signal is a noise signal located in front of the front camera and outside the imaging range of the front camera;
  • if so, the sound signal collected by the top-back microphone is used as the reference signal, and the adaptive filter of the terminal is controlled to filter out the secondary noise signal in the sound signal of the current frame collected by the top-front microphone.
  • the foregoing apparatus may further include a secondary noise filtering module, configured to perform the foregoing method. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • the terminal further includes a fourth microphone at the bottom, and the third microphone and the fourth microphone are arranged at the bottom left and bottom right of the terminal; the specific positions are not limited; and the method further includes:
  • determining, when the left-right azimuth is greater than a second preset angle, that the sound signal of the current frame includes a secondary noise signal;
  • if so, the sound signal collected by the top-back microphone is used as the reference signal, and the adaptive filter of the terminal is controlled to filter out the secondary noise signal in the sound signal of the current frame collected by the top-front microphone.
  • It is worth noting that a secondary noise signal can be determined by using either the up-down azimuth or the left-right azimuth, but the sound source directions they cover are different, so the two can complement each other: a secondary noise signal determined by using both the up-down azimuth and the left-right azimuth is more comprehensive and accurate.
  • the foregoing apparatus may further include a secondary noise filtering module, configured to perform the foregoing method. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • the terminal further includes a third microphone at the bottom; the position of the third microphone at the bottom is not limited; and when the camera being used for shooting is a rear camera, the method further includes:
  • determining, when the up-down azimuth is greater than the first preset angle, that the sound signal of the current frame includes a secondary noise signal; in this case, the secondary noise signal is a noise signal located in front of the rear camera and outside the imaging range of the rear camera;
  • if so, the sound signal collected by the top-front microphone is used as the reference signal, and the adaptive filter of the terminal is controlled to filter out the secondary noise signal in the sound signal of the current frame collected by the top-back microphone.
  • the foregoing apparatus may further include a secondary noise filtering module, configured to perform the foregoing method. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • the terminal further includes a fourth microphone at the bottom, and the third microphone and the fourth microphone are arranged at the bottom left and bottom right of the terminal; the method further includes:
  • determining, when the left-right azimuth is greater than the second preset angle, that the sound signal of the current frame includes a secondary noise signal;
  • if so, the sound signal collected by the top-front microphone is used as a reference signal, and the adaptive filter of the terminal is controlled to filter out the secondary noise signal in the sound signal of the current frame collected by the top-back microphone. It is worth noting that a secondary noise signal can be determined by using either the up-down azimuth or the left-right azimuth; since the sound source directions they cover are different, the two can complement each other, and a secondary noise signal determined by using both is more comprehensive and accurate.
  • the foregoing apparatus may further include a secondary noise filtering module, configured to perform the foregoing method. More specifically, this technical implementation can be performed by the processor invoking the program and instructions in the memory to perform the corresponding operations.
  • an embodiment of the present invention provides a sound processing terminal device, which includes: a microphone, a camera, a memory, and a processor; wherein they are connected by a bus;
  • a microphone for collecting a sound signal under the control of the processor
  • a camera for acquiring an image signal under the control of the processor
  • a memory is used to store computer programs and instructions
  • the processor is operative to invoke a computer program and instructions stored in the memory to perform any of the possible design methods described above.
  • the terminal device further includes an antenna system, and the antenna system transmits and receives wireless communication signals under the control of the processor to implement wireless communication with the mobile communication network;
  • the mobile communication network includes one or more of the following: a GSM network, a CDMA network, a 3G network, FDMA, TDMA, PDC, TACS, AMPS, WCDMA, TD-SCDMA, Wi-Fi, and LTE networks.
  • the method based on the sound pressure difference is used to determine the sound source direction, which can effectively identify and suppress noise, improve the pickup accuracy of the target sound source during imaging, and improve the user experience.
  • FIG. 1 is a schematic structural view of a terminal;
  • FIG. 2A, FIG. 2B or FIG. 2C is a schematic diagram of a microphone layout on a terminal according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a sound processing method according to an embodiment of the present invention.
  • FIG. 4A is a schematic diagram showing the relationship between the energy difference (in dB) of the two microphones at the front and back of the terminal and the ILD;
  • FIG. 4B is a schematic diagram of sound source direction determination using microphones for sound source localization;
  • FIG. 5 is a schematic diagram of a sound source localization technique based on phase difference;
  • FIG. 6 is a schematic diagram of implementation of a generalized cross-correlation sound source localization method
  • FIG. 7 is a schematic structural diagram of a sound processing apparatus according to an embodiment of the present invention.
  • the terminal may be a device that provides voice and/or data connectivity to the user, a handheld device with a wireless connection function, or another processing device connected to a wireless modem, such as a mobile phone (or "cellular" phone), which can be a portable, pocket-sized, handheld, or wearable device (such as a smart watch or smart bracelet), a tablet, a personal computer (PC), a personal digital assistant (PDA), a point-of-sale (POS) terminal, an on-board computer, etc.
  • FIG. 1 shows an alternative hardware structure diagram of the terminal 100.
  • the terminal 100 may include a radio frequency unit 110, a memory 120, an input unit 130, a display unit 140, a camera 150, an audio circuit 160, a speaker 161, a microphone 162, a processor 170, an external interface 180, a power supply 190, and the like.
  • the microphone 162 can be an analog microphone or a digital microphone and can realize a normal microphone pickup function; the number of microphones is at least two, and the layout of the microphones needs to meet certain requirements; for details, refer to FIG. 2A, FIG. 2B, or FIG. 2C.
  • FIG. 1 is merely an example of a portable multi-function device and does not constitute a limitation on the portable multi-function device; the device may include more or fewer components than those illustrated, or some components may be combined, or different components may be used.
  • the input unit 130 can be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the portable multifunction device.
  • the input unit 130 may include a touch screen 131 and other input devices 132.
  • the touch screen 131 can collect touch operations by the user on or near it (such as operations performed on or near the touch screen using any suitable object such as a finger, a knuckle, or a stylus), and drive the corresponding connection device according to a preset program.
  • the touch screen can detect a user's touch action on the touch screen, convert the touch action into a touch signal and send the signal to the processor 170, and can receive and execute commands sent by the processor 170; the touch signal includes at least touch point coordinate information.
  • the touch screen 131 can provide an input interface and an output interface between the terminal 100 and a user.
  • touch screens can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
  • the input unit 130 may also include other input devices.
  • other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control button 132, switch button 133, etc.), trackball, mouse, joystick, and the like.
  • the display unit 140 can be used to display information input by a user or information provided to a user and various menus of the terminal 100.
  • the touch screen 131 may cover the display panel 141.
  • when the touch screen 131 detects a touch operation on or near it, the touch screen 131 transmits it to the processor 170 to determine the type of the touch event, and the processor 170 then provides a corresponding visual output on the display panel 141 according to the type of the touch event.
  • in some embodiments, the touch screen and the display unit can be integrated as one component to implement the input, output, and display functions of the terminal 100; for convenience of description, the touch display screen represents the function set of the touch screen and the display unit; in some embodiments, the touch screen and the display unit can also be two separate components.
  • the memory 120 can be used to store instructions and data; the memory 120 can mainly include a storage instruction area and a storage data area; the storage data area can store an association relationship between a joint touch gesture and an application function; the storage instruction area can store software units such as an operating system, applications, and instructions required for at least one function, or subsets and extension sets thereof.
  • the memory 120 may also include a non-volatile random access memory, and provides the processor 170 with the hardware, software, and data resources for managing the computing processing device, supporting the control software and applications; it is also used for the storage of multimedia files, as well as the storage of running programs and applications.
  • the processor 170 is the control center of the terminal 100; it connects various parts of the entire mobile phone through various interfaces and lines, and executes various functions of the terminal 100 and processes data by running or executing instructions stored in the memory 120 and calling data stored in the memory 120, thereby monitoring the mobile phone as a whole.
  • the processor 170 may include one or more processing units; preferably, the processor 170 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like.
  • the modem processor primarily handles wireless communications. It can be understood that the above modem processor may not be integrated into the processor 170.
  • the processor and the memory can be implemented on a single chip; in some embodiments, they can also be implemented separately on separate chips.
  • the processor 170 can also be configured to generate corresponding operation control signals and send them to corresponding components of the computing processing device, and to read and process data in software, in particular the data and programs in the memory 120, so that each function module performs its corresponding function, thereby controlling the corresponding component to act as required by the instructions.
  • the camera 150 is used for capturing images or videos, and can be triggered by an application instruction to realize a photographing or photographing function.
  • the radio frequency unit 110 can be used for receiving and transmitting signals during transmission and reception of information or during a call; specifically, after receiving downlink information from the base station, the radio frequency unit 110 delivers it to the processor 170 for processing; in addition, it sends uplink data to the base station.
  • RF circuits include, but are not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.
  • the radio unit 110 can also communicate with network devices and other devices through wireless communication.
  • the wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (Code). Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), E-mail, Short Messaging Service (SMS), etc.
  • the audio circuit 160, the speaker 161, and the microphone 162 can provide an audio interface between the user and the terminal 100.
  • the audio circuit 160 can convert received audio data into an electrical signal and transmit it to the speaker 161, which converts it into a sound signal for output; on the other hand, the microphone 162 is used to collect sound signals and can convert the collected sound signals
  • into electrical signals, which are received by the audio circuit 160 and converted into audio data; after being processed by the processor 170, the audio data is transmitted, for example, via the radio frequency unit 110 to another terminal, or output to the memory 120 for further processing.
  • the audio circuit can also include a headphone jack 163 for providing a connection interface between the audio circuit and the earphone.
  • the terminal 100 also includes a power source 190 (such as a battery) for powering various components.
  • the power source can be logically coupled to the processor 170 through a power management system to manage functions such as charging, discharging, and power management through the power management system.
  • the terminal 100 also includes an external interface 180, which may be a standard Micro USB interface or a multi-pin connector, and can be used to connect the terminal 100 for communication with other devices, or to connect a charger to charge the terminal 100.
  • the terminal 100 may further include a flash, a wireless fidelity (WiFi) module, a Bluetooth module, various sensors, and the like, and details are not described herein.
  • the embodiment of the present invention provides a sound processing method and apparatus for improving the accuracy of sound source localization, reducing false positives, and effectively filtering out noise from the rear of the camera.
  • In the embodiments of the present invention, noise coming from behind the camera may also be called a backward sound signal.
  • Taking the plane in which the terminal body lies as the boundary, a sound signal whose source is in the area behind the camera (for the front camera, this can be understood as the area on the back side of the body; for the rear camera, the area on the front side of the body) can be understood as a backward sound signal.
  • the area mentioned above allows for a certain definition error.
  • an embodiment of the present invention provides a sound processing method, which may be applied to a terminal having two microphones on the top, the two microphones being respectively located on the front and the back of the terminal.
  • The microphone arrangement may be as shown in any one of FIG. 2A, FIG. 2B, or FIG. 2C; the specific process includes the following steps:
  • Step 31 When detecting that the camera of the terminal is in a shooting state, use the two microphones to collect a sound signal in a current environment where the terminal is located.
  • the sound signal in the time domain can be divided into frame signals, and the frame length is related to the preset division algorithm, so each frame has a corresponding sound signal; therefore, when the microphones are operating, the sound signal of the current frame can be collected.
  • Step 32 Calculate a sound pressure difference between the two microphones according to the first predetermined algorithm according to the collected sound signal.
  • for each frame signal, the sound pressure difference of the two microphones corresponding to that frame can be calculated.
  • Step 33 Determine whether the sound pressure difference between the two microphones satisfies the sound source direction determination condition.
  • Step 34 If the sound source direction determination condition is met, determining whether the sound signal includes a backward sound signal according to the sound pressure difference between the two microphones, where the backward sound signal is a sound signal whose sound source is located behind the camera.
  • the backward sound signal can also be understood as a noise signal.
  • Step 35 If it is determined that the backward sound signal is included in the sound signal, the backward sound signal in the sound signal is filtered out.
  • steps 31 and 32 can be implemented by the following processes:
  • the terminal can identify whether the camera is in an open state by a preset detection program, such as detecting whether the camera has been enabled; once detecting that the camera is in a shooting state, the terminal collects the sound signals in the current environment by using the two microphones on the top front and top back of the terminal; the sound signals of the current frame can be denoted S1 and S2, respectively.
  • Based on S1 and S2, the power spectra P1 and P2 of S1 and S2 are respectively calculated by using the Fast Fourier Transform (FFT) algorithm; according to P1 and P2, the sound pressure difference between the two microphones is calculated. It should be clear to those skilled in the art that a sound signal can be composed of a plurality of frame signals.
  • in step 31, when detecting that the camera is enabled, it is generally also possible to detect whether the terminal uses a front camera or a rear camera, so that the processor can make an appropriate algorithm selection for the subsequent signal processing based on the position of the camera.
  • the sound signals collected by the two microphones are sent to the FFT module, which is responsible for time-frequency transforming the collected sound signals to obtain the spectrum of the signals; specifically, the FFT module processes the signals by using a Short-Time Fourier Transform (STFT),
  • obtaining the signal x_i(N, l), where N represents the frequency point index corresponding to one frame signal and l represents the frame number; the l-th frame can be understood as the current frame at the time it is acquired; the power spectrum of the signal is then calculated.
  • the interaural level difference (ILD) of the two microphones corresponding to the current frame is calculated by using the following formula:
  • where P1 represents the sound power spectrum corresponding to the top-front microphone in the current frame; P2 represents the sound power spectrum corresponding to the top-back microphone in the current frame; P1 and P2 are both vectors containing N elements, where the N elements are the values at the N frequency points obtained after the fast Fourier transform of the current-frame sound signal, and N is an integer greater than 1; ILD_now is a vector containing the sound pressure differences corresponding to the N frequency points, and the value of N is determined by a preset frequency division rule.
  • the algorithm in the above example is only one implementation form of the first preset algorithm, and is not limited.
  • FIG. 4A shows the relationship between the energy difference (in dB) of the two microphones at the top front and top back and the ILD.
  • The ILD value ranges from −1 to 1: 1 indicates that the top-front microphone energy of the current frame is significantly larger than the top-back microphone energy, which corresponds to a forward sound signal, and −1 indicates that the top-back microphone energy of the current frame is significantly larger than the top-front microphone energy, which corresponds to a backward sound signal.
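The publication renders the ILD formula itself as an image, so its exact form is not reproduced here. The sketch below uses a normalized power difference, (P1 − P2)/(P1 + P2), which is one plausible form consistent with the stated range of −1 (top-back microphone dominates) to 1 (top-front microphone dominates); it is an assumption, not the patent's confirmed formula.

```python
def ild_per_bin(p1, p2, eps=1e-12):
    """Sound pressure difference (ILD) per frequency point.

    p1, p2: per-bin power spectra of the top-front and top-back
    microphones. Uses an assumed normalized difference form,
    (P1 - P2) / (P1 + P2), which stays within the stated -1..1 range.
    eps guards against division by zero in silent bins.
    """
    return [(a - b) / (a + b + eps) for a, b in zip(p1, p2)]

# A bin where the front mic dominates -> ~1; equal power -> 0.
ild = ild_per_bin([4.0, 1.0], [0.0, 1.0])
```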
  • in step 33, using the sound pressure difference of the two microphones to determine whether the sound source direction determination condition is satisfied includes the following process:
  • when the sound source direction determination condition is satisfied at M frequency points among the N frequency points of one frame signal, it is determined that the sound pressure difference of the two microphones of the current frame satisfies the sound source direction determination condition, where M is greater than or equal to N/2; that is, it can then be judged, by the sound pressure difference, whether the current frame contains a backward sound signal.
  • ILD_max is used to represent the maximum reference value of the sound pressure difference corresponding to the ith frequency point (one of the frequency points corresponding to the current frame), and ILD_min is used to represent the minimum reference value of the sound pressure difference corresponding to the ith frequency point; ILD_max and ILD_min can be set to 0 at the first frequency point of an initial frame, or set to the preset sound pressure difference between the top-front and top-back microphones. Specifically, when ILD_max − ILD_min > first threshold, it is considered that the sound pressure difference of the two microphones satisfies the sound source direction determination condition at the ith frequency point; otherwise, it is considered that the sound source direction determination condition is not satisfied at the ith frequency point, where ILD_max is the maximum value of the sound pressure difference calculated based on the sound pressure differences of the two microphones corresponding to the ith frequency point, and ILD_min is the minimum value of the sound pressure difference calculated based on the sound pressure differences of the two microphones corresponding to the ith frequency point.
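The frame-level decision described above can be sketched as follows; the first threshold value of 0.3 is an illustrative assumption, since the text does not give its value.

```python
def direction_condition_met(ild_max, ild_min, first_threshold=0.3):
    """Check the sound source direction determination condition for a frame.

    Per frequency point the condition holds when
    ILD_max - ILD_min > first threshold; the frame as a whole satisfies
    it when at least M >= N/2 of its N frequency points do. The
    threshold 0.3 is an assumed example value.
    """
    n = len(ild_max)
    hits = sum(1 for mx, mn in zip(ild_max, ild_min)
               if mx - mn > first_threshold)  # M qualifying frequency points
    return hits >= n / 2.0

ok = direction_condition_met([1.0, 1.0, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0])
bad = direction_condition_met([0.1, 0.1, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0])
```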
  • when the sound pressure difference of the two microphones corresponding to the ith frequency point is not greater than the maximum value of the sound pressure difference corresponding to the (i−1)th frequency point (the previous frequency point), the maximum value of the sound pressure difference corresponding to the ith frequency point is calculated by using the following formula:
  • ILD_max = α_low * ILD_now + (1 − α_low) * ILD_max';
  • otherwise, the maximum value of the sound pressure difference corresponding to the ith frequency point is calculated by using the following formula:
  • ILD_max = α_fast * ILD_now + (1 − α_fast) * ILD_max';
  • when the sound pressure difference of the two microphones corresponding to the ith frequency point is not less than the minimum value of the sound pressure difference corresponding to the (i−1)th frequency point (the previous frequency point), the minimum value of the sound pressure difference corresponding to the ith frequency point is calculated by using the following formula:
  • ILD_min = α_low * ILD_now + (1 − α_low) * ILD_min';
  • otherwise, the minimum value of the sound pressure difference corresponding to the ith frequency point is calculated by using the following formula:
  • ILD_min = α_fast * ILD_now + (1 − α_fast) * ILD_min';
  • where ILD_now represents the sound pressure difference of the two microphones corresponding to the ith frequency point; ILD_max represents the maximum reference value corresponding to the ith frequency point; ILD_max' represents the maximum reference value corresponding to the (i−1)th frequency point; ILD_min represents the minimum reference value corresponding to the ith frequency point; ILD_min' represents the minimum reference value corresponding to the (i−1)th frequency point; and α_fast and α_low represent preset step values.
  • ILD_max is obtained by smoothing the sound pressure difference ILD_now of the ith frequency point with the maximum sound pressure difference of the previous frequency point, and ILD_min is obtained by smoothing the sound pressure difference ILD_now of the ith frequency point with the minimum sound pressure difference of the previous frequency point.
  • if the current-frame sound pressure difference satisfies the sound source determination condition, it is determined, based on the sound pressure difference between the top-front and top-back microphones of the terminal, whether the current-frame sound signal includes a backward sound signal, and when it is determined that the current-frame sound signal includes the backward sound signal, the backward signal is filtered out.
  • step 34 may be specifically: when the sound pressure difference of the two microphones corresponding to the jth frequency point is less than a second threshold, determining that the jth frequency point of the sound signal includes a backward sound signal; when the sound pressure difference of the two microphones at the jth frequency point is not less than the second threshold, determining that the jth frequency point of the sound signal does not include a backward sound signal.
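Step 34 can be sketched per frequency point as follows; the second threshold value is an illustrative assumption (with the ILD in −1..1, a negative value means the top-back microphone dominates).

```python
def backward_bins(ild_now, second_threshold=-0.1):
    """Flag frequency points judged to contain a backward sound signal.

    Per step 34: the j-th frequency point is flagged when the two
    microphones' sound pressure difference there is less than the
    second threshold. The threshold -0.1 is an assumed example value.
    """
    return [ild < second_threshold for ild in ild_now]

flags = backward_bins([-0.5, 0.0, 0.2])
```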
  • step 35 may be specifically: if it is determined that the jth frequency point of the sound signal includes a backward sound signal and the camera the terminal is using for shooting is a front camera, the sound signal collected by the top-back microphone is used as a reference signal, and the adaptive filter of the terminal is controlled to filter out the backward sound signal in the sound signal of the current frame collected by the top-front microphone; if the camera being used is a rear camera, the sound signal collected by the top-front microphone is used as a reference signal, and the adaptive filter of the terminal is controlled to filter out the backward sound signal in the sound signal of the current frame collected by the top-back microphone.
  • specifically, an NLMS adaptive filter scheme can be used; the frequency-domain filter is the equivalent form of the time-domain filter, and the principles of the two filtering methods in signal processing are equivalent; both are prior art, and the detailed filtering process is not described here.
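A minimal time-domain NLMS sketch of this filtering follows; the filter order and step size are assumed tuning values. The reference input is the microphone dominated by the backward sound (e.g. the top-back microphone when the front camera shoots), and the primary input is the microphone to be cleaned.

```python
import numpy as np

def nlms_filter(reference, primary, order=16, mu=0.5, eps=1e-8):
    """Time-domain NLMS noise canceller (sketch of the filtering in step 35).

    Returns the error signal: the primary signal with the component
    correlated to the reference removed. order and mu are assumptions.
    """
    w = np.zeros(order)
    out = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]   # most recent reference samples
        y = w @ x                          # filter's estimate of the leakage
        e = primary[n] - y                 # cleaned sample (error signal)
        w += mu * e * x / (x @ x + eps)    # normalized LMS weight update
        out[n] = e
    return out

# Usage: backward noise leaks into the primary mic through an unknown
# (here: delayed and scaled) path; NLMS learns the path and removes it.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
primary = 0.8 * np.roll(noise, 2)
cleaned = nlms_filter(noise, primary)
```

After convergence the residual energy is far below the input energy, which is the behavior the patent relies on to suppress the backward sound signal.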
  • any terminal having two microphones at the top and microphones at the bottom can use this method, as shown in FIG. 2B and FIG. 2C.
  • the angle of view that the camera can capture is about 120 degrees, rather than the entire area in front of the camera; therefore, there may also be a noise signal whose source is located in front of the camera but beyond the camera's imaging range; this part of the noise signal has a smaller effect on the captured content than the backward sound signal and can be defined as a secondary noise signal.
  • the above mentioned areas allow a certain definition error. Therefore, in the specific implementation process, in addition to filtering out part of the noise of the backward sound signal, the secondary noise can be further filtered out, and the following two examples can be referred to.
  • if the terminal is configured with microphones at the bottom as shown in FIG. 2B, the sound processing method may further include the following step, regardless of whether a backward sound signal was present before: Step 36: if the front camera is used for shooting, delay-difference localization may be performed on the sound signals collected by the bottom-left microphone and the top-front microphone to obtain the up-down azimuth of the sound signal; when the up-down azimuth is greater than a first preset angle, it is determined that the sound signal of the current frame contains a secondary noise signal.
  • the secondary noise signal is a noise signal located in front of the front camera and outside the boundary of the front camera's imaging range.
  • the sound signal collected by the top back microphone is used as a reference signal, and the adaptive filter of the terminal is controlled to filter out the sound signal of the current frame collected by the top front microphone. Secondary noise signal.
  • if the rear camera is used for shooting, delay difference localization may be performed on the sound signals collected by the bottom left microphone and the top rear microphone to obtain the up-down azimuth angle of the sound signal; when the up-down azimuth angle is greater than the first preset angle (which may be the same as or different from the first preset angle in the previous paragraph), it is determined that the sound signal of the current frame contains a secondary noise signal.
  • here, the secondary noise signal is a noise signal located in front of the rear camera and outside the boundary of the rear camera's imaging range. If it is determined that the sound signal of the current frame contains the secondary noise signal, the sound signal collected by the top front microphone is used as the reference signal, and the terminal's adaptive filter is controlled to filter out the secondary noise signal from the sound signal of the current frame collected by the top rear microphone.
  • the two microphones at the bottom may also be referred to as a third microphone and a fourth microphone.
  • the sound processing method may further include the following steps, regardless of whether a backward sound signal is present:
  • Step 37: If the front camera is used for shooting, delay difference localization may be performed on the sound signals collected by the bottom left microphone and the top front microphone to obtain the up-down azimuth angle of the sound signal; when the up-down azimuth angle is greater than the first preset angle, it is determined that the sound signal of the current frame contains a secondary noise signal.
  • the secondary noise signal is a noise signal located in front of the front camera and outside the boundary of the front camera's imaging range. Further, delay difference localization may be performed on the sound signals of the current frame collected by the third microphone and the fourth microphone to obtain the left-right azimuth angle of the sound signal of the current frame; when the left-right azimuth angle is greater than a second preset angle, it is determined that the sound signal of the current frame contains a secondary noise signal.
  • if it is determined through the above steps that the sound signal of the current frame contains a secondary noise signal, the sound signal collected by the top rear microphone is used as the reference signal, and the terminal's adaptive filter is controlled to filter out all secondary noise signals from the sound signal of the current frame collected by the top front microphone.
  • the up-down azimuth angle and the left-right azimuth angle detect noise that belongs to the secondary noise signal but whose source lies in slightly different orientations:
  • the up-down azimuth angle focuses more on detecting noise in the up-down direction of the terminal,
  • while the left-right azimuth angle focuses more on detecting noise in the left-right direction of the plane in which the terminal is located.
  • if the rear camera is used for shooting, delay difference localization may be performed on the sound signals collected by the bottom left microphone and the top rear microphone to obtain the up-down azimuth angle of the sound signal; when the up-down azimuth angle is greater than the first preset angle, it is determined
  • that the sound signal of the current frame contains a secondary noise signal.
  • here, the secondary noise signal is a noise signal located in front of the rear camera and outside the boundary of the rear camera's imaging range.
  • delay difference localization may also be performed on the sound signals of the current frame collected by the third microphone and the fourth microphone to obtain the left-right azimuth angle of the sound signal of the current frame; when the left-right azimuth angle is greater than the second preset angle (which may be the same as or different from the second preset angle in the previous paragraph), it is determined that the sound signal of the current frame contains a secondary noise signal. If it is determined through the above steps that the sound signal of the current frame contains a secondary noise signal, the sound signal collected by the top front microphone is used as the reference signal, and the terminal's adaptive filter is controlled to filter out all secondary noise signals from the sound signal of the current frame collected by the top rear microphone.
  • the up-down azimuth angle and the left-right azimuth angle detect noise that belongs to the secondary noise signal but whose source lies in slightly different orientations:
  • the up-down azimuth angle focuses more on detecting noise in the up-down direction of the terminal,
  • while the left-right azimuth angle focuses more on detecting noise in the left-right direction of the plane in which the terminal is located.
  • sound source direction estimation in the forward and backward directions can be performed using the sound pressure difference information of the terminal's front and rear top microphones.
  • alternatively, the direction estimation may be performed using the delay difference information, such as the θ1 angle (forward-backward azimuth angle) in FIG. 4B, that is, the angle value obtained by estimating the azimuth using the delay difference of the front and rear microphones.
  • the front and rear microphones refer to the microphones on the front and back of the top of the terminal, respectively.
  • for the specific calculation method, refer to the delay difference calculation method in FIG.
  • the calculation of the delay difference belongs to the prior art and is not described in detail in the present invention.
  • the front and rear mics can perform angle analysis on the x-axis of the spatial coordinate system (x, y, and z axes),
  • the front mic and bottom mic2 can perform angle analysis on the y-axis,
  • and bottom mic1 and bottom mic2 can perform azimuth analysis on the z-axis.
  • in this way, the spatial sound source localization function can be realized, and the target sound source can then be located relative to the camera.
  • the front-rear orientation, the left-right orientation, and the up-down orientation are all referenced to the body of the mobile phone: the front-rear orientation refers to the directions of the front and back of the phone, the left-right orientation refers to the directions of the two sides of the body, and the up-down orientation refers to the directions of the top and bottom of the body.
  • the camera's field of view is displayed on the terminal as two opening angles, namely opening angle 1 and opening angle 2; opening angle 1 corresponds to the z-axis direction, and opening angle 2 corresponds to the y-axis direction.
  • the algorithm first distinguishes the forward and backward sound source signals using the angle estimation method based on sound pressure difference or delay difference, and then uses θ2 to constrain against the opening angle on the y-axis: when θ2 is larger than the camera's opening angle 2, the sound signal contains a secondary noise signal.
  • the secondary noise signal here is also a noise signal; it is a concept defined relative to the backward sound signal.
  • the secondary noise directions detected by the θ2 and θ3 determination methods are not the same: θ2 is biased toward detecting secondary noise in the left-right azimuth,
  • while θ3 is mainly used to detect secondary noise in the up-down direction; θ2 and θ3 complement each other in determining the sound source orientation.
  • in the case where the microphone layout is as shown in FIG. 2A, it can be determined whether the sound signal of the current frame contains a backward sound signal. When the current frame signal does not contain a backward sound signal,
  • the output voice activity detection (VAD) flag is 0; when the sound signal of the current frame contains a backward sound signal, it is considered to contain a noise source, and the output VAD flag is 1. In the cases where the microphone layout is as shown in FIG. 2B
  • or FIG. 2C, when the sound signal of the current frame contains a backward sound signal or a secondary noise signal, the VAD flag is 1; when the current frame signal contains neither,
  • the output VAD flag is 0. The VAD flag defaults to 0.
  • depending on the configuration, the VAD may be set to 1 only when both the backward sound signal and the secondary noise signal are considered, or set to 1 as long as a backward sound signal is contained; obviously the former is more sensitive to the sound source and imposes higher requirements. This can be flexibly configured by the user in advance.
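The VAD flagging logic above can be sketched as a small helper; the function name and the strict/lenient switch are illustrative assumptions, not the patent's exact implementation:

```python
def vad_flag(has_backward, has_secondary=False, strict=False):
    """Frame-level VAD flag (1 = noise source present, 0 = clean).

    strict=True mimics the configuration that flags a frame only for the
    backward sound signal; strict=False also flags secondary noise.
    These names and the switch are illustrative assumptions.
    """
    if strict:
        return 1 if has_backward else 0
    return 1 if (has_backward or has_secondary) else 0
```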
  • the sound source localization technology is used to determine the sound source location. The specific method is as follows:
  • for a waveform signal of any frequency, the time difference information manifests as a phase change amount, that is, a phase difference between the two mics. The phase difference depends on f (the frequency), c (the speed of sound), and d (the mic spacing); at 0° incidence the phase difference equals 0, and at 180° incidence it equals π.
  • letting h denote the phase difference between the two mics and d the maximum possible phase difference between the two mics, the incident angle can be calculated as asin(h/d).
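The incident-angle relation above can be sketched as follows, assuming a plane-wave model in which the maximum possible phase difference at frequency f for mic spacing d is 2πfd/c (this modeling detail is an assumption):

```python
import numpy as np

def incident_angle_deg(phase_diff, f, d, c=343.0):
    """Incident angle from the inter-mic phase difference h via asin(h/d_max).

    phase_diff: measured phase difference between the two mics (radians)
    f: frequency in Hz, d: mic spacing in m, c: speed of sound in m/s
    Assumes the maximum possible phase difference is 2*pi*f*d/c.
    """
    h_max = 2.0 * np.pi * f * d / c              # maximum phase difference
    ratio = np.clip(phase_diff / h_max, -1.0, 1.0)
    return np.degrees(np.arcsin(ratio))
```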
  • a common method is the generalized cross-correlation (GCC) sound source localization method.
  • x1 and x2 are the time-domain signals received by the two mics, FFT is the fast Fourier transform, and the calculated peak index τ12 is the corresponding number of delay samples. The incident angle can then be calculated as asin(c·τ12/(Fs·d)), where c is the speed of sound, d is the mic spacing, and Fs is the sampling rate.
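As an illustration of the GCC approach described above, the following is a minimal PHAT-weighted GCC delay estimator plus the incidence-angle formula; the PHAT weighting and all names are illustrative assumptions rather than the patent's exact scheme:

```python
import numpy as np

def gcc_phat_delay(x1, x2):
    """Delay of x2 relative to x1 in samples, via PHAT-weighted GCC."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12                      # PHAT whitening
    cc = np.fft.irfft(cross, n=n)
    cc = np.concatenate((cc[-(n // 2):], cc[:n // 2 + 1]))  # center zero lag
    return int(np.argmax(np.abs(cc))) - n // 2

def incidence_angle(tau, fs, d, c=343.0):
    """Incident angle (degrees) from delay samples: asin(c*tau/(Fs*d))."""
    return float(np.degrees(np.arcsin(np.clip(c * tau / (fs * d), -1.0, 1.0))))
```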
  • in this way, the whole-frame incident angle and the per-frequency-point incident angles of the current frame signal can be obtained.
  • when the whole-frame and per-frequency-point incident angles fall outside the beam pickup range (the beam range is set in advance), it is considered that the current
  • sound signal contains a backward sound signal, that is, a noise sound source, and the output VAD flag is 1; otherwise, the output VAD flag is 0. The VAD flag defaults to 0.
  • the terminal's adaptive filter is then controlled to filter out the noise signal from the sound signal collected by the top front microphone.
  • the specific implementation process is: the VAD flag is output to the beamformer. Optionally, a normalized least mean square (NLMS) filter is used; the NLMS filter adaptively generates a desired signal from the reference signal and subtracts the desired signal from the target
  • signal to obtain a residual signal, and is designed to minimize the residual.
  • the filter step size of the NLMS is guided by the sound source localization result: when the forward sound signal is the target sound source, the filter step size is 0 and the filter is not updated;
  • otherwise, the filter step size is the largest and the filter is updated.
  • the reference signal of the filter uses the signal in the direction opposite to the target user. For example, when the speaker is directly in front of the screen, the reference signal selects the signal of the mic on the back of the top of the terminal, and vice versa.
  • the updated filter coefficients are multiplied by the input signal to obtain an output signal in which the backward noise component has been filtered out.
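The VAD-guided NLMS update described above can be sketched as follows; the tap count, step size, and names are illustrative assumptions:

```python
import numpy as np

def nlms_filter(target, reference, vad, taps=64, mu_max=0.5, eps=1e-8):
    """VAD-guided NLMS: adapt only on samples flagged as noise-dominated.

    target:    signal from the primary mic (e.g. top front)
    reference: signal from the opposite mic (e.g. top rear)
    vad:       per-sample flag, 1 = backward noise present (adapt), 0 = freeze
    Returns the residual, i.e. target with the backward component removed.
    """
    w = np.zeros(taps)
    out = np.copy(target)
    for n in range(taps, len(target)):
        x = reference[n - taps + 1:n + 1][::-1]  # ref[n], ref[n-1], ...
        y = w @ x                                # estimated noise component
        e = target[n] - y                        # residual = noise-reduced output
        out[n] = e
        mu = mu_max if vad[n] else 0.0           # step size guided by localization
        w += mu * e * x / (x @ x + eps)          # normalized coefficient update
    return out
```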
  • in addition, the sound source localization result is used to guide the noise reduction post-processing after beamforming.
  • when the sound source localization result for a frequency point indicates that it contains noise,
  • the noise energy of that frequency point is updated, and post-processing gain suppression is performed using a conventional Wiener filtering algorithm, applying further noise reduction to the beamformed signal.
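A conventional Wiener post-filter gain of the kind referenced above can be sketched as follows; the spectral floor value is an illustrative assumption:

```python
import numpy as np

def wiener_gain(signal_psd, noise_psd, floor=0.1):
    """Per-frequency Wiener suppression gain from estimated power spectra."""
    snr_prior = np.maximum(signal_psd - noise_psd, 0.0) / (noise_psd + 1e-12)
    gain = snr_prior / (1.0 + snr_prior)     # Wiener gain in [0, 1)
    return np.maximum(gain, floor)           # spectral floor limits musical noise
```

The suppressed spectrum is then `gain * beamformed_spectrum` per frequency point.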
  • the post-processed signal is sent to the echo cancellation module for further echo cancellation. The beamforming and post-processing parts already have a certain cancellation effect on the echo signal, because in this scenario the loudspeaker is generally located at the bottom or back of the mobile phone, and the signal orientation produced by the loudspeaker belongs to a noise orientation. Therefore, compared with traditional echo cancellation technology, the beamforming and post-processing of the microphone array make the echo smaller and easier to cancel.
  • in a low signal-to-noise-ratio scenario, the delay difference information is already very murky, being a mixture of multiple sound source azimuths,
  • so the results of sound source localization based on the delay difference are random.
  • although the sound pressure difference information is also mixed,
  • there is still a relative difference between the sound pressure differences produced on the two mics by sound sources on the front and back of the terminal, so it can be used for localization. Especially in a video call scenario, when the speaker, noise source, and the like are not very far from the terminal, the sound pressure difference information is more reliable.
  • the mic layout on the terminal 100 includes, but is not limited to, any one of the layouts of FIG. 2A, FIG. 2B, or FIG. 2C.
  • the microphone layouts in FIG. 2A, 2B, and 2C are typical in-line layouts. When the number of microphones increases, the beam pickup range can be better distinguished, the beam range becomes more accurate, and spatial 3D sound source localization can be realized.
  • when the microphone layout in FIG. 2A is employed, the forward and backward signals can be effectively distinguished.
  • with the mic layout in FIG. 2B, not only can the front and rear signals be effectively distinguished, but with a mic added at the bottom (its left-right position is not limited), the sound source orientation in the up-down direction of the mobile phone can also be distinguished.
  • in summary, the present invention provides a sound processing method applied to a terminal having two microphones on the top, the two
  • microphones being located on the front and the back of the terminal respectively.
  • the two microphones are used to collect the sound signal in the current environment; according to the collected sound signal of the current frame, the sound pressure difference
  • between the two microphones is calculated according to a first preset algorithm, and it is judged whether the sound pressure difference between the two microphones for the current frame satisfies the sound source direction determination condition; if the condition is satisfied, it is determined, according to the sound pressure difference between the two microphones for the current frame, whether the sound signal of the current frame contains a backward sound signal,
  • the backward sound signal being a sound signal whose sound source is located behind the camera; if it is determined that the sound signal of the current frame contains a backward sound signal, the backward
  • sound signal in the sound signal of the current frame is filtered out.
  • an embodiment of the present invention provides a sound processing apparatus 700.
  • the apparatus 700 is applied to a terminal having two microphones at the top, the two microphones being located on
  • the front and back of the terminal respectively. As shown in FIG. 7, the apparatus 700 includes an acquisition module 701, a calculation module 702, a judgment module 703, a determination module 704, and a filtering module 705, where:
  • the acquisition module 701 is configured to collect, by using the two microphones, the sound signal in the current environment where the terminal is located when it is detected that a camera of the terminal is in a shooting state.
  • the acquisition module can be implemented by a processor, which can call program instructions in a local memory or a cloud server to monitor whether the camera function is enabled; if the camera is enabled, the processor can further control the microphones to collect the sound signal, and furthermore the collected signal can be converted into a digital signal by an audio circuit.
  • the acquisition module 701 can include a detecting unit 701a and a receiving unit 701b.
  • the detecting unit 701a is configured to detect whether a camera is in an enabled state and can distinguish between the front and rear cameras; if a camera is detected to be enabled, the receiving unit 701b further collects the sound signal in the current environment. Both units can also implement the corresponding functions by calling the programs and instructions in the memory through the processor.
  • the calculation module 702 is configured to calculate the sound pressure difference between the two microphones according to the first preset algorithm based on the sound signal collected by the acquisition module 701.
  • the calculation module can be implemented by a processor, which obtains the sound pressure difference by calling a sound pressure difference algorithm program in the local memory or a cloud server to process the collected sound signal.
  • the judgment module 703 is configured to judge whether the sound pressure difference between the two microphones calculated by the calculation module 702 satisfies the sound source direction determination condition.
  • the judgment module can be implemented by the processor, which obtains the judgment result by calling a judgment algorithm program in the local storage or a cloud server.
  • the determination module 704 is configured to determine, if the sound source direction determination condition is satisfied, whether the sound signal contains a backward sound signal according to the sound pressure difference between the two microphones.
  • the determination module can be implemented by the processor; when the judgment result is that the condition is satisfied, whether the sound signal contains a backward sound signal can be determined by calling a backward sound judgment algorithm program in the local memory or a cloud server.
  • the filtering module 705 is configured to filter out the backward sound signal from the sound signal if the determination module 704 determines that the sound signal contains a backward sound signal.
  • the filtering module can be implemented by a processor; when it is determined that the sound signal contains a backward sound signal, the backward sound signal can be filtered out by calling a noise filtering algorithm program in the local memory or a cloud server.
  • the angle of view that the camera can capture is about 120 degrees, not the entire area in front of the camera. Therefore, there may also be noise signals whose sound sources are located in front of the camera but beyond the camera's imaging range. Relative to the backward sound signal, this part of the noise has little effect on the captured content and can be understood as a secondary noise signal. The areas mentioned above allow a certain definition error. Therefore, in a specific implementation, in addition to filtering out the noise of the backward sound signal, the secondary noise can be further filtered out; refer to the following two examples.
  • the terminal may have a layout with one microphone each on the top front, the top back, and the bottom.
  • the following takes as an example the layout in which the top front, top back, and bottom left of the terminal each have one microphone, as shown in FIG. 2B.
  • regardless of whether a backward sound signal was present before, the device may further include a secondary noise filtering
  • module 706 configured to perform the following steps:
  • if the acquisition module 701 detects that the terminal is shooting with the front camera, it can also perform delay difference localization on the sound signals collected by the bottom left microphone and the top front microphone to obtain the up-down azimuth angle of the sound signal; when the up-down azimuth angle is greater than the first preset angle, it is determined that the sound signal of the current frame contains a secondary noise signal.
  • the secondary noise signal is a noise signal located in front of the front camera and outside the boundary of the front camera's imaging range. If it is determined that the sound signal of the current frame contains the secondary noise signal, the sound signal collected by the top rear microphone is used as the reference signal, and the terminal's adaptive filter is controlled to filter out the secondary noise signal from the sound signal of the current frame collected by the top front microphone.
  • if the acquisition module 701 detects that the terminal is shooting with the rear camera, it can also perform delay difference localization on the sound signals collected by the bottom left microphone and the top rear microphone to obtain the up-down azimuth angle of the sound signal; when the up-down azimuth angle is greater than the first
  • preset angle (which may be the same as or different from the first preset angle in the previous paragraph), it is determined that the sound signal of the current frame contains a secondary noise signal.
  • here, the secondary noise signal is a noise signal located in front of the rear camera and outside the boundary of the rear camera's imaging range.
  • if it is determined that the sound signal of the current frame contains the secondary noise signal, the sound signal collected by the top front microphone is used as the reference signal, and the terminal's adaptive filter is controlled to filter out the secondary noise signal from the sound signal of the current frame collected by the top rear microphone.
  • in addition, regardless of whether a backward sound signal is present, the apparatus may further include a secondary noise filtering module for performing the following steps:
  • if the acquisition module 701 detects that the terminal is shooting with the front camera, it can also perform delay difference localization on the sound signals collected by the bottom left microphone and the top front microphone to obtain the up-down azimuth angle of the sound signal; when the up-down azimuth angle is greater than the first preset angle, it is determined that the sound signal of the current frame contains a secondary noise signal.
  • the secondary noise signal is a noise signal located in front of the front camera and outside the boundary of the front camera's imaging range. Further, delay difference localization may be performed on the sound signals of the current frame collected by the third microphone and the fourth microphone to obtain the left-right azimuth angle of the sound signal of the current frame; when the left-right azimuth angle is greater than the second preset angle, it is determined that the sound signal of the current frame contains a secondary noise signal.
  • if it is determined that the sound signal of the current frame contains a secondary noise signal, the sound signal collected by the top rear microphone is used as the reference signal, and the terminal's adaptive filter is controlled to filter out all secondary noise signals from the sound signal of the current frame collected by the top front microphone.
  • if the acquisition module 701 detects that the terminal is shooting with the rear camera, it can also perform delay difference localization on the sound signals collected by the bottom left microphone and the top rear microphone to obtain the up-down azimuth angle of the sound signal; when the up-down azimuth angle is greater than the first
  • preset angle, it is determined that the sound signal of the current frame contains a secondary noise signal.
  • here, the secondary noise signal is a noise signal located in front of the rear camera and outside the boundary of the rear camera's imaging range.
  • delay difference localization may also be performed on the sound signals of the current frame collected by the third microphone and the fourth microphone to obtain the left-right azimuth angle of the sound signal of the current frame; when the left-right azimuth angle is greater than the second preset angle (which may be the same as or different from the second preset angle in the previous paragraph), it is determined that the sound signal of the current frame contains a secondary noise signal. If it is determined through the above steps that the sound signal of the current frame contains a secondary noise signal, the sound signal collected by the top front microphone is used as
  • the reference signal, and the terminal's adaptive filter is controlled to filter out all secondary noise signals from the sound signal of the current frame collected by the top rear microphone.
  • the secondary noise filtering module can be implemented by the processor; when it is determined that the sound signal contains a secondary noise signal, all secondary noise signals in the sound signal can be filtered out by calling a secondary noise filtering algorithm program in the local memory or a cloud server.
  • specifically, the acquisition module 701 is specifically configured to perform the method mentioned in step 31 and methods that can equivalently replace it; the calculation module 702 is specifically configured to perform the method mentioned in step 32 and methods that can equivalently replace it;
  • the judgment module 703 is specifically configured to perform the method mentioned in step 33 and methods that can equivalently replace it; the determination module 704 is specifically configured to perform the method mentioned in step 34 and methods that can equivalently replace it;
  • the filtering module 705 is specifically configured to perform the method mentioned in step 35 and methods that can equivalently replace it;
  • and the secondary noise filtering module 706 is specifically configured to perform the method mentioned in step 36 or 37 and methods that can equivalently replace them.
  • in summary, the present invention provides a sound processing apparatus applied to a terminal having two microphones on the top, the two microphones being located on the front and the back of the terminal respectively.
  • the apparatus includes an acquisition module 701, a calculation module 702, a judgment module 703, a determination module 704, and a filtering module 705. The acquisition module 701 collects the sound signal of the current frame; the calculation module 702, according to a first
  • preset algorithm, calculates the sound pressure difference between the two microphones; the judgment module 703 judges whether the sound pressure difference between the two microphones for the current frame satisfies the sound source direction determination condition; if the condition is satisfied, the determination module 704 determines, according to the sound pressure difference between the two microphones for the current frame, whether the sound signal of the current frame contains a backward sound signal, the backward sound signal being a sound signal whose sound source is located behind the camera; if it is determined that the
  • sound signal of the current frame contains a backward sound signal,
  • the filtering module 705 filters out the backward sound signal from the sound signal of the current frame. With this apparatus, noise signals outside the imaging range can be filtered out during shooting, ensuring the sound quality of the captured video and improving the user experience.
  • it should be noted that the division of the above apparatus 700 into modules is merely a division of logical functions; in actual implementation, the modules may be wholly or partially integrated into one physical entity, or may be physically separate.
  • each of the above modules may be a separately disposed processing element, or may be integrated in a chip of the terminal, or may be stored in a storage element of the controller in the form of program code, with a processing element of the processor
  • calling and executing the functions of each of the above modules.
  • the individual modules can be integrated together or implemented independently.
  • the processing element described here can be an integrated circuit chip with signal processing capability.
  • in the implementation process, each step of the above method or each of the above modules can be completed by a hardware integrated logic circuit in the processor element or by instructions in the form of software.
  • the processing element may be a general purpose processor, such as a central processing unit (CPU), or may be one or more integrated circuits configured to implement the above method, for example one or more application-specific integrated circuits (ASIC), or one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA).
  • as will be appreciated by one skilled in the art, embodiments of the present invention can be provided as a method, a system, or a computer program product.
  • accordingly, the product may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • furthermore, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and so on) containing computer-usable program code.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus.
  • the instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby
  • the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.


Abstract

The present invention discloses a sound processing method and apparatus. The method is applied to a terminal having two microphones on the top, the two microphones being located on the front and the back of the terminal respectively: when it is detected that a camera of the terminal is in a shooting state, the two microphones are used to collect the sound signal of the current frame; the sound pressure difference between the two microphones is calculated from the sound signal of the current frame according to a first preset algorithm; it is judged whether the sound pressure difference satisfies a sound source direction determination condition; if the determination condition is satisfied, it is determined from the sound pressure difference whether the sound signal of the current frame contains a backward sound signal, the backward sound signal being a sound signal whose sound source is located behind the camera; if it is determined that the sound signal of the current frame contains a backward sound signal, the backward sound signal is filtered out of the sound signal of the current frame. In this way, sound source localization is performed based on the sound pressure difference in low signal-to-noise-ratio scenarios, which can improve the pickup accuracy of sound sources within the imaging range.

Description

A Sound Processing Method and Apparatus

Technical Field

The present invention relates to the field of terminal technologies, and in particular to a sound processing method and apparatus.

Background Art

When collecting or inputting speech signals, a speech processing device is inevitably subject to interference from various kinds of noise. In practical speech communication systems, common noise includes stationary noise and directional interference sources; such noise easily interferes with the target sound signal and severely reduces the auditory comfort and intelligibility of the collected sound. Traditional noise estimation and single-channel speech enhancement algorithms suppress directional interference noise very poorly. For this reason, systems with interference noise suppression capability need to be designed according to the actual situation, so as to achieve directional pickup of the target speech while suppressing other noise.

Most existing sound source localization algorithms use technologies such as beamforming and delay-difference-based sound source localization to locate the azimuth of sound sources in the sound field, and then use fixed or adaptive beams to attenuate interference sources outside the beam and achieve directional sound pickup.

In terminal-based shooting scenarios, the user uses the terminal's camera to record video. With the existing delay-difference-based sound source localization technology, in low signal-to-noise-ratio scenarios the azimuth information of the target sound source (a sound source in the same direction as the camera's shooting direction) is often aliased with the azimuth information of noise sources (sound sources in the direction opposite to the camera's shooting direction). As a result, a great deal of noise appears in the recorded video, the pickup accuracy of the target sound source is low, and a large amount of noise remains in the final recorded content.

Summary of the Invention

Embodiments of the present invention provide a sound processing method and apparatus to solve the problem that, in existing directional pickup of a target sound signal, aliasing noise is severe and the pickup accuracy of the target sound source is consequently low.

The specific technical solutions provided by the embodiments of the present invention are as follows:

According to a first aspect, an embodiment of the present invention provides a sound processing method applied to a terminal having two microphones on the top, the two microphones being located on the front and the back of the terminal respectively. The method includes:

when a camera of the terminal is in a shooting state, using the two microphones to collect the sound signal of the current frame in the current environment where the terminal is located; calculating the sound pressure difference between the two microphones according to a first preset algorithm based on the collected sound signal of the current frame; judging whether the sound pressure difference between the two microphones for the current frame satisfies a sound source direction determination condition; if the sound source direction determination condition is satisfied, determining, according to the sound pressure difference between the two microphones for the current frame, whether the sound signal of the current frame contains a backward sound signal, the backward sound signal being a sound signal whose sound source is located behind the camera; and if it is determined that the sound signal of the current frame contains a backward sound signal, filtering the backward sound signal out of the sound signal of the current frame.
According to a second aspect, an embodiment of the present invention provides a sound processing apparatus, applied to a terminal having two microphones at its top, located on the front and the back of the terminal respectively. The apparatus includes:
a collection module, configured to: when a camera of the terminal is in a shooting state, collect, by using the two microphones, a sound signal of a current frame in the current environment of the terminal;
a calculation module, configured to calculate an ILD between the two microphones from the collected current-frame signal according to a first preset algorithm;
a judging module, configured to determine whether the ILD of the current frame satisfies a sound-source direction decision condition;
a determining module, configured to: if the condition is satisfied, determine, based on the ILD of the current frame, whether the sound signal of the current frame contains a backward sound signal, where a backward sound signal is a signal whose source is located behind the camera; and
a filtering module, configured to: if the sound signal of the current frame contains a backward sound signal, filter the backward sound signal out of the sound signal of the current frame.
According to the method and apparatus provided in the embodiments of the present invention, a backward sound signal in the collected sound can be identified by the foregoing algorithm and filtered out. Noise from outside the camera's shooting range can thus be removed during recording, ensuring the sound quality of the captured video and improving user experience.
根据第一方面或者第二方面,在一种可能的设计中,终端需要检测摄像头的拍摄状态,在检测到摄像头是否拍摄时,还可以确定出摄像头的位置。如果终端只有一个摄像头,则可以直接获取到摄像头的位置。如果终端具有多个摄像头,在检测摄像头是否拍摄状态时,还可以确定出具体是哪个摄像头在进行拍摄,以使得处理器根据摄像头的位置采用对应的算法进行信号后续处理。检测到摄像头的拍摄状态,可以通过定时程序检测,或者可以检测摄像头的使能信号等方式实现。
该步骤可以由采集模块完成。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。该设计方案能够获取到摄像头的使能状态以及摄像头所处的位置。
According to the first aspect or the second aspect, in a possible design, collecting the sound signal of the current frame by using the two microphones includes: collecting the current-frame signals S1 and S2 with the two microphones. Calculating the ILD between the two microphones according to the first preset algorithm includes: computing the power spectra P1 and P2 of S1 and S2 by using the fast Fourier transform (FFT), and computing the ILD from P1 and P2 with the following formula (reconstructed here from the stated value range of −1 to 1):
ILDnow = (P1 − P2) / (P1 + P2)
where P1 is the power spectrum of the top-front microphone for the current frame, P2 is the power spectrum of the top-back microphone for the current frame, both P1 and P2 are vectors of N elements — the values of the N frequency bins obtained by applying the FFT to the current-frame signal — N is an integer greater than 1, and ILDnow is the vector of ILDs at the N bins.
This step may be performed by the collection module and the calculation module. More specifically, the processor may control the microphone audio circuit to collect the sound signal and invoke programs and instructions in the memory to perform the corresponding computation. It should be noted that there are many alternative ways to compute the level difference, which are not enumerated here one by one.
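As an illustrative, non-limiting sketch of the first preset algorithm described above (NumPy-based; the function name, FFT size, and the reconstructed `(P1 − P2)/(P1 + P2)` form are assumptions, not part of the claimed implementation):

```python
import numpy as np

def ild_per_bin(s1, s2, n_fft=256, eps=1e-12):
    """Per-bin interaural level difference (ILD) between the top-front
    mic frame s1 and the top-back mic frame s2.

    Returns a vector in [-1, 1]: values near +1 mean the front mic
    dominates (forward sound), values near -1 mean the back mic
    dominates (backward sound)."""
    # Power spectra of the current frame via FFT (first preset algorithm).
    p1 = np.abs(np.fft.rfft(s1, n_fft)) ** 2
    p2 = np.abs(np.fft.rfft(s2, n_fft)) ** 2
    # ILD_now = (P1 - P2) / (P1 + P2), element-wise over the N bins;
    # eps guards against division by zero in silent bins.
    return (p1 - p2) / (p1 + p2 + eps)
```

For example, a frame that is much louder at the front microphone than at the back one yields ILD values close to +1 at the bins carrying the signal.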
According to the first aspect or the second aspect, in a possible design, determining whether the ILD of the current frame satisfies the sound-source direction decision condition includes:
calculating, by using the ILD of the two microphones at an i-th frequency bin and according to a second preset algorithm, a maximum reference value and a minimum reference value for the i-th bin, where the i-th bin is one of the N bins and i ranges over all positive integers not greater than N;
if the difference between the maximum reference value and the minimum reference value of the i-th bin is greater than a first threshold corresponding to that bin, determining that the ILD between the two microphones satisfies the decision condition at the i-th bin;
if the difference is not greater than the first threshold, determining that the ILD does not satisfy the decision condition at the i-th bin; and
if M of the N bins satisfy the decision condition, with M ≥ N/2, determining that the ILD of the current frame satisfies the sound-source direction decision condition.
This step may be performed by the judging module; more specifically, the processor may invoke programs and instructions in the memory to perform the corresponding computation. This design provides a rule for deciding whether the ILD can be used to identify noise, which serves as the basis for its subsequent use. The specific decision method has many alternatives and is not limited by the present invention; the first threshold may likewise be set as required based on empirical values.
根据第一方面或者第二方面,在一种可能的设计中,所述利用第i频点对应的所述两个麦克风的声压差,按照第二预设算法计算出所述第i频点对应的最大参考值和最小参考值,包括:
获取第i-1频点对应的最大参考值,所述第i-1频点为所述第i频点的上一个频点,若所述第i频点对应的两个麦克风的声压差不大于所述第i-1频点对应的最大参考值时,利用以下公式计算所述第i频点对应的最大参考值,
ILDmax=αlow*ILDnow+(1-αlow)*ILDmax′;
若所述第i频点对应的两个麦克风的声压差大于所述第i-1频点对应的最大参考值时,利用以下公式计算所述第i频点对应的最大参考值,
ILDmax=αfast*ILDnow+(1-αfast)*ILDmax′;
获取第i-1频点对应的最小参考值,若所述第i频点对应的两个麦克风的声压差大于所述第i-1频点对应的最小参考值时,利用以下公式计算所述第i频点对应的最小参考值,
ILDmin=αlow*ILDnow+(1-αlow)*ILDmin′;
若所述第i频点对应的两个麦克风的声压差不大于所述第i-1频点对应的最小参考值时,利用以下公式计算所述第i频点对应的最小参考值,
ILDmin=αfast*ILDnow+(1-αfast)*ILDmin′;
其中,ILDnow表示所述第i频点对应的两个麦克风的声压差,ILDmax表示所述第i频点对应的最大参考值,ILDmax′表示所述第i-1频点对应的最大参考值,ILDmin表示所述第i频点对应的最小参考值,ILDmin′表示所述第i-1频点对应的最小参考值,αfast、αlow表示预设的步长值,且αfastlow
该步骤可以由判断模块完成。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。该设计方案给出了能否通过声压差去判断噪声的判断规则的一种下位实现,具体判别方法可以有多种替换方式,本发明不做限定。
根据第一方面或者第二方面,在一种可能的设计中,根据所述当前帧的两个麦克风之间的声压差,确定出所述当前帧的声音信号中是否包含后向声音信号,包括:
当第j频点对应的声压差小于所述第j频点对应的第二门限值时,确定所述j频点处包含后向声音信号,其中,所述第j频点为所述M个频点中的一个,j取遍不大于M的所有正整数;
当所述两个麦克风在第j频点对应的声压差不小于第二门限值时,确定所述j频点处不包含后向声音信号。
该步骤可以由确定模块完成。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。该设计方案给出了通过声压差最终判定出噪声的方式,能够准确地识别出后向声音信号;第二门限值可以根据经验进行按需设定。
根据第一方面或者第二方面,在一种可能的设计中,将所述当前帧的声音信号中的后向声音信号进行滤除,包括:
若检测到终端正在拍摄的摄像头为前置摄像头,则以顶部背面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中的后向声音信号;
若检测到终端正在拍摄的摄像头为后置摄像头,则以顶部正面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中的后向声音信号。
该步骤可以由滤除模块完成。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。该设计方案给出了针对不同位置的摄像头如何进行噪声处理。
根据第一方面或者第二方面,在一种可能的设计中,若终端在底部还包括第三麦克风,第三麦克风位于底部的位置不做限定,且正在拍摄的摄像头为前置摄像头时,方法还包括:
针对第三麦克风和顶部正面麦克风采集到的当前帧的声音信号进行时延差定位,得到所述当前帧的声音信号的上下方位角;
在上下方位角大于第一预设角度时,确定所述当前帧的声音信号中包含次级噪声信号;该情形下,次级噪声信号为位于前置摄像头前方且位于前置摄像头摄像范围以外的噪声信号;
若确定出当前帧的声音信号中包含次级噪声信号时,以顶部背面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中的次级噪声信号。
具体实现过程中,上述装置还可以包括次级噪声滤除模块,用于执行上述方法。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。该设计方案给出了当存在底部麦克风的时候,还可以对次级噪声进行处理。
根据第一方面或者第二方面,在一种可能的设计中,若终端在底部还包括第四麦克风,且第三麦克风和第四麦克风在终端底部左右排列,具体位置不做限定,该方法还包括:
针对第三麦克风和第四麦克风采集到的当前帧的声音信号进行时延差定位,得到所述当前帧的声音信号的左右方位角;
在左右方位角大于第二预设角度,确定所述当前帧的声音信号中包含次级噪声信号;
若确定出当前帧的声音信号中包含次级噪声信号时,以顶部背面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中的次级噪声信号。值得注意的是,采用上下方位角和左右方位角都能确定出次级噪声信 号,只是侧重的声源方位不同,两者可以互为补充,比单独用上下方位角或用左右方位角确定次级噪声信号更为全面准确。
具体实现过程中,上述装置还可以包括次级噪声滤除模块,用于执行上述方法。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。该设计方案给出了当底部存在两个麦克风的时候,还可以对次级噪声进行处理。
根据第一方面或者第二方面,在一种可能的设计中,若终端在底部还包括第三麦克风,第三麦克风位于底部的位置不做限定,且正在拍摄的摄像头为后置摄像头时,该方法还包括:
针对第三麦克风和顶部背面麦克风采集到的当前帧的声音信号进行时延差定位,得到所述当前帧的声音信号的上下方位角;
在上下方位角大于第一预设角度时,确定所述当前帧的声音信号中包含次级噪声信号,该情形下,次级噪声信号为位于后置摄像头前方且位于后置摄像头摄像范围以外的噪声信号;
若确定出当前帧的声音信号中包含次级噪声信号时,以顶部正面麦克风采集的声音信号作为参考信号,控制所述终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中的次级噪声信号。
具体实现过程中,上述装置还可以包括次级噪声滤除模块,用于执行上述方法。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。该设计方案给出了当存在底部麦克风的时候,还可以对次级噪声进行处理。
根据第一方面或者第二方面,在一种可能的设计中,若终端在底部还包括第四麦克风,且第三麦克风和第四麦克风在终端底部左右排列,该方法还包括:
针对第三麦克风和第四麦克风采集到的当前帧的声音信号进行时延差定位,得到所述当前帧的声音信号的左右方位角;
在左右方位角大于第二预设角度,确定所述当前帧的声音信号中包含次级噪声信号;
若确定出所述当前帧的声音信号中包含次级噪声信号时,以顶部正面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中的次级噪声信号。值得注意的是,采用上下方位角和左右方位角都能确定出次级噪声信号,只是侧重的声源方位不同,两者可以互为补充,比单独用上下方位角或用左右方位角确定次级噪声信号更为全面准确。
具体实现过程中,上述装置还可以包括次级噪声滤除模块,用于执行上述方法。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。该设计方案给出了当底部存在两个麦克风的时候,还可以对次级噪声进行处理。
第三方面,本发明实施例提供一种声音处理终端设备,该设备包括:麦克风、摄像头、存储器、处理器;它们通过总线相连;其中,
麦克风用于在所述处理器的控制下采集声音信号;
摄像头用于在所述处理器的控制下采集图像信号;
存储器用于存储计算机程序和指令;
处理器用于调用所述存储器中存储的计算机程序和指令,执行如上述任一一种可能的设计方法。
根据第三方面,在一种可能的设计中,终端设备还包括天线系统、天线系统在处理器的控制下,收发无线通信信号实现与移动通信网络的无线通信;移动通信网络包括以下的一种或多种:GSM网络、CDMA网络、3G网络、FDMA、TDMA、PDC、TACS、AMPS、WCDMA、TDSCDMA、WIFI以及LTE网络。
第四方面,本发明实施例提供一种声音处理方法,该方法应用于顶部具有两个麦克风的终端上,两个麦克风分别位于终端的正面和背面,该方法包括:
终端处于视频通话状态时,确定所述终端的摄像头的摄像范围内是否存在目标用户;
若确定在摄像范围内存在目标用户,则利用两个麦克风采集终端所处当前环境中当前帧的声音信号;根据采集到当前帧的声音信号按照第一预设算法计算两个麦克风之间的声压差;判断当前帧的两个麦克风之间的声压差是否满足声源方向判定条件;若满足声源方向判定条件,则根据当前帧的两个麦克风之间的声压差,确定出当前帧的声音信号中是否包含后向声音信号,后向声音信号为声源位于摄像头后方的声音信号;若确定出当前帧的声音信号中包含后向声音信号,则将当前帧的声音信号中的后向声音信号进行滤除。
第五方面,本发明实施例提供一种声音处理装置,该装置应用于顶部具有两个麦克风的终端上,两个麦克风分别位于终端的正面和背面,该装置包括:
识别模块,用于在终端处于视频通话状态时,确定终端的摄像头的摄像范围内是否存在目标用户;
采集模块,用于在识别模块识别出摄像范围内存在目标用户时,利用两个麦克风采集终端所处当前环境中当前帧的声音信号;
计算模块,用于根据采集到当前帧的声音信号按照第一预设算法计算两个麦克风之间的声压差;
判断模块,用于判断当前帧的两个麦克风之间的声压差是否满足声源方向判定条件;
确定模块,用于若满足声源方向判定条件,则根据当前帧的两个麦克风之间的声压差,确定出当前帧的声音信号中是否包含后向声音信号,其中,后向声音信号为声源位于摄像头后方的声音信号;
滤除模块,用于若确定出当前帧的声音信号中包含后向声音信号,则将当前帧的声音信号中的后向声音信号进行滤除。
根据本发明实施例提供的上述方法和装置的技术方案,可以通过一定的算法确定出声音信号中的后向声音信号,并将其滤除。因此可以在摄像时,如视频通话这样的场景中,可以将摄像范围外的噪声信号滤除,保证了视频的声音质量,提高用户的体验。
根据第四方面或者第五方面,在一种可能的设计中,终端需要检测摄像头的拍摄状态,在检测到摄像头是否拍摄时,还可以确定出摄像头的位置(如前置摄像头还是后向摄像头)。如果终端只有一个摄像头,则可以直接获取到摄像头的位置。如果终端具有多个摄像头,在检测摄像头是否拍摄状态时,还可以确定出具体是哪个摄像头在进行拍摄,以使得处理器根据摄像头的位置采用对应的算法进行信号后续处理。检测到摄像头的拍摄状态,可以通过定时程序检测,或者可以检测摄像头的使能信号等方式实现。
该步骤可以由识别模块或者采集模块完成。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。
根据第四方面或者第五方面,在一种可能的设计中,终端在检测摄像头的拍摄状态时, 还可以进一步确定摄像头开启的场景,如属于常规视频拍摄还是进行实时视频通话等等,这些可以通过处理器通过程序指令识别使能信号来实现。
该步骤可以由识别模块或者采集模块完成。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。
根据第四方面或者第五方面,在一种可能的设计中,确定在摄像范围内存在目标用户,包括:
利用人像识别技术在所述摄像范围内检测出存在任一用户;或,
利用人脸识别技术在所述摄像范围内检测出存在脸部特征与预存的人脸模板相同的用户;或
利用唇部运动检测技术,检测出存在唇部发生运动的用户。
该步骤可以由识别模块完成。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。
根据第四方面或者第五方面,在一种可能的设计中,利用两个麦克风采集终端所处当前环境中当前帧的声音信号,包括:利用两个麦克风采集当前帧的声音信号,分别为S1、S2;根据采集到的声音信号按照第一预设算法计算两个麦克风之间的声压差,包括:基于S1、S2,利用快速傅里叶变换FFT算法计算S1、S2的功率谱,分别为P1、P2;根据P1、P2,利用以下公式计算所述两个麦克风之间的声压差;
ILDnow = (P1 − P2) / (P1 + P2)
其中,P1表示顶部正面麦克风在当前帧对应的声音功率谱,P2表示顶部背面麦克风在当前帧对应的声音功率谱,且P1和P2均为含有N个元素的向量,所述N个元素为当前帧声音信号进行快速傅里叶变换后对应的N个频点的值,N为大于1的整数;ILDnow为包含N个频点对应的声压差的向量。
该步骤可以由采集模块和计算模块完成。更具体地,这个技术实现可以由处理器控制麦克风音频电路采集声音信号、以及调用存储器中的程序与指令对采集到的声音信号进行相应的运算。
根据第四方面或者第五方面,在一种可能的设计中,判断当前帧的两个麦克风之间的声压差是否满足声源方向判定条件,包括:
利用第i频点对应的两个麦克风的声压差,按照第二预设算法计算出第i频点对应的最大参考值和最小参考值,其中第i频点为所述N个频点中的一个,i取遍不大于N的所有正整数;
如果第i频点的最大参考值与最小参考值之差大于所述第i频点对应的第一门限值,则确定两个麦克风之间的声压差在所述第i频点上满足声源方向判定条件;
如果所述最大参考值与所述最小参考值之差不大于所述第i频点对应的第一门限值,则确定两个麦克风之间的声压差在所述第i频点上不满足声源方向判定条件;
若所述N个频点中的M个频点满足声源方向判定条件,确定所述当前帧的两个麦克风之间的声压差满足声源方向判定条件,其中M大于等于N/2。
该步骤可以由判断模块完成。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。
根据第四方面或者第五方面,在一种可能的设计中,所述利用第i频点对应的所述两个麦克风的声压差,按照第二预设算法计算出所述第i频点对应的最大参考值和最小参考值,包括:
获取第i-1频点对应的最大参考值,所述第i-1频点为所述第i频点的上一个频点,若所述第i频点对应的两个麦克风的声压差不大于所述第i-1频点对应的最大参考值时,利用以下公式计算所述第i频点对应的最大参考值,
ILDmax=αlow*ILDnow+(1-αlow)*ILDmax′;
若所述第i频点对应的两个麦克风的声压差大于所述第i-1频点对应的最大参考值时,利用以下公式计算所述第i频点对应的最大参考值,
ILDmax=αfast*ILDnow+(1-αfast)*ILDmax′;
获取第i-1频点对应的最小参考值,若所述第i频点对应的两个麦克风的声压差大于所述第i-1频点对应的最小参考值时,利用以下公式计算所述第i频点对应的最小参考值,
ILDmin=αlow*ILDnow+(1-αlow)*ILDmin′;
若所述第i频点对应的两个麦克风的声压差不大于所述第i-1频点对应的最小参考值时,利用以下公式计算所述第i频点对应的最小参考值,
ILDmin=αfast*ILDnow+(1-αfast)*ILDmin′;
其中,ILDnow表示所述第i频点对应的两个麦克风的声压差,ILDmax表示所述第i频点对应的最大参考值,ILDmax′表示所述第i-1频点对应的最大参考值,ILDmin表示所述第i频点对应的最小参考值,ILDmin′表示所述第i-1频点对应的最小参考值,αfast、αlow表示预设的步长值,且αfastlow
该步骤可以由判断模块完成。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。
根据第四方面或者第五方面,在一种可能的设计中,根据所述当前帧的两个麦克风之间的声压差,确定出所述当前帧的声音信号中是否包含后向声音信号,包括:
当第j频点对应的声压差小于所述第j频点对应的第二门限值时,确定所述j频点处包含后向声音信号,其中,所述第j频点为所述M个频点中的一个,j取遍不大于M的所有正整数;
当所述两个麦克风在第j频点对应的声压差不小于第二门限值时,确定所述j频点处不包含后向声音信号。
该步骤可以由确定模块完成。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。
根据第四方面或者第五方面,在一种可能的设计中,将所述当前帧的声音信号中的后向声音信号进行滤除,包括:
若检测到终端正在拍摄的摄像头为前置摄像头,则以顶部背面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中 的后向声音信号;
若检测到终端正在拍摄的摄像头为后置摄像头,则以顶部正面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中的后向声音信号。
该步骤可以由滤除模块完成。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。
根据第四方面或者第五方面,在一种可能的设计中,若终端在底部还包括第三麦克风,第三麦克风位于底部的位置不做限定,且正在拍摄的摄像头为前置摄像头时,方法还包括:
针对第三麦克风和顶部正面麦克风采集到的当前帧的声音信号进行时延差定位,得到所述当前帧的声音信号的上下方位角;
在上下方位角大于第一预设角度时,确定所述当前帧的声音信号中包含次级噪声信号;该情形下,次级噪声信号为位于前置摄像头前方且位于前置摄像头摄像范围以外的噪声信号;
若确定出当前帧的声音信号中包含次级噪声信号时,以顶部背面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中的次级噪声信号。
具体实现过程中,上述装置还可以包括次级噪声滤除模块,用于执行上述方法。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。
根据第四方面或者第五方面,在一种可能的设计中,若终端在底部还包括第四麦克风,且第三麦克风和第四麦克风在终端底部左右排列,具体位置不做限定,该方法还包括:
针对第三麦克风和第四麦克风采集到的当前帧的声音信号进行时延差定位,得到所述当前帧的声音信号的左右方位角;
在左右方位角大于第二预设角度,确定所述当前帧的声音信号中包含次级噪声信号;
若确定出当前帧的声音信号中包含次级噪声信号时,以顶部背面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中的次级噪声信号。值得注意的是,采用上下方位角和左右方位角都能确定出次级噪声信号,只是侧重的声源方位不同,两者可以互为补充,比单独用上下方位角或用左右方位角确定次级噪声信号更为全面准确。
具体实现过程中,上述装置还可以包括次级噪声滤除模块,用于执行上述方法。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。
根据第四方面或者第五方面,在一种可能的设计中,若终端在底部还包括第三麦克风,第三麦克风位于底部的位置不做限定,且正在拍摄的摄像头为后置摄像头时,该方法还包括:
针对第三麦克风和顶部背面麦克风采集到的当前帧的声音信号进行时延差定位,得到所述当前帧的声音信号的上下方位角;
在上下方位角大于第一预设角度时,确定所述当前帧的声音信号中包含次级噪声信号,该情形下,次级噪声信号为位于后置摄像头前方且位于后置摄像头摄像范围以外的噪声信号;
若确定出当前帧的声音信号中包含次级噪声信号时,以顶部正面麦克风采集的声音信 号作为参考信号,控制所述终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中的次级噪声信号。
具体实现过程中,上述装置还可以包括次级噪声滤除模块,用于执行上述方法。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。
根据第四方面或者第五方面,在一种可能的设计中,若终端在底部还包括第四麦克风,且第三麦克风和第四麦克风在终端底部左右排列,该方法还包括:
针对第三麦克风和第四麦克风采集到的当前帧的声音信号进行时延差定位,得到所述当前帧的声音信号的左右方位角;
在左右方位角大于第二预设角度,确定所述当前帧的声音信号中包含次级噪声信号;
若确定出所述当前帧的声音信号中包含次级噪声信号时,以顶部正面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中的次级噪声信号。值得注意的是,采用上下方位角和左右方位角都能确定出次级噪声信号,只是侧重的声源方位不同,两者可以互为补充,比单独用上下方位角或用左右方位角确定次级噪声信号更为全面准确。
具体实现过程中,上述装置还可以包括次级噪声滤除模块,用于执行上述方法。更具体地,这个技术实现可以由处理器调用存储器中的程序与指令进行相应的运算。
第六方面,本发明实施例提供一种声音处理终端设备,该设备包括:麦克风、摄像头、存储器、处理器;它们通过总线相连;其中,
麦克风用于在所述处理器的控制下采集声音信号;
摄像头用于在所述处理器的控制下采集图像信号;
存储器用于存储计算机程序和指令;
处理器用于调用所述存储器中存储的计算机程序和指令,执行如上述任一一种可能的设计方法。
根据第六方面,在一种可能的设计中,终端设备还包括天线系统、天线系统在处理器的控制下,收发无线通信信号实现与移动通信网络的无线通信;移动通信网络包括以下的一种或多种:GSM网络、CDMA网络、3G网络、FDMA、TDMA、PDC、TACS、AMPS、WCDMA、TDSCDMA、WIFI以及LTE网络。
With the foregoing solutions, when the terminal is in a shooting state in a low-SNR scenario, the embodiments of the present invention judge the sound-source direction by an ILD-based method, which effectively identifies and suppresses noise, improves the pickup accuracy of the target source during recording, and improves user experience.
Brief description of the drawings
FIG. 1 is a schematic structural diagram of a terminal;
FIG. 2A, FIG. 2B and FIG. 2C are schematic diagrams of microphone layouts on a terminal according to embodiments of the present invention;
FIG. 3 is a flowchart of a sound processing method according to an embodiment of the present invention;
FIG. 4A shows the relationship between the ILD and the energy difference, in dB, of the terminal's front and back microphones;
FIG. 4B is a schematic diagram of sound-source direction determination in microphone-based localization;
FIG. 5 illustrates the principle of phase-difference-based sound-source localization;
FIG. 6 shows an implementation of the generalized cross-correlation (GCC) localization method;
FIG. 7 is a schematic structural diagram of a sound processing apparatus according to an embodiment of the present invention.
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,并不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
本发明实施例中,终端,可以是向用户提供语音和/或数据连通性的设备,具有无线连接功能的手持式设备、或连接到无线调制解调器的其他处理设备,比如:移动电话(或称为“蜂窝”电话),可以是便携式、袖珍式、手持式、可穿戴设备(如智能手表、智能手环等)、平板电脑、个人电脑(PC,Personal Computer)、PDA(Personal Digital Assistant,个人数字助理)、POS(Point of Sales,销售终端)、车载电脑等。
图1示出了终端100的一种可选的硬件结构示意图。
参考图1所示,终端100可以包括射频单元110、存储器120、输入单元130、显示单元140、摄像头150、音频电路160、扬声器161、麦克风162、处理器170、外部接口180、电源190等部件,所述麦克风162可以是模拟麦克风或数字麦克风,能够实现正常的麦克风拾音功能,且麦克风的数量至少为2个,且麦克风的布局需满足一定的要求,具体可参阅图2A(终端顶部一前一后共两个麦克风)、图2B(终端顶部一前一后、底部一个,共三个麦克风)和图2C(终端顶部一前一后,底部一左一右共四个麦克风)所示的几种布局,当然也可以包括其他的布局方式。可通过操作系统获取到底层麦克风采集到的声音数据,可实现基本的通话功能。
本领域技术人员可以理解,图1仅仅是便携式多功能装置的举例,并不构成对便携式多功能装置的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件。
所述输入单元130可用于接收输入的数字或字符信息,以及产生与所述便携式多功能装置的用户设置以及功能控制有关的键信号输入。具体地,输入单元130可包括触摸屏131以及其他输入设备132。所述触摸屏131可收集用户在其上或附近的触摸操作(比如用户使用手指、关节、触笔等任何适合的物体在触摸屏上或在触摸屏附近的操作),并根据预先设定的程序驱动相应的连接装置。触摸屏可以检测用户对触摸屏的触摸动作,将所述触摸动作转换为触摸信号发送给所述处理器170,并能接收所述处理器170发来的命令并加以执行;所述触摸信号至少包括触点坐标信息。所述触摸屏131可以提供所述终端100和用户之间的输入界面和输出界面。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触摸屏。除了触摸屏131,输入单元130还可以包括其他输入设备。具体地,其他输入设备132可以包括但不限于物理键盘、功能键(比如音量控制按键132、开关按键133等)、轨迹球、鼠标、操作杆等中的一种或多种。
所述显示单元140可用于显示由用户输入的信息或提供给用户的信息以及终端100的各种菜单。进一步的,触摸屏131可覆盖显示面板141,当触摸屏131检测到在其上或附近的触摸操作后,传送给处理器170以确定触摸事件的类型,随后处理器170根据触摸事件的类型在显示面板141上提供相应的视觉输出。在本实施例中,触摸屏与显示单元可以集 成为一个部件而实现终端100的输入、输出、显示功能;为便于描述,本发明实施例以触摸显示屏代表触摸屏和显示单元的功能集合;在某些实施例中,触摸屏与显示单元也可以作为两个独立的部件。
所述存储器120可用于存储指令和数据,存储器120可主要包括存储指令区和存储数据区,存储数据区可存储关节触摸手势与应用程序功能的关联关系;存储指令区可存储操作系统、应用、至少一个功能所需的指令等软件单元,或者他们的子集、扩展集。还可以包括非易失性随机存储器;向处理器170提供包括管理计算处理设备中的硬件、软件以及数据资源,支持控制软件和应用。还用于多媒体文件的存储,以及运行程序和应用的存储。
处理器170是终端100的控制中心,利用各种接口和线路连接整个手机的各个部分,通过运行或执行存储在存储器120内的指令以及调用存储在存储器120内的数据,执行终端100的各种功能和处理数据,从而对手机进行整体监控。可选的,处理器170可包括一个或多个处理单元;优选的,处理器170可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器170中。在一些实施例中,处理器、存储器、可以在单一芯片上实现,在一些实施例中,他们也可以在独立的芯片上分别实现。处理器170还可以用于产生相应的操作控制信号,发给计算处理设备相应的部件,读取以及处理软件中的数据,尤其是读取和处理存储器120中的数据和程序,以使其中的各个功能模块执行相应的功能,从而控制相应的部件按指令的要求进行动作。
摄像头150用于采集图像或视频,可以通过应用程序指令触发开启,实现拍照或者摄像功能。
所述射频单元110可用于收发信息或通话过程中信号的接收和发送,特别地,将基站的下行信息接收后,给处理器170处理;另外,将设计上行的数据发送给基站。通常,RF电路包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器(Low Noise Amplifier,LNA)、双工器等。此外,射频单元110还可以通过无线通信与网络设备和其他设备通信。所述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统(Global System of Mobile communication,GSM)、通用分组无线服务(General Packet Radio Service,GPRS)、码分多址(Code Division Multiple Access,CDMA)、宽带码分多址(Wideband Code Division Multiple Access,WCDMA)、长期演进(Long Term Evolution,LTE)、电子邮件、短消息服务(Short Messaging Service,SMS)等。
音频电路160、扬声器161、麦克风162可提供用户与终端100之间的音频接口。音频电路160可将接收到的音频数据转换后的电信号,传输到扬声器161,由扬声器161转换为声音信号输出;另一方面,麦克风162用于收集声音信号,还可以将收集的声音信号转换为电信号,由音频电路160接收后转换为音频数据,再将音频数据输出处理器170处理后,经射频单元110以发送给比如另一终端,或者将音频数据输出至存储器120以便进一步处理,音频电路也可以包括耳机插孔163,用于提供音频电路和耳机之间的连接接口。
终端100还包括给各个部件供电的电源190(比如电池),优选的,电源可以通过电源管理系统与处理器170逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。
终端100还包括外部接口180,所述外部接口可以是标准的Micro USB接口,也可以使 多针连接器,可以用于连接终端100与其他装置进行通信,也可以用于连接充电器为终端100充电。
尽管未示出,终端100还可以包括闪光灯、无线保真(wireless fidelity,WiFi)模块、蓝牙模块、各种传感器等,在此不再赘述。
在一些场景中,用户使用移动终端例如手机,进行视频录制或实时摄像时,用户一般希望摄像到的视频中不包含摄像头后方的声音。然而在信噪比较低的环境中,来自摄像头后方的干扰噪声源,容易被定位成摄像头摄像范围内的声源,声源定位容易出现误判,准确性较差。因此,本发明实施例提供一种声音处理方法和装置,以提高声源定位的准确性,降低误判,有效滤除来自摄像头后方的噪声,本发明实施例中也可以叫做后向声音信号。作为说明,以终端机身所在的平面为界,声源在摄像头后方区域(如,对于前置摄像头,前置摄像头后方可以理解为机身背面一侧的区域;如,对于后置摄像头,后置摄像头后方可以理解为机身正面一侧的区域)的噪声可以被理解为后向声音信号。上面所提到的区域允许存在一定的界定误差。
Referring to FIG. 3, an embodiment of the present invention provides a sound processing method that can be applied to a terminal having two microphones at its top, located on the front and the back of the terminal respectively. The terminal may be the terminal 100 shown in FIG. 1, and the microphones may be arranged in any of the layouts shown in FIG. 2A, FIG. 2B or FIG. 2C. The specific procedure includes the following steps:
Step 31: When it is detected that the camera of the terminal is in a shooting state, collect, by using the two microphones, the sound signal in the current environment of the terminal. In the time domain, the sound signal can be divided more finely into frame signals; the frame length depends on a preset division algorithm, so each frame has a corresponding sound signal, and the working microphones collect the sound signal of the current frame.
Step 32: Calculate the ILD between the two microphones from the collected sound signal according to the first preset algorithm. In the calculation, each frame signal may be processed to obtain the ILD of the two microphones for that frame.
Step 33: Determine whether the ILD between the two microphones satisfies the sound-source direction decision condition.
Step 34: If the condition is satisfied, determine, based on the ILD between the two microphones, whether the sound signal contains a backward sound signal, i.e. a signal whose source is located behind the camera. A backward sound signal can also be understood as a kind of noise signal.
Step 35: If the sound signal contains a backward sound signal, filter the backward sound signal out of the sound signal.
Specifically, steps 31 and 32 may be implemented as follows:
The terminal can use a preset detection program to identify whether the camera is on, for example by checking whether the camera has been enabled. Once the camera is detected to be in a shooting state, the terminal collects the sound signal in the current environment with the top-front and top-back microphones; in principle, the current-frame signals can be denoted S1 and S2 respectively. Based on S1 and S2, their power spectra P1 and P2 are computed by the fast Fourier transform (FFT) algorithm, and the ILD between the two microphones is computed from P1 and P2. Those skilled in the art will understand that a sound signal may consist of multiple frame signals. In addition, if the terminal has two cameras, in the specific implementation of step 31, when camera enablement is detected it can usually also be detected whether the front or the rear camera is in use, so that the processor can select a suitable algorithm for subsequent signal processing according to the camera position.
A specific implementation process is as follows:
The sound signals collected by the two microphones are first sent to an FFT module, which performs the time-frequency transform to obtain the signal spectra. Specifically, the FFT module processes the signals with the short-time Fourier transform (STFT).
Taking the two top microphones (front and back) as an example, let the signals picked up by the two microphones be x_i(n), i = 1, 2. When the front camera is used, i = 1, 2 denote the front and back microphones respectively; when the rear camera is used, i = 1, 2 denote the back and front microphones. n is the number of samples in one frame; at an 8 kHz sampling rate with a 10 ms frame length, n equals 80. After the Fourier transform the signal is x_i(N, l), where N is the frequency-bin index of the frame and l is the frame number; each l-th frame, at the moment it is collected, can be understood as the current frame. The power spectrum of the signal is
P_i(N, l) = |x_i(N, l)|²
and the interaural level difference (ILD) of the two mics for the current frame is computed by the following formula:
ILDnow = (P1 − P2) / (P1 + P2)
Taking front-camera shooting as an example (rear-camera shooting is similar in principle), P1 is the power spectrum of the top-front microphone for the current frame and P2 that of the top-back microphone; both P1 and P2 are vectors of N elements, the values of the N frequency bins obtained by the FFT of the current-frame signal, where N is an integer greater than 1 and its value is determined by the preset bin-division rule; ILDnow is the vector of ILDs at the N bins. The algorithm in this example is only one implementation of the first preset algorithm and is not limiting.
FIG. 4A shows the relationship between the ILD and the energy difference, in decibels (dB), of the top-front and top-back mics.
As shown in FIG. 4A, the ILD ranges from −1 to 1. A value of 1 means that, for the current frame, the energy at the top-front microphone is significantly greater than that at the top-back microphone, indicating a forward sound signal; −1 means the energy at the top-back microphone is significantly greater, indicating a backward sound signal.
However, using the ILD to identify a backward sound signal is accurate only when the ILD satisfies certain conditions. Specifically, in step 33, judging from the ILD of the two microphones whether the condition for using the ILD to determine the source direction is satisfied includes the following process:
using the ILD of the two microphones at the current bin, compute the maximum ILD value and minimum ILD value of the two microphones at that bin; when the difference between the maximum and the minimum is greater than a first threshold, determine that the condition for judging the source direction by the ILD is satisfied at the current bin; when the difference is not greater than the first threshold, determine that the condition is not satisfied at the current bin. If, within one frame, M of the N bins satisfy the sound-source direction decision condition, with M ≥ N/2, the ILD of the current frame is determined to satisfy the condition; that is, the ILD can then be used to judge whether the current frame contains a backward sound signal.
Here ILDmax denotes the maximum ILD at the i-th bin (one of the bins of the current frame) and ILDmin the minimum ILD at the i-th bin. At the first bin of an initial frame, they may be set to 0 or to a preset ILD of the top-front and top-back microphones. Specifically, when ILDmax − ILDmin > the first threshold, the ILD between the two microphones is considered to satisfy the decision condition at the i-th bin; otherwise the condition is not satisfied at the i-th bin. ILDmax is the maximum computed from the ILD of the two microphones at the i-th bin, and ILDmin the corresponding minimum.
ILDmax and ILDmin are calculated as follows:
when the ILD of the two microphones at the i-th bin is not greater than the maximum at the (i−1)-th bin (the previous bin), the maximum at the i-th bin is computed as
ILDmax = αlow*ILDnow + (1-αlow)*ILDmax′;
when the ILD at the i-th bin is greater than the maximum at the (i−1)-th bin, the maximum at the i-th bin is computed as
ILDmax = αfast*ILDnow + (1-αfast)*ILDmax′;
when the ILD at the i-th bin is greater than the minimum at the (i−1)-th bin, the minimum at the i-th bin is computed as
ILDmin = αlow*ILDnow + (1-αlow)*ILDmin′;
when the ILD at the i-th bin is not greater than the minimum at the (i−1)-th bin, the minimum at the i-th bin is computed as
ILDmin = αfast*ILDnow + (1-αfast)*ILDmin′;
where ILDnow is the ILD of the two microphones at the i-th bin, ILDmax and ILDmin are the maximum and minimum reference values at the i-th bin, ILDmax′ and ILDmin′ are those at the (i−1)-th bin, and αfast, αlow are preset step sizes with αfast > αlow (suggested values 0.95 and 0.05). ILDmax is thus obtained by smoothing the current ILDnow with the previous bin's maximum, and ILDmin by smoothing ILDnow with the previous bin's minimum.
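The tracking of the reference values and the per-bin decision condition described above can be sketched as follows (illustrative only; function names and the threshold value in the example are assumptions):

```python
def update_ild_bounds(ild_now, ild_max_prev, ild_min_prev,
                      a_fast=0.95, a_low=0.05):
    """Smoothed max/min reference values for one frequency bin
    (second preset algorithm). a_fast > a_low, so each bound moves
    quickly toward a new extreme and decays slowly otherwise."""
    if ild_now > ild_max_prev:
        ild_max = a_fast * ild_now + (1 - a_fast) * ild_max_prev
    else:
        ild_max = a_low * ild_now + (1 - a_low) * ild_max_prev
    if ild_now < ild_min_prev:
        ild_min = a_fast * ild_now + (1 - a_fast) * ild_min_prev
    else:
        ild_min = a_low * ild_now + (1 - a_low) * ild_min_prev
    return ild_max, ild_min

def bin_condition_met(ild_max, ild_min, first_threshold):
    """A bin satisfies the direction-decision condition when the spread
    between the tracked extremes exceeds the first threshold."""
    return (ild_max - ild_min) > first_threshold

def frame_condition_met(bin_flags, n_bins):
    """Frame-level decision: at least N/2 of the N bins must satisfy
    the per-bin condition."""
    return sum(bin_flags) >= n_bins / 2
```

A large max-min spread means the ILD statistics are informative enough to separate forward from backward sources at that bin; a small spread means the ILD cannot be trusted and another cue (e.g. phase difference) should be used.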
进一步的,如果当前帧声压差满足声源判定条件,则基于终端的顶部正面和背面两个麦克风的声压差,确定当前帧声音信号中是否包含后向声音信号,并在确定出所述当前帧声音信号中包括后向声音信号时,将后向信号滤除。
具体实现过程中,如图2A所示,在终端的顶部正面和背面各具有1个麦克风的布局情形下,步骤34可以具体为:在所述两个麦克风在第j频点对应的声压差小于第二门限值时,确定所述声音信号对应的第j频点处包含后向声音信号;在所述两个麦克风在第j频点对应的声压差不小于第二门限值时,确定所述声音信号对应的第j频点处不包含后向声音信号。
步骤35可以具体为:若确定声音信号对应的第j频点处包含后向声音信号且终端正在拍摄的摄像头为前置摄像头时,则以顶部背面麦克风采集的声音信号作为参考信号,控制 所述终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中的后向声音信号;若正在拍摄的摄像头为后置摄像头,则以顶部正面麦克风采集的声音信号作为参考信号,控制所述终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中的后向声音信号。如可以采用NLMS自适应滤波器方案。频域滤波器是时域滤波器的等效形式,两种滤波方式在信号处理时的原理是可以进行等效的,这些都是现有技术,详细滤除过程不过详述。
值得说明的是,任意一个顶端包含前后两个麦克风的终端都可以使用此方法,如图2B、2C均可。
然而,通常摄像头能投摄取的视角范围为120度左右,而并非整个摄像头的前方区域,因此还可能存在声源位于摄像头前方且超出摄像头摄像范围的噪声信号,这部分噪声信号相对后向声音信号对摄像内容的影响较小,可以被定义为次级噪声信号,上面所提到的区域允许存在一定的界定误差。因此在具体实现过程中,除了滤除后向声音信号那部分噪声之外,还可以进一步滤除次级噪声,可以参照下列两个示例。
示例一:
终端的顶部正面和背面以及底部各具有1个麦克风的布局情形。下面以终端的顶部正面和背面以及底部左边各具有1个麦克风的布局情形为例,如图2B所示,此时,无论之前是否存在后向声音信号,上述声音处理方法还可以包括以下步骤:步骤36:若使用前置摄像头拍摄时,还可以针对底部左面麦克风和顶部正面麦克风采集到的声音信号进行时延差定位,得到声音信号的上下方位角;在所述上下方位角大于第一预设角度时,确定当前帧的声音信号中包含次级噪声信号。次级噪声信号为位于前置摄像头前方且位于前置摄像头摄像范围边界以外的噪声信号。若确定出当前帧的声音信号中包含次级噪声信号时,以顶部背面麦克风采集的声音信号作为参考信号,控制所述终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中的次级噪声信号。
若使用后置摄像头拍摄时,还可以针对底部左面麦克风和顶部后面麦克风采集到的声音信号进行时延差定位,得到声音信号的上下方位角;在上下方位角大于第一预设角度(与上一段中的第一预设角度可以相同也可以不同)时,确定当前帧的声音信号中包含次级噪声信号。此时,次级噪声信号为位于后置摄像头前方且位于后置摄像头摄像范围边界以外的噪声信号。若确定出当前帧的声音信号中包含次级噪声信号时,以顶部正面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中的次级噪声信号。
在具体实现过程中,终端的顶部正面和背面以及底部右边各具有1个麦克风的布局情形与上面的实例原理极为相似,本领域技术人员能够基于本发明实例轻松实现类似实例方式,此处不加以赘述。
示例二:
终端的顶部正面和背面以及底部左面和右面各具有1个麦克风的布局情形下,为引用方便,底部的两个麦克风也可以称作第三麦克风、第四麦克风。如图2C所示,此时,无论之前是否存在后向声音信号,上述声音处理方法还可以包括以下步骤:
步骤37:若使用前置摄像头拍摄时,还可以针对底部左面麦克风和顶部正面麦克风采集到的声音信号进行时延差定位,得到声音信号的上下方位角;在所述上下方位角大于第 一预设角度时,确定当前帧的声音信号中包含次级噪声信号。次级噪声信号为位于前置摄像头前方且位于前置摄像头摄像范围边界以外的噪声信号。进一步地,还针对所述第三麦克风和所述第四麦克风采集到的当前帧的声音信号进行时延差定位,得到当前帧的声音信号的左右方位角;在左右方位角大于第二预设角度,确定当前帧的声音信号中包含次级噪声信号。通过上述步骤,若确定出当前帧的声音信号中包含次级噪声信号时,则以顶部背面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中所有的次级噪声信号。另外,上下方位角和左右方位角所能检测到的是噪声虽然都属于次级噪声信号,但侧重的噪声声源的方位是略有侧重的,如上下方位角更侧重检测终端所在平面上下方向上的噪声,而左右方位角更侧重检测终端所在平面左右方向上的噪声。
若使用后置摄像头拍摄时,还可以针对底部左面麦克风和顶部后面麦克风采集到的声音信号进行时延差定位,得到声音信号的上下方位角;在上下方位角大于第一预设角度时,确定当前帧的声音信号中包含次级噪声信号。此时,次级噪声信号为位于后置摄像头前方且位于后置摄像头摄像范围边界以外的噪声信号。进一步地,还可以针对第三麦克风和第四麦克风采集到的当前帧的声音信号进行时延差定位,得到当前帧的声音信号的左右方位角;在左右方位角大于第二预设角度(与上一段中的第二预设角度可以相同也可以不同),确定当前帧的声音信号中包含次级噪声信号;通过上述步骤,若确定出当前帧的声音信号中包含次级噪声信号时,以顶部正面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中所有的次级噪声信号。另外,上下方位角和左右方位角所能检测到的是噪声虽然都属于次级噪声信号,但侧重的噪声声源的方位是略有侧重的,如上下方位角更侧重检测终端所在平面上下方向上的噪声,而左右方位角更侧重检测终端所在平面左右方向上的噪声。
由此可知,可以利用终端前后两个麦克风声压差信息,进行前后向的声源方位估计。此外,也可以利用时延差信息进行前后向的声源方位估计,如图4B中θ1角(前后方位角),即为利用前后麦克风的时延差进行方位估计得到的角度值。这里前后麦克风分别指的是顶部正面和背面的麦克风。具体计算方法参照图5中的时延差计算方法。时延差的计算方法属于现有技术,本发明中不予以赘述。
当底部存在麦克风时,类似图2B中的麦克风布局。增加了底部左面麦克风,图4B中用mic2表示,此时利用底部的mic2和前mic,采用基于时延差的方位角估计方法,计算θ2。当底部具有两个麦克风时,类似图2C中的麦克风布局。增加了底部左面和右面麦克风,图4B中分别用mic2、mic1表示,此时可利用底部mic1和底部mic2,采用基于时延差的方位角估计方法,计算θ3。如图4B中所示,前后mic可对空间坐标系x、y、z轴中的x轴进行角度解析,前mic和底部mic2可对y轴进行角度解析,底部mic1和底部mic2可对z轴进行方位角解析。
此时,通过三个角度值θ1、θ2、θ3(前后方位角、左右方位角、上下方位角),即可实现空间的声源定位功能,进而定位出目标声源是否在摄像头的摄像范围内。其中,前后方位、左右方位、上下方位都是以手机的机身作为参照的,如前后方位指的是手机的正面和背面方向,左右方位指机身两侧的方向,上下方位指机身顶部和底端的方向。
仍以前置摄像头拍摄为例(注:后置摄像头拍摄虽是不同场景,但方案实现原理类似,故本发明不再做赘述,并非限定应用场景,全文皆如此),众所周知,摄像头的视场角在终 端上的显示,为两个开角,分别为开角1和开角2;开角1对应z轴方向,开角2对应y轴方向。本算法首先利用声压差或时延差的角度估计方法,区分了前后向的声源信号,接下来就是利用θ2对y轴的开角进行约束,当θ2大于摄像头的开角2时,此时的声音信号中包含次级噪声信号。θ3同理,当θ3大于开角1时,声音信号中包含次级噪声信号。值得说明的是,次级噪声信号是一个相对更上位的概念,用θ2和θ3进行判定的方法所检测的次级噪声方位并不相同,采用θ2主要偏重检测左右方位的次级噪声,采用θ3主要偏重检测上下方位的次级噪声,θ2和θ3在声源方位的判定上起到相互补充的作用。
在具体实现过程中,基于声压差的声源定位方法,在麦克风布局为图2A所示的情形下,可以得到当前帧的声音信号是否包含后向声音信号,在当前帧信号不包含后向声音信号时,输出语音活动检测(Voice Activity Detection,VAD)标志为0;在当前帧的声音信号包含后向声音信号时,认为含有噪声声源,输出VAD标志为1;在麦克风布局为图2B和图2C所示的情形下,可以得到当前帧的声音信号是否包含后向声音信号,在当前帧的声音信号包含后向声音信号时,输出VAD标志为1;在当前帧信号不包含后向声音信号时,进一步的确定是否包括次级噪声信号,若不包括次级噪声信号,则输出语音活动检测(Voice Activity Detection,VAD)标志为0;否则,输出VAD标志为1;其中,VAD标志默认为0。总之,若同时考虑到后向声音信号和次级噪声的影响,当前声音信号中含有次级噪声或者后向声音信号中的任意一个,VAD将被置为1;若只考虑到后向声音信号的影响,只要含有后向声音信号,则VAD将被置为1;显然前者对声源更为敏感,要求更高,这些可以事先由用户灵活配置。
需要说明的是,在当前ILD信息无效时,即利用顶部正面和背面两个麦克风的声压差,确定不满足利用声压差判断声源方向的条件,此时,使用传统的基于相位差的声源定位技术进行声源定位判断,具体方法如下所示:
As shown in FIG. 5, when a far-field source signal arrives as a plane wave, the different incident angles make the signal reach the two mics at different times. For a waveform of any single frequency, this time difference corresponds to a change in phase, i.e. a phase difference.
For a wave of frequency f arriving at incident angle θ, the phase difference between the two mics is
Δφ = 2πf·d·sin(θ)/c
where f is the frequency, c is the speed of sound, and d is the mic spacing. At 0° incidence the phase difference is 0, and it reaches its maximum
Δφmax = 2πf·d/c
at end-fire incidence. In FIG. 5, h can be understood as the phase difference observed between the two mics and d as the maximum possible phase difference, so the incident angle = asin(h/d). The phase difference of the two mics at a bin is the difference of the phase angles of the complex bin values (a complex number can be written in trigonometric form as x + yi = |A|(cos θ + i sin θ), where A is the modulus).
There are many whole-frame phase-difference localization methods; a common one is generalized cross-correlation (GCC), implemented as shown in FIG. 6:
In FIG. 6, x1 and x2 are the time-domain signals received by the two mics and FFT denotes the fast Fourier transform. The computed peak index τ12 of the cross-correlation is the corresponding delay in samples, so the time difference is
Δt = τ12 / Fs
and the incident angle can be calculated as
θ = asin(c·τ12 / (Fs·d))
where c is the speed of sound, d is the mic spacing, and Fs is the sampling rate.
With phase-difference-based localization, the incident angle of the whole current frame and of each bin can be obtained. When both the whole-frame and the per-bin incident angles fall outside the beam pickup range (the beam range is preset), the current sound signal is considered to contain a backward sound signal, i.e. a noise source, and the VAD flag is output as 1; otherwise the VAD flag is output as 0. The VAD flag defaults to 0.
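The GCC delay estimate and the angle conversion above can be sketched as follows (illustrative: plain FFT-based cross-correlation without PHAT weighting; function name, mic spacing, and other parameter values are assumptions):

```python
import numpy as np

def gcc_incident_angle(x1, x2, fs=8000, d=0.10, c=340.0):
    """Estimate the TDOA between two mics from the peak of their
    cross-correlation, then convert the delay tau (in samples) to an
    incident angle via asin(c * tau / (fs * d)), in degrees."""
    n = len(x1) + len(x2) - 1
    nfft = 1 << (n - 1).bit_length()          # next power of two
    X1 = np.fft.rfft(x1, nfft)
    X2 = np.fft.rfft(x2, nfft)
    cc = np.fft.irfft(X1 * np.conj(X2), nfft)  # circular cross-correlation
    # Reorder to lags -(len(x2)-1) .. (len(x1)-1).
    cc = np.concatenate((cc[-(len(x2) - 1):], cc[:len(x1)]))
    tau = int(np.argmax(cc)) - (len(x2) - 1)   # delay in samples
    s = np.clip(c * tau / (fs * d), -1.0, 1.0) # clamp asin argument
    return np.degrees(np.arcsin(s))
```

A negative returned angle indicates the signal reached the first mic before the second; in practice the per-bin phase angles would be used alongside this whole-frame estimate.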
Specifically, when the VAD flag is output as 0, the sound signal collected by the top-back microphone is used as the reference signal, and the terminal's adaptive filter is controlled to filter the noise component out of the signal collected by the top-front microphone. A specific implementation is to feed the VAD flag to the beamformer; optionally, a normalized least-mean-square (NLMS) filter is used. The NLMS filter adaptively generates a desired signal from the reference signal and subtracts it from the target signal to obtain a residual, with residual minimization as the design objective. The filter step size is guided by the localization result above: when the frame is judged a forward sound signal, i.e. the target source, the step size is 0 and the filter is not updated; when it is judged a backward sound signal containing a noise source, the step size is maximal and the filter is updated. The reference signal of the filter is the signal from the direction opposite to the target user: for example, when the speaker is directly in front of the screen, the signal of the top-back mic is taken as the reference, and vice versa. The updated filter coefficients are multiplied by the input (in) signal to obtain an output signal from which the backward noise component has been removed.
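A minimal sketch of one NLMS update step gated by the localization result (the function signature, step-size value, and gating convention here are assumptions, not the terminal's actual implementation):

```python
import numpy as np

def nlms_step(w, ref, target, vad, mu_max=0.5, eps=1e-8):
    """One NLMS update. `ref` is a window of the reference signal
    (the opposite-direction mic), `target` the primary-mic sample,
    `vad` the localization flag: adaptation runs at step size mu_max
    when the frame is judged noise (vad == 1) and is frozen (step 0)
    when it is judged the target source (vad == 0)."""
    y = np.dot(w, ref)            # filter output (noise estimate)
    e = target - y                # residual = primary minus noise estimate
    mu = mu_max if vad else 0.0   # localization result gates adaptation
    w = w + mu * e * ref / (np.dot(ref, ref) + eps)
    return w, e
```

Freezing the step size on target-source frames prevents the filter from learning to cancel the desired speech, while noise-only frames drive it toward the noise path.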
进一步的,针对波束后的信号,再利用声源定位结果,指导波束后的后处理降噪。当频点的声源定位结果为包含噪声时,更新该频点的噪声能量,并使用传统的维纳滤波算法,进行后处理增益压制。对波束形成后的信号,进行进一步的降噪处理。
接着,将后处理处理后的信号,送给回声消除模块,进行进一步的回声消除。由于波束形成及后处理部分,本身对回声信号已有一定的消除作用。因为该场景下,喇叭所处的位置一般在手机的底部或背部,喇叭产生的信号方位,属于噪声方位。所以,相较于传统的回声消除技术,麦克风阵列的波束形成及后处理技术,会使回声更小,更易于消除。
因为中高频信号被终端遮挡时,可产生显著的遮挡效应。当低信噪比或多声源场景时,时延差信息已经非常浑浊,是多个方位声源的混合。基于时延差的声源定位的结果呈现随机性。此时,声压差信息,虽然也是混合的。但是只要终端正面和背面的声源,在两个mic上产生的声压差,有相对的差异性,即可利用进行声源定位,尤其视频通话场景,当说话人、噪声源等声源,距离终端不是很远时,该声压差信息更加可靠。
具体实现过程中,终端100上的mic麦克风布局,包括但不限于图2A、图2B或图2C中的任意一种布局。图2A、图2B或图2C中的麦克风布局属于典型的直列式布局。麦克风数量增多时,可以对波束的拾音范围进行更好的区分,使波束范围更准确,可实现空间3D声源定位。采用图2A中的麦克风布局时,可以有效区分前后的信号。当使用图2B中的mic布局时,不仅可以有效区分前后的信号,由于底部增加了一个mic(左右位置不限定),可进行手机上下方向的声源方位区分。当使用图2C中的mic布局时,不仅可以有效区分前后的信号,由于底部新增两个左右分布的mic,可进行上下方位、左右方位的声源方位区分,可实现空间3D的声源定位。
本发明提供了一种声音处理方法,该方法应用于顶部具有两个麦克风的终端上,这两 个麦克风分别位于终端的正面和背面,检测到终端的摄像头处于拍摄状态时,利用两个麦克风采集当前环境中的声音信号;根据采集到当前帧的声音信号按照第一预设算法计算两个麦克风之间的声压差;判断当前帧的两个麦克风之间的声压差是否满足声源方向判定条件;若满足声源方向判定条件,则根据当前帧的两个麦克风之间的声压差,确定出当前帧的声音信号中是否包含后向声音信号,后向声音信号为声源位于所述摄像头后方的声音信号;若确定出当前帧的声音信号中包含后向声音信号,则将当前帧的声音信号中的后向声音信号进行滤除。采用该方法,可以在摄像时,将摄像范围外的噪声信号滤除,保证了拍摄时的视频的声音质量,提高用户的体验。
如图7所示,基于上述实施例提供的声音处理方法,本发明实施例提供一种声音处理装置700,所述装置700应用于顶部具有两个麦克风的终端上,所述两个麦克风分别位于所述终端的正面和背面,如图7所示,该装置700包括采集模块701、计算模块702、判断模块703、确定模块704和滤除模块705,其中:
采集模块701,用于检测到所述终端的摄像头处于拍摄状态时,利用所述两个麦克风采集所述终端所处当前环境中的声音信号。该采集模块可以由处理器实现,可以调用本地存储器或云端服务器中的程序指令,监测摄像头的摄像功能是否使能;如果监测到摄像头已经使能,则处理器可以进一步控制麦克风采集声音信号,进一步地,还可以通过音频电路将采集到的信号转换为数字信号。
一种具体实现过程中,采集模块701可以包含检测单元701a和接收单元701b,检测单元701a用于检测是否有摄像头处于使能状态,并且能区分出前后摄像头;如果检测到有摄像头存在使能,则有接收单元701b进一步采集当前环境中的声音信号。这两个单元也都可以通过处理器调用存储器中的程序和指令来实现相应功能。
计算模块702,用于根据采集模块701采集到的声音信号按照第一预设算法计算所述两个麦克风之间的声压差。该计算模块可以由处理器实现,可以通过调用本地存储器或云端服务器中的声压差算法程序,对上述采集到的声音信号进行计算处理,得到声压差。
判断模块703,用于判断所述计算模块702计算出来的两个麦克风之间的声压差是否满足声源方向判定条件。该判断模块可以由处理器实现,可以通过调用本地存储器或云端服务器中的判断算法程序,进行相应计算,得到判断结果。确定模块704,用于当所述判断模块703得出满足所述声源方向判定条件,则根据所述两个麦克风之间的声压差,确定出所述声音信号中是否包含后向声音信号。该确定模块可以由处理器实现,当接收到结果为满足时,可以通过调用本地存储器或云端服务器中的后向声音判断算法程序,确定出声音信号中是否包含后向声音信号。
滤除模块705,用于若所述确定模块704确定出所述声音信号中包含后向声音信号,则将所述声音信号中的后向声音信号进行滤除。该滤除模块可以由处理器实现,当确定出声音信号包含后向声音信号时,可以通过调用本地存储器或云端服务器中的噪声滤除算法程序,将声音信号中的后向声音信号进行滤除。
然而,通常摄像头能投摄取的视角范围为120度左右,而并非整个摄像头的前方区域,因此还可能存在声源位于摄像头前方且超出摄像头摄像范围的噪声信号,这部分噪声信号相对后向声音信号对摄像内容的影响较小,可以被理解为次级噪声信号,上面所提到的区 域允许存在一定的界定误差。因此在具体实现过程中,除了滤除后向声音信号那部分噪声之外,还可以进一步滤除次级噪声,可以参照下列两个示例。
示例三:
终端的顶部正面和背面以及底部各具有1个麦克风的布局情形。下面以终端的顶部正面和背面以及底部左边各具有1个麦克风的布局情形为例,如图2B所示,此时,无论之前是否存在后向声音信号,上述装置还可以包括次级噪声滤除模块706,用于执行以下步骤:
若采集模块701检测到终端使用前置摄像头拍摄时,还可以针对底部左面麦克风和顶部正面麦克风采集到的声音信号进行时延差定位,得到声音信号的上下方位角;在所述上下方位角大于第一预设角度时,确定当前帧的声音信号中包含次级噪声信号。次级噪声信号为位于前置摄像头前方且位于前置摄像头摄像范围边界以外的噪声信号。若确定出当前帧的声音信号中包含次级噪声信号时,以顶部背面麦克风采集的声音信号作为参考信号,控制所述终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中的次级噪声信号。
若采集模块701检测到终端使用后置摄像头拍摄时,还可以针对底部左面麦克风和顶部后面麦克风采集到的声音信号进行时延差定位,得到声音信号的上下方位角;在上下方位角大于第一预设角度(与上一段中的第一预设角度可以相同也可以不同)时,确定当前帧的声音信号中包含次级噪声信号。此时,次级噪声信号为位于后置摄像头前方且位于后置摄像头摄像范围边界以外的噪声信号。若确定出当前帧的声音信号中包含次级噪声信号时,以顶部正面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中的次级噪声信号。
示例四:
终端的顶部正面和背面以及底部左面和右面各具有1个麦克风的布局情形下,为引用方便,底部的两个麦克风也可以称作第三麦克风、第四麦克风。如图2C所示,此时,无论之前是否存在后向声音信号,上述装置还可以包括次级噪声滤除模块,用于执行以下步骤:
若采集模块701检测到终端使用前置摄像头拍摄时,还可以针对底部左面麦克风和顶部正面麦克风采集到的声音信号进行时延差定位,得到声音信号的上下方位角;在所述上下方位角大于第一预设角度时,确定当前帧的声音信号中包含次级噪声信号。次级噪声信号为位于前置摄像头前方且位于前置摄像头摄像范围边界以外的噪声信号。进一步地,还针对所述第三麦克风和所述第四麦克风采集到的当前帧的声音信号进行时延差定位,得到当前帧的声音信号的左右方位角;在左右方位角大于第二预设角度,确定当前帧的声音信号中包含次级噪声信号。通过上述步骤,若确定出当前帧的声音信号中包含次级噪声信号时,则以顶部背面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中所有的次级噪声信号。
若采集模块701检测到终端使用后置摄像头拍摄时,还可以针对底部左面麦克风和顶部后面麦克风采集到的声音信号进行时延差定位,得到声音信号的上下方位角;在上下方位角大于第一预设角度时,确定当前帧的声音信号中包含次级噪声信号。此时,次级噪声信号为位于后置摄像头前方且位于后置摄像头摄像范围边界以外的噪声信号。进一步地,还可以针对第三麦克风和第四麦克风采集到的当前帧的声音信号进行时延差定位,得到当前帧的声音信号的左右方位角;在左右方位角大于第二预设角度(与上一段中的第二预设 角度可以相同也可以不同),确定当前帧的声音信号中包含次级噪声信号;通过上述步骤,若确定出当前帧的声音信号中包含次级噪声信号时,以顶部正面麦克风采集的声音信号作为参考信号,控制终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中所有的次级噪声信号。
对于以上次级噪声滤除模块,可以由处理器实现,当确定出声音信号包含次级噪声信号时,可以通过调用本地存储器或云端服务器中的次级噪声滤除算法程序,将声音信号中所有的次级噪声信号进行滤除。
在具体实现过程中,采集模块701具体用于执行步骤31中所提到的方法以及可以等同替换的方法;计算模块702具体用于执行步骤32中所提到的方法以及可以等同替换的方法;判断模块703具体用于执行步骤33中所提到的方法以及可以等同替换的方法;确定模块704具体用于执行步骤34中所提到的方法以及可以等同替换的方法;滤除模块705具体用于执行步骤35中所提到的方法以及可以等同替换的方法;次级噪声滤除模块706具体用于执行步骤36或37中所提到的方法以及可以等同替换的方法。其中,上述具体的方法实施例以及实施例中的解释和表述也适用于装置中的方法执行。
本发明提供了一种声音处理装置,该装置应用于顶部具有两个麦克风的终端上,这两个麦克风分别位于终端的正面和背面,该装置包括:采集模块701、计算模块702、判断模块703、确定模块704和滤除模块705;采集模块701在检测到终端的摄像头处于拍摄状态时,利用两个麦克风采集当前环境中的声音信号;计算模块702根据采集到当前帧的声音信号按照第一预设算法计算两个麦克风之间的声压差;判断模块703判断当前帧的两个麦克风之间的声压差是否满足声源方向判定条件;若满足声源方向判定条件,则确定模块704根据当前帧的两个麦克风之间的声压差,确定出当前帧的声音信号中是否包含后向声音信号,后向声音信号为声源位于所述摄像头后方的声音信号;若确定出当前帧的声音信号中包含后向声音信号,则滤除模块705将当前帧的声音信号中的后向声音信号进行滤除。采用该装置,可以在摄像时,将摄像范围外的噪声信号滤除,保证了拍摄时的视频的声音质量,提高用户的体验。
应理解以上装置700中的各个模块的划分仅仅是一种逻辑功能的划分,实际实现时可以全部或部分集成到一个物理实体上,也可以物理上分开。例如,以上各个模块可以为单独设立的处理元件,也可以集成在终端的某一个芯片中实现,此外,也可以以程序代码的形式存储于控制器的存储元件中,由处理器的某一个处理元件调用并执行以上各个模块的功能。此外各个模块可以集成在一起,也可以独立实现。这里所述的处理元件可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤或以上各个模块可以通过处理器元件中的硬件的集成逻辑电路或者软件形式的指令完成。该处理元件可以是通用处理器,例如中央处理器(英文:central processing unit,简称:CPU),还可以是被配置成实施以上方法的一个或多个集成电路,例如:一个或多个特定集成电路(英文:application-specific integrated circuit,简称:ASIC),或,一个或多个微处理器(英文:digital signal processor,简称:DSP),或,一个或者多个现场可编程门阵列(英文:field-programmable gate array,简称:FPGA)等。
本领域内的技术人员应明白,本发明的实施例可提供为方法、系统、或计算机程序产 品。因此,本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本发明的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例作出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本发明范围的所有变更和修改。
显然,本领域的技术人员可以对本发明实施例进行各种改动和变型而不脱离本发明实施例的精神和范围。这样,倘若本发明实施例的这些修改和变型属于本发明权利要求及其等同技术的范围之内,则本发明也意图包含这些改动和变型在内。

Claims (22)

  1. 一种声音处理方法,其特征在于,所述方法应用于顶部具有两个麦克风的终端上,所述两个麦克风分别位于所述终端的正面和背面,所述方法包括:
    当所述终端的摄像头处于拍摄状态时,利用所述两个麦克风采集所述终端所处当前环境中当前帧的声音信号;
    根据采集到当前帧的声音信号按照第一预设算法计算所述两个麦克风之间的声压差;
    判断所述当前帧的两个麦克风之间的声压差是否满足声源方向判定条件;
    若满足所述声源方向判定条件,则根据所述当前帧的两个麦克风之间的声压差,确定出所述当前帧的声音信号中是否包含后向声音信号,所述后向声音信号为声源位于所述摄像头后方的声音信号;
    若确定出所述当前帧的声音信号中包含后向声音信号,则将所述当前帧的声音信号中的后向声音信号进行滤除。
  2. 如权利要求1所述的方法,其特征在于,所述利用所述两个麦克风采集所述终端所处当前环境中当前帧的声音信号,包括:
    利用所述两个麦克风采集当前帧的声音信号,分别为S1、S2;
    所述根据采集到的声音信号按照第一预设算法计算所述两个麦克风之间的声压差,包括:
    基于S1、S2,利用快速傅里叶变换FFT算法计算S1、S2的功率谱,分别为P1、P2;
    根据P1、P2,利用以下公式计算所述两个麦克风之间的声压差;
    ILDnow = (P1 − P2) / (P1 + P2)
    其中,P1表示顶部正面麦克风在当前帧对应的声音功率谱,P2表示顶部背面麦克风在当前帧对应的声音功率谱,且P1和P2均为含有N个元素的向量,所述N个元素为当前帧声音信号进行快速傅里叶变换后对应的N个频点的值,N为大于1的整数;ILDnow为包含N个频点对应的声压差的向量。
  3. 如权利要求2所述的方法,其特征在于,所述判断所述当前帧的两个麦克风之间的声压差是否满足声源方向判定条件,包括:
    利用第i频点对应的所述两个麦克风的声压差,按照第二预设算法计算出所述第i频点对应的最大参考值和最小参考值,其中所述第i频点为所述N个频点中的一个,i取遍不大于N的所有正整数;
    如果所述第i频点的最大参考值与所述最小参考值之差大于所述第i频点对应的第一门限值,则确定所述两个麦克风之间的声压差在所述第i频点上满足声源方向判定条件;
    如果所述最大参考值与所述最小参考值之差不大于所述第i频点对应的第一门限值,则确定所述两个麦克风之间的声压差在所述第i频点上不满足声源方向判定条件;
    若所述N个频点中的M个频点满足声源方向判定条件,确定所述当前帧的两个麦克风之间的声压差满足声源方向判定条件,其中M大于等于N/2。
  4. 如权利要求3所述的方法,其特征在于,所述利用第i频点对应的所述两个麦克风的声压差,按照第二预设算法计算出所述第i频点对应的最大参考值和最小参考值,包括:
    获取第i-1频点对应的最大参考值,所述第i-1频点为所述第i频点的上一个频点,若所述第i频点对应的两个麦克风的声压差不大于所述第i-1频点对应的最大参考值时,利用以下公式计算所述第i频点对应的最大参考值,
    ILDmax=αlow*ILDnow+(1-αlow)*ILDmax′;
    若所述第i频点对应的两个麦克风的声压差大于所述第i-1频点对应的最大参考值时,利用以下公式计算所述第i频点对应的最大参考值,
    ILDmax=αfast*ILDnow+(1-αfast)*ILDmax′;
    获取第i-1频点对应的最小参考值,若所述第i频点对应的两个麦克风的声压差大于所述第i-1频点对应的最小参考值时,利用以下公式计算所述第i频点对应的最小参考值,
    ILDmin=αlow*ILDnow+(1-αlow)*ILDmin′;
    若所述第i频点对应的两个麦克风的声压差不大于所述第i-1频点对应的最小参考值时,利用以下公式计算所述第i频点对应的最小参考值,
    ILDmin=αfast*ILDnow+(1-αfast)*ILDmin′;
    其中,ILDnow表示所述第i频点对应的两个麦克风的声压差,ILDmax表示所述第i频点对应的最大参考值,ILDmax′表示所述第i-1频点对应的最大参考值,ILDmin表示所述第i频点对应的最小参考值,ILDmin′表示所述第i-1频点对应的最小参考值,αfast、αlow表示预设的步长值,且αfastlow
  5. 如权利要求1-4任一项所述的方法,其特征在于,所述根据所述当前帧的两个麦克风之间的声压差,确定出所述当前帧的声音信号中是否包含后向声音信号,包括:
    当第j频点对应的声压差小于所述第j频点对应的第二门限值时,确定所述j频点处包含后向声音信号,其中,所述第j频点为所述M个频点中的一个,j取遍不大于M的所有正整数;
    当所述两个麦克风在第j频点对应的声压差不小于第二门限值时,确定所述j频点处不包含后向声音信号。
  6. 如权利要求1-5任一项所述的方法,其特征在于,所述将所述当前帧的声音信号中的后向声音信号进行滤除,包括:
    若所述终端正在拍摄的摄像头为前置摄像头,则以顶部背面麦克风采集的声音信号作为参考信号,控制所述终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中的后向声音信号;
    若所述终端正在拍摄的摄像头为后置摄像头,则以顶部正面麦克风采集的声音信号作为参考信号,控制所述终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中的后向声音信号。
  7. 如权利要求1-6任一项所述的方法,其特征在于,若所述终端在底部还包括第三麦克风,且正在拍摄的摄像头为前置摄像头时,所述方法还包括:
    针对所述第三麦克风和顶部正面麦克风采集到的当前帧的声音信号进行时延差定位, 得到所述当前帧的声音信号的上下方位角;
    在所述上下方位角大于第一预设角度时,确定所述当前帧的声音信号中包含次级噪声信号;所述次级噪声信号为位于所述前置摄像头前方且位于所述前置摄像头摄像范围以外的噪声信号;
    若确定出所述当前帧的声音信号中包含次级噪声信号时,以顶部背面麦克风采集的声音信号作为参考信号,控制所述终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中的次级噪声信号。
  8. 如权利要求1-6任一项所述的方法,其特征在于,若所述终端在底部还包括第三麦克风,且正在拍摄的摄像头为后置摄像头时,所述方法还包括:
    针对所述第三麦克风和顶部背面麦克风采集到的当前帧的声音信号进行时延差定位,得到所述当前帧的声音信号的上下方位角;
    在所述上下方位角大于第一预设角度时,确定所述当前帧的声音信号中包含次级噪声信号,所述次级噪声信号为位于所述后置摄像头前方且位于所述后置摄像头摄像范围以外的噪声信号;
    若确定出所述当前帧的声音信号中包含次级噪声信号时,以顶部正面麦克风采集的声音信号作为参考信号,控制所述终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中的次级噪声信号。
  9. 如权利要求7所述的方法,其特征在于,若所述终端在底部还包括第四麦克风,且所述第三麦克风和所述第四麦克风在终端底部左右排列,所述方法还包括:
    针对所述第三麦克风和所述第四麦克风采集到的当前帧的声音信号进行时延差定位,得到所述当前帧的声音信号的左右方位角;
    在所述左右方位角大于第二预设角度,确定所述当前帧的声音信号中包含次级噪声信号;
    若确定出所述当前帧的声音信号中包含次级噪声信号时,以顶部背面麦克风采集的声音信号作为参考信号,控制所述终端的自适应滤波器滤除顶部正面麦克风采集的当前帧的声音信号中的次级噪声信号。
  10. 如权利要求8所述的方法,其特征在于,若所述终端在底部还包括第四麦克风,且所述第三麦克风和所述第四麦克风在终端底部左右排列,所述方法还包括:
    针对所述第三麦克风和所述第四麦克风采集到的当前帧的声音信号进行时延差定位,得到所述当前帧的声音信号的左右方位角;
    在所述左右方位角大于第二预设角度,确定所述当前帧的声音信号中包含次级噪声信号;
    若确定出所述当前帧的声音信号中包含次级噪声信号时,以顶部正面麦克风采集的声音信号作为参考信号,控制所述终端的自适应滤波器滤除顶部背面麦克风采集的当前帧的声音信号中的次级噪声信号。
  11. A sound processing apparatus, wherein the apparatus is applied to a terminal having two microphones at the top, the two microphones being located on the front and the back of the terminal respectively, and the apparatus comprises:
    a collection module, configured to collect, using the two microphones, a sound signal of a current frame in the current environment of the terminal when a camera of the terminal is in a shooting state;
    a calculation module, configured to calculate a sound pressure difference between the two microphones according to a first preset algorithm based on the collected sound signal of the current frame;
    a judging module, configured to judge whether the sound pressure difference between the two microphones for the current frame satisfies a sound source direction determination condition;
    a determining module, configured to: if the sound source direction determination condition is satisfied, determine, based on the sound pressure difference between the two microphones for the current frame, whether the sound signal of the current frame contains a backward sound signal, the backward sound signal being a sound signal whose source is located behind the camera;
    a filtering module, configured to: if it is determined that the sound signal of the current frame contains a backward sound signal, filter out the backward sound signal from the sound signal of the current frame.
  12. The apparatus according to claim 11, wherein the collection module is specifically configured to:
    collect the sound signals of the current frame, S1 and S2 respectively, using the two microphones;
    and the calculation module is specifically configured to:
    calculate the power spectra P1 and P2 of S1 and S2, based on S1 and S2, using a fast Fourier transform (FFT) algorithm;
    calculate the sound pressure difference between the two microphones, based on P1 and P2, using the following formula:
    Figure PCTCN2017106905-appb-100002
    where P1 denotes the sound power spectrum of the top front microphone for the current frame, P2 denotes the sound power spectrum of the top rear microphone for the current frame, P1 and P2 are each vectors of N elements, the N elements being the values at the N frequency points obtained by performing the fast Fourier transform on the sound signal of the current frame, N being an integer greater than 1; and ILD_now is a vector containing the sound pressure differences corresponding to the N frequency points.
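The calculation pipeline of claim 12 can be sketched as follows. Two caveats: a naive DFT stands in for the FFT to keep the example self-contained, and because the patent's exact sound-pressure-difference formula is rendered only as an image in this record, a common log-ratio ILD is used here purely as a placeholder, not as the claimed formula.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power spectrum of one frame (stand-in for the FFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n)]

def ild_vector(p1, p2, eps=1e-12):
    """One sound pressure difference per frequency point (vector ILD_now),
    using an illustrative log-ratio; eps guards against log(0)."""
    return [10.0 * math.log10((a + eps) / (b + eps)) for a, b in zip(p1, p2)]
```

Feeding the two microphones' frames through `power_spectrum` and then `ild_vector` yields an N-element vector of per-bin level differences, matching the shape of ILD_now described in the claim.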
  13. The apparatus according to claim 12, wherein the judging module is specifically configured to:
    calculate, using the sound pressure difference of the two microphones corresponding to an i-th frequency point, a maximum reference value and a minimum reference value corresponding to the i-th frequency point according to a second preset algorithm, where the i-th frequency point is one of the N frequency points, and i ranges over all positive integers not greater than N;
    if the difference between the maximum reference value and the minimum reference value of the i-th frequency point is greater than a first threshold corresponding to the i-th frequency point, determine that the sound pressure difference between the two microphones satisfies the sound source direction determination condition at the i-th frequency point;
    if the difference between the maximum reference value and the minimum reference value is not greater than the first threshold corresponding to the i-th frequency point, determine that the sound pressure difference between the two microphones does not satisfy the sound source direction determination condition at the i-th frequency point;
    if M frequency points among the N frequency points satisfy the sound source direction determination condition, determine that the sound pressure difference between the two microphones for the current frame satisfies the sound source direction determination condition, where M is greater than or equal to N/2.
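The two-stage test in claim 13 (per-bin spread test, then an M-of-N vote over the frame) can be sketched as below. This is illustrative only; the first thresholds are whatever per-bin values the claim presupposes, shown here as placeholders.

```python
def direction_condition_met(ild_max_bins, ild_min_bins, first_thresholds):
    """Return (frame_ok, per_bin_flags): a bin passes when its max-min
    reference spread exceeds that bin's first threshold, and the frame
    passes when at least half of the N bins pass (M >= N/2)."""
    flags = [(mx - mn) > thr
             for mx, mn, thr in zip(ild_max_bins, ild_min_bins, first_thresholds)]
    return sum(flags) >= len(flags) / 2, flags
```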
  14. The apparatus according to claim 13, wherein the judging module is specifically configured to:
    obtain a maximum reference value corresponding to an (i−1)-th frequency point, the (i−1)-th frequency point being the frequency point preceding the i-th frequency point; if the sound pressure difference of the two microphones corresponding to the i-th frequency point is not greater than the maximum reference value corresponding to the (i−1)-th frequency point, calculate the maximum reference value corresponding to the i-th frequency point using the following formula:
    ILD_max = α_low * ILD_now + (1 − α_low) * ILD_max′;
    if the sound pressure difference of the two microphones corresponding to the i-th frequency point is greater than the maximum reference value corresponding to the (i−1)-th frequency point, calculate the maximum reference value corresponding to the i-th frequency point using the following formula:
    ILD_max = α_fast * ILD_now + (1 − α_fast) * ILD_max′;
    obtain a minimum reference value corresponding to the (i−1)-th frequency point; if the sound pressure difference of the two microphones corresponding to the i-th frequency point is greater than the minimum reference value corresponding to the (i−1)-th frequency point, calculate the minimum reference value corresponding to the i-th frequency point using the following formula:
    ILD_min = α_low * ILD_now + (1 − α_low) * ILD_min′;
    if the sound pressure difference of the two microphones corresponding to the i-th frequency point is not greater than the minimum reference value corresponding to the (i−1)-th frequency point, calculate the minimum reference value corresponding to the i-th frequency point using the following formula:
    ILD_min = α_fast * ILD_now + (1 − α_fast) * ILD_min′;
    where ILD_now denotes the sound pressure difference between the two microphones corresponding to the i-th frequency point, ILD_max denotes the maximum reference value corresponding to the i-th frequency point, ILD_max′ denotes the maximum reference value corresponding to the (i−1)-th frequency point, ILD_min denotes the minimum reference value corresponding to the i-th frequency point, ILD_min′ denotes the minimum reference value corresponding to the (i−1)-th frequency point, and α_fast and α_low are preset step values with α_fast > α_low.
  15. The apparatus according to any one of claims 11 to 14, wherein the determining module is specifically configured to:
    when the sound pressure difference corresponding to a j-th frequency point is less than a second threshold corresponding to the j-th frequency point, determine that a backward sound signal is present at the j-th frequency point, where the j-th frequency point is one of the M frequency points, and j ranges over all positive integers not greater than M;
    when the sound pressure difference of the two microphones at the j-th frequency point is not less than the second threshold, determine that no backward sound signal is present at the j-th frequency point.
  16. The apparatus according to any one of claims 11 to 15, wherein the filtering module is specifically configured to:
    if the camera with which the terminal is shooting is a front-facing camera, use the sound signal collected by the top rear microphone as a reference signal, and control an adaptive filter of the terminal to filter out the backward sound signal from the sound signal of the current frame collected by the top front microphone;
    if the camera with which the terminal is shooting is a rear-facing camera, use the sound signal collected by the top front microphone as a reference signal, and control the adaptive filter of the terminal to filter out the backward sound signal from the sound signal of the current frame collected by the top rear microphone.
  17. The apparatus according to any one of claims 11 to 16, wherein, if the terminal further comprises a third microphone at the bottom and the camera that is shooting is a front-facing camera, the apparatus further comprises a secondary noise filtering module, the secondary noise filtering module being specifically configured to:
    perform delay-difference localization on the sound signals of the current frame collected by the third microphone and the top front microphone, to obtain an up-down azimuth of the sound signal of the current frame;
    when the up-down azimuth is greater than a first preset angle, determine that the sound signal of the current frame contains a secondary noise signal, the secondary noise signal being a noise signal located in front of the front-facing camera but outside the shooting range of the front-facing camera;
    if it is determined that the sound signal of the current frame contains a secondary noise signal, use the sound signal collected by the top rear microphone as a reference signal, and control the adaptive filter of the terminal to filter out the secondary noise signal from the sound signal of the current frame collected by the top front microphone.
  18. The apparatus according to any one of claims 11 to 16, wherein, if the terminal further comprises a third microphone at the bottom and the camera that is shooting is a rear-facing camera, the apparatus further comprises a secondary noise filtering module, the secondary noise filtering module being specifically configured to:
    perform delay-difference localization on the sound signals of the current frame collected by the third microphone and the top rear microphone, to obtain an up-down azimuth of the sound signal of the current frame;
    when the up-down azimuth is greater than a first preset angle, determine that the sound signal of the current frame contains a secondary noise signal, the secondary noise signal being a noise signal located in front of the rear-facing camera but outside the shooting range of the rear-facing camera;
    if it is determined that the sound signal of the current frame contains a secondary noise signal, use the sound signal collected by the top front microphone as a reference signal, and control the adaptive filter of the terminal to filter out the secondary noise signal from the sound signal of the current frame collected by the top rear microphone.
  19. The apparatus according to claim 17, wherein, if the terminal further comprises a fourth microphone at the bottom, the third microphone and the fourth microphone being arranged left and right at the bottom of the terminal, the secondary noise filtering module is specifically configured to:
    perform delay-difference localization on the sound signals of the current frame collected by the third microphone and the fourth microphone, to obtain a left-right azimuth of the sound signal of the current frame;
    when the left-right azimuth is greater than a second preset angle, determine that the sound signal of the current frame contains a secondary noise signal;
    if it is determined that the sound signal of the current frame contains a secondary noise signal, use the sound signal collected by the top rear microphone as a reference signal, and control the adaptive filter of the terminal to filter out the secondary noise signal from the sound signal of the current frame collected by the top front microphone.
  20. The apparatus according to claim 18, wherein, if the terminal further comprises a fourth microphone at the bottom, the third microphone and the fourth microphone being arranged left and right at the bottom of the terminal, the secondary noise filtering module is specifically configured to:
    perform delay-difference localization on the sound signals of the current frame collected by the third microphone and the fourth microphone, to obtain a left-right azimuth of the sound signal of the current frame;
    when the left-right azimuth is greater than a second preset angle, determine that the sound signal of the current frame contains a secondary noise signal;
    if it is determined that the sound signal of the current frame contains a secondary noise signal, use the sound signal collected by the top front microphone as a reference signal, and control the adaptive filter of the terminal to filter out the secondary noise signal from the sound signal of the current frame collected by the top rear microphone.
  21. A terminal device, comprising a microphone, a camera, a memory, a processor, and a bus, wherein the microphone, the camera, the memory, and the processor are connected by the bus;
    the microphone is configured to collect sound signals under the control of the processor;
    the camera is configured to collect image signals under the control of the processor;
    the memory is configured to store computer programs and instructions;
    the processor is configured to invoke the computer programs and instructions stored in the memory to perform the method according to any one of claims 1 to 10.
  22. The terminal device according to claim 21, further comprising an antenna system, wherein the antenna system, under the control of the processor, transmits and receives radio signals to implement wireless communication with a mobile communication network, the mobile communication network comprising one or more of the following: a GSM network, a CDMA network, a 3G network, FDMA, TDMA, PDC, TACS, AMPS, WCDMA, TD-SCDMA, WiFi, and an LTE network.
PCT/CN2017/106905 2016-10-27 2017-10-19 Sound processing method and apparatus WO2018077109A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020197014937A KR102305066B1 (ko) 2016-10-27 2017-10-19 Sound processing method and apparatus
EP17863390.5A EP3531674B1 (en) 2016-10-27 2017-10-19 Sound processing method and device
US16/397,666 US10575096B2 (en) 2016-10-27 2019-04-29 Sound processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610970977.5A CN107026934B (zh) 2016-10-27 2016-10-27 Sound source localization method and apparatus
CN201610970977.5 2016-10-27

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/397,666 Continuation US10575096B2 (en) 2016-10-27 2019-04-29 Sound processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2018077109A1 (zh)

Family

ID=59525239

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/106905 WO2018077109A1 (zh) 2016-10-27 2017-10-19 一种声音处理方法和装置

Country Status (5)

Country Link
US (1) US10575096B2 (zh)
EP (1) EP3531674B1 (zh)
KR (1) KR102305066B1 (zh)
CN (1) CN107026934B (zh)
WO (1) WO2018077109A1 (zh)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107026934B (zh) * 2016-10-27 2019-09-27 Huawei Technologies Co., Ltd. Sound source localization method and apparatus
CN108089152B (zh) * 2016-11-23 2020-07-03 Hangzhou Hikvision Digital Technology Co., Ltd. Device control method, apparatus, and system
CN109036448B (zh) * 2017-06-12 2020-04-14 Huawei Technologies Co., Ltd. Sound processing method and apparatus
US10334360B2 (en) * 2017-06-12 2019-06-25 Revolabs, Inc Method for accurately calculating the direction of arrival of sound at a microphone array
CN108269582B (zh) * 2018-01-24 2021-06-01 Xiamen Meitu Technology Co., Ltd. Directional sound pickup method based on a dual-microphone array, and computing device
CN108519583A (zh) * 2018-04-11 2018-09-11 Jilin University Acoustic emission source localization method applicable to anisotropic two-dimensional plates
CN108254721A (zh) * 2018-04-13 2018-07-06 Goertek Technology Co., Ltd. Robot sound source localization method and robot
CN110441738B (zh) * 2018-05-03 2023-07-28 Alibaba Group Holding Limited In-vehicle voice localization method, system, vehicle, and storage medium
CN108734733B (zh) * 2018-05-17 2022-04-26 Southeast University Speaker localization and identification method based on a microphone array and a binocular camera
CN108766457B (zh) 2018-05-30 2020-09-18 Beijing Xiaomi Mobile Software Co., Ltd. Audio signal processing method and apparatus, electronic device, and storage medium
CN108922555A (zh) * 2018-06-29 2018-11-30 Beijing Xiaomi Mobile Software Co., Ltd. Voice signal processing method and apparatus, and terminal
CN109754803B (zh) * 2019-01-23 2021-06-22 Shanghai Huazhen Electronic Technology Co., Ltd. In-vehicle multi-zone voice interaction system and method
CN111479180B (zh) * 2019-01-24 2022-04-29 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Sound pickup control method and related product
CN110198372B (zh) * 2019-05-31 2020-10-09 Huawei Technologies Co., Ltd. Method for determining the telescopic state of a camera assembly, readable storage medium, and related device
CN111025233B (zh) * 2019-11-13 2023-09-15 Alibaba Group Holding Limited Sound source direction localization method and apparatus, voice device, and system
CN110853657B (zh) 2019-11-18 2022-05-13 Beijing Xiaomi Intelligent Technology Co., Ltd. Space division method, apparatus, and storage medium
CN113132863B (zh) * 2020-01-16 2022-05-24 Huawei Technologies Co., Ltd. Stereo sound pickup method, apparatus, terminal device, and computer-readable storage medium
CN111505583B (zh) * 2020-05-07 2022-07-01 Beijing Baidu Netcom Science and Technology Co., Ltd. Sound source localization method, apparatus, device, and readable storage medium
CN111736797B (zh) * 2020-05-21 2024-04-05 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Negative delay time detection method and apparatus, electronic device, and storage medium
CN111665422A (zh) * 2020-06-08 2020-09-15 Zhengzhou Jingcheng Electric Power Equipment Co., Ltd. FPGA-based non-intrusive wideband acoustic real-time imaging detection system using a microphone array
CN112129402B (zh) * 2020-08-21 2021-07-13 Dongfeng Motor Group Co., Ltd. Abnormal sound source detection apparatus
CN113640744A (zh) * 2021-08-20 2021-11-12 Goertek Technology Co., Ltd. Sound source localization method and audio device
CN115615624B (zh) * 2022-12-13 2023-03-31 Hangzhou Zhaohua Electronics Co., Ltd. Equipment leakage detection method and system based on an unmanned inspection device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050031136A1 (en) * 2001-10-03 2005-02-10 Yu Du Noise canceling microphone system and method for designing the same
CN101203063A (zh) * 2007-12-19 2008-06-18 Beijing Vimicro Corporation Noise elimination method and apparatus for a microphone array
US20120288113A1 (en) * 2011-05-09 2012-11-15 Hiroshi Akino Microphone
CN104270489A (zh) * 2014-09-10 2015-01-07 ZTE Corporation Method and system for determining primary and secondary microphones from among multiple microphones
CN107026934A (zh) * 2016-10-27 2017-08-08 Huawei Technologies Co., Ltd. Sound source localization method and apparatus

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1443498B1 (en) 2003-01-24 2008-03-19 Sony Ericsson Mobile Communications AB Noise reduction and audio-visual speech activity detection
US8345890B2 (en) * 2006-01-05 2013-01-01 Audience, Inc. System and method for utilizing inter-microphone level differences for speech enhancement
US9185487B2 (en) * 2006-01-30 2015-11-10 Audience, Inc. System and method for providing noise suppression utilizing null processing noise subtraction
US8194882B2 (en) * 2008-02-29 2012-06-05 Audience, Inc. System and method for providing single microphone noise suppression fallback
US8319858B2 (en) 2008-10-31 2012-11-27 Fortemedia, Inc. Electronic apparatus and method for receiving sounds with auxiliary information from camera system
US8761412B2 (en) 2010-12-16 2014-06-24 Sony Computer Entertainment Inc. Microphone array steering with image-based source location
KR101761312B1 (ko) 2010-12-23 2017-07-25 삼성전자주식회사 마이크 어레이를 이용한 방향성 음원 필터링 장치 및 그 제어방법
US8903722B2 (en) * 2011-08-29 2014-12-02 Intel Mobile Communications GmbH Noise reduction for dual-microphone communication devices
US9197974B1 (en) * 2012-01-06 2015-11-24 Audience, Inc. Directional audio capture adaptation based on alternative sensory input
CN104981866B (zh) * 2013-01-04 2018-09-28 Huawei Technologies Co., Ltd. Method for determining a stereo signal
CN104715757A (zh) * 2013-12-13 2015-06-17 Huawei Technologies Co., Ltd. Terminal voice control operation method and apparatus
US9282399B2 (en) * 2014-02-26 2016-03-08 Qualcomm Incorporated Listen to people you recognize
WO2018148095A1 (en) * 2017-02-13 2018-08-16 Knowles Electronics, Llc Soft-talk audio capture for mobile devices

Also Published As

Publication number Publication date
US20190253802A1 (en) 2019-08-15
EP3531674B1 (en) 2024-02-14
KR20190067902A (ko) 2019-06-17
US10575096B2 (en) 2020-02-25
EP3531674A4 (en) 2019-11-06
KR102305066B1 (ko) 2021-09-24
CN107026934A (zh) 2017-08-08
EP3531674A1 (en) 2019-08-28
CN107026934B (zh) 2019-09-27

Similar Documents

Publication Publication Date Title
WO2018077109A1 (zh) Sound processing method and apparatus
WO2018228060A1 (zh) Sound processing method and apparatus
CN110970057B (zh) Sound processing method, apparatus, and device
JP6400566B2 (ja) System and method for displaying a user interface
US9668048B2 Contextual switching of microphones
EP2882170B1 (en) Audio information processing method and apparatus
WO2014161309A1 (zh) Method and apparatus for implementing sound source localization on a mobile terminal
CN108766457B (zh) Audio signal processing method and apparatus, electronic device, and storage medium
CN113393856B (zh) Sound pickup method and apparatus, and electronic device
CN112233689A (zh) Audio noise reduction method, apparatus, device, and medium
WO2022062531A1 (zh) Multi-channel audio signal acquisition method, apparatus, and system
US8924206B2 Electrical apparatus and voice signals receiving method thereof
CN117153180A (zh) Sound signal processing method and apparatus, storage medium, and electronic device
CN116935883B (zh) Sound source localization method and apparatus, storage medium, and electronic device
US11961501B2 Noise reduction method and device
CN110047494B (zh) Device response method, device, and storage medium
CN114239293A (zh) Constant-beamwidth beamformer design method, apparatus, device, and storage medium
CN118071627A (zh) Image noise reduction method and apparatus, electronic device, and storage medium
CN115691524A (zh) Audio signal processing method, apparatus, device, and storage medium
WO2018076324A1 (zh) Audio processing method and terminal device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17863390

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20197014937

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2017863390

Country of ref document: EP

Effective date: 20190521