WO2019223650A1 - Beamforming method, multi-beam forming method and apparatus, and electronic device - Google Patents


Info

Publication number
WO2019223650A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound source
target sound
calculate
product
pointed
Application number
PCT/CN2019/087621
Other languages
French (fr)
Chinese (zh)
Inventor
周舒然 (Zhou Shuran)
李志飞 (Li Zhifei)
Original Assignee
出门问问信息科技有限公司 (Mobvoi Information Technology Co., Ltd.)
Priority claimed from CN201810496448.5A (published as CN108551625A)
Priority claimed from CN201810497069.8A (published as CN108717495A)
Priority claimed from CN201810496450.2A (published as CN108831498B)
Application filed by 出门问问信息科技有限公司 (Mobvoi Information Technology Co., Ltd.)
Publication of WO2019223650A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones

Definitions

  • the embodiments of the present application relate to the field of speech processing technologies, and in particular, to a beam forming method, a multi-beam forming method, a device, and an electronic device.
  • Beamforming is a signal processing technique used with sensor arrays (such as microphone arrays) for directional signal reception: the array receives sound signals and applies appropriate signal processing to them. Beamforming lets the microphone array process received sound signals selectively; for example, sound from one sound source is processed differently from sound from other sound sources.
  • embodiments of the present application provide a beamforming method, a multi-beam forming method, a device, and an electronic device, so as to ensure that sound from the target spatial direction is not distorted while sound from other spatial directions is effectively suppressed, thereby improving the signal-to-noise ratio of the sound from the target spatial direction.
  • an embodiment of the present invention provides a beamforming method, including:
  • the method further includes:
  • the calculating spatial filtering parameters includes:
  • the first limitation condition is specifically a white noise gain limitation;
  • the second limitation condition is that the product of the spatial filtering parameter and the signal vector function equals a first preset value.
  • calculating the delay time for the sound source to reach the microphone array includes:
  • calculating the delay time according to the distance between the microphones, the speed of sound, and the angle at which the sound source points.
  • calculating the sound source direction according to the signal vector function and the delay time includes:
  • the spatial filtering parameter is a matrix.
  • the sound source direction may be any plane-wave angle from 0° to 180°.
  • an embodiment of the present invention provides a beamforming apparatus, including:
  • a first obtaining unit configured to obtain spatial filtering parameters, which vary with the angle and the subband frequency;
  • a determining unit configured to determine the sound source direction corresponding to the spatial filtering parameters obtained by the first obtaining unit;
  • a second obtaining unit configured to obtain the original frequency domain signal corresponding to the sound source direction determined by the determining unit;
  • a first calculation unit configured to calculate the product of the spatial filtering parameters and the original frequency domain signal, the product performing beamforming by suppressing frequency domain signals other than the original frequency domain signal from the sound source direction.
  • an embodiment of the present invention provides a multi-beam beamforming method, including:
  • calculating the target beamforming output of the target sound source includes:
  • calculating the noise parameters according to the blocking matrix includes:
  • performing, according to the noise parameter, noise reduction on signals from non-target sound source directions other than the beamforming output corresponding to the target sound source direction includes:
  • an embodiment of the present invention provides a multi-beam beamforming apparatus, including:
  • a first calculation unit configured to calculate the beamforming output corresponding to the target sound source direction;
  • a second calculation unit configured to calculate a noise parameter by using a blocking matrix;
  • a noise reduction unit configured to perform noise reduction, according to the noise parameter calculated by the second calculation unit, on signals from non-target sound source directions other than the beamforming output of the target sound source direction calculated by the first calculation unit.
  • the first calculation unit includes:
  • a first acquisition module configured to acquire spatial filtering parameters;
  • a determining module configured to determine the target sound source direction corresponding to the spatial filtering parameters acquired by the first acquisition module;
  • a second acquisition module configured to acquire the original frequency domain signal corresponding to the target sound source direction;
  • a calculation module configured to calculate the product of the spatial filtering parameters and the original frequency domain signal corresponding to the target sound source direction, to obtain the beamforming output of the target sound source direction.
  • the second calculation unit includes:
  • a first calculation module configured to calculate the frequency response of the sound signal as it reaches each microphone in turn;
  • a construction module configured to construct the blocking matrix according to the frequency response calculated by the first calculation module;
  • a second calculation module configured to calculate the noise parameter according to the blocking matrix constructed by the construction module and the original frequency domain signals corresponding to the non-target sound source directions.
  • the noise reduction unit includes:
  • a noise reduction module configured to perform noise reduction on signals from non-target sound source directions other than the beamforming output of the target sound source direction, according to the beamforming output of the target sound source, the multi-channel optimal filtering parameter, and the noise parameter.
  • the present invention provides a multi-beam beamforming method, including:
  • the spatial filtering parameters vary with the angle of the sound source and the subband frequency.
  • the at least two sound source directions include a target sound source direction and at least one non-target sound source direction;
  • the product of the original frequency domain signal of the target sound source direction, the corresponding enhanced speech of the target sound source direction, and the energy ratio is calculated, and the speech corresponding to the product is output.
  • the method further includes:
  • calculating the product of the spatial filtering parameters and the original frequency domain signals corresponding to the at least two sound source directions respectively to obtain multi-beam beamforming includes:
  • calculating the enhanced speech of the target sound source direction includes:
  • calculating the energy ratio according to the sum of the subband energy of the target sound source direction and the energy of all subbands of the at least one non-target sound source direction includes:
  • performing frame-by-frame smoothing on the current frame and the previous frame by using a smoothing parameter includes:
  • calculating the product of the original frequency domain signal of the target sound source direction, the corresponding enhanced speech of the target sound source direction, and the energy ratio, and outputting the speech corresponding to the product includes:
  • calculating, according to the smoothing result, the product of the original frequency domain signal of the target sound source direction, the corresponding enhanced speech of the target sound source direction, and the energy ratio, and outputting the speech corresponding to the product.
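The energy-ratio and smoothing steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's own formulas (which this text does not reproduce): per-subband energy is taken as |Y|², frame-by-frame smoothing is assumed to be first-order recursive averaging with a hypothetical smoothing parameter `alpha`, and the energy ratio is read as the target-beam energy divided by the summed energy of the target beam and all non-target beams. Function names are hypothetical.

```python
def smooth_energy(prev_energy, frame_bins, alpha=0.9):
    """Frame-by-frame smoothing of per-subband energy (ASSUMED first-order form):
    P_t[k] = alpha * P_{t-1}[k] + (1 - alpha) * |Y_t[k]|^2
    """
    return [alpha * p + (1.0 - alpha) * abs(y) ** 2
            for p, y in zip(prev_energy, frame_bins)]


def energy_ratio(target_energy, nontarget_energies, eps=1e-12):
    """Per-subband energy ratio (ASSUMED reading of the text):
    target / (target + sum over all non-target beams), a value in [0, 1].
    """
    ratios = []
    for k, e_target in enumerate(target_energy):
        total = e_target + sum(beam[k] for beam in nontarget_energies)
        ratios.append(e_target / (total + eps))  # eps guards silent subbands
    return ratios


# Hypothetical two-subband example with one target and two non-target beams.
p_target = smooth_energy([0.0, 0.0], [1.0, 2j], alpha=0.5)   # [0.5, 2.0]
ratio = energy_ratio([1.0, 0.0], [[1.0, 1.0], [2.0, 1.0]])   # about [0.25, 0.0]
```

A ratio close to 1 marks subbands dominated by the target direction; how the ratio is then combined with the enhanced speech follows the product described above.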
  • an embodiment of the present invention provides a multi-beam beamforming apparatus, including:
  • a first calculation unit is configured to calculate a product of spatial filtering parameters and original frequency domain signals corresponding to at least two sound source directions, respectively, to obtain multi-beam beamforming.
  • the spatial filtering parameters vary with the angle of the sound source and the subband frequency.
  • the at least two sound source directions include a target sound source direction and at least one non-target sound source direction;
  • a second calculation unit configured to separately calculate the enhanced speech pointed by the target sound source
  • a third calculation unit configured to calculate an energy ratio based on the sum of the energy of the subband corresponding to the target sound source and the energy of all the subbands pointed to by at least one non-target sound source;
  • a fourth calculation unit is configured to calculate a product of the original frequency domain signal pointed by the target sound source, a corresponding enhanced speech pointed by the target sound source, and the energy ratio, and output a speech corresponding to the product.
  • an embodiment of the present invention provides a storage medium storing a computer program which, when executed by a processor, implements the method according to the first aspect of the embodiment of the present invention and/or the method according to the first embodiment of the present invention.
  • an embodiment of the present invention provides an electronic device including a processor, a memory, and a bus; the processor and the memory communicate with each other through the bus, and the memory stores program instructions that are executed by the processor to implement the method according to the first aspect, and/or the method according to the third aspect, and/or the method according to the fifth aspect of the embodiment of the present invention.
  • the beamforming output of the target sound source direction is obtained by calculating the product of the spatial filtering parameter and the original frequency domain signal corresponding to that direction, and noise reduction on the non-target sound source directions improves the signal-to-noise ratio of that beamforming output. Therefore, sound from the target spatial direction is not distorted, sound from other spatial directions is effectively suppressed, and the signal-to-noise ratio of the sound from the target spatial direction is improved.
  • FIG. 1 is a flowchart of a beamforming method according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a microphone array according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of another microphone array according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for calculating spatial filtering parameters according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of a multi-beam beamforming method according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a final voice output pointed by a target sound source according to an embodiment of the present invention.
  • FIG. 7 is a flowchart of another multi-beam beamforming method according to an embodiment of the present invention.
  • FIG. 8 is a flowchart of still another multi-beam beamforming method according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a beamforming apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of another beamforming apparatus according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of a multi-beam beamforming apparatus according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention.
  • FIG. 14 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention.
  • FIG. 15 is a structural block diagram of an electronic device according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a beamforming method according to an embodiment of the present invention.
  • the beamforming method of the sound source in this embodiment is shown in FIG. 1 and includes the following steps:
  • Step S110: acquire spatial filtering parameters.
  • the spatial filtering parameters vary with the angle and the subband frequency.
  • spatial filtering parameters can be used to enhance beamforming in a fixed spatial direction, so that sound from the pointing direction remains substantially unchanged while sound from other directions is suppressed to a certain extent.
  • the spatial filtering parameter in the embodiment of the present invention is a filter parameter in the frequency domain; its purpose is to apply the corresponding gain or suppression to each subband frequency of each signal frame.
  • the spatial filtering parameter in this embodiment is a matrix. It is calculated in advance by a computer device and stored in the electronic device that performs the method described in the embodiment of the present invention, for direct use by that electronic device, which reduces the time consumed by beamforming.
  • the following embodiments take a beam pointing of 90° directly ahead as an example, that is, the sound source direction is 90° directly ahead. It should be noted, however, that the method is not limited to a beam pointing of 90°: in practical applications, the sound source direction may be any plane-wave angle from 0° to 180°, such as 30°, 60°, or 120°.
  • Step S120: determine the sound source direction corresponding to the spatial filtering parameter.
  • Step S130: acquire the original frequency domain signal corresponding to the sound source direction.
  • the sound source reaches the microphone array from different directions, so the signals received by different microphones have different delay times.
  • the delay time can be used to locate the direction of beam focus and determine the sound source direction that is consistent with the spatial filtering parameters (such as 90° directly ahead).
  • the microphone array is composed of a certain number of acoustic sensors (usually microphones), which are used to sample the spatial characteristics of the sound field.
  • the array may consist of 4 microphones evenly spaced in a line (as shown in FIG. 2), 6 microphones evenly spaced in a line, 8 microphones evenly spaced in a circle (as shown in FIG. 3), or 12 or 14 microphones evenly spaced in a circle, rectangle, crescent, or other shape.
  • the number and arrangement of microphones in the array are not limited in the embodiment of the present invention. For convenience of description, however, the embodiment is described using the four-microphone linear array shown in FIG. 2 as an example; this is not a specific limitation on the microphone array.
  • the spacing between microphones should be neither too large nor too small; if the spacing is unsuitable, errors occur in focusing on the sound source.
  • the uniform spacing between microphones can be set to greater than 30 mm and less than 80 mm.
  • the delay time for the sound source to reach each microphone may be calculated from the physical geometry of the microphone arrangement. Suppose the microphone spacing d, the sound propagation speed c, and the angle $\theta$ at which the sound source points (that is, the direction in which sound is to be received and focused, such as 90° directly ahead) are determined. Then the delay time of the first microphone is $\tau_0 = d\sin\theta / c$,
  • and the delay time of the second microphone Mic2 is $\tau_1 = 2d\sin\theta / c$,
  • where $\tau_1$ is the delay time from the sound source to the second microphone Mic2.
  • the above delay calculation applies to uniformly spaced linear microphone arrays; other microphone layouts or non-uniform spacing may require different calculations.
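As a minimal sketch of the delay formulas above, the code below assumes a speed of sound of 343 m/s and extends the two stated formulas to the pattern $\tau_n = (n+1)\,d\sin\theta / c$ for the n-th microphone (0-based). The text implies but does not state this pattern beyond two microphones, so treat it as an assumption; `mic_delays` is a hypothetical helper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 °C


def mic_delays(num_mics: int, spacing_m: float, angle_deg: float,
               c: float = SPEED_OF_SOUND) -> list[float]:
    """Delay from the sound source to each microphone of a uniform linear array.

    Reproduces tau_0 = d*sin(theta)/c and tau_1 = 2*d*sin(theta)/c from the
    text, and ASSUMES the pattern continues: tau_n = (n+1)*d*sin(theta)/c.
    """
    s = math.sin(math.radians(angle_deg))
    return [(n + 1) * spacing_m * s / c for n in range(num_mics)]


# Hypothetical example: 4 microphones, 50 mm spacing, source at 90 degrees.
delays = mic_delays(4, 0.05, 90.0)
```

The angle convention (sine of the pointing angle) follows the formulas as stated in this text.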
  • a signal vector function is constructed according to the delay time of each microphone in the array, and the sound source direction is calculated according to the signal vector function and the delay times.
  • a matrix corresponding to all subband frequencies needs to be determined.
  • the signal vector function is $g(\omega, \theta)$, where:
  • $\theta$ is the direction angle of sound reception and focusing;
  • $j$ is the imaginary unit;
  • $\omega = 2\pi f$;
  • $f$ is the matrix corresponding to all subband frequencies;
  • $\tau_0$ is the delay time from the sound source to the first microphone;
  • $N$ is the number of microphones;
  • $\tau_{N-1}$ is the delay time from the sound source to the Nth microphone. The sound source direction can therefore be calculated from the signal vector function and the delay time of each microphone.
  • the matrix corresponding to the subband frequencies of the sound source is determined first, and the target sound source direction is then calculated from the matrices for all subband frequencies of the sound source, the above signal vector function, and the delay times.
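The equation for the signal vector function is not reproduced in this text. A common form consistent with the symbols listed above is $g(\omega, \theta) = [e^{-j\omega\tau_0}, e^{-j\omega\tau_1}, \dots, e^{-j\omega\tau_{N-1}}]^T$; the sketch below assumes that form, including the sign of the exponent, and builds one vector per subband frequency. Function names are hypothetical.

```python
import cmath
import math


def steering_vector(freq_hz: float, taus: list[float]) -> list[complex]:
    """Signal vector g(omega, theta) for one subband frequency.

    ASSUMED form: g = [exp(-j*omega*tau_0), ..., exp(-j*omega*tau_{N-1})],
    with omega = 2*pi*f and j the imaginary unit.
    """
    omega = 2.0 * math.pi * freq_hz
    return [cmath.exp(-1j * omega * tau) for tau in taus]


def steering_matrix(freqs_hz: list[float], taus: list[float]) -> list[list[complex]]:
    """One signal vector per subband frequency (rows = subbands, columns = mics)."""
    return [steering_vector(f, taus) for f in freqs_hz]
```

Every entry has unit magnitude, so the vector encodes only the relative phase (delay) of each microphone.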
  • step S110 may be performed before step S120, or steps S110 and S120 may be performed simultaneously; the embodiment of the present invention does not limit this.
  • Step S140: calculate the product of the obtained spatial filtering parameter and the original frequency domain signal of the sound source direction to obtain the beamforming output of that direction.
  • the product performs beamforming by suppressing the original frequency domain signals of non-target sound sources other than the signal from the sound source direction.
  • the spatial filtering parameters and the original frequency domain signal are both matrices. Multiplying them performs beamforming that suppresses the original frequency domain signals of non-target sound sources, so that sound signals from the fixed direction are not distorted while sound signals from other directions are suppressed.
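A per-subband sketch of the product in Step S140, under two assumptions: the product is taken as $y = W^H X$ (a conjugate inner product per subband, a common convention; the text says only "product"), and the example filter is plain delay-and-sum, $W = g/N$, standing in for the optimized parameters of FIG. 4. The example shows the distortionless property: a plane wave from the look direction passes through unchanged. Names and values are hypothetical.

```python
import cmath
import math


def beamform_frame(weights, spectrum):
    """Apply per-subband spatial filters to one STFT frame.

    weights[k][n]  -- filter coefficient for subband k, microphone n
    spectrum[k][n] -- original frequency domain signal for subband k, mic n
    Returns y[k] = sum_n conj(weights[k][n]) * spectrum[k][n], i.e. W^H X.
    """
    return [sum(w.conjugate() * x for w, x in zip(w_k, x_k))
            for w_k, x_k in zip(weights, spectrum)]


def delay_and_sum_weights(freqs_hz, taus):
    """Hypothetical stand-in filter W = g / N (delay-and-sum)."""
    n = len(taus)
    return [[cmath.exp(-2j * math.pi * f * t) / n for t in taus]
            for f in freqs_hz]


# A plane wave from the look direction: X[k][n] = S[k] * g_n(f_k).
freqs = [500.0, 1000.0, 2000.0]
taus = [1e-4, 2e-4, 3e-4, 4e-4]
source = [1 + 2j, -0.5 + 0j, 0.25 - 1j]
frame = [[s * cmath.exp(-2j * math.pi * f * t) for t in taus]
         for f, s in zip(freqs, source)]
output = beamform_frame(delay_and_sum_weights(freqs, taus), frame)
# output matches `source` up to rounding, because W^H g = 1 (distortionless).
```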
  • an electronic device obtains spatial filtering parameters that vary with the angle and the subband frequency; determines the sound source direction corresponding to the spatial filtering parameters and acquires the original frequency domain signal of that direction; and calculates the product of the spatial filtering parameters and the original frequency domain signal, the product suppressing frequency domain signals other than the original frequency domain signal from the sound source direction.
  • by presetting the spatial filtering parameters, the present invention not only reduces beamforming time but also keeps sound signals from a fixed direction undistorted.
  • a computer device is used to pre-calculate the spatial filtering parameters for any plane-wave angle from 0° to 180°, so that the corresponding spatial filtering parameters can be obtained directly when beamforming the sound source.
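A sketch of the offline/online split described above: a table of filters is filled once, then looked up at run time. Assumptions, none of which come from the text: a 30° angle grid (matching the seven example directions used later), a 16 kHz / 512-point FFT subband grid, 50 mm spacing, c = 343 m/s, and delay-and-sum weights as a placeholder for the optimized parameters. All names are hypothetical.

```python
import cmath
import math

C = 343.0        # assumed speed of sound, m/s
SPACING = 0.05   # assumed 50 mm spacing, inside the 30-80 mm range above
NUM_MICS = 4     # the four-microphone linear array used as the example


def design_filter(freq_hz, angle_deg):
    """Placeholder design: delay-and-sum weights g / N for one (angle, subband).

    A real system would solve the constrained optimization of FIG. 4 here.
    """
    s = math.sin(math.radians(angle_deg))
    taus = [(n + 1) * SPACING * s / C for n in range(NUM_MICS)]
    return [cmath.exp(-2j * math.pi * freq_hz * t) / NUM_MICS for t in taus]


def precompute_table(angles_deg, freqs_hz):
    """Offline step: one filter per (angle, subband), stored for later lookup."""
    return {a: [design_filter(f, a) for f in freqs_hz] for a in angles_deg}


# Offline, on the computer device (hypothetical 16 kHz / 512-point FFT grid).
FS, NFFT = 16000, 512
FREQS = [k * FS / NFFT for k in range(NFFT // 2 + 1)]
TABLE = precompute_table(range(0, 181, 30), FREQS)

# At run time on the electronic device: a lookup replaces the design step.
filters_90 = TABLE[90]          # 257 per-subband weight vectors
```

The run-time cost becomes a dictionary lookup, which is the time saving the text attributes to presetting the parameters.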
  • FIG. 4 is a flowchart of a method for calculating spatial filtering parameters according to an embodiment of the present invention.
  • calculating the spatial filtering parameters specifically includes the following steps:
  • Step S1: calculate the delay time for the sound source to reach the microphone array.
  • the sound source reaches the microphone array from different directions, so the signals received by different microphones have different delay times.
  • the delay time can be used to locate the direction of beam focus and determine the sound source direction that is consistent with the spatial filtering parameters (such as 90° directly ahead).
  • calculating the delay time for the sound source to reach the microphone array may include, but is not limited to, the following: determine the microphone spacing d, the sound propagation speed c, and the angle $\theta$ at which the sound source points (i.e., the direction angle in which sound is to be received and focused, such as 90° directly ahead).
  • the delay time is calculated from the determined microphone spacing d, sound propagation speed c, and angle $\theta$. For the specific method, refer to step S120; details are not repeated here.
  • Step S2: construct a signal vector function according to the delay time of each microphone in the array, and calculate the sound source direction according to the signal vector function and the delay times.
  • a matrix corresponding to all subband frequencies needs to be determined.
  • the signal vector function is $g(\omega, \theta)$, where:
  • $\theta$ is the direction angle of sound reception and focusing;
  • $j$ is the imaginary unit;
  • $\omega = 2\pi f$;
  • $f$ is the matrix corresponding to all subband frequencies;
  • $\tau_0$ is the delay time from the sound source to the first microphone;
  • $N$ is the number of microphones;
  • $\tau_{N-1}$ is the delay time from the sound source to the Nth microphone. The sound source direction can therefore be calculated from the signal vector function and the delay time of each microphone.
  • the matrix corresponding to the subband frequencies of the sound source is determined first, and the target sound source direction is then calculated from the matrices for all subband frequencies of the sound source, the above signal vector function, and the delay times.
  • Step S3: calculate the spatial filtering parameter at which the loss function approaches its minimum, subject to a preset first limitation condition and a preset second limitation condition.
  • the loss function is constructed according to the spatial filtering parameters and the signal vector function.
  • the preset first limitation condition is a white noise gain limitation, in which:
  • $W_f(\theta)$ is the spatial filtering parameter;
  • $T$ denotes the transpose operation;
  • $H$ denotes the conjugate transpose;
  • $\omega = 2\pi f$, where $f$ is the matrix corresponding to all subband frequencies;
  • $\theta$ is the direction angle of sound reception and focusing;
  • $g(\omega, \theta)$ is the signal vector function;
  • the white noise gain limit is a preset threshold;
  • the embodiment of the present invention does not limit the specific value of this threshold.
  • the preset second limitation condition is that the product of the spatial filtering parameter and the signal vector function is a first preset value.
  • the first preset value is 1.
  • the spatial filtering parameters and the signal vector function are both matrices; in general, the signal vector function matrix hardly changes.
  • these conditions constrain the spatial behavior of the beamforming.
  • the first limitation condition and the second limitation condition must be satisfied at the same time.
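The two conditions can be checked numerically as sketched below. The text's white noise gain formula is not reproduced, so the sketch assumes the standard definition $\mathrm{WNG} = |W^H g|^2 / (W^H W)$ and reads the second condition as $W^H g = 1$ (the first preset value). The threshold argument `wng_limit` and all function names are hypothetical. For a distortionless filter, delay-and-sum attains the largest possible WNG, equal to the number of microphones N.

```python
import cmath


def inner(a, b):
    """Conjugate inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def white_noise_gain(w, g):
    """ASSUMED standard definition: WNG = |w^H g|^2 / (w^H w)."""
    return abs(inner(w, g)) ** 2 / inner(w, w).real


def satisfies_constraints(w, g, wng_limit, tol=1e-9):
    """First condition: WNG >= wng_limit (white noise gain limitation).
    Second condition: w^H g equals the first preset value, 1."""
    distortionless = abs(inner(w, g) - 1.0) < tol
    return distortionless and white_noise_gain(w, g) >= wng_limit - tol


# Hypothetical 4-microphone example: delay-and-sum w = g / N reaches WNG = N.
g = [cmath.exp(-0.3j * n) for n in range(4)]
w = [gn / 4 for gn in g]
```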
  • optionally, a third limitation condition may also be satisfied.
  • the third limitation condition is to determine the convexity of the loss function, in which:
  • $R_{nn}$ is the noise covariance matrix;
  • $g(\omega, \theta)$ is the signal vector function;
  • $H$ denotes the conjugate transpose.
  • the loss function constructed according to the spatial filtering parameters and the signal vector function is:
  • with this loss function, the final response b_hat at each angle $\theta$ is:
  • the spatial filtering parameter at which the loss function approaches its minimum is calculated as follows:
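The loss function and its minimizer are not reproduced in this text. One common way to minimize a noise-covariance loss built from $R_{nn}$ under a distortionless constraint while keeping white noise gain bounded is MVDR with diagonal loading: $W = (R_{nn} + \mu I)^{-1} g / (g^H (R_{nn} + \mu I)^{-1} g)$, where the loading $\mu$ trades interference suppression for white noise gain. The sketch below assumes that technique; it is not confirmed as the patent's loss function. It also evaluates the per-angle response $W^H g(\omega, \theta)$. The loading value, the interferer, and all names are hypothetical.

```python
import cmath


def solve(a, b):
    """Solve a x = b for a small complex matrix (Gaussian elimination with
    partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def mvdr_weights(r_nn, g, loading=1e-2):
    """Minimize w^H (R_nn + loading*I) w subject to w^H g = 1
    (MVDR with diagonal loading, an ASSUMED stand-in for Step S3):
    w = (R + mu I)^-1 g / (g^H (R + mu I)^-1 g)."""
    n = len(g)
    r = [[r_nn[i][j] + (loading if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    rinv_g = solve(r, g)
    denom = sum(gi.conjugate() * xi for gi, xi in zip(g, rinv_g))
    return [xi / denom for xi in rinv_g]


def response(w, g_theta):
    """Beam response b(theta) = w^H g(omega, theta) at one angle."""
    return sum(wi.conjugate() * gi for wi, gi in zip(w, g_theta))


# Hypothetical example: look direction g, one strong interferer v.
N = 4
g = [1 + 0j] * N
v = [cmath.exp(-1j * n) for n in range(N)]
r_nn = [[(1.0 if i == j else 0.0) + 100.0 * v[i] * v[j].conjugate()
         for j in range(N)] for i in range(N)]
w = mvdr_weights(r_nn, g)
# response(w, g) is 1 (distortionless); |response(w, v)| is small (suppressed).
```

Raising `loading` pushes the solution toward delay-and-sum (higher white noise gain, weaker interference rejection); lowering it does the opposite.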
  • FIG. 5 is a flowchart of a multi-beam beamforming method according to an embodiment of the present invention. As shown in FIG. 5, the multi-beam beamforming method in this embodiment includes the following steps:
  • Step S210: calculate the beamforming output corresponding to the target sound source direction.
  • beamforming is performed toward at least two sound source directions, forming multi-beam beamforming.
  • it should be noted that each sound source direction may be any plane-wave angle from 0° to 180°.
  • the at least two sound source directions described in the embodiment of the present invention include the target sound source direction and at least one other sound source direction.
  • the following embodiments use the beam directions 0°, 30°, 60°, 90°, 120°, 150°, and 180° (7 directions in total) as an example.
  • the target sound source direction is 90°.
  • other angles, such as 53° or 80°, may also be used, and the target sound source direction may also be, for example, 60°; this is not limited.
  • each sound source direction is determined through the microphone array. The microphone array consists of a certain number of acoustic sensors (generally microphones) used to sample the spatial characteristics of the sound field. In practical applications, the array may consist of 4 microphones evenly spaced (as shown in FIG. 2), 6 microphones evenly spaced, 8 microphones evenly spaced in a circle (as shown in FIG. 3),
  • or 12 or 14 microphones evenly spaced in a circle, rectangle, crescent, or other shape.
  • the specific embodiment of the present invention does not limit the number and arrangement of microphones. For convenience of description, however, the embodiment is described using the four-microphone linear array shown in FIG. 2 as an example; this is not a specific limitation on the microphone array.
  • the spacing between microphones should be neither too large nor too small; if the spacing is unsuitable, errors occur in focusing on the sound source.
  • the uniform spacing between microphones can be set to greater than 30 mm and less than 80 mm.
  • a GSC (Generalized Sidelobe Cancellation) structure is used, in which the noise parameter is calculated through a blocking matrix.
  • the blocking matrix is used to characterize the frequency response of the sound signal.
  • the purpose of calculating the noise parameter is to reduce noise in the sound from non-target sound source directions.
  • with beam directions of 0°, 30°, 60°, 90°, 120°, 150°, and 180° (7 directions in total) and a target sound source direction of 90°, the noise parameter is used to reduce noise in the 0°, 30°, 60°, 120°, 150°, and 180° directions.
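A sketch of one common blocking-matrix construction, assuming the frequency response of the sound arriving at each microphone is the signal vector $g$ of the target direction. Each row aligns two adjacent microphones to the target and subtracts them (a Griffiths-Jim-style construction, chosen here as an assumption), so $B^H g = 0$ and the outputs carry only non-target sound. Function names and example values are hypothetical.

```python
import cmath


def blocking_matrix(g):
    """Rows of B^H for one subband, built from the frequency response g.

    Row k aligns microphones k and k+1 to the target and subtracts them:
    row k = e_k / g[k] - e_{k+1} / g[k+1]. For a pure target component
    x = s * g every row yields s - s = 0, so only non-target energy remains.
    """
    n = len(g)
    rows = []
    for k in range(n - 1):
        row = [0j] * n
        row[k] = 1 / g[k]
        row[k + 1] = -1 / g[k + 1]
        rows.append(row)
    return rows


def noise_references(bh, x):
    """u = B^H x: the N-1 noise-reference signals for one subband bin."""
    return [sum(b * xi for b, xi in zip(row, x)) for row in bh]


# Hypothetical example: target response g, interferer response v, N = 4.
g = [cmath.exp(-1j * 0.4 * n) for n in range(4)]
v = [cmath.exp(-1j * 1.3 * n) for n in range(4)]
bh = blocking_matrix(g)
target_only = noise_references(bh, [2.0 * gn for gn in g])          # ~[0, 0, 0]
with_interf = noise_references(bh, [2.0 * gn + vn for gn, vn in zip(g, v)])
```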
  • Step S230: perform noise reduction, according to the noise parameter, on signals from non-target sound source directions other than the beamforming output of the target sound source direction.
  • the signals from the non-target sound source directions are filtered, that is, the noise parameter obtained in step S220 is used to reduce noise in the signals from the non-target directions.
  • in this way, the sound from the target sound source direction is not distorted, and interference from other sound sources is reduced.
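The noise-reduction step can be sketched as the adaptive branch of a GSC. Here the "multi-channel optimal filtering parameter" is assumed to be the least-squares (Wiener) solution over a block of frames for one subband; many GSC implementations instead adapt it frame by frame (for example with NLMS). The simulation uses a target response $g$ of all ones, so the blocking matrix reduces to adjacent-microphone differences. All signals, parameters, and names are hypothetical.

```python
import cmath
import random


def solve(a, b):
    """Solve a x = b for a small complex matrix (Gaussian elimination with
    partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def gsc_cancel(y_fixed, refs, loading=1e-9):
    """Least-squares noise canceller for one subband (ASSUMED optimal filter).

    y_fixed -- fixed-beamformer output of the target direction, T frames
    refs    -- N-1 noise-reference sequences (T frames each), i.e. B^H x
    Solves (R_uu + loading*I) a = r_uy and returns e[t] = y[t] - a^H u[t].
    """
    m, t_len = len(refs), len(y_fixed)
    r_uu = [[sum(refs[i][t] * refs[j][t].conjugate() for t in range(t_len))
             + (loading if i == j else 0.0) for j in range(m)] for i in range(m)]
    r_uy = [sum(refs[i][t] * y_fixed[t].conjugate() for t in range(t_len))
            for i in range(m)]
    a = solve(r_uu, r_uy)
    return [y_fixed[t] - sum(a[i].conjugate() * refs[i][t] for i in range(m))
            for t in range(t_len)]


# Hypothetical one-subband simulation: target s (response g = all ones),
# interferer i_sig with response v, plus weak independent sensor noise.
rng = random.Random(0)
N, T = 4, 500


def cgauss(sigma):
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


v = [cmath.exp(-1j * n) for n in range(N)]
s = [cgauss(1.0) for _ in range(T)]
i_sig = [cgauss(3.0) for _ in range(T)]
x = [[s[t] + i_sig[t] * v[n] + cgauss(0.01) for n in range(N)] for t in range(T)]
y_fixed = [sum(frame) / N for frame in x]                                  # delay-and-sum
refs = [[frame[k] - frame[k + 1] for frame in x] for k in range(N - 1)]   # B^H x
e = gsc_cancel(y_fixed, refs)                                              # enhanced bins
```

Because the zero filter is one of the candidates the least-squares step considers, the enhanced output never has more energy than the fixed-beamformer output.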
  • the multi-beam beamforming method provided in the embodiment of the present invention calculates the beamforming output corresponding to the target sound source direction, calculates a noise parameter through a blocking matrix, and, according to the noise parameter, performs noise reduction on signals from non-target sound source directions other than the target beamforming output.
  • the embodiments of the present invention can thus ensure that the sound from the target sound source direction is not distorted while reducing noise in the sound from other sound source directions, effectively suppressing interference from other directions.
  • when step S210 is performed to calculate the beamforming output corresponding to the target sound source direction, the following method may be adopted, for example: acquire spatial filtering parameters, determine the target sound source direction corresponding to the spatial filtering parameters, and obtain the original frequency domain signal corresponding to the target sound source direction; then calculate the product of the spatial filtering parameters and that original frequency domain signal to obtain the beamforming output of the target sound source direction.
  • the spatial filtering parameter according to the embodiment of the present invention is a filter parameter in the frequency domain, and its purpose is to make a corresponding gain on the subband frequency of the signal of each frame.
  • the spatial filtering parameters described in the embodiments of the present invention are a matrix.
  • The spatial filtering parameters are calculated in advance on a computer, and the resulting parameters are stored in the electronic device that executes the method of the embodiments of the present invention, where they are used directly at run time, reducing the time consumed by beamforming.
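Applying precomputed spatial filtering parameters to one frame can be sketched as a weighted sum across microphones in every subband. This minimal sketch assumes a hypothetical layout in which `weights[k][n]` and `frame[k][n]` hold the parameter and the original frequency-domain signal for subband k and microphone n, and it uses the common convention $Y = W^H X$ (conjugated weights); the patent does not state its data layout or conjugation convention.

```python
def apply_spatial_filter(weights, frame):
    """Return one beamformed value per subband: Y[k] = sum_n conj(W[k][n]) * X[k][n].

    weights, frame: lists indexed [subband][microphone] of complex numbers
    (hypothetical layout, e.g. 512 subbands x 4 microphones).
    """
    if len(weights) != len(frame):
        raise ValueError("weights and frame must have the same number of subbands")
    output = []
    for w_k, x_k in zip(weights, frame):
        # Inner product across microphones for this subband.
        output.append(sum(w.conjugate() * x for w, x in zip(w_k, x_k)))
    return output
```

For instance, with delay-and-sum weights $W = a/N$ and a frame $X = s \cdot a$ that arrives exactly from the steered direction, the output in that subband is $s$ itself.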
  • When the spatial filtering parameters $W_f(\theta)$ for the target source direction are determined, the direction on which the beam focuses is located by means of delay times; that is, the spatial filtering parameters $W_f(\theta)$ corresponding to the target source direction are determined in this way.
  • For example (but without limitation), the delay time from the sound source to each microphone can be calculated from the physical layout of the microphones. Assume that the microphone spacing $d$, the speed of sound $c$, and the angle $\theta$ to which the source points (that is, the direction in which sound is to be received and focused, such as 90° for directly in front) are known.
  • the delay time is $\tau_1 = 2 d \sin(\theta) / c$,
  • where $\tau_1$ is the delay time from the sound source to the second microphone Mic2.
  • The above delay calculation applies to a uniformly spaced linear microphone array; other microphone layouts or non-uniform spacings may require different calculation methods.
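A minimal sketch of the delay calculation for a uniformly spaced linear array follows. The text does not spell out its angle convention, so this sketch assumes the common far-field model $\tau_n = n d \cos(\theta) / c$, with microphone 0 as the reference and $\theta$ measured from the array axis. Under that convention, 90° (broadside, directly in front) gives zero delay across the array, and the 0° to 180° range used later is covered. If $\theta$ is instead measured from broadside, $\cos$ becomes $\sin$. The function name `ula_delays` and the default $c = 343$ m/s are illustrative assumptions.

```python
import math


def ula_delays(n_mics, spacing_m, theta_deg, c=343.0):
    """Far-field plane-wave delays (seconds) at each microphone of a uniform linear array.

    Assumptions: microphone n sits at n * spacing_m on the array axis, microphone 0
    is the reference (tau_0 = 0), and theta is measured from the array axis.
    """
    proj = math.cos(math.radians(theta_deg))  # path-length projection per metre of spacing
    return [n * spacing_m * proj / c for n in range(n_mics)]
```

For a 4-microphone array with 50 mm spacing, a source at 0° (endfire) reaches the last microphone $0.15 / 343 \approx 0.44$ ms after the first.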
  • A signal vector function is constructed from the delay time of each microphone in the array, and the sound source direction is calculated from the signal vector function and the delay times.
  • a matrix corresponding to all subband frequencies needs to be determined.
  • the signal vector function is $\mathbf{a}(\theta, \omega) = [e^{-j\omega\tau_0}, e^{-j\omega\tau_1}, \ldots, e^{-j\omega\tau_{N-1}}]^T$, where
  • $\theta$ is the direction angle on which sound reception is focused
  • $j$ is the imaginary unit
  • $\omega = 2\pi f$
  • $f$ is the matrix of all subband frequencies
  • $\tau_0$ is the delay time from the sound source to the first microphone
  • $N$ is the number of microphones
  • $\tau_{N-1}$ is the delay time from the sound source to the $N$th microphone. The sound source direction can therefore be calculated from the signal vector function and the delay time of each microphone.
  • the matrix corresponding to the subband frequencies corresponding to the sound source is first determined, and the target sound source direction is calculated according to the matrices corresponding to all the subband frequencies corresponding to the sound source, the above-mentioned signal vector function, and the delay time.
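The signal vector function can be evaluated for all subbands at once. This sketch assumes the standard far-field form $a_n(\omega) = e^{-j\omega\tau_n}$ with $\omega = 2\pi f$. The sampling rate (16 kHz), the FFT size (1024, keeping the first 512 non-negative-frequency bins to match the 512 subbands used later), and the name `steering_vectors` are assumptions for illustration. The delays would come from a calculation such as the one described above.

```python
import cmath
import math


def steering_vectors(delays_s, sample_rate_hz=16000, n_fft=1024):
    """Evaluate a_n(omega) = exp(-j * omega * tau_n) for every subband.

    Subband k has frequency f_k = k * fs / n_fft and omega_k = 2 * pi * f_k.
    Only the first n_fft // 2 bins are kept (512 for n_fft = 1024).
    Returns a list indexed [subband][microphone].
    """
    vectors = []
    for k in range(n_fft // 2):
        omega = 2 * math.pi * k * sample_rate_hz / n_fft
        vectors.append([cmath.exp(-1j * omega * tau) for tau in delays_s])
    return vectors
```

Each entry has unit magnitude, so the vector encodes only the relative phase (delay) at each microphone.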
  • The noise parameter can be calculated through the blocking matrix, for example (but without limitation) by calculating, in turn, the frequency response of the sound signal reaching each microphone, constructing the blocking matrix from these frequency responses, and then calculating the noise parameter from the blocking matrix and the original frequency-domain signals for the non-target source directions.
  • The purpose of calculating the noise parameter is to reduce the noise in the sound from non-target source directions.
  • The noise parameter $U(t, e^{j\omega})$ is calculated from the blocking matrix $H(e^{j\omega})$ and the original frequency-domain signal $Z(t, e^{j\omega})$ for the non-target source directions, where
  • $t$ denotes the input time of each signal frame.
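One way to realise a blocking matrix is sketched below under stated assumptions. The frequency response from the target source to each microphone is taken to be a pure delay $e^{-j\omega\tau_n}$ (unit magnitude). Each row of the matrix aligns two adjacent microphones to the target direction and subtracts them, which is a Griffiths-Jim-style construction. The patent does not give its exact $H(e^{j\omega})$, so `blocking_matrix` and `noise_reference` are hypothetical helpers. The property that matters is that a signal arriving exactly from the target direction produces a zero noise reference.

```python
def blocking_matrix(steer):
    """(N-1) x N blocking matrix for one subband.

    steer: unit-magnitude target response per microphone, e.g. exp(-j*omega*tau_n).
    Row i time-aligns microphones i and i+1 to the target and subtracts them.
    """
    n = len(steer)
    rows = []
    for i in range(n - 1):
        row = [0j] * n
        row[i] = steer[i].conjugate()
        row[i + 1] = -steer[i + 1].conjugate()
        rows.append(row)
    return rows


def noise_reference(block, x):
    """U = H X for one subband: the target component cancels, other directions leak through."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in block]
```

For a 4-microphone array, this yields 3 noise references per subband that contain (ideally) no target speech.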
  • In step S220, the signals from non-target source directions are filtered, which yields the noise parameter $U(t, e^{j\omega})$.
  • With it, the signal from the target source direction is denoised, so that the target source is not distorted while the interference of non-target sound sources is reduced.
  • In step S230, noise reduction is performed, according to the noise parameter $U(t, e^{j\omega})$, on the signals from source directions other than the target beamforming output. For example (but without limitation), this can be done as follows: calculate multi-channel optimal filtering parameters through a multi-channel filtering algorithm and an iterative algorithm; then, according to the beamforming output of the target sound source, the optimal filtering parameters, and the noise parameter, denoise the signals from source directions other than the target beamforming output.
  • This embodiment takes multi-channel Wiener filtering as the example of the multi-channel filtering algorithm.
  • The optimal filtering parameter $G$ is calculated using a multi-channel Wiener filter and the NLMS (Normalized Least Mean Square) iterative method.
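The adaptive branch can be sketched for a single subband as a generalized-sidelobe-canceller-style noise canceller updated by NLMS. This sketch covers only the NLMS iteration (the multi-channel Wiener formulation is not reproduced). It assumes the final output is $Y_{FBF} - G^H U$, and the class name and step size are illustrative. A practical system would usually also pause adaptation while the target source is active so that target speech is not cancelled. That control logic is omitted here.

```python
import cmath


class SubbandNoiseCanceller:
    """Hypothetical adaptive noise-cancelling branch for one subband.

    Output: Y = Y_FBF - G^H U, with G updated by normalized LMS so that the part of
    Y_FBF that is correlated with the noise references U is removed.
    """

    def __init__(self, n_refs, step=0.5, eps=1e-8):
        self.g = [0j] * n_refs  # filter G(t, e^{jw}) for this subband
        self.step = step        # NLMS step size mu, 0 < mu < 2
        self.eps = eps          # regulariser against division by ~0

    def process(self, y_fbf, u):
        y_nc = sum(gi.conjugate() * ui for gi, ui in zip(self.g, u))  # Y_NC = G^H U
        y = y_fbf - y_nc                                              # enhanced output
        norm = self.eps + sum(abs(ui) ** 2 for ui in u)
        # NLMS update: G <- G + mu * U * conj(Y) / (eps + ||U||^2)
        self.g = [gi + self.step * ui * y.conjugate() / norm for gi, ui in zip(self.g, u)]
        return y
```

When $Y_{FBF}$ contains only noise leakage that is linearly related to $U$, the output decays toward zero as $G$ converges.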
  • FIG. 6 shows a schematic diagram of the final speech output for the target source direction according to an embodiment of the present invention, where the beamformer output in the figure is denoted $Y_{FBF}(t, e^{j\omega})$ and $G(t, e^{j\omega}) \cdot U(t, e^{j\omega})$ is denoted $Y_{NC}(t, e^{j\omega})$.
  • This further ensures that the sound from the target source direction is not distorted and further suppresses interference from non-target source directions.
  • FIG. 7 is a flowchart of another multi-beam beamforming method according to an embodiment of the present invention. As shown in FIG. 7, the multi-beam beamforming method in this embodiment includes the following steps:
  • Step S310: Calculate the products of the spatial filtering parameters and the original frequency-domain signals for each of the at least two source directions, to obtain multi-beam beamforming.
  • the spatial filtering parameters vary with the angle of the sound source and the frequency of the subband.
  • The at least two source directions include the target source direction and at least one non-target source direction.
  • The spatial filtering parameters described in this embodiment are frequency-domain filter parameters whose purpose is to apply a corresponding gain to each subband frequency of every signal frame.
  • the spatial filtering parameters described in the embodiments of the present invention are a matrix.
  • The spatial filtering parameters are calculated in advance on a computer device; once obtained, they are stored in the electronic device of the embodiments of the present invention and used directly at run time, which shortens the time consumed by beamforming.
  • the method described in steps S1 to S3 in FIG. 4 may be used to calculate the spatial filtering parameters, and details are not described herein again.
  • In this embodiment, beams are directed toward at least two source directions, which constitutes multi-beam beamforming.
  • A source direction can be any plane-wave angle from 0° to 180°.
  • the at least two sound source directions described in the embodiment of the present invention include a target sound source and at least one other sound source direction.
  • The following embodiments take the beam directions 0°, 30°, 60°, 90°, 120°, 150°, and 180° as an example.
  • The target source direction is 90°.
  • These directions are only an example and do not limit the beams.
  • The beams may also point to, for example, 53° or 80°, and the target source direction may be 60°, etc.; this is not specifically limited.
  • Each source direction is determined through a microphone array. A microphone array consists of a certain number of acoustic sensors (generally microphones) and is used to sample the spatial characteristics of the sound field. In practical applications, the array may consist of 4 microphones uniformly spaced on a line (as shown in FIG. 2), 6 microphones uniformly spaced on a line, 8 microphones uniformly spaced on a circle (as shown in FIG. 3), or 12 or 14 microphones uniformly spaced in circular, rectangular, crescent, or other shapes.
  • The specific embodiments of the present invention do not limit the number or arrangement of microphones. For convenience of description, the embodiments below use the array layout and number of microphones in FIG. 2 as an example, but this does not specifically limit the microphone array.
  • The spacing between microphones should be neither too large nor too small; if the spacing is unsuitable, errors occur in locating the sound source focus.
  • For example, the equal spacing between microphones can be set to more than 30 mm and less than 80 mm.
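The half-wavelength rule from array processing gives one concrete reason why the spacing cannot be too large. For a uniform array, spatial aliasing is avoided only below $f_{max} = c / (2d)$. Conversely, a very small spacing leaves only small phase differences between microphones at low frequencies, which makes focusing sensitive to noise and microphone mismatch. The sketch assumes $c = 343$ m/s. The rule is a general guideline, not a criterion stated in the patent.

```python
def max_alias_free_frequency(spacing_m, c=343.0):
    """Highest frequency (Hz) without spatial aliasing for a uniform array: c / (2 d)."""
    return c / (2.0 * spacing_m)
```

Under these assumptions, the 30 mm and 80 mm bounds correspond to alias-free bands up to about 5.7 kHz and 2.1 kHz, respectively.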
  • Step S320: Calculate the enhanced speech for the target source direction.
  • The microphone array in FIG. 2 is used as an example.
  • The 7 segments of sound (one per beam direction) are Fourier-transformed to obtain seven 4 × 512 matrices, where 4 is the number of microphones and 512 means that the spectrum for each direction is decomposed into 512 subbands.
  • The purpose of this step is to filter per subband and to determine, on each subband, the proportion that the target source direction takes of all directions.
  • the spectrum for the target source direction (90°) corresponds to $\Phi_1$: 4 × 512 subbands
  • the spectrum for the 0° direction corresponds to $\Phi_2$: 4 × 512 subbands
  • the spectrum for the 30° direction corresponds to $\Phi_3$: 4 × 512 subbands
  • the spectrum for the 60° direction corresponds to $\Phi_4$: 4 × 512 subbands
  • the spectrum for the 120° direction corresponds to $\Phi_5$: 4 × 512 subbands
  • the spectrum for the 150° direction corresponds to $\Phi_6$: 4 × 512 subbands
  • the spectrum for the 180° direction corresponds to $\Phi_7$: 4 × 512 subbands.
  • The ratio gain for the target source direction is calculated as $\Phi_1 / (\Phi_1 + \Phi_2 + \cdots + \Phi_7)$; in another implementation, it is $\Phi_1 / (\Phi_2 + \Phi_3 + \cdots + \Phi_7)$.
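The two ratio-gain variants above can be sketched per subband as follows. The sketch assumes each direction's spectrum has already been reduced to one non-negative energy per subband (for example by combining the 4 microphones). It also assumes `spectra[i][k]` is the energy of direction i in subband k, with the target at index 0. The function name and layout are hypothetical. Note that the second variant, which excludes the target from the denominator, is not bounded by 1.

```python
def ratio_gain(spectra, target=0, include_target=True):
    """Per-subband ratio gain for the target direction.

    spectra: list indexed [direction][subband] of non-negative energies
    (e.g. 7 directions x 512 subbands). include_target=True divides by the
    energy of all directions; False divides by the non-target directions only.
    """
    n_bins = len(spectra[target])
    gains = []
    for k in range(n_bins):
        total = sum(s[k] for i, s in enumerate(spectra) if include_target or i != target)
        gains.append(spectra[target][k] / total if total > 0 else 0.0)
    return gains
```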
  • Step S330: Calculate an energy ratio from the energy of the subbands for the target source direction and the energy of all subbands from at least one non-target source direction.
  • Multiple subbands of the current frame's spectral decomposition are combined, and the energy of the combined subbands is obtained.
  • The current frame includes the target sound source and non-target sound sources.
  • The 512 subbands of the target source direction are combined first, and the combined subband energy is determined.
  • Then the sum of the energies of all subbands from the 6 non-target source directions (or from all 7 directions, including the target) is calculated; this energy sum is a matrix.
  • The energy ratio is calculated from the combined subband energy of the target source direction and the energy sum of all subbands from the 6 directions (or 7 directions, including the target).
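A frame-level energy ratio along these lines could be computed as below. The sketch assumes per-direction, per-subband energies as input (target at index 0). It combines the subbands of each direction by summation and offers both denominators mentioned in the text (6 non-target directions, or all 7). The function name and data layout are hypothetical.

```python
def energy_ratio(spectra, target=0, include_target=False):
    """Frame-level energy ratio for the target direction.

    spectra: list indexed [direction][subband] of non-negative energies.
    Returns target energy / energy of the non-target directions
    (or of all directions when include_target=True).
    """
    per_direction = [sum(s) for s in spectra]  # combine the subbands of each direction
    target_energy = per_direction[target]
    others = sum(e for i, e in enumerate(per_direction) if include_target or i != target)
    return target_energy / others if others > 0 else 0.0
```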
  • Step S340: Calculate the product of the original frequency-domain signal for the target source direction and the corresponding enhanced speech and energy ratio, so as to reduce noise from non-target sound sources, and output the speech corresponding to the product.
  • This beamforming keeps the sound from the target source direction undistorted while suppressing noise arriving from non-target directions.
  • In this embodiment, multi-beam beamforming is obtained by calculating the products of the spatial filtering parameters and the original frequency-domain signals for each of at least two source directions.
  • The product of the enhanced speech, the energy ratio, and the original frequency-domain signal for the target source direction is then calculated and the corresponding speech is output, thereby reducing noise from non-target sound sources while ensuring that the sound from the target source direction is not distorted.
  • FIG. 8 is a flowchart of another multi-beam beamforming method according to an embodiment of the present invention.
  • an embodiment of the present invention also provides another method for multi-beam beamforming.
  • the method for multi-beam beamforming in this embodiment includes the following steps:
  • Step S410 Calculate a product of the spatial filtering parameter and the original frequency domain signals corresponding to the at least two sound source directions, respectively, to obtain multi-beam beamforming.
  • the spatial filtering parameters vary with the angle of the sound source and the frequency of the subband.
  • At least two sound source directions include a target sound source and at least one non-target sound source direction.
  • When the spatial filtering parameters $W_f(\theta)$ for the at least two source directions are determined, the direction on which each beam focuses is located by means of delay times; that is, the spatial filtering parameters $W_f(\theta)$ corresponding to each source direction, including the target sound source, are determined in this way.
  • the delay time is $\tau_1 = 2 d \sin(\theta) / c$,
  • where $\tau_1$ is the delay time from the sound source to the second microphone Mic2.
  • The above delay calculation applies to a uniformly spaced linear microphone array; other microphone layouts or non-uniform spacings may require different calculation methods.
  • A signal vector function is constructed from the delay time of each microphone in the array, and the sound source direction is calculated from the signal vector function and the delay times.
  • a matrix corresponding to all subband frequencies needs to be determined.
  • the signal vector function is $\mathbf{a}(\theta, \omega) = [e^{-j\omega\tau_0}, e^{-j\omega\tau_1}, \ldots, e^{-j\omega\tau_{N-1}}]^T$, where
  • $\theta$ is the direction angle on which sound reception is focused
  • $j$ is the imaginary unit
  • $\omega = 2\pi f$
  • $f$ is the matrix of all subband frequencies
  • $\tau_0$ is the delay time from the sound source to the first microphone
  • $N$ is the number of microphones
  • $\tau_{N-1}$ is the delay time from the sound source to the $N$th microphone. The sound source direction can therefore be calculated from the signal vector function and the delay time of each microphone.
  • the matrix corresponding to the subband frequencies corresponding to the sound source is first determined, and the target sound source direction is calculated according to the matrices corresponding to all the subband frequencies corresponding to the sound source, the above-mentioned signal vector function, and the delay time.
  • Using the above method, single-beam beamforming is performed for each of the beam directions 0°, 30°, 60°, 90°, 120°, 150°, and 180° (7 directions in total).
  • This yields seven 4 × 512 matrices, where 4 is the number of microphones and 512 means that the spectrum for each direction is decomposed into 512 subbands.
  • Step S420 Calculate the enhanced speech pointed by the target sound source.
  • the following methods are used to calculate the enhanced speech pointed by the target sound source, including:
  • With each subband as the unit, the ratio gain between the energy from the target source direction and the sum of the energies from all source directions is calculated, and the first product $B$ is multiplied by the ratio gain to obtain the enhanced speech, where:
  • the first product is the product of the original frequency-domain signal for the target source direction and the spatial filtering parameters.
  • In essence, the 4 microphone channels are merged, giving seven 1 × 512 matrices. The sum of the energies from all source directions is recorded as "Spectrum power of other directions", and the energy from the target source direction is recorded as "Spectrum power of target directions". The ratio of the target-direction energy to the all-direction energy sum gives the ratio gain, Gain-mask.
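The procedure just described (merge the microphones, form the Gain-mask per subband, then scale the first product) might look like the following sketch. It assumes that `products[i][m][k]` holds the element-wise product of direction i's spatial filtering parameter and microphone m's original spectrum in subband k. It also assumes that merging means summing over microphones and that energy means squared magnitude. These choices, and the function name, are illustrative rather than confirmed by the text.

```python
def enhanced_speech(products, target=0):
    """Return the target beam scaled by its per-subband Gain-mask.

    products[i][m][k]: element-wise product of direction i's spatial filtering
    parameter and microphone m's original spectrum in subband k (hypothetical
    layout, e.g. 7 directions x 4 microphones x 512 subbands, target at index 0).
    """
    n_bins = len(products[target][0])
    # Merge the microphones: each direction collapses to one value per subband (1 x 512).
    beams = [[sum(mic[k] for mic in direction) for k in range(n_bins)]
             for direction in products]
    enhanced = []
    for k in range(n_bins):
        powers = [abs(beam[k]) ** 2 for beam in beams]  # energy of every direction in subband k
        total = sum(powers)
        gain = powers[target] / total if total > 0 else 0.0  # Gain-mask for subband k
        enhanced.append(beams[target][k] * gain)  # merged first product scaled by the gain
    return enhanced
```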
  • Step S430 Calculate an energy ratio based on the energy of the subband corresponding to the target sound source and the energy of all the subbands pointed to by at least one non-target sound source.
  • The energies of all subbands in the current frame are combined, and the energy sum of all subbands in the current frame is calculated; the ratio between the energy of the subbands for the target source direction and the energy of all subbands from the non-target source directions is then calculated to obtain the energy ratio.
  • Alternatively, the ratio between the energy of the subbands for the target source direction and the energy sum of all subbands in the current frame is calculated to obtain the energy ratio.
  • the current frame contains all subbands in the direction of the 7 sound sources.
  • the energy corresponding to all subbands in the current frame is combined.
  • All the subbands of each source direction are combined to obtain the spectrum for each direction, giving a 7 × 1 matrix, where 7 is the number of source directions and 1 is the combined subband (spectrum).
  • The spectra of the different directions are then combined into a 1 × 1 matrix, i.e., the energy sum of all subbands, denoted "Energy of each bin direction".
  • Step S440: Smooth the current frame against the previous frame, frame by frame, using the smoothing parameters.
  • The purpose of smoothing is to give a smooth transition of speech between two consecutive frames. The current frame and the previous frame can therefore be smoothed frame by frame using the smoothing parameters in the following manner (but without limitation):
  • set the smoothing parameter of the current frame so that the sum of the smoothing parameters of the current frame and of the previous frame equals a second preset value.
  • The second preset value is 1.
  • Frame-by-frame smoothing is then performed on the current frame according to the sum of the second product (the ratio gain of the previous frame multiplied by its smoothing parameter) and the third product (the smoothing parameter of the current frame multiplied by the current ratio gain).
  • The smoothing parameter is an empirical value; for example, the smoothing parameter of the current frame can be set to 0.8, although the present invention does not limit this.
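The smoothing rule can be sketched as a first-order recursive average of the per-subband ratio gains. Following the text, the current frame's smoothing parameter is 0.8 and the previous frame's is 1 - 0.8, so the two sum to the second preset value 1. The text does not state whether the previous frame's smoothed or raw ratio gain is reused; this sketch assumes the smoothed one. The function name is hypothetical.

```python
def smooth_gains(frame_gains, alpha=0.8):
    """Frame-by-frame smoothing of per-subband ratio gains.

    alpha is the current frame's smoothing parameter and 1 - alpha the previous
    frame's, so the two sum to 1:
        smoothed_t = (1 - alpha) * smoothed_{t-1} + alpha * gain_t
    (second product + third product). Reusing the previous *smoothed* gain is
    an assumption.
    """
    smoothed = []
    prev = None
    for gains in frame_gains:
        if prev is None:
            cur = list(gains)  # first frame: nothing to smooth against
        else:
            cur = [(1 - alpha) * p + alpha * g for p, g in zip(prev, gains)]
        smoothed.append(cur)
        prev = cur
    return smoothed
```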
  • Step S450: Calculate the product of the enhanced speech and energy ratio for the target source direction and the original frequency-domain signal for the target source direction, and output the speech corresponding to the product according to the smoothing result.
  • In this embodiment, multi-beam beamforming is obtained by calculating the products of the spatial filtering parameters and the original frequency-domain signals for each of at least two source directions. The product of the enhanced speech, the energy ratio, and the original frequency-domain signal for the target source direction is then calculated, the current frame and the previous frame are smoothed using the smoothing parameters, and the speech corresponding to the product is output according to the smoothing result. This further reduces noise from non-target sound sources and further ensures that the sound from the target source direction is not distorted.
  • another embodiment of the present invention further provides a voice processing apparatus.
  • This device embodiment corresponds to the foregoing method embodiment.
  • For brevity, this apparatus embodiment does not repeat every detail of the foregoing method embodiment, but it should be clear that the apparatus in this embodiment can correspondingly implement everything in the foregoing method embodiment.
  • another embodiment of the present invention further provides a beamforming apparatus.
  • This device embodiment corresponds to the foregoing method embodiment.
  • For brevity, this apparatus embodiment does not repeat every detail of the foregoing method embodiment, but it should be clear that the apparatus in this embodiment can correspondingly implement everything in the foregoing method embodiment.
  • FIG. 9 is a schematic diagram of a beamforming apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of another beamforming apparatus according to an embodiment of the present invention.
  • the beamforming apparatus 9 of this embodiment includes a first obtaining unit 91, a determining unit 92, a second obtaining unit 93, and a first calculating unit 94.
  • the first obtaining unit 91 is configured to obtain the spatial filtering parameters, which vary with angle and subband frequency.
  • the determining unit 92 is configured to determine a sound source corresponding to the spatial filtering parameter obtained by the first obtaining unit 91.
  • the second obtaining unit 93 is configured to obtain the original frequency-domain signal for the sound source direction determined by the determining unit 92.
  • the first calculation unit 94 is configured to calculate the product of the spatial filtering parameters and the original frequency-domain signal; the product is used to suppress frequency-domain signals other than the original frequency-domain signal from the sound source direction.
  • the beamforming device 9 further includes:
  • the second calculation unit 95 is configured to calculate the spatial filtering parameters before the first obtaining unit 91 obtains them.
  • the second calculation unit 95 includes:
  • the first calculation module 951 is configured to calculate a delay time when the sound source reaches the microphone array.
  • a building module 952 is used to build a signal vector function.
  • a second calculation module 953 is configured to calculate a sound source direction according to the signal vector function constructed by the construction module 952 and the delay time calculated by the first calculation module 951.
  • the first setting module 954 is configured to set a first limiting condition, where the first limiting condition is a white noise gain limitation.
  • the second setting module 955 is configured to set a second limitation condition, where the second limitation condition is that a product of the spatial filtering parameter and the signal vector function is 1.
  • a construction module 956 is configured to construct a loss function according to the spatial filtering parameter and the signal vector function.
  • a third calculation module 957 is configured to calculate, subject to the first constraint set by the first setting module 954 and the second constraint set by the second setting module 955, the spatial filtering parameters that minimize the loss function.
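The constrained design carried out by modules 951 to 957 can be illustrated with one standard formulation. This is offered as an assumption, not as the patent's own loss function. The sketch minimizes the output power for spherically diffuse noise, $W^H(\Gamma + \mu I)W$, subject to $W^H a = 1$ (the second constraint). Diagonal loading $\mu$ is a common way to respect a white-noise-gain limit (the first constraint). The closed form is $W = (\Gamma + \mu I)^{-1} a / (a^H (\Gamma + \mu I)^{-1} a)$, i.e. a superdirective beamformer. The array geometry, default values, and function names are hypothetical.

```python
import cmath
import math


def _solve(a, b):
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design_fixed_filter(freq_hz, theta_deg, n_mics=4, spacing_m=0.05, c=343.0, loading=0.01):
    """Hypothetical offline design of W_f(theta) for one subband.

    Minimises W^H (Gamma + loading * I) W subject to W^H a = 1. Larger loading
    raises the white-noise gain at the cost of directivity.
    """
    omega = 2 * math.pi * freq_hz
    positions = [n * spacing_m for n in range(n_mics)]
    proj = math.cos(math.radians(theta_deg))  # theta measured from the array axis (assumption)
    a = [cmath.exp(-1j * omega * p * proj / c) for p in positions]  # signal vector

    def sinc(x):
        return 1.0 if x == 0 else math.sin(x) / x

    # Spherically diffuse noise coherence plus diagonal loading.
    gamma = [[sinc(omega * abs(p_i - p_j) / c) + (loading if i == j else 0.0)
              for j, p_j in enumerate(positions)]
             for i, p_i in enumerate(positions)]
    x = _solve(gamma, a)                                      # (Gamma + mu I)^-1 a
    denom = sum(ai.conjugate() * xi for ai, xi in zip(a, x))  # a^H (Gamma + mu I)^-1 a
    return [xi / denom for xi in x], a
```

As the loading grows, the weights approach the delay-and-sum weights $a / N$, which have the highest white-noise gain.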
  • the first calculation module 951 includes:
  • the first determining sub-module 951a is configured to determine the spacing between microphones in the microphone array and the propagation speed of sound.
  • the second determining sub-module 951b is configured to determine an angle pointed by the sound source.
  • a calculation sub-module 951c is configured to calculate the delay time from the microphone spacing, the speed of sound, and the angle.
  • the second calculation module 953 includes:
  • a determining sub-module 953a is configured to determine a matrix corresponding to all sub-band frequencies.
  • a calculation sub-module 953b is configured to calculate a sound source direction according to the matrices corresponding to all the sub-band frequencies, the signal vector function, and the delay time determined by the determination sub-module.
  • the spatial filtering parameter is a matrix.
  • the sound source direction can be any plane-wave angle from 0° to 180°.
  • The beamforming apparatus described in this embodiment can execute the beamforming method in the embodiments of the present invention. Based on that method, those skilled in the art can understand the specific implementations of this beamforming apparatus and its variations, so how the apparatus implements the method is not described in detail here. Any apparatus used by a person skilled in the art to implement the beamforming method in the embodiments of the present invention falls within the protection scope of the present application.
  • another embodiment of the present invention further provides a multi-beam beamforming apparatus.
  • This device embodiment corresponds to the foregoing method embodiment.
  • For brevity, this apparatus embodiment does not repeat every detail of the foregoing method embodiment, but it should be clear that the apparatus in this embodiment can correspondingly implement everything in the foregoing method embodiment.
  • FIG. 11 is a schematic diagram of a multi-beam beamforming apparatus according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention.
  • the multi-beam beamforming apparatus 11 of this embodiment includes a first calculation unit 111, a second calculation unit 112, and a noise reduction unit 113.
  • the first calculation unit 111 is configured to calculate the beamforming output for the target source direction.
  • the second calculation unit 112 is configured to calculate a noise parameter by using a blocking matrix.
  • the noise reduction unit 113 is configured to perform noise reduction, according to the noise parameter calculated by the second calculation unit 112, on the signals from non-target source directions other than the target beamforming output calculated by the first calculation unit 111.
  • the first calculation unit 111 includes:
  • the first obtaining module 1111 is configured to obtain spatial filtering parameters.
  • a determining module 1112 is configured to determine a target sound source corresponding to the spatial filtering parameter obtained by the first obtaining module 1111.
  • the second acquisition module 1113 is configured to acquire the original frequency-domain signal for the target source direction determined by the determining module 1112.
  • a calculation module 1114 is configured to calculate the product of the spatial filtering parameters and the original frequency-domain signal for the target source direction, to obtain the beamforming output for the target source direction.
  • the second calculation unit 112 includes:
  • the first calculation module 1121 is configured to calculate, in turn, the frequency response of the sound signal reaching each microphone.
  • a construction module 1122 is configured to construct the blocking matrix from the frequency responses calculated by the first calculation module 1121.
  • a second calculation module 1123 is configured to calculate the noise parameter from the blocking matrix constructed by the construction module 1122 and the original frequency-domain signals for the non-target source directions.
  • the noise reduction unit 113 includes:
  • a calculation module 1131 is configured to calculate a multi-channel optimal filtering parameter by using a multi-channel filtering algorithm and an iterative algorithm.
  • the noise reduction module 1132 is configured to perform noise reduction, according to the beamforming output for the target source direction, the optimal filtering parameters, and the noise parameter, on the signals from source directions other than the target beamforming output.
  • The multi-beam beamforming apparatus described in this embodiment can execute the multi-beam beamforming method in the embodiments of the present invention. Based on that method, those skilled in the art can understand the specific implementations of this apparatus and its variations, so how the apparatus implements the method is not described in detail here.
  • Any apparatus used by a person skilled in the art to implement the multi-beam beamforming method in the embodiments of the present invention falls within the scope of the present application.
  • another embodiment of the present invention further provides a multi-beam beamforming apparatus.
  • This device embodiment corresponds to the foregoing method embodiment.
  • For brevity, this apparatus embodiment does not repeat every detail of the foregoing method embodiment, but it should be clear that the apparatus in this embodiment can correspondingly implement everything in the foregoing method embodiment.
  • FIG. 13 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention.
  • FIG. 14 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention.
  • the multi-beam beamforming apparatus 13 in this embodiment includes a first calculation unit 131, a second calculation unit 132, a third calculation unit 133, and a fourth calculation unit 134.
  • the first calculation unit 131 is configured to calculate a product of the spatial filtering parameter and the original frequency domain signals corresponding to the at least two sound source directions respectively to obtain multi-beam beamforming.
  • the spatial filtering parameter varies with the angle of the sound source and the subband frequency.
  • The source directions include a target source direction and at least one non-target source direction.
  • the second calculation unit 132 is configured to separately calculate the enhanced speech pointed by the target sound source.
  • the third calculation unit 133 is configured to calculate an energy ratio according to the sum of the energy of the subband corresponding to the target sound source and the energy of all the subbands pointed to by at least one non-target sound source.
  • a fourth calculation unit 134 is configured to calculate the product of the original frequency-domain signal for the target source direction and the corresponding enhanced speech and energy ratio, and to output the speech corresponding to the product.
  • the multi-beam beamforming device 13 further includes:
  • a processing unit 135, configured to smooth the current frame against the previous frame, frame by frame, before the fourth calculation unit 134 calculates the product of the original frequency-domain signal for the target source direction and the corresponding enhanced speech and energy ratio.
  • the first calculation unit 131 includes:
  • the first obtaining module 1311 is configured to obtain spatial filtering parameters.
  • a determining module 1312 is configured to determine at least two sound source directions respectively corresponding to the spatial filtering parameters obtained by the first obtaining module 1311.
  • a second acquisition module 1313 is configured to acquire the original frequency-domain signals corresponding to each of the at least two sound source directions determined by the determining module 1312.
  • a calculation module 1314 is configured to calculate products of the spatial filtering parameters and original frequency domain signals corresponding to different sound source directions, respectively.
  • the second calculation unit 132 includes:
  • the first calculation module 1321 is configured to calculate, for each subband, a ratio gain between the energy in the target sound source direction and the sum of the energies in all sound source directions.
  • a second calculation module 1322 is configured to calculate the product of a first product and the ratio gain to obtain the enhanced speech, where the first product is the product of the original frequency-domain signal corresponding to the target sound source direction and the spatial filtering parameters.
  • the third calculation unit 133 includes:
  • a combining module 1331 is configured to combine the energy corresponding to all subbands in the current frame.
  • the first calculation module 1332 is configured to calculate the energy sum over all subbands in the current frame.
  • a second calculation module 1333 is configured to calculate the ratio between the subband energy corresponding to the target sound source and the energy sum over all subbands of the at least one non-target sound source direction, to obtain the energy ratio.
  • the processing unit 135 includes:
  • a setting module 1351 is used to set the smoothing parameters of the current frame so that the sum of the smoothing parameters of the current frame and the smoothing parameters of the previous frame is 1.
  • a calculation module 1352 is configured to calculate a product of a ratio gain of a previous frame and a corresponding smoothing parameter to obtain a second product, and calculate a product of a smoothing parameter of the current frame and the ratio gain to obtain a third product.
  • the processing module 1353 is configured to perform frame-by-frame smoothing on the current frame according to the sum of the second product and the third product.
  • the fourth calculation unit 134 is further configured to calculate the product of the original frequency-domain signal for the target sound source direction, the corresponding enhanced speech, and the energy ratio, and to output the speech corresponding to the product according to the smoothing result.
  • the multi-beam beamforming apparatus calculates the products of the spatial filtering parameters and the original frequency-domain signals corresponding to each of at least two sound source directions, to obtain multi-beam beamforming.
  • the spatial filtering parameter varies with the angle of the sound source.
  • the at least two sound source directions include a target sound source direction and at least one other sound source direction. The apparatus calculates the enhanced speech for the target sound source direction; calculates the energy ratio from the subband energy corresponding to the target sound source and the energy sum over all subbands of the at least one other sound source direction; and calculates the product of the original frequency-domain signal for the target sound source direction, the corresponding enhanced speech, and the energy ratio, outputting the speech corresponding to the product.
  • the embodiment of the present invention can therefore keep the sound from the target sound source direction undistorted while effectively suppressing interference from other sound source directions.
  • the multi-beam beamforming apparatus described in this embodiment can execute the multi-beam beamforming method of the embodiments of the present invention. Based on that method, those skilled in the art can understand the specific implementations of this apparatus and its variations, so how the apparatus implements the method is not described in detail here.
  • any device used by a person skilled in the art to implement the multi-beam beamforming method in the embodiments of the present invention falls within the scope of the present application.
  • Each of the foregoing devices includes a processor and a memory.
  • Each unit in the device is stored in the memory as a program unit, and the processor executes the program unit stored in the memory to implement a corresponding function.
  • the processor contains a core, and the core retrieves the corresponding program unit from the memory.
  • one or more cores may be provided; by adjusting the core parameters to implement the above method, the sound from the target spatial direction is kept undistorted and sound from other spatial directions is effectively suppressed.
  • Memory may include non-persistent memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM).
  • Memory includes at least one storage chip.
  • An embodiment of the present invention provides a storage medium on which a program is stored, and when the program is executed by a processor, the foregoing voice processing method is implemented.
  • An embodiment of the present invention provides a processor, where the processor is configured to run a program, and when the program runs, the foregoing voice processing method is performed.
  • FIG. 15 is a structural block diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 15, the electronic device 17 includes:
  • at least one processor 151, and at least one memory 152 and a bus 153 connected to the processor 151;
  • the processor 151 and the memory 152 communicate with each other through the bus 153;
  • the processor 151 is configured to call program instructions in the memory 152 to execute any one of the foregoing methods.
  • the electronic device herein may be a server, a PC, a tablet (PAD), a mobile phone, a smart TV, or another smart device that includes a microphone.
  • the electronic device of the embodiment of the present invention obtains the beamforming output for the target sound source direction by calculating the product of the spatial filtering parameters and the original frequency-domain signal corresponding to the target sound source direction, and improves the signal-to-noise ratio of that output by performing noise reduction on non-target sound source directions. It thereby keeps the sound from the target spatial direction undistorted and effectively suppresses sound from other spatial directions, improving the signal-to-noise ratio of the sound from the target spatial direction.
  • An embodiment of the present invention further provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause the computer to execute any one of the foregoing voice processing methods.
  • This application also provides a computer program product that, when executed on a data processing device, implements the functions of any of the above-mentioned speech processing methods.
  • this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of steps is performed on the computer or other programmable device to produce a computer-implemented process, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include persistent and non-persistent, removable and non-removable media.
  • Information storage can be accomplished by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

Abstract

Disclosed are a beamforming method, a multi-beam forming method and apparatus, and an electronic device. The product of a spatial filtering parameter and the original frequency-domain signal corresponding to a target sound source direction is calculated to obtain the beamforming output for the target sound source direction, and noise reduction is performed on non-target sound source directions to improve the signal-to-noise ratio of that beamforming output. The sound from the target spatial direction is therefore kept undistorted, while sounds from other spatial directions are effectively suppressed, improving the signal-to-noise ratio of the sound from the target spatial direction.

Description

Beamforming method, multi-beam forming method, apparatus and electronic device
This application claims priority to the following Chinese patent applications, all filed on May 22, 2018: Application No. 2018104970698, entitled "Method, Apparatus and Electronic Device for Multi-Beam Beamforming"; Application No. 2018104964502, entitled "Method, Apparatus and Electronic Device for Multi-Beam Beamforming"; and Application No. 2018104964485, entitled "Method, Apparatus and Electronic Device for Beamforming". The entire contents of these applications are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of speech processing technology, and in particular to a beamforming method, a multi-beam forming method, an apparatus, and an electronic device.
Background
With the rapid spread of smart-terminal technology, users increasingly demand more functionality and intelligence from smart terminals, and making smart terminals more intelligent and specialized has become an active research direction.
For example, almost all smart terminals come with a recording function as standard, and most recording functions use beamforming. Beamforming is a signal processing technique for sensor arrays (for example, microphone arrays) used for directional signal reception and for appropriate processing of the received sound signals. It allows the microphone assembly to process the received sound selectively; for example, sound arriving from one sound source is processed differently from sound arriving from other sound sources.
Currently, speech processing is usually performed by combining a time-domain filter with the calculation of beamforming drive weights in the frequency domain, but this does not reduce unwanted ambient noise.
Summary of the Invention
In view of this, the embodiments of the present application provide a beamforming method, a multi-beam forming method, an apparatus, and an electronic device that keep the sound from the target spatial direction undistorted while effectively suppressing sound from other spatial directions, thereby improving the signal-to-noise ratio of the sound from the target spatial direction.
In a first aspect, an embodiment of the present invention provides a beamforming method, including:
acquiring spatial filtering parameters, which differ with angle and subband frequency; determining the sound source direction corresponding to the spatial filtering parameters, and acquiring the original frequency-domain signal corresponding to that sound source direction;
calculating the product of the spatial filtering parameters and the original frequency-domain signal, the product being used to perform beamforming in a way that suppresses frequency-domain signals other than the original frequency-domain signal from the sound source direction.
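As a minimal sketch of this product, assuming a conventional narrowband beamformer in which each subband's output is the inner product of a complex weight vector with the microphone spectra, the following computes Y[k] = sum over m of conj(W[m][k]) * X[m][k]. The conjugate convention, the list-of-lists layout, and the function name `beamform_frame` are illustrative assumptions, not taken from the patent.

```python
def beamform_frame(weights, spectra):
    """Apply per-subband spatial filter weights to one STFT frame.

    weights[m][k]: weight for microphone m, subband k (hypothetical layout)
    spectra[m][k]: original frequency-domain signal for microphone m, subband k
    Returns Y[k] = sum_m conj(weights[m][k]) * spectra[m][k] for every subband k.
    """
    num_mics = len(spectra)
    num_bins = len(spectra[0])
    output = []
    for k in range(num_bins):
        acc = 0j
        for m in range(num_mics):
            acc += complex(weights[m][k]).conjugate() * spectra[m][k]
        output.append(acc)
    return output


# Two microphones, two subbands: averaging weights pass the in-phase
# component (subband 0) and cancel the anti-phase component (subband 1).
w = [[0.5, 0.5], [0.5, 0.5]]
x = [[1 + 0j, 1j], [1 + 0j, -1j]]
y = beamform_frame(w, x)
```

Components that arrive in phase across the microphones add up, while components with opposing phase cancel, which is the suppression effect the claim describes.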
Further, before the spatial filtering parameters are acquired, the method further includes:
calculating the spatial filtering parameters.
Further, calculating the spatial filtering parameters includes:
calculating the delay time for sound from the sound source to reach the microphone array;
constructing a signal vector function from the delay time, and calculating the sound source direction from the signal vector function and the delay time;
calculating, subject to a preset first constraint and a preset second constraint, the spatial filtering parameters at which a loss function reaches its minimum, the loss function being constructed from the spatial filtering parameters and the signal vector function;
where the first constraint is a white noise gain constraint, and the second constraint requires the product of the spatial filtering parameters and the signal vector function to equal a first preset value.
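The patent does not spell out the loss function. One common way to realize "minimize a loss subject to a distortionless constraint and a white noise gain (WNG) constraint" is a superdirective beamformer with diagonal loading: minimize w^H (Gamma + mu*I) w subject to w^H d = 1, where Gamma is the spherically diffuse noise coherence matrix, d is the signal vector and the loading mu trades directivity against WNG. The closed form is w = (Gamma + mu*I)^-1 d / (d^H (Gamma + mu*I)^-1 d). The sketch below assumes a uniform linear array, a far-field plane wave, c = 343 m/s and a first preset value of 1; the choice of loss, the loading approach and all function names are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def steering_vector(freq, theta_deg, num_mics, spacing, c=SPEED_OF_SOUND):
    """Plane-wave signal vector d[m] = exp(-j*2*pi*f*tau_m) for a uniform linear array."""
    cos_t = math.cos(math.radians(theta_deg))
    return [cmath.exp(-2j * math.pi * freq * m * spacing * cos_t / c) for m in range(num_mics)]


def diffuse_coherence(freq, num_mics, spacing, c=SPEED_OF_SOUND):
    """Spherically diffuse noise coherence: Gamma[i][j] = sin(x)/x, x = 2*pi*f*|i-j|*spacing/c."""
    gamma = []
    for i in range(num_mics):
        row = []
        for j in range(num_mics):
            x = 2 * math.pi * freq * abs(i - j) * spacing / c
            row.append(1.0 if x == 0 else math.sin(x) / x)
        gamma.append(row)
    return gamma


def solve(a, b):
    """Solve a @ x = b (small dense complex system) by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def superdirective_weights(freq, theta_deg, num_mics, spacing, loading=1e-2):
    """w = (Gamma + mu*I)^-1 d / (d^H (Gamma + mu*I)^-1 d), so that w^H d = 1."""
    d = steering_vector(freq, theta_deg, num_mics, spacing)
    gamma = diffuse_coherence(freq, num_mics, spacing)
    loaded = [[g + (loading if i == j else 0.0) for j, g in enumerate(row)]
              for i, row in enumerate(gamma)]
    a_inv_d = solve(loaded, d)
    denom = sum(dm.conjugate() * v for dm, v in zip(d, a_inv_d))
    return [v / denom for v in a_inv_d]


def white_noise_gain(w, d):
    """WNG = |w^H d|^2 / (w^H w); at most len(w) for a distortionless beamformer."""
    response = sum(wm.conjugate() * dm for wm, dm in zip(w, d))
    return abs(response) ** 2 / sum(abs(wm) ** 2 for wm in w)
```

A small `loading` favors directivity against diffuse noise; a very large one drives the weights toward plain delay-and-sum, whose WNG equals the number of microphones.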
Further, calculating the delay time for sound from the sound source to reach the microphone array includes:
determining the spacing between the microphones in the microphone array and the speed at which sound propagates from the sound source;
determining the angle of the sound source direction;
calculating the delay time from the microphone spacing, the speed of sound, and the angle of the sound source direction.
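A minimal sketch of the delay calculation, assuming a uniform linear array with microphone 0 as the reference and a far-field plane wave arriving at angle theta measured from the array axis: tau_m = m * spacing * cos(theta) / c. Here theta = 0 points along the axis toward the microphone-0 end, so a positive tau_m means microphone m receives the wavefront later than microphone 0. The function name and the 343 m/s default are assumptions.

```python
import math


def delay_times(num_mics, spacing, theta_deg, c=343.0):
    """Arrival delay (s) of a plane wave at each microphone, relative to microphone 0.

    num_mics: microphones in a uniform linear array
    spacing: distance between adjacent microphones (m)
    theta_deg: source angle from the array axis, 0..180 degrees
    c: speed of sound (m/s)
    """
    cos_theta = math.cos(math.radians(theta_deg))
    return [m * spacing * cos_theta / c for m in range(num_mics)]
```

At 90 degrees (broadside) every delay is zero, and at 0 or 180 degrees (endfire) the delays reach their maximum magnitude of m * spacing / c.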
Further, calculating the sound source direction from the signal vector function and the delay time includes:
determining the matrix corresponding to all subband frequencies;
calculating the sound source direction from the matrix corresponding to all subband frequencies, the signal vector function, and the delay time.
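One way to read "the matrix corresponding to all subband frequencies" is as the set of subband center frequencies f_k = k * fs / N of an N-point FFT. Combining them with the delays gives one signal vector per subband, d_k[m] = exp(-j*2*pi*f_k*tau_m). The sketch below builds that table with one row per subband and one column per microphone; the FFT-bin frequency grid, the uniform linear array and the names are assumptions.

```python
import cmath
import math


def steering_matrix(num_mics, spacing, theta_deg, sample_rate, fft_size, c=343.0):
    """Return D[k][m] = exp(-j*2*pi*f_k*tau_m) for subbands k = 0 .. fft_size//2."""
    cos_theta = math.cos(math.radians(theta_deg))
    taus = [m * spacing * cos_theta / c for m in range(num_mics)]
    freqs = [k * sample_rate / fft_size for k in range(fft_size // 2 + 1)]
    return [[cmath.exp(-2j * math.pi * f * tau) for tau in taus] for f in freqs]
```

Every entry has unit magnitude; only the phase changes with subband frequency and delay, which is why the resulting filters differ per subband.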
Further, the spatial filtering parameters form a matrix.
Further, the sound source direction is any angle of a plane wave from 0° to 180°.
In a second aspect, an embodiment of the present invention provides a beamforming apparatus, including:
a first acquisition unit, configured to acquire spatial filtering parameters, which differ with angle and subband frequency;
a determining unit, configured to determine the sound source direction corresponding to the spatial filtering parameters acquired by the first acquisition unit;
a second acquisition unit, configured to acquire the original frequency-domain signal corresponding to the sound source direction determined by the determining unit;
a first calculation unit, configured to calculate the product of the spatial filtering parameters and the original frequency-domain signal, the product being used to perform beamforming in a way that suppresses frequency-domain signals other than the original frequency-domain signal from the sound source direction.
In a third aspect, an embodiment of the present invention provides a multi-beam beamforming method, including:
calculating the beamforming output corresponding to the target sound source direction;
calculating noise parameters based on a blocking matrix;
performing noise reduction, according to the noise parameters, on the signals from non-target sound source directions other than the beamforming output corresponding to the target sound source direction.
Further, calculating the beamforming output corresponding to the target sound source direction includes:
acquiring spatial filtering parameters, and determining the target sound source direction corresponding to the spatial filtering parameters;
acquiring the original frequency-domain signal corresponding to the target sound source direction;
calculating the product of the spatial filtering parameters and the original frequency-domain signal corresponding to the target sound source direction, to obtain the beamforming output for the target sound source direction.
Further, calculating the noise parameters based on the blocking matrix includes:
calculating the frequency response of the sound signal as it reaches each microphone in turn;
constructing the blocking matrix from the frequency response;
calculating the noise parameters from the blocking matrix and the original frequency-domain signals corresponding to the non-target sound source directions.
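A minimal sketch of a Griffiths-Jim-style blocking matrix, assuming the "frequency response of the sound signal reaching each microphone in turn" is the plane-wave response d[m] of the target direction for one subband. Each row equalizes two adjacent microphones to the target direction and subtracts them, so B @ d = 0: the target is blocked, and what remains (U = B @ X) serves as the noise reference. The uniform-linear-array response model and all names are assumptions; the patent does not give the matrix entries.

```python
import cmath
import math


def relative_response(freq, theta_deg, num_mics, spacing, c=343.0):
    """Assumed plane-wave response of one subband at each microphone, relative to microphone 0."""
    cos_theta = math.cos(math.radians(theta_deg))
    return [cmath.exp(-2j * math.pi * freq * m * spacing * cos_theta / c) for m in range(num_mics)]


def blocking_matrix(d):
    """(M-1) x M matrix whose row m computes x[m+1]/d[m+1] - x[m]/d[m], so that B @ d = 0."""
    num_mics = len(d)
    B = [[0j] * num_mics for _ in range(num_mics - 1)]
    for m in range(num_mics - 1):
        B[m][m] = -1 / d[m]
        B[m][m + 1] = 1 / d[m + 1]
    return B


def noise_references(B, x):
    """U = B @ x for one subband: microphone signals with the target direction blocked."""
    return [sum(b * xm for b, xm in zip(row, x)) for row in B]
```

A signal arriving exactly from the target direction yields (numerically) zero noise references, while a source from any other angle leaks through and can be used to estimate the noise.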
Further, performing noise reduction, according to the noise parameters, on the signals from non-target sound source directions other than the beamforming output corresponding to the target sound source direction includes:
calculating multi-channel optimal filtering parameters through a multi-channel filtering algorithm and an iterative algorithm;
performing noise reduction on the signals from non-target sound source directions other than the beamforming output corresponding to the target sound source direction, according to the beamforming output of the target sound source, the multi-channel optimal filtering parameters, and the noise parameters.
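The patent names only "a multi-channel filtering algorithm and an iterative algorithm". As one assumed instance, a normalized LMS (NLMS) adaptive noise canceller per subband subtracts a filtered combination of the noise references from the target beamformer output and updates the weights iteratively: e = Y - g^H U, then g <- g + mu * U * conj(e) / (||U||^2 + eps). The step size mu, the regularizer eps and the function name are illustrative assumptions, not the patent's algorithm.

```python
def nlms_cancel(y, u, g, mu=0.5, eps=1e-8):
    """One NLMS adaptive-noise-cancellation step for a single subband.

    y: target beamformer output for this frame and subband
    u: noise reference signals for this subband (e.g. from a blocking matrix)
    g: current adaptive weights, updated in place
    Returns the noise-reduced output e = y - g^H u (computed before the update).
    """
    e = y - sum(gm.conjugate() * um for gm, um in zip(g, u))
    power = sum(abs(um) ** 2 for um in u) + eps
    for m in range(len(g)):
        g[m] += mu * u[m] * e.conjugate() / power
    return e
```

In a full pipeline, `u` would come from the blocking-matrix step and `y` from the fixed beamformer, with one weight vector `g` kept per subband and carried across frames.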
In a fourth aspect, an embodiment of the present invention provides a multi-beam beamforming apparatus, including:
a first calculation unit, configured to calculate the beamforming output corresponding to the target sound source direction;
a second calculation unit, configured to calculate noise parameters by means of a blocking matrix;
a noise reduction unit, configured to perform noise reduction, according to the noise parameters calculated by the second calculation unit, on the signals from non-target sound source directions other than the beamforming output, calculated by the first calculation unit, corresponding to the target sound source direction.
Further, the first calculation unit includes:
a first acquisition module, configured to acquire spatial filtering parameters;
a determining module, configured to determine the target sound source direction corresponding to the spatial filtering parameters acquired by the first acquisition module;
a second acquisition module, configured to acquire the original frequency-domain signal corresponding to the target sound source direction obtained by the first acquisition module;
a calculation module, configured to calculate the product of the spatial filtering parameters and the original frequency-domain signal corresponding to the target sound source direction, to obtain the beamforming output for the target sound source direction.
Further, the second calculation unit includes:
a first calculation module, configured to calculate the frequency response of the sound signal as it reaches each microphone in turn;
a construction module, configured to construct the blocking matrix from the frequency response calculated by the first calculation module;
a second calculation module, configured to calculate the noise parameters from the blocking matrix constructed by the construction module and the original frequency-domain signals corresponding to the non-target sound source directions.
Further, the noise reduction unit includes:
a calculation module, configured to calculate multi-channel optimal filtering parameters through a multi-channel filtering algorithm and an iterative algorithm;
a noise reduction module, configured to perform noise reduction on the signals from non-target sound source directions other than the beamforming output corresponding to the target sound source direction, according to the beamforming output of the target sound source, the multi-channel optimal filtering parameters, and the noise parameters.
In a fifth aspect, the present invention provides a multi-beam beamforming method, including:
calculating the products of the spatial filtering parameters and the original frequency-domain signals corresponding to each of at least two sound source directions, to obtain multi-beam beamforming, where the spatial filtering parameters differ with the angle of the sound source and the subband frequency, and the at least two sound source directions include one target sound source direction and at least one non-target sound source direction;
calculating the enhanced speech for the target sound source direction;
calculating an energy ratio from the subband energy corresponding to the target sound source and the energy sum over all subbands of the at least one non-target sound source direction;
calculating the product of the original frequency-domain signal for the target sound source direction, the enhanced speech corresponding to the target sound source direction, and the energy ratio, and outputting the speech corresponding to the product.
Further, before calculating the product of the original frequency-domain signal for the target sound source direction, the corresponding enhanced speech, and the energy ratio, the method further includes:
performing frame-by-frame smoothing on the current frame and the previous frame using smoothing parameters.
Further, calculating the products of the spatial filtering parameters and the original frequency-domain signals corresponding to each of the at least two sound source directions, to obtain multi-beam beamforming, includes:
acquiring spatial filtering parameters, and determining the at least two sound source directions to which the spatial filtering parameters respectively correspond;
acquiring the original frequency-domain signals corresponding to each of the at least two sound source directions;
calculating the products of the spatial filtering parameters and the original frequency-domain signals corresponding to each of the at least two sound source directions.
Further, calculating the enhanced speech for the target sound source direction includes:
calculating, for each subband, a ratio gain between the energy in the target sound source direction and the sum of the energies in all sound source directions;
calculating the product of a first product and the ratio gain to obtain the enhanced speech, where the first product is the product of the original frequency-domain signal corresponding to the target sound source direction and the spatial filtering parameters.
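A minimal sketch of the enhanced-speech step, assuming `beams[b][k]` already holds the first product (the beamformer output) for look direction b and subband k: the per-subband ratio gain is G[k] = |Y_t[k]|^2 / sum over b of |Y_b[k]|^2, and the enhanced speech is G[k] * Y_t[k]. The small `eps` guarding against division by zero and the function names are assumptions.

```python
def ratio_gain(beams, target, eps=1e-12):
    """Per-subband gain G[k] = |Y_target[k]|^2 / sum_b |Y_b[k]|^2 over all look directions b."""
    gains = []
    for k in range(len(beams[target])):
        total = sum(abs(beam[k]) ** 2 for beam in beams)
        gains.append(abs(beams[target][k]) ** 2 / (total + eps))
    return gains


def enhanced_speech(beams, target):
    """Enhanced speech S[k] = G[k] * Y_target[k] (ratio gain times the first product)."""
    gains = ratio_gain(beams, target)
    return [g * y for g, y in zip(gains, beams[target])]
```

Subbands dominated by the target beam keep a gain near 1, while subbands where other beams carry most of the energy are attenuated toward 0.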
Further, calculating the energy ratio from the subband energy corresponding to the target sound source and the energy sum over all subbands of the at least one non-target sound source direction includes:
combining the energies of all subbands in the current frame to calculate the energy sum over all subbands of the current frame;
calculating the ratio between the subband energy corresponding to the target sound source and the energy sum over all subbands of the at least one non-target sound source direction, to obtain the energy ratio.
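The text leaves open whether the target term is a single subband or a frame total. This sketch assumes it is per subband, divided by the energy summed over all subbands of every non-target beam in the current frame; treat it as one reading rather than the patented formula. The `beams[b][k]` layout, `eps` and the function name are assumptions.

```python
def energy_ratio(beams, target, eps=1e-12):
    """R[k] = |Y_target[k]|^2 / (energy over all subbands of all non-target beams in the frame)."""
    non_target_energy = 0.0
    for b, beam in enumerate(beams):
        if b == target:
            continue
        non_target_energy += sum(abs(v) ** 2 for v in beam)
    return [abs(v) ** 2 / (non_target_energy + eps) for v in beams[target]]
```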
Further, performing frame-by-frame smoothing on the current frame and the previous frame using smoothing parameters includes:
setting the smoothing parameter of the current frame so that the sum of the smoothing parameters of the current frame and of the previous frame equals a second preset value;
calculating the product of the ratio gain of the previous frame and the smoothing parameter of the previous frame to obtain a second product;
calculating the product of the ratio gain of the current frame and the smoothing parameter of the current frame to obtain a third product;
performing frame-by-frame smoothing on the current frame according to the sum of the second product and the third product.
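A minimal sketch of the smoothing step, taking the second preset value to be 1 (as the description of processing unit 135 states) so that the previous-frame and current-frame smoothing parameters sum to 1. It feeds back the previous frame's smoothed gain, a common recursive choice; the text could also be read as using the raw previous gain. The value 0.7 and the function names are arbitrary examples.

```python
def smooth_gains(prev_gains, cur_gains, alpha_prev=0.7):
    """Second product + third product: alpha_prev * G_prev[k] + (1 - alpha_prev) * G_cur[k]."""
    alpha_cur = 1.0 - alpha_prev  # the two smoothing parameters sum to 1
    return [alpha_prev * gp + alpha_cur * gc for gp, gc in zip(prev_gains, cur_gains)]


def smooth_sequence(frames, alpha_prev=0.7):
    """Smooth per-subband ratio gains frame by frame, starting from the first frame."""
    smoothed = []
    prev = frames[0]
    for gains in frames:
        prev = smooth_gains(prev, gains, alpha_prev)
        smoothed.append(prev)
    return smoothed
```

A larger `alpha_prev` gives slower, less fluctuating gains at the cost of reacting more slowly to changes in the target signal.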
Further, calculating the product of the original frequency-domain signal for the target sound source direction, the corresponding enhanced speech, and the energy ratio, and outputting the speech corresponding to the product, includes:
calculating the product of the original frequency-domain signal for the target sound source direction, the enhanced speech corresponding to the target sound source direction, and the energy ratio, and outputting the speech corresponding to the product according to the smoothing result.
In a sixth aspect, an embodiment of the present invention provides a multi-beam beamforming apparatus, including:
a first calculation unit, configured to calculate the products of the spatial filtering parameters and the original frequency-domain signals corresponding to each of at least two sound source directions, to obtain multi-beam beamforming, where the spatial filtering parameters differ with the angle of the sound source and the subband frequency, and the at least two sound source directions include one target sound source direction and at least one non-target sound source direction;
a second calculation unit, configured to calculate the enhanced speech for the target sound source direction;
a third calculation unit, configured to calculate an energy ratio from the subband energy corresponding to the target sound source and the energy sum over all subbands of the at least one non-target sound source direction;
a fourth calculation unit, configured to calculate the product of the original frequency-domain signal for the target sound source direction, the enhanced speech corresponding to the target sound source direction, and the energy ratio, and to output the speech corresponding to the product.
In a seventh aspect, an embodiment of the present invention provides a storage medium storing a computer program which, when executed by a processor, implements the method of the first aspect, the method of the third aspect, and/or the method of the fifth aspect of the embodiments of the present invention.
In an eighth aspect, an embodiment of the present invention provides an electronic device including a processor, a memory, and a bus. The processor and the memory communicate with each other through the bus, and the memory stores program instructions that, when executed by the processor, implement the method of the first aspect, the method of the third aspect, and/or the method of the fifth aspect of the embodiments of the present invention.
In the embodiments of the present invention, the beamforming output for the target sound source direction is obtained by calculating the product of the spatial filtering parameters and the original frequency-domain signal for that direction, and the signal-to-noise ratio of this output is improved by reducing noise from non-target sound source directions. This keeps the sound from the target direction undistorted while effectively suppressing sound from other spatial directions, thereby improving the signal-to-noise ratio of the target-direction sound.
BRIEF DESCRIPTION OF THE DRAWINGS
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the embodiments of the present application. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
FIG. 1 is a flowchart of a beamforming method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a microphone array according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another microphone array according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for calculating spatial filtering parameters according to an embodiment of the present invention;
FIG. 5 is a flowchart of a multi-beam beamforming method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the final speech output for a target sound source direction according to an embodiment of the present invention;
FIG. 7 is a flowchart of another multi-beam beamforming method according to an embodiment of the present invention;
FIG. 8 is a flowchart of yet another multi-beam beamforming method according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a beamforming apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of another beamforming apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a multi-beam beamforming apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of yet another multi-beam beamforming apparatus according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of yet another multi-beam beamforming apparatus according to an embodiment of the present invention;
FIG. 15 is a structural block diagram of an electronic device according to an embodiment of the present invention.
DETAILED DESCRIPTION
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the disclosure will be understood thoroughly and its scope fully conveyed to those skilled in the art.
FIG. 1 is a flowchart of a beamforming method according to an embodiment of the present invention. As shown in FIG. 1, the sound source beamforming method of this embodiment includes the following steps:
Step S110: acquire spatial filtering parameters, which vary with angle and subband frequency.
In this embodiment, the spatial filtering parameters enhance beamforming toward a fixed spatial direction (the sound source direction), so that sound from that direction remains substantially unchanged while sound from other directions is suppressed to a certain extent.
The spatial filtering parameters in this embodiment are frequency-domain filter parameters whose purpose is to apply a gain or suppression to each subband frequency of every frame. In an optional implementation, they form a matrix computed in advance by a computer device; the result is stored in the electronic device that executes the method of this embodiment, which uses it directly, reducing the time spent on beamforming.
For ease of description, the following embodiments take a beam pointing at 90° directly ahead as an example, that is, the sound source direction is 90° directly ahead. This is not meant to restrict the beam to 90°: in practice, the sound source direction may be any plane-wave angle from 0° to 180°, such as 30°, 60°, or 120°.
Step S120: determine the sound source direction corresponding to the spatial filtering parameters.
Step S130: acquire the original frequency-domain signal corresponding to the sound source direction.
Sound reaches the microphone array from different directions, so different microphones receive the signal with different delays. These delays can be used to locate the beam-focus direction and to determine the sound source direction that matches the spatial filtering parameters (for example, 90° directly ahead).
The microphone array consists of a number of acoustic sensors (generally microphones) that sample the spatial characteristics of the sound field. In practice, the array may consist of 4 microphones evenly spaced in a line (as shown in FIG. 2), 6 evenly spaced in a line, 8 evenly spaced in a circle (as shown in FIG. 3), or 12 or 14 evenly spaced in a circle, rectangle, crescent, and so on; the embodiments do not limit the number or arrangement of the microphones. For convenience, the following description uses the four-microphone linear array 2 shown in FIG. 2 as an example, but this is not a specific limitation on the microphone array.
In practice, given the properties of sound waves, the spacing between microphones should be neither too large nor too small; an unsuitable spacing causes errors in focusing on and locating the sound source. In general, the uniform spacing between microphones can be set to more than 30 mm and less than 80 mm.
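One standard way to see why the spacing must not be too large is the half-wavelength rule for a uniform linear array: spatial aliasing is avoided only for frequencies up to c / (2d). The sketch below evaluates that limit at both ends of the 30 to 80 mm range, assuming c = 343 m/s (a room-temperature value that the text does not specify).

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def max_unaliased_frequency(spacing_m, c=SPEED_OF_SOUND):
    """Highest frequency for which a uniform linear array with element spacing
    `spacing_m` satisfies the half-wavelength rule d <= lambda / 2."""
    return c / (2.0 * spacing_m)

for d_mm in (30, 80):
    print(f"{d_mm} mm -> {max_unaliased_frequency(d_mm / 1000):.0f} Hz")
```

This prints roughly 5717 Hz for 30 mm and 2144 Hz for 80 mm, so larger spacings lower the alias-free bandwidth; at the other extreme, very small spacings make the inter-microphone delays tiny, which makes low frequencies hard to separate by direction.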
In this embodiment, when the beam-focus direction is located by means of the delays, the delay with which the sound reaches each microphone can be calculated from the physical layout of the array, for example (but not limited to) as follows. Let d be the microphone spacing, c the speed of sound, and Ω the angle of the sound source direction (that is, the direction in which sound is to be picked up and focused, such as 90° directly ahead). In the microphone array, take as reference the microphone that the sound reaches first (Mic1 in FIG. 2). The delays are then τ_0 = 0 for Mic1, τ_1 = d·sin(Ω)/c for Mic2, τ_2 = 2·d·sin(Ω)/c for Mic3, and τ_3 = 3·d·sin(Ω)/c for Mic4; in general, τ_n = n·d·sin(Ω)/c. Taking Ω = 90° as an example, Mic1 serves as the reference microphone with zero delay, and τ_1 is the delay from the sound field to Mic2. This calculation applies to uniformly spaced linear arrays; other layouts and non-uniform spacing may require a different calculation.
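The delay rule for a uniformly spaced linear array, with microphone 1 as the zero-delay reference as stated above, can be sketched as follows. The speed of sound (343 m/s) and the 50 mm spacing in the example are assumed values, and the function name is hypothetical.

```python
import math

def linear_array_delays(num_mics, spacing_m, angle_deg, c=343.0):
    """Relative delays tau_n = n * d * sin(Omega) / c for a uniform linear
    array, with microphone 1 (n = 0) as the zero-delay reference."""
    omega = math.radians(angle_deg)
    return [n * spacing_m * math.sin(omega) / c for n in range(num_mics)]

# Four microphones, 50 mm spacing, look direction 90 degrees.
taus = linear_array_delays(num_mics=4, spacing_m=0.05, angle_deg=90.0)
```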
A signal vector function is constructed from the delays of the microphones in the array, and the sound source direction is calculated from the signal vector function and the delays. Constructing the signal vector function requires determining the matrix corresponding to all subband frequencies. The signal vector function is:
[Equation image PCTCN2019087621-appb-000001: signal vector function $g(\omega,\Omega)$]
where Ω is the pickup and focus direction angle, j denotes the phase at a given moment, and ω = 2πf, with f the matrix of all subband frequencies; $\tau_0$ is the delay from the sound source to the first microphone, N is the number of microphones, and $\tau_{N-1}$ is the delay from the sound source to the N-th microphone. The sound source direction can therefore be calculated from the signal vector function and the delay of each microphone. Optionally, the matrix of subband frequencies for the sound source is determined first, and the target sound source direction is then calculated from that matrix, the signal vector function, and the delays.
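Because the signal vector function itself is only available as an image, the sketch below uses the common far-field steering-vector form, with entry $e^{-j\omega\tau_n}$ for microphone n in each subband. The sign convention of the exponent, the STFT settings (16 kHz, 512-point FFT), the 50 mm spacing, and c = 343 m/s are all assumptions, not values taken from this application.

```python
import numpy as np

def steering_vectors(freqs_hz, delays_s):
    """Signal vector g(omega, Omega) for every subband: entry exp(-j * omega * tau_n)
    with omega = 2 * pi * f. Returns an array of shape (num_bins, num_mics)."""
    omega = 2.0 * np.pi * np.asarray(freqs_hz)[:, None]   # (bins, 1)
    tau = np.asarray(delays_s)[None, :]                    # (1, mics)
    return np.exp(-1j * omega * tau)

fs, n_fft = 16000, 512                                # hypothetical STFT settings
freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)            # subband centre frequencies
delays = [n * 0.05 / 343.0 for n in range(4)]         # 4 mics, 50 mm spacing, Omega = 90 deg
g = steering_vectors(freqs, delays)
```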
In practice, to make the sound easier to process in later stages, the time-domain sound signal, which is hard to process directly, is first converted by a Fourier transform into a frequency-domain signal that is easy to analyze. The Fourier transform rests on the principle that any continuously measured time series or signal can be expressed as a superposition of sinusoids of different frequencies; the algorithm built on this principle takes the directly measured raw signal and computes, by accumulation, the frequency, amplitude, and phase of each sinusoidal component. The specific implementation of the Fourier transform is not described further here.
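For framed audio this conversion is usually a short-time Fourier transform applied to each microphone channel. The sketch below assumes a Hann window, a 512-point FFT, and a hop of 256 samples, none of which are specified in the text, and arranges the result as frames × subbands × microphones so that each subband holds one vector per frame.

```python
import numpy as np

def stft_multichannel(x, n_fft=512, hop=256):
    """Convert a (num_mics, num_samples) time-domain recording into a
    (num_frames, num_bins, num_mics) complex spectrum using a Hann window."""
    num_mics, num_samples = x.shape
    if num_samples < n_fft:
        raise ValueError("recording is shorter than one FFT frame")
    window = np.hanning(n_fft)
    num_frames = 1 + (num_samples - n_fft) // hop
    frames = np.stack([x[:, i * hop:i * hop + n_fft] * window
                       for i in range(num_frames)])   # (frames, mics, n_fft)
    spec = np.fft.rfft(frames, axis=-1)              # (frames, mics, bins)
    return np.transpose(spec, (0, 2, 1))             # (frames, bins, mics)

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)    # 1 kHz test tone, one second long
x = np.tile(tone, (4, 1))                # same tone on 4 hypothetical channels
X = stft_multichannel(x)
```

With these settings each subband is 31.25 Hz wide, so the 1 kHz tone peaks in bin 32.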
Note that steps S110 and S120 need not be executed in a fixed order: in practice, either step may be performed first, or the two may be performed simultaneously. The embodiments do not limit this.
Step S140: calculate the product of the acquired spatial filtering parameters and the original frequency-domain signal of the sound source to obtain the beamforming output for that sound source direction. This product performs beamforming in a way that suppresses the original frequency-domain signals of non-target sound sources, that is, all signals other than the one from the sound source direction.
Both the spatial filtering parameters and the original frequency-domain signal are matrices. Multiplying them beamforms in a way that suppresses the signals of non-target sound sources, so that sound from the fixed direction is not distorted while sound from other directions is suppressed.
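Per subband, this product is a weighted sum over microphones. The sketch below assumes the parameters are stored as one row of weights per subband and applied without conjugation, so that the distortionless condition reads W·g = 1, matching the second constraint stated later in this description. Delay-and-sum weights, conj(g)/N, stand in for the actual parameters to show that a plane wave from the look direction passes through unchanged. The function name and the random stand-in steering vectors are hypothetical.

```python
import numpy as np

def apply_spatial_filter(W, X):
    """Beamformer output Y[t, k] = sum_m W[k, m] * X[t, k, m].
    W: (num_bins, num_mics) spatial filtering parameters, one row per subband.
    X: (num_frames, num_bins, num_mics) multichannel spectrum."""
    return np.einsum("km,tkm->tk", W, X)

rng = np.random.default_rng(0)
num_frames, num_bins, num_mics = 5, 257, 4
g = np.exp(-1j * rng.uniform(0, 2 * np.pi, (num_bins, num_mics)))  # stand-in steering vectors
W = np.conj(g) / num_mics          # delay-and-sum weights satisfy W . g = 1 in every subband
s = rng.standard_normal((num_frames, num_bins)) + 1j * rng.standard_normal((num_frames, num_bins))
X = s[:, :, None] * g[None, :, :]  # a plane wave arriving from the look direction
Y = apply_spatial_filter(W, X)     # equals s: the look direction is not distorted
```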
In the beamforming method provided by this embodiment, the electronic device acquires spatial filtering parameters that vary with angle and subband frequency, determines the sound source direction corresponding to the parameters, acquires the original frequency-domain signal for that direction, and calculates the product of the parameters and the signal. This product beamforms in a way that suppresses frequency-domain signals other than the one from the sound source direction. Compared with the prior art, the invention saves beamforming time because the spatial filtering parameters are preset in advance, and it keeps sound signals from the fixed direction undistorted.
In this embodiment, a computer device pre-computes the spatial filtering parameters for any plane-wave angle from 0° to 180°, so that the corresponding parameters can be retrieved when beamforming a sound source.
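Such a precomputed store can be sketched as a table keyed by look angle, from which the entry nearest to the requested angle is retrieved. In this sketch the 1° grid, the array geometry, and c = 343 m/s are assumptions, and simple delay-and-sum weights stand in for the optimized parameters described in steps S1 to S3; the function names are hypothetical.

```python
import numpy as np

def build_filter_table(angles_deg, freqs_hz, num_mics, spacing_m, c=343.0):
    """Precompute one (num_bins, num_mics) filter per look angle.
    Delay-and-sum weights stand in for the optimised parameters."""
    table = {}
    omega = 2.0 * np.pi * np.asarray(freqs_hz)[:, None]
    n = np.arange(num_mics)[None, :]
    for angle in angles_deg:
        tau = n * spacing_m * np.sin(np.radians(angle)) / c
        g = np.exp(-1j * omega * tau)          # steering vectors for this angle
        table[angle] = np.conj(g) / num_mics
    return table

def lookup_filter(table, angle_deg):
    """Return the stored filter whose angle is closest to the requested one."""
    nearest = min(table, key=lambda a: abs(a - angle_deg))
    return table[nearest]

freqs = np.fft.rfftfreq(512, d=1.0 / 16000)
table = build_filter_table(range(0, 181), freqs, num_mics=4, spacing_m=0.05)
W90 = lookup_filter(table, 90.3)   # retrieves the 90-degree entry
```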
FIG. 4 is a flowchart of a method for calculating spatial filtering parameters according to an embodiment of the present invention. In an optional implementation, as shown in FIG. 4, calculating the spatial filtering parameters includes the following steps:
Step S1: calculate the delay with which the sound source reaches the microphone array. Sound reaches the array from different directions, so different microphones receive the signal with different delays. These delays can be used to locate the beam-focus direction and to determine the sound source direction that matches the spatial filtering parameters (for example, 90° directly ahead).
In this embodiment, the delay can be calculated by the following steps, for example (but not limited to): determine the microphone spacing d, the speed of sound c, and the sound source angle Ω (that is, the direction in which sound is to be picked up and focused, such as 90° directly ahead), and calculate the delays from d, c, and Ω. For the specific method, refer to step S120; details are not repeated here.
Step S2: construct the signal vector function from the delays of the microphones in the array, and calculate the sound source direction from the signal vector function and the delays. Constructing the signal vector function requires determining the matrix corresponding to all subband frequencies. The signal vector function is:
[Equation image PCTCN2019087621-appb-000002: signal vector function $g(\omega,\Omega)$]
where Ω is the pickup and focus direction angle, j denotes the phase at a given moment, and ω = 2πf, with f the matrix of all subband frequencies; $\tau_0$ is the delay from the sound source to the first microphone, N is the number of microphones, and $\tau_{N-1}$ is the delay from the sound source to the N-th microphone. The sound source direction can therefore be calculated from the signal vector function and the delay of each microphone. Optionally, the matrix of subband frequencies for the sound source is determined first, and the target sound source direction is then calculated from that matrix, the signal vector function, and the delays. For details, refer to step S120; they are not repeated here.
Step S3: under a preset first constraint and a preset second constraint, calculate the spatial filtering parameters at which the loss function approaches its minimum. The loss function is constructed from the spatial filtering parameters and the signal vector function.
In an optional implementation, the preset first constraint is a white-noise gain constraint:
[Equation image PCTCN2019087621-appb-000003: white-noise gain constraint on $W_f(\omega)$, with lower bound $\gamma$]
Here $W_f(\omega)$ is the spatial filtering parameter, T denotes the transpose, H denotes the conjugate transpose, ω = 2πf with f the matrix of all subband frequencies, Ω is the pickup and focus direction angle, and $g(\omega,\Omega)$ is the signal vector function. γ is the white-noise gain limit; optionally, the limit is gamma_db = −20 dB and γ = exp(gamma_db/10). The embodiments do not limit the specific value of γ.
In an optional implementation, the preset second constraint requires the product of the spatial filtering parameter and the signal vector function to equal a first preset value, preferably 1. That is, the second constraint is $W_f(\omega)\,g(\omega,\Omega)=1$. Both the spatial filtering parameters and the signal vector function are matrices, and in general the signal vector function matrix hardly changes.
The embodiments constrain the spatial conditions of beamforming: in a specific implementation, the first and second constraints must be satisfied simultaneously. Optionally, a third constraint may also be imposed, namely ensuring the convexity of the loss function:
[Equation image PCTCN2019087621-appb-000004: third constraint, expressed in terms of $R_{nn}$ and $g(\omega,\Omega)$]
where $R_{nn}$ is the noise covariance matrix, $g(\omega,\Omega)$ is the signal vector function, and H denotes the conjugate transpose.
The loss function constructed from the spatial filtering parameters and the signal vector function is:
[Equation image PCTCN2019087621-appb-000005: loss function b_hat]
The loss function b_hat yields the response at each angle Ω:
[Equation image PCTCN2019087621-appb-000006: response at each angle Ω]
Under the first and second constraints, the spatial filtering parameters at which the loss function approaches its minimum are calculated as follows:
[Equation image PCTCN2019087621-appb-000007: minimization of the loss function over $W_f(\omega)$ under the first and second constraints]
To calculate the spatial filtering parameters at which the loss function approaches its minimum, a system of equations is also set up from the first, second, and third constraints and solved mathematically for the spatial filtering parameters. The solution algorithm is not described further here.
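The loss function and the constraint equations are only available as images, so the sketch below substitutes a standard formulation with the same ingredients: minimize output noise power $w^H R_{nn} w$ subject to the distortionless condition, and raise diagonal loading until the white-noise gain $1/(w^H w)$ reaches a floor γ. This is the usual robust superdirective (loaded MVDR) design, named here as a swapped-in technique and not necessarily the solver used in this application. The diffuse-noise model for $R_{nn}$, the geometry, c = 343 m/s, and γ = 0.1 (linear scale) are assumptions. The returned W = conj(w) satisfies W·g = 1 in the non-conjugated notation of the second constraint.

```python
import numpy as np

def diffuse_coherence(freq_hz, positions_m, c=343.0):
    """Spherically isotropic (diffuse) noise coherence sin(kd)/(kd), used as R_nn."""
    dist = np.abs(positions_m[:, None] - positions_m[None, :])
    return np.sinc(2.0 * freq_hz * dist / c)   # np.sinc(x) = sin(pi x) / (pi x)

def constrained_filter(R, g, gamma, max_iter=30):
    """Minimise w^H R w subject to w^H g = 1, increasing the diagonal loading mu
    until the white noise gain 1 / (w^H w) reaches gamma.
    Returns (W, wng) with W = conj(w), so that W @ g = 1."""
    n = len(g)
    mu = 1e-8 * np.real(np.trace(R)) / n
    for _ in range(max_iter):
        Ri_g = np.linalg.solve(R + mu * np.eye(n), g)
        w = Ri_g / (np.conj(g) @ Ri_g)           # distortionless: w^H g = 1
        wng = 1.0 / np.real(np.conj(w) @ w)      # white noise gain
        if wng >= gamma:
            break
        mu *= 10.0                               # more loading -> closer to delay-and-sum
    return np.conj(w), wng

positions = 0.05 * np.arange(4)                  # 4-mic linear array, 50 mm spacing
freqs = np.fft.rfftfreq(512, d=1.0 / 16000)
gamma = 0.1                                      # hypothetical linear WNG floor
look = np.radians(90.0)
W = np.empty((len(freqs), 4), dtype=complex)
for k, f in enumerate(freqs):
    tau = positions * np.sin(look) / 343.0
    g = np.exp(-1j * 2 * np.pi * f * tau)
    W[k], _ = constrained_filter(diffuse_coherence(f, positions), g, gamma)
```

Diagonal loading trades noise suppression for robustness: with no loading the design approaches the superdirective solution, and as the loading grows it approaches delay-and-sum, whose white-noise gain equals the number of microphones.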
FIG. 5 is a flowchart of a multi-beam beamforming method according to an embodiment of the present invention. As shown in FIG. 5, the multi-beam beamforming method of this embodiment includes the following steps:
Step S210: calculate the beamforming output for the target sound source direction.
In this embodiment, the beamforming draws on at least two sound source directions, forming multi-beam beamforming. In practice, a sound source direction may be any plane-wave angle from 0° to 180°. The at least two sound source directions comprise one target sound source direction and at least one other sound source direction. For ease of description, the following embodiments use beams pointing at 0°, 30°, 60°, 90°, 120°, 150°, and 180° (seven directions in total), with the target sound source at 90°. This is not a restriction to these angles: beams may also point at, for example, 53° or 80°, and the target sound source may be at 60°, among others.
The product of the original frequency-domain signal for each sound source direction and the spatial filtering parameters is calculated separately, giving one single-beam output per direction; each result is again a matrix, in the form of a spectrum. Computing these products requires determining each sound source direction with a microphone array. The microphone array consists of a number of acoustic sensors (generally microphones) that sample the spatial characteristics of the sound field. In practice, the array may consist of 4 microphones evenly spaced in a line (as shown in FIG. 2), 6 evenly spaced in a line, 8 evenly spaced in a circle (as shown in FIG. 3), or 12 or 14 evenly spaced in a circle, rectangle, crescent, and so on; the embodiments do not limit the number or arrangement of the microphones. For convenience, the following description uses the four-microphone linear array 3 shown in FIG. 3 as an example, but this is not a specific limitation on the microphone array.
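The per-direction products can be sketched for the seven beam directions listed above. In this sketch, delay-and-sum weights built with the delay formula τ_n = n·d·sin(Ω)/c stand in for the stored spatial filtering parameters, and the geometry (4 microphones, 50 mm spacing), the STFT size, and the synthetic source at 90° are assumptions. The beam aimed at the source passes it unchanged and collects the most energy.

```python
import numpy as np

C = 343.0
ANGLES = [0, 30, 60, 90, 120, 150, 180]   # beam directions used in the text

def das_weights(angle_deg, freqs_hz, num_mics, spacing_m):
    """Stand-in spatial filter for one direction: delay-and-sum weights, shape (bins, mics)."""
    tau = np.arange(num_mics) * spacing_m * np.sin(np.radians(angle_deg)) / C
    g = np.exp(-1j * 2 * np.pi * np.asarray(freqs_hz)[:, None] * tau[None, :])
    return np.conj(g) / num_mics

def multi_beam(X, freqs_hz, spacing_m, angles=ANGLES):
    """One single-beam output per direction, shape (num_angles, num_frames, num_bins)."""
    num_mics = X.shape[-1]
    return np.stack([np.einsum("km,tkm->tk", das_weights(a, freqs_hz, num_mics, spacing_m), X)
                     for a in angles])

# Synthetic spectrum: a white-spectrum source arriving from 90 degrees.
rng = np.random.default_rng(1)
freqs = np.fft.rfftfreq(512, d=1.0 / 16000)
s = rng.standard_normal((8, len(freqs))) + 1j * rng.standard_normal((8, len(freqs)))
g90 = np.conj(das_weights(90, freqs, 4, 0.05)) * 4    # steering vectors for 90 degrees
X = s[:, :, None] * g90[None, :, :]
beams = multi_beam(X, freqs, spacing_m=0.05)
energy = (np.abs(beams) ** 2).sum(axis=(1, 2))        # total energy per beam
```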
In practice, given the properties of sound waves, the spacing between microphones should be neither too large nor too small; an unsuitable spacing causes errors in focusing on and locating the sound source. In general, the uniform spacing between microphones can be set to more than 30 mm and less than 80 mm.
As another implementation of this embodiment, the beamforming output for the target sound source direction may also be calculated with an algorithm for single-direction beamforming such as GSC (Generalized Sidelobe Cancellation); the embodiments do not limit the single-direction beamforming algorithm.
Step S220: calculate a noise parameter from a blocking matrix, where the blocking matrix characterizes the frequency response of the sound signal. The noise parameter is used to reduce noise in the sound from non-target sound source directions. For example, if the beams point at 0°, 30°, 60°, 90°, 120°, 150°, and 180° (seven directions in total) and the target sound source is at 90°, the noise parameter is used to reduce noise from the 0°, 30°, 60°, 120°, 150°, and 180° directions.
Step S230: using the noise parameter, reduce noise in the signals from non-target sound source directions, that is, everything other than the beamforming output for the target sound source direction.
In a specific implementation, the signals from the non-target directions obtained in step S220 are filtered out of the target-direction beamforming output calculated in step S210; that is, the noise parameter is used to suppress the non-target-direction signals. This keeps the target-direction sound undistorted while reducing interference from other directions.
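A classic way to combine a fixed beam, a blocking matrix, and adaptive noise removal is the generalized sidelobe canceller mentioned above. The sketch below processes one subband. The text does not define its "noise parameter", so here the adaptive weights `a`, updated by normalized LMS, play that role as an assumption. The blocking matrix is built by SVD so that its columns are orthogonal to the target steering vector. The step size, geometry, and test signals are hypothetical.

```python
import numpy as np

def blocking_matrix(g):
    """Columns span the subspace orthogonal to steering vector g, so B^H g = 0."""
    _, _, vh = np.linalg.svd(np.conj(g)[None, :])
    return np.conj(vh[1:]).T                         # (num_mics, num_mics - 1)

def gsc_bin(x_frames, g, step=0.1, eps=1e-9):
    """Generalised sidelobe canceller for one subband.
    x_frames: (num_frames, num_mics) complex STFT values of this bin."""
    n = len(g)
    w_q = g / n                                      # fixed delay-and-sum beam, w_q^H g = 1
    B = blocking_matrix(g)
    a = np.zeros(n - 1, dtype=complex)               # adaptive noise-cancelling weights
    out = np.empty(len(x_frames), dtype=complex)
    for t, x in enumerate(x_frames):
        d = np.conj(w_q) @ x                         # fixed-beam output
        u = np.conj(B).T @ x                         # noise references, target blocked
        y = d - np.conj(a) @ u                       # subtract estimated noise
        a += step * u * np.conj(y) / (np.real(np.conj(u) @ u) + eps)  # NLMS update
        out[t] = y
    return out

rng = np.random.default_rng(2)
f, d_m, c = 1000.0, 0.05, 343.0
n_idx = np.arange(4)
g_target = np.exp(-1j * 2 * np.pi * f * n_idx * d_m * np.sin(np.radians(90)) / c)
g_interf = np.exp(-1j * 2 * np.pi * f * n_idx * d_m * np.sin(np.radians(0)) / c)
v = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)   # interferer from 0 degrees
x_noise = v[:, None] * g_interf[None, :]
residual = gsc_bin(x_noise, g_target)
fixed = np.conj(g_target / 4) @ x_noise.T            # fixed beam alone, for comparison
```

Because the blocking matrix removes the target direction from the noise references, a signal arriving from the target direction passes through the fixed beam unchanged, while the adaptive branch learns to cancel what leaks through from other directions.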
In the multi-beam beamforming method provided by this embodiment, the beamforming output for the target sound source direction is calculated, a noise parameter is calculated from a blocking matrix, and the noise parameter is used to reduce noise in signals from sound source directions other than the target direction. Compared with the prior art, this embodiment keeps the sound from the target direction undistorted while reducing noise from other directions, effectively suppressing interference from those directions.
As a further extension and refinement of the above embodiment, the implementation of each step is described below in turn.
When step S210 calculates the beamforming output for the target sound source direction, methods such as the following may be used, without limitation: acquire the spatial filtering parameters, determine the target sound source direction corresponding to them, and acquire the original frequency-domain signal for that direction; then calculate the product of the spatial filtering parameters and that signal to obtain the beamforming for the target sound source direction.
The spatial filtering parameters in this embodiment are filter parameters in the frequency domain; their purpose is to apply a corresponding gain to each subband frequency of every signal frame. In practice, the spatial filtering parameters form a matrix that is calculated in advance by a computer device and then stored in the electronic device that executes the method of this embodiment, so that the electronic device can use them directly, which shortens the time required for beamforming.
Acquire the spatial filtering parameter $W_f(\omega)$, determine the target sound-source direction corresponding to $W_f(\omega)$, and obtain the original frequency-domain signal corresponding to that direction; then calculate the products of $W_f(\omega)$ with the original frequency-domain signals corresponding to the different sound-source directions.
In this embodiment, the target sound-source direction corresponding to $W_f(\omega)$ is determined by steering the beam using time delays. One possible method, among others, is to calculate the delay with which sound from the source reaches each microphone from the physical layout of the array. Let the microphone spacing be $d$, the speed of sound $c$, and the look angle $\Omega$ (the direction in which sound is to be picked up and focused, e.g. 90° directly in front). In the microphone array, choose the microphone the wavefront reaches first as the reference (e.g. Mic1 in FIG. 2). The reference microphone Mic1 has delay $\tau_0 = 0$; the second microphone Mic2 has delay $\tau_1 = d\sin(\Omega)/c$; the third microphone Mic3 has delay $\tau_2 = 2d\sin(\Omega)/c$; and the fourth microphone Mic4 has delay $\tau_3 = 3d\sin(\Omega)/c$. Here $\tau_1$ is the delay of the sound field at Mic2 relative to Mic1. This delay calculation applies to uniform linear arrays; other array geometries and non-uniform spacings may require different calculations.
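The delay rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the function name `ula_delays`, the speed of sound of 343 m/s, and the 50 mm spacing in the usage line are hypothetical choices. The sketch follows the text's convention that Mic1 is the reference and microphone m lags it by m·d·sin(Ω)/c.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def ula_delays(num_mics, spacing_m, angle_deg, c=SPEED_OF_SOUND):
    """Relative arrival delays (seconds) for a uniform linear array.

    Mic1 (index 0) is the reference with zero delay; microphone m lags it
    by m * d * sin(Omega) / c, following the convention in the text.
    """
    s = math.sin(math.radians(angle_deg))
    return [m * spacing_m * s / c for m in range(num_mics)]


# Four microphones 50 mm apart, look direction 90 degrees
delays = ula_delays(4, 0.05, 90)
```

With the sin(Ω) convention, 0° and 180° both give zero delays, so a linear array alone cannot tell them apart; an implementation has to fix its angle reference accordingly.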
A signal vector function is constructed from the delays of the microphones in the array, and the sound-source direction is calculated from the signal vector function and the delays. Constructing the signal vector function requires the matrix of all subband frequencies. The signal vector function is:
[Formula image PCTCN2019087621-appb-000008: signal vector function]
Here $\Omega$ is the pickup and focusing direction angle, $j$ is the phase at a given instant, $\omega = 2\pi f$ where $f$ is the matrix of all subband frequencies, $\tau_0$ is the delay from the sound source to the first microphone, $N$ is the number of microphones, and $\tau_{N-1}$ is the delay from the sound source to the $N$-th microphone. The sound-source direction can therefore be calculated from the signal vector function and the delay of each microphone. Optionally, the matrix of subband frequencies for the sound source is determined first, and the target sound-source direction is then calculated from that matrix, the signal vector function, and the delays.
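The formula image is not reproduced in this text. Assuming the usual far-field steering-vector form that matches the symbols defined above (an assumption, not a transcription of the figure), the signal vector function would read as follows, with $j$ taken as the imaginary unit and each delay $\tau_m$ depending on $\Omega$:

```latex
\mathbf{d}(\omega,\Omega) =
\begin{bmatrix}
e^{-j\omega\tau_0} & e^{-j\omega\tau_1} & \cdots & e^{-j\omega\tau_{N-1}}
\end{bmatrix}^{T},
\qquad \omega = 2\pi f
```

Every entry has unit magnitude, so in this form the vector encodes only the phase shift that a plane wave from direction $\Omega$ accumulates at each microphone.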
In practice, to make the sound easier to process downstream, the sound signal is first transformed with the Fourier transform, converting the time-domain signal, which is hard to process directly, into a frequency-domain signal that is easy to analyze. The Fourier transform rests on the principle that any continuously measured time series or signal can be expressed as an infinite superposition of sinusoids of different frequencies; the Fourier transform algorithm based on this principle uses the directly measured original signal to compute, by summation, the frequency, amplitude, and phase of each sinusoidal component. The specific implementation of the Fourier transform is not described further in this embodiment.
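A minimal sketch of the time-to-frequency step, assuming a plain DFT of one real-valued frame with no window; practical systems typically use a windowed FFT (short-time Fourier transform). The function name `frame_to_subbands` is hypothetical. It returns the non-negative-frequency bins, which play the role of the subbands in the text; an n-point frame yields n/2 + 1 such bins, so the exact subband count depends on the chosen frame length.

```python
import cmath
import math


def frame_to_subbands(frame):
    """Naive DFT of one real-valued time-domain frame (no window).

    Returns the n // 2 + 1 non-negative-frequency bins X[k] =
    sum_t x[t] * exp(-j*2*pi*k*t/n). O(n^2); an FFT gives the same bins.
    """
    n = len(frame)
    bins = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        bins.append(acc)
    return bins


# A cosine completing 2 cycles in an 8-sample frame lands in bin 2
frame = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
bins = frame_to_subbands(frame)
```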
Further, the spatial filtering parameter $W_f(\omega)$ and the original frequency-domain signal $Z(t, e^{j\omega})$ are both matrices, and multiplying them gives $Y(\omega,\Omega) = W_f(\omega)\,Z(t, e^{j\omega})$. The product $Y(\omega,\Omega)$ performs beamforming by suppressing all frequency-domain signals other than the original frequency-domain signal from the target sound-source direction, so that the sound signal from the fixed direction is not distorted.
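The text treats $W_f(\omega)$ as precomputed and does not say how it is designed. As one illustrative assumption, the sketch below uses delay-and-sum weights $w = \mathbf{d}/N$ built from a steering vector with entries $e^{-j\omega\tau_m}$, and computes the output for one subband as $w^H z$. A signal arriving exactly from the look direction then passes with unit gain, which is the "not distorted" property described above. All function names are hypothetical.

```python
import cmath
import math


def steering_vector(freq_hz, delays_s):
    """Entries exp(-j * 2*pi*f * tau_m), one per microphone."""
    return [cmath.exp(-2j * math.pi * freq_hz * tau) for tau in delays_s]


def delay_and_sum_weights(freq_hz, delays_s):
    """Delay-and-sum weights w = d / N (unit gain toward the look direction)."""
    d = steering_vector(freq_hz, delays_s)
    return [x / len(d) for x in d]


def beamform_bin(weights, mic_bins):
    """Beamformer output for one subband: y = w^H z."""
    return sum(w.conjugate() * z for w, z in zip(weights, mic_bins))


# A source exactly in the look direction passes with unit gain
delays = [0.0, 1e-4, 2e-4, 3e-4]
s = 2 + 1j
z = [s * x for x in steering_vector(1000.0, delays)]
y = beamform_bin(delay_and_sum_weights(1000.0, delays), z)
```

A source from another direction adds up with mismatched phases across microphones, so its output magnitude is smaller than its input magnitude.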
When calculating the noise parameter through the blocking matrix in step S220, the following approach may be used, among others: calculate the frequency response of the sound signal arriving at each microphone in turn, construct the blocking matrix from these frequency responses, and calculate the noise parameter from the blocking matrix and the original frequency-domain signals corresponding to the non-target sound-source directions. The purpose of the noise parameter is to reduce noise in the sound from non-target directions.
In one optional implementation, first calculate the frequency response of the sound signal at the first microphone, $A_1(e^{j\omega})$, at the second microphone, $A_2(e^{j\omega})$, ..., and at the $M$-th microphone, $A_M(e^{j\omega})$, where $A$ denotes the microphone frequency-response function.
Construct the blocking matrix from these frequency responses:
[Formula image PCTCN2019087621-appb-000009: blocking matrix $H(e^{j\omega})$]
After the blocking matrix $H(e^{j\omega})$ is constructed, the noise parameter is calculated from $H(e^{j\omega})$ and the original frequency-domain signal $Z(t, e^{j\omega})$ corresponding to the non-target sound-source directions:
$$U(t, e^{j\omega}) = H(e^{j\omega})\,Z(t, e^{j\omega})$$
where $t$ denotes the input time of each signal frame.
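The blocking-matrix image is not reproduced here, so the following is an assumption rather than the patent's matrix: a common Griffiths-Jim-style construction that phase-aligns the channels toward the target (using the steering vector in place of the responses $A_m(e^{j\omega})$) and takes differences of adjacent aligned channels. Each of the $N-1$ rows then cancels the target component exactly, so $U = HZ$ carries only non-target energy. Function names are hypothetical.

```python
import cmath
import math


def blocking_matrix(steering):
    """(N-1) x N matrix whose rows cancel a source from the look direction.

    Row m computes conj(d_m) * z_m - conj(d_{m+1}) * z_{m+1}. For a target
    component z = s * d with unit-modulus entries, every row gives s - s = 0.
    """
    n = len(steering)
    rows = []
    for m in range(n - 1):
        row = [0j] * n
        row[m] = steering[m].conjugate()
        row[m + 1] = -steering[m + 1].conjugate()
        rows.append(row)
    return rows


def noise_reference(h, z):
    """Noise parameter U = H Z for one subband."""
    return [sum(hij * zj for hij, zj in zip(row, z)) for row in h]


# Steering vector for a 4-mic array at 1 kHz with 0.1 ms inter-mic delay
d = [cmath.exp(-2j * math.pi * 1000.0 * m * 1e-4) for m in range(4)]
H = blocking_matrix(d)
```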
In the specific implementation, the signals from the non-target sound-source directions obtained in step S220 are filtered out of the beamforming output for the target direction calculated in step S210; that is, the noise parameter $U(t, e^{j\omega})$ is used to denoise the signals from non-target directions. This keeps the sound from the target direction undistorted while reducing interference from non-target directions.
In practice, a propagating sound signal also contains relatively stationary, weak noise, for example from fans and air conditioners. To reduce such noise, when step S230 applies noise reduction, according to the noise parameter $U(t, e^{j\omega})$, to the signals from sound-source directions other than the target beamforming output, the following method may be used, among others: calculate multi-channel optimal filtering parameters with a multi-channel filtering algorithm and an iterative algorithm; then, based on the target beamforming output, the optimal filtering parameters, and the noise parameter, denoise the signals from sound-source directions other than the target.
This embodiment uses multi-channel Wiener filtering as the example multi-channel filtering algorithm. To minimize the impact on the energy of the target-direction output, the optimal filtering parameter $G(t, e^{j\omega})$ is calculated with multi-channel Wiener filtering and NLMS (Normalized Least Mean Square) adaptive iteration, further filtering out stationary background noise. $G(t, e^{j\omega})$ is the value that minimizes

$$E\left\{\left\|Y(t, e^{j\omega}) - G(t, e^{j\omega})\,U(t, e^{j\omega})\right\|^2\right\}.$$
After the optimal filtering parameter $G(t, e^{j\omega})$ and the noise parameter $U(t, e^{j\omega})$ are calculated, the final speech output for the target sound-source direction is:
$$Y = Y(\omega,\Omega) - G(t, e^{j\omega})\,U(t, e^{j\omega})$$
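A per-subband sketch of the adaptive noise-cancellation step, assuming a standard complex NLMS update (step size `mu`, regularizer `eps`) and the conjugate convention $Y_{NC} = G^H U$; the text does not specify these details, and `nlms_step` is a hypothetical name. Each call returns the current output $Y - G^H U$, i.e. the final speech output above, together with the updated weights.

```python
def nlms_step(g, u, y_fbf, mu=0.5, eps=1e-8):
    """One NLMS update of the noise-cancelling filter for one subband.

    g: current weights G (complex list), u: noise references U(t),
    y_fbf: fixed-beamformer output Y(t). Returns (output, new_g), where
    output = Y - G^H U is the enhanced target signal for this frame.
    """
    y_nc = sum(gi.conjugate() * ui for gi, ui in zip(g, u))
    out = y_fbf - y_nc
    norm = eps + sum(abs(ui) ** 2 for ui in u)
    new_g = [gi + mu * ui * out.conjugate() / norm for gi, ui in zip(g, u)]
    return out, new_g


# Toy run with no target: Y leaks 0.8 x the noise reference,
# so G should converge to 0.8 and the output toward zero
g = [0j]
for t in range(50):
    n = complex(1 + t % 3, 0.5)
    out, g = nlms_step(g, [n], 0.8 * n)
```

Normalizing by $\|U\|^2$ makes the adaptation speed insensitive to the noise level, which is the usual reason to prefer NLMS over plain LMS.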
To aid understanding of the final speech output, FIG. 6 shows a schematic diagram of the final speech output for a target sound-source direction according to an embodiment of the present invention, in which $Y(\omega,\Omega)$ is denoted $Y_{FBF}(t, e^{j\omega})$ and $G(t, e^{j\omega})\,U(t, e^{j\omega})$ is denoted $Y_{NC}(t, e^{j\omega})$.
In this embodiment, the beamforming output for the target sound-source direction is calculated and the signals from non-target directions are denoised according to the noise parameter. This further ensures that the sound from the target direction is not distorted and further suppresses interference from non-target directions.
FIG. 7 is a flowchart of another multi-beam beamforming method according to an embodiment of the present invention. As shown in FIG. 7, the multi-beam beamforming method of this embodiment includes the following steps:
Step S310: Calculate the products of the spatial filtering parameters and the original frequency-domain signals corresponding to at least two sound-source directions, to obtain multi-beam beamforming. The spatial filtering parameters vary with the angle of the sound source and with subband frequency, and the at least two sound-source directions include one target sound-source direction and at least one non-target direction.
The spatial filtering parameters in this embodiment are filter parameters in the frequency domain; their purpose is to apply a corresponding gain to each subband frequency of every signal frame. In practice, the spatial filtering parameters form a matrix calculated by a computer device; once calculated, they are stored in the electronic device of this embodiment for direct use, which shortens the time required for beamforming. In one optional implementation, the spatial filtering parameters may be calculated with the method of steps S1 to S3 in FIG. 4, which is not repeated here.
In this embodiment the beamforming covers at least two sound-source directions, forming multi-beam beamforming. In practice, a sound-source direction may be any plane-wave angle from 0° to 180°. The at least two directions include one target sound-source direction and at least one other direction. For ease of description, the following embodiments use beam directions of 0°, 30°, 60°, 90°, 120°, 150°, and 180° (seven directions in total), with the target sound source at 90°. This is only an example and does not limit the beams to these angles: beams may also point at 53° or 80°, the target may be at 60°, and so on.
For each sound-source direction, calculate the product of its original frequency-domain signal and the spatial filtering parameters to obtain each single-beam output; the result is again a matrix, representing a spectrum. Computing these products requires determining each sound-source direction with a microphone array. The microphone array consists of a number of acoustic sensors (usually microphones) that sample the spatial characteristics of the sound field. In practice, the array may have 4 microphones evenly spaced in a line (as in FIG. 2), 6 evenly spaced in a line, 8 evenly spaced on a circle (as in FIG. 3), or 12 or 14 evenly spaced on a circle, rectangle, crescent, and so on; this embodiment does not limit the number or arrangement of microphones. For ease of description, the following embodiments use the array layout and microphone count in FIG. 2 as an example; this is not a specific limitation on the microphone array.
In practice, given the characteristics of sound waves, the spacing between microphones should be neither too large nor too small when laying out the array; an unsuitable spacing causes errors in focusing on and locating the sound source. In general, the uniform spacing between microphones can be set greater than 30 mm and less than 80 mm.
Step S320: Calculate the enhanced speech for the target sound-source direction.
This embodiment takes microphone array 2 in FIG. 2 as an example. After sound from the 7 directions is acquired and Fourier-transformed, seven 4×512 matrices are obtained, where 4 is the number of microphones and 512 means the spectrum for each direction is decomposed into 512 subbands. The purpose of this step is to filter on a per-subband basis, determining for each subband the share that belongs to the target sound source.
Suppose the target sound source is at 90°. Its spectrum corresponds to α1 (4×512 subbands); the spectra for the 0°, 30°, 60°, 120°, 150°, and 180° directions correspond to α2, α3, α4, α5, α6, and α7 respectively (each 4×512 subbands). In one implementation, the ratio gain for the target direction is α1/(α1+α2+α3+α4+α5+α6+α7); in another, it is α1/(α2+α3+α4+α5+α6+α7). Once the ratio gain for the target is obtained, the enhanced speech for the target direction is obtained from the ratio gain and the multi-beam beamforming output calculated in step S310 (the products of the spatial filtering parameters and the original frequency-domain signals of the at least two directions). Optionally, the product of a first product and the target's ratio gain is calculated, where the first product is the product of the original frequency-domain signal for the target direction and the spatial filtering parameters.
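The two ratio-gain variants can be sketched per subband as below. The dictionary layout (look angle mapped to a list of per-subband powers, i.e. the α values after any merging across microphones) and the names `ratio_gain` and `enhance` are assumptions for illustration; `eps` only guards against division by zero.

```python
def ratio_gain(powers, target, include_target=True, eps=1e-12):
    """Per-subband ratio gain for the target direction.

    powers: dict mapping look angle -> list of per-subband powers (alpha).
    include_target=True:  alpha_target / sum over all directions
    include_target=False: alpha_target / sum over non-target directions
    """
    n_sub = len(powers[target])
    gains = []
    for k in range(n_sub):
        den = sum(p[k] for ang, p in powers.items()
                  if include_target or ang != target)
        gains.append(powers[target][k] / (den + eps))
    return gains


def enhance(first_product, gains):
    """Enhanced speech: first product (target beam output) times the gain."""
    return [b * g for b, g in zip(first_product, gains)]
```

The first variant is bounded by 1 and behaves like a soft mask; the second can exceed 1 in subbands where the target dominates.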
Step S330: Calculate an energy ratio from the subband energy of the target sound source and the sum of the energies of all subbands from at least one non-target sound-source direction.
In one optional implementation, the subbands into which the current frame's spectrum is decomposed are merged, and the energy of the merged subbands is obtained; the current frame contains both the target and the non-target sound sources. Specifically, the 512 subbands of the target sound source are merged first and the merged subband energy is determined. Next, the 512 subbands of each of the other 6 directions (or all 7 directions, including the target) are merged in turn, and the merged subband energy of each direction is determined. Finally, the sum of the energies of all subbands of the 6 directions (or 7 directions, including the target) is calculated; this energy sum is a matrix.
The energy ratio is calculated from the subband energy of the target sound source and the sum of the energies of all subbands of the 6 directions (or 7 directions, including the target).
Step S340: Calculate the product of the original frequency-domain signal for the target direction, the enhanced speech for the target direction, and the energy ratio, thereby reducing noise from non-target directions, and output the speech corresponding to this product.
Obtain the original frequency-domain signal for the target direction and calculate its product with the enhanced speech for the target direction obtained in step S320 and the energy ratio calculated in step S330. The beamforming obtained from this product keeps the sound from the target direction undistorted while suppressing noise from non-target directions.
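Steps S330 and S340 can be sketched together as follows, assuming the same angle-to-power-list layout as in the ratio-gain step, that merging the subbands means summing their powers into one frame energy per direction, and that the final product is taken element-wise per subband. The function names are hypothetical.

```python
def energy_ratio(powers, target, include_target=False, eps=1e-12):
    """Frame-level energy ratio: target energy / summed energy of the others.

    powers: dict mapping look angle -> list of per-subband powers.
    With include_target=True the denominator is the whole frame's energy.
    """
    energy = {ang: sum(p) for ang, p in powers.items()}
    den = sum(e for ang, e in energy.items() if include_target or ang != target)
    return energy[target] / (den + eps)


def step_s340(z_target, enhanced, ratio):
    """Element-wise product of original spectrum, enhanced speech and ratio."""
    return [z * e * ratio for z, e in zip(z_target, enhanced)]
```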
In the multi-beam beamforming method provided by this embodiment, multi-beam beamforming is obtained by calculating the products of the spatial filtering parameters and the original frequency-domain signals of at least two sound-source directions; the product of the enhanced speech for the target direction, the energy ratio, and the original frequency-domain signal for the target direction is then calculated, and the speech corresponding to this product is output. This denoises non-target sound sources while keeping the sound from the target direction undistorted.
FIG. 8 is a flowchart of yet another multi-beam beamforming method according to an embodiment of the present invention. As a refinement and extension of the above embodiment, this embodiment provides another multi-beam beamforming method which, as shown in FIG. 8, includes the following steps:
Step S410: Calculate the products of the spatial filtering parameters and the original frequency-domain signals corresponding to at least two sound-source directions, to obtain multi-beam beamforming. The spatial filtering parameters vary with the angle of the sound source and with subband frequency, and the at least two sound-source directions include one target sound-source direction and at least one non-target direction.
When calculating the products of the spatial filtering parameter $W_f(\omega)$ and the original frequency-domain signals of at least two sound-source directions to obtain multi-beam beamforming, the following method may be used, among others:
Acquire the spatial filtering parameter $W_f(\omega)$, determine the sound-source directions to which $W_f(\omega)$ corresponds, and obtain the original frequency-domain signal for each direction; then calculate the products of $W_f(\omega)$ with the original frequency-domain signals of the different directions.
In this embodiment, the at least two sound-source directions corresponding to $W_f(\omega)$ are determined by steering the beam using time delays. One possible method, among others, is to calculate the delay with which sound from the source reaches each microphone from the physical layout of the array. Let the microphone spacing be $d$, the speed of sound $c$, and the look angle $\Omega$ (the direction in which sound is to be picked up and focused, e.g. 90° directly in front). In the microphone array, choose the microphone the wavefront reaches first as the reference (e.g. Mic1 in FIG. 2). The reference microphone Mic1 has delay $\tau_0 = 0$; the second microphone Mic2 has delay $\tau_1 = d\sin(\Omega)/c$; the third microphone Mic3 has delay $\tau_2 = 2d\sin(\Omega)/c$; and the fourth microphone Mic4 has delay $\tau_3 = 3d\sin(\Omega)/c$. Here $\tau_1$ is the delay of the sound field at Mic2 relative to Mic1. This delay calculation applies to uniform linear arrays; other array geometries and non-uniform spacings may require different calculations.
A signal vector function is constructed from the delays of the microphones in the array, and the sound-source direction is calculated from the signal vector function and the delays. Constructing the signal vector function requires the matrix of all subband frequencies. The signal vector function is:
[Formula image PCTCN2019087621-appb-000010: signal vector function]
Here $\Omega$ is the pickup and focusing direction angle, $j$ is the phase at a given instant, $\omega = 2\pi f$ where $f$ is the matrix of all subband frequencies, $\tau_0$ is the delay from the sound source to the first microphone, $N$ is the number of microphones, and $\tau_{N-1}$ is the delay from the sound source to the $N$-th microphone. The sound-source direction can therefore be calculated from the signal vector function and the delay of each microphone. Optionally, the matrix of subband frequencies for the sound source is determined first, and the target sound-source direction is then calculated from that matrix, the signal vector function, and the delays.
In practice, to make the sound easier to process downstream, the sound signal is first transformed with the Fourier transform, converting the time-domain signal, which is hard to process directly, into a frequency-domain signal that is easy to analyze. The Fourier transform rests on the principle that any continuously measured time series or signal can be expressed as an infinite superposition of sinusoids of different frequencies; the Fourier transform algorithm based on this principle uses the directly measured original signal to compute, by summation, the frequency, amplitude, and phase of each sinusoidal component. The specific implementation of the Fourier transform is not described further in this embodiment.
Further, the spatial filtering parameter $W_f(\omega)$ and the original frequency-domain signal $Z(t, e^{j\omega})$ are both matrices, and multiplying them gives $Y(\omega,\Omega) = W_f(\omega)\,Z(t, e^{j\omega})$. The product $Y(\omega,\Omega)$ performs beamforming by suppressing all frequency-domain signals other than the original frequency-domain signal from the target sound-source direction, so that the sound signal from the fixed direction is not distorted and sound signals from other directions are suppressed.
In this embodiment, assume 7 sound-source directions (including a target direction at 90°) and 4 microphones (such as microphone array 3 shown in FIG. 2) capturing sound. Using the above method, single-beam outputs are calculated for beam directions of 0°, 30°, 60°, 90°, 120°, 150°, and 180° (seven directions in total), giving seven 4×512 matrices, where 4 is the number of microphones and 512 means the spectrum for each direction is decomposed into 512 subbands.
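The seven-beam scan can be sketched for a single subband as below, assuming delay-and-sum weights, the sin(Ω) delay convention from the text, a 50 mm spacing, and c = 343 m/s; `beam_bank` is a hypothetical name. Unlike the 4×512 per-microphone matrices in the text, this sketch sums across microphones, so each direction yields one complex value per subband.

```python
import cmath
import math

ANGLES_DEG = [0, 30, 60, 90, 120, 150, 180]


def beam_bank(mic_bins, freq_hz, spacing_m, c=343.0, angles=ANGLES_DEG):
    """Delay-and-sum output of one subband for every look direction."""
    n = len(mic_bins)
    out = {}
    for ang in angles:
        s = math.sin(math.radians(ang))
        d = [cmath.exp(-2j * math.pi * freq_hz * m * spacing_m * s / c)
             for m in range(n)]
        out[ang] = sum(dm.conjugate() * z for dm, z in zip(d, mic_bins)) / n
    return out


# Four-microphone snapshot of a unit plane wave arriving from 90 degrees
f, spacing = 2000.0, 0.05
z = [cmath.exp(-2j * math.pi * f * m * spacing / 343.0) for m in range(4)]
beams = beam_bank(z, f, spacing)
```

Because sin(Ω) = sin(180° − Ω), the beams at 30° and 150° (and at 0° and 180°) respond identically here, which shows why the angle reference matters for a linear array.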
Step S420: Calculate the enhanced speech for the target sound-source direction.
In practice, the enhanced speech for the target direction is calculated as follows:
以每个子带为单位,计算目标声音源指向的能量与所有声音源指向的能量和之间的比值增益;计算第一乘积B(ω,Ω)与比值增益的乘积,得到增强语音,其中,所述第一乘积为所述目标声音源指向对应的原始频域信号与所述空间滤波之间的乘积。Using each subband as a unit, calculate the ratio gain between the energy pointed by the target sound source and the energy sum directed by all sound sources; calculate the product of the first product B (ω, Ω) and the ratio gain to obtain enhanced speech, where: The first product is a product between the target sound source pointing to a corresponding original frequency domain signal and the spatial filtering.
在计算所有声音源指向的能量和时,其实质为将4个麦克风进行合并,即合并后得到7个1*512的矩阵,得到所有声音源指向的能量和记作Spectrum power of other directions,继续获取目标声音源指向的能量,记作:Spectrum power of target directions,计算目标声音源指向的能量Spectrum power of target directions与所有声音源指向的能量和Spectrum power of other directions的比值,得到比值增益Gain-mask。When calculating the energy sums pointed by all sound sources, the essence is to merge the 4 microphones, that is, to obtain 7 1 * 512 matrices, and obtain the energy sums pointed by all sound sources as Spectrum power of other directions, continue Obtain the energy pointed by the target sound source and record it as: Spectrum power of target directions. Calculate the ratio of the energy pointed by the target sound source to the energy of the target direction and the energy pointed by all sound sources and the Spectrum power of other directions to get the ratio gain Gain- mask.
继续计算第一乘积B(ω,Ω)与比值增益Gain-mask的乘积,得到增强语音Gain-mask-frame=B(ω,Ω)*Gain-mask。Continue to calculate the product of the first product B (ω, Ω) and the ratio gain Gain-mask to obtain the enhanced speech Gain-mask-frame = B (ω, Ω) * Gain-mask.
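The ratio gain and enhanced speech of Step S420 can be sketched per subband as follows. `beam_outputs` is a hypothetical mapping from each beam angle (for example, the seven angles from 0° to 180°) to that beam's output B(ω, Ω) across subbands. Power is taken as the squared magnitude. The small `eps` that guards against division by zero is an added assumption, not part of the text.

```python
def ratio_gain(beam_outputs, target_angle, eps=1e-12):
    """Per-subband Gain-mask = |B_target|^2 / sum over all beams of |B|^2."""
    num_subbands = len(beam_outputs[target_angle])
    gains = []
    for k in range(num_subbands):
        target_power = abs(beam_outputs[target_angle][k]) ** 2
        total_power = sum(abs(beam[k]) ** 2 for beam in beam_outputs.values())
        gains.append(target_power / (total_power + eps))
    return gains


def enhanced_speech(beam_outputs, target_angle):
    """Enhanced speech = B(omega, target) * Gain-mask, per subband."""
    gains = ratio_gain(beam_outputs, target_angle)
    enhanced = [b * g for b, g in zip(beam_outputs[target_angle], gains)]
    return enhanced, gains
```

Because the target beam is part of the denominator, each gain lies between 0 and 1. The gain approaches 1 in subbands where the target direction dominates.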
Step S430: Calculate the energy ratio from the subband energy corresponding to the target sound source and the energy sum of all subbands of at least one non-target sound source direction.
Specifically, the energies of all subbands in the current frame are merged, and the energy sum of all subbands in the current frame is calculated. The energy ratio is then the ratio between the subband energy corresponding to the target sound source and the energy sum of all subbands of the non-target sound source directions. Alternatively, the energy ratio is the ratio between the subband energy corresponding to the target sound source and the energy sum of all subbands in the current frame.
The current frame contains all subbands for the 7 sound source directions, and the energies of all its subbands are merged as follows. First, all subbands of each sound source direction are merged to obtain the spectrum for each direction, giving a 7×1 matrix, where 7 is the number of sound source directions and 1 is the merged subband (spectrum). Second, the subbands of all directions are merged into a 1×1 matrix, from which the energy sum of all subbands is obtained, denoted Energy of each bin in all directions. Third, the subband energy corresponding to the target sound source is obtained, denoted Energy of each bin in target directions. Finally, the energy ratio is calculated as the ratio between the subband energy corresponding to the target sound source and the energy sum of all subbands of the non-target sound source directions (that is, the energy sum of all subbands over all sound source directions in the current frame). This ratio is denoted Gain-mask-frame-bin.
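The frame-level energy ratio of Step S430 can be sketched by following the three merging steps literally. First sum each direction's subband energies, then sum across directions, then divide each target subband energy by that total. The `exclude_target` flag switches between the two denominators described in the text: all directions, or non-target directions only. The names and `eps` are illustrative assumptions.

```python
def frame_energy_ratio(beam_outputs, target_angle, exclude_target=False, eps=1e-12):
    """Gain-mask-frame-bin: target subband energy / frame energy over directions."""
    # Step 1: merge each direction's subbands into one energy value (7 x 1).
    per_direction = {
        angle: sum(abs(v) ** 2 for v in beam) for angle, beam in beam_outputs.items()
    }
    # Step 2: merge across directions into a single frame total (1 x 1),
    # optionally leaving out the target direction (non-target variant).
    total = sum(
        energy for angle, energy in per_direction.items()
        if not (exclude_target and angle == target_angle)
    )
    # Step 3: divide each target subband energy by that total.
    return [abs(v) ** 2 / (total + eps) for v in beam_outputs[target_angle]]
```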
Step S440: Smooth the current frame against the previous frame, frame by frame, using a smoothing parameter.
In this embodiment of the present invention, smoothing gives the speech a smooth transition between two consecutive frames. The frame-by-frame smoothing of the current frame against the previous frame may therefore be implemented in, but is not limited to, the following manner:
Set the smoothing parameter of the current frame so that it and the smoothing parameter of the previous frame sum to a second preset value, preferably 1. Multiply the ratio gain of the previous frame by the smoothing parameter of the previous frame to obtain a second product. Multiply the ratio gain by the smoothing parameter of the current frame to obtain a third product. The sound source in the current frame is then smoothed frame by frame according to the sum of the second product and the third product.
In an optional implementation, the smoothing parameter γ is an empirical value. For example, the smoothing parameter γ of the current frame may be set to 0.8, in which case the smoothing parameter of the previous frame is (1-γ) = 0.2; the embodiments of the present invention do not limit the specific value. The ratio gain of the current frame can thus be obtained and used to smooth the sound source in the current frame. Let the ratio gain of the previous frame be Previous Gain. The ratio gain of the current frame is then Current Gain = Previous Gain * (1-γ) + γ * Gain-mask = Previous Gain * (1-γ) + γ * Spectrum power of target directions / Spectrum power of other directions.
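The smoothing recursion above amounts to a one-line update per subband. This minimal sketch assumes gains are stored as per-subband lists. `gamma=0.8` is the example value from the text, not a required setting. The all-ones starting gain in `smooth_sequence` is a hypothetical initialization.

```python
def smooth_gain(previous_gain, gain_mask, gamma=0.8):
    """Current Gain = Previous Gain * (1 - gamma) + gamma * Gain-mask, per subband."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1] so the two weights sum to 1")
    return [p * (1.0 - gamma) + gamma * g for p, g in zip(previous_gain, gain_mask)]


def smooth_sequence(masks, gamma=0.8):
    """Apply smooth_gain frame by frame, starting from an assumed all-ones gain."""
    gain = [1.0] * len(masks[0])
    history = []
    for mask in masks:
        gain = smooth_gain(gain, mask, gamma)
        history.append(gain)
    return history
```

A larger γ tracks the current mask more closely. A smaller γ keeps more of the previous frame's gain and smooths more strongly.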
Step S450: Calculate the product of the enhanced speech and energy ratio corresponding to the target sound source direction and the original frequency-domain signal of the target sound source direction, and output the speech corresponding to the product according to the smoothing result.
In this embodiment, multi-beam beamforming is obtained by multiplying the spatial filtering parameters by the original frequency-domain signals corresponding to at least two sound source directions. The product of the enhanced speech, the energy ratio, and the original frequency-domain signal of the target sound source direction is then calculated. The current frame is smoothed against the previous frame using the smoothing parameter, and the speech corresponding to the product is output according to the smoothing result. This further suppresses noise from non-target sound sources and further ensures that the sound from the target sound source direction is not distorted.
Further, as an implementation of the method shown in FIG. 1, another embodiment of the present invention provides a speech processing apparatus. This apparatus embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated here, but the apparatus in this embodiment can implement everything in the foregoing method embodiment.
Further, as an implementation of the method shown in FIG. 1, another embodiment of the present invention provides a beamforming apparatus. This apparatus embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated here, but the apparatus in this embodiment can implement everything in the foregoing method embodiment.
FIG. 9 is a schematic diagram of a beamforming apparatus according to an embodiment of the present invention. FIG. 10 is a schematic diagram of another beamforming apparatus according to an embodiment of the present invention. As shown in FIG. 9, the beamforming apparatus 9 of this embodiment includes a first obtaining unit 91, a determining unit 92, a second obtaining unit 93, and a first calculation unit 94.
The first obtaining unit 91 is configured to obtain spatial filtering parameters, which vary with angle and subband frequency. The determining unit 92 is configured to determine the sound source direction corresponding to the spatial filtering parameters obtained by the first obtaining unit 91. The second obtaining unit 93 is configured to obtain the original frequency-domain signal corresponding to the sound source direction determined by the determining unit 92. The first calculation unit 94 is configured to calculate the product of the spatial filtering parameters and the original frequency-domain signal. The product is used to suppress frequency-domain signals other than the original frequency-domain signal from the sound source direction.
Further, as shown in FIG. 10, the beamforming apparatus 9 further includes:
a second calculation unit 95, configured to calculate the spatial filtering parameters before the first obtaining unit 91 obtains them.
Further, as shown in FIG. 10, the second calculation unit 95 includes:
A first calculation module 951, configured to calculate the delay time for sound from the sound source to reach the microphone array. A construction module 952, configured to construct a signal vector function. A second calculation module 953, configured to calculate the sound source direction from the signal vector function constructed by the construction module 952 and the delay time calculated by the first calculation module 951. A first setting module 954, configured to set a first constraint, namely a white noise gain constraint. A second setting module 955, configured to set a second constraint, namely that the product of the spatial filtering parameter and the signal vector function equals 1. A construction module 956, configured to construct a loss function from the spatial filtering parameter and the signal vector function. A third calculation module 957, configured to calculate the spatial filtering parameter that minimizes the loss function, subject to the first constraint set by the first setting module 954 and the second constraint set by the second setting module 955.
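The text does not give the loss function explicitly. One common formulation consistent with the listed constraints is offered here only as an assumption. It is a superdirective design that minimizes noise output power subject to a distortionless constraint and a white noise gain (WNG) floor δ. In it, Γ(ω) is an assumed noise coherence matrix (for example, spherically diffuse noise) and d(ω, θ) is the signal (steering) vector. The WNG floor is typically enforced by diagonal loading with ε ≥ 0:

```latex
\[
\min_{\mathbf{W}_f(\omega)} \; \mathbf{W}_f^{H}(\omega)\,\boldsymbol{\Gamma}(\omega)\,\mathbf{W}_f(\omega)
\quad \text{s.t.} \quad
\mathbf{W}_f^{H}(\omega)\,\mathbf{d}(\omega,\theta) = 1,
\qquad
\frac{\lvert \mathbf{W}_f^{H}(\omega)\,\mathbf{d}(\omega,\theta) \rvert^{2}}
     {\mathbf{W}_f^{H}(\omega)\,\mathbf{W}_f(\omega)} \ge \delta
\]
\[
\mathbf{W}_f(\omega) =
\frac{\left(\boldsymbol{\Gamma}(\omega) + \epsilon \mathbf{I}\right)^{-1} \mathbf{d}(\omega,\theta)}
     {\mathbf{d}^{H}(\omega,\theta) \left(\boldsymbol{\Gamma}(\omega) + \epsilon \mathbf{I}\right)^{-1} \mathbf{d}(\omega,\theta)}
\]
```

With ε = 0, this reduces to the superdirective (MVDR-type) solution. Increasing ε raises the white noise gain at the cost of directivity, and the solution always satisfies W_f^H d = 1.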
Further, as shown in FIG. 10, the first calculation module 951 includes:
A first determining sub-module 951a, configured to determine the spacing between microphones in the microphone array and the speed at which sound from the sound source propagates. A second determining sub-module 951b, configured to determine the angle of the sound source direction. A calculation sub-module 951c, configured to calculate the delay time from the microphone spacing, the sound speed, and the angle.
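The delay calculation of sub-modules 951a to 951c can be sketched for a uniform linear array. Microphone m lies m·l from the reference microphone, and a plane wave arrives at angle θ, giving τ_m = m·l·cos θ / c. Three details are assumptions for illustration: the linear-array geometry, the 343 m/s default sound speed, and the function name. The patent only states that spacing, speed, and angle are used.

```python
import math


def mic_delays(num_mics, spacing_m, angle_deg, speed_of_sound=343.0):
    """Plane-wave arrival delay (seconds) at each microphone of a uniform linear
    array, relative to microphone 0: tau_m = m * spacing * cos(theta) / c."""
    theta = math.radians(angle_deg)
    return [m * spacing_m * math.cos(theta) / speed_of_sound for m in range(num_mics)]
```

At 90° (broadside) every delay is zero. At 0° and 180° (endfire) the delays reach their largest magnitude, with opposite signs.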
Further, as shown in FIG. 12, the second calculation module 953 includes:
A determining sub-module 953a, configured to determine the matrix corresponding to all subband frequencies. A calculation sub-module 953b, configured to calculate the sound source direction from three inputs: the matrix corresponding to all subband frequencies determined by the determining sub-module 953a, the signal vector function, and the delay time.
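Given those delays, the matrix over all subband frequencies can be sketched as the steering (signal) vector d[m][k] = exp(-j·ω_k·τ_m), evaluated at each subband frequency. The mapping ω_k = 2π·k·fs / (2·num_subbands) rests on two assumptions: 512 subbands taken from a 1024-point transform, and a 16 kHz sample rate. Neither value is fixed by the text.

```python
import cmath
import math


def steering_matrix(delays, num_subbands=512, sample_rate=16000.0):
    """M x K matrix d[m][k] = exp(-j * omega_k * tau_m), one row per microphone."""
    fft_size = 2 * num_subbands  # assumed transform length
    matrix = []
    for tau in delays:
        row = []
        for k in range(num_subbands):
            omega = 2.0 * math.pi * k * sample_rate / fft_size  # rad/s of subband k
            row.append(cmath.exp(-1j * omega * tau))
        matrix.append(row)
    return matrix
```

Every entry has unit magnitude, because a pure delay changes only phase. The reference microphone's row is all ones.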
Further, the spatial filtering parameter is a matrix.
Further, the sound source direction is any plane-wave angle from 0° to 180°.
The beamforming apparatus described in this embodiment can execute the beamforming method of the embodiments of the present invention. Based on that method, those skilled in the art can understand the specific implementations of this beamforming apparatus and their variations, so how the apparatus implements the method is not described in detail here. Any apparatus used by those skilled in the art to implement the beamforming method of the embodiments of the present invention falls within the scope of protection of this application.
Further, as an implementation of the method shown in FIG. 5, another embodiment of the present invention provides a multi-beam beamforming apparatus. This apparatus embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated here, but the apparatus in this embodiment can implement everything in the foregoing method embodiment.
FIG. 11 is a schematic diagram of a multi-beam beamforming apparatus according to an embodiment of the present invention. FIG. 12 is a schematic diagram of another multi-beam beamforming apparatus according to an embodiment of the present invention. As shown in FIG. 11, the multi-beam beamforming apparatus 11 of this embodiment includes a first calculation unit 111, a second calculation unit 112, and a noise reduction unit 113.
The first calculation unit 111 is configured to calculate the beamforming output corresponding to the target sound source direction. The second calculation unit 112 is configured to calculate a noise parameter using a blocking matrix. The noise reduction unit 113 uses the noise parameter calculated by the second calculation unit 112 to suppress signals from non-target sound source directions. These are signals other than the beamforming output for the target sound source direction calculated by the first calculation unit 111.
Further, as shown in FIG. 12, the first calculation unit 111 includes:
A first obtaining module 1111, configured to obtain spatial filtering parameters.
A determining module 1112, configured to determine the target sound source direction corresponding to the spatial filtering parameters obtained by the first obtaining module 1111.
A second obtaining module 1113, configured to obtain the original frequency-domain signal corresponding to the target sound source direction determined by the determining module 1112.
A calculation module 1114, configured to calculate the product of the spatial filtering parameters and the original frequency-domain signal corresponding to the target sound source direction. This product is the beamforming output for the target sound source direction.
Further, as shown in FIG. 12, the second calculation unit 112 includes:
A first calculation module 1121, configured to calculate the frequency response of the sound signal as it reaches each microphone in turn.
A construction module 1122, configured to construct the blocking matrix from the frequency response calculated by the first calculation module 1121.
A second calculation module 1123, configured to calculate the noise parameter from the blocking matrix constructed by the construction module 1122 and the original frequency-domain signals corresponding to the other sound source directions.
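One classical way to build a blocking matrix from the per-microphone frequency responses works pairwise on adjacent microphones. It is shown here as an assumption, not as the patent's specific construction. Each microphone is time-aligned by dividing by its steering-vector entry, and adjacent aligned signals are subtracted. Any component arriving from the target direction then cancels, so the resulting noise references contain only signals from other directions.

```python
def noise_references(spectra, steering):
    """Blocking-matrix outputs u_i[k] = Z_i[k]/d_i[k] - Z_{i+1}[k]/d_{i+1}[k].

    spectra[m][k]  : original frequency-domain signal, mic m, subband k
    steering[m][k] : assumed frequency response of the target path to mic m
    Returns M-1 noise references per subband; a pure target component cancels.
    """
    num_mics = len(spectra)
    num_subbands = len(spectra[0])
    return [
        [spectra[i][k] / steering[i][k] - spectra[i + 1][k] / steering[i + 1][k]
         for k in range(num_subbands)]
        for i in range(num_mics - 1)
    ]
```

With 4 microphones, this gives 3 noise references per subband. Only a signal whose inter-microphone phase differs from the target's leaves a nonzero residue.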
Further, as shown in FIG. 12, the noise reduction unit 113 includes:
A calculation module 1131, configured to calculate multi-channel optimal filtering parameters using a multi-channel filtering algorithm and an iterative algorithm.
A noise reduction module 1132, configured to suppress signals from sound source directions other than the beamforming output for the target sound source direction. It uses the beamforming output of the target sound source, the optimal filtering parameters, and the noise parameter.
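The patent does not name its multi-channel filtering or iterative algorithm. As a stand-in, the sketch below uses a normalized least-mean-squares (NLMS) update in one subband, which is the adaptive stage of a generalized sidelobe canceller. It subtracts a filtered combination of the noise references from the fixed beamformer output. It then adapts the weights to minimize the residual. `mu`, `eps`, and the function name are illustrative assumptions.

```python
def nlms_cancel(fixed_beam, noise_refs, mu=0.5, eps=1e-8):
    """Adaptive noise cancellation in one subband (GSC lower branch, NLMS).

    fixed_beam[t] : fixed beamformer output for frame t
    noise_refs[t] : list of blocking-matrix noise references for frame t
    Returns the per-frame residual e[t] = y[t] - w^H u[t] and the final weights.
    """
    w = [0j] * len(noise_refs[0])
    residual = []
    for y, u in zip(fixed_beam, noise_refs):
        estimate = sum(wi.conjugate() * ui for wi, ui in zip(w, u))
        e = y - estimate
        norm = sum(abs(ui) ** 2 for ui in u) + eps
        # Normalized LMS step toward minimizing |e|^2.
        w = [wi + mu * ui * e.conjugate() / norm for wi, ui in zip(w, u)]
        residual.append(e)
    return residual, w
```

In a full system, this would run independently in each subband. Adaptation is usually paused while the target is active, to avoid cancelling target speech.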
The multi-beam beamforming apparatus described in this embodiment can execute the multi-beam beamforming method of the embodiments of the present invention. Based on that method, those skilled in the art can understand the specific implementations of this apparatus and their variations, so how the apparatus implements the method is not described in detail here. Any apparatus used by those skilled in the art to implement the multi-beam beamforming method of the embodiments of the present invention falls within the scope of protection of this application.
Further, as an implementation of the method shown in FIG. 7, another embodiment of the present invention provides a multi-beam beamforming apparatus. This apparatus embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated here, but the apparatus in this embodiment can implement everything in the foregoing method embodiment.
FIG. 13 is a schematic diagram of yet another multi-beam beamforming apparatus according to an embodiment of the present invention. FIG. 14 is a schematic diagram of a further multi-beam beamforming apparatus according to an embodiment of the present invention. As shown in FIG. 13, the multi-beam beamforming apparatus 13 in this embodiment includes a first calculation unit 131, a second calculation unit 132, a third calculation unit 133, and a fourth calculation unit 134.
The first calculation unit 131 obtains multi-beam beamforming by calculating the products of the spatial filtering parameters and the original frequency-domain signals corresponding to at least two sound source directions. The spatial filtering parameters vary with the angle of the sound source and the subband frequency. The sound source directions include one target sound source direction and at least one non-target sound source direction.
The second calculation unit 132 is configured to calculate the enhanced speech for the target sound source direction.
The third calculation unit 133 is configured to calculate the energy ratio from the subband energy corresponding to the target sound source and the energy sum of all subbands of at least one non-target sound source direction.
The fourth calculation unit 134 is configured to calculate the product of the original frequency-domain signal of the target sound source direction and the corresponding enhanced speech and energy ratio, and to output the speech corresponding to the product.
Further, as shown in FIG. 14, the multi-beam beamforming apparatus 13 further includes:
A processing unit 135, configured to smooth the current frame against the previous frame, frame by frame, using a smoothing parameter. This smoothing takes place before the fourth calculation unit 134 calculates the product of the original frequency-domain signal of the target sound source direction and the corresponding enhanced speech and energy ratio.
Further, as shown in FIG. 14, the first calculation unit 131 includes:
A first obtaining module 1311, configured to obtain spatial filtering parameters.
A determining module 1312, configured to determine the at least two sound source directions corresponding to the spatial filtering parameters obtained by the first obtaining module 1311.
A second obtaining module 1313, configured to obtain the original frequency-domain signals corresponding to the at least two sound source directions determined by the determining module 1312.
A calculation module 1314, configured to calculate the products of the spatial filtering parameters and the original frequency-domain signals corresponding to the different sound source directions.
Further, as shown in FIG. 14, the second calculation unit 132 includes:
A first calculation module 1321, configured to calculate, for each subband, the ratio gain between the energy of the target sound source direction and the sum of the energies of all sound source directions.
A second calculation module 1322, configured to multiply the first product by the ratio gain to obtain the enhanced speech. The first product is the product of the original frequency-domain signal corresponding to the target sound source direction and the spatial filtering parameter.
Further, as shown in FIG. 14, the third calculation unit 133 includes:
A merging module 1331, configured to merge the energies of all subbands in the current frame.
A first calculation module 1332, configured to calculate the energy sum of all subbands in the current frame.
A second calculation module 1333, configured to calculate the energy ratio. This is the ratio between the subband energy corresponding to the target sound source and the energy sum of all subbands of at least one non-target sound source direction.
Further, as shown in FIG. 14, the processing unit 135 includes:
A setting module 1351, configured to set the smoothing parameter of the current frame so that the smoothing parameters of the current frame and the previous frame sum to 1.
A calculation module 1352, configured to multiply the ratio gain of the previous frame by its smoothing parameter to obtain a second product. It also multiplies the smoothing parameter of the current frame by the ratio gain to obtain a third product.
A processing module 1353, configured to smooth the current frame, frame by frame, according to the sum of the second product and the third product.
Further, the fourth calculation unit 134 is also configured to calculate the product of the enhanced speech and energy ratio corresponding to the target sound source direction and the original frequency-domain signal of the target sound source direction. It then outputs the speech corresponding to the product according to the smoothing result.
The multi-beam beamforming apparatus provided by the embodiments of the present invention performs the following steps:
(1) It calculates the products of the spatial filtering parameters and the original frequency-domain signals corresponding to at least two sound source directions to obtain multi-beam beamforming. The spatial filtering parameters vary with the angle of the sound source and the subband frequency, and the sound source directions include one target sound source direction and at least one other sound source direction.
(2) It calculates the enhanced speech for the target sound source direction.
(3) It calculates the energy ratio from the subband energy corresponding to the target sound source and the energy sum of all subbands of at least one other sound source direction.
(4) It calculates the product of the original frequency-domain signal of the target sound source direction and the corresponding enhanced speech and energy ratio, and outputs the speech corresponding to the product.
Compared with the prior art, the embodiments of the present invention keep the sound from the target sound source direction undistorted and effectively suppress interference from other directions.
The multi-beam beamforming apparatus described in this embodiment can execute the multi-beam beamforming method of the embodiments of the present invention. Based on that method, those skilled in the art can understand the specific implementations of this apparatus and their variations, so how the apparatus implements the method is not described in detail here. Any apparatus used by those skilled in the art to implement the multi-beam beamforming method of the embodiments of the present invention falls within the scope of protection of this application.
Each of the above apparatuses includes a processor and a memory. The units of each apparatus are stored in the memory as program units, and the processor executes these program units to implement the corresponding functions. The processor contains one or more kernels, which retrieve the corresponding program units from the memory. When the kernel parameters are adjusted to implement the above methods, sound from the target spatial direction is kept undistorted, and sound from other spatial directions is effectively suppressed. The memory may include non-persistent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium storing a program that, when executed by a processor, implements the above speech processing method.
An embodiment of the present invention provides a processor configured to run a program, where the program, when run, executes the above speech processing method.
FIG. 15 is a structural block diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 15, the electronic device 15 includes:
at least one processor 151;
and at least one memory 152 and a bus 153 connected to the processor 151; wherein
the processor 151 and the memory 152 communicate with each other through the bus 153;
the processor 151 is configured to call program instructions in the memory 152 to execute any embodiment of the above methods.
本文中的电子设备可以是服务器、PC、PAD、手机、智能电视等一切包含麦克风的智能设备。The electronic devices in this article can be servers, PCs, PADs, mobile phones, smart TVs, and other smart devices that include microphones.
本发明实施例提供的电子设备,通过计算空间滤波参数与目标声音源指向对应的原始频域信号的乘积获取所述目标声音源指向的波束成形输出,并通过对非目标声音源指向进行降噪处理提高所述目标声音源指向的波束成形输出的信噪比。由此,可以确保目标空间指向的声音不失真,并对其他目标空间指向的声音进行有效抑制,从而提高目标空间指向的声音的信噪比。The electronic device provided by the embodiment of the present invention obtains a beamforming output pointed by the target sound source by calculating a product of a spatial filtering parameter and a target original sound source signal corresponding to the target sound source pointing, and performs noise reduction by pointing to a non-target sound source The processing improves the signal-to-noise ratio of the beamforming output pointed by the target sound source. Therefore, it is possible to ensure that the sound pointed by the target space is not distorted, and effectively suppress the sound pointed by other target spaces, thereby improving the signal-to-noise ratio of the sound pointed by the target space.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium storing computer instructions. The instructions cause a computer to execute any one of the foregoing voice processing methods.
The present application further provides a computer program product that, when executed on a data processing device, implements the functions of any one of the foregoing voice processing methods.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code. Such media include, but are not limited to, disk storage, CD-ROM and optical storage.
This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine. The instructions executed by that processor then produce means for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner. The instructions stored in that memory then produce an article of manufacture including instruction means, which implement the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce a computer-implemented process. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include non-persistent memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:
- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM) and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory or other memory technologies;
- compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage;
- magnetic cassette tape, magnetic tape or magnetic disk storage, or other magnetic storage devices;
- any other non-transmission medium that can be used to store information accessible by a computing device.
As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion. A process, method, product, or device that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product, or device. Without further restriction, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, product, or device that includes it.
The foregoing are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of its claims.

Claims (26)

  1. A beamforming method, comprising:
    acquiring a spatial filtering parameter, the spatial filtering parameter varying with angle and subband frequency;
    determining the sound source direction corresponding to the spatial filtering parameter, and acquiring the original frequency-domain signal corresponding to the sound source direction; and
    calculating the product of the spatial filtering parameter and the original frequency-domain signal, the product being used to perform beamforming in a manner that suppresses frequency-domain signals other than the original frequency-domain signal from the sound source direction.
  2. The method according to claim 1, wherein before acquiring the spatial filtering parameter, the method further comprises:
    calculating the spatial filtering parameter.
  3. The method according to claim 2, wherein calculating the spatial filtering parameter comprises:
    calculating the delay time for sound from the sound source to reach the microphone array;
    constructing a signal vector function according to the delay time, and calculating the sound source direction according to the signal vector function and the delay time; and
    calculating, subject to a preset first constraint and a preset second constraint, the spatial filtering parameter at which a loss function approaches its minimum, the loss function being constructed from the spatial filtering parameter and the signal vector function;
    wherein the first constraint is a white noise gain constraint, and the second constraint requires that the product of the spatial filtering parameter and the signal vector function equal a first preset value.
  4. The method according to claim 3, wherein calculating the delay time for sound from the sound source to reach the microphone array comprises:
    determining the spacing between the microphones in the microphone array and the speed at which sound from the sound source propagates;
    determining the angle of the sound source direction; and
    calculating the delay time according to the spacing between the microphones, the speed of sound propagation, and the angle of the sound source direction.
  5. The method according to claim 3, wherein calculating the sound source direction according to the signal vector function and the delay time comprises:
    determining a matrix corresponding to all subband frequencies; and
    calculating the sound source direction according to the matrix corresponding to all subband frequencies, the signal vector function, and the delay time.
  6. The method according to any one of claims 1-5, wherein the spatial filtering parameter is a matrix.
  7. The method according to any one of claims 1-5, wherein, for a plane wave, the sound source direction is any angle from 0° to 180°.
  8. A beamforming apparatus, comprising:
    a first acquiring unit, configured to acquire a spatial filtering parameter, the spatial filtering parameter varying with angle and subband frequency;
    a determining unit, configured to determine the sound source direction corresponding to the spatial filtering parameter acquired by the first acquiring unit;
    a second acquiring unit, configured to acquire the original frequency-domain signal corresponding to the sound source direction determined by the determining unit; and
    a first calculation unit, configured to calculate the product of the spatial filtering parameter and the original frequency-domain signal, the product being used to perform beamforming in a manner that suppresses frequency-domain signals other than the original frequency-domain signal from the sound source direction.
  9. A multi-beam beamforming method, comprising:
    calculating the beamforming output corresponding to a target sound source direction;
    calculating a noise parameter according to a blocking matrix; and
    performing noise reduction, according to the noise parameter, on signals from non-target sound source directions, that is, signals other than the beamforming output corresponding to the target sound source direction.
  10. The method according to claim 9, wherein calculating the beamforming output corresponding to the target sound source direction comprises:
    acquiring a spatial filtering parameter, and determining the target sound source direction corresponding to the spatial filtering parameter;
    acquiring the original frequency-domain signal corresponding to the target sound source direction; and
    calculating the product of the spatial filtering parameter and the original frequency-domain signal corresponding to the target sound source direction to obtain the beamforming output for the target sound source direction.
  11. The method according to claim 10, wherein calculating the noise parameter according to the blocking matrix comprises:
    calculating the frequency responses of the sound signal as it successively reaches each microphone;
    constructing the blocking matrix according to the frequency responses; and
    calculating the noise parameter according to the blocking matrix and the original frequency-domain signals corresponding to the non-target sound source directions.
  12. The method according to claim 11, wherein performing noise reduction, according to the noise parameter, on signals from non-target sound source directions comprises:
    calculating multi-channel optimal filtering parameters by means of a multi-channel filtering algorithm and an iterative algorithm; and
    performing noise reduction on signals from non-target sound source directions according to the beamforming output of the target sound source, the multi-channel optimal filtering parameters, and the noise parameter.
  13. A multi-beam beamforming apparatus, comprising:
    a first calculation unit, configured to calculate the beamforming output corresponding to a target sound source direction;
    a second calculation unit, configured to calculate a noise parameter by using a blocking matrix; and
    a noise reduction unit, configured to perform noise reduction on signals from non-target sound source directions, that is, signals other than the beamforming output calculated by the first calculation unit, according to the noise parameter calculated by the second calculation unit.
  14. The apparatus according to claim 13, wherein the first calculation unit comprises:
    a first acquiring module, configured to acquire a spatial filtering parameter;
    a determining module, configured to determine the target sound source direction corresponding to the spatial filtering parameter acquired by the first acquiring module;
    a second acquiring module, configured to acquire the original frequency-domain signal corresponding to the target sound source direction obtained via the first acquiring module; and
    a calculation module, configured to calculate the product of the spatial filtering parameter and the original frequency-domain signal corresponding to the target sound source direction to obtain the beamforming output for the target sound source direction.
  15. The apparatus according to claim 14, wherein the second calculation unit comprises:
    a first calculation module, configured to calculate the frequency responses of the sound signal as it successively reaches each microphone;
    a construction module, configured to construct the blocking matrix according to the frequency responses calculated by the first calculation module; and
    a second calculation module, configured to calculate the noise parameter according to the blocking matrix constructed by the construction module and the original frequency-domain signals corresponding to the non-target sound source directions.
  16. The apparatus according to claim 14, wherein the noise reduction unit comprises:
    a calculation module, configured to calculate multi-channel optimal filtering parameters by means of a multi-channel filtering algorithm and an iterative algorithm; and
    a noise reduction module, configured to perform noise reduction on signals from non-target sound source directions according to the beamforming output of the target sound source, the multi-channel optimal filtering parameters, and the noise parameter.
  17. A multi-beam beamforming method, comprising:
    calculating the products of a spatial filtering parameter and the original frequency-domain signals respectively corresponding to at least two sound source directions to obtain multi-beam beamforming, wherein the spatial filtering parameter varies with the angle of the sound source and the subband frequency, and the at least two sound source directions comprise one target sound source direction and at least one non-target sound source direction;
    calculating the enhanced speech for the target sound source direction;
    calculating an energy ratio according to the subband energy corresponding to the target sound source and the sum of the energies of all subbands in the at least one non-target sound source direction; and
    calculating the product of the original frequency-domain signal for the target sound source direction, the enhanced speech for the target sound source direction, and the energy ratio, and outputting the speech corresponding to the product.
  18. The method according to claim 17, wherein before calculating the product of the original frequency-domain signal for the target sound source direction, the corresponding enhanced speech, and the energy ratio, the method further comprises:
    smoothing the current frame against the previous frame, frame by frame, by using smoothing parameters.
  19. The method according to claim 18, wherein calculating the products of the spatial filtering parameter and the original frequency-domain signals respectively corresponding to the at least two sound source directions to obtain multi-beam beamforming comprises:
    acquiring the spatial filtering parameter, and determining the at least two sound source directions respectively corresponding to the spatial filtering parameter;
    acquiring the original frequency-domain signals respectively corresponding to the at least two sound source directions; and
    calculating the products of the spatial filtering parameter and the original frequency-domain signal corresponding to each of the at least two sound source directions.
  20. The method according to claim 19, wherein calculating the enhanced speech for the target sound source direction comprises:
    calculating, for each subband, a ratio gain between the energy in the target sound source direction and the sum of the energies in all sound source directions; and
    calculating the product of a first product and the ratio gain to obtain the enhanced speech, wherein the first product is the product of the original frequency-domain signal corresponding to the target sound source direction and the spatial filtering parameter.
  21. The method according to claim 20, wherein calculating the energy ratio according to the subband energy corresponding to the target sound source and the sum of the energies of all subbands in the at least one non-target sound source direction comprises:
    combining the energies of all subbands in the current frame to calculate the energy sum of all subbands in the current frame; and
    calculating the ratio between the subband energy corresponding to the target sound source and the energy sum of all subbands in the at least one non-target sound source direction to obtain the energy ratio.
  22. The method according to claim 21, wherein smoothing the current frame against the previous frame, frame by frame, by using smoothing parameters comprises:
    setting the smoothing parameter of the current frame such that the sum of the smoothing parameters of the current frame and the previous frame equals a second preset value;
    calculating the product of the ratio gain of the previous frame and the smoothing parameter of the previous frame to obtain a second product;
    calculating the product of the ratio gain of the current frame and the smoothing parameter of the current frame to obtain a third product; and
    smoothing the current frame according to the sum of the second product and the third product.
  23. The method according to any one of claims 18-22, wherein calculating the product of the original frequency-domain signal for the target sound source direction, the corresponding enhanced speech, and the energy ratio, and outputting the speech corresponding to the product, comprises:
    calculating that product and outputting the speech corresponding to it according to the smoothing result.
  24. A multi-beam beamforming apparatus, comprising:
    a first calculation unit, configured to calculate the products of a spatial filtering parameter and the original frequency-domain signals respectively corresponding to at least two sound source directions to obtain multi-beam beamforming, wherein the spatial filtering parameter varies with the angle of the sound source and the subband frequency, and the at least two sound source directions comprise one target sound source direction and at least one non-target sound source direction;
    a second calculation unit, configured to calculate the enhanced speech for the target sound source direction;
    a third calculation unit, configured to calculate an energy ratio according to the subband energy corresponding to the target sound source and the sum of the energies of all subbands in the at least one non-target sound source direction; and
    a fourth calculation unit, configured to calculate the product of the original frequency-domain signal for the target sound source direction, the corresponding enhanced speech, and the energy ratio, and to output the speech corresponding to the product.
  25. A storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7, and/or the method according to any one of claims 9-12, and/or the method according to any one of claims 17-23.
  26. An electronic device, comprising a processor, a memory, and a bus, wherein:
    the processor and the memory communicate with each other through the bus; and
    the memory stores program instructions that, when executed by the processor, implement the method according to any one of claims 1-7, and/or the method according to any one of claims 9-12, and/or the method according to any one of claims 17-23.
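Claims 17 to 23 describe a post-filter applied on top of several beam outputs. The sketch below is one possible literal reading, written for illustration only. It assumes:

- per-subband beam outputs wᴴx;
- a ratio gain equal to the target-beam energy divided by the summed energy of all beams (claim 20);
- an energy ratio equal to the target subband energy divided by the total frame energy of the non-target beams (claim 21);
- first-order smoothing with α_prev + α_cur = 1 (claim 22; the claims do not give the "second preset value", so 1.0 is an assumption);
- the microphone 0 spectrum standing in for the "original frequency-domain signal".

All function names and the value α_cur = 0.7 are hypothetical.

```python
EPS = 1e-12  # guards the divisions against silent frames


def beam_outputs(weights, frame):
    """beams[d][k] = w_{d,k}^H x_k for every direction d and subband k."""
    return [[sum(w.conjugate() * x for w, x in zip(weights[d][k], frame[k]))
             for k in range(len(frame))]
            for d in range(len(weights))]


def ratio_gain(beams, target):
    """Claim 20: per subband, target-beam energy over the summed energy of all beams."""
    return [abs(beams[target][k]) ** 2
            / (sum(abs(b[k]) ** 2 for b in beams) + EPS)
            for k in range(len(beams[target]))]


def energy_ratio(beams, target):
    """Claim 21 (one reading): target subband energy over the frame energy of non-target beams."""
    non_target = sum(abs(v) ** 2
                     for d, b in enumerate(beams) if d != target
                     for v in b)
    return [abs(v) ** 2 / (non_target + EPS) for v in beams[target]]


def smooth(prev_gain, cur_gain, alpha_cur=0.7, preset_sum=1.0):
    """Claim 22: alpha_prev + alpha_cur equals a preset value (assumed 1.0 here)."""
    alpha_prev = preset_sum - alpha_cur
    return [alpha_prev * p + alpha_cur * c for p, c in zip(prev_gain, cur_gain)]


def post_filter(weights, frame, target, prev_gain=None):
    """Literal reading of claims 17-23 for one frame; returns (output spectrum, gain)."""
    beams = beam_outputs(weights, frame)
    gain = ratio_gain(beams, target)
    if prev_gain is not None:
        gain = smooth(prev_gain, gain)
    enhanced = [b * g for b, g in zip(beams[target], gain)]  # first product x ratio gain
    ratio = energy_ratio(beams, target)
    reference = [bin_[0] for bin_ in frame]                  # assumed: mic 0 spectrum
    output = [x * e * r for x, e, r in zip(reference, enhanced, ratio)]
    return output, gain
```

Read literally, claim 17 multiplies the original spectrum by the enhanced speech, which makes the output quadratic in the input. A practical system would more likely apply only the gains to the target beam. The sketch keeps the literal form solely to mirror the claim wording.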
PCT/CN2019/087621 2018-05-22 2019-05-20 Beamforming method, multi-beam forming method and apparatus, and electronic device WO2019223650A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
CN201810496448.5A CN108551625A (en) 2018-05-22 2018-05-22 The method, apparatus and electronic equipment of beam forming
CN201810497069.8A CN108717495A (en) 2018-05-22 2018-05-22 The method, apparatus and electronic equipment of multi-beam beam forming
CN201810496450.2A CN108831498B (en) 2018-05-22 2018-05-22 Multi-beam beamforming method and device and electronic equipment
CN201810496450.2 2018-05-22
CN201810496448.5 2018-05-22
CN201810497069.8 2018-05-22

Publications (1)

Publication Number Publication Date
WO2019223650A1 (en) 2019-11-28

Family

ID=68617121

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/087621 WO2019223650A1 (en) 2018-05-22 2019-05-20 Beamforming method, multi-beam forming method and apparatus, and electronic device

Country Status (1)

Country Link
WO (1) WO2019223650A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070127736A1 (en) * 2003-06-30 2007-06-07 Markus Christoph Handsfree system for use in a vehicle
CN101369427A (en) * 2007-08-13 2009-02-18 哈曼贝克自动系统股份有限公司 Noise reduction by combined beamforming and post-filtering
CN106023996A (en) * 2016-06-12 2016-10-12 杭州电子科技大学 Sound identification method based on cross acoustic array broadband wave beam formation
CN108551625A (en) * 2018-05-22 2018-09-18 出门问问信息科技有限公司 The method, apparatus and electronic equipment of beam forming
CN108717495A (en) * 2018-05-22 2018-10-30 出门问问信息科技有限公司 The method, apparatus and electronic equipment of multi-beam beam forming
CN108831498A (en) * 2018-05-22 2018-11-16 出门问问信息科技有限公司 The method, apparatus and electronic equipment of multi-beam beam forming

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19807611

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.03.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19807611

Country of ref document: EP

Kind code of ref document: A1