WO2016183791A1 - Voice signal processing method and device - Google Patents

Voice signal processing method and device

Info

Publication number
WO2016183791A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound source
target sound
image
microphone array
source region
Prior art date
Application number
PCT/CN2015/079245
Other languages
English (en)
Chinese (zh)
Inventor
赵天宇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to CN201580079468.7A priority Critical patent/CN107534725B/zh
Priority to PCT/CN2015/079245 priority patent/WO2016183791A1/fr
Publication of WO2016183791A1 publication Critical patent/WO2016183791A1/fr

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation

Definitions

  • the present invention relates to the field of voice processing technologies, and in particular, to a voice signal processing method and apparatus.
  • The voice signal is generally picked up by a microphone, and the picked-up signal is often corrupted by ambient noise, other speakers' voices, reverberation, and similar interference, which seriously degrades voice quality. It is therefore necessary to apply effective noise reduction to the picked-up voice signal to suppress noise and improve voice quality.
  • a common noise reduction technology is a noise reduction method based on a microphone array.
  • The principle is to use a microphone array to locate the sound source and thereby determine a beam direction, to enhance the voice signal that the microphones receive from the beam direction, and at the same time to suppress interference from other directions as far as possible.
  • the above method can be used to reduce noise.
  • the embodiment of the invention discloses a voice signal processing method and device, which can improve the accuracy of sound source localization and effectively improve the noise reduction effect of the voice signal.
  • a first aspect of the embodiments of the present invention discloses a voice signal processing method, including:
  • After a voice signal is received through the microphone array, the voice signal is enhanced by using a minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array.
  • Acquiring, by using point feature positioning, the position information of the target sound source region relative to the image acquisition device includes:
  • Determining the relative position of the target sound source region and the microphone array according to the pre-stored spatial relative position of the image acquisition device and the microphone array, and the position information of the target sound source region relative to the image acquisition device, includes:
  • Determining, according to the coordinates of the microphone array in the coordinate system of the image acquisition device and the coordinates of the center point of the target sound source region in that coordinate system, the relative position of the center point of the target sound source region and the microphone array, and using it as the relative position of the target sound source region and the microphone array.
  • the distance between any two microphones in the microphone array is greater than half of the wavelength of the voice signal.
  • After a voice signal is received through the microphone array, enhancing the voice signal by using the minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array includes:
  • After the voice signal is received through the microphone array, calculating the linear distance from the center point of the target sound source region to each microphone in the microphone array according to the relative position of the target sound source region and the microphone array, and calculating the sound path difference from the center point of the target sound source region to any two microphones, where the sound path difference is the absolute difference between the linear distance from the center point of the target sound source region to one of the two microphones and the linear distance from the center point to the other of the two microphones;
  • a second aspect of the embodiments of the present invention discloses a voice signal processing apparatus, including:
  • An acquisition unit configured to collect an image of a target speaker by using an image acquisition device
  • a first determining unit configured to determine, from the image, a mouth region of the target speaker as a target sound source region
  • An acquiring unit configured to acquire, by using a point feature positioning manner, position information of the target sound source area relative to the image capturing device;
  • a second determining unit configured to determine the relative position of the target sound source region and the microphone array according to the pre-stored spatial relative position of the image acquisition device and the microphone array, and the position information of the target sound source region relative to the image acquisition device;
  • a processing unit configured to: after a voice signal is received through the microphone array, enhance the voice signal by using a minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array.
  • the acquiring unit includes:
  • a first acquiring subunit configured to extract at least one point feature of the target sound source region, and acquire coordinates of the point feature in a coordinate system of the image according to a pixel value of the point feature, where the coordinate origin of the coordinate system of the image is the vertical projection point of the optical axis of the image acquisition device on the image, and the two mutually perpendicular axes of the coordinate system of the image lie in the plane of the image;
  • a second acquiring subunit configured to acquire, according to the coordinates of the point feature in the coordinate system of the image and the focal length of the image acquisition device, the coordinates of the center point of the target sound source region in the coordinate system of the image acquisition device as the position information of the target sound source region relative to the image acquisition device; where the coordinate origin of the coordinate system of the image acquisition device is the center point of the image acquisition device, one of its three mutually perpendicular axes is perpendicular to the plane of the image, and the other two axes are respectively parallel to the two axes of the coordinate system of the image.
  • the second determining unit includes:
  • a first determining subunit configured to determine coordinates of the microphone array in a coordinate system of the image capturing device according to a spatially relative position of the image capturing device and the microphone array stored in advance;
  • a second determining subunit configured to determine, according to coordinates of the microphone array in a coordinate system of the image capturing device, and coordinates of a center point of the target sound source region in a coordinate system of the image capturing device a relative position of a center point of the target sound source region and the microphone array as a relative position of the target sound source region and the microphone array.
  • the distance between any two microphones in the microphone array is greater than half the wavelength of the speech signal.
  • the processing unit includes:
  • a first calculating subunit configured to: after a voice signal is received through the microphone array, calculate the linear distance from the center point of the target sound source region to each microphone in the microphone array according to the relative position of the target sound source region and the microphone array, and calculate the sound path difference from the center point of the target sound source region to any two microphones, where the sound path difference is the absolute difference between the linear distance from the center point of the target sound source region to one of the two microphones and the linear distance from the center point to the other of the two microphones;
  • a second calculating subunit configured to calculate the delay from the center point of the target sound source region to the two microphones according to the sound path difference from the center point of the target sound source region to the two microphones;
  • a delay compensation subunit configured to perform delay compensation on the two microphones according to the delay from the center point of the target sound source region to the two microphones, so as to enhance the voice signal in the direction of the target sound source region.
  • a third aspect of the embodiments of the present invention discloses a voice signal processing apparatus, including: a processor, a memory, a communication bus, an image acquisition device, and a microphone array;
  • the memory is used to store programs and data
  • the communication bus is configured to establish connection communication between the processor, the memory, the image acquisition device, and the microphone array;
  • the processor is configured to invoke the program stored in the memory, and perform the following steps:
  • After a voice signal is received through the microphone array, the voice signal is enhanced by using a minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array.
  • The manner in which the processor acquires, by using point feature positioning, the position information of the target sound source region relative to the image acquisition device is specifically as follows:
  • The manner in which the processor determines the relative position of the target sound source region and the microphone array, according to the spatial relative position of the image acquisition device and the microphone array included in the data pre-stored in the memory and the position information of the target sound source region relative to the image acquisition device, is specifically as follows:
  • the distance between any two microphones in the microphone array is greater than half the wavelength of the speech signal.
  • The manner in which the processor, after receiving a voice signal through the microphone array, enhances the voice signal by using the minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array is specifically as follows:
  • An image of the target speaker may be collected by the image acquisition device, the mouth region of the target speaker is determined from the image as the target sound source region, the position information of the target sound source region relative to the image acquisition device may be acquired by point feature positioning, and the voice signal is enhanced by the MVDR beamforming algorithm.
  • By implementing the embodiments of the present invention, the image acquisition device and the microphone array can be combined to locate the sound source, which improves the accuracy of sound source localization; further, in the voice enhancement process, accurate sound source localization helps improve the noise reduction effect on the voice signal.
  • FIG. 1 is a schematic flowchart of a voice signal processing method according to an embodiment of the present invention
  • FIG. 2 is a schematic flow chart of another voice signal processing method according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of coordinates of a target sound source positioning disclosed in an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a voice signal processing apparatus according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of another voice signal processing apparatus according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of still another voice signal processing apparatus according to an embodiment of the present invention.
  • the embodiment of the invention discloses a voice signal processing method and device, which can improve the accuracy of sound source localization and effectively improve the noise reduction effect of the voice signal. The details are described below separately.
  • FIG. 1 is a schematic flowchart diagram of a voice signal processing method according to an embodiment of the present invention. As shown in FIG. 1, the voice signal processing method may include the following steps:
  • The voice signal processing device may collect the image of the target speaker in real time through the image acquisition device.
  • the image of the target speaker may be an image of the target speaker collected in real time by the image acquisition device when the voice signal processing device starts a video call or a hands-free conference.
  • the voice signal processing device may include, but is not limited to, a smart phone, a personal computer, a multimedia player, a videophone, and a device that can implement communication.
  • There may be one or more image acquisition devices; they may be integrated in the voice signal processing device, or may be external devices independent of the voice signal processing device that maintain a communication connection with it. The image acquisition device may be a device such as a camera, which is not limited in the embodiment of the present invention.
  • Before the mouth region of the target speaker is determined from the image, it may be detected whether the image contains a face image of the target speaker, that is, a face detection process may be performed.
  • The feature-based face detection method compares feature information extracted from the image with pre-stored face feature information to determine whether a face is included;
  • the template-matching face detection method matches the image against a pre-established face template to determine whether a face is included;
  • the appearance-based face detection method applies a pre-trained face/non-face classifier to the image to determine whether a face is included.
  • the face detection method described above may be used alone or in combination.
  • A Haar mouth feature classifier is used in the face image region to locate the approximate position of the mouth on the face image;
  • according to the principle that the distribution of facial features satisfies the one-third ratio, the located mouth position that lies in approximately the lower third of the face is determined as the final position of the mouth and is defined as the mouth region.
  • the mouth area is the target sound source area.
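  • The lower-third rule described above can be illustrated with a minimal sketch. The bounding-box format `(x, y, width, height)`, the function names, and the assumption that image rows grow downward are all hypothetical choices for illustration; the classifier that produces the candidate boxes is not shown.

```python
def lower_third_region(face_box):
    """Return the lower third of a face bounding box (x, y, width, height).

    Image rows are assumed to grow downward, so the lower third starts
    two thirds of the way down the face box.
    """
    x, y, w, h = face_box
    top = y + (2 * h) // 3
    return (x, top, w, y + h - top)


def pick_mouth(face_box, candidates):
    """Return the first candidate box whose center lies in the lower third."""
    rx, ry, rw, rh = lower_third_region(face_box)
    for (cx, cy, cw, ch) in candidates:
        mx, my = cx + cw / 2, cy + ch / 2
        if rx <= mx <= rx + rw and ry <= my <= ry + rh:
            return (cx, cy, cw, ch)
    return None
```

  • A candidate near the eyes is rejected by this check, while a candidate near the chin is kept as the mouth region, that is, the target sound source region.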
  • Point feature positioning is a positioning method that uses a single frame image: the position and posture relative to the image acquisition device are determined according to n feature points on the target sound source region. That is, the image acquisition device captures an image containing n spatial points whose coordinates are known, and the coordinates of the n spatial points in the coordinate system of the image acquisition device are determined, thereby obtaining the position information of the target sound source region relative to the image acquisition device, where n is an integer greater than zero.
  • S104. Determine the relative position of the target sound source region and the microphone array according to the pre-stored spatial relative position of the image acquisition device and the microphone array, and the position information of the target sound source region relative to the image acquisition device.
  • the microphone array includes at least two microphones, and each of the microphones may be an omnidirectional receiving type microphone, that is, a voice signal in each direction may be picked up.
  • The microphone array may be integrated inside the voice signal processing device, or may be an external device independent of the voice signal processing device that maintains a communication connection with it.
  • the spatial relative position between the image capturing device and the microphone array may be known, and may be stored in advance in the memory of the voice signal processing device.
  • Using the pre-stored spatial relative position between the image acquisition device and the microphone array, and the position information of the target sound source region relative to the image acquisition device acquired in step S103, the relative position between the target sound source region and the microphone array can be determined.
  • After a voice signal is received through the microphone array, the voice signal is enhanced by using a minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array.
  • Beamforming can be used to enhance the voice signals collected by the respective microphones, so as to enhance the voice signal that each microphone receives from the direction of the target sound source and to suppress the voice signals received from other directions.
  • Beamforming applies weights to the voice signals received by the respective microphones, enhancing the voice signals from a specific direction and weakening those from other directions, thereby obtaining the voice signal from the specific direction. Here the specific direction is the direction from each microphone toward the target sound source.
  • The beamforming capability of the microphone array makes it possible to pick up a sound source directionally while providing a higher system output signal-to-noise ratio than a single microphone.
  • Beamforming is widely used. Common beamforming algorithms include the LMS (Least Mean Square) algorithm, the RLS (Recursive Least Squares) algorithm, and the MVDR (Minimum Variance Distortionless Response) algorithm.
  • the MVDR beamforming algorithm is selected in the embodiment of the present invention, and the principle is that the speech signal of interest is output without distortion, and the beam output noise variance is minimized. Compared with the LMS algorithm, the RLS algorithm, etc., the MVDR algorithm can increase the array gain, so the noise suppression ability is stronger.
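  • The MVDR principle stated above (pass the signal of interest without distortion while minimizing output noise variance) has the well-known closed-form weight vector w = R⁻¹d / (dᴴR⁻¹d), where R is the noise covariance matrix and d is the steering vector toward the source. The following is a minimal sketch for a two-microphone array in a single frequency bin; the function names, the phase sign convention of the steering vector, and the restriction to two microphones are assumptions made for illustration and are not taken from the embodiment.

```python
import cmath
import math


def steering_vector(delay_s, freq_hz):
    """Two-microphone steering vector; microphone 1 is the phase reference.

    The sign of the phase term is a convention assumed for this sketch.
    """
    return [1 + 0j, cmath.exp(-2j * math.pi * freq_hz * delay_s)]


def mvdr_weights_2mic(R, d):
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for a 2x2 noise covariance R."""
    (a, b), (c, e) = R
    det = a * e - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = [[e / det, -b / det], [-c / det, a / det]]
    rinv_d = [inv[0][0] * d[0] + inv[0][1] * d[1],
              inv[1][0] * d[0] + inv[1][1] * d[1]]
    denom = d[0].conjugate() * rinv_d[0] + d[1].conjugate() * rinv_d[1]
    return [rinv_d[0] / denom, rinv_d[1] / denom]


def beam_response(w, d):
    """Array response w^H d toward the direction with steering vector d."""
    return w[0].conjugate() * d[0] + w[1].conjugate() * d[1]
```

  • For a Hermitian covariance matrix, the response toward the steered direction equals 1, which is the distortionless constraint; with an identity covariance the weights reduce to d/2, that is, plain delay-and-sum.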
  • An image of the target speaker may be collected by the image acquisition device, the mouth region of the target speaker is determined from the image as the target sound source region, the position information of the target sound source region relative to the image acquisition device may be acquired by point feature positioning, and the voice signal is enhanced by the minimum variance distortionless response (MVDR) beamforming algorithm.
  • FIG. 2 is a schematic flowchart diagram of another voice signal processing method according to an embodiment of the present invention. As shown in FIG. 2, the voice signal processing method may include the following steps:
  • When the voice signal processing device starts a mode such as a video call or a hands-free conference, the image of the target speaker may be collected by one or more image acquisition devices.
  • The image acquisition device may be a camera or a similar device, which is not limited in the embodiment of the present invention.
  • the target speaker may be one or multiple.
  • An image acquisition device may be used to capture images of multiple target speakers, or multiple image acquisition devices may be used to capture images of multiple target speakers.
  • the mouth area of the target speaker may be determined from the image according to a preset algorithm, and positioned as the target sound source area.
  • the mouth area of multiple target speakers can be determined at the same time to obtain a plurality of target sound source areas.
  • A plurality of point features can be extracted on the target sound source region. The point features have corresponding pixel values in the image, so the coordinates of the point features in the coordinate system of the image can be obtained from the pixel values.
  • the coordinate system of the image is a two-dimensional coordinate system, wherein the coordinate origin is a vertical projection point of the optical axis of the image acquisition device on the image, and the two axes are perpendicular to each other and are in the plane of the image.
  • the coordinate system of the image acquisition device may be constructed with the center point (ie, the optical center) of the image acquisition device as the coordinate origin, and the coordinate system of the image acquisition device is a three-dimensional coordinate system, and the three axes are perpendicular to each other, wherein One axis is perpendicular to the plane of the image, and the other two axes are parallel to the two axes of the image's coordinate system.
  • The positional relationship between the coordinate system of the image acquisition device and the coordinate system of the image may be used to determine the coordinates of the point features on the target sound source region in the coordinate system of the image acquisition device, and the coordinates of one of the point features or the coordinates of the center point of the target sound source region are selected as the position information of the target sound source region relative to the image acquisition device.
  • FIG. 3 is a schematic diagram of coordinates of target sound source positioning disclosed in an embodiment of the present invention.
  • FIG. 3 shows the case of only one image acquisition device and a microphone array comprising only two microphones, m1 and m2.
  • The arrangement of the image acquisition device and the microphone array shown in FIG. 3 does not constitute a limitation of the present invention; they may be arranged on the same straight line or in any other arrangement, and there may be more image acquisition devices and microphones than shown in FIG. 3.
  • Point o is the center point (that is, the optical center) of the image acquisition device, and point o' is the vertical projection point of the optical axis of the image acquisition device on the image. The coordinate system of the image takes point o' as its coordinate origin, and its axes u and v are perpendicular to each other and lie in the plane of the image. A plurality of point features are selected on the target sound source region, and their coordinates in the coordinate system of the image are known; point M is the center point of the target sound source region, and its coordinates in the coordinate system of the image are also known.
  • The coordinate system of the image acquisition device takes point o as its coordinate origin, and its three axes x, y, and z are perpendicular to one another: the y axis is perpendicular to the plane of the image with point o' as the foot of the perpendicular, the x axis is parallel to the u axis, the z axis is parallel to the v axis, and the length of oo' is the focal length of the image acquisition device.
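  • The relationship between the two coordinate systems can be sketched with a simple pinhole model. This is only an illustration under stated assumptions: the image plane is taken to lie at y = focal length in front of the optical center, and the depth of the point along the optical axis (`depth_y`) is a hypothetical input, since the text does not spell out how it is recovered. Taking the centroid of several points as the center point M is likewise an assumption of this sketch.

```python
def image_to_camera(u, v, focal_length, depth_y):
    """Back-project image point (u, v) to camera coordinates (x, y, z).

    Assumes the layout described for FIG. 3: origin at the optical center o,
    y axis along the optical axis, x parallel to u, z parallel to v, and the
    image plane at y = focal_length. depth_y, the distance of the point along
    the optical axis, is a hypothetical input of this sketch.
    """
    scale = depth_y / focal_length
    return (u * scale, depth_y, v * scale)


def region_center(points):
    """Centroid of a list of 3-D points, used here as the center point M."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))
```

  • With similar triangles, a point at twice the focal distance along the optical axis has x and z coordinates twice as large as its u and v image coordinates.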
  • S205 Determine a relative position of the target sound source area and the microphone array according to the spatial relative position of the pre-stored image capturing device and the microphone array, and the position information of the target sound source area relative to the image capturing device.
  • step S205 may include the following steps:
  • the spatial relative position of the image capturing device and the microphone array is known.
  • The two microphones m1 and m2 included in the microphone array are both located on the x axis, and the distance from each of them to the image acquisition device is L. Therefore, the coordinates of the two microphones in the coordinate system of the image acquisition device can be determined according to the distance between the microphones and the image acquisition device. Once the coordinates of m1 and m2 and the coordinates of the center point M of the target sound source region in the coordinate system of the image acquisition device are obtained, the relative position between the center point M and the two microphones m1 and m2 can be determined.
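  • As a minimal sketch of this step, the microphone coordinates and the relative position of M can be written as follows. Which microphone lies on the negative side of the x axis, and the representation of the relative position as a vector from the microphone to M, are assumptions made here for illustration.

```python
def microphone_coordinates(L):
    """Coordinates of m1 and m2 in the coordinate system of the camera.

    Both microphones lie on the x axis at distance L from the origin;
    placing m1 on the negative side is an assumption of this sketch.
    """
    return (-L, 0.0, 0.0), (L, 0.0, 0.0)


def relative_position(source, microphone):
    """Vector from a microphone to the source center point M."""
    return tuple(s - m for s, m in zip(source, microphone))
```

  • For example, with L = 0.5 and M at (1.0, 2.0, 0.0), the vectors from m1 and m2 to M are (1.5, 2.0, 0.0) and (0.5, 2.0, 0.0) respectively.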
  • After a voice signal is received through the microphone array, the voice signal is enhanced by using a minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array.
  • the distance between any two microphones in the microphone array is greater than half of the wavelength of the voice signal.
  • the wavelength of the speech signal is the result obtained by dividing the propagation speed of the speech signal in the air by the frequency of the speech signal.
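  • The wavelength relation and the spacing condition stated above can be checked numerically with a short sketch. The 340 m/s figure is the propagation speed in air used elsewhere in this description; the function names and the example frequency are illustrative assumptions.

```python
SPEED_OF_SOUND_M_S = 340.0  # propagation speed in air used in the description


def wavelength_m(frequency_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """Wavelength = propagation speed / frequency."""
    return speed_m_s / frequency_hz


def spacing_exceeds_half_wavelength(spacing_m, frequency_hz):
    """True if the microphone spacing is greater than half the wavelength."""
    return spacing_m > wavelength_m(frequency_hz) / 2
```

  • At 1000 Hz the wavelength is 0.34 m, so a spacing of 0.2 m satisfies the stated condition while a spacing of 0.1 m does not.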
  • step S206 may include the following steps:
  • The linear distance between the center point of the target sound source region and each microphone in the microphone array can be calculated, and the delay Δt from the center point of the target sound source region to any two of the microphones can be further calculated. Here the delay Δt is the difference between the times at which the voice signal emitted from the target sound source region is received by the two microphones.
  • Delay compensation is performed on the two microphones according to the delay Δt, thereby enhancing the voice signal received by each microphone from the direction of the target sound source region and suppressing the voice signals from other directions.
  • Once the coordinates of microphone m1 and microphone m2 and the coordinates of the center point M of the target sound source region in the coordinate system of the image acquisition device are obtained, the linear distance S1 between M and m1 and the linear distance S2 between M and m2 can be calculated. The absolute difference |S1 − S2| of these distances is divided by the propagation speed c of the voice signal in air (generally 340 m/s) to obtain the delay from M to m1 and m2:

    $$\Delta t = \frac{|S_1 - S_2|}{c}$$

  • Delay compensation is then performed on m1 and m2 according to Δt, so as to maximize the output of the voice signal that m1 and m2 receive from the direction of the center point M of the target sound source region and to suppress, as far as possible, the voice signals received from other directions.
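  • The delay calculation and the compensation step can be sketched as follows. The delay function follows the formula Δt = |S1 − S2| / c given in the text; the compensation shown is a plain delay-and-sum alignment with an integer sample delay, which is a simplification chosen for illustration rather than the full MVDR processing, and all function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 340.0  # propagation speed in air used in the description


def distance(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pair_delay_s(source, mic_a, mic_b, speed=SPEED_OF_SOUND_M_S):
    """Delay |S1 - S2| / c between two microphones for a source at M."""
    return abs(distance(source, mic_a) - distance(source, mic_b)) / speed


def delay_and_sum(sig_near, sig_far, delay_samples):
    """Delay the nearer (earlier) channel by delay_samples and average."""
    delayed = [0.0] * delay_samples + list(sig_near)
    n = min(len(delayed), len(sig_far))
    return [(delayed[i] + sig_far[i]) / 2 for i in range(n)]
```

  • After alignment, a pulse from the target direction adds coherently in the averaged output, whereas signals arriving with a different delay do not line up and are attenuated.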
  • the voice signal processing method described in FIG. 2 may further include the following steps:
  • the enhanced processed speech signal may be filtered by an IIR (Infinite Impulse Response) digital filter to appropriately raise a higher frequency band in the speech signal band, thereby improving the speech signal.
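  • The text does not specify the IIR filter design. As one possible illustration, the sketch below uses a first-order recursive structure that adds a scaled one-pole high-pass component back to the input, which leaves low frequencies unchanged and raises high frequencies. The structure, the parameter names `alpha` and `gain`, and their default values are assumptions of this sketch, not the filter of the embodiment.

```python
def high_frequency_emphasis(samples, alpha=0.5, gain=0.5):
    """First-order IIR emphasis: y[n] = x[n] + gain * h[n], where
    h[n] = alpha * (h[n-1] + x[n] - x[n-1]) is a one-pole high-pass."""
    out = []
    prev_x = 0.0
    prev_h = 0.0
    for x in samples:
        h = alpha * (prev_h + x - prev_x)
        out.append(x + gain * h)
        prev_x, prev_h = x, h
    return out
```

  • With these defaults, a constant input settles back to its original level, while an input alternating at the highest representable frequency settles to 4/3 of its original amplitude.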
  • The image acquisition device and the microphone array can be combined to locate the sound source, which improves the accuracy of sound source localization; further, in the voice enhancement process, accurate sound source localization helps improve the noise reduction effect on the voice signal.
  • FIG. 4 is a schematic structural diagram of a voice signal processing apparatus according to an embodiment of the present invention.
  • The voice signal processing apparatus shown in FIG. 4 can be used to execute the voice signal processing method disclosed in the embodiment of the present invention.
  • the voice signal processing apparatus may include:
  • the collecting unit 401 is configured to collect an image of the target speaker by using the image capturing device.
  • the collecting unit 401 may collect an image of the target speaker in real time through the image collecting device when the voice signal processing device starts a video call or a hands-free conference.
  • There may be one or more image acquisition devices; they may be integrated in the voice signal processing device, or may be external devices independent of the voice signal processing device that maintain a communication connection with it. The image acquisition device may be a camera or a similar device, which is not limited in the embodiment of the present invention.
  • the first determining unit 402 is configured to determine, from the image, a mouth region of the target speaker as the target sound source region.
  • The voice signal processing device may detect whether the image contains a face image of the target speaker, that is, perform a face detection process.
  • There are several common face detection methods: feature-based face detection, template-matching face detection, appearance-based face detection, and so on.
  • The feature-based face detection method compares feature information extracted from the image with pre-stored face feature information to determine whether a face is included;
  • the template-matching face detection method matches the image against a pre-established face template to determine whether a face is included;
  • the appearance-based face detection method applies a pre-trained face/non-face classifier to the image to determine whether a face is included.
  • the face detection method described above may be used alone or in combination.
  • The first determining unit 402 may use a Haar mouth feature classifier to locate the approximate position of the mouth on the face image; according to the principle that the distribution of facial features satisfies the one-third ratio, the located mouth position that lies in approximately the lower third of the face is determined as the final position of the mouth and is defined as the mouth region, which is the target sound source region.
  • the obtaining unit 403 is configured to acquire location information of the target sound source area relative to the image capturing device by using a point feature positioning manner.
  • point feature positioning is a positioning method that uses a single frame image. It determines the position and posture relative to the image capturing device from n feature points on the target sound source region: the image capturing device captures an image containing n spatial points whose coordinates are known, and the coordinates of these n spatial points in the coordinate system of the image capturing device are then determined, thereby obtaining the position information of the target sound source region relative to the image capturing device.
  • n is an integer greater than zero.
  • the second determining unit 404 is configured to determine the relative position of the target sound source region and the microphone array according to the pre-stored spatial relative position of the image capturing device and the microphone array, and the position information of the target sound source region relative to the image capturing device.
  • the microphone array includes at least two microphones, and each of the microphones may be an omnidirectional receiving type microphone, that is, a voice signal in each direction may be picked up.
  • the microphone array may be integrated inside the voice signal processing device, or may be external to the voice signal processing device and maintain a communication connection with it.
  • the spatial relative position between the image capturing device and the microphone array may be known, and may be stored in advance in the memory of the voice signal processing device.
  • the second determining unit 404 can determine the relative position between the target sound source region and the microphone array by using the pre-stored spatial relative position between the image capturing device and the microphone array, and the position information of the target sound source region relative to the image capturing device acquired by the obtaining unit 403.
  • the processing unit 405 is configured to, after a voice signal is received through the microphone array, perform enhancement processing on the voice signal by using a minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array.
  • the processing unit 405 can use a beamforming technique to enhance the voice signals collected by the respective microphones, so that the voice signals arriving from the direction of the target sound source are enhanced and the voice signals received from other directions are suppressed.
  • the beamforming technology performs weighting processing on the voice signals received by the respective microphones, enhances the voice signals in a specific direction, and weakens the voice signals in other directions, thereby obtaining a voice signal from a specific direction.
  • the specific direction is the direction from each microphone toward the target sound source.
  • the beamforming capability of the microphone array allows it to provide a higher system output signal-to-noise ratio than a single microphone while capturing a directional sound source.
  • beamforming technology is widely used. Common beamforming algorithms include the LMS algorithm, the RLS algorithm, the MVDR algorithm, and so on.
  • the MVDR beamforming algorithm is selected in the embodiment of the present invention. Its principle is that the speech signal of interest is output without distortion while the variance of the beam output noise is minimized. Compared with the LMS algorithm, the RLS algorithm, and the like, the MVDR algorithm can increase the array gain, so its noise suppression ability is stronger.
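For reference, the MVDR principle stated above (distortionless output for the signal of interest, minimum output noise variance) is usually written as the following constrained optimization, given here in its standard textbook form. The symbols are assumptions introduced for illustration and do not appear in the patent text: \(\mathbf{w}\) is the vector of microphone weights, \(\mathbf{R}\) is the noise covariance matrix across the \(M\) microphones, and \(\mathbf{d}\) is the steering vector toward the target sound source region.

```latex
\[
\min_{\mathbf{w}} \; \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}
\quad \text{subject to} \quad \mathbf{w}^{H}\mathbf{d} = 1,
\qquad\Longrightarrow\qquad
\mathbf{w}_{\mathrm{MVDR}} = \frac{\mathbf{R}^{-1}\mathbf{d}}{\mathbf{d}^{H}\mathbf{R}^{-1}\mathbf{d}} .
\]
```

When the noise is spatially white (\(\mathbf{R}\) proportional to the identity) and the entries of \(\mathbf{d}\) have unit magnitude, the solution reduces to \(\mathbf{w} = \mathbf{d}/M\), which is a plain delay-and-sum beamformer.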
  • the collecting unit 401 may collect an image of the target speaker through the image capturing device; the first determining unit 402 determines, from the image, the mouth region of the target speaker as the target sound source region; the obtaining unit 403 may acquire the position information of the target sound source region relative to the image capturing device by point feature positioning; the second determining unit 404 determines the relative position of the target sound source region and the microphone array according to the pre-stored spatial relative position of the image capturing device and the microphone array and the position information of the target sound source region relative to the image capturing device; and, after a voice signal is received through the microphone array, the processing unit 405 enhances the voice signal by using the minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array.
  • in this way, the image capturing device and the microphone array are combined to locate the sound source, which improves the accuracy of sound source localization; further, during speech enhancement, accurate sound source localization helps improve the noise reduction effect on the voice signal.
  • FIG. 5 is a schematic structural diagram of another voice signal processing apparatus according to an embodiment of the present invention.
  • the voice signal processing apparatus shown in FIG. 5 can be used to execute the voice signal processing method disclosed in the embodiment of the present invention.
  • the voice signal processing apparatus may include:
  • the collecting unit 501 is configured to collect an image of the target speaker by using the image capturing device.
  • the first determining unit 502 is configured to determine, from the image, a mouth region of the target speaker as the target sound source region.
  • the obtaining unit 503 is configured to acquire location information of the target sound source area relative to the image capturing device by using a point feature positioning manner.
  • the obtaining unit 503 may further include:
  • the first acquiring unit 5031 is configured to extract at least one point feature of the target sound source region, and acquire coordinates of the point feature in the coordinate system of the image according to the pixel value of the point feature, where the origin of the coordinate system of the image is the point at which the optical axis of the image capturing device perpendicularly intersects the image, and the two mutually perpendicular axes of the coordinate system of the image lie in the plane of the image.
  • a second acquiring unit 5032, configured to acquire, according to the coordinates of the point feature in the coordinate system of the image and the focal length of the image capturing device, the coordinates of the center point of the target sound source region in the coordinate system of the image capturing device as the position information of the target sound source region relative to the image capturing device, where the origin of the coordinate system of the image capturing device is the center point of the image capturing device, one of its three mutually perpendicular axes is perpendicular to the plane of the image, and the other two axes are respectively parallel to the two axes of the coordinate system of the image.
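One way to picture the step from image coordinates and focal length to camera-frame coordinates is the pinhole camera model. The sketch below is hedged: the text does not say how depth is recovered, so it assumes two point features (for example the mouth corners) whose real-world separation is known and roughly parallel to the image plane. The function name, the parameter names, and the use of a focal length expressed in pixels are all illustrative assumptions.

```python
from typing import Tuple

Point2D = Tuple[float, float]  # image coordinates relative to the principal point, in pixels
Point3D = Tuple[float, float, float]  # camera-frame coordinates, in meters


def mouth_center_in_camera_frame(
    p_left: Point2D,
    p_right: Point2D,
    focal_px: float,
    mouth_width_m: float,
) -> Point3D:
    """Back-project the midpoint of two point features with the pinhole model.

    Depth is estimated from similar triangles: Z = f * real_width / pixel_width.
    This holds only if the segment between the two features is roughly
    parallel to the image plane (an assumption of this sketch).
    """
    (x1, y1), (x2, y2) = p_left, p_right
    pixel_width = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    if pixel_width == 0:
        raise ValueError("the two point features coincide; depth is undefined")
    z = focal_px * mouth_width_m / pixel_width
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2  # center of the region in the image
    # Pinhole model: X = x * Z / f and Y = y * Z / f.
    return (cx * z / focal_px, cy * z / focal_px, z)
```

The returned Z axis is the one perpendicular to the image plane, and X and Y are parallel to the image axes, matching the coordinate conventions described above.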
  • the second determining unit 504 is configured to determine the relative position of the target sound source region and the microphone array according to the pre-stored spatial relative position of the image capturing device and the microphone array, and the position information of the target sound source region relative to the image capturing device.
  • the second determining unit 504 may further include:
  • the first determining subunit 5041 is configured to determine coordinates of the microphone array in the coordinate system of the image capturing device according to the pre-stored spatial relative position of the image capturing device and the microphone array.
  • a second determining subunit 5042, configured to determine the relative position of the center point of the target sound source region to the microphone array according to the coordinates of the microphone array in the coordinate system of the image capturing device and the coordinates of the center point of the target sound source region in the coordinate system of the image capturing device, and to use it as the relative position of the target sound source region to the microphone array.
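Because both the microphone coordinates and the center point of the target sound source region are expressed in the same coordinate system (that of the image capturing device), the relative position reduces to a vector subtraction per microphone. The following is a minimal sketch under that reading; the function name and the choice to return one offset vector per microphone are assumptions.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]  # coordinates in the camera frame, in meters


def source_relative_to_mics(source_cam: Vec3, mics_cam: Sequence[Vec3]) -> List[Vec3]:
    """Return, for each microphone, the vector from the microphone to the source.

    Both inputs must already be expressed in the camera coordinate system,
    so no rotation is needed; only a translation (subtraction) per microphone.
    """
    sx, sy, sz = source_cam
    return [(sx - mx, sy - my, sz - mz) for (mx, my, mz) in mics_cam]
```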
  • the processing unit 505 is configured to, after a voice signal is received through the microphone array, perform enhancement processing on the voice signal by using a minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array.
  • the distance between any two microphones in the microphone array is greater than half of the wavelength of the voice signal.
  • the processing unit 505 may further include:
  • a first calculating subunit 5051, configured to: after the voice signal is received through the microphone array, calculate, according to the relative position of the target sound source region and the microphone array, the straight-line distance from the center point of the target sound source region to each microphone in the microphone array, and calculate the sound path difference from the center point of the target sound source region to any two microphones, where the sound path difference is the absolute value of the difference between the straight-line distance from the center point of the target sound source region to one of the two microphones and the straight-line distance from the center point of the target sound source region to the other of the two microphones.
  • the second calculating subunit 5052 is configured to calculate, according to the sound path difference from the center point of the target sound source region to any two of the microphones, the delay from the center point of the target sound source region to the two microphones.
  • the delay compensation subunit 5053 is configured to perform delay compensation on any two of the microphones according to the delay from the center point of the target sound source region to the two microphones, so as to enhance the voice signal received by each microphone from the direction of the target sound source region.
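The three subunits above (distance and sound path difference, delay, delay compensation) can be sketched as a simple delay-and-sum alignment. This is an illustration rather than the patented implementation: the speed of sound (343 m/s), the rounding of delays to whole samples, and all function names are assumptions. The text defines the sound path difference as an absolute value; the sketch additionally keeps each delay relative to the nearest microphone so that the direction of the compensation is known.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees Celsius (assumed value)


def path_difference(source: Vec3, mic_a: Vec3, mic_b: Vec3) -> float:
    """Absolute difference of the straight-line distances to two microphones."""
    return abs(math.dist(source, mic_a) - math.dist(source, mic_b))


def relative_delays(source: Vec3, mics: Sequence[Vec3], c: float = SPEED_OF_SOUND) -> List[float]:
    """Arrival delay of each microphone, in seconds, relative to the nearest one."""
    dists = [math.dist(source, m) for m in mics]
    nearest = min(dists)
    return [(d - nearest) / c for d in dists]


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[float], fs: float) -> List[float]:
    """Advance each channel by its delay (rounded to samples) and average them."""
    shifts = [round(t * fs) for t in delays]
    n = min(len(ch) - s for ch, s in zip(channels, shifts))
    return [
        sum(ch[i + s] for ch, s in zip(channels, shifts)) / len(channels)
        for i in range(n)
    ]
```

After compensation, sound from the target position adds coherently across channels, while sound from other directions is misaligned and partially averages out.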
  • in this way, the image capturing device and the microphone array are combined to locate the sound source, which improves the accuracy of sound source localization; further, during speech enhancement, accurate sound source localization helps improve the noise reduction effect on the voice signal.
  • FIG. 6 is a schematic structural diagram of still another voice signal processing apparatus according to an embodiment of the present invention.
  • the voice signal processing apparatus shown in FIG. 6 can be used to perform the voice signal processing method disclosed in the embodiment of the present invention.
  • the voice signal processing apparatus 600 may include at least one processor 601, such as a CPU (Central Processing Unit), at least one image capturing device 602, a microphone array 603, a memory 604, and a communication bus 605.
  • the communication bus 605 is used to implement connection communication between these components.
  • the structure of the voice signal processing apparatus shown in FIG. 6 does not constitute a limitation of the present invention; it may be a bus-shaped structure or a star-shaped structure, and may include more or fewer components than shown in the figure, combine some components, or use a different arrangement of components.
  • the image capturing device 602 may be a camera or the like for collecting an image of the target speaker; the microphone array 603 includes at least two microphones for receiving voice signals from various directions.
  • the memory 604 may be a high speed RAM memory or a non-volatile memory, such as at least one disk memory.
  • the memory 604 can optionally also be at least one storage device located remotely from the aforementioned processor 601.
  • the memory 604 as a computer storage medium may include an operating system, a voice signal processing program, data, and the like, which are not limited in the embodiment of the present invention.
  • the processor 601 can be used to call a speech signal processing program stored in the memory 604 to perform the following operations:
  • after a voice signal is received through the microphone array 603, enhancing the voice signal by using the minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array 603.
  • the manner in which the processor 601 obtains the location information of the target sound source region relative to the image capturing device 602 by using the point feature positioning manner may be:
  • acquiring the coordinates of the center point of the target sound source region in the coordinate system of the image capturing device 602 as the position information of the target sound source region relative to the image capturing device 602, where the origin of the coordinate system of the image capturing device 602 is the center point of the image capturing device 602, one of its three mutually perpendicular axes is perpendicular to the plane of the image, and the remaining two axes are respectively parallel to the two axes of the coordinate system of the image.
  • the manner in which the processor 601 determines the relative position of the target sound source region and the microphone array 603, according to the spatial relative position of the image capturing device 602 and the microphone array 603 included in the voice signal processing data stored in advance in the memory 604 and the position information of the target sound source region relative to the image capturing device 602, may specifically be:
  • determining the relative position of the center point of the target sound source region to the microphone array 603 according to the coordinates of the microphone array 603 in the coordinate system of the image capturing device 602 and the coordinates of the center point of the target sound source region in the coordinate system of the image capturing device 602, and using it as the relative position of the target sound source region to the microphone array 603.
  • the distance between any two microphones in the microphone array 603 is greater than half of the wavelength of the voice signal.
  • the specific manner in which the processor 601, after receiving the voice signal through the microphone array 603, enhances the voice signal by using the minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array 603 may be:
  • performing delay compensation on any two of the microphones, so as to enhance the voice signal received by each microphone from the direction of the target sound source region.
  • in this way, the image capturing device and the microphone array are combined to locate the sound source, which improves the accuracy of sound source localization; further, during speech enhancement, accurate sound source localization helps improve the noise reduction effect on the voice signal.
  • the voice signal processing apparatus introduced in the embodiment of the present invention may implement some or all of the processes in the voice signal processing method embodiment introduced by the present invention in conjunction with FIG. 1 or FIG.
  • the units in the apparatus of the embodiment of the present invention may be combined, divided, and deleted according to actual needs.
  • the program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

Embodiments of the invention provide a signal processing method and device. The method comprises: capturing an image of a target speaker by means of an image capturing device; determining, from the image, a mouth region of the target speaker and taking that region as a target sound source region; acquiring position information of the target sound source region relative to the image capturing device by point feature positioning; determining a relative position of the target sound source region and a microphone array according to a pre-stored spatial relative position of the image capturing device and the microphone array and the position information of the target sound source region relative to the image capturing device; and, after a voice signal is received through the microphone array, enhancing the voice signal by using a minimum variance distortionless response (MVDR) beamforming algorithm according to the relative position of the target sound source region and the microphone array. The embodiments of the invention improve the accuracy of sound source localization and effectively improve the noise reduction effect on the voice signal.
PCT/CN2015/079245 2015-05-19 2015-05-19 Procédé et dispositif de traitement de signal vocal WO2016183791A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201580079468.7A CN107534725B (zh) 2015-05-19 2015-05-19 一种语音信号处理方法及装置
PCT/CN2015/079245 WO2016183791A1 (fr) 2015-05-19 2015-05-19 Procédé et dispositif de traitement de signal vocal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/079245 WO2016183791A1 (fr) 2015-05-19 2015-05-19 Procédé et dispositif de traitement de signal vocal

Publications (1)

Publication Number Publication Date
WO2016183791A1 true WO2016183791A1 (fr) 2016-11-24

Family

ID=57319205

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/079245 WO2016183791A1 (fr) 2015-05-19 2015-05-19 Procédé et dispositif de traitement de signal vocal

Country Status (2)

Country Link
CN (1) CN107534725B (fr)
WO (1) WO2016183791A1 (fr)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108200515A (zh) * 2017-12-29 2018-06-22 苏州科达科技股份有限公司 多波束会议拾音系统及方法
CN108957392A (zh) * 2018-04-16 2018-12-07 深圳市沃特沃德股份有限公司 声源方向估计方法和装置
CN109451291A (zh) * 2018-12-29 2019-03-08 像航(上海)科技有限公司 无介质浮空投影声源定向语音交互系统、智能汽车
WO2019061292A1 (fr) * 2017-09-29 2019-04-04 深圳传音通讯有限公司 Procédé de réduction du bruit pour borne et borne
CN110767246A (zh) * 2018-07-26 2020-02-07 深圳市优必选科技有限公司 一种噪声处理的方法、装置及机器人
CN110764520A (zh) * 2018-07-27 2020-02-07 杭州海康威视数字技术股份有限公司 飞行器控制方法、装置、飞行器和存储介质
CN110808048A (zh) * 2019-11-13 2020-02-18 联想(北京)有限公司 语音处理方法、装置、系统及存储介质
CN111323753A (zh) * 2018-12-13 2020-06-23 蔚来汽车有限公司 定位汽车内语音源的方法
CN111580050A (zh) * 2020-05-28 2020-08-25 国网上海市电力公司 一种用于识别gis设备异响声源位置的装置及方法
CN111601198A (zh) * 2020-04-24 2020-08-28 达闼科技成都有限公司 应用麦克风跟踪说话人的方法、装置及计算设备
CN111688580A (zh) * 2020-05-29 2020-09-22 北京百度网讯科技有限公司 智能后视镜进行拾音的方法以及装置
CN111722186A (zh) * 2020-06-30 2020-09-29 中国平安人寿保险股份有限公司 基于声源定位的拍摄方法、装置、电子设备及存储介质
CN112205002A (zh) * 2018-12-06 2021-01-08 松下知识产权经营株式会社 信号处理装置以及信号处理方法
CN112261528A (zh) * 2020-10-23 2021-01-22 汪洲华 一种多路定向拾音的音频输出方法及系统
CN112466323A (zh) * 2020-11-24 2021-03-09 中核检修有限公司 一种光学图像与声学图像融合方法及系统
CN112826446A (zh) * 2020-12-30 2021-05-25 上海联影医疗科技股份有限公司 一种医学扫描语音增强方法、装置、系统及存储介质
CN112951257A (zh) * 2020-09-24 2021-06-11 上海译会信息科技有限公司 一种音频图像采集设备及说话人定位及语音分离方法
CN113314138A (zh) * 2021-04-25 2021-08-27 普联国际有限公司 基于麦克风阵列的声源监听分离方法、装置及存储介质
US20210343042A1 (en) * 2019-06-17 2021-11-04 Tencent Technology (Shenzhen) Company Limited Audio acquisition device positioning method and apparatus, and speaker recognition method and system
CN113726947A (zh) * 2020-05-26 2021-11-30 Oppo广东移动通信有限公司 语音通话方法、装置、终端及存储介质
CN114442039A (zh) * 2020-11-05 2022-05-06 中国移动通信集团山东有限公司 一种声源定位方法、装置和电子设备
WO2023016053A1 (fr) * 2021-08-12 2023-02-16 北京荣耀终端有限公司 Procédé de traitement de signal sonore et dispositif électronique
CN115831141A (zh) * 2023-02-02 2023-03-21 小米汽车科技有限公司 车载语音的降噪方法、装置、车辆及存储介质
CN116165607A (zh) * 2023-02-15 2023-05-26 深圳市拔超科技股份有限公司 采用多个麦克风阵列实现声源精确定位系统及定位方法

Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9565493B2 (en) 2015-04-30 2017-02-07 Shure Acquisition Holdings, Inc. Array microphone system and method of assembling the same
US9554207B2 (en) 2015-04-30 2017-01-24 Shure Acquisition Holdings, Inc. Offset cartridge microphones
US10367948B2 (en) 2017-01-13 2019-07-30 Shure Acquisition Holdings, Inc. Post-mixing acoustic echo cancellation systems and methods
CN110121048A (zh) * 2018-02-05 2019-08-13 青岛海尔多媒体有限公司 一种会议一体机的控制方法及控制系统和会议一体机
CN110495185B (zh) * 2018-03-09 2022-07-01 深圳市汇顶科技股份有限公司 语音信号处理方法及装置
US11523212B2 (en) 2018-06-01 2022-12-06 Shure Acquisition Holdings, Inc. Pattern-forming microphone array
US11297423B2 (en) 2018-06-15 2022-04-05 Shure Acquisition Holdings, Inc. Endfire linear array microphone
JP7126143B2 (ja) * 2018-07-18 2022-08-26 パナソニックIpマネジメント株式会社 無人飛行体、情報処理方法およびプログラム
US10206036B1 (en) * 2018-08-06 2019-02-12 Alibaba Group Holding Limited Method and apparatus for sound source location detection
CN112889296A (zh) 2018-09-20 2021-06-01 舒尔获得控股公司 用于阵列麦克风的可调整的波瓣形状
JP2022526761A (ja) 2019-03-21 2022-05-26 シュアー アクイジッション ホールディングス インコーポレイテッド 阻止機能を伴うビーム形成マイクロフォンローブの自動集束、領域内自動集束、および自動配置
CN113841419A (zh) 2019-03-21 2021-12-24 舒尔获得控股公司 天花板阵列麦克风的外壳及相关联设计特征
US11558693B2 (en) 2019-03-21 2023-01-17 Shure Acquisition Holdings, Inc. Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition and voice activity detection functionality
WO2020237206A1 (fr) 2019-05-23 2020-11-26 Shure Acquisition Holdings, Inc. Réseau de haut-parleurs orientables, système et procédé associé
US11302347B2 (en) 2019-05-31 2022-04-12 Shure Acquisition Holdings, Inc. Low latency automixer integrated with voice and noise activity detection
CN110225430A (zh) * 2019-06-12 2019-09-10 付金龙 一种降噪骨传导耳麦及其降噪方法
JP2022545113A (ja) 2019-08-23 2022-10-25 シュアー アクイジッション ホールディングス インコーポレイテッド 指向性が改善された一次元アレイマイクロホン
CN112578338B (zh) * 2019-09-27 2024-05-14 阿里巴巴集团控股有限公司 声源定位方法、装置、设备及存储介质
CN110716180B (zh) * 2019-10-17 2022-03-15 北京华捷艾米科技有限公司 一种基于人脸检测的音频定位方法及装置
US12028678B2 (en) 2019-11-01 2024-07-02 Shure Acquisition Holdings, Inc. Proximity microphone
CN110933254B (zh) * 2019-12-11 2021-09-07 杭州叙简科技股份有限公司 一种基于图像分析的声音过滤系统及其声音过滤方法
CN112964256B (zh) * 2019-12-13 2024-02-27 佛山市云米电器科技有限公司 室内定位方法、智能家电设备及计算机可读存储介质
CN113141285B (zh) * 2020-01-19 2022-04-29 海信集团有限公司 一种沉浸式语音交互方法及系统
US11552611B2 (en) 2020-02-07 2023-01-10 Shure Acquisition Holdings, Inc. System and method for automatic adjustment of reference gain
CN113450769B (zh) * 2020-03-09 2024-06-25 杭州海康威视数字技术股份有限公司 语音提取方法、装置、设备和存储介质
CN113516989A (zh) * 2020-03-27 2021-10-19 浙江宇视科技有限公司 声源音频的管理方法、装置、设备和存储介质
US11706562B2 (en) 2020-05-29 2023-07-18 Shure Acquisition Holdings, Inc. Transducer steering and configuration systems and methods using a local positioning system
CN113767432A (zh) * 2020-06-29 2021-12-07 深圳市大疆创新科技有限公司 音频处理方法、音频处理装置、电子设备
CN111932619A (zh) * 2020-07-23 2020-11-13 安徽声讯信息技术有限公司 结合图像识别和语音定位的麦克风跟踪系统及方法
CN112614508B (zh) * 2020-12-11 2022-12-06 北京华捷艾米科技有限公司 音视频结合的定位方法、装置、电子设备以及存储介质
WO2022165007A1 (fr) 2021-01-28 2022-08-04 Shure Acquisition Holdings, Inc. Système de mise en forme hybride de faisceaux audio
CN113093106A (zh) * 2021-04-09 2021-07-09 北京华捷艾米科技有限公司 一种声源定位方法及系统
CN114205725A (zh) * 2021-12-01 2022-03-18 云知声智能科技股份有限公司 一种无线扩音设备、方法、装置、终端设备及存储介质
CN114911449A (zh) * 2022-04-08 2022-08-16 南京地平线机器人技术有限公司 音量控制方法、装置、存储介质和电子设备
DE202023103428U1 (de) 2023-06-21 2023-06-28 Richik Kashyap Ein Sprachqualitätsschätzsystem für reale Signale basierend auf nicht negativer frequenzgewichteter Energie

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101674410A (zh) * 2008-09-12 2010-03-17 Lg电子株式会社 在移动终端上调整图像的显示方向
JP2010233173A (ja) * 2009-03-30 2010-10-14 Sony Corp 信号処理装置、および信号処理方法、並びにプログラム
CN104012074A (zh) * 2011-12-12 2014-08-27 华为技术有限公司 用于数据处理系统的智能音频和视频捕捉系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01253787A (ja) * 1988-04-01 1989-10-11 Ishikawajima Harima Heavy Ind Co Ltd 訓練シミュレータ用模擬視界再現方法
JP3627058B2 (ja) * 2002-03-01 2005-03-09 独立行政法人科学技術振興機構 ロボット視聴覚システム

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019061292A1 (fr) * 2017-09-29 2019-04-04 深圳传音通讯有限公司 Procédé de réduction du bruit pour borne et borne
CN108200515B (zh) * 2017-12-29 2021-01-22 苏州科达科技股份有限公司 多波束会议拾音系统及方法
CN108200515A (zh) * 2017-12-29 2018-06-22 苏州科达科技股份有限公司 多波束会议拾音系统及方法
CN108957392A (zh) * 2018-04-16 2018-12-07 深圳市沃特沃德股份有限公司 声源方向估计方法和装置
CN110767246A (zh) * 2018-07-26 2020-02-07 深圳市优必选科技有限公司 一种噪声处理的方法、装置及机器人
CN110764520A (zh) * 2018-07-27 2020-02-07 杭州海康威视数字技术股份有限公司 飞行器控制方法、装置、飞行器和存储介质
CN112205002B (zh) * 2018-12-06 2024-06-14 松下知识产权经营株式会社 信号处理装置以及信号处理方法
CN112205002A (zh) * 2018-12-06 2021-01-08 松下知识产权经营株式会社 信号处理装置以及信号处理方法
CN111323753A (zh) * 2018-12-13 2020-06-23 蔚来汽车有限公司 定位汽车内语音源的方法
CN109451291A (zh) * 2018-12-29 2019-03-08 像航(上海)科技有限公司 无介质浮空投影声源定向语音交互系统、智能汽车
US20210343042A1 (en) * 2019-06-17 2021-11-04 Tencent Technology (Shenzhen) Company Limited Audio acquisition device positioning method and apparatus, and speaker recognition method and system
US11915447B2 (en) * 2019-06-17 2024-02-27 Tencent Technology (Shenzhen) Company Limited Audio acquisition device positioning method and apparatus, and speaker recognition method and system
CN110808048A (zh) * 2019-11-13 2020-02-18 联想(北京)有限公司 语音处理方法、装置、系统及存储介质
CN111601198A (zh) * 2020-04-24 2020-08-28 达闼科技成都有限公司 应用麦克风跟踪说话人的方法、装置及计算设备
CN113726947B (zh) * 2020-05-26 2022-09-09 Oppo广东移动通信有限公司 语音通话方法、装置、终端及存储介质
CN113726947A (zh) * 2020-05-26 2021-11-30 Oppo广东移动通信有限公司 语音通话方法、装置、终端及存储介质
CN111580050A (zh) * 2020-05-28 2020-08-25 国网上海市电力公司 一种用于识别gis设备异响声源位置的装置及方法
CN111688580A (zh) * 2020-05-29 2020-09-22 北京百度网讯科技有限公司 智能后视镜进行拾音的方法以及装置
US11631420B2 (en) 2020-05-29 2023-04-18 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Voice pickup method for intelligent rearview mirror, electronic device and storage medium
CN111722186A (zh) * 2020-06-30 2020-09-29 中国平安人寿保险股份有限公司 基于声源定位的拍摄方法、装置、电子设备及存储介质
CN111722186B (zh) * 2020-06-30 2024-04-05 中国平安人寿保险股份有限公司 基于声源定位的拍摄方法、装置、电子设备及存储介质
CN112951257A (zh) * 2020-09-24 2021-06-11 上海译会信息科技有限公司 一种音频图像采集设备及说话人定位及语音分离方法
CN112261528B (zh) * 2020-10-23 2022-08-26 汪洲华 一种多路定向拾音的音频输出方法及系统
CN112261528A (zh) * 2020-10-23 2021-01-22 汪洲华 一种多路定向拾音的音频输出方法及系统
CN114442039A (zh) * 2020-11-05 2022-05-06 中国移动通信集团山东有限公司 一种声源定位方法、装置和电子设备
CN112466323A (zh) * 2020-11-24 2021-03-09 中核检修有限公司 一种光学图像与声学图像融合方法及系统
CN112826446A (zh) * 2020-12-30 2021-05-25 上海联影医疗科技股份有限公司 一种医学扫描语音增强方法、装置、系统及存储介质
CN113314138B (zh) * 2021-04-25 2024-03-29 普联国际有限公司 基于麦克风阵列的声源监听分离方法、装置及存储介质
CN113314138A (zh) * 2021-04-25 2021-08-27 普联国际有限公司 基于麦克风阵列的声源监听分离方法、装置及存储介质
WO2023016053A1 (fr) * 2021-08-12 2023-02-16 北京荣耀终端有限公司 Procédé de traitement de signal sonore et dispositif électronique
CN115831141A (zh) * 2023-02-02 2023-03-21 小米汽车科技有限公司 车载语音的降噪方法、装置、车辆及存储介质
CN116165607A (zh) * 2023-02-15 2023-05-26 深圳市拔超科技股份有限公司 采用多个麦克风阵列实现声源精确定位系统及定位方法
CN116165607B (zh) * 2023-02-15 2023-12-19 深圳市拔超科技股份有限公司 采用多个麦克风阵列实现声源精确定位系统及定位方法

Also Published As

Publication number Publication date
CN107534725B (zh) 2020-06-16
CN107534725A (zh) 2018-01-02

Similar Documents

Publication Publication Date Title
WO2016183791A1 (fr) Procédé et dispositif de traitement de signal vocal
CN106328156B (zh) 一种音视频信息融合的麦克风阵列语音增强系统及方法
CN106653041B (zh) 音频信号处理设备、方法和电子设备
CN106782584B (zh) 音频信号处理设备、方法和电子设备
EP2882170B1 (fr) Méthode et appareil de traitement d'information audio
US20150022636A1 (en) Method and system for voice capture using face detection in noisy environments
CN110379439B (zh) 一种音频处理的方法以及相关装置
US20100123785A1 (en) Graphic Control for Directional Audio Input
US9500739B2 (en) Estimating and tracking multiple attributes of multiple objects from multi-sensor data
KR101508092B1 (ko) 화상 회의를 지원하는 방법 및 시스템
TW201120469A (en) Method, computer readable storage medium and system for localizing acoustic source
WO2018049957A1 (fr) Signal audio, procédé de traitement d'image, dispositif, et système
US10964326B2 (en) System and method for audio-visual speech recognition
JP6977448B2 (ja) 機器制御装置、機器制御プログラム、機器制御方法、対話装置、及びコミュニケーションシステム
JP7194897B2 (ja) 信号処理装置及び信号処理方法
CN112351248B (zh) 一种关联图像数据和声音数据的处理方法
US20150281839A1 (en) Background noise cancellation using depth
JP2022062875A (ja) 音信号処理方法および音信号処理装置
KR101678305B1 (ko) 텔레프레즌스를 위한 하이브리드형 3d 마이크로폰 어레이 시스템 및 동작 방법
CN113539288A (zh) 一种语音信号去噪方法及装置
KR101542647B1 (ko) 화자 검출을 이용한 오디오 신호 처리 방법 및 장치
US11172319B2 (en) System and method for volumetric sound generation
JP6881267B2 (ja) 制御装置、変換装置、制御方法、変換方法、およびプログラム
CN114038452A (zh) 一种语音分离方法和设备
US11956606B2 (en) Audio signal processing method and audio signal processing apparatus that process an audio signal based on posture information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15892170

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15892170

Country of ref document: EP

Kind code of ref document: A1