WO2023056905A1 - Sound source localization method, apparatus, and device - Google Patents

Sound source localization method, apparatus, and device

Info

Publication number
WO2023056905A1
WO2023056905A1 (PCT/CN2022/123555)
Authority
WO
WIPO (PCT)
Prior art keywords
information
array
sound source
steering vector
microphone
Prior art date
Application number
PCT/CN2022/123555
Other languages
English (en)
French (fr)
Inventor
陈维广
黄伟隆
冯津伟
Original Assignee
阿里巴巴达摩院(杭州)科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴达摩院(杭州)科技有限公司
Priority to EP22877924.5A (EP4375695A1)
Publication of WO2023056905A1

Links

Images

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 - Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 - Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/20 - Position of source determined by a plurality of spaced direction-finders
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 - Details of transducers, loudspeakers or microphones
    • H04R1/20 - Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H04R1/406 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers; microphones
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 - Circuits for transducers, loudspeakers or microphones
    • H04R3/005 - Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2201/00 - Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
    • H04R2201/40 - Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
    • H04R2201/401 - 2D or 3D arrays of transducers
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2201/00 - Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
    • H04R2201/40 - Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
    • H04R2201/403 - Linear arrays of transducers
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 - Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20 - Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
    • H04R2430/23 - Direction finding using a sum-delay beam-former
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R27/00 - Public address systems

Definitions

  • The present application relates to the technical field of speech processing, and in particular to a conference speech display system, a sound source localization method and apparatus, a conference system, and a sound pickup device.
  • The basic functions of audio and video equipment in conference scenarios include speaker tracking. To implement speaker tracking, the speaker must be located in real time. Sound source localization is the determination of the spatial position of a sound source, and the accuracy of sound source localization directly affects the accuracy of speaker tracking.
  • A typical sound source localization method is the microphone-based Direction of Arrival (DOA) method.
  • Microphone-based DOA methods fall into two types: those based on omnidirectional microphone arrays and those based on directional microphone arrays. Because the omnidirectional-array DOA method is strongly affected by reverberation while the directional-array DOA method is more robust, the directional-array DOA method has been widely adopted.
  • The existing directional-array DOA method uses a circular directional microphone array and adds a weighting function to the Steered-Response Power (SRP) sound source localization algorithm, estimating the sound source direction only from the signals picked up by the subset of microphones facing the sound source.
  • The inventors found that the existing directional-array DOA scheme has at least the following problem: because only the signals picked up by the microphones facing the sound source are used, and the amplitude information is not fully exploited, the sound source localization accuracy is low.
  • The present application provides a sound source localization method to solve the problem of low sound source localization accuracy in the prior art.
  • The application additionally provides a conference speech display system, a sound source localization apparatus, a conference system, and a sound pickup device.
  • This application provides a conference speech display system, including:
  • the terminal device is used to collect multi-channel speech signals in a conference space through a directional microphone array; determine a steering vector including phase information and amplitude information according to the array shape information and microphone pointing direction information; determine the location information of the conference speaking user according to the steering vector and the speech signals; send the speech signals and the location information to the server; and display the conference speech texts of different conference speaking users sent back by the server;
  • the server is configured to convert the speech signals into conference speech text through a speech recognition algorithm, and to determine the conference speech texts of different conference speaking users according to the location information.
  • The present application also provides a sound source localization method, including: collecting multi-channel speech signals through a directional microphone array; determining a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information; and determining sound source direction information according to the steering vector and the speech signals.
  • Optionally, determining the steering vector including phase information and amplitude information according to the array shape information and the microphone pointing direction information includes: determining a phase difference according to the array shape information; determining an amplitude response according to the microphone pointing direction information; and determining the steering vector according to the phase difference and the amplitude response.
  • the array includes a linear array
  • the array shape information includes distances between microphones
  • the microphone pointing direction is perpendicular to the array, toward one side.
  • the array comprises a circular array
  • the array shape information includes a circular array radius
  • the microphone pointing direction is the direction of the microphone relative to the center of the circular array.
  • Optionally, determining the sound source direction information according to the steering vector and the speech signal includes: determining a spatial spectrum according to the steering vector and the speech signal; and determining the sound source direction information according to the spatial spectrum.
  • Optionally, determining the sound source direction information according to the spatial spectrum includes: taking the direction whose energy response ranks highest as the sound source direction.
  • the present application also provides a sound source localization device, comprising:
  • the sound collection unit is used to collect multi-channel voice signals through the directional microphone array
  • a steering vector determining unit configured to determine a steering vector including phase information and amplitude information according to the array shape information and the microphone pointing direction information;
  • the sound source direction determining unit is configured to determine sound source direction information according to the steering vector and the speech signal.
  • the present application also provides a conference system, including: a sound source localization device and a speaker tracking device.
  • the application also provides a sound pickup device, including:
  • a directional microphone array; and a processor and a memory, the memory being used to store a program implementing the above method; after the device is powered on, the program of the method is run by the processor.
  • the present application also provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the various methods described above.
  • the present application also provides a computer program product including instructions, which, when run on a computer, cause the computer to execute the above-mentioned various methods.
  • a multi-channel speech signal is collected through a directional microphone array; a steering vector including phase information and amplitude information is determined according to the array shape information and microphone pointing direction information; and the sound source direction information is determined according to the steering vector and the speech signal.
  • the phase information and the amplitude information are taken into consideration when determining the steering vector, which can effectively improve the accuracy of sound source localization.
  • the terminal device collects multi-channel speech signals in the conference space through a directional microphone array; determines a steering vector including phase information and amplitude information according to the array shape information and the microphone pointing direction information; determines the location information of the conference speaking user according to the steering vector and the speech signals; and sends the speech signals and the location information to the server; the server converts the speech signals into conference speech text through a speech recognition algorithm and determines the conference speech texts of different conference speaking users according to the location information; the terminal device then displays the conference speech texts of the different conference speaking users.
  • the phase information and the amplitude information are taken into consideration when determining the steering vector, which can effectively improve the positioning accuracy of conference speech users, thereby improving the accuracy of conference speech display.
  • Fig. 1 is a schematic flow chart of an embodiment of the sound source localization method provided by the present application
  • Fig. 2 is a linear array schematic diagram of an embodiment of the sound source localization method provided by the present application
  • FIG. 3 is a schematic flow chart of an embodiment of the sound source localization method provided by the present application.
  • FIG. 4 is a schematic diagram of an application scenario of an embodiment of the conference speech display system provided by this application.
  • A conference speech display system, a sound source localization method and apparatus, a conference system, and a sound pickup device are provided.
  • The various schemes are described in detail in the following embodiments.
  • The embodiments of the present application provide a sound source localization method that can be used in sound pickup devices, audio/video conference terminals, and the like; the device includes a directional microphone array rather than an omnidirectional microphone array.
  • FIG. 1 is a schematic flowchart of an embodiment of the sound source localization method of the present application.
  • the method may include the following steps:
  • Step S101: collect multi-channel speech signals through a directional microphone array.
  • The directional microphones include, but are not limited to, cardioid, supercardioid, shotgun, and bidirectional types.
  • the microphone array may be a circular array or a linear array, or an array of other geometric shapes, such as a square array, a triangular array, etc., or an array of irregular geometric shapes.
  • Step S103: determine a steering vector including phase information and amplitude information according to the array shape information and the microphone pointing direction information.
  • The method provided by this embodiment of the present application follows the same processing flow as the prior-art DOA method based on omnidirectional microphones, but the way the steering vector is determined is improved; step S103 is the improved steering-vector determination.
  • In a specific implementation, DOA localization methods such as Steered-Response Power with Phase Transform (SRP-PHAT), MUSIC (Multiple Signal Classification), and MVDR (Minimum Variance Distortionless Response) may be used.
  • Taking SRP-PHAT as an example, the method scans candidate angles (0-360 degrees), computes the energy response at each angle from the steering vector and the signals received by the microphone array, and thereby obtains a spatial spectrum; after the spatial spectrum is obtained, the angle with the highest energy response can be selected as the sound source localization result.
  • These DOA methods differ in the way the spatial spectrum is computed from the steering vectors and the multi-channel speech signal.
  • the array shape information is related to the geometric shape of the array. Taking a linear array as an example, the array shape information may include information such as the distance between microphones. Taking a circular array as an example, the array shape information may include information such as the radius of the circular array.
  • the microphone pointing direction information is also related to the geometry of the array. Taking a linear array as an example, the microphones point to one side perpendicular to the array. Taking a circular array as an example, the microphone pointing direction is the direction of the microphone relative to the center of the array.
  • In the prior art, when an omnidirectional microphone array is used, the steering vector only represents the phase relationship of the incident signal across the elements of the microphone array.
  • In the method provided by the present application, when the microphones in the array are directional microphones, the steering vector also takes the directivity of the microphones into account, that is, the amplitude response in each direction is calculated. In other words, the steering vector described in the embodiments of the present application includes both phase information and amplitude information, so signals from different directions can be localized using both phase information and amplitude information.
  • step S103 may include the following sub-steps: determine the phase difference according to the array shape information; determine the amplitude response according to the microphone pointing direction information; determine the steering vector according to the phase difference and the amplitude response.
  • In one example, Mic-1, Mic-2, ..., Mic-m denote m directional microphones arranged as a linear array, and the amplitude response can be calculated with the formula p(θ_m, θ) = α + (1 − α)·cos(θ_m − θ), where p(θ_m, θ) is the amplitude response of the m-th directional microphone, θ is the incident direction of the signal, θ_m is the pointing direction of the m-th directional microphone, and α is the first-order directional microphone coefficient.
  • The directional microphone array includes m directional microphones, and the distance between adjacent microphones is d, where d is the array shape information. The corresponding steering vector is v(ω) = [p(θ_1, θ), p(θ_2, θ)·e^(−jωd·cosθ/c), ..., p(θ_m, θ)·e^(−jω(m−1)d·cosθ/c)]^T, which combines the phase differences and the amplitude responses: p(θ_i, θ) is the amplitude response of the i-th directional microphone in direction θ, and e^(−jωd·cosθ/c) is the phase difference of a directional microphone in direction θ.
  • For the first microphone, the path-length difference is 0 and the phase factor is 1; for the second microphone, the path-length difference is d and the phase factor is e^(−jωd·cosθ/c); and so on, so that for the m-th microphone the path-length difference is (m−1)d and the phase factor is e^(−jω(m−1)d·cosθ/c).
  • In another example, the directional microphone array is a circular array, and the steering vector can be computed with a corresponding formula in which θ is the incident direction of the signal, θ_m is the pointing direction of the m-th directional microphone, and R is the radius of the circular array.
  • Step S105: determine sound source direction information according to the steering vector and the speech signal.
  • After the steering vector including phase information and amplitude information is determined, a DOA method can be used to determine the sound source direction information according to the steering vector and the speech signal.
  • the directional microphone array may be a circular array or a linear array.
  • In a specific implementation, step S105 may include the following sub-steps: determining a spatial spectrum according to the steering vector and the speech signal, where the speech signal may be the multi-channel speech signal after Short-Time Fourier Transform (STFT) processing; and determining the sound source direction information according to the spatial spectrum.
  • After the spatial spectrum is obtained, the angle with the highest energy response can be selected as the sound source localization result. Since DOA methods such as SRP-PHAT, MUSIC, and MVDR are relatively mature prior art, they are not described in detail here.
  • The sound source localization method collects multi-channel speech signals through a directional microphone array; determines a steering vector including phase information and amplitude information according to the array shape information and the microphone pointing direction information; and determines the sound source direction information according to the steering vector and the speech signals.
  • the phase information and the amplitude information are taken into consideration when determining the steering vector, which can effectively improve the accuracy of sound source localization.
  • a method for localizing a sound source is provided, and correspondingly, the present application also provides a device for localizing a sound source.
  • the device corresponds to the embodiment of the above-mentioned method. Since the device embodiment is basically similar to the method embodiment, the description is relatively simple, and for relevant parts, refer to the part of the description of the method embodiment.
  • the device embodiments described below are illustrative only.
  • the present application additionally provides a sound source localization device, including:
  • the sound collection unit is used to collect multi-channel voice signals through the directional microphone array
  • a steering vector determining unit configured to determine a steering vector including phase information and amplitude information according to the array shape information and the microphone pointing direction information;
  • the sound source direction determining unit is configured to determine sound source direction information according to the steering vector and the speech signal.
  • the steering vector determining unit includes:
  • a phase difference determining subunit, configured to determine the phase difference according to the array shape information; an amplitude response determining subunit, configured to determine the amplitude response according to the microphone pointing direction information; and a steering vector determining subunit, configured to determine the steering vector according to the phase difference and the amplitude response.
  • the array includes a linear array
  • the array shape information includes distances between microphones
  • the microphone pointing direction is perpendicular to the array, toward one side.
  • the array comprises a circular array
  • the array shape information includes a circular array radius
  • the microphone pointing direction is the direction of the microphone relative to the center of the circular array.
  • the sound source direction determining unit includes:
  • determining a spatial spectrum subunit configured to determine a spatial spectrum according to the steering vector and the speech signal
  • the determining sound source direction subunit is configured to determine the sound source direction information according to the spatial spectrum.
  • the determining sound source direction subunit is specifically configured to use the direction in which the energy response data ranks first as the sound source direction.
  • a conference system provided by the present application includes: a sound source localization device and a speaker tracking device.
  • An audio/video conferencing system is equipment that allows two or more individuals or groups in different places to exchange audio, video, and document data through transmission lines, conference terminals, and other equipment, enabling real-time, interactive communication so that a meeting can be held across locations at the same time.
  • the sound source localization device corresponds to the first embodiment, so details are not repeated here, please refer to the corresponding part in the first embodiment.
  • the speaker tracking device is used to determine the speaker's activity track information according to the sound source direction information output by the sound source localization device. Since speaker tracking is a relatively mature prior art, it is not repeated here.
  • The conference system includes a sound source localization device and a speaker tracking device. The sound source localization device is used to collect multi-channel speech signals through a directional microphone array, determine a steering vector including phase information and amplitude information according to the array shape information and microphone pointing direction information, and determine sound source direction information according to the steering vector and the speech signals; the speaker tracking device is used to determine the speaker's activity track information according to the sound source direction information output by the sound source localization device.
  • the system considers both phase information and amplitude information when determining the steering vector, so it can effectively improve the accuracy of sound source localization, thereby improving the accuracy of speaker tracking.
  • a conference presentation display system includes: a terminal device and a server.
  • FIG. 4 is a schematic diagram of a scene of the conference speech presentation system of the present application.
  • the terminal device is deployed on the conference site, and the server is deployed on the cloud server.
  • a large screen can also be deployed on the conference site to display the conference speech text and the corresponding speaker in real time for users to watch.
  • The server and the terminal device can be connected through a network; for example, the terminal device can access the Internet via GPRS, 4G, or WiFi.
  • The terminal device is used to collect multi-channel speech signals in the conference space through a directional microphone array; determine a steering vector including phase information and amplitude information according to the array shape information and microphone pointing direction information; determine the location information of the conference speaking user according to the steering vector and the speech signals; and send the speech signals and the location information to the server. The server is used to convert the speech signals into conference speech text through a speech recognition algorithm and to determine the conference speech texts of different conference speaking users according to the location information; the terminal device displays the conference speech text and the corresponding speaking-user information on a large screen.
  • In summary, the terminal device collects the multi-channel speech signals of the conference space through the directional microphone array; determines the steering vector including phase information and amplitude information according to the array shape information and the microphone pointing direction information; determines the location information of the conference speaking user according to the steering vector and the speech signals; and sends the speech signals and the location information to the server. The server converts the speech signals into conference speech text through the speech recognition algorithm and determines the conference speech texts of different conference speaking users according to the location information; the conference speech texts of the different conference speaking users are then displayed on the terminal device.
  • the phase information and the amplitude information are taken into consideration when determining the steering vector, which can effectively improve the positioning accuracy of conference speech users, thereby improving the accuracy of conference speech display.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in computer-readable media, in the form of random access memory (RAM) and/or nonvolatile memory such as read-only memory (ROM) or flash RAM. Memory is an example of computer readable media.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media.
  • Information storage can be realized by any method or technology.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cartridge, tape magnetic disk storage or other magnetic storage device or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
  • the embodiments of the present application may be provided as methods, systems or computer program products. Accordingly, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Otolaryngology (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • General Health & Medical Sciences (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
  • Telephone Function (AREA)

Abstract

A conference speech display system, a sound source localization method and apparatus, a conference system, and a sound pickup device. The method includes: collecting multi-channel speech signals through a directional microphone array (S101); determining a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information (S103); and determining sound source direction information according to the steering vector and the speech signals (S105). With this processing, both phase information and amplitude information are taken into account when the steering vector is determined, which can effectively improve the accuracy of sound source localization.

Description

Sound source localization method, apparatus, and device
This application claims priority to Chinese Patent Application No. 202111173456.4, entitled "Sound source localization method, apparatus, and device", filed with the Chinese Patent Office on October 9, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of speech processing, and in particular to a conference speech display system, a sound source localization method and apparatus, a conference system, and a sound pickup device.
Background
The basic functions of audio and video equipment in conference scenarios include speaker tracking. To implement speaker tracking, the speaker must be located in real time. Sound source localization (Sound Localization) is the determination of the spatial position of a sound source, and the accuracy of sound source localization directly affects the accuracy of speaker tracking.
A typical sound source localization method is the microphone-based Direction of Arrival (DOA) method. Microphone-based DOA methods fall into two types: those based on omnidirectional microphones and those based on directional microphone arrays. Because the DOA method based on an omnidirectional microphone array is strongly affected by reverberation while the DOA method based on a directional microphone array is more robust, the directional-array DOA method has been widely adopted. The existing directional-array DOA method uses a circular directional microphone array and adds a weighting function to the Steered-Response Power (SRP) sound source localization algorithm, estimating the sound source direction from the signals picked up by the subset of microphones facing the sound source.
However, in the course of implementing the present invention, the inventors found that the existing directional-array DOA scheme has at least the following problem: because only the signals picked up by the microphones facing the sound source are used, and the amplitude information is not fully exploited, the sound source localization accuracy is low.
Summary
The present application provides a sound source localization method to solve the problem of low sound source localization accuracy in the prior art. The present application additionally provides a conference speech display system, a sound source localization apparatus, a conference system, and a sound pickup device.
The present application provides a conference speech display system, including:
a terminal device, configured to collect multi-channel speech signals in a conference space through a directional microphone array; determine a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information; determine the location information of the conference speaking user according to the steering vector and the speech signals; send the speech signals and the location information to a server; and display the conference speech texts of different conference speaking users returned by the server;
a server, configured to convert the speech signals into conference speech text through a speech recognition algorithm, and to determine the conference speech texts of different conference speaking users according to the location information.
The present application also provides a sound source localization method, including:
collecting multi-channel speech signals through a directional microphone array;
determining a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information;
determining sound source direction information according to the steering vector and the speech signals.
Optionally, determining the steering vector including phase information and amplitude information according to the array shape information and the microphone pointing direction information includes:
determining a phase difference according to the array shape information;
determining an amplitude response according to the microphone pointing direction information;
determining the steering vector according to the phase difference and the amplitude response.
Optionally, the array includes a linear array;
the array shape information includes the distances between microphones;
the microphone pointing direction is perpendicular to the array, toward one side.
Optionally, the array includes a circular array;
the array shape information includes the radius of the circular array;
the microphone pointing direction is the direction of the microphone relative to the center of the circular array.
Optionally, determining the sound source direction information according to the steering vector and the speech signals includes:
determining a spatial spectrum according to the steering vector and the speech signals;
determining the sound source direction information according to the spatial spectrum.
Optionally, determining the sound source direction information according to the spatial spectrum includes:
taking the direction whose energy response ranks highest as the sound source direction.
The present application also provides a sound source localization apparatus, including:
a sound collection unit, configured to collect multi-channel speech signals through a directional microphone array;
a steering vector determining unit, configured to determine a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information;
a sound source direction determining unit, configured to determine sound source direction information according to the steering vector and the speech signals.
The present application also provides a conference system, including: a sound source localization apparatus and a speaker tracking apparatus.
The present application also provides a sound pickup device, including:
a directional microphone array;
a processor and a memory, the memory being used to store a program implementing the above method; after the device is powered on, the program of the method is run by the processor.
The present application also provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the various methods described above.
The present application also provides a computer program product including instructions that, when run on a computer, cause the computer to perform the various methods described above.
Compared with the prior art, the present application has the following advantages:
The sound source localization method provided in the embodiments of the present application collects multi-channel speech signals through a directional microphone array; determines a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information; and determines sound source direction information according to the steering vector and the speech signals. With this processing, both phase information and amplitude information are taken into account when determining the steering vector, which can effectively improve the accuracy of sound source localization.
In the conference speech display system provided in the embodiments of the present application, the terminal device collects multi-channel speech signals in the conference space through a directional microphone array; determines a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information; determines the location information of the conference speaking user according to the steering vector and the speech signals; and sends the speech signals and the location information to the server. The server converts the speech signals into conference speech text through a speech recognition algorithm and determines the conference speech texts of different conference speaking users according to the location information; the terminal device displays the conference speech texts of the different conference speaking users. With this processing, both phase information and amplitude information are taken into account when determining the steering vector, which can effectively improve the accuracy of locating conference speaking users and thus the accuracy of the conference speech display.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of an embodiment of the sound source localization method provided by the present application;
Fig. 2 is a schematic diagram of a linear array in an embodiment of the sound source localization method provided by the present application;
Fig. 3 is a detailed schematic flowchart of an embodiment of the sound source localization method provided by the present application;
Fig. 4 is a schematic diagram of an application scenario of an embodiment of the conference speech display system provided by the present application.
Detailed Description
Many specific details are set forth in the following description to facilitate a full understanding of the present application. However, the present application can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from the essence of the present application; the present application is therefore not limited by the specific implementations disclosed below.
The present application provides a conference speech display system, a sound source localization method and apparatus, a conference system, and a sound pickup device. The various schemes are described in detail in the following embodiments.
First Embodiment
The embodiments of the present application provide a sound source localization method that can be used in sound pickup devices, audio/video conference terminals, and the like; the device includes a directional microphone array rather than an omnidirectional microphone array.
Referring to Fig. 1, which is a schematic flowchart of an embodiment of the sound source localization method of the present application, in this embodiment the method may include the following steps:
Step S101: collect multi-channel speech signals through a directional microphone array.
The directional microphones include, but are not limited to, cardioid, supercardioid, shotgun, and bidirectional types.
The microphone array may be a circular array or a linear array, an array of another geometric shape such as a square or triangular array, or an array with an irregular geometry.
Step S103: determine a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information.
The processing flow of the method provided by this embodiment of the present application is the same as that of the prior-art DOA method based on omnidirectional microphones, but the way the steering vector is determined is improved; step S103 is the improved steering-vector determination.
In a specific implementation, DOA localization methods such as Steered-Response Power with Phase Transform (SRP-PHAT), MUSIC (Multiple Signal Classification), and MVDR (Minimum Variance Distortionless Response) may be used. Taking SRP-PHAT as an example, the method scans candidate angles (0-360 degrees), computes the energy response at each angle from the steering vector and the signals received by the microphone array, and thereby obtains a spatial spectrum; after the spatial spectrum is obtained, the angle with the highest energy response can be selected as the sound source localization result. These DOA methods differ in how the spatial spectrum is computed from the steering vector and the multi-channel speech signals.
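To make the scanning step concrete, the following is a minimal sketch of an SRP-style spatial-spectrum scan, not the application's reference implementation: the array layout of the STFT data, the layout of the precomputed steering vectors, and the omission of the PHAT weighting are assumptions made for this example.

```python
import numpy as np

def srp_spatial_spectrum(stft_frames, steering_vectors):
    """Scan candidate angles and accumulate an energy response per angle.

    stft_frames:      complex array of shape (num_mics, num_bins, num_frames),
                      the STFT of the multi-channel speech signal.
    steering_vectors: complex array of shape (num_angles, num_mics, num_bins),
                      one steering vector per candidate angle and frequency bin.
    Returns an array of shape (num_angles,): the spatial spectrum.
    """
    num_angles = steering_vectors.shape[0]
    spectrum = np.zeros(num_angles)
    for a in range(num_angles):
        # Align (and amplitude-weight) the channels toward angle a,
        # then sum across microphones for every bin and frame.
        v = steering_vectors[a][:, :, None]              # (mics, bins, 1)
        beamformed = np.sum(np.conj(v) * stft_frames, axis=0)
        # Energy response of this steering direction.
        spectrum[a] = np.sum(np.abs(beamformed) ** 2)
    return spectrum

def localize(stft_frames, steering_vectors, angles_deg):
    """Pick the candidate angle with the highest energy response."""
    spectrum = srp_spatial_spectrum(stft_frames, steering_vectors)
    return angles_deg[int(np.argmax(spectrum))]
```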
The array shape information is related to the geometric shape of the array. Taking a linear array as an example, the array shape information may include information such as the distances between microphones; taking a circular array as an example, it may include information such as the radius of the circular array.
The microphone pointing direction information is also related to the geometry of the array. Taking a linear array as an example, the microphones point perpendicular to the array, toward one side; taking a circular array as an example, the pointing direction of each microphone is its direction relative to the center of the array.
In the prior art, when an omnidirectional microphone array is used, the steering vector only represents the phase relationship of the incident signal across the elements of the microphone array. In the method provided by the present application, when the microphones in the array are directional, the steering vector also takes the directivity of the microphones into account, that is, the amplitude response in each direction is calculated. In other words, the steering vector described in the embodiments of the present application includes both phase information and amplitude information, so signals from different directions can be localized using both phase information and amplitude information.
In this embodiment, step S103 may include the following sub-steps: determining a phase difference according to the array shape information; determining an amplitude response according to the microphone pointing direction information; and determining the steering vector according to the phase difference and the amplitude response.
As shown in Fig. 2, in one example Mic-1, Mic-2, ..., Mic-m denote m directional microphones, the directional microphone array is a linear array, and the amplitude response can be calculated with the following formula:
p(θ_m, θ) = α + (1 − α)·cos(θ_m − θ)
In this formula, p(θ_m, θ) is the amplitude response of the m-th directional microphone, θ is the incident direction of the signal, θ_m is the pointing direction of the m-th directional microphone, and α is the first-order directional microphone coefficient.
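For example, with α = 0.5 this is the familiar cardioid-like pattern; the following quick check (the value of α is an assumption chosen only for illustration) evaluates the response on-axis, at 90 degrees, and at the rear:

```python
import numpy as np

def amplitude_response(theta_m, theta, alpha=0.5):
    """p(theta_m, theta) = alpha + (1 - alpha) * cos(theta_m - theta)."""
    return alpha + (1.0 - alpha) * np.cos(theta_m - theta)

# A microphone pointing at 0 rad with alpha = 0.5 (cardioid-like):
print(amplitude_response(0.0, 0.0))        # 1.0 -> full sensitivity on-axis
print(amplitude_response(0.0, np.pi / 2))  # 0.5 -> reduced at 90 degrees
print(amplitude_response(0.0, np.pi))      # 0.0 -> null at the rear
```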
Correspondingly, the steering vector may adopt the following formula:
v(ω) = [p(θ_1, θ), p(θ_2, θ)·e^(−jωd·cosθ/c), ..., p(θ_m, θ)·e^(−jω(m−1)d·cosθ/c)]^T
As this formula shows, the directional microphone array includes m directional microphones, and the distance between adjacent microphones is d, where d is the array shape information. Here v(ω) denotes the steering vector, which combines the phase differences and the amplitude responses: p(θ_i, θ) is the amplitude response of the i-th directional microphone in direction θ, and e^(−jωd·cosθ/c) is the phase difference of a directional microphone in direction θ. For the first microphone, the path-length difference is 0 and the phase factor is 1; for the second microphone, the path-length difference is d and the phase factor is e^(−jωd·cosθ/c); and so on, so that for the m-th microphone the path-length difference is (m−1)d and the phase factor is e^(−jω(m−1)d·cosθ/c).
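A minimal sketch of this construction is shown below; the function name, the parameter layout, and the assumed speed of sound are illustrative choices rather than values taken from the application.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def directional_linear_steering_vector(omega, theta, mic_directions, d, alpha):
    """Steering vector for an m-element directional linear array.

    omega:          angular frequency of the bin (rad/s)
    theta:          candidate incident direction (rad)
    mic_directions: array of m microphone pointing directions theta_m (rad)
    d:              spacing between adjacent microphones (m), the array shape information
    alpha:          first-order directional microphone coefficient
    """
    mic_directions = np.asarray(mic_directions, dtype=float)
    m = len(mic_directions)
    # Amplitude response p(theta_m, theta) = alpha + (1 - alpha) * cos(theta_m - theta)
    amplitude = alpha + (1.0 - alpha) * np.cos(mic_directions - theta)
    # Phase factors: element i has path-length difference i*d (i = 0 .. m-1)
    delays = np.arange(m) * d * np.cos(theta) / SPEED_OF_SOUND
    phase = np.exp(-1j * omega * delays)
    return amplitude * phase
```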
In the prior art, by contrast, the steering vector of an omnidirectional microphone array is computed with a formula of the following form:
v(ω) = [1, e^(−jωd·cosθ/c), ..., e^(−jω(m−1)d·cosθ/c)]^T
As this formula shows, the prior art does not take amplitude information into account when computing the steering vector, so the steering vector is not accurate enough.
In another example, the directional microphone array is a circular array, and the steering vector may adopt a corresponding formula:
[circular-array steering-vector formula image not reproduced in this text; each element combines the amplitude response p(θ_m, θ) with a phase term determined by the radius R and the angle between θ and θ_m]
In this formula, θ is the incident direction of the signal, θ_m is the pointing direction of the m-th directional microphone, and R is the radius of the circular array.
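Because the formula image is not reproduced above, the following is only a sketch under a stated assumption: it combines the same first-order amplitude response with the conventional far-field uniform-circular-array phase term exp(jωR·cos(θ − θ_m)/c), and the exact expression in the application may differ.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def directional_circular_steering_vector(omega, theta, mic_directions, radius, alpha):
    """Sketch of a steering vector for a directional circular array of radius R.

    Assumes each microphone sits on the circle at angle theta_m and points
    outward from the center (so its position angle equals its pointing angle),
    and uses the standard far-field circular-array phase term.
    """
    mic_directions = np.asarray(mic_directions, dtype=float)
    # Same first-order amplitude response as in the linear-array example.
    amplitude = alpha + (1.0 - alpha) * np.cos(mic_directions - theta)
    # Assumed uniform-circular-array phase term.
    phase = np.exp(1j * omega * radius * np.cos(theta - mic_directions) / SPEED_OF_SOUND)
    return amplitude * phase
```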
Step S105: determine sound source direction information according to the steering vector and the speech signals.
After the steering vector including phase information and amplitude information has been determined, a DOA method can be used to determine the sound source direction information according to the steering vector and the speech signals.
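As one of the DOA methods listed earlier, an MVDR-style spatial spectrum can also be evaluated from the same steering vectors; the sketch below is illustrative only, and the covariance estimation and diagonal loading value are assumptions rather than details from the application.

```python
import numpy as np

def mvdr_spatial_spectrum(stft_frames, steering_vectors, loading=1e-6):
    """MVDR-style spatial spectrum P(angle) = 1 / (v^H R^-1 v), summed over bins.

    stft_frames:      complex array of shape (num_mics, num_bins, num_frames)
    steering_vectors: complex array of shape (num_angles, num_mics, num_bins)
    """
    num_mics, num_bins, _ = stft_frames.shape
    num_angles = steering_vectors.shape[0]
    spectrum = np.zeros(num_angles)
    for b in range(num_bins):
        x = stft_frames[:, b, :]                         # (mics, frames)
        # Sample covariance with diagonal loading for numerical stability.
        cov = x @ x.conj().T / x.shape[1] + loading * np.eye(num_mics)
        cov_inv = np.linalg.inv(cov)
        for a in range(num_angles):
            v = steering_vectors[a, :, b]
            denom = np.real(v.conj() @ cov_inv @ v)
            spectrum[a] += 1.0 / max(denom, 1e-12)
    return spectrum
```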
As shown in Fig. 3, the directional microphone array may be a circular array or a linear array. In a specific implementation, step S105 may include the following sub-steps: determining a spatial spectrum according to the steering vector and the speech signals, where the speech signals may be the multi-channel speech signals after Short-Time Fourier Transform (STFT) processing; and determining the sound source direction information according to the spatial spectrum. In a specific implementation, after the spatial spectrum is obtained, the angle with the highest energy response can be selected as the sound source localization result. Since DOA methods such as SRP-PHAT, MUSIC, and MVDR are relatively mature prior art, they are not described in detail here.
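A minimal sketch of the STFT preprocessing step is shown below; it produces the (num_mics, num_bins, num_frames) layout assumed by the earlier sketches, and the frame length and hop size are arbitrary example values.

```python
import numpy as np
from scipy.signal import stft

def multichannel_stft(signals, sample_rate, frame_len=512, hop=256):
    """STFT each channel of a (num_mics, num_samples) signal matrix.

    Returns a complex array of shape (num_mics, num_bins, num_frames),
    i.e. the multi-channel speech signal in the time-frequency domain.
    """
    signals = np.atleast_2d(signals)
    _, _, frames = stft(signals, fs=sample_rate, nperseg=frame_len,
                        noverlap=frame_len - hop, axis=-1)
    return frames
```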
As can be seen from the above embodiment, the sound source localization method provided in the embodiments of the present application collects multi-channel speech signals through a directional microphone array; determines a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information; and determines sound source direction information according to the steering vector and the speech signals. With this processing, both phase information and amplitude information are taken into account when determining the steering vector, which can effectively improve the accuracy of sound source localization.
Second Embodiment
The above embodiment provides a sound source localization method; correspondingly, the present application also provides a sound source localization apparatus. The apparatus corresponds to the embodiment of the above method. Since the apparatus embodiment is substantially similar to the method embodiment, it is described relatively simply; for relevant details, refer to the description of the method embodiment. The apparatus embodiment described below is merely illustrative.
The present application additionally provides a sound source localization apparatus, including:
a sound collection unit, configured to collect multi-channel speech signals through a directional microphone array;
a steering vector determining unit, configured to determine a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information;
a sound source direction determining unit, configured to determine sound source direction information according to the steering vector and the speech signals.
Optionally, the steering vector determining unit includes:
a phase difference determining subunit, configured to determine a phase difference according to the array shape information;
an amplitude response determining subunit, configured to determine an amplitude response according to the microphone pointing direction information;
a steering vector determining subunit, configured to determine the steering vector according to the phase difference and the amplitude response.
Optionally, the array includes a linear array;
the array shape information includes the distances between microphones;
the microphone pointing direction is perpendicular to the array, toward one side.
Optionally, the array includes a circular array;
the array shape information includes the radius of the circular array;
the microphone pointing direction is the direction of the microphone relative to the center of the circular array.
Optionally, the sound source direction determining unit includes:
a spatial spectrum determining subunit, configured to determine a spatial spectrum according to the steering vector and the speech signals;
a sound source direction determining subunit, configured to determine the sound source direction information according to the spatial spectrum.
Optionally, the sound source direction determining subunit is specifically configured to take the direction whose energy response ranks highest as the sound source direction.
Third Embodiment
Corresponding to the sound source localization method described above, the present application also provides a conference system. Parts of this embodiment that are the same as the first embodiment are not repeated; refer to the corresponding parts of the first embodiment. The conference system provided by the present application includes a sound source localization apparatus and a speaker tracking apparatus.
An audio/video conferencing system is equipment that allows two or more individuals or groups in different places to exchange audio, video, and document data through transmission lines, conference terminals, and other equipment, enabling real-time, interactive communication so that a meeting can be held across locations at the same time.
The sound source localization apparatus corresponds to the first embodiment and is therefore not described again; refer to the corresponding parts of the first embodiment. The speaker tracking apparatus is configured to determine the speaker's activity track information according to the sound source direction information output by the sound source localization apparatus. Since speaker tracking is relatively mature prior art, it is not described again here.
As can be seen from the above embodiment, the conference system provided in the embodiments of the present application includes a sound source localization apparatus and a speaker tracking apparatus. The sound source localization apparatus is used to collect multi-channel speech signals through a directional microphone array, determine a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information, and determine sound source direction information according to the steering vector and the speech signals; the speaker tracking apparatus is used to determine the speaker's activity track information according to the sound source direction information output by the sound source localization apparatus. Because the system considers both phase information and amplitude information when determining the steering vector, it can effectively improve the accuracy of sound source localization and thus the accuracy of speaker tracking.
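The application does not detail how the speaker tracking apparatus turns per-frame direction estimates into a track; purely as an illustrative sketch (the smoothing weight and the unit-vector averaging are assumptions, not the patent's method), the estimates could be smoothed as follows:

```python
import numpy as np

def smooth_doa_track(doa_angles_deg, weight=0.8):
    """Exponentially smooth a sequence of per-frame DOA estimates (degrees).

    Angles are smoothed as unit vectors so that, for example, 359 and 1 degree
    average to roughly 0 degrees rather than 180.
    """
    track = []
    state = None
    for angle in doa_angles_deg:
        vec = np.array([np.cos(np.radians(angle)), np.sin(np.radians(angle))])
        state = vec if state is None else weight * state + (1.0 - weight) * vec
        track.append(float(np.degrees(np.arctan2(state[1], state[0]))) % 360.0)
    return track
```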
Fourth Embodiment
Corresponding to the sound source localization method described above, the present application also provides a conference speech display system. Parts of this embodiment that are the same as the first embodiment are not repeated; refer to the corresponding parts of the first embodiment. The system provided by the present application includes a terminal device and a server.
Referring to Fig. 4, which is a schematic diagram of a scenario of the conference speech display system of the present application: in this embodiment the terminal device is deployed at the conference site, the server is deployed on a cloud server, and a large screen may also be deployed at the conference site to display the conference speech text and the corresponding speaking user in real time for participants to watch. The server and the terminal device can be connected through a network; for example, the terminal device can access the Internet via GPRS, 4G, or WiFi. The terminal device is used to collect multi-channel speech signals in the conference space through a directional microphone array; determine a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information; determine the location information of the conference speaking user according to the steering vector and the speech signals; and send the speech signals and the location information to the server. The server is used to convert the speech signals into conference speech text through a speech recognition algorithm and to determine the conference speech texts of different conference speaking users according to the location information. The terminal device displays the conference speech text and the corresponding speaking-user information on the large screen.
As can be seen from the above embodiment, in the conference speech display system provided in the embodiments of the present application, the terminal device collects multi-channel speech signals in the conference space through a directional microphone array; determines a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information; determines the location information of the conference speaking user according to the steering vector and the speech signals; and sends the speech signals and the location information to the server. The server converts the speech signals into conference speech text through a speech recognition algorithm and determines the conference speech texts of different conference speaking users according to the location information; the terminal device then displays the conference speech texts of the different conference speaking users. With this processing, both phase information and amplitude information are taken into account when determining the steering vector, which can effectively improve the accuracy of locating conference speaking users and thus the accuracy of the conference speech display.
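The application does not specify how the server maps the location information to individual speaking users; purely as an illustrative sketch (the grouping rule and the angular threshold are assumptions), recognized text segments could be grouped by the direction reported with each segment:

```python
def group_text_by_direction(segments, angle_threshold_deg=15.0):
    """Group (text, doa_angle_deg) segments into per-speaker transcripts.

    Two segments are attributed to the same speaker if their reported DOA
    angles differ (circularly) by less than angle_threshold_deg.
    """
    speakers = []  # list of dicts: {"angle": ..., "texts": [...]}
    for text, angle in segments:
        for spk in speakers:
            diff = abs(angle - spk["angle"]) % 360.0
            if min(diff, 360.0 - diff) < angle_threshold_deg:
                spk["texts"].append(text)
                break
        else:
            speakers.append({"angle": angle, "texts": [text]})
    return speakers

# Example: three segments, two of them from roughly the same direction.
print(group_text_by_direction([("hello", 32.0), ("agenda item one", 35.0), ("question", 200.0)]))
```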
Although the present application is disclosed above with preferred embodiments, they are not intended to limit the present application. Any person skilled in the art can make possible changes and modifications without departing from the spirit and scope of the present application; therefore, the protection scope of the present application shall be subject to the scope defined by the claims of the present application.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent storage in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
1. Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
2. Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

Claims (12)

  1. A conference speech display system, comprising:
    a terminal device, configured to collect multi-channel speech signals in a conference space through a directional microphone array; determine a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information; determine the location information of the conference speaking user according to the steering vector and the speech signals; send the speech signals and the location information to a server; and display the conference speech texts of different conference speaking users returned by the server;
    a server, configured to convert the speech signals into conference speech text through a speech recognition algorithm, and determine the conference speech texts of different conference speaking users according to the location information.
  2. A sound source localization method, comprising:
    collecting multi-channel speech signals through a directional microphone array;
    determining a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information;
    determining sound source direction information according to the steering vector and the speech signals.
  3. The method according to claim 2, wherein determining the steering vector including phase information and amplitude information according to the array shape information and the microphone pointing direction information comprises:
    determining a phase difference according to the array shape information;
    determining an amplitude response according to the microphone pointing direction information;
    determining the steering vector according to the phase difference and the amplitude response.
  4. The method according to claim 2, wherein:
    the array comprises a linear array;
    the array shape information comprises the distances between microphones;
    the microphone pointing direction is perpendicular to the array, toward one side.
  5. The method according to claim 2, wherein:
    the array comprises a circular array;
    the array shape information comprises the radius of the circular array;
    the microphone pointing direction is the direction of the microphone relative to the center of the circular array.
  6. The method according to claim 2, wherein determining the sound source direction information according to the steering vector and the speech signals comprises:
    determining a spatial spectrum according to the steering vector and the speech signals;
    determining the sound source direction information according to the spatial spectrum.
  7. The method according to claim 6, wherein determining the sound source direction information according to the spatial spectrum comprises:
    taking the direction whose energy response ranks highest as the sound source direction.
  8. A sound source localization apparatus, comprising:
    a sound collection unit, configured to collect multi-channel speech signals through a directional microphone array;
    a steering vector determining unit, configured to determine a steering vector including phase information and amplitude information according to array shape information and microphone pointing direction information;
    a sound source direction determining unit, configured to determine sound source direction information according to the steering vector and the speech signals.
  9. A sound pickup device, comprising:
    a directional microphone array;
    a processor; and
    a memory, configured to store a program implementing the sound source localization method; after the device is powered on, the program of the method is run by the processor.
  10. A conference system, comprising:
    a sound source localization apparatus and a speaker tracking apparatus.
  11. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 2-7.
  12. A computer program product comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 2-7.
PCT/CN2022/123555 2021-10-09 2022-09-30 Sound source localization method, apparatus, and device WO2023056905A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP22877924.5A EP4375695A1 (en) 2021-10-09 2022-09-30 Sound source localization method and apparatus, and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111173456.4 2021-10-09
CN202111173456.4A CN113608167B (zh) 2021-10-09 2021-10-09 Sound source localization method, apparatus, and device

Publications (1)

Publication Number Publication Date
WO2023056905A1 true WO2023056905A1 (zh) 2023-04-13

Family

ID=78310828

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/123555 WO2023056905A1 (zh) 2021-10-09 2022-09-30 声源定位方法、装置及设备

Country Status (3)

Country Link
EP (1) EP4375695A1 (zh)
CN (1) CN113608167B (zh)
WO (1) WO2023056905A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113608167B (zh) * 2021-10-09 2022-02-08 阿里巴巴达摩院(杭州)科技有限公司 声源定位方法、装置及设备

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100054085A1 (en) * 2008-08-26 2010-03-04 Nuance Communications, Inc. Method and Device for Locating a Sound Source
JP2010283676A (ja) * 2009-06-05 2010-12-16 Sony Corp 音声検出装置、音声検出方法及び撮像システム
CN107356943A (zh) * 2017-06-01 2017-11-17 西南电子技术研究所(中国电子科技集团公司第十研究所) 数字波束形成和相位拟合方法
CN108375763A (zh) * 2018-01-03 2018-08-07 北京大学 一种应用于多声源环境的分频定位方法
CN108419168A (zh) * 2018-01-19 2018-08-17 广东小天才科技有限公司 拾音设备的指向性拾音方法、装置、拾音设备及存储介质
CN108630222A (zh) * 2017-03-21 2018-10-09 株式会社东芝 信号处理系统、信号处理方法以及信号处理程序
CN111741404A (zh) * 2020-07-24 2020-10-02 支付宝(杭州)信息技术有限公司 拾音设备、拾音系统和声音信号采集的方法
CN112558004A (zh) * 2021-02-22 2021-03-26 北京远鉴信息技术有限公司 一种波束信息波达方向的确定方法、装置、及存储介质
CN112995838A (zh) * 2021-03-01 2021-06-18 支付宝(杭州)信息技术有限公司 拾音设备、拾音系统和音频处理方法
CN113608167A (zh) * 2021-10-09 2021-11-05 阿里巴巴达摩院(杭州)科技有限公司 声源定位方法、装置及设备

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103176167B (zh) * 2013-03-21 2014-11-05 徐华中 一种基于锁相放大器的强干扰下声源定位方法
CN109788382A (zh) * 2019-01-25 2019-05-21 深圳大学 一种分布式麦克风阵列拾音系统及方法
CN110047507B (zh) * 2019-03-01 2021-03-30 北京交通大学 一种声源识别方法及装置
CN110049270B (zh) * 2019-03-12 2023-05-30 平安科技(深圳)有限公司 多人会议语音转写方法、装置、系统、设备及存储介质

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100054085A1 (en) * 2008-08-26 2010-03-04 Nuance Communications, Inc. Method and Device for Locating a Sound Source
JP2010283676A (ja) * 2009-06-05 2010-12-16 Sony Corp 音声検出装置、音声検出方法及び撮像システム
CN108630222A (zh) * 2017-03-21 2018-10-09 株式会社东芝 信号处理系统、信号处理方法以及信号处理程序
CN107356943A (zh) * 2017-06-01 2017-11-17 西南电子技术研究所(中国电子科技集团公司第十研究所) 数字波束形成和相位拟合方法
CN108375763A (zh) * 2018-01-03 2018-08-07 北京大学 一种应用于多声源环境的分频定位方法
CN108419168A (zh) * 2018-01-19 2018-08-17 广东小天才科技有限公司 拾音设备的指向性拾音方法、装置、拾音设备及存储介质
CN111741404A (zh) * 2020-07-24 2020-10-02 支付宝(杭州)信息技术有限公司 拾音设备、拾音系统和声音信号采集的方法
CN112558004A (zh) * 2021-02-22 2021-03-26 北京远鉴信息技术有限公司 一种波束信息波达方向的确定方法、装置、及存储介质
CN112995838A (zh) * 2021-03-01 2021-06-18 支付宝(杭州)信息技术有限公司 拾音设备、拾音系统和音频处理方法
CN113608167A (zh) * 2021-10-09 2021-11-05 阿里巴巴达摩院(杭州)科技有限公司 声源定位方法、装置及设备

Also Published As

Publication number Publication date
CN113608167A (zh) 2021-11-05
CN113608167B (zh) 2022-02-08
EP4375695A1 (en) 2024-05-29

Similar Documents

Publication Publication Date Title
CN111445920B (zh) 一种多声源的语音信号实时分离方法、装置和拾音器
CN104246878B (zh) 音频用户交互辨识和上下文精炼
JP7082126B2 (ja) デバイス内の非対称配列の複数のマイクからの空間メタデータの分析
US11659349B2 (en) Audio distance estimation for spatial audio processing
US11284211B2 (en) Determination of targeted spatial audio parameters and associated spatial audio playback
JP2020500480A5 (zh)
WO2016034454A1 (en) Method and apparatus for enhancing sound sources
CN110875056B (zh) 语音转录设备、系统、方法、及电子设备
CN109964272B (zh) 声场表示的代码化
US11496830B2 (en) Methods and systems for recording mixed audio signal and reproducing directional audio
WO2023056905A1 (zh) 声源定位方法、装置及设备
Bai et al. Localization and separation of acoustic sources by using a 2.5-dimensional circular microphone array
US11889260B2 (en) Determination of sound source direction
CN112071332A (zh) 确定拾音质量的方法及装置
CN115547354A (zh) 波束形成方法、装置及设备
Freiberger et al. Similarity-based sound source localization with a coincident microphone array
Pasha et al. A survey on ad hoc signal processing: Applications, challenges and state-of-the-art techniques
CN112311999A (zh) 智能视频音箱设备及其摄像头视角调整方法
Pasha et al. Distributed microphone arrays, emerging speech and audio signal processing platforms: A review
WO2023065317A1 (zh) 会议终端及回声消除方法
WO2023088156A1 (zh) 一种声速矫正方法以及装置
Cruz et al. Digital MEMS beamforming microphone array for small-scale video conferencing
Nguyen et al. Spatialized audio multiparty teleconferencing with commodity miniature microphone array
Ramnath Stereo Voice Detection and Direction Estimation in Background Music or Noise for Robot Control
CN115515038A (zh) 波束形成方法、装置及设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22877924

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022877924

Country of ref document: EP

Ref document number: 22877924

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022877924

Country of ref document: EP

Effective date: 20240219

NENP Non-entry into the national phase

Ref country code: DE