CN110716178A - Full sound field oriented sound source positioning method and device - Google Patents

Full sound field oriented sound source positioning method and device

Info

Publication number
CN110716178A
Authority
CN
China
Prior art keywords
sound
microphone
main
auxiliary
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910874838.6A
Other languages
Chinese (zh)
Inventor
姚康
李保民
张燕
华中南
范文伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Intelligent Terminal Co Ltd
Original Assignee
Suning Intelligent Terminal Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Intelligent Terminal Co Ltd filed Critical Suning Intelligent Terminal Co Ltd
Priority to CN201910874838.6A
Publication of CN110716178A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S 5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S 5/20 Position of source determined by a plurality of spaced direction-finders
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 3/00 Circuits for transducers, loudspeakers or microphones
    • H04R 3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 3/00 Circuits for transducers, loudspeakers or microphones
    • H04R 3/04 Circuits for transducers, loudspeakers or microphones for correcting frequency response

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The embodiments of the invention disclose a full-sound-field-oriented sound source positioning method and device, relate to the technical field of acoustics, and can alleviate the abrupt loud-quiet fluctuations that occur when sound is subsequently extracted and separated from a single auxiliary Mic, so that the method can be better applied to large-scale scenes. The invention comprises the following steps: starting an auxiliary microphone after a main microphone in the acquisition device is started; collecting sound in the user's viewing direction through the main microphone, and shielding sound from the user's non-viewing directions; extracting and separating the main sound signal to obtain the main sound, and acquiring the auxiliary sound from the auxiliary sound signal; and playing the main sound and the auxiliary sound through a loudspeaker of the user terminal. The invention is suitable for sound source processing in large-scale scenes.

Description

Full sound field oriented sound source positioning method and device
Technical Field
The invention relates to the technical field of acoustics in AR/VR (augmented reality/virtual reality), and in particular to a full-sound-field-oriented sound source positioning method and device.
Background
With the development of AR/VR technology, technology companies around the world have released their own AR/VR hardware devices in rapid succession. The core idea of AR/VR technology is to help people accomplish tasks by overlaying virtual information onto the real environment. In particular, with the advent of the 5G era, eMBB (enhanced mobile broadband) is one of the important scenarios in the 5G standard defined by 3GPP; this scenario targets high-throughput mobile broadband services such as 3D and ultra-high-definition video, so the arrival of the 5G era has accelerated the development and application of AR/VR technology.
With 5G acting as a catalyst, 'virtual reality' alone is no longer enough for AR/VR applications to take hold; how to provide a more realistic 'live viewing' experience has become the main research objective, and an acoustic scheme synchronized with the image is a central research topic. At present, the video part of live viewing mainly uses a real-sense camera to capture a 3D picture, and the audio part uses the camera's output of X frames per second to approximate a simulated sound field. In short, a camera records live audio/video at a certain fixed point, and a delay is then added. However, this solution has the problem that the acquisition location is fixed: even when wearing the device, the user only experiences the match from one specific position, and the viewing angle is single.
In particular, for large sporting events such as football matches or marathons, viewing from multiple positions cannot be achieved. The reason is that the sound field of the venue is extremely noisy, and even with a real-sense camera the interference of other people's noise on the observer cannot be avoided. If a noise reduction algorithm is applied, the computation becomes extremely complex in such a complex sound field; moreover, the usual noise reduction process filters out a portion of the quieter sounds, so if the user's viewpoint is slightly far from the action, the method also filters out the effective sounds of the game being played on the field, such as the sound of the ball being struck. Therefore, the current 'live viewing' schemes are mainly applied to fixed-position scenes such as readings and concerts, and their practical range of application is limited.
Disclosure of Invention
The embodiments of the invention provide a full-sound-field-oriented sound source positioning method and device, which can alleviate the abrupt loud-quiet fluctuations that occur when sound is subsequently extracted and separated from a single auxiliary Mic, so that the method can be better applied to large-scale scenes.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
in a first aspect, an embodiment of the present invention provides a sound source localization method for full sound field orientation, including:
starting an auxiliary microphone after a main microphone in the acquisition equipment is started;
collecting sound in the user's viewing direction through the main microphone, and shielding sound from the user's non-viewing directions;
extracting and separating the main sound signal to obtain main sound, and acquiring auxiliary sound from the auxiliary sound signal;
and playing the main sound and the auxiliary sound through a loudspeaker of the user terminal.
In a second aspect, an embodiment of the present invention provides a sound source localization apparatus for full sound field orientation, including:
the microphone management unit is used for starting the auxiliary microphone after the main microphone is started;
the preprocessing unit is used for collecting sound in the user's viewing direction through the main microphone and shielding sound from the user's non-viewing directions;
the processing unit is used for extracting and separating the main sound signal to obtain main sound and acquiring auxiliary sound from the auxiliary sound signal;
and the transmission unit is used for playing the main sound and the auxiliary sound through a loudspeaker of the user terminal.
The embodiments of the invention mainly design a brand-new method in the area of audio spatialization (sound source positioning). The original MIC on the AR/VR device is redefined as a structure that rotates in a sector through the same angle as the user's head. By rotating automatically in the plane and by enlarging and widening the aperture and length of the sound pickup hole, sound from the viewing direction is admitted into the channel while other sound sources at non-viewing positions are shielded, and the functions relating scattering, frequency and incidence direction to the inner diameter and length of the pickup hole are remodeled (details below). Additional auxiliary MICs are added to work with the main MIC, which avoids the abrupt loud-quiet fluctuations that occur when sound is subsequently extracted and separated from a single auxiliary Mic. The user can watch the match as if present at the venue: as the head rotates, the sound effect at the corresponding position is perceived in real time, increasing the depth of immersion and giving the user an excellent experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a diagram of a possible hardware environment provided by an embodiment of the invention;
FIG. 2 is a schematic flow chart of a method provided by an embodiment of the present invention;
FIGS. 3-11 are schematic diagrams of embodiments provided by embodiments of the present invention;
fig. 12 and 13 are schematic diagrams of the device structure according to the embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or to elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only, serve to explain the present invention, and are not to be construed as limiting the present invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The method flow in this embodiment may be specifically implemented in a system as shown in fig. 1, where the system includes a sound source collecting device and a user terminal. Wherein the sound source collecting device may be a head-shoulder simulator having at least 3 microphones for collecting sound signals at different points of a site. The user terminal may be understood as a currently common AR/VR device, such as a head-mounted VR device, a smartphone with a speaker, and the like.
The embodiment of the invention provides a sound source positioning method for full sound field orientation, as shown in fig. 2, comprising the following steps:
and S101, after a main microphone in the acquisition equipment is started, starting an auxiliary microphone.
And S102, collecting sound in the user's viewing direction through the main microphone, and shielding sound from the user's non-viewing directions.
S103, extracting and separating the main sound signal to obtain main sound, and acquiring auxiliary sound from the auxiliary sound signal.
And S104, playing the main sound and the auxiliary sound through a loudspeaker of the user terminal.
In implementing this embodiment, a B&K 'head and torso simulator' (HATS, referred to below as the head-shoulder simulator; the experiments described in this embodiment were all verified on this head-shoulder simulator) meeting the international standard may be adopted. The sound source localization problem addressed here, oriented to the full free field, is mainly that of letting the user experience, through the rotation of the head, the sound effect at any spectator position in the venue as that position changes, and of realizing a method that, when the user looks at a certain point, responds to the sound source at that point of view while shielding the other, non-viewed points.
The specific principle is as follows: the sound wave propagates in free space, that is, in an infinite ideal medium whose boundary is at infinity, so that for the purpose of solving the sound field orientation the sound field can be regarded as a sphere.
A coordinate system is established with the center of the sphere as the origin: x = r·cosA (A is the angle between the radius through the target point and the x axis), y = r·cosB (B is the angle between that radius and the y axis), and z = r·cosC (C is the angle between that radius and the z axis). Taking a free sound field as an example, as shown in fig. 3, the free sound field is a spherical surface on which 'O' is the position of the listener and 'A, B, C' are three different sound sources, and the listener changes viewing angle in the horizontal plane. The problem to be solved by this embodiment is how to localize the 'A, B, C' sound sources, so that the auditory effect at the different positions 'A, B, C' can be experienced.
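To make the coordinate relations above concrete, the following short Python sketch (our own illustration, not part of the patent; the function name is an assumption) computes the Cartesian position of a source on a sphere of radius r from the angles A, B and C and checks that the direction cosines satisfy cos²A + cos²B + cos²C = 1:

```python
import math

def source_position(r, A, B, C, tol=1e-6):
    """Cartesian coordinates of a point on a sphere of radius r,
    given the angles A, B, C between its radius and the x, y, z axes."""
    ca, cb, cc = math.cos(A), math.cos(B), math.cos(C)
    # Direction cosines of any radius must satisfy ca^2 + cb^2 + cc^2 = 1.
    if abs(ca**2 + cb**2 + cc**2 - 1.0) > tol:
        raise ValueError("A, B, C are not a consistent set of direction angles")
    return r * ca, r * cb, r * cc

# Example: a source 50 m away, 60 deg from the x axis, 60 deg from y, 45 deg from z.
print(source_position(50.0, math.radians(60), math.radians(60), math.radians(45)))
```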
Specifically, the collecting, by the main microphone, the sound in the viewing direction of the user and shielding the sound in the non-viewing direction of the user includes:
the main microphone rotates with the acquisition equipment in a sector within a preset angle range, and main sound signals acquired by the main microphone are acquired in the sector rotating process.
The main microphone and the at least 2 auxiliary microphones are installed in the acquisition equipment, the auxiliary microphones are respectively installed at the positions of the left ear and the right ear, and the auxiliary microphones face the outside and are respectively used for acquiring the environment sounds on the left side and the right side. And in the process of the auxiliary sound signals collected by the at least 2 auxiliary microphones, the auxiliary sound signals do not rotate along with the fan surface of the main microphone.
Specifically, for example, the sound processing method implemented on the main microphone is as follows:
the primary microphone is designed as a protruding surface, using the principle of convex reflection, since almost all convexes have a scattering effect, they are important reflecting surfaces as diffusers, since for convexes r is always negative, as shown in fig. 4.
Substituting a negative value of r into the concave (spherical-mirror) equation

1/a + 1/b = 2/r

(with a the object distance and b the image distance), b also becomes negative. In summary, Q1 is the position of the user and S is the position of the main microphone: the sound wave emitted by the Q2 sound source enters the channel of the main microphone S, while other waveforms are reflected at the spherical surface, for example at points A and B. Conversely, if Q2 is not located at the position shown in the figure but somewhere else, its transmitted waveform is necessarily scattered, that is, it can be shielded.
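To make the sign argument concrete, the sketch below (our own illustration, assuming the standard spherical-mirror relation 1/a + 1/b = 2/r reconstructed above) solves for the image distance b and confirms that a negative radius r gives a negative b for any positive object distance a:

```python
def image_distance(a: float, r: float) -> float:
    """Solve 1/a + 1/b = 2/r for b (assumed spherical-mirror relation)."""
    return 1.0 / (2.0 / r - 1.0 / a)

# Convex surface: r < 0. For every positive object distance a, b comes out negative,
# i.e. the reflection is divergent (scattering), which is what shields off-axis sources.
for a in (0.5, 1.0, 5.0, 50.0):
    print(a, image_distance(a, r=-0.2))
```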
The main microphone is designed as a rotatable device mainly for collecting the viewed source, and the pickup hole is further widened in aperture. Here the aperture refers to the electro-acoustic transducer (microphone) that converts an acoustic signal into an electrical signal. For example, consider a receiving aperture of volume V and let x(t, r) denote the value of the signal at time t and spatial position r. If the impulse response of an infinitesimally small volume dV at position r of the receiving aperture is a(t, r), then the received signal can be expressed as a convolution:

x_R(t, r) = ∫ x(τ, r) · a(t − τ, r) dτ   (formula 1),

or in the frequency domain:

X_R(f, r) = X(f, r) · A(f, r)   (formula 2),

where A(f, r) is the aperture function, from which the corresponding response of the aperture at different spatial sizes can be obtained.
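As a numerical illustration of formulas 1 and 2 (our own sketch with made-up signals, not part of the patent), the time-domain convolution of a signal with an assumed aperture impulse response matches the inverse FFT of the product of their spectra:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)          # signal x(t) at one aperture point
a = np.exp(-np.arange(32) / 8.0)      # assumed aperture impulse response a(t)

# Formula 1: time-domain convolution.
x_r_time = np.convolve(x, a)

# Formula 2: multiplication of the spectra, zero-padded to the full output length.
n = len(x) + len(a) - 1
x_r_freq = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(a, n), n)

print(np.allclose(x_r_time, x_r_freq))   # True: the two forms agree
```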
The solid angle subtended by the receiving aperture differs for signals arriving from different directions; fig. 5 shows a linear aperture in one-dimensional space receiving plane-wave signals. The response of the aperture is a function of frequency and incidence direction, and by solving the wave equation it can be shown that the aperture response and the aperture function are related by a Fourier transform.
If the scene of the playing field is treated as a far-field condition, the aperture response can be written as

D_R(f, α) = F_r{ A_R(f, r_a) } = ∫ A_R(f, r_a) · e^(j2π α·r_a / λ) d r_a,

wherein:
F_r{ } is the three-dimensional Fourier transform;
r_a is the spatial location of a point on the aperture;
α is the direction vector of the wave, whose relationship to the angle parameters θ and φ is shown in figure 6.
The geometry in the figure can be simplified to a one-dimensional linear aperture along the x axis with aperture length L, as shown in fig. 7.
In the case of fig. 7, the product α·r_a reduces to α_x·x_a with α_x = sinθ·cosφ, and the aperture response simplifies to

D_R(f, α_x) = ∫ A_R(f, x_a) · e^(j2π α_x·x_a / λ) d x_a.

If expressed in terms of θ and φ, this becomes

D_R(f, θ, φ) = ∫ A_R(f, x_a) · e^(j2π sinθ·cosφ·x_a / λ) d x_a.

The above expressions are obtained under the plane-wave assumption and are therefore only applicable to far-field conditions such as a ball game. For a linear aperture, the far-field condition can be considered satisfied when equation 10 holds:

r > 2L²/λ   (equation 10),

where r is the distance from the sound source to the aperture and λ is the wavelength.
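A quick numerical check of the far-field criterion in equation 10 (our own sketch; the distances, aperture length and frequency are made-up example values):

```python
def far_field_ok(r_m: float, aperture_len_m: float, freq_hz: float, c: float = 343.0) -> bool:
    """True if a source at distance r_m satisfies the far-field condition r > 2*L^2/lambda."""
    wavelength = c / freq_hz
    return r_m > 2.0 * aperture_len_m ** 2 / wavelength

# A 24 mm pickup hole at 1 kHz: the far-field boundary is only a few millimetres,
# so a game happening tens of metres away is comfortably in the far field.
print(far_field_ok(r_m=30.0, aperture_len_m=0.024, freq_hz=1000.0))   # True
print(2.0 * 0.024 ** 2 / (343.0 / 1000.0))                            # boundary ~0.0034 m
```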
Considering a particular case: if, for a linear aperture, the aperture function does not vary with frequency or position, then the aperture function can be expressed as

A_R(x_a) = rect(x_a / L)   (formula 11),

where rect(x_a / L) equals 1 for |x_a| ≤ L/2 and 0 otherwise.
The resulting directivity is then

D_R(f, α_x) = F_r{ rect(x_a / L) }   (equation 13),

and evaluating the Fourier transform gives

D_R(f, α_x) = L · sinc(α_x · L / λ)   (equation 14),

where sinc(u) = sin(πu) / (πu).
In summary, the uniform aperture function and the corresponding directivity function are plotted in fig. 8, from which it can be seen that the zeros of the directivity function lie at α_x = mλ/L, where m is an integer. The directional range follows: the region −λ/L ≤ α_x ≤ λ/L is called the main lobe, and its extent is referred to as the beam width. The beam width of the linear aperture is therefore 2λ/L, which can also be written as 2c/(f·L); the beam width is thus inversely proportional to f·L. Hence, for a fixed aperture length, the higher the frequency, the narrower the beam width, as shown in fig. 9.
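The relation between beam width and f·L can be illustrated numerically (our own sketch; the aperture length and frequencies are made-up example values):

```python
C = 343.0  # speed of sound in air, m/s

def beam_width(freq_hz: float, aperture_len_m: float) -> float:
    """Main-lobe width 2*lambda/L = 2*c/(f*L) of a uniform linear aperture (in direction-cosine units)."""
    return 2.0 * C / (freq_hz * aperture_len_m)

# Fixed aperture length of 0.1 m: doubling the frequency halves the beam width.
for f in (500.0, 1000.0, 2000.0, 4000.0):
    print(f, beam_width(f, 0.1))
```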
Since normalization reflects the relative response to sound waves incident from different angles, the normalized aperture response is also considered. The sinc function satisfies −1 ≤ sinc(u) ≤ 1, so the maximum possible value of the directivity pattern is D_max = L.
The normalized aperture response is then

D_N(f, α_x) = D_R(f, α_x) / D_max = sinc(α_x · L / λ)   (formula 16),

which in the horizontal direction (θ = 90°, so that α_x = cosφ) can be expressed as

D_N(f, φ) = sinc((L/λ) · cosφ)   (formula 17).

From equation 17, the polar-coordinate expression in the horizontal direction can be obtained; the polar plots for four different values of L/λ, namely 0.5, 1, 2 and 4, are shown in fig. 10 below.
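The normalized pattern of formula 17 for the four L/λ values of fig. 10 can be sampled as follows (our own sketch; it relies on numpy's normalized sinc, sinc(u) = sin(πu)/(πu)):

```python
import numpy as np

def horizontal_pattern(l_over_lambda: float, phi_deg: np.ndarray) -> np.ndarray:
    """Normalized horizontal directivity sinc((L/lambda)*cos(phi)) of a uniform linear aperture."""
    phi = np.deg2rad(phi_deg)
    return np.sinc(l_over_lambda * np.cos(phi))   # np.sinc(u) = sin(pi*u)/(pi*u)

phi = np.arange(0.0, 361.0, 30.0)
for ratio in (0.5, 1.0, 2.0, 4.0):            # the four L/lambda cases of fig. 10
    print(ratio, np.round(horizontal_pattern(ratio, phi), 3))
```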
The characteristics of a linear aperture follow from formulas 1 to 17 above. Combining the linear-aperture characteristics used in the design of the main microphone with the horizontal-direction formula above, it follows that the aperture characteristics of a whole row of microphones, whether linear or equally spaced, depend on the following quantities: the number of sensors N, the spacing d between sensors, and the frequency f of the sound waves, since a discrete sensor array is an approximation of a continuous aperture. Note that the effective length of the sensor array is defined as the length of the corresponding continuous aperture, L = N·d, while the actual length of the array is d·(N − 1). Therefore, when the main-microphone pickup hole of the device is long enough and the sector rotates, the sound source in the viewing direction can be identified more accurately through the functions of scattering, frequency and incidence direction described with reference to figs. 3 and 4, and sound from the user's non-viewing directions can be shielded.
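A small sketch of these array relations (our own illustration; the values of N, d and f are made-up examples):

```python
C = 343.0  # speed of sound, m/s

def array_summary(n_sensors: int, spacing_m: float, freq_hz: float) -> dict:
    """Effective length L = N*d, actual length d*(N-1), and main-lobe width 2*lambda/L."""
    effective_len = n_sensors * spacing_m
    actual_len = spacing_m * (n_sensors - 1)
    wavelength = C / freq_hz
    return {
        "effective_length_m": effective_len,
        "actual_length_m": actual_len,
        "beam_width": 2.0 * wavelength / effective_len,
    }

print(array_summary(n_sensors=8, spacing_m=0.02, freq_hz=2000.0))
```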
In this embodiment, the method further includes: after an enabling signal is received, selecting a gear of the main microphone and starting the main microphone in the selected gear, wherein the main microphone is divided into at least 4 gears according to the pickup-hole diameter, namely the four gear specifications A (pickup-hole diameter 24 mm), B (pickup-hole diameter 12 mm), C (pickup-hole diameter 6 mm) and D (pickup-hole diameter 3 mm).
Specifically, this embodiment is applied to a playing-field environment. The sound pressure level of a playing field generally ranges from about 60 dB at the lowest to 110 dB at the highest. Conventional microphones generally come in four pickup-hole diameters, namely 24 mm, 12 mm, 6 mm and 3 mm; their frequency response can reach 20 Hz to 40 kHz, they can be regarded as approximately omnidirectional, and their sound-pressure-level measurement range is 30 dB to 140 dB. In short, the ambient sound pressure levels in question fall within the measurable range of the microphone.
On the sound source collection device, the main microphone records the sound source in all four gears simultaneously; the user terminal can receive a gear-shifting instruction entered by the user, and the gear of the main microphone is switched according to that instruction.
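A minimal sketch of this gear-switching logic (our own illustration; the class and method names are assumptions, not from the patent). All four gear channels are captured simultaneously, but only the selected gear's channel feeds the headphones:

```python
from dataclasses import dataclass, field

GEARS = {"A": 24, "B": 12, "C": 6, "D": 3}   # pickup-hole diameter in mm per gear

@dataclass
class MainMicRouter:
    selected: str = "D"                       # default gear (D, 3 mm), as in the operation flow below
    channels: dict = field(default_factory=lambda: {g: [] for g in GEARS})

    def capture(self, frames_per_gear: dict) -> None:
        """All four gear channels record simultaneously."""
        for gear, frames in frames_per_gear.items():
            self.channels[gear].append(frames)

    def switch_gear(self, gear: str) -> None:
        if gear not in GEARS:
            raise ValueError(f"unknown gear {gear!r}")
        self.selected = gear                  # only the selected channel is routed to the headphones

    def headphone_feed(self):
        return self.channels[self.selected]

router = MainMicRouter()
router.capture({"A": b"...", "B": b"...", "C": b"...", "D": b"..."})
router.switch_gear("B")                       # user shifts from the default D to B (12 mm)
print(len(router.headphone_feed()))
```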
For example: the operation process of the user can be as shown in fig. 11, including:
【001】 The user starts using the device.
【002】 The integrated main microphone starts to move along with the head-shoulder simulator and records sound.
【003】 The two auxiliary microphones located at the two ears of the head-shoulder simulator record, and the signal is transmitted to the user's headphones as an electro-acoustic signal.
【004】 After the integrated main microphone records in [002], the user keeps the default gear D; the electro-acoustic signal is passed to the user's headphones.
【005】 After the integrated main microphone records in [002], the user does not keep the default gear D.
【006】 When the user does not select the default gear, the other gears continue to record and remain mapped to their corresponding channels.
【007】 After the user selects another gear, the system automatically switches to the channel of that gear and connects it to the headphone channel for transmission; the channels of the gears not selected by the user are closed off from the headphones.
Correspondingly, during real-time acquisition on the sound source collection device, the flow [101] to [107] can be carried out (a code sketch of the channel mixing follows the list):
【101】 When the user uses the device, the real-time match-watching function is started.
【102】 The integrated main microphone of the device is started and begins working; the device defaults to one of the gears A, B, C, D (gear D by default, in keeping with the general assumption here that the playing-field area is large). Because the main microphone has four gear specifications, the microphones of all four specifications record at the same time, corresponding to four audio channels. The microphone pickup-hole diameters in current international standards are generally of four types, namely 24 mm, 12 mm, 6 mm and 3 mm, so the main microphone that rotates with the head is defined in four specifications, integrated as the four gears A (24 mm), B (12 mm), C (6 mm) and D (3 mm).
【103】 After the function is started, the two full-field auxiliary microphones of the device start working; they are located at the left and right ears of the head-shoulder simulator, face outward, and collect the ambient sound on the left and right sides of the simulator in real time.
【104】 When the head-shoulder simulator faces a certain area, the integrated main microphone turns with the change in the simulator's angle.
【105】 At this moment, the integrated main microphone converts the sound into an electro-acoustic signal and transmits it to the user's headphones.
【106】 Similarly, the two auxiliary microphones convert the sound from the left and right sides of the head into electro-acoustic signals and transmit them to the user's headphones.
【107】 Given that the default gear is D, when the user feels that the current impression does not meet the requirement of watching the match as if on site, the gear can be switched via the adjustment button, opening the corresponding channel.
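The patent does not specify how the channels are combined; the sketch below is our own minimal illustration of how the selected main-gear channel and the two fixed auxiliary channels might be mixed into a stereo headphone feed (the function name and the peak normalization are assumptions):

```python
import numpy as np

def headphone_frames(main_channels: dict, aux_left: np.ndarray, aux_right: np.ndarray,
                     selected_gear: str = "D") -> np.ndarray:
    """Mix the selected main-gear channel with the fixed left/right auxiliary channels
    into one stereo block (shape: samples x 2)."""
    main = main_channels[selected_gear]          # only the selected gear reaches the headphones
    left = main + aux_left                       # main (viewing-direction) sound plus left ambience
    right = main + aux_right                     # main sound plus right ambience
    stereo = np.stack([left, right], axis=1)
    peak = float(np.max(np.abs(stereo))) or 1.0
    return stereo / peak                         # simple peak normalization

# Made-up one-block example: four simultaneously recorded gear channels plus two aux channels.
n = 4
mains = {g: np.random.randn(n) for g in "ABCD"}
print(headphone_frames(mains, np.random.randn(n), np.random.randn(n), selected_gear="B").shape)
```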
The existing solutions in the industry today have some problems. The overall idea of 'live viewing' is to put 'virtual reality' plus 'on-site perception' in front of the user: through two wireless earbuds and an application, with sound effects including stereo and sound field differentiation, the user hears a more powerful audio effect than usual when watching a ball game. The video part of this technology mainly uses a series of sensing elements, including a 1080p high-definition camera, an infrared camera and an infrared laser projector, to capture a 3D picture; the audio part is output by a real-sense camera at X frames per second to approximate a simulated sound field. In short, a camera records live audio/video at a certain fixed point, and a delay is then added.
The disadvantage of this technique, considering its implementation and the objectives to be achieved, is that the acquisition location is fixed: even when wearing the device, the user only experiences the match from one specific position, and the viewing angle is single. In particular, for large sporting events such as football matches or marathons, multi-position viewing cannot be achieved. Secondly, for sports with a lively atmosphere such as rugby or basketball, the sound field of the venue is extremely noisy, and even with a real-sense camera the interference of crowd noise cannot be avoided. If a noise reduction algorithm is applied, the computation becomes extremely complex in such a complex sound field; and since noise reduction filters out a portion of the quieter sounds, if the user's viewpoint is slightly far from the action the method will also filter out the effective sounds of the game being played on the field, such as the sound of the ball being struck.
This embodiment designs a new approach to audio spatialization (sound source localization). The original MIC on the AR/VR device is redefined as a structure that rotates in a sector through the same angle as the user's head. By rotating automatically in the plane and by enlarging and widening the aperture and length of the sound pickup hole, sound from the viewing direction is admitted into the channel while other sound sources at non-viewing positions are shielded, and the functions relating scattering, frequency and incidence direction to the inner diameter and length of the pickup hole are remodeled (as described above). Additional auxiliary MICs are added to work with it. The original MIC is designed as a pickup device that can rotate in a sector, collecting sound in the user's viewing direction and shielding other, non-viewed sound. The auxiliary MICs are 2 omnidirectional auxiliary Mics that collect the sound around the user's position in real time and do not rotate with the main MIC. After the user puts on the device, when the user's target orientation is point A of the playing field, the sound in the user's viewing direction is extracted and separated from the sound recorded by the main MIC within the sector covering point A, and the 2 omnidirectional auxiliary Mics additionally record the sound around the user, which avoids the abrupt loud-quiet fluctuations that occur when sound is subsequently extracted and separated from a single auxiliary Mic. The user can watch the match as if present at the venue: as the head rotates, the sound effect at the corresponding position is perceived in real time, increasing the depth of immersion and giving the user an excellent experience.
An embodiment of the present invention further provides a sound source positioning device for full sound field orientation, as shown in fig. 12, including:
and the microphone management unit is used for starting the auxiliary microphone after the main microphone is started.
And the preprocessing unit is used for collecting sound in the user's viewing direction through the main microphone and shielding sound from the user's non-viewing directions.
And the processing unit is used for extracting and separating the main sound signal to obtain main sound and acquiring auxiliary sound from the auxiliary sound signal.
And the transmission unit is used for playing the main sound and the auxiliary sound through a loudspeaker of the user terminal.
Specifically, one main microphone and at least 2 auxiliary microphones are installed in the sound source localization apparatus; at least one auxiliary microphone is installed at each of the left-ear and right-ear positions, facing outward, to collect the ambient sound on the left and right sides respectively.
While the auxiliary sound signals are being collected by the at least 2 auxiliary microphones, the auxiliary microphones do not rotate with the sector plane of the main microphone.
The preprocessing unit is specifically configured to perform sector rotation of the main microphone along with the acquisition device within a preset angle range, and acquire a main sound signal acquired by the main microphone in the sector rotation process.
Further, as shown in fig. 13, the apparatus further includes:
and the gear switching unit is used for selecting the gear of the main microphone and starting the main microphone with the selected gear after receiving the starting signal.
Wherein the main microphone is divided into at least 4 gears according to the pickup-hole diameter, namely the four gear specifications A (pickup-hole diameter 24 mm), B (pickup-hole diameter 12 mm), C (pickup-hole diameter 6 mm) and D (pickup-hole diameter 3 mm).
And the receiving unit is used for receiving a gear shifting instruction input by a user to the user terminal and switching the gear of the main microphone according to the gear shifting instruction.
This embodiment designs a new approach to audio spatialization (sound source localization). The original MIC on the AR/VR device is redefined as a structure that rotates in a sector through the same angle as the user's head. By rotating automatically in the plane and by enlarging and widening the aperture and length of the sound pickup hole, sound from the viewing direction is admitted into the channel while other sound sources at non-viewing positions are shielded, and the functions relating scattering, frequency and incidence direction to the inner diameter and length of the pickup hole are remodeled (as described above). Additional auxiliary MICs are added to work with it. The original MIC is designed as a pickup device that can rotate in a sector, collecting sound in the user's viewing direction and shielding other, non-viewed sound. The auxiliary MICs are 2 omnidirectional auxiliary Mics that collect the sound around the user's position in real time and do not rotate with the main MIC. After the user puts on the device, when the user's target orientation is point A of the playing field, the sound in the user's viewing direction is extracted and separated from the sound recorded by the main MIC within the sector covering point A, and the 2 omnidirectional auxiliary Mics additionally record the sound around the user, which avoids the abrupt loud-quiet fluctuations that occur when sound is subsequently extracted and separated from a single auxiliary Mic. The user can watch the match as if present at the venue: as the head rotates, the sound effect at the corresponding position is perceived in real time, increasing the depth of immersion and giving the user an excellent experience.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for positioning a sound source with full sound field orientation, comprising:
starting an auxiliary microphone after a main microphone in the acquisition equipment is started;
collecting sound in the user's viewing direction through the main microphone, and shielding sound from the user's non-viewing directions;
extracting and separating the main sound signal to obtain main sound, and acquiring auxiliary sound from the auxiliary sound signal;
and playing the main sound and the auxiliary sound through a loudspeaker of the user terminal.
2. The method of claim 1, wherein the collecting, by the main microphone, of sound in the user's viewing direction and the shielding of sound from the user's non-viewing directions comprises:
the main microphone rotates in a sector, together with the acquisition device, within a preset angle range, and the main sound signal collected by the main microphone is acquired during the sector rotation.
3. The method of claim 2, wherein one main microphone and at least 2 auxiliary microphones are installed in the collection device, at least one auxiliary microphone is installed at each of the left-ear and right-ear positions, and the auxiliary microphones face outward for collecting the ambient sounds of the left side and the right side, respectively;
wherein, while the auxiliary sound signals are collected by the at least 2 auxiliary microphones, the auxiliary microphones do not rotate with the sector plane of the main microphone.
4. The method of claim 1, further comprising:
after an enabling signal is received, selecting a gear of the main microphone and starting the main microphone in the selected gear, wherein the main microphone is divided into at least 4 gears according to the pickup-hole diameter, namely the four gear specifications A (pickup-hole diameter 24 mm), B (pickup-hole diameter 12 mm), C (pickup-hole diameter 6 mm) and D (pickup-hole diameter 3 mm).
5. The method of claim 1, further comprising:
and receiving a gear shifting instruction input by a user into the user terminal, and switching the gear of the main microphone according to the gear shifting instruction.
6. A full-sound-field-oriented sound source localization apparatus, characterized by comprising:
the microphone management unit is used for starting the auxiliary microphone after the main microphone is started;
the preprocessing unit is used for collecting sound in the user's viewing direction through the main microphone and shielding sound from the user's non-viewing directions;
the processing unit is used for extracting and separating the main sound signal to obtain main sound and acquiring auxiliary sound from the auxiliary sound signal;
and the transmission unit is used for playing the main sound and the auxiliary sound through a loudspeaker of the user terminal.
7. The apparatus according to claim 6, wherein one main microphone and at least 2 auxiliary microphones are installed in the sound source localization apparatus, at least one auxiliary microphone is installed at each of the left-ear and right-ear positions, and the auxiliary microphones face outward for collecting the ambient sounds of the left side and the right side, respectively;
wherein, while the auxiliary sound signals are collected by the at least 2 auxiliary microphones, the auxiliary microphones do not rotate with the sector plane of the main microphone.
8. The apparatus according to claim 7, wherein the preprocessing unit is specifically configured to cause the main microphone to rotate in a sector, together with the collecting device, within a preset angle range, and to acquire, during the sector rotation, the main sound signal collected by the main microphone.
9. The apparatus of claim 6, further comprising:
the gear switching unit is used for selecting the gear of the main microphone after receiving the starting signal and starting the main microphone with the selected gear;
wherein the main microphone is divided into at least 4 gears according to the pickup-hole diameter, namely the four gear specifications A (pickup-hole diameter 24 mm), B (pickup-hole diameter 12 mm), C (pickup-hole diameter 6 mm) and D (pickup-hole diameter 3 mm).
10. The apparatus of claim 9, further comprising:
and the receiving unit is used for receiving a gear shifting instruction input by a user to the user terminal and switching the gear of the main microphone according to the gear shifting instruction.
CN201910874838.6A 2019-09-17 2019-09-17 Full sound field oriented sound source positioning method and device Pending CN110716178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910874838.6A CN110716178A (en) 2019-09-17 2019-09-17 Full sound field oriented sound source positioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910874838.6A CN110716178A (en) 2019-09-17 2019-09-17 Full sound field oriented sound source positioning method and device

Publications (1)

Publication Number Publication Date
CN110716178A true CN110716178A (en) 2020-01-21

Family

ID=69209868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910874838.6A Pending CN110716178A (en) 2019-09-17 2019-09-17 Full sound field oriented sound source positioning method and device

Country Status (1)

Country Link
CN (1) CN110716178A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130044884A1 (en) * 2010-11-19 2013-02-21 Nokia Corporation Apparatus and Method for Multi-Channel Signal Playback
US8983089B1 (en) * 2011-11-28 2015-03-17 Rawles Llc Sound source localization using multiple microphone arrays
CN104554049A (en) * 2013-10-14 2015-04-29 现代自动车株式会社 Wearable computer
CN104035065A (en) * 2014-06-23 2014-09-10 河北工业大学 Sound source orienting device on basis of active rotation and method for applying sound source orienting device
CN106210219A (en) * 2015-05-06 2016-12-07 小米科技有限责任公司 Noise-reduction method and device
CN106886010A (en) * 2017-01-17 2017-06-23 南京航空航天大学 A kind of sound bearing recognition methods based on mini microphone array
CN108022601A (en) * 2017-11-29 2018-05-11 歌尔股份有限公司 The real-time based reminding method of virtual reality and virtual reality device
CN110278512A (en) * 2018-03-13 2019-09-24 中兴通讯股份有限公司 Pick up facility, method of outputting acoustic sound, device, storage medium and electronic device
CN109168074A (en) * 2018-10-25 2019-01-08 苏宁智能终端有限公司 A kind of video broadcasting method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
石婷 (Shi Ting): "麦克风阵列声源定位算法研究综述" [A review of sound source localization algorithms based on microphone arrays], 《科技视界》 (Science & Technology Vision) *

Similar Documents

Publication Publication Date Title
US9706292B2 (en) Audio camera using microphone arrays for real time capture of audio images and method for jointly processing the audio images with video images
US9613610B2 (en) Directional sound masking
CN106416304B (en) For the spatial impression of the enhancing of home audio
US7532734B2 (en) Headphone for spatial sound reproduction
US8340315B2 (en) Assembly, system and method for acoustic transducers
US20050080616A1 (en) Recording a three dimensional auditory scene and reproducing it for the individual listener
JP2019514293A (en) Spatial audio processing to emphasize sound sources close to the focal distance
JP2008543144A (en) Acoustic signal apparatus, system, and method
GB2542112A (en) Capturing sound
CN102860041A (en) Loudspeakers with position tracking
CN104010265A (en) Audio space rendering device and method
US20230058952A1 (en) Audio apparatus and method of operation therefor
JP2010206451A (en) Speaker with camera, signal processing apparatus, and av system
JP2010206451A5 (en)
US10652687B2 (en) Methods and devices for user detection based spatial audio playback
JP5697079B2 (en) Sound reproduction system, sound reproduction device, and sound reproduction method
CN107249166A (en) A kind of earphone stereo realization method and system of complete immersion
US11917394B1 (en) System and method for reducing noise in binaural or stereo audio
US11902754B2 (en) Audio processing method, apparatus, electronic device and storage medium
CN110716178A (en) Full sound field oriented sound source positioning method and device
WO2019174442A1 (en) Adapterization equipment, voice output method, device, storage medium and electronic device
CN108574925A (en) The method and apparatus that audio signal output is controlled in virtual auditory environment
RU2797362C2 (en) Audio device and method of its operation
CN110620982A (en) Method for audio playback in a hearing aid
WO2023061130A1 (en) Earphone, user device and signal processing method

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200121