Disclosure of Invention
The embodiments of the application provide a sound source localization method, a sound source localization apparatus, a storage medium and a computer device, which can quickly and timely locate the position of a sound source, improve the localization resolution and precision of the sound source, and improve the accuracy of the localization result. The technical scheme is as follows:
In a first aspect, an embodiment of the present application provides a sound source localization method, where the method includes:
collecting n paths of audio data of characteristic sound through a pickup array; the pickup array comprises 1 central array element and n-1 adjacent array elements, wherein n is an integer greater than 1;
recognizing the sound type of the characteristic sound according to a pre-trained characteristic sound judgment model;
if the sound type is a preset type, calculating the arrival time difference between the audio data acquired by the n-1 adjacent array elements and the audio data acquired by the central array element;
and calculating the azimuth angle of the characteristic sound according to the arrival time difference.
With reference to the first aspect, in certain implementations of the first aspect, the sound pickup array includes 1 central array element and 6 adjacent array elements distributed at equal intervals on the circumference, where the central array element is located on the center of the circle.
With reference to the first aspect, in certain implementations of the first aspect, the identifying, according to a pre-trained feature sound determination model, a sound type of the feature sound includes:
loading a mean matrix, a covariance matrix and a weight of the Gaussian mixture model; wherein, the mean matrix, the covariance matrix and the weight are obtained by simulation;
calculating a feature matrix of each frame of data in the audio data;
calculating the probability value of each frame of data according to the parameters of the Gaussian mixture model and the characteristic matrix;
determining, through a threshold decision on the probability value, whether the sound type of the characteristic sound belongs to any sound type in a preset sound type library;
if so, identifying the sound type of the characteristic sound according to the probability value and the characteristic matrix.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes:
and if the sound type of the characteristic sound does not belong to any sound type in the preset sound type library, adding the sound type of the characteristic sound to the preset sound type library.
With reference to the first aspect, in certain implementations of the first aspect, the calculating a time difference of arrival between the audio data acquired by the n-1 adjacent array elements and the audio data acquired by the central array element includes:
performing analog-to-digital conversion on the audio data;
performing framing processing and interpolation processing on the converted audio data;
performing fast Fourier transform processing on the processed audio data;
performing cross-power spectrum analysis on the converted audio data;
weighting the analyzed data;
performing fast Fourier inverse transformation on the processed data;
carrying out peak value detection on the transformed audio data to obtain a maximum value point;
and calculating the arrival time difference between the audio data collected by the n-1 adjacent array elements and the audio data collected by the central array element according to the maximum value point.
With reference to the first aspect, in certain implementations of the first aspect, the interpolation processing includes: Newton interpolation, Hermite interpolation, Lagrange interpolation, spline interpolation, or linear interpolation.
With reference to the first aspect, in certain implementations of the first aspect, the calculating an azimuth angle of the characteristic sound according to the time difference of arrival includes:
the azimuth angle is calculated according to the following formula:

$$\theta = \arccos\left(\frac{\mu T_s c}{d}\right)$$

where μ represents the number of sampling points of the audio data corresponding to the arrival time difference, Ts represents the sampling period, c represents the propagation speed of the characteristic sound, and d represents the distance between the central array element and the adjacent array element.
In a second aspect, an embodiment of the present application provides a sound source positioning device, including:
the acquisition unit is used for acquiring n paths of audio data through the pickup array; the pickup array comprises 1 central array element and n-1 adjacent array elements, wherein n is an integer greater than 1;
the recognition unit is used for recognizing the sound type of the audio data according to a pre-trained characteristic sound judgment model;
the time delay calculation unit is used for calculating the arrival time difference between the audio data acquired by the n-1 adjacent array elements and the audio data acquired by the central array element if the sound type is a preset type;
and the angle calculation unit is used for calculating the azimuth angle of the characteristic sound according to the arrival time difference.
In a third aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides a computer device, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The beneficial effects brought by the technical scheme provided by some embodiments of the application at least comprise:
in the embodiment, the azimuth angle of the sound source is determined by adopting a calculation method based on the arrival time difference, so that the calculation amount is greatly reduced, and the characteristic sound can be positioned in real time. Moreover, time domain interpolation is carried out on the audio data of the characteristic sound, so that the sampling rate of the audio data is improved, more sampling interval points are obtained, and the calculation precision of the arrival time difference of the characteristic sound in the subsequent calculation is improved.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be noted that the sound source localization method provided by the present application is generally executed by a computer device, and accordingly, the sound source localization apparatus is generally disposed in the computer device.
Computer devices include, but are not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The computer device may also be provided with a display device and a camera; the display device may be any device capable of realizing a display function, and the camera is used for collecting a video stream. For example, the display device may be a cathode ray tube (CRT) display, a light-emitting diode (LED) display, an electronic ink screen, a liquid crystal display (LCD), a plasma display panel (PDP), or the like. The user can view displayed information such as text, pictures, and videos on the display device of the computer device.
The sound source localization method provided by the embodiment of the present application will be described in detail below with reference to fig. 1. The sound source positioning device in the embodiment of the present application may be a computer device.
Referring to fig. 1, a schematic flow chart of a sound source localization method according to an embodiment of the present application is provided. As shown in fig. 1, the method of the embodiment of the present application may include the steps of:
S101, collecting n paths of audio data of characteristic sound through a sound collecting array.
The characteristic sound is sound emitted by a sound source. After propagating through the air, the characteristic sound is collected by the pickup array to obtain n paths of audio data, which are analog signals. The pickup array comprises a plurality of microphones and is used for collecting the n paths of audio data; it comprises 1 central array element and n-1 adjacent array elements, where n is an integer greater than 1.
The pickup array of the present application may also be referred to as an acoustic array, a sensor array, a microphone array, or another audio acquisition array. The pickup array includes n array elements (that is, audio acquisition units), and the position distribution of the n array elements can be determined according to actual requirements.
In a possible embodiment, referring to fig. 2, the pickup array comprises 1 central array element and 6 adjacent array elements distributed on the circumference at equal intervals (i.e., circumferential array elements); the distances between the 6 circumferential array elements and the central array element located at the center of the circle are therefore all equal to the radius of the circle. The sound source S emits the characteristic sound; the distance between the central array element Mic1 and the sound source S is R1, the distance between the circumferential array element Mic2 and the sound source S is R2, the distance between the circumferential array element Mic3 and the sound source S is R3, and the azimuth angle is θ.
The layout of the sound pickup array is not limited to the circumferential distribution in fig. 2, and may be a rectangular distribution, a square distribution, or other distribution, which is not limited in the present application.
In this embodiment, the characteristic sound may be a whistle sound, a tracked vehicle traveling sound, a modified vehicle sound, an explosion sound, a collision sound, an alarm whistle sound, a gunshot sound, and the like, which is not limited in this application.
In this embodiment, the distance between each circumferential array element and the central array element in the pickup array is 0.043 m, that is, the diameter of the pickup array is only 0.086 m (8.6 cm), which realizes miniaturization of the positioning device and makes it convenient to carry.
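By way of illustration only, the geometry of such a seven-element array can be written down directly (a minimal Python/NumPy sketch; the coordinate convention and variable names are illustrative assumptions, not part of the claimed device):

```python
import numpy as np

RADIUS = 0.043  # m, spacing between the central element and each circumferential element

def seven_element_array(radius=RADIUS):
    """Return (7, 2) coordinates: Mic1 at the center, Mic2..Mic7 on the circle,
    spaced 60 degrees apart, matching the layout described for fig. 2."""
    angles = np.deg2rad(np.arange(0, 360, 60))             # 0, 60, ..., 300 degrees
    circle = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.vstack([[0.0, 0.0], circle])                 # row 0 is the central element

mics = seven_element_array()
print(mics.shape)  # (7, 2): one central element plus six circumferential elements
```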
And S102, recognizing the voice type of the characteristic voice according to the pre-trained characteristic voice judgment model.
After n paths of audio data of the characteristic sound are acquired through the pickup array, whether the currently acquired audio data are of a sound type needing to be identified is judged firstly, so that the type or the category of the characteristic sound is determined.
In the embodiment of the application, the type of characteristic sound to which the audio data belong can be judged through a pre-trained characteristic sound judgment model. In addition, the sound type of the characteristic sound may be identified by other methods, which are not limited in the present application.
For example, three parameters of a Gaussian mixture model (GMM), namely a mean matrix, a covariance matrix and a weight, are loaded in advance; then a feature matrix of each frame of data is calculated; a probability value of each frame of data is calculated according to the three pre-loaded parameters of the Gaussian mixture model and the calculated feature matrix; and finally a threshold decision is made on the obtained probability value to judge whether the current characteristic sound belongs to a sound type to be identified (for example, gunshot, tracked vehicle traveling sound, car whistling sound, etc.). When the judgment result is true, the category of the characteristic sound is identified by combining the probability value and the feature matrix; if the judgment result is false, the step of calculating the feature matrix of each frame of data is re-entered for the next frame.
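By way of illustration, the per-frame GMM scoring and the threshold decision described above can be sketched as follows (Python/NumPy; the diagonal-covariance form, the feature layout and the threshold value are illustrative assumptions, not taken from this application):

```python
import numpy as np

def gmm_avg_log_likelihood(frames, means, variances, weights):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames:    (T, D) feature matrix, one row per frame
    means:     (K, D) component means loaded from the pre-trained model
    variances: (K, D) diagonal covariances
    weights:   (K,)   mixture weights
    """
    diff = frames[:, None, :] - means[None, :, :]                    # (T, K, D)
    log_norm = -0.5 * (np.sum(np.log(2 * np.pi * variances), axis=1)
                       + np.sum(diff ** 2 / variances[None, :, :], axis=2))
    log_wn = np.log(weights)[None, :] + log_norm                     # (T, K)
    m = log_wn.max(axis=1, keepdims=True)                            # log-sum-exp
    return float(np.mean(m[:, 0] + np.log(np.exp(log_wn - m).sum(axis=1))))

def classify(frames, models, threshold=-60.0):
    """Score every candidate sound type and keep the best one if it clears
    the threshold (the threshold value here is an illustrative assumption)."""
    scores = {name: gmm_avg_log_likelihood(frames, *params)
              for name, params in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None  # None -> re-enter detection
```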
It should be understood that, by using the gaussian mixture model to detect the sound type of the characteristic sound, the power consumption can be effectively reduced for a miniaturized positioning device using the characteristic sound source positioning method.
Optionally, if the sound type of the collected characteristic sound is found by the detection not to match any sound type of the characteristic sounds to be identified, a new sound type can be added to the sound type library, thereby updating the sound type library.
Here, the sound type library of the characteristic sound may be updated in an active manner or in a passive manner. The present application does not limit the manner in which the sound type library of the characteristic sound is updated.
As an example, in the active manner, the program of the device may update the sound type library of the characteristic sound automatically; in the passive manner, the update may be controlled by a user or an operator through a local human-computer interaction device.
Alternatively, the user or the operator may confirm whether or not to update the sound type library of the characteristic sound to the characteristic sound source localization apparatus through a device such as a keyboard, a mouse, a control panel (button), a touch panel, or the like.
For example, the user or operator may indicate a need to update the sound type library of the feature sound by mouse clicking on a corresponding icon on the display screen.
For another example, the user or operator may indicate that an update to the sound type library of the feature sound is required by pressing a corresponding button.
For another example, the user or the operator may indicate that the sound type library of the characteristic sound needs to be updated by clicking a corresponding icon on the touch screen with a finger.
Alternatively, the sound source localization apparatus may acquire the indication information from the remote device.
For example, the sound source positioning device may be in communication connection (e.g., wireless connection) with the remote device, and the sound source positioning device may receive a message sent by the remote device, parse and read the message, where the message carries indication information of whether to update the sound type library of the feature sound.
Optionally, the remote device may be a control host or a cloud server, and may also be an intelligent electronic product such as a mobile phone and a tablet computer.
Optionally, when the detected audio data is not any one of the characteristic sound types that need to be identified, the audio data is re-acquired and the next characteristic sound type detection is performed.
S103, if the sound type is a preset type, calculating the arrival time difference between the audio data acquired by the n-1 adjacent array elements and the audio data acquired by the central array element.
After the audio data of the characteristic sound are detected to be a certain type of characteristic sound, the audio data can be preprocessed by framing.
For example, a frame is specified as 256 sampling points, which is approximately 5 milliseconds at a 48 kHz sampling rate. In order to maintain continuity, adjacent frames overlap (this overlap is the frame shift), and each frame is then multiplied by a Hamming window.
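A minimal sketch of this framing step (Python/NumPy; the 50% frame shift is an illustrative assumption, since the application only states that adjacent frames overlap):

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames and apply a Hamming window.
    frame_len=256 samples is roughly 5 ms at a 48 kHz sampling rate; hop < frame_len
    gives the overlap (frame shift) mentioned above."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

frames = frame_signal(np.random.randn(48000))  # one second of audio at 48 kHz
print(frames.shape)                            # (374, 256)
```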
In the present application, after the audio data of the characteristic sound are preprocessed by framing and windowing, interpolation processing is performed, and the arrival time difference of the characteristic sound is then calculated.
Time-domain interpolation of the audio data can increase the sampling rate of the audio data and yield more sampling interval points, which meets the high-precision requirement of the calculation, further improves the calculation precision of the arrival time difference of the characteristic sound in the subsequent calculation, and reduces the error of the spatial resolution angle of the pickup array when the positioning device localizes the sound source.
Optionally, the interpolation processing adopted in the embodiment of the present application may include any one of the following methods: Newton interpolation, Hermite interpolation, Lagrange interpolation, spline interpolation, linear interpolation, etc.
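By way of illustration, the time-domain interpolation can be sketched as integer-factor upsampling with linear interpolation (Python/NumPy; the factor of 4 is an illustrative assumption, and any of the listed interpolation methods could be substituted):

```python
import numpy as np

def upsample_linear(x, factor=4):
    """Linearly interpolate a 1-D signal to `factor` times its original rate,
    producing more sampling points between the original ones."""
    n = len(x)
    t_old = np.arange(n)
    t_new = np.arange((n - 1) * factor + 1) / factor
    return np.interp(t_new, t_old, x)

x = np.sin(2 * np.pi * 1000 * np.arange(480) / 48000)  # 1 kHz tone at 48 kHz
y = upsample_linear(x)  # effective rate 192 kHz -> finer TDOA resolution
```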
In the embodiment of the application, when calculating the arrival time difference of the characteristic sound, a plurality of arrival time differences between the characteristic sound arriving at each circumferential array element and arriving at the central array element can be calculated from the known distance between the circumferential array elements and the central array element in the pickup array.
For example, as shown in fig. 2, the arrival time differences between the characteristic sound S arriving at the central array element (Mic1) and arriving at each of the circumferential array elements (Mic2), (Mic3), (Mic4), (Mic5), (Mic6) and (Mic7) are calculated respectively.
In the present application, because the array elements are located at different positions relative to the characteristic sound, there is a certain time difference between the moments at which the array elements receive the audio data of the characteristic sound, and the estimation accuracy of the arrival time difference directly influences the positioning accuracy of the characteristic sound.
In the present application, in order to optimize the performance of the arrival time difference estimation, the arrival time difference may be estimated by the generalized cross-correlation phase transform (GCC-PHAT) method. The essence of the generalized cross-correlation (GCC) is to weight the cross-power spectrum of the received sound signals, so as to highlight the frequency components with high signal-to-noise ratio, weaken the noise, and sharpen the correlation peak.
Specifically, after analog-to-digital conversion, framing/windowing and interpolation are performed on the audio data collected by each array element in the pickup array, a fast Fourier transform (FFT) is applied to the audio data, cross-power spectrum analysis is performed on the result, the spectrum is weighted, an inverse fast Fourier transform (IFFT) is applied, a maximum value point is found through peak detection, and finally the arrival time difference between the audio data collected by the array elements is estimated from this point.
Here, the fast Fourier transform (FFT) transforms the sound source signal from the time domain to the frequency domain, which facilitates analysis of the signal characteristics. The cross-power spectrum is obtained by conjugate multiplication of the two FFT-transformed signals, yielding the cross-power spectrum of the basic frame. The purpose of the frequency-domain weighting is to whiten the signal and the noise, enhancing the frequency components with high signal-to-noise ratio and suppressing the noise. An inverse fast Fourier transform (IFFT) is then applied to the weighted, peak-sharpened signal, and peak detection is performed on the output data: the maximum modulus value corresponds to the delay position, and its offset in sampling points within the audio data frame, divided by the sampling rate, gives the arrival time difference.
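A minimal sketch of this GCC-PHAT pipeline (Python/NumPy; it assumes two already framed and interpolated channels at a common sampling rate fs and omits the per-frame loop):

```python
import numpy as np

def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
    """Estimate the arrival time difference of `sig` relative to `ref` with the
    generalized cross-correlation / phase transform (GCC-PHAT)."""
    n = len(sig) + len(ref)                 # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12          # PHAT weighting (whitening)
    cc = np.fft.irfft(cross, n=n)           # back to the time domain
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    peak = np.argmax(np.abs(cc))            # peak detection: maximum modulus point
    return (peak - max_shift) / fs          # offset in samples -> seconds

# Example: the first channel lags the reference by 10 samples.
fs = 48000
x = np.random.randn(4096)
tau = gcc_phat_tdoa(np.roll(x, 10), x, fs)
print(tau * fs)  # ~10.0
```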
Alternatively, in the embodiment of the present application, the weighting function may be the basic cross-correlation (CC), the smoothed coherence transform (SCOT), the maximum likelihood (ML) weighting, the phase transform (PHAT) whitening filter, or the like.
And S104, calculating the azimuth angle of the characteristic sound according to the arrival time difference.
In the present application, because the distance from the sound source to the pickup array is generally not fixed, different models can be used for different sound fields.
Specifically, the sound field of the pickup array is divided into a near field and a far field. In the far-field model, since the sound source is far from the pickup array, the paths between the sound source and the array elements can be regarded as parallel lines. In the near-field model, the sound source is regarded as a point, and the lines connecting the sound source with any two array elements of the pickup array form a triangle. Therefore, the angle calculation methods of the two models differ. Experiments show that the calculation amount of the far-field model is smaller than that of the near-field model.
For a uniform linear array, the boundary between the far field and the near field can be determined by the following formula:

$$l = \frac{2d^2}{\lambda}, \qquad \lambda = \frac{c}{f}$$

where d denotes the spacing between the circumferential array element and the central array element, λ denotes the wavelength of the characteristic sound, l denotes the boundary distance from the sound source to the pickup array, f denotes the highest frequency of the characteristic sound, and c denotes the propagation speed of the sound.
When the distance from the sound source to the pickup array is larger than l, the far-field model can be used; conversely, the near-field model is used.
For example, according to the sampling theorem, when the sampling rate is 48 kHz and the array elements are spaced 0.043 m apart, the far-field boundary distance is about 0.26 m; that is, under the current conditions, a sound source farther than 0.26 m from the array can be treated with the far-field model.
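Under the stated assumptions (taking the propagation speed of sound as approximately 343 m/s), this boundary can be checked as a worked computation:

$$\lambda_{\min} = \frac{2c}{f_s} = \frac{2 \times 343}{48000} \approx 0.0143\ \text{m}, \qquad l = \frac{2d^2}{\lambda_{\min}} = \frac{2 \times (0.043)^2}{0.0143} \approx 0.26\ \text{m}$$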
Fig. 3 is a schematic diagram of determining the azimuth angle of a characteristic sound according to an embodiment of the present application.
In the application, the arrival time difference of the characteristic sound can be converted into a sound angle value through a geometric calculation method.
Specifically, as shown in fig. 3, the paths from the characteristic sound to the array elements are regarded as parallel lines in the far-field model. When the characteristic sound is incident on the microphone array at an angle θ and the distance between the circumferential array element and the central array element is d, it follows from the properties of the trigonometric functions that the characteristic sound travels a distance d·cosθ farther to reach the central array element (M1) than to reach the circumferential array element (M2).
When the propagation speed of the characteristic sound is known to be c, the arrival time difference of the audio data of the characteristic sound received by the circumferential array element (M2) can be calculated as:

$$\Delta t = \frac{d\cos\theta}{c}$$

where Δt represents the arrival time difference between the audio data received by the circumferential array element and the audio data received by the central array element.
Further, by transforming the above formula, the following formula can be obtained:

$$\theta = \arccos\left(\frac{c\,\Delta t}{d}\right)$$

That is, once the arrival time difference of the characteristic sound is estimated, the incident angle of the characteristic sound, that is, its azimuth angle, can be obtained directly.
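A one-function sketch of this conversion (Python/NumPy; the propagation speed c = 343 m/s and the spacing d = 0.043 m are values assumed here for illustration):

```python
import numpy as np

def azimuth_from_tdoa(delta_t, d=0.043, c=343.0):
    """theta = arccos(c * delta_t / d), returned in degrees; the argument is
    clipped to [-1, 1] so estimation noise cannot leave arccos's domain."""
    return np.degrees(np.arccos(np.clip(c * delta_t / d, -1.0, 1.0)))

print(azimuth_from_tdoa(0.0))          # 90.0 -> source at broadside
print(azimuth_from_tdoa(0.043 / 343))  # 0.0  -> source at end-fire
```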
Furthermore, in order to achieve higher positioning precision of the characteristic sound source direction, this can be combined with improving the resolvable angle of the pickup array.
Specifically, when the sampling rate of the audio data is set to fs, the sampling period of the audio data is Ts = 1/fs, and the arrival time difference of the characteristic sound can be expressed as μTs, where μ represents the number of sampling points of the audio data corresponding to the arrival time difference. Substituting this into the foregoing formula for the incident angle of the characteristic sound gives:

$$\theta = \arccos\left(\frac{\mu T_s c}{d}\right) = \arccos\left(\frac{\mu c}{f_s d}\right)$$
according to the Nyquist sampling theorem, the highest frequency of the characteristic sound which can be collected by the positioning device of the characteristic sound is half of the sampling rate of the positioning device of the application, namely f0=fs/2。
Wherein
,f
0The maximum frequency of the characteristic sound which can be collected by the characteristic sound positioning device is shown, fs is the sampling rate of the characteristic sound source positioning device,
representing the minimum wavelength of the characteristic sound.
Further, according to the property cosθ ∈ [-1, 1] of the trigonometric function, the range of the number μ of sampling points is:

$$|\mu| \le \frac{d}{c\,T_s} = \frac{d f_s}{c}$$

Rounding dfs/c down to an integer, defined as v, the pickup array can resolve 1 + 2v angles within 180 degrees, namely:

$$\arccos\left(\frac{-v c T_s}{d}\right),\ \arccos\left(\frac{(1-v) c T_s}{d}\right),\ \ldots,\ \arccos\left(\frac{v c T_s}{d}\right)$$
From the properties of the trigonometric function cosθ, the slope of the curve is largest when θ = 90 degrees, so the angular resolution is finest when the sound source is located directly in front of the pickup array and coarsest when the sound source is located to the side of the pickup array.
FIG. 4 is a schematic diagram illustrating the resolvable angles of the pickup array according to an embodiment of the present application.
As an example, as shown in fig. 4, when the distance between the circumferential array element and the central array element in the microphone array is set to 0.086 m and the sampling rate of the positioning apparatus is set to 48 kHz, the resolvable angle error of the microphone array is smallest near 90 degrees and largest near 0 degrees. Calculated by the above formula, the minimum error is about 4.72 degrees and the maximum error is about 16.3 degrees.
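These figures can be reproduced with a short script (Python/NumPy; the propagation speed of about 340 m/s is an assumed value):

```python
import numpy as np

d, fs, c = 0.086, 48000, 340.0   # baseline (m), sampling rate (Hz), speed of sound (m/s)
v = int(d * fs / c)              # |mu| <= d*fs/c, rounded down -> v = 12
mu = np.arange(-v, v + 1)        # the 1 + 2v admissible sample offsets
angles = np.degrees(np.arccos(mu * c / (fs * d)))  # resolvable angles within 180 degrees
gaps = np.abs(np.diff(angles))
print(len(angles))               # 25 resolvable angles (1 + 2v)
print(gaps.min(), gaps.max())    # ~4.7 degrees near 90, ~16.3 degrees near 0 and 180
```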
Furthermore, in order to avoid the problem that the angular resolution of the pickup array is too low near the 0-degree direction, and so that the azimuth angle of the characteristic sound can be accurately distinguished over the 360-degree omnidirectional range, the optimal spatial resolution angle of the pickup array is set in the range from 60 degrees to 120 degrees, and the azimuth angle of the characteristic sound is always calculated within this 60-to-120-degree range.
In the embodiment of the application, as shown in fig. 2, the pickup array is a seven-element array; the distance between each circumferential array element and the central array element can be set to 0.043 m, and the included angle between two adjacent circumferential array elements and the central array element is 60 degrees. The whole pickup array can thus be divided into six pickup areas, with the area between two adjacent circumferential array elements serving as one pickup area.
In the present application, the target pickup area can be determined among the plurality of pickup areas according to the arrival time differences of the characteristic sound, and the azimuth angle can then be determined according to the position of the target pickup area.
Specifically, based on the distribution of the array elements in the pickup array, the straight line connecting the central array element and the two circumferential array elements on either side of it is taken as the horizontal axis, and a first rectangular coordinate system whose horizontal and vertical axes intersect at the central array element is established; the 360-degree omnidirectional pickup array can thus be divided into two 180-degree pickup areas. Then, according to the arrival time differences of the characteristic sound at the array elements calculated in step S103, the position of the characteristic sound is preliminarily judged to be above or below the horizontal axis of the first rectangular coordinate system from the signs of those arrival time differences in this coordinate system, narrowing the characteristic sound azimuth to one 180-degree pickup area. If an arrival time difference is 0, that is, the distances from the sound source to the two circumferential array elements are the same, the sound source lies on the horizontal axis; in this case, only the signs of the arrival time differences of the two circumferential array elements on the horizontal axis need to be examined to decide whether the angle of the sound source in this coordinate system is 0 degrees or 180 degrees.
Next, the first rectangular coordinate system is rotated by a certain angle to establish a second rectangular coordinate system. From the signs of the arrival time differences of the characteristic sound, now confined to one 180-degree pickup area, under the second rectangular coordinate system, the position of the characteristic sound is judged to be above or below the horizontal axis of the second rectangular coordinate system, which further narrows the pickup area containing the characteristic sound azimuth.
Here, if the pickup area of the characteristic sound can be directly determined from the first rectangular coordinate system and the second rectangular coordinate system, that pickup area is locked as the target pickup area;
if the pickup area of the characteristic sound cannot yet be determined, the second rectangular coordinate system is further rotated by a certain angle to establish a third rectangular coordinate system, and from the signs of the arrival time differences of the characteristic sound in the remaining pickup areas under the third rectangular coordinate system, the position of the characteristic sound is judged to be above or below the horizontal axis of the third rectangular coordinate system, so that the target pickup area can finally be determined.
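By way of illustration, one possible reading of this sign-based search uses the signs of the arrival time differences of the three pairs of opposite circumferential array elements, whose axes lie 60 degrees apart (Python/NumPy; the pair-to-axis mapping and the lookup table below are illustrative assumptions, not the claimed procedure):

```python
import numpy as np

def sector_from_signs(tdoas):
    """Pick one of six 60-degree pickup areas from the signs of the TDOAs of the
    three opposite circumferential pairs. Each sign indicates on which side of
    that pair's axis the source lies; three half-plane tests isolate one sector."""
    signs = tuple(np.sign(tdoas).astype(int))
    table = {(+1, +1, +1): 0, (-1, +1, +1): 1, (-1, -1, +1): 2,
             (-1, -1, -1): 3, (+1, -1, -1): 4, (+1, +1, -1): 5}
    return table.get(signs)  # None -> a TDOA is exactly 0: source lies on an axis

print(sector_from_signs([1e-4, 5e-5, 2e-5]))  # 0 (illustrative values)
```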
It should be understood that the first rectangular coordinate system, the second rectangular coordinate system, and the third rectangular coordinate system may be obtained by clockwise or counterclockwise rotation, which is not limited in the present application.
Here, since the range of azimuth angle calculation values is distributed between 60 degrees and 120 degrees, it is also necessary to unify these azimuth angle values from different directions on one coordinate system.
In the embodiment of the application, a certain circumferential array element on the pickup array can be set as the 0-degree direction, and one full clockwise or counterclockwise rotation forms 360 degrees. By rotating the coordinate system, the azimuth angle corresponding to the characteristic sound is added to the rotation angle of the coordinate system to obtain a relative angle.
Optionally, in order to prevent the calculated azimuth angle from changing when the pickup array itself rotates, a direction sensor may further be added in the embodiment of the present application.
For example, a coordinate system is established with the earth's true north direction as 0 degrees, increasing counterclockwise or clockwise; after the relative angle value is calculated, it is added to the deviation angle reported by the direction sensor to obtain the final azimuth angle.
Here, the azimuth angle does not change with the rotation of the sound pickup array, and high stability is achieved.
In the embodiment of the present application, the direction sensor may be a magnetic heading sensor or other types of direction sensors, and the present application does not limit the types of the direction sensors.
The embodiment adopts a calculation method based on the arrival time difference, so that the calculation amount is greatly reduced, and the characteristic sound can be positioned in real time. Moreover, time domain interpolation is carried out on the audio data of the characteristic sound, so that the sampling rate of the audio data is improved, more sampling interval points are obtained, and the calculation precision of the arrival time difference of the characteristic sound in subsequent calculation is improved; meanwhile, the azimuth angle of the characteristic sound is calculated within the range of a pickup area of 60-120 degrees every time, angle weighting is carried out, the positioning accuracy of the characteristic sound is greatly improved, and therefore a more accurate sound source position can be obtained.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 5, a schematic structural diagram of a sound source positioning device provided in an exemplary embodiment of the present application is shown, which is hereinafter referred to as device 5. The apparatus 5 may be implemented as all or part of a computer device, in software, hardware or a combination of both. The device 5 comprises: the device comprises an acquisition unit 501, a recognition unit 502, a time delay calculation unit 503 and an angle calculation unit 504.
The acquisition unit 501 is used for acquiring n paths of audio data through the pickup array; the pickup array comprises 1 central array element and n-1 adjacent array elements, wherein n is an integer greater than 1;
the recognition unit 502 is configured to recognize a sound type of the audio data according to a pre-trained feature sound judgment model;
a time delay calculating unit 503, configured to calculate an arrival time difference between the audio data acquired by the n-1 adjacent array elements and the audio data acquired by the central array element if the sound type is a preset type;
an angle calculating unit 504, configured to calculate an azimuth angle of the characteristic sound according to the arrival time difference.
In one or more possible embodiments, the pickup array includes 1 central array element and 6 adjacent array elements distributed on the circumference at equal intervals, and the central array element is located on the center of the circle.
In one or more possible embodiments, the identifying the sound type of the characteristic sound according to the pre-trained characteristic sound judgment model includes:
loading a mean matrix, a covariance matrix and a weight of the Gaussian mixture model; wherein, the mean matrix, the covariance matrix and the weight are obtained by simulation;
calculating a feature matrix of each frame of data in the audio data;
calculating the probability value of each frame of data according to the parameters of the Gaussian mixture model and the characteristic matrix;
determining, through a threshold decision on the probability value, whether the sound type of the characteristic sound belongs to any sound type in a preset sound type library;
if so, identifying the sound type of the characteristic sound according to the probability value and the characteristic matrix.
In one or more possible embodiments, the apparatus further includes:
an updating unit, configured to add the sound type of the characteristic sound to the preset sound type library if the sound type of the characteristic sound does not belong to any sound type in the preset sound type library.
In one or more possible embodiments, the calculating the arrival time difference between the audio data collected by the n-1 adjacent array elements and the audio data collected by the central array element includes:
performing analog-to-digital conversion on the audio data;
performing framing processing and interpolation processing on the converted audio data;
performing fast Fourier transform processing on the processed audio data;
performing cross-power spectrum analysis on the converted audio data;
weighting the analyzed data;
performing fast Fourier inverse transformation on the processed data;
carrying out peak value detection on the transformed audio data to obtain a maximum value point;
and calculating the arrival time difference between the audio data collected by the n-1 adjacent array elements and the audio data collected by the central array element according to the maximum value point.
In one or more possible embodiments, the interpolation processing includes: Newton interpolation, Hermite interpolation, Lagrange interpolation, spline interpolation, or linear interpolation.
In one or more possible embodiments, the calculating an azimuth angle of the characteristic sound from the time difference of arrival includes:
the azimuth angle is calculated according to the following formula:

$$\theta = \arccos\left(\frac{\mu T_s c}{d}\right)$$

where μ represents the number of sampling points of the audio data corresponding to the arrival time difference, Ts represents the sampling period, c represents the propagation speed of the characteristic sound, and d represents the distance between the central array element and the adjacent array element.
It should be noted that, when the device 5 provided in the foregoing embodiment executes the sound source localization method, the division into the above functional modules is only used as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the sound source positioning device and the sound source localization method provided by the above embodiments belong to the same concept; for the detailed implementation process, refer to the method embodiment, which is not described herein again.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
An embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are suitable for being loaded by a processor and executing the method steps in the embodiment shown in fig. 1, and a specific execution process may refer to a specific description of the embodiment shown in fig. 1, which is not described herein again.
The present application further provides a computer program product storing at least one instruction, which is loaded and executed by the processor to implement the sound source localization method according to the above embodiments.
Referring to fig. 6, a schematic structural diagram of a computer device is provided in an embodiment of the present application. As shown in fig. 6, the computer device 600 may include: at least one processor 601, at least one network interface 604, a user interface 603, a memory 605, at least one communication bus 602.
Wherein a communication bus 602 is used to enable the connection communication between these components.
The user interface 603 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 603 may also include a standard wired interface and a wireless interface.
The network interface 604 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Processor 601 may include one or more processing cores. The processor 601 connects various parts within the overall computer device 600 using various interfaces and lines, and performs various functions of the computer device 600 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 605 and calling data stored in the memory 605. Optionally, the processor 601 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 601 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is used for rendering and drawing the content required to be displayed by the display screen; the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the processor 601, but may be implemented by a single chip.
The memory 605 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 605 includes a non-transitory computer-readable medium. The memory 605 may be used to store instructions, a program, code, a set of codes, or a set of instructions. The memory 605 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the stored data area may store the data and the like referred to in the above method embodiments. The memory 605 may optionally also be at least one storage device located remotely from the processor 601. As shown in fig. 6, the memory 605, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and an application program.
In the computer device 600 shown in fig. 6, the user interface 603 is mainly used as an interface for providing input for a user, and acquiring data input by the user; the processor 601 may be configured to call the application program stored in the memory 605 and specifically execute the method shown in fig. 1, and the specific process may refer to fig. 1 and is not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above disclosure is only a preferred embodiment of the present application and is not intended to limit the scope of the present application; equivalent variations and modifications made in accordance with the present application therefore remain within its scope.