CN112346012A - Sound source position determining method and device, readable storage medium and electronic equipment - Google Patents

Sound source position determining method and device, readable storage medium and electronic equipment Download PDF

Info

Publication number
CN112346012A
Authority
CN
China
Prior art keywords
sound
transfer function
positions
sound source
maximum likelihood
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011267775.7A
Other languages
Chinese (zh)
Inventor
胡玉祥 (Hu Yuxiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Horizon Robotics Technology Co Ltd
Original Assignee
Nanjing Horizon Robotics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Horizon Robotics Technology Co Ltd filed Critical Nanjing Horizon Robotics Technology Co Ltd
Priority to CN202011267775.7A priority Critical patent/CN112346012A/en
Publication of CN112346012A publication Critical patent/CN112346012A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves

Abstract

The embodiments of the present disclosure disclose a sound source position determining method and apparatus, a readable storage medium, and an electronic device. The method includes: obtaining, based on the respective transfer functions of a plurality of positions in a set space, a transfer function matrix corresponding to the positions in the set space; collecting a sound signal emitted from at least one of the plurality of positions; and processing the sound signal with a maximum likelihood method in combination with the transfer function matrix to determine, among the plurality of positions, the sound source position from which the sound signal is emitted. Performing sound zone detection with the maximum likelihood method avoids the influence of offline-modeling amplitude errors on the sound zone detection result; and replacing the ideal transfer function in the maximum likelihood function with the modeled transfer function improves the accuracy of sound zone detection.

Description

Sound source position determining method and device, readable storage medium and electronic equipment
Technical Field
The present disclosure relates to sound source localization technologies, and in particular, to a method and an apparatus for determining a sound source position, a readable storage medium, and an electronic device.
Background
With the development of speech recognition technology, speech recognition has been applied to various fields, for example voice control and voice wakeup. In voice wakeup applications, to allow multiple users to control a target device, the position of the person issuing the wakeup needs to be determined at wakeup time. For example, while a vehicle is traveling, an occupant wakes up a device in the vehicle by voice, but the accuracy of in-vehicle sound zone detection is greatly reduced by tire noise, wind noise, engine noise, in-vehicle air-conditioning noise, in-vehicle music, in-vehicle speaker interference, and the like during driving.
Disclosure of Invention
The present disclosure is proposed to solve the above technical problems. The embodiment of the disclosure provides a sound source position determining method and device, a readable storage medium and an electronic device.
According to an aspect of an embodiment of the present disclosure, there is provided a sound source position determination method including:
obtaining a transfer function matrix corresponding to a plurality of positions in a set space;
collecting a sound signal emitted from at least one of the plurality of positions;
and processing the sound signal with a maximum likelihood method in combination with the transfer function matrix, and determining, among the plurality of positions, the sound source position from which the sound signal is emitted.
According to another aspect of the embodiments of the present disclosure, there is provided a sound source position determination apparatus including:
the transfer function determining module is used for obtaining a transfer function matrix corresponding to a plurality of positions in a set space;
the signal acquisition module is used for acquiring a sound signal emitted by at least one position in the plurality of positions;
and the sound source determining module is used for processing the sound signal acquired by the signal acquisition module with a maximum likelihood method based on the transfer function matrix determined by the transfer function determining module, and determining, among the plurality of positions, the sound source position from which the sound signal is emitted.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the sound source position determination method according to the above-described embodiments.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the sound source position determining method according to the foregoing embodiment.
Based on the sound source position determining method and apparatus, the readable storage medium, and the electronic device provided by the above embodiments of the present disclosure, performing sound zone detection with the maximum likelihood method avoids the influence of offline-modeling amplitude errors on the sound zone detection result; and replacing the ideal transfer function in the maximum likelihood function with the modeled transfer function improves the accuracy of sound zone detection.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a schematic flowchart of a sound source position determining method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flowchart illustrating a sound source position determining method according to another exemplary embodiment of the present disclosure.
Fig. 3 is a schematic flow chart of step 203 in the embodiment shown in fig. 2 of the present disclosure.
Fig. 4 is a schematic flow chart of step 2032 in the embodiment shown in fig. 3 of the present disclosure.
Fig. 5 is a schematic flow chart of step 201 in the embodiment shown in fig. 2 of the present disclosure.
Fig. 6 is a schematic structural diagram of a sound source position determination apparatus according to an exemplary embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of a sound source position determination apparatus according to another exemplary embodiment of the present disclosure.
Fig. 8 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure describes only an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the application
In the process of implementing the present disclosure, the inventor found that, when multiple users in a vehicle wake up the in-vehicle voice system, the prior art generally detects the in-vehicle sound zone with a beamforming algorithm based on a free-field model. However, the prior art has at least the following problem: because the acoustic environment in the vehicle is complex, with strong reflection and scattering, the free-field model differs greatly from the actual sound propagation, and the sound zone detection accuracy is poor.
Exemplary System
In the prior art, to determine the position of a sound source among a plurality of positions, the following procedure is performed. The frequency-domain signal X(k) received by the microphone array can be represented by formula (1):

X(k) = A(k)S(k) + N(k)    Formula (1)

where k is the frequency index, A(k) is the M × Q steering matrix, i.e. the ideal (free-field) transfer function, S(k) is the Q × 1 complex sound source signal, and N(k) is the M × 1 noise signal received by the microphones. These quantities can be expressed by formulas (2) to (5):

X(k) = [X_1(k) X_2(k) … X_M(k)]^T    Formula (2)
A(k) = [a_1(k) a_2(k) … a_Q(k)]    Formula (3)
S(k) = [S_1(k) S_2(k) … S_Q(k)]^T    Formula (4)
N(k) = [N_1(k) N_2(k) … N_M(k)]^T    Formula (5)

where a_q(k) in formula (3) is the free-field steering vector from the qth sound source to the M microphones, given by formula (6) (the original equation image is not reproduced in this text).

In the above formulas, [·]^T denotes the matrix transpose, M denotes the number of microphones, Q the number of sound sources, and M > Q (the number of microphones is greater than the number of sound sources). The noise signal N(k) can be assumed to be complex Gaussian noise with mean 0 and covariance matrix σ²I, where σ² is an unknown constant and I is the identity matrix (ones on the diagonal and zeros elsewhere).
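To make the free-field model of formulas (1)-(5) concrete, the following sketch simulates a single-frequency snapshot of the received signal for a small linear microphone array. The array geometry, source directions, source amplitudes, and noise level are illustrative assumptions and do not come from the patent.

```python
import numpy as np

def freefield_steering(mic_pos, theta, k_wave):
    # Free-field steering vector a_q(k) for a far-field source at angle theta (radians):
    # pure phase delays across the array (no per-microphone attenuation modeled).
    return np.exp(-1j * k_wave * mic_pos * np.sin(theta))

# Illustrative setup (not from the patent): M = 4 microphones, Q = 2 sources, one frequency bin.
c, f = 343.0, 1000.0                              # speed of sound [m/s], frequency [Hz]
k_wave = 2 * np.pi * f / c                        # wavenumber
mic_pos = np.arange(4) * 0.05                     # microphone positions on a line, 5 cm apart
thetas = np.deg2rad([-30.0, 20.0])                # assumed source directions

A = np.stack([freefield_steering(mic_pos, th, k_wave) for th in thetas], axis=1)  # M x Q
S = np.array([1.0 + 0.5j, 0.8 - 0.2j])            # Q complex source amplitudes
rng = np.random.default_rng(0)
N = 0.1 * (rng.standard_normal(4) + 1j * rng.standard_normal(4)) / np.sqrt(2)     # complex Gaussian noise

X = A @ S + N                                     # formula (1): X(k) = A(k)S(k) + N(k)
print(X)                                          # M x 1 received snapshot at this frequency bin
```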
In formula (1), the sound source directions Θ = [θ_1 θ_2 … θ_Q]^T, the signals S = [S(1)^T S(2)^T … S(K)^T]^T, and σ² are deterministic and unknown, where K is the maximum frequency index. Collecting the unknowns into a parameter vector Ω = [Θ^T S^T σ²]^T, the likelihood function of Ω is given by formula (7), which for the complex Gaussian noise model takes the form:

f(X | Ω) = ∏_{k=1}^{K} (πσ²)^{-M} exp( −(1/σ²) ||X(k) − A(k)S(k)||² )    Formula (7)

Maximizing the likelihood function of formula (7) (after concentrating out S and σ²) yields the sound zone detection result, which can be expressed as formula (8):

Θ̂ = arg min_Θ Σ_{k=1}^{K} ||X(k) − A(k)A^†(k)X(k)||²    Formula (8)

where (·)^† denotes the pseudo-inverse; that is, A^†(k) in formula (8) can be represented by formula (9):

A^†(k) = (A^H(k)A(k))^{−1} A^H(k)    Formula (9)

where (·)^H denotes the conjugate transpose.

The processes of formulas (1) to (9) above belong to the prior-art method for determining the sound source position.
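As a self-contained sketch of this prior-art procedure, the following code evaluates the concentrated cost of formula (8) over a grid of candidate direction pairs, using numpy's pinv for the pseudo-inverse of formula (9), and picks the pair with the smallest residual. The array geometry, grid, and signals are illustrative assumptions, and the free-field steering model here is only one possible form of formula (6).

```python
from itertools import combinations
import numpy as np

def steer(mic_pos, theta, k_wave):
    # Assumed free-field steering vector: phase delays of a far-field source at angle theta.
    return np.exp(-1j * k_wave * mic_pos * np.sin(theta))

def ml_cost(X, A_cand):
    # ||X - A A^+ X||^2: the residual minimized in formula (8); pinv implements formula (9).
    return np.linalg.norm(X - A_cand @ np.linalg.pinv(A_cand) @ X) ** 2

# Illustrative scene (values not from the patent): 4-mic line array, 2 simultaneous sources.
c, f = 343.0, 1000.0
k_wave = 2 * np.pi * f / c
mic_pos = np.arange(4) * 0.05
rng = np.random.default_rng(1)
A_true = np.stack([steer(mic_pos, th, k_wave) for th in np.deg2rad([-30.0, 20.0])], axis=1)
X = A_true @ np.array([1.0, 0.8j]) + 0.05 * (rng.standard_normal(4) + 1j * rng.standard_normal(4))

# Exhaustive search over all candidate direction pairs; this is the cost that grows
# exponentially with the number of sources, as the patent points out further below.
grid = np.deg2rad(np.arange(-60.0, 61.0, 5.0))
best = min(combinations(grid, 2),
           key=lambda pair: ml_cost(X, np.stack([steer(mic_pos, th, k_wave) for th in pair], axis=1)))
print("estimated directions [deg]:", np.rad2deg(np.array(best)))
```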
The main idea of the disclosed embodiment is to replace an ideal transfer function with a measured transfer function to more accurately locate the sound source position.
Fig. 1 is a schematic flowchart of a sound source position determining method according to an exemplary embodiment of the present disclosure. The sound source position determining method provided by this embodiment can be applied to scenes within a set space, such as a vehicle-mounted scene; this embodiment takes the vehicle-mounted scene as an example and includes the following steps:
and 101, performing offline modeling on the position where the speaker possibly appears on each seat by using white noise to obtain a relative transfer function of the direction in which the speaker is located.
Specifically, a seat q in the vehicle is selected, and P positions are selected within the small area on seat q where a speaker is likely to appear. White noise is played at each of the P positions with an artificial mouth (an artificial sound source playback device), and the white noise signal played by the artificial mouth is collected synchronously together with the signal received by the microphone array, x = [x_1 x_2 … x_M], where M is the number of microphones. The absolute transfer function between the sound source at the qth seat and the mth microphone can then be expressed by formula (10) (the original equation image is not reproduced in this text); it relates the time-domain signal x_m^(p)(n), received by the mth microphone when the sound source is at the pth position, to the played signal through convolution, where N denotes the length of the time-domain modeling data and "*" denotes convolution.

For the Q seats and M microphone units in the vehicle, offline modeling determines a transfer function matrix h whose entry in row m and column q is the time-domain transfer function h_mq between the qth seat and the mth microphone, as expressed by formula (11):

h = [ h_11 h_12 … h_1Q ; h_21 h_22 … h_2Q ; … ; h_M1 h_M2 … h_MQ ]    Formula (11)
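A minimal sketch of this offline modeling step follows. Because the patent's exact estimator in formula (10) is not reproduced here, the sketch assumes a regularized frequency-domain least-squares deconvolution of the recorded microphone signal against the played white noise, and, for brevity, models only one position per seat rather than the P positions per seat described above; all sizes and names are illustrative.

```python
import numpy as np

def estimate_impulse_response(played, recorded, n_taps, eps=1e-8):
    # One common estimator (an assumption, not the patent's formula (10)): regularized
    # frequency-domain division of the recording by the played excitation.
    n_fft = len(played) + n_taps
    P = np.fft.rfft(played, n_fft)
    R = np.fft.rfft(recorded, n_fft)
    H = R * np.conj(P) / (np.abs(P) ** 2 + eps)
    return np.fft.irfft(H, n_fft)[:n_taps]

# Offline modeling loop: Q seats, M microphones, N-tap absolute transfer functions.
Q, M, N_taps = 4, 6, 512
rng = np.random.default_rng(0)
h = np.zeros((M, Q, N_taps))                       # time-domain matrix h of formula (11)
for q in range(Q):
    played = rng.standard_normal(16000)            # white noise played by the artificial mouth at seat q
    for m in range(M):
        # Placeholder recording: in practice this is the signal x_m captured by microphone m.
        room_ir = rng.standard_normal(N_taps) * np.exp(-np.arange(N_taps) / 50.0)
        recorded = np.convolve(played, room_ir)[:len(played)]
        h[m, q] = estimate_impulse_response(played, recorded, N_taps)
```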
step (ii) of102, transfer function h obtained according to off-line modelingmqAcquiring the guide vector of the sound source relative to the microphone array, and modeling the off-line transfer function hmqNormalization is performed, which can be expressed by the following equation (12):
Figure BDA0002776731950000063
wherein | | | purple hairlThe term "l norm" means normalization by amplitude when l is 1, and means normalization by energy when l is 2. Transforming the normalized transfer function to the frequency domain to obtain a steering vector of the sound source relative to the microphone array, where the transfer function matrix h (k) at the k-th frequency can be expressed as shown in equation (13):
Figure BDA0002776731950000064
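The following sketch carries out this normalization and frequency-domain conversion for a time-domain matrix h such as the one estimated in the previous sketch; the array shapes and the function name are assumptions made for illustration.

```python
import numpy as np

def build_frequency_domain_tf(h, n_fft, norm="l2"):
    # h: (M, Q, N) time-domain transfer functions h_mq.
    # Normalize each h_mq by its l1 (amplitude) or l2 (energy) norm, per formula (12),
    # then transform to the frequency domain to obtain H(k) of formula (13).
    ord_ = 1 if norm == "l1" else 2
    scale = np.linalg.norm(h, ord=ord_, axis=-1, keepdims=True)
    h_norm = h / np.maximum(scale, 1e-12)
    H = np.fft.rfft(h_norm, n_fft, axis=-1)        # (M, Q, n_bins)
    return np.transpose(H, (2, 0, 1))              # (n_bins, M, Q): one M x Q matrix H(k) per bin

# Example (using the h from the previous sketch): H_k[k] is the M x Q matrix of formula (13).
# H_k = build_frequency_domain_tf(h, n_fft=512)
```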
Step 103, replacing the ideal transfer function with the modeled transfer function to determine the sound zone detection result; that is, A(k) in formula (8) is replaced with H(k), and the sound zone detection result of formula (8) can then be expressed as formula (14):

arg min Σ_k ||X(k) − H(k)H^†(k)X(k)||²    Formula (14)

where the minimization is over the candidate positions whose columns of the modeled transfer function matrix form H(k), and H^†(k) is the pseudo-inverse of H(k) as in formula (9).
Now consider, with respect to formula (14), the case where there is an amplitude error in the modeling of H(k), as shown in formula (15):

H(k) = α H_real(k)    Formula (15)

where H_real(k) is the true transfer function and α is a nonzero constant. In this case, the pseudo-inverse H^†(k) in formula (14) can be expressed as in formula (16):

H^†(k) = ((αH_real(k))^H (αH_real(k)))^{−1} (αH_real(k))^H = (1/α) H_real^†(k)    Formula (16)

so that, as shown in formula (17),

H(k)H^†(k) = α H_real(k) · (1/α) H_real^†(k) = H_real(k) H_real^†(k)    Formula (17)

Because α and 1/α cancel each other in the multiplication, the value of α does not affect the result in formula (17); therefore the amplitude error of the offline modeling has no effect on the sound zone detection result.
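This cancellation is easy to verify numerically. The short sketch below, with illustrative sizes, checks that scaling a random transfer function matrix by a nonzero constant leaves the projection H(k)H^†(k)X(k), and hence the cost of formula (14), unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
M, Q = 6, 2
H = rng.standard_normal((M, Q)) + 1j * rng.standard_normal((M, Q))   # stand-in for H(k)
X = rng.standard_normal(M) + 1j * rng.standard_normal(M)             # stand-in for X(k)
alpha = 3.7                                                           # assumed amplitude error

proj = H @ np.linalg.pinv(H) @ X
proj_scaled = (alpha * H) @ np.linalg.pinv(alpha * H) @ X
print(np.allclose(proj, proj_scaled))   # True: the amplitude error alpha cancels out
```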
In order to improve the accuracy of sound zone detection, the frequency indices k used in formula (14) are chosen, according to a certain criterion, as frequency points at which the sound source signal is likely to be present; this reduces the influence of noise interference on the sound zone detection result.
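The patent does not specify the criterion used to select these frequency points; the sketch below shows one plausible, SNR-style choice, keeping only bins whose observed energy exceeds an estimated noise floor by a margin. The function name, shapes, and margin are assumptions.

```python
import numpy as np

def select_active_bins(X, noise_floor, margin_db=6.0):
    # X: (n_bins, M) observed microphone spectra; noise_floor: (n_bins,) estimated noise power.
    # Keep bins whose average energy across microphones exceeds the floor by margin_db.
    bin_energy = np.mean(np.abs(X) ** 2, axis=1)
    return np.flatnonzero(bin_energy > noise_floor * 10 ** (margin_db / 10.0))
```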
In the commonly used multi-sound-source maximum likelihood sound zone detection algorithm, the amount of computation grows exponentially with the number of sound sources. In a vehicle-mounted scene, however, the maximum number of sound sources is the number of seats in the vehicle; assuming the vehicle has Q seats and P sound zones need to be detected, formula (14) only needs to be evaluated a small, fixed number of times (the original equation image giving the exact count is not reproduced in this text) to obtain the multi-sound-source sound zone detection result.
The sound source position determining method provided by the present disclosure adopts a maximum likelihood sound zone detection strategy, and when solving the maximum likelihood function, an actually measured model is used instead of the free-field model, which can effectively improve the sound zone detection accuracy; the influence caused by amplitude differences in the measured transfer functions is avoided; and since a vehicle-mounted scene has only a few sound zone detection targets, the problem that the computation of the multi-sound-source maximum likelihood algorithm grows exponentially with the number of sound sources is avoided, so the method is highly practical.
Exemplary method
Fig. 2 is a flowchart illustrating a sound source position determining method according to another exemplary embodiment of the present disclosure. The embodiment can be applied to an electronic device, as shown in fig. 2, and includes the following steps:
step 201, obtaining a transfer function matrix corresponding to a plurality of positions in a set space.
Here, the set space may be a bounded space such as a vehicle or a conference room, and the plurality of positions are located within the set space; for example, when applied to a vehicle (in-vehicle scene), the plurality of positions may be the seats in the vehicle.
Step 202, collecting a sound signal emitted from at least one of a plurality of positions.
Alternatively, the sound signal of at least one location may be obtained by a sound collection device such as a microphone.
Step 203, processing the sound signal with a maximum likelihood method in combination with the transfer function matrix, and determining, among the plurality of positions, the sound source position from which the sound signal is emitted.
In this embodiment, the maximum likelihood method does not estimate probabilities for all positions; it only obtains the most likely position. The embodiment replaces the ideal transfer function in the maximum likelihood function with the transfer function matrix, thereby improving the estimation accuracy. Optionally, the maximum likelihood function after replacement may be formula (14) in the embodiment shown in Fig. 1.
According to the sound source position determining method provided by the embodiment of the present disclosure, performing sound zone detection with the maximum likelihood method avoids the influence of offline-modeling amplitude errors on the sound zone detection result; and replacing the ideal transfer function in the maximum likelihood function with the modeled transfer function improves the accuracy of sound zone detection.
As shown in fig. 3, based on the embodiment shown in fig. 2, step 203 may include the following steps:
step 2031, determine the transfer function in the maximum likelihood function as a transfer function matrix.
Step 2032, processing the sound signal based on the maximum likelihood function, and determining a sound source position from which the sound signal is emitted among the plurality of positions.
The original maximum likelihood function is given by formula (8) above, where A(k) is the ideal transfer function. In this embodiment, A(k) is replaced with the transfer function matrix H(k); the sound zone detection result of formula (8) then becomes formula (14) in the embodiment shown in Fig. 1, and the sound source sound zone detection result is obtained by evaluating formula (14) a set number of times.
As shown in fig. 4, based on the embodiment shown in fig. 3, step 2032 may include the following steps:
step 401, respectively substituting at least one group of sound frequency domain vectors in the plurality of sound frequency domain vectors included in the transfer function matrix into the maximum likelihood function.
Each group of sound frequency domain vectors comprises at least one sound frequency domain vector, and each sound frequency domain vector corresponds to one position.
The expression of the transfer function can be understood with reference to formula (13) in the embodiment shown in Fig. 1. The transfer function matrix H(k) of formula (13) includes Q columns, each column corresponding to one position; optionally, each column represents one sound frequency-domain vector, and one or more columns can be substituted into formula (14) simultaneously to estimate the sound position.
Step 402, processing the sound signal based on at least one maximum likelihood function to obtain at least one maximum likelihood function value.
Step 403, determining a sound source position from the plurality of positions from which the sound signal originates, based on the at least one maximum likelihood function value.
In this embodiment, since each column (sound frequency-domain vector) of the transfer function matrix corresponds to one position, at least one group of sound frequency-domain vectors among the plurality of sound frequency-domain vectors can be substituted into formula (14) at a time, and at least one maximum likelihood function value is calculated based on formula (14). The position(s) corresponding to the group of sound frequency-domain vectors yielding the smallest maximum likelihood function value are determined as the sound source position(s); and because a group of sound frequency-domain vectors includes at least one sound frequency-domain vector, multiple sound source positions can be determined from the smallest maximum likelihood function value.
Optionally, step 403 provided by the foregoing embodiment may include:
and determining at least one position corresponding to the group of sound frequency domain vectors as the sound source position based on the group of sound frequency domain vectors corresponding to the minimum maximum likelihood function value in the at least one maximum likelihood function value.
In this embodiment, multiple maximum likelihood function values may be determined based on formula (14), their magnitudes compared, and the position(s) corresponding to the group of sound frequency-domain vectors with the smallest maximum likelihood function value taken as the sound source position. Because the ideal transfer function in formula (14) has been replaced by the transfer function matrix, the obtained sound source position is more accurate.
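A minimal sketch of this selection step is given below. It assumes the modeled transfer functions are available as an array H_k of shape (n_bins, M, Q), as in the earlier sketches, and that X holds the observed microphone spectra at the selected frequency bins; for each candidate group of columns it evaluates the formula-(14) cost and returns the group with the smallest value. Names, shapes, and the exhaustive search over column groups are assumptions for illustration.

```python
from itertools import combinations
import numpy as np

def detect_sound_zones(X, H_k, n_active):
    # X: (n_bins, M) observed spectra at the selected bins.
    # H_k: (n_bins, M, Q) modeled transfer function matrices H(k) of formula (13).
    # n_active: assumed number of simultaneously active sound zones.
    n_bins, _, Q = H_k.shape

    def cost(zones):
        total = 0.0
        for k in range(n_bins):
            Hs = H_k[k][:, list(zones)]                      # columns of the candidate zones
            resid = X[k] - Hs @ np.linalg.pinv(Hs) @ X[k]    # X(k) - H(k)H^+(k)X(k)
            total += np.vdot(resid, resid).real
        return total

    # The group of columns with the smallest maximum likelihood cost gives the sound source positions.
    return min(combinations(range(Q), n_active), key=cost)

# e.g. zones = detect_sound_zones(X_bins, H_k, n_active=1) for a single active speaker.
```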
Optionally, step 201 provided in the foregoing embodiment may include:
and modeling a transfer function for a plurality of positions in the set space to obtain a transfer function matrix corresponding to the plurality of positions.
In this embodiment, the transfer function obtained by modeling is usually an absolute transfer function, whereas what is needed is a transfer function matrix composed of the relative transfer functions of the plurality of positions. The specific process can refer to formulas (10) to (13) in the embodiment shown in Fig. 1: the absolute transfer function between the sound source at the qth position and the mth microphone is expressed by formula (10), the transfer function matrix modeled for the plurality of positions is the matrix of formula (11), and the transfer function matrix of formula (13) is obtained by normalizing the transfer function of each position and converting it to the frequency domain.
As shown in fig. 5, based on the embodiment shown in fig. 2, step 201 may include the following steps:
in step 2011, a known sound signal is played in each of a plurality of sound emission ranges corresponding to the plurality of positions.
Wherein each position corresponds to a set sounding range; the known sound signal may be white noise, for example, the embodiment shown in fig. 1 utilizes an artificial mouth to play the white noise signal within each set range of sound production.
Alternatively, the known sound signal is played at a plurality of preset sound source positions within each of the at least two set sound emission ranges, respectively.
Each set sounding range includes a plurality of preset sound source positions. For example, in a vehicle, a seat q is selected, the small area on seat q where a speaker may appear is selected, P positions are chosen within it, white noise is played at each of the P positions with the artificial mouth, and the white noise signal played by the artificial mouth is collected synchronously.
Step 2012, an absolute transfer function of each of the elements of the microphone array with respect to the sound source is determined based on the acquisition of each known sound signal by the microphone array.
Step 2013, determining a transfer function matrix based on at least two absolute transfer functions corresponding to at least two microphone units in the microphone array.
In this embodiment, modeling is performed based on a plurality of positions to obtain the absolute transfer function corresponding to each microphone unit; normalization and frequency-domain conversion are then performed on the matrix formed by the absolute transfer functions to obtain the transfer function matrix of formula (13). The transfer function matrix obtained in this way avoids the influence caused by amplitude differences in the measured transfer functions, thereby improving the accuracy of sound zone detection.
Optionally, step 2013 in the above embodiment may include:
respectively executing normalization operation on each absolute transfer function in the at least two absolute transfer functions to obtain at least two normalized transfer functions;
the normalization operation in this embodiment may be amplitude normalization or energy normalization, for example, as shown in the embodiment of FIG. 1 for the transfer function h in step 102mqAnd (3) carrying out normalization, wherein the specific normalization formula can be determined by the value of l in the formula by referring to the formula (12) and specifically adopting amplitude normalization or energy normalization.
Converting each normalized transfer function of the at least two normalized transfer functions into a frequency domain transfer function expressed in a frequency domain;
optionally, the normalized time domain signal is converted into a frequency domain transfer function expressed in a frequency domain by a time-frequency domain conversion, and in an alternative embodiment, the converted frequency domain transfer function is as in a column of formula (13) in the embodiment shown in fig. 1.
And arranging at least two frequency domain transfer functions according to corresponding positions to obtain a transfer function matrix.
In this embodiment, the number of frequency-domain transfer function vectors corresponds to the number of sound sources; for example, as shown in Fig. 1, Q columns of frequency-domain transfer functions are obtained for the sound sources at the Q seats in the vehicle, and arranging these Q columns in sequence yields the transfer function matrix of formula (13).
This embodiment performs amplitude normalization on the absolute transfer functions, which ensures that the energy received at the microphones is consistent across sound sources, and performs the frequency-domain conversion after amplitude normalization, which reduces the influence of amplitude variation on the transfer function matrix; because the received energy is consistent across sound sources, the comprehensiveness of sound source localization and the accuracy of localizing each sound source are improved.
Any of the sound source location determination methods provided by the embodiments of the present disclosure may be performed by any suitable device having data processing capabilities, including but not limited to: terminal equipment, a server and the like. Alternatively, any of the sound source position determining methods provided by the embodiments of the present disclosure may be executed by a processor, such as the processor executing any of the sound source position determining methods mentioned by the embodiments of the present disclosure by calling a corresponding instruction stored in a memory. And will not be described in detail below.
Exemplary devices
Fig. 6 is a schematic structural diagram of a sound source position determination apparatus according to an exemplary embodiment of the present disclosure. As shown in fig. 6, the apparatus provided in this embodiment includes:
and a transfer function determining module 61, configured to obtain a transfer function matrix corresponding to a plurality of positions in the set space.
And a signal acquisition module 62 for acquiring the sound signal emitted from at least one of the plurality of positions.
And a sound source determining module 63, configured to process the sound signal acquired by the signal acquiring module 62 by using a maximum likelihood method based on the transfer function matrix determined by the transfer function determining module 61, and determine a sound source position from which the sound signal is emitted in the multiple positions.
According to the sound source position determining device provided by the embodiment of the disclosure, the sound zone detection is performed by adopting a maximum likelihood method, so that the influence of an offline modeling amplitude error on a sound zone detection result can be avoided; and the ideal transfer function in the maximum likelihood function is replaced by the modeling transfer function, so that the accuracy of detecting the sound zone is improved.
Fig. 7 is a schematic structural diagram of a sound source position determination apparatus according to another exemplary embodiment of the present disclosure. As shown in fig. 7, the apparatus provided in this embodiment includes:
a sound source determination module 63 comprising:
a function replacing unit 631 for determining a transfer function in the maximum likelihood function as a transfer function matrix;
a position determining unit 632, configured to process the sound signal based on the maximum likelihood function, and determine a sound source position from which the sound signal is emitted in the plurality of positions.
Optionally, the position determining unit 632 is specifically configured to substitute at least one group of sound frequency domain vectors in the plurality of sound frequency domain vectors included in the transfer function matrix into the maximum likelihood function; each group of sound frequency domain vectors comprises at least one sound frequency domain vector, and each sound frequency domain vector corresponds to one position; processing the sound signal based on at least one maximum likelihood function to obtain at least one maximum likelihood function value; based on the at least one maximum likelihood function value, a location of a sound source from the plurality of locations from which the sound signal originated is determined.
Alternatively, the position determining unit 632, when determining a sound source position from among the plurality of positions at which a sound signal is emitted based on the at least one maximum likelihood function value, is configured to determine, as the sound source position, at least one position corresponding to a set of sound frequency domain vectors based on a set of sound frequency domain vectors corresponding to a minimum maximum likelihood function value from among the at least one maximum likelihood function value.
Optionally, the transfer function determining module 61 is specifically configured to model a transfer function for a plurality of positions in the set space, and obtain a transfer function matrix corresponding to the plurality of positions.
Wherein each position corresponds to a set sounding range; the transfer function determining module 61 includes:
a signal playing unit 611 for playing a known sound signal in each of a plurality of set sound emission ranges corresponding to the plurality of positions, respectively;
an absolute function determination unit 612, configured to determine an absolute transfer function of each microphone element in the microphone array with respect to the sound source based on each known sound signal acquired by the microphone array;
a matrix determination unit 613 is configured to determine a transfer function matrix based on at least two absolute transfer functions corresponding to at least two microphone elements in the microphone array.
The signal playing unit 611 is specifically configured to play the known sound signals at a plurality of preset sound source positions in each of the at least two set sound emission ranges. Wherein each of the set sounding ranges includes a plurality of preset sound source positions.
A matrix determining unit 613, configured to perform normalization operation on each of the at least two absolute transfer functions, respectively, to obtain at least two normalized transfer functions; converting each normalized transfer function of the at least two normalized transfer functions into a frequency domain transfer function expressed in a frequency domain; and arranging at least two frequency domain transfer functions according to corresponding positions to obtain a transfer function matrix.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 8. The electronic device may be either or both of the first device 100 and the second device 200, or a stand-alone device separate from them that may communicate with the first device and the second device to receive the collected input signals therefrom.
FIG. 8 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
As shown in fig. 8, the electronic device 80 includes one or more processors 81 and memory 82.
The processor 81 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 80 to perform desired functions.
Memory 82 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 81 to implement the sound source location determination methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 80 may further include: an input device 83 and an output device 84, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is the first device 100 or the second device 200, the input device 83 may be a microphone or a microphone array as described above for capturing an input signal of a sound source. When the electronic device is a stand-alone device, the input means 83 may be a communication network connector for receiving the acquired input signals from the first device 100 and the second device 200.
The input device 83 may also include, for example, a keyboard, a mouse, and the like.
The output device 84 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 84 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device 80 relevant to the present disclosure are shown in fig. 8, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 80 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the sound source position determination method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
The computer program product may write program code for carrying out operations for embodiments of the present disclosure in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the sound source position determination method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, or configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (11)

1. A sound source position determination method, comprising:
obtaining a transfer function matrix corresponding to a plurality of positions in a set space;
collecting a sound signal emitted from at least one of the plurality of positions;
and processing the sound signals by combining the transfer function matrix by adopting a maximum likelihood method, and determining the sound source position emitting the sound signals in the plurality of positions.
2. The method of claim 1, wherein said processing the sound signal using a maximum likelihood method in conjunction with the transfer function matrix to determine a location of a sound source from the plurality of locations from which the sound signal originated comprises:
determining a transfer function in a maximum likelihood function as the transfer function matrix;
and processing the sound signal based on the maximum likelihood function to determine the sound source position of the sound signal in the plurality of positions.
3. The method of claim 2, wherein said processing the sound signal based on the maximum likelihood function to determine a location of a sound source from the plurality of locations from which the sound signal originated comprises:
respectively substituting at least one group of sound frequency domain vectors in a plurality of sound frequency domain vectors included in the transfer function matrix into a maximum likelihood function; wherein each set of sound frequency domain vectors comprises at least one sound frequency domain vector, each of the sound frequency domain vectors corresponding to a location;
processing the sound signal based on the at least one maximum likelihood function to obtain at least one maximum likelihood function value;
determining a location of a sound source emitting sound signals from the plurality of locations based on the at least one maximum likelihood function value.
4. The method of claim 3, wherein said determining a location of a sound source emitting sound signals from among said plurality of locations based on said at least one maximum likelihood function value comprises:
and determining at least one position corresponding to the group of sound frequency domain vectors as the sound source position based on the group of sound frequency domain vectors corresponding to the minimum maximum likelihood function value in the at least one maximum likelihood function value.
5. The method according to any one of claims 1-4, wherein the obtaining a transfer function matrix corresponding to a plurality of positions in the set space comprises:
and modeling a transfer function for a plurality of positions in a set space, and obtaining a transfer function matrix corresponding to the plurality of positions.
6. The method of claim 5, wherein each of said locations corresponds to a set range of utterances;
the modeling a transfer function for a plurality of positions in a set space to obtain a transfer function matrix corresponding to the plurality of positions includes:
playing a known sound signal in each of a plurality of set sound emission ranges corresponding to the plurality of positions respectively;
determining an absolute transfer function of each microphone element in the microphone array relative to a sound source based on the acquisition of each of the known sound signals by the microphone array;
the transfer function matrix is determined based on at least two absolute transfer functions corresponding to at least two microphone elements in the microphone array.
7. The method of claim 6, wherein each of the set sounding ranges includes a plurality of preset sound source positions;
the playing of the known sound signal in each of the at least two set sound emission ranges respectively comprises:
and respectively playing the known sound signals at a plurality of preset sound source positions in each of the at least two set sound production ranges.
8. The method of claim 6, wherein the determining the transfer function matrix based on at least two absolute transfer functions corresponding to at least two elements of the microphone array comprises:
respectively executing normalization operation on each absolute transfer function in the at least two absolute transfer functions to obtain at least two normalized transfer functions;
converting each normalized transfer function of the at least two normalized transfer functions into a frequency domain transfer function expressed in a frequency domain, respectively;
and arranging the at least two frequency domain transfer functions according to corresponding positions to obtain the transfer function matrix.
9. A sound source position determination apparatus comprising:
the transfer function determining module is used for obtaining a transfer function matrix corresponding to a plurality of positions in a set space;
the signal acquisition module is used for acquiring a sound signal emitted by at least one position in the plurality of positions;
and the sound source determining module is used for processing the sound signals acquired by the signal acquisition module by adopting a maximum likelihood method based on the transfer function matrix determined by the transfer function determining module, and determining the sound source position emitting the sound signals in the positions.
10. A computer-readable storage medium storing a computer program for executing the sound source position determination method according to any one of claims 1 to 8.
11. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the sound source position determining method according to any one of claims 1 to 8.
CN202011267775.7A 2020-11-13 2020-11-13 Sound source position determining method and device, readable storage medium and electronic equipment Pending CN112346012A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011267775.7A CN112346012A (en) 2020-11-13 2020-11-13 Sound source position determining method and device, readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011267775.7A CN112346012A (en) 2020-11-13 2020-11-13 Sound source position determining method and device, readable storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112346012A (en) 2021-02-09

Family

ID=74363701

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011267775.7A Pending CN112346012A (en) 2020-11-13 2020-11-13 Sound source position determining method and device, readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112346012A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023000206A1 (en) * 2021-07-21 2023-01-26 华为技术有限公司 Speech sound source location method, apparatus and system
CN117076843A (en) * 2023-08-18 2023-11-17 上海工程技术大学 Method for establishing in-car acoustic transfer function error model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080181430A1 (en) * 2007-01-26 2008-07-31 Microsoft Corporation Multi-sensor sound source localization
US20120069714A1 (en) * 2010-08-17 2012-03-22 Honda Motor Co., Ltd. Sound direction estimation apparatus and sound direction estimation method
CN106093866A (en) * 2016-05-27 2016-11-09 南京大学 A kind of sound localization method being applicable to hollow ball array
CN108600907A (en) * 2017-03-09 2018-09-28 奥迪康有限公司 Method, hearing devices and the hearing system of localization of sound source
CN110675892A (en) * 2019-09-24 2020-01-10 北京地平线机器人技术研发有限公司 Multi-position voice separation method and device, storage medium and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080181430A1 (en) * 2007-01-26 2008-07-31 Microsoft Corporation Multi-sensor sound source localization
US20120069714A1 (en) * 2010-08-17 2012-03-22 Honda Motor Co., Ltd. Sound direction estimation apparatus and sound direction estimation method
CN106093866A (en) * 2016-05-27 2016-11-09 南京大学 A kind of sound localization method being applicable to hollow ball array
CN108600907A (en) * 2017-03-09 2018-09-28 奥迪康有限公司 Method, hearing devices and the hearing system of localization of sound source
CN110675892A (en) * 2019-09-24 2020-01-10 北京地平线机器人技术研发有限公司 Multi-position voice separation method and device, storage medium and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
胡玉祥 (Hu Yuxiang): "Research on Robust Microphone Array Systems", China Doctoral Dissertations Full-text Database, Information Science and Technology Series, no. 9, pages 49-55 *
陈浩楠 (Chen Haonan): "Research on Sound Monitoring and Localization Methods for the Condition Monitoring System of UHV Transmission Towers", China Masters' Theses Full-text Database, Engineering Science and Technology II Series, no. 8, 15 August 2020 (2020-08-15), pages 22-24 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023000206A1 (en) * 2021-07-21 2023-01-26 华为技术有限公司 Speech sound source location method, apparatus and system
CN117076843A (en) * 2023-08-18 2023-11-17 上海工程技术大学 Method for establishing in-car acoustic transfer function error model

Similar Documents

Publication Publication Date Title
Diaz-Guerra et al. Robust sound source tracking using SRP-PHAT and 3D convolutional neural networks
CN110556103B (en) Audio signal processing method, device, system, equipment and storage medium
CN109272989B (en) Voice wake-up method, apparatus and computer readable storage medium
CN110148422B (en) Method and device for determining sound source information based on microphone array and electronic equipment
EP3479377B1 (en) Speech recognition
CN107077860B (en) Method for converting a noisy audio signal into an enhanced audio signal
CN110673096B (en) Voice positioning method and device, computer readable storage medium and electronic equipment
Yu et al. Adversarial network bottleneck features for noise robust speaker verification
CN113611315B (en) Voiceprint recognition method and device based on lightweight convolutional neural network
CN110690930B (en) Information source number detection method and device
CN112346012A (en) Sound source position determining method and device, readable storage medium and electronic equipment
CN110675892B (en) Multi-position voice separation method and device, storage medium and electronic equipment
CN111863005A (en) Sound signal acquisition method and device, storage medium and electronic equipment
CN111128178A (en) Voice recognition method based on facial expression analysis
CN112055284B (en) Echo cancellation method, neural network training method, apparatus, medium, and device
Hemavathi et al. Voice conversion spoofing detection by exploring artifacts estimates
JP5994639B2 (en) Sound section detection device, sound section detection method, and sound section detection program
CN110689900B (en) Signal enhancement method and device, computer readable storage medium and electronic equipment
Pandharipande et al. Robust front-end processing for emotion recognition in noisy speech
CN112180318A (en) Sound source direction-of-arrival estimation model training and sound source direction-of-arrival estimation method
CN112652320A (en) Sound source positioning method and device, computer readable storage medium and electronic equipment
CN112750455A (en) Audio processing method and device
JP2019066339A (en) Diagnostic device, diagnostic method and diagnostic system each using sound
CN114333769B (en) Speech recognition method, computer program product, computer device and storage medium
Tao et al. An ensemble framework of voice-based emotion recognition system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination