CN114333876A - Method and apparatus for signal processing - Google Patents
Method and apparatus for signal processing
- Publication number: CN114333876A (application number CN202111415175.5A)
- Authority: CN (China)
- Prior art keywords: signal, matrix, mixing matrix, sound source, observation
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The application provides a method and an apparatus for signal processing, which obtain a de-mixing matrix from a mixing matrix composed of correlation transfer functions between microphones, thereby reducing the influence of reverberation on signal separation and improving signal separation performance. In the method, a first mixing matrix comprising the correlation transfer functions between the microphones and a reverberated speech signal are obtained from an observation signal; a de-mixing matrix of the observation signal is then obtained from the first mixing matrix and the reverberated speech signal; finally, a separated signal is obtained using the de-mixing matrix. Embodiments of the application can be used in the field of audio processing, for example front-end speech signal enhancement.
Description
Technical Field
The present application relates to the field of audio processing, and more particularly, to methods and apparatus for signal processing.
Background
The cocktail party effect reveals the masking effect of the human ear, i.e., its natural ability to extract a desired sound source from a complex, noisy auditory scene (an acoustic scene in which multiple sound sources are present simultaneously). As voice interaction technology matures, a target speech signal can be extracted by blind source separation. Blind Source Separation (BSS) refers to the process of separating the source signals from a mixed signal (i.e., the observation signal) without prior knowledge of the source signals or of the signal mixing system (transmission channel).
Independent Vector Analysis (IVA) is a commonly used blind source separation method: a received observation signal is decomposed into several independent components according to the principle of statistical independence, and these independent components serve as an approximate estimate of the source signals. However, existing IVA-based blind source separation methods assume that the mixing matrix is formed by room transfer functions, so the separation performance is affected by the room reverberation conditions.
Disclosure of Invention
The embodiment of the application provides a method and an apparatus for signal processing, in which a de-mixing matrix is obtained from a mixing matrix containing correlation transfer functions between microphones, so that the influence of reverberation on signal separation can be reduced and the signal separation performance improved.
In a first aspect, a method of signal processing is provided, including:
acquiring observation signals, wherein the observation signals comprise original sound source signals of at least two sources acquired by at least two microphones;
determining a first mixing matrix H and a reverberated speech signal s̃ from the observation signal, wherein the first mixing matrix H comprises a first correlation transfer function between the at least two microphones and is used to represent the mapping relationship between the observation signal and the reverberated speech signal s̃;
inputting the first mixing matrix H and the reverberated speech signal s̃ into a signal processing model to obtain a de-mixing matrix W of the observation signal, wherein the signal processing model is used to represent the mapping relationship among the first mixing matrix H, the reverberated speech signal s̃, and the de-mixing matrix W;
and acquiring a separation signal according to the unmixing matrix W and the observation signal.
In a second aspect, there is provided an apparatus for signal processing, comprising:
an acquisition unit for acquiring observation signals, wherein the observation signals comprise original sound source signals of at least two sources acquired by at least two microphones;
a processing unit, configured to determine a first mixing matrix H and a reverberated speech signal s̃ from the observation signal, wherein the first mixing matrix H comprises a first correlation transfer function between the at least two microphones and is used to represent the mapping relationship between the observation signal and the reverberated speech signal s̃;
the processing unit is further configured to input the first mixing matrix H and the reverberated speech signal s̃ into a signal processing model to obtain a de-mixing matrix W of the observation signal, wherein the signal processing model is used to represent the mapping relationship among the first mixing matrix H, the reverberated speech signal s̃, and the de-mixing matrix W;
the processing unit is further configured to obtain a separation signal according to the unmixing matrix W and the observation signal.
In a third aspect, an electronic device is provided, comprising a processor and a memory, wherein the memory is configured to store a computer program and the processor is configured to execute the computer program to implement the method of the first aspect.
In a fourth aspect, a chip is provided, comprising a processor configured to call and run a computer program from a memory, so that a device on which the chip is installed performs the method of the first aspect.
In a fifth aspect, there is provided a computer readable storage medium comprising computer instructions which, when executed by a computer, cause the computer to carry out the method of the first aspect.
In a sixth aspect, there is provided a computer program product comprising computer program instructions which, when run on a computer, cause the computer to perform the method of the first aspect.
In the embodiment of the application, a first mixing matrix including a correlation transfer function between microphones and a voice signal with reverberation are obtained according to an observation signal, then a de-mixing matrix of the observation signal is obtained according to the first mixing matrix and the voice signal with reverberation, and finally a separation signal is obtained from the observation signal according to the de-mixing matrix. Since the first mixing matrix contains the correlation transfer function between microphones instead of the room transfer function, and the correlation transfer function between microphones does not contain reverberation, obtaining the unmixing matrix according to the first mixing matrix can reduce the influence of the reverberation on signal separation, thereby improving the signal separation performance.
Drawings
FIG. 1 is a schematic diagram of an application scenario suitable for use in embodiments of the present application;
FIG. 2 is a schematic diagram of a speech recognition system suitable for use with embodiments of the present application;
FIG. 3 is a schematic flow chart of a method of signal processing provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart diagram of another method of signal processing provided by an embodiment of the present application;
FIG. 5 is a schematic flow chart diagram of another method of signal processing provided by an embodiment of the present application;
FIG. 6 is a schematic flow chart diagram of another method of signal processing provided by an embodiment of the present application;
FIG. 7 is a schematic flow chart diagram of another method of signal processing provided by an embodiment of the present application;
FIG. 8 is a schematic flow chart diagram of another method of signal processing provided by an embodiment of the present application;
FIG. 9 is a schematic diagram comparing the effect of the method of signal processing provided by an embodiment of the present application with that of a prior-art sound source separation scheme;
FIG. 10 is an alternative schematic block diagram of an apparatus for signal processing of an embodiment of the present application;
FIG. 11 is another alternative schematic block diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be understood that in the embodiments of the present application, "B corresponding to A" means that B is associated with A. In one implementation, B may be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
In the description of the present application, "at least one" means one or more, and "a plurality" means two or more, unless otherwise specified. In addition, "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may each be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, "at least one of a, b, or c" may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be single or multiple.
It should be further understood that the descriptions of the first, second, etc. appearing in the embodiments of the present application are only for illustrating and differentiating the objects, and do not represent a particular limitation to the number of devices in the embodiments of the present application, and do not constitute any limitation to the embodiments of the present application.
It should also be appreciated that a particular feature, structure, or characteristic described in connection with an embodiment is included in at least one embodiment of the application. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the application provides a signal processing scheme, which can enhance a front-end voice signal, for example, enhance an expected signal, suppress an interference signal, and the like, and can be applied to various fields, for example, smart homes, video conferences, intelligent traffic, driving assistance, and the like, without limitation.
Some brief descriptions will be made below on application scenarios to which the technical solution of the embodiment of the present application can be applied. It should be noted that the following application scenarios are only used for illustrating the embodiments of the present application and are not limited. In specific implementation, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.
Fig. 1 is a schematic diagram of an application scenario suitable for use in an embodiment of the present application. As shown in fig. 1, the application scenario may include a user terminal, which may be, for example, a mobile phone, a smart voice interaction device (e.g., wearable devices such as smart watches and smart glasses), an in-vehicle terminal, or a smart appliance (e.g., a smart speaker, a coffee maker, a printer, etc.). Optionally, the application scenario may further include a computing device, which may be, for example, a cloud server, an intelligent portable device, or a home computing hub; this is not limited in this application. Illustratively, the intelligent portable device may be a smart phone, a computer, or the like, and the home computing hub may be a smart phone, a computer, a smart television, a router, or the like, without limitation. For example, the user terminal and the computing device may be connected through a wireless network or through a Bluetooth pairing connection, which is not limited in this embodiment of the application.
It should be noted that the user terminal in fig. 1 is only an example, and the user terminal to which the present application is applied is not limited thereto, and for example, the user terminal may also be an electronic device in an internet of things (IoT) system. In addition, the computing device in fig. 1 is only an example, and the computing device to which the present application is applied is not limited thereto, and may be, for example, a mobile internet device or the like. It should be further noted that the plurality of electronic devices shown in the embodiments of the present application are for better and more comprehensive description of the embodiments of the present application, but should not cause any limitation to the embodiments of the present application.
For a specific example, when the system architecture shown in fig. 1 is applied to a home scenario, the computing device may be a home computing hub, such as a mobile phone, a television, or a router, or a cloud device, such as a cloud server; the embodiment of the present application is not limited thereto.
For another specific example, when the system architecture shown in fig. 1 is applied to a personal wearing scenario, the user terminal is, for example, a personal wearing device, such as a smart band, a smart watch, a smart headset, smart glasses, and the like, and the computing device may be a personal device, such as a mobile phone, and the like, which is not limited in this embodiment of the present application.
In some embodiments, the signal processing method provided by the embodiments of the present application may be implemented by a user terminal. For example, after acquiring the observation signal, the user terminal may obtain a unmixing matrix according to the signal processing method provided in the embodiment of the present application, and obtain the separation signal according to the unmixing matrix.
In other embodiments, the signal processing method provided by the embodiments of the present application may be implemented by a user terminal and a computing device in cooperation. For example, after acquiring an observation signal, the user terminal may send the observation signal to the computing device, the computing device obtains a de-mixing matrix according to the signal processing method provided in the embodiment of the present application, and sends the de-mixing matrix to the user terminal, and the user terminal obtains a separation signal according to the de-mixing matrix. For another example, the computing device may obtain a de-mixing matrix according to the signal processing method provided in the embodiment of the present application, obtain a separation signal according to the de-mixing matrix, and send the separation signal to the user terminal.
FIG. 2 is a schematic diagram of a speech recognition system suitable for use with embodiments of the present application. As shown in fig. 2, a front-end signal processing module 201 may be disposed before the speech recognition system 202. The target speech and the interfering speech may be received by one or more sound pickup devices (a microphone being one example), and the observation signal output by the pickup devices is input to the front-end signal processing module 201, where, for example, an enhanced clean target speech signal (i.e., the separated signal) may be obtained after echo cancellation, dereverberation, sound source separation (also referred to as blind source separation), post-processing, and the like; the target speech signal may then be input to the speech recognition system 202 for speech recognition. The signal processing scheme provided by the embodiment of the present application can be applied in the sound source separation module: a de-mixing matrix is obtained, and signal separation is performed on the observation signal to obtain the target speech signal.
For example, the front-end signal processing module 201 in fig. 2 may be on the user terminal in fig. 1, or may be on the computing device in fig. 1, which is not limited in this application.
In the following, related terms related to the embodiments of the present application are described.
- 1) Mixing matrix: characterizes the mapping relationship between the observation signal and the original sound source signals (e.g., a frequency-domain linear combination relationship in the complex domain). The mixing matrix may be a matrix of Room Transfer Functions (RTFs) from the individual sound sources to the individual microphones.
- 2) De-mixing matrix: the inverse of the mixing matrix, i.e., the target matrix to be solved; it characterizes the mapping relationship between the target speech signal and the observation signal (e.g., a frequency-domain linear combination relationship in the complex domain). The de-mixing matrix may also be referred to as a separation matrix; the two terms have the same meaning.
- 3) Room transfer function: a function characterizing the frequency-domain propagation of sound from a sound source to a sound pickup device (e.g., a microphone).
- 4) Correlation transfer function between pickup devices: a function characterizing the frequency-domain propagation of sound from one sound pickup device to another. When the pickup devices are microphones, this may be referred to as the correlation transfer function between the microphones.
In current IVA-based blind source separation methods, a source signal model is established from the mixing matrix to obtain an objective function; the objective function is iteratively optimized to solve for the separation matrix until the model converges, yielding an estimate of the source signals. In this scheme, the mixing matrix is assumed to be formed by room transfer functions, so the separation performance of the speech signal is affected by the room reverberation conditions; dereverberation preprocessing therefore has to be performed in advance, which increases the complexity of the sound source separation algorithm. Second, this scheme has difficulty estimating the variance of the source signals and requires pre-whitening of the observation signal, making real-time implementation in a product difficult. Finally, the scheme uses the natural gradient method for parameter optimization, whose separation performance is limited by the step-size parameter; although many adaptive variable-step-size techniques have been proposed, the gradient descent algorithm still has a large computational load.
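For contrast, the pre-whitening step that the conventional scheme requires (and that the method below avoids) can be sketched as follows. This is a generic illustration on toy data, not part of the patent's method: at one frequency bin, the observation is linearly transformed so that its spatial covariance becomes the identity.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy complex observations at one frequency bin, shape (n_mics, n_frames).
x = rng.standard_normal((2, 500)) + 1j * rng.standard_normal((2, 500))

# Spatial covariance and its inverse square root (the whitening matrix Q).
C = x @ x.conj().T / x.shape[1]
eigvals, eigvecs = np.linalg.eigh(C)
Q = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.conj().T

# Whitened observations: the covariance of z is the identity matrix.
z = Q @ x
C_z = z @ z.conj().T / z.shape[1]
assert np.allclose(C_z, np.eye(2))
```

This extra eigendecomposition per frequency bin is part of the run-time cost the text criticizes.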
In view of the above problem, embodiments of the present application provide a method for signal processing, which may transform a mixing matrix into a mixing matrix including a correlation transfer function between microphones, instead of a room transfer function, and the correlation transfer function between the microphones does not include reverberation, so as to obtain a unmixing matrix according to the mixing matrix including the correlation transfer function between the microphones, and may mitigate an influence of the reverberation on signal separation, thereby improving signal separation performance.
Furthermore, a first parameter can be constructed from the mixing matrix, the reverberated speech signal, and the de-mixing matrix, and the de-mixing matrix is then determined from the mapping relationship between the first parameter and the de-mixing matrix. In this way, estimation of a speech signal model can be avoided during signal separation, no pre-whitening of the observation signal is needed, and the natural gradient method is avoided for parameter optimization, so the separation process is not constrained by a step-size parameter and the computational load can be effectively reduced.
The technical solutions provided by the embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 3 shows a schematic flow chart of a method 300 of signal processing provided by an embodiment of the present application. The method 300 may be used for blind source separation, for example, may be applied to the application scenario shown in fig. 1, or may be applied to the speech recognition system shown in fig. 2, without limitation. As shown in fig. 3, method 300 includes steps 310 through 340.
310, acquiring an observation signal, wherein the observation signal comprises original sound source signals of at least two sources acquired by at least two microphones.
Illustratively, the user terminal may acquire the observation signal via one or more sound pickup devices (e.g., microphones). The observation signal may comprise speech signals from a plurality of sound sources, which may include a target speech signal, i.e., a speech signal from a desired sound source. The observation signal may also include interfering speech signals, i.e., speech signals from undesired sound sources. In addition, the transmission channel or mixing system information of the observation signal is unknown.
In some embodiments, a Short-Time Fourier Transform (STFT) may be performed on the observation signal, resulting in the following equation (1):

x(f, t) = A_f s(f, t) (1)

where x(f, t) denotes the observation signal at frequency bin f and time t, A_f denotes the mixing matrix at frequency bin f (i.e., one example of the second mixing matrix A), and s(f, t) denotes the original sound source signals of the at least two sources at frequency bin f and time t, f being the frequency of the signal and t the time of the signal.
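As an illustrative sketch of the STFT domain in which equation (1) lives, a two-channel STFT can be computed with plain numpy; the recording below is hypothetical random data standing in for real microphone signals, and the window/hop choices are arbitrary.

```python
import numpy as np

# Hypothetical two-microphone recording: 1 s of audio at 16 kHz.
fs, n_fft, hop = 16_000, 512, 256
rng = np.random.default_rng(0)
mics = rng.standard_normal((2, fs))          # shape: (n_mics, n_samples)

# Simple STFT (Hann window, 50% overlap); x[:, f, t] is the observation
# vector x(f, t) of equation (1) at frequency bin f and frame index t.
win = np.hanning(n_fft)
n_frames = (mics.shape[1] - n_fft) // hop + 1
frames = np.stack([mics[:, i * hop: i * hop + n_fft] * win
                   for i in range(n_frames)], axis=-1)  # (n_mics, n_fft, n_frames)
x = np.fft.rfft(frames, axis=1)              # (n_mics, n_fft//2 + 1, n_frames)

print(x.shape)
```

All per-bin operations in the rest of the method act on the complex vectors x[:, f, t].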
In the following description, the scheme provided by the embodiment of the present application is described by taking a dual-microphone, dual-sound-source scene as an example. It will be appreciated that the process can be extended to multiple microphones and multiple sound sources; for details, reference may be made to the description of the dual-microphone, dual-sound-source process, with some simple adaptations where necessary, all of which fall within the scope of the embodiments of the present application.
For example, in a two-microphone, two-source scenario, the observed signal x (f, t) may be expressed as:
x(f, t) = [x1(f, t), x2(f, t)]^T
the original sound source signal s (f, t) can be expressed as:
s(f, t) = [s1(f, t), s2(f, t)]^T
The mixing matrix A_f generally consists of room transfer functions and can be expressed as:

A_f = [a11(f), a12(f); a21(f), a22(f)] (2)

In the method 300 of signal processing provided by the embodiment of the present application, the de-mixing matrix W_f needs to be estimated such that:
y(f, t) = W_f x(f, t) (3)
where y(f, t) denotes the estimated separated signal (which may also be referred to as the target speech signal) and should coincide with s(f, t) as closely as possible. In the dual-microphone, dual-sound-source scenario, y(f, t) = [y1(f, t), y2(f, t)]^T.
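Equation (3) is applied independently at every frequency bin. A minimal numpy sketch with random stand-in data (a real W_f would come from the estimation procedure of steps 320 and 330 below):

```python
import numpy as np

rng = np.random.default_rng(1)
n_freq, n_frames = 257, 100

# Toy complex observations x(f, t) and per-bin 2x2 de-mixing matrices W_f.
x = (rng.standard_normal((n_freq, 2, n_frames))
     + 1j * rng.standard_normal((n_freq, 2, n_frames)))
W = (rng.standard_normal((n_freq, 2, 2))
     + 1j * rng.standard_normal((n_freq, 2, 2)))

# Equation (3) at every bin at once: y(f, t) = W_f x(f, t).
y = np.einsum('fij,fjt->fit', W, x)

assert y.shape == (n_freq, 2, n_frames)
```

The einsum contracts the microphone index j per bin f, which is exactly the per-bin matrix-vector product of equation (3).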
320, determining a first mixing matrix H and a reverberated speech signal s̃ from the observation signal, wherein the first mixing matrix H comprises a first correlation transfer function between the at least two microphones. The first mixing matrix H is used to represent the mapping relationship between the observation signal and the reverberated speech signal s̃.
Illustratively, in step 320, the above equation (1) may be transformed to obtain:

x(f, t) = H_f s̃(f, t) (4)

where H_f denotes the mixing matrix at frequency bin f, containing the correlation transfer functions between the microphones, and s̃(f, t) denotes the reverberated speech signal at frequency bin f and time t.
In some alternative embodiments, referring to fig. 4, the reverberated speech signal s̃ may be determined according to the following steps 321 and 322.
321, determining a mapping relationship between the second mixing matrix A and the first mixing matrix H.
322, determining the reverberated speech signal s̃ based on the mapping relationship and the original sound source signals of the at least two sources.
Illustratively, taking the dual-microphone, dual-sound-source scene as an example, the mixing matrix A_f in equation (1) can be transformed as follows:

A_f = [1, h12(f); h21(f), 1] · diag(a11(f), a22(f)) (5)

where h21(f) = a21(f)/a11(f) and h12(f) = a12(f)/a22(f) are the correlation transfer functions between the microphones; together they form a new mixing matrix H_f = [1, h12(f); h21(f), 1], i.e., an example of the first mixing matrix H. H_f includes the two parameters h21(f) and h12(f).

Further, substituting equation (5) into equation (1) yields:

x(f, t) = H_f s̃(f, t), with s̃(f, t) = [a11(f) s1(f, t), a22(f) s2(f, t)]^T (6)

In equation (6), the speech signal to be restored changes from the original source signal s(f, t) to the reverberated speech signal s̃(f, t), and the mixing matrix A_f composed of room transfer functions changes into the mixing matrix H_f composed of microphone correlation transfer functions. The reverberation carried by the room transfer functions is thereby transferred into the reverberated speech signal s̃(f, t), so that the microphone correlation transfer functions contain no reverberation.
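The factorization above can be checked numerically: pulling the diagonal out of a 2×2 mixing matrix A_f leaves the relative-transfer-function matrix H_f, and the observation x = A_f s = H_f s̃ is unchanged. The complex values below are hypothetical.

```python
import numpy as np

# One frequency bin of a 2x2 room-transfer-function mixing matrix A_f
# and a source vector s(f, t); values are made up for illustration.
A = np.array([[1.0 + 0.2j, 0.3 - 0.1j],
              [0.5 + 0.4j, 0.9 + 0.1j]])
s = np.array([0.7 - 0.3j, -0.2 + 0.6j])

# Correlation (relative) transfer functions between the microphones.
h21 = A[1, 0] / A[0, 0]
h12 = A[0, 1] / A[1, 1]
H = np.array([[1.0, h12],
              [h21, 1.0]])

D = np.diag(np.diag(A))   # diag(a11, a22)
s_tilde = D @ s           # reverberated speech signal s~ of equation (6)

# The observation is unchanged: x = A s = H s~.
assert np.allclose(A, H @ D)
assert np.allclose(A @ s, H @ s_tilde)
```

Because H has a unit diagonal and carries only the inter-microphone ratios, any reverberation common to both paths of a source is absorbed into s̃ rather than into H.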
330, inputting the first mixing matrix H and the reverberated speech signal s̃ into a signal processing model to obtain a de-mixing matrix W of the observation signal, wherein the signal processing model is used to represent the mapping relationship among the first mixing matrix H, the reverberated speech signal s̃, and the de-mixing matrix W.
That is, based on the mapping relationship among the first mixing matrix H, the reverberated speech signal s̃, and the de-mixing matrix W, the signal processing model may obtain the de-mixing matrix W of the observation signal from the input first mixing matrix H and reverberated speech signal s̃.
In some alternative embodiments, referring to fig. 5, a demixing matrix W for the observed signals may be determined according to steps 331 and 332.
331, determining the first parameter according to the first mixing matrix H, the reverberated speech signal s̃, and the de-mixing matrix W.
332, obtaining the unmixing matrix W according to the mapping relationship between the first parameter and the unmixing matrix W.
In some embodiments, the first parameter may be predefined. As a possible implementation, referring to fig. 6, the first parameter may be determined according to the following steps 333 and 334:
333, determining a second parameter according to the first mixing matrix H and the reverberated speech signal s̃.
334, determining the first parameter according to the second parameter and the de-mixing matrix W.
Illustratively, the second parameter may be expressed asIn the embodiment of the present application, the first parameter may be definedAnd defining a second parameterWherein E [ alpha ], [ beta ], [ alpha ], [ beta ]]Indicating data expectation, different values of k correspond to different sound sources.
For a dual microphone, dual sound source scenario, becauseAndindependently of each other, substituting the above formula (6) into the second parameterIn (b), one can obtain:
In addition, the de-mixing matrix Wf and the mixing matrix Hf are inverses of each other, satisfying WfHf = I, i.e.:
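This inverse relationship can be checked numerically. A minimal sketch for a single frequency bin, using an arbitrary illustrative 2×2 complex mixing matrix (the values are not from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 complex mixing matrix Hf for one frequency bin f
# (illustrative values only).
Hf = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

# Because Wf and Hf are inverses of each other (Wf Hf = I),
# the de-mixing matrix can be obtained as the matrix inverse.
Wf = np.linalg.inv(Hf)

assert np.allclose(Wf @ Hf, np.eye(2))
```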
in the examples of the present application, it can be considered thatThe formula (8) is satisfied, wherein the value of k for the dual sound source scene is 1 or 2, which respectively corresponds to different sound sources, and (t-1) represents the last moment of time t.
For equation (7), left-multiplying and right-multiplying each term on both sides of the equal sign by the corresponding vectors yields:
Similarly, for equation (7), left-multiplying and right-multiplying each term on both sides of the equal sign by the other pair of corresponding vectors yields:
As a specific implementation, the mapping relationship between the first parameter and the de-mixing matrix W may be determined based on the null space of the first parameter.
That is, it is possible to let:
From formula (12), it can be found that the de-mixing matrix can be expressed, for example, as shown in the following formula (13):
In some alternative embodiments, the modulus value of the de-mixing matrix W may also be determined according to the minimum distortion principle. Illustratively, the modulus value of the de-mixing matrix W may be determined according to the following equation (14):
Wf(t) = diag(diag((Wf(t))^(-1))) Wf(t)    (14)
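Equation (14) can be implemented directly. The sketch below also checks the property the minimum distortion rescaling guarantees: after rescaling, the inverse of the de-mixing matrix has a unit diagonal, so each separated channel is scaled as it would be observed at its microphone:

```python
import numpy as np

def minimal_distortion(Wf: np.ndarray) -> np.ndarray:
    """Rescale the de-mixing matrix per equation (14):
    Wf <- diag(diag(Wf^{-1})) Wf."""
    return np.diag(np.diag(np.linalg.inv(Wf))) @ Wf
```

Usage: applying `minimal_distortion` to any invertible Wf leaves its row space unchanged and only fixes the per-row scale ambiguity inherent in blind separation.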
In summary, as a possible implementation of step 330, the second parameter may first be determined according to the first mixing matrix H and the reverberated speech signal; the first parameter is then determined according to the second parameter and the de-mixing matrix W; finally, the de-mixing matrix W is obtained according to the mapping relationship between the first parameter and the de-mixing matrix W, such as equations (13) and (14).
340, obtaining a separation signal according to the unmixing matrix W and the observation signal.
Illustratively, the observation signal x(f, t) and the de-mixing matrix Wf may be substituted into the above equation (3) to obtain the separated signal y(f, t), i.e., the target speech signal.
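A hedged sketch of this de-mixing step (equation (3) itself is not reproduced above; the per-frequency product y(f, t) = Wf x(f, t) is assumed, with the array shapes chosen for illustration):

```python
import numpy as np

def demix(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply per-frequency de-mixing matrices to an observation STFT.
    W: (F, K, M) de-mixing matrix for each of F frequency bins
    X: (F, M, T) observation STFT (M microphones, T frames)
    Returns Y: (F, K, T), i.e. y(f, t) = Wf x(f, t) for every bin f."""
    return np.einsum("fkm,fmt->fkt", W, X)
```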
Therefore, according to the method, a first mixing matrix comprising correlation transfer functions between microphones and a voice signal with reverberation are obtained according to an observation signal, then a de-mixing matrix of the observation signal is obtained according to the first mixing matrix and the voice signal with reverberation, and finally a separation signal is obtained according to the de-mixing matrix. Since the first mixing matrix contains the correlation transfer function between microphones instead of the room transfer function, and the correlation transfer function between microphones does not contain reverberation, obtaining the unmixing matrix according to the first mixing matrix can reduce the influence of the reverberation on signal separation, thereby improving the signal separation performance.
Furthermore, since the first parameter can be constructed from the first mixing matrix, the reverberated speech signal, and the de-mixing matrix, and the de-mixing matrix is determined from the mapping relationship between the first parameter and the de-mixing matrix, the signal separation process avoids estimating a speech signal model, requires no pre-whitening of the observation signal, and avoids parameter optimization by the natural gradient method, so that the separation process is not constrained by a step-size parameter; this effectively reduces the amount of computation and improves signal separation efficiency.
In some alternative embodiments, for example when the energy of a certain original sound source signal in the observation signal is weak, the denominator of the mapping relationship between the first parameter and the de-mixing matrix W (such as formula (13)) may become 0, which may make the above signal processing procedure unstable, for example causing a crash.
In order to ensure the stability of the signal processing process and improve the separation performance of the method 300, an Auxiliary virtual sound Source (AuxIS) may be introduced to enhance the observation signal, so as to obtain the first mixing matrix H and the reverberated speech signal of the enhanced observation signal. For example, the auxiliary virtual sound source may enhance a weaker sound source signal among the original sound source signals, avoiding the case where the energy of a certain original sound source signal is so weak that the denominator of the mapping relationship between the first parameter and the de-mixing matrix W (such as formula (13)) becomes 0. This helps improve the stability of the signal processing process and the signal separation performance.
Illustratively, referring to fig. 7, in the method 300, a first mixing matrix H of the enhanced observation signal and the reverberated speech signal may be obtained by the following steps 350 to 370.
350, determining the energy of the signal of the auxiliary virtual sound source according to the observation signal.
As a possible implementation, referring to fig. 8, the energy of the signal of the secondary virtual sound source may be determined by the following steps 351 and 352.
351, determining the amplitude spectrum of the signal of the auxiliary virtual sound source according to the observation signal.
352, determining the energy of the signal of the secondary virtual sound source based on the energy ratio of the observed signal to the signal of the secondary virtual sound source.
That is, the signal of the auxiliary virtual sound source can be decomposed into two parts, i.e. the amplitude spectrum of the signal of the auxiliary virtual sound source and the energy ratio of the observed signal to the signal of the auxiliary virtual sound source, which can be specifically seen in formula (15):
where λdB is the energy ratio of the observation signal to the signal of the auxiliary virtual sound source, which may be given in advance, and the remaining factor is the amplitude spectrum of the auxiliary virtual sound source.
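A minimal sketch of this decomposition, assuming λdB is a ratio expressed in decibels and that the observation energy is averaged over microphones (both are assumptions, since formula (15) itself is not reproduced above):

```python
import numpy as np

def aux_source_energy(X: np.ndarray, lambda_dB: float) -> np.ndarray:
    """Set the auxiliary virtual source energy lambda(f, t) a fixed number
    of decibels below the observation energy (sketch of equation (15)).
    X: (M, F, T) observation STFT; returns an (F, T) energy map."""
    obs_energy = np.mean(np.abs(X) ** 2, axis=0)      # average over microphones
    return obs_energy / (10.0 ** (lambda_dB / 10.0))  # apply the dB energy ratio
```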
360, obtaining a second correlation transfer function corresponding to the auxiliary virtual sound source.
For example, an auxiliary virtual sound source may be introduced to enhance the k-th sound source (e.g., the weakest of the original sound source signals), and the second correlation transfer function corresponding to the auxiliary virtual sound source can be expressed accordingly, where k is a positive integer whose different values correspond to different sound sources.
Optionally, the second correlation transfer function is an estimated correlation transfer function, and the estimation method may change with the usage scenario. In some embodiments, the correlation transfer function may be estimated by averaging multiple point measurements made in advance (i.e., by actual measurement), for example in a scene where the speaker location is relatively fixed, such as inside a car. In other embodiments, the correlation transfer function may be estimated using an adaptive correlation transfer function estimation algorithm (e.g., a far-field approximation estimation algorithm), for example in a scenario where the speaker location is unknown, such as a conference room.
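The "averaging of multiple point measurements" can be sketched as a covariance-ratio estimate of the correlation transfer function relative to a reference microphone; this particular estimator is an assumption for illustration, not a formula quoted from the patent:

```python
import numpy as np

def rtf_by_averaging(X: np.ndarray, ref: int = 0) -> np.ndarray:
    """Estimate the relative (correlation) transfer function between
    microphones by averaging over pre-measured frames.
    X: (M, T) STFT frames of one frequency bin, collected in advance.
    Returns h: (M,) complex gains with h[ref] == 1."""
    num = np.mean(X * np.conj(X[ref]), axis=1)  # E[x_m x_ref*]
    den = np.mean(np.abs(X[ref]) ** 2)          # E[|x_ref|^2]
    return num / den
```

This matches the fixed-speaker scenario: the more pre-measured frames are averaged, the better the noise suppresses out of the estimate.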
370, obtaining the first mixing matrix H and the reverberated speech signal according to the original sound source signals of the at least two sources, the energy of the signal of the auxiliary virtual sound source, and the second correlation transfer function, where the first mixing matrix H includes the second correlation transfer function and the reverberated speech signal includes the energy of the signal of the auxiliary virtual sound source.
Illustratively, after obtaining equation (6) above, the equation (6) may be further expanded to obtain:
when an auxiliary virtual sound source is introduced to enhance the kth sound source, the enhanced observation signal can be marked as xk(f, t), which can be expressed as the following equation:
that is, the first mixing matrix H may be updated toSpeech signal with reverberationCan be updated to
Illustratively, for a dual sound source scene, k takes the values 1 and 2, corresponding to the two different sound sources respectively. When a virtual sound source is introduced to enhance the 1st sound source, the resulting observation signal x1(f, t) is as follows:
When a virtual sound source is introduced to enhance the 2nd sound source, the resulting observation signal x2(f, t) is as follows:
After the observation signal is enhanced by the auxiliary virtual sound source, the enhanced first mixing matrix H and reverberated speech signal may be input into the signal processing model to obtain an enhanced de-mixing matrix W of the observation signal. Accordingly, the enhanced first mixing matrix H and reverberated speech signal may now be used to determine the second parameter and the first parameter, and the de-mixing matrix W is then obtained from the mapping relationship between the first parameter and the de-mixing matrix W.
Illustratively, the second parameter and the first parameter determined from the enhanced first mixing matrix H and the reverberated speech signal may be denoted with corresponding enhanced notation.
Illustratively, for a dual-microphone, dual-source scene, after the first parameter is determined, it can be substituted into the above equations (13) and (14) to obtain the de-mixing matrix W. Then, a separation signal can be obtained from the de-mixing matrix W and the enhanced observation signal.
As a specific example, after the first parameters are obtained, they can be substituted into equations (13) and (14) to obtain the de-mixing matrix W and its modulus values. The target speech signal of the 1st sound source and the target speech signal of the 2nd sound source can then be obtained from the corresponding de-mixing results.
That is, in the case where the auxiliary virtual sound source is introduced, the energy λ(f, t) of the auxiliary virtual sound source may first be obtained from the observation signal, and the correlation transfer function between the microphones corresponding to the auxiliary virtual sound source may be estimated. The enhanced observation signal xk(f, t) can then be determined from this energy λ(f, t) and the correlation transfer function, and the second parameter and the first parameter are in turn determined from the enhanced observation signal xk(f, t). Finally, the de-mixing matrix W is obtained according to the mapping relationship between the first parameter and the de-mixing matrix W, such as equations (13) and (14), and thus the separated signal y(f, t), i.e., the target speech signal, is obtained.
Therefore, in the embodiment of the present application, the auxiliary virtual sound source is introduced to enhance the observation signal, so as to obtain the mixing matrix and the reverberated speech signal corresponding to the enhanced observation signal, where the mixing matrix includes the second correlation transfer function corresponding to the auxiliary virtual sound source and the reverberated speech signal includes the energy of the signal of the auxiliary virtual sound source. The de-mixing matrix W to be solved can be regarded as a special beamforming matrix (that is, a matrix designed not from direction information but from sound source independence). The added auxiliary virtual sound source can enhance the original speech signal in the observation signal and increase the accuracy of the de-mixing matrix W, thereby ensuring the stability of the signal processing process and improving the signal separation performance.
Fig. 9 is a schematic diagram comparing the effect of the method of signal processing provided by the embodiment of the present application with that of prior-art sound source separation schemes. Graph (a) compares the Signal-to-Interference Ratio (SIR) rise value of the separated signal obtained by each scheme, graph (b) compares the Signal-to-Distortion Ratio (SDR) rise value of the separated signal obtained by each scheme, and the X-axis of graphs (a) and (b) represents the reverberation time.
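For reference, the SIR metric plotted in graph (a) follows the standard power-ratio definition; a sketch (the "rise value" would then be the SIR of the separated signal minus the SIR of the mixture):

```python
import numpy as np

def sir_db(target: np.ndarray, interference: np.ndarray) -> float:
    """Signal-to-Interference Ratio in dB: 10 log10(P_target / P_interference),
    with powers taken as mean squared magnitudes."""
    p_t = np.mean(np.abs(target) ** 2)
    p_i = np.mean(np.abs(interference) ** 2)
    return 10.0 * np.log10(p_t / p_i)
```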
For example, a mixed speech signal may be acquired in a two-microphone, two-source mixing scenario. As a specific example, two microphones can be used to collect the voice signals of two persons speaking simultaneously in a room with a length of 4.45m, a width of 3.55 m and a height of 2.5 m. The two persons may be located 1m from the microphones, respectively, at 45 ° and 135 ° with respect to the directional angle of the microphones, respectively, and the distance between the two microphones may be 0.1 m. The reverberation time is adjusted from 150ms to 300ms, and the adjustment step size of the reverberation time is 10 ms.
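The described recording setup can be reconstructed for simulation purposes. In the sketch below, the coordinates and the placement of the array at the room center are illustrative assumptions; only the angles, distances, and reverberation-time sweep come from the text:

```python
import math

mic_spacing = 0.1                    # two microphones 0.1 m apart
array_center = (4.45 / 2, 3.55 / 2)  # assume the array sits at the room center

def speaker_position(angle_deg: float, dist: float = 1.0):
    """Speaker at `dist` metres from the array center, at `angle_deg`
    relative to the microphone axis."""
    a = math.radians(angle_deg)
    return (array_center[0] + dist * math.cos(a),
            array_center[1] + dist * math.sin(a))

# Two speakers, 1 m away, at 45 and 135 degrees.
speakers = [speaker_position(45.0), speaker_position(135.0)]

# Reverberation times swept from 150 ms to 300 ms in 10 ms steps.
rt60_ms = list(range(150, 301, 10))
```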
The speech signals received by the two microphones can be processed respectively by: (1) the conventional AuxIVA technique; (2) the reference algorithm, geometrically constrained auxiliary-function IVA (GCAV-IVA) with VCD; (3) the AuxIS-AuxIVA provided by the embodiment of the present application, with the correlation transfer function estimated by a steering vector; (4) the AuxIS-AuxIVA provided by the embodiment of the present application, with the correlation transfer function estimated from pre-measured values. The steering vector represents a far-field approximate estimate of the correlation transfer function of the AuxIS, and the pre-measured value is the actually measured correlation transfer function of the AuxIS.
As can be seen from fig. 9, under different reverberation times, the SIR and SDR of the separated signal obtained by the signal processing method provided by the embodiment of the present application are significantly improved compared with the existing method, so that the signal processing method provided by the embodiment of the present application can help to improve the quality of the front-end signal.
The present invention is not limited to the details of the above embodiments; various simple modifications can be made to the technical solution of the present invention within its technical concept, and such simple modifications all fall within the protection scope of the present invention. For example, the various features described in the foregoing detailed description may be combined in any suitable manner without contradiction; to avoid unnecessary repetition, the possible combinations are not separately described in this application. Likewise, the various embodiments of the present application may be combined with each other arbitrarily, and as long as the concept of the present application is not violated, such combinations should also be considered as disclosed in this application.
It should also be understood that, in the various method embodiments of the present application, the sequence numbers of the above-mentioned processes do not imply an execution sequence, and the execution sequence of the processes should be determined by their functions and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application. It is to be understood that the numerical designations are interchangeable under appropriate circumstances such that the embodiments of the application described are capable of operation in sequences other than those illustrated or described herein.
Method embodiments of the present application are described in detail above in conjunction with fig. 3-9, and apparatus embodiments of the present application are described in detail below in conjunction with fig. 10-11.
Fig. 10 is a schematic block diagram of an apparatus 700 for signal processing according to an embodiment of the present application. As shown in fig. 10, the signal processing apparatus 700 may include an obtaining unit 710 and a processing unit 720.
An obtaining unit 710 for obtaining observation signals, wherein the observation signals comprise at least two original sound source signals of at least two sources obtained by at least two microphones;
a processing unit 720, configured to determine a first mixing matrix H and a reverberated speech signal based on the observation signal, where the first mixing matrix H includes a first correlation transfer function between the at least two microphones and is used to represent the mapping relationship between the observation signal and the reverberated speech signal;
the processing unit 720 is further configured to input the first mixing matrix H and the reverberated speech signal into a signal processing model to obtain a de-mixing matrix W of the observation signal, where the signal processing model is used to represent the mapping relationship among the first mixing matrix H, the reverberated speech signal, and the de-mixing matrix W;
the processing unit 720 is further configured to obtain a separation signal according to the unmixing matrix W and the observation signal.
Optionally, the processing unit 720 is specifically configured to:
determining a first parameter according to the first mixing matrix H, the reverberated speech signal and the de-mixing matrix W;
and obtaining the unmixing matrix W according to the mapping relation between the first parameter and the unmixing matrix W.
Optionally, the processing unit 720 is specifically configured to:
determining a second parameter according to the first mixing matrix H and the reverberated speech signal;
and determining the first parameter according to the second parameter and the unmixing matrix W.
Optionally, the processing unit 720 is further configured to:
and determining the mapping relation between the first parameter and the unmixing matrix W according to the null space of the first parameter.
Optionally, the processing unit 720 is further configured to:
and determining the modulus value of the unmixing matrix W according to the minimum distortion principle.
Optionally, the processing unit 720 is further configured to determine, according to the observation signal, an energy of a signal of an auxiliary virtual sound source;
the obtaining unit 710 is further configured to obtain a second correlation transfer function corresponding to the auxiliary virtual sound source.
Wherein, the processing unit 720 is specifically configured to:
obtaining the first mixing matrix H and the reverberated speech signal according to the observation signal, the energy of the signal of the auxiliary virtual sound source and the second correlation transfer function, where the first mixing matrix H includes the second correlation transfer function and the reverberated speech signal includes the energy of the signal of the auxiliary virtual sound source.
Optionally, the processing unit 720 is specifically configured to:
determining a magnitude spectrum of a signal of the auxiliary virtual sound source according to the observation signal;
determining the energy of the signal of the auxiliary virtual sound source according to the energy ratio of the observed signal to the signal of the auxiliary virtual sound source.
Optionally, the obtaining unit 710 is specifically configured to determine the second correlation transfer function by using a way of averaging in advance through multipoint measurement.
Optionally, the obtaining unit 710 is specifically configured to determine the second correlation transfer function by using an adaptive correlation transfer function estimation algorithm.
Optionally, the processing unit 720 is specifically configured to:
determining a mapping relationship between the first mixing matrix H and a second mixing matrix A, wherein the second mixing matrix A is used for representing the mapping relationship between the observation signals and the original sound source signals of the at least two sources;
determining the reverberated speech signal according to the mapping relationship and the original sound source signals of the N sources.
Optionally, the second mixing matrix a comprises a room transfer function between a sound source of the observation signal to a microphone.
It is to be understood that apparatus embodiments and method embodiments may correspond to one another and that similar descriptions may refer to method embodiments. To avoid repetition, further description is omitted here. Specifically, the apparatus 700 for signal processing in this embodiment may correspond to a corresponding main body for executing the method 300 in this embodiment, and the foregoing and other operations and/or functions of each module in the apparatus 700 are respectively for implementing each method in fig. 3 to fig. 8 or a corresponding flow in each method, and are not described again here for brevity.
The apparatus and system of embodiments of the present application are described above in connection with the drawings from the perspective of functional modules. It should be understood that the functional modules may be implemented by hardware, by instructions in software, or by a combination of hardware and software modules. Specifically, the steps of the method embodiments in the present application may be implemented by integrated logic circuits of hardware in a processor and/or instructions in the form of software, and the steps of the method disclosed in conjunction with the embodiments in the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in random access memory, flash memory, read only memory, programmable read only memory, electrically erasable programmable memory, registers, and the like, as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps in the above method embodiments in combination with hardware thereof.
Fig. 11 is a schematic block diagram of an electronic device 800 provided in an embodiment of the present application.
As shown in fig. 11, the electronic device 800 may include:
a memory 810 and a processor 820, the memory 810 being configured to store a computer program and to transfer the program code to the processor 820. In other words, the processor 820 may call and execute a computer program from the memory 810 to implement the communication method in the embodiment of the present application.
For example, the processor 820 may be configured to perform the steps of the method 300 according to instructions in the computer program.
In some embodiments of the present application, the processor 820 may include, but is not limited to:
general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like.
In some embodiments of the present application, the memory 810 includes, but is not limited to:
volatile memory and/or non-volatile memory. The non-volatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of example, but not limitation, many forms of RAM are available, such as Static random access memory (Static RAM, SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic random access memory (Synchronous DRAM, SDRAM), Double Data Rate Synchronous Dynamic random access memory (DDR SDRAM), Enhanced Synchronous SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
In some embodiments of the present application, the computer program may be partitioned into one or more modules, which are stored in the memory 810 and executed by the processor 820 to perform the signal processing methods provided herein. The one or more modules may be a series of computer program instruction segments capable of performing certain functions, the instruction segments describing the execution of the computer program in the electronic device 800.
Optionally, the electronic device 800 may further include:
a transceiver 830, the transceiver 830 being connectable to the processor 820 or the memory 810.
The processor 820 may control the transceiver 830 to communicate with other devices, and specifically, may transmit information or data to the other devices or receive information or data transmitted by the other devices. The transceiver 830 may include a transmitter and a receiver. The transceiver 830 may further include one or more antennas.
It should be understood that the various components in the electronic device 800 are connected by a bus system that includes a power bus, a control bus, and a status signal bus in addition to a data bus.
According to an aspect of the present application, there is provided a communication device comprising a processor and a memory, the memory being configured to store a computer program, and the processor being configured to call and execute the computer program stored in the memory, so that the communication device performs the method of the above-described method embodiment.
According to an aspect of the present application, there is provided a computer storage medium having a computer program stored thereon, which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. In other words, the present application also provides a computer program product containing instructions, which when executed by a computer, cause the computer to execute the method of the above method embodiments.
According to another aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method of the above-described method embodiment.
In other words, when implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the present application occur, in whole or in part, when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that includes one or more of the available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Video Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the module is merely a logical division, and other divisions may be realized in practice, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. For example, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (15)
1. A method of signal processing, comprising:
acquiring observation signals, wherein the observation signals comprise original sound source signals of at least two sources acquired by at least two microphones;
determining a first mixing matrix H and a reverberated speech signal from the observation signal, wherein the first mixing matrix H comprises a first correlation transfer function between the at least two microphones, and the first mixing matrix H is used to represent the mapping relationship between the observation signal and the reverberated speech signal;
inputting the first mixing matrix H and the reverberated speech signal into a signal processing model to obtain a de-mixing matrix W of the observation signal, wherein the signal processing model is used to represent the mapping relationship among the first mixing matrix H, the reverberated speech signal, and the de-mixing matrix W;
and acquiring a separation signal according to the unmixing matrix W and the observation signal.
2. The method of claim 1, wherein the inputting the first mixing matrix H and the reverberated speech signal into a signal processing model to obtain a de-mixing matrix W of the observation signal comprises:
determining a first parameter according to the first mixing matrix H, the reverberated speech signal and the de-mixing matrix W;
and obtaining the unmixing matrix W according to the mapping relation between the first parameter and the unmixing matrix W.
3. The method of claim 2, wherein the determining a first parameter according to the first mixing matrix H, the reverberated speech signal and the de-mixing matrix W comprises:
determining a second parameter according to the first mixing matrix H and the reverberated speech signal;
and determining the first parameter according to the second parameter and the unmixing matrix W.
4. The method of claim 2 or 3, further comprising:
and determining the mapping relation between the first parameter and the unmixing matrix W according to the null space of the first parameter.
5. The method according to any one of claims 2-4, further comprising:
and determining the modulus value of the unmixing matrix W according to the minimum distortion principle.
6. The method of any one of claims 1-5, further comprising:
determining the energy of the signal of the auxiliary virtual sound source according to the observation signal;
acquiring a second related transfer function corresponding to the auxiliary virtual sound source;
wherein the determining the first mixing matrix H and the reverberated speech signal from the observation signal comprises:
obtaining the first mixing matrix H and the reverberated speech signal according to the observation signal, the energy of the signal of the auxiliary virtual sound source, and the second correlation transfer function, wherein the first mixing matrix H comprises the second correlation transfer function, and the reverberated speech signal comprises the energy of the signal of the auxiliary virtual sound source.
7. The method of claim 6, wherein determining the energy of the signal of the secondary virtual sound source from the observed signal comprises:
determining a magnitude spectrum of a signal of the auxiliary virtual sound source according to the observation signal;
determining the energy of the signal of the auxiliary virtual sound source according to the energy ratio of the observed signal to the signal of the auxiliary virtual sound source.
8. The method according to claim 6 or 7, wherein the obtaining a second correlation transfer function corresponding to the auxiliary virtual sound source comprises:
determining the second correlation transfer function in advance by averaging over multipoint measurements.
9. The method according to claim 6 or 7, wherein the obtaining a second correlation transfer function corresponding to the auxiliary virtual sound source comprises:
determining the second correlation transfer function using an adaptive correlation transfer function estimation algorithm.
10. The method of any one of claims 1-9, wherein the determining the first mixing matrix H and the reverberated speech signal from the observation signal comprises:
determining a mapping relationship between the first mixing matrix H and a second mixing matrix A, wherein the second mixing matrix A is used for representing the mapping relationship between the observation signals and the original sound source signals of the at least two sources;
11. The method according to claim 10, wherein the second mixing matrix A comprises a room transfer function from a sound source of the observation signal to a microphone.
12. An apparatus for signal processing, comprising:
an acquisition unit for acquiring observation signals, wherein the observation signals comprise original sound source signals of at least two sources acquired by at least two microphones;
a processing unit, configured to determine a first mixing matrix H and a reverberated speech signal from the observation signal, wherein the first mixing matrix H comprises a first correlation transfer function between the at least two microphones, and the first mixing matrix H is used to represent the mapping relationship between the observation signal and the reverberated speech signal;
the processing unit is further configured to input the first mixing matrix H and the reverberated speech signal into a signal processing model to obtain an unmixing matrix W of the observation signal, wherein the signal processing model is used to represent the mapping relationship among the first mixing matrix H, the reverberated speech signal, and the unmixing matrix W;
the processing unit is further configured to obtain a separation signal according to the unmixing matrix W and the observation signal.
13. An electronic device comprising a processor and a memory, the memory having stored therein instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1-11.
14. A computer storage medium for storing a computer program comprising instructions for performing the method of any one of claims 1-11.
15. A computer program product, comprising computer program code which, when run by an electronic device, causes the electronic device to perform the method of any of claims 1-11.
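The claimed pipeline — estimate a mixing matrix H of correlation transfer functions, derive an unmixing matrix W from it, and apply W to the multichannel observations to obtain the separated signals — can be sketched in the short-time frequency domain. The snippet below is an illustrative toy, not the patented signal processing model: it assumes STFT-domain signals, derives W per frequency bin as the pseudo-inverse of H (one simple reading of the minimum-distortion principle mentioned in claim 5), and all function names and shapes are hypothetical.

```python
import numpy as np

def unmixing_from_mixing(H):
    """Per-frequency unmixing matrix W from the mixing matrix H.
    Here W is the pseudo-inverse of H -- an assumption standing in for
    the patent's signal processing model, not its actual method.
    H: (freq, mics, sources) -> W: (freq, sources, mics)."""
    return np.stack([np.linalg.pinv(Hf) for Hf in H])

def separate(W, X):
    """Apply the unmixing matrix to STFT-domain observations.
    X: (freq, mics, frames) -> separated: (freq, sources, frames)."""
    return np.einsum('fsm,fmt->fst', W, X)

# Toy check: simulate observations X = H @ sources, then recover them.
rng = np.random.default_rng(0)
F, M, S, T = 4, 2, 2, 16          # freq bins, mics, sources, frames
H = rng.standard_normal((F, M, S)) + 1j * rng.standard_normal((F, M, S))
src = rng.standard_normal((F, S, T))
X = np.einsum('fms,fst->fmt', H, src)  # mix the sources per frequency bin
W = unmixing_from_mixing(H)
Y = separate(W, X)
print(np.allclose(Y, src))             # sources recovered up to numerics
```

With a square, well-conditioned H the pseudo-inverse reduces to the ordinary inverse, so W exactly undoes the mixing; with more microphones than sources it gives the least-squares (minimum-distortion-flavored) solution.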
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111415175.5A CN114333876B (en) | 2021-11-25 | 2021-11-25 | Signal processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114333876A true CN114333876A (en) | 2022-04-12 |
CN114333876B CN114333876B (en) | 2024-02-09 |
Family
ID=81046323
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111415175.5A Active CN114333876B (en) | 2021-11-25 | 2021-11-25 | Signal processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114333876B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090086998A1 (en) * | 2007-10-01 | 2009-04-02 | Samsung Electronics Co., Ltd. | Method and apparatus for identifying sound sources from mixed sound signal |
EP2863391A1 (en) * | 2012-06-18 | 2015-04-22 | Goertek Inc. | Method and device for dereverberation of single-channel speech |
CN109994120A (en) * | 2017-12-29 | 2019-07-09 | 福州瑞芯微电子股份有限公司 | Sound enhancement method, system, speaker and storage medium based on diamylose |
CN110428852A (en) * | 2019-08-09 | 2019-11-08 | 南京人工智能高等研究院有限公司 | Speech separating method, device, medium and equipment |
WO2020064089A1 (en) * | 2018-09-25 | 2020-04-02 | Huawei Technologies Co., Ltd. | Determining a room response of a desired source in a reverberant environment |
CN112435685A (en) * | 2020-11-24 | 2021-03-02 | 深圳市友杰智新科技有限公司 | Blind source separation method and device for strong reverberation environment, voice equipment and storage medium |
CN113393857A (en) * | 2021-06-10 | 2021-09-14 | 腾讯音乐娱乐科技(深圳)有限公司 | Method, device and medium for eliminating human voice of music signal |
CN113687307A (en) * | 2021-08-19 | 2021-11-23 | 中国人民解放军海军工程大学 | Self-adaptive beam forming method under low signal-to-noise ratio and reverberation environment |
Non-Patent Citations (2)
Title |
---|
ROBERT AICHNER et al.: "A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments", SIGNAL PROCESSING, vol. 86, no. 6 *
NING Jun: "Research on Speech Separation by Microphone Array Beamforming and Acoustic Echo Cancellation Methods", China Master's Theses Full-text Database *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111418010B (en) | Multi-microphone noise reduction method and device and terminal equipment | |
Serizel et al. | Low-rank approximation based multichannel Wiener filter algorithms for noise reduction with application in cochlear implants | |
US8787587B1 (en) | Selection of system parameters based on non-acoustic sensor information | |
US20190272842A1 (en) | Speech enhancement for an electronic device | |
CN109727604A (en) | Frequency domain echo cancel method and computer storage media for speech recognition front-ends | |
CN111131947B (en) | Earphone signal processing method and system and earphone | |
US20070100605A1 (en) | Method for processing audio-signals | |
US11146897B2 (en) | Method of operating a hearing aid system and a hearing aid system | |
JP4543014B2 (en) | Hearing device | |
US10755728B1 (en) | Multichannel noise cancellation using frequency domain spectrum masking | |
CN111081267B (en) | Multi-channel far-field speech enhancement method | |
WO2019113253A1 (en) | Voice enhancement in audio signals through modified generalized eigenvalue beamformer | |
US9877115B2 (en) | Dynamic relative transfer function estimation using structured sparse Bayesian learning | |
CN110265054A (en) | Audio signal processing method, device, computer readable storage medium and computer equipment | |
US20150318001A1 (en) | Stepsize Determination of Adaptive Filter For Cancelling Voice Portion by Combing Open-Loop and Closed-Loop Approaches | |
CN111681665A (en) | Omnidirectional noise reduction method, equipment and storage medium | |
Spriet et al. | Stochastic gradient-based implementation of spatially preprocessed speech distortion weighted multichannel Wiener filtering for noise reduction in hearing aids | |
CN105957536B (en) | Based on channel degree of polymerization frequency domain echo cancel method | |
CN112802490A (en) | Beam forming method and device based on microphone array | |
CN113889135A (en) | Method for estimating direction of arrival of sound source, electronic equipment and chip system | |
US20140254825A1 (en) | Feedback canceling system and method | |
US20230209283A1 (en) | Method for audio signal processing on a hearing system, hearing system and neural network for audio signal processing | |
CN114333876A (en) | Method and apparatus for signal processing | |
Farmani et al. | Sound source localization for hearing aid applications using wireless microphones | |
CN113223552B (en) | Speech enhancement method, device, apparatus, storage medium, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
REG | Reference to a national code |
Ref country code: HK Ref legal event code: DE Ref document number: 40069743 Country of ref document: HK |
|
GR01 | Patent grant | ||