CN114333876A - Method and apparatus for signal processing - Google Patents

Method and apparatus for signal processing

Info

Publication number: CN114333876A
Application number: CN202111415175.5A
Authority: CN (China)
Prior art keywords: signal, matrix, mixing matrix, sound source, observation
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN114333876B (en)
Inventors: 陈日林, 张兆奇
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd; priority to CN202111415175.5A
Publication of CN114333876A; application granted; publication of CN114333876B

Abstract

The present application provides a method and apparatus for signal processing that obtain a de-mixing matrix from a mixing matrix containing correlation transfer functions between microphones, which reduces the influence of reverberation on signal separation and thereby improves signal separation performance. In the method, a first mixing matrix containing correlation transfer functions between microphones and a reverberated speech signal are obtained from an observation signal; a de-mixing matrix of the observation signal is then obtained from the first mixing matrix and the reverberated speech signal; and finally a separation signal is obtained according to the de-mixing matrix. The embodiments of the application can be used in the field of audio processing, for example for front-end speech signal enhancement.

Description

Method and apparatus for signal processing
Technical Field
The present application relates to the field of audio processing, and more particularly, to methods and apparatus for signal processing.
Background
The cocktail party effect reveals the masking effect of the human ear, i.e., the natural ability to extract a desired sound source from a complex, noisy auditory scene (an acoustic scene in which multiple sound sources are present simultaneously). With voice interaction technology becoming increasingly mature, a target speech signal can be extracted by blind source separation. Blind source separation (BSS) refers to the process of recovering source signals from a mixed signal (i.e., an observation signal) when neither the source signals nor the signal mixing system (or transmission channel) is known.
Independent vector analysis (IVA) is a commonly used blind source separation method: a received observation signal is decomposed into several independent components according to the principle of statistical independence, and these independent components serve as approximate estimates of the source signals. However, existing IVA-based blind source separation methods consider the mixing matrix to be formed by room transfer functions, so the separation performance is affected by the room reverberation conditions.
Disclosure of Invention
The embodiments of the present application provide a method and apparatus for signal processing in which a de-mixing matrix is obtained from a mixing matrix containing correlation transfer functions between microphones, so that the influence of reverberation on signal separation can be reduced and the signal separation performance improved.
In a first aspect, a method of signal processing is provided, including:
acquiring an observation signal, wherein the observation signal comprises original sound source signals of at least two sources acquired by at least two microphones;
determining a first mixing matrix H and a reverberated speech signal \tilde{s} from the observation signal, wherein the first mixing matrix H comprises first correlation transfer functions between the at least two microphones and is used to represent the mapping relationship between the observation signal and the reverberated speech signal \tilde{s};
inputting the first mixing matrix H and the reverberated speech signal \tilde{s} into a signal processing model to obtain a de-mixing matrix W of the observation signal, wherein the signal processing model is used to represent the mapping relationship between the first mixing matrix H, the reverberated speech signal \tilde{s} and the de-mixing matrix W;
and acquiring a separation signal according to the de-mixing matrix W and the observation signal.
In a second aspect, there is provided an apparatus for signal processing, comprising:
an acquisition unit for acquiring an observation signal, wherein the observation signal comprises original sound source signals of at least two sources acquired by at least two microphones;
a processing unit for determining a first mixing matrix H and a reverberated speech signal \tilde{s} from the observation signal, wherein the first mixing matrix H comprises first correlation transfer functions between the at least two microphones and is used to represent the mapping relationship between the observation signal and the reverberated speech signal \tilde{s};
the processing unit is further configured to input the first mixing matrix H and the reverberated speech signal \tilde{s} into a signal processing model to obtain a de-mixing matrix W of the observation signal, wherein the signal processing model is used to represent the mapping relationship between the first mixing matrix H, the reverberated speech signal \tilde{s} and the de-mixing matrix W;
the processing unit is further configured to acquire a separation signal according to the de-mixing matrix W and the observation signal.
In a third aspect, an electronic device is provided, comprising a processor and a memory, the memory being configured to store a computer program and the processor being configured to execute the computer program to implement the method of the first aspect.
In a fourth aspect, a chip is provided, comprising a processor configured to call and run a computer program from a memory, so that a device on which the chip is installed performs the method of the first aspect.
In a fifth aspect, there is provided a computer-readable storage medium comprising computer instructions which, when executed by a computer, cause the computer to perform the method of the first aspect.
In a sixth aspect, there is provided a computer program product comprising computer program instructions which, when run on a computer, cause the computer to perform the method of the first aspect.
In the embodiments of the present application, a first mixing matrix containing correlation transfer functions between microphones and a reverberated speech signal are obtained from an observation signal; a de-mixing matrix of the observation signal is then obtained from the first mixing matrix and the reverberated speech signal; and finally a separation signal is obtained from the observation signal according to the de-mixing matrix. Since the first mixing matrix contains the correlation transfer functions between the microphones instead of room transfer functions, and the correlation transfer functions between the microphones do not contain reverberation, obtaining the de-mixing matrix from the first mixing matrix can reduce the influence of reverberation on signal separation, thereby improving the signal separation performance.
Drawings
FIG. 1 is a schematic diagram of an application scenario suitable for use in embodiments of the present application;
FIG. 2 is a schematic diagram of a speech recognition system suitable for use with embodiments of the present application;
FIG. 3 is a schematic flow chart of a method of signal processing provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart of another method of signal processing provided by an embodiment of the present application;
FIG. 5 is a schematic flow chart of another method of signal processing provided by an embodiment of the present application;
FIG. 6 is a schematic flow chart of another method of signal processing provided by an embodiment of the present application;
FIG. 7 is a schematic flow chart of another method of signal processing provided by an embodiment of the present application;
FIG. 8 is a schematic flow chart of another method of signal processing provided by an embodiment of the present application;
FIG. 9 is a schematic diagram comparing the effect of the method of signal processing provided by the embodiments of the present application with that of prior art sound source separation schemes;
FIG. 10 is a schematic block diagram of an apparatus for signal processing according to an embodiment of the present application;
FIG. 11 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be understood that in the embodiments of the present application, "B corresponding to A" means that B is associated with A. In one implementation, B may be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
In the description of the present application, unless otherwise specified, "at least one" means one or more and "a plurality" means two or more. In addition, "and/or" describes an association relationship between associated objects and covers three relationships; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b and c, where each of a, b and c may itself be single or multiple.
It should be further understood that the descriptions of the first, second, etc. appearing in the embodiments of the present application are only for illustrating and differentiating the objects, and do not represent a particular limitation to the number of devices in the embodiments of the present application, and do not constitute any limitation to the embodiments of the present application.
It should also be appreciated that a particular feature, structure, or characteristic described in connection with an embodiment is included in at least one embodiment of the application. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, system, article, or apparatus.
The embodiment of the application provides a signal processing scheme, which can enhance a front-end voice signal, for example, enhance an expected signal, suppress an interference signal, and the like, and can be applied to various fields, for example, smart homes, video conferences, intelligent traffic, driving assistance, and the like, without limitation.
Some brief descriptions of application scenarios to which the technical solutions of the embodiments of the present application can be applied are given below. It should be noted that the following application scenarios are only used to illustrate the embodiments of the present application and do not constitute a limitation. In specific implementation, the technical solutions provided by the embodiments of the present application can be applied flexibly according to actual needs.
Fig. 1 is a schematic diagram of an application scenario suitable for use in an embodiment of the present application. As shown in fig. 1, the application scenario may include a user terminal, which may be, for example, a mobile phone, a smart voice interaction device (e.g., wearable devices such as smart watches and smart glasses), an in-vehicle terminal, or a smart appliance (e.g., a smart speaker, a coffee maker, a printer, etc.). Optionally, the application scenario may further include a computing device, which may be, for example, a cloud server, an intelligent portable device, or a home computing hub; this is not limited in the present application. Illustratively, the intelligent portable device may be a smart phone, a computer, or the like, and the home computing hub may be a smart phone, a computer, a smart television, a router, or the like, without limitation. For example, the user terminal and the computing device may be connected through a wireless network or through a Bluetooth pairing connection, which is not limited in the embodiments of the present application.
It should be noted that the user terminal in fig. 1 is only an example, and the user terminal to which the present application is applied is not limited thereto, and for example, the user terminal may also be an electronic device in an internet of things (IoT) system. In addition, the computing device in fig. 1 is only an example, and the computing device to which the present application is applied is not limited thereto, and may be, for example, a mobile internet device or the like. It should be further noted that the plurality of electronic devices shown in the embodiments of the present application are for better and more comprehensive description of the embodiments of the present application, but should not cause any limitation to the embodiments of the present application.
For a specific example, when the system architecture shown in fig. 1 is applied to a home use scenario, the computing device may be, for example, a home computing hub, such as a mobile phone, a television or a router, or a cloud device, such as a cloud server; the embodiments of the present application are not limited thereto.
For another specific example, when the system architecture shown in fig. 1 is applied to a personal wearing scenario, the user terminal is, for example, a personal wearing device, such as a smart band, a smart watch, a smart headset, smart glasses, and the like, and the computing device may be a personal device, such as a mobile phone, and the like, which is not limited in this embodiment of the present application.
In some embodiments, the signal processing method provided by the embodiments of the present application may be implemented by a user terminal. For example, after acquiring the observation signal, the user terminal may obtain a unmixing matrix according to the signal processing method provided in the embodiment of the present application, and obtain the separation signal according to the unmixing matrix.
In other embodiments, the signal processing method provided by the embodiments of the present application may be implemented by a user terminal and a computing device in cooperation. For example, after acquiring an observation signal, the user terminal may send the observation signal to the computing device, the computing device obtains a de-mixing matrix according to the signal processing method provided in the embodiment of the present application, and sends the de-mixing matrix to the user terminal, and the user terminal obtains a separation signal according to the de-mixing matrix. For another example, the computing device may obtain a de-mixing matrix according to the signal processing method provided in the embodiment of the present application, obtain a separation signal according to the de-mixing matrix, and send the separation signal to the user terminal.
Fig. 2 is a schematic diagram of a speech recognition system suitable for use with embodiments of the present application. As shown in fig. 2, a front-end signal processing module 201 may be disposed before the speech recognition system 202. The target speech and the interfering speech may be received by one or more microphones; the observation signal output by the microphones is input to the front-end signal processing module 201, where an enhanced, clean target speech signal (i.e., a separation signal) may be obtained after echo cancellation, dereverberation, sound source separation (also referred to as blind source separation), post-processing, and the like; the target speech signal may then be input to the speech recognition system 202 for speech recognition. The signal processing scheme provided by the embodiments of the present application can be applied in the sound source separation module, obtaining the de-mixing matrix and performing signal separation on the observation signal to obtain the target speech signal.
For example, the front-end signal processing module 201 in fig. 2 may be on the user terminal in fig. 1, or may be on the computing device in fig. 1, which is not limited in this application.
In the following, related terms related to the embodiments of the present application are described.
1) Mixing matrix: characterizes the mapping relationship between the observation signal and the original sound source signals (e.g., a frequency-domain linear combination relationship in the complex domain). The mixing matrix may be a matrix of room transfer functions (RTFs) from the individual sound sources to the individual microphones.
2) De-mixing matrix: the inverse of the mixing matrix, i.e., the target matrix to be solved; it characterizes the mapping relationship between the target speech signal and the observation signal (e.g., a frequency-domain linear combination relationship in the complex domain). The de-mixing matrix may also be referred to as a separation matrix or unmixing matrix; the meanings are the same.
3) Room transfer function: a function that characterizes the frequency-domain propagation characteristics of sound from a sound source to a microphone.
4) Correlation transfer function between microphones: a function that characterizes the frequency-domain propagation characteristics of sound from one microphone to another microphone.
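To make the distinction between terms 3) and 4) concrete: with a single source, the correlation transfer function between two microphones is the ratio of their room transfer functions, so it is independent of the source signal itself. The following toy example (all values synthetic) is a minimal sketch of this point:

```python
import numpy as np

a1, a2 = 0.8 + 0.3j, 0.5 - 0.2j   # room transfer functions: source -> mic 1, source -> mic 2
s = 1.2 + 0.7j                    # one source coefficient at some (f, t)
x1, x2 = a1 * s, a2 * s           # the two microphone observations
h12 = x2 / x1                     # correlation transfer function from mic 1 to mic 2
assert np.isclose(h12, a2 / a1)   # depends only on the transfer-function ratio, not on s
```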
Currently, IVA-based blind source separation establishes a source signal model according to the mixing matrix to obtain an objective function, iteratively optimizes that objective function, and solves for the separation matrix until the model converges, yielding the estimated source signals. In this scheme the mixing matrix is considered to be formed by room transfer functions, so the separation performance for speech signals is affected by the room reverberation; dereverberation preprocessing therefore needs to be performed in advance, which increases the complexity of the sound source separation algorithm. Second, this scheme has difficulty estimating the variance of the source signals and requires pre-whitening of the observation signal, making it difficult to implement in real time in a product. Finally, the scheme uses the natural gradient method for parameter optimization, whose separation performance is limited by the step-size parameter; although many adaptive variable-step-size techniques have been proposed, the gradient descent algorithm still has a large computational load.
In view of the above problems, the embodiments of the present application provide a method of signal processing that transforms the mixing matrix into a mixing matrix containing the correlation transfer functions between the microphones instead of room transfer functions. Since the correlation transfer functions between the microphones do not contain reverberation, obtaining the de-mixing matrix from this mixing matrix can mitigate the influence of reverberation on signal separation, thereby improving the signal separation performance.
Furthermore, a first parameter can be constructed from the mixing matrix, the reverberated speech signal and the de-mixing matrix, and the de-mixing matrix is determined according to the mapping relationship between the first parameter and the de-mixing matrix. Estimation of a speech signal model can thus be avoided during signal separation, no pre-whitening of the observation signal is needed, and the natural gradient method is not used for parameter optimization, so the separation process is not restricted by a step-size parameter and the computational load can be effectively reduced.
The technical solutions provided by the embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 3 shows a schematic flow chart of a method 300 of signal processing provided by an embodiment of the present application. The method 300 may be used for blind source separation, for example, may be applied to the application scenario shown in fig. 1, or may be applied to the speech recognition system shown in fig. 2, without limitation. As shown in fig. 3, method 300 includes steps 310 through 340.
310, acquiring an observation signal, wherein the observation signal comprises original sound source signals of at least two sources acquired by at least two microphones.
Illustratively, the user terminal may acquire the observation signal via one or more microphones. The observation signal may comprise speech signals from a plurality of sound sources, which may include the target speech signal, i.e., the speech signal from the desired sound source. The observation signal may also include interfering speech signals, i.e., speech signals from undesired sound sources. In addition, the transmission channel or mixing system information of the observation signal is unknown.
In some embodiments, a short-time Fourier transform (STFT) may be performed on the observation signal, resulting in the following equation (1):
x(f,t) = A_f s(f,t)    (1)
where x(f,t) denotes the observation signal at frequency bin f and time t, A_f denotes the mixing matrix at frequency bin f (i.e., an example of the second mixing matrix A), and s(f,t) denotes the original sound source signals of at least two sources at frequency bin f and time t, f being the frequency of the signal and t the time of the signal.
In the following description, the scheme provided by the embodiments of the present application is described taking a dual-microphone, dual-sound-source scene as an example. It will be appreciated that the process can be extended to the case of multiple microphones and multiple sound sources; reference may be made to the description of the dual-microphone, dual-sound-source process, with some simple adaptations where necessary, which are within the scope of the embodiments of the present application.
For example, in a dual-microphone, dual-sound-source scenario, the observation signal x(f,t) may be expressed as:
x(f,t) = [x_1(f,t), x_2(f,t)]^T
and the original sound source signal s(f,t) may be expressed as:
s(f,t) = [s_1(f,t), s_2(f,t)]^T
The mixing matrix A_f generally consists of room transfer functions and may be expressed as:
A_f = [ a_11^f  a_12^f ; a_21^f  a_22^f ]    (2)
where, according to equation (2), A_f includes 4 parameters: a_11^f, a_12^f, a_21^f and a_22^f.
The method 300 of signal processing provided by the embodiments of the present application needs to estimate a de-mixing matrix W_f satisfying:
y(f,t) = W_f x(f,t)    (3)
where y(f,t) denotes the estimated separation signal, which may also be called the target speech signal and should coincide with s(f,t) as closely as possible. In the dual-microphone, dual-sound-source scenario, y(f,t) = [y_1(f,t), y_2(f,t)]^T.
320, determining a first mixing matrix H and the reverberated speech signal \tilde{s} from the observation signal, wherein the first mixing matrix H comprises first correlation transfer functions between the at least two microphones, and the first mixing matrix H is used to represent the mapping relationship between the observation signal and the reverberated speech signal \tilde{s}.
Illustratively, in step 320, the above equation (1) may be transformed to obtain:
x(f,t) = H_f \tilde{s}(f,t)    (4)
where H_f denotes the mixing matrix at frequency bin f, which includes the correlation transfer functions between the microphones, and \tilde{s}(f,t) denotes the reverberated speech signal at frequency bin f and time t.
In some optional embodiments, referring to fig. 4, the reverberated speech signal \tilde{s} may be determined according to the following steps 321 and 322.
321, determining a mapping relationship between the second mixing matrix A and the first mixing matrix H.
322, determining the reverberated speech signal \tilde{s} according to the mapping relationship and the original sound source signals of the at least two sources.
Illustratively, taking the dual-microphone, dual-sound-source scene as an example, the mixing matrix A_f in equation (1) may be transformed as follows:
A_f = [ 1  h_2^f ; h_1^f  1 ] [ a_11^f  0 ; 0  a_22^f ]    (5)
where h_1^f = a_21^f / a_11^f and h_2^f = a_12^f / a_22^f are the correlation transfer functions between the microphones, and together they form a new mixing matrix H_f = [ 1  h_2^f ; h_1^f  1 ], i.e., an example of the first mixing matrix H. H_f includes 2 parameters: h_1^f and h_2^f.
Further, substituting equation (5) into equation (4) yields:
x(f,t) = H_f \tilde{s}(f,t),  \tilde{s}(f,t) = [ a_11^f s_1(f,t), a_22^f s_2(f,t) ]^T    (6)
As can be seen from equation (6), the speech signal to be recovered changes from the original sound source signal s(f,t) to the reverberated speech signal \tilde{s}(f,t), and the mixing matrix A_f formed by room transfer functions changes into the mixing matrix H_f formed by the correlation transfer functions between the microphones. The reverberation contained in the room transfer functions is transferred into the reverberated speech signal \tilde{s}(f,t), so that the correlation transfer functions between the microphones do not contain reverberation.
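A numerical sketch of the factorization in equations (5)-(6), using a synthetic A_f; the final check confirms that the observation is unchanged when the room-transfer mixing matrix is rewritten as the correlation-transfer-function matrix H_f acting on the reverberated sources:

```python
import numpy as np

rng = np.random.default_rng(1)
A_f = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))   # room transfer functions
h1 = A_f[1, 0] / A_f[0, 0]            # h_1^f = a_21^f / a_11^f
h2 = A_f[0, 1] / A_f[1, 1]            # h_2^f = a_12^f / a_22^f
H_f = np.array([[1, h2], [h1, 1]])    # first mixing matrix, free of reverberation

s = rng.standard_normal((2, 50)) + 1j * rng.standard_normal((2, 50))   # original sources
s_rev = np.diag([A_f[0, 0], A_f[1, 1]]) @ s    # reverberated speech signal ~s(f,t)
assert np.allclose(A_f @ s, H_f @ s_rev)       # equation (6): x = H_f ~s
```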
330, inputting the first mixing matrix H and the reverberated speech signal \tilde{s} into a signal processing model to obtain a de-mixing matrix W of the observation signal, wherein the signal processing model is used to represent the mapping relationship between the first mixing matrix H, the reverberated speech signal \tilde{s} and the de-mixing matrix W.
That is, the signal processing model may obtain the de-mixing matrix W of the observation signal from the input first mixing matrix H and reverberated speech signal \tilde{s}, based on the mapping relationship between the first mixing matrix H, the reverberated speech signal \tilde{s} and the de-mixing matrix W.
In some optional embodiments, referring to fig. 5, the de-mixing matrix W of the observation signal may be determined according to steps 331 and 332.
331, determining a first parameter according to the first mixing matrix H, the reverberated speech signal \tilde{s} and the de-mixing matrix W.
332, obtaining the de-mixing matrix W according to the mapping relationship between the first parameter and the de-mixing matrix W.
In some embodiments, the first parameter may be defined as follows. As a possible implementation, referring to fig. 6, the first parameter may be determined according to the following steps 333 and 334.
333, determining a second parameter according to the first mixing matrix H and the reverberated speech signal \tilde{s}.
334, determining the first parameter according to the second parameter and the de-mixing matrix W.
Illustratively, the second parameter may be expressed as V_k^f. In the embodiments of the present application, the first parameter may be defined as P_k^f and the second parameter may be defined as V_k^f = E[ x(f,t) x(f,t)^H / r_k(t) ], where E[·] denotes the expectation over the data, r_k(t) is the weight associated with the k-th source, and different values of k correspond to different sound sources.
For the dual-microphone, dual-sound-source scenario, because \tilde{s}_1(f,t) and \tilde{s}_2(f,t) are independent of each other, substituting the above equation (6) into the second parameter V_k^f yields:
V_k^f = H_f E[ \tilde{s}(f,t) \tilde{s}(f,t)^H / r_k(t) ] H_f^H    (7)
where the middle factor is a diagonal matrix by the independence of the two sources.
In addition, the de-mixing matrix W_f and the mixing matrix H_f are mutually inverse matrices, satisfying:
W_f H_f = I    (8)
where I is the identity matrix; that is, denoting the i-th row of W_f by w_i^f and the j-th column of H_f by h_j^f, w_i^f h_j^f equals 1 for i = j and 0 for i ≠ j.
In the embodiments of the present application, it may be considered that w_k^f(t-1) satisfies equation (8), where for the dual-sound-source scene k takes the value 1 or 2, corresponding to the different sound sources, and (t-1) denotes the moment preceding time t.
Multiplying each term on both sides of equation (7), taken with k = 1, by w_2^f(t-1) on the left and by (w_1^f(t))^H on the right yields:
w_2^f(t-1) V_1^f (w_1^f(t))^H = w_2^f(t-1) H_f E[ \tilde{s}(f,t) \tilde{s}(f,t)^H / r_1(t) ] H_f^H (w_1^f(t))^H    (9)
Since w_2^f(t-1) satisfies equation (8), i.e., w_2^f(t-1) H_f = [0, 1], equation (9) can become:
w_2^f(t-1) V_1^f (w_1^f(t))^H = 0    (10)
where the expression w_2^f(t-1) V_1^f in equation (10) is the first parameter P_1^f.
Similarly, multiplying each term on both sides of equation (7), taken with k = 2, by w_1^f(t-1) on the left and by (w_2^f(t))^H on the right yields:
w_1^f(t-1) V_2^f (w_2^f(t))^H = 0    (11)
where the expression w_1^f(t-1) V_2^f in equation (11) is the first parameter P_2^f.
As a specific implementation, the mapping relationship between the first parameters (e.g., P_1^f and P_2^f) and the de-mixing matrix W may be determined from the null space of the first parameters.
That is, it is possible to let:
P_1^f = w_2^f(t-1) V_1^f,  P_2^f = w_1^f(t-1) V_2^f    (12)
From equation (12), w_1^f(t) and w_2^f(t) can be obtained as null-space vectors of P_1^f and P_2^f, which can be expressed, for example, as shown in the following equation:
w_1^f(t) = [ (P_1^f(2))^*, -(P_1^f(1))^* ] / ||P_1^f||,  w_2^f(t) = [ (P_2^f(2))^*, -(P_2^f(1))^* ] / ||P_2^f||    (13)
where P_k^f(i) denotes the i-th element of the row vector P_k^f and (·)^* denotes the complex conjugate.
In some optional embodiments, the modulus of the de-mixing matrix W may also be determined according to the minimum distortion principle. Illustratively, the modulus of the de-mixing matrix W may be determined according to the following equation (14):
W_f(t) = diag(diag((W_f(t))^{-1})) W_f(t)    (14)
in summary, as a possible implementation manner of step 330, firstly, the reverberated speech signal may be obtained according to the first mixing matrix H
Figure BDA0003375575240000112
Determining a second parameter
Figure BDA0003375575240000113
Then according to the second parameter
Figure BDA0003375575240000114
And a de-mixing matrix W for determining the first parameter
Figure BDA0003375575240000115
Finally, according to the first parameter and
Figure BDA0003375575240000116
the mapping relation with the unmixing matrix W, such as equations (13) and (14), yields the unmixing matrix W.
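To make this reading concrete, the following is a minimal per-bin sketch of steps 331-332 and equations (7)-(14) as reconstructed above. It is a sketch under stated assumptions, not the patent's implementation: the weight r_k(t) is taken here as |y_k(f,t)| from the previous separation, and update_unmixing is a hypothetical helper name:

```python
import numpy as np

def update_unmixing(X, W_prev, eps=1e-8):
    """X: (2, T) observation at one frequency bin; W_prev: (2, 2) previous W_f."""
    Y = W_prev @ X                                   # separation with the previous estimate
    r = np.maximum(np.abs(Y), eps)                   # assumed per-source weight r_k(t)
    W = np.zeros((2, 2), dtype=complex)
    for k in range(2):
        V_k = (X / r[k]) @ X.conj().T / X.shape[1]   # second parameter: weighted covariance
        P = W_prev[1 - k] @ V_k                      # first parameter, a length-2 row vector
        w = np.conj(np.array([P[1], -P[0]]))         # null-space vector: P @ w.conj() == 0
        W[k] = w / np.linalg.norm(P)                 # denominator vanishes for a weak source
    return np.diag(np.diag(np.linalg.inv(W))) @ W    # minimum distortion principle, eq. (14)
```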
340, acquiring a separation signal according to the de-mixing matrix W and the observation signal.
Illustratively, the observation signal x(f,t) and the de-mixing matrix W_f may be substituted into the above equation (3) to obtain the separation signal y(f,t), i.e., the target speech signal.
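Concretely, applying the per-bin de-mixing matrices and returning to the time domain can look as follows; a minimal sketch assuming the STFT layout and parameters of the earlier snippet, with separate as a hypothetical helper name:

```python
import numpy as np
from scipy.signal import istft

def separate(X, W):
    """X: (2, n_freqs, n_frames) STFT observation; W: (n_freqs, 2, 2) de-mixing matrices."""
    Y = np.einsum('fkm,mft->kft', W, X)   # y(f,t) = W_f x(f,t) at every bin, equation (3)
    _, y = istft(Y, fs=16000, nperseg=512, noverlap=256)   # same parameters as the STFT
    return y                              # (2, n_samples) separated time-domain signals
```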
Therefore, in this method, a first mixing matrix containing the correlation transfer functions between the microphones and a reverberated speech signal are obtained from the observation signal; a de-mixing matrix of the observation signal is then obtained from the first mixing matrix and the reverberated speech signal; and finally a separation signal is obtained according to the de-mixing matrix. Since the first mixing matrix contains the correlation transfer functions between the microphones instead of room transfer functions, and the correlation transfer functions between the microphones do not contain reverberation, obtaining the de-mixing matrix from the first mixing matrix can reduce the influence of reverberation on signal separation, thereby improving the signal separation performance.
Furthermore, the first parameter can be constructed from the first mixing matrix, the reverberated speech signal and the de-mixing matrix, and the de-mixing matrix is determined according to the mapping relationship between the first parameter and the de-mixing matrix. Estimation of a speech signal model can thus be avoided during signal separation, no pre-whitening of the observation signal is needed, and the natural gradient method is not used for parameter optimization, so the separation process is not restricted by a step-size parameter, the computational load can be effectively reduced, and the signal separation efficiency improved.
In some optional embodiments, for example when the energy of one of the original sound source signals in the observation signal is weak, the denominator in the mapping relationship between the first parameter P_k^f and the de-mixing matrix W (e.g., equation (13)) may become 0, which can make the above signal processing procedure unstable, for example causing the algorithm to crash.
In order to ensure the stability of the signal processing procedure and improve the separation performance of the method 300, an auxiliary virtual sound source (AuxIS) may be introduced to enhance the observation signal, so as to obtain a first mixing matrix H and a reverberated speech signal \tilde{s} of the enhanced observation signal. For example, the auxiliary virtual sound source may enhance a weaker sound source signal among the original sound source signals, preventing the energy of one original sound source signal from being so weak that the denominator in the mapping relationship between the first parameter P_k^f and the de-mixing matrix W (e.g., equation (13)) becomes 0. This helps to improve the stability of the signal processing procedure and the signal separation performance.
Illustratively, referring to fig. 7, in the method 300, the first mixing matrix H and the reverberated speech signal \tilde{s} of the enhanced observation signal may be obtained through the following steps 350 to 370.
350, determining the energy of the signal of the auxiliary virtual sound source according to the observation signal.
As a possible implementation, referring to fig. 8, the energy of the signal of the auxiliary virtual sound source may be determined through the following steps 351 and 352.
351, determining the amplitude spectrum of the signal of the auxiliary virtual sound source according to the observation signal.
352, determining the energy of the signal of the auxiliary virtual sound source according to the energy ratio of the observation signal to the signal of the auxiliary virtual sound source.
That is, the signal of the auxiliary virtual sound source can be decomposed into two parts, namely the amplitude spectrum of the signal of the auxiliary virtual sound source and the energy ratio of the observation signal to the signal of the auxiliary virtual sound source; see equation (15):
λ(f,t) = \bar{A}(f,t)^2 · 10^{-λ_dB/10}    (15)
where λ_dB is the energy ratio of the observation signal to the signal of the auxiliary virtual sound source, which may be given in advance, and \bar{A}(f,t) is the amplitude spectrum of the auxiliary virtual sound source.
By way of example, \bar{A}(f,t) may be defined as follows:
\bar{A}(f,t) = ( |x_1(f,t)| + |x_2(f,t)| ) / 2    (16)
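A short sketch of steps 351-352 under the reconstruction of equations (15)-(16) above; taking the amplitude spectrum as the mean magnitude across microphones and the default value of lambda_dB are assumptions for illustration:

```python
import numpy as np

def aux_source_energy(X, lambda_dB=10.0):
    """X: (n_mics, n_freqs, n_frames) STFT observation signal."""
    amp = np.mean(np.abs(X), axis=0)                 # assumed amplitude spectrum, eq. (16)
    return (amp ** 2) * 10.0 ** (-lambda_dB / 10.0)  # energy lambda(f, t), eq. (15)
```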
360, acquiring a second correlation transfer function \bar{h}_k^f corresponding to the auxiliary virtual sound source.
For example, an auxiliary virtual sound source may be introduced to enhance the k-th sound source (e.g., the weakest one among the original sound source signals); the second correlation transfer function corresponding to the auxiliary virtual sound source may then be expressed as \bar{h}_k^f, where k is a positive integer and different values of k correspond to different sound sources.
Optionally, \bar{h}_k^f is an estimated correlation transfer function, and the estimation method may change with the usage scenario. In some embodiments, the correlation transfer function \bar{h}_k^f may be estimated by averaging multi-point measurements made in advance (i.e., by actual measurement); this suits scenes where the speaker position is relatively fixed, such as inside a car. In other embodiments, the correlation transfer function \bar{h}_k^f may be estimated using an adaptive correlation transfer function estimation algorithm (e.g., a far-field approximation estimation algorithm); this suits scenes where the speaker position is unknown, such as a conference room.
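For the far-field approximation just mentioned, one common choice is a plane-wave steering-vector estimate of the correlation transfer function between two microphones; the sketch below assumes this choice, and the spacing d, angle theta and sound speed c are illustrative values, not parameters from the patent:

```python
import numpy as np

def far_field_ctf(freqs, d=0.1, theta_deg=45.0, c=343.0):
    """Plane-wave estimate of the transfer function of mic 2 relative to mic 1."""
    tau = d * np.cos(np.deg2rad(theta_deg)) / c   # inter-microphone delay in seconds
    return np.exp(-2j * np.pi * freqs * tau)      # one complex gain per frequency in freqs
```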
370, obtaining the first mixing matrix H and the reverberated speech signal \tilde{s} according to the original sound source signals of the at least two sources, the energy of the signal of the auxiliary virtual sound source, and the second correlation transfer function \bar{h}_k^f, wherein the first mixing matrix H comprises the second correlation transfer function \bar{h}_k^f, and the reverberated speech signal \tilde{s} includes the energy of the signal of the auxiliary virtual sound source.
Illustratively, after obtaining the above equation (6), it may be further expanded to obtain:
x(f,t) = [ 1, h_1^f ]^T \tilde{s}_1(f,t) + [ h_2^f, 1 ]^T \tilde{s}_2(f,t)    (17)
When an auxiliary virtual sound source is introduced to enhance the k-th sound source, the enhanced observation signal may be denoted x^k(f,t), which can be expressed as the following equation:
x^k(f,t) = H_f^k \tilde{s}^k(f,t)    (18)
That is, the first mixing matrix H may be updated to H_f^k, in which the correlation transfer function of the k-th sound source is replaced by the second correlation transfer function \bar{h}_k^f, and the reverberated speech signal \tilde{s} may be updated to \tilde{s}^k(f,t), in which the energy λ(f,t) of the signal of the auxiliary virtual sound source is added to the k-th component.
Illustratively, for the dual-sound-source scene, k takes the values 1 and 2, corresponding to the two different sound sources. When a virtual sound source is introduced to enhance the 1st sound source, the enhanced observation signal x^1(f,t) is as follows:
x^1(f,t) = [ 1  h_2^f ; \bar{h}_1^f  1 ] [ \tilde{s}_1(f,t) + √λ(f,t), \tilde{s}_2(f,t) ]^T
When a virtual sound source is introduced to enhance the 2nd sound source, the enhanced observation signal x^2(f,t) is as follows:
x^2(f,t) = [ 1  \bar{h}_2^f ; h_1^f  1 ] [ \tilde{s}_1(f,t), \tilde{s}_2(f,t) + √λ(f,t) ]^T
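A sketch of the enhancement as reconstructed above: the estimated transfer function \bar{h}_k replaces the corresponding off-diagonal entry of H_f, and the virtual-source amplitude sqrt(lambda(f,t)) is added to the k-th reverberated source. Indices are 0-based and enhance is a hypothetical helper name:

```python
import numpy as np

def enhance(H_f, s_rev, lam, h_bar, k):
    """H_f: (2, 2); s_rev: (2, T) reverberated sources at one bin;
    lam: (T,) virtual-source energy; h_bar: estimated transfer function for source k."""
    H_k = H_f.astype(complex).copy()
    H_k[1 - k, k] = h_bar                  # swap in the estimated transfer function
    s_k = s_rev.astype(complex).copy()
    s_k[k] = s_k[k] + np.sqrt(lam)         # add the virtual source to the k-th source
    return H_k, H_k @ s_k                  # updated H_f^k and enhanced observation x^k(f,t)
```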
after the observation signal is enhanced by the auxiliary virtual sound source, the enhanced first mixing matrix H and the reverberated speech signal may be combined
Figure BDA00033755752400001314
And inputting the signal processing model to obtain an enhanced unmixing matrix W of the observation signal. Accordingly, the enhanced first mixing matrix H and the reverberated speech signal may now be used
Figure BDA00033755752400001315
And determining the second parameter and the first parameter, and further obtaining a de-mixing matrix W according to the first parameter and the de-mixing matrix W.
Illustratively, based on the enhanced first mixing matrix H and the reverberated speech signal
Figure BDA00033755752400001316
The determined second parameter may be recorded as
Figure BDA00033755752400001317
The first parameter may be recorded as
Figure BDA00033755752400001318
Substitution of equation (18)
Figure BDA00033755752400001319
It is possible to obtain:
Figure BDA0003375575240000141
Figure BDA0003375575240000142
Illustratively, for the dual-microphone, dual-sound-source scene, after the first parameters \bar{P}_1^f and \bar{P}_2^f are determined, they may be substituted into the above equations (13) and (14) to obtain the de-mixing matrix W. A separation signal can then be obtained from the de-mixing matrix W and the enhanced observation signal.
As a specific example, after \bar{P}_1^f and \bar{P}_2^f are obtained, they may be substituted into equations (13) and (14) to obtain w_1^f(t), w_2^f(t) and the modulus of the de-mixing matrix W. Then, the target speech signal of the 1st sound source may be obtained from y_1(f,t) = w_1^f(t) x^1(f,t), and the target speech signal of the 2nd sound source may be obtained from y_2(f,t) = w_2^f(t) x^2(f,t).
That is, when the auxiliary virtual sound source is introduced, the energy λ(f,t) of the auxiliary virtual sound source may first be obtained from the observation signal, and the correlation transfer function \bar{h}_k^f between the microphones corresponding to the auxiliary virtual sound source may be estimated. The enhanced observation signal x^k(f,t) can then be determined from the energy λ(f,t) and the correlation transfer function \bar{h}_k^f, and the second parameter \bar{V}_k^f and the first parameter \bar{P}_k^f are further determined from the enhanced observation signal x^k(f,t). Finally, the de-mixing matrix W is obtained from the mapping relationship between the first parameter \bar{P}_k^f and the de-mixing matrix W, e.g., equations (13) and (14), and the separation signal y(f,t), i.e., the target speech signal, is obtained.
Therefore, in the embodiments of the present application, the auxiliary virtual sound source is introduced to enhance the observation signal, so as to obtain the mixing matrix corresponding to the enhanced observation signal and the corresponding reverberated speech signal, where that mixing matrix includes the second correlation transfer function corresponding to the auxiliary virtual sound source and the reverberated speech signal includes the energy of the signal of the auxiliary virtual sound source. The de-mixing matrix W to be solved can be regarded as a special beamforming matrix (that is, a matrix designed not from direction information but from sound source independence). The added auxiliary virtual sound source enhances the original speech signal in the observation signal and can increase the accuracy of the de-mixing matrix W, thereby ensuring the stability of the signal processing procedure and improving the signal separation performance.
Fig. 9 is a schematic diagram comparing the effect of the method of signal processing provided by the embodiments of the present application with that of prior art sound source separation schemes. Graph (a) compares the signal-to-interference ratio (SIR) improvement of the separation signal obtained by each scheme, and graph (b) compares the signal-to-distortion ratio (SDR) improvement of the separation signal obtained by each scheme; the X axes of graphs (a) and (b) represent the reverberation time.
For example, mixed speech signals may be collected in a dual-microphone, dual-sound-source mixing scenario. As a specific example, two microphones may be used to collect the speech signals of two people speaking simultaneously in a room 4.45 m long, 3.55 m wide and 2.5 m high. The two speakers may each be located 1 m from the microphones, at angles of 45° and 135° with respect to the microphone axis, and the distance between the two microphones may be 0.1 m. The reverberation time is adjusted from 150 ms to 300 ms in steps of 10 ms.
The speech signals received by the two microphones may be processed respectively by: (1) the conventional AuxIVA technique; (2) the reference algorithm, geometrically constrained auxiliary-function IVA (GCAV-IVA) with vector-wise coordinate descent (VCD); (3) the AuxIS-AuxIVA provided by the embodiments of the present application with the correlation transfer function estimated by a steering vector; and (4) the AuxIS-AuxIVA provided by the embodiments of the present application with the correlation transfer function estimated from pre-measured values. Here the steering vector is a far-field approximate estimate of the correlation transfer function of the AuxIS, and the pre-measured value is the actually measured correlation transfer function of the AuxIS.
As can be seen from fig. 9, under different reverberation times, the SIR and SDR of the separation signal obtained by the signal processing method provided by the embodiments of the present application are significantly improved compared with the existing methods; the signal processing method provided by the embodiments of the present application can therefore help improve the quality of the front-end signal.
The present application is not limited to the details of the above embodiments. Various simple modifications can be made to the technical solution of the present application within the scope of its technical concept, and these simple modifications all fall within the protection scope of the present application. For example, the specific features described in the foregoing detailed description may be combined in any suitable manner without contradiction; to avoid unnecessary repetition, the various possible combinations are not described separately in this application. Likewise, the various embodiments of the present application may be combined with each other arbitrarily, and such combinations should also be regarded as disclosed by this application as long as they do not depart from the concept of the present application.
It should also be understood that, in the various method embodiments of the present application, the sequence numbers of the above processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of the present application. It is to be understood that these numerical designations are interchangeable where appropriate, so that the described embodiments of the application can be implemented in orders other than those illustrated or described herein.
Method embodiments of the present application are described in detail above in conjunction with fig. 3-9, and apparatus embodiments of the present application are described in detail below in conjunction with fig. 10-11.
Fig. 10 is a schematic block diagram of an apparatus 700 for signal processing according to an embodiment of the present application. As shown in fig. 10, the signal processing apparatus 700 may include an obtaining unit 710 and a processing unit 720.
an obtaining unit 710 for obtaining an observation signal, wherein the observation signal comprises original sound source signals of at least two sources obtained by at least two microphones;
a processing unit 720 for determining a first mixing matrix H and a reverberated speech signal \tilde{s} from the observation signal, wherein the first mixing matrix H comprises first correlation transfer functions between the at least two microphones and is used to represent the mapping relationship between the observation signal and the reverberated speech signal \tilde{s};
the processing unit 720 is further configured to input the first mixing matrix H and the reverberated speech signal \tilde{s} into a signal processing model to obtain a de-mixing matrix W of the observation signal, wherein the signal processing model is used to represent the mapping relationship between the first mixing matrix H, the reverberated speech signal \tilde{s} and the de-mixing matrix W;
the processing unit 720 is further configured to obtain a separation signal according to the de-mixing matrix W and the observation signal.
Optionally, the processing unit 720 is specifically configured to:
determine a first parameter according to the first mixing matrix H, the reverberated speech signal \tilde{s} and the de-mixing matrix W;
and obtain the de-mixing matrix W according to the mapping relationship between the first parameter and the de-mixing matrix W.
Optionally, the processing unit 720 is specifically configured to:
determine a second parameter according to the first mixing matrix H and the reverberated speech signal \tilde{s};
and determine the first parameter according to the second parameter and the de-mixing matrix W.
Optionally, the processing unit 720 is further configured to:
determine the mapping relationship between the first parameter and the de-mixing matrix W according to the null space of the first parameter.
Optionally, the processing unit 720 is further configured to:
determine the modulus of the de-mixing matrix W according to the minimum distortion principle.
Optionally, the processing unit 720 is further configured to determine the energy of the signal of an auxiliary virtual sound source according to the observation signal;
the obtaining unit 710 is further configured to obtain a second correlation transfer function \bar{h}_k^f corresponding to the auxiliary virtual sound source.
The processing unit 720 is specifically configured to:
obtain the first mixing matrix H and the reverberated speech signal \tilde{s} according to the observation signal, the energy of the signal of the auxiliary virtual sound source and the second correlation transfer function \bar{h}_k^f, wherein the first mixing matrix H comprises the second correlation transfer function and the reverberated speech signal \tilde{s} includes the energy of the signal of the auxiliary virtual sound source.
Optionally, the processing unit 720 is specifically configured to:
determine the amplitude spectrum of the signal of the auxiliary virtual sound source according to the observation signal;
and determine the energy of the signal of the auxiliary virtual sound source according to the energy ratio of the observation signal to the signal of the auxiliary virtual sound source.
Optionally, the obtaining unit 710 is specifically configured to determine the second correlation transfer function by averaging multi-point measurements made in advance.
Optionally, the obtaining unit 710 is specifically configured to determine the second correlation transfer function using an adaptive correlation transfer function estimation algorithm.
Optionally, the processing unit 720 is specifically configured to:
determine a mapping relationship between the first mixing matrix H and a second mixing matrix A, wherein the second mixing matrix A is used to represent the mapping relationship between the observation signal and the original sound source signals of the at least two sources;
and determine the reverberated speech signal \tilde{s} according to the mapping relationship and the original sound source signals of the at least two sources.
Optionally, the second mixing matrix A comprises room transfer functions from the sound sources of the observation signal to the microphones.
It is to be understood that the apparatus embodiments and the method embodiments may correspond to one another, and similar descriptions may refer to the method embodiments. To avoid repetition, details are not repeated here. Specifically, the apparatus 700 for signal processing in this embodiment may correspond to the body executing the method 300 of the embodiments of the present application, and the foregoing and other operations and/or functions of the modules in the apparatus 700 are intended to implement the corresponding flows of the methods in fig. 3 to fig. 8; for brevity, they are not described again here.
The apparatus and system of embodiments of the present application are described above in connection with the drawings from the perspective of functional modules. It should be understood that the functional modules may be implemented by hardware, by instructions in software, or by a combination of hardware and software modules. Specifically, the steps of the method embodiments in the present application may be implemented by integrated logic circuits of hardware in a processor and/or instructions in the form of software, and the steps of the method disclosed in conjunction with the embodiments in the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in random access memory, flash memory, read only memory, programmable read only memory, electrically erasable programmable memory, registers, and the like, as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps in the above method embodiments in combination with hardware thereof.
Fig. 11 is a schematic block diagram of an electronic device 800 provided in an embodiment of the present application.
As shown in fig. 11, the electronic device 800 may include:
a memory 810 and a processor 820, the memory 810 being configured to store a computer program and to transfer the program code to the processor 820. In other words, the processor 820 may call and execute the computer program from the memory 810 to implement the method in the embodiments of the present application.
For example, the processor 820 may be configured to perform the steps of the method 300 according to instructions in the computer program.
In some embodiments of the present application, the processor 820 may include, but is not limited to:
general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like.
In some embodiments of the present application, the memory 810 includes, but is not limited to:
volatile memory and/or non-volatile memory. The non-volatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of example, but not limitation, many forms of RAM are available, such as Static random access memory (Static RAM, SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic random access memory (Synchronous DRAM, SDRAM), Double Data Rate Synchronous Dynamic random access memory (DDR SDRAM), Enhanced Synchronous SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
In some embodiments of the present application, the computer program may be partitioned into one or more modules, which are stored in the memory 810 and executed by the processor 820 to perform the signal processing methods provided in the present application. The one or more modules may be a series of computer program instruction segments capable of performing particular functions, the instruction segments describing the execution of the computer program in the electronic device 800.
Optionally, the electronic device 800 may further include:
a transceiver 830, the transceiver 830 being connectable to the processor 820 or the memory 810.
The processor 820 may control the transceiver 830 to communicate with other devices, and specifically, may transmit information or data to the other devices or receive information or data transmitted by the other devices. The transceiver 830 may include a transmitter and a receiver. The transceiver 830 may further include one or more antennas.
It should be understood that the various components in the electronic device 800 are connected by a bus system that includes a power bus, a control bus, and a status signal bus in addition to a data bus.
According to an aspect of the present application, there is provided a communication device comprising a processor and a memory, the memory being configured to store a computer program, and the processor being configured to call and execute the computer program stored in the memory, so that the device performs the method of the above method embodiments.
According to an aspect of the present application, there is provided a computer storage medium having a computer program stored thereon, which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. In other words, the present application also provides a computer program product containing instructions, which when executed by a computer, cause the computer to execute the method of the above method embodiments.
According to another aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method of the above-described method embodiment.
In other words, the above embodiments, when implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only a logical division, and other divisions are possible in practice: a plurality of modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be electrical, mechanical or in another form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. For example, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

1. A method of signal processing, comprising:
acquiring an observation signal, wherein the observation signal comprises original sound source signals of at least two sources acquired by at least two microphones;
determining a first mixing matrix H and a reverberated speech signal from the observation signal, wherein the first mixing matrix H comprises a first correlation transfer function between the at least two microphones, and the first mixing matrix H is used to represent the mapping relationship between the observation signal and the reverberated speech signal;
inputting the first mixing matrix H and the reverberated speech signal into a signal processing model to obtain a de-mixing matrix W of the observation signal, wherein the signal processing model is used to represent the mapping relationship among the first mixing matrix H, the reverberated speech signal and the de-mixing matrix W;
and acquiring a separation signal according to the de-mixing matrix W and the observation signal.
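For illustration only — the claim leaves the estimation of H and the signal processing model abstract — the following is a minimal NumPy sketch of the final separation step, in the STFT domain; all names and array shapes are assumptions, not the claimed implementation.

```python
import numpy as np

def apply_demixing(X, W):
    """Final step of claim 1: obtain the separation signals from the
    de-mixing matrix W and the observation signal.

    X: STFT-domain observations, shape (F, M, T) --
       F frequency bins, M microphones, T time frames.
    W: per-frequency de-mixing matrices, shape (F, M, M).
    Returns Y, shape (F, M, T), with Y[f] = W[f] @ X[f].
    """
    return np.einsum("fmn,fnt->fmt", W, X)
```

In this reading, each frequency bin is de-mixed independently, which matches the per-bin de-mixing matrices commonly used in frequency-domain blind source separation.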
2. The method of claim 1, wherein the inputting the first mixing matrix H and the reverberated speech signal into a signal processing model to obtain a de-mixing matrix W of the observation signal comprises:
determining a first parameter according to the first mixing matrix H, the reverberated speech signal and the de-mixing matrix W;
and obtaining the de-mixing matrix W according to the mapping relationship between the first parameter and the de-mixing matrix W.
3. The method of claim 2, wherein the determining a first parameter according to the first mixing matrix H, the reverberated speech signal and the de-mixing matrix W comprises:
determining a second parameter according to the first mixing matrix H and the reverberated speech signal;
and determining the first parameter according to the second parameter and the de-mixing matrix W.
4. The method of claim 2 or 3, further comprising:
and determining the mapping relationship between the first parameter and the de-mixing matrix W according to the null space of the first parameter.
5. The method according to any one of claims 2-4, further comprising:
and determining the modulus value of the de-mixing matrix W according to the minimum distortion principle.
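Claim 5 resolves the scaling ambiguity of the de-mixing matrix with the minimum distortion principle. One common reading of that principle (Matsuoka's W ← diag(W⁻¹)W rescaling) is sketched below as an assumption; the patent body may formulate it differently.

```python
import numpy as np

def minimum_distortion_rescale(W):
    """Fix the modulus (scale) of each row of the per-frequency de-mixing
    matrices via W <- diag(inv(W)) @ W: inv(W) estimates the mixing matrix,
    and keeping its diagonal scales every separated source back to its
    level as observed at the corresponding microphone.

    W: shape (F, M, M), one de-mixing matrix per frequency bin.
    """
    A_hat = np.linalg.inv(W)                    # (F, M, M) estimated mixing matrices
    d = np.diagonal(A_hat, axis1=-2, axis2=-1)  # (F, M) diagonal entries
    return d[..., :, None] * W                  # row m of W[f] scaled by A_hat[f, m, m]
```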
6. The method of any one of claims 1-5, further comprising:
determining the energy of the signal of the auxiliary virtual sound source according to the observation signal;
acquiring a second correlation transfer function corresponding to the auxiliary virtual sound source;
wherein the determining a first mixing matrix H and a reverberated speech signal from the observation signal comprises:
obtaining the first mixing matrix H and the reverberated speech signal according to the observation signal, the energy of the signal of the auxiliary virtual sound source and the second correlation transfer function, wherein the first mixing matrix H comprises the second correlation transfer function, and the reverberated speech signal comprises the energy of the signal of the auxiliary virtual sound source.
7. The method of claim 6, wherein determining the energy of the signal of the secondary virtual sound source from the observed signal comprises:
determining a magnitude spectrum of a signal of the auxiliary virtual sound source according to the observation signal;
determining the energy of the signal of the auxiliary virtual sound source according to the energy ratio of the observed signal to the signal of the auxiliary virtual sound source.
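A hypothetical sketch of the two steps of claim 7, assuming that "magnitude spectrum" means the per-bin STFT magnitudes of a reference observation channel and that the energy ratio is a known design constant; neither detail is fixed by the claim.

```python
import numpy as np

def virtual_source_energy(X_ref, energy_ratio):
    """Derive the energy of the auxiliary virtual sound source's signal
    from the observation signal.

    X_ref: complex STFT of one observation channel, shape (F, T).
    energy_ratio: assumed ratio of the observation's energy to the
                  virtual source's energy (a free parameter here).
    """
    magnitude = np.abs(X_ref)                # step 1: magnitude spectrum from the observation
    return (magnitude ** 2) / energy_ratio   # step 2: energy from the assumed energy ratio
```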
8. The method according to claim 6 or 7, wherein the obtaining a second correlation transfer function corresponding to the auxiliary virtual sound source comprises:
and determining the second correlation transfer function in advance by measuring at multiple points and averaging the results.
9. The method according to claim 6 or 7, wherein the obtaining a second correlation transfer function corresponding to the auxiliary virtual sound source comprises:
determining the second correlation transfer function using an adaptive correlation transfer function estimation algorithm.
10. The method of any one of claims 1-9, wherein the determining a first mixing matrix H and a reverberated speech signal from the observation signal comprises:
determining a mapping relationship between the first mixing matrix H and a second mixing matrix A, wherein the second mixing matrix A is used to represent the mapping relationship between the observation signal and the original sound source signals of the at least two sources;
and determining the reverberated speech signal according to the mapping relationship and the original sound source signals of the at least two sources.
11. The method according to claim 10, wherein the second mixing matrix A comprises a room transfer function between a sound source of the observation signal and a microphone.
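Claims 10 and 11 relate the first mixing matrix H (correlation, i.e. relative, transfer functions between microphones) to a second mixing matrix A built from room transfer functions. A standard mapping from the relative-transfer-function literature is sketched below as an assumption, not as the patent's definition: each source's column of A is normalised by its entry at a reference microphone.

```python
import numpy as np

def relative_transfer_matrix(A, ref=0):
    """One conventional A -> H mapping: divide each column of A (a source's
    room transfer functions to the M microphones) by its value at a
    reference microphone, so H describes propagation between microphones
    rather than from source to microphone.

    A: shape (F, M, N) room transfer functions per frequency bin.
    Returns H, shape (F, M, N); row `ref` of every H[f] is all ones.
    """
    return A / A[:, ref:ref + 1, :]          # broadcast divide over the microphone axis
```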
12. An apparatus for signal processing, comprising:
an acquisition unit for acquiring an observation signal, wherein the observation signal comprises original sound source signals of at least two sources acquired by at least two microphones;
a processing unit for determining a first mixing matrix H and a reverberated speech signal from the observation signal, wherein the first mixing matrix H comprises a first correlation transfer function between the at least two microphones, and the first mixing matrix H is used to represent the mapping relationship between the observation signal and the reverberated speech signal;
wherein the processing unit is further configured to input the first mixing matrix H and the reverberated speech signal into a signal processing model to obtain a de-mixing matrix W of the observation signal, the signal processing model being used to represent the mapping relationship among the first mixing matrix H, the reverberated speech signal and the de-mixing matrix W;
and the processing unit is further configured to obtain a separation signal according to the de-mixing matrix W and the observation signal.
13. An electronic device comprising a processor and a memory, the memory having stored therein instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1-11.
14. A computer storage medium for storing a computer program comprising instructions for performing the method of any one of claims 1-11.
15. A computer program product, comprising computer program code which, when run by an electronic device, causes the electronic device to perform the method of any of claims 1-11.
CN202111415175.5A 2021-11-25 2021-11-25 Signal processing method and device Active CN114333876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111415175.5A CN114333876B (en) 2021-11-25 2021-11-25 Signal processing method and device

Publications (2)

Publication Number Publication Date
CN114333876A true CN114333876A (en) 2022-04-12
CN114333876B CN114333876B (en) 2024-02-09

Family

ID=81046323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111415175.5A Active CN114333876B (en) 2021-11-25 2021-11-25 Signal processing method and device

Country Status (1)

Country Link
CN (1) CN114333876B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090086998A1 (en) * 2007-10-01 2009-04-02 Samsung Electronics Co., Ltd. Method and apparatus for identifying sound sources from mixed sound signal
EP2863391A1 (en) * 2012-06-18 2015-04-22 Goertek Inc. Method and device for dereverberation of single-channel speech
CN109994120A (en) * 2017-12-29 2019-07-09 福州瑞芯微电子股份有限公司 Sound enhancement method, system, speaker and storage medium based on diamylose
CN110428852A (en) * 2019-08-09 2019-11-08 南京人工智能高等研究院有限公司 Speech separating method, device, medium and equipment
WO2020064089A1 (en) * 2018-09-25 2020-04-02 Huawei Technologies Co., Ltd. Determining a room response of a desired source in a reverberant environment
CN112435685A (en) * 2020-11-24 2021-03-02 深圳市友杰智新科技有限公司 Blind source separation method and device for strong reverberation environment, voice equipment and storage medium
CN113393857A (en) * 2021-06-10 2021-09-14 腾讯音乐娱乐科技(深圳)有限公司 Method, device and medium for eliminating human voice of music signal
CN113687307A (en) * 2021-08-19 2021-11-23 中国人民解放军海军工程大学 Self-adaptive beam forming method under low signal-to-noise ratio and reverberation environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ROBERT AICHNER et al.: "A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments", Signal Processing, vol. 86, no. 6 *
NING Jun: "Research on speech separation by microphone-array beamforming and acoustic echo cancellation methods", China Master's Theses Full-text Database *

Also Published As

Publication number Publication date
CN114333876B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN111418010B (en) Multi-microphone noise reduction method and device and terminal equipment
Serizel et al. Low-rank approximation based multichannel Wiener filter algorithms for noise reduction with application in cochlear implants
US8787587B1 (en) Selection of system parameters based on non-acoustic sensor information
US20190272842A1 (en) Speech enhancement for an electronic device
CN109727604A (en) Frequency domain echo cancel method and computer storage media for speech recognition front-ends
CN111131947B (en) Earphone signal processing method and system and earphone
US20070100605A1 (en) Method for processing audio-signals
US11146897B2 (en) Method of operating a hearing aid system and a hearing aid system
JP4543014B2 (en) Hearing device
US10755728B1 (en) Multichannel noise cancellation using frequency domain spectrum masking
CN111081267B (en) Multi-channel far-field speech enhancement method
WO2019113253A1 (en) Voice enhancement in audio signals through modified generalized eigenvalue beamformer
US9877115B2 (en) Dynamic relative transfer function estimation using structured sparse Bayesian learning
CN110265054A (en) Audio signal processing method, device, computer readable storage medium and computer equipment
US20150318001A1 (en) Stepsize Determination of Adaptive Filter For Cancelling Voice Portion by Combing Open-Loop and Closed-Loop Approaches
CN111681665A (en) Omnidirectional noise reduction method, equipment and storage medium
Spriet et al. Stochastic gradient-based implementation of spatially preprocessed speech distortion weighted multichannel Wiener filtering for noise reduction in hearing aids
CN105957536B (en) Based on channel degree of polymerization frequency domain echo cancel method
CN112802490A (en) Beam forming method and device based on microphone array
CN113889135A (en) Method for estimating direction of arrival of sound source, electronic equipment and chip system
US20140254825A1 (en) Feedback canceling system and method
US20230209283A1 (en) Method for audio signal processing on a hearing system, hearing system and neural network for audio signal processing
CN114333876A (en) Method and apparatus for signal processing
Farmani et al. Sound source localization for hearing aid applications using wireless microphones
CN113223552B (en) Speech enhancement method, device, apparatus, storage medium, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40069743)
GR01 Patent grant