CN113782047B - Voice separation method, device, equipment and storage medium - Google Patents

Voice separation method, device, equipment and storage medium

Info

Publication number
CN113782047B
CN113782047B CN202111040658.1A
Authority
CN
China
Prior art keywords
channel
signal
angle deviation
time domain
noise reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111040658.1A
Other languages
Chinese (zh)
Other versions
CN113782047A (en)
Inventor
戴玮
关海欣
梁家恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202111040658.1A priority Critical patent/CN113782047B/en
Publication of CN113782047A publication Critical patent/CN113782047A/en
Application granted granted Critical
Publication of CN113782047B publication Critical patent/CN113782047B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00 Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/02 Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using radio waves
    • G01S3/14 Systems for determining direction or deviation from predetermined direction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Abstract

The invention relates to a voice separation method, a device, equipment and a storage medium. The method comprises: separating a time domain mixed voice signal to obtain a time domain signal of a first channel and a time domain signal of a second channel; selecting, in order of signal energy from high to low, the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the first channel and taking the mode to obtain azimuth estimation information of the first channel, and selecting the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the second channel and taking the mode to obtain azimuth estimation information of the second channel; calculating the pitch angle deviation and the azimuth angle deviation of the first channel according to the azimuth estimation information of the first channel, and the pitch angle deviation and the azimuth angle deviation of the second channel according to the azimuth estimation information of the second channel; and comparing the deviations of the first channel and the second channel, and determining the target sound source corresponding to each channel according to the comparison result.

Description

Voice separation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to a speech separation method, apparatus, device, and storage medium.
Background
In recent years, with the rapid development of speech recognition technology, there is an urgent demand for real-time speech separation in multi-channel speech recognition scenarios. For example, in one-to-one education, the voices of students and the voices of teachers need to be separated.
In the related art, blind source separation is generally adopted to separate mixed voices, but the order of the output channels corresponding to the separated voice signals is uncertain, so the user must further determine which voice signal corresponds to each channel, which reduces voice separation efficiency.
Disclosure of Invention
The invention provides a voice separation method, a device, equipment and a storage medium to solve the technical problem in the prior art that the order of the output channels obtained by blind source separation is uncertain, requiring the user to further determine the voice signal corresponding to each channel and reducing voice separation efficiency.
The technical scheme for solving the technical problems is as follows:
a method of speech separation comprising:
performing Fourier transform on the time domain mixed voice signal received by the microphone array to obtain a time-frequency domain mixed voice signal;
separating the mixed voice signals of the time-frequency domain to obtain a separation signal of a first channel and a separation signal of a second channel;
respectively carrying out short-time inverse Fourier transform on the separated signals of the first channel and the separated signals of the second channel to obtain a time domain signal of the first channel and a time domain signal of the second channel;
selecting, in order of signal energy from high to low, the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the first channel, and taking the mode to obtain azimuth estimation information of the first channel; and selecting the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the second channel, and taking the mode to obtain azimuth estimation information of the second channel;
calculating pitch angle deviation of the first channel and azimuth angle deviation of the first channel according to the azimuth estimation information of the first channel, and calculating pitch angle deviation of the second channel and azimuth angle deviation of the second channel according to the azimuth estimation information of the second channel;
if the pitch angle deviation of the first channel is not greater than the pitch angle deviation of the second channel and/or the azimuth angle deviation of the first channel is not greater than the azimuth angle deviation of the second channel, determining that the first channel is the voice information of the first target sound source and the second channel is the voice information of the second target sound source;
if the pitch angle deviation of the first channel is larger than the pitch angle deviation of the second channel, and the azimuth angle deviation of the first channel is larger than the azimuth angle deviation of the second channel, determining that the first channel is the voice information of the second target sound source, and the second channel is the voice information of the first target sound source.
Further, in the above voice separation method, before performing short-time inverse fourier transform on the separation signal of the first channel and the separation signal of the second channel to obtain the time domain signal of the first channel and the time domain signal of the second channel, the method further includes:
processing the separation signal of the first channel and the separation signal of the second channel through an adaptive filtering algorithm to obtain a primary noise reduction signal of the first channel;
performing energy comparison between the primary noise reduction signal of the first channel and the time domain mixed voice signal, and processing the higher-energy voice signal and the time domain mixed voice signal through the adaptive filtering algorithm and a nonlinear noise reduction algorithm to obtain a primary noise reduction signal of the second channel;
correspondingly, performing short-time inverse fourier transform on the separation signal of the first channel and the separation signal of the second channel respectively to obtain a time domain signal of the first channel and a time domain signal of the second channel, including:
and respectively carrying out short-time inverse Fourier transform on the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel to obtain a time domain signal of the first channel and a time domain signal of the second channel.
Further, in the above voice separation method, before performing short-time inverse fourier transform on the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel to obtain the time domain signal of the first channel and the time domain signal of the second channel, the method further includes:
the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel are respectively subjected to single-channel noise reduction to eliminate background noise, so that a final noise reduction signal of the first channel and a final noise reduction signal of the second channel are obtained;
correspondingly, performing short-time inverse fourier transform on the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel respectively to obtain a time domain signal of the first channel and a time domain signal of the second channel, including:
and respectively carrying out short-time inverse Fourier transform on the final noise reduction signal of the first channel and the final noise reduction signal of the second channel to obtain a time domain signal of the first channel and a time domain signal of the second channel.
Further, the above voice separation method further includes:
and updating the weight of a filter corresponding to the adaptive filtering algorithm when the pitch angle deviation is larger than the angle deviation threshold of the pitch angle or the azimuth angle deviation is larger than the angle deviation threshold of the azimuth angle.
Further, the above voice separation method further includes:
and when the pitch angle deviation is smaller than or equal to the angle deviation threshold of the pitch angle, and the azimuth angle deviation is smaller than or equal to the angle deviation threshold of the azimuth angle, maintaining the weight of the filter corresponding to the adaptive filtering algorithm unchanged.
Further, in the above voice separation method, the adaptive filtering algorithm is any one of the least mean squares (LMS) algorithm, the normalized least mean squares (NLMS) algorithm, and the recursive least squares (RLS) algorithm.
The invention also provides a voice separation device, which comprises:
the first transformation module is used for performing Fourier transform on the time domain mixed voice signal received by the microphone array to obtain a time-frequency domain mixed voice signal;
the separation module is used for separating the time-frequency domain mixed voice signals to obtain a separation signal of a first channel and a separation signal of a second channel;
the second transformation module is used for respectively carrying out short-time inverse Fourier transformation on the separation signal of the first channel and the separation signal of the second channel to obtain a time domain signal of the first channel and a time domain signal of the second channel;
the azimuth estimation module is used for selecting, in order of signal energy from high to low, the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the first channel and taking the mode to obtain azimuth estimation information of the first channel, and selecting the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the second channel and taking the mode to obtain azimuth estimation information of the second channel;
the deviation estimation module is used for calculating the pitch angle deviation of the first channel and the azimuth angle deviation of the first channel according to the azimuth estimation information of the first channel, and calculating the pitch angle deviation of the second channel and the azimuth angle deviation of the second channel according to the azimuth estimation information of the second channel;
the determining module is used for determining that the first channel is the voice information of the first target sound source and the second channel is the voice information of the second target sound source if the pitch angle deviation of the first channel is not greater than the pitch angle deviation of the second channel and/or the azimuth angle deviation of the first channel is not greater than the azimuth angle deviation of the second channel; if the pitch angle deviation of the first channel is larger than the pitch angle deviation of the second channel, and the azimuth angle deviation of the first channel is larger than the azimuth angle deviation of the second channel, determining that the first channel is the voice information of the second target sound source, and the second channel is the voice information of the first target sound source.
Further, in the above voice separation apparatus, the separation module is further configured to:
processing the separation signal of the first channel and the separation signal of the second channel through an adaptive filtering algorithm to obtain a primary noise reduction signal of the first channel;
performing energy comparison between the primary noise reduction signal of the first channel and the time domain mixed voice signal, and processing the higher-energy voice signal and the time domain mixed voice signal through the adaptive filtering algorithm and a nonlinear noise reduction algorithm to obtain a primary noise reduction signal of the second channel;
correspondingly, the second transformation module is further configured to perform short-time inverse fourier transform on the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel, so as to obtain a time domain signal of the first channel and a time domain signal of the second channel.
The invention also provides a voice separation apparatus comprising: a processor and a memory;
the processor is configured to execute a program of the speech separation method stored in the memory, so as to implement any one of the above-described speech separation methods.
The present invention also provides a storage medium storing one or more programs that when executed implement any of the above-described methods of speech separation.
The beneficial effects of the invention are as follows:
the method comprises the steps of performing voice separation on a time-domain mixed voice signal, obtaining a time-domain signal of a first channel and a time-domain signal of a second channel, collecting energy judgment, selecting two-dimensional arrival azimuth estimation corresponding to the time-domain signal of the first channel with a specified frame number, and obtaining azimuth estimation information of the first channel, and selecting two-dimensional arrival azimuth estimation information corresponding to the time-domain signal of the second channel with a specified frame number, and obtaining azimuth estimation of the second channel; then, according to the azimuth estimation information of the first channel, calculating the pitch angle deviation of the first channel and the azimuth deviation of the first channel, and according to the azimuth estimation information of the second channel, calculating the pitch angle deviation of the second channel and the azimuth deviation of the second channel; if the pitch angle deviation of the first channel is not greater than the pitch angle deviation of the second channel, and/or the azimuth angle deviation of the first channel is not greater than the azimuth angle deviation of the second channel, determining that the first channel is the voice information of the first target sound source, and the second channel is the voice information of the second target sound source; if the pitch angle deviation of the first channel is larger than the pitch angle deviation of the second channel, and the azimuth angle deviation of the first channel is larger than the azimuth angle deviation of the second channel, determining that the first channel is the voice information of the second target sound source, and the second channel is the voice information of the first target sound source. Therefore, the voice signals are output according to the determined channel sequence, so that the user is prevented from further determining the voice signals corresponding to each channel, and the voice separation efficiency is improved.
Drawings
FIG. 1 is a flow chart of an embodiment of a method for speech separation according to the present invention;
FIG. 2 is a schematic diagram of a microphone array according to the present invention;
FIG. 3 is a schematic diagram of a voice separation apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural view of the voice separation apparatus of the present invention.
Detailed Description
The principles and features of the present invention are described below with reference to the drawings; the examples are provided only to illustrate the invention and are not to be construed as limiting its scope.
Fig. 1 is a flowchart of an embodiment of a voice separation method according to the present invention, as shown in fig. 1, the voice separation method of the present embodiment may specifically include the following steps:
100. Performing Fourier transform on the time domain mixed voice signal received by the microphone array to obtain a time-frequency domain mixed voice signal;
Fig. 2 is a schematic diagram of a microphone array according to the present invention. As shown in fig. 2, an angle deviation threshold of the pitch angle and an angle deviation threshold of the azimuth angle may be set. The pitch angle θ of the first sound source signal of the time domain mixed speech signal received by the microphone array may be, for example, 30 degrees, and its azimuth angle φ may be, for example, 60 degrees. The second sound source signal of the time domain mixed voice signal received by the microphone array may come from any direction.
In a specific implementation process, the microphone array receives the time domain mixed speech signal. Because a speech signal is short-time stationary, it is generally transformed into the short-time frequency domain for analysis; therefore, a short-time Fourier transform is performed on the time domain mixed speech signal to obtain the time-frequency domain mixed speech signal, which may be expressed as x(t, k), where t denotes the frame index and k denotes the frequency bin.
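For illustration, the following is a minimal sketch of the analysis and synthesis transforms used in steps 100 and 102, based on SciPy's STFT routines; the sample rate, frame length, and overlap are illustrative assumptions, not values specified in this disclosure.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000                       # assumed sample rate
mix = np.random.randn(2, fs)     # stand-in for a 2-microphone time domain mixture

# Short-time Fourier transform: yields x(t, k), t = frame index, k = frequency bin
f, t, X = stft(mix, fs=fs, nperseg=512, noverlap=384)
# X has shape (n_mics, n_freq, n_frames): the time-frequency domain mixed signal

# ... blind source separation (step 101) would operate on X here ...

# Short-time inverse Fourier transform back to per-channel time domain signals
_, x_time = istft(X, fs=fs, nperseg=512, noverlap=384)
```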
101. Separating the mixed voice signals of the time-frequency domain to obtain a separation signal of a first channel and a separation signal of a second channel;
in a specific implementation process, a blind source separation algorithm may be used to separate the time-frequency domain mixed speech signal, so as to obtain a separation signal of the first channel and a separation signal of the second channel. For the specific separation method, reference may be made to the related art, and will not be described herein.
102. Respectively carrying out short-time inverse Fourier transform on the separated signals of the first channel and the separated signals of the second channel to obtain a time domain signal of the first channel and a time domain signal of the second channel;
in a specific implementation process, the separation signal of the first channel and the separation signal of the second channel may be respectively subjected to short-time inverse fourier transform, so as to obtain a time domain signal of the first channel and a time domain signal of the second channel.
103. Selecting, in order of signal energy from high to low, the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the first channel, and taking the mode to obtain azimuth estimation information of the first channel; and selecting the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the second channel, and taking the mode to obtain azimuth estimation information of the second channel;
in a specific implementation process, the pitch angle of each frame of each channel can be obtained through two-dimensional direction of arrival estimationAnd azimuth->The ability of the speech signal for each frame can also be derived from the calculation of the speech signal energy. Wherein, the languageThe energy of the sound signal is calculated as +.>E i Representing speech signal energy, x i (t) represents a time domain signal of each channel of the current frame, and N represents the number of frames.
In a specific implementation process, the two-dimensional direction-of-arrival estimates corresponding to the frames of the time domain signal of the first channel within the top 30% by energy can be selected and their mode taken to obtain the azimuth estimation information of the first channel, and the two-dimensional direction-of-arrival estimates corresponding to the specified number of frames of the time domain signal of the second channel can be selected and their mode taken to obtain the azimuth estimation information of the second channel.
Specifically, after the two-dimensional direction-of-arrival estimates (pitch angle and azimuth angle) of all frames are obtained, the frames are sorted by energy from high to low, and the pitch angles and azimuth angles of the top 30% of frames with the highest energy are selected, giving an array of pitch angles and an array of azimuth angles. Three angle ranges can be set in advance, such as 0-50, 50-100 and 100-180 degrees; taking the mode means choosing the range into which the values of the array fall most often. For example, if values between 0 and 50 degrees occur most often in the pitch angle array, the mode of the pitch angle of this channel is taken to lie in the 0-50 degree range.
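A minimal sketch of this selection, assuming per-frame energies computed as above, the top 30% of frames by energy, and the three example angle ranges; the bin edges and the 30% fraction are the illustrative values from the text, not fixed parameters.

```python
import numpy as np

def mode_range(angles, bins=(0, 50, 100, 180)):
    """Return the preset angle range into which the most values fall."""
    counts, _ = np.histogram(angles, bins=bins)
    i = int(np.argmax(counts))
    return bins[i], bins[i + 1]

def channel_azimuth_estimate(frames, pitch, azim, top_frac=0.30):
    # frames: (n_frames, frame_len) time domain frames of one channel
    # pitch, azim: per-frame two-dimensional DOA estimates, in degrees
    energy = np.sum(frames ** 2, axis=1)                        # E_i per frame
    top = np.argsort(energy)[::-1][:max(1, int(top_frac * len(energy)))]
    return mode_range(pitch[top]), mode_range(azim[top])
```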
104. Calculating pitch angle deviation of the first channel and azimuth angle deviation of the first channel according to the azimuth estimation information of the first channel, and calculating pitch angle deviation of the second channel and azimuth angle deviation of the second channel according to the azimuth estimation information of the second channel;
in one implementation, the position estimate information for the first channel may be recorded asThe azimuth estimation information of the second channel can be denoted +.>The pitch angle deviation of the first channel is +.>The azimuthal deviation of the first channel is +.>Wherein θ represents a reference pitch angle, < >>Representing the reference azimuth angle.
105. Detecting whether the pitch angle deviation of a first channel is larger than that of a second channel, and whether the azimuth angle deviation of the first channel is larger than that of the second channel; if yes, go to step 106, if no, go to step 107;
106. determining the first channel as voice information of a second target sound source, wherein the second channel is the voice information of the first target sound source;
If the pitch angle deviation of the first channel is larger than the pitch angle deviation of the second channel, and the azimuth angle deviation of the first channel is larger than the azimuth angle deviation of the second channel, the first channel is determined to be the voice information of the second target sound source, and the second channel is determined to be the voice information of the first target sound source.
107. And determining the first channel as the voice information of the first target sound source, and determining the second channel as the voice information of the second target sound source.
If the pitch angle deviation of the first channel is not greater than the pitch angle deviation of the second channel, and/or the azimuth angle deviation of the first channel is not greater than the azimuth angle deviation of the second channel, determining that the first channel is the voice information of the first target sound source, and the second channel is the voice information of the second target sound source.
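A minimal sketch of the decision in steps 105 to 107, assuming the reference pitch angle θ and reference azimuth angle φ of the first target sound source are known (e.g., the 30 and 60 degrees of fig. 2); the function and variable names are illustrative.

```python
def assign_channels(est1, est2, theta_ref, phi_ref):
    # est1, est2: (pitch, azimuth) estimates of channels 1 and 2, in degrees
    d_theta1, d_phi1 = abs(est1[0] - theta_ref), abs(est1[1] - phi_ref)
    d_theta2, d_phi2 = abs(est2[0] - theta_ref), abs(est2[1] - phi_ref)
    if d_theta1 > d_theta2 and d_phi1 > d_phi2:
        # Both deviations of channel 1 are larger: it carries the second source
        return {"channel_1": "target_source_2", "channel_2": "target_source_1"}
    return {"channel_1": "target_source_1", "channel_2": "target_source_2"}

print(assign_channels((32, 58), (95, 140), theta_ref=30, phi_ref=60))
# {'channel_1': 'target_source_1', 'channel_2': 'target_source_2'}
```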
According to the voice separation method of this embodiment, voice separation is performed on the time domain mixed voice signal, and after the time domain signal of the first channel and the time domain signal of the second channel are obtained, energy judgment is combined: the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the first channel are selected and their mode taken to obtain the azimuth estimation information of the first channel, and likewise for the second channel. Then, the pitch angle deviation and the azimuth angle deviation of the first channel are calculated from the azimuth estimation information of the first channel, and those of the second channel from the azimuth estimation information of the second channel. If the pitch angle deviation of the first channel is not greater than that of the second channel, and/or the azimuth angle deviation of the first channel is not greater than that of the second channel, the first channel is determined to be the voice information of the first target sound source and the second channel the voice information of the second target sound source; if both deviations of the first channel are greater than those of the second channel, the first channel is determined to be the voice information of the second target sound source and the second channel the voice information of the first target sound source. The voice signals are thus output in a determined channel order, so the user does not need to further determine the voice signal corresponding to each channel, and voice separation efficiency is improved.
In a specific implementation process, before the step 102 "performing short-time inverse fourier transform on the separation signal of the first channel and the separation signal of the second channel to obtain the time domain signal of the first channel and the time domain signal of the second channel" in the foregoing embodiment, the following steps may be further performed:
(1) Processing the separation signal of the first channel and the separation signal of the second channel through an adaptive filtering algorithm to obtain a primary noise reduction signal of the first channel;
(2) Performing energy comparison between the primary noise reduction signal of the first channel and the time domain mixed voice signal, and processing the higher-energy voice signal and the time domain mixed voice signal through the adaptive filtering algorithm and a nonlinear noise reduction algorithm to obtain a primary noise reduction signal of the second channel;
specifically, after the primary noise reduction signal of the first channel is obtained, energy comparison can be performed between the primary noise reduction signal of the first channel and the mixed voice signal of the time domain, and a voice signal with high energy is selected. If the energy of the primary noise reduction signal of the first channel is higher than the energy of the mixed voice signal of the time domain, the primary noise reduction signal of the first channel is used as the voice signal with high energy, and if the energy of the primary noise reduction signal of the first channel is lower than the energy of the mixed voice signal of the time domain, the mixed voice signal of the time domain is used as the voice signal with high energy. And taking the time domain mixed voice signal as a reference, and filtering by a self-adaptive filtering algorithm to obtain a primary noise reduction signal of the second channel. The self-adaptive filtering algorithm is any one of a least mean square algorithm LMS, an NLMS algorithm and a least square method RLS.
Correspondingly, performing short-time inverse fourier transform on the separation signal of the first channel and the separation signal of the second channel respectively to obtain a time domain signal of the first channel and a time domain signal of the second channel, including: and respectively carrying out short-time inverse Fourier transform on the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel to obtain a time domain signal of the first channel and a time domain signal of the second channel.
In a specific implementation process, before "performing short-time inverse fourier transform on the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel respectively to obtain a time domain signal of the first channel and a time domain signal of the second channel", the following steps may be further performed:
(11) And respectively removing background noise from the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel through single-channel noise reduction to obtain a final noise reduction signal of the first channel and a final noise reduction signal of the second channel.
Correspondingly, the short-time inverse fourier transform is respectively performed on the separation signal of the first channel and the separation signal of the second channel, so as to obtain a time domain signal of the first channel and a time domain signal of the second channel, which comprises the following steps: and respectively carrying out short-time inverse Fourier transform on the final noise reduction signal of the first channel and the final noise reduction signal of the second channel to obtain a time domain signal of the first channel and a time domain signal of the second channel.
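The disclosure does not specify the single-channel noise reduction method; the following is a minimal sketch of one common choice, magnitude spectral subtraction, under the assumption that the first few frames contain only background noise.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(x, fs=16000, nperseg=512, noise_frames=10, floor=0.05):
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    # Estimate the background noise magnitude from the leading frames
    noise_mag = np.mean(np.abs(X[:, :noise_frames]), axis=1, keepdims=True)
    # Subtract it from every frame, keeping a small spectral floor
    mag = np.maximum(np.abs(X) - noise_mag, floor * np.abs(X))
    _, y = istft(mag * np.exp(1j * np.angle(X)), fs=fs, nperseg=nperseg)
    return y
```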
In this embodiment, the energy judgment and the adaptive filtering technology are combined to further denoise the separated voice signals of each channel, so that the separated voice is cleaner.
In a specific implementation process, after step 104 (calculating the pitch angle deviation and the azimuth angle deviation of the first channel according to the azimuth estimation information of the first channel, and the pitch angle deviation and the azimuth angle deviation of the second channel according to the azimuth estimation information of the second channel), the following steps may also be performed: updating the weight of the filter corresponding to the adaptive filtering algorithm when the pitch angle deviation is larger than the angle deviation threshold of the pitch angle or the azimuth angle deviation is larger than the angle deviation threshold of the azimuth angle; and maintaining the weight of the filter corresponding to the adaptive filtering algorithm unchanged when the pitch angle deviation is smaller than or equal to the angle deviation threshold of the pitch angle and the azimuth angle deviation is smaller than or equal to the angle deviation threshold of the azimuth angle.
In a specific implementation process, a weight-update fitting function for the filter can be obtained by fitting the historical weights of the filter corresponding to the adaptive filtering algorithm, so that before the filter is used, its weights are set according to the fitting function. After the fitting function has been used for a preset number m of updates, the actually computed filter weights of the m-th update are obtained using the update method described above. If the error of the fitted weights relative to the computed weights is within a preset range, the fitting function continues to be used to set the filter weights from the m-th to the 2m-th update; otherwise, the filter weights are updated in the threshold-based manner (updating the weights when the pitch angle deviation is larger than the angle deviation threshold of the pitch angle or the azimuth angle deviation is larger than the angle deviation threshold of the azimuth angle) until the weights have been updated n times, after which the fitting function is fitted again from the n computed weight values. In this way, repeatedly calculating the pitch angle deviation between the pitch angle in the time-frequency domain mixed voice signal and the target azimuth, and the azimuth angle deviation between the azimuth angle in the time-frequency domain mixed voice signal and the target azimuth, can be avoided, which improves efficiency and accuracy.
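A sketch of the weight-fitting scheme described above, under heavy assumptions: the fitting function is taken to be an independent polynomial fit per filter tap over the update index, and m, the error tolerance, and the polynomial degree are illustrative choices, not values from the disclosure.

```python
import numpy as np

def fit_weight_function(weight_history, degree=2):
    # weight_history: (n_updates, n_taps) actually computed filter weights
    steps = np.arange(weight_history.shape[0])
    return [np.polyfit(steps, weight_history[:, k], degree)
            for k in range(weight_history.shape[1])]

def predict_weights(coeffs, step):
    return np.array([np.polyval(c, step) for c in coeffs])

# Usage: after m actual updates, compare the fitted prediction with the
# actually computed weights; keep using the fit only if the error is small.
m = 20
history = np.cumsum(0.01 * np.random.randn(m, 8), axis=0)   # stand-in weight history
coeffs = fit_weight_function(history)
err = np.linalg.norm(predict_weights(coeffs, m - 1) - history[-1])
use_fit = err < 0.05        # within the preset range -> keep the fitting function
```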
It should be noted that, the method of the embodiment of the present invention may be performed by a single device, for example, a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the method of an embodiment of the present invention, and the devices interact with each other to complete the method.
Fig. 3 is a schematic structural diagram of an embodiment of the voice separation apparatus according to the present invention, as shown in fig. 3, the voice separation apparatus according to the present embodiment may include a first transformation module 20, a separation module 21, a second transformation module 22, an orientation estimation module 23, a deviation estimation module 24, and a determination module 25.
The first transformation module 20 is configured to perform Fourier transform on the time domain mixed voice signal received by the microphone array to obtain a time-frequency domain mixed voice signal;
a separation module 21, configured to separate the time-frequency domain mixed speech signal to obtain a separation signal of the first channel and a separation signal of the second channel;
a second transform module 22, configured to perform short-time inverse fourier transform on the separated signal of the first channel and the separated signal of the second channel, to obtain a time domain signal of the first channel and a time domain signal of the second channel;
the azimuth estimation module 23 is configured to select, according to the order of the signal energy from high to low, two-dimensional arrival azimuth estimation corresponding to the time domain signal of the first channel with a specified frame number, and calculate a mode, to obtain azimuth estimation information of the first channel, and select two-dimensional arrival azimuth estimation information corresponding to the time domain signal of the second channel with a specified frame number, and calculate a mode, to obtain azimuth estimation of the second channel;
a deviation estimating module 24, configured to calculate a pitch angle deviation of the first channel and an azimuth angle deviation of the first channel according to the azimuth estimation information of the first channel, and calculate a pitch angle deviation of the second channel and an azimuth angle deviation of the second channel according to the azimuth estimation information of the second channel;
a determining module 25, configured to determine that the first channel is the voice information of the first target sound source, and the second channel is the voice information of the second target sound source, if the pitch angle deviation of the first channel is not greater than the pitch angle deviation of the second channel, and/or the azimuth angle deviation of the first channel is not greater than the azimuth angle deviation of the second channel; if the pitch angle deviation of the first channel is larger than the pitch angle deviation of the second channel, and the azimuth angle deviation of the first channel is larger than the azimuth angle deviation of the second channel, determining that the first channel is the voice information of the second target sound source, and the second channel is the voice information of the first target sound source.
In a specific implementation, the separation module 21 is further configured to:
processing the separation signal of the first channel and the separation signal of the second channel through an adaptive filtering algorithm to obtain a primary noise reduction signal of the first channel;
and performing energy comparison between the primary noise reduction signal of the first channel and the time domain mixed voice signal, and processing the higher-energy voice signal and the time domain mixed voice signal through the adaptive filtering algorithm and a nonlinear noise reduction algorithm to obtain the primary noise reduction signal of the second channel. The adaptive filtering algorithm is any one of the least mean squares (LMS) algorithm, the normalized least mean squares (NLMS) algorithm, and the recursive least squares (RLS) algorithm.
Correspondingly, the second transform module 22 is further configured to perform short-time inverse fourier transform on the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel, so as to obtain a time domain signal of the first channel and a time domain signal of the second channel.
In a specific implementation, the separation module 21 is further configured to: the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel are respectively subjected to single-channel noise reduction to eliminate background noise, so that a final noise reduction signal of the first channel and a final noise reduction signal of the second channel are obtained;
correspondingly, the second transform module 22 is further configured to perform short-time inverse fourier transform on the final noise reduction signal of the first channel and the final noise reduction signal of the second channel, so as to obtain a time domain signal of the first channel and a time domain signal of the second channel.
In a specific implementation, the deviation estimation module 24 is further configured to update the weight of the filter corresponding to the adaptive filtering algorithm when the pitch angle deviation is greater than the angle deviation threshold of the pitch angle, or the azimuth angle deviation is greater than the angle deviation threshold of the azimuth angle; and to maintain the weight of the filter corresponding to the adaptive filtering algorithm unchanged when the pitch angle deviation is smaller than or equal to the angle deviation threshold of the pitch angle and the azimuth angle deviation is smaller than or equal to the angle deviation threshold of the azimuth angle.
The device of the foregoing embodiment is configured to implement the corresponding method in the foregoing embodiment, and specific implementation schemes thereof may refer to the method described in the foregoing embodiment and related descriptions in the method embodiment, and have beneficial effects of the corresponding method embodiment, which are not described herein.
Fig. 4 is a schematic structural diagram of the voice separation apparatus of the present invention. As shown in fig. 4, the apparatus of this embodiment may include: a processor 1010 and a memory 1020. As will be appreciated by those skilled in the art, the apparatus may also include an input/output interface 1030, a communication interface 1040, and a bus 1050, with the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 communicating with one another within the apparatus via the bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit ), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory ), static storage device, dynamic storage device, or the like. Memory 1020 may store an operating system and other application programs, and when the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in memory 1020 and executed by processor 1010.
The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in the device (not shown) or may be external to the device to provide corresponding functionality. Input devices may include a keyboard, mouse, touch screen, microphone, and various types of sensors; output devices may include a display, speaker, vibrator, indicator lights, etc.
Communication interface 1040 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
In one specific implementation, the processor 1010 is configured to execute a program for speech separation stored in the memory 1020 to implement the speech separation method of the above embodiment.
The present invention also provides a storage medium storing one or more programs which when executed implement the speech separation method of the above embodiments.
The computer readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the invention. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The present invention is not limited to the above embodiments, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and these modifications and substitutions are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A method of speech separation comprising:
performing Fourier transform on the time domain mixed voice signal received by the microphone array to obtain a time-frequency domain mixed voice signal;
separating the mixed voice signals of the time-frequency domain to obtain a separation signal of a first channel and a separation signal of a second channel;
respectively carrying out short-time inverse Fourier transform on the separated signals of the first channel and the separated signals of the second channel to obtain a time domain signal of the first channel and a time domain signal of the second channel;
selecting, in order of signal energy from high to low, the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the first channel, and taking the mode to obtain azimuth estimation information of the first channel; and selecting the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the second channel, and taking the mode to obtain azimuth estimation information of the second channel;
calculating pitch angle deviation of the first channel and azimuth angle deviation of the first channel according to the azimuth estimation information of the first channel, and calculating pitch angle deviation of the second channel and azimuth angle deviation of the second channel according to the azimuth estimation information of the second channel;
if the pitch angle deviation of the first channel is not greater than the pitch angle deviation of the second channel and/or the azimuth angle deviation of the first channel is not greater than the azimuth angle deviation of the second channel, determining that the first channel is the voice information of the first target sound source and the second channel is the voice information of the second target sound source;
if the pitch angle deviation of the first channel is larger than the pitch angle deviation of the second channel, and the azimuth angle deviation of the first channel is larger than the azimuth angle deviation of the second channel, determining that the first channel is the voice information of the second target sound source, and the second channel is the voice information of the first target sound source.
2. The method according to claim 1, wherein before the step of performing short-time inverse Fourier transform on the separation signal of the first channel and the separation signal of the second channel to obtain the time domain signal of the first channel and the time domain signal of the second channel, respectively, the method further comprises:
processing the separation signal of the first channel and the separation signal of the second channel through an adaptive filtering algorithm to obtain a primary noise reduction signal of the first channel;
performing energy comparison between the primary noise reduction signal of the first channel and the time domain mixed voice signal, and processing the higher-energy voice signal and the time domain mixed voice signal through the adaptive filtering algorithm and a nonlinear noise reduction algorithm to obtain a primary noise reduction signal of the second channel;
correspondingly, performing short-time inverse fourier transform on the separation signal of the first channel and the separation signal of the second channel respectively to obtain a time domain signal of the first channel and a time domain signal of the second channel, including:
and respectively carrying out short-time inverse Fourier transform on the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel to obtain a time domain signal of the first channel and a time domain signal of the second channel.
3. The method of claim 2, wherein before performing short-time inverse Fourier transform on the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel to obtain the time domain signal of the first channel and the time domain signal of the second channel, respectively, the method further comprises:
the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel are respectively subjected to single-channel noise reduction to eliminate background noise, so that a final noise reduction signal of the first channel and a final noise reduction signal of the second channel are obtained;
correspondingly, performing short-time inverse fourier transform on the primary noise reduction signal of the first channel and the primary noise reduction signal of the second channel respectively to obtain a time domain signal of the first channel and a time domain signal of the second channel, including:
and respectively carrying out short-time inverse Fourier transform on the final noise reduction signal of the first channel and the final noise reduction signal of the second channel to obtain a time domain signal of the first channel and a time domain signal of the second channel.
4. The voice separation method of claim 2, further comprising:
and updating the weight of a filter corresponding to the adaptive filtering algorithm when the pitch angle deviation is larger than the angle deviation threshold of the pitch angle or the azimuth angle deviation is larger than the angle deviation threshold of the azimuth angle.
5. The voice separation method of claim 4, further comprising:
and when the pitch angle deviation is smaller than or equal to the angle deviation threshold of the pitch angle, and the azimuth angle deviation is smaller than or equal to the angle deviation threshold of the azimuth angle, maintaining the weight of the filter corresponding to the adaptive filtering algorithm unchanged.
6. The method of claim 2, wherein the adaptive filtering algorithm is any one of the least mean squares (LMS) algorithm, the normalized least mean squares (NLMS) algorithm, and the recursive least squares (RLS) algorithm.
7. A speech separation device, comprising:
the first transformation module is used for performing Fourier transform on the time domain mixed voice signal received by the microphone array to obtain a time-frequency domain mixed voice signal;
the separation module is used for separating the time-frequency domain mixed voice signals to obtain a separation signal of a first channel and a separation signal of a second channel;
the second transformation module is used for respectively carrying out short-time inverse Fourier transformation on the separation signal of the first channel and the separation signal of the second channel to obtain a time domain signal of the first channel and a time domain signal of the second channel;
the azimuth estimation module is used for selecting, in order of signal energy from high to low, the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the first channel and taking the mode to obtain azimuth estimation information of the first channel, and selecting the two-dimensional direction-of-arrival estimates corresponding to a specified number of frames of the time domain signal of the second channel and taking the mode to obtain azimuth estimation information of the second channel;
the deviation estimation module is used for calculating the pitch angle deviation of the first channel and the azimuth angle deviation of the first channel according to the azimuth estimation information of the first channel, and calculating the pitch angle deviation of the second channel and the azimuth angle deviation of the second channel according to the azimuth estimation information of the second channel;
the determining module is used for determining that the first channel carries the voice information of the first target sound source and the second channel carries the voice information of the second target sound source if the pitch angle deviation of the first channel is not greater than the pitch angle deviation of the second channel and/or the azimuth angle deviation of the first channel is not greater than the azimuth angle deviation of the second channel; and for determining that the first channel carries the voice information of the second target sound source and the second channel carries the voice information of the first target sound source if the pitch angle deviation of the first channel is greater than the pitch angle deviation of the second channel and the azimuth angle deviation of the first channel is greater than the azimuth angle deviation of the second channel.
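As a sketch of the azimuth estimation and determining modules above (the array shapes, names, and the use of SciPy's mode are editorial assumptions, not the claimed implementation):

```python
import numpy as np
from scipy import stats

def estimate_direction(doa_per_frame, frame_energy, num_frames):
    """Azimuth estimation module sketch: keep the num_frames highest-energy
    frames of an (N, 2) array of per-frame (pitch, azimuth) DOA estimates
    and return the mode of each angle."""
    top = np.argsort(frame_energy)[::-1][:num_frames]
    pitch = stats.mode(doa_per_frame[top, 0], keepdims=False).mode
    azimuth = stats.mode(doa_per_frame[top, 1], keepdims=False).mode
    return pitch, azimuth

def assign_channels(dev_ch1, dev_ch2):
    """Determining module sketch: dev_ch1/dev_ch2 are (pitch deviation,
    azimuth deviation) tuples. The channel-to-source mapping swaps only
    when channel 1 is worse on both angles."""
    if dev_ch1[0] > dev_ch2[0] and dev_ch1[1] > dev_ch2[1]:
        return ("target source 2", "target source 1")
    return ("target source 1", "target source 2")
```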
8. The voice separation device of claim 7, wherein the separation module is further configured to:
process the separation signal of the first channel and the separation signal of the second channel through an adaptive filtering algorithm to obtain a primary noise reduction signal of the first channel;
compare the energy of the primary noise reduction signal of the first channel with that of the time domain mixed voice signal, and process the higher-energy voice signal together with the time domain mixed voice signal through the adaptive filtering algorithm and a nonlinear noise reduction algorithm to obtain a primary noise reduction signal of the second channel;
correspondingly, the second transformation module is further configured to perform the short-time inverse Fourier transform on the primary noise reduction signal of the first channel and on the primary noise reduction signal of the second channel, respectively, to obtain the time domain signal of the first channel and the time domain signal of the second channel.
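A sketch of the energy comparison in claim 8 (the sum-of-squares energy definition and the names are assumptions):

```python
import numpy as np

def pick_higher_energy(primary_ch1, mixed_td):
    """Return whichever of the first channel's primary noise reduction signal
    and the time domain mixture carries more energy; per claim 8, this signal
    is then processed with the mixture by the adaptive filter and the
    nonlinear noise reduction stage."""
    if np.sum(primary_ch1 ** 2) > np.sum(mixed_td ** 2):
        return primary_ch1
    return mixed_td
```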
9. A voice separation apparatus, comprising: a processor and a memory;
the processor is configured to execute a program stored in the memory to implement the voice separation method of any one of claims 1 to 6.
10. A storage medium storing one or more programs which, when executed, implement the voice separation method of any one of claims 1 to 6.
CN202111040658.1A 2021-09-06 2021-09-06 Voice separation method, device, equipment and storage medium Active CN113782047B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111040658.1A CN113782047B (en) 2021-09-06 2021-09-06 Voice separation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113782047A CN113782047A (en) 2021-12-10
CN113782047B (en) 2024-03-08

Family

ID=78841275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111040658.1A Active CN113782047B (en) 2021-09-06 2021-09-06 Voice separation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113782047B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6030032B2 (en) * 2013-08-30 2016-11-24 本田技研工業株式会社 Sound processing apparatus, sound processing method, and sound processing program
KR20170101629A (en) * 2016-02-29 2017-09-06 한국전자통신연구원 Apparatus and method for providing multilingual audio service based on stereo audio signal
KR102617476B1 (en) * 2016-02-29 2023-12-26 한국전자통신연구원 Apparatus and method for synthesizing separated sound source

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103308889A (en) * 2013-05-13 2013-09-18 辽宁工业大学 Passive sound source two-dimensional DOA (direction of arrival) estimation method under complex environment
CN106373589A (en) * 2016-09-14 2017-02-01 东南大学 Binaural mixed voice separation method based on iteration structure
CN106847301A (en) * 2017-01-03 2017-06-13 东南大学 Binaural speech separation method based on compressed sensing and attitude information
KR20180079975A (en) * 2017-01-03 2018-07-11 한국전자통신연구원 Sound source separation method using spatial position of the sound source and non-negative matrix factorization and apparatus performing the method
CN107346664A (en) * 2017-06-22 2017-11-14 河海大学常州校区 Binaural speech separation method based on critical band
WO2020042708A1 (en) * 2018-08-31 2020-03-05 大象声科(深圳)科技有限公司 Time-frequency masking and deep neural network-based sound source direction estimation method
CN110931036A (en) * 2019-12-07 2020-03-27 杭州国芯科技股份有限公司 Microphone array beam forming method
US11064294B1 (en) * 2020-01-10 2021-07-13 Synaptics Incorporated Multiple-source tracking and voice activity detections for planar microphone arrays
CN113050035A (en) * 2021-03-12 2021-06-29 云知声智能科技股份有限公司 Two-dimensional directional pickup method and device
CN113053406A (en) * 2021-05-08 2021-06-29 北京小米移动软件有限公司 Sound signal identification method and device
CN113225441A (en) * 2021-07-09 2021-08-06 北京中电慧声科技有限公司 Conference telephone system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on near-field high-resolution localization and identification of underwater noise sources based on vector arrays; Shi Jie; China Doctoral Dissertations Full-text Database, Engineering Science and Technology II; 2011-02-15; C028-12 *
Research on speech enhancement and separation methods based on microphone arrays; Li Wanlong; China Master's Theses Full-text Database, Information Science and Technology; 2009-01-15; I136-92 *

Similar Documents

Publication Publication Date Title
JP7434137B2 (en) Speech recognition method, device, equipment and computer readable storage medium
CN109074816B (en) Far field automatic speech recognition preprocessing
US10382849B2 (en) Spatial audio processing apparatus
CN108242234B (en) Speech recognition model generation method, speech recognition model generation device, storage medium, and electronic device
CN102625946B (en) Systems, methods, apparatus, and computer-readable media for dereverberation of multichannel signal
CN103426435B Source separation by independent component analysis with moving constraint
EP3526979B1 (en) Method and apparatus for output signal equalization between microphones
CN108922553B (en) Direction-of-arrival estimation method and system for sound box equipment
JP2014085673A (en) Method for intelligently controlling volume of electronic equipment, and mounting equipment
US20200342891A1 Systems and methods for audio signal processing using spectral-spatial mask estimation
CN111031463B (en) Microphone array performance evaluation method, device, equipment and medium
CN112492207B (en) Method and device for controlling camera to rotate based on sound source positioning
US20180172502A1 (en) Estimation of reverberant energy component from active audio source
CN113053365B (en) Voice separation method, device, equipment and storage medium
CN112951263B (en) Speech enhancement method, apparatus, device and storage medium
CN113470685B (en) Training method and device for voice enhancement model and voice enhancement method and device
CN110890099B (en) Sound signal processing method, device and storage medium
CN113782047B (en) Voice separation method, device, equipment and storage medium
GB2510650A (en) Sound source separation based on a Binary Activation model
CN107919136B (en) Digital voice sampling frequency estimation method based on Gaussian mixture model
JP6343771B2 (en) Head related transfer function modeling apparatus, method and program thereof
US20230116052A1 (en) Array geometry agnostic multi-channel personalized speech enhancement
CN116106826A (en) Sound source positioning method, related device and medium
CN113555031A (en) Training method and device of voice enhancement model and voice enhancement method and device
CN109378012B (en) Noise reduction method and system for recording audio by single-channel voice equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant