CN110718230B - Method and system for eliminating reverberation

Method and system for eliminating reverberation

Info

Publication number
CN110718230B
CN110718230B (application CN201910810308.5A)
Authority
CN
China
Prior art keywords: voice, data, probability, reverberation, frequency
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number
CN201910810308.5A
Other languages
Chinese (zh)
Other versions
CN110718230A (en)
Inventor
关海欣
Current Assignee
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN201910810308.5A
Publication of CN110718230A
Application granted
Publication of CN110718230B
Anticipated expiration


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 - Processing in the time domain
    • G10L21/0232 - Processing in the frequency domain
    • G10L2021/02082 - Noise filtering, the noise being echo or reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a method and a system for eliminating reverberation. Voice time-frequency probability detection for the voice signal is introduced into the dereverberation process to calculate the voice time-frequency probability of the voice signal, and the filter used in the dereverberation processing is adaptively updated and adjusted according to that probability. This effectively reduces the probability of erroneous filter updates and lowers the calculation frequency of the filter, which greatly compresses the computational load of dereverberation and improves the applicability of the dereverberation algorithm on different types of processors.

Description

Method and system for eliminating reverberation
Technical Field
The present invention relates to the field of sound signal processing technologies, and in particular, to a method and a system for eliminating reverberation.
Background
As sound waves propagate, they are reflected multiple times by walls, the ground and other objects before reaching a microphone, so the microphone receives a large number of reflected waves in addition to the direct wave from the sound source; together, these reflected waves form the corresponding reverberation components. In general, the early reverberation component makes the sound signal fuller, whereas the late reverberation component greatly reduces the recognizability of the sound signal and degrades its audibility, so existing reverberation cancellation technology mainly targets the late reverberation component.
Although existing techniques for eliminating the late reverberation component can suppress it effectively, the online algorithms they employ update the sound filter in a wrong direction under strong noise and coherent noise, so the reverberation component is not eliminated effectively, the sound signal is damaged, and its recognizability is reduced. Moreover, because how much reverberation such an online algorithm removes depends on the filter length, the computation involved becomes huge when the filter order is high, which prevents the algorithm from running on processors with low computing power. It can be seen that the prior art does not achieve accurate and efficient cancellation of reverberation components, particularly higher-order reverberation components.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method and a system for eliminating reverberation. Voice time-frequency probability detection for the voice signal is introduced into the dereverberation process to calculate the voice time-frequency probability of the voice signal, and the filter used in the dereverberation processing is adaptively updated and adjusted according to this probability. Calculating the voice time-frequency probability excludes most of the noise in the voice signal, so when the filter is updated and adjusted according to the probability, the chance of an erroneous filter update is effectively reduced. At the same time, because most of the noise is excluded, non-voice data does not participate in the subsequent dereverberation calculation, which greatly reduces the calculation frequency of the filter and the amount of computation needed for dereverberation. This makes it possible to compress the computational load of dereverberation on a large scale and improves the applicability of the dereverberation algorithm on different types of processors.
The invention provides a method for eliminating reverberation, which is characterized by comprising the following steps:
step (1), preprocessing a target voice signal, and acquiring the voice time-frequency probability corresponding to the preprocessed target voice signal;
step (2), according to the voice time-frequency probability, adjusting the filtering processing acted on the target voice signal;
step (3), according to the adjusted filtering processing, suppressing reverberation components existing in the voice array data corresponding to the target voice signal;
further, in the step (1), preprocessing the target speech signal, and acquiring the speech time-frequency probability corresponding to the preprocessed target speech signal specifically includes,
a step (101) of performing late reverberation suppression processing on the target speech signal to eliminate late reverberation components in the target speech signal;
step (102), performing speech time-frequency probability calculation processing on the target speech signal after the late reverberation suppression processing to obtain the speech time-frequency probability, wherein the speech time-frequency probability calculation processing is realized by a deep learning model, and the construction process of the deep learning model comprises,
s1, mixing clean voice data X and noise data n to obtain noisy voice data Y, and decomposing each frame of clean voice signal of the clean voice data X and each frame of mixed voice signal of the noisy voice data Y into a frequency domain to respectively obtain corresponding clean voice frequency domain data X and noisy voice frequency domain data Y;
s2, for the clean voice frequency domain data X and the noisy voice frequency domain data Y, calculating a probability value p (k) = abs (Y (k))/abs (X (k)) corresponding to the noisy voice frequency domain data Y at each frequency point k relative to the clean voice frequency domain data X, where abs (X (k)) is a probability value of the clean voice frequency domain data X for each frequency point k, and abs (Y (k)) is a probability value of the noisy voice frequency domain data Y for each frequency point k;
s3, constructing and obtaining the deep learning model according to all probability values of the noisy speech frequency domain data Y corresponding to the clean speech frequency domain data X at each frequency point k;
further, in the step (2), the adjusting the filtering process applied to the target speech signal according to the speech time-frequency probability specifically includes,
step (201), according to the voice time-frequency probability, judging the available state of each of a plurality of frames of voice data corresponding to the target voice signal;
step (202), determining a data buffer evaluation value of a FIFO data buffer area corresponding to each frame of voice data according to the available state judgment result of each frame of voice data in the target voice signal;
a step (203) of determining whether to adjust the filtering process according to the data buffer evaluation value;
further, in the step (201), the determining the available state of each of the plurality of frames of speech data corresponding to the target speech signal according to the speech time-frequency probability specifically includes,
step (2011), comparing the voice time-frequency probability with a preset probability threshold value, and judging the available state of each frame of voice data in the target voice signal according to the comparison result;
step (2012), if the voice time-frequency probability is greater than the preset probability threshold value, determining that the corresponding frame of voice data in the target voice signal is in an available state;
step (2013), if the voice time-frequency probability is less than or equal to the preset probability threshold value, determining that the corresponding frame of voice data in the target voice signal is in an unavailable state;
in the step (202), determining the data buffer evaluation value of the FIFO data buffer corresponding to each frame of voice data according to the determination result of the available state of each frame of voice data in the target voice signal specifically includes,
step (2021), determining the data storage state of all the frame voice data in the available state in the target voice signal in the FIFO data buffer area according to the available state judgment result;
step (2022), determining a data buffer evaluation value corresponding to each frame of voice data in the available state according to the data storage state of all the frames of voice data in the available state in the FIFO data buffer area;
in the step (203), determining whether to adjust the filtering process specifically includes, according to the data buffer evaluation value,
step (2031) of comparing the data buffer evaluation value with a preset evaluation threshold value, and determining whether to update the filtering process according to the result of the comparison process;
step (2032) of updating the filtering process if the data buffer evaluation value exceeds the preset evaluation threshold value;
step (2033), if the data buffer evaluation value does not exceed the preset evaluation threshold value, the filtering process is not updated;
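Steps (201) to (203) amount to a gating rule for filter updates. The sketch below is an assumption-laden illustration: the text does not specify the two thresholds, the FIFO buffer length, or how the data buffer evaluation value is computed from the buffer state, so the fill-ratio evaluation and all names used here are only one plausible choice.

```python
from collections import deque

class FilterUpdateGate:
    """Sketch of steps (201)-(203): decide per frame whether the
    dereverberation filter should be updated. Thresholds, buffer length,
    and the fill-ratio evaluation are illustrative assumptions."""
    def __init__(self, prob_threshold=0.5, eval_threshold=0.8, buffer_size=16):
        self.prob_threshold = prob_threshold
        self.eval_threshold = eval_threshold
        self.fifo = deque(maxlen=buffer_size)    # FIFO buffer of available frames

    def step(self, frame, speech_prob):
        # (201): a frame is available only if its voice time-frequency
        # probability exceeds the preset probability threshold
        if speech_prob > self.prob_threshold:
            self.fifo.append(frame)
        # (202): the data buffer evaluation value, here the buffer fill ratio
        evaluation = len(self.fifo) / self.fifo.maxlen
        # (203): update the filter only when the evaluation exceeds its threshold
        return evaluation > self.eval_threshold
```

Frames failing the probability test never enter the buffer, so sustained non-voice input keeps the evaluation value low and the filter frozen, which is the mechanism that reduces both wrong updates and the update rate.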
further, in the step (3), the suppressing reverberation component existing in the voice array data corresponding to the target voice signal according to the adjusted filtering process specifically includes,
step (301), obtaining the reverberation attribute of the target speech signal obtained by the adjusted filtering processing;
a step (302) of converting the target speech signal into the speech array data according to the reverberation attribute;
and (303) carrying out suppression and elimination processing on reverberation components of the voice array data.
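The text does not fix the structure of the dereverberation filter used in step (3). One common realization of late-reverberation suppression, shown below purely as an illustrative sketch, is per-frequency-bin delayed linear prediction (a simplified, single-pass relative of the weighted-prediction-error method): each STFT frame is predicted from earlier frames beyond a short delay, and the predicted reverberant tail is subtracted. Practical systems refine this with iterative variance weighting; the order and delay values here are assumptions.

```python
import numpy as np

def suppress_late_reverb(stft, order=8, delay=2):
    """Delayed linear prediction per frequency bin: predict each frame's
    late reverberant tail from earlier frames and subtract it.
    stft: complex array of shape (n_frames, n_bins). A crude sketch."""
    T, K = stft.shape
    out = stft.copy()
    for k in range(K):
        x = stft[:, k]
        # regressor: for each frame t, the `order` frames ending at t - delay
        A = np.stack([x[t - delay - order + 1 : t - delay + 1]
                      for t in range(delay + order - 1, T)])
        y = x[delay + order - 1 :]
        w, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares tail predictor
        out[delay + order - 1 :, k] = y - A @ w    # subtract the predicted tail
    return out
```

The delay skips the direct sound and early reflections so that the prediction mainly captures the late tail; the first few frames are left untouched because no sufficiently old context exists for them.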
The invention also provides a system for eliminating reverberation, which is characterized in that: the system for eliminating reverberation comprises a voice signal preprocessing module, a voice time-frequency probability calculation module, a filter module, a filter adjusting module and a reverberation suppression module; wherein,
the voice signal preprocessing module is used for preprocessing a target voice signal;
the voice time-frequency probability calculation module is used for calculating and acquiring the voice time-frequency probability corresponding to the preprocessed target voice signal;
the filter module is used for filtering the target voice signal;
the filter adjusting module is used for adjusting the filtering processing mode of the filter module according to the voice time-frequency probability;
the reverberation suppression module is used for suppressing reverberation components existing in the voice array data corresponding to the target voice signal through the adjusted filtering processing mode;
further, the voice signal preprocessing module comprises a late reverberation suppression sub-module; wherein,
the late reverberation suppression sub-module is used for performing late reverberation component suppression processing on the target voice signal so as to eliminate the late reverberation component in the target voice signal;
the voice time-frequency probability calculation module is used for performing voice time-frequency probability calculation processing on the target voice signal subjected to the late reverberation component suppression processing to obtain the voice time-frequency probability;
further, the filter adjusting module comprises a voice data available state judging sub-module, a voice data buffer evaluation value determining sub-module and a filter adjustment determining sub-module; wherein,
the voice data available state judgment submodule is used for judging the available state of each of a plurality of frames of voice data corresponding to the target voice signal according to the voice time-frequency probability;
the voice data buffer evaluation value determining submodule is used for determining a data buffer evaluation value of a FIFO data buffer area corresponding to each frame of voice data according to the available state judgment result of each frame of voice data in the target voice signal;
the filter adjustment determining submodule is used for determining whether to adjust the filtering processing mode of the filter module according to the data buffer evaluation value;
further, the voice data available state judgment submodule comprises a first comparison unit and an available state determination unit; wherein,
the first comparison unit is used for comparing the voice time-frequency probability with a preset probability threshold value;
the available state determining unit is used for judging the available state of each frame of voice data in the target voice signal according to the comparison processing result;
the voice data buffer evaluation value determining submodule comprises a voice data storage state determining unit and a data buffer evaluation value calculating unit; wherein,
the voice data storage state determining unit is used for determining the data storage state of all frame voice data in the target voice signal in the available state in the FIFO data buffer area according to the available state judgment result;
the data buffer evaluation value calculation unit is used for calculating a data buffer evaluation value corresponding to each frame of voice data in the available state according to the data storage state of all the frames of voice data in the available state in the FIFO data buffer area;
the filter adjustment determining submodule comprises a second comparison unit and a filtering processing mode updating unit; wherein,
the second comparison unit is used for comparing the data buffer evaluation value with a preset evaluation threshold value;
the filtering processing mode updating unit is used for determining whether to update the filtering processing mode of the filter module according to the comparison processing result;
further, the reverberation suppression module comprises a reverberation attribute acquisition sub-module, a voice array data conversion sub-module and a reverberation component processing sub-module; wherein,
the reverberation attribute acquisition sub-module is used for acquiring the reverberation attribute of the target voice signal obtained through the filtering processing;
the voice array data conversion sub-module is used for converting the target voice signal into the voice array data according to the reverberation attribute;
and the reverberation component processing submodule is used for carrying out suppression and elimination processing on reverberation components on the voice array data.
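The wiring of the four modules described above can be illustrated with stub classes. Everything below is a skeleton whose internals (the fixed probability, threshold, and gain) are hypothetical placeholders standing in for the real late-reverberation preprocessing, deep-learning probability model, FIFO-based evaluation, and filtering; only the data flow between modules reflects the system description.

```python
class SpeechPreprocessor:
    """Voice signal preprocessing module (late-reverberation suppression stub)."""
    def process(self, signal):
        return signal                       # placeholder for late-reverb suppression

class SpeechProbabilityModel:
    """Voice time-frequency probability calculation module (stub model)."""
    def probability(self, signal):
        return 0.9                          # placeholder deep-learning output

class FilterAdjuster:
    """Filter adjusting module: gates filter updates on the probability."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.update_filter = False
    def adjust(self, prob):
        self.update_filter = prob > self.threshold

class ReverbSuppressor:
    """Reverberation suppression module (placeholder scalar gain)."""
    def suppress(self, signal, update):
        return [0.8 * s for s in signal] if update else list(signal)

class DereverbSystem:
    """Illustrative wiring of the four modules; all internals are stubs."""
    def __init__(self):
        self.pre = SpeechPreprocessor()
        self.prob = SpeechProbabilityModel()
        self.adj = FilterAdjuster()
        self.sup = ReverbSuppressor()
    def run(self, signal):
        x = self.pre.process(signal)        # preprocessing module
        p = self.prob.probability(x)        # probability calculation module
        self.adj.adjust(p)                  # filter adjusting module
        return self.sup.suppress(x, self.adj.update_filter)  # suppression module
```

The point of the sketch is the one-directional data flow: the probability module's output only influences the suppression stage indirectly, through the filter adjusting module's update decision.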
Compared with the prior art, the method and the system for eliminating reverberation calculate the voice time-frequency probability of the voice signal by introducing voice time-frequency probability detection into the dereverberation process, and adaptively update and adjust the filter used in the dereverberation processing according to that probability. Calculating the voice time-frequency probability excludes most of the noise in the voice signal, so when the filter is updated and adjusted according to the probability, the chance of an erroneous filter update is effectively reduced. Meanwhile, because most of the noise is excluded, non-voice data does not participate in the subsequent dereverberation calculation, which greatly reduces the calculation frequency of the filter and the amount of computation needed for dereverberation, thereby allowing the computational load of dereverberation to be compressed on a large scale and improving the applicability of the dereverberation algorithm on different types of processors.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a method for eliminating reverberation according to the present invention.
Fig. 2 is a schematic structural diagram of a system for eliminating reverberation according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a method for eliminating reverberation according to an embodiment of the present invention. The method for eliminating reverberation comprises the following steps:
and (1) preprocessing a target voice signal, and acquiring a voice time-frequency probability corresponding to the preprocessed target voice signal.
Preferably, in the step (1), preprocessing the target speech signal, and acquiring the speech time-frequency probability corresponding to the preprocessed target speech signal specifically includes,
step (101), performing late reverberation suppression processing on the target speech signal to eliminate late reverberation components in the target speech signal;
step (102), performing voice time-frequency probability calculation processing on the target voice signal subjected to the late reverberation suppression processing to obtain the voice time-frequency probability;
preferably, the speech time-frequency probability calculation process is implemented by a deep learning model, and the construction process of the deep learning model comprises,
s1, mixing the clean voice data X and the noise data n to obtain noisy voice data Y, and decomposing each frame of clean voice signal of the clean voice data X and each frame of mixed voice signal of the noisy voice data Y into a frequency domain to respectively obtain corresponding clean voice frequency domain data X and noisy voice frequency domain data Y;
s2, for the clean voice frequency domain data X and the noisy voice frequency domain data Y, calculating a probability value p (k) = abs (Y (k))/abs (X (k))) corresponding to the noisy voice frequency domain data Y at each frequency point k relative to the clean voice frequency domain data X, where abs (X (k)) is the probability value of the clean voice frequency domain data X for each frequency point k, and abs (Y (k)) is the probability value of the noisy voice frequency domain data Y for each frequency point k;
and S3, constructing and obtaining the deep learning model according to all probability values of the noisy speech frequency domain data Y corresponding to the clean speech frequency domain data X at each frequency point k.
And (2) adjusting the filtering processing acted on the target voice signal according to the voice time-frequency probability.
Preferably, in the step (2), adjusting the filtering processing applied to the target speech signal according to the speech time-frequency probability specifically includes,
step (201), according to the voice time-frequency probability, judging the available state of each of a plurality of frames of voice data corresponding to the target voice signal;
step (202), determining a data buffer evaluation value of a FIFO data buffer area corresponding to each frame of voice data according to the available state judgment result of each frame of voice data in the target voice signal;
step (203) of determining whether to adjust the filtering process according to the data buffer evaluation value.
Preferably, in the step (201), the determining the available state of each of the frames of speech data corresponding to the target speech signal according to the speech time-frequency probability specifically includes,
step (2011), comparing the voice time-frequency probability with a preset probability threshold value, and judging the available state of each frame of voice data in the target voice signal according to the comparison result;
step (2012), if the voice time-frequency probability is greater than the preset probability threshold, determining that the corresponding frame of voice data in the target voice signal is in an available state;
step (2013), if the voice time-frequency probability is less than or equal to the preset probability threshold, determining that the corresponding frame of voice data in the target voice signal is in an unavailable state;
preferably, in the step (202), determining the data buffer evaluation value of the FIFO data buffer corresponding to each frame of voice data according to the available state judgment result of each frame of voice data in the target voice signal specifically includes,
step (2021), according to the available state judgment result, determining the data storage state of all the frame voice data in the available state in the target voice signal in the FIFO data buffer area;
step (2022), according to the data storage status of all the frames of voice data in the available status in the FIFO data buffer area, determining the data buffer evaluation value corresponding to each frame of voice data in the available status;
preferably, in the step (203), determining whether to adjust the filtering process specifically includes, based on the data buffer evaluation value,
step (2031) of comparing the data buffer evaluation value with a preset evaluation threshold value, and determining whether to update the filtering process according to the result of the comparison process;
step (2032), if the data buffer evaluation value exceeds the preset evaluation threshold value, the filtering process is updated;
step (2033), if the data buffer evaluation value does not exceed the preset evaluation threshold value, the filtering process is not updated.
And (3) according to the adjusted filtering processing, suppressing reverberation components existing in the voice array data corresponding to the target voice signal.
Preferably, in the step (3), suppressing the reverberation components existing in the voice array data corresponding to the target voice signal according to the adjusted filtering processing specifically includes, step (301), obtaining the reverberation attribute of the target speech signal obtained by the adjusted filtering processing;
step (302), according to the reverberation attribute, converting the target voice signal into the voice array data;
and (303) carrying out suppression and elimination processing on the reverberation component on the voice array data.
Fig. 2 is a schematic structural diagram of a system for eliminating reverberation according to the present invention. The system for eliminating reverberation comprises a voice signal preprocessing module, a voice time-frequency probability calculation module, a filter module, a filter adjusting module and a reverberation suppression module; wherein,
the voice signal preprocessing module is used for preprocessing a target voice signal;
the voice time-frequency probability calculation module is used for calculating and acquiring the voice time-frequency probability corresponding to the preprocessed target voice signal;
the filter module is used for filtering the target voice signal;
the filter adjusting module is used for adjusting the filtering processing mode of the filter module according to the voice time-frequency probability;
the reverberation suppression module is used for suppressing reverberation components existing in the voice array data corresponding to the target voice signal through the adjusted filtering processing mode.
Preferably, the speech signal pre-processing module comprises a late reverberation suppression sub-module;
preferably, the late reverberation suppression submodule is configured to perform late reverberation component suppression processing on the target speech signal to eliminate a late reverberation component in the target speech signal;
preferably, the speech time-frequency probability calculation module is configured to perform speech time-frequency probability calculation processing on the target speech signal subjected to the late reverberation component suppression processing to obtain the speech time-frequency probability;
preferably, the filter adjusting module comprises a voice data available state judging sub-module, a voice data buffer evaluation value determining sub-module and a filter adjusting determining sub-module;
preferably, the voice data available state judgment sub-module is configured to judge, according to the voice time-frequency probability, an available state of each of a plurality of frames of voice data corresponding to the target voice signal;
preferably, the voice data buffer evaluation value determining sub-module is configured to determine, according to the available state judgment result of each frame of voice data in the target voice signal, a data buffer evaluation value of a FIFO data buffer area corresponding to each frame of voice data;
preferably, the filter adjustment determining sub-module is configured to determine whether to adjust a filtering processing mode of the filter module according to the data buffer evaluation value;
preferably, the voice data available state judgment sub-module includes a first comparison unit and an available state determination unit,
the first comparison unit is used for comparing the voice time-frequency probability with a preset probability threshold value;
the available state determining unit is used for judging the available state of each frame of voice data in the target voice signal according to the comparison processing result;
preferably, the voice data buffer evaluation value determination sub-module includes a voice data storage state determination unit and a data buffer evaluation value calculation unit; wherein,
the voice data storage state determining unit is used for determining the data storage state of all frame voice data in the available state in the target voice signal in the FIFO data buffer area according to the available state judgment result;
the data buffer evaluation value calculation unit is used for calculating the data buffer evaluation value corresponding to each frame of voice data in the available state according to the data storage state of all the frames of voice data in the available state in the FIFO data buffer area;
preferably, the filter adjustment determination submodule includes a second comparison unit and a filter processing mode update unit; wherein,
the second comparison unit is used for comparing the data buffer evaluation value with a preset evaluation threshold value;
the filter processing mode updating unit is used for determining whether to update the filter processing mode of the filter module according to the comparison processing result;
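The threshold comparison and FIFO-buffer evaluation performed by the units above can be sketched as follows. This is an illustrative sketch only: the thresholds, the FIFO length, and the definition of the evaluation value as the fraction of usable frames in the buffer are all assumptions for the example, not values or formulas given in the patent.

```python
from collections import deque

# Hypothetical sketch of the available-state judgment and FIFO-buffer
# evaluation described above. Frames whose voice time-frequency probability
# exceeds a probability threshold are marked usable; the data buffer
# evaluation value is taken here as the fraction of usable frames currently
# in the FIFO, and the filter processing mode is updated only when that
# value exceeds the evaluation threshold. All constants are assumed.

PROB_THRESHOLD = 0.5   # preset probability threshold (assumed value)
EVAL_THRESHOLD = 0.75  # preset evaluation threshold (assumed value)
FIFO_LEN = 8           # FIFO data buffer length (assumed value)

fifo = deque(maxlen=FIFO_LEN)

def should_update_filter(frame_speech_prob: float) -> bool:
    """Return True when the filter's processing mode should be updated."""
    usable = frame_speech_prob > PROB_THRESHOLD   # available-state judgment
    fifo.append(1 if usable else 0)               # data storage state in FIFO
    evaluation = sum(fifo) / len(fifo)            # data buffer evaluation value
    return evaluation > EVAL_THRESHOLD
```

With this definition, a run of speech-dominated frames quickly drives the evaluation value toward 1 and permits filter updates, while isolated noisy frames keep it below the threshold.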
preferably, the reverberation suppression module comprises a reverberation attribute acquisition sub-module, a voice array data conversion sub-module and a reverberation component processing sub-module;
preferably, the reverberation attribute acquisition sub-module is configured to acquire the reverberation attribute of the target speech signal obtained through the filtering;
preferably, the voice array data conversion sub-module is configured to convert the target voice signal into the voice array data according to the reverberation attribute;
preferably, the reverberation component processing sub-module is configured to perform suppression and cancellation processing on the reverberation component on the voice array data.
It can be seen from the above embodiments that the method and system for eliminating reverberation introduce voice time-frequency probability detection into the dereverberation process: the voice time-frequency probability of the voice signal is calculated, and the filter used in the reverberation elimination processing is adaptively updated and adjusted according to that probability. Calculating the voice time-frequency probability excludes most of the noise in the voice signal, so when the filter is adjusted according to this probability, the likelihood of an erroneous filter update is effectively reduced. Moreover, because most of the noise is excluded, non-voice data does not participate in the subsequent dereverberation calculation, which greatly reduces the update frequency of the filter and the computational load of reverberation elimination. This allows the computational cost of dereverberation to be compressed on a large scale and improves the applicability of the dereverberation algorithm on different types of processors.
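The overall flow described in the embodiments can be outlined as follows. Every function body here is a simplified placeholder standing in for the patent's modules (the late reverberation suppression, the deep-learning probability estimate, and the adaptive filter update rule are all assumptions for illustration), so the sketch shows only the gating structure, not the actual implementation.

```python
import numpy as np

# Illustrative skeleton of the dereverberation pipeline: late reverberation
# suppression, then voice time-frequency probability estimation, then a
# filter whose adaptive update is gated by that probability. Only
# speech-dominated frames drive the filter update, which is what reduces
# both mis-updates and computation.

def late_reverb_suppress(spec):
    return spec  # placeholder for the late reverberation component suppression

def speech_tf_probability(spec):
    # placeholder for the deep-learning-based probability estimate
    return np.clip(np.abs(spec) / (np.abs(spec).max() + 1e-12), 0.0, 1.0)

def dereverberate(frames, prob_threshold=0.5):
    taps = None  # adaptive filter state
    out = []
    for spec in frames:
        spec = late_reverb_suppress(spec)
        prob = speech_tf_probability(spec)
        if prob.mean() > prob_threshold:
            # speech-dominated frame: update the adaptive filter state
            taps = spec if taps is None else 0.9 * taps + 0.1 * spec
        # apply the (possibly unchanged) filter to suppress reverberation
        out.append(spec if taps is None else spec - 0.1 * taps)
    return out
```

The key point the sketch captures is that noise-dominated frames skip the update branch entirely, so the filter adaptation cost scales with the amount of detected speech rather than with the total signal length.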
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (4)

1. A method of cancelling reverberation, comprising the steps of:
step (1), preprocessing a target voice signal, and obtaining the voice time-frequency probability corresponding to the preprocessed target voice signal;
the step (1) specifically comprises:
a step (101) of performing late reverberation suppression processing on the target speech signal to eliminate late reverberation components in the target speech signal;
step (102), performing speech time-frequency probability calculation processing on the target speech signal after the late reverberation suppression processing to obtain the speech time-frequency probability, wherein the speech time-frequency probability calculation processing is realized by a deep learning model, and the construction process of the deep learning model comprises,
s1, mixing clean voice data X and noise data n to obtain noisy voice data Y, and decomposing each frame of clean voice signal of the clean voice data X and each frame of mixed voice signal of the noisy voice data Y into a frequency domain to respectively obtain corresponding clean voice frequency domain data X and noisy voice frequency domain data Y;
s2, for the clean voice frequency domain data X and the noisy voice frequency domain data Y, calculating a probability value p (k) = abs (Y (k))/abs (X (k)) corresponding to the noisy voice frequency domain data Y at each frequency point k relative to the clean voice frequency domain data X, where abs (X (k)) is a probability value of the clean voice frequency domain data X for each frequency point k, and abs (Y (k)) is a probability value of the noisy voice frequency domain data Y for each frequency point k;
s3, constructing and obtaining the deep learning model according to all probability values of the noisy speech frequency domain data Y corresponding to the clean speech frequency domain data X at each frequency point k;
step (2), according to the voice time-frequency probability, adjusting the filtering processing acted on the target voice signal;
the step (2) specifically comprises:
step (201), according to the voice time-frequency probability, judging the available state of each of a plurality of frames of voice data corresponding to the target voice signal;
the step (201) specifically comprises:
step (2011), comparing the voice time-frequency probability with a preset probability threshold value, and judging the available state of each frame of voice data in the target voice signal according to the comparison result;
step (2012), if the voice time-frequency probability is greater than the preset probability threshold value, judging that the corresponding frame of voice data in the target voice signal is in an available state;
step (2013), if the voice time-frequency probability is less than or equal to the preset probability threshold value, judging that the corresponding frame of voice data in the target voice signal is in an unavailable state;
step (202), determining a data buffer evaluation value of a FIFO data buffer area corresponding to each frame of voice data according to the available state judgment result of each frame of voice data in the target voice signal;
the step (202) specifically comprises:
step (2021), determining the data storage state of all the frame voice data in the available state in the target voice signal in the FIFO data buffer area according to the available state judgment result;
step (2022), determining a data buffer evaluation value corresponding to each frame of voice data in the available state according to the data storage state of all the frames of voice data in the available state in the FIFO data buffer area;
a step (203) of determining whether to adjust the filtering process according to the data buffer evaluation value;
the step (203) specifically includes:
step (2031) of comparing the data buffer evaluation value with a preset evaluation threshold value, and determining whether to update the filtering process according to the result of the comparison process;
step (2032) of updating the filtering process if the data buffer evaluation value exceeds the preset evaluation threshold value;
step (2033), if the data buffer evaluation value does not exceed the preset evaluation threshold value, the filtering process is not updated;
and (3) according to the adjusted filtering processing, suppressing reverberation components existing in the voice array data corresponding to the target voice signal.
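As an illustration only (not part of the claims), the S1–S3 target construction in claim 1 can be sketched with a per-frame FFT. The frame length, the rectangular framing, and the small stabilizing constant in the denominator are assumptions added to make the example runnable; the per-bin ratio p(k) = abs(Y(k))/abs(X(k)) follows the claim as written.

```python
import numpy as np

# Sketch of the S1-S3 training-target construction: mix clean speech with
# noise, transform each frame to the frequency domain, and compute the
# per-frequency-point magnitude ratio used as the deep learning target.

def frame_ratio_targets(x, n, frame_len=256):
    y = x + n  # S1: mix clean voice data x with noise data n -> noisy data y
    targets = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        X = np.fft.rfft(x[start:start + frame_len])  # clean frequency-domain data
        Y = np.fft.rfft(y[start:start + frame_len])  # noisy frequency-domain data
        # S2: ratio of noisy to clean magnitude at each frequency point k
        # (a tiny constant is assumed in the denominator to avoid division by zero)
        p = np.abs(Y) / (np.abs(X) + 1e-12)
        targets.append(p)
    # S3: the collected per-bin values across all frames would then serve
    # as the training targets for the deep learning model.
    return np.stack(targets)
```

For example, with zero noise the ratio is close to 1 wherever the clean spectrum has energy, which matches the intuition that the target encodes how strongly each time-frequency point is dominated by the clean speech.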
2. The method of cancelling reverberation as set forth in claim 1, wherein:
in the step (3), the suppressing reverberation component existing in the speech array data corresponding to the target speech signal according to the adjusted filtering process specifically includes,
step (301), obtaining the reverberation attribute of the target speech signal obtained by the adjusted filtering processing;
a step (302) of converting the target speech signal into the speech array data according to the reverberation attribute;
and (303) carrying out suppression and elimination processing on reverberation components of the voice array data.
3. A system for cancelling reverberation, characterized by:
the system for eliminating reverberation comprises a voice signal preprocessing module, a voice time-frequency probability calculation module, a filter adjusting module and a reverberation suppression module; wherein,
the voice signal preprocessing module is used for preprocessing a target voice signal;
the voice signal preprocessing module comprises a late reverberation suppression submodule; wherein,
the late reverberation suppression submodule is used for performing late reverberation component suppression processing on the target voice signal so as to eliminate the late reverberation component in the target voice signal;
the voice time-frequency probability calculation module is used for performing voice time-frequency probability calculation processing on the target voice signal subjected to the late reverberation component suppression processing to obtain the voice time-frequency probability;
wherein, the speech time-frequency probability calculation processing is realized by a deep learning model, the construction process of the deep learning model comprises the following steps,
s1, mixing clean voice data X and noise data n to obtain noisy voice data Y, and decomposing each frame of clean voice signal of the clean voice data X and each frame of mixed voice signal of the noisy voice data Y into a frequency domain to respectively obtain corresponding clean voice frequency domain data X and noisy voice frequency domain data Y;
s2, for the clean voice frequency domain data X and the noisy voice frequency domain data Y, calculating a probability value p (k) = abs (Y (k))/abs (X (k)) corresponding to the noisy voice frequency domain data Y at each frequency point k relative to the clean voice frequency domain data X, where abs (X (k)) is a probability value of the clean voice frequency domain data X for each frequency point k, and abs (Y (k)) is a probability value of the noisy voice frequency domain data Y for each frequency point k;
s3, constructing and obtaining the deep learning model according to all probability values of the noisy speech frequency domain data Y corresponding to the clean speech frequency domain data X at each frequency point k;
the voice time-frequency probability calculation module is used for calculating and acquiring the voice time-frequency probability corresponding to the preprocessed target voice signal;
the filter module is used for filtering the target voice signal;
the filter adjusting module is used for adjusting the filtering processing mode of the filter module according to the voice time-frequency probability;
the filter adjusting module comprises a voice data available state judging sub-module, a voice data buffer evaluation value determining sub-module and a filter adjusting determining sub-module; wherein,
the voice data available state judgment submodule is used for judging the available state of each of a plurality of frames of voice data corresponding to the target voice signal according to the voice time-frequency probability;
the voice data buffer evaluation value determining submodule is used for determining a data buffer evaluation value of a FIFO data buffer area corresponding to each frame of voice data according to the available state judgment result of each frame of voice data in the target voice signal;
the filter adjustment determining submodule is used for determining whether to adjust the filtering processing mode of the filter module according to the data buffer evaluation value;
the voice data available state judgment submodule comprises a first comparison unit and an available state determination unit; wherein,
the first comparison unit is used for comparing the voice time-frequency probability with a preset probability threshold value;
the available state determining unit is used for judging the available state of each frame of voice data in the target voice signal according to the comparison processing result; the method specifically comprises the following steps:
if the voice time-frequency probability is larger than the preset probability threshold value, judging that the corresponding frame voice data in the target voice signal is in an available state;
if the voice time-frequency probability is smaller than or equal to the preset probability threshold value, judging that the corresponding frame voice data in the target voice signal is in an unavailable state;
the voice data buffer evaluation value determining submodule comprises a voice data storage state determining unit and a data buffer evaluation value calculating unit; wherein,
the voice data storage state determining unit is used for determining the data storage state of all frame voice data in the target voice signal in the available state in the FIFO data buffer area according to the available state judgment result;
the data buffer evaluation value calculation unit is used for calculating a data buffer evaluation value corresponding to each frame of voice data in the available state according to the data storage state of all the frames of voice data in the available state in the FIFO data buffer area;
the filter adjustment determining submodule comprises a second comparison unit and a filtering processing mode updating unit; wherein,
the second comparison unit is used for comparing the data buffer evaluation value with a preset evaluation threshold value;
the filtering processing mode updating unit is used for determining whether to update the filtering processing mode of the filter module according to the comparison processing result;
the method specifically comprises the following steps:
if the data buffer evaluation value exceeds the preset evaluation threshold value, updating the filtering process;
if the data buffer evaluation value does not exceed the preset evaluation threshold value, not updating the filtering process;
and the reverberation suppression module is used for suppressing reverberation components existing in the voice array data corresponding to the target voice signal through the adjusted filtering processing mode.
4. The system for reverberation canceling of claim 3, wherein:
the reverberation suppression module comprises a reverberation attribute acquisition sub-module, a voice array data conversion sub-module and a reverberation component processing sub-module; wherein,
the reverberation attribute acquisition sub-module is used for acquiring the reverberation attribute of the target voice signal obtained through the filtering processing;
the voice array data conversion sub-module is used for converting the target voice signal into the voice array data according to the reverberation attribute;
and the reverberation component processing submodule is used for carrying out suppression and elimination processing on reverberation components on the voice array data.
CN201910810308.5A 2019-08-29 2019-08-29 Method and system for eliminating reverberation Active CN110718230B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910810308.5A CN110718230B (en) 2019-08-29 2019-08-29 Method and system for eliminating reverberation


Publications (2)

Publication Number Publication Date
CN110718230A CN110718230A (en) 2020-01-21
CN110718230B true CN110718230B (en) 2021-12-17

Family

ID=69209595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910810308.5A Active CN110718230B (en) 2019-08-29 2019-08-29 Method and system for eliminating reverberation

Country Status (1)

Country Link
CN (1) CN110718230B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111537955A (en) * 2020-04-02 2020-08-14 云知声智能科技股份有限公司 Multi-sound-source positioning method and device based on spherical microphone array
CN111551898A (en) * 2020-04-07 2020-08-18 云知声智能科技股份有限公司 Anti-reverberation sound source positioning method

Citations (8)

Publication number Priority date Publication date Assignee Title
CN102387273A (en) * 2011-07-08 2012-03-21 歌尔声学股份有限公司 Method and device for inhibiting residual echoes
CN102938254A (en) * 2012-10-24 2013-02-20 中国科学技术大学 Voice signal enhancement system and method
JP2013037174A (en) * 2011-08-08 2013-02-21 Nippon Telegr & Teleph Corp <Ntt> Noise/reverberation removal device, method thereof, and program
WO2014171920A1 (en) * 2013-04-15 2014-10-23 Nuance Communications, Inc. System and method for addressing acoustic signal reverberation
CN106448692A (en) * 2016-07-04 2017-02-22 Tcl集团股份有限公司 RETF reverberation elimination method and system optimized by use of voice existence probability
CN108986832A (en) * 2018-07-12 2018-12-11 北京大学深圳研究生院 Ears speech dereverberation method and device based on voice probability of occurrence and consistency
CN110088834A (en) * 2016-12-23 2019-08-02 辛纳普蒂克斯公司 Multiple-input and multiple-output (MIMO) Audio Signal Processing for speech dereverbcration
CN110100457A (en) * 2016-12-23 2019-08-06 辛纳普蒂克斯公司 The online dereverberation algorithm of the weight estimation error of changing environment when based on noise


Non-Patent Citations (1)

Title
"An Improved Multi-Stage Linear Prediction Late Reverberation Suppression Algorithm"; Zhao Hong, Li Shuangtian; Journal of Signal Processing (《信号处理》); 30 June 2014; Vol. 30, No. 6; full text *


Similar Documents

Publication Publication Date Title
WO2022012367A1 (en) Noise suppression method and apparatus for quickly calculating speech presence probability, and storage medium and terminal
CN107393550A (en) Method of speech processing and device
US8750491B2 (en) Mitigation of echo in voice communication using echo detection and adaptive non-linear processor
Soon et al. Speech enhancement using 2-D Fourier transform
EP2346032B1 (en) Noise suppressor and voice decoder
CN111445919B (en) Speech enhancement method, system, electronic device, and medium incorporating AI model
US8296135B2 (en) Noise cancellation system and method
CN110718230B (en) Method and system for eliminating reverberation
CN105280193B (en) Priori signal-to-noise ratio estimation method based on MMSE error criterion
CN110211602B (en) Intelligent voice enhanced communication method and device
CN105489226A (en) Wiener filtering speech enhancement method for multi-taper spectrum estimation of pickup
CN107360497B (en) Calculation method and device for estimating reverberation component
CN109215672B (en) Method, device and equipment for processing sound information
CN107045874B (en) Non-linear voice enhancement method based on correlation
JP2007293059A (en) Signal processing apparatus and its method
CN112289337B (en) Method and device for filtering residual noise after machine learning voice enhancement
WO2024017110A1 (en) Voice noise reduction method, model training method, apparatus, device, medium, and product
CN107346658B (en) Reverberation suppression method and device
CN107393553B (en) Auditory feature extraction method for voice activity detection
CN112165558B (en) Method and device for detecting double-talk state, storage medium and terminal equipment
CN111048096B (en) Voice signal processing method and device and terminal
CN115497492A (en) Real-time voice enhancement method based on full convolution neural network
CN114242095A (en) Neural network noise reduction system and method based on OMLSA framework adopting harmonic structure
CN113593599A (en) Method for removing noise signal in voice signal
Prasad et al. Two microphone technique to improve the speech intelligibility under noisy environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant