WO2019227589A1 - Speech enhancement method and apparatus, computer device, and storage medium - Google Patents


Info

Publication number
WO2019227589A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal component
signal
speech
correlation coefficient
processed
Application number
PCT/CN2018/094410
Other languages
French (fr)
Chinese (zh)
Inventor
涂宏 (Tu Hong)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L21/0272 Voice signal separating
    • G10L21/0308 Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

Definitions

  • The present application relates to the technical field of speech signal processing and, in particular, to a speech enhancement method and apparatus, a computer device, and a storage medium.
  • Embodiments of the present application provide a speech enhancement method and apparatus, a computer device, and a storage medium.
  • A speech enhancement method includes: converting original voice information to obtain a digital voice signal; decomposing the digital voice signal by using an EEMD algorithm to obtain first signal components; performing a correlation calculation on the digital voice signal and each first signal component by using a correlation calculation formula to obtain a first correlation coefficient; selecting, as second signal components, the first signal components whose first correlation coefficient is greater than a preset threshold; and performing integration processing on the second signal components to obtain target voice information.
  • A speech enhancement device includes:
  • a digital voice signal acquisition module, configured to convert original voice information to obtain a digital voice signal;
  • a first signal component acquisition module, configured to decompose the digital voice signal by using an EEMD algorithm to obtain first signal components;
  • a first correlation coefficient acquisition module, configured to perform a correlation calculation on the digital voice signal and each first signal component by using a correlation calculation formula to obtain a first correlation coefficient;
  • a second signal component acquisition module, configured to select, as second signal components, the first signal components whose first correlation coefficient is greater than a preset threshold; and
  • a target voice information acquisition module, configured to perform integration processing on the second signal components to obtain target voice information.
  • A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • When the processor executes the computer-readable instructions, the steps of the above speech enhancement method are implemented.
  • One or more non-volatile readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above speech enhancement method.
  • FIG. 1 is an application environment diagram of a speech enhancement method according to an embodiment of the present application.
  • FIG. 2 is a flowchart of a speech enhancement method according to an embodiment of the present application.
  • FIG. 3 is a specific flowchart of step S20 in FIG. 2.
  • FIG. 4 is a specific flowchart of step S22 in FIG. 3.
  • FIG. 5 is another flowchart of a speech enhancement method according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a speech enhancement device according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a computer device according to an embodiment of the present application.
  • The speech enhancement method provided in this application can be applied in the application environment shown in FIG. 1, in which a computer device communicates with a server through a network.
  • The computer device may be, but is not limited to, a personal computer, laptop, smartphone, tablet, or portable wearable device.
  • The server can be implemented as a stand-alone server.
  • The speech enhancement method can be applied to computer equipment deployed by financial institutions such as banks, securities firms, and insurance companies, or by other institutions, to perform speech enhancement on voice data before voiceprint recognition and thereby improve recognition accuracy.
  • In an embodiment, taking the application of the method to the server in FIG. 1 as an example, the speech enhancement method includes the following steps:
  • S10 Convert the original voice information to obtain a digital voice signal.
  • The original voice information is the speaker's voice collected by the recording module (such as a microphone) of a front-end device.
  • The original voice information may be in wav, mp3, or another format.
  • A digital voice signal is the discrete digital signal obtained by converting the original voice information. Because a computer cannot process the original voice information directly and can only process binary data, the original voice information must first be converted into a digital voice signal.
  • The server receives the original voice information sent by the front-end device and reads it with the audio-file reading function of a Python module to obtain the digital voice signal.
  • The audio-file reading function may be wave.open(file, 'rb'), where file is the original voice information and 'rb' specifies a read-only (binary) file operation.
  • This function reads the original voice information, and the resulting one-dimensional array of audio samples is the digital voice signal.
  • Python is an object-oriented, interpreted programming language, and a Python module packages a large number of ready-made functions.
  • Reading the original voice information directly with the audio-file reading function of a Python module to obtain the digital voice signal is simple to implement.
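As a rough illustration of step S10, the Python sketch below reads a WAV file with the standard-library wave module and turns the frames into a NumPy array. It assumes 16-bit PCM samples (wave handles only WAV, so mp3 input would first need a separate decoder), and the function name read_digital_voice_signal is hypothetical.

```python
import wave

import numpy as np


def read_digital_voice_signal(source):
    """Read 16-bit PCM WAV data into a 1-D float array (the digital voice signal).

    `source` may be a file path or a binary file-like object; wave.open
    accepts both in read mode ('rb').
    """
    with wave.open(source, "rb") as wf:
        n_channels = wf.getnchannels()
        sample_width = wf.getsampwidth()
        frames = wf.readframes(wf.getnframes())
    if sample_width != 2:
        # Assumption of this sketch: only 16-bit samples are handled.
        raise ValueError("only 16-bit PCM WAV is supported in this sketch")
    # WAV stores PCM samples as little-endian signed integers.
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float64)
    if n_channels > 1:
        # Frames are interleaved per channel: average the channels down to mono.
        samples = samples.reshape(-1, n_channels).mean(axis=1)
    return samples
```

The function returns the one-dimensional array described above; later steps operate on that array.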
  • S20 Decompose the digital voice signal by using the EEMD algorithm to obtain a first signal component.
  • The first signal components are IMF (Intrinsic Mode Function) components obtained by decomposing the digital voice signal with the EEMD algorithm.
  • The EEMD (Ensemble Empirical Mode Decomposition) algorithm is a noise-assisted data analysis algorithm that effectively resolves mode mixing (modal aliasing), so that the decomposition results (the first signal components) clearly reflect the oscillations of the digital voice signal at different time scales or frequencies.
  • Mode mixing refers to the phenomenon in which components of different modes cannot be separated according to their time scales, so that different modes appear within a single mode component.
  • The server decomposes the digital voice signal with the EEMD algorithm into N first signal components (N is a positive integer), each of which represents the oscillation of the digital voice signal at a different time scale or frequency.
  • S30 Perform a correlation calculation on the digital voice signal and the first signal component by using a correlation calculation formula to obtain a first correlation coefficient.
  • The first correlation coefficient is the result of performing the correlation calculation on the digital voice signal and a first signal component.
  • The first correlation coefficient reflects the degree of correlation between the digital voice signal and the first signal component, and hence the extent to which the first signal component carries effective information (voice information) of the digital voice signal.
  • The correlation calculation formula is $r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$, where x is the digital voice signal, y is the first signal component, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the first correlation coefficient.
  • Cov(x, y) is calculated as $\mathrm{Cov}(x, y) = \frac{1}{n}\sum_{j=1}^{n}\bigl(x_j - E(x)\bigr)\bigl(y_j - E(y)\bigr)$, where:
  • E(x) represents the expectation of the digital voice signal;
  • E(y) represents the expectation of the first signal component;
  • n represents the number of samples of the first signal component;
  • x_j represents the j-th sample of the digital voice signal on the time scale;
  • y_j represents the j-th sample of the first signal component on the same time scale.
  • The first correlation coefficient may be a real number between 0 and 1. The closer it is to 1, the stronger the correlation between the digital voice signal and the first signal component; conversely, the closer it is to 0, the weaker that correlation.
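A minimal NumPy sketch of the step S30 calculation, computing r for every first signal component. It assumes population moments (dividing by n); the same factor appears in the covariance and both variances, so it cancels in r. The function names are hypothetical. In general this coefficient ranges from -1 to 1.

```python
import numpy as np


def correlation_coefficient(x, y):
    """r = Cov(x, y) / sqrt(Var[x] * Var[y]) with population (1/n) moments.

    A constant component (zero variance) yields nan.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / np.sqrt(x.var() * y.var())


def first_correlation_coefficients(signal, components):
    """Correlate the digital voice signal with each first signal component (IMF)."""
    return np.array([correlation_coefficient(signal, c) for c in components])
```

For a signal made of a strong low-frequency tone plus a weak high-frequency tone, the strong tone's component gets a coefficient near 1 and the weak one a coefficient near 0.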
  • S40 Select a first signal component whose first correlation coefficient is greater than a preset threshold as the second signal component.
  • The preset threshold is a threshold defined in advance for screening the first signal components.
  • The second signal components are the signal components obtained by screening the first signal components with the preset threshold.
  • The preset threshold is also a real number between 0 and 1. If the first correlation coefficient is greater than the preset threshold, the first signal component is strongly correlated with the digital voice signal and carries a large amount of its effective information. If the first correlation coefficient is not greater than the preset threshold, the first signal component is weakly correlated with the digital voice signal, carries little of its effective information, and may be treated as noise by default.
  • Screening the first signal components in this way retains those strongly correlated with the digital voice signal as second signal components, which reduces noise interference and further improves the accuracy of the voice signal.
  • This screening method is simple to implement and improves the efficiency of speech enhancement processing.
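The screening in step S40 is a single comparison per component. The sketch below shows it with NumPy; the function name is hypothetical, and because the text does not give a value for the preset threshold, the 0.5 used in the example is purely illustrative.

```python
import numpy as np


def select_second_components(components, coefficients, threshold):
    """Keep components whose correlation coefficient exceeds the preset threshold.

    Components at or below the threshold are treated as noise and dropped.
    Returns the kept rows and the boolean mask that selected them.
    """
    components = np.asarray(components, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    keep = coefficients > threshold  # strictly "greater than", as in step S40
    return components[keep], keep
```

For example, with coefficients 0.9, 0.2 and 0.5 and a threshold of 0.5, only the first component is kept, because a coefficient equal to the threshold does not pass.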
  • S50 Perform integrated processing on the second signal component to obtain target voice information.
  • The target voice information is the relatively pure voice information ultimately obtained from the original voice information.
  • Integration processing is the processing that restores signal components to voice information.
  • The server performs integration processing on the second signal components with the formula $Z = \frac{1}{N}\sum_{n=1}^{N} S_n^{2}$ (N is a positive integer) to obtain the target voice information, where S_n represents the n-th second signal component, N represents the total number of second signal components, and Z represents the target voice information. That is, during integration processing the server first squares each second signal component and then averages the results to obtain the target voice information.
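A short NumPy sketch of step S50 that follows the square-then-average description literally, evaluated sample by sample; the function name is hypothetical.

```python
import numpy as np


def integrate_components(second_components):
    """Z = (1/N) * sum_n S_n**2, computed per time sample.

    second_components: array of shape (N, n_samples), one row per S_n.
    """
    s = np.asarray(second_components, dtype=float)
    if s.ndim != 2 or s.shape[0] == 0:
        raise ValueError("expected a non-empty 2-D array of components")
    return np.mean(s ** 2, axis=0)
```

Squaring discards the sign of each sample. Since EMD components add back up to the decomposed signal, a plain sum of the selected components is the more common reconstruction in EMD-based denoising; this sketch deliberately keeps the square-then-average form described in the text.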
  • In this embodiment, the audio-file reading function of a Python module reads the original voice information directly to obtain the digital voice signal, which keeps the acquisition process simple and improves the efficiency of speech enhancement.
  • The EEMD algorithm decomposes the digital voice signal into first signal components, the correlation calculation formula yields a first correlation coefficient between the digital voice signal and each first signal component, and the first signal components whose first correlation coefficient is greater than the preset threshold are selected as second signal components. Keeping the components most strongly correlated with the digital voice signal reduces noise interference and achieves speech enhancement.
  • The second signal components are integrated to obtain target voice information with higher accuracy.
  • The speech enhancement method is simple to implement, improves the processing efficiency of speech enhancement, and ensures that the acquired target voice information is highly accurate.
  • In an embodiment, step S20 (decomposing the digital voice signal by using the EEMD algorithm to obtain first signal components) specifically includes the following steps:
  • S21 Add different normally distributed white noise sequences to the digital voice signal to obtain to-be-processed voice signals.
  • A to-be-processed voice signal is the digital voice signal with one of the different normally distributed white noise sequences added.
  • A normally distributed white noise sequence is a Gaussian white noise sequence.
  • Gaussian white noise is noise whose instantaneous values follow a Gaussian distribution and whose power spectral density is uniformly distributed (flat) across frequency.
  • Here, the distribution of the instantaneous values is described by their probability density function, and the Gaussian distribution is the normal distribution.
  • Obtaining the to-be-processed voice signals in this way spreads the white noise evenly over the time-frequency space of the entire digital voice signal. When the digital voice signal is decomposed, signal regions at different time scales are then automatically mapped onto the appropriate time scales associated with the white noise, which effectively resolves mode mixing; and because Gaussian white noise has zero mean, the added noise cancels out when the results are averaged.
  • S22 Perform EMD decomposition on each to-be-processed voice signal to obtain the intermediate signal components corresponding to it.
  • The intermediate signal components are the IMF components obtained by performing EMD decomposition on each to-be-processed voice signal.
  • The EMD (Empirical Mode Decomposition) method decomposes a signal based on the local time-scale characteristics of the signal. Specifically, EMD decomposition is performed on each to-be-processed voice signal to obtain its intermediate signal components; because white noise has been added, the mode mixing that easily occurs during decomposition is effectively avoided, which makes the EMD decomposition more accurate and further improves the accuracy of speech enhancement.
  • S23 Average the intermediate signal components to obtain the first signal components.
  • The server averages the intermediate signal components corresponding to the to-be-processed voice signals to obtain the first signal components.
  • Specifically, the server uses the mean formula $M_j(t) = \frac{1}{N}\sum_{i=1}^{N} M_{i,j}(t)$ to calculate the first signal components from the intermediate signal components, where M_j is the j-th first signal component, M_{i,j} is the j-th intermediate signal component with subscript i identifying the intermediate signal component being averaged, N is the number of intermediate signal components averaged, and t is the time scale.
  • Adding different normally distributed white noise sequences to the digital voice signal to obtain the to-be-processed voice signals spreads the white noise evenly over the time-frequency space of the entire digital voice signal, which helps resolve mode mixing; and because Gaussian white noise has zero mean, the real digital voice signal is preserved, improving the accuracy of speech enhancement.
  • EMD decomposition is then performed on the to-be-processed voice signals to obtain their intermediate signal components. Because different normally distributed white noise sequences have been added to the digital voice signal, the shortcoming of EMD decomposition (mode mixing) is overcome and the accuracy of the decomposition is improved. Finally, the intermediate signal components are averaged to obtain the first signal components; the calculation is simple and improves the processing efficiency of the voice information.
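Steps S21 to S23 can be sketched as an ensemble loop. In this NumPy sketch, decompose is a hypothetical placeholder for any EMD routine that returns IMF rows; the noise amplitude (noise_ratio times the signal's standard deviation), the number of trials, and the fixed number of kept IMFs are assumptions, not values from the text.

```python
import numpy as np


def eemd(signal, decompose, n_imfs, n_trials=100, noise_ratio=0.2, seed=0):
    """Ensemble EMD: average the IMFs of many noise-added copies of the signal.

    decompose: placeholder callable mapping a 1-D array to a sequence of IMF
    rows (for example an EMD implementation).
    """
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    noise_std = noise_ratio * signal.std()
    total = np.zeros((n_imfs, signal.size))
    for _ in range(n_trials):
        # S21: add a fresh zero-mean Gaussian white-noise sequence.
        to_be_processed = signal + rng.normal(0.0, noise_std, signal.size)
        # S22: decompose the to-be-processed signal into intermediate components.
        imfs = np.asarray(decompose(to_be_processed), dtype=float)
        # Trials may yield different IMF counts; keep at most n_imfs rows.
        k = min(n_imfs, imfs.shape[0])
        total[:k] += imfs[:k]
    # S23: average over trials; the zero-mean noise tends to cancel out.
    return total / n_trials
```

With a trivial decomposer that returns the input as its only component, the averaged result converges back to the original signal as the number of trials grows, which illustrates the zero-mean cancellation argument.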
  • In an embodiment, step S22 (performing EMD decomposition on the to-be-processed voice signal to obtain the corresponding intermediate signal components) specifically includes the following steps:
  • S221 Obtain the local extreme points of the to-be-processed voice signal, the local extreme points including maximum points and minimum points.
  • The to-be-processed voice signal contains multiple local extreme points; a local extreme point is an extreme point of the signal within some time range of the entire time domain.
  • The local extreme points include local maximum points and local minimum points.
  • Specifically, the function formed by the to-be-processed voice signal within each time range is differentiated, and the points at which the derivative equals 0 are the local extreme points.
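For a sampled signal the derivative rarely equals 0 exactly, so a common discrete equivalent of step S221 looks for sign changes of the first difference. The NumPy sketch below is one such approach (not necessarily the one used in the application); the function name is hypothetical.

```python
import numpy as np


def local_extrema(s):
    """Indices of local maxima and minima of a sampled signal.

    The first difference stands in for the derivative: a maximum is where it
    changes from positive to non-positive, a minimum where it changes from
    negative to non-negative.
    """
    d = np.diff(np.asarray(s, dtype=float))
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima
```

For the samples [0, 2, 1, 3, 0], this returns maxima at indices 1 and 3 and a minimum at index 2.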
  • S222 Construct an upper envelope from the local maximum points and a lower envelope from the local minimum points.
  • An envelope is the curve obtained by connecting the peak points of a high-frequency amplitude-modulated (AM) signal; it follows the low-frequency modulating signal.
  • A high-frequency AM signal is a signal whose amplitude varies with the low-frequency modulating signal.
  • The low-frequency modulating signal is the low-frequency signal converted from the original information.
  • The upper envelope is a smooth curve obtained by fitting all the maximum points with a spline function.
  • The lower envelope is a smooth curve obtained by fitting all the minimum points with a spline function.
  • A spline function usually refers to a piecewise-defined polynomial parametric curve; here it is used to fit all the maximum points or all the minimum points.
  • Specifically, the upper envelope can be obtained by fitting all the maximum points with Matlab's built-in spline function, and the lower envelope by fitting all the minimum points with the same function. Drawing the envelopes makes the time-domain curve of the to-be-processed voice signal smoother and clearer.
  • Matlab is application software for numerical computation in mathematics and engineering.
  • S223 Obtain the mean of the upper and lower envelopes.
  • Specifically, the mean is calculated with the formula $P(t) = \frac{s_1(t) + s_2(t)}{2}$, where P is the mean, s_1(t) is the upper envelope as a function of time t, and s_2(t) is the lower envelope as a function of time t.
  • Obtaining this mean from the upper and lower envelopes provides the basis for the subsequent screening of the initial signal components.
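Steps S222 and S223 can be sketched in Python with SciPy's CubicSpline standing in for Matlab's spline (both use not-a-knot end conditions by default). Using the two end samples as extra knots for both envelopes is an assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def envelopes_and_mean(s):
    """Upper envelope s1(t), lower envelope s2(t), and their mean P(t).

    Assumption: both end samples are added as knots to each envelope so the
    splines span the whole signal.
    """
    s = np.asarray(s, dtype=float)
    t = np.arange(s.size, dtype=float)
    d = np.diff(s)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    ends = np.array([0, s.size - 1])
    up = np.union1d(ends, maxima)  # sorted, unique knot indices
    lo = np.union1d(ends, minima)
    upper = CubicSpline(t[up], s[up])(t)  # s1(t)
    lower = CubicSpline(t[lo], s[lo])(t)  # s2(t)
    return upper, lower, (upper + lower) / 2.0  # P(t) = (s1 + s2) / 2
```

For a sine wave with a constant offset, P(t) away from the edges is close to that offset, which is the slowly varying part that the next step removes.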
  • S224 Obtain an initial signal component from the to-be-processed voice signal and the mean. If the initial signal component meets the preset conditions, it is taken as an intermediate signal component.
  • The preset conditions are conditions set in advance for screening signal components.
  • The preset conditions are as follows: first, the number of extreme points of the signal and the number of zero crossings are equal or differ by at most one; second, the mean of the upper and lower envelopes is zero. Here the number of extreme points counts both local maxima and local minima. In this embodiment, only an initial signal component that meets both preset conditions is used as an intermediate signal component. This process effectively decomposes the noisy voice signal into a purer voice signal and achieves speech enhancement.
  • Specifically, the to-be-processed voice signal and the mean are processed with $h_0(t) = s(t) - m_0(t)$ to obtain the initial signal component, where h_0(t) is the initial signal component, s(t) is the to-be-processed voice signal, m_0(t) is the mean, and t is the time scale. If the initial signal component meets the preset conditions, it is used as the first intermediate signal component.
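  • If the initial signal component does not meet the preset conditions, it is taken as a new to-be-processed voice signal and the above steps are repeated; once an intermediate signal component is obtained, the remaining signal is decomposed in the same way. The loop ends when the remaining initial signal component is monotonic or smaller than a first threshold.
  • The first threshold is a predefined threshold for stopping the foregoing loop.
  • After multiple loops, N intermediate signal components are obtained, and the to-be-processed voice signal can be expressed as $s(t) = \sum_{k=1}^{N} c_k(t) + r_n(t)$, where c_k(t) is the k-th intermediate signal component and r_n(t) is the final initial signal component, which is either monotonic or smaller than the given threshold.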
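Computing h0 and checking the two preset conditions of step S224 can be sketched in NumPy as below. An exactly zero envelope mean cannot be tested with floating-point data, so mean_tol (a tolerance relative to the peak amplitude of the candidate) is an assumption; the caller supplies the envelope mean of the candidate, for example from a spline-envelope routine. All names are hypothetical.

```python
import numpy as np


def initial_signal_component(s, m0):
    """h0(t) = s(t) - m0(t): subtract the envelope mean from the signal."""
    return np.asarray(s, dtype=float) - np.asarray(m0, dtype=float)


def count_extrema(h):
    """Number of local maxima plus local minima (sign changes of the first difference)."""
    d = np.diff(h)
    n_max = np.count_nonzero((d[:-1] > 0) & (d[1:] <= 0))
    n_min = np.count_nonzero((d[:-1] < 0) & (d[1:] >= 0))
    return int(n_max + n_min)


def count_zero_crossings(h):
    """Number of sign changes between consecutive samples."""
    return int(np.count_nonzero(np.signbit(h[:-1]) != np.signbit(h[1:])))


def meets_preset_conditions(h, h_envelope_mean, mean_tol=0.05):
    """Condition 1: extrema and zero-crossing counts differ by at most one.
    Condition 2: the envelope mean of h is approximately zero (within mean_tol).
    """
    h = np.asarray(h, dtype=float)
    m = np.asarray(h_envelope_mean, dtype=float)
    extrema_ok = abs(count_extrema(h) - count_zero_crossings(h)) <= 1
    mean_ok = np.max(np.abs(m)) <= mean_tol * np.max(np.abs(h))
    return bool(extrema_ok and mean_ok)
```

A zero-mean oscillation passes both checks, while the same oscillation shifted upward has no zero crossings and a nonzero envelope mean, so it fails.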
  • In this embodiment, the local extreme points of the to-be-processed voice signal, including local maximum and minimum points, are obtained so that an upper envelope is constructed from the local maximum points and a lower envelope from the local minimum points, making the time-domain curve of the to-be-processed voice signal smoother and clearer.
  • The mean of the upper and lower envelopes is then obtained, and the initial signal component is obtained from the to-be-processed voice signal and the mean. If the initial signal component meets the preset conditions, it is taken as an intermediate signal component, which keeps the signal stable; if not, it is taken as a new to-be-processed voice signal and processed in further loops until N intermediate signal components are obtained.
  • This decomposition process effectively decomposes the noisy voice signal into a relatively pure voice signal and achieves speech enhancement.
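The full loop of steps S221 to S224 can be sketched as a compact EMD in Python with NumPy and SciPy. Several details here are assumptions rather than values from the text: the endpoint knots, the tolerance mean_tol for "envelope mean is zero", the default first_threshold, and the caps max_sifts and max_imfs that guarantee termination. A production EMD library would handle boundary effects more carefully.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _extrema(s):
    d = np.diff(s)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope_mean(s):
    # Cubic-spline envelopes through the extrema; both end samples are added
    # as knots (a simplifying assumption about boundary handling).
    t = np.arange(s.size, dtype=float)
    maxima, minima = _extrema(s)
    ends = np.array([0, s.size - 1])
    up, lo = np.union1d(ends, maxima), np.union1d(ends, minima)
    return (CubicSpline(t[up], s[up])(t) + CubicSpline(t[lo], s[lo])(t)) / 2.0


def _meets_preset_conditions(h, m, mean_tol):
    maxima, minima = _extrema(h)
    crossings = np.count_nonzero(np.signbit(h[:-1]) != np.signbit(h[1:]))
    extrema_ok = abs(len(maxima) + len(minima) - crossings) <= 1
    return extrema_ok and np.max(np.abs(m)) <= mean_tol * np.max(np.abs(h))


def emd(signal, first_threshold=1e-8, max_imfs=10, max_sifts=50, mean_tol=0.05):
    """Return (intermediate components c_k, residue r) with signal == sum(c_k) + r."""
    residue = np.asarray(signal, dtype=float).copy()
    components = []
    while len(components) < max_imfs:
        maxima, minima = _extrema(residue)
        # Loop ends when the residue is (nearly) monotonic or below the first threshold.
        if len(maxima) + len(minima) < 2 or np.max(np.abs(residue)) < first_threshold:
            break
        h = residue - _envelope_mean(residue)  # h0 = s - m0
        for _ in range(max_sifts):  # cap is an assumption of this sketch
            m = _envelope_mean(h)
            if _meets_preset_conditions(h, m, mean_tol):
                break
            h = h - m  # not an IMF yet: treat h as the new signal and sift again
        components.append(h)
        residue = residue - h  # decompose what is left
    return np.array(components), residue
```

Because each component is subtracted from the residue, the components and the final residue always add back up to the input, matching the decomposition formula $s(t) = \sum_{k} c_k(t) + r_n(t)$.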
  • In an embodiment, the speech enhancement method further includes the following steps:
  • S411 Decompose the second signal components by using the EEMD algorithm to obtain secondary decomposed signal components.
  • The EEMD algorithm is used to decompose the second signal components a second time, yielding the secondary decomposed signal components.
  • The process of decomposing the second signal components with the EEMD algorithm is the same as in step S20 and is not repeated here.
  • S412 Perform a correlation calculation on the digital voice signal and each secondary decomposed signal component to obtain a second correlation coefficient.
  • The second correlation coefficient, obtained by performing the correlation calculation on the digital voice signal and a secondary decomposed signal component, reflects the degree of correlation between the two.
  • The correlation calculation formula is $r_2 = \frac{\mathrm{Cov}(a, b)}{\sqrt{\mathrm{Var}[a]\,\mathrm{Var}[b]}}$, where a is the digital voice signal, b is the secondary decomposed signal component, Cov(a, b) is the covariance of a and b, Var[a] is the variance of a, Var[b] is the variance of b, and r_2 is the second correlation coefficient.
  • The covariance and variance are calculated in the same way as in step S30 and are not repeated here.
  • The second correlation coefficient is calculated so that the secondary decomposed signal components can be screened by it, which improves the accuracy of the voice signal.
  • S413 Select the secondary decomposed signal components whose second correlation coefficient is greater than the preset threshold as the updated second signal components.
  • The preset threshold is a threshold defined in advance for screening the secondary decomposed signal components.
  • The preset threshold is the same as the preset threshold in step S40.
  • The second correlation coefficient is a real number between 0 and 1. If the second correlation coefficient is greater than the preset threshold, the secondary decomposed signal component is strongly correlated with the digital voice signal and carries a large amount of its effective information. If the second correlation coefficient is less than the preset threshold, the component is weakly correlated with the digital voice signal, carries little effective information, and may be treated as noise by default.
  • Screening the secondary decomposed signal components in this way retains those more strongly correlated with the digital voice signal as the updated second signal components, which reduces noise interference and further improves the accuracy of the voice signal.
  • This screening method is simple to implement and improves the efficiency of speech enhancement.
  • In this embodiment, the EEMD algorithm first decomposes the second signal components into secondary decomposed signal components, and the correlation calculation on the digital voice signal and the secondary decomposed signal components yields the second correlation coefficients.
  • The secondary decomposed signal components whose second correlation coefficient is greater than the preset threshold are then selected as the updated second signal components, which are subsequently integrated to obtain the target voice information. This process performs finer noise reduction on the voice signal, yields purer voice information, and makes voiceprint recognition more accurate.
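Steps S411 to S413 can be sketched as one NumPy function. Here decompose is a hypothetical placeholder for an EEMD routine, and the threshold value is left to the caller because the text does not specify it; the function name is also hypothetical.

```python
import numpy as np


def refine_second_components(signal, second_components, decompose, threshold):
    """Re-decompose each second signal component and keep the secondary
    decomposed components that still correlate with the digital voice signal.

    decompose: placeholder callable (for example an EEMD routine) returning
    the secondary decomposed components of one input component.
    """
    signal = np.asarray(signal, dtype=float)
    updated = []
    for component in second_components:
        for sub in decompose(np.asarray(component, dtype=float)):  # S411
            sub = np.asarray(sub, dtype=float)
            # S412: r2 = Cov(a, b) / sqrt(Var[a] * Var[b]), population moments
            cov = np.mean((signal - signal.mean()) * (sub - sub.mean()))
            denom = np.sqrt(signal.var() * sub.var())
            r2 = cov / denom if denom > 0 else 0.0
            if r2 > threshold:  # S413: keep strongly correlated components
                updated.append(sub)
    # One row per updated second signal component (possibly zero rows).
    return np.array(updated).reshape(-1, signal.size)
```

The returned rows are the updated second signal components that step S50 then integrates.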
  • FIG. 6 shows a schematic diagram of a speech enhancement device corresponding to the speech enhancement method in the above embodiment.
  • The speech enhancement device includes a digital voice signal acquisition module 10, a first signal component acquisition module 20, a first correlation coefficient acquisition module 30, a second signal component acquisition module 40, and a target voice information acquisition module 50.
  • The functional modules are described in detail as follows:
  • The digital voice signal acquisition module 10 is configured to convert the original voice information to obtain a digital voice signal.
  • The first signal component acquisition module 20 is configured to decompose the digital voice signal by using the EEMD algorithm to obtain first signal components.
  • The first correlation coefficient acquisition module 30 is configured to perform a correlation calculation on the digital voice signal and each first signal component by using the correlation calculation formula to obtain a first correlation coefficient.
  • The second signal component acquisition module 40 is configured to select, as second signal components, the first signal components whose first correlation coefficient is greater than the preset threshold.
  • The target voice information acquisition module 50 is configured to perform integration processing on the second signal components to obtain target voice information.
  • The first signal component acquisition module 20 includes a to-be-processed voice signal acquisition unit 21, an intermediate signal component acquisition unit 22, and a first signal component acquisition unit 23.
  • The to-be-processed voice signal acquisition unit 21 is configured to add different normally distributed white noise sequences to the digital voice signal to obtain to-be-processed voice signals.
  • The intermediate signal component acquisition unit 22 is configured to perform EMD decomposition on the to-be-processed voice signals to obtain the corresponding intermediate signal components.
  • The first signal component acquisition unit 23 is configured to average the intermediate signal components to obtain the first signal components.
  • The intermediate signal component acquisition unit 22 includes a local extreme point acquisition subunit 221, an envelope construction subunit 222, a mean acquisition subunit 223, and an intermediate signal component acquisition subunit 224.
  • The local extreme point acquisition subunit 221 is configured to obtain the local extreme points of the to-be-processed voice signal, the local extreme points including maximum points and minimum points.
  • The envelope construction subunit 222 is configured to construct an upper envelope from the local maximum points and a lower envelope from the local minimum points.
  • The mean acquisition subunit 223 is configured to obtain the mean of the upper and lower envelopes.
  • The intermediate signal component acquisition subunit 224 is configured to obtain an initial signal component from the to-be-processed voice signal and the mean; if the initial signal component meets the preset conditions, it is taken as an intermediate signal component.
  • The correlation calculation formula is $r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$, where x is the digital voice signal, y is the first signal component, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the first correlation coefficient.
  • The speech enhancement device further includes a secondary decomposed signal component acquisition unit 411, a second correlation coefficient acquisition unit 412, and a second signal component updating unit 413.
  • The secondary decomposed signal component acquisition unit 411 is configured to decompose the second signal components by using the EEMD algorithm to obtain secondary decomposed signal components.
  • The second correlation coefficient acquisition unit 412 is configured to perform a correlation calculation on the digital voice signal and the secondary decomposed signal components to obtain second correlation coefficients.
  • The second signal component updating unit 413 is configured to select the secondary decomposed signal components whose second correlation coefficient is greater than the preset threshold as the updated second signal components.
  • the target voice information acquisition module 50 integrates the second signal components using the formula $Z = \frac{1}{N}\sum_{i=1}^{N} S_i^{2}$ (N is a positive integer) to obtain the target voice information, where:
  • S_i represents the i-th second signal component
  • N represents the total number of second signal components
  • Z represents the target voice information.
  • Each module in the above voice enhancement device may be implemented in whole or in part by software, hardware, or a combination thereof.
  • the above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 7.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in a non-volatile storage medium.
  • the database of the computer device is used to store data generated or obtained during the execution of the speech enhancement method, such as a target speech signal.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by a processor to implement a speech enhancement method.
  • a computer device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor.
  • the processor executes the computer-readable instructions to implement the following steps:
  • converting the original voice information to obtain a digital voice signal;
  • decomposing the digital voice signal using the EEMD algorithm to obtain a first signal component;
  • performing, using the correlation calculation formula, a correlation calculation on the digital voice signal and the first signal component to obtain a first correlation coefficient; selecting a first signal component whose first correlation coefficient is greater than a preset threshold as a second signal component; and integrating the second signal component to obtain target speech information.
  • the correlation calculation formula is $r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$, where x is the digital voice signal, y is the first signal component, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the first correlation coefficient.
  • when the processor executes the computer-readable instructions, the following steps are further implemented: adding different normally distributed white noise sequences to the digital voice signal to obtain speech signals to be processed; performing EMD decomposition on each speech signal to be processed to obtain the corresponding intermediate signal components; and averaging the intermediate signal components to obtain the first signal component.
  • when the processor executes the computer-readable instructions, the following steps are further implemented: obtaining the local extreme points of the speech signal to be processed, the local extreme points comprising local maximum points and local minimum points; constructing an upper envelope from the local maximum points and a lower envelope from the local minimum points among all local extreme points; obtaining the mean of the upper and lower envelopes; and obtaining an initial signal component from the speech signal to be processed and the mean, the initial signal component being taken as an intermediate signal component if it meets a preset condition.
  • when the processor executes the computer-readable instructions, the following steps are further implemented: decomposing the second signal component using the EEMD algorithm to obtain binary decomposition signal components; performing a correlation calculation on the digital voice signal and each binary decomposition signal component to obtain a second correlation coefficient; and selecting the binary decomposition signal components whose second correlation coefficient is greater than a preset threshold as the updated second signal components.
  • when the processor executes the computer-readable instructions, the following step is further implemented: integrating the second signal components using the formula $Z = \frac{1}{N}\sum_{i=1}^{N} S_i^{2}$ (N is a positive integer) to obtain the target voice information, where S_i represents the i-th second signal component, N represents the total number of second signal components, and Z represents the target voice information.
  • one or more non-volatile readable storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps: converting the original voice information to obtain a digital voice signal; decomposing the digital voice signal using the EEMD algorithm to obtain a first signal component; performing, using the correlation calculation formula, a correlation calculation on the digital voice signal and the first signal component to obtain a first correlation coefficient; selecting a first signal component whose first correlation coefficient is greater than a preset threshold as a second signal component; and integrating the second signal component to obtain target speech information.
  • the correlation calculation formula is $r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$, where x is the digital voice signal, y is the first signal component, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the first correlation coefficient.
  • when executed by the one or more processors, the computer-readable instructions further implement the following steps: adding different normally distributed white noise sequences to the digital voice signal to obtain speech signals to be processed; performing EMD decomposition on each speech signal to be processed to obtain the corresponding intermediate signal components; and averaging the intermediate signal components to obtain the first signal component.
  • when executed by the one or more processors, the computer-readable instructions further implement the following steps: obtaining the local extreme points of the speech signal to be processed, the local extreme points comprising local maximum points and local minimum points; constructing an upper envelope from the local maximum points and a lower envelope from the local minimum points among all local extreme points; obtaining the mean of the upper and lower envelopes; and obtaining an initial signal component from the speech signal to be processed and the mean, the initial signal component being taken as an intermediate signal component if it meets a preset condition.
  • when executed by the one or more processors, the computer-readable instructions further implement the following steps: decomposing the second signal component using the EEMD algorithm to obtain binary decomposition signal components; performing a correlation calculation on the digital voice signal and each binary decomposition signal component to obtain a second correlation coefficient; and selecting the binary decomposition signal components whose second correlation coefficient is greater than a preset threshold as the updated second signal components.
  • when executed by the one or more processors, the computer-readable instructions further implement the following step: integrating the second signal components using the formula $Z = \frac{1}{N}\sum_{i=1}^{N} S_i^{2}$ (N is a positive integer) to obtain the target voice information, where S_i represents the i-th second signal component, N represents the total number of second signal components, and Z represents the target voice information.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

Disclosed are a speech enhancement method and apparatus, a computer device, and a storage medium. The speech enhancement method comprises: transforming original speech information to obtain a digital speech signal; decomposing the digital speech signal by using an EEMD algorithm to obtain a first signal component; performing a correlation calculation on the digital speech signal and the first signal component by using a correlation calculation formula to obtain a first correlation coefficient; selecting a first signal component having the first correlation coefficient greater than a preset threshold as a second signal component; and integrating the second signal component to obtain target speech information. With this method, the speech signal can be effectively denoised during speech enhancement to obtain a clean speech signal, which makes voiceprint recognition performed on the clean speech signal more accurate.

Description

Speech enhancement method and apparatus, computer device, and storage medium
This patent application is based on, and claims priority to, Chinese invention patent application No. 201810528846.0, filed on May 29, 2018 and entitled "Speech Enhancement Method and Apparatus, Computer Device, and Storage Medium".
Technical Field
The present application relates to the technical field of speech signal processing, and in particular to a speech enhancement method and apparatus, a computer device, and a storage medium.
Background
With the widespread use of speech recognition technology, demand for speech signal processing technology has grown accordingly. In speech recognition or voiceprint recognition, the speech signals collected by front-end equipment generally contain noise, including noise from the background environment and noise generated by the front-end equipment during recording. Such noisy speech signals reduce recognition accuracy. Speech enhancement (that is, noise reduction) is therefore applied to the speech signal to extract as clean a speech signal as possible and make recognition more accurate. However, the speech signals extracted by current speech enhancement processing are not accurate enough, which hinders subsequent speech recognition.
Summary
In view of the above technical problems, embodiments of the present application provide a speech enhancement method and apparatus, a computer device, and a storage medium.
A speech enhancement method, comprising:
converting original voice information to obtain a digital voice signal;
decomposing the digital voice signal using an EEMD algorithm to obtain a first signal component;
performing, using a correlation calculation formula, a correlation calculation on the digital voice signal and the first signal component to obtain a first correlation coefficient;
selecting a first signal component whose first correlation coefficient is greater than a preset threshold as a second signal component; and
integrating the second signal component to obtain target voice information.
A speech enhancement apparatus, comprising:
a digital voice signal acquisition module, configured to convert original voice information to obtain a digital voice signal;
a first signal component acquisition module, configured to decompose the digital voice signal using an EEMD algorithm to obtain a first signal component;
a first correlation coefficient acquisition module, configured to perform, using a correlation calculation formula, a correlation calculation on the digital voice signal and the first signal component to obtain a first correlation coefficient;
a second signal component acquisition module, configured to select a first signal component whose first correlation coefficient is greater than a preset threshold as a second signal component; and
a target voice information acquisition module, configured to integrate the second signal component to obtain target voice information.
A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
converting original voice information to obtain a digital voice signal;
decomposing the digital voice signal using an EEMD algorithm to obtain a first signal component;
performing, using a correlation calculation formula, a correlation calculation on the digital voice signal and the first signal component to obtain a first correlation coefficient;
selecting a first signal component whose first correlation coefficient is greater than a preset threshold as a second signal component; and
integrating the second signal component to obtain target voice information.
One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
converting original voice information to obtain a digital voice signal;
decomposing the digital voice signal using an EEMD algorithm to obtain a first signal component;
performing, using a correlation calculation formula, a correlation calculation on the digital voice signal and the first signal component to obtain a first correlation coefficient;
selecting a first signal component whose first correlation coefficient is greater than a preset threshold as a second signal component; and
integrating the second signal component to obtain target voice information.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application environment of a speech enhancement method according to an embodiment of the present application;
FIG. 2 is a flowchart of a speech enhancement method according to an embodiment of the present application;
FIG. 3 is a detailed flowchart of step S20 in FIG. 2;
FIG. 4 is a detailed flowchart of step S22 in FIG. 3;
FIG. 5 is another flowchart of a speech enhancement method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a speech enhancement apparatus according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
The speech enhancement method provided in this application can be applied in the application environment shown in FIG. 1, in which a computer device communicates with a server over a network. The computer device may be, but is not limited to, a personal computer, laptop, smartphone, tablet, or portable wearable device. The server may be implemented as a stand-alone server.
The speech enhancement method can be applied to computer equipment deployed by banks, securities firms, insurers, and other financial or non-financial institutions, to enhance voice data before voiceprint recognition and thereby improve recognition accuracy.
In one embodiment, as shown in FIG. 2, the speech enhancement method is described using its application to the server in FIG. 1 as an example, and includes the following steps:
S10: Convert the original voice information to obtain a digital voice signal.
The original voice information is the speaker's voice as captured by the recording module (such as a microphone) of the front-end device; it may be in WAV, MP3, or another format. The digital voice signal is the discrete digital signal obtained by converting the original voice information. Because a computer device can process only binary data, not the original voice information directly, the original voice information must first be converted into a digital voice signal.
Specifically, the server receives the original voice information sent by the front-end device and reads it with an audio-file reading function from a Python module to obtain the digital voice signal. For example, the function may be wave.open(file, 'rb'), where file is the original voice information and 'rb' selects read mode; the one-dimensional array of samples read from the audio file is the digital voice signal. Python is an object-oriented, interpreted programming language, and its modules package a large number of ready-made functions. In this embodiment, reading the original voice information directly with a Python audio-reading function yields the digital voice signal, which is simple to implement.
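As a minimal sketch of step S10, assuming 16-bit PCM WAV input (Python's standard `wave` module reads WAV files only, so MP3 input would first need converting), the frames returned by `wave.open(..., 'rb')` can be turned into a one-dimensional NumPy array. The helper name `read_wav_as_array` and the choice to keep only the first channel are illustrative assumptions, not part of the patent.

```python
import wave

import numpy as np


def read_wav_as_array(source):
    """Read a 16-bit PCM WAV file (path or binary file object) as a 1-D float array.

    Assumptions: 16-bit little-endian PCM samples; for multi-channel audio
    only the first channel is kept.
    """
    with wave.open(source, "rb") as wf:  # "rb": open the audio file for reading
        n_channels = wf.getnchannels()
        sample_width = wf.getsampwidth()  # bytes per sample
        frames = wf.readframes(wf.getnframes())
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    samples = np.frombuffer(frames, dtype="<i2")
    # Samples of different channels are interleaved; keep channel 0 only.
    samples = samples[::n_channels]
    return samples.astype(np.float64)
```

For example, `x = read_wav_as_array("speaker.wav")` (a hypothetical file name) yields the digital voice signal x used in the following steps.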
S20: Decompose the digital voice signal using the EEMD algorithm to obtain first signal components.
The first signal components are the IMF (Intrinsic Mode Function) components obtained by decomposing the digital voice signal with the EEMD algorithm. EEMD (Ensemble Empirical Mode Decomposition) is a noise-assisted data analysis algorithm that effectively addresses mode mixing, so that the decomposition results (the first signal components) clearly reflect the oscillations of the digital voice signal at different time scales or frequencies. Mode mixing is the phenomenon in which different modes cannot be separated by time scale, so that components belonging to different modes appear in a single mode.
Because the digital voice signal is non-stationary, it is decomposed with the EEMD algorithm so that the resulting first signal components are more stationary, which helps suppress noise interference and improves the accuracy of the speech signal. Specifically, the server decomposes the digital voice signal with the EEMD algorithm into N first signal components (N is a positive integer), each representing the oscillations of the digital voice signal at a different time scale or frequency.
S30: Perform a correlation calculation on the digital voice signal and the first signal component using the correlation calculation formula to obtain a first correlation coefficient.
The first correlation coefficient is the result of the correlation calculation between the digital voice signal and a first signal component. It reflects how strongly the two are correlated, and thus how much of the useful information (speech content) in the digital voice signal the first signal component contains.
Specifically, the correlation calculation formula is
$$r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$$
where x is the digital voice signal, y is the first signal component, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the first correlation coefficient. Cov(x, y) is calculated as:
$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum_{j=1}^{n}\bigl(x_j - E(x)\bigr)\bigl(y_j - E(y)\bigr)$$
Var[x] is calculated as $\mathrm{Var}[x] = E(x^2) - E^2(x)$, and Var[y] as $\mathrm{Var}[y] = E(y^2) - E^2(y)$, where E(x) is the expectation of the digital voice signal, E(y) is the expectation of the first signal component, n is the number of samples in each signal, x_j is the j-th sample of the digital voice signal, and y_j is the j-th sample of the first signal component at the same time instant. In this embodiment, the first correlation coefficient is treated as a real number between 0 and 1 (in general, a correlation coefficient ranges from -1 to 1): the closer it is to 1, the stronger the correlation between the digital voice signal and the first signal component; the closer it is to 0, the weaker the correlation.
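The formulas above can be checked numerically with a short sketch. It assumes x and y are equal-length NumPy arrays of samples and uses the population (1/n) moments exactly as written; because the 1/n factors cancel in r, the result matches NumPy's `np.corrcoef`. The function name `correlation_coefficient` is hypothetical.

```python
import numpy as np


def correlation_coefficient(x, y):
    """First correlation coefficient r = Cov(x, y) / sqrt(Var[x] * Var[y]).

    Uses population (1/n) moments, matching the formulas above.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same number of samples")
    ex, ey = x.mean(), y.mean()          # E(x), E(y)
    cov = np.mean((x - ex) * (y - ey))   # (1/n) * sum_j (x_j - E(x)) * (y_j - E(y))
    var_x = np.mean(x ** 2) - ex ** 2    # Var[x] = E(x^2) - E^2(x)
    var_y = np.mean(y ** 2) - ey ** 2    # Var[y] = E(y^2) - E^2(y)
    return float(cov / np.sqrt(var_x * var_y))
```

For instance, `correlation_coefficient(x, 2 * x + 3)` gives 1.0 up to rounding, while a component that is the negation of x gives -1.0, which is why the coefficient is bounded by -1 and 1 in general.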
S40: Select a first signal component whose first correlation coefficient is greater than a preset threshold as a second signal component.
The preset threshold is a predefined value used to screen the first signal components. The second signal components are the signal components that remain after the first signal components are screened against the preset threshold.
Since the first correlation coefficient is a real number between 0 and 1, the preset threshold is also a real number between 0 and 1. If the first correlation coefficient is greater than the preset threshold, the first signal component is strongly correlated with the digital voice signal and contains much of its useful information. If the first correlation coefficient is not greater than the preset threshold, the first signal component is weakly correlated with the digital voice signal, contains little of its useful information, and is treated as noise. In this embodiment, screening keeps the first signal components that are strongly correlated with the digital voice signal as second signal components, which reduces noise interference and further improves the accuracy of the speech signal. This screening method is also simple to implement, which improves the efficiency of speech enhancement.
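A minimal sketch of step S40, assuming the first signal components are stacked as rows of a 2-D array (a hypothetical layout) and using NumPy's `np.corrcoef` for the Pearson coefficient r defined above. The threshold passed in is illustrative; the patent does not state a value, and `select_components` is a hypothetical name.

```python
import numpy as np


def select_components(signal, components, threshold):
    """Keep the components whose correlation with `signal` is greater than `threshold`.

    signal: 1-D array of n samples (the digital voice signal).
    components: 2-D array of shape (K, n), one first signal component per row.
    Returns (selected components, correlation coefficient of every row).
    """
    signal = np.asarray(signal, dtype=np.float64)
    components = np.asarray(components, dtype=np.float64)
    coeffs = np.array([np.corrcoef(signal, c)[0, 1] for c in components])
    # Strictly "greater than" the preset threshold, as in step S40.
    # A constant row yields NaN (with a RuntimeWarning) and is never selected.
    return components[coeffs > threshold], coeffs
```

With a strong low-frequency tone plus a weak high-frequency tone as the signal, and the two tones as candidate components, a threshold of 0.5 keeps only the low-frequency tone, whose coefficient is close to 1.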
S50: Integrate the second signal components to obtain the target voice information.
The target voice information is the relatively clean voice information obtained by integrating the second signal components derived from the original voice information. Integration is the process of restoring signal components to voice information.
Specifically, the server integrates the second signal components using the formula
$$Z = \frac{1}{N}\sum_{i=1}^{N} S_i^{2}$$
(N is a positive integer) to obtain the target voice information, where S_i is the i-th second signal component, N is the total number of second signal components, and Z is the target voice information. That is, when integrating the second signal components, the server first squares each second signal component and then averages the results to obtain the target voice information.
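Read literally, the integration step squares each second signal component sample by sample and averages over the N components. The sketch below follows that reading; the array layout (one component per row) and the name `integrate_components` are assumptions.

```python
import numpy as np


def integrate_components(components):
    """Z = (1/N) * sum_i S_i**2, evaluated sample by sample.

    components: 2-D array of shape (N, n_samples), one second signal
    component S_i per row.
    """
    s = np.asarray(components, dtype=np.float64)
    if s.ndim != 2 or s.shape[0] == 0:
        raise ValueError("expected a 2-D array with at least one component (N >= 1)")
    # Square every component, then average over the component axis.
    return np.mean(s ** 2, axis=0)
```

For two components [1, 2] and [3, 4], the result is [(1 + 9) / 2, (4 + 16) / 2] = [5, 10].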
In this embodiment, the original voice information is first read directly with an audio-file reading function from a Python module to obtain the digital voice signal, which keeps this step simple and improves the efficiency of speech enhancement. The digital voice signal is then decomposed with the EEMD algorithm into first signal components, and the correlation calculation formula is applied to the digital voice signal and each first signal component to obtain the first correlation coefficients. The first signal components whose first correlation coefficient is greater than the preset threshold, that is, those strongly correlated with the digital voice signal, are selected as second signal components, which reduces noise interference and achieves speech enhancement. Finally, the second signal components are integrated to obtain high-accuracy target voice information. The method is simple to implement, improves the processing efficiency of speech enhancement, and yields target voice information of high accuracy.
在一实施例中,如图3所示,步骤S20中,即采用EEMD算法对数字语音信号进行分解,获取第一信号分量,具体包括如下步骤:In an embodiment, as shown in FIG. 3, in step S20, the EEMD algorithm is used to decompose the digital voice signal to obtain a first signal component, which specifically includes the following steps:
S21:向数字语音信号中加入不同的正态分布的白噪声序列,获取待处理语音信号。S21: Add different normally distributed white noise sequences to the digital voice signal to obtain a voice signal to be processed.
其中,待处理语音信号是加入不同的正态分布的白噪声序列的数字语音信号。本实施例中,正态分布的白噪声序列具体是指高斯白噪声序列。高斯白噪声,是指噪声的瞬时值服从高斯分布,而它的功率谱密度又是正态分布的,则称它为高斯白噪声。瞬时值指的是概率密度函数,高斯分布即为正态分布。Among them, the speech signal to be processed is a digital speech signal added with different normally distributed white noise sequences. In this embodiment, the normally distributed white noise sequence refers to a Gaussian white noise sequence. Gaussian white noise means that the instantaneous value of the noise obeys Gaussian distribution, and its power spectral density is normally distributed, then it is called Gaussian white noise. The instantaneous value refers to the probability density function, and the Gaussian distribution is the normal distribution.
In this embodiment, a different normally distributed white noise sequence is added to the digital voice signal each time to obtain the speech signals to be processed. As a result, the white noise is evenly distributed over the entire time-frequency space of the digital voice signal. When a normally distributed white noise sequence is added to the digital voice signal, signal regions at different time scales are automatically mapped onto the appropriate time scales associated with the white noise, which effectively mitigates mode mixing (modal aliasing). In addition, because Gaussian white noise has zero mean, the added noise largely cancels out when the results are averaged. The digital voice signal is therefore preserved and the accuracy of speech enhancement is improved.
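Step S21 can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function name `make_noisy_copies`, the number of copies, the noise standard deviation, and the fixed seed are all hypothetical choices, because the description does not specify them.

```python
import numpy as np

def make_noisy_copies(x, num_copies, noise_std, seed=0):
    """Return an array of shape (num_copies, len(x)): row i is x plus its own
    independent zero-mean Gaussian white-noise sequence (step S21)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)  # fixed seed only for reproducibility
    noise = rng.normal(loc=0.0, scale=noise_std, size=(num_copies, x.size))
    return x[np.newaxis, :] + noise

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
x = np.sin(2 * np.pi * 5 * t)  # stand-in for a digital voice signal
copies = make_noisy_copies(x, num_copies=200, noise_std=0.2)
print(copies.shape)  # (200, 1000)
# Averaging the copies cancels most of the zero-mean noise.
print(np.abs(copies.mean(axis=0) - x).max())
```

Each row carries independent zero-mean noise. Averaging across rows therefore shrinks the noise standard deviation by roughly the square root of the number of copies, and the averaging in step S23 relies on exactly this property.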
S22: Perform EMD decomposition on each speech signal to be processed to obtain its intermediate signal components.
An intermediate signal component is an IMF (intrinsic mode function) component obtained by performing EMD decomposition on a speech signal to be processed. EMD (Empirical Mode Decomposition) decomposes a signal according to the signal's own local time-scale characteristics. Specifically, EMD is applied to each speech signal to be processed to obtain its intermediate signal components. The signals being decomposed already contain the added white noise, so the mode mixing that readily occurs in plain EMD is effectively avoided. This makes the decomposition more accurate and further improves the accuracy of speech enhancement.
S23: Average the intermediate signal components to obtain the first signal components.
Specifically, the server averages the intermediate signal components corresponding to the speech signals to be processed to obtain the first signal components. It uses the mean formula
$$M_j(t)=\frac{1}{N}\sum_{i=1}^{N}M_{i,j}(t)$$
where $M_j(t)$ is the j-th first signal component, $M_{i,j}(t)$ is the j-th intermediate signal component obtained from the i-th speech signal to be processed, N is the number of intermediate signal components averaged (one per speech signal to be processed), t is the time scale, and i is the subscript of the intermediate signal component.
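The averaging formula of step S23 can be sketched as shown below. The function name and the input layout (one 2-D array of intermediate components per noise-added copy) are assumptions made for illustration. Separate EMD runs may return different numbers of components, so this sketch truncates every copy to the smallest count. That is one possible choice, and the description does not specify it.

```python
import numpy as np

def ensemble_average(imfs_per_copy):
    """Average the j-th intermediate component across all noise-added copies.

    imfs_per_copy: list of arrays, one per copy, each of shape
    (num_imfs_i, num_samples). Only the first min(num_imfs_i) components are
    averaged (an assumption). Row j of the result is
    M_j(t) = (1/N) * sum_i M_{i,j}(t).
    """
    k = min(imfs.shape[0] for imfs in imfs_per_copy)
    stacked = np.stack([imfs[:k] for imfs in imfs_per_copy])  # shape (N, k, T)
    return stacked.mean(axis=0)

# Two toy "copies" with three samples each; the second has an extra component.
copy_a = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 1.0]])
copy_b = np.array([[3.0, 2.0, 1.0], [0.0, 2.0, 1.0], [5.0, 5.0, 5.0]])
print(ensemble_average([copy_a, copy_b]).tolist())  # [[2.0, 2.0, 2.0], [0.0, 1.0, 1.0]]
```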
In this embodiment, different normally distributed white noise sequences are added to the digital voice signal to obtain the speech signals to be processed, so that the white noise is evenly distributed over the time-frequency space of the whole digital voice signal. This helps resolve mode mixing, and the zero-mean property of Gaussian white noise preserves the true digital voice signal, which improves the accuracy of speech enhancement. Next, EMD is applied to the speech signals to be processed to obtain the corresponding intermediate signal components. Adding the white noise remedies the main shortcoming of EMD (mode mixing), so the accuracy of the EMD decomposition improves. Finally, the intermediate signal components are averaged to obtain the first signal components. This calculation is simple and improves the efficiency of processing the voice information.
In an embodiment, as shown in FIG. 4, step S22 (performing EMD decomposition on a speech signal to be processed to obtain its intermediate signal components) specifically includes the following steps:
S221: Obtain the local extreme points of the speech signal to be processed. The local extreme points comprise local maximum points and local minimum points.
The speech signal to be processed contains multiple local extreme points. A local extreme point is an extremum of the signal within some time range of the whole time domain, and the local extreme points comprise local maximum points and local minimum points. Specifically, the function formed by the speech signal to be processed over a given time range is differentiated, and the points where the derivative equals 0 (and changes sign) are the local extreme points. For example, let the speech signal to be processed be $x(t)$, $t \in T$, where T is the entire time domain. Each t with $x'(t)=0$ then gives a local extreme point $(t, x(t))$.
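For a sampled signal, the condition $x'(t)=0$ in step S221 is usually replaced by a sign change in the first difference. The sketch below uses a hypothetical function name, `local_extrema`, and implements this approach with NumPy. It also reduces flat runs of equal samples, which are common in quantized audio, to their centre sample. That plateau rule is an assumption and does not come from the description.

```python
import numpy as np

def local_extrema(x):
    """Return (max_idx, min_idx) for a sampled signal x.

    An extremum is where the slope sign changes (rising to falling = maximum,
    falling to rising = minimum); flat runs are reduced to their centre."""
    x = np.asarray(x, dtype=float)
    step = np.sign(np.diff(x))                    # +1 rising, -1 falling, 0 flat
    nz = np.nonzero(step)[0]                      # positions of non-flat steps
    s = step[nz]
    change = np.nonzero(s[:-1] != s[1:])[0]       # slope sign flips here
    idx = (nz[change] + 1 + nz[change + 1]) // 2  # centre of any plateau
    is_max = s[change] > 0
    return idx[is_max], idx[~is_max]

maxima, minima = local_extrema([0.0, 2.0, 1.0, 3.0, 3.0, 0.0, 1.0])
print(maxima.tolist(), minima.tolist())  # [1, 3] [2, 5]
```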
S222: Construct an upper envelope from all the local maximum points and a lower envelope from all the local minimum points.
An envelope is the curve obtained by connecting the peak points of a high-frequency amplitude-modulated (AM) signal, and it corresponds to the low-frequency modulating signal. A high-frequency AM signal is a signal whose amplitude varies with the low-frequency modulating signal. The modulating signal is a low-frequency signal converted from the original information. The upper envelope is a smooth curve obtained by fitting all the local maximum points with a spline function. The lower envelope is a smooth curve obtained by fitting all the local minimum points with a spline function. A spline function usually means a piecewise-defined polynomial curve. Fitting the maximum or minimum points with splines is simple to construct, convenient to use, and accurate. Specifically, the upper envelope is obtained by fitting all the local maximum points with Matlab's built-in spline function (spline), and the lower envelope is obtained by fitting all the local minimum points with the same function. Drawing the envelopes makes the time-domain curve of the speech signal to be processed smoother and clearer. Matlab is application software for numerical computation in mathematics and engineering.
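Step S222 fits splines through the extrema. The sketch below implements a natural cubic spline directly in NumPy instead of calling Matlab's `spline`. Matlab's `spline` uses not-a-knot end conditions by default, so the two envelopes will differ near the ends of the signal. The function name, the fallback to linear interpolation when there are fewer than three knots, and the toy signal are all assumptions.

```python
import numpy as np

def natural_cubic_spline(xk, yk, t):
    """Evaluate the natural cubic spline through knots (xk, yk) at points t.

    Knots must be strictly increasing. Points outside the knot range are
    extrapolated with the first or last cubic piece."""
    xk = np.asarray(xk, dtype=float)
    yk = np.asarray(yk, dtype=float)
    t = np.asarray(t, dtype=float)
    n = xk.size
    if n < 3:
        # Too few knots for a cubic: fall back to linear interpolation.
        return np.interp(t, xk, yk)
    h = np.diff(xk)
    # Solve the tridiagonal system for the second derivatives M at the
    # interior knots; natural end conditions set M[0] = M[-1] = 0.
    A = np.zeros((n - 2, n - 2))
    for r in range(n - 2):  # row r belongs to knot i = r + 1
        A[r, r] = 2.0 * (h[r] + h[r + 1])
        if r > 0:
            A[r, r - 1] = h[r]
        if r < n - 3:
            A[r, r + 1] = h[r + 1]
    rhs = 6.0 * (np.diff(yk[1:]) / h[1:] - np.diff(yk[:-1]) / h[:-1])
    M = np.zeros(n)
    M[1:-1] = np.linalg.solve(A, rhs)
    # Index j of the interval [xk[j], xk[j + 1]] used for each t.
    j = np.clip(np.searchsorted(xk, t) - 1, 0, n - 2)
    a = xk[j + 1] - t
    b = t - xk[j]
    hj = h[j]
    return ((M[j] * a**3 + M[j + 1] * b**3) / (6.0 * hj)
            + (yk[j] - M[j] * hj**2 / 6.0) * a / hj
            + (yk[j + 1] - M[j + 1] * hj**2 / 6.0) * b / hj)

# Upper and lower envelopes of a toy amplitude-modulated signal.
t = np.arange(200, dtype=float)
x = np.sin(2 * np.pi * t / 40) * (1 + 0.5 * np.sin(2 * np.pi * t / 200))
imax = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1  # strict local maxima
imin = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1  # strict local minima
upper = natural_cubic_spline(t[imax], x[imax], t)  # s_1(t)
lower = natural_cubic_spline(t[imin], x[imin], t)  # s_2(t)
```

Natural end conditions (zero second derivative at the first and last knot) keep the linear system small and tridiagonal. The ends of the signal remain the least reliable part of any envelope, however the envelope is constructed.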
S223: Compute the mean of the upper and lower envelopes.
Specifically, the mean of the upper and lower envelopes is computed with the formula
$$P=\frac{s_1(t)+s_2(t)}{2}$$
where P is the mean, $s_1(t)$ is the upper envelope as a function of time t, and $s_2(t)$ is the lower envelope as a function of time t. In this embodiment, this mean supports the subsequent screening of the initial signal components.
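The following worked example illustrates the envelope-mean formula in step S223 together with the subtraction $h_0(t)=s(t)-m_0(t)$ in step S224, where the mean plays the role of $m_0(t)$. The values at the single instant $t_0$ are invented for illustration only.

```latex
P(t_0)=\frac{s_1(t_0)+s_2(t_0)}{2}=\frac{0.9+(-0.5)}{2}=0.2,
\qquad
h_0(t_0)=s(t_0)-m_0(t_0)=0.6-0.2=0.4
```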
S224: Obtain an initial signal component from the speech signal to be processed and the mean. If the initial signal component meets the preset conditions, it is taken as an intermediate signal component.
The preset conditions are set in advance for screening signal components. They are: (1) the number of extreme points and the number of zero crossings of the signal are equal or differ by at most one; (2) the mean of the upper and lower envelopes is zero at every point in time. Here, the number of extreme points counts both local maxima and local minima. In this embodiment, only an initial signal component that meets both conditions is accepted as an intermediate signal component. This process effectively decomposes a noisy voice signal into a cleaner voice signal and so achieves speech enhancement.
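The two preset conditions of step S224 can be checked as shown below. The function names are hypothetical. Condition (2) is relaxed to a tolerance relative to the component's peak amplitude, because a computed envelope mean is essentially never exactly zero. The value `tol=0.05` is an arbitrary illustrative choice.

```python
import numpy as np

def count_extrema(h):
    """Number of local extrema: sign changes of the slope, ignoring flat steps."""
    d = np.sign(np.diff(h))
    d = d[d != 0]
    return int(np.count_nonzero(d[:-1] != d[1:]))

def count_zero_crossings(h):
    """Number of sign changes of h, ignoring samples that are exactly zero."""
    z = np.sign(h)
    z = z[z != 0]
    return int(np.count_nonzero(z[:-1] != z[1:]))

def satisfies_imf_conditions(h, envelope_mean, tol=0.05):
    """Condition 1: extrema and zero-crossing counts differ by at most one.
    Condition 2: envelope mean is zero, relaxed to max|mean| <= tol * max|h|."""
    cond1 = abs(count_extrema(h) - count_zero_crossings(h)) <= 1
    cond2 = np.max(np.abs(envelope_mean)) <= tol * np.max(np.abs(h))
    return bool(cond1 and cond2)

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
h = np.sin(2 * np.pi * 10 * t + 0.3)
print(satisfies_imf_conditions(h, np.zeros_like(h)))        # True
print(satisfies_imf_conditions(h + 2.0, np.zeros_like(h)))  # False: no zero crossings
```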
Specifically, the formula $h_0(t)=s(t)-m_0(t)$ is applied to the speech signal to be processed and the mean to obtain the initial signal component, where $h_0(t)$ is the initial signal component, $s(t)$ is the speech signal to be processed, $m_0(t)$ is the mean, and t is the time scale. If the initial signal component meets the preset conditions, it is taken as the first intermediate signal component. If it does not, it is treated as a new speech signal to be processed (that is, $h_0(t)$ takes the place of $s(t)$). Steps S221 to S223 and the subtraction above are then repeated until a first intermediate signal component that satisfies the preset conditions is obtained. Next, let $r_1(t)=s(t)-c_1(t)$, where $r_1(t)$ is the new speech signal to be processed and $c_1(t)$ is the first intermediate signal component, and repeat steps S221 to S224 to obtain the second intermediate signal component. This cycle is repeated until the remaining component is monotonic or its value is smaller than a first threshold, and then the loop ends. The first threshold is a predefined threshold for stopping the loop. After these cycles, N intermediate signal components are obtained, and the speech signal to be processed can be expressed as
$$s(t)=\sum_{k=1}^{N}c_k(t)+r_N(t)$$
where $c_k(t)$ is the k-th intermediate signal component and $r_N(t)$ is the final residue: the monotonic component, or the component whose value is smaller than the given threshold, that remains when the loop ends.
In this embodiment, the local extreme points of the speech signal to be processed, comprising local maximum and local minimum points, are first obtained. An upper envelope is constructed from the local maximum points and a lower envelope from the local minimum points, which makes the time-domain curve of the speech signal smoother and clearer. Next, the mean of the upper and lower envelopes is computed, and an initial signal component is obtained from the speech signal to be processed and the mean. If the initial signal component meets the preset conditions, it is taken as an intermediate signal component, which makes the signal stationary. Otherwise it is treated as a new speech signal to be processed, and the loop is repeated on it until N intermediate signal components are obtained. This decomposition effectively breaks a noisy voice signal down into a cleaner voice signal and so achieves speech enhancement.
In an embodiment, as shown in FIG. 5, the speech enhancement method further includes the following steps:
S411: Decompose the second signal components with the EEMD algorithm to obtain secondary-decomposition signal components.
To reduce noise more finely, achieve a better speech enhancement effect, and make speech recognition more accurate, this embodiment applies the EEMD algorithm a second time, to the second signal components, to obtain secondary-decomposition signal components. The decomposition process is the same as in step S20 and is not repeated here.
S412: Perform a correlation calculation between the digital voice signal and the secondary-decomposition signal components to obtain second correlation coefficients.
A second correlation coefficient is obtained by performing the correlation calculation between the digital voice signal and a secondary-decomposition signal component. It reflects how strongly that component is correlated with the digital voice signal. Specifically, the correlation calculation formula is
$$r_2=\frac{\operatorname{Cov}(a,b)}{\sqrt{\operatorname{Var}[a]\,\operatorname{Var}[b]}}$$
where a is the digital voice signal, b is the secondary-decomposition signal component, Cov(a, b) is the covariance of a and b, Var[a] is the variance of a, Var[b] is the variance of b, and $r_2$ is the second correlation coefficient. The covariance and variance are computed as in step S30 and are not repeated here. In this embodiment, the second correlation coefficients are computed so that they can be used to select the secondary-decomposition signal components most strongly correlated with the digital voice signal. This denoises the digital voice signal more finely and improves the accuracy of the voice signal.
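The correlation formula and the threshold selection of steps S412 and S413 can be sketched with NumPy. The same formula serves for the first correlation coefficient. The function names and the threshold value 0.5 are hypothetical, because the description does not state the preset threshold. Population covariance and variance are used in both the numerator and the denominator, so the result equals NumPy's `np.corrcoef`.

```python
import numpy as np

def correlation(a, b):
    """r = Cov(a, b) / sqrt(Var[a] * Var[b]), using population moments."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return cov / np.sqrt(a.var() * b.var())

def select_components(signal, components, threshold):
    """Keep the components whose correlation with `signal` exceeds `threshold`."""
    coeffs = np.array([correlation(signal, c) for c in components])
    keep = coeffs > threshold
    return np.asarray(components)[keep], coeffs

# Toy example: a strong low-frequency part and a weak high-frequency part.
t = np.linspace(0.0, 1.0, 500, endpoint=False)
c1 = np.sin(2 * np.pi * 5 * t)
c2 = 0.2 * np.sin(2 * np.pi * 97 * t)
signal = c1 + c2
kept, coeffs = select_components(signal, [c1, c2], threshold=0.5)  # hypothetical threshold
print(kept.shape)  # (1, 500)
```

Here the two tones are orthogonal over the window. The weak component's coefficient is therefore about 0.2 and falls below the threshold, while the dominant component's coefficient is close to 1.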
S413: Select the secondary-decomposition signal components whose second correlation coefficient is greater than the preset threshold as the updated second signal components.
The preset threshold is a predefined threshold for screening the secondary-decomposition signal components. It is the same as the preset threshold in step S40.
Specifically, the second correlation coefficient is a real number between -1 and 1. If it is greater than the preset threshold, the secondary-decomposition signal component is strongly correlated with the digital voice signal and carries a large amount of the signal's useful information. If it is smaller than the preset threshold, the component is weakly correlated with the digital voice signal, carries little useful information, and can be treated as noise. In this embodiment, the secondary-decomposition signal components are screened so that those most strongly correlated with the digital voice signal become the updated second signal components. This reduces noise interference and further improves the accuracy of the voice signal. The screening method is also simple to implement and improves the efficiency of speech enhancement.
In this embodiment, the second signal components are first decomposed with the EEMD algorithm into secondary-decomposition signal components. The correlation between the digital voice signal and each of these components is then calculated to obtain the second correlation coefficients. The secondary-decomposition signal components whose second correlation coefficient is greater than the preset threshold are selected as the updated second signal components, and these are later integrated to obtain the target speech information. This process denoises the voice signal more finely, yields purer voice information, and makes voiceprint recognition more accurate.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers do not limit the implementation of the embodiments of this application.
In one embodiment, FIG. 6 is a schematic diagram of a speech enhancement apparatus that corresponds one-to-one to the speech enhancement method in the above embodiments. As shown in FIG. 6, the apparatus includes a digital voice signal acquisition module 10, a first signal component acquisition module 20, a first correlation coefficient acquisition module 30, a second signal component acquisition module 40, and a target voice information acquisition module 50. The functional modules are described in detail as follows:
The digital voice signal acquisition module 10 is configured to convert the original voice information into a digital voice signal.
The first signal component acquisition module 20 is configured to decompose the digital voice signal with the EEMD algorithm to obtain first signal components.
The first correlation coefficient acquisition module 30 is configured to compute, with the correlation calculation formula, the correlation between the digital voice signal and each first signal component to obtain first correlation coefficients.
The second signal component acquisition module 40 is configured to select, as second signal components, the first signal components whose first correlation coefficient is greater than the preset threshold.
The target voice information acquisition module 50 is configured to integrate the second signal components to obtain the target voice information.
Specifically, the first signal component acquisition module 20 includes a to-be-processed speech signal acquisition unit 21, an intermediate signal component acquisition unit 22, and a first signal component acquisition unit 23.
The to-be-processed speech signal acquisition unit 21 is configured to add different normally distributed white noise sequences to the digital voice signal to obtain speech signals to be processed.
The intermediate signal component acquisition unit 22 is configured to perform EMD decomposition on the speech signals to be processed to obtain the corresponding intermediate signal components.
The first signal component acquisition unit 23 is configured to average the intermediate signal components to obtain the first signal components.
Specifically, the intermediate signal component acquisition unit 22 includes a local extreme point acquisition subunit 221, an envelope construction subunit 222, a mean acquisition subunit 223, and an intermediate signal component acquisition subunit 224.
The local extreme point acquisition subunit 221 is configured to obtain the local extreme points of a speech signal to be processed, comprising local maximum points and local minimum points.
The envelope construction subunit 222 is configured to construct an upper envelope from the local maximum points and a lower envelope from the local minimum points.
The mean acquisition subunit 223 is configured to compute the mean of the upper and lower envelopes.
The intermediate signal component acquisition subunit 224 is configured to obtain an initial signal component from the speech signal to be processed and the mean, and to take the initial signal component as an intermediate signal component if it meets the preset conditions.
Specifically, the correlation calculation formula is
$$r=\frac{\operatorname{Cov}(x,y)}{\sqrt{\operatorname{Var}[x]\,\operatorname{Var}[y]}}$$
where x is the digital voice signal, y is the first signal component, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the first correlation coefficient.
Specifically, the speech enhancement apparatus further includes a secondary-decomposition signal component acquisition unit 411, a second correlation coefficient acquisition unit 412, and a second signal component update unit 413.
The secondary-decomposition signal component acquisition unit 411 is configured to decompose the second signal components with the EEMD algorithm to obtain secondary-decomposition signal components.
The second correlation coefficient acquisition unit 412 is configured to perform a correlation calculation between the digital voice signal and the secondary-decomposition signal components to obtain second correlation coefficients.
The second signal component update unit 413 is configured to select the secondary-decomposition signal components whose second correlation coefficient is greater than the preset threshold as the updated second signal components.
Specifically, the target voice information acquisition module 50 integrates the second signal components with the formula
$$Z=\sum_{n=1}^{N}S_n$$
(N is a positive integer) to obtain the target speech signal, where $S_n$ is the n-th second signal component, N is the total number of second signal components, and Z is the target voice information.
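The integration formula $Z=\sum_{n=1}^{N}S_n$ is a sample-wise sum of the retained components. The minimal NumPy sketch below uses a hypothetical function name and assumes the components are stacked as rows of equal length.

```python
import numpy as np

def integrate_components(components):
    """Z = S_1 + S_2 + ... + S_N: sample-wise sum of the retained components."""
    components = np.asarray(components, dtype=float)
    if components.ndim != 2 or components.shape[0] == 0:
        raise ValueError("expected a non-empty (N, num_samples) array")
    return components.sum(axis=0)

S = np.array([[0.5, -0.25, 1.0], [0.25, 0.5, -1.0]])
print(integrate_components(S).tolist())  # [0.75, 0.25, 0.0]
```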
For the specific limitations of the speech enhancement apparatus, refer to the limitations of the speech enhancement method above; they are not repeated here. Each module of the apparatus may be implemented in whole or in part by software, hardware, or a combination of the two. The modules may be embedded in, or independent of, the processor of a computer device in hardware form. Alternatively, they may be stored in the memory of the computer device in software form, so that the processor can invoke them to perform the corresponding operations.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database, and the internal memory provides an environment for running the operating system and computer-readable instructions stored in the non-volatile storage medium. The database stores data generated or obtained while executing the speech enhancement method, such as the target speech signal. The network interface communicates with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a speech enhancement method.
In one embodiment, a computer device is provided that includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the following steps: converting the original voice information to obtain a digital voice signal; decomposing the digital voice signal with the EEMD algorithm to obtain first signal components; computing, with the correlation calculation formula, the correlation between the digital voice signal and each first signal component to obtain first correlation coefficients; selecting the first signal components whose first correlation coefficient is greater than a preset threshold as second signal components; and integrating the second signal components to obtain target speech information.
Specifically, the correlation calculation formula is
$$r=\frac{\operatorname{Cov}(x,y)}{\sqrt{\operatorname{Var}[x]\,\operatorname{Var}[y]}}$$
where x is the digital voice signal, y is the first signal component, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the first correlation coefficient.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps: adding different normally distributed white noise sequences to the digital voice signal to obtain speech signals to be processed; performing EMD decomposition on the speech signals to be processed to obtain the corresponding intermediate signal components; and averaging the intermediate signal components to obtain the first signal components.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps: obtaining the local extreme points of a speech signal to be processed, comprising local maximum points and local minimum points; constructing an upper envelope from the local maximum points and a lower envelope from the local minimum points; computing the mean of the upper and lower envelopes; and obtaining an initial signal component from the speech signal to be processed and the mean, where the initial signal component is taken as an intermediate signal component if it meets the preset conditions.
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps: decomposing the second signal components with the EEMD algorithm to obtain secondary-decomposition signal components; performing a correlation calculation between the digital voice signal and the secondary-decomposition signal components to obtain second correlation coefficients; and selecting the secondary-decomposition signal components whose second correlation coefficient is greater than the preset threshold as the updated second signal components.
In one embodiment, the processor further implements the following step when executing the computer-readable instructions: performing integration processing on the second signal components using the formula
$$Z = \sum_{n=1}^{N} S_n$$
(N is a positive integer) to obtain the target speech signal, where S_n denotes the n-th second signal component, N denotes the total number of second signal components, and Z denotes the target speech information.
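Assume the integration formula is a sample-wise sum of the second signal components, Z = S_1 + S_2 + ... + S_N. Under that assumption, the integration step reduces to the short sketch below. The function name is hypothetical.

```python
from typing import List, Sequence


def integrate_components(components: Sequence[Sequence[float]]) -> List[float]:
    """Sum equally long components sample by sample: Z[t] = sum over n of S_n[t]."""
    if not components:
        raise ValueError("at least one component is required")
    n = len(components[0])
    if any(len(c) != n for c in components):
        raise ValueError("all components must have the same length")
    return [sum(values) for values in zip(*components)]
```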
In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to implement the following steps: converting the original speech information to obtain a digital speech signal; decomposing the digital speech signal using the EEMD algorithm to obtain a first signal component; performing a correlation calculation on the digital speech signal and the first signal component using a correlation formula to obtain a first correlation coefficient; selecting a first signal component whose first correlation coefficient is greater than a preset threshold as a second signal component; and performing integration processing on the second signal component to obtain the target speech information.
Specifically, the correlation formula is
$$r = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}[x]\,\operatorname{Var}[y]}}$$
where x is the digital speech signal, y is the first signal component, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the first correlation coefficient.
In one embodiment, the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to implement the following steps: adding different normally distributed white-noise sequences to the digital speech signal to obtain the speech signal to be processed; performing EMD decomposition on the speech signal to be processed to obtain the intermediate signal components corresponding to it; and averaging the intermediate signal components to obtain the first signal component.
In one embodiment, the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to implement the following steps: obtaining the local extreme points of the speech signal to be processed, the local extreme points comprising local maxima and local minima; constructing an upper envelope from all the local maxima and a lower envelope from all the local minima; obtaining the mean of the upper and lower envelopes; and obtaining an initial signal component based on the speech signal to be processed and the mean, the initial signal component being taken as an intermediate signal component if it meets a preset condition.
In one embodiment, the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to implement the following steps: decomposing the second signal component using the EEMD algorithm to obtain secondary decomposition signal components; performing a correlation calculation on the digital speech signal and the secondary decomposition signal components to obtain second correlation coefficients; and selecting the secondary decomposition signal components whose second correlation coefficient is greater than the preset threshold as the updated second signal components.
In one embodiment, the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to implement the following step: performing integration processing on the second signal components using the formula
$$Z = \sum_{n=1}^{N} S_n$$
(N is a positive integer) to obtain the target speech signal, where S_n denotes the n-th second signal component, N denotes the total number of second signal components, and Z denotes the target speech information.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional units and modules is given only as an example. In practical applications, the above functions may be allocated to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims (20)

  1. A speech enhancement method, comprising:
    converting original speech information to obtain a digital speech signal;
    decomposing the digital speech signal using an EEMD algorithm to obtain a first signal component;
    performing a correlation calculation on the digital speech signal and the first signal component using a correlation formula to obtain a first correlation coefficient;
    selecting a first signal component whose first correlation coefficient is greater than a preset threshold as a second signal component; and
    performing integration processing on the second signal component to obtain target speech information.
  2. The speech enhancement method according to claim 1, wherein decomposing the digital speech signal using the EEMD algorithm to obtain the first signal component comprises:
    adding different normally distributed white-noise sequences to the digital speech signal to obtain a speech signal to be processed;
    performing EMD decomposition on the speech signal to be processed to obtain an intermediate signal component corresponding to the speech signal to be processed; and
    averaging the intermediate signal component to obtain the first signal component.
  3. The speech enhancement method according to claim 2, wherein performing EMD decomposition on the speech signal to be processed to obtain the intermediate signal component corresponding to the speech signal to be processed comprises:
    obtaining local extreme points of the speech signal to be processed, the local extreme points comprising local maxima and local minima;
    constructing an upper envelope based on the local maxima among all the local extreme points, and constructing a lower envelope based on the local minima among all the local extreme points;
    obtaining a mean of the upper envelope and the lower envelope; and
    obtaining an initial signal component based on the speech signal to be processed and the mean, wherein, if the initial signal component meets a preset condition, the initial signal component is the intermediate signal component.
  4. The speech enhancement method according to claim 1, wherein the correlation formula is
    $$r = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}[x]\,\operatorname{Var}[y]}}$$
    where x is the digital speech signal, y is the first signal component, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the first correlation coefficient.
  5. The speech enhancement method according to claim 1, wherein, after the step of selecting the first signal component whose first correlation coefficient is greater than the preset threshold as the second signal component, the speech enhancement method further comprises:
    decomposing the second signal component using the EEMD algorithm to obtain a secondary decomposition signal component;
    performing a correlation calculation on the digital speech signal and the secondary decomposition signal component to obtain a second correlation coefficient; and
    selecting a secondary decomposition signal component whose second correlation coefficient is greater than the preset threshold as an updated second signal component.
  6. The speech enhancement method according to claim 1, wherein performing integration processing on the second signal component to obtain the target speech information comprises:
    performing integration processing on the second signal component using the formula
    $$Z = \sum_{n=1}^{N} S_n$$
    to obtain a target speech signal, where S_n denotes the n-th second signal component, N is a positive integer denoting the total number of second signal components, and Z denotes the target speech information.
  7. A speech enhancement apparatus, comprising:
    a digital speech signal acquisition module, configured to convert original speech information to obtain a digital speech signal;
    a first signal component acquisition module, configured to decompose the digital speech signal using an EEMD algorithm to obtain a first signal component;
    a first correlation coefficient acquisition module, configured to perform a correlation calculation on the digital speech signal and the first signal component using a correlation formula to obtain a first correlation coefficient;
    a second signal component acquisition module, configured to select a first signal component whose first correlation coefficient is greater than a preset threshold as a second signal component; and
    a target speech information acquisition module, configured to perform integration processing on the second signal component to obtain target speech information.
  8. The speech enhancement apparatus according to claim 7, further comprising:
    a secondary decomposition signal component acquisition unit, configured to decompose the second signal component using the EEMD algorithm to obtain a secondary decomposition signal component;
    a second correlation coefficient acquisition unit, configured to perform a correlation calculation on the digital speech signal and the secondary decomposition signal component to obtain a second correlation coefficient; and
    a second signal component updating unit, configured to select a secondary decomposition signal component whose second correlation coefficient is greater than the preset threshold as an updated second signal component.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    converting original speech information to obtain a digital speech signal;
    decomposing the digital speech signal using an EEMD algorithm to obtain a first signal component;
    performing a correlation calculation on the digital speech signal and the first signal component using a correlation formula to obtain a first correlation coefficient;
    selecting a first signal component whose first correlation coefficient is greater than a preset threshold as a second signal component; and
    performing integration processing on the second signal component to obtain target speech information.
  10. The computer device according to claim 9, wherein decomposing the digital speech signal using the EEMD algorithm to obtain the first signal component comprises:
    adding different normally distributed white-noise sequences to the digital speech signal to obtain a speech signal to be processed;
    performing EMD decomposition on the speech signal to be processed to obtain an intermediate signal component corresponding to the speech signal to be processed; and
    averaging the intermediate signal component to obtain the first signal component.
  11. The computer device according to claim 10, wherein performing EMD decomposition on the speech signal to be processed to obtain the intermediate signal component corresponding to the speech signal to be processed comprises:
    obtaining local extreme points of the speech signal to be processed, the local extreme points comprising local maxima and local minima;
    constructing an upper envelope based on the local maxima among all the local extreme points, and constructing a lower envelope based on the local minima among all the local extreme points;
    obtaining a mean of the upper envelope and the lower envelope; and
    obtaining an initial signal component based on the speech signal to be processed and the mean, wherein, if the initial signal component meets a preset condition, the initial signal component is the intermediate signal component.
  12. The computer device according to claim 9, wherein the correlation formula is
    $$r = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}[x]\,\operatorname{Var}[y]}}$$
    where x is the digital speech signal, y is the first signal component, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the first correlation coefficient.
  13. The computer device according to claim 9, wherein, after the step of selecting the first signal component whose first correlation coefficient is greater than the preset threshold as the second signal component, the processor, when executing the computer-readable instructions, further implements the following steps:
    decomposing the second signal component using the EEMD algorithm to obtain a secondary decomposition signal component;
    performing a correlation calculation on the digital speech signal and the secondary decomposition signal component to obtain a second correlation coefficient; and
    selecting a secondary decomposition signal component whose second correlation coefficient is greater than the preset threshold as an updated second signal component.
  14. The computer device according to claim 9, wherein performing integration processing on the second signal component to obtain the target speech information comprises:
    performing integration processing on the second signal component using the formula
    $$Z = \sum_{n=1}^{N} S_n$$
    to obtain a target speech signal, where S_n denotes the n-th second signal component, N is a positive integer denoting the total number of second signal components, and Z denotes the target speech information.
  15. One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    converting original speech information to obtain a digital speech signal;
    decomposing the digital speech signal using an EEMD algorithm to obtain a first signal component;
    performing a correlation calculation on the digital speech signal and the first signal component using a correlation formula to obtain a first correlation coefficient;
    selecting a first signal component whose first correlation coefficient is greater than a preset threshold as a second signal component; and
    performing integration processing on the second signal component to obtain target speech information.
  16. The non-volatile readable storage media according to claim 15, wherein decomposing the digital speech signal using the EEMD algorithm to obtain the first signal component comprises:
    adding different normally distributed white-noise sequences to the digital speech signal to obtain a speech signal to be processed;
    performing EMD decomposition on the speech signal to be processed to obtain an intermediate signal component corresponding to the speech signal to be processed; and
    averaging the intermediate signal component to obtain the first signal component.
  17. The non-volatile readable storage media according to claim 16, wherein performing EMD decomposition on the speech signal to be processed to obtain the intermediate signal component corresponding to the speech signal to be processed comprises:
    obtaining local extreme points of the speech signal to be processed, the local extreme points comprising local maxima and local minima;
    constructing an upper envelope based on the local maxima among all the local extreme points, and constructing a lower envelope based on the local minima among all the local extreme points;
    obtaining a mean of the upper envelope and the lower envelope; and
    obtaining an initial signal component based on the speech signal to be processed and the mean, wherein, if the initial signal component meets a preset condition, the initial signal component is the intermediate signal component.
  18. The non-volatile readable storage media according to claim 15, wherein the correlation formula is
    $$r = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}[x]\,\operatorname{Var}[y]}}$$
    where x is the digital speech signal, y is the first signal component, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the first correlation coefficient.
  19. The non-volatile readable storage media according to claim 15, wherein, after the step of selecting the first signal component whose first correlation coefficient is greater than the preset threshold as the second signal component, the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to perform the following steps:
    decomposing the second signal component using the EEMD algorithm to obtain a secondary decomposition signal component;
    performing a correlation calculation on the digital speech signal and the secondary decomposition signal component to obtain a second correlation coefficient; and
    selecting a secondary decomposition signal component whose second correlation coefficient is greater than the preset threshold as an updated second signal component.
  20. The non-volatile readable storage media according to claim 15, wherein performing integration processing on the second signal component to obtain the target speech information comprises:
    performing integration processing on the second signal component using the formula
    $$Z = \sum_{n=1}^{N} S_n$$
    to obtain a target speech signal, where S_n denotes the n-th second signal component, N is a positive integer denoting the total number of second signal components, and Z denotes the target speech information.
PCT/CN2018/094410 2018-05-29 2018-07-04 Speech enhancement method and apparatus, computer device, and storage medium WO2019227589A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810528846.0 2018-05-29
CN201810528846.0A CN108682429A (en) 2018-05-29 2018-05-29 Sound enhancement method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2019227589A1 true WO2019227589A1 (en) 2019-12-05

Family

ID=63807090

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094410 WO2019227589A1 (en) 2018-05-29 2018-07-04 Speech enhancement method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN108682429A (en)
WO (1) WO2019227589A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109671433B (en) * 2019-01-10 2023-06-16 腾讯科技(深圳)有限公司 Keyword detection method and related device
CN109785854B (en) * 2019-01-21 2021-07-13 福州大学 Speech enhancement method combining empirical mode decomposition and wavelet threshold denoising
CN111107478B (en) * 2019-12-11 2021-04-09 江苏爱谛科技研究院有限公司 Sound enhancement method and sound enhancement system
CN112002343B (en) * 2020-08-18 2024-01-23 海尔优家智能科技(北京)有限公司 Speech purity recognition method and device, storage medium and electronic device
CN112420066A (en) * 2020-11-05 2021-02-26 深圳市卓翼科技股份有限公司 Noise reduction method, noise reduction device, computer equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853666A (en) * 2009-03-30 2010-10-06 华为技术有限公司 Speech enhancement method and device
US8219390B1 (en) * 2003-09-16 2012-07-10 Creative Technology Ltd Pitch-based frequency domain voice removal
CN104299620A (en) * 2014-09-22 2015-01-21 河海大学 Speech enhancement method based on EMD algorithm
CN107045874A (en) * 2016-02-05 2017-08-15 深圳市潮流网络技术有限公司 A kind of Non-linear Speech Enhancement Method based on correlation

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7941298B2 (en) * 2006-09-07 2011-05-10 DynaDx Corporation Noise-assisted data analysis method, system and program product therefor
CN103106903B (en) * 2013-01-11 2014-10-22 太原科技大学 Single channel blind source separation method
CN104679981A (en) * 2014-12-25 2015-06-03 新疆大学 Vibration signal noise reduction method based on variable-step-length LMS-EEMD
CN105788603B (en) * 2016-02-25 2019-04-16 深圳创维数字技术有限公司 A kind of audio identification methods and system based on empirical mode decomposition
CN106798554A (en) * 2017-01-12 2017-06-06 安徽大学 A kind of denoising method of noisy IMF components and electrocardiosignal


Also Published As

Publication number Publication date
CN108682429A (en) 2018-10-19

Similar Documents

Publication Publication Date Title
WO2019227589A1 (en) Speech enhancement method and apparatus, computer device, and storage medium
CN108831500B (en) Speech enhancement method, device, computer equipment and storage medium
US10621971B2 (en) Method and device for extracting speech feature based on artificial intelligence
WO2020107269A1 (en) Self-adaptive speech enhancement method, and electronic device
US20210193149A1 (en) Method, apparatus and device for voiceprint recognition, and medium
WO2021189642A1 (en) Method and device for signal processing, computer device, and storage medium
CN110797041B (en) Speech noise reduction processing method and device, computer equipment and storage medium
WO2020192009A1 (en) Silence detection method based on neural network, and terminal device and medium
WO2019237519A1 (en) General vector training method, voice clustering method, apparatus, device and medium
CN108922543B (en) Model base establishing method, voice recognition method, device, equipment and medium
CN111261182B (en) Wind noise suppression method and system suitable for cochlear implant
WO2022141868A1 (en) Method and apparatus for extracting speech features, terminal, and storage medium
WO2019232826A1 (en) I-vector extraction method, speaker recognition method and apparatus, device, and medium
WO2021127978A1 (en) Speech synthesis method and apparatus, computer device and storage medium
EP3066664A1 (en) Speech processing system
WO2019232848A1 (en) Voice distinguishing method and device, computer device and storage medium
Hasannezhad et al. PACDNN: A phase-aware composite deep neural network for speech enhancement
CN113345463A (en) Voice enhancement method, device, equipment and medium based on convolutional neural network
WO2019227588A1 (en) Voice enhancement method and apparatus, and computer device and storage medium
WO2021104189A1 (en) Method, apparatus, and device for generating high-sampling rate speech waveform, and storage medium
CN114362186B (en) Power system tide adjusting method, device, equipment and storage medium
CN116152549A (en) Image classification model and image classification method based on depth network
CN112309404B (en) Machine voice authentication method, device, equipment and storage medium
Lu et al. Temporal modulation normalization for robust speech feature extraction and recognition
CN111192569A (en) Double-microphone voice feature extraction method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18920716

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.03.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18920716

Country of ref document: EP

Kind code of ref document: A1