WO2019227588A1 - Voice enhancement method and apparatus, and computer device and storage medium - Google Patents

Info

Publication number
WO2019227588A1
WO2019227588A1 PCT/CN2018/094409 CN2018094409W WO2019227588A1 WO 2019227588 A1 WO2019227588 A1 WO 2019227588A1 CN 2018094409 W CN2018094409 W CN 2018094409W WO 2019227588 A1 WO2019227588 A1 WO 2019227588A1
Authority
WO
WIPO (PCT)
Prior art keywords
singular
matrix
singular values
singular value
target
Prior art date
Application number
PCT/CN2018/094409
Other languages
French (fr)
Chinese (zh)
Inventor
涂宏
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2019227588A1 publication Critical patent/WO2019227588A1/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0264Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L21/0202

Definitions

  • the present application relates to the field of signal processing, and in particular, to a method, a device, a computer device, and a storage medium for voice enhancement.
  • the voice signals collected on computer equipment include both the voice information corresponding to the voice of the speaker, the voice information being valid information, and noise information other than the voice of the speaker.
  • the speech signals collected by the computer equipment need to be enhanced (that is, noise reduction processing is performed on the speech signals) in order to extract speech signals that are as pure as possible and make speech recognition more accurate.
  • the accuracy of the currently extracted speech signal after speech enhancement processing on the speech signal is not high, which is not conducive to subsequent speech recognition.
  • a speech enhancement method includes:
  • a voice enhancement device includes:
  • Digital voice signal acquisition module for converting original voice information to obtain digital voice signals
  • a Hankel matrix acquisition module configured to acquire a Hankel matrix based on the digital speech signal
  • a singular value acquisition module configured to perform singular value decomposition operation processing on the Hankel matrix to obtain at least two singular values
  • a target voice signal acquisition module configured to perform an inverse singular value decomposition operation on at least two of the singular values to obtain a target voice signal
  • a target voice information acquisition module is configured to perform restoration processing on the target voice signal to acquire target voice information.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • when the processor executes the computer-readable instructions, the following steps are implemented:
  • One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
  • FIG. 1 is an application environment diagram of a speech enhancement method according to an embodiment of the present application
  • FIG. 2 is a flowchart of a speech enhancement method according to an embodiment of the present application.
  • FIG. 3 is a specific flowchart of step S30 in FIG. 2;
  • FIG. 4 is a specific flowchart of step S40 in FIG. 2;
  • FIG. 5 is a specific flowchart of step S411 in FIG. 4;
  • FIG. 6 is a specific flowchart of step S40 in FIG. 2;
  • FIG. 7 is a schematic diagram of a speech enhancement device according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a computer device according to an embodiment of the present application.
  • the voice enhancement method provided by the embodiment of the present application may be applied in the application environment shown in FIG. 1, where a computer device communicates with a server through a network.
  • Computer devices can be, but are not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices.
  • the server can be implemented as a stand-alone server.
  • the speech enhancement method can be specifically applied to computer equipment configured by financial institutions such as banks, securities, and insurance, or other institutions, and is used to enhance speech signals during speech recognition to improve the accuracy of recognition.
  • the speech enhancement method is applied to the server in FIG. 1 as an example for description, and includes the following steps:
  • the original voice information is voice information of a speaker collected by a recording module (such as a microphone) in a computer device.
  • the original voice information may be voice information in wav, mp3, or other formats.
  • Digital voice signals refer to discrete digital signals obtained by converting original voice information. Since computer equipment can only process binary data and cannot directly process the original voice information, the original voice information needs to be converted into digital voice signals.
  • the server receives the original voice information sent by the computer device, and reads the original voice information by using a command function for reading an audio file in the Python module to obtain a digital voice signal.
  • the command function for reading an audio file may be wave.open(file, "rb"), where file is the original voice information and "rb" denotes the read-file operation.
  • the command function for reading an audio file is used to read and obtain the original voice information.
  • the one-dimensional array of the received audio files is the digital voice signal.
  • a Python module is a module containing a large number of encapsulated functions written in Python, an object-oriented, interpreted computer programming language.
  • a command function for reading an audio file in the Python module is used to directly read the original voice information to obtain a digital voice signal, which is simple to implement.
  • the digital voice signal is a one-dimensional digital signal obtained by converting the original voice information, that is, by directly reading the original voice information with the command function for reading an audio file in the Python module.
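The reading step described above can be sketched with Python's standard `wave` module. This is a minimal illustration rather than the applicant's implementation: the helper name `read_wav_as_signal` is hypothetical, and the sketch assumes a 16-bit mono PCM wav file.

```python
import array
import sys
import wave


def read_wav_as_signal(path):
    """Return (samples, sample_rate) for a 16-bit mono PCM wav file."""
    with wave.open(path, "rb") as wav_file:
        # Assumption of this sketch: one channel, two bytes per sample.
        if wav_file.getnchannels() != 1 or wav_file.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        sample_rate = wav_file.getframerate()
        raw_bytes = wav_file.readframes(wav_file.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw_bytes)
    if sys.byteorder == "big":
        samples.byteswap()  # wav sample data is stored little-endian
    return list(samples), sample_rate
```

The returned list plays the role of the one-dimensional digital voice signal, and the sample rate is kept because it is needed later when the audio file is written back.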
  • Hankel matrix refers to a square matrix with equal elements on each subdiagonal.
  • the elements in the j-th row of the Hankel matrix are formed by shifting the elements of the previous row one element to the left, so that the elements on each subdiagonal of the Hankel matrix are equal; that is, each element is equal to the adjacent element at its lower left.
  • the diagonal from the upper right corner to the lower left corner is the sub diagonal.
  • the elements of the first column and the last row of the Hankel matrix need to be defined in advance in order to determine the rows and columns of the Hankel matrix.
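As a small illustration of the structure just described, the sketch below builds a Hankel matrix from a one-dimensional signal so that entry (i, j) equals sample i + j. The function name `build_hankel` and the `rows` parameter are assumptions made for this example; the text describes a square matrix, while this sketch also permits a rectangular one.

```python
def build_hankel(signal, rows):
    """Build a rows x cols Hankel matrix with entry [i][j] = signal[i + j]."""
    cols = len(signal) - rows + 1
    if rows < 1 or cols < 1:
        raise ValueError("rows must be between 1 and len(signal)")
    # Each row is the previous row shifted one element to the left.
    return [[signal[i + j] for j in range(cols)] for i in range(rows)]


hankel = build_hankel([1, 2, 3, 4, 5], rows=3)
# hankel == [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

The first column and the last row together contain every sample of the signal, which is why they are enough to define the whole matrix.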
  • Singular Value Decomposition (SVD Decomposition for short) is an important matrix factorization in linear algebra.
  • This singular value decomposition operation can effectively reduce the dimension of a large amount of data to reduce the amount of calculation and save operation time.
  • the server performs singular value decomposition on the Hankel matrix to obtain two unitary matrices and a semi-positive definite diagonal matrix.
  • the values on the diagonal of the semi-positive definite diagonal matrix are the singular values.
  • there are generally N singular values (N > 2), arranged in order from largest to smallest.
  • the singular value can represent the important information hidden in the matrix, and the importance is positively related to the size of the singular value. Understandably, the larger the singular value is, the larger the effective information amount of the digital voice signal contained in the singular value is; conversely, the smaller the singular value is, the more noise it is considered to contain.
  • the server obtains at least two singular values by performing singular value decomposition operation processing on the Hankel matrix, and can intuitively observe the degree of effective information contained in the singular values, which is convenient for noise reduction processing.
  • the unitary matrix refers to a matrix that satisfies the condition that n column vectors in the matrix are orthogonal unit vectors, that is, the conjugate transpose of the unitary matrix is equal to its inverse matrix.
  • a semi-positive definite diagonal matrix refers to a matrix that is both a semi-positive definite matrix and a diagonal matrix.
  • a semi-positive definite matrix is an n-th order square matrix A with X'AX ≥ 0 (X' represents the transpose of X) for any non-zero vector X.
  • a diagonal matrix is a matrix with zero elements except the main diagonal (the diagonal from the upper left corner to the lower right corner).
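The decomposition into two unitary matrices and a diagonal matrix of singular values can be illustrated with NumPy. Using `numpy.linalg.svd` is an assumption of this sketch; the source does not name a library for this step.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [3.0, 4.0, 5.0]])

# U and Vh are unitary; s holds the singular values, largest first.
U, s, Vh = np.linalg.svd(A)

# Multiplying the three factors back together recovers the Hankel matrix.
reconstructed = U @ np.diag(s) @ Vh
```

Here `np.diag(s)` corresponds to the semi-positive definite diagonal matrix, and `Vh` is already the conjugate transpose of the second unitary matrix.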
  • in step S30, singular value decomposition operation processing is performed on the Hankel matrix to obtain at least two singular values, which specifically includes the following steps:
  • the transposed matrix of the Hankel matrix refers to the matrix obtained by mirroring all elements of the Hankel matrix about the main diagonal, that is, the 45-degree line running down and to the right from the element in the first row and first column.
  • obtaining the transposed matrix of the Hankel matrix provides technical support for the subsequent acquisition of eigenvalues.
  • let A be the Hankel matrix and $A^{T}$ be its transposed matrix.
  • based on the product of the Hankel matrix and the transposed matrix, at least two eigenvalues are obtained, which specifically includes the following process:
  • a matrix determinant is used to process the matrix B and the matrix B ′ to obtain at least two eigenvalues.
  • the calculation formula of the matrix determinant is $D = \sum (-1)^{\tau} a_{1k_1} a_{2k_2} \cdots a_{nk_n}$, where $\sum$ denotes the sum over all permutations $k_1 k_2 \ldots k_n$, $\tau$ is the inversion number of the permutation $k_1 k_2 \ldots k_n$, and D is called the determinant of the matrix.
  • the server obtains eigenvalues and eigenvectors based on the product of the Hankel matrix and the transposed matrix to achieve the purpose of data dimensionality reduction.
  • S33 Operate at least two eigenvalues according to a preset calculation method to obtain at least two singular values.
  • the preset calculation method refers to a predefined calculation method for calculating singular values by calculating characteristic values.
  • the server uses the formula $\sigma_i = \sqrt{\lambda_i}$: by performing a square root operation on at least two eigenvalues, at least two singular values can be obtained, where $\sigma_i$ is a singular value and $\lambda_i$ is an eigenvalue.
  • the server performs a square root operation on the eigenvalues to obtain a singular value. The calculation is simple and the efficiency is improved.
  • u i is a feature vector corresponding to the eigenvalues of matrix B
  • v i is a feature vector corresponding to the eigenvalues of matrix B ′.
  • the transposed matrix of the Hankel matrix is first calculated so that at least two eigenvalues can be obtained based on the product of the Hankel matrix and the transposed matrix; working with the eigenvalues of this product serves the purpose of reducing the dimension of the data. Finally, a square root operation is performed on the at least two eigenvalues to obtain at least two singular values. This method for obtaining the singular values is simple to calculate and easy to implement.
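The route from eigenvalues to singular values can be checked numerically. The sketch below assumes that the product in question is $A A^{T}$ and uses NumPy's `eigvalsh` as an eigenvalue solver instead of expanding a determinant by hand; both choices are assumptions made for illustration.

```python
import numpy as np

# A small full-rank Hankel matrix built from the samples 1, 2, 4, 3, 5.
A = np.array([[1.0, 2.0, 4.0],
              [2.0, 4.0, 3.0],
              [4.0, 3.0, 5.0]])

B = A @ A.T  # symmetric, so its eigenvalues are real and non-negative
eigenvalues = np.linalg.eigvalsh(B)[::-1]      # reorder largest first
eigenvalues = np.clip(eigenvalues, 0.0, None)  # guard against round-off
singular_values = np.sqrt(eigenvalues)         # sigma_i = sqrt(lambda_i)
```

The result agrees with the singular values returned by a direct singular value decomposition of `A`, up to floating-point error.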
  • S40 Perform inverse singular value decomposition operation on at least two singular values to obtain a target speech signal.
  • the singular value decomposition inverse operation refers to restoring each singular value into a semi-positive definite diagonal matrix, and multiplying the semi-positive definite diagonal matrix by the two unitary matrices obtained by the previous singular value decomposition operation to obtain the target speech signal.
  • the target speech signal is a denoised speech signal obtained by performing singular value decomposition on a digital speech signal.
  • the server performs an inverse singular value decomposition operation on at least two singular values to obtain a voice signal (that is, a target voice signal) corresponding to each singular value, so as to achieve the purpose of voice enhancement.
  • in step S40, the singular value decomposition inverse operation is performed on at least two singular values to obtain a target voice signal, which specifically includes the following steps:
  • S411 Perform singular value decomposition and inverse operation processing on at least two singular values, respectively, to obtain an original signal component corresponding to each singular value.
  • the original signal component is a signal component obtained by performing singular value decomposition inverse operation processing on at least two singular values respectively. Specifically, each singular value is reduced (the position of the singular value in the matrix is unchanged) into a semi-positive definite diagonal matrix, and multiplied by two unitary matrices obtained from the previous singular value decomposition operation to obtain each singular value. Corresponding original signal component.
  • S412 Perform correlation calculation between the original signal component and the digital voice signal to obtain a correlation coefficient.
  • the correlation coefficient is a calculation result obtained by performing correlation calculation on the digital voice signal and the original signal component.
  • the correlation coefficient reflects the degree of correlation between the digital speech signal and the original signal component, and also reflects the degree to which the signal component contains effective information.
  • the correlation calculation formula is $r = \dfrac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}[x]\,\operatorname{Var}[y]}}$, where x is the original signal component, y is the digital voice signal, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the correlation coefficient.
  • Cov (x, y) is calculated as:
  • E (x) represents the average value of the original signal components
  • E (y) represents the average value of the digital speech signals
  • n represents the number of original signal components
  • y j represents the j-th digital speech signal on the time scale.
  • x j represents the j-th original signal component on the same time scale.
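The correlation calculation can be sketched in plain Python as follows. The function name is hypothetical, and the 1/n normalisation of the covariance and variances is an assumption; any consistent normalisation cancels in the ratio, so r is unaffected.

```python
import math


def correlation_coefficient(x, y):
    """Pearson correlation r = Cov(x, y) / sqrt(Var[x] * Var[y])."""
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n  # E(x)
    mean_y = sum(y) / n  # E(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant sequence as uncorrelated
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 indicates that the component follows the digital voice signal closely, while a value near 0 indicates little shared information.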
  • S413 Select an original signal component whose correlation coefficient is greater than a preset threshold as a target signal component.
  • the preset threshold is a predefined threshold for filtering the original signal components.
  • the target signal component is an original signal component obtained by performing a filtering operation on the original signal component using a preset threshold.
  • the preset threshold is selected as a real number between 0 and 1. If the correlation coefficient is greater than the preset threshold, the original signal component has a large correlation with the digital voice signal and contains a large amount of effective information of the digital voice signal. If the correlation coefficient is not greater than the preset threshold, the correlation between the original signal component and the digital voice signal is small, the amount of effective information contained in the original signal component is small, and the component may be treated as noise by default.
  • the original signal components are filtered to obtain the original signal components with greater correlation with the digital speech signal as the target signal components to reduce noise interference and achieve the purpose of speech enhancement.
  • the method for screening original signal components is simple to implement and improves the efficiency of speech enhancement.
  • S414 Perform linear superposition processing on the target signal components to obtain a target voice signal.
  • the server first obtains the original signal component corresponding to each singular value by performing inverse singular value decomposition processing on each singular value, and then performs correlation calculation between each original signal component and the digital voice signal to obtain a correlation coefficient. The correlation coefficient reflects the degree of correlation between the digital speech signal and the original signal component, and also reflects the degree to which the signal component contains effective information.
  • the server screens each original signal component to obtain the original signal component with greater correlation with the digital speech signal as the target signal component, in order to reduce noise interference in more detail, and achieve the purpose of speech enhancement.
  • the target signal components are linearly superimposed to obtain the target speech signal.
  • the process of obtaining the target speech signal is simple to calculate, easy to implement, and improves the processing efficiency of speech enhancement.
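Steps S411 to S414 can be sketched end to end with NumPy. This is an illustrative sketch under several assumptions that the source does not state: the function names are hypothetical, each component matrix is turned back into a one-dimensional signal by averaging its anti-diagonals, and `numpy.corrcoef` is used for the correlation coefficient.

```python
import numpy as np


def hankel_to_signal(matrix):
    """Average each anti-diagonal to recover a one-dimensional signal."""
    rows, cols = matrix.shape
    flipped = matrix[::-1]  # anti-diagonals become ordinary diagonals
    return np.array([flipped.diagonal(k).mean()
                     for k in range(-(rows - 1), cols)])


def denoise_by_correlation(signal, rows, threshold):
    """Keep only components whose correlation with the signal exceeds threshold."""
    signal = np.asarray(signal, dtype=float)
    cols = signal.size - rows + 1
    hankel = np.array([signal[i:i + cols] for i in range(rows)])
    U, s, Vh = np.linalg.svd(hankel, full_matrices=False)
    target = np.zeros_like(signal)
    for i in range(s.size):
        # S411: component matrix for the i-th singular value alone.
        component = hankel_to_signal(s[i] * np.outer(U[:, i], Vh[i]))
        if np.ptp(component) == 0:
            continue  # a constant component has no defined correlation
        # S412: correlation between the component and the digital signal.
        r = np.corrcoef(component, signal)[0, 1]
        # S413 and S414: select the component and superimpose it.
        if r > threshold:
            target += component
    return target
```

Because anti-diagonal averaging is linear, keeping every component returns the original signal, so all of the noise reduction comes from the components that the threshold rejects.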
  • in step S411, inverse singular value decomposition processing is performed on at least two singular values separately to obtain an original signal component corresponding to each singular value, which specifically includes the following steps:
  • S4111 Obtain a singular value matrix based on the singular values.
  • the singular value matrix is a matrix obtained by reducing each singular value in a semi-positive definite diagonal matrix.
  • the server restores each singular value in a semi-positive definite diagonal matrix to obtain a singular value matrix.
  • each singular value is restored to obtain a corresponding singular value matrix, which can be expressed as $D_n = \operatorname{diag}(0, \ldots, 0, \sigma_n, 0, \ldots, 0)$, where $D_n$ represents the singular value matrix corresponding to the n-th singular value and $\sigma_n$ keeps its original position on the diagonal.
  • each singular value matrix is operated on according to the formula $H = U D V^{*}$ to obtain the original signal component corresponding to each singular value.
  • U and $V^{*}$ are the two unitary matrices
  • D is the singular value matrix corresponding to each singular value, that is, $D_1, D_2, \ldots, D_n$
  • H is the original signal component corresponding to each singular value
  • $B u_i = \lambda_i u_i$
  • each singular value is first restored in a semi-positive definite diagonal matrix to obtain a singular value matrix, and then the singular value matrix corresponding to each singular value is multiplied by the two unitary matrices obtained by the singular value decomposition operation to obtain the original signal component corresponding to each singular value, which provides technical support for the subsequent filtering of the original signal components to obtain the target signal components.
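A minimal NumPy sketch of steps S4111 and S4112 follows. The function name is hypothetical, and the reduced form of the decomposition (`full_matrices=False`) is an assumption chosen so that the three factors have compatible shapes.

```python
import numpy as np


def signal_component_matrices(hankel):
    """Return one matrix U @ D_n @ Vh per singular value of the Hankel matrix."""
    U, s, Vh = np.linalg.svd(hankel, full_matrices=False)
    components = []
    for n in range(s.size):
        D_n = np.zeros((s.size, s.size))
        D_n[n, n] = s[n]  # only the n-th singular value, position unchanged
        components.append(U @ D_n @ Vh)
    return components
```

Each returned matrix has rank at most one, and adding all of them together gives back the original Hankel matrix.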
  • in step S40, the singular value decomposition inverse operation is performed on at least two of the singular values to obtain the target voice signal, which specifically includes the following steps:
  • S421 Calculate the sum of at least two singular values, multiply the sum by a preset threshold, and obtain a corresponding evaluation threshold.
  • the preset threshold is a positive number not greater than 1.
  • the preset threshold is a threshold defined in advance for calculating an evaluation threshold.
  • the evaluation threshold is a threshold used for screening singular values.
  • specifically, the sum of all singular values is calculated, and the sum is then multiplied by the preset threshold to obtain the evaluation threshold. That is, the calculation formula of the evaluation threshold is $P = T \sum_{i} \sigma_i$, where T is the preset threshold, P is the evaluation threshold, and $\sigma_i$ is a singular value.
  • S422 Perform linear superposition of at least two singular values in order from large to small to obtain a superposition sum value. If the superposition sum value is greater than the evaluation threshold, obtain N singular values corresponding to the superposition sum value. Where N is a positive integer.
  • the singular values are arranged in descending order. The singular values are therefore added linearly in order from large to small until the sum of the N superimposed singular values is greater than the evaluation threshold; the superposition then stops and these N singular values are obtained, where N is a positive integer. The larger a singular value is, the larger the effective information amount of the digital voice signal it contains; conversely, the smaller a singular value is, the less effective information it contains, and it is considered to mainly contain noise.
  • the server linearly adds the singular values in descending order until the sum of the N singular values is larger than the evaluation threshold, and removes the remaining M singular values to reduce noise interference.
  • the singular value screening process does not need to perform an inverse operation on each singular value followed by correlation analysis.
  • the required singular value can be filtered directly based on the evaluation threshold, which is simple to operate and improves efficiency.
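The screening rule of steps S421 and S422 can be sketched in plain Python. The function name is hypothetical; the sketch returns N, the number of leading singular values to keep.

```python
def select_leading_singular_values(singular_values, preset_threshold):
    """Return N, the number of leading singular values to keep."""
    if not 0 < preset_threshold <= 1:
        raise ValueError("preset_threshold must be in (0, 1]")
    ordered = sorted(singular_values, reverse=True)
    # S421: evaluation threshold P = T * (sum of all singular values).
    evaluation_threshold = preset_threshold * sum(ordered)
    # S422: accumulate from largest to smallest until the sum exceeds P.
    running_sum = 0.0
    for count, value in enumerate(ordered, start=1):
        running_sum += value
        if running_sum > evaluation_threshold:
            return count
    return len(ordered)  # reached when the sum never exceeds P (e.g. T = 1)
```

For the singular values 10, 5, 3, 2 and a preset threshold of 0.7, the evaluation threshold is 14; the running sum reaches 15 after two terms, so N = 2 and the two smallest values are discarded.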
  • the batch reconstruction refers to a method of performing batch restoration processing on the N singular values to obtain the target voice signal.
  • batch reconstruction is performed on the N singular values, and the specific implementation process of obtaining the target speech signal is as follows:
  • the selected N singular values are retained in the original semi-positive definite diagonal matrix D obtained by the singular value decomposition operation, with their size and position unchanged. The remaining singular values (that is, the singular values representing noise) are set to 0, with their positions unchanged, to obtain the target semi-positive definite diagonal matrix M containing the selected N singular values.
  • the target matrix M is multiplied by the two unitary matrices obtained by the singular value decomposition operation to obtain the matrix H′.
  • the matrix H′ is expanded according to the property of the Hankel matrix (that is, the elements on each subdiagonal are equal), and a denoised speech signal, that is, the target speech signal in this embodiment, can be obtained.
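Batch reconstruction can be sketched with NumPy as follows. The function name and the `keep` parameter are hypothetical, and averaging the anti-diagonals of the reconstructed matrix is an assumption: after truncation the matrix is generally only approximately Hankel, so some rule is needed to expand it into a single signal.

```python
import numpy as np


def batch_reconstruct(signal, rows, keep):
    """Zero all but the `keep` largest singular values and rebuild the signal."""
    signal = np.asarray(signal, dtype=float)
    cols = signal.size - rows + 1
    hankel = np.array([signal[i:i + cols] for i in range(rows)])
    U, s, Vh = np.linalg.svd(hankel, full_matrices=False)
    m = s.copy()
    m[keep:] = 0.0  # discarded singular values become 0, positions unchanged
    h_prime = U @ np.diag(m) @ Vh
    # Expand along the anti-diagonals (averaged) to get a 1-D signal.
    flipped = h_prime[::-1]
    return np.array([flipped.diagonal(k).mean()
                     for k in range(-(rows - 1), cols)])
```

In practice `keep` would be the N chosen by the evaluation threshold, so a single matrix product replaces the per-component inverse operations and correlation checks.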
  • the inverse singular value decomposition includes inverse decomposition of each singular value or batch reconstruction of singular values to obtain a target speech signal.
  • the sum of at least two singular values is calculated, and the sum is multiplied by a preset threshold to obtain an evaluation threshold; the singular values are then linearly added in order from large to small until the sum of the N superimposed singular values is greater than the evaluation threshold, at which point the superposition stops, the N singular values are obtained, and the remaining M singular values are removed to reduce noise interference and achieve the purpose of speech enhancement.
  • batch reconstruction is performed on the N singular values to obtain the target speech signal.
  • in the process of obtaining the target speech signal, the selected N singular values are directly restored in the original semi-positive definite diagonal matrix D obtained by the singular value decomposition operation, and the result is multiplied by the two unitary matrices obtained by the singular value decomposition operation to obtain the target speech signal. Obtaining the target speech signal by batch reconstruction improves the acquisition efficiency of the target speech signal and thereby the processing efficiency of speech enhancement.
  • S50 Perform restoration processing on the target voice signal to obtain target voice information.
  • the target voice information is voice information obtained by restoring the target voice signal in a required audio format.
  • the server can use the following method to restore the target speech signal, which is in the form of a matrix: first expand the Hankel matrix according to the subdiagonal elements to obtain the one-dimensional digital signal after noise reduction, and then combine the sampling frequency parameter with the one-dimensional digital signal to obtain the target voice information.
  • the sampling frequency is also called the sampling speed or sampling rate, which defines the number of samples that are extracted from the continuous signal per second to form a discrete signal. It is expressed in Hertz (Hz).
  • a command function for reading an audio file in the Python module is used to directly read the original voice information to obtain the sampling frequency parameter.
  • the Python module has a function for generating audio files in different formats. Calling this function directly and assigning a sampling frequency parameter and a one-dimensional digital signal can generate the target voice information in the required format. For example, you can call the function wave that generates a wav format file in the Python module to process the acquired sampling frequency parameters and one-dimensional digital signals to generate an audio file (that is, target voice information) in the wav format.
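Writing the denoised signal back to a wav file can be sketched with the standard `wave` and `struct` modules. The helper name `write_wav` is hypothetical, and the sketch assumes 16-bit mono PCM output with samples rounded and clipped to the 16-bit range.

```python
import struct
import wave


def write_wav(path, samples, sample_rate):
    """Write a one-dimensional signal as a 16-bit mono PCM wav file."""
    # Round to integers and clip to the signed 16-bit range.
    clipped = [max(-32768, min(32767, int(round(v)))) for v in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack("<%dh" % len(clipped), *clipped))
```

The `sample_rate` argument is the sampling frequency parameter read from the original file, so the restored audio plays at the same speed as the input.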
  • the original voice information is first converted to obtain a digital voice signal, and the digital voice signal is constructed into a Hankel matrix, so that the Hankel matrix is subjected to singular value decomposition operation processing to obtain at least two singular values.
  • the singular values can represent the important information implied in the matrix, and the importance is positively related to the size of the singular value. From the singular values obtained, the amount of effective information each contains can be observed intuitively.
  • the server performs a singular value decomposition inverse operation on at least two singular values to obtain a speech signal corresponding to each singular value, that is, a target speech signal, so as to suppress noise interference and implement speech enhancement.
  • the target voice signal is restored to obtain the audio file in the required format, that is, the target voice information.
  • the restoration process can directly call the function in the Python module for restoration, and the operation is simple.
  • FIG. 7 shows a schematic diagram of a speech enhancement device corresponding to the speech enhancement method in the above embodiment.
  • the voice enhancement device includes a digital voice signal acquisition module 10, a Hankel matrix acquisition module 20, a singular value acquisition module 30, a target voice signal acquisition module 40, and a target voice information acquisition module 50.
  • the detailed description of each function module is as follows:
  • the digital voice signal acquisition module 10 is configured to convert the original voice information to obtain a digital voice signal.
  • the Hankel matrix obtaining module 20 is configured to obtain a Hankel matrix based on a digital voice signal.
  • the singular value acquisition module 30 is configured to perform singular value decomposition operation processing on the Hankel matrix to obtain at least two singular values.
  • the target voice signal acquisition module 40 is configured to perform an inverse singular value decomposition operation on at least two singular values to obtain a target voice signal.
  • the target voice information acquisition module 50 is configured to perform restoration processing on the target voice signal to acquire the target voice information.
  • the singular value acquisition module 30 includes a transposed matrix calculation unit 31, a eigenvalue acquisition unit 32, and a singular value acquisition unit 33.
  • the transposed matrix calculation unit 31 is configured to calculate a transposed matrix of the Hankel matrix.
  • An eigenvalue obtaining unit 32 is configured to obtain at least two eigenvalues based on a product of a Hankel matrix and a transposed matrix.
  • the singular value obtaining unit 33 is configured to perform an operation on at least two eigenvalues according to a preset calculation method to obtain at least two singular values.
  • the target speech signal acquisition module 40 includes an original signal component acquisition unit 411, a correlation coefficient acquisition unit 412, a target signal component acquisition unit 413, and a target speech signal acquisition unit 414.
  • the original signal component acquiring unit 411 is configured to perform singular value decomposition and inverse operation processing on at least two singular values, respectively, to obtain an original signal component corresponding to each singular value.
  • a correlation coefficient acquisition unit 412 is configured to perform correlation calculation between the original signal component and the digital voice signal to obtain a correlation coefficient.
  • the target signal component acquiring unit 413 is configured to select an original signal component whose correlation coefficient is greater than a preset threshold as a target signal component.
  • the target voice signal acquisition unit 414 is configured to perform linear superposition processing on the target signal components to acquire a target voice signal.
  • the original signal component acquisition unit 411 includes a singular value matrix acquisition subunit 4111 and an original signal component acquisition subunit 4112.
  • the singular value matrix obtaining subunit 4111 is configured to obtain a singular value matrix based on the singular value.
  • the original signal component acquisition subunit 4112 is configured to obtain an original signal component corresponding to each singular value based on the eigenvalue and the singular value matrix.
  • the correlation calculation formula is $r = \dfrac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}[x]\,\operatorname{Var}[y]}}$, where x is the original signal component, y is the digital voice signal, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the correlation coefficient.
  • the target voice signal acquisition module 40 includes an evaluation threshold acquisition unit 421, an N-term singular value acquisition unit 422, and a target voice signal acquisition unit 423.
  • the evaluation threshold obtaining unit 421 is configured to calculate a sum of at least two singular values, and multiply the sum with a preset threshold to obtain a corresponding evaluation threshold.
  • the preset threshold is a positive number not greater than 1.
  • the N-term singular value obtaining unit 422 is configured to linearly superimpose at least two singular values in order from large to small to obtain a superposition sum value. If the superposition sum value is greater than the evaluation threshold, obtain N singularities corresponding to the superposition sum value Value; where N is a positive integer.
  • the target voice signal acquisition unit 423 is configured to perform batch reconstruction on the N singular values to acquire a target voice signal.
  • Each module in the above voice enhancement device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the hardware in or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 8.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium.
  • the database of the computer device is used to store data generated or obtained during the execution of the speech enhancement method, such as target speech information.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer-readable instructions are executed by one or more processors, the one or more processors implement a speech enhancement method.
  • a computer device is provided, including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor.
  • the processor executes the computer-readable instructions to implement the following steps:
  • converting the original speech information to obtain a digital speech signal; obtaining a Hankel matrix based on the digital speech signal; performing a singular value decomposition operation on the Hankel matrix to obtain at least two singular values; performing an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal; and performing restoration processing on the target voice signal to obtain target voice information.
  • when the processor executes the computer-readable instructions, the following steps are further implemented: calculating a transposed matrix of the Hankel matrix; obtaining at least two eigenvalues based on a product of the Hankel matrix and the transposed matrix; and operating on the at least two eigenvalues according to a preset calculation method to obtain at least two singular values.
  • when the processor executes the computer-readable instructions, the following steps are further implemented: performing an inverse singular value decomposition operation on each of the at least two singular values to obtain an original signal component corresponding to each singular value; performing correlation calculation between the original signal component and the digital speech signal to obtain a correlation coefficient; and selecting an original signal component whose correlation coefficient is greater than a preset threshold as a target signal component.
  • the correlation calculation formula is $r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$, where x is the original signal component, y is the digital voice signal, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the correlation coefficient.
  • when the processor executes the computer-readable instructions, the following steps are further implemented: obtaining a singular value matrix based on the singular values; and obtaining an original signal component corresponding to each singular value based on the eigenvalues and the singular value matrix.
  • when the processor executes the computer-readable instructions, the following steps are further implemented: calculating the sum of the at least two singular values and multiplying that sum by a preset threshold to obtain a corresponding evaluation threshold, where the preset threshold is a positive number not greater than 1; linearly superimposing the at least two singular values in descending order to obtain a superposition sum value and, if the superposition sum value is greater than the evaluation threshold, obtaining the N singular values that make up the superposition sum value, where N is a positive integer; and performing batch reconstruction on the N singular values to obtain the target speech signal.
  • one or more non-volatile readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps: converting the original speech information to obtain a digital speech signal; obtaining a Hankel matrix based on the digital speech signal; performing a singular value decomposition operation on the Hankel matrix to obtain at least two singular values; performing an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal; and performing restoration processing on the target voice signal to obtain target voice information.
  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: calculating a transposed matrix of the Hankel matrix; obtaining at least two eigenvalues based on a product of the Hankel matrix and the transposed matrix; and operating on the at least two eigenvalues according to a preset calculation method to obtain at least two singular values.
  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: performing an inverse singular value decomposition operation on each of the at least two singular values to obtain an original signal component corresponding to each singular value; performing correlation calculation between the original signal component and the digital voice signal to obtain a correlation coefficient; selecting an original signal component whose correlation coefficient is greater than a preset threshold as a target signal component; and performing linear superposition processing on the target signal components to obtain the target speech signal.
  • the correlation calculation formula is $r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$, where x is the original signal component, y is the digital voice signal, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the correlation coefficient.
  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: obtaining a singular value matrix based on the singular values; and obtaining an original signal component corresponding to each singular value based on the eigenvalues and the singular value matrix.
  • when the computer-readable instructions are executed by the one or more processors, the following steps are further implemented: calculating the sum of the at least two singular values and multiplying that sum by a preset threshold to obtain a corresponding evaluation threshold, where the preset threshold is a positive number not greater than 1; linearly superimposing the at least two singular values in descending order to obtain a superposition sum value and, if the superposition sum value is greater than the evaluation threshold, obtaining the N singular values that make up the superposition sum value, where N is a positive integer; and performing batch reconstruction on the N singular values to obtain the target speech signal.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
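
The correlation formula and the evaluation-threshold rule described for units 412 and 421 to 423 above can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the applicant's implementation: NumPy, the function names `correlation_coefficient` and `count_leading_singular_values`, and the fallback used when the running sum never strictly exceeds the threshold are all assumptions introduced here.

```python
import numpy as np


def correlation_coefficient(x, y):
    """Return r = Cov(x, y) / sqrt(Var[x] * Var[y]) for two equal-length signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population covariance, matching the population variance used by ndarray.var().
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / np.sqrt(x.var() * y.var())


def count_leading_singular_values(singular_values, preset_threshold):
    """Return N, the number of largest singular values whose running sum
    first exceeds preset_threshold * (sum of all singular values)."""
    s = np.sort(np.asarray(singular_values, dtype=float))[::-1]  # large to small
    evaluation_threshold = preset_threshold * s.sum()
    exceeds = np.cumsum(s) > evaluation_threshold
    if not exceeds.any():
        # Assumption: with a preset threshold of 1 the running sum never
        # strictly exceeds the evaluation threshold, so keep every value.
        return len(s)
    return int(np.argmax(exceeds)) + 1
```

With a preset threshold of 0.7 and singular values 5, 3, 1, 1, the evaluation threshold is 7, and the running sum first exceeds it after the second value, so the sketch keeps the two largest singular values.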

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Complex Calculations (AREA)

Abstract

A voice enhancement method and apparatus, and a computer device and a storage medium. The voice enhancement method comprises: converting original voice information to obtain a digital voice signal (S10); obtaining a Hankel matrix on the basis of the digital voice signal (S20); performing a singular value decomposition operation on the Hankel matrix to obtain at least two singular values (S30); performing an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal (S40); and performing restoration processing on the target voice signal to obtain target voice information (S50). The voice enhancement method can effectively suppress noise interference and thereby improve the recognition accuracy of the target voice information during voice recognition.

Description

Voice enhancement method, device, computer equipment and storage medium
This patent application is based on, and claims priority to, Chinese invention patent application No. 201810529510.6, filed on May 29, 2018 and entitled "Voice Enhancement Method, Device, Computer Equipment, and Storage Medium".
Technical Field
The present application relates to the field of signal processing, and in particular to a voice enhancement method, a device, a computer device, and a storage medium.
Background
With the widespread use of speech recognition technology, the demand for speech signal processing technology has grown accordingly. At present, a speech signal collected by a computer device contains both the speech information corresponding to the speaker's voice, which is the valid information, and noise information formed by sounds other than the speaker's voice. If the collected speech signal is recognized directly, the noise information degrades recognition accuracy. The collected speech signal therefore needs to be enhanced (that is, denoised) so that as clean a speech signal as possible is extracted from it and speech recognition becomes more accurate. The speech signals extracted by current speech enhancement processing are not accurate enough, which hinders subsequent speech recognition.
Summary of the Invention
In view of the above technical problem, it is necessary to provide a speech enhancement method, device, computer device, and storage medium that can improve the accuracy of the speech signal obtained after speech enhancement processing.
A speech enhancement method includes:
converting original voice information to obtain a digital voice signal;
obtaining a Hankel matrix based on the digital voice signal;
performing a singular value decomposition operation on the Hankel matrix to obtain at least two singular values;
performing an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal; and
performing restoration processing on the target voice signal to obtain target voice information.
A voice enhancement device includes:
a digital voice signal acquisition module, configured to convert original voice information to obtain a digital voice signal;
a Hankel matrix acquisition module, configured to obtain a Hankel matrix based on the digital voice signal;
a singular value acquisition module, configured to perform a singular value decomposition operation on the Hankel matrix to obtain at least two singular values;
a target voice signal acquisition module, configured to perform an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal; and
a target voice information acquisition module, configured to perform restoration processing on the target voice signal to obtain target voice information.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
converting original voice information to obtain a digital voice signal;
obtaining a Hankel matrix based on the digital voice signal;
performing a singular value decomposition operation on the Hankel matrix to obtain at least two singular values;
performing an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal; and
performing restoration processing on the target voice signal to obtain target voice information.
One or more non-volatile readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
converting original voice information to obtain a digital voice signal;
obtaining a Hankel matrix based on the digital voice signal;
performing a singular value decomposition operation on the Hankel matrix to obtain at least two singular values;
performing an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal; and
performing restoration processing on the target voice signal to obtain target voice information.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an application environment diagram of a speech enhancement method according to an embodiment of the present application;
FIG. 2 is a flowchart of a speech enhancement method according to an embodiment of the present application;
FIG. 3 is a specific flowchart of step S30 in FIG. 2;
FIG. 4 is a specific flowchart of step S40 in FIG. 2;
FIG. 5 is a specific flowchart of step S411 in FIG. 4;
FIG. 6 is another specific flowchart of step S40 in FIG. 2;
FIG. 7 is a schematic diagram of a speech enhancement device according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
The voice enhancement method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1, in which a computer device communicates with a server through a network. The computer device may be, but is not limited to, a personal computer, laptop, smartphone, tablet, or portable wearable device. The server can be implemented as a stand-alone server.
The speech enhancement method can be applied to computer equipment deployed by financial institutions such as banks, securities firms, and insurers, or by other institutions, to enhance speech signals during speech recognition and thereby improve recognition accuracy.
In one embodiment, as shown in FIG. 2, the speech enhancement method is described using its application to the server in FIG. 1 as an example, and includes the following steps:
S10: Convert the original voice information to obtain a digital voice signal.
The original voice information is the speaker's voice information collected by a recording module (such as a microphone) in the computer device. It may be voice information in wav, mp3, or another format. The digital voice signal is the discrete digital signal obtained by converting the original voice information. A computer device cannot process the original voice information directly because it can only process binary data, so the original voice information must be converted into a digital voice signal.
Specifically, the server receives the original voice information sent by the computer device and reads it with an audio-file reading function from a Python module to obtain the digital voice signal. For example, the reading function may be wave.open(file, "rb"), where file is the original voice information and "rb" opens the file for reading. The one-dimensional array of audio data read in this way is the digital voice signal. A Python module is a module containing a large number of encapsulated functions, written in an object-oriented, interpreted programming language. In this embodiment, reading the original voice information directly with the audio-file reading function of the Python module yields the digital voice signal, which is simple to implement.
In summary, the digital voice signal is the one-dimensional digital signal obtained by converting the original voice information, specifically by reading the original voice information directly with the audio-file reading function of the Python module.
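
Step S10 can be illustrated with the following Python sketch, which reads a WAV file into a one-dimensional array using the standard-library `wave` module. It is a sketch under stated assumptions rather than the method's definitive form: the function name `read_wav_as_array`, the use of NumPy, the restriction to 16-bit PCM, and the choice to keep only the first channel are all assumptions introduced here. The `wave` module does not decode mp3, so other formats would need a different reader.

```python
import wave

import numpy as np


def read_wav_as_array(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate)."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        sample_rate = wav_file.getframerate()
        n_channels = wav_file.getnchannels()
        raw = wav_file.readframes(wav_file.getnframes())
    # WAV stores 16-bit samples as little-endian signed integers.
    samples = np.frombuffer(raw, dtype="<i2")
    if n_channels > 1:
        # Frames are interleaved; keep the first channel so the result is 1-D.
        samples = samples.reshape(-1, n_channels)[:, 0]
    return samples, sample_rate
```

Note that `wave.open` itself returns a reader object; the one-dimensional array only appears after the frames are read and converted, as the sketch does with `readframes` and `numpy.frombuffer`.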
S20: Obtain a Hankel matrix based on the digital voice signal.
The digital voice signal is the one-dimensional digital signal obtained by converting the original voice information. A Hankel matrix is a matrix (square in the classical definition) in which the elements along each anti-diagonal are equal.
Specifically, the Hankel matrix has the following form. Assuming the digital voice signal (a one-dimensional digital signal sequence) is x(i) with length N, i = 1, 2, 3, …, N, then
Figure PCTCN2018094409-appb-000001
where n is the number of matrix elements. Row j of the Hankel matrix is formed by shifting the elements of the previous row one position to the left, so that the elements on each anti-diagonal are equal; that is, each element equals its lower-left neighbour. An anti-diagonal runs from the upper right towards the lower left.
In this embodiment, the first-column elements and the last-row elements of the Hankel matrix are defined in advance so as to fix its numbers of rows and columns. The Hankel matrix is built from these two parameters, which prepares for the subsequent singular value decomposition. The first element of the last row is necessarily the same as the last element of the first column. For example, given the first column A = (1, 2, 3, 4) and the last row B = (4, 4.5, 5.5), the Hankel matrix built from these two parameters is
$$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 4.5 \\ 4 & 4.5 & 5.5 \end{bmatrix}$$
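
The construction from a first column and a last row can be sketched in Python as follows. This is only an illustration: the function names `hankel_from_col_row` and `hankel_from_signal`, the parameter `n_cols`, and the use of NumPy are assumptions introduced here, and the source does not say how the number of columns is chosen for a real signal.

```python
import numpy as np


def hankel_from_col_row(first_col, last_row):
    """Build a Hankel matrix from its first column and its last row."""
    c = np.asarray(first_col, dtype=float)
    r = np.asarray(last_row, dtype=float)
    if c[-1] != r[0]:
        raise ValueError("last element of first_col must equal first element of last_row")
    # One value per anti-diagonal: down the first column, then along the last row.
    values = np.concatenate([c, r[1:]])
    rows = np.arange(len(c))[:, None]
    cols = np.arange(len(r))[None, :]
    # Element (i, j) depends only on i + j, which is the Hankel property.
    return values[rows + cols]


def hankel_from_signal(x, n_cols):
    """Build the (len(x) - n_cols + 1) x n_cols Hankel matrix of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n_rows = len(x) - n_cols + 1
    if n_rows < 1:
        raise ValueError("n_cols must not exceed the signal length")
    return hankel_from_col_row(x[:n_rows], x[n_rows - 1:])
```

Calling `hankel_from_col_row([1, 2, 3, 4], [4, 4.5, 5.5])` yields a 4 × 3 matrix whose rows are (1, 2, 3), (2, 3, 4), (3, 4, 4.5), and (4, 4.5, 5.5).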
S30: Perform a singular value decomposition operation on the Hankel matrix to obtain at least two singular values.
Singular value decomposition (SVD) is an important matrix factorization in linear algebra. It can effectively reduce the dimensionality of large volumes of data, which reduces the amount of computation and saves computation time. Specifically, when the server performs singular value decomposition on the Hankel matrix, it obtains two unitary matrices and one positive semi-definite diagonal matrix. The values on the diagonal of the positive semi-definite diagonal matrix are the singular values; there are generally N (N > 2) of them, arranged from largest to smallest. Singular values characterize the important information implicit in the matrix, and the importance is positively correlated with the magnitude of the singular value. The larger a singular value, the more valid information of the digital voice signal it carries; conversely, the smaller a singular value, the less valid information it carries, and this embodiment treats it as containing more noise. By performing singular value decomposition on the Hankel matrix and obtaining at least two singular values, the server can see directly how much valid information each singular value carries, which facilitates noise reduction.
Specifically, the singular value decomposition can be written as $H = UDV^*$, where U and V are two unitary matrices and D is a positive semi-definite diagonal matrix. A unitary matrix is a matrix whose n column vectors are pairwise orthogonal unit vectors; equivalently, its conjugate transpose equals its inverse. Let A be an n-th order square matrix over a number field. If there is another n-th order matrix B over the same number field such that AB = BA = E (where E is the identity matrix, that is, the n-th order square matrix whose main-diagonal elements are all 1 and whose other elements are all 0), then B is called the inverse matrix of A. The conjugate transpose is obtained by transposing the matrix and then replacing each element with its complex conjugate. Two complex numbers are conjugate when their real parts are equal and their imaginary parts are opposite in sign. For example, for z = a + bi (a, b ∈ R), the complex conjugate is z′ = a − bi. A positive semi-definite diagonal matrix is a matrix that is both positive semi-definite and diagonal. A positive semi-definite matrix A is an n-th order square matrix such that X′AX ≥ 0 for any non-zero vector X (X′ denotes the transpose of X). A diagonal matrix is a matrix whose elements off the main diagonal (the diagonal from the upper left corner to the lower right corner) are all 0.
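
The factorization $H = UDV^*$ can be checked numerically with the short Python sketch below. It uses NumPy's `numpy.linalg.svd`, which is an assumption here since the source does not name a library for this step, and the 4 × 3 input matrix serves only as an illustrative Hankel matrix.

```python
import numpy as np

# Illustrative 4 x 3 Hankel matrix (equal values along each anti-diagonal).
H = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 4.5],
    [4.0, 4.5, 5.5],
])

# full_matrices=False returns the reduced factors: U is 4 x 3 and Vh is 3 x 3.
# singular_values is a 1-D array sorted from largest to smallest.
U, singular_values, Vh = np.linalg.svd(H, full_matrices=False)

# Vh already holds the conjugate transpose V*, so H = U D V* becomes:
D = np.diag(singular_values)
H_rebuilt = U @ D @ Vh
```

Because the matrix is real, the conjugate transpose reduces to the ordinary transpose, and `H_rebuilt` agrees with `H` up to floating-point round-off.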
In an embodiment, as shown in FIG. 3, step S30, that is, performing a singular value decomposition operation on the Hankel matrix to obtain at least two singular values, specifically includes the following steps:
S31: Calculate the transposed matrix of the Hankel matrix.
The transposed matrix of the Hankel matrix is the matrix obtained by mirroring all elements of the Hankel matrix about the ray that starts at the element in row 1, column 1 and runs 45 degrees down and to the right. For example, let the Hankel matrix be
Figure PCTCN2018094409-appb-000003
Then the transposed matrix of the Hankel matrix is
Figure PCTCN2018094409-appb-000004
Obtaining the transposed matrix of the Hankel matrix prepares for the subsequent acquisition of eigenvalues.
S32: Obtain at least two eigenvalues based on the product of the Hankel matrix and the transposed matrix.
Specifically, let A be the Hankel matrix and $A^T$ its transposed matrix. The formulas $B = AA^T$ and $B' = A^TA$ give the matrices B and B′ corresponding to the products of the Hankel matrix and the transposed matrix, and at least two eigenvalues are obtained by solving $Bx = mx$. If B is an n-th order square matrix and there exist a real number m and a non-zero n-dimensional column vector x such that $Bx = mx$ holds, then m is called an eigenvalue of B. An eigenvalue reflects the scaling factor of the transformation applied by the matrix; scaling the matrix in this way serves the purpose of reducing the dimensionality of the data.
Specifically, if the Hankel matrix is
Figure PCTCN2018094409-appb-000005
and the transposed matrix of the Hankel matrix is
Figure PCTCN2018094409-appb-000006
then obtaining at least two eigenvalues based on the product of the Hankel matrix and the transposed matrix includes the following process:
(1) Use the formulas $B = AA^T$ and $B' = A^TA$ to calculate the matrices B and B′ corresponding to the products of the Hankel matrix and the transposed matrix. For example, the formula $B = AA^T$ gives
Figure PCTCN2018094409-appb-000007
and the formula $B' = A^TA$ gives
Figure PCTCN2018094409-appb-000008
(2) Process matrix B and matrix B′ with the formula for the determinant of a matrix to obtain at least two eigenvalues. The formula for the determinant is
Figure PCTCN2018094409-appb-000009
where Σ denotes summation over all permutations, τ denotes the inversion number of the permutation $k_1 k_2 \dots k_n$, and D is called the determinant of the matrix. The formula for the inversion number is
Figure PCTCN2018094409-appb-000010
Taking B′ as an example, calculating the determinant of matrix B′
Figure PCTCN2018094409-appb-000011
gives the eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 1$.
(3) Process the at least two eigenvalues $\lambda_i$ with the formulas $Bu_i = \lambda_i u_i$ and $B'v_i = \lambda_i v_i$ to obtain the eigenvector corresponding to each eigenvalue, where $u_i$ is the eigenvector corresponding to an eigenvalue of matrix B and $v_i$ is the eigenvector corresponding to an eigenvalue of matrix B′. The server obtains the eigenvalues and eigenvectors based on the product of the Hankel matrix and the transposed matrix, which serves the purpose of reducing the dimensionality of the data.
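
Steps (1) to (3) can be illustrated with the following Python sketch. The 3 × 2 matrix `A` is an assumed example, chosen only because it is a Hankel matrix for which $A^TA$ has eigenvalues 3 and 1, the values quoted above; it is not taken from the figures of the source. The sketch also replaces the hand calculation through determinants with NumPy's `numpy.linalg.eigh`, which is an assumption of this illustration.

```python
import numpy as np

# Assumed example: a 3 x 2 Hankel matrix for which A^T A has eigenvalues 3 and 1.
# It is not reproduced from the source's figures.
A = np.array([
    [0.0, 1.0],
    [1.0, 1.0],
    [1.0, 0.0],
])

# Step (1): the two products of the matrix with its transpose.
B = A @ A.T        # 3 x 3
B_prime = A.T @ A  # 2 x 2

# Steps (2) and (3): eigenvalues and eigenvectors. eigh is intended for
# symmetric matrices and returns the eigenvalues in ascending order;
# column i of the second result is the eigenvector for eigenvalue i.
eigvals_B, U = np.linalg.eigh(B)
eigvals_B_prime, V = np.linalg.eigh(B_prime)
```

For this assumed matrix, B′ has the eigenvalues 1 and 3, while B has the same non-zero eigenvalues plus an additional eigenvalue of 0 because it is one dimension larger.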
S33: Operate on the at least two eigenvalues according to a preset calculation method to obtain at least two singular values.
The preset calculation method is a predefined method for computing singular values from eigenvalues. The preset calculation method includes using the formula
Figure PCTCN2018094409-appb-000012
to take the square root of the eigenvalues, or using the formula $Av_i = \sigma_i u_i$ to calculate from the at least two eigenvalues.
Specifically, the server uses the formula
Figure PCTCN2018094409-appb-000013
to take the square root of the at least two eigenvalues and thereby obtain at least two singular values, where $\sigma_i$ is a singular value and $\lambda_i$ is an eigenvalue. Obtaining the singular values by taking the square root of the eigenvalues is computationally simple and improves efficiency.
Alternatively, the server uses the formula $Av_i = \sigma_i u_i$ to calculate from the at least two eigenvalues and obtain at least two singular values, where $u_i$ is the eigenvector corresponding to an eigenvalue of matrix $B$ and $v_i$ is the eigenvector corresponding to an eigenvalue of matrix $B'$.
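Both routes to the singular values can be compared in a short sketch. It assumes the same hypothetical 3x2 matrix `A` with eigenvalues 3 and 1, and it assumes that the eigenvectors $v_i$ are normalized to unit length, so that $\sigma_i$ equals the length of $Av_i$. All function names are illustrative.

```python
import math


def singular_values_from_eigenvalues(eigenvalues):
    """sigma_i = sqrt(lambda_i); tiny negative round-off is clamped to zero."""
    return [math.sqrt(max(lam, 0.0)) for lam in eigenvalues]


def mat_vec(a, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


# Hypothetical example (an assumption, not the matrix shown in the figures).
A = [[0, 1], [1, 1], [1, 0]]
eigenvalues = [3.0, 1.0]
s = 1.0 / math.sqrt(2.0)
V = [[s, s], [s, -s]]  # unit eigenvectors v_1, v_2 of A^T A, one per row

# Route 1: square root of each eigenvalue.
sigmas = singular_values_from_eigenvalues(eigenvalues)
# Route 2: since A v_i = sigma_i u_i with u_i of unit length, sigma_i = |A v_i|.
sigmas_alt = [norm(mat_vec(A, v)) for v in V]
```

Under these assumptions the two lists agree up to floating-point round-off.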
Finally, based on the singular values $\sigma_i$, the eigenvectors $u_i$, and the eigenvectors $v_i$, the singular value decomposition of the Hankel matrix is expressed as $H = UDV^{*}$, where
Figure PCTCN2018094409-appb-000014
Figure PCTCN2018094409-appb-000015
In this embodiment, the transposed matrix of the Hankel matrix is calculated first, so that at least two eigenvalues can be obtained based on the product of the Hankel matrix and the transposed matrix. Then, based on the obtained eigenvalues, a scaling transformation is applied to the matrix obtained from the product of the Hankel matrix and the transposed matrix, to achieve the purpose of reducing the dimensionality of the data. Finally, the square root of the at least two eigenvalues is taken to obtain at least two singular values; this method of obtaining the singular values is computationally simple and easy to implement.
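The link between the eigenvalues of $A^{T}A$ and the singular values of $A$ can be written out as a short derivation. It is a standard identity, given here for a real matrix $A$ with decomposition $A = UDV^{T}$, and is offered as a reading aid rather than as a reproduction of the formulas in the figures.

```latex
\begin{aligned}
A^{T}A &= \left(UDV^{T}\right)^{T}\left(UDV^{T}\right)
        = V D^{T} U^{T} U D V^{T}
        = V \left(D^{T}D\right) V^{T}, \\
A^{T}A\, v_i &= \sigma_i^{2}\, v_i = \lambda_i v_i
        \quad\Longrightarrow\quad \sigma_i = \sqrt{\lambda_i}, \\
A v_i &= \sigma_i u_i .
\end{aligned}
```

The first line uses $U^{T}U = I$; the second reads off the diagonal entries $\sigma_i^{2}$ of $D^{T}D$ as the eigenvalues $\lambda_i$ of $B' = A^{T}A$.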
S40: Perform an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal.
The inverse singular value decomposition operation refers to restoring each singular value into a positive semi-definite diagonal matrix and multiplying that diagonal matrix by the two unitary matrices obtained from the earlier singular value decomposition, to obtain the target voice signal. The target voice signal is the denoised voice signal obtained by performing singular value decomposition on the digital voice signal. Specifically, the server performs the inverse singular value decomposition operation on the at least two singular values to obtain the voice signal corresponding to each singular value (that is, the target voice signal), so as to achieve voice enhancement.
In an embodiment, as shown in FIG. 4, step S40, in which the inverse singular value decomposition operation is performed on the at least two singular values to obtain the target voice signal, specifically includes the following steps:
S411: Perform the inverse singular value decomposition operation on the at least two singular values separately to obtain the original signal component corresponding to each singular value.
An original signal component is the signal component obtained by performing the inverse singular value decomposition operation on one of the at least two singular values. Specifically, each singular value is restored into a positive semi-definite diagonal matrix (with the position of the singular value in the matrix unchanged) and multiplied by the two unitary matrices obtained from the earlier singular value decomposition, to obtain the original signal component corresponding to that singular value.
S412: Perform a correlation calculation between each original signal component and the digital voice signal to obtain a correlation coefficient.
The correlation coefficient is the result obtained by performing a correlation calculation on the digital voice signal and the original signal component. It reflects the degree of correlation between the digital voice signal and the original signal component, and also the extent to which the signal component contains valid information.
Specifically, the correlation calculation formula is
Figure PCTCN2018094409-appb-000016
where $x$ is the original signal component, $y$ is the digital voice signal, $\mathrm{Cov}(x, y)$ is the covariance of $x$ and $y$, $\mathrm{Var}[x]$ is the variance of $x$, $\mathrm{Var}[y]$ is the variance of $y$, and $r$ is the correlation coefficient.
$\mathrm{Cov}(x, y)$ is calculated as
Figure PCTCN2018094409-appb-000017
$\mathrm{Var}[x]$ is calculated as $\mathrm{Var}[x] = E(x^{2}) - E^{2}(x)$, and $\mathrm{Var}[y]$ is calculated as $\mathrm{Var}[y] = E(y^{2}) - E^{2}(y)$, where $E(x)$ is the mean of the original signal component, $E(y)$ is the mean of the digital voice signal, $n$ is the number of original signal components, $y_j$ is the $j$-th digital voice signal on the time scale, and $x_j$ is the $j$-th original signal component on the same time scale.
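The correlation calculation described above can be sketched as follows. The variance follows the stated form $E(x^{2}) - E^{2}(x)$. The covariance formula itself is only available as a figure, so the usual mean-of-products-of-deviations form is assumed here, and the coefficient is assumed to be the covariance divided by the square root of the product of the two variances. All function names are illustrative.

```python
import math


def mean(values):
    """Arithmetic mean E(x)."""
    return sum(values) / len(values)


def variance(values):
    """Population variance Var[x] = E(x^2) - E(x)^2."""
    return mean([v * v for v in values]) - mean(values) ** 2


def covariance(x, y):
    """Population covariance as the mean product of deviations (assumed form)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def correlation(x, y):
    """r = Cov(x, y) / sqrt(Var[x] * Var[y]); undefined if either variance is 0."""
    return covariance(x, y) / math.sqrt(variance(x) * variance(y))
```

For instance, `correlation([1, 2, 3, 4], [3, 5, 7, 9])` evaluates to a value of 1 up to round-off, because the second sequence is an exact linear function of the first.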
S413: Select the original signal components whose correlation coefficient is greater than a preset threshold as target signal components.
The preset threshold is a predefined threshold used for screening the original signal components. A target signal component is an original signal component that remains after the original signal components have been screened using the preset threshold.
Since the correlation coefficient is a real number between 0 and 1, the preset threshold is likewise chosen as a real number between 0 and 1. If the correlation coefficient is greater than the preset threshold, the original signal component is strongly correlated with the digital voice signal and contains a large amount of the valid information of the digital voice signal. If the correlation coefficient is not greater than the preset threshold, the original signal component is weakly correlated with the digital voice signal, contains little valid information, and can be treated as noise by default. In this embodiment, the original signal components are screened so that those with a larger correlation with the digital voice signal are taken as target signal components, which reduces noise interference and achieves voice enhancement. In addition, this screening method is simple to implement and improves the efficiency of voice enhancement.
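A minimal sketch of the screening in step S413 is given below. The threshold value 0.5 and the sample sequences are hypothetical, and `pearson` and `select_target_components` are illustrative names; the source does not prescribe a particular threshold.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_target_components(components, signal, threshold):
    """Keep the components whose correlation with the signal exceeds the threshold."""
    return [c for c in components if pearson(c, signal) > threshold]


# Hypothetical data: one component tracks the signal, the other does not.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
speech_like = [0.0, 0.9, 0.1, -1.1, 0.0, 1.0, -0.1, -0.9]
noise_like = [0.3, -0.2, 0.3, -0.2, 0.3, -0.2, 0.3, -0.2]
targets = select_target_components([speech_like, noise_like], signal, threshold=0.5)
```

With this data only `speech_like` is retained, since `noise_like` is essentially uncorrelated with `signal`.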
S414: Perform linear superposition on the target signal components to obtain the target voice signal.
Specifically, the server uses the formula $W = x_1 + x_2 + \cdots + x_n$ to linearly superimpose the N acquired target signal components and obtain the target voice signal, where $W$ is the target voice signal and each $x$ is a target signal component.
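The linear superposition of step S414 amounts to a sample-by-sample sum. The sketch below assumes that every target signal component is a sequence of the same length; the function name `superpose` is illustrative.

```python
def superpose(components):
    """Element-wise sum W = x_1 + x_2 + ... + x_n of equally long components."""
    if not components:
        raise ValueError("at least one component is required")
    length = len(components[0])
    if any(len(c) != length for c in components):
        raise ValueError("all components must have the same length")
    # zip(*components) groups the samples that share the same time index.
    return [sum(samples) for samples in zip(*components)]
```

For example, `superpose([[1, 2, 3], [10, 20, 30]])` returns `[11, 22, 33]`.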
In this embodiment, the server first performs the inverse singular value decomposition operation on each singular value separately to obtain the original signal component corresponding to each singular value, and then performs a correlation calculation between each original signal component and the digital voice signal to obtain a correlation coefficient, which reflects the degree of correlation between the digital voice signal and the original signal component and also the extent to which the signal component contains valid information. The server then screens each original signal component, taking those with a larger correlation with the digital voice signal as target signal components, to reduce noise interference in a more fine-grained way and achieve voice enhancement. Finally, the target signal components are linearly superimposed to obtain the target voice signal. This process is computationally simple, easy to implement, and improves the processing efficiency of voice enhancement.
In an embodiment, as shown in FIG. 5, step S411, in which the inverse singular value decomposition operation is performed on the at least two singular values separately to obtain the original signal component corresponding to each singular value, specifically includes the following steps:
S4111: Obtain a singular value matrix based on the singular values.
The singular value matrix is the matrix obtained by restoring each singular value into a positive semi-definite diagonal matrix. Specifically, the server restores each singular value into a positive semi-definite diagonal matrix to obtain the singular value matrix. In this embodiment, the singular value matrix obtained by restoring each singular value can be expressed by the following formula
Figure PCTCN2018094409-appb-000018
where $D_n$ denotes the singular value matrix corresponding to the $n$-th singular value.
S4112: Obtain the original signal component corresponding to each singular value based on the singular value matrix.
Specifically, each singular value matrix is processed according to the following formula to obtain the original signal component corresponding to each singular value.
Figure PCTCN2018094409-appb-000019
$U$ and $V^{*}$ are the two unitary matrices, $D$ is the singular value matrix corresponding to each singular value, that is, $D_1, D_2, \ldots, D_n$, and $H$ is the original signal component corresponding to each singular value. $U_{ik}$ is the matrix corresponding to the $i$-th eigenvector calculated from $Bu_i = \lambda_i u_i$, and $V_{ik}$ is the matrix corresponding to the $i$-th eigenvector calculated from $B'v_i = \lambda_i v_i$.
In this embodiment, each singular value is first restored into a positive semi-definite diagonal matrix to obtain a singular value matrix, and the singular value matrix corresponding to each singular value is then multiplied by the two unitary matrices obtained from the singular value decomposition to obtain the original signal component corresponding to each singular value. This provides technical support for the subsequent screening of the original signal components to obtain the target signal components.
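Steps S4111 and S4112 can be sketched as follows, assuming a real-valued decomposition in economy form so that $V^{*}$ reduces to $V^{T}$. The factors `U`, `sigmas`, and `Vt` below belong to a hypothetical 3x2 matrix with singular values $\sqrt{3}$ and 1 and are not taken from the figures; the helper names are illustrative.

```python
import math


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def single_value_matrix(sigmas, n):
    """D_n: diagonal matrix keeping only the n-th singular value (0-based) in place."""
    size = len(sigmas)
    return [[sigmas[i] if i == j == n else 0.0 for j in range(size)]
            for i in range(size)]


def component(u, sigmas, vt, n):
    """Original signal component H_n = U * D_n * V^T for the n-th singular value."""
    return mat_mul(mat_mul(u, single_value_matrix(sigmas, n)), vt)


# Hypothetical decomposition of the matrix [[0, 1], [1, 1], [1, 0]] (an assumption).
s2, s6 = 1.0 / math.sqrt(2.0), 1.0 / math.sqrt(6.0)
U = [[s6, -s2], [2.0 * s6, 0.0], [s6, s2]]  # columns are u_1 and u_2
sigmas = [math.sqrt(3.0), 1.0]
Vt = [[s2, s2], [s2, -s2]]                  # rows are v_1^T and v_2^T

H1 = component(U, sigmas, Vt, 0)
H2 = component(U, sigmas, Vt, 1)
```

Adding `H1` and `H2` element by element gives back the decomposed matrix, which is a convenient sanity check for any per-singular-value reconstruction.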
In an embodiment, as shown in FIG. 6, step S40, in which the inverse singular value decomposition operation is performed on the at least two singular values to obtain the target voice signal, specifically includes the following steps:
S421: Calculate the sum of the at least two singular values and multiply the sum by a preset threshold to obtain the corresponding evaluation threshold, where the preset threshold is a positive number not greater than 1.
The preset threshold is a predefined threshold used for calculating the evaluation threshold, and is a positive number not greater than 1. The evaluation threshold is the threshold used for screening the singular values. Specifically, the sum of all singular values is calculated and then multiplied by the preset threshold to obtain the evaluation threshold. That is, the evaluation threshold is calculated as
Figure PCTCN2018094409-appb-000020
where $T$ is the preset threshold, $P$ is the evaluation threshold, and $\sigma_i$ is a singular value.
S422: Linearly superimpose the at least two singular values in descending order to obtain a superposition sum; if the superposition sum is greater than the evaluation threshold, obtain the N singular values corresponding to the superposition sum, where N is a positive integer.
Specifically, the singular values are arranged in descending order, so they are added linearly in that order to obtain a superposition sum; if the superposition sum is greater than the evaluation threshold, the N singular values corresponding to that superposition sum are obtained, where N is a positive integer. In other words, the singular values are added in descending order until the sum of the N superimposed singular values exceeds the evaluation threshold, at which point the superposition stops and the N singular values are obtained. The larger a singular value, the more valid information of the digital voice signal it carries; conversely, the smaller a singular value, the less valid information it carries, and it is considered to consist mainly of noise. Therefore, the server adds the singular values in descending order until the superposition sum of the N singular values exceeds the evaluation threshold, and removes the remaining M singular values to reduce noise interference. This screening process does not require performing the inverse decomposition on each singular value followed by a correlation analysis; the required singular values can be selected directly according to the evaluation threshold, which is simple to operate and improves efficiency.
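Steps S421 and S422 can be sketched together. The evaluation threshold is assumed to be the preset threshold multiplied by the sum of all singular values, as the surrounding text describes; the preset value 0.7 in the example is hypothetical, and the function names are illustrative.

```python
def evaluation_threshold(sigmas, preset):
    """P = T * sum(sigma_i), where the preset threshold T satisfies 0 < T <= 1."""
    if not 0 < preset <= 1:
        raise ValueError("preset threshold must be in (0, 1]")
    return preset * sum(sigmas)


def select_leading(sigmas, preset):
    """Accumulate singular values, largest first, until the sum exceeds P."""
    ordered = sorted(sigmas, reverse=True)
    limit = evaluation_threshold(ordered, preset)
    kept, running = [], 0.0
    for sigma in ordered:
        kept.append(sigma)
        running += sigma
        if running > limit:
            break  # the remaining (smaller) singular values are treated as noise
    return kept
```

For example, `select_leading([1.0, 5.0, 3.0, 1.0], 0.7)` returns `[5.0, 3.0]`: the total is 10, the evaluation threshold is about 7, and the running sum first exceeds it at 5 + 3 = 8. With a preset threshold of exactly 1 the running sum never exceeds the total, so every singular value is kept.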
S423: Perform batch reconstruction on the N singular values to obtain the target voice signal.
Batch reconstruction refers to restoring the N singular values in a single batch to obtain the target voice signal.
Specifically, the batch reconstruction of the N singular values to obtain the target voice signal is implemented as follows. The selected N singular values are retained in the original positive semi-definite diagonal matrix $D$ obtained from the singular value decomposition, with their magnitudes and positions unchanged, while the removed singular values (that is, the singular values representing noise) are set to 0 in that matrix with their positions unchanged, giving the target positive semi-definite diagonal matrix $M$ that contains the selected N singular values. The target matrix $M$ is then substituted into the singular value decomposition formula above, with $U$ and $V$ unchanged, to obtain a new Hankel matrix $H'$, where $H' = UD_nV^{*}$. Expanding the new Hankel matrix $H'$ according to the property of a Hankel matrix (that is, the elements on each anti-diagonal are equal) yields the denoised voice signal, which is the target voice signal in this embodiment.
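Two pieces of the batch reconstruction can be sketched in isolation: zeroing the discarded singular values while keeping positions unchanged, and unfolding a matrix along its anti-diagonals into a one-dimensional signal. One assumption goes beyond the text: after truncation the product $H'$ is generally only approximately Hankel, so the sketch averages each anti-diagonal, which reduces to a plain read-out when the entries on an anti-diagonal are exactly equal. Function names are illustrative.

```python
def keep_leading(sigmas, n_keep):
    """Zero every singular value after the first n_keep; positions stay unchanged."""
    return [s if i < n_keep else 0.0 for i, s in enumerate(sigmas)]


def unfold_antidiagonals(matrix):
    """Average each anti-diagonal (constant i + j) to recover a 1-D signal."""
    rows, cols = len(matrix), len(matrix[0])
    sums = [0.0] * (rows + cols - 1)
    counts = [0] * (rows + cols - 1)
    for i in range(rows):
        for j in range(cols):
            sums[i + j] += matrix[i][j]
            counts[i + j] += 1
    return [total / count for total, count in zip(sums, counts)]
```

For an exact Hankel matrix such as `[[1, 2], [2, 3], [3, 4]]`, the unfolding returns the sequence it was built from, `[1.0, 2.0, 3.0, 4.0]`.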
In summary, in this embodiment, the inverse singular value decomposition includes either inversely decomposing each singular value or batch-reconstructing the singular values to obtain the target voice signal.
In this embodiment, the sum of the at least two singular values is calculated and multiplied by the preset threshold to obtain the evaluation threshold. The singular values are then added in descending order until the sum of the N superimposed singular values exceeds the evaluation threshold, at which point the superposition stops, the N singular values are obtained, and the remaining M singular values are removed to reduce noise interference and achieve voice enhancement. Finally, batch reconstruction is performed on the N singular values to obtain the target voice signal: the selected N singular values are restored directly in the original positive semi-definite diagonal matrix $D$ obtained from the singular value decomposition, and the result is multiplied by the two unitary matrices obtained from the singular value decomposition. Obtaining the target voice signal by batch reconstruction improves the efficiency of obtaining the target voice signal and therefore the processing efficiency of voice enhancement.
S50: Perform restoration processing on the target voice signal to obtain target voice information.
The target voice information is the voice information obtained by restoring the target voice signal in the required audio format. Further, the server can restore the target voice signal, which is in matrix form, as follows: first expand the Hankel matrix along its anti-diagonal elements to obtain the denoised one-dimensional digital signal; the target voice information is then obtained from the sampling frequency parameter together with the one-dimensional digital signal. The sampling frequency, also called the sampling speed or sampling rate, defines the number of samples per second extracted from a continuous signal to form a discrete signal, and is expressed in hertz (Hz).
In this embodiment, the sampling frequency parameter can be obtained by reading the original voice information directly with the audio-file reading function of a Python module. Specifically, Python modules provide functions that generate audio files in different formats; calling such a function directly with the sampling frequency parameter and the one-dimensional digital signal generates the target voice information in the required format. For example, wave, the function in the Python module that generates wav-format files, can be called to process the acquired sampling frequency parameter and one-dimensional digital signal and generate an audio file in wav format (that is, the target voice information).
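The restoration to a wav file can be sketched with Python's standard-library `wave` module, on the assumption that this is the module the text refers to. The sketch further assumes mono, 16-bit PCM samples already scaled to integers; `write_wav` and `read_wav` are illustrative helper names, and the sample values and the 16000 Hz rate are hypothetical.

```python
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate):
    """Write mono 16-bit PCM samples (ints in [-32768, 32767]) to a WAV file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)            # mono
        out.setsampwidth(2)            # 2 bytes per sample = 16 bit
        out.setframerate(sample_rate)  # sampling frequency in Hz
        out.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_wav(path):
    """Return (sample_rate, samples) from a mono 16-bit PCM WAV file."""
    with wave.open(path, "rb") as src:
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    count = len(frames) // 2
    return rate, list(struct.unpack("<%dh" % count, frames))


# Round trip through a temporary file with hypothetical sample values.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "enhanced.wav")
    write_wav(demo_path, [0, 1000, -1000, 32767], 16000)
    rate, restored = read_wav(demo_path)
```

Reading the original file with `read_wav` yields the sampling frequency parameter, which is then passed unchanged to `write_wav` together with the denoised one-dimensional signal.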
In this embodiment, the original voice information is first converted to obtain a digital voice signal, and the digital voice signal is arranged into a Hankel matrix so that singular value decomposition can be performed on the Hankel matrix to obtain at least two singular values. Singular values characterize the important information implicit in the matrix, and their importance is positively correlated with their magnitude, so the amount of valid information carried by each singular value can be read directly from the obtained values. The server then performs the inverse singular value decomposition operation on the at least two singular values to obtain the voice signal corresponding to each singular value, that is, the target voice signal, so as to suppress noise interference and achieve voice enhancement. Finally, the target voice signal is restored to obtain an audio file in the required format, that is, the target voice information. The restoration can be done by directly calling a function in a Python module, which is simple to operate.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
In one embodiment, FIG. 7 shows a schematic diagram of a voice enhancement apparatus in one-to-one correspondence with the voice enhancement method in the above embodiment. As shown in FIG. 7, the voice enhancement apparatus includes a digital voice signal acquisition module 10, a Hankel matrix acquisition module 20, a singular value acquisition module 30, a target voice signal acquisition module 40, and a target voice information acquisition module 50. The functional modules are described in detail as follows:
The digital voice signal acquisition module 10 is configured to convert the original voice information to obtain a digital voice signal.
The Hankel matrix acquisition module 20 is configured to obtain a Hankel matrix based on the digital voice signal.
The singular value acquisition module 30 is configured to perform singular value decomposition on the Hankel matrix to obtain at least two singular values.
The target voice signal acquisition module 40 is configured to perform an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal.
The target voice information acquisition module 50 is configured to perform restoration processing on the target voice signal to obtain target voice information.
Specifically, the singular value acquisition module 30 includes a transposed matrix calculation unit 31, an eigenvalue acquisition unit 32, and a singular value acquisition unit 33.
The transposed matrix calculation unit 31 is configured to calculate the transposed matrix of the Hankel matrix.
The eigenvalue acquisition unit 32 is configured to obtain at least two eigenvalues based on the product of the Hankel matrix and the transposed matrix.
The singular value acquisition unit 33 is configured to operate on the at least two eigenvalues according to a preset calculation method to obtain at least two singular values.
Specifically, the target voice signal acquisition module 40 includes an original signal component acquisition unit 411, a correlation coefficient acquisition unit 412, a target signal component acquisition unit 413, and a target voice signal acquisition unit 414.
The original signal component acquisition unit 411 is configured to perform the inverse singular value decomposition operation on the at least two singular values separately to obtain the original signal component corresponding to each singular value.
The correlation coefficient acquisition unit 412 is configured to perform a correlation calculation between each original signal component and the digital voice signal to obtain a correlation coefficient.
The target signal component acquisition unit 413 is configured to select the original signal components whose correlation coefficient is greater than a preset threshold as target signal components.
The target voice signal acquisition unit 414 is configured to perform linear superposition on the target signal components to obtain the target voice signal.
Specifically, the original signal component acquisition unit 411 includes a singular value matrix acquisition subunit 4111 and an original signal component acquisition subunit 4112.
The singular value matrix acquisition subunit 4111 is configured to obtain a singular value matrix based on the singular values.
The original signal component acquisition subunit 4112 is configured to obtain the original signal component corresponding to each singular value based on the eigenvalues and the singular value matrix.
Specifically, the correlation calculation formula is
Figure PCTCN2018094409-appb-000021
where $x$ is the original signal component, $y$ is the digital voice signal, $\mathrm{Cov}(x, y)$ is the covariance of $x$ and $y$, $\mathrm{Var}[x]$ is the variance of $x$, $\mathrm{Var}[y]$ is the variance of $y$, and $r$ is the correlation coefficient.
Specifically, the target voice signal acquisition module 40 includes an evaluation threshold acquisition unit 421, an N-term singular value acquisition unit 422, and a target voice signal acquisition unit 423.
The evaluation threshold acquisition unit 421 is configured to calculate the sum of the at least two singular values and multiply the sum by a preset threshold to obtain the corresponding evaluation threshold, where the preset threshold is a positive number not greater than 1.
The N-term singular value acquisition unit 422 is configured to linearly superimpose the at least two singular values in descending order to obtain a superposition sum and, if the superposition sum is greater than the evaluation threshold, to obtain the N singular values corresponding to the superposition sum, where N is a positive integer.
The target voice signal acquisition unit 423 is configured to perform batch reconstruction on the N singular values to obtain the target voice signal.
关于语音增强装置的具体限定可以参见上文中对于语音增强方法的限定,在此不再赘述。上述语音增强装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。For the specific limitation of the speech enhancement device, refer to the foregoing limitation on the speech enhancement method, and details are not described herein again. Each module in the above voice enhancement device may be implemented in whole or in part by software, hardware, and a combination thereof. The above-mentioned modules may be embedded in the hardware in or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
在一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图8所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于用于存储执行语音增强方法过程中生成或获取的数据,如目标语音信息。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行时以实现一种语音增强方法。In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in FIG. 8. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and computer-readable instructions in a non-volatile storage medium. The database of the computer device is used to store data generated or obtained during the execution of the speech enhancement method, such as target speech information. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer-readable instructions are executed by one or more processors, the one or more processors are executed to implement a speech enhancement method.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the following steps: converting original speech information to obtain a digital speech signal; obtaining a Hankel matrix based on the digital speech signal; performing a singular value decomposition operation on the Hankel matrix to obtain at least two singular values; performing an inverse singular value decomposition operation on the at least two singular values to obtain a target speech signal; and performing restoration processing on the target speech signal to obtain target speech information.
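The Hankel embedding and decomposition steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in the application: the function names, the row count `rows`, and the test signal are hypothetical, and anti-diagonal averaging is only one common way to map a matrix back to a sequence.

```python
import numpy as np

def hankel_matrix(signal, rows):
    """Build a rows x (len(signal) - rows + 1) Hankel matrix with H[i, j] = signal[i + j]."""
    signal = np.asarray(signal, dtype=float)
    cols = signal.size - rows + 1
    if rows < 1 or cols < 1:
        raise ValueError("rows must be between 1 and len(signal)")
    idx = np.arange(rows)[:, None] + np.arange(cols)[None, :]
    return signal[idx]

def diagonal_average(matrix):
    """Map a matrix back to a sequence by averaging each anti-diagonal (assumed inverse embedding)."""
    rows, cols = matrix.shape
    total = np.zeros(rows + cols - 1)
    counts = np.zeros(rows + cols - 1)
    for i in range(rows):
        # Row i of the matrix covers sequence positions i .. i + cols - 1.
        total[i:i + cols] += matrix[i]
        counts[i:i + cols] += 1
    return total / counts

# Hypothetical digital speech signal: a sampled sine wave.
x = np.sin(2 * np.pi * 0.05 * np.arange(64))
H = hankel_matrix(x, rows=16)
U, s, Vt = np.linalg.svd(H, full_matrices=False)  # s holds the singular values
```

Applying `diagonal_average` to an unmodified Hankel matrix returns the original sequence, so any change in the output comes only from the singular values that are kept or discarded before reconstruction.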
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps: calculating the transposed matrix of the Hankel matrix; obtaining at least two eigenvalues based on the product of the Hankel matrix and the transposed matrix; and operating on the at least two eigenvalues according to a preset calculation method to obtain at least two singular values.
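The relation between the eigenvalues of the product matrix and the singular values can be sketched as below. The passage does not say what the preset calculation method is; this sketch assumes it is the square root, which is the standard relation between the eigenvalues of H Hᵀ and the singular values of H. The function name and the random test matrix are hypothetical.

```python
import numpy as np

def singular_values_from_product(H):
    """Singular values of H as square roots of the eigenvalues of H @ H.T (assumed preset method)."""
    product = H @ H.T
    # eigvalsh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues = np.linalg.eigvalsh(product)
    # Clip tiny negative values caused by rounding before taking the square root.
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    return np.sqrt(eigenvalues[::-1])  # descending, matching the usual SVD ordering

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 7))  # stand-in for a Hankel matrix
sv = singular_values_from_product(H)
```

For comparison, `np.linalg.svd(H, compute_uv=False)` returns the same values directly; a direct SVD is usually numerically preferable, because forming the product squares the condition number.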
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps: performing an inverse singular value decomposition operation on each of the at least two singular values to obtain an original signal component corresponding to each singular value; calculating the correlation between each original signal component and the digital speech signal to obtain a correlation coefficient; selecting the original signal components whose correlation coefficient is greater than a preset threshold as target signal components; and linearly superimposing the target signal components to obtain the target speech signal.
Specifically, the correlation is calculated as

$$r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$$

where x is the original signal component, y is the digital speech signal, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the correlation coefficient.
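The correlation-based selection can be sketched as follows, assuming the formula is the usual ratio of the covariance to the square root of the product of the variances. The function names, the threshold of 0.5, and the two synthetic components are hypothetical and chosen only for illustration.

```python
import numpy as np

def correlation(x, y):
    """Correlation coefficient r = Cov(x, y) / sqrt(Var[x] * Var[y])."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / np.sqrt(x.var() * y.var())

def select_and_sum(components, signal, threshold):
    """Keep components whose correlation with the signal exceeds the threshold, then add them up."""
    kept = [c for c in components if correlation(c, signal) > threshold]
    if not kept:
        return np.zeros_like(signal, dtype=float)
    return np.sum(kept, axis=0)

t = np.arange(200)
signal = np.sin(2 * np.pi * t / 50)            # stand-in for the digital speech signal
components = [0.9 * signal,                    # strongly correlated component
              0.1 * np.cos(2 * np.pi * t / 50)]  # essentially uncorrelated component
target = select_and_sum(components, signal, threshold=0.5)
```

In this example only the first component passes the threshold, so `target` equals that component alone.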
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps: obtaining a singular value matrix based on the singular values; and obtaining the original signal component corresponding to each singular value based on the eigenvalues and the singular value matrix.
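One way to obtain a signal component per singular value is sketched below. The passage does not spell out the reconstruction, so this is an assumption: each singular value is combined with its left and right singular vectors from `np.linalg.svd` into a rank-one matrix, and the matrix is mapped back to a sequence by averaging its anti-diagonals. The function name and the test signal are hypothetical.

```python
import numpy as np

def components_from_svd(signal, rows):
    """Return the singular values and one reconstructed signal component per singular value."""
    signal = np.asarray(signal, dtype=float)
    cols = signal.size - rows + 1
    # Hankel embedding: H[i, j] = signal[i + j].
    H = signal[np.arange(rows)[:, None] + np.arange(cols)[None, :]]
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    components = []
    for i in range(s.size):
        rank_one = s[i] * np.outer(U[:, i], Vt[i])
        # Flip columns so anti-diagonals become ordinary diagonals, then average each one.
        flipped = rank_one[:, ::-1]
        component = np.array(
            [flipped.diagonal(k).mean() for k in range(cols - 1, -rows, -1)]
        )
        components.append(component)
    return s, np.array(components)

n = np.arange(40)
x = np.sin(2 * np.pi * 0.05 * n) + 0.1 * np.cos(2 * np.pi * 0.2 * n)
s, comps = components_from_svd(x, rows=8)
```

Because the averaging is linear and the rank-one matrices add up to the Hankel matrix, the components sum back to the input signal; dropping some of them is what changes the output.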
In one embodiment, when executing the computer-readable instructions, the processor further implements the following steps: calculating the sum of the at least two singular values and multiplying the sum by a preset threshold to obtain a corresponding evaluation threshold, where the preset threshold is a positive number not greater than 1; accumulating the at least two singular values in descending order to obtain a cumulative sum and, if the cumulative sum is greater than the evaluation threshold, obtaining the N singular values that make up the cumulative sum, where N is a positive integer; and performing batch reconstruction on the N singular values to obtain the target speech signal.
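The choice of N can be sketched as follows. The function name and the example values are hypothetical, and keeping every singular value when the cumulative sum never strictly exceeds the evaluation threshold (for example, when the preset threshold is 1) is an assumption of this sketch.

```python
import numpy as np

def count_dominant(singular_values, preset_threshold):
    """Smallest N whose N largest singular values sum to more than preset_threshold * total."""
    if not 0 < preset_threshold <= 1:
        raise ValueError("preset_threshold must be a positive number not greater than 1")
    s = np.sort(np.asarray(singular_values, dtype=float))[::-1]  # descending order
    evaluation_threshold = preset_threshold * s.sum()
    cumulative = np.cumsum(s)
    exceeds = cumulative > evaluation_threshold
    if not exceeds.any():
        # Assumed fallback: the threshold is never strictly exceeded, so keep everything.
        return s.size
    return int(np.argmax(exceeds)) + 1  # first position where the cumulative sum exceeds

values = [5.0, 3.0, 1.0, 1.0]
n_keep = count_dominant(values, preset_threshold=0.7)
```

Here the total is 10 and the evaluation threshold is 7; the cumulative sums are 5 and 8, so the two largest singular values are kept and the rest are discarded before reconstruction.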
In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to implement the following steps: converting original speech information to obtain a digital speech signal; obtaining a Hankel matrix based on the digital speech signal; performing a singular value decomposition operation on the Hankel matrix to obtain at least two singular values; performing an inverse singular value decomposition operation on the at least two singular values to obtain a target speech signal; and performing restoration processing on the target speech signal to obtain target speech information.
In one embodiment, when executed by one or more processors, the computer-readable instructions cause the one or more processors to further implement the following steps: calculating the transposed matrix of the Hankel matrix; obtaining at least two eigenvalues based on the product of the Hankel matrix and the transposed matrix; and operating on the at least two eigenvalues according to a preset calculation method to obtain at least two singular values.
In one embodiment, when executed by one or more processors, the computer-readable instructions cause the one or more processors to further implement the following steps: performing an inverse singular value decomposition operation on each of the at least two singular values to obtain an original signal component corresponding to each singular value; calculating the correlation between each original signal component and the digital speech signal to obtain a correlation coefficient; selecting the original signal components whose correlation coefficient is greater than a preset threshold as target signal components; and linearly superimposing the target signal components to obtain the target speech signal.
Specifically, the correlation is calculated as

$$r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$$

where x is the original signal component, y is the digital speech signal, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the correlation coefficient.
In one embodiment, when executed by one or more processors, the computer-readable instructions cause the one or more processors to further implement the following steps: obtaining a singular value matrix based on the singular values; and obtaining the original signal component corresponding to each singular value based on the eigenvalues and the singular value matrix.
In one embodiment, when executed by one or more processors, the computer-readable instructions cause the one or more processors to further implement the following steps: calculating the sum of the at least two singular values and multiplying the sum by a preset threshold to obtain a corresponding evaluation threshold, where the preset threshold is a positive number not greater than 1; accumulating the at least two singular values in descending order to obtain a cumulative sum and, if the cumulative sum is greater than the evaluation threshold, obtaining the N singular values that make up the cumulative sum, where N is a positive integer; and performing batch reconstruction on the N singular values to obtain the target speech signal.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by computer-readable instructions that direct the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practice, the functions may be assigned to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The embodiments described above are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and they shall all fall within the scope of protection of this application.

Claims (20)

  1. A speech enhancement method, comprising:
    converting original speech information to obtain a digital speech signal;
    obtaining a Hankel matrix based on the digital speech signal;
    performing a singular value decomposition operation on the Hankel matrix to obtain at least two singular values;
    performing an inverse singular value decomposition operation on the at least two singular values to obtain a target speech signal; and
    performing restoration processing on the target speech signal to obtain target speech information.
  2. The speech enhancement method according to claim 1, wherein performing the singular value decomposition operation on the Hankel matrix to obtain the at least two singular values comprises:
    calculating a transposed matrix of the Hankel matrix;
    obtaining at least two eigenvalues based on a product of the Hankel matrix and the transposed matrix; and
    operating on the at least two eigenvalues according to a preset calculation method to obtain the at least two singular values.
  3. The speech enhancement method according to claim 2, wherein performing the inverse singular value decomposition operation on the at least two singular values to obtain the target speech signal comprises:
    performing an inverse singular value decomposition operation on each of the at least two singular values to obtain an original signal component corresponding to each singular value;
    calculating a correlation between the original signal component and the digital speech signal to obtain a correlation coefficient;
    selecting the original signal component whose correlation coefficient is greater than a preset threshold as a target signal component; and
    linearly superimposing the target signal components to obtain the target speech signal.
  4. The speech enhancement method according to claim 3, wherein performing the inverse singular value decomposition operation on each of the at least two singular values to obtain the original signal component corresponding to each singular value comprises:
    obtaining a singular value matrix based on the singular values; and
    obtaining the original signal component corresponding to each singular value based on the eigenvalues and the singular value matrix.
  5. The speech enhancement method according to claim 3, wherein the correlation is calculated as
    $$r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$$
    where x is the original signal component, y is the digital speech signal, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the correlation coefficient.
  6. The speech enhancement method according to claim 1, wherein performing the inverse singular value decomposition operation on the at least two singular values to obtain the target speech signal comprises:
    calculating a sum of the at least two singular values and multiplying the sum by a preset threshold to obtain a corresponding evaluation threshold, wherein the preset threshold is a positive number not greater than 1;
    accumulating the at least two singular values in descending order to obtain a cumulative sum and, if the cumulative sum is greater than the evaluation threshold, obtaining the N singular values that make up the cumulative sum, wherein N is a positive integer; and
    performing batch reconstruction on the N singular values to obtain the target speech signal.
  7. A speech enhancement apparatus, comprising:
    a digital speech signal acquisition module, configured to convert original speech information to obtain a digital speech signal;
    a Hankel matrix acquisition module, configured to obtain a Hankel matrix based on the digital speech signal;
    a singular value acquisition module, configured to perform a singular value decomposition operation on the Hankel matrix to obtain at least two singular values;
    a target speech signal acquisition module, configured to perform an inverse singular value decomposition operation on the at least two singular values to obtain a target speech signal; and
    a target speech information acquisition module, configured to perform restoration processing on the target speech signal to obtain target speech information.
  8. The speech enhancement apparatus according to claim 7, wherein the singular value acquisition module comprises:
    a transposed matrix calculation unit, configured to calculate a transposed matrix of the Hankel matrix;
    an eigenvalue acquisition unit, configured to obtain at least two eigenvalues based on a product of the Hankel matrix and the transposed matrix; and
    a singular value acquisition unit, configured to operate on the at least two eigenvalues according to a preset calculation method to obtain the at least two singular values.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    converting original speech information to obtain a digital speech signal;
    obtaining a Hankel matrix based on the digital speech signal;
    performing a singular value decomposition operation on the Hankel matrix to obtain at least two singular values;
    performing an inverse singular value decomposition operation on the at least two singular values to obtain a target speech signal; and
    performing restoration processing on the target speech signal to obtain target speech information.
  10. The computer device according to claim 9, wherein performing the singular value decomposition operation on the Hankel matrix to obtain the at least two singular values comprises:
    calculating a transposed matrix of the Hankel matrix;
    obtaining at least two eigenvalues based on a product of the Hankel matrix and the transposed matrix; and
    operating on the at least two eigenvalues according to a preset calculation method to obtain the at least two singular values.
  11. The computer device according to claim 10, wherein performing the inverse singular value decomposition operation on the at least two singular values to obtain the target speech signal comprises:
    performing an inverse singular value decomposition operation on each of the at least two singular values to obtain an original signal component corresponding to each singular value;
    calculating a correlation between the original signal component and the digital speech signal to obtain a correlation coefficient;
    selecting the original signal component whose correlation coefficient is greater than a preset threshold as a target signal component; and
    linearly superimposing the target signal components to obtain the target speech signal.
  12. The computer device according to claim 11, wherein performing the inverse singular value decomposition operation on each of the at least two singular values to obtain the original signal component corresponding to each singular value comprises:
    obtaining a singular value matrix based on the singular values; and
    obtaining the original signal component corresponding to each singular value based on the eigenvalues and the singular value matrix.
  13. The computer device according to claim 11, wherein the correlation is calculated as
    $$r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$$
    where x is the original signal component, y is the digital speech signal, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the correlation coefficient.
  14. The computer device according to claim 9, wherein performing the inverse singular value decomposition operation on the at least two singular values to obtain the target speech signal comprises:
    calculating a sum of the at least two singular values and multiplying the sum by a preset threshold to obtain a corresponding evaluation threshold, wherein the preset threshold is a positive number not greater than 1;
    accumulating the at least two singular values in descending order to obtain a cumulative sum and, if the cumulative sum is greater than the evaluation threshold, obtaining the N singular values that make up the cumulative sum, wherein N is a positive integer; and
    performing batch reconstruction on the N singular values to obtain the target speech signal.
  15. One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    converting original speech information to obtain a digital speech signal;
    obtaining a Hankel matrix based on the digital speech signal;
    performing a singular value decomposition operation on the Hankel matrix to obtain at least two singular values;
    performing an inverse singular value decomposition operation on the at least two singular values to obtain a target speech signal; and
    performing restoration processing on the target speech signal to obtain target speech information.
  16. The non-volatile readable storage medium according to claim 15, wherein performing the singular value decomposition operation on the Hankel matrix to obtain the at least two singular values comprises:
    calculating a transposed matrix of the Hankel matrix;
    obtaining at least two eigenvalues based on a product of the Hankel matrix and the transposed matrix; and
    operating on the at least two eigenvalues according to a preset calculation method to obtain the at least two singular values.
  17. The non-volatile readable storage medium according to claim 16, wherein performing the inverse singular value decomposition operation on the at least two singular values to obtain the target speech signal comprises:
    performing an inverse singular value decomposition operation on each of the at least two singular values to obtain an original signal component corresponding to each singular value;
    calculating a correlation between the original signal component and the digital speech signal to obtain a correlation coefficient;
    selecting the original signal component whose correlation coefficient is greater than a preset threshold as a target signal component; and
    linearly superimposing the target signal components to obtain the target speech signal.
  18. The non-volatile readable storage medium according to claim 17, wherein performing the inverse singular value decomposition operation on each of the at least two singular values to obtain the original signal component corresponding to each singular value comprises:
    obtaining a singular value matrix based on the singular values; and
    obtaining the original signal component corresponding to each singular value based on the eigenvalues and the singular value matrix.
  19. The non-volatile readable storage medium according to claim 17, wherein the correlation is calculated as
    $$r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$$
    where x is the original signal component, y is the digital speech signal, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the correlation coefficient.
  20. The non-volatile readable storage medium according to claim 15, wherein performing the inverse singular value decomposition operation on the at least two singular values to obtain the target speech signal comprises:
    calculating a sum of the at least two singular values and multiplying the sum by a preset threshold to obtain a corresponding evaluation threshold, wherein the preset threshold is a positive number not greater than 1;
    accumulating the at least two singular values in descending order to obtain a cumulative sum and, if the cumulative sum is greater than the evaluation threshold, obtaining the N singular values that make up the cumulative sum, wherein N is a positive integer; and
    performing batch reconstruction on the N singular values to obtain the target speech signal.
PCT/CN2018/094409 2018-05-29 2018-07-04 Voice enhancement method and apparatus, and computer device and storage medium WO2019227588A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810529510.6A CN108831494B (en) 2018-05-29 2018-05-29 Voice enhancement method and device, computer equipment and storage medium
CN201810529510.6 2018-05-29

Publications (1)

Publication Number Publication Date
WO2019227588A1 true WO2019227588A1 (en) 2019-12-05

Family

ID=64146072

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094409 WO2019227588A1 (en) 2018-05-29 2018-07-04 Voice enhancement method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN108831494B (en)
WO (1) WO2019227588A1 (en)

Also Published As

Publication number Publication date
CN108831494B (en) 2022-07-19
CN108831494A (en) 2018-11-16

Similar Documents

Publication Publication Date Title
WO2019227588A1 (en) Voice enhancement method and apparatus, and computer device and storage medium
US10621971B2 (en) Method and device for extracting speech feature based on artificial intelligence
WO2020147445A1 (en) Rephotographed image recognition method and apparatus, computer device, and computer-readable storage medium
EP3627759B1 (en) Method and apparatus for encrypting data, method and apparatus for training machine learning model, and electronic device
Wu et al. Asymptotic properties of sufficient dimension reduction with a diverging number of predictors
Murtaza et al. Face recognition using adaptive margin Fisher’s criterion and linear discriminant analysis
CN109710402A (en) Method, apparatus, computer equipment and the storage medium of process resource acquisition request
JP6278042B2 (en) Information processing apparatus and image processing method
WO2019227589A1 (en) Speech enhancement method and apparatus, computer device, and storage medium
WO2020000877A1 (en) Method and device for generating image
CN110796000A (en) Lip sample generation method and device based on bidirectional LSTM and storage medium
US11520837B2 (en) Clustering device, method and program
Salhov et al. Approximately-isometric diffusion maps
CN113689371B (en) Image fusion method, device, computer equipment and storage medium
Hardiansyah et al. Single image super-resolution via multiple linear mapping anchored neighborhood regression
Chau et al. Intrinsic data depth for Hermitian positive definite matrices
WO2018024259A1 (en) Method and device for training voiceprint recognition system
CN112001285A (en) Method, device, terminal and medium for processing beautifying image
WO2020000878A1 (en) Method and apparatus for generating image
Chen et al. KPCA method based on within-class auxiliary training samples and its application to pattern classification
CN112506423B (en) Method and device for dynamically accessing storage equipment in cloud storage system
CN113779498A (en) Discrete Fourier matrix reconstruction method, device, equipment and storage medium
CN116167411A (en) Training method for decomposition convolution model, lesion area prediction method and related equipment
CN115797643A (en) Image denoising method, device and system, edge device and storage medium
He et al. The Nyström kernel conjugate gradient algorithm based on $k$-means sampling

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18921142

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.03.2021)

122 EP: PCT application non-entry in European phase

Ref document number: 18921142

Country of ref document: EP

Kind code of ref document: A1