CN108831494A

CN108831494A - Sound enhancement method, device, computer equipment and storage medium

Info

Publication number: CN108831494A
Application number: CN201810529510.6A
Authority: CN
Inventors: 涂宏
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-05-29
Filing date: 2018-05-29
Publication date: 2018-11-16
Anticipated expiration: 2038-05-29
Also published as: CN108831494B; WO2019227588A1

Abstract

The present invention discloses a sound enhancement method, device, computer equipment and storage medium. The sound enhancement method includes: converting original speech information to obtain an audio digital signal; obtaining a Hankel matrix based on the audio digital signal; performing singular value decomposition on the Hankel matrix to obtain at least two singular values; performing an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal; and performing restoration processing on the target voice signal to obtain target voice information. The sound enhancement method can effectively suppress noise interference, thereby improving the accuracy with which the target voice information is recognized during speech recognition.

Description

Sound enhancement method, device, computer equipment and storage medium

Technical field

The present invention relates to the field of signal processing, and more particularly to a sound enhancement method, device, computer equipment and storage medium.

Background technique

With the widespread adoption of speech recognition technology, the demand for speech processing technology has grown accordingly. At present, a voice signal collected by computer equipment contains both the voice information corresponding to the speaker's speech, which is the effective information, and noise information formed by sounds other than the speaker's speech. If the collected voice signal is recognized directly during speech recognition, the presence of noise information reduces recognition accuracy. It is therefore necessary to perform enhancement processing (that is, noise reduction) on the collected voice signal, so as to extract as pure a voice signal as possible and make speech recognition more accurate. With current speech enhancement processing, the precision of the extracted voice signal is not high, which hinders subsequent speech recognition.

Summary of the invention

In view of the above technical problem, it is necessary to provide a sound enhancement method, device, computer equipment and storage medium that can improve the precision of the voice signal obtained after speech enhancement processing.

A sound enhancement method, including:

converting original speech information to obtain an audio digital signal;

obtaining a Hankel matrix based on the audio digital signal;

performing singular value decomposition on the Hankel matrix to obtain at least two singular values;

performing an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal;

performing restoration processing on the target voice signal to obtain target voice information.

A sound enhancement device, including:

an audio digital signal acquisition module, configured to convert original speech information to obtain an audio digital signal;

a Hankel matrix acquisition module, configured to obtain a Hankel matrix based on the audio digital signal;

a singular value acquisition module, configured to perform singular value decomposition on the Hankel matrix to obtain at least two singular values;

a target voice signal acquisition module, configured to perform an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal;

a target voice information acquisition module, configured to perform restoration processing on the target voice signal to obtain target voice information.

Computer equipment, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above sound enhancement method when executing the computer program.

A non-volatile storage medium storing a computer program, wherein the steps of the above sound enhancement method are implemented when the computer program is executed by a processor.

In the above sound enhancement method, device, computer equipment and storage medium, the original speech information is first converted to obtain an audio digital signal, and the audio digital signal is arranged into a Hankel matrix so that singular value decomposition can be performed on the Hankel matrix to obtain at least two singular values. Singular values correspond to the important information implied in the matrix, and importance is positively correlated with the size of the singular value, so obtaining the singular values makes it possible to observe directly how much effective information each one carries. Next, an inverse singular value decomposition operation is performed on the at least two singular values to obtain the voice signal corresponding to each singular value, that is, the target voice signal, thereby reducing the dimensionality of the data. Finally, restoration processing is performed on the target voice signal to obtain the target voice information, which achieves speech enhancement.

Detailed description of the invention

To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a diagram of an application environment of the sound enhancement method in an embodiment of the invention;

Fig. 2 is a flow chart of the sound enhancement method in an embodiment of the invention;

Fig. 3 is a specific flow chart of step S30 in Fig. 2;

Fig. 4 is a specific flow chart of step S40 in Fig. 2;

Fig. 5 is a specific flow chart of step S411 in Fig. 4;

Fig. 6 is another specific flow chart of step S40 in Fig. 2;

Fig. 7 is a schematic diagram of the sound enhancement device in an embodiment of the invention;

Fig. 8 is a schematic diagram of the computer equipment in an embodiment of the invention.

Specific embodiment

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The sound enhancement method provided in the embodiments of the present invention can be applied in the application environment shown in Fig. 1, in which computer equipment communicates with a server over a network. The computer equipment may be, but is not limited to, a personal computer, laptop, smartphone, tablet computer or portable wearable device. The server may be implemented as an independent server.

The sound enhancement method can be applied in computer equipment deployed by financial institutions such as banks, securities firms and insurers, or by other institutions, to perform speech enhancement on voice signals during speech recognition and thereby improve recognition accuracy.

In one embodiment, as shown in Fig. 2, the method is described by taking its application to the server in Fig. 1 as an example, and includes the following steps:

S10: Convert the original speech information to obtain an audio digital signal.

The original speech information is the speaker's voice information collected by a recording module (such as a microphone) in the computer equipment. It may be voice information in wav, mp3 or another format. The audio digital signal is the discrete digital signal obtained by converting the original speech information. Because computer equipment cannot process the original speech information directly and can only process binary data, the original speech information must be converted into an audio digital signal.

Specifically, the server receives the original speech information sent by the computer equipment and reads it with an audio-file reading function from a Python module to obtain the audio digital signal. For example, the function may be wave.open(file, 'rb'), where file is the original speech information and 'rb' denotes a read operation; the one-dimensional array read from the audio file through this function is the audio digital signal. A Python module is a module containing a large number of encapsulated functions, written in Python, an object-oriented interpreted programming language. In this embodiment, reading the original speech information directly with the audio-file reading function yields the audio digital signal, which is simple to implement.

In summary, the audio digital signal is the one-dimensional digital signal obtained by converting the original speech information, specifically by reading the original speech information directly with the audio-file reading function of a Python module.
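A minimal sketch of step S10 follows. It assumes the original speech information is an uncompressed 16-bit mono PCM WAV file; the standard `wave` module does not read mp3 or other compressed formats, so those would need a different reader. The function name `read_digital_signal` is hypothetical and does not come from the source.

```python
import array
import sys
import wave


def read_digital_signal(path):
    """Read a 16-bit mono PCM WAV file and return (samples, sample_rate)."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2 or wav_file.getnchannels() != 1:
            raise ValueError("this sketch assumes 16-bit mono PCM audio")
        sample_rate = wav_file.getframerate()
        raw_bytes = wav_file.readframes(wav_file.getnframes())
    samples = array.array("h")      # signed 16-bit integers
    samples.frombytes(raw_bytes)
    if sys.byteorder == "big":      # WAV sample data is little-endian
        samples.byteswap()
    return list(samples), sample_rate
```

The sample rate is returned alongside the samples because the restoration step at the end of the method needs it again.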

S20: Obtain a Hankel matrix based on the audio digital signal.

The audio digital signal is the one-dimensional digital signal obtained by converting the original speech information. A Hankel matrix is a matrix (in the strict definition, a square matrix) in which the elements on each anti-diagonal are all equal.

Specifically, the Hankel matrix has the following representation. Assume the audio digital signal (a one-dimensional digital signal sequence) is $x(i)$ with length $N$, $i = 1, 2, 3, \ldots, N$; then

$$H = \begin{bmatrix} x(1) & x(2) & \cdots & x(n) \\ x(2) & x(3) & \cdots & x(n+1) \\ \vdots & \vdots & \ddots & \vdots \\ x(N-n+1) & x(N-n+2) & \cdots & x(N) \end{bmatrix},$$

where $n$ is the number of elements in each row of the matrix. The elements of the j-th row of the Hankel matrix are formed by shifting the elements of the previous row left by one element, so that the elements on each anti-diagonal are equal; that is, each element equals the adjacent element to its lower left. An anti-diagonal runs from the upper right toward the lower left.

In this embodiment, the first column and the last row of the Hankel matrix are defined in advance to determine its numbers of rows and columns, and the Hankel matrix is built from these two parameters, which provides the basis for the subsequent singular value decomposition. Note that the first element of the last row is identical to the last element of the first column. For example, given the first column A = (1, 2, 3, 4) and the last row B = (4, 4.5, 5.5), the Hankel matrix built from these two parameters is

$$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 4.5 \\ 4 & 4.5 & 5.5 \end{bmatrix}.$$
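The construction from a first column and a last row can be sketched in plain Python as follows. The function names `build_hankel` and `hankel_from_signal` are hypothetical; the second helper assumes the row length is chosen by the caller, which the source does not specify.

```python
def build_hankel(first_column, last_row):
    """Build a Hankel matrix from its first column and its last row."""
    if first_column[-1] != last_row[0]:
        raise ValueError("last element of the first column must equal "
                         "the first element of the last row")
    # The underlying sequence; entry (i, j) of the matrix is sequence[i + j].
    sequence = list(first_column) + list(last_row[1:])
    rows, cols = len(first_column), len(last_row)
    return [[sequence[i + j] for j in range(cols)] for i in range(rows)]


def hankel_from_signal(signal, cols):
    """Build the Hankel matrix of a 1-D signal with `cols` elements per row."""
    rows = len(signal) - cols + 1
    return build_hankel(signal[:rows], signal[rows - 1:])


matrix = build_hankel([1, 2, 3, 4], [4, 4.5, 5.5])
```

With the first column (1, 2, 3, 4) and last row (4, 4.5, 5.5) from the text, `matrix` has four rows and three columns, and every anti-diagonal holds a single repeated value.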

S30: Perform singular value decomposition on the Hankel matrix to obtain at least two singular values.

Singular value decomposition (SVD) is an important matrix factorization in linear algebra. It can effectively reduce the dimensionality of large volumes of data, which reduces the amount of computation and saves computation time. Specifically, when the server performs singular value decomposition on the Hankel matrix, it obtains two unitary matrices and one positive semidefinite diagonal matrix. The values on the diagonal of the positive semidefinite diagonal matrix are the singular values; there are generally N (N > 2) of them, arranged in descending order. The singular values characterize the important information implied in the matrix, and importance is positively correlated with the size of the singular value. That is, the larger a singular value, the more effective information of the audio digital signal it carries; conversely, the smaller a singular value, the less effective information it carries, and in this embodiment it is deemed to contain more noise. By performing singular value decomposition on the Hankel matrix, the server obtains at least two singular values from which the amount of effective information they contain can be observed directly, which facilitates noise reduction.

Specifically, the singular value decomposition can be expressed by the formula $H = U D V^{*}$, where $U$ and $V$ are two unitary matrices, $V^{*}$ is the conjugate transpose of $V$, and $D$ is a positive semidefinite diagonal matrix. A unitary matrix is a matrix whose n column vectors are pairwise orthogonal unit vectors; equivalently, the conjugate transpose of a unitary matrix equals its inverse. If A is an n-order square matrix over a number field and there exists another n-order matrix B over the same field such that AB = BA = E (E is the identity matrix, i.e. the n-order square matrix whose elements on the diagonal from the upper left corner to the lower right corner are 1 and whose other elements are 0), then B is called the inverse matrix of A. The conjugate transpose is obtained by transposing the matrix and then replacing each element with its complex conjugate. Two complex numbers are conjugate when their real parts are equal and their imaginary parts are opposite numbers; for example, the complex conjugate of z = a + bi (a, b ∈ R) is a − bi. A positive semidefinite diagonal matrix is a matrix that is both positive semidefinite and diagonal. A positive semidefinite matrix is an n-order square matrix A such that X'AX ≥ 0 for any non-zero vector X (X' denotes the transpose of X). A diagonal matrix is a matrix in which all elements off the main diagonal (the diagonal from the upper left corner to the lower right corner) are 0.
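As an illustration of the formula $H = U D V^{*}$, the following sketch decomposes the 4×3 Hankel matrix built from first column (1, 2, 3, 4) and last row (4, 4.5, 5.5). It uses NumPy's `numpy.linalg.svd`, which is an assumption about tooling; the source does not name a library for this step.

```python
import numpy as np

hankel = np.array([[1.0, 2.0, 3.0],
                   [2.0, 3.0, 4.0],
                   [3.0, 4.0, 4.5],
                   [4.0, 4.5, 5.5]])

# full_matrices=True returns a square unitary U (4x4) and V^* (3x3);
# the singular values come back sorted from largest to smallest.
u, singular_values, v_conj_t = np.linalg.svd(hankel, full_matrices=True)

# Place the singular values on the diagonal of a 4x3 matrix D.
k = len(singular_values)
d = np.zeros(hankel.shape)
d[:k, :k] = np.diag(singular_values)

# Multiplying the three factors back together recovers the Hankel matrix.
reconstructed = u @ d @ v_conj_t
```

Because the matrix is not square, D here is rectangular with the singular values in its upper-left block.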

In one embodiment, as shown in Fig. 3, step S30, in which singular value decomposition is performed on the Hankel matrix to obtain at least two singular values, specifically includes the following steps:

S31: Calculate the transposed matrix of the Hankel matrix.

The transposed matrix of the Hankel matrix is the matrix obtained by mirroring all elements of the Hankel matrix about the ray that starts at the element in row 1, column 1 and points 45 degrees toward the lower right. For example, for a Hankel matrix $A$, the element in row $i$ and column $j$ of the transposed matrix $A^T$ is the element in row $j$ and column $i$ of $A$. Obtaining the transposed matrix of the Hankel matrix provides the basis for subsequently obtaining the eigenvalues.

S32: Obtain at least two eigenvalues based on the product of the Hankel matrix and its transposed matrix.

Specifically, let $A$ be the Hankel matrix and $A^T$ its transposed matrix. The formulas $B = AA^T$ and $B' = A^TA$ give the matrices $B$ and $B'$ corresponding to the products of the Hankel matrix and its transposed matrix, and at least two eigenvalues are then obtained from $Bx = mx$. If $B$ is an n-order square matrix and there exist a real number $m$ and a non-zero n-dimensional vector $x$ such that $Bx = mx$ holds, then $m$ is called an eigenvalue of $B$. An eigenvalue reflects the scaling factor that the matrix applies along the corresponding direction, and this scaling is what allows the dimensionality of the data to be reduced.

Specifically, given a Hankel matrix $A$ and its transposed matrix $A^T$, obtaining at least two eigenvalues based on their product includes the following process:

(1) Use the formulas $B = AA^T$ and $B' = A^TA$ to calculate the matrices $B$ and $B'$ corresponding to the products of the Hankel matrix and its transposed matrix.

(2) Process matrix $B$ and matrix $B'$ using the formula for the determinant of a matrix to obtain at least two eigenvalues. The determinant of an n-order matrix with elements $a_{ij}$ is

$$D = \sum (-1)^{\tau(k_1 k_2 \cdots k_n)}\, a_{1k_1} a_{2k_2} \cdots a_{nk_n},$$

where $\sum$ denotes summation over all permutations $k_1 k_2 \cdots k_n$ of $1, 2, \ldots, n$, $\tau$ denotes the inversion number of the permutation $k_1 k_2 \cdots k_n$ (the number of pairs of positions $s < t$ with $k_s > k_t$), and $D$ is called the determinant of the matrix. The eigenvalues are the roots $\lambda$ of the characteristic equation $\det(B' - \lambda E) = 0$. Taking $B'$ as an example, evaluating this determinant yields the eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 1$.

(3) Substitute the at least two eigenvalues $\lambda_i$ into the formulas $B u_i = \lambda_i u_i$ and $B' v_i = \lambda_i v_i$ to obtain the eigenvector corresponding to each eigenvalue, where $u_i$ is the eigenvector corresponding to an eigenvalue of matrix $B$ and $v_i$ is the eigenvector corresponding to an eigenvalue of matrix $B'$. The server obtains the eigenvalues and eigenvectors based on the product of the Hankel matrix and its transposed matrix in order to reduce the dimensionality of the data.
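The three sub-steps can be sketched with NumPy as follows. The 3×2 matrix below is an illustrative assumption chosen because it is Hankel and because $A^TA$ then has the eigenvalues 3 and 1 quoted in the text; it is not claimed to be the matrix used in the source example. `numpy.linalg.eigh` stands in for the hand calculation with determinants.

```python
import numpy as np

# Illustrative 3x2 Hankel matrix (an assumption, see the note above).
a = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [1.0, 0.0]])

b = a @ a.T          # B  = A A^T  (3x3)
b_prime = a.T @ a    # B' = A^T A  (2x2)

# eigh handles symmetric matrices and returns eigenvalues in ascending
# order, with the matching eigenvectors as the columns of the second result.
eigvals_b, u_vectors = np.linalg.eigh(b)
eigvals_b_prime, v_vectors = np.linalg.eigh(b_prime)
```

For this matrix, `eigvals_b_prime` holds 1 and 3, and `eigvals_b` holds the same two values plus an extra 0, because $B$ is 3×3 while $A$ has rank 2.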

S33: Perform an operation on the at least two eigenvalues according to a preset calculation method to obtain at least two singular values.

The preset calculation method is a predetermined method for computing singular values from eigenvalues. It includes taking the square root of each eigenvalue using the formula $\sigma_i = \sqrt{\lambda_i}$, or computing the singular values from the at least two eigenvalues using the formula $A v_i = \sigma_i u_i$.

Specifically, the server takes the square root of the at least two eigenvalues using the formula $\sigma_i = \sqrt{\lambda_i}$ to obtain at least two singular values, where $\sigma_i$ is a singular value and $\lambda_i$ is an eigenvalue. Obtaining the singular values by taking square roots of the eigenvalues is computationally simple and improves efficiency.

Alternatively, the server computes the at least two singular values using the formula $A v_i = \sigma_i u_i$, where $u_i$ is the eigenvector corresponding to an eigenvalue of matrix $B$ and $v_i$ is the eigenvector corresponding to an eigenvalue of matrix $B'$.

Finally, based on the singular values $\sigma_i$, the eigenvectors $u_i$ and the eigenvectors $v_i$, the expression for the singular value decomposition of the Hankel matrix is obtained, namely $H = U D V^{*}$, where the columns of $U$ are the eigenvectors $u_i$, the columns of $V$ are the eigenvectors $v_i$, and $D$ is the diagonal matrix whose diagonal elements are the singular values $\sigma_i$.

In this embodiment, the transposed matrix of the Hankel matrix is calculated first, so that at least two eigenvalues can be obtained from the product of the Hankel matrix and its transposed matrix; the eigenvalues then describe how the product matrix scales the data, which is what allows the dimensionality of the data to be reduced. Finally, the square roots of the at least two eigenvalues are taken to obtain at least two singular values. This way of obtaining the singular values is computationally simple and easy to implement.
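The relation $\sigma_i = \sqrt{\lambda_i}$ can be checked numerically. The sketch below reuses an illustrative 3×2 Hankel matrix (an assumption, not the source's example) and compares the square roots of the eigenvalues of $A^TA$ with the singular values that NumPy computes directly.

```python
import numpy as np

a = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [1.0, 0.0]])

# Eigenvalues of A^T A, reordered from largest to smallest.
eigenvalues = np.linalg.eigvalsh(a.T @ a)[::-1]

# Clip tiny negative values caused by rounding before taking the square root.
singular_values = np.sqrt(np.clip(eigenvalues, 0.0, None))

# Reference values from a direct singular value decomposition.
reference = np.linalg.svd(a, compute_uv=False)
```

Forming $A^TA$ squares the condition number of the problem, so numerical libraries usually compute the SVD directly; the eigenvalue route is shown here only because it is the one the text describes.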

S40: Perform an inverse singular value decomposition operation on the at least two singular values to obtain a target voice signal.

The inverse singular value decomposition operation restores each singular value into a positive semidefinite diagonal matrix and multiplies that matrix by the two unitary matrices obtained from the preceding singular value decomposition, so as to obtain the target voice signal. The target voice signal is the denoised voice signal obtained by applying singular value decomposition to the audio digital signal. Specifically, the server performs the inverse operation on the at least two singular values to obtain the voice signal corresponding to each singular value (i.e. the target voice signal), thereby achieving speech enhancement.

In one embodiment, as shown in Fig. 4, step S40, in which the inverse singular value decomposition operation is performed on the at least two singular values to obtain the target voice signal, specifically includes the following steps:

S411: Perform the inverse singular value decomposition operation on each of the at least two singular values separately to obtain the original signal component corresponding to each singular value.

An original signal component is the signal component obtained by performing the inverse singular value decomposition operation on one of the at least two singular values. Specifically, each singular value is restored into a positive semidefinite diagonal matrix (with the position of the singular value in the matrix unchanged), and that matrix is multiplied by the two unitary matrices obtained from the preceding singular value decomposition, which yields the original signal component corresponding to that singular value.

S412: Perform a correlation calculation between each original signal component and the audio digital signal to obtain a correlation coefficient.

The correlation coefficient is the result of the correlation calculation between the audio digital signal and an original signal component. It reflects the degree of correlation between the two, and also the degree to which the signal component contains effective information.

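Specifically, the correlation is calculated with the formula

$$r = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}[x]\,\operatorname{Var}[y]}},$$

where $x$ is the original signal component, $y$ is the audio digital signal, $\operatorname{Cov}(x, y)$ is the covariance of $x$ and $y$, $\operatorname{Var}[x]$ is the variance of $x$, $\operatorname{Var}[y]$ is the variance of $y$, and $r$ is the correlation coefficient.

The calculation formula of $\operatorname{Cov}(x, y)$ is

$$\operatorname{Cov}(x, y) = \frac{1}{n}\sum_{j=1}^{n}\bigl(x_j - E(x)\bigr)\bigl(y_j - E(y)\bigr);$$

the calculation formula of $\operatorname{Var}[x]$ is $\operatorname{Var}[x] = E(x^2) - E^2(x)$; and the calculation formula of $\operatorname{Var}[y]$ is $\operatorname{Var}[y] = E(y^2) - E^2(y)$. Here $E(x)$ is the mean of the original signal component, $E(y)$ is the mean of the audio digital signal, $n$ is the number of samples in the original signal component, $y_j$ is the j-th sample of the audio digital signal on the time scale, and $x_j$ is the j-th sample of the original signal component on the same time scale.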
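The correlation formula can be written directly in plain Python. The sketch below assumes that both inputs are already one-dimensional sequences of equal length, so an original signal component held in matrix form would first have to be unfolded into a sequence; the function names are hypothetical.

```python
import math


def mean(values):
    """Arithmetic mean E(.) of a sequence."""
    return sum(values) / len(values)


def correlation_coefficient(x, y):
    """Return r = Cov(x, y) / sqrt(Var[x] * Var[y])."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mean_x, mean_y = mean(x), mean(y)
    covariance = mean([(xj - mean_x) * (yj - mean_y) for xj, yj in zip(x, y)])
    var_x = mean([xj * xj for xj in x]) - mean_x ** 2   # Var[x] = E(x^2) - E^2(x)
    var_y = mean([yj * yj for yj in y]) - mean_y ** 2   # Var[y] = E(y^2) - E^2(y)
    return covariance / math.sqrt(var_x * var_y)
```

A constant input has zero variance and makes the denominator zero, so a caller would need to handle that case separately.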

S413: Select the original signal components whose correlation coefficient is greater than a preset threshold as target signal components.

The preset threshold is a threshold defined in advance for screening the original signal components. A target signal component is an original signal component that remains after the original signal components have been screened with the preset threshold.

Because the magnitude of the correlation coefficient does not exceed 1, the preset threshold is chosen as a real number between 0 and 1. If the correlation coefficient is greater than the preset threshold, the original signal component is strongly correlated with the audio digital signal and contains much of its effective information. If the correlation coefficient is not greater than the preset threshold, the original signal component is weakly correlated with the audio digital signal, contains little effective information, and can be treated as noise by default. In this embodiment, the original signal components are screened to keep those that are strongly correlated with the audio digital signal as target signal components, which reduces noise interference and achieves speech enhancement. This screening method is also simple to implement and improves the efficiency of speech enhancement.

S414: Perform linear superposition on the target signal components to obtain the target voice signal.

Specifically, the server uses the formula $W = x_1 + x_2 + \cdots + x_n$ to linearly superpose the $n$ target signal components obtained, where $W$ is the target voice signal and each $x_i$ is a target signal component.
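Steps S413 and S414 together amount to a filter followed by an element-wise sum. The sketch below assumes the components are equal-length one-dimensional sequences and that their correlation coefficients have already been computed; the function name `select_and_superpose` is hypothetical.

```python
def select_and_superpose(components, coefficients, threshold):
    """Sum the components whose correlation coefficient exceeds the threshold."""
    selected = [c for c, r in zip(components, coefficients) if r > threshold]
    if not selected:
        raise ValueError("no component exceeds the threshold")
    # Element-wise sum W = x_1 + x_2 + ... + x_n over the selected components.
    return [sum(values) for values in zip(*selected)]
```

For example, with components `[1, 2]`, `[10, 20]` and `[100, 200]`, coefficients 0.9, 0.2 and 0.8, and a threshold of 0.5, the second component is discarded and the remaining two are added sample by sample.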

In this embodiment, the server first performs the inverse singular value decomposition operation on each singular value separately to obtain the corresponding original signal component, and then performs a correlation calculation between each original signal component and the audio digital signal to obtain a correlation coefficient, which reflects both the degree of correlation between the audio digital signal and the original signal component and the degree to which the component contains effective information. The server then screens the original signal components and keeps those that are strongly correlated with the audio digital signal as target signal components, which reduces noise interference at a finer granularity and achieves speech enhancement. Finally, linear superposition is performed on the target signal components to obtain the target voice signal. This process is computationally simple and easy to implement, and improves the processing efficiency of speech enhancement.

In one embodiment, as shown in Fig. 5, step S411, in which the inverse singular value decomposition operation is performed on each of the at least two singular values separately to obtain the original signal component corresponding to each singular value, specifically includes the following steps:

S4111: Obtain a singular value matrix based on each singular value.

A singular value matrix is the matrix obtained by restoring a single singular value into the positive semidefinite diagonal matrix. Specifically, the server restores each singular value to its position in the positive semidefinite diagonal matrix to obtain the corresponding singular value matrix. In this embodiment, this can be expressed by the following formula:

$$D_n = \operatorname{diag}(0, \ldots, 0, \sigma_n, 0, \ldots, 0),$$

where $\sigma_n$ occupies the n-th diagonal position and $D_n$ denotes the singular value matrix corresponding to the n-th singular value.

S4112: Obtain the original signal component corresponding to each singular value based on its singular value matrix.

Specifically, each singular value matrix is processed according to the following formula to obtain the original signal component corresponding to each singular value:

$H = U D V^{*}$, where $U$ and $V^{*}$ are the two unitary matrices, $D$ is the singular value matrix corresponding to each singular value, i.e. $D_1, D_2, \ldots, D_n$, and $H$ is the original signal component corresponding to each singular value. $U_{ik}$ is the matrix corresponding to the i-th eigenvector calculated from $B u_i = \lambda_i u_i$, and $V_{ik}$ is the matrix corresponding to the i-th eigenvector calculated from $B' v_i = \lambda_i v_i$.

In this embodiment, each singular value is first restored into the positive semidefinite diagonal matrix to obtain its singular value matrix, and each singular value matrix is then multiplied by the two unitary matrices obtained from the singular value decomposition to obtain the original signal component corresponding to each singular value. This provides the basis for subsequently screening the original signal components to obtain the target signal components.
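Steps S4111 and S4112 can be sketched with NumPy as follows. The function name `signal_components` is hypothetical, and the sketch uses the reduced ("thin") decomposition, which is an implementation choice made here rather than something the source specifies.

```python
import numpy as np


def signal_components(hankel):
    """Return one matrix per singular value: H_k = U D_k V^*."""
    u, singular_values, v_conj_t = np.linalg.svd(hankel, full_matrices=False)
    size = len(singular_values)
    components = []
    for k, sigma in enumerate(singular_values):
        d_k = np.zeros((size, size))
        d_k[k, k] = sigma            # only the k-th singular value, in place
        components.append(u @ d_k @ v_conj_t)
    return components
```

Each returned matrix has rank one, and adding all of them back together recovers the original Hankel matrix, which is why discarding some of them acts as a filter.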

In one embodiment, as shown in Fig. 6, step S40, in which the inverse singular value decomposition operation is performed on the at least two singular values to obtain the target voice signal, specifically includes the following steps:

S421: Calculate the sum of the at least two singular values, and multiply the sum by a preset threshold to obtain the corresponding evaluation threshold, where the preset threshold is a positive number not greater than 1.

The preset threshold is a threshold defined in advance for calculating the evaluation threshold, and is a positive number not greater than 1. The evaluation threshold is the threshold used for screening the singular values. Specifically, the sum of all singular values is calculated and then multiplied by the preset threshold to obtain the evaluation threshold. That is, the evaluation threshold is calculated as

$$P = T \sum_{i} \sigma_i,$$

where $T$ is the preset threshold, $P$ is the evaluation threshold, and $\sigma_i$ are the singular values.

S422: Linearly superpose the at least two singular values in descending order to obtain a superposed sum, and if the superposed sum is greater than the evaluation threshold, obtain the N singular values that make up the superposed sum, where N is a positive integer.

Specifically, the singular values are arranged in descending order and are added one by one in that order. Once the sum of the N singular values added so far is greater than the evaluation threshold, the addition stops and those N singular values are obtained, where N is a positive integer. The larger a singular value, the more effective information of the audio digital signal it carries; conversely, the smaller a singular value, the less effective information it carries, and it is considered to contain mainly noise. The server therefore adds the singular values in descending order until the superposed sum of the N singular values is greater than the evaluation threshold, and removes the remaining M singular values to reduce noise interference. This screening process does not require performing the inverse decomposition operation on each singular value and then carrying out a correlation analysis; the required singular values are filtered out directly according to the evaluation threshold, which is simple to operate and improves efficiency.
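The screening in steps S421 and S422 can be sketched in plain Python. The function name `count_retained` is hypothetical; the final `return` covers the edge case, not discussed in the source, in which the running sum never strictly exceeds the evaluation threshold.

```python
def count_retained(singular_values, preset_threshold):
    """Return N, the number of largest singular values whose sum first exceeds P."""
    if not 0 < preset_threshold <= 1:
        raise ValueError("preset threshold must be positive and not greater than 1")
    ordered = sorted(singular_values, reverse=True)
    evaluation_threshold = preset_threshold * sum(ordered)   # P = T * sum(sigma_i)
    running_sum = 0.0
    for count, sigma in enumerate(ordered, start=1):
        running_sum += sigma
        if running_sum > evaluation_threshold:
            return count
    # Reached only when the sum never strictly exceeds P (for example T = 1).
    return len(ordered)
```

With singular values 5, 3, 1, 1 and a preset threshold of 0.75, the evaluation threshold is 7.5; the first value alone (5) does not exceed it, the first two together (8) do, so two singular values are kept.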

S423: Perform batch reconstruction on the N singular values to obtain the target voice signal.

Batch reconstruction is a method of restoring the N singular values together in one pass to obtain the target voice signal.

Specifically, batch reconstruction of the N singular values is implemented as follows. The N selected singular values are kept in the original positive semidefinite diagonal matrix $D$ obtained from the singular value decomposition, with their sizes and positions unchanged, while the discarded singular values (those representing noise) are set to 0 with their positions unchanged, which gives the target positive semidefinite diagonal matrix $M$ containing the N selected singular values. The target positive semidefinite diagonal matrix $M$ is then substituted into the above singular value decomposition formula with $U$ and $V$ unchanged, giving a new Hankel matrix $H' = U M V^{*}$. Unfolding the new Hankel matrix $H'$ according to the property of a Hankel matrix (the elements on each anti-diagonal are all equal) yields the denoised voice signal, i.e. the target voice signal in this embodiment.
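Batch reconstruction amounts to zeroing the discarded singular values and multiplying the three factors back together. The NumPy sketch below assumes the number of singular values to keep has already been chosen; the function name `batch_reconstruct` is hypothetical.

```python
import numpy as np


def batch_reconstruct(hankel, keep):
    """Rebuild H' = U M V^* keeping only the `keep` largest singular values."""
    u, singular_values, v_conj_t = np.linalg.svd(hankel, full_matrices=False)
    retained = singular_values.copy()
    retained[keep:] = 0.0            # discarded values become 0, positions unchanged
    return u @ np.diag(retained) @ v_conj_t
```

Keeping every singular value returns the input unchanged, while keeping fewer gives a lower-rank approximation. The result is in general only approximately Hankel, which matters when it is unfolded back into a one-dimensional signal.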

In summary, in this embodiment the inverse singular value decomposition consists of either inversely decomposing each singular value individually or batch-reconstructing the singular values, in order to obtain the target speech signal.

In this embodiment, the sum of the at least two singular values is calculated and multiplied by the preset threshold to obtain the evaluation threshold. The at least two singular values are then accumulated in descending order until the sum of the N accumulated singular values exceeds the evaluation threshold, at which point accumulation stops and the N singular values are obtained; the remaining M singular values are removed to reduce noise interference and achieve speech enhancement. Finally, batch reconstruction is performed on the N singular values to obtain the target speech signal: the selected N singular values are restored directly in the original positive semidefinite diagonal matrix D obtained from the singular value decomposition, and the result is multiplied with the two unitary matrices obtained from the decomposition. Obtaining the target speech signal by batch reconstruction improves the efficiency of obtaining it and, in turn, the processing efficiency of speech enhancement.

S50: Perform restoration processing on the target speech signal to obtain the target speech information.

Here, the target speech information is the speech information obtained by restoring the target speech signal to the required audio format. Further, the server may restore the target speech signal, which is in matrix form, as follows: first, the Hankel matrix is expanded along its anti-diagonal elements to obtain the denoised one-dimensional digital signal; then, by attaching the sampling frequency parameter to the one-dimensional digital signal, the target speech information is obtained. The sampling frequency, also called the sampling speed or sampling rate, is the number of samples per second taken from a continuous signal to form a discrete signal, and is expressed in hertz (Hz).

In this embodiment, the original speech information is read directly with the audio-file reading function of a Python module, which yields the sampling frequency parameter. Specifically, Python modules provide functions for generating audio files in different formats; calling such a function and passing it the sampling frequency parameter and the one-dimensional digital signal generates the target speech information in the required format. For example, the wave module, which generates wav files in Python, can be called to process the obtained sampling frequency parameter and the one-dimensional digital signal and generate an audio file in wav format (i.e., the target speech information).
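A minimal sketch of this restoration step using Python's standard `wave` module is given below. The helper names `read_sample_rate` and `write_wav` are hypothetical, and the example assumes a mono signal scaled to the range [-1, 1] and stored as 16-bit PCM; the source does not specify the channel count or sample width.

```python
import wave
import numpy as np

def read_sample_rate(path):
    """Read the sampling frequency (in Hz) from an existing wav file."""
    with wave.open(path, "rb") as f:
        return f.getframerate()

def write_wav(path, signal, sample_rate):
    """Write a one-dimensional signal as a mono, 16-bit PCM wav file (assumed format)."""
    x = np.asarray(signal, dtype=float)
    # Clip to [-1, 1] and scale to little-endian 16-bit integers.
    pcm = (np.clip(x, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as f:
        f.setnchannels(1)             # mono (assumption)
        f.setsampwidth(2)             # 2 bytes per sample = 16-bit PCM (assumption)
        f.setframerate(sample_rate)   # sampling frequency parameter
        f.writeframes(pcm.tobytes())
```

In use, the sampling frequency read from the original recording would be passed unchanged to `write_wav` together with the denoised one-dimensional signal.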

In this embodiment, the original speech information is first converted to obtain a digital speech signal, and the digital speech signal is built into a Hankel matrix so that singular value decomposition can be performed on the Hankel matrix to obtain at least two singular values. Singular values characterize the important information implicit in a matrix, and the importance is positively correlated with the magnitude of the singular value, so the amount of effective information contained in each singular value can be observed directly from the singular values obtained. The server then performs the inverse singular value decomposition operation on the at least two singular values to obtain the speech signal corresponding to each singular value, i.e., the target speech signal, thereby suppressing noise interference and achieving speech enhancement. Finally, the target speech signal is restored to obtain an audio file in the required format, i.e., the target speech information. This restoration can be carried out by directly calling functions in a Python module, which is simple to do.

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.

In one embodiment, Fig. 7 shows a schematic diagram of a speech enhancement apparatus corresponding one-to-one to the speech enhancement method of the above embodiments. As shown in Fig. 7, the speech enhancement apparatus includes a digital speech signal acquisition module 10, a Hankel matrix acquisition module 20, a singular value acquisition module 30, a target speech signal acquisition module 40 and a target speech information acquisition module 50. The functional modules are described in detail as follows:

The digital speech signal acquisition module 10 is configured to convert original speech information to obtain a digital speech signal.

The Hankel matrix acquisition module 20 is configured to obtain a Hankel matrix based on the digital speech signal.

The singular value acquisition module 30 is configured to perform singular value decomposition on the Hankel matrix to obtain at least two singular values.

The target speech signal acquisition module 40 is configured to perform the inverse singular value decomposition operation on the at least two singular values to obtain a target speech signal.

The target speech information acquisition module 50 is configured to perform restoration processing on the target speech signal to obtain target speech information.

Specifically, the singular value acquisition module 30 includes a transposed matrix calculation unit 31, an eigenvalue acquisition unit 32 and a singular value acquisition unit 33.

The transposed matrix calculation unit 31 is configured to calculate the transposed matrix of the Hankel matrix.

The eigenvalue acquisition unit 32 is configured to obtain at least two eigenvalues based on the product of the Hankel matrix and the transposed matrix.

The singular value acquisition unit 33 is configured to operate on the at least two eigenvalues according to a preset calculation method to obtain at least two singular values.
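The three units above can be sketched as follows. The preset calculation method is not restated in this passage, so the example assumes it is the square root, which follows from the standard relation that the singular values of a real matrix H are the square roots of the eigenvalues of H multiplied by its transpose. The function name `singular_values_from_eigenvalues` is hypothetical.

```python
import numpy as np

def singular_values_from_eigenvalues(H):
    """Compute singular values of H from the eigenvalues of H times its transpose."""
    H = np.asarray(H, dtype=float)
    gram = H @ H.T                           # product of the matrix and its transpose
    eigenvalues = np.linalg.eigvalsh(gram)   # symmetric input; eigenvalues in ascending order
    # Rounding can produce tiny negative eigenvalues; clamp them before the square root.
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    # Assumed preset calculation method: square root, returned in descending order.
    return np.sqrt(eigenvalues[::-1])
```

The result should agree with the singular values returned by a direct decomposition such as `np.linalg.svd(H, compute_uv=False)` whenever H has no more rows than columns.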

Specifically, the target speech signal acquisition module 40 includes an original signal component acquisition unit 411, a correlation coefficient acquisition unit 412, a target signal component acquisition unit 413 and a target speech signal acquisition unit 414.

The original signal component acquisition unit 411 is configured to perform the inverse singular value decomposition operation on each of the at least two singular values to obtain the original signal component corresponding to each singular value.

The correlation coefficient acquisition unit 412 is configured to perform a correlation calculation between the original signal component and the digital speech signal to obtain a correlation coefficient.

The target signal component acquisition unit 413 is configured to select the original signal components whose correlation coefficient is greater than a preset threshold as target signal components.

The target speech signal acquisition unit 414 is configured to perform linear superposition on the target signal components to obtain the target speech signal.
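The correlation-based variant handled by units 411 to 414 can be sketched as below. The sketch makes several assumptions that are not stated in this passage: the Hankel matrix holds sample i + j at element (i, j), each original signal component is obtained from one singular value together with its left and right singular vectors and is expanded by averaging anti-diagonals, and the correlation coefficient is the Pearson coefficient. All function names and the default `preset_threshold=0.1` are hypothetical.

```python
import numpy as np

def antidiagonal_average(M):
    """Expand a matrix to a 1-D signal by averaging each anti-diagonal."""
    rows, cols = M.shape
    sums = np.zeros(rows + cols - 1)
    counts = np.zeros(rows + cols - 1)
    for i in range(rows):
        sums[i:i + cols] += M[i]
        counts[i:i + cols] += 1
    return sums / counts

def correlation_coefficient(x, y):
    """Pearson correlation r = Cov(x, y) / sqrt(Var[x] * Var[y])."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    denom = np.sqrt(x.var() * y.var())
    return cov / denom if denom > 0 else 0.0  # guard against constant signals

def enhance_by_correlation(signal, rows, preset_threshold=0.1):
    """Sum the per-singular-value components that correlate with the input signal."""
    x = np.asarray(signal, dtype=float)
    cols = len(x) - rows + 1
    H = np.array([x[i:i + cols] for i in range(rows)])
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    target = np.zeros_like(x)
    for i in range(len(s)):
        # Original signal component for the i-th singular value.
        component = antidiagonal_average(s[i] * np.outer(U[:, i], Vh[i]))
        if correlation_coefficient(component, x) > preset_threshold:
            target += component               # linear superposition of target components
    return target
```

Because every singular value is inverted separately before the correlation test, this variant costs more than the threshold-based screening, which is the trade-off the description points out.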

Specifically, the original signal component acquisition unit 411 includes a singular value matrix acquisition subunit 4111 and an original signal component acquisition subunit 4112.

The singular value matrix acquisition subunit 4111 is configured to obtain a singular value matrix based on the singular value.

The original signal component acquisition subunit 4112 is configured to obtain the original signal component corresponding to each singular value based on the eigenvalues and the singular value matrix.

Specifically, the target speech signal acquisition module 40 includes an evaluation threshold acquisition unit 421, an N-singular-value acquisition unit 422 and a target speech signal acquisition unit 423.

The evaluation threshold acquisition unit 421 is configured to calculate the sum of the at least two singular values and multiply the sum by a preset threshold to obtain the corresponding evaluation threshold, where the preset threshold is a positive number not greater than 1.

The N-singular-value acquisition unit 422 is configured to accumulate the at least two singular values in descending order to obtain a cumulative sum and, if the cumulative sum is greater than the evaluation threshold, to obtain the N singular values corresponding to the cumulative sum, where N is a positive integer.

The target speech signal acquisition unit 423 is configured to perform batch reconstruction on the N singular values to obtain the target speech signal.

For specific limitations of the speech enhancement apparatus, refer to the limitations of the speech enhancement method above, which are not repeated here. Each module of the speech enhancement apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or be independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 8. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and the database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data generated or obtained while the speech enhancement method is executed, such as the target speech information. The network interface of the computer device communicates with external terminals through a network connection. When executed by the processor, the computer program implements a speech enhancement method.

In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the following steps: converting original speech information to obtain a digital speech signal; obtaining a Hankel matrix based on the digital speech signal; performing singular value decomposition on the Hankel matrix to obtain at least two singular values; performing the inverse singular value decomposition operation on the at least two singular values to obtain a target speech signal; and performing restoration processing on the target speech signal to obtain target speech information.

In one embodiment, the processor further implements the following steps when executing the computer program: calculating the transposed matrix of the Hankel matrix; obtaining at least two eigenvalues based on the product of the Hankel matrix and the transposed matrix; and operating on the at least two eigenvalues according to a preset calculation method to obtain at least two singular values.

In one embodiment, the processor further implements the following steps when executing the computer program: performing the inverse singular value decomposition operation on each of the at least two singular values to obtain the original signal component corresponding to each singular value; performing a correlation calculation between the original signal component and the digital speech signal to obtain a correlation coefficient; selecting the original signal components whose correlation coefficient is greater than a preset threshold as target signal components; and performing linear superposition on the target signal components to obtain the target speech signal.

In one embodiment, the processor further implements the following steps when executing the computer program: obtaining a singular value matrix based on the singular value; and obtaining the original signal component corresponding to each singular value based on the eigenvalues and the singular value matrix.

In one embodiment, the processor further implements the following steps when executing the computer program: calculating the sum of the at least two singular values and multiplying the sum by a preset threshold to obtain the corresponding evaluation threshold, where the preset threshold is a positive number not greater than 1; accumulating the at least two singular values in descending order to obtain a cumulative sum and, if the cumulative sum is greater than the evaluation threshold, obtaining the N singular values corresponding to the cumulative sum, where N is a positive integer; and performing batch reconstruction on the N singular values to obtain the target speech signal.

In one embodiment, a non-volatile storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps: converting original speech information to obtain a digital speech signal; obtaining a Hankel matrix based on the digital speech signal; performing singular value decomposition on the Hankel matrix to obtain at least two singular values; performing the inverse singular value decomposition operation on the at least two singular values to obtain a target speech signal; and performing restoration processing on the target speech signal to obtain target speech information.

In one embodiment, the computer program further implements the following steps when executed by the processor: calculating the transposed matrix of the Hankel matrix; obtaining at least two eigenvalues based on the product of the Hankel matrix and the transposed matrix; and operating on the at least two eigenvalues according to a preset calculation method to obtain at least two singular values.

In one embodiment, the computer program further implements the following steps when executed by the processor: performing the inverse singular value decomposition operation on each of the at least two singular values to obtain the original signal component corresponding to each singular value; performing a correlation calculation between the original signal component and the digital speech signal to obtain a correlation coefficient; selecting the original signal components whose correlation coefficient is greater than a preset threshold as target signal components; and performing linear superposition on the target signal components to obtain the target speech signal.

In one embodiment, the computer program further implements the following steps when executed by the processor: obtaining a singular value matrix based on the singular value; and obtaining the original signal component corresponding to each singular value based on the eigenvalues and the singular value matrix.

In one embodiment, the computer program further implements the following steps when executed by the processor: calculating the sum of the at least two singular values and multiplying the sum by a preset threshold to obtain the corresponding evaluation threshold, where the preset threshold is a positive number not greater than 1; accumulating the at least two singular values in descending order to obtain a cumulative sum and, if the cumulative sum is greater than the evaluation threshold, obtaining the N singular values corresponding to the cumulative sum, where N is a positive integer; and performing batch reconstruction on the N singular values to obtain the target speech signal.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).

It will be clear to those skilled in the art that, for convenience and brevity of description, only the division into the functional units and modules described above is used as an example. In practical applications, the above functions may be assigned to different functional units or modules as required; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.

The embodiments described above are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims

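1. A speech enhancement method, characterized by comprising:

converting original speech information to obtain a digital speech signal;

obtaining a Hankel matrix based on the digital speech signal;

performing singular value decomposition on the Hankel matrix to obtain at least two singular values;

performing an inverse singular value decomposition operation on the at least two singular values to obtain a target speech signal;

performing restoration processing on the target speech signal to obtain target speech information.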

2. The speech enhancement method according to claim 1, characterized in that performing singular value decomposition on the Hankel matrix to obtain at least two singular values comprises:

calculating the transposed matrix of the Hankel matrix;

obtaining at least two eigenvalues based on the product of the Hankel matrix and the transposed matrix;

operating on the at least two eigenvalues according to a preset calculation method to obtain at least two singular values.

3. The speech enhancement method according to claim 2, characterized in that performing the inverse singular value decomposition operation on the at least two singular values to obtain the target speech signal comprises:

performing the inverse singular value decomposition operation on each of the at least two singular values to obtain the original signal component corresponding to each singular value;

performing a correlation calculation between the original signal component and the digital speech signal to obtain a correlation coefficient;

selecting the original signal components whose correlation coefficient is greater than a preset threshold as target signal components;

performing linear superposition on the target signal components to obtain the target speech signal.

4. The speech enhancement method according to claim 3, characterized in that performing the inverse singular value decomposition operation on each of the at least two singular values to obtain the original signal component corresponding to each singular value comprises:

obtaining a singular value matrix based on the singular value;

obtaining the original signal component corresponding to each singular value based on the eigenvalues and the singular value matrix.

5. The speech enhancement method according to claim 3, characterized in that the correlation calculation formula is $r = \dfrac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}}$, where x is the original signal component, y is the digital speech signal, Cov(x, y) is the covariance of x and y, Var[x] is the variance of x, Var[y] is the variance of y, and r is the correlation coefficient.

6. The speech enhancement method according to claim 1, characterized in that performing the inverse singular value decomposition operation on the at least two singular values to obtain the target speech signal comprises:

calculating the sum of the at least two singular values and multiplying the sum by a preset threshold to obtain the corresponding evaluation threshold, where the preset threshold is a positive number not greater than 1;

accumulating the at least two singular values in descending order to obtain a cumulative sum and, if the cumulative sum is greater than the evaluation threshold, obtaining the N singular values corresponding to the cumulative sum, where N is a positive integer;

performing batch reconstruction on the N singular values to obtain the target speech signal.

7. A speech enhancement apparatus, characterized by comprising:

a digital speech signal acquisition module, configured to convert original speech information to obtain a digital speech signal;

a Hankel matrix acquisition module, configured to obtain a Hankel matrix based on the digital speech signal;

a singular value acquisition module, configured to perform singular value decomposition on the Hankel matrix to obtain at least two singular values;

a target speech signal acquisition module, configured to perform an inverse singular value decomposition operation on the at least two singular values to obtain a target speech signal;

a target speech information acquisition module, configured to perform restoration processing on the target speech signal to obtain target speech information.

8. The speech enhancement apparatus according to claim 7, characterized in that the singular value acquisition module comprises:

a transposed matrix calculation unit, configured to calculate the transposed matrix of the Hankel matrix;

an eigenvalue acquisition unit, configured to obtain at least two eigenvalues based on the product of the Hankel matrix and the transposed matrix;

a singular value acquisition unit, configured to operate on the at least two eigenvalues according to a preset calculation method to obtain at least two singular values.

9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the speech enhancement method according to any one of claims 1 to 6.

10. A non-volatile storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the speech enhancement method according to any one of claims 1 to 6.