CN110164461A - Audio signal processing method, device, electronic equipment and storage medium - Google Patents
- Publication number
- CN110164461A (application CN201910611481.2A)
- Authority
- CN
- China
- Prior art keywords
- filter
- speech signal
- primary speech
- pole
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/07—User-to-user messaging in packet-switching networks characterised by the inclusion of specific contents
- H04L51/10—Multimedia information
- H04L51/18—Commands or executable codes
Abstract
Embodiments of the present application provide an audio signal processing method, apparatus, electronic device, and storage medium. The method comprises: obtaining an original speech signal; performing linear prediction analysis on the original speech signal to determine the original excitation and a first filter corresponding to the original speech signal; adjusting at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter; and determining a target speech signal based on the original excitation and the adjusted first filter. The embodiments thereby adjust at least one of the formant frequency and the formant sharpness of the original speech signal to obtain the target speech signal, so that the voice input by the user can be changed, improving the user experience.
Description
Technical field
This application relates to the field of signal processing technology, and in particular to an audio signal processing method, apparatus, electronic device, and storage medium.
Background
With the development of mobile communication, various application programs have emerged, including applications with communication functions. Through these applications, users can interact with each other by voice: information entered by one user in speech form is sent to a peer user, thereby achieving information exchange.
When users exchange information by voice, the voice information entered by a user may be passed through voice change processing before being sent to the peer, so that the voice information received at the peer differs from the voice information entered by the user; this adds interest to the interaction.
However, how to change the voice entered by the user remains a key problem.
Summary of the invention
This application provides an audio signal processing method, apparatus, electronic device, and storage medium, which can solve at least one of the above technical problems. The technical solution is as follows:
In a first aspect, an audio signal processing method is provided, the method comprising:
obtaining an original speech signal;
performing linear prediction analysis on the original speech signal to determine the original excitation and a first filter corresponding to the original speech signal;
adjusting at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter;
determining a target speech signal based on the original excitation and the adjusted first filter.
In a possible implementation, performing linear prediction analysis on the original speech signal to determine the corresponding original excitation and first filter comprises:
performing linear prediction analysis on the original speech signal to determine prediction error information of the original speech signal;
determining the original excitation and the first filter based on the prediction error information.
In another possible implementation, determining the original excitation and the first filter based on the prediction error information comprises:
determining a second filter based on the prediction error information, the second filter being the filter corresponding to the linear prediction analysis;
determining the original excitation based on the original speech signal and the second filter, and determining the first filter based on the second filter.
In another possible implementation, adjusting at least one of the pole angle information and the pole amplitude information of the first filter comprises:
if the pole angle information of the first filter satisfies a preset condition, adjusting at least one of the pole angle information and the pole amplitude information of the first filter in a predetermined manner.
In another possible implementation, the pole angle information of the first filter comprises a pole angle value for each of at least one pole;
adjusting the pole angle information in the predetermined manner includes at least one of:
increasing the pole angle value of at least one pole by a first preset threshold;
decreasing the pole angle value of at least one pole by a second preset threshold.
In another possible implementation, the pole amplitude information of the first filter comprises a pole amplitude value for each of at least one pole;
adjusting the pole amplitude information in the predetermined manner comprises:
scaling the pole amplitude value of at least one pole by a preset multiple.
In another possible implementation, before obtaining the original speech signal, the method further comprises:
obtaining a speech signal input by a user;
denoising the speech signal input by the user, and using the denoised speech signal as the original speech signal.
In a second aspect, a speech signal processing apparatus is provided, the apparatus comprising:
a first obtaining module, configured to obtain an original speech signal;
a first determining module, configured to perform linear prediction analysis on the original speech signal to determine the corresponding original excitation and first filter;
an adjusting module, configured to adjust at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter;
a second determining module, configured to determine a target speech signal based on the original excitation and the adjusted first filter.
In a possible implementation, the first determining module comprises a first determining unit and a second determining unit, wherein:
the first determining unit is configured to perform linear prediction analysis on the original speech signal to determine its prediction error information;
the second determining unit is configured to determine the original excitation and the first filter based on the prediction error information.
In another possible implementation, the second determining unit is specifically configured to determine a second filter based on the prediction error information, the second filter being the filter corresponding to the linear prediction analysis; and to determine the original excitation based on the original speech signal and the second filter, and determine the first filter based on the second filter.
In another possible implementation, the adjusting module is specifically configured to, when the pole angle information of the first filter satisfies a preset condition, adjust at least one of the pole angle information and the pole amplitude information of the first filter in a predetermined manner.
In another possible implementation, the pole angle information of the first filter comprises a pole angle value for each of at least one pole; the adjusting module comprises at least one of an increasing unit and a decreasing unit, wherein:
the increasing unit is configured to increase the pole angle value of at least one pole by a first preset threshold;
the decreasing unit is configured to decrease the pole angle value of at least one pole by a second preset threshold.
In another possible implementation, the pole amplitude information of the first filter comprises a pole amplitude value for each of at least one pole; the adjusting module is specifically configured to scale the pole amplitude value of at least one pole by a preset multiple.
In another possible implementation, the speech signal processing apparatus further comprises a second obtaining module and a denoising module, wherein:
the second obtaining module is configured to obtain a speech signal input by a user;
the denoising module is configured to denoise the speech signal input by the user and use the denoised speech signal as the original speech signal.
In a third aspect, an electronic device is provided, the electronic device comprising:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the operations corresponding to the audio signal processing method of the first aspect or any possible implementation thereof.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the audio signal processing method of the first aspect or any possible implementation thereof.
The technical solutions provided by the embodiments of the present application have the following beneficial effects:
compared with the prior art, the present application performs linear prediction analysis on the original speech signal to determine its original excitation and first filter, adjusts at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter, and then determines the target speech signal using the original excitation and the adjusted first filter. In this way, at least one of the formant frequency and the formant sharpness of the original speech signal is adjusted to obtain the target speech signal, so that the voice input by the user can be changed and the user experience improved.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below.
Fig. 1 is a flow diagram of an audio signal processing method provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of a voice-change display interface provided by an embodiment of the present application;
Fig. 3 is a flow diagram of changing an original speech signal into a "cold" voice, provided by an embodiment of the present application;
Fig. 4 is a schematic spectrogram of an original speech signal provided by an embodiment of the present application;
Fig. 5 is a schematic spectrogram after the voice has been changed into a "cold" voice, provided by an embodiment of the present application;
Fig. 6 is a structural diagram of a speech signal processing apparatus provided by an embodiment of the present application;
Fig. 7 is a structural diagram of an electronic device for speech signal processing provided by an embodiment of the present application.
Detailed description
Embodiments of the present application are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present application, and are not to be construed as limiting it.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include plural forms. It should be further understood that the wording "comprising" used in this specification indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. When an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connection" or "coupling" as used herein may include wireless connection or wireless coupling. The wording "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
First, several terms involved in this application are introduced and explained:
QR decomposition: factoring a matrix into an orthonormal matrix Q and an upper triangular matrix R. The QR method is the most effective and widely used method for finding all eigenvalues of a general matrix: the matrix is first reduced by orthogonal similarity transformations to Hessenberg form, and the QR iteration is then applied to find its eigenvalues and eigenvectors;
Monic polynomial: a polynomial whose leading coefficient is 1.
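To connect these two terms to the method below: the poles of the first filter are the roots of its monic denominator polynomial, and those roots can be obtained as the eigenvalues of the polynomial's companion matrix using the QR iteration just described. The following sketch illustrates this for a 2x2 case; the example polynomial x^2 - 3x + 2 (roots 1 and 2), the pure-Python Gram-Schmidt factorization, and the fixed iteration count are illustrative assumptions, not details from this application.

```python
import math

# Illustrative sketch: roots of a monic polynomial via the QR eigenvalue
# iteration, applied to the polynomial's companion matrix (2x2 case).

def qr_step_2x2(A):
    # One QR-algorithm step: factor A = Q*R by Gram-Schmidt, return R*Q.
    a, c = A[0][0], A[1][0]
    n1 = math.hypot(a, c)
    q1 = (a / n1, c / n1)                      # normalized first column
    b, d = A[0][1], A[1][1]
    proj = q1[0] * b + q1[1] * d               # component of col 2 along q1
    u0, u1 = b - proj * q1[0], d - proj * q1[1]
    n2 = math.hypot(u0, u1)
    q2 = (u0 / n2, u1 / n2)                    # orthonormal second column
    R = [[n1, proj], [0.0, n2]]
    Q = [[q1[0], q2[0]], [q1[1], q2[1]]]
    # R*Q (row 2 uses R[1][0] == 0)
    return [[R[0][0] * Q[0][0] + R[0][1] * Q[1][0],
             R[0][0] * Q[0][1] + R[0][1] * Q[1][1]],
            [R[1][1] * Q[1][0],
             R[1][1] * Q[1][1]]]

# Companion matrix of the monic polynomial x^2 - 3x + 2 (roots 1 and 2).
A = [[0.0, -2.0], [1.0, 3.0]]
for _ in range(60):
    A = qr_step_2x2(A)
eigs = sorted([A[0][0], A[1][1]])              # approaches [1.0, 2.0]
```

In practice a library eigenvalue routine would be used; the sketch only mirrors the Hessenberg/QR procedure named above (a 2x2 matrix is already in Hessenberg form).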
The technical solution of the present application, and how it solves the above technical problems, are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application are described below with reference to the drawings.
An embodiment of the present application provides an audio signal processing method; as shown in Fig. 1, the method comprises:
Step S101: obtain an original speech signal.
In this embodiment, the original speech signal within a preset time period is obtained; for example, the preset time period may be 25, 20, or 15 milliseconds.
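As a minimal sketch of this step, the incoming samples can be segmented into frames of the preset duration before analysis. The 16 kHz sampling rate and the non-overlapping frames are assumptions for illustration; the application only specifies the window duration (e.g. 20 ms).

```python
# Sketch: segment the input into 20 ms analysis frames.
# The 16 kHz sample rate and the lack of overlap are illustrative assumptions.

def frame_signal(samples, sample_rate=16000, frame_ms=20):
    """Split samples into consecutive full frames of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

frames = frame_signal(list(range(1000)))   # 1000 samples -> 3 full frames of 320
```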
Step S102: perform linear prediction analysis on the original speech signal to determine the corresponding original excitation and first filter.
Linear prediction analysis (LPA) is a technique for analyzing speech signals: the signal is regarded as the output of a model and is described by the model parameters. In this embodiment, the linear prediction analysis of the original speech signal is mainly realized through a linear prediction error filter.
Specifically, step S102 may comprise: determining the original excitation and the first filter corresponding to the original speech signal using the original speech signal and the linear prediction error filter.
Step S103: adjust at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter.
When sound passes through a resonant cavity, the filtering action of the cavity redistributes the energy at different frequencies: the parts reinforced by the resonance are strengthened, while the other parts are attenuated. Because the resulting energy distribution is uneven, with the strong parts rising like mountain peaks, these peaks are called formants; formants are closely related to the timbre of speech. In this embodiment, adjusting at least one of the pole angle information and the pole amplitude information of the first filter moves the pole locations of the first filter, so that, after the original excitation passes through the adjusted first filter, the formants of the original speech are modified and a voice-change effect is achieved.
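The adjustment described here can be sketched directly in polar form: each pole r·e^{jθ} of the first filter has its angle θ shifted (moving the corresponding formant frequency) and/or its radius r scaled (changing the formant sharpness). The shift and scale values below are illustrative, not values from this application.

```python
import cmath

def adjust_poles(poles, angle_shift=0.0, radius_scale=1.0):
    """Shift each pole's angle and scale its radius (magnitude).

    Note: conjugate pole pairs must be adjusted symmetrically in a real
    system so that the adjusted filter still has real coefficients.
    """
    out = []
    for p in poles:
        r, theta = abs(p), cmath.phase(p)
        out.append((r * radius_scale) * cmath.exp(1j * (theta + angle_shift)))
    return out

# Example: one pole at radius 0.9, angle 0.5 rad.
adjusted = adjust_poles([0.9 * cmath.exp(0.5j)], angle_shift=0.1, radius_scale=0.95)
```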
Specific ways of adjusting at least one of the pole angle information and the pole amplitude information of the first filter are described in the embodiments below and are not repeated here.
Step S104: determine the target speech signal based on the original excitation and the adjusted first filter.
In this embodiment, the target speech signal is obtained by filtering the original excitation with the adjusted first filter.
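As a sketch of this synthesis step: the adjusted first filter is an all-pole filter, so filtering the original excitation reduces to the difference equation out(n) = e(n) + Σ α'_i·out(n-i), where α'_i are the adjusted coefficients. The order-1 example values below are illustrative assumptions.

```python
def synthesize(excitation, alphas):
    """All-pole filtering: out[n] = excitation[n] + sum_i alphas[i-1]*out[n-i]."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for i in range(1, len(alphas) + 1):
            if n - i >= 0:
                y += alphas[i - 1] * out[n - i]
        out.append(y)
    return out

# Unit impulse through a first-order all-pole filter with alpha_1 = 0.5:
response = synthesize([1.0, 0.0, 0.0, 0.0], [0.5])
```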
This embodiment provides an audio signal processing method. Compared with the prior art, the embodiment performs linear prediction analysis on the original speech signal to determine its original excitation and first filter, adjusts at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter, and then determines the target speech signal using the original excitation and the adjusted first filter. In this way, at least one of the formant frequency and the formant sharpness of the original speech signal is adjusted, so that the voice input by the user can be changed and the user experience improved.
In a possible implementation of this embodiment, step S102 may comprise: performing linear prediction analysis on the original speech signal to determine its prediction error information, and determining the original excitation and the first filter based on the prediction error information. Here, the linear prediction analysis is carried out on the original speech signal using the linear prediction error filter.
The transfer function of the linear prediction error filter can be expressed by the following formula (1-1):

A(z) = 1 - Σ_{i=1}^{p} α_i · z^{-i}    (1-1)

where A(z) denotes the transfer function of the linear prediction error filter, z is a complex variable, p denotes the order of the linear prediction error filter, α_i denotes the i-th coefficient of the linear prediction error filter, i indexes the coefficients, and "Σ" is the summation symbol.
The original speech signal is input to the linear prediction error filter to obtain the prediction error information of the original speech signal; that is, the original speech signal s(n) is filtered by the filter of formula (1-1) to obtain the prediction error information.
The prediction error information can be expressed by the following formula (1-2):

e(n) = s(n) - Σ_{i=1}^{p} α_i · s(n-i)    (1-2)

where e(n) denotes the prediction error information, n denotes the time index in units of the sampling period, s(n) denotes the original speech signal at the n-th sampling period, p denotes the order of the linear prediction error filter, α_i denotes its i-th coefficient, and s(n-i) denotes the original speech signal i sampling periods before the n-th. Letting ŝ(n) = Σ_{i=1}^{p} α_i · s(n-i), ŝ(n) is the predicted value of s(n), where "^" marks a predicted value and "Σ" is the summation symbol.
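Formula (1-2) can be sketched directly in code: the prediction error at each sample is the sample minus the weighted sum of its p predecessors. Treating samples before the start of the frame as 0 is an assumption for illustration.

```python
def prediction_error(s, alphas):
    """e(n) = s(n) - sum_{i=1}^{p} alphas[i-1] * s(n-i); s(n) = 0 for n < 0."""
    p = len(alphas)
    return [s[n] - sum(alphas[i - 1] * s[n - i]
                       for i in range(1, p + 1) if n - i >= 0)
            for n in range(len(s))]

# A first-order predictor with alpha_1 = 1 leaves only the sample differences:
errors = prediction_error([1.0, 2.0, 3.0, 4.0], [1.0])
```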
Based on the prediction error information of the original speech signal, the original excitation and the first filter are determined as follows.
In another possible implementation of this embodiment, determining the original excitation and the first filter based on the prediction error information may comprise: determining a second filter based on the prediction error information; determining the original excitation based on the original speech signal and the second filter; and determining the first filter based on the second filter. Here, the second filter is the filter corresponding to the linear prediction analysis.
Specifically, the prediction error information expressed by formula (1-2) is processed to calculate the coefficients {α_i}, i = 1, 2, …, p, of the linear prediction error filter; the second filter is then determined from these coefficients, after which the original excitation is determined from the original speech signal and the second filter, and the first filter is determined from the second filter.
How formula (1-2) is processed to obtain the coefficients {α_i}, i = 1, 2, …, p, is as follows:
Based on formula (1-2), the coefficients of the linear prediction error filter are calculated by minimizing the prediction error information e(n) under some criterion. In this embodiment, the criterion may be minimizing the mean square error E[e²(n)] of the prediction error information, which can be expressed by the following formula (1-3):

E[e²(n)] = E[(s(n) - Σ_{i=1}^{p} α_i · s(n-i))²]    (1-3)

where E[e²(n)] denotes the mean square error of the prediction error information, i.e., the mathematical expectation of the squared prediction error, "E" is the expectation symbol, e(n) denotes the prediction error information, n denotes the time index in units of the sampling period, s(n) denotes the original speech signal at the n-th sampling period, p denotes the order of the linear prediction error filter, α_i denotes its i-th coefficient, s(n-i) denotes the original speech signal i sampling periods before the n-th, and "Σ" is the summation symbol.
On the one hand, taking the derivative of formula (1-3) with respect to each coefficient and setting the result to 0, i.e., letting ∂E[e²(n)]/∂α_j = 0, gives the following formula (1-4):

E[(s(n) - Σ_{i=1}^{p} α_i · s(n-i)) · s(n-j)] = 0,  j = 1, 2, …, p    (1-4)

where "∂/∂α_j" is the partial-derivative symbol, "E" is the expectation symbol, α_j denotes the j-th coefficient of the linear prediction error filter, p denotes the order of the linear prediction error filter, s(n-j) denotes the original speech signal j sampling periods before the n-th, and "Σ" is the summation symbol.
Substituting formula (1-2) into formula (1-4) gives the following formula (1-5):

    Σ_{i=1…p} αi·r(j−i) = r(j), j = 1, 2, …, p   (1-5)

where "E" is the expectation operator; s(n) denotes the original speech signal at the n-th sampling period; n denotes the time index in units of the sampling period; s(n−j) denotes the original speech signal at the j-th sampling period before the n-th sampling period; n−j denotes the j-th sampling period before the n-th sampling period; p denotes the order of the linear prediction error filter; αi denotes the i-th coefficient of the linear prediction error filter; s(n−i) denotes the original speech signal at the i-th sampling period before the n-th sampling period; n−i denotes the i-th sampling period before the n-th sampling period; r(j) = E[s(n)·s(n−j)] denotes the j-th value of the autocorrelation function of s(n); r(j−i) denotes the (j−i)-th value of the autocorrelation function of s(n); and "Σ" is the summation operator.

Formula (1-5) is a system of p linear equations, known as the Yule-Walker equations.
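As a minimal sketch (variable names and the synthetic data are illustrative, not taken from the patent), the Yule-Walker equations of formula (1-5) can be solved directly for a synthetic AR(2) signal whose true coefficients are known, to check that they are recovered:

```python
import numpy as np

# Synthesize s(n) = 0.75*s(n-1) - 0.5*s(n-2) + w(n) and recover the
# predictor coefficients from the Yule-Walker system (1-5).
rng = np.random.default_rng(0)
a_true = np.array([0.75, -0.5])
w = rng.standard_normal(50000)

s = np.zeros_like(w)
for n in range(2, len(s)):
    s[n] = a_true[0] * s[n - 1] + a_true[1] * s[n - 2] + w[n]

p = 2
# Time-average autocorrelation estimate r(j)
r = np.array([np.dot(s[: len(s) - j], s[j:]) for j in range(p + 1)])

# Toeplitz system R * alpha = [r(1), ..., r(p)]
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
alpha = np.linalg.solve(R, r[1 : p + 1])
print(np.round(alpha, 2))        # close to [0.75, -0.5]
```

A direct solve like this costs O(p³); the Levinson-Durbin recursion described later exploits the Toeplitz structure to do better.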
For the embodiment of the present application, on the other hand, the minimum of the mean square error of the prediction error information, i.e. the minimum value of E[e²(n)], is computed. In the embodiment of the present application, this minimum can be expressed by the following formula (1-6):

    Ep = min E[e²(n)] = r(0) − Σ_{i=1…p} αi·r(i)   (1-6)

where Ep denotes the minimum value of the mean square error of the prediction error information; E[e²(n)] denotes the mean square error of the prediction error information, i.e. its mathematical expectation; "E" is the expectation operator; e(n) denotes the prediction error information; e²(n) denotes its square; n denotes the time index in units of the sampling period; s(n) denotes the original speech signal at the n-th sampling period; p denotes the order of the linear prediction error filter; αi denotes the i-th coefficient of the linear prediction error filter; s(n−i) denotes the original speech signal at the i-th sampling period before the n-th sampling period; n−i denotes the i-th sampling period before the n-th sampling period; r(0) denotes the 0-th value of the autocorrelation function of s(n); r(i) denotes its i-th value; and "Σ" is the summation operator.
For the embodiment of the present application, on the one hand, formula (1-3) is differentiated and the derivative set to 0 to obtain formula (1-4), and formula (1-2) is substituted into formula (1-4) to obtain formula (1-5); on the other hand, minimizing E[e²(n)] in formula (1-3) gives formula (1-6). In the embodiment of the present application, the solution of the linear prediction analysis is obtained based on formulas (1-5) and (1-6) and can be expressed by the following formula (1-7):

    Σ_{i=1…p} αi·r(j−i) = r(j), j = 1, 2, …, p
    Ep = r(0) − Σ_{i=1…p} αi·r(i)   (1-7)

where r(j) denotes the j-th value of the autocorrelation function of s(n); p denotes the order of the linear prediction error filter; αi denotes the i-th coefficient of the linear prediction error filter; r(j−i) denotes the (j−i)-th value of the autocorrelation function of s(n); r(0) denotes its 0-th value; r(i) denotes its i-th value; Ep denotes the minimum value of the mean square error of the prediction error information; and "Σ" is the summation operator.
For the embodiment of the present application, solving formula (1-7) yields the coefficients {αi}, i = 1, 2, …, p, of the linear prediction error filter. In the embodiment of the present application, the key to solving formula (1-7) is evaluating r(j), which in principle involves an ensemble average. The signal considered here is a speech signal, which can generally be regarded as short-term stationary; that is, over a short interval the random signal corresponding to the speech signal is treated as a stationary, ergodic random signal, so the ensemble average equals the time average. The estimate of r(j) can therefore use the time average

    r(j) = lim_{N→∞} (1/N)·Σ_n s(n)·s(n−j).

Since the normalization factor 1/N does not affect the coefficients αi obtained from formula (1-7), it can be dropped; and since n cannot actually run to infinity, a suitably large value N is fixed in advance. Specifically:
Assuming the original speech signal s(n) is 0 outside the range 0 ≤ n ≤ N, the estimate of r(j) can be expressed by the following formula (1-8):

    r(j) = Σ_{n=j…N} s(n)·s(n−j), 0 ≤ j ≤ p   (1-8)

where r(j) denotes the j-th value of the autocorrelation function of s(n); p denotes the order of the linear prediction error filter; n denotes the time index in units of the sampling period; N is a preset value; s(n) denotes the original speech signal at the n-th sampling period; s(n−j) denotes the original speech signal at the j-th sampling period before the n-th sampling period; and "Σ" is the summation operator.
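A small sketch of the estimate in formula (1-8) (the helper name is mine): with s(n) taken as 0 outside 0 ≤ n ≤ N, the windowed sum can be computed as a dot product, and the estimate keeps the even symmetry r(j) = r(−j) that the next step relies on:

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal(256)        # stand-in for a short speech frame

def r_hat(s, j):
    """Estimate r(j) = sum_{n=j}^{N} s(n) s(n-j), formula (1-8)."""
    j = abs(j)                      # even symmetry: r(j) = r(-j)
    return float(np.dot(s[j:], s[: len(s) - j]))

print(r_hat(s, 3) == r_hat(s, -3))  # True: even-function property
```

Here r(0) is simply the energy of the frame, which is why it dominates the diagonal of the matrix in formula (1-9).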
For the embodiment of the present application, the estimate in formula (1-8) retains the even-function property r(j) = r(−j). Using this property, formula (1-7) can be rewritten as the following formula (1-9):

    [ r(0)    r(1)    …  r(p−1) ] [ α1 ]   [ r(1) ]
    [ r(1)    r(0)    …  r(p−2) ] [ α2 ] = [ r(2) ]
    [  ⋮        ⋮      ⋱    ⋮    ] [  ⋮ ]   [  ⋮   ]
    [ r(p−1)  r(p−2)  …  r(0)   ] [ αp ]   [ r(p) ]   (1-9)

where r(0), r(1), r(2), …, r(p−2), r(p−1), r(p) denote the corresponding values of the autocorrelation function of s(n); α1, α2, …, αp−1, αp denote the 1st, 2nd, …, (p−1)-th and p-th coefficients of the linear prediction error filter; and Ep, the minimum value of the mean square error of the prediction error information, follows from formula (1-6) once the coefficients are known.
For the embodiment of the present application, the coefficient matrix of formula (1-9) is a Toeplitz matrix, so formula (1-9) can be solved with the Levinson-Durbin algorithm to obtain the coefficients {αi}, i = 1, 2, …, p, of the linear prediction error filter.
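A sketch of the Levinson-Durbin recursion named above (variable names are mine): it exploits the Toeplitz structure of formula (1-9) to solve for the predictor coefficients in O(p²) operations instead of the O(p³) of a general solve, updating the minimum prediction error Ep as it goes:

```python
import numpy as np

def levinson_durbin(r, p):
    """r: autocorrelation values r[0..p]; returns ({alpha_i}, E_p)."""
    a = np.zeros(p)
    E = float(r[0])
    for m in range(p):
        # reflection coefficient for order m+1
        k = (r[m + 1] - np.dot(a[:m], r[m:0:-1])) / E
        new_a = a.copy()
        new_a[m] = k
        if m > 0:                    # fold in previous-order coefficients
            new_a[:m] = a[:m] - k * a[m - 1::-1]
        a = new_a
        E *= 1.0 - k * k             # E_{m+1} = (1 - k^2) * E_m
    return a, E

# r(j) = 0.5^j corresponds to an AR(1) process with coefficient 0.5
r = np.array([1.0, 0.5, 0.25, 0.125])
a, E = levinson_durbin(r, 3)
print(a, E)                          # [0.5, 0, 0] and E_p = 0.75
```

Because r(j) here comes from a first-order process, the recursion correctly drives the higher-order coefficients to zero, and the result matches a direct Toeplitz solve.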
In the embodiment of the present application, the obtained coefficients {αi}, i = 1, 2, …, p, of the linear prediction error filter are substituted into formula (1-1) to determine the second filter, which can be expressed by formula (1-1).

For the embodiment of the present application, the original speech signal is s(n); s(n) is input into the determined second filter for whitening, yielding the original excitation corresponding to the original speech signal, and the original excitation can be expressed by formula (1-2).
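A sketch of this whitening step (identifiers are mine): the second filter A(z) = 1 − Σ αi·z^(−i) turns s(n) into the original excitation e(n) = s(n) − Σ αi·s(n−i), and running e(n) through the inverse H(z) = 1/A(z) reconstructs s(n) exactly:

```python
import numpy as np

alpha = np.array([1.2, -0.6])       # example predictor coefficients
p = len(alpha)

rng = np.random.default_rng(2)
s = rng.standard_normal(1000)       # stand-in for a speech frame

# Analysis (whitening): e(n) = s(n) - sum_i alpha_i * s(n-i)
e = s.copy()
for i in range(1, p + 1):
    e[i:] -= alpha[i - 1] * s[:-i]

# Synthesis: s(n) = e(n) + sum_i alpha_i * s(n-i), i.e. H(z) = 1/A(z)
s_rec = np.zeros_like(s)
for n in range(len(s)):
    s_rec[n] = e[n] + sum(alpha[i - 1] * s_rec[n - i]
                          for i in range(1, p + 1) if n >= i)

print(np.allclose(s_rec, s))        # True: analysis/synthesis round-trip
```

This round-trip is what makes the scheme work: the voice change below alters only the synthesis filter, while the excitation carries the rest of the signal.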
For the embodiment of the present application, the characterization formula of the first filter is the inverse of the characterization formula of the second filter. In the embodiment of the present application, the first filter can be expressed by the following formula (1-10):

    H(z) = 1 / (1 − Σ_{i=1…p} αi·z^(−i))   (1-10)

where H(z) denotes the first filter; z is a complex variable; p denotes the order of the linear prediction error filter; αi denotes the i-th coefficient of the linear prediction error filter; i denotes the index of the coefficient; and "Σ" is the summation operator.
In another possible implementation of the embodiment of the present application, before step S103 the method may further include: determining the poles of the first filter, and determining the pole angle information and the pole amplitude information of the first filter based on those poles.

Specifically, the first filter can be expressed by formula (1-10); computing on formula (1-10) yields the poles of the first filter, where each pole carries pole angle information and pole amplitude information. In the embodiment of the present application, all roots of the equation obtained by setting the denominator of formula (1-10) to 0 are solved using the QR decomposition method, so as to obtain all poles of the first filter and hence the pole angle value and pole amplitude value of each pole.

The way all poles of the first filter, and hence their pole angle values and pole amplitude values, are obtained from formula (1-10) is as follows:
For the embodiment of the present application, using the QR decomposition method to solve for all roots of the denominator of formula (1-10) may specifically include:

First, the denominator of formula (1-10) is converted into a monic polynomial of degree n and set equal to 0, which can be expressed by the following formula (1-11):

    Qn(x) = xⁿ + b_{n−1}·x^{n−1} + … + b1·x + b0 = 0   (1-11)

where Qn(x) denotes the degree-n polynomial in x; n denotes the degree in the variable x; x denotes the variable; and b0, b1, …, b_{n−1} denote the coefficients of the equation Qn(x).
For the embodiment of the present application, formula (1-11) can be regarded as the characteristic equation of a certain real matrix, so solving for all roots of formula (1-11) can be converted into solving for all eigenvalues of that real matrix. In the embodiment of the present application, rewriting formula (1-11) gives the real matrix shown in the following formula (1-12):

        [ 0  0  …  0  −b0     ]
    B = [ 1  0  …  0  −b1     ]   (1-12)
        [ ⋮  ⋱       ⋮        ]
        [ 0  …  1     −b_{n−1} ]

where B denotes the real matrix and b0, b1, …, b_{n−1} are the elements of the real matrix B.

For the embodiment of the present application, formula (1-12) is an upper Hessenberg matrix, so all eigenvalues of the real matrix B can be found directly by the QR decomposition method; the details are not elaborated here. In the embodiment of the present application, the eigenvalues of the real matrix B are exactly the poles of the first filter, and each pole carries a pole angle value and a pole amplitude value.
For example, pole 1 can be expressed by the following formula (1-13):

    z1 = r1·e^(jω1)   (1-13)

where z1 denotes pole 1, r1 denotes the pole amplitude information corresponding to pole 1, and ω1 denotes the pole angle information corresponding to pole 1.
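A sketch relating poles to formants (the resonator values are illustrative, not patent data): the magnitude response of H(z) on the unit circle z = e^(jω) peaks near the angle of each pole, and the closer the radius r is to 1, the sharper the peak. These peaks are the formants that the adjustments described below move:

```python
import numpy as np

r0, w0 = 0.95, 1.0                  # pole radius and angle (radians)
a1, a2 = 2 * r0 * np.cos(w0), -r0 * r0   # conjugate pole pair at r0*e^{±j*w0}

omega = np.linspace(0.0, np.pi, 1024)
z = np.exp(1j * omega)
A = 1 - a1 * z**-1 - a2 * z**-2     # denominator of H(z) on the unit circle
H_mag = 1.0 / np.abs(A)

peak = omega[np.argmax(H_mag)]
print(abs(peak - w0) < 0.05)        # True: the formant sits near the pole angle
```

Shifting the pole angle therefore moves the formant frequency, and scaling the pole radius changes the formant sharpness.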
In another possible implementation of the embodiment of the present application, S103 may specifically include: if the pole angle information of the first filter meets a preset condition, adjusting at least one of the pole angle information and the pole amplitude information of the first filter in a predetermined manner.

For the embodiment of the present application, the pole angle information of the first filter may specifically include the pole angle value of at least one pole, and the pole amplitude information of the first filter may specifically include the pole amplitude value of at least one pole.

Further, the pole angle information of the first filter meeting the preset condition may include: checking, for each pole of the first filter, whether its pole angle value meets the preset condition. In the embodiment of the present application, for every pole that meets the preset condition, at least one of its pole angle value and its pole amplitude value is adjusted in the predetermined manner.
For the embodiment of the present application, when the pole angle value of any pole belongs to [−a, a], at least one of the pole angle information and the pole amplitude information of the first filter may be adjusted in the predetermined manner.

In the embodiment of the present application, when the pole angle value of any pole belongs to [−a, a], only the pole angle value of that pole may be adjusted, without adjusting its pole amplitude value.

Taking the first pole z1 = r1·e^(jω1) as an example, only ω1 is adjusted while r1 remains unchanged.
In one possible implementation of the embodiment of the present application, adjusting the pole angle information of the first filter in the predetermined manner includes at least one of: increasing the pole angle value of at least one pole by a first preset threshold, and decreasing the pole angle value of at least one pole by a second preset threshold.

For the embodiment of the present application, the first preset threshold and the second preset threshold may be the same or different; this is not limited in the embodiment of the present application.
For the embodiment of the present application, when the first filter has at least two poles, the pole angle value of each pole whose angle belongs to (0, a] is increased by the first preset threshold, and the pole angle value of each pole whose angle belongs to [−a, 0) is decreased by the second preset threshold.

For the embodiment of the present application, when the first filter has one pole, if its pole angle value belongs to (0, a], the pole angle value is increased by the first preset threshold; if its pole angle value belongs to [−a, 0), the pole angle value is decreased by the second preset threshold.

Taking the first pole z1 = r1·e^(jω1) as an example: when ω1 belongs to (0, 3], ω1 is increased by X; when ω1 belongs to [−3, 0), ω1 is decreased by X; r1 may remain unchanged, where X ∈ [0.07, 0.11].
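A sketch of this angle rule, with a = 3 and the shift X fixed at 0.09, one value inside the stated range [0.07, 0.11] (the function name is mine): angles in (0, 3] move up by X, angles in [−3, 0) move down by X, and anything else is left alone:

```python
def adjust_angle(w, X=0.09, a=3.0):
    """Shift a pole angle w (radians) per the sign-dependent rule."""
    if 0 < w <= a:
        return w + X
    if -a <= w < 0:
        return w - X
    return w                        # w == 0 or |w| > a: unchanged

print(adjust_angle(1.0) > 1.0)      # True: shifted up by X
print(adjust_angle(3.1) == 3.1)     # True: outside (0, 3], untouched
```

Shifting positive and negative angles symmetrically keeps conjugate pole pairs conjugate, so the adjusted filter still has real coefficients.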
In the embodiment of the present application, when the pole angle value of any pole belongs to [−a, a], both the pole angle value and the pole amplitude value of that pole may also be adjusted simultaneously. The adjustment of the pole angle value is as described above and is not repeated here.
In another possible implementation of the embodiment of the present application, adjusting the pole amplitude information of the first filter in the preset manner includes: adjusting the pole amplitude value of the at least one pole by a preset multiple.

For the embodiment of the present application, for every pole whose pole angle value belongs to [−a, a], the corresponding pole amplitude value is scaled to Y times its value.

Taking the first pole z1 = r1·e^(jω1) as an example: when ω1 ∈ [−3, 3], r1 is scaled to Y times its value, where Y ∈ [0.8, 1.2].
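A sketch of this amplitude rule, with a = 3 and the multiple Y fixed at 1.1, one value inside the stated range [0.8, 1.2] (the helper name is mine): a pole z = r·e^(jω) whose angle lies in [−3, 3] has its radius scaled by Y while its angle is kept:

```python
import numpy as np

def scale_amplitude(pole, Y=1.1, a=3.0):
    """Scale the radius of a complex pole if its angle is in [-a, a]."""
    r, w = np.abs(pole), np.angle(pole)
    if -a <= w <= a:
        r *= Y
    return r * np.exp(1j * w)       # rebuild z = r * e^{j*w}

z = 0.9 * np.exp(1j * 1.2)          # r1 = 0.9, omega1 = 1.2
z2 = scale_amplitude(z)
print(np.isclose(np.abs(z2), 0.99))     # True: radius scaled by 1.1
print(np.isclose(np.angle(z2), 1.2))    # True: angle unchanged
```

Pulling a pole toward the unit circle (Y > 1) sharpens the corresponding formant; pushing it inward (Y < 1) flattens it.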
In the embodiment of the present application, when the pole angle value of any pole belongs to [−a, a], only the pole amplitude value of that pole may be adjusted, without adjusting its pole angle value. The adjustment of the pole amplitude value is as described above and is not repeated here.
For the embodiment of the present application, adjusting at least one of the pole angle information and the pole amplitude information of the first filter amounts to adjusting at least one of the formant frequency and the formant sharpness of the original speech signal.
For example, if the target speech signal corresponds to a cold voice, the spectrogram of the original speech signal is as shown in Fig. 4, and the spectrogram after the voice is changed to a cold voice is as shown in Fig. 5. In Fig. 4 and Fig. 5, the horizontal axis represents time and the vertical axis represents frequency; the brightness of a point in the spectrogram indicates the amplitude of the corresponding frequency component: the brighter the point, the larger the amplitude, and the darker the point, the smaller the amplitude. When the amplitude of a point's frequency component exceeds that of the surrounding points, that point is a formant. In the embodiment of the present application, the formants in Fig. 4 concentrate between frequencies 0 and 1000 (region 1), while the formants in Fig. 5 concentrate between frequencies 1000 and 3000 (region 2); the comparison of Fig. 4 and Fig. 5 therefore shows that the formant frequency of the adjusted speech signal shifts upward (increases). The brightness contrast in Fig. 4 and Fig. 5 characterizes the formant sharpness: in Fig. 4 the larger brightness contrast indicates sharper formants (larger formant sharpness), whereas in Fig. 5 the smaller brightness contrast indicates flatter formants (smaller formant sharpness); the comparison of Fig. 4 and Fig. 5 therefore shows that the formant sharpness of the adjusted speech signal decreases.
For the embodiment of the present application, by adjusting at least one of the pole angle information and the pole amplitude information of the first filter, at least one of the formant frequency and the formant sharpness of the original speech signal is adjusted, so that the target speech obtained after the above adjustment differs from the original speech. A voice-change effect is thereby achieved, which can further improve the user experience.
In another possible implementation of the embodiment of the present application, before S101 the method may further include: obtaining the speech signal input by the user; denoising the speech signal input by the user, and taking the denoised speech signal as the original speech signal.

For the embodiment of the present application, the speech signal input by the user may include a speech signal input in real time, or a speech signal stored locally; this is not limited in the embodiment of the present application.

For the embodiment of the present application, in practical applications, the obtained speech signal input by the user may be denoised and the denoised speech signal used as the original speech signal; alternatively, the speech signal input by the user may be used as the original speech signal without denoising. This is not limited in the embodiment of the present application.

For the embodiment of the present application, the speech signal input by the user may be denoised in any of several feasible ways, for example by feeding it into a trained noise-separation neural network model.
For the embodiment of the present application, the above embodiments may be executed by a terminal device, by a server, or partly by a terminal device and partly by a server; this is not limited in the embodiment of the present application.

The above embodiments describe in detail how the original speech signal is processed to obtain the target speech signal (the voice-changed speech signal). The following specific application scenario, in which the original speech signal is changed into a cold-voice signal, introduces a concrete implementation of the application, as follows:

If the speech input by the user is to be changed into a cold voice, the original speech signal is obtained (the original speech signal may be the speech signal obtained after denoising the user's input speech, or the speech signal corresponding to the user's input speech directly); the original speech signal then undergoes voice-change processing to obtain the voice-changed speech signal; the voice-changed speech signal is encoded and sent over the Internet to the peer; and the peer decodes the received information and plays it, i.e. the played speech is a cold voice, as shown in Fig. 3.
For the embodiment of the present application, the user may trigger any one of the target objects displayed in the operation interface ("crooked-fruit person", "cold", "tired beast" and "Internet celebrity female"), as shown in Fig. 2; if the target speech signal is the speech signal corresponding to a cold voice, the user may trigger the "cold" object in the operation interface shown in Fig. 2.

For the embodiment of the present application, subjecting the original speech signal to voice-change processing includes: determining, from the original speech signal, its corresponding original excitation and first filter; increasing by X (X ∈ [0.07, 0.11]) the pole angle value of every pole of the first filter whose pole angle value ∈ (0, 3], and decreasing by X (X ∈ [0.07, 0.11]) the pole angle value of every pole whose pole angle value ∈ [−3, 0); and/or scaling the pole amplitude value of every pole whose pole angle value ∈ [−3, 3] to 0.8–1.2 times its value, thereby obtaining the adjusted first filter; the original excitation is then passed through the adjusted first filter to perform the voice change.
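An end-to-end sketch of the voice-change flow just described, using my own helper names, X = 0.09 and Y = 1.1 chosen from the stated ranges, and a radius clamp at 0.98 (my addition, to keep the adjusted filter stable). White noise stands in for a speech frame; no audio I/O is shown:

```python
import numpy as np

def lpc(s, p):
    """Predictor coefficients via the Yule-Walker / Toeplitz solve."""
    r = np.array([np.dot(s[: len(s) - j], s[j:]) for j in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1 : p + 1])

def residual(s, a):                 # whitening by A(z): the excitation
    e = s.copy()
    for i in range(1, len(a) + 1):
        e[i:] -= a[i - 1] * s[:-i]
    return e

def synth(e, a):                    # filtering by H(z) = 1/A(z)
    s = np.zeros_like(e)
    for n in range(len(e)):
        s[n] = e[n] + sum(a[i - 1] * s[n - i]
                          for i in range(1, len(a) + 1) if n >= i)
    return s

rng = np.random.default_rng(3)
s = rng.standard_normal(400)
a = lpc(s, 8)
e = residual(s, a)                  # original excitation

# Adjust poles: angles in (0, 3] up by X, in [-3, 0) down by X;
# radii of poles with angle in [-3, 3] scaled by Y (then clamped).
poles = np.roots(np.concatenate(([1.0], -a)))
w, r = np.angle(poles), np.abs(poles)
w = w + 0.09 * ((0 < w) & (w <= 3)) - 0.09 * ((-3 <= w) & (w < 0))
r = np.where(np.abs(np.angle(poles)) <= 3,
             np.minimum(r * 1.1, 0.98), r)
new_a = -np.real(np.poly(r * np.exp(1j * w)))[1:]  # rebuilt predictor

s_changed = synth(e, new_a)         # voice-changed frame
print(s_changed.shape == s.shape)   # True
```

In a real system this would run per short frame with overlap, and the output would be encoded and transmitted as described above; the frame-level pipeline is the same.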
The audio signal processing method has been described above from the perspective of method steps; the speech signal processing apparatus is introduced below from the perspective of virtual modules or virtual units, as follows:

The embodiment of the present application provides a speech signal processing apparatus. As shown in Fig. 6, the speech signal processing apparatus 60 includes a first obtaining module 601, a first determining module 602, an adjusting module 603 and a second determining module 604, where:

the first obtaining module 601 is configured to obtain an original speech signal;

the first determining module 602 is configured to perform linear prediction analysis on the original speech signal to determine the original excitation and the first filter corresponding to the original speech signal;

the adjusting module 603 is configured to adjust at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter; and

the second determining module 604 is configured to determine a target speech signal based on the original excitation corresponding to the original speech signal and the adjusted first filter.
In one possible implementation of the embodiment of the present application, the first determining module 602 may specifically include a first determining unit and a second determining unit, where:

the first determining unit is configured to perform linear prediction analysis on the original speech signal to determine the prediction error information corresponding to the original speech signal; and

the second determining unit is configured to determine, based on the prediction error information corresponding to the original speech signal, the original excitation and the first filter corresponding to the original speech signal.

In another possible implementation of the embodiment of the present application, the second determining unit is specifically configured to determine a second filter based on the prediction error information corresponding to the original speech signal, the second filter being the filter corresponding to the linear prediction analysis; and is specifically further configured to determine the original excitation corresponding to the original speech signal based on the original speech signal and the second filter, and to determine the first filter based on the second filter.
In another possible implementation of the embodiment of the present application, the adjusting module 603 is specifically configured to, when the pole angle information of the first filter meets the preset condition, adjust at least one of the pole angle information and the pole amplitude information of the first filter in the predetermined manner.

In another possible implementation of the embodiment of the present application, the pole angle information of the first filter includes the pole angle value of at least one pole, and the adjusting module 603 includes at least one of an increasing unit and a decreasing unit, where:

the increasing unit is configured to increase the pole angle value of the at least one pole by the first preset threshold; and

the decreasing unit is configured to decrease the pole angle value of the at least one pole by the second preset threshold.

In another possible implementation of the embodiment of the present application, the pole amplitude information of the first filter includes the pole amplitude value of at least one pole, and the adjusting module 603 is specifically configured to adjust the pole amplitude value of the at least one pole by the preset multiple.
In another possible implementation of the embodiment of the present application, the speech signal processing apparatus 60 further includes a second obtaining module and a denoising module, where:

the second obtaining module is configured to obtain the speech signal input by the user; and

the denoising module is configured to denoise the speech signal input by the user and take the denoised speech signal as the original speech signal.

For the embodiment of the present application, the first obtaining module and the second obtaining module may be the same obtaining module or two different obtaining modules; this is not limited in the embodiment of the present application.

The speech signal processing apparatus provided by the embodiment of the present application can perform the operations corresponding to the audio signal processing method shown in the foregoing method embodiments; the implementation principle is similar and is not repeated here.
The present application provides a speech signal processing apparatus. Compared with the prior art, the application performs linear prediction analysis on the original speech signal to determine the corresponding original excitation and first filter; adjusts at least one of the pole angle information and the pole amplitude information of the first filter to obtain the adjusted first filter; and then determines the target speech signal using the original excitation corresponding to the original speech signal and the adjusted first filter. In this way, at least one of the formant frequency and the formant sharpness of the original speech signal can be adjusted to obtain the target speech signal, so that the speech input by the user can be voice-changed, which in turn can improve the user experience.
The speech signal processing apparatus of the present application has been introduced above from the perspective of virtual modules or virtual units; an electronic device is introduced below from the perspective of a physical apparatus. The electronic device in the embodiment of the present application may be a terminal device or a server; this is not limited in the embodiment of the present application.

The embodiment of the present application provides an electronic device. As shown in Fig. 7, the electronic device 4000 includes a processor 4001 and a memory 4003, where the processor 4001 is connected to the memory 4003, for example by a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004. It should be noted that in practical applications the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiment of the present application.

The processor 4001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure. The processor 4001 may also be a combination that realizes a computing function, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 4002 may include a path for transferring information between the above components. The bus 4002 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in Fig. 7, but this does not mean that there is only one bus or one type of bus.

The memory 4003 may be a ROM or another type of static storage device capable of storing static information and instructions, a RAM or another type of dynamic storage device capable of storing information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.

The memory 4003 is used to store the application program code for executing the solution of the present application, and execution is controlled by the processor 4001. The processor 4001 is configured to execute the application program code stored in the memory 4003 to realize the content shown in any of the foregoing method embodiments.
The embodiment of the present application provides an electronic device including: one or more processors; a memory; and one or more application programs stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the operations corresponding to the audio signal processing method shown in the foregoing method embodiments or any possible implementation thereof. Compared with the prior art, the following can be achieved: the application performs linear prediction analysis on the original speech signal to determine the corresponding original excitation and first filter; adjusts at least one of the pole angle information and the pole amplitude information of the first filter to obtain the adjusted first filter; and then determines the target speech signal using the original excitation corresponding to the original speech signal and the adjusted first filter, so that at least one of the formant frequency and the formant sharpness of the original speech signal can be adjusted to obtain the target speech signal. The speech input by the user can thus be voice-changed, which in turn can improve the user experience.
The electronic device for speech signal processing has been introduced above from the perspective of a physical apparatus; a computer-readable storage medium is introduced below from the perspective of a storage medium.

The embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the audio signal processing method shown in the foregoing method embodiments or any possible implementation thereof is realized. Compared with the prior art, linear prediction analysis is performed on the original speech signal to determine the corresponding original excitation and first filter; at least one of the pole angle information and the pole amplitude information of the first filter is adjusted to obtain the adjusted first filter; and the target speech signal is then determined using the original excitation corresponding to the original speech signal and the adjusted first filter. At least one of the formant frequency and the formant sharpness of the original speech signal can thus be adjusted to obtain the target speech signal, so that the speech input by the user can be voice-changed, which in turn can improve the user experience.
It should be understood that although the steps in the flowcharts of the drawings are shown sequentially as indicated by the arrows, these steps are not necessarily executed in that order. Unless expressly stated otherwise herein, there is no strict ordering constraint on their execution, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is likewise not necessarily sequential, and they may be executed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.
The above are only some embodiments of the present application. It should be noted that, for those of ordinary skill in the art, several improvements and modifications can be made without departing from the principles of the present application, and these improvements and modifications shall also be regarded as falling within the protection scope of the present application.
Claims (10)
1. An audio signal processing method, characterized by comprising:
obtaining an original speech signal;
performing linear prediction analysis on the original speech signal to determine an original excitation and a first filter corresponding to the original speech signal;
adjusting at least one of pole angle information corresponding to the first filter and pole amplitude information corresponding to the first filter, to obtain an adjusted first filter;
determining a target speech signal based on the original excitation corresponding to the original speech signal and the adjusted first filter.
2. The method according to claim 1, characterized in that the performing linear prediction analysis on the original speech signal to determine the original excitation and the first filter corresponding to the original speech signal comprises:
performing linear prediction analysis on the original speech signal to determine prediction error information corresponding to the original speech signal;
determining, based on the prediction error information corresponding to the original speech signal, the original excitation and the first filter corresponding to the original speech signal.
3. The method according to claim 2, characterized in that the determining, based on the prediction error information corresponding to the original speech signal, the original excitation and the first filter corresponding to the original speech signal comprises:
determining a second filter based on the prediction error information corresponding to the original speech signal, the second filter being the filter corresponding to the linear prediction analysis;
determining the original excitation corresponding to the original speech signal based on the original speech signal and the second filter, and determining the first filter based on the second filter.
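As a hypothetical illustration of claims 2 and 3 (not code published with the patent): the second filter can be read as the prediction-error analysis filter A(z), the original excitation as the residual obtained by filtering the speech with A(z), and the first filter as its inverse 1/A(z). A minimal numpy sketch under that assumption:

```python
import numpy as np
from scipy.signal import lfilter

order = 12
rng = np.random.default_rng(1)
x = rng.standard_normal(800)    # stand-in for the original speech signal

# Second filter: the analysis filter A(z) minimizing the prediction error
# e[n] = x[n] + sum_k a_k x[n-k] (autocorrelation / normal-equations solution).
r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
a = np.concatenate(([1.0], np.linalg.solve(R, -r[1:])))

# Filtering the speech with the second filter yields the original excitation;
# the first filter 1/A(z), derived from the second filter, reconstructs the
# speech from that excitation.
excitation = lfilter(a, [1.0], x)
reconstructed = lfilter([1.0], a, excitation)
assert np.allclose(reconstructed, x)    # 1/A(z) exactly undoes A(z)
```

With zero initial conditions, the cascade of A(z) and 1/A(z) is the identity, which is why the excitation plus the first filter suffice to regenerate (or, after pole adjustment, modify) the speech.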
4. The method according to any one of claims 1 to 3, characterized in that the adjusting at least one of the pole angle information corresponding to the first filter and the pole amplitude information corresponding to the first filter, to obtain the adjusted first filter, comprises:
if the pole angle information corresponding to the first filter meets a preset condition, adjusting at least one of the pole angle information and the pole amplitude information corresponding to the first filter in a predetermined manner.
5. The method according to claim 4, characterized in that the pole angle information corresponding to the first filter comprises a pole angle value corresponding to at least one pole;
the adjusting the pole angle information corresponding to the first filter in a predetermined manner comprises at least one of the following:
increasing the pole angle value corresponding to the at least one pole by a first preset threshold;
decreasing the pole angle value corresponding to the at least one pole by a second preset threshold.
6. The method according to claim 5, characterized in that the pole amplitude information corresponding to the first filter comprises a pole amplitude value corresponding to at least one pole;
the adjusting the pole amplitude information corresponding to the first filter in a predetermined manner comprises:
adjusting the pole amplitude value corresponding to the at least one pole by a preset multiple.
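A minimal sketch of the pole adjustments in claims 5 and 6, under the assumption that the first filter is an all-pole filter 1/A(z); the threshold and multiple values below are made-up examples, not values from the patent.

```python
import numpy as np

def adjust_poles(a, angle_delta=0.1, radius_mult=1.02):
    """Shift each complex pole's angle by a preset threshold (claim 5) and
    scale its radius by a preset multiple (claim 6), keeping the filter stable."""
    poles = np.roots(a)
    ang, rad = np.angle(poles), np.abs(poles)
    pair = np.abs(poles.imag) > 1e-8                 # leave real poles untouched
    ang = ang + np.where(pair, np.sign(ang), 0.0) * angle_delta
    rad = np.minimum(rad * radius_mult, 0.999)       # |pole| < 1 => stable
    return np.real(np.poly(rad * np.exp(1j * ang)))

# Example: a 2nd-order resonator with conjugate poles at radius 0.9, angle +/-0.5 rad
p = 0.9 * np.exp(0.5j)
a = np.real(np.poly([p, np.conj(p)]))
a_new = adjust_poles(a)   # poles move to radius 0.918, angles +/-0.6 rad
```

Shifting conjugate poles symmetrically keeps the adjusted coefficients real, and clamping the radius below 1 keeps the adjusted first filter stable.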
7. The method according to claim 1, characterized in that, before the obtaining an original speech signal, the method further comprises:
obtaining a speech signal input by a user;
performing denoising processing on the speech signal input by the user, and taking the denoised speech signal as the original speech signal.
8. A speech signal processing apparatus, characterized by comprising:
a first obtaining module, configured to obtain an original speech signal;
a first determining module, configured to perform linear prediction analysis on the original speech signal to determine an original excitation and a first filter corresponding to the original speech signal;
an adjusting module, configured to adjust at least one of pole angle information corresponding to the first filter and pole amplitude information corresponding to the first filter, to obtain an adjusted first filter;
a second determining module, configured to determine a target speech signal based on the original excitation corresponding to the original speech signal and the adjusted first filter.
9. An electronic device, characterized by comprising:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the audio signal processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the audio signal processing method according to any one of claims 1 to 7 is realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910611481.2A CN110164461B (en) | 2019-07-08 | 2019-07-08 | Voice signal processing method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110164461A true CN110164461A (en) | 2019-08-23 |
CN110164461B CN110164461B (en) | 2023-12-15 |
Family
ID=67637855
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910611481.2A Active CN110164461B (en) | 2019-07-08 | 2019-07-08 | Voice signal processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110164461B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110415718A (en) * | 2019-09-05 | 2019-11-05 | 腾讯科技(深圳)有限公司 | The method of signal generation, audio recognition method and device based on artificial intelligence |
CN110415718B (en) * | 2019-09-05 | 2020-11-03 | 腾讯科技(深圳)有限公司 | Signal generation method, and voice recognition method and device based on artificial intelligence |
CN111431855A (en) * | 2020-02-26 | 2020-07-17 | 宁波吉利罗佑发动机零部件有限公司 | Vehicle CAN signal analysis method, device, equipment and medium |
CN113395577A (en) * | 2020-09-10 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Sound changing playing method and device, storage medium and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6047254A (en) * | 1996-05-15 | 2000-04-04 | Advanced Micro Devices, Inc. | System and method for determining a first formant analysis filter and prefiltering a speech signal for improved pitch estimation |
CN102779527A (en) * | 2012-08-07 | 2012-11-14 | 无锡成电科大科技发展有限公司 | Speech enhancement method on basis of enhancement of formants of window function |
US20140214413A1 (en) * | 2013-01-29 | 2014-07-31 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for adaptive formant sharpening in linear prediction coding |
CN105304092A (en) * | 2015-09-18 | 2016-02-03 | 深圳市海派通讯科技有限公司 | Real-time voice changing method based on intelligent terminal |
CN105654941A (en) * | 2016-01-20 | 2016-06-08 | 华南理工大学 | Voice change method and device based on specific target person voice change ratio parameter |
Also Published As
Publication number | Publication date |
---|---|
CN110164461B (en) | 2023-12-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |