CN110164461A - Audio signal processing method, device, electronic equipment and storage medium - Google Patents
- Publication number
- CN110164461A (application CN201910611481.2A)
- Authority
- CN
- China
- Prior art keywords
- filter
- speech signal
- primary speech
- pole
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/07—User-to-user messaging in packet-switching networks characterised by the inclusion of specific contents
- H04L51/10—Multimedia information
- H04L51/18—Commands or executable codes
Abstract
Embodiments of the present application provide an audio signal processing method, apparatus, electronic device, and storage medium. The method comprises: obtaining an original speech signal; performing linear prediction analysis on the original speech signal to determine the original excitation and a first filter corresponding to the original speech signal; adjusting at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter; and determining a target speech signal based on the original excitation and the adjusted first filter. The embodiments thereby adjust at least one of the formant frequency and the formant sharpness of the original speech signal to obtain the target speech signal, so that the voice input by the user can be changed, improving the user experience.
Description
Technical field
This application relates to the field of signal processing technology, and in particular to an audio signal processing method, apparatus, electronic device, and storage medium.
Background
With the development of mobile communication, various application programs have emerged, including applications with communication functions. Through these applications, users can interact with each other by voice: information entered by one user in speech form is sent to a peer user, thereby achieving information exchange.
When users exchange information by voice, the voice information entered by a user may be passed through voice change processing before being sent to the peer, so that the voice information received at the peer differs from the voice information entered by the user; this adds interest to the interaction.
However, how to change the voice entered by the user remains a key problem.
Summary of the invention
This application provides an audio signal processing method, apparatus, electronic device, and storage medium, which can solve at least one of the above technical problems. The technical solution is as follows:
In a first aspect, an audio signal processing method is provided, the method comprising:
obtaining an original speech signal;
performing linear prediction analysis on the original speech signal to determine the original excitation and a first filter corresponding to the original speech signal;
adjusting at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter;
determining a target speech signal based on the original excitation and the adjusted first filter.
In a possible implementation, performing linear prediction analysis on the original speech signal to determine the corresponding original excitation and first filter comprises:
performing linear prediction analysis on the original speech signal to determine prediction error information of the original speech signal;
determining the original excitation and the first filter based on the prediction error information.
In another possible implementation, determining the original excitation and the first filter based on the prediction error information comprises:
determining a second filter based on the prediction error information, the second filter being the filter corresponding to the linear prediction analysis;
determining the original excitation based on the original speech signal and the second filter, and determining the first filter based on the second filter.
In another possible implementation, adjusting at least one of the pole angle information and the pole amplitude information of the first filter comprises:
if the pole angle information of the first filter satisfies a preset condition, adjusting at least one of the pole angle information and the pole amplitude information of the first filter in a predetermined manner.
In another possible implementation, the pole angle information of the first filter comprises a pole angle value for each of at least one pole;
adjusting the pole angle information in the predetermined manner includes at least one of:
increasing the pole angle value of at least one pole by a first preset threshold;
decreasing the pole angle value of at least one pole by a second preset threshold.
In another possible implementation, the pole amplitude information of the first filter comprises a pole amplitude value for each of at least one pole;
adjusting the pole amplitude information in the predetermined manner comprises:
scaling the pole amplitude value of at least one pole by a preset multiple.
In another possible implementation, before obtaining the original speech signal, the method further comprises:
obtaining a speech signal input by a user;
denoising the speech signal input by the user, and using the denoised speech signal as the original speech signal.
In a second aspect, a speech signal processing apparatus is provided, the apparatus comprising:
a first obtaining module, configured to obtain an original speech signal;
a first determining module, configured to perform linear prediction analysis on the original speech signal to determine the corresponding original excitation and first filter;
an adjusting module, configured to adjust at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter;
a second determining module, configured to determine a target speech signal based on the original excitation and the adjusted first filter.
In a possible implementation, the first determining module comprises a first determining unit and a second determining unit, wherein:
the first determining unit is configured to perform linear prediction analysis on the original speech signal to determine its prediction error information;
the second determining unit is configured to determine the original excitation and the first filter based on the prediction error information.
In another possible implementation, the second determining unit is specifically configured to determine a second filter based on the prediction error information, the second filter being the filter corresponding to the linear prediction analysis; and to determine the original excitation based on the original speech signal and the second filter, and determine the first filter based on the second filter.
In another possible implementation, the adjusting module is specifically configured to, when the pole angle information of the first filter satisfies a preset condition, adjust at least one of the pole angle information and the pole amplitude information of the first filter in a predetermined manner.
In another possible implementation, the pole angle information of the first filter comprises a pole angle value for each of at least one pole; the adjusting module comprises at least one of an increasing unit and a decreasing unit, wherein:
the increasing unit is configured to increase the pole angle value of at least one pole by a first preset threshold;
the decreasing unit is configured to decrease the pole angle value of at least one pole by a second preset threshold.
In another possible implementation, the pole amplitude information of the first filter comprises a pole amplitude value for each of at least one pole; the adjusting module is specifically configured to scale the pole amplitude value of at least one pole by a preset multiple.
In another possible implementation, the speech signal processing apparatus further comprises a second obtaining module and a denoising module, wherein:
the second obtaining module is configured to obtain a speech signal input by a user;
the denoising module is configured to denoise the speech signal input by the user and use the denoised speech signal as the original speech signal.
In a third aspect, an electronic device is provided, the electronic device comprising:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the operations corresponding to the audio signal processing method of the first aspect or any possible implementation thereof.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the audio signal processing method of the first aspect or any possible implementation thereof.
The technical solutions provided by the embodiments of the present application have the following beneficial effects:
compared with the prior art, the present application performs linear prediction analysis on the original speech signal to determine its original excitation and first filter, adjusts at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter, and then determines the target speech signal using the original excitation and the adjusted first filter. In this way, at least one of the formant frequency and the formant sharpness of the original speech signal is adjusted to obtain the target speech signal, so that the voice input by the user can be changed and the user experience improved.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below.
Fig. 1 is a flow diagram of an audio signal processing method provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of a voice-change display interface provided by an embodiment of the present application;
Fig. 3 is a flow diagram of changing an original speech signal into a "cold" voice, provided by an embodiment of the present application;
Fig. 4 is a schematic spectrogram of an original speech signal provided by an embodiment of the present application;
Fig. 5 is a schematic spectrogram after the voice has been changed into a "cold" voice, provided by an embodiment of the present application;
Fig. 6 is a structural diagram of a speech signal processing apparatus provided by an embodiment of the present application;
Fig. 7 is a structural diagram of an electronic device for speech signal processing provided by an embodiment of the present application.
Detailed description
Embodiments of the present application are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present application, and are not to be construed as limiting it.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include plural forms. It should be further understood that the wording "comprising" used in this specification indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. When an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connection" or "coupling" as used herein may include wireless connection or wireless coupling. The wording "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
First, several terms involved in this application are introduced and explained:
QR decomposition: factoring a matrix into an orthonormal matrix Q and an upper triangular matrix R. The QR method is the most effective and widely used method for finding all eigenvalues of a general matrix: the matrix is first reduced by orthogonal similarity transformations to Hessenberg form, and the QR iteration is then applied to find its eigenvalues and eigenvectors;
Monic polynomial: a polynomial whose leading coefficient is 1.
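To connect these two terms to the method below: the poles of the first filter are the roots of its monic denominator polynomial, and those roots can be obtained as the eigenvalues of the polynomial's companion matrix using the QR iteration just described. The following sketch illustrates this for a 2x2 case; the example polynomial x^2 - 3x + 2 (roots 1 and 2), the pure-Python Gram-Schmidt factorization, and the fixed iteration count are illustrative assumptions, not details from this application.

```python
import math

# Illustrative sketch: roots of a monic polynomial via the QR eigenvalue
# iteration, applied to the polynomial's companion matrix (2x2 case).

def qr_step_2x2(A):
    # One QR-algorithm step: factor A = Q*R by Gram-Schmidt, return R*Q.
    a, c = A[0][0], A[1][0]
    n1 = math.hypot(a, c)
    q1 = (a / n1, c / n1)                      # normalized first column
    b, d = A[0][1], A[1][1]
    proj = q1[0] * b + q1[1] * d               # component of col 2 along q1
    u0, u1 = b - proj * q1[0], d - proj * q1[1]
    n2 = math.hypot(u0, u1)
    q2 = (u0 / n2, u1 / n2)                    # orthonormal second column
    R = [[n1, proj], [0.0, n2]]
    Q = [[q1[0], q2[0]], [q1[1], q2[1]]]
    # R*Q (row 2 uses R[1][0] == 0)
    return [[R[0][0] * Q[0][0] + R[0][1] * Q[1][0],
             R[0][0] * Q[0][1] + R[0][1] * Q[1][1]],
            [R[1][1] * Q[1][0],
             R[1][1] * Q[1][1]]]

# Companion matrix of the monic polynomial x^2 - 3x + 2 (roots 1 and 2).
A = [[0.0, -2.0], [1.0, 3.0]]
for _ in range(60):
    A = qr_step_2x2(A)
eigs = sorted([A[0][0], A[1][1]])              # approaches [1.0, 2.0]
```

In practice a library eigenvalue routine would be used; the sketch only mirrors the Hessenberg/QR procedure named above (a 2x2 matrix is already in Hessenberg form).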
The technical solution of the present application, and how it solves the above technical problems, are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application are described below with reference to the drawings.
An embodiment of the present application provides an audio signal processing method; as shown in Fig. 1, the method comprises:
Step S101: obtain an original speech signal.
In this embodiment, the original speech signal within a preset time period is obtained; for example, the preset time period may be 25, 20, or 15 milliseconds.
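As a minimal sketch of this step, the incoming samples can be segmented into frames of the preset duration before analysis. The 16 kHz sampling rate and the non-overlapping frames are assumptions for illustration; the application only specifies the window duration (e.g. 20 ms).

```python
# Sketch: segment the input into 20 ms analysis frames.
# The 16 kHz sample rate and the lack of overlap are illustrative assumptions.

def frame_signal(samples, sample_rate=16000, frame_ms=20):
    """Split samples into consecutive full frames of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

frames = frame_signal(list(range(1000)))   # 1000 samples -> 3 full frames of 320
```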
Step S102: perform linear prediction analysis on the original speech signal to determine the corresponding original excitation and first filter.
Linear prediction analysis (LPA) is a technique for analyzing speech signals: the signal is regarded as the output of a model and is described by the model parameters. In this embodiment, the linear prediction analysis of the original speech signal is mainly realized through a linear prediction error filter.
Specifically, step S102 may comprise: determining the original excitation and the first filter corresponding to the original speech signal using the original speech signal and the linear prediction error filter.
Step S103: adjust at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter.
When sound passes through a resonant cavity, the filtering action of the cavity redistributes the energy at different frequencies: the parts reinforced by the resonance are strengthened, while the other parts are attenuated. Because the resulting energy distribution is uneven, with the strong parts rising like mountain peaks, these peaks are called formants; formants are closely related to the timbre of speech. In this embodiment, adjusting at least one of the pole angle information and the pole amplitude information of the first filter moves the pole locations of the first filter, so that, after the original excitation passes through the adjusted first filter, the formants of the original speech are modified and a voice-change effect is achieved.
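The adjustment described here can be sketched directly in polar form: each pole r·e^{jθ} of the first filter has its angle θ shifted (moving the corresponding formant frequency) and/or its radius r scaled (changing the formant sharpness). The shift and scale values below are illustrative, not values from this application.

```python
import cmath

def adjust_poles(poles, angle_shift=0.0, radius_scale=1.0):
    """Shift each pole's angle and scale its radius (magnitude).

    Note: conjugate pole pairs must be adjusted symmetrically in a real
    system so that the adjusted filter still has real coefficients.
    """
    out = []
    for p in poles:
        r, theta = abs(p), cmath.phase(p)
        out.append((r * radius_scale) * cmath.exp(1j * (theta + angle_shift)))
    return out

# Example: one pole at radius 0.9, angle 0.5 rad.
adjusted = adjust_poles([0.9 * cmath.exp(0.5j)], angle_shift=0.1, radius_scale=0.95)
```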
Specific ways of adjusting at least one of the pole angle information and the pole amplitude information of the first filter are described in the embodiments below and are not repeated here.
Step S104: determine the target speech signal based on the original excitation and the adjusted first filter.
In this embodiment, the target speech signal is obtained by filtering the original excitation with the adjusted first filter.
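As a sketch of this synthesis step: the adjusted first filter is an all-pole filter, so filtering the original excitation reduces to the difference equation out(n) = e(n) + Σ α'_i·out(n-i), where α'_i are the adjusted coefficients. The order-1 example values below are illustrative assumptions.

```python
def synthesize(excitation, alphas):
    """All-pole filtering: out[n] = excitation[n] + sum_i alphas[i-1]*out[n-i]."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for i in range(1, len(alphas) + 1):
            if n - i >= 0:
                y += alphas[i - 1] * out[n - i]
        out.append(y)
    return out

# Unit impulse through a first-order all-pole filter with alpha_1 = 0.5:
response = synthesize([1.0, 0.0, 0.0, 0.0], [0.5])
```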
This embodiment provides an audio signal processing method. Compared with the prior art, the embodiment performs linear prediction analysis on the original speech signal to determine its original excitation and first filter, adjusts at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter, and then determines the target speech signal using the original excitation and the adjusted first filter. In this way, at least one of the formant frequency and the formant sharpness of the original speech signal is adjusted, so that the voice input by the user can be changed and the user experience improved.
In a possible implementation of this embodiment, step S102 may comprise: performing linear prediction analysis on the original speech signal to determine its prediction error information, and determining the original excitation and the first filter based on the prediction error information. Here, the linear prediction analysis is carried out on the original speech signal using the linear prediction error filter.
The transfer function of the linear prediction error filter can be expressed by the following formula (1-1):

A(z) = 1 - Σ_{i=1}^{p} α_i · z^{-i}    (1-1)

where A(z) denotes the transfer function of the linear prediction error filter, z is a complex variable, p denotes the order of the linear prediction error filter, α_i denotes the i-th coefficient of the linear prediction error filter, i indexes the coefficients, and "Σ" is the summation symbol.
The original speech signal is input to the linear prediction error filter to obtain the prediction error information of the original speech signal; that is, the original speech signal s(n) is filtered by the filter of formula (1-1) to obtain the prediction error information.
The prediction error information can be expressed by the following formula (1-2):

e(n) = s(n) - Σ_{i=1}^{p} α_i · s(n-i)    (1-2)

where e(n) denotes the prediction error information, n denotes the time index in units of the sampling period, s(n) denotes the original speech signal at the n-th sampling period, p denotes the order of the linear prediction error filter, α_i denotes its i-th coefficient, and s(n-i) denotes the original speech signal i sampling periods before the n-th. Letting ŝ(n) = Σ_{i=1}^{p} α_i · s(n-i), ŝ(n) is the predicted value of s(n), where "^" marks a predicted value and "Σ" is the summation symbol.
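Formula (1-2) can be sketched directly in code: the prediction error at each sample is the sample minus the weighted sum of its p predecessors. Treating samples before the start of the frame as 0 is an assumption for illustration.

```python
def prediction_error(s, alphas):
    """e(n) = s(n) - sum_{i=1}^{p} alphas[i-1] * s(n-i); s(n) = 0 for n < 0."""
    p = len(alphas)
    return [s[n] - sum(alphas[i - 1] * s[n - i]
                       for i in range(1, p + 1) if n - i >= 0)
            for n in range(len(s))]

# A first-order predictor with alpha_1 = 1 leaves only the sample differences:
errors = prediction_error([1.0, 2.0, 3.0, 4.0], [1.0])
```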
Based on the prediction error information of the original speech signal, the original excitation and the first filter are determined as follows.
In another possible implementation of this embodiment, determining the original excitation and the first filter based on the prediction error information may comprise: determining a second filter based on the prediction error information; determining the original excitation based on the original speech signal and the second filter; and determining the first filter based on the second filter. Here, the second filter is the filter corresponding to the linear prediction analysis.
Specifically, the prediction error information expressed by formula (1-2) is processed to calculate the coefficients {α_i}, i = 1, 2, …, p, of the linear prediction error filter; the second filter is then determined from these coefficients, after which the original excitation is determined from the original speech signal and the second filter, and the first filter is determined from the second filter.
How formula (1-2) is processed to obtain the coefficients {α_i}, i = 1, 2, …, p, is as follows:
Based on formula (1-2), the coefficients of the linear prediction error filter are calculated by minimizing the prediction error information e(n) under some criterion. In this embodiment, the criterion may be minimizing the mean square error E[e²(n)] of the prediction error information, which can be expressed by the following formula (1-3):

E[e²(n)] = E[(s(n) - Σ_{i=1}^{p} α_i · s(n-i))²]    (1-3)

where E[e²(n)] denotes the mean square error of the prediction error information, i.e., the mathematical expectation of the squared prediction error, "E" is the expectation symbol, e(n) denotes the prediction error information, n denotes the time index in units of the sampling period, s(n) denotes the original speech signal at the n-th sampling period, p denotes the order of the linear prediction error filter, α_i denotes its i-th coefficient, s(n-i) denotes the original speech signal i sampling periods before the n-th, and "Σ" is the summation symbol.
On the one hand, taking the derivative of formula (1-3) with respect to each coefficient and setting the result to 0, i.e., letting ∂E[e²(n)]/∂α_j = 0, gives the following formula (1-4):

E[(s(n) - Σ_{i=1}^{p} α_i · s(n-i)) · s(n-j)] = 0,  j = 1, 2, …, p    (1-4)

where "∂/∂α_j" is the partial-derivative symbol, "E" is the expectation symbol, α_j denotes the j-th coefficient of the linear prediction error filter, p denotes the order of the linear prediction error filter, s(n-j) denotes the original speech signal j sampling periods before the n-th, and "Σ" is the summation symbol.
Substituting formula (1-2) into formula (1-4) gives the following formula (1-5):

    Σ_{i=1…p} αi·r(j−i) = r(j), j = 1, 2, …, p   (1-5)

where "E" is the expectation operator; s(n) denotes the original speech signal at the n-th sampling period; n denotes the time index in units of the sampling period; s(n−j) denotes the original speech signal at the j-th sampling period before the n-th sampling period; n−j denotes the j-th sampling period before the n-th sampling period; p denotes the order of the linear prediction error filter; αi denotes the i-th coefficient of the linear prediction error filter; s(n−i) denotes the original speech signal at the i-th sampling period before the n-th sampling period; n−i denotes the i-th sampling period before the n-th sampling period; r(j) = E[s(n)·s(n−j)] denotes the j-th value of the autocorrelation function of s(n); r(j−i) denotes the (j−i)-th value of the autocorrelation function of s(n); and "Σ" is the summation operator.

Formula (1-5) is a system of p linear equations, known as the Yule-Walker equations.
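As a minimal sketch (variable names and the synthetic data are illustrative, not taken from the patent), the Yule-Walker equations of formula (1-5) can be solved directly for a synthetic AR(2) signal whose true coefficients are known, to check that they are recovered:

```python
import numpy as np

# Synthesize s(n) = 0.75*s(n-1) - 0.5*s(n-2) + w(n) and recover the
# predictor coefficients from the Yule-Walker system (1-5).
rng = np.random.default_rng(0)
a_true = np.array([0.75, -0.5])
w = rng.standard_normal(50000)

s = np.zeros_like(w)
for n in range(2, len(s)):
    s[n] = a_true[0] * s[n - 1] + a_true[1] * s[n - 2] + w[n]

p = 2
# Time-average autocorrelation estimate r(j)
r = np.array([np.dot(s[: len(s) - j], s[j:]) for j in range(p + 1)])

# Toeplitz system R * alpha = [r(1), ..., r(p)]
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
alpha = np.linalg.solve(R, r[1 : p + 1])
print(np.round(alpha, 2))        # close to [0.75, -0.5]
```

A direct solve like this costs O(p³); the Levinson-Durbin recursion described later exploits the Toeplitz structure to do better.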
For the embodiment of the present application, on the other hand, the minimum of the mean square error of the prediction error information, i.e. the minimum value of E[e²(n)], is computed. In the embodiment of the present application, this minimum can be expressed by the following formula (1-6):

    Ep = min E[e²(n)] = r(0) − Σ_{i=1…p} αi·r(i)   (1-6)

where Ep denotes the minimum value of the mean square error of the prediction error information; E[e²(n)] denotes the mean square error of the prediction error information, i.e. its mathematical expectation; "E" is the expectation operator; e(n) denotes the prediction error information; e²(n) denotes its square; n denotes the time index in units of the sampling period; s(n) denotes the original speech signal at the n-th sampling period; p denotes the order of the linear prediction error filter; αi denotes the i-th coefficient of the linear prediction error filter; s(n−i) denotes the original speech signal at the i-th sampling period before the n-th sampling period; n−i denotes the i-th sampling period before the n-th sampling period; r(0) denotes the 0-th value of the autocorrelation function of s(n); r(i) denotes its i-th value; and "Σ" is the summation operator.
For the embodiment of the present application, on the one hand, formula (1-3) is differentiated and the derivative set to 0 to obtain formula (1-4), and formula (1-2) is substituted into formula (1-4) to obtain formula (1-5); on the other hand, minimizing E[e²(n)] in formula (1-3) gives formula (1-6). In the embodiment of the present application, the solution of the linear prediction analysis is obtained based on formulas (1-5) and (1-6) and can be expressed by the following formula (1-7):

    Σ_{i=1…p} αi·r(j−i) = r(j), j = 1, 2, …, p
    Ep = r(0) − Σ_{i=1…p} αi·r(i)   (1-7)

where r(j) denotes the j-th value of the autocorrelation function of s(n); p denotes the order of the linear prediction error filter; αi denotes the i-th coefficient of the linear prediction error filter; r(j−i) denotes the (j−i)-th value of the autocorrelation function of s(n); r(0) denotes its 0-th value; r(i) denotes its i-th value; Ep denotes the minimum value of the mean square error of the prediction error information; and "Σ" is the summation operator.
For the embodiment of the present application, solving formula (1-7) yields the coefficients {αi}, i = 1, 2, …, p, of the linear prediction error filter. In the embodiment of the present application, the key to solving formula (1-7) is evaluating r(j), which in principle involves an ensemble average. The signal considered here is a speech signal, which can generally be regarded as short-term stationary; that is, over a short interval the random signal corresponding to the speech signal is treated as a stationary, ergodic random signal, so the ensemble average equals the time average. The estimate of r(j) can therefore use the time average

    r(j) = lim_{N→∞} (1/N)·Σ_n s(n)·s(n−j).

Since the normalization factor 1/N does not affect the coefficients αi obtained from formula (1-7), it can be dropped; and since n cannot actually run to infinity, a suitably large value N is fixed in advance. Specifically:
Assuming the original speech signal s(n) is 0 outside the range 0 ≤ n ≤ N, the estimate of r(j) can be expressed by the following formula (1-8):

    r(j) = Σ_{n=j…N} s(n)·s(n−j), 0 ≤ j ≤ p   (1-8)

where r(j) denotes the j-th value of the autocorrelation function of s(n); p denotes the order of the linear prediction error filter; n denotes the time index in units of the sampling period; N is a preset value; s(n) denotes the original speech signal at the n-th sampling period; s(n−j) denotes the original speech signal at the j-th sampling period before the n-th sampling period; and "Σ" is the summation operator.
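A small sketch of the estimate in formula (1-8) (the helper name is mine): with s(n) taken as 0 outside 0 ≤ n ≤ N, the windowed sum can be computed as a dot product, and the estimate keeps the even symmetry r(j) = r(−j) that the next step relies on:

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal(256)        # stand-in for a short speech frame

def r_hat(s, j):
    """Estimate r(j) = sum_{n=j}^{N} s(n) s(n-j), formula (1-8)."""
    j = abs(j)                      # even symmetry: r(j) = r(-j)
    return float(np.dot(s[j:], s[: len(s) - j]))

print(r_hat(s, 3) == r_hat(s, -3))  # True: even-function property
```

Here r(0) is simply the energy of the frame, which is why it dominates the diagonal of the matrix in formula (1-9).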
For the embodiment of the present application, the estimate in formula (1-8) retains the even-function property r(j) = r(−j). Using this property, formula (1-7) can be rewritten as the following formula (1-9):

    [ r(0)    r(1)    …  r(p−1) ] [ α1 ]   [ r(1) ]
    [ r(1)    r(0)    …  r(p−2) ] [ α2 ] = [ r(2) ]
    [  ⋮        ⋮      ⋱    ⋮    ] [  ⋮ ]   [  ⋮   ]
    [ r(p−1)  r(p−2)  …  r(0)   ] [ αp ]   [ r(p) ]   (1-9)

where r(0), r(1), r(2), …, r(p−2), r(p−1), r(p) denote the corresponding values of the autocorrelation function of s(n); α1, α2, …, αp−1, αp denote the 1st, 2nd, …, (p−1)-th and p-th coefficients of the linear prediction error filter; and Ep, the minimum value of the mean square error of the prediction error information, follows from formula (1-6) once the coefficients are known.
For the embodiment of the present application, the coefficient matrix of formula (1-9) is a Toeplitz matrix, so formula (1-9) can be solved with the Levinson-Durbin algorithm to obtain the coefficients {αi}, i = 1, 2, …, p, of the linear prediction error filter.
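A sketch of the Levinson-Durbin recursion named above (variable names are mine): it exploits the Toeplitz structure of formula (1-9) to solve for the predictor coefficients in O(p²) operations instead of the O(p³) of a general solve, updating the minimum prediction error Ep as it goes:

```python
import numpy as np

def levinson_durbin(r, p):
    """r: autocorrelation values r[0..p]; returns ({alpha_i}, E_p)."""
    a = np.zeros(p)
    E = float(r[0])
    for m in range(p):
        # reflection coefficient for order m+1
        k = (r[m + 1] - np.dot(a[:m], r[m:0:-1])) / E
        new_a = a.copy()
        new_a[m] = k
        if m > 0:                    # fold in previous-order coefficients
            new_a[:m] = a[:m] - k * a[m - 1::-1]
        a = new_a
        E *= 1.0 - k * k             # E_{m+1} = (1 - k^2) * E_m
    return a, E

# r(j) = 0.5^j corresponds to an AR(1) process with coefficient 0.5
r = np.array([1.0, 0.5, 0.25, 0.125])
a, E = levinson_durbin(r, 3)
print(a, E)                          # [0.5, 0, 0] and E_p = 0.75
```

Because r(j) here comes from a first-order process, the recursion correctly drives the higher-order coefficients to zero, and the result matches a direct Toeplitz solve.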
In the embodiment of the present application, the obtained coefficients {αi}, i = 1, 2, …, p, of the linear prediction error filter are substituted into formula (1-1) to determine the second filter, which can be expressed by formula (1-1).

For the embodiment of the present application, the original speech signal is s(n); s(n) is input into the determined second filter for whitening, yielding the original excitation corresponding to the original speech signal, and the original excitation can be expressed by formula (1-2).
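A sketch of this whitening step (identifiers are mine): the second filter A(z) = 1 − Σ αi·z^(−i) turns s(n) into the original excitation e(n) = s(n) − Σ αi·s(n−i), and running e(n) through the inverse H(z) = 1/A(z) reconstructs s(n) exactly:

```python
import numpy as np

alpha = np.array([1.2, -0.6])       # example predictor coefficients
p = len(alpha)

rng = np.random.default_rng(2)
s = rng.standard_normal(1000)       # stand-in for a speech frame

# Analysis (whitening): e(n) = s(n) - sum_i alpha_i * s(n-i)
e = s.copy()
for i in range(1, p + 1):
    e[i:] -= alpha[i - 1] * s[:-i]

# Synthesis: s(n) = e(n) + sum_i alpha_i * s(n-i), i.e. H(z) = 1/A(z)
s_rec = np.zeros_like(s)
for n in range(len(s)):
    s_rec[n] = e[n] + sum(alpha[i - 1] * s_rec[n - i]
                          for i in range(1, p + 1) if n >= i)

print(np.allclose(s_rec, s))        # True: analysis/synthesis round-trip
```

This round-trip is what makes the scheme work: the voice change below alters only the synthesis filter, while the excitation carries the rest of the signal.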
For the embodiment of the present application, the characterization formula of the first filter is the inverse of the characterization formula of the second filter. In the embodiment of the present application, the first filter can be expressed by the following formula (1-10):

    H(z) = 1 / (1 − Σ_{i=1…p} αi·z^(−i))   (1-10)

where H(z) denotes the first filter; z is a complex variable; p denotes the order of the linear prediction error filter; αi denotes the i-th coefficient of the linear prediction error filter; i denotes the index of the coefficient; and "Σ" is the summation operator.
In another possible implementation of the embodiment of the present application, before step S103 the method may further include: determining the poles of the first filter, and determining the pole angle information and the pole amplitude information of the first filter based on those poles.

Specifically, the first filter can be expressed by formula (1-10); computing on formula (1-10) yields the poles of the first filter, where each pole carries pole angle information and pole amplitude information. In the embodiment of the present application, all roots of the equation obtained by setting the denominator of formula (1-10) to 0 are solved using the QR decomposition method, so as to obtain all poles of the first filter and hence the pole angle value and pole amplitude value of each pole.

The way all poles of the first filter, and hence their pole angle values and pole amplitude values, are obtained from formula (1-10) is as follows:
For the embodiment of the present application, using the QR decomposition method to solve for all roots of the denominator of formula (1-10) may specifically include:

First, the denominator of formula (1-10) is converted into a monic polynomial of degree n and set equal to 0, which can be expressed by the following formula (1-11):

    Qn(x) = xⁿ + b_{n−1}·x^{n−1} + … + b1·x + b0 = 0   (1-11)

where Qn(x) denotes the degree-n polynomial in x; n denotes the degree in the variable x; x denotes the variable; and b0, b1, …, b_{n−1} denote the coefficients of the equation Qn(x).
For the embodiment of the present application, formula (1-11) can be regarded as the characteristic equation of a certain real matrix, so solving for all roots of formula (1-11) can be converted into solving for all eigenvalues of that real matrix. In the embodiment of the present application, rewriting formula (1-11) gives the real matrix shown in the following formula (1-12):

        [ 0  0  …  0  −b0     ]
    B = [ 1  0  …  0  −b1     ]   (1-12)
        [ ⋮  ⋱       ⋮        ]
        [ 0  …  1     −b_{n−1} ]

where B denotes the real matrix and b0, b1, …, b_{n−1} are the elements of the real matrix B.

For the embodiment of the present application, formula (1-12) is an upper Hessenberg matrix, so all eigenvalues of the real matrix B can be found directly by the QR decomposition method; the details are not elaborated here. In the embodiment of the present application, the eigenvalues of the real matrix B are exactly the poles of the first filter, and each pole carries a pole angle value and a pole amplitude value.
For example, pole 1 can be expressed by the following formula (1-13):

    z1 = r1·e^(jω1)   (1-13)

where z1 denotes pole 1, r1 denotes the pole amplitude information corresponding to pole 1, and ω1 denotes the pole angle information corresponding to pole 1.
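A sketch relating poles to formants (the resonator values are illustrative, not patent data): the magnitude response of H(z) on the unit circle z = e^(jω) peaks near the angle of each pole, and the closer the radius r is to 1, the sharper the peak. These peaks are the formants that the adjustments described below move:

```python
import numpy as np

r0, w0 = 0.95, 1.0                  # pole radius and angle (radians)
a1, a2 = 2 * r0 * np.cos(w0), -r0 * r0   # conjugate pole pair at r0*e^{±j*w0}

omega = np.linspace(0.0, np.pi, 1024)
z = np.exp(1j * omega)
A = 1 - a1 * z**-1 - a2 * z**-2     # denominator of H(z) on the unit circle
H_mag = 1.0 / np.abs(A)

peak = omega[np.argmax(H_mag)]
print(abs(peak - w0) < 0.05)        # True: the formant sits near the pole angle
```

Shifting the pole angle therefore moves the formant frequency, and scaling the pole radius changes the formant sharpness.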
In another possible implementation of the embodiment of the present application, S103 may specifically include: if the pole angle information of the first filter meets a preset condition, adjusting at least one of the pole angle information and the pole amplitude information of the first filter in a predetermined manner.

For the embodiment of the present application, the pole angle information of the first filter may specifically include the pole angle value of at least one pole, and the pole amplitude information of the first filter may specifically include the pole amplitude value of at least one pole.

Further, the pole angle information of the first filter meeting the preset condition may include: checking, for each pole of the first filter, whether its pole angle value meets the preset condition. In the embodiment of the present application, for every pole that meets the preset condition, at least one of its pole angle value and its pole amplitude value is adjusted in the predetermined manner.
For the embodiment of the present application, when the pole angle value of any pole belongs to [−a, a], at least one of the pole angle information and the pole amplitude information of the first filter may be adjusted in the predetermined manner.

In the embodiment of the present application, when the pole angle value of any pole belongs to [−a, a], only the pole angle value of that pole may be adjusted, without adjusting its pole amplitude value.

Taking the first pole z1 = r1·e^(jω1) as an example, only ω1 is adjusted while r1 remains unchanged.
In one possible implementation of the embodiment of the present application, adjusting the pole angle information of the first filter in the predetermined manner includes at least one of: increasing the pole angle value of at least one pole by a first preset threshold, and decreasing the pole angle value of at least one pole by a second preset threshold.

For the embodiment of the present application, the first preset threshold and the second preset threshold may be the same or different; this is not limited in the embodiment of the present application.
For the embodiment of the present application, when the first filter has at least two poles, the pole angle value of each pole whose angle belongs to (0, a] is increased by the first preset threshold, and the pole angle value of each pole whose angle belongs to [−a, 0) is decreased by the second preset threshold.

For the embodiment of the present application, when the first filter has one pole, if its pole angle value belongs to (0, a], the pole angle value is increased by the first preset threshold; if its pole angle value belongs to [−a, 0), the pole angle value is decreased by the second preset threshold.

Taking the first pole z1 = r1·e^(jω1) as an example: when ω1 belongs to (0, 3], ω1 is increased by X; when ω1 belongs to [−3, 0), ω1 is decreased by X; r1 may remain unchanged, where X ∈ [0.07, 0.11].
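A sketch of this angle rule, with a = 3 and the shift X fixed at 0.09, one value inside the stated range [0.07, 0.11] (the function name is mine): angles in (0, 3] move up by X, angles in [−3, 0) move down by X, and anything else is left alone:

```python
def adjust_angle(w, X=0.09, a=3.0):
    """Shift a pole angle w (radians) per the sign-dependent rule."""
    if 0 < w <= a:
        return w + X
    if -a <= w < 0:
        return w - X
    return w                        # w == 0 or |w| > a: unchanged

print(adjust_angle(1.0) > 1.0)      # True: shifted up by X
print(adjust_angle(3.1) == 3.1)     # True: outside (0, 3], untouched
```

Shifting positive and negative angles symmetrically keeps conjugate pole pairs conjugate, so the adjusted filter still has real coefficients.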
In the embodiment of the present application, when the pole angle value of any pole belongs to [−a, a], both the pole angle value and the pole amplitude value of that pole may also be adjusted simultaneously. The adjustment of the pole angle value is as described above and is not repeated here.
In another possible implementation of the embodiment of the present application, adjusting the pole amplitude information of the first filter in the preset manner includes: adjusting the pole amplitude value of the at least one pole by a preset multiple.

For the embodiment of the present application, for every pole whose pole angle value belongs to [−a, a], the corresponding pole amplitude value is scaled to Y times its value.

Taking the first pole z1 = r1·e^(jω1) as an example: when ω1 ∈ [−3, 3], r1 is scaled to Y times its value, where Y ∈ [0.8, 1.2].
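A sketch of this amplitude rule, with a = 3 and the multiple Y fixed at 1.1, one value inside the stated range [0.8, 1.2] (the helper name is mine): a pole z = r·e^(jω) whose angle lies in [−3, 3] has its radius scaled by Y while its angle is kept:

```python
import numpy as np

def scale_amplitude(pole, Y=1.1, a=3.0):
    """Scale the radius of a complex pole if its angle is in [-a, a]."""
    r, w = np.abs(pole), np.angle(pole)
    if -a <= w <= a:
        r *= Y
    return r * np.exp(1j * w)       # rebuild z = r * e^{j*w}

z = 0.9 * np.exp(1j * 1.2)          # r1 = 0.9, omega1 = 1.2
z2 = scale_amplitude(z)
print(np.isclose(np.abs(z2), 0.99))     # True: radius scaled by 1.1
print(np.isclose(np.angle(z2), 1.2))    # True: angle unchanged
```

Pulling a pole toward the unit circle (Y > 1) sharpens the corresponding formant; pushing it inward (Y < 1) flattens it.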
In the embodiment of the present application, when the pole angle value of any pole belongs to [−a, a], only the pole amplitude value of that pole may be adjusted, without adjusting its pole angle value. The adjustment of the pole amplitude value is as described above and is not repeated here.
For the embodiment of the present application, adjusting at least one of the pole angle information and the pole amplitude information of the first filter amounts to adjusting at least one of the formant frequency and the formant sharpness of the original speech signal.
For example, if the target speech signal corresponds to a cold voice, the spectrogram of the original speech signal is as shown in Fig. 4, and the spectrogram after the voice is changed to a cold voice is as shown in Fig. 5. In Fig. 4 and Fig. 5, the horizontal axis represents time and the vertical axis represents frequency; the brightness of a point in the spectrogram indicates the amplitude of the corresponding frequency component: the brighter the point, the larger the amplitude, and the darker the point, the smaller the amplitude. When the amplitude of a point's frequency component exceeds that of the surrounding points, that point is a formant. In the embodiment of the present application, the formants in Fig. 4 concentrate between frequencies 0 and 1000 (region 1), while the formants in Fig. 5 concentrate between frequencies 1000 and 3000 (region 2); the comparison of Fig. 4 and Fig. 5 therefore shows that the formant frequency of the adjusted speech signal shifts upward (increases). The brightness contrast in Fig. 4 and Fig. 5 characterizes the formant sharpness: in Fig. 4 the larger brightness contrast indicates sharper formants (larger formant sharpness), whereas in Fig. 5 the smaller brightness contrast indicates flatter formants (smaller formant sharpness); the comparison of Fig. 4 and Fig. 5 therefore shows that the formant sharpness of the adjusted speech signal decreases.
For the embodiment of the present application, by adjusting at least one of the pole angle information and the pole amplitude information of the first filter, at least one of the formant frequency and the formant sharpness of the original speech signal is adjusted, so that the target speech obtained after the above adjustment differs from the original speech. A voice-change effect is thereby achieved, which can further improve the user experience.
In another possible implementation of the embodiment of the present application, before S101 the method may further include: obtaining the speech signal input by the user; denoising the speech signal input by the user, and taking the denoised speech signal as the original speech signal.

For the embodiment of the present application, the speech signal input by the user may include a speech signal input in real time, or a speech signal stored locally; this is not limited in the embodiment of the present application.

For the embodiment of the present application, in practical applications, the obtained speech signal input by the user may be denoised and the denoised speech signal used as the original speech signal; alternatively, the speech signal input by the user may be used as the original speech signal without denoising. This is not limited in the embodiment of the present application.

For the embodiment of the present application, the speech signal input by the user may be denoised in any of several feasible ways, for example by feeding it into a trained noise-separation neural network model.
For the embodiment of the present application, the above embodiments may be executed by a terminal device, by a server, or partly by a terminal device and partly by a server; this is not limited in the embodiment of the present application.

The above embodiments describe in detail how the original speech signal is processed to obtain the target speech signal (the voice-changed speech signal). The following specific application scenario, in which the original speech signal is changed into a cold-voice signal, introduces a concrete implementation of the application, as follows:

If the speech input by the user is to be changed into a cold voice, the original speech signal is obtained (the original speech signal may be the speech signal obtained after denoising the user's input speech, or the speech signal corresponding to the user's input speech directly); the original speech signal then undergoes voice-change processing to obtain the voice-changed speech signal; the voice-changed speech signal is encoded and sent over the Internet to the peer; and the peer decodes the received information and plays it, i.e. the played speech is a cold voice, as shown in Fig. 3.
For the embodiment of the present application, the user may trigger any one of the target objects displayed in the operation interface ("crooked-fruit person", "cold", "tired beast" and "Internet celebrity female"), as shown in Fig. 2; if the target speech signal is the speech signal corresponding to a cold voice, the user may trigger the "cold" object in the operation interface shown in Fig. 2.

For the embodiment of the present application, subjecting the original speech signal to voice-change processing includes: determining, from the original speech signal, its corresponding original excitation and first filter; increasing by X (X ∈ [0.07, 0.11]) the pole angle value of every pole of the first filter whose pole angle value ∈ (0, 3], and decreasing by X (X ∈ [0.07, 0.11]) the pole angle value of every pole whose pole angle value ∈ [−3, 0); and/or scaling the pole amplitude value of every pole whose pole angle value ∈ [−3, 3] to 0.8–1.2 times its value, thereby obtaining the adjusted first filter; the original excitation is then passed through the adjusted first filter to perform the voice change.
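An end-to-end sketch of the voice-change flow just described, using my own helper names, X = 0.09 and Y = 1.1 chosen from the stated ranges, and a radius clamp at 0.98 (my addition, to keep the adjusted filter stable). White noise stands in for a speech frame; no audio I/O is shown:

```python
import numpy as np

def lpc(s, p):
    """Predictor coefficients via the Yule-Walker / Toeplitz solve."""
    r = np.array([np.dot(s[: len(s) - j], s[j:]) for j in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1 : p + 1])

def residual(s, a):                 # whitening by A(z): the excitation
    e = s.copy()
    for i in range(1, len(a) + 1):
        e[i:] -= a[i - 1] * s[:-i]
    return e

def synth(e, a):                    # filtering by H(z) = 1/A(z)
    s = np.zeros_like(e)
    for n in range(len(e)):
        s[n] = e[n] + sum(a[i - 1] * s[n - i]
                          for i in range(1, len(a) + 1) if n >= i)
    return s

rng = np.random.default_rng(3)
s = rng.standard_normal(400)
a = lpc(s, 8)
e = residual(s, a)                  # original excitation

# Adjust poles: angles in (0, 3] up by X, in [-3, 0) down by X;
# radii of poles with angle in [-3, 3] scaled by Y (then clamped).
poles = np.roots(np.concatenate(([1.0], -a)))
w, r = np.angle(poles), np.abs(poles)
w = w + 0.09 * ((0 < w) & (w <= 3)) - 0.09 * ((-3 <= w) & (w < 0))
r = np.where(np.abs(np.angle(poles)) <= 3,
             np.minimum(r * 1.1, 0.98), r)
new_a = -np.real(np.poly(r * np.exp(1j * w)))[1:]  # rebuilt predictor

s_changed = synth(e, new_a)         # voice-changed frame
print(s_changed.shape == s.shape)   # True
```

In a real system this would run per short frame with overlap, and the output would be encoded and transmitted as described above; the frame-level pipeline is the same.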
The audio signal processing method has been described above from the perspective of method steps; the speech signal processing apparatus is introduced below from the perspective of virtual modules or virtual units, as follows:

The embodiment of the present application provides a speech signal processing apparatus. As shown in Fig. 6, the speech signal processing apparatus 60 includes a first obtaining module 601, a first determining module 602, an adjusting module 603 and a second determining module 604, where:

the first obtaining module 601 is configured to obtain an original speech signal;

the first determining module 602 is configured to perform linear prediction analysis on the original speech signal to determine the original excitation and the first filter corresponding to the original speech signal;

the adjusting module 603 is configured to adjust at least one of the pole angle information and the pole amplitude information of the first filter to obtain an adjusted first filter; and

the second determining module 604 is configured to determine a target speech signal based on the original excitation corresponding to the original speech signal and the adjusted first filter.
In one possible implementation of the embodiment of the present application, the first determining module 602 may specifically include a first determining unit and a second determining unit, where:

the first determining unit is configured to perform linear prediction analysis on the original speech signal to determine the prediction error information corresponding to the original speech signal; and

the second determining unit is configured to determine, based on the prediction error information corresponding to the original speech signal, the original excitation and the first filter corresponding to the original speech signal.

In another possible implementation of the embodiment of the present application, the second determining unit is specifically configured to determine a second filter based on the prediction error information corresponding to the original speech signal, the second filter being the filter corresponding to the linear prediction analysis; and is specifically further configured to determine the original excitation corresponding to the original speech signal based on the original speech signal and the second filter, and to determine the first filter based on the second filter.
In another possible implementation of the embodiment of the present application, the adjusting module 603 is specifically configured to, when the pole angle information of the first filter meets the preset condition, adjust at least one of the pole angle information and the pole amplitude information of the first filter in the predetermined manner.

In another possible implementation of the embodiment of the present application, the pole angle information of the first filter includes the pole angle value of at least one pole, and the adjusting module 603 includes at least one of an increasing unit and a decreasing unit, where:

the increasing unit is configured to increase the pole angle value of the at least one pole by the first preset threshold; and

the decreasing unit is configured to decrease the pole angle value of the at least one pole by the second preset threshold.

In another possible implementation of the embodiment of the present application, the pole amplitude information of the first filter includes the pole amplitude value of at least one pole, and the adjusting module 603 is specifically configured to adjust the pole amplitude value of the at least one pole by the preset multiple.
In another possible implementation of the embodiment of the present application, the speech signal processing apparatus 60 further includes a second obtaining module and a denoising module, where:

the second obtaining module is configured to obtain the speech signal input by the user; and

the denoising module is configured to denoise the speech signal input by the user and take the denoised speech signal as the original speech signal.

For the embodiment of the present application, the first obtaining module and the second obtaining module may be the same obtaining module or two different obtaining modules; this is not limited in the embodiment of the present application.

The speech signal processing apparatus provided by the embodiment of the present application can perform the operations corresponding to the audio signal processing method shown in the foregoing method embodiments; the implementation principle is similar and is not repeated here.
The present application provides a speech signal processing apparatus. Compared with the prior art, the application performs linear prediction analysis on the original speech signal to determine the corresponding original excitation and first filter; adjusts at least one of the pole angle information and the pole amplitude information of the first filter to obtain the adjusted first filter; and then determines the target speech signal using the original excitation corresponding to the original speech signal and the adjusted first filter. In this way, at least one of the formant frequency and the formant sharpness of the original speech signal can be adjusted to obtain the target speech signal, so that the speech input by the user can be voice-changed, which in turn can improve the user experience.
The speech signal processing apparatus of the present application has been introduced above from the perspective of virtual modules or virtual units; an electronic device is introduced below from the perspective of a physical apparatus. The electronic device in the embodiment of the present application may be a terminal device or a server; this is not limited in the embodiment of the present application.

The embodiment of the present application provides an electronic device. As shown in Fig. 7, the electronic device 4000 includes a processor 4001 and a memory 4003, where the processor 4001 is connected to the memory 4003, for example by a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004. It should be noted that in practical applications the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiment of the present application.

The processor 4001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure. The processor 4001 may also be a combination that realizes a computing function, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 4002 may include a path for transferring information between the above components. The bus 4002 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in Fig. 7, but this does not mean that there is only one bus or one type of bus.

The memory 4003 may be a ROM or another type of static storage device capable of storing static information and instructions, a RAM or another type of dynamic storage device capable of storing information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.

The memory 4003 is used to store the application program code for executing the solution of the present application, and execution is controlled by the processor 4001. The processor 4001 is configured to execute the application program code stored in the memory 4003 to realize the content shown in any of the foregoing method embodiments.
The embodiment of the present application provides an electronic device including: one or more processors; a memory; and one or more application programs stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the operations corresponding to the audio signal processing method shown in the foregoing method embodiments or any possible implementation thereof. Compared with the prior art, the following can be achieved: the application performs linear prediction analysis on the original speech signal to determine the corresponding original excitation and first filter; adjusts at least one of the pole angle information and the pole amplitude information of the first filter to obtain the adjusted first filter; and then determines the target speech signal using the original excitation corresponding to the original speech signal and the adjusted first filter, so that at least one of the formant frequency and the formant sharpness of the original speech signal can be adjusted to obtain the target speech signal. The speech input by the user can thus be voice-changed, which in turn can improve the user experience.
The electronic device for speech signal processing has been introduced above from the perspective of a physical apparatus; a computer-readable storage medium is introduced below from the perspective of a storage medium.

The embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the audio signal processing method shown in the foregoing method embodiments or any possible implementation thereof is realized. Compared with the prior art, linear prediction analysis is performed on the original speech signal to determine the corresponding original excitation and first filter; at least one of the pole angle information and the pole amplitude information of the first filter is adjusted to obtain the adjusted first filter; and the target speech signal is then determined using the original excitation corresponding to the original speech signal and the adjusted first filter. At least one of the formant frequency and the formant sharpness of the original speech signal can thus be adjusted to obtain the target speech signal, so that the speech input by the user can be voice-changed, which in turn can improve the user experience.
It should be understood that although the steps in the flowcharts of the drawings are shown sequentially as indicated by the arrows, these steps are not necessarily executed in that order. Unless expressly stated otherwise herein, there is no strict ordering constraint on their execution, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is likewise not necessarily sequential, and they may be executed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.
The above are only some embodiments of the present application. It should be noted that, for those of ordinary skill in the art, several improvements and modifications can be made without departing from the principles of the present application, and these improvements and modifications shall also be regarded as falling within the protection scope of the present application.
Claims (10)
1. An audio signal processing method, characterized by comprising:
obtaining an original speech signal;
performing linear prediction analysis on the original speech signal to determine an original excitation and a first filter corresponding to the original speech signal;
adjusting at least one of pole angle information corresponding to the first filter and pole amplitude information corresponding to the first filter, to obtain an adjusted first filter;
determining a target speech signal based on the original excitation corresponding to the original speech signal and the adjusted first filter.
2. The method according to claim 1, characterized in that the performing linear prediction analysis on the original speech signal to determine the original excitation and the first filter corresponding to the original speech signal comprises:
performing linear prediction analysis on the original speech signal to determine prediction error information corresponding to the original speech signal;
determining, based on the prediction error information corresponding to the original speech signal, the original excitation and the first filter corresponding to the original speech signal.
3. The method according to claim 2, characterized in that the determining, based on the prediction error information corresponding to the original speech signal, the original excitation and the first filter corresponding to the original speech signal comprises:
determining a second filter based on the prediction error information corresponding to the original speech signal, the second filter being the filter corresponding to the linear prediction analysis;
determining the original excitation corresponding to the original speech signal based on the original speech signal and the second filter, and determining the first filter based on the second filter.
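As a hypothetical illustration of claims 2 and 3 (not code published with the patent): the second filter can be read as the prediction-error analysis filter A(z), the original excitation as the residual obtained by filtering the speech with A(z), and the first filter as its inverse 1/A(z). A minimal numpy sketch under that assumption:

```python
import numpy as np
from scipy.signal import lfilter

order = 12
rng = np.random.default_rng(1)
x = rng.standard_normal(800)    # stand-in for the original speech signal

# Second filter: the analysis filter A(z) minimizing the prediction error
# e[n] = x[n] + sum_k a_k x[n-k] (autocorrelation / normal-equations solution).
r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
a = np.concatenate(([1.0], np.linalg.solve(R, -r[1:])))

# Filtering the speech with the second filter yields the original excitation;
# the first filter 1/A(z), derived from the second filter, reconstructs the
# speech from that excitation.
excitation = lfilter(a, [1.0], x)
reconstructed = lfilter([1.0], a, excitation)
assert np.allclose(reconstructed, x)    # 1/A(z) exactly undoes A(z)
```

With zero initial conditions, the cascade of A(z) and 1/A(z) is the identity, which is why the excitation plus the first filter suffice to regenerate (or, after pole adjustment, modify) the speech.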
4. The method according to any one of claims 1 to 3, characterized in that the adjusting at least one of the pole angle information corresponding to the first filter and the pole amplitude information corresponding to the first filter, to obtain the adjusted first filter, comprises:
if the pole angle information corresponding to the first filter meets a preset condition, adjusting at least one of the pole angle information and the pole amplitude information corresponding to the first filter in a predetermined manner.
5. The method according to claim 4, characterized in that the pole angle information corresponding to the first filter comprises a pole angle value corresponding to at least one pole;
the adjusting the pole angle information corresponding to the first filter in a predetermined manner comprises at least one of the following:
increasing the pole angle value corresponding to the at least one pole by a first preset threshold;
decreasing the pole angle value corresponding to the at least one pole by a second preset threshold.
6. The method according to claim 5, characterized in that the pole amplitude information corresponding to the first filter comprises a pole amplitude value corresponding to at least one pole;
the adjusting the pole amplitude information corresponding to the first filter in a predetermined manner comprises:
adjusting the pole amplitude value corresponding to the at least one pole by a preset multiple.
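A minimal sketch of the pole adjustments in claims 5 and 6, under the assumption that the first filter is an all-pole filter 1/A(z); the threshold and multiple values below are made-up examples, not values from the patent.

```python
import numpy as np

def adjust_poles(a, angle_delta=0.1, radius_mult=1.02):
    """Shift each complex pole's angle by a preset threshold (claim 5) and
    scale its radius by a preset multiple (claim 6), keeping the filter stable."""
    poles = np.roots(a)
    ang, rad = np.angle(poles), np.abs(poles)
    pair = np.abs(poles.imag) > 1e-8                 # leave real poles untouched
    ang = ang + np.where(pair, np.sign(ang), 0.0) * angle_delta
    rad = np.minimum(rad * radius_mult, 0.999)       # |pole| < 1 => stable
    return np.real(np.poly(rad * np.exp(1j * ang)))

# Example: a 2nd-order resonator with conjugate poles at radius 0.9, angle +/-0.5 rad
p = 0.9 * np.exp(0.5j)
a = np.real(np.poly([p, np.conj(p)]))
a_new = adjust_poles(a)   # poles move to radius 0.918, angles +/-0.6 rad
```

Shifting conjugate poles symmetrically keeps the adjusted coefficients real, and clamping the radius below 1 keeps the adjusted first filter stable.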
7. The method according to claim 1, characterized in that, before the obtaining an original speech signal, the method further comprises:
obtaining a speech signal input by a user;
performing denoising processing on the speech signal input by the user, and taking the denoised speech signal as the original speech signal.
8. A speech signal processing apparatus, characterized by comprising:
a first obtaining module, configured to obtain an original speech signal;
a first determining module, configured to perform linear prediction analysis on the original speech signal to determine an original excitation and a first filter corresponding to the original speech signal;
an adjusting module, configured to adjust at least one of pole angle information corresponding to the first filter and pole amplitude information corresponding to the first filter, to obtain an adjusted first filter;
a second determining module, configured to determine a target speech signal based on the original excitation corresponding to the original speech signal and the adjusted first filter.
9. An electronic device, characterized by comprising:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the audio signal processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the audio signal processing method according to any one of claims 1 to 7 is realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910611481.2A CN110164461B (en) | 2019-07-08 | 2019-07-08 | Voice signal processing method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110164461A true CN110164461A (en) | 2019-08-23 |
CN110164461B CN110164461B (en) | 2023-12-15 |
Family
ID=67637855
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910611481.2A Active CN110164461B (en) | 2019-07-08 | 2019-07-08 | Voice signal processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110164461B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110415718A (en) * | 2019-09-05 | 2019-11-05 | 腾讯科技(深圳)有限公司 | The method of signal generation, audio recognition method and device based on artificial intelligence |
CN110415718B (en) * | 2019-09-05 | 2020-11-03 | 腾讯科技(深圳)有限公司 | Signal generation method, and voice recognition method and device based on artificial intelligence |
CN111431855A (en) * | 2020-02-26 | 2020-07-17 | 宁波吉利罗佑发动机零部件有限公司 | Vehicle CAN signal analysis method, device, equipment and medium |
CN113395577A (en) * | 2020-09-10 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Sound changing playing method and device, storage medium and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6047254A (en) * | 1996-05-15 | 2000-04-04 | Advanced Micro Devices, Inc. | System and method for determining a first formant analysis filter and prefiltering a speech signal for improved pitch estimation |
CN102779527A (en) * | 2012-08-07 | 2012-11-14 | 无锡成电科大科技发展有限公司 | Speech enhancement method on basis of enhancement of formants of window function |
US20140214413A1 (en) * | 2013-01-29 | 2014-07-31 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for adaptive formant sharpening in linear prediction coding |
CN105304092A (en) * | 2015-09-18 | 2016-02-03 | 深圳市海派通讯科技有限公司 | Real-time voice changing method based on intelligent terminal |
CN105654941A (en) * | 2016-01-20 | 2016-06-08 | 华南理工大学 | Voice change method and device based on specific target person voice change ratio parameter |
Also Published As
Publication number | Publication date |
---|---|
CN110164461B (en) | 2023-12-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |