MXPA96001755A - Filter for speech modification or enhancement, and various apparatus, systems and method using the same

Filter for speech modification or enhancement, and various apparatus, systems and method using the same

Info

Publication number
MXPA96001755A
Authority
MX
Mexico
Prior art keywords
information
filter
synthesized
signals
modified
Prior art date
Application number
MXPA/A/1996/001755A
Other languages
Spanish (es)
Other versions
MX9601755A (en
Inventor
Tasaki Hirohisa
Original Assignee
Mitsubishi Electric Corp
Priority claimed from JP7114752A (JP2993396B2)
Application filed by Mitsubishi Electric Corp
Publication of MX9601755A
Publication of MXPA96001755A

Abstract

A speech modification or enhancement filter is described, together with the apparatus, systems and method that use it. Synthesized speech signals are filtered to generate modified synthesized speech signals. A filter coefficient is determined from spectral information represented as a multi-dimensional vector, so that the formant characteristics of the modified synthesized speech signals are enhanced, in accordance with the spectral information, relative to those of the synthesized speech signals. The spectral information can be any of LSP information, PARCOR information and LAR information. This increases the degree of design freedom of the speech modification filter used for auditory suppression of the quantization noise contained in the synthesized speech signals, which in turn improves the intelligibility of the synthesized speech signals. A formant enhancement effect can be obtained within a range of permissible spectral gradients without allowing any perceptible level of distortion to occur.

Description

REF 22461

FILTER FOR SPEECH MODIFICATION OR ENHANCEMENT, AND VARIOUS APPARATUS, SYSTEMS AND METHOD USING THE SAME

BACKGROUND OF THE INVENTION

a) Field of the Invention

The present invention relates generally to a system and method for transmitting or storing speech information by means of codes having a smaller information content than the input speech signals. It relates in particular to a system and method for extracting, from the input speech signals, parameters indicative of their characteristics, transmitting or storing the extracted parameters, and synthesizing the original speech signals from the transmitted or stored parameters. More specifically, the invention relates to a speech modification filter for auditorily suppressing the quantization noise that occurs in the synthesized speech signals. The present invention additionally relates to a system, a method and a filter for improving signal quality, such as speech intelligibility. More specifically, it relates to speech enhancement that is suitable for improving the intelligibility of a signal distorted by analog transmission or of a signal received by a hearing aid, and that is suitable for improving the clarity of speech to be broadcast or reproduced through a loudspeaker.

b) Description of the Prior Art

A configuration of a speech analysis/synthesis system is illustrated by way of example in Figure 28. The system comprises an analyzer unit 100 and a synthesizer unit 200. The analyzer unit 100 includes an analyzer 101 and an encoder 102, while the synthesizer unit 200 includes a decoder 201 and a synthesizer 202. In some applications, the units 100 and 200 are linked through communication channels, and typically one unit is remote from the other.
In other applications, the unit 100 passes information to the unit 200 through storage means, in which case the two units can constitute a single apparatus or two separate apparatuses. The analyzer 101 extracts, from the input speech signals supplied by a user, a parameter group that includes spectral information indicative of the characteristics of the input speech signals. The extracted parameter group is encoded by the encoder 102 and fed through the communication channels or the storage means to the synthesizer unit 200, in which the encoded parameter group is decoded by the decoder 201. The synthesizer 202 synthesizes speech signals from the parameter group decoded in this way. An advantage of a system having this configuration lies in the low information content of the transmitted or stored signals: the transmitted or stored signals, i.e. the encoded parameter group, contain less information than the input speech signals.

A variant of the synthesizer unit 200 is illustrated in Figure 29. This variant further comprises a post-filter 203, which subjects the speech signals derived from the synthesizer 202 (hereinafter referred to as synthesized speech signals) to a predetermined modification process based on the decoded parameter group, thereby generating modified speech signals (hereinafter referred to as modified synthesized speech signals). The post-filter 203 is used in some applications to auditorily suppress the quantization noise contained in the synthesized speech signals, and in other applications to improve subjective quality such as intelligibility. In the following description, a post-filter of this type is called a speech modification filter or a speech enhancement filter. The synthesizer unit 200 provided with this filter 203 is suitable for use in a speech encoding/decoding system or a speech recognition and response system.
A variety of filters are available as the filter 203. Above all, a filter of the type that enhances formant characteristics has the advantage of being markedly effective in suppressing quantization noise and in improving subjective quality. Prior art references describing such a filter include, for example: Japanese Patent Laid-Open Pub. No. Sho 64-13200 (hereinafter referred to as reference 1); Japanese Patent Laid-Open Pub. No. Hei 5-500573 (hereinafter referred to as reference 2); Japanese Patent Laid-Open Pub. No. Hei 2-82710 (hereinafter referred to as reference 3); and "Speech Coding System Based on Adaptive Mel-Cepstral Analysis for Noisy Channel", Proceedings of the Spring Meeting of the Acoustical Society of Japan, Vol. 1, pp. 257-258 (1994. 3) (hereinafter referred to as reference 4).

The filters of references 1 and 2 are both used as the speech modification filter 203 in a synthesizer unit 200 that receives linear prediction codes (LPC) from the analyzer unit 100 as the encoded parameter group described above. The filter of reference 3 is used as the speech modification filter 203 in a synthesizer unit 200 that receives autocorrelation coefficients from the analyzer unit 100 as the encoded parameter group. Finally, the filter of reference 4 is used as the speech modification filter 203 in a synthesizer unit 200 that receives mel cepstrum (mel-scaled cepstrum) from the analyzer unit 100 as the parameter group.

Figure 30 illustrates a schematic configuration of the filter described in reference 1. This filter 203 receives the decoded LPCs from the decoder 201 in addition to the synthesized speech signals fed from the synthesizer 202. The LPCs referred to here are the α parameters obtained by the linear prediction coding executed by the analyzer 101 shown in Figure 28.
Linear prediction coding is a method for determining, from the sampled values of the input speech waveform and according to the linear prediction method, the α parameters, i.e. the coefficients of a filter of, for example, order 8 to 11 that models the human vocal mechanism. The filter 203 shown in Figure 30 includes a filter 204 that filters the synthesized speech signals to generate semi-modified synthesized speech signals, and a filter 205 that filters the semi-modified synthesized speech signals to generate the modified synthesized speech signals. Filters 204 and 205 both use α parameters as their filter coefficients. It should be noted that the parameter used in filter 204 is not the parameter $\alpha_i$ (where i = 1, 2, ..., p; p is the prediction order) fed from the decoder 201, but $\alpha_{1,i} = \alpha_i \nu^i$, obtained by modifying $\alpha_i$ with a modification coefficient ν. In the same way, the parameter used in filter 205 is $\alpha_{2,i} = \alpha_i \eta^i$, obtained by modifying $\alpha_i$ with a modification coefficient η. The modification of the α parameters with the modification coefficients ν and η is executed by the LPC modification sections 206 and 207, respectively.

It is now assumed that filters 204 and 205 implement the denominator and the numerator, respectively, of a transfer function H(z) that transforms the synthesized speech signals into the modified synthesized speech signals. In other words, filter 204 is an LPC filter and filter 205 is an inverse LPC filter. In addition, filtering that uses the α parameters as filter coefficients is assumed to be given by

$$A(z) = \sum_{i=0}^{p} \alpha_i z^{-i} \qquad (1)$$

where z is the z-transform operator. Since the filter coefficients used in filters 204 and 205 are $\alpha_{1,i} = \alpha_i \nu^i$ and $\alpha_{2,i} = \alpha_i \eta^i$, respectively, as described above, the transfer functions of filters 204 and 205 are $1/A(z/\nu)$ and $A(z/\eta)$, respectively.
Therefore, the transfer function that transforms the synthesized speech signals into the modified synthesized speech signals can be expressed as

$$H(z) = \frac{A(z/\eta)}{A(z/\nu)} \qquad (2)$$

Figure 31 schematically illustrates the filter configuration described in reference 2. In this filter 203, the $\alpha_{1,i}$ generated in the LPC modification section 206 is transformed from the LPC domain into the autocorrelation domain by an LPC/ACC transformation section 208, is subjected to bandwidth expansion within the autocorrelation domain by an ACC modification section 209, and is transformed from the autocorrelation domain back into the LPC domain by an ACC/LPC transformation section 210 according to the Levinson recursion. The filter 205 receives the α parameters obtained in this way. Although the LPC modification section 207 shown in Figure 30 is omitted from this diagram, reference 2 also suggests a configuration that includes the LPC modification section 207, whose output is further modified by the LPC/ACC transformation section 208, the ACC modification section 209 and the ACC/LPC transformation section 210.

Figure 32 illustrates a schematic configuration of the filter described in reference 3. This filter 203 has ACC/LPC transformation sections 211 and 212 in addition to the configuration of reference 1. The ACC/LPC transformation section 211 receives autocorrelation coefficients as the spectral information included in the decoded parameter group and transforms them from the autocorrelation domain into the LPC domain. The ACC/LPC transformation section 212 receives the part of order m (m < p) or lower of the autocorrelation coefficients received by the ACC/LPC transformation section 211 and transforms the received autocorrelation coefficients from the autocorrelation domain into the LPC domain.
The LPC modification sections 206 and 207 modify the α parameters derived from the ACC/LPC transformation sections 211 and 212, respectively, in the same way as in reference 1. It should be noted that the autocorrelation coefficients provided as inputs in this configuration may be ones decoded by the decoder 201 (i.e., autocorrelation coefficients calculated by the analyzer 101 and encoded by the encoder 102), or may be ones calculated by the decoder 201 or the synthesizer 202 from a different type of spectral parameter decoded in the decoder 201.

Figures 33 to 35 show the frequency versus log-power spectral characteristics of the speech modification (or enhancement) filters described in references 1 to 3. In these diagrams, A to D represent, respectively, the characteristics of the synthesizer 202, the characteristics of the filter 204, the inverse characteristics of the filter 205, and the transfer function H(z). For example, in Figures 30 and 33, A represents $1/A(z)$; B represents $1/A(z/\nu)$; C represents $1/A(z/\eta)$; and D represents $H(z) = A(z/\eta)/A(z/\nu)$. As is apparent from expression (2) relating to reference 1, and also from Figures 33 to 35 relating to references 1 to 3, the filter 204 functions as a filter that enhances the formants of the spectrum of the synthesized speech signals and suppresses the valleys of that spectrum, while the filter 205 functions as a filter that removes the spectral gradient induced by the filter 204. The degree of enhancement and suppression by the filter 204 increases as ν becomes larger and decreases as ν becomes smaller. Reference 1 assumes that η and ν satisfy $0 \le \eta \le \nu < 1$. Figure 33 shows an example with ν = 0.8 and η = 0.5; Figure 34 shows an example using a bandwidth expansion process with a 1200 Hz lag window and ν = 0.8; and Figure 35 shows an example with p = 10, m = 4, ν = 0.95 and η = 0.95.

As is clear from comparing Figure 33 with Figure 34 or with Figure 35, the speech modification (or enhancement) filters of references 2 and 3 strengthen the spectral gradient removal effect of the filter 205 compared with the filter of reference 1. That is, the technique of reference 1 does not allow the filter 205 to cancel completely the spectral gradient imparted by the filter 204. Furthermore, since the spectral gradient varies with time, it is difficult to cancel it with a fixed high-frequency emphasis process, which results in clarity that varies with time. In contrast, the techniques of references 2 and 3 can strengthen the enhancement of the peak-valley structure of the spectrum while keeping the spectral gradient small. This prevents the filter 203 from degrading clarity and naturalness.
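As a rough illustration of expression (2), the sketch below evaluates the reference-1 postfilter H(z) = A(z/η)/A(z/ν) for a hypothetical second-order A(z). The function names, the example coefficients and the pure-Python direct-form filtering are assumptions made for illustration only; they are not taken from the references.

```python
import cmath
import math

def bandwidth_expand(a, g):
    """Coefficients of A(z/g): the i-th coefficient a[i] is scaled by g**i."""
    return [c * g ** i for i, c in enumerate(a)]

def poly_on_unit_circle(coeffs, w):
    """Evaluate sum_i coeffs[i] * z**-i at z = exp(j*w)."""
    return sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(coeffs))

def postfilter_gain_db(a, w, nu, eta):
    """Log-magnitude (dB) of H(z) = A(z/eta) / A(z/nu) at frequency w (radians)."""
    num = poly_on_unit_circle(bandwidth_expand(a, eta), w)
    den = poly_on_unit_circle(bandwidth_expand(a, nu), w)
    return 20.0 * math.log10(abs(num) / abs(den))

def postfilter(x, a, nu, eta):
    """Run samples x through H(z) = A(z/eta) / A(z/nu), assuming a[0] == 1."""
    num = bandwidth_expand(a, eta)  # inverse LPC filter (filter 205 in the text)
    den = bandwidth_expand(a, nu)   # LPC filter (filter 204 in the text)
    y = []
    for n in range(len(x)):
        acc = sum(num[i] * x[n - i] for i in range(len(num)) if n - i >= 0)
        acc -= sum(den[i] * y[n - i] for i in range(1, len(den)) if n - i >= 0)
        y.append(acc)
    return y

# Hypothetical A(z): a single formant at pi/4 with pole radius 0.95.
A = [1.0, -2.0 * 0.95 * math.cos(math.pi / 4), 0.95 ** 2]
```

With ν = 0.8 and η = 0.5 (the values quoted for Figure 33), `postfilter_gain_db(A, math.pi / 4, 0.8, 0.5)` is positive at the formant frequency while the gain at π is negative, so in this toy case the numerator does not fully cancel the gradient introduced by the denominator. Setting η = ν makes H(z) = 1, and `postfilter` then returns its input unchanged.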
It should be appreciated that the techniques of references 2 and 3 improve on the technique of reference 1 in one respect but are inferior to it in another. For example, although it may depend on the configuration of the analyzer unit 100 or on the mode of the system, the technique of reference 2 has the deficiency that the resulting modified synthesized speech signals often contain peculiar distortions. This arises from the fact that an excessively powerful spectrum smoothing process is carried out within the autocorrelation domain, with the result that the spectrum is markedly distorted in the vicinity of strong formants. This can make the modified synthesized speech signals inferior in quality to those of the technique of reference 1. The technique of reference 3, because it reduces the order of the filter in the autocorrelation domain, frequently suffers from the drawback that the positions of formants are displaced to a large degree or that a plurality of formants are merged into one. This unstable spectral variation causes distortions in the modified synthesized speech signals. From a comparison between characteristics B and C in Figure 35, for example, it can be seen that the formant having the lowest frequency in B moves to a lower frequency in C, and that two formants in the mid-frequency range are merged. In addition, such significant formant displacement may or may not occur from one moment to the next, with the result that the resulting modified synthesized speech fluctuates unnaturally.

The techniques of references 1 to 3 also share the problem of a low degree of design freedom (freedom in controlling the characteristics).
In the technique of reference 1, for example, it is difficult to change the characteristics of the filter 203 to a large degree merely by varying ν and η within the range in which the problems of the spectral gradient and its variation with time do not become too marked. In the technique of reference 2, if wider variable ranges are set for ν and for the frequency of the lag window in order to strengthen the formant enhancement effect of the filter 204, the distortions described above, i.e. the distortions attributable to the spectrum smoothing process within the autocorrelation domain, become more significant. The variable ranges of ν and of the lag window frequency must therefore be restricted, which makes it impossible to change the characteristics of the filter 203 greatly. In the technique of reference 3, the degree of freedom is inherently low because the technique uses the filter order, which is a finite integer value, as its control variable.

Figure 36 schematically illustrates a configuration of the speech modification (or enhancement) filter 203 described in reference 4. The filter 203 in this diagram differs greatly from the prior techniques described above in that it receives mel cepstrum as the spectral information included in the parameter group decoded by the decoder 201, and in that it transforms the synthesized speech signals into modified synthesized speech signals by filtering that uses, as its filter coefficients, a modified mel cepstrum obtained by modifying the input mel cepstrum. That is, the synthesized speech signals are filtered by a filter 213 which uses, as its filter coefficients, the modified mel cepstrum generated by a mel cepstrum modification section 214.
More specifically, the mel cepstrum modification section 214 replaces the first-order component of the input mel cepstrum with 0 and multiplies the other components by β, thereby generating the modified mel cepstrum. The filter 213 uses this modified mel cepstrum as its filter coefficients to filter the synthesized speech signals, and outputs the resulting signals as the modified synthesized speech signals. Incidentally, the filter 213 is referred to as a mel log spectrum approximation (MLSA) filter, since it employs the modified mel cepstrum as its filter coefficients. The term mel cepstrum used herein means a parameter calculated by the analyzer 101 by orthogonally transforming the logarithmic spectrum of the input speech signals.

It would in general be impossible to apply the techniques of references 1 to 3 as they are to a system in which the speech information is transformed into mel cepstrum for transmission or storage. That is, transforming spectral parameters such as mel cepstrum into the LPC domain would cause significant distortion of the spectral shape, so the LPC would have to be calculated by re-analyzing the synthesized speech signals. Furthermore, even the LPC calculated in this way deviates from the LPC obtained by analyzing the original speech, and therefore does not ensure good speech modification characteristics. In contrast, the method of reference 4 avoids these situations. Conversely, this means that the technique of reference 4 faces a problem of poor connectivity; in other words, it cannot be applied to systems designed to synthesize the speech signals from a parameter group other than mel cepstrum.
Typical of these systems are, for example, ones that use parameter groups such as LPC, LSP (line spectrum pairs) and PARCOR (partial autocorrelation coefficients). This problem is serious, since LPC, LSP and PARCOR are frequently used for speech encoding/decoding. If a speech modification filter using mel cepstrum as its filter coefficients is incorporated into a synthesizer unit that receives LPC as one of its parameters, the spectral shape is distorted by the transformation from the LPC domain into the mel cepstrum domain, as described above. This distortion can of course be eliminated to some degree by re-calculating the mel cepstrum through re-analysis of the synthesized speech signals. Even so, the mel cepstrum calculated in this way contains more distortion than the mel cepstrum that would be derived from the original speech. Very good speech modification characteristics therefore cannot be expected.
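The modification that the text attributes to section 214 of reference 4 (first-order component set to 0, remaining components scaled by a common factor) can be sketched as follows. The function name, the argument layout (the list starts at the first-order component, so any zeroth-order gain term is assumed to be handled elsewhere) and the name `beta` for the scale factor are illustrative assumptions, not details confirmed by reference 4.

```python
def modify_mel_cepstrum(c, beta):
    """Return a modified copy of the mel cepstrum c = [c_1, c_2, ..., c_M].

    Assumption: c[0] holds the first-order component c_1.
    """
    if not c:
        return []
    # The first-order component is replaced by 0; all others are scaled by beta.
    return [0.0] + [beta * v for v in c[1:]]
```

Because only a single scale factor is exposed, a sketch like this also makes visible how few control parameters such a scheme offers.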
BRIEF DESCRIPTION OF THE INVENTION

A first object of the present invention is to provide a speech modification filter (or enhancement filter; this alternative is omitted hereinafter) that ensures a good formant enhancement effect within a range of allowable spectral gradients. A second object of the present invention is to provide a speech modification filter that ensures a good formant enhancement effect without causing any perceptible level of distortion in the formant structure. A third object of the present invention is to provide a speech modification filter capable of achieving the same formant enhancement effect as the prior art while using fewer constituent elements than the prior art. A fourth object of the present invention is to provide a speech modification filter that allows selective execution of clarity control, reduction in processing, and improvement in intelligibility. A fifth object of the present invention is to eliminate the need for a stability test in a domain whose nature differs from the domain to which the input spectral information belongs, and thus to provide a speech modification filter having a high degree of design freedom. A sixth object of the present invention is to provide a speech modification filter suitable for a synthesizer unit that receives LSP, PARCOR, LAR (log area ratio), etc. as the spectral information from the analyzer unit. A seventh object of the present invention is to provide a speech modification filter that ensures, when LSP, PARCOR, LAR, etc. is input as the spectral information, good connectivity without requiring any re-analysis of the spectrum or transformation of the parameters. A further object of the present invention is to implement a speech synthesis system using a speech modification filter capable of achieving the first to seventh objects above.
According to a first aspect of the invention, the synthesized speech signals are filtered through a transfer function defined by filter coefficients to generate the modified synthesized speech signals. The filter coefficients are generated from spectral information that is represented as a multi-dimensional vector, corresponds to a predetermined domain and pertains to the input speech signals, in such a way that the formant characteristics of the modified synthesized speech signals are enhanced, in accordance with that spectral information, relative to those of the synthesized speech signals. The spectral information may be any of LSP information, PARCOR information and LAR information. Owing to the specific properties of LSP information, PARCOR information and LAR information, the operations for generating the filter coefficients can be performed such that the arithmetic associated with each dimension is independent of the arithmetic associated with the remaining dimensions. When LSP, PARCOR or LAR information is used to generate the filter coefficients, the stability of the filter can be tested without transforming them from the LSP, PARCOR or LAR domain into another domain. By contrast, in a filter that uses filter coefficients generated from, for example, LPC information, the filter coefficients must be transformed from the LPC domain into another domain to test the stability of the filter.
Accordingly, the first aspect of the present invention makes it easier to design the speech modification process or filter without introducing instability than the prior techniques that use filter coefficients generated from LPC information. In addition, applying the filtering of this aspect to systems that transmit or store LSP information, PARCOR information or LAR information requires no re-analysis of the spectrum and no transformation of parameters, which ensures good connectivity. The filtering in the present invention can be performed in any of the LPC domain, the LSP domain and the PARCOR domain. In other words, the filter coefficients in the present invention can correspond to any of the LPC domain, the LSP domain and the PARCOR domain.

According to a second aspect of the present invention, the spectral information is first modified within the domain to which it belongs to generate modified spectral information, the modified spectral information is then transformed from that domain into the LPC domain to generate the filter coefficients, and the filter coefficients obtained in this way are used for filtering within the LPC domain. Since a variety of coefficients can be used for this modification, this aspect makes it possible to shape the filter coefficients more freely than the prior techniques, according to the filtering characteristics (the characteristics with which the synthesized speech signals are modified) demanded by users.

According to a third aspect of the present invention, the spectral information is modified so as to reduce the formant peaks of the modified synthesized speech signals. This makes it possible to obtain a good formant enhancement effect within a range of permissible spectral gradients, and to obtain it without causing any perceptible level of distortion in the formant structure.
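The point about testing stability in the native domain, and the domain-to-LPC transformation of the second aspect, can be illustrated with a short sketch. The function names are hypothetical. The sign convention (an all-pole synthesis filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p) is an assumption, since the text does not fix one, and the step-up recursion shown is the standard textbook form rather than a procedure quoted from this patent.

```python
import math

def parcor_is_stable(k):
    """An all-pole lattice filter is stable if every reflection coefficient has |k_i| < 1."""
    return all(abs(v) < 1.0 for v in k)

def lsp_is_stable(lsp):
    """LSP frequencies (radians) must be strictly increasing and lie inside (0, pi)."""
    bounded = [0.0] + list(lsp) + [math.pi]
    return all(lo < hi for lo, hi in zip(bounded, bounded[1:]))

def parcor_to_lpc(k):
    """Step-up recursion: PARCOR coefficients -> [1, a_1, ..., a_p] of A(z)."""
    a = [1.0]
    for km in k:
        prev = a + [0.0]
        n = len(prev)
        # a_i(m) = a_i(m-1) + k_m * a_(m-i)(m-1); the new last coefficient equals k_m.
        a = [prev[i] + km * prev[n - 1 - i] for i in range(n)]
    return a
```

Both checks run in time linear in the order and need no root finding, whereas testing a set of LPC coefficients directly would require, for example, converting them to reflection coefficients or locating the roots of A(z).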
A first conceivable modification method is one in which the spectral information pertaining to the input speech signals and reference information corresponding to the same domain are internally divided (interpolated) in proportion to a modification coefficient. This method is available when the spectral information is LSP information. Depending on how the reference information is set, this method enables, for example, the following modifications: a modification that imparts a fixed spectral gradient to the modified synthesized speech signals; a modification that imparts to the modified synthesized speech signals a spectral gradient reflecting the average noise spectrum (i.e., a modification that enhances the part of the speech spectrum that differs from the noise spectrum); and a modification that imparts to the modified synthesized speech signals a spectral gradient reflecting the history that the spectral information has followed so far (i.e., a modification that enhances the amount of variation in the speech spectrum). This makes it possible to control clarity, reduce processing and improve intelligibility. This method also allows the filter of the present invention to take over the functions of other secondary filtering processes (for example, a fixed high-frequency emphasis process).

A second conceivable modification method is one in which, for each of the plurality of dimensions constituting the spectral information pertaining to the input speech signals, the spectral information is multiplied by a modification coefficient or by a power of the modification coefficient. This method is available when the spectral information is either PARCOR information or LAR information. This method also ensures some of the effects listed above, for example reduced processing and improved intelligibility.
It should be understood that, when the spectral information is PARCOR information, the method multiplies the spectral information by a power of the modification coefficient, and that the exponent depends on the dimension of the spectral information.
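A minimal sketch of the first and second modification methods follows. The function names, the symbol `mu` for the modification coefficient, the choice of exponent i for the i-th PARCOR coefficient and the use of equally spaced LSPs (which correspond to a flat spectrum) as one possible reference are all assumptions made for illustration; the text only states that the reference information can be set in several ways and that the exponent depends on the dimension.

```python
import math

def interpolate_lsp(lsp, ref, mu):
    """First method: divide each LSP and its reference internally in the ratio mu : (1 - mu)."""
    return [(1.0 - mu) * w + mu * r for w, r in zip(lsp, ref)]

def flat_reference(p):
    """Hypothetical reference: equally spaced LSPs i*pi/(p+1), i = 1..p (flat spectrum)."""
    return [math.pi * (i + 1) / (p + 1) for i in range(p)]

def scale_parcor(k, mu):
    """Second method, PARCOR case: multiply k_i by mu**i (exponent choice is assumed)."""
    return [v * mu ** (i + 1) for i, v in enumerate(k)]

def scale_lar(g, mu):
    """Second method, LAR case: multiply every log area ratio by mu."""
    return [mu * v for v in g]
```

Each output dimension depends only on the same input dimension. For 0 ≤ mu ≤ 1, the interpolation of two strictly increasing LSP vectors is again strictly increasing, and scaled PARCOR coefficients keep a magnitude below 1 whenever the inputs do, so neither operation needs a stability test in another domain.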
A third conceivable modification method is one in which the distances between adjacent dimensions, among the plurality of dimensions representing the spectral information pertaining to the input speech signals, are extended. More specifically, when the distance between adjacent dimensions is less than a reference distance, that distance is extended to at least the reference distance, and the distances are subsequently shrunk equally across all dimensions so that the extent of the spectral information as a whole coincides with its extent before the expansion. This method is available when the spectral information is LSP information. It allows the spectral information to be modified such that the spectrum of the modified synthesized speech signals is flattened, and it ensures some of the effects listed above, such as reduced processing and improved intelligibility, in terms of leveling the spectral gradient. In addition, the processing or the number of components is reduced relative to the first and second methods.
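One plausible reading of the third method is sketched below: every gap between adjacent LSPs that is narrower than a reference distance is widened to that distance, and all gaps are then scaled by a common factor so that the span from the first to the last LSP is unchanged. The function name, the parameter `d_ref` and the choice to anchor the first LSP are assumptions; the text does not spell out these details.

```python
def widen_lsp_gaps(lsp, d_ref):
    """Widen narrow LSP gaps to d_ref, then rescale all gaps to keep the overall span."""
    if len(lsp) < 2:
        return list(lsp)
    gaps = [b - a for a, b in zip(lsp, lsp[1:])]
    # Step 1: any gap below the reference distance is raised to the reference distance.
    widened = [max(g, d_ref) for g in gaps]
    # Step 2: shrink every gap by the same factor so the total extent is preserved.
    scale = sum(gaps) / sum(widened)
    out = [lsp[0]]
    for g in widened:
        out.append(out[-1] + scale * g)
    return out
```

Closely spaced LSPs indicate a sharp spectral peak, so widening the narrowest gaps flattens the peaks, which matches the flattening effect described above. The sketch assumes `lsp` is strictly increasing and `d_ref` is positive.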
The first and third modification methods can also be combined with each other. In this case, the first method and the third method can be used selectively or alternately, or both can be used together. The advantages of each method relative to the other two, and the differences among the three methods, will be apparent to a person skilled in the art from the later description of the embodiments.

The first to third modification methods can each be implemented as: first, a translation table that stores spectral information about the input speech signals in correlation with modified spectral information and outputs modified spectral information in response to a supply of spectral information; or second, a neural network that has acquired, by learning, the capability to transform spectral information into modified spectral information, so that it generates modified spectral information when supplied with the spectral information about the input speech signals. It is preferable that the translation table or the neural network is provided for each of a plurality of mutually non-overlapping categories obtained by classifying the domain to which the spectral information about the input speech signals belongs, or that it is used with its operation switched for each category by switching coefficients. This makes it possible to provide adaptive control by category and to reduce distortions at the boundaries of the categories. It would also be possible to use, for each category, any modification method other than the first to third methods.
According to a fourth aspect of the present invention, in which the filtering is executed within either the LSP domain or the PARCOR domain, the spectral information of the input signals is modified within the corresponding domain and the resulting modified spectral information is used as a filter coefficient. This aspect eliminates the need for a domain transformation of the modified spectral information, making it possible to provide substantially the same formant enhancement effect as the prior art with fewer constituent elements than the prior art. According to a fifth aspect of the present invention, the filtering is executed so that the formants of the modified synthesized speech signals are further enhanced compared with those of the synthesized speech signals. According to a sixth aspect of the present invention, the spectral gradient imparted to the modified synthesized speech signals in the fifth aspect is suppressed. According to a seventh aspect of the present invention, the synthesized speech signals are generated on the basis of spectral information that is represented as a multidimensional vector, corresponds to a predetermined domain and belongs to the input speech signals, and subsequently the processes included in the aspects described above are executed on the basis of that spectral information. According to an eighth aspect of the present invention, the synthesized speech signals are generated on the basis of first spectral information that is represented as a multidimensional vector, corresponds to a predetermined domain and belongs to the input speech signals; the first spectral information is transformed into second spectral information corresponding to a domain different from the domain of the first spectral information; and then the processes included in the aspects described above are executed on the basis of the second spectral information.
According to a ninth aspect of the present invention, the synthesized speech signals are generated on the basis of first spectral information that belongs to the input speech signals, corresponds to a predetermined domain and is represented as a multidimensional vector; the synthesized speech signals are analyzed to generate second spectral information; and then the processes included in the foregoing aspects are executed on the basis of the second spectral information. According to a tenth aspect of the present invention, prior to the processes included in the seventh to ninth aspects, the spectral information or the first spectral information is generated by analyzing the input speech signals, and the spectral information or the first spectral information is stored or transmitted.
BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1 and 2 are block diagrams each showing a configuration of a speech modification filter according to an LSP-based embodiment among the preferred embodiments of the present invention;
Figure 3 is a block diagram showing, by way of example, a speech analysis/synthesis configuration;
Figure 4 is a block diagram showing an example of an LSP modification method;
Figure 5 is an explanatory diagram of a method of generating modified LSPs by proportional division;
Figures 6 and 7 are block diagrams each showing an example of the LSP modification method;
Figure 8 is a graph of the frequency versus logarithmic power spectrum characteristics of the LSP-based embodiment among the preferred embodiments of the present invention, the characteristics being obtained when the method of generating modified LSPs by proportional division is used in the configuration of Figure 1;
Figure 9 is a block diagram showing an example of the LSP modification method;
Figure 10 is a graph of the frequency versus logarithmic power spectrum characteristics of the LSP-based embodiment among the preferred embodiments of the present invention, the characteristics being obtained when the method of generating modified LSPs by expanding the distances between adjacent dimensions is used in the configuration of Figure 2;
Figures 11, 12, 13, 14, 15 and 16 are block diagrams each showing an example of the LSP modification method;
Figures 17 and 18 are block diagrams each showing a configuration of a speech modification filter according to an embodiment that executes the filtering within the LSP domain, among the preferred embodiments of the present invention;
Figure 19 is a block diagram showing a configuration of a speech modification filter according to a PARCOR-based embodiment among the preferred embodiments of the present invention;
Figure 20 is a graph of the frequency versus logarithmic power spectrum characteristics of the PARCOR-based embodiment among the preferred embodiments of the present invention;
Figures 21 and 22 are block diagrams each showing a configuration of a speech modification filter according to an embodiment that executes the filtering within the PARCOR domain, among the preferred embodiments of the present invention;
Figure 23 is a block diagram showing a configuration of a speech modification filter according to a LAR-based embodiment among the preferred embodiments of the present invention;
Figure 24 is a graph of the frequency versus logarithmic power spectrum characteristics of the LAR-based embodiment among the preferred embodiments of the present invention;
Figures 25 and 26 are block diagrams each showing a configuration of a speech modification filter according to an embodiment that executes the filtering within a LAR domain or a PARCOR domain, among the preferred embodiments of the present invention;
Figure 27 is a block diagram showing a configuration of a speech modification filter according to an embodiment using a plurality of parameters, among the preferred embodiments of the present invention;
Figure 28 is a block diagram illustrating, by way of example, a configuration of a speech analysis/synthesis system;
Figure 29 is a block diagram illustrating a manner of using a speech modification filter;
Figures 30, 31 and 32 are block diagrams illustrating the configurations of the speech modification filters described in reference 1, reference 2 and reference 3, respectively;
Figures 33, 34 and 35 are graphs of the frequency versus logarithmic power spectrum characteristics of the speech modification filters described in reference 1, reference 2 and reference 3, respectively; and
Figure 36 is a block diagram illustrating a configuration of the speech modification filter described in reference 4.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the present invention will now be described with reference to the appended drawings, in which constituent elements identical or corresponding to those of the prior art shown in Figures 28 to 36 are designated by the same reference numerals and will not be explained again. It should be noted that constituent elements common to the respective embodiments are likewise designated by the same reference numerals and will not be explained each time.

a) LSP-based embodiment

Referring first to Figures 1 and 2, there are shown two embodiments that receive LSP as the spectral information in the group of decoded parameters, among the preferred embodiments of a filter 203 according to the present invention. The embodiment shown in Figure 1 comprises LSP modification sections 216 and 217 and LSP/LPC transformation sections 218 and 219, in addition to filters 204 and 205. The embodiment shown in Figure 2 comprises the LSP modification section 216 and the LSP/LPC transformation section 218, in addition to the filter 204. These embodiments can be used in the synthesizer unit 200 having a configuration as shown in Figure 30 or 3. When a decoder 201 capable of producing LSP as an element of the parameter group is used, the filter 203 can directly receive the output of the decoder 201 as shown in Figure 29. When a decoder 201 that is not capable of producing LSP as an element of the parameter group is used, the output of the decoder 201 can be transformed into the LSP domain by a transformation section 215 and then supplied to the filter 203, as shown in Figure 3. It should be appreciated that the transformation section 215 can be integrated into the decoder 201 or the synthesizer 202.

The LSP modification sections 216 and 217 receive the LSP $\omega_i$ in the form of a multidimensional vector from the decoder 201 or the transformation section 215, and modify $\omega_i$ in accordance with a predetermined method to generate the modified LSP $\omega h1_i$ and $\omega h2_i$. The LSP/LPC transformation sections 218 and 219 transform $\omega h1_i$ and $\omega h2_i$, respectively, from the LSP domain into the LPC domain to generate the $\alpha$ parameters $\alpha 1_i$ and $\alpha 2_i$, respectively. Filters 204 and 205 filter the synthesized speech signals in series, using $\alpha 1_i$ and $\alpha 2_i$, respectively, as their filter coefficients. As a result, the filter 205 outputs the modified synthesized speech signals. Letting the transfer functions of the filters 204 and 205 be $1/A_1(z)$ and $A_2(z)$, respectively, the transfer function of the filter 203 in Figure 1 is given as

$H(z) = A_2(z) / A_1(z)$ ... (3)

and the transfer function of the filter 203 of Figure 2 is given as

$H(z) = 1 / A_1(z)$ ... (4)

In the LSP-based embodiment of the present invention, in this manner, the LSP $\omega_i$ received as one of the parameters is modified, and the modified LSP $\omega h1_i$ (and $\omega h2_i$) is transformed from the LSP domain into the LPC domain to generate the filter coefficients $\alpha 1_i$ (and $\alpha 2_i$), which are the modified parameters. A first advantage of the LSP-based embodiment is that it is easy to check and ensure that the filter 203 is stable, since stability can be verified within the LSP domain. More specifically, it is generally known that a filter using the LSP $\omega_i$ is stable when $\omega_i$ satisfies the following ordering condition

$0 < \omega_1 < \omega_2 < \dots < \omega_p < \pi$ ... (5)

Therefore, as long as an LSP satisfying expression (5) is used as the basis of the filter coefficients, the process for generating $\alpha 1_i$ and $\alpha 2_i$ can be performed independently for each $i$ without introducing instability into the filter. As a result, a high degree of freedom in the filter design is realized. For example, it is possible to implement a filter that enhances the high-frequency components of the speech by setting the degree of enhancement for the higher-order dimensions to be relatively large.
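The ordering condition (5) lends itself to a very small check. The following is a minimal sketch, assuming the LSP vector is given in radians as a plain Python list; the function name `lsp_is_ordered` is hypothetical and not taken from the specification.

```python
import math


def lsp_is_ordered(lsp):
    """Return True when 0 < w_1 < w_2 < ... < w_p < pi, i.e. condition (5)."""
    bounds = [0.0] + list(lsp) + [math.pi]
    # Every neighbouring pair, including the fixed end points, must be strictly increasing.
    return all(lo < hi for lo, hi in zip(bounds, bounds[1:]))


print(lsp_is_ordered([0.3, 0.9, 1.7, 2.5]))  # ordered, inside (0, pi)
print(lsp_is_ordered([0.3, 0.2, 1.7, 2.5]))  # second value breaks the ordering
```

A modification section could run such a check on its output before the LSP/LPC transformation, which is the kind of inexpensive stability test the paragraph above refers to.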
Conversely, when the $\alpha$ parameters or the autocorrelation coefficients are used to generate the filter coefficients, only a process proven not to introduce instability into the filter can be used to generate $\alpha 1_i$ and $\alpha 2_i$, as in references 1 to 5, since in the $\alpha$-parameter domain or in the autocorrelation domain it is difficult to check and ensure the stability of a filter that uses filter coefficients based on these parameters. Therefore, a modification process performed independently for each $i$, or an adjustment of the degree of enhancement along the frequency axis, cannot be realized without risking instability when filter coefficients based on the $\alpha$ parameters or on the autocorrelation are used. A second advantage of the LSP-based embodiment lies in its high applicability to systems that transmit or store LSPs as the spectral information. Most speech coding/decoding systems, in particular those developed in recent years, tend to use LSPs as the spectral information. The LSP-based embodiment of the present invention is easily applicable to these types of speech coding/decoding system. That is, because there is no need for reanalysis of the spectrum or transformation of the parameters, good connectivity to this type of system can be obtained, unlike the prior art, in which the filter coefficients are determined in the $\alpha$-parameter or autocorrelation domain. In this case, as is apparent from the previous description, the transfer function $H(z)$ of the filter 203 in the LSP-based embodiment of the present invention depends on how the LSP modification operation and the LSP/LPC transformation operation are performed to obtain the filter coefficients $\alpha 1_i$ and $\alpha 2_i$.
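The specification does not spell out the LSP/LPC transformation itself. The sketch below uses the common textbook construction, in which $A(z)$ is half the sum of a symmetric polynomial $P(z)$ and an antisymmetric polynomial $Q(z)$ whose unit-circle roots are the LSP frequencies. It is an assumption that sections 218 and 219 follow this convention; the names `poly_mul` and `lsp_to_lpc` are hypothetical, and only even orders $p$ are handled.

```python
import math


def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsp_to_lpc(lsp):
    """Convert p ordered LSP frequencies (radians, p even) to [1, a_1, ..., a_p]."""
    p = len(lsp)
    if p % 2:
        raise ValueError("this sketch handles even orders only")
    sym = [1.0, 1.0]    # P(z) carries the fixed root at z = -1
    anti = [1.0, -1.0]  # Q(z) carries the fixed root at z = +1
    for k, w in enumerate(lsp):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if k % 2 == 0:
            sym = poly_mul(sym, factor)    # 1st, 3rd, ... frequencies belong to P(z)
        else:
            anti = poly_mul(anti, factor)  # 2nd, 4th, ... frequencies belong to Q(z)
    # A(z) = (P(z) + Q(z)) / 2; the z^-(p+1) terms cancel, so keep p + 1 coefficients.
    return [(s + q) / 2.0 for s, q in zip(sym, anti)][: p + 1]


# Equally spaced LSPs describe a flat spectrum, so A(z) collapses to (almost exactly) 1.
flat = [math.pi * i / 11 for i in range(1, 11)]
print([round(c, 12) for c in lsp_to_lpc(flat)])
```

With two such coefficient sets, one for $1/A_1(z)$ and one for $A_2(z)$, the cascade of Figure 1 is an all-pole filter followed by an all-zero filter.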
Preferred methods for the LSP modification operation are, first, a proportional division modification and, second, an expansion of the adjacent dimension-to-dimension distances. The proportional division modification is a method in which $\omega_i$ is proportionally divided using modification coefficients $\nu$ and $\eta$ that satisfy $0 \le \nu \le \eta < 1$ as proportional division ratios. When this method is executed in the configuration of Figure 1, the LSP modification sections 216 and 217 each have a functional configuration including a proportional division operation section 220 and a gradient setting section 221, as shown in Figure 4, for example. The proportional division operation section 220 generates $\omega h1_i$ or $\omega h2_i$ according to the following expressions for proportional division:

$\omega h1_i = \omega_i \times (1 - \nu) + \omega f_i \times \nu$
$\omega h2_i = \omega_i \times (1 - \eta) + \omega f_i \times \eta$ ... (6)

where $i = 1, 2, \dots, p$. The gradient setting section 221 sets $\omega f_i$ in the proportional division operation section 220 based on the linear prediction order $p$. It should be noted that the $\omega f_i$ used in the LSP modification section 216 may differ from the $\omega f_i$ used in section 217. The modification of $\omega_i$ by proportional division may also be applied to the configuration of Figure 2. A first advantage of the proportional division is that it ensures an improved formant enhancement effect. That is, when $\omega h1_i$ and $\omega h2_i$ generated by proportional division are transformed from the LSP domain into the LPC domain, the formants become weak, with the result that a good formant enhancement effect can be obtained. "The formants become weak" here means that "the peaks of the formants become small", in other words "the spectral characteristics are flattened while the spectrum retains something of its peak-valley structure".
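Expression (6) is a per-dimension interpolation between the received LSP and a target LSP. A minimal sketch, assuming both vectors are plain lists of radians; the function name `proportional_division` and the sample values are illustrative only.

```python
import math


def proportional_division(lsp, target, coeff):
    """Expression (6): move each LSP toward the target by the ratio coeff (0 <= coeff < 1)."""
    if not 0.0 <= coeff < 1.0:
        raise ValueError("coeff must satisfy 0 <= coeff < 1")
    return [w * (1.0 - coeff) + wf * coeff for w, wf in zip(lsp, target)]


lsp = [0.2, 0.5, 1.4, 2.0]                       # illustrative input, p = 4
target = [math.pi * i / 5 for i in range(1, 5)]  # equally spaced target
h1 = proportional_division(lsp, target, 0.5)     # nu  = 0.5, for 1/A1(z)
h2 = proportional_division(lsp, target, 0.8)     # eta = 0.8, for A2(z)
print(h1)
print(h2)
```

Because each output is a weighted average of two vectors that both satisfy the ordering condition (5), the output satisfies it as well, so no separate stability repair is needed after this step.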
A second advantage of the proportional division is that it ensures a high degree of freedom in designing the characteristics in accordance with the demands of the users, such as the nature and the degree of modification of the synthesized speech signals for each frequency band. In particular, by designing $\omega f_i$ in addition to $\nu$ and $\eta$, the characteristics of the filter 203 can be varied so as to meet the demands of the users. This high degree of design freedom means that, within a range of allowable spectral gradients, a formant enhancement effect exceeding that of the conventional techniques can easily be obtained. There are several methods for setting $\omega f_i$.

A first method is to set the LSP representative of a flat spectrum as $\omega f_i$. The gradient setting section 221 implemented in accordance with this method sets $\omega f_i$ in such a way that the adjacent dimension-to-dimension distance of $\omega f_i$ ($= \omega f_i - \omega f_{i-1}$) takes the constant value $\pi / (p + 1)$, according to the following expression

$\omega f_i = \pi \times i / (p + 1)$ ... (7)

Figure 5 conceptually illustrates, by way of example, the proportional division modification operation that takes place when $\omega f_i$ is set according to expression (7). It is assumed here that $p = 10$. This method has the advantage of the functional simplicity of the gradient setting section 221.

A second method is to set the LSP representative of a fixed-gradient spectrum as $\omega f_i$. The gradient setting section 221 implemented in accordance with this method sets $\omega f_i$ in such a way that the adjacent dimension-to-dimension distance of $\omega f_i$ simply increases or decreases linearly, according to the following expression obtained by adding a term $S(i)$ that depends on $i$ to the right side of expression (7)

$\omega f_i = \pi \times i / (p + 1) + S(i)$ ... (7a)

In this case, those skilled in the art can easily see from the foregoing description and from the description of Figure 1 how the proportional division modification operation takes place. This method has the advantage, first, of allowing the sharpness to be controlled through the setting of the proportionality coefficient of $\omega f_i$, since a substantially fixed gradient can be imparted to the characteristics of the filter 203. Second, it has the advantage of allowing the processing steps to be reduced, since the transfer function $H(z)$ of the filter 203 can include a high-frequency enhancement characteristic, so that a high-frequency enhancement process can be carried out simultaneously with the ordinary formant enhancement process. Third, it has the advantage of being able to adaptively suppress the variation in sharpness by changing $S(i)$ to $S(\omega)$ and modifying its functional block as indicated by the dotted line in Figure 5.

A third method is to set as $\omega f_i$ an LSP obtained by modifying the LSP representative of an average noise spectrum, for example by means of the proportional division process. The gradient setting section 221 implemented in accordance with this method sets $\omega f_i$, as shown in Figure 6, by modifying the LSP $\bar{\omega}_i$ representative of the average noise spectrum on the basis of the proportional division ratio $\nu'$ or $\eta'$, in accordance with the following expression

$\omega f_i = \bar{\omega}_i \times (1 - \nu') + \dots \times \nu'$ or $\omega f_i = \bar{\omega}_i \times (1 - \eta') + \dots \times \eta'$ ... (7b)

where $i = 1, 2, \dots, p$. The advantage of this method lies in improved intelligibility, due to the capability of selectively enhancing the speech spectrum rather than the noise spectrum. Incidentally, $\bar{\omega}_i$ can be obtained by averaging, through an averaging operation section 223, the $\omega_i$ within a period that has been judged to be a noise period by a judgment section 222 shown in Figure 6.
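The first two ways of setting the target LSP, expressions (7) and (7a), can be sketched as follows. The specification does not define $S(i)$; the quadratic form returned by `example_tilt` is a hypothetical choice made here only so that the adjacent distances change linearly with $i$ while the end points 0 and $\pi$ stay fixed.

```python
import math


def flat_target(p):
    """Expression (7): equally spaced LSPs, representing a flat spectrum."""
    return [math.pi * i / (p + 1) for i in range(1, p + 1)]


def tilted_target(p, tilt):
    """Expression (7a): flat target plus a term S(i) supplied as the callable tilt."""
    return [w + tilt(i) for i, w in enumerate(flat_target(p), start=1)]


def example_tilt(p, c):
    """Hypothetical S(i) = c * i * (p + 1 - i) / (p + 1)^2 (an assumption, not from the text)."""
    return lambda i: c * i * (p + 1 - i) / (p + 1) ** 2


p = 10
flat = flat_target(p)
tilted = tilted_target(p, example_tilt(p, 1.0))
gaps = [b - a for a, b in zip([0.0] + tilted, tilted + [math.pi])]
print(flat[:3])
print(gaps[0], gaps[-1])  # adjacent distances shrink steadily from low to high dimensions
```

Any $S(i)$ used in practice would have to keep the target ordered as in condition (5); for this example form, that holds as long as $c$ stays below $\pi (p + 1) / p$.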
It is also preferable that the modification process by which $\omega f_i$ is set not impart too extreme a spectral variation to the modified synthesized speech signals. For example, if $\omega f_i$ is weakened sufficiently, any extreme spectral variation can be prevented from occurring in the modified synthesized speech signals. A fourth method is to set as $\omega f_i$ an LSP obtained by modifying, for example by means of the proportional division process, an average value of $\omega_i$ over the period from the start of operation up to the present, or over a predetermined past period. As shown in Figure 7, the gradient setting section 221 implemented by this method obtains an average value $\bar{\omega}_i$ of the LSP $\omega_i$ through the averaging operation section 223 and sets $\omega f_i$ on the basis of this $\bar{\omega}_i$ and the proportional division ratio $\nu'$ or $\eta'$, in accordance with expression (7b). The advantage of this method lies in the improved intelligibility attributable to the capability of enhancing variations in the speech spectrum. It is also preferable, in executing this method, that consideration be given, for example in the modification of $\bar{\omega}_i$, so as not to impart spectral variations that are too extreme to the modified synthesized speech signals. Referring now to Figure 8, there are shown the frequency versus logarithmic power spectrum characteristics of the filter 203 shown in Figure 1 that appear when $\omega_i$ is modified in accordance with expressions (6) and (7).
In the graph, A, B, C and D respectively represent the characteristics of the synthesizer 202 $= 1/A(z)$, the characteristics of the filter 204 $= 1/A_1(z)$, the inverted characteristics of the filter 205 $= 1/A_2(z)$, and the transfer function $H(z)$ of the filter 203 $= A_2(z)/A_1(z)$, with $\nu = 0.5$ and $\eta = 0.8$. As shown in this graph, the characteristic D of this graph is flattened while leaving the peak-valley structure of the spectrum to a certain degree, in comparison with the characteristic D of Figure 33. In this way, Figure 8 shows a better formant enhancement effect than Figure 33. The characteristic D of this graph also presents less distortion, with respect to the peak-valley structure of the spectrum, than the characteristics of Figure 34. Additionally, the characteristic D of this graph no longer shows the two phenomena that have been observed in characteristics B and C of Figure 35, that is, the displacement of the formants toward lower frequencies and the merging of formants at the midpoint. As an alternative to the proportional division process, any other process that has the effect of weakening the formants in the LSP domain can be used to obtain similar advantages. The modified synthesized speech derived from the filter 203 of this embodiment, which modifies $\omega_i$ according to the method represented by expressions (6) and (7), has been compared by listening with the modified synthesized speech derived from the filter 203 of the prior art described earlier. As a result, it has been concluded that the speech modification filter of this embodiment has an advantage over the prior-art filter in terms of suppressing the degradation of sharpness, and that it does not cause any distorted speech or fluctuating tone.
The expansion of the adjacent dimension-to-dimension distances, which is a second preferred form of the LSP modification operation, can be performed by an expansion section 224 and a uniform compression section 225, as shown in Figure 9. The expansion section 224 generates $s_i$ by changing $\omega_i$, where both $s_i$ and $\omega_i$ belong to the LSP domain, so that the adjacent dimension-to-dimension distance $s_i - s_{i-1}$ becomes larger than the adjacent dimension-to-dimension distance $\omega_i - \omega_{i-1}$ (with respect to $\omega_i - \omega_{i-1}$, see Figure 5). The uniform compression section 225 derives $\omega h1_i$ from $s_i$. It should be noted in particular that $s_i$, like $\omega_i$, is a multidimensional vector. When this method is executed in the configuration of Figure 2, the uniform compression section 225 finds $\omega h1_i$ according to the following expression

$\omega h1_i = s_i \times \pi / s_{p+1}$ ... (8)

and the expansion section 224 finds $s_i$ according to the following expression

$s_i = s_{i-1} + \max(\omega_i - \omega_{i-1},\ th)$ ... (9)

where $i = 1, 2, \dots, p + 1$; $\omega_0 = 0$; $\omega_{p+1} = \pi$; $s_0 = 0$; and $th$ is a threshold value.

As is apparent from expressions (8) and (9), the expansion of the adjacent dimension-to-dimension distances is a process that ensures at least a distance $th$ between the $(i-1)$-th dimension and the $i$-th dimension; the result of comparing $\omega_i - \omega_{i-1}$ with $th$ is reflected, in particular, in the second term on the right side of expression (9). This process also causes the LSP associated with the $(i+1)$-th and higher dimensions to shift upward by a distance corresponding to $th - (\omega_i - \omega_{i-1})$. The factor $\pi / s_{p+1}$ on the right-hand side of expression (8) is a factor for uniformly compressing the adjacent dimension-to-dimension distances according to the ratio between the range of $\omega_i$, from 0 to $\pi$, and the range of $s_i$, from 0 to $s_{p+1}$.
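The two steps described above, widening every gap narrower than the threshold as in expression (9) and then rescaling the result back into the range from 0 to $\pi$ as in expression (8), can be sketched in a few lines. The function name `expand_distances` and the sample values are illustrative, and the threshold is assumed to be expressed in radians.

```python
import math


def expand_distances(lsp, th):
    """Widen narrow LSP gaps to at least th (9), then compress uniformly to (0, pi) (8)."""
    omega = [0.0] + list(lsp) + [math.pi]   # omega_0 = 0 and omega_{p+1} = pi
    s = [0.0]                               # s_0 = 0
    for i in range(1, len(omega)):
        s.append(s[-1] + max(omega[i] - omega[i - 1], th))
    scale = math.pi / s[-1]                 # the factor pi / s_{p+1}
    return [x * scale for x in s[1:-1]]     # modified LSPs for i = 1..p


lsp = [0.5, 0.6, 2.0]                       # the 0.1 rad gap marks a sharp formant
print(expand_distances(lsp, 0.3))           # that gap is widened, the others shrink slightly
```

When every gap already exceeds the threshold, the sum $s_{p+1}$ equals $\pi$ and the input passes through unchanged, so the threshold directly controls which formants are weakened.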
It will be understood that the present invention should not be construed as being limited by these defining expressions, and that other defining expressions may be employed as long as they represent processes for expanding the smaller adjacent dimension-to-dimension distances. The expansion of the adjacent dimension-to-dimension distances can also be applied to the configuration of Figure 1. This would make it possible to further increase the degree of design freedom of the characteristics of the filter 203. Referring now to Figure 10, there are shown the frequency versus logarithmic power spectrum characteristics that appear when this method is applied to the filter 203 of Figure 2. In the graph, A, B and C respectively represent the characteristics of the synthesizer 202 $= 1/A(z)$, the characteristics of the filter 204 ($th = 0.3$) $= 1/A_1(z;\ th = 0.3)$, and the characteristics of the filter 204 ($th = 0.4$) $= 1/A_1(z;\ th = 0.4)$. As is apparent from this graph, this method allows characteristics comparable to those of Figures 33 and 34 to be obtained by the filter 204 alone (in other words, without using the filter 205 or any constituent element corresponding to it). This means that a good speech modification filter can be implemented with a filter of lower order than the known filters, and that substantially the same formant enhancement effect as conventional filters can be achieved with a smaller number of constituent elements. In addition, the modified synthesized speech obtained in this embodiment has been compared by listening with that obtained by the prior-art techniques. As a result, it has been concluded that the use of the speech modification filter of this embodiment ensures a tone quality by no means inferior
to that of the existing filters. The two kinds of modification method, namely the proportional division modification and the expansion of the adjacent dimension-to-dimension distances, are not mutually exclusive and can therefore be used in cooperation. It is conceivable, for example, that one of the LSP modification sections 216 and 217 executes the proportional division modification while the other executes the expansion of the adjacent dimension-to-dimension distances. Alternatively, as shown in Figure 11, a configuration including switching means 228 and 229 may be employed to selectively use a proportional division modification section 226, which modifies $\omega_i$ by proportional division, and an adjacent-distance expansion section 227, which expands the adjacent dimension-to-dimension distances of the LSP. The proportional division modification section 226 may have any of the configurations described above and shown in Figures 4, 6 and 7. Alternatively, as shown in Figure 12, a configuration could be employed in which the proportional division modification section 226 is connected in cascade with the adjacent-distance expansion section 227. By virtue of these configurations, in which a single LSP modification section serves as both the proportional division modification section 226 and the adjacent-distance expansion section 227, the design freedom of the characteristics of the filter 203 can be increased further. It is also contemplated that the order of the proportional division modification section 226 and the adjacent-distance expansion section 227 shown in Figure 12 be reversed. Naturally, other processes can be combined with either or both of the proportional division modification and the expansion of the adjacent dimension-to-dimension distances.

In addition, a process adaptive to $\omega_i$ can be executed by the LSP modification sections 216 and 217. Conceivable as a method of making the modification process adaptive to $\omega_i$ on the basis of proportional division is, for example, a method in which the space of $\omega_i$ is divided into a plurality of mutually non-overlapping subspaces (hereinafter referred to as categories) and in which $\nu$ and $\eta$ are prepared (or switched) for each category. In this case, an LSP modification section can be provided for each category, for example an LSP modification section 216-1 (or 217-1) corresponding to the first category, an LSP modification section 216-2 (or 217-2) corresponding to the second category, ... and an LSP modification section 216-N (or 217-N) corresponding to an N-th category (see Figure 13). Alternatively, a single LSP modification section 216 (or 217) can be provided together with a modification coefficient switching section 230 that switches $\nu$ and $\eta$ in response to the category of $\omega_i$ (see Figure 14). The adaptive process has the advantage of realizing flexible processing which, for example, allows the formant enhancement to be attenuated only for a specific category, such as a category that gives rise to distortions, while strengthening the formant enhancement overall. This would allow uniform enhancement with less distortion in the characteristics of the filter 203. It will be appreciated that, because $\omega_i$ is a multidimensional vector, the category referred to herein is in general a region of a multidimensional vector space.

It is preferable that the modification process of $\omega_i$ in the LSP modification sections 216 and 217 be implemented simply by use of a translation table 231, as shown in Figure 15. More specifically, the translation table 231 is prepared so as to correlate $\omega_i$ with $\omega h1_i$ or $\omega h2_i$, allowing the LSP modification section 216 or 217 to provide $\omega h1_i$ or $\omega h2_i$ as its output when supplied with $\omega_i$. The advantage of using the translation table 231 lies in the reduction of the processing time. This advantage becomes more considerable if a relatively complex expression is used as the defining expression for the modification of $\omega_i$.

The modification process of $\omega_i$ in the LSP modification sections 216 and 217 can also be implemented by a neural network 232 that has previously learned the modification characteristics of $\omega_i$ given, for example, by expression (6), as shown in Figure 16. A first advantage of using the neural network 232 lies in the reduction of the processing time. This advantage becomes more considerable if a relatively complex expression is used as the defining expression for the modification process of $\omega_i$. A second advantage of using the neural network 232 is that memory capacity can be reduced compared with the case of using the translation table 231, because there is no need to store the translation table 231. A third advantage of using the neural network 232 lies in the reduction of distortion. For example, in the adaptive embodiments shown in Figures 13 and 14, distortions often appear at the category boundaries in the modified synthesized speech signal, due to the abrupt change of $\nu$ and $\eta$ that arises from a slight variation of $\omega_i$ across a category boundary. Distortions tend to be perceptible particularly when the division of the space of $\omega_i$ is relatively coarse. In the translation table embodiment shown in Figure 15, distortions often appear at the boundaries of the table addresses, in the same way as in the embodiments of Figures 13 and 14. In contrast, in the neural network embodiment shown in Figure 16, such distortions do not arise, since there is no category that causes a sudden change of $\nu$ and $\eta$.
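The category-switched arrangement of Figure 14 discussed above can be illustrated with a short sketch. The specification does not say how categories are defined; the nearest-centroid rule, the centroid values and the per-category coefficients below are all hypothetical choices made for illustration.

```python
import math


def nearest_category(lsp, centroids):
    """Hypothetical classifier: index of the centroid closest to the LSP vector."""
    def dist2(c):
        return sum((w - x) ** 2 for w, x in zip(lsp, c))
    return min(range(len(centroids)), key=lambda k: dist2(centroids[k]))


def adaptive_modify(lsp, target, centroids, coeffs):
    """Proportional division whose ratio is switched per category (cf. section 230)."""
    nu = coeffs[nearest_category(lsp, centroids)]
    return [w * (1.0 - nu) + wf * nu for w, wf in zip(lsp, target)]


centroids = [[0.3, 0.9, 1.5], [1.0, 1.8, 2.6]]    # two illustrative categories, p = 3
coeffs = [0.2, 0.6]                               # milder modification for category 0
target = [math.pi * i / 4 for i in range(1, 4)]   # equally spaced target
print(adaptive_modify([0.35, 0.8, 1.4], target, centroids, coeffs))
print(adaptive_modify([1.10, 1.9, 2.5], target, centroids, coeffs))
```

Two inputs lying just on either side of a category boundary receive different ratios, which is the source of the boundary distortion mentioned above and the motivation given for a smoothly interpolating neural network.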
The LSP-based embodiment of the present invention is not intended to be limited to the configuration that performs LPC filtering and inverted LPC filtering, and allows parameters other than LPC to be used as its filter coefficients. For example, as shown in Figures 17 and 18, the present invention can be implemented by the use of an LSP filter 233 (and an inverted LSP filter 234) that uses $\omega h1_i$ (and $\omega h2_i$) as its filter coefficients. The advantage of this configuration is that there is no need for the LSP/LPC transformation sections 218 and 219.

b) PARCOR-based embodiment
Referring now to Figure 19, an embodiment that receives PARCOR as the spectral information is shown. This embodiment includes PARCOR modification sections 235 and 236 and PARCOR/LPC transformation sections 237 and 238, in addition to the LPC filter 204 and the inverted LPC filter 205. The PARCOR modification section 235 receives the PARCOR $\phi_i$ as the spectral information from the decoder 201 or the transformation section 215 and modifies this $\phi_i$ to generate the modified PARCOR $\phi h1_i$. In the same way, the PARCOR modification section 236 generates the modified PARCOR $\phi h2_i$. The PARCOR/LPC transformation section 237 transforms $\phi h1_i$ from the PARCOR domain into the LPC domain to generate a filter coefficient $\alpha 1_i$ for the LPC filter 204. The PARCOR/LPC transformation section 238 likewise transforms $\phi h2_i$ from the PARCOR domain into the LPC domain to generate a filter coefficient $\alpha 2_i$ for the inverted LPC filter 205. The PARCOR modification sections 235 and 236 generate $\phi h1_i$ and $\phi h2_i$, respectively, using modification coefficients $\nu$ and $\eta$ that satisfy, for example, $0 \le \eta \le \nu < 1$, according to the following expressions

$\phi h1_i = \phi_i \times \nu$
$\phi h2_i = \phi_i \times \eta$ ... (10)

where $i = 1, 2, \dots, p$. The execution of this modification allows the formants to be weakened in the PARCOR domain. Consequently, this embodiment ensures the same characteristic effects as the LSP-based embodiment detailed above (for example, the formant enhancement effect and the improved adjustability of this enhancement), as well as free control and adjustment of the characteristics of the filter 203 in accordance with the demands of the users. Naturally, the present invention should not be construed as being limited by expression (10), and other processes that weaken the formants within the PARCOR domain can be used.
Furthermore, for a filter that uses as its filter coefficients the PARCOR, or parameters generated on the basis of PARCOR, it is relatively easy to verify and ensure stability in the PARCOR domain, since the stability condition is given by the following simple expression:
-1 < k_i < 1 ... (11)
In other words, as long as expression (11) is satisfied, the filter using PARCOR-based filter coefficients is stable. Therefore, according to this embodiment, the degree of freedom of the filter design is improved. For example, as the PARCOR modification process, a process that modifies the PARCOR k_i independently for each i can be used. In addition, application to systems that transmit or store PARCOR as the spectral information ensures good connectivity, because neither reanalysis of the spectrum nor transformation of the parameters is needed. Figure 20 graphically depicts the frequency versus logarithmic power characteristics of the filter 203 in Figure 19. In the graph, A, B, C and D denote respectively the characteristics of the synthesizer 202 = 1/A(z), the characteristics of the filter 204 = 1/A1(z), the inverse characteristics of the filter 205 = 1/A2(z), and the characteristics of the filter 203 = A2(z)/A1(z), with ν = 0.98 and η = 0.9. As is apparent from a comparison between Figures 20 and 33, this embodiment lets the peak-valley structure of the spectrum appear somewhat more strongly than the configuration of reference 1. Through auditory comparisons of the modified synthesized speech, the present inventor has ascertained that the use of the filter 203 of this embodiment causes neither distorted or peculiar speech nor fluctuating tone, and ensures a good formant enhancement effect.
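The stability check in the PARCOR domain can be illustrated with a short sketch. The function names and the per-dimension gains are hypothetical, and the sign convention A(z) = 1 + Σ a_i z^-i with k_m = a_m is an assumption. The first function tests the condition that every PARCOR coefficient lies strictly between -1 and 1; the step-down recursion shows how the same test could be applied to a filter that is only available in the LPC domain.

```python
def parcor_is_stable(k):
    """True when every PARCOR coefficient lies strictly inside (-1, 1)."""
    return all(-1.0 < ki < 1.0 for ki in k)


def lpc_to_parcor(a):
    """Step-down recursion: LPC a_1..a_p -> PARCOR k_1..k_p.

    Assumed convention: A(z) = 1 + sum_i a_i z^-i, with k_m = a_m at order m.
    Returns None as soon as some |k_m| >= 1, i.e. when 1/A(z) is not stable.
    """
    a = list(a)
    k = [0.0] * len(a)
    for m in range(len(a), 0, -1):
        km = a[m - 1]
        if abs(km) >= 1.0:
            return None
        k[m - 1] = km
        # Reduce the order-m coefficients to order m-1.
        a = [(a[j] - km * a[m - 2 - j]) / (1.0 - km * km) for j in range(m - 1)]
    return k


def lpc_is_stable(a):
    """Stability of 1/A(z), decided through the PARCOR condition."""
    return lpc_to_parcor(a) is not None


# Modifying each dimension independently with a gain in [0, 1]
# cannot move a coefficient outside (-1, 1).
k = [0.9, -0.95, 0.6]
gains = [1.0, 0.5, 0.0]  # hypothetical per-dimension gains
modified = [ki * g for ki, g in zip(k, gains)]
```

The point of the sketch is that the test is a simple range check per dimension, which is why a modification rule can be chosen freely for each i without a separate stability analysis.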
It will be obvious to those skilled in the art, from the disclosure of this specification, that the details of this PARCOR-based embodiment can be modified from the same point of view as the LSP-based embodiment. It will also be easy for those skilled in the art, from the disclosure of this specification, to omit the inverse LPC filtering and the constituent elements associated with it, as shown in Figure 21, or to adopt a configuration that includes a PARCOR filter 239 and an inverse PARCOR filter 240 with the modified PARCOR kh1_i and kh2_i used as filter coefficients, as shown in Figure 22.
c) Embodiment based on LAR
Figure 23 shows an embodiment that uses LAR as the spectral information. This embodiment comprises, in addition to the LPC filter 204 and the inverse LPC filter 205, LAR modification sections 241 and 242 and LAR/LPC transformation sections 243 and 244. The LAR modification section 241 receives the LAR γ_i as the spectral information from the decoder 201 or the transformation section 215 and modifies this γ_i to generate the modified LAR γh1_i. In the same way, the LAR modification section 242 generates the modified LAR γh2_i. The LAR/LPC transformation section 243 transforms γh1_i from the LAR domain into the LPC domain to generate a filter coefficient a1_i for the LPC filter 204. The LAR/LPC transformation section 244 transforms γh2_i from the LAR domain into the LPC domain to generate a filter coefficient a2_i for the inverse LPC filter 205. The LAR modification sections 241 and 242 generate γh1_i and γh2_i, respectively, using modification coefficients ν and η that satisfy, for example, 0 ≤ η < ν < 1, in accordance with the following expressions:
γh1_i = γ_i × ν^i
γh2_i = γ_i × η^i ... (12)
where i = 1, 2, ..., p. This modification weakens the formants within the LAR domain.
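A sketch of the LAR modification is given below. It rests on assumptions that are not stated in the text: the LAR is taken as log((1 + k_i)/(1 - k_i)) for a PARCOR coefficient k_i (some texts use the opposite sign), each LAR is scaled by the i-th power of the modification coefficient, the function names are hypothetical, and the input values are examples only.

```python
import math


def parcor_to_lar(k):
    """Assumed definition: LAR_i = log((1 + k_i) / (1 - k_i)) = 2 * atanh(k_i)."""
    return [2.0 * math.atanh(ki) for ki in k]


def lar_to_parcor(g):
    """Inverse of the assumed definition: k_i = tanh(LAR_i / 2)."""
    return [math.tanh(gi / 2.0) for gi in g]


def modify_lar(g, coef):
    """Scale the i-th LAR by coef**i (i = 1..p)."""
    return [gi * coef ** i for i, gi in enumerate(g, start=1)]


# Example values (hypothetical, for illustration only).
k = [0.5, -0.4, 0.3]
nu, eta = 0.9, 0.7  # modification coefficients with 0 <= eta < nu < 1
g = parcor_to_lar(k)
kh1 = lar_to_parcor(modify_lar(g, nu))   # PARCOR equivalent of the first modified LAR set
kh2 = lar_to_parcor(modify_lar(g, eta))  # PARCOR equivalent of the second modified LAR set
```

In exact arithmetic tanh maps every finite LAR into the open interval (-1, 1), so any modification carried out in the LAR domain yields a stable filter; in floating point, very large LAR magnitudes can round to exactly ±1 and may need clamping.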
Consequently, this embodiment provides the same improvements as the LSP-based embodiment and the PARCOR-based embodiment described above (for example, the formant enhancement effect and the improved adjustability of the degree of enhancement), as well as free control and adjustment of the characteristics of the filter 203 in accordance with the demands of the users. Naturally, the present invention should not be construed as being limited to expression (12); other processes that weaken the formants within the LAR domain can be used. Since the stability of the filter is always guaranteed when filter coefficients generated on the basis of LAR are used, the LAR modification process in this embodiment is not restricted by the filter stability condition. Therefore, the degree of freedom of the filter design in this embodiment is higher than in the preceding embodiments. In addition, application to systems that transmit or store LAR as the spectral information ensures good connectivity, because neither reanalysis of the spectrum nor transformation of the parameters is needed. Figure 24 graphically depicts the frequency versus logarithmic power characteristics of the filter 203 in Figure 23. In the graph, A, B, C and D denote respectively the characteristics of the synthesizer 202 = 1/A(z), the characteristics of the filter 204 = 1/A1(z), the inverse characteristics of the filter 205 = 1/A2(z), and the characteristics of the filter 203 = A2(z)/A1(z), with ν = 0.9 and η = 0.7. A comparison between Figures 24 and 33 reveals that this embodiment allows the spectrum to be leveled while leaving its peak-valley structure to some degree, resulting in a better formant enhancement effect than the configuration described in reference 1.
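Curves such as A to D in Figures 20 and 24 are log-power responses evaluated on the unit circle. A minimal sketch of how the response of the filter 203 = A2(z)/A1(z) could be computed is shown below. The function names, the first-order coefficient sets and the frequency grid are hypothetical, and the convention A(z) = 1 + Σ a_i z^-i is assumed.

```python
import cmath
import math


def poly_response(a, omega):
    """A(e^{j*omega}) for A(z) = 1 + sum_i a_i z^-i (assumed convention)."""
    return 1.0 + sum(ai * cmath.exp(-1j * omega * i) for i, ai in enumerate(a, start=1))


def postfilter_log_power_db(a1, a2, omega):
    """Log power, in dB, of A2(z)/A1(z) at normalized angular frequency omega.

    Raises ValueError if A2 has a zero exactly at this frequency.
    """
    num = abs(poly_response(a2, omega))
    den = abs(poly_response(a1, omega))
    return 20.0 * math.log10(num / den)


# Hypothetical first-order coefficient sets, only to exercise the functions.
a1 = [-0.9]
a2 = [-0.5]
grid = [math.pi * n / 8 for n in range(9)]  # 0 .. pi
curve = [(w, postfilter_log_power_db(a1, a2, w)) for w in grid]
```

Plotting the second element of each pair in `curve` against the first gives a characteristic of the same kind as curve D; curves B and C follow by passing an empty list as `a2` or `a1` respectively.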
Also, compared with Figure 34, Figure 24 shows fewer distortions in the peak-valley structure of the spectrum. In Figure 24 the phenomenon in which two formants merge at their midpoint no longer occurs, as becomes apparent from a comparison with characteristics B and C of Figure 35. Through auditory comparisons of the modified synthesized speech, the present inventor has ascertained that the use of the filter 203 of this embodiment causes neither distorted or peculiar speech nor fluctuating tone, and ensures a good formant enhancement effect. It will be obvious to those skilled in the art, from the disclosure of this specification, that the details of this LAR-based embodiment can be modified from the same point of view as the LSP-based embodiment and the PARCOR-based embodiment. It will also be readily conceivable for those skilled in the art, from the disclosure of this specification, to omit the inverse LPC filtering and the constituent elements associated with it, or, as shown in Figure 26, to employ a configuration that includes a PARCOR filter 239 and an inverse PARCOR filter 240 with the modified LAR γh1_i and γh2_i used as the basis of the filter coefficients. To transform the modified LAR γh1_i and γh2_i from the LAR domain into the PARCOR domain, LAR/PARCOR transformation sections 246 and 247 are provided in Figure 26. Since the LAR/PARCOR transformation is in general simpler and easier to carry out than the LAR/LPC transformation, the LAR/PARCOR transformation sections 246 and 247 can be implemented with fewer processing steps or with smaller circuits than the LAR/LPC transformation sections 243 and 244.
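A filter that takes PARCOR directly as its coefficients, such as the PARCOR filter 239 and the inverse PARCOR filter 240, is commonly realized as a lattice structure. The sketch below is one such realization under assumptions: the function names are hypothetical, the sign convention is the one in which the inverse lattice realizes A(z) = 1 + Σ a_i z^-i with a_m = k_m at order m, and the coefficient values are examples only.

```python
def inverse_lattice(x, k):
    """All-zero lattice realizing A(z) from PARCOR coefficients k_1..k_p."""
    p = len(k)
    delay = [0.0] * p  # delay[m] holds the backward signal b_m[n-1]
    out = []
    for xn in x:
        f = xn  # forward signal f_0[n]
        b = xn  # backward signal b_0[n]
        for m in range(p):
            b_prev = delay[m]
            delay[m] = b                 # keep b_m[n] for the next sample
            f_next = f + k[m] * b_prev   # f_{m+1}[n]
            b = k[m] * f + b_prev        # b_{m+1}[n]
            f = f_next
        out.append(f)
    return out


def synthesis_lattice(x, k):
    """All-pole lattice realizing 1/A(z) from the same PARCOR coefficients."""
    p = len(k)
    delay = [0.0] * p  # delay[m] holds the backward signal b_m[n-1]
    out = []
    for xn in x:
        f = xn  # forward signal f_p[n]
        for m in range(p - 1, -1, -1):
            f = f - k[m] * delay[m]                   # f_m[n]
            if m + 1 < p:
                delay[m + 1] = k[m] * f + delay[m]    # b_{m+1}[n]
        delay[0] = f  # b_0[n] = f_0[n]
        out.append(f)
    return out


# Hypothetical modified PARCOR sets; kh1 drives the synthesis part, kh2 the inverse part.
kh1 = [0.45, -0.3]
kh2 = [0.35, -0.2]
speech = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # unit impulse as a stand-in for synthesized speech
modified = inverse_lattice(synthesis_lattice(speech, kh1), kh2)  # A2(z)/A1(z)
```

With this structure no PARCOR/LPC transformation is needed, and stability of the synthesis part follows directly from every coefficient lying strictly between -1 and 1.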
Therefore, according to the embodiment of Figure 27, the filter coefficients a1 and a2 are derived in a shorter time, and the overall processing by the filter 203 is reduced compared with the embodiments of Figures 23 and 25.
d) Supplement
It will be readily conceivable for those skilled in the art, from the description of this specification, to selectively combine the LSP-based embodiment, the PARCOR-based embodiment and the LAR-based embodiment described above. It will likewise be readily conceivable for those skilled in the art, from the description of this specification, to combine the embodiments of the present invention with the conventional LPC-based apparatus. Such combinations contribute to the implementation of a filter 203 that has a high degree of freedom in the design of its characteristics, which could not be implemented in any other way. For example, as shown in Figure 27, the filter coefficient a1_i of the filter 204 can be defined by the same method as reference 1, while the filter coefficient a2_i of the filter 205 can be defined by the same method as the PARCOR-based embodiment. This configuration leads to a filter 203 having a smaller spectral gradient than the characteristics D of Figure 33 and fewer distortions in the vicinity of the formants than the characteristics D of Figure 34.
In front of or behind the filter 203, or in parallel with the filter 203, another filter can be placed to perform pitch enhancement processing, high-frequency enhancement processing, formant enhancement processing, etc.
It is noted that, as of this date, the best method known to the applicant for carrying out the present invention is that which is clear from the present description of the invention.
Having described the invention as above, the content of the following is claimed as property:

Claims (29)

1. A filter characterized in that it comprises: a filtering means for filtering synthesized speech signals by means of a transfer function defined by filter coefficients, to generate modified synthesized speech signals; and a filter coefficient generating means for generating the filter coefficients on the basis of spectral information represented in the form of a multi-dimensional vector, corresponding to a predetermined domain and pertaining to input speech signals, so that the formant characteristics of the modified synthesized speech signals are enhanced in accordance with the spectral information and in comparison with those of the synthesized speech signals; the spectral information being any of LSP (line spectrum pair) information, PARCOR (partial autocorrelation coefficient) information and LAR (log area ratio) information.
2. A filter according to claim 1, characterized in that the filter coefficients correspond to an LPC domain (linear prediction codes).
3. A filter according to claim 2, characterized in that the filter coefficient generating means includes: a modification means for modifying the spectral information within the predetermined domain to generate modified spectral information; and a means for transforming the modified spectral information from the predetermined domain into the LPC domain to generate the filter coefficients.
4. A filter according to claim 3, characterized in that the modification means includes a leveling means for modifying the spectral information so as to reduce the formant peaks of the modified synthesized speech signals.
5. A filter according to claim 4, characterized in that the spectral information is LSP information, and wherein the leveling means includes a proportional division means for proportionally dividing, in accordance with a modification coefficient, the spectral information and reference information corresponding to the same domain as the spectral information, to generate the modified spectral information.
6. A filter according to claim 5, characterized in that the proportional division means proportionally divides the spectral information and the reference information so as to impart a fixed spectral gradient to the modified synthesized speech signals.
7. A filter according to claim 5, characterized in that the proportional division means proportionally divides the spectral information and the reference information so as to impart to the modified synthesized speech signals a spectral gradient that reflects an average noise spectrum.
8. A filter according to claim 5, characterized in that the proportional division means proportionally divides the spectral information and the reference information so as to impart to the modified synthesized speech signals a spectral gradient that reflects the history that the spectral information has followed so far.
9. A filter according to claim 4, characterized in that the spectral information is either PARCOR information or LAR information, and wherein the leveling means includes a means for multiplying, for each of a plurality of dimensions constituting the spectral information, the spectral information by a modification coefficient or by a power of the modification coefficient, to generate the modified spectral information.
10. A filter according to claim 9, characterized in that the power depends on the dimension.
11. A filter according to claim 3, characterized in that the spectral information is LSP information, and wherein the modification means includes a distance expansion means for expanding the distances between adjacent dimensions among a plurality of dimensions representing the spectral information, to generate the modified spectral information.
12. A filter according to claim 11, characterized in that the distance expansion means includes: an expansion means for expanding, to the reference distance or beyond, those distances between adjacent dimensions that are smaller than a reference distance; and a compression means for compressing the distances with respect to all the adjacent dimensions, after the expansion of the distances between the dimensions by the expansion means, so that the extent of the spectral information as a whole coincides with its extent before the expansion.
13. A filter according to claim 3, characterized in that the spectral information is LSP information, and wherein the modification means includes: a proportional division means for proportionally dividing, in accordance with a modification coefficient, the spectral information and reference information corresponding to the same domain as the spectral information; a distance expansion means for expanding the distances between adjacent dimensions among a plurality of dimensions representing the spectral information; and a switching means for selectively using either the proportional division means or the distance expansion means to generate the modified spectral information.
14. A filter according to claim 3, characterized in that the spectral information is LSP information, and wherein the modification means includes: a proportional division means for proportionally dividing, in accordance with a modification coefficient, the spectral information and reference information corresponding to the same domain as the spectral information; a distance expansion means for expanding the distances between adjacent dimensions among a plurality of dimensions representing the spectral information; and a cascade connection means for using both the proportional division means and the distance expansion means in cooperation, to generate the modified spectral information.
15. A filter according to claim 3, characterized in that the modification means includes a translation table for storing the spectral information in correspondence with the modified spectral information, the translation table outputting the modified spectral information in response to the supply of the spectral information.
16. A filter according to claim 3, characterized in that the modification means includes a neural network that has acquired, by learning, the ability to transform the spectral information into the modified spectral information, the neural network outputting the modified spectral information in response to the supply of the spectral information.
17. A filter according to claim 3, characterized in that the modification means includes a plurality of category-specific modification means, one provided for each of a plurality of categories that do not overlap one another and are obtained by classifying the predetermined domain; each of the plurality of category-specific modification means including: a means for modifying the spectral information within the corresponding category to generate the modified spectral information; and a means for transforming the modified spectral information from the predetermined domain into the LPC domain, to generate a filter coefficient.
18. A filter according to claim 3, characterized in that the modification means includes: a means for modifying, in accordance with a modification coefficient, the spectral information within the predetermined domain, to generate the modified spectral information; a means for transforming the modified spectral information from the predetermined domain into the LPC domain, to generate the filter coefficients; and a means for adjusting the modification coefficient in accordance with the category to which the spectral information belongs among a plurality of categories that are obtained by dividing the predetermined domain and do not overlap one another.
19. A filter according to claim 1, characterized in that the filter coefficients correspond to either an LSP domain or a PARCOR domain.
20. A filter according to claim 19, characterized in that the filter coefficient generating means includes: a modification means for modifying the spectral information within the predetermined domain, to generate modified spectral information; and a means for supplying the modified spectral information, as the filter coefficients, to the filtering means.
21. A filter according to claim 1, characterized in that the filtering means includes a synthesis filter for implementing the denominator of the transfer function, so that the formant characteristics of the modified synthesized speech signals are enhanced compared with those of the synthesized speech signals.
22. A filter according to claim 21, characterized in that the filtering means also includes an inverse filter for suppressing a spectral gradient imparted to the modified synthesized speech signals by the synthesis filter.
23. A speech synthesizer apparatus, characterized in that it comprises: a means for generating synthesized speech signals on the basis of spectral information represented in the form of a multi-dimensional vector, corresponding to a predetermined domain and pertaining to input speech signals; a means for filtering the synthesized speech signals by a transfer function defined by filter coefficients, to generate modified synthesized speech signals; and a means for generating the filter coefficients on the basis of the spectral information, so that the formant characteristics of the modified synthesized speech signals are enhanced in accordance with the spectral information and in comparison with those of the synthesized speech signals; the spectral information being any of LSP information, PARCOR information and LAR information.
24. A speech synthesizer apparatus, characterized in that it comprises: a means for generating synthesized speech signals on the basis of first spectral information represented in the form of a multi-dimensional vector, corresponding to a predetermined domain and pertaining to input speech signals; a means for transforming the first spectral information into second spectral information corresponding to a domain different from the predetermined domain; a means for filtering the synthesized speech signals by means of a transfer function defined by filter coefficients, to generate modified synthesized speech signals; and a means for generating the filter coefficients on the basis of the second spectral information, so that the formant characteristics of the modified synthesized speech signals are enhanced in accordance with the second spectral information and in comparison with those of the synthesized speech signals; the spectral information being any of LSP information, PARCOR information and LAR information.
25. A speech synthesizer apparatus, characterized in that it comprises: a means for generating synthesized speech signals on the basis of first spectral information represented in the form of a multi-dimensional vector, corresponding to a predetermined domain and pertaining to input speech signals; a means for analyzing the synthesized speech signals, to generate second spectral information; a means for filtering the synthesized speech signals by a transfer function defined by filter coefficients, to generate modified synthesized speech signals; and a means for generating the filter coefficients on the basis of the second spectral information, so that the formant characteristics of the modified synthesized speech signals are enhanced in accordance with the second spectral information and in comparison with those of the synthesized speech signals; the spectral information being any of LSP information, PARCOR information and LAR information.
26. A speech storage/transmission system, characterized in that it comprises: a means for analyzing input speech signals, to generate spectral information represented in the form of a multi-dimensional vector, corresponding to a predetermined domain and pertaining to the input speech signals; a means for storing or transmitting the spectral information; a means for generating synthesized speech signals on the basis of the spectral information that has been stored or transmitted; a means for filtering the synthesized speech signals by means of a transfer function defined by filter coefficients, to generate modified synthesized speech signals; and a means for generating the filter coefficients on the basis of the spectral information, so that the formant characteristics of the modified synthesized speech signals are enhanced in accordance with the spectral information and in comparison with those of the synthesized speech signals; the spectral information being any of LSP information, PARCOR information and LAR information.
27. A speech storage/transmission system, characterized in that it comprises: a means for analyzing input speech signals, to generate first spectral information represented in the form of a multi-dimensional vector, corresponding to a predetermined domain and pertaining to the input speech signals; a means for storing or transmitting the first spectral information; a means for generating synthesized speech signals on the basis of the first spectral information that has been stored or transmitted; a means for transforming the first spectral information into second spectral information corresponding to a domain different from the predetermined domain; a means for filtering the synthesized speech signals by means of a transfer function defined by filter coefficients, to generate modified synthesized speech signals; and a means for generating the filter coefficients on the basis of the second spectral information, so that the formant characteristics of the modified synthesized speech signals are enhanced in accordance with the second spectral information and in comparison with those of the synthesized speech signals; the spectral information being any of LSP information, PARCOR information and LAR information.
28. A speech storage/transmission system, characterized in that it comprises: a means for analyzing input speech signals, to generate first spectral information represented in the form of a multi-dimensional vector, corresponding to a predetermined domain and pertaining to the input speech signals; a means for storing or transmitting the first spectral information; a means for generating synthesized speech signals on the basis of the first spectral information that has been stored or transmitted; a means for analyzing the synthesized speech signals, to generate second spectral information; a means for filtering the synthesized speech signals by a transfer function defined by filter coefficients, to generate modified synthesized speech signals; and a means for generating the filter coefficients on the basis of the second spectral information, so that the formant characteristics of the modified synthesized speech signals are enhanced in accordance with the second spectral information and in comparison with those of the synthesized speech signals; the spectral information being any of LSP information, PARCOR information and LAR information.
29. A speech modification method, characterized in that it comprises: a first step of filtering synthesized speech signals by a transfer function defined by filter coefficients, to generate modified synthesized speech signals; and a second step of generating the filter coefficients on the basis of spectral information represented in the form of a multi-dimensional vector, corresponding to a predetermined domain and pertaining to the synthesized speech signals, so that the formant characteristics of the modified synthesized speech signals are enhanced in accordance with the spectral information and in comparison with those of the synthesized speech signals; the second step preceding the execution of the first step; the spectral information being any of LSP information, PARCOR information and LAR information.
SUMMARY OF THE INVENTION
In the present invention, a filter for speech modification or enhancement is described, together with an apparatus, system and method using it. Synthesized speech signals are filtered to generate modified synthesized speech signals. From spectral information represented as a multi-dimensional vector, a filter coefficient is determined so that the formant characteristics of the modified synthesized speech signals are enhanced compared with those of the synthesized speech signals and in accordance with the spectral information. The spectral information can be any of LSP information, PARCOR information and LAR information. In this way, the degree of freedom in the design of the speech modification filter used for the aural suppression of the quantization noise contained in the synthesized speech signals is increased, which leads to improved intelligibility of the synthesized speech signals. A formant enhancement effect can be obtained without allowing any perceptible level of distortion to occur, within a range of permissible spectral gradients.
MXPA/A/1996/001755A 1995-05-12 1996-05-09 Filter for the modification or vocal improvement, and various apparatus, systems and method used by elmi MXPA96001755A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP7-114752 1995-05-12
HEHEI7-114752 1995-05-12
JP7114752A JP2993396B2 (en) 1995-05-12 1995-05-12 Voice processing filter and voice synthesizer

Publications (2)

Publication Number Publication Date
MX9601755A MX9601755A (en) 1997-07-31
MXPA96001755A true MXPA96001755A (en) 1997-12-01
