CN1119793C

CN1119793C - Method for composing characteristic waveform of audio signals

Info

Publication number: CN1119793C
Application number: CN 98118362
Authority: CN
Inventors: 张景嵩; 温世义; 全晨; 方国平
Original assignee: Inventec Corp
Current assignee: Inventec Corp
Priority date: 1998-08-17
Filing date: 1998-08-17
Publication date: 2003-08-27
Anticipated expiration: 2018-08-17
Also published as: CN1245326A

Abstract

The present invention relates to a method for selecting and synthesizing characteristic waveforms of audio signals. The waveform to be processed is first analyzed, and representative characteristic waveforms are selected. For storage, only the characteristic waveforms and their associated parameters need to be recorded rather than the waveform of the whole signal, which saves a large amount of memory. For subsequent reconstruction, the characteristic waveforms and their associated parameters are read out and the waveform is synthesized by interpolation. The synthesized sound quality approaches that of adaptive differential pulse code modulation (ADPCM), and the method is therefore suited to applications that use a low-speed central processing unit.

Description

Method for synthesizing characteristic waveforms of audio signals

Technical field

The present invention relates to audio signal processing technology, and in particular to a method for synthesizing characteristic waveforms of audio signals.

Background art

With the development of digital electronics, analog signal waveforms can be converted into digital signals by analog-to-digital conversion so that they can be stored, processed, and transmitted, which accelerates the circulation and sharing of electronic data.

When a signal waveform is captured and recorded conventionally, the amplitude of each sampled point is represented with eight or sixteen bits, depending on the required precision. If a waveform segment is sampled at 8K points and the quantized value of each sample is represented with eight bits, the segment occupies 64K bits. In other words, when an audio signal is recorded at a sampling rate of 8K samples per second with 8-bit quantization, each second of captured signal requires 64K bits of storage.
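The storage arithmetic above can be sketched in C as follows. This is only an illustration: the function name pcm_storage_bits and its parameters are hypothetical, and "8K" is assumed here to mean 8000 samples per second.

```c
#include <assert.h>

/* Bits needed to store uncompressed PCM audio.
 * sample_rate:     samples per second (assumed 8000 for "8K")
 * bits_per_sample: quantization depth, e.g. 8 or 16
 * channels:        number of channels
 * seconds:         duration of the recording */
unsigned long pcm_storage_bits(unsigned long sample_rate,
                               unsigned bits_per_sample,
                               unsigned channels,
                               unsigned long seconds)
{
    assert(bits_per_sample > 0 && channels > 0);
    return sample_rate * bits_per_sample * channels * seconds;
}
```

Under these assumptions, one second of 8-bit mono audio at 8000 samples per second needs 64000 bits, which corresponds to the 64K-bit figure in the text.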

Although pulse code modulation (PCM) as described above allows audio signals to be processed in real time, the large volume of data occupies a great deal of storage, which severely limits its practical use. Encoding with adaptive differential pulse code modulation (ADPCM) saves half of the storage space, but the algorithm is too complex for a low-speed CPU (such as the Z80 or 80386) to run in real time. For low-speed CPU applications, an audio signal processing method that both avoids occupying a large amount of storage and can run in real time has therefore long been desired by those in the field.

Summary of the invention

Accordingly, the main object of the present invention is to provide a method for synthesizing characteristic waveforms of audio signals that reduces the demand for storage space.

Another object of the present invention is to provide a method for synthesizing characteristic waveforms of audio signals that is suitable for low-speed CPUs that must process audio signals in real time.

To achieve the above objects, the present invention provides a method for synthesizing characteristic waveforms of audio signals, comprising the following steps: capturing an audio signal; sampling and quantizing the captured audio signal to form a WAV file; selecting characteristic waveforms, where the start and end positions of each selected characteristic waveform are preferably chosen where the amplitude equals zero and where the trend at the junction with the adjacent waveform is the same (both rising or both falling), so that the phase is consistent; storing a selected first characteristic waveform and second characteristic waveform that can represent the audio signal, together with the time interval between the two characteristic waveforms; reading the stored first and second characteristic waveforms; and synthesizing the interpolated waveforms between them by interpolation.

In the method provided by the present invention, the first characteristic waveform has period Ma and amplitude Aa[t], the second characteristic waveform has period Mb and amplitude Ab[t], and the time interval between the first characteristic waveform and the second characteristic waveform is L. The interpolated waveforms between the first and second characteristic waveforms are synthesized by interpolation. The amplitude of each interpolated waveform is:

Ar[t] = (L-k)/L × Ar′[t] + (1+k)/L × Ar′′[t];

The period of each interpolated waveform is:

Mr = Ma - r × (Ma-Mb)/(1+R);

where

R = 2L/(Ma+Mb);

Ar′[t] = Aa[(Ma/Mr) × t];

Ar′′[t] = Ab[(Mb/Mr) × t];

r = 1, 2, ..., R;

t = 0, 1, ..., Mr-1; and

k = (M1+M2+...+M(r-1)), (M1+M2+...+M(r-1)+1), ..., (M1+M2+...+M(r-1)+(Mr-1)).

Brief description of the drawings

To make the above and other objects, features, and advantages of the present invention more readily apparent, a preferred embodiment is described in detail below with reference to the accompanying drawings.

Fig. 1 shows the waveform of a segment of an audio signal;

Fig. 2 shows the characteristic waveforms after selection;

Fig. 3 shows the waveform synthesized by the method of the present invention;

Fig. 4 is a flowchart of the method of the present invention; and

Fig. 5 is a flowchart of one embodiment of the method of the present invention.

Detailed description of the embodiments

In the method provided by the present invention, the waveform to be processed is first analyzed and representative characteristic waveforms are selected. For storage, only these characteristic waveforms and their associated parameters need to be recorded, not the waveform of the whole signal, which saves a large amount of storage space. Because the audio signal has been sampled and quantized before the characteristic waveforms are selected, each selected characteristic waveform consists of discrete values at the sampling rate. For subsequent reconstruction, the characteristic waveforms and their associated parameters are read out and the waveform is restored by an interpolation operation. This interpolation is not computationally complex, so reconstruction is quite fast: for example, restoring a waveform of 4000K bits of data on an 80486 central processing unit takes only about five seconds. The method is therefore well suited to low-speed CPU applications. The method is described in detail below.

Before discussing synthesis, it is necessary to explain how the characteristic waveforms are selected. Audio signals, including speech, music, phonemes, and sound effects, share some common properties: they are quasi-periodic within a given time segment, and they are continuous. Based on these two properties, a segment of the audio signal waveform is examined, representative characteristic waveforms are selected from it and stored, and the length between each two adjacent characteristic waveforms is stored along with them.

To ease later reconstruction and to minimize the noise caused by excessive jumps at the junctions between waveforms in the synthesized audio signal, the start and end positions of each selected characteristic waveform are preferably chosen where the amplitude equals or is close to zero and where the trend matches that of the adjacent waveform at the junction (both rising or both falling), so that the phase is consistent. The selection can be done iteratively: for example, characteristic waveforms are selected, the signal is synthesized with the method of the present invention (described in detail later), and the result is auditioned; if it is unsatisfactory, selection and synthesis are repeated until characteristic waveforms giving the best result are found. Alternatively, autocorrelation and cross-correlation functions can be used to calculate the period of the signal, and the characteristic waveforms selected accordingly. If the audio signal is a speech signal, its period is very distinct and representative characteristic waveforms are easy to select.
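The text mentions autocorrelation as one way to find the period of the signal but does not spell out a procedure. The following C fragment is a minimal sketch of one common approach, picking the lag with the largest normalized autocorrelation; the function name estimate_period, the lag search range, and the normalization are assumptions made for illustration and are not taken from the patent.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate the period (in samples) of x[0..n-1] as the lag in
 * [min_lag, max_lag] whose autocorrelation is largest.
 * Hypothetical helper: the search range is chosen by the caller. */
size_t estimate_period(const short *x, size_t n,
                       size_t min_lag, size_t max_lag)
{
    assert(x != NULL && min_lag >= 1 && min_lag <= max_lag && max_lag < n);

    size_t best_lag = min_lag;
    double best = -1e300;

    for (size_t lag = min_lag; lag <= max_lag; lag++) {
        double acc = 0.0;
        for (size_t i = 0; i + lag < n; i++)
            acc += (double)x[i] * (double)x[i + lag];
        /* Divide by the number of terms so longer lags are not penalized. */
        acc /= (double)(n - lag);
        if (acc > best) {   /* strict '>' keeps the shortest of equal peaks */
            best = acc;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

The estimated period could then guide where one period of a characteristic waveform is cut out; choosing the search range so that it contains only one multiple of the true period avoids picking a harmonic.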

Fig. 1 shows the waveform of a segment of an audio signal. From this segment, two characteristic waveforms A and B are selected and stored, as shown in Fig. 2, and the time length L between the two characteristic waveforms is stored as well; here L is measured from the end point of characteristic waveform A to the start point of characteristic waveform B. To emphasize once more: because the audio signal has been sampled and quantized before the characteristic waveforms are selected, each selected characteristic waveform consists of discrete values at the sampling rate.

As described above, characteristic waveforms A and B have been selected: waveform A has period Ma and amplitude Aa[t], waveform B has period Mb and amplitude Ab[t], and the time interval between waveform A and waveform B is L. The estimated number of waveforms to be interpolated within the time interval L is:

R = 2L/(Ma+Mb);

The period Mr of each interpolated waveform is:

Mr = Ma - r × (Ma-Mb)/(1+R), where r = 1, 2, ..., R;

Waveform A is extended to period Mr:

A1′[t] = Aa[(Ma/M1) × t], where t = 0, 1, ..., M1-1;

A2′[t] = Aa[(Ma/M2) × t], where t = 0, 1, ..., M2-1;

...

Ar′[t] = Aa[(Ma/Mr) × t], where t = 0, 1, ..., Mr-1;

Waveform B is extended to period Mr:

A1′′[t] = Ab[(Mb/M1) × t], where t = 0, 1, ..., M1-1;

A2′′[t] = Ab[(Mb/M2) × t], where t = 0, 1, ..., M2-1;

...

Ar′′[t] = Ab[(Mb/Mr) × t], where t = 0, 1, ..., Mr-1;

Furthermore, waveform A contributes to each successive synthesized waveform in the proportion (L-k)/L, and waveform B contributes in the proportion (1+k)/L. The amplitude of each reconstructed waveform is then:

Ar[t] = (L-k)/L × Ar′[t] + (1+k)/L × Ar′′[t];

where

r = 1, 2, ..., R;

t = 0, 1, ..., Mr-1; and

k = (M1+M2+...+M(r-1)), (M1+M2+...+M(r-1)+1), ..., (M1+M2+...+M(r-1)+(Mr-1)).

The waveform synthesized from waveform A and waveform B in this way is shown in Fig. 3. Originally the whole waveform segment shown in Fig. 1 had to be stored; with the method of the present invention, only waveform A, waveform B, and the time interval length L between them need to be stored, which saves storage space significantly.
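The interpolation formulas can be sketched in C as below. This is a hedged illustration rather than the patent's implementation: the function names synthesize_gap and clamp16 are hypothetical, R and Mr are truncated to integers, the index (Ma/Mr) × t is truncated to the nearest lower sample, and the two weights are applied exactly as printed, (L-k)/L and (1+k)/L. All of these rounding choices are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp a double to the 16-bit sample range (assumed sample type). */
static short clamp16(double v)
{
    if (v > 32767.0) return 32767;
    if (v < -32768.0) return -32768;
    return (short)v;
}

/* Synthesize the interpolated waveforms between characteristic waveforms
 * A (a[0..ma-1]) and B (b[0..mb-1]) over a gap of `gap` samples (L).
 * out must have room for `gap` samples.
 * Returns the number of samples written (L' in the text). */
size_t synthesize_gap(const short *a, size_t ma,
                      const short *b, size_t mb,
                      size_t gap, short *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    assert(ma > 0 && mb > 0 && gap > 0);

    size_t R = (2 * gap) / (ma + mb);   /* R = 2L/(Ma+Mb), truncated */
    size_t k = 0;                       /* running sample index within the gap */

    for (size_t r = 1; r <= R; r++) {
        /* Mr = Ma - r*(Ma-Mb)/(1+R), rearranged to avoid negative values */
        size_t mr = (ma * (1 + R - r) + mb * r) / (1 + R);
        for (size_t t = 0; t < mr && k < gap; t++, k++) {
            double a_ext = a[t * ma / mr];               /* Ar'[t]  */
            double b_ext = b[t * mb / mr];               /* Ar''[t] */
            double wa = (double)(gap - k) / (double)gap; /* (L-k)/L */
            double wb = (double)(1 + k) / (double)gap;   /* (1+k)/L */
            out[k] = clamp16(wa * a_ext + wb * b_ext);
        }
    }
    return k;
}
```

Because each Mr is rounded down in this sketch, the returned length can be shorter than the gap, which matches the observation in the text that the synthesized length L′ is somewhat smaller than L.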

The method of the present invention is suitable for processing audio signals, for example speech signals recorded in WAV or PCM form, and can therefore adopt the basic WAV format.

The characteristic waveforms of the present invention can be stored in a format composed of two blocks, a header block and a data block, described in detail as follows:

Header block

The header block contains basic information, including: file length, file name type, format type, number of channels, sampling frequency, average data transfer rate per second, number of PCM sample bits, and number of characteristic waveforms. The file data structure for the characteristic waveforms can be expressed in C as follows:

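    typedef struct {
        char RIFF[4];
        long Whfilelen;        /* file length */
        char BWSfmt[8];
        long version;
        int  FormatTag;        /* format type */
        int  Channels;         /* number of channels */
        long SamplePerSec;     /* sampling frequency */
        long AvgBytesPerSec;   /* average data transfer rate per second */
        int  blockalign;
        int  BitPerSample;     /* number of PCM sample bits */
        char data[4];
        long SjpeWaveNum;      /* number of characteristic waveforms */
    } WaveHeader;              /* type name assumed; the original declaration omits it */

and

    AvgBytesPerSec = Channels * SamplePerSec * (BitPerSample/8);
    blockalign = Channels * (BitPerSample/8);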

Data block

The data block stores the PCM sample data of the characteristic waveforms and their information parameters. For example, the storage format for eight-bit mono PCM data can be:

Information bits and characteristic period | Interval length from the previous characteristic waveform | Sample 1 (mono) | Sample 2 (mono) | ...
16 bits | 16 bits | 8 bits | 8 bits | 8 bits

where the information bits consist of three bits and the characteristic period is represented with 13 bits. The storage format for eight-bit stereo PCM data can be:

Information bits and characteristic period | Interval length from the previous characteristic waveform | Sample 1 (mono) | Sample 2 (mono) | ...
16 bits | 16 bits | 8 bits | 8 bits | 8 bits

where the information bits consist of three bits and the characteristic period is represented with 13 bits. The storage format for sixteen-bit mono PCM data can be:

Information bits and characteristic period | Interval length from the previous characteristic waveform | Sample 1 (mono), low byte | Sample 1 (mono), high byte | Sample 2 (mono), low byte | Sample 2 (mono), high byte | ...
16 bits | 16 bits | 8 bits | 8 bits | 8 bits | 8 bits | 8 bits

where the information bits consist of three bits and the characteristic period is represented with 13 bits. The storage format for sixteen-bit stereo PCM data can be:

Information bits and characteristic period | Interval length from the previous characteristic waveform | Sample 1 (left channel), low byte | Sample 1 (left channel), high byte | Sample 1 (right channel), low byte | Sample 1 (right channel), high byte | ...
16 bits | 16 bits | 8 bits | 8 bits | 8 bits | 8 bits | 8 bits

where the information bits consist of three bits and the characteristic period is represented with 13 bits.

The three information bits in each of the above formats are used to distinguish the type of characteristic waveform. For example, if the audio signals to be selected are pronunciations of English words, the characteristic waveforms can be classified as consonant, vowel, silence, and so on. For silence, the 13 bits that would otherwise record the waveform period, together with the following 16 bits, give 29 bits in total for recording the length of the silence, so that up to 512M sample points can be recorded; if the silence is longer than this, four more bits can be taken to record its length.
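A possible bit layout for the 16-bit "information bits and characteristic period" word is sketched below in C. The text fixes only the widths (3 + 13 bits, and 13 + 16 = 29 bits for a silence length); placing the information bits in the top three bits, treating the period bits as the high part of the silence length, and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { INFO_BITS = 3, PERIOD_BITS = 13 };

/* Pack a 3-bit waveform type and a 13-bit period into one 16-bit word
 * (information bits assumed to occupy the top three bits). */
uint16_t pack_info_period(unsigned info, unsigned period)
{
    assert(info < (1u << INFO_BITS) && period < (1u << PERIOD_BITS));
    return (uint16_t)((info << PERIOD_BITS) | period);
}

unsigned unpack_info(uint16_t w)
{
    return (unsigned)(w >> PERIOD_BITS);
}

unsigned unpack_period(uint16_t w)
{
    return (unsigned)(w & ((1u << PERIOD_BITS) - 1u));
}

/* For silence, the 13 period bits and the following 16-bit word are
 * combined into one 29-bit length (bit order assumed). */
uint32_t silence_length(uint16_t first_word, uint16_t second_word)
{
    return ((uint32_t)unpack_period(first_word) << 16) | second_word;
}
```

With 29 bits the largest representable silence length is 2^29 - 1 samples, which is the 512M figure quoted in the text.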

During synthesis, noise is produced if the junction between an interpolated waveform and a characteristic waveform is not smooth. To avoid this noise, attention should be paid to the start point when selecting each characteristic waveform: as far as possible, each start point should be chosen where the amplitude is zero or close to zero. With smooth junctions thus ensured, the sound synthesized by this method is natural.
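One simple way to locate candidate start points of this kind is to scan for upward zero crossings, so that every cut begins near zero amplitude with the same rising trend. The C sketch below is illustrative only; the function name next_rising_zero and the choice of the rising direction are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Return the first index i >= from at which the signal crosses zero going
 * upward (x[i-1] < 0 and x[i] >= 0), or n if there is no such index. */
size_t next_rising_zero(const short *x, size_t n, size_t from)
{
    assert(x != NULL);
    for (size_t i = (from > 0 ? from : 1); i < n; i++) {
        if (x[i - 1] < 0 && x[i] >= 0)
            return i;
    }
    return n;
}
```

Two consecutive results of this function delimit a span that starts and ends at a rising zero crossing, which is the condition the text asks for at the junctions.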

In the synthesis process described above, interpolation is used to calculate the number of waveforms to be interpolated within the time interval L between the two selected characteristic waveforms and the period of each interpolated waveform. After reconstruction, however, the time length L′ made up of the interpolated waveforms is smaller than L; the difference lies between 0 and the length of the shorter characteristic period. To keep the synthesized waveform the same length as the original, one or two additional points can be inserted evenly into the interpolated waveforms so that L′ approaches L. In addition, a low-pass filter can be applied to the audio signal to remove noise caused by unsmooth junctions.
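The length correction can be sketched in C as follows: the periods Mr are computed first, their sum L′ is compared with L, and the missing points are spread as evenly as possible over the interpolated waveforms. The function name padded_periods, the truncation of R and Mr to integers, and the exact way the extra points are distributed are assumptions; the text only says that a few points are inserted evenly.

```c
#include <assert.h>
#include <stddef.h>

/* Fill periods[0..R-1] with the interpolated periods Mr, padded so that
 * they sum to `gap` (L). periods must have room for 2*gap/(ma+mb) entries.
 * Returns R, the number of interpolated waveforms. */
size_t padded_periods(size_t ma, size_t mb, size_t gap, size_t *periods)
{
    assert(ma > 0 && mb > 0 && periods != NULL);

    size_t R = (2 * gap) / (ma + mb);   /* R = 2L/(Ma+Mb), truncated */
    size_t total = 0;                   /* L' = M1 + M2 + ... + MR   */

    for (size_t r = 1; r <= R; r++) {
        /* Mr = Ma - r*(Ma-Mb)/(1+R), rounded down */
        size_t mr = (ma * (1 + R - r) + mb * r) / (1 + R);
        periods[r - 1] = mr;
        total += mr;
    }
    if (R == 0 || total >= gap)
        return R;

    /* Spread the deficit L - L' evenly over the R waveforms. */
    size_t deficit = gap - total;
    for (size_t r = 0; r < R; r++)
        periods[r] += deficit / R + (r < deficit % R ? 1 : 0);
    return R;
}
```

Each padded period can then be used in place of Mr when the characteristic waveforms are resampled, so that the synthesized segment has the same length as the original gap.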

Figs. 4 and 5 show, respectively, a flowchart of the method of the present invention and a flowchart of one embodiment (TTS: Text to Speech).

Fig. 4 is a flowchart of the method of the present invention. First, in step 40, an audio signal recorded on a medium such as magnetic tape is captured; if the method is applied to text-to-speech technology, the audio signal consists of phonemes derived from pronunciation rules. In step 42, the captured audio signal is sampled and quantized, that is, digitized, to form a file in a format such as WAV. In step 44, characteristic waveforms are selected. To ease later reconstruction and to minimize the noise caused by excessive jumps at the junctions between waveforms in the synthesized audio signal, the start and end positions of each selected characteristic waveform are preferably chosen where the amplitude equals or is close to zero and where the trend matches that of the adjacent waveform at the junction (both rising or both falling), so that the phase is consistent. A working environment can be set up in which characteristic waveforms are selected, the signal is synthesized with the method of the present invention, and the result is auditioned; if it is unsatisfactory, selection and synthesis are repeated until characteristic waveforms giving the best result are found. Alternatively, autocorrelation and cross-correlation functions can be used to calculate the period of the signal, and the characteristic waveforms selected accordingly. If the audio signal is a speech signal, its period is very distinct and suitable characteristic waveforms are easy to determine. In step 46, the selected characteristic waveforms and the time length between the two characteristic waveforms are stored. In step 48, the characteristic waveforms and the time interval are read; what is read are the stored first and second characteristic waveforms that can represent the audio signal. In step 50, the characteristic waveforms are synthesized. Finally, in step 52, the sound is produced.

Fig. 5 is a flowchart of the synthesis when the method of the present invention is applied to text-to-speech (TTS: Text to Speech) technology. First, in step 50, a word is read, for example a word looked up by the user. In step 52, the phonetic symbol combination of the word is analyzed, and in step 54 phonemes are selected according to specific rules. Taking the English word "HELLO" as an example, it can be divided according to pronunciation rules into phonemes such as ＜*h＞, ＜ha＞, ＜al＞, ＜lo＞, and ＜o*＞, where the symbol * represents silence. In step 56, the selected phonemes are synthesized by the method of the present invention; in step 58, the synthesized phonemes are combined into the word; and in step 60, the word is spoken. The detailed processes of steps 50, 52, 54, and 58 are disclosed in applications such as application Nos. 85112444 and 85112445 and are not the focus of the present invention, so they are not described further here.

In summary, the method of the present invention selects representative characteristic waveforms from an audio signal and later reconstructs the signal from them by interpolation. Its compression ratio and reconstruction quality depend on the original audio signal waveform selected. In tests of the method on music and sound effects, for an original audio signal with an 8K sampling rate, 8-bit quantization, and a transfer rate of 64 Kbits/sec, the resulting rate was approximately 8 to 32 Kbits/sec. This rate lies between those of adaptive differential pulse code modulation (ADPCM) and vector sum excited linear prediction (VSELP), while the synthesized sound quality is close to that of ADPCM.

Although the present invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Any person skilled in the art may make changes and refinements without departing from the spirit and scope of the present invention, and the scope of protection of the present invention shall therefore be as defined by the claims.

Claims

1. A method for synthesizing characteristic waveforms of audio signals, characterized in that the method comprises the following steps:

capturing an audio signal;

sampling and quantizing the captured audio signal to form a WAV file;

selecting characteristic waveforms, where the start and end positions of each selected characteristic waveform are chosen where the amplitude equals zero and where the trend at the junction with the adjacent waveform is the same (both rising or both falling), so that the phase is consistent;

storing a selected first characteristic waveform and second characteristic waveform that can represent the audio signal, together with the time interval between the two characteristic waveforms;

reading the stored first characteristic waveform and second characteristic waveform; and

synthesizing the interpolated waveforms between them by interpolation.

2. The method for synthesizing characteristic waveforms of audio signals as claimed in claim 1, wherein the step of selecting characteristic waveforms further comprises using an autocorrelation function to calculate the period of the signal in order to select the characteristic waveforms.

3. The method for synthesizing characteristic waveforms of audio signals as claimed in claim 1, wherein the first characteristic waveform has period Ma and amplitude Aa[t], the second characteristic waveform has period Mb and amplitude Ab[t], and the time interval between the first characteristic waveform and the second characteristic waveform is L.

4. The method for synthesizing characteristic waveforms of audio signals as claimed in claim 3, wherein the interpolated waveforms satisfy the following relations:

The amplitude of each interpolated waveform is:

Ar[t] = (L-k)/L × Ar′[t] + (1+k)/L × Ar′′[t];

The period of each interpolated waveform is:

Mr = Ma - r × (Ma-Mb)/(1+R);

where

R = 2L/(Ma+Mb);

Ar′[t] = Aa[(Ma/Mr) × t];

Ar′′[t] = Ab[(Mb/Mr) × t];

r = 1, 2, ..., R;

t = 0, 1, ..., Mr-1; and

k = (M1+M2+...+M(r-1)), (M1+M2+...+M(r-1)+1), ..., (M1+M2+...+M(r-1)+(Mr-1)).