CN101055722A

CN101055722A - Audio conversion method and equipment

Info

Publication number: CN101055722A
Application number: CNA2006100256210A
Authority: CN
Inventors: 王卫华
Original assignee: Shanghai Chenxing Electronics Science and Technology Co Ltd
Current assignee: Shanghai Chenxing Electronics Science and Technology Co Ltd
Priority date: 2006-04-12
Filing date: 2006-04-12
Publication date: 2007-10-17

Abstract

The invention discloses an audio conversion method comprising the following steps: step one, selecting a first data segment and a second data segment from an audio section; step two, superposing the first data segment and the second data segment; step three, sampling the superposed audio data; step four, low-pass filtering the sampled data. Also disclosed is an audio conversion device comprising: a data segment selection unit, which selects the first data segment and the second data segment from the audio section; a superposition unit, which superposes the first data segment and the second data segment; a sampling unit, which samples the superposed audio data; and a low-pass filter, which low-pass filters the sampled data. The computational load of the invention is very small: no high-performance computing chip or dedicated DSP is required, and a computing capability of only 10 MIPS suffices. The invention does not change the original voice length, is suitable for real-time calls, and can realize real-time voice changing on a mobile terminal with limited computing capability and memory.

Description

Audio conversion method and equipment

Technical field

The present invention relates to an audio processing method and device, and in particular to a method and device for transforming the fundamental frequency of audio.

Background technology

Modifying the fundamental frequency changes the characteristics of a voice so that a listener cannot identify the speaker, which achieves a voice-changing effect. Current methods for changing the fundamental frequency include the Fourier-transform frequency-domain shifting method, the SOLA (Synchronized Overlap and Add) algorithm, the mixed harmonic and random algorithm, and others. However, these methods are computationally expensive; real-time implementation requires a high-performance computer or a dedicated DSP, which increases the cost of the terminal device. Another problem is that they change the length of the original speech, which makes real-time transmission during a call difficult.

Summary of the invention

The object of the present invention is to overcome the defects of existing transform methods, namely their heavy computational load and their changing of the voice length, by providing a transform method that has a small computational load and does not change the length of the original speech.

To achieve this object, the present invention adopts the following technical solution:

An audio conversion method comprises the following steps:

Step 1: select a first data segment and a second data segment from an audio section;

Step 2: superpose the first data segment and the second data segment;

Step 3: sample the superposed audio data;

Step 4: low-pass filter the sampled data.

The first data segment and the second data segment are determined as follows. First, the first W data points of the audio section are taken out. Then, starting from point L_0 + n (n = 0, 1, 2, ..., F_Max) of the speech segment, W_a points are taken as the first comparison segment, and the last W_a points of the W data points are taken as the second comparison segment. The cross-correlation coefficient of the two segments is calculated; the first comparison segment at which the cross-correlation coefficient is maximal is used as the first data segment, and the second comparison segment is used as the second data segment. Here L_0 is the length of the audio section, W is the window length, W_a is the length of the first and second data segments, and F_Max is the maximum search delay.

The cross-correlation coefficient of the first comparison segment and the second comparison segment is calculated as follows:

R = \sum_{i=1}^{W_a} x_{i} y_{i},

where x_i is the sequence of the second comparison segment and y_i is the sequence of the first comparison segment.

Here W = L_n + W_a, where L_n is the length of the output audio section after superposition and L_n = L_0·N/M, M being the original audio length and N the audio length after superposition.
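The search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cross_correlation` and `select_segments` are hypothetical, indexing is 0-based (the text counts points without fixing an origin), and ties are resolved in favour of the smallest n, which the text does not specify.

```python
def cross_correlation(x, y):
    """R = sum of x_i * y_i over two equal-length segments."""
    return sum(xi * yi for xi, yi in zip(x, y))


def select_segments(samples, L0, Ln, Wa, F_max):
    """Return (best_n, first_segment, second_segment).

    samples must hold at least L0 + F_max + Wa points.
    0-based indexing is an assumption of this sketch.
    """
    W = Ln + Wa                      # window length W = L_n + W_a
    window = samples[:W]             # first W points of the audio section
    second = window[-Wa:]            # last W_a points of the window
    best_n, best_r = 0, None
    for n in range(F_max + 1):       # n = 0, 1, ..., F_Max
        start = L0 + n
        candidate = samples[start:start + Wa]
        r = cross_correlation(second, candidate)
        if best_r is None or r > best_r:   # keep the first maximum found
            best_n, best_r = n, r
    first = samples[L0 + best_n:L0 + best_n + Wa]
    return best_n, first, second
```

With the values given in this document (L_0 = 160, W_a = 60, F_Max = 80), each frame costs 81 correlations of 60 multiply-accumulate operations each.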

In step 2, the first data segment and the second data segment are superposed as follows:

O_{i} = \frac{a_{i} w_{i} + b_{i} (2^{16} - w_{i})}{2^{16}},

where O_i is the output sequence after superposition, a_i is the sequence of the second data segment, b_i is the sequence of the first data segment, and w_i is the superposition coefficient of the i-th point of the second data segment. The superposition coefficient of each point is recorded in a superposition coefficient table.

The coefficient w_i is calculated as follows:

w_{i} = [\frac{i}{W_a} \times 2^{16}], \quad i \in 1, \dots, W_a.
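As a worked illustration, assuming the square brackets denote the integer part and taking W_a = 60, the coefficients at the first, middle and last points are:

w_{1} = [\frac{1}{60} \times 2^{16}] = 1092, \quad w_{30} = [\frac{30}{60} \times 2^{16}] = 32768, \quad w_{60} = 2^{16} = 65536,

so that

O_{1} = \frac{1092 a_{1} + 64444 b_{1}}{65536}, \quad O_{30} = \frac{a_{30} + b_{30}}{2}, \quad O_{60} = a_{60}.

Under this reading the output moves gradually from the first data segment toward the second data segment across the W_a points.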

After the first data segment and the second data segment have been superposed, the following L_n points of the input speech sequence are appended to the tail of the output sequence.

In step 3, the first L_n points of the output sequence are sampled at a sampling rate of M/N.

In step 4, the low-pass filter used is an FIR digital filter of order 16·Max(M, N)/M + 1.

In addition, the audio section is one audio frame, that is, L_0 = 160, with W_a = 60 and F_Max = 80.

The present invention also provides an audio conversion device, comprising:

a data segment selection unit, which selects a first data segment and a second data segment from an audio section;

a superposition unit, which superposes the first data segment and the second data segment;

a sampling unit, which samples the superposed data;

a low-pass filter, which low-pass filters the sampled data.

The data segment selection unit comprises:

a data extraction unit, which selects a first comparison segment and a second comparison segment from the audio section;

a correlation calculation unit, which calculates the cross-correlation coefficient of the first comparison segment and the second comparison segment;

a selection unit, which takes the first comparison segment and the second comparison segment with the maximal cross-correlation coefficient as the first data segment and the second data segment.

After completing the superposition of the data segments, the superposition unit appends the following L_n points of the input speech sequence to the tail of the output sequence.

The sampling unit samples the first L_n points of the output sequence at a sampling rate of M/N, where M is the original audio length and N is the audio length after superposition.

The low-pass filter is an FIR digital filter of order 16·Max(M, N)/M + 1.

The beneficial effects of the present invention are as follows: the computational load is very small, no high-performance computing chip or dedicated DSP is needed, and the method can be implemented on a general-purpose CPU with a computing capability of only 10 MIPS. It does not change the length of the original speech, is suitable for real-time calls, and can realize real-time voice changing on a mobile terminal with limited computing capability and memory.

Description of drawings

Fig. 1 is a logic block diagram of the audio conversion device of the present invention.

Fig. 2 is a flow diagram of the method of the present invention.

Figs. 3A and 3B are schematic diagrams of the selection of data segments in the present invention.

Embodiment

A preferred embodiment of the present invention is given below with reference to the accompanying drawings, in order to describe the technical solution of the present invention in detail.

As shown in Fig. 1, an audio conversion device comprises:

a data segment selection unit 10, which selects a first data segment and a second data segment from an audio section;

a superposition unit 20, which superposes the first data segment and the second data segment;

a sampling unit 30, which samples the superposed data;

a low-pass filter 40, which low-pass filters the sampled data.

The data segment selection unit 10 comprises:

a data extraction unit 11, which selects a first comparison segment and a second comparison segment from the audio section;

a correlation calculation unit 12, which calculates the cross-correlation coefficient of the first comparison segment and the second comparison segment;

a selection unit 13, which takes the first comparison segment and the second comparison segment with the maximal cross-correlation coefficient as the first data segment and the second data segment.

After completing the superposition of the data segments, the superposition unit 20 appends the following L_n points of the input speech sequence to the tail of the output sequence.

The sampling rate of the sampling unit 30 is M/N, where M is the original audio length and N is the audio length after superposition.

The low-pass filter 40 is an FIR digital filter of order 16·Max(M, N)/M + 1.

As shown in Fig. 2, in the audio conversion method of the present invention, the relevant parameters are input first, namely the original audio length M and the audio length after superposition N (step 100). Their ratio may be 1/2, 3/5, 2/3, 3/4, 4/5, 5/4, 4/3, 3/2, 5/3 or 2/1; these 10 ratios correspond to 10 voice-changing effects.

In step 110, one audio frame is selected as an audio section. When the present invention is applied to speech transformation, the voice-call sampling rate is 8 kHz and one speech frame lasts 20 ms, so the audio section is set to 160 points, that is, the length of the audio section is L_0 = 160.
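The frame length follows directly from the sampling rate and the frame duration:

L_{0} = 8000 \ \text{samples/s} \times 0.020 \ \text{s} = 160 \ \text{samples}.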

The data segment selection unit 10 is then used to select the first data segment and the second data segment from the audio section (step 120).

Shown in Fig. 3 A, 3B, in an audio section, preceding W data in the audio section are taken out, W is a window length, W=L _n+ W _a, L _nBe the length of stack back output audio section, L _n=L ₀N/M, W _aBe the length of first and second data segments, for above-mentioned 10 kinds of change of voice effects, in order to reduce EMS memory occupation, the ratio of M, N is reduced to minimum common divisor (1/2,3/5,2/3,3/4,4/5,5/4,4/3,3/2,5/3 or 2/1), then W _aShould be 1,2,3,4,5 common multiple, and less than L ₀If value is big, effect is all right, but can increase calculated amount, takes all factors into consideration and can get W _a=60.Last W in this W data _aIndividual point takes out, as the second related data section,

Then from L ₀+ n (n=0,1,2...F _Max) individual point begins to get W again from this voice segments _aIndividual, as the first related data section, last W in each data of this W _aIndividual point takes out, as the second related data section, F _MaxSearch time-delay for maximum, get F _Max=80.Just, the process of choosing the first related data section is carried out F _Max, the starting point of at every turn getting a little moves one backward.
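The constraint on W_a can be checked with a short sketch. The helper names `lcm` and `wa_candidates` are hypothetical; the only inputs are the integers 1 to 5 that appear in the reduced ratios and the frame length L_0 = 160.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def wa_candidates(terms, L0):
    """All common multiples of `terms` that are smaller than L0."""
    base = reduce(lcm, terms)        # smallest common multiple
    return list(range(base, L0, base))


# Integers occurring in the ten reduced ratios, frame length 160
print(wa_candidates((1, 2, 3, 4, 5), 160))   # [60, 120]
```

Only 60 and 120 satisfy both conditions, and the smaller value keeps each correlation to 60 multiply-accumulate operations, which is consistent with the choice W_a = 60.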

The data extraction unit 11 performs this selection of data segments.

The correlation calculation unit 12 calculates the cross-correlation coefficient of the second comparison segment and each first comparison segment, and the selection unit 13 takes the first comparison segment at which the cross-correlation coefficient is maximal as the first data segment and the second comparison segment as the second data segment. The correlation calculation unit 12 and the selection unit 13 operate together to complete this process. The cross-correlation coefficient is:

R = \sum_{i=1}^{W_a} x_{i} y_{i},

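In step 130, the superposition unit 20 superposes the first data segment and the second data segment, which can be done as follows:

O_{i} = \frac{a_{i} w_{i} + b_{i} (2^{16} - w_{i})}{2^{16}},

where O_i is the output sequence after superposition, a_i is the sequence of the second data segment, b_i is the sequence of the first data segment, and w_i is the superposition coefficient of the i-th point.

The coefficients w_i in the superposition coefficient table are calculated as follows:

w_{i} = [\frac{i}{W_a} \times 2^{16}], \quad i \in 1, \dots, W_a.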

That is, W_a is set in advance, and the superposition coefficient of each point is then calculated and stored in the superposition coefficient table. During superposition, the coefficient of each point is read directly from the table rather than recomputed each time, which reduces the amount of computation.

In addition, after the first data segment and the second data segment have been superposed, the following L_n points of the input speech sequence are appended to the tail of the output sequence.
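The table-driven superposition can be sketched as follows. The function names are hypothetical, the square brackets in the coefficient formula are read as the integer part, and the division by 2^16 is implemented as a right shift; these are assumptions of the sketch, not statements from the text. Appending the following L_n input points is left out because the text does not give the exact indices.

```python
def build_weight_table(Wa, shift=16):
    """w_i = integer part of (i / W_a) * 2^16 for i = 1..W_a.

    table[0] holds w_1, table[Wa - 1] holds w_Wa.
    """
    return [(i << shift) // Wa for i in range(1, Wa + 1)]


def overlap_add(first, second, table, shift=16):
    """O_i = (a_i * w_i + b_i * (2^16 - w_i)) / 2^16.

    a = second data segment, b = first data segment.
    The division is done with an arithmetic right shift (assumption).
    """
    one = 1 << shift
    return [(a * w + b * (one - w)) >> shift
            for a, b, w in zip(second, first, table)]


table = build_weight_table(60)       # computed once, reused for every frame
blended = overlap_add([100] * 60, [200] * 60, table)
```

Because the table depends only on W_a, the per-sample cost of the blend is two multiplications, one addition and one shift, with no division at run time.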

In step 140, the sampling unit 30 samples the first L_n points of the output sequence at a sampling rate of M/N.
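A minimal resampling sketch is given below. It assumes that a sampling rate of M/N means M output points for every N input points, so that output point k reads input position k·N/M, and it uses linear interpolation between neighbouring points. The text does not name an interpolation method, so both the interpolation and the function name `resample` are assumptions.

```python
def resample(points, M, N):
    """Sample `points` at rate M/N using integer linear interpolation."""
    out_len = len(points) * M // N
    out = []
    for k in range(out_len):
        # Input position k*N/M, split into whole index and remainder
        idx, frac = divmod(k * N, M)
        left = points[idx]
        # Clamp at the end of the input instead of reading past it
        right = points[idx + 1] if idx + 1 < len(points) else left
        out.append(left + (right - left) * frac // M)
    return out
```

For M/N = 2/1 the routine doubles the number of points by inserting midpoints; for M/N = 1/2 it keeps every second point.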

In step 150, the low-pass filter 40 low-pass filters the sampled data. The low-pass filter 40 is an FIR digital filter of order 16·Max(M, N)/M + 1.
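The filter order for a given ratio can be evaluated with the sketch below. It reads the expression as (16·Max(M, N))/M + 1 and uses integer division where the quotient is not whole; the grouping, the rounding and the function name `fir_order` are all assumptions, since the text states neither.

```python
def fir_order(M, N):
    """FIR order 16 * Max(M, N) / M + 1, with integer division (assumption)."""
    return 16 * max(M, N) // M + 1


for M, N in [(1, 2), (3, 5), (2, 3), (3, 4), (4, 5), (2, 1)]:
    print(f"M/N = {M}/{N}: order {fir_order(M, N)}")
```

Under this reading the order is 17 whenever M is greater than N, and it grows to at most 33 (for M/N = 1/2) among the ten listed ratios.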

In this way, after sampling and low-pass filtering, the data length of the superposed data remains L_0. The length of the audio section does not change, and the length of the whole audio correspondingly remains the original length M.
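The length bookkeeping can be checked directly from the definitions above. Sampling L_n points at rate M/N yields

L_{n} \times \frac{M}{N} = \frac{L_{0} N}{M} \times \frac{M}{N} = L_{0}.

For example, reading the ratio 1/2 as M/N (an assumption about the order of the listed ratios), L_n = 160 × 2 = 320 and 320 × 1/2 = 160 = L_0.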

When the above method is implemented in C and compiled with ARM ADS, the storage required is 12K of program code (ROM) and 6K of run-time memory (RAM). Clearly, the method needs very few resources to transform audio.

Claims

1. An audio conversion method, characterized in that it comprises the following steps:

Step 1: select a first data segment and a second data segment from an audio section;

Step 2: superpose the first data segment and the second data segment;

Step 3: sample the superposed audio data;

Step 4: low-pass filter the sampled data.

2. The audio conversion method according to claim 1, characterized in that the first data segment and the second data segment are determined as follows: first, the first W data points of the audio section are taken out; then, starting from point L_0 + n (n = 0, 1, 2, ..., F_Max) of the speech segment, W_a points are taken as the first comparison segment, and the last W_a points of the W data points are taken as the second comparison segment; the cross-correlation coefficient of the two comparison segments is calculated; the first comparison segment at which the cross-correlation coefficient is maximal is used as the first data segment, and the second comparison segment is used as the second data segment; where L_0 is the length of the audio section, W is the window length, W_a is the length of the first and second data segments, and F_Max is the maximum search delay.

3. The audio conversion method according to claim 2, characterized in that the cross-correlation coefficient of the first comparison segment and the second comparison segment is calculated as follows:

R = \sum_{i=1}^{W_a} x_{i} y_{i},

4. The audio conversion method according to claim 3, characterized in that W = L_n + W_a, where L_n is the length of the output audio section after superposition and L_n = L_0·N/M, M being the original audio length and N the audio length after superposition.

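5. The audio conversion method according to any one of claims 1 to 4, characterized in that, in step 2, the first data segment and the second data segment are superposed as follows:

O_{i} = \frac{a_{i} w_{i} + b_{i} (2^{16} - w_{i})}{2^{16}},

where O_i is the output sequence after superposition, a_i is the sequence of the second data segment, b_i is the sequence of the first data segment, and w_i is the superposition coefficient of the i-th point of the second data segment.

6. The audio conversion method according to claim 5, characterized in that w_i is calculated as follows:

w_{i} = [\frac{i}{W_a} \times 2^{16}], \quad i \in 1, \dots, W_a.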

7. The audio conversion method according to claim 6, characterized in that, after the first data segment and the second data segment have been superposed, the following L_n points of the input speech sequence are appended to the tail of the output sequence.

8. The audio conversion method according to claim 7, characterized in that, in step 3, the first L_n points of the output sequence are sampled at a sampling rate of M/N.

9. The audio conversion method according to claim 8, characterized in that, in step 4, the low-pass filter used is an FIR digital filter of order 16·Max(M, N)/M + 1.

10. The audio conversion method according to claim 9, characterized in that the audio section is one audio frame, that is, L_0 = 160, with W_a = 60 and F_Max = 80.

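11. An audio conversion device, characterized in that it comprises:

a data segment selection unit, which selects a first data segment and a second data segment from an audio section;

a superposition unit, which superposes the first data segment and the second data segment;

a sampling unit, which samples the superposed data;

a low-pass filter, which low-pass filters the sampled data.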

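12. The audio conversion device according to claim 11, characterized in that the data segment selection unit comprises:

a data extraction unit, which selects a first comparison segment and a second comparison segment from the audio section;

a correlation calculation unit, which calculates the cross-correlation coefficient of the first comparison segment and the second comparison segment;

a selection unit, which takes the first comparison segment and the second comparison segment with the maximal cross-correlation coefficient as the first data segment and the second data segment.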

13. The audio conversion device according to claim 12, characterized in that, after completing the superposition of the data segments, the superposition unit appends the following L_n points of the input speech sequence to the tail of the output sequence.

14. The audio conversion device according to claim 13, characterized in that the sampling unit samples the first L_n points of the output sequence at a sampling rate of M/N, where M is the original audio length and N is the audio length after superposition.

15. The audio conversion device according to claim 14, characterized in that the low-pass filter is an FIR digital filter of order 16·Max(M, N)/M + 1.