CN101055722A - Audio conversion method and equipment - Google Patents
- Publication number
- CN101055722A
- Authority
- CN
- China
- Prior art keywords
- data
- audio
- data segment
- section
- stack
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses an audio conversion method comprising the following steps: step one, selecting a first data segment and a second data segment from an audio segment; step two, superposing the first data segment and the second data segment; step three, sampling the superposed audio data; step four, low-pass filtering the sampled data. Also disclosed is an audio conversion device comprising: a data segment selection unit, which selects the first data segment and the second data segment from the audio segment; a superposing unit, which superposes the first data segment and the second data segment; a sampling unit, which samples the superposed audio data; and a low-pass filter, which low-pass filters the sampled data. The computational load of the invention is very small: no high-performance computing chip or dedicated DSP is required, and a computing capability of only 10 MIPS suffices. The invention does not change the original voice length, is suitable for real-time calls, and can realize real-time voice changing on a mobile terminal with limited computing capability and memory.
Description
Technical field
The present invention relates to an audio processing method and device, and in particular to a method and device for transforming the fundamental frequency of audio.
Background art
Modifying the fundamental frequency changes the characteristics of a voice so that the listener cannot identify the speaker, which achieves a voice-changing effect. Existing methods for changing the fundamental frequency include Fourier-transform frequency-domain conversion, the SOLA (Synchronized Overlap and Add) algorithm, and mixed harmonic-random algorithms. However, these methods are computationally expensive: real-time implementation requires a high-performance computer or a dedicated DSP, which increases the cost of the terminal device. A further problem is that they change the length of the original speech, which makes real-time transmission during a call difficult.
Summary of the invention
The object of the present invention is to overcome the defects of existing transform methods, namely their large computational load and the fact that they change the speech length, by providing a transform method that has a small computational load and does not change the length of the original speech.
To achieve this object, the present invention adopts the following technical solution:
An audio conversion method comprises the following steps:
Step 1: select a first data segment and a second data segment from an audio section;
Step 2: superpose the first data segment and the second data segment;
Step 3: sample the audio data after superposition;
Step 4: low-pass filter the sampled data.
The first data segment and the second data segment are determined as follows. First, the first $W$ data points of the audio section are taken out. Then, starting from point $L_0 + n$ ($n = 0, 1, 2, \ldots, F_{max}$), $W_a$ points are taken from the voice section as a first related data section, and the last $W_a$ points of the $W$ data points are taken as a second related data section. The cross-correlation coefficient of the two related data sections is calculated; the first related data section at which the cross-correlation coefficient is largest is used as the first data segment, and the second related data section is used as the second data segment. Here $L_0$ is the length of the audio section, $W$ is the window length, $W_a$ is the length of the first and second data segments, and $F_{max}$ is the maximum search delay.
The cross-correlation coefficient of the first comparison data section and the second comparison data section is calculated by the following method:
Here $W = L_n + W_a$, where $L_n$ is the length of the output audio section after superposition, $L_n = L_0 N / M$, $M$ is the original audio length, and $N$ is the audio length after superposition.
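As a worked illustration of the relations $W = L_n + W_a$ and $L_n = L_0 N / M$, the following minimal Python sketch computes both lengths for a few $(M, N)$ pairs. The function name `window_lengths` and the use of floor division when $L_0 N / M$ is not an integer are assumptions of this sketch; the text does not say how fractional lengths are handled.

```python
def window_lengths(L0, Wa, M, N):
    """Return (Ln, W) for one audio section.

    Ln = L0 * N / M is the output length after superposition and
    W = Ln + Wa is the window length, as defined in the text.
    ASSUMPTION: floor division is used when L0 * N / M is fractional.
    """
    Ln = (L0 * N) // M
    W = Ln + Wa
    return Ln, W


# Example values taken from the text: L0 = 160, Wa = 60.
for M, N in [(2, 1), (5, 4), (4, 5)]:
    Ln, W = window_lengths(160, 60, M, N)
    print(f"M={M}, N={N}: Ln={Ln}, W={W}")
```

For $M = 5$, $N = 4$ this gives $L_n = 128$ and $W = 188$.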
In step 2, the first data segment and the second data segment are superposed by the following method:
$$O_i = \frac{a_i w_i + b_i \,(2^{16} - w_i)}{2^{16}}$$
where $O_i$ is the output sequence after superposition, $a_i$ is the second data segment sequence, $b_i$ is the first data segment sequence, and $w_i$ is the superposition coefficient of the $i$-th point of the second data segment. The superposition coefficient of each point is recorded in a superposition coefficient table.
The coefficient $w_i$ is calculated by the following method:
After the first data segment and the second data segment have been superposed, the subsequent $L_n$ points of the input voice sequence are appended to the tail of the output sequence.
In step 3, the first $L_n$ points of the output sequence are sampled at a sampling rate of $M/N$.
In step 4, the low-pass filter used is an FIR digital filter of order $16 \cdot \max(M, N)/M + 1$.
In addition, the audio section is one audio frame, i.e. $L_0 = 160$, with $W_a = 60$ and $F_{max} = 80$.
The present invention also provides an audio conversion device, comprising:
a data segment selection unit, which selects a first data segment and a second data segment from an audio section;
a superposing unit, which superposes the first data segment and the second data segment;
a sampling unit, which samples the superposed data;
a low-pass filter, which low-pass filters the sampled data.
The data segment selection unit comprises:
a data extracting unit, which selects a first comparison data section and a second comparison data section from the audio section;
a correlation calculation unit, which calculates the cross-correlation coefficient of the first comparison data section and the second comparison data section;
a selection unit, which selects the first comparison data section and the second comparison data section with the largest cross-correlation coefficient as the first data segment and the second data segment.
After completing the superposition of the data segments, the superposing unit appends the subsequent $L_n$ points of the input voice sequence to the tail of the output sequence.
The sampling unit samples the first $L_n$ points of the output sequence at a sampling rate of $M/N$, where $M$ is the original audio length and $N$ is the audio length after superposition.
The low-pass filter is an FIR digital filter of order $16 \cdot \max(M, N)/M + 1$.
The beneficial effects of the present invention are as follows: the computational load is very small; no high-performance computing chip or dedicated DSP is needed, and the method can be implemented on a general-purpose CPU with only 10 MIPS of computing power; the original speech length is not changed; the method is suitable for real-time calls; and real-time voice changing can be realized on a mobile terminal with limited computing power and memory.
Description of drawings
Fig. 1 is a logic block diagram of the audio conversion device of the present invention.
Fig. 2 is a flow diagram of the method of the present invention.
Figs. 3A and 3B are schematic diagrams of the selection of data segments in the present invention.
Embodiment
Preferred embodiments of the present invention are given below with reference to the accompanying drawings, in order to describe the technical solution of the invention in detail.
As shown in Fig. 1, an audio conversion device comprises:
a data segment selection unit 10, which selects a first data segment and a second data segment from an audio section;
a superposing unit 20, which superposes the first data segment and the second data segment;
a sampling unit 30, which samples the superposed data;
a low-pass filter 40, which low-pass filters the sampled data.
The data segment selection unit 10 comprises:
a data extracting unit 11, which selects a first comparison data section and a second comparison data section from the audio section;
a correlation calculation unit 12, which calculates the cross-correlation coefficient of the first comparison data section and the second comparison data section;
a selection unit 13, which selects the first comparison data section and the second comparison data section with the largest cross-correlation coefficient as the first data segment and the second data segment.
After completing the superposition of the data segments, the superposing unit 20 appends the subsequent $L_n$ points of the input voice sequence to the tail of the output sequence.
The sampling rate of the sampling unit 30 is $M/N$, where $M$ is the original audio length and $N$ is the audio length after superposition.
The low-pass filter 40 is an FIR digital filter of order $16 \cdot \max(M, N)/M + 1$.
As shown in Fig. 2, in the audio conversion method of the present invention, the relevant parameters are input first, namely the original audio length $M$ and the audio length after superposition $N$ (step 100). Their ratio can be 1/2, 3/5, 2/3, 3/4, 4/5, 5/4, 4/3, 3/2, 5/3 or 2/1; these 10 ratios correspond to 10 voice-changing effects.
In step 110, one audio frame is chosen as an audio section. When the present invention is applied to voice conversion, the sampling rate of a voice call is 8 kHz and one voice frame lasts 20 ms, so the audio section is set to 160 points, i.e. the length of the audio section is $L_0 = 160$.
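The frame length follows directly from the sampling rate and the frame duration. The symbols $f_s$ and $T_{frame}$ below are labels introduced only for this worked equation:

```latex
L_0 = f_s \cdot T_{frame} = 8000\ \text{Hz} \times 0.020\ \text{s} = 160\ \text{points}
```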
Then the data segment selection unit 10 is used to select the first data segment and the second data segment from the audio section (step 120).
As shown in Figs. 3A and 3B, the first $W$ data points of an audio section are taken out, where $W$ is the window length, $W = L_n + W_a$, $L_n$ is the length of the output audio section after superposition, $L_n = L_0 N / M$, and $W_a$ is the length of the first and second data segments. For the 10 voice-changing effects above, the ratio of $M$ to $N$ is reduced to lowest terms (1/2, 3/5, 2/3, 3/4, 4/5, 5/4, 4/3, 3/2, 5/3 or 2/1) in order to reduce memory usage; $W_a$ should then be a common multiple of 1, 2, 3, 4 and 5 and be less than $L_0$. A larger value gives a better effect but increases the amount of calculation; as a compromise, $W_a = 60$ can be used. The last $W_a$ points of the $W$ data points are taken out as the second related data section.
Then, starting from point $L_0 + n$ ($n = 0, 1, 2, \ldots, F_{max}$), $W_a$ points are taken from the voice section as the first related data section. $F_{max}$ is the maximum search delay, and $F_{max} = 80$ is used. That is, the process of choosing the first related data section is carried out $F_{max}$ times, with the starting point shifted one point later each time.
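The constraint on $W_a$ (a common multiple of 1, 2, 3, 4 and 5 that is less than $L_0$) can be checked with a short Python sketch. The helper `lcm` and the variable names are illustrative only and do not come from the source.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


L0 = 160                                   # audio section length from the text
denominators = [1, 2, 3, 4, 5]             # W_a must be a multiple of each
base = reduce(lcm, denominators)           # smallest common multiple
candidates = list(range(base, L0, base))   # all common multiples below L0
print(base, candidates)
```

Under these constraints the only admissible values are 60 and 120, which is consistent with the text choosing the smaller one, $W_a = 60$, to limit the amount of calculation.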
The selection of the data sections can be completed by the data extracting unit 11.
The correlation calculation unit 12 calculates the cross-correlation coefficient of the second related data section and each first related data section, and the selection unit 13 selects the first related data section at which the cross-correlation coefficient is largest as the first data segment, with the second related data section as the second data segment. The correlation calculation unit 12 and the selection unit 13 cooperate to complete this process.
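The search described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the cross-correlation formula of the source is not reproduced in this text, so a standard normalized cross-correlation is used as a stand-in; the function names `cross_correlation` and `select_segments`, and the requirement that the buffer `x` holds at least $L_0 + F_{max} + W_a$ points, are hypothetical.

```python
import math


def cross_correlation(u, v):
    """Normalized cross-correlation of two equal-length sequences.

    ASSUMPTION: the source's own formula is not reproduced in this text;
    this normalized form is a common stand-in, not the patented formula.
    """
    num = sum(p * q for p, q in zip(u, v))
    den = math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))
    return num / den if den else 0.0


def select_segments(x, L0, Ln, Wa, Fmax):
    """Pick the first and second data segments from buffer x.

    The second segment is the last Wa points of the first W = Ln + Wa
    points. Candidate first segments start at L0 + n for n = 0..Fmax.
    ASSUMPTION: x holds at least L0 + Fmax + Wa points.
    """
    second = x[Ln:Ln + Wa]
    best_n, best_c = 0, -math.inf
    for n in range(Fmax + 1):
        cand = x[L0 + n:L0 + n + Wa]
        c = cross_correlation(cand, second)
        if c > best_c:                      # keep the earliest maximum
            best_n, best_c = n, c
    first = x[L0 + best_n:L0 + best_n + Wa]
    return best_n, first, second
```

With the values from the text ($L_0 = 160$, $W_a = 60$, $F_{max} = 80$) the loop evaluates 81 candidate offsets per frame, each costing on the order of $W_a$ multiply-accumulate operations.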
The cross-correlation coefficient of the first comparison data section and the second comparison data section is calculated by the following method:
In step 130, the superposing unit 20 superposes the first data segment and the second data segment. The following method can be used for the superposition:
$$O_i = \frac{a_i w_i + b_i \,(2^{16} - w_i)}{2^{16}}$$
where $O_i$ is the output sequence after superposition, $a_i$ is the second data segment sequence, $b_i$ is the first data segment sequence, and $w_i$ is the superposition coefficient of the $i$-th point of the second data segment. The superposition coefficient of each point is recorded in a superposition coefficient table.
The $w_i$ in the superposition coefficient table are calculated by the following method:
That is, $W_a$ is set in advance, and the superposition coefficient of each point is then calculated and stored in the superposition coefficient table. During superposition, the coefficient of each point is read directly from the table rather than being recomputed each time, which reduces the amount of calculation.
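The table lookup and the fixed-point superposition can be sketched together in Python. The superposition follows the formula $O_i = (a_i w_i + b_i (2^{16} - w_i)) / 2^{16}$ given in the text. The coefficient values are an assumption: the source's formula for $w_i$ is not reproduced in this text, so a linear ramp from $2^{16}$ down toward 0 is used purely for illustration, and the names `build_coefficient_table` and `superpose` are hypothetical.

```python
SCALE = 1 << 16  # 2**16, the fixed-point scale used in the formula


def build_coefficient_table(Wa):
    """Precompute w_i for i = 0..Wa-1, once, ahead of time.

    ASSUMPTION: a linear ramp from 2**16 down toward 0; the source's
    formula for w_i is not reproduced in this text.
    """
    return [((Wa - i) * SCALE) // Wa for i in range(Wa)]


def superpose(a, b, table):
    """O_i = (a_i*w_i + b_i*(2**16 - w_i)) / 2**16.

    a is the second data segment, b is the first data segment.
    ASSUMPTION: the division is an integer floor division.
    """
    return [(ai * w + bi * (SCALE - w)) // SCALE
            for ai, bi, w in zip(a, b, table)]


table = build_coefficient_table(4)
print(table)
print(superpose([100] * 4, [-100] * 4, table))
```

Because the weights of each pair sum to $2^{16}$, superposing two identical segments returns the segment unchanged, and the division by $2^{16}$ can be done with a 16-bit right shift on integer hardware.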
In addition, after the first data segment and the second data segment have been superposed, the subsequent $L_n$ points of the input voice sequence are appended to the tail of the output sequence.
In step 140, the sampling unit 30 samples the first $L_n$ points of the output sequence at a sampling rate of $M/N$.
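A minimal Python sketch of resampling by the factor $M/N$ is given below. The text does not state how the sampling unit computes values between input samples, so linear interpolation is an assumption made only for illustration, as is the function name `resample`.

```python
def resample(x, M, N):
    """Resample x by the factor M/N using linear interpolation.

    ASSUMPTION: linear interpolation is chosen only for illustration;
    the text does not say how the sampling unit interpolates.
    The output length is len(x) * M // N.
    """
    out_len = len(x) * M // N
    out = []
    for j in range(out_len):
        idx, frac = divmod(j * N, M)        # input position j*N/M
        nxt = x[min(idx + 1, len(x) - 1)]   # clamp at the last sample
        out.append((x[idx] * (M - frac) + nxt * frac) // M)
    return out


print(resample([0, 10, 20, 30], 2, 1))
print(resample([0, 10, 20, 30], 1, 2))
```

With $M = 5$ and $N = 4$, an input of $L_n = 128$ points yields 160 output points, which matches the frame length $L_0$.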
In step 150, the low-pass filter 40 low-pass filters the sampled data. The low-pass filter 40 is an FIR digital filter of order $16 \cdot \max(M, N)/M + 1$.
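The filter order can be evaluated for a given $(M, N)$ pair, and an FIR filter of any order can be applied in direct form, as in the Python sketch below. The text gives neither the tap values nor the cutoff frequency, so `fir_filter` takes the taps as an argument supplied by the caller; both function names and the floor division for non-integer orders are assumptions of this sketch.

```python
def fir_order(M, N):
    """Order 16*max(M, N)/M + 1 from the text.

    ASSUMPTION: floor division is used when the quotient is fractional.
    """
    return 16 * max(M, N) // M + 1


def fir_filter(x, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n-k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y


for M, N in [(2, 1), (1, 2), (4, 5)]:
    print(M, N, fir_order(M, N))
```

When $N \le M$ the order is always 17; it grows only when $N > M$, for example to 33 for $M = 1$, $N = 2$.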
In this way, after sampling and low-pass filtering, the data length of the superposed data is again $L_0$: the length of the audio section is unchanged, and correspondingly the length of the whole audio remains the original length $M$.
When the above method is implemented in C and compiled with ARM ADS, the required storage is 12K of program code (ROM) and 6K of run-time memory (RAM). Clearly, the method needs very few resources to perform the audio conversion.
Claims (15)
1. An audio conversion method, characterized in that it comprises the following steps:
Step 1: selecting a first data segment and a second data segment from an audio section;
Step 2: superposing the first data segment and the second data segment;
Step 3: sampling the audio data after superposition;
Step 4: low-pass filtering the sampled data.
2. The audio conversion method according to claim 1, characterized in that the first data segment and the second data segment are determined as follows: first, the first $W$ data points of the audio section are taken out; then, starting from point $L_0 + n$ ($n = 0, 1, 2, \ldots, F_{max}$), $W_a$ points are taken from the voice section as a first related data section, and the last $W_a$ points of the $W$ data points are taken as a second related data section; the cross-correlation coefficient of the two related data sections is calculated, the first related data section at which the cross-correlation coefficient is largest is used as the first data segment, and the second related data section is used as the second data segment; wherein $L_0$ is the length of the audio section, $W$ is the window length, $W_a$ is the length of the first and second data segments, and $F_{max}$ is the maximum search delay.
3. The audio conversion method according to claim 2, characterized in that the cross-correlation coefficient of the first comparison data section and the second comparison data section is calculated by the following method:
4. The audio conversion method according to claim 3, characterized in that $W = L_n + W_a$, where $L_n$ is the length of the output audio section after superposition, $L_n = L_0 N / M$, $M$ is the original audio length, and $N$ is the audio length after superposition.
5. The audio conversion method according to any one of claims 1-4, characterized in that, in step 2, the first data segment and the second data segment are superposed by the following method:
$$O_i = \frac{a_i w_i + b_i \,(2^{16} - w_i)}{2^{16}}$$
where $O_i$ is the output sequence after superposition, $a_i$ is the second data segment sequence, $b_i$ is the first data segment sequence, and $w_i$ is the superposition coefficient of the $i$-th point of the second data segment; the superposition coefficient of each point is recorded in a superposition coefficient table.
6. The audio conversion method according to claim 5, characterized in that $w_i$ is calculated by the following method:
7. The audio conversion method according to claim 6, characterized in that, after the first data segment and the second data segment have been superposed, the subsequent $L_n$ points of the input voice sequence are appended to the tail of the output sequence.
8. The audio conversion method according to claim 7, characterized in that, in step 3, the first $L_n$ points of the output sequence are sampled at a sampling rate of $M/N$.
9. The audio conversion method according to claim 8, characterized in that, in step 4, the low-pass filter used is an FIR digital filter of order $16 \cdot \max(M, N)/M + 1$.
10. The audio conversion method according to claim 9, characterized in that the audio section is one audio frame, i.e. $L_0 = 160$, with $W_a = 60$ and $F_{max} = 80$.
11. An audio conversion device, characterized in that it comprises:
a data segment selection unit, which selects a first data segment and a second data segment from an audio section;
a superposing unit, which superposes the first data segment and the second data segment;
a sampling unit, which samples the superposed data;
a low-pass filter, which low-pass filters the sampled data.
12. The audio conversion device according to claim 11, characterized in that the data segment selection unit comprises:
a data extracting unit, which selects a first comparison data section and a second comparison data section from the audio section;
a correlation calculation unit, which calculates the cross-correlation coefficient of the first comparison data section and the second comparison data section;
a selection unit, which selects the first comparison data section and the second comparison data section with the largest cross-correlation coefficient as the first data segment and the second data segment.
13. The audio conversion device according to claim 12, characterized in that, after completing the superposition of the data segments, the superposing unit appends the subsequent $L_n$ points of the input voice sequence to the tail of the output sequence.
14. The audio conversion device according to claim 13, characterized in that the sampling unit samples the first $L_n$ points of the output sequence at a sampling rate of $M/N$, where $M$ is the original audio length and $N$ is the audio length after superposition.
15. The audio conversion device according to claim 14, characterized in that the low-pass filter is an FIR digital filter of order $16 \cdot \max(M, N)/M + 1$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2006100256210A CN101055722A (en) | 2006-04-12 | 2006-04-12 | Audio conversion method and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2006100256210A CN101055722A (en) | 2006-04-12 | 2006-04-12 | Audio conversion method and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101055722A | 2007-10-17 |
Family ID: 38795526
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNA2006100256210A Pending CN101055722A (en) | 2006-04-12 | 2006-04-12 | Audio conversion method and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101055722A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |