CN101055722A - Audio conversion method and equipment - Google Patents

Publication number
CN101055722A
Authority
CN
China
Prior art keywords
data
audio
data segment
section
stack
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2006100256210A
Other languages
Chinese (zh)
Inventor
王卫华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Chenxing Electronics Science and Technology Co Ltd
Original Assignee
Shanghai Chenxing Electronics Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Chenxing Electronics Science and Technology Co Ltd filed Critical Shanghai Chenxing Electronics Science and Technology Co Ltd
Priority to CNA2006100256210A priority Critical patent/CN101055722A/en
Publication of CN101055722A publication Critical patent/CN101055722A/en
Pending legal-status Critical Current

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses an audio conversion method comprising the following steps: step one, selecting a first data segment and a second data segment from an audio segment; step two, superposing the first data segment and the second data segment; step three, sampling the superposed audio data; step four, low-pass filtering the sampled data. Also disclosed is an audio conversion device comprising: a data segment selection unit, which selects the first data segment and the second data segment from the audio segment; a superposing unit, which superposes the first data segment and the second data segment; a sampling unit, which samples the superposed audio data; and a low-pass filter, which low-pass filters the sampled data. The computational load of the invention is very small: no high-performance computing chip or dedicated DSP is required, and only 10 MIPS of computing power is needed. The original voice length is not changed, so the invention is suitable for real-time calls and can realize real-time voice conversion on a mobile terminal with limited computing power and memory.

Description

Audio conversion method and equipment
Technical field
The present invention relates to an audio processing method and device, and in particular to a method and device for transforming the fundamental frequency of audio.
Background art
Modifying the fundamental frequency changes the characteristics of a voice so that a listener cannot identify the speaker, which produces a voice-changing effect. Existing methods for changing the fundamental frequency include Fourier-transform frequency-domain conversion, the SOLA (Synchronized Overlap and Add) algorithm, and the hybrid harmonic-stochastic algorithm. These methods are computationally expensive: real-time implementation requires a high-performance computer or a dedicated DSP, which raises the cost of the terminal device. A further problem is that they change the length of the original speech, which makes real-time transmission during a call difficult.
Summary of the invention
The object of the present invention is to overcome the defects of existing transformation methods, namely their large computational load and the fact that they change the speech length, by providing a transformation method that has a small computational load and does not change the length of the original speech.
To achieve this object, the present invention adopts the following technical solution:
An audio conversion method comprises the following steps:
Step 1: select a first data segment and a second data segment from an audio segment;
Step 2: superpose the first data segment and the second data segment;
Step 3: sample the superposed audio data;
Step 4: low-pass filter the sampled data.
The first data segment and the second data segment are determined as follows. First, the first $W$ data points of the audio segment are taken out. Then, starting from point $L_0 + n$ ($n = 0, 1, 2, \ldots, F_{max}$) of the speech, $W_a$ points are taken as a first correlation data segment, and the last $W_a$ points of the $W$ data points are taken as a second correlation data segment. The cross-correlation coefficient of the two data segments is calculated. The first correlation data segment at which the cross-correlation coefficient is largest is used as the first data segment, and the second correlation data segment is used as the second data segment. Here $L_0$ is the length of the audio segment, $W$ is the window length, $W_a$ is the length of the first and second data segments, and $F_{max}$ is the maximum search delay.
The cross-correlation coefficient of the first comparison data segment and the second comparison data segment is calculated as follows:
$$R = \sum_{i=1}^{W_a} x_i y_i,$$ where $x_i$ is the second comparison data segment sequence and $y_i$ is the first comparison data segment sequence.
Here $W = L_n + W_a$, where $L_n$ is the length of the output audio segment after superposition, $L_n = L_0 N / M$, $M$ is the original audio length, and $N$ is the audio length after superposition.
In step 2, the first data segment and the second data segment are superposed as follows:
$$O_i = \left(a_i w_i + b_i (2^{16} - w_i)\right) / 2^{16},$$ where $O_i$ is the output sequence after superposition, $a_i$ is the second data segment sequence, $b_i$ is the first data segment sequence, and $w_i$ is the superposition coefficient of the $i$-th point of the second data segment. The superposition coefficient of each point is recorded in a superposition coefficient table.
The coefficient $w_i$ is calculated as follows:
$$w_i = \left[ \frac{i}{W_a} \times 2^{16} \right], \quad i \in 1 \ldots W_a.$$
After the first data segment and the second data segment have been superposed, the subsequent $L_n$ points of the input speech sequence are appended to the tail of the output sequence.
In step 3, the first $L_n$ points of the output sequence are sampled at a sampling rate of $M/N$.
In step 4, the low-pass filter used is an FIR digital filter whose order is $16\,\mathrm{Max}(M, N)/M + 1$.
In addition, the audio segment is one audio frame, that is, $L_0 = 160$, with $W_a = 60$ and $F_{max} = 80$.
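As a rough illustration of the length relations above, the following Python sketch computes $L_n = L_0 N / M$ and $W = L_n + W_a$ for a given pair $(M, N)$. The function name `window_lengths` is hypothetical and not part of the source, and the pair $M = 2$, $N = 1$ is only an example. Exact fractions are used because the source does not say how a non-integer $L_n$ would be rounded.

```python
from fractions import Fraction


def window_lengths(l0, wa, m, n):
    """Return (L_n, W) with L_n = L_0 * N / M and W = L_n + W_a.

    Hypothetical helper: exact fractions are kept because the source
    does not specify any rounding rule.
    """
    ln = Fraction(l0 * n, m)
    return ln, ln + wa


# Values from the text: L_0 = 160, W_a = 60; M = 2, N = 1 is one example pair.
ln, w = window_lengths(160, 60, 2, 1)
print(ln, w)  # 80 140
```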
The present invention also provides an audio conversion device, comprising:
a data segment selection unit, which selects a first data segment and a second data segment from an audio segment;
a superposing unit, which superposes the first data segment and the second data segment;
a sampling unit, which samples the superposed data;
a low-pass filter, which low-pass filters the sampled data.
The data segment selection unit comprises:
a data extraction unit, which selects a first comparison data segment and a second comparison data segment from the audio segment;
a correlation calculation unit, which calculates the cross-correlation coefficient of the first comparison data segment and the second comparison data segment;
a selection unit, which selects the first comparison data segment and the second comparison data segment with the largest cross-correlation coefficient as the first data segment and the second data segment.
After completing the superposition of the data segments, the superposing unit appends the subsequent $L_n$ points of the input speech sequence to the tail of the output sequence.
The sampling unit samples the first $L_n$ points of the output sequence at a sampling rate of $M/N$, where $M$ is the original audio length and $N$ is the audio length after superposition.
The low-pass filter is an FIR digital filter whose order is $16\,\mathrm{Max}(M, N)/M + 1$.
The positive effect of the present invention is that the computational load is very small. No high-performance computing chip or dedicated DSP is needed; the method can be implemented on a general-purpose CPU and requires only 10 MIPS of computing power. It does not change the length of the original speech, so it is suitable for real-time calls and can realize real-time voice changing on a mobile terminal with limited computing power and memory.
Description of drawings
Fig. 1 is a logic block diagram of the audio conversion device of the present invention.
Fig. 2 is a flow diagram of the method of the present invention.
Figs. 3A and 3B are schematic diagrams of data segment selection in the present invention.
Embodiment
A preferred embodiment of the present invention is given below with reference to the drawings in order to describe the technical solution of the present invention in detail.
As shown in Fig. 1, an audio conversion device comprises:
a data segment selection unit 10, which selects a first data segment and a second data segment from an audio segment;
a superposing unit 20, which superposes the first data segment and the second data segment;
a sampling unit 30, which samples the superposed data;
a low-pass filter 40, which low-pass filters the sampled data.
The data segment selection unit 10 comprises:
a data extraction unit 11, which selects a first comparison data segment and a second comparison data segment from the audio segment;
a correlation calculation unit 12, which calculates the cross-correlation coefficient of the first comparison data segment and the second comparison data segment;
a selection unit 13, which selects the first comparison data segment and the second comparison data segment with the largest cross-correlation coefficient as the first data segment and the second data segment.
After completing the superposition of the data segments, the superposing unit 20 appends the subsequent $L_n$ points of the input speech sequence to the tail of the output sequence.
The sampling rate of the sampling unit 30 is $M/N$, where $M$ is the original audio length and $N$ is the audio length after superposition.
The low-pass filter 40 is an FIR digital filter whose order is $16\,\mathrm{Max}(M, N)/M + 1$.
As shown in Fig. 2, in the audio conversion method of the present invention, the relevant parameters are input first, namely the original audio length $M$ and the audio length after superposition $N$ (step 100). Their ratio can be 1/2, 3/5, 2/3, 3/4, 4/5, 5/4, 4/3, 3/2, 5/3 or 2/1; these 10 ratios correspond to 10 voice-changing effects.
In step 110, one audio frame is selected as an audio segment. When the present invention is applied to speech conversion, the voice-call sampling rate is 8 kHz and one speech frame is 20 ms long, so the audio segment is set to 160 points, that is, the audio segment length is $L_0 = 160$.
The data segment selection unit 10 then selects the first data segment and the second data segment from the audio segment (step 120).
Shown in Fig. 3 A, 3B, in an audio section, preceding W data in the audio section are taken out, W is a window length, W=L n+ W a, L nBe the length of stack back output audio section, L n=L 0N/M, W aBe the length of first and second data segments, for above-mentioned 10 kinds of change of voice effects, in order to reduce EMS memory occupation, the ratio of M, N is reduced to minimum common divisor (1/2,3/5,2/3,3/4,4/5,5/4,4/3,3/2,5/3 or 2/1), then W aShould be 1,2,3,4,5 common multiple, and less than L 0If value is big, effect is all right, but can increase calculated amount, takes all factors into consideration and can get W a=60.Last W in this W data aIndividual point takes out, as the second related data section,
Then from L 0+ n (n=0,1,2...F Max) individual point begins to get W again from this voice segments aIndividual, as the first related data section, last W in each data of this W aIndividual point takes out, as the second related data section, F MaxSearch time-delay for maximum, get F Max=80.Just, the process of choosing the first related data section is carried out F Max, the starting point of at every turn getting a little moves one backward.
Can finish choosing of data segment by data extracting unit 11.
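The constraint on $W_a$ described above (a common multiple of 1, 2, 3, 4 and 5 that is smaller than $L_0$) can be checked with a short Python sketch. The helper name `candidate_overlap_lengths` is hypothetical and not part of the source.

```python
import math
from functools import reduce


def candidate_overlap_lengths(divisors, l0):
    """List the common multiples of `divisors` that are smaller than l0."""
    # Least common multiple of all divisors, built pairwise from the gcd.
    step = reduce(lambda a, b: a * b // math.gcd(a, b), divisors)
    return list(range(step, l0, step))


print(candidate_overlap_lengths([1, 2, 3, 4, 5], 160))  # [60, 120]
```

With $L_0 = 160$ there are only two candidates, and the text settles on the smaller one, 60, to limit the amount of computation.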
The correlation calculation unit 12 calculates the cross-correlation coefficient of the second correlation data segment and each first correlation data segment. The selection unit 13 selects the first correlation data segment at which the cross-correlation coefficient is largest as the first data segment, and the second correlation data segment as the second data segment. The correlation calculation unit 12 and the selection unit 13 cooperate to complete this process.
The cross-correlation coefficient of the first comparison data segment and the second comparison data segment is calculated as follows:
$$R = \sum_{i=1}^{W_a} x_i y_i,$$ where $x_i$ is the second comparison data segment sequence and $y_i$ is the first comparison data segment sequence.
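The search described above can be sketched in Python as follows. This is a minimal illustration and not the patent's implementation: the function names are hypothetical, indexing is assumed to be 0-based, the buffer `speech` is assumed to hold at least $L_0 + F_{max} + W_a$ samples, and ties are assumed to go to the smallest $n$. The toy sizes in the usage line are chosen only to keep the example short.

```python
def cross_correlation(x, y):
    """R = sum of x_i * y_i over two equal-length sequences."""
    return sum(xi * yi for xi, yi in zip(x, y))


def find_best_offset(speech, l0, w, wa, f_max):
    """Return (n, R) for the search offset n with the largest R.

    Assumptions (not stated in the source): 0-based indexing, `speech`
    holds at least l0 + f_max + wa samples, ties go to the smallest n.
    """
    second = speech[w - wa:w]          # last W_a points of the first W points
    best_n, best_r = 0, None
    for n in range(f_max + 1):         # n = 0, 1, ..., F_max
        first = speech[l0 + n:l0 + n + wa]
        r = cross_correlation(second, first)
        if best_r is None or r > best_r:
            best_n, best_r = n, r
    return best_n, best_r


# Toy data: the tail of the window, [3, -1, 4, 2], reappears at index l0 + 3.
speech = [1, 2, 3, -1, 4, 2, 0, 0, 0, 0, 0, 3, -1, 4, 2, 0, 0]
print(find_best_offset(speech, l0=8, w=6, wa=4, f_max=5))  # (3, 30)
```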
In step 130, the superposing unit 20 superposes the first data segment and the second data segment, which can be done as follows:
$$O_i = \left(a_i w_i + b_i (2^{16} - w_i)\right) / 2^{16},$$ where $O_i$ is the output sequence after superposition, $a_i$ is the second data segment sequence, $b_i$ is the first data segment sequence, and $w_i$ is the superposition coefficient of the $i$-th point of the second data segment. The superposition coefficient of each point is recorded in a superposition coefficient table.
The $w_i$ in the superposition coefficient table are calculated as follows:
$$w_i = \left[ \frac{i}{W_a} \times 2^{16} \right], \quad i \in 1 \ldots W_a.$$
That is, $W_a$ is set in advance, the superposition coefficient of each point is then calculated and stored in the superposition coefficient table, and during superposition the coefficient of each point is read directly from the table instead of being recomputed every time, which reduces the amount of computation.
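The table lookup and the fixed-point superposition can be sketched in Python as below. The names `build_weight_table` and `superpose` are hypothetical. Reading the square brackets in the $w_i$ formula as rounding down is an assumption, and so is the use of floor division for the final division by $2^{16}$, since the source states no rounding mode.

```python
SCALE = 1 << 16  # 2**16, the fixed-point scale used in the formulas


def build_weight_table(wa):
    """w_i = floor(i / W_a * 2**16) for i = 1..W_a (floor is an assumption)."""
    return [(i * SCALE) // wa for i in range(1, wa + 1)]


def superpose(a, b, table):
    """O_i = (a_i * w_i + b_i * (2**16 - w_i)) / 2**16, in integer arithmetic.

    a: second data segment, b: first data segment. Python's // rounds
    toward negative infinity; the source does not state a rounding mode.
    """
    return [(ai * wi + bi * (SCALE - wi)) // SCALE
            for ai, bi, wi in zip(a, b, table)]


table = build_weight_table(4)                # computed once, reused per frame
print(table)                                 # [16384, 32768, 49152, 65536]
print(superpose([100] * 4, [0] * 4, table))  # [25, 50, 75, 100]
```

Because the table depends only on $W_a$, it is built once and then only indexed, so each output point costs two multiplications and no division other than the power-of-two scaling.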
In addition, after the first data segment and the second data segment have been superposed, the subsequent $L_n$ points of the input speech sequence are appended to the tail of the output sequence.
In step 140, the sampling unit 30 samples the first $L_n$ points of the output sequence at a sampling rate of $M/N$.
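The source does not say how the samples at rate $M/N$ are obtained. Purely as an illustration, the sketch below assumes linear interpolation between neighbouring input points; the function name `resample_linear` and the handling of the last sample are assumptions as well.

```python
def resample_linear(x, m, n):
    """Resample x by the ratio m/n using linear interpolation (assumed).

    Output sample j is read at input position j * n / m; positions past
    the last sample reuse the last sample. Integer arithmetic only.
    """
    out_len = len(x) * m // n
    last = len(x) - 1
    out = []
    for j in range(out_len):
        idx, frac = divmod(j * n, m)        # position = idx + frac / m
        nxt = min(idx + 1, last)
        out.append((x[idx] * (m - frac) + x[nxt] * frac) // m)
    return out


print(resample_linear([0, 10, 20, 30], 2, 1))  # [0, 5, 10, 15, 20, 25, 30, 30]
print(len(resample_linear(list(range(80)), 2, 1)))  # 160
```

The second call mirrors the case $L_n = 80$, $M/N = 2$, where 80 points become 160.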
In step 150, the low-pass filter 40 low-pass filters the sampled data. The low-pass filter 40 is an FIR digital filter whose order is $16\,\mathrm{Max}(M, N)/M + 1$.
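The order formula can be evaluated with a small Python sketch. The function name `fir_order` is hypothetical, and the expression is read here as $16 \cdot \mathrm{Max}(M, N) / M + 1$. The result is kept as an exact fraction because the source does not say how a non-integer value would be rounded.

```python
from fractions import Fraction


def fir_order(m, n):
    """Filter order 16 * Max(M, N) / M + 1, kept as an exact fraction."""
    return Fraction(16 * max(m, n), m) + 1


print(fir_order(2, 1))  # 17
print(fir_order(1, 2))  # 33
```

Under this reading the order is 17 whenever $M \ge N$ and grows with $N/M$ otherwise.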
In this way, after sampling and low-pass filtering, the length of the superposed data is still $L_0$. The length of the audio segment does not change, and correspondingly the length of the whole audio remains the original length $M$.
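The length-preservation statement follows directly from the definitions of $L_n$ and the sampling rate. As a worked check, with the pair $M = 2$, $N = 1$ chosen only as an example:

$$L_n \cdot \frac{M}{N} = L_0 \frac{N}{M} \cdot \frac{M}{N} = L_0, \qquad \text{e.g. } 80 \cdot \frac{2}{1} = 160.$$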
When the above method is implemented in C and compiled with ARM ADS, the storage required is 12K of program code (ROM) and 6K of run-time memory (RAM). Clearly, the method needs very few resources to perform the audio conversion.

Claims (15)

1. An audio conversion method, characterized in that it comprises the following steps:
Step 1: select a first data segment and a second data segment from an audio segment;
Step 2: superpose the first data segment and the second data segment;
Step 3: sample the superposed audio data;
Step 4: low-pass filter the sampled data.
2. The audio conversion method according to claim 1, characterized in that the first data segment and the second data segment are determined as follows: first, the first $W$ data points of the audio segment are taken out; then, starting from point $L_0 + n$ ($n = 0, 1, 2, \ldots, F_{max}$) of the speech, $W_a$ points are taken as a first correlation data segment, and the last $W_a$ points of the $W$ data points are taken as a second correlation data segment; the cross-correlation coefficient of the two correlation data segments is calculated; the first correlation data segment at which the cross-correlation coefficient is largest is used as the first data segment, and the second correlation data segment is used as the second data segment, where $L_0$ is the length of the audio segment, $W$ is the window length, $W_a$ is the length of the first and second data segments, and $F_{max}$ is the maximum search delay.
3. The audio conversion method according to claim 2, characterized in that the cross-correlation coefficient of the first comparison data segment and the second comparison data segment is calculated as follows:
$$R = \sum_{i=1}^{W_a} x_i y_i,$$ where $x_i$ is the second comparison data segment sequence and $y_i$ is the first comparison data segment sequence.
4. The audio conversion method according to claim 3, characterized in that $W = L_n + W_a$, where $L_n$ is the length of the output audio segment after superposition, $L_n = L_0 N / M$, $M$ is the original audio length, and $N$ is the audio length after superposition.
5. The audio conversion method according to any one of claims 1 to 4, characterized in that, in step 2, the first data segment and the second data segment are superposed as follows:
$$O_i = \left(a_i w_i + b_i (2^{16} - w_i)\right) / 2^{16},$$ where $O_i$ is the output sequence after superposition, $a_i$ is the second data segment sequence, $b_i$ is the first data segment sequence, and $w_i$ is the superposition coefficient of the $i$-th point of the second data segment; the superposition coefficient of each point is recorded in a superposition coefficient table.
6. The audio conversion method according to claim 5, characterized in that $w_i$ is calculated as follows:
$$w_i = \left[ \frac{i}{W_a} \times 2^{16} \right], \quad i \in 1 \ldots W_a.$$
7. The audio conversion method according to claim 6, characterized in that, after the first data segment and the second data segment have been superposed, the subsequent $L_n$ points of the input speech sequence are appended to the tail of the output sequence.
8. The audio conversion method according to claim 7, characterized in that, in step 3, the first $L_n$ points of the output sequence are sampled at a sampling rate of $M/N$.
9. The audio conversion method according to claim 8, characterized in that, in step 4, the low-pass filter used is an FIR digital filter whose order is $16\,\mathrm{Max}(M, N)/M + 1$.
10. The audio conversion method according to claim 9, characterized in that the audio segment is one audio frame, that is, $L_0 = 160$, with $W_a = 60$ and $F_{max} = 80$.
11. An audio conversion device, characterized in that it comprises:
a data segment selection unit, which selects a first data segment and a second data segment from an audio segment;
a superposing unit, which superposes the first data segment and the second data segment;
a sampling unit, which samples the superposed data;
a low-pass filter, which low-pass filters the sampled data.
12. The audio conversion device according to claim 11, characterized in that the data segment selection unit comprises:
a data extraction unit, which selects a first comparison data segment and a second comparison data segment from the audio segment;
a correlation calculation unit, which calculates the cross-correlation coefficient of the first comparison data segment and the second comparison data segment;
a selection unit, which selects the first comparison data segment and the second comparison data segment with the largest cross-correlation coefficient as the first data segment and the second data segment.
13. The audio conversion device according to claim 12, characterized in that, after completing the superposition of the data segments, the superposing unit appends the subsequent $L_n$ points of the input speech sequence to the tail of the output sequence.
14. The audio conversion device according to claim 13, characterized in that the sampling unit samples the first $L_n$ points of the output sequence at a sampling rate of $M/N$, where $M$ is the original audio length and $N$ is the audio length after superposition.
15. The audio conversion device according to claim 14, characterized in that the low-pass filter is an FIR digital filter whose order is $16\,\mathrm{Max}(M, N)/M + 1$.
CNA2006100256210A 2006-04-12 2006-04-12 Audio conversion method and equipment Pending CN101055722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2006100256210A CN101055722A (en) 2006-04-12 2006-04-12 Audio conversion method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2006100256210A CN101055722A (en) 2006-04-12 2006-04-12 Audio conversion method and equipment

Publications (1)

Publication Number Publication Date
CN101055722A true CN101055722A (en) 2007-10-17

Family

ID=38795526

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2006100256210A Pending CN101055722A (en) 2006-04-12 2006-04-12 Audio conversion method and equipment

Country Status (1)

Country Link
CN (1) CN101055722A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication