CN102623007A

CN102623007A - Audio characteristic classification method based on variable duration

Info

Publication number: CN102623007A
Application number: CN2011100334102A
Authority: CN
Inventors: 卢敏 (Lu Min); 窦维蓓 (Dou Weibei)
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2011-01-30
Filing date: 2011-01-30
Publication date: 2012-08-01
Anticipated expiration: 2031-01-30
Also published as: CN102623007B

Abstract

The invention discloses an audio feature classification method based on variable duration, in the technical field of multimedia signal processing and pattern recognition. The method comprises the following steps: taking labeled audio sequences of known type as the training sequence; extracting short-time features of the audio signal in the training sequence to form a short-time feature vector; computing statistical parameters of each short-time feature over a set duration to obtain the statistical feature vector corresponding to the short-time feature vector; computing a group of statistical feature vectors corresponding to the short-time feature vector for several durations and forming the long-time feature vector of the training sequence from this group; training a classifier with the long-time feature vectors of the training sequence; extracting the short-time features of the i-th frame of the audio signal in a test sequence and computing the input long-time feature vector of the i-th frame of the test sequence; and feeding this input long-time feature vector into the trained classifier to obtain the class. The method avoids the delay caused by extracting long-time features and enables real-time classification of audio features.

Description

Audio feature classification method based on variable duration

Technical field

The invention belongs to the technical field of multimedia signal processing and pattern recognition, and relates in particular to an audio feature classification method based on variable duration.

Background technology

With the continuous development of communication technology, digital audio processing has been widely applied in fields such as mobile communications, the Internet, broadcasting and consumer electronics. Audio coding has gradually expanded from traditional speech coding, dominated by narrowband speech, to multimedia audio coding with wider bandwidth and higher quality, and the rise of 3G and LTE places still higher demands on a new generation of audio codecs in terms of channel adaptability, transmission reliability and coding quality. Whether in audio coding or in sound-effect editing and production, the diversity of audio signals means that different processing techniques may have to be selected for different kinds of audio signals. For example, ITU-T G.718 and G.729.1 divide audio signals into two coding modes, speech and music, and the later G.718-SWB adds a coding mode for audio signals containing sinusoidal components. Hence, in some application scenarios the audio signal must first be classified simply and efficiently to determine its type.

For classification, short-time features and long-time features of the audio signal are extracted. Because audio signals are stationary only over short intervals, long-time features are usually more stable and more discriminative than short-time features, but they have the drawback of a large detection delay, which limits their use in real-time classification systems. In addition, different features may exhibit different stationarity periods, so if all features are computed over the same fixed duration, the resulting long-time features may not be optimal.

Summary of the invention

The object of the invention is to address the problem that common audio feature classification methods mainly rely on extracting long-time features, which impairs real-time performance. To this end, an audio feature classification method based on variable duration is proposed: the same statistical parameters of the same short-time feature are computed over different durations to form variable-duration long-time features, these features are used to train a classifier, and the trained classifier then performs the audio feature classification.

The technical solution of the invention is an audio feature classification method based on variable duration, characterized in that the method comprises the following steps:

Step 1: Take labeled audio sequences whose types have been determined as the training sequence;

Step 2: Extract the short-time features F₁, F₂, …, F_K of the audio signal in the training sequence to form the short-time feature vector, where K is the number of components of the short-time feature vector;

Step 3: For each short-time feature F_k, compute, within the set duration, the statistical parameters of that feature over the current frame and the preceding (n−1) frames, where n is the total number of frames in the set duration. Each short-time feature F_k corresponds to a statistical feature vector formed by its statistical parameters, and the short-time feature vector then corresponds to the statistical feature vector obtained by stacking those of all its components, where 1 ≤ k ≤ K;

Step 4: Choose P values N₁, N₂, …, N_P satisfying N₁ < N₂ < … < N_P; setting n equal to N₁, N₂, …, N_P in turn, compute by Step 3 the group of statistical feature vectors corresponding to the short-time feature vector, and form the long-time feature vector of the training sequence from this group of statistical feature vectors;

Step 5: Train the classifier with the long-time feature vectors of the training sequence;

Step 6: Extract the short-time features of the audio signal in the test sequence and, following the methods of Steps 2 and 3, compute the statistical feature vectors […] of the i-th frame of the test sequence and […] of the test sequence;

Step 7: From the statistical feature vectors […] of the i-th frame of the test sequence and […] of the test sequence, compute the input long-time feature vector of the i-th frame of the test sequence;

Step 8: Feed the input long-time feature vector of the i-th frame into the classifier trained in Step 5; its output is the class of the i-th frame.

The short-time features comprise the logarithmic energy, the zero-crossing rate and the uniform sub-band energy distribution.

The statistical parameters of a short-time feature over the current frame and the preceding (n−1) frames comprise one or more of its maximum MaxF_k(n), minimum MinF_k(n), arithmetic mean AvgF_k(n) and variance VarF_k(n) over those frames.
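For concreteness, the four statistics can be written as below. This is a sketch under assumed notation: F_k(j) is the value of the k-th short-time feature in frame j, i is the current frame, and VarF_k(n) is taken as the population variance over the n frames; the source does not spell out these definitions.

$$\mathrm{MaxF}_k(n) = \max_{i-n+1 \le j \le i} F_k(j), \qquad \mathrm{MinF}_k(n) = \min_{i-n+1 \le j \le i} F_k(j)$$

$$\mathrm{AvgF}_k(n) = \frac{1}{n}\sum_{j=i-n+1}^{i} F_k(j), \qquad \mathrm{VarF}_k(n) = \frac{1}{n}\sum_{j=i-n+1}^{i}\bigl(F_k(j) - \mathrm{AvgF}_k(n)\bigr)^2$$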

Training the classifier with the long-time feature vectors of the training sequence specifically consists in training a single classifier directly with the long-time feature vectors of the training sequence.

Alternatively, training the classifier with the long-time feature vectors of the training sequence specifically uses forward feature selection: effective features are selected from the long-time feature vector of the training sequence to form an effective long-time feature vector, and a single classifier is trained with the effective long-time feature vector.

Alternatively, each sub-vector of the long-time feature vector of the training sequence is used to train a separate single classifier of the same type, and these classifiers are connected in parallel to form a classifier group.

Specifically, the input long-time feature vector of the i-th frame of the test sequence is computed with the formula […], where q = 1, 2, …, P−1; […] contains q instances of […], and […] contains P−q instances of […].

The single classifier is an independent-feature classifier based on the normal distribution.

By training the classifier with variable-duration long-time features, formed from the same statistical parameters of the same short-time features computed over different durations, and by classifying audio features with the trained classifier, the invention avoids the delay caused by extracting long-time features and achieves real-time audio feature classification.

Description of drawings

Fig. 1 is the flowchart of the audio feature classification method based on variable duration;

Fig. 2 is a schematic diagram of training a single classifier with the long-time feature vectors of the training sequence;

Fig. 3 is a schematic diagram of training a single classifier with the effective long-time feature vector formed from the effective features of the long-time feature vector of the training sequence;

Fig. 4 is a schematic diagram of the classifier group formed by connecting in parallel the single classifiers of the same type trained separately on the sub-vectors of the long-time feature vector of the training sequence;

Fig. 5 is the training sample database information table;

Fig. 6 is the test sample database information table;

Fig. 7 is the classifier performance comparison table.

Embodiment

The preferred embodiment is described in detail below with reference to the drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its applications.

The invention is described using speech/music classification of signals sampled at 32 kHz as an example; it applies equally to the classification of other types of audio signals.

Fig. 1 is the flowchart of the audio feature classification method based on variable duration. As shown in Fig. 1, the method comprises the following steps:

Step 1: Take labeled audio sequences whose types have been determined as the training sequence.

Step 2: Extract the short-time features F₁, F₂, …, F_K of the audio signal in the training sequence to form the short-time feature vector, where K is the number of components of the short-time feature vector.

In this embodiment the audio signal is divided into frames of 40 ms each, and the short-time features computed are the logarithmic energy, the zero-crossing rate and the uniform sub-band energy distribution. In the invention, the short-time features include, but are not limited to, these three features.
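At 32 kHz, a 40 ms frame holds L = 32 000 × 0.040 = 1280 samples. The short Python sketch below makes the framing explicit. The helper names `frame_length` and `split_frames` are illustrative; non-overlapping frames follow from the sample range x(n), n = (i−1)L, …, iL−1 used for frame i, while dropping an incomplete final frame is an assumption the source does not address.

```python
from typing import List, Sequence


def frame_length(sample_rate_hz: int, frame_ms: int) -> int:
    """Samples per frame L; 32 000 Hz x 40 ms gives 1280."""
    return sample_rate_hz * frame_ms // 1000


def split_frames(x: Sequence[float], L: int) -> List[List[float]]:
    """Non-overlapping frames: frame i (1-based) holds x[(i-1)L], ..., x[iL-1].

    An incomplete final frame is dropped (an assumption; the source is silent).
    """
    return [list(x[start:start + L]) for start in range(0, len(x) - L + 1, L)]
```

For example, `frame_length(32000, 40)` returns 1280, and 3000 samples yield two full frames.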

Let the samples of the i-th frame of the audio signal be x(n), n = (i−1)L, (i−1)L+1, …, iL−1, where L is the frame length. The short-time features are computed as follows:

A. Logarithmic energy

$$E_1(i) = \sum_{n=(i-1)L}^{iL-1} x^2(n)$$

$$E_2(i) = \max\bigl(\log[E_1(i)],\ -10\bigr)$$
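A minimal Python sketch of the logarithmic energy of one frame. The source does not state the logarithm base, so the natural logarithm is assumed here; returning the −10 floor directly for an all-zero frame, where log 0 is undefined, is likewise an assumption consistent with the max(·, −10) clamp.

```python
import math
from typing import Sequence


def log_energy(frame: Sequence[float], floor: float = -10.0) -> float:
    """E2(i) = max(log E1(i), -10), where E1(i) is the sum of squared samples.

    Natural log assumed (the base is not stated); a silent frame (E1 = 0) returns
    the floor directly because log(0) is undefined.
    """
    e1 = sum(s * s for s in frame)
    if e1 <= 0.0:
        return floor
    return max(math.log(e1), floor)
```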

B. Zero-crossing rate

$$ZCR(i) = \sum_{n=(i-1)L}^{iL-1} \frac{\operatorname{sign}\bigl(x(n) - x(n-1)\bigr) + 1}{2}$$

where sign(x) is the sign function:

$$\operatorname{sign}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases}$$
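A Python sketch of a per-frame zero-crossing count. The formula as printed sums [sign(x(n) − x(n−1)) + 1]/2, which counts rising sample-to-sample steps rather than sign changes; the sketch instead assumes the conventional definition Σ|sign(x(n)) − sign(x(n−1))|/2 with the three-valued sign function above, so treat it as an interpretation rather than the patent's exact formula. The sum starts at n = (i−1)L, which needs the last sample of the previous frame; here it is passed in as the illustrative parameter `prev_sample`.

```python
from typing import Sequence


def sign(v: float) -> int:
    """Three-valued sign function: 1, 0 or -1."""
    return 1 if v > 0 else (-1 if v < 0 else 0)


def zero_crossings(frame: Sequence[float], prev_sample: float = 0.0) -> float:
    """Conventional zero-crossing count, sum of |sign(x(n)) - sign(x(n-1))| / 2.

    prev_sample plays the role of x((i-1)L - 1), the last sample of the previous
    frame; defaulting it to 0.0 for the very first frame is an assumption.
    """
    total = 0.0
    previous = prev_sample
    for sample in frame:
        # A full sign flip adds 1; a step to or from exactly 0 adds 0.5.
        total += abs(sign(sample) - sign(previous)) / 2
        previous = sample
    return total
```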

C. Uniform sub-band energy distribution

$$SubE(i,k) = \sum_{m=(k-1)L/(2K)}^{kL/(2K)-1} X(i,m), \qquad k = 1, 2, \ldots, K$$

where X(i, m) is the magnitude spectrum obtained by applying the FFT to the i-th frame of the audio signal:

$$X(i,m) = \left| \sum_{k=1}^{L} x\bigl((i-1)L + k - 1\bigr) \cdot \exp\!\left[-j \frac{2\pi}{L}(m-1)(k-1)\right] \right|, \qquad m = 1, 2, \ldots, L$$

By the properties of the FFT of a real sequence, X(i, m) is even-symmetric about m = L/2+1, so only the first (L/2+1) values need to be kept. K is the number of uniform sub-bands; K = 16 in this embodiment.
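A Python sketch of the uniform sub-band energy distribution. For clarity it uses a direct DFT rather than an FFT, and it reads the printed summation limits as 0-based bin indices, so band k covers bins (k−1)L/(2K) to kL/(2K)−1; that index convention is an assumption, since the printed X(i, m) is indexed from m = 1. With L = 1280 and K = 16, each band spans 40 bins. As in the formula, magnitudes (not squared magnitudes) are summed. Function names are illustrative.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """|DFT| of one frame, bins 0 .. L/2 only (the rest mirror them for real input).

    A direct O(L^2) DFT keeps the sketch dependency-free; use an FFT in practice.
    """
    L = len(frame)
    return [abs(sum(frame[k] * cmath.exp(-2j * math.pi * m * k / L) for k in range(L)))
            for m in range(L // 2 + 1)]


def subband_energies(frame: Sequence[float], num_bands: int = 16) -> List[float]:
    """SubE(i, k) for k = 1..K: sum of magnitudes over K uniform bands.

    Band k covers 0-based bins (k-1)L/(2K) .. kL/(2K)-1 (index convention assumed).
    """
    L = len(frame)
    if L % (2 * num_bands):
        raise ValueError("frame length must be a multiple of 2K")
    width = L // (2 * num_bands)
    spectrum = magnitude_spectrum(frame)
    return [sum(spectrum[(k - 1) * width:k * width]) for k in range(1, num_bands + 1)]
```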

When extracting audio features in this embodiment, the short-time feature vector of the i-th frame is

$$\vec{V}_s(i) = \begin{bmatrix} E_2(i) \\ ZCR(i) \\ SubE(i,1) \\ \vdots \\ SubE(i,16) \end{bmatrix}$$

Its dimension is 18: E₂(i), ZCR(i), SubE(i, 1), …, SubE(i, 16) are, respectively, the short-time features F₁, F₂, …, F₁₈ of the i-th frame.

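Step 3: For each short-time feature F_k, compute, within the set duration, the statistical parameters of that feature over the current frame and the preceding (n−1) frames, where n is the total number of frames in the set duration. Each short-time feature F_k corresponds to a statistical feature vector formed by its statistical parameters, and the short-time feature vector then corresponds to the statistical feature vector obtained by stacking those of all its components, where 1 ≤ k ≤ K.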

The statistical parameters of a short-time feature over the current frame and the preceding (n−1) frames comprise one or more of its maximum MaxF_k(n), minimum MinF_k(n), arithmetic mean AvgF_k(n) and variance VarF_k(n). In this embodiment, the maximum and the variance are selected as the statistical parameters, so each short-time feature F_k corresponds to the statistical feature vector formed by MaxF_k(n) and VarF_k(n).

Since Step 2 of this embodiment yields 18 short-time features and the statistical feature vector of each short-time feature has 2 components, the statistical feature vector corresponding to the short-time feature vector has 36 dimensions.

Step 4: Choose P values N₁, N₂, …, N_P satisfying N₁ < N₂ < … < N_P; setting n equal to N₁, N₂, …, N_P in turn, compute by Step 3 the group of statistical feature vectors corresponding to the short-time feature vector, and form the long-time feature vector of the training sequence from this group of statistical feature vectors.

In this embodiment, P = 3, N₁ = 5, N₂ = 15 and N₃ = 25 (that is, 0.2 s, 0.6 s and 1 s at 40 ms per frame), which gives for the i-th frame a group of 3 statistical feature vectors corresponding to the short-time feature vector, each of dimension 36. The long-time feature vector of the training sequence formed from this group of statistical feature vectors therefore has 108 dimensions.
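Steps 3 and 4 of the embodiment can be sketched in Python as follows, assuming each frame's short-time features are already available as a list of K numbers. The statistics are the maximum and the variance (population variance, an assumption), interleaved per feature so that F_k contributes [MaxF_k(n), VarF_k(n)]; concatenating the results for N = (5, 15, 25) gives 3 × 36 = 108 dimensions when K = 18. The function names are illustrative.

```python
from typing import List, Sequence


def window_stats(history: Sequence[Sequence[float]], n: int) -> List[float]:
    """Statistical feature vector of the current frame over a duration of n frames.

    history holds short-time feature vectors, oldest first, ending with the current
    frame. For each feature F_k, [MaxF_k(n), VarF_k(n)] is computed over the current
    frame and the preceding n - 1 frames (population variance assumed).
    """
    if len(history) < n:
        raise ValueError("need at least n frames of history")
    window = history[-n:]
    stats: List[float] = []
    for k in range(len(window[0])):
        values = [frame[k] for frame in window]
        mean = sum(values) / n
        stats += [max(values), sum((v - mean) ** 2 for v in values) / n]
    return stats


def long_time_vector(history: Sequence[Sequence[float]],
                     durations: Sequence[int] = (5, 15, 25)) -> List[float]:
    """Concatenate the statistical feature vectors for n = N_1 < N_2 < ... < N_P."""
    if list(durations) != sorted(set(durations)):
        raise ValueError("durations must be strictly increasing")
    vector: List[float] = []
    for n in durations:
        vector += window_stats(history, n)
    return vector
```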

Step 5: Train the classifier with the long-time feature vectors of the training sequence.

Once the long-time feature vectors of the training sequence have been obtained, the classifier can be trained on them using known techniques.

Fig. 2 is a schematic diagram of training a single classifier with the long-time feature vectors of the training sequence. As shown in Fig. 2, the classifier can be trained by training a single classifier directly on the long-time feature vectors of the training sequence.

Fig. 3 is a schematic diagram of training a single classifier with the effective long-time feature vector formed from the effective features of the long-time feature vector of the training sequence. As shown in Fig. 3, the classifier can also be trained using forward feature selection: effective features are selected from the long-time feature vector of the training sequence to form the effective long-time feature vector, and a single classifier is trained with the effective long-time feature vector.
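Forward feature selection is usually implemented greedily: start from an empty set and repeatedly add the single feature that most improves a selection criterion. The source does not state the criterion, so the sketch below takes it as a caller-supplied `score` function (for example, the validation accuracy of a classifier trained on the chosen columns); the function name and the fixed-size stopping rule are illustrative assumptions.

```python
from typing import Callable, List


def forward_select(num_features: int, target: int,
                   score: Callable[[List[int]], float]) -> List[int]:
    """Greedy sequential forward selection of feature indices.

    score(subset) is a caller-supplied criterion (higher is better); the source does
    not specify it, so e.g. validation accuracy of the chosen classifier is assumed.
    """
    selected: List[int] = []
    remaining = list(range(num_features))
    while len(selected) < target and remaining:
        # Try each unused feature and keep the one giving the best score.
        best = max(remaining, key=lambda j: score(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Calling `forward_select(108, 36, score)` would mirror the embodiment's reduction from 108 to 36 dimensions.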

Fig. 4 is a schematic diagram of the classifier group formed by connecting in parallel the single classifiers of the same type trained separately on the sub-vectors of the long-time feature vector of the training sequence. As shown in Fig. 4, the classifier can also be trained by using each sub-vector of the long-time feature vector of the training sequence to train a separate single classifier of the same type, the resulting classifiers being connected in parallel to form a classifier group.
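In the Fig. 4 arrangement, each member classifier sees only one sub-vector of the long-time feature vector, that is, the statistics for one duration N_p. The source does not say how the members' outputs are combined, so the sketch below assumes a simple majority vote; `split_subvectors` and `group_predict` are hypothetical names, and the members are any callables that map a sub-vector to a class label.

```python
from collections import Counter
from typing import Callable, List, Sequence

Classifier = Callable[[List[float]], str]


def split_subvectors(vector: Sequence[float], parts: int) -> List[List[float]]:
    """Split the long-time feature vector into P equal sub-vectors, one per N_p."""
    size, rest = divmod(len(vector), parts)
    if rest:
        raise ValueError("vector length must be a multiple of the number of parts")
    return [list(vector[p * size:(p + 1) * size]) for p in range(parts)]


def group_predict(members: Sequence[Classifier], vector: Sequence[float]) -> str:
    """Each member labels its own sub-vector; a majority vote (assumed) fuses them."""
    subvectors = split_subvectors(vector, len(members))
    votes = Counter(member(sub) for member, sub in zip(members, subvectors))
    return votes.most_common(1)[0][0]
```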

In this embodiment the single classifier is an independent-feature classifier based on the normal distribution; the invention applies equally to other classifiers. The classifiers are trained with the methods shown in Fig. 3 and Fig. 4. That is, forward feature selection is used to select 36 effective dimensions from the 108-dimensional long-time feature vector of the training sequence to form the effective long-time feature vector, on which a single classifier is trained. In addition, the sub-vectors […] are each used as the classification feature vector to train an independent classifier of the same type.
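The source names the single classifier only as an independent-feature classifier based on the normal distribution. One reading is a Gaussian naive Bayes classifier, in which each feature is modelled per class as an independent normal distribution; the Python sketch below implements that reading. The class name, the variance floor and the use of class-frequency priors are assumptions, not details from the source.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class IndependentGaussianClassifier:
    """Gaussian naive Bayes: per class, every feature is an independent normal
    distribution (one reading of the source's classifier; details assumed)."""

    def __init__(self, var_floor: float = 1e-9) -> None:
        self.var_floor = var_floor  # keeps constant features from giving zero variance
        self.params: Dict[str, Tuple[float, List[float], List[float]]] = {}

    def fit(self, X: Sequence[Sequence[float]],
            y: Sequence[str]) -> "IndependentGaussianClassifier":
        groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        for label, rows in groups.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [max(sum((v - m) ** 2 for v in col) / n, self.var_floor)
                         for col, m in zip(columns, means)]
            log_prior = math.log(n / len(X))  # class frequency as prior (assumption)
            self.params[label] = (log_prior, means, variances)
        return self

    def log_score(self, x: Sequence[float], label: str) -> float:
        """Log prior plus the sum of per-feature normal log-densities."""
        log_prior, means, variances = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
            for xi, mu, var in zip(x, means, variances))

    def predict(self, x: Sequence[float]) -> str:
        return max(self.params, key=lambda label: self.log_score(x, label))
```

In the Fig. 3 arrangement, `fit` would receive the 36 selected columns of each training frame's long-time feature vector.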

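Step 6: Extract the short-time features of the audio signal in the test sequence and, following the methods of Steps 2 and 3, compute the statistical feature vectors […] of the i-th frame of the test sequence and […] of the test sequence.

Step 7: From the statistical feature vectors […] of the i-th frame of the test sequence and […] of the test sequence, compute the input long-time feature vector of the i-th frame of the test sequence.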

Specifically, the input long-time feature vector of the i-th frame of the test sequence is computed with the formula […], where q = 1, 2, …, P−1; […] contains q instances of […], and […] contains P−q instances of […].

Step 8: Feed the input long-time feature vector of the i-th frame into the classifier trained in Step 5; its output is the class of the i-th frame.
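The real-time property comes from the test-time computation being causal: the input long-time feature vector of frame i needs only frame i and the N_P − 1 frames before it, so a class can be output every frame. The sketch below assumes the embodiment's choices (maximum and variance; N = 5, 15, 25) and a steady state in which at least N_P frames have been seen. It yields `None` before that, because the source's start-up formula for that phase (indexed by q = 1, …, P−1) is not reproduced here. `classify_stream` and `_stats` are hypothetical names.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Optional, Sequence


def _stats(window: Sequence[Sequence[float]]) -> List[float]:
    """[max, population variance] for each feature over the given frames."""
    n = len(window)
    out: List[float] = []
    for column in zip(*window):
        mean = sum(column) / n
        out += [max(column), sum((v - mean) ** 2 for v in column) / n]
    return out


def classify_stream(frames: Iterable[Sequence[float]],
                    classifier: Callable[[List[float]], str],
                    durations: Sequence[int] = (5, 15, 25)) -> Iterator[Optional[str]]:
    """Causal per-frame classification of a stream of short-time feature vectors.

    Frame i uses only frames i - N_P + 1 .. i, so a label is available every frame.
    Yields None until N_P frames have been seen (start-up phase not modelled).
    """
    longest = max(durations)
    buffer: deque = deque(maxlen=longest)  # the N_P most recent frames
    for vector in frames:
        buffer.append(list(vector))
        if len(buffer) < longest:
            yield None
            continue
        recent = list(buffer)
        long_vector: List[float] = []
        for n in durations:
            long_vector += _stats(recent[-n:])
        yield classifier(long_vector)
```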

In this embodiment, the training sample database and the test sample database both consist of speech sequences and music sequences, and the two databases are independent of each other. Fig. 5 is the training sample database information table, and Fig. 6 is the test sample database information table. Testing on this test sample database gives the classifier performance comparison shown in Fig. 7. The comparison in Fig. 7 shows that the longer the duration of the long-time feature, the higher the classification accuracy, but also the larger the delay in detecting a change of type. By contrast, the classifier trained according to the invention performs better in both classification accuracy and the timeliness of detecting changes of audio type, and is better suited to real-time music/speech classification systems.

The above is merely a preferred embodiment of the invention, but the scope of protection of the invention is not limited thereto: any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be defined by the claims.

Claims

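1. An audio feature classification method based on variable duration, characterized in that the method comprises the following steps:

Step 1: taking labeled audio sequences whose types have been determined as the training sequence;

Step 2: extracting the short-time features F₁, F₂, …, F_K of the audio signal in the training sequence to form the short-time feature vector, K being the number of components of the short-time feature vector;

Step 3: for each short-time feature F_k, computing, within the set duration, the statistical parameters of that feature over the current frame and the preceding (n−1) frames, n being the total number of frames in the set duration; each short-time feature F_k corresponding to a statistical feature vector formed by its statistical parameters, and the short-time feature vector then corresponding to the statistical feature vector obtained by stacking those of all its components, where 1 ≤ k ≤ K;

Step 4: choosing P values N₁, N₂, …, N_P satisfying N₁ < N₂ < … < N_P, setting n equal to N₁, N₂, …, N_P in turn, computing by Step 3 the group of statistical feature vectors corresponding to the short-time feature vector, and forming the long-time feature vector of the training sequence from this group of statistical feature vectors;

Step 5: training the classifier with the long-time feature vectors of the training sequence;

Step 6: extracting the short-time features of the audio signal in the test sequence and, following the methods of Steps 2 and 3, computing the statistical feature vectors […] of the i-th frame of the test sequence and […] of the test sequence;

Step 7: computing the input long-time feature vector of the i-th frame of the test sequence from the statistical feature vectors […] of the i-th frame of the test sequence and […] of the test sequence;

Step 8: feeding the input long-time feature vector of the i-th frame into the classifier trained in Step 5, its output being the class of the i-th frame.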

2. The audio feature classification method based on variable duration according to claim 1, characterized in that the short-time features comprise the logarithmic energy, the zero-crossing rate and the uniform sub-band energy distribution.

3. The audio feature classification method based on variable duration according to claim 1, characterized in that the statistical parameters of the short-time feature over the current frame and the preceding (n−1) frames comprise one or more of its maximum MaxF_k(n), minimum MinF_k(n), arithmetic mean AvgF_k(n) and variance VarF_k(n) over those frames.

4. The audio feature classification method based on variable duration according to claim 1, characterized in that training the classifier with the long-time feature vectors of the training sequence specifically consists in training a single classifier with the long-time feature vectors of the training sequence.

5. The audio feature classification method based on variable duration according to claim 1, characterized in that training the classifier with the long-time feature vectors of the training sequence specifically consists in using forward feature selection to select effective features from the long-time feature vector of the training sequence to form an effective long-time feature vector, and training a single classifier with the effective long-time feature vector.

6. The audio feature classification method based on variable duration according to claim 1, characterized in that training the classifier with the long-time feature vectors of the training sequence specifically consists in training a separate single classifier of the same type with each sub-vector of the long-time feature vector of the training sequence and connecting these classifiers in parallel to form a classifier group.

7. The audio feature classification method based on variable duration according to any one of claims 4 to 6, characterized in that the single classifier is an independent-feature classifier based on the normal distribution.

8. The audio feature classification method based on variable duration according to claim 1, characterized in that the input long-time feature vector of the i-th frame of the test sequence is specifically computed with the formula […], where q = 1, 2, …, P−1; […] contains q instances of […], and […] contains P−q instances of […].