CN104240719B - Audio feature extraction method, audio classification method, and related apparatus - Google Patents

Audio feature extraction method, audio classification method, and related apparatus

Info

Publication number
CN104240719B
CN104240719B (application CN201310255746.2A; also published as CN104240719A)
Authority
CN
China
Prior art keywords
audio
feature
frame
cluster centre
section
Prior art date
Legal status
Active
Application number
CN201310255746.2A
Other languages
Chinese (zh)
Other versions
CN104240719A (en)
Inventor
谢志明
潘晖
潘石柱
张兴明
傅利泉
朱江明
吴军
吴坚
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN201310255746.2A
Publication of CN104240719A
Application granted
Publication of CN104240719B

Abstract

The invention discloses an audio feature extraction method, an audio classification method, and related apparatus, which solve the prior-art problem that features of equal length cannot be extracted from audio of different durations. The method includes: obtaining audio, and performing the following operations for each obtained audio: dividing the audio according to a preset framing rule to obtain multiple audio frames; performing feature extraction on the multiple audio frames respectively according to a preset feature extraction rule to obtain a feature of each audio frame; determining, according to the obtained feature of each audio frame and cluster centres used to distinguish audio frame categories, the cluster centre corresponding to each audio frame; and determining the number of audio frames corresponding to each cluster centre, and determining the feature of the audio according to the determined numbers.

Description

Audio feature extraction method, audio classification method, and related apparatus
Technical field
The present invention relates to the field of pattern recognition, and in particular to an audio feature extraction method, an audio classification method, and related apparatus.
Background art
Audio classification is widely used in audio retrieval and abnormal event detection. One example in audio retrieval is classifying an audio clip as speech or music, so that retrieval can then be carried out in the database corresponding to the determined category. In this example, if the audio can be determined in advance to be "music", the search can be performed directly in the music database; and if it can further be determined that the audio belongs to a particular musical style, the search range can be narrowed even further. As another example, in abnormal event detection an audio clip is classified as a scream, breaking glass, a gunshot, or normal sound (e.g. someone speaking at a normal pace), so as to judge whether the event that produced the audio is abnormal or normal. If the feature of the audio is similar to that of abnormal audio such as screams, gunshots, or breaking glass, the audio can be assigned to the abnormal-audio category and the corresponding event judged abnormal; if its feature is similar to that of normal sound, the audio is assigned to the normal-audio category and the corresponding event judged normal.
In the prior art, audio samples of known category with a duration equal to a specific length (e.g. 1 second) are typically divided into frames for short-time processing (one audio clip is divided into multiple frames). Mel-frequency cepstral coefficients (Mel Frequency Cepstrum Coefficient, MFCC), linear predictive cepstral coefficients (Linear Predictive Cepstral Coding, LPCC), and similar features are then computed for each frame and combined to form the feature of that audio sample. The features extracted from all audio samples are then clustered, or used for classification training, to obtain the common features of each audio class. When an audio clip of unknown category is to be classified, the same framing is applied to a clip of the same fixed duration, the corresponding feature is extracted, and it is compared against the cluster centres obtained by clustering, or fed into the classifier obtained by training, to determine the classification result.
The drawback of this method is that both the audio samples of known category and the audio of unknown category to be classified must have the same duration (the specified length). If the durations differ, the features extracted by the above method also differ in length, so they cannot be clustered or used for classification training, and audio of unknown category cannot be classified.
Summary of the invention
Embodiments of the present invention provide an audio feature extraction method, an audio classification method, and related apparatus, to solve the prior-art problem that features of equal length cannot be extracted from audio of different durations.
The embodiments of the present invention adopt the following technical solutions:
An audio feature extraction method, including:
obtaining audio, and performing the following operations for each obtained audio:
dividing the audio according to a preset framing rule to obtain multiple audio frames;
performing feature extraction on the multiple audio frames respectively according to a preset feature extraction rule, to obtain a feature of each audio frame;
determining, according to the obtained feature of each audio frame and cluster centres used to distinguish audio frame categories, the cluster centre corresponding to each audio frame; wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest; and the cluster centres are obtained by dividing each audio sample into multiple audio sample frames according to the framing rule, extracting the feature of each audio sample frame according to the feature extraction rule, and clustering the extracted features of the audio sample frames;
determining the number of audio frames corresponding to each cluster centre, and determining the feature of the audio according to the determined numbers.
An audio feature extraction apparatus, including:
an obtaining unit, configured to obtain audio;
a framing unit, configured to perform, for each audio obtained by the obtaining unit: dividing the audio according to a preset framing rule to obtain multiple audio frames;
a feature extraction unit, configured to perform feature extraction on the multiple audio frames obtained by the framing unit respectively according to a preset feature extraction rule, to obtain a feature of each audio frame;
a cluster centre determining unit, configured to determine, according to the feature of each audio frame obtained by the feature extraction unit and cluster centres used to distinguish audio frame categories, the cluster centre corresponding to each audio frame; wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest; and the cluster centres are obtained by dividing each audio sample into multiple audio sample frames according to the framing rule, extracting the feature of each audio sample frame according to the feature extraction rule, and clustering the extracted features of the audio sample frames;
a feature determining unit, configured to determine the number of audio frames corresponding to each cluster centre, and determine the feature of the audio according to the determined numbers.
An audio classification method, including:
Step 1: dividing audio to be classified according to a preset framing rule, to obtain multiple audio frames;
Step 2: performing feature extraction on the multiple audio frames respectively according to a preset feature extraction rule, to obtain a feature of each audio frame;
performing Step 3, Step 4 and Step 5 in sequence at least twice:
Step 3: determining a predetermined number of audio segments according to the obtained audio frames and a preset second inter-segment overlap percentage; and determining, according to the determined audio segments and cluster centres used to distinguish audio frame categories, the cluster centre corresponding to each audio frame contained in each audio segment; wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest; the cluster centres are obtained by dividing each audio sample into multiple audio sample frames according to the framing rule, extracting the feature of each audio sample frame according to the feature extraction rule, and clustering the extracted features of the audio sample frames; and the second inter-segment overlap percentage used is different each time Step 3 is performed;
Step 4: determining the number of audio frames corresponding to each cluster centre, and determining the feature of the audio according to the determined numbers;
Step 5: determining a classification result according to the determined feature of the audio and a classifier used to distinguish audio categories; wherein the classifier is obtained by performing classification training on the features of the audio samples, and the feature of each audio sample is obtained according to the features of its audio frames and the cluster centres;
Step 6: determining the category of the audio according to the determined classification results.
An audio classification apparatus, including:
a framing unit, configured to divide audio to be classified according to a preset framing rule, to obtain multiple audio frames;
a feature extraction unit, configured to perform feature extraction on the multiple audio frames obtained by the framing unit respectively according to a preset feature extraction rule, to obtain a feature of each audio frame;
a classification result determining unit, configured to perform the following steps in sequence at least twice:
Step 1: determining a predetermined number of audio segments according to the obtained audio frames and a preset second inter-segment overlap percentage; and determining, according to the determined audio segments and cluster centres used to distinguish audio frame categories, the cluster centre corresponding to each audio frame contained in each audio segment; wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest; the cluster centres are obtained by dividing each audio sample into multiple audio sample frames according to the framing rule, extracting the feature of each audio sample frame according to the feature extraction rule, and clustering the extracted features of the audio sample frames; and the second inter-segment overlap percentage used is different each time Step 1 is performed;
Step 2: determining the number of audio frames corresponding to each cluster centre, and determining the feature of the audio according to the determined numbers;
Step 3: determining a classification result according to the determined feature of the audio and a classifier used to distinguish audio categories; wherein the classifier is obtained by performing classification training on the features of the audio samples, and the feature of each audio sample is obtained according to the features of its audio frames and the cluster centres;
a category determining unit, configured to determine the category of the audio according to the classification results determined by the classification result determining unit.
An audio classification method, including:
dividing audio to be classified according to a preset framing rule, to obtain multiple audio frames;
performing feature extraction on the multiple audio frames respectively according to a preset feature extraction rule, to obtain a feature of each audio frame;
determining, according to the obtained feature of each audio frame and cluster centres used to distinguish audio frame categories, the cluster centre corresponding to each audio frame; wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest; and the cluster centres are obtained by dividing each audio sample into multiple audio sample frames according to the framing rule, extracting the feature of each audio sample frame according to the feature extraction rule, and clustering the extracted features of the audio sample frames;
determining the number of audio frames corresponding to each cluster centre, and determining the feature of the audio according to the determined numbers;
determining the category of the audio according to the determined feature of the audio and a classifier used to distinguish audio categories; wherein the classifier is obtained by performing classification training on the features of the audio samples, and the feature of each audio sample is obtained according to the features of its audio frames and the cluster centres.
An audio classification apparatus, including:
a framing unit, configured to divide audio to be classified according to a preset framing rule, to obtain multiple audio frames;
a feature extraction unit, configured to perform feature extraction on the multiple audio frames obtained by the framing unit respectively according to a preset feature extraction rule, to obtain a feature of each audio frame;
a cluster centre determining unit, configured to determine, according to the feature of each audio frame obtained by the feature extraction unit and cluster centres used to distinguish audio frame categories, the cluster centre corresponding to each audio frame; wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest; and the cluster centres are obtained by dividing each audio sample into multiple audio sample frames according to the framing rule, extracting the feature of each audio sample frame according to the feature extraction rule, and clustering the extracted features of the audio sample frames;
a feature determining unit, configured to determine the number of audio frames corresponding to each cluster centre, and determine the feature of the audio according to the determined numbers;
a category determining unit, configured to determine the category of the audio according to the determined feature of the audio and a classifier used to distinguish audio categories; wherein the classifier is obtained by performing classification training on the features of the audio samples, and the feature of each audio sample is obtained according to the features of its audio frames and the cluster centres.
The embodiments of the present invention have the following beneficial effects:
The embodiments of the present invention perform framing and feature extraction on the audio to obtain the feature of each audio frame, compare the feature of each audio frame with the cluster centres obtained in advance by clustering the features of the audio frames in the audio samples, determine the cluster centre corresponding to each audio frame, then determine the number of audio frames corresponding to each cluster centre, and finally obtain the feature of the audio. Because the number of cluster centres obtained in advance is fixed, the length of the obtained feature is constant regardless of the duration of the audio, which solves the prior-art problem that features of equal length cannot be extracted from audio of different durations.
Brief description of the drawings
Fig. 1 is a flowchart of an audio feature extraction method provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of an audio feature extraction method provided by Embodiment 2 of the present invention;
Fig. 3 is a flowchart of an audio classification method provided by Embodiment 3 of the present invention;
Fig. 4 is a flowchart of an audio classification method provided by Embodiment 4 of the present invention;
Fig. 5 is a detailed flowchart of an audio classification method provided by Embodiment 5 of the present invention in a practical application;
Fig. 6 is a schematic structural diagram of an audio feature extraction apparatus provided by Embodiment 6 of the present invention;
Fig. 7 is a schematic structural diagram of an audio classification apparatus provided by Embodiment 7 of the present invention;
Fig. 8 is a schematic structural diagram of an audio classification apparatus provided by Embodiment 8 of the present invention.
Detailed description of the embodiments
To solve the prior-art problem that features of equal length cannot be extracted from audio of different durations, embodiments of the present invention provide an audio feature extraction method, an audio classification method, and related apparatus. The solution performs framing and feature extraction on the audio to obtain the feature of each audio frame, compares the feature of each audio frame with the cluster centres obtained in advance by clustering the features of the audio frames in the audio samples, determines the cluster centre corresponding to each audio frame, then determines the number of audio frames corresponding to each cluster centre, and finally determines the feature of the audio. Because the number of cluster centres obtained in advance is fixed, the length of the obtained feature is constant regardless of the duration of the audio, which solves the prior-art problem that features of equal length cannot be extracted from audio of different durations.
The embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the embodiments described here are only intended to illustrate and explain the present invention, not to limit it. Moreover, the embodiments in this description and the features of the embodiments can be combined with each other provided they do not conflict.
Embodiment 1:
First, Embodiment 1 of the present invention provides an audio feature extraction method. The flowchart of this method is shown in Fig. 1 and mainly includes the following steps:
Step 11: obtain audio;
The following identical operations are performed for each obtained audio, so each of the following steps is described with respect to the same audio.
Step 12: divide the audio according to a preset framing rule to obtain multiple audio frames;
Although most audio signals are random, non-stationary signals, they can be regarded as stationary over a short time. The obtained audio is therefore first divided into frames for short-time processing according to the preset framing rule, where the framing rule may include a frame duration and an inter-frame overlap percentage. For example, when the audio is divided according to a framing rule with a frame duration of 25 ms and an inter-frame overlap of 50%, each obtained audio frame has a duration of 25 ms and adjacent audio frames overlap by 50%.
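As a rough illustration of such a framing rule, the following is a minimal Python sketch under assumed parameters (8 kHz sampling, 25 ms frames, 50% overlap); it is not the patent's reference implementation:

import numpy as np

def split_into_frames(signal, sample_rate=8000, frame_ms=25, overlap=0.5):
    """Split a 1-D audio signal into overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    hop = int(frame_len * (1 - overlap))             # step between frame starts
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])

frames = split_into_frames(np.random.randn(16000))   # e.g. 2 s of audio at 8 kHz
print(frames.shape)                                   # (n_frames, frame_len)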
Step 13: perform feature extraction on the multiple audio frames respectively according to a preset feature extraction rule, to obtain a feature of each audio frame;
In this step, each obtained audio frame may first be pre-processed, for example with one or more of zero-mean normalisation, pre-emphasis and windowing. Time-domain or frequency-domain feature extraction is then performed on each pre-processed audio frame according to the preset feature extraction rule, where the extracted feature may be one or a combination of linear predictive coding coefficients (Linear Predictive Coding, LPC), linear predictive cepstral coefficients (Linear Predictive Cepstral Coding, LPCC), Mel-frequency cepstral coefficients (Mel Frequency Cepstrum Coefficient, MFCC) and linear predictive Mel-frequency cepstral coefficients (Linear Predictive Coding Mel Frequency Cepstrum Coefficient, LPCMFCC).
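One possible per-frame feature extractor, sketched with librosa MFCCs (LPC or LPCC features would be handled analogously); the pre-emphasis coefficient and parameter values below are illustrative assumptions, not values fixed by the patent:

import numpy as np
import librosa

def extract_frame_features(signal, sample_rate=8000, frame_ms=25, overlap=0.5, n_mfcc=13):
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(frame_len * (1 - overlap))
    # optional pre-processing: a simple pre-emphasis high-pass filter
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    mfcc = librosa.feature.mfcc(y=emphasized, sr=sample_rate,
                                n_mfcc=n_mfcc, n_fft=frame_len, hop_length=hop)
    return mfcc.T   # one n_mfcc-dimensional feature vector per audio frame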
It should be noted that the framing rule and feature extraction rule mentioned in subsequent embodiments are the same as the framing rule and feature extraction rule in this embodiment, and are not described again below.
Step 14: determine, according to the obtained feature of each audio frame and the cluster centres used to distinguish audio frame categories, the cluster centre corresponding to each audio frame;
The cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest. In general, the feature of an audio frame and the feature of a cluster centre can each be a vector, i.e. a feature vector, so the similarity between the feature of an audio frame and the feature of a cluster centre can be represented by the distance between the two feature vectors: a smaller distance means the two feature vectors are more similar, i.e. the similarity is larger, and a larger distance means the two feature vectors differ more, i.e. the similarity is smaller.
In this step, the cluster centres used to distinguish audio frame categories are obtained by dividing each audio sample into multiple audio sample frames according to the framing rule in Step 12, extracting the feature of each audio sample frame according to the feature extraction rule in Step 13, and clustering the extracted features of the audio sample frames. Many mature clustering methods exist in the prior art and are not described in detail here.
If the features of all audio sample frames in each audio sample are clustered together, i.e. clustering is performed without segmenting the audio samples, the features of early audio frames may be clustered together with the features of late audio frames. This destroys the temporal order of the audio frames in the audio samples and degrades the quality of the resulting cluster centres. To avoid this problem, each audio sample can be divided into audio sample segments and clustering performed segment by segment, which yields better cluster centres. This approach is described in detail in Embodiment 2 below and is not repeated here.
It should be noted that the number of cluster centres can be a constant set according to the user's requirements.
Step 15: determine the number of audio frames corresponding to each cluster centre, and determine the feature of the audio according to the determined numbers.
Specifically, the numbers of audio frames corresponding to the cluster centres can first be normalised, and the values obtained after normalisation, in the form of a feature-vector statistical histogram, are used as the feature of the audio (see the sketch below).
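A minimal sketch of Steps 14 and 15, assuming Euclidean distance as the (inverse) similarity measure and pre-computed cluster centres:

import numpy as np

def audio_feature(frame_features, centres):
    """frame_features: (n_frames, d); centres: (K, d) -> feature of length K."""
    # distance of every frame feature to every cluster centre
    dists = np.linalg.norm(frame_features[:, None, :] - centres[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)                         # closest centre per frame
    counts = np.bincount(nearest, minlength=len(centres))  # frames per centre
    return counts / len(frame_features)                    # normalised histogram, length K

The length of the returned vector equals the number of cluster centres K, independent of how many frames the audio produced.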
When multiple audio clips of known category are obtained, classification training can be performed on the features of these audio clips according to their known categories once their features have been obtained, yielding a classifier used to distinguish audio categories.
It can be seen from the above method provided by Embodiment 1 of the present invention that, because the number of cluster centres obtained in advance for distinguishing audio frame categories is fixed, the length of the finally obtained feature of the audio is constant regardless of the duration of the obtained audio, which solves the prior-art problem that features of equal length cannot be extracted from audio of different durations.
Embodiment 2:
On the basis of the method provided by Embodiment 1, Embodiment 2 of the present invention provides an audio feature extraction method. The flowchart of this method is shown in Fig. 2 and mainly includes the following steps:
Step 21: obtain audio;
The following identical operations are performed for each obtained audio, so each of the following steps is described with respect to the same audio.
Step 22: divide the audio according to the preset framing rule to obtain multiple audio frames;
Step 23: perform feature extraction on the multiple audio frames respectively according to the preset feature extraction rule, to obtain a feature of each audio frame;
Step 24: determine a predetermined number of audio segments according to the obtained audio frames and a preset second inter-segment overlap percentage, and record the temporal order of the determined audio segments;
Each audio segment contains the same number of consecutive audio frames.
Step 25: perform, for each audio segment: determining the cluster centre corresponding to each audio frame contained in the audio segment;
The cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest.
Specifically, this step may include: determining the arrangement position of the audio segment according to the recorded temporal order of the audio segments; determining, according to the temporal order recorded for the audio samples, the cluster centres corresponding to the arrangement position of the audio segment from the obtained cluster centres corresponding to each arrangement position; and comparing the feature of each audio frame contained in the audio segment with the cluster centres corresponding to the arrangement position of the audio segment, to determine the cluster centre corresponding to each audio frame contained in the audio segment.
The cluster centres corresponding to each arrangement position can be obtained as follows:
First, for each audio sample: divide the audio sample according to the above framing rule to obtain multiple audio sample frames; extract the feature of each audio sample frame according to the above feature extraction rule; obtain a predetermined number of audio sample segments according to the obtained audio sample frames and a preset first inter-segment overlap percentage; and record the temporal order of the obtained audio sample segments, where each audio sample segment contains the same number of consecutive audio sample frames.
For example, suppose the predetermined number is set to 2, the first inter-segment overlap percentage is set to 30%, and the audio sample is divided into N audio frames. Two audio sample segments are then obtained for this audio sample: the first audio sample segment (hereinafter the first segment) runs from frame 1 to frame 10*N/17, and the second audio sample segment (hereinafter the second segment) runs from frame 7*N/17+1 to frame N, i.e. it is also 10*N/17 frames long. If the first inter-segment overlap percentage is 0%, the first segment is frame 1 to frame N/2 and the second segment is frame N/2+1 to frame N.
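The segment boundaries in this example can be reproduced with a small sketch, under the assumed convention of M equal-length segments with adjacent segments overlapping by the given fraction of a segment:

def segment_bounds(n_frames, n_segments, overlap):
    """Return (start, end) frame indices, 0-based and end-exclusive, per segment."""
    # M segments of length L with (M-1) overlaps of overlap*L must cover n_frames:
    #   M*L - (M-1)*overlap*L = n_frames
    seg_len = int(round(n_frames / (n_segments - (n_segments - 1) * overlap)))
    hop = int(round(seg_len * (1 - overlap)))
    bounds = []
    for i in range(n_segments):
        start = i * hop
        end = start + seg_len if i < n_segments - 1 else n_frames
        bounds.append((start, min(end, n_frames)))
    return bounds

print(segment_bounds(170, 2, 0.3))   # [(0, 100), (70, 170)] -> frames 1..10N/17 and 7N/17+1..N
print(segment_bounds(170, 2, 0.0))   # [(0, 85), (85, 170)]  -> frames 1..N/2 and N/2+1..N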
Then, according to the temporal order recorded for the audio samples, the features of the audio sample frames contained in all audio sample segments at the same arrangement position in the temporal order are clustered, yielding cluster centres that correspond to each arrangement position.
Again taking the case where each audio sample is divided into two audio sample segments as an example, the features of the audio sample frames contained in the first segments of all audio samples are clustered to obtain the cluster centres corresponding to the first segments, and the features of the audio sample frames contained in the second segments of all audio samples are clustered to obtain the cluster centres corresponding to the second segments.
It should be noted that the number of cluster centres can be a constant set according to the user's requirements.
Step 26: perform, for each audio segment: determining the number of audio frames corresponding to each cluster centre corresponding to the arrangement position of the audio segment, and determining the feature of the audio segment according to the determined numbers;
Step 27: combine the determined features of the audio segments, according to the arrangement positions of the audio segments, into the feature of the audio.
Again taking the case of two audio sample segments as an example, after the feature of the first segment and the feature of the second segment have been determined, they can be weighted according to preset weights and combined in order from the first segment to the second segment, yielding the feature of the audio.
It can be seen from the above method provided by Embodiment 2 of the present invention that, because the number of cluster centres obtained in advance for distinguishing audio frame categories is fixed, the length of the finally obtained feature of the audio is constant regardless of the duration of the obtained audio, which solves the prior-art problem that features of equal length cannot be extracted from audio of different durations.
Furthermore, in this embodiment clustering is performed with each audio sample segmented, which preserves the temporal order of the audio frames in the audio samples, so the cluster centres obtained by clustering are of better quality.
Embodiment 3:
Embodiment 3 of the present invention further provides an audio classification method. The flowchart of this method is shown in Fig. 3 and mainly includes the following steps:
Step 31: divide the audio to be classified according to the preset framing rule to obtain multiple audio frames;
Step 32: perform feature extraction on the multiple audio frames respectively according to the preset feature extraction rule, to obtain a feature of each audio frame;
Step 33: determine, according to the obtained feature of each audio frame and the cluster centres used to distinguish audio frame categories, the cluster centre corresponding to each audio frame;
The cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest.
Optionally, when the cluster centres used to distinguish audio frame categories are the cluster centres corresponding to each arrangement position, this step determines the cluster centre corresponding to each audio frame contained in each audio segment.
Step 34: determine the number of audio frames corresponding to each cluster centre, and determine the feature of the audio according to the determined numbers;
The specific implementation of Steps 31 to 34 can refer to the related description in Embodiment 1 or Embodiment 2 above and is not repeated here.
Step 35: determine the category of the audio according to the determined feature of the audio and a classifier used to distinguish audio categories;
The classifier is obtained by performing classification training on the features of the audio samples, and the feature of each audio sample is obtained according to the features of its audio frames and the above cluster centres. Many mature classification training methods exist in the prior art and are not described in detail here.
In an embodiment of the present invention, the determined feature of the audio can be fed into the trained classifier, which compares the feature of the audio with the features of the categories obtained by training. Specifically, the classifier compares the feature of the audio with the feature of each category and selects the category whose feature has the largest similarity with the feature of the audio as the category of the audio.
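A possible classifier back end, sketched with a scikit-learn support-vector machine; the patent does not fix a particular classifier or similarity measure, so this is only one assumed choice:

from sklearn.svm import SVC

def train_classifier(sample_features, sample_labels):
    # sample_features: one fixed-length histogram feature per audio sample
    return SVC(kernel="rbf").fit(sample_features, sample_labels)

def classify(classifier, audio_feature_vector):
    return classifier.predict([audio_feature_vector])[0]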
It can be seen from the above method provided by Embodiment 3 of the present invention that, because the number of cluster centres obtained in advance for distinguishing audio frame categories is fixed, the length of the finally obtained feature of the audio is constant regardless of the duration of the obtained audio. Therefore, when classifying audio, audio of any duration can be classified, without having to force the audio to be classified to a specified duration.
Embodiment 4:
Further analysis of Embodiment 3 above reveals a shortcoming of that audio classification method: its classification performance is poor when classifying audio data with different tempos. If the audio samples used during training are mostly fast and the audio to be classified is a slow clip, the classification performance is very poor; conversely, if the audio samples used during training are mostly slow and the audio to be classified is a fast clip, the classification performance is also very poor.
To provide a classification embodiment that adapts to audio of different tempos, Embodiment 4 of the present invention provides an audio classification method. The flowchart of this method is shown in Fig. 4 and includes the following steps:
Step 41: divide the audio to be classified according to the preset framing rule to obtain multiple audio frames;
Step 42: perform feature extraction on the multiple audio frames respectively according to the preset feature extraction rule, to obtain a feature of each audio frame;
Steps 43, 44 and 45 below are performed in sequence at least twice, and the second inter-segment overlap percentage used is different each time Step 43 is performed.
Step 43: determine a predetermined number of audio segments according to the obtained audio frames and a preset second inter-segment overlap percentage, and determine, according to the determined audio segments and the cluster centres used to distinguish audio frame categories, the cluster centre corresponding to each audio frame contained in each audio segment;
The cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest.
Optionally, the cluster centres can be obtained as follows:
For each audio sample: divide the audio sample according to the framing rule to obtain multiple audio sample frames; extract the feature of each audio sample frame according to the feature extraction rule; obtain a predetermined number of audio sample segments according to the obtained audio sample frames and a preset first inter-segment overlap percentage; and record the temporal order of the obtained audio sample segments, where each audio sample segment contains the same number of consecutive audio sample frames;
According to the temporal order recorded for the audio samples, cluster the features of the audio sample frames contained in all audio sample segments at the same arrangement position in the temporal order, to obtain cluster centres corresponding to each arrangement position.
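A sketch of this per-position clustering, using k-means as one concrete clustering choice (the embodiments leave the clustering algorithm open):

import numpy as np
from sklearn.cluster import KMeans

def train_position_centres(per_sample_segments, n_centres):
    """per_sample_segments: for each training sample, a list over segment positions
    of (n_frames_i, d) frame-feature arrays. Returns one centre matrix per position."""
    n_positions = len(per_sample_segments[0])
    centres = []
    for pos in range(n_positions):
        # pool the frame features of the pos-th segment of every sample
        pooled = np.vstack([sample[pos] for sample in per_sample_segments])
        km = KMeans(n_clusters=n_centres, n_init=10, random_state=0).fit(pooled)
        centres.append(km.cluster_centers_)
    return centres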
Step 44: determine the number of audio frames corresponding to each cluster centre, and determine the feature of the audio according to the determined numbers;
The specific implementation of Steps 41 to 44 can refer to the related description in Embodiment 1 or Embodiment 2 above and is not repeated here.
Step 45: determine a classification result according to the determined feature of the audio and a classifier used to distinguish audio categories;
The classifier is obtained by performing classification training on the features of the audio samples, and the feature of each audio sample is obtained according to the features of its audio frames and the above cluster centres.
In an embodiment of the present invention, the determined feature of the audio can be fed into the trained classifier, which compares the feature of the audio with the features of the categories obtained by training. Specifically, the classifier compares the feature of the audio with the feature of each category and selects the category whose feature has the largest similarity with the feature of the audio as the classification result.
Because Steps 43, 44 and 45 are performed in sequence at least twice, multiple features of the audio are obtained in Step 44, and in Step 45 these features are each fed into the trained classifier; for each obtained feature, the category with the largest similarity is chosen, yielding multiple classification results for the audio.
Step 46: determine the category of the audio according to the determined classification results.
Specifically, Step 46 can be implemented by determining the category of the audio from the obtained classification results by voting.
It can be seen from the above method provided by Embodiment 4 of the present invention that this method can not only classify audio of any duration, but also, because it divides the same audio to be classified into audio segments at least twice, obtains multiple features according to the inter-segment overlap percentages used in the different divisions. This effectively improves the adaptability to the audio to be classified, so the method handles different situations, for example when fast audio makes up a larger share of the classifier's training samples, or when slow audio makes up a larger share. The method is therefore more widely applicable and is more robust to audio with different tempos.
Embodiment 5:
A concrete application flow, in practice, of the method provided by Embodiment 4 of the present invention is described in detail below. As shown in Fig. 5, this application flow includes the following steps:
Step 51: collect audio samples of each known category, and extract the cluster centres and the classifier from the collected audio samples of each known category;
The method for extracting the cluster centres and the classifier can refer to the method described in Embodiment 2 and is not repeated here.
In this embodiment of the present invention, taking the predetermined number M as an example, each audio sample is divided into M audio sample segments. For the first segments of all audio samples, K1 cluster centres can be obtained; for the second segments of all audio samples, K2 cluster centres can be obtained; and so on, until for the M-th segments of all audio samples, Km cluster centres can be obtained. In general, M between 3 and 10 works best.
Step 52: for an audio to be classified, divide it according to the preset framing rule to obtain multiple audio frames;
Step 53: perform feature extraction on the multiple audio frames respectively according to the preset feature extraction rule, to obtain a feature of each audio frame;
Step 54: determine M audio segments according to the obtained audio frames and a preset second inter-segment overlap percentage;
Each audio segment contains the same number of consecutive audio frames.
Step 55: determine the feature of each audio segment;
Specifically, the following is performed for each audio segment:
First, determine the cluster centres corresponding to the audio segment; for example, the first audio segment corresponds to the K1 cluster centres, the second audio segment corresponds to the K2 cluster centres, and the M-th audio segment corresponds to the Km cluster centres;
Then, compare the feature of each audio frame contained in the audio segment with the determined cluster centres, determine the cluster centre corresponding to each audio frame contained in the audio segment, and count the number of audio frames corresponding to each cluster centre corresponding to the audio segment;
Finally, determine the feature of the audio segment according to the counted numbers of audio frames corresponding to the cluster centres corresponding to the audio segment.
Take the first audio segment as an example. Suppose the first audio segment contains 100 audio frames and corresponds to 10 cluster centres (K1 = 10). Ten counters are set up, each with initial value 0 and each corresponding to one cluster centre. First, the distances between the feature of each audio frame and the features of the 10 cluster centres are computed, and the counter corresponding to the closest cluster centre is incremented by 1; for example, if the feature of the first audio frame is closest to the feature of the third cluster centre, the third counter is incremented by 1, and so on, so that the counts on the 10 counters finally sum to 100. Then each of the 10 counts is divided by the total number of audio frames, 100, giving 10 decimals between 0 and 1. These 10 decimals are the feature of the audio segment, which can be represented as a feature-vector histogram of length 10.
Step 56: determine the feature of the audio;
Specifically, the obtained features of the audio segments are weighted and combined in the temporal order of the audio segments to obtain the feature of the audio, whose total length is K1 + K2 + ... + Km.
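A minimal sketch of this concatenation; the weights are a free design choice and are assumed uniform here:

import numpy as np

def combine_segment_features(segment_features, weights=None):
    """segment_features: per-segment histograms of lengths K1..Km, in segment order."""
    if weights is None:
        weights = [1.0] * len(segment_features)
    # total length of the result is K1 + K2 + ... + Km
    return np.concatenate([w * f for w, f in zip(weights, segment_features)])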
Step 57: determine a classification result according to the determined feature of the audio and the classifier used to distinguish audio categories;
Steps 53 to 57 above are performed in sequence at least twice, and the second inter-segment overlap percentage used when determining the audio segments is different each time.
Step 58: determine the category of the audio from the determined classification results by voting.
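A minimal voting sketch over the per-pass classification results (one result per overlap setting):

from collections import Counter

def vote(classification_results):
    return Counter(classification_results).most_common(1)[0][0]

print(vote(["scream", "normal sound", "scream"]))   # -> "scream"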
It can be seen from the above method provided by Embodiment 5 of the present invention that this method can not only classify audio of any duration, but also, because it divides the same audio to be classified into audio segments at least twice, obtains multiple features according to the inter-segment overlap percentages used in the different divisions. This effectively improves the adaptability to the audio to be classified, so the method handles different situations, for example when fast audio makes up a larger share of the classifier's training samples, or when slow audio makes up a larger share. The method is therefore more widely applicable and is more robust to audio with different tempos.
Embodiment 6:
An embodiment of the present invention also provides an audio feature extraction apparatus. The schematic structural diagram of the apparatus is shown in Fig. 6, and it mainly includes the following functional units:
an obtaining unit 61, configured to obtain audio;
a framing unit 62, configured to perform, for each audio obtained by the obtaining unit 61: dividing the audio according to the preset framing rule to obtain multiple audio frames;
a feature extraction unit 63, configured to perform feature extraction on the multiple audio frames obtained by the framing unit 62 respectively according to the preset feature extraction rule, to obtain a feature of each audio frame;
a cluster centre determining unit 64, configured to determine, according to the feature of each audio frame obtained by the feature extraction unit 63 and the cluster centres used to distinguish audio frame categories, the cluster centre corresponding to each audio frame;
wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest;
the cluster centres are obtained by dividing each audio sample into multiple audio sample frames according to the above framing rule, extracting the feature of each audio sample frame according to the above feature extraction rule, and clustering the extracted features of the audio sample frames;
a feature determining unit 65, configured to determine the number of audio frames corresponding to each cluster centre, and determine the feature of the audio according to the determined numbers.
Optionally, the cluster centres can be obtained as follows:
First, for each audio sample: divide the audio sample according to the above framing rule to obtain multiple audio sample frames; extract the feature of each audio sample frame according to the above feature extraction rule; obtain a predetermined number of audio sample segments according to the obtained audio sample frames and a preset first inter-segment overlap percentage; and record the temporal order of the obtained audio sample segments, where each audio sample segment contains the same number of consecutive audio sample frames;
Then, according to the temporal order recorded for the audio samples, cluster the features of the audio sample frames contained in all audio sample segments at the same arrangement position in the temporal order, to obtain cluster centres corresponding to each arrangement position.
When the cluster centres are obtained in the above way, the cluster centre determining unit 64 may specifically include:
an audio segment determining module 641, configured to determine a predetermined number of audio segments according to the audio frames obtained by the framing unit 62 and a preset second inter-segment overlap percentage, and record the temporal order of the determined audio segments, where each audio segment contains the same number of consecutive audio frames;
an arrangement position determining module 642, configured to perform, for each audio segment determined by the audio segment determining module: determining the arrangement position of the audio segment according to the recorded temporal order of the audio segments;
a first cluster centre determining module 643, configured to determine, according to the temporal order recorded for the audio samples, the cluster centres corresponding to the arrangement position of the audio segment from the obtained cluster centres corresponding to each arrangement position;
a second cluster centre determining module 644, configured to compare the feature of each audio frame contained in the audio segment with the cluster centres determined by the first cluster centre determining module 643, to determine the cluster centre corresponding to each audio frame contained in the audio segment.
The feature determining unit 65 may specifically include:
a segment feature determining module 651, configured to perform, for each audio segment: determining the number of audio frames corresponding to each cluster centre corresponding to the arrangement position of the audio segment, and determining the feature of the audio segment according to the determined numbers;
an audio feature determining module 652, configured to combine, according to the arrangement positions of the audio segments, the features of the audio segments determined by the segment feature determining module 651 into the feature of the audio.
Embodiment seven:
The embodiment of the present invention also provides a kind of sorter of audio, the concrete structure schematic diagram of the device as shown in fig. 7, Mainly include following function unit:
Framing unit 71, for the framing rule pre-set, audio to be sorted is divided, obtains multiple audios Frame;
Feature extraction unit 72, for according to the feature extraction rule pre-set, being obtained respectively to framing unit 71 Multiple audio frames carry out feature extraction, obtain the feature of each audio frame;
Classification results determining unit 73, for being performed successively at least twice to following step:
Step 1:According to obtained each audio frame and advance default intersegmental overlapping second percentage, predetermined number is determined Audio section;And according to the audio section for the predetermined number determined, and for distinguishing each cluster centre of audio frame category, point Cluster centre corresponding to each audio frame that each audio section is included is not determined;
Wherein, the corresponding cluster centre of each audio frame meets:Feature and each cluster centre in the audio frame Feature similarity in, the similarity of the feature of the feature of audio frame cluster centre corresponding with its is maximum;
Each cluster centre is that each audio sample is divided into multiple audio sample frames respectively according to above-mentioned framing rule, and After the feature that each audio sample frame is extracted according to features described above extracting rule, the feature of each audio sample frame to extracting is carried out What cluster obtained;
Wherein, when step 1 is performed at least twice, each institute according to intersegmental overlapping second percentage it is different;
Step 2:The number of the audio frame corresponding to each cluster centre is determined respectively, and is determined according to the number determined The feature of the audio;
Step 3:According to the feature for the audio determined and the grader for distinguishing audio categories, classification results are determined;
Wherein, grader is to carry out classification based training according to the feature to each audio sample to obtain;Wherein, each audio sample This is characterized in what is obtained according to the feature and above-mentioned each cluster centre of its audio frame;
Classification determination unit 74, for the classification results to being determined according to classification results determining unit 73, determine audio Classification.
Optionally, obtaining the mode of each cluster centre can specifically include:
First, performed for each audio sample:According to above-mentioned framing rule, the audio sample is divided, obtained Multiple audio sample frames;According to features described above extracting rule, the feature of each audio sample frame is extracted;According to obtained each audio sample This frame and intersegmental overlapping first percentage pre-set, obtain the audio sample section of predetermined number;And record obtain it is each The ordering of audio sample section in time;Wherein, each audio sample Duan Zhongjun includes the continuous audio of identical quantity Sample frame;
Secondly according to the ordering recorded for each audio sample, respectively to the aligned identical in ordering The feature for each audio sample frame that all audio sample sections of position are included is clustered, and obtains corresponding respectively to each arrangement position Each cluster centre put.
When the cluster centres are obtained in the above manner, Step 1 in the classification results determining unit 73 includes:
determining a preset number of audio segments according to the obtained audio frames and the preset second inter-segment overlap percentage, and recording the temporal ordering of the determined audio segments; wherein each audio segment contains the same number of consecutive audio frames;
performing, for each audio segment, the following:
determining the arrangement position of the audio segment according to the recorded temporal ordering of the audio segments;
determining, according to the ordering recorded for each audio sample, the cluster centres corresponding to the arrangement position of the audio segment from the obtained cluster centres corresponding to the respective arrangement positions;
comparing the feature of each audio frame contained in the audio segment with each cluster centre corresponding to the arrangement position of the audio segment, so as to determine the cluster centre corresponding to each audio frame contained in the audio segment.
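Purely as an illustrative sketch, nearest-centre assignment for the frames of one segment could look like the following; using Euclidean distance as the (inverse) similarity measure is an assumption, since the disclosure only requires selecting the centre with the largest feature similarity.

```python
import numpy as np

def assign_frames_to_centres(segment_frames, position_centres):
    """For each frame feature in the segment, return the index of the cluster centre
    (for this arrangement position) whose feature is most similar, taking
    "most similar" as smallest Euclidean distance."""
    frames = np.asarray(segment_frames)      # shape: (num_frames, dim)
    centres = np.asarray(position_centres)   # shape: (num_centres, dim)
    distances = np.linalg.norm(frames[:, None, :] - centres[None, :, :], axis=-1)
    return distances.argmin(axis=1)          # one centre index per frame
```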
Step 2 in the classification results determining unit 73 includes:
performing, for each audio segment, the following:
determining the number of audio frames corresponding to each cluster centre corresponding to the arrangement position of the audio segment, and determining the feature of the audio segment according to the determined numbers;
combining the determined features of the audio segments into the feature of the audio according to the arrangement positions of the audio segments.
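The per-segment counting and the combination by arrangement position can be pictured as a bag-of-frames histogram per segment, concatenated in temporal order, which yields a fixed-length feature regardless of the audio duration. The sketch below reuses the assign_frames_to_centres helper from the previous sketch; the normalisation of each histogram is an added assumption.

```python
import numpy as np

def audio_feature_from_segments(segments, centres_per_position):
    """Count, per segment, how many frames fall on each cluster centre of its
    arrangement position, then concatenate the per-segment histograms in order."""
    histograms = []
    for pos, segment_frames in enumerate(segments):
        centres = centres_per_position[pos]
        assignments = assign_frames_to_centres(segment_frames, centres)
        counts = np.bincount(assignments, minlength=len(centres)).astype(float)
        histograms.append(counts / max(counts.sum(), 1.0))  # assumed normalisation
    return np.concatenate(histograms)  # fixed length regardless of audio duration
```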
Optionally, the classification determination unit 74 may be specifically configured to:
determine the category of the audio by voting over the classification results obtained by the classification results determining unit 73.
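A simple majority vote over the classification results obtained with the different second inter-segment overlap percentages might look like the following sketch; how ties are broken is not specified in the disclosure, and here they are resolved arbitrarily by count order.

```python
from collections import Counter

def vote_on_category(classification_results):
    """classification_results: one category label per run of Steps 1-3
    (each run using a different inter-segment overlap percentage).
    Returns the label that received the most votes."""
    return Counter(classification_results).most_common(1)[0][0]
```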
Embodiment eight:
An embodiment of the present invention further provides an audio classification apparatus, whose structure is schematically shown in Fig. 8 and which mainly includes the following functional units:
a framing unit 81, configured to divide the audio to be classified according to a preset framing rule to obtain a plurality of audio frames;
a feature extraction unit 82, configured to perform feature extraction on the plurality of audio frames obtained by the framing unit 81 according to a preset feature extraction rule, to obtain the feature of each audio frame;
a cluster centre determining unit 83, configured to determine the cluster centre corresponding to each audio frame according to the features of the audio frames obtained by the feature extraction unit 82 and the cluster centres used for distinguishing audio frame categories;
wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest;
each cluster centre is obtained by dividing each audio sample into a plurality of audio sample frames according to the above framing rule, extracting the feature of each audio sample frame according to the above feature extraction rule, and clustering the extracted features of the audio sample frames;
a characteristics determining unit 84, configured to determine the number of audio frames corresponding to each cluster centre, and to determine the feature of the audio according to the determined numbers;
a classification determination unit 85, configured to determine the category of the audio according to the feature of the audio determined by the characteristics determining unit 84 and a classifier for distinguishing audio categories;
wherein the classifier is obtained by performing classification training on the features of the audio samples, and the feature of each audio sample is obtained according to the features of its audio frames and the cluster centres (an end-to-end sketch of this chain of units follows).
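To make the chain of units 81-85 concrete, here is a hedged end-to-end sketch of this non-segmented variant. The MFCC front end, the frame sizes and the SVM classifier are assumptions, since the disclosure leaves the concrete framing rule, feature extraction rule and classifier open.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def frame_features(audio, sr):
    """Framing plus per-frame feature extraction (assumed: 13 MFCCs, 25 ms / 10 ms)."""
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13,
                                n_fft=int(0.025 * sr), hop_length=int(0.010 * sr))
    return mfcc.T                                   # shape: (num_frames, 13)

def audio_feature(audio, sr, cluster_centres):
    """Units 81-84: assign every frame to its nearest cluster centre and use the
    normalised counts as a fixed-length feature, independent of audio duration."""
    frames = frame_features(audio, sr)
    dists = np.linalg.norm(frames[:, None, :] - cluster_centres[None, :, :], axis=-1)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(cluster_centres))
    return counts / max(counts.sum(), 1)

def classify(audio, sr, cluster_centres, classifier: SVC):
    """Unit 85: feed the fixed-length feature to a trained classifier."""
    return classifier.predict([audio_feature(audio, sr, cluster_centres)])[0]
```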
Optionally, the above cluster centres may be obtained as follows:
First, performing, for each audio sample: dividing the audio sample according to the above framing rule to obtain a plurality of audio sample frames; extracting the feature of each audio sample frame according to the above feature extraction rule; obtaining a preset number of audio sample segments according to the obtained audio sample frames and a preset first inter-segment overlap percentage; and recording the temporal ordering of the obtained audio sample segments; wherein each audio sample segment contains the same number of consecutive audio sample frames.
Then, clustering, for each arrangement position in the ordering and according to the ordering recorded for each audio sample, the features of the audio sample frames contained in all audio sample segments at that same arrangement position, so as to obtain the cluster centres corresponding to each arrangement position.
When the cluster centres are obtained in the above manner, the cluster centre determining unit 83 may specifically include:
an audio segment determining module 831, configured to determine a preset number of audio segments according to the audio frames obtained by the framing unit 81 and the preset second inter-segment overlap percentage, and to record the temporal ordering of the determined audio segments; wherein each audio segment contains the same number of consecutive audio frames;
an arrangement position determining module 832, configured to perform, for each audio segment determined by the audio segment determining module 831: determining the arrangement position of the audio segment according to the recorded temporal ordering of the audio segments;
a first cluster centre determining module 833, configured to determine, according to the ordering recorded for each audio sample, the cluster centres corresponding to the arrangement position of the audio segment from the obtained cluster centres corresponding to the respective arrangement positions;
a second cluster centre determining module 834, configured to compare the feature of each audio frame contained in the audio segment with each cluster centre determined by the first cluster centre determining module 833, so as to determine the cluster centre corresponding to each audio frame contained in the audio segment.
The characteristics determining unit 84 may specifically include:
a segment feature determining module 841, configured to perform, for each audio segment: determining the number of audio frames corresponding to each cluster centre corresponding to the arrangement position of the audio segment, and determining the feature of the audio segment according to the determined numbers;
an audio feature determining module 842, configured to combine the features of the audio segments determined by the segment feature determining module 841 into the feature of the audio according to the arrangement positions of the audio segments.
It should be understood by those skilled in the art that embodiments of the present invention may be provided as a method, a system or a computer program product. Therefore, the present invention may take the form of a pure hardware embodiment, a pure software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM and optical memory) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, the device (system) and the computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or the other programmable device to produce computer-implemented processing, and the instructions executed on the computer or the other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, may make further changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (22)

  1. A feature extraction method for audio, characterized by comprising:
    obtaining audio, and performing the following operations for each obtained audio:
    dividing the audio according to a preset framing rule to obtain a plurality of audio frames;
    performing feature extraction on the plurality of audio frames according to a preset feature extraction rule, to obtain a feature of each audio frame;
    determining the cluster centre corresponding to each audio frame according to the obtained features of the audio frames and cluster centres used for distinguishing audio frame categories; wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest; and each cluster centre is obtained by dividing each audio sample into a plurality of audio sample frames according to the framing rule, extracting the feature of each audio sample frame according to the feature extraction rule, and clustering the extracted features of the audio sample frames;
    determining the number of audio frames corresponding to each cluster centre, and determining the feature of the audio according to the determined numbers;
    wherein the cluster centres are obtained by:
    performing, for each audio sample: dividing the audio sample according to the framing rule to obtain a plurality of audio sample frames; extracting the feature of each audio sample frame according to the feature extraction rule; obtaining a preset number of audio sample segments according to the obtained audio sample frames and a preset first inter-segment overlap percentage; and recording the temporal ordering of the obtained audio sample segments; wherein each audio sample segment contains the same number of consecutive audio sample frames; and
    clustering, for each arrangement position in the ordering and according to the ordering recorded for each audio sample, the features of the audio sample frames contained in all audio sample segments at that same arrangement position, to obtain the cluster centres corresponding to each arrangement position.
  2. The method according to claim 1, characterized in that determining the cluster centre corresponding to each audio frame according to the obtained features of the audio frames and the cluster centres specifically comprises:
    determining a preset number of audio segments according to the obtained audio frames and a preset second inter-segment overlap percentage, and recording the temporal ordering of the determined audio segments; wherein each audio segment contains the same number of consecutive audio frames; and
    performing, for each audio segment, the following:
    determining the arrangement position of the audio segment according to the recorded temporal ordering of the audio segments;
    determining, according to the ordering recorded for each audio sample, the cluster centres corresponding to the arrangement position of the audio segment from the obtained cluster centres corresponding to the respective arrangement positions; and
    comparing the feature of each audio frame contained in the audio segment with each cluster centre corresponding to the arrangement position of the audio segment, to determine the cluster centre corresponding to each audio frame contained in the audio segment.
  3. The method according to claim 2, characterized in that determining the number of audio frames corresponding to each cluster centre and determining the feature of the audio according to the determined numbers specifically comprises:
    performing, for each audio segment, the following: determining the number of audio frames corresponding to each cluster centre corresponding to the arrangement position of the audio segment, and determining the feature of the audio segment according to the determined numbers; and
    combining the determined features of the audio segments into the feature of the audio according to the arrangement positions of the audio segments.
  4. A feature extraction apparatus for audio, characterized by comprising:
    an obtaining unit, configured to obtain audio;
    a framing unit, configured to perform, for each audio obtained by the obtaining unit: dividing the audio according to a preset framing rule to obtain a plurality of audio frames;
    a feature extraction unit, configured to perform feature extraction on the plurality of audio frames obtained by the framing unit according to a preset feature extraction rule, to obtain a feature of each audio frame;
    a cluster centre determining unit, configured to determine the cluster centre corresponding to each audio frame according to the features of the audio frames obtained by the feature extraction unit and cluster centres used for distinguishing audio frame categories; wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest; and each cluster centre is obtained by dividing each audio sample into a plurality of audio sample frames according to the framing rule, extracting the feature of each audio sample frame according to the feature extraction rule, and clustering the extracted features of the audio sample frames;
    a characteristics determining unit, configured to determine the number of audio frames corresponding to each cluster centre, and to determine the feature of the audio according to the determined numbers;
    wherein the cluster centres are obtained by:
    performing, for each audio sample: dividing the audio sample according to the framing rule to obtain a plurality of audio sample frames; extracting the feature of each audio sample frame according to the feature extraction rule; obtaining a preset number of audio sample segments according to the obtained audio sample frames and a preset first inter-segment overlap percentage; and recording the temporal ordering of the obtained audio sample segments; wherein each audio sample segment contains the same number of consecutive audio sample frames; and
    clustering, for each arrangement position in the ordering and according to the ordering recorded for each audio sample, the features of the audio sample frames contained in all audio sample segments at that same arrangement position, to obtain the cluster centres corresponding to each arrangement position.
  5. The apparatus according to claim 4, characterized in that the cluster centre determining unit specifically comprises:
    an audio segment determining module, configured to determine a preset number of audio segments according to the audio frames obtained by the framing unit and a preset second inter-segment overlap percentage, and to record the temporal ordering of the determined audio segments; wherein each audio segment contains the same number of consecutive audio frames;
    an arrangement position determining module, configured to perform, for each audio segment determined by the audio segment determining module: determining the arrangement position of the audio segment according to the recorded temporal ordering of the audio segments;
    a first cluster centre determining module, configured to determine, according to the ordering recorded for each audio sample, the cluster centres corresponding to the arrangement position of the audio segment from the obtained cluster centres corresponding to the respective arrangement positions; and
    a second cluster centre determining module, configured to compare the feature of each audio frame contained in the audio segment with each cluster centre determined by the first cluster centre determining module, to determine the cluster centre corresponding to each audio frame contained in the audio segment.
  6. The apparatus according to claim 5, characterized in that the characteristics determining unit specifically comprises:
    a segment feature determining module, configured to perform, for each audio segment: determining the number of audio frames corresponding to each cluster centre corresponding to the arrangement position of the audio segment, and determining the feature of the audio segment according to the determined numbers; and
    an audio feature determining module, configured to combine the features of the audio segments determined by the segment feature determining module into the feature of the audio according to the arrangement positions of the audio segments.
  7. An audio classification method, characterized by comprising:
    Step 1: dividing the audio to be classified according to a preset framing rule to obtain a plurality of audio frames;
    Step 2: performing feature extraction on the plurality of audio frames according to a preset feature extraction rule, to obtain a feature of each audio frame;
    performing Step 3, Step 4 and Step 5 in sequence at least twice:
    Step 3: determining a preset number of audio segments according to the obtained audio frames and a preset second inter-segment overlap percentage; and determining, according to the determined audio segments and cluster centres used for distinguishing audio frame categories, the cluster centre corresponding to each audio frame contained in each audio segment; wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest; each cluster centre is obtained by dividing each audio sample into a plurality of audio sample frames according to the framing rule, extracting the feature of each audio sample frame according to the feature extraction rule, and clustering the extracted features of the audio sample frames; and wherein, when Step 3 is performed at least twice, a different second inter-segment overlap percentage is used each time;
    Step 4: determining the number of audio frames corresponding to each cluster centre, and determining the feature of the audio according to the determined numbers;
    Step 5: determining a classification result according to the determined feature of the audio and a classifier for distinguishing audio categories; wherein the classifier is obtained by performing classification training on the features of the audio samples, and the feature of each audio sample is obtained according to the features of its audio frames and the cluster centres;
    Step 6: determining the category of the audio according to the determined classification results.
  8. The method according to claim 7, characterized in that the cluster centres are obtained by:
    performing, for each audio sample: dividing the audio sample according to the framing rule to obtain a plurality of audio sample frames; extracting the feature of each audio sample frame according to the feature extraction rule; obtaining a preset number of audio sample segments according to the obtained audio sample frames and a preset first inter-segment overlap percentage; and recording the temporal ordering of the obtained audio sample segments; wherein each audio sample segment contains the same number of consecutive audio sample frames; and
    clustering, for each arrangement position in the ordering and according to the ordering recorded for each audio sample, the features of the audio sample frames contained in all audio sample segments at that same arrangement position, to obtain the cluster centres corresponding to each arrangement position.
  9. The method according to claim 8, characterized in that Step 3 specifically comprises:
    determining a preset number of audio segments according to the obtained audio frames and the preset second inter-segment overlap percentage, and recording the temporal ordering of the determined audio segments; wherein each audio segment contains the same number of consecutive audio frames; and
    performing, for each audio segment, the following:
    determining the arrangement position of the audio segment according to the recorded temporal ordering of the audio segments;
    determining, according to the ordering recorded for each audio sample, the cluster centres corresponding to the arrangement position of the audio segment from the obtained cluster centres corresponding to the respective arrangement positions; and
    comparing the feature of each audio frame contained in the audio segment with each cluster centre corresponding to the arrangement position of the audio segment, to determine the cluster centre corresponding to each audio frame contained in the audio segment.
  10. The method according to claim 9, characterized in that Step 4 specifically comprises:
    performing, for each audio segment, the following: determining the number of audio frames corresponding to each cluster centre corresponding to the arrangement position of the audio segment, and determining the feature of the audio segment according to the determined numbers; and
    combining the determined features of the audio segments into the feature of the audio according to the arrangement positions of the audio segments.
  11. The method according to claim 7, characterized in that Step 6 specifically comprises:
    determining the category of the audio by voting over the obtained classification results.
  12. An audio classification apparatus, characterized by comprising:
    a framing unit, configured to divide the audio to be classified according to a preset framing rule to obtain a plurality of audio frames;
    a feature extraction unit, configured to perform feature extraction on the plurality of audio frames obtained by the framing unit according to a preset feature extraction rule, to obtain a feature of each audio frame;
    a classification results determining unit, configured to perform the following steps in sequence at least twice:
    Step 1: determining a preset number of audio segments according to the obtained audio frames and a preset second inter-segment overlap percentage; and determining, according to the determined audio segments and cluster centres used for distinguishing audio frame categories, the cluster centre corresponding to each audio frame contained in each audio segment; wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest; each cluster centre is obtained by dividing each audio sample into a plurality of audio sample frames according to the framing rule, extracting the feature of each audio sample frame according to the feature extraction rule, and clustering the extracted features of the audio sample frames; and wherein, when Step 1 is performed at least twice, a different second inter-segment overlap percentage is used each time;
    Step 2: determining the number of audio frames corresponding to each cluster centre, and determining the feature of the audio according to the determined numbers;
    Step 3: determining a classification result according to the determined feature of the audio and a classifier for distinguishing audio categories; wherein the classifier is obtained by performing classification training on the features of the audio samples, and the feature of each audio sample is obtained according to the features of its audio frames and the cluster centres; and
    a classification determination unit, configured to determine the category of the audio according to the classification results determined by the classification results determining unit.
  13. The apparatus according to claim 12, characterized in that the cluster centres are obtained by:
    performing, for each audio sample: dividing the audio sample according to the framing rule to obtain a plurality of audio sample frames; extracting the feature of each audio sample frame according to the feature extraction rule; obtaining a preset number of audio sample segments according to the obtained audio sample frames and a preset first inter-segment overlap percentage; and recording the temporal ordering of the obtained audio sample segments; wherein each audio sample segment contains the same number of consecutive audio sample frames; and
    clustering, for each arrangement position in the ordering and according to the ordering recorded for each audio sample, the features of the audio sample frames contained in all audio sample segments at that same arrangement position, to obtain the cluster centres corresponding to each arrangement position.
  14. The apparatus according to claim 13, characterized in that the classification results determining unit is specifically configured to:
    determine a preset number of audio segments according to the obtained audio frames and the preset second inter-segment overlap percentage, and record the temporal ordering of the determined audio segments; wherein each audio segment contains the same number of consecutive audio frames; and
    perform, for each audio segment, the following:
    determining the arrangement position of the audio segment according to the recorded temporal ordering of the audio segments;
    determining, according to the ordering recorded for each audio sample, the cluster centres corresponding to the arrangement position of the audio segment from the obtained cluster centres corresponding to the respective arrangement positions; and
    comparing the feature of each audio frame contained in the audio segment with each cluster centre corresponding to the arrangement position of the audio segment, to determine the cluster centre corresponding to each audio frame contained in the audio segment.
  15. The apparatus according to claim 14, characterized in that the classification results determining unit is further configured to:
    perform, for each audio segment, the following: determining the number of audio frames corresponding to each cluster centre corresponding to the arrangement position of the audio segment, and determining the feature of the audio segment according to the determined numbers; and
    combine the determined features of the audio segments into the feature of the audio according to the arrangement positions of the audio segments.
  16. The apparatus according to claim 12, characterized in that the classification determination unit is specifically configured to:
    determine the category of the audio by voting over the classification results obtained by the classification results determining unit.
  17. An audio classification method, characterized by comprising:
    dividing the audio to be classified according to a preset framing rule to obtain a plurality of audio frames;
    performing feature extraction on the plurality of audio frames according to a preset feature extraction rule, to obtain a feature of each audio frame;
    determining the cluster centre corresponding to each audio frame according to the obtained features of the audio frames and cluster centres used for distinguishing audio frame categories; wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest; and each cluster centre is obtained by dividing each audio sample into a plurality of audio sample frames according to the framing rule, extracting the feature of each audio sample frame according to the feature extraction rule, and clustering the extracted features of the audio sample frames;
    determining the number of audio frames corresponding to each cluster centre, and determining the feature of the audio according to the determined numbers;
    determining the category of the audio according to the determined feature of the audio and a classifier for distinguishing audio categories; wherein the classifier is obtained by performing classification training on the features of the audio samples, and the feature of each audio sample is obtained according to the features of its audio frames and the cluster centres;
    wherein the cluster centres are obtained by:
    performing, for each audio sample: dividing the audio sample according to the framing rule to obtain a plurality of audio sample frames; extracting the feature of each audio sample frame according to the feature extraction rule; obtaining a preset number of audio sample segments according to the obtained audio sample frames and a preset first inter-segment overlap percentage; and recording the temporal ordering of the obtained audio sample segments; wherein each audio sample segment contains the same number of consecutive audio sample frames; and
    clustering, for each arrangement position in the ordering and according to the ordering recorded for each audio sample, the features of the audio sample frames contained in all audio sample segments at that same arrangement position, to obtain the cluster centres corresponding to each arrangement position.
  18. The method according to claim 17, characterized in that determining the cluster centre corresponding to each audio frame according to the obtained features of the audio frames and the cluster centres specifically comprises:
    determining a preset number of audio segments according to the obtained audio frames and a preset second inter-segment overlap percentage, and recording the temporal ordering of the determined audio segments; wherein each audio segment contains the same number of consecutive audio frames; and
    performing, for each audio segment, the following:
    determining the arrangement position of the audio segment according to the recorded temporal ordering of the audio segments;
    determining, according to the ordering recorded for each audio sample, the cluster centres corresponding to the arrangement position of the audio segment from the obtained cluster centres corresponding to the respective arrangement positions; and
    comparing the feature of each audio frame contained in the audio segment with each cluster centre corresponding to the arrangement position of the audio segment, to determine the cluster centre corresponding to each audio frame contained in the audio segment.
  19. The method according to claim 18, characterized in that determining the number of audio frames corresponding to each cluster centre and determining the feature of the audio according to the determined numbers specifically comprises:
    performing, for each audio segment, the following: determining the number of audio frames corresponding to each cluster centre corresponding to the arrangement position of the audio segment, and determining the feature of the audio segment according to the determined numbers; and
    combining the determined features of the audio segments into the feature of the audio according to the arrangement positions of the audio segments.
  20. An audio classification apparatus, characterized by comprising:
    a framing unit, configured to divide the audio to be classified according to a preset framing rule to obtain a plurality of audio frames;
    a feature extraction unit, configured to perform feature extraction on the plurality of audio frames obtained by the framing unit according to a preset feature extraction rule, to obtain a feature of each audio frame;
    a cluster centre determining unit, configured to determine the cluster centre corresponding to each audio frame according to the features of the audio frames obtained by the feature extraction unit and cluster centres used for distinguishing audio frame categories; wherein the cluster centre corresponding to each audio frame satisfies: among the similarities between the feature of the audio frame and the features of the cluster centres, the similarity between the feature of the audio frame and the feature of its corresponding cluster centre is the largest; and each cluster centre is obtained by dividing each audio sample into a plurality of audio sample frames according to the framing rule, extracting the feature of each audio sample frame according to the feature extraction rule, and clustering the extracted features of the audio sample frames;
    a characteristics determining unit, configured to determine the number of audio frames corresponding to each cluster centre, and to determine the feature of the audio according to the determined numbers;
    a classification determination unit, configured to determine the category of the audio according to the feature of the audio determined by the characteristics determining unit and a classifier for distinguishing audio categories; wherein the classifier is obtained by performing classification training on the features of the audio samples, and the feature of each audio sample is obtained according to the features of its audio frames and the cluster centres;
    wherein the cluster centres are obtained by:
    performing, for each audio sample: dividing the audio sample according to the framing rule to obtain a plurality of audio sample frames; extracting the feature of each audio sample frame according to the feature extraction rule; obtaining a preset number of audio sample segments according to the obtained audio sample frames and a preset first inter-segment overlap percentage; and recording the temporal ordering of the obtained audio sample segments; wherein each audio sample segment contains the same number of consecutive audio sample frames; and
    clustering, for each arrangement position in the ordering and according to the ordering recorded for each audio sample, the features of the audio sample frames contained in all audio sample segments at that same arrangement position, to obtain the cluster centres corresponding to each arrangement position.
  21. The apparatus according to claim 20, characterized in that the cluster centre determining unit specifically comprises:
    an audio segment determining module, configured to determine a preset number of audio segments according to the audio frames obtained by the framing unit and a preset second inter-segment overlap percentage, and to record the temporal ordering of the determined audio segments; wherein each audio segment contains the same number of consecutive audio frames;
    an arrangement position determining module, configured to perform, for each audio segment determined by the audio segment determining module: determining the arrangement position of the audio segment according to the recorded temporal ordering of the audio segments;
    a first cluster centre determining module, configured to determine, according to the ordering recorded for each audio sample, the cluster centres corresponding to the arrangement position of the audio segment from the obtained cluster centres corresponding to the respective arrangement positions; and
    a second cluster centre determining module, configured to compare the feature of each audio frame contained in the audio segment with each cluster centre determined by the first cluster centre determining module, to determine the cluster centre corresponding to each audio frame contained in the audio segment.
  22. The apparatus according to claim 21, characterized in that the characteristics determining unit specifically comprises:
    a segment feature determining module, configured to perform, for each audio segment: determining the number of audio frames corresponding to each cluster centre corresponding to the arrangement position of the audio segment, and determining the feature of the audio segment according to the determined numbers; and
    an audio feature determining module, configured to combine the features of the audio segments determined by the segment feature determining module into the feature of the audio according to the arrangement positions of the audio segments.
CN201310255746.2A 2013-06-24 2013-06-24 The feature extracting method of audio, the sorting technique of audio and relevant apparatus Active CN104240719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310255746.2A CN104240719B (en) 2013-06-24 2013-06-24 The feature extracting method of audio, the sorting technique of audio and relevant apparatus


Publications (2)

Publication Number Publication Date
CN104240719A CN104240719A (en) 2014-12-24
CN104240719B true CN104240719B (en) 2018-01-12

Family

ID=52228671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310255746.2A Active CN104240719B (en) 2013-06-24 2013-06-24 The feature extracting method of audio, the sorting technique of audio and relevant apparatus

Country Status (1)

Country Link
CN (1) CN104240719B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105895110A (en) * 2016-06-30 2016-08-24 北京奇艺世纪科技有限公司 Method and device for classifying audio files
CN107967912B (en) * 2017-11-28 2022-02-25 广州势必可赢网络科技有限公司 Human voice segmentation method and device
CN108320756B (en) * 2018-02-07 2021-12-03 广州酷狗计算机科技有限公司 Method and device for detecting whether audio is pure music audio
CN110718235B (en) * 2019-09-20 2022-07-01 精锐视觉智能科技(深圳)有限公司 Abnormal sound detection method, electronic device and storage medium
CN111295017B (en) * 2020-02-21 2022-03-08 成都世纪光合作用科技有限公司 Light control method, control system and equipment
CN111696580B (en) * 2020-04-22 2023-06-16 广州多益网络股份有限公司 Voice detection method and device, electronic equipment and storage medium
CN113160797B (en) * 2021-04-25 2023-06-02 北京华捷艾米科技有限公司 Audio feature processing method and device, storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102637433A (en) * 2011-02-09 2012-08-15 富士通株式会社 Method and system for identifying affective state loaded in voice signal
CN102723078A (en) * 2012-07-03 2012-10-10 武汉科技大学 Emotion speech recognition method based on natural language comprehension
CN103871424A (en) * 2012-12-13 2014-06-18 上海八方视界网络科技有限公司 Online speaking people cluster analysis method based on bayesian information criterion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5767825B2 (en) * 2011-02-28 2015-08-19 綜合警備保障株式会社 Sound processing apparatus and sound processing method



Similar Documents

Publication Publication Date Title
CN104240719B (en) The feature extracting method of audio, the sorting technique of audio and relevant apparatus
Badshah et al. Deep features-based speech emotion recognition for smart affective services
Zhang et al. Hierarchical classification of audio data for archiving and retrieving
Li et al. An automated assessment framework for atypical prosody and stereotyped idiosyncratic phrases related to autism spectrum disorder
JP5512126B2 (en) Method for deriving a set of features for an audio input signal
EP3701528B1 (en) Segmentation-based feature extraction for acoustic scene classification
CN110222841A (en) Neural network training method and device based on spacing loss function
CA2763312A1 (en) Audio signal processing device, audio signal processing method, and program
CN106302987A (en) A kind of audio frequency recommends method and apparatus
CN115457966B (en) Pig cough sound identification method based on improved DS evidence theory multi-classifier fusion
CN109949798A (en) Commercial detection method and device based on audio
CN106548786A (en) A kind of detection method and system of voice data
CN104239372B (en) A kind of audio data classification method and device
Song et al. A compact and discriminative feature based on auditory summary statistics for acoustic scene classification
CN106531195A (en) Dialogue conflict detection method and device
Fan et al. CSENET: Complex squeeze-and-excitation network for speech depression level prediction
Liu et al. Surrey system for dcase 2022 task 5: Few-shot bioacoustic event detection with segment-level metric learning
Nguyen et al. Matching pursuit based robust acoustic event classification for surveillance systems
Langstrof Vowel change in New Zealand English-patterns and implications
Majkowski et al. Classification of emotions from speech signal
Song et al. Automatic vocal segments detection in popular music
Oo Comparative study of MFCC feature with different machine learning techniques in acoustic scene classification
CN111477248B (en) Audio noise detection method and device
CN114822557A (en) Method, device, equipment and storage medium for distinguishing different sounds in classroom
CN114664325A (en) Abnormal sound identification method, system, terminal equipment and computer readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant