CN106024011A - MOAS based deep layer feature extracting method - Google Patents

MOAS based deep layer feature extracting method

Info

Publication number
CN106024011A
Authority
CN
China
Prior art keywords
rbm
deep-layer feature
moas
layer
extracting method
Prior art date
2016-05-19
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610333538.3A
Other languages
Chinese (zh)
Inventor
杨继臣
刘磊安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongkai University of Agriculture and Engineering
Original Assignee
Zhongkai University of Agriculture and Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2016-05-19
Publication date
2016-10-12
Application filed by Zhongkai University of Agriculture and Engineering filed Critical Zhongkai University of Agriculture and Engineering
Priority to CN201610333538.3A priority Critical patent/CN106024011A/en
Publication of CN106024011A publication Critical patent/CN106024011A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a deep-layer feature extraction method, and more specifically to a deep-layer feature extraction method that uses a MOAS (Movie Origin Audio Sample) as the input. The method includes the steps of: 1, constructing an RBM (Restricted Boltzmann Machine); 2, training the RBM; 3, constructing a deep-layer feature extractor; 4, taking the MOAS as the input of the deep-layer feature extractor and extracting deep-layer features. By taking the MOAS as the input for deep-layer feature extraction, the invention reduces the number of layers that must be trained and extracts more useful information than methods that use shallow-layer features as the input.

Description

A deep-layer feature extraction method based on MOAS
Technical field
The present invention relates to methods for extracting deep-layer features, and more particularly to a method that uses MOAS (Movie Origin Audio Sample, the original audio sample points of a film) as the input for extracting deep-layer features.
Background technology
With the development of Internet technology, movie data on the network has grown explosively, and online movie resources have become enormous. Because movies are easy to obtain, they have a huge audience, and the main problem currently facing movie signal processing is how to analyze, index and manage this vast amount of movie data so that people can quickly retrieve the information they want. Content analysis and understanding of movies has therefore become increasingly urgent. Audio is an important source of information for understanding multimedia content (Ghoraani, 2011), and audio is also an important modality in film: both in quantity and in content it plays an important role. In recent years audio information has been used more and more in movie content analysis and understanding (Wang, 2006; Benini, 2013).
In research on movie audio content analysis and understanding, feature extraction is a very important problem. Only when features are extracted well can movie audio signals be classified and semantic reasoning about movie audio scenes be studied; the quality of the extracted features directly affects the accuracy of movie audio signal classification and of movie audio scene semantic reasoning. Conversely, the accuracy of movie audio signal classification and of movie audio scene semantic reasoning can also be used to evaluate the performance of the features.
In previous research on movie audio signals, the features used have typically been hand-crafted shallow-layer features, such as Mel-frequency cepstral coefficients (MFCC) and time-frequency features (Austin, 2010; Li, 2014). Shallow-layer features only transform the original input signal into a particular space and therefore cannot effectively characterize the signal, so movie audio signal processing does not reach the level people would like. In contrast, the deep-layer features obtained by learning with a deep neural network (DNN) (Hinton, 2006) not only eliminate the cumbersome and complicated process of hand-crafting features, but can also extract features that cannot be constructed by hand (Seide, 2011). Because a DNN can learn more useful features, it ultimately improves the accuracy of classification or prediction (Yu Kai, 2013).
In recent years, deep-layer features have been widely used in speech recognition (Mohamed, 2011; Bao, 2013). These deep-layer features are typically obtained by learning from MFCC features with a DNN, i.e. the MFCCs are used as the input of the DNN. However, for deep-layer features obtained by training on MFCCs, useless information must be removed while useful information is retained, so the first few layers are usually not very effective, and a deep network is generally required before the effect becomes good. If the MOAS is used directly as the input of the DNN, the DNN can extract effective deep-layer features directly from the MOAS, which saves training layers. In addition, the MFCC extraction process discards some useful information in the MOAS, and this lost information is difficult to recover when the DNN subsequently learns from the MFCCs; if the MOAS is used directly as the input of the DNN, this loss does not occur. Therefore, if the MOAS is used directly as the input of the DNN, extracting deep-layer features not only requires fewer layers than using MFCCs as the input of the DNN, but should also extract more useful information.
Summary of the invention
The present invention addresses the defects of current movie audio deep-layer feature extraction and provides a deep-layer feature extraction method based on MOAS.
To solve the above technical problem, the technical scheme of the present invention is as follows:
A deep-layer feature extraction method based on MOAS takes the MOAS as input. First an RBM (Restricted Boltzmann Machine) is built; next this RBM is trained; the same method is then used to build multiple RBMs, finally yielding a deep-layer feature extractor; finally the MOAS is taken as the input of this deep-layer feature extractor to obtain deep-layer features.
The above deep-layer feature extraction method based on MOAS specifically includes the following steps:
S1, build the first RBM, a 2-layer neural network model composed of a visible layer and a hidden layer;
S2, take the MOAS as the input of this RBM and train the RBM so that the likelihood of the visible layer reaches a maximum;
S3, on the basis of the RBM trained in step S2, add another hidden layer, taking the hidden layer of the first RBM as the visible layer of the second RBM; build the second RBM and train it;
S4, use the same method to build a deep-layer feature extractor composed of n layers of RBMs;
S5, fine-tune the deep-layer feature extractor obtained in step S4 to obtain the final deep-layer feature extractor;
S6, use the deep-layer feature extractor trained in step S5, take the MOAS as input, and extract the deep-layer features corresponding to the MOAS.
In the above deep-layer feature extraction method based on MOAS, the visible layer and the hidden layer of each RBM are fully connected to each other, and there are no connections within a layer.
In the above deep-layer feature extraction method based on MOAS, the number of nodes of the visible layer of the first RBM is set to 512 and the number of nodes of its hidden layer is set to 39.
In the above deep-layer feature extraction method based on MOAS, the number of nodes of the visible layer of the second RBM is set to 39 and the number of nodes of its hidden layer is set to 39.
In the above deep-layer feature extraction method based on MOAS, step S5 uses back-propagation (BP) to fine-tune the weights between the layers of the deep-layer feature extractor, finally yielding a deep-layer feature extractor in which the weights of every layer are suitable.
In the above deep-layer feature extraction method based on MOAS, the layer-to-layer transformation relation of the deep-layer feature extractor composed of n layers of RBMs is
df'_{m+1} = σ(df'_m), 1 ≤ m ≤ n
where df'_{m+1} and df'_m denote the deep-layer features of layers m+1 and m respectively, and σ denotes the sigmoid function.
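As an illustration only (it is not part of the claimed method), the following Python/NumPy sketch shows how a stacked-RBM deep-layer feature extractor with the node counts given above (a 512-node visible layer followed by 39-node hidden layers) might be assembled and applied in a forward pass. The class and function names are hypothetical, and since the patent writes the layer transformation simply as σ(df'_m), the sketch assumes the usual weighted form σ(W·df'_m + c), which is consistent with the weight fine-tuning of step S5.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBMLayer:
    # One RBM which, once trained, acts as a sigmoid transformation layer.
    def __init__(self, n_visible, n_hidden, rng=np.random):
        self.W = 0.01 * rng.randn(n_visible, n_hidden)  # weight matrix W
        self.b = np.zeros(n_visible)                     # visible-layer offsets b
        self.c = np.zeros(n_hidden)                      # hidden-layer offsets c

    def forward(self, v):
        # The hidden activation becomes the "visible" input of the next layer.
        return sigmoid(v @ self.W + self.c)

def build_extractor(n_layers):
    # Stack n RBM layers with the node counts from the description: 512 -> 39 -> 39 -> ...
    sizes = [512] + [39] * n_layers
    return [RBMLayer(sizes[i], sizes[i + 1]) for i in range(n_layers)]

def extract_deep_features(extractor, moas_frame):
    # Forward one 512-sample MOAS frame through every layer; the top output is the deep-layer feature.
    df = moas_frame
    for layer in extractor:
        df = layer.forward(df)
    return df

Each RBMLayer stores the parameters W, b and c that appear in the joint probability given in the detailed description below.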
Compared with the prior art, the technical scheme of the present invention has the following benefits:
(1) The features extracted by the MOAS-based deep-layer feature extraction method of the present invention are deep-layer features, which not only eliminate the complicated and tedious process of hand-crafting features, but can also capture what hand-crafted features cannot.
(2) The present invention takes the MOAS as the input of the deep-layer feature extractor. Compared with using shallow-layer features such as MFCC as the input, this not only reduces the number of layers to be trained, but also avoids losing useful information during MFCC extraction; that is, taking the MOAS as input extracts more useful information than taking shallow-layer features as input.
Brief description of the drawings
Fig. 1 is a flow chart of MOAS-based deep-layer feature extraction;
Fig. 2 is a schematic diagram of the construction of the first RBM;
Fig. 3 is a schematic diagram of the construction of the second RBM;
Fig. 4 is a schematic diagram of the construction of the deep-layer feature extractor.
Detailed description of the invention
The present invention is further described below with reference to the accompanying drawings and specific embodiments, but the present invention is not limited in any form by the embodiments.
Fig. 1 shows the basic process of extracting deep-layer features based on film original audio sample points.
The MOAS-based deep-layer feature extraction of the present invention is realized as follows:
1. First, data must be prepared for training the deep-layer feature extractor. The prepared data are divided into two parts: pre-training data and fine-tuning data. The pre-training data are used to pre-train the deep-layer feature extractor to obtain a preliminary deep-layer feature extractor, and the fine-tuning data are used to fine-tune the deep-layer feature extractor thus obtained. For both parts of the data, raw sample point data and Mel-frequency cepstral coefficients are extracted.
2. Build and train the first RBM. Fig. 2 shows the construction of the first RBM, a 2-layer neural network model composed of a visible layer and a hidden layer, where the visible layer and the hidden layer are fully connected to each other and there are no connections within a layer. Let v and h denote the visible layer and the hidden layer respectively; then a joint probability can be assigned to the RBM as follows:
p(v, h) = (1/Z) · exp(b^T v + c^T h + v^T W h)
where Z denotes the normalizing constant (partition function), W denotes the weight matrix, b and c denote the offsets of the visible layer and the hidden layer respectively, and T denotes transposition.
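For illustration only, a minimal sketch of one contrastive-divergence (CD-1) update for a single RBM follows, reusing the sigmoid helper and the W, b, c parameters from the earlier sketch. The patent states only that the RBM is trained so that the likelihood of the visible layer reaches a maximum; CD-1, the learning rate and the binary hidden sampling are assumptions, and for real-valued MOAS inputs a Gaussian visible layer would normally replace the sigmoid reconstruction used here for brevity.

def cd1_update(W, b, c, v0, lr=0.01, rng=np.random):
    # One CD-1 step on a batch of visible vectors v0 (shape: batch x n_visible).
    # Positive phase: hidden probabilities and a binary sample given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.rand(*ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # Approximate likelihood gradients and parameter updates.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c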
3. On the basis of the first RBM, build and train the second RBM. Fig. 3 shows the construction of the second RBM. It takes the hidden layer of the first RBM as its visible layer and, unlike the first RBM, its visible layer and hidden layer have the same number of nodes. This RBM is trained with the method described above.
4. Use the same method to build a deep-layer feature extractor composed of n layers of RBMs. Fig. 4 shows the structure of this deep-layer feature extractor.
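Continuing the illustrative sketch, greedy layer-wise pre-training of the n-layer stack could look like the following, where cd1_update is the hypothetical helper above; as in step 3, each trained layer's hidden activations serve as the visible data of the next RBM.

def pretrain_stack(layers, pretrain_frames, epochs=10):
    # Train layer k with CD-1, then feed its hidden activations upward
    # as the training data for layer k+1 (greedy layer-wise pre-training).
    v = pretrain_frames
    for layer in layers:
        for _ in range(epochs):
            layer.W, layer.b, layer.c = cd1_update(layer.W, layer.b, layer.c, v)
        v = sigmoid(v @ layer.W + layer.c)
    return layers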
5. Fine-tune the deep-layer feature extractor obtained by the pre-training above, using the fine-tuning data. The fine-tuning method is to use back-propagation (BP) to adjust the weights between the layers of the deep-layer feature extractor, finally yielding a deep-layer feature extractor in which the weights of every layer are suitable.
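A minimal sketch of this BP fine-tuning stage is shown below, assuming, as in steps A2 and A5 of the worked example that follows, that the MFCC vectors of the fine-tuning data act as regression targets for the top layer; the squared-error loss and the hand-written gradients are assumptions, not part of the patent.

def finetune_bp(layers, frames, mfcc_targets, lr=0.001, epochs=50):
    # Back-propagation fine-tuning of the stacked layers against MFCC targets.
    for _ in range(epochs):
        # Forward pass, keeping every layer's activation.
        acts = [frames]
        for layer in layers:
            acts.append(sigmoid(acts[-1] @ layer.W + layer.c))
        # Error signal at the top layer (squared-error loss, sigmoid derivative).
        delta = (acts[-1] - mfcc_targets) * acts[-1] * (1.0 - acts[-1])
        # Backward pass: update each layer's weights and hidden offsets.
        for k in range(len(layers) - 1, -1, -1):
            grad_W = acts[k].T @ delta / frames.shape[0]
            grad_c = delta.mean(axis=0)
            if k > 0:
                delta = (delta @ layers[k].W.T) * acts[k] * (1.0 - acts[k])
            layers[k].W -= lr * grad_W
            layers[k].c -= lr * grad_c
    return layers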
6. Input the MOAS into this deep-layer feature extractor to extract the deep-layer features.
The method is described more concretely below, taking as an example film original audio sample points that have been divided into frames and windowed (frame length 32 ms, frame shift 16 ms, Hamming window).
A1. Assume the sampling frequency is 16 kHz, so each frame contains 512 sample points. Let S denote the resulting sample-point vectors; S is divided into three parts, S1, S2 and S3, where S1 is used for pre-training, S2 for fine-tuning, and S3 for extracting deep-layer features.
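For illustration, the framing and windowing described above (16 kHz sampling, 32 ms frames, 16 ms shift, Hamming window, 512 samples per frame) could be carried out as sketched below; the function name frame_moas is hypothetical.

import numpy as np

def frame_moas(audio, fs=16000, frame_ms=32, shift_ms=16):
    # Split the raw sample points into Hamming-windowed frames:
    # 512 samples per frame with a 256-sample shift at 16 kHz.
    frame_len = int(fs * frame_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    window = np.hamming(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // shift
    return np.stack([audio[i * shift:i * shift + frame_len] * window
                     for i in range(n_frames)])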
A2. Extract Mel-frequency cepstral coefficient features for every frame of S1 and S2, and assume the extracted features are M01 and M02 respectively. S1 is taken as the input of the first RBM and M01 as the output of the first RBM, and this RBM is trained. After the first RBM has been trained, assume that, through the nonlinear feature transformation of the first RBM, S1 is transformed into M1.
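The patent does not state how the MFCC features M01 and M02 are computed; one common choice consistent with the 39-node hidden layers is 13 static coefficients plus their first and second derivatives, as sketched below with the librosa library (the function name mfcc39 and the frame parameters are assumptions).

import numpy as np
import librosa

def mfcc39(audio, fs=16000):
    # 39-dimensional MFCC per frame: 13 static coefficients + delta + delta-delta (composition assumed).
    m = librosa.feature.mfcc(y=audio, sr=fs, n_mfcc=13, n_fft=512, hop_length=256)
    d1 = librosa.feature.delta(m)
    d2 = librosa.feature.delta(m, order=2)
    return np.vstack([m, d1, d2]).T   # shape: (n_frames, 39)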
A3. On the basis of the first RBM, build the second RBM, where M1 is taken as the input of the second RBM and M01 as the output of the second RBM, and train this RBM. After the second RBM has been trained, assume that, through the nonlinear feature transformation of the second RBM, M1 is transformed into M2.
A4. By the same method, train a deep-layer feature extractor composed of n layers of RBMs, and assume the layer-to-layer transformation relation is
df'_{m+1} = σ(df'_m), 1 ≤ m ≤ n
where df'_{m+1} and df'_m denote the deep-layer features of layers m+1 and m respectively, and σ denotes the sigmoid function.
A5. Use S2 and M02 to fine-tune this deep-layer feature extractor, where S2 is taken as the input of the deep-layer feature extractor and M02 as its output. After fine-tuning, a new nonlinear feature transformation formula between the layers is obtained; assume it is
df_{m+1} = σ(df_m), 1 ≤ m ≤ n
where df_{m+1} and df_m denote the deep-layer features of layers m+1 and m respectively, and σ denotes the sigmoid function.
A6. Take S3 as the input of this deep-layer feature extractor and apply the trained layer-to-layer nonlinear feature transformation formulas obtained above to obtain the deep-layer features corresponding to S3.
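Tying the hypothetical helpers from the earlier sketches together, one possible reading of steps A1 to A6 is the end-to-end flow below; every function name was introduced above for illustration, the split proportions are arbitrary, and none of it is mandated by the patent.

raw_audio = np.random.randn(16000 * 60)              # placeholder: one minute of audio samples
frames = frame_moas(raw_audio)                        # A1: frame and window the MOAS
targets = mfcc39(raw_audio)[:len(frames)]             # per-frame MFCC targets (alignment assumed)
i, j = int(0.6 * len(frames)), int(0.8 * len(frames))
S1, S2, S3 = frames[:i], frames[i:j], frames[j:]      # pre-training / fine-tuning / extraction splits
M02 = targets[i:j]                                    # MFCC targets of the fine-tuning frames

extractor = build_extractor(n_layers=3)               # A2-A4: 512 -> 39 -> 39 -> 39 stack
extractor = pretrain_stack(extractor, S1)              # greedy layer-wise pre-training on S1
extractor = finetune_bp(extractor, S2, M02)            # A5: BP fine-tuning against M02
deep_features = np.stack([extract_deep_features(extractor, f) for f in S3])   # A6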
Obviously, the above embodiment of the present invention is merely an example given to clearly illustrate the present invention and is not a limitation on the embodiments of the present invention. For those of ordinary skill in the art, other changes of different forms can also be made on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments exhaustively. Any modification, equivalent replacement and improvement made within the spirit and principle of the present invention shall be included within the protection scope of the claims of the present invention.

Claims (7)

1. A deep-layer feature extraction method based on MOAS, characterized in that the MOAS is taken as input; first an RBM is built and trained; then, with the same method, multiple RBMs are built to obtain a deep-layer feature extractor; finally the MOAS is taken as the input of this deep-layer feature extractor and its deep-layer features are extracted.
2. The deep-layer feature extraction method based on MOAS according to claim 1, characterized by comprising the following steps:
S1, build the first RBM, a 2-layer neural network model composed of a visible layer and a hidden layer;
S2, take the MOAS as the input of this RBM and train the RBM so that the likelihood of the visible layer reaches a maximum;
S3, on the basis of the RBM trained in step S2, add another hidden layer, taking the hidden layer of the first RBM as the visible layer of the second RBM; build the second RBM and train it;
S4, use the same method to build a deep-layer feature extractor composed of n layers of RBMs;
S5, fine-tune the deep-layer feature extractor obtained in step S4 to obtain the final deep-layer feature extractor;
S6, use the deep-layer feature extractor trained in step S5, take the MOAS as input, and extract the deep-layer features corresponding to the MOAS.
3. The deep-layer feature extraction method based on MOAS according to claim 2, characterized in that the visible layer and the hidden layer of each RBM are fully connected to each other, with no connections within a layer.
4. The deep-layer feature extraction method based on MOAS according to claim 2, characterized in that the number of nodes of the visible layer of the first RBM is set to 512 and the number of nodes of its hidden layer is set to 39.
5. The deep-layer feature extraction method based on MOAS according to claim 2, characterized in that the number of nodes of the visible layer of the second RBM is set to 39 and the number of nodes of its hidden layer is set to 39.
6. The deep-layer feature extraction method based on MOAS according to claim 2, characterized in that step S5 uses back-propagation to fine-tune the weights between the layers of the deep-layer feature extractor, finally yielding a deep-layer feature extractor in which the weights of every layer are suitable.
7. The deep-layer feature extraction method based on MOAS according to claim 2, characterized in that the layer-to-layer transformation relation of the deep-layer feature extractor composed of n layers of RBMs is
df'_{m+1} = σ(df'_m), 1 ≤ m ≤ n
where df'_{m+1} and df'_m denote the deep-layer features of layers m+1 and m respectively, and σ denotes the sigmoid function.
CN201610333538.3A 2016-05-19 2016-05-19 MOAS based deep layer feature extracting method Pending CN106024011A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610333538.3A CN106024011A (en) 2016-05-19 2016-05-19 MOAS based deep layer feature extracting method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610333538.3A CN106024011A (en) 2016-05-19 2016-05-19 MOAS based deep layer feature extracting method

Publications (1)

Publication Number Publication Date
CN106024011A true CN106024011A (en) 2016-10-12

Family

ID=57098744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610333538.3A Pending CN106024011A (en) 2016-05-19 2016-05-19 MOAS based deep layer feature extracting method

Country Status (1)

Country Link
CN (1) CN106024011A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013149123A1 (en) * 2012-03-30 2013-10-03 The Ohio State University Monaural speech filter
CN103971690A (en) * 2013-01-28 2014-08-06 腾讯科技(深圳)有限公司 Voiceprint recognition method and device
CN104732978A (en) * 2015-03-12 2015-06-24 上海交通大学 Text-dependent speaker recognition method based on joint deep learning
CN104731913A (en) * 2015-03-23 2015-06-24 华南理工大学 GLR-based homologous audio advertisement retrieving method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013149123A1 (en) * 2012-03-30 2013-10-03 The Ohio State University Monaural speech filter
CN103971690A (en) * 2013-01-28 2014-08-06 腾讯科技(深圳)有限公司 Voiceprint recognition method and device
CN104732978A (en) * 2015-03-12 2015-06-24 上海交通大学 Text-dependent speaker recognition method based on joint deep learning
CN104731913A (en) * 2015-03-23 2015-06-24 华南理工大学 GLR-based homologous audio advertisement retrieving method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JI-CHEN YANG et al.: "Audio event change detection and clustering in movies", Journal of Multimedia *

Similar Documents

Publication Publication Date Title
CN104732978B (en) The relevant method for distinguishing speek person of text based on combined depth study
CN108305616A (en) A kind of audio scene recognition method and device based on long feature extraction in short-term
Kelly et al. Deep neural network based forensic automatic speaker recognition in VOCALISE using x-vectors
Hwang et al. Ensemble of deep neural networks using acoustic environment classification for statistical model-based voice activity detection
CN108986798B (en) Processing method, device and the equipment of voice data
Ando et al. Customer satisfaction estimation in contact center calls based on a hierarchical multi-task model
CN106683666A (en) Field adaptive method based on deep neural network (DNN)
CN114566189B (en) Speech emotion recognition method and system based on three-dimensional depth feature fusion
Geng Evaluation model of college english multimedia teaching effect based on deep convolutional neural networks
Haridas et al. A novel approach to improve the speech intelligibility using fractional delta-amplitude modulation spectrogram
Ribas et al. Wiener filter and deep neural networks: A well-balanced pair for speech enhancement
Yechuri et al. A nested U-net with efficient channel attention and D3Net for speech enhancement
Biswas et al. Admissible wavelet packet sub‐band based harmonic energy features using ANOVA fusion techniques for Hindi phoneme recognition
Tashakori et al. Designing the Intelligent System Detecting a Sense of Wonder in English Speech Signal Using Fuzzy-Nervous Inference-Adaptive system (ANFIS)
Yang Design of service robot based on user emotion recognition and environmental monitoring
Sangeetha et al. Analysis of machine learning algorithms for audio event classification using Mel-frequency cepstral coefficients
CN106024011A (en) MOAS based deep layer feature extracting method
Shareef et al. Comparison between features extraction techniques for impairments arabic speech
Alam et al. Radon transform of auditory neurograms: a robust feature set for phoneme classification
Bansod et al. Speaker Recognition using Marathi (Varhadi) Language
Muni et al. Deep learning techniques for speech emotion recognition
Wei et al. Speech emotion recognition with hybrid neural network
Satla et al. Dialect Identification in Telugu Language Speech Utterance Using Modified Features with Deep Neural Network.
Mehra et al. ERIL: An Algorithm for Emotion Recognition from Indian Languages Using Machine Learning
Soni et al. Comparing front-end enhancement techniques and multiconditioned training for robust automatic speech recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination