CN110189768B - Chinese folk song geographical classification method based on conditional random field - Google Patents
Classifications
- G10L25/51 — Speech or voice analysis techniques, not restricted to a single one of groups G10L15/00–G10L21/00, specially adapted for comparison or discrimination
- G10H2210/036 — Musical analysis of musical genre, i.e. analysing the style of musical pieces, usually for selection, filtering or classification
- G10H2210/041 — Musical analysis based on MFCC [mel-frequency cepstral coefficients]
- G10H2210/061 — Musical analysis for extraction of musical phrases, isolation of musically relevant segments, or temporal structure analysis of a musical piece
Abstract
The invention discloses a Chinese folk song region classification method based on a conditional random field. Considering the temporal nature of music, the method models the frame-level features of folk songs with a conditional random field; the labeling sequence of each song is computed with the help of a restricted Boltzmann machine, the parameters are learned with a quasi-Newton algorithm and k-step contrastive divergence, and music region classification is finally realized. Compared with traditional methods, the method remedies the missing temporal relations among feature sequences, and the restricted Boltzmann machine used to compute the conditional random field labeling sequence removes the accuracy bottleneck of existing approaches to labeling-sequence computation. In addition, the restricted Boltzmann machine learns high-level music features from the audio frame features, which enlarges the differences between frame features and reduces the difficulty of manual audio feature design. The method effectively improves the classification precision of folk songs and the regional-style classification results.
Description
Technical Field
The invention belongs to the field of machine learning and data mining, and particularly relates to a Chinese folk song region classification method based on a conditional random field.
Background
With the rapid development of multimedia technology, music has shifted from traditional records, magnetic tape and the like to digital form, and the enormous volume of digital music data needs to be managed more efficiently. Against this background, music information retrieval and music classification have important academic significance and wide application scenarios, and have attracted great attention in academia and industry. The development of music classification technology helps manage different categories of music intelligently and lets users retrieve music quickly within categories of interest. With the massive growth of music data, further improving the precision and efficiency of music classification is very important.
With the spread and development of Chinese culture, Chinese folk songs with distinct regional styles are reaching and being enjoyed by more people, so research on classifying Chinese folk songs by regional style is particularly important. However, folk songs are generally improvised and passed on by oral singing, lack strict compositional rules, and the stylistic boundaries between regions are blurred; frame features used directly and independently cannot represent the regional style categories well, so related research results are few.
Disclosure of Invention
The invention aims to remedy the missing temporal relations among feature sequences in traditional folk-song regional-style classification algorithms, and provides a Chinese folk song region classification method based on a conditional random field.
In order to achieve this purpose, the invention adopts the following technical scheme:
A Chinese folk song region classification method based on a conditional random field: first the audio features of the music are extracted, then an audio-feature temporal model is built on a conditional random field, its labeling sequence is computed with a restricted Boltzmann machine, the parameters are learned with a quasi-Newton algorithm and k-step contrastive divergence, and finally the region classification of Chinese folk songs is carried out. The method specifically comprises the following steps:
1) audio feature extraction of music: comprising the selection of audio features, the framing of music pieces, and the extraction of music audio features, specifically as follows,
1-1) selection of audio features: the audio signal is analyzed in its different domains, and time-domain, frequency-domain and cepstral-domain features are selected as the audio features characterizing the timbre and melody of the music;
1-2) framing of music pieces: considering the short-time stationarity of music audio, the music audio $m$ is sampled into consecutive short-time segments $m = \{m_1, m_2, \dots, m_i, \dots, m_N\}$, where $m_i$ is called a "frame" and $N$ denotes the sequence length;
1-3) extraction of music audio features: the features selected in 1-1) are extracted from the music audio piece $m$ frame by frame, giving $v = \{v_1, v_2, \dots, v_i, \dots, v_N\}$, where $v_i \in \mathbb{R}^d$ is the feature vector of the $i$-th frame, containing $d$-dimensional data;
2) establishment of the audio-feature temporal model: folk songs of different regional styles are modeled with conditional random fields, taking into account the temporal relations among the per-frame audio features;
3) computation of the predicted audio labeling sequence: the computation of the predicted labeling sequence under the conditional random field model comprises the learning of high-level music features and the determination of the labeling sequence, specifically as follows,
3-1) learning of high-level music features: the extracted feature sequence $v$ is used as the input of a restricted Boltzmann machine, the network is trained with k-step contrastive divergence, and the abstract features computed by the hidden layer are taken as the high-level music features, written $x = \{x_1, x_2, \dots, x_i, \dots, x_N\}$, where $x_i \in \mathbb{R}^d$ is the high-level music feature vector of the $i$-th frame, containing $d$-dimensional data;
3-2) determination of the labeling sequence: the high-level music feature vector $x$ is used as the observation of the conditional random field observation sequence, the feature functions are computed with the restricted Boltzmann machine, and the conditional random field labeling sequence $y = \{y_1, y_2, \dots, y_i, \dots, y_N\}$ is obtained, where $y_i$ is the region-category label corresponding to the $i$-th high-level feature $x_i$;
4) realization of music region classification: the region type of a song is identified according to the obtained models corresponding to the different regional styles.
In a further improvement of the invention, the audio features selected in step 1-1) comprise the short-time average zero-crossing rate (ZCR), spectral centroid (SC), spectral flux (SF), spectral roll-off point (SRP), Chroma features, linear prediction cepstral coefficients (LPCC) and mel-frequency cepstral coefficients (MFCC) — 7 feature types totaling 86 dimensions.
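As a concrete illustration of two of these features, the sketch below (not part of the patent; the frame length, sample rate and test tone are illustrative assumptions) computes the short-time zero-crossing rate and the spectral centroid of one frame with numpy:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent samples whose signs differ (time-domain ZCR)."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # count exact zeros as positive
    return float(np.mean(signs[:-1] != signs[1:]))

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of the frame's spectrum (SC)."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))

# 500 Hz sine at 16 kHz: 1024 samples hold exactly 32 periods (no leakage)
sr, n = 16000, 1024
t = np.arange(n) / sr
frame = np.sin(2 * np.pi * 500 * t)
zcr = zero_crossing_rate(frame)    # close to 2 * 500 / 16000 = 0.0625
sc = spectral_centroid(frame, sr)  # close to 500 Hz
```

The other listed features (SF, SRP, Chroma, LPCC, MFCC) are computed per frame in the same fashion and concatenated into the 86-dimensional vector $v_i$.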
In a further improvement of the invention, step 2) specifically operates as follows: the high-level feature sequence of each folk song within one region-type class of the training set is taken as the observation sequence of a conditional random field, a model is established for each region-type class of folk songs in the parameterized form of the conditional random field, and the conditional random field model parameters representing each class are computed by a parameter-solving method; the specific steps are:
(i) the audio-frame feature sequence of any folk song in a region-type sample set of the training set is taken as the observation sequence of the conditional random field, and the probability form of the conditional random field model is used to fit the folk-song sample set of each region type, namely

$$P(y^{(t)} \mid x^{(t)}) = \frac{1}{Z(x^{(t)})} \exp\Big(\sum_{k} w_k f_k(x^{(t)}, y^{(t)})\Big) \tag{1}$$

where $Z(x^{(t)})$ is a normalization factor and $f_k(x^{(t)}, y^{(t)})$ is the sum of the feature functions over all times; the feature functions comprise the state feature functions $s_l$ and the transition feature functions $t_k$, through which the region-category label of each frame in the folk song's audio-frame sequence and the transition paths between frames are computed; $w_k$ is the weight of the feature function $f_k$, comprising the state weights $\mu_l$ of $s_l$ and the transition weights $\lambda_k$ of $t_k$;
(ii) the conditional random field model parameters $w_k$ representing each region-type class of folk songs are computed with the BFGS algorithm of the quasi-Newton family. Following the maximum-likelihood estimation rule, the parameters are solved by taking the logarithm of the probability density of the data set under the conditional random field to obtain the log-likelihood function $L(w)$, which is iteratively updated as the optimization objective until the log-likelihood of all songs is maximized, yielding the optimal model parameters $w^*$.
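The parameterized form above and its normalization factor can be sketched numerically. The following minimal numpy example (illustrative, not the patent's implementation: the feature functions are collapsed into per-frame state scores and one transition-score matrix) evaluates $\log P(y \mid x)$ for a linear-chain conditional random field, obtaining $Z(x)$ by the forward recursion:

```python
import numpy as np

def crf_log_prob(emit, trans, y):
    """log P(y | x) for a linear-chain CRF.

    emit : (N, K) state-feature scores (sum of mu_l * s_l) per frame and label
    trans: (K, K) transition scores    (sum of lambda_k * t_k)
    y    : length-N label sequence
    """
    N, K = emit.shape
    # unnormalized log-score of the labelled path
    score = emit[0, y[0]] + sum(trans[y[i - 1], y[i]] + emit[i, y[i]]
                                for i in range(1, N))
    # log Z(x) via the forward recursion, in log-space for stability
    alpha = emit[0].copy()
    for i in range(1, N):
        alpha = emit[i] + np.logaddexp.reduce(alpha[:, None] + trans, axis=0)
    log_z = np.logaddexp.reduce(alpha)
    return score - log_z

# toy example: 3 frames, 2 region labels, uniform transitions
emit = np.log(np.array([[0.7, 0.3], [0.4, 0.6], [0.8, 0.2]]))
trans = np.zeros((2, 2))  # zero transition scores -> chain factorizes per frame
p = np.exp(crf_log_prob(emit, trans, [0, 1, 0]))  # 0.7 * 0.6 * 0.8 = 0.336
```

With uniform (zero) transition scores, $Z(x)$ reduces to a per-frame product, which makes the toy result easy to verify by hand.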
In a further improvement of the invention, in step 3-1) a restricted Boltzmann machine is used to model the audio features of the folk songs: the original input features are mapped through a nonlinear spatial transformation, and the high-level feature sequence learned by the hidden layer is used as the observation sequence, which helps enlarge the differences between the original audio feature frames. The parameters of the restricted Boltzmann machine are estimated under the maximum log-likelihood rule: first, the network parameters are pre-trained with the k-step contrastive divergence algorithm; then, the parameters computed by contrastive divergence are used as the network's initial values and fine-tuned with the back-propagation algorithm; finally, the optimal network parameters are obtained.
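A minimal sketch of one k-step contrastive divergence update for a Bernoulli restricted Boltzmann machine follows (the layer sizes, learning rate and toy training vector are illustrative assumptions, not the patent's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_k(W, a, b, v0, k=1, lr=0.1):
    """One CD-k parameter update for a Bernoulli RBM.

    W: (nv, nh) weights, a: (nv,) visible biases, b: (nh,) hidden biases,
    v0: (nv,) one (binarized) training vector of frame features.
    """
    ph0 = sigmoid(v0 @ W + b)                      # positive phase P(h | v0)
    h = (rng.random(b.size) < ph0).astype(float)
    for _ in range(k):                             # k steps of Gibbs sampling
        pv = sigmoid(h @ W.T + a)
        v = (rng.random(a.size) < pv).astype(float)
        ph = sigmoid(v @ W + b)
        h = (rng.random(b.size) < ph).astype(float)
    # gradient approximation: <v h>_data - <v h>_model
    W += lr * (np.outer(v0, ph0) - np.outer(v, ph))
    a += lr * (v0 - v)
    b += lr * (ph0 - ph)
    return W, a, b

nv, nh = 6, 4
W = 0.01 * rng.standard_normal((nv, nh))
a, b = np.zeros(nv), np.zeros(nh)
v0 = np.array([1.0, 0, 1, 0, 1, 0])
for _ in range(200):
    W, a, b = cd_k(W, a, b, v0, k=1)
# hidden-layer activations serve as the high-level frame feature x_i
x = sigmoid(v0 @ W + b)
```

In the patent's pipeline this pre-training step would be followed by back-propagation fine-tuning, omitted here for brevity.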
In a further improvement of the invention, the specific computation of step 3-2) is as follows: first, a Softmax layer is added after the hidden layer of the restricted Boltzmann machine, each unit of the Softmax layer representing one region-style class of folk songs, so that the machine has a discrimination mechanism; then the label state of the high-level feature at each time is decided and the corresponding dimension of the state feature function is set to 1; finally, the labeling sequence of the song is obtained. The specific steps are:
(i) establishing a restricted Boltzmann machine discrimination model for all audio frame characteristics in a data set, namely adding a Softmax layer behind a hidden layer of the restricted Boltzmann machine, wherein each unit in the Softmax layer represents a folk song of a region style class, so that the restricted Boltzmann machine has a discrimination mechanism;
(ii) when computing the region-label state at any time in the labeling sequence, the label of the abstract feature learned at that time is decided by formula (2), and the corresponding dimension of the state feature function $s_l$ is set to 1:

$$y_i = \arg\max_{j} P(y_i = j \mid x_i) \tag{2}$$

where $\arg\max_j$ takes the value of $j$ at which the probability $P(\cdot)$ is maximal, computed from the weights on the connections to the $j$-th node of the Softmax layer; after the region-category labels are obtained, the transition feature functions $t_k$ at any two adjacent times are determined;
(iii) finally, the labeling sequence of the $t$-th song in the data set is obtained.
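The per-frame Softmax decision of formula (2) can be sketched as follows (the weight matrix and feature values are illustrative assumptions; each column of `U` stands for one region-class unit of the Softmax layer):

```python
import numpy as np

def label_sequence(X, U, c):
    """Per-frame region labels via a Softmax layer on the hidden features.

    X: (N, d) hidden-layer (high-level) features, one row per frame
    U: (d, C) weights to the C region-class units of the Softmax layer
    c: (C,)   Softmax biases
    Returns y with y_i = argmax_j P(y_i = j | x_i), as in formula (2).
    """
    scores = X @ U + c
    e = np.exp(scores - scores.max(axis=1, keepdims=True))  # stable softmax
    probs = e / e.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1)

X = np.array([[2.0, 0.1], [0.0, 1.5], [1.0, 1.0]])
U = np.array([[1.0, -1.0], [-1.0, 1.0]])  # class 0 favors dim 0, class 1 dim 1
y = label_sequence(X, U, np.zeros(2))
```

Because argmax is invariant under the (monotone) softmax normalization, the probabilities are computed here only to mirror the discriminative formulation of the text.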
In a further improvement of the invention, the specific computation of step 4) is as follows: first, the high-level features obtained from a song are used as observation sequences and fed into each trained conditional random field; then the posterior probability of the predicted labeling sequence generated by each conditional random field is computed with the forward-backward algorithm; finally, the class represented by the conditional random field with the largest posterior probability value is selected as the predicted region-style class of the test song. The specific steps are:
(i) the high-level features obtained from each test song are used as the observation sequence of a conditional random field and fed into each trained conditional random field, and the posterior probability of the labeling sequence generated by each conditional random field given the observation sequence is computed with the forward-backward algorithm; here the labeling sequence given the observation sequence refers to the predicted labeling sequence of the observation sequence in that conditional random field, computed by the restricted Boltzmann machine;
(ii) the class represented by the conditional random field with the largest posterior probability value is selected as the predicted region-style class of the test song, i.e. satisfying formula (3):

$$c^* = \arg\max_{c} P(y^{(c)} \mid x) \tag{3}$$
The invention has the following beneficial technical effects:
the invention provides a Chinese folk song regional classification method based on a conditional random field, which comprises the steps of firstly extracting audio features of music, then providing a method for modeling frame features of folk songs by adopting the conditional random field, combining a limited Boltzmann computer to calculate a labeling sequence, learning parameters by using a quasi-Newton algorithm and a k-time contrast divergence method, and finally realizing music regional classification. Compared with the traditional folk song region classification method, the method adopts the conditional random field to establish the time sequence relation among the frame feature sequences, and simultaneously adopts the limited Boltzmann machine to solve the problem of bottleneck of the accuracy of the conventional research and calculation of the labeling sequences, so that the region style classification performance of the folk song is further improved. Theoretical analysis and experimental analysis prove that the accuracy and precision of the method are improved in the Chinese folk song geographical classification problem.
Drawings
FIG. 1 is a diagram of the overall model of region classification based on conditional random fields according to the invention.
Detailed Description
The present invention is described in further detail below with reference to the attached drawings.
Referring to FIG. 1, the Chinese folk song region classification method based on conditional random fields provided by the invention first extracts the audio features of the music, then establishes an audio-feature temporal model based on a conditional random field, computes its labeling sequence with a restricted Boltzmann machine, learns the parameters with a quasi-Newton algorithm and k-step contrastive divergence, and finally realizes the region classification of Chinese folk songs, specifically comprising the following steps:
1) audio feature extraction of music: the method comprises the steps of selecting audio features, framing music segments and extracting music audio features, and specifically comprises the following steps:
Step1, audio feature selection: the audio signal is analyzed in its different domains, and time-domain, frequency-domain and cepstral-domain features are selected as the audio features characterizing the timbre and melody of the music; the specific features are listed in Table 1;
Table 1: audio features selected in the invention to characterize the timbre and melody of music.
All 7 selected audio features contribute to the music representation: ZCR characterizes timbre well and is used for endpoint detection, pitch detection and tone segmentation of the audio signal; SC reflects the distribution of the audio signal's frequencies; SF reflects the amount of change in the energy spectrum between two adjacent frames; SRP is a measure of the spectral envelope; the Chroma features account for the harmony present in music, reduce the interference of noise and non-tonal sounds, and give stronger melody discrimination with fewer misjudgments; LPCC models human vocal production, and MFCC is an acoustic feature computed according to the human auditory mechanism — both reflect well the timbral differences between folk songs of different regional styles.
Step2, framing of music pieces: considering the short-time stationarity of music audio, the music audio $m$ is sampled into consecutive short-time segments $m = \{m_1, m_2, \dots, m_i, \dots, m_N\}$, where $m_i$ is called a "frame" and $N$ denotes the sequence length;
Step3, music audio feature extraction: the relevant features are extracted from the music audio piece $m$ frame by frame, giving $v = \{v_1, v_2, \dots, v_i, \dots, v_N\}$, where $v_i \in \mathbb{R}^d$ is the feature vector of the $i$-th frame, containing $d$-dimensional data.
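Step2's framing can be sketched as follows (the frame length, hop size and placeholder signal are illustrative assumptions; the patent does not fix these values):

```python
import numpy as np

def frame_signal(m, frame_len, hop):
    """Split the audio signal m into overlapping short-time frames m_1..m_N."""
    n_frames = 1 + (len(m) - frame_len) // hop
    return np.stack([m[i * hop : i * hop + frame_len] for i in range(n_frames)])

m = np.arange(16000 * 2, dtype=float)             # 2 s at 16 kHz (placeholder)
frames = frame_signal(m, frame_len=512, hop=256)  # 32 ms frames, 50% overlap
```

Each row of `frames` is one $m_i$; the Step3 feature extractors are then applied row by row to build the sequence $v$.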
2) Establishment of the audio-feature temporal model: folk songs of different regional styles are modeled with conditional random fields, taking into account the temporal relations among the per-frame audio features. The key steps are as follows:
(i) the audio-frame feature sequence of any folk song in a region-type sample set of the training set is taken as the observation sequence of the conditional random field, and the probability form of the conditional random field model is used to fit the folk-song sample set of each region type, namely

$$P(y^{(t)} \mid x^{(t)}) = \frac{1}{Z(x^{(t)})} \exp\Big(\sum_{k} w_k f_k(x^{(t)}, y^{(t)})\Big) \tag{1}$$

where $Z(x^{(t)})$ is a normalization factor and $f_k(x^{(t)}, y^{(t)})$ is the sum of the feature functions over all times; the feature functions comprise the state feature functions $s_l$ and the transition feature functions $t_k$, through which the region-category label of each frame in the folk song's audio-frame sequence and the transition paths between frames are computed; $w_k$ is the weight of the feature function $f_k$, comprising the state weights $\mu_l$ of $s_l$ and the transition weights $\lambda_k$ of $t_k$;
(ii) the conditional random field model parameters $w_k$ representing each region-type class of folk songs are computed with the BFGS algorithm of the quasi-Newton family. According to the maximum-likelihood estimation rule, the parameters are solved by taking the logarithm of the probability density of the data set under the conditional random field to obtain the log-likelihood function $L(w)$, which is iteratively updated as the optimization objective until the log-likelihood of all songs is maximized, yielding the optimal model parameters $w^*$.
3) Calculating an audio prediction annotation sequence: the audio prediction labeling sequence calculation process based on the conditional random field model comprises the learning of music advanced features and the determination of a labeling sequence, and specifically comprises the following steps:
step1 learning of music advanced features: with reference to FIG. 1, the dashed box represents a restricted Boltzmann machine structure, comprising a visible layer VAnd several hidden layers HN. Taking the extracted feature sequence v as the input of a restricted Boltzmann machine, performing network learning by adopting a k-time contrast divergence method, and taking the abstract feature obtained by calculating a hidden layer as a high-level feature of music, wherein the abstract feature is expressed as x ═ x { (x)1,x2,...,xi,...,xNIn which xi∈RdA music high-level feature vector containing d-dimensional data representing the ith frame;
Step2, determination of the labeling sequence: the high-level music feature vector $x$ is used as the observation of the conditional random field observation sequence, the feature functions are computed with the restricted Boltzmann machine, and the conditional random field labeling sequence $y = \{y_1, y_2, \dots, y_i, \dots, y_N\}$ is obtained, where $y_i$ is the region-category label corresponding to the $i$-th high-level feature $x_i$. The key steps are as follows:
(i) establishing a restricted Boltzmann machine discrimination model for all audio frame characteristics in a data set, namely adding a Softmax layer behind a hidden layer of the restricted Boltzmann machine, wherein each unit in the Softmax layer represents a folk song of a region style class, so that the restricted Boltzmann machine has a discrimination mechanism;
(ii) when computing the region-label state at any time in the labeling sequence, the label of the abstract feature learned at that time is decided by formula (2), and the corresponding dimension of the state feature function $s_l$ is set to 1:

$$y_i = \arg\max_{j} P(y_i = j \mid x_i) \tag{2}$$

where $\arg\max_j$ takes the value of $j$ at which the probability $P(\cdot)$ is maximal, computed from the weights on the connections to the $j$-th node of the Softmax layer; after the region-category labels are obtained, the transition feature functions $t_k$ at any two adjacent times are determined.
(iii) finally, the labeling sequence of the $t$-th song in the data set is obtained.
4) Realization of music region classification: the region type of a song is identified according to the obtained models corresponding to the different regional styles. The specific computation is as follows: first, the high-level features obtained from the song are used as observation sequences and fed into each trained conditional random field; then the posterior probability of the predicted labeling sequence generated by each conditional random field is computed with the forward-backward algorithm; finally, the class represented by the conditional random field with the largest posterior probability value is selected as the predicted region-style class of the test song. The specific steps are:
(i) the high-level features obtained from each test song are used as the observation sequence of a conditional random field and fed into each trained conditional random field, and the posterior probability of the labeling sequence generated by each conditional random field given the observation sequence is computed with the forward-backward algorithm; here the labeling sequence given the observation sequence refers to the predicted labeling sequence of the observation sequence in that conditional random field, computed by the restricted Boltzmann machine;
(ii) the class represented by the conditional random field with the largest posterior probability value is selected as the predicted region-style class of the test song, i.e. satisfying formula (3):

$$c^* = \arg\max_{c} P(y^{(c)} \mid x) \tag{3}$$
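The class decision of formula (3) can be sketched as follows (toy two-label class models with uniform transitions; the emission scores and region names are illustrative assumptions, not the patent's trained parameters):

```python
import numpy as np

def crf_seq_logprob(emit, trans, y):
    """log P(y | x) under one class's CRF: path score minus log Z(x)."""
    N = emit.shape[0]
    score = emit[0, y[0]] + sum(trans[y[i - 1], y[i]] + emit[i, y[i]]
                                for i in range(1, N))
    alpha = emit[0].copy()
    for i in range(1, N):  # forward recursion for the partition function
        alpha = emit[i] + np.logaddexp.reduce(alpha[:, None] + trans, axis=0)
    return score - np.logaddexp.reduce(alpha)

def classify_region(models, y):
    """Formula (3): the region class whose CRF gives y the largest posterior."""
    return int(np.argmax([crf_seq_logprob(e, t, y) for e, t in models]))

# two toy class models over 2 labels; the test song's labelling is y = [0, 0]
model_a = (np.log([[0.9, 0.1], [0.9, 0.1]]), np.zeros((2, 2)))
model_b = (np.log([[0.1, 0.9], [0.1, 0.9]]), np.zeros((2, 2)))
region = classify_region([model_a, model_b], [0, 0])  # model_a fits best
```

With one trained conditional random field per region, this argmax over per-class posteriors is the whole test-time decision.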
Referring to Table 2, the confusion-matrix results for classifying Shanxi, Jiangsu and Hunan folk songs show that the Chinese folk song classification method based on conditional random fields provided by the invention obtains good classification results.
Table 2: classification confusion matrix used to evaluate the accuracy of the method provided by the invention.
Referring to Table 3, the region classification accuracy of the invention's conditional-random-field method on Chinese folk songs is higher than that of existing folk-song classification methods.
Table 3: accuracy comparison of classification algorithms for folk songs of different regional styles.
Claims (4)
1. A Chinese folk song region classification method based on a conditional random field, characterized in that the audio features of the music are first extracted, then an audio-feature temporal model is established based on a conditional random field, its labeling sequence is computed with a restricted Boltzmann machine, the parameters are learned with a quasi-Newton algorithm and k-step contrastive divergence, and finally the region classification of Chinese folk songs is realized, specifically comprising the following steps:
1) audio feature extraction of music: comprises the steps of selecting audio characteristics, framing music segments and extracting music audio characteristics, and particularly comprises the following steps,
1-1) selection of audio features: analyzing from the angles of different variable domains of the audio signal, and selecting time domain characteristics, frequency domain characteristics and cepstrum domain characteristics as audio characteristics representing the tone color and melody of the music; the selected audio features comprise short-time average zero-crossing rate, frequency spectrum mass center, frequency spectrum flow, frequency spectrum attenuation cut-off frequency, Chroma features, linear prediction cepstrum coefficient and Mel frequency cepstrum coefficient, and the total number is 7, and 86 dimensions are adopted;
1-2) framing of music pieces: in consideration of the short-time stationarity of music audio, the music audio m is sampled into continuous short-time segments, m = {m_1, m_2, ..., m_i, ..., m_N}, where m_i is referred to as a "frame" and N denotes the sequence length;
1-3) extracting music audio features: the features selected in 1-1) are extracted from the music audio piece m in units of frames, giving v = {v_1, v_2, ..., v_i, ..., v_N}, where v_i ∈ R^d is the d-dimensional feature vector representing the i-th frame;
2) establishing the audio feature time-sequence model: the folk songs of different regional styles are modeled by a conditional random field, taking into account the temporal relation among the audio features of the frames. The specific operation is as follows: the high-level feature sequence of each folk song of a given regional category in the training set is taken as the observation sequence of a conditional random field; a model is established for each regional category of folk songs in the parameterized form of the conditional random field; and the conditional random field model parameters representing each regional category are calculated by a parameter-solving method. The specific steps are:
(i) the audio frame feature sequence of any folk song in a regional category's sample set in the training set is taken as the observation sequence of the conditional random field, and the probability form of the conditional random field model is fitted to the folk song sample set of each regional category, namely

P(y^(t) | x^(t)) = (1 / Z(x^(t))) exp( Σ_k w_k f_k(x^(t), y^(t)) )    (1)

where Z(x^(t)) is a normalization factor and f_k(x^(t), y^(t)) is the sum of the k-th feature function over all times, comprising the state feature function s_l and the transfer feature function t_k; through s_l and t_k, the regional category label of each frame in the folk song's audio frame sequence and the transfer paths between frames are calculated; w_k is the weight corresponding to the feature function f_k, comprising the state weight μ_l and the transfer weight λ_k corresponding to the state feature function s_l and the transfer feature function t_k;
(ii) the conditional random field model parameters w_k representing each regional category of folk songs are calculated with the BFGS algorithm of the quasi-Newton method. According to the maximum-likelihood estimation rule, parameter solving takes the logarithm of the probability density function of the data set satisfying the conditional random field to obtain the log-likelihood function L(w), and iteratively updates L(w) as the optimization objective so as to maximize the log-likelihood over all songs, thereby obtaining the optimal model parameters w*;
3) calculating the audio prediction labeling sequence: the calculation process based on the conditional random field model comprises the learning of high-level music features and the determination of the labeling sequence, specifically as follows,
3-1) learning of high-level music features: the extracted feature sequence v is taken as the input of a restricted Boltzmann machine, network learning is performed with the k-step contrastive divergence method, and the abstract features calculated by the hidden layer are taken as the high-level features of the music, expressed as x = {x_1, x_2, ..., x_i, ..., x_N}, where x_i ∈ R^d is the d-dimensional high-level music feature vector representing the i-th frame;
3-2) determination of the labeling sequence: the high-level music feature vector x is taken as the observed value of the conditional random field observation sequence, the feature functions are calculated with the restricted Boltzmann machine, and the conditional random field labeling sequence y = {y_1, y_2, ..., y_i, ..., y_N} is obtained, where y_i denotes the regional category label corresponding to the i-th frame's high-level feature x_i;
4) realizing the music region classification: the regional category of a song is identified according to the obtained models corresponding to the different regional styles.
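The framing and frame-level feature computation of steps 1-2) and 1-3) can be sketched as follows. This is a minimal numpy illustration covering only two of the seven feature families named in step 1-1) (short-time zero-crossing rate and spectral centroid); the frame length, hop size, and toy test signal are hypothetical choices, not values fixed by the claims.

```python
import numpy as np

def frame_signal(y, frame_len=1024, hop=512):
    """Slice a 1-D audio signal into overlapping short-time frames (step 1-2))."""
    n = 1 + max(0, (len(y) - frame_len) // hop)
    return np.stack([y[i * hop : i * hop + frame_len] for i in range(n)])

def zero_crossing_rate(frames):
    """Short-time average zero-crossing rate per frame."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def spectral_centroid(frames, sr=22050):
    """Magnitude-weighted mean frequency per frame (Hann-windowed FFT)."""
    win = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * win, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    return (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-10)

# Toy signal: one second of a 440 Hz tone sampled at 22050 Hz.
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)
frames = frame_signal(y)
# One feature vector per frame, as in the sequence v of step 1-3).
v = np.column_stack([zero_crossing_rate(frames), spectral_centroid(frames, sr)])
print(v.shape)  # → (42, 2)
```

In the claimed method each frame would carry all 86 feature dimensions; the two columns here stand in for that vector.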
2. The regional classification method of Chinese folk songs based on the conditional random field according to claim 1, wherein in step 3-1) a restricted Boltzmann machine is adopted to model the audio features of the folk songs; the original input features are mapped through a nonlinear space transformation, and the high-level feature sequence learned by the hidden layer is taken as the observation sequence, thereby increasing the difference between the original audio feature frames; wherein the restricted Boltzmann machine parameters are estimated according to the maximum log-likelihood estimation rule: first, the network parameters are pre-trained with the k-step contrastive divergence algorithm; then, the parameters calculated by the k-step contrastive divergence algorithm are used as the initial values of the network, and the network parameters are fine-tuned with the back-propagation algorithm; finally, the optimal parameters of the network are obtained.
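The pre-training stage of claim 2 can be sketched as a Bernoulli-Bernoulli restricted Boltzmann machine updated with CD-k. This is an illustrative numpy sketch, not the patented implementation: the layer sizes, learning rate, and the use of [0, 1]-scaled random data in place of real frame features are assumptions, and the back-propagation fine-tuning stage is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Bernoulli-Bernoulli RBM trained with k-step contrastive divergence."""
    def __init__(self, n_vis, n_hid):
        self.W = rng.normal(0, 0.01, (n_vis, n_hid))
        self.b = np.zeros(n_vis)   # visible bias
        self.c = np.zeros(n_hid)   # hidden bias

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd_k(self, v0, k=1, lr=0.05):
        """One CD-k gradient update on a batch of visible vectors v0."""
        ph0 = self.hidden_probs(v0)
        vk = v0
        for _ in range(k):
            # Sample hidden states, then reconstruct visibles (probabilities).
            hk = (rng.random(ph0.shape) < self.hidden_probs(vk)).astype(float)
            vk = self.visible_probs(hk)
        phk = self.hidden_probs(vk)
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - vk.T @ phk) / n
        self.b += lr * (v0 - vk).mean(axis=0)
        self.c += lr * (ph0 - phk).mean(axis=0)

# Stand-in frame features scaled to [0, 1]; the hidden-layer probabilities
# serve as the high-level feature sequence x fed to the conditional random field.
v = rng.random((200, 86))          # 200 frames, 86-dimensional features as in 1-1)
rbm = RBM(n_vis=86, n_hid=32)
for _ in range(50):
    rbm.cd_k(v, k=1)
x = rbm.hidden_probs(v)
print(x.shape)  # → (200, 32)
```

Real-valued audio features are often modeled with a Gaussian-Bernoulli visible layer instead; the Bernoulli form is used here only to keep the sketch short.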
3. The method for classifying Chinese folk songs based on the conditional random field according to claim 2, wherein step 3-2) comprises the following steps: first, a Softmax layer is added behind the hidden layer of the restricted Boltzmann machine, each unit in the Softmax layer representing one regional style category of folk songs, so that the Softmax layer has a discrimination mechanism; then, the labeling state of the high-level features at each moment is judged and the corresponding dimension of the state feature function is set to 1; finally, the labeling sequence of the song is obtained. The specific steps are:
(i) a restricted Boltzmann machine discrimination model is established for all audio frame features in the data set, i.e., a Softmax layer is added behind the hidden layer of the restricted Boltzmann machine, each unit of which represents one regional style category of folk songs, so that the restricted Boltzmann machine has a discrimination mechanism;
(ii) when the state of the regional category label at any moment in the labeling sequence is calculated, the labeling of the abstract feature learned at that moment is judged by formula (2), i.e., the corresponding dimension of the state feature function s_l is set to 1:

y_i = argmax_j P(y_i = j | x_i)    (2)

where argmax_j denotes taking the value of j at which the probability P(·) is maximal, and W_j denotes the weights on the connections to the j-th node of the Softmax layer; after the regional category labels are obtained, the transfer feature function t_k at any two adjacent moments is determined;
(iii) finally, the labeling sequence of the t-th song in the data set is obtained.
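The per-frame discrimination of claim 3 can be sketched as a Softmax layer over the high-level features, with the label taken by the argmax rule of formula (2). This is a numpy sketch with hypothetical feature values and untrained weights standing in for the RBM hidden layer and the trained Softmax connections.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# High-level features for N frames (stand-in for the RBM hidden layer) and
# Softmax weights W_soft for 3 regional style categories (hypothetical values).
N, d, n_classes = 100, 32, 3
x = rng.random((N, d))
W_soft = rng.normal(0, 0.1, (d, n_classes))
b_soft = np.zeros(n_classes)

probs = softmax(x @ W_soft + b_soft)       # P(label = j | frame i)
y = probs.argmax(axis=1)                   # per-frame regional label, as in formula (2)
print(y.shape)  # → (100,)
```

The resulting sequence y is what the claim calls the labeling sequence; each argmax also fixes which dimension of the state feature function s_l is set to 1 for that frame.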
4. The method for classifying Chinese folk songs based on the conditional random field according to claim 3, wherein step 4) comprises the following specific calculation process: first, the high-level features obtained for a song are taken as the observation sequence and fed into each trained conditional random field; then, the posterior probability of the predicted labeling sequence generated by each conditional random field is calculated with the forward-backward algorithm; finally, the category represented by the conditional random field with the largest posterior probability is selected as the prediction of the test song's regional style category. The specific steps are:
(i) the high-level features obtained for each test song are taken as the observation sequence of a conditional random field, fed into each trained conditional random field in turn, and the posterior probability of the labeling sequence generated by each conditional random field given the observation sequence is calculated with the forward-backward algorithm; here, the labeling sequence given the observation sequence refers to the predicted labeling sequence of the observation sequence in the conditional random field, calculated by the restricted Boltzmann machine;
(ii) the category represented by the conditional random field with the largest posterior probability is selected as the prediction of the test song's regional style category, namely satisfying formula (3):

c* = argmax_c P(y | x; w^(c))    (3)

where w^(c) denotes the trained parameters of the conditional random field for regional category c.
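The decision rule of claim 4 can be sketched for a linear-chain model: compute log P(y | x) = score(y, x) - log Z(x) under each trained regional model using the forward algorithm for log Z, then pick the category with the largest value. This is a minimal numpy sketch under stated assumptions: the three "trained" models are random (unary projection, transition matrix) pairs, the per-frame argmax stands in for the RBM-predicted labeling, and only the forward pass of the forward-backward algorithm is shown.

```python
import numpy as np

rng = np.random.default_rng(2)

def logsumexp(a, axis=0):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def seq_log_posterior(unary, trans, y):
    """log P(y | x) for a linear-chain CRF, given per-frame unary scores
    (N x K) and a transition score matrix (K x K)."""
    N = unary.shape[0]
    # Unnormalized score of the labeling y.
    score = unary[0, y[0]] + sum(trans[y[i - 1], y[i]] + unary[i, y[i]]
                                 for i in range(1, N))
    # log Z(x) via the forward recursion.
    alpha = unary[0]
    for i in range(1, N):
        alpha = logsumexp(alpha[:, None] + trans, axis=0) + unary[i]
    return score - logsumexp(alpha, axis=0)

# Three hypothetical trained regional models (one CRF per regional style),
# each a (unary projection, transition matrix) pair; K = 2 label states here.
N, d, K = 60, 32, 2
x = rng.random((N, d))                       # high-level features of the test song
models = [(rng.normal(0, 0.1, (d, K)), rng.normal(0, 0.1, (K, K)))
          for _ in range(3)]

log_posts = []
for W, trans in models:
    unary = x @ W
    y_hat = unary.argmax(axis=1)             # stand-in for the RBM-predicted labeling
    log_posts.append(seq_log_posterior(unary, trans, y_hat))
region = int(np.argmax(log_posts))           # category with largest posterior, as in (3)
print(region)
```

Because log Z sums over every possible labeling, each log-posterior is necessarily ≤ 0, and comparing them across the per-category models implements the selection rule of formula (3).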
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910395241.3A CN110189768B (en) | 2019-05-13 | 2019-05-13 | Chinese folk song geographical classification method based on conditional random field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110189768A CN110189768A (en) | 2019-08-30 |
CN110189768B true CN110189768B (en) | 2021-02-02 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110853604A (en) * | 2019-10-30 | 2020-02-28 | 西安交通大学 | Automatic generation method of Chinese folk songs with specific region style based on variational self-encoder |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106056128A (en) * | 2016-04-20 | 2016-10-26 | 北京航空航天大学 | Remote sensing image classification marking method based on composite graph conditional random field |
CN106328121A (en) * | 2016-08-30 | 2017-01-11 | 南京理工大学 | Chinese traditional musical instrument classification method based on depth confidence network |
CN106952644A (en) * | 2017-02-24 | 2017-07-14 | 华南理工大学 | A kind of complex audio segmentation clustering method based on bottleneck characteristic |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||