CN110189768B - Chinese folk song geographical classification method based on conditional random field - Google Patents


Info

Publication number
CN110189768B
CN110189768B (granted publication of application CN201910395241.3A)
Authority
CN
China
Prior art keywords
sequence
conditional random
random field
audio
music
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910395241.3A
Other languages
Chinese (zh)
Other versions
CN110189768A (en)
Inventor
杨新宇
罗晶
丁建行
魏洁
董怡卓
张亦弛
夏小景
崔宇涵
吉姝蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN201910395241.3A priority Critical patent/CN110189768B/en
Publication of CN110189768A publication Critical patent/CN110189768A/en
Application granted granted Critical
Publication of CN110189768B publication Critical patent/CN110189768B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/036 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal, of musical genre, i.e. analysing the style of musical pieces, usually for selection, filtering or classification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/041 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal, based on mfcc [mel-frequency spectral coefficients]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/061 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal, for extraction of musical phrases, isolation of musically relevant segments, e.g. musical thumbnail generation, or for temporal structure analysis of a musical piece, e.g. determination of the movement sequence of a musical work

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for classifying Chinese folk songs by region based on a conditional random field. Taking the temporal nature of music into account, the invention proposes modeling the frame-level features of folk songs with a conditional random field, computing the label sequence of a folk song in combination with a restricted Boltzmann machine, and learning the parameters with a quasi-Newton algorithm and k-step contrastive divergence, finally realizing the regional classification of the music. Compared with traditional methods, this remedies the missing temporal relations within the feature sequence, and using a restricted Boltzmann machine to compute the conditional random field label sequence overcomes the accuracy bottleneck of existing approaches to computing the label sequence. In addition, the restricted Boltzmann machine learns high-level music features from the audio frame features, which enlarges the differences between frame features and reduces the difficulty of manual audio feature design. The method effectively improves the classification accuracy of folk songs and the results of regional-style classification.

Description

Chinese folk song geographical classification method based on conditional random field
Technical Field
The invention belongs to the field of machine learning and data mining, and particularly relates to a Chinese folk song region classification method based on a conditional random field.
Background
With the rapid development of multimedia technology, music has shifted from traditional media such as records and magnetic tape to digital music, and the resulting mass of digital music data needs to be managed more efficiently. Against this background, music information retrieval and music classification have great academic significance and broad application scenarios, and have drawn wide attention from academia and industry. Advances in music classification technology help to manage different categories of music intelligently and let users retrieve music quickly within the categories they are interested in. As music data grows massively, further improving the accuracy and efficiency of music classification becomes essential.
With the spread and development of Chinese culture, Chinese folk songs with distinct regional styles are being encountered and enjoyed by more and more people, so research on classifying Chinese folk songs by regional style is particularly important. However, folk songs are typically improvised and transmitted by oral singing, lack strict compositional rules, and the stylistic boundaries between regions are blurred; frame features used directly and independently cannot represent the regional-style categories well, so related research results remain scarce.
Disclosure of Invention
The invention aims to solve the missing temporal relations within the feature sequence in traditional regional-style classification algorithms for folk songs, and provides a conditional-random-field-based method for classifying Chinese folk songs by region.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a method for classifying Chinese folk songs by region based on a conditional random field: first the audio features of the music are extracted; then an audio-feature temporal model is established based on a conditional random field, while its label sequence is computed with a restricted Boltzmann machine; the parameters are learned with a quasi-Newton algorithm and k-step contrastive divergence; and finally the regional classification of Chinese folk songs is carried out. The method specifically comprises the following steps:
1) audio feature extraction of music: comprising the selection of audio features, the framing of music pieces and the extraction of music audio features, specifically as follows,
1-1) selection of audio features: analyzing the audio signal in different transform domains, time-domain, frequency-domain and cepstral-domain features are selected as the audio features characterizing the timbre and melody of the music;
1-2) framing of music pieces: considering the short-time stationarity of music audio, the music audio m is sampled into consecutive short-time segments m = {m_1, m_2, ..., m_i, ..., m_N}, where m_i is called a "frame" and N denotes the sequence length;
1-3) extraction of music audio features: the features of 1-1) are extracted from the music audio piece m frame by frame, giving v = {v_1, v_2, ..., v_i, ..., v_N}, where v_i ∈ R^d is the feature vector of the i-th frame, containing d-dimensional data;
2) establishing the audio-feature temporal model: the folk songs of different regional styles are modeled with a conditional random field, taking into account the temporal relations among the per-frame audio features;
3) computing the audio's predicted label sequence: the computation of the predicted label sequence under the conditional random field model comprises the learning of high-level music features and the determination of the label sequence, specifically as follows,
3-1) learning of high-level music features: the extracted feature sequence v is used as the input of a restricted Boltzmann machine, network learning is performed with k-step contrastive divergence, and the abstract features computed by the hidden layer are taken as the high-level music features, expressed as x = {x_1, x_2, ..., x_i, ..., x_N}, where x_i ∈ R^d is the high-level music feature vector of the i-th frame, containing d-dimensional data;
3-2) determination of the label sequence: the high-level music feature vector x is taken as the observed values of the conditional random field's observation sequence, the feature functions are computed with the restricted Boltzmann machine, and the conditional random field label sequence y = {y_1, y_2, ..., y_i, ..., y_N} is obtained, where y_i is the region-category label corresponding to the i-th frame's high-level feature x_i;
4) realization of music region classification: the region category of a song is identified according to the obtained models corresponding to the different regional styles.
In a further improvement of the present invention, the audio features selected in step 1-1) comprise the short-time average zero-crossing rate (ZCR), spectral centroid (SC), spectral flux (SF), spectral roll-off point (SRP), Chroma features, linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC): 7 feature types totaling 86 dimensions.
A further development of the invention is that step 2) specifically operates as follows: the high-level feature sequence of each folk song in a given region-category sample set of the training set is taken as an observation sequence of a conditional random field; a model is established for each region category of folk songs in the parameterized form of the conditional random field; and the conditional random field model parameters representing each region category are computed by a parameter-solving method. The specific steps are:
(i) the audio frame feature sequence of any folk song in a region-category sample set of the training set is taken as the observation sequence of the conditional random field, and the folk song sample set of each region category is fitted with the probabilistic form of the conditional random field model, namely
P(y^(t) | x^(t)) = (1/Z(x^(t))) exp( Σ_k w_k f_k(y^(t), x^(t)) )        (1)

where Z(x^(t)) is a normalization factor and f_k(x^(t), y^(t)) = Σ_i f_k(y_{i-1}, y_i, x^(t), i) is obtained by summing the feature functions over all time steps. The feature functions comprise state feature functions s_l and transition feature functions t_k; through s_l and t_k, the region-category label of each frame in a folk song's audio frame sequence and the transition paths between frames are computed. w_k is the weight corresponding to feature function f_k, comprising the state weights μ_l corresponding to s_l and the transition weights λ_k corresponding to t_k.
(ii) The conditional random field model parameters w_k representing each region category of folk songs are computed with the BFGS algorithm, a quasi-Newton method. Following the maximum likelihood estimation rule, parameter solving takes the logarithm of the probability density function of the data set under the conditional random field to obtain the log-likelihood function L(w), then iteratively updates L(w) as the optimization objective so as to maximize the log-likelihood of all songs, yielding the optimal model parameters w*.
A further improvement of the invention is that in step 3-1) a restricted Boltzmann machine is used to model the audio features of the folk songs: the original input features are mapped through a nonlinear spatial transformation, and the high-level feature sequence learned by the hidden layer is used as the observation sequence, which helps enlarge the differences between the original audio feature frames. The parameters of the restricted Boltzmann machine are estimated by the maximum log-likelihood rule: first the network parameters are pre-trained with the k-step contrastive divergence algorithm; then, with the parameters computed by the k-step contrastive divergence algorithm as initial values, the network parameters are fine-tuned with the back-propagation algorithm; finally the optimal network parameters are obtained.
A further improvement of the invention is that the specific computation of step 3-2) is as follows: first, a Softmax layer is added after the hidden layer of the restricted Boltzmann machine, each unit of which represents one regional-style category of folk songs, giving the machine a discrimination mechanism; then the label state of the high-level feature at each time step is decided and the corresponding dimension of the state feature function is set to 1; finally the label sequence of the song is obtained. The specific steps are:
(i) a restricted Boltzmann machine discrimination model is established over all audio frame features in the data set, i.e. a Softmax layer is added after the hidden layer of the restricted Boltzmann machine, each unit of which represents one regional-style category of folk songs, so that the restricted Boltzmann machine has a discrimination mechanism;
(ii) when computing the region-category label state at any time step of the label sequence, the label of the abstract feature learned at that time step is decided by formula (2), i.e. the corresponding dimension of the state feature function s_l is set to 1:

y_i = argmax_j P(y_i = j | x_i; W_j)        (2)

where argmax_j denotes the value of j at which the probability P(·) is maximal, and W_j denotes the weights on the connections to the j-th node of the Softmax layer. After the region-category labels are obtained, the transition feature functions t_k at any two adjacent time steps are determined;
(iii) finally, the label sequence of the t-th song in the data set is obtained.
A further improvement of the invention is that the specific computation of step 4) is as follows: first, the high-level features obtained from a song are taken as the observation sequence of a conditional random field and fed into each trained conditional random field in turn; then the posterior probability of the predicted label sequence generated by each conditional random field is computed with the forward-backward algorithm; finally, the category represented by the conditional random field with the largest posterior probability value is selected as the predicted regional-style category of the test song. The specific steps are:
(i) the high-level features obtained from each test song are taken as an observation sequence of the conditional random fields and fed into each trained conditional random field in turn, and the posterior probability of the label sequence generated by each conditional random field given the observation sequence is computed with the forward-backward algorithm; here the label sequence refers to the predicted label sequence of the observation sequence under that conditional random field, computed by the restricted Boltzmann machine;
(ii) the category represented by the conditional random field with the largest posterior probability value is selected as the predicted regional-style category of the test song, i.e. it satisfies formula (3):

j* = argmax_j P(y^(t) | x^(t); w_j)        (3)

where argmax_j denotes the value of j at which the probability P(·) is maximal.
The invention has the following beneficial technical effects:
the invention provides a Chinese folk song regional classification method based on a conditional random field, which comprises the steps of firstly extracting audio features of music, then providing a method for modeling frame features of folk songs by adopting the conditional random field, combining a limited Boltzmann computer to calculate a labeling sequence, learning parameters by using a quasi-Newton algorithm and a k-time contrast divergence method, and finally realizing music regional classification. Compared with the traditional folk song region classification method, the method adopts the conditional random field to establish the time sequence relation among the frame feature sequences, and simultaneously adopts the limited Boltzmann machine to solve the problem of bottleneck of the accuracy of the conventional research and calculation of the labeling sequences, so that the region style classification performance of the folk song is further improved. Theoretical analysis and experimental analysis prove that the accuracy and precision of the method are improved in the Chinese folk song geographical classification problem.
Drawings
FIG. 1 is a diagram of the overall model for regional classification based on conditional random fields according to the present invention.
Detailed Description
The present invention is described in further detail below with reference to the attached drawings.
Referring to fig. 1, the conditional-random-field-based method for classifying Chinese folk songs by region provided by the invention first extracts the audio features of the music, then establishes an audio-feature temporal model based on a conditional random field while computing its label sequence with a restricted Boltzmann machine, learns the parameters with a quasi-Newton algorithm and k-step contrastive divergence, and finally realizes the regional classification of Chinese folk songs. The method specifically comprises the following steps:
1) audio feature extraction of music: the method comprises the steps of selecting audio features, framing music segments and extracting music audio features, and specifically comprises the following steps:
Step 1, audio feature selection: analyzing the audio signal in different transform domains, time-domain, frequency-domain and cepstral-domain features are selected as the audio features characterizing the timbre and melody of the music; the specific features are selected according to Table 1;
Table 1: the audio features selected in the present invention to characterize the timbre and melody of music.
[Table 1 is an image in the original document; it lists the seven selected audio features (ZCR, SC, SF, SRP, Chroma, LPCC, MFCC) and their dimensions, totaling 86 dimensions.]
All seven selected audio features contribute to the music representation. ZCR characterizes timbre well and is used for endpoint detection, pitch detection and tone segmentation of the audio signal; SC reflects the distribution of the audio signal's frequencies; SF reflects the amount of change between the energy spectra of two adjacent frames; SRP reflects the energy spectrum and is a measure of the spectral envelope; the Chroma features account for the presence of harmony in music, reduce the interference of noise and non-tonal sounds, and strengthen melody discrimination while reducing misjudgment; LPCC models human vocal production, and MFCC is an acoustic feature computed according to the human auditory mechanism; both reflect well the timbral differences between folk songs of different regional styles.
Step2 framing of music piece: in consideration of the short-time stationarity of the music audio, the music audio m is sampled into continuous short-time segments m ═ { m ═ m }1,m2,…,mi,…,mNIn which m isiReferred to as "frame", N denotes the sequence length;
Step 3, music audio feature extraction: the relevant features are extracted from the music audio piece m frame by frame, giving v = {v_1, v_2, ..., v_i, ..., v_N}, where v_i ∈ R^d is the feature vector of the i-th frame, containing d-dimensional data.
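The framing (Step 2) and per-frame feature extraction (Step 3) above can be sketched as follows. This Python/NumPy fragment is a minimal illustration: the frame length, hop size, sampling rate, and the two features shown (ZCR and spectral centroid) are illustrative choices, not the patent's full 86-dimensional feature set.

```python
import numpy as np

def frame_signal(m, frame_len=1024, hop=512):
    """Split the audio signal m into overlapping short-time frames m_1..m_N."""
    n_frames = 1 + (len(m) - frame_len) // hop
    return np.stack([m[i * hop : i * hop + frame_len] for i in range(n_frames)])

def zero_crossing_rate(frame):
    """ZCR: fraction of adjacent sample pairs whose signs differ."""
    return np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:]))

def spectral_centroid(frame, sr=22050):
    """SC: magnitude-weighted mean frequency of the windowed frame's spectrum."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

# Toy signal: a 440 Hz tone, one second at 22.05 kHz
sr = 22050
t = np.arange(sr) / sr
m = np.sin(2 * np.pi * 440 * t)
frames = frame_signal(m)
v = np.array([[zero_crossing_rate(f), spectral_centroid(f, sr)] for f in frames])
print(v.shape)  # (N, d): one feature vector v_i per frame
```

For a pure 440 Hz tone the per-frame spectral centroid lands near 440 Hz and the ZCR near 2·440/22050, which makes the sketch easy to sanity-check.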
2) Establishing the audio-feature temporal model: the folk songs of different regional styles are modeled with a conditional random field, taking into account the temporal relations among the per-frame audio features. The key steps are as follows:
(i) the audio frame feature sequence of any folk song in a region-category sample set of the training set is taken as the observation sequence of the conditional random field, and the folk song sample set of each region category is fitted with the probabilistic form of the conditional random field model, namely
P(y^(t) | x^(t)) = (1/Z(x^(t))) exp( Σ_k w_k f_k(y^(t), x^(t)) )        (1)

where Z(x^(t)) is a normalization factor and f_k(x^(t), y^(t)) = Σ_i f_k(y_{i-1}, y_i, x^(t), i) is obtained by summing the feature functions over all time steps. The feature functions comprise state feature functions s_l and transition feature functions t_k; through s_l and t_k, the region-category label of each frame in a folk song's audio frame sequence and the transition paths between frames are computed. w_k is the weight corresponding to feature function f_k, comprising the state weights μ_l corresponding to s_l and the transition weights λ_k corresponding to t_k.
(ii) The conditional random field model parameters w_k representing each region category of folk songs are computed with the BFGS algorithm, a quasi-Newton method. Following the maximum likelihood estimation rule, parameter solving takes the logarithm of the probability density function of the data set under the conditional random field to obtain the log-likelihood function L(w), then iteratively updates L(w) as the optimization objective so as to maximize the log-likelihood of all songs, yielding the optimal model parameters w*.
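A minimal NumPy sketch of the linear-chain form of equation (1): here the feature-function sums Σ_k w_k f_k are collapsed into a per-frame emission score matrix (the state terms) and a label-transition score matrix (the transition terms), a simplifying assumption for illustration. `crf_log_prob` is then the per-song term of the log-likelihood L(w) that the quasi-Newton step maximizes.

```python
import numpy as np

def crf_log_Z(emit, trans):
    """log Z(x) for a linear-chain CRF via the forward recursion.
    emit: (N, K) state scores per frame; trans: (K, K) transition scores."""
    alpha = emit[0]                       # log-potentials for the first frame
    for t in range(1, len(emit)):
        # sum over the previous label y_{t-1} for each current label y_t
        alpha = emit[t] + np.log(np.exp(alpha[:, None] + trans).sum(axis=0))
    return np.log(np.exp(alpha).sum())

def crf_log_prob(y, emit, trans):
    """log P(y | x) = score(y, x) - log Z(x), cf. equation (1)."""
    score = emit[0, y[0]] + sum(emit[t, y[t]] + trans[y[t - 1], y[t]]
                                for t in range(1, len(y)))
    return score - crf_log_Z(emit, trans)

rng = np.random.default_rng(0)
N, K = 6, 3                               # 6 frames, 3 label states
emit = rng.normal(size=(N, K))
trans = rng.normal(size=(K, K))
y = [0, 1, 1, 2, 0, 1]
print(crf_log_prob(y, emit, trans))       # a log-probability, always <= 0
```

Summing `crf_log_prob` over all training songs of a region category gives L(w) for that category's model; exponentiating over all K^N label sequences sums to 1, which is a useful correctness check.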
3) Computing the audio's predicted label sequence: the computation of the predicted label sequence under the conditional random field model comprises the learning of high-level music features and the determination of the label sequence, specifically as follows:
Step 1, learning of high-level music features: with reference to FIG. 1, the dashed box represents the restricted Boltzmann machine structure, comprising a visible layer V and hidden layers H_N. The extracted feature sequence v is used as the input of the restricted Boltzmann machine, network learning is performed with k-step contrastive divergence, and the abstract features computed by the hidden layer are taken as the high-level music features, expressed as x = {x_1, x_2, ..., x_i, ..., x_N}, where x_i ∈ R^d is the high-level music feature vector of the i-th frame, containing d-dimensional data;
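The k-step contrastive-divergence learning of Step 1 can be sketched with a minimal Bernoulli restricted Boltzmann machine in NumPy. The dimensions, learning rate, and CD-1 setting are illustrative assumptions, and the subsequent back-propagation fine-tuning is omitted; the hidden-layer activations stand in for the high-level feature sequence x.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_k_update(v0, W, b, c, k=1, lr=0.05, rng=None):
    """One k-step contrastive-divergence update of an RBM with weights W,
    visible bias b and hidden bias c.  Also returns the hidden activations,
    which serve as the 'high-level features' x for the frame features v0."""
    rng = rng or np.random.default_rng(0)
    h_prob0 = sigmoid(v0 @ W + c)                 # positive phase
    v = v0
    for _ in range(k):                            # Gibbs chain of length k
        h = (rng.random(h_prob0.shape) < sigmoid(v @ W + c)).astype(float)
        v = sigmoid(h @ W.T + b)                  # mean-field reconstruction
    h_probk = sigmoid(v @ W + c)                  # negative phase
    W = W + lr * (v0.T @ h_prob0 - v.T @ h_probk)
    b = b + lr * (v0 - v).mean(axis=0)
    c = c + lr * (h_prob0 - h_probk).mean(axis=0)
    return W, b, c, h_prob0

rng = np.random.default_rng(1)
N, d, h_dim = 8, 5, 4                             # 8 frames, 5-dim input, 4 hidden units
v_seq = (rng.random((N, d)) > 0.5).astype(float)  # toy binary frame features
W = 0.01 * rng.normal(size=(d, h_dim))
b, c = np.zeros(d), np.zeros(h_dim)
for _ in range(100):
    W, b, c, x_seq = cd_k_update(v_seq, W, b, c, k=1, rng=rng)
print(x_seq.shape)                                # (N, h_dim): high-level feature sequence
```

CD-k approximates the log-likelihood gradient by truncating the Gibbs chain after k steps, which is why it is a cheap pre-training rule rather than an exact maximum-likelihood update.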
Step 2, determination of the label sequence: the high-level music feature vector x is taken as the observed values of the conditional random field's observation sequence, the feature functions are computed with the restricted Boltzmann machine, and the conditional random field label sequence y = {y_1, y_2, ..., y_i, ..., y_N} is obtained, where y_i is the region-category label corresponding to the i-th frame's high-level feature x_i. The key steps are as follows:
(i) a restricted Boltzmann machine discrimination model is established over all audio frame features in the data set, i.e. a Softmax layer is added after the hidden layer of the restricted Boltzmann machine, each unit of which represents one regional-style category of folk songs, so that the restricted Boltzmann machine has a discrimination mechanism;
(ii) when computing the region-category label state at any time step of the label sequence, the label of the abstract feature learned at that time step is decided by formula (2), i.e. the corresponding dimension of the state feature function s_l is set to 1:

y_i = argmax_j P(y_i = j | x_i; W_j)        (2)

where argmax_j denotes the value of j at which the probability P(·) is maximal, and W_j denotes the weights on the connections to the j-th node of the Softmax layer. After the region-category labels are obtained, the transition feature functions t_k at any two adjacent time steps are determined;
(iii) finally, the label sequence of the t-th song in the data set is obtained.
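The per-frame labeling rule of formula (2), choosing for each frame the Softmax unit with maximal probability, can be sketched as follows; the plain linear Softmax layer and the random toy inputs are assumptions for illustration, not the patent's trained network.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # stabilized exponentials
    return e / e.sum(axis=-1, keepdims=True)

def label_sequence(x_seq, W_soft):
    """y_i = argmax_j P(y_i = j | x_i): per frame, pick the Softmax unit
    (region class j) with the highest probability, cf. formula (2)."""
    probs = softmax(x_seq @ W_soft)    # (N, J) class probabilities per frame
    return probs.argmax(axis=1)        # label sequence y = {y_1, ..., y_N}

rng = np.random.default_rng(2)
x_seq = rng.normal(size=(6, 4))        # toy high-level features for 6 frames
W_soft = rng.normal(size=(4, 3))       # weights to J = 3 region-class units
y = label_sequence(x_seq, W_soft)
print(y)                               # one class index per frame
```

Each argmax picks the dimension of the state feature function that is set to 1 for that frame, which is exactly how the label sequence feeds back into the CRF's feature functions.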
4) Realization of music region classification: the region category of a song is identified according to the obtained models corresponding to the different regional styles. The specific computation is as follows: first, the high-level features obtained from a song are taken as the observation sequence of a conditional random field and fed into each trained conditional random field in turn; then the posterior probability of the predicted label sequence generated by each conditional random field is computed with the forward-backward algorithm; finally, the category represented by the conditional random field with the largest posterior probability value is selected as the predicted regional-style category of the test song. The specific steps are:
(i) the high-level features obtained from each test song are taken as an observation sequence of the conditional random fields and fed into each trained conditional random field in turn, and the posterior probability of the label sequence generated by each conditional random field given the observation sequence is computed with the forward-backward algorithm; here the label sequence refers to the predicted label sequence of the observation sequence under that conditional random field, computed by the restricted Boltzmann machine;
(ii) the category represented by the conditional random field with the largest posterior probability value is selected as the predicted regional-style category of the test song, i.e. it satisfies formula (3):

j* = argmax_j P(y^(t) | x^(t); w_j)        (3)

where argmax_j denotes the value of j at which the probability P(·) is maximal.
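The decision rule of formula (3) can be sketched as follows, assuming each trained region-category CRF is summarized by toy per-frame emission and transition score matrices for the test song. As a hedged simplification, the best-path (Viterbi) score stands in for the predicted label sequence's unnormalized score, and subtracting log Z via the forward recursion gives its log posterior; the patent's full forward-backward computation is not reproduced.

```python
import numpy as np

def viterbi_score(emit, trans):
    """Unnormalized score of the best label sequence under one CRF model."""
    alpha = emit[0]
    for t in range(1, len(emit)):
        alpha = emit[t] + (alpha[:, None] + trans).max(axis=0)
    return alpha.max()

def log_Z(emit, trans):
    """log of the normalization factor via the forward recursion."""
    alpha = emit[0]
    for t in range(1, len(emit)):
        alpha = emit[t] + np.log(np.exp(alpha[:, None] + trans).sum(axis=0))
    return np.log(np.exp(alpha).sum())

def classify(models):
    """models: per region class j, a pair (emit_j, trans_j) of score matrices
    produced by that class's CRF for the test song.  Returns argmax_j of the
    log posterior of the predicted label sequence, cf. formula (3)."""
    log_posts = [viterbi_score(e, tr) - log_Z(e, tr) for e, tr in models]
    return int(np.argmax(log_posts))

rng = np.random.default_rng(3)
N, K, J = 10, 4, 3                     # 10 frames, 4 labels, 3 region models
models = [(rng.normal(size=(N, K)), rng.normal(size=(K, K))) for _ in range(J)]
print(classify(models))                # index of the predicted region class
```

Since the Viterbi score can never exceed the log-sum-exp of all paths, each log posterior is at most 0, mirroring the fact that formula (3) compares genuine probabilities.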
Referring to table 2, the confusion-matrix results for classifying Shanxi, Jiangsu and Hunan folk songs show that the conditional-random-field-based method for classifying Chinese folk songs provided by the invention obtains better classification results.
Table 2: accuracy of the proposed classification method, evaluated with a classification confusion matrix.
[Table 2 is an image in the original document: the classification confusion matrix for Shanxi, Jiangsu and Hunan folk songs.]
Referring to table 3, the regional classification accuracy of the invention's conditional-random-field-based method for Chinese folk songs is higher than that of existing folk song classification methods.
Table 3: accuracy comparison of classification algorithms for folk songs of different regional styles.
[Table 3 is an image in the original document: accuracy comparison between the proposed method and existing folk song classification algorithms.]

Claims (4)

1. A method for classifying Chinese folk songs by region based on a conditional random field, characterized in that the audio features of the music are extracted first, then an audio-feature temporal model is established based on a conditional random field while the label sequence is computed with a restricted Boltzmann machine, the parameters are learned with a quasi-Newton algorithm and k-step contrastive divergence, and finally the regional classification of Chinese folk songs is realized, specifically comprising the following steps:
1) audio feature extraction of music: comprising the selection of audio features, the framing of music pieces and the extraction of music audio features, specifically as follows,
1-1) selection of audio features: analyzing the audio signal in different transform domains, time-domain, frequency-domain and cepstral-domain features are selected as the audio features characterizing the timbre and melody of the music; the selected audio features comprise the short-time average zero-crossing rate, spectral centroid, spectral flux, spectral roll-off point, Chroma features, linear prediction cepstral coefficients and Mel-frequency cepstral coefficients: 7 feature types totaling 86 dimensions;
1-2) Framing of music segments: considering the short-time stationarity of music audio, the music audio m is sampled into consecutive short-time segments m = {m_1, m_2, ..., m_i, ..., m_N}, where m_i is called a "frame" and N denotes the sequence length;
1-3) Extraction of music audio features: the features selected in 1-1) are extracted from the music audio segment m in units of frames as v = {v_1, v_2, ..., v_i, ..., v_N}, where v_i ∈ R^d is the feature vector of the i-th frame containing d-dimensional data;
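The framing and frame-wise feature extraction of steps 1-2) and 1-3) can be sketched as below. This is a minimal illustration, not the patent's implementation: the signal, frame length, hop size, and the two features shown (short-time zero-crossing rate and spectral centroid, two of the seven feature types named in 1-1)) are assumptions chosen for demonstration.

```python
import numpy as np

def frame_signal(m, frame_len=1024, hop=512):
    """Split the audio signal m into consecutive short-time frames m_1..m_N."""
    n_frames = 1 + (len(m) - frame_len) // hop
    return np.stack([m[i * hop: i * hop + frame_len] for i in range(n_frames)])

def zero_crossing_rate(frames):
    """Fraction of adjacent-sample sign changes within each frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1                       # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def spectral_centroid(frames, sr):
    """Magnitude-weighted mean frequency (Hz) of each frame's spectrum."""
    win = np.hanning(frames.shape[1])           # window to suppress leakage
    mag = np.abs(np.fft.rfft(frames * win, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    return (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)

# Per-frame feature sequence v = {v_1, ..., v_N}, here with d = 2 dimensions.
sr = 8000
t = np.arange(sr) / sr
m = np.sin(2 * np.pi * 440 * t)                 # one second of a 440 Hz tone
frames = frame_signal(m)
v = np.column_stack([zero_crossing_rate(frames), spectral_centroid(frames, sr)])
```

For a pure 440 Hz tone the per-frame zero-crossing rate is close to 2·440/8000 = 0.11 and the spectral centroid is close to 440 Hz, so the two columns of v behave as expected.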
2) Establishing the audio feature time-sequence model: considering the temporal relation among the audio features of the frames, folk songs of different regional styles are modeled with conditional random fields; the specific operation is as follows: the high-level feature sequence of each folk song in a regional class of the training set is taken as the observation sequence of a conditional random field, a model is established for each regional class of folk songs in the parameterized form of the conditional random field, and the conditional random field model parameters representing each regional class are computed by a parameter-solving method; the specific steps are as follows:
(i) The audio frame feature sequence of any folk song in a regional-class sample set of the training set is taken as the observation sequence of a conditional random field, and the folk song sample set of each regional class is fitted with the probability form of the conditional random field model, i.e.

P(y^(t) | x^(t)) = (1 / Z(x^(t))) exp( Σ_k w_k f_k(x^(t), y^(t)) )

where Z(x^(t)) is a normalization factor and f_k(x^(t), y^(t)) = Σ_{i=1}^{N} f_k(y_{i-1}^(t), y_i^(t), x^(t), i) is the k-th feature function summed over all positions of the sequence; the feature functions comprise state feature functions s_l and transition feature functions t_k, through which the regional-class label of each frame in the song's audio frame sequence and the transition paths between frames are computed; w_k is the weight corresponding to feature function f_k, comprising the state weights μ_l of the state feature functions s_l and the transition weights λ_k of the transition feature functions t_k;
(ii) The conditional random field model parameters w_k representing each regional class of folk songs are computed with the BFGS algorithm, a quasi-Newton method; according to the maximum likelihood estimation rule, parameter solving takes the logarithm of the probability density function of the data set satisfying the conditional random field to obtain the log-likelihood function L(w), which is taken as the objective function and iteratively updated so as to maximize the log-likelihood over all songs, thereby obtaining the optimal model parameters w*;
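The parameterized form of step (i) — unary state scores plus label-transition scores, normalized by Z(x) — can be illustrated with a toy linear-chain model. The sketch below computes log Z(x) with the forward recursion and verifies by brute-force enumeration that P(y|x) is a proper distribution; the random toy scores stand in for the learned weighted feature functions, and the BFGS training itself is not shown.

```python
import numpy as np
from itertools import product

def logsumexp_cols(M):
    """Numerically stable log(sum(exp(M), axis=0))."""
    m = M.max(axis=0)
    return m + np.log(np.exp(M - m).sum(axis=0))

def log_Z_forward(U, T):
    """log of the normalization factor Z(x), via the forward recursion."""
    alpha = U[0].astype(float)
    for i in range(1, len(U)):
        alpha = U[i] + logsumexp_cols(alpha[:, None] + T)
    return float(np.logaddexp.reduce(alpha))

def log_prob(y, U, T):
    """log P(y|x) = (sum of weighted feature scores along y) - log Z(x)."""
    score = U[0, y[0]] + sum(U[i, y[i]] + T[y[i - 1], y[i]]
                             for i in range(1, len(y)))
    return score - log_Z_forward(U, T)

rng = np.random.default_rng(0)
N, K = 5, 3                      # sequence length, number of region labels
U = rng.normal(size=(N, K))      # unary scores: weighted state features w·s
T = rng.normal(size=(K, K))      # pairwise scores: weighted transition features w·t

# Sanity check: P(y|x) sums to 1 over all K^N possible label sequences.
total = sum(np.exp(log_prob(y, U, T)) for y in product(range(K), repeat=N))
```

Training would adjust U and T (i.e. the weights w_k) to maximize log P of the observed label sequences, which is what the BFGS step above performs on the full model.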
3) Computing the predicted audio label sequence: the computation of the predicted label sequence based on the conditional random field model comprises learning the high-level music features and determining the label sequence, specifically as follows,
3-1) Learning of high-level music features: the extracted feature sequence v is taken as the input of a restricted Boltzmann machine, network learning is performed with k-step contrastive divergence, and the abstract features computed at the hidden layer are taken as the high-level music features x = {x_1, x_2, ..., x_i, ..., x_N}, where x_i ∈ R^d is the high-level music feature vector of the i-th frame containing d-dimensional data;
3-2) Determination of the label sequence: the high-level music feature vector x is taken as the observed value of the conditional random field observation sequence, the feature functions are computed with the restricted Boltzmann machine, and the conditional random field label sequence y = {y_1, y_2, ..., y_i, ..., y_N} is obtained, where y_i denotes the regional-class label corresponding to the i-th high-level feature x_i;
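Step 3-1) can be sketched as a minimal binary restricted Boltzmann machine trained with k-step contrastive divergence, whose hidden-layer activations serve as the high-level feature sequence x. The toy data, layer sizes, learning rate and epoch count are illustrative assumptions, not the patent's configuration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_rbm_cdk(V, n_hidden, k=1, lr=0.1, epochs=300, seed=0):
    """Train a binary RBM with k-step contrastive divergence (CD-k)."""
    rng = np.random.default_rng(seed)
    n_vis = V.shape[1]
    W = 0.01 * rng.normal(size=(n_vis, n_hidden))
    b, c = np.zeros(n_vis), np.zeros(n_hidden)
    for _ in range(epochs):
        ph = sigmoid(V @ W + c)                   # positive-phase hidden probs
        vk = V.copy()
        for _ in range(k):                        # k steps of Gibbs sampling
            hk = (rng.random((len(V), n_hidden))
                  < sigmoid(vk @ W + c)).astype(float)
            vk = sigmoid(hk @ W.T + b)            # mean-field visible reconstruction
        phk = sigmoid(vk @ W + c)                 # negative-phase hidden probs
        W += lr * (V.T @ ph - vk.T @ phk) / len(V)
        b += lr * (V - vk).mean(axis=0)
        c += lr * (ph - phk).mean(axis=0)
    return W, b, c

# Toy frame-feature sequence v: 4 "frames" of 4-dimensional binary features.
V = np.array([[1, 1, 0, 0], [1, 1, 0, 0],
              [0, 0, 1, 1], [0, 0, 1, 1]], dtype=float)
W, b, c = train_rbm_cdk(V, n_hidden=2, k=1)
x = sigmoid(V @ W + c)    # hidden activations = high-level features x_i
```

Each row of x is the abstract hidden-layer representation of one frame, playing the role of the high-level feature vectors x_i fed to the conditional random field.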
4) Realizing music regional classification: the regional class of the song is identified according to the obtained models corresponding to the different regional styles.
2. The conditional-random-field-based regional classification method for Chinese folk songs according to claim 1, wherein in step 3-1) a restricted Boltzmann machine is used to model the audio features of the folk songs, the originally input features are mapped through a nonlinear spatial transformation, and the high-level feature sequence learned by the hidden layer is taken as the observation sequence, thereby increasing the differences between the original audio feature frames; the parameters of the restricted Boltzmann machine are estimated according to the maximum log-likelihood estimation rule: first, the network parameters are pre-trained with the k-step contrastive divergence algorithm; then, the parameters computed by the k-step contrastive divergence algorithm are taken as the initial values of the network, and the network parameters are fine-tuned with the back-propagation algorithm; finally, the optimal network parameters are obtained.
3. The method for classifying Chinese folk songs based on the conditional random field according to claim 2, wherein step 3-2) comprises the following process: first, a Softmax layer is added after the hidden layer of the restricted Boltzmann machine, each unit in the Softmax layer representing one regional style class of folk songs, so that the model has a discrimination mechanism; then the label state of the high-level feature at each position is determined and the corresponding dimension of the state feature function is set to 1, finally yielding the label sequence of the song; the specific steps are as follows:
(i) A restricted Boltzmann machine discrimination model is established for all audio frame features in the data set, i.e. a Softmax layer is added after the hidden layer of the restricted Boltzmann machine, each unit in the Softmax layer representing one regional style class of folk songs, so that the restricted Boltzmann machine has a discrimination mechanism;
(ii) When the regional-class label state at any position of the label sequence is computed, the abstract feature learned at that position is assigned its label by formula (2), i.e. the corresponding dimension of the state feature function s_l is set to 1:

y_i = argmax_j P(y_i = j | x_i; U_j)    (2)

where argmax_j denotes taking the value of j at which the probability P(·) is maximal, and U_j denotes the weights on the connections to the j-th node of the Softmax layer; after the regional-class labels are obtained, the transition feature function t_k between any two adjacent positions is determined;
(iii) Finally, the label sequence of the t-th song in the data set is obtained.
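The labeling rule of step (ii) — formula (2)'s argmax over the Softmax layer's class probabilities, followed by reading off adjacent-label transitions — can be sketched as below. The feature values and the weight matrix U are made-up illustrative numbers.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical high-level features x_i (rows) and Softmax-layer weights:
# column j of U connects the hidden layer to the j-th regional-class unit.
x = np.array([[2.0, 0.1], [1.8, 0.2], [0.1, 2.2]])   # 3 frames, 2 dims
U = np.array([[3.0, -1.0, 0.0], [-1.0, 3.0, 0.5]])   # 2 dims -> 3 classes

P = softmax(x @ U)                       # P(y_i = j | x_i) for each frame
y = P.argmax(axis=1)                     # formula (2): label = argmax_j P(.)
transitions = list(zip(y[:-1], y[1:]))   # adjacent labels fix t_k
```

Here the first two frames receive class 0 and the third receives class 1, so the transition pairs are (0, 0) and (0, 1), which is the information the transition feature functions t_k record.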
4. The method for classifying Chinese folk songs based on the conditional random field according to claim 3, wherein the specific calculation process of step 4) is as follows: first, the high-level features obtained from the song are taken as observation sequences of the conditional random fields and fed into each trained conditional random field; then the posterior probability of the predicted label sequence generated by each conditional random field is computed by the forward-backward algorithm; finally, the class represented by the conditional random field with the maximum posterior probability is selected as the predicted regional style class of the test song; the specific steps are as follows:
(i) The high-level features obtained from each test song are taken as an observation sequence of the conditional random fields and fed into each trained conditional random field, and the posterior probability of the label sequence generated by each conditional random field given the observation sequence is computed by the forward-backward algorithm; here the label sequence for a given observation sequence refers to the predicted label sequence of that observation sequence under the conditional random field, as computed by the restricted Boltzmann machine;
(ii) The class represented by the conditional random field with the maximum posterior probability is selected as the predicted regional style class of the test song, i.e. formula (3) is satisfied:

j* = argmax_j P(y^(j) | x)    (3)

where argmax_j denotes taking the value of j at which the probability P(·) is maximal.
CN201910395241.3A 2019-05-13 2019-05-13 Chinese folk song geographical classification method based on conditional random field Active CN110189768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910395241.3A CN110189768B (en) 2019-05-13 2019-05-13 Chinese folk song geographical classification method based on conditional random field


Publications (2)

Publication Number Publication Date
CN110189768A CN110189768A (en) 2019-08-30
CN110189768B true CN110189768B (en) 2021-02-02

Family

ID=67716108



Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110853604A (en) * 2019-10-30 2020-02-28 西安交通大学 Automatic generation method of Chinese folk songs with specific region style based on variational self-encoder

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056128A (en) * 2016-04-20 2016-10-26 北京航空航天大学 Remote sensing image classification marking method based on composite graph conditional random field
CN106328121A (en) * 2016-08-30 2017-01-11 南京理工大学 Chinese traditional musical instrument classification method based on depth confidence network
CN106952644A (en) * 2017-02-24 2017-07-14 华南理工大学 A kind of complex audio segmentation clustering method based on bottleneck characteristic




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant