CN110309349A - A music generation method based on facial expression recognition and recurrent neural networks - Google Patents

A music generation method based on facial expression recognition and recurrent neural networks

Info

Publication number
CN110309349A
CN110309349A CN201910275097.XA
Authority
CN
China
Prior art keywords
recognition
rbm
music
rnn
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910275097.XA
Other languages
Chinese (zh)
Inventor
傅晨波
夏镒楠
李一帆
岳昕晨
宣琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201910275097.XA priority Critical patent/CN110309349A/en
Publication of CN110309349A publication Critical patent/CN110309349A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/635Filtering based on additional data, e.g. user or group profiles
    • G06F16/636Filtering based on additional data, e.g. user or group profiles by using biological or physiological data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Abstract

A music generation method based on expression recognition and recurrent neural networks, comprising the following steps: 1) acquire music audio data and facial expression data; 2) classify and label the data; 3) preprocess the audio data and image data; 4) initialize the RNN-RBM neural network; 5) train the RNN-RBM neural network; 6) recognize the facial expression using VGG19 + dropout + 10-crop + softmax; 7) feed the recognized emotion information into the trained RNN-RBM network to obtain the final generated music. The invention combines facial emotion recognition with AI music generation and can produce music according to a person's mood, thereby achieving the goal of mood regulation; it has high practical application value.

Description

A music generation method based on facial expression recognition and recurrent neural networks
Technical field
The present invention relates to the fields of computer technology and digital music generation, and more particularly to a music generation method based on facial expression emotion recognition and recurrent neural networks.
Background technique
Music has a subtle influence on people's body and mind. With the development of the internet and cloud music services, music occupies an ever larger share of people's daily lives and quietly regulates their physical and mental well-being. The effect of music can be felt deeply in everyday life and work: programmers may code more efficiently while listening to music, fitness enthusiasts habitually use music to pace their workouts, and drivers use music to stay attentive at the wheel. Listening to suitable music on the right occasion can also greatly relax people; for example, listening to a passionate symphony when depressed can release low spirits, and listening to light music when agitated can soothe irritability.
However, existing music is created by singers or composers according to their own understanding of music, and may not satisfy the individual needs of the listener. On specific occasions an inappropriate tune may weaken the positive effect of music or even produce a negative effect: playing soothing music while driving may deepen the driver's fatigue and increase the accident rate, and playing passionate music when someone is angry may further stimulate their mood and provoke rash behavior. Conversely, if the mood of the person can be recognized and personalized music generated accordingly, the positive, mood-regulating effect of music can be maximized. However, no existing technology generates personalized music by analyzing a person's mood.
Summary of the invention
To remedy the defect that existing music cannot satisfy listeners' individual needs on specific occasions, the present invention combines two techniques, facial expression emotion recognition and recurrent-neural-network music generation, and proposes a music generation method based on expression emotion recognition and recurrent neural networks. Facial expression emotion recognition uses the Visual Geometry Group-19 convolutional neural network model (VGG19 for short), and music generation uses the Recurrent Neural Network-Restricted Boltzmann Machine (RNN-RBM for short) algorithm.
The technical solution adopted by the present invention is as follows:
A music generation method combining facial expression emotion recognition and recurrent neural networks, comprising the following steps:
S1: acquire music audio data and facial expression data;
S2: classify and label the data;
S3: preprocess the audio data and image data;
S4: initialize the RNN-RBM neural network;
S5: train the RNN-RBM neural network;
S6: recognize the facial expression using VGG19;
S7: feed the recognized emotion information into the trained RNN-RBM network to obtain the final generated music.
Further, step S1 comprises the following steps:
S1.1: obtain audio data from the Classical Piano Midi dataset;
S1.2: obtain the FER2013 and CK+ facial expression databases;
S1.3: crawl image data and audio data meeting the corresponding requirements from the internet using a web crawler.
Further, step S2 comprises the following steps:
S2.1: manually label and classify the crawled image data and audio data;
S2.2: split the image data into training data and test data.
Step S3 comprises the following steps:
S3.1: convert all audio data into MIDI format and encode each song with the binary matrix given in the annex:
where the first n columns encode note-on events and the last n columns encode note-off events; n is the number of notes to be represented;
S3.2: expand the image dataset according to the format of the FER2013 dataset;
anger and disgust are grouped into class D1, happiness and surprise into class D2, sadness into class D3, and fear and the remaining emotions into class D4.
In step S4, the weight parameters of the RNN-RBM are initialized.
In step S5, during the music generation part, the RNN hidden units transmit the mood of the song being played to the RBM; this information adjusts the probability distribution of the RBM and models the way the notes change over the course of the song. The process is as follows: the RNN hidden units transmit the mood P1 of the song being played to the RBM, and then the RNN-to-RBM weight and bias matrices together with the state of the RNN hidden units u(t-1) determine the bias vectors bv(t) and bh(t) of RBMt:
bv(t) = bv + u(t-1)Wuv,  bh(t) = bh + u(t-1)Wuh.
Some notes are then created with the RBM: v(t) is used to initialize RBMt and a single Gibbs sampling iteration is executed to draw a sample v*(t) from RBMt.
The contrastive divergence estimate of the negative log-likelihood of v(t) with respect to RBMt is computed as the loss.
The RNN then updates its internal representation of the song state with the new information: the state of the RNN hidden units u(t) is determined from v(t), the previous hidden state u(t-1), and the RNN weight and bias matrices as
u(t) = σ(bu + v(t)Wvu + u(t-1)Wuu).
Step S6 comprises the following steps:
S6.1: apply data augmentation, such as flipping, cropping and rotation, to the face images to increase the amount of training data;
S6.2: recognize the expression using the "VGG depth model" + "Dropout" + "10-crop" + "Softmax classification" method.
In step S6, data augmentation is first applied to the face images: images are randomly cropped to 44*44 and randomly mirrored before training. In the testing phase, each picture is cropped at the upper left, lower left, upper right, lower right and center, and each crop is additionally mirrored, expanding the data 10-fold; the images are processed with the VGG depth model and the Dropout anti-overfitting method and finally classified with Softmax to obtain the expression of the current user.
In step S7, a corresponding number of pieces of music is randomly selected from the music dataset according to the corresponding weights and fed into the trained RNN-RBM network to obtain the final music.
The method of the invention has the following beneficial effects:
(1) The method has great application prospects, for example fatigue-driving detection and entertaining music generation.
(2) The method combines facial expression recognition with recurrent-neural-network music generation; its hardware requirements are low and an ordinary mobile phone suffices.
(3) The method uses a pre-trained model with a relatively small number of parameters and therefore offers good recognition speed and user experience.
Detailed description of the invention
Fig. 1 is an explanatory diagram of the song encoding scheme.
Fig. 2 is a schematic diagram of the parameters of the RNN-RBM neural network.
Fig. 3 is a flow chart of the music generation method based on expression recognition and recurrent neural networks.
Fig. 4 is a musical score actually generated by the method.
Specific embodiment
The present invention will be further explained below with reference to the attached drawings.
Referring to Fig. 1 to Fig. 4, a music generation method based on expression recognition and recurrent neural networks comprises the following steps:
S1: acquire music audio data and facial expression data;
S2: classify and label the data;
S3: preprocess the audio data and image data;
S4: initialize the RNN-RBM neural network;
S5: train the RNN-RBM neural network;
S6: recognize the facial expression using VGG19;
S7: feed the recognized emotion information into the trained RNN-RBM network to obtain the final generated music.
In this example, music is generated from self-collected pictures and audio data; the method comprises the following steps:
S1: acquire audio data and image data:
Part of the audio data comes from the Classical Piano Midi dataset; the image data comes mainly from the FER2013 and CK+ facial expression databases.
S2: classify and label the data:
The crawled image data and audio data are manually labeled and classified, and the music data is divided into 4 classes: "angry, happy, sad, other". The image data is split into a training set and a test set at a ratio of 4:1.
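By way of illustration only, the 4:1 split could be performed as follows (the file list, the labels, and the use of scikit-learn are assumptions not specified in this description):

# Hypothetical sketch of the 4:1 training/test split described above.
# The image paths and labels are placeholders; scikit-learn is an assumed dependency.
from sklearn.model_selection import train_test_split

image_paths = [f"faces/img_{i}.png" for i in range(1000)]   # placeholder file list
labels = [i % 4 for i in range(1000)]                        # placeholder class ids 0..3

train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.2, stratify=labels, random_state=42
)  # test_size=0.2 gives the 4:1 train/test ratio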
S3: preprocess the audio data and image data, as follows:
S3.1: convert all audio data into MIDI format and encode each song with the binary matrix structured as in Fig. 1 of the annex:
where the first n columns encode note-on events and the last n columns encode note-off events; n is the number of notes to be represented. In a note-on event, 1 indicates that the note is struck and 0 that it is not; in a note-off event, 1 indicates that the note is released, and conversely 0 that it is not. In this encoding every row represents one time step, and each time step is one training sample. Time is quantized in MIDI ticks, with the default setting of 96 ticks per beat, one beat being a quarter note; at 96 ticks per beat each time step resolves a 1/384 note. This data representation can also be viewed as a piano roll.
S3.2: the image data is expanded according to the format of the FER2013 dataset. Anger and disgust are grouped into class D1, happiness and surprise into class D2, sadness into class D3, and fear and the remaining emotions into class D4.
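A minimal mapping of the FER2013 labels onto these four classes might look as follows (the integer label order 0=angry, 1=disgust, 2=fear, 3=happy, 4=sad, 5=surprise, 6=neutral is FER2013's standard convention, assumed here):

# Grouping of the FER2013 emotion labels into the four classes D1-D4 described above.
FER2013_TO_CLASS = {
    0: "D1",  # angry
    1: "D1",  # disgust
    3: "D2",  # happy
    5: "D2",  # surprise
    4: "D3",  # sad
    2: "D4",  # fear
    6: "D4",  # neutral (remaining emotions)
}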
S4: initialize the RNN-RBM neural network:
The weights of the RNN-RBM are randomly initialized with a common initialization function;
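By way of illustration only, this random initialization could be written as below, using the parameter names that appear in the formulas of S5 (numpy, the layer sizes, and the 0.01 scale are assumptions):

# Hypothetical random initialization of the RNN-RBM parameters.
# n_v: visible units (one time-step vector), n_h: RBM hidden units, n_u: RNN hidden units.
import numpy as np

rng = np.random.default_rng(0)
n_v, n_h, n_u = 176, 150, 100        # illustrative sizes (176 = 2 * 88 piano keys)

params = {
    "W":   0.01 * rng.standard_normal((n_v, n_h)),   # RBM visible-hidden weights
    "Wuv": 0.01 * rng.standard_normal((n_u, n_v)),   # RNN-to-RBM visible-bias weights
    "Wuh": 0.01 * rng.standard_normal((n_u, n_h)),   # RNN-to-RBM hidden-bias weights
    "Wvu": 0.01 * rng.standard_normal((n_v, n_u)),   # input-to-RNN weights
    "Wuu": 0.01 * rng.standard_normal((n_u, n_u)),   # RNN recurrent weights
    "bv":  np.zeros(n_v),                            # RBM visible bias
    "bh":  np.zeros(n_h),                            # RBM hidden bias
    "bu":  np.zeros(n_u),                            # RNN hidden bias
}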
S5: train the RNN-RBM neural network, as follows:
S5.1: during the music generation part, the RNN hidden units transmit the mood of the song being played to the RBM; this information adjusts the probability distribution of the RBM and models the way the notes change over the course of the song. The concrete model is shown in Fig. 2.
The RNN-to-RBM weight and bias matrices together with the state of the RNN hidden units u(t-1) determine the bias vectors bv(t) and bh(t) of RBMt:
bv(t) = bv + u(t-1)Wuv,  bh(t) = bh + u(t-1)Wuh.
Some notes are then created with the RBM: v(t) is used to initialize RBMt and a single Gibbs sampling iteration is executed to draw a sample v*(t) from RBMt.
The contrastive divergence estimate of the negative log-likelihood of v(t) with respect to RBMt is computed as the loss.
The RNN then updates its internal representation of the song state with the new information: the state of the RNN hidden units u(t) is determined from v(t), the previous hidden state u(t-1), and the RNN weight and bias matrices as
u(t) = σ(bu + v(t)Wvu + u(t-1)Wuu)
S6: recognize the facial expression using the VGG19 model:
In the training stage, images are randomly cropped to 44*44 and randomly mirrored before training. In the testing stage, each picture is cropped at the upper left, lower left, upper right, lower right and center, and each crop is additionally mirrored, expanding the data 10-fold; the images are processed with the VGG depth model and the Dropout anti-overfitting method and finally classified with Softmax to obtain the expression recognition result.
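By way of illustration only, the test-time ten-crop evaluation could be sketched as follows (PyTorch/torchvision, the untrained VGG19 backbone used as a placeholder, and the 4-class output head are assumptions; in the method itself the network would be the trained expression classifier):

# Hypothetical test-time 10-crop evaluation: four corners + center, each mirrored,
# giving 10 views per face image; predictions are averaged over the crops.
import torch
import torch.nn as nn
from torchvision import models, transforms

ten_crop = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),        # FER2013-style faces are grayscale
    transforms.TenCrop(44),                             # 5 crops of 44*44 plus their mirrors
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.functional.to_tensor(c) for c in crops])),
])

model = models.vgg19(weights=None)                      # placeholder backbone
model.classifier[-1] = nn.Linear(4096, 4)               # 4 emotion classes D1-D4
model.eval()

def predict_emotion(pil_face_image):
    crops = ten_crop(pil_face_image)                    # tensor of shape (10, 3, 44, 44)
    with torch.no_grad():
        probs = torch.softmax(model(crops), dim=1)      # per-crop class probabilities
    return probs.mean(dim=0)                            # averaged proportions of D1-D4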
S7: feed the recognized emotion information into the trained RNN-RBM network:
The expression recognition yields the respective proportions of the 4 moods; according to these weights, a corresponding number of pieces of music is randomly selected from the music dataset and fed into the trained RNN-RBM network to obtain the final music. One of the generated scores is shown in Fig. 4 of the annex.
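By way of illustration only, the weighted selection of seed pieces could be sketched as follows (the per-class draw policy shown here is one reading of the text; generate_music stands for the trained RNN-RBM sampler and is a hypothetical name):

# Hypothetical final step: the four mood proportions weight how many seed pieces are
# drawn from each class before they are fed to the trained RNN-RBM generator.
import numpy as np

def select_seed_music(emotion_probs, music_library, n_pieces=8, rng=None):
    # emotion_probs: proportions of the 4 moods D1..D4 (summing to 1).
    # music_library: dict mapping "D1".."D4" to lists of encoded songs.
    rng = rng or np.random.default_rng()
    seeds = []
    for cls, p in zip(["D1", "D2", "D3", "D4"], emotion_probs):
        k = int(round(float(p) * n_pieces))             # pieces to draw for this mood
        if k > 0 and music_library.get(cls):
            picks = rng.choice(len(music_library[cls]), size=k, replace=True)
            seeds.extend(music_library[cls][i] for i in picks)
    return seeds

# Usage (hypothetical names):
#   probs = predict_emotion(face_image).numpy()
#   seeds = select_seed_music(probs, library)
#   final_song = generate_music(trained_rnn_rbm, seeds)   # RNN-RBM sampling step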

Claims (8)

1. A music generation method based on expression recognition and recurrent neural networks, characterized in that the method comprises the following steps:
S1: acquire music audio data and facial expression data;
S2: classify and label the data;
S3: preprocess the audio data and image data;
S4: initialize the RNN-RBM neural network;
S5: train the RNN-RBM neural network;
S6: recognize the facial expression using VGG19;
S7: feed the recognized emotion information into the trained RNN-RBM network to obtain the final generated music.
2. The music generation method based on expression recognition and recurrent neural networks according to claim 1, characterized in that step S1 comprises the following steps:
S1.1: obtain audio data from the Classical Piano Midi dataset;
S1.2: obtain the FER2013 and CK+ facial expression databases;
S1.3: crawl image data and audio data meeting the corresponding requirements from the internet using a web crawler.
3. The music generation method based on expression recognition and recurrent neural networks according to claim 1 or 2, characterized in that step S2 comprises the following steps:
S2.1: manually label and classify the crawled image data and audio data;
S2.2: split the image data into training data and test data.
4. The music generation method based on expression recognition and recurrent neural networks according to claim 1 or 2, characterized in that step S3 comprises the following steps:
S3.1: convert all audio data into MIDI format and encode each song with the binary matrix in the annex:
where the first n columns encode note-on events and the last n columns encode note-off events, and n is the number of notes to be represented;
S3.2: expand the image dataset according to the format of the FER2013 dataset;
anger and disgust are grouped into class D1, happiness and surprise into class D2, sadness into class D3, and fear and the remaining emotions into class D4.
5. The music generation method based on expression recognition and recurrent neural networks according to claim 1 or 2, characterized in that in step S4 the weight parameters of the RNN-RBM are randomly initialized.
6. The music generation method based on expression recognition and recurrent neural networks according to claim 1 or 2, characterized in that in step S5, during the music generation part, the RNN hidden units transmit the mood of the song being played to the RBM; this information adjusts the probability distribution of the RBM and models the way the notes change over the course of the song; the process is as follows: the RNN hidden units transmit the mood P1 of the song being played to the RBM, and then the RNN-to-RBM weight and bias matrices together with the state of the RNN hidden units u(t-1) determine the bias vectors bv(t) and bh(t) of RBMt: bv(t) = bv + u(t-1)Wuv, bh(t) = bh + u(t-1)Wuh;
some notes are then created with the RBM: v(t) is used to initialize RBMt and a single Gibbs sampling iteration is executed to draw a sample v*(t) from RBMt;
the contrastive divergence estimate of the negative log-likelihood of v(t) with respect to RBMt is computed as the loss;
the RNN updates its internal representation of the song state with the new information: the state of the RNN hidden units u(t) is determined from v(t), the previous hidden state u(t-1), and the RNN weight and bias matrices as
u(t) = σ(bu + v(t)Wvu + u(t-1)Wuu).
7. The music generation method based on expression recognition and recurrent neural networks according to claim 1 or 2, characterized in that step S6 comprises the following steps:
S6.1: apply data augmentation to the face images to increase the amount of training data;
S6.2: recognize the expression using the "VGG depth model" + "Dropout" + "10-crop" + "Softmax classification" method.
8. The music generation method based on expression recognition and recurrent neural networks according to claim 1 or 2, characterized in that in step S7 the expression recognition first yields the respective proportions of the 4 moods; then, according to the corresponding weights, a matching number of pieces of music is randomly selected from the music dataset and fed into the trained RNN-RBM network, and the final music is generated by the recurrent neural network.
CN201910275097.XA 2019-04-08 2019-04-08 A music generation method based on facial expression recognition and recurrent neural networks Pending CN110309349A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910275097.XA CN110309349A (en) 2019-04-08 2019-04-08 A music generation method based on facial expression recognition and recurrent neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910275097.XA CN110309349A (en) 2019-04-08 2019-04-08 A music generation method based on facial expression recognition and recurrent neural networks

Publications (1)

Publication Number Publication Date
CN110309349A true CN110309349A (en) 2019-10-08

Family

ID=68074414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910275097.XA Pending CN110309349A (en) 2019-04-08 2019-04-08 A music generation method based on facial expression recognition and recurrent neural networks

Country Status (1)

Country Link
CN (1) CN110309349A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021196754A (en) * 2020-06-11 2021-12-27 日本電信電話株式会社 Image processing apparatus, image processing method, and image processing program

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080190271A1 (en) * 2007-02-14 2008-08-14 Museami, Inc. Collaborative Music Creation
CN103793718A (en) * 2013-12-11 2014-05-14 台州学院 Deep study-based facial expression recognition method
CN107145326A (en) * 2017-03-28 2017-09-08 浙江大学 A kind of the music automatic playing system and method for collection of being expressed one's feelings based on target face
CN107800793A (en) * 2017-10-27 2018-03-13 江苏大学 Car networking environment down train music active pushing system
CN108734114A (en) * 2018-05-02 2018-11-02 浙江工业大学 A kind of pet recognition methods of combination face harmony line
CN108805094A (en) * 2018-06-19 2018-11-13 合肥工业大学 Data enhancement methods based on artificial face

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080190271A1 (en) * 2007-02-14 2008-08-14 Museami, Inc. Collaborative Music Creation
CN103793718A (en) * 2013-12-11 2014-05-14 台州学院 Deep study-based facial expression recognition method
CN107145326A (en) * 2017-03-28 2017-09-08 浙江大学 A kind of the music automatic playing system and method for collection of being expressed one's feelings based on target face
CN107800793A (en) * 2017-10-27 2018-03-13 江苏大学 Car networking environment down train music active pushing system
CN108734114A (en) * 2018-05-02 2018-11-02 浙江工业大学 A kind of pet recognition methods of combination face harmony line
CN108805094A (en) * 2018-06-19 2018-11-13 合肥工业大学 Data enhancement methods based on artificial face

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NICOLAS BOULANGER-LEWANDOWSKI ET AL.: "Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription", Proc. of the 29th Int. Conf. on Machine Learning *
黎亚雄 et al.: "Research on Speech Recognition Based on the RNN-RBM Language Model", Journal of Computer Research and Development (计算机研究与发展) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021196754A (en) * 2020-06-11 2021-12-27 日本電信電話株式会社 Image processing apparatus, image processing method, and image processing program
JP7335204B2 (en) 2020-06-11 2023-08-29 日本電信電話株式会社 Image processing device, image processing method and image processing program

Similar Documents

Publication Publication Date Title
Cox Hearing, feeling, grasping gestures
Smalley The listening imagination: listening in the electroacoustic era
Todd et al. Frankensteinian methods for evolutionary music composition
CN115066681A (en) Music content generation
CN110085263B (en) Music emotion classification and machine composition method
Ting et al. A novel automatic composition system using evolutionary algorithm and phrase imitation
Canazza et al. Caro 2.0: an interactive system for expressive music rendering
CN110309349A (en) A music generation method based on facial expression recognition and recurrent neural networks
Numao et al. Constructive adaptive user interfaces-composing music based on human feelings
Whalley Software Agents in Music and Sound Art Research/Creative Work: current state and a possible direction
Johnson Exploring sound-space with interactive genetic algorithms
Madison Properties of expressive variability patterns in music performances
Santos et al. Evolutionary computation systems for musical composition
Heidemann Hearing Women's Voices in Popular Song: Analyzing Sound and Identity in Country and Soul
Parncutt Modeling piano performance: Physics and cognition of a virtual pianist
Hashida et al. Rencon: Performance rendering contest for automated music systems
De Prisco et al. An evolutionary composer for real-time background music
Chang et al. Fusing creative operations into evolutionary computation for composition: From a composer’s perspective
Ostermann et al. Evaluating Creativity in Automatic Reactive Accompaniment of Jazz Improvisation.
Le Groux et al. Interactive sonification of the spatial behavior of human and synthetic characters in a mixed-reality environment
Verma Artificial intelligence and music: History and the future perceptive
Smith Musical creativities, spirituality, and playing drum kit in Black Light Bastards
Scirea et al. Mood expression in real-time computer generated music using pure data
Williams Affectively-Driven Algorithmic Composition (AAC)
Gaydon Berio's sequenza XII in performance and context: a contribution to the Australian bassoon repertory synthesizing extended techniques into newly commissioned works.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191008

RJ01 Rejection of invention patent application after publication