CN110309349A - A music generation method based on facial expression recognition and a recurrent neural network - Google Patents
A music generation method based on facial expression recognition and a recurrent neural network
- Publication number
- CN110309349A CN110309349A CN201910275097.XA CN201910275097A CN110309349A CN 110309349 A CN110309349 A CN 110309349A CN 201910275097 A CN201910275097 A CN 201910275097A CN 110309349 A CN110309349 A CN 110309349A
- Authority
- CN
- China
- Prior art keywords
- recognition
- rbm
- music
- rnn
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/635—Filtering based on additional data, e.g. user or group profiles
- G06F16/636—Filtering based on additional data, e.g. user or group profiles by using biological or physiological data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/65—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
Abstract
A music generation method based on expression recognition and a recurrent neural network, comprising the following steps: 1) obtain music audio data and facial expression data; 2) classify and label the data; 3) preprocess the audio and image data; 4) initialize the RNN-RBM neural network; 5) train the RNN-RBM neural network; 6) recognize the facial expression using VGG19 + dropout + 10-crop + softmax; 7) feed the recognized emotion information into the trained RNN-RBM network to generate the final music. The present invention combines facial emotion recognition with AI music generation and can produce music matched to a person's mood, thereby serving the purpose of mood regulation and offering high practical value.
Description
Technical field
The present invention relates to the fields of computer technology and digital music generation, and more particularly to a music generation method based on facial emotion recognition and a recurrent neural network.
Background technique
Music subtly influences people's body and mind. With the development of the internet and cloud music services, music occupies an ever larger share of daily life and quietly regulates people's physical and mental well-being. We often experience music's effect in everyday work: programmers may code more efficiently while listening to music, fitness enthusiasts use music to pace their workouts, and drivers use music to stay attentive while driving. Suitable music in a suitable setting can also greatly relax people: listening to a passionate symphony when depressed can release a low mood, and listening to light music when agitated can soothe restlessness.
However, existing music is created by singers or composers according to their own understanding of music, and may not satisfy listeners' individual needs. In specific settings an unsuitable tune may weaken music's positive effect or even produce a negative one: playing soothing music while driving may deepen a driver's fatigue and raise the traffic accident rate, and playing passionate music to an angry person may inflame that person's mood and provoke rash behavior. Conversely, if a person's mood could be recognized and personalized music generated accordingly, music's positive regulating effect could be maximized. However, no existing technique generates personalized music by analyzing a person's mood.
Summary of the invention
To remedy the inability of existing music to satisfy listeners' individual needs in specific settings, the present invention combines two techniques, facial expression emotion recognition and recurrent-neural-network music generation, and proposes a music generation method based on expression emotion recognition and a recurrent neural network. Facial expression emotion recognition uses the Visual Geometry Group 19-layer convolutional neural network model (VGG19), and music generation uses the Recurrent Neural Network-Restricted Boltzmann Machine (RNN-RBM) algorithm.
The technical solution adopted by the present invention is as follows:
A music generation method combining facial expression emotion recognition and a recurrent neural network, comprising the following steps:
S1: obtain music audio data and facial expression data;
S2: classify and label the data;
S3: preprocess the audio and image data;
S4: initialize the RNN-RBM neural network;
S5: train the RNN-RBM neural network;
S6: recognize the facial expression using VGG19;
S7: feed the recognized emotion information into the trained RNN-RBM network to obtain the final generated music.
Further, step S1 comprises the following steps:
S1.1: obtain audio data from the Classical Piano Midi dataset;
S1.2: obtain the FER2013 and CK+ facial expression databases;
S1.3: crawl image data and audio data matching the required categories from the internet using a web crawler.
Further, step S2 comprises the following steps:
S2.1: manually label and classify the crawled image and audio data;
S2.2: divide the image data into training data and test data.
Step S3 comprises the following steps:
S3.1: convert all audio data to MIDI format and encode each song with the binary matrix in the annex, in which the first n columns encode note-on events and the last n columns encode note-off events; n is the number of notes to be represented.
S3.2: expand the image dataset following the format of the FER2013 dataset; anger and disgust are grouped into class D1, happiness and surprise into class D2, sadness into class D3, and fear and the remaining moods into class D4.
In step S4, the weight parameters of the RNN-RBM are initialized.
In step S5, during music generation the RNN hidden units pass the mood of the song being played to the RBM; this information adjusts the RBM's probability distribution to model how the notes change over the course of the song. The process is as follows: the RNN hidden units pass the mood P1 of the song being played to the RBM, and the RNN-to-RBM weight and bias matrices together with the RNN hidden state u_{t-1} determine the bias vectors of RBM_t (following the standard RNN-RBM formulation):
b_v^t = b_v + u_{t-1}·W_uv,   b_h^t = b_h + u_{t-1}·W_uh.
Some notes are then created with the RBM: v_t initializes RBM_t, and a single Gibbs sampling iteration is executed to draw a sample from RBM_t.
The contrastive-divergence estimate of the negative log-likelihood of v_t with respect to RBM_t is computed with the loss function
loss_t = -log P(v_t).
Finally, the RNN updates its internal representation of the song state with the new information: v_t, the RNN hidden state u_{t-1}, and the RNN weight and bias matrices determine the new hidden state u_t:
u_t = σ(b_u + v_t·W_vu + u_{t-1}·W_uu).
Step S6 comprises the following steps:
S6.1: apply data augmentation to the face images, such as flipping, cropping, and rotation, to increase the amount of training data;
S6.2: recognize the expression using the combination "VGG depth model" + "Dropout" + "ten-crop" + "Softmax classification".
In step S6, data augmentation is first applied to the face images: a random 44×44 crop is taken and the image is randomly mirrored, and the network is then trained. In the test phase each picture is cropped at the top-left, bottom-left, top-right, bottom-right, and center, and each crop is also mirrored, expanding the data tenfold; the crops are processed with the VGG depth model and the Dropout anti-overfitting method and finally classified with Softmax to obtain the current user's expression.
In step S7, music pieces are randomly selected from the music dataset in numbers proportional to the corresponding emotion weights and fed into the trained RNN-RBM network to obtain the final music.
The method of the invention has the following beneficial effects:
(1) the method has broad application prospects, for example fatigue-driving detection and entertainment music generation;
(2) the method combines facial expression recognition with recurrent-neural-network music generation, has low hardware requirements, and can run on an ordinary mobile phone;
(3) the method uses a pre-trained model with a small parameter count and therefore offers good recognition speed.
Detailed description of the invention
Fig. 1 is an explanatory diagram of the song encoding scheme.
Fig. 2 is a schematic diagram of the parameters of the RNN-RBM neural network.
Fig. 3 is a flowchart of the music generation method based on expression recognition and a recurrent neural network.
Fig. 4 is an actually generated music score.
Specific embodiment
The present invention will be further explained below with reference to the attached drawings.
Referring to Fig. 1 to Fig. 4, a music generation method based on expression recognition and a recurrent neural network comprises the following steps:
S1: obtain music audio data and facial expression data;
S2: classify and label the data;
S3: preprocess the audio and image data;
S4: initialize the RNN-RBM neural network;
S5: train the RNN-RBM neural network;
S6: recognize the facial expression using VGG19;
S7: feed the recognized emotion information into the trained RNN-RBM network to obtain the final generated music.
This example performs music generation on self-collected pictures and audio data. The method comprises the following steps:
S1: obtain audio data and image data:
Part of the audio data comes from the Classical Piano Midi dataset. The image data mainly comes from the FER2013 and CK+ facial expression databases.
S2: classify and label the data:
The crawled image and audio data are manually labeled and classified. The music data is divided into four classes: "angry, happy, sad, other". The image data is divided into a training set and a test set at a 4:1 ratio.
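The 4:1 train/test split above can be sketched as follows; this is a minimal illustration, and the shuffling seed and helper name are assumptions of the example, not part of the method:

```python
import numpy as np

# Shuffle the items, then keep the first 80% for training (a 4:1 ratio).
def split_4_to_1(items, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(items))
    cut = int(len(items) * 0.8)
    train = [items[i] for i in idx[:cut]]
    test = [items[i] for i in idx[cut:]]
    return train, test

train, test = split_4_to_1(list(range(100)))
```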
S3: preprocess the audio and image data, as follows:
S3.1: convert all audio data to MIDI format and encode each song with the binary matrix of the Fig. 1 structure in the annex:
The first n columns encode note-on events and the last n columns encode note-off events; n is the number of notes to be represented. In a note-on event, 1 indicates that the note is played and 0 that it is not; in a note-off event, 1 indicates that the note is released, and conversely for 0. In this encoding each row is one time step, and each time step is one training sample. Time is quantized in MIDI ticks, with a default setting of 96 ticks per beat, where a beat is a quarter note; each time step therefore resolves a 1/384 note. This data representation can also be expressed as a piano roll.
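As an illustration of the encoding just described, the sketch below builds such a binary matrix for a toy event list; the 88-note range and the event-tuple format are assumptions of the example, since the text only fixes the 2n-column layout:

```python
import numpy as np

N_NOTES = 88  # assumed piano range; the text only specifies 2n columns

def encode_song(events, n_steps, n_notes=N_NOTES):
    """events: iterable of (time_step, note_index, is_note_on) tuples.

    Returns a binary matrix with one row per time step; the first n_notes
    columns flag note-on events, the last n_notes columns note-off events.
    """
    roll = np.zeros((n_steps, 2 * n_notes), dtype=np.int8)
    for step, note, is_on in events:
        col = note if is_on else n_notes + note
        roll[step, col] = 1
    return roll

# Middle C (index 60) pressed at step 0 and released at step 3:
roll = encode_song([(0, 60, True), (3, 60, False)], n_steps=4)
```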
S3.2: expand the image dataset following the format of the FER2013 dataset. Anger and disgust are grouped into class D1, happiness and surprise into class D2, sadness into class D3, and fear and the remaining moods into class D4.
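The four-way regrouping of FER2013's seven labels can be written as a simple lookup; the label strings follow FER2013's naming, and the function name is illustrative:

```python
# Anger/disgust -> D1, happiness/surprise -> D2, sadness -> D3,
# fear and all remaining moods -> D4, as described above.
GROUPS = {
    "angry": "D1", "disgust": "D1",
    "happy": "D2", "surprise": "D2",
    "sad": "D3",
}

def emotion_class(label):
    return GROUPS.get(label, "D4")
```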
S4: initialize the RNN-RBM neural network:
The weights of the RNN-RBM are randomly initialized with a common initialization function.
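A minimal sketch of such a random initialization over the RNN-RBM's parameter set (RBM weights, RNN-to-RBM matrices, recurrence, and biases); the layer sizes and the Gaussian scale are assumptions, since the patent does not fix them:

```python
import numpy as np

def init_rnn_rbm(n_visible=176, n_hidden=150, n_rnn=100, scale=0.01, seed=0):
    """Randomly initialize all RNN-RBM weight matrices and bias vectors."""
    rng = np.random.default_rng(seed)
    return {
        "W":   rng.normal(0.0, scale, (n_visible, n_hidden)),  # RBM weights
        "Wuh": rng.normal(0.0, scale, (n_rnn, n_hidden)),      # u_{t-1} -> hidden bias
        "Wuv": rng.normal(0.0, scale, (n_rnn, n_visible)),     # u_{t-1} -> visible bias
        "Wvu": rng.normal(0.0, scale, (n_visible, n_rnn)),     # v_t -> RNN
        "Wuu": rng.normal(0.0, scale, (n_rnn, n_rnn)),         # RNN recurrence
        "bv":  np.zeros(n_visible),   # visible bias b_v
        "bh":  np.zeros(n_hidden),    # hidden bias b_h
        "bu":  np.zeros(n_rnn),       # RNN bias b_u
    }

params = init_rnn_rbm()
```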
S5: train the RNN-RBM neural network, as follows:
S5.1: during music generation the RNN hidden units pass the mood of the song being played to the RBM; this information adjusts the RBM's probability distribution to model how the notes change over the course of the song. The concrete model is shown in Fig. 2.
The RNN-to-RBM weight and bias matrices together with the RNN hidden state u_{t-1} determine the bias vectors of RBM_t (following the standard RNN-RBM formulation):
b_v^t = b_v + u_{t-1}·W_uv,   b_h^t = b_h + u_{t-1}·W_uh.
Some notes are then created with the RBM: v_t initializes RBM_t, and a single Gibbs sampling iteration is executed to draw a sample from RBM_t.
The contrastive-divergence estimate of the negative log-likelihood of v_t with respect to RBM_t is computed. The loss function is as follows:
loss_t = -log P(v_t).
The RNN then updates its internal representation of the song state with the new information: v_t, the RNN hidden state u_{t-1}, and the RNN weight and bias matrices determine the new hidden state u_t. The formula is as follows:
u_t = σ(b_u + v_t·W_vu + u_{t-1}·W_uu)
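The per-time-step procedure above (condition the RBM biases on u_{t-1}, run one Gibbs iteration, update the RNN state) can be sketched as follows; the parameter-dictionary layout and the shapes are assumptions of the example, and σ is the logistic sigmoid as in the formula:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_rbm_step(p, u_prev, v_t, rng):
    """One generation step: bias RBM_t with u_{t-1}, Gibbs-sample once,
    then update the RNN hidden state u_t."""
    bv_t = p["bv"] + u_prev @ p["Wuv"]  # RBM_t visible bias
    bh_t = p["bh"] + u_prev @ p["Wuh"]  # RBM_t hidden bias
    # Single Gibbs sampling iteration initialized at v_t:
    h = (rng.random(bh_t.shape) < sigmoid(v_t @ p["W"] + bh_t)).astype(float)
    v_sample = (rng.random(bv_t.shape) < sigmoid(h @ p["W"].T + bv_t)).astype(float)
    # u_t = sigma(b_u + v_t W_vu + u_{t-1} W_uu)
    u_t = sigmoid(p["bu"] + v_t @ p["Wvu"] + u_prev @ p["Wuu"])
    return v_sample, u_t
```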
S6: recognize the facial expression using the VGG19 model:
In the training stage a random 44×44 crop of the image is taken and the image is randomly mirrored before training. In the test stage each picture is cropped at the top-left, bottom-left, top-right, bottom-right, and center, and each crop is also mirrored, expanding the data tenfold; the crops are processed with the VGG depth model and the Dropout anti-overfitting method and finally classified with Softmax to obtain the expression recognition result.
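The ten-crop test-time scheme described above (five 44×44 crops plus their mirrors) can be sketched as follows; the 48×48 input size matches FER2013 and is an assumption of the example:

```python
import numpy as np

def ten_crop(img, size=44):
    """Crop at the four corners and the center, and mirror each crop,
    yielding 10 views of one test image."""
    h, w = img.shape
    offsets = [(0, 0), (0, w - size), (h - size, 0),
               (h - size, w - size), ((h - size) // 2, (w - size) // 2)]
    crops = []
    for top, left in offsets:
        c = img[top:top + size, left:left + size]
        crops.append(c)
        crops.append(c[:, ::-1])  # horizontally mirrored view
    return np.stack(crops)

views = ten_crop(np.arange(48 * 48, dtype=float).reshape(48, 48))
```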
S7: feed the recognized emotion information into the trained RNN-RBM network:
Expression recognition yields the respective proportions of the four moods. Music pieces are then randomly selected from the music dataset in numbers proportional to these weights and fed into the trained RNN-RBM network to obtain the final music. One generated score obtained in this way is shown in Fig. 4 in the annex.
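The selection rule of step S7 can be sketched as follows: draw seed pieces from each emotion class in proportion to the recognized mood weights. The class names, the total count, and the rounding rule are assumptions of the example:

```python
import numpy as np

def select_seeds(pools, weights, total=10, seed=0):
    """pools: {class_name: [song ids]}; weights: mood proportions in the
    same class order. Draw round(w * total) songs per class, capped by
    the pool size."""
    rng = np.random.default_rng(seed)
    chosen = []
    for (name, songs), w in zip(pools.items(), weights):
        k = min(int(round(w * total)), len(songs))
        if k > 0:
            chosen.extend(rng.choice(songs, size=k, replace=False).tolist())
    return chosen

pools = {"D1": ["a1", "a2", "a3"], "D2": ["b1", "b2"],
         "D3": ["c1", "c2"], "D4": ["d1"]}
seeds = select_seeds(pools, [0.5, 0.3, 0.1, 0.1], total=10)
```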
Claims (8)
1. A music generation method based on expression recognition and a recurrent neural network, characterized in that the method comprises the following steps:
S1: obtaining music audio data and facial expression data;
S2: classifying and labeling the data;
S3: preprocessing the audio and image data;
S4: initializing an RNN-RBM neural network;
S5: training the RNN-RBM neural network;
S6: recognizing the facial expression using VGG19;
S7: feeding the recognized emotion information into the trained RNN-RBM network to obtain the final generated music.
2. The music generation method based on expression recognition and a recurrent neural network according to claim 1, characterized in that step S1 comprises the following steps:
S1.1: obtaining audio data from the Classical Piano Midi dataset;
S1.2: obtaining the FER2013 and CK+ facial expression databases;
S1.3: crawling image data and audio data matching the required categories from the internet using a web crawler.
3. The music generation method based on expression recognition and a recurrent neural network according to claim 1 or 2, characterized in that step S2 comprises the following steps:
S2.1: manually labeling and classifying the crawled image and audio data;
S2.2: dividing the image data into training data and test data.
4. The music generation method based on expression recognition and a recurrent neural network according to claim 1 or 2, characterized in that step S3 comprises the following steps:
S3.1: converting all audio data to MIDI format and encoding each song with the binary matrix in the annex, in which the first n columns encode note-on events and the last n columns encode note-off events, n being the number of notes to be represented;
S3.2: expanding the image dataset following the format of the FER2013 dataset; anger and disgust are grouped into class D1, happiness and surprise into class D2, sadness into class D3, and fear and the remaining moods into class D4.
5. The music generation method based on expression recognition and a recurrent neural network according to claim 1 or 2, characterized in that in step S4 the weight parameters of the RNN-RBM are randomly initialized.
6. The music generation method based on expression recognition and a recurrent neural network according to claim 1 or 2, characterized in that in step S5, during music generation, the RNN hidden units pass the mood of the song being played to the RBM, and this information adjusts the RBM's probability distribution to model how the notes change over the course of the song, as follows: the RNN hidden units pass the mood P1 of the song being played to the RBM; the RNN-to-RBM weight and bias matrices together with the RNN hidden state u_{t-1} then determine the bias vectors of RBM_t:
b_v^t = b_v + u_{t-1}·W_uv,   b_h^t = b_h + u_{t-1}·W_uh;
some notes are then created with the RBM: v_t initializes RBM_t, and a single Gibbs sampling iteration is executed to draw a sample from RBM_t;
the contrastive-divergence estimate of the negative log-likelihood of v_t with respect to RBM_t is computed with the loss function
loss_t = -log P(v_t);
the RNN then updates its internal representation of the song state with the new information: v_t, the RNN hidden state u_{t-1}, and the RNN weight and bias matrices determine the new hidden state u_t:
u_t = σ(b_u + v_t·W_vu + u_{t-1}·W_uu).
7. The music generation method based on expression recognition and a recurrent neural network according to claim 1 or 2, characterized in that step S6 comprises the following steps:
S6.1: applying data augmentation to the face images to increase the amount of training data;
S6.2: recognizing the expression using the combination "VGG depth model" + "Dropout" + "ten-crop" + "Softmax classification".
8. The music generation method based on expression recognition and a recurrent neural network according to claim 1 or 2, characterized in that in step S7 expression recognition first yields the respective proportions of the four moods; music pieces are then randomly selected from the music dataset in numbers proportional to these weights and fed into the trained RNN-RBM network, which finally produces the generated music.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910275097.XA CN110309349A (en) | 2019-04-08 | 2019-04-08 | A kind of music generating method based on human facial expression recognition and Recognition with Recurrent Neural Network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110309349A true CN110309349A (en) | 2019-10-08 |
Family
ID=68074414
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910275097.XA Pending CN110309349A (en) | 2019-04-08 | 2019-04-08 | A kind of music generating method based on human facial expression recognition and Recognition with Recurrent Neural Network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110309349A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080190271A1 (en) * | 2007-02-14 | 2008-08-14 | Museami, Inc. | Collaborative Music Creation |
CN103793718A (en) * | 2013-12-11 | 2014-05-14 | 台州学院 | Deep study-based facial expression recognition method |
CN107145326A (en) * | 2017-03-28 | 2017-09-08 | 浙江大学 | A kind of the music automatic playing system and method for collection of being expressed one's feelings based on target face |
CN107800793A (en) * | 2017-10-27 | 2018-03-13 | 江苏大学 | Car networking environment down train music active pushing system |
CN108734114A (en) * | 2018-05-02 | 2018-11-02 | 浙江工业大学 | A kind of pet recognition methods of combination face harmony line |
CN108805094A (en) * | 2018-06-19 | 2018-11-13 | 合肥工业大学 | Data enhancement methods based on artificial face |
- 2019-04-08: application CN201910275097.XA filed in CN, published as CN110309349A (en), status Pending
Non-Patent Citations (2)
Title |
---|
Nicolas Boulanger-Lewandowski et al.: "Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription", Proc. of the 29th Int. Conf. on Machine Learning |
Li Yaxiong et al.: "Speech recognition research based on the RNN-RBM language model", Journal of Computer Research and Development |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2021196754A (en) * | 2020-06-11 | 2021-12-27 | 日本電信電話株式会社 | Image processing apparatus, image processing method, and image processing program |
JP7335204B2 (en) | 2020-06-11 | 2023-08-29 | 日本電信電話株式会社 | Image processing device, image processing method and image processing program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191008 ||