CN110309349A - A music generation method based on facial expression recognition and recurrent neural networks - Google Patents

A music generation method based on facial expression recognition and recurrent neural networks

Info

Publication number
CN110309349A
CN110309349A CN201910275097.XA
Authority
CN
China
Prior art keywords
recognition
rbm
music
rnn
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910275097.XA
Other languages
Chinese (zh)
Inventor
傅晨波
夏镒楠
李一帆
岳昕晨
宣琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201910275097.XA priority Critical patent/CN110309349A/en
Publication of CN110309349A publication Critical patent/CN110309349A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/635Filtering based on additional data, e.g. user or group profiles
    • G06F16/636Filtering based on additional data, e.g. user or group profiles by using biological or physiological data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Abstract

A music generation method based on expression recognition and recurrent neural networks, comprising the following steps: 1) acquire music audio data and facial expression data; 2) classify and label the data; 3) preprocess the audio data and image data; 4) initialize the RNN-RBM neural network; 5) train the RNN-RBM neural network; 6) recognize the facial expression using VGG19 + dropout + 10-crop + softmax; 7) feed the recognized emotion information into the trained RNN-RBM network to obtain the final generated music. The invention combines facial emotion recognition with AI music generation and can produce music according to a person's mood, thereby achieving the goal of mood regulation; it has high practical application value.

Description

A music generation method based on facial expression recognition and recurrent neural networks
Technical field
The present invention relates to the fields of computer technology and digital music generation, and more particularly to a music generation method based on facial expression emotion recognition and recurrent neural networks.
Background technique
Music has a subtle influence on people's body and mind. With the development of the internet and cloud music services, music occupies an ever larger share of people's daily lives and quietly regulates their physical and mental well-being. The effect of music can be felt deeply in everyday life and work: programmers may code more efficiently while listening to music, fitness enthusiasts habitually use music to pace their workouts, and drivers use music to stay attentive at the wheel. Listening to suitable music on the right occasion can also greatly relax people; for example, listening to a passionate symphony when depressed can release low spirits, and listening to light music when agitated can soothe irritability.
However, existing music is created by singers or composers according to their own understanding of music, and may not satisfy the individual needs of the listener. On specific occasions an inappropriate tune may weaken the positive effect of music or even produce a negative effect: playing soothing music while driving may deepen the driver's fatigue and increase the accident rate, and playing passionate music when someone is angry may further stimulate their mood and provoke rash behavior. Conversely, if the mood of the person can be recognized and personalized music generated accordingly, the positive, mood-regulating effect of music can be maximized. However, no existing technology generates personalized music by analyzing a person's mood.
Summary of the invention
To remedy the defect that existing music cannot satisfy listeners' individual needs on specific occasions, the present invention combines two techniques, facial expression emotion recognition and recurrent-neural-network music generation, and proposes a music generation method based on expression emotion recognition and recurrent neural networks. Facial expression emotion recognition uses the Visual Geometry Group-19 convolutional neural network model (VGG19 for short), and music generation uses the Recurrent Neural Network-Restricted Boltzmann Machine (RNN-RBM for short) algorithm.
The technical solution adopted by the present invention is as follows:
A music generation method combining facial expression emotion recognition and recurrent neural networks, comprising the following steps:
S1: acquire music audio data and facial expression data;
S2: classify and label the data;
S3: preprocess the audio data and image data;
S4: initialize the RNN-RBM neural network;
S5: train the RNN-RBM neural network;
S6: recognize the facial expression using VGG19;
S7: feed the recognized emotion information into the trained RNN-RBM network to obtain the final generated music.
Further, step S1 comprises the following steps:
S1.1: obtain audio data from the Classical Piano Midi dataset;
S1.2: obtain the FER2013 and CK+ facial expression databases;
S1.3: crawl image data and audio data meeting the corresponding requirements from the internet using a web crawler.
Further, step S2 comprises the following steps:
S2.1: manually label and classify the crawled image data and audio data;
S2.2: split the image data into training data and test data.
Step S3 comprises the following steps:
S3.1: convert all audio data into MIDI format and encode each song with the binary matrix given in the annex:
where the first n columns encode note-on events and the last n columns encode note-off events; n is the number of notes to be represented;
S3.2: expand the image dataset according to the format of the FER2013 dataset;
anger and disgust are grouped into class D1, happiness and surprise into class D2, sadness into class D3, and fear and the remaining emotions into class D4.
In step S4, the weight parameters of the RNN-RBM are initialized.
In step S5, during the music generation part, the RNN hidden units transmit the mood of the song being played to the RBM; this information adjusts the probability distribution of the RBM and models the way the notes change over the course of the song. The process is as follows: the RNN hidden units transmit the mood P1 of the song being played to the RBM, and then the RNN-to-RBM weight and bias matrices together with the state of the RNN hidden units u(t-1) determine the bias vectors bv(t) and bh(t) of RBMt:
bv(t) = bv + u(t-1)Wuv,  bh(t) = bh + u(t-1)Wuh.
Some notes are then created with the RBM: v(t) is used to initialize RBMt and a single Gibbs sampling iteration is executed to draw a sample v*(t) from RBMt.
The contrastive divergence estimate of the negative log-likelihood of v(t) with respect to RBMt is computed as the loss.
The RNN then updates its internal representation of the song state with the new information: the state of the RNN hidden units u(t) is determined from v(t), the previous hidden state u(t-1), and the RNN weight and bias matrices as
u(t) = σ(bu + v(t)Wvu + u(t-1)Wuu).
Step S6 comprises the following steps:
S6.1: apply data augmentation, such as flipping, cropping and rotation, to the face images to increase the amount of training data;
S6.2: recognize the expression using the "VGG depth model" + "Dropout" + "10-crop" + "Softmax classification" method.
In step S6, data augmentation is first applied to the face images: images are randomly cropped to 44*44 and randomly mirrored before training. In the testing phase, each picture is cropped at the upper left, lower left, upper right, lower right and center, and each crop is additionally mirrored, expanding the data 10-fold; the images are processed with the VGG depth model and the Dropout anti-overfitting method and finally classified with Softmax to obtain the expression of the current user.
In step S7, a corresponding number of pieces of music is randomly selected from the music dataset according to the corresponding weights and fed into the trained RNN-RBM network to obtain the final music.
The method of the invention has the following beneficial effects:
(1) The method has great application prospects, for example fatigue-driving detection and entertaining music generation.
(2) The method combines facial expression recognition with recurrent-neural-network music generation; its hardware requirements are low and an ordinary mobile phone suffices.
(3) The method uses a pre-trained model with a relatively small number of parameters and therefore offers good recognition speed and user experience.
Detailed description of the invention
Fig. 1 is an explanatory diagram of the song encoding scheme.
Fig. 2 is a schematic diagram of the parameters of the RNN-RBM neural network.
Fig. 3 is a flow chart of the music generation method based on expression recognition and recurrent neural networks.
Fig. 4 is a musical score actually generated by the method.
Specific embodiment
The present invention will be further explained below with reference to the attached drawings.
Referring to Fig. 1 to Fig. 4, a music generation method based on expression recognition and recurrent neural networks comprises the following steps:
S1: acquire music audio data and facial expression data;
S2: classify and label the data;
S3: preprocess the audio data and image data;
S4: initialize the RNN-RBM neural network;
S5: train the RNN-RBM neural network;
S6: recognize the facial expression using VGG19;
S7: feed the recognized emotion information into the trained RNN-RBM network to obtain the final generated music.
In this example, music is generated from self-collected pictures and audio data; the method comprises the following steps:
S1: acquire audio data and image data:
Part of the audio data comes from the Classical Piano Midi dataset; the image data comes mainly from the FER2013 and CK+ facial expression databases.
S2: classify and label the data:
The crawled image data and audio data are manually labeled and classified, and the music data is divided into 4 classes: "angry, happy, sad, other". The image data is split into a training set and a test set at a ratio of 4:1.
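By way of illustration only, the 4:1 split could be performed as follows (the file list, the labels, and the use of scikit-learn are assumptions not specified in this description):

# Hypothetical sketch of the 4:1 training/test split described above.
# The image paths and labels are placeholders; scikit-learn is an assumed dependency.
from sklearn.model_selection import train_test_split

image_paths = [f"faces/img_{i}.png" for i in range(1000)]   # placeholder file list
labels = [i % 4 for i in range(1000)]                        # placeholder class ids 0..3

train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.2, stratify=labels, random_state=42
)  # test_size=0.2 gives the 4:1 train/test ratio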
S3: preprocess the audio data and image data, as follows:
S3.1: convert all audio data into MIDI format and encode each song with the binary matrix structured as in Fig. 1 of the annex:
where the first n columns encode note-on events and the last n columns encode note-off events; n is the number of notes to be represented. In a note-on event, 1 indicates that the note is struck and 0 that it is not; in a note-off event, 1 indicates that the note is released, and conversely 0 that it is not. In this encoding every row represents one time step, and each time step is one training sample. Time is quantized in MIDI ticks, with the default setting of 96 ticks per beat, one beat being a quarter note; at 96 ticks per beat each time step resolves a 1/384 note. This data representation can also be viewed as a piano roll.
S3.2: the image data is expanded according to the format of the FER2013 dataset. Anger and disgust are grouped into class D1, happiness and surprise into class D2, sadness into class D3, and fear and the remaining emotions into class D4.
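A minimal mapping of the FER2013 labels onto these four classes might look as follows (the integer label order 0=angry, 1=disgust, 2=fear, 3=happy, 4=sad, 5=surprise, 6=neutral is FER2013's standard convention, assumed here):

# Grouping of the FER2013 emotion labels into the four classes D1-D4 described above.
FER2013_TO_CLASS = {
    0: "D1",  # angry
    1: "D1",  # disgust
    3: "D2",  # happy
    5: "D2",  # surprise
    4: "D3",  # sad
    2: "D4",  # fear
    6: "D4",  # neutral (remaining emotions)
}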
S4: initialize the RNN-RBM neural network:
The weights of the RNN-RBM are randomly initialized with a common initialization function;
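By way of illustration only, this random initialization could be written as below, using the parameter names that appear in the formulas of S5 (numpy, the layer sizes, and the 0.01 scale are assumptions):

# Hypothetical random initialization of the RNN-RBM parameters.
# n_v: visible units (one time-step vector), n_h: RBM hidden units, n_u: RNN hidden units.
import numpy as np

rng = np.random.default_rng(0)
n_v, n_h, n_u = 176, 150, 100        # illustrative sizes (176 = 2 * 88 piano keys)

params = {
    "W":   0.01 * rng.standard_normal((n_v, n_h)),   # RBM visible-hidden weights
    "Wuv": 0.01 * rng.standard_normal((n_u, n_v)),   # RNN-to-RBM visible-bias weights
    "Wuh": 0.01 * rng.standard_normal((n_u, n_h)),   # RNN-to-RBM hidden-bias weights
    "Wvu": 0.01 * rng.standard_normal((n_v, n_u)),   # input-to-RNN weights
    "Wuu": 0.01 * rng.standard_normal((n_u, n_u)),   # RNN recurrent weights
    "bv":  np.zeros(n_v),                            # RBM visible bias
    "bh":  np.zeros(n_h),                            # RBM hidden bias
    "bu":  np.zeros(n_u),                            # RNN hidden bias
}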
S5: train the RNN-RBM neural network, as follows:
S5.1: during the music generation part, the RNN hidden units transmit the mood of the song being played to the RBM; this information adjusts the probability distribution of the RBM and models the way the notes change over the course of the song. The concrete model is shown in Fig. 2.
The RNN-to-RBM weight and bias matrices together with the state of the RNN hidden units u(t-1) determine the bias vectors bv(t) and bh(t) of RBMt:
bv(t) = bv + u(t-1)Wuv,  bh(t) = bh + u(t-1)Wuh.
Some notes are then created with the RBM: v(t) is used to initialize RBMt and a single Gibbs sampling iteration is executed to draw a sample v*(t) from RBMt.
The contrastive divergence estimate of the negative log-likelihood of v(t) with respect to RBMt is computed as the loss.
The RNN then updates its internal representation of the song state with the new information: the state of the RNN hidden units u(t) is determined from v(t), the previous hidden state u(t-1), and the RNN weight and bias matrices as
u(t) = σ(bu + v(t)Wvu + u(t-1)Wuu)
S6: recognize the facial expression using the VGG19 model:
In the training stage, images are randomly cropped to 44*44 and randomly mirrored before training. In the testing stage, each picture is cropped at the upper left, lower left, upper right, lower right and center, and each crop is additionally mirrored, expanding the data 10-fold; the images are processed with the VGG depth model and the Dropout anti-overfitting method and finally classified with Softmax to obtain the expression recognition result.
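By way of illustration only, the test-time ten-crop evaluation could be sketched as follows (PyTorch/torchvision, the untrained VGG19 backbone used as a placeholder, and the 4-class output head are assumptions; in the method itself the network would be the trained expression classifier):

# Hypothetical test-time 10-crop evaluation: four corners + center, each mirrored,
# giving 10 views per face image; predictions are averaged over the crops.
import torch
import torch.nn as nn
from torchvision import models, transforms

ten_crop = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),        # FER2013-style faces are grayscale
    transforms.TenCrop(44),                             # 5 crops of 44*44 plus their mirrors
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.functional.to_tensor(c) for c in crops])),
])

model = models.vgg19(weights=None)                      # placeholder backbone
model.classifier[-1] = nn.Linear(4096, 4)               # 4 emotion classes D1-D4
model.eval()

def predict_emotion(pil_face_image):
    crops = ten_crop(pil_face_image)                    # tensor of shape (10, 3, 44, 44)
    with torch.no_grad():
        probs = torch.softmax(model(crops), dim=1)      # per-crop class probabilities
    return probs.mean(dim=0)                            # averaged proportions of D1-D4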
S7: feed the recognized emotion information into the trained RNN-RBM network:
The expression recognition yields the respective proportions of the 4 moods; according to these weights, a corresponding number of pieces of music is randomly selected from the music dataset and fed into the trained RNN-RBM network to obtain the final music. One of the generated scores is shown in Fig. 4 of the annex.
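By way of illustration only, the weighted selection of seed pieces could be sketched as follows (the per-class draw policy shown here is one reading of the text; generate_music stands for the trained RNN-RBM sampler and is a hypothetical name):

# Hypothetical final step: the four mood proportions weight how many seed pieces are
# drawn from each class before they are fed to the trained RNN-RBM generator.
import numpy as np

def select_seed_music(emotion_probs, music_library, n_pieces=8, rng=None):
    # emotion_probs: proportions of the 4 moods D1..D4 (summing to 1).
    # music_library: dict mapping "D1".."D4" to lists of encoded songs.
    rng = rng or np.random.default_rng()
    seeds = []
    for cls, p in zip(["D1", "D2", "D3", "D4"], emotion_probs):
        k = int(round(float(p) * n_pieces))             # pieces to draw for this mood
        if k > 0 and music_library.get(cls):
            picks = rng.choice(len(music_library[cls]), size=k, replace=True)
            seeds.extend(music_library[cls][i] for i in picks)
    return seeds

# Usage (hypothetical names):
#   probs = predict_emotion(face_image).numpy()
#   seeds = select_seed_music(probs, library)
#   final_song = generate_music(trained_rnn_rbm, seeds)   # RNN-RBM sampling step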

Claims (8)

1. A music generation method based on expression recognition and recurrent neural networks, characterized in that the method comprises the following steps:
S1: acquire music audio data and facial expression data;
S2: classify and label the data;
S3: preprocess the audio data and image data;
S4: initialize the RNN-RBM neural network;
S5: train the RNN-RBM neural network;
S6: recognize the facial expression using VGG19;
S7: feed the recognized emotion information into the trained RNN-RBM network to obtain the final generated music.
2. The music generation method based on expression recognition and recurrent neural networks according to claim 1, characterized in that step S1 comprises the following steps:
S1.1: obtain audio data from the Classical Piano Midi dataset;
S1.2: obtain the FER2013 and CK+ facial expression databases;
S1.3: crawl image data and audio data meeting the corresponding requirements from the internet using a web crawler.
3. The music generation method based on expression recognition and recurrent neural networks according to claim 1 or 2, characterized in that step S2 comprises the following steps:
S2.1: manually label and classify the crawled image data and audio data;
S2.2: split the image data into training data and test data.
4. The music generation method based on expression recognition and recurrent neural networks according to claim 1 or 2, characterized in that step S3 comprises the following steps:
S3.1: convert all audio data into MIDI format and encode each song with the binary matrix in the annex:
where the first n columns encode note-on events and the last n columns encode note-off events, and n is the number of notes to be represented;
S3.2: expand the image dataset according to the format of the FER2013 dataset;
anger and disgust are grouped into class D1, happiness and surprise into class D2, sadness into class D3, and fear and the remaining emotions into class D4.
5. The music generation method based on expression recognition and recurrent neural networks according to claim 1 or 2, characterized in that in step S4 the weight parameters of the RNN-RBM are randomly initialized.
6. The music generation method based on expression recognition and recurrent neural networks according to claim 1 or 2, characterized in that in step S5, during the music generation part, the RNN hidden units transmit the mood of the song being played to the RBM; this information adjusts the probability distribution of the RBM and models the way the notes change over the course of the song; the process is as follows: the RNN hidden units transmit the mood P1 of the song being played to the RBM, and then the RNN-to-RBM weight and bias matrices together with the state of the RNN hidden units u(t-1) determine the bias vectors bv(t) and bh(t) of RBMt: bv(t) = bv + u(t-1)Wuv, bh(t) = bh + u(t-1)Wuh;
some notes are then created with the RBM: v(t) is used to initialize RBMt and a single Gibbs sampling iteration is executed to draw a sample v*(t) from RBMt;
the contrastive divergence estimate of the negative log-likelihood of v(t) with respect to RBMt is computed as the loss;
the RNN updates its internal representation of the song state with the new information: the state of the RNN hidden units u(t) is determined from v(t), the previous hidden state u(t-1), and the RNN weight and bias matrices as
u(t) = σ(bu + v(t)Wvu + u(t-1)Wuu).
7. The music generation method based on expression recognition and recurrent neural networks according to claim 1 or 2, characterized in that step S6 comprises the following steps:
S6.1: apply data augmentation to the face images to increase the amount of training data;
S6.2: recognize the expression using the "VGG depth model" + "Dropout" + "10-crop" + "Softmax classification" method.
8. The music generation method based on expression recognition and recurrent neural networks according to claim 1 or 2, characterized in that in step S7 the expression recognition first yields the respective proportions of the 4 moods; then, according to the corresponding weights, a matching number of pieces of music is randomly selected from the music dataset and fed into the trained RNN-RBM network, and the final music is generated by the recurrent neural network.
CN201910275097.XA 2019-04-08 2019-04-08 A music generation method based on facial expression recognition and recurrent neural networks Pending CN110309349A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910275097.XA CN110309349A (en) 2019-04-08 2019-04-08 A music generation method based on facial expression recognition and recurrent neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910275097.XA CN110309349A (en) 2019-04-08 2019-04-08 A music generation method based on facial expression recognition and recurrent neural networks

Publications (1)

Publication Number Publication Date
CN110309349A true CN110309349A (en) 2019-10-08

Family

ID=68074414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910275097.XA Pending CN110309349A (en) 2019-04-08 2019-04-08 A music generation method based on facial expression recognition and recurrent neural networks

Country Status (1)

Country Link
CN (1) CN110309349A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021196754A (en) * 2020-06-11 2021-12-27 日本電信電話株式会社 Image processing apparatus, image processing method, and image processing program

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080190271A1 (en) * 2007-02-14 2008-08-14 Museami, Inc. Collaborative Music Creation
CN103793718A (en) * 2013-12-11 2014-05-14 台州学院 Deep study-based facial expression recognition method
CN107145326A (en) * 2017-03-28 2017-09-08 浙江大学 A kind of the music automatic playing system and method for collection of being expressed one's feelings based on target face
CN107800793A (en) * 2017-10-27 2018-03-13 江苏大学 Car networking environment down train music active pushing system
CN108734114A (en) * 2018-05-02 2018-11-02 浙江工业大学 A kind of pet recognition methods of combination face harmony line
CN108805094A (en) * 2018-06-19 2018-11-13 合肥工业大学 Data enhancement methods based on artificial face

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080190271A1 (en) * 2007-02-14 2008-08-14 Museami, Inc. Collaborative Music Creation
CN103793718A (en) * 2013-12-11 2014-05-14 台州学院 Deep study-based facial expression recognition method
CN107145326A (en) * 2017-03-28 2017-09-08 浙江大学 A kind of the music automatic playing system and method for collection of being expressed one's feelings based on target face
CN107800793A (en) * 2017-10-27 2018-03-13 江苏大学 Car networking environment down train music active pushing system
CN108734114A (en) * 2018-05-02 2018-11-02 浙江工业大学 A kind of pet recognition methods of combination face harmony line
CN108805094A (en) * 2018-06-19 2018-11-13 合肥工业大学 Data enhancement methods based on artificial face

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NICOLAS BOULANGER-LEWANDOWSKI ET AL.: "Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription", Proc. of the 29th Int. Conf. on Machine Learning *
黎亚雄 et al.: "Research on Speech Recognition Based on the RNN-RBM Language Model", Journal of Computer Research and Development (计算机研究与发展) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021196754A (en) * 2020-06-11 2021-12-27 日本電信電話株式会社 Image processing apparatus, image processing method, and image processing program
JP7335204B2 (en) 2020-06-11 2023-08-29 日本電信電話株式会社 Image processing device, image processing method and image processing program

Similar Documents

Publication Publication Date Title
Cox Hearing, feeling, grasping gestures
Smalley The listening imagination: listening in the electroacoustic era
Todd et al. Frankensteinian methods for evolutionary music composition
CN115066681A (en) Music content generation
CN110085263B (en) Music emotion classification and machine composition method
Ting et al. A novel automatic composition system using evolutionary algorithm and phrase imitation
Canazza et al. Caro 2.0: an interactive system for expressive music rendering
CN110309349A (en) A music generation method based on facial expression recognition and recurrent neural networks
Numao et al. Constructive adaptive user interfaces-composing music based on human feelings
Whalley Software Agents in Music and Sound Art Research/Creative Work: current state and a possible direction
Johnson Exploring sound-space with interactive genetic algorithms
Madison Properties of expressive variability patterns in music performances
Santos et al. Evolutionary computation systems for musical composition
Heidemann Hearing Women's Voices in Popular Song: Analyzing Sound and Identity in Country and Soul
Parncutt Modeling piano performance: Physics and cognition of a virtual pianist
Hashida et al. Rencon: Performance rendering contest for automated music systems
De Prisco et al. An evolutionary composer for real-time background music
Chang et al. Fusing creative operations into evolutionary computation for composition: From a composer’s perspective
Ostermann et al. Evaluating Creativity in Automatic Reactive Accompaniment of Jazz Improvisation.
Le Groux et al. Interactive sonification of the spatial behavior of human and synthetic characters in a mixed-reality environment
Verma Artificial intelligence and music: History and the future perceptive
Smith Musical creativities, spirituality, and playing drum kit in Black Light Bastards
Scirea et al. Mood expression in real-time computer generated music using pure data
Williams Affectively-Driven Algorithmic Composition (AAC)
Gaydon Berio's sequenza XII in performance and context: a contribution to the Australian bassoon repertory synthesizing extended techniques into newly commissioned works.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191008

RJ01 Rejection of invention patent application after publication