WO2024117973A1 - Algorithmic music generation system for emotion mediation and method - Google Patents

Algorithmic music generation system for emotion mediation and method

Info

Publication number
WO2024117973A1
Authority
WO
WIPO (PCT)
Prior art keywords
emotive
pitch
level
arousal level
valence
Application number
PCT/SG2023/050789
Other languages
English (en)
Inventor
Kathleen Rose AGRES
Cliff TAN
Phoebe CHUA
Adyasha Dash
Original Assignee
National University Of Singapore
Application filed by National University Of Singapore filed Critical National University Of Singapore
Publication of WO2024117973A1


Definitions

  • the present invention relates broadly, but not exclusively, to an algorithmic music generation system for emotion mediation and method.
  • Affective music generation systems that can create pieces that evoke and/or reflect (i.e., sonify) target emotions in listeners are of particular interest, in no small part due to commercial use cases such as soundtrack design and therapeutic applications.
  • rule-based approaches are heavily reliant on hand-designed functions to map affective signals to musical parameters.
  • Rule-based approaches can sidestep the challenges associated with learning-based approaches by incorporating knowledge of how affective states map to musical parameters, as well as typical expectations regarding harmonic, rhythmic and temporal structure.
  • Developing a rule-based affective music generation system requires first identifying a set of musical parameters and affective states, then designing functions that map parameter values to target states.
  • while numerous empirical studies have been devoted to mapping out the relationship between musical parameters and affective states, the field lacks a single definitive framework for understanding emotions in music.
  • approaches to measuring the affective content of experimental stimuli have been heterogeneous and range from generalizable, well-validated measures to those developed for specific studies.
  • a computer-implemented method for emotion mediation includes determining, by an emotional state identification module of a computing device, an emotive valence level and an emotive arousal level of a user based on a received signal, wherein the received signal represents the user’s emotion; adjusting, by a parameter adjustment module of the computing device, a plurality of musical parameters based on the emotive valence level or the emotive arousal level; and playing, by an audio generation device communicatively coupled to the computing device, a plurality of musical audio signals based on the plurality of adjusted musical parameters to the user to sonify the user’s emotion so as to mediate the user’s emotion.
  • the computer-implemented method may further include generating, by a musical composition module of the computing device, a plurality of digital messages based on the plurality of musical parameters using a rule-based algorithm, wherein each of the plurality of digital messages can be converted to an audio signal by the audio generation device, and converting, by the audio generation device, the plurality of digital messages to the plurality of musical audio signals.
  • each of the plurality of musical parameters may include any one of harmonic parameters, pitch characteristics, rhythmic parameters, timbral parameters, loudness parameters, and instrumentation parameters.
  • the emotive valence level and the emotive arousal level may be represented by a numerical value ranging from 0.0 to 1.0, and each of the plurality of musical parameters may be adjusted based on the emotive valence level and/or the emotive arousal level using a set of predetermined rules.
  • the plurality of musical audio signals may represent one or more notes in an 8-bar theme.
  • the timbral parameters and the loudness parameters may include a velocity range, a velocity variation and instrumentation.
  • a maximum loudness level and a minimum loudness level of the loudness parameters within each bar of the 8-bar section may be pre-defined, and these limits decrease as the emotive arousal level decreases.
  • the harmonic parameters may include a mode.
  • the set of predetermined rules for adjusting the mode may include dividing the emotive valence level into ten regions, each region being defined by a range of the emotive valence level, and determining a current chord and a subsequent chord to be sounded in each region based on a probabilistic chord progression matrix to match an intended emotive valence level.
  • the probabilistic chord progression matrix may define a probability of the current chord and the subsequent chord to be sounded at a respective bar of the 8-bar section and a respective region of the ten regions.
  • the pitch characteristics may include criteria of dissimilarity, pitch motive and pitch register.
  • rhythmic parameters may include criteria of rhythmic roughness, rhythmic pattern, note density and tempo.
  • the plurality of musical audio signals may represent one or more notes in a first 8-bar theme and a second 8-bar theme.
  • the timbral parameters and the loudness parameters may include a velocity and an instrumentation.
  • the harmonic parameters may include a mode.
  • the set of predetermined rules for adjusting the mode may include dividing the emotive valence level into ten regions, each region being defined by a range of the emotive valence level, and determining a current chord and a subsequent chord to be sounded based on a first probabilistic chord progression matrix and a second probabilistic chord progression matrix matching an intended emotive valence level.
  • the first probabilistic chord progression matrix may define a probability of the current chord and the subsequent chord to be sounded at a respective bar of the first 8-bar theme and a respective region of the ten regions
  • the second probabilistic chord progression matrix may define a probability of the current chord and the subsequent chord to be sounded at a respective bar of the second 8-bar theme and the respective region of the ten regions.
  • the pitch characteristics may include criteria of dissimilarity, pitch motive and pitch register.
  • the pitch motive may include any one of: a diatonic step down a scale, a diatonic step up the scale, no pitch motion, and a jump to a randomly selected chord tone; and the pitch register of the note played by the strummed guitar and the plucked guitar may be adjusted by defining an increasing linear relationship between a probability of applying a chord inversion and the emotive valence level, based on p(inv + 1 | inv_0) = valence level.
  • the rhythmic parameters may include criteria of rhythmic roughness, rhythmic pattern, note density and tempo.
  • the computer-implemented method may further include providing, by the user, information of the user’s emotion to an input/output (I/O) device that is communicatively coupled to the computing device, generating, by the I/O device, the signal based on the information, and sending, by the I/O device, the signal to the emotional state identification module.
  • the user may provide the information to the I/O device based on any one of: 1) inputting, on a user interface of the I/O device, a perceived or induced emotion by the user from the plurality of musical audio signals, 2) inputting, on the user interface, instructions to adjust the emotive valence level and/or the emotive arousal level, and 3) inputting, on the user interface, a current emotion of the user.
  • the computer-implemented method may further include receiving, by a Brain-Computer Interface (BCI) device communicatively coupled to the computing device, an electroencephalogram (EEG) signal from the user, and sending, by the BCI device, the EEG signal to the emotional state identification module
  • the computer-implemented method may further include playing, by a media playing device communicatively coupled to the computing device, media to the user, providing, by the user, information of an emotive reaction to the media to an input element of the media playing device, generating, by the media playing device, the signal based on the information received, and sending, by the media playing device, the signal to the emotional state identification module and data of the media being played to the computing device.
  • the user may provide the information to the media playing device based on any one of: 1) inputting, on the input element of the media playing device, a perceived or induced emotion by the user from the playing media, 2) inputting, on the input element of the media playing device, instructions to adjust the emotive valence level and/or the emotive arousal level, and 3) inputting, on the input element of the media playing device, one or more intended emotions to be experienced when playing the media.
  • the computer-implemented method may further include synchronizing, by a synchronization module of the computing device, the signal with the playing media to generate an audio-visual signal stream including the playing media and an audio track synchronized to the playing media and including the musical audio signals corresponding to the signal.
  • an algorithmic music generation system for emotion mediation.
  • the system includes a computing device including an emotional state identification module configured to determine an emotive valence level and an emotive arousal level of a user based on a received signal, the received signal representing the user’s emotion, and a parameter adjustment module configured to adjust a plurality of musical parameters based on the emotive valence level and/or the emotive arousal level.
  • the system further includes an audio generation device communicatively coupled to the computing device. The audio generation device is configured to play a plurality of musical audio signals based on the plurality of adjusted musical parameters to the user to sonify the user’s emotion so as to mediate the user’s emotion.
  • the computing device may further include a musical composition module configured to generate a plurality of digital messages based on the plurality of musical parameters using a probabilistic, rule-based algorithm. Each of the plurality of digital messages may be converted to an audio signal by the audio generation device.
  • each of the plurality of musical parameters may include any one of: harmonic parameters, pitch characteristics, rhythmic parameters, timbral parameters, loudness parameters, and instrumentation parameters.
  • the emotive valence level and the emotive arousal level may each be represented by a numerical value ranging from 0.0 to 1.0.
  • Each of the plurality of musical parameters may be adjusted based on the emotive valence level and the emotive arousal level using a set of predetermined rules.
  • the plurality of musical audio signals may represent one or more notes in an 8-bar theme.
  • the timbral parameters and the loudness parameters may include a velocity range, a velocity variation and instrumentation.
  • a maximum loudness level and a minimum loudness level of the loudness parameters within each bar of the 8-bar section may be pre-defined, and these limits decrease as the emotive arousal level decreases.
  • the harmonic parameters may include a mode.
  • the set of predetermined rules for adjusting the mode may include dividing the emotive valence level into ten regions, each region being defined by a range of the emotive valence level, and determining a current chord and a subsequent chord to be sounded in each region based on a probabilistic chord progression matrix to match an intended emotive valence level.
  • the probabilistic chord progression matrix may define a probability of the current chord and the subsequent chord to be sounded at a respective bar of the 8-bar section and a respective region of the ten regions.
  • the pitch characteristics may include criteria of dissimilarity, pitch motive and pitch register.
  • rhythmic parameters may include criteria of rhythmic roughness, rhythmic pattern, note density and tempo.
  • the plurality of musical audio signals may represent one or more notes in a first 8-bar theme and a second 8-bar theme.
  • the timbral parameters and the loudness parameters may include a velocity and an instrumentation.
  • the harmonic parameters may include a mode.
  • the set of predetermined rules for adjusting the mode may include dividing the emotive valence level into ten regions, each region being defined by a range of the emotive valence level, and determining a current chord and a subsequent chord to be sounded based on a first probabilistic chord progression matrix and a second probabilistic chord progression matrix matching an intended emotive valence level.
  • the first probabilistic chord progression matrix may define a probability of the current chord and the subsequent chord to be sounded at a respective bar of the first 8-bar theme and a respective region of the ten regions
  • the second probabilistic chord progression matrix may define a probability of the current chord and the subsequent chord to be sounded at a respective bar of the second 8-bar theme and the respective region of the ten regions.
  • the pitch characteristics may include criteria of dissimilarity, pitch motive and pitch register.
  • the pitch motive may include any one of: a diatonic step down a scale, a diatonic step up the scale, no pitch motion, and a jump to a randomly selected chord tone; and the pitch register of the note played by the strummed guitar and the plucked guitar may be adjusted by defining an increasing linear relationship between a probability of applying a chord inversion and the emotive valence level, based on p(inv + 1 | inv_0) = valence level.
  • the rhythmic parameters may include criteria of rhythmic roughness, rhythmic pattern, note density and tempo.
  • the system may further include an input/output (I/O) device communicatively coupled to the computing system.
  • the I/O device may be configured to receive information of the user’s emotion provided by the user, generate the signal based on the information, and send the signal to the emotional state identification module.
  • the user may provide the information to the I/O device based on any one of: 1) inputting, on a user interface of the I/O device, a perceived or induced emotion by the user from the plurality of musical audio signals, 2) inputting, on the user interface of the I/O device, instructions to adjust the emotive valence level and/or the emotive arousal level, and 3) inputting, on the user interface, a current emotion of the user.
  • the system may further include a Brain-Computer Interface (BCI) device communicatively coupled to the computing device.
  • the BCI device may be configured to receive an electroencephalogram (EEG) signal from the user, and send the EEG signal to the emotional state identification module.
  • the received signal may be the EEG signal.
  • the system may further include a media playing device communicatively coupled to the computing system.
  • the media playing device may be configured to play media to the user, receive, by an input element of the media playing device, information of an emotive reaction to the media, generate the signal based on the information received, and send the signal to the emotional state identification module and data of the media being played to the computing device.
  • the user may provide the information to the media playing device based on any one of: 1) inputting, on the input element of the media playing device, a perceived or induced emotion by the user from the playing media, 2) inputting, on the input element of the media playing device, instructions to adjust the emotive valence level and/or the emotive arousal level, and 3) inputting, on the input element of the media playing device, one or more intended emotions to be experienced when playing the media.
  • the computing device may further include a synchronization module configured to synchronize the signal with the playing media to generate an audio-visual signal stream including the playing media and an audio track synchronized to the playing media and including the musical audio signals corresponding to the signal.
  • a synchronization module configured to synchronize the signal with the playing media to generate an audio-visual signal stream including the playing media and an audio track synchronized to the playing media and including the musical audio signals corresponding to the signal.
  • Fig. 1 shows a flowchart illustrating an exemplary workflow of an algorithmic music generation system, in accordance with an embodiment.
  • Fig. 2 shows a flowchart illustrating an exemplary workflow of the algorithmic music generation system in a first configuration, in accordance with the embodiment.
  • Fig. 3 shows a schematic diagram of the algorithmic music generation system in the first configuration, in accordance with the embodiment.
  • Fig. 4 shows a flowchart illustrating an exemplary workflow of the algorithmic music generation system in a second configuration, in accordance with the embodiment.
  • Fig. 5 shows a schematic diagram of the algorithmic music generation system in the second configuration, in accordance with the embodiment.
  • Fig. 6 shows a flowchart illustrating an exemplary workflow of the algorithmic music generation system in a third configuration, in accordance with the embodiment.
  • Fig. 7 shows a schematic diagram of the algorithmic music generation system in the third configuration, in accordance with the embodiment.
  • Fig. 8 shows a first processing framework of the algorithmic music generation system, in accordance with the embodiment.
  • Fig. 9 shows a second processing framework of the algorithmic music generation system, in accordance with the embodiment.
  • Fig. 10 shows a schematic diagram of an exemplary computing device used in the algorithmic music generation system, in accordance with the embodiment.
  • Fig. 11 shows a schematic diagram of an exemplary system of the algorithmic music generation system in the first configuration, in accordance with the embodiment.
  • Fig. 12 shows a schematic diagram of an exemplary system of the algorithmic music generation system in the second configuration, in accordance with the embodiment.
  • Fig. 13 shows a schematic diagram of an exemplary system of the algorithmic music generation system in the third configuration, in accordance with the embodiment.
  • Fig. 14 shows an exemplary input element used in the algorithmic music generation system in the first and third configurations, in accordance with the embodiment.
  • Fig. 15 shows (a) a graph with average valence ratings plotted against valence parameter settings and (b) a graph with average arousal ratings plotted against arousal parameter settings.
  • Fig. 16 shows plots of average (interpolated) valence and arousal ratings as a function of valence and arousal parameters.
  • Fig. 17 shows a graph with mean arousal rating plotted against stimulus arousal setting.
  • Fig. 18 shows a graph with mean valence rating plotted against mean valence setting.
  • Fig. 19 shows plots illustrating the change in 9 subjects’ heart rate while listening to a 2-minute excerpt of Calm music generated by the Classical AMGS or Retro-Pop AMGS.
  • Fig. 20 shows a graph with 50 participants’ average heart rate (HR) slope (change in heart rate from the beginning to the end of a 2-minute generated music excerpt) plotted against change in listeners’ self-reported Valence over a 2-minute Calm trial.
  • Fig. 21 shows a graph with 50 participants’ average heart rate (HR) slope plotted against change in listeners’ self-reported Arousal over a 2-minute Happy trial.
  • emotion is represented in the Algorithmic Music Generation System (AMGS) using a circumplex model, in which emotions can be understood as points within a two-dimensional space.
  • the first dimension is emotive arousal, or referred to hereafter as simply “arousal”, which captures the intensity, energy, or “activation” of the emotion, while the second dimension is emotive valence, or referred to hereafter as simply “valence”, which captures the degree of pleasantness.
  • excitement is associated with high arousal and high valence
  • contentment would be associated with low arousal and high valence.
  • the circumplex model has several advantages over alternative measures of emotion.
  • music generated by the AMGS should vary smoothly over the entire space of emotions.
  • using the circumplex model, the AMGS proposed herein is able to flexibly generate music at any combination of arousal and valence (where arousal and valence levels are both represented on a scale from 0.0 to 1.0, forming a 2-dimensional emotion space).
  • the generalizability of the circumplex model also enables the use of previous research, which may have used less common measures of emotion, by interpreting their results in terms of arousal and valence.
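  • As a purely illustrative sketch (not part of the patent text), the circumplex representation can be captured by a small value type holding the two coordinates; the class name and example values below are assumptions for illustration only.
```python
from dataclasses import dataclass

@dataclass
class EmotionState:
    """A point in the circumplex (valence-arousal) emotion space."""
    valence: float  # 0.0 = unpleasant, 1.0 = pleasant
    arousal: float  # 0.0 = low energy, 1.0 = high energy

    def __post_init__(self):
        # Clamp both dimensions to the [0.0, 1.0] range used by the AMGS.
        self.valence = min(max(self.valence, 0.0), 1.0)
        self.arousal = min(max(self.arousal, 0.0), 1.0)

# Example points: excitement (high arousal, high valence) versus
# contentment (low arousal, high valence).
excitement = EmotionState(valence=0.9, arousal=0.9)
contentment = EmotionState(valence=0.8, arousal=0.2)
```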
  • the AMGS in accordance with embodiments of the present invention provides a model that is able to fluidly generate affective music in real time, either based on pre-decided arousal and valence levels (e.g., as a sort of affective playlist for emotion mediation or trajectory through emotion space), or based on the real-time feedback or physiological state of the user (e.g., EEG activity captured from the user and mapped to arousal level and valence level).
  • the AMGS provides a flexible yet powerful way to sonify real-time emotion states of the user, and to influence the emotion states of the user.
  • the AMGS may be used to generate dynamic, affective, and royalty-free music for various media applications (e.g., for images or videos lacking a music soundtrack). Further, the AMGS may be used for health and wellness applications, such as generating affective playlists for emotion mediation.
  • the AMGS may also be integrated into Brain-Computer Interface (BCI) devices (or other systems using biofeedback), to assist the user in achieving a desired emotion state through neuro/biofeedback and affective music listening.
  • the present specification also discloses apparatus for performing the operations of the methods.
  • Such apparatus may be specially constructed for the required purposes, or may comprise a computer or other device selectively activated or reconfigured by a computer program stored in the computer.
  • the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus.
  • Various machines may be used with programs in accordance with the teachings herein.
  • the construction of more specialized apparatus to perform the required method steps may be appropriate.
  • the structure of a conventional computer will appear from the description below.
  • the present specification also implicitly discloses a computer program, in that it would be apparent to the person skilled in the art that the individual steps of the method described herein may be put into effect by computer code.
  • the computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and coding thereof may be used to implement the teachings of the disclosure contained herein.
  • the computer program is not intended to be limited to any particular control flow. There are many other variants of the computer program, which can use different control flows without departing from the spirit or scope of the invention.
  • the computer readable medium may include storage devices such as magnetic or optical disks, memory chips, or other storage devices suitable for interfacing with a computer.
  • the computer readable medium may also include a hardwired medium such as exemplified in the Internet system, or wireless medium such as exemplified in the GSM, GPRS, 3G or 4G mobile telephone systems, as well as other wireless systems such as Bluetooth, ZigBee, Wi-Fi.
  • the computer program when loaded and executed on such a computer effectively results in an apparatus that implements the steps of the preferred method.
  • the present invention may also be implemented as hardware modules. More particularly, in the hardware sense, a module is a functional hardware unit designed for use with other components or modules. For example, a module may be implemented using discrete electronic components, or it can form a portion of an entire electronic circuit such as an Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA). Numerous other possibilities exist. Those skilled in the art will appreciate that the system can also be implemented as a combination of hardware and software modules.
  • Fig. 1 shows a flowchart illustrating an exemplary workflow of a method for emotion mediation in accordance with an embodiment.
  • the method 100 can include the following steps: step 102: determining, by an emotional state identification module of a computing device, an emotive valence level and an emotive arousal level of a user based on a received signal representing a user’s emotion; step 104: adjusting, by a parameter adjustment module of the computing device, a plurality of musical parameters based on the emotive valence level and/or the emotive arousal level; step 106: generating, by a musical composition module of the computing device, a plurality of digital messages based on the plurality of musical parameters using a rule-based algorithm, wherein each of the plurality of digital messages can be converted to an audio signal by an audio generation device that is communicatively coupled to the computing device; step 108: converting, by the audio generation device, the plurality of digital messages to a plurality of musical audio signals; and step 110: playing, by the audio generation device, the plurality of musical audio signals based on the plurality of adjusted musical parameters to the user to sonify the user’s emotion so as to mediate the user’s emotion.
  • the method 200 can include additional method steps before step 102. These steps include step 202: providing, by the user, information of the user’s emotion to an input/output (I/O) device that is communicatively coupled to the computing device; step 204: generating, by the I/O device, the signal based on the information; and step 206: sending, by the I/O device, the signal to the emotional state identification module.
  • the user can provide the information to the I/O device by inputting, on a user interface of the I/O device, a perceived or induced emotion by the user from the plurality of musical audio signals, inputting, on the user interface, instructions to adjust the emotive valence level and/or the emotive arousal level, or inputting, on the user interface, an input indicative of a current emotion of the user.
  • Fig. 3 shows a schematic diagram 300 of the AMGS in the first configuration.
  • a user 310 provides the information of his/her emotion to an input/output (I/O) device 320, which is communicatively coupled to a computing device 330.
  • the I/O device 320 generates a signal based on the information provided and sends the signal to the computing device 330 including an emotional state identification module.
  • the emotional state identification module determines the valence level and the arousal level based on the information.
  • the valence level and the arousal level are fed into a parameter adjustment module 332 which can map the valence level and the arousal level to the plurality of musical parameters so as to adjust the plurality of musical parameters.
  • a plurality of digital messages are generated based on the plurality of musical parameters using a rule-based algorithm 334.
  • the plurality of digital messages can be, for example, Musical Instrument Digital Interface (MIDI) messages.
  • the plurality of digital messages are converted, by an audio generation device 336, into the plurality of musical audio signals corresponding to the received signal, to be played to the user by a speaker 340 or similar device.
  • the audio generation device 336 can be a Digital Audio Workstation (DAW) for converting MIDI messages into sound.
  • the plurality of digital messages can be in multimedia formats other than MIDI without departing from the scope of the present invention.
  • Some such examples are WAV, MP3, AIFF, symbolic music representations such as scores or piano roll notation, or other data formats that can perform similar functions.
  • the audio generation device 336 can be a typical synthesizer or other devices that can perform similar function as the DAW.
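  • As an illustrative sketch only, a MIDI message of the kind described above could be sent to a DAW over a virtual MIDI bus using, for example, the Python mido package; the patent does not prescribe a library, and the port name below is a placeholder that depends on the local virtual-bus configuration.
```python
import mido

# Open a virtual MIDI bus that the DAW listens on. The port name is a
# placeholder; the actual name depends on the local MIDI routing setup.
port = mido.open_output("AMGS Virtual Bus 1")

# Sound middle C (MIDI note 60) at a moderate velocity, then release it.
port.send(mido.Message("note_on", note=60, velocity=80))
# ... wait for the note duration implied by the current tempo ...
port.send(mido.Message("note_off", note=60, velocity=0))
```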
  • the method 400 can include additional steps before step 102.
  • the steps can include step 402: receiving, by a Brain-Computer Interface (BCI) device communicatively coupled to the computing device, an electroencephalogram (EEG) signal from the user and step 404: sending, by the BCI device, the EEG signal to the emotional state identification module.
  • Fig. 5 shows a schematic diagram 500 of the AMGS in the second configuration.
  • a BCI 520 acquires brain activity information of the user’s brain 510 and sends the brain activity information to the computing device 330 including the emotional state identification module.
  • the brain activity information can be in the form of EEG signals, magnetoencephalography (MEG) signals, magnetic resonance imaging (MRI) signals, or any other signals that can provide information about the user’s brain activity.
  • the computing device 330 determines the emotive arousal level and the emotive valence level based on, for example, the EEG signal.
  • the EEG signal provides information on the user’s emotion. Subsequent steps of this configuration follow the corresponding steps of the first configuration as described above.
  • the method 600 can include additional steps before step 102. These steps can include step 602: playing, by a media playing device communicatively coupled to the computing device, media to the user; step 604: providing, by the user, information of an emotive reaction to the media to an input element of the media playing device; step 606: generating, by the media playing device, the signal based on the information received; and step 608: sending, by the media playing device, the signal to the emotional state identification module and data of the media being played to the computing device.
  • the user can provide the information to the media playing device by inputting, on the input element of the media playing device, a perceived or induced emotion by the user from the playing media, inputting, on the input element of the media playing device, instructions to adjust the emotive valence level and/or the emotive arousal level, or inputting, on the input element of the media playing device, one or more intended emotions to be experienced when playing the media.
  • the method 600 can further include a step 610: synchronizing, by a synchronization module of the computing device, the signal with the playing media to generate audio-visual signal stream comprising the playing media and an audio track synchronized to the playing media and comprising the musical audio signals corresponding to the signal.
  • the user 710 is presented the media by a media playing device 720 having a display 722 and a speaker (e.g., headset 724).
  • the user 710 may, at any point of time of the media being played, provide information of the user’s emotive reaction to the media, or one or more emotions the user intends to experience when playing the media, to an input element 730 of the media playing device 720.
  • Upon receiving the information, the media playing device 720 generates the signal based on the information received and sends the signal and the data of the media being played to the computing device including the emotional state identification module.
  • the computing device then synchronizes the signal with the media being played to generate an audio-visual signal stream comprising the playing media and the audio track synchronized to the playing media and comprising the musical audio signals corresponding to the signal.
  • the subsequent steps of this configuration follow the corresponding steps of the first and second configurations described above.
  • the AMGS can be arranged in a fourth configuration which includes the media playing device 720 and the BCI device 520 (not shown in the Figures).
  • the process is similar to the process of the third configuration as described above.
  • in this configuration, instead of having the user provide the information of the user’s emotive reaction to the media to the input element 730 of the media playing device 720, the BCI device 520 extracts the information of the user’s brain activity, i.e., the EEG signal. In this manner, the user’s emotive reaction to the playing media is indicated by a change in the user’s brain activity.
  • the media playing device 720 and the BCI device 520 can be operated as separate units or integrated to operate as a single unit.
  • the exchange between the information provided by or extracted from the user, as described in Figs. 1 to 7, and the emotive effect of the musical audio signals being played can be a continuous process.
  • the information provided by the BCI device 520 to the computing device 330 will dynamically update the musical parameters, therefore continually updating the musical audio signals played to the user by the speaker 340, which will have a continuous emotive effect on the user’s EEG signal acquired by the BCI device 520.
  • the affective qualities of the music produced in accordance with the embodiments described will continuously sonify the user’s emotion so as to mediate the user’s emotional state.
  • Fig. 8 shows a processing framework of the AMGS for generation of classical music.
  • the following sections describe the parameters and design of the AMGS, which is developed both to induce emotion in the users and to provide the users with accurate feedback on their current emotional state, expressed in the form of music (that is, when embedded in a BCI or neurofeedback system, the music reflects the user’s current emotional state).
  • the AMGS takes a sequence of the arousal level and the valence level as input and encodes a corresponding sequence of harmonic, rhythmic and timbral parameters in the form of a MIDI event stream as output.
  • the MIDI event stream is then sent to a digital audio workstation (DAW) over virtual MIDI buses to be translated into sound.
  • Arousal and valence take continuous values within the range [0, 1] and are updated every bar. All of the musical parameters are also updated each bar in accordance with the current arousal and valence levels.
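  • The per-bar control flow described above can be sketched as follows; the callables and their names are hypothetical placeholders for the parameter-adjustment and composition logic, not an implementation taken from the patent.
```python
def run_amgs(emotion_stream, n_bars, map_parameters, render_bar, send_to_daw):
    """Illustrative per-bar loop (all callables are placeholders supplied by the
    surrounding system): arousal and valence in [0, 1] are read once per bar and
    every musical parameter is re-derived from them before the bar is rendered."""
    for bar in range(n_bars):
        valence, arousal = next(emotion_stream)     # e.g., from the I/O device or BCI
        params = map_parameters(valence, arousal)   # harmonic, rhythmic and timbral settings
        midi_events = render_bar(bar, params)       # rule-based composition for this bar
        send_to_daw(midi_events)                    # translated into sound by the DAW
```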
  • the main musical parameters of the Classical AMGS are described as follows.
  • Mode: In the Classical AMGS, mode is controlled through composed probabilistic chord progressions that reach a cadence every 8 bars.
  • the valence level is divided into ten regions, with one probabilistic chord progression composed for each region to match an intended level of valence.
  • the chord progressions are composed in the major mode as it is typically associated with expressions of positive valence.
  • the chord progressions are represented as a list of dictionaries (shown in Table 2 below), with each dictionary containing the following key parameters:
  • Chord: The current chord c, represented as a list with four elements, including: (a) a list of scale indices associated with each chord. All chords are defined with reference to the major scale for ease and flexibility of implementation. For example, the I major chord is built on the first, third and fifth notes of the major scale, with the component intervals being a root, major third and perfect fifth; the associated scale indices would be [1, 3, 5]. As an additional detail, all stimuli are generated with respect to the C major scale beginning on C4.
  • (b) A scalar indicating the desired inversion. For example, 0 indicates that the chord should be played in root position, 1 indicates the first inversion, and so on.
  • Valence: The valence level at which the current chord should be selected.
  • Next chord: The next chord to be sounded, if there is a fixed chord that should be played following the current chord (such sequences are noted by asterisks * in Table 2) to fulfil intended harmonic functions.
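  • For illustration, one entry of such a chord-progression list could be structured as follows; the field names are assumptions, and only the encoding of scale indices, inversion, valence region and optional fixed next chord is taken from the description above.
```python
# Illustrative entry of the probabilistic chord progression list: a I major chord
# in root position, selected in a high-valence region, with a fixed follow-up
# chord (such fixed sequences are marked by asterisks in Table 2).
chord_entry = {
    "chord": {
        "scale_indices": [1, 3, 5],  # root, major third, perfect fifth of the major scale
        "inversion": 0,              # 0 = root position, 1 = first inversion, and so on
    },
    "valence": 0.9,                  # valence level at which this chord should be selected
    "next_chord": [4, 6, 8],         # scale indices of the fixed next chord (IV); None if unconstrained
}
```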
  • the Classical AMGS is developed to generate music with four parts or voices.
  • the bass voice is carried by the string section and plays the root note of the current chord.
  • the principal melody is placed in the soprano voice, which is carried by the clarinet and marimba.
  • Both inner voices are carried by the piano, with the tenor voice simply playing the harmonic progression and the alto voice providing harmonic accompaniment. Instrumentation is explained in more detail in the section on timbral parameters. While there are numerous principles that govern voice leading, or the creation of perceptually independent parts, several straightforward rules that provide sufficient melodic diversity while minimizing unpleasant or artificial-sounding melodic lines are selected for the Classical AMGS.
  • the three parts that are determined through voice leading logic are the tenor, alto and soprano voices.
  • the note sequence is a randomly selected sequence of chord tones.
  • a heuristic based on the concept that pianists tend to voice new chords in a manner that is as similar as possible to the previous chord (in terms of interval and placement on the keyboard) is followed.
  • Dissimilarity between two notesets (noteset, noteset’) is calculated based on Equation (1), and the least dissimilar chord voicing is selected to be played by the first inner voice, as illustrated in the sketch below.
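  • Because Equation (1) itself is not reproduced in this excerpt, the sketch below uses a simple stand-in dissimilarity measure in the same spirit (penalizing large pitch movement between voicings); it is an assumption for illustration, not the patent’s formula.
```python
def dissimilarity(noteset, other):
    """Stand-in for Equation (1): total absolute MIDI-pitch movement between two
    chord voicings of equal size (smaller means more similar)."""
    return sum(abs(a - b) for a, b in zip(sorted(noteset), sorted(other)))

def least_dissimilar(previous, candidates):
    """Pick the candidate voicing closest to the previous chord, mimicking how a
    pianist tends to voice a new chord near the preceding one."""
    return min(candidates, key=lambda voicing: dissimilarity(previous, voicing))
```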
  • chord tones are combined with the step motion rule: if the next note in the melody is of a different pitch, the pitch motion should be by diatonic step. These rules are encoded in the form of transition matrices presented in Tables 4 and 5 below.
  • transition matrices are developed such that at higher levels of arousal, melodies are more likely to consist of scale patterns, mitigating the risk of the music being too dissonant or unpleasant due to the increased tempo and note density.
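  • A transition matrix of this kind can be sampled once per note, as sketched below; the motive labels and probabilities are made-up placeholders, not the values of Tables 4 and 5.
```python
import random

# Illustrative pitch-motive transition matrix: each row is the previous motive and
# maps candidate next motives to probabilities (values here are invented).
PITCH_MOTIVE_MATRIX = {
    "step_up":   {"step_up": 0.5, "step_down": 0.2, "repeat": 0.1, "chord_tone_jump": 0.2},
    "step_down": {"step_up": 0.2, "step_down": 0.5, "repeat": 0.1, "chord_tone_jump": 0.2},
    "repeat":    {"step_up": 0.3, "step_down": 0.3, "repeat": 0.2, "chord_tone_jump": 0.2},
    "chord_tone_jump": {"step_up": 0.35, "step_down": 0.35, "repeat": 0.1, "chord_tone_jump": 0.2},
}

def next_motive(previous_motive):
    row = PITCH_MOTIVE_MATRIX[previous_motive]
    return random.choices(list(row), weights=list(row.values()))[0]
```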
  • Pitch Register: Higher pitches generally tend to be associated with positively-valenced emotions such as excitement and serenity, while lower pitches tend to be associated with negatively-valenced emotions such as sadness.
  • the valence level is divided into ten equally spaced regions and the lower and upper bounds of allowable pitches are tuned by ear. Both the lower and upper bounds of the range of permissible pitches increase gradually as the valence level increases. In the lowest valence region, the lowest allowable pitch is a C1 and the highest allowable pitch is a C5. In the highest valence region, the lowest allowable pitch is a G3 and the highest allowable pitch is a C6.
  • the pitch ranges are similarly stored as a list of dictionaries, with each dictionary containing the following key parameters:
  • Valence: The valence level at which the current pitch range should be selected.
  • MIDI-high: The upper bound of the pitch range in terms of its MIDI pitch number.
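  • The register rule above can be sketched as a mapping from valence to a MIDI pitch range; the end points (C1-C5 at the lowest valence region, G3-C6 at the highest) come from the description, while the linear interpolation across the ten regions is an assumption for illustration.
```python
def pitch_register_bounds(valence):
    """Illustrative valence-to-register mapping. MIDI numbers: C1 = 24, G3 = 55,
    C5 = 72, C6 = 84; intermediate regions are linearly interpolated (assumed)."""
    region = min(int(valence * 10), 9)          # ten equally spaced valence regions
    low = 24 + round(region * (55 - 24) / 9)    # lower bound rises from C1 to G3
    high = 72 + round(region * (84 - 72) / 9)   # upper bound rises from C5 to C6
    return low, high
```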
  • Rhythm: As mentioned above, the Classical AMGS is developed to generate music with four parts or voices.
  • the bass voice (string section) and first tenor voice (piano) have a fixed rhythmic pattern - they are both played on the first beat of each bar.
  • the arousal level is divided into three regions: low (aro < 0.4), moderate (0.4 ≤ aro < 0.75) and high (aro ≥ 0.75), and a set of two equiprobable rhythmic patterns is composed for each region.
  • the rhythmic patterns are stored as a list of dictionaries, with each dictionary containing the following key parameters:
  • Arousal: The range of arousal level within which the current rhythmic pattern should be selected.
  • the range of the emotive valence level or emotive arousal level that define each of the divided regions described above are not fixed. The person skilled in the art can readily see that these values can be adjusted according to specific needs and/or desired outcomes. For example, in the “Rhythm” section described above, the “high” region could be greater than 0.8, greater than 0.65, or include other defining features, instead of being fixed at greater than 0.75.
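  • A rhythmic-pattern lookup of this kind can be sketched as follows; the region boundaries follow the text (and, as noted above, are adjustable), while the patterns themselves are invented placeholders.
```python
import random

# Each entry: (upper bound of the arousal region, equiprobable onset patterns
# expressed as beat positions within a 4-beat bar). Patterns are illustrative only.
RHYTHM_REGIONS = [
    (0.40, [[1], [1, 3]]),                             # low arousal: sparse onsets
    (0.75, [[1, 2, 3], [1, 3, 4]]),                    # moderate arousal
    (1.01, [[1, 1.5, 2, 3, 4], [1, 2, 2.5, 3, 4]]),    # high arousal: dense onsets
]

def pick_rhythm(arousal):
    for upper_bound, patterns in RHYTHM_REGIONS:
        if arousal < upper_bound:
            return random.choice(patterns)
```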
  • rhythmic roughness is incorporated, which is a measure of how irregular the rhythm of a piece of music is. Music with smooth, regular rhythms is typically perceived as higher in valence level. In the Classical AMGS, note density is used as a proxy for rhythmic roughness.
  • Tempo: Tempo, or beats per minute, determines how quickly the notes of each bar are played. Alternatively, it can be thought of as a measure of note duration: the faster the tempo, the shorter the note duration.
  • Velocity Variation: Patterns of velocity variation have affective consequences. For example, large variations are associated with fear, while small changes can indicate pleasantness. It was found that frequent changes in velocity resulted in artificial, disjointed-sounding output. To strike a balance between having sufficient variation in velocity and incorporating those variations in as natural a way as possible, the maximum change in velocity allowed within each bar is limited. In MIDI, velocity is measured on a scale from 0 to 127.
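  • One way to realise this constraint is sketched below; the per-bar step limit and the arousal-dependent ceiling formula are assumptions for illustration, with only the 0-127 MIDI velocity scale taken from the text.
```python
def next_velocity(previous_velocity, target_velocity, arousal, max_step_per_bar=12):
    """Illustrative velocity smoothing: limit how far MIDI velocity (0-127) may
    move within a bar, and lower the overall ceiling as arousal decreases."""
    ceiling = int(60 + 67 * arousal)  # assumed ceiling: quieter output at low arousal
    step = max(-max_step_per_bar,
               min(max_step_per_bar, target_velocity - previous_velocity))
    return max(0, min(ceiling, previous_velocity + step))
```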
  • Fig. 9 shows a processing framework of the AMGS for generation of retro-pop music.
  • the AMGS described in the present disclosure can compose music in a retro-pop genre, which may be particularly well-suited for middle-aged and older adults, and anyone who enjoys popular music from the 60s and 70s.
  • the development of multiple genres of AMGS is advantageous, both to allow greater personalization and individual preference to be taken into account, and because research has shown that different genres can have differing impacts on listeners’ emotional states.
  • the doubling of the violin section by French horn helps to smooth and balance the electronic timbre and make the melodic line prominent.
  • the electric guitar provides harmonic accompaniment by strumming chords, which are recorded on one audio track.
  • a plucked electric guitar is also used to add texture, as it can be difficult to convey a desired level of valence when note activations are sparse.
  • the strummed and plucked electric guitars are recorded on separate audio tracks to facilitate audio mixing.
  • Pop songs are typically composed of discrete sections (e.g., verse, chorus, etc), each of which has a distinctive harmonic and melodic theme.
  • the music loops in a 32-bar cycle composed of two 8-bar sections (labelled the “A” and “B” sections, as per convention) that repeat in an AABB pattern.
  • Mode is controlled through the set of probabilistic chord progressions that make up each section.
  • the Retro-Pop AMGS provides a section-based structure that also includes probabilistic chord progressions.
  • the valence level is divided into ten regions, with one probabilistic chord progression composed for each region to match the intended level of valence. For example, at higher levels of valence, the chord progressions are composed in the major mode as it is typically associated with expressions of positive valence. As valence level decreases, the likelihood of chords with greater tension or dissonance (such as those with diminished or minor intervals) increases.
  • the chord progressions are represented as a list of dictionaries as shown in Tables 10 (section “A”) and 11 (section “B”) below, with each dictionary containing the following key parameters:
  • Chord: The current chord, represented as a list with four elements: (a) a list of scale indices associated with each chord. All chords are defined with reference to the major scale for ease and flexibility of implementation. For example, the I major chord is built on the first, third and fifth notes of the major scale, with the component intervals being a root, major third and perfect fifth; the associated scale indices would be [1, 3, 5]. As an additional detail, all stimuli were generated with respect to the C major scale beginning on C4.
  • (b) A list of alterations with the same length as the list of scale indices. For example, the I minor chord has component intervals of a root, minor third and perfect fifth; the associated scale indices would be [1, 3, 5] with alterations [0, -1, 0] to indicate that the original major third interval should be lowered by a semitone.
  • (c) A scalar indicating the desired inversion. For example, 0 indicates that the chord should be played in root position, 1 indicates the first inversion, and so on.
  • (d) A string indicating the common name of the chord.
  • Valence: The valence level at which the current chord should be selected.
  • Section: The section during which the current chord should be selected.
  • Next chord: The next chord to be sounded, if there is a fixed chord that should be played following the current chord to fulfil intended harmonic functions.
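  • For illustration, one entry of the Retro-Pop progression list could be structured as follows; the field names are assumptions, while the scale-index/alteration encoding, inversion scalar, chord name, valence region and section label follow the description above.
```python
# Illustrative Retro-Pop progression entry: a I minor chord in root position,
# selected in a low-valence region of the "A" section.
retro_pop_chord_entry = {
    "chord": {
        "scale_indices": [1, 3, 5],
        "alterations":  [0, -1, 0],   # lower the major third by a semitone (minor chord)
        "inversion": 0,               # root position
        "name": "i",
    },
    "valence": 0.2,                   # valence level at which this chord should be selected
    "section": "A",
    "next_chord": None,               # no fixed follow-up chord in this example
}
```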
  • the Retro-Pop AMGS uses five virtual instruments - a percussion kit, bass guitar, electric guitar, violin section and French horn.
  • the bass guitar and plucked electric guitar follow relatively straightforward voice leading logic.
  • the bass guitar plays the roots and fifths of the current chord, a common note pattern used by bass players in pop music.
  • the plucked electric guitar is used primarily to add texture at lower levels of arousal; hence, it plays a randomly selected sequence of chord tones from the current chord (with all chord tones being equiprobable). This allows the current chord quality to be expressed clearly even at lower levels of arousal with little additional dissonance.
  • Pitch Motive: For the main melodic instruments (string section and French horn), a mix of composed melodic motives and probabilistic voice leading logic is employed. Specifically, the instruments play a melodic pattern probabilistically determined by the voice leading logic for the first four bars of each 8-bar section, and play a selected composed rhythmic motive in the next four bars. For the voice leading logic, chord tones are combined with the step motion rule: if the next note in the melody is of a different pitch, the pitch motion should be by diatonic step. These rules are encoded in the form of transition matrices, presented in Tables 13 and 14 below.
  • valence level is divided into two regions, with one matrix composed for each region to generate appropriate melodies for each level of valence. For example, at higher levels of valence (e.g., val > 0.5), if the previous pitch motive was a diatonic step motion up the scale, there is a 10% probability that the next pitch motive in the sequence will be 0, i.e., no pitch motion and the current note is repeated.
  • transition matrices were developed such that at higher levels of valence, melodies are more likely to consist of scale patterns, mitigating the risk of the music being too dissonant or unpleasant due to the increased tempo and note density.
  • the composed rhythmic motives were developed with reference to representative pieces from the Western and Mandarin Retro-Pop music canon during the 1960s-1980s, and include typical motifs such as pentatonic patterns and arpeggiation.
  • Pitch Register: Higher pitches generally tend to be associated with positively-valenced emotions such as excitement and serenity, while lower pitches tend to be associated with negatively-valenced emotions such as sadness.
  • the probability that the inversion of the current chord voicing increases by one increases linearly, following p(inv + 1 | inv_0) = valence level, while the probability that the inversion of the current chord voicing decreases by one (e.g., from first inversion to root position) decreases linearly, following p(inv - 1 | inv_0) = 1 - valence level, as illustrated in the sketch below.
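  • In code, the reconstructed rule could be realised as sketched below; treating the two moves as complementary outcomes of a single draw is an assumption, as is the bound on the maximum inversion.
```python
import random

def update_inversion(current_inversion, valence, max_inversion=2):
    """Illustrative realisation: with probability equal to the valence level the
    voicing moves up one inversion, otherwise it moves down one (both bounded)."""
    if random.random() < valence:
        return min(current_inversion + 1, max_inversion)
    return max(current_inversion - 1, 0)
```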
  • Register is not relevant for the percussion instrument, and for the remaining instruments (French horn and violin section), the register is fairly consistent as they return to the composed melodic motive in the second four bars of each 8-bar section, and there are typically no obvious register changes due to voice leading logic in the remaining four bars.
  • Rhythm: A set of three equiprobable 8-bar rhythmic patterns is composed for the bass guitar, and a new rhythmic pattern is randomly selected at the beginning of each 8-bar section.
  • a new rhythmic pattern is selected for strummed guitar at the beginning of each bar.
  • several 8-bar rhythmic patterns are composed that gradually reduce in density as arousal level decreases, with greater use of quieter instruments and techniques such as rim clicks.
  • a new rhythmic pattern for percussion is selected at the beginning of each 8-bar section.
  • a mix of composed rhythmic motives and rhythmic roughness is employed for the main melodic instruments (string section and French horn). Specifically, during the first four bars of each 8-bar section, the instruments play a rhythmic pattern probabilistically determined by the roughness parameter, which is a measure of how irregular the rhythm of a piece of music is.
  • rhythmic patterns for the plucked electric guitar are determined probabilistically by the roughness parameter.
  • the range of the emotive valence level or emotive arousal level that define each of the divided regions described above are not fixed. The person skilled in the art can readily see that these values can be adjusted according to specific needs and/or desired outcomes. For example, in the “Rhythm” section with regard to the strummed guitar described above, the “high” region could be greater than 0.8, greater than 0.65, or include other defining features, instead of being fixed at greater than 0.70.
  • Arousal: The range of arousal level within which the current rhythmic pattern should be selected.
  • Tempo: Tempo, or beats per minute, determines how quickly the notes of each bar are played. Alternatively, it can be thought of as a measure of note duration: the faster the tempo, the shorter the note duration.
  • tempo is determined by a logarithmic relationship with the arousal level, as defined in Equation (11) below, and has a range of tempo ∈ [36, 130] beats per minute (see the illustrative sketch below).
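  • Equation (11) is not reproduced in this excerpt; the sketch below is a stand-in logarithmic mapping of arousal in [0, 1] onto the stated tempo range of [36, 130] beats per minute, assumed for illustration only.
```python
import math

def tempo_from_arousal(arousal):
    """Stand-in for Equation (11): logarithmic mapping of arousal onto 36-130 BPM."""
    arousal = min(max(arousal, 0.0), 1.0)
    scaled = math.log1p(9.0 * arousal) / math.log(10.0)  # 0 at arousal=0, 1 at arousal=1
    return round(36 + (130 - 36) * scaled)
```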
  • Fig. 10 shows an exemplary computing device.
  • the example computing device 1000 includes a processor 1002 for executing software routines. Although a single processor is shown for the sake of clarity, the computing device 1000 may also include a multi-processor system.
  • the processor 1002 is connected to a communication infrastructure 1004 for communication with other components of the computing device 1000.
  • the communication infrastructure 1004 may include, for example, a communications bus, cross-bar, or network.
  • the computing device 1000 further includes a main memory 1006, such as a random access memory (RAM), and a secondary memory 1008.
  • the secondary memory 1008 may include, for example, a hard disk drive 1010 and/or a removable storage drive 1012, which may include a floppy disk drive, a magnetic tape drive, an optical disk drive, or the like.
  • the removable storage drive 1012 reads from and/or writes to a removable storage unit 1014 in a well-known manner.
  • the removable storage unit 1014 may include a floppy disk, magnetic tape, optical disk, or the like, which is read by and written to by removable storage drive 1012.
  • the removable storage unit 1014 includes a computer readable storage medium having stored therein computer executable program code instructions and/or data.
  • the secondary memory 1008 may additionally or alternatively include other similar means for allowing computer programs or other instructions to be loaded into the computing device 1000.
  • Such means can include, for example, a removable storage unit 1016 and an interface 1018.
  • Examples of such a removable storage unit 1016 and interface 1018 include a program cartridge and cartridge interface (such as that found in video game console devices), a removable memory chip (such as an EPROM or PROM) and associated socket, and other removable storage units 1016 and interfaces 1018 which allow software and data to be transferred from the removable storage unit 1016 to the computing device 1000.
  • the computing device 1000 also includes at least one communication interface 1020.
  • the communication interface 1020 allows software and data to be transferred between computing device 1000 and external devices via a communication path 1022.
  • the communication interface 1020 permits data to be transferred between the computing device 1000 and a data communication network, such as a public data or private data communication network.
  • the communication interface 1020 may be used to exchange data between different computing devices 1000 where such computing devices 1000 form part of an interconnected computer network. Examples of a communication interface 1020 can include a modem, a network interface (such as an Ethernet card), a communication port, an antenna with associated circuitry and the like.
  • the communication interface 1020 may be wired or may be wireless.
  • Software and data transferred via the communication interface 1020 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communication interface 1020. These signals are provided to the communication interface via the communication path 1022.
  • the computing device 1000 further includes a display interface 1024 which performs operations for rendering images to an associated display 1026 and an audio interface 1028 for performing operations for playing audio content via associated speaker(s) 1030.
  • the computing device 1000 can include an Input/Output device 1032.
  • the Input/Output device 1032 includes the audio interface 1028 (such as the audio generation device 336), the speaker(s) 1030 (such as the speaker 340) and an I/O device 1034 (such as I/O devices 320, 520).
  • the user provides the information of the user’s emotion to the I/O device 1034, i.e., I/O device 320.
  • Upon receiving the information, the I/O device 1034 generates the signal based on the information and sends the signal to the emotional state identification module in the processor 1002 of the computing device 1000.
  • the I/O device 1034 can be the BCI device 520.
  • the computing device 1000 can include a media playing device 1036, e.g., media playing device 720 (Fig. 7), which includes the Input/Output device 1032, the display interface 1024 and the display 1026.
  • the media is played to the user through the display 1026, i.e., the display 722, and/or the speaker(s) 1030, i.e., the speaker 724, and the user provides the information of the user’s emotive reaction to the media to the I/O device 1034, i.e., the input element 730.
  • Upon receiving the information of the user’s emotive reaction to the media, the I/O device 1034, i.e., the input element 730, generates the signal based on the information received and sends the signal and the data of the media being played to the emotional state identification module of the computing device 1000 in the processor 1002.
  • the processor 1002 includes the emotional state identification module, the parameter adjustment module, the musical composition module and the synchronization module discussed hereinabove.
  • computer program product may refer, in part, to removable storage unit 1014, removable storage unit 1016, a hard disk installed in hard disk drive 1010, or a carrier wave carrying software over communication path 1022 (wireless link or cable) to communication interface 1020.
  • Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computing device 1000 for execution and/or processing.
  • Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computing device 1000.
  • Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computing device 1000 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
  • the computer programs are stored in main memory 1006 and/or secondary memory 1008. Computer programs can also be received via the communication interface 1020. Such computer programs, when executed, enable the computing device 1000 to perform one or more features of embodiments discussed herein. In various embodiments, the computer programs, when executed, enable the processor 1002 to perform features of the above-described embodiments. Accordingly, such computer programs represent controllers of the computer system 1000.
  • Software may be stored in a computer program product and loaded into the computing device 1000 using the removable storage drive 1012, the hard disk drive 1010, or the interface 1018.
  • the computer program product may be downloaded to the computer system 1000 over the communications path 1022.
  • the software when executed by the processor 1002, causes the computing device 1000 to perform functions of embodiments described herein.
  • Fig. 10 is presented merely by way of example. Therefore, in some embodiments one or more features of the computing device 1000 may be omitted. Also, in some embodiments, one or more features of the computing device 1000 may be combined together. Additionally, in some embodiments, one or more features of the computing device 1000 may be split into one or more component parts.
  • It will be appreciated that the elements illustrated in Fig. 10 function to provide means for performing the various functions and operations of the servers as described in the above embodiments.
  • a server may be generally described as a physical device comprising at least one processor and at least one memory including computer program code.
  • the at least one memory and the computer program code are configured to, with the at least one processor, cause the physical device to perform the requisite operations.
  • Fig. 11 shows a schematic diagram of the computing device 1100 when the AMGS is in the first configuration.
  • the processor 1002 of the computing device 1100 includes the emotional state identification module 1102, the parameter adjustment module 1104, i.e., the parameter adjustment module 332, and the musical composition module 1106.
  • the computing device 1100 is communicatively coupled to the audio generation device 1108 (such as the audio generation device 336) and the Input/Output device 1032.
  • the user 1112 provides the information of the user’s emotion to the Input/Output device 1032.
  • the Input/Output device 1032 then generates the signal representing the user’s emotion based on the information and sends the signal to the emotional state identification module 1102.
  • the emotional state identification module 1102 determines the emotive valence level and the emotive arousal level of the user based on the received signal.
  • the parameter adjustment module 1104 adjusts the plurality of musical parameters based on the emotive valence level or the emotive arousal level.
  • the musical composition module 1106 then generates the plurality of digital messages based on the plurality of musical parameters using the rule-based algorithm 334.
  • the plurality of digital messages are converted, by the audio generation device 1108, into the plurality of musical audio signals.
  • the process can start before or after the plurality of musical audio signals are played to the user 1112, e.g., the user 1112 can input the information to the Input/Output device 1032 before or after the plurality of musical audio signals are played to the user 1112.
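  • To make the data flow of this first configuration concrete, the following is a minimal Python sketch of the pipeline described above. All function names, parameter ranges and affect-to-music mappings in the sketch are illustrative assumptions and do not reproduce the actual rule-based algorithm 334; they only show how a signal could pass from the Input/Output device through the emotional state identification, parameter adjustment and musical composition stages to produce note-level digital messages for the audio generation device.

```python
# Illustrative sketch of the first-configuration pipeline (Fig. 11).
# All names, ranges and mapping rules below are assumptions for illustration,
# not the patent's actual rule-based algorithm.
import random

def identify_emotional_state(signal: dict) -> tuple[float, float]:
    """Stand-in for the emotional state identification module:
    maps a self-report signal to (valence, arousal) in [0, 1]."""
    return signal["valence"], signal["arousal"]

def adjust_parameters(valence: float, arousal: float) -> dict:
    """Stand-in for the parameter adjustment module: hypothetical mappings
    from affect to musical parameters."""
    return {
        "tempo_bpm": 60 + 80 * arousal,          # faster music for higher arousal
        "mode": "major" if valence >= 0.5 else "minor",
        "base_pitch": 48 + int(24 * valence),    # higher register for higher valence
        "velocity": int(50 + 60 * arousal),      # louder notes for higher arousal
    }

def compose(params: dict, n_notes: int = 8) -> list[dict]:
    """Stand-in for the musical composition module: emits note-on/note-off
    'digital messages' that an audio generation device could render."""
    scale = [0, 2, 4, 5, 7, 9, 11] if params["mode"] == "major" else [0, 2, 3, 5, 7, 8, 10]
    beat = 60.0 / params["tempo_bpm"]
    messages, t = [], 0.0
    for _ in range(n_notes):
        pitch = params["base_pitch"] + random.choice(scale)
        messages.append({"type": "note_on", "pitch": pitch, "velocity": params["velocity"], "time": t})
        messages.append({"type": "note_off", "pitch": pitch, "velocity": 0, "time": t + beat})
        t += beat
    return messages

# Example: a user reporting a fairly happy, energetic state
signal = {"valence": 0.8, "arousal": 0.7}
valence, arousal = identify_emotional_state(signal)
params = adjust_parameters(valence, arousal)
for msg in compose(params):
    print(msg)   # in the real system these would be sent to the audio generation device
```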
  • Fig. 12 shows a schematic diagram of the computing device 1200 when the AMGS is in the second configuration.
  • the processor 1002 of the computing device 1200 includes the emotional state identification module 1202, the parameter adjustment module 1204 (i.e., the parameter adjustment module 332) and the musical composition module 1206.
  • the I/O device 1034 is the BCI device 1210 (such as the BCI device 520).
  • the computing device 1200 is communicatively coupled to the audio generation device 1208 (such as the audio generation device 336) and the BCI device 1210.
  • the BCI device 1210 receives the EEG signal from the user and sends the EEG signal to the emotional state identification module 1202 of the computing device 1200.
  • the emotional state identification module 1202 determines the emotive valence level and the emotive arousal level of the user based on the received signal, e.g., the EEG signal.
  • the parameter adjustment module 1204 adjusts the plurality of musical parameters based on the emotive valence level or the emotive arousal level.
  • the musical composition module 1206 then generates the plurality of digital messages based on the plurality of musical parameters using the rule-based algorithm 334.
  • the plurality of digital messages are converted, by the audio generation device 1208, into the plurality of musical audio signals.
  • the process can start before or after the plurality of musical audio signals are played to the user 1212, e.g., the EEG signal can be sent to the emotional state identification module 1202 of the computing device 1200 before or after the plurality of audio signals are played to the user 1212.
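  • For the second configuration, the description above does not specify how the EEG signal is mapped to valence and arousal. The sketch below therefore uses two heuristics that are common in the affective computing literature (frontal alpha asymmetry as a proxy for valence, and relative beta power as a proxy for arousal) purely as an illustrative assumption; the channel names, sampling rate and band limits are likewise assumed and are not taken from the patent.

```python
# A minimal, assumption-laden sketch of reducing an EEG signal to (valence, arousal).
import numpy as np
from scipy.signal import welch

def band_power(x, fs, lo, hi):
    """Average power of signal x in the [lo, hi] Hz band (Welch periodogram)."""
    f, pxx = welch(x, fs=fs, nperseg=fs * 2)
    mask = (f >= lo) & (f <= hi)
    return np.trapz(pxx[mask], f[mask])

def estimate_state(eeg_f3, eeg_f4, fs=256):
    """Map two frontal channels (F3, F4) to rough valence/arousal in [0, 1]."""
    alpha_l = band_power(eeg_f3, fs, 8, 13)
    alpha_r = band_power(eeg_f4, fs, 8, 13)
    beta = band_power(eeg_f3, fs, 13, 30) + band_power(eeg_f4, fs, 13, 30)
    alpha = alpha_l + alpha_r
    # Frontal alpha asymmetry (alpha is inversely related to activity):
    # more right-hemisphere alpha -> greater relative left activation -> higher valence.
    valence = float(1 / (1 + np.exp(-(np.log(alpha_r) - np.log(alpha_l)))))
    arousal = float(np.clip(beta / (beta + alpha), 0.0, 1.0))   # relative beta power
    return valence, arousal

# Example with synthetic data (10 s of noise at 256 Hz per channel)
rng = np.random.default_rng(0)
v, a = estimate_state(rng.standard_normal(2560), rng.standard_normal(2560))
print(f"valence={v:.2f}, arousal={a:.2f}")
```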
  • Fig. 13 shows a schematic diagram of the computing device 1300 when the AMGS is in the third configuration.
  • the computing device 1300 includes the emotional state identification module 1302, the parameter adjustment module 1304 (i.e., the parameter adjustment module 332), the musical composition module 1306 and the synchronization module 1308.
  • the computing device 1300 is communicatively coupled to the audio generation device 1310 (such as the audio generation device 336) and the media playing device 1036 (such as the media playing device 720).
  • the media playing device 1036 plays the media to the user 1314 via the display 722, 1026 and/or the speaker(s) 724, 1030.
  • the user 1314 provides the information of the user’s emotive reaction to the media to the input element 730 of the media playing device 1036.
  • Upon receiving the information, the media playing device 1036 generates the signal representing the user’s emotion based on the information and sends the signal and the data of the media being played to the emotional state identification module 1302.
  • the emotional state identification module 1302 determines the emotive valence level and the emotive arousal level of the user based on the received signal.
  • the parameter adjustment module 1304 adjusts the plurality of musical parameters based on the emotive valence level or the emotive arousal level.
  • the musical composition module 1306 then generates the plurality of digital messages based on the plurality of musical parameters using the rule-based algorithm 334.
  • the plurality of digital messages are converted, by the audio generation device 1310, into the plurality of musical audio signals.
  • the synchronization module 1308 can synchronize the signal with the playing media to generate the audio-visual signal stream, which includes the playing media and an audio track synchronized to the playing media, the audio track including the musical audio signals corresponding to the signal.
  • the information provided by or extracted from the user as described above can be a continuous process to continuously sonify the user’s emotion so as to mediate the user’s emotional state.
  • the information can be input to the computing device continuously, leading the system to dynamically update the musical parameters, and therefore to continuously update the affective qualities of the music.
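  • As one way of realizing this continuous updating, the short sketch below smooths successive valence and arousal readings with an exponential moving average before the musical parameters are recomputed, so that the affective character of the music changes gradually rather than jumping with every new input. The smoothing factor and update cadence are assumptions for illustration only.

```python
# Minimal sketch of continuous affect smoothing; the 0.2 factor is assumed.
class SmoothedAffect:
    def __init__(self, alpha=0.2):
        self.alpha = alpha          # weight given to each new reading
        self.valence = 0.5          # start at the neutral centre of the space
        self.arousal = 0.5

    def update(self, valence, arousal):
        self.valence += self.alpha * (valence - self.valence)
        self.arousal += self.alpha * (arousal - self.arousal)
        return self.valence, self.arousal

smoother = SmoothedAffect()
for reading in [(0.9, 0.8), (0.85, 0.9), (0.9, 0.85)]:   # e.g. successive user inputs
    v, a = smoother.update(*reading)
    # The musical parameters would be recomputed here at each step
    # (e.g. with a mapping like adjust_parameters in the earlier sketch).
    print(f"smoothed valence={v:.2f}, arousal={a:.2f}")
```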
  • Fig. 14 shows an exemplary analog input element of the I/O device when the AMGS is in the first or third configuration.
  • the input element 1400 (such as the input element 730) can display a plurality of regions, each labelled with one or more discrete emotion terms.
  • the emotion terms can be arranged based on their associated valence and arousal levels. As per convention, negative emotion terms (i.e., low valence level) are positioned on the left-hand side of the space, while positive emotion terms (i.e., high valence level) are positioned on the right.
  • emotion terms associated with high energy/activation are placed above the center of the plurality of regions, while emotion terms associated with low energy/activation are placed below the center of the plurality of regions.
  • the center point of the plurality of regions represents a neutral emotion state.
  • the emotion term “Happy” has a higher valence level than the emotion term “Excited” (i.e., it is placed further towards the right).
  • the emotion term “Happy” has a lower arousal level than the emotion term “Excited” (i.e., it is placed closer to the center of the plurality of regions).
  • the user may provide one or more current emotions of the user to the input element by clicking on one or more of the regions using a mouse.
  • the input element can further include a button 1402 configured to be moved by the user to any one of the regions.
  • the user may move the button to the region with a label corresponding to the emotion(s) that the user is currently feeling. Doing so would indicate a change in the user’s emotive arousal and/or emotive valence levels.
  • the movement of the button may be via a sliding motion.
  • the sliding motion can advantageously enable the user to provide intensity information for a currently felt emotion that is closely related to the displayed emotion term, by sliding the button 1402 towards the edge of the display.
  • For example, if the user slides the button 1402 to the “Happy” region but stops further from the edge of the display at Point “A”, this indicates that the user is feeling “Happy”. On the other hand, if the user slides the button 1402 to the “Happy” region and stops near the edge of the display at Point “B”, this indicates that the user is feeling “Happy” with strong intensity, i.e., an emotion such as Delighted or Overjoyed.
  • the display of the plurality of regions can be provided on a touchscreen display and the button 1402 can be moved around the plurality of regions on the display by dragging it across the touchscreen. Similar analog emotion input means, as well as digital emotion input means may be used with the embodiment and configurations described hereinabove.
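  • A minimal sketch of how such an input element could translate a click or drag position into affect values is given below, assuming a rectangular display whose horizontal axis encodes valence, whose vertical axis encodes arousal, and whose radial distance from the centre encodes intensity; the pixel dimensions and the specific scaling are illustrative assumptions, not the patent's implementation.

```python
# Convert a pointer position on an assumed 2-D emotion input display into
# (valence, arousal, intensity). All dimensions and scalings are assumptions.
import math

def position_to_affect(x, y, width, height):
    """Convert a pixel position (origin at top-left) to (valence, arousal, intensity)."""
    valence = x / width                     # left (negative) -> right (positive)
    arousal = 1.0 - y / height              # top (high energy) -> bottom (low energy)
    # Distance from the neutral centre, scaled so a corner of the display = 1.0
    dx, dy = valence - 0.5, arousal - 0.5
    intensity = min(1.0, math.hypot(dx, dy) / math.hypot(0.5, 0.5))
    return valence, arousal, intensity

# Example: a point nearer the centre of "Happy" vs one near the edge of the display
print(position_to_affect(520, 250, 800, 600))   # moderate intensity
print(position_to_affect(760, 120, 800, 600))   # high intensity
```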
  • a listening study was conducted in order to validate the efficacy of the Classical AMGS for generating affective music.
  • the system was first used to generate brief musical examples from different points around the arousal-valence space of the circumplex model. Listeners then provided arousal and valence ratings for each of these excerpts to examine whether the target emotion (in terms of arousal and valence) was indeed perceived as intended by listeners.
  • the Classical AMGS was designed to compose affective music that can span the entire valence-arousal plane (such as the valence and arousal plane as displayed in Fig. 14).
  • musical stimuli were generated from 13 different points around the valence and arousal plane. These were meant to represent different emotional states around the space, and covered the corners, middle of each quadrant, and the neutral middle point of the space.
  • (valence, arousal) ∈ {(0, 0); (0, 0.5); (0, 1); (0.25, 0.25); (0.25, 0.75); (0.5, 0); (0.5, 0.5); (0.5, 1); (0.75, 0.25); (0.75, 0.75); (1, 0); (1, 0.5); (1, 1)}.
  • three different musical stimuli were generated from each of the thirteen points, resulting in a total of 39 musical excerpts. This mitigates the risk that artifacts in any particular stimulus might bias listener ratings, for more robust results.
  • the average duration of the music stimuli is 23.6 s.
  • the stimuli were composed based on either an 8- or 16-bar progression to allow the music to reach a cadence. Note that because the Classical AMGS was designed to generate music continuously and flexibly based on the listener’s physiological state or real-time arousal and valence levels, the music does not always reach a full cadence at the end of an 8-bar sequence (e.g., sometimes the tonic/cadence is only reached at the beginning of the subsequent 8-bar sequence). The objective of this study is not to test the ability of the generated music to produce well-formed cadences per se, but rather its ability to convey a target emotion.
  • the examples do not necessarily end with a musical cadence; rather, they are excerpts from what could be an infinitely-long musical creation. Therefore, while generating stimuli with a fixed duration is possible, this often results in stimuli that end somewhat abruptly, which might influence a listener’s emotional response to the stimuli.
  • Sixteen bars were used for stimuli with a fast tempo (e.g., high arousal excerpts), as 8 bars produced too brief a time duration for these excerpts. All musical stimuli were presented to each participant in randomized order to avoid order effects across participants.
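  • The stimulus plan described above can be summarized with the short sketch below, which enumerates the 13 (valence, arousal) points, generates three takes per point (39 excerpts in total), and assigns 16 bars to the high-arousal excerpts; the arousal threshold used to decide on 16 bars is an assumption, since the text only states that fast-tempo excerpts used the longer form.

```python
# Enumerate the 13 grid points x 3 takes = 39 excerpts, with an assumed
# arousal threshold for choosing the 16-bar progression.
import itertools, random

POINTS = [(0, 0), (0, 0.5), (0, 1), (0.25, 0.25), (0.25, 0.75), (0.5, 0),
          (0.5, 0.5), (0.5, 1), (0.75, 0.25), (0.75, 0.75), (1, 0), (1, 0.5), (1, 1)]

stimuli = []
for (valence, arousal), repeat in itertools.product(POINTS, range(3)):
    bars = 16 if arousal >= 0.75 else 8          # assumed threshold for "fast tempo"
    stimuli.append({"valence": valence, "arousal": arousal, "bars": bars, "take": repeat + 1})

random.shuffle(stimuli)                           # randomized presentation order
print(len(stimuli))                               # 13 points x 3 takes = 39 excerpts
```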
  • the experiment was conducted one subject at a time in a quiet room with minimal auditory and visual distractions.
  • the subject sat in front of a computer and listened to the music stimuli over headphones, with the sound level adjusted to a comfortable listening volume.
  • the subject rated his/her current emotional state.
  • the music listening study began with two practice trials, followed by the 39 experimental trials in randomized order.
  • the subject was asked to indicate the perceived emotion of the stimulus (that is, the emotions they felt that the music conveyed) on a visual 9-point scale known as the Self-Assessment Manikin (SAM). These ratings were collected for both arousal and valence.
  • valence refers to the degree of the pleasantness of the emotion
  • arousal refers to the activation or energy level of the emotion.
  • the SAM scale ranged from “very unpleasant” (1) to “extremely pleasant” (9) for valence, and from “calm” (1) to “excited” (9) for arousal. Subjects were allowed to take as long as they required to make these ratings, but were only permitted to listen to each musical stimulus once.
  • In order to evaluate the efficacy of the Classical AMGS, an analysis was performed on the user ratings collected during the music listening study. This study aimed to investigate whether the music generated by the Classical AMGS is able to express the desired level of valence and arousal to the subjects.
  • the subjects’ averaged (normalized) emotion ratings for the musical stimuli are compared with the valence or arousal parameter settings used during the music generation process, as sketched below.
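  • A minimal sketch of that comparison, assuming the 9-point SAM ratings are min-max rescaled to the [0, 1] range of the parameter settings before averaging (the exact normalization used in the study is not stated here):

```python
# Rescale 9-point SAM ratings to [0, 1] so they can be compared with the
# valence/arousal parameter settings. The min-max rescaling is an assumption.
def normalize_sam(rating: int) -> float:
    """Map a SAM rating in 1..9 onto [0, 1]."""
    return (rating - 1) / 8.0

ratings = [6, 7, 8, 7]                       # example valence ratings for one excerpt
mean_normalized = sum(normalize_sam(r) for r in ratings) / len(ratings)
print(round(mean_normalized, 3))             # compared against the excerpt's valence setting
```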
  • the bar graphs depicting the averaged ratings (along with standard errors) are presented in Fig. 15. As expected, a strong increasing trend is seen for both the average valence and arousal ratings with respect to their corresponding parameter settings.
  • the results also show a stronger linear relationship for arousal (between average arousal ratings and parameter settings) in comparison to valence. These results show that the music generated by Classical AMGS generally conveys the intended levels of valence and arousal to listeners.
  • Fig. 16 visualizes this dependence by presenting the interpolated average valence (left) and arousal ratings (right) as a function of the emotion parameter settings.
  • the stars in Fig. 16 represent the 13 points around the valence and arousal plane used to generate musical stimuli.
  • As can be seen in Fig. 16, the perceived valence is lower than the actual valence parameter setting (for Valence > 0.7) for excerpts expressing arousal levels below 0.4. That is, excerpts generated to express high valence convey only moderate valence when the arousal setting is low. This may be due in part to the effect of a slower tempo. Ratings at low valence settings are, however, in accordance with their respective parameter values. In contrast, uniform correspondence between the arousal parameter values and arousal ratings is observed regardless of the valence parameter setting.
  • the subjects listened to musical excerpts generated from the Retro- Pop AMGS and provided ratings of perceived emotion (valence and arousal ratings, as in the study above).
  • the musical stimuli were generated from 13 different points around the 2-dimensional valence and arousal space. These points are meant to represent different emotion states around the space, covering the corners, middle of each quadrant, and neutral point of the space.
  • (valence level, arousal level) ∈ {(0, 0); (0, 0.5); (0, 1); (0.25, 0.25); (0.25, 0.75); (0.5, 0); (0.5, 0.5); (0.5, 1); (0.75, 0.25); (0.75, 0.75); (1, 0); (1, 0.5); (1, 1)}.
  • Figs. 17 and 18 depict the average results for arousal ratings and valence ratings.
  • the graph in Fig. 17 depicts the average arousal ratings as a function of the Retro- Pop AMGS’s arousal parameter setting.
  • Retro-Pop AMGS can reliably produce music with an intended level of perceived arousal (from low to moderately high arousal) in listeners.
  • the graph in Fig. 18 displays the average valence ratings as a function of the Retro-Pop AMGS’s valence parameter setting.
  • F = 202.43, p < 0.001.
  • the results show that the Retro-Pop AMGS is able to produce music with a target level of perceived valence, ranging from moderately low valence to moderately high valence.
  • these findings validate the ability of the Retro-Pop AMGS to generate affective music in a pop style.
  • the results confirm that the Retro-Pop AMGS is capable of producing music at varying degrees of perceived arousal and valence according to the parameters of the model.
  • the four stimuli were as follows: Classical + Calm, Classical + Happy, Pop + Calm, and Pop + Happy.
  • the plurality of emotions can include other known emotions not listed above without departing from the scope of the present invention. Some of such examples are “Peaceful”, “Bored”, “Angry” and “Pleased” as shown in Fig. 14.
  • the data in Table 17 shows that the Classical AMGS Calm excerpt results in a significant increase in the number of subjects reporting feeling Calm (p < 0.001).
  • the Retro-Pop AMGS Calm excerpt resulted in a decrease in Happy (p < 0.001), Excited (p < 0.001), and Anxious (p < 0.01) emotions, which are all “high arousal” emotion states, and an increase in Sad (p < 0.001), which may be due to the induction of a lower arousal state. Based on the study, both systems appear to be effective in reducing arousal.
  • the results for the Classical AMGS Happy excerpt show a significant increase in the number of subjects reporting feeling Happy (p < 0.001) and Excited (p < 0.001), as well as a decrease in Calm (p < 0.001) and Tired (p < 0.001) emotion states, which are both low arousal emotions.
  • the Retro-Pop AMGS Happy excerpt resulted in an increase in Happy (p < 0.01) emotion states, and a decrease in Calm (p < 0.001) states.
  • Two 2-minute long music pieces (referred to as Trial A and Trial B) were generated by the system, which aimed to move subjects from their current emotion state to the target emotion state.
  • physiological signals were recorded via an E4 wrist sensor, and subjects provided their current emotion state before and after listening to each 2-min excerpt using a two-dimensional arousal-valence state space known as the circumplex model of emotion.
  • HR denotes Heart Rate.
  • Fig. 19 shows facet plots illustrating an example of the change in HR for individual subjects (subjects 4, 5, 6, etc.) over the first 2-minute Calm trial (Trial A for the Calm target emotion), for both the Classical and Retro-Pop AMGS.
  • Average HR is plotted over time for every individual, and the slope is superimposed: an increasing slope represents an overall increase in HR from the beginning to the end of the trial, and a negative slope represents a decrease in HR over the trial.
  • Average HR was calculated for every 30 second duration from the beginning of the trial, with a sliding window of 5 seconds; therefore, average HR was calculated at 19 sequential time points over the trial.
  • the Calm trials aimed to gradually reduce HR, so a negative slope was hypothesized.
  • the plots in Fig. 19 are displayed to show how HR varied for individual participants, and how the HR slope gives an overarching measure of change in HR over each 2-minute trial.
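  • The slope computation described above can be sketched as follows: average HR in 30-second windows advanced in 5-second steps across a 2-minute trial (19 windows), followed by a straight-line fit whose sign indicates whether HR rose or fell over the trial. The synthetic HR trace and the one-sample-per-second rate are illustrative assumptions.

```python
# Sliding-window HR averaging (30 s windows, 5 s step) and slope fit over a
# 2-minute trial; the HR data below are synthetic, for illustration only.
import numpy as np

fs = 1                                         # assume one HR sample per second
hr = 72 - 0.02 * np.arange(120) + np.random.default_rng(0).normal(0, 0.3, 120)

window, step = 30 * fs, 5 * fs
starts = range(0, len(hr) - window + 1, step)  # 19 window start points
avg_hr = np.array([hr[s:s + window].mean() for s in starts])

slope = np.polyfit(np.arange(len(avg_hr)), avg_hr, 1)[0]
print(len(avg_hr), round(slope, 4))            # 19 windows; negative slope = HR decreased
```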
  • Figs. 20 and 21 show graphs of the 50 subjects’ HR slope during Trial A on the y-axis.
  • Trial A is the first 2-minute long trial subjects heard, which aimed to take them from their current emotion state to the target emotion state.
  • a positive slope represents an increase in HR over the trial, and a negative slope represents a decrease in HR over the trial.
  • the average change in Valence is plotted from before to after listening to Trial A.
  • a positive change in Valence represents the subject moving to a more positive emotion state from before to after the music clip (note that some subjects already reported high valence levels before the music started, so an increase was not possible).
  • the subjects that reported a higher Arousal/energy score after listening to the music also tended to have a greater increase in HR across the trial (note that, upon averaging across all subjects, changes in HR during a trial were around 1-2 bpm).
  • the Classical AMGS may be more effective for Calm induction
  • the Retro-Pop AMGS may be more effective for Happy induction, based on this set of results.
  • Subjects who reported an increase in Valence (more positive emotion after listening to the music) when listening to the Calm classical music tended to also demonstrate a decrease in HR, as predicted.
  • subjects who reported an increase in Arousal (more energetic emotion after listening to the music) when listening to the Happy pop music tended to also demonstrate an increase in HR, as hypothesized.
  • the AMGS can be used in various commercial applications. The following lists some of such examples.
  • the AMGS may be used to generate dynamic and emotional music that is royalty-free for various media applications. For example, it could be used to create music soundtracks for images or videos lacking sound.
  • as digital technologies become increasingly popular (from “home movies” to visual content creation to digital artwork), the demand for automatic music creation systems will increase tremendously, especially given the difficulty and expertise required in producing music (not to mention the cost).
  • because music is a powerful way of conveying emotion, affective music generation systems are able to lend this crucial element of emotion content to visual media.
  • the system offers a real-time music generation system that can be embedded into Brain-Computer-Interface (BCI) devices (or other systems using biofeedback) to assist the user in achieving a desired emotion state through neuro/biofeedback and affective music listening.
  • the unique benefit here is that the system’s affective music can be composed based on (and in reaction to) the user’s brain state in real time. Therefore, the system is always sensitive to the user’s mental state (the music generation is based on the individual’s brain state), and can be used to bring the user to an improved emotion state (e.g., a pre-defined target emotion state, such as relaxation or happiness) based on their current mood. In this way, the system is both a sonification of one’s emotional state, and a source of emotion mediation (as music is a powerful means of influencing emotion states).
  • the system may be used for wellness applications such as generating affective music “playlists” for emotion mediation. That is, using the flexible music generation system, a user may pre-define an ‘emotion trajectory’ (e.g., a path through emotion space, such as the 2-dimensional Valence-Arousal space) to define the emotional qualities of their music over the duration of listening. For example, if the user desires 10 minutes of music to help him move from a depressed emotion state to a happy emotion state, he may indicate an emotion trajectory from negative arousal/valence to positive arousal/valence over the specified duration, and the system will create a bespoke affective music session (a simple sketch of such a trajectory is given below). This is another prime example of how the system may be used for highly-personalized, affective music creation.
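  • A minimal sketch of such a pre-defined emotion trajectory is given below, assuming a straight-line path through the valence-arousal space and a fixed update interval; both the linear path and the 30-second step size are illustrative assumptions rather than the system's actual scheduling.

```python
# Linearly interpolate from a starting affect point to a target point over a
# requested duration; the step size and straight-line path are assumptions.
def emotion_trajectory(start, target, duration_s, step_s=30):
    """Yield (time, valence, arousal) along a straight path through the space."""
    n_steps = max(1, duration_s // step_s)
    for i in range(n_steps + 1):
        frac = i / n_steps
        valence = start[0] + frac * (target[0] - start[0])
        arousal = start[1] + frac * (target[1] - start[1])
        yield i * step_s, valence, arousal

# Example: 10 minutes from a low-valence, low-arousal state to a happy state
for t, v, a in emotion_trajectory(start=(0.2, 0.3), target=(0.9, 0.7), duration_s=600):
    print(f"t={t:>3}s  valence={v:.2f}  arousal={a:.2f}")
    # the AMGS would regenerate/adjust its musical parameters at each step
```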
  • an Automatic Music Generation System that provides a flexible yet powerful way to sonify the real-time emotion states of a user, and to influence those emotion states, is presented.
  • the AMGS provides a model that is able to fluidly generate affective music in real time, either based on pre-decided arousal and valence levels (e.g., as a sort of affective playlist for emotion mediation, or trajectory through emotion space), or based on the real-time feedback or physiological state of the user (e.g., EEG activity captured from the user and mapped to emotion state or arousal level and valence level).

Landscapes

  • Electrophonic Musical Instruments (AREA)

Abstract

The invention concerns a computer-implemented method for emotion mediation. The computer-implemented method comprises determining, by means of an emotional state identification module of a computing device, an emotive valence level and an emotive arousal level of a user based on a received signal, the received signal representing the user's emotion; adjusting, by means of a parameter adjustment module of the computing device, a plurality of musical parameters based on the emotive valence level or the emotive arousal level; and playing, by means of an audio generation device communicatively coupled to the computing device, a plurality of musical audio signals based on the plurality of adjusted musical parameters to the user, to sonify the user's emotion so as to mediate the user's emotion.
PCT/SG2023/050789 2022-11-28 2023-11-28 Système de production de musique algorithmique pour la médiation d'émotion et méthode WO2024117973A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10202260229T 2022-11-28
SG10202260229T 2022-11-28

Publications (1)

Publication Number Publication Date
WO2024117973A1 true WO2024117973A1 (fr) 2024-06-06

Family

ID=91325104

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2023/050789 WO2024117973A1 (fr) 2022-11-28 2023-11-28 Système de production de musique algorithmique pour la médiation d'émotion et méthode

Country Status (1)

Country Link
WO (1) WO2024117973A1 (fr)
