WO2018217790A1 - System and method for automatically generating musical output - Google Patents
- Publication number
- WO2018217790A1 (international application PCT/US2018/033941)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- input
- musical
- lyrical
- characteristic
- note
Classifications
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
- G06F3/04847—Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
- G10H1/0008—Associated control or indicating means
- G10H2210/061—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of musical phrases, isolation of musically relevant segments, e.g. musical thumbnail generation, or for temporal structure analysis of a musical piece, e.g. determination of the movement sequence of a musical work
- G10H2210/066—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for pitch analysis as part of wider processing for musical purposes, e.g. transcription, musical performance evaluation; Pitch recognition, e.g. in polyphonic sounds; Estimation or use of missing fundamental
- G10H2220/106—Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith for graphical creation, edition or control of musical data or parameters using icons, e.g. selecting, moving or linking icons, on-screen symbols, screen regions or segments representing musical elements or parameters
- G10H2220/126—Graphical user interface [GUI] specifically adapted for electrophonic musical instruments, e.g. interactive musical displays, musical instrument icons or menus; Details of user interactions therewith for graphical creation, edition or control of musical data or parameters for graphical editing of individual notes, parts or phrases represented as variable length segments on a 2D or 3D representation, e.g. graphical edition of musical collage, remix files or pianoroll representations of MIDI-like files
- G10H2240/081—Genre classification, i.e. descriptive metadata for classification or selection of musical pieces according to style
- G10H2240/085—Mood, i.e. generation, detection or selection of a particular emotional content or atmosphere in a musical piece
- G10H2240/211—Wireless transmission, e.g. of music parameters or control data by radio, infrared or ultrasound
- G10H2250/455—Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
Definitions
- the present disclosure relates generally to the field of music creation, and more specifically to a system for converting text to a musical composition.
- the disclosure describes a computer implemented method for automatically generating musical works.
- the computer implemented method may include receiving a lyrical input and receiving a musical input.
- the method may include analyzing, via one or more processors, the lyrical input to determine at least one lyrical characteristic and analyzing, via the one or more processors, the musical input to determine at least one musical characteristic.
- the method may include correlating, via the one or more processors, the lyrical input with the musical input to generate a synthesizer input.
- the method may include receiving a singer selection corresponding to at least one voice characteristic.
- the method may include sending the synthesizer input and the at least one voice characteristic to a voice synthesizer.
- the method may also include receiving, from the voice synthesizer, a vocal rendering of the lyrical input.
- the method may include generating a musical work from the vocal rendering based on the lyrical input, the musical input, and the at least one voice characteristic.
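The correlating step can be illustrated with a minimal Python sketch. It assumes the musical input has already been reduced to a single melody line and that the lyrics are sung one syllable per note. The `Note` and `SynthEvent` types, the vowel-group heuristic in `split_syllables`, and `correlate` are hypothetical illustrations. They are not the synthesizer-input format used in the disclosure.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

VOWELS = set("aeiouy")


@dataclass
class Note:
    pitch: int       # MIDI note number
    start: float     # onset, in beats
    duration: float  # length, in beats


@dataclass
class SynthEvent:
    syllable: str
    pitch: int
    start: float
    duration: float


def split_syllables(word: str) -> list[str]:
    """Rough heuristic (assumption): close a syllable at the end of each vowel group."""
    word = word.lower()
    chunks: list[str] = []
    current = ""
    for i, ch in enumerate(word):
        current += ch
        end_of_vowel_group = ch in VOWELS and (i + 1 == len(word) or word[i + 1] not in VOWELS)
        if end_of_vowel_group:
            chunks.append(current)
            current = ""
    if current:  # trailing consonants join the last syllable
        if chunks:
            chunks[-1] += current
        else:
            chunks.append(current)
    return chunks


def correlate(lyrics: str, notes: list[Note]) -> list[SynthEvent]:
    """Pair syllables with notes one-to-one; unmatched items on either side are dropped."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    syllables = [s for w in words for s in split_syllables(w)]
    return [SynthEvent(s, n.pitch, n.start, n.duration) for s, n in zip(syllables, notes)]
```

Take the text "Hello world" over pitches 60, 62, and 64. Here `correlate` pairs `he`, `llo`, and `world` with those notes. `zip` silently drops any unmatched syllables or notes. That kind of syllable/note count mismatch is the disparity that the following aspects resolve by editing either the lyrics or the melody.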
- in another aspect, the disclosure describes a computer implemented method for automatically generating musical works.
- the computer implemented method may include receiving a lyrical input and receiving a musical input.
- the method may include analyzing, via one or more processors, the lyrical input to determine a lyrical characteristic and analyzing, via the one or more processors, the musical input to determine a musical characteristic.
- the method may also include comparing, via one or more processors, the lyrical characteristic with the musical characteristic to determine a disparity. Based on the determined disparity, the method may include automatically applying, via the one or more processors, at least one editing tool to the lyrical input to generate an altered lyrical input with an altered lyrical characteristic.
- the method may include correlating, via the one or more processors, the altered lyrical input with the musical input to generate a synthesizer input, and sending the synthesizer input to a voice synthesizer.
- the method may also include receiving, from the voice synthesizer, a vocal rendering of the altered lyrical input, and generating a musical work from the vocal rendering and the musical input.
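The disparity check and a lyric editing tool can be sketched as follows. The sketch assumes, hypothetically, that the lyrical characteristic is the syllable count and the musical characteristic is the note count. It uses two illustrative tools:

- When there are too few syllables, it repeats the final syllable (a stutter-like repetition).
- When there are too many, it merges trailing syllables so they share one note.

The function names are assumptions. They are not the editing tools named in the disclosure.

```python
from __future__ import annotations


def disparity(syllables: list[str], note_count: int) -> int:
    """Positive: more notes than syllables. Negative: more syllables than notes."""
    return note_count - len(syllables)


def fit_lyrics(syllables: list[str], note_count: int) -> list[str]:
    """Apply hypothetical lyric editing tools until syllable count == note count."""
    if note_count <= 0 or not syllables:
        raise ValueError("need at least one syllable and one note")
    altered = list(syllables)
    gap = disparity(altered, note_count)
    # Too few syllables: repeat the final syllable (stutter-like repetition).
    while gap > 0:
        altered.append(altered[-1])
        gap -= 1
    # Too many syllables: merge the last two syllables so they share one note.
    while gap < 0:
        last = altered.pop()
        altered[-1] += last
        gap += 1
    return altered
```

For example, `["hap", "py", "birth", "day"]` fitted to three notes becomes `["hap", "py", "birthday"]`. A single `"hi"` fitted to three notes becomes `["hi", "hi", "hi"]`.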
- in a further aspect, the disclosure describes a computer implemented method for automatically generating musical works.
- the computer implemented method may include receiving a lyrical input and receiving a musical input.
- the method may include analyzing, via one or more processors, the lyrical input to determine a lyrical characteristic, and analyzing, via the one or more processors, the musical input to determine a musical characteristic.
- the method may include comparing, via one or more processors, the lyrical characteristic with the musical characteristic to determine a disparity. Based on the determined disparity, the method may include automatically applying, via the one or more processors, at least one editing tool to the musical input to generate an altered musical input with an altered musical characteristic.
- the method may include correlating, via the one or more processors, the lyrical input with the altered musical input to generate a synthesizer input, and sending the synthesizer input to a voice synthesizer.
- the method may also include receiving, from the voice synthesizer, a vocal rendering of the lyrical input, and generating a musical work from the vocal rendering and the altered musical input.
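Conversely, the musical input can be edited to fit the lyrics. The sketch below assumes the same syllable-versus-note-count disparity and uses two hypothetical tools:

- It splits the longest note into two equal halves at the same pitch.
- It absorbs the shortest note (other than the first) into its predecessor.

`Note` and `fit_melody` are illustrative names only.

```python
from __future__ import annotations

from typing import NamedTuple


class Note(NamedTuple):
    pitch: int       # MIDI note number
    start: float     # onset, in beats
    duration: float  # length, in beats


def fit_melody(notes: list[Note], syllable_count: int) -> list[Note]:
    """Hypothetical editing tools: split or absorb notes until note count == syllable count."""
    if syllable_count <= 0 or not notes:
        raise ValueError("need at least one note and one syllable")
    altered = sorted(notes, key=lambda n: n.start)
    # Too few notes: split the longest note into two equal halves at the same pitch.
    while len(altered) < syllable_count:
        i = max(range(len(altered)), key=lambda k: altered[k].duration)
        note = altered[i]
        half = note.duration / 2
        altered[i:i + 1] = [
            Note(note.pitch, note.start, half),
            Note(note.pitch, note.start + half, half),
        ]
    # Too many notes: absorb the shortest note (after the first) into its predecessor.
    while len(altered) > syllable_count:
        i = min(range(1, len(altered)), key=lambda k: altered[k].duration)
        prev, cur = altered[i - 1], altered[i]
        extended = cur.start + cur.duration - prev.start
        altered[i - 1] = Note(prev.pitch, prev.start, extended)
        del altered[i]
    return altered
```

Splitting adds attack points without changing the pitch contour. Absorbing removes attack points while keeping the melody's total length. In both cases the melody is likely to stay recognizable.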
- FIG. 1 illustrates one exemplary embodiment of a network configuration in which a media generation system may be practiced in accordance with the disclosure
- FIG. 2 illustrates a flow diagram of an embodiment of a method of operating the media generation system in accordance with the disclosure
- FIG. 3 illustrates an embodiment of a playback slider bar in accordance with the disclosure
- FIG. 4 illustrates a block diagram of a device that supports the systems and processes of the disclosure
- FIG. 5 illustrates a flow diagram of another embodiment of a method of operating the media generation system in accordance with the disclosure
- FIG. 6 illustrates an exemplary graphical user interface for MIDI roll editing in accordance with the disclosure
- FIG. 7 illustrates an exemplary graphical user interface for applying tactile control in accordance with the disclosure
- FIG. 8 illustrates an exemplary graphical user interface for effects adjustment in accordance with the disclosure
- FIG. 9 illustrates a flow diagram of another embodiment of a method of operating the media generation system in accordance with the disclosure.
- FIG. 10 illustrates an exemplary graphical user interface in accordance with the disclosure
- FIG. 11 illustrates an exemplary graphical user interface in accordance with the disclosure
- FIG. 12 illustrates an exemplary graphical user interface in accordance with the disclosure
- FIG. 13 illustrates an exemplary graphical user interface in accordance with the disclosure
- FIG. 14 illustrates an exemplary graphical user interface in accordance with the disclosure
- FIG. 15 illustrates an exemplary graphical user interface in accordance with the disclosure
- FIG. 16 illustrates an exemplary graphical user interface in accordance with the disclosure
- FIG. 17 illustrates an exemplary graphical user interface in accordance with the disclosure.
- FIG. 18 illustrates an exemplary graphical user interface in accordance with the disclosure.
- the disclosure describes a system that may include an audio plugin for use in, for example, digital audio workstations.
- the system may combine at least a Musical Instrument Digital Interface (MIDI) melody or melodies with a typed or spoken user message to generate a vocal musical performance where the message may become lyrics sung to the MIDI melody.
- the system may receive a user or
- the collection of singers or vocalists may include selections from a variety of genres or musical styles, and aspects from those genres and musical styles may be incorporated into the generated vocal track.
- the message that is the subject of the generated lyrics may be anything from a few words to an entire song.
- the system may include additional controls to edit and alter the resultant vocal performance.
- X/Y axis controls may be used to control aspects of the musical output, such as embellishment or melisma, or slow glide versus auto tune.
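The X/Y controls could be modeled as a mapping from a normalized pad position to synthesis parameters. In this sketch the X axis sets how many ornament notes (melisma) are added per held note. The Y axis blends from a slow glide (long portamento) toward hard pitch correction. The `VocalStyle` fields, their ranges, and the constants are assumptions chosen for illustration.

```python
from dataclasses import dataclass

MAX_MELISMA_NOTES = 4   # hypothetical upper bound for ornament notes
MAX_GLIDE_MS = 300.0    # hypothetical longest portamento, in milliseconds


@dataclass(frozen=True)
class VocalStyle:
    melisma_notes: int          # extra ornament notes per held note
    glide_ms: float             # portamento time between notes
    correction_strength: float  # 0.0 = natural pitch, 1.0 = hard snapping to target pitch


def clamp01(value: float) -> float:
    return min(max(value, 0.0), 1.0)


def xy_to_style(x: float, y: float) -> VocalStyle:
    """Map a normalized X/Y pad position (0..1 on each axis) to vocal style parameters."""
    x, y = clamp01(x), clamp01(y)
    return VocalStyle(
        melisma_notes=round(x * MAX_MELISMA_NOTES),
        glide_ms=(1.0 - y) * MAX_GLIDE_MS,
        correction_strength=y,
    )
```

Tying glide time and correction strength to the same axis makes "slow glide versus auto tune" a single trade-off. Positions outside the pad are clamped.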
- some embodiments of the system may provide various effects that a user may implement manually or that may be implemented automatically, such as reverb, delay, compression, etc.
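Of the effects listed, delay is the simplest to show. The sketch below is a generic feedback delay (echo) applied to a mono list of samples. It is not the effect chain of the disclosure, and `delay_samples`, `feedback`, and `mix` are hypothetical parameter names.

```python
from __future__ import annotations


def feedback_delay(samples: list[float], delay_samples: int,
                   feedback: float = 0.5, mix: float = 0.5) -> list[float]:
    """Echo effect: a delay line whose output is fed back into its input."""
    if delay_samples <= 0:
        raise ValueError("delay_samples must be positive")
    if not 0.0 <= feedback < 1.0:
        raise ValueError("feedback must be in [0, 1) to stay stable")
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be in [0, 1]")
    line = [0.0] * len(samples)  # values written into the delay line
    out = []
    for n, dry in enumerate(samples):
        delayed = line[n - delay_samples] if n >= delay_samples else 0.0
        line[n] = dry + feedback * delayed
        out.append((1.0 - mix) * dry + mix * delayed)
    return out
```

Feed a single impulse through a 2-sample delay with `feedback=0.5` and `mix=0.5`. The output is `[0.5, 0.0, 0.5, 0.0, 0.25]`: the dry impulse, followed by echoes that halve on each repeat.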
- the present disclosure may also relate to a system and method for automatically generating musical outputs based on various user inputs and/or selections.
- the system may include a software plugin that may be used with existing audio and/or visual editing or composition software or hardware.
- the system may include independent software that may be run on any suitable computing device, such as a smartphone, a desktop computer, a laptop computer, etc.
- the device may be part of a network that includes remote servers that conduct all or parts of the musical output generation.
- the system may include an interface, such as a graphical user interface, with which a user may interact in providing and/or selecting a musical input.
- the musical input may be any of a variety of input types, such as a MIDI input, an audio recording, a prerecorded MIDI file, etc.
- the system may analyze the musical input, and the musical input may define all or part of the melody or melodies for the generated musical output.
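When the musical input is polyphonic MIDI, a single melody line must first be extracted. The sketch below swaps in a simplified version of the well-known skyline reduction. It keeps the highest note at each onset and truncates notes so they do not overlap. The disclosure does not say that this technique is used, and `NoteEvent` and `skyline` are hypothetical names.

```python
from __future__ import annotations

from typing import NamedTuple


class NoteEvent(NamedTuple):
    pitch: int    # MIDI note number
    start: float  # onset, in beats
    end: float    # release, in beats


def skyline(notes: list[NoteEvent]) -> list[NoteEvent]:
    """Simplified skyline reduction: keep the highest note at each onset."""
    top_by_onset: dict[float, NoteEvent] = {}
    for note in notes:
        best = top_by_onset.get(note.start)
        if best is None or note.pitch > best.pitch:
            top_by_onset[note.start] = note
    ordered = [top_by_onset[t] for t in sorted(top_by_onset)]
    melody = []
    for i, note in enumerate(ordered):
        end = note.end
        if i + 1 < len(ordered):
            # Truncate at the next onset so the melody stays monophonic.
            end = min(end, ordered[i + 1].start)
        melody.append(NoteEvent(note.pitch, note.start, end))
    return melody
```

A C major chord followed by a high note over a bass note reduces to the top of the chord and then the high note. This simplified version has a known limitation: a lower note that starts while a higher note is still sounding cuts that higher note short.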
- the user may also provide a lyrical input using any suitable input device, such as a keyboard, a touchscreen, a control pad, a microphone, etc.
- the user may provide the lyrical input by speaking and allowing voice recognition to translate the speech into text for the system to use as the lyrical input.
- the system may then analyze the lyrical input along with the musical input and provide a musical output using the words or sound in the lyrical input as the lyrics of the musical output, the lyrics being sung to the melody of the musical input.
- the user may also select a singer having a voice or style upon which the musical output may be based.
- the singer's style and/or voice may be modeled by the system in such a way as to provide a musical output of the lyrics in the melody of the musical input as if it were being sung by the selected singer.
- the system may include a collection of singers for which models are available.
- the user may select the singer via a graphical user interface, or any other suitable selection mechanism, such as voice commands or textual input.
- the singers may be existing singers or vocalists whose voices and styles have been modeled, or the singers may be fictional characters with voices and styles having been assigned to them.
- a musical input may be analyzed to produce a musical output sounding like the selected singer singing the words of the lyrical input to the tune of the musical input with the voice and style of the selected singer.
- all or parts of the system and the software included in the system may be implemented in a variety of applications, including via instant messages, via voice-command computer interfaces such as the Amazon Echo, Google Home, or Apple Siri voice command systems, via chat bots, and via filters on third-party or original applications.
- Features of the system may also be used or integrated into systems to create personal music videos and messages, ringback tones, emojis that sing messages, etc.
- the present disclosure may relate to a system and method for creating a message containing an audible musical and/or video composition that can be transmitted to users via a variety of messaging formats, such as SMS, MMS, and e-mail. It may also be possible to send such musical composition messages via various social media platforms and formats, such as Twitter®, Facebook®, Instagram®, or any other suitable media sharing system.
- the disclosed media generation system provides users with an intuitive and convenient way to automatically create and send original works based on infinitely varied user inputs. For example, the disclosed system can receive lyrical input from a user in the form of a text chain, along with the user's selection of a musical work or melody that is prerecorded or recorded and provided by the user.
- the media generation system can analyze and parse both the text chain and the selected musical work to create a vocal rendering of the text chain paired with a version of the musical work to provide a musically-enhanced version of the lyrical input by the user.
- the output of the media generation system can provide a substantial variety of musical output while maintaining user recognition of the selected musical work. The user can then, if desired, share the musical message with others via social media, SMS or MMS messaging, or any other form of file sharing or electronic communication.
- the user can additionally record video to accompany the musically enhanced text.
- the video can be recorded in real-time along with a vocal rendering of the lyrical input provided by the user in order to effectively match the video to the musical message created by the system.
- pre-recorded video can be selected and matched to the musical message.
- in such embodiments, the result of the system may be an original lyric video created using only a client device, such as a smartphone or tablet, connected to a server via a network, and requiring little or no specialized technical skill or knowledge.
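For a lyric video, the sung words need on-screen timings. The sketch below assumes word events timed in beats at a fixed tempo and emits SubRip (SRT) subtitle cues that a video editor could overlay. The event format and the function names are assumptions. A tempo map with tempo changes would need more work.

```python
from __future__ import annotations


def beats_to_seconds(beats: float, bpm: float) -> float:
    return beats * 60.0 / bpm


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def lyric_cues(words: list[tuple[str, float, float]], bpm: float) -> str:
    """Build SRT cues from (text, start_beat, end_beat) tuples at a fixed tempo."""
    blocks = []
    for index, (text, start, end) in enumerate(words, start=1):
        start_ts = srt_timestamp(beats_to_seconds(start, bpm))
        end_ts = srt_timestamp(beats_to_seconds(end, bpm))
        blocks.append(f"{index}\n{start_ts} --> {end_ts}\n{text}\n")
    return "\n".join(blocks)
```

At 120 BPM one beat lasts 0.5 s. The word "Happy" held for beat 0 to beat 1 therefore becomes the cue `00:00:00,000 --> 00:00:00,500`.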
- FIG. 1 illustrates an exemplary embodiment of a network configuration in which the disclosed system 100 can be implemented. It is contemplated herein, however, that not all of the illustrated components may be required to implement the system, and that variations in the arrangement and types of components can be made without departing from the spirit or scope of the invention.
- the illustrated embodiment of the system 100 includes local area networks ("LANs") / wide area networks ("WANs") (collectively network 106), wireless network 110, client devices 101-105, server 108, media database 109, and peripheral input/output (I/O) devices 111, 112, and 113.
- client devices 101-105 may include virtually any computing device capable of processing and sending audio, video, textual data, or any other communication over a network, such as network 106, wireless network 110, etc.
- the wireless network 110 and the network 106 can be a digital communications network.
- Client devices 101-105 may also include devices that are configured to be portable.
- client devices 101-105 may include virtually any portable computing device capable of connecting to another computing device and receiving information.
- Client devices 101-105 may also include virtually any computing device capable of communicating over a network to send and receive information, including track information and social networking information, performing audibly generated track search queries, or the like.
- the set of such devices may include devices that typically connect using a wired or wireless communications medium such as personal computers, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, or the like.
- at least some of client devices 101-105 may operate over wired and/or wireless networks.
- a client device 101-105 can be web-enabled and may include a browser application that is configured to receive and to send web pages, web-based messages, and the like.
- the browser application may be configured to receive and display graphics, text, multimedia, video, etc., and can employ virtually any web-based language, including Wireless Application Protocol (WAP) messages, and the like.
- the browser application is enabled to employ Handheld Device Markup Language (HDML), Wireless Markup Language (WML), WMLScript, JavaScript, Standard Generalized Markup Language (SGML), HyperText Markup Language (HTML), and eXtensible Markup Language (XML).
- a user of the client device may employ the browser application to interact with a messaging client, such as a text messaging client, an email client, or the like, to send and/or receive messages.
- Client devices 101-105 also may include at least one other client application that is configured to receive content from another computing device.
- the client application may include a capability to provide and receive multimedia content, such as textual content, graphical content, audio content, video content, etc.
- the client application may further provide information that identifies itself, including a type, capability, name, and the like.
- client devices 101-105 may uniquely identify themselves through any of a variety of mechanisms, including a phone number, Mobile Identification Number (MIN), an electronic serial number (ESN), or other mobile device identifier.
- the information may also indicate a content format that the mobile device is enabled to employ. Such information may be provided in, for example, a network packet or other suitable form, sent to server 108, or other computing devices.
- the media database 109 may be configured to store various media, such as musical clips and files, and the information stored in the media database may be accessed by the server 108 or, in other embodiments, accessed directly by other computing devices over the network 106 or wireless network 110.
- Client devices 101-105 may further be configured to include a client application that enables the end-user to log into a user account that may be managed by another computing device, such as server 108.
- a user account may be configured to enable the end-user to participate in one or more social networking activities, such as submit a track or a multi-track recording or video, search for tracks or recordings, download a multimedia track or other recording, and participate in an online music community.
- participation in various networking activities may also be performed without logging into the user account.
- Wireless network 110 is configured to couple client devices 103-105 and their components with network 106.
- Wireless network 110 may include any of a variety of wireless sub-networks that may further overlay stand-alone ad-hoc networks, and the like, to provide an infrastructure-oriented connection for client devices 103-105.
- Such sub-networks may include mesh networks, Wireless LAN (WLAN) networks, cellular networks, and the like.
- Wireless network 110 may further include an autonomous system of terminals, gateways, routers, etc., connected by wireless radio links, or other suitable wireless communication protocols. These connectors may be configured to move freely and randomly and organize themselves arbitrarily, such that the topology of wireless network 110 may change rapidly.
- Wireless network 110 may further employ a plurality of access technologies including 2nd (2G), 3rd (3G), 4th (4G) generation, and 4G Long Term Evolution (LTE) radio access for cellular systems, WLAN, Wireless Router (WR) mesh, and other suitable access technologies.
- Access technologies such as 2G, 3G, 4G, 4G LTE, and future access networks may enable wide area coverage for mobile devices, such as client devices 103-105 with various degrees of mobility.
- wireless network 110 may enable a radio connection through a radio network access such as Global System for Mobile Communications (GSM), General Packet Radio Services (GPRS), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (WCDMA), etc.
- GSM Global System for Mobile Communications
- GPRS General Packet Radio Services
- EDGE Enhanced Data GSM Environment
- WCDMA Wideband Code Division Multiple Access
- wireless network 110 may include virtually any wireless communication mechanism by which information may travel between client devices 103-105 and another computing device, network, and the like.
- Network 106 is configured to couple network devices with other computing devices, including server 108, client devices 101-102, and, through wireless network 110, client devices 103-105.
- Network 106 is enabled to employ any form of computer readable media for communicating information from one electronic device to another.
- network 106 can include the Internet in addition to local area networks (LANs), wide area networks (WANs), direct connections, such as through a universal serial bus (USB) port, other forms of computer-readable media, or any combination thereof.
- LANs local area networks
- WANs wide area networks
- USB universal serial bus
- a router acts as a link between LANs, enabling messages to be sent from one to another.
- communication links within LANs typically include twisted wire pair or coaxial cable
- communication links between networks may utilize analog telephone lines, full or fractional dedicated digital lines including T1, T2, T3, and T4, Integrated Services Digital Networks (ISDNs), Digital Subscriber Lines (DSLs), wireless links including satellite links, or other communications links known to those skilled in the art.
- ISDNs Integrated Services Digital Networks
- DSLs Digital Subscriber Lines
- remote computers and other related electronic devices could be remotely connected to either LANs or WANs via a modem and temporary telephone link.
- network 106 includes any communication method by which information may travel between computing devices.
- client devices 101-105 may directly communicate, for example, using a peer-to-peer configuration.
- communication media typically embodies computer-readable instructions, data structures, program modules, or other transport mechanism and includes any information delivery media.
- communication media includes wired media such as twisted pair, coaxial cable, fiber optics, wave guides, and other wired media and wireless media such as acoustic, RF, infrared, and other wireless media.
- Various peripherals including I/O devices 111-113 may be attached to client devices 101-105.
- Multi-touch pressure pad 113 may receive physical inputs from a user and may be distributed as a USB peripheral, although it is not limited to USB; other interface protocols may also be used, including but not limited to ZIGBEE, BLUETOOTH, near field communication (NFC), or other suitable connections.
- Data transported over the external interface of pressure pad 113, using its interface protocol, may include, for example, MIDI formatted data, though data of other formats may be conveyed over this connection as well.
- a similar pressure pad may alternately be bodily integrated with a client device, such as mobile devices 104 or 105.
- a headset 112 may be attached to an audio port or other wired or wireless I/O interface of a client device, providing an exemplary arrangement for a user to listen to playback of a composed message, along with other audible outputs of the system.
- Microphone 111 may be attached to a client device 101-105 via an audio input port or other connection as well.
- one or more speakers and/or microphones may be integrated into one or more of the client devices 101-105 or other peripheral devices 111-113.
- an external device may be connected to pressure pad 113 and/or client devices 101-105 to provide an external source of sound samples, waveforms, signals, or other musical inputs that can be reproduced by external control.
- Such an external device may be a MIDI device to which a client device 103 and/or pressure pad 113 may route MIDI events or other data in order to trigger the playback of audio from the external device.
- formats other than MIDI may be employed by such an external device.
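- As an illustration of the MIDI events a pad might route to an external device, the sketch below builds standard MIDI 1.0 Note On/Note Off channel messages (status byte 0x90 or 0x80 combined with a 0-15 channel, followed by two 7-bit data bytes). The pressure-to-velocity mapping and the `pad_press` helper are hypothetical, not taken from the source.

```python
# Minimal sketch of MIDI 1.0 channel-voice messages a pressure pad might send.
# The pressure-to-velocity mapping and default note are hypothetical.

NOTE_ON = 0x90   # Note On status nibble; low nibble carries the channel (0-15)
NOTE_OFF = 0x80  # Note Off status nibble


def _check(channel: int, *data: int) -> None:
    if not 0 <= channel <= 15:
        raise ValueError("MIDI channel must be 0-15")
    if any(not 0 <= d <= 127 for d in data):
        raise ValueError("MIDI data bytes must be 0-127")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    _check(channel, note, velocity)
    return bytes([NOTE_ON | channel, note, velocity])


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    _check(channel, note, velocity)
    return bytes([NOTE_OFF | channel, note, velocity])


def pad_press(pressure: float, note: int = 60, channel: int = 0) -> bytes:
    """Map normalized pad pressure (0.0-1.0) to a Note On message.

    Velocity is clamped to 1-127 because a Note On with velocity 0 is
    conventionally interpreted as a Note Off.
    """
    velocity = max(1, min(127, round(pressure * 127)))
    return note_on(channel, note, velocity)


# A medium press on the pad, routed on channel 0 to middle C (note 60):
print(pad_press(0.5).hex())  # 903c40
```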
- FIG. 2 is a flow diagram illustrating an embodiment of a method 200 for operating the media generation system 100, with references made to the components shown in FIG. 1.
- the system can receive a lyrical input at 204.
- the text or lyrical input may be input by the user via an electronic device, such as a PC, tablet, or smartphone, or any other of the client devices 101-105 described in reference to FIG. 1 or other suitable devices.
- the text may be input in the usual fashion in any of these devices (e.g., manual input using soft or mechanical keyboards, touch-screen keyboards, speech-to-text conversion).
- the text or lyrical input is provided through a specialized user interface application accessed using the client device 101-105.
- the lyrical input could be delivered via a general application for transmitting text-based messages using the client device 101-105.
- the resulting lyrical input may be transmitted over the wireless communications network 110 and/or network 106 to be received by the server 108 at 204.
- the system 100 may analyze the lyrical input using server 108 to determine certain characteristics of the lyrical input. In some embodiments, however, it is contemplated that analysis of the lyrical input could alternatively take place on the client device 101-105 itself instead of or in parallel to the server 108.
- the lyrical input is parsed into the speech elements of the text with a speech parser.
- the speech parser may identify important words (e.g., love, anger, crazy), demarcate phrase boundaries (e.g., "I miss you." "I love you." "Let's meet." "That was an awesome concert.") and/or identify slang terms (e.g., chill, hang). Words considered as important can vary by region or language, and can be updated over time to coincide with the contemporary culture. Similarly, slang terms can vary geographically and temporally such that the media generation system 100 is updatable and customizable.
- Punctuation or other symbols used in the lyrical input can also be identified and attributed to certain moods or tones that can influence the analytical parsing of the text. For example, an exclamation point could indicate happiness or urgency, while a "sad-face" emoticon could indicate sadness or romance.
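- A toy version of this parsing step is sketched below. It splits the text into phrases at sentence-ending punctuation, flags words found in small lists of important and slang terms, and maps punctuation or emoticons to candidate moods. The word lists and the mood table are hypothetical stand-ins for the updatable, region-specific data the text describes.

```python
import re

# Hypothetical, updatable cue lists; the source notes that important words and
# slang vary by region, language, and time.
IMPORTANT_WORDS = {"love", "anger", "crazy"}
SLANG_TERMS = {"chill", "hang"}
# Punctuation/emoticon cues mapped to candidate moods (illustrative only).
MOOD_CUES = {"!": "happiness/urgency", ":(": "sadness/romance"}


def parse_lyrics(text: str) -> dict:
    # Phrase boundaries: split after sentence-ending punctuation.
    phrases = [p.strip() for p in re.findall(r"[^.!?]+[.!?]*", text) if p.strip()]
    # Lower-cased word tokens (apostrophes kept so "let's" stays one word).
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "phrases": phrases,
        "important": [w for w in words if w in IMPORTANT_WORDS],
        "slang": [w for w in words if w in SLANG_TERMS],
        "moods": [mood for cue, mood in MOOD_CUES.items() if cue in text],
    }


print(parse_lyrics("I love you. Let's chill!"))
```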
- the words or lyrics conveyed in the lyrical input can also be processed into their component pieces by breaking words down into syllables, and further by breaking the syllables into a series of phonemes.
- the phonemes are used to create audio playback of the words or lyrics in the lyrical input. Additional techniques used to analyze the lyrical input are described in greater detail below.
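- The word-to-syllable-to-phoneme breakdown can be pictured as a lookup in a pronunciation lexicon. The sketch assumes a tiny hypothetical lexicon written in ARPAbet-style symbols (the convention of the CMU Pronouncing Dictionary) and falls back to a rough vowel-group estimate for unknown words; a production system would use a full lexicon or a trained grapheme-to-phoneme model.

```python
import re

# Hypothetical mini-lexicon: each word maps to syllables, each syllable to
# ARPAbet-style phonemes.
LEXICON = {
    "miss": [["M", "IH1", "S"]],
    "you": [["Y", "UW1"]],
    "hello": [["HH", "AH0"], ["L", "OW1"]],
    "concert": [["K", "AA1", "N"], ["S", "ER0", "T"]],
}


def syllables(word: str):
    """Return the word's syllables as phoneme lists, or None if not in the lexicon."""
    return LEXICON.get(word.lower())


def phonemes(word: str):
    """Flatten a known word into the phoneme sequence used for audio playback."""
    sylls = syllables(word)
    return None if sylls is None else [p for syl in sylls for p in syl]


def syllable_count(word: str) -> int:
    """Lexicon count when available; otherwise a rough vowel-group estimate."""
    sylls = syllables(word)
    if sylls is not None:
        return len(sylls)
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and n > 1:
        n -= 1  # treat a final "e" as silent, e.g. "smile"
    return max(1, n)
```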
- the system may receive a selection of a musical input transmitted from the client device 101-105.
- a user interface may be implemented to select the musical input from a list or library of pre-recorded and catalogued musical works or clips of musical works that may comprise one or more musical phrases.
- a musical phrase may be a grouping of musical notes or connected sounds that exhibits a complete musical "thought," analogous to a linguistic phrase or sentence.
- the list of available musical works or phrase may include, for example, a text-based description of the song title, performing artists, genre, and/or mood set by phrase, to name only a few possible pieces of information that could be provided to users via the user interface.
- the user may then choose the desired musical work or clip for the media generation system to combine with the lyrical input.
- there may be twenty or more pre-recorded and selected musical phrases for the user to choose from.
- the pre-recorded musical works or phrases may be stored on the server 108 or media database 109 in any suitable computer readable format, and accessed via the client device 101-105 through the network 106 and/or wireless network 110.
- the pre-recorded musical works may be stored directly onto the client device 101-105 or another local memory device, such as a flash drive or other computer memory device. Regardless of the storage location, the list of pre-recorded musical works can be updated over time, removing or adding musical works in order to provide the user with new options and additional choices.
- a user may create their own melodies for use in association with the media generation system.
- One or more melodies may be created using the technology disclosed in U.S. Patent No. 8,779,268 entitled "System and Method for Producing a More Harmonious Musical Accompaniment Graphical User Interface for a Display Screen System and Method that Ensures Harmonious Musical Accompaniment," assigned to the assignee of the present application.
- Such patent disclosure is hereby incorporated by reference, in full.
- a user may generate a musical input using an input device 111-113, such as a MIDI instrument or other device for inputting user-created musical works or clips.
- a user may use a MIDI keyboard to generate a musical riff or entire song to be used as the musical input.
- a user may create an audio recording by playing notes on a more traditional, non-MIDI instrument, such as a piano or a guitar. The audio recording may then be analyzed for pitch, tempo, etc., to utilize the audio recording as the musical input.
- individual entries in the list of musical input options are selectable to provide, via the client device 101-105, a pre-recorded musical work (either stored or provided by the user), or a clip thereof, as a preview to the user.
- the user interface associated with selecting a musical work includes audio playback capabilities to allow the user to listen to the musical clip in association with their selection of one of the musical works as the musical input.
- such playback capability may be associated with a playback slider bar that graphically depicts the progressing playback of the musical work or clip.
- One illustrative example of a playback slider bar 300 is shown in FIG. 3.
- the illustrated playback slider bar 300 may include a start 302, an end 304, and a progress bar 306 disposed between the start and end. It should be understood, however, that other suitable configurations are contemplated in other embodiments.
- in the embodiment illustrated in FIG. 3, the total length of the selected musical work or clip is 14.53 seconds, as shown at the end 304, though it should be understood that any suitable length of musical work or clip is contemplated.
- a progress indicator 308 moves across the progress bar 306 from the start 302 to end 304.
- the progress bar "fills in" as the progress indicator 308 moves across, resulting in a played portion 310 disposed between the start 302 and the progress indicator and an unplayed portion 312 disposed between the progress indicator and the end 304 of the musical clip.
- the progress indicator 308 has progressed across the progress bar 306 to the 6.10 second mark in the selected musical clip.
- although FIG. 3 shows the progress bar 306 being filled in as the progress indicator 308 moves across it, other suitable mechanisms for indicating playback progress of a musical work or clip are also contemplated herein.
- the user may place brackets, such as a first bracket 314 and a second bracket 316, around a subset of the selected musical phrase/melody along the progress bar 306.
- the brackets 314, 316 may indicate the portions of the musical work or clip to be utilized as the musical input at 208 in FIG. 2.
- the first bracket 314 may indicate the "start" point for the selected musical input
- the second bracket 316 may indicate the "end" point.
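- The slider and bracket behavior reduces to two small calculations: the filled fraction of the progress bar, and the subset of the clip lying between the brackets. The sketch assumes the clip is available as timed notes; the `Note` type and both function names are hypothetical. With the values in FIG. 3, 6.10 s of a 14.53 s clip fills about 42% of the bar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    onset: float     # seconds from the start of the clip
    duration: float  # seconds
    pitch: int       # MIDI note number


def progress_fraction(position: float, total: float) -> float:
    """Share of the progress bar to draw as the played portion."""
    if total <= 0:
        raise ValueError("total length must be positive")
    return min(max(position / total, 0.0), 1.0)


def select_bracketed(notes, first: float, second: float):
    """Keep notes that start between the two brackets, re-timed so the
    earlier bracket becomes time zero of the musical input."""
    start, end = sorted((first, second))  # brackets may be placed in either order
    return [
        Note(n.onset - start, n.duration, n.pitch)
        for n in notes
        if start <= n.onset < end
    ]
```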
- Other potential user interfaces that may facilitate user playback and selection of a subset of the musical phrase may be used instead of or in conjunction with the embodiment of the playback slider bar 300 of FIG. 3.
- the client device 101-105 may transmit the selection over the network 106 and/or wireless network 110, which may be received by the server 108 as the musical input at 208 of FIG. 2.
- the musical input may be analyzed and processed in order to identify certain characteristics and patterns associated with the musical input so as to more effectively match the musical input with the lyrical input to produce an original musical composition for use in a message or otherwise.
- analysis and processing of the musical work includes "reducing" or "embellishing" the musical work.
- the selected musical work may be parsed for features such as structurally important notes, rhythmic signatures, and phrase boundaries.
- each musical work or clip may optionally be embellished or reduced, either adding a number of notes to the phrase in a musical way (embellish), or removing them (reduce), while still maintaining the idea and recognition of the original melody in the musical input.
- embellishments or reductions may be performed in order to align the textual phrases in the lyrical input with the musical phrases by aligning their boundaries, and also to provide the musical material necessary for the alignment of the syllables of individual words to notes resulting in a natural musical expression of the input text.
- all or part of the analysis of the pre-recorded musical works may have already been completed, enabling the media generation system to merely retrieve the pre-analyzed data from the media database 109 for use in completing the musical composition.
- the process of analyzing the musical work in preparation for matching with the lyrical input and for use in the musical message is set forth in more detail below.
- the lyrical input and the musical input may be correlated with one another based on the analyses of both the lyrical input and the musical input at 206 and 210.
- the notes of the selected and analyzed musical work are intelligently and automatically assigned to one or more phonemes in the input text, as described in more detail below.
- the resulting data correlating the lyrical input to the musical input may then be formatted into a synthesizer input at 214 for input into a voice synthesizer.
- the formatted synthesizer input, in the form of pairs of text syllables and melodic notes, may then be sent to a voice synthesizer at 216 to create a vocal rendering of the lyrical input for use in an original musical work that incorporates characteristics of the lyrical input and the musical input.
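- The synthesizer input is described only as syllable/note pairs. A minimal sketch, assuming a JSON payload, is shown below; the field names (`tempo_bpm`, `pairs`, `syllable`, `pitch`, `beats`) are hypothetical and do not correspond to any particular synthesizer's API. The length check reflects the one-note-per-syllable pairing that the correlation step is meant to produce.

```python
import json


def format_synth_input(syllables, notes, tempo_bpm=120):
    """Serialize syllable/note pairs for a voice synthesizer.

    syllables: list of text syllables; notes: list of (midi_pitch, beats).
    The JSON schema here is hypothetical.
    """
    if len(syllables) != len(notes):
        raise ValueError("need exactly one note per syllable; reduce or embellish first")
    pairs = [
        {"syllable": syl, "pitch": pitch, "beats": beats}
        for syl, (pitch, beats) in zip(syllables, notes)
    ]
    return json.dumps({"tempo_bpm": tempo_bpm, "pairs": pairs})


print(format_synth_input(["I", "miss", "you"], [(60, 1.0), (62, 1.0), (64, 2.0)]))
```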
- the musical message or vocal rendering may then be received by the server 108 at 218.
- the generated musical work may be received in the form of an audio file including a vocal rendering of the lyrical input entered by the user correlating with the music/melody of the musical input, either selected or created.
- the voice synthesizer may generate the entire musical work including the vocal rendering of the lyrical input and the musical portion from the musical input.
- the voice synthesizer may generate only a vocal rendering of the input text created based on the synthesizer input, which may be generated by analyzing the lyrical input and the musical input described above.
- a musical rendering based on the musical input, or the musical input itself may be combined with the vocal rendering to generate a musical work.
- the voice synthesizer may be any suitable vocal renderer.
- the voice synthesizer may be cloud-based with support from a web server that provides security, load balancing, and the ability to accept inbound messages and send outbound musically-enhanced messages.
- the vocal renderer may be run locally on the server 108 itself or on the client device 101-105.
- the voice synthesizer may render the formatted lyrical input data to provide a text-to-speech conversion as well as singing speech synthesis.
- the vocal renderer may provide the user with a choice of a variety of voices, a variety of voice synthesizers (including but not limited to HMM-based, diphone or unit-selection based), or a choice of human languages.
- Some examples of the choices of singing voices are gender (e.g., male/female), age (e.g., young/old), nationality or accent (e.g., American accent/British accent), or other distinguishing vocal characteristics (e.g., sober/drunk, yelling/whispering, seductive, anxious, robotic, etc.).
- these choices of voices may be implemented through one or more speech synthesizers each using one or more vocal models, pitches, cadences, and other variables that may result in perceptively different sung attributes.
- the choice of voice synthesizer may be made automatically by the system based on analysis of the lyrical input and/or the musical input for specific words or musical styles indicating mood, tone, or genre.
- the system may provide harmonization to accompany the melody. Such accompaniment may be added into the message in the manner disclosed in U.S. Patent No. 8,779,268, incorporated by reference above.
- the user may have the option of adding graphical elements to the musical work at 219. If selected, graphical elements may be chosen from a library of pre-existing elements stored either at the media database 109, on the client device 101-105 itself, or both. In another embodiment, the user may create their own graphical element for inclusion in a generated multimedia work. In yet other embodiments, graphic elements may be generated automatically without the user needing to specifically select them.
- graphics may be colors and light flashes that correspond to the music in the musical work, animated figures or characters spelling out all or portions of textual message or lyrics input by the user, or other animations or colors that may be automatically determined to correspond with the tone of the musical input or with the tone of the lyrical input itself as determined by analysis of the lyrical input.
- a graphical input indicating this selection may be transmitted to and received by the server 108 at 220.
- the graphical element may then be generated at 222 using either the pre-existing elements selected by the user, automatic elements chosen by the system based on analysis of the lyrical input and/or the musical input, or graphical elements provided by the user.
- the user may choose, at 224, to include a video element to be paired with the musical work, or to be stored along with the musical work in the same media file output.
- the user interface may activate one or more cameras that may be integrated into the client device 101-105 to capture video input, such as front-facing or rear-facing cameras on a smartphone or other device.
- the user may manipulate the user interface on the client device to record video inputs to be incorporated into the generated musical work.
- the user interface displayed on the client device 101-105 may provide playback of the generated musical work while the user captures the video inputs allowing the user to coordinate particular features of the video inputs with particular portions of the musical work.
- the user interface may display the text of the lyrical input on the device's screen with a progress indicator moving across the text during playback so as to provide the user with a visual representation of the musical work's progress during video capture.
- the user interface may allow the user to stop and start video capture as desired throughout playback of the musical work, while simultaneously stopping playback of the musical work.
- One such way of providing this functionality may be by capturing video while the user touches a touchscreen or other input of the client device 101-105, and at least temporarily pausing video capture when the user releases the touchscreen or other input.
- the system may allow the user to capture certain portions of the video input during a first portion of the musical work, pause the video capture and playback of the musical work when desired, and then continue capture of another portion of the video input to correspond with a second portion of the musical work.
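- One way to model this press-and-hold capture is a small state machine that records, for each captured clip, the span of the musical work it accompanies, so the clips can later be played back in capture order against the matching portions of the music. This is a sketch under assumptions: `HoldToRecord` and its methods are hypothetical names, and a real app would drive them from touch events and a media clock rather than explicit `advance()` calls.

```python
class HoldToRecord:
    """Capture video only while the screen is touched, pausing music playback
    in lockstep, and log which span of the musical work each clip covers."""

    def __init__(self):
        self.music_pos = 0.0     # current playback position in seconds
        self.recording = False
        self._seg_start = None
        self.segments = []       # (music_start, music_end) per captured clip

    def touch_down(self):
        # Resume playback and start a new video segment.
        if not self.recording:
            self.recording = True
            self._seg_start = self.music_pos

    def advance(self, dt: float):
        # Playback (and capture) only progresses while the user holds the screen.
        if self.recording:
            self.music_pos += dt

    def touch_up(self):
        # Pause playback and close the current segment.
        if self.recording:
            self.recording = False
            self.segments.append((self._seg_start, self.music_pos))
```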
- the user interface may provide the option of editing the video input by re-capturing portions of or the entirety of the video input.
- the video input may be transmitted to and received by the server 108 for processing at 226.
- the video input may then be processed to generate a video element at 228, and the video element may then be incorporated into the musical work to generate a multimedia musical work.
- the video element may be synced and played along with the musical work corresponding to an order in which the user captured the portions of the video input.
- processing and video element generation may be completed on the client device 101-105 itself without the need to transmit video input to the server 108.
- the musical work or multimedia work may be transmitted or outputted, at 230, to the client device 101-105 over the network 106 and/or wireless network 110.
- the musical work may be outputted to speakers and/or speakers combined with a visual display.
- the system may provide the user with the option of previewing the musical or multimedia work at 232.
- the musical or multimedia work may be played at 234 via the client device 101-105 for the user to review.
- the user may be provided with the option to cancel the work without sending or otherwise storing, or to edit the work further.
- the user may store the work as a media file, send the work as a musical or multimedia message to a selected message recipient, etc., at 235.
- the musical or multimedia work may be sent to one or more recipients using a variety of communications and social media platforms, such as SMS or MMS messaging, e-mail, Facebook®, Twitter®, and Instagram®, so long as the messaging service/format supports the transmission, delivery, and playback of audio and/or video files.
- a method of generating a musical work may additionally include receiving a selection of a singer corresponding to at least one voice characteristic.
- the at least one voice characteristic may be indicative of a particular real-life or fictional singer with a particular recognizable style. For example, a particular musician may have a recognizable voice due to a specific twang, falsetto, vocal range, vibrato style, etc.
- the at least one voice characteristic may be incorporated into the performance of the musical work. It is contemplated that, in some embodiments, the at least one voice characteristic may be included in the formatted data sent to the voice synthesizer at 216 of the method 200 in FIG. 2. However, it is also contemplated that the at least one voice characteristic may be incorporated into the vocal rendering received from the voice synthesizer.
- the methodology disclosed herein provides technical solutions to technical problems associated with correlating lyrical inputs with musical inputs such that the musical output of the correlation of the two inputs is matched effectively. Further, the methods and features described herein can operate to improve the functional ability of the computer or server to process certain types of information in a way that makes the computer more usable and functional than would otherwise be possible without the operations and systems described herein.
- the media generation system may gather and manipulate text and musical inputs in such a way to assure system flexibility, scalability, and effectiveness.
- collection and analysis of data points relating to the lyrical input and musical input is implemented to improve the computer and the system's ability to effectively correlate the musical and lyrical inputs.
- Some data points determined and used by the system in analyzing and processing a lyrical input, such as in step 206, may be the number of characters, or character count ("CC"), and the number of words, or word count ("WC"), included in the lyrical input. Any suitable method may be used to determine the CC and WC.
- the system may determine WC by counting spaces between groups of characters, or by recognizing words in groups of characters by reference to a database of known words in a particular language or selection of languages.
- Other data points determined by the system during analysis of the lyrical input may be the number of syllables, or syllable count ("TC") and the number of sentences, or sentence count ("SC").
- TC and SC may be determined in any suitable manner, for example, by analyzing punctuation and spacing for SC, or parsing words into syllables by reference to a word database stored in the media database 109 or elsewhere.
- the system may analyze and parse the input text to determine values such as the CC, WC, TC, and SC. In some embodiments, this parsing may be conducted at the server 108, but it is also contemplated that, in some embodiments, parsing of the input text may be conducted on the client device 101-105. In certain embodiments, during analysis, the system may insert coded start flags and end flags at the beginning and end of each word, syllable, and sentence to mark the determinations made during analysis.
- a start flag at the beginning of a sentence may be referred to as the sentence start ("SS"), and the location of the end flag at the end of a sentence may be referred to as the sentence end ("SE").
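- A sketch of computing CC, WC, TC, and SC, with the SS/SE flags represented as character offsets, follows. Several choices are assumptions rather than the patent's method: CC counts every character including spaces and punctuation, WC counts whitespace-separated groups, sentences end at ".", "!" or "?", and TC uses a crude vowel-group heuristic in place of the word database the text describes.

```python
import re


def _syllables(token: str) -> int:
    # Rough vowel-group heuristic standing in for a syllable database.
    n = len(re.findall(r"[aeiouy]+", token))
    if token.endswith("e") and n > 1:
        n -= 1  # treat a final "e" as silent
    return max(1, n)


def analyze_text(text: str) -> dict:
    """Compute CC, WC, TC, SC and the SS/SE character offsets of each sentence."""
    # Sentences end at ., ! or ? followed by whitespace (or the end of the text).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    flags, pos = [], 0
    for s in sentences:
        ss = text.index(s, pos)  # SS: offset of the sentence's first character
        se = ss + len(s)         # SE: offset just past its last character
        flags.append((ss, se))
        pos = se
    words = text.split()         # WC: whitespace-separated character groups
    letters = [re.sub(r"[^a-z']", "", w.lower()) for w in words]
    return {
        "CC": len(text),         # assumption: spaces and punctuation are counted
        "WC": len(words),
        "TC": sum(_syllables(w) for w in letters if w),
        "SC": len(sentences),
        "SS_SE": flags,
    }


print(analyze_text("I miss you. Let's meet."))
```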
- SS sentence start
- SE sentence end
- words or syllables of the lyrical input may be flagged for a textual emphasis.
- the system methodology for recognizing such instances in which words or syllables should receive textual emphasis may be based on language or be culturally specific.
- another analysis conducted by the system on the input text may be determining the phrase class ("PC") of each of the CC and the WC.
- the phrase class of the character count will be referred to as the CCPC and the phrase class of the word count will be referred to as the WCPC.
- the value of the phrase class may be a sequentially indexed set of groups representing increasing sets of values of CC or WC. For example, a lyrical input with CC of 0 may have a CCPC of 1, and a lyrical input with a WC of 0 may have a WCPC of 1.
- a lyrical input with a CC of between 1 and 6 may have a CCPC of 2
- a lyrical input with a WC of 1 may have a WCPC of 2.
- the CCPC and WCPC may then increase sequentially as the CC or the WC increases, respectively.
- Table 1 illustrates, for exemplary and non-limiting purposes only, a possible classification of CCPC and WCPC based on CC and WC in a lyrical input.
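- Table 1 is not reproduced in this section, so the class boundaries below are hypothetical. Only these anchors come from the text: CC 0 gives CCPC 1, CC 1 to 6 gives CCPC 2, and CC 27 gives CCPC 5; WC 0 gives WCPC 1, WC 1 gives WCPC 2, and WC 3 gives WCPC 3. The sketch shows how a sequentially indexed class can be looked up from a list of class lower bounds with Python's `bisect` module.

```python
from bisect import bisect_right

# Hypothetical lower bounds standing in for Table 1: class k starts at
# CLASS_STARTS[k - 1]. Boundaries other than the stated anchors are illustrative.
CC_CLASS_STARTS = [0, 1, 7, 15, 25, 40, 60]
WC_CLASS_STARTS = [0, 1, 2, 5, 9, 14, 20]


def phrase_class(count: int, class_starts) -> int:
    """Return the 1-based phrase class for a non-negative count."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # Number of class lower bounds <= count equals the class index.
    return bisect_right(class_starts, count)


ccpc = phrase_class(27, CC_CLASS_STARTS)  # 5
wcpc = phrase_class(3, WC_CLASS_STARTS)   # 3
```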
- the system may determine an overall phrase class for the entire lyrical input, or the user phrase class ("UPC"). This determination may be made by giving different weights to the CCPC and the WCPC. In some embodiments, greater weight may be given to the WCPC than to the CCPC.
- for example, a lyrical input with a CC of 27 and a WC of 3 may have a CCPC of 5 and a WCPC of 3, resulting in a UPC of 3.8.
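- The weighting equation itself is not reproduced in this section, but the worked example constrains it. Assuming the UPC is a weighted average whose weights sum to 1, solving 5w + 3(1 - w) = 3.8 gives w = 0.4 for the CCPC and 0.6 for the WCPC, which agrees with weighting the WCPC more heavily. These default weights are an inference from the example, not the patent's stated equation.

```python
def user_phrase_class(ccpc: float, wcpc: float,
                      w_cc: float = 0.4, w_wc: float = 0.6) -> float:
    """Weighted average of CCPC and WCPC.

    Default weights are inferred from the CCPC 5 / WCPC 3 / UPC 3.8 example;
    dividing by the weight sum keeps the result a true average if other
    (e.g. mood- or genre-dependent) weights are supplied.
    """
    return (w_cc * ccpc + w_wc * wcpc) / (w_cc + w_wc)


upc = user_phrase_class(5, 3)  # approximately 3.8
```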
- the phrase class system and weighting system explained herein may be variable based on several factors related to the selected musical input such as mood, genre, style, etc., or other factors related to the lyrical input, such as important words or phrases as determined during analysis of the lyrical input.
- the musical input selected or provided by the user may be parsed during analysis and processing, such as in step 210 of FIG. 2.
- the system may parse the musical input selected or provided by the user to determine a variety of data points.
- One data point determined in the analysis may be the number of notes, or note count ("NC") in the particular musical input.
- Another product of the analysis that may be done on the musical input may include determining the start and end of musical phrases throughout the musical input.
- a musical phrase may be analogous to a linguistic sentence in that a musical phrase is a grouping of musical notes that conveys a musical thought.
- the analysis and processing of the selected musical input may involve flagging the beginnings and endings of each identified musical phrase in a musical input.
- analogous to the user phrase class (UPC) of the lyrical input, a phrase class of the source musical input, referred to as the source phrase class ("SPC"), may be determined, for example, based on the number of musical phrases and the note count identified in the musical input.
- PS phrase start
- PE phrase end
- the PS and the PE in the musical input may be analogous to the sentence start (SS) and sentence end (SE) in the lyrical input.
- the PS and PE associated with the preexisting musical works may be pre-recorded and stored on the server 108 or the client device 101-105, where they may be available for selection by the user as a musical input.
- the locations of PS and PE for the musical input may be pre-determined, and analysis of the musical input may involve retrieving such information from a storage location, such as the media database 109.
- further analysis is conducted to distinguish musical phrases in the musical input and, thus, determine the corresponding PS and PE for each identified musical phrase.
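- The text does not say how musical phrases are distinguished when PS and PE are not pre-stored. A common simple heuristic, assumed here, is to break phrases at rests longer than a threshold. `find_phrases` and its `min_gap` parameter are hypothetical; real phrase segmentation may also weigh cadences, long notes, and repetition.

```python
def find_phrases(notes, min_gap: float = 0.5):
    """Split (onset, duration) notes into phrases at rests of at least min_gap
    seconds. Returns (PS, PE) pairs: the first and last note index of each phrase."""
    if not notes:
        return []
    phrases, start = [], 0
    for i in range(1, len(notes)):
        prev_onset, prev_dur = notes[i - 1]
        gap = notes[i][0] - (prev_onset + prev_dur)  # silence before note i
        if gap >= min_gap:
            phrases.append((start, i - 1))
            start = i
    phrases.append((start, len(notes) - 1))
    return phrases


melody = [(0.0, 0.5), (0.5, 0.5), (1.0, 1.0), (3.0, 0.5), (3.5, 0.5)]
print(find_phrases(melody))  # [(0, 2), (3, 4)]
```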
- the phrase classes of the lyrical input and the musical input are compared to determine the parity or disparity between the two inputs. It should be understood that, although the disclosure describes comparing corresponding lyrical inputs and musical inputs using phrase classes, other methodologies for making comparisons between lyrical inputs and musical inputs are contemplated herein.
- the phrase class comparison can take place upon correlating the musical input with the lyrical input based on the respective analyses, such as at step 212.
- parity between a lyrical input and a musical input is analyzed by determining the phrase differential ("PD") between corresponding lyrical inputs and musical inputs provided by the user.
- determining the PD is by dividing the user phrase class (UPC) by the source phrase class (SPC), as shown in Equation 3, below:
- perfect phrase parity between the lyrical input and the musical input would result in a PD of 1.0, where the UPC and the SPC are equal. If the lyrical input is "shorter" than the musical input, the PD may have a value less than 1.0, and if the lyrical input is "longer" than the musical input, the PD may have a value greater than 1.0.
- Those with skill in the art will recognize that similar results could be obtained by dividing the SPC by the UPC, or with other suitable comparison methods.
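A minimal sketch of Equation 3 in Python, assuming the user and source phrase classes are already available as plain numbers. The function names are hypothetical, and how each phrase class is derived from sentences, phrases, and counts is not specified here.

```python
def phrase_differential(upc: float, spc: float) -> float:
    """Equation 3: PD = UPC / SPC."""
    if spc <= 0:
        raise ValueError("source phrase class (SPC) must be positive")
    return upc / spc


def describe_parity(pd: float, tol: float = 1e-9) -> str:
    """Interpret a PD value as described in the text."""
    if abs(pd - 1.0) <= tol:
        return "perfect parity"
    if pd < 1.0:
        return "lyrical input shorter than musical input"
    return "lyrical input longer than musical input"


print(describe_parity(phrase_differential(4, 8)))  # PD = 0.5
```

Dividing the SPC by the UPC instead would simply invert which side of 1.0 means "shorter".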
- Parity between the lyrical input and the musical input may also be determined by the "note" differential ("ND") between the lyrical input and the musical input provided by the user.
- ND the "note" differential
- One way to determine the ND is to take the difference between the note count (NC) of the musical input and the analogous syllable count (TC) of the lyrical input. For example:
  $$ND = NC - TC$$
- perfect note parity between the lyrical input and the musical input would be an ND of 0, where the NC and the TC are equal. If the lyrical input is "shorter" than the musical input, the ND may be greater than or equal to 1, and if the lyrical input is "longer" than the musical input, the ND may be less than or equal to -1.
- Those with skill in the art will recognize that similar results could be obtained by subtracting the NC from the TC, or with other suitable comparison methods.
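The note differential can be sketched the same way. The sign convention follows ND = NC - TC as defined above; the helper name is hypothetical.

```python
def note_differential(nc: int, tc: int) -> int:
    """ND = NC - TC: note count of the melody minus syllable count of the lyrics."""
    return nc - tc


# A five-note melodic phrase sung with a two-syllable lyric: the lyrics are "shorter".
nd = note_differential(5, 2)
print(nd)  # 3
```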
- the sentence starts (SS) and sentence ends (SE) of the lyrical input may align with the phrase starts (PS) and phrase ends (PE), respectively, of the musical input if the parity is perfect or close to perfect (i.e., high parity).
- when parity is perfect, i.e., the note differential (ND) is zero because the note count (NC) equals the syllable count (TC), or the phrase differential (PD) is 1.0, syllabic matching may involve simply matching the syllables in the lyrical input to the notes in the musical input and/or matching the lyrical input sentences to the musical phrases of the musical input.
- the media generation system 100 may provide techniques to increase or optimize note parity by minimizing the absolute value of the note differential in a musical work to be output. Among other things, optimizing note parity may also maximize the recognizability of the melody chosen or otherwise provided as the musical input by, for example, keeping the number of notes as close as possible to the source note count. For example, in some embodiments, if PD is slightly less than or greater than 1.0 and/or ND is between, for example, 1 and 5 or -1 and -5, melodic reduction or embellishment, respectively, may be used to provide correlation between the inputs.
- Melodic reduction involves reducing the number of notes played in the musical input and may be used when the NC is slightly greater than the TC (e.g., ND is between approximately 1 and 5) or the musical source phrase class (SPC) is slightly greater than the user phrase class (UPC) (e.g., PD is slightly less than 1.0). Reducing the notes in the musical input may shorten the overall length of the musical input and result in the NC being closer to or equal to the TC of the lyrical input, improving the phrase parity. The fewer notes that are removed from the musical input, the less impact the reduction will have on the musical melody selected as the musical input and, therefore, the more recognizable the musical element of the musical work may be upon completion.
- SPC musical source phrase class
- UPC user phrase class
- melodic embellishment involves adding notes to (i.e., "embellishing") the musical input.
- melodic embellishment is used when the NC is slightly less than the TC (e.g., ND is between -1 and -5) or the SPC is slightly less than the UPC (e.g., PD is slightly greater than 1.0). Adding notes in the musical input may lengthen the musical input, which may add to the NC or SPC and, thus, increase the parity between the inputs.
- the additional notes added to the musical work may be determined by analyzing the original notes in the musical input and adding notes that make sense musically. For example, in some embodiments, the system may only add notes in the same musical key as the original work in the musical input, or notes that maintain the tempo or other features of the original work, so as to aid in keeping the original work recognizable. It should be understood that although melodic reduction and embellishment have been described in the context of slight phrase disparity between the musical and lyrical inputs, use of melodic reduction and embellishment in larger or smaller phrase disparity is also contemplated.
- the system 100 may also determine the most probable melodic embellishment by utilizing supervised learning on a modified Probabilistic Context-Free Grammar.
- a set of melodic embellishment rules may be implemented that may encode many of the common surface-level forms of melodic embellishment.
- the melodic embellishment rules may be broken out into two-note rules, three-note rules, and four-note rules.
- the two-note rules may include suspension, anticipation, and consonant skip.
- the three-note rules may include passing tone, neighbor tone, appoggiatura, and escape tone.
- the four-note rules may include at least passing tone.
- each rule may receive a window of notes as its input, such as two notes, three notes, or four notes.
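As an illustration of what a three-note rule might test, the sketch below classifies a window of MIDI pitches using the textbook definitions of passing tone, neighbor tone, appoggiatura, and escape tone, treating 1 or 2 semitones as a step and anything larger as a leap. This is a hand-written deterministic classifier, not the trained Probabilistic Context-Free Grammar described above; the function names and the step/leap cutoff are assumptions.

```python
from typing import Optional, Sequence


def _is_step(interval: int) -> bool:
    # Assumption: a step is 1 or 2 semitones.
    return 1 <= abs(interval) <= 2


def _is_leap(interval: int) -> bool:
    return abs(interval) > 2


def classify_three_note(window: Sequence[int]) -> Optional[str]:
    """Label the middle note of a three-pitch window, or return None."""
    a, b, c = window
    into, out = b - a, c - b
    if into == 0 or out == 0:
        return None  # repeated pitches are left to repeat-type rules (e.g. REPEAT LEFT)
    same_direction = (into > 0) == (out > 0)
    if _is_step(into) and _is_step(out):
        if same_direction:
            return "passing tone"
        return "neighbor tone" if c == a else None
    if not same_direction and _is_leap(into) and _is_step(out):
        return "appoggiatura"
    if not same_direction and _is_step(into) and _is_leap(out):
        return "escape tone"
    return None
```

The middle note of a matched window is the embellishing note, so it is the candidate to reduce out; read in reverse, the same patterns describe notes that could be inserted when embellishing.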
- the grammar may identify the notes that are most likely embellishments of the neighboring notes. As such, embellished notes may be reduced out and removed from the melody, or embellishments may be added as appropriate.
- the process may continue until the melody for the musical input is reduced to a single note or embellished beyond an intelligible note density.
- the result may be a tree of melodic embellishments where each node may be a note that is hierarchically placed by the embellishment rules.
- the process above may be executed once the grammar has been trained using the statistics of existing compositions or their corresponding reductions. For example, a database may be utilized that includes existing melodies that have been analyzed and whose entire reductive trees are encoded in Extensible Markup Language (XML).
- XML Extensible Mark-up Language
- the system may define a threshold under which melodic reduction should not be applied.
- the threshold may not be static, but instead may be relative to the size of the melodic phrase being reduced.
- the threshold may be modified through configuration options.
- the default threshold may be 80%.
- melodic reduction may be used alone to achieve note parity when the input text has a syllable count (TC) that is 80% or more of the note count (NC).
- the default threshold may be 70%, 75%, 85%, 90%, or 95%.
- the below XML code may be an example of training data as described herein:
- FIGS. 10-13 show an example graphical user interface (GUI) illustrating an embodiment of the example training data.
- GUI graphical user interface
- FIG. 10 shows an example GUI 1000 of a CONSONANT SKIP LEFT embellishment from note index 2 (the 3rd note, 0-indexed) to note index 4 (the 5th note).
- note index 2 the 3rd note, 0-indexed
- note index 4 the 5th note
- the left note of the top embellishment (note index 2) may then be further embellished.
- FIG. 11 shows an example GUI 1100 of a REPEAT RIGHT embellishment from the first note (of index 0).
- FIG. 12 shows an example GUI 1200 showing that the first note may be further embellished by a
- the CONSONANT SKIP LEFT (note index 4) may be similarly embellished further by a REPEAT RIGHT embellishment, completing the entire reduction for this example embodiment. This is shown in GUI 1300 of FIG. 13.
- each embellishment may gather a number of situations in which it can be applied.
- the notes that may be embellished as well as the structural tones on which they rely may be measured, including the interval measurements between each note in the embellishment figure.
- the inter-onset intervals may be the difference in time between the onset of one note and the onset of the note following it in a monophonic musical sequence.
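A short sketch of inter-onset intervals, assuming onsets are given in quarter-note beats (the function name is hypothetical):

```python
from typing import List, Sequence


def inter_onset_intervals(onsets: Sequence[float]) -> List[float]:
    """Time between each note onset and the next onset in a monophonic line."""
    return [nxt - cur for cur, nxt in zip(onsets, onsets[1:])]


# Onsets in beats: a dotted quarter (1.5 beats), a quarter (1 beat), then a dotted quarter.
print(inter_onset_intervals([0.0, 1.5, 2.5, 4.0]))  # [1.5, 1.0, 1.5]
```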
- the system may group similar melodic situations and apply the same reduction or embellishment to those situations.
- stutter effects may be used to address medium parity differentials - e.g., a PD between approximately 0.75 and 1.5.
- Stutter effects may involve cutting and repeating relatively short bits of a musical or vocal work in relatively quick succession.
- Stutter effects may be applied to either the musical input or to the lyrical input in the form of vocal stutter effects in order to lengthen one or the other input to more closely match the corresponding musical or lyrical input.
- a musical input is shorter than a corresponding lyrical input (e.g., PD is approximately 1.5)
- the musical input could be lengthened by repeating a small portion or portions of the musical input in quick succession.
- a similar process may be used with the lyrical input, repeating one or more syllables of the lyrical input in relatively quick succession to lengthen the lyrical input.
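A minimal sketch of a stutter on the lyrical side, assuming the lyrical input has already been split into syllables. The same list operation would apply to short slices of the musical input; `stutter` and its parameters are hypothetical.

```python
from typing import List


def stutter(units: List[str], index: int, extra_repeats: int) -> List[str]:
    """Repeat units[index] extra_repeats more times in immediate succession."""
    if not 0 <= index < len(units):
        raise IndexError("index out of range")
    if extra_repeats < 0:
        raise ValueError("extra_repeats must be non-negative")
    return units[:index] + [units[index]] * (extra_repeats + 1) + units[index + 1:]


syllables = ["I", "like", "your", "cos", "tume"]
print(stutter(syllables, 3, 2))  # ['I', 'like', 'your', 'cos', 'cos', 'cos', 'tume']
```

Each extra repeat lengthens the lyrical input by one unit, raising the syllable count toward the note count.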
- the phrase differential between the musical input and the lyrical input may be brought closer to the optimal level. It should be understood that although stutter effects have been described in the context of medium phrase disparity between the musical and lyrical inputs, use of stutter effects in larger or smaller phrase disparity is also contemplated.
- repetition and melisma may be used to resolve relatively large phrase differentials between musical and lyrical inputs, e.g., a PD less than 0.5 or greater than 2.0.
- Repetition includes repeating either the lyrical input or the musical input more than once while playing the corresponding musical or lyrical input a single time. For example, if the PD is 0.5, this may indicate that musical input is twice as long as the lyrical input. In such a scenario, the lyrical input could simply be repeated once (i.e., played twice), to substantially match the length of the musical input.
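The example PD and ND ranges given so far can be collected into a simple lookup. This is a hypothetical sketch: the ranges are the example values from the text, they may overlap (so more than one tool can be suggested), and PD values between 0.5 and 0.75 or between 1.5 and 2.0 receive no PD-based suggestion because the text gives no example range for them.

```python
from typing import List


def suggest_tools(pd: float, nd: int) -> List[str]:
    """Map example PD/ND ranges from the text to candidate parity tools."""
    if nd == 0:
        return ["syllabic matching"]
    tools: List[str] = []
    if 1 <= nd <= 5:
        tools.append("melodic reduction")      # slightly too many notes
    elif -5 <= nd <= -1:
        tools.append("melodic embellishment")  # slightly too few notes
    if 0.75 <= pd <= 1.5:
        tools.append("stutter effects")        # medium differential
    if pd < 0.5 or pd > 2.0:
        tools.append("repetition or melisma")  # large differential
    return tools
```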
- Melisma is another solution that may be used to resolve disparity between musical inputs and corresponding lyrical inputs.
- melisma may be used when the lyrical input is shorter than the musical input to make the lyrical input more closely match with the musical input. Specifically, melisma may occur when a single syllable from the lyrical input is stretched over multiple notes of the musical input.
- the system may assign one syllable from the lyrical input to be played or "sung" over two notes in the musical input.
- Melisma can be applied over a plurality of separate syllables throughout the lyrical input, such as at the beginning, middle, and end of the musical input.
- the system may choose the words or syllables to which a melisma should be applied based on analysis of the words in the lyrical input and/or based on the tone or mood of the musical work chosen as the musical input.
- specific phoneme combinations may be included in a speech synthesis engine's lexicon.
- the word "should" may be broken down in a tokenization process into the phonemes "sh", "uh", and "d". New words may be added to the speech synthesis engine's lexicon representing the word "should" as it may be sung over multiple notes, so that the speech synthesis engine may recognize, for example, the "sh" phoneme as a word in its lexicon.
- the lexicon could include: "shouldphon1" for ["sh" "uh"], "shouldphon2" for ["uh"], and "shouldphon3" for ["uh" "d"].
- the synthesis engine may then recognize where a melisma has been marked in the interface XML, and use these "words" when invoking the separated syllables for the word "should."
- the system may identify locations where melisma may be helpful by analyzing the difference between two notes.
- a "metric level" is a hierarchy of metrical organization created to differentiate the meter of the onset of notes, based on a 4/4 meter.
- a note on beat one, on the downbeat may be given the Metric Level of 1, the downbeat of beat 3 may be level 2, the downbeat of both beat 2 and beat 4 may be assigned to level 3, all of the upbeats of a given measure may be assigned the level of 4, 16th notes may be level 5, and 32nd notes may be level 6.
- the "metric interval" may be the difference between two consecutive metric levels.
- the "chord-tone level" may be another assigned hierarchy, where the root of the chord is level 1, the fifth of the chord is level 2, and the third is level 3. Triads are assumed. Finally, the "chord-tone interval" may be the difference between two consecutive chord-tone levels.
- the system may estimate the difference in prominence between two consecutive notes.
- a large positive prominence differential may mean that the first note may be more rhythmically and harmonically prominent than the following note, while a negative prominence differential may be the opposite.
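The metric and chord-tone hierarchies above can be sketched as two lookup functions, assuming onsets measured in quarter-note beats from the start of a 4/4 bar, "upbeats" meaning eighth-note offbeats, and major or minor triads given by their root pitch. How the two levels are weighted into a single prominence differential is not stated, so this sketch stops at the levels and their intervals; all names are hypothetical.

```python
from fractions import Fraction
from typing import List, Optional, Sequence


def metric_level(onset_beats: float) -> Optional[int]:
    """Metric level (1-6) of an onset in 4/4, in quarter-note beats from the bar start."""
    pos = Fraction(onset_beats) % 4  # position inside the bar
    if pos == 0:
        return 1  # downbeat of beat 1
    if pos == 2:
        return 2  # downbeat of beat 3
    if pos in (1, 3):
        return 3  # downbeats of beats 2 and 4
    for level, per_beat in ((4, 2), (5, 4), (6, 8)):  # 8th upbeats, 16ths, 32nds
        if (pos * per_beat).denominator == 1:
            return level
    return None  # finer than a 32nd note


def chord_tone_level(pitch: int, chord_root: int) -> Optional[int]:
    """1 = root, 2 = fifth, 3 = third; assumes a major or minor triad."""
    interval = (pitch - chord_root) % 12
    if interval == 0:
        return 1
    if interval == 7:
        return 2
    if interval in (3, 4):
        return 3
    return None  # non-chord tone


def level_intervals(levels: Sequence[int]) -> List[int]:
    """Metric or chord-tone interval: difference between consecutive levels."""
    return [nxt - cur for cur, nxt in zip(levels, levels[1:])]
```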
- melismas may not be possible between two consecutive notes of the same pitch (or, at least, would not be recognizable in the synthesized vocal output), so those situations may not be considered in some embodiments. Additionally, in some embodiments, melismas may be limited to intervals of 1 or 2 semitones, with anything larger excluded.
- the system may execute methods for identifying places in the text where it may be advantageous to insert a melisma and for calculating a text melisma score.
- syllables that are accented, and that may correspondingly be marked with a stress tag of "1" during a tokenization process, may be better candidates for melisma.
- syllables identified for melisma may need to contain an extensible vowel in order to be considered (e.g., some vowel sounds, like "ay", may not sound as good as others when repeated), and a score based on these two conditions may be computed.
- the text melisma score may be combined with the note prominence score, and then a threshold may be used to decide whether or not a particular note should be extended by melisma.
- melismas may be added from left to right along the length of the lyrical input until the number of melismas added attains note parity, or until there are no more melisma scores that are above the threshold.
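A minimal sketch of the left-to-right melisma pass, assuming each candidate note pair already carries its pitch interval, the stress and vowel flags of the syllable on the first note, and a prominence value. The 0.5/0.5 text weighting, the additive combination, and the 1.0 threshold are placeholder assumptions; `notes_to_fill` would typically be the positive ND, since each melisma absorbs one extra note.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MelismaCandidate:
    position: int           # index of the first note of a consecutive note pair
    pitch_interval: int     # semitones from the first note to the second
    stressed: bool          # syllable on the first note carries stress tag "1"
    extensible_vowel: bool  # syllable's vowel sounds acceptable when extended
    prominence: float       # prominence value for the pair (formula assumed elsewhere)


def text_melisma_score(c: MelismaCandidate) -> float:
    # Hypothetical weighting: each text condition contributes 0.5.
    return 0.5 * c.stressed + 0.5 * c.extensible_vowel


def place_melismas(candidates: List[MelismaCandidate], notes_to_fill: int,
                   threshold: float = 1.0) -> List[int]:
    """Greedily choose melisma positions, left to right, until note parity is reached."""
    chosen: List[int] = []
    for c in sorted(candidates, key=lambda cand: cand.position):
        if len(chosen) >= notes_to_fill:
            break  # enough extra notes absorbed
        if c.pitch_interval == 0 or abs(c.pitch_interval) > 2:
            continue  # same pitch, or wider than 2 semitones: excluded
        if text_melisma_score(c) + c.prominence >= threshold:
            chosen.append(c.position)
    return chosen
```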
- leitmotifs are relatively smaller elements of a musical phrase that still include some "sameness" that may be discerned by the listener.
- the "sameness" may be a combination of similar or same rhythms and musical intervals repeated throughout a musical phrase.
- a leitmotif may be a grouping of notes within a musical phrase that follows similar note patterns or note rhythms, and these leitmotifs may be recognized by the system during analysis or can be predetermined for pre-recorded musical works. In either case, leitmotif locations throughout a musical input may be noted and marked.
- leitmotifs may then be used as prioritized targets for textual emphasis or repetition when analyzing the musical input to resolve disparity between the musical input and the lyrical input.
- the system may use melodic phrase analysis and removal to optimize parity. In some embodiments, this may involve analysis using a repeated sequences boundary detector. Such a detector may analyze a musical input to identify every or most of the repeating subsequences of a melody. In some embodiments, the algorithm that may identify the repeating subsequences may identify a sequence representing a series of pitches or pitchclasses, a series of pitch intervals or pitchclass intervals, or a series of inter-onset intervals.
- a pitchclass may be the number of semitones from the nearest "C" note at or below the given pitch (so the result is zero or positive), and the pitch interval may be the difference in pitchclass from one pitch to the following pitch in a melodic sequence.
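A small sketch of these two definitions, assuming standard MIDI note numbers, in which every C is a multiple of 12 (C4 is 60). The function names are hypothetical.

```python
from typing import List, Sequence


def pitchclass(midi_pitch: int) -> int:
    """Semitones above the nearest C at or below the pitch (MIDI Cs are multiples of 12)."""
    return midi_pitch % 12


def pitchclass_intervals(pitches: Sequence[int]) -> List[int]:
    """Difference in pitchclass from each pitch to the next."""
    pcs = [pitchclass(p) for p in pitches]
    return [nxt - cur for cur, nxt in zip(pcs, pcs[1:])]


print(pitchclass_intervals([60, 64, 67, 72]))  # [4, 3, -7]
```

Because the interval is taken between pitchclasses, the move from G4 (67) up to C5 (72) comes out as -7 rather than +5; subtracting raw MIDI numbers instead would give ordinary pitch intervals.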
- the algorithm in such embodiments may identify every repeating subsequence of every possible length. The system may then output a set of repeated subsequences, of which certain subsequences are more musically salient than others.
- the system may then use a formula to identify the more musically important subsequences, and assign each subsequence a score based on the formula.
- Each note that begins a particularly strong subsequence in the melody may be assigned a strength based on the score provided by the formula. Notes with higher boundary strengths may be the most likely places that a phrase boundary may occur. In some melodies, a phrase in a subsequence may be repeated with one or more notes added in between.
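A brute-force sketch of the repeated-subsequence step, operating on any per-note sequence (pitchclasses, pitchclass intervals, diatonic intervals, or inter-onset intervals). The text does not give the scoring formula, so the `length * (occurrences - 1)` score below is purely a placeholder assumption; function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def repeated_subsequences(seq: Sequence[int], min_len: int = 2) -> Dict[Tuple[int, ...], List[int]]:
    """Map every contiguous subsequence that occurs at least twice to its start indices."""
    starts_by_sub: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    n = len(seq)
    for length in range(min_len, n):
        for start in range(n - length + 1):
            starts_by_sub[tuple(seq[start:start + length])].append(start)
    return {sub: starts for sub, starts in starts_by_sub.items() if len(starts) > 1}


def boundary_strengths(seq: Sequence[int], min_len: int = 2) -> List[float]:
    """Placeholder strength per position: strongest repeated subsequence starting there."""
    strengths = [0.0] * len(seq)
    for sub, starts in repeated_subsequences(seq, min_len).items():
        score = float(len(sub) * (len(starts) - 1))  # hypothetical formula
        for s in starts:
            strengths[s] = max(strengths[s], score)
    return strengths
```

Enumerating every length is cubic in the melody length, which is acceptable for short melodic phrases; a suffix-array approach would scale better.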
- the phrase boundary detection algorithm described above may be combined with another algorithm for detecting large musical changes based on the concepts of Gestalt perception.
- the Gestalt theory of human perception may be extended to music into perceptual boundary detection.
- visual objects may be grouped based on the following principles: similarity, proximity, continuity, and closure.
- Musical events may be grouped in the same ways; for example, the system may group subsequences by focusing on similarity, proximity, and continuity. For example, in a set of three notes in which the onset of the second note is a dotted quarter note away from the first note's onset, and the third note's onset is a single quarter note away from the second note's onset, the latter two notes can be grouped together, perceptually, because of the closeness of their onsets (proximity).
- the system 100 may use the principle of continuity.
- the secondary algorithm that identifies phrase boundaries may work by comparing three consecutive intervals. If the middle interval is significantly different from both of the surrounding intervals, then it may more likely be a phrase boundary. In some embodiments, this may be estimated by the maximum degree of change for sets of three notes, computed over the entire melody. The degree of change may be normalized on the maximum degree of change, so that all of the degree of change values for each three-note set may be normalized between 0 and 1.
- the intervals used for comparison may be based on three separate measurements: pitch intervals, inter-onset intervals, and offset-to-onset intervals.
- the normalized degree of change vectors may be computed over the melodic sequence for each measurement, and then may be combined into a single vector by a formula.
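The text does not give the exact degree-of-change formula or the formula that combines the three vectors, so this sketch assumes the degree of change at an interval is the smaller of its absolute differences from its two neighbours (large only when the middle interval differs from both) and combines the three normalized vectors by averaging. Index i refers to the interval between notes i and i+1; names are hypothetical.

```python
from typing import List, Sequence


def degree_of_change(intervals: Sequence[float]) -> List[float]:
    """Normalized (0-1) discontinuity of each interval relative to both neighbours."""
    raw = [0.0] * len(intervals)
    for i in range(1, len(intervals) - 1):
        diff_prev = abs(intervals[i] - intervals[i - 1])
        diff_next = abs(intervals[i] - intervals[i + 1])
        raw[i] = min(diff_prev, diff_next)  # large only if it differs from BOTH sides
    peak = max(raw, default=0.0)
    return [r / peak if peak > 0 else 0.0 for r in raw]


def perceptual_boundaries(pitch_iv: Sequence[float], ioi: Sequence[float],
                          offset_onset: Sequence[float]) -> List[float]:
    """Average the three normalized vectors (the combining formula is an assumption)."""
    vectors = [degree_of_change(v) for v in (pitch_iv, ioi, offset_onset)]
    return [sum(vals) / len(vectors) for vals in zip(*vectors)]
```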
- the system may employ a phrase boundary detection algorithm by combining the two above processes.
- the algorithm may first use the repeated sequence boundary detector. This may yield a sparse vector which indicates the most likely places in the melody where subphrases might start based on the repetition in the melody. After this, each of the repeated phrase boundaries may be merged with the perceptual boundaries as set forth in the following example.
- the score for the repeated sequence boundary may be multiplied by the perceptual phrase boundary, and also by a measure of the distance between the two boundaries based on a tapered window (in number of notes).
- the system may search for the strongest boundary in the perceptual phrase boundary vector that may be as close as possible to the strong boundaries in the repeated phrase boundary vector.
- the system may then find the top n combined phrase boundaries. In some embodiments, n may be set to 5, but other suitable values may be used as well.
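A minimal sketch of the merge, assuming both boundary vectors are indexed by note and using a triangular taper that falls off linearly with distance. The taper shape, the default window width, and the function name are assumptions; only the multiply-by-taper structure and the top-n selection come from the text.

```python
from typing import Dict, List, Sequence


def combine_boundaries(repeated: Sequence[float], perceptual: Sequence[float],
                       window: int = 2, top_n: int = 5) -> List[int]:
    """Return the top_n note indices after merging repeated and perceptual boundaries."""
    combined: Dict[int, float] = {}
    for r, r_strength in enumerate(repeated):
        if r_strength <= 0:
            continue
        best_score, best_pos = 0.0, r
        lo, hi = max(0, r - window), min(len(perceptual), r + window + 1)
        for p in range(lo, hi):
            taper = 1.0 - abs(p - r) / (window + 1)  # triangular taper (assumption)
            score = r_strength * perceptual[p] * taper
            if score > best_score:
                best_score, best_pos = score, p
        if best_score > 0:
            combined[best_pos] = max(combined.get(best_pos, 0.0), best_score)
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return [pos for pos, _ in ranked[:top_n]]
```

For each strong repeated boundary, the strongest nearby perceptual boundary wins, which matches the idea of searching for a perceptual boundary as close as possible to each repeated one.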
- FIG. 14 shows an example graphical user interface 1400 applying an embodiment of the above-recited melodic phrase analysis process.
- the GUI shows a MIDI representation of a musical input.
- the system may use the diatonic index as a measure for repetition.
- the diatonic index is the number of diatonic steps from the root note of the current key signature, and the diatonic interval is the difference in diatonic indices for two consecutive notes.
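A sketch of the diatonic index and diatonic interval, assuming a major key whose tonic is given as a MIDI note number in any octave. How the system treats chromatic (non-diatonic) notes is not stated, so this sketch simply rejects them; names are hypothetical.

```python
from typing import List, Sequence

MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of the major-scale degrees


def diatonic_index(midi_pitch: int, tonic: int) -> int:
    """Diatonic steps from the tonic, e.g. one octave up = 7, the leading tone below = -1."""
    octave, semitones = divmod(midi_pitch - tonic, 12)
    if semitones not in MAJOR_SCALE:
        raise ValueError(f"pitch {midi_pitch} is not diatonic in this key")
    return octave * 7 + MAJOR_SCALE.index(semitones)


def diatonic_intervals(pitches: Sequence[int], tonic: int) -> List[int]:
    """Difference in diatonic index between consecutive notes."""
    idx = [diatonic_index(p, tonic) for p in pitches]
    return [nxt - cur for cur, nxt in zip(idx, idx[1:])]
```

For instance, E-E-C-C-B-B in C major (MIDI 64, 64, 60, 60, 59, 59 with tonic 60) yields [0, -2, 0, -1, 0].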
- the vector of diatonic intervals may be as follows:
- Analysis of the vector may indicate one repeated sequence that may have the highest strength; specifically, [0, -2, 0, -1, 0]. Many smaller repeated sequences (such as [0, 2, 0] or [-1, 0]), may also be considered but have smaller strengths. Boundary strengths may then be estimated to find the following:
- the perceptual phrase boundaries may be computed based on the discontinuity of a 3-note sliding window.
- the perceptual phrase boundary analysis may result in the following vector:
- the system may identify the following boundaries:
- FIG. 15 shows a GUI 1500 indicating an example of the identification of a first boundary 1502 and a second boundary 1504.
- FIG. 16 shows another example GUI 1600 indicating a first boundary 1602 and a second boundary 1604 that may have been identified if only the strongest repeated phrases were considered.
- the system may remove entire phrases when, for example, the magnitude of the Note Differential (ND) is large because the musical input has far more notes than the lyrical input has syllables.
- ND Note Differential
- the 2nd and 3rd phrases could then be removed, resulting in a melody with only 5 notes.
- Such an example would attain a Note Parity of exactly 1.
- Text alignment may include aligning textual phrases in the lyrical input with their melodic phrase counterpart in the musical input.
- text alignment may include the following operations:
- the melodic phrases may be extracted from the melody in the musical input.
- the note differential may be calculated for the melodic phrase identified.
- if the text repetition feature is available for the textual phrase, and repeating the text would bring the note parity up to at least a melodic reduction threshold (e.g., 0.8 or 0.9) without exceeding 1, the text may be repeated.
- melodic reduction may be used to reduce the number of notes down to the phrase's number of syllables. The process may continue for each textual phrase of the lyrical input until the entirety of the lyrical input has been assigned to notes in the melody, even if somewhat modified.
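The repetition decision can be sketched as choosing the largest number of copies of a textual phrase that keeps note parity (syllables divided by notes) at or below 1, and accepting it only if it reaches the threshold. The function name and the 0.8 default are assumptions based on the example values in this section.

```python
def copies_for_phrase(syllables: int, notes: int, threshold: float = 0.8) -> int:
    """How many times to sing a textual phrase over its melodic phrase (1 = no repetition)."""
    if syllables <= 0 or notes <= 0:
        raise ValueError("syllable and note counts must be positive")
    copies = notes // syllables  # most copies that keep note parity at or below 1
    if copies >= 2 and copies * syllables / notes >= threshold:
        return copies
    return 1


print(copies_for_phrase(2, 5))  # 2  ("That's cool": parity 4/5 = 0.8)
print(copies_for_phrase(2, 9))  # 4  ("better": parity 8/9, about 0.889)
```

When this returns 1, the remaining gap would be handled by melisma and melodic reduction, as in the walkthrough that follows.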
- FIGS. 15, 17, and 18 illustrate a series of GUIs for implementing an embodiment of text alignment in the manner described above.
- the GUIs may present a visual depiction of MIDI notes 1701, with notes on the vertical axis 1702 and time on the horizontal axis 1704.
- the text may be tokenized by a text analysis tool that identifies phrase breaks based on grammar and punctuation. For example, the breakdown may result in: ["That's cool"], ["I like your costume"], ["better"].
- the first break, between "That's cool" and "I like your costume", may be identified from the comma.
- the second break, between "I like your costume" and "better", may be identified based on "I like your costume" being a grammatically complete sentence.
- "That's cool" may be made to correspond with the first melodic phrase in the musical input based on the detected phrase boundaries, such as shown above in FIG. 15.
- the first melodic phrase in the musical input (e.g., the notes 1501 before the first boundary 1502) contains five notes, while the first textual phrase, "That's cool", contains only two syllables, resulting in a phrase differential or note parity of 0.4. Repeating the first textual phrase yields four syllables, or a phrase differential or note parity of 0.8.
- if the threshold for applying the text repetition tool is set at 80%, the note parity of 0.8 may meet the threshold and allow the text to be repeated.
- the melisma tool may then be applied.
- the pitch intervals for the first melodic phrase may be (0, 0, 4, 0).
- the melisma tool may only be applied for 1 or 2 semitones, so no melismas would be added in this example.
- other less restrictive rules for melismas may be applied in other embodiments.
- the melodic reduction tool may be used.
- the most probable reduction based on the set of solutions that the melodic reduction grammar may be trained on is the REPEAT LEFT embellishment from note index 1 to note index 0.
- the second note in the phrase may be removed, and the duration of the first note may be extended to the end of the second, now reduced-out, note.
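The reduction step just described (remove a note and extend the preceding note to cover it) can be sketched as a list operation. The `Note` record and `reduce_out` helper are hypothetical; onsets and durations are in beats.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Note:
    pitch: int       # MIDI note number
    onset: float     # beats from the start of the phrase
    duration: float  # beats


def reduce_out(notes: List[Note], index: int) -> List[Note]:
    """Remove notes[index] and extend the preceding note to the removed note's end."""
    if not 1 <= index < len(notes):
        raise IndexError("need a preceding note to extend")
    prev, gone = notes[index - 1], notes[index]
    extended = replace(prev, duration=gone.onset + gone.duration - prev.onset)
    return notes[:index - 1] + [extended] + notes[index + 1:]
```

With a repeated first pitch (pitch interval 0, as in the phrase above), reducing out index 1 leaves one longer note at that pitch.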
- for the second textual phrase, "I like your costume", duplicating the text would result in a note parity of more than one, so text repetition may not be used in this embodiment. Instead, the melisma and melodic reduction tools may be used to optimize parity.
- a threshold of 0.8 parity may not be reached, and thus the output of the system for the given portion of the musical work may involve 4 notes removed in the reduction process. The notes preceding them may be extended, as depicted in GUI 1800 of FIG. 18.
- the third and final textual phrase is simply the word "better" containing two syllables, and the final melodic phrase contains nine notes.
- the text repetition feature may be invoked.
- the text may be repeated four times to yield a note parity of 0.888 (8 syllables over 9 notes), which is above the 0.8 threshold for this example, so the text may be repeated four times. The repeated text may then be analyzed for possible melismas. A melisma opportunity may be found for the "er" of "better", extended over the fourth-to-last and third-to-last notes. In this portion of the input text, no reduction may be needed because, after adding one extra melisma syllable, optimal note parity may be achieved for this phrase.
- the musical input has 23 notes, while the lyrical input was nine syllables.
- the system's tools, as described herein, were applied to optimize parity while removing only five notes from the musical input. Further, the removed notes came from different portions of the musical input. Thus, the recognizability of the original melody in the musical input may be preserved while it carries the lyrics of the lyrical input.
- the media generation system 100 may include additional features in generating a musical or multimedia work. As described above, some embodiments of the system may include allowing a user to create a melody to be used as a musical input. In such embodiments, a synthesized vocal melody generated from the input text may follow the specific melody created and defined by the user. The user may perform an original melody on a keyboard or input data through MIDI or other input devices to provide a melodic contour for the musical input. In some embodiments, the system 100 may then generate a vocal-like reference while playing, perform actual words or lyrics from a lyrical input in substantially real time, and may pass MIDI back to an external sound source.
- users may type or otherwise enter the lyrics they would like included in a musical work as a lyrical input.
- the lyrical input may then be transformed to automatically assign notes, embellishments, and/or other effects such as those described herein.
- a user may change the lyrics or words in the lyrical input at any time and the system may automatically adjust the musical work or a section of the musical work accordingly.
- the system 100 may receive user input of text and melody.
- the text may be a lyrical input containing the lyrics of the musical work the user seeks to create.
- the melody may be a musical input from various sources as described in further detail herein.
- at least one characteristic of the lyrical input may be compared to the musical input. For example, the number of syllables of the lyrical input may be compared to the number of notes in the musical input, or any other of the various analyses described herein with respect to method 200.
- the at least one characteristic of the lyrical input and the at least one characteristic of the musical input may be compared to determine at least one disparity between the lyrical input and the musical input.
- a vocal rendering of the lyrical input may be generated based at least upon the characteristics of the lyrical input and the musical input, such as described with relation to the method 200. In some embodiments, however, the vocal rendering may be based merely upon the lyrical input alone. For example, generating the vocal rendering may involve analyzing the lyrics included in the lyrical input and breaking down words, phrases, syllables, or phonemes for identification.
- the system may determine, either automatically or based on user input, whether user controls are to be included.
- user controls may include pre-authored lyrics, associated vocal performances (i.e., "licks"), pre-defined stylistic settings, vocal effects, etc.
- additional pre-authored lyrics that may differ from or be in addition to the lyrical input may also be rendered and automatically assigned to the melody of the musical input.
- the "licks" may include different melodies that may be harmonious to the melody of the musical input.
- User controls for stylistic settings may include vocal idiosyncrasies that determine the genre of the music, the emotion of the lyrics, etc.
- idiosyncrasies may be captured by the system and available to the user to apply to a musical work, or may be applied automatically based on a user's selection of a singer with particular voice characteristics.
- a user may also add vocal effects such as reverb, delay, stutter effects, pitch shift, etc., or such effects may be applied automatically. If the user has opted to implement any of these user controls, or they have been implemented automatically, at 506, the method 500 may include receiving those controls and including them in the musical work at 508. After the user controls have been received at 508, or if no user controls are included at 506, the system may determine, either automatically or via user input, whether performance editing at 510 is to be included.
- performance editing may include MIDI roll editing, tactile control, vocal effects adjustment, text-to-melody augmentation, etc.
- the performance editing may be received by the system at 512.
- the system may incorporate any and all user-control effects or performance-editing effects to generate the final musical work to be output, stored, or sent in a message. It is contemplated that, in some embodiments, the performance editing may take place simultaneously with or prior to the user controls.
- either or both of the user-control effects or the performance-editing effects may be received by the system before or after sending formatted data to a voice synthesizer for generation of a vocal rendering.
- the system may re-correlate the lyrical input and the musical input after receiving additional user controls or performance editing so that a new vocal rendering may be generated that takes into account the additional effects and edits received.
- MIDI roll editing may include adjusting the timing of each musical note within a melody by, for example, clicking on a visual depiction of the musical input or musical work on a user interface and dragging to lengthen or shorten that note.
- An exemplary graphical user interface (GUI) 600 for MIDI rolling is shown in FIG. 6.
- the MIDI rolling GUI may include a note indication 602 on one axis and a time indication 604 on another axis.
- the note indication 602 is represented by a graphical depiction of a piano keyboard, with the note "C" shown in several octaves. It should be understood that other graphical representations may be used.
- the lyrics or words from the lyrical input may be indicated as lyric indications 606.
- the lyric indications 606 may be accompanied by note bars 608 that may indicate the note at which the corresponding lyric is sung or played with respect to the vertical axis 602.
- the length of the note bars 608 with respect to the horizontal (i.e., time) axis 604 may also indicate for how long that particular lyric or group of lyrics may be played at the specified note.
- the length of the note bar 608 may be adjusted by lengthening or shortening the note bar, and the note of the lyrics may be adjusted by moving the note bar with respect to the vertical (i.e., note) axis.
- Tactile control may provide a user with the ability to change the way that a sung melody in a musical work is performed.
- FIG. 7 shows an example of a graphical user interface (GUI) 700 that the system may provide to a user to adjust tactile control, such as embellishment, auto-tune, melisma, and slow glide. Some of these effects and their adjustment are described in further detail above with respect to method 200.
- the tactile control GUI 700 may include several control aspects that may act in opposition to one another, and provide an effects indicator 710 to make adjustments among those controls and effects.
- the embellishment limit 702 may represent the maximum embellishment available as an effect.
- the melisma limit 704 may represent the maximum melisma available as an effect.
- the portions of the GUI 700 between the embellishment limit 702 and the melisma limit 704 may represent a sliding scale of positions along an embellishment-melisma slider 705 between the maximums of either effect.
- the further the effects indicator 710 is moved toward the melisma limit 704, the more individual syllables may be performed or played over consecutive musical notes in the musical input.
- additional notes may be added to the melody.
- lyrical repetition may be utilized.
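The melisma side of the slider can be pictured as a syllable-to-note assignment. In that view, the slider position controls how many consecutive notes each syllable may span. This Python sketch rests on several hypothetical choices that do not come from the disclosure:

- the slider position is normalized from 0.0 (no melisma) to 1.0 (the melisma limit 704);
- `max_span` is four notes per syllable;
- leftover notes extend the final syllable.

```python
from typing import List, Sequence, Tuple


def assign_melisma(
    syllables: Sequence[str],
    notes: Sequence[int],
    amount: float,
    max_span: int = 4,
) -> List[Tuple[str, List[int]]]:
    """Assign each syllable a run of consecutive notes.

    amount is the hypothetical normalized slider position: 0.0 sings one note
    per syllable, 1.0 lets a syllable span up to max_span notes.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0.0 and 1.0")
    span = 1 + round(amount * (max_span - 1))
    groups: List[Tuple[str, List[int]]] = []
    i = 0
    for k, syllable in enumerate(syllables):
        if i >= len(notes):
            break  # more syllables than notes: the rest stay unassigned
        remaining = len(syllables) - k
        # Take up to `span` notes while leaving one note for each later syllable.
        take = max(1, min(span, len(notes) - i - (remaining - 1)))
        groups.append((syllable, list(notes[i:i + take])))
        i += take
    if i < len(notes) and groups:
        # Leftover notes become a melisma on the final syllable.
        last_syllable, last_notes = groups[-1]
        groups[-1] = (last_syllable, last_notes + list(notes[i:]))
    return groups


if __name__ == "__main__":
    print(assign_melisma(["la", "lu"], [60, 62, 64, 65], 1.0))
```

When there are more syllables than notes, this sketch simply leaves the extra syllables unassigned. The disclosure instead describes techniques such as embellishment and repetition for correlating mismatched inputs.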
- the auto-tune limit 706 may represent the maximum auto-tune effect available.
- the slow glide limit 708 may represent the maximum slow glide effect available.
- the portions of the GUI 700 between the auto-tune limit 706 and the slow glide limit 708 may represent a sliding scale of positions along an autotune-slow glide slider 709 between the maximums of either effect.
- movement of the effects indicator 710 along the autotune-slow glide slider 709 may control how quickly a note "snaps" from one note to the next. If the effects indicator 710 is moved toward the slow glide limit 708, the vocal performance in the musical work may sound looser and take a longer time to move from one note to the next in a melody. Conversely, if the effects indicator 710 is moved toward the auto-tune limit 706, the vocal performance of the lyrics may sound tighter and take less time to move from one note to the next.
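The auto-tune/slow-glide behavior can be read as a portamento time: the slider sets how long the pitch takes to travel from one note to the next. In this Python sketch, the following are hypothetical choices, not values from the disclosure:

- the 0.0-to-1.0 slider normalization;
- the 5 ms and 300 ms endpoint times;
- the linear pitch ramp.

```python
def glide_time_ms(position: float, snap_ms: float = 5.0, slow_ms: float = 300.0) -> float:
    """Map a slider position (0.0 = auto-tune limit, 1.0 = slow glide limit) to a glide time."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0.0 and 1.0")
    return snap_ms + position * (slow_ms - snap_ms)


def pitch_at(t_ms: float, prev_pitch: float, next_pitch: float, glide_ms: float) -> float:
    """Pitch (in MIDI note units) t_ms after a note change, using a linear ramp."""
    if glide_ms <= 0.0 or t_ms >= glide_ms:
        return float(next_pitch)  # glide finished: pitch has snapped to the new note
    fraction = max(t_ms, 0.0) / glide_ms
    return prev_pitch + (next_pitch - prev_pitch) * fraction


if __name__ == "__main__":
    glide = glide_time_ms(0.5)
    # Sample the transition from C4 (60) to D4 (62) every 50 ms.
    print([round(pitch_at(t, 60, 62, glide), 2) for t in range(0, 201, 50)])
```

A short glide time makes the pitch reach the new note almost immediately, which corresponds to the tighter sound described above. A long glide time produces the looser, sliding transition.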
- the GUI 700 may provide a multidimensional tool for a user to make various adjustments to musical effects. It is contemplated that, in some embodiments, additional effects may be displayed in the GUI to provide additional control.
- Vocal effects adjustment may allow a user to adjust the sound of the sung vocal performance in the musical work.
- FIG. 8 shows an example vocal effects GUI 800 for adjusting certain effects.
- a reverb effects indicator 803 may slide along a reverb scale 802 to increase or decrease the reverb effect.
- a delay effects indicator 805 may slide along a delay scale 804 to increase or decrease the delay effect.
- a compression effects indicator 807 may slide along a compression scale 806 to increase or decrease the compression effect.
- a bass effect indicator 809 may slide along a bass scale 808 to increase or decrease bass.
- a treble effect indicator 811 may slide along a treble scale 810 to increase or decrease treble.
- a pitch effect indicator 813 may slide along a pitch scale 812 to increase or decrease pitch.
- controlling each effect may control the sound in the synthesized musical work.
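Each slider in GUI 800 ultimately has to become a numeric processing parameter. The Python sketch below illustrates two standard mappings:

- a hard-knee compressor's static gain curve;
- the equal-tempered pitch-shift ratio of 2^(n/12) for n semitones.

The linear slider mapping, the 1:1 to 8:1 ratio range, and the function names are assumptions, not part of the disclosure.

```python
def slider_to_value(position: float, low: float, high: float) -> float:
    """Linearly map a slider position in [0.0, 1.0] onto a parameter range (hypothetical mapping)."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0.0 and 1.0")
    return low + position * (high - low)


def compressor_output_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Static curve of a hard-knee compressor: level above the threshold is divided by ratio."""
    if ratio < 1.0:
        raise ValueError("ratio must be at least 1.0")
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def pitch_shift_ratio(semitones: float) -> float:
    """Frequency ratio for an equal-tempered shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    ratio = slider_to_value(0.5, 1.0, 8.0)  # compression slider at its midpoint -> 4.5:1
    print(compressor_output_db(-10.0, -20.0, ratio))
    print(pitch_shift_ratio(2))  # pitch slider raised by two semitones
```

A real effects chain would apply these parameters to audio samples. The sketch only shows how slider positions might translate into parameter values.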
- Text-to-melody augmentation may be used to automatically adjust, for example, the way the lyrics provided in the input text are sung over the musical input.
- popular music may be recognizable or memorable due at least in part to repeated short musical phrases, or leitmotifs, that may match in both lyrical and musical note structure. Often, the rhythm and phrase signatures of the lyrics and the music may match.
- finding the best relationship between leitmotifs and lyrics may be difficult without the help of an expert singer with experience in lyrical phrasing.
- the system herein may provide an algorithmically driven combinatory approach to discerning leitmotifs and poetic cadence to enhance a user's ability to best match lyrics and music.
- the system may receive a lyrical input of the lyrics to be used in a musical work and, at 906, receive a musical input that the lyrics may be sung over in the musical work.
- the musical input may be MIDI notes input by the user via a MIDI device, or may be generated from an analog recording and analyzed to detect pitch, tempo, and other properties.
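When the musical input comes from a recording rather than a MIDI device, its pitch must be estimated. The disclosure does not specify a method. The Python sketch below uses plain time-domain autocorrelation, one common and simple technique. It then converts the estimate to a MIDI note number with the standard formula 69 + 12 log2(f / 440). The function names and the 50 to 1000 Hz search range are assumptions.

```python
import math
from typing import Sequence


def detect_pitch_hz(
    samples: Sequence[float], sample_rate: float, fmin: float = 50.0, fmax: float = 1000.0
) -> float:
    """Estimate the fundamental frequency of a monophonic frame by autocorrelation.

    The lag with the largest autocorrelation inside the range implied by
    [fmin, fmax] is taken as the period. Resolution is limited to whole-sample lags.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(len(samples) - 1, int(sample_rate / fmin))
    if min_lag > max_lag:
        raise ValueError("frame too short for the requested frequency range")
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]  # remove DC offset
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency to a (fractional) MIDI note number, with A4 = 440 Hz = 69."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 220.0 * n / rate) for n in range(2048)]
    f0 = detect_pitch_hz(tone, rate)
    print(f"{f0:.1f} Hz -> MIDI {round(hz_to_midi(f0))}")
```

Because the period is measured in whole samples, the estimate for a 220 Hz tone lands a few hertz off. Rounding still yields MIDI note 57 (A3). Production pitch trackers typically add interpolation and voicing detection.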
- the lyrical input may be analyzed to understand the lyrics. In some embodiments, this analysis may include natural language processing, natural language understanding, and other analyses such as those described herein with respect to method 200.
- the system may analyze the musical input, such as by using leitmotif detection.
- the leitmotif detection process may reference a leitmotif dataset, which may include numerous examples of leitmotifs drawn from other music.
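The disclosure describes leitmotif detection with reference to a dataset of known leitmotifs. The Python sketch below is a much simpler stand-in: it finds short melodic figures that recur within a single musical input. It compares pitch intervals rather than absolute pitches, so transposed repeats still match. The function names and the interval-based matching are assumptions for illustration, and no dataset lookup is shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def intervals(pitches: Sequence[int]) -> Tuple[int, ...]:
    """Semitone steps between consecutive notes (transposition-invariant)."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))


def find_repeated_motifs(
    pitches: Sequence[int], length: int = 4, min_count: int = 2
) -> Dict[Tuple[int, ...], List[int]]:
    """Map each interval pattern spanning `length` notes that occurs at least
    `min_count` times to the note indices where it starts."""
    if length < 2:
        raise ValueError("a motif needs at least two notes")
    steps = intervals(pitches)
    width = length - 1  # a motif of n notes has n - 1 intervals
    starts: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i in range(len(steps) - width + 1):
        starts[steps[i:i + width]].append(i)
    return {pattern: idx for pattern, idx in starts.items() if len(idx) >= min_count}


if __name__ == "__main__":
    # The figure C-D-E-C reappears a fifth higher as G-A-B-G.
    melody = [60, 62, 64, 60, 67, 69, 71, 67]
    print(find_repeated_motifs(melody))
```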
- the method 900 may include generating poetic cadence options that may be presented to the user based on the analysis of the lyrical input and the musical input.
- the user may approve or reject the generated poetic cadence option. If the user does not approve, an alternative poetic cadence may be generated. If the user approves the generated poetic cadence option, the user may indicate that approval and, at 914, the poetic cadence option will be used to generate the musical work. It should be understood that method 900 may be implemented in addition to, or concurrently with, the other effects control measures described herein, such as method 200.
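The approve-or-regenerate step at the end of method 900 is essentially a loop over candidate cadences. In the Python sketch below, three elements are hypothetical:

- the candidate generator, which simply varies how many syllables fall in each bar;
- the `approve` callback, which stands in for the user;
- the retry limit.

The disclosure does not say how cadence options are produced.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def cadence_options(num_syllables: int, beats_per_bar: int = 4) -> Iterator[List[int]]:
    """Hypothetical generator: phrase the lyric with 1..beats_per_bar syllables per bar,
    densest phrasing first. Each option lists the syllable count in each bar."""
    for per_bar in range(beats_per_bar, 0, -1):
        yield [min(per_bar, num_syllables - i) for i in range(0, num_syllables, per_bar)]


def choose_cadence(
    options: Iterable[List[int]],
    approve: Callable[[List[int]], bool],
    max_tries: int = 10,
) -> Optional[List[int]]:
    """Offer options one at a time until one is approved; None if none is approved."""
    for attempt, option in enumerate(options):
        if attempt >= max_tries:
            break
        if approve(option):
            return option
    return None


if __name__ == "__main__":
    # Simulated user who prefers a three-bar phrasing.
    print(choose_cadence(cadence_options(6), lambda option: len(option) == 3))
```

Passing the options as a lazy iterator means an alternative cadence is only generated when the previous one is rejected.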
- the media generation system can use any of the individual solutions alone while correlating the musical input with the lyrical input, or can implement various solutions described herein sequentially or simultaneously to optimize the output quality of a musical message.
- the system may use embellishment to lengthen a musical input so that it becomes half the length of the lyrical input, followed by using repetition of the embellished musical input to more closely match up with the lyrical input.
- Other combinations of solutions are also contemplated herein to accomplish the task of correlating the musical input with the lyrical input so that the finalized musical message is optimized. It is also contemplated that other techniques consistent with this disclosure could be implemented to effectively correlate the musical input with the lyrical input in transforming the lyrical input and musical input into a finalized musical message.
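The combined example above embellishes the musical input to half the length of the lyrical input, then repeats it. That can be expressed in a few lines if "length" is taken to mean note count versus syllable count. In this Python sketch, three elements are assumptions rather than details from the disclosure:

- that note-count interpretation of "length";
- the midpoint passing-note rule, which ignores key and scale;
- the truncation of any excess notes.

```python
import math
from typing import List, Sequence


def embellish(pitches: Sequence[int], target_count: int) -> List[int]:
    """Insert hypothetical passing notes (integer midpoint of two neighbours)
    until the melody has target_count notes."""
    notes = list(pitches)
    if not notes:
        raise ValueError("melody must contain at least one note")
    gap = 0
    while len(notes) < target_count:
        if len(notes) == 1:
            notes.append(notes[0])  # nothing to pass between: repeat the note
            continue
        j = gap % (len(notes) - 1)
        notes.insert(j + 1, (notes[j] + notes[j + 1]) // 2)
        gap = j + 2  # skip past the new note to the next original gap
    return notes


def fit_by_embellish_and_repeat(pitches: Sequence[int], syllable_count: int) -> List[int]:
    """Embellish to half the syllable count, then repeat to cover every syllable."""
    if syllable_count < 1:
        raise ValueError("need at least one syllable")
    half = math.ceil(syllable_count / 2)
    phrase = embellish(pitches, half)  # unchanged if the melody is already long enough
    repeats = math.ceil(syllable_count / len(phrase))
    return (phrase * repeats)[:syllable_count]  # drop any notes past the last syllable


if __name__ == "__main__":
    print(fit_by_embellish_and_repeat([60, 64, 67], 10))
```

For a three-note melody and a ten-syllable lyric, the melody is embellished to five notes and then played twice.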
- the media generation system, and the method for operating it described herein, may be performed on a single device, such as client device 104 or server 108, or may be distributed across a variety of devices, each device including different portions of the system and performing different portions of the method.
- client device 104 or server 108 may perform most of the steps, but the voice synthesis may be performed by another device or another server.
- the following includes a description of one embodiment of a single device that could be configured to include the media generation system described herein, but it should be understood that the single device could alternatively be multiple devices.
- FIG. 4 shows one embodiment of the system 100 that may be deployed on any of a variety of devices 101-105 or 108 from FIG. 1, or on a plurality of devices working together, which may be, for illustrative purposes, any multi-purpose computer (101, 102), hand-held computing device (103-105) and/or server (108).
- FIG. 4 depicts the system 100 operating on device 104 from FIG. 1, but one skilled in the art would understand that the system 100 may be deployed either as an application installed on a single device or, alternatively, on a plurality of devices that each perform a portion of the system's operation.
- the system may be operated within an HTTP browser environment, which may optionally utilize web plug-in technology to expand the functionality of the browser to enable functionality associated with system 100.
- Device 104 may include more or fewer components than those shown in FIG. 4. However, it should be understood by those of ordinary skill in the art that certain components are not necessary to operate system 100, while others, such as the processor, video display, and audio speaker, are important to practice aspects of the present invention.
- device 104 includes a processor 402, which may be a CPU, in communication with a mass memory 404 via a bus 406.
- processor 402 could also comprise one or more general processors, digital signal processors, other specialized processors and/or ASICs, alone or in combination with one another.
- Device 104 also includes a power supply 408, one or more network interfaces 410, an audio interface 412, a display driver 414, a user input handler 416, an illuminator 418, an input/output interface 420, an optional haptic interface 422, and an optional global positioning system (GPS) receiver 424.
- Device 104 may also include a camera, enabling video to be acquired and/or associated with a particular musical message. Video from the camera, or other source, may also further be provided to an online social network and/or an online music community. Device 104 may also optionally communicate with a base station or server 108 from FIG. 1, or directly with another computing device. Other computing devices, such as the base station or server 108 from FIG. 1, may include additional audio-related components, such as a professional audio processor, generator, amplifier, speaker, XLR connectors and/or power supply.
- power supply 408 may comprise a rechargeable or non-rechargeable battery or may be provided by an external power source, such as an AC adapter or a powered docking cradle that could also supplement and/or recharge the battery.
- Network interface 410 includes circuitry for coupling device 104 to one or more networks, and is constructed for use with one or more communication protocols and technologies including, but not limited to, global system for mobile communication (GSM), code division multiple access (CDMA), time division multiple access (TDMA), user datagram protocol (UDP), transmission control protocol/Internet protocol (TCP/IP), SMS, general packet radio service (GPRS), WAP, ultra wide band (UWB), IEEE 802.16 Worldwide Interoperability for Microwave Access (WiMax), SIP/RTP, or any of a variety of other wireless communication protocols.
- network interface 410 may include a transceiver, transceiving device, or network interface card (NIC).
- Audio interface 412 (FIG. 4) is arranged to produce and receive audio signals such as the sound of a human voice.
- Display driver 414 (FIG. 4) is arranged to produce video signals to drive various types of displays.
- display driver 414 may drive a video monitor display, which may be a liquid crystal, gas plasma, or light emitting diode (LED) based-display, or any other type of display that may be used with a computing device.
- Display driver 414 may alternatively drive a hand-held, touch sensitive screen, which would also be arranged to receive input from an object such as a stylus or a digit from a human hand via user input handler 416.
- Device 104 also comprises input/output interface 420 for communicating with external devices, such as a headset, a speaker, or other input or output devices.
- Input/output interface 420 may utilize one or more communication technologies, such as USB, infrared, Bluetooth™, or the like.
- the optional haptic interface 422 is arranged to provide tactile feedback to a user of device 104.
- the optional haptic interface 422 may be employed to vibrate the device in a particular way such as, for example, when another user of a computing device is calling.
- Optional GPS transceiver 424 may determine the physical coordinates of device 104 on the surface of the Earth and typically outputs a location as latitude and longitude values. GPS transceiver 424 can also employ other geo-positioning mechanisms, including, but not limited to, triangulation, assisted GPS (AGPS), E-OTD, CI, SAI, ETA, BSS, or the like, to further determine the physical location of device 104 on the surface of the Earth. In one embodiment, however, the mobile device may, through other components, provide other information that may be employed to determine a physical location of the device, including, for example, a MAC address, IP address, or the like.
- mass memory 404 includes a RAM 423, a ROM 426, and other storage means.
- Mass memory 404 illustrates an example of computer readable storage media for storage of information such as computer readable instructions, data structures, program modules, or other data.
- Mass memory 404 stores a basic input/output system ("BIOS") 428 for controlling low-level operation of device 104.
- the mass memory also stores an operating system 430 for controlling the operation of device 104.
- this component may include a general-purpose operating system such as a version of MAC OS, WINDOWS, UNIX, LINUX, or a specialized operating system such as, for example, Xbox 360 system software, Wii IOS, Windows Mobile™, iOS, Android, webOS, QNX, or the like.
- the operating system may include, or interface with, a Java virtual machine module that enables control of hardware components and/or operating system operations via Java application programs.
- the operating system may also include a secure virtual container, also generally referred to as a "sandbox," that enables secure execution of applications, for example, Flash and Unity.
- One or more data storage modules may be stored in memory 404 of device 104. As would be understood by those of ordinary skill in the art having the present specification, drawings, and claims before them, a portion of the information stored in data storage modules may also be stored on a disk drive or other storage medium associated with device 104. These data storage modules may store multiple track recordings, MIDI files, WAV files, samples of audio data, and a variety of other data and/or data formats or input melody data in any of the formats discussed above. Data storage modules may also store information that describes various capabilities of system 100, which may be sent to other devices, for instance as part of a header during a communication, upon request or in response to certain events, or the like.
- data storage modules may also be employed to store social networking information including address books, buddy lists, aliases, user profile information, or the like.
- Device 104 may store and selectively execute a number of different applications, including applications for use in accordance with system 100.
- applications for use in accordance with system 100 may include an Audio Converter Module, a Recording Session Live Looping (RSLL) Module, a Multiple Take Auto-Compositor (MTAC) Module, a Harmonizer Module, a Track Sharer Module, a Sound Searcher Module, a Genre Matcher Module, and a Chord Matcher Module.
- the applications on device 104 may also include a messenger 434 and browser 436.
- Messenger 434 may be configured to initiate and manage a messaging session using any of a variety of messaging communications including, but not limited to, email, Short Message Service (SMS), Instant Message (IM), Multimedia Message Service (MMS), internet relay chat (IRC), mIRC, RSS feeds, and/or the like.
- messenger 434 may be configured as an IM messaging application, such as AOL Instant Messenger, Yahoo! Messenger, .NET Messenger Server, ICQ, or the like.
- messenger 434 may be a client application that is configured to integrate and employ a variety of messaging protocols.
- messenger 434 may interact with browser 436 for managing messages.
- Browser 436 may include virtually any application configured to receive and display graphics, text, multimedia, and the like, employing virtually any web based language.
- the browser application is enabled to employ Handheld Device Markup Language (HDML), Wireless Markup Language (WML), WMLScript, JavaScript, Standard Generalized Markup Language (SGML), HyperText Markup Language (HTML), Extensible Markup Language (XML), and the like, to display and send a message.
- any of a variety of other web-based languages including Python, Java, and third party web plug-ins, may be employed.
- Device 104 may also include other applications 438, such as computer executable instructions which, when executed by client device 104, transmit, receive, and/or otherwise process messages (e.g., SMS, MMS, IM, email, and/or other messages), audio, and video, and enable telecommunication with another user of another client device.
- Other examples of application programs include calendars, search programs, email clients, IM applications, SMS applications, VoIP applications, contact managers, task managers, transcoders, database programs, word processing programs, security applications, spreadsheet programs, games, and so forth.
- Each of the applications described above may be embedded or, alternately, downloaded and executed on device 104.
- each of these applications may be implemented on one or more remote devices or servers, wherein inputs and outputs of each portion are passed between device 104 and the one or more remote devices or servers over one or more networks.
- one or more of the applications may be packaged for execution on, or downloaded from a peripheral device.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
- Stored Programmes (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201880049353.7A CN111213200A (en) | 2017-05-22 | 2018-05-22 | System and method for automatically generating music output |
CA3064738A CA3064738A1 (en) | 2017-05-22 | 2018-05-22 | System and method for automatically generating musical output |
EP18805032.2A EP3631789A4 (en) | 2017-05-22 | 2018-05-22 | System and method for automatically generating musical output |
BR112019024679A BR112019024679A2 (en) | 2017-05-22 | 2018-05-22 | system and method for automatically generating musical output |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762509727P (US62/509,727) | 2017-05-22 | 2017-05-22 | |
US201762524838P (US62/524,838) | 2017-06-26 | 2017-06-26 | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018217790A1 true WO2018217790A1 (en) | 2018-11-29 |
Family
ID=64396995
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2018/033941 WO2018217790A1 (en) | 2017-05-22 | 2018-05-22 | System and method for automatically generating musical output |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP3631789A4 (en) |
CN (1) | CN111213200A (en) |
BR (1) | BR112019024679A2 (en) |
CA (1) | CA3064738A1 (en) |
WO (1) | WO2018217790A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110600034A (en) * | 2019-09-12 | 2019-12-20 | 广州酷狗计算机科技有限公司 | Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium |
US11200881B2 (en) | 2019-07-26 | 2021-12-14 | International Business Machines Corporation | Automatic translation using deep learning |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111404808B (en) * | 2020-06-02 | 2020-09-22 | 腾讯科技(深圳)有限公司 | Song processing method |
CN113010730B (en) * | 2021-03-22 | 2023-07-21 | 平安科技(深圳)有限公司 | Music file generation method, device, equipment and storage medium |
CN113539216B (en) * | 2021-06-29 | 2024-05-31 | 广州酷狗计算机科技有限公司 | Melody creation navigation method and device, equipment, medium and product thereof |
CN113793578B (en) * | 2021-08-12 | 2023-10-20 | 咪咕音乐有限公司 | Method, device and equipment for generating tune and computer readable storage medium |
CN117059052A (en) * | 2022-05-07 | 2023-11-14 | 脸萌有限公司 | Song generation method, device, system and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6388182B1 (en) * | 2001-03-21 | 2002-05-14 | BERMUDEZ RENÉE FRANCESCA | Method and apparatus for teaching music |
US6424944B1 (en) * | 1998-09-30 | 2002-07-23 | Victor Company Of Japan Ltd. | Singing apparatus capable of synthesizing vocal sounds for given text data and a related recording medium |
US20160055838A1 (en) * | 2014-08-22 | 2016-02-25 | Zya, Inc. | System and method for automatically converting textual messages to musical compositions |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002132281A (en) * | 2000-10-26 | 2002-05-09 | Nippon Telegr & Teleph Corp <Ntt> | Method of forming and delivering singing voice message and system for the same |
US8378194B2 (en) * | 2009-07-31 | 2013-02-19 | Kyran Daisy | Composition device and methods of use |
2018
- 2018-05-22 CA CA3064738A patent/CA3064738A1/en not_active Abandoned
- 2018-05-22 WO PCT/US2018/033941 patent/WO2018217790A1/en active Application Filing
- 2018-05-22 EP EP18805032.2A patent/EP3631789A4/en not_active Withdrawn
- 2018-05-22 BR BR112019024679A patent/BR112019024679A2/en not_active Application Discontinuation
- 2018-05-22 CN CN201880049353.7A patent/CN111213200A/en active Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP3631789A4 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11200881B2 (en) | 2019-07-26 | 2021-12-14 | International Business Machines Corporation | Automatic translation using deep learning |
CN110600034A (en) * | 2019-09-12 | 2019-12-20 | 广州酷狗计算机科技有限公司 | Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium |
CN110600034B (en) * | 2019-09-12 | 2021-12-03 | 广州酷狗计算机科技有限公司 | Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
BR112019024679A2 (en) | 2020-06-09 |
EP3631789A4 (en) | 2021-07-07 |
CA3064738A1 (en) | 2018-11-29 |
EP3631789A1 (en) | 2020-04-08 |
CN111213200A (en) | 2020-05-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20180268792A1 (en) | | System and method for automatically generating musical output |
US10529310B2 (en) | | System and method for automatically converting textual messages to musical compositions |
US20180374461A1 (en) | | System and method for automatically generating media |
WO2018217790A1 (en) | | System and method for automatically generating musical output |
US20190147838A1 (en) | | Systems and methods for generating animated multimedia compositions |
CA2764042C (en) | | System and method of receiving, analyzing, and editing audio to create musical compositions |
CN103959372B (en) | | System and method for providing audio for asked note using presentation cache |
CN106023969B (en) | | Method for applying audio effects to one or more tracks of a music compilation |
EP3759706B1 (en) | | Method, computer program and system for combining audio signals |
WO2019005625A1 (en) | | System and method for automatically generating media |
CA2843438A1 (en) | | System and method for providing audio for a requested note using a render cache |
WO2020010329A1 (en) | | Systems and methods for generating animated multimedia compositions |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18805032; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 3064738; Country of ref document: CA |
NENP | Non-entry into the national phase | Ref country code: DE |
REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112019024679; Country of ref document: BR |
WWE | WIPO information: entry into national phase | Ref document number: 2018805032; Country of ref document: EP |
ENP | Entry into the national phase | Ref document number: 2018805032; Country of ref document: EP; Effective date: 20200102 |
ENP | Entry into the national phase | Ref document number: 112019024679; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20191122 |