EP3646315A1 - System and method for automatically generating media - Google Patents

System and method for automatically generating media

Info

Publication number
EP3646315A1
Authority
EP
European Patent Office
Prior art keywords
lyric
information
audio selection
musical
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP18823868.7A
Other languages
German (de)
English (en)
Other versions
EP3646315A4 (fr)
Inventor
Matthew Michael SERLETIC
Bo BAZYLEVSKY
James Mitchell
Ricky KOVAC
Patrick Woodward
Thomas Webb
Ryan Groves
Ed Schofield
Brett Harrison
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zya Inc
Original Assignee
Zya Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 15/986,589 (published as US 2018/0268792 A1)
Application filed by Zya Inc
Publication of EP3646315A1
Publication of EP3646315A4

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/0008 Associated control or indicating means
    • G10H 1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H 1/0033 Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H 1/0041 Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G10H 1/0058 Transmission between separate instruments or between individual components of a musical system
    • G10H 1/0066 Transmission between separate instruments or between individual components of a musical system using a MIDI interface
    • G10H 1/36 Accompaniment arrangements
    • G10H 1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H 1/368 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems, displaying animated or moving pictures synchronized with the music or audio part
    • G10H 2220/00 Input/output interfacing specifically adapted for electrophonic musical tools or instruments
    • G10H 2220/005 Non-interactive screen display of musical or status data
    • G10H 2220/011 Lyrics displays, e.g. for karaoke applications
    • G10H 2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H 2240/075 Musical metadata derived from musical analysis or for use in electrophonic musical instruments
    • G10H 2240/085 Mood, i.e. generation, detection or selection of a particular emotional content or atmosphere in a musical piece
    • G10H 2240/121 Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H 2240/125 Library distribution, i.e. distributing musical pieces from a central or master library
    • G10H 2240/171 Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
    • G10H 2240/325 Synchronizing two or more audio tracks or files according to musical features or musical timings
    • G10H 2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H 2250/315 Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
    • G10H 2250/455 Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis

Definitions

  • the present disclosure relates generally to the field of music creation, and more specifically to a system of creating music videos.
  • Lyric videos are a type of media content in which a song or other audio selection may be set to visualizations, which may include all or some of the song's lyrics displayed in time with the audio playback of the song.
  • the disclosure describes a computer implemented method for automatically generating lyric videos.
  • the method may include receiving an audio selection, determining timing information of the audio selection, and determining lyric information of the audio selection.
  • the method may include receiving tone information of the audio selection and generating video content based on at least one of the timing information, the lyric information, and the tone information of the audio selection.
  • the method may also include rendering a lyric video based on the video content and the audio selection.
  • the disclosure describes a computer implemented method for automatically generating lyric videos.
  • the method may include receiving, via a digital communication network, an audio selection.
  • the method may also include determining, via one or more processors, timing information of the audio selection.
  • the method may include requesting, via the digital communication network, lyric information of the audio selection from a lyric database, and receiving, via the digital communication network, the lyric information of the audio selection from the lyric database based on the request.
  • the method may also include requesting, via the digital communication network, tone information of the audio selection from a tone database, and receiving, via the digital communication network, the tone information of the audio selection from the tone database based on the request.
  • the tone information may include at least one of a genre, a tempo, a mood, an artist, or a style corresponding to the audio selection.
  • the method may include generating, via the one or more processors, video content based on at least one of the timing information, the lyric information, and the tone information of the audio selection.
  • the method may also include rendering, via the one or more processors, a lyric video based on the video content and the audio selection.
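  • For illustration only, the claimed flow (receive audio, determine timing, fetch lyrics and tone, generate video content, render) might be sketched in Python roughly as follows; every function name and data shape here is a hypothetical stand-in, not an API described by the patent:

```python
# Minimal, hypothetical sketch of the claimed lyric-video pipeline; all names
# and data shapes are illustrative assumptions, not the patent's actual API.
from typing import List, Tuple

def determine_timing(audio_path: str) -> List[Tuple[float, float]]:
    # Stub: would return (start, end) display times per lyric line.
    return [(0.0, 2.5), (2.5, 5.0)]

def fetch_lyrics(title: str, artist: str) -> List[str]:
    # Stub: would query a lyric database over the network.
    return ["first lyric line", "second lyric line"]

def fetch_tone(title: str, artist: str) -> dict:
    # Stub: would query a tone database; keys mirror the claim language.
    return {"genre": "pop", "tempo": 120, "mood": "upbeat", "style": "modern"}

def generate_lyric_video(audio_path: str, title: str, artist: str) -> list:
    timing = determine_timing(audio_path)
    lyrics = fetch_lyrics(title, artist)
    tone = fetch_tone(title, artist)
    # Pair each lyric line with its display window; tone would drive styling.
    return [{"text": txt, "start": s, "end": e, "style": tone["mood"]}
            for txt, (s, e) in zip(lyrics, timing)]

print(generate_lyric_video("song.wav", "Example Song", "Example Artist"))
```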
  • the disclosure describes a computer implemented method for automatically generating lyric videos.
  • the method may include receiving, via a digital communication network, an audio selection from a user device.
  • the method may include determining, via one or more processors, timing information of the audio selection, and determining, via the one or more processors, lyric information of the audio selection.
  • the method may include performing, via the one or more processors, a lyric analysis on the lyric information.
  • the method may include requesting, via the digital communication network, tone information of the audio selection from a third party database, and receiving, via the digital communication network, the tone information of the audio selection from the third party database based on the request.
  • the tone information may include at least one of a genre, a tempo, a mood, an artist, or a style corresponding to the audio selection.
  • the method may include generating, via the one or more processors, video content based on at least one of the timing information, the lyric analysis, and the tone information of the audio selection.
  • the method may include rendering, via the one or more processors, at least a portion of a lyric video based on the video content and the audio selection.
  • the method may also include transmitting, via the digital communication network, the at least portion of the lyric video to the user device for playback.
  • FIG. 1 illustrates one exemplary embodiment of a network configuration in which a lyric video system may be practiced in accordance with the disclosure.
  • FIG. 2 illustrates a flow diagram of an embodiment of a method of operating a media generation system of the lyric video system in accordance with the disclosure.
  • FIG. 3 illustrates a flow diagram of an embodiment of a method of operating an audio generation system of the lyric video system in accordance with the disclosure.
  • FIG. 4 illustrates a block diagram of a device that supports the systems and processes of the disclosure.
  • FIG. 5 illustrates a flow diagram of an embodiment of a method of operating an animation generation system of the lyric video system in accordance with the disclosure.
  • FIG. 6 illustrates a flow diagram of an embodiment of a method of operating the lyric video system in accordance with the disclosure.
  • the present disclosure relates to a system and method for automatically creating a lyric musical video based on user inputs that may be viewed, saved, or transmitted to users via a variety of messaging formats, such as SMS, MMS, and e-mail. It may also be possible to send such musical composition messages via various social media platforms and formats, such as Twitter®, Facebook®, Instagram®, Snapchat®, or any other suitable media sharing system.
  • the disclosed lyric video system may provide users with an intuitive and convenient way to automatically create, view, and send original lyric videos based on user inputs.
  • the lyric video system may receive a user's selection of a musical work or melody that is pre-recorded or recorded and provided by the user.
  • the selection may be received as a user selection in a variety of ways and through a variety of user interfaces, such as via a keyboard or through voice recognition software.
  • the lyric video system can analyze and parse the selected musical work and its lyrics to create an original lyric musical video of the selected or provided musical work, providing a musically-enhanced version of the text input by the user.
  • the output of the lyric video system may automatically provide an original music video with visual representations of the music selection's lyrics based on the lyrics' timing, and may include visual representations reflective of the audio selection's mood or tone.
  • the user can then, if he or she chooses, share the lyric video with others via social media, SMS or MMS messaging, or any other form of file sharing or electronic communication.
  • the user can additionally record video to accompany the visual depictions and video output of the automatically generated lyric video.
  • the user video input may be recorded in real-time along with a vocal rendering of text input provided by the user in order to effectively match the video to the lyrics in the lyric music video created by the system.
  • the lyric video may include only automatically generated images, animations, video, and other visuals generated by the lyric video system.
  • the result of the system, in such embodiments, may be an original lyric video created automatically for viewing on a client device such as a smartphone or tablet connected to a server via a network, and requiring little or no specialized technical skills or knowledge.
  • the client device need not be connected to a network.
  • FIG. 1 illustrates an exemplary embodiment of a network configuration in which the disclosed lyric video system 100 can be implemented. It is contemplated herein, however, that not all of the illustrated components may be required to implement the lyric video system, and that variations in the arrangement and types of components can be made without departing from the spirit or scope of the invention.
  • the illustrated embodiment of the lyric video system 100 includes local area networks ("LANs") / wide area networks ("WANs") (collectively, network 106), wireless network 110, client devices 101-105, server 108, media database 109, and peripheral input/output (I/O) devices 111, 112, and 113.
  • client devices 101-105 may include virtually any computing device capable of processing and sending audio, video, or textual data over a network, such as network 106, wireless network 110, etc.
  • one or both of the wireless network 110 and the network 106 can be a digital communications network.
  • Client devices 101-105 may also include devices that are configured to be portable.
  • client devices 101-105 may include virtually any portable computing device capable of connecting to another computing device and receiving information.
  • Such devices include portable devices, such as cellular telephones, smart phones, display pagers, radio frequency (RF) devices, infrared (IR) devices, Personal Digital Assistants (PDAs), handheld computers, laptop computers, wearable computers, tablet computers, integrated devices combining one or more of the preceding devices, and the like.
  • Client devices 101-105 may also include virtually any computing device capable of communicating over a network to send and receive information, including track information and social networking information, performing audibly generated track search queries, or the like.
  • the set of such devices may include devices that typically connect using a wired or wireless communications medium such as personal computers, multiprocessor systems, microprocessor- based or programmable consumer electronics, network PCs, or the like.
  • client devices 101-105 may operate over wired and/or wireless networks.
  • a client device 101-105 can be web-enabled and may include a browser application that is configured to receive and to send web pages, web-based messages, and the like.
  • the browser application may be configured to receive and display graphics, text, multimedia, video, etc., and can employ virtually any web-based language, including wireless application protocol (WAP) messages, and the like.
  • the browser application may be enabled to employ Handheld Device Markup Language (HDML), Wireless Markup Language (WML), WMLScript, JavaScript, Standard Generalized Markup Language (SMGL), HyperText Markup Language (HTML), eXtensible Markup Language (XML), and the like, to display and send messages.
  • a user of the client device may employ the browser application to interact with a messaging client, such as a text messaging client, an email client, or the like, to send and/or receive messages.
  • Client devices 101-105 also may include at least one other client application that is configured to receive content from another computing device.
  • the client application may include a capability to provide and receive multimedia content, such as textual content, graphical content, audio content, video content, etc.
  • the client application may further provide information that identifies itself, including a type, capability, name, and the like.
  • client devices 101-105 may uniquely identify themselves through any of a variety of mechanisms, including a phone number, Mobile Identification Number (MIN), an electronic serial number (ESN), or other mobile device identifier.
  • the information may also indicate a content format that the mobile device is enabled to employ. Such information may be provided in, for example, a network packet or other suitable form, sent to server 108, or other computing devices.
  • the media database 109 may be configured to store various media such as musical clips, video clips, graphics files, animation, etc., and the information stored in the media database may be accessed by the server 108 or, in other embodiments, accessed directly by other computing devices over the network 106 or wireless network 110.
  • Client devices 101-105 may further be configured to include a client application that enables the end-user to log into a user account that may be managed by another computing device, such as server 108.
  • a user account may be configured to enable the end-user to participate in one or more social networking activities, such as submit a track or a multi-track recording or video, search for tracks or recordings, download a multimedia track or other recording, stream video or audio content, or participate in an online music community.
  • participation in various networking activities may also be performed without logging into the user account.
  • Wireless network 110 is configured to couple client devices 103-105 and their components with network 106.
  • Wireless network 110 may include any of a variety of wireless sub-networks that may further overlay stand-alone ad-hoc networks, and the like, to provide an infrastructure-oriented connection for client devices 103-105.
  • Such sub-networks may include mesh networks, Wireless LAN (WLAN) networks, cellular networks, and the like.
  • Wireless network 110 may further include an autonomous system of terminals, gateways, routers, etc., connected by wireless radio links, or other suitable wireless communication protocols. These connectors may be configured to move freely and randomly and organize themselves arbitrarily, such that the topology of wireless network 110 may change rapidly.
  • Wireless network 110 may further employ a plurality of access technologies including 2nd (2G), 3rd (3G), 4th (4G) generation, and 4G Long Term Evolution (LTE) radio access for cellular systems, WLAN, Wireless Router (WR) mesh, and other suitable access technologies.
  • Access technologies such as 2G, 3G, 4G, 4G LTE, and future access networks may enable wide area coverage for mobile devices, such as client devices 103-105 with various degrees of mobility.
  • wireless network 110 may enable a radio connection through a radio network access such as Global System for Mobile communication (GSM), General Packet Radio Services (GPRS), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (WCDMA), etc.
  • wireless network 110 may include virtually any wireless communication mechanism by which information may travel between client devices 103-105 and another computing device, network, and the like.
  • Network 106 is configured to couple network devices with other computing devices, including server 108 and client devices 101-102, and, through wireless network 110, with client devices 103-105.
  • Network 106 is enabled to employ any form of computer readable media for communicating information from one electronic device to another.
  • network 106 can include the Internet in addition to local area networks (LANs), wide area networks (WANs), direct connections, such as through a universal serial bus (USB) port, other forms of computer-readable media, or any combination thereof.
  • a router acts as a link between LANs, enabling messages to be sent from one to another.
  • communication links within LANs typically include twisted wire pair or coaxial cable.
  • communication links between networks may utilize analog telephone lines, full or fractional dedicated digital lines including T1, T2, T3, and T4, Integrated Services Digital Networks (ISDNs), Digital Subscriber Lines (DSLs), wireless links including satellite links, or other communications links known to those skilled in the art.
  • remote computers and other related electronic devices could be remotely connected to either LANs or WANs via a modem and temporary telephone link.
  • network 106 includes any communication method by which information may travel between computing devices.
  • client devices 101-105 may directly communicate, for example, using a peer to peer configuration.
  • communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media.
  • communication media includes wired media such as twisted pair, coaxial cable, fiber optics, wave guides, and other wired media and wireless media such as acoustic, RF, infrared, and other wireless media.
  • Various peripherals including I/O devices 111-113 may be attached to client devices 101-105.
  • Multi-touch pressure pad 113 may receive physical inputs from a user and be distributed as a USB peripheral, although it is not limited to USB; other interface protocols may also be used, including but not limited to ZIGBEE, BLUETOOTH, or other suitable connections.
  • Data transported over the external interface protocol of pressure pad 113 may include, for example, MIDI-formatted data, though data of other formats may be conveyed over this connection as well.
  • a similar pressure pad may alternately be bodily integrated with a client device, such as mobile devices 104 or 105.
  • a headset 112 may be attached to an audio port or other wired or wireless I/O interface of a client device, providing an exemplary arrangement for a user to listen to playback of a composed message, along with other audible outputs of the system.
  • Microphone 111 may be attached to a client device 101-105 via an audio input port or other connection as well.
  • one or more speakers and/or microphones may be integrated into one or more of the client devices 101-105 or other peripheral devices 111-113.
  • an external device may be connected to pressure pad 113 and/or client devices 101-105 to provide an external source of sound samples, waveforms, signals, or other musical inputs that can be reproduced by external control.
  • Such an external device may be a MIDI device to which a client device 103 and/or pressure pad 113 may route MIDI events or other data in order to trigger the playback of audio from external device.
  • formats other than MIDI may be employed by such an external device.
  • FIG. 2 is a flow diagram illustrating an embodiment of a method 200 for operating a media generation system, with references made to the components shown in FIG. 1.
  • the method 200 of operating a media generation system may be used to generate an audio selection for use with the lyric video system 100. More detail regarding the media generation system may be found in co-owned U.S. Patent Application No. 15/986,589, filed May 22, 2018, the disclosure of which is incorporated by reference herein.
  • the system can receive a lyrical input at 204.
  • the text or lyrical input may be input by the user via an electronic device, such as a PC, tablet, or smartphone, any other of the client devices 101-105 described in reference to FIG. 1 or other suitable devices.
  • the text may be input in the usual fashion in any of these devices (e.g., manual input using soft or mechanical keyboards, touch-screen keyboards, speech-to-text conversion).
  • the text or lyrical input is provided through a specialized user interface application accessed using the client device 101-105.
  • the lyrical input could be delivered via a general application for transmitting text-based messages using the client device 101-105.
  • the resulting lyrical input may be transmitted over the wireless communications network 110 and/or network 106 to be received by the server 108 at 204.
  • the system may analyze the lyrical input using server 108 to determine certain characteristics of the lyrical input. In some embodiments, however, it is contemplated that analysis of the lyrical input could alternatively take place on the client device 101-105 itself instead of or in parallel to the server 108. Analysis of the lyrical input can include a variety of data processing techniques and procedures. For example, in some embodiments, the lyrical input is parsed into the speech elements of the text with a speech parser.
  • the speech parser may identify important words (e.g., love, anger, crazy), demarcate phrase boundaries (e.g., "I miss you." "I love you." "Let's meet." "That was an awesome concert."), and/or identify slang terms (e.g., chill, hang). Words considered important can vary by region or language, and can be updated over time to coincide with contemporary culture. Similarly, slang terms can vary geographically and temporally such that the media generation system is updatable and customizable. Punctuation or other symbols used in the lyrical input can also be identified and attributed to certain moods or tones that can influence the analytical parsing of the text.
  • the words or lyrics conveyed in the lyrical input can also be processed into their component pieces by breaking words down into syllables, and further by breaking the syllables into a series of phonemes.
  • the phonemes are used to create audio playback of the words or lyrics in the lyrical input. Additional techniques used to analyze the lyrical input are described in greater detail below.
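  • As a rough illustration of the word-to-syllable-to-phoneme breakdown described above, the sketch below uses the CMU Pronouncing Dictionary via the third-party `pronouncing` package (pip install pronouncing); the patent does not name any particular tool, so this is purely an assumed substitute:

```python
# Approximate word -> phoneme -> syllable decomposition using the CMU
# Pronouncing Dictionary; an illustrative stand-in, not the patent's parser.
import pronouncing

def decompose(lyric: str):
    result = []
    for word in lyric.split():
        phones = pronouncing.phones_for_word(word.lower().strip(".,!?"))
        if not phones:
            continue  # out-of-vocabulary word; a real system needs a fallback
        arpabet = phones[0]  # e.g. "L AH1 V" for "love"
        result.append({
            "word": word,
            "phonemes": arpabet.split(),
            "syllables": pronouncing.syllable_count(arpabet),
        })
    return result

print(decompose("I miss you"))
```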
  • the system may receive a selection of a musical input transmitted from the client device 101-105.
  • a user interface may be implemented to select the musical input from a list or library of pre-recorded and catalogued musical works or clips of musical works that may comprise one or more musical phrases.
  • a musical phrase may be a grouping of musical notes or connected sounds that exhibits a complete musical "thought," analogous to a linguistic phrase or sentence.
  • the list of available musical works or phrases may include, for example, a text-based description of the song title, performing artists, genre, and/or mood set by the phrase, to name only a few possible pieces of information that could be provided to users via the user interface.
  • the user may then choose the desired musical work or clip for the media generation system to combine with the lyrical input.
  • there may be twenty or more pre-recorded and selected musical phrases for the user to choose from.
  • the pre-recorded musical works or phrases may be stored on the server 108 or media database 109 in any suitable computer readable format, and accessed via the client device 101-105 through the wireless network 110 and/or network 106.
  • the pre-recorded musical works may be stored directly onto the client device 101-105 or another local memory device, such as a flash drive or other computer memory device. Regardless of the storage location, the list of pre-recorded musical works can be updated over time, removing or adding musical works in order to provide the user with new options and additional choices.
  • a user may create their own melodies for use in association with the media generation system.
  • One or more melodies may be created using the technology disclosed in U.S. Patent No. 8,779,268, entitled "System and Method for Producing a More Harmonious Musical Accompaniment Graphical User Interface for a Display Screen System and Method that Ensures Harmonious Musical Accompaniment," assigned to the assignee of the present application.
  • Such patent disclosure is hereby incorporated by reference, in full.
  • a user may generate a musical input using an input device 111-113, such as a MIDI instrument or other device for inputting user-created musical works or clips.
  • a user may use a MIDI keyboard to generate a musical riff or an entire song to be used as the musical input.
  • a user may create an audio recording by playing notes with a more traditional, non-MIDI instrument, such as a piano or a guitar. The audio recording may then be analyzed for pitch, tempo, etc., so that it can be utilized as the musical input.
  • individual entries in the list of musical input options are selectable to provide, via the client device 101-105, a pre-recorded musical work (either stored or provided by the user), or a clip thereof, as a preview to the user.
  • the user interface associated with selecting a musical work includes audio playback capabilities to allow the user to listen to the musical clip in association with their selection of one of the musical works as the musical input.
  • such playback capability may be associated with a playback slider bar that graphically depicts the progressing playback of the musical work or clip.
  • whether the user selects the melody from the pre-recorded musical works stored within the system or from one or more melodies created by the user, it is contemplated that the user may be provided with functionality to select the points at which to begin and end within the musical work to define the musical input.
  • the client device 101-105 may transmit the selection over the wireless network 110 and/or network 106, which may be received by the server 108 as the musical input at 208 of FIG. 2.
  • the musical input may be analyzed and processed in order to identify certain characteristics and patterns associated with the musical input so as to more effectively match the musical input with the lyrical input to produce an original musical composition for use in a message or otherwise.
  • analysis and processing of the musical work includes "reducing" or "embellishing" the musical work.
  • the selected musical work may be parsed for features such as structurally important notes, rhythmic signatures, and phrase boundaries.
  • each musical work or clip may optionally be embellished or reduced, either adding a number of notes to the phrase in a musical way (embellish), or removing them (reduce), while still maintaining the idea and recognition of the original melody in the musical input.
  • embellishments or reductions may be performed in order to align the textual phrases in the lyrical input with the musical phrases by aligning their boundaries, and also to provide the musical material necessary for the alignment of the syllables of individual words to notes resulting in a natural musical expression of the input text.
  • all or part of the analysis of the pre-recorded musical works may have already been completed enabling the media generation system to merely retrieve the pre-analyzed data from the media database 109 for use in completing the musical composition.
  • the process of analyzing the musical work in preparation for matching with the lyrical input and for use in the musical message is set forth in more detail below.
  • the lyrical input and the musical input may be correlated with one another based on the analyses of both the lyrical input and the musical input 206 and 210.
  • the notes of the selected and analyzed musical work are intelligently and automatically assigned to one or more phonemes in the input text, as described in more detail below.
  • the resulting data correlating the lyrical input to the musical input may then be formatted into a synthesizer input at 214 for input into a voice synthesizer.
  • the formatted synthesizer input, in the form of text syllable-melodic note pairs, may then be sent to a voice synthesizer at 216 to create a vocal rendering of the lyrical input for use in an original musical work that incorporates characteristics of the lyrical input and the musical input.
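  • A minimal sketch of what such formatted synthesizer input might look like follows; the pair structure and note representation are assumptions for illustration, since the disclosure does not fix a concrete format:

```python
# Hypothetical formatting step: pairing lyric syllables with melody notes to
# build the "text syllable-melodic note pairs" sent to the voice synthesizer.
from typing import List, Tuple

def format_synth_input(syllables: List[str],
                       notes: List[Tuple[str, float]]) -> List[dict]:
    if len(syllables) != len(notes):
        raise ValueError("inputs must be correlated first (see reduction/embellishment)")
    return [{"syllable": syl, "pitch": pitch, "beats": beats}
            for syl, (pitch, beats) in zip(syllables, notes)]

pairs = format_synth_input(["I", "miss", "you"],
                           [("C4", 1.0), ("E4", 1.0), ("G4", 2.0)])
print(pairs)  # e.g. [{'syllable': 'I', 'pitch': 'C4', 'beats': 1.0}, ...]
```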
  • the musical message or vocal rendering may then be received by the server 108 at 218.
  • the generated musical work may be received in the form of an audio file including a vocal rendering of the lyrical input entered by the user correlating with the music/melody of the musical input, either selected or created.
  • the voice synthesizer may generate the entire musical work including the vocal rendering of the lyrical input and the musical portion from the musical input.
  • the voice synthesizer may generate only a vocal rendering of the input text created based on the synthesizer input, which may be generated by analyzing the lyrical input and the musical input described above.
  • a musical rendering based on the musical input, or the musical input itself may be combined with the vocal rendering to generate a musical work.
  • the voice synthesizer may be any suitable vocal renderer.
  • the voice synthesizer may be cloud-based with support from a web server that provides security, load balancing, and the ability to accept inbound messages and send outbound musically-enhanced messages.
  • the vocal renderer may be run locally on the server 108 itself or on the client device 101-105.
  • the voice synthesizer may render the formatted lyrical input data to provide a text-to-speech conversion as well as singing speech synthesis.
  • the vocal renderer may provide the user with a choice of a variety of voices, a variety of voice synthesizers (including but not limited to HMM-based, diphone, or unit-selection based), or a choice of human languages.
  • Some examples of the choices of singing voices are gender (e.g., male/female), age (e.g., young/old), nationality or accent (e.g., American accent/British accent), or other distinguishing vocal characteristics (e.g., sober/drunk, yelling/whispering, seductive, anxious, robotic, etc.).
  • these choices of voices may be implemented through one or more speech synthesizers each using one or more vocal models, pitches, cadences, and other variables that may result in perceptively different sung attributes.
  • the choice of voice synthesizer may be made automatically by the system based on analysis of the lyrical input and/or the musical input for specific words or musical styles indicating mood, tone, or genre.
  • the system may provide harmonization to accompany the melody. Such accompaniment may be added into the message in the manner disclosed in U.S. Patent No. 8,779,268, incorporated by reference above.
  • the user may have the option of adding graphical elements to the musical work at 219. If selected, graphical elements may be chosen from a library of pre-existing elements stored either at the media database 109, on the client device 101-105 itself, or both. In another embodiment, the user may create his or her own graphical element for inclusion in a generated multimedia work. In yet other embodiments, graphic elements may be generated automatically without the user needing to specifically select them.
  • graphics may be colors and light flashes that correspond to the music in the musical work, animated figures or characters spelling out all or portions of textual message or lyrics input by the user, or other animations or colors that may be automatically determined to correspond with the tone of the musical input or with the tone of the lyrical input itself as determined by analysis of the lyrical input.
  • a graphical input indicating this selection may be transmitted to and received by the server 108 at 220.
  • the graphical element may then be generated at 222 using either the pre-existing elements selected by the user, automatic elements chosen by the system based on analysis of the lyrical input and/or the musical input, or graphical elements provided by the user.
  • the user may choose, at 224, to include a video element to be paired with the musical work, or to be stored along with the musical work in the same media file output.
  • the user interface may activate one or more cameras that may be integrated into the client device 101-105 to capture video input, such as front-facing or rear-facing cameras on a smartphone or other device.
  • the user may manipulate the user interface on the client device to record video inputs to be incorporated into the generated musical work.
  • the user interface displayed on the client device 101-105 may provide playback of the generated musical work while the user captures the video inputs allowing the user to coordinate particular features of the video inputs with particular portions of the musical work.
  • the user interface may display the text of the lyrical input on the device's screen with a progress indicator moving across the text during playback so as to provide the user with a visual representation of the musical work's progress during video capture.
  • the user interface may allow the user to stop and start video capture as desired throughout playback of the musical work, while simultaneously stopping playback of the musical work.
  • One such way of providing this functionality may be by capturing video while the user touches a touchscreen or other input of the client device 101-105, and at least temporarily pausing video capture when the user releases the touchscreen or other input.
  • the system may allow the user to capture certain portions of the video input during a first portion of the musical work, pause the video capture and playback of the musical work when desired, and then continue capture of another portion of the video input to correspond with a second portion of the musical work.
  • the user interface may provide the option of editing the video input by re-capturing portions of or the entirety of the video input.
  • the video input may be transmitted to and received by the server 108 for processing at 226.
  • the video input may then be processed to generate a video element at 228, and the video element may then be incorporated into the musical work to generate a multimedia musical work.
  • the video element may be synced and played along with the musical work corresponding to an order in which the user captured the portions of the video input.
  • processing and video element generation may be completed on the client device 101-105 itself without the need to transmit video input to the server 108.
  • the musical work or multimedia work may be transmitted or outputted, at 230, to the client device 101-105 over the network 106 and/or wireless network 110.
  • the musical work may be outputted to speakers and/or speakers combined with a visual display.
  • the system may provide the user with the option of previewing the musical or multimedia work at 232.
  • the musical or multimedia work may be played at 234 via the client device 101-105 for the user to review.
  • the user may be provided with the option to cancel the work without sending or otherwise storing, or to edit the work further.
  • the user may store the work as a media file, send the work as a musical or multimedia message to a selected message recipient, etc., at 235.
  • the musical or multimedia work may be sent to one or more recipients using a variety of communications and social media platforms, such as SMS or MMS messaging, e-mail, Facebook®, Twitter®, and Instagram®, so long as the messaging service/format supports the transmission, delivery, and playback of audio and/or video files.
  • a method of generating a musical work may additionally include receiving a selection of a singer corresponding to at least one voice characteristic.
  • the at least one voice characteristic may be indicative of a particular real-life or fictional singer with a particular recognizable style. For example, a particular musician may have a recognizable voice due to a specific twang, falsetto, vocal range, vibrato style, etc.
  • the at least one voice characteristic may be incorporated into the performance of the musical work. It is contemplated that, in some embodiments, the at least one voice characteristic may be included in the formatted data sent to the voice synthesizer at 216 of the method 200 in FIG. 2. However, it is also contemplated that the at least one voice characteristic may be incorporated into the vocal rendering received from the voice synthesizer.
  • the following provides a more detailed description of the methodology used in analyzing and processing the lyrical input and musical input provided by the user to create a musical or multimedia work. Specifically, the details provided pertain to at least one embodiment of performing steps 206 and 210-214 of the method 200 for operating the media generation system of the lyric video system 100. It should be understood, however, that other alternative methodologies for carrying out the steps of FIG. 2 are contemplated herein. It should also be understood that the media generation system can perform the following operations automatically upon receiving a lyrical input and selection of musical input from a user via the user's client device.
  • the methodology disclosed herein provides technical solutions to technical problems associated with correlating lyrical inputs with musical inputs such that the musical output of the correlation of the two inputs is matched effectively. Further, the methods and features described herein can operate to improve the functional ability of the computer or server to process certain types of information in a way that makes the computer more usable and functional than would otherwise be possible without the operations and systems described herein.
  • the media generation system may gather and manipulate text and musical inputs in such a way to assure system flexibility, scalability, and effectiveness.
  • collection and analysis of data points relating to the lyrical input and musical input is implemented to improve the computer and the system's ability to effectively correlate the musical and lyrical inputs.
  • Some data points determined and used by the system in analyzing and processing a lyrical input, such as in step 206, may be the number of characters, or character count ("CC"), and the number of words, or word count ("WC"), included in the lyrical input. Any suitable method may be used to determine the CC and WC.
  • the system may determine WC by counting spaces between groups of characters, or by recognizing words in groups of characters by reference to a database of known words in a particular language or selection of languages.
  • Other data points determined by the system during analysis of the lyrical input may be the number of syllables, or syllable count ("TC"), and the number of sentences, or sentence count ("SC").
  • TC and SC may be determined in any suitable manner, for example, by analyzing punctuation and spacing for SC, or parsing words into syllables by reference to a word database stored in the media database 109 or elsewhere.
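  • The data points above might be computed along the following lines; the vowel-group syllable counter below is a crude stand-in for the word-database lookup the disclosure describes, and the choice to exclude spaces from CC is an assumption:

```python
# Sketch of the CC/WC/TC/SC data points described above; illustrative only.
import re

def analyze_lyrical_input(text: str) -> dict:
    words = text.split()
    # Rough heuristic: one syllable per vowel group, minimum one per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    sentences = len([s for s in re.split(r"[.!?]+", text) if s.strip()])
    return {
        "CC": len(text.replace(" ", "")),  # character count (spaces excluded here)
        "WC": len(words),                  # word count via space counting
        "TC": syllables,                   # syllable count
        "SC": sentences,                   # sentence count via punctuation
    }

print(analyze_lyrical_input("I miss you. Let's meet."))
```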
  • the system may analyze and parse the input text to determine values such as the CC, WC, TC, and SC. In some embodiments, this parsing may be conducted at the server 108, but it is also contemplated that, in some embodiments, parsing of the input text may be conducted on the client device 101-105. In certain embodiments, during analysis, the system may insert coded start flags and end flags at the beginning and end of each word, syllable, and sentence to mark the determinations made during analysis.
  • a start flag at the beginning of a sentence may be referred to as the sentence start ("SS"), and the location of the end flag at the end of a sentence may be referred to as the sentence end ("SE").
  • words or syllables of the lyrical input may be flagged for a textual emphasis.
  • the system methodology for recognizing such instances in which words or syllables should receive textual emphasis may be based on language or be culturally specific.
  • another analysis conducted by the system on the input text may be determining the phrase class ("PC") of each of the CC and the WC.
  • the phrase class of the character count will be referred to as the CCPC and the phrase class of the word count will be referred to as the WCPC.
  • the value of the phrase class may be a sequentially indexed set of groups representing increasing sets of values of CC or WC. For example, a lyrical input with CC of 0 may have a CCPC of 1, and a lyrical input with a WC of 0 may have a WCPC of 1.
  • a lyrical input with a CC of between 1 and 6 may have a CCPC of 2
  • a lyrical input with a WC of 1 may have a WCPC of 2.
  • the CCPC and WCPC may then increase sequentially as the CC or the WC increases, respectively.
  • Table 1 illustrates, for exemplary and non-limiting purposes only, a possible classification of CCPC and WCPC based on CC and WC in a lyrical input.
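  • Since Table 1 itself is not reproduced here, the sketch below shows how such a classification could be implemented; only the first two brackets (CC of 0 mapping to class 1, CC of 1-6 to class 2, WC of 0 to class 1, WC of 1 to class 2) come from the text, and the remaining boundaries are invented for the example:

```python
# Illustrative phrase-class lookup in the spirit of Table 1; upper-bound
# lists beyond the first two brackets are hypothetical.
import bisect

CC_BOUNDS = [1, 7, 15, 30, 60]  # CC 0 -> class 1, CC 1-6 -> class 2, then invented
WC_BOUNDS = [1, 2, 5, 10, 20]   # WC 0 -> class 1, WC 1 -> class 2, then invented

def phrase_class(value: int, bounds: list) -> int:
    # Class 1 for the lowest bracket, increasing sequentially with the count.
    return bisect.bisect_right(bounds, value) + 1

print(phrase_class(0, CC_BOUNDS), phrase_class(6, CC_BOUNDS))  # 1 2  (CCPC)
print(phrase_class(1, WC_BOUNDS))                              # 2    (WCPC)
```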
  • the system may determine an overall phrase class for the entire lyrical input by the user, or the user phrase class ("UPC"). This determination may be made by giving different weights to different values of CCPC and WCPC, respectively. In some embodiments, greater weight may be given to the WCPC than the CCPC in determining the UPC.
  • the phrase class system and weighting system explained herein may be variable based on several factors related to the selected musical input, such as mood, genre, style, etc., or other factors related to the lyrical input, such as important words or phrases as determined during analysis of the lyrical input.
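  • A weighted UPC determination might look like the following; the disclosure gives no actual weights, so the 0.7/0.3 split below is purely illustrative:

```python
# Hypothetical weighted combination of WCPC and CCPC into the UPC; the text
# only says WCPC may be weighted more heavily than CCPC.
def user_phrase_class(wcpc: int, ccpc: int,
                      w_word: float = 0.7, w_char: float = 0.3) -> int:
    return round(w_word * wcpc + w_char * ccpc)

print(user_phrase_class(3, 2))  # -> 3 with the assumed weights
```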
  • the musical input selected or provided by the user may be parsed during analysis and processing, such as in step 210 of FIG. 2.
  • the system may parse the musical input selected or provided by the user to determine a variety of data points.
  • One data point determined in the analysis may be the number of notes, or note count ("NC") in the particular musical input.
  • Another product of the analysis that may be done on the musical input may include determining the start and end of musical phrases throughout the musical input.
  • a musical phrase may be analogous to a linguistic sentence in that a musical phrase is a grouping of musical notes that conveys a musical thought.
  • the analysis and processing of the selected musical input may involve flagging the beginnings and endings of each identified musical phrase in a musical input.
  • similar to the phrase class of the lyrical input (UPC), a phrase class of the source musical input, referred to as the source phrase class ("SPC"), may be determined, for example, based on the number of musical phrases and note count identified in the musical input.
  • the beginning of a musical phrase may be flagged as the phrase start ("PS"), and the end of a musical phrase may be flagged as the phrase end ("PE").
  • the PS and the PE in the musical input may be analogous to the sentence start (SS) and sentence end (SE) in the lyrical input.
  • the PS and PE associated with the preexisting musical works may be pre-recorded and stored on the server 108 or the client device 101-105, where they may be available for selection by the user as a musical input.
  • the locations of the PS and PE for the musical input may be pre-determined, and analysis of the musical input involves retrieving such information from a storage location, such as the media database 109.
  • further analysis is conducted to distinguish musical phrases in the musical input and, thus, determine the corresponding PS and PE for each identified musical phrase.
  • the phrase classes of the lyrical input and the musical input are compared to determine the parity or disparity between the two inputs. It should be understood that, although the disclosure describes comparing corresponding lyrical inputs and musical inputs using phrase classes, other methodologies for making comparisons between lyrical inputs and musical inputs are contemplated herein.
  • the phrase class comparison can take place upon correlating the musical input with the lyrical input based on the respective analyses, such as at step 212.
  • parity between a lyrical input and a musical input is analyzed by determining the phrase differential ("PD") between corresponding lyrical inputs and musical inputs provided by the user.
  • one example of determining the PD is by dividing the user phrase class (UPC) by the source phrase class (SPC), as shown in Equation 3, below:

    PD = UPC / SPC (Equation 3)
  • perfect phrase parity between the lyrical input and the musical input would result in a PD of 1.0, where the UPC and the SPC are equal. If the lyrical input is "shorter" than the musical input, the PD may have a value less than 1.0, and if the lyrical input is "longer" than the musical input, the PD may have a value of greater than 1.0.
  • Those with skill in the art will recognize that similar results could be obtained by dividing the SPC by the UPC, or with other suitable comparison methods.
  • Parity between the lyrical input and the musical input may also be determined by the "note" differential ("ND") between the lyrical input and the musical input provided by the user.
  • One example of determining the ND is by taking the difference between the note count (NC) and the analogous syllable count (TC) of the lyrical input. For example:

    ND = NC - TC
  • perfect phrase parity between the lyrical input and the musical input would be an ND of 0, where the NC and the TC are equal. If the lyrical input is "shorter" than the musical input, the ND may be greater than or equal to 1, and if the lyrical input is "longer" than the musical input, the ND may be less than or equal to -1.
  • Those with skill in the art will recognize that similar results could be obtained by subtracting the NC from the TC, or with other suitable comparison methods.
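  • The two parity measures transcribe directly into a short sketch (PD = UPC / SPC, ND = NC - TC):

```python
# Direct transcription of the two parity measures described above.
def phrase_differential(upc: int, spc: int) -> float:
    return upc / spc  # 1.0 means perfect phrase parity

def note_differential(nc: int, tc: int) -> int:
    return nc - tc    # 0 means note/syllable parity

pd, nd = phrase_differential(2, 2), note_differential(14, 11)
print(pd, nd)  # 1.0 3 -> phrases match; melody has 3 more notes than syllables
```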
  • the sentence starts (SS) and sentence ends (SE) of the lyrical input may align with the phrase starts (PS) and phrase ends (PE), respectively, of the musical input if the parity is perfect or close to perfect (i.e., high parity).
  • the SE and the PE may not align well, however, when the SS and the PS are aligned to one another.
  • various methods of processing the musical input and the lyrical input can be utilized to provide an optimal outcome for the musical work. In some embodiments, these techniques or editing tools may be applied automatically by the system, or may be manually applied by a user.
  • One example of a solution to correlate text and musical inputs is syllabic matching.
  • syllabic matching can involve simply matching the syllables in the text input to the notes in the musical input and/or matching the text input sentences to the musical input musical phrases.
  • when the PD is slightly greater than or less than 1.0 and/or the ND is between, for example, 1 and 5 or -1 and -5, melodic reduction or embellishment can be used to provide correlation between the inputs.
  • Melodic reduction involves reducing the number of notes played in the musical input and can be used when the NC is slightly greater than the TC (e.g., ND is between approximately 1 and 5) or the musical source phrase class (SPC) is slightly greater than the user phrase class (UPC) (e.g., PD is slightly less than 1.0). Reducing the notes in the musical input can shorten the overall length of the musical input and result in the NC being closer to or equal to the TC of the text input, increasing the phrase parity.
  • melodic embellishment involves adding notes to (i.e., "embellishing") the musical input.
  • melodic embellishment is used when the NC is slightly less than the TC (e.g., ND is between -1 and -5) or the SPC is slightly less than the UPC (e.g., PD is slightly greater than 1.0). Adding notes in the musical input can lengthen the musical input, which can add to the NC or SPC and, thus, increase the parity between the inputs.
  • the additional notes added to the musical work are determined by analyzing the original notes in the musical work and adding notes that make sense musically. For example, in some embodiments, the system may only add notes in the same musical key as the original musical work, or notes that maintain the tempo or other features of the original work, so as to aid in keeping the musical work recognizable. It should be understood that although melodic reduction and embellishment have been described in the context of slight phrase disparity between the musical and text inputs, use of melodic reduction and embellishment in larger or smaller phrase disparity is also contemplated herein.
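  • Putting the bands above together, a correlation strategy could be selected from the ND as in the following sketch; the 1-to-5 "slight disparity" band comes from the examples in the text, while the function itself is a hypothetical illustration:

```python
# Strategy selection following the ND bands described above: near-zero ND ->
# plain syllabic matching; NC slightly above TC -> melodic reduction; NC
# slightly below TC -> melodic embellishment.
def choose_strategy(nc: int, tc: int) -> str:
    nd = nc - tc
    if nd == 0:
        return "syllabic matching"
    if 1 <= nd <= 5:
        return "melodic reduction"      # drop structurally unimportant notes
    if -5 <= nd <= -1:
        return "melodic embellishment"  # add in-key notes to the melody
    return "larger rework needed"       # beyond the slight-adjustment band

print(choose_strategy(14, 11))  # -> 'melodic reduction'
```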
  • a system for audio generation may be used by or in conjunction with the lyric video system.
  • the system may receive timing information from multiple sources, but may ultimately be converted into MIDI and MusicXML data, or other suitable data formats.
  • a performance of the timing data may be created at a stage where the system mimics a human technician by slightly adjusting pitch and timing information to match the original intent of the timing source, i.e., a song or other audio recording.
  • the system may then determine an appropriate voice model based on inputs associated with the timing data.
  • the inputs may be a music artist name, title of the work, gender of the speaker, musical key, etc.
• the performance may be converted into a suitable data format along with the MusicXML and a voice model ID. Together, these inputs may be transmitted to a synthesis stage, which may output vocal audio.
• FIG. 3 shows a flow chart of an embodiment of a method for audio generation 300 that may be used in conjunction with the lyric video system.
• the system may receive audio timing information at 302, receive digital sheet music, such as in MusicXML format, at 304, or receive a song audio track sourced from a master or other recording source at 306 for a particular audio selection. In each case, the received data may be converted to or remain as MusicXML data, for example, or another suitable digital format.
  • the system may receive song data, such as the artist, genre, tempo, song title, key, tone, etc.
  • the system may determine a vocalist gender, style, or ideal voice model based on the received song data.
  • the system may generate MIDI data for the audio selection based on the MusicXML data.
  • the system may conduct MIDI performance manipulation. For example, in some embodiments, the system may adjust the pitch or the length of a note to fit requirements for a performance MIDI based on the voice data and the song data.
• the system may conduct MIDI timing manipulation. For example, the system may adjust note timing/length to fit requirements for a performance MIDI based on the ideal voice model, song data, etc.
  • the system may receive a lyric input, which may be received from a local or third party lyric database or from a user input.
  • the system may generate a text-to-music MusicXML based on the lyric input from 318 and the MIDI timing information from 316. Further detail on methods by which lyrical text data may be matched with music or musical input data are described above, and further in co-pending U.S. Patent Application No. 15/986,589.
  • the system may generate a pitch curve based on the MIDI performance manipulation result in 314 and the ideal voice model data from 312 using, for example, a song driven synthesizer.
  • vocal audio may be generated based on the ideal voice model data from 312, the text-to-music MusicXML generated at 322, and the pitch curve from 320.
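• As a rough illustration of how the numbered stages of FIG. 3 might fit together in code, the sketch below stubs out each stage as a Python function. Every helper name, signature, and placeholder value is an assumption made for readability, not the actual implementation.

      from dataclasses import dataclass

      @dataclass
      class SongData:              # artist, genre, key, etc., as received by the system
          artist: str
          genre: str
          key: str

      def to_musicxml(raw_source) -> str:                 # 302/304/306: normalize inputs
          return "<score-partwise/>"                      # placeholder MusicXML

      def pick_voice_model(song: SongData) -> str:        # 312: ideal voice model
          return "voice_female_pop" if song.genre == "pop" else "voice_default"

      def musicxml_to_midi(xml: str) -> dict:             # melody MIDI from MusicXML
          return {"notes": [(0.0, 0.5, 60)]}              # (start, duration, pitch) stub

      def manipulate_performance(midi, voice, song):      # 314: pitch/length adjustments
          return midi

      def manipulate_timing(midi, voice, song):           # 316: note timing adjustments
          return midi

      def text_to_music_xml(lyrics: str, timed_midi) -> str:   # 322: lyric-to-melody map
          return "<score-partwise/>"

      def pitch_curve(perf_midi, voice) -> list:          # 320: song-driven synthesizer
          return [60.0, 60.2, 60.1]

      def synthesize(voice, ttm_xml, curve) -> bytes:     # final stage: vocal audio out
          return b"..."                                   # placeholder audio bytes

      def generate_vocal_audio(raw_source, song: SongData, lyrics: str) -> bytes:
          xml = to_musicxml(raw_source)
          voice = pick_voice_model(song)                  # from the received song data
          midi = musicxml_to_midi(xml)
          perf = manipulate_performance(midi, voice, song)
          timed = manipulate_timing(perf, voice, song)
          ttm_xml = text_to_music_xml(lyrics, timed)      # lyric input received at 318
          curve = pitch_curve(perf, voice)
          return synthesize(voice, ttm_xml, curve)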
• the lyric video system may utilize the methods described above with reference to FIG. 2 (the media generation system) or FIG. 3 (the audio generation system) to produce the audio selection for the lyric video system 100.
• the audio selection may be a song pre-recorded by the user or a third party, or may be a commercially available song or other piece of audio.
• the audio selection may be selected from a third-party music database, such as the Apple iTunes® Store, Spotify®, Amazon Music®, or any other third-party database.
  • the audio selection may be a song or audio file stored on a user device 101-105, or stored on a third-party remote server or cloud platform accessible via the Internet or other network.
  • an animation generation system of the lyric video system may generate a digital movie file that may include, for example, a video with lyric animations.
  • the animation generation system may begin with the same or similarly sourced timing data as used in the audio generation system described with regard to FIG. 3. Based on a lyric input, along with the timing data, the system may ultimately generate a visual animation that may be paired with a digital movie file audio to complete a final digital movie file.
• the lyric input may be analyzed for logical breaks like stanzas or song sections. Examples of this type of textual analysis are described above and further with regard to co-pending U.S. Patent Application No. 15/986,589.
  • the system may insert animations onto the determined stanzas or song sections, or on identified key words in the lyric input.
  • information about the lyric input may be shared with a third party system to retrieve additional information that may help the system determine a color palette, imagery and animations suitable to the song or lyrics.
• themed animation pools may be introduced and selected based on genre, mood, tempo, and text/word length.
  • the animation may be rendered in real time as the system receives information. The audio and animation may then be combined to render a final digital movie file.
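• As an illustrative sketch of this final combining step, the snippet below pairs a rendered animation file with a generated audio file using the open-source moviepy library (v1-style imports); the file names are hypothetical.

      from moviepy.editor import AudioFileClip, VideoFileClip

      def mux_final_movie(animation_path: str, audio_path: str, out_path: str) -> None:
          # Pair the rendered lyric animation with the generated audio track.
          animation = VideoFileClip(animation_path)
          audio = AudioFileClip(audio_path).set_duration(animation.duration)
          animation.set_audio(audio).write_videofile(out_path, fps=animation.fps)

      mux_final_movie("lyric_animation.mp4", "vocal_mix.wav", "final_lyric_video.mp4")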
  • FIG. 5 shows an embodiment of a method 500 for using the animation generation system of the lyric video system.
  • the system may receive a digital music score of an audio selection.
• the digital music score may be received from a third-party repository, such as a sheet music warehouse, or other database.
• the digital score may be stored in a local system database, cloud storage, or on a user device.
  • the system may receive MusicXML data directly as the audio input, for example, from a MusicXML warehouse or other database.
  • the system may receive a song audio track sourced from a master or from any suitable source, including cloud streaming services, third-party databases, local storage, etc.
  • a MusicXML or other suitable data format may be generated from the digital sheet music or from the song audio track. Based on any of 502, 504, and 506, the system may generate a melody MIDI at 508.
  • the melody MIDI may include timing and pitches of the lead vocal in the audio selection based on timing information included in the audio selection either in the MusicXML format or otherwise.
  • the system may receive a lyric input that may be the text of the lyrics in the audio selection.
  • the lyric input may be the words to a third party song, or it may be the text input for lyrics provided by a user during the process described above with reference to FIG. 2.
  • the system may conduct a lyric analysis to generate a lyric timeline and assign lyric features based on the analysis.
  • lyric features may include analyzing the specific words in a lyrical input and assigning colors, images, animation, or other graphical or video features based on the meanings or context of the words. For example, if the lyric input includes the word "love," the lyric analysis may assign the color red to the word, stanza, verse, or section of the audio selection containing the word. In other embodiments, the system may assign certain imagery or animation based on certain other keywords or repeated words in the lyric input.
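• A minimal sketch of this kind of lyric analysis appears below. Only the "love" -> red pairing comes from the text above; the rest of the keyword table, and the line-level granularity, are illustrative assumptions.

      KEYWORD_COLORS = {"love": "red", "fire": "orange", "ocean": "blue"}

      def analyze_lyrics(lines):
          # Build a simple lyric timeline: one entry per line, tagged with a
          # color whenever a known keyword appears in that line.
          timeline = []
          for index, line in enumerate(lines):
              color = next((c for word, c in KEYWORD_COLORS.items()
                            if word in line.lower()), None)
              timeline.append({"line": index, "text": line, "color": color})
          return timeline

      for entry in analyze_lyrics(["All you need is love", "Here comes the sun"]):
          print(entry)   # the first line is tagged red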
  • the system may transmit a song or audio selection identifier to a third party database or index based on information in the MusicXML or the audio selection identification more generally.
  • the system may then receive tone information about the audio selection.
  • the third party database may transmit tone information including the genre, mood, tempo, tone, style, significance, situational grouping information of artist or song, etc., which may be received by the system.
• the tone information may be readily available locally on a user device or in the cloud, or may come from a third party.
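• For illustration, the sketch below transmits a song identifier to a tone-information service and reads back genre/mood/tempo metadata. The endpoint URL and response fields are hypothetical.

      import json
      import urllib.request

      def fetch_tone_info(song_id: str) -> dict:
          # Hypothetical third-party endpoint returning, e.g.,
          # {"genre": "pop", "mood": "happy", "tempo": 120, "style": "..."}
          url = "https://tone-db.example.com/v1/songs/" + song_id + "/tone"
          with urllib.request.urlopen(url) as response:
              return json.load(response)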
  • the system may determine graphic imagery that matches with or is otherwise most appropriate based on the tone information from 514, and may match the graphic imagery to the timing of the lead vocals generated in the melody MIDI at 508.
  • the graphic imagery may be, for example, color palette, animations, or other imagery reflecting specific moods, tones, or contexts of the audio selection.
  • the system may determine thematic animation to be incorporated into a lyric video based on the tone information received in 514 and the timing information.
• the thematic animation may be selected from JavaScript Object Notation (JSON) thematic animation pools, which may be determined based on genre, mood, tempo, and situational grouping, and based on the word length determined in the timing data.
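• The sketch below shows one plausible shape for such JSON animation pools, with a selector keyed on genre, mood, and word length. The schema and pool contents are invented for illustration.

      import json

      ANIMATION_POOLS = json.loads("""
      [
        {"theme": "upbeat", "genres": ["pop", "dance"], "moods": ["happy"],
         "max_word_length": 8, "animations": ["burst", "bounce", "confetti"]},
        {"theme": "somber", "genres": ["ballad", "blues"], "moods": ["sad"],
         "max_word_length": 14, "animations": ["fade", "drift"]}
      ]
      """)

      def pick_pool(genre: str, mood: str, word_length: int) -> dict:
          # Return the first pool whose genre/mood lists match and whose
          # word-length budget fits the timing data.
          for pool in ANIMATION_POOLS:
              if (genre in pool["genres"] and mood in pool["moods"]
                      and word_length <= pool["max_word_length"]):
                  return pool
          return ANIMATION_POOLS[0]      # fall back to a default pool

      print(pick_pool("pop", "happy", word_length=4)["animations"])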
  • the system may render an animation sequence for the audio selection to generate a lyric video.
  • the animation may be generated in real time, allowing for almost immediate playback and viewing by a user.
• the system may perform the analysis of FIG. 5 on a verse-by-verse or section-by-section basis so the lyric video may begin playback before the entire audio selection has been rendered.
  • the system may render an entire audio selection before playback, and preserve the lyric video for selective playback by a user.
• the lyric video may include a color background determined based on tone information, lyric analysis, and timing information received or determined by the system.
• the lyrics may flash across the screen as they are performed in the audio selection playback.
  • the words may be depicted in varying fonts, styles, colors, and animations that grow, shrink, move, or are otherwise adjusted and varied as a result of the analysis in FIG. 5.
  • the lyric video may also include background colors that change, shift, or flash according to the analysis in method 500.
  • the lyric video may include themed animations selected to correspond with themes of the music, genre, lyrics, tone, etc., of the audio selection.
  • the system may generate an original lyric video.
  • FIG. 6 shows a flow chart of another embodiment of a method 600 of using the lyric video system.
  • the system may receive an audio selection from a user, e.g., via a user device either locally or via a network.
  • the user may select the audio selection from a list, or may input the audio selection through a search or other input.
  • the audio selection may be selected in a third party application or database, such as the Apple iTunes Store®, Amazon Music®, or Spotify®.
  • the system may receive the audio selection via a song ID or other suitable notification or identification.
  • the audio selection may be played in real time and captured by the system.
  • the system may, at 604, determine timing information of the audio selection.
  • the timing information may be received along with the audio selection.
  • the timing information may be determined by querying a local or third party database, such as a digital sheet music database or MusicXML database.
  • the timing information of the audio selection may include lyric timing, such as when each word or syllable is played/sung in the song, and note timing.
  • parsing of the audio selection using methods described above with reference to FIG. 2 may be implemented to determine at least portions of the timing information.
  • a MIDI file may be generated based on the timing information and/or MusicXML data of the audio selection.
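• As one concrete (assumed) way to derive the timing information and MIDI file described here, the sketch below uses the open-source music21 toolkit to parse MusicXML, collect per-note timing (and any attached lyric syllables), and write out a MIDI file.

      from music21 import converter

      def timing_and_midi_from_musicxml(xml_path: str, midi_path: str):
          # Parse the digital score, then record when each note (and its lyric
          # syllable, if present) sounds, in quarter-note units.
          score = converter.parse(xml_path)
          timings = [(n.offset, n.quarterLength, n.lyric)
                     for n in score.flatten().notes]
          score.write("midi", fp=midi_path)    # MIDI generated from the same data
          return timings

      # timings = timing_and_midi_from_musicxml("selection.musicxml", "melody.mid")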
  • the system may determine lyric information of the audio selection, i.e., the words used or sung in the audio selection.
  • the lyric information may be determined via digital sheet music, a lyric database (either third party or local), or another suitable lyric source.
  • the system may identify the lyric information using voice recognition, such as by converting the spoken or sung words in the audio selection into text. This conversion may be done by the system itself or by using third party sources and received back into the system for analysis.
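• Where no sheet music or lyric database entry is available, a speech-to-text fallback might look like the sketch below, using the third-party SpeechRecognition package. Recognizing sung rather than spoken words is considerably harder in practice, so this is a sketch of the idea only.

      import speech_recognition as sr

      def lyrics_via_voice_recognition(wav_path: str) -> str:
          # Convert the spoken or sung words in an audio file into text.
          recognizer = sr.Recognizer()
          with sr.AudioFile(wav_path) as source:
              audio = recognizer.record(source)
          return recognizer.recognize_google(audio)   # any hosted recognizer would do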
• the system may analyze the lyric information of the audio selection. For example, the system may determine keywords in the lyric information that indicate the style or mood, or identify often-repeated terms.
  • the system may also identify words commonly indicating particular moods or genres.
  • the system may create a timeline that assigns colors to verses or stanzas of the lyrics based on the lyric analysis.
  • the lyric analysis may include inserting particular imagery and/or animations associated with particular lyrics, phrases, verses, or stanzas.
  • parsing of the audio selection using methods described above with reference to FIG. 2 may be implemented to conduct at least portions of the lyric analysis.
  • the system may receive tone information of the audio selection.
  • the system may include a database of songs and the associated genre, mood, tempo, situational grouping, artist, style, etc.
  • the system may transmit the audio selection (via song ID or otherwise) to a third party database or application, requesting tone information of the audio selection.
  • the system may then receive tone information from the third party database or application, such as genre, mood, tempo, situational grouping, artist, style, etc.
• the system may determine video content for a lyric video based on one or all of the timing information, the lyric analysis and lyric information, and the tone information.
• the video content automatically selected by the system may be at least partially determined by the tone information. For example, if the tone information is determined to be upbeat, happy, in a major key, etc., the system may select animation or graphics from a thematic animation pool that includes happy, upbeat visualizations with bright colors. In another example, if the tone information is determined to be somber, slow, in a minor key, etc., the system may select corresponding animation or graphics that are sad or slow, with darker or more drab colors to match the tone.
  • the video content may also be selected at least partially based on the timing information of the audio selection.
  • the visualizations chosen and the timing of the visualizations in the video content may be based on word length and timing of the lyrics.
• the system may match a graphic or image in the video content to be displayed for the length of a particular word in the lyrics, and to be removed or replaced with another graphic or animation once the lyric is finished.
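• A minimal sketch of this word-timed display logic: given (word, start, end) timing triples, emit show/hide events so each graphic stays on screen exactly as long as its word is sung. The graphic_for mapping is hypothetical.

      def display_events(word_timings, graphic_for):
          # One "show" event when the word begins, one "hide" event when it ends.
          events = []
          for word, start, end in word_timings:
              graphic = graphic_for(word)
              events.append((start, "show", graphic))
              events.append((end, "hide", graphic))
          return sorted(events)

      timings = [("love", 12.0, 12.6), ("you", 12.6, 13.1)]
      print(display_events(timings, graphic_for=lambda w: w + ".png"))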
  • the video content selection or determination may be based at least partially on the lyric analysis.
• the system may determine that particular lyrics may be commonly associated with particular visualizations or animations, such as the word "love" being associated with hearts or flowers, or other associations.
  • the system may render the lyric video or portions of the lyric video based on the video content.
  • the lyric video may be a video file including audio of the audio selection played along with the video content determined by the system.
  • the video content may include animations, graphics, imagery and other visualizations along with visual depictions of the audio selection's lyrics.
  • the lyrics may be displayed in the lyric video with timing that matches the occurrence of those lyrics in the playback of the audio selection.
• the visual depiction of the lyrics may be moving, may vary in font or size depending on the analysis done above, or may vary in color to fit the tone information, lyric analysis, and timing information. In some embodiments, however, the lyrics themselves may not be displayed in the video content, or only certain lyrics may be selected for visualization.
  • the graphics, animation, or other visualizations of the video content may be correlated to the timing of the audio selection, such as to the beat, tempo, lyric timing, etc.
  • the lyric video may be rendered all at once and saved as a video file that may be played back or transferred to another user or device.
  • the system may render the lyric video in substantially real time, lyric by lyric, verse by verse, phrase by phrase, or section by section of the audio selection. In such embodiments, playback of the lyric video may be possible before the system has finished rendering video content for the entire audio selection.
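• One simple (assumed) way to structure such incremental rendering is a generator that yields finished segments as sections complete, so playback can begin almost immediately:

      def render_incrementally(sections, render_section):
          # Yield rendered video segments one section (verse, chorus, ...) at a time.
          for section in sections:
              yield render_section(section)

      # for segment in render_incrementally(song_sections, render_fn):
      #     player.enqueue(segment)      # hypothetical playback queue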
  • the system may apply machine learning techniques or other automatic analysis to determine timing information, lyric information and analysis, and tone information without the need to receive information from third party sources.
• the system may receive an audio selection or input and automatically derive lyrics, timing information, lyric analysis, and tone information using reference databases and machine learning techniques. The system may then select video content based on the derived information and render the lyric video accordingly.
  • the lyric video system and the method for operating such lyric video system described herein could be performed on a single client device, such as client device 104 or server 108, or could be performed on a variety of devices, each device including different portions of the system and performing different portions of the method.
  • client device 104 or server 108 could perform most of the steps illustrated in FIG. 2, but the voice synthesis could be performed by another device or another server.
  • the following includes a description of one embodiment of a single device that could be configured to include the lyric video system described herein, but it should be understood that the single device could alternatively be multiple devices.
  • FIG. 4 shows one embodiment of the system 100 that may be deployed on any of a variety of devices 101-105 or 108 from FIG. 1, or on a plurality of devices working together, which may be, for illustrative purposes, any multi-purpose computer (101, 102), hand-held computing device (103-105) and/or server (108).
• FIG. 4 depicts the system 100 operating on device 104 from FIG. 1, but one skilled in the art would understand that the system 100 may be deployed either as an application installed on a single device or, alternatively, on a plurality of devices that each perform a portion of the system's operation.
• the system may be operated within an HTTP browser environment, which may optionally utilize web plug-in technology to expand the functionality of the browser to enable functionality associated with system 100.
• Device 104 may include many more or fewer components than those shown in FIG. 4. However, it should be understood by those of ordinary skill in the art that certain components are not necessary to operate system 100, while others, such as a processor, video display, and audio speaker, are important to practice aspects of the present invention.
  • device 104 includes a processor 402, which may be a CPU, in communication with a mass memory 404 via a bus 406.
  • processor 402 could also comprise one or more general processors, digital signal processors, other specialized processors and/or ASICs, alone or in combination with one another.
  • Device 104 also includes a power supply 408, one or more network interfaces 410, an audio interface 412, a display driver 414, a user input handler 416, an illuminator 418, an input/output interface 420, an optional haptic interface 422, and an optional global positioning systems (GPS) receiver 424.
• Device 104 may also include a camera, enabling video to be acquired and/or associated with a particular musical message. Video from the camera, or another source, may also be provided to an online social network and/or an online music community. Device 104 may also optionally communicate with a base station or server 108 from FIG. 1, or directly with another computing device. Other computing devices, such as the base station or server 108 from FIG. 1, may include additional audio-related components, such as a professional audio processor, generator, amplifier, speaker, XLR connectors and/or power supply.
• power supply 408 may comprise a rechargeable or non-rechargeable battery or may be provided by an external power source, such as an AC adapter or a powered docking cradle that could also supplement and/or recharge the battery.
  • Network interface 410 includes circuitry for coupling device 104 to one or more networks, and is constructed for use with one or more communication protocols and technologies including, but not limited to, global system for mobile communication (GSM), code division multiple access (CDMA), time division multiple access (TDMA), user datagram protocol (UDP), transmission control protocol/Internet protocol (TCP/IP), SMS, general packet radio service (GPRS), WAP, ultra wide band (UWB), IEEE 802.16 Worldwide Interoperability for Microwave Access (WiMax), SIP/RTP, or any of a variety of other wireless communication protocols.
• network interface 410 may be known as a transceiver, transceiving device, or network interface card (NIC).
  • Audio interface 412 (FIG. 4) is arranged to produce and receive audio signals such as the sound of a human voice.
  • Display driver 414 (FIG. 4) is arranged to produce video signals to drive various types of displays.
• display driver 414 may drive a video monitor display, which may be a liquid crystal, gas plasma, or light emitting diode (LED) based display, or any other type of display that may be used with a computing device.
  • Display driver 414 may alternatively drive a hand-held, touch sensitive screen, which would also be arranged to receive input from an object such as a stylus or a digit from a human hand via user input handler 416.
  • Device 104 also comprises input/output interface 420 for communicating with external devices, such as a headset, a speaker, or other input or output devices.
• Input/output interface 420 may utilize one or more communication technologies, such as USB, infrared, Bluetooth™, or the like.
  • the optional haptic interface 422 is arranged to provide tactile feedback to a user of device 104.
  • the optional haptic interface 422 may be employed to vibrate the device in a particular way such as, for example, when another user of a computing device is calling.
• Optional GPS transceiver 424 may determine the physical coordinates of device 104.
• GPS transceiver 424 can also employ other geo-positioning mechanisms, including, but not limited to, triangulation, assisted GPS (AGPS), E-OTD, CI, SAI, ETA, BSS or the like, to further determine the physical location of device 104 on the surface of the Earth.
  • mass memory 404 includes a RAM 423, a ROM 426, and other storage means.
  • Mass memory 404 illustrates an example of computer readable storage media for storage of information such as computer readable instructions, data structures, program modules, or other data.
  • Mass memory 404 stores a basic input/output system ("BIOS") 428 for controlling low-level operation of device 104.
  • the mass memory also stores an operating system 430 for controlling the operation of device 104.
• this component may include a general purpose operating system such as a version of MAC OS, WINDOWS, UNIX, LINUX, or a specialized operating system such as, for example, Xbox 360 system software, Wii IOS, Windows Mobile™, iOS, Android, webOS, QNX, or the like.
  • the operating system may include, or interface with, a Java virtual machine module that enables control of hardware components and/or operating system operations via Java application programs.
• the operating system may also include a secure virtual container, also generally referred to as a "sandbox," that enables secure execution of applications, for example, Flash and Unity.
  • One or more data storage modules may be stored in memory 404 of device 104.
  • data storage modules may also be stored on a disk drive or other storage medium associated with device 104.
  • These data storage modules may store multiple track recordings, MIDI files, WAV files, samples of audio data, and a variety of other data and/or data formats or input melody data in any of the formats discussed above.
  • Data storage modules may also store information that describes various capabilities of system 100, which may be sent to other devices, for instance as part of a header during a communication, upon request or in response to certain events, or the like.
  • data storage modules may also be employed to store social networking information including address books, buddy lists, aliases, user profile information, or the like.
  • Device 104 may store and selectively execute a number of different applications, including applications for use in accordance with system 100.
• applications for use in accordance with system 100 may include an Audio Converter Module, a Recording Session Live Looping (RSLL) Module, a Multiple Take Auto-Compositor (MTAC) Module, a Harmonizer Module, a Track Sharer Module, a Sound Searcher Module, a Genre Matcher Module, and a Chord Matcher Module.
• the applications on device 104 may also include a messenger 434 and a browser 436.
• Messenger 434 may be configured to initiate and manage a messaging session using any of a variety of messaging communications including, but not limited to, email, Short Message Service (SMS), Instant Message (IM), Multimedia Message Service (MMS), internet relay chat (IRC), mIRC, RSS feeds, and/or the like.
• messenger 434 may be configured as an IM messaging application, such as AOL Instant Messenger, Yahoo! Messenger, .NET Messenger Server, ICQ, or the like.
  • messenger 434 may be a client application that is configured to integrate and employ a variety of messaging protocols.
  • messenger 434 may interact with browser 436 for managing messages.
  • Browser 436 may include virtually any application configured to receive and display graphics, text, multimedia, and the like, employing virtually any web based language.
• the browser application is enabled to employ Handheld Device Markup Language (HDML), Wireless Markup Language (WML), WMLScript, JavaScript, Standard Generalized Markup Language (SGML), HyperText Markup Language (HTML), eXtensible Markup Language (XML), and the like, to display and send a message.
  • any of a variety of other web-based languages including Python, Java, and third party web plug-ins, may be employed.
• Device 104 may also include other applications 438, such as computer executable instructions which, when executed by client device 104, transmit, receive, and/or otherwise process messages (e.g., SMS, MMS, IM, email, and/or other messages), audio, and video, and enable telecommunication with another user of another client device.
• Other examples of application programs include calendars, search programs, email clients, IM applications, SMS applications, VoIP applications, contact managers, task managers, transcoders, database programs, word processing programs, security applications, spreadsheet programs, games, and so forth.
  • Each of the applications described above may be embedded or, alternately, downloaded and executed on device 104.
  • each of these applications may be implemented on one or more remote devices or servers, wherein inputs and outputs of each portion are passed between device 104 and the one or more remote devices or servers over one or more networks.
  • one or more of the applications may be packaged for execution on, or downloaded from a peripheral device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A computer-implemented method for automatically generating lyric videos includes receiving an audio selection, determining timing information of the audio selection, and determining lyric information of the audio selection. The method includes receiving tone information of the audio selection and generating video content based on the timing information, the lyric information, and/or the tone information of the audio selection. The method also includes rendering a lyric video based on the video content and the audio selection.
EP18823868.7A 2017-06-26 2018-06-22 Système et procédé de génération automatique de supports Withdrawn EP3646315A4 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201762524838P 2017-06-26 2017-06-26
US15/986,589 US20180268792A1 (en) 2014-08-22 2018-05-22 System and method for automatically generating musical output
PCT/US2018/039093 WO2019005625A1 (fr) 2017-06-26 2018-06-22 Système et procédé de génération automatique de supports

Publications (2)

Publication Number Publication Date
EP3646315A1 true EP3646315A1 (fr) 2020-05-06
EP3646315A4 EP3646315A4 (fr) 2021-07-21

Family

ID=64742625

Family Applications (1)

Application Number Title Priority Date Filing Date
EP18823868.7A Withdrawn EP3646315A4 (fr) 2017-06-26 2018-06-22 Système et procédé de génération automatique de supports

Country Status (5)

Country Link
EP (1) EP3646315A4 (fr)
CN (1) CN111316350A (fr)
BR (1) BR112019027726A2 (fr)
CA (1) CA3067097A1 (fr)
WO (1) WO2019005625A1 (fr)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110600034B (zh) * 2019-09-12 2021-12-03 广州酷狗计算机科技有限公司 歌声生成方法、装置、设备及存储介质
CN111768755A (zh) * 2020-06-24 2020-10-13 华人运通(上海)云计算科技有限公司 信息处理方法、装置、车辆和计算机存储介质
CN112184861B (zh) * 2020-12-01 2021-07-30 成都极米科技股份有限公司 歌词编辑、显示方法、装置及存储介质
CN113709548B (zh) * 2021-08-09 2023-08-25 北京达佳互联信息技术有限公司 基于图像的多媒体数据合成方法、装置、设备及存储介质
CN117932110A (zh) * 2024-03-20 2024-04-26 深圳市海勤科技有限公司 一种歌词自动处理方法、计算机设备和蓝牙音响

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3239897B1 (ja) * 2001-03-14 2001-12-17 ヤマハ株式会社 作詞作曲装置及びプログラム
JP4159961B2 (ja) * 2003-09-30 2008-10-01 ヤマハ株式会社 カラオケ装置
US8996538B1 (en) * 2009-05-06 2015-03-31 Gracenote, Inc. Systems, methods, and apparatus for generating an audio-visual presentation using characteristics of audio, visual and symbolic media objects
AU2014204540B1 (en) * 2014-07-21 2015-08-20 Matthew Brown Audio Signal Processing Methods and Systems

Also Published As

Publication number Publication date
BR112019027726A2 (pt) 2020-08-18
WO2019005625A1 (fr) 2019-01-03
EP3646315A4 (fr) 2021-07-21
CN111316350A (zh) 2020-06-19
CA3067097A1 (fr) 2019-01-03

Similar Documents

Publication Publication Date Title
US10529310B2 (en) System and method for automatically converting textual messages to musical compositions
US20180374461A1 (en) System and method for automatically generating media
US20190147838A1 (en) Systems and methods for generating animated multimedia compositions
US20180268792A1 (en) System and method for automatically generating musical output
EP3646315A1 (fr) Système et procédé de génération automatique de supports
US20200372896A1 (en) Audio synthesizing method, storage medium and computer equipment
CN108962219B (zh) 用于处理文本的方法和装置
JP2018537727A5 (fr)
WO2018217790A1 (fr) Système et procédé de production automatique de sortie musicale
EP2704092A2 (fr) Système de création de contenu musical à l'aide d'un terminal client
CA2764042C (fr) Systeme et procede de reception, d'analyse et d'emission de contenu audio pour creer des compositions musicales
US9196241B2 (en) Asynchronous communications using messages recorded on handheld devices
CN107516511A (zh) 意图识别和情绪的文本到语音学习系统
CN107799119A (zh) 音频制作方法、装置及系统
JP2010048980A (ja) 自動会話システム、並びに会話シナリオ編集装置
WO2015140396A1 (fr) Procédé pour fournir à un utilisateur un retour sur l'interprétation d'une chanson de karaoké
CN111782576B (zh) 背景音乐的生成方法、装置、可读介质、电子设备
CN111292717A (zh) 语音合成方法、装置、存储介质和电子设备
WO2018094952A1 (fr) Procédé et appareil de recommandation de contenu
CN109741724B (zh) 制作歌曲的方法、装置及智能音响
CN114974184A (zh) 音频制作方法、装置、终端设备及可读存储介质
WO2020010329A1 (fr) Systèmes et procédés de génération de compositions multimédia animées
JP6587459B2 (ja) カラオケイントロにおける曲紹介システム
CN112669849A (zh) 用于输出信息的方法、装置、设备以及存储介质
KR20240033535A (ko) 대화 맥락에 어울리는 음원을 생성하여 제공하는 방법, 컴퓨터 장치, 및 컴퓨터 프로그램

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20200124

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

RIN1 Information on inventor provided before grant (corrected)

Inventor name: WOODWARD, PATRICK

Inventor name: GROVES, RYAN

Inventor name: WEBB, THOMAS

Inventor name: BAZYLEVSKY, BO

Inventor name: SCHOFIELD, ED

Inventor name: HARRISON, BRETT

Inventor name: MITCHELL, JAMES

Inventor name: KOVAC, RICKY

Inventor name: SERLETIC, MATTHEW, MICHAEL

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20210618

RIC1 Information provided on ipc code assigned before grant

Ipc: G10H 1/00 20060101AFI20210614BHEP

Ipc: G10H 1/36 20060101ALN20210614BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20220104