WO2014100893A1 - System and method for the automated customization of audio and video media - Google Patents

System and method for the automated customization of audio and video media

Info

Publication number
WO2014100893A1
WO2014100893A1 PCT/CA2013/001084 CA2013001084W WO2014100893A1 WO 2014100893 A1 WO2014100893 A1 WO 2014100893A1 CA 2013001084 W CA2013001084 W CA 2013001084W WO 2014100893 A1 WO2014100893 A1 WO 2014100893A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
message
accordance
video
video media
Prior art date
Application number
PCT/CA2013/001084
Other languages
French (fr)
Inventor
Jérémie Salvatore De Villiers
Original Assignee
Jérémie Salvatore De Villiers
Priority date
Filing date
Publication date
Application filed by Jérémie Salvatore De Villiers
Publication of WO2014100893A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Definitions

  • the present disclosure relates to a system and method for the automated customization of audio and video media. More specifically, the present disclosure relates to a method and system for the integration of audio messages into audio or video media and/or the integration of audio/video or video messages into video media as well as the integration of text/captions and/or voice messages with user provided images and selected musical compositions.
  • the present disclosure provides a method for creating a customized video media, the method comprising:
  • creating the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions.
  • step of acquiring at least one audio message includes the sub-steps of:
  • the present disclosure provides a system for creating a customized video media, the system comprising:
  • processor operatively connected to the database and the user interface, the processor being so configured so as to:
  • processor is further configured so as to:
  • system further comprises:
  • a recording interface operatively connected to the processor
  • the musical compositions are provided with at least one pre-treated segment for receiving an audio message and associated treatment parameters, the processor being further configured so as to: acquire at least one audio message;
  • processor is further configured so as to, when acquiring at least one audio message:
  • the present disclosure also provides a method and system for customizing an audio/video media provided with at least one pre-treated segment for receiving a message, the customizing including applying digital signal processing effects and/or digital video filters to the message in accordance with treatment parameters selected according to the audio/video media and/or the means of acquiring the message.
  • FIG. 1 is a schematic view of an illustrative example of the network operating environment of the audio/video customization system
  • FIGS. 2A and 2B are a flow diagram of an illustrative example of the audio media customization process;
  • FIG. 3 is a flow diagram of an illustrative example of the audio mixing process;
  • FIGS. 4A and 4B are a flow diagram of an illustrative example of the video media customization process
  • FIG. 5 is a flow diagram of an illustrative example of the video mixing process
  • FIG. 6 is a flow diagram of an illustrative example of the slide show customization process.
  • FIG. 7 is a schematic representation of an illustrative example of the customization server.
  • Audio media includes audio recordings, musical compositions, songs, speeches, spoken words, poems, etc.
  • Video media includes music video clips, movies, movie extracts, short films, video commercials, etc. It may have only video content or a combination of audio and video content.
  • Audio/video media includes audio only media, video only media or combined audio and video media.
  • Audio/video message includes audio only messages, video only messages or combined audio and video messages.
  • the non-limitative illustrative embodiment of the present disclosure provides a system and method for the automated customization of audio/video media. More specifically, the method and system allow the integration of audio messages into audio or video media and/or the integration of audio and video or video only messages into video media as well as the creation of customized slide shows.
  • WAN wide area network
  • the audio/video customization system 30 includes a customization server 34, a media database 36 and a customized media database 38, all of which will be detailed further below.
  • the audio/video customization system 30 enables treated voice messages to be integrated into customized audio media such as, for example, musical compositions.
  • the customization involves the insertion of user-generated audio or text-to-voice converted messages within segments of pre-treated audio media, mixing the resulting customized audio media and presenting it as a new audio file in the form of, for example, an MP3 (or any other type of compressed audio file).
  • the pre-treated audio media for example consisting in new or pre-existing musical compositions or songs in which segments have been identified and modified in order to make space for the insertion of future audio messages, are stored in the media database 36.
  • the identified and modified segments are then used by the customization server 34 to allow an administrator of the audio/video customization system 30 to pre-program the positioning and length of each segment allotted for user-generated audio messages.
  • Each pre-treated audio media is typically pre-programmed to receive either one, two or three user-generated audio or text-to-voice converted messages depending on the make-up of the audio media. However, it is to be understood that some audio media may be pre-programmed to receive more than three audio or text-to-voice converted messages.
  • FIGS. 2A and 2B there is shown a flow diagram of an illustrative example of the audio media customization process 100 executed by the audio/video customization system 30. The steps of the process 100 are indicated by blocks 102 to 138.
  • the process 100 starts at block 102 where a user accesses the gateway server 32 of the audio/video customization system 30 and selects an audio media, for example a song, from the media database 36 he or she wishes to customize.
  • the gateway server 32 may offer search capabilities, display audio media by categories, artist names, titles, etc.
  • the user inputs the information of the customized audio media's recipient.
  • This information may be, for example, an email address, phone number, physical location address, etc., and may also include, optionally, a text message intended for the recipient.
  • the user selects the number of messages to be inserted within the audio media, for example one, two or three. It is to be understood that the number of available segments for the insertion of audio messages may vary depending on the selected audio media or settings of the audio/video customization system 30. If the number of messages to be inserted within the audio media is less than the number of available segments, the user may select which segments are to be filled or the audio/video customization system 30 may select the segments based on, for example, message length.
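The source does not say how the system would pick segments from message length. The following is a minimal sketch of one possible policy, assuming a greedy best fit: each message goes into the shortest free segment that can hold it. The function name `assign_segments` and the use of seconds as the unit are illustrative assumptions, not part of the disclosure.

```python
from typing import Dict, List, Optional


def assign_segments(segment_lengths: List[float],
                    message_lengths: List[float]) -> Dict[int, Optional[int]]:
    """Map each message index to a segment index, or None if nothing fits.

    Hypothetical policy: longest messages are placed first, each into the
    shortest still-free segment that is long enough to hold it.
    """
    free = set(range(len(segment_lengths)))
    assignment: Dict[int, Optional[int]] = {}
    # Place the longest messages first so they are not starved of room.
    order = sorted(range(len(message_lengths)),
                   key=lambda i: -message_lengths[i])
    for msg in order:
        candidates = [s for s in free
                      if segment_lengths[s] >= message_lengths[msg]]
        if candidates:
            best = min(candidates, key=lambda s: (segment_lengths[s], s))
            free.remove(best)
            assignment[msg] = best
        else:
            assignment[msg] = None
    return assignment


if __name__ == "__main__":
    # Three segments of 8 s, 12 s and 20 s; two messages of 10 s and 5 s.
    print(assign_segments([8.0, 12.0, 20.0], [10.0, 5.0]))
```

A message mapped to `None` would have to be re-recorded or shortened; how the system handles that case is not described in the source.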
  • the user is asked to input his or her payment information. This may be through a credit card, PayPal™ or any other suitable payment method. This step may also include the verification of the payment before proceeding further.
  • the user is asked, at block 110, if he or she wishes to provide his or her message(s) either by voice or by text.
  • the user is asked to select the means for providing his or her voice message(s).
  • the user may select to provide the voice message(s) by phone or through a web interface.
  • the user may be given the option of providing the voice message(s) as an MP3 (or any other type of compressed audio file), through email or even by mail on a CD or other digital medium.
  • if the user has selected to provide the voice message(s) through a web interface, he or she is directed, at block 116, to a recording interface, for example a web recording page on the gateway server 32, which includes, for example, a Java audio engine. Then, at block 118, the recording interface allows the verification of the user's computer 12, 14 microphone levels in order to prevent distortion in the recording.
  • the voice message(s) is recorded using either the user's land phone, mobile phone or smart phone 18 (used as a microphone only in this case) through a telecommunication network 25 (land line, cellular network, etc.) or IP telephony, or the computer 12, 14 microphone through the WAN 20, depending on the selected means of providing the voice message(s).
  • an audible beep may be used as a warning of the end of the allotted time, for example five seconds before the end, or an on-screen time bar may be used to give a visual indication of the remaining allotted time, depending once more on the selected means of providing the voice message(s).
  • the user is asked to select the means for providing his or her text message(s).
  • the user may select to provide the text message(s) by phone or through a web interface.
  • the user may be given the option of providing the text message(s) as a TXT file (or any other type of text file), through email or even by mail on a CD or other digital medium.
  • a phone number to send a text message to along with a pin associated with the selected audio media and recipient information.
  • the user is able to text the text message(s) using his or her mobile phone or smart phone 18 through a telecommunication network 25 (land line, cellular network, etc.) or IP telephony.
  • a text input interface for example a web page with a text input box on the gateway server 32.
  • the text message(s) is converted into voice using a text-to-speech synthesis process, which processes are well known in the art.
  • the audio/video customization system 30 may provide the user with a selection of voice types (e.g. US male) for the synthesis.
  • the audio/video customization system 30 may also recommend one or more voice types that best match the mood or the content of the audio media. The matching can be performed, for example, based on tags (e.g. "fast", "quiet") associated with the voice type and the audio media.
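The tag-based matching described above can be sketched as a set-overlap ranking. This example assumes Jaccard similarity between the tag sets, which the source does not specify; `recommend_voices` and the sample tags are hypothetical.

```python
from typing import Dict, Iterable, List


def recommend_voices(media_tags: Iterable[str],
                     voice_tags_by_type: Dict[str, Iterable[str]],
                     top_n: int = 1) -> List[str]:
    """Rank voice types by tag overlap with the audio media (assumed Jaccard)."""
    media = set(media_tags)

    def score(tags: Iterable[str]) -> float:
        voice = set(tags)
        union = media | voice
        # Jaccard similarity: shared tags divided by all distinct tags.
        return len(media & voice) / len(union) if union else 0.0

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(voice_tags_by_type,
                    key=lambda name: (-score(voice_tags_by_type[name]), name))
    return ranked[:top_n]


if __name__ == "__main__":
    voices = {"US male": ["quiet", "calm"], "UK female": ["fast", "bright"]}
    print(recommend_voices(["fast", "upbeat"], voices))
```

Any other overlap measure (a plain count of shared tags, for instance) would fit the description equally well.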
  • the user can verify the voice message(s) and, at block 132, accept or refuse the voice message(s). If the user refuses the voice message(s), the process 100 returns to either of blocks 114, 116, 124 or 126, depending on the means used to provide the voice/text message(s), where a new voice message(s) is recorded or produced from text. If the voice message(s) is accepted, the process 100 proceeds to block 134 where the voice message(s) is mixed with the chosen audio media.
  • the audio mixing process which is executed by the customization server 34, will be further detailed below.
  • the mixed audio media, i.e. customized audio media
  • the mixed audio media is saved, at block 136, as, for example, an MP3 file in the customized media database 38.
  • the customized audio media is provided to the intended recipient, for example, by email, on a CD through regular mail, as a link to the customized audio media in the customized media database 38 or any other transmission means.
  • steps 102 to 130 may be part of an app for a personal assistant device or tablet 16 or a smart phone 18, or other such device, in which case the recording and/or text input interface is provided by the app.
  • the messages provided to the audio/video customization system 30 can be either audio or text to be converted to audio, or a combination thereof, and that these messages may be inputted into the audio/video customization system 30 using various means or interfaces or combination thereof. Therefore, depending on the specific embodiment, some of blocks 110 to 128 may be optional.
  • FIG. 3 there is shown a flow diagram of an illustrative example of the audio mixing process 200 executed by the customization server 34 at block 134 of the audio media customization process 100 (see FIGS. 2A and 2B). The steps of the process 200 are indicated by blocks 202 to 218.
  • the audio mixing process 200 automatically processes the voice message(s) through audio digital signal processing (DSP) effects so that the voice message(s) sounds as if it was recorded in a recording studio prior to being integrated into the audio media. This gives the final product, i.e. the customized audio media, a "professionally produced" sound.
  • DSP digital signal processing
  • the process 200 starts at block 202, where the voice message(s) is equalized and then, at block 204, compressed in order to regulate its volume. Noise reduction is then applied, at block 206, to reduce background noise, followed by, at block 208, a noise gate to mute moments of silence.
  • reverb is applied to add different room ambiences and, at block 212, fading such as very fast fades at the beginning and the ending of the recorded audio message(s) in order to prevent pops and clicks.
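Two of the steps in this chain, the noise gate and the fast fades, are simple enough to sketch on raw samples. The example below is a simplified illustration assuming mono audio as a list of floats in the range -1.0 to 1.0. The per-sample gate and the linear fade shape are assumptions (a production gate would normally follow a smoothed level envelope), and the function names are hypothetical.

```python
from typing import List


def noise_gate(samples: List[float], threshold: float) -> List[float]:
    """Mute every sample whose absolute level is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def apply_fades(samples: List[float], fade_len: int) -> List[float]:
    """Apply a linear fade-in and fade-out of fade_len samples at each end.

    Starting and ending at zero gain avoids the abrupt level jump that is
    heard as a pop or click.
    """
    n = len(samples)
    fade_len = min(fade_len, n // 2)  # never let the two fades overlap
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len
        out[i] *= gain          # fade-in from the start
        out[n - 1 - i] *= gain  # fade-out toward the end
    return out


if __name__ == "__main__":
    gated = noise_gate([0.01, -0.5, 0.02, 0.8], threshold=0.05)
    print(gated)
    print(apply_fades([1.0] * 6, fade_len=2))
```

Equalization, compression, noise reduction and reverb are omitted here because the source gives no parameters for them.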
  • the processed voice message(s) is inserted into the pre-determined segments of the pre-treated audio media.
  • the processed voice message(s) is strategically placed in the allotted time segment(s) depending on the length of the message(s). If the user has not used up all of the time available for his or her message(s), the process 200 automatically places the processed voice message(s) at the end of the time allotted segment(s) in order to maximize the "professionally produced" effect.
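Placing a short message at the end of its allotted segment amounts to delaying its start by the unused time. A minimal sketch, assuming times in seconds measured from the start of the audio media; `insertion_offset` is a hypothetical helper name.

```python
def insertion_offset(segment_start: float,
                     segment_length: float,
                     message_length: float) -> float:
    """Return the start time that right-aligns a message in its segment."""
    if message_length > segment_length:
        raise ValueError("message is longer than the allotted segment")
    # Unused time is pushed in front of the message, so the message ends
    # exactly where the segment ends.
    return segment_start + (segment_length - message_length)


if __name__ == "__main__":
    # A 10 s segment starting at 30 s and a 6.5 s message.
    print(insertion_offset(30.0, 10.0, 6.5))
```

With this placement the message finishes exactly as the segment closes, so no gap is left between the end of the message and the point where the media resumes.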
  • audio encoding compression is applied, at block 216, to optimize portability, for example into an MP3 file, which is then, at block 218, provided to block 126 of process 100 (see FIGS. 2A and 2B).
  • the audio/video customization system 30 enables treated audio/video messages to be integrated into customized video media such as, for example, music video clips.
  • the customization involves the insertion of user-generated audio/video messages within segments of pre-treated video compositions, mixing the resulting customized audio/video media and presenting it as a new video file in the form of any type of compressed video file.
  • the pre-treated musical compositions, which consist of new or pre-existing video clip media in which segments have been identified and modified in order to make space for the insertion of future audio/video messages, are stored in the media database 36.
  • the identified and modified segments are then used by the customization server 34 to allow an administrator of the audio/video customization system 30 to pre-program the positioning and length of each segment allotted for user-generated audio/video messages.
  • Each video media is typically pre-programmed to receive either one, two or three user-generated audio/video messages depending on various factors, for example the musical make-up of a music video clip. However, it is to be understood that some video media may be pre-programmed to receive more than three audio/video messages.
  • FIGS. 4A and 4B there is shown a flow diagram of an illustrative example of the video media customization process 300 executed by the audio/video customization system 30.
  • the steps of the process 300 are indicated by blocks 302 to 324.
  • the process 300 starts at block 302 where a user accesses the gateway server 32 of the audio/video customization system 30 and selects a video media, for example a music video clip, from the media database 36 he or she wishes to customize. It is to be understood that the gateway server 32 may offer search capabilities, display music video media by categories, artist names, music video clip titles, etc.
  • the user inputs the information of the customized music video clip's recipient.
  • This information may be, for example, an email address, phone number, physical location address, etc. and may also include, optionally, a text message intended for the recipient.
  • the user selects the number of messages to be inserted within the video media, for example one, two or three. It is to be understood that the number of available segments for the insertion of audio/video messages may vary depending on the selected video media or settings of the audio/video customization system 30. If the number of messages to be inserted within the video media is less than the number of available segments, the user may select which segments are to be filled or the audio/video customization system 30 may select the segments based on, for example, message length.
  • the user is asked to input his or her payment information. This may be through a credit card, PayPal™ or any other suitable payment method. This step may also include the verification of the payment before proceeding.
  • a recording interface is provided to the user, for example a web recording page on the gateway server 32, which includes, for example, a java audio/video engine.
  • the user may be given the option of providing the audio/video message(s) as a video file in the form of any type of compressed video file, through email or even by mail on a DVD or other digital medium.
  • the user may be given the option of providing his or her message(s) either by voice or by text, in which case steps similar to steps 110 to 130 of process 100 (see FIGS. 2A and 2B) are performed instead of steps 312 to 316.
  • the recording interface allows the verification of the user's computer 12, 14 microphone levels, in order to prevent distortion in the recording, and/or video camera 13 picture quality.
  • the audio/video message(s) is recorded using the user's computer 12, 14 microphone and video camera 13.
  • the time allotted depends on the chosen video media and the number of messages to be inserted within the video media. An on-screen time bar may be used to give a visual indication of the remaining allotted time.
  • the user can verify the recorded audio/video message(s) and, at block 318, accept or refuse the recorded audio/video message(s). If the user refuses the recorded audio/video message(s), the process 300 returns to block 314 where a new audio/video message(s) is recorded. If the recorded audio/video message(s) is accepted, the process 300 proceeds to block 320 where the recorded audio/video message(s) is mixed with the chosen video media.
  • the video mixing process which is executed by the customization server 34 will be further detailed below.
  • the mixed video media i.e. customized video media
  • the mixed video media is saved, at block 322, as a video file in the customized media database 38.
  • the customized video media is provided to the intended recipient, for example, by email, on a DVD through regular mail, as a link to the customized video media in the customized media database 38 or any other transmission means.
  • the video media may be in the form of music video clips, movie extracts, short films, video commercials or other video media.
  • the audio/video message(s) may be either audio only, video only or combined audio and video.
  • FIG. 5 there is shown a flow diagram of an illustrative example of the video mixing process 400 executed by the customization server 34 at block 320 of the audio/video media customization process 300 (see FIGS. 4A and 4B). The steps of the process 400 are indicated by blocks 402 to 422.
  • the video mixing process 400 automatically processes the audio portion of the recorded audio/video message(s) through DSP effects so that the audio portion of the recorded audio/video message(s) sounds as if it was recorded in a recording studio prior to being integrated into the music video clip. This gives the final product, i.e. the customized music video clip, a "professionally produced" sound.
  • the process 400 starts at block 402, where the audio portion of the recorded audio/video message(s) is equalized and then, at block 404, compressed in order to regulate the volume. Noise reduction is then applied, at block 406, to reduce background noise, followed by, at block 408, a noise gate to mute moments of silence.
  • reverb is applied to add different room ambiences and, at block 412, fading such as very fast fades at the beginning and the ending of the audio portion of the recorded audio/video message(s) in order to prevent pops and clicks.
  • the video mixing process 400 then automatically processes the video portion of the recorded audio/video message(s) through digital video filters in order to obtain optimal video quality prior to being integrated into the video media. This gives the final product, i.e. the customized video media, a "professionally produced" look.
  • the brightness and contrast of the video portion of the recorded audio/video message(s) are adjusted and, at block 416, grain reduction is applied.
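The brightness and contrast adjustment can be illustrated on a single greyscale frame. This sketch assumes 8-bit pixel values and a common linear formula that scales contrast around mid-grey (128) and then adds a brightness offset. The source does not state which filter is used, and `adjust_pixel` and `adjust_frame` are hypothetical names.

```python
from typing import List


def adjust_pixel(value: int, brightness: int = 0, contrast: float = 1.0) -> int:
    """Scale contrast around mid-grey, add brightness, clamp to 0..255."""
    adjusted = (value - 128) * contrast + 128 + brightness
    return max(0, min(255, int(round(adjusted))))


def adjust_frame(frame: List[List[int]],
                 brightness: int = 0,
                 contrast: float = 1.0) -> List[List[int]]:
    """Apply the same adjustment to every pixel of a greyscale frame."""
    return [[adjust_pixel(p, brightness, contrast) for p in row]
            for row in frame]


if __name__ == "__main__":
    frame = [[128, 200], [0, 64]]
    print(adjust_frame(frame, brightness=10, contrast=1.5))
```

Grain reduction is left out: it typically needs spatial or temporal filtering across neighbouring pixels or frames, for which the source gives no detail.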
  • the processed audio/video message(s) is inserted into the pre-determined segments of the pre-treated video media.
  • the processed audio/video message(s) is strategically placed in the allotted time segment depending on the length of the message. If the user has not used up all of the time available for his or her message(s), the process 400 automatically places the processed audio/video message(s) at the end of the time allotted segment in order to maximize the "professionally produced" effect.
  • Video encoding compression is then applied, at block 420, to optimize portability, which is then, at block 422, provided to block 322 of process 300 (see FIGS. 4A and 4B). It is to be understood that the audio portion of the video may be first extracted in order to perform blocks 402 to 418 solely on the audio portion after which the processed audio portion is recombined with the video at block 420.
  • the audio/video customization system 30 enables the integration of text/captions and/or voice messages with user provided images and selected musical compositions.
  • the customization involves the user providing a collection of images, for example photos taken on a trip, and introductory text/captions, and using them to create a "slide show", which consists of a series of visual transitions of the images and the audio of one or more musical compositions synchronized with the images.
  • the user may select the musical compositions to use for a given subset of the images.
  • the audio/video customization system 30 may recommend one or more musical compositions that best match the images based on a variety of attributes.
  • the audio/video customization system 30 may further enable treated voice messages to be integrated into the musical compositions. This involves the insertion of user-generated audio or text-to-voice converted messages within segments of pre-treated musical compositions and mixing the resulting customized musical compositions before their synchronization with the images.
  • FIG. 6 there is shown a flow diagram of an illustrative example of the slide show customization process 500 executed by the audio/video customization system 30.
  • the steps of the process 500 are indicated by blocks 502 to 520.
  • the process 500 starts at block 502 where a user accesses the gateway server 32 of the audio/video customization system 30 and is asked to input a collection of images, for example through an upload window accessing images stored on the user's personal computer 12, laptop computer 14, personal assistant device or tablet 16, mobile phone or smart phone 18.
  • the user is asked to input an introductory text and/or captions to be associated with the collection of and/or individual images.
  • information is extracted from each image, for example the location where the image was taken (e.g. using the GPS metadata produced by GPS enabled cameras), the color composition of the image (e.g. day time or night time based on pixel color spectrum), person(s) identified in the image (e.g. using face detection and facial recognition processes), etc.
  • the audio/video customization system 30 recommends musical compositions to be used for the image collection based on the information extracted at block 506 and introductory text/captions inputted at block 504 compared to metadata associated with the musical compositions (e.g. city name, mood, season, etc.) as well as the lyrics of the musical compositions.
  • information about the user and his or her interests, for example extracted from a profile on a social network, or similar information from persons identified in the images.
  • An example of a musical composition recommendation scheme will be further detailed below.
  • the user may also be allowed to select its own musical compositions, for example by providing musical compositions search capabilities.
  • one or more musical composition(s), he or she may be provided with, at block 512, the ability to customize the selected musical composition(s). It is to be understood that this step may be optional.
  • the process 500 proceeds to block 514 where the audio media process, which was previously described, is performed.
  • the user may be allowed to set desired transition effects between the various images of the image collection.
  • the image to video conversion is performed, taking the collection of images and introductory text/captions, and producing a video where each image is shown for a given duration, transitioning with a predetermined effect (for example fade-in/out) or with desired effects if so selected at block 516.
  • the video and audio, i.e. the musical composition(s) or customized musical composition(s), are synchronized by adding the audio track to the video at defined time points.
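The image-to-video step implies a timeline: each image gets a start and end time, and audio can then be attached at chosen points on that timeline. The sketch below assumes a fixed per-image duration and cross-fades in which consecutive images overlap for the length of the transition. `slide_timeline` and its parameters are hypothetical, and the source does not state how durations are chosen.

```python
from typing import List, Tuple


def slide_timeline(num_images: int,
                   image_duration: float,
                   transition_duration: float = 0.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for each image.

    Consecutive images overlap by transition_duration, which is the
    window during which one fades out while the next fades in.
    """
    step = image_duration - transition_duration
    if step <= 0:
        raise ValueError("transition must be shorter than the image duration")
    return [(i * step, i * step + image_duration) for i in range(num_images)]


if __name__ == "__main__":
    # Three images shown 4 s each with 1 s cross-fades.
    for start, end in slide_timeline(3, 4.0, 1.0):
        print(f"{start:.1f}s to {end:.1f}s")
```

The start time of the first image in a user-selected subset could then serve as the "defined time point" at which that subset's musical composition is added.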
  • the slide show customization process 500 may include, in an alternative embodiment, steps for providing an intended recipient and payment information, and for providing the slide show to the intended recipient, for example, by email, on a DVD through regular mail, as a link to the slide show in the customized media database 38 or any other transmission means.
  • steps for providing an intended recipient and payment information, and for providing the slide show to the intended recipient may be omitted.
  • Attributes are extracted from the inputted text/captions and images, as well as from the user.
  • the text/image/user attributes are then matched against the attributes of each musical composition.
  • the result of the match is a single numeric score, which is then used to rank the musical compositions.
  • the overall match score is a combination of the match scores from each pair of compatible attributes.
  • the combination can be based on the arithmetic mean or the geometric mean of the attribute scores. Alternatively, it can also be a weighted mean of the scores, where the weights are either set by a human expert, or they are computed based on the regression analysis on a collection of samples that are previously scored by human editors.
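The three combination rules named above can be written out directly. This is a minimal sketch assuming each per-attribute score is already normalized to a value between 0 and 1. The regression step that would learn the weights is not shown, and `rank_compositions` (which uses the arithmetic mean) is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence


def arithmetic_mean(scores: Sequence[float]) -> float:
    return sum(scores) / len(scores)


def geometric_mean(scores: Sequence[float]) -> float:
    # A single zero attribute score drives the overall score to zero.
    return math.prod(scores) ** (1.0 / len(scores))


def weighted_mean(scores: Sequence[float], weights: Sequence[float]) -> float:
    # Weights could be set by an expert or fitted by regression.
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def rank_compositions(attribute_scores: Dict[str, List[float]]) -> List[str]:
    """Rank composition names by overall score, best first."""
    return sorted(attribute_scores,
                  key=lambda name: (-arithmetic_mean(attribute_scores[name]),
                                    name))


if __name__ == "__main__":
    scores = {"song A": [0.2, 0.4], "song B": [0.9, 0.5]}
    print(rank_compositions(scores))
```

The choice of mean matters: the geometric mean penalizes a composition that matches strongly on one attribute but not at all on another, whereas the arithmetic mean lets the strong match compensate.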
  • the text attributes include:
  • T1 words in the text, original and stemmed, plus the bigrams
  • the image attributes include:
  • I1 time of day, which can be derived from the image's timestamp (e.g. in the EXIF metadata) and the time zone information (if available);
  • I2 geo-location of the image (e.g. in the EXIF metadata);
  • I3 country and city names of the image, derived from I2, using a lookup database (many are available commercially);
  • I4 color histogram of the image; and I5 "classes" of the image (e.g. night-time, quiet, vibrant, sunny, foggy, etc.), derived from I1 and I4.
  • the classes are computed based on a previously trained model built from previously classified images (by human editors) using a statistical classifier such as a decision tree, or a large margin classifier such as SVM (Support Vector Machine).
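The color histogram attribute can be sketched as a coarse, normalized count of pixels per RGB bucket. The example assumes 8-bit RGB tuples and a small number of bins per channel. The bin count and the function name `color_histogram` are illustrative choices, and the classifier that would consume the histogram is not shown.

```python
from typing import List, Sequence, Tuple


def color_histogram(pixels: Sequence[Tuple[int, int, int]],
                    bins_per_channel: int = 4) -> List[float]:
    """Return a normalized histogram with bins_per_channel**3 buckets."""
    width = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)

    def bucket(value: int) -> int:
        # Clamp so that 255 still lands in the last bin when 256 is not
        # evenly divisible by bins_per_channel.
        return min(value // width, bins_per_channel - 1)

    for r, g, b in pixels:
        index = (bucket(r) * bins_per_channel ** 2
                 + bucket(g) * bins_per_channel
                 + bucket(b))
        hist[index] += 1.0
    total = len(pixels)
    # Normalize so images of different sizes are comparable.
    return [count / total for count in hist] if total else hist


if __name__ == "__main__":
    dark_and_light = [(0, 0, 0), (10, 10, 10), (250, 250, 250), (255, 255, 255)]
    print(color_histogram(dark_and_light, bins_per_channel=2))
```

A feature vector of this kind, possibly combined with the time of day, is the sort of input a decision tree or SVM could be trained on to predict classes such as "night-time" or "sunny".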
  • the user attributes include:
  • the genre of music the user likes can be obtained using a user interface element.
  • the audio/video customization system 30 may obtain the information from a social network profile of the user.
  • the musical compositions attributes include:
  • C2 single and double word tags (e.g., "birthday", "love", "rock") assigned by human editors;
  • C4 location tags, which are the country and city names that the song describes, if any.
  • U1 the score is calculated based on TF-IDF and cosine similarity, which is commonly used for text matching with the bag-of-words model. The score is normalized to a value between 0 and 1;
  • between C4 and I3: the number of common locations, normalized to a value between 0 and 1; and
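TF-IDF with cosine similarity over a bag of words can be sketched without external libraries. The example below assumes pre-tokenized documents and a smoothed inverse document frequency, log((1 + N) / (1 + df)) + 1, which is one common variant and not necessarily the one intended by the source. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tfidf_vectors(docs: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity; between 0 and 1 because all weights are non-negative."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    caption = ["birthday", "party", "paris"]
    song_tags_1 = ["birthday", "love"]
    song_tags_2 = ["rock", "road"]
    v = tfidf_vectors([caption, song_tags_1, song_tags_2])
    print(cosine(v[0], v[1]), cosine(v[0], v[2]))
```

Because every weight is non-negative, the cosine already lies between 0 and 1, so no further normalization is needed for this attribute pair.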
  • the audio/video customization system 30 may provide the user with a selection of voice types (e.g. US male) for the synthesis.
  • the audio/video customization system 30 may also recommend one or more voice types that best match the mood or the content of the audio media or musical composition. The matching can be performed, for example, based on tags (e.g. "fast", "quiet") associated with the voice type and the audio media or musical composition.
  • the audio/video customization system 30 may also be accessed via mobile phones and smart phones (including Blackberry™, SymbianOS™, iPhone™, Windows Mobile™, Google Android™ and any other such system/device), in which case the gateway server 32 may also include a specifically created graphical user interface.
  • processes 100, 200, 300, 400 and 500 may be implemented individually or collectively as processor executable code stored within a memory of an associated device (i.e. customization server 34 and/or computing/communication devices 12, 14, 16, 18) to be executed by a processor of that device.
  • the customization server 34 which includes a processor 40 with an associated memory 50 having stored therein processor executable instructions 51, 52, 53, 54 and 55 for configuring the processor 40 to perform, respectively, processes 100, 200, 300, 400 and 500, and an input/output (I/O) interface 42.
  • computing/communication devices 12, 14, 16, 18 may be similarly provided with a processor, memory and I/O interface.
  • processes 100, 200, 300, 400 and 500 may all be implemented on the same device or selectively only on some devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

A system and method for creating a customized video media, the method comprising acquiring a plurality of images, providing a plurality of musical compositions, prompting a user to select at least one of the plurality of musical compositions and creating the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions.

Description

SYSTEM AND METHOD FOR THE AUTOMATED CUSTOMIZATION
OF AUDIO AND VIDEO MEDIA
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefits of U.S. provisional patent application No. 61/747,085 filed on December 28, 2012, which is herein incorporated by reference.
TECHNICAL FIELD
[0002] The present disclosure relates to a system and method for the automated customization of audio and video media. More specifically, the present disclosure relates to a method and system for the integration of audio messages into audio or video media and/or the integration of audio/video or video messages into video media as well as the integration of text/captions and/or voice messages with user provided images and selected musical compositions.
BACKGROUND
[0003] Songs and music video clip revenues have been declining due to piracy. These revenues are still considered to be acceptable yet they do not represent the full revenue potential. Music labels have been confronted by the question: "Why pay for something when you can get it for free?"
[0004] Thus, there is a need for a solution to this downward spiral in the form of a service providing a novel use of digital media (i.e. songs and music clips) that can be used many times a year by the same person; one that generates revenues at every use, and a service that still manages to protect copyrights.
SUMMARY
[0005] The present disclosure provides a method for creating a customized video media, the method comprising:
acquiring a plurality of images; providing a plurality of musical compositions;
prompting a user to select at least one of the plurality of musical compositions;
creating the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions.
[0006] There is also provided a method as above, further comprising the step of:
extracting attributes from the plurality of images or the at least one text caption; and
recommending to the user a set of musical compositions based on a matching of the extracted attributes and attributes associated with the plurality of musical compositions.
[0007] There is further provided a method as above, wherein the musical compositions are provided with at least one pre-treated segment for receiving an audio message and associated treatment parameters, the method further comprising the steps of:
acquiring at least one audio message;
applying digital signal processing effects to the at least one audio message in accordance with the treatment parameters;
inserting the processed at least one audio message within the pre-treated segment of at least one of the selected musical compositions; and mixing the inserted processed at least one audio message and at least one of the selected musical compositions.
[0008] There is further still provided a method as above, wherein the step of acquiring at least one audio message includes the sub-steps of:
acquiring a text message; and converting the text message into the audio message using a text-to-speech synthesis process.
[0009] The present disclosure provides a system for creating a customized video media, the system comprising:
a database containing a plurality of musical compositions;
a user interface;
a processor operatively connected to the database and the user interface, the processor being so configured so as to:
acquire a plurality of images;
provide a plurality of musical compositions;
prompt a user to select at least one of the plurality of musical compositions; create the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions.
[0010] There is also provided a system as above, wherein the processor is further configured so as to:
extract attributes from the plurality of images or the at least one text caption; and
recommend to the user a set of musical compositions based on a matching of the extracted attributes and attributes associated with the plurality of musical compositions.
[0011] There is further provided a system as above, wherein the system further comprises:
a recording interface operatively connected to the processor;
wherein the musical compositions are provided with at least one pre-treated segment for receiving an audio message and associated treatment parameters, the processor being further configured so as to:
acquire at least one audio message;
apply digital signal processing effects to the at least one audio message in accordance with the treatment parameters;
insert the processed at least one audio message within the pre-treated segment of at least one of the selected musical compositions; and mix the inserted processed at least one audio message and at least one of the selected musical compositions.
[0012] There is further still provided a system as above, wherein the processor is further configured so as to, when acquiring at least one audio message:
acquire a text message; and
convert the text message into the audio message using a text-to-speech synthesis process.
[0013] The present disclosure also provides a method and system for customizing an audio/video media provided with at least one pre-treated segment for receiving a message, the customizing including applying digital signal processing effects and/or digital video filters to the message in accordance with treatment parameters selected according to the audio/video media and/or the means of acquiring the message.
BRIEF DESCRIPTION OF THE FIGURES
[0014] Embodiments of the disclosure will be described by way of examples only with reference to the accompanying drawings, in which:
[0015] FIG. 1 is a schematic view of an illustrative example of the network operating environment of the audio/video customization system;
[0016] FIGS. 2A and 2B are a flow diagram of an illustrative example of the audio media customization process;
[0017] FIG. 3 is a flow diagram of an illustrative example of the audio mixing process;
[0018] FIGS. 4A and 4B are a flow diagram of an illustrative example of the video media customization process;
[0019] FIG. 5 is a flow diagram of an illustrative example of the video mixing process;
[0020] FIG. 6 is a flow diagram of an illustrative example of the slide show customization process; and
[0021] FIG. 7 is a schematic representation of an illustrative example of the customization server.
[0022] Similar references used in different Figures denote similar components.
DEFINITIONS
[0023] The detailed description and figures refer to the following terms which are herein defined:
Audio media: includes audio recordings, musical compositions, songs, speeches, spoken words, poems, etc.
Video media: includes music video clips, movies, movie extracts, short films, video commercials, etc. It may have only video content or a combination of audio and video content.
Audio/video media: includes audio only media, video only media or combined audio and video media.
Audio/video message: includes audio only messages, video only messages or combined audio and video messages.
DETAILED DESCRIPTION
[0024] Generally stated, the non-limitative illustrative embodiment of the present disclosure provides a system and method for the automated customization of audio/video media. More specifically, the method and system allow the integration of audio messages into audio or video media and/or the integration of audio and video or video only messages into video media as well as the creation of customized slide shows.
[0025] Referring to FIG. 1, a user using a personal computer 12, laptop computer 14, personal assistant device or tablet 16, mobile phone or smart phone 18 or any other such computing/communication device, on which may run a user interface in the form of a communication software such as, for example, a web browser or an app, may access a communication interface on a gateway server 32 of the audio/video customization system 30 via wide area network (WAN) 20 such as, for example, Ethernet (broadband, high-speed), wireless WiFi, cable Internet, satellite connection, cellular or satellite network, etc.
[0026] Further to the gateway server 32, the audio/video customization system 30 includes a customization server 34, a media database 36 and a customized media database 38, all of which will be detailed further below.
Audio media customization
[0027] The audio/video customization system 30 enables treated voice messages to be integrated into customized audio media such as, for example, musical compositions. The customization involves the insertion of user-generated audio or text-to-voice converted messages within segments of pre-treated audio media, mixing the resulting customized audio media and presenting it as a new audio file in the form of, for example, an MP3 (or any other type of compressed audio file).
[0028] Referring back to FIG. 1, the pre-treated audio media, for example consisting of new or pre-existing musical compositions or songs in which segments have been identified and modified in order to make space for the insertion of future audio messages, are stored in the media database 36. The identified and modified segments are then used by the customization server 34 to allow an administrator of the audio/video customization system 30 to pre-program the positioning and length of each segment allotted for user-generated audio messages. Each pre-treated audio media is typically pre-programmed to receive either one, two or three user-generated audio or text-to-voice converted messages depending on the make-up of the audio media. However, it is to be understood that some audio media may be pre-programmed to receive more than three audio or text-to-voice converted messages.
[0029] Referring now to FIGS. 2A and 2B, there is shown a flow diagram of an illustrative example of the audio media customization process 100 executed by the audio/video customization system 30. The steps of the process 100 are indicated by blocks 102 to 138.
[0030] The process 100 starts at block 102 where a user accesses the gateway server 32 of the audio/video customization system 30 and selects an audio media, for example a song, from the media database 36 he or she wishes to customize. It is to be understood that the gateway server 32 may offer search capabilities, display audio media by categories, artist names, titles, etc.
[0031] At block 104, the user inputs the information of the customized audio media's recipient. This information may be, for example, an email address, phone number, physical location address, etc., and may also include, optionally, a text message intended for the recipient.
[0032] Then, at block 106, the user selects the number of messages to be inserted within the audio media, for example one, two or three. It is to be understood that the number of available segments for the insertion of audio messages may vary depending on the selected audio media or settings of the audio/video customization system 30. If the number of messages to be inserted within the audio media is less than the number of available segments, the user may select which segments are to be filled or the audio/video customization system 30 may select the segments based on, for example, message length.
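The disclosure does not specify how the system would select segments based on message length. The following is a minimal sketch of one possible approach, assuming a best-fit rule: the longest message is placed first, and each message takes the shortest free segment that can hold it. The function name `assign_segments` and the use of durations in seconds are illustrative assumptions, not part of the disclosure.

```python
from typing import List, Optional


def assign_segments(segment_lengths: List[float],
                    message_lengths: List[float]) -> Optional[List[int]]:
    """Assign each message to the shortest free segment that can hold it.

    Hypothetical helper: entry i of the result is the index of the segment
    chosen for message i, or the whole result is None if a message does
    not fit in any remaining segment.
    """
    free = set(range(len(segment_lengths)))
    assignment: List[Optional[int]] = [None] * len(message_lengths)
    # Place the longest messages first so they get the larger segments.
    order = sorted(range(len(message_lengths)),
                   key=lambda i: -message_lengths[i])
    for i in order:
        candidates = [s for s in free
                      if segment_lengths[s] >= message_lengths[i]]
        if not candidates:
            return None
        # Best fit: smallest segment that is still long enough.
        best = min(candidates, key=lambda s: (segment_lengths[s], s))
        assignment[i] = best
        free.remove(best)
    return assignment


# Two messages, three available segments (lengths in seconds).
print(assign_segments([10.0, 5.0, 8.0], [7.0, 4.0]))
```

In this example the 7-second message is assigned to the 8-second segment and the 4-second message to the 5-second segment, leaving the 10-second segment unused.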
[0033] At block 108, the user is asked to input his or her payment information. This may be through a credit card, PayPal™ or any other suitable payment method. This step may also include the verification of the payment before proceeding further.
[0034] Once the payment has been effectuated, the user is asked, at block 110, if he or she wishes to provide his or her message(s) either by voice or by text.
[0035] If the user has selected to provide voice message(s), the process proceeds to block 112.
[0036] At block 112, the user is asked to select the means for providing his or her voice message(s). The user may select to provide the voice message(s) by phone or through a web interface. In an alternative embodiment, the user may be given the option of providing the voice message(s) as an MP3 (or any other type of compressed audio file), through email or even by mail on a CD or other digital medium.
[0037] If the user has selected to provide voice message(s) by phone, he or she is provided with, at block 114, a phone number to dial along with a pin associated with the selected audio media and recipient information.
[0038] If the user has selected to provide the voice message(s) through a web interface, he or she is directed, at block 116, to a recording interface, for example a web recording page on the gateway server 32, which includes, for example, a java audio engine. Then, at block 118, the recording interface allows the verification of the user's computer 12, 14 microphone levels in order to prevent distortion in the recording.
[0039] At block 120, the voice message(s) is recorded using either the user's land phone, mobile phone or smart phone 18 (used as a microphone only in this case) through a telecommunication network 25 (land line, cellular network, etc.) or IP telephony, or the computer 12, 14 microphone through the WAN 20, depending on the selected means of providing the voice message(s). The time allotted depends on the chosen audio media and the number of messages to be inserted within the audio media. An audible beep may be used as a warning of the end of the allotted time, for example five seconds before the end, or an on-screen time bar may be used to give a visual indication of the remaining allotted time, depending once more on the selected means of providing the voice message(s).
[0040] If the user has selected to provide text message(s), the process proceeds to block 122.
[0041] At block 122, the user is asked to select the means for providing his or her text message(s). The user may select to provide the text message(s) by phone or through a web interface. In an alternative embodiment, the user may be given the option of providing the text message(s) as a TXT file (or any other type of text file), through email or even by mail on a CD or other digital medium.
[0042] If the user has selected to provide text message(s) by phone, he or she is provided with, at block 124, a phone number to send a text message to along with a pin associated with the selected audio media and recipient information. The user is able to text the text message(s) using his or her mobile phone or smart phone 18 through a telecommunication network 25 (land line, cellular network, etc.) or IP telephony.
[0043] If the user has selected to provide the text message(s) through a web interface, he or she is directed, at block 126, to a text input interface, for example a web page with a text input box on the gateway server 32.
[0044] At block 128, the text message(s) is converted into voice using a text-to-speech synthesis process, which processes are well known in the art. The audio/video customization system 30 may provide the user with a selection of voice types (e.g. US male) for the synthesis. In addition, the audio/video customization system 30 may also recommend one or more voice types that best match the mood or the content of the audio media. The matching can be performed, for example, based on tags (e.g. "fast", "quiet") associated with the voice type and the audio media.
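The tag-based matching of voice types to audio media is not detailed in the disclosure. As a minimal sketch, assuming each voice type and each audio media carries a plain list of tags, the voice types could be ranked by the Jaccard overlap of their tag sets with the media tags. The function name `recommend_voices`, the voice identifiers and the choice of Jaccard overlap are all illustrative assumptions.

```python
from typing import Dict, Iterable, List


def recommend_voices(media_tags: Iterable[str],
                     voices: Dict[str, List[str]],
                     top_n: int = 1) -> List[str]:
    """Rank voice types by tag overlap with the audio media (hypothetical)."""
    media = set(media_tags)

    def jaccard(tags: Iterable[str]) -> float:
        tag_set = set(tags)
        union = media | tag_set
        # Shared tags divided by all distinct tags; 0.0 if both are empty.
        return len(media & tag_set) / len(union) if union else 0.0

    # Highest overlap first; ties are broken alphabetically by voice name.
    ranked = sorted(voices.items(), key=lambda kv: (-jaccard(kv[1]), kv[0]))
    return [name for name, _ in ranked[:top_n]]


voices = {
    "us_male": ["fast", "energetic"],
    "uk_female": ["quiet", "calm"],
}
print(recommend_voices(["quiet", "romantic"], voices))
```

Here the media shares the tag "quiet" with the second voice type only, so that voice type is recommended first.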
[0045] At block 130, the user can verify the voice message(s) and, at block 132, accept or refuse the voice message(s). If the user refuses the voice message(s), the process 100 returns to either of blocks 114, 116, 124 or 126, depending on the means used to provide the voice/text message(s), where a new voice message(s) is recorded or produced from text. If the voice message(s) is accepted, the process 100 proceeds to block 134 where the voice message(s) is mixed with the chosen audio media. The audio mixing process, which is executed by the customization server 34, will be further detailed below.
[0046] Once the recorded audio message(s) and the chosen audio media have been mixed, the mixed audio media (i.e. customized audio media) is saved, at block 136, as, for example, an MP3 file in the customized media database 38.
[0047] Finally, at block 138, depending on the recipient information entered at block 104, the customized audio media is provided to the intended recipient, for example, by email, on a CD through regular mail, as a link to the customized audio media in the customized media database 38 or any other transmission means.
[0048] It is to be understood that in an alternative embodiment some or all of steps 102 to 130 may be part of an app for a personal assistant device or tablet 16 or a smart phone 18, or other such device, in which case the recording and/or text input interface is provided by the app. It is also to be understood that in various alternative embodiments the messages provided to the audio/video customization system 30 can be either audio or text to be converted to audio, or a combination thereof, and that these messages may be inputted into the audio/video customization system 30 using various means or interfaces or combination thereof. Therefore, depending on the specific embodiment, some of blocks 110 to 128 may be optional.
Audio mixing
[0049] Referring to FIG. 3, there is shown a flow diagram of an illustrative example of the audio mixing process 200 executed by the customization server 34 at block 134 of the audio media customization process 100 (see FIGS. 2A and 2B). The steps of the process 200 are indicated by blocks 202 to 218.
[0050] The audio mixing process 200 automatically processes the voice message(s) through audio digital signal processing (DSP) effects so that the voice message(s) sounds as though it was recorded in a recording studio prior to being integrated into the audio media. This gives the final product, i.e. the customized audio media, a "professionally produced" sound.
[0051] The process 200 starts at block 202, where the voice message(s) is equalized and then, at block 204, compressed in order to regulate its volume. Noise reduction is then applied, at block 206, to reduce background noise, followed by, at block 208, a noise gate to mute moments of silence.
[0052] At block 210, reverb is applied to add different room ambiences and, at block 212, fading such as very fast fades at the beginning and the ending of the recorded audio message(s) in order to prevent pops and clicks.
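Two of the DSP effects named above, the noise gate of block 208 and the fast fades of block 212, can be illustrated on a list of audio samples. This is a deliberately simplified sketch: the gate works sample by sample rather than on a smoothed signal envelope as a production gate would, the fades are linear, and the function names, threshold and fade length are assumptions rather than values from the disclosure.

```python
from typing import List


def noise_gate(samples: List[float], threshold: float) -> List[float]:
    """Mute every sample whose magnitude is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def apply_fades(samples: List[float], fade_len: int) -> List[float]:
    """Apply a linear fade-in and fade-out of fade_len samples each.

    Starting and ending at zero gain avoids the abrupt jumps that are
    heard as pops and clicks.
    """
    n = len(samples)
    fade_len = min(fade_len, n // 2)  # the two fades must not overlap
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len
        out[i] *= gain          # fade-in at the start
        out[n - 1 - i] *= gain  # mirrored fade-out at the end
    return out


gated = noise_gate([0.01, -0.5, 0.02, 0.3], threshold=0.05)
faded = apply_fades([1.0] * 6, fade_len=2)
print(gated, faded)
```

In practice the fade would span a few milliseconds of samples at the recording's sample rate; two samples are used here only to keep the example readable.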
[0053] Following the DSP effects, at block 214, the processed voice message(s) is inserted into the pre-determined segments of the pre-treated audio media. The processed voice message(s) is strategically placed in the allotted time segment(s) depending on the length of the message(s). If the user has not used up all of the time available for his or her message(s), the process 200 automatically places the processed voice message(s) at the end of the time allotted segment(s) in order to maximize the "professionally produced" effect.
[0054] Once the processed voice message(s) is integrated into the pre-treated audio media, audio encoding compression is applied, at block 216, to optimize portability, for example into an MP3 file, which is then, at block 218, provided to block 136 of process 100 (see FIGS. 2A and 2B).
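The end-aligned placement performed at block 214 can be sketched as follows, assuming the audio media and the message are lists of samples at the same sample rate and the segment is given as a half-open range of sample indices. The function name `insert_end_aligned` and the simple additive mix are illustrative assumptions.

```python
from typing import List, Tuple


def insert_end_aligned(track: List[float], message: List[float],
                       seg_start: int, seg_end: int) -> Tuple[List[float], int]:
    """Mix a message into track[seg_start:seg_end], aligned to the segment end.

    Returns the mixed track and the sample offset where the message starts.
    """
    if len(message) > seg_end - seg_start:
        raise ValueError("message is longer than the allotted segment")
    # A short message starts late so that it finishes exactly at seg_end.
    offset = seg_end - len(message)
    out = list(track)
    for i, sample in enumerate(message):
        out[offset + i] += sample  # additive mix with the pre-treated media
    return out, offset


mixed, start = insert_end_aligned([0.0] * 10, [1.0, 1.0, 1.0],
                                  seg_start=2, seg_end=8)
print(start, mixed)
```

With a three-sample message in a six-sample segment, the message starts three samples into the segment, so any unused time falls before the message rather than after it.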
Video media customization
[0055] The audio/video customization system 30 enables treated audio/video messages to be integrated into customized video media such as, for example, music video clips. The customization involves the insertion of user-generated audio/video messages within segments of pre-treated video compositions, mixing the resulting customized audio/video media and presenting it as a new video file in the form of any type of compressed video file.
[0056] Referring back to FIG. 1, the pre-treated musical compositions, which consist of new or pre-existing video clip media in which segments have been identified and modified in order to make space for the insertion of future audio/video messages, are stored in the media database 36. The identified and modified segments are then used by the customization server 34 to allow an administrator of the audio/video customization system 30 to pre-program the positioning and length of each segment allotted for user-generated audio/video messages. Each video media is typically pre-programmed to receive either one, two or three user-generated audio/video messages depending on various factors, for example the musical make-up of a music video clip. However, it is to be understood that some video media may be pre-programmed to receive more than three audio/video messages.
[0057] Referring now to FIGS. 4A and 4B, there is shown a flow diagram of an illustrative example of the video media customization process 300 executed by the audio/video customization system 30. The steps of the process 300 are indicated by blocks 302 to 324.
[0058] The process 300 starts at block 302 where a user accesses the gateway server 32 of the audio/video customization system 30 and selects a video media, for example a music video clip, from the media database 36 he or she wishes to customize. It is to be understood that the gateway server 32 may offer search capabilities, display music video media by categories, artist names, music video clip titles, etc.
[0059] At block 304, the user inputs the information of the customized music video clip's recipient. This information may be, for example, an email address, phone number, physical location address, etc. and may also include, optionally, a text message intended for the recipient.
[0060] Then, at block 306, the user selects the number of messages to be inserted within the video media, for example one, two or three. It is to be understood that the number of available segments for the insertion of audio/video messages may vary depending on the selected video media or settings of the audio/video customization system 30. If the number of messages to be inserted within the video media is less than the number of available segments, the user may select which segments are to be filled or the audio/video customization system 30 may select the segments based on, for example, message length.
[0061] At block 308, the user is asked to input his or her payment information. This may be through a credit card, PayPal™ or any other suitable payment method. This step may also include the verification of the payment before proceeding.
[0062] Then, at block 310, a recording interface is provided to the user, for example a web recording page on the gateway server 32, which includes, for example, a java audio/video engine. In an alternative embodiment, the user may be given the option of providing the audio/video message(s) as a video file in the form of any type of compressed video file, through email or even by mail on a DVD or other digital medium. In a further alternative embodiment, the user may be given the option of providing his or her message(s) either by voice or by text, in which case steps similar to steps 110 to 130 of process 100 (see FIGS. 2A and 2B) are performed instead of steps 312 to 316.
[0063] At block 312, the recording interface allows the verification of the user's computer 12, 14 microphone levels, in order to prevent distortion in the recording, and/or video camera 13 picture quality. At block 314, the audio/video message(s) is recorded using the user's computer 12, 14 microphone and video camera 13. The time allotted depends on the chosen video media and the number of messages to be inserted within the video media. An on-screen time bar may be used to give a visual indication of the remaining allotted time.
[0064] At block 316, the user can verify the recorded audio/video message(s) and, at block 318, accept or refuse the recorded audio/video message(s). If the user refuses the recorded audio/video message(s), the process 300 returns to block 314 where a new audio/video message(s) is recorded. If the recorded audio/video message(s) is accepted, the process 300 proceeds to block 320 where the recorded audio/video message(s) is mixed with the chosen video media. The video mixing process, which is executed by the customization server 34, will be further detailed below.
[0065] Once the recorded audio/video message(s) and the chosen video media have been mixed, the mixed video media (i.e. customized video media) is saved, at block 322, as a video file in the customized media database 38.
[0066] Finally, at block 324, depending on the recipient information entered at block 304, the customized video media is provided to the intended recipient, for example, by email, on a DVD through regular mail, as a link to the customized video media in the customized media database 38 or any other transmission means.
[0067] It is to be understood that the video media may be in the form of music video clips, movie extracts, short films, video commercials or other video media. Furthermore, it is to be understood that the audio/video message(s) may be either audio only, video only or combined audio and video.
Video mixing
[0068] Referring to FIG. 5, there is shown a flow diagram of an illustrative example of the video mixing process 400 executed by the customization server 34 at block 320 of the audio/video media customization process 300 (see FIGS. 4A and 4B). The steps of the process 400 are indicated by blocks 402 to 422.
[0069] The video mixing process 400 automatically processes the audio portion of the recorded audio/video message(s) through DSP effects so that the audio portion of the recorded audio/video message(s) sounds as though it was recorded in a recording studio prior to being integrated into the music video clip. This gives the final product, i.e. the customized music video clip, a "professionally produced" sound.
[0070] The process 400 starts at block 402, where the audio portion of the recorded audio/video message(s) is equalized and then, at block 404, compressed in order to regulate the volume. Noise reduction is then applied, at block 406, to reduce background noise, followed by, at block 408, a noise gate to mute moments of silence.
[0071] At block 410, reverb is applied to add different room ambiences and, at block 412, fading such as very fast fades at the beginning and the ending of the audio portion of the recorded audio/video message(s) in order to prevent pops and clicks.
[0072] The video mixing process 400 then automatically processes the video portion of the recorded audio/video message(s) through digital video filters in order to obtain optimal video quality prior to being integrated into the video media. This gives the final product, i.e. the customized video media, a "professionally produced" look.
[0073] At block 414, the brightness and contrast of the video portion of the recorded audio/video message(s) are adjusted and, at block 416, grain reduction is applied.
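The brightness and contrast adjustment of block 414 can be illustrated on a single greyscale frame. This is a minimal sketch assuming 8-bit pixel values (0 to 255), with contrast applied as a gain around the mid-grey value 128 and brightness as an additive offset. The function name and parameter conventions are assumptions, and grain reduction is not covered.

```python
from typing import List


def adjust_brightness_contrast(pixels: List[List[int]],
                               brightness: float = 0.0,
                               contrast: float = 1.0) -> List[List[int]]:
    """Scale pixel values around mid-grey, shift them, and clamp to 0..255."""

    def adjust(value: int) -> int:
        scaled = (value - 128) * contrast + 128 + brightness
        return max(0, min(255, int(round(scaled))))

    return [[adjust(v) for v in row] for row in pixels]


frame = [[100, 200]]
print(adjust_brightness_contrast(frame, brightness=10, contrast=1.5))
```

A contrast gain above 1.0 pushes values away from mid-grey, and the clamp keeps the result within the valid 8-bit range.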
[0074] Following the DSP effects and digital video filters, at block 418, the processed audio/video message(s) is inserted into the pre-determined segments of the pre-treated video media. The processed audio/video message(s) is strategically placed in the allotted time segment depending on the length of the message. If the user has not used up all of the time available for his or her message(s), the process 400 automatically places the processed audio/video message(s) at the end of the time allotted segment in order to maximize the "professionally produced" effect.
[0075] Once the processed audio/video message(s) is integrated into the pre-treated video media, the end of each video portion of the audio/video message(s) is automatically dissolved back to the video media. Video encoding compression is then applied, at block 420, to optimize portability, which is then, at block 422, provided to block 322 of process 300 (see FIGS. 4A and 4B). It is to be understood that the audio portion of the video may be first extracted in order to perform blocks 402 to 418 solely on the audio portion after which the processed audio portion is recombined with the video at block 420.
Slide Show Customization
[0076] The audio/video customization system 30 enables the integration of text/captions and/or voice messages with user provided images and selected musical compositions. The customization involves the user providing a collection of images, for example photos taken on a trip, and introductory text/captions, and using them to create a "slide show", which consists of a series of visual transitions of the images and the audio of one or more musical compositions synchronized with the images. The user may select the musical compositions to use for a given subset of the images. Additionally, the audio/video customization system 30 may recommend one or more musical compositions that best match the images based on a variety of attributes. The audio/video customization system 30 may further enable treated voice messages to be integrated into the musical compositions. This involves the insertion of user-generated audio or text-to-voice converted messages within segments of pre-treated musical compositions and mixing the resulting customized musical compositions before their synchronization with the images.
[0077] Referring to FIG. 6, there is shown a flow diagram of an illustrative example of the slide show customization process 500 executed by the audio/video customization system 30. The steps of the process 500 are indicated by blocks 502 to 520.
[0078] The process 500 starts at block 502 where a user accesses the gateway server 32 of the audio/video customization system 30 and is asked to input a collection of images, for example through an upload window accessing images stored on the user's personal computer 12, laptop computer 14, personal assistant device or tablet 16, mobile phone or smart phone 18.
[0079] At block 504, the user is asked to input an introductory text and/or captions to be associated with the collection of and/or individual images.
[0080] At block 506, information is extracted from each image, for example the location where the image was taken (e.g. using the GPS metadata produced by GPS enabled cameras), the color composition of the image (e.g. day time or night time based on pixel color spectrum), person(s) identified in the image (e.g. using face detection and facial recognition processes), etc.
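The disclosure does not say how day time and night time would be told apart from the pixel color spectrum. One simple possibility, shown here as a sketch only, is to threshold the mean luminance of the image, using the common Rec. 601 weights for red, green and blue. The function name, the threshold value of 90 and the representation of an image as a flat list of RGB tuples are all assumptions.

```python
from typing import Iterable, Tuple


def classify_day_night(pixels: Iterable[Tuple[int, int, int]],
                       threshold: float = 90.0) -> str:
    """Label an image 'day' or 'night' from its mean luminance (0..255)."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        # Rec. 601 luma weights; green contributes most to brightness.
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return "day" if total / count >= threshold else "night"


print(classify_day_night([(200, 220, 255)] * 4))
print(classify_day_night([(10, 10, 30)] * 4))
```

A trained classifier over richer color features would likely be more robust than a single threshold, for instance for dark indoor scenes photographed during the day.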
[0081] Then, at block 508, the audio/video customization system 30 recommends musical compositions to be used for the image collection based on the information extracted at block 506 and introductory text/captions inputted at block 504 compared to metadata associated with the musical compositions (e.g. city name, mood, season, etc.) as well as the lyrics of the musical compositions. Information about the user and its interests, for example extracted from a profile on a social network, or similar Information from persons identified in the images. An example of a musical compositions recommendation scheme will be further detailed below, Optionally, the user may also be allowed to select its own musical compositions, for example by providing musical compositions search capabilities.
[0082] Once the user selects, at block 510, one or more musical composition(s), he or she may be provided with, at block 512, the ability to customize the selected musical composition(s). It is to be understood that this step may be optional.
[0083] If the user elects to customize the one or more musical composition(s), the process 500 proceeds to block 514 where the audio media process, which was previously described, is performed.
[0084] At block 516, optionally, the user may be allowed to set desired transition effects between the various images of the image collection.
[0085] At block 518, the image to video conversion is performed, taking the collection of images and introductory text/captions, and producing a video where each image is shown for a given duration, transitioning with a predetermined effect (for example fade-in/out) or with desired effects if so selected at block 516. The video and audio (i.e. the musical composition(s) or customized musical composition(s)) are synchronized by adding the audio track to the video at defined time points.
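As a minimal sketch of the timing computation implied by block 518, the display interval of each image and the time points at which audio tracks are added might be derived as shown below. The disclosure does not give durations or a placement rule; the default durations, the overlapping cross-fade model, the back-to-back track placement and the function names are assumptions for illustration only.

```python
def build_timeline(num_images, image_duration=4.0, transition_duration=1.0):
    """Return (start, end) display times in seconds for each image.

    Consecutive images overlap by transition_duration, modelling a
    cross-fade. Both default durations are hypothetical.
    """
    if image_duration <= transition_duration:
        raise ValueError("image_duration must exceed transition_duration")
    step = image_duration - transition_duration
    timeline = []
    for index in range(num_images):
        start = index * step
        timeline.append((start, start + image_duration))
    return timeline


def place_audio_tracks(track_lengths, video_length):
    """Place tracks back to back; return (start_time, play_length) pairs.

    The last track that fits is trimmed so the audio ends with the video.
    """
    placements = []
    cursor = 0.0
    for length in track_lengths:
        if cursor >= video_length:
            break
        play_length = min(length, video_length - cursor)
        placements.append((cursor, play_length))
        cursor += play_length
    return placements


timeline = build_timeline(3)
video_length = timeline[-1][1]
print(timeline)
print(place_audio_tracks([6.0, 6.0], video_length))
```

The resulting time points could then be handed to whatever video encoding tool performs the actual rendering and audio muxing.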
[0086] Finally, at block 520, the assembled slide show with its musical composition(s) is provided to the user.
[0087] It is to be understood that the slide show customization process 500 may include, in an alternative embodiment, steps for providing an intended recipient and payment information, and for providing the slide show to the intended recipient, for example, by email, on a DVD through regular mail, as a link to the slide show in the customized media database 38 or any other transmission means.
[0088] Conversely, in alternative embodiments of the audio media customization 100 and video media customization 300 processes, the steps for providing an intended recipient and payment information, and for providing the slide show to the intended recipient may be omitted.
Musical Compositions Recommendation
[0089] Attributes are extracted from the inputted text/captions and images, as well as from the user. The text/image/user attributes are then matched against the attributes of each musical composition. The result of the match is a single numeric score, which is then used to rank the musical compositions.
[0090] The audio/video customization system 30 presents the top-N (e.g. N = 3) choices for the user to choose based on a ranked list of musical compositions.
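For illustration only, selecting the top-N choices from overall match scores might look like the following sketch. The dictionary layout, the composition titles and the function name are hypothetical.

```python
import heapq


def top_n(scores, n=3):
    """Return the n highest-scoring (title, score) pairs, best first.

    `scores` maps a musical composition title to its overall match score.
    """
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


# Hypothetical overall match scores for four compositions.
scores = {"Song A": 0.2, "Song B": 0.9, "Song C": 0.5, "Song D": 0.7}
print(top_n(scores))
```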
[0091] The overall match score is a combination of the match scores from each pair of compatible attributes. The combination can be based on the arithmetic mean or the geometric mean of the attribute scores. Alternatively, it can also be a weighted mean of the scores, where the weights are either set by a human expert, or they are computed based on the regression analysis on a collection of samples that are previously scored by human editors.
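The three combination options named above can be sketched as follows, assuming each pair-wise attribute score already lies between 0 and 1. The function names and the example weights are illustrative; the regression-based derivation of the weights is not shown.

```python
import math


def arithmetic_mean(scores):
    """Plain average of the pair-wise attribute scores."""
    return sum(scores) / len(scores)


def geometric_mean(scores):
    """Geometric mean; any zero score forces the result to zero."""
    if any(score == 0 for score in scores):
        return 0.0
    return math.exp(sum(math.log(score) for score in scores) / len(scores))


def weighted_mean(scores, weights):
    """Weighted average; weights could come from an expert or a regression."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


pair_scores = [0.25, 1.0]          # hypothetical scores for two attribute pairs
print(arithmetic_mean(pair_scores))
print(geometric_mean(pair_scores))
print(weighted_mean(pair_scores, [3.0, 1.0]))
```

One practical difference between the options is that the geometric mean drops to zero as soon as a single compatible attribute pair scores zero, whereas the arithmetic and weighted means only lower the overall score.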
[0092] The text attributes include:
T1: words in the text, original and stemmed, plus the bigrams.
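A minimal sketch of how attribute T1 might be built is given below. The disclosure does not name a stemming algorithm; the suffix-stripping rule used here is a deliberately naive stand-in for an established stemmer (for example a Porter stemmer), and the function names and tokenization pattern are assumptions.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def naive_stem(word):
    """Strip a few common English suffixes (illustrative stand-in only)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def text_attributes(text):
    """Return the original words, their stems and the word bigrams."""
    words = tokenize(text)
    return {
        "words": words,
        "stems": [naive_stem(word) for word in words],
        "bigrams": [f"{a} {b}" for a, b in zip(words, words[1:])],
    }


print(text_attributes("Sunny beaches"))
```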
[0093] The image attributes include:
I1: time of day, which can be derived from the image's timestamp (e.g. in the EXIF metadata) and the time zone information (if available);
I2: geo-location of the image (e.g. in the EXIF metadata);
I3: country and city names of the image, derived from I2, using a lookup database (many are available commercially);
I4: color histogram of the image; and
I5: "classes" of the image (e.g. night-time, quiet, vibrant, sunny, foggy, etc.), derived from I1 and I4. The classes are computed based on a previously trained model built from previously classified images (by human editors) using a statistical classifier such as a decision tree, or a large margin classifier such as SVM (Support Vector Machine).
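As a hedged illustration of attribute I4, a coarse color histogram might be computed as sketched below. The disclosure does not specify the color space or the bin count; quantizing each RGB channel into four bins and normalizing by the pixel count are assumptions, as is the function name.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Return {(r_bin, g_bin, b_bin): fraction of pixels} for 8-bit RGB pixels.

    bins_per_channel must divide 256 so that every bin has the same width.
    """
    if not pixels:
        raise ValueError("at least one pixel is required")
    if 256 % bins_per_channel:
        raise ValueError("bins_per_channel must divide 256")
    width = 256 // bins_per_channel
    counts = Counter((r // width, g // width, b // width) for r, g, b in pixels)
    total = len(pixels)
    return {bin_key: count / total for bin_key, count in counts.items()}


# Toy image: two reddish pixels and two bluish pixels.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 0, 200)]
print(color_histogram(pixels))
```

A histogram of this kind, together with the time of day I1, could serve as the feature vector supplied to the trained classifier that produces the classes I5.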
[0094] The user attributes include:
U1: the genre of music the user likes. The information can be obtained using a user interface element. Alternatively, the audio/video customization system 30 may obtain the information from a social network profile of the user.
[0095] The musical compositions attributes include:
C1: classes assigned by human editors, as in I5.
C2: single and double word tags (e.g., "birthday", "love", "rock") assigned by human editors;
C3: words in the lyrics and the title, original and stemmed, plus the bigrams; and
C4: location tags, which are the country and city names that the song describes, if any.
[0096] The following are how the pair-wise attribute match scores are computed:
- between C1 and I5: the number of common classes, normalized to a value between 0 and 1;
- between C2, C3 and T1, U1: the score is calculated based on TF-IDF and cosine similarity, which are commonly used for text matching with the bag-of-words model. The score is normalized to a value between 0 and 1;
- between C4 and I3: the number of common locations, normalized to a value between 0 and 1; and
- all other attribute pairs: zero.
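The pair-wise scores listed above might be sketched as follows. Two points are assumptions rather than part of the disclosure: the count of common classes or locations is normalized by the size of the union of the two sets, and the TF-IDF weighting uses a smoothed inverse document frequency. The function names and sample data are hypothetical.

```python
import math
from collections import Counter


def common_fraction(first, second):
    """Number of common items divided by the size of the union (0 to 1)."""
    first, second = set(first), set(second)
    if not first or not second:
        return 0.0
    return len(first & second) / len(first | second)


def tfidf_vectors(documents):
    """Turn token lists into {term: weight} dicts using smoothed TF-IDF."""
    doc_count = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        term_freq = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * (math.log((1 + doc_count) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0 to 1 for non-negative weights."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Caption tokens (T1) followed by the lyric tokens (C3) of two compositions.
caption, beach_song, ski_song = tfidf_vectors(
    [["beach", "sun", "sun"], ["sun", "beach"], ["snow", "ski"]]
)
print(cosine(caption, beach_song), cosine(caption, ski_song))
print(common_fraction({"sunny", "quiet"}, {"sunny", "vibrant"}))
```

Because every TF-IDF weight is non-negative, the cosine similarity already falls between 0 and 1 and needs no further normalization.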
[0097] It is to be understood that other musical compositions recommendation schemes may be used with the same or different set of attributes.
[0098] In alternative embodiments of the above described processes, whenever text message(s) are converted into voice using a text-to-speech synthesis process, the audio/video customization system 30 may provide the user with a selection of voice types (e.g. US male) for the synthesis. In addition, the audio/video customization system 30 may also recommend one or more voice types that best match the mood or the content of the audio media or musical composition. The matching can be performed, for example, based on tags (e.g. "fast", "quiet") associated with the voice type and the audio media or musical composition.
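A possible sketch of the tag-based voice type matching is shown below. Ranking voice types by the number of tags they share with the musical composition, and breaking ties alphabetically, are assumptions; the voice type names and tags are invented for the example.

```python
def recommend_voices(media_tags, voice_tags_by_type, limit=2):
    """Rank voice types by the number of tags shared with the media.

    Voice types sharing no tag are dropped; ties are broken alphabetically.
    """
    media = set(media_tags)
    scored = []
    for voice_type, tags in voice_tags_by_type.items():
        shared = len(media & set(tags))
        if shared > 0:
            scored.append((shared, voice_type))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [voice_type for _, voice_type in scored[:limit]]


# Hypothetical tag assignments for three voice types.
voices = {
    "US male": ["calm", "quiet"],
    "UK female": ["fast", "bright"],
    "US female": ["quiet", "fast"],
}
print(recommend_voices(["fast", "quiet"], voices))
```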
[0099] It is to be understood that the audio/video customization system 30 may also be accessed via mobile phones and smart phones (including Blackberry™, SymbianOS™, iPhone™, Windows Mobile™, Google Android™ and any other such system/device), in which case the gateway server 32 may also include a specifically created graphical user interface.
[00100] It is also to be understood that although throughout the disclosure reference is made to separate servers 32 and 34 as well as separate databases 36 and 38, these may be implemented on one or more physical devices and/or may be combined. It is further to be understood that either, some or all of the separate servers 32 and 34 and databases 36 and 38 may be implemented on one or more of the computing/communication devices 12, 14, 16, 18, for example as an app.
[00101] Further still, it is to be understood that the above-described processes (i.e. processes 100, 200, 300, 400 and 500) may be implemented individually or collectively as processor executable code stored within a memory of an associated device (i.e. customization server 34 and/or computing/communication devices 12, 14, 16, 18) to be executed by a processor of that device.
[00102] Referring to Fig. ?, there is shown an illustrative example of the customization server 34 which includes a processor 40 with an associated memory 50 having stored therein processor executable instructions 51, 52, 53, 54 and 55 for configuring the processor 40 to perform, respectively, processes 100, 200, 300, 400 and 500, and an input/output (I/O) interface 42. It is to be understood that computing/communication devices 12, 14, 16, 18 may be similarly provided with a processor, memory and I/O interface. It is to be further understood that processes 100, 200, 300, 400 and 500 may all be implemented on the same device or selectively only on some devices.
[00103] Although the present disclosure has been described with a certain degree of particularity and by way of illustrative embodiments and examples thereof, it is to be understood that the present disclosure is not limited to the features of the embodiments described and illustrated herein, but includes all variations and modifications within the scope and spirit of the disclosure as hereinafter claimed.

Claims

What is claimed is:
1. A method for creating a customized video media, the method comprising:
acquiring a plurality of images;
providing a plurality of musical compositions;
prompting a user to select at least one of the plurality of musical compositions;
creating the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions.
2. A method in accordance with claim 1, further comprising the step of:
acquiring at least one text caption associated with one of the plurality of images;
wherein the step of creating the customized video media includes mixing the at least one text caption with the acquired plurality of images and the selected plurality of musical compositions.
3. A method in accordance with claim 2, further comprising the step of:
extracting attributes from the plurality of images or the at least one text caption; and
recommending to the user a set of musical compositions based on a matching of the extracted attributes and attributes associated with the plurality of musical compositions.
4. A method in accordance with claim 3, wherein the extracted attributes are selected from a group consisting of an image geo-location metadata, an image color composition and a person identified by a facial recognition process.
5. A method in accordance with claim 3, further comprising the step of:
extracting attributes from a profile of the user or a person identified in one of the plurality of images by a facial recognition process on a social network; wherein the matching is further based on the profile extracted attributes.
6. A method in accordance with any of claims 1 to 5, wherein the musical compositions are provided with at least one pre-treated segment for receiving an audio message and associated treatment parameters, the method further comprising the steps of:
acquiring at least one audio message;
applying digital signal processing effects to the at least one audio message in accordance with the treatment parameters;
inserting the processed at least one audio message within the pre-treated segment of at least one of the selected musical compositions; and
mixing the inserted processed at least one audio message and at least one of the selected musical compositions.
7. A method in accordance with claim 6, wherein the step of acquiring at least one audio message includes the sub-steps of:
acquiring a text message; and
converting the text message into the audio message using a text-to-speech synthesis process.
8. A method in accordance with claim 6, wherein the sub-step converting the text message into the audio message further includes providing one or more voice selection associated with at least one of the selected musical compositions.
9. A method in accordance with any of claims 1 to 8, wherein the step of creating the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions includes applying a transition effect between the acquired plurality of images.
10. A method in accordance with claim 9, further comprising the step of:
prompting the user to select a transition effect from a plurality of transition effects.
11. A system for creating a customized video media, the system comprising:
a database containing a plurality of musical compositions;
a user interface;
a processor operatively connected to the database and the user interface, the processor being so configured so as to:
acquire a plurality of images;
provide a plurality of musical compositions;
prompt a user to select at least one of the plurality of musical compositions;
create the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions.
12. A system in accordance with claim 11, wherein the processor is further configured so as to:
acquire at least one text caption associated with one of the plurality of images;
wherein creating the customized video media includes mixing the at least one text caption with the acquired plurality of images and the selected plurality of musical compositions.
13. A system in accordance with claim 12, wherein the processor is further configured so as to:
extract attributes from the plurality of images or the at least one text caption; and
recommend to the user a set of musical compositions based on a matching of the extracted attributes and attributes associated with the plurality of musical compositions.
14. A system in accordance with claim 13, wherein the extracted attributes are selected from a group consisting of an image geo-location metadata, an image color composition and a person identified by a facial recognition process.
15. A system in accordance with claim 13, wherein the processor is further configured so as to:
extract attributes from a profile of the user or a person identified in one of the plurality of images by a facial recognition process on a social network; wherein the matching is further based on the profile extracted attributes.
16. A system in accordance with any of claims 11 to 15, wherein the system further comprises:
a recording interface operatively connected to the processor;
wherein the musical compositions are provided with at least one pre-treated segment for receiving an audio message and associated treatment parameters, the processor being further configured so as to:
acquire at least one audio message;
apply digital signal processing effects to the at least one audio message in accordance with the treatment parameters;
insert the processed at least one audio message within the pre-treated segment of at least one of the selected musical compositions; and
mix the inserted processed at least one audio message and at least one of the selected musical compositions.
17. A system in accordance with claim 16, wherein the processor is further configured so as to, when acquiring at least one audio message:
acquire a text message; and
convert the text message into the audio message using a text-to-speech synthesis process.
18. A system in accordance with claim 16, wherein the processor is further configured so as to, when converting the text message into the audio message:
provide one or more voice selection associated with at least one of the selected musical compositions.
19. A system in accordance with any of claims 11 to 18, wherein the processor is further configured so as to, when creating the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions:
apply a transition effect between the acquired plurality of images.
20. A system in accordance with claim 19, wherein the processor is further configured so as to:
prompt the user to select a transition effect from a plurality of transition effects.
21. A method for customizing an audio/video media, the method comprising:
a. providing the audio/video media with a pre-treated segment for receiving a message that includes an audio portion, the audio/video media having associated treatment parameters;
b. acquiring the message;
c. applying digital signal processing effects to the audio portion of the message in accordance with the treatment parameters;
d. inserting the processed message within the pre-treated segment of the audio/video media in a position within the pre-treated segment depending on the length of the processed message; and
e. creating a customized audio/video media by mixing the inserted processed message and audio/video media.
22. A method in accordance with claim 21, wherein step a. includes the sub-steps of:
a1. displaying a list of available audio/video media with a pre-treated segment for receiving a message;
a2. prompting a user to select a listed audio/video media;
a3. providing the selected audio/video media.
23. A method in accordance with claim 21, wherein step b. includes the sub-steps of:
b1. prompting a user to provide a message;
b2. providing the user with a message recording interface; and
b3. acquiring the message.
24. A method in accordance with claim 21, further comprising:
f. prompting a user to input information related to an intended recipient; and
g. providing the customized media to the intended recipient.
25. A method in accordance with claim 21, further comprising:
f. prompting a user to input payment information.
26. A method in accordance with claim 21, wherein the audio/video media includes a plurality of pre-treated segments and wherein a plurality of messages are acquired, each message being associable with one of the plurality of pre-treated segments.
27. A method in accordance with claim 26, wherein each message is automatically associated with one of the plurality of pre-treated segments.
28. A method in accordance with claim 27, wherein each message is automatically associated with one of the plurality of pre-treated segments based on the length of each of the plurality of messages and the length of each of the plurality of pre-treated segments.
29. A method in accordance with claim 21, wherein the treatment parameters are selected in accordance with the means of acquiring the message.
30. A method in accordance with claim 21, wherein the message is a text message and further comprising the step of converting the text message into audio using a text-to-speech synthesis process.
31. A method in accordance with claim 30, wherein the step of converting the text message into audio further includes providing one or more voice selection associated with the audio/video media.
32. A method in accordance with claim 21, wherein the audio/video media is a video media and the message is a video message, the method further comprising the step of applying digital video filters to the message in accordance with the treatment parameters before inserting the message into the pre-treated segment of the audio/video media, wherein the treatment parameters include video treatment parameters selected in accordance with one of the following: the audio/video media or the means of acquiring the message.
33. A method in accordance with claim 21, wherein the audio/video media is an audio and video media and the message is an audio and video message, the method further comprising the step of applying digital signal processing effects to an audio portion of the message and applying digital video filters to a video portion of the message in accordance with the treatment parameters before inserting the message into the pre-treated segment of the audio/video media, wherein the treatment parameters include video treatment parameters selected in accordance with one of the following: the audio/video media or the means of acquiring the message.
34. A system for customizing an audio/video media, the system comprising:
a database containing at least one audio/video media with a pre-treated segment for receiving a message;
a user interface;
a recording interface;
a processor operatively connected to the database, the user interface and the recording interface, the processor being so configured so as to:
display through the user interface a list of the audio/video media in the database, each of the audio/video media having associated treatment parameters;
prompt a user through the user interface to select a listed audio/video media;
prompt the user to provide a message through the recording interface;
apply digital signal processing effects to the audio portion of the message in accordance with the treatment parameters of the selected audio/video media;
insert the processed message within the pre-treated segment of the selected communication media in a position within the pre-treated segment depending on the length of the processed message; and
create a customized audio/video media by mixing the inserted processed message and the selected audio/video media.
35. A system in accordance with claim 34, wherein the treatment parameters are selected in accordance with the recording interface.
36. A system in accordance with claim 34, wherein the message is a text message, the processor being further configured so as to apply a text-to-speech synthesis process to the text message.
37. A system in accordance with claim 34, wherein the audio/video media is a video media and the message is a video message, the processor being further configured so as to apply digital video filters to the message in accordance with the treatment parameters before inserting the message into the pre-treated segment of the audio/video media, wherein the treatment parameters include video treatment parameters selected in accordance with one of the following: the audio/video media or the recording interface.
38. A system in accordance with claim 34, wherein the audio/video media is an audio and video media and the message is an audio and video message, the processor being further configured so as to apply digital signal processing effects to an audio portion of the message and apply digital video filters to a video portion of the message in accordance with the treatment parameters before inserting the message into the pre-treated segment of the audio/video media, wherein the treatment parameters include video treatment parameters selected in accordance with one of the following: the audio/video media or the recording interface.
PCT/CA2013/001084 2012-12-28 2013-12-30 System and method for the automated customization of audio and video media WO2014100893A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261747085P 2012-12-28 2012-12-28
US61/747,085 2012-12-28

Publications (1)

Publication Number Publication Date
WO2014100893A1 (en) 2014-07-03

Family

ID=51019599

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2013/001084 WO2014100893A1 (en) 2012-12-28 2013-12-30 System and method for the automated customization of audio and video media

Country Status (1)

Country Link
WO (1) WO2014100893A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016028395A1 (en) * 2014-08-18 2016-02-25 KnowMe Systems, Inc. Unscripted digital media message generation
WO2016201376A1 (en) * 2015-06-10 2016-12-15 Piantedosi Avery Alarm notification system
US9973459B2 (en) 2014-08-18 2018-05-15 Nightlight Systems Llc Digital media message generation
US10037185B2 (en) 2014-08-18 2018-07-31 Nightlight Systems Llc Digital media message generation
TWI699663B (en) * 2018-09-07 2020-07-21 台達電子工業股份有限公司 Segmentation method, segmentation system and non-transitory computer-readable medium
US10735360B2 (en) 2014-08-18 2020-08-04 Nightlight Systems Llc Digital media messages and files
US10735361B2 (en) 2014-08-18 2020-08-04 Nightlight Systems Llc Scripted digital media message generation
CN113572981A (en) * 2021-01-19 2021-10-29 腾讯科技(深圳)有限公司 Video dubbing method and device, electronic equipment and storage medium
WO2022171052A1 (en) * 2021-02-10 2022-08-18 北京字节跳动网络技术有限公司 Video obtaining method and apparatus, video sharing method and apparatus, device, and medium
US11449306B1 (en) 2016-04-18 2022-09-20 Look Sharp Labs, Inc. Music-based social networking multi-media application and related methods
US11481434B1 (en) * 2018-11-29 2022-10-25 Look Sharp Labs, Inc. System and method for contextual data selection from electronic data files

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005076618A1 (en) * 2004-02-05 2005-08-18 Sony United Kingdom Limited System and method for providing customised audio/video sequences
US7301093B2 (en) * 2002-02-27 2007-11-27 Neil D. Sater System and method that facilitates customizing media
EP1879195A1 (en) * 2006-07-14 2008-01-16 Muvee Technologies Pte Ltd Creating a new music video by intercutting user-supplied visual data with a pre-existing music video
US20080215979A1 (en) * 2007-03-02 2008-09-04 Clifton Stephen J Automatically generating audiovisual works
US20110264755A1 (en) * 2008-10-08 2011-10-27 Salvatore De Villiers Jeremie System and method for the automated customization of audio and video media

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HYUN SUNG CHANG ET AL.: "Efficient Video Indexing Scheme for Content- Based Retrieval", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 9, no. 8, 1 December 1999 (1999-12-01), PISCATAWAY , N.J, US, pages 1269 - 1279 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10992623B2 (en) 2014-08-18 2021-04-27 Nightlight Systems Llc Digital media messages and files
US10728197B2 (en) 2014-08-18 2020-07-28 Nightlight Systems Llc Unscripted digital media message generation
US9973459B2 (en) 2014-08-18 2018-05-15 Nightlight Systems Llc Digital media message generation
US10037185B2 (en) 2014-08-18 2018-07-31 Nightlight Systems Llc Digital media message generation
US10038657B2 (en) 2014-08-18 2018-07-31 Nightlight Systems Llc Unscripted digital media message generation
US10691408B2 (en) 2014-08-18 2020-06-23 Nightlight Systems Llc Digital media message generation
US10735360B2 (en) 2014-08-18 2020-08-04 Nightlight Systems Llc Digital media messages and files
WO2016028395A1 (en) * 2014-08-18 2016-02-25 KnowMe Systems, Inc. Unscripted digital media message generation
US10735361B2 (en) 2014-08-18 2020-08-04 Nightlight Systems Llc Scripted digital media message generation
US11082377B2 (en) 2014-08-18 2021-08-03 Nightlight Systems Llc Scripted digital media message generation
US11670152B2 (en) 2015-06-10 2023-06-06 Avery Piantedosi Alarm notification system
WO2016201376A1 (en) * 2015-06-10 2016-12-15 Piantedosi Avery Alarm notification system
US11449306B1 (en) 2016-04-18 2022-09-20 Look Sharp Labs, Inc. Music-based social networking multi-media application and related methods
US11797265B1 (en) 2016-04-18 2023-10-24 Look Sharp Labs, Inc. Music-based social networking multi-media application and related methods
TWI699663B (en) * 2018-09-07 2020-07-21 台達電子工業股份有限公司 Segmentation method, segmentation system and non-transitory computer-readable medium
US11481434B1 (en) * 2018-11-29 2022-10-25 Look Sharp Labs, Inc. System and method for contextual data selection from electronic data files
US11971927B1 (en) 2018-11-29 2024-04-30 Look Sharp Labs, Inc. System and method for contextual data selection from electronic media content
CN113572981A (en) * 2021-01-19 2021-10-29 腾讯科技(深圳)有限公司 Video dubbing method and device, electronic equipment and storage medium
CN113572981B (en) * 2021-01-19 2022-07-19 腾讯科技(深圳)有限公司 Video dubbing method and device, electronic equipment and storage medium
WO2022171052A1 (en) * 2021-02-10 2022-08-18 北京字节跳动网络技术有限公司 Video obtaining method and apparatus, video sharing method and apparatus, device, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13866745

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13866745

Country of ref document: EP

Kind code of ref document: A1