WO2014100893A1 - System and method for the automated customization of audio and video media - Google Patents
- Publication number
- WO2014100893A1 (PCT/CA2013/001084)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- message
- accordance
- video
- video media
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47205—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
Definitions
- the present disclosure relates to a system and method for the automated customization of audio and video media. More specifically, the present disclosure relates to a method and system for the integration of audio messages into audio or video media and/or the integration of audio/video or video messages into video media as well as the integration of text/captions and/or voice messages with user provided images and selected musical compositions.
- the present disclosure provides a method for creating a customized video media, the method comprising:
- creating the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions.
- step of acquiring at least one audio message includes the sub-steps of:
- the present disclosure provides a system for creating a customized video media, the system comprising:
- processor operatively connected to the database and the user interface, the processor being configured so as to:
- processor is further configured so as to:
- system further comprises:
- a recording interface operatively connected to the processor
- the musical compositions are provided with at least one pre-treated segment for receiving an audio message and associated treatment parameters, the processor being further configured so as to: acquire at least one audio message;
- processor is further configured so as to, when acquiring at least one audio message:
- the present disclosure also provides a method and system for customizing an audio/video media provided with at least one pre-treated segment for receiving a message, the customizing including applying digital signal processing effects and/or digital video filters to the message in accordance with treatment parameters selected according to the audio/video media and/or the means of acquiring the message.
- FIG. 1 is a schematic view of an illustrative example of the network operating environment of the audio/video customization system
- FIGS. 2A and 2B are a flow diagram of an illustrative example of the audio media customization process;
- FIG. 3 is a flow diagram of an illustrative example of the audio mixing process;
- FIGS. 4A and 4B are a flow diagram of an illustrative example of the video media customization process
- FIG. 5 is a flow diagram of an illustrative example of the video mixing process
- FIG. 6 is a flow diagram of an illustrative example of the slide show customization process.
- FIG. 7 is a schematic representation of an illustrative example of the customization server.
- Audio media includes audio recordings, musical compositions, songs, speeches, spoken words, poems, etc.
- Video media includes music video clips, movies, movie extracts, short films, video commercials, etc. It may have only video content or a combination of audio and video content.
- Audio/video media includes audio only media, video only media or combined audio and video media.
- Audio/video message includes audio only messages, video only messages or combined audio and video messages.
- the non-limitative illustrative embodiment of the present disclosure provides a system and method for the automated customization of audio/video media. More specifically, the method and system allow the integration of audio messages into audio or video media and/or the integration of audio and video or video only messages into video media as well as the creation of customized slide shows.
- WAN wide area network
- the audio/video customization system 30 includes a customization server 34, a media database 36 and a customized media database 38, all of which will be detailed further below.
- the audio/video customization system 30 enables treated voice messages to be integrated into customized audio media such as, for example, musical compositions.
- the customization involves the insertion of user-generated audio or text-to-voice converted messages within segments of pre-treated audio media, mixing the resulting customized audio media and presenting it as a new audio file in the form of, for example, an MP3 (or any other type of compressed audio file).
- the pre-treated audio media, for example consisting of new or pre-existing musical compositions or songs in which segments have been identified and modified in order to make space for the insertion of future audio messages, are stored in the media database 36.
- the identified and modified segments are then used by the customization server 34 to allow an administrator of the audio/video customization system 30 to pre-program the positioning and length of each segment allotted for user-generated audio messages.
- Each pre-treated audio media is typically pre-programmed to receive either one, two or three user-generated audio or text-to-voice converted messages depending on the make-up of the audio media. However, it is to be understood that some audio media may be pre-programmed to receive more than three audio or text-to-voice converted messages.
- Referring to FIGS. 2A and 2B, there is shown a flow diagram of an illustrative example of the audio media customization process 100 executed by the audio/video customization system 30. The steps of the process 100 are indicated by blocks 102 to 138.
- the process 100 starts at block 102 where a user accesses the gateway server 32 of the audio/video customization system 30 and selects an audio media, for example a song, from the media database 36 he or she wishes to customize.
- the gateway server 32 may offer search capabilities, display audio media by categories, artist names, titles, etc.
- the user inputs the information of the customized audio media's recipient.
- This information may be, for example, an email address, phone number, physical location address, etc., and may also include, optionally, a text message intended for the recipient.
- the user selects the number of messages to be inserted within the audio media, for example one, two or three. It is to be understood that the number of available segments for the insertion of audio messages may vary depending on the selected audio media or settings of the audio/video customization system 30. If the number of messages to be inserted within the audio media is less than the number of available segments, the user may select which segments are to be filled or the audio/video customization system 30 may select the segments based on, for example, message length.
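By way of non-limiting illustration, the automatic selection of segments based on message length may be sketched as follows. The function name `assign_segments`, the use of seconds as the unit and the "smallest segment that fits" rule are assumptions made for this example only; the disclosure does not specify a particular selection rule.

```python
from typing import List, Optional


def assign_segments(message_lengths: List[float],
                    segment_lengths: List[float]) -> List[Optional[int]]:
    """For each message, pick the smallest unused segment that can hold it.

    Returns, per message, the index of the chosen segment, or None when
    no remaining segment is long enough. Lengths are in seconds.
    """
    free = set(range(len(segment_lengths)))
    assignment: List[Optional[int]] = [None] * len(message_lengths)
    # Place the longest messages first so they are not starved of room.
    order = sorted(range(len(message_lengths)),
                   key=lambda i: message_lengths[i], reverse=True)
    for i in order:
        candidates = [s for s in free
                      if segment_lengths[s] >= message_lengths[i]]
        if candidates:
            # Hypothetical rule: tightest fit, ties broken by segment order.
            best = min(candidates, key=lambda s: (segment_lengths[s], s))
            assignment[i] = best
            free.remove(best)
    return assignment
```

For example, with two messages of 4 and 9 seconds and three segments of 10, 5 and 8 seconds, the 9 second message takes the 10 second segment and the 4 second message takes the 5 second segment, leaving the 8 second segment unused.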
- the user is asked to input his or her payment information. This may be through a credit card, PayPal™ or any other suitable payment method. This step may also include the verification of the payment before proceeding further.
- the user is asked, at block 110, if he or she wishes to provide his or her message(s) either by voice or by text.
- the user is asked to select the means for providing his or her voice message(s).
- the user may select to provide the voice message(s) by phone or through a web interface.
- the user may be given the option of providing the voice message(s) as an MP3 (or any other type of compressed audio file), through email or even by mail on a CD or other digital medium.
- If the user has selected to provide the voice message(s) through a web interface, he or she is directed, at block 116, to a recording interface, for example a web recording page on the gateway server 32, which includes, for example, a Java audio engine. Then, at block 118, the recording interface allows the verification of the microphone levels of the user's computer 12, 14 in order to prevent distortion in the recording.
- the voice message(s) is recorded using either the user's land phone, mobile phone or smart phone 18 (used as a microphone only in this case) through a telecommunication network 25 (land line, cellular network, etc.) or IP telephony, or the microphone of the computer 12, 14 through the WAN 20, depending on the selected means of providing the voice message(s).
- an audible beep may be used as a warning of the end of the allotted time, for example five seconds before the end, or an on-screen time bar may be used to give a visual indication of the remaining allotted time, depending once more on the selected means of providing the voice message(s).
- the user is asked to select the means for providing his or her text message(s).
- the user may select to provide the text message(s) by phone or through a web interface.
- the user may be given the option of providing the text message(s) as a TXT file (or any other type of text file), through email or even by mail on a CD or other digital medium.
- a phone number to send a text message to along with a pin associated with the selected audio media and recipient information.
- the user is able to text the text message(s) using his or her mobile phone or smart phone 18 through a telecommunication network 25 (land line, cellular network, etc.) or IP telephony.
- a text input interface for example a web page with a text input box on the gateway server 32.
- the text message(s) is converted into voice using a text-to-speech synthesis process; such processes are well known in the art.
- the audio/video customization system 30 may provide the user with a selection of voice types (e.g. US male) for the synthesis.
- the audio/video customization system 30 may also recommend one or more voice types that best match the mood or the content of the audio media. The matching can be performed, for example, based on tags (e.g. "fast", "quiet") associated with the voice type and the audio media.
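As a minimal sketch of the tag-based matching, the example below ranks voice types by the overlap between their tags and the tags of the audio media. The Jaccard overlap measure, the function name `recommend_voices` and the sample tags are assumptions for illustration; the disclosure only states that matching may be based on tags.

```python
def recommend_voices(media_tags, voices, top_n=1):
    """Rank voice types by Jaccard overlap between their tags and the media tags.

    media_tags: iterable of tag strings for the audio media.
    voices: dict mapping a voice type name to its list of tags.
    """
    media = set(media_tags)

    def score(tags):
        tags = set(tags)
        union = media | tags
        return len(media & tags) / len(union) if union else 0.0

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(voices.items(), key=lambda kv: (-score(kv[1]), kv[0]))
    return [name for name, _ in ranked[:top_n]]
```

With a media tagged "quiet" and "romantic", a voice type tagged "quiet" and "calm" would be ranked ahead of one tagged "fast" and "energetic".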
- the user can verify the voice message(s) and, at block 132, accept or refuse the voice message(s). If the user refuses the voice message(s), the process 100 returns to either of blocks 114, 116, 124 or 126, depending on the means used to provide the voice/text message(s), where a new voice message(s) is recorded or produced from text. If the voice message(s) is accepted, the process 100 proceeds to block 134 where the voice message(s) is mixed with the chosen audio media.
- the audio mixing process, which is executed by the customization server 34, will be further detailed below.
- the mixed audio media, i.e. customized audio media
- the mixed audio media is saved, at block 136, as, for example, an MP3 file in the customized media database 38.
- the customized audio media is provided to the intended recipient, for example, by email, on a CD through regular mail, as a link to the customized audio media in the customized media database 38 or any other transmission means.
- steps 102 to 130 may be part of an app for a personal assistant device or tablet 16 or a smart phone 18, or other such device, in which case the recording and/or text input interface is provided by the app.
- the messages provided to the audio/video customization system 30 can be either audio or text to be converted to audio, or a combination thereof, and that these messages may be inputted into the audio/video customization system 30 using various means or interfaces or combination thereof. Therefore, depending on the specific embodiment, some of blocks 110 to 128 may be optional.
- Referring to FIG. 3, there is shown a flow diagram of an illustrative example of the audio mixing process 200 executed by the customization server 34 at block 134 of the audio media customization process 100 (see FIGS. 2A and 2B). The steps of the process 200 are indicated by blocks 202 to 218.
- the audio mixing process 200 automatically processes the voice message(s) through audio digital signal processing (DSP) effects so that the voice message(s) sounds like it was recorded in a recording studio prior to being integrated into the audio media. This gives the final product, i.e. the customized audio media, a "professionally produced" sound.
- DSP digital signal processing
- the process 200 starts at block 202, where the voice message(s) is equalized and then, at block 204, compressed in order to regulate its volume. Noise reduction is then applied, at block 206, to reduce background noise, followed by, at block 208, a noise gate to mute moments of silence.
- reverb is applied to add different room ambiences and, at block 212, fading such as very fast fades at the beginning and the ending of the recorded audio message(s) in order to prevent pops and clicks.
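Two of the simpler stages of this chain, the noise gate and the fast fades, can be sketched as follows. This is a simplified illustration only: the per-sample gate (a production gate would typically follow an envelope with attack and release times), the linear fade shape, the threshold value and the function names are all assumptions, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
from typing import List


def noise_gate(samples: List[float], threshold: float = 0.02) -> List[float]:
    """Mute samples whose absolute amplitude falls below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def apply_fades(samples: List[float], fade_len: int) -> List[float]:
    """Apply a linear fade-in and fade-out of fade_len samples each."""
    n = len(samples)
    fade_len = min(fade_len, n // 2)  # never let the two fades overlap
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len           # ramps from 0.0 up towards 1.0
        out[i] *= gain                # fade-in at the start
        out[n - 1 - i] *= gain        # mirrored fade-out at the end
    return out
```

Starting and ending the message at zero amplitude is what avoids the abrupt discontinuities heard as pops and clicks when the message is spliced into the audio media.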
- the processed voice message(s) is inserted into the pre-determined segments of the pre-treated audio media.
- the processed voice message(s) is strategically placed in the allotted time segment(s) depending on the length of the message(s). If the user has not used up all of the time available for his or her message(s), the process 200 automatically places the processed voice message(s) at the end of the time allotted segment(s) in order to maximize the "professionally produced" effect.
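The end-of-segment placement reduces to a small calculation, sketched below under the assumption that times are expressed in seconds from the start of the audio media. The function name `placement_start` is hypothetical.

```python
def placement_start(segment_start: float, segment_length: float,
                    message_length: float) -> float:
    """Return the start time that makes the message end exactly at the segment end."""
    if message_length > segment_length:
        raise ValueError("message is longer than its allotted segment")
    # Any unused time is left at the beginning of the segment.
    return segment_start + (segment_length - message_length)
```

For a 10 second segment starting at 30 seconds, a 6.5 second message would start at 33.5 seconds and finish at 40 seconds, right where the original audio resumes.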
- audio encoding compression is applied, at block 216, to optimize portability, for example into an MP3 file, which is then, at block 218, provided to block 136 of process 100 (see FIGS. 2A and 2B).
- the audio/video customization system 30 enables treated audio/video messages to be integrated into customized video media such as, for example, music video clips.
- the customization involves the insertion of user-generated audio/video messages within segments of pre-treated video compositions, mixing the resulting customized audio/video media and presenting it as a new video file in the form of any type of compressed video file.
- the pre-treated musical compositions, which consist of new or pre-existing video clips in which segments have been identified and modified in order to make space for the insertion of future audio/video messages, are stored in the media database 36.
- the identified and modified segments are then used by the customization server 34 to allow an administrator of the audio/video customization system 30 to pre-program the positioning and length of each segment allotted for user-generated audio/video messages.
- Each video media is typically pre-programmed to receive either one, two or three user-generated audio/video messages depending on various factors, for example the musical make-up of a music video clip. However, it is to be understood that some video media may be pre-programmed to receive more than three audio/video messages.
- Referring to FIGS. 4A and 4B, there is shown a flow diagram of an illustrative example of the video media customization process 300 executed by the audio/video customization system 30.
- the steps of the process 300 are indicated by blocks 302 to 324.
- the process 300 starts at block 302 where a user accesses the gateway server 32 of the audio/video customization system 30 and selects a video media, for example a music video clip, from the media database 36 he or she wishes to customize. It is to be understood that the gateway server 32 may offer search capabilities, display music video media by categories, artist names, music video clip titles, etc.
- the user inputs the information of the customized music video clip's recipient.
- This information may be, for example, an email address, phone number, physical location address, etc. and may also include, optionally, a text message intended for the recipient.
- the user selects the number of messages to be inserted within the video media, for example one, two or three. It is to be understood that the number of available segments for the insertion of audio/video messages may vary depending on the selected video media or settings of the audio/video customization system 30. If the number of messages to be inserted within the video media is less than the number of available segments, the user may select which segments are to be filled or the audio/video customization system 30 may select the segments based on, for example, message length.
- the user is asked to input his or her payment information. This may be through a credit card, PayPal™ or any other suitable payment method. This step may also include the verification of the payment before proceeding.
- a recording interface is provided to the user, for example a web recording page on the gateway server 32, which includes, for example, a Java audio/video engine.
- the user may be given the option of providing the audio/video message(s) as a video file in the form of any type of compressed video file, through email or even by mail on a DVD or other digital medium.
- the user may be given the option of providing his or her message(s) either by voice or by text, in which case steps similar to steps 110 to 130 of process 100 (see FIGS. 2A and 2B) are performed instead of steps 312 to 316.
- the recording interface allows the verification of the microphone levels of the user's computer 12, 14, in order to prevent distortion in the recording, and/or the picture quality of the video camera 13.
- the audio/video message(s) is recorded using the user's computer 12, 14 microphone and video camera 13.
- the time allotted depends on the chosen video media and the number of messages to be inserted within the video media. An on-screen time bar may be used to give a visual indication of the remaining allotted time.
- the user can verify the recorded audio/video message(s) and, at block 318, accept or refuse the recorded audio/video message(s). If the user refuses the recorded audio/video message(s), the process 300 returns to block 314 where a new audio/video message(s) is recorded. If the recorded audio/video message(s) is accepted, the process 300 proceeds to block 320 where the recorded audio/video message(s) is mixed with the chosen video media.
- the video mixing process, which is executed by the customization server 34, will be further detailed below.
- the mixed video media, i.e. customized video media
- the mixed video media is saved, at block 322, as a video file in the customized media database 38.
- the customized video media is provided to the intended recipient, for example, by email, on a DVD through regular mail, as a link to the customized video media in the customized media database 38 or any other transmission means.
- the video media may be in the form of music video clips, movie extracts, short films, video commercials or other video media.
- the audio/video message(s) may be either audio only, video only or combined audio and video.
- Referring to FIG. 5, there is shown a flow diagram of an illustrative example of the video mixing process 400 executed by the customization server 34 at block 320 of the audio/video media customization process 300 (see FIGS. 4A and 4B). The steps of the process 400 are indicated by blocks 402 to 422.
- the video mixing process 400 automatically processes the audio portion of the recorded audio/video message(s) through DSP effects so that the audio portion of the recorded audio/video message(s) sounds like it was recorded in a recording studio prior to being integrated into the music video clip. This gives the final product, i.e. the customized music video clip, a "professionally produced" sound.
- the process 400 starts at block 402, where the audio portion of the recorded audio/video message(s) is equalized and then, at block 404, compressed in order to regulate the volume. Noise reduction is then applied, at block 406, to reduce background noise, followed by, at block 408, a noise gate to mute moments of silence.
- reverb is applied to add different room ambiences and, at block 412, fading such as very fast fades at the beginning and the ending of the audio portion of the recorded audio/video message(s) in order to prevent pops and clicks.
- the video mixing process 400 then automatically processes the video portion of the recorded audio/video message(s) through digital video filters in order to obtain optimal video quality prior to being integrated into the video media. This gives the final product, i.e. the customized video media, a "professionally produced" look.
- the brightness and contrast of the video portion of the recorded audio/video message(s) are adjusted and, at block 416, grain reduction is applied.
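A brightness and contrast adjustment of this kind can be illustrated on 8-bit greyscale values as follows. The mid-grey pivot of 128, the parameter conventions and the function names are assumptions for this sketch; the disclosure does not specify how the adjustment is computed, and a real implementation would operate on full colour frames.

```python
def adjust_pixel(value: int, brightness: int = 0, contrast: float = 1.0) -> int:
    """Scale around mid-grey (128) for contrast, add a brightness offset,
    then clamp the result to the 0..255 range."""
    adjusted = (value - 128) * contrast + 128 + brightness
    return max(0, min(255, int(round(adjusted))))


def adjust_frame(frame, brightness=0, contrast=1.0):
    """Apply adjust_pixel to every value of a greyscale frame (list of rows)."""
    return [[adjust_pixel(v, brightness, contrast) for v in row]
            for row in frame]
```

A contrast factor above 1.0 pushes values away from mid-grey, a factor below 1.0 pulls them towards it, and clamping prevents overflow on already bright or dark pixels.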
- the processed audio/video message(s) is inserted into the pre-determined segments of the pre-treated video media.
- the processed audio/video message(s) is strategically placed in the allotted time segment depending on the length of the message. If the user has not used up all of the time available for his or her message(s), the process 400 automatically places the processed audio/video message(s) at the end of the time allotted segment in order to maximize the "professionally produced" effect.
- Video encoding compression is then applied, at block 420, to optimize portability, which is then, at block 422, provided to block 322 of process 300 (see FIGS. 4A and 4B). It is to be understood that the audio portion of the video may be first extracted in order to perform blocks 402 to 418 solely on the audio portion after which the processed audio portion is recombined with the video at block 420.
- the audio/video customization system 30 enables the integration of text/captions and/or voice messages with user provided images and selected musical compositions.
- the customization involves the user providing a collection of images, for example photos taken on a trip, and introductory text/captions, and using them to create a "slide show", which consists of a series of visual transitions of the images and the audio of one or more musical compositions synchronized with the images.
- the user may select the musical compositions to use for a given subset of the images.
- the audio/video customization system 30 may recommend one or more musical compositions that best match the images based on a variety of attributes.
- the audio/video customization system 30 may further enable treated voice messages to be integrated into the musical compositions. This involves the insertion of user-generated audio or text-to-voice converted messages within segments of pre-treated musical compositions and mixing the resulting customized musical compositions before their synchronization with the images.
- Referring to FIG. 6, there is shown a flow diagram of an illustrative example of the slide show customization process 500 executed by the audio/video customization system 30.
- the steps of the process 500 are indicated by blocks 502 to 520.
- the process 500 starts at block 502 where a user accesses the gateway server 32 of the audio/video customization system 30 and is asked to input a collection of images, for example through an upload window accessing images stored on the user's personal computer 12, laptop computer 14, personal assistant device or tablet 16, mobile phone or smart phone 18.
- the user is asked to input an introductory text and/or captions to be associated with the collection of images and/or individual images.
- information is extracted from each image, for example the location where the image was taken (e.g. using the GPS metadata produced by GPS enabled cameras), the color composition of the image (e.g. day time or night time based on pixel color spectrum), person(s) identified in the image (e.g. using face detection and facial recognition processes), etc.
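As one possible illustration of deriving "day time or night time" from pixel colors, the sketch below thresholds the mean luma of an image. The use of mean luma, the threshold of 90 and the function name are assumptions made for this example; the disclosure does not state how the color spectrum is analyzed. Pixels are assumed to be (R, G, B) tuples of 8-bit values.

```python
def classify_day_night(pixels, threshold=90.0):
    """Label an image 'day' or 'night' from the mean luma of its RGB pixels."""
    if not pixels:
        raise ValueError("image has no pixels")
    # Rec. 601 luma weights for the R, G and B channels.
    total = sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels)
    mean_luma = total / len(pixels)
    # The threshold is a hypothetical value that would need tuning on real images.
    return "day" if mean_luma >= threshold else "night"
```

Such a rule is crude (a dim indoor photo taken at noon would be labelled "night"), which is one reason a trained classifier over richer attributes may be preferable.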
- the audio/video customization system 30 recommends musical compositions to be used for the image collection based on the information extracted at block 506 and introductory text/captions inputted at block 504 compared to metadata associated with the musical compositions (e.g. city name, mood, season, etc.) as well as the lyrics of the musical compositions.
- Information about the user and his or her interests, for example extracted from a profile on a social network, or similar information from persons identified in the images.
- An example of a musical composition recommendation scheme will be further detailed below.
- the user may also be allowed to select his or her own musical compositions, for example by providing musical compositions search capabilities.
- one or more musical composition(s) he or she may be provided with, at block 512, the ability to customize the selected musical composition(s). It is to be understood that this step may be optional.
- the process 500 proceeds to block 514 where the audio media process, which was previously described, is performed.
- the user may be allowed to set desired transition effects between the various images of the image collection.
- the image to video conversion is performed, taking the collection of images and introductory text/captions, and producing a video where each image is shown for a given duration, transitioning with a predetermined effect (for example fade-in/out) or with desired effects if so selected at block 516.
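The timing side of the image to video conversion can be sketched as a simple timeline calculation. The default durations, the choice to overlap consecutive slides by the transition duration and the function names are assumptions for illustration only; the actual rendering of frames and transition effects is outside the scope of this sketch.

```python
def slide_timeline(num_images, slide_duration=4.0, transition_duration=1.0):
    """Return (start, end) display times in seconds for each image.

    Consecutive slides overlap by transition_duration so that a
    transition effect can run between them.
    """
    if transition_duration >= slide_duration:
        raise ValueError("transition must be shorter than the slide duration")
    step = slide_duration - transition_duration
    return [(i * step, i * step + slide_duration) for i in range(num_images)]


def total_duration(timeline):
    """Length of the resulting video in seconds."""
    return timeline[-1][1] if timeline else 0.0
```

The total duration obtained this way could then be compared with the length of the selected musical composition(s) when deciding where the audio track is attached.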
- the video and audio, i.e. the musical composition(s) or customized musical composition(s), are synchronized by adding the audio track to the video at defined time points.
- the slide show customization process 500 may include, in an alternative embodiment, steps for providing an intended recipient and payment information, and for providing the slide show to the intended recipient, for example, by email, on a DVD through regular mail, as a link to the slide show in the customized media database 38 or any other transmission means.
- steps for providing an intended recipient and payment information, and for providing the slide show to the intended recipient may be omitted.
- Attributes are extracted from the inputted text/captions and images, as well as from the user.
- the text/image/user attributes are then matched against the attributes of each musical composition.
- the result of the match is a single numeric score, which is then used to rank the musical compositions.
- the overall match score is a combination of the match scores from each pair of compatible attributes.
- the combination can be based on the arithmetic mean or the geometric mean of the attribute scores. Alternatively, it can also be a weighted mean of the scores, where the weights are either set by a human expert, or they are computed based on the regression analysis on a collection of samples that are previously scored by human editors.
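The three combination options (arithmetic mean, geometric mean and weighted mean) and the resulting ranking can be sketched as follows. The function names and the sample data layout are assumptions for illustration; the weights would come from a human expert or from a regression fitted on editor-scored samples, neither of which is shown here. Attribute scores are assumed to lie between 0 and 1.

```python
import math


def arithmetic_mean(scores):
    return sum(scores) / len(scores)


def geometric_mean(scores):
    # Any zero attribute score drives the geometric mean to zero.
    if any(s == 0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def weighted_mean(scores, weights):
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def rank_compositions(attribute_scores, combine=arithmetic_mean):
    """Sort composition names by descending overall match score.

    attribute_scores: dict mapping a composition name to its list of
    per-attribute-pair scores.
    """
    overall = {name: combine(s) for name, s in attribute_scores.items()}
    return sorted(overall, key=lambda name: (-overall[name], name))
```

The geometric mean penalizes a composition that matches poorly on any single attribute pair, whereas the arithmetic mean lets a strong match on one pair compensate for a weak match on another.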
- the text attributes include:
- T1 words in the text, original and stemmed, plus the bigrams
- the image attributes include:
- I1 time of day, which can be derived from the image's timestamp (e.g. in the EXIF metadata) and the time zone information (if available);
- I2 geo-location of the image (e.g. in the EXIF metadata);
- I3 country and city names of the image, derived from I2, using a lookup database (many are available commercially);
- I4 color histogram of the image; and
- I5 "classes" of the image (e.g. night-time, quiet, vibrant, sunny, foggy, etc.), derived from I1 and I4.
- the classes are computed based on a previously trained model built from previously classified images (by human editors) using a statistical classifier such as a decision tree, or a large margin classifier such as SVM (Support Vector Machine).
- the user attributes include:
- the genre of music the user likes can be obtained using a user interface element.
- the audio/video customization system 30 may obtain the information from a social network profile of the user.
- the musical compositions attributes include:
- C2 single and double word tags (e.g., "birthday", "love", "rock") assigned by human editors;
- C4 location tags, which are the country and city names that the song describes, if any.
- U1 the score is calculated based on TF-IDF and cosine similarity, which is commonly used for text matching with the bag-of-words model. The score is normalized to a value between 0 and 1;
- between C4 and I3: the number of common locations, normalized to a value between 0 and 1; and
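A minimal sketch of the two scores mentioned here follows: TF-IDF weighting with cosine similarity over bag-of-words token lists, and a normalized count of common locations. The smoothed IDF formula, the choice to normalize the location count by the number of image locations and all function names are assumptions for illustration; the disclosure does not specify these details.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors (as dicts) for a list of tokenised documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF keeps weights positive even for terms found everywhere.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0..1 for non-negative weights."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def location_score(song_locations, image_locations):
    """Fraction of the image locations that the song also mentions (0..1)."""
    images = set(image_locations)
    return len(images & set(song_locations)) / len(images) if images else 0.0
```

Because all TF-IDF weights are non-negative, the cosine similarity already falls between 0 and 1, so no further normalization of the text score is needed in this sketch.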
- the audio/video customization system 30 may provide the user with a selection of voice types (i.e. US male) for the synthesis.
- the audio/video customization system 30 may also recommend one or more voice types that best match the mood or the content of the audio media or musical composition. The matching can be performed, for example, based on tags (e.g. "fast", "quiet") associated with the voice type and the audio media or musical composition.
- the audio/video customization system 30 may also be accessed via mobile phones and smart phones (including BlackberryTM, SymbianOSTM, IPhoneTM, Windows MobileTM, Google AndroidTM and any other such system/device), in which case the gateway server 32 may also include a specifically created graphical user interface.
- processes 100, 200, 300, 400 and 500 may be implemented individually or collectively as processor executable code stored within a memory of an associated device (i.e. customization server 34 and/or computing/communication devices 12, 14, 16, 18) to be executed by a processor of that device.
- the customization server 34 which includes a processor 40 with an associated memory 50 having stored therein processor executable instructions 51, 52, 53, 54 and 55 for configuring the processor 40 to perform, respectively, processes 100, 200, 300, 400 and 500, and an input/output (I/O) interface 42.
- I/O Input output
- computing/communication devices 12, 14, 16, 18 may be similarly provided with a processor, memory and I/O interface.
- processes 100, 200, 300, 400 and 500 may all be implemented on the same device or selectively only on some devices.
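The combination of per-attribute match scores described in the list above (arithmetic mean, geometric mean or weighted mean) can be sketched as follows. This is a minimal illustration only: the function name `combine_scores`, its parameters and the assumption that every score already lies between 0 and 1 are hypothetical and are not taken from the disclosure.

```python
import math


def combine_scores(scores, method="arithmetic", weights=None):
    """Combine per-attribute match scores (assumed in [0, 1]) into one score.

    Hypothetical helper: the disclosure names the three kinds of mean but
    does not prescribe an implementation.
    """
    if not scores:
        raise ValueError("at least one attribute score is required")
    if method == "arithmetic":
        return sum(scores) / len(scores)
    if method == "geometric":
        # A single zero score drives the geometric mean to zero.
        return math.prod(scores) ** (1.0 / len(scores))
    if method == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("weights must match scores in length")
        # Weights could come from a human expert or from a regression fit.
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    raise ValueError(f"unknown method: {method}")
```

The geometric mean penalizes a composition that matches poorly on any single attribute, whereas the weighted mean lets some attribute pairs count for more than others.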
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
A system and method for creating a customized video media, the method comprising acquiring a plurality of images, providing a plurality of musical compositions, prompting a user to select at least one of the plurality of musical compositions and creating the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions.
Description
SYSTEM AND METHOD FOR THE AUTOMATED CUSTOMIZATION
OF AUDIO AND VIDEO MEDIA
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefits of U.S. provisional patent application No. 61/747,085 filed on December 28, 2012, which is herein incorporated by reference.
TECHNICAL FIELD
[0002] The present disclosure relates to a system and method for the automated customization of audio and video media. More specifically, the present disclosure relates to a method and system for the integration of audio messages into audio or video media and/or the integration of audio/video or video messages into video media as well as the integration of text/captions and/or voice messages with user provided images and selected musical compositions.
BACKGROUND
[0003] Songs and music video clip revenues have been declining due to piracy. These revenues are still considered to be acceptable yet they do not represent the full revenue potential. Music labels have been confronted by the question: "Why pay for something when you can get it for free?"
[0004] Thus, there is a need for a solution to this downward spiral in the form of a service providing a novel use of digital media (i.e. songs and music clips) that can be used many times a year by the same person; one that generates revenues at every use, and a service that still manages to protect copyrights.
SUMMARY
[0005] The present disclosure provides a method for creating a customized video media, the method comprising:
acquiring a plurality of images;
providing a plurality of musical compositions;
prompting a user to select at least one of the plurality of musical compositions;
creating the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions.
[0006] There is also provided a method as above, further comprising the step of:
extracting attributes from the plurality of images or the at least one text caption; and
recommending to the user a set of musical compositions based on a matching of the extracted attributes and attributes associated with the plurality of musical compositions.
[0007] There is further provided a method as above, wherein the musical compositions are provided with at least one pre-treated segment for receiving an audio message and associated treatment parameters, the method further comprising the steps of:
acquiring at least one audio message;
applying digital signal processing effects to the at least one audio message in accordance with the treatment parameters;
inserting the processed at least one audio message within the pre-treated segment of at least one of the selected musical compositions; and
mixing the inserted processed at least one audio message and at least one of the selected musical compositions.
[0008] There is further still provided a method as above, wherein the step of acquiring at least one audio message includes the sub-steps of:
acquiring a text message; and
converting the text message into the audio message using a text-to-speech synthesis process.
[0009] The present disclosure provides a system for creating a customized video media, the system comprising:
a database containing a plurality of musical compositions;
a user interface;
a processor operatively connected to the database and the user interface, the processor being so configured so as to:
acquire a plurality of images;
provide a plurality of musical compositions;
prompt a user to select at least one of the plurality of musical compositions;
create the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions.
[0010] There is also provided a system as above, wherein the processor is further configured so as to:
extract attributes from the plurality of images or the at least one text caption; and
recommend to the user a set of musical compositions based on a matching of the extracted attributes and attributes associated with the plurality of musical compositions.
[0011] There is further provided a system as above, wherein the system further comprises:
a recording interface operatively connected to the processor;
wherein the musical compositions are provided with at least one pre-treated segment for receiving an audio message and associated treatment parameters, the processor being further configured so as to:
acquire at least one audio message;
apply digital signal processing effects to the at least one audio message in accordance with the treatment parameters;
insert the processed at least one audio message within the pre-treated segment of at least one of the selected musical compositions; and
mix the inserted processed at least one audio message and at least one of the selected musical compositions.
[0012] There is further still provided a system as above, wherein the processor is further configured so as to, when acquiring at least one audio message:
acquire a text message; and
convert the text message into the audio message using a text-to-speech synthesis process.
[0013] The present disclosure also provides a method and system for customizing an audio/video media provided with at least one pre-treated segment for receiving a message, the customizing including applying digital signal processing effects and/or digital video filters to the message in accordance with treatment parameters selected according to the audio/video media and/or the means of acquiring the message.
BRIEF DESCRIPTION OF THE FIGURES
[0014] Embodiments of the disclosure will be described by way of examples only with reference to the accompanying drawings, in which:
[0015] FIG. 1 is a schematic view of an illustrative example of the network operating environment of the audio/video customization system;
[0016] FIGS. 2A and 2B are a flow diagram of an illustrative example of the audio media customization process;
[0017] FIG. 3 is a flow diagram of an illustrative example of the audio mixing process;
[0018] FIGS. 4A and 4B are a flow diagram of an illustrative example of the video media customization process;
[0019] FIG. 5 is a flow diagram of an illustrative example of the video mixing process;
[0020] FIG. 6 is a flow diagram of an illustrative example of the slide show customization process; and
[0021] FIG. 7 is a schematic representation of an illustrative example of the customization server.
[0022] Similar references used in different Figures denote similar components.
DEFINITIONS
[0023] The detailed description and figures refer to the following terms which are herein defined:
Audio media: includes audio recordings, musical compositions, songs, speeches, spoken words, poems, etc.
Video media: includes music video clips, movies, movie extracts, short films, video commercials, etc. It may have only video content or a combination of audio and video content.
Audio/video media: includes audio only media, video only media or combined audio and video media.
Audio/video message: includes audio only messages, video only messages or combined audio and video messages.
DETAILED DESCRIPTION
[0024] Generally stated, the non-limitative illustrative embodiment of the present disclosure provides a system and method for the automated customization of audio/video media. More specifically, the method and system allow the integration of audio messages into audio or video media and/or the integration of audio and video or video only messages into video media as well as the creation of customized slide shows.
[0025] Referring to FIG. 1, a user using a personal computer 12, laptop computer 14, personal assistant device or tablet 16, mobile phone or smart phone 18 or any other such computing/communication device, on which may run a user interface in the form of a communication software such as, for example, a web browser or an app, may access a communication interface on a gateway server 32 of the audio/video customization system 30 via wide area network (WAN) 20 such as, for example, Ethernet (broadband, high-speed), wireless WiFi, cable Internet, satellite connection, cellular or satellite network, etc.
[0026] Further to the gateway server 32, the audio/video customization system 30 includes a customization server 34, a media database 36 and a customized media database 38, all of which will be detailed further below.
Audio media customization
[0027] The audio/video customization system 30 enables treated voice messages to be integrated into customized audio media such as, for example, musical compositions. The customization involves the insertion of user-generated audio or text-to-voice converted messages within segments of pre-treated audio media, mixing the resulting customized audio media and presenting it as a new audio file in the form of, for example, an MP3 (or any other type of compressed audio file).
[0028] Referring back to FIG. 1, the pre-treated audio media, for example consisting of new or pre-existing musical compositions or songs in which segments have been identified and modified in order to make space for the insertion of future audio messages, are stored in the media database 36. The identified and modified segments are then used by the customization server 34 to allow an administrator of the audio/video customization system 30 to pre-program the positioning and length of each segment allotted for user-generated audio messages. Each pre-treated audio media is typically pre-programmed to receive either one, two or three user-generated audio or text-to-voice converted messages depending on the make-up of the audio media. However, it is to be understood that some audio media may be pre-programmed to receive more than three audio or text-to-voice converted messages.
[0029] Referring now to FIGS. 2A and 2B, there is shown a flow diagram of an illustrative example of the audio media customization process 100 executed by the audio/video customization system 30. The steps of the process 100 are indicated by blocks 102 to 138.
[0030] The process 100 starts at block 102 where a user accesses the gateway server 32 of the audio/video customization system 30 and selects an audio media, for example a song, from the media database 36 he or she wishes to customize. It is to be understood that the gateway server 32 may offer search capabilities, display audio media by categories, artist names, titles, etc.
[0031] At block 104, the user inputs the information of the customized audio media's recipient. This information may be, for example, an email address, phone number, physical location address, etc., and may also include, optionally, a text message intended for the recipient.
[0032] Then, at block 106, the user selects the number of messages to be inserted within the audio media, for example one, two or three. It is to be understood that the number of available segments for the insertion of audio messages may vary depending on the selected audio media or settings of the audio/video customization system 30. If the number of messages to be inserted within the audio media is less than the number of available segments, the user may select which segments are to be filled or the audio/video customization system 30 may select the segments based on, for example, message length.
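One way the system could select segments based on message length is sketched below. It is an assumption-laden illustration, not the disclosed method: the name `assign_segments`, the use of durations in seconds and the best-fit rule (longest message first, tightest segment that still fits) are all hypothetical choices.

```python
def assign_segments(message_lengths, segment_lengths):
    """Return {message_index: segment_index} using a best-fit rule.

    Both arguments are lists of durations in seconds (assumed units).
    Raises ValueError if a message fits in no remaining segment.
    """
    free = set(range(len(segment_lengths)))
    assignment = {}
    # Place the longest messages first so they get the segments they need.
    order = sorted(range(len(message_lengths)), key=lambda i: -message_lengths[i])
    for m in order:
        fitting = [s for s in free if segment_lengths[s] >= message_lengths[m]]
        if not fitting:
            raise ValueError(f"message {m} does not fit in any free segment")
        # Tightest fit wins; the index breaks ties deterministically.
        best = min(fitting, key=lambda s: (segment_lengths[s], s))
        assignment[m] = best
        free.remove(best)
    return assignment
```

Note that this rule may place messages out of their original order within the song; a variant that preserves order would need a different selection strategy.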
[0033] At block 108, the user is asked to input his or her payment information. This may be through a credit card, Paypal™ or any other suitable payment method. This step may also include the verification of the payment before proceeding further.
[0034] Once the payment has been effectuated, the user is asked, at block 110, if he or she wishes to provide his or her message(s) either by voice or by text.
[0035] If the user has selected to provide voice message(s), the process proceeds to block 112.
[0036] At block 112, the user is asked to select the means for providing his or her voice message(s). The user may select to provide the voice message(s) by phone or through a web interface. In an alternative embodiment, the user may be given the option of providing the voice message(s) as an MP3 (or any other type of compressed audio file), through email or even by mail on a CD or other digital medium.
[0037] If the user has selected to provide voice message(s) by phone, he or she is provided with, at block 114, a phone number to dial along with a pin associated with the selected audio media and recipient information.
[0038] If the user has selected to provide the voice message(s) through a web interface, he or she is directed, at block 116, to a recording interface, for example a web recording page on the gateway server 32, which includes, for example, a java audio engine. Then, at block 118, the recording interface allows the verification of the user's computer 12, 14 microphone levels in order to prevent distortion in the recording.
[0039] At block 120, the voice message(s) is recorded using either the user's land phone, mobile phone or smart phone 18 (used as a microphone only in this case) through a telecommunication network 25 (land line, cellular network, etc.) or IP telephony, or the computer 12, 14 microphone through the WAN 20, depending on the selected means of providing the voice message(s). The time allotted depends on the chosen audio media and the number of messages to be inserted within the audio media. An audible beep may be used as a warning of the end of the allotted time, for example five seconds before the end, or an on-screen time bar may be used to give a visual indication of the remaining allotted time, depending once more on the selected means of providing the voice message(s).
[0040] If the user has selected to provide text message(s), the process proceeds to block 122.
[0041] At block 122, the user is asked to select the means for providing his or her text message(s). The user may select to provide the text message(s) by phone or through a web interface. In an alternative embodiment, the user may be given the option of providing the text message(s) as a TXT file (or any other type of text file), through email or even by mail on a CD or other digital medium.
[0042] If the user has selected to provide text message(s) by phone, he or she is provided with, at block 124, a phone number to send a text message to along with a pin associated with the selected audio media and recipient information. The user is able to text the text message(s) using his or her mobile phone or smart phone 18 through a telecommunication network 25 (land line, cellular network, etc.) or IP telephony.
[0043] If the user has selected to provide the text message(s) through a web interface, he or she is directed, at block 126, to a text input interface, for example a web page with a text input box on the gateway server 32.
[0044] At block 128, the text message(s) is converted into voice using a text-to-speech synthesis process, which processes are well known in the art. The audio/video customization system 30 may provide the user with a selection of voice types (e.g. US male) for the synthesis. In addition, the audio/video customization system 30 may also recommend one or more voice types that best match the mood or the content of the audio media. The matching can be performed, for example, based on tags (e.g. "fast", "quiet") associated with the voice type and the audio media.
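The tag-based matching of voice types to audio media can be illustrated with a short sketch. The disclosure only says that matching may be based on tags, so the Jaccard overlap used here, the function name `recommend_voices` and the catalog layout are assumptions made for illustration.

```python
def recommend_voices(media_tags, voice_catalog, top_n=1):
    """Rank voice types by Jaccard overlap between their tags and the media's tags.

    voice_catalog maps a voice type name to its list of tags (hypothetical layout).
    """
    media = set(media_tags)
    scored = []
    for name, tags in voice_catalog.items():
        voice = set(tags)
        union = media | voice
        score = len(media & voice) / len(union) if union else 0.0
        scored.append((score, name))
    # Highest score first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]
```

For example, with a catalog containing a "US male" voice tagged "fast" and a second, hypothetical voice tagged "quiet", a song tagged "quiet" would rank the second voice first.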
[0045] At block 130, the user can verify the voice message(s) and, at block 132, accept or refuse the voice message(s). If the user refuses the voice message(s), the process 100 returns to either of blocks 114, 116, 124 or 126, depending on the means used to provide the voice/text message(s), where a new voice message(s) is recorded or produced from text. If the voice message(s) is accepted, the process 100 proceeds to block 134 where the voice message(s) is mixed with the chosen audio media. The audio mixing process, which is executed by the customization server 34, will be further detailed below.
[0046] Once the recorded audio message(s) and the chosen audio media have been mixed, the mixed audio media (i.e. customized audio media) is saved, at block 136, as, for example, an MP3 file in the customized media database 38.
[0047] Finally, at block 138, depending on the recipient information entered at block 104, the customized audio media is provided to the intended recipient, for example, by email, on a CD through regular mail, as a link to the customized audio media in the customized media database 38 or any other transmission means.
[0048] It is to be understood that in an alternative embodiment some or all of steps 102 to 130 may be part of an app for a personal assistant device or tablet 16 or a smart phone 18, or other such device, in which case the recording and/or text input interface is provided by the app. It is also to be understood that in various alternative embodiments the messages provided to the audio/video customization system 30 can be either audio or text to be converted to audio, or a combination thereof, and that these messages may be inputted into the audio/video customization system 30 using various means or interfaces or combination thereof. Therefore, depending on the specific embodiment, some of blocks 110 to 128 may be optional.
Audio mixing
[0049] Referring to FIG. 3, there is shown a flow diagram of an illustrative example of the audio mixing process 200 executed by the customization server 34 at block 134 of the audio media customization process 100 (see FIGS. 2A and 2B). The steps of the process 200 are indicated by blocks 202 to 218.
[0050] The audio mixing process 200 automatically processes the voice message(s) through audio digital signal processing (DSP) effects so that the voice message(s) sounds like it was recorded in a recording studio prior to being integrated into the audio media. This gives the final product, i.e. the customized audio media, a "professionally produced" sound.
[0051] The process 200 starts at block 202, where the voice message(s) is equalized and then, at block 204, compressed in order to regulate its volume. Noise reduction is then applied, at block 206, to reduce background noise, followed by, at block 208, a noise gate to mute moments of silence.
[0052] At block 210, reverb is applied to add different room ambiences and, at block 212, fading such as very fast fades at the beginning and the ending of the recorded audio message(s) in order to prevent pops and clicks.
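Two of the DSP steps above, the noise gate of block 208 and the fast fades of block 212, can be sketched in a few lines. This is a simplified illustration under stated assumptions: samples are floats in the range -1.0 to 1.0, the gate works on fixed-size frames using their peak level, and the fades are linear. The function names, thresholds and frame sizes are hypothetical; a production chain would also include the equalization, compression, noise reduction and reverb steps.

```python
def noise_gate(samples, threshold=0.02, frame=256):
    """Mute every frame whose peak amplitude stays below the threshold."""
    out = []
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        peak = max(abs(s) for s in chunk)
        out.extend(chunk if peak >= threshold else [0.0] * len(chunk))
    return out


def apply_fades(samples, fade_len=64):
    """Apply a linear fade-in and fade-out to avoid pops and clicks."""
    n = len(samples)
    fade_len = min(fade_len, n // 2)
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len
        out[i] *= gain           # ramp up from silence
        out[n - 1 - i] *= gain   # ramp down to silence
    return out
```

Starting and ending the message at zero amplitude is what prevents the audible click that an abrupt jump in sample value would otherwise produce at the insertion points.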
[0053] Following the DSP effects, at block 214, the processed voice message(s) is inserted into the pre-determined segments of the pre-treated audio media. The processed voice message(s) is strategically placed in the allotted time segment(s) depending on the length of the message(s). If the user has not used up all of the time available for his or her message(s), the process 200 automatically places the processed voice message(s) at the end of the time allotted segment(s) in order to maximize the "professionally produced" effect.
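The end-of-segment placement rule can be expressed as a small calculation. The sketch below assumes times in seconds and a hypothetical helper name, `placement_offset`; it only illustrates the arithmetic of right-aligning a short message within its allotted segment.

```python
def placement_offset(segment_start, segment_length, message_length):
    """Return the start time that places a message at the end of its segment.

    All values are in seconds (assumed units).
    """
    if message_length > segment_length:
        raise ValueError("message is longer than the allotted segment")
    # Unused time is left at the beginning of the segment, not the end.
    return segment_start + (segment_length - message_length)
```

With a 10-second segment starting at 30 seconds and a 6.5-second message, the message would start at 33.5 seconds and end exactly where the segment ends.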
[0054] Once the processed voice message(s) is integrated into the pre-treated audio media, audio encoding compression is applied, at block 216, to optimize portability, for example into an MP3 file, which is then, at block 218, provided to block 136 of process 100 (see FIGS. 2A and 2B).
Video media customization
[0055] The audio/video customization system 30 enables treated audio/video messages to be integrated into customized video media such as, for example, music video clips. The customization involves the insertion of user-generated audio/video messages within segments of pre-treated video compositions, mixing the resulting customized audio/video media and presenting it as a new video file in the form of any type of compressed video file.
[0056] Referring back to FIG. 1, the pre-treated musical compositions, which consist of new or pre-existing video clips in which segments have been identified and modified in order to make space for the insertion of future audio/video messages, are stored in the media database 36. The identified and modified segments are then used by the customization server 34 to allow an administrator of the audio/video customization system 30 to pre-program the positioning and length of each segment allotted for user-generated audio/video messages. Each video media is typically pre-programmed to receive either one, two or three user-generated audio/video messages depending on various factors, for example the musical make-up of a music video clip. However, it is to be understood that some video media may be pre-programmed to receive more than three audio/video messages.
[0057] Referring now to FIGS. 4A and 4B, there is shown a flow diagram of an illustrative example of the video media customization process 300 executed by the audio/video customization system 30. The steps of the process 300 are indicated by blocks 302 to 324.
[0058] The process 300 starts at block 302 where a user accesses the gateway server 32 of the audio/video customization system 30 and selects a video media, for example a music video clip, from the media database 36 he or she wishes to customize. It is to be understood that the gateway server 32 may offer search capabilities, display music video media by categories, artist names, music video clip titles, etc.
[0059] At block 304, the user inputs the information of the customized music video clip's recipient. This information may be, for example, an email address, phone number, physical location address, etc. and may also include, optionally, a text message intended for the recipient.
[0060] Then, at block 306, the user selects the number of messages to be inserted within the video media, for example one, two or three. It is to be understood that the number of available segments for the insertion of audio/video messages may vary depending on the selected video media or settings of the audio/video customization system 30. If the number of messages to be inserted within the video media is less than the number of available segments, the user may select which segments are to be filled or the audio/video customization system 30 may select the segments based on, for example, message length.
[0061] At block 308, the user is asked to input his or her payment information. This may be through a credit card, Paypal™ or any other suitable payment method. This step may also include the verification of the payment before proceeding.
[0062] Then, at block 310, a recording interface is provided to the user, for example a web recording page on the gateway server 32, which includes, for example, a java audio/video engine. In an alternative embodiment, the user may be given the option of providing the audio/video message(s) as a video file in the form of any type of compressed video file, through email or even by mail on a DVD or other digital medium. In a further alternative embodiment, the user may be given the option of providing his or her message(s) either by voice or by text, in which case steps similar to steps 110 to 130 of process 100 (see FIGS. 2A and 2B) are performed instead of steps 312 to 316.
[0063] At block 312, the recording interface allows the verification of the user's computer 12, 14 microphone levels, in order to prevent distortion in the recording, and/or video camera 13 picture quality. At block 314, the audio/video message(s) is recorded using the user's computer 12, 14 microphone and video camera 13. The time allotted depends on the chosen video media and the number of messages to be inserted within the video media. An on-screen time bar may be used to give a visual indication of the remaining allotted time.
[0064] At block 316, the user can verify the recorded audio/video message(s) and, at block 318, accept or refuse the recorded audio/video message(s). If the user refuses the recorded audio/video message(s), the process 300 returns to block 314 where a new audio/video message(s) is recorded. If the recorded audio/video message(s) is accepted, the process 300 proceeds to block 320 where the recorded audio/video message(s) is mixed with the chosen video media. The video mixing process, which is executed by the customization server 34 will be further detailed below.
[0065] Once the recorded audio/video message(s) and the chosen video media have been mixed, the mixed video media (i.e. customized video media) is saved, at block 322, as a video file in the customized media database 38.
[0066] Finally, at block 324, depending on the recipient information entered at block 304, the customized video media is provided to the intended recipient, for example, by email, on a DVD through regular mail, as a link to the customized video media in the customized media database 38 or any other transmission means.
[0067] It is to be understood that the video media may be in the form of music video clips, movie extracts, short films, video commercials or other video media. Furthermore, it is to be understood that the audio/video message(s) may be either audio only, video only or combined audio and video.
Video mixing
[0068] Referring to FIG. 5, there is shown a flow diagram of an illustrative example of the video mixing process 400 executed by the customization server 34 at block 320 of the audio/video media customization process 300 (see FIGS. 4A and 4B). The steps of the process 400 are indicated by blocks 402 to 422.
[0069] The video mixing process 400 automatically processes the audio portion of the recorded audio/video message(s) through DSP effects so that the audio portion of the recorded audio/video message(s) sounds like it was recorded in a recording studio prior to being integrated into the music video clip. This gives the final product, i.e. the customized music video clip, a "professionally produced" sound.
[0070] The process 400 starts at block 402, where the audio portion of the recorded audio/video message(s) is equalized and then, at block 404, compressed in order to regulate the volume. Noise reduction is then applied, at block 406, to reduce background noise, followed by, at block 408, a noise gate to mute moments of silence.
[0071] At block 410, reverb is applied to add different room ambiences and, at block 412, fading such as very fast fades at the beginning and the ending of the audio portion of the recorded audio/video message(s) in order to prevent pops and clicks.
[0072] The video mixing process 400 then automatically processes the video portion of the recorded audio/video message(s) through digital video filters in order to obtain optimal video quality prior to being integrated into the video media. This gives the final product, i.e. the customized video media, a "professionally produced" look.
[0073] At block 414, the brightness and contrast of the video portion of the recorded audio/video message(s) are adjusted and, at block 416, grain reduction is applied.
[0074] Following the DSP effects and digital video filters, at block 418, the processed audio/video message(s) is inserted into the pre-determined segments of the pre-treated video media. The processed audio/video message(s) is strategically placed in the allotted time segment depending on the length of the message. If the user has not used up all of the time available for his or her message(s), the process 400 automatically places the processed audio/video message(s) at the end of the time allotted segment in order to maximize the "professionally produced" effect.
[0075] Once the processed audio/video message(s) is integrated into the pre-treated video media, the end of each video portion of the audio/video message(s) is automatically dissolved back to the video media. Video encoding compression is then applied, at block 420, to optimize portability, which is then, at block 422, provided to block 322 of process 300 (see FIGS. 4A and 4B). It is to be understood that the audio portion of the video may be first extracted in order to perform blocks 402 to 418 solely on the audio portion after which the processed audio portion is recombined with the video at block 420.
Slide Show Customization
[0076] The audio/video customization system 30 enables the integration of text/captions and/or voice messages with user provided images and selected musical compositions. The customization involves the user providing a collection of images, for example photos taken on a trip, and introductory text/captions, and using them to create a "slide show", which consists of a series of visual transitions of the images and the audio of one or more musical compositions synchronized with the images. The user may select the musical compositions to use for a given subset of the images. Additionally, the audio/video customization system 30 may recommend one or more musical compositions that best match the images based on a variety of attributes. The audio/video customization system 30 may further enable treated voice messages to be integrated into the musical compositions. This involves the insertion of user-generated audio or text-to-voice converted messages within segments of pre-treated musical compositions and mixing the resulting customized musical compositions before their synchronization with the images.
[0077] Referring to FIG. 6, there is shown a flow diagram of an illustrative example of the slide show customization process 500 executed by the audio/video customization system 30. The steps of the process 500 are indicated by blocks 502 to 520.
[0078] The process 500 starts at block 502 where a user accesses the gateway server 32 of the audio/video customization system 30 and is asked to input a collection of images, for example through an upload window accessing images stored on the user's personal computer 12, laptop computer 14, personal assistant device or tablet 16, mobile phone or smart phone 18.
[0079] At block 504, the user is asked to input an introductory text and/or captions to be associated with the collection of and/or individual images.
[0080] At block 506, information is extracted from each image, for example the location where the image was taken (e.g. using the GPS metadata produced by GPS enabled cameras), the color composition of the image (e.g. day time or night time based on pixel color spectrum), person(s) identified in the image (e.g. using face detection and facial recognition processes), etc.
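By way of illustration only, the day time/night time determination mentioned for block 506 could be sketched as follows. The disclosure does not specify how the pixel color spectrum is evaluated; this sketch assumes a simple average-brightness test. The function names `mean_luminance` and `classify_day_night` and the threshold value are hypothetical.

```python
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def mean_luminance(pixels: Iterable[RGB]) -> float:
    """Average perceived brightness (0-255) using the Rec. 601 luma weights."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return total / count


def classify_day_night(pixels: Iterable[RGB], threshold: float = 80.0) -> str:
    """Label an image 'night-time' when it is dark on average.

    The threshold of 80 is an arbitrary illustrative value (an assumption),
    not a value taken from the disclosure.
    """
    return "night-time" if mean_luminance(pixels) < threshold else "day-time"


if __name__ == "__main__":
    dark_image = [(10, 10, 30)] * 4
    bright_image = [(200, 220, 255)] * 4
    print(classify_day_night(dark_image), classify_day_night(bright_image))
```

A practical system would more likely combine such a brightness measure with the image timestamp, as suggested by the image attributes described further below.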
[0081] Then, at block 508, the audio/video customization system 30 recommends musical compositions to be used for the image collection based on the information extracted at block 506 and the introductory text/captions inputted at block 504, compared to metadata associated with the musical compositions (e.g. city name, mood, season, etc.) as well as the lyrics of the musical compositions. The recommendation may also take into account information about the user and his or her interests, for example extracted from a profile on a social network, or similar information from persons identified in the images. An example of a musical compositions recommendation scheme will be further detailed below. Optionally, the user may also be allowed to select his or her own musical compositions, for example by providing musical compositions search capabilities.
[0082] Once the user selects, at block 510, one or more musical composition(s), he or she may be provided with, at block 512, the ability to customize the selected musical composition(s). It is to be understood that this step may be optional.
[0083] If the user elects to customize the one or more musical composition(s), the process 500 proceeds to block 514 where the audio media process, which was previously described, is performed.
[0084] At block 516, optionally, the user may be allowed to set desired transition effects between the various images of the image collection.
[0085] At block 518, the image to video conversion is performed, taking the collection of images and introductory text/captions, and producing a video where each image is shown for a given duration, transitioning with a predetermined effect (for example fade-in/out) or with desired effects if so selected at block 516. The video and audio (i.e. the musical composition(s) or customized musical composition(s)) are synchronized by adding the audio track to the video at defined time points.
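As a non-limiting sketch, the timing logic of block 518 could be expressed as below: each image is assigned a display interval of fixed duration, and each musical composition is cued at the start time of a chosen image. The names `Slide`, `build_timeline` and `audio_cue_points`, the four-second default duration and the file names are assumptions made for illustration; actual rendering and encoding of the video are outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Slide:
    image: str
    start: float       # seconds from the start of the video
    end: float         # seconds from the start of the video
    transition: str    # effect used when moving to the next image


def build_timeline(images: List[str], seconds_per_image: float = 4.0,
                   transition: str = "fade") -> List[Slide]:
    """Give every image the same display duration, in input order."""
    timeline = []
    for index, image in enumerate(images):
        start = index * seconds_per_image
        timeline.append(Slide(image, start, start + seconds_per_image, transition))
    return timeline


def audio_cue_points(timeline: List[Slide],
                     track_start_slide: Dict[str, int]) -> Dict[str, float]:
    """Map each track to the time point at which it is added to the video."""
    return {track: timeline[index].start
            for track, index in track_start_slide.items()}


if __name__ == "__main__":
    slides = build_timeline(["a.jpg", "b.jpg", "c.jpg"])
    print(audio_cue_points(slides, {"song1": 0, "song2": 2}))
```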
[0086] Finally, at block 520, the assembled slide show with its musical composition(s) is provided to the user.
[0087] It is to be understood that the slide show customization process 500 may include, in an alternative embodiment, steps for providing an intended recipient and payment information, and for providing the slide show to the intended recipient, for example, by email, on a DVD through regular mail, as a link to the slide show in the customized media database 38 or any other transmission means.
[0088] Conversely, in alternative embodiments of the audio media customization 100 and video media customization 300 processes, the steps for providing an intended recipient and payment information, and for providing the slide show to the intended recipient may be omitted.
Musical Compositions Recommendation
[0089] Attributes are extracted from the inputted text/captions and images, as well as from the user. The text/image/user attributes are then matched against the attributes of each musical composition. The result of the match is a single numeric score, which is then used to rank the musical compositions.
[0090] The audio/video customization system 30 presents the top-N (e.g. N = 3) choices for the user to choose based on a ranked list of musical compositions.
[0091] The overall match score is a combination of the match scores from each pair of compatible attributes. The combination can be based on the arithmetic mean or the geometric mean of the attribute scores. Alternatively, it can also be a weighted mean of the scores, where the weights are either set by a human expert, or they are computed based on the regression analysis on a collection of samples that are previously scored by human editors.
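For illustration, the combination of pair-wise scores into an overall match score and the top-N selection could be sketched as follows. The function names `combine_scores` and `top_n` and the sample scores are hypothetical; the weights would come from a human expert or a regression analysis as described above, which is not reproduced here.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def combine_scores(scores: Sequence[float], method: str = "arithmetic",
                   weights: Optional[Sequence[float]] = None) -> float:
    """Combine pair-wise attribute scores (each in [0, 1]) into one score."""
    if not scores:
        raise ValueError("at least one attribute score is required")
    if method == "arithmetic":
        return sum(scores) / len(scores)
    if method == "geometric":
        # Any zero pair-wise score drives the geometric mean to zero.
        return math.prod(scores) ** (1.0 / len(scores))
    if method == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("weights must match scores in length")
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    raise ValueError(f"unknown method: {method}")


def top_n(overall: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """Rank musical compositions by overall score and keep the best n."""
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    overall = {
        "composition A": combine_scores([0.2, 0.8]),
        "composition B": combine_scores([0.9, 0.7]),
        "composition C": combine_scores([0.1, 0.3]),
        "composition D": combine_scores([0.6, 0.6]),
    }
    print(top_n(overall, n=3))
```

One consequence of the choice of mean is worth noting: with the geometric mean, a composition that scores zero on any single attribute pair is ranked last, whereas the arithmetic and weighted means allow strong attributes to compensate for weak ones.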
[0092] The text attributes include:
T1: words in the text, original and stemmed, plus the bigrams.
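A minimal sketch of how the T1 attribute could be built is given below. The disclosure does not name a stemming algorithm; the suffix-stripping function `naive_stem` is a deliberately crude stand-in (an assumption), and a production system would more likely use an established stemmer. The names `tokenize` and `text_attributes` are likewise hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def naive_stem(word: str) -> str:
    """Strip a few common English suffixes (illustrative stand-in only)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def text_attributes(text: str) -> List[str]:
    """Return original words, their stems, and the word bigrams, in that order."""
    words = tokenize(text)
    stems = [naive_stem(word) for word in words]
    bigrams = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return words + stems + bigrams


if __name__ == "__main__":
    print(text_attributes("Sunny days"))
```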
[0093] The image attributes include:
I1: time of day, which can be derived from the image's timestamp (e.g. in the EXIF metadata) and the time zone information (if available);
I2: geo-location of the image (e.g. in the EXIF metadata);
I3: country and city names of the image, derived from I2, using a lookup database (many are available commercially);
I4: color histogram of the image; and
I5: "classes" of the image (e.g. night-time, quiet, vibrant, sunny, foggy, etc.), derived from I1 and I4. The classes are computed based on a previously trained model built from previously classified images (by human editors) using a statistical classifier such as a decision tree, or a large margin classifier such as SVM (Support Vector Machine).
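The attributes I1 and I4 that feed the classifier could, for example, be computed as sketched below. The hour boundaries used by `time_of_day` and the number of bins used by `color_histogram` are assumptions chosen for illustration; reading the timestamp from EXIF metadata and training the classifier for I5 are not shown.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Optional, Tuple

RGB = Tuple[int, int, int]


def time_of_day(timestamp: datetime,
                utc_offset_hours: Optional[float] = None) -> str:
    """I1: bucket a timestamp into a coarse time-of-day label.

    If the timestamp is timezone-aware and an offset is supplied, it is first
    converted to local time; otherwise it is assumed to be local already.
    """
    if utc_offset_hours is not None and timestamp.tzinfo is not None:
        local_zone = timezone(timedelta(hours=utc_offset_hours))
        timestamp = timestamp.astimezone(local_zone)
    hour = timestamp.hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


def color_histogram(pixels: Iterable[RGB],
                    bins_per_channel: int = 4) -> Dict[Tuple[int, int, int], float]:
    """I4: normalized histogram over quantized (r, g, b) bins."""
    step = 256 // bins_per_channel
    top = bins_per_channel - 1
    counts = Counter(
        (min(r // step, top), min(g // step, top), min(b // step, top))
        for r, g, b in pixels
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image has no pixels")
    return {bin_key: count / total for bin_key, count in counts.items()}


if __name__ == "__main__":
    print(time_of_day(datetime(2013, 7, 1, 23, 30)))
    print(color_histogram([(0, 0, 0), (255, 255, 255)]))
```

The label and the histogram values would then form the feature vector passed to the trained decision tree or SVM.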
[0094] The user attributes include:
U1: the genre of music the user likes. The information can be obtained using a user interface element. Alternatively, the audio/video customization system 30 may obtain the information from a social network profile of the user.
[0095] The musical compositions attributes include:
C1: classes assigned by human editors, as in I5;
C2: single and double word tags (e.g., "birthday", "love", "rock") assigned by human editors;
C3: words in the lyrics and the title, original and stemmed, plus the bigrams; and
C4: location tags, which are the country and city names that the song describes, if any.
[0096] The pair-wise attribute match scores are computed as follows:
- between C1 and I5: the number of common classes, normalized to a value between 0 and 1;
- between C2, C3 and T1, U1: the score is calculated based on TF-IDF and cosine similarity, which are commonly used for text matching with the bag-of-words model. The score is normalized to a value between 0 and 1;
- between C4 and I3: the number of common locations, normalized to a value between 0 and 1; and
- all other attribute pairs: zero.
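The pair-wise scores listed above could be sketched as follows. Two points are assumptions: the disclosure does not state how the number of common classes or locations is normalized, so `overlap_score` divides by the size of the union of both sets; and `tfidf_vectors` uses one common smoothed variant of the inverse document frequency. All function names and sample tags are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def overlap_score(a: Set[str], b: Set[str]) -> float:
    """Number of common items, normalized to [0, 1] by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf_vectors(documents: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector per bag-of-words document."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    caption, song_a, song_b = tfidf_vectors([
        ["birthday", "love"],    # user text (T1)
        ["birthday", "party"],   # tags/lyrics of one composition (C2, C3)
        ["rain", "night"],       # tags/lyrics of another composition
    ])
    print(cosine_similarity(caption, song_a), cosine_similarity(caption, song_b))
    print(overlap_score({"sunny", "vibrant"}, {"sunny", "quiet"}))
```

Because TF-IDF weights are non-negative, the cosine similarity already lies between 0 and 1, so no further normalization is needed for the text-based pairs in this sketch.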
[0097] It is to be understood that other musical compositions recommendation schemes may be used with the same or different set of attributes.
[0098] In alternative embodiments of the above described processes, whenever text message(s) are converted into voice using a text-to-speech synthesis process, the audio/video customization system 30 may provide the user with a selection of voice types (e.g. US male) for the synthesis. In addition, the audio/video customization system 30 may also recommend one or more voice types that best match the mood or the content of the audio media or musical composition. The matching can be performed, for example, based on tags (e.g. "fast", "quiet") associated with the voice type and the audio media or musical composition.
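As one possible illustration of the tag-based matching, voice types could be ranked by the number of tags they share with the audio media or musical composition. The function name `recommend_voice_types`, the tie-breaking by name, and the voice types and tags other than those quoted above are hypothetical.

```python
from typing import Dict, List, Set


def recommend_voice_types(media_tags: Set[str],
                          voice_tags: Dict[str, Set[str]],
                          limit: int = 2) -> List[str]:
    """Return up to `limit` voice types sharing at least one tag with the media."""
    scored = [(len(media_tags & tags), name) for name, tags in voice_tags.items()]
    # Highest number of shared tags first; ties are broken alphabetically.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0][:limit]


if __name__ == "__main__":
    voices = {
        "US male": {"fast", "energetic"},
        "UK female": {"quiet", "calm"},
        "US female": {"quiet", "fast"},
    }
    print(recommend_voice_types({"quiet", "calm"}, voices))
```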
[0099] It is to be understood that the audio/video customization system 30 may also be accessed via mobile phones and smart phones (including Blackberry™, SymbianOS™, iPhone™, Windows Mobile™, Google Android™ and any other such system/device), in which case the gateway server 32 may also include a specifically created graphical user interface.
[00100] It is also to be understood that although throughout the disclosure reference is made to separate servers 32 and 34 as well as separate databases 36 and 38, these may be implemented on one or more physical devices and/or may be combined. It is further to be understood that either, some or all of the separate servers 32 and 34 and databases 36 and 38 may be implemented on one or more of the computing/communication devices 12, 14, 16, 18, for example as an app.
[00101] Further still, it is to be understood that the above-described processes (i.e. processes 100, 200, 300, 400 and 500) may be implemented individually or collectively as processor executable code stored within a memory of an associated device (i.e. customization server 34 and/or computing/communication devices 12, 14, 16, 18) to be executed by a processor of that device.
[00102] Referring to Fig. ?, there is shown an illustrative example of the customization server 34 which includes a processor 40 with an associated memory 50 having stored therein processor executable instructions 51, 52, 53, 54 and 55 for configuring the processor 40 to perform, respectively, processes 100, 200, 300, 400 and 500, and an input/output (I/O) interface 42. It is to be understood that computing/communication devices 12, 14, 16, 18 may be similarly provided with a processor, memory and I/O interface. It is to be further understood that processes 100, 200, 300, 400 and 500 may all be implemented on the same device or selectively only on some devices.
[00103] Although the present disclosure has been described with a certain degree of particularity and by way of illustrative embodiments and examples thereof, it is to be understood that the present disclosure is not limited to the features of the embodiments described and illustrated herein, but includes all variations and modifications within the scope and spirit of the disclosure as hereinafter claimed.
Claims
1. A method for creating a customized video media, the method comprising:
acquiring a plurality of images;
providing a plurality of musical compositions;
prompting a user to select at least one of the plurality of musical compositions;
creating the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions.
2. A method in accordance with claim 1, further comprising the step of:
acquiring at least one text caption associated with one of the plurality of images;
wherein the step of creating the customized video media includes mixing the at least one text caption with the acquired plurality of images and the selected plurality of musical compositions.
3. A method in accordance with claim 2, further comprising the step of:
extracting attributes from the plurality of images or the at least one text caption; and
recommending to the user a set of musical compositions based on a matching of the extracted attributes and attributes associated with the plurality of musical compositions.
4. A method in accordance with claim 3, wherein the extracted attributes are selected from a group consisting of an image geo-location metadata, an image color composition and a person identified by a facial recognition process.
5. A method in accordance with claim 3, further comprising the step of:
extracting attributes from a profile of the user or a person identified in one of the plurality of images by a facial recognition process on a social network; wherein the matching is further based on the profile extracted attributes.
6. A method in accordance with any of claims 1 to 5, wherein the musical compositions are provided with at least one pre-treated segment for receiving an audio message and associated treatment parameters, the method further comprising the steps of:
acquiring at least one audio message;
applying digital signal processing effects to the at least one audio message in accordance with the treatment parameters;
inserting the processed at least one audio message within the pre-treated segment of at least one of the selected musical compositions; and mixing the inserted processed at least one audio message and at least one of the selected musical compositions.
7. A method in accordance with claim 6, wherein the step of acquiring at least one audio message includes the sub-steps of:
acquiring a text message; and
converting the text message into the audio message using a text-to-speech synthesis process.
8. A method in accordance with claim 6, wherein the sub-step converting the text message into the audio message further includes providing one or more voice selection associated with at least one of the selected musical compositions.
9. A method in accordance with any of claims 1 to 8, wherein the step of creating the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions includes applying a transition effect between the acquired plurality of images.
10. A method in accordance with claim 9, further comprising the step of:
prompting the user to select a transition effect from a plurality of transition effects.
11. A system for creating a customized video media, the system comprising:
a database containing a plurality of musical compositions;
a user interface;
a processor operatively connected to the database and the user interface, the processor being so configured so as to:
acquire a plurality of images;
provide a plurality of musical compositions;
prompt a user to select at least one of the plurality of musical compositions;
create the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions.
12. A system in accordance with claim 11, wherein the processor is further configured so as to:
acquire at least one text caption associated with one of the plurality of images;
wherein creating the customized video media includes mixing the at least one text caption with the acquired plurality of images and the selected plurality of musical compositions.
13. A system in accordance with claim 12, wherein the processor is further configured so as to:
extract attributes from the plurality of images or the at least one text caption; and
recommend to the user a set of musical compositions based on a matching of the extracted attributes and attributes associated with the plurality of musical compositions.
14. A system in accordance with claim 13, wherein the extracted attributes are selected from a group consisting of an image geo-location metadata, an image color composition and a person identified by a facial recognition process.
15. A system in accordance with claim 13, wherein the processor is further configured so as to:
extract attributes from a profile of the user or a person identified in one of the plurality of images by a facial recognition process on a social network; wherein the matching is further based on the profile extracted attributes.
16. A system in accordance with any of claims 11 to 15, wherein the system further comprises:
a recording interface operatively connected to the processor;
wherein the musical compositions are provided with at least one pre-treated segment for receiving an audio message and associated treatment parameters, the processor being further configured so as to:
acquire at least one audio message;
apply digital signal processing effects to the at least one audio message in accordance with the treatment parameters;
insert the processed at least one audio message within the pre-treated segment of at least one of the selected musical compositions; and mix the inserted processed at least one audio message and at least one of the selected musical compositions.
17. A system in accordance with claim 16, wherein the processor is further configured so as to, when acquiring at least one audio message:
acquire a text message; and
convert the text message into the audio message using a text-to-speech synthesis process.
18. A system in accordance with claim 16, wherein the processor is further configured so as to, when converting the text message into the audio message: provide one or more voice selection associated with at least one of the selected musical compositions.
19. A system in accordance with any of claims 11 to 18, wherein the processor is further configured so as to, when creating the customized video media by mixing the acquired plurality of images and the selected plurality of musical compositions:
apply a transition effect between the acquired plurality of images.
20. A system in accordance with claim 19, wherein the processor is further configured so as to:
prompt the user to select a transition effect from a plurality of transition effects.
21. A method for customizing an audio/video media, the method comprising:
a. providing the audio/video media with a pre-treated segment for receiving a message that includes an audio portion, the audio/video media having associated treatment parameters;
b. acquiring the message;
c. applying digital signal processing effects to the audio portion of the message in accordance with the treatment parameters;
d. inserting the processed message within the pre-treated segment of the audio/video media in a position within the pre-treated segment depending on the length of the processed message; and
e. creating a customized audio/video media by mixing the inserted processed message and audio/video media.
22. A method in accordance with claim 21, wherein step a. includes the sub-steps of:
a1. displaying a list of available audio/video media with a pre-treated segment for receiving a message;
a2. prompting a user to select a listed audio/video media;
a3. providing the selected audio/video media.
23. A method in accordance with claim 21, wherein step b. includes the sub-steps of:
b1. prompting a user to provide a message;
b2. providing the user with a message recording interface; and
b3. acquiring the message.
24. A method in accordance with claim 21, further comprising:
f. prompting a user to input information related to an intended recipient; and
g. providing the customized media to the intended recipient.
25. A method in accordance with claim 21, further comprising:
f. prompting a user to input payment information.
26. A method in accordance with claim 21, wherein the audio/video media includes a plurality of pre-treated segments and wherein a plurality of messages are acquired, each message being associable with one of the plurality of pre-treated segments.
27. A method in accordance with claim 26, wherein each message is automatically associated with one of the plurality of pre-treated segments.
28. A method in accordance with claim 27, wherein each message is automatically associated with one of the plurality of pre-treated segments based on the length of each of the plurality of messages and the length of each of the plurality of pre-treated segments.
29. A method in accordance with claim 21, wherein the treatment parameters are selected in accordance with the means of acquiring the message.
30. A method in accordance with claim 21, wherein the message is a text message and further comprising the step of converting the text message into audio using a text-to-speech synthesis process.
31. A method in accordance with claim 30, wherein the step of converting the text message into audio further includes providing one or more voice selection associated with the audio/video media.
32. A method in accordance with claim 21, wherein the audio/video media is a video media and the message is a video message, the method further comprising the step of applying digital video filters to the message in accordance with the treatment parameters before inserting the message into the pre-treated segment of the audio/video media, wherein the treatment parameters include video treatment parameters selected in accordance with one of the following: the audio/video media or the means of acquiring the message.
33. A method in accordance with claim 21, wherein the audio/video media is an audio and video media and the message is an audio and video message, the method further comprising the step of applying digital signal processing effects to an audio portion of the message and applying digital video filters to a video portion of the message in accordance with the treatment parameters before inserting the message into the pre-treated segment of the audio/video media, wherein the treatment parameters include video treatment parameters selected in accordance with one of the following: the audio/video media or the means of acquiring the message.
34. A system for customizing an audio/video media, the system comprising:
a database containing at least one audio/video media with a pre-treated segment for receiving a message;
a user interface;
a recording interface;
a processor operatively connected to the database, the user interface and the recording interface, the processor being so configured so as to:
display through the user interface a list of the audio/video media in the database, each of the audio/video media having associated treatment parameters;
prompt a user through the user interface to select a listed audio/video media;
prompt the user to provide a message through the recording interface;
apply digital signal processing effects to the audio portion of the message in accordance with the treatment parameters of the selected audio/video media;
insert the processed message within the pre-treated segment of the selected communication media in a position within the pre-treated segment depending on the length of the processed message; and
create a customized audio/video media by mixing the inserted processed message and the selected audio/video media.
35. A system in accordance with claim 34, wherein the treatment parameters are selected in accordance with the recording interface.
36. A system in accordance with claim 34, wherein the message is a text message, the processor being further configured so as to apply a text-to-speech synthesis process to the text message.
37. A system in accordance with claim 34, wherein the audio/video media is a video media and the message is a video message, the processor being further configured so as to apply digital video filters to the message in accordance with the treatment parameters before inserting the message into the pre-treated segment of the audio/video media, wherein the treatment parameters include video treatment parameters selected in accordance with one of the following: the audio/video media or the recording interface.
38. A system in accordance with claim 34, wherein the audio/video media is an audio and video media and the message is an audio and video message, the processor being further configured so as to apply digital signal processing effects to an audio portion of the message and apply digital video filters to a video portion of the message in accordance with the treatment parameters before inserting the message into the pre-treated segment of the audio/video media, wherein the treatment parameters include video treatment parameters selected in accordance with one of the following: the audio/video media or the recording interface.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261747085P | 2012-12-28 | 2012-12-28 | |
US61/747,085 | 2012-12-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014100893A1 (en) | 2014-07-03 |
Family
ID=51019599
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CA2013/001084 WO2014100893A1 (en) | 2012-12-28 | 2013-12-30 | System and method for the automated customization of audio and video media |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2014100893A1 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016028395A1 (en) * | 2014-08-18 | 2016-02-25 | KnowMe Systems, Inc. | Unscripted digital media message generation |
WO2016201376A1 (en) * | 2015-06-10 | 2016-12-15 | Piantedosi Avery | Alarm notification system |
US9973459B2 (en) | 2014-08-18 | 2018-05-15 | Nightlight Systems Llc | Digital media message generation |
US10037185B2 (en) | 2014-08-18 | 2018-07-31 | Nightlight Systems Llc | Digital media message generation |
TWI699663B (en) * | 2018-09-07 | 2020-07-21 | 台達電子工業股份有限公司 | Segmentation method, segmentation system and non-transitory computer-readable medium |
US10735360B2 (en) | 2014-08-18 | 2020-08-04 | Nightlight Systems Llc | Digital media messages and files |
US10735361B2 (en) | 2014-08-18 | 2020-08-04 | Nightlight Systems Llc | Scripted digital media message generation |
CN113572981A (en) * | 2021-01-19 | 2021-10-29 | 腾讯科技(深圳)有限公司 | Video dubbing method and device, electronic equipment and storage medium |
WO2022171052A1 (en) * | 2021-02-10 | 2022-08-18 | 北京字节跳动网络技术有限公司 | Video obtaining method and apparatus, video sharing method and apparatus, device, and medium |
US11449306B1 (en) | 2016-04-18 | 2022-09-20 | Look Sharp Labs, Inc. | Music-based social networking multi-media application and related methods |
US11481434B1 (en) * | 2018-11-29 | 2022-10-25 | Look Sharp Labs, Inc. | System and method for contextual data selection from electronic data files |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005076618A1 (en) * | 2004-02-05 | 2005-08-18 | Sony United Kingdom Limited | System and method for providing customised audio/video sequences |
US7301093B2 (en) * | 2002-02-27 | 2007-11-27 | Neil D. Sater | System and method that facilitates customizing media |
EP1879195A1 (en) * | 2006-07-14 | 2008-01-16 | Muvee Technologies Pte Ltd | Creating a new music video by intercutting user-supplied visual data with a pre-existing music video |
US20080215979A1 (en) * | 2007-03-02 | 2008-09-04 | Clifton Stephen J | Automatically generating audiovisual works |
US20110264755A1 (en) * | 2008-10-08 | 2011-10-27 | Salvatore De Villiers Jeremie | System and method for the automated customization of audio and video media |
Non-Patent Citations (1)
Title |
---|
HYUN SUNG CHANG ET AL.: "Efficient Video Indexing Scheme for Content- Based Retrieval", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 9, no. 8, 1 December 1999 (1999-12-01), PISCATAWAY , N.J, US, pages 1269 - 1279 * |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10992623B2 (en) | 2014-08-18 | 2021-04-27 | Nightlight Systems Llc | Digital media messages and files |
US10728197B2 (en) | 2014-08-18 | 2020-07-28 | Nightlight Systems Llc | Unscripted digital media message generation |
US9973459B2 (en) | 2014-08-18 | 2018-05-15 | Nightlight Systems Llc | Digital media message generation |
US10037185B2 (en) | 2014-08-18 | 2018-07-31 | Nightlight Systems Llc | Digital media message generation |
US10038657B2 (en) | 2014-08-18 | 2018-07-31 | Nightlight Systems Llc | Unscripted digital media message generation |
US10691408B2 (en) | 2014-08-18 | 2020-06-23 | Nightlight Systems Llc | Digital media message generation |
US10735360B2 (en) | 2014-08-18 | 2020-08-04 | Nightlight Systems Llc | Digital media messages and files |
WO2016028395A1 (en) * | 2014-08-18 | 2016-02-25 | KnowMe Systems, Inc. | Unscripted digital media message generation |
US10735361B2 (en) | 2014-08-18 | 2020-08-04 | Nightlight Systems Llc | Scripted digital media message generation |
US11082377B2 (en) | 2014-08-18 | 2021-08-03 | Nightlight Systems Llc | Scripted digital media message generation |
US11670152B2 (en) | 2015-06-10 | 2023-06-06 | Avery Piantedosi | Alarm notification system |
WO2016201376A1 (en) * | 2015-06-10 | 2016-12-15 | Piantedosi Avery | Alarm notification system |
US11449306B1 (en) | 2016-04-18 | 2022-09-20 | Look Sharp Labs, Inc. | Music-based social networking multi-media application and related methods |
US11797265B1 (en) | 2016-04-18 | 2023-10-24 | Look Sharp Labs, Inc. | Music-based social networking multi-media application and related methods |
TWI699663B (en) * | 2018-09-07 | 2020-07-21 | 台達電子工業股份有限公司 | Segmentation method, segmentation system and non-transitory computer-readable medium |
US11481434B1 (en) * | 2018-11-29 | 2022-10-25 | Look Sharp Labs, Inc. | System and method for contextual data selection from electronic data files |
US11971927B1 (en) | 2018-11-29 | 2024-04-30 | Look Sharp Labs, Inc. | System and method for contextual data selection from electronic media content |
CN113572981A (en) * | 2021-01-19 | 2021-10-29 | 腾讯科技(深圳)有限公司 | Video dubbing method and device, electronic equipment and storage medium |
CN113572981B (en) * | 2021-01-19 | 2022-07-19 | 腾讯科技(深圳)有限公司 | Video dubbing method and device, electronic equipment and storage medium |
WO2022171052A1 (en) * | 2021-02-10 | 2022-08-18 | 北京字节跳动网络技术有限公司 | Video obtaining method and apparatus, video sharing method and apparatus, device, and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13866745 Country of ref document: EP Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 13866745 Country of ref document: EP Kind code of ref document: A1 |