WO2015091633A1 - Apparatus and method for converting text to media clips - Google Patents


Info

Publication number
WO2015091633A1
WO2015091633A1 (PCT/EP2014/078201)
Authority
WO
WIPO (PCT)
Prior art keywords
media clips
media
clips
words
clip
Prior art date
Application number
PCT/EP2014/078201
Other languages
French (fr)
Inventor
David Bailey
Original Assignee
Wordeo Limited
Priority date
Filing date
Publication date
Application filed by Wordeo Limited filed Critical Wordeo Limited
Publication of WO2015091633A1 publication Critical patent/WO2015091633A1/en

Classifications

    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/034 - Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 - Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 - Querying
    • G06F16/438 - Presentation of query results
    • G06F16/4387 - Presentation of query results by the use of playlists
    • G06F16/4393 - Multimedia presentations, e.g. slide shows, multimedia albums
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/54 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/06 - Message adaptation to terminal or network requirements
    • H04L51/063 - Content adaptation, e.g. replacement of unsuitable content
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems

Definitions

  • Embodiments of the invention relate to systems and methods for converting a string of text into a sequence of related media clips with accompanying soundtrack and text.
  • Embodiments of the invention may provide a system and method for converting a text based message into a sequence of media clips, each media clip being representative of one of the words within the text-based message. This is achieved by separating the text based message into its constituent words and, for at least some of the words, performing a search for relevant media clips. Since it is likely that more than one relevant clip will be identified in each search, a subset of the results for each word may be selected, each subset being associated with the relevant word from the text-based message. Initially, a clip from each set may be automatically selected by the method or system, with these automatically selected clips being the default clips used when creating the sequence of media clips representative of the text based message.
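The word-by-word pipeline described above can be sketched in a few lines. This is an illustrative sketch only, not part of the patent disclosure: `search_clips` stands in for the metadata database search, and the subset size of five matches one example given later in the text.

```python
import random

def build_default_sequence(message, search_clips, subset_size=5):
    """Split a text-based message into words, search for relevant clips
    per word, keep a small subset per word, and automatically select a
    default clip from each subset (illustrative sketch)."""
    words = message.split()
    clip_sets = {}
    defaults = []
    for word in words:
        results = search_clips(word)      # all clips matching the word's metadata
        subset = results[:subset_size]    # keep a manageable subset per word
        clip_sets[word] = subset
        if subset:
            # the automatic selection that seeds the default sequence
            defaults.append(random.choice(subset))
    return clip_sets, defaults
```

The default sequence is then what the user sees first, before any replacements are made.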
  • the user is also presented with the option of replacing any of the media clips with an alternative media clip from the respective search result set associated with a given word, or with a self-generated or user-generated piece of media.
  • a computer- implemented method of converting a text-based message into a sequence of media clips is carried out by a user device and comprises receiving a text-based message comprising a plurality of words and sending the text based message, via a network connection, to a remote server.
  • the message is separated into constituent words whereby sets of media clips are each associated with one or more of the respective words of the message, the media clips in a given set being identified based on associated metadata related to the respective word.
  • the user device then receives, via the network connection, at least a first automatically selected media clip from each of the sets of media clips for the respective words of the message, wherein at this stage the sequence of media clips contains one automatically selected media clip from each of the sets of media clips.
  • the method further comprises, for one or more of the words in the text based message, receiving user input selecting a media clip to replace an automatically selected media clip for inclusion in the sequence of media clips.
  • the user input may select a media clip from the set of media clips associated with the word or an additional media clip accessible by the user device from an alternative source.
  • the user input selecting a media clip to replace an automatically selected media clip may cause the user device to receive, via the network connection, one or more further media clips from the set of media clips associated with that word. In this way possible replacement media clips are only downloaded by the user device as and when the user opts to view or select a replacement, saving on bandwidth when a user does not browse through all the possible clips in the sets.
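The on-demand download behaviour described in this bullet can be illustrated with a small caching wrapper. The class and `fetch_clip` callback below are hypothetical names, not part of the disclosure; the point is that a clip is fetched over the network at most once, and only when the user asks to browse it.

```python
class LazyClipBrowser:
    """Download replacement clips only when the user asks to view them,
    saving bandwidth when the user does not browse the whole set
    (illustrative sketch; fetch_clip is a hypothetical network call)."""

    def __init__(self, word, clip_ids, fetch_clip):
        self.word = word
        self.clip_ids = clip_ids
        self.fetch_clip = fetch_clip
        self.downloaded = {}

    def view(self, index):
        clip_id = self.clip_ids[index]
        if clip_id not in self.downloaded:   # fetched at most once, on demand
            self.downloaded[clip_id] = self.fetch_clip(clip_id)
        return self.downloaded[clip_id]
```

Clips the user never browses are never transferred, which is the bandwidth saving the text refers to.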
  • the step of receiving, via the network connection, at least a first automatically selected media clip from each of the sets of media clips may further include receiving one or more further media clips from each of the sets of media clips. In this way, all the media clips are downloaded at the same time, giving the user a more seamless selection experience.
  • the method may further comprise displaying a digital object representative of an automatically selected media clip from the set of media clips associated with a word in the text based message.
  • the method may include displaying a digital object representative of a second media clip from the set of media clips associated with the word in the text based message and also selecting the second media clip for inclusion in the sequence of media clips. This allows users to select different clips within a group. The selection step may, optionally, require further user input of a different kind.
  • the method may include displaying a digital object representative of an automatically selected media clip from the set of media clips associated with a second word in the text based message.
  • the method may further include displaying a digital object representative of a second media clip from the set of media clips associated with the second word in the text based message, and selecting the second media clip for inclusion in the sequence of media clips.
  • the user device may include a touch screen or gesture controlled display and the user input of the first kind may be a swipe gesture in a first direction and the user input of the second kind may be a swipe gesture in a second direction different to the first direction, such as vertical and horizontal swipes, or vice versa.
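One possible mapping of the two swipe directions is sketched below: horizontal swipes move between words of the message, vertical swipes move between clips in the current word's set. The assignment of axes is illustrative; as the text notes, it could equally be reversed.

```python
def handle_swipe(direction, state):
    """Map swipe gestures to browsing actions over (word_index, clip_index).
    Horizontal swipes change word; vertical swipes change clip within the
    current word's set (one possible mapping, chosen for illustration)."""
    word_idx, clip_idx = state
    if direction == "left":
        return (word_idx + 1, 0)             # next word in the message
    if direction == "right":
        return (max(word_idx - 1, 0), 0)     # previous word
    if direction == "up":
        return (word_idx, clip_idx + 1)      # next clip for this word
    if direction == "down":
        return (word_idx, max(clip_idx - 1, 0))
    return state
```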
  • the digital object may be, for example, a still image from a video.
  • the method may further comprise displaying a preview of a media clip when user input is received selecting the digital object representative of the media clip.
  • the method may further include displaying a digital object representative of an additional media clip obtained from an alternative source and selecting the media clip for inclusion in the sequence of media clips.
  • the additional media clip accessible by the user device from an alternative source may be a user generated media clip, such as a media clip generated by a video camera integrated with the user device.
  • Each media clip may be of a relatively short duration such as between 1 and 3 seconds in duration; 2 seconds in duration; or approximately 2 seconds in duration.
  • the maximum number of words that may be included within the text based message may be selected such that the sequence of media clips cannot exceed a predetermined maximum period of time.
  • the maximum number of words may correspond to the maximum period of time for the sequence of media clips divided by the duration of a clip.
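The relationship in this bullet is simple division: with a 60-second maximum sequence and 2-second clips, for example, the message would be capped at 30 words. A minimal sketch (the defaults reflect the 2-second example given above):

```python
def max_words(max_sequence_seconds, clip_seconds=2.0):
    """Upper bound on message length so that the resulting sequence of
    media clips cannot exceed the predetermined maximum period of time."""
    return int(max_sequence_seconds // clip_seconds)
```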
  • the text-based message may be input into the user device by a user via a user input device, such as a keyboard or a microphone coupled with voice-to-text software, and the method may further comprise, as the text-based message is received, calculating the total length of the resulting sequence of media clips by associating each word with a predetermined period of time.
  • the method may further comprise comparing the calculated total length of the sequence of media clips with an upper threshold limit and displaying an indicator to the user when the upper threshold limit is exceeded.
  • Words of a first type may be associated with a first predetermined period of time and words of a second type may be associated with a second predetermined period of time, the first period of time being greater than the second period of time.
  • the words of a second type may be identified based on the number of letters in the word, whereby words having a number of letters less than or equal to a predetermined number are associated with the second period of time.
  • the first predetermined period of time may be between 1 and 3 seconds in duration; 2 seconds in duration; or approximately 2 seconds in duration.
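The running-length estimate described in the preceding bullets can be sketched as follows. The specific values (2 seconds for longer words, 1 second for short words, and a three-letter cutoff) are illustrative assumptions consistent with the ranges given above, not fixed by the disclosure.

```python
def estimated_length(words, long_seconds=2.0, short_seconds=1.0,
                     short_letter_limit=3):
    """Estimate the sequence length as the message is typed: words with
    few letters count for a shorter predetermined period than longer
    words (values here are illustrative defaults)."""
    total = 0.0
    for word in words:
        total += short_seconds if len(word) <= short_letter_limit else long_seconds
    return total

def over_limit(words, upper_threshold_seconds):
    """True when the estimated sequence exceeds the upper threshold,
    so the UI can display an indicator to the user."""
    return estimated_length(words) > upper_threshold_seconds
```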
  • the method may further include sending data to the remote server indicative of the revised sequence of media clips to be combined into a revised sequence of clips.
  • the sets of media clips received by the user device may be low-resolution media clips and the revised sequence of media clips may be created at the server by combining high-resolution versions of the selected media clips at the remote server.
  • the method may further comprise sending any selected media clips obtained from an alternative source to the remote server for inclusion in the final sequence of clips.
  • a computer-implemented method for use in the conversion of a text based message into a sequence of media clips is provided, the method being performed at a remote server system. The method includes receiving from a user device, via a network connection, a text based message comprising a plurality of words.
  • the text based message is separated into the constituent words and one or more of the constituent words are compared with metadata contained within a database to identify, for each of the compared words, a set of media clips relevant to the word.
  • the method further includes sending to the user device, via the network connection, at least a first automatically selected media clip from a set of media clips for each of the compared words and at least a second media clip from a set of media clips for at least one of the compared words.
  • the server method may further comprise receiving a request from the user device to select a media clip to replace an automatically selected media clip and in response sending, via the network connection, one or more further media clips from the set of media clips associated with that word.
  • the step of sending to the user device, via the network connection, at least a first automatically selected media clip may include sending one or more further media clips from each of the sets of media clips.
  • the step of sending the media clips for each word may comprise sending low- resolution media clips.
  • the method may further comprise receiving from the user device, in relation to a particular word, data indicating a user selected media clip from one or more of the sets of media clips, and combining the selected media clips into a sequence of media clips.
  • Combining the selected media clips into a sequence of media clips may be performed using high-resolution versions of the media clips.
  • the method may further comprise receiving, from the user device, one or more additional media clips from one or more alternative sources accessible by the user device and data indicating a word within the text based message that each of the additional media clips is to be associated with; and combining the selected media clips and the user generated media clips into a sequence of media clips.
  • an electronic user device that comprises a user input device, a network connection, one or more processors and a memory.
  • a program is stored in the memory and configured to be executed by the one or more processors.
  • the program includes instructions to: send, via the network connection, a text based message to a remote server to be separated into constituent words, whereby sets of media clips are each associated with respective words in the message, the media clips in a given set being identified based on associated metadata related to the respective word; and receive, via the network connection, at least a first automatically selected media clip from each of the sets of media clips for the respective words in the message; whereby the sequence of media clips contains one automatically selected media clip from each of the sets of media clips.
  • the program further includes instructions to: for one or more of the words in the text based message, receive user input to select a media clip and replace an automatically selected media clip for inclusion in the sequence of media clips to generate a revised sequence of media clips, the media clip being selected from the set of media clips associated with the word or an additional media clip accessible by the user device from one or more alternative sources.
  • the user device may be configured to generate a revised sequence of media clips including the user selected media clips or alternatively to send data to the remote server indicative of a revised sequence of media clips including the user selected media clips to be combined into a revised sequence of media clips.
  • the user input device may include a touch screen or gesture controlled display.
  • the user input device may include an integral video camera.
  • the alternative source may include a memory of the user device.
  • the alternative source may alternatively or additionally include a remote server storing one or more media clips. Different clips may therefore be obtained from different sources.
  • a remote server system that comprises a network connection, one or more processors, and memory.
  • a program is stored in the memory and configured to be executed by the one or more processors.
  • the program includes instructions to cause the server system to: receive from an electronic user device, via the network connection, a text-based message comprising a plurality of words; separate the text-based message into the constituent words; compare one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a set of media clips relevant to the word; and send to the user device, via the network connection, at least a first automatically selected media clip from a set of media clips for each of the compared words and at least a second media clip from a set of media clips for at least one of the compared words.
  • the server system may further comprise a database containing a plurality of media clips associated with the metadata, the database storing a high-resolution version of each clip and a low-resolution version of each clip.
  • the program may further include instructions to: send low-resolution versions of the clips to the user device; receive from the user device, in relation to a particular word, data indicating a user selected media clip from one or more of the sets of media clips; and combine the selected media clips into a sequence of media clips using high-resolution versions of the media clips.
  • the program may further include instructions to: receive, from the user device, one or more additional media clips from one or more alternative sources accessible by the user device and data indicating a word within the text based message that each of the additional media clips is to be associated with; and combine the selected media clips and the additional media clips into a sequence of media clips.
  • a method of converting a text based message into a sequence of media clips comprising: receiving a text-based message comprising a plurality of words; sending, via a network connection, the text-based message to a remote server to be separated into constituent words, whereby sets of media clips are each associated with respective words in the message, the media clips in a given set being identified based on associated metadata related to the respective word; receiving, via the network connection, at least a first automatically selected media clip from each of the sets of media clips for the respective words in the message; wherein the sequence of media clips contains one automatically selected media clip from each of the sets of media clips, the method further comprising: for one or more of the words in the text based message, receiving user input to select a media clip to replace an automatically selected media clip for inclusion in the sequence of media clips to generate a revised sequence of media clips, the user input selecting a media clip from the set of media clips associated with the word or an additional media clip accessible by the user device from an alternative source.
  • a method for use in the process of converting a text based message into a sequence of media clips comprising: receiving a text based message comprising a plurality of words; separating the text based message into the constituent words; comparing each of the constituent words with metadata contained within a database to identify, for each word, a plurality of media clips relevant to the word;
  • a method of converting a text based message into a sequence of media clips comprising: receiving a text based message comprising a plurality of words; separating the text based message into the constituent words; comparing one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a set of media clips relevant to the word; automatically selecting, from each of the sets of media clips, a media clip; for one or more of the words in the text based message, receiving user input to select a media clip to replace the automatically selected media clip for inclusion in the sequence of media clips; and generating a sequence of media clips using the selected media clips.
  • a system comprising an electronic user device and a remote server system.
  • the electronic user device is configured to: send, via the network connection, a text based message to a remote server to be separated into constituent words, whereby sets of media clips are each associated with respective words in the message, the media clips in a given set being identified based on associated metadata related to the respective word; receive, via the network connection, at least a first automatically selected media clip from each of the sets of media clips for the respective words in the message; wherein the sequence of media clips contains one selected media clip from each of the set of media clips.
  • the user device is further configured to: for one or more of the words in the text based message, receive user input to select a media clip and replace an automatically selected media clip for inclusion in the sequence of media clips.
  • the remote server system is configured to: receive from the electronic user device the text-based message comprising a plurality of words; separate the text-based message into the constituent words; compare one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a set of media clips relevant to the word; send to the user device, via the network connection, at least a first automatically selected media clip from a set of media clips for each of the compared words.
  • the system is configured to generate a sequence of media clips using the selected media clips. This step may take place at the server system or at the user device.
  • a computer-implemented method of converting a text-based message into a sequence of media clips that does not necessarily require providing the user with the option of replacing media clips with alternatives.
  • the method comprises, at a user device, receiving a text-based message comprising a plurality of words and sending, via a network connection, the text-based message to a remote server to be separated into constituent words, whereby media clips are associated with one or more of the words in the message, the media clips being identified based on associated metadata related to a respective word.
  • Embodiments according to this second aspect do not require the user selection functionality described above, whereby users can select replacement media clips.
  • a corresponding computer-implemented method may be provided to be executed at a remote server system, the method comprising receiving from a user device, via a network connection, a text based message comprising a plurality of words, separating the text based message into the constituent words and comparing one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a media clip relevant to the word.
  • the final sequence of media clips may be accompanied by an overarching audio soundtrack. Different soundtracks can be chosen by the user. Alternatively, or in addition, each clip may be accompanied by a graphical display of the word with which that clip is associated or relevant to. Different fonts can be chosen by the user for these displayed words.
  • Figure 1 is an example of a method according to the present invention.
  • Figure 2 is an example of a method carried out by a user device according to the present invention.
  • Figure 3 is an example of a layout for a set of screens for previewing media clips and selecting alternative clips.
  • Figure 4 is an example of a system according to the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • Figure 1 is an example of a method for converting a text-based message into a sequence of media clips.
  • the method is generally implemented using a system comprising an electronic user device and one or more remote servers making up a server side system.
  • the user device may be, for example, a mobile phone, smart phone, tablet, laptop, PDA or other type of user operated computing device.
  • the remote server may be a single server unit, or may be implemented as a distributed system, such as a cloud based or virtual server system.
  • a text-based message is received at the user device.
  • the text-based message may be received by being entered directly by the user, input as text using a user input device such as a keyboard or a touch-screen display including a virtual keyboard on which the user composes their message.
  • the text based message may be received at the user device from another source, such as text copied from a website or a message received from another user device.
  • the text based message is parsed to divide it up into its constituent words. Parsing can be achieved using any suitable software based method, which will not be described in detail here, and may also include language processing to determine the meaning of words in context for example by analysing the use of the word as a verb or the use of plurals.
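The parsing step above can be sketched with a minimal tokenizer. This is a simplified stand-in: real embodiments may, as the text notes, add language processing such as detecting verbs or handling plurals.

```python
import re

def parse_message(message):
    """Divide a text-based message into its constituent words,
    stripping punctuation and lower-casing (minimal illustrative
    parser; not the patent's actual implementation)."""
    return [w.lower() for w in re.findall(r"[A-Za-z0-9']+", message)]
```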
  • the parsing step may optionally be performed at the server side, on receipt of the text based message from the user device, with the user device and the server being connected via a network such as the internet.
  • the method involves identifying, for each constituent word, relevant media clips that are representative of the word.
  • relevant media clips are video clips that are also associated with an accompanying audio element or stream. It will, however, be appreciated that the media clips may be intended to be video only or audio only.
  • a database search is performed. Some or all of the constituent words of the text-based message are each compared with a metadata database, which may optionally be a database stored within, or accessible by, the server system.
  • the metadata database contains one or more descriptive terms each associated with a particular media clip which may be referenced by the metadata database, or contained within the same database structure.
  • key words are associated with each video clip that is accessible by the system. In this way, a given word forming part of a text-based message is compared with the metadata database to identify appropriate media clips that are suitable to represent the word in a final sequence of media clips that is to be sent to a recipient user device.
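The keyword matching just described can be illustrated with an in-memory stand-in for the metadata database, mapping each clip to its set of descriptive terms. The data structure and function name are assumptions for illustration.

```python
def find_clips(word, clip_metadata):
    """Return clips whose metadata keywords contain the given word.
    clip_metadata maps clip_id -> set of descriptive terms (an
    illustrative in-memory stand-in for the metadata database)."""
    w = word.lower()
    return [clip_id for clip_id, keywords in clip_metadata.items()
            if w in keywords]
```

In a deployed system this lookup would be a database query rather than a linear scan.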
  • the method may optionally include a step 104 of selecting a predetermined number of media clips to return in the search results. This may be performed in two stages or as a single step. For example, the method may involve selecting 50 media clips from the total possible results, even though a large number (such as 500 or more) may be found. From these limited results a further set may be selected, such as five clips, these selected clips forming a subset of the search results. The first or second stage may involve selecting clips from the results randomly. Alternatively, where a weighting or ranking system is used, the most relevant search results may be selected.
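The two-stage narrowing in step 104 can be sketched as below, using the example figures from the text (50 clips retained from 500 or more results, then a subset of 5). Whether each stage picks randomly or by rank is a design choice the text leaves open, so both are shown.

```python
import random

def select_subset(results, first_stage=50, subset_size=5, ranked=False):
    """Two-stage narrowing of search results: cap the results (e.g. 50
    of 500+), then pick a small subset (e.g. 5) either by rank or at
    random (parameters follow the examples in the text)."""
    capped = results[:first_stage]
    if ranked:
        return capped[:subset_size]   # most relevant first, if results are ranked
    return random.sample(capped, min(subset_size, len(capped)))
```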
  • the selection of the set of clips from the search results may be performed in a single step, selecting a predetermined number, for example up to five or up to ten clips, from the results which may then be presented to the user in the manner described below.
  • the final sequence of media clips representative of the original text based message is created by selecting, from each set of clips associated with each word of the text based message, one of the media clips from that set.
  • an automatic selection is made of a media clip from each set associated with a given word. This may be a random selection from the set (e.g. a random selection from the set of five to ten clips selected from the search results). This automatic selection may take place by logic enacted at the user device or at the server system.
  • the final sequence of media clips may simply comprise the automatically selected clips from each set of clips.
  • the user is given the opportunity, at step 106, to provide input selecting a replacement media clip from a set of media clips associated with a particular word in the text-based message.
  • the opportunity to provide user input to select a media clip is provided for each word within the text based message, or for each word for which a set of media clips has been identified. This may, however, not be possible if the search results only return a single match, or do not return any matches, for a given word.
  • the user may provide user input of the type described herein to optionally preview the different available clips and to select a preferred clip from a set associated with a given word. If the user provides input selecting an alternative clip to the automatically selected clip, the automatically selected clip is replaced with the user's choice at step 107. If the user does not provide any input selecting an alternative clip then the automatically selected clip may be used.
  • a combined sequence of media clips may be generated, at step 108, using the selected media clips.
  • Generating this sequence of media clips may comprise generating data indicative of the media clips and their ordering, which may then be forwarded to the server system to combine together the media clips and share the resulting combination with a recipient party via a recipient user device or by posting it to a website.
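The "data indicative of the media clips and their ordering" might look like the payload below. The wire format is hypothetical (the patent does not specify one); the essential property is that clip choices are listed in the word order of the message.

```python
import json

def sequence_payload(message_words, chosen_clips):
    """Build the data sent to the server describing the selected clips
    and their ordering, matching the word order of the message
    (hypothetical JSON wire format, for illustration only)."""
    return json.dumps({
        "sequence": [
            {"word": w, "clip_id": c}
            for w, c in zip(message_words, chosen_clips)
        ]
    })
```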
  • generating a sequence of media clips may involve combining the clips together at the user device, for forwarding directly on to the desired recipient.
  • user generated content can be used as replacement media clips for any of the words within the text based message.
  • Step 106 of Figure 1 may be expanded to include receiving input selecting a replacement media clip from a memory associated with, or accessible by, the user device. Allowing the selection of user generated content allows additional choice for the user where, for example, the search results for a given word only return a single match, or do not return any matches.
  • the user may provide user input of the type described herein to optionally preview their user generated content. If the user provides input selecting an alternative user generated clip to the automatically selected clip, the automatically selected clip is replaced with the user's choice.
  • User generated content may include video generated by the user using the user device, for example, a video camera device integral with, or coupled to, their user device. Alternatively, or additionally, the user generated content may include content stored on the user device or in a memory accessible by the user device such as a personal video collection.
  • Figure 2 shows the steps in a method such as that described in relation to Figure 1 that may be applied at the user device.
  • the text based message is created in the same manner as described above for step 101.
  • the text may then be sent to the remote server system where parsing and metadata database searches are performed for the constituent words making up the text-based message.
  • the user device may then receive the results of the metadata searches performed at the server system, the results being sets of media clips each comprising one or more media clips, associated with words making up the text based message, at step 202.
  • the received sets of media clips are identified as having a sequence corresponding to the sequence of words appearing in the text based message, such that the ordering of the sets of media clips corresponds to the ordering of the words in the text-based message.
  • the user device may receive the different sets of media clips with an indication of an automatically pre-selected clip for each word.
  • the user device may be configured to automatically select a media clip for each word from the respective sets.
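The association of each word with an ordered set of candidate clips, and the automatic pre-selection of a clip from each set, can be sketched as follows. This is a minimal illustration only: the `CLIP_INDEX` mapping, the clip identifiers and the rule of pre-selecting the first candidate are hypothetical stand-ins for the metadata database and selection logic of the server system.

```python
# Hypothetical stand-in for the server's metadata database: key words
# linked to candidate media clips.
CLIP_INDEX = {
    "cat": ["cat_001.mp4", "cat_002.mp4", "cat_003.mp4"],
    "dog": ["dog_001.mp4", "dog_002.mp4"],
}

def build_clip_sets(message):
    """Return one entry per word, in message order, each holding the
    candidate clip set and the automatically pre-selected (first) clip."""
    sets = []
    for word in message.lower().split():
        candidates = CLIP_INDEX.get(word, [])
        sets.append({
            "word": word,
            "clips": candidates,
            "selected": candidates[0] if candidates else None,
        })
    return sets
```

The ordering of the returned list mirrors the ordering of the words in the message, as described above; a word with no matches yields an empty set with no pre-selection.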
  • the user is able to adjust the media clips used for each word in the sequence.
  • the user is able to preview the selected media clip associated with each word and decide whether they are happy with the selection or not. If the user is happy with the selection then they do not need to provide any further input for the particular word. If the user wishes to browse the other options contained within the set of media clips then they can do so by providing user input to identify and preview the desired clip.
  • the user can select additional clips from alternative sources other than the database/library used by the remote server.
  • the user device may be configured to display a digital object representative of each media clip.
  • the digital object may be a still image from the video clip for example, and if the user provides input identifying or selecting a given image, the associated media clip will preview or play. If the user prefers one of the previewed clips they can provide input selecting that clip to replace the automatically selected clip in the sequence of clips. The user may also be able to select a soundtrack to play over the final sequence of clips and the style of font used to display the corresponding words over each clip.
  • the set of media clips associated with a given word may be transmitted to the user device from the server at the same time, for example at step 202, such that the user is able to browse and select the different media clips associated with each word.
  • a first clip may be sent initially, for each word, being a clip that has been automatically selected from the relevant set of clips by the server system.
  • Further clips in a given set may only be sent to the user device from the server in response to input from the user. In particular this may include user input requesting to view and/or select an alternative clip for a given word.
  • a next clip in the set of clips is retrieved from the server, and this may be repeated as the user requests to view and/or select further clips.
  • a single request by a user to select and/or view an alternative clip may cause the user device to request the remainder of the set of clips for the particular word, or for all words.
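The on-demand delivery of further clips described above might be sketched as follows; the function name and its arguments are illustrative assumptions, not part of the disclosure.

```python
def fetch_next_clip(full_set, already_fetched):
    """Return the next clip in the set that the user device has not yet
    received, or None once the set is exhausted."""
    if already_fetched < len(full_set):
        return full_set[already_fetched]
    return None
```

A user request to view an alternative clip would call this with the count of clips already held by the device, repeating as the user continues to browse, so that bandwidth is spent only on clips the user actually asks for.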
  • the user is able to optionally preview the resulting sequence of clips that will be viewed by a recipient, including any changes made by the user.
  • the user may be presented with an option to preview the final sequence at the end of the process.
  • data indicative of the final sequence may be generated and, at step 205, sent to a server for generation and distribution of the combined sequence of clips.
  • Additional clips from alternative sources may include user generated clips generated using a video camera and/or microphone integral to the user device, such as a camera within a smart phone.
  • the clips may be generated with a stand-alone camera device and made available to the user device, such as by uploading the video clips to a memory on the user device or otherwise accessible by the user device, including a web-based or server-based storage or memory.
  • the additional clips may be obtained from one or more servers other than the remote server system used for parsing the text-based messages and searching for selections of clips, including web pages hosting media clips such as YouTube (TM).
  • Figure 3 shows an example layout for the screens presented to a user when a user is making changes to the selected media clips.
  • the user may preview the clip associated with the first word by selecting the image 1.1, for example by pressing the image with their finger, stylus or other pointing device.
  • the user may then move on to the second word by providing an appropriate gesture, such as by swiping from right to left across the image presented on display screen 301. This will cause the user device to display a still image associated with the automatically selected clip 2.1 associated with the second word "dog".
  • the user can select to play or preview the media clip by tapping or touching the screen, or an appropriate preview icon.
  • the user may also move to, or identify, the second media clip 2.2 within the set of media clips associated with the second term "dog" by providing user input different to the input used to cause the user device to change displays from the first word to the second word.
  • the user may provide a vertical swiping gesture, as opposed to a horizontal swiping gesture, such as by swiping in a direction from the bottom of the display screen to the top of the display screen or vice versa.
  • This will cause the user device to display the screen 303, giving the user the option of previewing the clip associated with image "2.2" or proceeding to image "2.3" in box 304 by swiping again.
  • the user can move across to the third word and select alternative media clips in a similar manner. In this way a menu of alternative media clips can be presented to the user and selected for inclusion in the final sequence of media clips.
  • the user may be given the option of adding user generated video or other media clips as described above.
  • the option may be presented by providing an additional box beneath each of the set of boxes associated with respective words (e.g. beneath boxes 301, 304 and 306).
  • when the additional box is selected, the user is provided with the option of selecting a clip from their own collection of clips, which may have been generated by them, or obtained from a source coupled to their user device such as via the internet.
  • the final sequence of media clips may include, or be accompanied by, an audio track, and particularly a music track or soundtrack, that plays as the sequence of media clips are shown. Therefore, a single audio track may play over the plurality of media clips making up the final sequence of media clips.
  • An audio track may be automatically selected and the user given the opportunity to preview the track and replace it with a different audio track as desired. Alternatively, the user may be given a direct choice as to which audio track should be used to accompany the sequence of media clips.
  • the user may be presented with a number of options to adjust the properties of the final sequence of media clips.
  • each media clip may be presented alongside the original word with which the clip is associated.
  • the clips for "cat", "dog" and "man" could be accompanied by the respective relevant words.
  • the user may be presented with the option to allow text to be presented in any suitable location relative to the clip, such as the top or bottom of the screen, to overlay the clip, or to hide or remove the text.
  • the user may be given the option of selecting different fonts, colours or styles for the accompanying wording.
  • the user may be given the option of selecting different soundtracks to accompany the sequence of media clips, or to shuffle the text font and/or soundtrack throughout the final sequence.
  • FIG. 4 shows an embodiment of a system 400 for implementing the methods described herein.
  • the system comprises a user device 401 of the type described herein, such as a mobile phone or tablet device comprising a touch screen input.
  • the user device includes a network connection such that it is able to communicate with a server system 402 via a network such as the internet.
  • the server 402 comprises a number of functional components which may each be implemented as hardware functional units, or in software.
  • the server itself may be a single server unit, a collection of network server units, or a virtual server unit implemented in a cloud based system.
  • the server may include an application programming interface (API) 403 for controlling the operation of other software based functional units within the server.
  • the API is able to exchange data with the user device 401.
  • a parser 404 for separating the text of a text-based message received from the user device into its constituent words.
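As an illustration of the parser's role, the following sketch splits a message into its constituent words; the function name and the tokenisation rule are assumptions, not part of the disclosure.

```python
import re

def parse_message(text):
    """Split a text-based message into its constituent words,
    discarding punctuation (an assumed tokenisation rule)."""
    return re.findall(r"[A-Za-z0-9']+", text)
```

The resulting word list would then be used to query the metadata database for candidate clip sets.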
  • a database 406 is provided.
  • the database 406 may be a metadata database, storing metadata in the form of key words linked to respective media clips.
  • the media clips themselves may be stored in a clip storage device 407 which may be a memory internal to or accessible by the server system such as one or more hard disk drives storing a large number of clips. Also included is a media server 405 which may be used to retrieve media clips from the storage device. Further optional components such as delegator 408 and other load balancing components may be provided as appropriate.
  • the media clips are retrieved from a library, which may in this embodiment be stored in storage device 407.
  • the library may be a maintained collection of media clips, such as videos, of a predetermined duration.
  • Each media clip stored on the clip storage device 407 is optionally short in duration.
  • each of the media clips may be less than 10 seconds in duration, less than 5 seconds in duration, or between 1 and 3 seconds in duration.
  • each clip is 2 seconds, or approximately 2 seconds, in duration.
  • each clip may be limited to between 1.50 and 2.49 seconds in duration.
  • Many words will be associated with metadata in the database, and these can be considered words of a first type.
  • Some words may not have any appropriate media clips associated with them, such that a search for that word will not identify any relevant results.
  • some words may be predefined as being unsuitable for having a clip associated with them in the database, such as common words like "the" or "a", or words that contain a predetermined number of letters or fewer (e.g. 3 letters or fewer).
  • Either or both of these types of words can be considered words of a second type.
  • a media clip may be automatically generated that simply displays the text of the word against a background.
  • the background may be a predetermined background, for example a plain coloured background.
  • Clips relating to words of the first type, obtained from the library will have a particular duration as described above. Clips generated for words of the second type may have a shorter duration, such as 1 second or less for example.
  • the system may be configured such that the resulting final sequence of clips cannot exceed a predetermined period of time, the period of time being, for example, 60 seconds or 30 seconds. This can be achieved by limiting the number of words in a text based message to a predetermined number based on the length of the clips. The number may, for example, correspond to the maximum period of time for the final sequence of clips divided by the duration of a clip.
  • the duration of clips may be consistent across all clips, or the average value for clip duration may be used where the duration of clips varies between upper and lower values.
  • Embodiments may provide a method that monitors the predicted duration of the sequence of media clips as the user enters the words using an input device such as a keyboard, touch screen or microphone coupled with voice to text functionality associated with the user device 401.
  • As the text-based message is input, the user device is configured to calculate the total length of the resulting sequence of media clips. Each word may be associated with a predetermined period of time and the sum of these periods of time is then calculated and compared with a threshold time limit. Once the time limit is exceeded a warning or indicator is displayed to the user. The calculation may be repeated each time a new word is input to keep track of the size of the text based message.
  • the predetermined period of time may simply correspond to the duration or average duration of the media clips from the library.
  • words of the first type may be associated with a first predetermined period of time and words of the second type may be associated with one or more second predetermined periods being shorter in duration than the first period such as 1 second or less.
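The running duration calculation and the word-type distinction described above can be illustrated with the following sketch. The example second-type word list, the 30-second overall limit and the per-word durations are assumptions chosen from the ranges given in the description.

```python
# Assumed figures, chosen from the ranges given in the description.
SECOND_TYPE_WORDS = {"the", "a", "an"}  # example common words
FIRST_TYPE_SECONDS = 2.0                # library clips: approximately 2 s
SECOND_TYPE_SECONDS = 1.0               # generated text clips: 1 s or less
MAX_SEQUENCE_SECONDS = 30.0             # example limit on the final sequence

def word_seconds(word):
    """Words of the second type (common words, or 3 letters or fewer)
    take the shorter generated-clip duration."""
    if word.lower() in SECOND_TYPE_WORDS or len(word) <= 3:
        return SECOND_TYPE_SECONDS
    return FIRST_TYPE_SECONDS

def check_message(message):
    """Return the predicted sequence duration and whether the limit is
    exceeded; this would be recalculated each time a new word is input."""
    total = sum(word_seconds(w) for w in message.split())
    return total, total > MAX_SEQUENCE_SECONDS
```

When the second value becomes true, the device would display the warning or indicator described above.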
  • the clip storage device 407 optionally stores two versions of each media clip.
  • a high-resolution version of a given media clip is stored for use in a final sequence of media clips that is presented to the message recipient.
  • a low-resolution version of each clip is stored. It is the low-resolution versions of the clips that are passed from the server system 402 to the user device 401 in order to present the user with the set of media clips used to make up the sequence of media clips associated with their text-based message.
  • the low-resolution clips are of a lower resolution than the high-resolution clips and therefore take up less memory and require less bandwidth to send them to the user device.
  • the resolution settings may be adjusted appropriately to balance size/bandwidth considerations against quality.
  • the API may select a predetermined number of the results, and select from that predetermined number a predetermined subset of results to pass back to the user device for possible selection by the user. For example, the API may limit the number of results to 50, and select five or ten random results from the 50 results to pass back to the user device for possible selection as described above. Alternatively a predetermined number of results may simply be selected, such as five or ten results, and passed back to the user device, again as described above.
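The result-trimming behaviour described for the API might be sketched as follows, using the example figures above (a pool of 50 results, five random selections); the function name and parameters are illustrative only.

```python
import random

def select_result_subset(results, pool_size=50, subset_size=5, rng=None):
    """Cap the search results at pool_size, then draw subset_size random
    clips from that pool to pass back to the user device."""
    rng = rng or random.Random()
    pool = results[:pool_size]
    if len(pool) <= subset_size:
        return list(pool)
    return rng.sample(pool, subset_size)
```

Passing `subset_size` equal to `pool_size` (or simply truncating) corresponds to the alternative of returning a fixed number of results without random sampling.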
  • the server system may make an initial automatic selection of a first clip from each selection of clips, and the remainder of the clips in the selection may only be passed to the user device in response to the user indicating that they wish to review or select alternative clips. This may be performed on a per word basis.
  • the database 406 associates media clips from storage 407 with appropriate metadata. Low resolution versions of the clips are passed to the user device for review and possible selection by the user, resulting in a final selection of clips for the sequence of clips representative of the original text-based message.
  • the user device may generate the final sequence of clips, or it may pass data indicative of the final sequence of clips back to the server system 402.
  • the data may include a list of clip identifiers and the ordering of the clips, such that the clips themselves do not need to be passed back to the server system.
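The data indicative of the final sequence, a list of clip identifiers together with their ordering, might be structured as in the following sketch; the JSON field names are hypothetical.

```python
import json

def sequence_payload(ordered_clip_ids):
    """Serialise the final sequence as an ordered list of clip
    identifiers, so the clips themselves need not be sent back."""
    return json.dumps({
        "clips": [
            {"position": i, "clip_id": cid}
            for i, cid in enumerate(ordered_clip_ids)
        ]
    })
```

On receipt, the server system would resolve each identifier against the clip storage 407 and assemble the high-resolution versions in the given order.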
  • the server system may produce the finalised sequence of media clips using high-resolution clips from the clip storage 407. Where user generated clips, or clips from other alternative sources, are included in the final sequence of clips, these clips may be sent from the user device to the server system as appropriate.
  • the recipient watches the resulting finalised sequence of media clips preferably by streaming it to their recipient user device from the server system, although other mechanisms for delivering the finalised sequence of media clips may be used as appropriate.
  • Other server architectures and structures may be used as appropriate, and the methods described above may be implemented in software executing on electronic user devices and/or server systems.


Abstract

A computer-implemented method of converting a text-based message into a sequence of media clips. The method is carried out by a user device and comprises receiving a text-based message comprising a plurality of words and sending the text based message, via a network connection, to a remote server. At the remote server the message is separated into constituent words whereby sets of media clips are each associated with one or more of the respective words of the message, the media clips in a given set being identified based on associated metadata related to the respective word. The user device receives at least a first automatically selected media clip from each of the sets of media clips for the respective words of the message, wherein the sequence of media clips contains one automatically selected media clip from each of the sets of media clips. For one or more of the words in the text based message, user input is received selecting a media clip to replace an automatically selected media clip for inclusion in the sequence of media clips. The user input may select a media clip from the set of media clips associated with the word or an additional media clip accessible by the user device from an alternative source. Corresponding user devices and remote servers are also provided.

Description

APPARATUS AND METHOD FOR CONVERTING TEXT TO MEDIA CLIPS

TECHNICAL FIELD
Embodiments of the invention relate to systems and methods for converting a string of text into a sequence of related media clips with accompanying soundtrack and text.
BACKGROUND
Many different forms of electronic communication are now available, and a number of these rely upon the transmission of text from a sender to a recipient. It would be desirable to provide users with alternative ways of expressing their message to a recipient in an efficient manner whilst still providing the sender with a degree of control over the content that is being sent to the recipient.
SUMMARY OF INVENTION
The invention is defined in the independent claims to which reference is now directed.
Preferred features are set out in the dependent claims.
Embodiments of the invention may provide a system and method for converting a text based message into a sequence of media clips, each media clip being representative of one of the words within the text-based message. This is achieved by separating the text based message into its constituent words and, for at least some of the words, performing a search for relevant media clips. Since it is likely that more than one relevant clip will be identified in each search, a subset of the results for each word may be selected, each subset being associated with the relevant word from the text-based message. Initially, a clip from each set may be automatically selected by the method or system, with these automatically selected clips being the default clips used when creating the sequence of media clips representative of the text based message. The user is also presented with the option of replacing any of the media clips with an alternative media clip from the respective search result set associated with a given word or including a self or user generated piece of media. Once the relevant clips for each word in the text-based message have been selected, a final sequence of media clips is generated, and may be communicated to a recipient party.
According to a first aspect of the invention there is provided a computer- implemented method of converting a text-based message into a sequence of media clips. The method is carried out by a user device and comprises receiving a text-based message comprising a plurality of words and sending the text based message, via a network connection, to a remote server. At the remote server the message is separated into constituent words whereby sets of media clips are each associated with one or more of the respective words of the message, the media clips in a given set being identified based on associated metadata related to the respective word. The user device then receives, via the network connection, at least a first automatically selected media clip from each of the sets of media clips for the respective words of the message, wherein at this stage the sequence of media clips contains one automatically selected media clip from each of the sets of media clips. The method further comprises, for one or more of the words in the text based message, receiving user input selecting a media clip to replace an automatically selected media clip for inclusion in the sequence of media clips. The user input may select a media clip from the set of media clips associated with the word or an additional media clip accessible by the user device from an alternative source. Such a method allows a user to alter an initial sequence of media clips representing a text based message.
The user input selecting a media clip to replace an automatically selected media clip may cause the user device to receive, via the network connection, one or more further media clips from the set of media clips associated with that word. In this way possible replacement media clips are only downloaded by the user device as and when the user opts to view or select a replacement, saving on bandwidth when a user does not browse through all the possible clips in the sets. Alternatively, the step of receiving, via the network connection, at least a first automatically selected media clip from each of the sets of media clips may further include receiving one or more further media clips from each of the sets of media clips. In this way, all the media clips are downloaded at the same time, giving the user a more seamless selection experience.
The method may further comprise displaying a digital object representative of an automatically selected media clip from the set of media clips associated with a word in the text based message. In response to receiving user input of a first kind, the method may include displaying a digital object representative of a second media clip from the set of media clips associated with the word in the text based message and also selecting the second media clip for inclusion in the sequence of media clips. This allows users to select different clips within a group. The selection step may, optionally, require further user input of a different kind. In response to receiving user input of a second type, the method may include displaying a digital object representative of an automatically selected media clip from the set of media clips associated with a second word in the text based message. In response to subsequently receiving user input of the first type whilst the digital object representative of the media clip associated with the second word is being displayed, the method may further include displaying a digital object representative of a second media clip from the set of media clips associated with the second word in the text based message, and selecting the second media clip for inclusion in the sequence of media clips. This allows a user to scroll through the different words to review and optionally select alternative clips for the other words. The user device may include a touch screen or gesture controlled display and the user input of the first kind may be a swipe gesture in a first direction and the user input of the second kind may be a swipe gesture in a second direction different to the first direction, such as vertical and horizontal swipes, or vice versa. The digital object may be, for example, a still image from a video. 
The method may further comprise displaying a preview of a media clip when user input is received selecting the digital object representative of the media clip. In response to receiving user input, the method may further include displaying a digital object representative of an additional media clip obtained from an alternative source and selecting the media clip for inclusion in the sequence of media clips. The additional media clip accessible by the user device from an alternative source may be a user generated media clip, such as a media clip generated by a video camera integrated with the user device.
Each media clip may be of a relatively short duration such as between 1 and 3 seconds in duration; 2 seconds in duration; or approximately 2 seconds in duration. The maximum number of words that may be included within the text based message may be selected such that the sequence of media clips cannot exceed a predetermined maximum period of time. The maximum number of words may correspond to the maximum period of time for the sequence of media clips divided by the duration of a clip.
The text-based message may be input into the user device by a user via a user input device, such as a keyboard or a microphone coupled with voice-to-text software, and the method may further comprise, as the text-based message is received, calculating the total length of the resulting sequence of media clips by associating each word with a predetermined period of time and calculating the sum of these periods of time. The method may further comprise comparing the calculated total length of the sequence of media clips with an upper threshold limit and displaying an indicator to the user when the upper threshold limit is exceeded. Words of a first type may be associated with a first predetermined period of time and words of a second type may be associated with a second predetermined period of time, the first period of time being greater than the second period of time. The words of a second type may be identified based on the number of letters in the word, whereby words having a number of letters less than or equal to a predetermined number are associated with the second period of time. The first predetermined period of time may be between 1 and 3 seconds in duration; 2 seconds in duration; or approximately 2 seconds in duration.
The method may further include sending data to the remote server indicative of the revised sequence of media clips to be combined into a revised sequence of clips.
Therefore the combination of clips into a final sequence may occur at the server. The sets of media clips received by the user device may be low-resolution media clips and the revised sequence of media clips may be created at the server by combining high-resolution versions of the selected media clips at the remote server. The method may further comprise sending any selected media clips obtained from an alternative source to the remote server for inclusion in the final sequence of clips.

According to the first aspect a computer-implemented method for use in the conversion of a text based message into a sequence of media clips is provided, the method being performed at a remote server system. The method includes receiving from a user device, via a network connection, a text based message comprising a plurality of words. The text based message is separated into the constituent words and one or more of the constituent words are compared with metadata contained within a database to identify, for each of the compared words, a set of media clips relevant to the word. The method further includes sending to the user device, via the network connection, at least a first automatically selected media clip from a set of media clips for each of the compared words and at least a second media clip from a set of media clips for at least one of the compared words.
The server method may further comprise receiving a request from the user device to select a media clip to replace an automatically selected media clip and in response sending, via the network connection, one or more further media clips from the set of media clips associated with that word. Alternatively, the step of sending to the user device, via the network connection, at least a first automatically selected media clip may include sending one or more further media clips from each of the sets of media clips.
The step of sending the media clips for each word may comprise sending low- resolution media clips.
The method may further comprise receiving from the user device, in relation to a particular word, data indicating a user selected media clip from one or more of the sets of media clips, and combining the selected media clips into a sequence of media clips.
Combining the selected media clips into a sequence of media clips may be performed using high-resolution versions of the media clips. The method may further comprise receiving, from the user device, one or more additional media clips from one or more alternative sources accessible by the user device and data indicating a word within the text based message that each of the additional media clips is to be associated with; and combining the selected media clips and the user generated media clips into a sequence of media clips.
According to the first aspect there may be provided an electronic user device that comprises a user input device, a network connection, one or more processors and a memory. A program is stored in the memory and configured to be executed by the one or more processors. The program includes instructions to: send, via the network connection, a text based message to a remote server to be separated into constituent words, whereby sets of media clips are each associated with respective words in the message, the media clips in a given set being identified based on associated metadata related to the respective word; and receive, via the network connection, at least a first automatically selected media clip from each of the sets of media clips for the respective words in the message; whereby the sequence of media clips contains one automatically selected media clip from each of the sets of media clips. The program further includes instructions to: for one or more of the words in the text based message, receive user input to select a media clip and replace an automatically selected media clip for inclusion in the sequence of media clips to generate a revised sequence of media clips, the media clip being selected from the set of media clips associated with the word or an additional media clip accessible by the user device from one or more alternative sources. The user device may be configured to generate a revised sequence of media clips including the user selected media clips or alternatively to send data to the remote server indicative of a revised sequence of media clips including the user selected media clips to be combined into a revised sequence of media clips. The user input device may include a touch screen or gesture controlled display. The user input device may include an integral video camera.
The alternative source may include a memory of the user device. The alternative source may alternatively or additionally include a remote server storing one or more media clips. Different clips may therefore be obtained from different sources. According to the first aspect there may be provided a remote server system that comprises a network connection, one or more processors, and memory. A program is stored in the memory and configured to be executed by the one or more processors. The program includes instructions to cause the server system to: receive from an electronic user device, via the network connection, a text-based message comprising a plurality of words; separate the text-based message into the constituent words; compare one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a set of media clips relevant to the word; and send to the user device, via the network connection, at least a first automatically selected media clip from a set of media clips for each of the compared words and at least a second media clip from a set of media clips for at least one of the compared words.
The server system may further comprise a database containing a plurality of media clips associated with the metadata, the database storing a high-resolution version of each clip and a low-resolution version of each clip. The program may further include instructions to: send low-resolution versions of the clips to the user device; receive from the user device, in relation to a particular word, data indicating a user selected media clip from one or more of the sets of media clips; and combine the selected media clips into a sequence of media clips using high-resolution versions of the media clips. The program may further include instructions to: receive, from the user device, one or more additional media clips from one or more alternative sources accessible by the user device and data indicating a word within the text based message that each of the additional media clips is to be associated with; and combine the selected media clips and the additional media clips into a sequence of media clips. 
A method of converting a text based message into a sequence of media clips may be provided, the method comprising: receiving a text-based message comprising a plurality of words; sending, via a network connection, the text-based message to a remote server to be separated into constituent words, whereby sets of media clips are each associated with respective words in the message, the media clips in a given set being identified based on associated metadata related to the respective word; receiving, via the network connection, at least a first automatically selected media clip from each of the sets of media clips for the respective words in the message; wherein the sequence of media clips contains one automatically selected media clip from each of the sets of media clips, the method further comprising: for one or more of the words in the text based message, receiving user input to select a media clip to replace an automatically selected media clip for inclusion in the sequence of media clips to generate a revised sequence of media clips, the user input selecting a media clip from the set of media clips associated with the word or an additional media clip accessible by the user device from an alternative source.
A method for use in the process of converting a text based message into a sequence of media clips may be provided, the method comprising: receiving a text based message comprising a plurality of words; separating the text based message into the constituent words; comparing each of the constituent words with metadata contained within a database to identify, for each word, a plurality of media clips relevant to the word;
selecting a set of media clips and associating that set with the word; and sending the set of media clips for each word to a user device.
A method of converting a text based message into a sequence of media clips may be provided, the method comprising: receiving a text based message comprising a plurality of words; separating the text based message into the constituent words; comparing one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a set of media clips relevant to the word; automatically selecting, from each of the sets of media clips, a media clip; for one or more of the words in the text based message, receiving user input to select a media clip to replace the automatically selected media clip for inclusion in the sequence of media clips; and generating a sequence of media clips using the selected media clips.
A system may be provided comprising an electronic user device and a remote server system. The electronic user device is configured to: send, via the network connection, a text based message to a remote server to be separated into constituent words, whereby sets of media clips are each associated with respective words in the message, the media clips in a given set being identified based on associated metadata related to the respective word; receive, via the network connection, at least a first automatically selected media clip from each of the sets of media clips for the respective words in the message; wherein the sequence of media clips contains one selected media clip from each of the sets of media clips. The user device is further configured to: for one or more of the words in the text based message, receive user input to select a media clip and replace an automatically selected media clip for inclusion in the sequence of media clips. The remote server system is configured to: receive from the electronic user device the text-based message comprising a plurality of words; separate the text-based message into the constituent words; compare one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a set of media clips relevant to the word; send to the user device, via the network connection, at least a first
automatically selected media clip from a set of media clips for each of the compared words and at least a second media clip from a set of media clips for at least one of the compared words. The system is configured to generate a sequence of media clips using the selected media clips. This step may take place at the server system or at the user device.
According to a second aspect there is provided a computer-implemented method of converting a text-based message into a sequence of media clips that does not necessarily require providing the user with the option of replacing media clips with alternatives. The method comprises, at a user device, receiving a text-based message comprising a plurality of words and sending, via a network connection, the text-based message to a remote server to be separated into constituent words, whereby media clips are associated with one or more of the words in the message, the media clips being identified based on associated metadata related to a respective word. Embodiments according to this second aspect do not require the user selection functionality, whereby users can select
replacement clips to replace automatically selected clips.
A corresponding computer-implemented method may be provided to be executed at a remote server system, the method comprising receiving from a user device, via a network connection, a text based message comprising a plurality of words, separating the text based message into the constituent words and comparing one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a media clip relevant to the word. The final sequence of media clips may be accompanied by an overarching audio soundtrack. Different soundtracks can be chosen by the user. Alternatively, or in addition, each clip may be accompanied by a graphical display of the word with which that clip is associated or to which it is relevant. Different fonts can be chosen by the user for these displayed words.
BRIEF DESCRIPTION OF THE FIGURES
Embodiments of the invention will now be described with reference to the accompanying figures in which:
Figure 1 is an example of a method according to the present invention;
Figure 2 is an example of a method carried out by a user device according to the present invention;
Figure 3 is an example of a layout for a set of screens for previewing media clips and selecting alternative clips; and
Figure 4 is an example of a system according to the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Figure 1 is an example of a method for converting a text-based message into a sequence of media clips. The method is generally implemented using a system comprising an electronic user device and one or more remote servers making up a server side system. The user device may be, for example, a mobile phone, smart phone, tablet, laptop, PDA or other type of user operated computing device. The remote server may be a single server unit, or may be implemented as a distributed system, such as a cloud based or virtual server system.
Initially, a text-based message is received at the user device. The text-based message may be received by being entered directly by the user, input as text using a user input device such as a keyboard or a touch screen display including a virtual keyboard for the user to compose their message. Alternatively, the text based message may be received at the user device from another source, such as text copied from a website or a message received from another user device. At step 102 the text based message is parsed to divide it up into its constituent words. Parsing can be achieved using any suitable software based method, which will not be described in detail here, and may also include language processing to determine the meaning of words in context, for example by analysing the use of the word as a verb or the use of plurals. The parsing step may optionally be performed at the server side, on receipt of the text based message from the user device, with the user device and the server being connected via a network such as the internet. Once the text message has been parsed the method involves identifying, for each constituent word, relevant media clips that are representative of the word. For the purposes of the present example it will be assumed that the media clips are video clips that are also associated with an accompanying audio element or stream. It will, however, be appreciated that the media clips may be intended to be video only or audio only.
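As a rough illustration (not taken from the source), the basic parsing step might be sketched as follows; a real implementation would add the contextual language processing mentioned above:

```python
import re

def parse_message(message: str) -> list[str]:
    """Split a text-based message into its constituent words.

    A minimal sketch only: the description notes that parsing may
    also involve language processing (e.g. recognising verbs or
    plurals), which is omitted here.
    """
    # Lower-case the text and keep alphabetic runs so that words can
    # later be matched against metadata terms.
    return re.findall(r"[a-z']+", message.lower())
```

For example, `parse_message("cat dog man")` yields `["cat", "dog", "man"]`, matching the three-word message used in the Figure 3 example.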
At step 103 a database search is performed. Some or all of the constituent words of the text-based message are each compared with a metadata database, which may optionally be a database stored within, or accessible by, the server system. The metadata database contains one or more descriptive terms each associated with a particular media clip which may be referenced by the metadata database, or contained within the same database structure. In order to generate the metadata database, key words are associated with each video clip that is accessible by the system. In this way, a given word forming part of a text-based message is compared with the metadata database to identify appropriate media clips that are suitable to represent the word in a final sequence of media clips that is to be sent to a recipient user device.
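One minimal way to model the metadata comparison (an illustrative assumption, with the metadata database stood in for by a plain dictionary mapping descriptive terms to clip identifiers) is:

```python
def find_clips(word: str, metadata_index: dict[str, list[str]]) -> list[str]:
    """Return identifiers of clips whose metadata key words include
    the given word.

    `metadata_index` is a simplified stand-in for the metadata
    database described in the text: descriptive terms mapped to the
    media clips they are associated with.
    """
    return metadata_index.get(word.lower(), [])
```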
It will be appreciated that a given word may result in a number of matches from the database, whereby the word appears in the metadata associated with a large number of media clips. Where this is the case, the method may optionally include a step 104 of selecting a predetermined number of media clips to return in the search results. This may be performed in two stages or as a single step. For example, the method may involve selecting 50 media clips from the total possible results, even though a large number (such as 500 or more) may be found. From these limited results a further set may be selected, such as five clips, these selected clips forming a subset of the search results. The first or second stage may involve selecting clips from the results randomly. Alternatively, where a weighting or ranking system is used, the most relevant search results may be selected. Alternatively, the selection of the set of clips from the search results may be performed in a single step, selecting a predetermined number, for example up to five or up to ten clips, from the results which may then be presented to the user in the manner described below. The final sequence of media clips representative of the original text based message is created by selecting, from each set of clips associated with each word of the text based message, one of the media clips from that set. At step 105, an automatic selection is made of a media clip from each set associated with a given word. This may be a random selection from the set (e.g. a random selection from the set of five to ten clips selected from the search results). This automatic selection may take place by logic enacted at the user device or at the server system.
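The two-stage narrowing described above might be sketched as follows (the figures 50 and 5, and the use of random sampling, are simply the examples given in the text; a ranking-based selection could be substituted at either stage):

```python
import random

def select_clip_set(search_results: list[str], stage_one: int = 50,
                    stage_two: int = 5,
                    rng: "random.Random | None" = None) -> list[str]:
    """Narrow a word's search results in two stages: cap the results
    at `stage_one` clips, then pick a `stage_two`-clip subset to
    present to the user."""
    rng = rng or random.Random()
    capped = search_results[:stage_one]       # stage 1: limit the results
    k = min(stage_two, len(capped))
    return rng.sample(capped, k)              # stage 2: random subset
```

A word returning only one hit, as "cat" does in the Figure 3 example, simply yields a one-clip set.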
Without any further user input the final sequence of media clips may simply comprise the automatically selected clips from each set of clips. However, according to embodiments of the invention, the user is given the opportunity, at step 106, to provide input selecting a replacement media clip from a set of media clips associated with a particular word in the text-based message. Optionally the opportunity to provide user input to select a media clip is provided for each word within the text based message, or for each word for which a set of media clips has been identified. This may, however, not be possible if the search results only return a single match, or do not return any matches, for a given word. Where a choice is provided the user may provide user input of the type described herein to optionally preview the different available clips and to select a preferred clip from a set associated with a given word. If the user provides input selecting an alternative clip to the automatically selected clip, the automatically selected clip is replaced with the user's choice at step 107. If the user does not provide any input selecting an alternative clip then the automatically selected clip may be used.
Once the final selection of media clips has been made a combined sequence of media clips may be generated, at step 108, using the selected media clips. Generating this sequence of media clips may comprise generating data indicative of the media clips and their ordering, which may then be forwarded to the server system to combine together the media clips and share the resulting combination with a recipient party via a recipient user device or by posting it to a website. Alternatively, generating a sequence of media clips may involve combining the clips together at the user device, for forwarding directly on to the desired recipient.
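The "data indicative of the media clips and their ordering" forwarded to the server might be as simple as the following sketch (the field names here are assumptions, not taken from the source):

```python
def build_sequence_payload(selected_clip_ids: list[str]) -> dict:
    """Describe the final sequence by clip identifier and position,
    so that the clips themselves need not be sent back to the
    server."""
    return {
        "clips": [
            {"position": i, "clip_id": clip_id}
            for i, clip_id in enumerate(selected_clip_ids)
        ]
    }
```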
Optionally, user generated content can be used as replacement media clips for any of the words within the text based message. Step 106 of Figure 1 may be expanded to include receiving input selecting a replacement media clip from a memory associated with, or accessible by, the user device. Allowing the selection of user generated content allows additional choice for the user where, for example, the search results for a given word only return a single match, or do not return any matches. The user may provide user input of the type described herein to optionally preview their user generated content. If the user provides input selecting an alternative user generated clip to the automatically selected clip, the automatically selected clip is replaced with the user's choice. User generated content may include video generated by the user using the user device, for example, a video camera device integral with, or coupled to, their user device. Alternatively, or additionally, the user generated content may include content stored on the user device or in a memory accessible by the user device such as a personal video collection.
Figure 2 shows the steps in a method such as that described in relation to Figure 1 that may be applied at the user device. Initially, at step 201, the text based message is created in the same manner as described above for step 101. The text may then be sent to the remote server system where parsing and metadata database searches are performed for the constituent words making up the text-based message. The user device may then receive the results of the metadata searches performed at the server system, the results being sets of media clips each comprising one or more media clips, associated with words making up the text based message, at step 202. The received sets of media clips are identified as having a sequence corresponding to the sequence of words appearing in the text based message, such that the ordering of the sets of media clips corresponds to the ordering of the words in the text-based message. The user device may receive the different sets of media clips with an indication of an automatically pre-selected clip for each word. Alternatively, the user device may be configured to automatically select a media clip for each word from the respective sets. At step 203 the user is able to adjust the media clips used for each word in the sequence. In particular, the user is able to preview the selected media clip associated with each word and decide whether they are happy with the selection or not. If the user is happy with the selection then they do not need to provide any further input for the particular word. If the user wishes to browse the other options contained within the set of media clips then they can do so by providing user input to identify and preview the desired clip. In addition, or alternatively, the user can select additional clips from alternative sources other than the database/library used by the remote server. In order to give an indication of the content of the clip, the user device may be configured to display a digital object
representative of each media clip from the set of media clips or the media clip or clips obtained from the alternative source or sources. The digital object may be a still image from the video clip for example, and if the user provides input identifying or selecting a given image, the associated media clip will preview or play. If the user prefers one of the previewed clips they can provide input selecting that clip to replace the automatically selected clip in the sequence of clips. The user may also be able to select a soundtrack to play over the final sequence of clips and the style of font used to display the corresponding words over each clip.
The set of media clips associated with a given word may be transmitted to the user device from the server at the same time, for example at step 202, such that the user is able to browse and select the different media clips associated with each word. Alternatively, however, only a first clip may be sent initially, for each word, being a clip that has been automatically selected from the relevant set of clips by the server system. Further clips in a given set may only be sent to the user device from the server in response to input from the user. In particular this may include user input requesting to view and/or select an alternative clip for a given word. As the user selects to change clips, a next clip in the set of clips is retrieved from the server, and this may be repeated as the user requests to view and/or select further clips. It is also possible that a single request by a user to select and/or view an alternative clip may cause the user device to request the remainder of the set of clips for the particular word, or for all words. At step 204 the user is able to optionally preview the resulting sequence of clips that will be viewed by a recipient, including any changes made by the user. The user may be presented with an option to preview the final sequence at the end of the process. Once the final sequence has been decided then data indicative of the final sequence may be generated and, at step 205, sent to a server for generation and distribution of the combined sequence of clips.
Additional clips from alternative sources may include user generated clips generated using a video camera and/or microphone integral to the user device, such as a camera within a smart phone. Alternatively, or additionally, the clips may be generated with a stand-alone camera device and made available to the user device, such as by uploading the video clips to a memory on the user device or otherwise accessible by the user device, including a web-based or server based storage or memory. Alternatively, or additionally, the additional clips may be obtained from one or more servers other than the remote server system used for parsing the text-based messages and searching for selections of clips, including web pages hosting media clips such as YouTube (TM).

Figure 3 shows an example layout for the screens presented to a user when a user is making changes to the selected media clips. In this example, a simple three word message "cat dog man" has been composed by the user, and a set of results has been returned for each word in the message. The term "cat" has returned a single hit and so a single screen 301 is available showing a still image "1.1" from a media clip relevant to the term "cat". In contrast, the term "dog" has returned three hits resulting in three still images 302 to 304 respectively labelled "2.1", "2.2" and "2.3". These represent the options available to the user for replacing the automatically selected clip identified as "2.1" in image box 302. Similarly, the term "man" has resulted in two hits. Each of the boxes 301 to 306 represents screen views that may be presented on a user device to provide the user with the opportunity of previewing a given clip. The user device may be equipped with a touch screen or a gesture controlled input associated with a display screen. The first box 301 may be first presented to the user, giving them the option of previewing the clip
represented by the image 1.1, for example by pressing the image with their finger, stylus or other pointing device. The user may then move on to the second word by providing an appropriate gesture, such as by swiping from right to left across the image presented on display screen 301. This will cause the user device to display a still image associated with the automatically selected clip 2.1 for the second word "dog". As previously, the user can select to play or preview the media clip by tapping or touching the screen, or an appropriate preview icon. The user may also move to, or identify, the second media clip 2.2 within the set of media clips associated with the second term "dog" by providing user input different to the input used to cause the user device to change displays from the first word to the second word. For example, the user may provide a vertical swiping gesture, as opposed to a horizontal swiping gesture, such as by swiping in a direction from the bottom of the display screen to the top of the display screen or vice versa. This will cause the user device to display the screen 303, giving the user the option of previewing the clip associated with image "2.2" or proceeding to image "2.3" in box 304 by swiping again. Similarly, the user can move across to the third word and select alternative media clips in a similar manner. In this way a menu of alternative media clips can be presented to the user and selected for inclusion in the final sequence of media clips. This allows multiple options to be presented to a user on a relatively small screen such as the screen on a mobile phone or tablet, which might not be large enough to adequately display all the available options for the clips making up the sequence of clips. Allowing the user to scroll through the different words and also scroll through the different available clips for each word in this manner allows easy selection of clips on a limited input device with a limited screen space.
The user may be given the option of adding user generated video or other media clips as described above. For example, the option may be presented by providing an additional box beneath each of the set of boxes associated with respective words (e.g. beneath boxes 301, 304 and 306). When the additional box is selected, the user is provided with the option of selecting a clip from their own collection of clips, which may have been generated by them, or obtained from a source coupled to their user device such as via the internet.
The final sequence of media clips may include, or be accompanied by, an audio track, and particularly a music track or soundtrack, that plays as the sequence of media clips are shown. Therefore, a single audio track may play over the plurality of media clips making up the final sequence of media clips. An audio track may be automatically selected and the user given the opportunity to preview the track and replace it with a different audio track as desired. Alternatively, the user may be given a direct choice as to which audio track should be used to accompany the sequence of media clips.
The user may be presented with a number of options to adjust the properties of the final sequence of media clips. In the final sequence of media clips each media clip may be presented alongside the original word with which the clip is associated. In the example above the clips for "cat", "dog" and "man" could be accompanied by the respective relevant words. The user may be presented with the option to allow text to be presented in any suitable location relative to the clip, such as the top or bottom of the screen, to overlay the clip, or to hide or remove the text. The user may be given the option of selecting different fonts, colours or styles for the accompanying wording. The user may be given the option of selecting different soundtracks to accompany the sequence of media clips, or to shuffle the text font and/or soundtrack throughout the final sequence. The user may also be given the option to combine or separate video clips. These options may be presented in an additional box beneath each of the set of boxes associated with respective words (e.g. beneath boxes 301, 304 and 306), or may be presented simultaneously with the boxes allowing selection of clips for respective words.

Figure 4 shows an embodiment of a system 400 for implementing the methods described herein. The system comprises a user device 401 of the type described herein, such as a mobile phone or tablet device comprising a touch screen input. The user device includes a network connection such that it is able to communicate with a server system 402 via a network such as the internet. The server 402 comprises a number of functional components which may each be implemented as hardware functional units, or in software. The server itself may be a single server unit, a collection of network server units, or a virtual server unit implemented in a cloud based system.
Functionally, the server may include an application programming interface (API) 403 for controlling the operation of other software based functional units within the server. The API is able to exchange data with the user device 401. Included in the server is a parser 404 for separating the text of a text-based message received from the user device into its constituent words. A database 406 is provided. The database 406 may be a metadata database, storing metadata in the form of key words linked to respective media clips. The media clips themselves may be stored in a clip storage device 407 which may be a memory internal to or accessible by the server system such as one or more hard disk drives storing a large number of clips. Also included is a media server 405 which may be used to retrieve media clips from the storage device. Further optional components such as delegator 408 and other load balancing components may be provided as appropriate.
The media clips are retrieved from a library, which may in this embodiment be stored in storage device 407. The library may be a maintained collection of media clips, such as videos, of a predetermined duration. Each media clip stored on the clip storage device 407 is optionally short in duration. For example, each of the media clips may be less than 10 seconds in duration, less than 5 seconds in duration, or between 1 and 3 seconds in duration. Preferably, in some embodiments, each clip is 2 seconds, or approximately 2 seconds, in duration. For example, each clip may be limited to between 1.50 and 2.49 seconds in duration. Many words will be associated with metadata in the database, and these can be considered words of a first type. Some words, however, may not have any appropriate media clips associated with them, such that a search for that word will not identify any relevant results. Additionally, or alternatively, some words may be predefined as being unsuitable for having a clip associated with them in the database, such as common words like "the" or "a" or words that contain a predetermined number of letters or fewer (e.g. 3 letters or fewer). Either or both of these types of words can be considered words of a second type. For words of the second type, a media clip may be automatically generated that simply displays the text of the word against a background. The background may be a predetermined background, for example a plain coloured background. Clips relating to words of the first type, obtained from the library, will have a particular duration as described above. Clips generated for words of the second type may have a shorter duration, such as 1 second or less for example.
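The first-type/second-type distinction might be sketched as below. The stop-word list and the length threshold are illustrative values only ("the", "a" and three letters or fewer are the examples given in the text); note the length rule is optional, since in the Figure 3 example three-letter words such as "cat" still receive library clips:

```python
STOP_WORDS = {"the", "a"}  # illustrative examples from the text

def word_type(word: str, max_short_len: int = 3) -> int:
    """Return 1 for words to be matched against the clip library
    (first type), or 2 for words that instead receive an
    auto-generated text-on-background clip (second type): common
    words, or words at or below the length threshold."""
    if word.lower() in STOP_WORDS or len(word) <= max_short_len:
        return 2
    return 1
```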
The system may be configured such that the resulting final sequence of clips cannot exceed a predetermined period of time, the period of time being, for example, 60 seconds or 30 seconds. This can be achieved by limiting the number of words in a text based message to a predetermined number based on the length of the clips. The number may, for example, correspond to the maximum period of time for the final sequence of clips divided by the duration of a clip. The duration of clips may be consistent across all clips, or the average value for clip duration may be used where the duration of clips varies between upper and lower values.
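The word-count cap described above reduces to a simple division, as in this sketch; with 2-second clips and a 60-second limit it allows a 30-word message:

```python
def max_word_count(max_sequence_seconds: float,
                   clip_duration_seconds: float) -> int:
    """Maximum number of words allowed in a message so that the
    final sequence cannot exceed the predetermined period: the time
    limit divided by the (average) clip duration, rounded down."""
    return int(max_sequence_seconds // clip_duration_seconds)
```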
Embodiments may provide a method that monitors the predicted duration of the sequence of media clips as the user enters the words using an input device such as a keyboard, touch screen or microphone coupled with voice to text functionality associated with the user device 401. As the text-based message is input, the user device is configured to calculate the total length of the resulting sequence of media clips. Each word may be associated with a predetermined period of time and the sum of these periods of time is then calculated and compared with a threshold time limit. Once the time limit is exceeded a warning or indicator is displayed to the user. The calculation may be repeated each time a new word is input to keep track of the size of the text based message. The predetermined period of time may simply correspond to the duration or average duration of the media clips from the library. Alternatively, words of the first type may be associated with a first predetermined period of time and words of the second type may be associated with one or more second predetermined periods being shorter in duration than the first period such as 1 second or less.
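Monitoring the predicted duration as the user types might be sketched as follows. The per-type durations of 2 s and 1 s echo the examples in the text, and the crude length-based type test is a placeholder assumption:

```python
def predicted_duration(words: list[str], type1_seconds: float = 2.0,
                       type2_seconds: float = 1.0) -> float:
    """Sum a per-word duration: library clips (first type) versus
    shorter auto-generated clips (second type, here approximated as
    words of three letters or fewer)."""
    return sum(type2_seconds if len(w) <= 3 else type1_seconds
               for w in words)

def warn_if_too_long(words: list[str], limit_seconds: float = 30.0) -> bool:
    """True once the predicted sequence exceeds the threshold, at
    which point the device would display the warning described."""
    return predicted_duration(words) > limit_seconds
```

The calculation would be re-run after each new word, as the text describes.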
The clip storage device 407 optionally stores two versions of each media clip. A high-resolution version of a given media clip is stored for use in a final sequence of media clips that is presented to the message recipient. A "high" resolution media clip may include, for example, that achieved using the "FFmpeg" software at a setting of mp4 = 1024x576 or equivalent. In addition, a low-resolution version of each clip is stored. It is the low-resolution version of clips that are passed from the server system 402 to the user device 401 in order to present the user with the set of media clips used to make up the sequence of media clips associated with their text-based message. A "low" resolution clip may include, for example, that achieved using the "FFmpeg" software at mp4 = 480x270 or equivalent. The low-resolution clips are of a lower resolution than the high-resolution clips and therefore take up less memory and require less bandwidth to send them to the user device. The settings may be adjusted appropriately to achieve a balance of size/bandwidth considerations vs. quality.

Upon receipt of a text-based message from a user device 401, the API 403 sends the message to parser 404 to be deconstructed into its constituent words. The API then instructs a search to be performed using database 406 to identify, for each word, a plurality of clips containing content representative of the word in question. Where a large number of results may be returned, the API may select a predetermined number of the results, and select from that predetermined number a predetermined subset of results to pass back to the user device for possible selection by the user. For example, the API may limit the number of results to 50, and select five or ten random results from the 50 results to pass back to the user device for possible selection as described above.
Alternatively a predetermined number of results may simply be selected, such as five or ten results, and passed back to the user device, again as described above. The server system may make an initial automatic selection of a first clip from each selection of clips, and the remainder of the clips in the selection may only be passed to the user device in response to the user indicating that they wish to review or select alternative clips. This may be performed on a per word basis.
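The two-resolution storage described above (1024x576 for the final render, 480x270 for previews) might be produced with an FFmpeg invocation along the following lines; the `scale` filter is a plausible way to hit the quoted settings, but the exact command line is an assumption rather than something specified in the source:

```python
def transcode_args(src: str, dst: str, high_resolution: bool) -> list[str]:
    """Build an FFmpeg command line for one of the two stored
    versions of a clip: high resolution for the final sequence, low
    resolution for previews sent to the user device."""
    width, height = (1024, 576) if high_resolution else (480, 270)
    return ["ffmpeg", "-i", src,
            "-vf", f"scale={width}:{height}",  # resize to target resolution
            dst]
```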
The database 406 associates media clips from storage 407 with appropriate metadata. Low resolution versions of the clips are passed to the user device for review and possible selection by the user, resulting in a final selection of clips for the sequence of clips representative of the original text-based message. The user device may generate the final sequence of clips, or it may pass data indicative of the final sequence of clips back to the server system 402. The data may include a list of clip identifiers and the ordering of the clips, such that the clips themselves do not need to be passed back to the server system. The server system may produce the finalised sequence of media clips using high-resolution clips from the clip storage 407. Where user generated clips, or clips from other alternative sources, are included in the final sequence of clips, these clips may be sent from the user device to the server system as appropriate. The recipient watches the resulting finalised sequence of media clips preferably by streaming it to their recipient user device from the server system, although other mechanisms for delivering the finalised sequence of media clips may be used as appropriate. It will be appreciated that different server architectures and structures may be used as appropriate, and that the methods described above may be implemented in software executing on electronic user devices and/or server system.

Claims

1. A computer-implemented method of converting a text-based message into a sequence of media clips, the method comprising, at a user device:
receiving a text-based message comprising a plurality of words;
sending, via a network connection, the text-based message to a remote server to be separated into constituent words, whereby sets of media clips are each associated with respective words in the message, the media clips in a given set being identified based on associated metadata related to the respective word;
receiving, via the network connection, at least a first automatically selected media clip from each of the sets of media clips for the respective words in the message;
wherein the sequence of media clips contains one automatically selected media clip from each of the sets of media clips, the method further comprising:
for one or more of the words in the text based message, receiving user input to select a media clip to replace an automatically selected media clip for inclusion in the sequence of media clips to generate a revised sequence of media clips, the user input selecting a media clip from the set of media clips associated with the word or an additional media clip accessible by the user device from an alternative source.
2. A method according to claim 1 wherein the user input to select a media clip causes the user device to receive, via the network connection, one or more further media clips from the set of media clips associated with that word.
3. A method according to claim 1 wherein the step of receiving, via the network connection, at least a first automatically selected media clip from each of the sets of media clips further includes receiving one or more further media clips from each of the sets of media clips.
4. A method according to any preceding claim further comprising:
displaying a digital object representative of an automatically selected media clip from the set of media clips associated with a word in the text based message; and
in response to receiving user input of a first kind, displaying a digital object representative of a second media clip from the set of media clips associated with the word in the text based message and selecting the second media clip for inclusion in the sequence of media clips.
5. A method according to claim 4 further comprising:
in response to receiving user input of a second kind, displaying a digital object representative of an automatically selected media clip from the set of media clips associated with a second word in the text based message; and
in response to subsequently receiving user input of the first kind whilst the digital object representative of the media clip associated with the second word is being displayed, displaying a digital object representative of a second media clip from the set of media clips associated with the second word in the text based message, and selecting the second media clip for inclusion in the sequence of media clips.
6. A method according to claim 5 wherein the user device includes a touch screen or gesture controlled display and the user input of the first kind is a swipe gesture in a first direction and the user input of the second kind is a swipe gesture in a second direction different to the first direction.
7. A method according to claim 4, 5 or 6 wherein the digital object is a still image from a video.
8. A method according to any of claims 4 to 7 further comprising displaying a preview of a media clip when user input is received selecting the digital object representative of the media clip.
9. A method according to any preceding claim further comprising, in response to receiving user input, displaying a digital object representative of an additional media clip obtained from an alternative source and selecting the media clip for inclusion in the sequence of media clips.
10. A method according to any preceding claim wherein each media clip is approximately 2 seconds in duration.
11. A method according to any preceding claim wherein the maximum number of words that may be included within the text based message is selected such that the sequence of media clips cannot exceed a predetermined maximum period of time.
12. A method according to claim 11 wherein the maximum number of words corresponds to the maximum period of time for the sequence of media clips divided by the duration of a clip.
13. A method according to any preceding claim wherein the text-based message is input into the user device by a user via a user input device, and the method further comprises, as the text-based message is received, calculating the total length of the resulting sequence of media clips by associating each word with a predetermined period of time and calculating the sum of these periods of time.
14. A method according to claim 13 further comprising comparing the calculated total length of the sequence of media clips with an upper threshold limit and displaying an indicator to the user when the upper threshold limit is exceeded.
15. A method according to claim 13 or 14 wherein words of a first type are associated with a first predetermined period of time and words of a second type are associated with a second predetermined period of time, the first period of time being greater than the second period of time.
16. A method according to claim 15 wherein the words of a second type are identified based on the number of letters in the word, whereby words having a number of letters less than or equal to a predetermined number are associated with the second period of time.
17. A method according to claim 15 or 16 wherein the first predetermined period of time is approximately 2 seconds.
18. A method according to any preceding claim further including sending data to the remote server indicative of the revised sequence of media clips to be combined into a revised sequence of clips.
19. A method according to claim 18 wherein: the sets of media clips received by the user device are low-resolution media clips; and
the revised sequence of media clips is created by combining high-resolution versions of the selected media clips at the remote server.
20. A method according to claim 18 or 19 further comprising sending any selected media clips obtained from an alternative source to the remote server.
21. A method according to any preceding claim wherein the additional media clip accessible by the user device from an alternative source is a user generated media clip.
22. A method according to claim 21 wherein the user generated media clip is generated by a video camera integrated with the user device.
23. A computer-implemented method of converting a text based message into a sequence of media clips, the method comprising, at a remote server system:
receiving from a user device, via a network connection, a text based message comprising a plurality of words;
separating the text based message into the constituent words;
comparing one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a set of media clips relevant to the word;
sending to the user device, via the network connection, at least a first automatically selected media clip from a set of media clips for each of the compared words and at least a second media clip from a set of media clips for at least one of the compared words.
24. A method according to claim 23 further comprising receiving a request from the user device to select a media clip to replace an automatically selected media clip and in response sending, via the network connection, one or more further media clips from the set of media clips associated with that word.
25. A method according to claim 23 wherein the step of sending to the user device, via the network connection, at least a first automatically selected media clip includes sending one or more further media clips from each of the sets of media clips.
26. A method according to any of claims 23 to 25, wherein the step of sending the media clips for each word comprises sending low-resolution media clips.
27. A method according to any of claims 23 to 26 further comprising:
receiving from the user device, in relation to a particular word, data indicating a user selected media clip from one or more of the sets of media clips; and
combining the selected media clips into a sequence of media clips.
28. A method according to claim 27 wherein combining the selected media clips into a sequence of media clips is performed using high-resolution versions of the media clips.
29. A method according to claim 27 or 28 further comprising:
- receiving, from the user device, one or more additional media clips from one or more alternative sources accessible by the user device and data indicating a word within the text based message that each of the additional media clips is to be associated with; and
combining the selected media clips and the additional media clips into a sequence of media clips.
30. A method according to any of claims 23 to 29 further comprising the step of selecting a set of media clips relevant to each of the compared words from a plurality of media clips relevant to each compared word, wherein the step of selecting a set of media clips includes selecting a predetermined number of randomly selected media clips from the identified plurality of media clips.
31. An electronic user device comprising:
a user input device;
- a network connection;
one or more processors;
memory; and
a program stored in the memory and configured to be executed by the one or more processors, the program including instructions to:
- send, via the network connection, a text based message to a remote server to be separated into constituent words, whereby sets of media clips are each associated with respective words in the message, the media clips in a given set being identified based on associated metadata related to the respective word;
receive, via the network connection, at least a first automatically selected media clip from each of the sets of media clips for the respective words in the message;
wherein the sequence of media clips contains one automatically selected media clip from each of the sets of media clips, the program further including instructions to:
for one or more of the words in the text based message, receive user input to select a media clip and replace an automatically selected media clip for inclusion in the sequence of media clips to generate a revised sequence of media clips, the media clip being selected from the set of media clips associated with the word or an additional media clip accessible by the user device from an alternative source; and
generate a revised sequence of media clips including the user selected media clips or send data to the remote server indicative of a revised sequence of media clips including the user selected media clips to be combined into a revised sequence of media clips.
32. An electronic user device according to claim 31 wherein the user input device includes a touch screen or gesture controlled display.
33. An electronic user device according to claim 31 or 32 wherein the user input device includes an integral video camera.
34. An electronic user device according to any of claims 31 to 33 wherein the alternative source includes a memory of the user device.
35. An electronic user device according to any of claims 31 to 34 wherein the alternative source includes a remote server storing one or more media clips.
36. A remote server system comprising:
a network connection;
one or more processors;
memory; and
a program stored in the memory and configured to be executed by the one or more processors, the program including instructions to:
receive from an electronic user device, via the network connection, a text-based message comprising a plurality of words;
- separate the text-based message into the constituent words;
compare one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a set of media clips relevant to the word;
send to the user device, via the network connection, at least a first automatically selected media clip from a set of media clips for each of the compared words and at least a second media clip from a set of media clips for at least one of the compared words.
37. A remote server system according to claim 36 further comprising:
a database containing a plurality of media clips associated with the metadata, the database storing a high-resolution version of each clip and a low-resolution version of each clip;
wherein the program includes instructions to:
- send low-resolution versions of the clips to the user device;
receive from the user device, in relation to a particular word, data indicating a user selected media clip from one or more of the sets of media clips; and
combine the selected media clips into a sequence of media clips using high-resolution versions of the media clips.
38. A remote server system according to claim 37 wherein the program includes instructions to:
receive, from the user device, one or more additional media clips from one or more alternative sources accessible by the user device and data indicating a word within the text based message that each of the additional media clips is to be associated with; and
combine the selected media clips and the additional media clips into a sequence of media clips.
39. A method, electronic user device or remote server according to any preceding claim wherein the media clips are video and/or audio clips.
40. A method of converting a text based message into a sequence of media clips, the method comprising:
receiving a text-based message comprising a plurality of words;
sending, via a network connection, the text-based message to a remote server to be separated into constituent words, whereby sets of media clips are each associated with respective words in the message, the media clips in a given set being identified based on associated metadata related to the respective word;
receiving, via the network connection, at least a first automatically selected media clip from each of the sets of media clips for the respective words in the message;
wherein the sequence of media clips contains one automatically selected media clip from each of the sets of media clips, the method further comprising:
for one or more of the words in the text based message, receiving user input to select a media clip to replace an automatically selected media clip for inclusion in the sequence of media clips to generate a revised sequence of media clips, the user input selecting a media clip from the set of media clips associated with the word or an additional media clip accessible by the user device from an alternative source.
41. A method of converting a text based message into a sequence of media clips, the method comprising:
receiving from a user device, via a network connection, a text based message comprising a plurality of words;
separating the text based message into the constituent words;
comparing each of the constituent words with metadata contained within a database to identify, for each word, a plurality of media clips relevant to the word;
selecting a set of media clips and associating that set with the word;
sending to the user device, via the network connection, the set of media clips for each word.
42. A method of converting a text based message into a sequence of media clips, the method comprising:
receiving a text based message comprising a plurality of words;
separating the text based message into the constituent words;
comparing one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a set of media clips relevant to the word;
automatically selecting, from each of the sets of media clips, a media clip;
for one or more of the words in the text based message, receiving user input to select a media clip to replace the automatically selected media clip for inclusion in the sequence of media clips; and
generating a sequence of media clips using the selected media clips.
43. A system comprising an electronic user device and a remote server system;
the electronic user device being configured to:
send, via the network connection, a text based message to a remote server to be separated into constituent words, whereby sets of media clips are each associated with respective words in the message, the media clips in a given set being identified based on associated metadata related to the respective word;
receive, via the network connection, at least a first automatically selected media clip from each of the sets of media clips for the respective words in the message;
wherein the sequence of media clips contains one selected media clip from each of the sets of media clips, the user device further being configured to:
for one or more of the words in the text based message, receive user input to select a media clip and replace an automatically selected media clip for inclusion in the sequence of media clips;
the remote server system being configured to:
- receive from the electronic user device the text-based message comprising a plurality of words;
separate the text-based message into the constituent words;
compare one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a set of media clips relevant to the word;
send to the user device, via the network connection, at least a first automatically selected media clip from a set of media clips for each of the compared words and at least a second media clip from a set of media clips for at least one of the compared words;
the system being configured to:
generate a sequence of media clips using the selected media clips.
44. A computer-implemented method of converting a text-based message into a sequence of media clips, the method comprising, at a user device:
receiving a text-based message comprising a plurality of words;
sending, via a network connection, the text-based message to a remote server to be separated into constituent words, whereby media clips are associated with one or more of the words in the message, the media clips being identified based on associated metadata related to a respective word.
45. A computer-implemented method of converting a text based message into a sequence of media clips, the method comprising, at a remote server system:
receiving from a user device, via a network connection, a text based message comprising a plurality of words;
separating the text based message into the constituent words;
comparing one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a media clip relevant to the word.
46. A computer-implemented method of converting a text-based message into a sequence of media clips, the method comprising:
receiving a text-based message comprising a plurality of words;
separating the text based message into the constituent words;
comparing one or more of the constituent words with metadata contained within a database to identify, for each of the compared words, a media clip relevant to the word;
generating a sequence of media clips using the identified media clips.
PCT/EP2014/078201 2013-12-17 2014-12-17 Apparatus and method for converting text to media clips WO2015091633A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1322298.9 2013-12-17
GB1322298.9A GB2523075A (en) 2013-12-17 2013-12-17 Apparatus and method for converting text to media clips

Publications (1)

Publication Number Publication Date
WO2015091633A1 true WO2015091633A1 (en) 2015-06-25

Family

ID=50031038

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2014/078201 WO2015091633A1 (en) 2013-12-17 2014-12-17 Apparatus and method for converting text to media clips

Country Status (2)

Country Link
GB (1) GB2523075A (en)
WO (1) WO2015091633A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100209003A1 (en) * 2009-02-16 2010-08-19 Cisco Technology, Inc. Method and apparatus for automatic mash-up generation
US20110239099A1 (en) * 2010-03-23 2011-09-29 Disney Enterprises, Inc. System and method for video poetry using text based related media


Non-Patent Citations (1)

Title
NICHOLS N ET AL: "Machine-Generated Multimedia Content", ADVANCES IN COMPUTER-HUMAN INTERACTIONS, 2009. ACHI '09. SECOND INTERNATIONAL CONFERENCES ON, IEEE, PISCATAWAY, NJ, USA, 1 February 2009 (2009-02-01), pages 336 - 341, XP031424320, ISBN: 978-1-4244-3351-3 *

Also Published As

Publication number Publication date
GB2523075A (en) 2015-08-19
GB201322298D0 (en) 2014-01-29


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14818950

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14818950

Country of ref document: EP

Kind code of ref document: A1