EP1829344A1

EP1829344A1 - Method and system for synthesizing a video message

Info

Publication number: EP1829344A1
Application number: EP05824310A
Authority: EP
Inventors: Mauro Barbieri; Lalitha Agnihotri; Nevenka Dimitrova; Srinivas Gutta; Jun Fan; Alan Hanjalic; Robert Turetsky
Original assignee: Koninklijke Philips Electronics NV
Current assignee: Koninklijke Philips NV
Priority date: 2004-12-14
Filing date: 2005-12-12
Publication date: 2007-09-05
Also published as: CN101080918A; WO2006064455A1; JP2008523759A

Abstract

A method (Figure 1) and system (Figure 2) synthesize a smart MMS message (10) derived from an SMS preliminary message (34) that conveys one or more actions and concepts. A user (12) archives (in 28; Steps 56, 58) user-acquired video clips (20) that are semantically annotated (22,102; Step 54) both generically, according to objectively sensible content, and personally, according to user-subjective content. A service provider (18) archives (in 28; Steps 52, 58) service provider-acquired MMS clips (20) that are semantically annotated (22,110; Step 50) generically. Annotating is effected pursuant to an ontology (118; Steps 50, 52). When the user (12) wishes to send an MMS message (10) to a recipient (16), the preliminary message (34) is sent (12,130; Step 62) to the service provider (18), who extracts (at 132; Step 64) those semantically annotated (at 102 and 110; Steps 50,54) archived (at 28) clips that, according to the ontology (118) match user- selected (Steps 68,76) aspects of the action or concept of the preliminary message. When the user makes a final selection (Step 70) of extracted clips that are presented thereto (Step 66), they are combined (at 142; Step 72) with each other and with the preliminary message and sent to the recipient (Step 74).

Description

PHUS040527

METHOD AND SYSTEM FOR SYNTHESIZING A VIDEO MESSAGE

[0001] This invention relates to a method and system for synthesizing a video message. More particularly, the present invention relates to a user sending to a recipient a synthesized MMS message that comprises the combination of the SMS content of a preliminary message, composed by the user, and annotated stored video or other MMS content that matches the concept or action conveyed by the preliminary message. A portion of the stored content may originate with a third party, which performs the synthesizing, while another portion of the stored content may originate with the user. [0002] Authoring and editing videos is difficult for an unskilled person, and is especially difficult for those employing the simple user interfaces of mobile phones. Further, accessing, selecting, and composing scenes from large video archives are difficult, if not impossible, tasks for the average person, whether or not they are attempted using a mobile phone.

[0003] A popular and much used feature of portable communication devices, such as mobile phones, is the ability to compose and send SMS (Short Message Service) messages. A principal reason for this popularity and utility is that SMS is similar to other easy-to-use "natural" features and services, such as composing and sending computer e-mail. [0004] Recently, there has been movement to provide to users mobile phones having MMS (Multimedia Message Service) capability, that is, the ability to compose and send messages having video and audio and other components. It is thought that devices with MMS capabilities are not experiencing the early success of SMS devices because of users' lack of familiarity therewith and because of the aforenoted difficulties related to authoring, editing, and composing scenes from archived audio-video materials, which are not "natural" features for the average person. PHUS040527

[0005] US Published Application 2004/0092250 Aldiscloses an MMS based photo album publishing system. Rich media content associated with a first person's telephone number is stored. When a second person sends an MMS message from an MMS mobile device and the message includes a predetermined indicator and the first person's telephone number as a destination for the message, the stored rich media content is transmitted to the second person's mobile MMS device.

[0006] In International Published Application WO 2004/019583 A2 there are disclosed a method and system for transmitting messages on a telecommunications network which integrate an SMS text message that contains emoticons with video content as an MMS message, which is sent to a recipient terminal. Textual and oral portions of an SMS message are transformed into synthesized oral content. Virtual characters (avatars) and backgrounds are selected from stored galleries thereof and combined and customized into a desired scenario. The avatar(s) "speak" the synthesized oral content, and word stress and facial expressions of the avatar(s) is related to the emoticons. The talking avatar scenario is sent as an MMS message.

[0007] European Patent Application EP 1 077 414 A2 discloses a method and system for retrieving images from a database having images that are relevant to indicated text. Publication 2002-007415 of the Patent Abstracts of Japan appears to be similar the last- described application, except that video content is retrieved. US Published Application 2002/0080162 Al discloses a method for automatically extracting semantically significant events from videos, specifically for identifying and downloading only slow motion portions of recorded sporting events by differentiating the characteristics slow motion portions from normal motion portion. US Patent 6,108,674 discloses apparatus for integrating a selected image into written text by using the text to obtain from memory and display stored, semantically-identified images, the identifications corresponding to a PHUS040527

portion of the text. Selected ones of the corresponding images are "pasted" into the text. International Published Application WO 99/45483 discloses a method and system for generating semantic visual templates for image and video retrieval. [0008] The present invention contemplates rendering the creation of an MMS message more convenient for the average user of a mobile communication device, such as a mobile phone, as well as other devices, such as PC's, as a creative authoring tool. [0009] The invention involves three entities: the message sender (user), the recipient of the message, and a service provider. The service provider produces or obtains plural video clips having visual and audio content. The clips are annotated by appending to them semantically-based tags corresponding to their content and are stored in an archive remote from the user. More than one tag may be, and usually will be, appended to a clip. For example a clip showing two persons kissing may include tags such as "love," "kiss," "affection," and "romance." In some embodiments, the user may also have created or obtained video clips. These clips are similarly annotated and may be stored in an archive local to the user or in the remote archive by agreement between the service provider and the user. The semantic tags appended to the user-created clips are similar to those appended to the service provider's clips, and will typically contain additional tags based on the user's personal experience and knowledge. For example, the user may have created a clip -with a video camera or functionally similar device — or obtained a clip from a friend showing the user and another person kissing. This clip is tagged similarly to the service provider's clip showing two people kissing (i.e., with the tags "love," "kiss," etc.), and contains additional tags, such as the names of the kissing couple or the place or event where the kissing occurred. Thus, the user's clip may contain additional tags such as "John," "Mary," and "senior prom." PHUS040527

[0010] An archive of all available clips comprises a first sub-archive (identical to the remote archive), containing the service provider's clips and annotations, and a second sub- archive (identical to the local archive), containing the user's clips and annotations. If the second sub-archive (the local archive) is sent to the service provider, the service provider may append additional relevant "general" tags inadvertently omitted by the user, such as "affection." In any event, there is ultimately created an archive of semantically-annotated clips.

[0011] After creation of the archive, the user may wish to send a "smart" MMS message. In this event, the user sends and the service provider receives a language-based preliminary message, which sets forth an "action or concept," herein meaning an action, concept, subject, situation, location, event, name(s), object, or similar information. The user wishes the message and its action or concept to be conveyed to a named recipient. The service provider extracts from the archive (the first sub -archive + the second sub- archive) those clips that have tags semantically matching items in the preliminary message, such as the action or concept set forth therein, the names of the sender and/or the recipient, or an event or special occasion. Matching may be effected by simple keyword matching or by more sophisticated ontology-based matching. Where ontological matching is used, the ontology may be an RDF (Resource Descriptive Framework) ontology, XML (Extensible Markup Language), or other suitable ontology. The match may be effected without intervention by the user, or the user may specify which portions of the SMS preliminary message or which tags are to be ontologically significant.

[0012] It is contemplated that the user's preliminary message may include, or consist entirely of, voice content intended to be included in the MMS message. This voice content may comprise a simple voice mail message to the service provider or other audio content. In this event an appropriate technique, such as ASR (Automatic Speech Recognition) may PHUS040527

be used to convert the voice recording into SMS-type text, which is also used in the matching process. If the preliminary message has audio-visual content, ASR and/or AVA

(automatic video analysis) may be used to provide bases for semantic tagging and matching.

[0013] The clips extracted by the service provider, or summaries thereof, are presented to the user. If the user is without MMS capabilities, textual summaries created by the service provider are presented to the user as SMS content. If the user has MMS capabilities, either the extracted clips or summaries thereof (e.g., key frames from each clip) are presented to the user in as MMS content. At this point, the user may select some or all of the clips, or portions thereof; or instruct the service provider to extract additional clips matching the original semantic content of the preliminary message; or request that additional semantic tags be considered relevant.

[0014] Ultimately, when the user is satisfied with the number and content of clips, he identifies and selects them. In response to this selection, the service provider combines and integrates the preliminary message with the user-selected clips, thereafter transmitting the combination as a completed MMS message to the designated recipient.

[0015] Figure 1 is a generalized flow chart for performing a method of synthesizing a video message in response to the receipt of a preliminary message in accordance with the principals of the present invention.

[0016] Figure 2 is a generalized schematic representation of a system for performing the method illustrated by Figure 1.

[0017] Referring to Figures 1 and 2, there are shown a method and system according to the present invention for creating an MMS message 10 more in a manner that is more convenient for the average user of a communication device 12, such as a mobile phone or a

PC, effectively permitting the use of the device 12 as a creative authoring tool. PHUS040527

[0018] The invention involves three entities: the message sender (user) 14, the recipient 16 of the message, and a service provider 18. The service provider 18 produces or obtains plural video clips 20 having visual and audio content. The clips 20 may include content taken from or comprising videos or movies (or portions or trailers thereof), recorded music, material obtained or obtainable from the Web, advertising material, promotional material, or any other suitable material.

[0019] In Step 50, The clips 20 are generically annotated by the service provider 18 by appending to them semantically-based tags 22 corresponding to their observable, general content and are then stored, Step 52, in an archive 24 remote from the user 14. More than one tag 22 may be, and usually will be, appended to each clip 20. For example a clip 20 showing two persons kissing may include tags 22 such as "love," "kiss," "affection," "romance," and the like.

[0020] In some embodiments, the user 14 may also have created or obtained video clips 20. These clips 20 are similarly generically annotated, Step 54, and may be stored, Step 56, in an archive 26 local to the user 14. The generic tags 22 appended by the user 14 to the user-created clips 20 should be similar to those appended to the service provider's clips 20, and will also contain additional personal tags 22 based on the user's personal experience and knowledge regarding them.

[0021] For example, the user 14 may have created a clip 20, with a video camera or functionally similar device, or obtained it from another, the clip 20 showing the user 14 and another person kissing. Such a personal content clip 20 may be generically tagged similarly to the service provider's clips 20 having similar objectively determinable content (or may be later generically tagged by the service provider 18, as set forth below). For example, a user-produced and user-tagged clip 20 showing two people kissing should typically contain the generic tags 22 "love," "kiss," "affection," "romance," etc., and will PHUS040527

also contain additional personal or user-specific tags 22, such as the names of the kissing couple and/or the place or event where the kissing occurred. These personal tags 22 will contain information relative to the personal action or concept of the clip 20, such as "John," "Mary," and "senior prom." Thus, the service provider's tags 22 will relate to the generic content of the user clips 20 archived thereby, and the user's tags 22 will relate to this same generic content as well as to explicit or implicit personal content within the personal knowledge of the user 14.

[0022] There is thus created, Step 58, by viewing the local and remote archives 26 and 24 together, an archive 28 containing all available, semantically-tagged clips 20. The archive 28 may initially, in effect, be distributed (have two different physical locations), the local archive 26 remaining with the user 14, until it is sent to the service provider 18, and the remote archive 24 being located with the service provider 18. The archive 28 may also be non-distributed and preferably located with the service provider 18. The user 14 and the service provider 18 may reach an agreement pursuant to which the user 14 is permitted to transmit to the service provider 18 the contents of the local archive 26 (user-annotated clips) prior to, or at the time of, a user request for the service provider to create the MMS message 10. In these latter events, the service provider 18 appends, Step 60, additional or corrected "general" tags 22 to the user-tagged clips 20 so that the generic tags 22 on clips 20 of both parties conveying similar actions and concepts will be the same. [0023] It should be noted that the Steps shown thus far in Figure 1 are not set forth in a time sequence. In Step 50, the service provider 18 may have created the remote archive 24 long before the user 14 requests the service provider 18 to synthesize the MMS message 10. For cost-related or other reasons the service provider 18 also may create the remote archive 24 only after receiving the user's MMS-synthesizing request. As already noted, the user's clips 20 may be sent to the service provider before, or at the time of, a request to PHUS040527

synthesize an MMS message 10. The foregoing also means that the times that the clips 20 are annotated, Steps 50, 54, that the local/remote archives 26/24 are created, Steps 56/52, that the archive 28 (distributed or not) is created, Step 58, and that the service provider amends the tags on the user-annotated clips, Step 60, are variable and not necessarily in any particular sequence.

[0024] When the user 14 decides to send a "smart" MMS message 10, the user transmits, and the service provider 18 receives, Step 62, an SMS language-based preliminary message 34, which sets forth an "action or concept," herein meaning an action, concept, subject, situation, location, event, name(s), object, emotion, or similar information. This (Step 62) constitutes the user's request to the service provider that the preliminary message 34 plus additional and embellishing material be sent to the recipient 16. The service provider 18 extracts from the archive 28, Step 64, those clips 20 that have tags 22 semantically matching the action or concept of the preliminary message 34, the names of the sender (the user) and/or the recipient, and any other information indicated by the user as being of significance, such as an event, special occasion or relationship between the user and the recipient. Matching is preferably effected by an ontology, i.e., a specification of a conceptualization, such as RDF (Resource Descriptive Framework) ontology, XML (Extensible Markup Language), or other suitable ontology. Key-word matching may also be used, for example, matching can simply be effected between the tags 22 on the clips 20 and nouns and verbs contained in the preliminary message 34. Matching may be effected without intervention by the user 14, or the user 14 may specify which portions of the preliminary message 34 or which tags are to be ontologically significant. [0025] It is contemplated that the user's preliminary message 34 may include recorded voice content intended to be included in the MMS message 10. In this event an appropriate technique, such as ASR (Automatic Speech Recognition) may be used to convert the voice PHUS040527

recording into text, which is then tagged according to its action or content and which is also used in the matching process.

[0026] In Step 66 the clips 20 extracted by the service provider 18, or portions or summaries thereof, are presented to the user 14. If the user 14 is without MMS capabilities, textual summaries created by the service provider 18 are presented to the user 14. If the user has MMS capabilities, the clips 20 themselves, or summaries thereof (e.g., selected frames), are presented to the user 14. At this point, at Step 68 the user may select some or all of the clips 20, or portions thereof, or instruct the service provider 18 to extract additional clips 20 matching the original semantic content of the preliminary message 34 or additional semantic matches. Steps 66 and 68 are iterated until the user 14 is satisfied with the service provider-extracted material.

[0027] Ultimately, when the user 14 is satisfied with the number and content of the clips 20 presented, the user identifies them to the service provider and thereby selects them, Step 70. In response to the user's selection, the service provider 18 combines and integrates the preliminary message 34 with the user-selected clips 20, Step 72, and thereafter transmits, Step 74, the combination as a completed smart MMS message 10 to an MMS device 36 operated by the designated recipient 16.

[0028] In one example, the user 14, named John, wishes to send to the recipient 16, named Mary, a Valentine "card" (the MMS message 10) containing the words "I love you," and he forwards this information to the service provider 18 via the preliminary message 34. The archive 28 contains several clips 20 of movie segments in which two actors are kissing, including a segment from "Gone With the Wind" in which Rhett Butler kisses Scarlet O'Hara. These clips 20 are all tagged "kissing," "love" and "romance," and similar contexts. The archive 28 also contains several clips 20 of John Kissing Mary that contain similar general tags 22 ("kissing," "love," etc), as well as the tags 22 "John" and "Mary". PHUS040527

Also in the archive 28 are recorded romantic songs, tagged with "love" and "romance," including the song "I Love You Truly."

[0029] The semantic matching of the actions or concepts of the preliminary message 34 to the tags 22 (or vice versa) on the clips 20 in the archive 28, will result in the extraction from the archive 28 of multiple items, including the pictures of John and Mary kissing, Rhett and Scarlett kissing, and the recorded song "I Love You Truly." The extracted items or summaries thereof (if the user 14 is using an SMS device) are presented to the user 14. Assume that the user 14 selects and identifies to the service provider 18 the John/Mary and Rhett/Scarlett clips 20, requesting that the latter be shown first, then faded out as the former fades in, the showing of these clips taking about one minute. The user 14 also requests that the first minute of the song be simultaneously played. The service provider 18, accordingly combines these desiderata with the text "I love you" from the preliminary message 34 to produce the completed smart MMS message 10. [0030] The synthesis of the completed MMS message 10 may be fully automatic, with little or no input from the user 14 other than the sending of the preliminary message 34 to the service provider 18. Preferably, however, the method is interactive to a greater or lesser degree. Specifically, in addition to sending the preliminary message 34 and possibly contributing some personal clips 20 to the archive 28, the user may specify, qualify or limit various aspects of the method, by engaging in one or more of the following activities:

(1) Selecting (as in Step 68) one or more of the multiple clips 20 that are presented by the service provider 18 as possible acceptable matches between the preliminary message 34 and the semantic tags on the 22 archived clips 20;

(2) Requesting the addition of a semantically matched music track or other audio content from the archive 28 and whether and how the audio content should be synchronized with the visual content of the completed message 10; PHUS040527

(3) Limiting semantic matching of tags 22 to only certain aspects (actions or concepts) of the preliminary message 34, or specifying semantic actions or concepts, in addition to those naturally ontologically generically derivable from the preliminary message 34, as possible matches for tagged clips 20 in the archive 28;

(4) Specifying the sources or types of some clips 20 (trailers of pre-1950 movies versus post- 1950 movie trailers; black-and-white movie segments versus color segments; personal videos only; non-personal videos only) and/or music content to be incorporated by the service provider 18 (jazz versus rock and roll);

(5) Limiting the clips 20 extracted and presented by the service provider

18 to those containing at least specified content, e.g., limiting presented clips 20 to those showing Brad Pitt kissing a female or to those showing kissing scenes from "famous" movies that the user 14 and the recipient 16 have seen together. Some desired content, such as a selected one of "movies seen together" may require permitted access to a third party's facility;

(6) Pre-defining the semantic significance of the content (action or concept) of certain types of preliminary messages 34 for purposes of semantic matching. For example, there may be pre-defined preliminary messages 34 of types including "best of me," "marriage proposal," "to a familiar person," "to a best friend," "Greetings to Grandma," and "to a relative" messages. A "best of me" preliminary message might be pre-defined so as to semantically match only with clips 20 in which the recipient 16 is depicted in situations wherein another person was not present and with clips showing famous persons engaging in the same or similar situations. The specified situations may be "out of the ordinary," this term being semantically weighted by the user 14. Pre-definition may occur when the user 14 initially sends the content of the archive 26 to the service provider 18, or at any other time prior to the formation of the completed message 10. A PHUS040527

preliminary message 34 intended for a best friend or a relative to commemorate a birthday, Valentine's Day, an anniversary, or other important event that is semantically significant by requiring that extracted clips 20 generally depict the user 14 and the recipient 16 or depict them together at a prior similar event. Moreover, the user 14 may designate as having semantic significance clips 20 from the archive 24 that show participation in similar important events by actors in a movie or video;

(7) Specifying the manner in which personal clips 20 from the second sub-archive 26 are combined with clips 20 from the first sub-archive 24 and/or with added music or other sound material. Among the spectrum of these specifications are fade- in/fade-out, segment interlacing, music type and tempo, order of presentation, length of the completed message 34 and of individual portions thereof, other audio-visual mixing preferences, and the like.

[0031] The foregoing actions (2)-(7), and other actions by which the user specifies that certain content be contained (or not) in the MMS message 10, may occur at any time prior to Steps 72 and 74. As a consequence, actions (2)-(7) and additional actions are collectively, but only generally, indicated at Step 76. Specifically, while Steps 62-70 constitute a first timed sequence, and Steps 72, 74 represent a second timed sequence, Step 76 may occur at any time during the sequence of Steps 62-70 but before Steps 72, 74. [0032] As alluded to above, the service provider 18 may have added to the remote archive 24 (and ultimately to the archive 28) downloaded web content that was subsequently semantically tagged. The web content may also include material added to the first sub- archive 24 and semantically tagged following receipt of the preliminary message 34 and before or during presentation and review of matched clips presented to the user 14. Further, in addition to the service provider 18 having the ability to apply ASR to audio portions of the preliminary message 34, it may also have the ability of performing face and PHUS040527

voice identification, especially regarding clips 20 personal to the user 14 and contained in the archive 26 (and ultimately in the archive 28) for the purpose of attaching corresponding tags 22 to the clips and semantically matching the clips 20 to the preliminary message 34. Also, the service provider 18 may possess the ability to add to the clips 20 presented to the user 14 clips that do not directly semantically match the preliminary message 34 but that do match tags 22 of clips 20 having other tags 22 which semantically match the preliminary message 34. For example, assume that a clip 20 having a "love" tag 22 is extracted as a match to a preliminary message 34 containing "I love you." Further assume that the extracted clip 20 also contains the tag 22 "Paris," since the clip depicts the user 14 and the recipient 16 posing in front of the Eiffel Tower. The tag "Paris" might result in the addition to the presented clips 20 of a song containing "Paris" in its title or lyrics or of a video clip depicting a quick tour of famous sites in Paris, even though the preliminary message 34 does not contain "Paris" as semantic content.

[0033] As noted above, the recipient 16 has access to an MMS device for receiving the completed smart MMS message 10. It is preferable that the user 14 also has such access. Suitable MMS devices include PC's, laptops, mobile devices, such as the Nokia 7650 or the Sony Ericsson T68i, and portable or non-portable devices MMS devices similar to the foregoing. Further, as discussed earlier, the user 14 may have access to only an SMS device, in which case at Step 66 only summaries of extracted clips are presented. [0034] Referring now to Figure 2, there is shown a generalized system 100 for performing the above-described method. Reference numerals 10-34 of Figure 1 are used in Figure 2 to identify some of the various elements of the system 40. Other elements are depicted as elements some of which are identified by reference numerals corresponding to the enumerated Steps of the method of Figure 1 and having the capability of performing the described functions. PHUS040527

[0035] The clips 20 acquired by the user 14 via conventional facilities 101 are sent to an annotator 102 (Step 54) along a path 104 for the addition thereto of the semantic tags 22 furnished on a path 106. The annotated clips 22 are placed in the local archive 26 (Step 56) on a path 108. Similarly, the clips 20 of the service provider 18 are sent to an annotator 110 (Step 50) on a path 112 to receive tags 22 on a path 114, and the annotated clips 20 are sent to the remote archive 24 on a path 116 (Step 52). As represented by the path 120, the service provider may cause the ontology 118 to provide guidance along a path 122 to the annotator 110.

[0036] The local/remote archives 26/24 are combined (Step 58) to form the archive 28, their contents traversing paths 124 and 126. The contents of the local archive 26 may first be sent along a path 128 to the service provider's annotator 110 -where the user's annotations may be checked for adherence to the protocol 118 and to ensure that generic tags consistent with those used by the service provider have been appended (Step 60) — and thereafter sent to the archive 28 via the paths 116 and 126.

[0037] The preliminary message 34 is sent to the service provider along a path 130 (Step 62). Also sent to the service provided on the path 130 are, inter alia, any user instructions accompanying (or sent after) the preliminary message 34, user feedback regarding presented clips, and the final selection of extracted clips (Steps 68, 70, 76). [0038] Facilities 132 use the ontology 118 on a path 134 and the annotated clips in the archive 28 along a path 136 to ontologically match clips with the action or concept of the preliminary message 34 and to extract the matching clips (Step 64). The extracted clips are sent to the user on a path 138 for review (Step 66). In response to this last action, the user may furnish clip selections and requests for further or modified clips and matches along the path 130 (Step 68). Ultimately, the user sends his final selection and instructions on the path 130 to the service provider (Step 70), who then, along a path 140, effects PHUS040527

operation of combining facilities 142 to combine the selected clips from a path 144 with the preliminary message 34 from the path 140 (Step 72) to form the MMS message 10 which is then sent to the recipient's MMS communication device 36 along a path 146. [0039] The various paths and elements described with reference to Figure 2 may take any convenient form chosen by an ordinarily skilled worker in the field, the function, but not necessarily the structure, of these paths and elements being of foremost importance. [0040] The foregoing description shall not limit the present invention to the precise form thereof as shown and described. Further, those ordinarily skilled in the field of this invention will appreciate that variations and modifications of the present invention, as shown and described above, that are equivalent thereto, may be effected without departing from the spirit hereof. Such variations and modifications are covered by the following claims.

Claims

PHUS040527CLAIMS

1. A method of synthesizing an MMS message, which comprises : producing plural clips having visual and audio content; annotating the clips by appending thereto semantically-based tags corresponding to their content; creating an archive of the semantically-annotated clips; receiving from a user a preliminary message which is intended to be sent to a recipient to convey to the recipient and action or concept; extracting from the archive those clips having tags that semantically match at least the action or concept contained in the preliminary message; presenting to the user the extracted clips; in response to the user making a final selection of one or more of the presented clips, combining the selected clips and the preliminary message to produce a completed MMS message; and transmitting the completed MMS message to the recipient.

2. A method as in Claim 1 , wherein: the preliminary message is language-based and includes textual content and/or audio content.

3. A method as in Claim 2, which further comprises: performing automatic speech recognition of the audio content of the preliminary message to convert it to corresponding language-based textual content. PHUS040527

4. A method as in Claim 1 , wherein: the preliminary message contains audio-visual content.

5. A method as in Claim 4, which further comprises: performing automatic speech recognition of the audio content and automatic video analysis of the visual content of the preliminary message to convert each to corresponding language-based textual content.

6. A method as in Claim 1 , wherein: semantic matching also includes the name of the user and/or the recipient.

7. A method as in Claim 6, wherein: semantic matching is effected by key word matching or ontological protocols.

8. A method as in Claim 1 , wherein: semantic matching is effected by key word matching or ontological protocols.

9. A method as in Claim 1 , wherein: the clips are produced and annotated by the user and/or a third party or parties, the user-annotated clips being in an archive local to the user and the third party-annotated clips being in an archive remote from the user. PHUS040527

10. A method as in Claim 9, wherein: the audio-visual content of the second remote archive includes videos, advertising material, promotional material, portions or trailers of videos or movies, music, material obtained from the internet or WWW, and similar material.

11. A method as in Claim 1 , wherein: the user selects portions or all of each presented clip prior to the combining step.

12. A method as in Claim 1 , wherein: prior to the combining step, the user specifies or adds to either or both (i) the action or concept to be semantically matched or (ii) the semantic criteria used in the extracting step.

13. A method as in Claim 12, wherein: the user reviews and deletes or amends the presented clips prior to making the final selection.

14. A method as in Claim 1 , wherein: the transmitted completed message is received by the recipient on an MMS -capable device.

15. A method as in Claim 14, wherein: the MMS-capable device is a mobile communication device. PHUS040527

16. A method as in Claim 14, wherein: the MMS-capable device is a computer.

17. A method as in Claim 1, wherein: the preliminary message is transmitted by the user via an SMS-capable device, an MMS-capable device, or by voice mail.

18. A method as in Claim 1, which further comprises: after presenting to the user the extracted clips and before the combining step, obtaining from the user instructions and varying the parameters of the extracting step pursuant to the instructions, and then presenting to the user those clips extracted pursuant to the varied parameters.

19. A method as in Claim 18, wherein: the obtaining, varying, and presenting steps are iterated until the final selection is made.

20. A system for synthesizing an MMS message, which comprises: means for producing plural clips having visual and audio content; means for annotating the clips by appending thereto semantically-based tags corresponding to their content; means for creating an archive of the semantically-annotated clips; means for receiving from a user a preliminary message intended to be sent from the user to a recipient to convey to the recipient an action or concept; PHUS040527

means for extracting from the archive those clips having tags that semantically match at least the action or concept contained in the preliminary message; means for presenting to the user the extracted clips; means responsive to the user selecting one or more of the presented clips for combining the selected clips and the preliminary message to produce a completed MMS message; and means for transmitting the completed MMS message to the recipient.

21. A system as in Claim 20, wherein. the preliminary message is language-based and includes textual content and/or audio content.

22. A system as in Claim 21, which further comprises: means for performing automatic speech recognition of the audio content to convert the audio content to corresponding language-based textual content.

23. A system as in Claim 20, wherein: the preliminary message contains audio-visual content.

24. A system as in Claim 23, which further comprises: means for performing automatic speech recognition of the audio content and automatic video analysis of the visual content of the preliminary message to convert each to corresponding language-based textual content. PHUS040527

25. A system as in Claim 20, wherein: the extracting means also extracts from the archive clips that match the name of the user and/or the recipient.

26. A system as in Claim 25, wherein: the extracting means effects semantic matching by key word matching or ontological protocols.

27. A system as in Claim 20, wherein: the extracting means effects semantic matching by key word matching or ontological protocols.

28. A system as in Claim 20, which further comprises: means responsive to input from the user and/or from a third party or parties for producing and annotating the clips and for placing the user-annotated clips in an archive local to the user and the third party-annotated clips in an archive remote from the user.

29. A system as in Claim 28, wherein: the audio-visual content of the remote archive includes videos, advertising material, promotional material, portions or trailers of videos or movies, music, material obtained from the internet or WWW, and similar material.

30. A system as in Claim 20, which further comprises: means for permitting the user to select portions or all of each presented clip prior to the combination of the selected clips and the preliminary message. PHUS040527

31. A system as in Claim 20, which further comprises: means for allowing the user to specify or add to either or both (i) the action or concept to be semantically matched or (ii) the semantic criteria used by the extracting means, prior to the combination of the selected clips and the preliminary message.

32. A system as in Claim 31 , which further comprises: means for permitting the user to review and delete or amend the presented clips prior to final clip selection.

33. A system as in Claim 20, which further comprises: an MMS-capable device on which the recipient receives the transmitted completed message.

34. A system as in Claim 33, wherein: the MMS-capable device is a mobile communication device.

35. A system as in Claim 33, wherein: the MMS-capable device is a computer.

36. A system as in Claim 20, which further comprises: an SMS-capable device, an MMS-capable device or a voice mail device for transmitting the preliminary message by the user. PHUS040527

37. A system as in Claim 20, which further comprises: means for obtaining from the user, following the user being presented the extracted clips and before the extracted clips are combined with the preliminary message, instructions and for varying the parameters of the extracting means pursuant to the instructions, and for then presenting to the user clips extracted pursuant to the varied parameters.

38. A system as in Claim 37, which further comprises: means for iterating the operation of the obtaining means and the varying and presenting means until the selecting means is operated.